URL is the acronym for Uniform Resource Locator. It is a reference (an address) to a resource on the Internet. You provide URLs to your favorite Web browser so that it can locate files on the Internet in the same way that you provide addresses on letters so that the post office can locate your correspondents.
Java programs that interact with the Internet also may use URLs to find the resources on the Internet they wish to access. Java programs can use a class called URL in the java.net package to represent a URL address.
Terminology Note: The term URL can be ambiguous. It can refer to an Internet address or a URL object in a Java program. Where the meaning of URL needs to be specific, this text uses "URL address" to mean an Internet address and "URL object" to refer to an instance of the URL class in a program.
If you've been surfing the Web, you have undoubtedly heard the term URL and have used URLs to access HTML pages from the Web.
It's often easiest, although not entirely accurate, to think of a URL as the name of a file on the World Wide Web because most URLs refer to a file on some machine on the network. However, remember that URLs also can point to other resources on the network, such as database queries and command output.
Definition: URL is an acronym for Uniform Resource Locator and is a reference (an address) to a resource on the Internet.
The following is an example of a URL which addresses the Java Web site hosted by Sun Microsystems:
http://java.sun.com/
As this example shows, a URL has two main components:
· Protocol identifier
· Resource name
Note that the protocol identifier and the resource name are separated by a colon and two forward slashes. The protocol identifier indicates the name of the protocol to be used to fetch the resource. The example uses the Hypertext Transfer Protocol (HTTP), which is typically used to serve up hypertext documents. HTTP is just one of many different protocols used to access different types of resources on the net. Other protocols include File Transfer Protocol (FTP), Gopher, File, and News.
The resource name is the complete address to the resource. The format of the resource name depends entirely on the protocol used, but for many protocols, including HTTP, the resource name contains one or more of the components listed in the following table:
Host Name | The name of the machine on which the resource lives.
Filename | The pathname to the file on the machine.
Port Number | The port number to which to connect (typically optional).
Reference | A reference to a named anchor within a resource that usually identifies a specific location within a file (typically optional).
For many protocols, the host name and the filename are required, while the port number and reference are optional. For example, the resource name for an HTTP URL must specify a server on the network (Host Name) and the path to the document on that machine (Filename); it also can specify a port number and a reference. In the URL for the Java Web site, java.sun.com is the host name and the trailing slash is shorthand for the file named /index.html.
The easiest way to create a URL object is from a String that represents the human-readable form of the URL address. This is typically the form that another person will use for a URL. For example, the URL for the Gamelan site, which is a directory of Java resources, takes the following form:
http://www.gamelan.com/
In your Java program, you can use a String containing this text to create a URL object:
URL gamelan = new URL("http://www.gamelan.com/");
The URL object created above represents an absolute URL. An absolute URL contains all of the information necessary to reach the resource in question. You can also create URL objects from a relative URL address.
A relative URL contains only enough information to reach the resource relative to (or in the context of) another URL.
Relative URL specifications are often used within HTML files. For example, suppose you write an HTML file called JoesHomePage.html. Within this page are links to other pages, PicturesOfMe.html and MyKids.html, that are on the same machine and in the same directory as JoesHomePage.html. The links to PicturesOfMe.html and MyKids.html from JoesHomePage.html could be specified just as filenames, like this:
<a href="PicturesOfMe.html">Pictures of Me</a>
<a href="MyKids.html">Pictures of My Kids</a>
These URL addresses are relative URLs. That is, the URLs are specified relative to the file in which they are contained, JoesHomePage.html.
In your Java programs, you can create a URL object from a relative URL specification. For example, suppose you know two URLs at the Gamelan site:
http://www.gamelan.com/pages/Gamelan.game.html
http://www.gamelan.com/pages/Gamelan.net.html
You can create URL objects for these pages relative to their common base URL:
http://www.gamelan.com/pages/
like this:
URL gamelan = new URL("http://www.gamelan.com/pages/");
URL gamelanGames = new URL(gamelan, "Gamelan.game.html");
URL gamelanNetwork = new URL(gamelan, "Gamelan.net.html");
This code snippet uses the URL constructor that lets you create a URL object from another URL object (the base) and a relative URL specification. The general form of this constructor is:
URL(URL baseURL, String relativeURL)
The first argument is a URL object that specifies the base of the new URL. The second argument is a String that specifies the rest of the resource name relative to the base. If baseURL is null, then this constructor treats relativeURL like an absolute URL specification. Conversely, if relativeURL is an absolute URL specification, then the constructor ignores baseURL.
This constructor is also useful for creating URL objects for named anchors (also called references) within a file. For example, suppose the Gamelan.net.html file has a named anchor called BOTTOM at the bottom of the file. You can use the relative URL constructor to create a URL object for it like this:
URL gamelanNetworkBottom = new URL(gamelanNetwork, "#BOTTOM");
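The behaviour of the two-argument constructor described above can be checked with a small sketch. The class name RelativeUrlSketch, the helper resolve, and the www.example.com address are illustrative assumptions, not part of the original examples; constructing a URL object does not contact the network, so this runs offline.

```java
import java.net.MalformedURLException;
import java.net.URL;

// The URL constructors are deprecated as of Java 20; they are used here to match the text.
@SuppressWarnings("deprecation")
class RelativeUrlSketch {

    // Hypothetical helper: resolves spec against base (base may be null)
    // and returns the resulting URL address as a String.
    static String resolve(String base, String spec) {
        try {
            URL baseURL = (base == null) ? null : new URL(base);
            return new URL(baseURL, spec).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://www.gamelan.com/pages/";
        // Relative specification: appended to the base directory.
        System.out.println(resolve(base, "Gamelan.game.html"));
        // Absolute specification: the base is ignored.
        System.out.println(resolve(base, "http://www.example.com/other.html"));
        // Null base: the specification is treated as absolute.
        System.out.println(resolve(null, "http://www.gamelan.com/"));
        // Named anchor: same file as the base, plus a reference.
        System.out.println(resolve(base + "Gamelan.net.html", "#BOTTOM"));
    }
}
```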
The URL class provides two additional constructors for creating a URL object. These constructors are useful when you are working with URLs, such as HTTP URLs, that have host name, filename, port number, and reference components in the resource name portion of the URL. These two constructors are useful when you do not have a String containing the complete URL specification, but you do know various components of the URL.
For example, suppose you design a network browsing panel similar to a file browsing panel that allows users to choose the protocol, host name, port number, and filename. You can construct a URL from the panel's components. The first constructor creates a URL object from a protocol, host name, and filename. The following code snippet creates a URL to the Gamelan.net.html file at the Gamelan site:
new URL("http", "www.gamelan.com", "/pages/Gamelan.net.html");
This is equivalent to
new URL("http://www.gamelan.com/pages/Gamelan.net.html");
The first argument is the protocol, the second is the host name, and the last is the pathname of the file. Note that the filename contains a forward slash at the beginning. This indicates that the filename is specified from the root of the host.
The final URL constructor adds the port number to the list of arguments used in the previous constructor:
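// The filename starts with a forward slash so that it is separated from the port in the resulting address.
URL gamelan = new URL("http", "www.gamelan.com", 80,
    "/pages/Gamelan.network.html");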
This creates a URL object for the following URL:
http://www.gamelan.com:80/pages/Gamelan.network.html
If you construct a URL object using one of these constructors, you can get a String containing the complete URL address by using the URL object's toString method or the equivalent toExternalForm method.
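As a rough illustration of the component constructors and toExternalForm, the following sketch assembles the two Gamelan addresses from their parts. The class name ComponentUrlSketch and the helper fromParts are assumptions made for this example; passing -1 as the port means "no explicit port".

```java
import java.net.MalformedURLException;
import java.net.URL;

// The URL constructors are deprecated as of Java 20; they are used here to match the text.
@SuppressWarnings("deprecation")
class ComponentUrlSketch {

    // Hypothetical helper: builds a URL object from its components and
    // returns the complete URL address. A port of -1 leaves the port unset.
    static String fromParts(String protocol, String host, int port, String file) {
        try {
            URL url = new URL(protocol, host, port, file);
            return url.toExternalForm(); // same result as url.toString()
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromParts("http", "www.gamelan.com", -1,
                "/pages/Gamelan.net.html"));
        System.out.println(fromParts("http", "www.gamelan.com", 80,
                "/pages/Gamelan.network.html"));
    }
}
```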
Each of the four URL constructors throws a MalformedURLException if the arguments to the constructor refer to a null or unknown protocol. Typically, you want to catch and handle this exception by embedding your URL constructor statements in a try/catch pair, like this:
try {
    URL myURL = new URL(. . .);
} catch (MalformedURLException e) {
    . . .
    // exception handler code here
    . . .
}
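To see the exception in practice, a minimal sketch follows. The class name UrlCheckSketch, the helper canConstruct, and the made-up protocol name nosuchprotocol are assumptions for illustration only.

```java
import java.net.MalformedURLException;
import java.net.URL;

// The URL constructors are deprecated as of Java 20; they are used here to match the text.
@SuppressWarnings("deprecation")
class UrlCheckSketch {

    // Hypothetical helper: reports whether a URL object can be created
    // from the given specification.
    static boolean canConstruct(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // Thrown for a missing or unknown protocol.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canConstruct("http://www.gamelan.com/"));           // known protocol
        System.out.println(canConstruct("nosuchprotocol://www.gamelan.com/")); // unknown protocol
        System.out.println(canConstruct("www.gamelan.com"));                   // no protocol at all
    }
}
```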
Note: URLs are "write-once" objects. Once you've created a URL object, you cannot change any of its attributes (protocol, host name, filename, or port number).
The URL class provides several methods that let you query URL objects. You can get the protocol, host name, port number, and filename from a URL using these accessor methods:
getProtocol – Returns the protocol identifier component of the URL.
getHost – Returns the host name component of the URL.
getPort – Returns the port number component of the URL. The getPort method returns an integer that is the port number. If the port is not set, getPort returns -1.
getFile – Returns the filename component of the URL.
getRef – Returns the reference component of the URL.
Note: Remember that not all URL addresses contain these components. The URL class provides these methods because HTTP URLs do contain these components and are perhaps the most commonly used URLs. The URL class is somewhat HTTP-centric.
You can use these getXXX methods to get information about the URL regardless of the constructor that you used to create the URL object.
The URL class, along with these accessor methods, frees you from ever having to parse URLs again! Given any string specification of a URL, just create a new URL object and call any of the accessor methods for the information you need. This small example program creates a URL from a string specification and then uses the URL object's accessor methods to parse the URL:
import java.net.*;
import java.io.*;
public class ParseURL {
    public static void main(String[] args) throws Exception {
        URL aURL = new URL("http://java.sun.com:80/docs/books/"
                           + "tutorial/index.html#DOWNLOADING");
        // Print each component returned by the accessor methods.
        System.out.println("protocol = " + aURL.getProtocol());
        System.out.println("host = " + aURL.getHost());
        System.out.println("filename = " + aURL.getFile());
        System.out.println("port = " + aURL.getPort());
        System.out.println("ref = " + aURL.getRef());
    }
}
Here's the output displayed by the program:
protocol = http
host = java.sun.com
filename = /docs/books/tutorial/index.html
port = 80
ref = DOWNLOADING
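The example above names port 80 explicitly. When the URL address omits the port, getPort returns -1, as noted in the list of accessor methods. The sketch below shows one way to fall back to the protocol's default port in that case; the class name PortSketch and both helper names are assumptions for this illustration, and getDefaultPort is a standard method of the URL class.

```java
import java.net.MalformedURLException;
import java.net.URL;

// The URL constructors are deprecated as of Java 20; they are used here to match the text.
@SuppressWarnings("deprecation")
class PortSketch {

    // Hypothetical helper: the port exactly as written in the address, or -1.
    static int explicitPort(String spec) {
        try {
            return new URL(spec).getPort();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical helper: the explicit port if present, otherwise the
    // default port of the protocol (80 for HTTP).
    static int effectivePort(String spec) {
        try {
            URL url = new URL(spec);
            int port = url.getPort();
            return (port != -1) ? port : url.getDefaultPort();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(explicitPort("http://java.sun.com/"));
        System.out.println(effectivePort("http://java.sun.com/"));
        System.out.println(effectivePort("http://java.sun.com:8080/"));
    }
}
```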
After you've successfully created a URL, you can call the URL's openStream() method to get a stream from which you can read the contents of the URL. The openStream() method returns a java.io.InputStream object, so reading from a URL is as easy as reading from an input stream.
The following small Java program uses openStream() to get an input stream on the URL http://www.yahoo.com/. It then opens a BufferedReader on the input stream and reads from the BufferedReader, thereby reading from the URL. Everything read is copied to the standard output stream:
import java.net.*;
import java.io.*;
public class URLReader {
    public static void main(String[] args) throws Exception {
        URL yahoo = new URL("http://www.yahoo.com/");
        BufferedReader in = new BufferedReader(
            new InputStreamReader(yahoo.openStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
When you run the program, you should see, scrolling by in your command window, the HTML commands and textual content from the HTML file located at http://www.yahoo.com/. Alternatively, the program might hang or you might see an exception stack trace. If either of these occurs, you may have to set the proxy host so that the program can find the Yahoo server.
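The same reading loop can be written with a try-with-resources statement so that the stream is closed even if reading fails. The following is a sketch under a few assumptions: the class name UrlTextReaderSketch and both helpers are invented for this example, the content is assumed to be UTF-8 text, and the demonstration reads a temporary local file through a file: URL so that it runs without network access.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class UrlTextReaderSketch {

    // Hypothetical helper: reads all lines from the URL via openStream().
    static String readAll(URL url) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Offline demonstration: writes content to a temporary file and
    // reads it back through a file: URL.
    static String readTempFileDemo(String content) {
        try {
            Path tmp = Files.createTempFile("url-sketch", ".txt");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                return readAll(tmp.toUri().toURL());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(readTempFileDemo("first line\nsecond line\n"));
    }
}
```

Passing an HTTP URL object to readAll instead would read from the network in the same way as URLReader.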
After you've successfully created a URL object, you can call the URL object's openConnection method to connect to it. When you connect to a URL, you are initializing a communication link between your Java program and the URL over the network. For example, you can open a connection to the Yahoo site with the following code:
try {
    URL yahoo = new URL("http://www.yahoo.com/");
    URLConnection yahooConnection = yahoo.openConnection();
} catch (MalformedURLException e) { // new URL() failed
    . . .
} catch (IOException e) { // openConnection() failed
    . . .
}
The openConnection method creates a new URLConnection object for the URL and returns it. In current Java releases the call itself does not open the network link; the connection is established when you call the connect method or a method that depends on being connected, such as getInputStream or getOutputStream. If something goes wrong (for example, the Yahoo server is down), an IOException is thrown.
Now that you've successfully connected to your URL, you can use the URLConnection object to perform actions such as reading from or writing to the connection. The next section shows you how.
If you've successfully used openConnection to initiate communications with a URL, then you have a reference to a URLConnection object. The URLConnection class contains many methods that let you communicate with the URL over the network. URLConnection is an HTTP-centric class; that is, many of its methods are useful only when you are working with HTTP URLs. However, most URL protocols allow you to read from and write to the connection. This section describes both functions.
The following program performs the same function as the URLReader program shown in Reading Directly from a URL. However, rather than getting an input stream directly from the URL, this program explicitly opens a connection to a URL and gets an input stream from the connection. Then, like URLReader, this program creates a BufferedReader on the input stream and reads from it. The differences between this example and the previous one are the call to openConnection and the call to getInputStream on the resulting connection.
import java.net.*;
import java.io.*;
public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL yahoo = new URL("http://www.yahoo.com/");
        URLConnection yc = yahoo.openConnection();
        BufferedReader in = new BufferedReader(
            new InputStreamReader(
                yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
The output from this program is identical to the output from the program that opens a stream directly from the URL. You can use either way to read from a URL. However, reading from a URLConnection instead of reading directly from a URL might be more useful. This is because you can use the URLConnection object for other tasks (like writing to the URL) at the same time.
Again, if the program hangs or you see an error message, you may have to set the proxy host so that the program can find the Yahoo server.
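One possible way to set the proxy host, and to keep a stalled server from hanging the program indefinitely, is sketched below. The system properties http.proxyHost and http.proxyPort are the standard Java networking properties; the class name ConnectionSetupSketch, the helper names, the proxy address proxy.example.com:8080, and the five-second timeout are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

// The URL constructors are deprecated as of Java 20; they are used here to match the text.
@SuppressWarnings("deprecation")
class ConnectionSetupSketch {

    // Hypothetical helper: sets the JVM-wide HTTP proxy properties.
    static void useHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    // Hypothetical helper: creates a connection object with connect and
    // read timeouts. No network traffic happens until the connection is used.
    static URLConnection openWithTimeouts(String spec, int timeoutMillis) {
        try {
            URLConnection connection = new URL(spec).openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        useHttpProxy("proxy.example.com", 8080); // placeholder proxy address
        URLConnection yc = openWithTimeouts("http://www.yahoo.com/", 5000);
        System.out.println("connect timeout = " + yc.getConnectTimeout() + " ms");
    }
}
```

The same properties can also be supplied on the command line, for example with -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080, without changing the program.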
Many HTML pages contain forms – text fields and other GUI objects that let you enter data to send to the server. After you type in the required information and initiate the query by clicking a button, your Web browser writes the data to the URL over the network. At the other end, a cgi-bin script (usually) on the server receives the data, processes it, and then sends you a response, usually in the form of a new HTML page.
Many cgi-bin scripts use the POST METHOD for reading the data from the client. Thus writing to a URL is often called posting to a URL. Server-side scripts use the POST METHOD to read from their standard input.
Note: Some server-side cgi-bin scripts use the GET METHOD to read your data. The POST METHOD is generally preferred for submitting form data because the data travels in the body of the request rather than in the URL, so it is not subject to limits on URL length.
A Java program can also interact with cgi-bin scripts on the server side. It simply must be able to write to a URL, thus providing data to the server. It can do this by following these steps:
1. Create a URL.
2. Open a connection to the URL.
3. Set output capability on the URLConnection.
4. Get an output stream from the connection. This output stream is connected to the standard input stream of the cgi-bin script on the server.
5. Write to the output stream.
6. Close the output stream.
Hassan Schroeder, a member of the Java development team, wrote a small cgi-bin script named backwards and made it available at the Java Web site, http://java.sun.com/cgi-bin/backwards. You can use this script to test the following example program. You can also put the script on your network, name it backwards, and test the program locally.
The script at our Web site reads a string from its standard input, reverses the string, and writes the result to its standard output. The script requires input of the form string=string_to_reverse, where string_to_reverse is the string whose characters you want displayed in reverse order.
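The source of the backwards script is not shown here. Purely as an illustration of the behaviour just described, the following Java sketch takes input of the form string=string_to_reverse, decodes it, and reverses it. The class name BackwardsSketch and the helper reverseFormValue are hypothetical, and the real script may differ in its details and output format.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class BackwardsSketch {

    // Hypothetical stand-in for the script's core logic: expects
    // "string=<encoded text>" and returns the decoded text reversed.
    static String reverseFormValue(String formBody) {
        String prefix = "string=";
        String trimmed = formBody.trim(); // drop the trailing line break, if any
        if (!trimmed.startsWith(prefix)) {
            throw new IllegalArgumentException(
                    "expected input of the form string=string_to_reverse");
        }
        try {
            String decoded = URLDecoder.decode(
                    trimmed.substring(prefix.length()), "UTF-8");
            return new StringBuilder(decoded).reverse().toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reverseFormValue("string=Reverse+Me"));
    }
}
```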
Here's an example program that runs the backwards script over the network through a URLConnection:
import java.io.*;
import java.net.*;
public class Reverse {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java Reverse "
                               + "string_to_reverse");
            System.exit(1);
        }
        // The single-argument encode method is deprecated; name the encoding explicitly.
        String stringToReverse = URLEncoder.encode(args[0], "UTF-8");
        URL url = new URL("http://java.sun.com/cgi-bin/backwards");
        URLConnection connection = url.openConnection();
        connection.setDoOutput(true);
        // Write the form data to the connection.
        PrintWriter out = new PrintWriter(
            connection.getOutputStream());
        out.println("string=" + stringToReverse);
        out.close();
        // Read the script's response.
        BufferedReader in = new BufferedReader(
            new InputStreamReader(
                connection.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
Let's examine the program and see how it works. First, the program processes its command-line arguments:
if (args.length != 1) {
    System.err.println("Usage: java Reverse "
                       + "string_to_reverse");
    System.exit(1);
}
String stringToReverse = URLEncoder.encode(args[0], "UTF-8");
These statements ensure that the user provides one and only one command-line argument to the program, and then encodes it. The command-line argument is the string that will be reversed by the cgi-bin script backwards. It may contain spaces or other non-alphanumeric characters. These characters must be encoded because the string is processed on its way to the server. The URLEncoder class methods encode the characters.
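To make the effect of the encoding concrete, here is a small sketch of what URLEncoder does to spaces and other reserved characters. The class name FormEncodingSketch and the helpers encode and formField are assumptions for this illustration; UTF-8 is chosen as the encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormEncodingSketch {

    // Hypothetical helper: encodes one value for use in form data.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: builds one name=value pair of form data.
    static String formField(String name, String value) {
        return encode(name) + "=" + encode(value);
    }

    public static void main(String[] args) {
        System.out.println(encode("Reverse Me"));              // space becomes +
        System.out.println(encode("a&b=c"));                   // & and = are percent-encoded
        System.out.println(formField("string", "Reverse Me")); // input for the backwards script
    }
}
```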
Next, the program creates the URL object (the URL for the backwards script on java.sun.com), opens a URLConnection, and sets the connection so that it can write to it:
URL url = new URL("http://java.sun.com/cgi-bin/backwards");
URLConnection connection = url.openConnection();
connection.setDoOutput(true);
The program then creates an output stream on the connection and opens a PrintWriter on it:
PrintWriter out = new PrintWriter(connection.getOutputStream());
If the URL does not support output, the getOutputStream method throws an UnknownServiceException. If the URL does support output, then this method returns an output stream that is connected to the standard input stream of the URL on the server side: the client's output is the server's input.
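A quick way to observe the UnknownServiceException without a server is to ask for an output stream on a file: URL, whose standard handler does not support output. The sketch below does that with a temporary file; the class name OutputSupportSketch and both helper names are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.UnknownServiceException;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputSupportSketch {

    // Hypothetical helper: returns true if an output stream can be obtained
    // from the URL's connection. For a network URL this call would connect.
    static boolean supportsOutput(URL url) {
        try {
            URLConnection connection = url.openConnection();
            connection.setDoOutput(true);
            connection.getOutputStream().close();
            return true;
        } catch (UnknownServiceException e) {
            // The protocol handler does not support output.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Offline demonstration using a temporary file and its file: URL.
    static boolean tempFileSupportsOutput() {
        try {
            Path tmp = Files.createTempFile("url-output-sketch", ".txt");
            try {
                return supportsOutput(tmp.toUri().toURL());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file: URL supports output? " + tempFileSupportsOutput());
    }
}
```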
Next, the program writes the required information to the output stream and closes the stream:
out.println("string=" + stringToReverse);
out.close();
This code writes to the output stream using the println method. So you can see that writing data to a URL is as easy as writing data to a stream. The data written to the output stream on the client side is the input for the backwards script on the server side. The Reverse program constructs the input in the form required by the script by concatenating string= to the encoded string to be reversed.
Often, when you are writing to a URL, you are passing information to a cgi-bin script, as in this example. This script reads the information you write, performs some action, and then sends information back to you via the same URL. So it's likely that you will want to read from the URL after you've written to it. The Reverse program does this:
BufferedReader in = new BufferedReader(
    new InputStreamReader(connection.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
    System.out.println(inputLine);
in.close();
When you run the Reverse program using "Reverse Me" as an argument (including the double quote marks), you should see this output:
Reverse Me
reversed is:
eM esreveR