Computers running on the Internet communicate with each other
using either the Transmission Control Protocol (TCP) or the User Datagram
Protocol (UDP), as this diagram illustrates:
When you write Java programs that communicate over the
network, you are programming at the application layer. Typically, you don't
need to concern yourself with the TCP and UDP layers. Instead, you can use the
classes in the java.net package. These classes provide system-independent
network communication.
However, to decide which Java classes your programs should use, you do need to
understand how TCP and UDP differ.
When two applications want to communicate with each other
reliably, they establish a connection and send data back and forth over that
connection. This is analogous to making a telephone call. If you want to speak
to Aunt Beatrice in Kentucky, a connection is established when you dial her
phone number and she answers. You send data back and forth over the connection
by speaking to one another over the phone lines. Like the phone company, TCP
guarantees that data sent from one end of the connection actually gets to the
other end and in the same order it was sent. Otherwise, an error is reported.
TCP provides a point-to-point channel for
applications that require reliable communications. The Hypertext Transfer
Protocol (HTTP), File Transfer Protocol (FTP), and Telnet are all examples of
applications that require a reliable communication channel. The order in which
the data is sent and received over the network is critical to the success of
these applications. When HTTP is used to read from a URL, the data must be
received in the order in which it was sent. Otherwise, you end up with a
jumbled HTML file, a corrupt zip file, or some other invalid information.
Definition: TCP (Transmission Control Protocol) is a connection-based
protocol that provides a reliable flow of data between two computers.
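As a minimal sketch of this connection-based model, the following Java code opens a TCP connection on the loopback interface, sends one line, and reads the echoed line back. The class name TcpEchoSketch and the method echoOnce are hypothetical names chosen for illustration, and the throwaway echo server exists only so that the example is self-contained. It assumes the message contains no line breaks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of java.net.
final class TcpEchoSketch {

    // Sends one line to a throwaway echo server on the loopback interface
    // and returns the line that comes back over the same connection.
    static String echoOnce(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> {
                // Server side: accept one connection and echo one line.
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             peer.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             peer.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println(in.readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            // Client side: establish the connection, then exchange data over it.
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(2000); // do not wait forever for the reply
                PrintWriter out = new PrintWriter(new OutputStreamWriter(
                        client.getOutputStream(), StandardCharsets.UTF_8), true);
                BufferedReader in = new BufferedReader(new InputStreamReader(
                        client.getInputStream(), StandardCharsets.UTF_8));
                out.println(message);
                return in.readLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(echoOnce("hello over TCP"));
    }
}
```

The client never handles packets, ordering, or retransmission. It reads and writes a stream, and TCP does the rest or reports an error as an IOException.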
UDP provides communication between two applications on the network
without guaranteeing delivery. Unlike TCP, UDP is not connection-based.
Rather, it sends independent packets of data, called datagrams,
from one application to another. Sending datagrams is much like sending a
letter through the postal service: The order of delivery is not important and
is not guaranteed, and each message is independent of any other.
Definition: UDP (User
Datagram Protocol) is a protocol that sends independent packets of data,
called datagrams, from one computer to another with no guarantees about arrival.
UDP is not connection-based like TCP.
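A comparable sketch for UDP sends a single datagram over the loopback interface and receives it. The names UdpDatagramSketch and sendAndReceive are hypothetical, and the two-second timeout is an arbitrary choice for the example. Delivery usually succeeds on loopback, but UDP itself makes no such promise, which is why the receiver sets a timeout instead of waiting indefinitely.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of java.net.
final class UdpDatagramSketch {

    // Sends one datagram to a receiver socket on the loopback interface
    // and returns the text that arrives.
    static String sendAndReceive(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port.
        try (DatagramSocket receiver = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket sender = new DatagramSocket()) {
            // receive() throws SocketTimeoutException if nothing arrives in time.
            receiver.setSoTimeout(2000);

            // No connection is set up: the datagram itself carries the
            // destination address and port.
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length,
                    loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
            receiver.receive(incoming);
            return new String(incoming.getData(), incoming.getOffset(),
                    incoming.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendAndReceive("hello over UDP"));
    }
}
```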
For many applications, the guarantee of
reliability is critical to the success of the transfer of information from one
end of the connection to the other. However, other forms of communication don't
require such strict standards. In fact, they may be slowed down by the extra
overhead, or the reliable connection may invalidate the service altogether.
Consider, for example, a clock server that sends
the current time to its client when requested to do so. If the client misses a
packet, it doesn't really make sense to resend it because the time will be
incorrect when the client receives it on the second try. If the client makes
two requests and receives packets from the server out of order, it doesn't
really matter because the client can figure out that the packets are out of
order and make another request. The reliability of TCP is unnecessary in this
instance: its overhead degrades performance and may hinder the
usefulness of the service.
Another example of a service that doesn't need
the guarantee of a reliable channel is the ping command. The purpose of the
ping command is to test the communication between two computers over the
network. In fact, ping needs to know about dropped or out-of-order packets to
determine how good or bad the connection is. A reliable channel would
invalidate this service altogether.
Note: Many firewalls and routers
have been configured not to allow UDP packets. If you're having trouble
connecting to a service outside your firewall, or if clients are having trouble
connecting to your service, ask your system administrator if UDP is permitted.
Generally speaking, a computer has a single physical
connection to the network. All data destined for a particular computer arrives
through that connection. However, the data may be intended for different
applications running on the computer. So how does the computer know to which
application to forward the data? Through the use of ports.
Data transmitted over the Internet is accompanied
by addressing information that identifies the computer and the port for which
it is destined. The computer is identified by its 32-bit (IPv4) IP address, which IP
uses to deliver data to the right computer on the network. Ports are identified
by a 16-bit number, which TCP and UDP use to deliver the data to the right
application.
In connection-based communication such as TCP, a
server application binds a socket to a specific port number. This has the
effect of registering the server with the system to receive all data destined
for that port. A client can then rendezvous with the server at the server's port,
as illustrated here:
Definition: The TCP and UDP
protocols use ports to map incoming data to a particular process running on a
computer.
In datagram-based communication such as UDP, the
datagram packet contains the port number of its destination and UDP routes the
packet to the appropriate application, as illustrated in this figure:
Port numbers range from 0 to 65,535 because ports are
represented by 16-bit numbers. The port numbers from 0 to 1023 are
restricted; they are reserved for use by well-known services such as HTTP and
FTP and by other system services. These ports are called well-known ports.
Your applications should not attempt to bind to them.
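The two ranges described above can be captured in a few lines of Java. This is only a sketch: the class PortRangeSketch and its method names are hypothetical and do not belong to java.net.

```java
// Hypothetical illustration class; not part of java.net.
final class PortRangeSketch {

    // A 16-bit unsigned number can hold the values 0 through 65,535.
    static final int MAX_PORT = 0xFFFF;

    // Ports 0 through 1023 are the well-known ports.
    static final int MAX_WELL_KNOWN_PORT = 1023;

    static boolean isValidPort(int port) {
        return port >= 0 && port <= MAX_PORT;
    }

    static boolean isWellKnownPort(int port) {
        return port >= 0 && port <= MAX_WELL_KNOWN_PORT;
    }

    public static void main(String[] args) {
        System.out.println(isWellKnownPort(80));   // HTTP's port is in the reserved range
        System.out.println(isWellKnownPort(8080)); // outside the reserved range
        System.out.println(isValidPort(65536));    // does not fit in 16 bits
    }
}
```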
Through the classes in java.net, Java programs can use TCP or
UDP to communicate over the Internet. The URL, URLConnection, Socket,
and ServerSocket classes all use TCP to communicate over the network.
The DatagramPacket, DatagramSocket, and MulticastSocket classes are for
use with UDP.
The Internet Protocol Suite
The java.net package provides a set of classes that
support network programming using the communication protocols employed by the
Internet. These protocols are known as the Internet protocol suite and
include the Internet Protocol (IP), the Transmission Control Protocol
(TCP), and the User Datagram Protocol (UDP) as well as other,
less-prominent supporting protocols. Although this section cannot provide a
full description of the Internet protocols, it gives you the basic information
that you need to get started with Java network programming. In order to take
full advantage of this chapter, you need an Internet connection.
Asking the question "What is the Internet?" may
bring about a heated discussion in some circles. In this book, the Internet
is defined as the collection of all computers that are able to communicate,
using the Internet protocol suite, with the computers and networks registered
with the Internet Network Information Center (InterNIC). This definition
includes all computers to which you can directly (or indirectly through a
firewall) send Internet Protocol packets.
Computers on the Internet communicate by
exchanging packets of data, known as Internet Protocol, or IP, packets. IP is
the network protocol used to send information from one computer to another over
the Internet. All computers on the Internet (by our definition in this book)
communicate using IP. IP moves information contained in IP packets. The IP
packets are routed via special routing algorithms from a source computer that
sends the packets to a destination computer that receives them. The routing
algorithms figure out the best way to send the packets from source to
destination.
In order for IP to send packets from a source
computer to a destination computer, it must have some way of identifying these
computers. All computers on the Internet are identified using one or more IP
addresses. A computer may have more than one IP address if it has more than one
interface to computers that are connected to the Internet.
IP addresses are 32-bit numbers. They may be
written in decimal, hexadecimal, or other formats, but the most common format
is dotted decimal notation. This format breaks the 32-bit address up into four
bytes and writes each byte of the address as an unsigned decimal integer,
with the four integers separated by dots. For example, one of my IP addresses
is 0xCCD499C1. Because 0xCC = 204, 0xD4 = 212, 0x99 = 153, and 0xC1 = 193,
my address in dotted decimal form is 204.212.153.193.
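The conversion can be worked through in code. The following sketch takes the 32-bit value from the example, extracts each byte with a shift and a mask, and joins the four bytes with dots. The class and method names are hypothetical.

```java
// Hypothetical illustration class; not part of java.net.
final class DottedDecimalSketch {

    // Splits a 32-bit address into four bytes, most significant first,
    // and writes each byte as an unsigned decimal integer.
    static String toDottedDecimal(int address) {
        return ((address >>> 24) & 0xFF) + "."
                + ((address >>> 16) & 0xFF) + "."
                + ((address >>> 8) & 0xFF) + "."
                + (address & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(toDottedDecimal(0xCCD499C1)); // prints 204.212.153.193
    }
}
```

The mask with 0xFF matters because a Java int is signed. Without it, a byte value above 127 in the top position would print as a negative number.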
IP addresses are not easy to remember, even using
dotted decimal notation. The Internet has adopted a mechanism, referred to as
the Domain Name System (DNS), whereby computer names can be associated
with IP addresses. These computer names are referred to as domain names.
The DNS has several rules that determine how domain names are constructed and
how they relate to one another. For the purposes of this chapter, it is
sufficient to know that domain names are computer names and that they are
mapped to IP addresses.
The mapping of domain names to IP addresses is
maintained by a system of domain name servers. These servers are able to
look up the IP address corresponding to a domain name. They also provide the
capability to look up the domain name associated with a particular IP address,
if one exists.
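In Java, both kinds of lookup are available through the InetAddress class. The sketch below is one way to use it. The class NameLookupSketch and its methods are hypothetical names, and the results for any real host name depend on the resolver configuration of the machine running the code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of java.net.
final class NameLookupSketch {

    // Forward lookup: returns every IP address known for a host, as text.
    // When given a literal IP address, InetAddress only checks its format.
    static List<String> addressesOf(String host) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            result.add(address.getHostAddress());
        }
        return result;
    }

    // Reverse lookup: asks for the name associated with an IP address.
    // If no name can be found, the textual address is returned instead.
    static String nameOf(String ipAddress) throws UnknownHostException {
        return InetAddress.getByName(ipAddress).getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(addressesOf("localhost"));
        System.out.println(nameOf("127.0.0.1"));
    }
}
```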
As I mentioned, IP enables communication between
computers on the Internet by routing data from a source computer to a
destination computer. However, computer-to-computer communication only solves
half of the network communication problem. In order for an application program,
such as a mail program, to communicate with another application, such as a mail
server, there needs to be a way to send data to specific programs within a
computer.
Ports are used to enable communication between
programs. A port is an address within a computer. Port addresses are
16-bit addresses that are usually associated with a particular application
protocol. An application server, such as a Web server or an FTP server, listens
on a particular port for service requests, performs whatever service is
requested of it, and returns information to the port used by the application
program requesting the service.
Popular Internet application protocols are
associated with well-known ports. The server programs implementing these
protocols listen on these ports for service requests. The well-known ports for
some common Internet application protocols are:
Port | Protocol
21 | File Transfer Protocol
23 | Telnet Protocol
25 | Simple Mail Transfer Protocol
80 | Hypertext Transfer Protocol
The well-known ports are used to standardize the
location of Internet services.
Transport protocols are used to deliver
information from one port to another and thereby enable communication between
application programs. They use either a connection-oriented or connectionless
method of communication. TCP is a connection-oriented protocol and UDP is a
connectionless transport protocol.
The TCP connection-oriented protocol establishes
a communication link between a source port/IP address and a destination port/IP
address. The ports are bound together via this link until the connection is
terminated and the link is broken. An example of a connection-oriented protocol
is a telephone conversation. A telephone connection is established,
communication takes place, and then the connection is terminated.
The reliability of the communication between the
source and destination programs is ensured through error-detection and
error-correction mechanisms that are implemented within TCP. TCP implements the
connection as a stream of bytes from source to destination. This feature allows
the use of the stream I/O classes provided by java.io.
UDP, a connectionless protocol, differs from the connection-oriented
TCP in that it does not establish a link between the two endpoints before
data is sent. A familiar analogy for a connectionless protocol is postal
mail. To mail something, you just write down a destination address (and an
optional return address) on the envelope of the item you're sending and drop it
in a mailbox. When using UDP, an application program writes the destination
port and IP address on a datagram and then sends the datagram to its
destination. UDP is less reliable than TCP because the protocol has no
delivery-assurance or retransmission mechanisms: a datagram that is lost
or damaged in transit is simply never delivered.
Application protocols such as FTP, SMTP, and HTTP
use TCP to provide reliable, stream-based communication between client and
server programs. Other protocols, such as the Time Protocol, use UDP because
speed of delivery is more important than end-to-end reliability.
The Internet provides a variety of services that
contribute to its appeal. These services include e-mail, newsgroups, file
transfer, remote login, and the Web. Internet services are organized according
to a client/server architecture. Client programs, such as Web browsers and file
transfer programs, create connections to servers, such as Web and FTP servers.
The clients make requests of the server, and the server responds to the
requests by providing the service requested by the client.
The Web provides a good example of client/server
computing. Web browsers are the clients and Web servers are the servers. Browsers
request HTML files from Web servers on your behalf by establishing a connection
with a Web server and submitting file requests to the server. The server
receives the file requests, retrieves the files, and sends them to the browser
over the established connection. The browser receives the files and displays
them in your browser window.
Clients and servers establish connections and
communicate via sockets. Connections are communication links that are
created over the Internet using TCP. Some client/server applications are also
built around the connectionless UDP. These applications also use sockets to
communicate.
Sockets are the endpoints of Internet
communication. Clients create client sockets and connect them to server
sockets. Sockets are associated with a host address and a port address. The
host address is the IP address of the host where the client or server program
is located. The port address is the communication port used by the client or
server program. Server programs use the well-known port number associated with
their application protocol.
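The pairing of host address and port can be observed directly on a connected socket. The following sketch connects a client socket to a server socket on the loopback interface and reports the port numbers seen from each side. The class and method names are hypothetical, and the server here binds to whatever free port the operating system offers instead of a well-known port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration class; not part of java.net.
final class SocketEndpointSketch {

    // Returns { server port, port the client connected to,
    //           client's own port, client port as seen by the server }.
    static int[] connectOnLoopback() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            return new int[] {
                server.getLocalPort(),
                client.getPort(),
                client.getLocalPort(),
                accepted.getPort()
            };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ports = connectOnLoopback();
        System.out.println("server listens on port " + ports[0]);
        System.out.println("client connected to port " + ports[1]);
        System.out.println("client's own port is " + ports[2]);
        System.out.println("server sees client port " + ports[3]);
    }
}
```

The first two values match, as do the last two. Each end of the connection is identified by its own address and port, and each side sees the other's pair.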
A client communicates with a server by
establishing a connection to the socket of the server. The client and server
then exchange data over the connection. Connection-oriented communication is
more reliable than connectionless communication because the underlying TCP
provides message-acknowledgment, error-detection, and error-recovery services.
When a connectionless protocol is used, the
client and server communicate by sending datagrams to each other's sockets.
UDP is the transport used for connectionless communication. Unlike TCP, it
does not provide reliable delivery.
The java.net package provides several classes that
support socket-based client/server communication.
The InetAddress class encapsulates Internet IP addresses
and supports conversion between dotted decimal addresses and hostnames.
The Socket, ServerSocket, DatagramSocket and
MulticastSocket classes implement client and server sockets for connection-oriented
and connectionless communication. The SocketImpl class and the SocketImplFactory
interface provide hooks for implementing custom sockets.
The URL, URLConnection, and URLEncoder classes implement high-level
browser-server Web connections.