Ethernet: Distributed Packet Switching for Local Computer Networks
Robert M. Metcalfe and David R. Boggs
Xerox Palo Alto Research Center
Reprinted from Communications of the ACM, Vol. 19, No. 7, July 1976, pp. 395-404.
Copyright © 1976, Association for Computing Machinery, Inc.
This is a digitized copy derived from an ACM copyrighted work.
It is not guaranteed to be an accurate copy of the author's original
work.
Table of Contents
Abstract
1. Background
2. System Summary
3. Design Principles
4. Implementation
5. Growth
6. Performance
7. Protocol
8. Conclusion
References
Abstract
Ethernet is a branching broadcast communication system for carrying
digital data packets among locally distributed computing stations. The packet
transport mechanism provided by Ethernet has been used to build systems
which can be viewed as either local computer networks or loosely coupled
multiprocessors. An Ethernet's shared communication facility, its Ether,
is a passive broadcast medium with no central control. Coordination of access
to the Ether for packet broadcasts is distributed among the contending transmitting
stations using controlled statistical arbitration. Switching of packets
to their destinations on the Ether is distributed among the receiving stations
using packet address recognition. Design principles and implementation are
described, based on experience with an operating Ethernet of 100 nodes along
a kilometer of coaxial cable. A model for estimating performance under heavy
loads and a packet protocol for error-controlled communication are included
for completeness.
Key Words and Phrases: computer networks, packet switching, multiprocessing,
distributed control, distributed computing, broadcast communication, statistical
arbitration
CR Categories: 3.81, 4.32, 6.35
1. Background
One can characterize distributed computing as a spectrum of activities varying
in their degree of decentralization, with one extreme being remote computer
networking and the other extreme being multiprocessing. Remote computer
networking is the loose interconnection of previously isolated, widely separated,
and rather large computing systems. Multiprocessing is the construction
of previously monolithic and serial computing systems from increasingly
numerous and smaller pieces computing in parallel. Near the middle of this
spectrum is local networking, the interconnection of computers to gain the
resource sharing of computer networking and the parallelism of multiprocessing.
The separation between computers and the associated bit rate of their communication
can be used to divide the distributed computing spectrum into broad activities.
The product of separation and bit rate, now about 1 gigabit-meter per second
(1 Gbmps), is an indication of the limit of current communication technology
and can be expected to increase with time:

Activity        | Separation | Bit rate
Remote networks | > 10 km    | < .1 Mbps
Local networks  | 10-.1 km   | .1-10 Mbps
Multiprocessors | < .1 km    | > 10 Mbps
1.1 Remote Computer Networking
Computer networking evolved from telecommunications terminal-computer communication,
where the object was to connect remote terminals to a central computing
facility. As the need for computer-computer interconnection grew, computers
themselves were used to provide communication [2, 4, 29]. Communication
using computers as packet switches [15-21, 26] and communications among
computers for resource sharing [10, 32] were both advanced by the development
of the Arpa Computer Network.
The Aloha Network at the University of Hawaii was originally developed to
apply packet radio techniques for communication between a central computer
and its terminals scattered among the Hawaiian Islands [1, 2]. Many of the
terminals are now minicomputers communicating among themselves using the
Aloha Network's Menehune as a packet switch. The Menehune and an Arpanet
Imp are now connected, providing terminals on the Aloha Network access to
computing resources on the U.S. mainland.
Just as computer networks have grown across continents and oceans to interconnect
major computing facilities around the world, they are now growing down corridors
and between buildings to interconnect minicomputers in offices and laboratories
[3, 12, 13, 14, 35].
1.2 Multiprocessing
Multiprocessing first took the form of connecting an I/O controller to a
large central computer; IBM's Asp is a classic example [29]. Next, multiple
central processors were connected to a common memory to provide more power
for compute-bound applications [33]. For certain of these applications, more
exotic multiprocessor architectures such as Illiac IV were introduced [5].
More recently minicomputers have been connected in multiprocessor configurations
for economy, reliability, and increased system modularity [24, 36]. The
trend has been toward decentralization for reliability; loosely coupled
multiprocessor systems depend less on shared central memory and more on
thin wires for interprocess communication with increased component
isolation [18, 26].
With the continued thinning of interprocessor communication
for reliability and the development of distributable applications, multiprocessing
is gradually approaching a local form of distributed computing.
1.3 Local Computer Networking
Ethernet shares many objectives with other local networks such as Mitre's
Mitrix, Bell Telephone Laboratory's Spider, and U.C. Irvine's Distributed
Computing System (DCS) [12, 13, 14, 35].
Prototypes of all four local networking
schemes operate at bit rates between one and three megabits per second.
Mitrix and Spider have a central minicomputer for switching and bandwidth
allocation, while DCS and Ethernet use distributed control. Spider and DCS
use a ring communication path, Mitrix uses off-the-shelf CATV technology
to implement two one-way busses, and our experimental Ethernet uses a branching
two-way passive bus. Differences among these systems are due to differences
among their intended applications, differences among the cost constraints
under which trade-offs were made, and differences of opinion among researchers.
2. System Summary
Before going into a detailed description of Ethernet, we offer the following
overview (see Figure 1).
Figure 1. A two-segment Ethernet.
Ethernet is a system for local communication among computing stations. Our
experimental Ethernet uses tapped coaxial cables to carry variable length
digital data packets among, for example, personal minicomputers, printing
facilities, large file storage devices, magnetic tape backup stations, larger
central computers, and longer-haul communication equipment.
The shared communication facility, a branching Ether, is passive. A station's
Ethernet interface connects bit-serially through an interface cable to a
transceiver which in turn taps into the passing Ether. A packet is broadcast
onto the Ether, is heard by all stations, and is copied from the Ether by
destinations which select it according to the packet's leading address bits.
This is broadcast packet switching and should be distinguished from store-and-forward
packet switching, in which routing is performed by intermediate processing
elements. To handle the demands of growth, an Ethernet can be
extended using packet repeaters for signal regeneration, packet filters
for traffic localization, and packet gateways for internetwork address extension.
Control is completely distributed among stations, with packet transmissions
coordinated through statistical arbitration. Transmissions initiated by
a station defer to any which may already be in progress. Once started, if
interference with other packets is detected, a transmission is aborted and
rescheduled by its source station. After a certain period of interference-free
transmission, a packet is heard by all stations and will run to completion
without interference. Ethernet controllers in colliding stations each generate
random retransmission intervals to avoid repeated collisions. The mean of
a packet's retransmission intervals is adjusted as a function of collision
history to keep Ether utilization near the optimum with changing network
load.
Even when transmitted without source-detected interference, a packet may
still not reach its destination without error; thus, packets are delivered
only with high probability. Stations requiring a residual error
rate lower than that provided by the bare Ethernet packet transport mechanism
must follow mutually agreed upon packet protocols.
3. Design Principles
Our object is to design a communication system which can grow smoothly to
accommodate several buildings full of personal computers and the facilities
needed for their support.
Like the computing stations to be connected, the communication system must
be inexpensive. We choose to distribute control of the communications facility
among the communicating computers to eliminate the reliability problems
of an active central controller, to avoid creating a bottleneck in a system
rich in parallelism, and to reduce the fixed costs which make small systems
uneconomical.
Ethernet design started with the basic idea of packet collision and retransmission
developed in the Aloha Network [1].
We expected that, like the Aloha Network,
Ethernets would carry bursty traffic so that conventional synchronous time-division
multiplexing (STDM) would be inefficient
[1, 2, 21, 26]. We saw promise
in the Aloha approach to distributed control of radio channel multiplexing
and hoped that it could be applied effectively with media suited to local
computer communication. With several innovations of our own, the promise
is realized.
Ethernet is named for the historical luminiferous ether through
which electromagnetic radiations were once alleged to propagate. Like an
Aloha radio transmitter, an Ethernet transmitter broadcasts completely-addressed
transmitter-synchronous bit sequences called packets onto the Ether and
hopes that they are heard by the intended receivers. The Ether is a logically
passive medium for the propagation of digital signals and can be constructed
using any number of media including coaxial cables, twisted pairs, and optical
fibers.
3.1 Topology
We cannot afford the redundant connections and dynamic routing of store-and-forward
packet switching to assure reliable communication, so we choose to achieve
reliability through simplicity. We choose to make the shared communication
facility passive so that the failure of an active element will tend to affect
the communications of only a single station. The layout and changing needs
of office and laboratory buildings lead us to pick a network topology with
the potential for convenient incremental extension and reconfiguration with
minimal service disruption.
The topology of the Ethernet is that of an unrooted tree. It is a tree
so that the Ether can branch at the entrance to a building's corridor, yet
avoid multipath interference. There must be only one path through the Ether
between any source and destination; if more than one path were to exist,
a transmission would interfere with itself, repeatedly arriving at its intended
destination having traveled by paths of different length. The Ether is unrooted
because it can be extended from any of its points in any direction: Any
station wishing to join an Ethernet taps into the Ether at the nearest convenient
point.
Looking at the relationship of interconnection and control, we see that
Ethernet is the dual of a star network. Rather than distributed interconnection
through many separate links and central control in a switching node, as
in a star network, the Ethernet has central interconnection through the
Ether and distributed control among its stations.
Unlike an Aloha Network, which is a star network with an outgoing broadcast
channel and an incoming multi-access channel, an Ethernet supports many-to-many
communication with a single broadcast multiaccess channel.
3.2 Control
Sharing of the Ether is controlled in such a way that it is not only possible
but probable that two or more stations will attempt to transmit a packet
at roughly the same time. Packets which overlap in time on the Ether are
said to collide; they interfere so as to be unrecognizable by a receiver.
A station recovers from a detected collision by abandoning the attempt and
retransmitting the packet after some dynamically chosen random time period.
Arbitration of conflicting transmission demands is both distributed and
statistical.
When the Ether is largely unused, a station transmits its packets at will,
the packets are received without error, and all is well. As more stations
begin to transmit, the rate of packet interference increases. Ethernet controllers
in each station are built to adjust the mean retransmission interval in
proportion to the frequency of collisions; sharing of the Ether among competing
station-station transmissions is thereby kept near the optimum [20, 21].
A degree of cooperation among the stations is required to share the Ether
equitably. In demanding applications certain stations might usefully take
transmission priority through some systematic violation of equity rules.
A station could usurp the Ether by not adjusting its retransmission interval
with increasing traffic or by sending very large packets. Both practices
are now prohibited by low-level software in each station.
3.3 Addressing
Each packet has a source and destination, both of which are identified in
the packet's header. A packet placed on the Ether eventually propagates
to all stations. Any station can copy a packet from the Ether into its local
memory, but normally only an active destination station matching its address
in the packet's header will do so as the packet passes. By convention, a
zero destination address is a wildcard and matches all addresses; a packet
with a destination of zero is called a broadcast packet.
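To make the matching rule concrete, here is a minimal sketch in Python; the function name, and the assumption that the destination address is the packet's first header byte (Figure 2), are illustrative rather than the interface hardware's actual design:

    BROADCAST = 0  # by convention, destination address zero matches every station

    def should_copy(packet: bytes, my_address: int) -> bool:
        # A station copies a passing packet when the leading destination
        # byte names it, or when the packet is a broadcast.
        destination = packet[0]
        return destination == my_address or destination == BROADCAST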
3.4 Reliability
An Ethernet is probabilistic. Packets may be lost due to interference with
other packets, impulse noise on the Ether, an inactive receiver at a packet's
intended destination, or purposeful discard. Protocols used to communicate
through an Ethernet must assume that packets will be received correctly
at intended destinations only with high probability.
An Ethernet gives its best efforts to transmit packets successfully, but
it is the responsibility of processes in the source and destination stations
to take the precautions necessary to assure reliable communication of the
quality they themselves desire [18, 21]. Recognizing the costliness and
dangers of promising "error-free" communication, we refrain from
guaranteeing reliable delivery of any single packet to get both economy
of transmission and high reliability averaged over many packets
[21]. Removing
the responsibility for reliable communication from the packet transport
mechanism allows us to tailor reliability to the application and to place
error recovery where it will do the most good. This policy becomes more
important as Ethernets are interconnected in a hierarchy of networks through
which packets must travel farther and suffer greater risks.
3.5 Mechanisms
A station connects to the Ether with a tap and a transceiver. A tap is a
device for physically connecting to the Ether while disturbing its transmission
characteristics as little as possible. The design of the transceiver must
be an exercise in paranoia. Precautions must be taken to insure that likely
failures in the transceiver or station do not result in pollution of the
Ether. In particular, removing power from the transceiver should cause it
to disconnect from the Ether.
Five mechanisms are provided in our experimental Ethernet for reducing the
probability and cost of losing a packet. These are (1) carrier detection,
(2) interference detection, (3) packet error detection, (4) truncated packet
filtering, and (5) collision consensus enforcement.
3.5.1 Carrier detection.
As a packet's bits are placed on the Ether by a station, they are phase
encoded (like bits on a magnetic tape), which guarantees that there is at
least one transition on the Ether during each bit time. The passing of a
packet on the Ether can therefore be detected by listening for its transitions.
To use a radio analogy, we speak of the presence of carrier as a packet
passes a transceiver. Because a station can sense the carrier of a passing
packet, it can delay sending one of its own until the detected packet passes
safely. The Aloha Network does not have carrier detection and consequently
suffers a substantially higher collision rate. Without carrier detection,
efficient use of the Ether would decrease with increasing packet length.
In Section 6 below, we show that with carrier detection, Ether efficiency
increases with increasing packet length.
With carrier detection we are able to implement deference: no station
will start transmitting while hearing carrier. With deference comes acquisition:
once a packet transmission has been in progress for an Ether end-to-end
propagation time, all stations are hearing carrier and are deferring; the
Ether has been acquired and the transmission will complete without an interfering
collision.
With carrier detection, collisions should occur only when two or more stations
find the Ether silent and begin transmitting simultaneously: within an Ether
end-to-end propagation time. This will almost always happen immediately
after a packet transmission during which two or more stations were deferring.
Because stations do not now randomize after deferring, when the transmission
terminates, the waiting stations pile on together, collide, randomize, and
retransmit.
3.5.2 Interference detection.
Each transceiver has an interference detector. Interference is indicated
when the transceiver notices a difference between the value of the bit it
is receiving from the Ether and the value of the bit it is attempting to
transmit.
Interference detection has three advantages. First, a station detecting
a collision knows that its packet has been damaged. The packet can be scheduled
for retransmission immediately, avoiding a long acknowledgment timeout.
Second, interference periods on the Ether are limited to a maximum of one
round trip time. Colliding packets in the Aloha Network run to completion,
but the truncated packets resulting from Ethernet collisions waste only
a small fraction of a packet time on the Ether. Third, the frequency of detected
interference is used to estimate Ether traffic for adjusting retransmission
intervals and optimizing channel efficiency.
3.5.3 Packet error detection.
As a packet is placed on the Ether, a checksum is computed and appended.
As the packet is read from the Ether, the checksum is recomputed. Packets
which do not carry a consistent checksum are discarded. In this way transmission
errors, impulse noise errors, and errors due to undetected interference
are caught at a packet's destination.
3.5.4 Truncated packet filtering.
Interference detection and deference cause most collisions to result in
truncated packets of only a few bits; colliding stations detect interference
and abort transmission within an Ether round trip time. To reduce the processing
load that the rejection of such obviously damaged packets would place on
listening station software, truncated packets are filtered out in hardware.
3.5.5 Collision consensus enforcement.
When a station determines that its transmission is experiencing interference,
it momentarily jams the Ether to insure that all other participants in the
collision will detect interference and, because of deference, will be forced
to abort. Without this collision consensus enforcement mechanism, it is
possible that the transmitting station which would otherwise be the last
to detect a collision might not do so as the other interfering transmissions
successively abort and stop interfering. Although the packet may look good
to that last transmitter, different path lengths between the colliding transmitters
and the intended receiver will cause the packet to arrive damaged.
4. Implementation
Our choices of 1 kilometer, 3 megabits per second, and 256 stations for
the parameters of an experimental Ethernet were based on characteristics
of the locally distributed computer communication environment and our assessments
of what would be marginally achievable; they were certainly not hard restrictions
essential to the Ethernet concept.
We expect that a reasonable maximum network size would be on the order of
1 kilometer of cable. We used this working number to choose among Ethers
of varying signal attenuation and to design transceivers with appropriate
power and sensitivity.
The dominant station on our experimental Ethernet is a minicomputer for
which 3 megabits per second is a convenient data transfer rate. By keeping
the peak rate well below that of the computer's path to main memory, we
reduce the need for expensive special-purpose packet buffering in our Ethernet
interfaces. By keeping the peak rate as high as is convenient, we provide
for larger numbers of stations and more ambitious multiprocessing communications
applications.
To expedite low-level packet handling among 256 stations, we allocate the
first 8-bit byte of the packet to be the destination address field and the
second byte to be the source address field (see Figure 2). 256 is a number
small enough to allow each station to get an adequate share of the available
bandwidth and approaches the limit of what we can achieve with current techniques
for tapping cables. 256 is only a convenient number for the lowest level
of protocol; higher levels can accommodate extended address spaces with
additional fields inside the packet and software to interpret them.
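As one concrete reading of this layout, the following Python sketch packs the two hardware address bytes, followed by the 16-bit type word that Section 7.2 introduces by software convention; the helper name and the byte order are assumptions:

    import struct

    def pack_header(destination: int, source: int, packet_type: int) -> bytes:
        # One destination byte, then one source byte (Figure 2), then the
        # software type word. The SYNC bit and trailing CRC are supplied by
        # the interface hardware and so do not appear here.
        assert 0 <= destination < 256 and 0 <= source < 256
        return struct.pack('>BBH', destination, source, packet_type)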
Our experimental Ethernet implementation has four major parts: the Ether,
transceivers, interfaces, and controllers (see Figure 1).
4.1 Ether
We chose to implement our experimental Ether using low-loss coaxial cable
with off-the-shelf CATV taps and connectors. It is possible to mix Ethers
on a single Ethernet; we use a smaller-diameter coax for convenient connection
within station clusters and a larger-diameter coax for low-loss runs between
clusters. The cost of coaxial cable Ether is insignificant relative to the
cost of the distributed computing systems supported by Ethernet.
4.2 Transceivers
Our experimental transceivers can drive a kilometer of coaxial cable Ether
tapped by 256 stations transmitting at 3 megabits per second. The transceivers
can endure (i.e. work after) sustained direct shorting, improper termination
of the Ether, and simultaneous drive by all 256 stations; they can tolerate
(i.e. work during) ground differentials and everyday electrical noise, from
typewriters or electric drills, encountered when stations are separated
by as much as a kilometer.
An Ethernet transceiver attaches directly to the Ether which passes by in
the ceiling or under the floor. It is powered and controlled through five
twisted pairs in an interface cable carrying transmit data, receive data,
interference detect, and power supply voltages. When unpowered, the transceiver
disconnects itself electrically from the Ether. Here is where our fight
for reliability is won or lost; a broken transceiver can, but should not,
bring down an entire Ethernet. A watchdog timer circuit in each transceiver
attempts to prevent pollution of the Ether by shutting down the output stage
if it acts suspiciously. For transceiver simplicity we use the Ether's base
frequency band, but an Ethernet could be built to use any suitably sized
band of a frequency division multiplexed Ether.
Even though our experimental transceivers are very simple and can tolerate
only limited signal attenuation, they have proven quite adequate and reliable.
A more sophisticated transceiver design might permit passive branching of
the Ether and wider station separation.
4.3 Interface
An Ethernet interface serializes and deserializes the parallel data used
by its station. There are a number of different stations on our Ethernet;
an interface must be built for each kind.
Each interface is equipped with the hardware necessary to compute a 16-bit
cyclic redundancy checksum (CRC) on serial data as it is transmitted and
received. This checksum protects only against errors in the Ether and specifically
not against errors in the parallel portions of the interface hardware or
station. Higher-level software checksums are recommended for applications
in which a higher degree of reliability is required.
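The paper does not name the CRC polynomial, so the following bit-serial Python sketch assumes the common CRC-16 polynomial x**16 + x**15 + x**2 + 1 (0xA001 in reflected form) purely as a stand-in for the interface's shift-register hardware:

    def crc16(data: bytes) -> int:
        # Bit-serial CRC-16, least significant bit first; the polynomial and
        # initial value are assumptions, not the experimental hardware's.
        crc = 0
        for byte in data:
            crc ^= byte
            for _ in range(8):
                crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
        return crc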
A transmitting interface uses a packet buffer address and word count to
serialize and phase encode a variable number of 16-bit words which are taken
from the station's memory and passed to the transceiver, preceded by a start
bit (called SYNC in Figure 2) and followed by the CRC. A receiving interface
uses the appearance of carrier to detect the start of a packet and uses
the SYNC bit to acquire bit phase. As long as carrier stays on, the interface
decodes and deserializes the incoming bit stream, depositing 16-bit words
in a packet buffer in the station's main memory. When carrier goes away,
the interface checks that an integral number of 16-bit words has been received
and that the CRC is correct. The last word received is assumed to be the
CRC and is not copied into the packet buffer.
These interfaces ordinarily include hardware for accepting only those packets
with appropriate addresses in their headers. Hardware address filtering
helps a station avoid burdensome software packet processing when the Ether
is very busy carrying traffic intended for other stations.
4.4 Controller
An Ethernet controller is the station-specific low-level firmware or software
for getting packets onto and out of the Ether. When a source-detected collision
occurs, it is the source controller's responsibility to generate a new random
retransmission interval based on the updated collision count. We have studied
a number of algorithms for controlling retransmission rates in stations
to maintain Ether efficiency [20, 22]. The most practical of these algorithms
estimate traffic load using recent collision history.
Retransmission intervals are multiples of a slot, the maximum time between
starting a transmission and detecting a collision, one end-to-end round
trip delay. An Ethernet controller begins transmission of each new packet
with a mean retransmission interval of one slot. Each time a transmission
attempt ends in collision, the controller delays for an interval of random
length with a mean twice that of the previous interval, defers to any passing
packet, and then attempts retransmission. This heuristic approximates an
algorithm we have called Binary Exponential Backoff (see
Figure 3) [22].
Fig. 3. Collision control algorithm.
When the network is unloaded and collisions are rare, the mean seldom departs
from one and retransmissions are prompt. As the traffic load increases, more
collisions are experienced, a backlog of packets builds up in the stations,
retransmission intervals increase, and retransmission traffic backs off
to sustain channel efficiency.
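A controller built along these lines might look like the following Python sketch. The ether.carrier and ether.transmit calls are hypothetical stand-ins for interface and transceiver signals, and the uniform draw and attempt cap are our assumptions rather than the exact algorithm of Figure 3:

    import random
    import time

    SLOT = 16e-6  # slot duration in seconds (the experimental value of Section 6)

    def send_packet(ether, packet, max_attempts=16):
        mean = 1  # mean retransmission interval, in slots
        for _ in range(max_attempts):
            while ether.carrier():      # deference: wait out any passing packet
                pass
            if ether.transmit(packet):  # True when no interference was detected
                return True
            time.sleep(random.uniform(0, 2 * mean) * SLOT)  # random delay, mean = `mean` slots
            mean *= 2                   # binary exponential backoff: double the mean
        return False  # give up; recovery is left to higher-level protocol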
5. Growth
5.1 Signal Cover
One can expand an Ethernet just so far by adding transceivers and Ether.
At some point, the transceivers and Ether will be unable to carry the required
signals. The signal cover can be extended with a simple unbuffered packet
repeater. In our experimental Ethernet, where because of transceiver simplicity
the Ether cannot be branched passively, a simple repeater may join any number
of Ether segments to enrich the topology while extending the signal cover.
We operate an experimental two-segment packet repeater, but hope to avoid
relying on them. In branching the Ether and extending its signal cover,
there is a trade-off between using sophisticated transceivers and using
repeaters. With increased power and sensitivity, transceivers become more
expensive and less reliable. The introduction of repeaters into an Ethernet
makes the centrally interconnecting Ether active. The failure of a transceiver
will sever the communications of its owner; the failure of a repeater partitions
the Ether, severing many communications.
5.2 Traffic Cover
One can expand an Ethernet just so far by adding Ether and packet repeaters.
At some point the Ether will be so busy that additional stations will just
divide more finely the already inadequate bandwidth. The traffic cover can
be extended with an unbuffered traffic-filtering repeater or packet filter,
which passes packets from one Ether segment to another only if the destination
station is located on the new segment. A packet filter also extends the
signal cover.
5.3 Address Cover
One can expand an Ethernet just so far by adding Ether, repeaters, and traffic
filters. At some point there will be too many stations to be addressed with
the Ethernet's 8-bit addresses. The address cover can be extended with packet
gateways and the software addressing conventions they implement [7]. Addresses
can be expanded in two directions: down into the station by adding fields
to identify destination ports or processes within a station, and up into
the internetwork by adding fields to identify destination stations on remote
networks. A gateway also extends the traffic and signal covers.
There can be only one repeater or packet filter connecting two Ether segments;
a packet repeated onto a segment by multiple repeaters would interfere with
itself. However, there is no limit to the number of gateways connecting
two segments; a gateway only repeats packets addressed to itself as an intermediary.
Failure of the single repeater connecting two segments partitions the network;
failure of a gateway need not partition the net if there are paths through
other gateways between the segments.
6. Performance
We present here a simple set of formulas with which to characterize the
performance expected of an Ethernet when it is heavily loaded. More elaborate
analyses and several detailed simulations have been done, but the following
simple model has proven very useful in understanding the Ethernet's distributed
contention scheme, even when it is loaded beyond expectations
[1, 20, 21, 22, 23, 27].
We develop a simple model of the performance of a loaded Ethernet by examining
alternating Ether time periods. The first, called a transmission interval,
is that during which the Ether has been acquired for a successful packet
transmission. The second, called a contention interval, is that
composed of the retransmission slots of Section 4.4, during which stations
attempt to acquire control of the Ether. Because the model's Ethernets are
loaded and because stations defer to passing packets before starting transmission,
the slots are synchronized by the tail of the preceding acquisition interval.
A slot will be empty when no station chooses to attempt transmission in
it and it will contain a collision if more than one station attempts to
transmit. When a slot contains only one attempted transmission, then the
Ether has been acquired for the duration of a packet, the contention interval
ends, and a transmission interval begins.
Let P be the number of bits in an Ethernet packet. Let C
be the peak capacity in bits per second carried on the Ether. Let T be
the time in seconds of a slot, the number of seconds it takes to detect
a collision after starting a transmission. Let us assume that there are
Q stations continuously queued to transmit a packet; either the
acquiring station has a new packet immediately after a successful acquisition
or another station comes ready. Note that Q also happens to give
the total offered load on the network, which for this analysis is always
1 or greater. We assume that a queued station attempts to transmit in the
current slot with probability 1/Q, or delays with probability
1 - (1/Q); this is known to be the optimum statistical decision
rule, approximated in Ethernet stations by means of our load-estimating
retransmission control algorithms [20, 21].
6.1 Acquisition Probability
We now compute A, the probability that exactly one station attempts a transmission
in a slot and therefore acquires the Ether. A is Q*(1/Q)*((1 - (1/Q))**(Q-1));
there are Q ways in which one station can choose to transmit (with probability
1/Q) while Q-1 stations choose to wait (with probability 1 - (1/Q)).
Simplifying,

A = (1 - (1/Q))**(Q-1).
6.2 Waiting Time
We now compute W, the mean number of slots of waiting in a contention
interval before a successful acquisition of the Ether by a station's transmission.
The probability of waiting no time at all is just A, the probability that
one and only one station chooses to transmit in the first slot following
a transmission. The probability of waiting l slot is A*(l-A);
the probability of waiting i slots is A*((1-A)**i).
The mean of this geometric distribution is
W = (l -A)/A.
6.3 Efficiency
We now compute E, that fraction of time the Ether is carrying good
packets, the efficiency. The Ether's time is divided between transmission
intervals and contention intervals. A packet transmission takes P/C
seconds. The mean time to acquisition is W*T. Therefore, by our simple model,

E = (P/C)/((P/C) + (W*T)).

Table 1 presents representative performance figures (i.e. E) for our experimental
Ethernet with the indicated packet sizes and number of continuously queued
stations. The efficiency figures given do not account for inevitable reductions
due to headers and control packets nor for losses due to imprecise control
of the retransmission parameter 1/Q; the former is straightforwardly
protocol-dependent and the latter requires analysis beyond the scope of this
paper. Again, we feel that all of the Ethernets in the table are overloaded;
normally loaded Ethernets will usually have a Q much less than 1 and exhibit
behavior not covered by this model. For our calculations we use a C
of 3 megabits per second and a T of 16 microseconds. The slot duration
T must be long enough to allow a collision to be detected: at
least twice the Ether's round trip time. We limit in software the maximum
length of our packets to be near 4000 bits to keep the latency of network
access down and to permit efficient use of station packet buffer storage.
For packets whose size is above 4000 bits, the efficiency of our experimental
Ethernet stays well above 95 percent. For packets with a size approximating
that of a slot, Ethernet efficiency approaches 1/e, the asymptotic
efficiency of a slotted Aloha network [27].
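The model is easy to check numerically. The short Python fragment below evaluates A, W, and E with the stated C and T, and reproduces entries of Table 1:

    C = 3e6    # peak capacity, bits per second
    T = 16e-6  # slot duration, seconds

    def efficiency(P, Q):
        # E = (P/C) / ((P/C) + W*T), with A = (1 - 1/Q)**(Q-1) and W = (1-A)/A.
        A = (1 - 1/Q) ** (Q - 1) if Q > 1 else 1.0
        W = (1 - A) / A
        return (P / C) / (P / C + W * T)

    print(round(efficiency(4096, 2), 4))  # 0.9884, as in Table 1
    print(round(efficiency(48, 256), 4))  # 0.3686, approaching 1/e ~ 0.3679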
7. Protocol
There is more to the construction of a viable packet communication system
than simply providing the mechanisms for packet transport. Methods for error
correction, flow control, process naming, security, and accounting must
also be provided through higher-level protocols implemented on top of the
Ether control protocol described in Sections 3 and 4 above [7, 10, 12, 21, 28, 34].
Ether control includes packet framing, error detection, addressing
and multi-access control; like other line control procedures, Ethernet is
used to support numerous network and multiprocessor architectures
[30, 31].
Here is a brief description of one simple error-controlling packet protocol.
The EFTP (Ethernet File Transfer Protocol) is of interest both because it
is relatively easy to understand and implement correctly and because it
has dutifully carried many valuable files during the development of more
general and efficient protocols.
7.1 General Terminology
In discussing packet protocols, we use the following generally useful terminology.
A packet is said to have a source and a destination. A
flow of data is said to have a sender and a receiver,
recognizing that to support a flow of data some packets (typically acknowledgments)
will be sourced at the receiver and destined for the sender. A connection
is said to have a listener and an initiator and a service
is said to have a server and a user. It is very useful
to treat these as orthogonal descriptors of the participants in a communication.
Of course, a server is usually a listener and the source of data-bearing
packets is usually the sender.
7.2 EFTP
The first 16 bits of every Ethernet packet contain its interface-interpretable
destination and source station addresses, a byte each, in that order (see
Figure 2). By software convention, the second 16 bits of all Ethernet packets
contain the packet type. Different protocols use disjoint sets of packet
types. The EFTP uses 5 packet types: data, ack, abort, end, and endreply.
Following the 16-bit type word of an EFTP packet are 16 bits of sequence
number, 16 bits of length, optionally some 16-bit data words, and finally
a 16-bit software checksum word (see Figure 4). The Ethernet's hardware
checksum is present only on the Ether and is not counted at this level of
protocol.
Fig. 4. EFTP packet layout.
It should be obvious that little care has been taken to cram certain fields
into just the right number of bits. The emphasis here is on simplicity and
ease of programming. Despite this disclaimer, we do feel that it is more
advisable to err on the side of spacious fields; try as you may, one field
or another will always turn out to be too small.
The software checksum word is used to lower the probability of an undetected
error. It serves not only as a backup for the experimental Ethernet's serial
hardware 16-bit cyclic redundancy checksum (in
Figure 2), but also for protection
against failures in parallel data paths within stations which are not checked
by the CRC. The checksum used by the EFTP is a 1's complement add and cycle
over the entire packet, including header and content data. The checksum
can be ignored at the user's peril at either end; the sender may put all
1's (an impossible value) into the checksum word to indicate to the receiver
that no checksum was computed.
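One plausible reading of the add-and-cycle rule is sketched below in Python; the exact ordering of the end-around carry and the one-bit cycle is not spelled out in the paper, so both are assumptions:

    def eftp_checksum(words):
        # 1's complement 16-bit sum with end-around carry, cycled (rotated
        # left one bit) after each word is added.
        s = 0
        for w in words:
            s += w & 0xFFFF
            s = (s & 0xFFFF) + (s >> 16)         # end-around carry
            s = ((s << 1) | (s >> 15)) & 0xFFFF  # cycle left one bit
        return s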
7.2.1 Data transfer.
The 16-bit words of a file are carried from sending station to receiving
station in data packets consecutively numbered from 0. Each data packet
is retransmitted periodically by the sender until an ack packet with a matching
sequence number is returned from the receiver. The receiver ignores all
damaged packets, packets from a station other than the sender, and packets
whose sequence number does not match either the expected one or the one
preceding. When a packet has the expected sequence number, the packet is
acked, its data is accepted as part of the file, and the sequence number
is incremented. When a packet arrives with a sequence number one less than
that expected, it is acknowledged and discarded; the presumption is that
its ack was lost and needs retransmission [21].
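The receiver's acceptance rule is compact enough to sketch directly; here net and its methods are hypothetical stand-ins for the station's packet interface, and end-packet handling (Section 7.2.2) is omitted:

    def receive_data(net, sender):
        expected = 0  # sequence number of the next data packet wanted
        words = []
        while True:
            pkt = net.next_packet()
            if pkt.damaged or pkt.source != sender:
                continue                       # ignore damage and strangers
            if pkt.seq == expected:
                words.extend(pkt.data)         # accept the data, ack, advance
                net.send_ack(sender, pkt.seq)
                expected += 1
            elif pkt.seq == expected - 1:
                net.send_ack(sender, pkt.seq)  # duplicate: its ack was lost
            # packets with any other sequence number are silently ignored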
7.2.2 End.
When all the data has been transmitted, an end packet is sent with the next
consecutive sequence number and then the sender waits for a matching endreply.
Having accepted an end packet in sequence, the data receiver responds with
a matching endreply and then dallies for some reasonably long period of time
(10 seconds). Upon getting the endreply, the sending station transmits an
echoing endreply and is free to go off with the assurance that the file
has been transferred successfully. The dallying receiver then gets the echoed
endreply and it too goes off assured.
Table 1. Ethernet Efficiency.

Q   | P = 4096 | P = 1024 | P = 512 | P = 48
1   | 1.0000   | 1.0000   | 1.0000  | 1.0000
2   | 0.9884   | 0.9552   | 0.9143  | 0.5000
3   | 0.9857   | 0.9447   | 0.8951  | 0.4444
4   | 0.9842   | 0.9396   | 0.8862  | 0.4219
5   | 0.9834   | 0.9367   | 0.8810  | 0.4096
10  | 0.9818   | 0.9310   | 0.8709  | 0.3874
32  | 0.9807   | 0.9272   | 0.8642  | 0.3737
64  | 0.9805   | 0.9263   | 0.8627  | 0.3708
128 | 0.9804   | 0.9259   | 0.8620  | 0.3693
256 | 0.9803   | 0.9257   | 0.8616  | 0.3686
The comparatively complex end-dally sequence is intended to make it practically
certain that the sender and receiver of a file will agree on whether the
file has been transmitted correctly. If the end packet is lost, the data
sender simply retransmits it as it would any packet with an overdue acknowledgement.
If the endreply from the data receiver is lost, the data sender will time
out in the same way and retransmit the end packet which will in turn be
acknowledged by the dallying receiver. If the echoed endreply is lost, the
dallying receiver will be inconvenienced having to wait for it, but when
it has timed out, the receiver can nevertheless be assured of successful
transfer of the file because the end packet has been received.
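The two sides of the handshake can be summarized in a hedged Python sketch; the timeout values and the net calls are illustrative assumptions:

    def sender_finish(net, receiver, seq):
        # Retransmit end like any packet with an overdue acknowledgment.
        while True:
            net.send(receiver, 'end', seq)
            if net.wait_for('endreply', timeout=1.0) is not None:
                break
        net.send(receiver, 'endreply', seq)  # echo; the sender may now depart

    def receiver_finish(net, sender, seq):
        # Called after the in-sequence end packet has been accepted and acked.
        net.send(sender, 'endreply', seq)
        net.wait_for('endreply', timeout=10.0)  # dally; the transfer is known
                                                # good whether or not the echo arrives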
At any time during all of this, either side is free to decide communication
has failed and just give up; it is considered polite to send an abort packet
to end the communication promptly in the event of, say, a user-initiated
abort or a file system error.
7.2.3 EFTP shortcomings.
The EFTP has been very useful, but its shortcomings are many. First, the
protocol provides only for file transfer from station to station in a single
network and specifically not from process to process within stations either
on the same network or through a gateway. Second, process rendezvous is
degenerate in that there are no mechanisms for finding processes by name
or for convenient handling of multiple users by a single server. Third,
there is no real flow control. If data arrives at a receiver unable to accept
it into its buffers, the data can simply be thrown away with complete assurance
that it will be retransmitted eventually. There is no way for a receiver
to quench the flow of such wasted transmissions or to expedite retransmission.
Fourth, data is transmitted in integral numbers of 16-bit words belonging
to unnamed files and thus the EFTP is either terribly restrictive or demands
some nested file transfer formats internal to its data words. And fifth,
functional generality is lost because the receiver is also the listener
and server.
8. Conclusion
Our experience with an operating Ethernet leads us to conclude that our
emphasis on distributed control was well placed. By keeping the shared components
of the communication system to a minimum and passive, we have achieved a
very high level of reliability. Installation and maintenance of our experimental
Ethernet has been more than satisfactory. The flexibility of station interconnection
provided by broadcast packet switching has encouraged the development of
numerous computer networking and multiprocessing applications.
Acknowledgments. Our colleagues at the Xerox Palo Alto Research
Center, especially Tat C. Lam, Butler W. Lampson, John F. Shoch, and Charles
P. Thacker, have contributed in many ways to the evolution of Ethernet ideas
and to the construction of the experimental system without which such ideas
would be just so much speculation.
Received May 1975; revised December 1975
References
1. Abramson, N. The Aloha system. AFIPS Conf. Proc., Vol. 37, 1970 FJCC,
AFIPS Press, Montvale, N.J., 1970, pp. 281-285.
2. Abramson, N., and Kuo, F.F. Computer-Communication Networks. Prentice-Hall,
Englewood Cliffs, N.J., 1975.
3. Ashenhurst, R.L., and Vonderohe, R.H. A hierarchical network. Datamation
21, 2 (Feb. 1975), 40-44.
4. Baran, P. On distributed communications. Rand Corp. Memo RM-3420-PR,
Aug. 1964.
5. Barnes, G.H., Brown, R.M., Kato, M., Kuck, D.J., Slotnick, D.L., and
Stokes, R.A. The Illiac IV Computer. IEEE Trans. Computers C-17, 8 (Aug.
1968), 758-770.
6. Binder, R., Abramson, N., Kuo, F., Okinaka, A., and Wax, D. Aloha packet
broadcasting-a retrospect. AFIPS Conf. Proc., Vol. 44, 1975 NCC, AFIPS Press,
Montvale, N.J., 1975.
7. Cerf, V.G., and Kahn, R.E. A protocol for packet network intercommunication
IEEE Trans. Comm. COMM- 22, 5 (May 1974), 637-648.
8. The shrinking world: computer networks and communications. Computer
7, 2 (Feb. 1974).
9. Distributed-function computer architectures. Computer 7, 3 (March 1974).
10. Crocker, S.D., Heafner, J.F., Metcalfe, R.M., and Postel, J.B. Function-oriented
protocols for the Arpa computer network. AFIPS Conf. Proc., Vol. 40, 1972
SJCC, AFIPS Press, Montvale, N.J., 1972, pp. 271-279.
11. Crowther, W.R., Heart, F.E., McKenzie, A.A., McQuillan, J.M., and Walden,
D.C. Issues in packet-switching network design. AFIPS Conf. Proc., Vol.
44, 1975 NCC, AFIPS Press, Montvale, N.J., 1975, pp. 161-175.
12. Farber, D.J., et al. The distributed computing system. Proc. 7th Ann.
IEEE Computer Soc. International Conf., Feb. 1973, pp. 31-34.
13. Farber, D.J. A ring network. Datamation 21, 2 (Feb. 1975), 44-46.
14. Fraser, A.G. A virtual channel network. Datamation 21, 2 (Feb. 1975),
51-53.
15. Heart, F.E., Kahn, R.E., Ornstein, S.M., Crowther, W.R., and Walden,
D.C. The interface message processor for the Arpa computer network. AFIPS
Conf. Proc., Vol. 36, 1970 SJCC, AFIPS Press, Montvale, N.J., 1970, pp.
551-567.
16. Heart, F.E., Ornstein, S.M., Crowther, W.R., and Barker, W.B. A new
minicomputer-multiprocessor for the Arpa network. AFIPS Conf. Proc., Vol.
42, 1973 NCC, AFIPS Press, Montvale, N.J., 1973, pp. 529-537.
17. Kahn, R.E. The organization of computer resources into a packet radio
network. AFIPS Conf. Proc., Vol. 44, 1975 NCC, AFIPS Press, Montvale, N.J.,
1975, pp. 177-186.
18. Metcalfe, R.M. Strategies for interprocess communication in a distributed
computing system. Proc. Symp. on Computer Commun. Networks and Teletraffic,
Polytechnic Press, New York, 1972.
19. Metcalfe, R.M. Strategies for operating systems in computer networks.
Proc. ACM National Conf., August 1972, pp. 278-281.
20. Metcalfe, R.M. Steady-state analysis of a slotted and controlled Aloha
system with blocking. Proc. 6th Hawaii Conf. on System Sci., Jan. 1973, pp.
375-380.
21. Metcalfe, R.M. Packet communication. Harvard Ph.D. Th., Project MAC
TR-114, Dec. 1973.
22. Metcalfe, R.M. Distributed algorithms for a broadcast queue. Talk given
at Stanford University in November 1974 and at the University of California
at Berkeley in February 1975, paper in preparation.
23. Murthy, P. Analysis of a carrier-sense random-access system with random
packet length. Aloha System Tech. Rep. B75-17, U. of Hawaii, May 1975.
24. Ornstein, S.M., Crowther, W.R., Kraley, M.F., Bressler, R.D., Michel,
A., and Heart, F.E. Pluribus-a reliable multiprocessor. AFIPS Conf. Proc.,
Vol. 44, 1975 NCC, AFIPS Press, Montvale, N.J., 1975, pp. 551-559.
25. Retz, D.L. Operating system design considerations for the packet switching
environment. AFIPS Conf. Proc., Vol. 44, 1975 NCC, AFIPS Press, Montvale,
N.J., 1975, pp. 155-160.
26. Roberts, L., and Wessler, B. Computer network development to achieve
resource sharing. AFIPS Conf. Proc., Vol. 36, 1970 SJCC, AFIPS Press, Montvale,
N.J., 1970, pp. 543-549.
27. Roberts, L. Capture effects on Aloha channels. Proc. 6th Hawaii Conf.
on System Sci., Jan. 1973.
28. Rowe, L.A. The distributed computing operating system. Tech. Rep. 66,
Dep. of Information and Computer Sci., U. of California, Irvine, June 1975.
29. Rustin, R. (Ed.) Computer Networks (Proc. Courant Computer Sci. Symp.
3, December 1970), Prentice-Hall, Englewood Cliffs, N.J., 1970.
30. IBM synchronous data link control-general information. IBM Systems Development
Div., Pub. Center, Research Triangle Park, N.C., 1974.
31. IBM system network architecture-general information. IBM Systems Development
Div., Pub. Center, Research, Triangle Park, N.C., 1975.
32. Thomas, R.H. A resource sharing executive for the Arpanet. AFIPS Conf.
Proc., Vol. 42, 1973 NCC, AFIPS Press, Montvale, N.J., 1973, pp. 155-163.
33. Thornton, J.E. Design of a Computer: The Control Data 6600. Scott, Foresman
and Co., Glenview, Ill., 1970.
34. Walden, D.C. A system for interprocess communication in a resource sharing
computer network. Comm. ACM, 15, 4 (April 1972), 221-230.
35. Willard, D.G. Mitrix: a sophisticated digital cable communications
system. Proc. National Telecommunications Conf., Nov. 1973.
36. Wulf, W., and Levin, R. C.mmp-a multi-mini-processor. AFIPS Conf. Proc.,
Vol. 41, 1972 FJCC, AFIPS Press, Montvale, N.J., 1972.