Dynamic Routing Protocols

Chapter Description

This sample chapter from CCIE: Routing TCP/IP Volume I shows how routers can discover how to correctly switch packets to their respective destinations automatically and share that information with other routers via dynamic routing protocols.

Link State Routing Protocols

The information available to a distance vector router has been compared to the information available from a road sign. Link state routing protocols are like a road map. A link state router cannot be fooled as easily into making bad routing decisions, because it has a complete picture of the network. The reason is that unlike the routing-by-rumor approach of distance vector, link state routers have firsthand information from all their peer7 routers. Each router originates information about itself, its directly connected links, and the state of those links (hence the name). This information is passed around from router to router, each router making a copy of it, but never changing it. The ultimate objective is that every router has identical information about the internetwork, and each router will independently calculate its own best paths.

Link state protocols, sometimes called shortest path first or distributed database protocols, are built around a well-known algorithm from graph theory, E. W. Dijkstra's shortest path algorithm. Examples of link state routing protocols are:

  • Open Shortest Path First (OSPF) for IP

  • The ISO's Intermediate System to Intermediate System (IS-IS) for CLNS and IP

  • DEC's DNA Phase V

  • Novell's NetWare Link Services Protocol (NLSP)

Although link state protocols are rightly considered more complex than distance vector protocols, the basic functionality is not complex at all:

  1. Each router establishes a relationship—an adjacency—with each of its neighbors.

  2. Each router sends link state advertisements (LSAs), some

  3. Each router stores a copy of all the LSAs it has seen in a database. If all works well, the databases in all routers should be identical.

  4. The completed topological database, also called the link state database, describes a graph of the internetwork. Using the Dijkstra algorithm, each router calculates the shortest path to each network and enters this information into the route table.

Neighbors

Neighbor discovery is the first step in getting a link state environment up and running. In keeping with the friendly neighbor terminology, a Hello protocol is used for this step. The protocol will define a Hello packet format and a procedure for exchanging the packets and processing the information the packets contain.

At a minimum, the Hello packet will contain a router ID and the

instance, an IP address from one of the router's interfaces. Other fields of the packet may carry a subnet mask, Hello intervals, a specified maximum period the router will wait to hear a Hello before declaring the neighbor "dead," a descriptor of the circuit type, and flags to help in bringing up adjacencies.

When two routers have discovered each other as neighbors, they

Beyond building adjacencies, Hello packets serve as keepalives to monitor the adjacency. If Hellos are not heard from an adjacent neighbor within a certain established time, the neighbor is considered unreachable and the adjacency is broken. A typical interval for the exchange of Hello packets is 10 seconds, and a typical dead period is four times that.
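The keepalive role of Hellos can be sketched as a small neighbor table. This is only an illustration, not any protocol's actual state machine: the names NeighborTable, hello_received, and expire are hypothetical, and the 10-second and 40-second values are simply the typical figures quoted above.

```python
from dataclasses import dataclass, field

HELLO_INTERVAL = 10.0                 # seconds; typical value from the text
DEAD_INTERVAL = 4 * HELLO_INTERVAL    # "four times that"


@dataclass
class NeighborTable:
    """Tracks when each neighbor's Hello was last heard (hypothetical helper)."""
    last_heard: dict = field(default_factory=dict)

    def hello_received(self, router_id: str, now: float) -> None:
        # Any Hello refreshes the neighbor's keepalive timestamp.
        self.last_heard[router_id] = now

    def expire(self, now: float) -> list:
        # Neighbors silent for longer than the dead interval are dropped.
        dead = [rid for rid, heard in self.last_heard.items()
                if now - heard > DEAD_INTERVAL]
        for rid in dead:
            del self.last_heard[rid]
        return dead


table = NeighborTable()
table.hello_received("RB", now=0.0)
table.hello_received("RC", now=0.0)
table.hello_received("RB", now=30.0)   # RB keeps sending Hellos; RC goes silent
print(table.expire(now=45.0))          # ['RC']
```

Time is passed in explicitly rather than read from a clock so that the behavior is easy to reason about; a real implementation would typically use per-neighbor timers.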

Link State Flooding

After the adjacencies are established, the routers may begin sending out LSAs. As the term flooding implies, the advertisements are sent to every neighbor. In turn, each received LSA is copied and forwarded to every neighbor except the one that sent the LSA. This process is the source of one of link state's advantages over distance vector. LSAs are forwarded almost immediately, whereas distance vector must run its algorithm and update its route table before routing updates, even the triggered ones, can be forwarded. As a result, link state protocols converge much faster than distance vector protocols converge when the topology changes.

The flooding process is the most complex piece of a link state protocol. There are several ways to make flooding more efficient and more reliable, such as using unicast and multicast addresses, checksums, and positive acknowledgments. These topics are examined in the protocol-specific chapters, but two procedures are vitally important to the flooding process: sequencing and aging.

Sequence Numbers

A difficulty with flooding, as described so far, is that when all routers have received all LSAs, the flooding must stop. A time-to-live value in the packets could simply be relied on to expire, but it is hardly efficient to permit LSAs to wander the internetwork until they expire. Take the internetwork in Figure 4.8. Subnet 172.22.4.0 at router A has failed, and A has flooded an LSA to its neighbors B and D, advertising the new state of the link. B and D dutifully flood to their neighbors, and so on.

Figure 4.8 When a topology change occurs, LSAs advertising the change will be flooded throughout the internetwork.

Look next at what happens at router C. An LSA arrives from router B at time t1, is entered into C's topological database, and is forwarded to router F. At some later time t3, another copy of the same LSA arrives from the longer A-D-E-F-C route. Router C sees that it already has the LSA in its database; the question is, should C forward this LSA to router B? The answer is no because B has already received the advertisement. Router C knows this because the sequence number of the LSA it received from router F is the same as the sequence number of the LSA it received earlier from router B.

When router A sent out the LSA, it included an identical sequence number in each copy. This sequence number is recorded in the routers' topological databases along with the rest of the LSA; when a router receives an LSA that is already in the database and its sequence number is the same, the received information is discarded. If the information is the same but the sequence number is greater, the received information and new sequence number are entered into the database and the LSA is flooded. In this way, flooding is abated when all routers have seen a copy of the most recent LSA.
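The flood-or-discard decision can be sketched as follows. This is a simplified illustration under stated assumptions: the function receive_lsa and its tuple layout are hypothetical, sequence numbers are compared as plain integers (no wrapping), and an LSA with a lower sequence number than the stored one is simply discarded.

```python
def receive_lsa(database, lsa, from_neighbor, neighbors):
    """Return the neighbors to flood to, or [] if the LSA is discarded.

    database maps an originating router ID to (sequence, payload).
    lsa is a tuple (origin, sequence, payload). All names are illustrative.
    """
    origin, sequence, payload = lsa
    stored = database.get(origin)
    if stored is not None and sequence <= stored[0]:
        # Same (or lower) sequence number: nothing new, so flooding stops here.
        return []
    # New or more recent: record it and flood to everyone except the sender.
    database[origin] = (sequence, payload)
    return [n for n in neighbors if n != from_neighbor]


# Router C in Figure 4.8 has neighbors B and F.
db = {}
lsa = ("A", 166, "172.22.4.0 down")
print(receive_lsa(db, lsa, from_neighbor="B", neighbors=["B", "F"]))  # ['F']
print(receive_lsa(db, lsa, from_neighbor="F", neighbors=["B", "F"]))  # []
```

The second call returns an empty list because the copy arriving from F carries the sequence number already stored, so C does not send it back toward B.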

As described so far, it seems that routers could merely verify that their link state databases contain the same LSA as the newly received LSA and make a flood/discard decision based on that information, without needing a sequence number. But imagine that immediately after Figure 4.8's network 172.22.4.0 failed, it came back up. Router A might send out an LSA advertising the network as down, with a sequence number of 166; then it sends out a new LSA announcing the same network as up, with a sequence number of 167. Router C receives the down LSA and then the up LSA from the A-B-C path, but then it receives a delayed down LSA from the A-D-E-F-C path. Without the sequence numbers, C would not know whether or not to believe the delayed down LSA. With sequence numbers, C's database will indicate that the information from router A has a sequence number of 167; the late LSA has a sequence number of 166 and is therefore recognized as old information and discarded.

Because the sequence numbers are carried in a set field within the LSAs, the numbers must have some upper bound. What happens when this maximum sequence number is reached?

Linear Sequence Number Spaces

One approach is to use a linear sequence number space so large that it is unlikely the upper limit will ever be reached. If, for instance, a 32-bit field is used, there are 2^32 = 4,294,967,296 available sequence numbers starting with zero. Even if a router were creating a new link state packet every 10 seconds, it would take some 1361 years to exhaust the sequence number supply; few routers are expected to last so long.
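The arithmetic behind the 1361-year figure can be checked in a few lines. The length of a year (365.25 days) is an assumption made here; the text does not say which value was used.

```python
SEQUENCE_BITS = 32
SECONDS_PER_LSA = 10
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # assumed year length

sequence_numbers = 2 ** SEQUENCE_BITS
years = sequence_numbers * SECONDS_PER_LSA / SECONDS_PER_YEAR

print(sequence_numbers)   # 4294967296
print(round(years))       # 1361
```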

In this imperfect world, unfortunately, malfunctions occur. If a link state routing process somehow runs out of sequence numbers, it must shut itself down and stay down long enough for its LSAs to age out of all databases before starting over at the lowest sequence number (see the section "Aging," later in this chapter).

A more common difficulty presents itself during router restarts. If router A restarts, it probably will have no way of remembering the sequence number it last used and must begin again at, say, one. But if its neighbors still have router A's previous sequence numbers in their databases, the lower sequence numbers will be interpreted as older sequence numbers and will be ignored. Again, the routing process must stay down until all old LSAs are aged out of the internetwork. Given that a maximum age might be an hour or more, this solution is not very attractive.

A better solution is to add a new rule to the flooding behavior described thus far: If a restarted router issues to a neighbor an LSA with a sequence number that appears to be older than the neighbor's stored sequence number, the neighbor will send its own stored LSA and sequence number back to the router. The router will thus learn the sequence number it was using before it restarted and may adjust accordingly.

Care must be taken, however, that the last-used sequence number was not close to the maximum; otherwise, the restarting router will simply have to restart again. A rule must be set limiting the "jump" the router may make in sequence numbers—for instance, a rule might say that the sequence numbers cannot make a single increase more than one-half the total sequence number space. (The actual formulas are more complex than this example, taking into account age constraints.)

IS-IS uses a 32-bit linear sequence number space.

Circular Sequence Number Spaces

Another approach is to use a circular sequence number space, where the numbers "wrap"; that is, in a 32-bit space the number following 4,294,967,295 is 0. Malfunctions can cause interesting dilemmas here, too. A restarting router may encounter the same believability problem as discussed for linear sequence numbers.

Circular sequence numbering creates a curious bit of illogic. If x is any number between 1 and 4,294,967,295 inclusive, then 0 < x < 0. This situation can be managed in well-behaved internetworks by asserting two rules for determining when a sequence number is greater than or less than another sequence number. Given a sequence number space n and two sequence numbers a and b, a is considered more recent (of larger magnitude) in either of the following situations:

  • a > b, and (a – b) ≤ n/2

  • a < b, and (b – a) > n/2

For the sake of simplicity, take a sequence number space of six bits, shown in Figure 4.9:

n = 2^6 = 64, so n/2 = 32.

Figure 4.9 A six-bit circular address space.

Given two sequence numbers 48 and 18, 48 is more recent because by rule (1):

48 > 18	and	(48 – 18) = 30,	and	30 < 32. 

Given two sequence numbers 3 and 48, 3 is more recent because by rule (2):

3 < 48	and	 (48 – 3) = 45,	and	45 > 32. 

Given two sequence numbers 3 and 18, 18 is more recent because by rule (1):

18 > 3	and	(18 – 3) = 15,	and	15 < 32. 

So the rules seem to enforce circularity.
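The two circularity rules translate directly into a comparison function. The sketch below is illustrative only; the name more_recent is hypothetical, and the first rule's comparison is taken as "less than or equal to" n/2.

```python
def more_recent(a: int, b: int, n: int) -> bool:
    """True if a is more recent than b in a circular space of size n."""
    if a > b:
        return (a - b) <= n // 2   # rule 1
    if a < b:
        return (b - a) > n // 2    # rule 2
    return False                   # equal numbers: neither is more recent


N = 2 ** 6   # six-bit space, so n/2 = 32

print(more_recent(48, 18, N))   # True: 48 - 18 = 30, within half the space
print(more_recent(3, 48, N))    # True: 48 - 3 = 45, more than half the space
print(more_recent(18, 3, N))    # True: 18 - 3 = 15, within half the space
```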

But what about a not-so-well-behaved internetwork? Imagine an internetwork running a six-bit sequence number space. Now imagine that one of the routers on the internetwork decides to go belly-up, but as it does so, it blurts out three identical LSAs with a sequence number of 44 (101100). Unfortunately, a neighboring router is also malfunctioning—it is dropping bits. The neighbor drops a bit in the sequence number field of the second LSA, drops yet another bit in the third LSA, and floods all three. The result is three identical LSAs with three different sequence numbers:

  • 44 (101100)

  • 40 (101000)

  • 8 (001000)


Applying the circularity rules reveals that 44 is more recent than 40, which is more recent than 8, which is more recent than 44! The result is that every LSA will be continuously flooded, and databases will be continually overwritten with the "latest" LSA, until finally buffers become clogged with LSAs, CPUs become overloaded, and the whole internetwork comes crashing down.
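The loop can be reproduced with a short check. This sketch restates the two circularity rules in a hypothetical helper, more_recent, fixed to the six-bit space (n = 64) and with the first rule's bound taken as inclusive, then applies it to the three corrupted sequence numbers.

```python
def more_recent(a: int, b: int, n: int = 64) -> bool:
    """True if a is more recent than b under the two circularity rules."""
    if a > b:
        return (a - b) <= n // 2
    if a < b:
        return (b - a) > n // 2
    return False


# The three sequence numbers produced by the bit-dropping neighbor.
first, second, third = 0b101100, 0b101000, 0b001000   # 44, 40, 8

print(more_recent(first, second))   # True: 44 beats 40 (difference 4)
print(more_recent(second, third))   # True: 40 beats 8 (difference exactly 32)
print(more_recent(third, first))    # True: 8 beats 44 (difference 36 > 32)
```

Because each number is "more recent" than the next around the cycle, no router can ever settle on a latest LSA.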

This chain of events sounds pretty far-fetched. It is, however, factual. The ARPANET, the precursor of the modern Internet, ran an early link state protocol with a six-bit circular sequence number space; on October 27, 1980, two routers experiencing the malfunctions just described brought the entire ARPANET to a standstill.8

Lollipop-Shaped Sequence Number Spaces

This whimsically-named construct was proposed by Dr. Radia Perlman.9 Lollipop-shaped sequence number spaces are a hybrid of linear and circular sequence number spaces; if you think about it, a lollipop has a linear component and a circular component. The problem with circular spaces is that there is no number less than all other numbers. The problem with linear spaces is that they are—well—not circular. That is, their set of sequence numbers is finite.

When router A restarts, it would be nice to begin with a number a that is less than all other numbers. Neighbors will recognize this number for what it is, and if they have a pre-restart number b in their databases from router A, they can send that number to router A and router A will jump to that sequence number. Router A might be able to send more than one LSA before hearing about the sequence number it had been using before restarting. Therefore, it is important to have enough restart numbers so that A cannot use them all before neighbors either inform it of a previously used number or the previously used number ages out of all databases.

These linear restart numbers form the stick of the lollipop. When they have been used up, or after a neighbor has provided a sequence number to which A can jump, A enters a circular number space, the candy part of the lollipop.

One way of designing a lollipop address space is to use signed sequence numbers, where –k < 0 < k. The negative numbers counting up from –k to –1 form the stick, and the positive numbers from 0 to k are the circular space. Perlman's rules for the sequence numbers are as follows. Given two numbers a and b and a sequence number space n, b is more recent than a if and only if:

  1. a < 0 and a < b, or

  2. a > 0, a < b, and (b – a) < n/2, or

  3. a > 0, b > 0, a > b, and (a – b) > n/2.
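The three rules can be transcribed almost literally. The sketch below is an illustration, not Perlman's own formulation: the function name b_more_recent is hypothetical, the differences are taken as b – a in rule 2 and a – b in rule 3, and the strict "a > 0" conditions are kept exactly as stated, so the case a = 0 falls through to "not more recent".

```python
def b_more_recent(a: int, b: int, n: int) -> bool:
    """Literal transcription of the three rules; n is the sequence space size."""
    if a < 0 and a < b:
        return True    # rule 1: a is still on the stick
    if a > 0 and a < b and (b - a) < n / 2:
        return True    # rule 2: b is a little ahead of a in the circle
    if a > 0 and b > 0 and a > b and (a - b) > n / 2:
        return True    # rule 3: b has wrapped around past a
    return False


n = 64   # small space chosen only to keep the numbers readable
print(b_more_recent(-5, 3, n))    # True  (rule 1)
print(b_more_recent(10, 20, n))   # True  (rule 2: 20 - 10 = 10 < 32)
print(b_more_recent(60, 2, n))    # True  (rule 3: 60 - 2 = 58 > 32)
print(b_more_recent(20, 10, n))   # False (no rule applies)
```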

Figure 4.10 shows an implementation of the lollipop-shaped sequence number space. A 32-bit signed number space N is used, yielding 2^31 positive numbers and 2^31 negative numbers. –N (–2^31, or 0x80000000) and N – 1 (2^31 – 1, or 0x7FFFFFFF) are not used. A router coming online will begin its sequence numbers at –N + 1 (0x80000001) and increment up to zero, at which time it has entered the circular number space. When the sequence reaches N – 2 (0x7FFFFFFE), the sequence wraps back to zero (again, N – 1 is unused).

Figure 4.10 A lollipop-shaped sequence number space.
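The progression through this space can be sketched as a "next sequence number" function. The names FIRST, LAST, next_sequence, and as_hex are hypothetical; the sketch covers only the counting behavior described for Figure 4.10, not the exchange with neighbors after a restart.

```python
N = 2 ** 31
FIRST = -N + 1   # 0x80000001, bottom of the stick
LAST = N - 2     # 0x7FFFFFFE, last value before the circle wraps


def next_sequence(current: int) -> int:
    """Advance one step: up the stick, then around the circle."""
    if current == LAST:
        return 0             # wrap within the circular part only
    return current + 1


def as_hex(sequence: int) -> str:
    """Show a signed value as the 32-bit field it would occupy."""
    return f"0x{sequence & 0xFFFFFFFF:08X}"


print(as_hex(FIRST))                  # 0x80000001
print(as_hex(next_sequence(FIRST)))   # 0x80000002
print(next_sequence(-1))              # 0: the stick joins the circle
print(next_sequence(LAST))            # 0: the circle wraps, never re-entering the stick
```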

Next, suppose the router restarts. The sequence number of the last LSA sent before the restart is 0x00005de3 (part of the circular sequence space). As it synchronizes its database with its neighbor after the restart, the router sends an LSA with a sequence number of 0x80000001 (–N + 1). The neighbor looks into its own database and finds the pre-restart LSA with a sequence number of 0x00005de3. The neighbor sends this LSA to the restarted router, essentially saying, "This is where you left off." The restarted router then records the LSA with the positive sequence number. If it needs to send a new copy of the LSA at some future time, the new sequence number will be 0x00005de4.

Lollipop sequence spaces were used with the original version of OSPF, OSPFv1 (RFC 1131). Although the use of signed numbers was an improvement over the linear number space, the circular part was found to be vulnerable to the same ambiguities as a purely circular space. The deployment of OSPFv1 never progressed beyond the experimental stage. The current version of OSPF, OSPFv2 (originally specified in RFC 1247), adopts the best features of linear and lollipop sequence number spaces. It uses a signed number space like lollipop sequence numbers, beginning with 0x80000001. However, when the sequence number goes positive, the sequence space continues to be linear until it reaches the maximum of 0x7fffffff. At that point the OSPF process must flush the LSA from all link state databases before restarting.
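The OSPFv2 behavior described here amounts to treating the 32-bit field as a signed integer and counting straight up. The following sketch is based only on that description, not on the RFC's exact procedures; to_signed, is_newer, and next_sequence are hypothetical names.

```python
INITIAL = 0x80000001   # first sequence number, per the text
MAXIMUM = 0x7FFFFFFF   # last usable sequence number, per the text


def to_signed(value: int) -> int:
    """Interpret a 32-bit field as a signed two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def is_newer(a: int, b: int) -> bool:
    """Purely linear comparison: the larger signed value is more recent."""
    return to_signed(a) > to_signed(b)


def next_sequence(value: int) -> int:
    """Increment the field, refusing to wrap past the maximum."""
    if value == MAXIMUM:
        raise OverflowError("sequence space exhausted; the LSA must be flushed first")
    return (to_signed(value) + 1) & 0xFFFFFFFF


print(is_newer(0x00000001, INITIAL))     # True: positive numbers follow the negative ones
print(hex(next_sequence(0xFFFFFFFF)))    # 0x0: the step from -1 to 0
```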

Aging

The LSA format should include a field for the age of the advertisement. When an LSA is created, the router sets this field to zero. As the packet is flooded, each router increments the age of the advertisement.10

This process of aging adds another layer of reliability to the flooding process. The protocol defines a maximum age difference

The age of an LSA continues to be incremented as it resides in a link state database. If the age for a link state record is incremented up to some maximum age (MaxAge)—again defined by the spe

If the LSA is to be flushed from all databases when MaxAge is reached, there must be a mechanism to periodically validate the LSA and reset its timer before MaxAge is reached. A link state

The Link State Database

In addition to flooding LSAs and discovering neighbors, a third major task of the link state routing protocol is establishing the link state database. The link state or topological database stores the LSAs as a series of records. Although a sequence number and age and possibly other information are included in the LSA, these variables exist mainly to manage the flooding process. The important information for the shortest path determination process is the advertising router's ID, its attached networks and neighboring routers, and the cost associated with those networks or neighbors. As the previous sentence implies, LSAs may include two types of generic information:11

  • Router link information advertises a router's adjacent neighbors with a triple of (Router ID, Neighbor ID, Cost), where cost is the cost of the link to the neighbor.

  • Stub network information advertises a router's directly connected stub networks (networks with no neighbors) with a triple of (Router ID, Network ID, Cost).
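The two kinds of triples can be modeled as simple record types. This is a sketch for illustration: the class names RouterLink and StubNetwork are hypothetical, and the stub network prefix and its cost are made-up example values. The two router links use the RB and RC costs from Figure 4.11.

```python
from typing import NamedTuple


class RouterLink(NamedTuple):
    router_id: str
    neighbor_id: str
    cost: int        # cost of the link toward the neighbor


class StubNetwork(NamedTuple):
    router_id: str
    network_id: str
    cost: int        # cost of reaching the attached network


records = [
    RouterLink("RB", "RC", 1),
    RouterLink("RC", "RB", 5),
    StubNetwork("RB", "10.1.1.0/24", 10),   # hypothetical stub network
]

router_links = [r for r in records if isinstance(r, RouterLink)]
stub_networks = [r for r in records if isinstance(r, StubNetwork)]
print(len(router_links), len(stub_networks))   # 2 1
```

Keeping the two record types separate mirrors the way the shortest path calculation uses them: router links first, stub networks afterward.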

The shortest path first (SPF) algorithm is run once for the router link information to establish shortest paths to each router, and then stub network information is used to add these networks to the routers. Figure 4.11 shows an internetwork of routers and the links between them; stub networks are not shown for the sake of simplicity. Notice that several links have different costs associated with them at each end. A cost is associated with the outgoing direction of an interface. For instance, the link from RB to RC has a cost of 1, but the same link has a cost of 5 in the RC to RB direction.

Figure 4.11 Link costs are calculated for the outgoing direction from an interface and do not necessarily have to be the same at all interfaces on a network.

Table 4.2 shows a generic link state database for the internetwork of Figure 4.11, a copy of which is stored in every router. As you read through this database, you will see that it completely describes the internetwork. Now it is possible to compute a tree that describes the shortest path to each router by running the SPF algorithm.

Table 4.2 The topological database for the internetwork in Figure 4.11.

Router ID    Neighbor    Cost
RA           RB          2
RA           RD          4
RA           RE          4
RB           RA          2
RB           RC          1
RB           RE          10
RC           RB          5
RC           RF          2
RD           RA          4
RD           RE          3
RD           RG          5
RE           RA          5
RE           RB          2
RE           RD          3
RE           RF          2
RE           RG          1
RE           RH          8
RF           RC          2
RF           RE          2
RF           RH          4
RG           RD          5
RG           RE          1
RH           RE          8
RH           RF          6
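The database of Table 4.2 can be held as a list of (Router ID, Neighbor, Cost) triples. The sketch below loads the table and lists the links whose cost differs by direction; the variable names are hypothetical, and the representation is just one convenient choice.

```python
# Table 4.2 as (router_id, neighbor_id, cost) triples.
LINK_STATE_DB = [
    ("RA", "RB", 2), ("RA", "RD", 4), ("RA", "RE", 4),
    ("RB", "RA", 2), ("RB", "RC", 1), ("RB", "RE", 10),
    ("RC", "RB", 5), ("RC", "RF", 2),
    ("RD", "RA", 4), ("RD", "RE", 3), ("RD", "RG", 5),
    ("RE", "RA", 5), ("RE", "RB", 2), ("RE", "RD", 3),
    ("RE", "RF", 2), ("RE", "RG", 1), ("RE", "RH", 8),
    ("RF", "RC", 2), ("RF", "RE", 2), ("RF", "RH", 4),
    ("RG", "RD", 5), ("RG", "RE", 1),
    ("RH", "RE", 8), ("RH", "RF", 6),
]

# Index costs by direction, then compare each link with its reverse.
cost = {(router, neighbor): c for router, neighbor, c in LINK_STATE_DB}
asymmetric = sorted({tuple(sorted(pair)) for pair, c in cost.items()
                     if cost.get((pair[1], pair[0])) != c})

print(asymmetric)
# [('RA', 'RE'), ('RB', 'RC'), ('RB', 'RE'), ('RF', 'RH')]
```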


The SPF Algorithm

It is unfortunate that Dijkstra's algorithm is so commonly referred to in the routing world as the shortest path first algorithm. After all, the objective of every routing protocol is to calculate shortest paths. It is also unfortunate that Dijkstra's algorithm is often made to appear more esoteric than it really is; many writers just can't resist putting it in set theory notation. The clearest description of the algorithm comes from E. W. Dijkstra's original paper. Here it is in his own words, followed by a "translation" for the link state routing protocol:

Construct [a] tree of minimum total length between the n nodes. (The tree is a graph with one and only one path between every two nodes.)

In the course of the construction that we present here, the branches are divided into three sets:

  I. the branches definitely assigned to the tree under construction (they will be in a subtree);
  II. the branches from which the next branch to be added to set I, will be selected;
  III. the remaining branches (rejected or not considered).

The nodes are divided into two sets:

  A. the nodes connected by the branches of set I,
  B. the remaining nodes (one and only one branch of set II will lead to each of these nodes).

We start the construction by choosing an arbitrary node as the only member of set A, and by placing all branches that end in this node in set II. To start with, set I is empty. From then onwards we perform the following two steps repeatedly.

Step 1: The shortest branch of set II is removed from this set and added to set I. As a result, one node is transferred from set B to set A.

Step 2: Consider the branches leading from the node, that has just been transferred to set A, to the nodes that are still in set B. If the branch under consideration is longer than the corresponding branch in set II, it is rejected; if it is shorter, it replaces the corresponding branch in set II, and the latter is rejected.

We then return to step 1 and repeat the process until sets II and B are empty. The branches in set I form the tree required.12

Adapting the algorithm for routers, first note that Dijkstra describes three sets of branches: I, II, and III. In the router, three databases represent the three sets:

  • The Tree Database. This database represents set I. Links (branches) are added to the shortest path tree by adding them here. When the algorithm is finished, this database will describe the shortest path tree.

  • The Candidate Database. This database corresponds to set II. Links are copied from the link state database to this list in a prescribed order, where they become candidates to be added to the tree.

  • The Link State Database. The repository of all links, as has been previously described. This topological database corresponds to set III.

Dijkstra also specifies two sets of nodes, set A and set B. Here the nodes are routers. Specifically, they are the routers represented by Neighbor ID in the Router Links triples (Router ID, Neighbor ID, Cost). Set A comprises the routers connected by the links in the Tree database. Set B is all other routers. Since the whole point is to find a shortest path to every router, set B should be empty when the algorithm is finished.

Here's a version of Dijkstra's algorithm adapted for routers:

  1. A router initializes the Tree database by adding itself as the root. This entry shows the router as its own neighbor, with a cost of 0.

  2. All triples in the link state database describing links to the root router's neighbors are added to the Candidate database.

  3. The cost from the root to each link in the Candidate database is calculated. The link in the Candidate database with the lowest cost is moved to the Tree database. If two or more links are an equally low cost from the root, choose one.

  4. The Neighbor ID of the link just added to the Tree database is examined. With the exception of any triples whose Neighbor ID is already in the Tree database, triples in the link state database describing that router's neighbors are added to the Candidate database.

  5. If entries remain in the Candidate database, return to step 3. If the Candidate database is empty, then terminate the algorithm. At termination, a single Neighbor ID entry in the Tree database should represent every router, and the shortest path tree is complete.
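The five steps can be sketched in code using the triples of Table 4.2. This is an illustrative sketch rather than any router's implementation: the function and variable names are hypothetical, and instead of pruning higher-cost duplicates from the Candidate database as soon as they appear, it leaves them in place and skips them when they are eventually selected, which yields the same tree.

```python
LINK_STATE_DB = [
    ("RA", "RB", 2), ("RA", "RD", 4), ("RA", "RE", 4),
    ("RB", "RA", 2), ("RB", "RC", 1), ("RB", "RE", 10),
    ("RC", "RB", 5), ("RC", "RF", 2),
    ("RD", "RA", 4), ("RD", "RE", 3), ("RD", "RG", 5),
    ("RE", "RA", 5), ("RE", "RB", 2), ("RE", "RD", 3),
    ("RE", "RF", 2), ("RE", "RG", 1), ("RE", "RH", 8),
    ("RF", "RC", 2), ("RF", "RE", 2), ("RF", "RH", 4),
    ("RG", "RD", 5), ("RG", "RE", 1),
    ("RH", "RE", 8), ("RH", "RF", 6),
]


def shortest_path_tree(link_state_db, root):
    tree = [(root, root, 0)]     # step 1: the root is its own neighbor at cost 0
    cost_to = {root: 0}          # cost from the root to every router on the tree
    # Step 2: the root's own links become the first candidates.
    candidates = [t for t in link_state_db if t[0] == root]
    while candidates:            # step 5: repeat until no candidates remain
        # Step 3: pick the candidate with the lowest total cost from the root.
        best = min(candidates, key=lambda t: cost_to[t[0]] + t[2])
        candidates.remove(best)
        router, neighbor, cost = best
        if neighbor in cost_to:
            continue             # a cheaper link already reached this router
        tree.append(best)
        cost_to[neighbor] = cost_to[router] + cost
        # Step 4: add the new router's links, skipping routers already on the tree.
        candidates += [t for t in link_state_db
                       if t[0] == neighbor and t[1] not in cost_to]
    return tree, cost_to


tree, cost_to = shortest_path_tree(LINK_STATE_DB, "RA")
print(tree)
print(cost_to)
# {'RA': 0, 'RB': 2, 'RC': 3, 'RD': 4, 'RE': 4, 'RF': 5, 'RG': 5, 'RH': 9}
```

Ties are broken by position in the candidate list, which is one arbitrary way of satisfying "choose one" in step 3.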

Table 4.3 summarizes the process and results of applying Dijkstra's algorithm to build a shortest path tree for the network in Figure 4.11. Router RA from Figure 4.11 is running the algorithm, using the link state database of Table 4.2. Figure 4.12 shows the shortest path tree constructed for router RA by this algorithm. After each router calculates its own tree, it can examine the other routers' network link information and add the stub networks to the tree fairly easily. From this information, entries may be made into the routing table.

Table 4.3 Dijkstra's algorithm applied to the database of Table 4.2.

Candidate: (empty)
Cost to Root: (empty)
Tree: (RA,RA,0)
Description: Router A adds itself to the tree as root.

Candidate: (RA,RB,2) (RA,RD,4) (RA,RE,4)
Cost to Root: 2, 4, 4
Tree: (RA,RA,0)
Description: The links to all of RA's neighbors are added to the candidate list.

Candidate: (RA,RD,4) (RA,RE,4) (RB,RC,1) (RB,RE,10)
Cost to Root: 4, 4, 3, 12
Tree: (RA,RA,0) (RA,RB,2)
Description: (RA,RB,2) is the lowest-cost link on the candidate list, so it is added to the tree. All of RB's neighbors except those already in the tree are added to the candidate list. (RA,RE,4) is a lower-cost link to RE than (RB,RE,10), so the latter is dropped from the candidate list.

Candidate: (RA,RD,4) (RA,RE,4) (RC,RF,2)
Cost to Root: 4, 4, 5
Tree: (RA,RA,0) (RA,RB,2) (RB,RC,1)
Description: (RB,RC,1) is the lowest-cost link on the candidate list, so it is added to the tree. All of RC's neighbors except those already on the tree become candidates.

Candidate: (RA,RE,4) (RC,RF,2) (RD,RE,3) (RD,RG,5)
Cost to Root: 4, 5, 7, 9
Tree: (RA,RA,0) (RA,RB,2) (RB,RC,1) (RA,RD,4)
Description: (RA,RD,4) and (RA,RE,4) are both a cost of 4 from RA; (RC,RF,2) is a cost of 5. (RA,RD,4) is added to the tree and its neighbors become candidates. Two paths to RE are on the candidate list; (RD,RE,3) is a higher cost from RA and is dropped.

Candidate: (RC,RF,2) (RD,RG,5) (RE,RF,2) (RE,RG,1) (RE,RH,8)
Cost to Root: 5, 9, 6, 5, 12
Tree: (RA,RA,0) (RA,RB,2) (RB,RC,1) (RA,RD,4) (RA,RE,4)
Description: (RA,RE,4) is added to the tree. All of RE's neighbors not already on the tree are added to the candidate list. The higher-cost link to RG is dropped.

Candidate: (RE,RF,2) (RE,RG,1) (RE,RH,8) (RF,RH,4)
Cost to Root: 6, 5, 12, 9
Tree: (RA,RA,0) (RA,RB,2) (RB,RC,1) (RA,RD,4) (RA,RE,4) (RC,RF,2)
Description: (RC,RF,2) is added to the tree, and its neighbors are added to the candidate list. (RE,RG,1) could have been selected instead because it has the same cost (5) from RA. The higher-cost path to RH is dropped.

Candidate: (RF,RH,4)
Cost to Root: 9
Tree: (RA,RA,0) (RA,RB,2) (RB,RC,1) (RA,RD,4) (RA,RE,4) (RC,RF,2) (RE,RG,1)
Description: (RE,RG,1) is added to the tree. RG has no neighbors that are not already on the tree, so nothing is added to the candidate list.

Candidate: (empty)
Cost to Root: (empty)
Tree: (RA,RA,0) (RA,RB,2) (RB,RC,1) (RA,RD,4) (RA,RE,4) (RC,RF,2) (RE,RG,1) (RF,RH,4)
Description: (RF,RH,4) is the lowest-cost link on the candidate list, so it is added to the tree. No candidates remain on the list, so the algorithm is terminated. The shortest path tree is complete.


Figure 4.12 The shortest path tree derived by the algorithm in Table 4.3.
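Once the tree is complete, a next hop and total cost for each router can be read off by walking each branch back toward the root. The sketch below starts from the finished tree of Table 4.3; the function name route_table is hypothetical, and stub networks are left out, as they are in Figure 4.11.

```python
# Final Tree column of Table 4.3 as (router, neighbor, link cost) triples.
TREE = [
    ("RA", "RA", 0), ("RA", "RB", 2), ("RB", "RC", 1), ("RA", "RD", 4),
    ("RA", "RE", 4), ("RC", "RF", 2), ("RE", "RG", 1), ("RF", "RH", 4),
]


def route_table(tree, root):
    """Map each destination router to (next hop from root, total cost)."""
    parent = {child: (par, cost) for par, child, cost in tree if child != root}
    routes = {}
    for destination in parent:
        hop, total = destination, 0
        while True:
            par, cost = parent[hop]
            total += cost
            if par == root:
                break            # hop is now the root's directly attached neighbor
            hop = par
        routes[destination] = (hop, total)
    return routes


for destination, (next_hop, total) in route_table(TREE, "RA").items():
    print(destination, "via", next_hop, "cost", total)
# RB via RB cost 2
# RC via RB cost 3
# RD via RD cost 4
# RE via RE cost 4
# RF via RB cost 5
# RG via RE cost 5
# RH via RB cost 9
```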

Areas

An area is a subset of the routers that make up an internetwork. Dividing an internetwork into areas is a response to three concerns commonly expressed about link state protocols:

  • The necessary databases require more memory than a distance vector protocol requires.

  • The complex algorithm requires more CPU time than a distance vector protocol requires.

  • The flooding of link state packets adversely affects available bandwidth, particularly in unstable internetworks.

Modern link state protocols and the routers that run them are designed to reduce these effects, but cannot eliminate them. The last section examined what the link state database might look like, and how an SPF algorithm might work, for a small eight-router internetwork. Remember that the stub networks that would be connected to those eight routers and that would form the leaves of the SPF tree were not even taken into consideration. Now imagine an 8000-router internetwork, and you can understand the concern about the impact on memory, CPU, and bandwidth.

This impact can be greatly reduced by the use of areas, as in Figure 4.13. When an internetwork is subdivided into areas, the routers within an area need to flood LSAs only within that area and therefore need to maintain a link state database only for that area. The smaller database means less required memory in each router and fewer CPU cycles to run the SPF algorithm on that database. If frequent topology changes occur, the resulting flooding will be confined to the area of the instability.

Figure 4.13 The use of areas reduces link state's demand for system resources.
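The effect of areas on flooding scope can be sketched with a simple membership map. Everything in this example is hypothetical: the router names, the area numbers, and the function flooding_scope are invented for illustration and are not taken from Figure 4.13. A router listed in two areas stands in for a router that connects them.

```python
# Hypothetical assignment of routers to areas.
AREAS = {
    "R1": {1}, "R2": {1},
    "R3": {0, 1},          # connects area 1 to area 0
    "R4": {0},
    "R5": {0, 2},          # connects area 0 to area 2
    "R6": {2},
}


def flooding_scope(origin_area, areas):
    """Routers that receive (and must store) an LSA flooded within one area."""
    return sorted(router for router, member_of in areas.items()
                  if origin_area in member_of)


print(flooding_scope(1, AREAS))   # ['R1', 'R2', 'R3']
print(flooding_scope(2, AREAS))   # ['R5', 'R6']
```

A topology change inside area 1 therefore reaches three routers instead of six, and each of those routers stores and runs SPF over a correspondingly smaller database.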

The routers connecting two areas (Area Border Routers, in OSPF

Distance vector protocols, such as RIP and IGRP, do not use areas. Given that these protocols have no recourse but to see a large internetwork as a single entity, must calculate a route to every network, and must broadcast the resulting huge route table every 30 or 90 seconds, it becomes clear that link state protocols utilizing areas can actually save system resources.
