Switches, Routers, Bridges and LANs/Print version
The need for a layered approach
We place very high expectations on computer networks: we expect to be able to communicate with anywhere in the world with minimal delay, and with high reliability. But the hardware that provides this communication sometimes fails, or needs to be upgraded. Nor is the internet a static system: new networks are connected every day, and some old networks stop working temporarily or permanently. The systems that are used for computer networking need to detect and work around errors, and do so automatically without waiting for human intervention. Errors can occur on any level, from tiny amounts of electrical interference or cosmic rays that might alter a single bit of information on a wire, to an entire trans-Atlantic network trunk cable being cut by a ship's anchor.
Writing a single protocol that can deal with all these eventualities is achievable in theory, but in practice the need to deal with all the possible sources of error would quickly make it unmanageably complex. In addition, there's a political problem with deploying one single monolithic protocol: everyone would have to use the same protocol, and everyone would have to adopt it at the same time.
The solution that developed is a common one in software engineering: the use of abstraction. Networking is implemented in terms of a series of layers, each of which only solves a small part of the whole problem of networked communications. However, each layer can rely on the services provided by the layers below it, so that the entire stack working together can solve a problem that no single layer solves. In addition, the layered model means that different parts of the network can solve the lower-layer problem in different ways, provided that they all implement the same interface that the higher-level layers rely on.
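The idea that each layer wraps the data handed down by the layer above, and relies on the layer below to deliver it, can be sketched in a few lines of Python. This is only an illustration: the function names and the bracketed text headers are invented for the example and do not correspond to any real protocol format.

```python
def send(data, layers):
    """Wrap the payload in one header per layer, highest layer first."""
    for name in layers:
        data = f"[{name}]".encode() + data
    return data


def receive(frame, layers):
    """Strip the headers again, lowest layer first, and return the payload."""
    for name in reversed(layers):
        header = f"[{name}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"expected a {name} header")
        frame = frame[len(header):]
    return frame


# Hypothetical three-layer stack, listed from the highest layer down.
STACK = ["transport", "network", "link"]
frame = send(b"hello", STACK)
print(frame)                  # b'[link][network][transport]hello'
print(receive(frame, STACK))  # b'hello'
```

Each layer only looks at its own header, so the link-layer wrapper could be swapped for a different one without the layers above noticing.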
The OSI model
An early attempt to standardize the layers of networking was made in 1978 by the Open Systems Interconnection (OSI) project by the International Organization for Standardization (ISO). The product of this group described 7 network layers and specified a suite of protocols to operate at each of these layers. Though the protocols specified didn't catch on and were superseded by TCP/IP, the concepts of the 7 layers stuck, and are still used to this day to describe networking protocols. There is no strict need for protocols to comply with the OSI 7-layer model, and indeed many protocols blur the boundaries or collapse the functions of several layers into one protocol, but the layer numbers are still useful for informal communication.
Layers in the OSI model
Layer 1: the physical layer
The lowest layer of the stack deals with the physical details of sending signals from one place to another. Typically this information is carried encoded in electrical signals or laser light, but in principle any means of communication could be used. For example, if you had a way of encoding binary data into sound transmitted from a speaker to a microphone (with a second speaker and microphone to send data back the other way) then you could use the rest of the standard protocols on top of this physical layer without having to change them.
The physical layer needn't provide completely reliable transmission: the upper layers are responsible for detecting errors and resending data if necessary. The physical layer merely provides some way of propagating data from one node to another.
Among other things, the specification of a physical layer protocol will need to specify the voltage of an electric signal or frequency and power of a laser, size and shape of connectors, modulation of the signal, and the way that multiple nodes share the same link.
Layer 2: The Data Link Layer
The data link layer is responsible for providing the means to transfer data from one node to the other, if possible detecting or correcting errors in the physical layer. Layer 2 introduces the concept of unique addresses that identify the nodes that are communicating. Unlike layer 3, data link layer addresses use a flat structure, i.e. the structure of the address doesn't yield any information about the relative location of nodes or the route that traffic should take between them.
The most familiar layer 2 protocol is Ethernet, although the Ethernet standard also specifies details of the physical layer.
Layer 3: The Network Layer
The network layer builds on the lower layers to support routing data across interconnected networks (rather than within a single network). Addressing at the network layer takes advantage of a hierarchical structure so that it's possible to summarise the route to thousands or millions of hosts as a single piece of information. Typically, nodes that are on the same network as each other will share a common prefix on their layer 3 address. Layer 3 communication doesn't contain any concept of a continuing connection: each packet of data sent between a pair of communicating hosts is treated separately, with no knowledge of the packets that went before it. Layer 3 protocols may be able to detect errors in a packet that have been introduced at the physical or data link layer (IP, for example, carries a checksum over its header), but do not guarantee that no packet will be lost.
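As a rough illustration of how one prefix can stand for a very large number of hosts, the sketch below uses Python's standard `ipaddress` module. The addresses are arbitrary examples from private ranges, and `covered_by` is a helper name made up for this sketch.

```python
import ipaddress

# One prefix stands for every host address inside it.
network = ipaddress.ip_network("10.0.0.0/8")
host_inside = ipaddress.ip_address("10.1.2.3")
host_outside = ipaddress.ip_address("192.168.0.7")


def covered_by(prefix, address):
    """Return True if the address shares the network's prefix."""
    return address in prefix


print(network.num_addresses)              # 16777216
print(covered_by(network, host_inside))   # True
print(covered_by(network, host_outside))  # False
```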
The dominant layer 3 protocol is IP.
Layer 4: The Transport Layer
The transport layer allows communicating hosts to establish an ongoing connection between them. Layer 4 protocols may detect missing packets and compensate by retransmitting them, but not all protocols do so: most obviously, TCP does provide reliable transmission but UDP doesn't. Providing reliable transmission incurs an overhead, and some data is obsolete once it has been even slightly delayed (e.g. data for a live phone call or video conference) so there are cases where unreliable transmission is desirable.
Layer 5: The Session Layer
The OSI model allowed for a fifth layer that provides the mechanism for creating, maintaining and destroying a semi-permanent session between end-user applications. For example, it might make it possible to checkpoint and restore communication sessions, or bring several streams from different sources into sync. In practice, although there are protocols that provide features of this type, layer 5 is rarely referred to as a general concept.
Layer 6: The Presentation Layer
Layer 6 is the layer at which data structures that have meaning to the application are mapped into a stream of bytes, the details of which need not concern the lower layers. In theory, this relieves the application layer of having to worry about the differences between one computer platform and another, e.g. a computer that uses ASCII to encode its text files communicating with one that uses EBCDIC. In practice, protocols rarely bother to differentiate this layer from the highest layer (the application layer), treating the two combined as one layer.
Layer 7: The Application Layer
The application layer covers the protocols that describe application-specific details of communication. FTP, HTTP and SMTP are all application-layer protocols.
Switching at different layers
The distinction between the aspects of communication that are constrained by each protocol layer may at first seem unimportant or arbitrary. This is true in the case of the most trivial network, consisting of just two computers with a single cable connecting them. However, as the number of nodes on the network increases, it gets more and more useful to clearly distinguish the responsibilities of each layer.
Two computers can share a single physical cable between them, but if we want to add a third computer to this micro-network, how do we connect it? Do we attempt to share the same physical cable by cutting it and splicing in a branch to the cable? This might work in theory, but would be inflexible in practice: apart from the time taken to cut and splice the cable, it would be very hard to add a node to the network without disrupting the network for the existing network users. A more maintainable solution is to plug each computer into a common hub. The hub will have network sockets on it that our network cables can plug into, and the circuitry within the hub will ensure that as soon as a cable is plugged in it will be able to send and receive signals with any other cables already connected.
The hub is a purely physical layer device. It doesn't know the meaning of any of the signals it transmits, nor does it make any decisions about which signals should go where or whether data is corrupt. Every signal on every cable is copied to all the other cables.
An alternative to using a physical-layer hub is to use a layer 2 switch to connect the hosts. Unlike a hub, a switch attempts to process the data it receives so as to understand something about the packets being transferred. A switch will only parse the layer 2 content of a packet, treating all the higher layer data as a blob of data that can be transferred without understanding it. The advantage of parsing the layer 2 wrapper is that this contains the source and destination addresses of the packet. If the switch knows which direction to send the packet (based on the destination address and its knowledge of the network) it can send the packet to only one link on the network, which saves bandwidth. If the switch doesn't know where to send the packet, it sends it to all interfaces other than the one on which it received the packet: this is called flooding.
Using switches in place of hubs saves bandwidth (by only sending packets on links that need to receive them), the only disadvantage being that switches are more complex devices and may cost more or require additional configuration. In practice, simple switches don't require any configuration and have long since become as cheap as hubs (or even cheaper, now that there is little demand for physical layer hubs).
This discussion has glossed over the detail of how exactly the switch knows where to send a particular packet. When a switch is first connected to the network, it doesn't know anything about the network. Without any knowledge, it has to flood every packet it receives onto every port, behaving in effectively the same way as a hub would. However, the switch can learn from each packet it sees, and this means it can make better decisions. For example, if a switch receives a packet with source address A and destination B on interface 2, it can conclude that source address A is reachable via interface 2. Next time it receives a packet with destination address A (whether or not it comes from address B) it can forward it straight to interface 2, saving bandwidth on the other links.
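The learning behaviour described above can be sketched as a small Python class. It is a simplified model rather than a description of any real switch: the class and method names are invented for the example, addresses are plain strings, and table entries never expire.

```python
class LearningSwitch:
    """Toy model of a layer 2 switch that learns from source addresses."""

    def __init__(self, num_ports):
        self.ports = list(range(1, num_ports + 1))
        self.table = {}  # layer 2 address -> port it was last seen on

    def receive(self, src, dst, in_port):
        """Return the list of ports the packet is sent out of."""
        # Learn: the source address is reachable via the arrival port.
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown destination: flood to every port except the arrival port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # The destination is on the link the packet came from.
            return []
        return [out_port]


switch = LearningSwitch(num_ports=4)
print(switch.receive("A", "B", in_port=2))  # [1, 3, 4]  (B unknown: flood)
print(switch.receive("B", "A", in_port=3))  # [2]        (A was learned)
```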
Inside every switch there is therefore a table mapping layer 2 addresses to port numbers. The size of this table is limited both by the available memory and by the speed at which addresses must be looked up (packets must be looked up very quickly in order to be forwarded on without too much delay). The specialist hardware that enables this fast lookup is very expensive, so in practice if you want your switch to have more layer 2 address capacity you have to be prepared to pay extra for it. Home network switches might be able to store a few thousand addresses, while high-end data center switches can cope with tens or hundreds of thousands.
No matter how much you spend on your switch, it's never going to be able to cope with an entry in its lookup table for every one of the billions of hosts on the internet. The only sensible way to deal with this is to break down the network into sections, and store information about it a section at a time. This is the way that routing (which takes place at layer 3) differs from switching (taking place at layer 2).
“A device used to connect two separate Ethernet networks into one extended Ethernet. Bridges only forward packets between networks that are destined for the other network. Term used by Novell to denote a computer that accepts packets at the network layer and forward them to another network.”
Why Use Bridges?
Bridges are important in some networks because those networks are geographically divided into many parts, and something is required to join the parts into a whole. Take LANs as an example: if there were no means of joining LANs together, an enterprise could be limited in its growth potential. The bridge is one of the tools for joining LANs.
Second, a LAN (for example Ethernet) can be limited in the distance it covers. Bridges remove this limitation, so that a network can be extended throughout a building or campus. Geographically separated networks can likewise be connected using bridges.
Third, the network administrator can use bridges to control the amount of traffic sent across expensive network media.
Fourth, the bridge is a plug-and-play device, so there is no need to configure it. If a machine is removed from the network, the network administrator does not need to update any information, because bridges are self-configuring.
The MAC Bridge
Bridges are used to connect LANs, so they use the destination MAC address to determine how to transmit traffic between LANs. Bridges push network-layer functions such as route discovery and forwarding down to the data link layer. A bridge has no conventional network layer.
Bridges cannot guarantee data integrity. For example, if a frame contains an error and is not transmitted properly, the bridge will not send any acknowledgement asking for that frame to be retransmitted. If the bridge becomes congested, frames may be discarded to relieve the congestion. On the other hand, bridges are easy to implement and need no configuration.
Types of Bridges
- Transparent basic bridge
- Source routing bridge
- Transparent learning bridge
- Transparent spanning tree bridge
The Transparent Basic Bridge
The simplest type of bridge is the transparent basic bridge. It stores the traffic until it can transmit it to the next network. The amount of time the data is stored is very brief. Traffic is sent to all ports except the port from which the bridge received the data. No conversion of traffic is performed by a bridge. In this regard, the bridge is similar to a repeater.
Source Routing Bridge
In this scheme the route through the LAN internetwork is determined by the source (originator) of the traffic, hence the name source routing bridge. The routing information field (RIF) in the LAN frame header contains the route that the frame is to follow through the network.
The routing information must identify the intermediate nodes that are required to receive and send the frame. For this reason, source routing requires user traffic to follow the path given in the routing information field.
Source routing frames differ from other bridge frames because the source routing information must be carried within the frame. The architecture of source routing bridges is otherwise similar to that of other bridges: both use a MAC relay entity at the LAN node, and interfaces are provided through the MAC relay entity and LLC.
The figure shown above shows the functional architecture of source routing bridges. Two primitives are invoked between the MAC entities and the LLC: the first is M_UNITDATA.request, and the second is M_UNITDATA.indication.
The parameters of these primitives must provide the information needed to create the frame (frame control), the MAC addresses, and the routing information used to forward the traffic through the LAN. The frame check sequence value should be included if frame check sequence operations are to be performed. The primitives also contain a user priority parameter, a data parameter, and a service class parameter. The user priority and service class parameters are used only with token ring and not with other LANs such as Ethernet or token bus.
The Transparent Learning Bridge
The transparent learning bridge finds the location of stations by examining the source and destination addresses of the frames it receives. When a frame arrives, the bridge checks both addresses. If the destination address is not found in the routing table, the frame is sent to all LANs except the one from which it came. The source address is stored in the routing table together with the port on which the frame arrived. If another frame later arrives whose destination address is that previously seen source address, it is forwarded only to that port.
Transparent bridges cannot tolerate loops in the physical topology of the network; this is the main restriction of the transparent learning bridge. The bridge is controlled by the bridge processor, which is responsible for routing traffic across its ports. The processor determines the destination port associated with a MAC address by consulting a routing database. When a frame arrives, the processor looks up in the database the output port on which the frame should be relayed. If the destination address is not in the database, the processor broadcasts the frame onto all ports except the port on which the frame arrived. The bridge processor also stores the source address of the frame, because this source address may be the destination address of another incoming frame.
The bridge processes an incoming frame as shown in the figure above. The bridge checks the source and destination addresses of the incoming frame; in this case the source address is ‘B’ and the destination address is ‘A’. On consulting its routing table, it finds that the destination address is not present in the database, so it broadcasts the frame to all outgoing ports except the port on which it arrived. After forwarding the frame, the bridge checks whether it already knows the SA (source address). If the SA is present in the database, the bridge refreshes the timer on that entry, which marks the address as still ‘timely and valid’. In this example the bridge does not yet know the SA ‘B’, so it records in the database that ‘B’ is an active station on the LAN and that, from the viewpoint of the bridge, ‘B’ can be found on port 1.
As shown in the figure, a frame arrives at the bridge on port 3 with DA (destination address) ‘B’ and SA (source address) ‘C’. The bridge consults its database to check whether the destination address is present, and determines that ‘B’ can be reached through port 1. It knows this from the previous operation, in which a frame arrived on port 1 with SA = ‘B’ and DA = ‘A’: the source address of that frame is the destination address of this one, so the bridge knows the exact destination port. This time the bridge does not broadcast the frame, because it knows the DA. It forwards the frame to port 1 and not to port 2. The bridge also stores in its routing table that the SA ‘C’ can be reached through port 3, which will be useful for later frames. The learning bridge assumes that a frame received on a port has already been properly delivered by the other bridges and LANs on that side, so it never forwards a frame back to the port on which it arrived. The learning bridge is based entirely on trust.
In some situations the bridge will not forward the frame to any port. This total filtering is shown in figure 5. A frame arrives at the bridge on port 1 with SA = ‘D’ and DA = ‘B’. The bridge consults its database and finds that DA = ‘B’ is reached through port 1. Because the frame arrived on that same port, the bridge does not forward it to port 2 or port 3, nor does it send it back out of port 1. It also checks whether the SA is present in the database. SA = ‘D’ is not present, so the bridge adds an entry for it and attaches a timer to the entry.
Multicasting and broadcasting are permitted in a learning bridge. As shown in the figure, a frame arrives on port 1 with DA = ‘ALL’ (all 1s in the destination address field) and SA = ‘D’. The bridge does not add a new entry to its routing table, because ‘D’ is already present in the database. In this example the traffic is sent to ports 2 and 3.
Figure 6 shows examples of how bridges forward and filter incoming frames. A frame transmitted from station ‘A’ to station ‘B’ is not forwarded by bridge 1, because the bridge assumes that the traffic was successfully delivered from ‘A’ to ‘B’ by broadcast on the LAN they share. Traffic sent by station ‘A’ to station ‘C’ must be forwarded by bridge 1 and discarded by bridge 2. Traffic sent by station ‘A’ to station ‘D’ must be forwarded by both bridges.
The following flow chart shows the overall logic of the learning bridge. When the bridge receives an incoming frame on port Z, it looks for the destination MAC address in the database. If the DA is not present, the bridge broadcasts the frame to all ports except the arrival port. If the DA is present and its port is the port on which the frame arrived, the bridge discards the frame; if its port is different, the bridge forwards the frame to that port. The bridge then checks whether the SA is present in the database. If it is, the bridge refreshes the timer; otherwise it adds the SA to the database.
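The forwarding, filtering and timer logic described above can be sketched in Python as follows. The class and method names are invented for the example, the time is passed in explicitly so that the sketch stays deterministic, and the 300-second aging time is an assumed value rather than one taken from the text.

```python
BROADCAST = "ALL"  # stands in for a destination address of all 1s


class LearningBridge:
    """Toy learning bridge with filtering and entry aging."""

    def __init__(self, ports, max_age=300.0):
        self.ports = list(ports)
        self.max_age = max_age  # seconds an entry stays valid (assumed value)
        self.database = {}      # MAC address -> (port, time last seen)

    def handle(self, sa, da, in_port, now):
        """Return the ports a frame is relayed to, then learn its SA."""
        entry = self.database.get(da)
        if entry is not None and now - entry[1] > self.max_age:
            del self.database[da]  # the entry is no longer timely and valid
            entry = None
        if da == BROADCAST or entry is None:
            # Broadcast or unknown DA: send everywhere except the arrival port.
            out_ports = [p for p in self.ports if p != in_port]
        elif entry[0] == in_port:
            out_ports = []          # DA is on the arrival port: filter
        else:
            out_ports = [entry[0]]  # DA is known: forward to one port
        # Add the SA to the database, or refresh its timer if already present.
        self.database[sa] = (in_port, now)
        return out_ports


bridge = LearningBridge(ports=[1, 2, 3])
print(bridge.handle("B", "A", in_port=1, now=0.0))    # [2, 3]  A unknown
print(bridge.handle("C", "B", in_port=3, now=1.0))    # [1]     B learned
print(bridge.handle("D", "B", in_port=1, now=2.0))    # []      total filtering
print(bridge.handle("D", "ALL", in_port=1, now=3.0))  # [2, 3]  broadcast
```

The four calls replay the frames used in the walkthrough above, in the same order.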
The Transparent Spanning Tree Bridge
The last type of bridge is the transparent spanning tree bridge. These bridges use a subset of the full topology to achieve loop-free operation. The following table shows the logic of the transparent spanning tree bridge. A received frame is checked in the following manner. The destination address of the frame is looked up in the routing table in the database. More information is needed here than for a learning bridge, so the state of each bridge port is also stored in the database. This port state information helps decide whether a port can be used for a given destination address. A port can be in the blocking state, to meet the requirements of spanning tree operation, or in the forwarding state. If the port is in the forwarding state, the frame is routed across the port. A port can also have other states: it may be “disabled” for maintenance reasons, or temporarily unavailable while the databases in the bridge are being changed as a result of a change in the network.
The Configuration Message: The following figure shows the format of the configuration message, which is also called a bridge protocol data unit (BPDU). The following parameters are set to zero:
- The protocol identifier,
- Version identifier,
- Message type.
|Octets||Field|
|1||Type of BPDU|
|4||Cost to root|
The root identifier field contains the identity of the root bridge, including a 2-octet field used to decide the priority of the root bridge and the designated bridge. The root path cost field indicates the total cost from the transmitting bridge to the bridge listed in the root identifier field. The flags field contains a topology change flag, which is used to inform non-root bridges that they should update the station entries in their caches. It also contains a topology change acknowledgment bit, which tells a bridge that it does not have to inform its parent bridge that a topology change has occurred; the parent bridge will perform this task. The bridge and port identifier fields indicate the priority and ID of the bridge that is sending the configuration message. The message age field gives a time in units of 1/256th of a second. The hello time field, also in units of 1/256th of a second, defines the time between configuration messages sent by the root bridge. The forward delay field, again in units of 1/256th of a second, is the time for which a port should stay in an intermediate state such as listening or learning while moving from the blocking state to the forwarding state.
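To make the field descriptions concrete, here is a minimal Python sketch that unpacks a configuration message with the standard `struct` module. It assumes the 35-octet layout of the IEEE 802.1D configuration BPDU (including a max age field that the text above does not mention); the function names, dictionary keys and sample values are invented for the example.

```python
import struct

# Assumed layout (IEEE 802.1D configuration BPDU, network byte order):
# protocol id (2), version (1), type (1), flags (1), root id (8),
# root path cost (4), bridge id (8), port id (2), message age (2),
# max age (2), hello time (2), forward delay (2).
BPDU_FORMAT = "!HBBB8sI8sHHHHH"
TICKS_PER_SECOND = 256  # timer fields count 1/256ths of a second


def split_bridge_id(raw):
    """Split an 8-octet identifier into 2-octet priority and 6-octet MAC."""
    priority = int.from_bytes(raw[:2], "big")
    mac = ":".join(f"{octet:02x}" for octet in raw[2:])
    return priority, mac


def parse_config_bpdu(data):
    """Decode the fixed fields of a configuration BPDU into a dictionary."""
    size = struct.calcsize(BPDU_FORMAT)
    (protocol_id, version, bpdu_type, flags, root_id, root_path_cost,
     bridge_id, port_id, message_age, max_age, hello_time,
     forward_delay) = struct.unpack(BPDU_FORMAT, data[:size])
    return {
        "protocol_id": protocol_id,
        "version": version,
        "type": bpdu_type,
        "topology_change": bool(flags & 0x01),      # lowest flag bit
        "topology_change_ack": bool(flags & 0x80),  # highest flag bit
        "root": split_bridge_id(root_id),
        "root_path_cost": root_path_cost,
        "bridge": split_bridge_id(bridge_id),
        "port_id": port_id,
        "message_age_s": message_age / TICKS_PER_SECOND,
        "max_age_s": max_age / TICKS_PER_SECOND,
        "hello_time_s": hello_time / TICKS_PER_SECOND,
        "forward_delay_s": forward_delay / TICKS_PER_SECOND,
    }


# A made-up message for illustration only.
sample = struct.pack(
    BPDU_FORMAT, 0, 0, 0, 0x01,
    bytes.fromhex("8000001122334455"), 19,
    bytes.fromhex("800000aabbccddee"), 0x8001,
    256, 5120, 512, 3840,
)
parsed = parse_config_bpdu(sample)
print(parsed["root"], parsed["hello_time_s"], parsed["forward_delay_s"])
```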
Spanning tree Algorithm
Some sites use two or more bridges in parallel between a pair of LANs to increase the reliability of the network, as shown in figure 7. This arrangement introduces a looping problem in the network.
The figure above shows how a frame F with an unknown destination is handled. Neither bridge knows the destination address, so each of the two bridges broadcasts the frame, which in this example means copying it to LAN 2. Bridge 1 then sees frame F2, a frame with an unknown destination address, and copies it to LAN 1, generating another frame F3 (not shown in the figure). In the same way, bridge 2 copies frame F1 to LAN 1, generating F4 (also not shown). Bridge 1 now forwards frame F4 and bridge 2 copies frame F3, and the cycle goes on indefinitely. This is the looping problem.
The solution is for the bridges to communicate with each other and overlay the actual topology with a spanning tree that reaches every LAN in the network. Some bridge connections are left out of the spanning tree in order to construct a loop-free topology. For example, in the figure shown below there are 10 bridges connecting 9 different LANs. This configuration can be represented by the graph in figure 9, in which each LAN is shown as a node and each bridge connecting two LANs is shown as an arc. The graph can be reduced to a spanning tree by dropping the arcs shown as dotted lines. There is then only a single path from each LAN to every other LAN, and so only a single path from each source to each destination. Loops are removed completely by the spanning tree.
To construct the spanning tree, follow the spanning tree algorithm:
- First, select the root bridge. The root bridge is the bridge with the lowest identifier (in IEEE 802.1D, a configurable priority followed by the bridge's MAC address, a unique number assigned by the manufacturer). All ports on the root bridge are designated ports. In our example in the figure, the root bridge is ‘A’ and its ports on LAN 1 and LAN 2 are the designated ports.
- Then select a root port on each non-root bridge. The root port of a non-root bridge is the port with the lowest path cost to the root bridge. In our example, the incoming port of bridge ‘B’ is on the lowest-cost path. The same logic applies to the other bridges.
- Select a designated port on each segment. The designated port is the port on that segment with the lowest cost to the root bridge. In our example, the outgoing port of bridge ‘B’ is a designated port, because it has the lowest cost to the root bridge. The same logic applies to the other bridges.
- Once the spanning tree algorithm has determined the lowest-cost spanning tree, it enables all root ports and designated ports, and blocks all other ports.
- The spanning tree algorithm continues to run during normal operation.
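The steps above can be sketched as a centralised calculation in Python. Real bridges reach the same result in a distributed way by exchanging configuration messages; this sketch instead assumes complete knowledge of the topology, a uniform port cost of 1, and numeric bridge identifiers. The function name and the example topology are invented for the illustration.

```python
def spanning_tree(attachments, port_cost=1):
    """Compute root, root ports, designated bridges and port states.

    attachments maps a bridge identifier to the list of LANs it connects to.
    """
    root = min(attachments)  # step 1: lowest identifier wins
    lans = {}
    for bridge, attached in attachments.items():
        for lan in attached:
            lans.setdefault(lan, []).append(bridge)

    infinity = float("inf")
    cost = {bridge: infinity for bridge in attachments}
    cost[root] = 0
    root_port = {}  # bridge -> LAN its root port is attached to

    # Step 2: repeat until every bridge has its lowest-cost path to the root.
    changed = True
    while changed:
        changed = False
        for bridge in sorted(attachments):
            if bridge == root:
                continue
            best = None
            for lan in attachments[bridge]:
                for neighbour in lans[lan]:
                    if neighbour != bridge and cost[neighbour] != infinity:
                        candidate = (cost[neighbour] + port_cost, neighbour, lan)
                        if best is None or candidate < best:
                            best = candidate
            if best is not None and (
                best[0] != cost[bridge] or root_port.get(bridge) != best[2]
            ):
                cost[bridge], root_port[bridge] = best[0], best[2]
                changed = True

    # Step 3: on each LAN the bridge closest to the root is designated
    # (ties are broken by the lower bridge identifier).
    designated = {
        lan: min(members, key=lambda b: (cost[b], b))
        for lan, members in lans.items()
    }

    # Step 4: root ports and designated ports forward, all others block.
    states = {}
    for bridge, attached in attachments.items():
        for lan in attached:
            forwarding = designated[lan] == bridge or root_port.get(bridge) == lan
            states[(bridge, lan)] = "forwarding" if forwarding else "blocking"
    return root, root_port, designated, states


# Bridges 1 and 2 in parallel between LAN1 and LAN2; bridge 3 adds LAN3.
topology = {1: ["LAN1", "LAN2"], 2: ["LAN1", "LAN2"], 3: ["LAN2", "LAN3"]}
root, root_port, designated, states = spanning_tree(topology)
print(root, root_port, designated)
print(states[(2, "LAN2")])  # blocking: this port would otherwise close the loop
```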
The advantages of bridging over routing are as follows:
- Transparent bridges are plug and play: they are self-learning and do not require any configuration. Routers, by contrast, require a network address to be defined for each interface, and these addresses must be unique.
- Bridging has less overhead for handling packets as compared to routing.
- Bridging is protocol independent while routing is protocol dependent.
- Bridging will forward all LAN protocols, while a router can forward only packets of the routable protocols it supports.
Bridges are also used to connect two or more distant LANs. For example, a company may have departments at different locations, each with its own LAN. The whole network should be connected so that it acts like one large LAN. This can be achieved by placing a bridge on each LAN and connecting the bridges with point-to-point lines (for example, lines provided by a telephone company). In figure 10, three LANs are connected with remote bridges. Any number of LANs can be joined in this way using remote bridges.
Basics of Routing
What Is Routing?
Routing is the process of moving information across an internetwork from source to destination. At least one intermediate node must be encountered along the way. Routing and bridging look similar but the primary difference between the two is that bridging occurs at Layer 2 (the link layer) of the OSI reference model, whereas routing occurs at Layer 3 (network layer). One important difference between routing and bridging is that the layer 3 addresses are allocated hierarchically, so it is possible for a router to have a single rule allowing it to route to an entire address range of thousands or millions of addresses. This is an important advantage in dealing with the scale of the internet, where hosts are too numerous (and are added and removed too quickly) for any router to know about all hosts on the internet.
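A short Python sketch shows how a handful of rules can cover every possible destination. It uses the longest-prefix-match rule, which is the usual way IP routers choose between overlapping entries but is not spelled out in the text above. The table contents, the next-hop names and the `lookup` function are invented for the example.

```python
import ipaddress

# Hypothetical routing table: (prefix, next hop). Three rules cover
# every possible IPv4 destination.
ROUTES = [
    ("0.0.0.0/0", "upstream"),   # default route
    ("10.0.0.0/8", "core"),
    ("10.20.0.0/16", "branch"),
]


def lookup(table, destination):
    """Return the next hop of the most specific prefix containing the address."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table:
        network = ipaddress.ip_network(prefix)
        if address in network and (
            best is None or network.prefixlen > best[0].prefixlen
        ):
            best = (network, next_hop)
    return best[1] if best else None


print(lookup(ROUTES, "10.20.3.4"))  # branch
print(lookup(ROUTES, "10.9.9.9"))   # core
print(lookup(ROUTES, "8.8.8.8"))    # upstream
```

A linear scan keeps the sketch short; production routers use specialised data structures or hardware for the same lookup.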
Routing at the network layer is performed by routers, which are the heart of the network layer. We first look at the architecture of a router and the processing of datagrams in routers, and then turn to routing algorithms.
The Architecture of a router
A router will include the following components:
The input port performs several functions. The line termination unit performs the physical layer function, and data link processing performs protocol decapsulation. The input port also performs the lookup and forwarding function, so that a packet forwarded into the switching fabric of the router emerges at the appropriate output port. Control packets, such as packets carrying routing protocol information for RIP, OSPF and so on, are forwarded to the routing processor. The input port also queues packets when the output line is busy.
The output port forwards packets coming from the switching fabric to the corresponding output line. It performs the reverse of the physical and data link functions of the input port. The output port also queues packets that arrive from the switching fabric.
The routing processor executes the routing protocols. It maintains the routing information and the forwarding table, and it performs network management functions within the router.
The switching fabric does the job of moving packets to particular ports. Switching can be accomplished in a number of ways:
- Switching via Memory: In the simplest, earliest routers, switching between input and output ports was done under the direct control of the CPU (routing processor). Whenever a packet arrives at an input port, the routing processor is notified via an interrupt. It then copies the incoming packet from the input buffer into processor memory. The processor extracts the destination address, looks it up in the appropriate forwarding table, and copies the packet to the output port’s buffer. In modern routers, the destination address lookup and the storing (switching) of the packet into the appropriate memory location are performed by processors on the input line cards.
- Switching via Bus: The input port transfers the packet directly to the output port over a shared bus, without intervention by the routing processor. Because the bus is shared, only one packet can be transferred over it at a time. If the bus is busy, incoming packets have to wait in a queue. The bandwidth of the router is limited by the shared bus, since every packet must cross the single bus. Examples of bus switching: Cisco 1900, 3Com CoreBuilder 5000.
- Switching via Interconnection Networks: To overcome the bandwidth limit of a shared bus, a crossbar switching network is used. In a crossbar switching network, input and output ports are connected by horizontal and vertical buses. With N input ports and N output ports, 2N buses are required to connect them. To transfer a packet from an input port to the corresponding output port, the packet travels along the horizontal bus until it intersects the vertical bus that leads to the destination port. If the vertical bus is free, the packet is transferred. If the vertical bus is busy, because another input line is transferring a packet to the same destination port, the packet is blocked and queued at its input port.
Processing the IP datagram
Incoming packets at the input port are stored in a queue to wait for processing. When processing begins, the IP header is processed first. The header checksum is verified to detect errors in transmission. If the header contains no errors, the destination IP address field is checked. If the datagram is addressed to the local host, the data field is passed to the module for the protocol named in the header (UDP, TCP, ICMP, etc.).
If the destination IP address is not that of the local host, the router looks up the destination IP address in its routing table. The routing table gives the address of the next router to which the packet should be forwarded. The output operations are then performed on the outgoing packet: its TTL field is decreased by one, the header checksum is recalculated, and the packet is forwarded to the output port that leads towards the destination. If the output port is busy, the packet has to wait in the output queue.
The packet scheduler at the output port must choose a packet from the queue for transmission. The selection may be made on a first-come-first-served (FCFS) basis, by priority, or by weighted fair queuing (WFQ), which shares the outgoing link "fairly" among the different end-to-end connections that have packets queued for transmission. Packet scheduling plays a crucial role in providing quality of service. If the incoming datagram contains routing information, the packet is passed to the routing protocol, which modifies the routing table accordingly.
We now consider different routing algorithms. First, there are two types of protocol involved in transferring information from source to destination.
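The checksum and TTL steps can be sketched for an IPv4 header without options. This is an illustrative sketch, not a router implementation: the function names `ipv4_checksum` and `forward` are hypothetical, the header is assumed to be exactly 20 bytes, and the addresses come from documentation ranges.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward(header: bytes) -> bytes:
    """Verify the checksum, decrement the TTL and recompute the checksum."""
    if ipv4_checksum(header) != 0:           # a valid header checks to zero
        raise ValueError("header checksum mismatch")
    ttl = header[8]                          # TTL is the ninth byte of the header
    if ttl <= 1:
        # A real router would discard the packet and report it via ICMP.
        raise ValueError("TTL expired")
    # Rebuild the header with TTL - 1 and a zeroed checksum field (bytes 10-11).
    updated = header[:8] + bytes([ttl - 1]) + header[9:10] + b"\x00\x00" + header[12:]
    return updated[:10] + struct.pack("!H", ipv4_checksum(updated)) + updated[12:]

# Build a sample header: version/IHL, TOS, length, id, flags, TTL=64, UDP, checksum=0.
raw = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                  bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
header = raw[:10] + struct.pack("!H", ipv4_checksum(raw)) + raw[12:]
forwarded = forward(header)
print(forwarded[8], ipv4_checksum(forwarded))
```

Because the TTL is part of the header, the checksum has to be recomputed at every hop, which is why the two operations appear together in the forwarding path.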
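As an illustration of output scheduling, the sketch below implements priority scheduling that falls back to first-come-first-served order among packets of equal priority. The class name `OutputQueue` and the convention that a lower number means higher priority are assumptions made for the example; weighted fair queuing is not shown.

```python
import heapq
import itertools

class OutputQueue:
    """Priority scheduler; packets of equal priority leave in arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker that preserves FCFS order

    def enqueue(self, packet, priority=0):
        # Assumption: a lower priority number is served first.
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self):
        """Return the next packet to transmit, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

port = OutputQueue()
port.enqueue("bulk-1", priority=2)
port.enqueue("voice-1", priority=0)
port.enqueue("bulk-2", priority=2)
port.enqueue("voice-2", priority=0)
order = [port.dequeue() for _ in range(4)]
print(order)
```

If every packet is enqueued with the same priority, the scheduler degenerates to plain FCFS.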
Routed vs Routing Protocol
Routed protocols, such as IP or IPX, carry user traffic between routers. A routed packet contains enough information to enable a router to direct the traffic. A routed protocol defines the fields in the packet and how those fields are used.
Routed protocols include:
- Internet Protocol
- Remote Procedure Call (RPC)
- Novell IPX
- Open Systems Interconnection (OSI) networking protocol
- Banyan Vines
- Xerox Network System (XNS)
Routing protocols allow routers to gather and share routing information in order to maintain and update their routing tables, which in turn are used to find the most efficient route to a destination.
Routing protocols include:
- Routing Information Protocol (RIP and RIP II)
- Open Shortest Path First (OSPF)
- Intermediate System to Intermediate System (IS-IS)
- Interior Gateway Routing Protocol (IGRP)
- Cisco's Enhanced Interior Gateway Routing Protocol (EIGRP)
- Border Gateway Protocol (BGP)
Design Goals of Routing Algorithms
Routing algorithms have one or more of the following design goals:
- Optimality: This is the capability of the routing protocol to find the best route. Routing metrics are used to decide which route is best; the number of hops or the delay are typical metrics. Paths with fewer hops, or paths with the least delay, should be preferred as the best route.
- Simplicity and low overhead: Routing algorithms also are designed to be as simple as possible. The routing algorithm must offer its functionality efficiently, with a minimum of software overhead. Efficiency is particularly important when the software implementing the routing algorithm must run on a computer with limited physical resources, or work with large volumes of routes.
- Robustness and stability: Routing protocol should perform correctly in the face of unusual or unforeseen circumstances, such as hardware failures, high load conditions, and incorrect implementations. This property of routing protocols is known as robustness. The best routing algorithms are often those that have withstood the test of time and that have proven stable under a variety of network conditions.
- Rapid convergence: Routing algorithms must converge rapidly. Convergence is the process by which all routers come to agree on optimal routes. When a network event causes routes to go down or become available, or changes the cost of a link, routers distribute routing update messages. These cause the other routers in the network to recalculate their optimal routes, and eventually all routers agree on the new routes.
- Flexibility: Routing algorithms should also be flexible. They should quickly and accurately adapt to a variety of network circumstances.
Classification of routing algorithms
Routing algorithms are mainly of two types:
- Static routing: In static routing algorithms the route followed by a packet always remains the same. Static routing is used when routes change very slowly. The network administrator computes the routing table in advance and configures it on each router.
Advantages of static routing:
- Predictability: Because the network administrator computes the routing table in advance, the path a packet takes between two destinations is always known precisely, and can be controlled exactly.
- No overhead on routers or network links: In static routing there is no need for the routers to send periodic updates containing reachability information, so the overhead on routers and network links is low.
- Simplicity: Configuration for small networks is easy.
Disadvantages of static routing:
- Lack of scalability: Computing static routes for a small number of hosts and routers is easy, but for larger networks it becomes cumbersome and error-prone.
- If a network segment moves or is added: To implement the change, you have to update the configuration of every router on the network. If you miss one, in the best case segments attached to that router will be unable to reach the moved or added segment. In the worst case, you'll create a routing loop that affects many routers.
- It cannot adapt to failures in the network: If a link fails on a network using static routing, then even if an alternative link is available that could serve as a backup, the routers won't know to use it.
- Dynamic routing: Routers communicate with each other through a routing protocol and build their routing tables from what they learn. Each router then forwards packets to the next hop, which is nearer to the destination. With dynamic routing, routes can change more quickly. Periodic updates are sent over the network, so that if a link cost changes, all the routers on the network learn of it and change their routing tables accordingly.
Advantages of dynamic routing:
- Scalability and adaptability: A dynamically routed network can grow more quickly and grow larger without becoming unmanageable. It is able to adapt to the changes in network topology brought about by this growth.
- Adaptation to failures in the network: In dynamic routing, routers learn about the network topology by communicating with other routers. Each router announces its presence, and the routes it has available, to the other routers on the network. As a result, if you add a new router, add a segment to an existing router, or lose a link, the other routers will hear about the change and adjust their routing tables accordingly.
Disadvantages of dynamic routing:
- Increase in complexity: A router has to send periodic updates about the topology, and has to decide exactly what information to send. When it learns about the network from other routers, it must adapt correctly to the changes, and it must be prepared to remove old or unusable routes. All of this adds complexity.
- Overhead on the lines and routers: Routers periodically send packets describing the network topology to other routers. These packets contain no user data, only information needed by the routers, so they are pure overhead on the lines and routers.
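To make the contrast concrete, a statically configured routing table can be sketched as a fixed list of prefixes and next hops. The prefixes and next-hop addresses below are hypothetical, and longest-prefix matching is assumed as the lookup rule (standard for IP forwarding, though not discussed above).

```python
import ipaddress

# Hypothetical static configuration entered by an administrator.
STATIC_ROUTES = [
    (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.1"),
    (ipaddress.ip_network("10.1.2.0/24"), "192.0.2.2"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.254"),   # default route
]

def next_hop(destination, routes=STATIC_ROUTES):
    """Return the next hop of the most specific route covering destination."""
    address = ipaddress.ip_address(destination)
    matches = [(network, hop) for network, hop in routes if address in network]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]

print(next_hop("10.1.2.9"))
print(next_hop("10.1.9.9"))
print(next_hop("203.0.113.5"))
```

Nothing in this table changes unless someone edits it: if the router at 192.0.2.2 fails, `next_hop("10.1.2.9")` keeps returning it. A dynamic routing protocol replaces the hand-written list with entries that are recomputed from the updates received from other routers.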
Classification of Dynamic Routing Protocols
Dynamic routing protocols can be classified in two ways. The first classification is based on where a protocol is intended to be used: within your own network, or between your network and someone else's. This is the distinction between interior and exterior protocols. The second classification concerns the kind of information the protocol carries and the way each router decides how to fill in its routing table. This is the distinction between link-state and distance vector protocols.
Link State Routing
In a link-state protocol, a router provides information about the topology of the network in its immediate vicinity, and does not provide information about destinations it knows how to reach. This information consists of a list of the network segments, or links, to which it is attached, and the state of those links (functioning or not functioning). The information is then broadcast throughout the network, so every router can build its own picture of the current state of all the links in the network. As every router sees the same information, all of these pictures should be the same. From this picture, each router computes its best path to all destinations, and populates its routing table with the result. The computation used is the link-state algorithm known as Dijkstra's algorithm.
The notation and its meaning are as follows:
- $N$ denotes the set of all nodes in the graph.
- $c(i,j)$ is the link cost from node $i$ to node $j$, both of which are in $N$. If the two nodes are not directly connected, then $c(i,j) = \infty$. The most general form of the algorithm doesn't require that $c(i,j) = c(j,i)$, but for simplicity we assume that they are equal.
- $s$ is the node executing the algorithm to find the shortest path to all the other nodes.
- $M$ denotes the set of nodes incorporated so far by the algorithm in finding the shortest path to all the other nodes in $N$.
- $C(n)$ is the cost of the path from the source node $s$ to destination node $n$.
Definition of the Algorithm
In practice each router maintains two lists, known as Tentative and Confirmed. Each of these lists contains a set of entries of the form (Destination, Cost, NextHop).
The algorithm works as follows:
- Initialize the Confirmed list with an entry for myself; this entry has a cost of 0.
- For the node just added to the Confirmed list in the previous step, call it node Next, select its LSP.
- For each neighbor (Neighbor) of Next, calculate the cost (Cost) to reach this Neighbor as the sum of the cost from myself to Next and from Next to Neighbor.
- If Neighbor is currently on neither the Confirmed nor the Tentative list, then add (Neighbor, Cost, NextHop) to the Tentative list, where NextHop is the direction I go to reach Next.
- If Neighbor is currently on the Tentative list, and the Cost is less than the currently listed cost for Neighbor, then replace the current entry with (Neighbor, Cost, NextHop), where NextHop is the direction I go to reach Next.
- If the Tentative list is empty, stop. Otherwise, pick the entry from the Tentative list with the lowest cost, move it to the Confirmed list, and return to step 2.
[Algorithm from Computer Networks: A Systems Approach, Peterson and Davie.]
Now let's look at an example. Consider the network depicted below.
The steps for building the routing table for A are as follows:
|Step||Confirmed||Tentative||Comments|
|1||(A,0,-)||(empty)||Since A is the only new member of the Confirmed list, look at its LSP.|
|2||(A,0,-)||(B,9,B) (C,3,C) (D,8,D)||A's LSP says we can reach B through B at cost 9, which is better than anything else on either list, so add it to the Tentative list; similarly for C and D.|
|3||(A,0,-) (C,3,C)||(B,9,B) (D,8,D)||Move the lowest-cost member of Tentative (C) onto the Confirmed list. Next, examine the LSP of the newly confirmed member (C).|
|4||(A,0,-) (C,3,C)||(B,9,B) (D,8,D) (E,4,C)||The cost to reach E through C is 4, so add (E,4,C) to the Tentative list.|
|5||(A,0,-) (C,3,C) (E,4,C)||(B,6,C) (D,6,C)||Move the lowest-cost member of Tentative (E) onto the Confirmed list and examine its LSP. The cost to reach B through E is 6, so replace (B,9,B); the cost to reach D through E is 6, so replace (D,8,D). The next hop from A towards E is C.|
|6||(A,0,-) (C,3,C) (E,4,C) (B,6,C)||(D,6,C)||Move B onto the Confirmed list; its LSP offers nothing better. The only node that remains is D, which is reached through E at cost 6.|
|7||(A,0,-) (C,3,C) (E,4,C) (B,6,C) (D,6,C)||(empty)||We are done. The shortest paths to all destinations are now known.|
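The Tentative/Confirmed procedure can be sketched in a few lines. This is a minimal sketch, not the textbook's code: the function name `build_routing_table` is hypothetical, each LSP is reduced to a dictionary of neighbor costs, and the link costs are assumptions inferred from the costs quoted in the worked example (A-B 9, A-C 3, A-D 8, C-E 1, E-B 2, E-D 2), since the network figure itself is not available.

```python
def build_routing_table(lsps, myself):
    """lsps maps each node to its link-state packet: {neighbor: link cost}.

    Returns the Confirmed list as {destination: (cost, next_hop)}.
    """
    confirmed = {myself: (0, None)}
    tentative = {}
    next_node = myself
    while True:
        cost_to_next, hop_to_next = confirmed[next_node]
        for neighbor, link_cost in lsps[next_node].items():
            if neighbor in confirmed:
                continue
            cost = cost_to_next + link_cost
            # Direct neighbors of myself are their own next hop; every other
            # node inherits the next hop that is used to reach next_node.
            next_hop = neighbor if next_node == myself else hop_to_next
            if neighbor not in tentative or cost < tentative[neighbor][0]:
                tentative[neighbor] = (cost, next_hop)
        if not tentative:
            return confirmed
        # Move the lowest-cost Tentative entry onto the Confirmed list.
        next_node = min(tentative, key=lambda node: tentative[node][0])
        confirmed[next_node] = tentative.pop(next_node)

# Assumed topology (see the lead-in).
lsps = {
    "A": {"B": 9, "C": 3, "D": 8},
    "B": {"A": 9, "E": 2},
    "C": {"A": 3, "E": 1},
    "D": {"A": 8, "E": 2},
    "E": {"B": 2, "C": 1, "D": 2},
}
table = build_routing_table(lsps, "A")
for destination, (cost, next_hop) in table.items():
    print(destination, cost, next_hop)
```

Every destination other than A ends up with C as its next hop, because under these assumed costs the cheap C-E link makes the path through C better than the direct links to B and D.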
Distance vector routing
The distance vector algorithm is iterative, asynchronous, and distributed. Each node receives some information from one or more of its directly attached neighbors, performs a calculation, and may then distribute the result of the calculation to its neighbors; hence it is distributed. It is iterative because this process of exchanging information continues until no more information is exchanged between the neighbors.
Let $d_x(y)$ be the cost of the least-cost path from node $x$ to node $y$. The least costs are related by the Bellman-Ford equation:
$$d_x(y) = \min_v \{ c(x,v) + d_v(y) \}$$
where the minimum is taken over all of $x$'s neighbors $v$. If, after traveling from $x$ to $v$, we take the shortest path from $v$ to $y$, the cost of the path from $x$ to $y$ will be $c(x,v) + d_v(y)$. Since we must begin by traveling to some neighbor $v$, the least cost from $x$ to $y$ is the minimum of $c(x,v) + d_v(y)$ taken over all neighbors $v$.
In the distance vector algorithm each node $x$ maintains the following routing data:
- The cost of the link to each directly attached neighbor, i.e. for each attached node $v$ it knows $c(x,v)$.
- Its own routing table, which is $x$'s estimate of its cost to all destinations $y$ in $N$, i.e. $D_x = [D_x(y) : y \in N]$.
- The distance vectors of each of its neighbors, i.e. $D_v = [D_v(y) : y \in N]$ for each neighbor $v$.
In the distributed, asynchronous algorithm, each node sends a copy of its distance vector to each of its neighbors from time to time. When a node $x$ receives a neighbor's distance vector, it saves it and updates its own distance vector as:
$$D_x(y) = \min_v \{ c(x,v) + D_v(y) \} \quad \text{for each node } y \in N$$
When a node's distance vector changes as a result of this update, it sends the new vector to its neighbors. The neighbors perform the same actions, and the process continues until no node has any new information to send.
The distance vector algorithm [from Kurose] is as follows:
At each node x:
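The per-node computation can be sketched as a small simulation. This is not Kurose's listing: it is a simplified sketch that runs all nodes in synchronous rounds rather than asynchronously, and the function name `distance_vector` and the three-node example topology are assumptions made for illustration.

```python
INF = float("inf")

def distance_vector(links):
    """Run the distance vector computation on an undirected graph.

    links: {(x, v): cost}. Returns {x: {y: (cost, next_hop)}}.
    """
    nodes = sorted({node for link in links for node in link})
    c = {x: {} for x in nodes}                 # c[x][v]: cost of the direct link x-v
    for (x, v), cost in links.items():
        c[x][v] = cost
        c[v][x] = cost

    # Initialisation: a node knows itself (cost 0) and its direct neighbors.
    table = {}
    for x in nodes:
        table[x] = {y: (c[x][y], y) if y in c[x] else (INF, None) for y in nodes}
        table[x][x] = (0, x)

    changed = True
    while changed:                             # stop when no vector changes
        changed = False
        # The vectors every node sends to its neighbors in this round.
        sent = {x: {y: table[x][y][0] for y in nodes} for x in nodes}
        for x in nodes:
            for y in nodes:
                if y == x:
                    continue
                # Bellman-Ford update: D_x(y) = min over v of c(x,v) + D_v(y)
                best = min((c[x][v] + sent[v][y], v) for v in c[x])
                if best[0] < table[x][y][0]:
                    table[x][y] = best
                    changed = True
    return table

table = distance_vector({("x", "y"): 2, ("x", "z"): 7, ("y", "z"): 1})
print(table["x"]["z"])
print(table["z"]["x"])
```

In this example x learns that going to z through y (cost 2 + 1) is cheaper than its direct link (cost 7), using nothing but the vector advertised by y.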
Let's consider an example.
[Figure: example network topology]
We now look at the steps for building the routing table for R8.
[Figures: routing table for R8 after step 1, after step 2, and after step 3, which is the solution]
For node R8 the routing table now contains:
|Destination||Next hop||Cost to|
"In the network, bad news travels slowly." Consider four routers R1, R2, R3 and R4 connected in a line: R1 - R2 - R3 - R4.
The routing information each router holds for reaching R4, written as (cost, next hop), is: R1: (3, R2); R2: (2, R3); R3: (1, R4). Suppose R4 fails. R3 no longer has a direct path to R4, so it sets its distance to infinity. But in the next data exchange R3 sees that R2 has a path to R4 with a hop count of 2, so it updates its entry from infinity to 2 + 1 = 3, i.e. (3, R2). In the second data exchange R2 learns that both R1 and R3 reach R4 with a distance of 3, so it updates its entry for R4 to 3 + 1 = 4, i.e. (4, R3). In the third data exchange R1 changes its entry to 4 + 1 = 5, i.e. (5, R2), and R3 does the same. This process continues to increase the distances. It is summarized in the following table.
|Exchange||R1||R2||R3|
|0||3, R2||2, R3||1, R4|
|1||3, R2||2, R3||3, R2|
|2||3, R2||4, R3||3, R2|
|3||5, R2||4, R3||5, R2|
|... count to infinity ...|
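The counting behaviour can be reproduced with a short simulation. This is a sketch under stated assumptions: the routers form the chain R1 - R2 - R3 with R3's link to R4 already failed, all routers exchange vectors synchronously, only the hop count to R4 is tracked (not the next hop), and the count is capped at 16, the value RIP uses for "unreachable".

```python
INFINITY = 16   # RIP's "unreachable" metric, used here to stop the count

# Remaining adjacencies after R4 has failed.
neighbors = {"R1": ["R2"], "R2": ["R1", "R3"], "R3": ["R2"]}

def exchange(distance):
    """One synchronous exchange: each router recomputes its hop count to R4
    from the values its neighbors advertised in the previous round."""
    return {
        router: min(INFINITY, 1 + min(distance[n] for n in neighbors[router]))
        for router in neighbors
    }

distance = {"R1": 3, "R2": 2, "R3": 1}    # state before the failure (exchange 0)
history = [distance]
while any(hops < INFINITY for hops in distance.values()):
    distance = exchange(distance)
    history.append(distance)

for number, row in enumerate(history[:4]):
    print(number, row)
print("exchanges needed:", len(history) - 1)
```

The routers keep believing each other's stale routes, and each exchange only raises the metric by one or two hops, so the bad news takes many rounds to be accepted.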
Solutions of count-to-infinity problem:
- Defining the maximum count
- For example, RIP (RFC 1058) defines the maximum count as 16. This means that if all else fails, counting to infinity stops at the 16th iteration.
- Split Horizon
- Split horizon means that if node A has learned a route to node C through node B, then node A does not send its distance to C to node B during a routing update.
- Poisoned Reverse
- Poisoned reverse is an additional technique used with split horizon. Split horizon is modified so that instead of not sending the routing data, the node sends the data but sets the cost of the route to infinity. Split horizon with poisoned reverse prevents routing loops involving only two routers. For loops involving more routers on the same link, split horizon with poisoned reverse will not suffice.
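The difference between the two techniques lies only in how a router builds the vector it sends to a particular neighbor. The sketch below assumes a routing table of the form `{destination: (cost, next_hop)}`; the function name `advertisement` and the value 16 for infinity (as in RIP) are choices made for the example.

```python
INFINITY = 16

def advertisement(routes, to_neighbor, poisoned_reverse=True):
    """Build the distance vector sent to one neighbor.

    Routes learned through that neighbor are either omitted (plain split
    horizon) or advertised with an infinite cost (poisoned reverse).
    """
    vector = {}
    for destination, (cost, next_hop) in routes.items():
        if next_hop == to_neighbor:
            if poisoned_reverse:
                vector[destination] = INFINITY   # tell the neighbor: not via me
            continue                             # split horizon: say nothing
        vector[destination] = cost
    return vector

# R2's table in the chain R1 - R2 - R3 - R4.
r2_routes = {"R1": (1, "R1"), "R3": (1, "R3"), "R4": (2, "R3")}
to_r3_poisoned = advertisement(r2_routes, "R3")
to_r3_split = advertisement(r2_routes, "R3", poisoned_reverse=False)
to_r1 = advertisement(r2_routes, "R1")
print(to_r3_poisoned)
print(to_r3_split)
print(to_r1)
```

With either variant R3 never hears a usable route to R4 from R2, so when R4 fails R3 cannot adopt R2's stale route and the two-router loop does not form.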
An Interior Gateway Protocol (IGP) is an intra-AS routing protocol: it determines how routing is performed within an autonomous system. The intra-AS routing protocols described here are RIP and OSPF.
Routing Information Protocol (RIP)
RIP is a distance vector routing protocol. Machines participating in RIP are classified as active or passive. Active machines advertise routes; passive machines do not advertise, but listen to RIP messages and use them to update their routing tables. Routing updates are exchanged (broadcast) between neighbors approximately every 30 seconds. This update message is called a RIP response message or RIP advertisement. RIP routers apply hysteresis to what they learn from other routers: a router does not replace an existing route with a new route of equal cost. This improves performance and reliability.
Open Shortest Path First (OSPF)
OSPF is a link-state protocol that uses flooding of link-state information and Dijkstra's least-cost path algorithm. With OSPF, a router constructs a complete topological map of the autonomous system and then runs Dijkstra's algorithm locally to determine the shortest paths. An OSPF router broadcasts routing information to all the other routers in the autonomous system, not just to its neighbors. A router broadcasts link-state information whenever a link's cost changes, and also broadcasts each link's state periodically, at least once every 30 minutes. OSPF checks that links are operational by sending HELLO messages to attached neighbors. It also allows an OSPF router to obtain a neighboring router's database of network-wide link state.
Some of the advantages of OSPF are as follows:
- The specification is available in the published literature. As it is an open standard, anyone can implement it without paying license fees, which encourages many vendors to support OSPF.
- OSPF performs load balancing. If there are multiple routes with the same cost, OSPF distributes traffic over all of them equally.
- OSPF allows a site to partition its networks and routers into subsets called areas. The topology of an area is hidden from other areas, and each area is self-contained. An area can change its internal topology independently, which permits growth and makes the networks at a site easier to manage.
- All exchanges between routers are authenticated. OSPF allows a variety of authentication schemes, and different areas can use different schemes. Routers are authenticated because only trusted routers should propagate routing information.
- There is integrated support for unicast and multicast routing. Multicast OSPF (MOSPF) provides multicast routing as a simple extension to OSPF. MOSPF uses the existing OSPF link database and adds a new type of link-state advertisement to the existing OSPF link-state broadcast mechanism.
- OSPF supports hierarchy within a single routing domain.
The hierarchical structure of an OSPF network is shown below:
[ Diagram similar to diagram from Kurose]
- Internal routers - These are within an area. They only perform intra-AS routing.
- Area border routers - These routers belong to both an area and the backbone.
- attached to multiple areas
- run a copy of SPF for each attached area
- relay topological information on attached areas to the backbone
- Backbone routers - These routers perform routing within the backbone but are not themselves area border routers.
- AS boundary routers - A boundary router performs inter-AS routing. It is through such a boundary router that other routers learn about paths to external networks.
An Exterior Gateway Protocol is any protocol that is used to pass routing information between two autonomous systems (AS), i.e. between networks that aren't under the control of a single common administrator. BGP is currently the de facto standard for exterior routing on the internet. The current version of BGP is v4, which has been in use since 1994, all earlier versions now being obsolete.
When two autonomous systems agree to exchange routing information, the two routers that exchange the information using BGP are known as BGP peers. Because a router speaking BGP communicates with a peer in another AS, it is usually located near the edge of its own AS, and is referred to as a border gateway or border router.
BGP is a path vector protocol. Like distance vector protocols (and unlike link-state protocols such as OSPF), it doesn't attempt to map the entire network. Instead, it maintains a database of the cost to access each subnet it knows about, and chooses the route that has the lowest cost. However, instead of storing a single cost, it keeps the entire path used to access each network (not with every single hop on the path, but with a list of ASes that the path passes through). This means that routing loops can be eliminated, which can be hard to ensure in simpler distance vector protocols like RIP, while still allowing the protocol to scale to the level of the entire internet.
Three types of activities involved in route advertisement are as follows:
- It receives and filters route advertisements from directly attached neighbors. Received routes with paths that contain the router's own AS number are rejected, to avoid creating routing loops.
- It selects the route. BGP router may receive several route advertisements to the same destination, and by default chooses a single route from among them as the preferred route (ECMP extends this to allow traffic to be load-balanced across several paths with equal cost).
- It also sends route advertisements to its neighbors.
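The three activities can be sketched as three small functions. This is a deliberately simplified sketch: routes are plain dictionaries, the preferred route is chosen by shortest AS path only (real BGP selection has more steps), and the AS numbers and prefixes are taken from ranges reserved for documentation. The function names are hypothetical.

```python
MY_AS = 64500   # hypothetical local AS number

def accept(route, my_as=MY_AS):
    """Reject any route whose AS path already contains our own AS number."""
    return my_as not in route["as_path"]

def select(routes):
    """Pick one preferred route per prefix (here: the shortest AS path)."""
    best = {}
    for route in routes:
        current = best.get(route["prefix"])
        if current is None or len(route["as_path"]) < len(current["as_path"]):
            best[route["prefix"]] = route
    return best

def advertise(route, my_as=MY_AS):
    """Prepend our AS number before passing the route to an external peer."""
    return {"prefix": route["prefix"], "as_path": [my_as] + route["as_path"]}

received = [
    {"prefix": "203.0.113.0/24", "as_path": [64501, 64510]},
    {"prefix": "203.0.113.0/24", "as_path": [64502, 64503, 64510]},
    {"prefix": "198.51.100.0/24", "as_path": [64502, 64500, 64511]},  # loops through us
]
accepted = [route for route in received if accept(route)]
preferred = select(accepted)
outgoing = [advertise(route) for route in preferred.values()]
print(outgoing)
```

The third received route is dropped because its path already passes through AS 64500, which is exactly the loop check described in the first activity.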
BGP was originally specified to advertise IPv4 routes only, but the multi-protocol extensions to version 4 of BGP allow routes in other address families to be shared via BGP. In particular, BGP can be used to share IPv6 routes. The BGP session itself runs over TCP, which is typically carried over IPv4 but can equally be carried over IPv6. In keeping with the layered networking model, BGP specifies the messages to be exchanged, and the address family of the session is independent of the address families of the routes carried in it.
Internal and External BGP
BGP is the de facto exterior gateway protocol, so routers that connect networks to the internet have to speak it. But that isn't the end of the story. In the simplest case, each AS would have one border router that shared routes with the outside world, and all routing inside the AS would be done with an interior protocol (OSPF, RIP, etc.) However, larger networks will rarely have a single border router for the entire AS. As a result, routes that are received by one border router need to be propagated over to the other border router in order to be shared with the peers of that router.
One way to do this would be to insert the received routes into the existing interior protocol (OSPF, say), and use the interior protocol to propagate them to the other border router. The second border router could then redistribute them back into BGP and out onto the open internet. However, this has some drawbacks. Most seriously, the AS_PATH information from the route is lost (since OSPF and other protocols don't know how to share an AS_PATH) so the main method by which routing loops are eliminated is unable to operate. In addition, interior protocols are not designed to cope with the sheer volume of routes on the internet.
A more scalable alternative is to use internal BGP (iBGP) between the routers within the AS. This propagates routes between the routers in a similar way to external BGP (eBGP), except that the AS_PATH is not appended to.
One problem with using iBGP within an AS is that the BGP routers have to be connected in a full mesh. That is, every router has to be connected to every other router within the AS. The reason for this is that although routers using iBGP will pass on to their peers routes that have been learned via eBGP, these routes only travel one hop within the AS. No route that is received via iBGP is passed on to another peer via iBGP. This is because the AS_PATH information can't be used to eliminate routing loops.
The reason why it's undesirable to have to connect all the routers in a full mesh is that the number of peerings gets very large when a large number of routers are in operation: $n(n-1)/2$ connections are necessary to make a mesh of $n$ routers. Each connection takes up CPU, memory and network bandwidth resources.
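A quick calculation shows how fast the number of sessions grows. The function name below is hypothetical; the formula is simply the number of distinct pairs among n routers.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions needed to fully mesh the given number of routers."""
    return routers * (routers - 1) // 2

for routers in (5, 10, 100):
    print(routers, full_mesh_sessions(routers))
```

Ten routers need 45 sessions, while a hundred routers need 4950, and each router in that mesh has to maintain 99 of them.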
An alternative is to use route reflection. This allows one or more routers in the AS to readvertise iBGP routes, and avoids the possibility of routing loops by placing constraints on which routes can be readvertised.
To use route reflection, one or more routers are designated as route reflectors. Each route reflector divides its peerings into client peers and non-client peers. The route reflector reflects routes between one group and the other, and between client peers. Non-client peers must be fully meshed.
BGP Functionality and Message Types
BGP peers perform three basic functions as follows:
- Initial peer acquisition and authentication: the two peers establish a TCP connection and perform a message exchange that guarantees both sides have agreed to communicate.
- Exchange of reachability information: both sides send positive or negative reachability information. A peer advertises a network as unreachable if one or more neighbors are no longer reachable and no backup route is available for the routes in question.
- Ongoing verification: It provides ongoing verification that the peers and the network connections between them are functioning correctly.
The BGP message types are:
|Message type||Description|
|OPEN||As soon as two BGP peers establish a TCP connection, they each send an OPEN message to declare their autonomous system number and establish other operating parameters. An OPEN message contains a suggested length for the hold timer, which is the maximum number of seconds that may elapse between the receipt of two successive messages. On receiving an OPEN message, the receiver replies with KEEPALIVE.|
|UPDATE||After the TCP connection has been established and OPEN messages have been sent and acknowledged, peers use UPDATE to advertise new destinations that are reachable, or to withdraw previous advertisements.|
|NOTIFICATION||This message is used to inform a peer that an error has been detected, or that the sender is about to close the BGP session.|
|KEEPALIVE||This is used to test network connectivity and to verify that both peers continue to function. BGP uses TCP for transport, and TCP does not include a mechanism to continually test whether a connection endpoint is reachable. Both sides send KEEPALIVE messages so that they know if the TCP connection fails. The KEEPALIVE message is as short as possible, so as not to waste bandwidth.|
The BGP state machine
BGP packet formats
The Message Header
Each BGP packet starts with a fixed-size header.
- Marker: This field is for backward compatibility. It is 16 bytes of all ones.
- Length: This is a 2-byte unsigned integer that specifies the length of the packet, including the header.
- Type: This is one byte that specifies the type of the message: Open, Update, Notification or Keepalive.
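The fixed header can be decoded with a single `struct` format. This is a minimal sketch: the function name `parse_header` is hypothetical, the field names Marker, Length and Type and the type codes 1 to 4 follow RFC 4271, and the 4096-byte upper bound is the maximum message size from the same RFC.

```python
import struct

HEADER = struct.Struct("!16sHB")     # marker (16 bytes), length (2), type (1)
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

def parse_header(data: bytes):
    """Return (length, type name) for the BGP message at the start of data."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    marker, length, message_type = HEADER.unpack_from(data)
    if marker != b"\xff" * 16:
        raise ValueError("marker is not all ones")
    if not HEADER.size <= length <= 4096:
        raise ValueError("implausible message length")
    return length, MESSAGE_TYPES.get(message_type, f"UNKNOWN({message_type})")

# A KEEPALIVE consists of the 19-byte header and nothing else.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
print(parse_header(keepalive))
```

Because the length field covers the header itself, the smallest valid value is 19, which is also the entire size of a KEEPALIVE message.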
The OPEN message
The OPEN message is the first message that each router sends to the other on a newly established peering. A successful OPEN message is acknowledged by sending back a KEEPALIVE message.
The BGP identifier is included in the OPEN message. This is a 4-byte number that must uniquely identify the router on the network. It is conventionally set to the IPv4 address of one of the interfaces on the router. In theory, a router may not have any IPv4 addresses if it is only being used for IPv6, but in practice this rarely happens, and it doesn't matter anyway since any unique 4-byte value can be used.
The OPEN message can contain optional parameters. If it contains any parameters, then the Optional Parameter Length field will be set to a non-zero value to indicate the length. Each parameter in the parameters field is encoded as a group of three values: parameter type (1 byte), parameter length (1 byte) and a variable length (up to 255 bytes) field for the parameter.
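The layout of the OPEN body, including the type-length-value encoding of the optional parameters, can be sketched as follows. The fixed fields (version, AS number, hold time, BGP identifier, optional parameter length) follow RFC 4271; the function name `parse_open`, the sample values, and the opaque two-byte parameter are assumptions made for illustration, and the 19-byte message header is assumed to have been stripped already.

```python
import ipaddress
import struct

OPEN_FIXED = struct.Struct("!BHH4sB")   # version, my AS, hold time, identifier, opt. length

def parse_open(body: bytes):
    """Decode the body of an OPEN message (without the message header)."""
    version, my_as, hold_time, identifier, opt_len = OPEN_FIXED.unpack_from(body)
    offset = OPEN_FIXED.size
    end = offset + opt_len
    if end > len(body):
        raise ValueError("optional parameters run past the end of the message")
    parameters = []
    while offset < end:
        if offset + 2 > end:
            raise ValueError("truncated optional parameter header")
        param_type, param_len = body[offset], body[offset + 1]
        if offset + 2 + param_len > end:
            raise ValueError("truncated optional parameter value")
        parameters.append((param_type, body[offset + 2:offset + 2 + param_len]))
        offset += 2 + param_len
    return {
        "version": version,
        "as": my_as,
        "hold_time": hold_time,
        "identifier": str(ipaddress.IPv4Address(identifier)),
        "parameters": parameters,
    }

# One optional parameter: type 2, length 2, followed by a two-byte value.
sample = OPEN_FIXED.pack(4, 64500, 90, bytes([192, 0, 2, 1]), 4) + bytes([2, 2, 2, 0])
print(parse_open(sample))
```

The one-byte length field is what limits each parameter value to 255 bytes.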
The UPDATE message
An UPDATE message is sent from one peer to another to carry new information about the network: routes that are newly available and routes that are no longer available.
The NOTIFICATION message
The KEEPALIVE message
Although BGP runs over TCP, TCP does not itself continually test whether an idle peer is still reachable, so BGP can't rely on it to detect when a peer has become unavailable. Therefore the protocol requires that regular keepalive packets are sent between the peers. The hold timer is reset every time a packet is received, and the connection is closed if the timer runs out. An UPDATE packet resets the timer just as a KEEPALIVE does, so a peer only needs to send a KEEPALIVE packet if it has had nothing else to send. The keepalive packet is typically sent at one third of the hold time, in order to strike a balance between not flooding the network and ensuring that a single dropped packet doesn't cause the connection to be torn down.
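The interaction between the hold timer and the keepalive interval can be sketched with a small class. The class and method names are hypothetical, and time is passed in explicitly as a number of seconds so that the behaviour is easy to follow.

```python
class HoldTimer:
    """Tracks when a peer should be declared dead and when to send KEEPALIVE."""

    def __init__(self, hold_time: float, now: float = 0.0):
        self.hold_time = hold_time
        self.keepalive_interval = hold_time / 3   # typical choice described above
        self.last_received = now
        self.last_sent = now

    def message_received(self, now: float) -> None:
        self.last_received = now                  # any received packet resets the timer

    def message_sent(self, now: float) -> None:
        self.last_sent = now                      # any sent packet postpones the keepalive

    def expired(self, now: float) -> bool:
        return now - self.last_received >= self.hold_time

    def keepalive_due(self, now: float) -> bool:
        return now - self.last_sent >= self.keepalive_interval

timer = HoldTimer(hold_time=90)
print(timer.keepalive_due(29), timer.keepalive_due(30))
timer.message_received(now=60)
print(timer.expired(149), timer.expired(150))
```

With a 90-second hold time, three keepalives fit into each hold period, so the session survives the loss of one or two of them.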
BGP path attributes
BGP is designed to be extensible, so the base protocol allows for an extensible list of attributes to be attached to a route. BGP doesn't require that every BGP router understand every attribute that is used, but attributes are divided into four categories of how they should be handled:
- Well-known mandatory
- Every BGP router should recognise and process these attributes when received, and should advertise them to neighbors
- Well-known discretionary
- These attributes need not be advertised, but any BGP router should recognise them
- Optional transitive
- If the BGP router doesn't know what to do with this attribute, it will be passed on to its BGP neighbors
- Optional nontransitive
- If the BGP router doesn't know what to do with this attribute, it will be ignored.
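On the wire, the category of an attribute is carried in its flags byte, so a router can decide what to do with an attribute it does not recognise without understanding its contents. The bit values below follow RFC 4271; the function name `handle_unrecognized` and the returned action strings are hypothetical.

```python
OPTIONAL = 0x80     # set for optional attributes, clear for well-known ones
TRANSITIVE = 0x40   # set if the attribute should be passed on to other peers
PARTIAL = 0x20      # set once a router that did not understand it has passed it on

def handle_unrecognized(flags: int):
    """Decide how to treat an attribute this router does not recognise.

    Returns (action, flags to use if the attribute is passed on).
    """
    if not flags & OPTIONAL:
        return "error", flags              # every router must know well-known attributes
    if flags & TRANSITIVE:
        return "forward", flags | PARTIAL  # pass it on, marked as partial
    return "ignore", flags                 # optional non-transitive: drop quietly

print(handle_unrecognized(0x40))
print(handle_unrecognized(0xC0))
print(handle_unrecognized(0x80))
```

This is what allows new optional attributes to be deployed incrementally: routers that predate them either carry them along untouched or drop them, without breaking the session.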
Currently supported attributes include:
- ORIGIN: Whether the route originated from an IGP, an EGP or elsewhere.
- AS_PATH: The list of ASes that the route has been through to reach the current router. Among other things, this enables the BGP router to reject routes that contain its own AS on the AS_PATH, since otherwise it could lead to a routing loop.
- NEXT_HOP: The IP address of the router that should be used as the next hop for this route.
- MULTI_EXIT_DISC: An optional attribute that, if present, can be used by the router to choose between several different entry points to the same AS.
- LOCAL_PREF: This attribute is only included on internal communication between peers within the same AS. It enables the BGP router to choose between external routes to the same subnet by using the route with the higher LOCAL_PREF value.
- ATOMIC_AGGREGATE: This is used when the BGP router has aggregated several routes into one and omitted some ASes from the AS_PATH as a result.
- AGGREGATOR: An optional attribute that can be added by a BGP router to routes where the router has performed route aggregation. The attribute specifies the AS number and IP address of the router that performed the aggregation.
- COMMUNITY: A community value is used to specify a common property that can be applied to a number of routes. Some community values are standardised, but other community numbers can be allocated by any group of BGP ASes that can agree on the standard meaning.
- ORIGINATOR_ID: The originator ID is used within an AS where route reflection is used, to prevent routing loops. The originator ID is a 32-bit value that identifies either the router that injected the route into BGP (as a manually configured BGP prefix, or via redistribution from another protocol) or the border router that received the route via eBGP.
- CLUSTER_LIST: Like the originator ID, this is used to prevent loops when route reflection is being used. It records the list of clusters that the route has passed through, much like the AS_PATH records the list of ASes that a route has passed through.
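Several of these attributes feed into the choice between competing routes to the same destination. The sketch below is a heavily simplified ordering, assuming only three criteria: highest LOCAL_PREF first, then shortest AS_PATH, then lowest MULTI_EXIT_DISC. The real decision process has more steps and restricts when MULTI_EXIT_DISC values may be compared; the dictionary keys and peer names are hypothetical.

```python
def preference_key(route):
    """Sort key: higher LOCAL_PREF, then shorter AS_PATH, then lower MED wins."""
    return (-route["local_pref"], len(route["as_path"]), route["med"])

candidates = [
    {"via": "peer-a", "local_pref": 100, "as_path": [64501, 64510], "med": 10},
    {"via": "peer-b", "local_pref": 200, "as_path": [64502, 64503, 64510], "med": 50},
    {"via": "peer-c", "local_pref": 200, "as_path": [64504, 64505, 64510], "med": 20},
]
best = min(candidates, key=preference_key)
print(best["via"])
```

Although peer-a offers the shortest AS_PATH, it loses on LOCAL_PREF, which is how an operator's local policy can override path length.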
Border Router Selection
On a perimeter network there are generally two border routers, to avoid a single point of failure. Interior routers and hosts on the perimeter network choose a border router to deliver their Internet traffic.
Central Question in Border Router Selection
The central question is how to maintain reliable Internet connectivity as long as at least one border router with a working connection remains. Reliability, complexity, and hardware requirements can be traded off against one another to meet the needs of the network.
Border Router Selection vs. Exit Selection
Exit selection is the process used by BGP to decide which exit from your AS will be used. Border router selection is the process your interior routers and hosts use to pick a border router. Border router selection happens first as a host or interior router must choose a border router. Then the chosen border router decides if the packet should exit through one of its connections or if it should instead be forwarded to another border router for delivery.
Border Router Selection with IBGP
If all interior routers and hosts on the perimeter network run IBGP with the border routers, then the border router selection problem can be neatly solved.
Selecting Border Router with IBGP
IBGP copies the BGP routing table from each border router into the interior routers and hosts. The interior routers and hosts can then always pick the best border router for each destination, as learned via IBGP. However, running BGP adds a lot of complexity to most hosts, and the extra memory and CPU power required by BGP in interior routers may make them substantially more expensive than they would be if they didn't run BGP. Hence, most network designs run BGP only on the border routers and are therefore faced with the border router selection problem. The following sections discuss ways of selecting border routers without using BGP.
Border Router Selection with a Static Route
The simplest way for a host or interior router to choose a border router is to use a static default route. Static routing may lead to "wrong" border router selection.
Host Choosing "Wrong" Border Router
Suppose the host has a static default route pointing at Border RouterB and wants to deliver traffic to a customer of ISPA. Suppose also that ISPA is sending customer routes, so that your AS is aware that the destination is a customer of ISPA. In this case, Border RouterB will have learned ISPA's customer routes via IBGP from Border RouterA. So Border RouterB receives the traffic and immediately forwards it to Border RouterA via the perimeter network. The traffic has traversed the perimeter network twice, wasting bandwidth. But if Border RouterB fails, there is an even higher price to pay.
Host Unreachable from Internet when Border Router Fails
An interior router would likely share an IGP with the border routers, and your IGP should be configured to select a functioning border router with at least one good Internet connection. Your IGP would detect the failure of Border RouterB, so your interior router would use Border RouterA as its default route. The host, however, does not listen to the IGP. It has a static default route pointing at the now dead Border RouterB, so it has lost all Internet connectivity. This is another example of how static routes and reliable networks often don't mix.
Border Router Selection with HSRP
With the help of the Hot Standby Router Protocol (HSRP), two or more routers can dynamically share a single IP address. Hosts that have static default routes pointing at this address will see a reliable exit path from your AS without having to listen to BGP or your IGP. HSRP isn't a routing protocol at all. It's simply a way for routers on the same multi-access network to present a "non-stop" IP address. HSRP has the benefit that it keeps host configuration simple: a commonly used static default is all that's required. It also reacts to failures in a matter of seconds. Here are some examples of HSRP in action.
HSRP with Two Border Routers in Normal Operation
The site has a T3 for its primary Internet connection and a T1 on a different border router for a backup. The perimeter network interface of Border RouterA is configured to have address 10.0.0.253. The perimeter network interface of Border RouterB is configured to have address 10.0.0.254. Since Border RouterA has the primary Internet connection, HSRP on it is configured so that it normally also holds the shared virtual interface address (10.0.0.1) on its perimeter network interface. HSRP on Border RouterB is configured to monitor the health of Border RouterA. When both border routers are operating, Internet traffic from the host follows the static default route toward 10.0.0.1 to Border RouterA and exits on the T3. But suppose Border RouterA fails.
HSRP with Failed Primary Border Router
Within seconds of Border RouterA's failure, Border RouterB's perimeter network interface takes over the shared virtual interface address (10.0.0.1). The static default route in the host now points to Border RouterB with no work on the host's part. Its Internet traffic now exits on the T1 via Border RouterB. Now suppose that the T3 fails but Border RouterA continues to operate. We want Border RouterB to take over the shared virtual address even though Border RouterA is still functioning. This case is handled by configuring Border RouterA to "give up" the address whenever it loses carrier detect on the T3.
HSRP with Failed Primary Internet Connection
This behavior is implemented with a priority system. Border RouterA is configured to lower its priority whenever carrier detect is lost on the T3. Border RouterB seizes control of the shared virtual interface address whenever it notices that its priority is now the highest in the group of routers sharing the address. (Yes, more than two routers can share a single virtual interface address.)
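The priority mechanism described above can be sketched in a few lines of Python. This is a simplified model rather than HSRP itself: the class and field names, the priority values (110 and 100) and the tracking decrement (20) are assumptions chosen for illustration, and the model assumes that a router with a higher priority is allowed to take over the address immediately.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    base_priority: int          # configured priority (illustrative value)
    track_decrement: int = 0    # subtracted when the tracked link is down
    alive: bool = True          # the router itself is running
    link_up: bool = True        # carrier detect on the tracked Internet link

    def priority(self) -> int:
        # The effective priority drops when the tracked link loses carrier.
        if self.link_up:
            return self.base_priority
        return self.base_priority - self.track_decrement


def active_router(group):
    """Return the live router with the highest effective priority, or None."""
    candidates = [r for r in group if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.priority())


router_a = Router("Border RouterA", base_priority=110, track_decrement=20)
router_b = Router("Border RouterB", base_priority=100)
group = [router_a, router_b]

normal = active_router(group).name

router_a.link_up = False            # the T3 loses carrier, RouterA keeps running
after_link_loss = active_router(group).name

router_a.link_up = True
router_a.alive = False              # RouterA fails outright
after_router_loss = active_router(group).name
```

In this model Border RouterA holds the virtual address in normal operation, and Border RouterB holds it both when the T3 loses carrier and when Border RouterA itself fails.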
Limitations of HSRP
- HSRP won't help you if an interface fails to pass data but carrier detect doesn't drop. This type of failure can happen if the line between you and the central office is good but the DAX at the CO fails. BGP will eventually notice this kind of failure and reroute your traffic; it just won't happen with the speed of HSRP.
- HSRP won't help your hosts pick the "optimal" border router. Note that HSRP is available on all Cisco routers, but can have only a single IP address on the lower-end routers (e.g. 1600, 2500, 2600, and 3600 series routers as of this writing).
- HSRP can appear to interfere with outbound load sharing if you're not taking at least customer routes from one of your ISPs.
- HSRP alone isn't sufficient for reliable Internet connectivity. You'll still need to have BGP configured correctly at your border routers and at all your ISPs to retain connectivity in the face of line and/or router failure.
Border Router Selection with Hosts Listening to IGP
HSRP is usually the best way for hosts to select a border router because it recovers quickly from failures and keeps host configuration simple. If you can't use HSRP, the next best choice for selecting a border router is to have hosts that listen to an IGP. It's most common for hosts to be able to listen to RIP, but the slow convergence time of RIP (several minutes) makes it a poor IGP for those interested in reliability. OSPF makes a much better IGP, but is substantially more complicated than RIP.
Border Router Selection and Load Sharing
HSRP does a lot for reliability, but it can work against outbound load sharing in some cases. (Unfortunately, these cases often occur at sites with two T1s and more than one T1's worth of output bandwidth.)
Load Sharing with BGP but Without HSRP
Since both ISPs are sending only default routes, each border router will use its Internet connection for all exit traffic it receives. If each host generates about the same amount of outbound traffic, reasonably good outbound load sharing is achieved. (This might be especially desirable if both hosts together generated more traffic than would fit on either Internet connection individually.) Although the outbound load sharing might be good with this configuration, your outbound traffic might be reaching its destination through some pretty circuitous paths. As a quick reminder, think about what happens to traffic from HostB that is destined for a customer of ISPA. It would have to be carried by at least ISPB (and perhaps several other ASes) before reaching ISPA. If either Internet connection fails, BGP will lose the default route it had heard through that connection. Exit traffic sent to either router will eventually exit on the remaining (working) Internet connection.
Load Sharing with BGP and HSRP
There are two changes that could be made to achieve both reliability and good outbound load sharing:
- The border routers running HSRP could receive at least customer routes from one ISP. But this might require more memory to be added to your border routers.
- More than one HSRP virtual interface address could be used. Higher-end Cisco routers can be configured with two virtual interface addresses on the same physical interface. One of these addresses could be configured to favor Border RouterA in the normal case while the other is configured to favor Border RouterB. Both would be configured to use the remaining working connection in the event of failure. HostA and HostB would then be configured with static default routes toward different HSRP virtual interface addresses. But lower-end Cisco routers support only one HSRP virtual interface address per physical interface. Alternatives here would be upgrading to higher-end routers or using lower-end routers with two interfaces to split the perimeter network.
Switches, Routers, Bridges and LANs/Network Architecture
Switching technologies are crucial to the new network design. Because the prices of layer 2 switching have been dropping dramatically, it is easier to justify the cost of buying switches for your entire network. This doesn't mean that every business can afford switch ports for all users, but it does allow for a cost-effective upgrade solution when the time comes.
Layer 2 Switching
Layer 2 switching is hardware based, which means it uses the Media Access Control (MAC) address from the host's network interface cards (NICs) to filter the network. Switches use Application-Specific Integrated Circuits (ASICs) to build and maintain filter tables. It is OK to think of a layer 2 switch as a multiport bridge. Layer 2 switching provides the following:
- Hardware-based bridging (MAC)
- Wire speed
- High speed
- Low latency
- Low cost
Layer 2 switching is so efficient because there is no modification to the data packet, only to the frame encapsulation of the packet, and only when the data packet is passing through dissimilar media (such as from Ethernet to FDDI). Use layer 2 switching for workgroup connectivity and network segmentation (breaking up collision domains). This allows you to create a flatter network design and one with more network segments than traditional 10BaseT shared networks. Layer 2 switching has helped develop new components in the network infrastructure:
- Servers are no longer distributed to physical locations, because virtual LANs can be used to create broadcast domains in a switched internetwork. This means that all servers can be placed in a central location, yet a certain server can still be part of a workgroup in a remote branch, for example.
- Organization-wide client/server communication based on Web technology becomes possible.
These new technologies are allowing more data to flow off local subnets and onto a routed network, where a router's performance can become the bottleneck.
Limitations of Layer 2 Switching
Layer 2 switches have the same limitations as bridged networks. Remember that bridges are good if you design the network by the 80/20 rule: users spend 80 percent of their time on their local segment.
Bridged networks break up collision domains, but the network is still one large broadcast domain. Similarly, layer 2 switches (bridges) cannot break up broadcast domains, which can cause performance issues and limit the size of your network. Broadcasts and multicasts, along with the slow convergence of spanning tree, can cause major problems as the network grows. Because of these problems, layer 2 switches cannot completely replace routers in the internetwork.
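The broadcast limitation follows directly from how a layer 2 switch builds its filter table. The sketch below is a minimal Python model of a learning switch, not any vendor's implementation; the class name, port numbers and MAC addresses are illustrative assumptions. It learns source addresses, forwards known unicast frames out a single port, and floods broadcasts and unknown destinations out every other port.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}   # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source address, then return the ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcasts and unknown destinations are flooded to all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the segment the frame came from: filter it.
            return []
        return [out_port]


switch = LearningSwitch(ports=[1, 2, 3, 4])
first = switch.receive(1, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")  # unknown destination
reply = switch.receive(2, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")  # learned destination
bcast = switch.receive(2, "aa:aa:aa:aa:aa:02", BROADCAST)            # always flooded
```

However many addresses the switch learns, the broadcast frame still reaches every other port, which is why the whole switched network remains a single broadcast domain.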
Layer 3 Switching
The only difference between a layer 3 switch and a router is the way the administrator creates the physical implementation. Also, traditional routers use microprocessors to make forwarding decisions, whereas the switch performs only hardware-based packet switching. However, some traditional routers can have other hardware functions as well in some of the higher-end models. Layer 3 switches can be placed anywhere in the network because they handle high-performance LAN traffic and can cost-effectively replace routers. Layer 3 switching is all hardware-based packet forwarding, and all packet forwarding is handled by hardware ASICs. Layer 3 switches really are no different functionally from a traditional router and perform the same functions, which are listed here:
- Determine paths based on logical addressing
- Run layer 3 checksums (on header only)
- Use Time to Live (TTL)
- Process and respond to any option information
- Can update Simple Network Management Protocol (SNMP) managers with Management Information Base (MIB) information
- Provide security
The benefits of layer 3 switching include the following:
- Hardware-based packet forwarding
- High-performance packet switching
- High-speed scalability
- Low latency
- Lower per-port cost
- Flow accounting
- Security
- Quality of service (QoS)
Layer 4 Switching
Layer 4 switching is considered a hardware-based layer 3 switching technology that can also consider the application used (for example, Telnet or FTP). Layer 4 switching provides additional routing above layer 3 by using the port numbers found in the Transport layer header to make routing decisions. These port numbers are found in Request for Comments (RFC) 1700 and reference the upper-layer protocol, program, or application. Layer 4 information has been used to help make routing decisions for quite a while. For example, extended access lists can filter packets based on layer 4 port numbers. Another example is accounting information gathered by NetFlow switching in Cisco's higher-end routers.
The largest benefit of layer 4 switching is that the network administrator can configure a layer 4 switch to prioritize data traffic by application, which means a QoS can be defined for each user. For example, a number of users can be defined as a Video group and be assigned more priority, or bandwidth, based on the need for videoconferencing. However, because users can be part of many groups and run many applications, the layer 4 switches must be able to provide a huge filter table or response time would suffer. This filter table must be much larger than that of any layer 2 or 3 switch. A layer 2 switch might have a filter table only as large as the number of users connected to the network, or maybe even smaller if some hubs are used within the switched fabric. However, a layer 4 switch might have five or six entries for each and every device connected to the network! If the layer 4 switch does not have a filter table that includes all the information, the switch will not be able to produce wire-speed results.
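To make the table-size argument concrete, the following Python sketch models a layer 4 filter table as a dictionary keyed on source address and destination port. It is an illustration only: the addresses, the priority class names and the videoconferencing port number are hypothetical, while 23 and 21 are the well-known ports for Telnet and FTP.

```python
# Well-known TCP ports for the applications named in the text.
TELNET, FTP = 23, 21
# Hypothetical port standing in for a videoconferencing application.
VIDEO_PORT = 5004

# Filter table: (source IP, destination port) -> priority class.
filter_table = {
    ("10.1.1.10", VIDEO_PORT): "high",
    ("10.1.1.10", TELNET): "normal",
    ("10.1.1.10", FTP): "low",
    ("10.1.1.11", FTP): "low",
}


def classify(src_ip, dst_port, default="best-effort"):
    """Look up the priority class for a packet, falling back to a default."""
    return filter_table.get((src_ip, dst_port), default)


# Count how many table entries each device needs.
entries_per_host = {}
for (src_ip, _port) in filter_table:
    entries_per_host[src_ip] = entries_per_host.get(src_ip, 0) + 1
```

A layer 2 table would hold one entry per device here, whereas this table already holds three for a single host that runs three applications.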
Multi-Layer Switching (MLS)
Multi-layer switching combines layer 2, 3, and 4 switching technologies and provides high-speed scalability with low latency. It accomplishes this combination of high-speed scalability and low latency by using huge filter tables based on the criteria designed by the network administrator. Multi-layer switching can move traffic at wire speed and also provide layer 3 routing, which can remove the bottleneck from the network routers. This technology is based on the idea of route once, switch many. Multi-layer switching can make routing/switching decisions based on the following:
- MAC source/destination address in a Data Link frame
- IP source/destination address in the Network layer header
- Protocol field in the Network layer header
- Port source/destination numbers in the Transport layer header
There is no performance difference between a layer 3 and a layer 4 switch because the routing/switching is all hardware based.
LAN Switch Types
LAN switching is used to forward or filter frames based on their hardware destination address. Each forwarding method has its advantages and disadvantages, and by understanding the different LAN switch methods available, you can make smart switching decisions. There are three switching modes:
- With the store-and-forward mode, the complete data frame is received into the switch's buffer, a cyclic redundancy check (CRC) is run, and then the destination address is looked up in the MAC filter table.
- With the cut-through mode, the switch waits for only the destination hardware address to be received and then looks up the destination address in the MAC filter table.
- FragmentFree, the default mode for the Catalyst 1900 switch, is sometimes referred to as modified cut-through. It checks the first 64 bytes of a frame for fragmentation (because of possible collisions) before forwarding the frame.
The figure shows the different points in the frame at which each switching mode makes its forwarding decision. The different switching modes are discussed in detail in the following sections.
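The difference between the three modes comes down to how much of the frame the switch must receive before it can start forwarding. The Python sketch below expresses that as a function. The mode names and the function name are illustrative, and byte counts are measured from the start of the destination address, ignoring the preamble.

```python
DEST_MAC_BYTES = 6         # the destination address is the first 6 bytes of the frame
FRAGMENT_FREE_BYTES = 64   # minimum legal Ethernet frame size


def bytes_before_forwarding(mode, frame_length):
    """Return how many bytes must arrive before the switch may begin forwarding."""
    if mode == "cut-through":
        return min(DEST_MAC_BYTES, frame_length)
    if mode == "fragment-free":
        return min(FRAGMENT_FREE_BYTES, frame_length)
    if mode == "store-and-forward":
        # The whole frame is buffered so that the CRC can be checked.
        return frame_length
    raise ValueError(f"unknown switching mode: {mode}")


full_frame = 1518   # maximum untagged Ethernet frame, in bytes
waits = {
    mode: bytes_before_forwarding(mode, full_frame)
    for mode in ("cut-through", "fragment-free", "store-and-forward")
}
```

For a maximum-size frame, cut-through waits for 6 bytes, FragmentFree for 64, and store-and-forward for all 1518, which is the trade-off between latency and error checking that the three modes represent.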
Route Switch Modules (RSMs)
Route Switch Modules (RSMs) are also called internal route processors because the processing of layer 3 packets is internal to a switch. You need to add an RSM to a layer 2 device (for example, a Catalyst 5000 switch) to be able to provide switching of layer 3 packets without a router. An RSM turns a layer 2 switch into a multi-layer switch and can integrate layer 2 and layer 3 functionality in a single box. The 5000 series uses the RSM or a Route Switch Feature Card (RSFC), and the 6000 series uses the Multilayer Switch Module (MSM) to perform this function. The RSM, RSFC, and MSM are configured in exactly the same way on the switch.
The RSM is a module plugged directly into the switch, which runs the Cisco IOS in order to perform inter-VLAN communication. The 5000 series switch sees the RSM as a single trunked port and a single MAC address. In other words, it appears as a router on a stick to the switch. The RSM interface to the switch is through VLAN 0 and VLAN 1. VLAN 0 is not accessible to the administrator. The RSM uses two channels, and VLAN 0 maps to channel 0, which supports communication between the RSM and the Catalyst 5000 series default VLAN (VLAN 1). VLAN 1 maps to channel 1. The MAC address assigned to the RSM is from the Programmable Read Only Memory (PROM) on the line communication processor (LCP). This MAC address is used to identify the slot of the RSM and for diagnostics. The MAC addresses for VLAN 1 are assigned from a PROM that contains 512 MAC addresses. All routing interfaces except VLAN 0 use the base MAC address.
The RSFC is a daughter card for the Supervisor Engine II G and Supervisor III G cards. The RSFC is a fully functioning router running the Cisco IOS.
The MSM uses four full-duplex Gigabit Ethernet interfaces to connect to the switch and looks like an external router to the switch. These four interfaces can be four separate links for four different VLANs, or they can be trunked and configured as one load-balanced link running EtherChannel and ISL or 802.1q. Subinterfaces are then used to configure each VLAN.
Multi-layer Switching (MLS)
Fundamentals of MLS
You have undoubtedly heard of the term "router on a stick." Figure 7.1 depicts the router on a stick architecture.
As you can see from the diagram, there are multiple hosts using two separate VLAN assignments. One segment is running on VLAN10, and the other segment is running on VLAN50. Both VLANs, or segments, are connected to the same switch. The switch is then connected to a router. Here we show an external router, but an RSM provides the same functionality, just internally.
By now you understand that for Host A on VLAN10 to communicate to Host D on VLAN50, packets must be routed through Router A. Because of the VLAN assignments, the switch must send the packet to the router on interface FE0/0.10. The router knows that the route to the network assigned to VLAN50 is through interface FE0/0.50. The packet is then sent back to the switch and forwarded to Host D.
Now back to our original question. Why use MLS? You can see from the diagram in Figure 7.1 that it is very inefficient to have to use a router, or Route Switch Module (RSM), to move a packet from Host A to Host D when they are connected to the same device. MLS is used to bypass the router on subsequent packets of the same flow. A flow is created by using packet header information: Inter-Switch Link (ISL), layer 2, and layer 3 headers. There are several fields within a packet that make it unique:
- Source and destination IP addresses
- Source and destination MAC addresses
- Type of Service (TOS)
- Protocol type (e.g., HTTP, FTP, ICMP)
These are just some of the characteristics of a packet that can be used to establish a flow. A flow is defined by using a specified set of these attributes.
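One way to picture "a specified set of these attributes" is as a key built from the chosen header fields: two packets belong to the same flow when their keys are equal. The Python sketch below illustrates the idea. The function name, the dictionary field names and the sample addresses are assumptions made for the example and do not correspond to any Cisco data structure.

```python
def make_flow_key(packet, attributes):
    """Build a hashable flow key from the chosen header attributes."""
    return tuple(packet[name] for name in attributes)


pkt1 = {
    "src_ip": "10.10.0.5",
    "dst_ip": "10.50.0.9",
    "src_mac": "00:00:0c:00:00:0a",
    "dst_mac": "00:00:0c:00:00:01",
    "tos": 0,
    "protocol": "tcp",
}
pkt2 = dict(pkt1, tos=4)                  # same endpoints, different TOS
pkt3 = dict(pkt1, dst_ip="10.50.0.10")    # different destination

# Two possible flow definitions, from coarse to fine.
by_destination = ("dst_ip",)
by_endpoints_and_tos = ("src_ip", "dst_ip", "tos")

same_coarse_flow = make_flow_key(pkt1, by_destination) == make_flow_key(pkt2, by_destination)
same_fine_flow = make_flow_key(pkt1, by_endpoints_and_tos) == make_flow_key(pkt2, by_endpoints_and_tos)
```

With the coarse definition, pkt1 and pkt2 fall into the same flow. With the finer definition they do not, so the choice of attributes determines how many cache entries the switch has to keep.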
Cisco Catalyst switches require additional hardware to see the packet header information. Catalyst 5000 switches use the NetFlow Feature Card (NFFC) to gather this information and cache it. Catalyst 6000 series switches use the Multilayer Switch Feature Card (MSFC) and the Policy Feature Card (PFC) to gather and cache header information. There is a detailed process, which will be discussed later in the chapter, that allows switches to establish flows. MLS requires three components to function in any network:
- Multilayer Switching Protocol (MLSP) is a protocol that runs on the router and allows it to communicate to the MLS-SE regarding topology or security changes.
- Multilayer Switching Route Processor (MLS-RP) can be an MLS-capable router or an RSM installed in the switch.
- Multilayer Switching Switching Engine (MLS-SE) is an MLS-capable switch (a 5000 with an NFFC or a 6000 with an MSFC and PFC).
Now that you have a basic understanding of what MLS does and what is required for MLS to function in a network, let's get into the nitty-gritty of how it works.
We discussed the three required components of MLS. It is important to understand how they work together to enable layer 3 switching. Let's look at a sample network topology that will support MLS. Figure 7.2 shows a simple architecture of a router and a switch with two connected hosts on the switch. Again, the hosts have different VLAN assignments, requiring the router's intervention to route packets. Notice that the figure depicts the main interface with two subinterfaces, FE0/0.2 and FE0/0.3.
MLS follows a four-step process to establish the layer 3 switching functionality. These four steps can then be broken down into more detailed processes. The four steps required to enable MLS are as follows:
- MLSP discovery: The MLS-RP uses MLSP to send hello packets out all interfaces to discover MLS-SEs and establish MLS-RP/MLS-SE neighbor relationships.
- Identification of candidate packets: The NFFC or PFC watches incoming packets and creates partial cache entries for them, thus identifying the packets as potential candidates for a flow, or candidate packets.
- Identification of enable packets: The NFFC or PFC watches packets coming from the MLS-RP and tries to match them with candidate packet entries. If matches are made, the packets are tagged as enable packets and a shortcut forwarding entry is made in the CAM table.
- Subsequent flow packets are layer 3 switched: Incoming packets are compared against CAM table entries. If the packets match the flow criteria, they are rewritten by the NFFC or PFC, then sent to the corresponding exit port for the flow.
MLSP Discovery
Switches, NFFCs or PFCs specifically, need routers to perform the initial route table lookup and the packet rewrite. This dependency requires that MLS adjacencies are established between the switch and the router. This is accomplished using MLSP. Initially, the router, or MLS-RP, sends hello packets containing all of the MAC addresses and VLANs configured for use on the router. These messages are sent every 15 seconds to a layer 2 multicast address of 01-00-0C-DD-DD-DD. The intended recipients of these hello packets are the MLS-SE devices on the network. When an MLS-SE receives the information, it makes an entry in the CAM table of all the MLS-RP devices in the layer 2 network. Layer 2 is mentioned because MLS-SE devices are not concerned with devices that are not directly connected to layer 2 devices, such as switches. Figure 7.3 depicts the MLSP discovery process. Part of the information that is stored in the CAM table once an MLSP hello packet is received is an ID called an XTAG. The next section describes the significance and purpose of the XTAG.
XTAGs
Simply put, an XTAG is a unique identifier that MLS-SEs (switches) use to keep track of the MLS-RPs in the network. All of the MAC addresses and VLANs in use on the MLS-RP are associated to the XTAG value in the CAM table. In this example, the MSFC has been assigned the XTAG value of 1. The MSFC receives the assignment because the MSFC acts as the MLS-RP. Only one MAC address is associated with XTAG 1. However, there are two VLANs associated with it.
Once MLS-SEs have established CAM entries for MLS-RPs, the switch is ready to start scanning packets and creating cache entries. This was described previously as identification of candidate and enable packets. The cache entries are made in order to maintain flow data. Flow data allows the MLS-SE to rewrite the packets with the new source and destination MAC address and then forward the packets. All this is done without sending the packets to the router for a route lookup and to be rewritten.
Cache entries happen in two steps:
- Candidate packet entries
- Enable packet entries
After these entries have been made in the MLS-SE, subsequent packets are matched against existing flow entries and dealt with accordingly.
Identifying Candidate Packets
The process of identifying candidate packets is quite simple. As has already been established, the MLS-SE has MAC address entries for any and all interfaces that come from the MLS-RP. Using this information, the MLS-SE starts watching for incoming frames destined for any MLS-RP-related MAC addresses.
An incoming frame will match one of the following three criteria:
- Not destined for an MLS-RP MAC address
- Destined for an MLS-RP MAC address, but no cache entry exists for this flow
- Destined for an MLS-RP MAC address, but a cache entry already exists for this flow
Different actions will be taken by the MLS-SE, depending on which criteria match. We will discuss the first one right now. The others will be addressed in the following sections. If the incoming frame is not destined for a MAC address associated with the MLS-RP, no cache entry is made. No cache entry is made because MLS is used to avoid additional route lookups. If the frame is destined to another MAC address in the CAM table, the frame is layer 2 switched. Let's move on to discuss the processes for identifying and acting on the next two criteria. First we'll discuss what happens when an entry already exists. Then we'll cover the details of the cache entry process for a candidate packet. The figure depicts the occurrence of a candidate packet.
Cache Entry Exists
When frames enter the switch destined for an MLS-RP MAC address, the MLS-SE checks to see if a cache entry has been made that matches the attributes of the current packet. As was mentioned briefly previously, each frame has distinguishing characteristics or attributes that allow the MLS-SE to categorize a packet into a flow. The MLS-SE uses these attributes to pattern match. If an incoming packet has the same attributes as an established flow cache entry, the packet is layer 3 switched, or shortcut-switched.
No Cache Entry
When a qualified (destined for an MLS-RP MAC address) incoming frame is compared against the cache and fails (no match is found), a cache entry is made. At this point, the packet is tagged as a candidate packet. Once the cache entry is made, the packet is forwarded to the router (MLS-RP) for normal processing. Here the router performs the route lookup, rewrites the layer 2 header, and sends the packet out the next-hop interface, whichever it may be. The state of the MLS cache is only partial at this stage. A complete flow cache has not been established because the MLS-SE has only seen a packet come in and be forwarded to the router. It still needs to see something that it can tag as an enable packet come back from the router.
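The way a partial cache entry is created and later completed can be sketched as a small state machine. The Python model below is a simplification under stated assumptions: flows are keyed on destination IP address only, the class and method names are hypothetical, and a returning packet completes an entry when it comes from an MLS-RP MAC address with the same XTAG as the candidate entry. Real MLS-SE hardware tracks more attributes than this.

```python
class MlsSwitchingEngine:
    def __init__(self, rp_macs):
        # MLS-RP MAC address -> XTAG, as learned from MLSP hello packets.
        self.rp_xtag = dict(rp_macs)
        # Destination IP -> cache entry (flows keyed on destination IP for brevity).
        self.cache = {}

    def on_frame_from_host(self, dst_mac, dst_ip):
        """Classify an incoming frame according to the three criteria."""
        if dst_mac not in self.rp_xtag:
            return "layer2-switch"          # not destined for an MLS-RP
        entry = self.cache.get(dst_ip)
        if entry is not None and entry["complete"]:
            return "shortcut-switch"        # a full flow entry already exists
        # Candidate packet: record a partial entry, then hand off to the router.
        self.cache[dst_ip] = {"xtag": self.rp_xtag[dst_mac], "complete": False}
        return "forward-to-router"

    def on_frame_from_router(self, src_mac, dst_ip):
        """Try to match a packet coming back from the MLS-RP to a candidate entry."""
        entry = self.cache.get(dst_ip)
        if (src_mac in self.rp_xtag and entry is not None
                and entry["xtag"] == self.rp_xtag[src_mac]):
            entry["complete"] = True        # enable packet: flow entry completed
            return "enable"
        return "no-match"


router_mac = "00:10:0b:aa:00:01"            # illustrative MLS-RP MAC address
se = MlsSwitchingEngine({router_mac: 1})
results = [
    se.on_frame_from_host("00:50:56:00:00:99", "10.50.0.9"),  # not for the router
    se.on_frame_from_host(router_mac, "10.50.0.9"),           # first packet of a flow
    se.on_frame_from_router(router_mac, "10.50.0.9"),         # packet returning from router
    se.on_frame_from_host(router_mac, "10.50.0.9"),           # later packet of the same flow
]
```

The four calls walk through the sequence in order: a frame that is simply layer 2 switched, a candidate packet sent to the router, the returning packet that completes the entry, and a subsequent packet that bypasses the router.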
Identifying Enable Packets
Enable packets are the missing piece of the flow cache puzzle. Just as the MLS-SE watched all incoming frames destined for the MLS-RP MAC addresses, it also watches all of the packets coming from the MLS-RP. It watches these packets hoping for a match with the candidate packet cache entry. If it can make the match, the packet is tagged as an enable packet and the remaining elements of the flow cache are completed in the CAM table. Figure 7.5 depicts the occurrence of an enable packet. The match is made using the following criteria:
- The source MAC address is from an MLS-RP.
- The destination IP matches the destination IP of a candidate packet.
- The source MAC address is associated to the same XTAG value as the candidate packet's destination MAC address.
If all three of these criteria are met, the MLS-SE completes the shortcut cache entry.
Frame Modification
It is important to understand that this shortcut switching occurs at layer 3. The layer 2 frame is rewritten by the switch. Normally, a router (layer 3 device) would rewrite the frame with the necessary information. A rewrite consists of changing the VLAN assignment, the source and destination MAC addresses, and the checksums. The MLS-SE can also modify the TTL, checksums, TOS, and encapsulation. Because MLS packets are no longer sent to the router, the MLS-SE must perform the rewrite function. When it changes the source and destination MAC address, the MLS-SE uses the MAC address of the MLS-RP for the source, and it changes the destination MAC to the MAC of the directly connected host. Through this procedure, the frame appears to the destination host as if it had come through the router. Figure 7.6 depicts the differences between the incoming frame and the exiting frame.
Subsequent Packets
Once the candidate and enable packets have been identified and a shortcut, or flow cache, has been established, subsequent packets are forwarded by the switch to the destination without the use of the router. Because the MLS-SE has the capability to rewrite the frames, it can make the necessary modifications and forward the frame directly to the destination host. The MLS-SE stores the necessary information in cache, such as the source and destination IP addresses, the source and destination MAC addresses, and the MLS-RP-related MAC addresses. Using this information, the MLS-SE is then capable of identifying packets belonging to a specific flow, rewriting the frame, and forwarding the packets to the proper destination.
Disabling MLS
There is a right way and a wrong way (not necessarily wrong, just unwanted) to disable MLS on a router or switch. Both methods will be discussed here.
The Right Way
The normal, and correct, way to disable MLS depends on the equipment you are using. Disabling MLS on a router can be paralleled with disabling MLS on an MSFC for a 6500 series switch. The command is even the same: no mls rp ip, issued from the interface on either the router or the MSFC. To disable it completely, you can issue the same command from the global configuration mode. The consequences of this action vary depending on the system on which it is issued. When the command is issued on the router, the router alone disables MLS. When it's issued on an MSFC, MLS is disabled on the MSFC and the switch itself. That's why there is a difference when different switches are used. When you're using a 5000 series switch, MLS is disabled by default. However, on a 6000 series switch, MLS is enabled by default. To disable MLS on a 5000 series switch, use the set mls disable command. On a 6000 series, MLS should be disabled by issuing the no ip mls command on the MSFC.
The Wrong Way
There are several ways to inadvertently disable MLS on switches. Some are temporary, and others are permanent. Here is a list of MSFC/router commands that can disable MLS:
- no ip routing
- ip security
- ip tcp compression-connections
- ip tcp header-compression
- clear ip route
By disabling IP routing on the MSFC or router, you automatically disable MLS. IP security disables MLS on the interface to which the command is applied. The same results occur with the IP TCP compression commands. Finally, the clear ip route command simply clears the MLS cache entries, and the flow caches must be reestablished.