Switches, Routers, Bridges and LANs/Routers/BGP

An Exterior Gateway Protocol is any protocol that is used to pass routing information between two autonomous systems (AS), i.e. between networks that aren't under the control of a single common administrator. BGP is currently the de facto standard for exterior routing on the internet. The current version of BGP is v4, which has been in use since 1994, all earlier versions now being obsolete.

When two autonomous systems agree to exchange routing information then two routers that are used for exchanging information using BGP are known as BGP peers. As a router speaking BGP communicates with a peer in another AS which is near to the edge of the AS, this is referred to as a border gateway or border router.

BGP is a path vector protocol. Like distance vector protocols (and unlike link-state networks such as OSPF), it doesn't attempt to map the entire network. Instead, it maintains a database of the cost to access each subnet it knows about, and chooses the route that has the lowest cost. However, instead of storing a single cost, it keeps the entire path used to access each network (not with every single hop on the path, but with a list of ASes that the path passes through). This means that routing loops can be eliminated, which can be hard to ensure in simpler distance vector protocols like RIP, while still allowing the protocol to scale to the level of the entire internet.

Three types of activities involved in route advertisement are as follows:

It receives and filters route advertisements from directly attached neighbors. Received routes with paths that contain the router's own AS number are rejected, to avoid creating routing loops.
It selects the route. BGP router may receive several route advertisements to the same destination, and by default chooses a single route from among them as the preferred route (ECMP extends this to allow traffic to be load-balanced across several paths with equal cost).
It also sends route advertisements to its neighbors.

BGP was originally specified to advertise IPv4 routes only, but the multi-protocol extensions in version 4 of the BGP protocol allow routes in other address families to be shared via BGP. In particular, BGP can be used to share IPv6 routes. The transport protocol that the BGP peers use to communicate is typically IPv4, but can be IPv6 or indeed any other protocol. In keeping with the layered networking model, BGP specifies the packets to be exchanged but doesn't rely on any details of how the packets are transferred.

Internal and External BGP

BGP is the de facto exterior gateway protocol, so routers that connect networks to the internet have to speak it. But that isn't the end of the story. In the simplest case, each AS would have one border router that shared routes with the outside world, and all routing inside the AS would be done with an interior protocol (OSPF, RIP, etc.) However, larger networks will rarely have a single border router for the entire AS. As a result, routes that are received by one border router need to be propagated over to the other border router in order to be shared with the peers of that router.

One way to do this would be to insert the received routes into the existing interior protocol (OSPF, say), and use the interior protocol to propagate them to the other border router. The second border router could then redistribute them back into BGP and out onto the open internet. However, this has some drawbacks. Most seriously, the AS_PATH information from the route is lost (since OSPF and other protocols don't know how to share an AS_PATH) so the main method by which routing loops are eliminated is unable to operate. In addition, interior protocols are not designed to cope with the sheer volume of routes on the internet.

A more scalable alternative it to use internal BGP between the routers within the AS. This propagates routes between the routers in a similar way to external BGP, except that AS_PATHs are not appended to.

Route Reflection

One problem with using iBGP within an AS is that the BGP routers have to be connected in a full mesh. That is, every router has to be connected to every other router within the AS. The reason for this is that although routers using iBGP will pass on to their peers routes that have been learned via eBGP, these routes only travel one hop within the AS. No route that is received via iBGP is passed on to another peer via iBGP. This is because the AS_PATH information can't be used to eliminate routing loops.

The reason why it's undesirable to have to connect all the routers in a full mesh is that the number of peerings gets very large when a large number of routers are in operation: ${\frac {1}{2}}n(n-1)$ connections are necessary to make a mesh of $n$ routers. Each connection takes up CPU, memory and network bandwidth resources.

An alternative is to use route reflection. This allows one or more routers in the AS to readvertise iBGP routes, and avoids the possibility of routing loops by placing constraints on which routes can be readvertised.

To use route reflection, one or more routers are designated as route reflectors. Each route reflector divides its peerings into client peers and non-client peers. The route reflector will reflect routs between one group and the other, and between client peers. Non-client peers must be fully meshed.

Route Aggregation

BGP Functionality and Message Types

BGP peers perform three basic functions as follows:

Initial peer acquisition and authentication: the two peers establish a TCP connection and perform a message exchange that guarantees both sides have agreed to communicate.
Both side sends positive or negative reachability information: it will advertise a network as unreachable if one or more neighbors are no longer reachable, and no backup route is available for the routes in question.
Ongoing verification: It provides ongoing verification that the peers and the network connections between them are functioning correctly.

The BGP message types are:

OPEN	A soon as two BGP peers establish a TCP connection, they each send an OPEN message to declare their autonomous system number and establish other operating parameters. An OPEN message contains a suggested length for the hold timer, which is the maximum number of seconds which may elapse between the receipt of two successive messages. On receiving an OPEN message, the receiver replies with KEEPALIVE.
UPDATE	After TCP connection and the sending and receiving of OPEN and acknowledgement, peers use UPDATE to advertise the new destinations that are reachable or withdraw previous advertisement.
NOTIFICATION	This BGP message is used to inform a peer that an error has been detected or sender is about to close the BGP session.
KEEPALIVE	This is used to test network connectivity and to verify that both peers continue to function. BGP uses TCP for transport, and TCP does not include a mechanism to continually test whether a connection endpoints is reachable. Both sides sends KEEPALIVE so that they know if the TCP connection fails. The KEEPALIVE message is as short as possible, so as not to waste bandwidth.

The BGP state machine

BGP packet formats

The Message Header

Each BGP packet starts with a fixed-size header.

Marker: This field is for backward compatibility. It is 16 bytes of all ones.
Length: This is a 2-byte unsigned integer that specifies the length of the packet.
Type: This is one byte that specifies the type of the message: Open, Update, Notification or Keepalive.

The OPEN message

The OPEN message is the first message that each router sends to the other on a newly established peering. A successful OPEN message is acknowledged by sending back a KEEPALIVE message.

The BGP identifier is included in the OPEN message. This is a 4-byte number that must uniquely identify the router on the network. This must be the IPv4 address of one of the interfaces on the router. In theory, a router may not have any IPv4 addresses if it is only being used for IPv6, but in practice this rarely happens and doesn't matter anyway since any unique 4-byte value can be used.

The OPEN message can contain optional parameters. If it contains any parameters, then the Optional Parameter Length field will be set to a non-zero value to indicate the length. Each parameter in the parameters field is encoded as a group of three values: parameter type (1 byte), parameter length (1 byte) and a variable length (up to 255 bytes) field for the parameter.

The UPDATE message

An UPDATE message is sent from one peer to another to carry new information about the network: routes that are newly available and routes that are no longer available.

The NOTIFICATION message

The KEEPALIVE message

Since BGP doesn't rely on any details of the transport protocol that is used to carry packets between the peers, it can't make use of TCP to detect when a peer has become unavailable. Therefore the protocol requires that regular keepalive packets are sent between the peers. The hold timer is reset every time a packet is received, and the connection is closed if the timer runs out. UPDATE and NOTIFICATION packets also reset the timer, but if no other packet has been sent then the peer must send a KEEPALIVE packet. The keepalive packet is typically sent at one third of the hold time, in order to strike a balance between not flooding the network and ensuring that a single dropped packet doesn't cause the connection to be torn down.

BGP path attributes

BGP is designed to be extensible, so the base protocol allows for an extensible list of attributes to be attached to a route. BGP doesn't require that every BGP router understand every attribute that is used, but attributes are divided into four categories of how they should be handled:

Well-known mandatory: Every BGP router should recognise and process these attributes when received, and should advertise them to neighbors
Well-known discretionary: These attributes need not be advertised, but any BGP router should recognise them
Optional transitive: If the BGP router doesn't know what to do with this attribute, it will be passed on to its BGP neighbors
Optional nontransitive: If the BGP router doesn't know what to do with this attribute, it will be ignored.

Currently supported attributes include:

ORIGIN: Whether the route originated from an IGP, an EGP or elsewhere
AS_PATH: The list of ASes that the route has been through to reach the current router. Among other things, this enables the BGP router to reject routes that contain its own AS on the AS_PATH, since otherwise it could lead to a routing loop.
NEXT_HOP: The IP address of the router that should be used as the next hop for this route
MULTI_EXIT_DISC: An optional attribute that, if present, can be used by the router to choose between several different entry points to the same AS.
LOCAL_PREF: This attribute is only included on internal communication between peers within the same AS. It enables the BGP router to choose between external routes to the same subnet by using the route with the higher LOCAL_PREF value.
ATOMIC_AGGREGATE: This is used when the BGP router has aggregated several routes into one and omitted some ASes from the AS_PATH as a result.
AGGREGATOR: An option attribute that can be added by a BGP router to routes where the router has performed route aggregation. The attribute specifies the AS number and IP address of the router that performed aggregation.
COMMUNITY: A community value is used to specify a common property that can be applied to a number of routes. Some community values are standardised, but other community numbers can be allocated by any group of BGP ASes that can agree on the standard meaning.
ORIGINATOR_ID: The originator ID is used within an AS where route reflection is used to prevent routing loops. The originator ID is a 32-bit value that is either the router that injected the route into BGP (as a manually configured BGP prefix, or via redistribution from another protocol) or the border router that received the route via eBGP.
CLUSTER_LIST: Like the originator ID, this is used to prevent loops when route reflection is being used. It records the list of clusters that the route has passed through, much like the AS_PATH records the list of ASes that a route has passed through.