Identifying BGP instability
There are many elements that contribute to the availability of services delivered across any network. Redundancy in the topology, coupled with resilience in the configuration, is key. Routing protocols manage that redundancy and fail traffic over to backup paths should the active path fail. For this to work successfully, it is vital that the environment remains as stable as possible and is not subject to constant change. IP Fabric can help you by analyzing routing protocol stability and pinpointing issues.
As we know, instability in BGP peering can cause performance problems in large networks. Events such as link failures can trigger sequences of updates along paths in the network, which can cause:
- temporary forwarding loops for affected prefixes during convergence;
- bandwidth consumption spikes due to increased communication activity between network nodes;
- CPU utilization increases in network nodes due to the requirement to process large amounts of ever-changing data.
As an example, the Internet routing table has grown so large that it never fully converges. This is a symptom of churn (the number of updates and withdrawals) caused by link events within and between autonomous systems (ASs), which, of course, occur around the clock!
In an enterprise, BGP is usually used to connect networks that are managed by different organizations or by different parts of the same organization. It follows that, once established, these connections should stay up. Fluctuations in that connectivity can have far-reaching consequences, so it pays to keep track of the stability of the peering.
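Churn is easy to quantify once you have a feed of BGP events. The sketch below assumes a hypothetical list of events, for example exported from a route collector. `BgpEvent`, its fields and both helper functions are illustrative names, not part of any real library. It counts announcements and withdrawals per time window and shows which prefixes generate the most activity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpEvent:
    """A single BGP update seen by a (hypothetical) route collector."""
    timestamp: int  # seconds since the start of the observation period
    kind: str       # "announce" or "withdraw"
    prefix: str


def churn_per_window(events, window_seconds=3600):
    """Count announcements and withdrawals in fixed-size time windows.

    Returns a dict mapping window index -> Counter of event kinds.
    """
    windows = {}
    for event in events:
        index = event.timestamp // window_seconds
        windows.setdefault(index, Counter())[event.kind] += 1
    return windows


def noisiest_prefixes(events, top_n=3):
    """Return the prefixes responsible for the most updates and withdrawals."""
    return Counter(event.prefix for event in events).most_common(top_n)


if __name__ == "__main__":
    # Illustrative events only: one prefix flapping, another announced once.
    sample = [
        BgpEvent(10, "announce", "198.51.100.0/24"),
        BgpEvent(20, "withdraw", "198.51.100.0/24"),
        BgpEvent(3700, "announce", "198.51.100.0/24"),
        BgpEvent(3800, "announce", "192.0.2.0/24"),
    ]
    for window, counts in sorted(churn_per_window(sample).items()):
        print(window, dict(counts))
    print(noisiest_prefixes(sample, top_n=1))
```

A prefix that keeps reappearing at the top of the list is a good place to start looking for an unstable link or peering.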
How stable is stable?
But how do we measure that stability? One approach is to focus on two elements, answering two questions for each BGP peering relationship in the network:
- what state is the peering in? Is it fully established? And if so …
- how long has the peer relationship been in the established state? A short time might indicate that it is regularly cycling.
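The two questions above can be turned into a simple classification. Below is a minimal sketch, assuming each peering is available as a record with its current BGP state and the time spent in that state. The `Peering` record, the `assess` function and the 24-hour threshold are all illustrative assumptions, not values taken from any standard or product.

```python
from dataclasses import dataclass

# Illustrative threshold: sessions established for less than a day are
# flagged for a closer look. Tune this for your own network.
DEFAULT_MIN_UPTIME_SECONDS = 24 * 3600


@dataclass
class Peering:
    """A hypothetical record of one BGP session as seen from one device."""
    device: str
    neighbor_ip: str
    state: str           # BGP FSM state, e.g. "Established", "Active", "Idle"
    uptime_seconds: int  # time spent in the current state


def assess(peering, min_uptime=DEFAULT_MIN_UPTIME_SECONDS):
    """Answer the two stability questions for a single peering."""
    # Question 1: is the peering fully established?
    if peering.state != "Established":
        return "DOWN"
    # Question 2: has it been established for long? A short uptime may
    # mean the session was recently reset, or that it keeps cycling.
    if peering.uptime_seconds < min_uptime:
        return "SUSPECT"
    return "STABLE"


if __name__ == "__main__":
    peerings = [
        Peering("R1", "10.0.0.2", "Established", 40 * 86400),
        Peering("R1", "10.0.0.6", "Established", 300),
        Peering("R2", "10.0.0.10", "Active", 7200),
    ]
    for p in peerings:
        print(p.device, p.neighbor_ip, assess(p))
```

A single snapshot cannot tell a one-off reset from a session that keeps flapping. Repeating the check at regular intervals, or reading a flap counter where the platform exposes one, gives a clearer picture.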
To check this manually, a network analyst might have to:
- log in to each router, Layer 3 switch, and firewall in the network;
- using the appropriate vendor CLI commands, establish whether BGP sessions are configured;
- check the state of each BGP peering on the node;
- record the results in a spreadsheet;
- hand the details over to a more experienced engineer to analyze.
Or you could spend the time to write the scripts and develop the tooling to automate the process so you can repeat the checks at regular intervals.
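Here is a minimal sketch of the parsing step for a single vendor. It assumes Cisco IOS-style `show ip bgp summary` output that has already been collected as text; how you collect it (for example over SSH) is left out. The column positions and the Up/Down time formats (`hh:mm:ss`, `1d02h`, `1w2d`, `1y2w`, `never`) are the ones commonly seen on IOS, so check them against your own platforms and versions. The function names are hypothetical.

```python
import ipaddress
import re

# Up/Down formats commonly seen on Cisco IOS; verify for your platforms.
_UPTIME_PATTERNS = [
    (re.compile(r"^(\d+):(\d+):(\d+)$"), (3600, 60, 1)),   # hh:mm:ss
    (re.compile(r"^(\d+)d(\d+)h$"), (86400, 3600)),        # days, hours
    (re.compile(r"^(\d+)w(\d+)d$"), (604800, 86400)),      # weeks, days
    (re.compile(r"^(\d+)y(\d+)w$"), (365 * 86400, 604800)),  # years, weeks
]


def parse_uptime(text):
    """Convert an Up/Down string to seconds; None for 'never' or unknown."""
    for pattern, multipliers in _UPTIME_PATTERNS:
        match = pattern.match(text)
        if match:
            return sum(int(g) * m for g, m in zip(match.groups(), multipliers))
    return None


def parse_bgp_summary(output):
    """Extract IPv4 neighbor rows from 'show ip bgp summary' output."""
    peers = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 10:
            continue
        try:
            ipaddress.IPv4Address(tokens[0])  # skips headers and banner lines
        except ValueError:
            continue
        # State/PfxRcd: a number means Established (prefixes received);
        # otherwise it holds the state name, e.g. "Active" or "Idle (Admin)".
        state_field = " ".join(tokens[9:])
        established = state_field.isdigit()
        peers.append({
            "neighbor": tokens[0],
            "remote_as": tokens[2],
            "uptime_seconds": parse_uptime(tokens[8]),
            "state": "Established" if established else state_field,
            "prefixes_received": int(state_field) if established else None,
        })
    return peers
```

IOS may wrap a row onto two lines when the neighbor address is long (for example IPv6), which this sketch does not handle. Each additional vendor also needs its own parser, and that is where the maintenance effort of home-grown tooling tends to grow.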
Let IP Fabric have a go
Alternatively, you could give the job to IP Fabric.
IP Fabric analyzes the configuration and operational state of the devices in the network and records them in a vendor-agnostic form in its database. It then runs 120+ standard validation checks and presents the results on the product dashboard. These checks include identifying BGP peerings across platforms and vendors and checking how long each peering has been established:
and checking their current state:
Clicking through the dashboard on peerings in the Active state (sessions that are not established and are still trying to open a connection to the neighbor) shows a table with the details of those peerings, so you have everything to hand.
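Grouping peerings by state across vendors works because the states are stored in a vendor-agnostic form. IP Fabric's internal data model is not described here. The general idea can be illustrated by mapping each vendor's spelling of the session state onto the six states of the BGP finite state machine defined in RFC 4271. This is a minimal sketch. The alias table and the `normalize_state` helper are assumptions that you would extend for the platforms in your own network.

```python
from enum import Enum


class BgpState(Enum):
    """The six states of the BGP finite state machine (RFC 4271, section 8)."""
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# Vendor spellings that differ from the RFC names. This table is an
# assumption to be extended for the platforms in your own network.
ALIASES = {
    "establ": BgpState.ESTABLISHED,  # abbreviation shown by Junos "show bgp summary"
}


def normalize_state(raw):
    """Map a vendor's state string to (BgpState, admin_down)."""
    text = raw.strip()
    if not text:
        raise ValueError("empty BGP state")
    # In Cisco IOS "show ip bgp summary", a number in the State/PfxRcd column
    # means the session is Established (the number is prefixes received).
    if text.isdigit():
        return BgpState.ESTABLISHED, False
    admin_down = "admin" in text.lower()  # e.g. IOS "Idle (Admin)"
    token = text.split()[0].lower()
    for state in BgpState:
        if state.value.lower() == token:
            return state, admin_down
    if token in ALIASES:
        return ALIASES[token], admin_down
    raise ValueError(f"unrecognized BGP state: {raw!r}")
```

Raising an error on unknown strings, rather than guessing, makes gaps in the alias table visible as soon as a new platform is added.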
Going a step further, click the site location in the table to see the topology containing the peering in question, drawn from the “live” documentation:
Next, we focus on the BGP topology by disabling all other protocols and enabling the BGP Compliance intent verification check.
We can see that the platform has highlighted a problem with L64R7. When we select that router, IP Fabric presents information on its problematic peering with L64R4. The implication here is that L64R4 is not configured to peer with L64R7.
The arrows in the diagram show that the peering appears to be configured in one direction but not the other. The table also suggests that no IP address is assigned to the peering. On inspecting the routers, we can see that L64R7 looks fine:
but the peering is disabled on L64R4:
IP Fabric has therefore allowed us to drill down and reach a conclusion far more quickly than we could by manually extracting the details, analyzing them and troubleshooting.
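The underlying check, finding peerings that are configured in one direction only, can be sketched as follows. The sketch assumes the hard part is already done: each configured neighbor statement has been resolved to a (local router, remote router) pair and marked as enabled or shut down. That data shape and the `find_one_sided_peerings` function are hypothetical. The router names simply mirror the scenario above.

```python
def find_one_sided_peerings(sessions):
    """Return enabled sessions whose reverse direction is missing or disabled.

    `sessions` maps (local_router, remote_router) -> True if the neighbor is
    configured and enabled on local_router, False if configured but shut down.
    """
    problems = []
    for (local, remote), enabled in sorted(sessions.items()):
        if not enabled:
            # A disabled side is reported from the enabled side, if any.
            continue
        reverse = sessions.get((remote, local))
        if reverse is None:
            problems.append((local, remote, "not configured on " + remote))
        elif not reverse:
            problems.append((local, remote, "disabled on " + remote))
    return problems


if __name__ == "__main__":
    # Mirrors the example: L64R7 peers with L64R4, which has the peer disabled.
    sessions = {
        ("L64R7", "L64R4"): True,
        ("L64R4", "L64R7"): False,
        ("L64R1", "L64R2"): True,
        ("L64R2", "L64R1"): True,
    }
    for problem in find_one_sided_peerings(sessions):
        print(problem)
```

Keying by router name keeps the example short. In practice, neighbors are configured by IP address, and resolving those addresses to devices across vendors is exactly the work a tool with a complete network model does for you.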
If you have found this article helpful, please follow IP Fabric on LinkedIn or on our blog, where we will be publishing more content. If you would like to test our solution to see for yourself how IP Fabric can help you manage your network more effectively, please contact us through www.ipfabric.io.