OSPF Path Cost Consistency
What does OSPF do for me?
Open Shortest Path First (OSPF) is a standards-based routing protocol for IPv4. Routers use it to advertise the network prefixes they can reach. As a router receives these advertisements, it builds a representation of the network topology in a database. It then runs the shortest-path algorithm developed by Edsger Dijkstra against that database to find the lowest-cost path to each advertising node, and uses the result as a candidate path for its routing table.
For OSPF to represent the topology accurately, each link in the network needs to have a cost associated with it. OSPF uses costs that are inversely proportional to the bandwidth of the link. As a result, higher-bandwidth links receive lower costs and are preferred over lower-bandwidth ones.
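As a rough illustration of that path calculation, here is a minimal sketch of Dijkstra's algorithm in Python. The three-router topology, the router names and the link costs are all hypothetical, and a real OSPF implementation works on its link-state database rather than a simple dictionary like this one.

```python
import heapq


def shortest_path_costs(graph, source):
    """Return the lowest total cost from source to every reachable node."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


# Hypothetical topology: each value is the cost of the outgoing link.
topology = {
    "R1": {"R2": 10, "R3": 100},
    "R2": {"R1": 10, "R3": 10},
    "R3": {"R1": 100, "R2": 10},
}

costs_from_r1 = shortest_path_costs(topology, "R1")
print(costs_from_r1)
```

In this sketch R1 reaches R3 through R2 at a total cost of 20, rather than over the direct link that costs 100.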
How can that go wrong?
The representation of the network is incorrect when links are not correctly costed. This means that the algorithm cannot be trusted to give you the “best” path through the network, because a lower-bandwidth link may end up preferred over a higher-bandwidth one. Traffic is then diverted over sub-optimal or unintended paths towards its destination. At best, this makes traffic flows unpredictable, which is a significant issue for planning and troubleshooting.
If you stick to the default configuration and have a single-vendor network, then this situation should not arise. There are, however, a couple of causes of this situation that are common in complex networks.
When is cost not the cost?
Firstly, let us consider the cost value itself. As previously mentioned, the calculated cost is based on the bandwidth of the link. But with technology improving all the time, link capacity keeps increasing, and we reach a point where the cost can no longer differentiate between high-speed links.
An interface cost is calculated using the following formula:
Cost = Reference bandwidth / Link bandwidth
If the reference bandwidth were 10 Gbps, then a 1G link would have a cost of 10, a 100M link would have a cost of 100, and so on. The cost cannot fall below 1, so any link whose bandwidth is equal to or greater than the reference bandwidth has a cost of 1. As far as OSPF is concerned, in this situation a 100G link would have the same cost as a 10G link.
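The arithmetic can be checked with a short Python sketch. The function name `ospf_cost` is made up for illustration, and truncating the division to a whole number is an assumption about how the result is rounded. The upper limit of 65535 on an interface cost is left out for brevity.

```python
MBPS = 1_000_000
GBPS = 1_000 * MBPS


def ospf_cost(reference_bps, link_bps):
    """Reference bandwidth divided by link bandwidth, never lower than 1.

    Assumption: the division is truncated to a whole number.
    """
    return max(1, reference_bps // link_bps)


reference = 10 * GBPS  # the 10 Gbps reference bandwidth used in the text
link_speeds = [("100M", 100 * MBPS), ("1G", 1 * GBPS), ("10G", 10 * GBPS), ("100G", 100 * GBPS)]
costs = {name: ospf_cost(reference, speed) for name, speed in link_speeds}

for name, cost in costs.items():
    print(f"{name} link: cost {cost}")
```

With a 10 Gbps reference, the 10G and 100G links both come out at a cost of 1, which is the loss of differentiation described above.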
Up until fairly recently, Juniper JunOS had a default reference bandwidth value of 100M. With that default, any link of 100M or above had the same OSPF cost of 1. Compare that with Cisco NX-OS, for which the reference bandwidth currently defaults to 40G. While a 100G link would still have a cost of 1, a 10G link would now have a cost of 4. This at least improves the accuracy of the topology used by the path calculation algorithm.
What happens when you have a mixture of Juniper and Nexus products? Without normalising your reference bandwidths, you lose consistency in the topology representation altogether! The nodes at either end of a link can calculate different costs for it, and your path calculations can no longer be trusted to be optimal.
This situation arises when you allow the network node to automatically calculate the path cost based on its understanding of the bandwidth available.
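To make the mismatch concrete, the sketch below costs the same hypothetical 1G link twice, once with the 100M reference bandwidth and once with the 40G reference bandwidth quoted above. The function name and the truncating division are assumptions made for illustration.

```python
MBPS = 1_000_000
GBPS = 1_000 * MBPS


def ospf_cost(reference_bps, link_bps):
    """Reference bandwidth divided by link bandwidth, never lower than 1."""
    return max(1, reference_bps // link_bps)


link_speed = 1 * GBPS  # hypothetical 1G link between two devices

cost_with_100m_reference = ospf_cost(100 * MBPS, link_speed)
cost_with_40g_reference = ospf_cost(40 * GBPS, link_speed)

print("End using a 100M reference:", cost_with_100m_reference)
print("End using a 40G reference:", cost_with_40g_reference)
```

The two ends disagree about the same link, with a cost of 1 at one end and 40 at the other.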
Alternatively you, the network administrator, might decide that the routing policy dictates you choose one path over another. For example, you may have an expensive link charged back to you per KB of traffic passed. In this case you might choose to use that as a backup link. One way to achieve that might be to give the link a manual cost much higher than the automatic one.
The danger of mismatched link costs is significant here too. Unless you have a good way of documenting and tracking such manual changes, these can easily be overlooked.
Jim is miffed. He has been struggling for weeks with a problem where traffic takes the wrong path through his network. After hours of painstaking troubleshooting, he’s found an issue with a link between an IOS router and a Nexus switch. The two devices are calculating different costs for the same link, and so traffic is following the wrong path.
He’s particularly annoyed, as he has 250 sites with similar equipment and topologies. He doesn’t know if he has the same problem at them all or not!
There really is no easy way of manually checking the consistency of these link costs. Jim is going to have to:
- talk to one of the other engineers to get the commands he needs to retrieve information from the Nexus switches;
- log into every single one of the IOS routers and check which interfaces are enabled in OSPF;
- check the path costs for those interfaces and whether they are manual or automatically calculated;
- log into every one of the connected Nexus switches and do the same, recording the results;
- and once he has collected the data, he can put it all into a spreadsheet and analyse it.
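The final analysis step could look something like the Python sketch below, assuming Jim has already recorded the cost at each end of every link. The device names, interface names and costs are invented for illustration, and collecting the data from 250 sites is the hard part that this sketch leaves out.

```python
from collections import namedtuple

# One end of a link as Jim might record it in his spreadsheet.
LinkEnd = namedtuple("LinkEnd", "device interface cost")


def find_cost_mismatches(links):
    """Return the links whose two ends report different OSPF costs."""
    return [(a, b) for a, b in links if a.cost != b.cost]


# Hypothetical collected data: one tuple per link, one LinkEnd per side.
collected = [
    (LinkEnd("site1-rtr", "Gi0/1", 1), LinkEnd("site1-sw", "Eth1/1", 40)),
    (LinkEnd("site2-rtr", "Gi0/1", 40), LinkEnd("site2-sw", "Eth1/1", 40)),
]

mismatches = find_cost_mismatches(collected)
for a, b in mismatches:
    print(f"{a.device} {a.interface} cost {a.cost} != {b.device} {b.interface} cost {b.cost}")
```

Only the site1 link is reported, because its two ends disagree on the cost.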
Let IP Fabric have a go
Alternatively he could give the job to IP Fabric.
When the platform carries out its discovery, it captures configuration and state information for every node, regardless of vendor. It will retrieve inventory, interface information, configuration and operational state. It then places all those elements in a database, then analyses them to understand connections between nodes. IP Fabric is then aware of how every node in the network is connected.
Jim creates the snapshot, then checks the results of the verification checks on the dashboard:
Under Neighbourship Compliance, the OSPF Cost Consistency check verifies that costs are the same at either end of each OSPF neighbour relationship. Clicking through the amber indication brings up the table of inconsistencies, and amongst those he can see entries showing that one end, an IOS router, has a cost of 1, while the other, an NX-OS switch, has a cost of 40. This table gives Jim all the information he needs to raise changes to fix the issue.