Change verifications using snapshots and intent-based networking

Change management from the engineering point of view: what is needed for a change or migration in a computer network project, and how we can mitigate the risks.

Transcript

Hello, and welcome to the IP Fabric webinar about change verification using snapshots and intent-based networking. Today we are going to talk about change management from the engineering point of view: what is needed to prepare the change and verify the change, and how we can mitigate the risks that are associated with these changes. And how can intent-based networking help you in continuous digital transformation across your network? So first of all, I wanted to talk about the requirements. What do we need in change management?

Essentially, we want to plan, prepare, execute, and report the results of that change. We need to answer a number of questions in the planning phase, such as what is actually being changed. We need to thoroughly understand the technology that's being changed, how the change should proceed, and what its potential effects are. We also need to assess what is actually being impacted. Will there be any outages during the change?

What is the intended result of the change? Do we expect a service to be improved? We also need to understand what the associated risks are. The network by itself can work without a hiccup until a change occurs. And then when a change occurs, it can be far-reaching and result in unintended consequences.

We want to control the network infrastructure so that when we change one part of the network, it doesn't adversely impact another part of the network. For example, if we are changing a routing policy on just one single site of a global network, that routing policy at one remote location can actually propagate an incorrect route and cause a black hole for a wide range of networks, which could adversely impact the whole network worldwide. It's the nightmare of every engineer that a simple change causes a global network outage. And this isn't an isolated case, because such issues occur rather frequently. So what is the risk with change management?

It's that we miss something that we should have known beforehand, because of immense network complexity. There is a wealth of network state data that needs to be gathered and analyzed, and in this phase it's very easy to miss something. Not only from the point of view that you miss some systems that would be impacted by the change and would go offline during the change, but also from the point of view that a technology does not operate as you expected it to operate before the change. So the big question is then, how can we reduce this risk?

How can we ensure that there will be no adverse impact? Our answer to this is in-depth snapshots and intent-based analytics, where we ensure that all of the network state data is gathered before and after the change, and that we provide access to all of the data at all times. Meaning that even if you realize after the fact that you need certain data to analyze the change further, you can go back in time and find that specific data point in the historical snapshot that will provide you this information. And most importantly, we want to provide you an ability to verify that the change is performed as expected. So let's look at this practically, from the user interface of the IP Fabric platform, and let me share that with you. When we want to prepare a change, we simply take a new snapshot of the network and ensure that we have gathered all of the current network state data. We can also rely on existing snapshots that were taken in the past.

So let's say we have taken a snapshot. We can rename the snapshot to show that it's the network before a certain change, and prepare that specific change thoroughly. In this case, we are interested in one particular site, which has a problem. The problem that we are interested in solving is missing redundancy in the current network. To see exactly where the redundancy is missing, we can visualize the network that needs changing and see exactly where the change needs to happen. In this design, there is redundancy missing for that particular site, where we see that outbound connectivity from the site is established through the BGP routing protocol, which is connecting to the gateway routers of the site.

However, one of the gateway routers is not connected to other routers at the site except the other gateway routers. So in this case, we want to provide redundancy for this site, and we want to enable the site to function even in case of a single device failure. In this case, the single point of failure here is the EXR 2 or R4 devices that are displayed in the background here. We can display much more information about any of the devices by simply clicking on the device and seeing specific information about present interfaces, routes, routing protocols, and so on. In this particular case, we simply want to enable routing towards the external router.
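
The single-point-of-failure check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the IP Fabric platform computes it: the device names, the link list, and the function names are all hypothetical, and the topology is treated as a simple undirected graph in which a device is a single point of failure if removing it disconnects the remaining devices.

```python
from collections import deque


def is_connected(adjacency, skip=None):
    """Return True if all devices except `skip` can still reach each other."""
    nodes = [n for n in adjacency if n != skip]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor != skip and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def single_points_of_failure(links):
    """List devices whose removal splits the topology (brute-force check)."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return sorted(d for d in adjacency if not is_connected(adjacency, skip=d))


# Hypothetical site: gw2 is attached only to the other gateway router.
links_before = [
    ("ext", "gw1"), ("gw1", "gw2"),
    ("gw1", "core1"), ("gw1", "core2"), ("core1", "core2"),
]
# Hypothetical change: gw2 gets its own uplink and a link into the site.
links_after = links_before + [("ext", "gw2"), ("gw2", "core2")]

print(single_points_of_failure(links_before))
print(single_points_of_failure(links_after))
```

The brute-force approach removes each device in turn and re-checks connectivity, which is fine for a single site; a larger topology would call for a proper articulation-point algorithm.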

To verify the impact of the change, we can take this site and find all of the users that are reliant on the site connectivity. If we go to the host inventory, we can find the specific site and see all of the hosts that are present at the site. This list immediately gives us a complete overview of all of the individual hosts that are served by this site and provides us a list of devices that might be impacted during the change. So when a project manager asks you what kind of systems will be impacted and what the role of each system is, you can provide them the whole list of impacted devices and provide detailed information about host connectivity in case that is needed. This step already ensures that you are not surprised when a certain system goes down in case the change doesn't go as expected, and that you have all of the information in your initial impact assessment.

You can also export this information and share it with the team in legacy Excel format, or you can simply copy a link that can then be shared within your team. In this way, you can ensure that the impact of the change is fully known. Additionally, if the system is connected to DNS, we could know the specific host names of those systems and see if these systems actually provide some services towards other parts of the network. Beforehand, we can also ensure that the routes in this particular site are also present throughout the network. So here we can see what specific routes are connected in this particular site, and we want to ensure the presence of all of these routes throughout the network before and after the change.
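
As a rough illustration of the route presence check, the following Python sketch takes the prefixes connected at the site and reports, per device, which of them are not covered by any route in that device's routing table. The prefixes, device names, and the `missing_routes` function are hypothetical, and the routing tables are assumed to be already exported as plain lists of prefixes. A prefix counts as present if the device has the same route or a covering supernet; the default route is deliberately ignored, which is an assumption of this sketch.

```python
import ipaddress


def missing_routes(site_prefixes, routing_tables):
    """Map each device to the site prefixes its routing table does not cover."""
    wanted = [ipaddress.ip_network(p) for p in site_prefixes]
    missing = {}
    for device, routes in routing_tables.items():
        known = [ipaddress.ip_network(r) for r in routes]
        # Ignore default routes so they do not hide a missing specific route.
        known = [k for k in known if k.prefixlen > 0]
        absent = [
            str(w)
            for w in wanted
            if not any(w.version == k.version and w.subnet_of(k) for k in known)
        ]
        if absent:
            missing[device] = absent
    return missing


# Hypothetical data for one site and three devices elsewhere in the network.
site_prefixes = ["10.1.10.0/24", "10.1.20.0/24"]
routing_tables = {
    "hq-core": ["10.1.10.0/24", "10.1.20.0/24"],
    "branch-a": ["10.1.0.0/16", "0.0.0.0/0"],
    "branch-b": ["10.1.10.0/24", "0.0.0.0/0"],
}

print(missing_routes(site_prefixes, routing_tables))
```

Running the same check against the pre-change and post-change data shows whether any site route disappeared from any part of the network.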

So this is a pre-change snapshot, so we know all of the routes that are connected in that site. Again, we can store this information for sharing, or simply copy the view for sharing within the team. You can also look at the specific protocol that we are interested in changing. In this case, at this particular site, we can see all of the routing protocol sessions that are established. And we can also gather this information for analytical purposes.

So in this case, we can see all of the OSPF sessions that are present. And what we want to ensure is that these sessions are not inadvertently affected by the change. After performing the change, we can ensure that the change has gone as expected simply by taking another snapshot of the network after the change and first visually verifying that the change has happened as expected. So in this case, let's save this view as our change window.

Before the change, we saw that there were single points of failure and non-redundant links. After the change, we have established another routing session. Now there are no more single points of failure or non-redundant links in this particular topology. To ensure that the hosts still have connectivity, we would need to ensure that hosts are still reachable within the network. We can either verify that in the host tables, by comparing the previously prepared list with the list available in the new snapshot, or we can create a rule that would do this automatically for us right in this table.
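
The host comparison can be pictured as a simple set difference. The sketch below assumes the host lists from both snapshots have been exported as lists of records with hypothetical `ip` and `site` fields; the field names, the sample data, and the `lost_hosts` function are illustrative only and do not reflect the platform's own rule engine.

```python
def lost_hosts(before, after, site):
    """Return IPs seen at `site` before the change but absent afterwards."""
    expected = {h["ip"] for h in before if h["site"] == site}
    present = {h["ip"] for h in after}
    return sorted(expected - present)


# Hypothetical host inventory exports from the two snapshots.
hosts_before = [
    {"ip": "10.1.10.11", "site": "branch-1"},
    {"ip": "10.1.10.12", "site": "branch-1"},
    {"ip": "10.2.10.11", "site": "branch-2"},
]
hosts_after = [
    {"ip": "10.1.10.11", "site": "branch-1"},
    {"ip": "10.2.10.11", "site": "branch-2"},
]

missing = lost_hosts(hosts_before, hosts_after, "branch-1")
print("change verified" if not missing else f"hosts missing: {missing}")
```

An empty result means every host from the impact assessment is still visible after the change.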

We can then verify that the routing has changed for that particular site. Unfortunately, in this case, we can actually see that a number of routing sessions were affected that we did not intend, because their session time is much lower than expected. And we can actually see if this situation was a problem before as well, because this routing problem did not result from the change; it was something that was always present and always affecting the network. So this is an example of where we needed access to this additional information only after the fact, after the change. But we can go back in time and see what the status of the network was before this particular change.
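
To make the "was it already a problem before?" question concrete, here is a minimal Python sketch. It assumes session uptimes from both snapshots are available as dictionaries keyed by a hypothetical (device, neighbor) pair with the uptime in seconds; the threshold, the labels, and the sample values are assumptions chosen for illustration.

```python
def classify_short_sessions(before, after, min_uptime):
    """Label post-change sessions whose uptime is below `min_uptime` seconds."""
    result = {}
    for key, uptime in after.items():
        if uptime >= min_uptime:
            continue
        if key not in before:
            result[key] = "new session"
        elif before[key] < min_uptime:
            result[key] = "short-lived before the change too"
        else:
            result[key] = "was stable before the change"
    return result


# Hypothetical uptimes in seconds, keyed by (device, neighbor).
sessions_before = {
    ("gw1", "core1"): 2_500_000,
    ("core1", "core2"): 900,
}
sessions_after = {
    ("gw1", "core1"): 2_503_600,
    ("core1", "core2"): 300,
    ("gw2", "ext"): 600,
}

for key, label in classify_short_sessions(sessions_before, sessions_after, 86_400).items():
    print(key, label)
```

A session that was already short-lived in the pre-change snapshot points to a problem that was present all along, while one that was stable before deserves a closer look at the change itself.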

We can also verify the changes themselves by comparing the pre-change snapshot and post-change snapshot, and see exactly what has changed from the point of view of devices, part numbers, IP addresses, or the connectivity matrix. In this case, we are interested in added and removed IP addresses at our particular site. And we can see that there wasn't just a single change happening, but a number of IP addresses have been added or removed within that site. We can also verify the actual routing sessions that were established. And in this case, we can see that a new routing session has been added, which provided additional connectivity on the external router.
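
The pre-change and post-change comparison boils down to diffing two tables of records. The following Python sketch shows one way to express that, assuming each snapshot's IP address table has been reduced to a dictionary keyed by a hypothetical (device, interface) pair; the names and addresses are made up, and the same function could be applied to routing sessions or any other table with a stable key.

```python
def diff_records(before, after):
    """Report records that were added, removed, or changed between snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical IP address tables keyed by (device, interface).
ips_before = {
    ("gw1", "Eth0/1"): "10.1.0.1",
    ("gw2", "Eth0/1"): "10.1.0.5",
    ("core1", "Lo0"): "10.1.255.1",
}
ips_after = {
    ("gw1", "Eth0/1"): "10.1.0.1",
    ("gw2", "Eth0/1"): "10.1.0.6",
    ("gw2", "Eth0/2"): "10.1.0.9",
}

for kind, records in diff_records(ips_before, ips_after).items():
    print(kind, records)
```

Anything in the output that is not part of the planned change is a candidate for the unintended consequences discussed next.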

However, there were also a number of other sessions that have been added or removed inside that site. So overall, we can see that although the change was completed as intended, there are a number of unintended consequences. And we can go back in time and see if those unintended consequences were a part of the change procedure, or if they were always present in the network and just surfaced with the change. So I hope this presentation was useful, and you could see how IP Fabric can help you with change management in your network. Thank you for your attention.

Webinar notes

Episode Title:

Change verifications using snapshots and intent-based networking

Topics:

  • Change Verifications
  • Intent-based networking
  • Engineering

Our hosts

Pavel Bykov

CEO