Batfish Review

Note: This review used Batfish version 2021.04.12.882, Docker version 19.03.13, and Pybatfish version 2021.7.9.974.

Introduction

Networks are growing more complex as the number and types of devices increase, along with the services they support. Network engineers use various approaches to verify device configuration proactively before applying any changes to production networks. Simulators, emulators, and small prototypes are some of the approaches used to verify changes. After applying changes to production networks, tools such as ping and traceroute may be used for debugging. Other tools, including network management systems, are used to monitor the networks’ performance over the long term.

Recently, there has been increased interest in developing analytical tools that analyze network configuration and discover errors systematically and proactively, before the configuration is applied to production[1]. Batfish is one of these tools; it is an open-source network configuration analysis tool that “finds errors and guarantees the correctness of planned or current network configurations. It enables safe and rapid network evolution, without the fear of outages or security breaches.”[2] Compared with the simulator/emulator tools currently used to verify and test network configuration, such as the popular GNS3, Batfish has several advantages[3].

This article summarizes my evaluation of Batfish capabilities. I reviewed some of Batfish’s features by using it to analyze three networks created in a lab environment. The evaluation is done entirely using Batfish and Python scripts; no hardware or emulators were used for the evaluation.

What can Batfish do?

According to its developers, Batfish’s main capabilities are[4]:

  • Auditing Configuration Settings
    • Checking for errors in configuration files.
    • Checking configuration compliance to pre-defined standards.
  • Data Plane Analysis
    • Tracing packet flow between any two points in the network.
    • Confirming the status of all tunnels.
  • Reachability Analysis
    • Checking routing policies.
  • ACL/Firewall Analysis
    • Checking the validity of the ACL rules.
    • Checking if traffic flows pass through a firewall.
  • Reliability Analysis
    • Testing failure scenarios.

Batfish can analyze configuration files that represent an instant in time (snapshot). It can also compare two versions of configuration from the same network and report differences. For instance, it can compare a running configuration on a device to a configuration that represents a planned change.

How does Batfish work?

In summary, Batfish works as follows [4] [5]:

  1. Batfish uses raw network configuration files as input. Batfish supports input configurations from many vendors[6].
  2. The configuration is parsed into a unified vendor-independent data model.
  3. Batfish computes the forwarding information base (FIB) and the routing information base (RIB) for all devices.
  4. The network verification and query engine discovers all possible sources, destinations, flows, failures, routing advertisements, etc.

From a user’s perspective, Batfish consists of two components:

  • A Batfish application, written in Java and delivered as a Docker container.
  • A Python library, Pybatfish, that interacts with the Docker container.

To start with Batfish, the user needs to download and run the Batfish Docker container locally or on a remote server. The information about the tested network has to be organized in the specific folder structure shown below. The top-level folder represents a configuration snapshot. Any files in the configs subfolder are treated as device configuration files. The other subfolders are optional, and I will explain their purpose later.

- snapshot
  - batfish
  - configs
  - hosts
  - iptables

Batfish offers the user a set of 67 predefined questions (queries). The user writes a Python script that points to the Batfish Docker container, uploads the network configuration files, asks any number of questions, receives the answers, and formats the responses.

Each question returns an answer as a Python dictionary. For convenience, the answer can also be accessed as a Pandas data frame[7], which can be processed further using Pandas functions. Some questions have options that filter the returned answer.

A good starting point for learning Batfish is the GitHub repository and the online documentation.

Evaluation Methodology

To evaluate Batfish, I used device configurations from three networks built in lab environments. All networks use Cisco hardware:

  • Network 1 – Access Lists: This network consists of four routers connected to a single switch. VLANs are used to create a logical star topology centered at one router. The routers restrict traffic using standard and extended access lists (ACLs).
  • Network 2 – VPN: This network consists of six routers connected to a single switch. VPNs are used to connect the routers in a virtual star topology centered at one router. The configuration of the network is incomplete.
  • Network 3 – MPLS: This network consists of five routers arranged serially. MPLS is needed to connect the routers at the far ends of the topology. The configuration of the network is also incomplete.

Initially, I use Batfish to test the networks and explore its capabilities without any change to the configuration. Later, I make changes to the configuration to review more Batfish capabilities. For the remainder of this report, the three original networks are referred to as acl_net, vpn_net, and mpls_net, respectively. The modified networks are referred to as acl_net_mod, vpn_net_mod, and mpls_net_mod, respectively.

Test Environment

I used a Windows laptop for all the testing (i7-1165G7 CPU, 16GB RAM, 64-bit Windows 10). Following the online instructions, I installed the Batfish Docker image on an Ubuntu 18 VM running on VirtualBox and managed with Vagrant. I installed Pybatfish and wrote the Python scripts on Windows.

The following Python line is an example of how a Batfish question is used to obtain an answer.

result = bfq.nodeProperties().answer().frame()

Throughout this evaluation, I had to run the questions many times, observe the results, try different input parameters, etc. To make the task easier, I wrote a Python application that simplifies some of these repetitive tasks. The application executes the same question on the three networks and saves the formatted result in a markdown (or Excel) file for easy viewing. Only selected outputs are included in this report (additional editing was needed in some cases to fit the data to a page). The following code snippet is the part of the application that initializes Batfish.

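import pandas as pd
from pybatfish.client.commands import (bf_init_snapshot, bf_session,
                                       bf_set_network, bf_set_snapshot)
from pybatfish.question import bfq, load_questions


class BatfishAnalysis():

    def __init__(self, host, network_name, snapshot_path, snapshot_name, init=True):

        # Initialize Batfish: point the session at the Batfish service
        bf_session.host = host
        bf_set_network(network_name)
        if init:
            # Upload the snapshot folder, replacing any snapshot with the same name
            bf_init_snapshot(snapshot_path, name=snapshot_name, overwrite=True)
        else:
            # Reuse a snapshot that was uploaded earlier
            bf_set_snapshot(snapshot_name)

        # Set Pandas options so that wide answers are displayed in full
        pd.set_option('display.max_columns', None)
        pd.set_option('display.max_rows', None)
        pd.set_option('display.max_colwidth', 200)
        pd.set_option('display.expand_frame_repr', False)

        # Load the question templates from the Batfish service into bfq
        load_questions()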
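
The application's export code is not shown in this report. As a rough sketch of the idea only, answer rows could be rendered as a markdown table with plain Python. The function name `to_markdown_table` and the sample row are assumptions made for illustration. Pandas also provides `DataFrame.to_markdown`, which relies on the optional `tabulate` package.

```python
def to_markdown_table(columns, rows):
    """Render a list of row dicts as a pipe-delimited markdown table."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(str(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider] + body)


# Toy row standing in for one line of a Batfish answer (illustrative values).
sample_rows = [{"Node": "r1", "Interface": "GigabitEthernet0/1", "Active": True}]
print(to_markdown_table(["Node", "Interface", "Active"], sample_rows))
```

Missing keys are rendered as empty cells, which keeps the table rectangular when rows come from different questions.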

Question Categories

Not all 67 questions are reviewed in this report. The reviewed questions are grouped into categories, and each category is discussed in its own subsection. The classification mostly matches the Batfish documentation, with some exceptions.

Feature Review – Part I

This section reviews Batfish capabilities using the original network configuration files. To avoid clutter, I report only a few examples of the returned results and summarize the findings in each section.

Input Verification

Batfish has a set of questions that discover and report issues encountered while reading the supplied input files. Here is a brief summary of each question:

  • Initialization Issues: reports failures to recognize certain lines in the configuration, lack of support for certain features, and errors when converting to vendor-independent models.
  • File Parse Status: returns the host IDs that were produced by the file and the parse status: pass, fail, or partially parsed.
  • File Parse Warnings: returns warnings such as failure to recognize certain lines and lack of support for certain features.

When running the above questions, Batfish returns the following:

  • acl_net: One configuration file passed the test, the other four have partially unrecognized lines.
  • vpn_net: All seven files passed the test.
  • mpls_net: All five files have partially unrecognized lines.

The results are warnings that Batfish does not recognize some lines in the configuration because they are not supported. The following tables show two examples.

|   | Nodes | Source_Lines | Type | Details | Line_Text | Parser_Context |
|---|---|---|---|---|---|---|
| 0 |  | [configs/r2.cfg:[129]] | Parse warning | This syntax is unrecognized | transport input telnet | [null_block s_null stanza cisco_configuration] |

Part of Initialization Issues in acl_net

|   | Nodes | Source_Lines | Type | Details | Line_Text | Parser_Context |
|---|---|---|---|---|---|---|
| 5 |  | [configs/R2.txt:[37]] | Parse warning | This syntax is unrecognized | rd 6.68.2.2:100 | [null_block s_null stanza cisco_configuration] |
| 6 |  | [configs/R2.txt:[38]] | Parse warning | This syntax is unrecognized | route-target export 6.68.2.2:200 | [null_block s_null stanza cisco_configuration] |

Part of Initialization Issues in mpls_net

I conclude that while Batfish is able to recognize a Cisco IOS configuration and parse most of it successfully, some commands, such as “transport input telnet” or “route-target export” (MPLS), are not recognized. Some of these issues are not significant and do not affect the overall operation of the network. Others, such as the MPLS commands, represent major functions of the network; therefore, this part of the functionality cannot be tested by Batfish.

During the later part of this evaluation, I used the “File Parse Warnings” question to verify that the commands I added to the configuration files were correct and correctly spelled.

Configuration Verification

This section covers questions that check the correctness of the configuration.

Structures

Batfish finds structures (e.g., ACLs, prefix-lists) defined in device configurations and tracks their references in other parts of the configuration. To verify the correctness of configuration, Batfish has several questions related to structures:

  • Named Structures: returns structures defined in the configurations in a vendor-independent JSON format.
  • Defined Structures: lists the structures defined in the network.
  • Referenced Structures: lists the references in configuration files to vendor-specific structures.
  • Undefined References: finds references to named structures that are not defined.
  • Unused Structures: returns nodes with structures that are defined but not used.

The presence of undefined references indicates errors and can cause serious problems in some cases. Also, unused structures may represent a bug in the configuration or they could be extra configuration that has no use. Both cases should be avoided, so my tests will use only these two questions.

The result of testing shows there are some unused structures in acl_net (See the table below).

|   | Structure_Type | Structure_Name | Source_Lines |
|---|---|---|---|
| 0 | extended ipv4 access-list | 101 | configs/r1.cfg:[115, 116, 117, 118, 119, 120] |
| 1 | extended ipv4 access-list | 111 | configs/r1.cfg:[121, 122] |
| 2 | extended ipv4 access-list | 121 | configs/r1.cfg:[123, 124, 125, 126, 127, 128] |

Unused structures in acl_net

This feature is very helpful in catching configuration errors that are otherwise hard to detect. For example, an ACL name may be applied to an interface while the ACL itself is defined under a misspelled name. The result is both an undefined reference (the name applied to the interface) and an unused structure (the misspelled definition).
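
The misspelling scenario can be illustrated with two set differences. This is only a sketch of the idea, not how Batfish implements these questions; the ACL names are hypothetical.

```python
def audit_structures(defined, referenced):
    """Return (undefined_references, unused_structures) as sorted lists."""
    defined, referenced = set(defined), set(referenced)
    undefined = sorted(referenced - defined)  # referenced somewhere, never defined
    unused = sorted(defined - referenced)     # defined, never referenced
    return undefined, unused


# Hypothetical example: the ACL is defined as "BLOCK_TELNT" (misspelled),
# while the interface applies "BLOCK_TELNET".
defined_acls = ["BLOCK_TELNT", "MGMT_ONLY"]
referenced_acls = ["BLOCK_TELNET", "MGMT_ONLY"]

undefined, unused = audit_structures(defined_acls, referenced_acls)
print(undefined)  # ['BLOCK_TELNET']
print(unused)     # ['BLOCK_TELNT']
```

A single typo therefore shows up in both answers, which makes the pair of questions a quick cross-check.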

Node and Interface Property

There are three questions in this category:

  • Node Properties: lists global settings of devices in the network.
  • Interface Properties: lists interface-level settings of interfaces.
  • IP Owners: lists the mapping from IPs to corresponding interface(s) and VRF(s).

The answers generated by these questions provide plenty of information about the devices and interfaces. The amount of information is large, so I will provide only a description.

Manually inspecting the Node and Interface Properties answers against the network specification may not be feasible. Instead, the returned answers usually need further filtering to find specific information. The information can also be used to compare two configuration versions (called snapshots), for example to detect whether the VLAN settings on an interface have changed.
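
A comparison of this kind can be sketched in plain Python, assuming the properties of each snapshot have already been collected into dictionaries keyed by interface. The function name, the interface label, and the property values are hypothetical; the property names merely imitate interface settings.

```python
def diff_properties(before, after):
    """Return {interface: {property: (old, new)}} for every value that changed."""
    changes = {}
    for iface in sorted(set(before) | set(after)):
        old, new = before.get(iface, {}), after.get(iface, {})
        delta = {
            key: (old.get(key), new.get(key))
            for key in sorted(set(old) | set(new))
            if old.get(key) != new.get(key)
        }
        if delta:
            changes[iface] = delta
    return changes


# Hypothetical property values for the current and the planned snapshot.
current = {"r1[Gi0/1]": {"Active": True, "Access_VLAN": 10}}
planned = {"r1[Gi0/1]": {"Active": True, "Access_VLAN": 20}}
print(diff_properties(current, planned))
# {'r1[Gi0/1]': {'Access_VLAN': (10, 20)}}
```

Interfaces or properties that exist in only one snapshot appear with `None` on the missing side.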

I used the returned information to identify interfaces that are configured but mistakenly left disabled. Conversely, an enabled router interface that does not have an IP address may indicate a configuration error. Both cases can be detected easily by filtering the Pandas data frames returned by the answers.

The results show that none of the networks have missing IP addresses or interfaces that are configured but disabled.

Note: In a Cisco configuration, the presence of the “shutdown” command indicates a disabled interface. Batfish interprets the absence of the command on an interface as enabled. A generated configuration should explicitly use the “no shutdown” command to remove ambiguity.

Finally, the last IP Owners question can be used to test if there are duplicate IP addresses in the network by setting the parameter ‘duplicatesOnly=True’.
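
The checks described in this section (enabled interfaces without an IP address, and duplicate IP addresses) amount to simple filters. The sketch below uses plain dictionaries instead of Batfish answers so that it runs on its own; the inventory and the field names are hypothetical.

```python
from collections import defaultdict


def enabled_without_ip(interfaces):
    """Interfaces that are active but have no IP address assigned."""
    return [(i["node"], i["name"]) for i in interfaces if i["active"] and not i["ip"]]


def duplicate_ips(interfaces):
    """Map each IP that appears on more than one interface to its owners."""
    owners = defaultdict(list)
    for i in interfaces:
        if i["ip"]:
            owners[i["ip"]].append((i["node"], i["name"]))
    return {ip: who for ip, who in owners.items() if len(who) > 1}


# Hypothetical inventory, not taken from the lab networks.
inventory = [
    {"node": "r1", "name": "Gi0/1", "active": True, "ip": "10.0.0.1"},
    {"node": "r2", "name": "Gi0/1", "active": True, "ip": "10.0.0.1"},
    {"node": "r2", "name": "Gi0/2", "active": True, "ip": None},
    {"node": "r3", "name": "Gi0/1", "active": False, "ip": "10.0.1.1"},
]
print(enabled_without_ip(inventory))  # [('r2', 'Gi0/2')]
print(duplicate_ips(inventory))       # {'10.0.0.1': [('r1', 'Gi0/1'), ('r2', 'Gi0/1')]}
```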

Network Topology

As mentioned earlier, Batfish discovers the L3 topology using the IP address information, but it needs the help of a JSON file to recognize Layer 1/2 topology. There are two main questions for topology inspection.

  • edges: returns different types of edges depending on the specified parameter (edgeType).
  • layer3Edges: lists all Layer 3 edges in the network.

The first question returns an empty answer in all tests (edgeType='layer1'), because Layer 1 information is not supplied. The next table shows a sample answer to the second question.

|   | Interface | IPs | Remote_Interface | Remote_IPs |
|---|---|---|---|---|
| 0 | r2[GigabitEthernet0/1] | ['6.68.23.2'] | r3[GigabitEthernet0/2] | ['6.68.23.3'] |
| 1 | r2[GigabitEthernet0/2] | ['6.68.12.2'] | r1[GigabitEthernet0/1] | ['6.68.12.1'] |
| 2 | r1[GigabitEthernet0/1] | ['6.68.12.1'] | r2[GigabitEthernet0/2] | ['6.68.12.2'] |
| 3 | r3[GigabitEthernet0/2] | ['6.68.23.3'] | r2[GigabitEthernet0/1] | ['6.68.23.2'] |
| 4 | r3[GigabitEthernet0/1] | ['6.68.34.3'] | r4[GigabitEthernet0/2] | ['6.68.34.4'] |
| 5 | r4[GigabitEthernet0/1] | ['6.68.45.4'] | r5[GigabitEthernet0/1] | ['6.68.45.5'] |
| 6 | r5[GigabitEthernet0/1] | ['6.68.45.5'] | r4[GigabitEthernet0/1] | ['6.68.45.4'] |
| 7 | r4[GigabitEthernet0/2] | ['6.68.34.4'] | r3[GigabitEthernet0/1] | ['6.68.34.3'] |

Layer 3 Edges from mpls_net

I find that the edges returned by the questions are useful in drawing topology diagrams. The information can be converted into a graph for further analysis. For example, the following plots were generated from the edge information using Python modules networkx and matplotlib.

[Figure: Layer 3 topology of all networks. Panels: acl_net_mod, vpn_net_mod, mpls_net_mod]
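
The plots themselves were produced with networkx and matplotlib. As a dependency-free sketch of the same idea, the Layer 3 edges listed for mpls_net can be reduced to node pairs, turned into an adjacency map, and searched with a breadth-first search. The function names are my own choice for illustration.

```python
from collections import defaultdict, deque

# Layer 3 edges of mpls_net, reduced to (node, remote node) pairs.
EDGES = [("r2", "r3"), ("r2", "r1"), ("r1", "r2"), ("r3", "r2"),
         ("r3", "r4"), ("r4", "r5"), ("r5", "r4"), ("r4", "r3")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_adjacency(EDGES)
print(shortest_path(graph, "r1", "r5"))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```

The result confirms the serial arrangement of the five routers in mpls_net.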

Routing Validation

Batfish supports several routing protocols, including OSPF and BGP.

Routing Protocol Configuration

There are several questions that check the configuration correctness of OSPF and BGP and construct the topology created by these protocols:

  • BGP Session Compatibility: checks each BGP peering and reports any issues with local settings or incompatibility with its remote counterparts.
  • BGP Session Status: checks if BGP peerings can be established.
  • BGP Edges: lists all BGP adjacencies in the network.
  • OSPF Session Compatibility: returns compatible OSPF sessions in the network. A session is compatible if the interfaces involved are not shut down, run OSPF, are not OSPF passive, and are associated with the same OSPF area.
  • OSPF Edges: lists all OSPF adjacencies in the network.
  • Test Route Policies: finds how the specified route is processed through the specified routing policies.
  • Search Route Policies: this question finds route announcements for which a route policy has a particular behavior (mostly related to BGP).

The mpls_net implements BGP routing, so I applied the BGP Session Status and the BGP Edges questions and got the following answers:

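|   | Node | VRF | Local AS | Local IP | Remote AS | Remote Node | Remote IP | Session Type | Established Status |
|---|---|---|---|---|---|---|---|---|---|
| 0 | r1 | default | 12345 | 6.68.12.1 | 1 | r2 | 6.68.12.2 | EBGP SINGLEHOP | ESTABLISHED |
| 1 | r2 | ABCD | 1 | 6.68.12.2 | 12345 | r1 | 6.68.12.1 | EBGP SINGLEHOP | ESTABLISHED |
| 2 | r2 | default | 1 | 6.68.2.2 | 1 | r4 | 6.68.4.4 | IBGP | ESTABLISHED |
| 3 | r4 | ABCD | 1 | 6.68.45.4 | 12345 | r5 | 6.68.45.5 | EBGP SINGLEHOP | ESTABLISHED |
| 4 | r4 | default | 1 | 6.68.4.4 | 1 | r2 | 6.68.2.2 | IBGP | ESTABLISHED |
| 5 | r5 | default | 12345 | 6.68.45.5 | 1 | r4 | 6.68.45.4 | EBGP SINGLEHOP | ESTABLISHED |

BGP Session Status from mpls_net (omitting empty columns)

|   | Node | IP | Interface | AS Number | Remote Node | Remote IP | Remote Interface | Remote AS Number |
|---|---|---|---|---|---|---|---|---|
| 0 | r4 | 6.68.45.4 |  | 1 | r5 | 6.68.45.5 |  | 12345 |
| 1 | r5 | 6.68.45.5 |  | 12345 | r4 | 6.68.45.4 |  | 1 |
| 2 | r4 | 6.68.4.4 |  | 1 | r2 | 6.68.2.2 |  | 1 |
| 3 | r2 | 6.68.12.2 |  | 1 | r1 | 6.68.12.1 |  | 12345 |
| 4 | r1 | 6.68.12.1 |  | 12345 | r2 | 6.68.12.2 |  | 1 |
| 5 | r2 | 6.68.2.2 |  | 1 | r4 | 6.68.4.4 |  | 1 |

BGP Edges in mpls_net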

Also for mpls_net, I got these answers for the OSPF protocol:

|   | Interface | Remote_Interface |
|---|---|---|
| 0 | r2[GigabitEthernet0/1] | r3[GigabitEthernet0/2] |
| 1 | r3[GigabitEthernet0/2] | r2[GigabitEthernet0/1] |
| 2 | r3[GigabitEthernet0/1] | r4[GigabitEthernet0/2] |
| 3 | r4[GigabitEthernet0/2] | r3[GigabitEthernet0/1] |

OSPF Edges in mpls_net

I conclude that Batfish does a good job confirming the correctness of routing protocols’ configuration. I am not sure, however, why other protocols do not have specific questions like OSPF and BGP. I also found that Batfish supports EIGRP but not RIP.

Route Verification

Batfish makes it easy to validate routing and forwarding by providing a centralized view of all routing tables in the network. It offers questions that return all routes available from all (supported) protocols, or from a specific protocol.

  • Routes: returns all routes or routes for a specific routing information base (RIB), Virtual routing and forwarding (VRF), and node(s).
  • BGP RIB: shows BGP routes for a specified VRF and node(s).
  • EVPN RIB: shows EVPN routes for a specified VRF and node(s).
  • Longest Prefix Match: returns the longest-prefix-match routes for a given IP in the RIBs of specified nodes and VRFs.
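
The idea behind the Longest Prefix Match question can be sketched with Python's `ipaddress` module. This is an illustration of the lookup rule only, not of Batfish's implementation. The first two routes echo entries reported for r4 in mpls_net; the default route is a hypothetical addition that makes the comparison visible.

```python
import ipaddress


def longest_prefix_match(rib, destination):
    """Return the most specific route whose network contains the destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in rib if dest in ipaddress.ip_network(r["network"])]
    if not candidates:
        return None
    return max(candidates, key=lambda r: ipaddress.ip_network(r["network"]).prefixlen)


# Toy RIB: two routes modelled on r4 in mpls_net plus a hypothetical default route.
rib = [
    {"network": "6.68.2.2/32", "next_hop_ip": "6.68.34.3", "protocol": "ospf"},
    {"network": "6.68.34.0/24", "next_hop_ip": None, "protocol": "connected"},
    {"network": "0.0.0.0/0", "next_hop_ip": "6.68.34.3", "protocol": "static"},
]

for target in ("6.68.2.2", "6.68.34.9", "8.8.8.8"):
    route = longest_prefix_match(rib, target)
    print(target, "->", route["network"], route["protocol"])
```

All three destinations match the default route, but the more specific /32 and /24 entries win where they apply.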

Here is a partial output from mpls_net showing routes learned from three different protocols and two VRFs.

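|    | Node | VRF | Network | Next_Hop | Next_Hop_IP | Protocol | Metric | Admin_Distance |
|---|---|---|---|---|---|---|---|---|
| 1 | r4 | default | 6.68.2.2/32 | r3 | 6.68.34.3 | ospf | 3 | 110 |
| 3 | r4 | default | 6.68.34.0/24 |  | AUTO/NONE(-1l) | connected | 0 | 0 |
| 10 | r4 | ABCD | 6.68.5.5/32 | r5 | 6.68.45.5 | bgp | 0 | 20 |

Selected routes from r4 in mpls_net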

Batfish reported all active routes correctly in all networks. In vpn_net, EIGRP is also configured on some routers, but the configuration is not complete, so EIGRP does not advertise its routes anywhere. This will change in the second part of the evaluation.

Packet Forwarding and Flow Analysis

Batfish performs packet forwarding analysis by tracing the path of packet flows across the network topology. This functionality mimics ping or traceroute without sending actual packets across the network. Unlike ping and traceroute, Batfish can trace packets of any supported protocol. These capabilities enable a network administrator to analyze the impact of configuration changes before they are pushed to the network.

Batfish provides the following questions to perform packet forward analysis.

  • Traceroute: performs a virtual traceroute in the network from a starting node to a destination IP. This traceroute is uni-directional.
  • Bi-directional Traceroute: is similar to Traceroute but it also provides the path traces for the reverse flows.
  • Reachability: is similar to the Traceroute question, but it allows testing reachability from multiple points in the network (e.g. what nodes can reach the DNS server?).
  • Bi-directional Reachability: is similar to the Reachability question, but it also provides the path traces for the reverse flows.
  • Loop detection: Returns flows in the network that will experience forwarding loops (in L3).
  • Multipath Consistency for host-subnets: returns flows where multiple paths are present, but the path outcome is different. For example, one trace within the flow is permitted and the other is denied.
  • Multipath Consistency for router loopbacks: acts the same as Multipath Consistency for host-subnets, but it is performed between router loopbacks.

From the previous sections, I know that end-to-end reachability cannot be tested in mpls_net and vpn_net because the configuration is incomplete and MPLS is not supported. Therefore, I am going to apply some of the above questions to acl_net only.

One useful feature of Batfish is the ability to ignore any access-list in reachability questions. This enables the tester to separate the routing capabilities of the network from any security policies in place.

Flow analysis questions differ from other questions discussed earlier because they require mandatory input parameters that specify the starting location and packet header information. The following question is an example of the minimum information needed to perform a traceroute while ignoring ACLs.

result = bfq.bidirectionalTraceroute(
    startLocation='@enter(r2[GigabitEthernet0/1])',
    headers=HeaderConstraints(dstIps='150.1.14.4'),
    ignoreFilters=True).answer().frame()

The default answers to trace questions are verbose, so I formatted the output differently in the following tables:

|   | Nodes | Steps |
|---|---|---|
| 1 | r2 | RECEIVED: GigabitEthernet0/1 -> FORWARDED: ARP IP: 172.16.40.1, Output Interface: GigabitEthernet0/1, Routes: [ospf (Network: 150.1.14.0/24, Next Hop IP:172.16.40.1)] -> TRANSMITTED: GigabitEthernet0/1 |
| 2 | r1 | RECEIVED: GigabitEthernet0/2 -> FORWARDED: ARP IP: AUTO/NONE(-1l), Output Interface: GigabitEthernet0/1, Routes: [connected (Network: 150.1.14.0/24, Next Hop IP:AUTO/NONE(-1l))] -> TRANSMITTED: GigabitEthernet0/1 |
| 3 | r4 | RECEIVED: GigabitEthernet0/1 -> ACCEPTED: GigabitEthernet0/1 |

Traceroute from R2 to R4 in acl_net

|   | Nodes | Steps |
|---|---|---|
| 1 | r2 | RECEIVED: GigabitEthernet0/1 -> FORWARDED: ARP IP: 172.16.40.1, Output Interface: GigabitEthernet0/1, Routes: [ospf (Network: 150.1.14.0/24, Next Hop IP:172.16.40.1)] -> TRANSMITTED: GigabitEthernet0/1 |
| 2 | r1 | RECEIVED: GigabitEthernet0/2 -> FORWARDED: ARP IP: AUTO/NONE(-1l), Output Interface: GigabitEthernet0/1, Routes: [connected (Network: 150.1.14.0/24, Next Hop IP:AUTO/NONE(-1l))] -> TRANSMITTED: GigabitEthernet0/1 |
| 3 | r4 | RECEIVED: GigabitEthernet0/1 -> ACCEPTED: GigabitEthernet0/1 |

Bi-Directional Traceroute (Forward) in acl_net

|   | Nodes | Steps |
|---|---|---|
| 1 | r4 | ORIGINATED: default -> FORWARDED: ARP IP: 150.1.14.1, Output Interface: GigabitEthernet0/1, Routes: [ospf (Network: 172.16.40.0/24, Next Hop IP:150.1.14.1)] -> TRANSMITTED: GigabitEthernet0/1 |
| 2 | r1 | RECEIVED: GigabitEthernet0/1 -> FORWARDED: ARP IP: AUTO/NONE(-1l), Output Interface: GigabitEthernet0/2, Routes: [connected (Network: 172.16.40.0/24, Next Hop IP:AUTO/NONE(-1l))] -> TRANSMITTED: GigabitEthernet0/2 |
| 3 | r2 | RECEIVED: GigabitEthernet0/1 -> ACCEPTED: GigabitEthernet0/1 |

Bi-Directional Traceroute (Reverse) in acl_net

The answer shows a complete trace of a UDP packet (the default) from one interface to another. The Traceroute question offers information that is not available from its ‘physical’ counterpart, such as the required ARP exchanges and the protocol that supplied the chosen route (e.g. OSPF or connected).

Batfish also offers the option of originating a packet from an interface or from a link connected to the interface. The latter is used to model packets before they enter the interface or after they exit. The options available for packet headers are too many to be listed here[8].

The Reachability questions offer more powerful features. The following question, for example, asks if all routers can reach R4 using TCP while ignoring filters. The question can also be used to list the routers that cannot reach R4 by changing the actions to ‘FAILURE’. Path constraints allow testing specific paths or avoiding others.

result = bfq.reachability(
    pathConstraints=PathConstraints(startLocation='/r/'),
    headers=HeaderConstraints(dstIps='r4', srcIps='0.0.0.0/0', ipProtocols='TCP'),
    actions='SUCCESS').answer().frame()

The results show that all routers can reach R4 (Batfish picks random IP addresses and ports when the given source IP is 0.0.0.0/0):

|   | Flows |
|---|---|
| 0 | start=r1 [10.0.0.0:49152->150.1.14.4:80 TCP length=512] |
| 1 | start=r2 [10.0.0.0:49152->150.1.14.4:80 TCP length=512] |
| 2 | start=r3 [10.0.0.0:49152->150.1.14.4:80 TCP length=512] |
| 3 | start=r4 [10.0.0.0:49152->150.1.14.4:80 TCP length=512] |

Reachability Flows

|   | Nodes | Steps |
|---|---|---|
| 1 | r1 | ORIGINATED: default -> FORWARDED: ARP IP: AUTO/NONE(-1l), Output Interface: GigabitEthernet0/1, Routes: [connected (Network: 150.1.14.0/24, Next Hop IP:AUTO/NONE(-1l))] -> TRANSMITTED: GigabitEthernet0/1 |
| 2 | r4 | RECEIVED: GigabitEthernet0/1 -> ACCEPTED: GigabitEthernet0/1 |

Part of the reachability output

Flow analysis is one of the most powerful features of Batfish, as it can reduce or eliminate the hours or days spent verifying reachability by other methods such as simulators or testing in production networks.

Access Lists and Firewall Rules

Batfish offers several questions aimed at verifying ACL/firewall rules. These questions, combined with the flow analysis described earlier, provide tremendous help in checking the effectiveness of security policies before and after applying changes to the network. The questions can also be used to audit security policies and discover any errors[9]. Since acl_net is the only network that implements ACLs, I use it again to demonstrate and review these capabilities.

The questions available for ACL/Firewall rules are:

  • Filter Line Reachability: finds all lines in the specified filters that will not match any packet because of shadowing or other reasons.
  • Search Filters: searches for flows that match an ACL rule of particular action.
  • Test Filters: shows how a specified flow is processed through the ACL, returning its permit/deny status as well as the line(s) it matched.
  • Find Matching Filter Lines: finds all lines in the specified filters that match any packet within the specified header constraints.

I do not know of any question that retrieves all ACLs in the network the same way routing information can be retrieved from all nodes. Therefore, I copied the ACLs manually in the next table:

| Node | Interface | ACL |
|---|---|---|
| r1 | GigabitEthernet0/3 [in] | access-list 101 permit tcp 192.168.13.0 0.0.0.255 host 172.16.40.2 eq telnet |
|  |  | access-list 101 permit icmp 192.168.13.0 0.0.0.255 host 172.16.40.2 echo |
|  |  | access-list 101 deny ip 192.168.13.0 0.0.0.255 172.16.40.0 0.0.0.255 |
|  |  | access-list 101 permit ip 192.168.13.0 0.0.0.255 any |
|  |  | access-list 101 permit udp any any eq rip |
|  |  | access-list 101 deny ip any any |
| r1 | GigabitEthernet0/2 [in] | access-list 111 permit udp any any eq rip |
|  |  | access-list 111 deny ip any any |
| r1 | GigabitEthernet0/1 [in] | access-list 121 permit tcp 150.1.14.0 0.0.0.255 172.16.40.0 0.0.0.255 eq www |
|  |  | access-list 121 permit tcp 150.1.14.0 0.0.0.255 172.16.40.0 0.0.0.255 eq ftp |
|  |  | access-list 121 permit tcp 150.1.14.0 0.0.0.255 172.16.40.0 0.0.0.255 eq smtp |
|  |  | access-list 121 permit icmp 150.1.14.0 0.0.0.255 172.16.40.0 0.0.0.255 echo |
|  |  | access-list 121 permit udp any any eq rip |
|  |  | access-list 121 deny ip any any |

ACLs used in acl_net

When I applied the first question to acl_net, I got an empty table, indicating that all rules are reachable. The second question asks if there are any rules in any ACL that allow TCP to reach R4.

result = bfq.searchFilters(headers=HeaderConstraints(dstIps='r4', ipProtocols = 'TCP'),
 action='PERMIT').answer().frame()

The answer in the following table shows that TCP packets from network 192.168.13.0/24 are allowed to reach R4 by ACL 101:

|   | Node | Filter_Name | Flow | Action | Line_Content | Trace |
|---|---|---|---|---|---|---|
| 0 | r1 | 101 | start=r1 [192.168.13.0:49152->150.1.14.4:80 TCP length=512] | PERMIT | permit ip 192.168.13.0 0.0.0.255 any | - Matched line permit ip 192.168.13.0 0.0.0.255 any |

Search Filter – Reaching R4 via TCP

The question can be more specific by including the source host (or IP address) and an application name. The ACL name can be specified as well. The following question asks if DNS traffic is permitted from R2 to R4 in any ACL (by omitting the filter name). The answer returned was an empty table indicating DNS is not permitted from R2 to R4.

result = bfq.searchFilters(headers=HeaderConstraints(dstIps='r4', srcIps='172.16.40.2',
         applications = ['dns']),  action='permit').answer().frame()

The Test Filter question is similar to the Search Filter question, but instead of specifying the expected action, the question returns the action encountered by the packet at a specific node.

result = bfq.testFilters(headers=HeaderConstraints(dstIps='r4', applications = ['dns']),
 nodes='r1').answer().frame()

The following table shows the test results. Because the source IP address was omitted, Batfish provides results assuming packets come from anywhere in the network, which I find very convenient. The table shows that DNS is permitted from network 192.168.13.0/24 to R4 due to the presence of the 4th rule in ACL 101.

|   | Node | Filter Name | Flow | Action | Line Content | Trace |
|---|---|---|---|---|---|---|
| 0 | r1 | 101 | start=r1 [192.168.13.1:49152->150.1.14.4:53 UDP length=512] | PERMIT | permit ip 192.168.13.0 0.0.0.255 any | - Matched line permit ip 192.168.13.0 0.0.0.255 any |
| 1 | r1 | 101 | start=r1 [172.16.40.1:49152->150.1.14.4:53 UDP length=512] | DENY | deny ip any any | - Matched line deny ip any any |
| 2 | r1 | 121 | start=r1 [192.168.13.1:49152->150.1.14.4:53 UDP length=512] | DENY | deny ip any any | - Matched line deny ip any any |
| 3 | r1 | 101 | start=r1 [150.1.14.1:49152->10.10.4.1:53 UDP length=512] | DENY | deny ip any any | - Matched line deny ip any any |
| 4 | r1 | 111 | start=r1 [192.168.13.1:49152->150.1.14.4:53 UDP length=512] | DENY | deny ip any any | - Matched line deny ip any any |
| 5 | r1 | 111 | start=r1 [150.1.14.1:49152->10.10.4.1:53 UDP length=512] | DENY | deny ip any any | - Matched line deny ip any any |
| 6 | r1 | 121 | start=r1 [150.1.14.1:49152->10.10.4.1:53 UDP length=512] | DENY | deny ip any any | - Matched line deny ip any any |
| 7 | r1 | 111 | start=r1 [172.16.40.1:49152->150.1.14.4:53 UDP length=512] | DENY | deny ip any any | - Matched line deny ip any any |
| 8 | r1 | 121 | start=r1 [172.16.40.1:49152->150.1.14.4:53 UDP length=512] | DENY | deny ip any any | - Matched line deny ip any any |

Testing reaching R4 via DNS from any node in the acl_net
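
The first-match behaviour behind these answers can be illustrated without Batfish. The sketch below is my own simplification and not Batfish's algorithm: it models ACL 101 from the ACL table as a list of tuples, converts wildcard masks to prefixes by hand, maps "eq telnet" and "eq rip" to ports 23 and 520, and ignores ICMP type matching. The function name `first_match` is hypothetical.

```python
import ipaddress

ANY = "0.0.0.0/0"

# ACL 101 on r1. Each rule is
# (action, protocol, source, destination, destination port or None).
# ICMP type matching ("echo") is ignored in this simplification.
ACL_101 = [
    ("permit", "tcp", "192.168.13.0/24", "172.16.40.2/32", 23),
    ("permit", "icmp", "192.168.13.0/24", "172.16.40.2/32", None),
    ("deny", "ip", "192.168.13.0/24", "172.16.40.0/24", None),
    ("permit", "ip", "192.168.13.0/24", ANY, None),
    ("permit", "udp", ANY, ANY, 520),
    ("deny", "ip", ANY, ANY, None),
]


def first_match(acl, protocol, src, dst, dst_port=None):
    """Return (line_index, action) of the first rule that matches the packet."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for index, (action, proto, src_net, dst_net, port) in enumerate(acl):
        if proto not in ("ip", protocol):
            continue  # rule is for a different protocol
        if src_ip not in ipaddress.ip_network(src_net):
            continue
        if dst_ip not in ipaddress.ip_network(dst_net):
            continue
        if port is not None and port != dst_port:
            continue
        return index, action
    return None, "deny"  # implicit deny when no rule matches


# DNS (UDP port 53) towards R4 from two different sources.
print(first_match(ACL_101, "udp", "192.168.13.1", "150.1.14.4", 53))  # (3, 'permit')
print(first_match(ACL_101, "udp", "172.16.40.1", "150.1.14.4", 53))   # (5, 'deny')
```

Both outcomes agree with rows 0 and 1 of the Test Filters answer for ACL 101.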

The last question, shown in the example below, returns lines in ACLs and firewall rules that match any packet within the specified header constraints. In the case of DNS, there are five lines in the three ACLs that impact DNS traffic.

result = bfq.findMatchingFilterLines(
    headers=HeaderConstraints(applications='DNS')).answer().frame()

|   | Node | Filter | Line | Line_Index | Action |
|---|---|---|---|---|---|
| 0 | r1 | 101 | deny ip 192.168.13.0 0.0.0.255 172.16.40.0 0.0.0.255 | 2 | DENY |
| 1 | r1 | 101 | permit ip 192.168.13.0 0.0.0.255 any | 3 | PERMIT |
| 2 | r1 | 101 | deny ip any any | 5 | DENY |
| 3 | r1 | 111 | deny ip any any | 1 | DENY |
| 4 | r1 | 121 | deny ip any any | 5 | DENY |

ACL rules that affect DNS packets

I conclude this section by noting that Batfish’s capabilities in analyzing ACLs cannot practically be matched by manually testing ACLs in a live or simulated network. Even if automated tools were developed to test the network empirically, they would have to generate different types of traffic from multiple starting points in the network and observe the results from multiple points as well. Batfish can do the job analytically in a very short time using nothing but the configuration files.

Feature Review – Part II

Previous evaluations used network configuration files without any changes. There are remaining features that I cannot review without modifying configuration files or providing additional information. To review these additional Batfish features, I created a second snapshot of each network (as a new folder), and modified the configuration of each network in the new snapshot as follows:

acl_net_mod:

  • Added three hosts. Each host is connected directly to a single router interface. The routers’ configurations were modified to add IP addresses for connectivity to the hosts.
  • Added a physical topology file that describes the physical connections between all the nodes.
  • Added a rule to ACL 121 in R1 to permit DNS (was previously denied).

vpn_net_mod:

  • Added a physical topology file that describes the physical connections between all the nodes.
  • Created a pair of GRE tunnels between R2 and R3 and between R2 and R4. The tunnels are encrypted using IPSec. Adding these two tunnels allows the EIGRP protocol to route traffic end-to-end.

mpls_net_mod:

  • Added a physical topology file that describes the physical connections between all the nodes.

Batfish Configuration Files

Batfish requires adding JSON files to the snapshot folder to supply additional information about the network, such as:

  • Physical Topology: Batfish cannot automatically detect physical topology. A file representing the physical topology must be named ‘layer1_topology.json’ and saved in the ‘batfish’ folder under the main snapshot folder (see above).
  • Hosts: Hosts are represented in Batfish as JSON files saved in ‘hosts’ folder. The host configuration includes its IP addresses, interface name, and iptables (optional).
  • iptables: Host iptables can be defined as JSON files and saved in separate folder ‘iptables’
  • isp: Batfish can represent an ISP configuration using a single JSON file that includes interfaces and ASN information. This file is useful in testing BGP.
  • bgp announcements: External BGP announcements are represented by files that include all necessary information. This is also useful in testing BGP without the need to replicate a large network configuration.
  • Runtime interface information: Not all interface attributes can be obtained from static configuration files (from some vendors), so Batfish uses additional files to supply runtime information such as interface status (up/down) and speed/bandwidth for some vendors.
  • Outages: Batfish is useful in testing failure scenarios. Failures can be represented by files that include the node and the interface information.
  • Configuration files: Configurations from some vendors are represented by multiple files, so additional files (or folders) are needed to tell Batfish how to read the configurations properly.
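
To make the physical topology item concrete, here is a minimal sketch that writes a ‘layer1_topology.json’ file into the ‘batfish’ folder of a snapshot. The key names (`edges`, `node1`, `node2`, `hostname`, `interfaceName`) follow the layout I understand Batfish to expect and should be checked against the official documentation. The helper names are hypothetical, a temporary directory stands in for the real snapshot folder, and the single link (h2 to r2) is taken from the acl_net_mod network, with the full interface name assumed.

```python
import json
import tempfile
from pathlib import Path


def layer1_edge(host1, iface1, host2, iface2):
    """Describe one physical link between two interfaces."""
    return {
        "node1": {"hostname": host1, "interfaceName": iface1},
        "node2": {"hostname": host2, "interfaceName": iface2},
    }


def write_layer1_topology(snapshot_dir, edges):
    """Write <snapshot_dir>/batfish/layer1_topology.json and return its path."""
    batfish_dir = Path(snapshot_dir) / "batfish"
    batfish_dir.mkdir(parents=True, exist_ok=True)
    path = batfish_dir / "layer1_topology.json"
    path.write_text(json.dumps({"edges": edges}, indent=2))
    return path


# A temporary directory stands in for the snapshot folder (assumption).
snapshot_dir = tempfile.mkdtemp(prefix="acl_net_mod_")
topology_path = write_layer1_topology(
    snapshot_dir,
    [layer1_edge("h2", "eth0", "r2", "GigabitEthernet0/0")],
)
print(topology_path.read_text())
```

Generating the file from a list of links keeps the topology description in one place, which helps when the same links also have to be drawn or documented.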

Differential Questions

Most Batfish questions can be run differentially by passing the names of a current snapshot and a reference snapshot to the .answer() method. For example, to view the difference in routing tables between mpls_net_mod and mpls_net, use the following question:

result = bfq.routes().answer(snapshot="mpls_net_mod", reference_snapshot="mpls_net").frame()
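
The same pattern can be wrapped in a small helper so that several questions are run differentially in one call. This is only a sketch: `run_differential` and `run_many` are hypothetical names, the `bfq` object is passed in explicitly, and the question names listed are the pybatfish names I believe correspond to File Parse Status, Layer 1 Edges, and Layer 3 Edges.

```python
def run_differential(bfq, question, snapshot, reference_snapshot, **params):
    """Run one question differentially and return its answer as a data frame."""
    make_question = getattr(bfq, question)  # e.g. bfq.routes
    return (
        make_question(**params)
        .answer(snapshot=snapshot, reference_snapshot=reference_snapshot)
        .frame()
    )


def run_many(bfq, questions, snapshot, reference_snapshot):
    """Run several questions differentially; returns {question name: frame}."""
    return {
        name: run_differential(bfq, name, snapshot, reference_snapshot)
        for name in questions
    }


# Assumed pybatfish question names for the three questions used below.
DIFF_QUESTIONS = ["fileParseStatus", "layer1Edges", "layer3Edges"]
```

With a running Batfish service and both snapshots initialized, a call such as `run_many(bfq, DIFF_QUESTIONS, "acl_net_mod", "acl_net")` would return one data frame per question.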

When I applied the differential questions related to the File Parse Status, Layer 1 Edges, and Layer 3 Edges, I got the following answers:

  • acl_net_mod vs acl_net:
    • Three host configuration files.
    • All Layer 1 edges.
    • The new Layer 3 edges between hosts and routers.
  • vpn_net_mod vs vpn_net:
    • One switch configuration file.
    • All Layer 1 edges.
  • mpls_net_mod vs mpls_net:
    • All Layer 1 edges.

An example of the answer is shown in the next table (empty columns removed):

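|   | Interface | Remote Interface | KeyPresence | Snapshot IPs | Reference IPs | Snapshot Remote IPs | Reference Remote IPs |
|---|-----------|------------------|-------------|--------------|---------------|---------------------|----------------------|
| 0 | h2[eth0] | r2[Gi0/0] | Only in Snapshot | ['10.10.2.2'] | | ['10.10.2.1'] | |
| 1 | h3[eth0] | r3[Gi0/0] | Only in Snapshot | ['10.10.3.2'] | | ['10.10.3.1'] | |
| 2 | h4[eth0] | r4[Gi0/0] | Only in Snapshot | ['10.10.4.2'] | | ['10.10.4.1'] | |
| 3 | r2[Gi0/0] | h2[eth0] | Only in Snapshot | ['10.10.2.1'] | | ['10.10.2.2'] | |
| 4 | r3[Gi0/0] | h3[eth0] | Only in Snapshot | ['10.10.3.1'] | | ['10.10.3.2'] | |
| 5 | r4[Gi0/0] | h4[eth0] | Only in Snapshot | ['10.10.4.1'] | | ['10.10.4.2'] | |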
The new Layer 3 Edges in acl_net_mod

And the resulting physical topology is shown in the following table (drawn programmatically):

[Figure panels: acl_net_mod, vpn_net_mod, mpls_net_mod]
Physical topology of all networks

In vpn_net_mod, multiple Layer 3 edges are added relative to vpn_net.

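|   | Interface | Remote Interface | KeyPresence | Snapshot IPs | Reference IPs | Snapshot Remote IPs | Reference Remote IPs |
|---|-----------|------------------|-------------|--------------|---------------|---------------------|----------------------|
| 0 | r01[Gi1.12] | r02[Gi1.12] | In both | ['155.1.12.1'] | ['155.1.12.1'] | ['155.1.12.3', '155.1.12.2'] | ['155.1.12.2'] |
| 1 | r02[Gi1.12] | r01[Gi1.12] | In both | ['155.1.12.3', '155.1.12.2'] | ['155.1.12.2'] | ['155.1.12.1'] | ['155.1.12.1'] |
| 2 | r02[Tunnel3] | r03[Tunnel3] | Only in Snapshot | ['1.1.3.2'] | | ['1.1.3.3'] | |
| 3 | r02[Tunnel4] | r04[Tunnel4] | Only in Snapshot | ['1.1.4.2'] | | ['1.1.4.4'] | |
| 4 | r03[Tunnel3] | r02[Tunnel3] | Only in Snapshot | ['1.1.3.3'] | | ['1.1.3.2'] | |
| 5 | r04[Tunnel4] | r02[Tunnel4] | Only in Snapshot | ['1.1.4.4'] | | ['1.1.4.2'] | |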
Additional Three Edges in vpn_net_mod

The results demonstrate Batfish’s ability to detect the differences between two versions of a network configuration. In a production network, this ability can be used to review any configuration changes before they are pushed into production. In our teaching environment, this feature can be used to compare a student’s configuration against a reference configuration.

VLANs and VxLANs

At Layer 2, Batfish can recognize virtual LANs (VLANs), switch ports’ access and trunk modes, switch VLAN interfaces, routers’ sub-interfaces, and multi-chassis link aggregation groups (MLAGs). Some of these capabilities were demonstrated in the modified configurations above. Unfortunately, it seems that Batfish does not support the Spanning Tree Protocol.

Batfish has several questions related to VLANs and VxLANs:

  • VLAN Properties: Lists VLAN information.
  • VXLAN VNI Properties: Lists VxLAN Network Identifiers information.
  • VXLAN Edges: Lists all VXLAN edges in the network.
  • L3 EVPN VNIs: Lists VNI-level network segment settings configured for VXLANs.

Since we do not have VxLANs in our reviewed networks, I was able to apply the VLAN question only and get the following answer from vpn_net_mod (note that the ports are in trunk mode):

|   | Node | VLAN ID | Interfaces |
|---|------|---------|------------|
| 0 | s1 | 47 | [s1[GigabitEthernet0/1], s1[GigabitEthernet0/2], s1[GigabitEthernet0/3], s1[GigabitEthernet0/4], s1[GigabitEthernet0/5], s1[GigabitEthernet0/6], s1[GigabitEthernet0/7]] |
| 1 | s1 | 12 | [s1[GigabitEthernet0/1], s1[GigabitEthernet0/2], s1[GigabitEthernet0/3], s1[GigabitEthernet0/4], s1[GigabitEthernet0/5], s1[GigabitEthernet0/6], s1[GigabitEthernet0/7]] |
| 2 | s1 | 13 | [s1[GigabitEthernet0/1], s1[GigabitEthernet0/2], s1[GigabitEthernet0/3], s1[GigabitEthernet0/4], s1[GigabitEthernet0/5], s1[GigabitEthernet0/6], s1[GigabitEthernet0/7]] |
| 3 | s1 | 14 | [s1[GigabitEthernet0/1], s1[GigabitEthernet0/2], s1[GigabitEthernet0/3], s1[GigabitEthernet0/4], s1[GigabitEthernet0/5], s1[GigabitEthernet0/6], s1[GigabitEthernet0/7]] |
| 4 | s1 | 25 | [s1[GigabitEthernet0/1], s1[GigabitEthernet0/2], s1[GigabitEthernet0/3], s1[GigabitEthernet0/4], s1[GigabitEthernet0/5], s1[GigabitEthernet0/6], s1[GigabitEthernet0/7]] |
| 5 | s1 | 36 | [s1[GigabitEthernet0/1], s1[GigabitEthernet0/2], s1[GigabitEthernet0/3], s1[GigabitEthernet0/4], s1[GigabitEthernet0/5], s1[GigabitEthernet0/6], s1[GigabitEthernet0/7]] |
Switched VLAN Properties in vpn_net_mod
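
Because the answer is keyed by VLAN, it can be easier to read once inverted into a per-interface view. The sketch below does that with plain dictionaries whose values are copied from the table above. The column names follow the table headings, and the interfaces are represented as strings; in a real pybatfish frame the names and types may differ, so treat both as assumptions. The helper name `vlans_per_interface` is hypothetical.

```python
from collections import defaultdict

# Values copied from the table above: six VLANs, each on the same seven ports.
INTERFACES = [f"GigabitEthernet0/{n}" for n in range(1, 8)]
rows = [
    {"Node": "s1", "VLAN ID": vlan, "Interfaces": list(INTERFACES)}
    for vlan in (47, 12, 13, 14, 25, 36)
]


def vlans_per_interface(rows):
    """Invert the VLAN table: (node, interface) -> sorted list of VLAN IDs."""
    result = defaultdict(set)
    for row in rows:
        for interface in row["Interfaces"]:
            result[(row["Node"], interface)].add(row["VLAN ID"])
    return {key: sorted(vlans) for key, vlans in result.items()}


for (node, interface), vlans in sorted(vlans_per_interface(rows).items()):
    print(f"{node}[{interface}] carries VLANs {vlans}")
```

Every port lists all six VLANs, which is consistent with the ports being in trunk mode.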

IPSec tunnels

Batfish can detect the status of IPSec tunnels in the network using two questions:

  • IPSec Session Status: shows configuration settings and status for each configured IPSec tunnel in the network. The status can be IPSEC_SESSION_ESTABLISHED, IKE_PHASE1_FAILED, IKE_PHASE1_KEY_MISMATCH, IPSEC_PHASE2_FAILED, or MISSING_END_POINT.
  • IPSec Edges: lists all IPSec tunnels in the network.

I used these questions to gather information about the IPSec tunnels in vpn_net_mod, shown in the following tables:

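|   | Node | Node Interface | Node IP | Remote Node | Remote Node Interface | Remote Node IP | Tunnel Interfaces | Status |
|---|------|----------------|---------|-------------|-----------------------|----------------|-------------------|--------|
| 0 | r02 | r02[Gi1.12] | 155.1.12.2 | r03 | r03[Gi1.13] | 155.1.13.3 | Tunnel3 -> Tunnel3 | ESTABLISHED |
| 1 | r02 | r02[Gi1.12] | 155.1.12.3 | r04 | r04[Gi1.14] | 155.1.14.4 | Tunnel4 -> Tunnel4 | ESTABLISHED |
| 2 | r03 | r03[Gi1.13] | 155.1.13.3 | r02 | r02[Gi1.12] | 155.1.12.2 | Tunnel3 -> Tunnel3 | ESTABLISHED |
| 3 | r04 | r04[Gi1.14] | 155.1.14.4 | r02 | r02[Gi1.12] | 155.1.12.3 | Tunnel4 -> Tunnel4 | ESTABLISHED |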
IPSec Session Status in vpn_net_mod
|   | Source Interface | Tunnel Interface | Remote Source Interface | Remote Tunnel Interface |
|---|------------------|------------------|-------------------------|-------------------------|
| 0 | r03[GigabitEthernet1.13] | r03[Tunnel3] | r02[GigabitEthernet1.12] | r02[Tunnel3] |
| 1 | r02[GigabitEthernet1.12] | r02[Tunnel3] | r03[GigabitEthernet1.13] | r03[Tunnel3] |
| 2 | r02[GigabitEthernet1.12] | r02[Tunnel4] | r04[GigabitEthernet1.14] | r04[Tunnel4] |
| 3 | r04[GigabitEthernet1.14] | r04[Tunnel4] | r02[GigabitEthernet1.12] | r02[Tunnel4] |
IPSec Edges in vpn_net_mod
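
When iterating on a tunnel configuration, it helps to list only the sessions that are not yet established. The following sketch filters session rows by their status. The rows are copied from the session status table above (reduced to the columns used), the helper name `not_established` is hypothetical, and the check assumes that only the established status ends with "ESTABLISHED", which holds for the five status values listed earlier.

```python
# Rows copied from the IPSec session status table above.
sessions = [
    {"Node": "r02", "Remote Node": "r03", "Tunnel Interfaces": "Tunnel3 -> Tunnel3", "Status": "ESTABLISHED"},
    {"Node": "r02", "Remote Node": "r04", "Tunnel Interfaces": "Tunnel4 -> Tunnel4", "Status": "ESTABLISHED"},
    {"Node": "r03", "Remote Node": "r02", "Tunnel Interfaces": "Tunnel3 -> Tunnel3", "Status": "ESTABLISHED"},
    {"Node": "r04", "Remote Node": "r02", "Tunnel Interfaces": "Tunnel4 -> Tunnel4", "Status": "ESTABLISHED"},
]


def not_established(sessions):
    """Return the sessions whose status is anything other than established."""
    return [s for s in sessions if not s["Status"].endswith("ESTABLISHED")]


problems = not_established(sessions)
if problems:
    for s in problems:
        print(f"{s['Node']} -> {s['Remote Node']}: {s['Status']}")
else:
    print("All IPSec sessions are established")
```

A status such as IKE_PHASE1_KEY_MISMATCH would be reported by this filter, pointing directly at the side of the configuration that needs attention.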

After the network was modified to add IPSec tunnels, end-to-end routes became available using EIGRP over the GRE tunnels. As a result, the logical topology of the network changed to look like the next figure.

L3 Topology of vpn_net_mod after adding Tunnels

I used the traceroute question again to confirm end-to-end connectivity, with the following results:

|   | Nodes | Steps |
|---|-------|-------|
| 1 | r05 | RECEIVED: Loopback0; FORWARDED: ARP IP: 10.1.25.2, Output Interface: GigabitEthernet1.25, Routes: [eigrp (Network: 150.1.7.7/32, Next Hop IP:10.1.25.2)]; TRANSMITTED: GigabitEthernet1.25 |
| 2 | r02 | RECEIVED: GigabitEthernet1.25; FORWARDED: ARP IP: 1.1.4.4, Output Interface: Tunnel4, Routes: [eigrp (Network: 150.1.7.7/32, Next Hop IP:1.1.4.4)]; TRANSMITTED: Tunnel4 |
| 3 | r04 | RECEIVED: Tunnel4; FORWARDED: ARP IP: 10.1.47.7, Output Interface: GigabitEthernet1.47, Routes: [eigrp (Network: 150.1.7.7/32, Next Hop IP:10.1.47.7)]; TRANSMITTED: GigabitEthernet1.47 |
| 4 | r07 | RECEIVED: GigabitEthernet1.47; ACCEPTED: Loopback0 |
Traceroute R05 to R07 via IPSec Tunnel
|   | Nodes | Steps |
|---|-------|-------|
| 1 | r06 | RECEIVED: Loopback0; FORWARDED: ARP IP: 10.1.36.3, Output Interface: GigabitEthernet1.36, Routes: [eigrp (Network: 150.1.7.7/32, Next Hop IP:10.1.36.3)]; TRANSMITTED: GigabitEthernet1.36 |
| 2 | r03 | RECEIVED: GigabitEthernet1.36; FORWARDED: ARP IP: 1.1.3.2, Output Interface: Tunnel3, Routes: [eigrp (Network: 150.1.7.7/32, Next Hop IP:1.1.3.2)]; TRANSMITTED: Tunnel3 |
| 3 | r02 | RECEIVED: Tunnel3; FORWARDED: ARP IP: 1.1.4.4, Output Interface: Tunnel4, Routes: [eigrp (Network: 150.1.7.7/32, Next Hop IP:1.1.4.4)]; TRANSMITTED: Tunnel4 |
| 4 | r04 | RECEIVED: Tunnel4; FORWARDED: ARP IP: 10.1.47.7, Output Interface: GigabitEthernet1.47, Routes: [eigrp (Network: 150.1.7.7/32, Next Hop IP:10.1.47.7)]; TRANSMITTED: GigabitEthernet1.47 |
| 5 | r07 | RECEIVED: GigabitEthernet1.47; ACCEPTED: Loopback0 |
Traceroute R06 to R07 via IPSec Tunnel
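
Tables like the two above come from reformatting the raw traceroute answer. The sketch below shows one way to do that, assuming each trace object exposes a `hops` list whose items have a `node` name and a list of `steps`, each with an `action`. That matches my understanding of the pybatfish data model but should be verified against the installed version. The function name `summarize_trace` is hypothetical, and simple stand-in objects (mirroring the R05 to R07 path) are used so the example runs without a Batfish service.

```python
from types import SimpleNamespace


def summarize_trace(trace):
    """Return one line per hop in the form 'node: ACTION -> ACTION'."""
    lines = []
    for hop in trace.hops:
        actions = " -> ".join(step.action for step in hop.steps)
        lines.append(f"{hop.node}: {actions}")
    return lines


def _hop(node, *actions):
    """Build a stand-in hop object; real hops come from a traceroute answer."""
    return SimpleNamespace(
        node=node, steps=[SimpleNamespace(action=a) for a in actions]
    )


# Stand-in for the R05 to R07 trace shown above.
example_trace = SimpleNamespace(hops=[
    _hop("r05", "RECEIVED", "FORWARDED", "TRANSMITTED"),
    _hop("r02", "RECEIVED", "FORWARDED", "TRANSMITTED"),
    _hop("r04", "RECEIVED", "FORWARDED", "TRANSMITTED"),
    _hop("r07", "RECEIVED", "ACCEPTED"),
])

for line in summarize_trace(example_trace):
    print(line)
```

Dropping the per-step detail trades completeness for readability; the detail can be added back to the formatted string when a specific hop needs investigation.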

I find Batfish’s support of IPSec useful and powerful. In fact, I created the tunnels and tested the network using nothing but Batfish. Batfish helped in troubleshooting the configuration until I got the desired results.

Comparative Filters and Reachability

In addition to comparing two snapshots using any question, Batfish has some questions that can ONLY be run differentially, such as:

  • Compare Filters: compares filters with the same name in the current and reference snapshots and returns the lines that match the same flow(s) but treat them differently (i.e. one permits and the other denies the flow).
  • Differential Reachability: searches across all possible flows in the network, with the specified header and path constraints, and returns example flows that are successful in one snapshot and not the other.

The result of the first question was discovering the modified ACL rule that PERMITs DNS traffic. To confirm that Batfish does not simply perform text comparisons between the ACLs in the two snapshots, I changed the added rule to DENY. Batfish did not report any differences because the treatment of DNS traffic did not change in the latter case.

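|   | Node | Filter Name | Line Index | Line Content | Line Action | Reference Line Index | Reference Line_Content |
|---|------|-------------|------------|--------------|-------------|----------------------|------------------------|
| 0 | r1 | 121 | 4 | permit udp any any eq 53 | PERMIT | 5 | deny ip any any |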
Differences in ACL rules in acl_net_mod
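
The distinction between a textual and a behavioural comparison can be illustrated with a toy first-match evaluator. This is not how Batfish works internally (it reasons about all possible flows, not a single example), and the ACLs below are hypothetical, simplified stand-ins for the tail of ACL 121. The sketch only shows why adding a DENY rule ahead of ‘deny ip any any’ changes the text of the ACL but not the treatment of DNS traffic.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    action: str                     # "permit" or "deny"
    protocol: str                   # "ip" matches every protocol
    dst_port: Optional[int] = None  # None matches every destination port


class Flow(NamedTuple):
    protocol: str
    dst_port: int


def matches(rule: Rule, flow: Flow) -> bool:
    """True if the rule's protocol and destination port cover the flow."""
    protocol_ok = rule.protocol in ("ip", flow.protocol)
    port_ok = rule.dst_port is None or rule.dst_port == flow.dst_port
    return protocol_ok and port_ok


def evaluate(acl: List[Rule], flow: Flow) -> str:
    """Return the action of the first matching rule (implicit deny at the end)."""
    for rule in acl:
        if matches(rule, flow):
            return rule.action
    return "deny"


dns = Flow(protocol="udp", dst_port=53)

# Simplified, hypothetical tails of ACL 121; earlier lines are omitted.
reference = [Rule("deny", "ip")]
with_permit = [Rule("permit", "udp", 53), Rule("deny", "ip")]
with_deny = [Rule("deny", "udp", 53), Rule("deny", "ip")]

for name, acl in [("reference", reference),
                  ("added permit", with_permit),
                  ("added deny", with_deny)]:
    print(f"{name}: DNS is {evaluate(acl, dns)}")
```

Only the variant with the added permit rule treats the DNS flow differently from the reference, which mirrors the single difference reported in the table above.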

When applying the second question to vpn_net_mod (without specifying any constraints), it returned 15 flows that were not available in the reference snapshot because routes were not available. In acl_net_mod, four flows are available to R4 from other routers after modifying the ACL to permit DNS. Notice that the table below shows source IP addresses that do not exist in the network. That is because Batfish selects arbitrary IP addresses when none are specified in the question (I used ‘0.0.0.0/0’). The ACL allows these flows because the policy is ‘access-list 121 permit udp any any eq 53’.

|   | Flow | Snapshot Traces |
|---|------|-----------------|
| 0 | start=r1 [8.8.8.8:49152->10.10.4.1:53 UDP length=512] | [((ORIGINATED(default), FORWARDED(ARP IP: 150.1.14.4, Output Interface: GigabitEthernet0/1, Routes: [ospf (Network: 10.10.4.0/24, Next Hop IP:150.1.14.4)]), TRANSMITTED(GigabitEthernet0/1)), (RECEIVED(GigabitEthernet0/1), ACCEPTED(GigabitEthernet0/0)))] |
| 1 | start=r2 [10.0.0.0:49152->150.1.14.4:53 UDP length=512] | [((ORIGINATED(default), FORWARDED(ARP IP: 172.16.40.1, Output Interface: GigabitEthernet0/1, Routes: [ospf (Network: 150.1.14.0/24, Next Hop IP:172.16.40.1)]), TRANSMITTED(GigabitEthernet0/1)), (RECEIVED(GigabitEthernet0/2), FORWARDED(ARP IP: AUTO/NONE(-1l), Output Interface: GigabitEthernet0/1, Routes: [connected (Network: 150.1.14.0/24, Next Hop IP:AUTO/NONE(-1l))]), TRANSMITTED(GigabitEthernet0/1)), (RECEIVED(GigabitEthernet0/1), ACCEPTED(GigabitEthernet0/1)))] |
| 2 | start=r3 [10.0.0.0:49152->150.1.14.4:53 UDP length=512] | [((ORIGINATED(default), FORWARDED(ARP IP: 192.168.13.1, Output Interface: GigabitEthernet0/1, Routes: [ospf (Network: 150.1.14.0/24, Next Hop IP:192.168.13.1)]), TRANSMITTED(GigabitEthernet0/1)), (RECEIVED(GigabitEthernet0/3), FORWARDED(ARP IP: AUTO/NONE(-1l), Output Interface: GigabitEthernet0/1, Routes: [connected (Network: 150.1.14.0/24, Next Hop IP:AUTO/NONE(-1l))]), TRANSMITTED(GigabitEthernet0/1)), (RECEIVED(GigabitEthernet0/1), ACCEPTED(GigabitEthernet0/1)))] |
| 3 | start=r4 [8.8.8.8:49152->10.10.4.1:53 UDP length=512] | [((ORIGINATED(default), ACCEPTED(GigabitEthernet0/0 |
Comparing Reachability to R4 via DNS

What is missing?

Batfish offers 67 questions at the time of this writing, and I covered only a small set of them. There are many features in Batfish that I could not review because of time constraints or because the test networks do not implement these features. Among these are:

  • BGP and ISP support (I only touched on this briefly).
  • VxLANs
  • NATs
  • Investigation of failure scenarios
  • Multipath flow analysis
  • Hosts iptables

Additionally, many questions have parameters that modify their behaviour (e.g. by filtering the output). In all investigations, I used only the default parameters or the minimum required for testing. This means that there remain many features to explore.

Conclusions

I am quite impressed with Batfish. It has been around since 2015 and it is currently in active development as an open source project and a commercial product[10] with a growing list of supported vendor equipment[6]. More importantly, Batfish represents a new way of “doing business” in the networking field by relying on analytical tools to evaluate network configuration before deployments. This approach eliminates uncertainty and emotional stress commonly associated with network deployment and upgrades. Batfish also represents a new field of academic research that has great potential.

Batfish has its limitations. Some limitations, such as the lack of support for RIP, are not fundamental and can be overcome in the future by anyone who wants to contribute code to the project. Other limitations are more fundamental[1]. Nevertheless, Batfish can be considered another tool in the network engineer’s toolbox.

I concluded each of the evaluation sections with some comments about the questions reviewed in the section. Here are additional general comments:

  • The installation of Batfish was straightforward using Docker. I used an Ubuntu VM, but it can be installed on Windows too. Installing the Pybatfish Python module is similarly easy.
  • Learning Batfish is easy. I started looking into Batfish on October 12th, 2021 (I am writing this sentence on October 26th). However, prior familiarity with Linux, Docker, Python and its various modules was a contributing factor.
  • Documentation and example code (Jupyter notebooks) are available online, but the documentation is not complete (this is common in open source projects). I learned a lot about Batfish from online articles.
  • There is an application that offers a GUI for Batfish (which I have not tested yet), but Python is needed to get the most out of Batfish.
  • Formatting Batfish answers as Pandas data frames is convenient, but the answers can be verbose and require some filtering and/or sorting. This is where some programming skills can be useful. For instance, in the case of the traceroute and reachability questions, I had to reformat the output to make it easier to read.
  • By accessing the answers’ objects directly (instead of getting a data frame), more information can be extracted or formatted differently.

References

[1] Y. Li, X. Yin, Z. Wang, J. Yao, X. Shi, J. Wu, H. Zhang, and Q. Wang, “A survey on network verification and testing with formal methods: Approaches and challenges,” IEEE Communications Surveys & Tutorials, vol. 21, pp. 940–969, 2019.

[2] “Batfish – An open source network configuration analysis tool.” https://www.batfish.org/.

[3] C. Vyas, “Network test automation: Rock, Paper, Scissors, Lizard, or Fish?” Intentionet. [Online]. Available: https://www.intentionet.com/blog/network-test-automation-rock-paper-scissors-lizard-or-fish/, Mar-2021.

[4] D. Halperin, “Batfish and pybatfish: Using open source tools to validate network configuration,” in NANOG 75, 2019.

[5] A. Fogel, S. Fung, L. Pedrosa, M. Walraed-Sullivan, R. Govindan, R. Mahajan, and T. Millstein, “A general approach to network configuration analysis,” in 12th USENIX symposium on networked systems design and implementation (NSDI 15), 2015, pp. 469–483.

[6] “Supported devices.” Batfish Online Documentation; https://pybatfish.readthedocs.io/en/latest/supported_devices.html.

[7] “Pandas.” pandas.pydata.org; https://pandas.pydata.org/.

[8] “Header constraints.” Batfish Online Documentation; https://batfish.readthedocs.io/en/latest/datamodel.html.

[9] R. Donato, “How to Build an ACL Auditor with Batfish,” Network to Code. [Online]. Available: https://blog.networktocode.com/post/how-to-build-an-acl-auditor/.

[10] “Intentionet.” Intentionet; https://www.intentionet.com/.