Monday, August 21, 2017

Openstack-Vagrant - Bringing Vagrant, Ansible, and Devstack together to deploy for developers

Introduction

OpenStack developers in general, and Dragonflow developers in particular, find themselves in need of setting up many OpenStack deployments (for testing, troubleshooting, developing, and what not). Every change requires testing on a 'real' environment.

Doing this manually every time is impractical. This is a task that must be automated: once a patch is ready, setting up a server to test it should take only a few seconds of your time.

This is where Openstack-Vagrant comes in.

More Details

In essence, Openstack-Vagrant (https://github.com/omeranson/openstack-vagrant) is a Vagrantfile (read: Vagrant configuration file) that sets up a virtual machine, configures it and installs all the necessary dependencies (using Ansible), and then runs devstack.

In effect, Openstack-Vagrant allows you to create a new OpenStack deployment by simply updating a configuration file, and running vagrant up.

Vagrant

Vagrant (https://www.vagrantup.com/) allows you to easily manage your virtual machines. They can be deployed on many hosts (e.g. your personal PC, or several lab servers), with many backends (e.g. libvirt or VirtualBox), and with many distributions (e.g. Ubuntu, Fedora). I am sticking to Linux here, because that's what's relevant to our deployment.

Vagrant also lets you automatically provision your virtual machines, using e.g. shell scripts or Ansible.
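For example, with a provisioner defined in the Vagrantfile, provisioning runs on the first boot and can be re-run on demand:

vagrant up          # boot the VM and run the configured provisioners
vagrant provision   # re-run provisioning on an already-running VM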

Ansible

Ansible (https://www.ansible.com/) allows you to easily provision your remote devices. It was selected for OpenStack-Vagrant for two main reasons:
  1. It is agent-less. No prior installation is needed.
  2. It works over SSH - out of the box for Linux cloud images.
Like many provisioning tools, Ansible is idempotent: you state the desired outcome (e.g. a file exists, a package is installed) rather than the actions to take. This way the same playbook (Ansible's term for a list of tasks) can be replayed safely if something fails along the way.
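In practice that means re-running a playbook after a failure is harmless. For instance (the inventory and playbook names here are made up):

ansible-playbook -i inventory.ini provision.yml   # first run: tasks report "changed"
ansible-playbook -i inventory.ini provision.yml   # re-run: already-satisfied tasks report "ok"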

Devstack

Every developer in OpenStack should know devstack (https://docs.openstack.org/devstack/latest/). That's how most development and testing setups are deployed.
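For reference, this is roughly the manual routine that Openstack-Vagrant automates (a sketch of the usual devstack flow; the repository URL and sample path may differ per release):

git clone https://github.com/openstack/devstack
cd devstack
cp samples/local.conf .   # start from the sample local.conf and edit to taste
./stack.sh                # this takes a while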

Really In-Depth

Let's review how to set-up an OpenStack and Dragonflow deployment on a single server using Openstack-Vagrant.

  1. Grab a local.conf file. The Dragonflow project has some with healthy defaults (https://github.com/openstack/dragonflow/tree/master/doc/source/single-node-conf). At the time of writing, the redis and etcd configurations are gated (tested in CI). I recommend etcd, since it is now an OpenStack base service (https://github.com/openstack/dragonflow/blob/master/doc/source/single-node-conf/etcd_local_controller.conf)
    • wget https://raw.githubusercontent.com/openstack/dragonflow/master/doc/source/single-node-conf/etcd_local_controller.conf
  2. Create a configuration for your new virtual machine. A basic example exists in the project's repository (https://github.com/omeranson/openstack-vagrant/blob/master/directory.conf.yml).
    • machines: 
        - name: one
          hypervisor:
            name: localhost
            username: root
          memory: 8192
          vcpus: 1
          box: "fedora/25-cloud-base"
          local_conf_file: etcd_local_controller.conf
      
  3. Run vagrant up <machine name>
    • vagrant up one
  4. Go drink coffee. You have an hour. 
  5. Once Ansible finishes its thing, you can log into the virtual machine with vagrant ssh, or with vagrant ssh -p -- -l stack to log in directly as the stack user. Once logged in as the stack user, devstack progress is available in a tmux session.
    • vagrant ssh -p -- -l stack
    • tmux attach
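When you are done with a deployment, or want a clean slate, the usual Vagrant lifecycle commands apply:

vagrant status          # list the machines defined in the Vagrantfile and their state
vagrant halt one        # stop the VM but keep its disk
vagrant destroy -f one  # discard it; 'vagrant up one' recreates it from scratch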

How Can It Be Better? 

 There are many ways we can still improve Openstack-Vagrant. Here are some thoughts that come to mind:
  1. A simple CLI interface that creates the configuration file and fires up the virtual machine.
  2. Use templates to make the local.conf file more customisable.

Conclusion

With Openstack-Vagrant, it is much easier to create new devstack deployments. A deployment can be kicked off in under a minute, and it will automatically boot the virtual machine, update it, install any necessary software, and run devstack.

Policy-based routing with SFC in Dragonflow


One of the coolest new Pike release features in Dragonflow is support for Service Function Chaining. In this post I'll give a short intro on the topic and share some details on how we implemented it, and what it's good for.

A quick intro

A network service function is a resource that (as the name suggests) provides a service, which could be an IDS, a Firewall, or even a cache server (or anything else that works on the network data path).

In a traditional network environment, packets are forwarded according to their destination, i.e. when Server A wants to send a packet to Server B, it puts Server B's address in the packet's destination field.  That way, all the switches between the servers know how to forward the packet correctly.




Now consider that you want to steer this traffic through an IDS and a Firewall. Server A will still put Server B as the destination for its packets.  
One popular way to accomplish this is to place A and B within different subnets. This will allow us to use a router to route the packets through our IDS and firewall.




However, such an approach complicates the network quite a bit: every two communicating servers must be placed in separate subnets, and all their traffic has to go through routers (slower and more expensive).  Moreover, all packets will be routed the same way, even if you only want to apply the IDS to HTTP traffic.  This headache scales with the number of servers, and it can quickly become configuration hell.


In the SDN world, we should be able to do better.


SFC introduces the concept of service chains. 

There are two aspects to a service chain:


Classification

What traffic should be served by the chain.  For example, outbound TCP connections with destination port 80 from subnet 10.0.0.0/16


Service path

Which service functions (and in what order) should be applied to the packet.  For example, a firewall, then IDS, then local HTTP cache

With this in mind, a forwarding element is enhanced to handle service function chains, so everything can be deployed in a more intuitive way:





SFC in OpenStack

OpenStack Neutron supports SFC through the networking-sfc extension. This extension provides a vendor-neutral API for defining service function chains. 

The basic model is composed of 4 object types:

PortChain

Represents the whole service function chain.  It is composed of FlowClassifiers and PortPairGroups, where the former specify the subset of traffic to which this port chain applies, and the latter specify which service functions need to be applied.

FlowClassifier

Specifies what packets should enter the specific chain.  The classification is done by matching against the packet's fields.  Some of the fields that can be specified are:

  • Source/destination logical ports
  • IPv4 or IPv6 source and destination CIDRs
  • Protocol types and port numbers
  • L7 URLs

PortPairGroup

Represents a step in the service function chain.  The model aggregates all port pairs that can be used to perform this specific step.

PortPair

Represents a service function instance.  It specifies into which port we need to forward our packet to apply the service, and at which port the resulting packet will emerge.


TCP/80 egress traffic of Server A goes through the port chain above. The blue arrow shows a possible path of classified traffic; the red arrow shows the path of unclassified traffic.
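To make the model concrete, here is a rough sketch of how a chain like that could be created with the networking-sfc CLI (the port names, classifier values, and resource names are made up for illustration):

# A service function VM attached via an ingress and an egress Neutron port
openstack sfc port pair create --ingress sf1-in --egress sf1-out pp1
# One step of the chain; adding several port pairs here allows load balancing
openstack sfc port pair group create --port-pair pp1 ppg1
# Classify outbound TCP/80 traffic coming from Server A's logical port
openstack sfc flow classifier create --protocol tcp --destination-port 80:80 --logical-source-port server-a-port fc1
# Tie it all together into a port chain
openstack sfc port chain create --port-pair-group ppg1 --flow-classifier fc1 pc1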


What goes on the wire

We solved all our issues a few paragraphs above by adding a mysterious SFC forwarder element.  How does it make sure that packets traverse the correct path? 
Usually, packets that need to be serviced by a service function chain are encapsulated and a service header is added to the packet:




The service header is used to store the information needed to steer the packet along the service chain (usually, which chain is being performed and how far along it the packet is). Two popular choices for the service protocol are MPLS and NSH.

With this metadata on the packet, the forwarder can easily decide where the packet should be sent next. The service functions themselves receive the packet with the service header and operate on the encapsulated packet.


A packet classifier at the SFC forwarder. If the service function supports service headers, the packet is sent in encapsulated form (right). If the service function does not support service headers, a proxy must be used (left).

The left side of the figure above depicts a service-protocol-unaware function, i.e. a function that expects ingress packets to arrive without any encapsulation. Dragonflow's forwarding element acts as a proxy when the function is service unaware.

Dragonflow drivers

In the Pike release we added SFC drivers to Dragonflow; the drivers implement the classification and forwarding elements. The initial version supports:
  • MPLS service chains (the only protocol supported by networking-sfc API)
  • both MPLS-aware and MPLS-unaware service functions

In Dragonflow, we manage our own integration bridge to provide various services in a distributed manner, and we implemented service function chaining the same way. Each Dragonflow controller is a fully capable SFC forwarding element, so a packet never needs to travel elsewhere unless the service function itself is not present on the current node.

Take SFC for a spin

The easiest way to get a working environment with Dragonflow + SFC is to deploy it with devstack.  This is the local.conf I used to deploy it:


[[local|localrc]]
DATABASE_PASSWORD=password
RABBIT_PASSWORD=password
SERVICE_PASSWORD=password
SERVICE_TOKEN=password
ADMIN_PASSWORD=password

enable_plugin dragonflow https://github.com/openstack/dragonflow
enable_service q-svc
enable_service df-controller
enable_service df-redis
enable_service df-redis-server
enable_service df-metadata

disable_service n-net
disable_service q-l3
disable_service df-l3-agent
disable_service q-agt
disable_service q-dhcp

Q_ENABLE_DRAGONFLOW_LOCAL_CONTROLLER=True
DF_SELECTIVE_TOPO_DIST=False
DF_REDIS_PUBSUB=True
Q_USE_PROVIDERNET_FOR_PUBLIC=True
Q_FLOATING_ALLOCATION_POOL=start=172.24.4.10,end=172.24.4.200
PUBLIC_NETWORK_NAME=public
PUBLIC_NETWORK_GATEWAY=172.24.4.1

ENABLED_SERVICES+=,heat,h-api,h-api-cfn,h-api-cw,h-eng

ENABLE_DF_SFC=True
enable_plugin networking-sfc git://git.openstack.org/openstack/networking-sfc

IMAGE_URL_SITE="http://download.fedoraproject.org"
IMAGE_URL_PATH="/pub/fedora/linux/releases/25/CloudImages/x86_64/images/"
IMAGE_URL_FILE="Fedora-Cloud-Base-25-1.3.x86_64.qcow2"
IMAGE_URLS+=","$IMAGE_URL_SITE$IMAGE_URL_PATH$IMAGE_URL_FILE
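From here it is the standard devstack routine (a sketch; the repository URL and the extension check reflect the usual defaults and may differ per release):

git clone https://github.com/openstack/devstack
cd devstack
# save the local.conf shown above as ./local.conf, then:
./stack.sh
# once stacking finishes, verify the networking-sfc extensions are loaded:
source openrc admin admin
openstack extension list --network | grep -i sfc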

Distributed SNAT - examining alternatives

Source NAT (SNAT) is a basic cloud network functionality that allows traffic from the private network to go out to the Internet. 

At the time of writing this blog, SNAT still has no agreed-upon mainstream distributed solution for OpenStack Neutron.  While Distributed Virtual Router (DVR) provides a distributed and decentralized solution to local connectivity for floating IP and simplified east-west VM communication, SNAT still needs to be deployed at a network node and remains a traffic bottleneck.


Figure 1

Figure 1 above has two deployed tenants, 'green' and 'orange'. SNAT is performed centrally at the network node.


Possible solutions

There are a number of proposed solutions to decentralize SNAT. Each solution has its own benefits and drawbacks. I am going to dive into two possible solutions that have recently been implemented in the Dragonflow SDN controller.


1. SNAT per {tenant, router} pair

The most straightforward solution is to perform SNAT at the compute node's router instance.

However, while a DVR deployment can easily copy the internal subnet router address across compute nodes, the router's IP address on the external network cannot follow this scheme. Such a deployment will consume an extra external address per {tenant, router} pair on each compute node.


Figure 2

The maximum address consumption is:
   [# of compute nodes] x [# of tenants]
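For example, a modest cloud with 50 compute nodes and 20 tenants (one router each) could consume up to 1,000 external addresses under this scheme.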

This problem may be somewhat mitigated by allocating the external IP in a lazy manner - only when Nova schedules the first VM of a given tenant onto a compute node.  In figure 2, external address 172.24.4.2 was allocated only when VM2 of the 'orange' tenant was deployed.

This model may be appealing for cloud deployments that have a very large pool of external addresses, or deployments going through an additional NAT beyond the cloud edge. 

However, the Neutron database would need to track all additional gateway router ports and external addresses would need to be assigned implicitly from a separate external address pool. 

A proof-of-concept implementation of this SNAT model, based on the Neutron stable/mitaka branch, can be found here and is discussed in this post. This implementation makes a few assumptions that would need to be removed in a future enhancement round, such as:
  • Explicit external address allocation per {router, compute node} pair, which requires a client API modification, instead of automated allocation.

2. SNAT per compute node

The second SNAT model we discuss reduces the number of external addresses to a single one per compute node, which significantly reduces network resource consumption while improving the latency and bandwidth of internet-bound traffic.

This model has at least one caveat: when several tenants' VMs go out to the internet via the same IP, one tenant abusing an external service (e.g. Gmail, Facebook) may get the shared external IP blacklisted, thus affecting the other tenants who share it.

(This problem can be somewhat mitigated by enabling this capability only for "trusted" tenants, while "untrusted" tenants keep going via the SNAT node using their pre-assigned external IPs.)

Figure 3

In figure 3 we can see that the SNAT rule, applied for both tenants, masquerades multiple tenant VMs behind a single external address. On the return traffic, reverse NAT restores the tenant IP/MAC information and ensures packets reach their rightful owners.

In order to simplify the visualization of flow, figure 3 shows two routing entities.  In reality this translation could (and probably would) be performed by different routing tables within the same routing function.


Dragonflow implementation

The SNAT-per-compute-node model was proposed and implemented in quite an elegant manner within the Dragonflow service plugin for Neutron.

For those of you who are not familiar with its design, Dragonflow runs a tiny local controller on every compute node; the controller manages the local OVS and feeds off a central database and a pub/sub mechanism.  The basic control flow is shown in figure 4 below.

Dragonflow uses OVS (Open vSwitch) as its dataplane implementation, and controls its bridges using OpenFlow. The OVS bridges replace the native Linux forwarding stack.

Figure 4
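If you want to peek at what the local controller programs on the integration bridge, the standard OVS tools work (the bridge name may differ per deployment):

sudo ovs-vsctl show                              # bridges and ports managed by Dragonflow
sudo ovs-ofctl -O OpenFlow13 dump-flows br-int   # the OpenFlow pipeline on the integration bridge
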
This is what happens when Nova requests Neutron to allocate a network port for a VM:
  1. Neutron server writes the new port information in its database and passes the port allocation request to the ML2 plugin
  2. The Dragonflow ML2 Driver writes the newly-created Neutron port information into its separate Dragonflow database (not the Neutron DB)
  3. The Dragonflow ML2 Driver then publishes a port update to the relevant compute nodes, where Dragonflow local controllers are running, using its own pub/sub mechanism 
  4. The Dragonflow local controller running on the compute node where the VM is scheduled for creation fetches the port information from the Dragonflow database or the published port update event, and passes it to the Dragonflow applications
  5. Every application that is registered for this specific event (local neutron port created) may insert/update OVS flows. For instance, the L2 application adds an OVS flow to detect the port's MAC address, and marks the packet to be sent to the relevant port attached to the OVS bridge.
  6. The new SNAT application installs flows that use OVS's connection tracking and NAT support to NAT, un-NAT, and track the NATed connections (a rough illustration follows this list). These features are available starting from OVS version 2.6.
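To give a feel for the mechanism, here is an illustration in ovs-ofctl syntax. This is not the actual Dragonflow pipeline: the table numbers, ports, conntrack zone, and external address are all made up.

# Outbound VM traffic: commit the connection to conntrack and source-NAT it
# behind the compute node's external address.
ovs-ofctl add-flow br-int "table=20,priority=100,ip,in_port=10,actions=ct(commit,zone=15,nat(src=172.24.4.100)),output:1"
# Returning traffic: run it through conntrack with the nat flag so the original
# (un-NATed) addresses are restored, then continue in table 21.
ovs-ofctl add-flow br-int "table=0,priority=100,ip,in_port=1,actions=ct(table=21,zone=15,nat)"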

SNAT application configuration

When the new SNAT application is enabled, Dragonflow's configuration has to be modified to reflect the host's 'external IP', i.e. the masquerading address of NATed traffic.

Figure 5

In figure 5 we see the minimal configuration required to enable the SNAT application in Dragonflow (a textual sketch follows the list):
  • Add ChassisSNATApp to Dragonflow application list (apps_list)
  • Configure proper external_host_ip address
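A minimal sketch of what that looks like in the Dragonflow configuration file (the file path, the other entries in apps_list, and the address are illustrative and depend on your deployment):

# e.g. /etc/neutron/dragonflow.ini
[df]
# keep whatever applications you already had enabled, and append the SNAT app
apps_list = l2,l3_proxy,ChassisSNATApp
# the address used to masquerade NATed traffic leaving this compute node
external_host_ip = 172.24.4.100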


Questions and comments are welcome.

Useful resources: