Sunday, October 1, 2017

Pike Release - What have we done of late?

Introduction

We would like to present the work, changes, and features we have completed this cycle. We have worked very hard, and we are happy to show off some of the results.

Birds-Eye View

What Did We Do?

Over the past half-year, we have completed the following features:
  • IPv6
  • Trunk ports (VLAN-aware VMs)
  • SFC
  • Service Health Report
  • BGP
  • Distributed SNAT
We also made a big improvement to our NorthBound API.

What's Next?

This is what we plan to do for Queens. If you want something that isn't here, let us know!
  • Kuryr Integration
  • RPM and DEB Packaging
  • LBaaS (Dragonflow native and distributed)
  • L3 Flavour
  • Troubleshooting Tools
  • FWaaS
  • TAPaaS
  • Multicast and IGMP
Of course, the best-laid plans of mice and men...

Statistics

Contribution by Companies (Resolved Bugs)

Lines of Code by Contributors

Filed Bugs, by Company

Deep-Dive

What Was Done

IPv6

https://docs.openstack.org/developer/dragonflow/specs/ipv6.html

Dragonflow now supports IPv6. This means that virtual machines that are configured on IPv6 subnets can communicate with each other. IPv6 routing also works, so they don't even have to be on the same network. Security groups also work, so you can create IPv6-based micro-segmentation.

If there is also an IPv6 provider network, these VMs can communicate with the outside world.

Note that since NAT is supposed to be less used in IPv6, it was not implemented.
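
As a quick illustration, here is a hedged openstacksdk sketch of creating an IPv6 subnet so that attached VMs get IPv6 addresses. The cloud name, network name, and prefix are assumptions, and SLAAC is just one possible address mode:

# Hedged sketch: an IPv6 (SLAAC) subnet on a tenant network. Dragonflow then
# handles switching, routing, and security groups for the attached VMs.
import openstack

conn = openstack.connect(cloud="devstack")        # cloud name is an assumption
net = conn.network.create_network(name="ipv6-net")
subnet = conn.network.create_subnet(
    network_id=net.id,
    ip_version=6,
    cidr="fd00:10:0:1::/64",                      # example ULA prefix
    ipv6_address_mode="slaac",
    ipv6_ra_mode="slaac",
)
print(subnet.cidr)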

Trunk Ports (VLAN-aware VMs)

https://docs.openstack.org/developer/dragonflow/specs/vlan_aware_vms.html

In container scenarios, a virtual machine with a single virtual network card hosts multiple containers. The VM tags each container's traffic with a VLAN tag, so Dragonflow knows to which (virtual) port the traffic belongs, and incoming traffic tagged with a container's VLAN tag is forwarded to that container.

Dragonflow now supports this scenario. With trunk ports, virtual sub-ports can be defined with a segmentation type and ID, and can sit on a completely different network and subnet than their parent.
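
For illustration, here is a hedged openstacksdk sketch of defining a trunk with a VLAN sub-port on a different network (the cloud, network, and port names are made up):

# Hedged sketch: a parent port on the VM network, and a VLAN 101 sub-port that
# belongs to a separate container network.
import openstack

conn = openstack.connect(cloud="devstack")                  # cloud name is an assumption
vm_net = conn.network.find_network("vm-net")                # assumed existing networks
container_net = conn.network.find_network("container-net")

parent = conn.network.create_port(network_id=vm_net.id, name="trunk-parent")
child = conn.network.create_port(network_id=container_net.id, name="subport-101")

trunk = conn.network.create_trunk(port_id=parent.id, name="vm-trunk")
conn.network.add_trunk_subports(
    trunk,
    [{"port_id": child.id, "segmentation_type": "vlan", "segmentation_id": 101}],
)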

Since anything can now sit on the network (a virtual machine, a container, or even an application inside a network namespace), we will refer to all of them as network elements.

A few words regarding implementation: Every port in Dragonflow is tagged in an OpenFlow register. Specifically, reg6 contains an internal ID belonging to the source port.

Dragonflow detects the VLAN tag on packets arriving from the relevant logical port, strips the tag, and rewrites reg6 with the ID of the logical sub-port.
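
Conceptually, the classification step behaves like the toy lookup below. This is not Dragonflow code; the table contents and reg6 values are invented purely to show the idea:

# Toy model: map (parent port, VLAN id) to the reg6 value of the logical sub-port.
# In the real pipeline this is an OpenFlow table that also pops the VLAN tag.
SUBPORT_REG6 = {
    ("parent-port-1", 101): 0x2001,   # made-up internal IDs
    ("parent-port-1", 102): 0x2002,
}

def classify(parent_port, vlan_id):
    """Return the sub-port's reg6 value, or None to keep the parent's reg6."""
    return SUBPORT_REG6.get((parent_port, vlan_id))

print(hex(classify("parent-port-1", 101)))   # 0x2001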

To emphasise the container networking angle, see the Kuryr-Kubernetes blog post: http://www.dragonflow.net/2017/09/kubernetes-container-services-at-scale.html. It discusses the Kuryr integration as well.

SFC

https://docs.openstack.org/developer/dragonflow/specs/service_function_chaining.html

Service Function Chaining allows the tenant to place service functions along the network path between network elements. A service function can be anything, e.g. a firewall, a deep packet inspection device, or a VoIP codec.

A blog post specifically for SFC has been published here:

http://www.dragonflow.net/2017/08/policy-based-routing-with-sfc-in.html

Service Health Report

https://blueprints.launchpad.net/dragonflow/+spec/services-status


It is very important to know whether the services that should be running are still... running. The Service Health Report feature addresses this: every Dragonflow service now reports its health to the database.

This way you can tell if a service process has died. There isn't a user interface for this yet, but the underlying data is in place.
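
As a rough sketch of the idea (not the actual Dragonflow API), each service refreshes a timestamp in the database, and a timestamp that stops moving means the service has probably died:

# Toy model of the health-report mechanism; the dict stands in for the NB database.
import time

HEALTH_TABLE = {}
REPORT_INTERVAL = 5   # seconds between reports (illustrative value)

def report_health(service_name):
    """Called periodically by each service."""
    HEALTH_TABLE[service_name] = time.time()

def is_alive(service_name, tolerance=3 * REPORT_INTERVAL):
    """A service is considered alive if it reported recently enough."""
    last = HEALTH_TABLE.get(service_name)
    return last is not None and (time.time() - last) < tolerance

report_health("df-l3-service")   # hypothetical service name
print(is_alive("df-l3-service"))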

BGP

https://docs.openstack.org/developer/dragonflow/specs/bgp_dynamic_routing.html

Border Gateway Protocol (BGP) is a standardized gateway protocol designed to exchange routing and reachability information among autonomous systems.

BGP dynamic routing in OpenStack enables advertisement of self-service network prefixes to physical network devices that support BGP, thus removing the conventional dependency on static routes.

Distributed SNAT

https://docs.openstack.org/developer/dragonflow/specs/distributed_snat.html

Distributed SNAT is a new SNAT implementation that is fully distributed: no network node is needed, and the entire feature is contained within the compute node hosting the network element that uses it.

For more information on exactly how it works, see this post: http://www.dragonflow.net/2017/06/distributed-snat-examining-alternatives.html

What Is Yet To Come

Kuryr Integration

Kuryr allows container networking (e.g. Docker, Kubernetes) to be defined using the Neutron API. We want to make sure Dragonflow supports being deployed and used this way.

It is worth mentioning that external LBaaS solutions such as HAProxy and Octavia already work with Dragonflow. They are used to support Kubernetes services.

RPM and DEB Packaging

Currently, the only installation method is pip. This is not good enough for production. RPM (for Red Hat-based distributions) and DEB (for Debian-based distributions) packages are a must-have for any mature project.

LBaaS

https://review.openstack.org/#/c/477463/

Dragonflow's motto is that everything should be distributed. Bearing that in mind, we believe that we can improve LBaaS performance by implementing it in a distributed manner. The actual implementation should be pushed as far down as it will go (i.e. implemented with OpenFlow first if possible, and moved higher up only if necessary).

L3 Flavour

https://review.openstack.org/#/c/475174/

Dragonflow is becoming very feature-rich, but not all features are wanted in every deployment. In some cases, only Dragonflow's L3 features (e.g. Distributed SNAT) are needed. Allowing Dragonflow to be deployed as an L3 agent gives deployers greater flexibility, letting them take exactly the features they need.

Troubleshooting Tools

https://docs.openstack.org/developer/dragonflow/testing_and_debugging.html

Troubleshooting cloud networking is a known pain point. It pains developers and operators alike. If there is a problem in the network, you need to quickly find the source of the problem, and quickly identify who can fix it the fastest.

We want to answer this need by developing troubleshooting tools that can visually show where the network fails, and why.

FWaaS

https://docs.openstack.org/developer/dragonflow/specs/fwaas.html

Sometimes, security groups are just not enough. In some cases, the user wants to define a firewall inside their virtual network. The most logical place to put the firewall implementation is on the wire, i.e. directly in the pipeline the packet traverses.

TAPaaS

https://docs.openstack.org/developer/dragonflow/specs/tap_as_a_service.html

tcpdump is usually the first tool I go to when I don't understand why my network application isn't working. Sometimes even before ping. TAPaaS gives cloud users similar functionality. Implementing this service will go a long way toward helping users understand why their application isn't working right.

Multicast and IGMP

https://docs.openstack.org/developer/dragonflow/specs/igmp_application_and_multicast_support.html

Multicast communication has many uses. Its power is that it is both efficient and specific. However, none of these strengths come into play in the current multicast implementation. This can be improved greatly.

Conclusion

As you can see, we have a lot planned for the next cycle. It would be great if you could join us to suggest features, priorities, or even patches!

Stay tuned for information regarding the vPTG for Queens.

In the meantime, you can find us on IRC, on Freenode, in #openstack-dragonflow!

Tuesday, September 26, 2017

Bare-Metal networking in OpenStack-ironic

Ironic is an OpenStack project for provisioning bare-metal machines as part of an OpenStack deployment. Ironic manages those servers using common management protocols (e.g. PXE and IPMI) as well as vendor-specific ones. (More information about Ironic and how it works can be found in the Ironic documentation.)

In this post I want to focus on the networking aspect of Ironic. Ironic uses Neutron (the networking API of OpenStack) to configure the network. Bare-metal deployment is a little different from VM deployment, and Ironic has some extra requirements from the Neutron ML2 implementation. (All operations mentioned in this post, e.g. create-network, create-port, and bind-port, should be implemented by a Neutron ML2 driver.)

This post is an introduction to a follow-up post that will describe how we plan to implement these networking requirements in Dragonflow.

Ironic networking overview         

What does Ironic require from the Neutron implementation?

  • Ironic defines three different network types for bare metal (as documented in the spec and doc):
    • Cleaning network - a network used to clean the bare-metal server and make sure the node is ready for a new workload. It is recommended to create it as a provider VLAN network, to keep it separate from the tenant VLAN ranges.
    • Provisioning network - a network used for regular management of the node (tear-down, reboot, PXE boot, etc.). It is also recommended to create it as a provider VLAN network, for the same reasons as the cleaning network. (The operator can use the same network for provisioning and cleaning, but Ironic allows defining both types to keep the new/clean nodes waiting for deployment separate from the dirty nodes waiting to be cleaned.)
    • Tenant networks - networks used to access the bare-metal server for any other purpose. These should be managed like any other network in the cloud. When a bare-metal node is connected to a tenant network, it should not be connected to the provisioning network, for security reasons (the same provisioning network is shared by all bare-metal servers, so staying attached to it would break isolation requirements).
  • Supporting port groups - bare metal often requires treating a group of physical ports as one logical port (e.g. bond/LAG). These port groups need to be managed by Neutron.
  • Supporting PXE boot with DHCP - the most common way to boot a bare-metal server is PXE boot. The PXE boot procedure uses DHCP to retrieve the boot file name and the TFTP server address. Ironic passes these values to Neutron (using the port's extra_dhcp_opts attribute), and the DHCP server implementation in Neutron should use them to answer PXE DHCP requests.

The networking building blocks of a bare-metal deployment

There are several components involved in the networking of a bare-metal deployment:
  1. The bare-metal server itself.
  2. Ironic conductor - the software component of Ironic that actually controls the bare-metal server (this includes the TFTP server for PXE boot).
  3. DHCP server - assigns an IP address to the bare-metal server and serves the PXE boot parameters as well.
  4. Top-of-rack (TOR) switch - we assume the bare-metal server is physically connected to it, along with all the other components (compute nodes, the Ironic conductor node, etc.).
  5. Tenant networks - can be dynamically attached to and detached from the bare-metal node.
  6. Provider networks - for cleaning and provisioning, and for any other needs.
Example deployment:



Bare-metal machine life cycle (from the networking side):
(The full state machine of an Ironic node can be found here.)
  1. Cleaning - makes the node ready for a new job (uses the cleaning network).
  2. Provisioning - the Ironic conductor uses IPMI on the provisioning network to start the machine, and PXE to boot it with the desired image. The PXE boot process includes the following steps (all done on the provisioning network):
    1. Use DHCP to obtain the TFTP server address.
    2. Download the boot file from the TFTP server.
    3. Boot from the downloaded file.
  3. Connect to a tenant network - once the machine is up and running, it can be connected to a tenant network and managed like any VM. At this phase, traffic from the bare-metal server interacts with all the other components in the deployment (e.g. VMs, SNAT, DNAT, etc.).
    1. Ironic can rebind the physical ports that were used for the provisioning network to the tenant network. In that case, the bare-metal server loses connectivity with the Ironic conductor and with bare-metal provisioning.
  4. Cleaning - back to step 1.


How Neutron learns about the bare-metal topology:

Neutron port configuration:
To notify Neutron about bare-metal ports, Ironic uses its own mechanisms to inspect the hardware, and forwards that information as part of the Neutron port configuration.
For that, two new fields were introduced in the Neutron lport (spec):
  • local_link_information - this field lives in the lport's binding profile and tells Neutron how the port is connected to the TOR switch. It includes three parameters:
    • switch_id - an identifier of the switch the port is connected to. It can be the switch MAC address or an OpenFlow datapath_id.
    • port_id - the physical port identifier on the switch.
    • switch_info - other information about the switch (an optional parameter).
  • port-groups - a list of parameters for configuring the LAG/bond on the TOR.
The Neutron mechanism drivers should use this information when binding the lport.
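
For illustration, here is a hedged openstacksdk sketch of a bare-metal port carrying this information. The switch and port identifiers are made up; Ironic normally fills them in from hardware inspection:

# Hedged sketch: create a bare-metal port whose binding profile carries
# local_link_information, so the ML2 driver knows which TOR port to configure.
import openstack

conn = openstack.connect(cloud="devstack")                  # cloud name is an assumption
prov_net = conn.network.find_network("provisioning")        # assumed provisioning network

bm_port = conn.network.create_port(
    network_id=prov_net.id,
    binding_vnic_type="baremetal",
    binding_profile={
        "local_link_information": [{
            "switch_id": "aa:bb:cc:dd:ee:ff",               # TOR identifier (MAC / datapath_id)
            "port_id": "Ethernet1/10",                      # physical port on the TOR
            "switch_info": "tor-1",                         # optional free-form info
        }],
    },
)
print(bm_port.binding_profile)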

DHCP configuration:

Ironic uses the extra_dhcp_opts attribute on the Neutron port to configure DHCP support for PXE boot (DHCP options: boot file name and TFTP server address). The Neutron ML2 driver should configure the DHCP server to answer with these values upon request.
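
As a rough illustration, here is how those PXE parameters might be set on a port with openstacksdk. The option names follow common PXE/dnsmasq usage and the addresses are made up; treat the exact names and values as assumptions rather than what Ironic sends verbatim:

# Hedged sketch: attach PXE-related DHCP options to a (hypothetical) provisioning
# port. The ML2 driver's DHCP implementation is expected to serve these values
# back to the booting node.
import openstack

conn = openstack.connect(cloud="devstack")                  # cloud name is an assumption
prov_net = conn.network.find_network("provisioning")        # assumed provisioning network

pxe_port = conn.network.create_port(
    network_id=prov_net.id,
    extra_dhcp_opts=[
        {"opt_name": "tftp-server", "opt_value": "192.0.2.10"},        # TFTP server address
        {"opt_name": "server-ip-address", "opt_value": "192.0.2.10"},
        {"opt_name": "bootfile-name", "opt_value": "pxelinux.0"},      # boot file name
    ],
)
print(pxe_port.extra_dhcp_opts)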