Tuesday, September 26, 2017

Bare-Metal networking in OpenStack-ironic


Ironic is an OpenStack project for provisioning bare-metal machines as part of an OpenStack deployment. Ironic manages those servers by using common management protocols (e.g. PXE and IPMI) and vendor-specific management protocols. (More information about Ironic and how it works can be found in the Ironic documentation.)

In this post I want to focus on the networking aspect of Ironic. Ironic uses Neutron (the networking API of OpenStack) for configuring the network. A bare-metal deployment is a little different from a VM deployment, and Ironic has some extra requirements from the Neutron ML2 implementation. (All operations mentioned in this post (e.g. create-network, create-port, bind-port) should be implemented by a Neutron ML2 driver.)

This post is an introduction to a follow-up post that will describe how we plan to implement those networking requirements in Dragonflow.

Ironic networking overview         

What does Ironic require from the Neutron implementation?

  • Ironic defines 3 different network types for bare metal (as documented in the spec and docs):
    • Cleaning network - a network used to clean the bare-metal server and make sure the node is ready for a new workload. It is recommended to create this network as a provider VLAN network, to keep it separate from the tenant VLAN ranges.
    • Provisioning network - a network used for regular management of the node (tear-down, reboot, PXE boot, etc.). This network is also recommended to be a provider VLAN network, for the same reasons as the cleaning network. (The operator can use the same network for provisioning and cleaning, but Ironic defines the two types to allow separating new/clean nodes that are waiting for deployment from dirty nodes that are waiting to be cleaned.)
    • Tenant networks - networks used to access the bare-metal server for any other purpose; these should be managed like any other network in the cloud. When a bare-metal node is connected to a tenant network, it should not be connected to the provisioning network, for security reasons (the same provisioning network is shared by all bare-metal servers, which would break isolation requirements).
  • Supporting port groups - bare-metal servers often require treating a group of physical ports as one logical port (e.g. a bond/LAG). These port groups need to be managed by Neutron.
  • Support PXE boot with DHCP - the most common way to boot a bare-metal server is PXE boot. The PXE boot procedure uses DHCP to retrieve the boot file name and the TFTP server address. Ironic passes those parameter values to Neutron (using the Neutron extra_dhcp_opts port attribute), and the DHCP server implementation in Neutron should use them when answering PXE DHCP requests.
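To make the provider VLAN recommendation concrete, here is a minimal sketch (in Python) of the request body an operator might send to the Neutron networks API to create the cleaning network. The network name, physical network label and VLAN id are made-up example values, not anything mandated by Ironic:

```python
# Hypothetical sketch: building a Neutron network-create body for a
# provider VLAN network, as recommended for the cleaning network.
# "ironic-cleaning", "physnet1" and VLAN 100 are example values.

def provider_vlan_network_body(name, physnet, vlan_id):
    """Build a Neutron network-create request body for a provider VLAN network."""
    return {
        "network": {
            "name": name,
            "provider:network_type": "vlan",
            "provider:physical_network": physnet,
            "provider:segmentation_id": vlan_id,
            "shared": False,  # keep it out of the tenants' reach
        }
    }

cleaning_net = provider_vlan_network_body("ironic-cleaning", "physnet1", 100)
```

The same shape, with a different name and VLAN id, would serve for the provisioning network.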

The networking building blocks of Bare-metal deployment

     There are several components involved in the networking of a bare-metal deployment:
  1. The bare-metal server itself.
  2. Ironic conductor - the software component of Ironic that actually controls the bare-metal server (this includes the TFTP server for PXE boot).
  3. DHCP server - assigns IP addresses to the bare-metal server, and supports the PXE boot parameters as well.
  4. Top-of-rack switch - we assume the bare-metal server is physically connected to it, along with all the other components (compute nodes, the ironic-conductor node, etc.).
  5. Tenant networks - can be dynamically attached to and detached from the bare-metal node.
  6. Provider networks - for cleaning and provisioning, and for any other needs.
      Example deployment:

Bare-metal machine life cycle (from the networking side):
(the full state machine of an ironic node can be found here)
  1. Cleaning - makes the node ready for a new job (uses the cleaning network).
  2. Provisioning - the ironic-conductor uses IPMI on the provisioning network to start the machine, and uses PXE to boot it with the desired image. The PXE boot process includes the following steps (all done on the provisioning network):
    1. Use DHCP to obtain the TFTP server address
    2. Download the boot file from the TFTP server
    3. Boot from the downloaded file
  3. Connect to a tenant network - after the machine is up and running, it can be connected to a tenant network and managed like any VM. At this phase, traffic from the bare-metal server interacts with all the other components in the deployment (e.g. VMs, SNAT, DNAT, etc.).
    1. Ironic can change the physical ports that were used for the provisioning network to be bound to the tenant network. In that case the bare-metal server loses connectivity to the ironic-conductor, and with it bare-metal provisioning.
  4. Cleaning - back to step 1.
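The networking-relevant slice of this cycle can be sketched as a tiny state machine. This is only an illustration of the loop described above; the real Ironic node state machine has many more states:

```python
# A minimal sketch of the networking-relevant node life cycle described
# above. State and network names are illustrative, not Ironic's own.
TRANSITIONS = {
    "cleaning": "provisioning",   # node cleaned, ready for a new job
    "provisioning": "tenant",     # image deployed over the provisioning network
    "tenant": "cleaning",         # workload done, back to cleaning
}

NETWORK_FOR_STATE = {
    "cleaning": "cleaning-network",
    "provisioning": "provisioning-network",
    "tenant": "tenant-network",
}

def next_state(state):
    """Advance the node one step around the life cycle."""
    return TRANSITIONS[state]
```

Note that each state uses a different network, which is exactly why Ironic needs the ML2 driver to rebind the same physical ports between networks.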


How Neutron learns about the bare-metal topology:

neutron-port configurations:
To notify Neutron about bare-metal ports, Ironic uses its own mechanisms to inspect the hardware, and forwards that information as part of the Neutron port configuration.
For that, 2 new fields were introduced in the Neutron lport (spec):
  • local_link_information - this field is located in the lport binding profile and is used to inform Neutron how the port is connected to the ToR switch. It includes 3 parameters:
    • switch_id - an identifier of the switch the port is connected to. It can be the switch MAC address or an OpenFlow-based datapath_id.
    • port_id - a physical port identifier on the switch.
    • switch_info - other information about the switch (optional parameter).
  • port-groups - a list of parameters for configuring the LAG/bond on the ToR switch.
The Neutron mechanism drivers should use that information while binding the lport.
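Put together, a port that Ironic hands to Neutron might carry a binding profile like the sketch below. The MAC address, switch and port identifiers are made-up examples; only the field names follow the spec described above:

```python
# Hypothetical sketch of a Neutron port body with local_link_information
# in the binding profile, as Ironic would send after hardware inspection.
# All identifier values here are placeholders.
port_body = {
    "port": {
        "network_id": "PROVISION_NET_ID",      # placeholder network UUID
        "mac_address": "52:54:00:12:34:56",    # NIC of the bare-metal server
        "binding:vnic_type": "baremetal",
        "binding:profile": {
            "local_link_information": [
                {
                    "switch_id": "00:1b:21:aa:bb:cc",  # switch MAC or datapath_id
                    "port_id": "Ethernet1/10",         # physical port on the ToR
                    "switch_info": "tor-rack1",        # optional free-form info
                }
            ]
        },
    }
}
```

A mechanism driver binding this port reads local_link_information to decide which ToR switch and physical port to configure.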

DHCP configuration:

Ironic uses the extra_dhcp_opts attribute on the Neutron port to configure the DHCP service to support PXE boot (DHCP options: boot file name and TFTP server address). The Neutron ML2 driver should configure the DHCP server to answer with these values upon request.
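As a sketch, the port update Ironic issues could look like the following. The option names follow the Neutron extra_dhcp_opts convention; the boot file name and the server address are hypothetical example values:

```python
# Hypothetical sketch: extra_dhcp_opts set on the provisioning port so
# the Neutron DHCP implementation can answer PXE requests.
# "pxelinux.0" and 192.0.2.10 are example values only.
pxe_dhcp_opts = [
    {"opt_name": "bootfile-name", "opt_value": "pxelinux.0"},   # DHCP option 67
    {"opt_name": "tftp-server", "opt_value": "192.0.2.10"},     # DHCP option 66
]

port_update = {"port": {"extra_dhcp_opts": pxe_dhcp_opts}}
```

A DHCP implementation backing the ML2 driver would read these options off the port and echo them in its DHCP offers, so the node can fetch its boot file over TFTP.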

Sunday, September 10, 2017

Kubernetes container services at scale with Dragonflow SDN Controller

The cloud-native ecosystem is getting very popular, but VM-based workloads are not going away. Enabling developers to connect VMs and containers to run hybrid workloads means shorter time to market, a more stable production environment, and the ability to leverage the maturity of the VM ecosystem.

Dragonflow is a distributed, modular and extendable SDN controller that can connect cloud network instances (VMs, containers and bare-metal servers) at scale. Kuryr allows you to use Neutron networking to connect the containers on your OpenStack cloud. Combining them allows using the same networking solution for all workloads.

In this post I will briefly cover both Dragonflow and Kuryr, explain how Kubernetes cluster networking is supported by Dragonflow, and provide details about various Kubernetes cluster deployment options.


Dragonflow Controller in a nutshell

Dragonflow adopts a distributed approach to solve the scaling issues of large-scale deployments. With Dragonflow, the load is distributed to the compute nodes, each running a local controller. Dragonflow manages the network services for the OpenStack compute nodes by distributing the network topology and policies to the compute nodes, where they are translated into OpenFlow rules and programmed into the Open vSwitch datapath.
Network services are implemented as Applications in the local controller.
OpenStack can use Dragonflow as its network provider through the Modular Layer 2 (ML2) Plugin.


Project Kuryr uses OpenStack Neutron to provide networking for containers. With kuryr-kubernetes, the Kuryr project enables native Neutron-based networking for Kubernetes.
Kuryr provides a solution for hybrid workloads, enabling bare metal, virtual machines and containers to share the same Neutron network, or to choose different routable network segments.

Kubernetes - Dragonflow Integration

To leverage Dragonflow SDN Controller as Kubernetes network provider, we use Kuryr to act as the container networking interface (CNI) for Dragonflow.

Diagram 1: Dragonflow-Kubernetes integration

The Kuryr controller watches the K8s API for Kubernetes events and translates them into Neutron models. Dragonflow translates Neutron model changes into a network topology that gets stored in the distributed DB, and propagates network policies to its local controllers, which apply the changes to the Open vSwitch pipeline.
The Kuryr CNI driver binds Kubernetes pods on worker nodes into Dragonflow logical ports, ensuring the requested level of isolation.
As you can see in the diagram above, there is no kube-proxy component. Kubernetes services are implemented with the help of Neutron load balancers. The Kuryr controller translates a Kubernetes service into a Load Balancer, a Listener and a Pool. Service endpoints are mapped to the members in the pool. See the following diagram:
Diagram 2: Kubernetes service translation

Currently, either Octavia or HAProxy can be used as the Neutron LBaaSv2 provider. In the Queens release, Dragonflow will provide a native LBaaS implementation, as drafted in the following specification.
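The service-to-load-balancer mapping can be sketched in a few lines. This is an illustrative model of the translation, not Kuryr's actual code; all field names and addresses are example values:

```python
# Hypothetical sketch of the translation described above: a Kubernetes
# Service and its Endpoints mapped onto LBaaSv2-style objects
# (load balancer, listener, pool, members).

def translate_service(name, cluster_ip, port, endpoints):
    """Map a K8s Service (name, ClusterIP, port) plus its endpoint
    (ip, port) pairs onto LBaaSv2-shaped objects."""
    return {
        "loadbalancer": {"name": name, "vip_address": cluster_ip},
        "listener": {"protocol": "TCP", "protocol_port": port},
        "pool": {"lb_algorithm": "ROUND_ROBIN"},
        "members": [
            {"address": ip, "protocol_port": p} for ip, p in endpoints
        ],
    }

lb = translate_service(
    "web", "10.96.0.20", 80, [("10.0.0.5", 8080), ("10.0.0.6", 8080)]
)
```

The service's ClusterIP becomes the load balancer VIP, and each pod backing the service becomes a pool member, which is why no kube-proxy is needed.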

Deployment Scenarios

With Kuryr-Kubernetes it’s possible to run both OpenStack VMs and Kubernetes pods on the same network provided by Dragonflow, if your workloads require it, or to use different network segments and, for example, route between them. Below you can see the details of the various scenarios, including devstack recipes.

Bare Metal deployment

A Kubernetes cluster can be deployed on bare-metal servers. Logically, there are 3 different types of servers.

OS Controller hosts - run the required control services, such as Neutron server, Keystone and the Dragonflow Northbound Database. Of course, these can be distributed across a number of servers.

K8s Master hosts - run the components that provide the cluster’s control plane. The Kuryr controller is part of the cluster control plane.

K8s Worker nodes - host the components that run on every node, maintaining running pods and providing the Kubernetes runtime environment.

Kuryr-CNI is invoked by the Kubelet. It binds pods into the Open vSwitch bridge that is managed by the Dragonflow controller.

If you want to try a Bare Metal deployment with devstack, you should enable the Neutron, Keystone, Dragonflow and Kuryr components. You can use this local.conf:

Nested (Containers in VMs) deployment

Another deployment option is nested-VLAN, where containers are created inside OpenStack VMs using trunk port support. The undercloud OpenStack environment has all the components needed to create VMs (e.g., Glance, Nova, Neutron, Keystone, ...), as well as the needed Dragonflow configuration, such as enabling the trunk support the VM requires so that the containers running inside it can use undercloud networking. The overcloud deployment inside the VM contains the Kuryr components along with the Kubernetes control plane components.

If you want to try nested-VLAN deployment with devstack, you can use Dragonflow Kuryr Bare Metal config with the following changes:
  1. Do not enable the kuryr-kubernetes plugin and the Kuryr-related services, as they will be installed inside the VM.
  2. The Nova and Glance components need to be enabled, to be able to create the VM where we will install the overcloud.
  3. The Dragonflow Trunk service plugin needs to be enabled, to ensure trunk port support.
Then create a trunk and spawn the overcloud VM on the trunk port.
Install the overcloud, following the instructions listed here.
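The trunk created in this step can be sketched as the following Neutron trunk-create body. The VM boots on the trunk's parent port, and each container VLAN is later attached as a subport; all IDs and the VLAN tag below are placeholders:

```python
# Hypothetical sketch of a Neutron trunk-create body for the nested-VLAN
# scenario. Port IDs and the segmentation id are placeholder values.
trunk_body = {
    "trunk": {
        "name": "overcloud-vm-trunk",
        "port_id": "PARENT_PORT_ID",  # the port the overcloud VM boots on
        "sub_ports": [
            {
                "port_id": "CONTAINER_PORT_ID",  # a port for one container
                "segmentation_type": "vlan",
                "segmentation_id": 101,          # VLAN tag inside the VM
            }
        ],
    }
}
```

Each container the overcloud spawns gets its own subport and VLAN tag, which is how traffic from different containers stays separated on the single VM NIC.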

Hybrid environment

A hybrid environment enables diverse use cases where containers, regardless of whether they are deployed on bare metal or inside virtual machines, are in the same Neutron network as other co-located VMs.
To bring up such an environment with devstack, just follow the instructions in the nested deployment section.

Testing the cluster
Once the environment is ready, we can test that network connectivity works among Kubernetes pods and services. You can check the cluster configuration against this default configuration guide. You can run a simple example application and verify the connectivity and the configuration reflected in the Neutron and Dragonflow data models. Just follow the instructions to try the sample kuryr-kubernetes application.