Date Published: April 6, 2017
Publisher: Public Library of Science
Author(s): Ahmed Abdelaziz, Ang Tan Fong, Abdullah Gani, Usman Garba, Suleman Khan, Adnan Akhunzada, Hamid Talebian, Kim-Kwang Raymond Choo, Chun-Hsi Huang.
Software Defined Networking (SDN) is an emerging promising paradigm for network management because of its centralized network intelligence. However, the centralized control architecture of the software-defined networks (SDNs) brings novel challenges of reliability, scalability, fault tolerance and interoperability. In this paper, we proposed a novel clustered distributed controller architecture in the real setting of SDNs. The distributed cluster implementation comprises of multiple popular SDN controllers. The proposed mechanism is evaluated using a real world network topology running on top of an emulated SDN environment. The result shows that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6%, the packet loss from 5.22% to 4.15%, compared to distributed controller without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers respectively. Moreover, proposed method also shows reasonable CPU utilization results. Furthermore, the proposed mechanism makes possible to handle unexpected load fluctuations while maintaining a continuous network operation, even when there is a controller failure. The paper is a potential contribution stepping towards addressing the issues of reliability, scalability, fault tolerance, and inter-operability.
Software Defined Networking (SDN)  is a new evolutionary concept for network architecture, which separates the control plane from the data plane. The separation helps in better management of the network with efficient handling of the network traffic on different planes of the software-defined networks (SDNs) architecture. The data plane in SDN forwards network traffic based on the control plane instructions. The SDN controller builds network intelligence by observing the data plane forwarding entities and other SDN agents. No doubt, the centralized control helps in better network management; however, it always becomes a bottleneck when it comes to exchanging large volumes of data. Moreover, due to the centralized architecture of the controller, it experiences overhead as the number of user increases. Consequently, the controller becomes an obstacle to the smooth provision of service, and if the controller itself fails, the switch that it had been managing can no longer be controlled. Moreover, the SDN controller act as a single point of failure because all the forwarding decisions are dependent directly on the controller . Once the SDN controller or the switches-to-controller links fail, the entire network may collapse.
SDN is a new concept in computer networking, which promises to simplify network control and management and also support innovation through network programmability . However, the traditional network is designed and implemented from a large number of network devices such as switches, firewalls, routers with more complex controls and protocols. The software is embedded on the network devices which require image updating whenever new features are available for its updates. Network engineers are responsible for configuring various network devices, which is a challenging and error-prone task for medium to large-scale networks. Therefore, the separation of the control plane (software) from the data plane  (hardware) in SDN is needed to provide more flexible, programmable, cost efficient and innovative network architecture . SDN was first introduced and promoted by Open Network Foundation (ONF) to address the aforementioned issue. The SDN architecture logically centralizes the network intelligence in the software-based controllers at the control plane. The network devices (data plane) simply acts packet-forwarding devices that can be programmed using an open interface called OpenFlow . The separation of the control plane from the data plane enables easier deployment of new technologies and applications; network virtualization  and various middleboxes can be consolidated into a software control . The separation of the control and data plane is compared to an operating system and the computer hardware which is illustrated in Fig 1; where the controller acts as an operating system and the forwarding devices (switches) act as the hardware devices (CPU, memory, storage). The devices are located in the south of the controller whereas network applications are located in the north of the controller. The network engineer develops customized network applications to perform various tasks such as load balancing, routing, firewall as well as traffic engineering.
Distributed controller architectures with more than one controller could be used to address some of the challenges of a single SDN controller  placement such as availability. In fact, a vast majority of networks contain duplication as a means to ensure the availability of the system. Furthermore, multiple controllers can reduce the latency or increase the scalability and fault tolerance of the SDN deployment. However, this architecture increases the lookup overhead of communication between switches and multiple controllers. A potential downside of this approach is to maintain the consistent state in the overall distributed system. The network applications will act incorrectly when the global view of the network state is inconsistent . There has been a considerable amount of research work on distributed controller platforms such as Onix, HyperFlow, Kandoo, DISCO, Elasticon and Pratyaastha, which suggest the placement of multiple copies of SDN controllers throughout the control plane to provide scalability for larger networks and traffic loads. Onix  is a distributed controller for large scale networks that implements multiple SDN controllers. Onix handles the distribution and collection of information from switches and distributes controls appropriately among various controllers. A similar system with a distributed control platform is HyperFlow  which is an application of the NOX  controller that can handle state distribution between distributed controllers through a push/subscribe system based on the WheelFS  distributed file system. However, HyperFlow can only handle a few thousand events per second and anything beyond that is considered a scalability limitation. Kandoo  distributes controller states by placing the controllers in a two level hierarchy comprising a root controller and multiple local controllers. The system does not allow the controllers within a tier to communicate with one another and limit the usage of the second tier services that requiring a global network view. ElastiCon  proposes a controller pool, which dynamically grows or shrinks according to the traffic conditions. Besides, the workload is dynamically distributed among the controllers. Pratyaastha  proposes a novel approach for assigning SDN switches and partitions the SDN application state to distributed controller instances. As observed from the existing distributed controller architectures, the single point of failure of SDN controller was solved using multiple distributed controllers. However, the solution presented various challenges such as the network state distribution, the network topology consistent state, the master-selection issue and etc. As a result, this research work was carried out to address these issues.
The common perception that the possibility for the controller to become a single-point-of-fail or a bottleneck of the network led to raising serval issues such as scalability, reliability, and performance. Numbers of research papers proposed distributed controller clustering to address these issues. In this section, we discuss these problems following by the placement of the controller in terms of distributed controller clustering.
A study focusing on the Beacon controller  showed that a single SDN controller could handle 12.8 million new flows per second on a 12 cores machine, with an average latency of 24.7 ms for each flow. However, to increase scalability, reliability, robustness and fast failover,  recognized that the logically centralized controller must be physically distributed, as a single SDN controller architecture presents a single point of failure. Besides, the controller reliability will be affected when the switches in a network initiate more flow that the controllers can handle. A reliability-aware controller placement problem was proposed by  with its main objectives was to place a given number of controllers in a certain physical network such that the pre-defined objective function is optimized. The reliability issue is addressed as a placement metric that is reflected by the percentage of valid control paths. Although a tradeoff between reliability and latency shows that additional latency was incurred using the researcher’s algorithm.
Our proposed architecture is based on distributed controller clustering in SDN that consists of two different types of controllers; an open source and commercial based controllers. Both types of controllers having different SDN networks. Each controller is setup within a cluster of three nodes; the controllers in the each cluster are configured in active mode with one of the controllers acting as the primary controller as shown in Fig 3. The mode provides load balancing and sharing; and network consistency among the entire cluster. In our proposed architecture, when a primary controller fails then any other controller among the cluster becomes the primary controller based on a predefined priority configuration, thus ensuring a highly available of SDN architecture. The proposed architecture is designed and implemented using ONOS and HP VAN SDN controllers configured on Amazon EC2 cloud servers. ONOS and HP VAN SDN controllers are installed and configured on different sets of three Amazon EC2 cloud servers. All servers are running Ubuntu server.
In order to evaluate our proposed clustered distributed controller architecture. We conducted two experiments. The first experiment focuses on latency and the second experiment is carried out to capture the number of dropped packets (i.e. packet loss). In this section, an experimental step that includes tools and tests configuration for both experiments are detailed.
In this section, we have evaluated our proposed architecture and discussed its output results in detail. The tests measure the latency and packet loss using the normally distributed controller architecture and the proposed controller clustering architecture in SDN. The controller flow setup delay (latency) test for HP VAN SDN and ONOS controllers are carried out to measure the time taken by the controllers to setup a flow under distributed controller architecture and the proposed controller clustering.
In this paper, we propose a distributed controller clustering mechanism in SDNs. Multiple distributed prominent controllers have been configured in a cluster of three nodes in both active and reactive mode. The controller cluster is placed using the capacitated controller placement algorithm. The emulation of the proposed clustering mechanism shows promising results. The result shows that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6%, the packet loss from 5.22% to 4.15%, compared to distributed controller without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers respectively. The result shows that the proposed distributed controller clustering outperforms the existing distributed controller without clustering in terms of latency, and packet loss with reasonable CPU utilization. In future, we consider more rigorous experimentation of diverse SDN commercial controller with different metrics such as flow setup rate (throughput), the number of nodes in the cluster and various others. Moreover, this research work can be extended to be implemented in commercial SDN-based cloud diverse data-centers infra-structures.