Research Article: Hierarchical trie packet classification algorithm based on expectation-maximization clustering

Date Published: July 13, 2017

Publisher: Public Library of Science

Author(s): Xia-an Bi, Junxia Zhao, Miguel A Fernandez.

http://doi.org/10.1371/journal.pone.0181049

Abstract

As network bandwidth grows, packet classification algorithms that can handle large-scale rule sets are urgently needed. Among existing algorithms, those based on the hierarchical trie have become an important research branch because of their wide practical use. Although the hierarchical trie saves considerable storage space, it has several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, the Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). First, the paper formalizes the packet classification problem by mapping rules and data packets into a two-dimensional space. Second, it uses the expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, forming diversified clusters. Third, it builds a hierarchical trie from the results of the expectation-maximization clustering. Finally, it conducts both simulation experiments and real-environment experiments to compare the performance of the algorithm with other typical algorithms, and analyzes the results. The hierarchical trie structure in the algorithm not only adopts trie path compression to eliminate backtracking, but also addresses the low efficiency of trie updates, which greatly improves the performance of the algorithm.

Partial Text

The core devices of a computer network are routers and firewalls. Packet classification is the key technology of these devices, and it constrains the growth of network bandwidth. Packet classification is therefore of great significance for next-generation Internet network equipment[1], and it plays important roles in routing, quality of service, firewalls, multimedia communications, accounting, traffic monitoring, and so on[2]. With the rapid development of high-speed networks, packet classification has become one of the main factors limiting the improvement of network equipment[3]. Meanwhile, packet classification algorithms are required to handle ever larger rule sets. Research on efficient packet classification algorithms that support large-scale rule sets is therefore of great significance[4].

In this section, we briefly discuss packet classification algorithms. General packet classification algorithms are roughly divided into basic data structure algorithms, space mapping algorithms, and hardware-based algorithms. A survey of packet classification algorithms is shown in Table 1.

This section proposes a hierarchical trie algorithm for packet classification based on expectation-maximization clustering. The algorithm has two stages: a preprocessing stage for rules and packets, and a packet matching stage. In the first stage, we first formalize the packet classification problem by mapping the rules and packets into rectangular areas in a two-dimensional space. We then use the expectation-maximization algorithm to cluster the formalized rules, forming a number of clusters. In the second stage, we construct a hierarchical trie based on these clusters and complete the packet matching process. The hierarchical trie structure in this algorithm adopts path compression to eliminate backtracking and overcomes the difficulty of trie updates, which greatly improves the performance of the proposed algorithm.
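
The formalization step in the first stage can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes each rule is given as a pair of binary prefixes (source and destination address), that a prefix covers a contiguous address interval, and that a packet is treated as a single point (a degenerate rectangle). The names `Rect`, `prefix_to_range`, `rule_to_rect`, and `classify` are hypothetical, and the linear scan in `classify` only stands in for the clustering and hierarchical trie lookup, which are not shown.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle covered by one rule (inclusive bounds)."""
    src_lo: int
    src_hi: int
    dst_lo: int
    dst_hi: int


def prefix_to_range(prefix: str, width: int = 32) -> Tuple[int, int]:
    """Map a binary prefix such as '10*' to the address interval it covers.

    `width` is the field width in bits (assumed 32, as for IPv4 addresses).
    """
    bits = prefix.rstrip("*")
    free = width - len(bits)              # number of wildcarded low bits
    lo = (int(bits, 2) if bits else 0) << free
    hi = lo | ((1 << free) - 1)           # set every wildcarded bit to 1
    return lo, hi


def rule_to_rect(src_prefix: str, dst_prefix: str, width: int = 32) -> Rect:
    """Map a two-field rule to a rectangle in (source, destination) space."""
    src_lo, src_hi = prefix_to_range(src_prefix, width)
    dst_lo, dst_hi = prefix_to_range(dst_prefix, width)
    return Rect(src_lo, src_hi, dst_lo, dst_hi)


def classify(rules: List[Rect], src: int, dst: int) -> Optional[int]:
    """Return the index of the first rule whose rectangle contains the point.

    Assumes rules are listed in priority order; this brute-force scan is a
    placeholder for the trie-based matching stage.
    """
    for index, rect in enumerate(rules):
        if rect.src_lo <= src <= rect.src_hi and rect.dst_lo <= dst <= rect.dst_hi:
            return index
    return None


if __name__ == "__main__":
    # Toy 8-bit address space to keep the numbers readable.
    rules = [rule_to_rect("10*", "0*", width=8), rule_to_rect("*", "*", width=8)]
    print(rules[0])
    print(classify(rules, src=0b10110000, dst=0b00001111))
```

In this view, a longer prefix yields a smaller rectangle and a fully wildcarded rule covers the whole plane, so overlapping rules appear as overlapping rectangles. That geometric overlap is the kind of aggregate structure a clustering step could plausibly exploit.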

In this section, we run a series of experiments to compare the performance of our proposed algorithm with that of the PTIAL algorithm. The experiments are conducted by simulation on the ClassBench[30] platform. ClassBench provides classification tables that resemble real classifiers in Internet routers, and can generate input traces corresponding to each classification table. Specifically, we performed simulations using three types of classification tables generated by ClassBench: access control lists (ACL), firewalls (FW), and IP chains (IPC). In ClassBench, the ‘Filter Set Generator’ module produces synthetic rule sets, which can accurately model the characteristics of real rule sets. Although the size of real rule sets varies, ClassBench provides high-level control and can generate packet classification rule sets with different characteristics through parameter settings. We use it to generate traces that simulate the traffic seen on routers and firewalls. Moreover, we do not set the distributions of protocol, port number, and address, in order to preserve the authenticity of our experiments.

In this section, we present experiments that compare the performance of our algorithm with the well-known algorithms HD-Cuts[31] and GroupCuts[18] in a real environment. The performance metrics are time performance, measured as the number of memory accesses, and identification precision, measured as the accuracy of the algorithms.

Packet classification algorithms must handle ever larger rule sets as the demand for network bandwidth increases; nevertheless, existing processing speeds cannot keep pace with the development of computer networks. Research on efficient packet classification algorithms that support large-scale rule sets is therefore of great significance.

 


 
