Date Published: July 18, 2018
Publisher: Public Library of Science
Author(s): Yizhang Wang, Wei Pang, You Zhou, Xiangtao Li.
The performance of density based clustering algorithms may be greatly influenced by the chosen parameter values, and achieving optimal or near optimal results very much depends on empirical knowledge obtained from previous experiments. To address this limitation, we propose a novel density based clustering algorithm called the Density Propagation based Adaptive Multi-density clustering (DPAM) algorithm. DPAM can adaptively cluster spatial data. In order to avoid manual intervention when choosing parameters of density clustering and still achieve high performance, DPAM performs clustering in three stages: (1) generate the micro-clusters graph, (2) density propagation with redefinition of between-class margin and intra-class cohesion, and (3) calculate regional density. Experimental results demonstrated that DPAM could achieve better performance than several state-of-the-art density clustering algorithms in most of the tested cases, the ability of no parameters needing to be adjusted enables the proposed algorithm to achieve promising performance.
Clustering has been a promising technique in data mining and pattern recognition [1–3]. One important goal of clustering is to find potential data structures without supervision information. In density based clustering algorithms, a density threshold (which often represents radius of searching circle and size of grids) is usually used to build a search framework, and the final results depend on the local or global density threshold settings. Density based clustering algorithm follows the hypothesis that the high-density regions are always surrounded by low-density connected sets. DBSCAN, OPTICS, DENCLUE, CLIQUE all belong to this kind of algorithms [4–7], which use thresholds to find clusters. In addition to the above algorithms, recently Rodriguez and Laio proposed a novel fast density clustering algorithm by searching density peaks (DP) , and they adopted a global threshold to calculate the local density of each point. However, manual intervention and domain knowledge is still required to obtain an appropriate threshold in DP, and a user may require a significant amount of time to learn how to configure parameters properly . Based on the above consideration, in this research, we propose an adaptive density clustering method to address the above issue.
In this section, we review the related work of density clustering algorithms. Most density based clustering algorithms are based on the same assumption: the dense regions of objects surrounded by low-density regions clusters. Based on this assumption, many methods such as DBSCAN, OPTICS, DENCLUE, CLIQUE and STING are proposed . These algorithms calculate local density according to a given distance metric, which contains the minimum number of neighborhood at least. DBSCAN is a typical density based clustering algorithm, and in DBSCAN the density of every point is associated with the number of points within a threshold radius circle. OPTICS aims to overcome the shortcomings of DBSCAN by ordering points to identify the cluster structure. The data space can be quantized into a finite number of cells to form a grid structure. There also exist some grid density clustering algorithms: STING uses grids to store statistical information by the wavelet transform method; CLIQUE employs grids to high dimensional data clustering [17, 18]. However, all these algorithms require users to set parameter values manually and this may lead to fluctuating clustering performance.
A single natural cluster can be composed of a set of micro-clusters, each of which includes a smaller number of points with a higher local density. We propose the concept of regional density to measure the distributions of points in each micro-cluster in Section 3.3. The proposed DPAM is based on the following assumption: the smaller the difference of regional density among different micro-cluster is, the better the clustering results are.
In this section we report the performance of DPAM. We experimented with publicly available data, and the clustering performance evaluation tasks include: (1) whether DPAM can produce correct clustering results for low dimensional data, (2) performance comparison with other density clustering algorithms, and (3) whether DPAM is robust on high-dimensional data.
In this research we present a new approach to adaptively obtain optimal density clustering results. Density propagation based adaptive density clustering adopts a three-stage strategy to clustering low-dimensional density spatial data, and it also perform well on high-dimensional data. We report the promising performance of our approach for clustering different datasets and experimental results indicate that our approach overcomes the limitations of some existing clustering algorithms.