Research Article: From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering

Date Published: August 24, 2018

Publisher: Springer Berlin Heidelberg

Author(s): Sylvia Frühwirth-Schnatter, Gertraud Malsiner-Walli.

http://doi.org/10.1007/s11634-018-0329-y

Abstract

In model-based clustering, mixture models are used to group data points into clusters. A useful concept, introduced for Gaussian mixtures by Malsiner Walli et al. (Stat Comput 26:303–324, 2016), is that of sparse finite mixtures, where the prior on the weight distribution of a mixture with K components is chosen in such a way that, a priori, the number of clusters in the data is random and is allowed to be smaller than K with high probability. The number of clusters is then inferred a posteriori from the data. The present paper makes the following contributions in the context of sparse finite mixture modelling. First, it is illustrated that the concept of sparse finite mixtures is very generic and easily extended to clustering various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters. Second, sparse finite mixtures are compared to Dirichlet process mixtures with respect to their ability to identify the number of clusters. For both model classes, a random hyperprior is considered for the parameters determining the weight distribution. By suitably matching these priors, it is shown that the choice of this hyperprior is far more influential on the cluster solution than whether a sparse finite mixture or a Dirichlet process mixture is used.

Partial Text

In the present paper, interest lies in the use of mixture models to cluster data points into groups of similar objects; see Frühwirth-Schnatter et al. (2018) for a review of mixture analysis. Following the pioneering papers of Banfield and Raftery (1993) and Bensmail et al. (1997), model-based clustering using finite mixture models has found numerous applications; see Grün (2018) for a comprehensive review.

Sparse finite mixture models were introduced in Malsiner Walli et al. (2016) in the framework of Gaussian mixture distributions; however, the underlying concept is very generic and can easily be applied to virtually any mixture distribution. In this section, we consider various types of sparse finite mixture models for non-Gaussian data, including sparse latent class models for multivariate categorical data (Sect. 3.1), sparse Poisson mixtures for univariate discrete data (Sect. 3.2), and sparse mixtures of generalised linear models (GLMs) for regression models with count data outcomes (Sect. 3.3). Finally, Sect. 3.4 considers clustering continuous data with non-Gaussian clusters using mixtures of univariate and multivariate skew normal and skew-t distributions. For each of these classes of mixture models, case studies are provided in Sect. 5 where sparse finite mixtures are compared to Dirichlet process mixtures of the same type.
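
To make the generic recipe concrete, the following Python (NumPy/SciPy) sketch implements a basic Gibbs sampler for a sparse finite Poisson mixture. The simulated data, the fixed value $e_0 = 0.005$ (the paper places a gamma hyperprior on $e_0$ rather than fixing it), and the Gamma(1, 0.5) prior on the component means are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)

# Illustrative data: two well-separated Poisson clusters
y = np.concatenate([rng.poisson(1.0, 150), rng.poisson(10.0, 150)])
N = len(y)

K = 10               # overfitting number of components (upper bound on clusters)
e0 = 0.005           # small Dirichlet parameter -> sparse prior on the weights
a0, b0 = 1.0, 0.5    # assumed Gamma(shape, rate) prior on the Poisson means

S = rng.integers(0, K, N)           # component allocations
mu = rng.gamma(a0, 1.0 / b0, K)     # component means
K_plus = []

for sweep in range(3000):
    # (1) weights: eta | S ~ Dirichlet(e0 + n_1, ..., e0 + n_K)
    n = np.bincount(S, minlength=K)
    eta = rng.dirichlet(e0 + n)

    # (2) means: conjugate Gamma update; empty components draw from the prior
    for k in range(K):
        yk = y[S == k]
        mu[k] = rng.gamma(a0 + yk.sum(), 1.0 / (b0 + len(yk)))

    # (3) allocations: P(S_i = k) proportional to eta_k * Poisson(y_i; mu_k)
    logp = np.log(eta + 1e-300) + poisson.logpmf(y[:, None], mu[None, :])
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    S = np.array([rng.choice(K, p=pi) for pi in p])

    # record the number of non-empty ("filled") components after burn-in
    if sweep >= 1000:
        K_plus.append(int(np.sum(np.bincount(S, minlength=K) > 0)))

# posterior distribution of K_+; its mode estimates the number of clusters
vals, freq = np.unique(K_plus, return_counts=True)
print(dict(zip(vals.tolist(), (freq / freq.sum()).round(3).tolist())))
```

With K = 10 components but only two clusters in the data, the sparse Dirichlet prior tends to empty the superfluous components, so the posterior mode of the number of filled components $K_+$ serves as an estimate of the number of clusters.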

The aim of this simulation study is to investigate whether (1) a sparse finite mixture of non-Gaussian components appropriately estimates the number of data clusters, (2) the posterior of $K_+$ of sparse finite mixtures and DPM is comparable if the priors on the precision parameters $e_0$ and $\alpha$ are matched, and (3) both approaches estimate similar partitions of the data. Additionally, the impact of the prior on $\alpha$ and $e_0$, of the number of specified components $K$, and of the number of observations $N$ is investigated.
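
As a rough illustration of what such a prior comparison involves, the sketch below simulates the induced prior on $K_+$ under a sparse finite mixture and under a DPM (via the Chinese restaurant process), with the hyperparameters fixed at the prior means used in the case studies ($e_0 = 0.005$, $\alpha = 0.5$) rather than drawn from their hyperpriors; the paper's exact matching construction is described in its Sect. 2.3 and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

def kplus_sparse_finite(N, K, e0, reps=2000):
    """Prior draws of K_+ under a finite mixture with Dirichlet(e0,...,e0) weights."""
    out = np.empty(reps, dtype=int)
    for r in range(reps):
        eta = rng.dirichlet(np.full(K, e0))
        out[r] = len(np.unique(rng.choice(K, size=N, p=eta)))
    return out

def kplus_dpm(N, alpha, reps=2000):
    """Prior draws of K_+ under a DPM, via the Chinese restaurant process."""
    out = np.empty(reps, dtype=int)
    for r in range(reps):
        counts = []
        for i in range(N):
            probs = np.array(counts + [alpha], dtype=float)
            j = rng.choice(len(probs), p=probs / probs.sum())
            if j == len(counts):
                counts.append(1.0)   # open a new cluster
            else:
                counts[j] += 1.0
        out[r] = len(counts)
    return out

# hyperparameters fixed at the prior means used in the case studies
N, K, e0, alpha = 100, 10, 0.005, 0.5
for name, draws in [("sparse finite", kplus_sparse_finite(N, K, e0)),
                    ("DPM", kplus_dpm(N, alpha))]:
    vals, freq = np.unique(draws, return_counts=True)
    print(name, dict(zip(vals.tolist(), (freq / freq.sum()).round(3).tolist())))
```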

For each type of mixture model discussed in Sect. 3, a case study is provided to compare sparse finite mixtures with DPM of the same type. For both model classes, the influence of the priors $p(e_0)$ and $p(\alpha)$ on the posterior distribution $p(K_+ \mid \mathbf{y})$ of the number of clusters $K_+$ is investigated in detail. Typically, for sparse finite mixtures $K = 10$ and $e_0 \sim \mathcal{G}(1, 200)$, implying $\text{E}(e_0) = 0.005$, is specified, whereas for DPM $\alpha \sim \mathcal{G}(2, 4)$ is specified, as in Escobar and West (1995). In addition, both priors are matched as described in Sect. 2.3. For each case study, standard finite mixtures with $e_0 = 4$ are estimated for increasing K.
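
Here $\mathcal{G}(a, b)$ is parameterised by shape $a$ and rate $b$, so $\text{E}(e_0) = 1/200 = 0.005$ and $\text{E}(\alpha) = 2/4 = 0.5$. The following minimal check draws from both hyperpriors; since NumPy's gamma sampler takes a scale parameter, the rate must be inverted.

```python
import numpy as np

rng = np.random.default_rng(3)

# e0 ~ G(1, 200): shape 1, rate 200, so scale = 1/200 and E(e0) = 1/200 = 0.005
e0_draws = rng.gamma(shape=1.0, scale=1.0 / 200.0, size=100_000)

# alpha ~ G(2, 4): shape 2, rate 4, so E(alpha) = 2/4 = 0.5 (Escobar and West 1995)
alpha_draws = rng.gamma(shape=2.0, scale=1.0 / 4.0, size=100_000)

print(e0_draws.mean(), alpha_draws.mean())   # approx 0.005 and 0.5
```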

This paper extends the concept of sparse finite mixture models, introduced by Malsiner Walli et al. (2016) for Gaussian clustering kernels, to a wide range of non-Gaussian mixture models, including Poisson mixtures, latent class analysis, mixtures of GLMs, and skew normal and skew-t distributions. Contrary to common belief, this paper shows that finite mixture models do not necessarily assume that the number of clusters is known. As exemplified in several case studies in Sect. 5, the number of clusters was estimated a posteriori from the data and ranged from $\hat{K}_+ = 1$ (for the Fabric Fault Data under a mixture of negative binomial GLMs) to $\hat{K}_+ = 4$ (for the Eye Tracking Data) when sparse finite mixtures with $K = 10$ components were fitted.

 
