Bayesian analysis of random partition models with Laplace distribution

Minjung Kyung

Department of Statistics, Duksung Women’s University, Korea
Correspondence to: Department of Statistics, Duksung Women’s University, 33 Samyangro 144-gil, Seoul 01369, Korea. E-mail: mkyung@duksung.ac.kr
Received April 25, 2017; Revised July 24, 2017; Accepted August 21, 2017.
Abstract

We develop a random partition procedure based on a Dirichlet process prior with Laplace distribution. Gibbs sampling of a Laplace mixture of linear mixed regressions with a Dirichlet process is implemented as a random partition model when the number of clusters is unknown. Unlike its counterparts, our approach provides simultaneous partitioning and parameter estimation with the computation of classification probabilities. A full Gibbs sampling algorithm is developed for an efficient Markov chain Monte Carlo posterior computation. The proposed method is illustrated with simulated data and a real data set on energy efficiency from Tsanas and Xifara (Energy and Buildings, 49, 560–567, 2012).

Keywords : Laplace mixture, model-based cluster, random partition model, Dirichlet process prior
1. Introduction

Clustering algorithms seek a partition of a finite set of objects into nonempty subsets; the number of subsets may be fixed in advance, but it is often unknown beforehand. We focus on probability models for partitions and avoid purely algorithmic methods. As a special case, product partition models (PPMs), introduced by Hartigan (1990) and Barry and Hartigan (1992), are based on modeling random partitions of the sample space. These assume that observations in different elements of a random partition of the data are independent. Thus, if the probability distribution for the random partitions is in product form before the observations are obtained, it is also in product form afterwards (Jordan et al., 2007). Inference can therefore be made by conditioning on and averaging over partitions, with a random partition:

$P(\rho_n=\{S_1,\ldots,S_k\})=K\prod_{j=1}^{k}c(S_j),$

where ρn is a partition of the objects into a family of subsets S1, S2, …, Sk of S0 = {1, 2, …, n} and c(S) is a non-negative cohesion that is specified for each subset of S0. Here, the normalizing constant K satisfies $K^{-1}=\sum_{\rho\in\wp}\prod_{j=1}^{|\rho|}c(S_j)$, where ℘ is the set of all possible partitions into nonempty sets. Together with independent sampling across clusters, a PPM can be described as

$P(y\mid\rho_n=\{S_1,\ldots,S_k\})\propto\prod_{j=1}^{k}c(S_j)P(y_{S_j}),$

where $P(y_{S_j})$ is the density for subcluster Sj and c(Sj) is the cohesion.

Cohesion measures the strength of the functional relationship among the elements in each subset; it controls the partition into subsets and can be roughly thought of as a probability. A popular choice is c(S) = m(|S| − 1)!, where m is a precision parameter and |S| is the number of elements in S. It follows that the resulting probability model for ρn is

$P(\rho_n)=\frac{m^{k}\prod_{j=1}^{k}(n_j-1)!}{\prod_{i=1}^{n}(m+i-1)},$ (1.2)

where nj = |Sj| is the number of elements in cluster j. This model is known as the Dirichlet process (DP) random partition (Blackwell and MacQueen, 1973). Details are in McCullagh and Yang (2007), Müller et al. (2015), Pitman (1996), Quintana and Iglesias (2003), and references therein.
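As a small numerical illustration, the sketch below evaluates the DP random partition probability as the product of the cohesions m(nj − 1)! divided by the product of (m + i − 1) over i = 1, …, n, and checks that it sums to one over all partitions of three objects. The function name `dp_partition_prob` and the value m = 1.5 are illustrative assumptions, not taken from the paper.

```python
from math import factorial, prod

def dp_partition_prob(cluster_sizes, m):
    """Probability of one partition with the given cluster sizes,
    using the cohesion c(S) = m * (|S| - 1)! (hypothetical helper)."""
    n = sum(cluster_sizes)
    cohesion_product = prod(m * factorial(nj - 1) for nj in cluster_sizes)
    normalizer = prod(m + i - 1 for i in range(1, n + 1))
    return cohesion_product / normalizer

m = 1.5  # illustrative precision parameter
# n = 3 objects: one partition with sizes (3), three with sizes (2, 1),
# and one with sizes (1, 1, 1)
total = (dp_partition_prob([3], m)
         + 3 * dp_partition_prob([2, 1], m)
         + dp_partition_prob([1, 1, 1], m))
print(total)
```

Larger values of m shift probability toward partitions with more clusters, which is why m is called a precision parameter.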

A similarly popular approach to random partitions is model-based clustering and its extensions, which fit a finite mixture of multivariate Gaussian distributions with various variance structures to the data (Banfield and Raftery, 1993; Fraley and Raftery, 2002, 2007; McLachlan and Peel, 2000; Wolfe, 1970). It uses an Expectation-Maximization (EM) algorithm (Dempster et al., 1977) to obtain a local optimum of the log-likelihood. To select the best number of clusters, a model selection criterion such as the Bayesian information criterion (BIC) is applied after fitting several mixture models with different numbers of clusters.

In the Bayesian literature, the nonparametric Bayesian clustering approach is usually based on a mixture of the DP (Antoniak, 1974; Ferguson, 1973) with an unknown number of clusters. In particular, a Dirichlet process mixture (DPM) of regression models has been widely used as a flexible semiparametric approach for clustering and density estimation (Escobar and West, 1995). The implementation of DP mixture models has been made feasible by modern methods of Bayesian computation and efficient algorithms (MacEachern and Müller, 1998; Neal, 2000). Product partition type priors on a normal mixture of regression models have also been widely used because they provide a tractable, probability-based objective function to identify good partitions (Booth et al., 2008; Crowley, 1997; Quintana and Iglesias, 2003).

Recently, a natural extension of the random partition model has been considered that incorporates covariate values in its definition. MacEachern (1999, 2000) proposed a collection of dependent random probability measures with marginal distributions given by the DP. This idea has been extended and applied to the construction of various types of random probability measures such as density regression (Dunson et al., 2007; Tokdar et al., 2010). A covariate-dependent extension was proposed by Müller et al. (2011), and alternative extensions to build covariate-dependent random partition models can be found in Park and Dunson (2010) and Argiento et al. (2014). Airoldi et al. (2014) also provided a general family of nonexchangeable species sampling sequences dependent on the realizations of a set of latent variables. Murua and Quintana (2017) recently constructed a covariate-dependent prior distribution based on the Potts clustering model, in which covariate proximity drives both the formation of clusters and the prior predictive distributions, for multivariate multiple linear regression with multivariate normal errors.

Park and Dunson (2010) argued that a semiparametric Bayesian approach with an infinite number of clusters can be considered by letting yi ~ f (φi), with φi ~ G and G assigned a DP prior. In marginalizing out G, a prior on the partition of subjects into clusters is formed with cluster-specific parameters consisting of independent draws from G0 as base distribution in the DP. This prior is a type of PPM (Quintana and Iglesias, 2003) and it is appealing to marginalize out G in order to increase efficiency in computation and simplify interpretation. They also argued that the DP induces a particular prior on the partition and one can develop alternative classes of PPMs by replacing the DP prior on G with an alternative choice such as species sampling models (Ishwaran and James, 2003; Pitman, 1996) which are a very broad class of nonparametric priors that include the DP as a special case.

Most mixtures of regression models assume normality for the subcluster distribution $P(y_{S_k})$. Under a normality assumption on the error distribution in each cluster, various forms of the mean can be estimated easily with feasible computation and efficient algorithms. In this research, we instead propose a Laplace distribution for the distribution of subclusters. Least absolute deviation (LAD) regression, which corresponds to assuming that the error terms follow a Laplace distribution, has been widely used in practice. The least absolute value (LAV) estimator is known to be statistically more efficient than the least squares estimator (normal error regression model) when disturbances come from heavy-tailed distributions such as non-normal stable distributions, the Laplace distribution, or contaminated normal distributions (Dielman, 1984). Dielman (1984) also argued that the asymptotic distribution of the LAV estimator is known under a fairly general set of assumptions, allowing for statistical inference in large samples. Details on the theoretical properties of LAD can be found in Dielman (1984, 2005).

Song et al. (2014) recently proposed a robust estimation procedure for mixture linear regression models with error terms that follow a Laplace distribution. They argued that LAD regression has been widely used in practice to reduce the impact of outliers. Outliers are known to affect mixture linear regression models more heavily than the usual linear regression models, since they affect the estimation of the regression parameters and can also completely blur the mixture structure. Their EM estimation procedure uses the fact that the Laplace distribution can be written as a scale mixture of a normal distribution and a latent distribution.

In this research, we develop a full Bayesian estimation procedure for the linear regression mixture model with Laplace errors, based on full conditional posterior distributions. For the prior on the clustering structure, we consider a random partition model of the DP based on a truncated approximation of stick-breaking priors (Ishwaran and Zarepour, 2000), because the proposed model leads to a tractable, probability-based objective function to identify good partitions. To derive the full posterior distribution under the Laplace distribution, we use the fact that the Laplace distribution is a scale mixture of a normal distribution with an exponential mixing density (Andrews and Mallows, 1974). Details are discussed in the following section.

We also post-process the posterior samples of the parameters of the proposed model to choose a single clustering estimate, which addresses the “label-switching” problem (Richardson and Green, 1997; Stephens, 2000). We follow Fritsch and Ickstadt (2009), who find a single clustering estimate by maximizing the posterior expected adjusted Rand index using the posterior probabilities of two observations being clustered together.

We use hierarchical models and Gibbs sampling to obtain estimators for Laplace distribution mixture models. In Section 2, we consider the hierarchical structure of the models and the basic identity that expresses the Laplace distribution as a scale mixture of a normal distribution. Section 3 provides details on the Markov chain Monte Carlo (MCMC) procedures based on the full conditional distributions of the parameters, and on the post process, which uses the posterior similarity matrix to choose a single cluster and important oscillating functions in each curve based on the posterior expected adjusted Rand index. We compare the proposed Laplace regression mixture and the normal mixture in Section 4, using simulations and data sets. Section 5 contains a discussion.

2. Random partition model of Laplace distribution

We begin with construction of the random partition model with DP prior based on a Laplace linear regression. We discuss the mixture structure and the basic identity of the Laplace distribution which is a scale mixture of a normal distribution with an exponential mixing density.

### 2.1. Random partition model

We discussed in Section 1 that a PPM with a cohesion function c(S ) = m(|S| − 1)! where m is a precision parameter and |S | is the number of elements in S, is the DP random partition and the resulting probability model for the random partition is in (1.2).

Blackwell and MacQueen (1973) proved that for Y1, …, Yn iid from G ~ ℘, the joint distribution of Y is a product of successive conditional distributions of the following form:

$y_i\mid y_1,\ldots,y_{i-1},m,\mu,\tau^2\sim\frac{1}{i-1+m}\sum_{l=1}^{i-1}\delta(y_l=y_i)+\frac{m}{i-1+m}f(y_i\mid\mu,\tau^2),$ (2.1)

where f(yi | μ, τ²) is a probability density function (the base measure) and δ(·) denotes the Dirac delta function. Quintana and Iglesias (2003) also show that the joint marginal distribution of (2.1) can be expressed as the PPM

$\begin{aligned}P(y)&=\prod_{i=1}^{n}\left(\frac{1}{i-1+m}\sum_{l=1}^{i-1}\delta(y_l=y_i)+\frac{m}{i-1+m}f(y_i\mid\mu,\tau^2)\right)\\&=\sum_{k=1}^{n}\frac{1}{\prod_{i=1}^{n}(m+i-1)}\prod_{j=1}^{k}m(n_j-1)!\left(\prod_{l=1}^{n_j}f(y_l\mid\mu,\tau^2)\right)\prod_{l=2}^{n_j}\delta(y_l=y_j)\\&=K^{*}\sum_{k=1}^{n}\prod_{j=1}^{k}c(S_j)f_j(y_l),\end{aligned}$

where fj(yl) is the density function of yl, nj is the sample size in cluster j, l ∈ Sj, Sj is the subset of S0 = {1, 2, …, n} for cluster j, and K* is the normalizing constant. This expression is known as the Pólya urn representation of the DP of Blackwell and MacQueen (1973).
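The sequential conditional distributions can be read as a simple assignment rule: observation i joins an existing cluster j with probability nj / (i − 1 + m) and opens a new cluster with probability m / (i − 1 + m). The following is a minimal simulation sketch of that rule only (cluster labels, not the values drawn from the base measure); the function name and the seed are illustrative assumptions.

```python
import random

def polya_urn_partition(n, m, seed=0):
    """Sequentially assign n observations to clusters under the Polya urn rule."""
    rng = random.Random(seed)
    labels = []  # cluster label of each observation (0-based)
    sizes = []   # current size n_j of each cluster
    for _ in range(n):
        # existing cluster j has weight n_j; a new cluster has weight m
        weights = sizes + [m]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[j] += 1     # join an existing cluster
        labels.append(j)
    return labels, sizes

labels, sizes = polya_urn_partition(n=200, m=1.0)
print(len(sizes), sizes)
```

Because the weights are proportional to the current cluster sizes, large clusters tend to grow, and the number of clusters increases slowly with n.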

The algorithms of Bush and MacEachern (1996) are among the most widely used approaches for the posterior computation of the Pólya urn DP. Their approach first updates the configuration of subjects to clusters based on the Pólya urn scheme in (2.1), and then separately updates the cluster-specific parameters given the cluster configuration with conjugate priors. Extensions to non-conjugate priors based on Metropolis-Hastings are discussed by MacEachern and Müller (1998) and Neal (2000). Park and Dunson (2010) also considered a generalized Pólya urn scheme based on a distance metric through a flexible nonparametric model for the joint distribution of the predictors.

Here, for the DP prior, we consider the stick-breaking representation of the DP for the infinite number of clusters. According to Sethuraman (1994), if G is assigned a DP prior with precision m and base measure G0, the stick-breaking representation of G is

$G=\sum_{h=1}^{\infty}p_h\delta_{\theta_h},\quad p_h=V_h\prod_{l<h}(1-V_l),\quad V_h\sim\mathrm{Be}(1,m),\quad\theta_h\sim G_0,$

where δθ is a probability measure concentrated at θ and all Vh’s and θ’s are independent. Gibbs sampling methods for stick-breaking priors are provided in many articles.

Ishwaran and James (2001) presented two Gibbs sampling methods for fitting Bayesian nonparametric hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Pólya urn Gibbs sampler, applies to stick-breaking priors with a known Pólya urn characterization, that are priors with an explicit and simple prediction rule. The second method, the blocked Gibbs sampler, works by directly sampling values from the posterior of the random measure. They argue that the blocked Gibbs sampler avoids marginalizing over the prior and allows the prior to be directly involved in the Gibbs sampling scheme. This allows direct sampling of the nonparametric posterior and leads to several computational and inferential advantages. Thus, in this paper, we consider the blocked Gibbs sampler of Ishwaran and James (2001) based on the stick-breaking representation of the DP as a prior on the clustering structure.

To index the clusters, we introduce a mixture indicator variable Zi for observation i, i = 1, …, n. Then we re-express the model structure with the stick-breaking prior as

$Y_i\mid X_i,Z_i,\{\theta_h\}_{h=1}^{\infty}\sim f(y_i\mid X_i,\theta_{Z_i}),\quad Z_i\sim\sum_{h=1}^{\infty}p_h\delta_h,\quad p_h=V_h\prod_{l<h}(1-V_l),$ (2.3)

where Vh ~ Be(1, m).
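The stick-breaking weights and the cluster indicators can be simulated directly. The sketch below truncates the infinite sum at a finite N by setting VN = 1 so that the weights sum to one; the helper name, the value N = 30, and m = 1 are illustrative assumptions used only to make the example runnable.

```python
import random

def stick_breaking_weights(m, N, rng):
    """Weights p_1..p_N with V_h ~ Beta(1, m) and V_N = 1 (finite truncation)."""
    weights, remaining = [], 1.0
    for h in range(1, N + 1):
        v = 1.0 if h == N else rng.betavariate(1.0, m)
        weights.append(v * remaining)  # p_h = V_h * prod_{l<h} (1 - V_l)
        remaining *= 1.0 - v           # length of stick still unbroken
    return weights

rng = random.Random(1)
N = 30
p = stick_breaking_weights(m=1.0, N=N, rng=rng)
# Z_i ~ sum_h p_h * delta_h: draw cluster indicators for 10 observations
z = rng.choices(range(1, N + 1), weights=p, k=10)
print(sum(p), z)
```

With a small m most of the stick is broken off early, so only a few of the N components receive non-negligible weight.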

### 2.2. Basic identity

For the density function of yi in cluster k, we consider a Laplace distribution such that

$f(y_i\mid X_i,\theta_k)=\frac{1}{2b_k}\exp\left(-\frac{|y_i-X_i\beta_k|}{b_k}\right),$ (2.4)

where θk = (βk, bk), βk is a regression parameter of the location parameter, and bk is a scale parameter.

The Laplace (double-exponential) distribution is a scale mixture of a normal distribution with an exponential mixing density (Andrews and Mallows, 1974), that is

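$\frac{a}{2}\exp(-a|z|)=\int_0^{\infty}\frac{1}{\sqrt{2\pi\tau}}\exp\left(-\frac{z^2}{2\tau}\right)\frac{a^2}{2}\exp\left(-\frac{a^2}{2}\tau\right)d\tau.$ (2.5)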

Details of the equation and the proof have been discussed by Kyung et al. (2010). The main idea is to introduce appropriate latent parameters. We develop the posterior distribution of Laplace distribution parameters based on the equation in (2.5).
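The scale-mixture identity can be checked numerically. The sketch below compares the Laplace density with a midpoint-rule approximation of the integral over τ; the substitution τ = u², the upper limit 12/a, and the number of steps are choices of this example, not part of the paper.

```python
import math

def laplace_density(z, a):
    """Left-hand side: (a / 2) * exp(-a * |z|)."""
    return 0.5 * a * math.exp(-a * abs(z))

def mixture_density(z, a, steps=20000):
    """Right-hand side: normal density with variance tau, integrated against
    an exponential mixing density with rate a^2 / 2 (midpoint rule, tau = u^2)."""
    upper = 12.0 / a  # exp(-a^2 u^2 / 2) is negligible beyond this point
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        tau = u * u
        normal = math.exp(-z * z / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)
        mixing = 0.5 * a * a * math.exp(-0.5 * a * a * tau)
        total += normal * mixing * 2.0 * u * h  # d tau = 2 u du
    return total

for z, a in [(0.7, 1.3), (-2.0, 0.5), (0.0, 2.0)]:
    print(z, a, laplace_density(z, a), mixture_density(z, a))
```

In the regression setting, a = 1/b and z is the residual, so each observation receives its own latent variance τ with an exponential distribution of mean 2b².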

3. Sampling scheme

Ishwaran and Zarepour (2000) proposed a truncated approximation with N < ∞ to the DPM model in (2.3) to improve the mixing of its Gibbs sampler. They argued that the key advantage of working with a truncated approximation of random probability measures is that it allows blocked updates for the probabilities p1, …, pN and the indicators Z1, …, Zn in (2.3). This results in a rapidly mixing Markov chain that permits direct inference for the posterior of the random probability measure G. The detailed derivation of the posterior distributions of Z and p, and how to determine the truncation level N, can be found therein. Therefore, to obtain these mixing properties, we consider a truncated approximation with N instead of ∞ for the hierarchical structure of the Laplace random partition model in (2.3) with the probability density function in (2.4).

For the regression parameters in cluster h, we consider the following priors:

$\beta_h\sim\mathrm{MVN}_p(0,c_hI)\quad\text{and}\quad b_h^2\sim\pi(b_h^2)\propto\frac{1}{b_h^2}.$

We begin with construction of a cluster structure and discuss how to estimate parameters in each cluster.

• Step 1. Cluster structure

With an appropriate approximation level N, the subject-specific latent variable Zi in (2.3) follows a discrete distribution with p = (p1, …, pN). Then, the conditional posterior distribution of Zi updated with the observed yi is specified as $P(Z_i=h\mid y_i,p,\{\theta_l\}_{l=1}^{N})=\frac{p_hf(y_i\mid X_i,\theta_h)}{\sum_{l=1}^{N}p_lf(y_i\mid X_i,\theta_l)}.$

Upon sampling Z, the index set Sh = {i; Zi = h} for h = 1, …, N is also updated, inducing the cluster structure among the n observations. Letting nh = n(Sh) be the cardinality of Sh, the (N − 1) beta random variables V1, …, VN−1 can be updated by sampling from $V_h\mid m,Z\sim\mathrm{Beta}\left(1+n_h,\ m+\sum_{j=h+1}^{N}n_j\right),$

for h = 1, …, N − 1, and VN = 1 is set to ensure $\sum_{h=1}^{N}p_h=1$ with $p_h=V_h\prod_{l<h}(1-V_l)$ for h = 1, …, N.
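The two updates of Step 1 can be sketched as follows: the classification probabilities of one observation under Laplace component densities, and the beta update of the stick-breaking variables given the current indicators. Function names, the 0-based cluster labels, and the parameter values in the usage lines are illustrative assumptions.

```python
import math
import random

def laplace_pdf(y, mean, b):
    """Laplace density with location `mean` and scale b."""
    return math.exp(-abs(y - mean) / b) / (2.0 * b)

def classification_probs(y_i, x_i, p, betas, scales):
    """P(Z_i = h | y_i, p, theta) for every component h."""
    unnorm = []
    for p_h, beta_h, b_h in zip(p, betas, scales):
        mean = sum(x * bt for x, bt in zip(x_i, beta_h))
        unnorm.append(p_h * laplace_pdf(y_i, mean, b_h))
    total = sum(unnorm)
    return [u / total for u in unnorm]

def update_sticks(z, m, N, rng):
    """Draw V_h ~ Beta(1 + n_h, m + sum_{j>h} n_j), set V_N = 1, return p."""
    counts = [sum(1 for zi in z if zi == h) for h in range(N)]
    p, remaining = [], 1.0
    for h in range(N):
        tail = sum(counts[h + 1:])
        v = 1.0 if h == N - 1 else rng.betavariate(1 + counts[h], m + tail)
        p.append(v * remaining)
        remaining *= 1.0 - v
    return p

# one observation near the first component's mean (x * beta_1 = 4)
probs = classification_probs(
    4.1, (1.0, -3.0, 2.0), [0.3, 0.3, 0.4],
    [(0, 0, 2), (-1, 0, -2), (1, 1, 0)], [0.71, 0.45, 0.32])
p_new = update_sticks([0, 0, 1, 2, 2, 2], m=1.0, N=5, rng=random.Random(0))
print(probs, p_new)
```

In a full sampler, a new Zi would be drawn from `probs` for every observation before the sticks are updated; for many components or extreme residuals, working with log densities would be numerically safer.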

• Step 2. Model parameters

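Given the cluster indices Z of the observations, the joint posterior distribution for cluster h based on the hierarchical model with priors can be written as $\pi(\beta_h,\tau_h,b_h^2\mid X,Z,y)\propto\left[\prod_{Z_i=h}(2\pi\tau_i)^{-1/2}\exp\left\{-\frac{1}{2\tau_i}(y_i-X_i\beta_h)^2\right\}\frac{1}{2b_h^2}\exp\left(-\frac{\tau_i}{2b_h^2}\right)\right]\frac{1}{b_h^2}\exp\left(-\frac{1}{2c_h}\beta_h'\beta_h\right),$

where τh = {τi : Zi = h} is the vector of variances of the observations in cluster h. Thus the full conditional posterior distributions of the model parameters can be obtained based on data augmentation methods. For cluster h, the full conditional posterior distributions of the parameters are $\begin{aligned}\beta_h\mid\tau_h,b_h^2,X,Z,y&\sim\mathrm{MVN}_p(\beta_h^{*},\Sigma_{\beta_h}^{*}),\\t_i\equiv\frac{b_h|y_i-X_i\beta_h|}{\tau_i}\,\Big|\,Z_i=h,\beta_h,b_h^2,X,Z,y&\sim\text{inverse Gaussian}\left(\mu_i=1,\ \lambda_i=\frac{|y_i-X_i\beta_h|}{b_h}\right),\\b_h^2\mid\beta_h,\tau_h,X,Z,y&\sim\mathrm{IG}\left(n_h,\ \frac{1}{2}\sum_{Z_i=h}\tau_i\right),\end{aligned}$

where $\beta_h^{*}=\left(\sum_{Z_i=h}\frac{1}{\tau_i}X_i'X_i+\frac{1}{c_h}I\right)^{-1}\left(\sum_{Z_i=h}\frac{1}{\tau_i}X_i'y_i\right),\quad\Sigma_{\beta_h}^{*}=\left(\sum_{Z_i=h}\frac{1}{\tau_i}X_i'X_i+\frac{1}{c_h}I\right)^{-1},$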

and nh = |Sh| is the number of observations in cluster h. Details of derivation are in Appendix A.
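The three full conditionals of Step 2 translate into one Gibbs sweep for a single cluster. The following NumPy sketch assumes that all rows of `X` and `y` belong to the same cluster h; the function name, the floor on the residuals, the prior scale c = 100, and the simulated test data are assumptions of this example rather than settings from the paper.

```python
import numpy as np

def gibbs_sweep(y, X, beta, b2, c, rng):
    """One sweep over (tau, beta, b^2) for the observations of one cluster."""
    n, p = X.shape
    b = np.sqrt(b2)
    # tau_i: t_i = b * |r_i| / tau_i ~ inverse Gaussian(mean 1, shape |r_i| / b)
    r = np.maximum(np.abs(y - X @ beta), 1e-6)  # floor keeps the shape positive
    t = rng.wald(1.0, r / b)
    tau = b * r / t
    # beta: multivariate normal with precision sum_i x_i' x_i / tau_i + I / c
    precision = (X / tau[:, None]).T @ X + np.eye(p) / c
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ (y / tau))
    beta = rng.multivariate_normal(mean, cov)
    # b^2: inverse gamma with shape n and scale sum(tau) / 2
    b2 = (0.5 * tau.sum()) / rng.gamma(shape=n)
    return beta, tau, b2

# illustrative single-cluster data with Laplace errors (not from the paper)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -2.0]) + rng.laplace(scale=0.5, size=n)

beta, b2 = np.zeros(2), 1.0
draws = []
for sweep in range(400):
    beta, tau, b2 = gibbs_sweep(y, X, beta, b2, c=100.0, rng=rng)
    if sweep >= 100:  # discard the first 100 sweeps as burn-in
        draws.append(beta)
beta_hat = np.mean(draws, axis=0)
print(beta_hat, np.sqrt(b2))
```

In the full model this sweep would be applied to each occupied cluster in turn, after the indicators Z have been updated.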

• Step 3. Post process

Mixture models suffer from the well-known “label-switching” problem, which arises because the likelihood is identical for any permutation of the component-specific parameters. Building on the properties of the adjusted Rand index, Fritsch and Ickstadt (2009) proposed the posterior expected adjusted Rand index and showed that it outperforms competing methods such as the maximum a posteriori (MAP) estimate. In addition, the method is easy to implement with the R package mcclust. Details are in Fritsch and Ickstadt (2009) and references therein.
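The idea of the post process can be sketched in a few lines: build the posterior similarity matrix from the sampled indicator vectors, then score candidate clusterings with the posterior expected adjusted Rand index computed from that matrix. This is a simplified illustration and not the mcclust implementation; in particular, restricting the candidates to the sampled clusterings and the handling of the degenerate case are assumptions of this sketch.

```python
from itertools import combinations

def posterior_similarity(draws):
    """psm[i][j]: fraction of draws in which observations i and j share a cluster."""
    n = len(draws[0])
    psm = [[0.0] * n for _ in range(n)]
    for z in draws:
        for i in range(n):
            for j in range(n):
                if z[i] == z[j]:
                    psm[i][j] += 1.0 / len(draws)
    return psm

def pear(z, psm):
    """Posterior expected adjusted Rand index of clustering z given psm."""
    pairs = list(combinations(range(len(z)), 2))
    same = [1.0 if z[i] == z[j] else 0.0 for i, j in pairs]
    sim = [psm[i][j] for i, j in pairs]
    agree = sum(s * p for s, p in zip(same, sim))
    expected = sum(same) * sum(sim) / len(pairs)
    maximum = 0.5 * (sum(same) + sum(sim))
    if maximum == expected:  # degenerate case, handled by convention here
        return 0.0
    return (agree - expected) / (maximum - expected)

# four sampled clusterings; the second is the first with switched labels
draws = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
psm = posterior_similarity(draws)
best = max(draws, key=lambda z: pear(z, psm))
print(best, pear(best, psm))
```

The similarity matrix depends only on which observations are clustered together, so relabeling the components within a draw leaves it unchanged.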

4. Application

We conduct simulation studies to evaluate our proposed random partition model with LAD regression. We implement the full conditional Gibbs sampler using the DP prior on the cluster structure to analyze the energy efficiency data set. As a competing model, we consider the normal mixture model (NMM) with DP prior. For further comparison with the proposed method, we also consider the model-based clustering of Fraley and Raftery (2002) based on the NMM, for which the R package mclust (Fraley et al., 2012) is available. Mixture models of normal distributions with various covariance structures are fitted via the EM algorithm. The NMM was implemented with no specific prior distribution, which is the default setting of the R package mclust.

For the simulation studies, we set the maximum truncation level to 30 (N = 30) to perform blocked updates for the probabilities p1, …, pN and the indicators Z1, …, Zn in (2.3). The MCMC algorithm ran for 50,000 iterations with a burn-in period of 20,000. We kept every 10th draw of the remaining 30,000 iterations to reduce the autocorrelation of the Gibbs samples. We conducted the post process with the resulting 3,000 Gibbs samples. Because the number of clusters is unknown, for the Gibbs sampling we use the posterior expected adjusted Rand index of Fritsch and Ickstadt (2009), as implemented in the R package mcclust, to choose the optimal number of clusters and the cluster indices. For the model-based clustering, the BIC is used to identify the optimal number of clusters and covariance structure for a given data set, and a MAP estimate is obtained.

### 4.1. Simulation study I

We first evaluated the performance of our method with simulated data, where the classes are known. We simulated data according to the following regression models with n = 300 and k = 3

$Y_{ih}=X_i\beta_h+\varepsilon_{ih},$

where i = 1, …, n and h = 1, …, k. We considered three clusters (k = 3) and the cluster indicator Zi follows

$Z_i\sim\text{Multinomial}(1,\ p=(0.3,0.3,0.4)).$

For regression, we generated two explanatory variables, X1 from N(−3, 0.01) and X2 from N(2, 0.01). We set the design matrix as X = (1, X1, X2). We fix the regression parameters in each cluster as:

$\text{Cluster 1}:\ \beta_1=(0,0,2),\quad\text{Cluster 2}:\ \beta_2=(-1,0,-2),\quad\text{Cluster 3}:\ \beta_3=(1,1,0).$

To cover various data structures, we considered three different sets of errors for the clusters:

• Set 1. εi1 ~ N(0, 0.5), εi2 ~ N(0, 0.2) and εi3 ~ N(0, 0.1)

• Set 2. εi1 ~ Laplace(0, 0.5), εi2 ~ Laplace(0, 0.2) and εi3 ~ Laplace(0, 0.1)

• Set 3. εih ~ t (df = 5) for i = 1, …, n and h = 1, …, k.

For the normally distributed error data (Set 1), the cluster means Xβh are set to be well separated, with μ1 = Xβ1 ≈ 4, μ2 = Xβ2 ≈ −5, and μ3 = Xβ3 ≈ −2, and the true variances are numerically small: σ1² = 0.5, σ2² = 0.2, and σ3² = 0.1.
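The data generation for this study can be sketched with NumPy. Two readings are assumptions of the example: the second argument of N(·, ·) is taken as a variance (consistent with the true variances quoted for Set 1), and the second argument of Laplace(·, ·) is taken as the squared scale b² (consistent with the true scales 0.71, 0.45, and 0.32 quoted for Set 2). The function name and the seed are hypothetical.

```python
import numpy as np

def simulate_study_one(n=300, error="normal", seed=0):
    """Simulate (y, X, z) for three clusters with normal or Laplace errors."""
    rng = np.random.default_rng(seed)
    betas = np.array([[0.0, 0.0, 2.0], [-1.0, 0.0, -2.0], [1.0, 1.0, 0.0]])
    second_args = np.array([0.5, 0.2, 0.1])  # variance (normal) or b^2 (Laplace)
    z = rng.choice(3, size=n, p=[0.3, 0.3, 0.4])  # 0-based cluster labels
    X = np.column_stack([
        np.ones(n),
        rng.normal(-3.0, 0.1, n),  # N(-3, 0.01): standard deviation 0.1
        rng.normal(2.0, 0.1, n),   # N(2, 0.01): standard deviation 0.1
    ])
    scale = np.sqrt(second_args[z])
    if error == "normal":
        eps = rng.normal(0.0, scale)
    else:
        eps = rng.laplace(0.0, scale)
    y = np.einsum("ij,ij->i", X, betas[z]) + eps
    return y, X, z

y, X, z = simulate_study_one()
cluster_means = [y[z == h].mean() for h in range(3)]
print(cluster_means)
```

The printed cluster means should be close to 4, −5, and −2, the values of Xβh stated in the text.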

For the model-based normal mixture, the BIC is used to identify the optimal number of clusters and covariance structure for a given data set; the BIC plots against the number of clusters for each data set are in Figure C.1 in Appendix C. With the R package mclust, mixture models of normal distributions with various covariance structures are fitted via the EM algorithm and the BIC is computed. In our simulation studies, for all data sets, the BIC chose only one cluster for most of the multivariate covariance structures, the exceptions being the “spherical, equal volume (EII)” and “spherical, varying volume (VII)” structures. This might be due to the large scale parameter values of each cluster in each data set, or to a limitation of the BIC computation based on the EM algorithm. We already know that the number of clusters is 3 for each data set. Therefore, for the comparison, we consider the EM-based MAP estimation of 3 clusters with the spherical, varying volume (VII) covariance.

Based on the posterior mean and 95% credible interval, the proposed model correctly estimates the mean functions, but it fails to capture the linear trends correctly. However, the estimated μh’s are numerically close to the true values. Based on the post process, we compute the adjusted Rand index between the clustering of the Laplace regression partition model and the true cluster indices of the same objects; the Zi’s from our proposed model match the true cluster indices perfectly. The NMM shows similar results to the proposed model. The estimated curves with the true curves are in Figure 1. Both the Laplace regression partition model and the normal regression mixture model estimate the true mean curve adequately. The EM-based Gaussian mixture model also estimates the mean of each cluster close to the true values, and the computed adjusted Rand index shows that the EM classification matches the true cluster indices perfectly.
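For reference, the adjusted Rand index between an estimated clustering and the true cluster indices can be computed from pair counts. The sketch below uses the standard pair-counting formula; the function name and the convention of returning 1.0 in the degenerate case are choices of this example.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    index = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = 0.5 * (sum_a + sum_b)
    if maximum == expected:  # e.g. both labelings put everything in one cluster
        return 1.0
    return (index - expected) / (maximum - expected)

# identical partitions with switched labels give a perfect match
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index equals 1 for identical partitions regardless of how the clusters are labeled, which is why it is suitable for comparing estimated indicators with true cluster indices.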

Table 1 shows the estimated scale parameters and 95% credible intervals of both the Laplace partition models and the NMMs. The estimated scale parameters of the proposed Laplace regression partition model are b̂1 = 0.49 (0.34, 0.98), b̂2 = 0.37 (0.27, 0.95), and b̂3 = 0.24 (0.17, 0.93), which are numerically similar to the true standard deviations of the clusters. The estimated standard deviations of the NMM are σ̂1 = 0.58 (0.47, 0.84), σ̂2 = 0.44 (0.36, 0.58), and σ̂3 = 0.30 (0.25, 0.54) for clusters 1, 2, and 3. Based on the 95% credible intervals of the proposed Laplace regression partition model, we observe that the posterior distributions of the scale parameters are skewed to the right and have wider credible intervals than those of the NMM. The posterior distributions of the NMM standard deviations are symmetric and have shorter credible intervals. The estimated standard deviations of the EM NMM are σ̂1 = 0.28, σ̂2 = 0.22, and σ̂3 = 0.15, which are almost half of the posterior mean estimates of the standard deviations from the Gibbs normal mixture.

For the Laplace random partition data (Set 2), the cluster means Xβh are again set to be well separated, with μ1 = Xβ1 ≈ 4, μ2 = Xβ2 ≈ −5, and μ3 = Xβ3 ≈ −2, but the true scale parameters, b1 = 0.71, b2 = 0.45, and b3 = 0.32, are not small enough for the clusters to be clearly separable. Therefore, there might be a grey zone, that is, a subregion in which the clusters cannot be clearly separated.

The estimated number of clusters is 5 based on our proposed model and 4 based on the NMM. Thus, the computed adjusted Rand index between the estimated Zi and the true index is 0.83 for our model and 0.89 for the NMM. Compared to the normal regression mixture model, the Laplace regression random partition model seems to be more sensitive in detecting partitions in the grey zone between partitions 2 and 3. Figure 2 shows the estimated curves with the true curves. Regardless of the estimated number of clusters, the estimated curves are quite close to the true curve for both the Laplace and normal regression models.

As with the NMM, the proposed model correctly estimates the mean functions based on the posterior mean and 95% credible interval; however, it fails to capture the linear trends correctly. Table 1 shows that the estimated scale parameters and 95% credible intervals of the proposed Laplace regression partition model are b̂1 = 0.49 (0.26, 1.00), b̂2 = 0.47 (0.18, 1.67), b̂3 = 0.85 (0.11, 2.71), b̂4 = 0.63 (0.20, 2.35), and b̂5 = 0.21 (0.06, 1.41), and the estimated median standard deviations of the NMM are σ̂1 = 0.73 (0.45, 3.27), σ̂2 = 0.41 (0.19, 2.46), σ̂3 = 0.60 (0.20, 5.38), and σ̂4 = 0.36 (0.19, 8.08). The credible intervals of the standard deviations of the normal regression mixture model are considerably wider than the credible intervals of the scale parameters of the Laplace regression partition model. The posterior distributions of the standard deviations are also highly right-skewed. The data set is generated from the Laplace random partition model, which might be a reason for the unstable estimation of the standard deviations in the NMM.

With the EM algorithm and 3 fixed clusters, the estimated scale parameters are σ̂1 = 0.43, σ̂2 = 0.36, and σ̂3 = 0.17. There might be an underestimation problem for the EM even when the number of clusters is known. The computed adjusted Rand index between the estimated Zi and the true index is 0.91. As discussed above, this might be due to the data generation setting and the forced separation into 3 clusters. We also observe that the estimated curves are quite close to the true curve based on the estimated curve in

For the t-distributed errors with df = 5 (Set 3), the cluster means Xβh are again set to be well separated, with μ1 = Xβ1 ≈ 4, μ2 = Xβ2 ≈ −5, and μ3 = Xβ3 ≈ −2, but with df = 5 the t-distribution has heavy tails. Therefore, the generated data set might not be well separable.

The estimated number of clusters is 3 based on our proposed model and 2 based on the NMM. The computed adjusted Rand index between the estimated Zi and the true index is 0.50 for our model and 0.47 for the NMM. The Laplace regression random partition model seems to detect partitions more sensitively than the normal regression mixture model. However, the computed adjusted Rand index is 0.70 for the EM Gaussian mixture model. For the estimated cluster means μh, we observe that the NMM combines clusters 2 and 3: it estimates the mean of cluster 1 as μ̂1 ≈ 4 and the mean of cluster 2 as μ̂2 ≈ −4, near the mean of the true means of clusters 2 and 3. The Laplace random partition model also estimates the mean of cluster 1 as μ̂1 ≈ 4, but it estimates μ̂2 ≈ −3.5 for cluster 2 and μ̂3 ≈ 1.3 for cluster 3. From the histogram of the generated data in Figure 3, we observe a group of data between 0 and 3, and the distribution of the negative-valued data is skewed left. This shape might explain why the data appear to have two clusters under the normal regression mixture model, which estimates the number of clusters as 2. The proposed Laplace random partition model seems to assign the subset of data between 0 and 3 to a separate cluster because of the distribution of the generated data. However, with 3 fixed clusters, the estimated curve of the EM on the histogram does not seem to reflect the distribution of the data at all, even though the computed adjusted Rand index is 0.70. Thus, if our goal is density estimation, the mixture models with Gibbs sampling are preferable, but the EM will provide more hidden information if our goal is the detection of the cluster indices.

The proposed model and the NMM correctly estimate the mean functions based on the posterior mean and 95% credible interval; however, they fail to capture the linear trends correctly. Figure 3 includes the estimated curves with the true curves. With a heavy-tailed mixture, we observe that any method might be unable to capture the true clustering structure in the data. The estimated scale parameters and 95% credible intervals in Table 1 for the proposed Laplace regression partition model are b̂1 = 0.40 (0.25, 3.12), b̂2 = 1.03 (0.61, 5.20), and b̂3 = 1.35 (0.58, 4.24). However, the estimated standard deviations of the NMM, σ̂1 = 0.84 (0.57, 11.80) and σ̂2 = 2.75 (2.12, 25.80), are unstable and highly skewed. The true values are included in the credible intervals of the scale parameters of the Laplace partition model; however, they are not included in the credible intervals of the NMM because it estimated the number of clusters as 2 with a large standard deviation. The estimated standard deviations of the EM are σ̂1 = 0.73, σ̂2 = 0.66, and σ̂3 = 30, and these are underestimated.

### 4.2. Simulation study II

To consider a more complicated data generation process with clusters, we generated two more data sets to evaluate our proposed random partition model with LAD regression. We simulated data according to the following regression models with n = 400 and k = 2

$Y_{ih}=X_i\beta_h+\varepsilon_{ih}$

where i = 1, …, n and h = 1, …, k. We considered two clusters (k = 2) and the cluster indicator Zi follows

$Z_i\sim\text{Multinomial}(1,\ p=(0.6,0.4)).$

For regression, we generated two explanatory variables, X1 from N(0, 1) and X2 from N(0, 1), and set the design matrix as X = (1, X1, X2). The fixed regression parameters in each cluster are:

$\text{Cluster 1}:\ \beta_1=(0,1,1),\quad\text{Cluster 2}:\ \beta_2=(0,-1,-1).$

We mimic the 5th and 6th settings of the simulation studies in Song et al. (2014):

• Set 4. εih ~ 0.95N(0, 1) + 0.05N(0, 25) for h = 1, 2

• Set 5. εih ~ N(0, 1) with 5% high leverage outliers

The error in Set 4 is a mixture of two normal distributions, and this complexity makes the generated data appear to have at least four clusters and hard to partition. About 5% of the data are then likely to be low leverage outliers, producing unsmooth curved data. Based on the posterior mean and 95% credible interval of the parameters in each cluster, we observe that the estimate fails to capture the linear trends correctly.
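The contaminated normal errors of Set 4 can be drawn by first choosing the mixture component and then sampling from it. In the sketch below, N(0, 25) is read as variance 25 (standard deviation 5), which is an assumption of the example, as are the function name, the seed, and the sample size.

```python
import random

def contaminated_normal(n, rng, weight=0.05, sd_main=1.0, sd_outlier=5.0):
    """Draw n errors from (1 - weight) N(0, sd_main^2) + weight N(0, sd_outlier^2)."""
    return [
        rng.gauss(0.0, sd_outlier if rng.random() < weight else sd_main)
        for _ in range(n)
    ]

rng = random.Random(0)
eps = contaminated_normal(50000, rng)
variance = sum(e * e for e in eps) / len(eps)
# theoretical variance of the mixture: 0.95 * 1 + 0.05 * 25 = 2.2
print(variance)
```

Although only 5% of the errors come from the wide component, it more than doubles the error variance, which illustrates why such data are hard to partition.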

The true number of clusters is 2; based on the posterior expected adjusted Rand index, the Gibbs NMM chooses 2 clusters and the Gibbs Laplace partition model chooses 5 clusters. The BIC of the EM model-based clustering suggests 2 to 3 clusters with the “spherical, varying volume (VII)” variance structure, and we choose 2 clusters. Based on the chosen numbers of clusters, the computed adjusted Rand indices between the clusterings/partitions and the true cluster indices are in Table 2. We observe that the adjusted Rand index of the Gibbs NMM with the true indices is numerically slightly larger than that of the Gibbs Laplace partition model. However, the EM model-based clustering does not seem to detect the true cluster indices as well as the Gibbs models.

Figure 4 shows the estimated curves of the NMM and the Laplace partition model (LPM) with 95% credible intervals on selected data. We observe that the 95% credible interval of the LPM is wider than that of the NMM, as discussed in the previous section. The estimated curves do not seem to adequately estimate the true curve at each data point because of the complexity of the data generation setting. However, the true number of clusters is 2, and the estimated curves of the NMM and LPM seem to capture the true number of clusters around the data points that can be easily partitioned.

In the generating setting of the 5th data (Set 5), 5% of the observations are replicated serving as high leverage outliers, used to check the robustness of estimation procedures against the high leverage outliers. The BIC of the EM model based-cluster consider 2 to 3 clusters “spherical, equal volume (EII)” variance structure; however, we choose to have 3 clusters because of the 5% high leverage outliers. The NMM and the Laplace partition model of Gibbs choose 4 clusters based on the posterior expected adjusted Rand index. From the computed adjusted Rand index in Table 2, we observe that the computed value of adjusted Rand index of the Gibbs normal is a little larger than it of the Gibbs Laplace partition model. It is unexpected that our proposed method performs no better than the Gibbs NMM. We observe that the estimated curve of the Gibbs normal mixture does not capture the true curve correctly from the estimated curves in Figure 5. However, the 95% credible interval of the proposed Gibbs Laplace partition model seems to adequately estimate for the hidden structure of the data with high leverage outliers, even though the 95% credible interval is wider.

### 4.3. Energy efficient data analysis

The energy efficient dataset was created and processed by Tsanas and Xifara (2012) using 12 different building shapes simulated in Ecotect. The buildings differ with respect to glazing area, glazing area distribution, and orientation, amongst other parameters. They simulated various settings as functions of the aforementioned characteristics to obtain 768 building shapes; the dataset therefore comprises 768 samples and 8 features, with the aim of predicting two real-valued responses. The two responses are “Heating Load” (Y1) and “Cooling Load” (Y2), and the eight attributes are relative compactness (X1), surface area (X2), wall area (X3), roof area (X4), overall height (X5), orientation (X6), glazing area (X7), and glazing area distribution (X8). Correlations among the explanatory variables X1, X2, X4, and X5 are very strong; however, these variables show no relationship with X6, X7, and X8. In addition, there is mild correlation between X3 and (X1, X2, X4, X5), and between X7 and X8.

Tsanas and Xifara (2012) investigated the association strength of each input variable with each of the output variables using a variety of classical and nonparametric statistical analysis tools to identify the most strongly related input variables. They compared a linear regression approach and random forests to estimate heating load (HL) and cooling load (CL). Tsanas and Xifara did not consider standardization or an intercept in the linear regression and random forest models. They concluded that, based on the random forest, X7 (glazing area) is the most important predictor for both HL and CL, with a similar interpretation for the regression model. However, X7 varies from 0 to 0.4, whereas the observed X2 lies in (514.5, 808.5). This difference in scale might be a reason why the estimated impact of X7 is larger than that of the other variables.

We instead consider a normal regression mixture model and a Laplace regression random partition model for HL and CL because a simple linear regression is inadequate to explain the relationship between the input variables and the output variables. Figure 6 shows histograms of HL and CL with densities estimated using Gaussian kernels. From the histograms, we observe that HL might be explained by a mixture of a few normal distributions, and that CL can be explained by one normal distribution with small variance and one normal distribution with large variance.
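
The effect of the measurement scale on an unstandardized regression coefficient can be seen in a small sketch. The data below are hypothetical and are not taken from the energy efficiency dataset; the only point is that expressing the same predictor in units 1000 times larger divides its least-squares slope by 1000.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x in a simple regression with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical predictor on a 0 to 0.4 scale and a response exactly linear in it.
x_small = [0.0, 0.1, 0.25, 0.4]
y = [5.0 + 20.0 * xi for xi in x_small]

# The same predictor expressed in units that are 1000 times larger.
x_large = [1000.0 * xi for xi in x_small]

slope_small = ols_slope(x_small, y)  # slope per unit on the small scale
slope_large = ols_slope(x_large, y)  # slope per unit on the large scale
print(slope_small / slope_large)
```

Comparing raw coefficients across predictors with very different ranges can therefore overstate the importance of the predictor measured on the smaller scale.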

Estimated cluster-specific parameters of the normal linear regression mixture model and the Laplace linear regression random partition model for HL are in Table B.1 in Appendix B. The estimated number of clusters is 6 for the NMM and 2 for the Laplace partition model. For the NMM, cluster 1 is specified by positive parameters of X5 and X8, cluster 2 by no input variables, cluster 3 by a negative parameter of X1 and positive parameters of X5 and X7, cluster 4 is very similar to cluster 3, cluster 5 is specified by positive parameters of X5 and X7, and cluster 6 by X7. Tsanas and Xifara (2012) explained that X7 (glazing area) is the most important predictor for HL, and this holds in clusters 3 and 4 of the NMM. However, the impact of X7 is the same as that of X5 (orientation) in cluster 5. There are also clusters that do not detect X7 as an influential input, and the 95% credible intervals are significantly skewed left. Given the number of observations in each cluster, we observe that most of HL can be explained by X5 and X7.

The estimated number of clusters of the Laplace random partition model is 2: cluster 1 is specified by positive parameters of X5 and X7, and cluster 2 by a positive parameter of X7 only. We also observe that X7 is the most important input variable to explain HL. Even though the observed values of X7 are smaller than those of the other variables, the 95% credible interval in cluster 1 is not wide enough to suspect an effect of the small observed values. We select a few data points and plot curves of HL and CL in Figure 7 to compare the estimated curves of the NMM and the Laplace random partition model. The estimated curves of the NMM at the selected data points show many bumps compared with those of the Laplace random partition model, because of the estimated number of clusters and the cluster-specific parameter estimates. The estimated curves of HL are in the upper part of

For the Cooling Load, estimated cluster-specific parameters of the normal linear regression mixture model and the Laplace linear regression random partition model are in Table B.2 in Appendix B. The estimated number of clusters is 4 for the NMM and 2 for the Laplace partition model. For the Laplace random partition model, the cluster structure for CL is similar to that for HL. For the NMM, cluster 1 is specified by X5, clusters 2 and 4 by X5 and X7, and cluster 3 by no significant input variables. The estimated curves of the normal mixture and Laplace partition models on selected data points are in

Unlike the conclusion of Tsanas and Xifara (2012), we observe that X5 (orientation) is also an important variable, together with X7 (glazing area), to explain HL and CL. Tsanas and Xifara (2012) argued that the most important variable (glazing area) is not the most correlated with either output variable or with the other input variables. It can also be intuitively understood that the glazing area is of paramount significance in determining the energy performance of buildings. However, we argue that there are various cluster structures that explain HL and CL, with different significant input variables in each cluster. Therefore, various linear combinations of orientation and glazing area are important elements in determining the energy performance of buildings: the amount of glazing and the orientation of a building determine the heat absorbed from the sun, and they are likewise a source of heat leakage from the building to the environment.

5. Discussion

We have developed a random partition procedure based on a DP prior with Laplace distribution. A full Gibbs sampling algorithm for the linear regression mixture model, based on the full conditional posterior distributions under the Laplace distribution, is developed for efficient MCMC posterior computation. For the prior on the clustering structure, we consider a random partition model of the DP, because the proposed model leads to a tractable, probability-based objective function to identify good partitions. For the full posterior distribution under the Laplace distribution, we use the fact that the Laplace distribution is a scale mixture of normal distributions with an exponential mixing density (Andrews and Mallows, 1974). We have also post-processed the posterior samples of the parameters of the proposed model to choose a single clustering estimate, which addresses the “label-switching” problem, by maximizing the posterior expected adjusted Rand index of Fritsch and Ickstadt (2009).

For comparison with the proposed methods, we considered model-based clustering with a Gaussian mixture model fitted by the EM algorithm in our simulation studies. To choose the optimal number of clusters, we considered the BIC values on each data set. However, in our simulation studies, for all data sets the BIC strangely chose only one cluster under most multivariate covariance structures, except for the “spherical, equal volume (EII)” and “spherical, varying volume (VII)” structures. This might be due to the large scale parameter values of each cluster in each data set or to a limitation of the BIC computation based on the EM algorithm. In the simulation, the number of clusters is known, so we fixed it at 3, the true number of clusters used for the data generation.

For the first set of simulations, we considered three different error distributions: normal, Laplace, and t with df = 5. Based on the posterior mean and 95% credible interval, the proposed model and the NMM correctly estimate the mean functions but fail to capture the linear trends correctly for all sets of generated data. With light tailed errors such as the Laplace distribution, the credible intervals of the standard deviations of the normal regression mixture models are considerably wider than the credible intervals of the scale parameters of the Laplace regression partition models. The posterior distributions of the standard deviations are also highly skewed right. However, with heavy tailed errors such as the t distribution with df = 5, we observed that the credible intervals of the scale parameters of the Laplace partition models include the true values, whereas the credible intervals of the NMM do not, because the NMM underestimates the number of clusters and compensates with a large standard deviation.

The two data sets in the second simulation section contained 5% low leverage outliers and 5% high leverage outliers, respectively. The NMM and even the EM model-based clustering algorithm failed to capture the linear trends correctly in the proposed model; in addition, the estimated curves did not lie on the generated data points correctly. However, for the data with 5% high leverage outliers, the 95% credible interval of the proposed Gibbs Laplace partition model seems to adequately capture the hidden structure of the data with high leverage outliers, even though the 95% credible interval is wider.

The EM NMM seems to underestimate the scale parameters of each cluster on each set of data compared to the Gibbs methods. Also, with the number of clusters fixed at the true number, the estimated curve of the EM on the histogram does not seem to account for the distribution of data with heavy tailed errors, but the cluster indices based on the EM seem close to the true cluster indices. It is best to use the mixture models with Gibbs sampling if our goal is density estimation; however, the EM will provide more hidden information if our goal is the detection of the cluster indices.

We observe that X5 (orientation) is also an important variable, together with X7 (glazing area), to explain HL and CL in the energy performance data of buildings. The estimated numbers of clusters for HL are 6 for the NMM and 2 for the Laplace random partition model; the estimated numbers of clusters for CL are 4 for the NMM and 2 for the Laplace random partition model. We conclude that there are various cluster structures that explain HL and CL, with different significant input variables in each cluster. Thus various linear combinations of orientation and glazing area are important elements in determining the energy performance of buildings, because the amount of glazing and the orientation of a building determine the heat absorbed from the sun; similarly, orientation and glazing are a source of heat leakage from the building to the environment.

Acknowledgements

Minjung Kyung was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science, ICT & Future Planning (Grant No. NRF-2015R1C1A1A01051837).

Appendix A

### Posterior distribution of model parameters

The joint posterior distribution of parameters for cluster h in Section 3 is

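$\pi(\beta_h, \tau_h, b_h^2 \mid X, Z, y) \propto \left[\prod_{Z_i = h} (2\pi\tau_i)^{-\frac{1}{2}} \exp\left\{-\frac{1}{2\tau_i}(y_i - X_i\beta_h)^2\right\} \frac{1}{2b_h^2} \exp\left(-\frac{\tau_i}{2b_h^2}\right)\right] \frac{1}{b_h^2} \exp\left(-\frac{1}{2c_h}\beta_h'\beta_h\right).$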
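
The likelihood part of this posterior relies on the scale mixture representation of the Laplace distribution: if τ has an exponential density with mean 2b² and y given τ is N(0, τ), then y is Laplace with scale b. The following sketch checks this numerically through the identity E|y| = b for a Laplace variable with scale b. The function name, the seed, and the value b = 0.7 are illustrative assumptions only.

```python
import random


def sample_laplace_via_mixture(b, rng):
    """Draw y by first drawing tau ~ Exponential(mean 2 b^2), then y | tau ~ N(0, tau)."""
    # random.expovariate takes the rate, which is 1 / (2 b^2) here.
    tau = rng.expovariate(1.0 / (2.0 * b * b))
    return rng.gauss(0.0, tau ** 0.5)


rng = random.Random(1)
b = 0.7
draws = [sample_laplace_via_mixture(b, rng) for _ in range(50000)]

# For a Laplace(0, b) variable, E|y| = b, so this average should be close to 0.7.
mean_abs = sum(abs(d) for d in draws) / len(draws)
print(round(mean_abs, 2))
```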

For the posterior distribution of βh,

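$\pi(\beta_h \mid \tau_h, b_h^2, X, Z, y) \propto \exp\left\{-\frac{1}{2}\sum_{Z_i = h} \frac{1}{\tau_i}(y_i - X_i\beta_h)'(y_i - X_i\beta_h) - \frac{1}{2c_h}\beta_h'\beta_h\right\} \propto \exp\left[-\frac{1}{2}\left\{\beta_h'\left(\sum_{Z_i = h} \frac{1}{\tau_i}X_i'X_i + \frac{1}{c_h}I\right)\beta_h - 2\beta_h'\left(\sum_{Z_i = h} \frac{1}{\tau_i}X_i'y_i\right)\right\}\right] \propto \exp\left[-\frac{1}{2}(\beta_h - \beta_h^*)'\Sigma_{\beta_h}^{*-1}(\beta_h - \beta_h^*)\right],$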

where

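$\beta_h^* = \left(\sum_{Z_i = h} \frac{1}{\tau_i}X_i'X_i + \frac{1}{c_h}I\right)^{-1}\left(\sum_{Z_i = h} \frac{1}{\tau_i}X_i'y_i\right), \quad \Sigma_{\beta_h}^{*} = \left(\sum_{Z_i = h} \frac{1}{\tau_i}X_i'X_i + \frac{1}{c_h}I\right)^{-1}.$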

Therefore,

$\beta_h \mid \tau_h, b_h^2, X, Z, y \sim \text{MVN}_p(\beta_h^*, \Sigma_{\beta_h}^{*}).$
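
To make the update concrete, the sketch below specializes the full conditional of βh to a single predictor (p = 1), where the matrix inverse reduces to a scalar reciprocal. This special case, the function names, and the toy inputs are our own assumptions for illustration; the general case would replace the sums by the corresponding matrix expressions.

```python
import random


def beta_conditional_moments(x, y, tau, c_h):
    """Mean and variance of beta_h | tau_h, ... when each X_i is a scalar x_i."""
    # Posterior precision: sum_i x_i^2 / tau_i + 1 / c_h.
    precision = sum(xi * xi / ti for xi, ti in zip(x, tau)) + 1.0 / c_h
    # Weighted cross-product: sum_i x_i y_i / tau_i.
    weighted_xy = sum(xi * yi / ti for xi, yi, ti in zip(x, y, tau))
    variance = 1.0 / precision
    mean = variance * weighted_xy
    return mean, variance


def sample_beta(x, y, tau, c_h, rng):
    """Draw beta_h from its normal full conditional (scalar case)."""
    mean, variance = beta_conditional_moments(x, y, tau, c_h)
    return rng.gauss(mean, variance ** 0.5)


# Toy inputs for the observations currently assigned to cluster h.
x_h = [1.0, 2.0]
y_h = [1.0, 2.0]
tau_h = [1.0, 1.0]
mean, variance = beta_conditional_moments(x_h, y_h, tau_h, c_h=1.0)
print(mean, variance)
```

Observations with a large latent scale τi receive a small weight 1/τi, which is how the Laplace model downweights outlying responses in this step.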

For the posterior distribution of τi of τh = {τi|Zi = h},

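$\pi(\tau_i \mid Z_i = h, \beta_h, b_h^2, X, Z, y) \propto \tau_i^{-\frac{1}{2}} \exp\left\{-\frac{1}{2\tau_i}(y_i - X_i\beta_h)^2 - \frac{\tau_i}{2b_h^2}\right\} \propto \tau_i^{-\frac{1}{2}} \exp\left[-\frac{1}{2\tau_i b_h^2}\left\{b_h^2(y_i - X_i\beta_h)^2 + \tau_i^2\right\}\right] \propto \tau_i^{-\frac{1}{2}} \exp\left\{-\frac{1}{2\tau_i}\left(\frac{\tau_i}{b_h} - |y_i - X_i\beta_h|\right)^2\right\}.$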

Let

$\frac{\tau_i}{b_h |y_i - X_i\beta_h|} = \frac{1}{t_i},$

then

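$\pi(t_i \mid Z_i = h, \beta_h, b_h^2, X, Z, y) \propto \left(\frac{b_h |y_i - X_i\beta_h|}{t_i}\right)^{-\frac{1}{2}} b_h |y_i - X_i\beta_h| \, t_i^{-2} \exp\left\{-\frac{|y_i - X_i\beta_h|^2}{2 b_h |y_i - X_i\beta_h| / t_i}\left(\frac{1}{t_i} - 1\right)^2\right\} \propto \left(\frac{|y_i - X_i\beta_h| / b_h}{2\pi t_i^3}\right)^{\frac{1}{2}} \exp\left\{-\frac{|y_i - X_i\beta_h| / b_h}{2 t_i}(t_i - 1)^2\right\}.$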

Therefore,

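$t_i \equiv \frac{b_h |y_i - X_i\beta_h|}{\tau_i} \,\Big|\, Z_i = h, \beta_h, b_h^2, X, Z, y \sim \text{inverse Gaussian}\left(\mu_i = 1, \lambda_i = \frac{|y_i - X_i\beta_h|}{b_h}\right),$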

and

$\tau_i = \frac{b_h |y_i - X_i\beta_h|}{t_i}.$
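
A draw of τi can therefore be obtained from an inverse Gaussian draw. The Python standard library has no inverse Gaussian sampler, so the sketch below uses the transformation method of Michael, Schucany and Haas, which is our choice of sampler and is not stated in the derivation above. The function names are hypothetical, and a nonzero residual is assumed.

```python
import math
import random


def sample_inverse_gaussian(mu, lam, rng):
    """Draw from an inverse Gaussian(mu, lam) via the Michael-Schucany-Haas method."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    # Smaller root of the quadratic that links a chi-square(1) draw to the IG variate.
    x = mu + (mu * mu * y) / (2.0 * lam) - (mu / (2.0 * lam)) * math.sqrt(
        4.0 * mu * lam * y + mu * mu * y * y
    )
    # Accept the smaller root with probability mu / (mu + x), else use the other root.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


def sample_tau(residual, b_h, rng):
    """Draw tau_i given the residual y_i - X_i beta_h and the scale b_h."""
    abs_r = abs(residual)
    if abs_r == 0.0:
        raise ValueError("residual must be nonzero (assumption of this sketch)")
    t_i = sample_inverse_gaussian(1.0, abs_r / b_h, rng)
    return b_h * abs_r / t_i


rng = random.Random(0)
print(sample_tau(residual=0.5, b_h=1.2, rng=rng))
```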

For the posterior distribution of $b_h^2$,

$\pi(b_h^2 \mid \beta_h, \tau_h, X, Z, y) \propto \left(\frac{1}{b_h^2}\right)^{n_h + 1} \exp\left(-\frac{\sum_{Z_i = h} \tau_i}{2b_h^2}\right).$

Therefore,

$b_h^2 \mid \beta_h, \tau_h, X, Z, y \sim \text{IG}\left(n_h, \frac{1}{2}\sum_{Z_i = h} \tau_i\right).$
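
An inverse gamma draw can be obtained as the reciprocal of a gamma draw. The sketch below assumes that IG(a, s) denotes shape a and scale s, which matches the kernel given above; the function name and the toy values of τ are hypothetical.

```python
import random


def sample_b2(tau_h, rng):
    """Draw b_h^2 ~ IG(n_h, 0.5 * sum(tau_h)) for the tau_i in cluster h."""
    n_h = len(tau_h)
    scale = 0.5 * sum(tau_h)
    # If G ~ Gamma(shape=n_h, rate=scale), then 1 / G ~ IG(shape=n_h, scale=scale).
    # random.gammavariate takes (shape, scale of the gamma), so pass 1 / rate.
    g = rng.gammavariate(n_h, 1.0 / scale)
    return 1.0 / g


rng = random.Random(0)
tau_h = [2.0, 2.0, 2.0, 2.0, 2.0]
b2_draws = [sample_b2(tau_h, rng) for _ in range(20000)]

# The IG(a, s) mean is s / (a - 1) for a > 1, which is 5 / 4 = 1.25 here.
print(round(sum(b2_draws) / len(b2_draws), 2))
```
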
Appendix B

### Parameter estimation

Table B.1. Posterior median and 95% credible interval of cluster-specific model parameters of NMM and LPM for heating load

NMM

| Parameter | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|
| β1 | 0.29 (−1.89, 2.28) | 0.23 (−1.81, 2.33) | −2.20 (−4.01, −0.71) | −1.64 (−2.72, −0.33) | −1.40 (−2.77, 0.07) | −0.15 (−2.20, 1.71) |
| β2 | −0.01 (−0.83, 0.77) | −0.02 (−0.83, 0.75) | 0.00 (−0.78, 0.78) | 0.01 (−0.84, 0.79) | 0.02 (−0.79, 0.80) | 0.07 (−0.71, 0.85) |
| β3 | 0.04 (−0.76, 0.86) | 0.06 (−0.70, 0.87) | 0.09 (−0.69, 0.88) | 0.01 (−0.77, 0.86) | 0.03 (−0.76, 0.83) | −0.03 (−0.81, 0.76) |
| β4 | −0.01 (−1.58, 1.63) | −0.02 (−1.56, 1.59) | −0.10 (−1.65, 1.49) | −0.05 (−1.63, 1.64) | −0.07 (−1.63, 1.53) | 0.04 (−1.51, 1.60) |
| β5 | 1.87 (1.46, 2.77) | 2.01 (−0.41, 2.72) | 1.21 (0.82, 2.47) | 4.11 (2.97, 4.25) | 3.10 (2.89, 3.27) | −0.62 (−1.46, 0.32) |
| β6 | −0.01 (−0.44, 0.24) | −0.11 (−0.60, 0.41) | −0.02 (−0.11, 0.06) | −0.01 (−0.10, 0.05) | −0.03 (−0.16, 0.10) | 0.00 (−0.23, 0.24) |
| β7 | 0.22 (−1.45, 2.63) | 0.67 (−1.41, 2.79) | 10.33 (2.27, 11.27) | 11.85 (2.88, 12.32) | 3.47 (2.39, 4.40) | 7.82 (5.09, 10.68) |
| β8 | 2.01 (1.05, 2.98) | 2.07 (−0.38, 3.44) | −0.09 (−0.15, 2.16) | −0.05 (−0.09, 0.06) | 0.00 (−0.10, 0.11) | −0.11 (−0.29, 0.08) |
| σ | 0.07 (0.04, 2.14) | 1.30 (0.35, 2.20) | 0.38 (0.32, 1.05) | 0.39 (0.35, 0.71) | 0.65 (0.54, 0.79) | 1.32 (1.05, 1.65) |
| nh | 20 | 22 | 116 | 371 | 125 | 114 |

LPM

| Parameter | Cluster 1 | Cluster 2 |
|---|---|---|
| β1 | −1.77 (−6.69, 1.42) | −1.38 (−3.30, 1.29) |
| β2 | 0.00 (−0.78, 0.76) | −0.04 (−0.87, 0.75) |
| β3 | 0.04 (−0.74, 0.82) | 0.13 (−0.88, 1.41) |
| β4 | −0.04 (−1.58, 1.52) | −0.10 (−1.74, 1.52) |
| β5 | 3.68 (2.98, 4.28) | 3.62 (−1.43, 3.93) |
| β6 | −0.03 (−0.19, 0.13) | −0.03 (−0.84, 0.91) |
| β7 | 11.63 (10.51, 14.38) | 11.25 (0.88, 12.48) |
| β8 | 0.09 (−0.03, 0.27) | 0.09 (−0.73, 0.95) |
| b | 2.23 (1.76, 2.43) | 2.21 (1.00, 2.47) |
| nh | 729 | 39 |

NMM = normal mixture model; LPM = Laplace partition model.

Table B.2. Posterior median and 95% credible interval of cluster-specific model parameters of NMM and LPM for cooling load

NMM

| Parameter | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| β1 | 0.10 (−1.54, 2.12) | 1.13 (−0.58, 2.77) | −0.21 (−2.24, 1.80) | −1.36 (−3.42, 1.99) |
| β2 | −0.01 (−0.80, 0.80) | −0.01 (−0.83, 0.79) | −0.04 (−0.84, 0.77) | −0.01 (−0.79, 0.76) |
| β3 | 0.08 (−0.72, 0.88) | 0.08 (−0.73, 0.89) | 0.26 (−0.55, 1.06) | 0.40 (−0.76, 0.85) |
| β4 | −0.03 (−1.63, 1.56) | −0.03 (−1.64, 1.60) | −0.19 (−1.79, 1.42) | −0.02 (−1.58, 1.54) |
| β5 | 0.76 (0.20, 1.15) | 1.90 (0.77, 2.17) | 0.61 (−1.40, 1.43) | 5.31 (0.35, 5.62) |
| β6 | −0.04 (−0.15, 0.08) | 0.14 (−0.07, 0.32) | −0.06 (−0.57, 0.48) | 0.04 (−0.12, 0.25) |
| β7 | 0.69 (−1.55, 2.52) | 9.52 (1.52, 10.94) | 1.81 (−0.36, 4.15) | 9.95 (0.19, 10.86) |
| β8 | 1.10 (−0.00, 2.42) | 0.03 (−0.10, 2.35) | 0.35 (−0.16, 0.80) | 0.01 (−0.07, 1.42) |
| σ | 0.15 (0.08, 0.38) | 1.65 (0.14, 1.86) | 2.06 (1.42, 2.80) | 0.76 (0.13, 1.98) |
| nh | 10 | 309 | 49 | 400 |

LPM

| Parameter | Cluster 1 | Cluster 2 |
|---|---|---|
| β1 | −1.96 (−8.86, 0.82) | −1.02 (−3.39, 1.60) |
| β2 | −0.00 (−0.81, 0.80) | −0.03 (−0.91, 0.84) |
| β3 | 0.03 (−0.77, 0.83) | 0.11 (−0.93, 1.41) |
| β4 | −0.02 (−1.62, 1.58) | −0.10 (−1.84, 1.59) |
| β5 | 3.80 (3.15, 5.05) | 3.57 (−1.71, 4.16) |
| β6 | 0.12 (−0.13, 0.29) | 0.13 (−0.83, 1.06) |
| β7 | 8.70 (7.37, 10.68) | 7.74 (1.20, 9.51) |
| β8 | 0.06 (−0.11, 0.20) | 0.05 (−0.86, 1.04) |
| b | 2.37 (1.98, 2.60) | 2.35 (1.00, 3.70) |
| nh | 739 | 29 |

NMM = normal mixture model; LPM = Laplace partition model.

Appendix C

### BIC plots of EM model based clustering

BIC plots of model based clustering based on the generated data from normal mixture model (Set 1), Laplace random partition model (Set 2), and t mixture model of df = 5 (Set 3). BIC = Bayesian information criterion.

Figures
Fig. 1. Histogram of generated data from normal mixture model (Set 1). The cluster-specific mean curves (black solid lines) with two estimated curves by the Dirichlet process normal mixture model (dotted blue lines), the model-based normal mixture EM model (dot-dashed magenta lines), and the Laplace regression random partition model (red dot lines) are on histogram.
Fig. 2. Histogram of generated data from Laplace random partition model (Set 2). The cluster-specific mean curves (black solid lines) with two estimated curves by the normal mixture model (dotted blue lines) and the Laplace regression random partition model (red dot lines) are on histogram.
Fig. 3. Histogram of generated data from t mixture model of df = 5 (Set 3). The cluster-specific mean curves (black solid lines) with two estimated curves by the normal mixture model (dotted blue lines) and the Laplace regression random partition model (red dot lines) are on histogram.
Fig. 4. Estimated curves based on normal mixture model (bold black line) and on Laplace random partition model (bold blue line) with 95% credible intervals of normal mixture (dotted black lines) and of Laplace random partition model (dotted blue lines) for Set 4 data at selected data points.
Fig. 5. Estimated curves based on normal mixture model (bold black line) and on Laplace random partition model (bold blue line) with 95% credible intervals of normal mixture (dotted black lines) and of Laplace random partition model (dotted blue lines) for Set 5 data at selected data points.
Fig. 6. Histograms of heating load and cooling load with estimated curve of responses based on kernel method.
Fig. 7. Estimated curves based on normal mixture model (bold black line) and on Laplace random partition model (bold blue line) with 95% credible intervals of normal mixture (dotted black lines) and of Laplace random partition model (dotted blue lines) for heating and cooling load at selected data points.
TABLES


### Table 1

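Posterior median and 95% CI for cluster-specific residual scale parameter σh or bh of NMM and LPM, and MAP estimate from the model-based clustering of EM

| Model | Cluster | Truth | NMM mean | NMM 95% CI | LPM mean | LPM 95% CI | MAP |
|---|---|---|---|---|---|---|---|
| Normal | 1 | 0.71 | 0.58 | (0.47, 0.84) | 0.49 | (0.34, 0.98) | 0.28 |
| | 2 | 0.45 | 0.44 | (0.36, 0.58) | 0.37 | (0.27, 0.95) | 0.22 |
| | 3 | 0.32 | 0.30 | (0.25, 0.54) | 0.24 | (0.17, 0.93) | 0.15 |
| Laplace | 1 | 0.71 | 0.73 | (0.45, 3.27) | 0.49 | (0.26, 1.00) | 0.43 |
| | 2 | 0.45 | 0.41 | (0.19, 2.46) | 0.47 | (0.18, 1.67) | 0.36 |
| | 3 | 0.32 | 0.60 | (0.20, 5.38) | 0.85 | (0.11, 2.71) | 0.17 |
| | 4 | - | 0.36 | (0.19, 8.08) | 0.63 | (0.20, 2.35) | |
| | 5 | - | - | - | 0.21 | (0.06, 1.41) | |
| t (df = 5) | 1 | 1.67 | 0.84 | (0.57, 11.80) | 0.40 | (0.25, 3.12) | 0.73 |
| | 2 | 1.67 | 2.75 | (2.12, 25.80) | 1.03 | (0.61, 5.20) | 0.66 |
| | 3 | 1.67 | - | - | 1.35 | (0.57, 4.24) | 0.30 |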

CI = credible interval; NMM = normal mixture model; LPM = Laplace partition model; MAP = maximum a posteriori.

### Table 2

Adjusted Rand index computed between the estimated partitions and the true indices of clusters of the same objects for NMM and LPM, and for the MAP estimate from the model-based clustering model of EM

| | Set 4 NMM | Set 4 LPM | Set 4 MAP | Set 5 NMM | Set 5 LPM | Set 5 MAP |
|---|---|---|---|---|---|---|
| Rand index | 0.467 | 0.434 | 0.00 | 0.427 | 0.401 | 0.102 |

NMM = normal mixture model; LPM = Laplace partition model; MAP = maximum a posteriori.

References
1. Airoldi, EM, Costa, T, Bassetti, F, Leisen, F, and Guindani, M (2014). Generalized species sampling priors with latent Beta reinforcements. Journal of the American Statistical Association. 109, 1466-1480.
2. Andrews, DF, and Mallows, CL (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society Series B (Methodological). 36, 99-102.
3. Antoniak, CE (1974). Mixture of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics. 2, 1152-1174.
4. Argiento, R, Cremaschi, A, and Guglielmi, A (2014). A “density-based” algorithm for cluster analysis using species sampling Gaussian mixture models. Journal of Computational and Graphical Statistics. 23, 1126-1142.
5. Banfield, JD, and Raftery, AE (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics. 49, 803-821.
6. Barry, D, and Hartigan, JA (1992). Product partition models for change point problems. The Annals of Statistics. 20, 260-279.
7. Blackwell, D, and MacQueen, JB (1973). Ferguson distributions via Pólya urn schemes. The Annals of Statistics. 1, 353-355.
8. Booth, JG, Casella, G, and Hobert, JP (2008). Clustering using objective functions and stochastic search. Journal of the Royal Statistical Society, Series B (Statistical Methodology). 70, 119-139.
9. Bush, CA, and MacEachern, SN (1996). A semiparametric Bayesian model for randomized block design. Biometrika. 83, 275-285.
10. Crowley, EM (1997). Product partition models for normal means. Journal of the American Statistical Association. 92, 192-198.
11. Dempster, AP, Laird, NM, and Rubin, DB (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological). 39, 1-38.
12. Dielman, TE (1984). Least absolute value estimation in regression models: an annotated bibliography. Communications in Statistics Theory and Methods. 4, 513-541.
13. Dielman, TE (2005). Least absolute value regression: recent contributions. Journal of Statistical Computation and Simulation. 75, 263-286.
14. Dunson, DB, Pillai, N, and Park, JH (2007). Bayesian density regression. Journal of the Royal Statistical Society Series B (Statistical Methodology). 69, 163-183.
15. Escobar, MD, and West, M (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association. 90, 577-588.
16. Ferguson, TS (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics. 1, 209-230.
17. Fritsch, A, and Ickstadt, K (2009). Improved criteria for clustering based on the posterior similarity matrix. Bayesian Analysis. 4, 367-392.
18. Fraley, C, and Raftery, AE (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association. 97, 611-631.
19. Fraley, C, and Raftery, AE (2007). Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification. 24, 155-181.
20. Fraley, C, Raftery, AE, Murphy, TB, and Scrucca, L (2012). mclust version 4 for R: Normal mixture modeling for model-based clustering, classification, and density estimation: University of Washington, Department of Statistics
21. Hartigan, JA (1990). Partition models. Communications in Statistics Theory and Methods. 19, 2745-2756.
22. Ishwaran, H, and James, LF (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association. 96, 161-173.
23. Ishwaran, H, and James, LF (2003). Some further developments for stick-breaking priors: finite and infinite clustering and classification. Sankhyā: The Indian Journal of Statistics. 65, 577-592.
24. Ishwaran, H, and Zarepour, M (2000). Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika. 87, 371-390.
25. Jordan, C, Livingstone, V, and Barry, D (2007). Statistical modelling using product partition models. Statistical Modelling. 7, 275-295.
26. Kyung, M, Gill, J, Ghosh, M, and Casella, G (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis. 5, 369-412.
27. MacEachern, SN (1999). Dependent nonparametric processes. ASA Proceedings of the Section on Bayesian Statistical Science. Alexandria, VA
28. MacEachern, SN (2000). Dependent Dirichlet processes. Columbus, OH: Department of Statistics, The Ohio State University
29. MacEachern, SN, and Müller, P (1998). Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics. 7, 223-238.
30. McCullagh, P, and Yang, J (2007). Stochastic classification models. Proceedings of the International Congress of Mathematicians (Madrid, 2006). Madrid, pp. 669-686
31. McLachlan, GJ, and Peel, D (2000). Finite Mixture Models. New York: John Wiley & Sons
32. Müller, P, Quintana, F, Jara, A, and Hanson, T (2015). Bayesian Nonparametric Data Analysis. Cham: Springer
33. Müller, P, Quintana, F, and Rosner, GL (2011). A product partition model with regression on covariates. Journal of Computational and Graphical Statistics. 20, 260-278.
34. Murua, A, and Quintana, FA (2017). Semiparametric Bayesian regression via Potts model. Journal of Computational and Graphical Statistics. 26, 265-274.
35. Neal, RM (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics. 9, 249-265.
36. Park, JH, and Dunson, DB (2010). Bayesian generalized product partition model. Statistica Sinica. 20, 1203-1226.
37. Pitman, J (1996). Some developments of the Blackwell-MacQueen urn scheme. Statistics, Probability and Game Theory. Hayward, CA: Institute of Mathematical Statistics, pp. 245-267
38. Richardson, S, and Green, PJ (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B. 59, 731-792.
39. Sethuraman, J (1994). A constructive definition of Dirichlet priors. Statistica Sinica. 4, 639-650.
40. Song, W, Yao, W, and Xing, Y (2014). Robust mixture regression model fitting by Laplace distribution. Computational Statistics and Data Analysis. 71, 128-137.
41. Stephens, M (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society, Series B. 62, 795-809.
42. Tokdar, ST, Zhu, YM, and Ghosh, JK (2010). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis. 5, 319-344.
43. Tsanas, A, and Xifara, A (2012). Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy and Buildings. 49, 560-567.
44. Quintana, FA, and Iglesias, PL (2003). Bayesian clustering and product partition models. Journal of the Royal Statistical Society Series B (Statistical Methodology). 65, 557-574.
45. Wolfe, JH (1970). Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research. 5, 329-350.