Comparison of time series clustering methods and application to power consumption pattern clustering

Jaehwi Kim^a, Jaehee Kim^b

aKorea Rural Economic Institute, Korea;
bDepartment of Statistics, Duksung Women’s University, Korea
Correspondence to: Department of Statistics, Duksung Women’s University, Samyang-ro 144-gil 33, Dobong-gu, Seoul 01369, Korea. E-mail: jaehee@duksung.ac.kr
Received July 14, 2020; Revised September 8, 2020; Accepted September 26, 2020.
Abstract

The development of smart grids has enabled the easy collection of a large amount of power data. Because common patterns recur across customers, clustering power consumption patterns is useful when analyzing power big data. In this paper, clustering analysis is based on distance functions for time series and clustering algorithms to discover patterns in power consumption data. In clustering, we use 10 distance measures to find clusters that reflect the characteristics of time series data. A simulation study is done to compare the distance measures for clustering. Cluster validity measures such as error rate, similarity index, Dunn index, and silhouette values are also calculated and compared. Real power consumption data are clustered with the five distance measures that perform better than the others in the simulation.

Keywords : complexity distance, model-free distance, model-based distance, power consumption, time series clustering, silhouette
1. Introduction

Smart grids have many advantages over conventional power grids because they enhance the way electricity is generated, distributed, and consumed by using advanced sensing devices and controllers that depend on power consumption profiles. With the development of smart grid technology, electrical data are accumulating into big data in real time. According to Haben et al. (2015), the clustering of electrical data is shifting from large customer groups or the energy data of high/medium voltage customers to low voltage clustering such as the household level. Clustering algorithms are used to profile individuals’ power consumption and analyze their patterns (Al-Jarrah et al., 2017; Tsekouras et al., 2007). Such analysis is what energy suppliers and distribution network operators need in order to understand how customers use energy and its impact on voltage networks. Clustering issues are at the heart of many knowledge discovery and data mining tasks and are also useful in identifying data patterns (Xiong and Yeung, 2004). Clustering is an unsupervised process that groups data patterns into clusters, so that patterns within a cluster are similar to each other but different from those in other clusters. The goal of clustering is to identify structure in an unlabeled data set by objectively organizing the data into homogeneous clusters in which the within-cluster similarity is maximized and the between-cluster similarity is minimized. Time series clustering is a research area applicable to a wide range of fields (Liao, 2005), and interest in it has increased as part of temporal data mining research. Note that time series clustering may cause some multivariate clustering algorithms to fail, because the notion of similarity becomes obscure in high dimensions, that is, when the time series is very long (Wang et al., 2006).
Therefore, it is necessary to consider clustering using a suitable distance measure between time series incorporating their dependency.

There are various types of distances used in clustering, such as model-free, model-based, and complexity-based methods (Montero and Vilar, 2014). A model-free method measures the proximity between two time series based on the closeness of their values at specific points in time. One approach is to compare the autocorrelation functions (ACF) of the two time series (Bohte et al., 1980; Galeano and Peña, 2000; Caiado et al., 2006; D’Urso and Maharaj, 2009). The Fréchet (1906) distance is computed by considering all order-preserving alignments of the time points of the two series.

Model-based approaches assume that each time series is generated by some model or by a mixture of underlying probability distributions. Time series are considered similar when the model parameters characterizing the individual series, or the residuals remaining after fitting the model, are similar. For example, the distance based on the correlation of two time series (Golay et al., 1998) can be used: the greater the correlation between two series, the smaller the distance. Chouakria and Nagabhushan (2007) propose a distance that covers both the proximity of observations and the proximity of behavior between two time series. Caiado et al. (2006) compare the periodograms of two time series and express the difference in Euclidean distance form. A typical model-based method assumes that each time series follows an ARIMA model. Piccolo (1990) calculates the distance between two time series as the Euclidean distance between the parameters of fitted AR models. Maharaj (1996, 2000) compares two time series using a chi-square test statistic built from the parameters of the AR models. Kalpakis et al. (2001) calculate the linear predictive coding (LPC) cepstrum and the Euclidean distance between LPC cepstra.

Complexity-based approaches compare the levels of complexity of time series. The similarity of two time series does not rely on specific serial features or on knowledge of underlying models, but on measuring the level of information shared by both time series. The mutual information between two series can be formally established using the Kolmogorov complexity concept; however, this measure cannot be computed in practice and must be approximated. There are two ways to account for complexity: calculate the complexity of each time series and compare them with each other (Li et al., 2004; Keogh et al., 2004; Keogh et al., 2007), or set a weighting function for complexity (Batista et al., 2011).

In this paper we compare clustering methods for time series and apply them to power consumption time series as power consumption profiling. This practical application is important for load forecasting, bad data correction, and optimal energy resource scheduling. The paper is organized as follows. Section 2 introduces the 10 distance measures considered, and Section 3 provides the hierarchical and K-means clustering methods and clustering comparison measures. Section 4 shows the clustering results on simulated data and the application to electricity consumption data. Discussions are in Section 5. The clustering analysis is implemented with the R program and the TSclust package.

2. Dissimilarity measure

Let X = (X1, . . . , XT)′ and Y = (Y1, . . . , YT)′ denote realizations of length T from two real-valued processes {Xt, t ∈ ℤ} and {Yt, t ∈ ℤ}, respectively. Here X and Y are both time series, and the data set contains N time series of length T.

### 2.1. Autocorrelation-based distance

The autocorrelation function is used as a dissimilarity measure and some authors have studied this type of measure (Bohte et al., 1980; Galeano and Peña, 2000; Caiado et al., 2006; D’Urso and Maharaj, 2009). Galeano and Peña (2000) define a distance between X and Y with the estimated autocorrelations vectors as

$d_{ACF}(X,Y)=\sqrt{(\hat{\rho}_X-\hat{\rho}_Y)'\,\Omega\,(\hat{\rho}_X-\hat{\rho}_Y)},$

where $\hat{\rho}_X=(\hat\rho_{1,X},\ldots,\hat\rho_{L,X})'$ and $\hat{\rho}_Y=(\hat\rho_{1,Y},\ldots,\hat\rho_{L,Y})'$ are the estimated autocorrelation vectors of X and Y, respectively, and Ω is a weight matrix. The truncation lag L is chosen so that $\hat\rho_{i,X}\approx 0$ and $\hat\rho_{i,Y}\approx 0$ for i > L.

If Ω = I with the uniform weights,

$d_{ACFU}(X,Y)=\sqrt{\sum_{i=1}^{L}\left(\hat\rho_{i,X}-\hat\rho_{i,Y}\right)^{2}}.$

If the geometric weights are decaying according to the autocorrelation lag,

$d_{ACFG}(X,Y)=\sqrt{\sum_{i=1}^{L}p(1-p)^{i}\left(\hat\rho_{i,X}-\hat\rho_{i,Y}\right)^{2}},\quad \text{with } 0<p<1.$

Likewise this measure evaluates the dissimilarity between the corresponding spectral representations of the series.
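As an illustrative sketch (not the TSclust implementation; the function names, the sample-ACF estimator, and the default values of L and p are our choices), the two ACF-based distances can be computed as follows:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)])

def d_acfu(x, y, max_lag=10):
    """Uniform-weight ACF distance d_ACFU (Omega = I)."""
    return np.sqrt(np.sum((acf(x, max_lag) - acf(y, max_lag)) ** 2))

def d_acfg(x, y, max_lag=10, p=0.5):
    """Geometric-weight ACF distance d_ACFG with weights p(1-p)^i, 0 < p < 1."""
    w = p * (1 - p) ** np.arange(1, max_lag + 1)
    return np.sqrt(np.sum(w * (acf(x, max_lag) - acf(y, max_lag)) ** 2))
```

With p close to 1 the geometric version concentrates almost all weight on the first few lags.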

### 2.2. Correlation-based distance

Correlation measures the similarity of two series. Golay et al. (1998) propose correlation-based distances such as

$d_{Cor}(X,Y)=\sqrt{2\left(1-\mathrm{Cor}(X,Y)\right)},$

based on the Pearson correlation between X and Y

$\mathrm{Cor}(X,Y)=\frac{\sum_{t=1}^{T}(X_t-\bar{X})(Y_t-\bar{Y})}{\sqrt{\sum_{t=1}^{T}(X_t-\bar{X})^{2}}\sqrt{\sum_{t=1}^{T}(Y_t-\bar{Y})^{2}}},$

where $\bar{X}$ and $\bar{Y}$ are the averages of the time series X and Y, respectively.

### 2.3. An adaptive dissimilarity index covering both proximity on value and on behavior

Chouakria and Nagabhushan (2007) propose a dissimilarity measure that covers both existing measures of the proximity on observations and temporal correlations for the behavior proximity estimation. The proximity between the behaviors of the series is evaluated by the first-order temporal correlation coefficient as follows:

$\mathrm{CorT}(X,Y)=\frac{\sum_{t=1}^{T-1}(X_{t+1}-X_t)(Y_{t+1}-Y_t)}{\sqrt{\sum_{t=1}^{T-1}(X_{t+1}-X_t)^{2}}\sqrt{\sum_{t=1}^{T-1}(Y_{t+1}-Y_t)^{2}}}.$

CorT(X, Y) takes values from −1 to 1. The value CorT(X, Y) = 1 means that both series have similar temporal behavior, with the same direction and instantaneous growth rate. If CorT(X, Y) equals 0, there is no monotonic association between X and Y and their growth rates are stochastically linearly independent (different behaviors). CorT(X, Y) = −1 means that the growth rates are similar in magnitude but opposite in direction (opposite behaviors). The dissimilarity measure as a function of CorT is

$d_{CorT}(X,Y)=\varphi_k\left[\mathrm{CorT}(X,Y)\right]\cdot d(X,Y),$

where φk(·) is an adaptive tuning function to automatically regulate an existing raw-data distance d(X, Y) according to the temporal correlation with φk(u) = 2/(1+exp(ku)), k ≥ 0. The value CorT(X, Y) = 0 implies dCorT(X, Y) = d(X, Y).
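The adaptive index can be sketched directly from these two formulas (a Python illustration; the raw-data distance d is taken as Euclidean here, and the function names are ours):

```python
import numpy as np

def cort(x, y):
    """First-order temporal correlation coefficient CORT(X, Y)."""
    dx = np.diff(np.asarray(x, dtype=float))
    dy = np.diff(np.asarray(y, dtype=float))
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))

def d_cort(x, y, k=2):
    """Adaptive dissimilarity: phi_k(CORT) modulates a raw-data distance (Euclidean here)."""
    phi = 2.0 / (1.0 + np.exp(k * cort(x, y)))  # tuning function phi_k(u)
    return phi * np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
```

With k = 0 the tuning function is identically 1 and dCorT reduces to the raw Euclidean distance.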

### 2.4. Periodogram-based distance

Caiado et al. (2006) propose the Euclidean distance between the periodogram ordinates as

$d_{Per}(X,Y)=\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(I_X(\lambda_k)-I_Y(\lambda_k)\right)^{2}},$

where $I_X(\lambda_k)=T^{-1}\left|\sum_{t=1}^{T}X_t e^{-i\lambda_k t}\right|^{2}$ and $I_Y(\lambda_k)=T^{-1}\left|\sum_{t=1}^{T}Y_t e^{-i\lambda_k t}\right|^{2}$ are the periodograms of X and Y at frequencies $\lambda_k=2\pi k/T$, k = 1, . . . , n, with n = [(T − 1)/2].
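A sketch of this distance using the FFT (the function names are ours; the magnitude of the FFT ordinates equals that of the sum above up to an indexing phase shift):

```python
import numpy as np

def periodogram(x):
    """Periodogram ordinates I(lambda_k) at Fourier frequencies k = 1..[(T-1)/2]."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    n = (T - 1) // 2
    return (np.abs(np.fft.fft(x)) ** 2 / T)[1:n + 1]

def d_per(x, y):
    """Euclidean distance between periodogram ordinates (series of equal length)."""
    ix, iy = periodogram(x), periodogram(y)
    return np.sqrt(np.mean((ix - iy) ** 2))
```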

### 2.5. Fréchet distance

Fréchet (1906) proposed a method for measuring the proximity between continuous curves, and the Fréchet distance is widely used in the discrete case (Eiter and Mannila, 1994) and in the time series framework. Let M be the set of all sequences r of m pairs that preserve the observation order,

$r=\left((X_{a_1},Y_{b_1}),\ldots,(X_{a_m},Y_{b_m})\right),$

where $a_i,b_i\in\{1,\ldots,T\}$ with $a_1=b_1=1$, $a_m=b_m=T$, and $a_{i+1}\in\{a_i,a_i+1\}$, $b_{i+1}\in\{b_i,b_i+1\}$ for $i\in\{1,\ldots,m-1\}$.

The Fréchet distance is defined by

$d_F(X,Y)=\min_{r\in M}\left(\max_{i=1,\ldots,m}\left|X_{a_i}-Y_{b_i}\right|\right).$

The Fréchet distance not only treats the series as two point sets but also takes the order of the observations into account. Note that dF(X, Y) can also be computed for series of different lengths.
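The discrete Fréchet distance can be computed with standard dynamic programming over the alignment grid (an illustrative sketch; the function name is ours):

```python
import numpy as np

def d_frechet(x, y):
    """Discrete Frechet distance via dynamic programming, O(len(x) * len(y))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    ca = np.empty((n, m))
    ca[0, 0] = abs(x[0] - y[0])
    for i in range(1, n):                 # first column: only "advance in x" moves
        ca[i, 0] = max(ca[i - 1, 0], abs(x[i] - y[0]))
    for j in range(1, m):                 # first row: only "advance in y" moves
        ca[0, j] = max(ca[0, j - 1], abs(x[0] - y[j]))
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i, j - 1], ca[i - 1, j - 1]),
                           abs(x[i] - y[j]))
    return ca[n - 1, m - 1]
```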

### 2.6. Piccolo distance

Piccolo (1990) argues that autoregressive expansions convey all the useful information about the stochastic structure of the processes except for initial values. If the series are non-stationary, differencing is carried out to make them stationary; if the series have seasonality, it is removed before further analysis. The series are then fitted with truncated AR(∞) models of orders k1 and k2 approximating the generating processes of X and Y, respectively, with the orders chosen by criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). This approach avoids the problem of obtaining an ad hoc ARMA approximation for each series subjected to clustering.

The Piccolo’s distance with k = max (k1, k2) takes the following form as

$d_{Pic}(X,Y)=\sqrt{\sum_{j=1}^{k}\left(\hat\pi^{*}_{j,X}-\hat\pi^{*}_{j,Y}\right)^{2}},$

where Π̂X = (π̂1, X, . . . , π̂k1, X)′ and Π̂Y = (π̂1, Y, . . . , π̂k2, Y)′ denote the vectors of AR(k1) and AR(k2) parameter estimations for X and Y, respectively.

Here $\hat\pi^{*}_{j,X}=\hat\pi_{j,X}$ if $j\le k_1$ and $\hat\pi^{*}_{j,X}=0$ otherwise, and analogously $\hat\pi^{*}_{j,Y}=\hat\pi_{j,Y}$ if $j\le k_2$ and $\hat\pi^{*}_{j,Y}=0$ otherwise. In addition to satisfying the distance properties of non-negativity, symmetry, and the triangle inequality, $\sum\pi_j$, $\sum|\pi_j|$, and $\sum\pi_j^{2}$ are well-defined quantities, so dPic always exists for all invertible ARIMA processes.
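A simplified sketch of Piccolo's distance (our simplifications: a common, pre-fixed AR order k instead of AIC/BIC-selected orders with zero-padding, and a plain Yule-Walker estimator):

```python
import numpy as np

def ar_coefs_yw(x, k):
    """AR(k) coefficients via the Yule-Walker equations (illustrative estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    T = len(x)
    # biased sample autocovariances at lags 0..k
    r = np.array([np.sum(x[:T - h] * x[h:]) / T for h in range(k + 1)])
    R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
    return np.linalg.solve(R, r[1:])

def d_pic(x, y, k=5):
    """Piccolo distance with a common AR order k (a simplification of the original)."""
    return np.linalg.norm(ar_coefs_yw(x, k) - ar_coefs_yw(y, k))
```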

### 2.7. Maharaj distance

A major feature of Maharaj's method is the introduction of a hypothesis test of whether two time series are significantly different. Maharaj (1996, 2000) uses hypothesis testing to determine whether two time series have significantly different generating processes within the class of invertible and stationary ARIMA processes.

The test statistic is

$d_{Mah}(X,Y)=T\left(\hat\Pi^{*}_X-\hat\Pi^{*}_Y\right)'\hat{V}^{-1}\left(\hat\Pi^{*}_X-\hat\Pi^{*}_Y\right),$

where $\hat\Pi^{*}_X$ and $\hat\Pi^{*}_Y$ are the AR(k) parameter estimates of X and Y, respectively, with k selected as in Piccolo's distance. $\hat{V}$ is an estimator of $V=\sigma^{2}_X R^{-1}_X(k)+\sigma^{2}_Y R^{-1}_Y(k)$, where $\sigma^{2}_X$ and $\sigma^{2}_Y$ are the variances of the white noise processes associated with X and Y, and $R_X$ and $R_Y$ are the sample covariance matrices of the two series. dMah is asymptotically χ² distributed under the null hypothesis ΠX = ΠY.

Therefore, the dissimilarity measure between Π̂X and Π̂Y through associated p-value is

$d_{Mah,p}(X,Y)=P\left(\chi^{2}_{k}>d_{Mah}(X,Y)\right).$

If a hierarchical algorithm is started from the pairwise matrix of p-values dMah,p, a clustering homogeneity criterion is obtained by pre-specifying a threshold significance level α, such as 5% or 1%. Series with associated p-values greater than α are grouped together, which implies that only series whose dynamic structures are not significantly different at level α are placed in the same group. The test statistic dMah and the associated p-value dMah,p satisfy non-negativity and symmetry, so they can be used as dissimilarities between X and Y. Like Piccolo's distance dPic, dMah evaluates dissimilarity by comparing autoregressive approximations of the two series; unlike dPic, however, dMah can be affected by the scale of measurement because it uses the variances of the white noise processes.

### 2.8. Cepstral-based distance

The linear predictive coding (LPC) cepstrum was proposed by Kalpakis et al. (2001) for clustering ARIMA time series, and it has good properties for distinguishing between ARIMA time series. The cepstrum is defined as the inverse Fourier transform of the logarithm of a signal's spectrum. The LPC cepstrum is so named because it is constructed from the autoregression coefficients of the linear predictive coding of the signal.

Cepstral-based distance is calculated as the Euclidean distance between the LPC cepstral coefficients of X and Y,

$d_{LPC.Cep}(X,Y)=\sqrt{\sum_{i=1}^{T}\left(\psi_{i,X}-\psi_{i,Y}\right)^{2}},$

where the time series X follows an AR(p) structure, $X_t=\sum_{r=1}^{p}\phi_r X_{t-r}+\varepsilon_t$, with autoregression coefficients $\phi_r$ and a white noise process $\varepsilon_t\sim(0,\sigma^2)$. The LPC cepstral coefficients are defined as

$\psi_h=\begin{cases}\phi_1, & \text{if } h=1,\\[2pt] \phi_h+\sum_{r=1}^{h-1}\left(1-\frac{r}{h}\right)\phi_r\,\psi_{h-r}, & \text{if } 1<h\le p,\\[2pt] \sum_{r=1}^{p}\left(1-\frac{r}{h}\right)\phi_r\,\psi_{h-r}, & \text{if } p<h.\end{cases}$
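The recursion can be sketched as follows (a Python illustration operating on a given AR coefficient vector; the function names and the default number of coefficients are ours). As a sanity check, for an AR(1) model with coefficient φ the recursion yields the known cepstrum $\psi_h=\phi^h/h$:

```python
import numpy as np

def lpc_cepstrum(phi, n_coef):
    """LPC cepstral coefficients psi_1..psi_{n_coef} from AR coefficients phi_1..phi_p."""
    p = len(phi)
    psi = np.zeros(n_coef)
    psi[0] = phi[0]
    for h in range(2, n_coef + 1):
        if h <= p:
            psi[h - 1] = phi[h - 1] + sum((1 - m / h) * phi[m - 1] * psi[h - m - 1]
                                          for m in range(1, h))
        else:
            psi[h - 1] = sum((1 - m / h) * phi[m - 1] * psi[h - m - 1]
                             for m in range(1, p + 1))
    return psi

def d_lpc_cep(phi_x, phi_y, n_coef=10):
    """Euclidean distance between LPC cepstral coefficient vectors."""
    return np.linalg.norm(lpc_cepstrum(phi_x, n_coef) - lpc_cepstrum(phi_y, n_coef))
```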

### 2.9. Compression-based dissimilarity measure

The Kolmogorov complexity of an object x, K(x), is the length of the shortest program that can produce x on a universal computer such as a Turing machine; it is the minimum amount of information needed to generate x by an algorithm, so the larger K(x), the greater the complexity. Similarly, given two objects x and y, the conditional Kolmogorov complexity K(x|y) is the length of the shortest program that produces x when y is given as auxiliary input. Therefore, K(x) − K(x|y) is the amount of information that y provides about x.

Based on this theory, Li et al. (2004) present the normalized information distance (NID) as:

$d_{NID}(x,y)=\frac{\max\left\{K(x\,|\,y),\,K(y\,|\,x)\right\}}{\max\left\{K(x),\,K(y)\right\}}.$

Here dNID is a metric taking values in [0, 1]. The biggest problem with dNID is that the Kolmogorov complexity is uncomputable, so K(·) is approximated by the length of the compressed object obtained from data compressors such as gzip and bzip2.

Let C(X) and C(Y) be the compressed sizes of X and Y, respectively. The denominator of dNID is easily approximated by max {C(X), C(Y)}, but the numerator is difficult to approximate because it contains conditional Kolmogorov complexities. Li et al. (2004) solve this problem using the fact that K(x|y) is roughly equal to K(xy) − K(y), where K(xy) is the minimum program length needed to compute the concatenation of x and y. The normalized compression distance (NCD) approximating the NID is expressed as

$d_{NCD}(X,Y)=\frac{C(XY)-\min\left\{C(X),C(Y)\right\}}{\max\left\{C(X),C(Y)\right\}}.$

The metric dNCD takes values from 0 to 1 + ε, where ε is the error due to imperfections in the compression technique.

Keogh et al. (2004) (see also Keogh et al. (2007)) propose a simplified version of the NCD called a compression-based dissimilarity measure (CDM) as

$d_{CDM}(X,Y)=\frac{C(XY)}{C(X)+C(Y)}.$

This measure dCDM ranges from 1/2 to 1, where 1/2 indicates pure identity and 1 indicates maximum discrepancy.
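Both compression-based measures can be sketched with an off-the-shelf compressor (a Python illustration using zlib; the symbolization of a numeric series into bytes is our naive stand-in, whereas the CDM papers discretize series, e.g. via SAX):

```python
import zlib

def _c(b):
    """Compressed size in bytes; zlib stands in for the uncomputable K(.)."""
    return len(zlib.compress(b, 9))

def d_ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy = _c(x), _c(y)
    return (_c(x + y) - min(cx, cy)) / max(cx, cy)

def d_cdm(x, y):
    """Compression-based dissimilarity measure: C(xy) / (C(x) + C(y))."""
    return _c(x + y) / (_c(x) + _c(y))

def series_to_bytes(series, digits=3):
    """Naive symbolization of a numeric series as a text byte string (an assumption)."""
    return ",".join(f"{v:.{digits}f}" for v in series).encode()
```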

### 2.10. Complexity-invariant dissimilarity measure

Batista et al. (2011) argue that pairs of highly complex time series often tend to be further apart than pairs of simple series, which means that complex series can be incorrectly assigned to a class of lower complexity. To reduce this effect, Batista et al. (2011) use the difference in complexity between the two series as a correction factor for conventional dissimilarity measures. The complexity-invariant dissimilarity measure (CID) is defined as

$d_{CID}(X,Y)=CF(X,Y)\cdot d(X,Y),$

where d(X, Y) is an existing raw-data distance and CF(X, Y) is a complexity correction factor with

$CF(X,Y)=\frac{\max\left\{CE(X),CE(Y)\right\}}{\min\left\{CE(X),CE(Y)\right\}},\qquad CE(X)=\sqrt{\sum_{t=1}^{T-1}\left(X_t-X_{t+1}\right)^{2}}.$

Here CE(·) is a complexity estimator of a series. If the two series have the same complexity, then dCID(X, Y) = d(X, Y).
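CID is only a few lines on top of a base distance (a Python sketch with Euclidean d; the function names are ours):

```python
import numpy as np

def ce(x):
    """Complexity estimate CE(X): length of the line through successive differences."""
    return np.sqrt(np.sum(np.diff(np.asarray(x, dtype=float)) ** 2))

def d_cid(x, y):
    """Complexity-invariant distance: Euclidean distance scaled by CF(X, Y) >= 1."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cx, cy = ce(x), ce(y)
    return (max(cx, cy) / min(cx, cy)) * np.linalg.norm(x - y)
```

When the two series are equally complex the correction factor is 1 and the plain Euclidean distance is recovered.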

3. Clustering methods

### 3.1. Hierarchical clustering

Hierarchical clustering works by sequentially and hierarchically merging similar objects or groups using a tree model. It uses a dendrogram, a tree-like structure that shows the order in which objects are joined, so the number of clusters does not have to be specified in advance. After creating the dendrogram, the tree can be cut at an appropriate level to divide the data into several clusters. A distance or similarity between objects is required to perform hierarchical clustering: the closest objects are merged in sequence, and the distance between the newly formed cluster and each remaining object (or cluster) is updated. Several linkage methods can be used; in this paper we use the complete linkage method, which takes the largest distance over all pairs of objects in two clusters as the distance between those clusters.

### 3.2. K-means clustering

K-means clustering forms clusters by gathering individuals close to the center of each cluster. Unlike hierarchical clustering, it requires the number of clusters to be specified in advance, since the cluster centers must first be set. Let X = C1 ∪ C2 ∪ · · · ∪ CK with $C_i\cap C_j=\emptyset$ for i ≠ j. The clusters are determined by

$\underset{C}{\arg\min}\sum_{i=1}^{K}\sum_{x_j\in C_i}\left\|x_j-c_i\right\|^{2},$

where ci is the center of cluster Ci. The operation of K-means clustering is similar to the EM algorithm. Initially, each object is assigned to a cluster based on randomly chosen cluster centers. The center of each cluster is then recalculated, and the objects are reassigned according to the new centers. The procedure stops when the centers converge or a predetermined number of iterations is reached.
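The assign-update loop just described can be sketched in a few lines (a plain numpy illustration with Euclidean distance between whole series; initialization, seed, and empty-cluster handling are our choices):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means on the rows of X (one time series per row), Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random data points as centers
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: each series goes to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: recompute centers; keep the old center if a cluster empties
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```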

### 3.3. Clustering validity

To evaluate clustering validity we consider four measures. The error rate (ER) and the similarity (Sim) index compare the actual clusters with the estimated clusters. The Dunn index and the silhouette are computed to assess within-cluster connectedness.

• (i) Error rate (ER)

Let H be a clustering map defined as

$H(f,g)=\begin{cases}1, & \text{if } f \text{ and } g \text{ are in the same cluster},\\ 0, & \text{otherwise}.\end{cases}$

Regarding the estimation error, the clustering estimation error rate η(K) is defined by Serban and Wasserman (2005) as

$\eta(K)=\frac{1}{\binom{N}{2}}\sum_{r<s}\left|H(f_r,f_s)-H(\hat{f}_r,\hat{f}_s)\right|,$

where C = {f1, . . . , fN} denotes the true curves and Ĉ = {f̂1, . . . , f̂N} denotes the estimated curves.
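This pairwise-disagreement error rate has a direct implementation (a Python sketch; the label-vector representation of the two clusterings is ours):

```python
from itertools import combinations

def error_rate(true_labels, est_labels):
    """Fraction of object pairs whose co-membership disagrees between two clusterings."""
    pairs = list(combinations(range(len(true_labels)), 2))
    disagree = sum((true_labels[r] == true_labels[s]) != (est_labels[r] == est_labels[s])
                   for r, s in pairs)
    return disagree / len(pairs)
```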

• (ii) Similarity index (Sim index)

$\mathrm{Sim}(G,C)=\frac{1}{K}\sum_{i=1}^{K}\max_{1\le j\le K}\mathrm{Similarity}(G_i,C_j),\quad \text{where } \mathrm{Similarity}(G_i,C_j)=\frac{2\,[G_i\cap C_j]}{[G_i]+[C_j]},$

where G = {G1, . . . , GK} is the true partition and C = {C1, . . . , CK} is the partition obtained by the clustering algorithm. [·] denotes the cardinality of the set.

• (iii) Dunn index

Dunn (1974) proposed an internal measure of clustering as

$D(C)=\frac{\min_{C_k,C_l\in C,\,C_k\ne C_l}\left(\min_{i\in C_k,\,j\in C_l}\mathrm{dist}(i,j)\right)}{\max_{C_m\in C}\mathrm{diam}(C_m)},$

where diam(Cm) is the maximum distance between observations in cluster Cm and dist(i, j) is the distance between data points xi and xj. A larger value indicates compact, well-separated clusters.

• (iv) Silhouette

Rousseeuw (1987) proposed the silhouette width, the average of the silhouette values of all observations, where the silhouette value of observation i is

$s(i)=\frac{b(i)-a(i)}{\max\left\{a(i),b(i)\right\}},$

where

$a(i)=\frac{1}{[C_i]-1}\sum_{j\in C_i,\,j\ne i}d(i,j)\quad\text{and}\quad b(i)=\min_{k:\,i\notin C_k}\frac{1}{[C_k]}\sum_{j\in C_k}d(i,j).$

The silhouette value ranges from −1 to 1 and measures, as an internal criterion, how tightly grouped the observations are; a larger value indicates tighter clusters.
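The silhouette width can be computed from any precomputed pairwise distance matrix, which is convenient with the time series distances of Section 2 (a Python sketch; the function name and the singleton-cluster convention a(i) = 0 are ours):

```python
import numpy as np

def silhouette_width(D, labels):
    """Average silhouette over all observations, given a pairwise distance matrix D."""
    labels = np.asarray(labels)
    s = np.empty(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself from a(i)
        a = D[i, same].mean() if same.any() else 0.0
        b = min(D[i, labels == k].mean()     # nearest "neighboring" cluster
                for k in set(labels.tolist()) if k != labels[i])
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s.mean()
```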

4. Data analysis

### 4.1. Simulation

We design a set of 100 time series in five clusters; each cluster has 20 time series, and each time series has 96 time points. The errors are generated independently from N(0, 1). The description of each cluster is:

• Cluster 1. Step up and down

$X_t\sim\begin{cases}N(1,1), & t=1,\ldots,20,\\ N(8,2), & t=21,\ldots,70,\\ N(1,1), & t=71,\ldots,96.\end{cases}$

• Cluster 2. Combined AR model

$X_t=\begin{cases}1.5+0.8X_{t-1}+\varepsilon_t, & t=1,\ldots,48,\\ -1+0.6X_{t-1}+\varepsilon_t, & t=49,\ldots,96,\end{cases}\quad \text{where } \varepsilon_t\sim N(0,1).$

• Cluster 3. ARMA model

$X_t=4+0.5X_{t-1}+\varepsilon_t+\varepsilon_{t-1},\quad t=1,\ldots,96,\ \text{where } \varepsilon_t\sim N(0,1).$

• Cluster 4. Periodic model

$X_t=3\sin\left(\frac{20(t-1)}{95}\right)+5+\varepsilon_t,\quad t=1,\ldots,96,\ \text{where } \varepsilon_t\sim N(0,1).$

• Cluster 5. Complex periodic model

$X_t=2\left[\sin\left(\frac{20(t-1)}{95}\right)+2\left|\sin\left(\frac{20(t-1)}{4\times 95}\right)\right|\right]+4+\varepsilon_t,\quad t=1,\ldots,96,\ \text{where } \varepsilon_t\sim N(0,1).$
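One repetition of this design can be generated as follows (a Python sketch under stated assumptions: the second argument of N(·,·) is read as a variance, the random seed is arbitrary, and the start-up values for the recursive models are our simplifications):

```python
import numpy as np

rng = np.random.default_rng(2020)   # arbitrary seed
T, n_per = 96, 20
t = np.arange(1, T + 1)

def cluster1():
    # step up and down; N(8, 2) read with 2 as the variance (an assumption)
    return np.concatenate([rng.normal(1, 1, 20),
                           rng.normal(8, np.sqrt(2), 50),
                           rng.normal(1, 1, 26)])

def cluster2():
    # combined AR(1); pre-sample value X_0 = 0 is a start-up assumption
    x, prev = np.empty(T), 0.0
    for i in range(T):
        c, a = (1.5, 0.8) if i < 48 else (-1.0, 0.6)
        prev = c + a * prev + rng.normal()
        x[i] = prev
    return x

def cluster3():
    # ARMA(1,1); started near the stationary mean 4 / (1 - 0.5) = 8
    x, prev_x, prev_e = np.empty(T), 8.0, 0.0
    for i in range(T):
        e = rng.normal()
        prev_x = 4 + 0.5 * prev_x + e + prev_e
        x[i], prev_e = prev_x, e
    return x

def cluster4():
    # periodic model
    return 3 * np.sin(20 * (t - 1) / 95) + 5 + rng.normal(size=T)

def cluster5():
    # complex periodic model
    u = 20 * (t - 1) / 95
    return 2 * (np.sin(u) + 2 * np.abs(np.sin(u / 4))) + 4 + rng.normal(size=T)

data = np.vstack([f() for f in (cluster1, cluster2, cluster3, cluster4, cluster5)
                  for _ in range(n_per)])        # 100 series, 96 time points each
true_labels = np.repeat(np.arange(1, 6), n_per)
```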

Figure 1 shows one data set from the simulation (a) and the mean lines of each cluster (b). The simulation results are obtained from 100 repetitions. We used the 10 distance measures with the hierarchical complete linkage and K-means clustering algorithms, respectively; Table 1 and Table 2 show the results. The hierarchical and K-means clustering methods with each distance measure give broadly similar cluster validity values. According to the ER, the best performance is achieved with dCID in both hierarchical and K-means clustering. Clustering with dPer has the highest silhouette values under both algorithms. Based on the Sim index, dCorT is the best in hierarchical clustering and dCID is the best in K-means clustering. The methods are substantially different; however, they all lead to acceptable results in the sense of the cluster validity measures.

In practice, attention should be given to the choice of the distance and the clustering algorithm, and the properties of each should be considered to form proper clusters. A range of feature-, model-, and complexity-based dissimilarities is covered in this paper. For instance, the K-means algorithm moves each series to the cluster whose centroid is closest, recalculates the cluster centroids, and repeats the procedure until no more series are reassigned. Once the clustering objectives are made clear and the characteristics of the time series are considered, the range of appropriate methods becomes limited and careful implementation is required.

### 4.2. Electricity consumption data analysis

For a clustering application, we use a power consumption data set from a total of 90 devices and buildings in 16 companies located in South Korea on June 19, 2018 (Table 3). Power consumption was measured every 15 minutes, giving a time series with 96 time points per day. The power consumption patterns vary widely but can be divided into three types: continuous consumption, on-off repetition, and turning on and off once a day. For example, machine A may repeat on-off every hour from 9 am, while machine B may repeat on-off every 2 to 3 hours even if it also starts at 9 am. Various continuous patterns are also possible according to working periods and workloads.

Comparing the silhouette values for three to ten clusters based on dCID, four clusters gave the highest silhouette value; we therefore chose four clusters using dCID, which was the best measure in the simulation. The distances used for the cluster analysis are dAR.Mah, dCID, dCor, dCorT, and dPer. Missing values in the data are estimated using Stineman (1980) interpolation, which creates a curve with no more inflection points than clearly required by the given set of points; we applied it via the na.interpolation function of the imputeTS R package (Moritz and Bartz-Beielstein, 2017).

Table 4 provides the clustering validity measures for this real power consumption data. Figure 2 shows the clusters and their average patterns using dPer with hierarchical complete linkage and K-means clustering; the patterns in each cluster are shown in the far-left panel with the average lines. Figure 3 shows the cluster members obtained by dPer in K-means. Cluster 1 shows a continuous pattern, Cluster 2 contains many devices with an on-off pattern, Cluster 3 contains some main devices, and the chiller devices are in Cluster 4. Interpretation at the device level is limited since exact device information is not available.

5. Discussion

The development of smart grids has enabled the easy collection of a vast amount of power data. To analyze such data efficiently, it is very useful to quickly identify and cluster power consumption patterns; understanding these patterns is an advantage in analyses such as prediction. We compared 10 distance measures using hierarchical clustering and K-means clustering. The simulation shows that no single clustering method is best, and the time series structure should be reflected in the clustering analysis. More measures for evaluating time series clustering, beyond the Dunn index and silhouette width, are also needed. This work could be extended to meet the requirements of real-time data processing applications, such as clustering the power consumption of appliances and controlling usage patterns, and possibly detecting anomalous appliances that indicate faulty or compromised devices. We hope that this research will serve others interested in advancing time series clustering research. For power consumption big data, clustering remains a challenging problem that must consider local and global profiles, computational burden, and effective complexity modelling.

Acknowledgement
This research was supported by Korea Electric Power Corporation (Grant number: R18XA01). It was also supported by the Korea Research Foundation (No. 2018R1A2B26001664).
Figures
Fig. 1. One simulation data set of 100 cases (a) and average lines of each cluster (b).
Fig. 2. A real data clustering result by using dPer. (a) Hierarchical, (b) K-means.
Fig. 3. Cluster members obtained by dPer in K-means.
TABLES

### Table 1

Hierarchical clustering performance with simulation data

| Distance | ER | Sim index | Dunn index | Silhouette |
|---|---|---|---|---|
| ACF | 0.2991 | 0.5388 | 0.4173 | 0.1006 |
| Cor | 0.1995 | 0.6547 | 0.5747 | 0.2477 |
| CorT | 0.1189 | 0.8002 | 0.3547 | 0.3022 |
| Per | 0.2108 | 0.6954 | 0.3188 | 0.4807 |
| Frechet | 0.1834 | 0.7124 | 0.2723 | 0.2212 |
| AR.Pic | 0.2852 | 0.5771 | 0.1312 | 0.2802 |
| AR.Mah | 0.1797 | 0.7213 | 0.0464 | 0.4702 |
| AR.LPC.CEP | 0.1764 | 0.7117 | 0.1880 | 0.2902 |
| CDM | 0.2545 | 0.5547 | 0.9776 | 0.0009 |
| CID | 0.1055 | 0.8200 | 0.4502 | 0.3623 |

### Table 2

K-means clustering performance with simulation data

| Distance | ER | Sim index | Dunn index | Silhouette |
|---|---|---|---|---|
| ACF | 0.2991 | 0.5388 | 0.4173 | 0.1006 |
| Cor | 0.0886 | 0.8472 | 0.4630 | 0.2282 |
| CorT | 0.0865 | 0.8556 | 0.2889 | 0.2949 |
| Per | 0.1087 | 0.8138 | 0.1562 | 0.4991 |
| Frechet | 0.1059 | 0.8261 | 0.3016 | 0.2649 |
| AR.Pic | 0.1840 | 0.6780 | 0.0696 | 0.2612 |
| AR.Mah | 0.1730 | 0.7314 | 0.0114 | 0.4436 |
| AR.LPC.CEP | 0.1423 | 0.7475 | 0.1119 | 0.2784 |
| CDM | 0.1530 | 0.7089 | 0.9762 | 0.0014 |
| CID | 0.0778 | 0.8708 | 0.3361 | 0.3278 |

### Table 3

List of 90 devices and buildings in 16 companies located in South Korea

| Company | Type |
|---|---|
| D1 | LV2; LV5; Chiller No.4; Chiller No.6; Chiller No.7; Chiller No.8; Water treatment |
| D2 | L-1M public; L-2 main; L-14 main; LP-CAR parking tower |
| M1 | Main device; Main office; Grooving machine |
| G1 | Public; Main; Shopping area |
| N1 | 250KVA_Main; 300KVA_Main; Lathe1; Lathe2; New material; Grinder; Urethane1; Urethane2 |
| D3 | Rooftop |
| S1 | L-A; L-B; LP-A1; LP-M; PA-2; PA-3; PB-1 |
| S2 | E_V; Water supply |
| S3 | 1F; 2,3,4F; 5F; Public; Main |
| B1 | F-B3; L-F; L-O; P-CAR; P-E-1; P-EHP |
| K1 | Main circuit breaker; Circuit breaker1; Circuit breaker4 |
| T1 | A-04; B-01; CO2 welding machine; Main press; Welding Line Distribution Board |
| P1 | Unit1_1; Unit1_2; Unit1_3; Unit1_4; Unit1_5; Unit1_coiler; Unit2_1; Unit2_2; Unit2_3; Unit2_4; Unit2_coiler; Unit3_1; Unit3_2; Unit3_3; Unit3_incidental equipment; Unit3_coiler; Unit4_1; Unit4_2; Unit4_3; Unit4_4; Unit4_incidental equipment; Unit4_coiler |
| H1 | L-1A; L-M; Public |
| H1 | LC-1 office main; Main MCCB; Sub1 light; Main1 main |

### Table 4

Hierarchical & K-means clustering with electricity consumption data with 4 clusters

| Method | Measure | Cor | CorT | Per | AR.Mah | CID |
|---|---|---|---|---|---|---|
| Hierarchical | Dunn index | 0.6077 | 0.2632 | 0.5383 | 0.0059 | 0.0235 |
| | Silhouette | 0.1972 | 0.6823 | 0.7799 | 0.3408 | 0.6142 |
| K-means | Dunn index | 0.4076 | 0.0066 | 0.1415 | 0.0004 | 0.0008 |
| | Silhouette | 0.1002 | 0.4646 | 0.6568 | 0.2940 | 0.2793 |

References
1. Al-Jarrah OY, Al-Hammadi Y, and Muhaidat S (2017). Multi-layered clustering for power consumption profiling in smart grids. IEEE Access. doi:10.1109/ACCESS.2017.2712258
2. Batista GE, Wang X, and Keogh EJ (2011). A complexity-invariant distance measure for time series. Proceedings of the 2011 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics. , 699-710.
3. Bohte Z, Čepar D, and Košmelj K (1980). Clustering of time series, Barritt MM and Wishart D (Eds). Compstat 1980: Proceeding in Computational Statistics, (pp. 587-593), Heidelberg, Physica-Verlag.
4. Caiado J, Crato N, and Peña D (2006). A periodogram-based metric for time series classification. Computational Statistics & Data Analysis, 50, 2668-2684.
5. Chouakria AD and Nagabhushan PN (2007). Adaptive dissimilarity index for measuring time series proximity. Advances in Data Analysis and Classification, 1, 5-21.
6. Dunn J (1974). Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4, 95-104.
7. D’Urso P and Maharaj EA (2009). Autocorrelation-based fuzzy clustering of time series. Fuzzy Sets and Systems, 160, 3565-3589.
8. Eiter T and Mannila H (1994). Computing discrete frechet distance (Technical Report CD-TR 94/64), Vienna, Austria, Information Systems Department, Technical University of Vienna.
9. Fréchet MM (1906). Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884–1940), 22, 1-72.
10. Galeano P and Peña D (2000). Multivariate analysis in vector time series. Departamento de Estadística y Econometría, Universidad Carlos III de Madrid.
11. Golay X, Kollias S, Stoll G, Meier D, Valavanis A, and Boesiger P (1998). A new correlation-based fuzzy logic clustering algorithm for fMRI. Magnetic Resonance in Medicine, 40, 249-260.
12. Haben S, Singleton C, and Grindrod P (2015). Analysis and clustering of residential customers energy behavioral demand using smart meter data. IEEE Transactions on Smart Grid, 7, 136-144.
13. Kalpakis K, Gada D, and Puttagunta V (2001). Distance measures for effective clustering of ARIMA time-series. Proceedings 2001 IEEE International Conference on Data Mining. , 273-280.
14. Keogh E, Lonardi S, and Ratanamahatana CA (2004). Towards parameter-free data mining. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. , 206-215.
15. Keogh E, Lonardi S, Ratanamahatana CA, Wei L, Lee SH, and Handley J (2007). Compression-based data mining of sequential data. Data Mining and Knowledge Discovery, 14, 99-129.
16. Li M, Chen X, Li X, Ma B, and Vitányi PM (2004). The similarity metric. IEEE Transactions on Information Theory, 50, 3250-3264.
17. Liao TW (2005). Clustering of time series data—a survey. Pattern Recognition, 38, 1857-1874.
18. Maharaj EA (1996). A significance test for classifying ARMA models. Journal of Statistical Computation and Simulation, 54, 305-331.
19. Maharaj EA (2000). Clusters of time series. Journal of Classification, 17, 297-314.
20. Montero P and Vilar JA (2014). TSclust: An R package for time series clustering. Journal of Statistical Software, 62, 1-43.
21. Moritz S and Bartz-Beielstein T (2017). imputeTS: time series missing value imputation in R. The R Journal, 9, 207-218.
22. Piccolo D (1990). A distance measure for classifying ARIMA models. Journal of Time Series Analysis, 11, 153-164.
23. Rousseeuw PJ (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.
24. Serban N and Wasserman L (2005). CATS: clustering after transformation and smoothing. Journal of the American Statistical Association, 471, 990-999.
25. Stineman RW (1980). A consistently well-behaved method for interpolation. Creative Computing, 6, 54-57.
26. Tsekouras GJ, Hatziargyriou ND, and Dialynas EN (2007). Two-stage pattern recognition of load curves for classification of electricity customers. IEEE Transactions on Power Systems, 22, 1120-1128.
27. Wang X, Smith K, and Hyndman R (2006). Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery, 13, 335-364.
28. Xiong Y and Yeung DY (2004). Time series clustering with ARMA mixtures. Pattern Recognition, 37, 1675-1689.