search for

CrossRef (0)
Monitoring social networks based on transformation into categorical data
Communications for Statistical Applications and Methods 2022;29:487-498
Published online July 31, 2022
© 2022 Korean Statistical Society.

Joo Weon Leea, Jaeheon Lee1,a

aDepartment of Applied Statistics, Chung-Ang University, Korea
Correspondence to: 1Department of Applied Statistics, Chung-Ang University, 84, Heukseok-ro, Dongjak-gu, Seoul 06974, Korea. E-mail: jaeheon@cau.ac.kr

This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government
(MSIT) (No. 2020R1F1A1A01050674).
Received March 17, 2022; Revised April 7, 2022; Accepted April 7, 2022.
Social network analysis (SNA) techniques have recently been developed to monitor and detect abnormal behaviors in social networks. As a useful tool for process monitoring, control charts are also useful for network monitoring. In this paper, the degree and closeness centrality measures, in which each has global and local perspectives, respectively, are applied to an exponentially weighted moving average (EWMA) chart and a multinomial cumulative sum (CUSUM) chart for monitoring undirected weighted networks. In general, EWMA charts monitor only one variable in a single chart, whereas multinomial CUSUM charts can monitor a categorical variable, in which several variables are transformed through classification rules, in a single chart. To monitor both degree centrality and closeness centrality simultaneously, we categorize them based on the average of each measure and then apply to the multinomial CUSUM chart. In this case, the global and local attributes of the network can be monitored simultaneously with a single chart. We also evaluate the performance of the proposed procedure through a simulation study.
Keywords : average run length, closeness centrality, degree centrality, exponentially weighted moving average (EWMA) chart, multinomial cumulative sum (CUSUM) chart, social network monitoring
1. Introduction

Social network analysis (SNA) provides information about social and individual relationships by graphically and mathematically monitoring global structures and local entities. Network surveillance applies statistical process monitoring (SPM) to a network and aims to quickly detect network anomalies. The network is monitored using control charts to detect anomalous behavior in each period of the network data. Therefore, network surveillance can help prevent conflicts and crises in social situations by identifying global and local perspectives. Many studies have addressed network-surveillance schemes. For example, Hosseini and Noorossana (2018) compared the performance of detecting outbreaks by applying the average and standard deviation of degree measures of a network to exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) charts. Wilson et al. (2019) applied a degree corrected stochastic block model (DCSBM) to Shewhart charts to detect local and global changes. Yu et al. (2018) proposed a multivariate procedure to monitor an individual’s propensity by using the Hotelling T2 chart. Moreover, Yu et al. (2021) compared several monitoring methods using the global network summary statistics, the scan-based method proposed by Preibe et al. (2005), and the model-based methods proposed by Yu et al. (2018) and Wilson et al. (2019).

Networks comprise nodes (vertices) and edges (links), where nodes represent actors or entities and edges represent communication between nodes. Networks can be divided into unweighted or weighted networks according to the types of edges. Edges in unweighted networks take a value of 0 or 1 depending on whether two nodes are communicating or not. Edges in weighted networks take non-negative values such as the number of communications between two nodes. Additionally, networks can be divided into two main categories: directed and undirected networks depending on whether the network is directional or not. An undirected network does not consider the direction of communication between two nodes. For example, given nodes i and j for ij, communication between nodes i and j is the same as communication between nodes j and i. Otherwise, it is a directed network, or digraph. A network graph can be represented by an adjacency matrix that illustrates the edges between nodes. Network data can be analyzed more easily by mathematically representing the networks using adjacency matrices.

Centrality measures indicate which node is most important in the network. Freeman (1977, 1978) proposed degree, closeness, and betweenness centrality measures in unweighted networks. Degree centrality is the number of edges directly connected to a node and is representative of the node’s popularity and communication activity. Scott (1991) argued that degree centrality can be regarded as local centrality. However, closeness and betweenness centralities are considered measures of global centrality in terms of the distance among various nodes. Closeness and betweenness centralities are associated with the number of nodes that lie within the shortest distance from all other nodes in the network, helping clarify the global network structure. Brandes (2001), Newman (2001), and Barrat et al. (2004) generalized these measures and applied them to weighted networks. Brandes (2001) and Newman (2001) considered the shortest path algorithm proposed by Dijkstra (1959) under the assumption that the most critical actor or entity is on the shortest path and can easily access all nodes. Opsahl et al. (2010) proposed a generalized form that combines the number of edges and their weights in order to consider both measures in a weighted network. Abassi et al. (2011) discussed the relationship between scholar’s performance and SNA measures using normalized centrality measures. Abassi et al. (2013) also proposed hybrid centrality measures that combine existing centrality measures to improve the importance of nodes, and applied them to a co-author network to evaluate the proposed measures.

Because centrality measures in weighted networks are continuous variables, previous studies on SNA have generally used continuous statistics to evaluate performance through control charts. However, if several centrality measures are transformed into one categorical variable through classification, only a single chart can be used to monitor social networks more effectively. By using this method, the problem for monitoring social networks using centrality measures turns into that of monitoring categorical processes. Recently, Perry (2020) classified the observed networks over time into two categories based on the average values of transitivity and reciprocity measures, and applied them to an EWMA chart to monitor the hierarchical tendency. Furthermore, he compared the out-of-control average run length (ARL) performance of the proposed EWMA chart with that of the multinomial CUSUM chart proposed by Ryan et al. (2011).

A brief explanation of the multinomial CUSUM chart proposed by Ryan et al. (2011) is as follows: They proposed the multinomial CUSUM chart for monitoring in situations where items were classified into more than two categories, and they compared with multiple Bernoulli CUSUM charts. The multinomial CUSUM chart is useful when monitoring more than two categories with specified direction of out-of-control shifts, and unlike other charts which monitor categorical data, this chart does not require aggregation of items into samples.

In this paper, we propose a procedure for creating a categorical variable having four categories by classifying degree and closeness centrality measures according to their average values in weighted networks, and then monitoring this variable using the multinomial CUSUM chart. By applying the four categories to the multinomial CUSUM chart, we can simultaneously monitor both local and global views of the network. We also compare the performance of the proposed procedure with that of the EWMA charts using degree centrality and closeness centrality individually.

The next sections are as follows: In Section 2, we define degree and closeness centrality measures, and in Section 3, we describe the control charts used in the simulations. In Section 4, we propose a procedure for monitoring one categorical variable transformed from two centrality measures, and in Section 5, we describe the simulation settings and discuss the simulation results. Finally, we provide our conclusions in Section 6.

2. Centrality measures

2.1. Degree centrality

Degree centrality indicates the node’s communication activity and popularity. It has the advantage of being uncomplicated and easy to calculate. Degree is a local centrality measure because it is calculated based on the number of neighbors of a focal node. Freeman (1977, 1978) defined degree centrality as the number of directly connected nodes in a binary network. The degree centrality for node i in unweighted networks is defined as


where xi j is a binary value of 1 when nodes i and j interact with each other and 0 otherwise, and N is the total number of nodes.

However, since this definition does not take into account the weights representing the frequency of communications or the closeness between nodes, the use of this measure in weighted networks can lead to significant information loss. Barrat et al. (2004) proposed the degree centrality for node i in weighted networks as in Equation (2.1) considering weights.


where wi j denotes the weight between nodes i and j.

2.2. Closeness centrality

Closeness centrality is the inverse of the shortest path between nodes, indicating that an efficient node is close to all the nodes. This centrality measure is related to global centrality. For example, a node may have a high degree centrality but a low value of closeness centrality in the network because the node is popular locally but not globally. There has been great interest in defining the shortest path between nodes. It assumes that the higher the number of intermediate nodes, the higher the communication cost. For example, a node with many intermediary nodes takes more time to acquire information, which distorts and delays the communication between nodes.

Freeman (1977, 1978) proposed the closeness centrality for node i in unweighted networks defined as


where d(i, j) is the shortest distance between nodes i and j and is defined as


for intermediary nodes g on paths between node i and j.

However, this measure has the disadvantage of losing a lot of the information contained in weighted networks. Newman (2001) considered the relationship between the weight and distance by using Dijkstra (1959)’s algorithm. The closeness centrality for node i in weighted networks proposed by Newman (2001) is given by


where dw(i, j) is the shortest distance used Dijkstra’s algorithm and is defined as

dw(i,j)=min (1wig++1wgj)

for intermediary nodes g on paths between node i and j.

3. Control charts

3.1. EWMA chart

The EWMA chart proposed by Roberts (1959) is useful for monitoring and detecting small shifts in the process parameter. Despite having similar performance to CUSUM charts, EWMA charts are widely used due to their ease and simplicity of application and implementation.

We define Xt, t = 1, 2, . . . , as the independent observations at time t. The EWMA chart statistic, the upper control limit (UCL), and the lower control limit (LCL) are defined as follows:


where μ0 is the in-control mean, σ0 is the in-control standard deviation of Xt, γ (0 < γ ≤ 1) is the smoothing coefficient, and L is the factor that determines the width of the control limits. The initial value of chart statistic, Z0, is usually set to be equal to μ0. For smoothing coefficients, it is well known that smaller values of γ are better at detecting small changes, and larger values are better at detecting large changes. If the value of γ is chosen, L can be determined by the practitioner according to the specified value of in-control ARL, ARL0.

As t increases, the above control limits converge to the asymptotic limits given by


In this paper, we use these asymtotic control limits to evaluate the performance of EWMA charts.

3.2. Multinomial CUSUM chart

Traditionally, SPM uses Shewhart p-charts and binomial CUSUM charts to monitor the proportion of non-conforming items in the process. To use these charts, we must wait until the required number of items (n > 1) are aggregated into samples to monitor the proportion of nonconforming items. As a result, these charts have the disadvantage of having an inherent delay in detecting shifts in the proportion, which causes slower detection of process changes. To overcome this disadvantage, Reynolds and Stoumbos (1999) proposed using a Bernoulli CUSUM chart when a continuous stream of inspected items is available. The Bernoulli CUSUM chart is a special case of the binomial CUSUM chart when n = 1.

Although these charts mentioned above can only be used in situations where items are classified into two categories, there are many situations where items are classified into more than two categories. For example, instead of classifying items in a process as good or bad, they can be categorized as good, fair, or bad. Similarly, the diagnosis of posttraumatic patients can be divided into four categories: survival, minor complications, major complications, and death. Ryan et al. (2011) proposed a multinomial CUSUM chart to monitor several categories by extending the Bernoulli CUSUM chart.

Let X1, X2, . . . be independent multinomial random variables where Xt = u if the tth item classified into the uth category, t = 1, 2, . . . , u = 1, 2, . . . , v. Let p0,u and p1,u be the in-control and out-of-control probabilities of being classified into the uth category, respectively, where p0,up1,u, Σu=1vp0,u=1 and Σu=1vp1,u=1.

The multinomial CUSUM statistics are defined as

St=max(0,St-1+Rt),         t=1,2,,

where S0 = 0 and Rt is the log-likelihood ratio score. The values of Rt are

Rt=ln(p1,up0,u),         t=1,2,,u=1,2,,v,

when Xt = u. The multinomial CUSUM chart signals when St > h, where the control limit, h, can be determined satifying the specified value of ARL0.

Note that the multinomial CUSUM procedure can be generalized to include samples of size n > 1 (Ryan et al., 2011).

4. Proposed monitoring procedure

Let wi j,t be the number of communications between nodes i and j at time t. In this paper, we assume that the network is undirected and does not consider any node to communicate with itself, and also assume that wi j,t follows a Poisson distribution as

wij,t~Poisson(λ)         for ij,

where λ represents the in-control mean of weights.

We can calculate degree centralities Dw,t(i) and closeness centralities Cw,t(i) for i = 1, 2, . . . , N at time t defined in (2.1) and (2.2), respectively, and then can obtain the average degree centrality Deg¯t and the average closeness centrality Close¯t as


These average measures can be classified into the following four categories to be applied to the multinomial CUSUM chart.

  • (i) Category 1 : Deg¯tμd and Close¯tμc,

  • (ii) Category 2 : Deg¯t<μd and Close¯tμc,

  • (iii) Category 3 : Deg¯t<μd and Close¯t<μc,

  • (iv) Category 4 :Deg¯tμd and Close¯t<μc,

where μd is the in-control mean of degree and μc is the in-control mean of closeness which can be estimated from Phase I samples. In this paper, the values of μd, μc, and p0,u (u = 1, 2, 3, 4) in (3.5) are estimated from 100,000 simulated networks.

The procedure proposed in this paper is summarized as follows: First, Dw,t(i) and Cw,t(i) are calculated for N nodes at time t, and Deg¯t and Close¯t are calculated. After that, by determining which category these values belong to according to the above classification rule, the Rt in (3.5) and the multinomial CUSUM statistic, St, in (3.4) are calculated. Finally, by comparing the value of this statistic with the control limit h, it is determined whether or not to give a signal.

5. ARL performance

5.1. Simulation settings

Through simulation, the performance of the proposed multinomial CUSUM procedure is evaluated by comparing it with the EWMA procedure using degree centrality and clossness centrality separately. In the EWMA procedure, Deg¯t in (4.1) and Close¯t in (4.2) are used as Xt in (3.1), respectively. The multinomial CUSUM chart and the EWMA charts are compared based on their ARL values.

The settings of the simulation are determined by considering (i) the network to have N = 30 nodes, (ii) the in-control mean of weights as λ = 1 and 2, and (iii) in the presence of an anomalous event within the social network, the mean of weights between K (≤ 30) randomly chosen abnormal nodes to change from λ to (1+δ)λ. In our simulations, K, which refers to the number of abnormal nodes, takes on values of 2, 3, 4, and 6, while δ, which denotes the change in parameter, takes on values of 0.5, 1, 2, 3, 4, 5, 6, 7, and 8. For example, the case where λ = 1, K = 2, and δ = 2 means that among 30 nodes, the weight of two nodes has changed to 3, and the weight of the remaining nodes is 1 without change.

Note that in order to apply the multinomial CUSUM chart, it is necessary to specify the out-of-control probabilities of each category, p1,u (u = 1, 2, 3, 4), in (3.5). Table 1 shows the in-control probabilities, p0,u, estimated from Phase I samples and the out-of-control probabilities, p1,u, desired to be detected. These out-of-control probabilities are set to probabilities of δ = 2 when λ = 1 and to probabilities of δ = 1 when λ = 2 for each abnormal nodes size K.

To determine the control limits for control charts, the in-control ARL value is set as 100. We design three different EWMA charts with γ = 0.05, 0.1, and 0.2, and obtain the values of L in (3.2) and (3.3) using the software provided by Knoth (2021). The control limits, h, for the multinomial CUSUM chart, are determined using 100,000 simulations. The values of L in (3.2) and (3.3) for the EWMA chart and the values of h for the multinomial CUSUM chart are displayed in Table 2. As shown in Table 1, in the multinomial CUSUM chart, since the values of p0,u and p1,u vary according to λ and K, the values of h satisfying ARL0 = 100 in each case are different. However, in the EWMA chart, the average centralities in (4.1) and (4.2) approximately follow a normal distribution, respectively, and L, which represents the width of the control limits, depends only on γ. Therefore, the values of L satisfying ARL0 = 100 are determined to be the same value for a given γ.

The out-of-control ARL values for the EWMA chart are obtained from 10,000 simulations, while the values for the multinomial CUSUM chart are obtained from 100,000 simulations.

5.2. Simulation results

We look at how the occurrence of abnormal nodes changes the probability of each category. Figure 1 illustrates a scatterplot of the in-control and out-of-control probabilities for degree and closeness centralities of λ = 1. In Figure 1(a) shows the scatterplot for the in-control state, and shows that the probabilities of category 1 and 3 components are similar. However, from Figure 1(b), it can be seen that when abnormal nodes of K = 4 and δ = 6 occur, the probability of category 1 increases significantly, but that of category 3 decreases significantly. It is found that when K = 2, 3, and 4, the probabilities of category 1 and 2 increase, while the probabilities of category 3 and 4 decrease. When K = 6, the probability of category 1 increases, but the probabilities of category 2, 3, and 4 decrease. Due to the change in the probabilities belonging to each category, the multinomial CUSUM chart can detect the occurrence of abnormal nodes.

The ARL performance for λ = 1 and λ = 2 are shown in Tables 3 and 4, respectively. In each row of the tables, the smallest ARL value is shown in boldface.

In Table 3 where λ = 1, the in-control mean of degree is μd = 29.00397, whereas the in-control mean of closeness is μc = 0.0471556. When K = 2 and δ is not large (δ ≤ 3), the multinomial CUSUM chart performs better than the EWMA charts. When δ is large, the EWMA chart using closeness centrality with γ = 0.05 performs better. However, even when δ is large, the performance of the EWMA chart using degree centrality is not good compared to the multinomial CUSUM chart. For example, when δ = 4, the ARL for the multinomial CUSUM chart is 38.4, while the ARL for the EWMA chart using degree centrality with γ = 0.05 is 52.3. The multinomial CUSUM chart performs better for small δ at K = 3 and 4, but the EWMA charts perform better as δ increases. When K = 6, the EWMA charts outperform the multinomial CUSUM chart for all the changes. When the abnormal nodes size is large (K ≥ 4), the overall performance of the EWMA charts using γ = 0.2 is better.

To determine why the multinomial CUSUM chart performs poorly for large K and δ, we examine the probabilities of four categories for each change, as shown in Table 5 where λ = 1. As mentioned in Ryan et al. (2011), the multinomial CUSUM chart has a weakness in detecting shifts in misspecified directions and magnitudes. As K and δ increase, the direction and magnitude of the probabilities change differently from those specified in Table 1. For example, it can be seen that when K = 3 and δ = 8, the probability of category 1 increases significantly compared to the magnitude specified in Table 1, and the probability of category 2 changes in the opposite direction to the direction specified in Table 1. In addition, when K = 6 and δ is large, the probability of category 1 is significantly larger than other probabilities. As such, when the probability of one category is representative of the entire network, the ARL performance of the multinomial CUSUM chart may deteriorate.

In Table 4 where λ = 2, the in-control mean of degree is μd = 57.99715, whereas the in-control mean of closeness is μc = 0.08057656. Similar to the results in Table 3, the multinomial CUSUM chart performs better for small values of K and δ, while the EWMA charts perform better for large values of K or δ. When K or δ are very large, the fact that the performance of the multimomial CUSUM chart deteriorates still occurs as shown in Table 3.

From the results in Tables 3 and 4, it is recommended to use the multinomial CUSUM chart when the direction and magnitude of change in category probabilities can be predicted to some extent, or when the magnitude of change is expected to be small. The advantage of using the multinomial CUSUM chart is that we can monitor several centrality measures simultaneously instead of monitoring one centrality measure with a single chart.

6. Conclusions

In recent years, there have been many studies focused on monitoring and detecting changes in SNA and providing information to help prevent conflicts and crises. Accordingly, the networks were observed over time and the observed networks were monitored by applying control charts. Most studies that applied control charts individually used continuous measures as statistics. However, when several continuous measures are classified into one categorical variable and applied to a control chart, there is an advantage that several measures can be monitored using a single control chart.

In this paper, we proposed a procedure for classifying degree and closeness measures, that represent local and global perspectives, respectively, based on their average values and detecting the occurrence of abnormal nodes using the probabilities change of the classified categories. The performance of the proposed procedure was compared with that of the EWMA charting procedure using degree and centrality measures individually through simulation.

From the simulation results, it was found that the multinomial CUSUM chart showed better performance for small changes but poor performance for large changes. When K and δ were large, the performance of the CUSUM chart deteriorated because the probability of each category changed differently from the direction and magnitude specified in advance. Therefore, the proposed procedure is recommended when the direction and magnitude of change in category probabilities can be predicted or when it is desired to detect small changes in the social network.

Finally, it would be interesting to study procedures that use measures other than degree and close ness centrality measures as criteria for categorization. In the future, we will study the classification procedures using two or more different measures and their performance for various types of change.

Fig. 1. Scatterplots of the in-control and out-of-control probabilities based on degree and closeness centralities when 貫 = 1. (a) the in-control probabilities (b) the out-of-control probabilities when = 4 and 灌 = 6.

Table 1

The in-control and out-of-control probabilities for categories




λ = 120.44450.09430.38220.0790




λ = 220.44970.07930.38790.0832

Table 2

Values of L for the EWMA chart and values of h for the multinomial CUSUM chart

EWMAγ = 0.05, L = 1.878617
γ = 0.10, L = 2.147571
γ = 0.20, L = 2.359552

Multinomial CUSUMλ = 1K = 2h = 0.69
K = 3h = 1.68
K = 4h = 2.41
K = 6h = 3.146

λ = 2K = 2h = 0.58
K = 3h = 1.30
K = 4h = 2.05
K = 6h = 3.15

Table 3

ARL values for EWMA charts, using degree and closeness centralities separately, and multinomial

KδEWMAMultinomial CUSUM

γ = 0.05γ = 0.1γ = 0.2





Table 4

ARL values for EWMA charts, using degree and closeness centralities separately, and multinomial

KδEWMAMultinomial CUSUM

γ = 0.05γ = 0.1γ = 0.2





Table 5

Probabilities of categories when λ = 1





  1. Abbasi A, Altmann J, and Hossain L (2011). Identifying the effects of co-authorship networks on the performance of scholars. Journal of Informetrics, 5, 594-607.
  2. Abbasi A and Hossain L (2013). Hybrid centrality measures for binary and weighted networks. Complex Networks, Berlin, Heidelberg, Springer.
  3. Barrat A, Barthelemy M, Pastor-Satorras R, and Vespignani A (2004). The architecture of complex weighted networks. Proceedings of the National Academy of Sciences, 101, 3747-3752.
  4. Brandes U (2001). A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology, 25, 163-177.
  5. Dijkstra EW (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 269-271.
  6. Freeman LC (1977). A set of measures of centrality based on betweenness. Sociometry, 40, 35-41.
  7. Freeman LC (1978). Centrality in social networks conceptual clarification. Social Networks, 1, 215-239.
  8. Hosseini SS and Noorossana R (2018). Performance evaluation ofEWMAand CUSUM control charts to detect anomalies in social networks using average and standard deviation of degree measures. Quality and Reliability Engineering International, 34, 477-500.
  9. Knoth S (2021). R software package 쁲pc: Statistical process control밅alculation of ARL and other control chart performance measures.
  10. Newman MEJ (2001). Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64, 016132.
  11. Opsahl T, Agneessens F, and Skvoretz J (2010). Node centrality in weighted networks: generalizing degree and shortest paths. Social Networks, 32, 245-251.
  12. Perry MB (2020). An EWMA control chart for categorical processes with applications to social network monitoring. Journal of Quality Technology, 52, 182-197.
  13. Priebe CE, Conroy JM, Marchette DJ, and Park Y (2005). Scan statistics on Enron graphs. Computational and Mathematical Organization Theory, 11, 229-247.
  14. Reynolds MR and Stoumbos ZG (1999). A CUSUM chart for monitoring a proportion when inspecting continuously. Journal of Quality Technology, 31, 87-108.
  15. Roberts SW (1959). Control chart tests based on geometric moving averages. Technometrics, 1, 239-250.
  16. Ryan AG, Wells LJ, and Woodall WH (2011). Methods for monitoring multiple proportions when inspecting continuously. Journal of Quality Technology, 43, 237-248.
  17. Scott J (1991). Social Network Analysis: A Handbook, Sage Publications, Inc.
  18. Wilson JD, Stevens NT, and Woodall WH (2019). Modeling and detecting change in temporal networks via a dynamic degree corrected stochastic block model. Quality and Reliability Engineering International, 35, 1363-1378.
  19. Yu L, Woodall WH, and Tsui KL (2018). Detecting node propensity changes in the dynamic degree corrected stochastic block model. Social Networks, 54, 209-227.
  20. Yu L, Zwetsloot IM, Stevens NT, Wilson JD, and Tsui KL (2021). Monitoring dynamic networks: A simulation-based strategy for comparing monitoring methods and a comparative study. Quality and Reliability Engineering International, 39, 1226-1250.