
Social network analysis (SNA) provides information about social and individual relationships by graphically and mathematically monitoring global structures and local entities. Network surveillance applies statistical process monitoring (SPM) to a network and aims to quickly detect network anomalies. The network is monitored using control charts to detect anomalous behavior in each period of the network data. Therefore, network surveillance can help prevent conflicts and crises in social situations by identifying global and local perspectives. Many studies have addressed network-surveillance schemes. For example, Hosseini and Noorossana (2018) compared the performance of detecting outbreaks by applying the average and standard deviation of degree measures of a network to exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) charts. Wilson
Networks comprise nodes (vertices) and edges (links), where nodes represent actors or entities and edges represent communication between nodes. Networks can be divided into unweighted or weighted networks according to the types of edges. Edges in unweighted networks take a value of 0 or 1 depending on whether two nodes are communicating or not. Edges in weighted networks take non-negative values such as the number of communications between two nodes. Additionally, networks can be divided into two main categories: directed and undirected networks depending on whether the network is directional or not. An undirected network does not consider the direction of communication between two nodes. For example, given nodes
Centrality measures indicate which node is most important in the network. Freeman (1977, 1978) proposed degree, closeness, and betweenness centrality measures in unweighted networks. Degree centrality is the number of edges directly connected to a node and is representative of the node’s popularity and communication activity. Scott (1991) argued that degree centrality can be regarded as local centrality. However, closeness and betweenness centralities are considered measures of global centrality in terms of the distance among various nodes. Closeness and betweenness centralities are associated with the number of nodes that lie within the shortest distance from all other nodes in the network, helping clarify the global network structure. Brandes (2001), Newman (2001), and Barrat
Because centrality measures in weighted networks are continuous variables, previous studies on SNA have generally applied continuous statistics to control charts and evaluated their performance. However, if several centrality measures are transformed into one categorical variable through classification, a single chart is sufficient to monitor the social network. With this approach, the problem of monitoring social networks using centrality measures becomes one of monitoring a categorical process. Recently, Perry (2020) classified networks observed over time into two categories based on the average values of transitivity and reciprocity measures, and applied them to an EWMA chart to monitor the hierarchical tendency. Furthermore, he compared the out-of-control average run length (ARL) performance of the proposed EWMA chart with that of the multinomial CUSUM chart proposed by Ryan
A brief explanation of the multinomial CUSUM chart proposed by Ryan
In this paper, we propose a procedure for creating a categorical variable having four categories by classifying degree and closeness centrality measures according to their average values in weighted networks, and then monitoring this variable using the multinomial CUSUM chart. By applying the four categories to the multinomial CUSUM chart, we can simultaneously monitor both local and global views of the network. We also compare the performance of the proposed procedure with that of the EWMA charts using degree centrality and closeness centrality individually.
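The classification step described above can be sketched as follows. This is a minimal illustration, assuming that each observed network is summarized by its average degree and average closeness, and that each average is compared with a reference value. The names `classify_network`, `deg_ref`, and `clo_ref` are hypothetical, and the numbering of the four high/low combinations is a convention chosen for this example rather than the labeling used in the paper.

```python
def classify_network(avg_degree, avg_closeness, deg_ref, clo_ref):
    """Map a pair of average centralities to one of four categories.

    The category numbering is a hypothetical convention for this sketch.
    """
    high_degree = avg_degree >= deg_ref
    high_closeness = avg_closeness >= clo_ref
    if high_degree and high_closeness:
        return 1
    if high_degree and not high_closeness:
        return 2
    if not high_degree and not high_closeness:
        return 3
    return 4  # low degree, high closeness


# Illustrative (average degree, average closeness) pairs, not simulation output.
observations = [(5.2, 0.31), (5.4, 0.28), (4.7, 0.27), (4.9, 0.33)]
categories = [
    classify_network(d, c, deg_ref=5.0, clo_ref=0.30) for d, c in observations
]
print(categories)
```

Each observed network contributes one category label, so the resulting sequence can be fed to a chart for categorical data.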
The remainder of this paper is organized as follows. Section 2 defines the degree and closeness centrality measures, and Section 3 describes the control charts used in the simulations. Section 4 proposes a procedure for monitoring one categorical variable transformed from two centrality measures, and Section 5 describes the simulation settings and discusses the simulation results. Finally, Section 6 presents our conclusions.
Degree centrality indicates the node’s communication activity and popularity. It has the advantage of being uncomplicated and easy to calculate. Degree is a local centrality measure because it is calculated based on the number of neighbors of a focal node. Freeman (1977, 1978) defined degree centrality as the number of directly connected nodes in a binary network. The degree centrality for node
where
However, since this definition does not take into account the weights representing the frequency of communications or the closeness between nodes, the use of this measure in weighted networks can lead to significant information loss. Barrat
where
Closeness centrality is the inverse of the total shortest-path distance from a node to all other nodes, so an efficient node is one that is close to every other node. It is a measure of global centrality. For example, a node may have high degree centrality but low closeness centrality because it is popular locally but not globally. How to define the shortest path between nodes has attracted considerable interest. The shortest-path definition assumes that the more intermediary nodes lie on a path, the higher the communication cost. For example, a node that must pass through many intermediary nodes takes longer to acquire information, which distorts and delays communication between nodes.
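The contrast between local and global centrality can be illustrated with a small example. The sketch below assumes a connected, undirected binary network stored as an adjacency dictionary, uses the number of neighbors as degree, and uses an unnormalized closeness equal to the inverse of the sum of hop distances found by breadth-first search. The graph and all names are illustrative assumptions.

```python
from collections import deque


def degree(graph, node):
    """Number of direct neighbors of `node` in a binary network."""
    return len(graph[node])


def closeness(graph, node):
    """Inverse of the summed hop distances from `node` to all other nodes.

    Assumes the graph is connected; distances come from breadth-first search.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return 1.0 / total if total > 0 else 0.0


# Two stars (centers "a" and "b", three leaves each) joined by a bridge "x".
edges = [("a", "a1"), ("a", "a2"), ("a", "a3"), ("a", "x"),
         ("b", "b1"), ("b", "b2"), ("b", "b3"), ("b", "x")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

for node in ("a", "x"):
    print(node, degree(graph, node), round(closeness(graph, node), 4))
```

Here the star center `a` has the larger degree (4 versus 2), while the bridge node `x` has the larger closeness (1/14 versus 1/15), because `x` is at most two hops from every other node.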
Freeman (1977, 1978) proposed the closeness centrality for node
where
for intermediary nodes
However, this measure has the disadvantage of losing a lot of the information contained in weighted networks. Newman (2001) considered the relationship between the weight and distance by using Dijkstra (1959)’s algorithm. The closeness centrality for node
where
for intermediary nodes
The EWMA chart proposed by Roberts (1959) is useful for monitoring and detecting small shifts in the process parameter. EWMA charts perform similarly to CUSUM charts and are widely used because they are simple to apply and implement.
We define
where
As
In this paper, we use these asymptotic control limits to evaluate the performance of EWMA charts.
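A minimal sketch of an EWMA chart with asymptotic limits is given below. It assumes the standard recursion, in which the statistic is a weighted average of the current observation and the previous statistic with smoothing constant `lam`, started at the in-control mean `mu0`, with limits at `mu0` plus or minus `L * sigma0 * sqrt(lam / (2 - lam))`. The function name, argument names, and input data are illustrative assumptions.

```python
import math


def ewma_chart(xs, mu0, sigma0, lam, L):
    """Return the EWMA path, the asymptotic limits, and the first signal time."""
    half_width = L * sigma0 * math.sqrt(lam / (2.0 - lam))
    lcl, ucl = mu0 - half_width, mu0 + half_width
    z = mu0  # starting value of the EWMA statistic
    zs = []
    signal_at = None
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z
        zs.append(z)
        if signal_at is None and (z > ucl or z < lcl):
            signal_at = t  # first period outside the limits
    return zs, (lcl, ucl), signal_at


# Illustrative data: five in-control values followed by a sustained shift.
xs = [0.0] * 5 + [2.0] * 10
zs, limits, signal_at = ewma_chart(xs, mu0=0.0, sigma0=1.0, lam=0.2, L=2.7)
print(limits, signal_at)
```

With these inputs the limits are approximately plus or minus 0.9, and the statistic takes the values 0.4, 0.72, and 0.976 after the shift, so the chart signals in period 8.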
Traditionally, SPM uses Shewhart
The charts mentioned above can only be used when items are classified into two categories, but in many situations items are classified into more than two categories. For example, instead of classifying items in a process as good or bad, they can be categorized as good, fair, or bad. Similarly, the diagnosis of posttraumatic patients can be divided into four categories: survival, minor complications, major complications, and death. Ryan
Let
The multinomial CUSUM statistics are defined as
where
when
Note that the multinomial CUSUM procedure can be generalized to include samples of size
Let
where
We can calculate degree centralities
These average measures can be classified into the following four categories to be applied to the multinomial CUSUM chart.
(i) Category 1 :
(ii) Category 2 :
(iii) Category 3 :
(iv) Category 4 :
where
The procedure proposed in this paper is summarized as follows: First,
Through simulation, the performance of the proposed multinomial CUSUM procedure is evaluated by comparing it with the EWMA procedure using degree centrality and closeness centrality separately. In the EWMA procedure,
The settings of the simulation are determined by considering (i) the network to have
Note that in order to apply the multinomial CUSUM chart, it is necessary to specify the out-of-control probabilities of each category,
To determine the control limits for control charts, the in-control ARL value is set as 100. We design three different EWMA charts with
The out-of-control ARL values for the EWMA chart are obtained from 10,000 simulations, while the values for the multinomial CUSUM chart are obtained from 100,000 simulations.
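Estimating an ARL by simulation can be sketched as follows. This is an illustration only: it assumes independent standard normal observations rather than centrality measures computed from simulated networks, uses far fewer replications than the study, and all function names and parameter values are hypothetical.

```python
import math
import random


def make_ewma(mu0, sigma0, lam, L):
    """Build a fresh EWMA chart; the returned step(x) is True on a signal."""
    half_width = L * sigma0 * math.sqrt(lam / (2.0 - lam))
    state = {"z": mu0}

    def step(x):
        state["z"] = lam * x + (1.0 - lam) * state["z"]
        return abs(state["z"] - mu0) > half_width

    return step


def estimate_arl(make_chart, draw, n_runs, rng, max_steps=100_000):
    """Average the run lengths of `n_runs` independently simulated charts."""
    total = 0
    for _ in range(n_runs):
        step = make_chart()
        t = 0
        while t < max_steps:
            t += 1
            if step(draw(rng)):
                break
        total += t
    return total / n_runs


rng = random.Random(1)  # fixed seed so the sketch is reproducible


def new_chart():
    return make_ewma(mu0=0.0, sigma0=1.0, lam=0.2, L=2.7)


arl0 = estimate_arl(new_chart, lambda r: r.gauss(0.0, 1.0), 500, rng)
arl1 = estimate_arl(new_chart, lambda r: r.gauss(1.0, 1.0), 500, rng)
print(round(arl0, 1), round(arl1, 1))
```

In practice the control limit would be tuned until the estimated in-control ARL matches the target value, and the same limit would then be used to estimate the out-of-control ARL.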
We look at how the occurrence of abnormal nodes changes the probability of each category. Figure 1 illustrates a scatterplot of the in-control and out-of-control probabilities for degree and closeness centralities of
The ARL performance for
In Table 3 where
To determine why the multinomial CUSUM chart performs poorly for large
In Table 4 where
From the results in Tables 3 and 4, it is recommended to use the multinomial CUSUM chart when the direction and magnitude of change in category probabilities can be predicted to some extent, or when the magnitude of change is expected to be small. The advantage of using the multinomial CUSUM chart is that we can monitor several centrality measures simultaneously instead of monitoring one centrality measure with a single chart.
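The dependence on the anticipated direction of change is visible in the form of the chart's increments. The sketch below assumes one categorized observation per period and the usual log-likelihood-ratio form of the multinomial CUSUM, in which the statistic is increased by the log of the ratio of the out-of-control to the in-control probability of the observed category and is reset at zero. The probabilities, the threshold `h`, and the input sequence are hypothetical values chosen for illustration, not values from the tables.

```python
import math


def multinomial_cusum(categories, p0, p1, h):
    """Run a one-sided multinomial CUSUM over category labels 1..k.

    p0 and p1 are the assumed in-control and out-of-control probabilities.
    Returns the CUSUM path and the first period with statistic >= h, if any.
    """
    weights = [math.log(b / a) for a, b in zip(p0, p1)]
    c = 0.0
    path = []
    for t, cat in enumerate(categories, start=1):
        c = max(0.0, c + weights[cat - 1])
        path.append(c)
        if c >= h:
            return path, t
    return path, None


# Hypothetical probabilities: category 1 becomes more likely, category 3 less.
p0 = [0.40, 0.10, 0.40, 0.10]
p1 = [0.55, 0.10, 0.25, 0.10]
path, signal_at = multinomial_cusum([1, 3, 1, 1, 2, 1, 1], p0, p1, h=1.0)
print([round(c, 3) for c in path], signal_at)
```

Categories whose probability is unchanged under the assumed shift contribute nothing to the statistic, so a change in a direction other than the one specified by `p1` accumulates evidence slowly or not at all.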
In recent years, many studies have focused on monitoring and detecting changes in social networks and on providing information to help prevent conflicts and crises. In these studies, networks are observed over time and monitored by applying control charts. Most of them apply a separate control chart to each continuous measure. However, when several continuous measures are classified into one categorical variable, all of them can be monitored using a single control chart.
In this paper, we proposed a procedure that classifies degree and closeness measures, which represent local and global perspectives, respectively, based on their average values and detects the occurrence of abnormal nodes from changes in the probabilities of the classified categories. Through simulation, the performance of the proposed procedure was compared with that of the EWMA charting procedure using degree and closeness centrality measures individually.
From the simulation results, it was found that the multinomial CUSUM chart showed better performance for small changes but poor performance for large changes. When
Finally, it would be interesting to study procedures that use measures other than degree and closeness centrality as criteria for categorization. In the future, we will study classification procedures that use two or more different measures and evaluate their performance for various types of change.
The in-control and out-of-control probabilities for categories
| | Category 1 | Category 2 | Category 3 | Category 4 |
|---|---|---|---|---|
| In-control | 0.4065 | 0.0899 | 0.4206 | 0.0830 |
| 2 | 0.4445 | 0.0943 | 0.3822 | 0.0790 |
| 3 | 0.5312 | 0.0974 | 0.3001 | 0.0714 |
| 4 | 0.6503 | 0.0904 | 0.2036 | 0.0557 |
| 6 | 0.8927 | 0.0393 | 0.0448 | 0.0232 |
| In-control | 0.4180 | 0.0777 | 0.4181 | 0.0863 |
| 2 | 0.4497 | 0.0793 | 0.3879 | 0.0832 |
| 3 | 0.5068 | 0.0823 | 0.3347 | 0.0763 |
| 4 | 0.5957 | 0.0804 | 0.2578 | 0.0662 |
| 6 | 0.8084 | 0.0549 | 0.0988 | 0.0379 |
Values of
EWMA | |||
Multinomial CUSUM | |||
ARL values for EWMA charts, using degree and closeness centralities separately, and multinomial
EWMA | Multinomial CUSUM | |||||||
---|---|---|---|---|---|---|---|---|
Degree | Closeness | Degree | Closeness | Degree | Closeness | |||
2 | 0.5 | 100.5 | 97.0 | 102.4 | 98.1 | 101.2 | 98.7 | |
1 | 95.7 | 91.7 | 98.6 | 93.4 | 100.7 | 92.2 | ||
2 | 82.9 | 67.7 | 85.8 | 70.4 | 88.8 | 75.8 | ||
3 | 66.8 | 49.9 | 71.3 | 54.1 | 76.5 | 58.7 | ||
4 | 52.3 | 55.0 | 40.8 | 62.1 | 46.6 | 38.4 | ||
5 | 42.7 | 46.0 | 32.5 | 50.7 | 36.8 | 33.4 | ||
6 | 34.8 | 37.2 | 26.9 | 42.2 | 31.0 | 30.0 | ||
7 | 28.8 | 30.4 | 23.8 | 35.1 | 26.9 | 27.4 | ||
8 | 24.5 | 25.8 | 21.4 | 29.2 | 23.3 | 24.4 | ||
3 | 0.5 | 89.5 | 84.3 | 92.8 | 87.0 | 94.8 | 89.8 | |
1 | 67.2 | 56.5 | 69.8 | 60.2 | 76.4 | 64.3 | ||
2 | 34.8 | 36.8 | 27.4 | 42.5 | 30.9 | 28.4 | ||
3 | 21.4 | 16.2 | 21.8 | 24.9 | 17.4 | 19.5 | ||
4 | 15.1 | 11.9 | 14.9 | 16.1 | 12.0 | 15.0 | ||
5 | 11.6 | 9.7 | 11.1 | 11.4 | 9.1 | 12.7 | ||
6 | 9.4 | 8.3 | 8.8 | 7.7 | 8.8 | 11.2 | ||
7 | 7.9 | 7.5 | 7.3 | 6.9 | 6.9 | 10.3 | ||
8 | 6.9 | 6.9 | 6.2 | 6.2 | 9.6 | |||
4 | 0.5 | 66.2 | 61.9 | 70.5 | 64.2 | 75.8 | 70.2 | |
1 | 36.9 | 37.1 | 31.2 | 42.5 | 35.6 | 31.6 | ||
2 | 15.1 | 12.2 | 14.9 | 16.2 | 12.1 | 14.7 | ||
3 | 9.4 | 7.8 | 8.8 | 7.2 | 8.8 | 10.0 | ||
4 | 6.8 | 6.0 | 6.2 | 5.4 | 5.8 | 8.1 | ||
5 | 5.4 | 5.0 | 4.8 | 4.5 | 4.4 | 7.2 | ||
6 | 4.5 | 4.4 | 3.9 | 3.9 | 6.7 | |||
7 | 3.8 | 4.1 | 3.4 | 3.6 | 3.2 | 6.4 | ||
8 | 3.4 | 3.8 | 2.9 | 3.3 | 2.9 | 6.3 | ||
6 | 0.5 | 26.6 | 28.1 | 26.9 | 32.1 | 29.8 | 31.5 | |
1 | 11.6 | 10.6 | 11.1 | 11.5 | 10.2 | 13.6 | ||
2 | 5.4 | 4.9 | 4.8 | 4.3 | 4.4 | 6.4 | ||
3 | 3.6 | 3.4 | 3.1 | 3.0 | 2.7 | 5.3 | ||
4 | 2.7 | 2.8 | 2.4 | 2.4 | 2.1 | 5.1 | ||
5 | 2.3 | 2.4 | 2.0 | 2.1 | 1.8 | 5.0 | ||
6 | 2.0 | 2.2 | 1.7 | 1.9 | 1.6 | 5.0 | ||
7 | 1.8 | 2.1 | 1.5 | 1.8 | 1.5 | 5.0 | ||
8 | 1.6 | 2.0 | 1.2 | 1.7 | 1.4 | 5.0 |
ARL values for EWMA charts, using degree and closeness centralities separately, and multinomial
EWMA | Multinomial CUSUM | |||||||
---|---|---|---|---|---|---|---|---|
Degree | Closeness | Degree | Closeness | Degree | Closeness | |||
2 | 0.5 | 98.0 | 99.0 | 99.5 | 101.2 | 100.8 | 100.5 | |
1 | 91.2 | 91.1 | 92.1 | 93.1 | 95.2 | 95.4 | ||
2 | 67.9 | 58.6 | 71.2 | 60.9 | 76.5 | 67.3 | ||
3 | 48.1 | 37.7 | 52.7 | 40.4 | 58.2 | 46.7 | ||
4 | 36.6 | 38.6 | 29.0 | 44.0 | 33.5 | 30.1 | ||
5 | 27.6 | 29.6 | 23.1 | 34.0 | 25.9 | 26.1 | ||
6 | 22.6 | 23.7 | 19.7 | 27.1 | 21.8 | 23.2 | ||
7 | 18.8 | 17.5 | 19.1 | 21.1 | 19.1 | 20.9 | ||
8 | 16.0 | 17.4 | 16.9 | 19.5 | ||||
3 | 0.5 | 79.1 | 83.5 | 83.3 | 84.6 | 87.0 | 90.2 | |
1 | 48.6 | 45.1 | 51.8 | 47.7 | 58.9 | 53.4 | ||
2 | 22.3 | 23.3 | 26.7 | 19.7 | 19.7 | |||
3 | 14.0 | 10.9 | 13.6 | 14.6 | 10.5 | 14.1 | ||
4 | 9.9 | 8.3 | 9.4 | 7.7 | 9.5 | 11.5 | ||
5 | 7.8 | 7.0 | 7.1 | 6.2 | 6.9 | 10.1 | ||
6 | 6.4 | 6.2 | 5.8 | 5.6 | 5.3 | 9.3 | ||
7 | 5.4 | 5.6 | 4.8 | 5.0 | 4.6 | 8.7 | ||
8 | 4.7 | 5.3 | 4.2 | 4.7 | 4.2 | 8.4 | ||
4 | 0.5 | 48.8 | 51.0 | 52.7 | 53.7 | 58.9 | 59.2 | |
1 | 22.3 | 23.4 | 26.8 | 22.6 | 20.6 | |||
2 | 10.0 | 8.1 | 9.4 | 7.6 | 9.6 | 10.3 | ||
3 | 6.3 | 5.4 | 5.7 | 4.9 | 5.4 | 7.8 | ||
4 | 4.7 | 4.3 | 4.2 | 3.8 | 3.7 | 6.8 | ||
5 | 3.8 | 3.8 | 3.3 | 3.3 | 6.4 | |||
6 | 3.2 | 3.4 | 2.8 | 2.9 | 2.5 | 6.2 | ||
7 | 2.8 | 3.1 | 2.4 | 2.7 | 2.3 | 6.1 | ||
8 | 2.4 | 2.9 | 2.1 | 2.6 | 2.2 | 6.1 | ||
6 | 0.5 | 17.4 | 17.5 | 17.4 | 19.2 | 19.0 | 19.2 | |
1 | 7.8 | 7.0 | 7.2 | 6.4 | 6.8 | 8.4 | ||
2 | 3.8 | 3.4 | 3.3 | 3.0 | 2.9 | 5.3 | ||
3 | 2.6 | 2.5 | 2.3 | 2.2 | 5.0 | |||
4 | 2.1 | 2.1 | 1.8 | 1.9 | 5.0 | |||
5 | 1.8 | 1.9 | 1.5 | 1.7 | 1.3 | 5.0 | ||
6 | 1.5 | 1.8 | 1.1 | 1.5 | 1.2 | 5.0 | ||
7 | 1.2 | 1.7 | 1.4 | 1.1 | 5.0 | |||
8 | 1.6 | 1.3 | 1.1 | 5.0 |
Probabilities of categories when
| | | Category 1 | Category 2 | Category 3 | Category 4 |
|---|---|---|---|---|---|
| 2 | 0.5 | 0.4141 | 0.0927 | 0.4100 | 0.0833 |
| | 1 | 0.4226 | 0.0920 | 0.4028 | 0.0826 |
| | 2 | 0.4445 | 0.0943 | 0.3822 | 0.0790 |
| | 3 | 0.4688 | 0.0977 | 0.3572 | 0.0764 |
| | 4 | 0.4929 | 0.0999 | 0.3350 | 0.0722 |
| | 5 | 0.5108 | 0.1008 | 0.3166 | 0.0718 |
| | 6 | 0.5280 | 0.0987 | 0.3030 | 0.0703 |
| | 7 | 0.5442 | 0.0954 | 0.2884 | 0.0720 |
| | 8 | 0.5653 | 0.0924 | 0.2699 | 0.0724 |
| 3 | 0.5 | 0.4315 | 0.0907 | 0.3944 | 0.0834 |
| | 1 | 0.4665 | 0.0922 | 0.3617 | 0.0796 |
| | 2 | 0.5312 | 0.0974 | 0.3001 | 0.0714 |
| | 3 | 0.5920 | 0.0999 | 0.2465 | 0.0616 |
| | 4 | 0.6491 | 0.0944 | 0.2012 | 0.0553 |
| | 5 | 0.6979 | 0.0861 | 0.1636 | 0.0524 |
| | 6 | 0.7431 | 0.0731 | 0.1320 | 0.0518 |
| | 7 | 0.7773 | 0.0612 | 0.1086 | 0.0528 |
| | 8 | 0.8080 | 0.0478 | 0.0888 | 0.0554 |
| 4 | 0.5 | 0.4622 | 0.0888 | 0.3667 | 0.0823 |
| | 1 | 0.5235 | 0.0917 | 0.3089 | 0.0759 |
| | 2 | 0.6503 | 0.0904 | 0.2036 | 0.0557 |
| | 3 | 0.7493 | 0.0791 | 0.1287 | 0.0429 |
| | 4 | 0.8292 | 0.0593 | 0.0772 | 0.0344 |
| | 5 | 0.8882 | 0.0375 | 0.0462 | 0.0282 |
| | 6 | 0.9231 | 0.0214 | 0.0285 | 0.0270 |
| | 7 | 0.9476 | 0.0121 | 0.0158 | 0.0245 |
| | 8 | 0.9632 | 0.0056 | 0.0088 | 0.0225 |
| 6 | 0.5 | 0.5488 | 0.0859 | 0.2855 | 0.0798 |
| | 1 | 0.6902 | 0.0759 | 0.1745 | 0.0595 |
| | 2 | 0.8927 | 0.0393 | 0.0448 | 0.0232 |
| | 3 | 0.9724 | 0.0113 | 0.0083 | 0.0080 |
| | 4 | 0.9938 | 0.0021 | 0.0013 | 0.0028 |
| | 5 | 0.9984 | 0.0002 | 0.0002 | 0.0012 |
| | 6 | 0.9994 | 0.0000 | 0.0000 | 0.0006 |
| | 7 | 0.9998 | 0.0000 | 0.0000 | 0.0002 |
| | 8 | 0.9999 | 0.0000 | 0.0000 | 0.0001 |