search for

CrossRef (0)
Regional flood frequency analysis of extreme rainfall in Thailand, based on L-moments
Communications for Statistical Applications and Methods 2024;31:37-53
Published online January 31, 2024
© 2024 Korean Statistical Society.

Thanawan Prahadchaia , Piyapatr Busababodhin1,b , Jeong-Soo Parka

aDepartment of Mathematics and Statistics, Chonnam National University, Gwangju, Korea;
bDepartment of Mathematics, Mahasarakham University, Maha Sarakham, Thailand
Correspondence to: 1 Digital Innovation Research Cluster for Integrated Disaster Management in the Watershed, Mahasarakham University, Maha Sarakham 44150, Thailand. E-mail: piyapatr.b@msu.ac.th
Received August 10, 2023; Revised October 14, 2023; Accepted October 14, 2023.
In this study, flood records from 79 sites across Thailand were analyzed to estimate flood indices using the regional frequency analysis based on the L-moments method. Observation sites were grouped into homogeneous regions using k-means and Ward’s clustering techniques. Among various distributions evaluated, the generalized extreme value distribution emerged as the most appropriate for certain regions. Regional growth curves were subsequently established for each delineated region. Furthermore, 20- and 100-year return values were derived to illustrate the recurrence intervals of maximum rainfall across Thailand. The predicted return values tend to increase at each site, which is associated with growth curves that could describe an increasing long-term predictive pattern. The findings of this study hold significant implications for water management strategies and the design of flood mitigation structures in the country.
Keywords : cluster analysis, generalized normal distribution, homogeneous region, quantile estimation, regional goodness-of-fit test, water management
1. Introduction

A common statistical problem in hydrology or in climatology is the estimation of annual maximum rainfall distributions and their high quantiles, with the objective of assessing water-related risk and evaluating flood protection system (Naghettini, 2017). One approach is doing so-called frequency analysis (FA) based on the observations of each station. This approach is called the at-site FA, which is not easy to obtain a satisfactory result especially when the sample size is small, because high quantile estimation usually requires tail-extrapolation of the at-site distribution (Castillo et al., 2005; Park, 2005; Lee et al., 2013, 2022). To address this difficulty, people have tried to apply so-called the regional frequency analysis (RFA) to the data collected from multiple sites, instead of doing FA indivisually for each site. It is known that the RFA usually provides better estimates of high quantiles than the at-site FA does (Hosking and Wallis, 1997). In this study, we focus on the RFA for flood data.

For the estimation of annual maximum rainfall distributions and their high quantiles in the RFA, one should choose a suitable distribution first. The generalized extreme value (GEV) distribution has been widely used to model the block extreme events, because the large sample theory supports it (Coles, 2001). However in real data analysis, the GEV distribution sometimes yields inadequate results, especially for small to moderate sample sizes (Vogel and Wilson, 1996; Salinas et al., 2014; Stein, 2017). Thus, other many distributions, including the Pearson type III (PE3) or the generalized logistic (GLO) distributions, have been considered to model extreme events (Hosking and Wallis, 1997; Hamed and Rao, 2019; Murshed et al., 2011; Jeong et al., 2014; Otiniano et al., 2019), for example.

Selection of the most suitable distribution for the RFA is usually accompanied with the estimation of parameters in a distribution. The maximum likelihood estimation (MLE) and L-moment estimation (LME) methods (Hosking, 1990) are usually employed to estimate parameters of these distributions. Generally, the LME performs well on small samples and offers some advantages over the MLE, such as easy computation, more robustness, and less bias (Naghettini, 2017). Thus in this study, we employ the LME method. The RFA using the L-moments is well described in Hosking and Wallis (1997).

Kriengsiri (1976) may be the first research to apply the RFA method to flood data of Northeastern Thailand. He used the RFA for explaining the variations in flood magnitude and illustrated the relationships between heavy rainfall frequency and predicted values for any location. Chaleeraktrakoon et al. (2002) proposed to search for a practical distribution of annual flood records in Thailand using popularly linear moment diagrams. Wuttichaikitcharoen (2015) studied the regionalization of low-flow frequency curves for estimating several-days low-flow at the selected return period in the upper Northern Thailand region.

There has been no RFA for flood data for a whole site of Thailand. Thus this study aims to do the RFA using L-moments over 79 sites across Thailand. The main objectives of this research are: (1) to test the homogeneity of the region using cluster analysis, (2) to evaluate parameters by region, and (3) to identify the most suitable distribution for the volume and duration of extreme rainfall in geographic area of Thailand. We also want to provide useful results that can be used in decision-making process regarding prevention of floods and their water management for different regions of Thailand.

2. Data and area

In this study, we use the annual maximum daily precipitation (AMP) observations for flood data. Part of the data for 71 stations were recorded from 1984 to 2020, and another part for 8 stations were recorded from 1960 to 2020 (TMD, 2021). These 79 stations can be seen in Figure 1 together with 25 river basins. The names of basins are in the supplementary information (SI).

Since most areas near the basins experience repeated flooding, we consider the characteristics of the site to closest basin to be important information, that is a distance variable. These distances were calculated using data from the national water resources office (2021) and hydro and agricultural informatics institute (2023) (ONWR, 2021; HII, 2023). Table 1 summarizes site characteristics from some stations. In addition, more variables are considered; latitude (Lat), longitude (Long), average of AMP (Avg), the ratio, basin area (Area), and elevation (Elev). The ratio refers to the ratio between the minimum and maximum two-month moving average rainfall. For Monmin and Monmax variables are the beginning month of minimum and maximum two-month moving average rainfall calculated from January = 1, February = 2,. . . , to December = 12.

For cluster analysis, we need to standardize variables because they have very different scales and ranges. We rescaled all the site variables except the ratio variable. The Lat, Long, Avg, Distance, Area, and Elev variables were transformed to the range [0,1]. Hosking andWallis (1997) proposed replacing the variable representing the point in year, such as variables Monmin and Monmax, to sin(2πX/12), corresponding to the coordinate of the point on the unit circle. So, these Monmin and Monmax variables were rescaled values by this function and have range on [−1,1]. The details are found in the SI.

3. Methodology

3.1. Steps of RFA

There are four main steps in Figure 2 that are required to perform the RFA procedure. The first step is data preparation which has two parts: 1) Before clustering analysis, the characteristics data must be standardized for cluster analysis, and 2) any missing values must be checked. The second step is to identify homogeneous regions by clustering method. The initial region clustering was established by defining the number of regions and analyzing the characteristics data by k-means and Ward methods, which can be computed from ‘stat’ and ‘factoextra’ packages in the R program (Bolar, 2019; Kassambara and Mundt, 2020). The discordancy measure and the homogeneity test are applied to the initial region clusters. When a region is indicated by a statistic as non-homogeneous, then the stations of that region will be moved and adjusted until the last region supported by the homogeneity statistic test.

The third step is to choose a regional frequency distribution. L-moment ratio diagrams and counting of the acceptable distributions will be used to select the best-fit regional frequency distribution for the homogeneous regions. The final step is the parameter and quantile estimation. Parameter estimates of regional frequency distribution can be calculated from individually at each site parameter estimates and combined to give a regional average.

3.2. Cluster analysis

Clustering is an unsupervised data mining technique that groups data based on similarities between data records. Generally, there are two types of clustering: Non-hierarchical and hierarchical clustering, which have different advantages and disadvantages. In hierarchical clustering, data can be grouped sequentially from top to bottom ((Everitt et al., 2011; Yoo et al., 2020), for example), while in non-hierarchical clustering involves creating new clusters by merging or splitting the clusters rather than following a hierarchical order. As for non-hierarchical clustering, the k-means algorithm is a popular choice due to its simplicity and fast execution ((Everitt et al., 2011; Kang and Lim, 2022), for example). Dikbas et al. (2013) defined a homogeneous region for the streamflow process in Turkey using the k-means clustering method and results showed that k-means can be used for RFA. In this study, we used both Ward’s method and k-means algorithm for clustering of characteristics data.

3.3. L-moments

In this study, we employ L-moment RFA which was proposed by Hosking and Wallis (1997). This method is known to be more robust than conventional moments to outliers in the data. The L-moments are the expected values of linear combinations multiplied by scalar constants. Denote by X(k:n) the kth smallest observation from sample size n, so that the ordered sample is X(1:n)X(2:n) ≤ · · · ≤ X(n:n).

The L-moments of probability distribution are defined by Hosking (1990)


The L-moment λ1 is the mean of the distribution, a location measure (L-location), while λ2 is a dispersion measure (L-scale). The L-moment ratios are obtained by dividing the higher-order L-moments by scale measure λ2:

τr=λrλ2,         r=3,4,

For summarizing distributions, the L-moments λ1 and λ2, the so-called coefficient of L-variation (L-CV) τ = λ2/λ1, and τ3 and τ4, which are measures of skewness (L-skewness) and kurtosis (L-kurtosis), respectively, are the most useful ones (Hosking and Wallis, 1997; Hosking, 1990).

In practice, L-moments must be estimated from a finite sample. The sample L-moment lr is defined as


where x1:nx2:n ≤ · · · ≤ xn:n are the ordered sample. Correspondingly, to Equation (3.6), the sample L-moment ratios are defined by

tr=lrl2,         r=3,4,

and the sample L-CV is computed by t = l2/l1. The estimators t and tr are not unbiased but their biases are very small in moderate or larger samples (Hosking and Wallis, 1997).

3.4. Measuring discordance and homogeneity

The discordance measure was used to identify the discordant among N sites in the group to be removed. The classic discordancy measure (Di) for the ith site in a group is defined as (Hosking and Wallis, 1997):


where ui=[t(i),t3(i),t4(i)]T is a vector of L-moment ratios for site ith and u¯=N-1i=1Nui is the unweight group average. Matrix of sums of squares and cross-products A is defined as:


Hosking andWallis (1997) recommended that a site be considered as discordant if its Di value exceeds the critical value and the criterion Di ≥ 3.

For the test of regional homogeneity, the aim is to calculate the degree heterogeneity in a group of sites and to compare between-site variations in sample L-moments for group of sites versus what would be expected for a homogeneous region. Suppose a region has N sites, and each site i has the recorded data length ni. Let tR, t3R, and t4R be the regionally averaged L-moment ratios which are given as follows


Then the weighted standard deviation of at-site sample L-CV, Vj, for j = 1, 2, 3 can be calculated as follows;


where t(i), t3(i), and t4(i) are sample L-moment ratios for site i.

Now we fit a four-parameter kappa distribution (K4D) (Hosking, 1994; Murshed et al., 2014; Papukdee et al., 2022) to the regionally averaged L-moment ratios to obtain an estimate of the parameter. Finally, a large number of synthetic regions are simulated according to the estimated parameters of the K4D to compute the heterogeneity measure H as follows

Hl=Vl-μvlσvl,         l=1,2,3,

where μv and σv are mean and standard deviation of V. Hosking and Wallis (1997) suggested that a region is considered as “acceptably homogeneous” if Hl < 1, “possibly heterogeneous” if 1 ≤ H < 2, and “definitely heterogeneous” if Hl ≥ 2. Additionally, they stated that H1 statistic compared to H2 and H3 has a discriminating power and is the most important heterogeneity measure (Hamed and Rao, 2019).

3.5. Regional goodness-of-fit test

Hosking and Wallis (1997) described the goodness-of-fit measure, ZDIS T , which is based on Monte Carlo (MC) simulation to compare between sample L-kurtosis and population L-kurtosis for different distributions. DIST can be any distributions such as GLO, GEV, GNO (generalized normal), PE3 (Pearson type III) or GPA (generalized Pareto) to fit with the sample regional average L-moment ratios tR, t3R and t4R. This test is used to confirm the suitability of a chosen distribution and as a measure of confidence level. The goodness-of-fit measure for each distribution is defined as (Hosking and Wallis, 1997):


where τ4DIST is the L-kurtosis of the fitted distribution, and B4 is the bias of t4R which can be calculated as


where σ4 is the standard deviation of t4R:


The regional average L-moment ratios tR, t3R and t4R are used to fit with K4D and simulate some large number Nsim of realizations of regions with N sites. The regional average of L-kurtosis (t4[m]) is calculated for each mth simulated region. As stated in Section 3.4, the K4D is used in computing the heterogeneity measure for the simulated region. The probability distribution is considered acceptable for representing the sample data if |ZDist| ≤ 1.64, that is a criterion at significance level of α = .10 (Hosking and Wallis, 1997).

3.6. Selection of regional frequency distribution

Five candidate probability distributions were examined in selecting the best fit for the final regional AMP data in Thailand: GLO, GEV, GNO, PE3, and GPA. Choosing an appropriate distribution is necessary for the quantile prediction of data. After the final regional is defined, the L-moment ratios diagram between L-skewness and L-kurtosis is a simple and efficient method to choose the best distribution. In addition, when the goodness-of-fit test of each region shows an acceptable distribution or number of |ZDist| ≤ 1.64 is greater than one distribution, then the acceptable number of each distribution is voted on.

3.7. Parameters and quantiles estimation

The parameters of a regional frequency distribution can be estimated using the method of L-moment ratios. In the homogeneous region, the distribution of rainfall data at different sites in the region is identical, except for a scale parameter. Thus, each dimension of the observation, Qi j, is normalized by μi, where μi is the mean of AMP data at site i called index flood:


where i = 1, . . . , N and j = 1, . . . , ni. The dimensionless rescaled data (qi) for all site thus has a characterized by E(qi) = 1. We can estimate a regional probability distribution, q(F), called the regional growth curve, which represents a dimensionless quantile function common to all sites within a homogeneous region. Therefore, quantile function of extreme rainfall at site i, can be estimated as (Hosking and Wallis, 1997).


where F refers to a given quantile. Parameters and quantiles can be computed from ‘lmomRFA’ package in the R program (Hosking, 2023).

4. Results

4.1. Screening of the data

The AMP data from all 79 sites mentioned in study area in the Section 2 were used for analysis. Table 2 shows summary of statistics for AMP data of some sites and discordant statistics Di for site i, i = 1, . . . , 79 is computed from Equation (3.9) From the results of this table, there are a few sites were found that Di value exceeded the critical value in the preliminary screening data such as Mukdahan, Prachin Buri, and Prachuap Khiri Khan. Other sites in the Table 2, such as Kabin Buri, Mae Sa Riang and Phetchaburi, also have very high discordance values and the discordancy values of all sites are shown in the SI.

4.2. Initial identification of regions

Clustering in which the 79 sites were divided into ten clusters was initially obtained. Regional heterogeneity measures for the initial ten clusters shown in Table 3, includes the number of sites (N) in each region. A fitted K4D for each region was used to generate set of record lengths for a similar number of sites in a given region. A total of 500 number of realizations was simulated for each region to obtained H values. Figure 3 shows the cluster dendrogram of initial ten clusters obtained by Ward’s method mentioned in Section 3.2. This figure shows the distance used to determine the similarity between stations in an easy to understand way. Thailand map of initial regions of rainfall data with ten clusters by both k-means and Ward’s methods are shown in the SI.

Cluster dendrograms and plots of L-skewness versus L-kurtosis ware carried out for initial region for both two methods, as shown in the SI (for all available sites). From Table 3, there are six sites with H1 < 1 values, nine sites with value H2 < 1 and seven sites with value H3 < 1 for k-means method. For Ward’s method, also six sites were found that H1 < 1 values, nine sites with value H2 < 1, and eight sites with value H3 < 1. As mentioned in Section 3.4, Hamed and Rao (2019) suggested that the H1 statistic is the most discriminating power and is the most important measure of heterogeneity. Accordingly, the results of Table 3 show that four out of ten clusters obtained during the initial clustering employ have very high values of H1 for both k-means and Ward’s methods. However, the plots of the L-moment ratios under the sub-heading regions K4 and K2 in both clustering methods in Figure 4 show the outlying nature of some sites. This corresponding to Table 3 where more than two H1 values were found, meaning “definitely heterogeneous” regions, as previously defined.

4.3. Identification of homogeneous regions

The algorithm depicted in Figure 5 demonstrates a modification made to the cluster integration step after the initial clustering process. The first ten clusters underwent refinement using this algorithm, resulting in the acquisition of a final region as in Figure 6. In this Figure, the red circles represent sites that have been relocated to other regions due to having the highest D-value, where the H-value is an indicator of non-homogeneity in the initial region. Specifically, these sites include Prachuap Khiri Khan, Hua Hin, Phetchaburi, and Prachin Buri in region K4 for k-means clustering method and Prachuap Khiri Khan and Phetchaburi in region K2 for the Ward’s clustering method in the Figure 4.

When some sites in some regions are moved out and some sites moved in, according to the steps of algorithm the last one gets the final version of the region with the same number of ten regions but the number of sites in some groups change. The final delineation of homogeneous measurements is shown in Table 4. This table shows that H1 < 1 for all regions for both clustering methods. Except for the region K8 with H = 1.73 for the k-means which means that this region is still uncertain about its absolute homogeneity.

4.4. Regional flood frequency distribution

This goodness-of-fit test is done simulated as a homogeneity measurement and the results for all ten regions of Thailand, with Table 5 showing the number of accepted distributions. The distribution was accepted when the |Zdist| ≤ 1.64 statistic. As a result, distribution GEV and GNO distribution have the same number of acceptances, that is eight times for both clustering methods.

The L-moment diagrams were shown in Figure 7, it was found that each point region has a strong preference for those distributions and they fitted well to the curves between L-skewness and L-kurtosis. The GLO curve and the PE3 curve are the upper and the lower bounds, respectively. The WEI is also represented in this figure as this curve shows a lower level than the PE3 curve. Points abbreviated by L, N, G, E, and U represent two-parameter logistic, normal, Gumbel, exponential, and uniform distributions curves, respectively. The red cross-points in the figure are represented by mean of regional average L-moment ratios.

4.5. Regional quantile estimates

The extreme event XT is defined as the event exceeded on average once every T years (Coles, 2001) and probability of occurrence of an event XT in any observation is the inverse of its return period P[X > xT] = 1/T. The return period (T) can be calculated as T = 1/(1 − F), where F is the non-exceedance probability mentioned in Section 3.7. Table 6 shows a summary statistic of regional average L-moment ratios and estimated parameters of regional frequency distribution of each region by Ward’s methods. The definitions of distributions and the result from k-means algorithm are provided in the SI.

The plotting positions are related to the Gumbel variate via the return period T. The values F = 0.01, 0.10, 0.20, 0.50, 0.80, 0.90, 0.95, 0.99, 0.998, and 0.999 correspond to return periods of 1−, 1.11−, 1.25−, 2−, 5−, 10−, 20−, 100−, 500−, and 1000−year, respectively. Figure 8 shows the regional growth curve of GEV distribution fitted to AMP data of the ten homogeneous regions. The regional growth curve is written such as

Region K1:ξ^+α^κ^[1-(-logF)κ^]=0.850-8.857×[1-(-logF)-0.028]

when the rescaled data is used to calculate the parameter estimate.

For Figure 9 shows maps of 20- and 100-year return level of regional GEV distribution of AMP data in Thailand using Ward clustering method. The return level of each site is calculated by multiplying the estimate regional growth curve with the average AMP data at each site, which is related to Equation (3.20). Mae Sa Riang site showed the lowest predicted values that were 98.41 mm and 122.00 mm and Nakhon Si Thammarat station showed the highest predicted values that were 414.91mm and 591.31 for the 20-year and 100-year return levels, respectively. The recurrence level predicted value tended to increase at each site, which was associated with a growth curve that could describe a long-term predictive value pattern ranging T = 1, . . . , 1000.

5. Discussion

Regional frequency analysis (RFA) in Thailand, two distributions are considered: Generalized extreme value (GEV) and generalized normal (GNO). However, previous research has demonstrated that the distribution of hydrological data in Thailand supports the application of the GEV distribution. Chaleeraktrakoon et al. (2002) conducted an analysis of annual flood data in Thailand using popular linear moment diagrams. Theoretical moment relations of various distributions were applied to the observed diagrams, with the GEV distribution proving to outperform other distributions. Additionally, Yoon et al. (2015) employed a spatial GEV model to analyze annual maximum precipitation (AMP) data from the recordings of the Meteorological Department at 25 stations. This study highlighted the effectiveness of the model and the advantage of spatial extreme modeling, which provides robust estimation. Figures 89 illustrated that the return level of the GEV distribution increased as the return period increased. Furthermore, Prahadchai et al. (2023) presented an analysis of AMP data in Thailand using non-stationary (NS) GEV models. The results showed that the future rainfall forecast values from the model fluctuate over time. Consequently, researchers are now aware of the importance of considering the non-stationary model (NS) in further RFA studies.

6. Concluding remark

In this study, a practical regional flood estimation procedure for Thailand is presented using a technique for analyzing regional flood frequency. Identification of multiple homogeneous regions is practicable after rigorous testing for homogeneity in statistical terms, although there are limitations both in terms of the spatial coverage of Thailand and the characteristics of river basin area and general data (i.e., latitude, longitude, elevation,. . . ). We highlight on clusters by measuring the distance from the nearest basin because these areas are frequently at risk of repeated flooding according to several studies.

The selection of the appropriate frequency distribution was also conducted based on an appropriate statistical test. The results of both clustering methods yield the same GEV and GNO distributions, but since the mean of regional average L-moment ratios is closer to the GEV distribution, this distribution is recommended. The predicted level value tends to increase at each site, which is associated with a growth curve that could describe a long-term predictive pattern.


The authors are grateful to the reviewers and the Editors for their valuable and constructive comments. Observational data and the characteristics of 25 basin data in the Thailand are provided by TMD at http://www.tmd.go.th/ and HAII at https://tiwrm.hii.or.th/, respectively. The authors are grateful to Tossapol Phoophiwfa who helped a lot for plotting a map of 25 basins location in Thailand. This research was supported by under the framework of international cooperation program managed by the NRF of Korea (2022K2A9A1A01097935) and the BK21 FOUR (Fostering Outstanding Universities for Research, NO.5120200913674) funded by the Ministry of Education (MOE) and National Research Foundation of Korea (NRF). Piyapatr’s research was financially supported by Mahasarakham University and funded by the Agricultural Research Development Agency (Public Organization) of Thailand, (ARDA).

Conflict of interest

The authors declare no potential conflicts of interest.

Fig. 1. Map of Thailand including the 79 observation stations (left) and 25 basins locations with its name (right).
Fig. 2. The four main steps of regional frequency analysis.
Fig. 3. Cluster dendrogram of initial ten clusters obtained by Ward’s clustering method.
Fig. 4. L-moment ratios diagrams between L-CV and L-skewness for all the available sites (represented by black circle symbols) in regions K4 and K2 for both k-means (left) and Ward (right) clustering methods, respectively. The red triangles represent the site with the highest Di in the region.
Fig. 5. The algorithm of process of correcting the location of at-site in each region based on discordant measures.
Fig. 6. Final regions of extreme rainfall with ten clusters by k-means (left) and Ward’s (right) clustering methods.
Fig. 7. Regional average L-moment ratios for the final regions for both k-means (left) and Ward (right) clustering methods, respectively.
Fig. 8. Regional growth curves of generalized extreme-value distribution fitted annual maximum precipitation data for the ten final regions. Both clustering methods are presented, which k-means method is left side and Ward’s method is right.
Fig. 9. Maps of 20- and 100-year return levels of annual maximum precipitation in Thailand by region clustered using the Ward’s method.

Table 1

Site characteristics with units for some stations in Thailand

StationLat (deg)Long (deg)Avg (mm)RatioMonminMonmaxDist (km)Area (km2)Elev (m)
Chiang Mai18.77298.9786.500.038188.10E+043.50E+04307
Hua Hin12.58999.73118.580.0691297.90E+046.30E+035
Nakhon Si Tham.8.54499.94213.560.1412114.30E+042.60E+046
Nong Khai17.867102.72107.820.0251272.10E+054.90E+04167
Ubon Ratchathani15.244104.88125.120.0091286.90E+047.10E+04125

Note: Dist refers to the distance between the station and the nearest basin.

Table 2

Summary of statistics for annual maximum rainfall data of some sites in Thailand and n is the observed record length and Di stands for discordancy statistics

Chiang Mai6186.5030.1750.1700.1320.0140.112
Kabin Buri3789.9950.1400.0780.2330.0122.877
Mae Hong Son3775.6780.1770.1700.0660.0180.721
Mae Sa Riang3763.7430.1500.2330.3330.1092.010
Maha Sarakham37102.6190.1820.2020.1400.0050.065
Nakhon Phanom61160.5230.2050.2440.2370.1780.466
Nakhon Si Thammarat61213.5620.2760.2450.1380.0541.873
Phliu Agromet.37164.3080.1730.2760.2320.1420.245
Prachin Buri37110.6730.1460.3550.1820.0953.109*
Prachuap Khiri Khan37121.2650.3040.1880.0450.0403.864*
Ubon Ratchathani61125.1150.1650.2120.1110.0500.586

Note: * denotes a Di value that exceeds the critical value 3 (Hosking and Wallis, 1997).

Table 3

The measures of regional heterogeneity for initial regions of ten clusters obtained by k-means and Ward’s clustering methods

Namek-means methodWard’s method


Hi < 1697698

Table 4

The measures of regional heterogeneity for final regions of ten clusters obtained by k-means and Ward’s clustering methods

Regionsk-means methodWard’s method


Hi < 1910710108

Table 5

Number of distributions accepted for each cluster method

k-means methodWard’s method

DistributionNumber of acceptedDistributionNumber of accepted
Generalized logistic7Generalized logistic6
Generalized extreme-value8Generalized extreme-value8
Generalized normal8Generalized normal8
Pearson type III6Pearson type III7
Generalized Pareto1Generalized Pareto1

Note: The fit distribution is acceptable as |Zdist| ≤ 1.64.

Table 6

Summary statistics of regional average L-moment ratios and estimated parameters of regional frequency distribution obtained by Ward’s clustering method

NameRegional averageParameters of GEVParameters of GNO

K10.1800.2300.2000.1080.840 (0.010)0.237 (0.010)−0.091 (0.044)0.927 (0.010)0.290 (0.010)−0.476 (0.061)
K20.1710.2040.1780.0680.852 (0.006)0.235 (0.008)−0.053 (0.028)0.938 (0.006)0.282 (0.008)−0.423 (0.039)
K30.1770.1880.1420.0730.850 (0.011)0.248 (0.012)−0.028 (0.052)0.941 (0.012)0.294 (0.013)−0.387 (0.070)
K40.1560.1480.1350.0340.873 (0.012)0.233 (0.018)0.034 (0.072)0.958 (0.013)0.267 (0.017)−0.305 (0.093)
K50.1830.2840.2460.1500.830 (0.013)0.219 (0.010)−0.170 (0.047)0.910 (0.013)0.280 (0.011)−0.592 (0.069)
K60.1890.2650.1730.0950.827 (0.020)0.234 (0.018)−0.142 (0.082)0.912 (0.021)0.295 (0.018)−0.551 (0.118)
K70.1580.1700.1660.0480.869 (0.008)0.227 (0.011)0.000 (0.043)0.952 (0.008)0.265 (0.011)−0.350 (0.058)
K80.1590.1920.1320.0680.864 (0.009)0.222 (0.011)−0.035 (0.046)0.945 (0.009)0.264 (0.011)−0.397 (0.063)
K90.2600.2410.1440.0680.766 (0.010)0.336 (0.011)−0.108 (0.032)0.889 (0.011)0.415 (0.011)−0.501 (0.045)
K100.1670.2070.1510.0450.855 (0.009)0.227 (0.011)−0.058 (0.045)0.939 (0.009)0.273 (0.011)−0.429 (0.062)

Note: In parentheses are the standard error calculated using bootstrap sampling technique.

  1. Bolar K (2019) Package 쁓TAT: Interactive Document for Working with Basic Statistical Analysis . Retrieved Jul. 7, 2023.
  2. Castillo E, Hadi AS, Balakrishnan N, and Sarabia JM (2005). Extreme Value and Related Models with Applications in Engineering and Science, Hoboken, NewJersy, Wiley-Interscience.
  3. Chaleeraktrakoon C, Sarochananjeen W, and Saengratwatchara K (2002). Probability distribution of annual flood data in Thailand. Engineering Journal of Research and Development, 13, 17-26.
  4. Coles S (2001). An Introduction to Statistical Modeling of Extreme Values, London, Springer.
  5. Dikbas F, Firat M, Koc AC, and Gungor M (2013). Defining homogeneous regions for streamflow processes in Turkey using a -means clustering method. Arabian Journal for Science and Engineering, 38, 1313-1319.
  6. Everitt BS, Landau S, Leese M, and Stahl D (2011). Cluster Analysis (5th ed), London, UK, King셲 College.
  7. Hamed K and Rao AR (2019). Flood Frequency Analysis, USA, CRC Press.
  8. Hosking JRM (1990). Array. Journal of the Royal Statistical Society: Series B, 52, 105-124.
  9. Hosking JRM (1994). The four-parameter kappa distribution. IBM Journal of Research and Development, 38, 251-258.
  10. Hosking JRM (2023) Package lmomRFA: Regional Frequency Analysis using L-moments. R package, version 3.5 .
  11. Hosking JRM and Wallis JR (1997). Regional Frequency Analysis, UK, Cambridge University Press.
  12. Hydro-Informatics Institute (HII) (2023) Basin characteristics data . Retrieved Jul. 7, 2023.
    Available from: https://www.hii.or.th/
  13. Jeong BY, Murshed MS, Seo YA, and Park JS (2014). A three-parameter kappa distribution with hydrologic application: A generalized Gumbel distribution. Stochastic Environmental Research and Risk Assessment, 28, 2063-2074.
  14. Kang D and Lim Y (2022). Clustering non-stationary advanced metering infrastructure data. Communications for Statistical Applications and Methods, 29, 225-238.
  15. Kassambara A and Mundt F (2020) Package 쁣actoextra: Extract and Visualize the Results of Multivariate Data Analyses . Retrieved Jul. 7, 2023.
  16. Kriengsiri P (1976). (A methodology for estimating the regional flood frequencies for northeastern Thailand (Dissertation)) , Oklahoma State University, Oklahoma.
  17. Lee Y, Yoon S, Murshed MS, Kim M-K, Cho C, Baek H-J, and Park J-S (2013). Spatial modeling of the highest daily maximum temperature in Korea via max-stable processes. Advances in Atmospheric Sciences, 30, 1608-1620.
  18. Lee S, Park S, and Lim Y (2022). Prediction of extreme PM concentrations via extreme quantile regression. Communications for Statistical Applications and Methods, 29, 319-331.
  19. Murshed MS, Kim S, and Park JS (2011). Beta-k distribution and its application to hydrologic events. Stochastic Environmental Research and Risk Assessment, 25, 897-911.
  20. Murshed MS, Seo YA, and Park JS (2014). LH-moment estimation of a four parameter kappa distribution with hydrologic applications. Stoch Environ Res Risk Assess, 28, 253-262.
  21. Naghettini M (2017). Fundamentals of Statistical Hydrology, Cham, Switzerland, Springer.
  22. Office of the National Water Resources (ONWR) (2021) 22 Basin in Thailand and Royal Decree Determining Watersheds B.E . Retrieved Jul. 7, 2023.
    Available from: http://www.onwr.go.th/
  23. Otiniano CEG, Paiva BSD, and Martins Netob DSB (2019). The transmuted GEV distribution: Properties and application. Communications for Statistical Applications and Methods, 26, 239-259.
  24. Park JS (2005). A simulation-based hyperparameter selection for quantile estimation of the generalized extreme value distribution. Mathematics and Computers in Simulation, 70, 227-234.
  25. Papukdee N, Park J-S, and Busababodhin P (2022). Penalized likelihood approach for the four-parameter kappa distribution. Journal of Applied Statistics, 49, 1559-1573.
    Pubmed KoreaMed CrossRef
  26. Prahadchai T, Shin Y, Busababodhin P, and Park JS (2023). Analysis of maximum precipitation in Thailand using non-stationary extreme value models. Atmospheric Science Letters, 24, e1145.
  27. Salinas JL, Castellarin A, Viglione A, Kohnov찼 S, and Kjeldsen TR (2014). Regional parent flood frequency distributions in Europe밣art 1: Is the GEV model suitable as a pan-European parent?. Hydrology and Earth System Sciences, 18, 4381-4389.
  28. Stein ML (2017). Should annual maximum temperatures follow a generalized extreme value distribution?. Biometrika, 104, 1-16.
  29. Thai Meteorological Department (TMD) (2021) General Climatic Conditions . Retrieved Jul. 17, 2023.
    Available from: https://www.tmd.go.th
  30. Vogel RM and Wilson I (1996). Probability distribution of annual maximum, mean, and minimum streamflows in the United States. Journal of Hydrologic Engineering, 1, 69-76.
  31. Wuttichaikitcharoen P (2015). Low-flow regionalization in the upper northern Thailand. Proceedings of the 3rd EIT International Conference on Water Resources Engineering (ICWRE3) Udon Thani, Thailand. , 1-9.
  32. Yoo C, Yoo Y, Um HY, and Yoo JK (2020). On hierarchical clustering in sufficient dimension reduction. Communications for Statistical Applications and Methods, 27, 431-443.
  33. Yoon S, Kumphon B, and Park JS (2015). Spatial modelling of extreme rainfall in northeast Thailand. Journal of Applied Statistics, 42, 1-16.