TEXT SIZE

CrossRef (0)
An alternative method for estimating lognormal means

Yeil Kwon1,a

aDepartment of Mathematics, University of Central Arkansas, USA
Correspondence to: 1 Department of Mathematics, University of Central Arkansas, 201 Donaghey Avenue, Conway, AR, 72035, USA. E-mail: ykwon1@uca.edu
Received January 7, 2021; Revised April 9, 2021; Accepted June 16, 2021.
Abstract

For a probabilistic model with positively skewed data, a lognormal distribution is one of the key distributions that play a critical role. Several lognormal models can be found in various areas, such as medical science, engineering, and finance. In this paper, we propose a new estimator for a lognormal mean and depict the performance of the proposed estimator in terms of the relative mean squared error (RMSE) compared with Shen’s estimator (Shen et al., 2006), which is considered the best estimator among the existing methods. The proposed estimator includes a tuning parameter. By finding the optimal value of the tuning parameter, we can improve the average performance of the proposed estimator over the typical range of σ2. The bias reduction of the proposed estimator tends to exceed the increased variance, and it results in a smaller RMSE than Shen’s estimator. A numerical study reveals that the proposed estimator has performance comparable with Shen’s estimator when σ2 is small and exhibits a meaningful decrease in the RMSE under moderate and large σ2 values.

Keywords : lognormal distribution, relative mean squared error, variance approximation, tuning parameter, consistent estimator
1. Introduction

Lognormal distribution has a central role to play in probabilistic modeling for the phenomena described by a skewed distribution of positive random variables in nature. One can find numerous examples of lognormal models, such as clinical pharmacokinetic studies (Lacey et al., 1997), failure rates in engineering (Ohring and Kasprzak, 1998), and stock prices in finance (Hull, 2018). Therefore, finding a precise and stable estimation method for the lognormal mean is crucial in lognormal modeling problems. In particular, an accurate estimation becomes much more critical when σ2 is large because the estimation error is inclined to escalate rapidly when σ2 becomes larger.

This paper focuses on the estimation method for the mean of the lognormal distribution. For decades, a few estimators for the lognormal mean have been proposed. The sample mean is one of the most popular estimators because of its simplicity and lack of bias. Although the maximum likelihood estimator (MLE) is not unbiased, it has desirable asymptotic properties, such as asymptotic normality and asymptotic efficiency. Finney (1941) proposed the uniformly minimum variance unbiased estimator (UMVUE) with a form of infinite series of the sample variance, and it has the minimum variance among all unbiased estimators. Evans and Shaban (1974), and Zhou (1998) proposed estimators based on Zellner’s conditional minimal mean squared error (MSE) estimator (Zellner, 1971). These are not unbiased but have markedly smaller MSEs than Finney’s UMVUE with small sample sizes. Shen et al. (2006) proposed an efficient estimator using second-order asymptotics, which improves the conditional minimal MSE estimators for large σ2 values. Longford (2009) proposed an estimator with a form of exp( + bσ̂2) and directly found b, achieving the lower bound of the MSE. Because b is a function of unknown σ2, is determined by replacing σ2 with the unbiased estimator for σ2.

In this research, we propose a new estimator of a specific form containing a tuning parameter k in it. This parameter k is determined by a value that minimizes the average relative mean squared error (RMSE) of the proposed estimator over the given range of σ2. The proposed estimator’s performance is compared with that of the existing estimators in terms of the RMSE. The numerical study reveals that the proposed estimator has comparable performance with the existing estimators when σ2 is small. However, it improves adequately the existing estimators when σ2 is moderate or large, regardless of the sample size.

Section 2 includes the literature review for the existing estimators using a lognormal mean. In Section 3, we propose a new estimator and address the properties of the new estimator. We report the numerical study results to describe the performance of the proposed estimator based on the RMSE in Section 4.

2. Existing estimators

Let X be a normal random variable with mean μ and variance σ2: X ~ N(μ,σ2). Then, Y = eX is lognormally distributed with parameters μ and σ2: Y ~ LN(μ,σ2). The mean of Y, θ, is a function of both μ and σ2,

$θ=E(Y)=eμ+σ22.$

Suppose X1, . . . , Xn is a random sample from N(μ,σ2). Then,$Yi=eXi∼iidLN(μ,σ2)$ for i = 1, . . . , n. We define three statistics,

$Y¯=1n∑i=1nYi, X¯=1n∑i=1nXi, and S2=∑i=1n(Xi-X¯)2.$

Let R(θ̂, θ) denote the relative mean squared error (RMSE), which is the risk of an estimator θ̂ for θ under the squared loss function L(θ̂, θ) = (θ̂/θ − 1)2,

$R (θ^,θ)=E(θ^θ-1)2.$

In this paper, the performance of each estimator is compared with the others in terms of the RMSE.

### 2.1. Sample mean and maximum likelihood estimator

In the research based on the lognormal distribution, the sample mean, θ̂sm = , is one of the most widely used estimators. The sample mean is popular because it is obtainable and unbiased. However, in a vast amount of the literature, the sample mean is significantly inefficient compared with other biased estimators with both small and large samples (Zhou, 1998; Shen et al., 2006; Longford, 2009).

However, the MLEs for μ and σ2 of the normal distribution N(μ,σ2) are given by and S2/n using the statistics defined in (2.2). By the invariant property of MLE, one can attain the MLE for θ : θ̂mle = exp( + S2/2n). The MLE has the preferable asymptotic properties of MLE, such as consistency, normality, and efficiency. In particular, when the sample size is large, the MLE has a comparable RMSE with the other existing estimators and is very easy to calculate. Zhou recommended using the MLE for large samples, such as n ≥200 (Zhou, 1998). However, θ̂mle performs worse than the sample mean when the sample size is small (see Figure 1).

### 2.2. Estimators based on the infinite series

There are several estimators for θ based on the infinite series of S2. Finney (1941) derived UMVUE with the joint complete and sufficient statistics (, S2) for (μ,σ2) as follows,

$θ^umvue=exp (X¯) g (S22),$

where

$g(t)=∑j=0∞Γ (n-12)j!Γ (n-12+j)(n-12nt)j.$

Although θ̂umvue has the smallest risk under the squared error loss among all unbiased estimators, an estimator with a lower MSE than the UMVUE of the lognormal mean exists (Rukhin, 1986). Let θ̂r denote an estimator of the form exp()r(σ2). Zellner (1971) found that conditioning on σ2, the MSE of θ̂r is minimized when r(σ2) = exp((n − 3)σ2/2n), and it results in a conditionally minimal MSE estimator of exp( + (n − 3)σ2/2n). Evans and Shaban (1974) and Zhou (1998) proposed the following estimators,

$θ^ES=exp (X¯) g (n-32(n-1)S2) and θ^Zhou=exp (X¯) g (n-42(n-1)S2),$

using unbiased estimators for exp((n − 3)σ2/2n) and exp((n − 4)σ2/2n), respectively. Figure 1 illustrates that θ̂ES and θ̂Zhou significantly improve θ̂umvue for small samples. With small sample size, θ̂Zhou has less risk than θ̂ES, but the two estimators are almost equivalent to each other when the sample size is large.

### 2.3. Shen’s efficient estimator

Shen et al. (2006) proposed a class of estimators for θ of the form θ̂c = exp( + cS2/2) with c = 1/(n + d) and d > −n. Let MU(t) denote the moment generate function of a random variable U. From the definition in (2.2), we have MXi(t) = E(etX1) = eμt+σ2t2/2, and $S2/σ2~χ(n-1)2$ with $Mχ(n-1)2(t)=E(etχ(n-1)2)=(1-2t)-(n-1)/2$. Thus,

$E (eX¯)=eμ+σ22n, E (e2X¯)=e2μ+2σ2n, and E (ecS2)=(1-2cσ2)-n-12.$

Using the results in (2.4), the squared error risk of θ̂c, R(θ̂c, θ) = E(θ̂c/θ − 1)2, can be shown as follows,

$R (θ^c,θ)=1θ2E [exp (X¯+cS22)-θ]2=exp {(2n-1) σ2}(1-2cσ2)-n-12-2 exp {(1n-1)σ22}(1-cσ2)-n-12+1, c<12σ2.$

With the standard expansion for c = 1/(n + d) = 1/nd/n2 + o(1/n2), R(θ̂c, θ) can be written as a function of d,

$R (θ^c,θ)=σ2n{1+σ22+σ24n[d2-(8+3σ2)d+8σ2+74σ4]}+o (1n2),$

which is minimized at d = 4 + 3σ2/2. Replacing the unknown σ2 with its unbiased estimator S2/(n − 1), we obtain the following,

$θ^Shen=exp (X¯+(n-1)S22(n+4)(n-1)+3S2).$

With small sample sizes and small values of σ2, the RMSE of θZhou is slightly lower than that of θShenShen et al. (2006). Except for such a case, θ̂Shen has a uniformly smaller RMSE than θ̂sm, θ̂mle, θ̂umvue, θ̂ES, and θ̂Zhou (see Figure 1).

### 2.4. Longford’s estimator

Longford (2009) proposed a family of estimators of the form θ̂b = exp( + bS2/(n − 1)) to estimate θa = exp(μ + 2) with a given constant a. The mean of the lognormal distribution, θ, in (2.1) is a special case of θa where a = 1/2. (The mode of the lognormal distribution is exp(μσ), which is a special case of θa where a = −1.) The relative mean squared error of θ̂b, is given by

$R (θ^b,θ)=exp {(2n-1) σ2}(1-4bσ2n-1)-n-12-2 exp {(1n-1)σ22}(1-2bσ2n-1)-n-12+1, b

The minimum of R(θ̂b, θ) can be determined by solving the following equation,

$∂R(θ^b)∂b=eσ2(2n-1) (n-12)(4σ2n-1) (1-4bσ2n-1)-n+12-2eσ22(1n-1) (n-12)(2σ2n-1) (1-2bσ2n-1)-n+12=0,$

and the solution is obtained as follows,

$b=n-12σ2·D-12D-1 with D=exp (n-3n(n+1)σ2).$

By substituting σ2 with S2/(n − 1), we have the Longford estimator as

$θ^Lf=exp (X¯+n-12·exp (n-3n(n2-1)S2)-12 exp (n-3n(n2-1)S2)-1).$

Using the Taylor expansion, θ̂Lf can be approximated as follows,

$θ^Lf≈exp (X¯+(n-1)S22n(n+1)(1+2n-3)+4S2).$

Figure 2 presents the risk comparison of θ̂Shen and θ̂Lf. First, two estimators are almost identical over σ2 ∈ (0, 5) with a large sample. Next, with small samples and small values of σ2, θ̂Lf performs a little better than θ̂Shen. However, for all other settings of n and σ2, θ̂Shen outperforms θ̂Lf. Thus, from Figures 1 and 2, we can consider θ̂Shen to be the best estimator of the existing methods in terms of the RMSE. Therefore, in this paper, we use θ̂Shen as a benchmark for evaluating the performance of the proposed estimator.

3. Proposed estimator

### 3.1. Analytic properties

We consider an estimator of the form, for k ≥1,

$θ^k=exp (X¯+(n-1)S22n(n+1)+kS2),$

with the squared error risk,

$R (θ^k,θ)=eσ2(2n-1)Q2(k)-2eσ22(1n-1)Q1(k)+1,$

where,

$Qi(k)=E [exp (i(n-1)S22n(n+1)+kS2)], i=1,2.$
Proposition 1.

$R (θ^k,θ)=E (θkθ-1)2→0$, for 0 < k < ∞

Proof:

Let θ̂c = exp( + cS2/2) . Shen et al. (2006) showed R(θ̂c, θ) → 0 when cn → 1. θ̂k can be written as the same form as θ̂c with c = (n − 1)/(n(n + 1) + kS2/2). Since cn = n(n − 1)/(n(n + 1) + kS2/2) → 1, we have R(θ̂k, θ) → 0.

Proposition 1 states the relative risk of θ̂k converges to zero as the sample size increases implying θ̂k is a consistent estimator for θ. With proposition 1, one can conclude that the bias and the variance of θ̂k converges to 0 as n→∞. Proposition 2 shows both θ̂k and θ̂Shen underestimate θ and, for k ≤ 3, the squared bias of θ̂k is uniformly smaller than that of θ̂Shen.

### Proposition 2

• (1) Both θ̂k and θ̂Shen are negatively biased: E(θ̂k) − θ < 0 and E(θ̂Shen) − θ < 0.

• (2) Bias2(θ̂k) < Bias2(θ̂Shen), for k ≤ 3.

• (3) Bias2(θ̂k) converges to zero as n→∞.

### Proposition 3

The variance of θ̂k is given by

$V (θ^k)=exp (2μ+σ2n) [exp (σ2n) Q2(k)-Q12(k)],$

where, for i = 1, 2,

$Qi(k)=E [exp (i(n-1)S22n(n+1)+kS2)]=∫0∞exp (iσ2(n-1)w2n(n+1)+kσ2w) g(w) dw,$

and g(w) is a p.d.f of chi-squared distribution with degree of freedom n−1. Moreover, V(θ̂k) converges to zero as n→∞.

### Proposition 4

For a large sample, the variance of θ̂k can be approximated by,

$V (θ^k)≈exp (2μ+2(n-1)2σ22n(n+1)+k(n-1)σ2) [σ2n+8σ4n2(n+1)2(n-1)3(2n(n+1)+kσ2(n-1))4],$

and the right hand side of (3.4) converges to zero as n→∞.

For a large sample, the approximated V(θ̂) can be estimated by substituting and S2/(n − 1) for μ and σ2, respectively as follows,

$V^(θ^k)≈exp (2X¯+2(n-1)S22n(n+1)+kS2) [S2n(n-1)+8n2(n+1)2(n-1)S4(2n(n+1)+kS2)4].$
Theorem 1

A positive value k = k*exists, at which R(θ̂k*, θ) is the unique minimum of R(θ̂k, θ) in (3.2), for n ≥3.

### 3.2. Choosing a tuning parameter k

Theorem 1 states that a unique value k = k* exists that minimizes the RMSE of the θ̂k in (3.1). Evidently, the k* value depends on σ2 and the sample size n; thus, the optimal k* can be obtained numerically, given σ2. However, because σ2 is unknown, we suggest the following method of determining k,

• (1) Let v0, v1, . . . , vI be values such that $σLB2=v0, where $σLB2$ and $σUB2$ are the upper and lower bounds of σ2 in which we are interested.

• (2) Let w1, . . . , wJ be positive integer values such that nLB = w1 < w2 < · · · < wj < · · · < wJ = nUB, where nLB and nUB are the upper and lower bounds of n in which we are interested.

• (3) Compute Rij(θ̂k, θ) for each discretized value of k over (1, K), where Rij(θ̂k, θ) is the RMSE of θ̂k in (3.2) with σ2 = vi and n = wj, for i = 1, . . . , I and j = 1, . . . , J. Here, K is a number large enough for finding the optimal value of k. Note that Rij(θ̂k, θ) includes the expected values Qi(k), i = 1, 2, in (3.3) and they will be calculated by using the numerical integration with (G.1) and (G.2) in Appendix G.

• (4) Determine k = for the proposed estimator θ̂Prop . is the value of k at which the average of Rij is minimized,

$k˜=arg mink>01IJ∑i=1I∑j=1JRij(θ^k,θ).$

Therefore, can be considered as the value of k that minimizes the average of RMSEs over $σLB2<σ2<σUB2$ and nLB < n < nUB.

It is worth to point out that we set $σUB2=5$. A number of researches such as Land (1972), Chelmow et al. (1995), Zhou (1997, 1998), and Shen et al. (2006) provide empirical evidence of σ2 < 5 for the majority of the lognormal data in practice. For this reason, many literature concerning the lognormal data presented the numerical study over σ2 ∈ (0, 5): Shen et al. (2006), Zhou (1997, 1998). The value of the skewness coefficient with σ2 = 5 corresponds to 1826.2, which is not commonly observed in real data (Zhou, 1998). Therefore, the proposed estimator based on obtained by using $σUB2=5$ can be applied for the majority of the lognormal data except for the extreme cases of σ2. Moreover, since all of the $σLB2, σUB2$, nLB, and nUB are predetermined as follows, is a generic value that does not depend on data,

• (a) Set $σLB2=0, σUB2=5$. The size of increment of σ2 is chosen as 0.1. Thus, v0 = 0 and vi = vi−1 + 0.1 for i = 1, . . . , I, with I = 50. The smaller increment size of σ2 enable us to have a fine grid in this optimizing process. However, we found that, from the numerical study, it could not make significant difference for determining .

• (b) Set nLB = 5, w0 = 0 and wj = wj−1 + 5, for j = 1, . . . , J with J = nUB/5. To investigate the effect of nUB for Rij(θ̂k, θ), we examined the cases of nUB = 10, 20, 30, 40, 50, 75, 100, and 125. When nUB is larger than 100, the effect of nUB for diminishes remarkably; thus, nUB = 100 (J = 20), is selected for finding . Moreover, with the same reason as (a), the increment of sample size is chosen as 5. The increment smaller than 5 does not show notable difference in Rij(θ̂k, θ) calculation.

• (c) As for K, we set k = 1.0, 1.1, 1.2, . . . , K, with K = 20. Figure 3 displays the graphs of k versus the average RMSE for each nUB considered in (b).

• (d) As a result, the k value for the proposed estimator is determined by = 3.5; thus, the proposed estimator is given as follows,

$θ^prop=exp (X¯+(n-1)S22n(n+1)+3.5S2).$

On the other hand, one can use the confidence interval of σ2 to check whether σ2 < 5 or not. Consider 100(1 − α)% one-sided confidence interval for σ2. Since $S2/σ2~χ(n-1)2$, the upper bound of the interval is given by $S2/χ1-α2$, where $χ1-α2$ is the 100α-th percentile of the chi-squared distribution with degree of freedom n − 1. If the upper bound of the confidence interval $S2/χ1-α2$ is less than 5, we can be highly confident that σ2 < 5. Note that $S2/χ1-α2<5$ is equivalent to $(sample variance)<5·χ1-α2/(n-1)$. With 95% confidence interval, $5·χ1-α2/(n-1)$ values are 1.97, 2.71, 3.08, 3.31, and 3.48 under the sample sizes n = 11, 21, 31, 41, and 51, respectively. Thus, for example, if [11 ≤ n ≤ 30 & the sample variance < 2], or [n > 30 & the sample variance < 3], it is recommendable to use the proposed estimator. All of the lognormal data used in the researches mentioned in section 3.2 have the sample variances less than 1.5.

4. Numerical study for risk comparison

In this section, we compare the performance of the proposed estimator with the existing estimators, θ̂textLf and θ̂Shen. The RMSEs of these are given by the following,

$R (θ^Lf ,θ)=e(2n-1)σ2L2-2eσ22(1n-1)L1+1,R (θ^Shen,θ)=e(2n-1)σ2F2-2eσ22(1n-1)F1+1,R (θ^Prop,θ)=e(2n-1)σ2G2-2eσ22(1n-1)G1+1,$

where, for i = 1, 2,

$Fi=E [exp (i(n-1)S22(n+4)(n-1)+3S2)],Li=E [exp (i(n-1)2·exp((n-3)S2/n(n2-1))-12 exp((n-3)S2/n(n2-1))-1)],Gi=E [exp (i(n-1)S22n(n+1)+3.5S2)],$

respectively. In the research based on lognormal data, σ2 are typically observed in (0, 5) (Shen et al., 2006; Zhou, 1998). Thus, in this numerical study, the values of σ2 are selected as 0.1, 0.2, . . . , 4.9, 5.0. We also take the sample size n to be 8, 12, 20, 30, 50, 75, 100, and 120. In addition, μ is set as μ = −σ2/2 throughout the numerical illustration.

First, R(θ̂Shen, θ) and R(θ̂Prop, θ) are compared in terms of the squared bias, variance, and RMSE. With μ = −σ2/2, the RMSE can be written as the sum of the squared bias and variance. The RMSE of the estimators in (4.1) can be calculated using numerical integration because S2/σ2 is a χ2 random variable with the degrees of freedom n − 1. The upper panel of Figure 4 displays

$Bias2(θ^Shen)-Bias2(θ^Prop), V (θ^Shen)-V (θ^Prop), R (θ^Shen,θ)-R (θ^Prop,θ),$

from the left to the right. The lower panel of Figure 4 displays,

$Bias2 (θ^Shen)Bias2 (θ^Prop), V (θ^Shen)V (θ^Prop), R (θ^Shen,θ)R (θ^Prop,θ),$

from the left to the right, respectively. Figure 4 displays the squared bias of θ̂Prop, which is uniformly smaller than that of θ̂Shen, and the variance of θ̂Prop is larger than that of θ̂Shen over σ2 ∈ (0, 5). Compared to θ̂Shen, the proposed estimator makes a greater bias reduction than the variance increase and results in a decrease in the RMSE. Specifically, R(θ̂Prop, θ) is smaller than R(θ̂Shen, θ) when σ2 > 1, and R(θ̂Shen, θ) is smaller than R(θ̂Prop, θ) when 0 < σ2 < 1. However, we can observe that the improvement in the RMSE caused by the proposed estimator over σ2 > 1 is much greater than the deterioration of the RMSE over 0 < σ2 < 1.

Next, the RMSEs of θ̂Lf, θ̂Shen, and θ̂Prop are compared under various distributions of σ2 including uniform, gamma, inverse gamma, and mixture gamma distributions. All of the distributions have the same mean, E(σ2) = 1, and N = 10,000 values of σ2 are randomly generated from each distribution. For each value of σ2 and μ = −σ2/2, random numbers y1, y2, . . . , yn are generated from the lognormal distribution with sample sizes 8, 12, 20, 30, 50, and 100. Based on the selected sample, we calculated θ̂Lf, θ̂Shen, θ̂Prop, and their RMSEs. Table 1 lists the RMSE values for all the scenarios. For all cases, θ̂Prop uniformly outperforms the other two estimators regardless of the sample size n.

5. Conclusion

In this research, we proposed a new estimator, which is obtained by finding a tuning parameter to make the smallest average RMSE over a specific range of σ2 values. The new estimator is primarily compared with Shen’s estimator because it has been proven to almost uniformly outperform the other existing estimators, such as the sample mean, MLE, UMVUE, and estimators based on the conditional minimal MSE estimators. The proposed estimator exhibits comparable performance with Shen’s estimator when σ2 is small (σ2 < 1) and demonstrates purposeful improvement with moderate and large σ2 values in terms of the RMSE. Moreover, because the proposed estimator has a simpler form than the competing estimators, it is easy to compute in a real data analysis. Therefore, we suggest using the proposed estimator for the lognormal mean, unless one has strong evidence that σ2 of the given lognormal data is very large.

Appendix A: Lemma 1

### Lemma 1

Lemma 1

For i = 1, 2,

$lim n → ∞ Q i ( k ) = exp ( i σ 2 2 ) .$Proof:

Let Un = iS 2/(n − 1), Tn = 2n(n + 1)/(n − 1)2 + kS 2/(n − 1)2 and Zn = Un/Tn. Since S 2/(n − 1) → σ2, we have $U n → p i σ 2 , T n → p 2$, and $Z n → p i σ 2 / 2$ by Slutsky’s theorem. Note that Qi(k) = E[exp(Zn)] and exp(·) is continuous. Furthermore, since Zn is a convergent sequence, it is bounded, and so is exp(Zn). Now, since Zn is bounded and converges to 2/2, by Portmanteau lemma,

$lim n → ∞ Q i ( k ) = lim n → ∞ E [ exp ( Z n ) ] = E [ exp ( lim n → ∞ Z n ) ] = exp ( i σ 2 2 ) .$
Appendix B: Proof of Proposition 1
Proof:

The proof of Proposition 1 is provided in Section 3.1. In Appendix B, we present alternative proof of Proposition 1 using Lemma 1. Recall, from (3.2), R(θ̂k, θ) = eσ2(2/n−1)Q2(k) −2eσ2(1/n−1)/2Q1(k)+ 1. Since limn→∞ Q2(k) = eσ2 and limn→∞ Q1(k) = eσ2/2,

$lim n → ∞ R ( θ ^ k , θ ) = lim n → ∞ [ e σ 2 ( 2 n - 1 ) Q 2 ( k ) - 2 e σ 2 2 ( 1 n - 1 ) Q 1 ( k ) + 1 ] = lim n → ∞ e σ 2 ( 2 n - 1 ) lim n → ∞ Q 2 ( k ) - 2 lim n → ∞ e σ 2 2 ( 1 n - 1 ) lim n → ∞ Q 1 ( k ) + 1 = e 2 σ 2 n - σ 2 e σ 2 - 2 e σ 2 2 n - σ 2 2 e σ 2 2 + 1 = 0.$
Appendix C: Proof of Proposition 2
Proof:

From (2.2), we have E(e) = θeσ2(1−n)/2n, and note that (n − 1)S 2/(2n(n + 1) + kS 2) = (n − 1)/k − 2(n − 1)n(n + 1)/k(2n(n + 1) + kS 2). Thus, for k ≥1,

$E ( θ ^ k ) - θ = E [ exp ( X ¯ + ( n - 1 ) S 2 2 n ( n + 1 ) + k S 2 ) ] - θ = θ ( exp ( σ 2 ( 1 - n ) 2 n ) exp ( n - 1 k ) E [ exp ( - 2 ( n - 1 ) n ( n + 1 ) k ( 2 n ( n + 1 ) + k S 2 ) ) ] - 1 ) ≤ θ ( exp ( σ 2 ( 1 - n ) 2 n ) exp ( n - 1 k ) exp ( - 2 ( n - 1 ) n ( n + 1 ) k ( 2 n ( n + 1 ) + k E ( S 2 ) ) ) - 1 ) = θ ( exp ( σ 2 ( 1 - n ) 2 n ) exp ( n - 1 k ) exp ( - 2 ( n - 1 ) n ( n + 1 ) k ( 2 n ( n + 1 ) + k σ 2 ( n - 1 ) ) ) - 1 ) = θ ( exp ( σ 2 ( 1 - n ) 2 n ) exp ( n - 1 k ( 1 - 2 n ( n + 1 ) 2 n ( n + 1 ) + k σ 2 ( n - 1 ) ) ) - 1 ) = θ ( exp ( - σ 2 ( n - 1 ) 2 ( 4 n / ( n - 1 ) + k σ 2 ) 2 n ( 2 n ( n + 1 ) + k σ 2 ( n - 1 ) ) ) - 1 ) < 0.$

The inequality of (C.1) comes from Jensen’s inequality. We can show E(θ̂Shen) −θ < 0 in the same way. Moreover, since σ2(n − 1)2(4n/(n − 1) + 2)/2n(2n(n + 1) + 2(n − 1)) → 0, we have Bias(θ̂k) → 0, as n → ∞. Now, for k ≤ 3 and given S 2 = s2, we have 2(n + 4)(n − 1) + 3s2 > 2n(n + 1) + ks2, which results in Pr (θ̂Shen < θ̂k | S2) = 1. Hence, E(θ̂Shenθ) < E(θ̂kθ) < 0.

Appendix D: Proof of Proposition 3
Proof: $V ( θ ^ k ) = E ( θ ^ k 2 ) - E 2 ( θ ^ k ) = E [ exp ( X ¯ + ( n - 1 ) S 2 2 n ( n + 1 ) + k S 2 ) ] 2 - E 2 [ X ¯ + exp ( ( n - 1 ) S 2 2 n ( n + 1 ) + k S 2 ) ] = E [ exp ( 2 X ¯ ) ] E [ exp ( 2 ( n - 1 ) S 2 2 n ( n + 1 ) + k S 2 ) ] - E 2 [ exp ( X ¯ ) ] E 2 [ exp ( ( n - 1 ) S 2 2 n ( n + 1 ) + k S 2 ) ] = exp ( 2 μ + 2 σ 2 n ) Q 2 ( k ) - exp ( 2 μ + σ 2 n ) Q 1 2 ( k ) = exp ( 2 μ + σ 2 n ) [ exp ( σ 2 n ) Q 2 ( k ) - Q 1 2 ( k ) ] ,$

where Qi(k) = E[exp(i(n − 1)S 2/(2n(n + 1) + kS 2))], for i = 1, 2. Let $W ~ χ ( n - 1 ) 2$ and g(w) be a probability density function of W. Since $S 2 / σ 2 ~ χ ( n - 1 ) 2$,

$Q i ( k ) = E [ exp ( i σ 2 ( n - 1 ) W 2 n ( n + 1 ) + k σ 2 W ) ] = ∫ 0 ∞ exp ( i σ 2 ( n - 1 ) w 2 n ( n + 1 ) + k σ 2 w ) g ( w ) d w .$

Furthermore, since limn→∞ Qi(k) = exp(2/2). by Lemma 1,

$lim n → ∞ [ exp ( σ 2 n ) Q 2 ( k ) - Q 1 2 ( k ) ] = e 0 e σ 2 - ( e σ 2 2 ) 2 = 0 ,$

resulting in V (θ̂k) → 0 as n→∞.

Appendix E: Proof of Proposition 4
Proof:

Let

$U n = [ X ¯ S n - 1 2 ] , η = [ μ σ 2 ] , and ψ ( w ) = exp ( a + ( n - 1 ) b 2 n ( n + 1 ) / ( n - 1 ) + k b ) ,$

where $S n - 1 2 = S 2 / ( n - 1 )$, and w = [a, b]T. Note that,

$ψ ( U n ) = exp [ X ¯ + ( n - 1 ) S n - 1 2 2 n ( n + 1 ) / ( n - 1 ) + k S n - 1 2 ] = exp [ X ¯ + ( n - 1 ) S 2 2 n ( n + 1 ) + k S 2 ] = θ ^ k ,$

and

$ψ ( η ) = exp [ μ + ( n - 1 ) σ 2 2 n ( n + 1 ) / ( n - 1 ) + k σ 2 ] .$

Since and $S n - 1 2$ are independent, $Cov ( U n ) = [ σ 2 / n 0 0 2 σ 4 / ( n - 1 ) ]$ and the gradiant of ψ(η) is given by

$∇ ψ ( η ) = [ ψ ( η ) , ψ ( η ) · 2 n ( n + 1 ) ( n - 1 ) 2 ( 2 n ( n + 1 ) + k σ 2 ( n - 1 ) ) 2 ] T .$

Since $X ¯ → p μ$ and $S n - 1 2 → p σ 2$, the first order Tayor expansion of ψ(Un) is

$ψ ( U n ) = ψ ( η ) + [ ∇ ψ ( η ) ] T ( U n - η ) + o p ( 1 ) ,$

and the variance of θ̂k can be approximated by,

$V ( θ ^ k ) = V [ ψ ( U n ) ] ≈ [ ∇ ψ ( η ) ] T Cov ( U n ) [ ∇ ψ ( η ) ] = ψ 2 ( η ) [ σ 2 n + 2 σ 4 n - 1 · 4 n 2 ( n + 1 ) 2 ( n - 1 ) 4 ( 2 n ( n + 1 ) + k σ 2 ( n - 1 ) ) 4 ] .$

Since ψ(η) → θ, σ2/n → 0, 2σ4/(n − 1) → 0 and 4n2(n+1)2(n−1)4/(2n(n+1)+2(n−1))4 → 1/4 , we have limn→∞ V [ψ(Un)] = 0.

Appendix F: Lemma 2
Lemma 2

Let Y be a chi-squared random variable with a probability density function g(y). Define h(w), c2 and c1 as,

$h ( w ) = σ 2 ( n - 1 ) w 2 n ( n + 1 ) , c 2 = e σ 2 ( 2 n - 1 ) , c 1 = e σ 2 2 ( 1 n - 1 ) .$

Then, for n ≥3,

$∫ 0 ∞ ( c 1 c 2 - e h ( w ) ) g ( w ) d w < 0.$Proof: $∫ 0 ∞ ( c 1 c 2 - e h ( w ) ) g ( w ) d w = ∫ 0 ∞ ( e n - 3 2 n σ 2 - e σ 2 ( n - 1 ) w 2 n ( n + 1 ) ) g ( w ) d w = e n - 3 2 n σ 2 - ∫ 0 ∞ e σ 2 ( n - 1 ) w 2 n ( n + 1 ) g ( w ) d w = ∑ j = 0 ∞ 1 j ! ( n - 3 2 n σ 2 ) j - ∫ 0 ∞ ∑ j = 0 ∞ ( σ 2 ( n - 1 ) w 2 n ( n + 1 ) ) j g ( w ) d w = ∑ j = 0 ∞ 1 j ! ( n - 3 2 n σ 2 ) j - ∑ j = 0 ∞ 1 j ! ( σ 2 ( n - 1 ) 2 n ( n + 1 ) ) j E ( W j ) < 0.$

The inequality of the last part of (F.1) can be demonstrated by mathematical induction. Let A( j) and B( j) be the jth term of

$∑ j = 0 ∞ 1 j ! ( ( n - 3 ) σ 2 2 n ) j and ∑ j = 0 ∞ 1 j ! ( ( n - 1 ) σ 2 2 n ( n + 1 ) ) j E ( W j ) ,$

respectively. Note that, $E ( W j ) = ∏ r = 1 j ( n - 3 + 2 r ) = ( n - 1 ) ( n + 1 ) ( n + 3 ) ⋯ ( n - 3 + 2 j )$. This implies E(Wj+1) = E(Wj)(n + 2 j − 1). Clearly, A(0) = B(0). When j = 1,

$A ( 1 ) = ( n - 3 ) σ 2 2 n < B ( 1 ) = ( n - 1 ) σ 2 2 n ( n + 1 ) E ( W ) = ( n - 1 ) 2 σ 2 2 n ( n + 1 ) = σ 2 2 n ( n - 3 + 4 n + 1 ) .$

For j ≥2, suppose A( j) < B( j). Then, we obtain the following result:

$A ( j + 1 ) B ( j + 1 ) = A ( j ) B ( j ) · ( n - 3 ) ( n + 1 ) ( n - 1 ) ( n + 2 j - 1 ) < n + 1 n + 2 j - 1 < 1.$

Therefore, A(0) = B(0) and A( j) < B( j), for all j ≥1, resulting in $∑ j = 0 ∞ A ( j ) - ∑ j = 0 ∞ B ( j ) < 0$.

Appendix G: Proof of Theorem 1
Proof:

From the definition of S 2 in (2.2), $W = S 2 / σ 2 ~ χ ( n - 1 ) 2$. Then, with g(w), which is a p.d.f. of the chi-squared distribution with degrees of freedom n − 1, Q1(k) in (3.3) given by the following,

$Q 1 ( k ) = E [ exp ( ( n - 1 ) S 2 2 n ( n + 1 ) + k S 2 ) ] = ∫ 0 ∞ exp ( σ 2 ( n - 1 ) w 2 n ( n + 1 ) + k σ 2 w ) g ( w ) d w = ∫ 0 ∞ q 1 ( w , k ) d w ,$

where

$q 1 ( w , k ) = exp ( σ 2 ( n - 1 ) w 2 n ( n + 1 ) + k σ 2 w ) g ( w ) .$

Note that q1(w, k) is a continuous function, and g(w) = w(n−1)/2−1ew/2/(Γ ((n − 1)/2) · 2(n−1)/2) ≤ 1/2 for n ≥3. First, we demonstrate that q1(w, k) and its improper integral are bounded. By deleting 2n(n + 1) from q1(w, k), we obtain the following,

$∣ q 1 ( w , k ) ∣ ≤ e n - 1 k g ( w ) ≤ 1 2 e n - 1 k and ∫ 0 ∞ ∣ q 1 ( w , k ) ∣ d w ≤ ∫ 0 ∞ e n - 1 k g ( w ) d w = e n - 1 k .$

Furthermore, ∂q1(w, k)/∂k, which is continuous, and its improper integral is bounded, because,

$| ∂ q 1 ( w , k ) ∂ k | = σ 4 ( n - 1 ) w 2 ( 2 n ( n + 1 ) + k σ 2 w ) 2 q 1 ( w , k ) ≤ n - 1 k 2 q 1 ( w , k ) ≤ n - 1 2 k 2 e n - 1 k ,$

and

$∫ 0 ∞ | ∂ q 1 ( w , k ) ∂ k | d w ≤ ∫ 0 ∞ n - 1 k 2 q 1 ( w , k ) d w ≤ n - 1 k 2 e n - 1 k .$

Similarly, we can apply the same argument to q2(w, k) to demonstrate that q2(w, k), ∂q2(w, k)/∂k, and their improper integrals are bounded. To simplify the expressions, we define

$h ≡ h ( w , k ) = σ 2 ( n - 1 ) w 2 n ( n + 1 ) + k σ 2 w , c 2 = e σ 2 ( 2 n - 1 ) , c 1 = e σ 2 2 ( 1 n - 1 ) ,$

with

$h ′ = - h 2 ( n - 1 ) , h ″ = - 2 n - 1 h h ′ .$

Now, we can obtain the first and second derivatives of R(θ̂k, θ) by interchanging the integral and derivative because, for i = 1, 2, all qi(w, k), ∂qi(w, k)/∂k, and their improper integrals are bounded,

$∂ R ( θ ^ k , θ ) ∂ k = ∂ ∂ k ∫ 0 ∞ c 2 e 2 h ( w , k ) g ( w ) d w - ∂ ∂ k ∫ 0 ∞ 2 c 1 e h ( w , k ) g ( w ) d w = ∫ 0 ∞ ∂ ∂ k c 2 e 2 h ( w , k ) g ( w ) d w - ∫ 0 ∞ ∂ ∂ k 2 c 1 e h ( w , k ) g ( w ) d w = ∫ 0 ∞ 2 c 2 h ′ ( w , k ) e 2 h ( w , k ) g ( w ) d w - ∫ 0 ∞ 2 c 1 h ′ ( w , k ) e h ( w , k ) g ( w ) d w = ∫ 0 ∞ 2 c 2 h ′ ( w , k ) e h ( w , k ) ( e h ( w , k ) - c 1 c 2 ) g ( w ) d w = ∫ 0 ∞ 2 c 2 n - 1 [ h ( w , k ) ] 2 e h ( w , k ) ( c 1 c 2 - e h ( w , k ) ) g ( w ) d w .$

Because 2c2[h(w, k)]2eh(w,k)/(n − 1) > 0, and is bounded for w > 0, the sign of ∂R(θ̂k, θ)/∂k depends on the sign of $∫ 0 ∞ ( c 1 / c 2 - e h ( w , k ) ) g ( w ) d w$. We can show that 1 ≤ c1/c2 = eσ2(n−3)/2n < eσ2/2, and limk→∞ eh(w,k) = 1, implying that limk→∞R(θ̂k, θ)/∂k > 0. However, for k = 0, n ≥3, by Lemma 2, we have the following,

$∫ 0 ∞ ( c 1 c 2 - e h ( w , 0 ) ) g ( w ) d w < 0 ,$

implying ∂R(θ̂k, θ)/∂k|k=0 < 0. Furthermore, we define r1(k) and r2(k) as

$r 1 ( k ) = ∫ 0 ∞ 2 c 2 h ′ ( w , k ) e 2 h ( w , k ) g ( w ) d w and r 2 ( k ) = - ∫ 0 ∞ 2 c 1 h ′ ( w , k ) e h ( w , k ) g ( w ) d w ,$

respectively. Then, the third line of (G.5) states ∂R(θ̂k, θ)/∂k = r1(k) + r2(k). Recall that h″(w, k) > 0 from (G.3) and (G.4) because h > 0. Now, we observe that r1(k) is an increasing function, and r2(k) is a decreasing function because

$∂ ∂ k r 1 ( k ) = ∫ 0 ∞ 2 ( c 2 h ″ ( w , k ) + 2 c 2 ( h ′ ( w , k ) ) 2 ) e 2 h ( w , k ) g ( w ) d w > 0$

and

$∂ ∂ k r 2 ( k ) = - ∫ 0 ∞ 2 ( c 1 h ″ ( w , k ) + c 1 ( h ′ ( w , k ) ) 2 ) e h ( w , k ) g ( w ) d w d w < 0.$

This implies that a unique critical point k = k* exists,

$∫ 0 ∞ c 2 h ′ ( w , k * ) e 2 h ( w , k * ) g ( w ) d w = ∫ 0 ∞ c 1 h ′ ( w , k * ) e h ( w , k * ) g ( w ) d w .$

Moreover, we have the following

$∂ 2 R ( θ ^ k , θ ) ∂ k 2 = ∫ 0 ∞ 2 c 2 ( h ″ ( w , k ) + 2 [ h ′ ( w , k ) ] 2 ) e 2 h ( w , k ) g ( w ) d w - ∫ 0 ∞ 2 c 1 ( h ″ ( w , k ) + [ h ′ ( w , k ) ] 2 ) e h ( w , k ) g ( w ) d w = ∫ 0 ∞ 2 c 2 ( 2 n - 1 h ( w , k ) h ′ ( w , k ) + 2 [ h ′ ( w , k ) ] 2 ) e 2 h ( w , k ) g ( w ) d w - ∫ 0 ∞ 2 c 1 ( 2 n - 1 h ( w , k ) h ′ ( w , k ) + [ h ′ ( w , k ) ] 2 ) e h ( w , k ) g ( w ) d w = ∫ 0 ∞ 2 c 2 h ′ ( w , k ) e 2 h ( w , k ) ( 2 n - 1 h ( w , k ) + 2 h ′ ( w , k ) ) g ( w ) d w - ∫ 0 ∞ 2 c 1 h ′ ( w , k ) e h ( w , k ) ( 2 n - 1 h ( w , k ) + h ′ ( w , k ) ) g ( w ) d w ,$

and at k = k*,

$∂ 2 R ( θ ^ k , θ ) ∂ k 2 | k = k * = ∫ 0 ∞ 2 c 2 h ′ ( w , k * ) e 2 h ( w , k * ) ( 2 n - 1 h ( w , k * ) + 2 h ′ ( w , k * ) ) g ( w ) d w - ∫ 0 ∞ 2 c 2 h ′ ( w , k * ) e 2 h ( w , k * ) ( 2 n - 1 h ( w , k * ) + h ′ ( w , k * ) ) g ( w ) d w = ∫ 0 ∞ 2 c 2 [ h ′ ( w , k * ) ] 2 e 2 h ( w , k * ) g ( w ) d w > 0.$

Therefore, a point k = k* exists, at which R(θ̂k, θ) has a unique minimum.

Figures
Fig. 1. Risk comparison of the existing estimators with various sample sizes. [Upper] Relative mean squared error (RMSE) of six estimators.[Lower]Ratio of RMSE, R(θ̂*, θ)/R(θ̂Shen, θ), where θ̂* denotes each of θ̂sm, θ̂mle, θ̂umvue, θ̂ES, and θ̂Zhou. Overall, θ̂Shen has the smallest RMSE compared with the other five estimators.
Fig. 2. Risk comparison between the estimators of Longford and Shen. [Upper] Difference in the Relative mean squared error (RMSE) R(θ̂Lf, θ) − R(θ̂Shen, θ). [Lower] Ratio of the RMSE, R(θ̂Lf, θ)/R(θ̂Shen, θ).
Fig. 3. Average relative mean squared error (RMSE) where , and nLB = 5. Each graph presents the average RMSE with a different setting of nUB. The optimal k values are 3.3, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, and 3.8 from the left-top to the right-bottom.
Fig. 4. Risk comparison between θ̂Shen and θ̂Prop with small sample sizes, n = 8, 12, 20 and 30. Upper: Bias2(θ̂Shen) − Bias2(θ̂Prop), V(θ̂Shen) − V(θ̂Prop), and RMS E(θ̂Shen, θ) − RMS E(θ̂Prop, θ). Lower: Bias2(θ̂Shen)/Bias2(θ̂Prop), V(θ̂Shen)/V(θ̂Prop), and RMS E(θ̂Shen)/RMS E(θ̂Prop).
TABLES

### Table 1

Relative mean squared error (RMSE) of θ̂Lf, θ̂Shen, and θ̂Prop under various distributions of σ2

Distribution of σ2Est·n = 8n = 12n = 20n = 30n = 50n = 100
Uniform (0, 2)θ̂Lf0.132080.098330.065460.046280.029540.01562
θ̂Shen0.130050.097300.065120.046180.029520.01562
θ̂Prop0.129340.096610.064670.046040.029470.01562
Gamma (1, 1)θ̂Lf0.132160.098480.068780.049780.033190.01790
θ̂Shen0.129440.096570.068010.049510.033150.01790
θ̂Prop0.128510.095170.066800.048820.032890.01783
Gamma (2, 2)θ̂Lf0.133880.099220.065660.047300.030660.01622
θ̂Shen0.132040.097850.065190.047130.030640.01621
θ̂Prop0.131560.096900.064510.046760.030530.01620
Inv. Gam. (3, 2)θ̂Lf0.130910.096930.066400.047570.030590.01657
θ̂Shen0.128940.095700.065940.047380.030540.01656
θ̂Prop0.128670.095100.065500.047090.030410.01652
Inv. Gam. (6, 5)θ̂Lf0.132230.097070.064320.045180.028740.01529
θ̂Shen0.130710.096080.064020.045090.028710.01528
θ̂Prop0.130710.095690.063840.045040.028670.01528
Gamma Mixture 0.5*G(4,1/5)+0.5*G(16,1/10)θ̂Lf0.133060.098020.066790.046910.030450.01633
θ̂Shen0.130800.096560.066410.046760.030440.01633
θ̂Prop0.129850.095320.065760.046380.030390.01633

*Inv.Gam = Inverse gamma.

References
1. Chelmow D, Penzias AS, Kaufman G, and Cetrulo C (1995). Costs of triplet pregnancy. American Journal of Obstetrics & Gynecology, 172, 677-682.
2. Evans IG and Shaban SA (1974). A note on estimation in lognormal models. Journal of the American Statistical Association, 69, 779-781.
3. Finney DJ (1941). On the distribution of a variate whose logarithm is normally distributed. Supplement to the Journal of the Royal Statistical Society, 7, 144-161.
4. Hull JC (2018). Options, Futures, and Other Derivatives, New York, Pearson.
5. Lacey LF, Keene ON, Pritchard JF, and Bye A (1997). Common noncompartmental pharmacokinetic variables: are they normally or log-normally distributed?. Journal of Biopharmaceutical Statistics, 7, 171-178.
6. Land C (1972). An evaluation of appropriate confidence interval methods for lognormal means. Technometrics, 14, 145-158.
7. Longford NT (2009). Inference with the lognormal distribution. Journal of Statistical Planning and Inference, 139, 2329-2340.
8. Ohring M and Kasprzak L (1998). Reliability and Failure of Electronic Materials and Devices, San Diego, Academic Press.
9. Rukhin AL (1986). Improved estimation in lognormal models. Journal of the American Statistical Association, 81, 1046-1049.
10. Shen H, Brown LD, and Zhi H (2006). Efficient estimation of log-normal means with application to pharmacokinetic data. Statistics in Medicine, 25, 3023-3038.
11. Zellner A (1971). Bayesian and non-Bayesian analysis of the log-normal distribution and log-normal regression. Journal of the American Statistical Association, 66, 327-330.
12. Zhou XH, Hui SL, and Gao S (1997). Methods for comparing the means of two independent lognormal samples. Biometrics, 53, 1129-1135.
13. Zhou XH (1998). Estimation of the log-normal mean. Statistics in Medicine, 17, 2251-2264.