TEXT SIZE

CrossRef (0)
Optimal designs for small Poisson regression experiments using second-order asymptotic

S. Mehr Mansoura, M. Niaparast1,a

aRazi University, Department of Statistics, Iran
Correspondence to: 1Razi University, Department of Statistics, University Avenue, Taghe Bostan, Kermanshah, Iran.
E-mail: Niaparast@razi.ac.ir
Received March 9, 2019; Revised May 19, 2019; Accepted August 14, 2019.
Abstract
This paper considers the issue of obtaining the optimal design in Poisson regression model when the sample size is small. Poisson regression model is widely used for the analysis of count data. Asymptotic theory provides the basis for making inference on the parameters in this model. However, for small size experiments, asymptotic approximations, such as unbiasedness, may not be valid. Therefore, first, we employ the second order expansion of the bias of the maximum likelihood estimator (MLE) and derive the mean square error (MSE) of MLE to measure the quality of an estimator. We then define DM-optimality criterion, which is based on a function of the MSE. This criterion is applied to obtain locally optimal designs for small size experiments. The effect of sample size on the obtained designs are shown. We also obtain locally DM-optimal designs for some special cases of the model.
Keywords : bias, DM-optimal criterion, mean square error, Poisson regression model
1. Introduction

The validity of an experiment depends on the appropriate experimental design that can be elucidated by using the theory of optimum experimental design. A convenient framework to provide the ideal design is to select a set of control variables so that it leads to the minimum of the variance of parameter estimator. There is an increasing number of experiments that a linear model cannot adequately indicate the fundamental features of the data.

Observational studies involve many situations in which the outcome of an experiment is countable. An appropriate model to describe such data is the Poisson regression model, that can be found in McCullagh and Nelder (1989). Unfortunately, there are difficulties with the optimal design problem in such models. These difficulties are caused by likelihood equations that are usually the nonlinear functions of the parameters. Therefore, the maximum likelihood estimator (MLE) cannot be solved analytically. Under such circumstances, numerical methods such as Fisher scoring are employed to obtain MLE (Searle et al., 1992). Due to the lack of a closed form for MLE, variance-covariance matrix cannot be directly calculated. However, MLE has substantial asymptotic properties despite this limitation. According to the large-sample theory, MLE is asymptotically unbiased and the variance of MLE is approximated by the inverse of the Fisher information matrix. Optimal designs for Poisson regression models have been done by Wang et al. (2006a, b), Russell et al. (2009b). In these literature references, optimal designs are obtained based on the maximization of a convex function of the Fisher information matrix, which follow the asymptotic theory. An important question rising here is, whether the asymptotic properties establish when the sample size is not large enough. Numerical results show that the asymptotic approximations do not hold for small size experiments. Therefore, the mentioned problem may affect the accuracy of the obtained optimal designs.

For non-linear models, the small-sample differential-geometric approach based on the least squares estimator is deemed to improve the experiment conclusions (Pronzato and Pazman, 2013). Russell et al. (2009a) considered a locally optimal design for the simple logistic regression model in small size experiments. They used the maximum penalized likelihood, which has been studied by Firth (1993), for estimation of parameters and obtained locally optimal designs for two support points of the designs. Poursina and Talebi (2013) proposed a modified D-optimality criterion based on the Bhattacharyya lower bound for two parameters simple logistic model. They presented an intuitive proof and demonstrated that modified optimal designs are more efficient than the previous optimal designs. Khuri et al. (2006) derived the mean square error (MSE) for the linear predictor, and consider optimal design for the generalized linear models (GLMs). Mehr Mansour and Niaparast (2019) have also shown a numerical study on optimal designs for small sample size in logistic regression model.

In the present work, we provide the discussion of the higher-order asymptotic expansion of the bias of the MLE for the Poisson regression model. In order to obtain accurate results in small size experiments, it is preferable that the variance and the bias are considered together to obtain optimal designs (Box and Draper, 1959). To fulfil this, we primarily introduce the Poisson regression model and review the asymptotic expansion of MLE for this model. We then discuss the bias of MLE for the Poisson regression model and obtain the MSE by using the second-order expansion of MLE. In Section 3, we define DM-optimality criterion, and then study optimal designs for the special cases of the Poisson regression model based on the minimization of the MSE.

2. Poisson regression model

The validity of an experiment depends on the appropriate experimental design that can be elucidated by using the theory of optimum experimental design. A convenient framework to provide the ideal design is to select a set of control variables so that it leads to the minimum of the variance of parameter estimator. There is an increasing number of experiments that a linear model cannot adequately indicate the fundamental features of the data.

Observational studies involve many situations in which the outcome of an experiment is countable. An appropriate model to describe such data is the Poisson regression model, that can be found in McCullagh and Nelder (1989). Unfortunately, there are difficulties with the optimal design problem in such models. These difficulties are caused by likelihood equations that are usually the nonlinear functions of the parameters. Therefore, the maximum likelihood estimator (MLE) cannot be solved analytically. Under such circumstances, numerical methods such as Fisher scoring are employed to obtain MLE (Searle et al., 1992). Due to the lack of a closed form for MLE, variance-covariance matrix cannot be directly calculated. However, MLE has substantial asymptotic properties despite this limitation. According to the large-sample theory, MLE is asymptotically unbiased and the variance of MLE is approximated by the inverse of the Fisher information matrix. Optimal designs for Poisson regression models have been done by Wang et al. (2006a, b), Russell et al. (2009b). In these literature references, optimal designs are obtained based on the maximization of a convex function of the Fisher information matrix, which follow the asymptotic theory. An important question rising here is, whether the asymptotic properties establish when the sample size is not large enough. Numerical results show that the asymptotic approximations do not hold for small size experiments. Therefore, the mentioned problem may affect the accuracy of the obtained optimal designs.

For non-linear models, the small-sample differential-geometric approach based on the least squares estimator is deemed to improve the experiment conclusions (Pronzato and Pazman, 2013). Russell et al. (2009a) considered a locally optimal design for the simple logistic regression model in small size experiments. They used the maximum penalized likelihood, which has been studied by Firth (1993), for estimation of parameters and obtained locally optimal designs for two support points of the designs. Poursina and Talebi (2013) proposed a modified D-optimality criterion based on the Bhattacharyya lower bound for two parameters simple logistic model. They presented an intuitive proof and demonstrated that modified optimal designs are more efficient than the previous optimal designs. Khuri et al. (2006) derived the mean square error (MSE) for the linear predictor, and consider optimal design for the generalized linear models (GLMs). Mehr Mansour and Niaparast (2019) have also shown a numerical study on optimal designs for small sample size in logistic regression model.

In the present work, we provide the discussion of the higher-order asymptotic expansion of the bias of the MLE for the Poisson regression model. In order to obtain accurate results in small size experiments, it is preferable that the variance and the bias are considered together to obtain optimal designs (Box and Draper, 1959). To fulfil this, we primarily introduce the Poisson regression model and review the asymptotic expansion of MLE for this model. We then discuss the bias of MLE for the Poisson regression model and obtain the MSE by using the second-order expansion of MLE. In Section 3, we define DM-optimality criterion, and then study optimal designs for the special cases of the Poisson regression model based on the minimization of the MSE.

### 2. Poisson regression model

The main assumption in the linear regression is that the response variable is normally distributed. There are numerous conditions in which outcomes have non-normal distribution, but the random response is a member of the exponential family. Nelder and Wedderburn (1972) established the GLMs to describe such data. The Poisson regression model is a special case of GLMs used to model count data where the response variable is specified by the Poisson distribution. Some applications of this model in medicine, social sciences and toxicology have been studied, for example see Coxe et al. (2009) and Cameron and Trivedi (2013).

Suppose that Yi’s are independent random variables from Poisson distribution which assume that the logarithm of their means can be modelled by a linear combination of unknown parameters, i.e.

$log ( μ i ) = f T ( x i ) β , i = 1 , … , n ,$

where βT = (β1, …, βp) is the vector of unknown parameters, μi = μi(xi, β) is the mean of Yi and fT (xi) = ( f1(xi), …, fp(xi)) is the vector of known regression functions at explanatory variables, xi, that are the same for all individuals.

This model has been widely studied in literature, for example see McCullagh and Nelder (1989) and Cameron and Trivedi (2013).

### 2.1. Maximum likelihood estimator

For the Poisson regression model, the log likelihood can be written as

$L ( β ) = ∑ i = 1 n [ y i f T ( x i ) β - e f T ( x i ) β - log ( y i ! ) ] ,$

and

$l = ∂ L ( β ) ∂ β = ∑ i = 1 n [ y i f ( x i ) - f ( x i ) e f T ( x i ) β ] .$

The score function to estimate the Poisson regression parameters is completely non-linear; therefore, the MLE of parameters cannot be obtained in a closed form. This problem causes a difficulty for making inference based on the maximum likelihood estimation. The common approach is to use the asymptotic approximation of the score function, the derivative of the log-likelihood function, that indicates sensitive a likelihood function with respect to parameters.

To denote the different orders of derivatives of the log-likelihood function, we apply the following notations.

$l r = ∂ L ( β ) ∂ β r = ∑ i = 1 n ( - f r ( x i ) e f T ( x i ) β + y i f r ( x i ) ) , L r s = ∂ 2 L ( β ) ∂ β r ∂ β s = ∑ i = 1 n ( - f r ( x i ) f s ( x i ) e f T ( x i ) β ) , L r s t = ∂ 3 L ( β ) ∂ β r ∂ β s ∂ β t = ∑ i = 1 n ( - f r ( x i ) f s ( x i ) f t ( x i ) e f T ( x i ) β ) .$

We also indicate the expectation of the above expressions with

$λ r s = E ( L r s ) , λ r s t = E ( L r s t ) .$

The Taylor expansion of the rth element of the score function, $l^r=lr(β^)$, about β is

$l ^ r = l r + ∑ s L r s ( β ^ s - β s ) + 1 2 ∑ s , t L r s t ( β ^ s - β s ) ( β ^ t - β t ) + ⋯ .$

Then Equation (2.1) is inverted to obtain an asymptotic expansion for $(β^r-βr()$. Therefore, asymptotic expansion for $(β^r-βr()$ can be written as

$( β ^ r - β r ) = - ∑ s L r s l s - 1 2 ∑ s , t a r s t l s l t + ∑ s , t , u L r s L t u H s t l u + O P ( n - 3 2 ) ,$

where, arst = Σg,h,i LrgLshLtiLghi, Hst = Lstλst and ${ L r s } r , s = 1 p$ is the inverse matrix of ${ L r s } r , s = 1 p$ (e.g., for example Lawley (1956)).

The bias of $β^r$, r = 1, …, p, can be represented as follow,

$bias ( β ^ r ) = 1 2 ∑ i , j , h i r i i j h λ i j h ,$

where, ${ i r s } r , s = 1 p$ is the inverse matrix of the Fisher information matrix, $I ( β ) = { - λ r s } r , s = 1 p$ (see for example Barndorff-Nielsen and Cox (1994, p.150)).

It can be obtained from Equation (2.2) that the above bias, called the second order bias, is of order O(n−3/2). Whereas according to the large sample theory, the first order bias, which is based on the first term of right side of Equation (2.2), tends to zero (Lehmann, 1999).

To illustrate the influence of the sample size on the bias of β, we now consider the following simulated example.

### Example 1

Consider a simple Poisson regression model, where the response mean is described in terms of an explanatory variable

$μ i = e β 0 + β 1 x i , i = 1 , … , n .$

Furthermore, let xi’s take values {0, 2}. For the purpose of illustration, we have simulated a data set of the simple Poisson regression model, so that half of the observations is taken at x = 0 and the other half at x = 2. The number of observations is set in sizes n = 10, 20, 50, 100, 1000. The number of simulations is s = 1000 in each case.

In each replication, we fit the simple Poisson model and use the simulated data and the explanatory variables to compute the average bias of the parameter estimates for each combination of (n, β) defined as

$bias ( β ^ r ) = 1 s ∑ j = 1 s ( β ^ r j - β r ) ,$

where $β ^ r j$ is the estimator of βr for jth replication. Figure 1 shows the simulated results.

Figure 1(a) reports the absolute value of bias($β^0$) as a function of β0 with the assumption that β1 = −1. For any fixed value of β0, the absolute of bias($β^0$) is a decreasing function of n. The plots in Figure 1(a) demonstrate that the absolute of bias($β^0$) tends to zero when β0 is rising.

Figure 1(b) shows the absolute value of the bias of $β^1$ as a function of β1, when β0 = 1. Similar to Figure 1(a), it is observed that the absolute value of bias($β^1$) is a decreasing function of n specially for small value of β1.

As a general conclusion, the simulated results have indicated that the absolute value of the bias of MLE of parameters is a decreasing function of distribution means. It is also shown that the bias of MLE is not zero in small sample experiments. While, the first order bias is equal to zero. Therefore, it is useful to compare the simulated bias with the second order bias which is expressed in Equation (2.3). Note that the true values of the parameters are β0 = 1 and β1 = −1.

Table 1 shows that the second order bias is closer to the simulated bias and is a better approximation for computation bias. Therefore, we apply Equation (2.2) to inference about MLE.

### 2.2. Mean square error

The MSE is a useful tool to measure the difference between the estimator and the true value of the parameters. The MSE expresses the quality of an estimator, $β^$, and is defined as follows,

$MSE ( β ^ ) = E ( β ^ - β ) ( β ^ - β ) T = Var ( β ^ ) + bias ( β ^ ) bias T ( β ^ ) ,$

where MSE($β^$) is a symmetric and positive semi-definite matrix. The MSE($β^$) incorporates both the variance of the estimator and its bias.

Note that if we apply the first order of the Taylor expansion of the score function, Equation (2.1), or we ignore the second term of right side of Equation (2.2), then the bias of β tends to zero. The MSE($β^$) will be equal to I−1(β) asymptotically.

In the following theorem, we obtain MSE($β^$) for the Poisson regression model.

### Theorem 1

Consider a Poisson regression model with canonical link function. The MSE for the MLE of the vector of parameters can be represented as

$M S E ( β ^ ) ≃ I - 1 ( β ) - 1 2 I - 1 ( β ) F T W U ( 2 ) WF I - 1 ( β ) + 1 4 I - 1 ( β ) F T W U ( 2 ) W U ( 2 ) WF I - 1 ( β ) + 1 4 I - 1 ( β ) F T W D U J D U WF I - 1 ( β ) ,$

where $W = d i a g { μ i } i = 1 n$ is a diagonal matrix of means, F = {f(x1), …, f(xn)}T is the design matrix, I(β) = FTWF, U = FI−1(β)FT, $U ( 2 ) = { u i j 2 } i , j = 1 n$, DU = diag{u11, …, unn} and J is a n × n matrix of unit elements.

The proof of Theorem 1 is given in Appendix A.

Khuri et al. (2006) derive MSE for the linear predictor using the asymptotic variance and the second order bias, while we directly obtain the MSE of the second order expansions of the MLE for the vector estimators.

3. DM-optimal design

Optimal design is to provide the best estimation of parameters based on specific design criterion. Most optimal criteria are based on the variance of the estimator of parameters. For the Poisson regression model, Wang et al. (2006a, b) conducted extensive work on optimal design based on a convex function of the Fisher information matrix. However, for small size experiments, the Fisher information matrix may not be a trustworthy approximation to obtain the optimal designs. Therefore, we propose to use MSE to obtain optimal design for small size experiments due better accuracy. Suppose that $ξ = { x 1 ⋯ x r n 1 ⋯ n r }$ is an exact design where xi’s are points of design and ni is the number of observations which are taken at xi, $∑ i = 1 r n i = n$. The design ξ* is better than the design ξ, if MSE($β^ξ$) − MSE($β^ξ*$) is a non-negative definite matrix. Unfortunately, it is not possible to solve the above optimization problem. Therefore, the design of experiments is selected so that some real functions of the MSE of estimators are minimized. The most popular approach is based on the determinant of the MSE($β^ξ$). Therefore, we define a new optimality criterion as follows,

### Definition 1

A design ξ* is said to be DM-optimal in the class of designs Ξ if

$log det ( M S E ( β ^ ξ * ) ) ≤ log det ( M S E ( β ^ ξ ) ) ∀ ξ ∈ Ξ ,$

where D and M are abbreviation of the determinant and the MSE.

This criterion is different of the D-optimality criterion. DM-optimality criterion is based on the determinant of the MSE whereas D-optimality criterion is based on the determinant of the Fisher information matrix.

There are two essential problems: First, MSE($β^ξ$) depends on the unknown parameters through response mean, and therefore, the DM-optimality criterion depends on the vector of parameters, β. Whereas the exact values of β are unknown in practice. We consider the locally DM-optimal design. Second, MSE($β^ξ$) relies on the sample size. It means that the sample size affects the DM-optimal design. MSE($β^ξ$) is based on the more accurate approximation of MLE; therefore, it is expected that DM-optimal designs are more efficient for small size experiments.

For any ξ, there exists a value of the det(MSE($β^ξ$)). We need an optimization method to find the design ξ* that minimize log det(MSE($β^ξ$)). Here, we have a discrete optimization problem which is difficult to find a solution for this problem. An approach is to consider all possible combinations of ni’s to obtain DM-optimal design. We can use also approximate designs to guess the optimum number of trials at any point of design instead of this calculation. Approximate designs ignore the constraint that the number of observations must be an integer at any point of design. These results are then employed to find DM-optimal designs based on the number of trials at any point of design.

Optimization classical methods often are known as local optimization since they derive the different results with different starting points. Therefore in this paper, we apply a hybrid method which uses of two optimality algorithm to obtain the exact global optimal point.

First a genetic algorithm (GA) is used to find near point to global optimal point (Chen and Ye, 2009). The GA is a global optimization algorithm based on the principles of natural selection and genetics. This process starts with a population of individuals, which serve as the first generation to solve above optimization problem. Then selection, crossover and mutation operations are employed to produce the next generation. It is repeated until a sufficiently large number of generations have passed without any improvement in the value of the function that we want to optimize. We use the second optimization due to the constraints of GA that, do not guarantee the exact global optimum despite resulting in a near global optimum.

Second, a local optimality algorithm is employed to achieve the global optimal point using the obtained point in GA as the starting point. We use of the sequential quadratic programming (SQP) method work out in “fmincon” function at MATLAB program (Fletcher and Powell, 1963).

In this paper, we consider two special cases of the Poisson regression model. Equations (3.2) and (3.3) define the mean function for simple model and quadratic model, respectively,

$μ i = exp ( β 0 + β 1 x i ) ,$ $μ i = exp ( β 0 + β 1 x i + β 2 x i 2 ) .$

In most applications of this model, especially in toxicity studies, it is assumed that the design region is the non-negative subset of real numbers, χ = [h, g] where h is a non-negative real number and g > h (Wang et al., 2006a). In this study, we suppose that χ = [0,∞]. Without loss of generality, we consider a special case, where μi is assumed to be a decreasing function of xi. We define canonical mean $μ~i$ = μi/μmax, where μmax is the maximum value of μi = μi(xi, β), which is achieved in the lower bound of design region. It is clear that $μ~i$ will be in [0, 1]. In this paper, we adopt a design space in terms of $μ~i$’s rather than xi’s. According to the monotone relation between $μ~i$ and xi, we can consider designs based on both, equivalently. All the designs in this paper are derived under this assumption; however, they can be easily generalized to situations that μi is an increasing function of xi.

### 3.1. Locally DM-optimal design for the simple model

In this section, we consider a simple Poisson regression model with mean μi, which has been specified in (3.2). We are also restricted in design space, Ξ, to saturated designs or designs with two support points, i.e., $ξ = { μ ˜ 1 μ ˜ 2 n 1 n 2 }$. Note that $μ~i$ = eβ1 xi or equivalently xi = (1/β1) log( $μ~i$). The following theorem obtains MSE($β^ξ$) for this model.

### Theorem 2

For a simple Poisson regression model, MSE($β^ξ$) can be represented as

$M S E ( β ^ ξ ) ≃ 1 a ( m 11 m 12 m 12 m 22 ) ,$

where $a = ( log ( μ ˜ 1 ) - log ( μ ˜ 2 ) ) 2 e 3 β 0 n 1 3 n 2 3 μ ˜ 1 3 μ ˜ 2 3$

$m 11 = e 2 β 0 n 1 2 n 2 2 μ ˜ 1 2 μ ˜ 2 2 ∑ n i μ ˜ i log 2 ( μ ˜ i ) - 0.25 e β 0 n 1 n 2 μ ˜ 1 μ ˜ 2 ∑ n i 2 μ ˜ i 2 log 2 ( μ ˜ i ) + 0.25 ∑ n i 3 μ ˜ i 3 log 2 ( μ ˜ i ) - 0.5 e β 0 n 1 2 n 2 2 μ ˜ 1 2 μ ˜ 2 2 log ( μ ˜ 1 ) log ( μ ˜ 2 ) , m 12 = - β 1 e 2 β 0 n 1 2 n 2 2 μ ˜ 1 2 μ ˜ 2 2 ∑ n i μ ˜ i log ( μ ˜ i ) + 0.25 β 1 e β 0 n 1 n 2 μ ˜ 1 μ ˜ 2 ∑ n i 2 μ ˜ i 2 log ( μ ˜ i ) - 0.25 β 1 ∑ n i 3 μ ˜ i 3 log ( μ ˜ i ) + 0.25 e β 0 n 1 2 n 2 2 μ ˜ 1 2 μ ˜ 2 2 ( log ( μ ˜ 1 ) + log ( μ ˜ 2 ) ) , m 22 = β 1 2 e 2 β 0 n 1 2 n 2 2 μ ˜ 1 2 μ ˜ 2 2 ∑ n i μ ˜ i - 0.25 β 1 2 e β 0 n 1 n 2 μ ˜ 1 μ ˜ 2 ∑ n i 2 μ ˜ i 2 + 0.25 β 1 2 ∑ n i 3 μ ˜ i 3 - 0.5 β 1 2 e β 0 n 1 2 n 2 2 μ ˜ 1 2 μ ˜ 2 2 .$

### Proof

For any $ξ = { μ ˜ 1 μ ˜ 2 n 1 n 2 }$, we have $F ξ = ( 1 1 β 1 log ( μ ˜ 1 ) 1 1 β 1 log ( μ ˜ 2 ) )$ and $W ξ = ( n 1 μ 1 0 0 n 2 )$ which are design matrix and diagonal matrix of means corresponding to design ξ, respectively. Then

$I ( β ) = e β 0 β 1 2 ( β 1 2 ∑ n i μ ˜ i β 1 ∑ n i μ ˜ i log ( μ ˜ i ) β 1 ∑ n i μ ˜ i log ( μ ˜ i ) ∑ n i μ ˜ i log 2 ( μ ˜ i ) ) , U = 1 e β 0 ( 1 n 1 μ ˜ 1 0 0 1 n 2 μ ˜ 2 ) .$

Replacing the above matrices in Equation (2.4), and then straightforward calculation leads to the results.

We have assumed that μi is a decreasing function of xi. xi ∈ [0,∞]; therefore, this implies that β1 < 0 for simple model in (3.2). Using the Theorem 2 and applying hybrid method, the results for DM-optimal design are obtained for some representative values of parameters in

DM-optimal design optimizes the determinant of MSE($β^ξ$); therefore, it is beneficent that we compare the accuracy of DM-optimal design, ξ*, with D-optimal design, ξ+, that maximizes the determinant of I(β). Therefore, we define DMeff criterion as follows,

$DM eff = det ( MSE ( β ^ ξ * ) ) det ( MSE ( β ^ ξ + ) ) .$

Figure 2 shows DMeff of obtained designs in

The results show that DM-optimal designs are more efficient for small size experiments. DM-optimal designs are robust based on $μ~$; therefore, do not change for different values of β1. Figure 2 is drown only for different values of β0. Plots for some different values of β0 in Figure 2 show that the DM-optimal designs and the D-optimal designs will be very close to each other when β0 increases.

### 3.2. Locally optimal design for the Quadratic model

Considering the effect of an explanatory variable on the response variable, the effect of explanatory variables is sometimes stronger than the simple model describes. Therefore, the quadratic model defined in (3.3) may be suitable. Here we survey DM-optimal design for the quadratic model. According to the decreasing assumption on the mean response, we take negative values for β1 and β2. To find DM-optimal design, the class of designs, Ξ, has been restricted to the saturated designs with three support points, i.e., $ξ = { μ ˜ 1 μ ˜ 2 μ ˜ 3 n 1 n 2 n 3 }$ where, $μ ˜ i = e β 1 x i + β 2 x i 2$ is equivalent to

$x i = 1 2 - β 2 [ - r + - r - 4 log ( μ ˜ i ) ] ,$

where $r = β 1 2 / β 2$. Therefore, $F ξ = ( 1 x 1 x 1 2 1 x 2 x 2 2 1 x 3 x 3 2 )$ and $W ξ = diag { n i μ i } i = 1 3$ are the design matrix and the diagonal matrix of means, respectively. DM-optimal designs have been obtained for some representative values of parameters (Table 3).

The DMeff of DM-optimal designs for the quadratic model are illustrated in Figure 3. It is observed that DM-optimal designs are more accurate for small size experiments.

4. Conclusion

The non-existence of an explicit form for the parameter estimators of non-linear regression models leads statisticians to apply asymptotic inference based on the likelihood function to obtain the optimal design for GLMs.

In this study, we consider the Poisson regression model, which is a special case of the GLMs. The results indicate that for small size experiments, the asymptotic properties of parameter estimators; such as unbiasedness, may be in conflict. Therefore, we replace variance with MSE, which is considered as both a variance and bias, to obtain optimal design. In addition, we obtained a more precise estimation of parameter β using the higher-order expansion of the likelihood function. To illustrate the theoretical results, we consider two special cases of the Poisson regression model. DM-optimal designs are calculated for small size experiments and compared with D-optimal designs, respectively. The results show that the optimal designs based on the MSE are better than the optimal designs based on the inverse of the information matrix.

An ongoing research activity is the investigation of Bayesian optimal designs instead of the local optimal designs.

Appendix

### Proof of Theorem 1

For Poisson regression model (r, k)th element of MSE($β^$) is given by

$m r k = E { ( β ^ r - β r ) ( β ^ k - β k ) } ≃ E ( ∑ s 1 i r s 1 l s 1 - 1 2 ∑ s 1 , t 1 α r s 1 t 1 l s 1 l t 1 ) ( ∑ s 2 i k s 2 l s 2 - 1 2 ∑ s 2 , t 2 α k s 2 t 2 l s 2 l t 2 ) = ∑ s 1 , s 2 i r s 1 i k s 2 E ( l s 1 l s 2 ) + 1 4 ∑ s 1 , t 1 , s 2 , t 2 α r s 1 t 1 α k s 2 t 2 E ( l s 1 l t 1 l s 2 l t 2 ) - 1 2 ∑ s 1 , s 2 , t 2 i r s 1 α k s 2 t 2 E ( l s 1 l s 2 l t 2 ) - 1 2 ∑ s 1 , t 1 , s 2 i k s 2 α r s 1 t 1 E ( l s 1 l t 1 l s 2 ) ,$

where $αrs1t1=-Σq1,q2,q3irq1is1q2it1q3λq1q2q3$. Note that Hst = 0.

To obtain the expectation of the log-likelihood derivatives, we use a given general form by Barndorff-Nielsen and Cox (1994, p 146) that is as follow

$∑ π = 1 R ∑ R / π ν R 1 , … , R π = 0 ,$

where R = r1 · · · rm is a set of coordinate indices. The inner sum in equation (A.2) is over all partitions of R into π blocks R1, …, Rπ and $νR1,...,Rπ$ = $Fs=MSSBMSSW=SSBK-1SSWN-K$. Therefore, for the Poisson regression model, we obtain

$E ( l s 1 l s 2 ) = - λ s 1 s 2 , E ( l s 1 l s 2 l t 2 ) = - λ s 1 s 2 t 2 , E ( l s 1 l t 1 l s 2 l t 2 ) = λ s 1 t 1 λ s 2 t 2 + λ s 1 s 2 λ t 1 t 2 + λ s 1 t 2 λ s 2 t 1 - λ s 1 t 1 s 2 t 2 .$

Then, we have

$∑ s 1 , s 2 i r s 1 i k s 2 E ( l s 1 l s 2 ) = i r k .$

Moreover, we can find

$∑ s 1 , s 2 , t 2 i r s 1 α k s 2 t 2 E ( l s 1 l s 2 l t 2 ) = ∑ i , m μ i μ m ( ∑ s 1 f s 1 ( x m ) i r s 1 ) ( ∑ s 2 , q 2 f s 2 ( x i ) f q 2 ( x m ) i s 2 q 2 ) ( ∑ t 2 , q 3 f t 2 ( x m ) f q 3 ( x i ) i t 2 q 3 ) ( ∑ q 1 f q 1 ( x i ) i k q 1 ) = ∑ i , m μ i μ m u i m 2 ( ∑ s 1 f s 1 ( x m ) i r s 1 ) ( ∑ q 1 f q 1 ( x i ) i k q 1 ) .$

Similarly,

$∑ s 1 , t 1 , s 2 i k s 2 α r s 1 t 1 E ( l s 1 l t 1 l s 2 ) = ∑ i , m μ i μ m u i m 2 ( ∑ s 2 f s 2 ( x m ) i k s 2 ) ( ∑ q 1 f q 1 ( x i ) i r q 1 ) .$

We also get

$∑ s 1 , t 1 , s 2 , t 2 α r s 1 t 1 α k s 2 t 2 E ( l s 1 l t 1 l s 2 l t 2 ) = 2 ∑ i , m μ i μ m u i m 2 ( ∑ q 1 f q 1 ( x m ) i r q 1 ) ( ∑ p 1 f p 1 ( x i ) i k p 1 ) + ∑ i , m μ i μ m u i i u m m ( ∑ q 1 f q 1 ( x m ) i r q 1 ) ( ∑ p 1 f p 1 ( x i ) λ k p 1 ) + ∑ i , j , m μ i μ j μ m u i j 2 u m j 2 ( ∑ q 1 f q 1 ( x i ) i r q 1 ) ( ∑ p 1 f p 1 ( x m ) i k p 1 ) .$

The results is obtained by substituting the above three equations in (A.1) and the straightforward application of matrix algebra.

Figures
Fig. 1.

(a): |bias($β^0$)| when β1 = −1; (b): |bias($β^1$)| when β0 = 1.

Fig. 2.

DMeff for DM-optimal design in the simple model: (a) β0 = 0.1; (b) β0 = 0.5; (c) β0 = 1.

Fig. 3.

DMeff for DM-optimal design in the quadratic model. I: β1 = −1, β2 = −1, (a) β0 = 1; (b) β0 = 1.5; (c) β0 = 2; (d) β0 = 2.5. II: β0 = 1, β2 = −1, (a) β1 = −1; (b) β1 = −1.5; (c) β1 = −2; (d) β1 = −2.5. III: β0 = 1, β1 = −1, (a) β2 = −1; (b) β2 = −1.5; (c) β2 = −2; (d) β2 = −2.5.

TABLES

### Table 1

The bias for the simple Poisson model

n

10 16 20 30 50
$bias(β0^)$ Simulated bias −0.036 −0.023 −0.021 −0.001 −0.009
Second order bias −0.037 −0.023 −0.018 −0.012 −0.007

$bias(β1^)$ Simulated bias −1.481 −0.536 −0.291 −0.072 −0.025
Second order bias −0.118 −0.073 −0.059 −0.039 −0.024

### Table 2

DM-optimal design for the simple model, n2 = nn1

n β0 = 0.1, β1 = −1 β0 = 0.5, β1 = −1 β0 = 1, β1 = −1

n1 $μ˜1$ $μ˜2$ n1 $μ˜1$ $μ˜2$ n1 $μ˜1$ $μ˜2$
8 5 0.1705 1 4 0.1669 1 4 0.1402 1
10 6 0.1607 1 5 0.1532 1 5 0.1328 1
15 8 0.1483 1 8 0.1388 1 8 0.1270 1
20 10 0.1396 1 10 0.1286 1 10 0.1249 1
30 15 0.1285 1 15 0.1250 1 15 0.1262 1
50 25 0.1249 1 25 0.1262 1 25 0.1290 1

### Table 3

DM-optimal design for the quadratic model, n3 = nn1n2

n β0 = 1, β1 = −1, β2 = −1 β0 = 2, β1 = −1, β2 = −1

n1 n2 $μ˜1$ $μ˜2$ $μ˜3$ n1 n2 $μ˜1$ $μ˜2$ $μ˜3$
8 4 2 0.0943 0.5416 1 3 2 0.0546 0.5079 1
10 4 3 0.0800 0.5303 1 4 3 0.0478 0.5017 1
15 6 4 0.0639 0.5163 1 5 5 0.0438 0.4983 1
20 8 6 0.0551 0.5088 1 7 6 0.0394 0.4934 1
30 10 10 0.0497 0.5044 1 10 10 0.0371 0.4911 1
50 17 16 0.0407 0.4948 1 16 17 0.0365 0.4907 1

n β0 = 1, β1 = −2, β2 = −1 β0 = 1, β1 = −1, β2 = −2

n1 n2 $μ˜1$ $μ˜2$ $μ˜3$ n1 n2 $μ˜1$ $μ˜2$ $μ˜3$

8 4 2 0.0694 0.4571 1 3 2 0.0995 0.5760 1
10 4 3 0.0697 0.4576 1 4 3 0.0848 0.5652 1
15 6 4 0.0547 0.5163 1 5 5 0.0754 0.5560 1
20 8 6 0.0464 0.4337 1 7 6 0.0634 0.5488 1
30 10 10 0.0411 0.4281 1 10 10 0.0541 0.5409 1
50 17 16 0.0315 0.4155 1 16 17 0.0463 0.5336 1

References
1. Barndorff-Nielsen OE and Cox DR (1994). Inference and Asymptotics, Chapman and Hall, London.
2. Box GEP and Draper NR (1959). A basis for the selection of a response surface design, Journal of the American Statistical Association, 54, 622-654.
3. Cameron AC and Trivedi PK (2013). Regression Analysis of Count Data, Cambridge University Press, New York.
4. Chen Y and Ye K (2009). Bayesian hierarchical modeling on dual response surfaces in partially replicated designs, Quality Technology and Quantitative Management, 6, 371-389.
5. Coxe S, West SG, and Aiken LS (2009). The analysis of count data: a gentle introduction to Poisson regression and its alternatives, Journal of Personality Assessment, 91, 121-136.
6. Firth D (1993). Bias reduction of maximum likelihood estimates, Biometrica, 80, 27-38.
7. Fletcher R and Powell MJD (1963). A rapidly convergent descent method for minimization, Computer Journal, 6, 163-168.
8. Khuri AI, Mukherjee B, Sinha BK, and Gosh M (2006). Design issues for generalized linear models: a review, Statistical Science, 21, 376-399.
9. Lawley DN (1956). A general method for approximating to the distribution of likelihood ratio criteria, Biometrika, 43, 295-303.
10. Lehmann EL (1999). Elements of Large Sample Theory, Springer-Verlag, New York.
11. McCullagh P and Nelder JA (1989). Generalized Linear Models, Chapman and Hall, London.
12. Mehr Mansour S and Niaparast M (2019). The effect of small sample on optimal designs for logistic regression models, Communications in Statistics-Theory and Methods, 48, 2893-2903.
13. Nelder JA and Wedderburn RWM (1972). Generalized linear models, Journal of the Royal Statistical Society, Series A, 135, 370-384.
14. Poursina D and Talebi H (2013). Modified D-optimal design for logistic model, Journal of Statistical Computation and Simulation, 84, 428-437.
15. Pronzato L and Pazman A (2013). Design of Experiments in Nonlinear Models: Asymptotic Normality, Optimality Criteria and Small-Sample Properties, volume 212 of Lecture Notes in Statistics, Springer, New York.
16. Russell KG, Eccleston JA, Lewis SM, and Woods DC (2009a). Design considerations for small experiments and simple logistic regression, Journal of Statistical Computation and Simulation, 79, 81-91.
17. Russell KG, Woods DC, Lewis SM, and Eccleston JA (2009b). D-optimal designs for Poisson regression models, Statistica Sinica, 19, 721-730.
18. Searle SR, Casella G, and McCulloch CE (1992). Variance Components, Wiley, New York.
19. Wang Y, Myers RH, Smith EP, and Ye K (2006a). D-optimal designs for Poisson regression models, Journal of Statistical Planning Inference, 136, 2831-2845.
20. Wang Y, Smith EP, and Ye K (2006b). Sequential designs for a Poisson regression model in toxicological and medical studies, Journal of Statistical Planning Inference, 136, 3187-3202.