An importance sampling for a function of a multivariate random variable
Communications for Statistical Applications and Methods 2024;31:65-85
Published online January 31, 2024
© 2024 Korean Statistical Society.

Jae-Yeol Park^a, Hee-Geon Kang^a, Sunggon Kim^{1,a}

^a Department of Statistics, University of Seoul, Korea
Correspondence to: ^1 Department of Statistics, University of Seoul, 163 Seoulsiripdaero, Dongdaemun-gu, Seoul 02504, Korea. E-mail: sgkim@uos.ac.kr
Received September 11, 2023; Revised October 23, 2023; Accepted October 24, 2023.
Abstract
The tail probability of a function of a multivariate random variable is not easy to estimate by crude Monte Carlo simulation. When the occurrence of a function value over a threshold is rare, accurate estimation of the corresponding probability requires a huge number of samples. When the explicit form of the cumulative distribution function of each component of the variable is known, the inverse transform likelihood ratio method is a directly applicable scheme for estimating the tail probability efficiently. The method is a type of importance sampling, and its efficiency depends on the selection of the importance sampling distribution. When the cumulative distribution of the multivariate random variable is represented by a copula and its marginal distributions, we develop an iterative algorithm to find the optimal importance sampling distribution, and we show the convergence of the algorithm. The performance of the proposed scheme is compared numerically with that of crude Monte Carlo simulation.
Keywords : importance sampling, copula, inverse transform likelihood ratio method, Monte Carlo simulation, rare event simulation
1. Introduction

The estimation of rare-event probabilities has received great attention due to its relevance in various fields such as financial and insurance risk management, inventory management, structural systems reliability, computer networks, and telecommunications networks. The estimation of such probabilities often poses challenges when the events of interest are located in the tails of probability distributions, where the crude Monte Carlo simulation is inefficient. When the occurrence of the events of interest is rare, accurate estimation of the tail probability requires a huge number of samples, and it takes a very long time to achieve satisfactory results through the crude Monte Carlo simulation. To address this issue, importance sampling methods have emerged as a powerful tool, enabling accurate estimation of the tail probability (Glynn and Iglehart, 1989; Juneja and Shahabuddin, 2006). By sampling more frequently in the regions of interest, importance sampling can significantly reduce the variance in the estimation of the tail probability with a smaller number of samples.

Kroese and Rubinstein (2004) proposed an importance sampling method called the transform likelihood ratio method for rare event simulation. In the method, the multivariate random variable (or the random vector) of interest is transformed into another multivariate random variable in order to find a more efficient importance sampling distribution. If the random variable of interest is one-dimensional and the transformed random variable is uniformly distributed on (0, 1), then the method is called the inverse transform likelihood ratio method. When the form of the cumulative distribution function of the random variable is known, the inverse transform likelihood ratio method is a simple and unifying way of generating the importance samples efficiently in the region of interest. In the tail probability estimation of a monotone increasing function of a one-dimensional random variable, Kroese and Rubinstein (2004) showed that the inverse transform likelihood ratio method gives an estimator with bounded relative error.

The efficiency of an importance sampling method depends on the selection of the importance sampling distribution. In the case of light-tailed distributions, exponential twisting is the common approach to finding an efficient importance sampling distribution, and it gives the optimal importance sampling estimator in some cases (Sadowsky and Bucklew, 1990; Sadowsky, 1993). In the case of subexponential distributions such as the Weibull or the lognormal, exponential twisting is not applicable due to the non-existence of the moment generating function. For the tail probability estimation of random or deterministic sums of independent and identically distributed (i.i.d.) subexponential random variables, Asmussen et al. (2000) proposed a number of estimators which are asymptotically optimal. For the same problem, Juneja and Shahabuddin (2002) proposed a method called hazard rate twisting, and showed the asymptotic optimality of the method. The performance of the estimators by Asmussen et al. (2000) and Juneja and Shahabuddin (2002) has been improved by the conditional Monte Carlo estimator proposed by Asmussen and Kroese (2006). For the case of sums of i.i.d. random variables with regularly varying tails, Dupuis et al. (2007) proposed an algorithm for creating an importance sampling estimator with bounded relative error. When the random variables are independent but not identically distributed, Rached et al. (2015) and Rached et al. (2018) proposed a generalized hazard rate twisting for the tail probability estimation.

For the case of sums of correlated lognormal random variables, Asmussen et al. (2011) proposed two importance sampling estimators, one of which is asymptotically efficient, and the other of which shows bounded relative error. This method has been extended to more general cases by Blanchet and Rojas-Nandayapa (2011). They proposed an asymptotically optimal importance sampling method for the estimation of the tail probability of $e^{X_1} + \cdots + e^{X_d}$, where $(X_1, \ldots, X_d)$ is a d-variate random variable. In the case that $(X_1, \ldots, X_d)$ follows the d-dimensional multivariate normal distribution, $e^{X_1} + \cdots + e^{X_d}$ is the sum of correlated lognormal random variables.

By confining the importance sampling distribution to a parametric family of distributions, we can apply analytical methods to find the optimal importance sampling distribution within the family. The cross entropy method and the variance minimization method are generally applied to find the optimal parameter. In the latter method, the parameter minimizing the variance of the importance sampling estimator is taken as optimal (Rubinstein and Shapiro, 1993). In the former method, which was proposed by Rubinstein (2002) and Homem-de-Mello and Rubinstein (2002), the optimal parameter is the one minimizing the Kullback-Leibler divergence of the importance sampling distribution from the zero variance distribution, which is defined in Section 2.3. In both methods, the optimal parameter is usually found by solving a series of stochastic optimization problems iteratively; see De Boer et al. (2005) for more details.

In this paper, we propose an inverse transform likelihood ratio method for the tail probability estimation of an increasing function of a multivariate random variable. This method was proposed originally for rare event simulations with one-dimensional random variables (Kroese and Rubinstein, 2004). Due to Sklar's theorem (Nelsen, 2006), the cdf (cumulative distribution function) of a multivariate random variable is represented by its marginal distributions and a copula. Note that a copula is a multivariate distribution with uniform marginals and a certain dependence structure. The cdf of the multivariate random variable of interest is assumed to be known in this form. We transform the distribution of the multivariate random variable into a copula, and confine the importance sampling distributions to a family of multivariate distributions in which the marginal distribution of each component follows the beta distribution and the dependence is modeled by the Gaussian copula. By decomposing the Kullback-Leibler divergence of an importance sampling pdf from the zero variance pdf into two parts and minimizing each part in order, we find the pseudo-optimal parameter in an iterative manner. We also show the convergence of the proposed scheme.

The rest of the paper is organized as follows. The problem statement is given in Section 2. In the same section, we introduce the inverse transform likelihood ratio method and the cross entropy method briefly, and describe how to apply these methods to our problem. We also propose an algorithm for finding the optimal parameter of the importance sampling distribution. In Section 3, we show the convergence of the proposed algorithm. The performance of the proposed scheme is compared numerically with the crude Monte Carlo simulation in Section 4. Finally, we conclude the paper in Section 5.

2. Method

2.1. Problem statement

We consider a d-dimensional continuous random vector X = (X1, . . ., Xd) with support Ω = (a1, b1) × · · · × (ad, bd), where −∞ ≤ ai < bi ≤ ∞, i = 1, . . ., d. We denote by F the joint cdf of X, and by Fi the marginal cdf of Xi, i = 1, . . ., d. We also let f be the pdf (probability density function) of X. For a real valued function L defined on Ω, we want to estimate the tail probability of L(X) over a threshold γ, i.e.

$$ l = \Pr\{L(X) > \gamma\}. \tag{2.1} $$

We call L(X) the loss, and L(·) the loss function. We assume that L(x) is strictly increasing and continuous on Ω. For x = (x1, . . ., xd), x′ = (x′1, . . ., x′d) ∈ ℝd, we define x < x′ if and only if xi ≤ x′i for all i and x ≠ x′. Then, we have that

$$ L(x) < L(x'), \qquad x < x'. $$

We also assume that the support of L(X) is (sL, ∞), where −∞ ≤ sL < ∞, and that for any sequence x(1), x(2), . . . ∈ Ω such that $\max\{F_1(x_1^{(i)}), \ldots, F_d(x_d^{(i)})\} \to 1$ as i → ∞,

$$ \lim_{i \to \infty} L\left(x^{(i)}\right) = \infty. \tag{2.2} $$

We consider the case that the value of γ is so large that the tail loss probability l is near 0.

Sklar’s theorem (Nelsen, 2006) says that there is a copula C0 satisfying

$$ F(x_1, \ldots, x_d) = C_0(F_1(x_1), \ldots, F_d(x_d)), \qquad (x_1, \ldots, x_d) \in \mathbb{R}^d. \tag{2.3} $$

We denote by c0 the pdf of C0, i.e.

$$ c_0(u) = \frac{\partial^d C_0(u)}{\partial u_1 \cdots \partial u_d}, \qquad u = (u_1, \ldots, u_d) \in (0,1)^d. $$

We call F the nominal cdf of X, and C0 the nominal copula, respectively. Suppose that U is a d-dimensional random vector on (0, 1)d, and that the joint cdf of U is C0. Let Ui, i = 1, . . ., d, be the ith element of U. Then, Ui is uniformly distributed on (0, 1). We define $\tilde{X} = (F_1^{-1}(U_1), \ldots, F_d^{-1}(U_d))$. Then the cdf of $\tilde{X}$ is F, i.e.

$$ \tilde{X} \stackrel{d}{=} X, \tag{2.4} $$

where $\stackrel{d}{=}$ means equality in distribution.

We define a function $\tilde{L}(u)$ on (0, 1)d as follows:

$$ \tilde{L}(u) = L\left(F_1^{-1}(u_1), \ldots, F_d^{-1}(u_d)\right), \qquad u \in (0,1)^d. \tag{2.5} $$

Since L(x) is continuous on Ω and $F_i^{-1}(u)$, i = 1, . . ., d, is continuous on (0, 1), $\tilde{L}(u)$ is also continuous on (0, 1)d. Note that $F_i^{-1}(u)$, i = 1, . . ., d, is also strictly increasing on (0, 1) due to the continuity of Xi. Then, Equation (2.5) says that $\tilde{L}(u)$ is strictly increasing on (0, 1)d, i.e. for u, u′ ∈ (0, 1)d,

$$ \tilde{L}(u) < \tilde{L}(u'), \qquad u < u'. \tag{2.6} $$

The assumption (2.2) on L(x) and the definition of $\tilde{L}(u)$ given in Equation (2.5) imply that $\tilde{L}(u)$ also diverges as $\max_{1 \le i \le d} u_i$ goes to 1, i.e.

$$ \lim_{\max\{u_1, \ldots, u_d\} \to 1} \tilde{L}(u) = \infty. \tag{2.7} $$

When U follows the copula C0, it follows from Equations (2.4) and (2.5) that $\tilde{L}(U) \stackrel{d}{=} L(X)$. The tail probability l in Equation (2.1) is represented by

$$ l = \Pr\{\tilde{L}(U) > \gamma\}. \tag{2.8} $$

Let I(A) be the indicator function of an event A, and let $H(u) = I(\tilde{L}(u) > \gamma)$ for u ∈ (0, 1)d. Then, Equation (2.8) is rewritten as

$$ l = E_{c_0}[H(U)], \tag{2.9} $$

where Ec[h(U)] is the expected value of h(U) for a real valued function h and a random vector U following a pdf c.

Usually, the distribution of $\tilde{L}(U)$ is not tractable, and it is not easy to compute the value of l analytically. Instead, we can obtain an estimate of it by Monte Carlo simulation. Suppose that sampling from the copula C0 is not hard. Then, we generate N samples of U independently from C0, and denote them by U(1), . . ., U(N), respectively. Equation (2.9) gives the CMC (crude Monte Carlo) estimator of l as follows:

$$ \hat{l}_{CMC} = \frac{1}{N} \sum_{j=1}^{N} H\left(U^{(j)}\right). \tag{2.10} $$
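As an illustration, the following is a minimal sketch of the CMC estimator in Equation (2.10); for concreteness it assumes that C0 is a Gaussian copula with a known correlation matrix R0, and the names `sample_c0` and `L_tilde` are our own, not part of the original formulation.

```python
import numpy as np
from scipy.stats import norm

def sample_c0(rng, R0, n):
    """Draw n samples from a Gaussian copula with correlation matrix R0
    (assumed here as a concrete choice of the nominal copula C0)."""
    z = rng.multivariate_normal(np.zeros(len(R0)), R0, size=n)
    return norm.cdf(z)                            # componentwise Phi transform

def cmc_estimate(rng, R0, L_tilde, gamma, n):
    """Crude Monte Carlo estimator of l = Pr{L~(U) > gamma}, Equation (2.10)."""
    u = sample_c0(rng, R0, n)
    h = L_tilde(u) > gamma                        # indicators H(U^(j))
    return h.mean(), h.std(ddof=1) / np.sqrt(n)   # estimate and its standard error
```

The returned standard error makes the inefficiency of CMC explicit: since H(U) is Bernoulli with mean l, the relative error of the estimator grows like $1/\sqrt{Nl}$ as l approaches 0.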

2.2. Inverse transform likelihood ratio method

For the efficient estimation of l, a sampling distribution different from C0(u) can be applied to generate random copies of U. If we choose the sampling density of U appropriately, then the variance of the importance sampling estimator may be much less than that of the CMC estimator in Equation (2.10). We propose an importance sampling scheme by applying the inverse transform likelihood ratio method (Kroese and Rubinstein, 2004). We denote by G(u) the importance sampling cdf of U. We consider the following form of distribution as a candidate of G(u):

$$ G(u) = C(G_1(u_1), \ldots, G_d(u_d)), \qquad u \in (0,1)^d, \tag{2.11} $$

where C(u) is a copula defined on (0, 1)d and Gi(u), i = 1, . . ., d, is a cdf with support (0, 1). Note that Gi(u) is the marginal cdf of the ith component of G(u). We assume that Gi(u) is differentiable and $G_i'(x) = g_i(x)$, x ∈ (0, 1). If we denote by g(u) the pdf of G(u), then g(u) is represented as

$$ g(u) = c(G_1(u_1), \ldots, G_d(u_d)) \prod_{i=1}^{d} g_i(u_i), \qquad u \in (0,1)^d, \tag{2.12} $$

where c is the pdf of C.

Suppose that U is a sample generated from G. Then, the likelihood ratio of a sample value u with respect to c0 is given by

$$ W(u) = \frac{c_0(u)}{g(u)}, \qquad u \in (0,1)^d, $$

and Equation (2.9) is rewritten as follows:

$$ l = E_g[H(U) W(U)]. \tag{2.13} $$

Suppose that we have N random copies U(1), . . ., U(N) of U. Then, it follows from Equation (2.13) that an importance sampling estimator of l is given by

$$ \hat{l}_{IS} = \frac{1}{N} \sum_{j=1}^{N} H\left(U^{(j)}\right) W\left(U^{(j)}\right). \tag{2.14} $$

To get a sample of a random vector following G in Equation (2.11), we first need to generate a random vector V = (V1, . . ., Vd) from the copula C(u). Since Gi(u) is strictly increasing on (0, 1), it has the inverse function $G_i^{-1}$. If we consider a random vector defined as

$$ U' = \left( G_1^{-1}(V_1), \ldots, G_d^{-1}(V_d) \right), $$

then it follows that for u = (u1, . . ., ud) ∈ (0, 1)d,

$$ \Pr\{U' \le u\} = \Pr\left\{ G_1^{-1}(V_1) \le u_1, \ldots, G_d^{-1}(V_d) \le u_d \right\} = \Pr\{ V_1 \le G_1(u_1), \ldots, V_d \le G_d(u_d) \}. $$

Since the random vector V follows the cdf C(u), the cdf of U′ is G(u) in Equation (2.11).

Algorithm 1 shows the procedure to get an estimate of l as described above. For the efficient estimation of l, sampling from C(u) should not be hard. We should also choose G(u) such that $G_i^{-1}(u)$ and $g_i(u)$, i = 1, . . ., d, have explicit forms or the numerical computation of their values for given u is not difficult.
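The procedure translates directly into code; the following is a minimal sketch of Algorithm 1, assuming user-supplied callables `sample_copula`, `G_inv`, `g_pdf`, and `c0_pdf` (hypothetical names) for the importance copula sampler, the componentwise marginal inverse cdfs, and the two densities.

```python
import numpy as np

def itlr_estimate(rng, sample_copula, G_inv, g_pdf, c0_pdf, L_tilde, gamma, n):
    """Inverse transform likelihood ratio estimator, Equation (2.14).

    sample_copula(rng, n): (n, d) samples V from the importance copula C
    G_inv(v): componentwise marginal inverse cdfs G_i^{-1}
    g_pdf(u), c0_pdf(u): importance pdf g and nominal copula pdf c0
    """
    v = sample_copula(rng, n)                # V^(1), ..., V^(N) ~ C
    u = G_inv(v)                             # U^(j) = (G_1^{-1}(V_1^(j)), ...)
    h = (L_tilde(u) > gamma).astype(float)   # H(U^(j)) = I(L~(U^(j)) > gamma)
    w = c0_pdf(u) / g_pdf(u)                 # likelihood ratios W(U^(j))
    return np.mean(h * w)                    # l_IS in Equation (2.14)
```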

2.3. The cross entropy method

Kroese and Rubinstein (2004) adopted the cross entropy method to find the optimal importance distribution. We extend their method to our case. Let g*(·) be the zero-variance pdf given by

$$ g^*(u) = \frac{H(u)\, c_0(u)}{l}, \qquad u \in (0,1)^d. $$

Then, g*(·) is the optimal importance sampling pdf in the sense that generating the random copies of U from g*(·) gives the minimum variance of $\hat{l}_{IS}$ in Equation (2.14) (Rubinstein and Kroese, 2016). However, we do not know the value of l, so sampling from g*(·) is not easy, or inefficient. Instead, we try to find the importance sampling pdf with the minimum cross-entropy from g*(·).

We confine the importance sampling pdf to a parametric family of pdfs $\mathcal{G}$. For a multivariate distribution belonging to $\mathcal{G}$, the marginal distribution of each component is the beta distribution and the dependence is modeled by the Gaussian copula. Let U be an importance sample following a distribution belonging to $\mathcal{G}$. Then, Ui, i = 1, . . ., d, follows the beta distribution with parameters 1 and βi. We reparametrize βi as ψi = 1/βi. Then, the pdf of Ui is

$$ g_i(u; \psi_i) = \frac{1}{\psi_i} (1-u)^{-1 + \frac{1}{\psi_i}}, \qquad u \in (0,1), \tag{2.15} $$

and the cdf is

$$ G_i(u; \psi_i) = 1 - (1-u)^{\frac{1}{\psi_i}}, \qquad u \in (0,1). \tag{2.16} $$

When Ui follows the above cdf, we say that Ui follows Beta(1, ψi) in what follows. The d-dimensional Gaussian copula has a d-dimensional correlation matrix as its parameter. The Gaussian copula with correlation matrix R has the following form:

$$ C_G(u; R) = \Phi_d\left( \Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d); R \right), \qquad u \in (0,1)^d, $$

where Φd(·; R) is the cdf of the d-dimensional multivariate normal distribution with mean (0, . . ., 0) and variance-covariance matrix R, and Φ is the cdf of the standard normal distribution. We define cG(·; R) as the pdf of CG(·; R), and φ as the pdf of the standard normal distribution, respectively. Then, the explicit form of cG(·; R) is as follows:

$$ c_G(u; R) = \frac{1}{\sqrt{\det R}} \exp\left\{ -\frac{1}{2} z^T \left( R^{-1} - I_d \right) z \right\}, \tag{2.17} $$

where det R is the determinant of R, $z = (\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d))^T$, and Id is the d-dimensional identity matrix. We define θ = (ψ1, . . ., ψd, R) and Θ = (0, ∞)d × $\mathcal{R}_d$, where $\mathcal{R}_d$ is the set of d-dimensional correlation matrices. It follows from Equation (2.12) that for θ ∈ Θ,

$$ g(u; \theta) = c_G\left( G_1(u_1; \psi_1), \ldots, G_d(u_d; \psi_d); R \right) \prod_{i=1}^{d} g_i(u_i; \psi_i), \qquad u \in (0,1)^d. \tag{2.18} $$

Then, $\mathcal{G}$ is represented as {g(u; θ), θ ∈ Θ}.
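To make the family $\mathcal{G}$ concrete, here is a minimal sketch of sampling from g(·; θ) in Equation (2.18) and evaluating its log-density; the function names are our own.

```python
import numpy as np
from scipy.stats import norm

def sample_g(rng, psi, R, n):
    """Sample n vectors from g(.; psi, R): a Gaussian copula C_G(.; R)
    pushed through the Beta(1, psi_i) inverse cdfs G_i^{-1}."""
    z = rng.multivariate_normal(np.zeros(len(psi)), R, size=n)
    v = norm.cdf(z)                    # V ~ C_G(.; R)
    return 1.0 - (1.0 - v) ** psi      # G_i^{-1}(v; psi_i) = 1 - (1-v)^psi_i

def log_g(u, psi, R):
    """Log-density of g(u; psi, R) in Equation (2.18)."""
    log_marg = -np.log(psi) + (1.0 / psi - 1.0) * np.log1p(-u)   # log g_i(u_i; psi_i)
    z = norm.ppf(1.0 - (1.0 - u) ** (1.0 / psi))                 # Phi^{-1}(G_i(u_i; psi_i))
    quad = np.einsum('ni,ij,nj->n', z, np.linalg.inv(R) - np.eye(len(psi)), z)
    log_cop = -0.5 * np.log(np.linalg.det(R)) - 0.5 * quad       # log c_G, Equation (2.17)
    return log_cop + log_marg.sum(axis=1)
```

Both the sampler and the density are cheap to evaluate, which is what makes the likelihood ratio W(u; ω) in the iterations below computable.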

The optimal importance sampling distribution with minimum cross entropy from g*(·) is obtained by solving the following problem (Rubinstein and Kroese, 2016):

$$ \underset{\theta \in \Theta}{\text{maximize}} \;\; E_{c_0}[H(U) \log g(U; \theta)]. \tag{2.19} $$

Let θ* be the solution of Equation (2.19). Generally, it is not easy to find the value of θ* analytically. Instead, we can find an approximate solution of Equation (2.19) by solving the following stochastic programming:

$$ \underset{\theta \in \Theta}{\text{maximize}} \;\; \sum_{j=1}^{M} H\left(U^{(j)}\right) \log g\left(U^{(j)}; \theta\right), \tag{2.20} $$

where U(1), . . ., U(M) are independent samples from c0(·).

When the value of γ is large, only a small number of samples among U(1), . . ., U(M) make H(U(j)) non-zero, and the solution of Equation (2.20) does not approximate θ* well. In this case, an iterative method is appropriate (Rubinstein and Kroese, 2016). In the method, the sampling distribution of U is changed iteratively, and the solution of Equation (2.19) with increasing thresholds in place of γ is found in each iteration. In the tth step of the iterative method, we consider an importance sampling pdf g(·; ω(t−1)) of U and a threshold st such that the occurrence of $\tilde{L}(U)$ over st is not rare. To this end, for a positive real number ρ small but not close to 0, st is usually set to be the (1 − ρ) quantile of $\tilde{L}(U)$ when U follows the pdf g(·; ω(t−1)). The solution of the following maximization problem is found approximately in the tth iteration:

$$ \underset{\theta \in \Theta}{\text{maximize}} \;\; E_{\omega^{(t-1)}}\left[ I\left( \tilde{L}(U) \ge s_t \right) \log g(U; \theta)\, W\left( U; \omega^{(t-1)} \right) \right], \tag{2.21} $$

where Eω[·] means the expectation with respect to the pdf g(·;ω) and

$$ W(u; \omega) = \frac{c_0(u)}{g(u; \omega)}. \tag{2.22} $$

Since the occurrence of $\tilde{L}(U)$ over st is not rare, st is estimated by ŝt, the (1 − ρ) quantile of the losses computed from the generated samples of U following g(·; ω(t−1)). The solution of Equation (2.21) can be found approximately by solving the corresponding stochastic programming given by

$$ \underset{\theta \in \Theta}{\text{maximize}} \;\; \sum_{j=1}^{M} I\left( \tilde{L}\left(U^{(j)}\right) \ge \hat{s}_t \right) \log g\left(U^{(j)}; \theta\right) W\left( U^{(j)}; \omega^{(t-1)} \right). $$

The solution of the above problem is used as ω(t), the parameter of the importance sampling pdf in the (t + 1)st iteration. The iteration continues until ŝt ≥ γ.
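The iteration can be sketched schematically as below, assuming hypothetical helpers `sample_from` (draws from g(·; ω)), `loss_fn` (evaluates L̃), `lik_ratio` (evaluates W(·; ω)), and `ce_argmax` (solves the sample version of Equation (2.21)):

```python
import numpy as np

def ce_search(rng, omega0, gamma, rho, M, sample_from, loss_fn, lik_ratio, ce_argmax):
    """Iterative cross-entropy search: raise the threshold s_t step by step
    until it reaches gamma, re-solving the sample version of (2.21) each time."""
    omega = omega0
    while True:
        U = sample_from(rng, omega, M)                       # samples from g(.; omega^(t-1))
        losses = loss_fn(U)                                  # L~(U^(1)), ..., L~(U^(M))
        s_hat = min(gamma, np.quantile(losses, 1.0 - rho))   # threshold s_t, capped at gamma
        W = lik_ratio(U, omega)                              # W(U^(j); omega^(t-1))
        elite = losses >= s_hat                              # I(L~(U^(j)) >= s_hat)
        omega = ce_argmax(U[elite], W[elite])                # new parameter omega^(t)
        if s_hat >= gamma:                                   # threshold has reached gamma
            return omega                                     # approximation of theta*
```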

We define θ(s) as follows:

$$ \theta(s) = \underset{\theta \in \Theta}{\arg\max} \;\; E_{c_0}[H(U; s) \log g(U; \theta)], \tag{2.23} $$

where $H(u; s) = I(\tilde{L}(u) \ge s)$. The continuity of $\tilde{L}(\cdot)$ implies that θ(s) is continuous with respect to s (s > 0). Note that θ(s) can be rewritten as follows: for ω ∈ Θ,

$$ \theta(s) = \underset{\theta \in \Theta}{\arg\max} \;\; E_{\omega}[H(U; s) \log g(U; \theta)\, W(U; \omega)]. \tag{2.24} $$

Equation (2.24) shows that θ(s) does not depend on the value of ω, and that θ(st), t = 1, 2, . . ., is the solution of problem (2.21). Note that θ* = θ(γ). If the sequence {ŝ1, ŝ2, . . .} becomes larger than γ and the iteration ends at step t, then we obtain an approximate value of θ* by substituting γ for ŝt in Equation (2.23).

2.4. The proposed scheme

When we apply the procedure described in the previous subsection, we need to solve Equation (2.21). It follows from Equation (2.18) that the problem is rewritten as

$$ \underset{\theta \in \Theta}{\text{maximize}} \left( \sum_{i=1}^{d} E_{\omega}\left[ H(U; s) \log g_i(U_i; \psi_i)\, W(U; \omega) \right] + E_{\omega}\left[ H(U; s) \log c_G\left( G_1(U_1; \psi_1), \ldots, G_d(U_d; \psi_d); R \right) W(U; \omega) \right] \right). \tag{2.25} $$

Due to the complex form of cG(u; R), solving the above problem analytically is nearly impossible, and it is also not easy to solve the corresponding stochastic programming obtained by generating samples from g(·; ω). Instead, we decompose the above optimization problem into two sub-problems. Let ψ(s) = (ψ1(s), . . ., ψd(s)) be the maximizer of the first term of problem (2.25), i.e.

$$ \psi_i(s) = \underset{\psi_i \in (0, \infty)}{\arg\max} \;\; E_{\omega}\left[ H(U; s) \log g_i(U_i; \psi_i)\, W(U; \omega) \right], \qquad i = 1, \ldots, d. \tag{2.26} $$

After substituting ψi(s) for ψi in the second term of problem (2.25), the correlation matrix maximizing the second term is obtained as

$$ R(s) = \underset{R \in \mathcal{R}_d}{\arg\max} \;\; E_{\omega}\left[ H(U; s) \log c_G\left( G_1(U_1; \psi_1(s)), \ldots, G_d(U_d; \psi_d(s)); R \right) W(U; \omega) \right]. \tag{2.27} $$

Then (ψ(s), R(s)) is an approximate value of θ(s). We may apply this procedure to solve Equation (2.21) stochastically. However, it is not guaranteed that the sequence {ŝ1, ŝ2, . . .} becomes larger than γ.

In our proposed scheme, we choose an increasing sequence {γ1, γ2, . . ., γK} with γK being equal to γ, and find approximate values of ψ(γk) and R(γk), k = 1, 2, . . ., K. We denote them by ψ̂(γk) and R̂(γk), respectively. Given R̂(γk−1), we find ψ̂(γk) by solving iteratively a series of stochastic programmings corresponding to Equation (2.26) in the same manner as described in the previous section. In each iteration, the correlation matrix R, which is an element of the importance sampling parameter ω, is fixed to R̂(γk−1). Theorem 5 in Section 3 shows that the value of ŝt eventually becomes larger than γk for some t > 0. In this case, we set ŝt = γk and determine the value of R̂(γk) by solving Equation (2.27) stochastically. Note that there are sub-iterations in the iteration of finding (ψ̂(γk), R̂(γk)). To avoid confusion, we call the iteration of finding (ψ̂(γk), R̂(γk)) stage k for k = 1, 2, . . ., K, and call the tth sub-iteration in a stage the tth step for t = 1, 2, . . . .

At the start of stage k, we assume that R̂(γk−1) is available for k = 2, 3, . . ., K. In stage 1, we set R̂(γ0) to the d-dimensional identity matrix, where the notation R̂(γ0) is used for notational consistency. At the first step of stage k, we set ψ̂(0) = (1, . . ., 1) ∈ ℝd. We generate M independent samples following the pdf g(·; ψ̂(0), R̂(γk−1)). At the tth step (t ≥ 2), we generate M independent samples following the pdf g(·; ψ̂(t−1), R̂(γk−1)), where ψ̂(t−1) is determined in the previous step. In order to get random samples following g(·; ψ̂(t−1), R̂(γk−1)) for t ≥ 1, we first generate independently the multivariate normal vectors Z(1), . . ., Z(M) with mean (0, . . ., 0) and variance-covariance matrix R̂(γk−1), and transform them into

$$ V_i^{(j)} = \Phi\left( Z_i^{(j)} \right), \qquad i = 1, \ldots, d, $$

where $Z_i^{(j)}$ is the ith element of Z(j). Then, $(V_1^{(j)}, \ldots, V_d^{(j)})$ has the copula $C_G(\cdot; \hat{R}(\gamma_{k-1}))$ as its cdf. Using the following transform, we get

$$ U^{(j)} = \left( G_1^{-1}\left( V_1^{(j)}; \hat{\psi}_1^{(t-1)} \right), \ldots, G_d^{-1}\left( V_d^{(j)}; \hat{\psi}_d^{(t-1)} \right) \right), \qquad j = 1, \ldots, M. $$

As explained in Section 2.2, U(j) has the pdf g(·; ψ̂(t−1), R̂(γk−1)). Moreover, U(1), . . ., U(M) are mutually independent due to the independence of Z(1), . . ., Z(M). By substituting u = U(j) and ω = (ψ̂(t−1), R̂(γk−1)) into Equation (2.22), we obtain the likelihood ratio $W_j^{(t)}$ of the sample U(j), i.e.

$$ W_j^{(t)} = \frac{c_0\left(U^{(j)}\right)}{g\left( U^{(j)}; \hat{\psi}^{(t-1)}, \hat{R}(\gamma_{k-1}) \right)}, \qquad j = 1, \ldots, M. $$

Suppose that we have generated M samples of U at the tth step. Then, we denote by ŝt the (1 − ρ) quantile of the generated losses $\{\tilde{L}(U^{(j)}), j = 1, \ldots, M\}$, and let $\mathcal{J}(\hat{s}_t) = \{j : \tilde{L}(U^{(j)}) > \hat{s}_t, j = 1, \ldots, M\}$. The stochastic programming corresponding to Equation (2.26) is as follows:

$$ \underset{\psi_i \in (0, \infty)}{\text{maximize}} \;\; \sum_{j \in \mathcal{J}(\hat{s}_t)} W_j^{(t)} \log g_i\left( U_i^{(j)}; \psi_i \right), \qquad i = 1, \ldots, d. \tag{2.28} $$

It follows from Equation (2.15) that the solution of Equation (2.28) is represented as

$$ \hat{\psi}_i^{(t)} = \underset{\psi_i \in (0, \infty)}{\arg\max} \sum_{j \in \mathcal{J}(\hat{s}_t)} W_j^{(t)} \left( -\log \psi_i + \left( \frac{1}{\psi_i} - 1 \right) \log\left( 1 - U_i^{(j)} \right) \right). $$

Setting the derivative of the above objective function with respect to ψi equal to zero, the explicit form of $\hat{\psi}_i^{(t)}$ is obtained as

$$ \hat{\psi}_i^{(t)} = \frac{-\sum_{j \in \mathcal{J}(\hat{s}_t)} W_j^{(t)} \log\left( 1 - U_i^{(j)} \right)}{\sum_{j \in \mathcal{J}(\hat{s}_t)} W_j^{(t)}}. \tag{2.29} $$

We iterate the above procedure for t = 1, 2, . . . until ŝt > γk. Theorem 5 in Section 3 shows that the iteration terminates with probability 1. Suppose that ŝt is larger than γk at step t = τ, and that {U(j), j = 1, . . ., M} are the samples generated at this step. Then, each element of ψ̂(γk) is obtained by substituting γk for ŝt in Equation (2.29), i.e.

$$ \hat{\psi}_i(\gamma_k) = \frac{-\sum_{j \in \mathcal{J}(\gamma_k)} W_j^{(\tau)} \log\left( 1 - U_i^{(j)} \right)}{\sum_{j \in \mathcal{J}(\gamma_k)} W_j^{(\tau)}}, \qquad i = 1, \ldots, d. $$
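In code, the update (2.29) and the one above are the same likelihood-ratio-weighted average; a minimal sketch, assuming arrays `U`, `W`, and `loss` hold the samples, their likelihood ratios, and their losses:

```python
import numpy as np

def psi_update(U, W, loss, s_hat):
    """Update of Equation (2.29): psi_i is the likelihood-ratio-weighted
    mean of -log(1 - U_i) over the elite samples with loss above s_hat."""
    elite = loss > s_hat                    # index set J(s_hat)
    w = W[elite][:, None]                   # weights, shape (n_elite, 1)
    return -(w * np.log1p(-U[elite])).sum(axis=0) / w.sum()
```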

The stochastic programming corresponding to Equation (2.27) with s = γk is derived as follows:

$$ \underset{R \in \mathcal{R}_d}{\text{maximize}} \;\; \sum_{j \in \mathcal{J}(\gamma_k)} W_j^{(\tau)} \log c_G\left( G_1\left( U_1^{(j)}; \hat{\psi}_1(\gamma_k) \right), \ldots, G_d\left( U_d^{(j)}; \hat{\psi}_d(\gamma_k) \right); R \right). \tag{2.30} $$

Let $Z_i^{(j)} = \Phi^{-1}\left( G_i\left( U_i^{(j)}; \hat{\psi}_i(\gamma_k) \right) \right)$, i = 1, . . ., d, and let $Z^{(j)} = (Z_1^{(j)}, \ldots, Z_d^{(j)})$, j = 1, . . ., M. By applying the form of cG in Equation (2.17) to Equation (2.30), we can see that R(γk) is estimated as

$$ \hat{R}(\gamma_k) = \underset{R \in \mathcal{R}_d}{\arg\min} \;\; \sum_{j \in \mathcal{J}(\gamma_k)} W_j^{(\tau)} \left\{ \left( Z^{(j)} \right)^T R^{-1} Z^{(j)} + \log \det R \right\}. \tag{2.31} $$

It can be easily shown that the solution of Equation (2.31) is given by

$$ \hat{R}(\gamma_k) = \frac{\sum_{j \in \mathcal{J}(\gamma_k)} W_j^{(\tau)} \left( Z^{(j)} \right)^T Z^{(j)}}{\sum_{j \in \mathcal{J}(\gamma_k)} W_j^{(\tau)}}. \tag{2.32} $$

Some diagonal entries of R̂(γk) in Equation (2.32) may not be 1. In that case, we normalize it so that it is a correlation matrix.
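A minimal sketch of the update (2.32), with the normalization into a correlation matrix written out explicitly (the rescaling by the diagonal is our reading of the sentence above):

```python
import numpy as np
from scipy.stats import norm

def R_update(U, W, psi_hat, elite):
    """Correlation-matrix update of Equation (2.32) over the elite index set,
    followed by normalization to unit diagonal."""
    G = 1.0 - (1.0 - U[elite]) ** (1.0 / psi_hat)      # G_i(U_i^(j); psi_i(gamma_k))
    Z = norm.ppf(G)                                    # rows are Z^(j)
    w = W[elite]
    S = (w[:, None, None] * Z[:, :, None] * Z[:, None, :]).sum(axis=0) / w.sum()
    d = 1.0 / np.sqrt(np.diag(S))                      # rescale to a correlation matrix
    return S * np.outer(d, d)
```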

If we let θ̂(γk) = (ψ̂(γk), R̂(γk)), then θ̂(γk) is an approximate value of θ(γk) for k = 1, 2, . . ., K. Since γK is equal to γ, θ̂(γK) is the estimate of θ*. Algorithm 2 shows the procedure to get the pseudo-optimal parameter θ̂(γK).

3. Convergence of the iterative scheme

In this section, we show that if the number of importance samples generated in each step is sufficiently large, then the algorithm terminates in a finite number of steps with a probability close to 1. We define ψ = (ψ1, . . ., ψd). Suppose that a multivariate random variable U follows the joint pdf g(·; ψ, R). Since $\tilde{L}(u)$ is continuous and strictly increasing on (0, 1)d and the support of g(u; ψ, R) is (0, 1)d, $\tilde{L}(U)$ is a continuous random variable and the support of $\tilde{L}(U)$ is the same as that of L(X), where X is a multivariate random variable with cdf F. Thus, the support of $\tilde{L}(U)$ is (sL, ∞).

The cdf of $\tilde{L}(U)$ is as follows:

$$ F_{\tilde{L}}(s; \psi, R) = \int_{(0,1)^d} 1\left( \tilde{L}(u) \le s \right) g(u; \psi, R) \, du, \qquad s > 0. \tag{3.1} $$

The representation of g(·; ψ, R) in Equation (2.18) says that g(·; ψ, R) is continuous with respect to ψ on (0, ∞)d, which means that $F_{\tilde{L}}(s; \psi, R)$ is also continuous with respect to ψ on (0, ∞)d. We denote by s(ψ; R) the (1 − ρ) quantile of $\tilde{L}(U)$. Then, s(ψ; R) is the unique solution of the following equation:

$$ F_{\tilde{L}}(s; \psi, R) = 1 - \rho, \qquad 0 < \rho < 1. \tag{3.2} $$

Theorem 1

s(ψ; R) is continuous with respect to ψ on (0, ∞)d.

Proof

Since $F_{\tilde{L}}(s; \psi, R)$ is continuous with respect to ψ on (0, ∞)d, we have that

$$ \lim_{\psi \to \psi_0} F_{\tilde{L}}(s(\psi_0; R); \psi, R) = F_{\tilde{L}}(s(\psi_0; R); \psi_0, R). $$

It follows from Equations (3.1) and (3.2) that

$$ \lim_{\psi \to \psi_0} \int_{(0,1)^d} 1\left( \tilde{L}(u) \le s(\psi_0; R) \right) g(u; \psi, R) \, du - (1 - \rho) = 0. \tag{3.3} $$

Since s(ψ; R) is the (1 − ρ) quantile of $\tilde{L}(U)$, we have that for ψ ∈ (0, ∞)d,

$$ \int_{(0,1)^d} 1\left( \tilde{L}(u) \le s(\psi; R) \right) g(u; \psi, R) \, du = 1 - \rho. $$

By applying the above equation to Equation (3.3), we have that

$$ \lim_{\psi \to \psi_0} \int_{(0,1)^d} \left\{ 1\left( \tilde{L}(u) \le s(\psi_0; R) \right) - 1\left( \tilde{L}(u) \le s(\psi; R) \right) \right\} g(u; \psi, R) \, du = 0. \tag{3.4} $$

If s(ψ0; R) < s(ψ; R), then

$$ 1\left( \tilde{L}(u) \le s(\psi_0; R) \right) - 1\left( \tilde{L}(u) \le s(\psi; R) \right) = \begin{cases} -1, & s(\psi_0; R) < \tilde{L}(u) \le s(\psi; R), \\ 0, & \text{otherwise}. \end{cases} $$

If s(ψ; R) ≤ s(ψ0; R), then

$$ 1\left( \tilde{L}(u) \le s(\psi_0; R) \right) - 1\left( \tilde{L}(u) \le s(\psi; R) \right) = \begin{cases} 1, & s(\psi; R) < \tilde{L}(u) \le s(\psi_0; R), \\ 0, & \text{otherwise}. \end{cases} $$

The above two equations give that

$$ 1\left( \tilde{L}(u) \le s(\psi_0; R) \right) - 1\left( \tilde{L}(u) \le s(\psi; R) \right) = (-1)^{1(s(\psi_0; R) < s(\psi; R))} \left| 1\left( \tilde{L}(u) \le s(\psi_0; R) \right) - 1\left( \tilde{L}(u) \le s(\psi; R) \right) \right|. $$

Then, Equation (3.4) implies that

$$ \lim_{\psi \to \psi_0} \int_{(0,1)^d} \left| 1\left( \tilde{L}(u) \le s(\psi_0; R) \right) - 1\left( \tilde{L}(u) \le s(\psi; R) \right) \right| g(u; \psi, R) \, du = 0. $$

The integrand of the above equation is nonnegative for each ψ on (0, ∞)d. Thus, it converges to 0 almost everywhere. Since g(u; ψ, R) > 0 for u ∈ (0, 1)d, the above equation implies that the set $\{u : s(\psi_0; R) < \tilde{L}(u) \le s(\psi; R)\} \cup \{u : s(\psi; R) < \tilde{L}(u) \le s(\psi_0; R)\}$ converges to a set of measure 0. The continuity of $\tilde{L}(u)$ on (0, 1)d implies that s(ψ; R) → s(ψ0; R) as ψ → ψ0.

When U follows the pdf g(·; ψ, R), E[Ui] is an increasing function of ψi, i = 1, . . ., d. Thus, we can guess that $\Pr\{\tilde{L}(U) > \gamma\}$ may be increasing with respect to each of the ψi, or equivalently, that the (1 − ρ) quantile of $\tilde{L}(U)$ may be increasing with respect to each of the ψi. The following theorem shows that this is true.

Theorem 2

s(ψ; R) is a strictly increasing function of ψ.

Proof

For ψ = (ψ1, . . ., ψd) ∈ (0, ∞)d and ψ′ = (ψ′1, . . ., ψ′d) ∈ (0, ∞)d, we assume that ψ < ψ′. Then, it follows that for given v = (v1, . . ., vd) ∈ (0, 1)d,

$$ \left( 1 - (1 - v_1)^{\psi_1}, \ldots, 1 - (1 - v_d)^{\psi_d} \right) < \left( 1 - (1 - v_1)^{\psi'_1}, \ldots, 1 - (1 - v_d)^{\psi'_d} \right). $$

Since $\tilde{L}(u)$ is a strictly increasing function of u, we have that for given (v1, . . ., vd) ∈ (0, 1)d,

$$ \tilde{L}\left( 1 - (1 - v_1)^{\psi_1}, \ldots, 1 - (1 - v_d)^{\psi_d} \right) < \tilde{L}\left( 1 - (1 - v_1)^{\psi'_1}, \ldots, 1 - (1 - v_d)^{\psi'_d} \right). \tag{3.5} $$

Suppose that V = (V1, . . ., Vd) is a random vector following the Gaussian copula with correlation matrix R. We define two random vectors

$$ U = \left( 1 - (1 - V_1)^{\psi_1}, \ldots, 1 - (1 - V_d)^{\psi_d} \right), \qquad U' = \left( 1 - (1 - V_1)^{\psi'_1}, \ldots, 1 - (1 - V_d)^{\psi'_d} \right). $$

Then, U and U′ follow the pdfs g(·; ψ, R) and g(·; ψ′, R), respectively. Since s(ψ; R) is the (1 − ρ) quantile of $\tilde{L}(U)$, we have that

$$ \Pr\left\{ \tilde{L}(U) \le s(\psi; R) \right\} = 1 - \rho. $$

Equation (3.5) says that $\Pr\{\tilde{L}(U') \le s(\psi; R)\} < \Pr\{\tilde{L}(U) \le s(\psi; R)\}$. It implies that

$$ \Pr\left\{ \tilde{L}(U') \le s(\psi; R) \right\} < 1 - \rho. $$

Note that s(ψ′; R) is the (1 − ρ) quantile of $\tilde{L}(U')$. Then, we have that s(ψ; R) < s(ψ′; R).

In what follows, we assume that the converse of Equation (2.6) is also valid in a stochastic sense. Specifically, we assume that for U with pdf c0, the conditional random variable $U_i \mid (\tilde{L}(U) = s)$, i = 1, . . ., d, is stochastically increasing with respect to s in the strict sense, i.e. for s < s′,

$$ \Pr\left\{ U_i > u \mid \tilde{L}(U) = s \right\} < \Pr\left\{ U_i > u \mid \tilde{L}(U) = s' \right\}, \qquad u \in (0,1). \tag{3.6} $$

The solution of Equation (2.26) is rewritten as

$$ \psi_i(s) = \underset{\psi_i \in (0, \infty)}{\arg\max} \;\; E_{c_0}\left[ I\left( \tilde{L}(U) > s \right) \log g_i(U_i; \psi_i) \right]. $$

It follows from Equation (2.15) that ψi(s) is represented as

$$ \psi_i(s) = E_{c_0}\left[ -\log(1 - U_i) \mid \tilde{L}(U) > s \right], \qquad s > 0. \tag{3.7} $$

Theorem 3

Suppose that for U with pdf c0, the conditional random variable $U_i \mid (\tilde{L}(U) = s)$, i = 1, . . ., d, is stochastically increasing with respect to s. Then, ψi(s), i = 1, . . ., d, is a strictly increasing and continuous function of s, s > 0.

Proof

Let $f_{\tilde{L}}(\cdot)$ be the pdf of $\tilde{L}(U)$ when U follows the pdf c0, and let $f_i(u, s)$ be the joint pdf of Ui and $\tilde{L}(U)$. We also denote by $f_{U_i \mid \tilde{L}}(u \mid s)$ the conditional pdf of Ui given that $\tilde{L}(U) = s$, i.e.

$$ f_{U_i \mid \tilde{L}}(u \mid s) = \frac{f_i(u, s)}{f_{\tilde{L}}(s)}. $$

Since $d \Pr\{U_i > u \mid \tilde{L}(U) = s\}/du = -f_{U_i \mid \tilde{L}}(u \mid s)$, integration by parts gives that

$$ E_{c_0}\left[ -\log(1 - U_i) \mid \tilde{L}(U) = s \right] = \int_0^1 -\log(1 - u)\, f_{U_i \mid \tilde{L}}(u \mid s) \, du = \int_0^1 \frac{\Pr\{U_i > u \mid \tilde{L}(U) = s\}}{1 - u} \, du. $$

By applying Equation (3.6) to the above equation, we have that for s < s′,

$$ E_{c_0}\left[ -\log(1 - U_i) \mid \tilde{L}(U) = s \right] < E_{c_0}\left[ -\log(1 - U_i) \mid \tilde{L}(U) = s' \right]. $$

Then, Equation (3.7) implies that ψi(s) is strictly increasing with respect to s for i = 1, . . ., d.

It follows from Equation (3.7) that

$$ \psi_i(s) = \frac{\int_s^{\infty} \int_0^1 -\log(1 - u)\, f_i(u, y) \, du \, dy}{\Pr\{\tilde{L}(U) > s\}}. \tag{3.8} $$

Both the numerator and the denominator of ψi(s) in Equation (3.8) are continuous with respect to s ∈ (0, ∞). Thus, ψi(s) is a continuous function of s on (0, ∞).

Suppose that a correlation matrix R ∈ $\mathcal{R}_d$ is given. Let ψ(0) = (1, . . ., 1). We define ψ(t) and st, t = 1, 2, . . ., iteratively, as follows:

$$ s_t = s\left( \psi^{(t-1)}; R \right), \qquad \psi_i^{(t)} = \psi_i(s_t), \qquad i = 1, \ldots, d, \tag{3.9} $$

where $\psi^{(t)} = (\psi_1^{(t)}, \ldots, \psi_d^{(t)})$. It can be easily shown that

$$ E_{c_0}[-\log(1 - U_i)] = 1, \qquad i = 1, \ldots, d, $$

which means that

$$ \psi_i^{(0)} = E_{c_0}\left[ -\log(1 - U_i) \mid \tilde{L}(U) > 0 \right]. $$

Since s1 > 0, we have from Theorem 3 that

$$ \psi^{(1)} > \psi^{(0)}, $$

and from Theorem 2 that s(ψ(1); R) > s(ψ(0); R), equivalently,

$$ s_2 > s_1. $$

Then, Equation (3.9) gives that for t = 1, 2, . . .,

$$ s_{t+1} > s_t, \qquad \psi^{(t+1)} > \psi^{(t)}. \tag{3.10} $$

Moreover, Equation (3.9) says that the mapping from ψ(t) to ψ(t+1) is represented as

$$ h(\psi) = \left( \psi_1(s(\psi; R)), \ldots, \psi_d(s(\psi; R)) \right). \tag{3.11} $$

In stage k of the proposed scheme described in Section 2.4, ŝt and ψ̂(t), t = 1, 2, . . ., are the approximate values of st and ψ(t) with R being equal to R̂(γk−1). Note that ŝt is the (1 − ρ) quantile of the losses of the samples following g(·; ψ̂(t−1), R̂(γk−1)), and that ψ̂(t), t = 1, 2, . . ., is the cross entropy estimator of ψ(t). The following theorem due to Lieber (1998) describes the conditions under which the sequence {ŝ1, ŝ2, . . .} becomes larger than γk in a stochastic sense (Rubinstein, 1999).

Theorem 4. (Lieber, 1998)

If the following conditions hold:

  (i) The sequence {st, t = 1, 2, . . .} is strictly monotonically increasing.

  (ii) s(ψ; R) is continuous with respect to ψ.

  (iii) s(ψ; R) is a proper function of ψ, i.e. for a closed interval A, {ψ : s(ψ; R) ∈ A} is compact.

  (iv) Δ(ψ) = s(h(ψ); R) − s(ψ; R) is lower semi-continuous,

then there exists an integer t such that

$$ \lim_{M \to \infty} \Pr\{\hat{s}_t < \gamma_k\} = 0, $$

where M is the number of samples in each step.

By exploiting the above theorem, we can show that if the number of samples is sufficiently large, the sequence {ŝt, t = 1, 2, . . .} eventually becomes larger than γk.

Theorem 5

In stage k for k = 1, 2, . . ., K, there exists an integer t such that

$$ \lim_{M \to \infty} \Pr\{\hat{s}_t < \gamma_k\} = 0. $$
Proof

It is sufficient to show that the four conditions of Theorem 4 are satisfied. We can see from Equation (3.10) that condition (i) is satisfied, and we have shown in Theorem 1 that condition (ii) is satisfied. Since s(ψ; R) is continuous with respect to ψ ∈ (0, ∞)d and ψi(s) is continuous with respect to s ∈ (0, ∞), ψi(s(ψ; R)) is continuous with respect to ψ ∈ (0, ∞)d for i = 1, . . ., d. In other words, each component of h(ψ) in (3.11) is continuous with respect to ψ. Thus, Δ(ψ) is a continuous function of ψ, which proves condition (iv).

It remains to prove condition (iii). For a closed interval A = [a1, a2], we let $s^{-1}(A; R) = \{\psi : s(\psi; R) \in A\}$. Since s(ψ; R) is continuous with respect to ψ, $s^{-1}(A; R)$ is closed. Suppose that $s^{-1}(A; R)$ is not bounded. In this case, $s^{-1}(A; R)$ contains a sequence {ψ(1), ψ(2), . . .} such that for a constant K0 > 0,

$$ \left\| \psi^{(n)} \right\| > \sqrt{d}\, K_0 n, \qquad n = 1, 2, \ldots. $$

Then, there is an i ∈ {1, . . ., d} such that $\psi_i^{(n)} > K_0 n$, where $\psi_i^{(n)}$ is the ith element of ψ(n). Let U(n) be the random vector following the pdf g(·; ψ(n), R). Then,

$$ \Pr\left\{ \tilde{L}\left( U^{(n)} \right) \le a_2 \right\} \ge 1 - \rho, \qquad n = 1, 2, \ldots, \tag{3.12} $$

and it follows from Equation (2.16) that for 0 < u0 < 1,

$$ \Pr\left\{ U_i^{(n)} > u_0 \right\} = (1 - u_0)^{\frac{1}{\psi_i^{(n)}}}, $$

where $U_i^{(n)}$ is the ith element of U(n). The above equation gives that

$$ \Pr\left\{ U_i^{(n)} > u_0 \right\} > (1 - u_0)^{\frac{1}{K_0 n}}. $$

Since $\Pr\{\max_{1 \le i \le d} U_i^{(n)} > u_0\} \ge \Pr\{U_i^{(n)} > u_0\}$, we have that

$$ \Pr\left\{ \max_{1 \le i \le d} U_i^{(n)} > u_0 \right\} > (1 - u_0)^{\frac{1}{K_0 n}}, $$

which implies that

$$ \lim_{n \to \infty} \max_{1 \le i \le d} U_i^{(n)} = 1 \quad \text{in probability}. $$

Then, Equation (2.7) says that

$$ \lim_{n \to \infty} \tilde{L}\left( U^{(n)} \right) = \infty \quad \text{in probability}, $$

which implies that $\lim_{n \to \infty} \Pr\{\tilde{L}(U^{(n)}) \le a_2\} = 0$. This contradicts Equation (3.12). Thus, $s^{-1}(A; R)$ is bounded, and hence it is a compact set.

Suppose that the current stage is the kth stage with intermediate threshold γk. If ŝt < γk for t = 1, . . ., τ − 1, and ŝτ ≥ γk, then stage k ends at step τ. Let ε be a sufficiently small positive real number. According to Theorem 5, there are positive integers τk and Mk such that if the number of importance samples at each step is larger than Mk, then

$$ \Pr\left\{ \hat{s}_{\tau_k} \ge \gamma_k \right\} > 1 - \frac{\varepsilon}{K}. \tag{3.13} $$

In the case that ŝτk ≥ γk, stage k ends at step τk or before. Suppose that M in Algorithm 2 is greater than $\max_{1 \le k \le K} M_k$. Then, the number of steps required to terminate stage k is less than or equal to τk with a probability larger than 1 − ε/K. To get an estimate of θ*, K stages must be completed. Let T be the total number of steps required to complete all stages. Then, it follows that

$$ \Pr\left\{ T \le \sum_{k=1}^{K} \tau_k \right\} \ge \Pr\left\{ \hat{s}_{\tau_1} \ge \gamma_1, \ldots, \hat{s}_{\tau_K} \ge \gamma_K \right\} = 1 - \Pr\left\{ \bigcup_{k=1}^{K} \left\{ \hat{s}_{\tau_k} < \gamma_k \right\} \right\} \ge 1 - \sum_{k=1}^{K} \Pr\left\{ \hat{s}_{\tau_k} < \gamma_k \right\}. $$

By applying Equation (3.13) to the above inequality, we have that

$$ \Pr\left\{ T \le \sum_{k=1}^{K} \tau_k \right\} \ge 1 - \varepsilon, $$

i.e. we can get an estimate of θ* within $\sum_{k=1}^{K} \tau_k$ steps with a probability larger than 1 − ε. In a practical scenario, we do not know the values of τk and Mk. However, if we set M to be sufficiently large, then the algorithm terminates in a finite number of steps with a probability close to 1.

4. Numerical results

In this section, we compare the performance of the proposed scheme with that of the CMC estimator given in Equation (2.10). We estimate numerically the tail probabilities given in Equation (2.1) over various thresholds. The CMC estimator and the proposed scheme given in Algorithm 2 are applied to estimate the tail loss probabilities. We call the methods CMC and ITLR, respectively. We consider the loss function defined as follows:

$$ L(x) = \sum_{i=1}^{5} x_i + 2 \sum_{i=6}^{10} x_i, \qquad x \in (0, \infty)^{10}. \tag{4.1} $$

The joint cdf of X = (X1, . . ., X10) has the form in Equation (2.3). In this section, we assume that C0 is the Gaussian copula with exchangeable correlation matrix R0, and that Xi, i = 1, . . ., 10, follows the Weibull distribution. The shape parameter of Xi, i = 1, . . ., 10, is set to be

$$ a_1 = a_2 = a_3 = a_6 = a_7 = a_8 = 1.5, \qquad a_4 = a_5 = a_9 = a_{10} = 2.5, $$

and the scale parameter of Xi, i = 1, . . ., 10, is set to be

$$ \sigma_1 = \sigma_2 = \sigma_9 = \sigma_{10} = 1, \qquad \sigma_3 = \sigma_4 = \sigma_7 = \sigma_8 = 2, \qquad \sigma_5 = \sigma_6 = 5. $$
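For concreteness, a minimal sketch of this nominal model follows; it assumes the standard Weibull parametrization $F_i(x) = 1 - \exp\{-(x/\sigma_i)^{a_i}\}$ and an exchangeable correlation matrix whose off-diagonal element is the η introduced below.

```python
import numpy as np
from scipy.stats import norm, weibull_min

a = np.array([1.5, 1.5, 1.5, 2.5, 2.5, 1.5, 1.5, 1.5, 2.5, 2.5])   # shape parameters a_i
sigma = np.array([1, 1, 2, 2, 5, 5, 2, 2, 1, 1], dtype=float)      # scale parameters sigma_i

def sample_loss(rng, eta, n):
    """Sample n losses L(X) under the nominal model: exchangeable Gaussian
    copula with off-diagonal eta and Weibull marginals, loss from Equation (4.1)."""
    d = 10
    R0 = np.full((d, d), eta) + (1.0 - eta) * np.eye(d)
    z = rng.multivariate_normal(np.zeros(d), R0, size=n)
    u = norm.cdf(z)                           # U ~ C0
    x = weibull_min.ppf(u, a, scale=sigma)    # X_i = F_i^{-1}(U_i)
    return x[:, :5].sum(axis=1) + 2.0 * x[:, 5:].sum(axis=1)

rng = np.random.default_rng(0)
losses = sample_loss(rng, eta=0.5, n=10**5)
print(np.mean(losses > 90))                   # a crude Monte Carlo tail estimate
```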

Let η be the off-diagonal element of R0. We consider three cases: η = 0.25, 0.5, and 0.75. In order to examine the distributional properties of the loss L(X), we generated $10^7$ random samples of the loss for each case of η. Figure 1 shows the histograms of the generated losses with varying values of η. We can see from the figure that the distributions of the losses do not look very different from each other, except that the distribution becomes more concentrated around the mean as the value of η becomes smaller.

Table 1 shows summary statistics of the generated losses. We see from the table that the sample means of the generated losses are very similar across the values of η. The same holds for the sample medians, while the dispersion of the samples becomes larger as η increases. Note that the maximum loss may fluctuate greatly for each set of generated samples due to the large variability of the Weibull distribution.

When X follows the joint distribution described above and the loss function L is as defined in Equation (4.1), we estimated the tail loss probabilities over various values of the threshold by the CMC and ITLR methods, respectively. We chose the values of the threshold so that the tail loss probabilities are in the range $(10^{-6}, 10^{-3})$. Then, the occurrence of L(X) larger than the threshold is rare. The adopted values of the threshold γ are shown in Table 2. In the CMC estimation, we generated $10^7$ samples of X independently from the cdf C0(F1(x1), . . ., F10(x10)) for each combination of γ and η. The scheme described in Section 2.1 was applied for the generation of samples of X. In the ITLR estimation, we also generated $10^7$ importance samples of X for each combination of γ and η. To find the optimal sampling distribution of X and compute the likelihood ratio of a sample, we applied the scheme in Algorithm 2 with γi = (i/3)γ, i = 1, 2, 3. Through some pilot simulations, we found that this setting of the γi's is appropriate.

The estimated tail loss probability for each combination of η and γ is given in Table 2. Since ITLR estimates the tail loss probabilities with lower standard error than CMC in all cases, we report the ITLR estimate as l in the table. We can see the tendency that the standard errors of the CMC and ITLR estimates become smaller as γ increases. The tendency is obviously due to the fact that the tail loss probability decreases as γ increases.

When we estimate the tail loss probability using a Monte Carlo simulation scheme, the efficiency of the scheme can be measured by the product of the sample variance of the estimate of the tail loss probability and the simulation time needed to get the estimate (Glynn and Whitt, 1992; Sak and Hörmann, 2012). We denote the product by time*variance. A smaller value of time*variance implies better performance. For each combination of η and γ, the time*variance of each estimate by CMC and ITLR, respectively, is shown in Table 2. The time*variance of the ITLR estimate of the tail loss probability is larger than that of the CMC estimate in the case of η = 0.5 and γ = 75. However, in the other cases, ITLR shows a time*variance lower than that of CMC.
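As a worked check of the metric, the time*variance entries in Table 2 can be recovered from the reported simulation times and standard errors, since the sample variance of the estimate is the squared standard error:

```python
# time*variance for the CMC entry at eta = 0.25, gamma = 70 in Table 2:
# variance = (s.e.)^2, so 63.5 * (1.29e-5)**2 ~ 1.06e-8, matching the table.
print(f"{63.5 * (1.29e-5) ** 2:.2e}")   # 1.06e-08
```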

Table 3 shows the ratio of the time*variance of the CMC estimate to that of the ITLR estimate. We can see that the time*variances of ITLR are 3.5 to 541.1 times smaller than those of CMC over the thresholds considered for the case of η = 0.25. In other words, ITLR is 3.5 to 541.1 times faster than CMC in terms of the simulation time required to obtain the same estimation error. In the same table, we observe that the estimates for the case of η = 0.75 show a behavior of the time*variance similar to that of η = 0.25. In this case, ITLR is 1.2 to 26.4 times faster than CMC. Thus, ITLR shows better performance than CMC for the cases of η = 0.25 and η = 0.75. However, in the case of η = 0.5, the ratio ranges from 0.48 to 121.7 according to the value of γ, and which method performs better depends on the value of the threshold. We can see from Table 3 that ITLR performs better than CMC for the case of a high threshold, and for the case of weak dependence among the Xi's.

5. Conclusion

We considered a static problem of estimating the probability that a strictly increasing function of a multivariate random variable has values over a given threshold, under the assumption that the exact forms of the marginal distributions of the multivariate random variable and of its copula are known. In order to solve the problem, we applied the inverse transform likelihood ratio method, which was originally proposed for the case of one-dimensional random variables. Confining the importance sampling distributions to a family of multivariate distributions in which the marginal distribution of each component follows the beta distribution and the dependence is modeled by the Gaussian copula, we proposed an algorithm to find the pseudo-optimal parameter in an iterative manner, and showed the convergence of the proposed algorithm. The numerical study showed that the proposed scheme is more efficient than the crude Monte Carlo simulation in the case that the threshold is high or the dependence of each pair of components is weak. However, the performance of the proposed scheme deteriorates in the other cases.

If the copula of the nominal distribution is not Gaussian and shows strong tail dependence, then the losses over a high threshold typically occur when some variables have large values at the same time. Regardless of the correlation matrix, the Gaussian copula is asymptotically independent in both the upper and the lower tails, and the samples generated by the proposed method are unlikely to have several of their components taking large values at the same time. This makes it hard to generate large losses frequently, and the proposed method loses its efficiency. Therefore, when the tail dependence is strong, copulas such as the Gumbel or the generalized Clayton can be suitable as importance sampling distributions.

As for further research, we suggest extending the proposed scheme to the case of strong tail dependence, the case of monotonically non-decreasing loss functions, and also the case that each marginal distribution of an importance sampling distribution is a beta distribution with two free parameters.

Acknowledgements

The authors would like to thank the anonymous reviewers for their comments and suggestions on the first draft of this paper. Their suggestions have greatly improved the quality of the paper. This work was supported by the 2021 sabbatical year research grant of the University of Seoul.

Figures
Fig. 1. Histograms of the $10^7$ random samples of L(X) with varying values of η.
TABLES

Table 1

Summary statistics of the generated losses

Case Min. 1st Qu. Median Mean 3rd Qu. Max.
η = 0.25 1.51 21.70 28.39 29.61 36.21 116.83
η = 0.5 0.31 19.69 27.86 29.62 37.67 136.18
η = 0.75 0.08 17.90 27.30 29.62 38.85 161.86

Table 2

The estimated tail loss probabilities and some performance measures for various values of thresholds

η = 0.25

γ      l            simulation time      s.e.                      time*variance
                    CMC      ITLR        CMC         ITLR          CMC          ITLR
70     1.66·10^-3   63.5     250.0       1.29·10^-5  3.49·10^-6    1.06·10^-8   3.04·10^-9
80     2.23·10^-4   61.6     240.5       4.82·10^-6  6.65·10^-7    1.43·10^-9   1.06·10^-10
90     2.66·10^-5   63.1     238.8       1.54·10^-6  9.55·10^-8    1.49·10^-10  2.18·10^-12
100    2.62·10^-6   63.2     233.5       5.20·10^-7  1.16·10^-8    1.71·10^-11  3.16·10^-14

η = 0.5

γ      l            simulation time      s.e.                      time*variance
                    CMC      ITLR        CMC         ITLR          CMC          ITLR
75     3.73·10^-3   63.1     249.6       1.93·10^-5  1.41·10^-5    2.38·10^-8   4.99·10^-8
90     3.97·10^-4   60.1     240.4       6.29·10^-6  2.53·10^-6    2.37·10^-9   1.54·10^-9
105    3.26·10^-5   63.4     228.3       1.83·10^-6  2.78·10^-7    2.12·10^-10  1.76·10^-11
120    2.08·10^-6   62.3     229.2       3.87·10^-7  1.83·10^-8    9.35·10^-12  7.68·10^-14

η = 0.75

γ      l            simulation time      s.e.                      time*variance
                    CMC      ITLR        CMC         ITLR          CMC          ITLR
90     1.69·10^-3   63.5     245.0       1.30·10^-5  6.13·10^-6    1.08·10^-8   9.21·10^-9
110    1.21·10^-4   64.6     235.3       3.46·10^-6  1.18·10^-6    7.74·10^-10  3.25·10^-10
125    1.39·10^-5   64.3     242.5       1.11·10^-6  1.70·10^-7    7.97·10^-11  7.03·10^-12
135    2.97·10^-6   60.8     234.9       4.58·10^-7  4.52·10^-8    1.27·10^-11  4.81·10^-13

Table 3

Time*variance ratio of CMC to ITLR

η = 0.25 η = 0.5 η = 0.75

γ time*variance ratio γ time*variance ratio γ time*variance ratio
70 3.5 75 0.48 90 1.2
80 13.5 90 1.5 110 2.4
90 68.3 105 12.0 125 11.3
100 541.1 120 121.7 135 26.4

Algorithm 1

Inverse transform likelihood ratio method for a function of a multivariate random variable

Require: loss function L̃, threshold γ, nominal copula C0 with pdf c0, marginal importance sampling cdfs G1, . . ., Gd, marginal importance sampling pdfs g1, . . ., gd, importance sampling copula C
Ensure: an estimate of l = Pr{L̃(U) > γ}
 1: Set N as the total number of samples.
 2: Generate V(1), . . ., V(N) independently from the copula C.
 3: Get U(1), . . ., U(N) using the transform: for j = 1, . . ., N,
    $U^{(j)} = \left( G_1^{-1}\left(V_1^{(j)}\right), \ldots, G_d^{-1}\left(V_d^{(j)}\right) \right)$.
 4: Compute
    $H\left(U^{(j)}\right) = I\left(\tilde{L}\left(U^{(j)}\right) > \gamma\right)$, j = 1, . . ., N.
 5: Compute the likelihood ratios
    $W\left(U^{(j)}\right) = c_0\left(U^{(j)}\right) / g\left(U^{(j)}\right)$, j = 1, . . ., N.
 6: Return
    $\hat{l}_{IS} = \frac{1}{N} \sum_{j=1}^{N} H\left(U^{(j)}\right) W\left(U^{(j)}\right)$.

Algorithm 2

The proposed scheme

Require: nominal pdf c0(u), loss function L̃(u), threshold γ, the number of stages K, thresholds in each stage, γ1, γ2, . . ., γK−1, the importance sampling pdf g(u; θ) given in Equation (2.18), the value of ρ, the likelihood ratio function W(u; θ).
Ensure: an estimate of l = Pr{L̃(U) > γ}
 1: γK ← γ
 2: R̂(γ0) ← the d-dimensional identity matrix
 3: for k = 1 to K do
 4:  ŝ ← 0
 5:  ψ̂ ← (1, . . ., 1)
 6:  while ŝ < γk do
 7:   Generate M random samples U(1), . . ., U(M) from g(u; ψ̂, R̂(γk−1))
 8:   ŝ ← min{γk, the (1 − ρ) quantile of L̃(U(1)), . . ., L̃(U(M))}
 9:   𝒥(ŝ) ← {j : L̃(U(j)) > ŝ, j = 1, . . ., M}
 10:  for j ∈ 𝒥(ŝ) do
 11:   Wj ← W(U(j); ψ̂, R̂(γk−1))
 12:  end for
 13:  for i = 1 to d do
    $\hat{\psi}_i \leftarrow -\frac{\sum_{j \in \mathcal{J}(\hat{s})} W_j \log\left(1 - U_i^{(j)}\right)}{\sum_{j \in \mathcal{J}(\hat{s})} W_j}$
 14:  end for
 15:  ψ̂ ← (ψ̂1, . . ., ψ̂d)
 16: end while
 17: ψ̂(γk) ← ψ̂
 18: for j ∈ 𝒥(γk) do
 19:  $Z^{(j)} \leftarrow \left( \Phi^{-1}\left(G_1\left(U_1^{(j)}; \hat{\psi}_1(\gamma_k)\right)\right), \ldots, \Phi^{-1}\left(G_d\left(U_d^{(j)}; \hat{\psi}_d(\gamma_k)\right)\right) \right)$
 20: end for
 21:
   $\hat{R}(\gamma_k) \leftarrow \frac{\sum_{j \in \mathcal{J}(\gamma_k)} W_j \left(Z^{(j)}\right)^T Z^{(j)}}{\sum_{j \in \mathcal{J}(\gamma_k)} W_j}$, normalized so that it is a correlation matrix
 22: end for
 23: θ̂ ← (ψ̂(γK), R̂(γK))
 24: Generate N random samples U(1), . . ., U(N) from g(u; θ̂)
 25: Return
   $\hat{l}_{IS} = \frac{1}{N} \sum_{j=1}^{N} H\left(U^{(j)}\right) W\left(U^{(j)}; \hat{\theta}\right)$.

References
  1. Asmussen S, Binswanger K, and Højgaard B (2000). Rare events simulation for heavy-tailed distributions. Bernoulli, 6, 303-322.
  2. Asmussen S and Kroese DP (2006). Improved algorithms for rare event simulation with heavy tails. Advances in Applied Probability, 38, 545-558.
  3. Asmussen S, Blanchet J, Juneja S, and Rojas-Nandayapa L (2011). Efficient simulation of tail probabilities of sums of correlated lognormals. Annals of Operations Research, 189, 5-23.
  4. Blanchet J and Rojas-Nandayapa L (2011). Efficient simulation of tail probabilities of sums of dependent random variables. Journal of Applied Probability, 48, 147-164.
  5. De Boer P-T, Kroese DP, Mannor S, and Rubinstein RY (2005). A tutorial on the cross-entropy method. Annals of Operations Research, 134, 19-67.
  6. Dupuis P, Leder K, and Wang H (2007). Importance sampling for sums of random variables with regularly varying tails. ACM Transactions on Modeling and Computer Simulation (TOMACS), 17, 14.
  7. Glynn PW and Iglehart DL (1989). Importance sampling for stochastic simulations. Management Science, 35, 1367-1392.
  8. Glynn PW and Whitt W (1992). The asymptotic efficiency of simulation estimators. Operations Research, 40, 505-520.
  9. Homem-de-Mello T and Rubinstein RY (2002). Estimation of rare event probabilities using cross-entropy. Proceedings of the Winter Simulation Conference, San Diego, CA, USA, 310-319.
  10. Juneja S and Shahabuddin P (2002). Simulating heavy tailed processes using delayed hazard rate twisting. ACM Transactions on Modeling and Computer Simulation (TOMACS), 12, 94-118.
  11. Juneja S and Shahabuddin P (2006). Rare-event simulation techniques: An introduction and recent advances. Handbooks in Operations Research and Management Science, 13, 291-350.
  12. Kroese DP and Rubinstein RY (2004). The transform likelihood ratio method for rare event simulation with heavy tails. Queueing Systems, 46, 317-351.
  13. Lieber D (1998). Rare-event estimation via cross-entropy and importance sampling (Doctoral dissertation), Technion-Israel Institute of Technology, Haifa.
  14. Nelsen RB (2006). An Introduction to Copulas, Springer, New York.
  15. Rached NB, Benkhelifa F, Alouini M-S, and Tempone R (2015). A fast simulation method for the log-normal sum distribution using a hazard rate twisting technique. Proceedings of 2015 IEEE International Conference on Communications (ICC), London, 4259-4264.
  16. Rached NB, Benkhelifa F, Kammoun A, Alouini M-S, and Tempone R (2018). On the generalization of the hazard rate twisting-based simulation approach. Statistics and Computing, 28, 61-75.
  17. Rubinstein RY (1999). The cross-entropy method for combinatorial and continuous optimization. Methodology and Computing in Applied Probability, 1, 127-190.
  18. Rubinstein RY (2002). Cross-entropy and rare events for maximal cut and partition problems. ACM Transactions on Modeling and Computer Simulation (TOMACS), 12, 27-53.
  19. Rubinstein RY and Kroese DP (2016). Simulation and the Monte Carlo Method, John Wiley & Sons, Hoboken, New Jersey.
  20. Rubinstein RY and Shapiro A (1993). Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method, Wiley, Chichester, England.
  21. Sadowsky JS (1993). On the optimality and stability of exponential twisting in Monte Carlo estimation. IEEE Transactions on Information Theory, 39, 119-128.
  22. Sadowsky JS and Bucklew JA (1990). On large deviations theory and asymptotically efficient Monte Carlo estimation. IEEE Transactions on Information Theory, 36, 579-588.
  23. Sak H and Hörmann W (2012). Fast simulations in credit risk. Quantitative Finance, 12, 1557-1569.