TEXT SIZE

CrossRef (0)
Principal component analysis for Hilbertian functional data

Dongwoo Kima, Young Kyung Lee1,b, Byeong U. Parka

aDepartment of Statistics, Seoul National University, Korea;
bDepartment of Information Statistics, Kangwon National University, Korea
Correspondence to: 1Department of Information Statistics, Kangwon National University, 1 Gangwondaehak-gil, Chuncheon-si, Gangwon-do 24341, Korea. E-mail: youngklee@kangwon.ac.kr
Received November 11, 2019; Revised December 3, 2019; Accepted December 3, 2019.
Abstract
In this paper we extend the functional principal component analysis for real-valued random functions to the case of Hilbert-space-valued functional random objects. For this, we introduce an autocovariance operator acting on the space of real-valued functions. We establish an eigendecomposition of the autocovariance operator and a Karuhnen-Lo`eve expansion. We propose the estimators of the eigenfunctions and the functional principal component scores, and investigate the rates of convergence of the estimators to their targets. We detail the implementation of the methodology for the cases of compositional vectors and density functions, and illustrate the method by analyzing time-varying population composition data. We also discuss an extension of the methodology to multivariate cases and develop the corresponding theory.
Keywords : principal component analysis, functional data, Hilbert space
1. Introduction

Principal component analysis (PCA) is a useful tool for identifying the dominant directions of variation of multivariate data. Functional PCA (FPCA) is its generalization to time-varying data, called functional data or stochastic processes. Functional data may take values in Euclidean spaces, Hilbert spaces, or more generally in metric spaces. Most of the works on functional data have been focused on the Euclidean case (Ramsay and Silverman, 2005; Chiou et al., 2014). These days we often encounter non-Euclidean data. There has been increasing interest in developing suitable statistical methodologies for analyzing non-Euclidean data. The Hilbertian case is central in this development. In this paper, we discuss PCA when the observed data are time-varying and take values in a general Hilbert space.

Recently, Lin and Yao (2019) proposed an approach to FPCA for Riemannian manifolds. Although they are mainly concerned with manifold-valued functional data, their approach is based on an eigenanalysis for Hilbertian functional data, so that one may use the procedure described there for Hilbertian stochastic processes. The eigenanalysis discussed in Lin and Yao (2019) gives eigenfunctions that live in the same space as the data objects. This means that, if one adopts their approach for Hilbertian functional data, then one would have eigenfunctions whose trajectories are time-varying Hilbertian values. In such cases one may face some difficulty in visualizing or interpreting the resulting eigenfunctions.

In this paper we take a different approach. Our approach gives real-valued eigenfunctions regardless of the underlying Hilbert space where the random functions take values. The associated functional principal component (FPC) scores lie in the Hilbert space. For example, consider the case where the Hilbert space under study is a simplex, . Let X be a random function defined on a time domain such that X(t) for each is a random element taking values in . In this case, our approach gives real-valued eigenfunctions which map to $R$, and FPC scores which live in . Our procedure specialized to this simplex example may be considered as a generalization of the recent work by Wang et al. (2015) who studied the PCA for multivariate compositional data. By introducing an autocovariance operator acting on the space of real-valued functions, we establish an eigendecomposition of the autocovariance operator and a Karuhnen-Loève expansion. We illustrate our proposal through a real data example. We also extend the methodology to multivariate Hilbertian functional data.

2. Methodology

Let $H$ be a separable Hilbert space equipped with an inner product ⟨· , · ⟩$H$ and the corresponding norm ∥·∥ $H$. We denote the zero vector by 0, the addition operation by ⊕ and the scalar multiplication by ⊙. There are numerous examples of Hilbert spaces. The most well-known examples are the spaces of square integrable real-valued functions defined on a set S$Rk$. In this case, 0 is a zero function, ab for a = a(·) and b = b(·) with a, b : S$R$ is defined by (ab)(z) = a(z) + b(z) and (c ⊙ a)(z) = c · a(z) for c$R$. It is well known that these are separable Hilbert spaces. In Section 3 we discuss two other examples. Throughout this paper, we let 

L 2 , for a Hilbert space ℋ with an inner product ⟨· , ·⟩, denote the space of square integrable ℋ-valued functions f : [0, 1] → ℋ, i.e., $L 2 ℋ = { f : ∫ 0 1 ‖ f ( t ) ‖ ℋ 2 d t < ∞ }$ where $‖ f ( t ) ‖ ℋ 2 = ⟨ f ( t ) , f ( t ) ⟩ ℋ$. The associated inner product $⟨ · , · ⟩ L 2 ℋ$ on $L 2 ℋ$ is defined by $⟨ f , g ⟩ L 2 ℋ = ∫ 0 1 ⟨ f ( t ) , g ( t ) ⟩ ℋ d t$ and its norm $‖ · ‖ L 2 ℋ$ by $‖ f ‖ L 2 ℋ 2 = ⟨ f , f ⟩ L 2 ℋ$.

### 2.1. Eigenanalysis for Hilbertian functional data

Let X be a continuous $H$-valued stochastic process on the unit time interval [0, 1]. For each t ∈ [0, 1], X(t) is a random element taking values in $H$. We assume $E ∫ 0 1 ‖ X ( t ) ‖ H 2 d t < ∞$. This implies that $E ‖ X ( t ) ‖ H 2 < ∞$ a.e. t ∈ [0, 1], . It also implies that $∫ 0 1 ‖ X ( t ) ‖ H 2 d t < ∞$ with probability one. For simplicity of presentation, we consider a centered X such that EX(t) = 0 for all t ∈ [0, 1].

We let C : [0, 1] × [0, 1] → $R$ be the bivariate function defined by

$C ( s , t ) = E ( ⟨ X ( s ) , X ( t ) ⟩ H ) .$

In the case where $H$ = $R$, the C reduces to the usual ‘autocovariance’ function C(s, t) = E (X(s)X(t)). We consider the integral operator $C : L 2 R → L 2 R$ defined by

$C ( f ) ( s ) = ∫ 0 1 C ( s , t ) f ( t ) d t , s ∈ [ 0 , 1 ] , f ∈ L 2 R .$

Then, is nonnegative definite. This follows since

$⟨ C ( f ) , f ⟩ L 2 R = ∫ 0 1 C ( f ) ( s ) · f ( s ) d s = E ‖ ∫ 0 1 f ( t ) ⊙ X ( t ) d t ‖ H 2 ≥ 0 ,$

for all $f ∈ L 2 R$, where the integral on the right hand side of the second equality is in the Bochner sense. Bochner integration is a notion of integration that generalizes the conventional Lebesgue integrals to Banach-space-valued maps, see Section 2 in Jeon and Park (2020) for details. The kernel C(· , ·) of the integral operator admits the following eigendecomposition due to Mercer’s Theorem:

$C ( s , t ) = ∑ j = 1 ∞ λ j ϕ j ( s ) ϕ j ( t ) ,$

where λ1λ2 ≥ · · · ≥ 0 are the eigenvalues and ${ ϕ j } l = 1 ∞$ are the associated eigenfunctions of that form a complete orthonormal system for the space $L 2 R$. The equality (2.2) is a standard result in functional data analysis, see Theorem 4.6.5 in Hsing and Eubank (2015), for example. The sum of all eigenvalues equals the total variation of X, i.e.,

$∑ j = 1 ∞ λ j = E ∫ 0 1 ‖ X ( t ) ‖ H 2 d t .$

Now, we derive a Karhunen-Loève expansion of X. We note that the standard result such as Theorem 7.2.7 in Hsing and Eubank (2015) do not apply here, since $ϕ j ∈ L 2 R$ and $X ( · , ω ) ∈ L 2 H$ belong to different spaces. Let ${ e l } l = 1 L$ be an orthonormal basis of $H$, where we allow L = ∞. For each l ≥ 1, define Zl(t) = ⟨X(t), el$H$. They are real-valued stochastic processes. Note that $Z l ( · ) ∈ L 2 R$ with probability one since

$∫ 0 1 ∣ ⟨ X ( t ) , e l ⟩ H ∣ 2 d t ≤ ∫ 0 1 ‖ X ( t ) ‖ H 2 d t$

and the integral on the right hand side of (2.3) exists with probability one. Since ${ ϕ j } j = 1 ∞$ is a complete orthonormal system for $L 2 R$, it holds that

$Z l ( t ) = ∑ j = 1 ∞ ( ∫ 0 1 ⟨ X ( s ) , e l ⟩ H · ϕ j ( s ) d s ) · ϕ j ( t ) .$

Also, since ${ e l } l = 1 L$ is an orthonormal basis of $H$, we get that

$X ( t ) = ⊕ l = 1 L Z l ( t ) ⊙ e l = ⊕ j = 1 ∞ ⊕ l = 1 L ( ∫ 0 1 ⟨ X ( s ) , e l ⟩ H · ϕ j ( s ) d s · ϕ j ( t ) ) ⊙ e l = ⊕ j = 1 ∞ ( ⊕ l = 1 L ⟨ ∫ 0 1 X ( s ) ⊙ ϕ j ( s ) d s , e l ⟩ H ⊙ e l ) ⊙ ϕ j ( t ) = ⊕ j = 1 ∞ ( ∫ 0 1 X ( s ) ⊙ ϕ j ( s ) d s ) ⊙ ϕ j ( t ) .$

Here, we have used (2.4) for the second equation. The integrals on the right hand sides of the third and fourth equations in (2.5) are Bochner integrals. Let $ξ j = ∫ 0 1 X ( s ) ⊙ ϕ j ( s ) d s$. These are in $H$ and plays the role of FPC in the conventional FPCA for real-valued random functions. We call ξj the FPC scores of X. The foregoing arguments establish the following theorem. Let ⊖ denote the subtraction operation in $H$ defined by ab = a ⊕ (−1) ⊙ b.

Theorem 1

Assume that X is a continuous $H$-valued stochastic process on [0, 1] and that $∫ 0 1 ‖ X ( t ) ‖ H 2 d t < ∞$. Then, it holds that

$lim N → ∞ sup t ∈ [ 0 , 1 ] E ‖ X ( t ) ⊖ ⊕ j = 1 N ξ j ⊙ ϕ j ( t ) ‖ H 2 = 0.$

Apparently, E ξj = 0. Any two FPC scores, ξj and ξk for j ≠ k, are uncorrelated in the sense that E(⟨ξj, ξk$H$) = 0. Their variances defined by $E ( ‖ ξ j ‖ H 2 )$ equal the corresponding eigenvalues λj. The latter two properties follow from

$E ( ⟨ ξ j , ξ k ⟩ H ) = ∫ [ 0 , 1 ] 2 C ( s , t ) ϕ j ( s ) ϕ k ( t ) d s d t = ∑ l ≥ 1 λ l ( ∫ 0 1 ϕ l ( s ) ϕ j ( s ) d s ) ( ∫ 0 1 ϕ l ( t ) ϕ k ( t ) d t ) = λ j · I ( j = k ) .$
Remark 1

We are aware of two recent pioneering works on PCA for Riemannian functional data, which are Dai and Müller (2018) and Lin and Yao (2019). Both deal with functional data taking values in Riemannian manifolds. The basic idea of the first is to embed the manifold under consideration in an Euclidean ambient space, perform an eigenanalysis in the ambient space and then transform the results back to the manifold via the ‘exponential map’. Thus, the actual eigenanalysis is done on a Euclidean space. The latter approach is to transform Riemannian stochastic process to a random element taking values in a tensor Hilbert space and then perform an eigenanalysis in the latter Hilbert space. However, the eigenanalysis in the Hilbert space is different from ours in that it produces eigenfunctions residing in the same space as the transformed stochastic process, while our approach gives eigenfunctions residing in $L 2 R$, not in $L 2 H$, regardless of the nature of $H$.

### 2.2. Estimation of eigenfunctions and functional principal component scores

Let Xi, 1 ≤ in, be independent copies of X that we actually observe. In this subsection, we describe a method of estimating ϕj and ξj that we introduced in the previous subsection. Without loss of generality we assume that the sample average $n - 1 ⊕ i = 1 n X i ( t ) = 0$. First, we estimate C(s, t) by

$C ^ ( s , t ) = n - 1 ∑ i = 1 n ⟨ X i ( s ) , X i ( t ) ⟩ H .$

Define $C ^ : L 2 R → L 2 R$, the estimator of the integral operator , by

$C ^ ( f ) ( s ) = ∫ 0 1 C ^ ( s , t ) f ( t ) d t , s ∈ [ 0 , 1 ] , f ∈ L 2 R .$

Then, it is also nonnegative definite since

$⟨ C ^ ( f ) , f ⟩ L 2 R = n - 1 ∑ i = 1 n ‖ ∫ 0 1 f ( t ) ⊙ X i ( t ) d t ‖ H 2 ≥ 0 ,$

for all $f ∈ L 2 R$. Thus, due to Mercer’s Theorem again it holds that there exists λ̂1λ̂2 ≥ ·· · ≥ 0 and a complete orthonormal basis ${ ϕ ^ j } j = 1 ∞$ of $L 2 R$ such that

$C ^ ( s , t ) = ∑ j = 1 ∞ λ ^ j ϕ ^ j ( s ) ϕ ^ j ( t ) .$

The $λ̂j$ are the estimators of the respective λj and $ϕ̂j$ are the estimators of the respective ϕj. Let $ξ i j = ∫ 0 1 X i ( s ) ⊙ ϕ j ( s ) d s$ be the true FPC scores of Xi. We estimate ξi j by

$ξ ^ i j = ∫ 0 1 X i ( s ) ⊙ ϕ ^ j ( s ) d s , j ≥ 1.$

The empirical FPC scores ($ξ̂ij:1≤i≤n$) and ($ξ̂ik:1≤i≤n$) for j ≠ k are also uncorrelated in the sense that

$n - 1 ∑ i = 1 n ⟨ ξ ^ i j , ξ ^ i k ⟩ H = 0.$

Also, we have $λ ^ j = n - 1 ∑ i = 1 n ‖ ξ ^ i j ‖ H 2$ for 1 ≤ jd. Furthermore, using the standard perturbation theory in Bosq (2000) or in Hsing and Eubank (2015), for example, we may show the following theorem.

Theorem 2

Assume that $E ∫ 0 1 ‖ X ( t ) ‖ H 4 d t < ∞$, and that min1≤lJ (λlλl+1) > 0, where J ≥ 1 is fixed. Then, for all 1 ≤ jJ, it holds that $∫ 0 1 ∣ ϕ ^ j ( t ) - ϕ j ( t ) ∣ 2 d t = O p ( n - 1 )$. Furthermore, if we assume additionally that $E ∫ 0 1 ‖ X ( t ) ‖ H 2 β d t < ∞$ for some β ≥ 2, then $max1≤i≤n∥ξ̂ij⊖ξij∥H=Op(n-(β-1)/2β)$) for all 1 ≤ jJ.

In the above theorem, J is fixed. We may let J depend on the sample size n and grow as n → ∞. In the latter case, the rates of convergence of $ϕ̂j$ and $ξ̂ij$ to their targets depend on how fast δJ := min1≤lJ(λlλl+1) tends to zero as J →∞.

3. Two special cases

Here, we consider two Hilbert spaces and give some more details for the implementation of the methodology presented in the previous section. One is the space of compositional vectors and the other is the space of density functions. The vector operations for these spaces are unconventional. We use isometric isomorphisms that map the respective Hilbert spaces to $Rk$ and $L 2 R$ where the conventional vector operations apply. We note that, for the space of square integrable real-valued functions on $Rd$, which is another important class of $H$, the vector operations are defined by (f (·) ⊕ g(·))(u) = f (u) + g(u) and (c ⊙ f (·))(u) = c · f (u).

### 3.1. Space of compositional vectors

Suppose that $H$ is the standard (k – 1)-simplex, $S k - 1 = { ( v 1 , … , v k ) ∈ ( 0 , 1 ) k : ∑ j = 1 k v j = 1 }$. For and c$R$, define vector addition, scalar multiplication and inner product by

$v ⊕ w = ( v 1 w 1 v 1 w 1 + ⋯ + v k w k , … , v k w k v 1 w 1 + ⋯ + v k w k ) , c ⊙ v = ( v 1 c v 1 c + ⋯ + v k c , … , v k c v 1 c + ⋯ + v k c ) , ⟨ v , w ⟩ S k - 1 = 1 2 k ∑ j = 1 k ∑ l = 1 k log ( v j v l ) log ( w j w l ) .$

With 0 = (1/k, . . . , 1/k) as the zero vector, is a separable Hilbert space with the operation structure at (3.1), known as Aitchison geometry (Aitchison, 1986). The map defined by ℒ(u) = x with $x j = log u j - k - 1 ∑ l = 1 k log u l$, also known as the ‘center log ratio’ transform, is an isometry such that , where ⟨· , ·⟩E denotes the Euclidean inner product, i.e., ⟨a, bE = ab. The transformation ℒ is also an isomorphism. The addition vw and the scalar multiplication c ⊙ v in may be performed by the Euclidean operation with ℒ(v) and ℒ(w) and then transforming the results back to the corresponding compositional vectors. That is,

$v ⊕ w = L - 1 ( L ( v ) + L ( w ) ) , c ⊙ v = L - 1 ( c · L ( v ) ) .$

As we have seen in the previous section, the FPCA with Hilbertian functional data boils down to the computation of the covariance kernel $C^$ at (2.6). In the case of compositional functional data Xi such that for all t ∈ [0, 1], it is nothing else than

$C ^ ( s , t ) = n - 1 ∑ i = 1 n L ( X i ( s ) ) ⊤ L ( X i ( t ) ) .$

The eigenfunctions $ϕ̂j$ of the kernel $C^$ at (3.2) are then readily obtained from the existing method of eigenanalysis for real-valued functional data.

We discuss the computation of the Bochner integrals at (2.7) for $ξ̂ij$. Suppose that we want to compute a Bochner integral $∫ 0 1 f ( t ) d t$ for a Bochner integrable map $f ∈ L 2 S k - 1$. Then, there exists a sequence of simple maps such that $∫ 0 1 ‖ f n ( t ) ⊖ f ( t ) ‖ S k - 1 d t → 0$ as n → ∞. Since the center log ratio transform ℒ is an isometry, ℒ(f) is Lebesgue integrable. Furthermore, ∫0 ∥ℒ(fn(t)) – ℒ(f(t))∥E dt → 0, where ∥ · ∥E denotes the Euclidean norm in $Rk$. Now, note that the simple maps fn take the form $f n ( t ) = ⊕ l = 1 N 1 S l , n ( t ) ⊙ h l , n$, where Sl,n ⊂ [0, 1] satisfy Sl,nSl′,n = ∅︀ for l ≠ l′ and $⊕ l = 1 N S l , n = [ 0 , 1 ]$, and hl,n are constant compositional vectors. For such simple maps, it holds that

$∫ 0 1 f n ( t ) d t = ⊕ l = 1 N μ ( S l , n ) ⊙ h l , n = L - 1 L ( ⊕ l = 1 N μ ( S l , n ) ⊙ h l , n ) = L - 1 ( ∑ l = 1 N μ ( S l , n ) · L ( h l , n ) ) = L - 1 ( ∫ 0 1 L ( f n ( t ) ) d t )$

since ℒ is injective, where the last integral is in Lebesgue sense. Since ℒ and its inverse are continuous, we get from (3.3) and the convergence $∫ 0 1 ‖ f n ( t ) ⊖ f ( t ) ‖ S k - 1 d t → 0$ that

$∫ 0 1 f ( t ) d t = lim n → ∞ L - 1 ( ∫ 0 1 L ( f n ( t ) ) d t ) = L - 1 ( ∫ 0 1 L ( f ( t ) ) d t ) ,$

where the second and third integrals are in Lebesgue sense. This means that we may evaluate $ξ̂ij$ by the Lebesgue integration of ℒ(Xi(t) ⊙ $ϕ̂j$(t)) as follows.

$ξ ^ i j = L - 1 ( ∫ 0 1 L ( X i ( t ) ⊙ ϕ ^ j ( t ) ) d t ) , j ≥ 1.$

### 3.2. Space of density functions

For simplicity we consider the space of density functions supported on [0, 1]. An extension to the case of a general support on $Rd$ with finite Lebesgue measure is immediate. This space is actually not a Hilbert space. We consider an equivalence relation such that f and g are equivalent if and only if f is a constant multiple of g. Let [ f ] denote the class of real-valued functions g such that g = c · f for some constant c > 0. Let

$F = { [ f ] : ∫ 0 1 ( log f ( x ) ) 2 d x < ∞ } .$

For this space, we define vector addition, scalar multiplication and inner product as follows.

$[ f ] ⊕ [ g ] = [ f · g ] , c ⊙ [ f ] = [ f c ] , ⟨ [ f ] , [ g ] ⟩ F = ∫ [ 0 , 1 ] 2 log ( f ( x ) f ( y ) ) log ( g ( x ) g ( y ) ) d x d y .$

With 0 = [1] = $R$ as the zero vector, is a separable Hilbert space with the operation structure at (3.5), see van den Boogaart et al. (2014).

We define the map $L : F → L 2 R$ by

$L ( [ f ] ) ( x ) = log f ( x ) - ∫ 0 1 log f ( u ) d u .$

Then, ℒ is an isometry preserving the metrics on the two spaces:

$⟨ [ f ] , [ g ] ⟩ F = ⟨ L ( [ f ] ) , L ( [ g ] ) ⟩ L 2 R = ∫ 0 1 ( log f ( x ) - ∫ 0 1 log f ( u ) d u ) ( log g ( x ) - ∫ 0 1 log g ( u ) d u ) d x .$

Here we recall that $⟨ · , · ⟩ L 2 R$ is the usual inner product on $L 2 R$. We note that ℒ is scale-invariance, so that we may simply write ℒ( f ) instead of ℒ([ f ]). The transformation ℒ is also an isomorphism. As in the case of compositional vectors, the addition and scalar multiplication in ℱ may be performed by doing the corresponding operation on $L 2 R$ and then transforming the results back to the density space. That is,

$[ f ] ⊕ [ g ] = L - 1 ( L ( f ) + L ( g ) ) , c ⊙ [ f ] = L - 1 ( c · L ( f ) ) .$

The covariance kernel $C^$ at (2.6) in this case is given by

$C ^ ( s , t ) = n - 1 ∑ i = 1 n ⟨ L ( X i ( s ) ) , L ( X i ( t ) ) ⟩ L 2 R .$

Here, it is worthwhile to note that Xi(s) for a given s ∈ [0, 1] is a random density, i.e., if we write it as Xi(· , s), it is a random element such that Xi(u, s) ≥ 0 and $∫ 0 1 X i ( u , s ) d u = 1$

with probability one. For the kernel $C^$ at (3.7), we may also obtain its eigenfunctions $ϕ̂j$ from the existing method of eigenanalysis for real-valued functional data.

Since ℒ as defined at (3.6) is an isometry, injective, continuous and has continuous inverse, we may prove (3.4) as well for the space of density functions. In practical implementation, we may discretize densities on a grid of [0, 1], which result in functional compositional vectors. Then, we are able to apply the procedure described in Section 3.1 to density functional data. Specifically, we may choose a grid {u1, . . . , uD} ⊂ [0, 1] and then apply (3.2) and (3.4) with

$X i ( t ) = ( X i ( u 1 , t ) ∑ j = 1 D X i ( u j , t ) , … , X i ( u D , t ) ∑ j = 1 D X i ( u D , t ) )$

as -valued random elements.

4. Real data example

Here, we illustrate the methodology we presented in Section 2. We analyzed changes in population composition by age as time passes, which is an important element in demographic analysis and a key to understanding the social dynamic. The dataset we analyzed came from the World Population Prospects 2019 (WPP, https://population.un.org/wpp/) offered by the United Nations (UN), available at https://population.un.org/wpp/Download/Files/1 Indicators%20(Standard)/CSV FILES/WPP. The WPP2019 contains numerous data, among which we took those on population by location, age group and year. The original dataset consists of population estimates from the year 1950 to 2020, and projections from 2020 to 2100. Of these years we took the period 1950–2018. The original age groups were 0–4, 5–9, . . . , 95–99, 100+. We aggregated them into three age groups, 0–19 (before working age), 20–64 (work force), 65+ (retired). Locations are given in several types of categories such as country, region, subregion, etc. We chose ‘country’ as the data unit and the number of countries was 201. Thus, we obtained compositional vectors $x i o ( t j ) = ( x i 1 o ( t j ) , x i 2 o ( t j ) , x i 3 o ( t j ) )$ for 1 ≤ i ≤ 201 and tj = 1950 + ( j – 1) for 1 ≤ j ≤ 69, where $x i 1 o , x i 2 o$, and $x i 3 o$ are the population proportions for the age groups, 0–19, 20–64 and 65+, respectively, for the ith country.

To transform the dataset further into a set of smooth compositional vectors over time, we pre-smoothed $x i o ( t j )$ along the time scale on [1950, 2018]. We applied the local linear smoothing technique to $L ( x i o ( t j ) )$, where ℒ is the center log transform introduced in Section 3.1, and then transform the results back to compositional functional vectors. Specifically, we computed

$x i ( t ) = L - 1 ( arg min a ∈ R 3 ∑ j = 1 69 K h ( t , t j ) ‖ L ( x i o ( t j ) ) - a - b ( t j - t ) ‖ E 2 ) , t ∈ [ 1950 , 2018 ] ,$

where Kh(t, tj) with the bandwidth h are kernel weights defined by Kh(t, tj) = h−1K((ttj)/h) for a symmetric nonnegative function K and ∥ · ∥E denotes the Euclidean norm on $R$3. We chose the Epanechnikov kernel K(u) = (3/4)(1 – u2)I(|u| ≤ 1) and used h = 2.0.

Our primary interest in this example was to see how well a few number of the FPC scores can explain the apparent change in population composition over time. For this we took dominant eigenfunctions based on the criterion called ‘fraction of variance explained (FVE)’. The FVE of the first J eigenfunctions is defined by $FVE J = ∑ j = 1 J λ ^ j / ∑ j ≥ 1 λ ^ j$. We chose eigenfunctions which altogether explained more than 90% of the variation of the sample compositional functional vectors {xi : 1 ≤ i ≤ 201}. It turned out that J = 2. In fact, the FVEs of the leading eigenfunctions were FVE1 = 0.899 and FVE2 = 0.976. Figure 1 depicts the first two eigenfunctions ϕ̂1 and ϕ̂2. They represent the principal deviations from the average population composition over time. The first eigenfunction has an increasing trend. This indicates that variability between population composition patterns amongst different countries is mostly explained by increased separation from the mean composition as time passed. This might have something to do with increased variability in recent years among 201 countries in birth rate and human mortality.

To illustrate how well the two leading FPC scores can reproduce the original data, we computed $ξ̂ij$ according to the formula (3.4) and reconstructed Xi as

$X ^ i ( t ) = ξ ^ i 1 ⊙ ϕ ^ 1 ( t ) ⊕ ξ ^ i 2 ⊙ ϕ ^ 2 ( t ) .$

Figure 2 compares the original Xi and the reconstructed i for Republic of Korea. We find that the reconstruction is quite successful, meaning that the first two FPC scores capture most of time-dynamic features in the population composition of the country. This is also the case with other countries, although we do not present them all here.

We performed a cluster analysis based on the two leading FPC scores. We formed three clusters based on the following distance metric between xi and xl for 1 ≤ i ≠ l < 201.

$d ( x i , x l ) = ‖ ξ ^ i 1 ⊖ ξ ^ i 1 ‖ S 2 2 + ‖ ξ ^ i 2 ⊖ ξ ^ l 2 ‖ S 2 2 = ‖ L ( ξ ^ i 1 ) - L ( ξ ^ l 1 ) ‖ E 2 + ‖ L ( ξ ^ i 2 ) - L ( ξ ^ l 2 ) ‖ E 2 .$

We employed hierarchical clustering based on complete linkage, which takes

$d ( C j , C k ) = max x ∈ C j , y ∈ C k d ( x , y ) .$

as the distance between two clusters and . Figure 3 depicts the average compositional vectors over time for each cluster. The figure demonstrates the central features of the clusters. The cluster consists of countries where population composition does not change much over time. The proportion of the retired ages in the cluster is almost constant over time. Those countries in the clusters and experienced some level of decrease in the proportion of the young ages since near 1970, but the proportion in increased until 1970 while in they stayed constant until then. Also, both and saw some level of increase in the proportion of the retired ages, but started from a relatively larger elderly proportion and underwent more rapid increase than .

We also compared our clustering result with the UN development groups obtained from the WPP2019. We took the three groups from the WPP2019: countries in more developed regions, least developed countries, and other less developed countries. More developed regions comprise Europe, Northern America, Australia/New Zealand and Japan. Less developed regions comprise all regions of Africa, Asia (except Japan), Latin America and the Caribbean plus Melanesia, Micronesia and Polynesia. Countries in less developed regions are divided further into two groups, one is the group of 47 least developed countries and the other consists of countries in less developed regions excluding the least developed ones. The criterion for least developed countries can be found at http://unohrlls.org/about-ldcs/criteria-for-ldcs/. Table 1 is a contingency table comparing the two clusterings. We find that the clusters based on the compositional FPC are very close to the UN development groups.

5. Multivariate extension

In this section, we discuss a multivariate extension of the methodology presented in Section 2. Suppose that we now have X = (X(1), . . . ,X(d)) where each X(j) is a $H$-valued stochastic process with $E ∫ 0 1 ‖ X ( j ) ( t ) ‖ H 2 d t < ∞$. Thus, X(t) for each t ∈ [0, 1] takes values in $Hd$. Without loss of generality we also assume E X(t) = 0d for all t ∈ [0, 1], where 0d is the zero vector in $Hd$.

To accommodate the dependency between X(j) and X(k) for j ≠ k, we consider the following d × d matrix of covariance functions:

$C ( s , t ) = ( C j k ( s , t ) ) , C j k ( s , t ) = E ( ⟨ X ( j ) ( s ) , X ( k ) ( t ) ⟩ H ) .$

Define the integral operator $C ⊗ : L 2 R d → L 2 R d$ such that for $f ∈ L 2 R d$

$C ⊗ ( f ) ( s ) = ∫ 0 1 C ( s , t ) f ( t ) d t , s ∈ [ 0 , 1 ] .$

We endow $L 2 R d$ with the metric $⟨ f , g ⟩ L 2 R d = ∑ j = 1 d ⟨ f j , g j ⟩ L 2 R$. We may prove that for all $f ∈ L 2 R d$

$⟨ C ⊗ ( f ) , f ⟩ L 2 R d = E ‖ ⊕ j = 1 d ∫ 0 1 f j ( t ) ⊙ X ( j ) ( t ) d t ‖ H 2 ≥ 0.$

Thus, the integral operator is nonnegative definite. Also, its kernel C is symmetric, i.e., C(s, t) = C(t, s) for all s, t ∈ [0, 1]. Thus, due to Mercer’s Theorem,

$C ( s , t ) = ∑ j = 1 ∞ λ j ϕ j ( s ) ϕ j ( t ) ⊤ ,$

where λ1λ2 ≥ · · · ≥ 0 are the eigenvalues and ${ ϕ j } l = 1 ∞$ are the associated eigenfunctions of which form a complete orthonormal system for the space $L 2 R d$.

Let ϕjk denote the kth entry of ϕj and ${ e l } l = 1 L$ be an orthonormal basis of $H$. Consider the d-variate real-valued random functions $Zl(t)=(⟨X(1)(t),el⟩H,...,⟨X(d)(t),el⟩H)⊤$ for 1 ≤ lL. Then, we may write

$Z l ( t ) = ∑ j = 1 ∞ ⟨ Z l , ϕ j ⟩ L 2 R d · ϕ j ( t ) = ∑ j = 1 ∞ ( ∑ k = 1 d ⟨ ∫ 0 1 X ( k ) ( s ) ⊙ ϕ j k ( s ) d s , e l ⟩ H ) · ϕ j ( t ) .$

Write $Zlk(t)=⟨X(k)(t),el⟩H$ for the kth entry of Zl(t). Plugging the expression at (5.1) into the right hand side of the equality $X ( k ) ( t ) = ⊕ l = 1 L Z l k ( t ) ⊙ e l$, we get the following Karhunen-Loève expansion of X:

$X ( t ) = ⊕ j = 1 ∞ ( ⊕ k = 1 d ∫ 0 1 X ( k ) ( s ) ⊙ ϕ j k ( s ) d s ) ⊙ ϕ j ( t ) .$

The corresponding FPC scores in the above eigenanalysis are

$ξ j = ⊕ k = 1 d ∫ 0 1 X ( k ) ( s ) ⊙ ϕ j k ( s ) d s , j ≥ 1.$

The FPC scores reside in $H$, not in $Hd$, and the eigenfunctions ϕj lie in $L 2 R d$. As in the case of d = 1 we treated in Section 2, ξj are uncorrelated in the sense that $E(⟨ξj,ξk⟩H)=0$ for j ≠ k and $E ‖ ξ j ‖ H 2 = λ j$. Here, we have used the fact that

$‖ ϕ j ‖ L 2 R d 2 = ∑ l = 1 d ∫ 0 1 ϕ j l ( t ) 2 d t = 1 , j ≥ 1 , ⟨ ϕ j , ϕ k ⟩ L 2 R d 2 = ∑ l = 1 d ∫ 0 1 ϕ j l ( t ) ϕ k l ( t ) d t = 0 , j ≠ k ≥ 1.$

The following theorem is a multivariate version of Theorem 1. In the theorem, h ⊙ c for h$H$ and c = (c1, . . . , cd)$Rd$ means (c1h, . . . , cdh)$Hd$.

Theorem 3

Assume $E ∫ 0 1 ‖ X ( j ) ( t ) ‖ H 2 d t < ∞$ for all 1 ≤ jd. Then, it holds that $X ( t ) = ⊕ j = 1 ∞ ξ j ⊙ ϕ j ( t )$ with probability one.

We may follow the procedure we described in Section 2.2 to estimate the eigenfunctions and the FPC scores. Let Xi, 1 ≤ in, be independent copies of X. Without loss of generality we assume that $n - 1 ⊕ i = 1 n X i ( t ) = 0$. Then, we may estimate C(s, t) by

$C ^ ( s , t ) = ( C ^ j k ( s , t ) ) , C ^ j k ( s , t ) = n - 1 ∑ i = 1 n ⟨ X i ( j ) ( s ) , X i ( k ) ( t ) ⟩ H .$

This would give the eigendecomposition $C ^ ( s , t ) = ∑ j = 1 ∞ λ ^ j ϕ ^ j ( s ) ϕ ^ j ( t )$. We take $λ̂j$ as the estimators of λj and $ϕ̂j$ as the estimators of the ϕj. We also estimate the FPC scores of Xi by

$ξ ^ i j = ⊕ k = 1 d ∫ 0 1 X i ( k ) ( s ) ⊙ ϕ ^ j k ( s ) d s , j ≥ 1 , 1 ≤ i ≤ n .$

Then, as in the case of d = 1, we have

$n - 1 ∑ i = 1 n ⟨ ξ ^ i j , ξ ^ i k ⟩ H = 0 , j ≠ k ≥ 1 , n - 1 ∑ i = 1 n ‖ ξ ^ i l j ‖ H 2 = λ ^ j , j ≥ 1.$

For these estimators of eigenfunctions and FPC scores, we may also obtain an analogue of Theorem 2.

Acknowledgements

Research of Young Kyung Lee was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2018R1A2B6001068). Research of Dongwoo Kim and Byeong U. Park was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (No. 2019R1A2C3007355).

Figures
Fig. 1. First two eigenfunctions for time-varying population composition.
Fig. 2. Pre-smoothed compositional vector over time (a) and its reconstruction based on two leading eigenfunctions (b), for Republic of Korea.
Fig. 3. Mean compositional vectors over time.
TABLES

### Table 1

Contingency table for the counts of countries

Least developed Other less developed More developed Total
Cluster 1 39 17 0 56
Cluster 2 7 80 2 89
Cluster 3 0 13 43 56

Total 46 110 45 201

References
1. Aitchison J (1986). The Statistical Analysis of Compositional Data, Monographs on Statistics and Applied Probability, Chapman & Hall, London.
2. Bosq D (2000). Linear Processes in Function Spaces: Theory and Applications, Springer, New York.
3. Chiou JM, Chen YT, and Yang YF (2014). Multivariate functional principal component analysis: a normalization approach, Statistica Sinica, 24, 1571-1596.
4. Dai X and Müller HG (2018). Principal component analysis for functional data on Riemannian manifolds and spheres, Annals of Statistics, 46, 3334-3361.
5. Hsing T and Eubank R (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators, Wiley, Singapore.
6. Jeon JM and Park BU (2020). Additive regression with Hilbertian responses, Annals of Statistics, in print.
7. Lin Z and Yao F (2019). Intrinsic Riemannian functional data analysis, Annals of Statistics, 47, 3533-3577.
8. Ramsay J and Silverman BW (2005). Functional Data Analysis, Springer, New York.
9. van den Boogaart KG, Egozcue JJ, and Pawlowsky-Glahn V (2014). Bayes Hilbert spaces, Australian and New Zealand Journal of Statistics, 56, 171-194.
10. Wang H, Shangguan L, Guan R, and Billard L (2015). Principal component analysis for compositional data vectors, Computational Statistics, 30, 1079-1096.