There is an emerging interest in brain functional connectivity (FC) based on functional magnetic resonance imaging (fMRI) in Alzheimer’s disease (AD) studies. The complex, high-dimensional structure of FC makes it challenging to explore the association between altered connectivity and AD susceptibility. We develop a pipeline that refines FC into suitable covariates for a penalized logistic regression model that classifies normal and AD-susceptible groups. Three quantification methods are proposed for FC refinement. One of them, dimension reduction based on common component analysis (CCA), is employed to address the limitations of the other two. We applied the proposed pipeline to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data and identified pathogenic FC biomarkers associated with AD susceptibility. The refined FC biomarkers were related to brain regions involved in cognition, stimulus processing, and sensorimotor skills. We also demonstrated that the model using CCA outperformed the others in terms of classification performance and goodness-of-fit.
Alzheimer’s disease (AD) is a neurodegenerative disease that affects the health of the elderly and places a huge burden on families and society. Its pathophysiological process is thought to begin many years before diagnosis (Morris, 2005). The preclinical phase of AD provides critical opportunities for early diagnosis, which could reduce healthcare costs for both patients and governments. If 80–100% of AD patients were diagnosed at an early stage, total cumulative savings in medical and long-term care costs would reach \$7 trillion to \$7.9 trillion (Alzheimer’s Association, 2019). Patients diagnosed early could also make legal and financial plans while they are still cognitively capable of making these critical decisions. Furthermore, early diagnosis could lessen patients’ anxiety about their cognitive and behavioral symptoms by making them aware of disease progression. Therefore, many studies have sought to detect AD at the mild cognitive impairment (MCI) stage (Davatzikos
This study helps establish promising biomarkers for classifying MCI patients and cognitively normal elderly individuals. Patients with MCI are significantly more likely to progress to probable AD than unimpaired individuals (Ganguli
Biomarkers from brain imaging methods such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) have been used in the study of AD over the past decade. For more details about these methods, refer to (Johnson
Resting-state fMRI (rs-fMRI) captures baseline fluctuations of the BOLD signal, from which resting-state FC is estimated. Alterations in resting-state FC have been reported across the spectrum from AD (Wang
We focus on resting-state FC-based biomarkers associated with the classification of MCI and cognitively normal subjects. Logistic regression is used for the classification because, unlike many other machine learning methods, it provides a straightforward interpretation of the coefficients. The most common way of estimating FC is to compute Pearson’s correlation coefficient between the BOLD signals of two brain regions. In this study, the brain is segmented into 116 regions of interest (ROIs) by anatomical parcellation with the automated anatomical labeling (AAL) template (Tzourio-Mazoyer
Consequently, to use FC as covariates in a statistical model, we consider the methods mentioned above: (1) half-vectorization of the Pearson’s correlation matrices, (2) graph-theory-based descriptive measures, and (3) half-vectorization of the correlation matrices after dimension reduction by CCA. As a result, we obtain three different datasets from the same rs-fMRI data. However, all three FC datasets are still high-dimensional, which makes them unsuitable for a classic logistic regression model. Therefore, penalization with the elastic net penalty, which combines the Lasso and ridge penalties, is employed to address the high dimensionality and multicollinearity. We used leave-one-out cross-validation (LOOCV) to estimate the area under the curve (AUC) and select appropriate tuning parameters for the elastic net penalty. The AUC and deviance of each selected optimal model were also used to assess the classification performance and goodness-of-fit for the three datasets.
The goals of this paper are: (1) to establish a pipeline that incorporates functional connectivity as covariates in a logistic regression model, (2) to compare the performance of models using the three datasets, and (3) to investigate the effects of the FC-based biomarkers. The results could provide insight into how FC relates to the classification of MCI versus NC. The rest of this paper is organized as follows. Section 2 describes the ADNI rs-fMRI data and the data-preprocessing procedures. Section 3 discusses three quantification methods for FC: two traditional treatments of FC’s high dimensionality and CCA, which has not received adequate attention. In Section 4, we propose a pipeline for modeling a binary response with FC covariates using CCA. The section briefly explains penalized logistic regression, the core framework of the pipeline, and presents the modeling pipeline graphically. Section 5 compares the performance of the three models and presents findings on the biomarkers highlighted by the regression models. Finally, Section 6 ties these results together and concludes the paper. Also, since there are many technical terms, we list their abbreviations in
This paper is motivated by the ADNI database. The ADNI was launched in 2003 by the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration, the National Institute on Aging, private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public-private partnership (Martínez-Murcia
Many demographic factors are known to be associated with the progression of AD, and prior studies predicting conversion from MCI to AD have used them as covariates, including Age, Education length, ADAS-cog score, Gender, and APOE-
Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) (Mohs, 1983) is a clinical and cognitive assessment score from the ADNI dataset that is potentially useful for predicting MCI-to-AD conversion. A higher ADAS score means a greater degree of cognitive impairment and a higher probability of being MCI. Also, the apolipoprotein E (APOE)
Table 1 summarizes participant demographics. A total of 53 NC (24 males/29 females) and 67 MCI subjects (36 males/31 females) were obtained from the ADNI dataset. The table presents means and standard errors for continuous variables such as Age, Education length, and ADAS-cog score; Age and Education length are measured in years. For example, the mean Age of the MCI and NC groups was 71.531 and 72.955 years, with standard errors of 0.872 and 0.825, respectively. The count and percentage of each category are presented for categorical variables such as Gender and APOE-
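The APOE row of Table 1 can be checked against its own category counts. The sketch below assumes that the 0/1/2 categories are the number of APOE-4 alleles per subject and that each entry is the sample mean ± standard error (sample standard deviation with an n − 1 denominator, divided by √n). Under that reading it reproduces the reported values; `mean_and_se` is a hypothetical helper name.

```python
import math


def mean_and_se(counts):
    """Sample mean and standard error (sample SD / sqrt(n)) from a {value: count} mapping."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Sample variance with the n - 1 denominator.
    var = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
    return mean, math.sqrt(var / n)


# Assumed: number of APOE-4 alleles (0, 1, 2) per subject, counts taken from Table 1.
apoe_counts = {
    "Total": {0: 68, 1: 42, 2: 10},
    "NC": {0: 34, 1: 18, 2: 1},
    "MCI": {0: 34, 1: 24, 2: 9},
}
for group, counts in apoe_counts.items():
    m, se = mean_and_se(counts)
    print(f"{group}: {m:.3f} ± {se:.3f}")
# Total: 0.517 ± 0.059
# NC: 0.377 ± 0.072
# MCI: 0.627 ± 0.087
```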
The rs-fMRI data, acquired on a 3.0 Tesla Philips Medical Systems scanner during task-free scans, were downloaded in the original Digital Imaging and Communications in Medicine (DICOM) format from the ADNI website. The scanning protocol for the rs-fMRI of all subjects was as follows: flip angle = 80.0 degrees; manufacturer model = Intera; echo time (TE) = 30.001 ms; repetition time (TR) = 3000.0 ms; pixel spacing = 3.3125 × 3.3125; slice thickness = 3.313; slices = 6720.0; matrix size = 64 × 64; pulse sequence = GR; number of anatomical volumes = 140. Detailed acquisition parameters are available on the ADNI website (http://www.adni-info.org/).
For the rs-fMRI data preprocessing, we used SPM8 (https://www.fil.ion.ucl.ac.uk/spm/). The steps were: (1) manually removing the first 10 volumes of each functional time series to ensure magnetization equilibrium; (2) correcting slice acquisition timing for each volume, followed by head-motion correction (i.e., realignment) with a rigid-body transformation; (3) scaling the intensity of each motion-corrected fMRI scan to a whole-brain mean of 10000; (4) temporal band-pass filtering (0.01–0.08 Hz) to remove very low-frequency drift and high-frequency noise; (5) regressing out nuisance signals, including the mean white-matter signal, the mean cerebrospinal-fluid signal, the global signal averaged over the whole brain, and six motion parameters; and (6) nonlinear normalization to Montreal Neurological Institute space and spatial smoothing with a Gaussian kernel of 6 mm full width at half maximum.
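Steps (4) and (5) can be sketched in Python as follows. This is an illustration with SciPy, not the SPM8 implementation: the zero-phase Butterworth design, its order (`order=2`), and plain least-squares nuisance regression are our assumptions, and `bandpass`/`regress_out` are hypothetical names. Only the 0.01–0.08 Hz band, TR = 3 s, and the nine nuisance regressors (white matter, cerebrospinal fluid, global signal, six motion parameters) come from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

TR = 3.0  # repetition time in seconds (TR = 3000.0 ms in the protocol above)


def bandpass(ts, low=0.01, high=0.08, tr=TR, order=2):
    """Zero-phase Butterworth band-pass filter along the time axis (axis 0).

    Filter type and order are illustrative assumptions, not SPM8 settings.
    """
    b, a = butter(order, [low, high], btype="bandpass", fs=1.0 / tr)
    return filtfilt(b, a, ts, axis=0)


def regress_out(ts, confounds):
    """Residuals of ts (T x V) after least-squares regression on confounds (T x K) plus an intercept."""
    design = np.column_stack([np.ones(len(ts)), confounds])
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ beta


# Synthetic check: 130 volumes (140 acquired, first 10 discarded).
t = np.arange(130) * TR
slow = np.sin(2 * np.pi * 0.05 * t)  # inside the 0.01-0.08 Hz pass band
fast = np.sin(2 * np.pi * 0.15 * t)  # above the pass band
print(np.std(bandpass(slow)) / np.std(slow))  # pass band: largely retained
print(np.std(bandpass(fast)) / np.std(fast))  # stop band: strongly attenuated

rng = np.random.default_rng(1)
confounds = rng.normal(size=(130, 9))  # stand-in for WM, CSF, global, and 6 motion signals
voxels = rng.normal(size=(130, 5)) + confounds @ rng.normal(size=(9, 5))
cleaned = regress_out(voxels, confounds)
```

With TR = 3 s the Nyquist frequency is 1/6 ≈ 0.167 Hz, so the 0.08 Hz upper edge lies well inside the representable range.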
The preprocessed BOLD time-series signals of all voxels were partitioned into 116 ROIs using the AAL template atlas (Tzourio-Mazoyer
where
From a graph-theoretic perspective, FC can be treated as a weighted graph: each ROI (i.e., brain region) corresponds to a node, and each edge characterizes the pairwise FC between two ROIs.
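As a sketch of this graph view, assuming ROI-averaged time series are already available, the code below builds the Pearson FC matrix, treats it as a weighted graph without self-loops, and computes two simple node-level measures. Node strength and thresholded degree are illustrative choices, and `threshold=0.3` is a hypothetical value; the specific descriptive measures used for METRIC are not listed in this excerpt.

```python
import numpy as np


def fc_matrix(roi_ts):
    """Pearson correlation between ROI time series; roi_ts has shape (T, R)."""
    return np.corrcoef(roi_ts, rowvar=False)


def weighted_graph(fc):
    """Weighted adjacency matrix: nodes are ROIs, edge weights are pairwise FC, no self-loops."""
    w = fc.copy()
    np.fill_diagonal(w, 0.0)
    return w


def node_strength(w):
    """Sum of absolute edge weights at each node."""
    return np.abs(w).sum(axis=1)


def node_degree(w, threshold):
    """Number of edges with |weight| >= threshold at each node (binarized graph)."""
    return (np.abs(w) >= threshold).sum(axis=1)


rng = np.random.default_rng(0)
roi_ts = rng.normal(size=(130, 116))  # synthetic stand-in for 116 AAL ROI time series
w = weighted_graph(fc_matrix(roi_ts))
print(node_strength(w)[:3], node_degree(w, threshold=0.3)[:3])
```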
Another recently proposed approach is the
where
The unknown parameter matrices
where
where the eigenmap
where
Let
Calculate
Compute the
Set
Repeat the above iterations until convergence.
Calculate
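The iteration above can be sketched as follows. Because the equations in this excerpt are incomplete, the sketch assumes that CCA seeks a common orthonormal basis V (p × k) minimizing Σ_i ‖Σ_i − V Λ_i Vᵀ‖²_F over subjects. For fixed V the minimizing Λ_i is Vᵀ Σ_i V, so the problem reduces to maximizing Σ_i ‖Vᵀ Σ_i V‖²_F, and each step replaces V with the top-k eigenvectors of G = Σ_i Σ_i V Vᵀ Σ_i. The initialization (top-k eigenvectors of the average matrix), the stopping rule, and the function names are our assumptions.

```python
import numpy as np


def top_k_eigvecs(m, k):
    """Eigenvectors of a symmetric matrix for its k largest eigenvalues."""
    vals, vecs = np.linalg.eigh(m)  # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:k]]


def common_components(sigmas, k, max_iter=100, tol=1e-10):
    """Common orthonormal basis V (p x k) and per-subject Lambda_i = V^T Sigma_i V.

    Assumed objective: minimize sum_i ||Sigma_i - V Lambda_i V^T||_F^2 with V^T V = I,
    equivalently maximize sum_i ||V^T Sigma_i V||_F^2.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    v = top_k_eigvecs(sigmas.mean(axis=0), k)  # assumed initialization
    prev = -np.inf
    for _ in range(max_iter):
        g = sum(s @ v @ v.T @ s for s in sigmas)  # G = sum_i Sigma_i V V^T Sigma_i
        v = top_k_eigvecs(g, k)
        obj = sum(np.linalg.norm(v.T @ s @ v, "fro") ** 2 for s in sigmas)
        if obj - prev <= tol * max(1.0, abs(obj)):  # stop once the objective stops improving
            break
        prev = obj
    lambdas = np.array([v.T @ s @ v for s in sigmas])  # Lambda_i = V^T Sigma_i V
    return v, lambdas


# Synthetic check: matrices that share an exact rank-k common basis q.
rng = np.random.default_rng(0)
p, k, n = 20, 4, 10
q, _ = np.linalg.qr(rng.normal(size=(p, k)))
sigmas = [q @ np.diag(rng.uniform(0.5, 2.0, k)) @ q.T for _ in range(n)]
v, lambdas = common_components(sigmas, k)
print(lambdas.shape)  # (10, 4, 4)
```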
We half-vectorize Λ
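Half-vectorization can be sketched as stacking the upper-triangular entries of each subject's symmetric matrix into one row of a design matrix. For a 116 × 116 correlation matrix (LON), the unit diagonal is constant, so dropping it leaves 116 × 115 / 2 = 6670 features. Whether the diagonal of Λ_i is kept for DFC is not stated in this excerpt, so `include_diagonal` is an assumption, as is the hypothetical size k = 14 in the usage lines.

```python
import numpy as np


def half_vectorize(m, include_diagonal=True):
    """Upper-triangular entries of a symmetric matrix, stacked row by row."""
    rows, cols = np.triu_indices(m.shape[0], k=0 if include_diagonal else 1)
    return m[rows, cols]


def covariate_matrix(mats, include_diagonal=True):
    """Design matrix with one row per subject (one half-vectorized matrix per row)."""
    return np.vstack([half_vectorize(m, include_diagonal) for m in mats])


# LON-style: a 116 x 116 correlation matrix without its unit diagonal.
print(half_vectorize(np.eye(116), include_diagonal=False).size)  # 6670
# DFC-style: a hypothetical 14 x 14 Lambda_i keeping the diagonal.
print(half_vectorize(np.eye(14)).size)  # 105
```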
We propose a modeling pipeline for a binary response with FC covariates. Logistic regression is widely used for binary responses, but it cannot use FC data directly because of their complex matrix structure. The FC data are therefore refined into appropriate covariates by applying each of the three quantification methods, LON, METRIC, and DFC. Since the refined covariates are still high-dimensional, penalized logistic regression is employed as the core model of our modeling strategy. We summarize the whole procedure as a pipeline for FC analysis in Subsection 4.2.
Assume that we have
The design matrix
where
Penalized logistic regression subtracts a non-negative penalty term from the log-likelihood function and solves a constrained maximization problem for
where
For the model estimation, we used the R
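A minimal NumPy sketch of elastic-net penalized logistic regression follows. It uses our own proximal-gradient solver, not the R package used for the reported results, and assumes a common parameterization in which λ scales the penalty on the 1/n-scaled negative log-likelihood and α ∈ [0, 1] mixes the Lasso (α = 1) and ridge (α = 0) terms. The `penalized` mask lets demographic controls enter unpenalized; `fit_enet_logistic` and its arguments are hypothetical names.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * |z| (elementwise)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_enet_logistic(x, y, lam, alpha, penalized, n_iter=5000):
    """Elastic-net penalized logistic regression by proximal gradient descent.

    Minimizes  -(1/n) * loglik(b0, b) + lam * sum_j w_j * ((1 - alpha) / 2 * b_j**2 + alpha * |b_j|),
    with w_j = 1 for penalized columns, w_j = 0 for unpenalized ones, and an unpenalized intercept b0.
    """
    n, p = x.shape
    xt = np.column_stack([np.ones(n), x])  # prepend intercept column
    w = np.concatenate([[0.0], np.asarray(penalized, dtype=float)])
    # Step 1/L, where L bounds the curvature of the smooth part (logistic loss + ridge term).
    step = 1.0 / (np.linalg.norm(xt, 2) ** 2 / (4.0 * n) + lam * (1.0 - alpha))
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(xt @ beta)))
        grad = xt.T @ (prob - y) / n + lam * (1.0 - alpha) * w * beta
        beta = soft_threshold(beta - step * grad, step * lam * alpha * w)  # Lasso part via prox
    return beta[0], beta[1:]


# Synthetic example: 5 unpenalized controls followed by 10 penalized FC features.
rng = np.random.default_rng(0)
n = 120
x = rng.normal(size=(n, 15))
true = np.zeros(15)
true[0], true[5] = 1.0, 2.0  # one control and one FC feature carry signal
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(x @ true)))).astype(float)
penalized = np.r_[np.zeros(5), np.ones(10)]
b0, b = fit_enet_logistic(x, y, lam=0.05, alpha=0.5, penalized=penalized)
print(np.round(b, 2))
```

A large λ drives every penalized coefficient to exactly zero while the unpenalized controls are still fitted, which mirrors how the controls are kept in every model regardless of tuning.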
Figure 1 shows the pipeline of the modeling framework. It describes the following steps:
(1) Quantification of the FC data: This step includes simple vectorization (LON), feature extraction based on graph theory (METRIC), and dimension reduction (DFC) of the FC data. Hence, three different sets of biomarkers are derived and denoted by
(2) Penalized logistic regression: After the first step, the number of variables may exceed the sample size. Therefore, we conduct penalized logistic regression using the elastic net penalty, where the refined biomarkers (
(3) Model assessment: Each model is assessed by LOOCV-AUC and the deviance test, which together indicate which quantification method performs better than the others in terms of classification performance and goodness-of-fit.
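The two assessment criteria can be sketched as follows, assuming (i) the LOOCV-AUC pools the n held-out predicted probabilities into a single ROC curve, and (ii) the deviance test compares the residual deviance with a χ² distribution on the residual degrees of freedom. Under assumption (ii), the sketch gives p-values of about 0.0414 and 0.2603 for the deviances reported for the LOOCV-tuned LON and DFC models in Table 2. `fit_predict` is a hypothetical callable standing in for any fitted classifier; tuning would pick the penalty parameters that maximize the pooled AUC.

```python
import numpy as np
from scipy.stats import chi2


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that a positive outscores a negative (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def loocv_scores(x, y, fit_predict):
    """Held-out score for each subject; fit_predict(x_train, y_train, x_test) is user-supplied."""
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # leave subject i out
        scores[i] = fit_predict(x[keep], y[keep], x[i:i + 1])[0]
    return scores


def deviance_gof_pvalue(deviance, df):
    """Upper-tail chi-square probability of the residual deviance (small values suggest lack of fit)."""
    return chi2.sf(deviance, df)


# Deviance and df of the LOOCV-tuned LON and DFC rows of Table 2.
print(round(deviance_gof_pvalue(140.36, 113), 4))
print(round(deviance_gof_pvalue(120.15, 111), 4))
```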
This pipeline can be applied to any classification problem in which the explanatory variables are symmetric matrices. It can also be easily adapted to generalized linear models.
We included Age, Education length, ADAS-cog score, Gender, and APOE-
The results of the current paper can be divided into three parts. Section 5.1 presents the results of the classification between NC and MCI. By computing AUC and deviance, we assess the classification performance and the goodness-of-fit for each FC quantification method. In Section 5.2, we explore selected biomarkers by penalized logistic regression. We will describe the selected biomarkers from
Figure 2 shows the receiver operating characteristic (ROC) curve for each model, and Table 2 summarizes the assessment results. The first three rows of the table show the results when the tuning parameters were selected by LOOCV. The next three rows show the performance of the models manually tuned to include three covariates, and the last three rows summarize the results for five covariates. The AUC of
The estimated coefficients are presented in Table 3. With
The model
For the model
The 2
The estimated odds of MCI are multiplied by exp(
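As a worked example of this interpretation, take the DFC column of Table 3, writing $\hat\beta_j$ for the estimated coefficient of covariate $x_j$ (our notation). Holding all other covariates fixed, a one-unit increase in $x_j$ changes the estimated odds of MCI by the factor

$$
\frac{\operatorname{odds}(x_j + 1)}{\operatorname{odds}(x_j)} = \exp(\hat\beta_j), \qquad
\exp(0.193) \approx 1.213 \ \ (\Lambda_{11,14}), \qquad
\exp(0.199) \approx 1.220 \ \ (\text{ADAS-cog, per point}).
$$

That is, each unit of $\Lambda_{11,14}$ raises the estimated odds of MCI by about 21%, and each additional ADAS-cog point by about 22%.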
In AD studies, demographic and clinical variables are widely considered critical factors. Since Age, APOE-4, Education length, Gender, and ADAS-cog score may be associated with AD, we added them to each model as unpenalized control variables. Table 3 shows that the estimated coefficients of these variables have the same signs in all three models, indicating the same direction of effect on the probability of being MCI. Hence, we only focused on the variables from
The regression coefficients of Gender (Male = 1, Female = 0), APOE-
In this work, we developed a pipeline for classification using FC. This pipeline can be applied to any classification problem in which the explanatory variables are symmetric matrices. Three different methods were used to quantify FC and construct classification models with FC biomarkers as covariates. Penalized logistic regression with the elastic net penalty was applied to MCI classification to tackle model estimation and FC biomarker selection simultaneously. We compared the performance of the three methods on the ADNI data: the AUCs were 69.22%, 75.81%, and 78.26% for LON, METRIC, and DFC, respectively. Therefore, in comparison with other alternatives, the
The ADNI dataset may include heterogeneous MCI subjects. Diagnosis of MCI followed the criteria formulated by the Mayo Alzheimer’s Disease Research Center (Hänninen
Longitudinal fMRI studies in patients with dementia face multiple challenges. Brain changes that distinguish MCI from NC produce very weak signals, making it difficult to determine whether a subject is cognitively normal or has MCI (Johnson
Automated anatomical labeling
Left anterior cingulate and paracingulate gyri
Alzheimer’s disease
Alzheimer’s Disease Assessment Scale-Cognitive Subscale
Alzheimer’s Disease Neuroimaging Initiative
Apolipoprotein E
Area under the ROC curve
Blood oxygen level-dependent
Left calcarine fissure and surrounding cortex
Common component analysis
Dimension-reduced Functional Connectivity
Digital Imaging and Communications in Medicine
Functional connectivity
Functional Magnetic Resonance Imaging
Low-Order functional Network
Leave-One-Out Cross-Validation
Mild cognitive impairment
Magnetic Resonance Imaging
Normal control
Positron emission tomography
Receiver operating characteristic
Region of interest
Resting-state fMRI
Right superior parietal gyrus
Demographic information
| Demographics | | Total | NC | MCI |
|---|---|---|---|---|
| Age | | 72.160 ± 0.609 | 72.955 ± 0.825 | 71.531 ± 0.872 |
| Education length | | 16.250 ± 0.234 | 16.585 ± 0.328 | 15.985 ± 0.328 |
| ADAS-cog score | | 7.542 ± 0.369 | 5.566 ± 0.362 | 9.104 ± 0.523 |
| Gender | Male | 60 (50.00%) | 24 (45.28%) | 36 (53.73%) |
| | Female | 60 (50.00%) | 29 (54.72%) | 31 (46.27%) |
| APOE- | Mean | 0.517 ± 0.059 | 0.377 ± 0.072 | 0.627 ± 0.087 |
| | 0 | 68 (56.70%) | 34 (64.20%) | 34 (50.70%) |
| | 1 | 42 (35.00%) | 18 (34.00%) | 24 (35.80%) |
| | 2 | 10 (8.30%) | 1 (1.90%) | 9 (13.40%) |
Model assessment
| Tuning method | Model | AUC (%) | Deviance | df | p-value |
|---|---|---|---|---|---|
| LOOCV | LON | 69.22 | 140.36 | 113 | 0.0414^{*} |
| | METRIC | 75.81 | 129.15 | 113 | 0.1421 |
| | DFC | 78.26 | 120.15 | 111 | 0.2603 |
| 3 covariates | LON | 68.68 | 138.64 | 111 | 0.0389^{*} |
| | METRIC | 75.47 | 127.27 | 111 | 0.1385 |
| | DFC | 78.26 | 120.15 | 111 | 0.2603 |
| 5 covariates | LON | 67.33 | 137.95 | 109 | 0.0319^{*} |
| | METRIC | 75.08 | 121.94 | 109 | 0.1871 |
| | DFC | 77.76 | 119.90 | 109 | 0.2238 |
Coefficients of the three penalized logistic regression models
| | Variable | LON | | METRIC | | DFC | |
|---|---|---|---|---|---|---|---|
| | | Estimate | exp(Estimate) | Estimate | exp(Estimate) | Estimate | exp(Estimate) |
| Demographics | Age | −0.055 | 0.947 | −0.071 | 0.932 | −0.079 | 0.924 |
| | Education | −0.169 | 0.845 | −0.149 | 0.862 | −0.141 | 0.869 |
| | ADAS | 0.070 | 1.073 | 0.162 | 1.176 | 0.199 | 1.220 |
| | Gender(male) | 0.583 | 1.791 | 0.490 | 1.632 | 0.411 | 1.508 |
| | APOE- | 0.679 | 1.972 | 0.647 | 1.910 | 0.606 | 1.833 |
| FC Biomarkers^{a} | | −0.107 | 0.898 | - | - | - | - |
| | | - | - | −0.011 | 0.989 | - | - |
| | Λ_{2,12} | - | - | - | - | 0.070 | 1.073 |
| | Λ_{4,12} | - | - | - | - | −0.095 | 0.910 |
| | Λ_{11,14} | - | - | - | - | 0.193 | 1.213 |

^{a}Selected FC biomarkers for each model.
ROIs mainly associated with the five eigenvectors
| | 2nd eigenvector | 4th eigenvector | 11th eigenvector |
|---|---|---|---|
| (1) | right inferior occipital gyrus | dorsolateral area of left superior frontal gyrus | orbital part of right middle frontal gyrus |
| (2) | right middle occipital gyrus | left middle frontal gyrus | orbital part of right inferior frontal gyrus |
| (3) | flocculonodular lobe of left cerebellum | right supramarginal gyrus | right inferior parietal |
| (4) | left calcarine fissure and surrounding cortex | triangular part of left inferior frontal gyrus | dorsolateral part of right superior frontal gyrus |
| (5) | left gyrus rectus | right insula | |
| (6) | right olfactory cortex | right Heschl gyrus | |
| (7) | orbital part of left inferior frontal gyrus | right rolandic operculum | |
| (8) | bilateral anterior cingulate and paracingulate gyri | | |
| (9) | right inferior parietal, but supramarginal and angular gyri | | |

| | 12th eigenvector | 14th eigenvector |
|---|---|---|
| (1) | left precentral gyrus | orbital part of left inferior frontal gyrus |
| (2) | left rolandic operculum | temporal pole of left superior temporal gyrus |
| (3) | right superior occipital gyrus | left vermis 3 |
| (4) | dorsolateral part of left superior frontal gyrus | temporal pole of the left middle temporal gyrus |
| (5) | left vermis 9 | left cuneus |
| (6) | left Heschl gyrus | |
| (7) | right angular gyrus | |
| (8) | right inferior parietal, but supramarginal and angular gyri | |
| (9) | right lenticular nucleus and putamen | |