Study design and participants
The ARIC study is a prospective population-based study of atherosclerosis and clinical atherosclerotic diseases. At its inception (1987–1989), 15,792 men and women aged 45 to 64, including 4266 African-American participants, were recruited from probability samples in four US communities: suburban Minneapolis, MN; Washington County, MD; Forsyth County, NC; and Jackson, MS (African-Americans only) [10]. Four additional examinations (visit 2: 1990–1992; visit 3: 1993–1995; visit 4: 1996–1998; and visit 5: 2011–2013) have been completed. During the first 2 years of the third ARIC examination, participants aged 55 and older from the Forsyth County and Jackson sites were invited to undergo a cranial MRI. A total of 1920 participants, including 955 African-Americans, had usable MRI data. All methods were approved by the institutional review board at each field center and the coordinating center, and written informed consent was obtained from the participants.
MRI protocol and phenotyping
Details of the MRI scanning and the image interpretation protocols used for this study have been published [11, 12]. General Electric (General Electric Medical Systems) or Picker (Picker Medical Systems) 1.5-T scanners were used for the MRI examination. The scanning protocol included a series of sagittal T1-weighted scans and axial proton density-weighted, T2-weighted, and T1-weighted scans with 5-mm thickness and no interslice gaps. Images were interpreted directly from a PDS-4 digital workstation consisting of four 1024 × 1024-pixel monitors capable of displaying all 96 images simultaneously. WMH were estimated as the relative total volume of periventricular and subcortical white matter signal abnormality on proton density-weighted axial images by visual comparison with eight templates that successively increased from barely detectable white matter abnormalities (grade 1) to extensive, confluent abnormalities (grade 8). Individuals with no white matter abnormalities received grade 0, and those with abnormalities worse than grade 8 received grade 9.
DNA methylation analysis
DNA methylation analysis was conducted with the Infinium HumanMethylation450 BeadChip (HM450) array (Illumina Inc., San Diego, CA) on genomic DNA extracted from blood samples collected at ARIC visit 2 or 3. Assays were performed on 2879 African-American participants who had not restricted use of their DNA and for whom at least 1 μg of DNA and genome-wide genotyping data were available.
Details of assay and QC procedures have been previously published [13]. Briefly, genomic DNA was treated with sodium bisulfite using the EZ-96 DNA methylation kit (Zymo Research Corporation, Irvine, CA) following the manufacturer’s protocol. Bisulfite converted DNA was amplified, enzymatically fragmented, purified, and hybridized to the HM450 array in accordance with the manufacturer’s directions. Methylation typing at 485,577 CpG sites was performed using GenomeStudio 2011.1 (Illumina Inc., San Diego, CA). Methylation level for each probe was derived as a beta value representing the fractional level of methylation at that location. Quality control analysis was performed using the wateRmelon R package [14]. Probe data were excluded if they had a low detection rate (<95% at P < 0.01) and a high missing rate (greater than 1% across all samples). Sample data were excluded based on the following criteria: (1) greater than 5% missing values across all probes; (2) possible gender mismatch based on principal component analysis; and (3) genotype mismatch based on 24 SNPs present on the HM450 array.
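The probe-level and sample-level missingness and detection thresholds described above can be illustrated with a minimal sketch. This is not the wateRmelon implementation: the function names and the list-based data layout are hypothetical, and missing beta values are assumed to be encoded as None. Because the text joins the two probe criteria with "and", the sketch reports each flag separately and does not assume how they are combined.

```python
def missing_rate(values):
    """Fraction of entries that are missing (assumed to be encoded as None)."""
    return sum(v is None for v in values) / len(values)


def detection_rate(detection_p, alpha=0.01):
    """Fraction of samples in which a probe is detected at P < alpha."""
    return sum(p < alpha for p in detection_p) / len(detection_p)


def probe_flags(betas, detection_p):
    """Evaluate the two probe-level criteria for one probe across all samples."""
    return {
        # Detected (P < 0.01) in fewer than 95% of samples
        "low_detection": detection_rate(detection_p) < 0.95,
        # More than 1% missing values across samples
        "high_missing": missing_rate(betas) > 0.01,
    }


def sample_fails_missingness(betas):
    """Sample criterion (1): more than 5% missing values across all probes."""
    return missing_rate(betas) > 0.05


# Hypothetical probe measured in 200 samples: 3 missing values, 12 poor detections.
probe_betas = [0.5] * 197 + [None] * 3
probe_p = [0.001] * 188 + [0.2] * 12
flags = probe_flags(probe_betas, probe_p)
```

The sex-mismatch and genotype-mismatch criteria require principal component and SNP data and are not covered by this sketch.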
A total of 713 participants with both a DNA methylation measure and a brain MRI scan were included in these analyses. For each participant, DNA methylation age was estimated from the background-subtracted beta values processed in GenomeStudio. Imputation of missing beta values and data normalization were performed using R code implemented by Horvath. Specifically, non-normalized beta values were uploaded, and the data normalization option was selected. Two estimates of DNA methylation age were derived: the Horvath predicted age, based on 353 CpG probes, was generated using the online calculator; and the Hannum et al. predicted age, based on 71 probes, was derived using the regression weights supplied by the authors [6]. Age acceleration estimates were calculated as the residuals from the regression of each predicted age measure on chronologic age at the time of blood collection.
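As an illustration of the residual-based definition of age acceleration, the following minimal sketch fits an ordinary least-squares line of predicted (DNA methylation) age on chronologic age and returns the residuals. The function name and the four example values are hypothetical and are not study data.

```python
def age_acceleration(predicted_age, chronologic_age):
    """Residuals from a simple least-squares regression of predicted age on chronologic age."""
    n = len(chronologic_age)
    mean_x = sum(chronologic_age) / n
    mean_y = sum(predicted_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chronologic_age)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronologic_age, predicted_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: epigenetic age is older than expected for chronologic age.
    return [y - (intercept + slope * x)
            for x, y in zip(chronologic_age, predicted_age)]


# Hypothetical example values (years).
chronologic = [50.0, 55.0, 60.0, 65.0]
dnam_age = [52.0, 54.0, 63.0, 64.0]
acceleration = age_acceleration(dnam_age, chronologic)
```

By construction, the residuals average zero and are uncorrelated with chronologic age, so the measure captures deviation from the age-expected methylation age and not age itself.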
Statistical analysis
To reduce the skewness of its distribution, WMH burden was expressed as the natural log-transformed WMH grade (log(WMH + 1)). Linear regression models were used to estimate the association of WMH burden (log(WMH + 1)) with either the Horvath or the Hannum et al. estimate of age acceleration, adjusting for covariates. Covariates were measured at ARIC visit 3 except for blood cell composition, which was estimated from the DNA methylation data as described elsewhere [15, 16]. Model 1 adjusted for age at brain MRI and sex. Model 2 adjusted for age at brain MRI, sex, body mass index, systolic blood pressure, hypertension, diabetes, and current smoking. Model 3 adjusted for the covariates in model 2 and blood cell composition (estimated proportions of CD8 and CD4 T cells, natural killer cells, plasmablasts, monocytes, and granulocytes, derived from the Horvath algorithm).
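The outcome transformation and the nested covariate sets can be written out as a short sketch. The variable names are hypothetical labels chosen for illustration; the analyses themselves were run in SAS, and no model fitting is shown here.

```python
import math


def log_wmh(grade):
    """Natural log of (WMH grade + 1); grades run from 0 to 9."""
    if not 0 <= grade <= 9:
        raise ValueError("WMH grade must be between 0 and 9")
    return math.log(grade + 1)


# Nested covariate sets for the three models (hypothetical variable names).
MODEL_1 = ["age_at_mri", "sex"]
MODEL_2 = MODEL_1 + ["body_mass_index", "systolic_bp", "hypertension",
                     "diabetes", "current_smoking"]
MODEL_3 = MODEL_2 + ["cd8_t", "cd4_t", "natural_killer", "plasmablasts",
                     "monocytes", "granulocytes"]

# Transformed outcome for every possible grade.
transformed = [log_wmh(g) for g in range(10)]
```

Adding 1 before taking the logarithm keeps grade 0 defined, mapping it to 0 on the transformed scale.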
To examine whether estimates of age acceleration were associated with greater WMH severity, WMH was classified into three categories: no WMH (grade 0); low WMH burden (grades 1–4); and high WMH burden (grade ≥ 5). Multinomial logistic regression was used to estimate the association of WMH severity categories with estimates of age acceleration, adjusting for covariates. The covariate adjustment scheme was the same as that described for the linear models. Statistical analyses were performed using SAS software v9.4.
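The three severity categories used as the multinomial outcome follow directly from the grade cut-points. A minimal sketch, with a hypothetical function name and hypothetical example grades:

```python
from collections import Counter


def wmh_category(grade):
    """Map a WMH grade (0-9) to the three-level severity category."""
    if not 0 <= grade <= 9:
        raise ValueError("WMH grade must be between 0 and 9")
    if grade == 0:
        return "none"   # no WMH
    if grade <= 4:
        return "low"    # grades 1-4
    return "high"       # grade 5 or above


# Hypothetical grades for eight participants.
grades = [0, 1, 1, 2, 4, 5, 7, 9]
category_counts = Counter(wmh_category(g) for g in grades)
```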
Genome-wide association analysis of age acceleration and association of age acceleration loci with WMH
Genome-wide genotyping was conducted at the Broad Institute using the Affymetrix 6.0 SNP Array on 3207 African-American participants. Of these, 336 were removed during data cleaning for the following reasons: insufficient call rate, sex mismatch, discordance with previously genotyped markers, being a first-degree relative of an included individual, or being a genetic outlier based on allele sharing and principal components analyses. Imputation was performed on the quality-controlled data in two steps: (1) pre-phasing with ShapeIt (v1.r532) and (2) imputation with IMPUTE2 using the 1000 Genomes Phase I v3 reference panel. Measured SNPs used for imputation were restricted to those with MAF > 0.01, call rate > 95%, and Hardy-Weinberg equilibrium P > 1 × 10⁻⁶. After frequency and genotyping pruning, 806,416 SNPs remained in the final set used for imputation.
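The three SNP-level filters applied before imputation can be sketched as follows. The function names and genotype counts are hypothetical, and the Hardy-Weinberg P value is assumed to have been computed elsewhere and is passed in as an input.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts among called samples."""
    n_called = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    return min(freq_a, 1 - freq_a)


def passes_snp_qc(n_aa, n_ab, n_bb, n_samples, hwe_p):
    """Apply the MAF, call-rate, and Hardy-Weinberg filters to one measured SNP."""
    call_rate = (n_aa + n_ab + n_bb) / n_samples
    maf = minor_allele_frequency(n_aa, n_ab, n_bb)
    return maf > 0.01 and call_rate > 0.95 and hwe_p > 1e-6


# Hypothetical SNPs genotyped in 1000 samples.
common_snp = passes_snp_qc(810, 170, 10, n_samples=1000, hwe_p=0.4)
rare_snp = passes_snp_qc(980, 10, 0, n_samples=1000, hwe_p=0.9)
low_call_snp = passes_snp_qc(700, 200, 20, n_samples=1000, hwe_p=0.5)
```

In this example only the first SNP is retained: the second has a minor allele frequency of about 0.005, and the third has a call rate of 92%.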
Statistical analyses modeled the association between the number of minor alleles at each SNP and each measure of age acceleration using linear regression, adjusting for chronologic age, gender, study site, blood cell composition, and 10 principal components (PCs). A P value less than 5 × 10⁻⁸ was considered statistically significant. A total of 2216 ARIC participants with both DNA methylation and genome-wide genotyping data were included in the analyses. Significant epigenetic age acceleration loci were then evaluated for their association with WMH using the largest published GWAS of WMH to date [17].
Epigenome-wide association analysis of WMH burden
We examined the association of individual CpG probes with WMH burden in the 713 individuals with DNA methylation and brain MRI data. The association between methylation scores at each CpG site and log(WMH + 1) was modeled using linear mixed models, adjusting for age at brain MRI, gender, study site, visit, BMI, smoking, blood cell composition, systolic blood pressure, diastolic blood pressure, and technical variables (plate ID, chip ID, and chip row). For this analysis, beta values normalized using the BMIQ procedure [18] were used. Probes with poor reliability in our technical replicate analyses [19], probes containing known SNPs, and non-specific probes were excluded [20, 21]. A P value less than 1 × 10⁻⁷, which corresponds to a Bonferroni correction of the P = 0.05 threshold for the total number of CpGs tested, was considered statistically significant.
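The arithmetic behind the Bonferroni threshold can be checked with a short sketch. The number of probes remaining after exclusions is not reported in this section, so the full HM450 array size (485,577 CpG sites) is used here as an assumed upper bound on the number of tests; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumption: every CpG site on the array is tested (upper bound on the test count).
n_array_probes = 485_577
threshold_full_array = bonferroni_threshold(0.05, n_array_probes)

# Number of tests for which 0.05 / n equals the stated threshold of 1e-7.
implied_n_tests = 0.05 / 1e-7
```

With all array probes the corrected threshold is about 1.03 × 10⁻⁷, and a threshold of exactly 1 × 10⁻⁷ corresponds to 500,000 tests, so the stated cut-off is a rounded and slightly conservative value for any test count up to the array size.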