Epigenome-wide association study of Alzheimer’s disease replicates 22 differentially methylated positions and 30 differentially methylated regions

Background Growing evidence shows that epigenetic modifications play a role in Alzheimer’s disease (AD). We performed an epigenome-wide association study (EWAS) to evaluate the DNA methylation differences using postmortem superior temporal gyrus (STG) and inferior frontal gyrus (IFG) samples. Results Samples from 72 AD patients and 62 age-matched cognitively normal controls were assayed using Illumina© Infinium MethylationEPIC BeadChip. Five and 14 differentially methylated positions (DMPs) associated with pathology (i.e., Braak stage) with p value less than Bonferroni correction threshold of 6.79 × 10–8 in the STG and IFG were identified, respectively. These cytosine–phosphate–guanine (CpG) sites included promoter associated cg26263477 annotated to ABCA7 in the STG (p = 1.21 × 10–11), and cg14058329 annotated to the HOXA5/HOXA3/HOXA-AS3 gene cluster (p = 1.62 × 10–9) and cg09448088 (p = 3.95 × 10–9) annotated to MCF2L in the IFG. These genes were previously reported to harbor DMPs and/or differentially methylated regions (DMRs). Previously reported DMPs annotated to RMGA, GNG7, HOXA3, GPR56, SPG7, PCNT, RP11-961A15.1, MCF2L, RHBDF2, ANK1, PCNT, TPRG1, and RASGEF1C were replicated (p < 0.0001). One hundred twenty-one and 173 DMRs associated with pathology in the STG and IFG, respectively, were additionally identified. Of these, DMRs annotated to 30 unique genes were also identified as significant DMRs in the same brain region in a recent meta-analysis, while additional DMRs annotated to 12 genes were reported as DMRs in a different brain region or in a cross-cortex meta-analysis. The significant DMRs were enriched in promoters, CpG islands, and exons in the genome. Gene set enrichment analysis of DMPs and DMRs showed that gene sets involved in neuroinflammation (e.g., microglia differentiation), neurogenesis, and cognition were enriched (false discovery rate (FDR) < 0.05). Conclusions Twenty-two DMPs and 30 DMRs associated with pathology were replicated, and novel DMPs and DMRs were discovered.


Introduction
Dementia refers to conditions of memory loss and other cognitive decline serious enough to interfere with daily life. Alzheimer's disease (AD) is the most common cause of dementia and accounts for 50-75% of dementia cases [1]. While genetic studies have identified familial risk factors such as APP, PSEN1, and PSEN2 that are involved with amyloid-β production, they only account for a small fraction of patients with early onset AD [2]. Most patients with AD acquire the disease late in life (i.e., age of onset > 65 years). Genome-wide association studies (GWAS) have identified dozens of loci for late onset AD (LOAD) [3][4][5][6][7][8][9]. Most of the variants confer risk with relatively small effect size, except for APOE variants with modest effect size. Next generation sequencing (NGS) enables rare variant analysis to further identify genes such as TREM2 playing a critical role in AD [9][10][11]. Growing evidence shows that epigenetic modifications also play a role in AD onset and progression [12][13][14][15]. Epigenetic modification could be detected by bisulfite conversion, a method that assesses the degree of DNA methylation present as it converts unmethylated cytosine to uracil (and to thymine through PCR), while 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) are not converted (and thus remain as cytosine in PCR). As such, bisulfite treatment of DNA allows the differentiation between cytosine and the modified versions of cytosine (5-mC/5-hmC) through downstream assay techniques. Oxidative bisulfite conversion may further distinguish 5-mC from 5-hmC [16]. Epigenome-wide association studies (EWAS) using bisulfite conversion approaches coupled with the Illumina Infinium ® Human-Methylation450 BeadChip have demonstrated robust and reproducible differences in total DNA methylation at a number of loci in AD brain [17][18][19][20][21][22], including ankyrin 1 (ANK1), ABCA7, BIN1, TREM2, and the HOXA and HOXB gene clusters. Notably, a recent meta-analysis using samples from 6 cohorts identified a total of 220 DMPs in a cross-cortex meta-analysis [21].
Volumetric measurements of specific regions of the cortex from AD patients reveal anatomical regions with severe, moderate, or mild/no atrophy. Severe atrophy occurs in medial temporal lobe structures as well as in inferior temporal and superior and middle frontal cortices, while moderate atrophy takes place in the superior temporal gyrus (STG) and no atrophy is noted in the inferior frontal lobes [23]. While there is no significant increase in total (intra-and extracellular) neurofibrillary tangles (NFTs), moderate neuronal loss and evidence of oxidative stress are observed in the STG [24][25][26][27][28]. The STG could therefore be a surrogate for an earlier stage compared to the most severely affected regions, while inferior frontal gyrus (IFG) could represent the earliest stage in the disease course. In this EWAS study, total methylation patterns from the STG and IFG were interrogated using Infinium ® MethylationEPIC BeadChip containing ~ 850 K CpG probes, doubling the density of the Infinium ® HumanMethylation450 BeadChip used in earlier studies [17,18,20,29]. DMPs and DMRs associated with pathology were identified. Additionally, we performed a replication study using this study data set to replicate the findings from a recent EWAS meta-analysis [21]. Gene set enrichment and over representation analyses were performed to provide insight into coherent biological pathways and processes.

Postmortem brain tissue epigenetic profiling and DMP analysis results
Demographic and clinical characteristics are provided in Table 1. The mean Pearson correlations of methylation levels for all possible subject pairs were 0.986 and 0.984 for the STG and IFG samples, respectively, indicating that the majority of the CpG sites did not show significant differences in DNA methylation levels. The estimated proportion of NeuN + cells (primarily neurons) showed no significant differences between AD patients and cognitively normal controls (Wilcoxon ranksum test p value = 0.504 and 0.159 in the STG and IFG, respectively).
Epigenome-wide association studies are known to be prone to significant inflation and bias of test statistics. Lambda (λ) inflation factors were 1.54 and 1.11 for the initial EWAS in the STG and IFG, respectively, suggesting the presence of inflation in test statistics (Additional file 1: Figure S1 for QQ plots). A Bayesian method based on estimation of the empirical null distribution as implemented in BACON [30] was used to control bias and inflation in EWAS. After BACON correction, λ values for both EWAS were less than 1.05 in this study. All results reported in this study are after BACON correction. Manhattan plots are also available in Additional file 1: Figure  S2.
Five CpGs were associated with pathology in the STG passing Bonferroni correction threshold of 6.79 × 10 -8 , including cg26263477 annotated to ABCA7 (p = 1.21 × 10 -11 , Table 2), a gene known to harbor an AD susceptibility genetic variant and DMP [5,31]. In addition, fourteen CpG probes were associated with pathology in the IFG, including CpG probes cg14058329 annotated to the HOXA5/HOXA3/HOX-AS3 gene cluster (p = 1.62 × 10 -9 ) and cg09448088 (p = 3.95 × 10 -9 ) annotated to MCF2L (Fig. 1). cg09448088 was recently reported as a significant DMP in an cross-cortex Braak stage EWAS meta-analysis [21], and the HOXA gene cluster was reported to harbor DMPs and DMRs associated with Braak stage [20,21]. There is no overlap between the study-wide significant findings between these two brain regions.
Of the 14 significant DMPs associated with pathology in the IFG, 5 were nominally associated with pathology (p < 0.05) in the STG. The effect sizes in the STG for the 19 DMPs associated with pathology in either the STG or IFG were correlated with the effect sizes for the same probes in the IFG (r = 0.50, p = 0.03) (Fig. 2a). A full list of DMPs associated with pathology with p value less than 6.79 × 10 -8 in either the STG or IFG is available in the Additional file 2: Table S1.
Using the reported significant DMP findings from a recent EWAS meta-analysis [21], our results replicated a subset of reported epigenome-wide significant DMPs. There were 377 unique genome-wide significant CpGs (236, 95, 10, and 220 CpGs associated with Braak stage in the prefrontal cortex (PFC), temporal gyrus (TG), EC, and cross-cortex, respectively) reported, among which 344 CpGs passed QC and were present in this dataset. 236 (68.6%) of these 344 CpGs were nominally significant (p < 0.05) in our study, while 22 (6.4%) remained significant after accounting for the multiple tests of the replication effort (p < 0.05/344 ~ 0.0001) ( Table 2). The replication rate in the IFG seemed to be higher than that in the STG. Eighteen (8.33%) out of 216 significant DMPs in the PFC (216 out of 236 DMPs from the meta-analysis

DMR analysis results
A DMR analysis, which allowed us to identify regions of the genome consisting of ≥ 3 probes, revealed a total of 121 and 173 DMRs significantly associated with the pathology in the STG and IFG, respectively (Sidak-corrected p value < 0.05, Additional file 2: Table S2A and S2B), among which 11 and 33 were reported to be significant DMRs associated with pathology identified in the corresponding cortex region in the recent EWAS metaanalysis [21]. Lists of replicated DMRs in the same brain region and more broadly in any brain region are available in Additional file 2:  In addition, 4 DMRs associated with pathology in the IFG were detected in regions annotated to MCF2L consisting of a total of 17 probes (Additional file 1: Figure S3B); the DMR annotated to MCF2L was also detected in the STG. Furthermore, multiple DMRs annotated to TFAP2E (STG), ZNF608 (STG), STRA6 (IFG), LHX6 (IFG), SHH (IFG), and LINC00870 (IFG) were also detected. The HOXA gene cluster, MCF2L, TFAP2E, SHH, and LHX6, was among the replicated DMRs reported previously in the recent EWAS meta-analysis [21]. Additional replicated DMRs supported by a single DMR include regions annotated to RGMA, CD82, CPEB4, RHBDF2 in both STG and IFG, C3, CUX2, CLDN5, CXXC5, DDAH2, DIP2A, PARS2, S1PR4, SLC16A3, HLA-DPA1, SMG9, ATP2A3, ZNF385A, DUSP27, CAMTA1, and the HOXB gene cluster (Additional file 1: Figure S3C) in the STG, and NAT8L, DDR1, SLC15A4, RHOB in the IFG (see Additional file 2: Tables S4A and S4B for a full list of replicated DMRs). In total, 26 (16.3%, calculated at the gene level) of the reported 262 significant DMRs (annotated to 160 unique genes) from the PFC meta-analysis were replicated in the IFG analysis, while 15.8% of our 173 significant DMRs (annotated to 164 unique genes) were reciprocally replicated by the PFC meta-analysis. In addition, 11 (18.3%) of the reported 104 significant DMRs (annotated to 60 unique genes) from the TG meta-analysis were replicated in the STG analysis, while 9.2% of our 121 significant DMRs (annotated to 119 unique genes) were reciprocally replicated by the TG meta-analysis. The replication rate at the DMR level seems to be higher than that at the DMP level.
Among the top DMRs associated with pathology in the IFG, the DMR annotated to DDAH2 stood out as one of the most significant DMRs  Fig. 2 Correlated effect sizes of differential methylation within and between studies. Effect sizes from the Braak stage EWAS in the STG from this study were plotted against those in the IFG from the same study (a); effect sizes from the Braak stage EWAS in the STG (b) and IFG (c) from this study were plotted against those in the PFC, TG, EC, and cross-cortex meta-analysis from Smith et al. For panel b and c, only the correlation and p value for the same brain is displayed

AD-associated DMRs are enriched/depleted in specific genomic features
Among the DMRs that were associated with pathology, genomic features such as promoter and CpG island (CGI) are highly enriched in all analyses. Exon, 5′ UTR, and transcription termination site (TTS) are also enriched. In contrast, intergenic region and repeats (SINE and LINE) are depleted (Fig. 3 and Additional file 2: Table S4).

Gene set enrichment analysis (GSEA) and over-representation analysis (ORA) results
Gene set enrichment analysis of the DMPs and DMRs revealed that gene sets involved in neuroinflammation, neurogenesis, and cognition were enriched (false discovery rate (FDR) < 0.05. See Additional file 2: Tables S5 and S6 for a full list of enriched gene sets from the DMP and DMR analyses).
Gene sets related to neuroinflammation were enriched in this study. Over representation analysis revealed that genes involved in microglia differentiation (driven by TSPAN2 and negative regulator of reactive oxygen species (NRROS)) were enriched among the DMRs associated with pathology in the STG (Additional file 2: Table S5A). Noticeably, the DMR in NRROS was also replicated in the recent EWAS meta-analysis [21]. Phagocytosis (p = 0.001, adjusted p value = 0.05) was also enriched in DMPs associated with pathology in the IFG (Additional file 2: Table S5B).

Correlation between selected CpG probes and between CpG probes and transcript level
Correlation among top ANK1 CpG cg13609385 (nominally associated with diagnosis in the STG, p DMP = 0.001), top HOXA gene cluster probe cg14058329, top HOXB gene cluster probe cg04904318 (p DMP = 8.79 × 10 -6 ), and top DDAH2 probe cg25845158, and between these probes and expression level of all transcripts, were tested. Top HOXB gene cluster probe cg04904318 was correlated with multiple CpG probes. For example, cg13609385 was positively correlated (r = 0.74, p = 1.33 × 10 -14 ) with cg04904318 in the STG, but negatively correlated with cg04904318 (r = −p valu0.72, p = 2.06 × 10 -11 , Additional file 1: Table S7) in the IFG, suggesting a brain region specificity in their interaction. This correlation is significant even after multiple testing correction (p < 0.05/(4*2*(5772+3)) ~ 1.08 × 10 -6 to correct for testing correlation with 5,772 transcripts and 3 other CpG probes for a total of 4 probes tested in 2 brain regions). In addition, there was a positive correlation between the methylation level of DDAH2 probe cg25845158 and that of cg04904318 in both the STG and IFG (r = 0.549, p = 2.77 × 10 -7 in the STG; r = 0.588, p = 3.33 × 10 -7 in the IFG, Additional file 1: Figures S4A and 4B).

Discussion
In this study, we profiled methylome from the STG and IFG brain regions and attempted to identify evidence of replication of the reported DMPs and DMRs from a recent EWAS meta-analysis [21]. We identified DMRs with replication evidence and replicated 22 significant DMPs reported in the recent EWAS meta-analysis. In addition, we discovered novel DMPs and DMRs associated with pathology surpassing the genome-wide significance threshold (p < 6.79 × 10 -8 for DMPs, and Sidak-corrected p value < 0.05 for DMRs) for future follow-up studies.
It is of interest to note that despite the modest sample size, the replication rate for DMRs was substantial. The DMRs identified in this study were supported by more probes than that of previous studies [20,21], owing to the fact that the EPIC BeadChip used in this study doubles the CpG probe density compared to the previous studies that utilized the HumanMethylation450 BeadChip. This suggests that the higher CpG probe density could increase the power of DMR detection, given comparable study sample size. Indeed, for this study the replication rate at the DMR level is higher than that at the DMP level. 8.33% of the DMPs and 16.25% of the DMRs reported in the PFC meta-analysis were replicated in the IFG analysis, while 4.49% of the DMPs and 18.33% of the DMRs reported in the TG meta-analysis were replicated in the STG analysis. It appears that this study replicated more DMPs identified from the IFG region than the STG region. This could be due to the differential power of the meta-analysis in that the sample size in the PFC metaanalysis is larger than that in the TG meta-analysis (sample size n = 959 for the PFC vs. n = 608 for the TG), and hence, more (and perhaps also more reliable) DMPs were discovered (n = 236 for the PFC vs. n = 95 for the TG).
The activation of HOX genes during differentiation was enriched among the DMPs associated with pathology in the STG (Additional file 2: Table S5A). This is not surprising given the single genome-wide significant DMP in the HOXA gene cluster and multiple significant DMRs in both HOXA and HOXB gene clusters were identified in this study. Both HOXA and HOXB differential methylation findings were reported previously in the AD brain [17][18][19][20][21]32]. Additionally, HOXA differential methylation was reported in the blood from patients with Down syndrome [33] and HOXB differential methylation was also reported in the blood from patients with AD [34]. Many Down syndrome patients develop AD resulting from an extra copy of the APP gene due to trisomy on chromosome 21. In Drosophila, it has been shown that the HOX transcription factor is one of the upstream regulators coordinating ankyrin-dependent microtubule organization and synapse stability [35] and it is therefore potentially neuroprotective. ANK1 was also reported to be differentially methylated in prior studies [16][17][18]32] and replicated in this study. We therefore tested the correlation of top ANK1 CpG cg13609385 and representative probes in the HOX gene clusters. Surprisingly, cg13609385 was positively correlated with the lead HOXB probe cg04904318 in the STG, but negatively correlated with cg04904318 (Additional file 2: Table S7) in the IFG, suggesting a brain region specificity in their interaction.
Given the role of DDAH2 in oxidative stress response, hypermethylation of DDAH2 could theoretically result in lower level of DDAH2 gene expression and increased NOS activity, excessive reactive oxygen species (ROS) production, and higher level of neuroinflammation, as shown previously that DDAH2 expression level was inversely correlated with proinflammatory cytokines IL-6 and TNF-alpha [36]. However, with limited overlapping samples (n = 76 for the STG samples) between mRNA-Seq and EPIC array samples, we did not have evidence to support the negative correlation between the genome-wide significant DMP cg25845158 from DDAH2 and DDAH2 mRNA level, suggesting the transcriptional regulation of DDAH2 is more complicated than simple regulation by the methylation at the CpG island. In contrast, there was a nominal positive correlation between cg25845158 and DDAH2 mRNA level in the IFG (r = 0.316, p = 0.01, Additional file 1: Figure S4C) and no correlation in the STG (Additional file 1: Fig. 4D). Interestingly, there was a positive correlation between the methylation level of DDAH2 probe cg25845158 and the methylation level of the lead probe in the HOXB gene cluster cg04904318 in both the STG and IFG. The significance of this correlation is unknown. Furthermore, DDAH2 probe cg25845158 was negatively correlated with the transcript level of SNAP25, (r = − 0.62, p = 4.34 × 10 -8 ), MAPK8IP2 (r = − 0.59, p = 2.63 × 10 -7 ), PDE2A (r = − 0.59, p = 2.86 × 10 -7 ), and BZRAP1 (r = − 0.57, p = 1.01 × 10 -6 ) (Additional file 1: Table S7). This is of interest as it is thought that Aβ peptides trigger synaptic dysfunction by interfering with the synaptic vesicular fusion facilitated by the SNARE protein complexes including SNAP25 [37]. MAPK8IP2 is also known as JIP2, c-Jun NH(2)-terminal kinase (JNK)-interacting protein 2, which is known to interact with Aβ to play an important role in the metabolism and/or the function of Aβ including the regulation of Aβ phosphorylation by JNK [38]. Inhibition of PDE2 has been shown to rescue Aβ induced memory impairment via regulation of PKA/ PKG-dependent neuroinflammatory and apoptotic pathways [39]. Finally, genetic variants from BZRAP1-AS1 were previously implicated to be associated with AD [40].
While this study identified interesting genes and pathways, there are limitations that worth commenting. Despite the modest sample size for the two brain regions included in this study, the sample size is still far smaller compared to the pooled sample size in the recent metaanalysis, and hence limits the power in identifying more genome-wide significant DMPs and DMRs. This is a cross-sectional study using samples from the end stage of a disease and therefore it is difficult to infer whether the methylation change is causal or is a result of the disease process. The current study annotates CpG probes to the nearby genes based on the genomic location. It is possible that a regulatory element may interact with another sequence element in the distance via chromatin loop, and therefore, the functional consequence could affect another distal gene. This is a study using DNA extracted from bulk tissue despite the correction for neuronal proportion in the DMP analysis. Studies on cell-type specific methylation pattern [19,32] revealed cell-type specific effect, which could be obscured in studies using bulk tissues. Finally, this study did not distinguish between 5-mC and 5-hmC and could have missed specific differences between the two. Further studies are needed to replicate the novel DMPs or DMRs identified in this study.

Conclusions
We conducted a modest size EWAS to identify DMPs and DMRs associated with pathology. Five and 14 study-wide significant DMPs were identified to be associated with pathology in the STG and IFG, respectively. Our study replicated 22 DMPs supporting the findings of a recent EWAS meta-analysis. Additionally, there was substantial overlap between the DMRs identified in this study and those identified in the recent meta-analysis. The identified DMPs and DMRs converged on biological pathways and processes that were previously implicated in AD.

Cohort
Postmortem brain case samples from patients with AD and control samples from patients who were cognitively normal from the STG (n case = 91, n control = 61) and IFG (n case = 89, n control = 57) were acquired from Banner Sun Health Research Institute [41,42]. These brain samples came from subjects who were volunteers in the Arizona Study of Aging and Neurodegenerative Disorders (AZSAND) and the Brain and Body Donation Program, a longitudinal clinicopathological study of healthy aging, cognition, and movement in the elderly since 1996 in Sun City, Arizona.

Postmortem brain tissue epigenetic profiling
Genomic DNA and total RNA, including miRNA, were simultaneous purified from the brain tissue samples using AllPrep DNA/RNA/miRNA Universal Kit (QIA-GEN Inc., Germantown, MD, USA) following the standard protocol. 10 μl of genomic DNA with minimal concentration of 40 ng/μl was bisulfite converted using the Zymo EZ DNA Methylation ™ kit (Zymo, Irvine, CA, USA) using the manual protocol, while genome-wide methylation was measured using Infinium ® Methylatio-nEPIC BeadChip (Illumina, San Diego, CA, USA) using the automated protocol as detailed in the Infinium ® HD Assay Methylation Protocol. Methylome profiling data were generated over two batches for each brain region, respectively. All data generation were conducted by laboratory personnel blinded as to the clinical phenotype.

Postmortem brain tissue mRNA-Seq
The mRNA-Seq study was reported previously [43]. RNA samples (n case = 24, n control = 38) from the same cohort above with RNA integrity number (RIN) greater than 6 were proceeded to the library construction step for mRNA-Seq data generation. Libraries were constructed using TruSeq ® Stranded mRNA Library Prep (Illumina Inc., San Diego, CA, USA) according to manufacturer's protocol using 200 ng of input RNA. Briefly, poly-Acontaining mRNA was captured using poly-T oligonucleotide-attached magnetic beads. Following purification, the mRNA was fragmented using divalent cations under elevated temperature. The cleaved RNA fragments were copied into first strand cDNA using reverse transcriptase and random primers. Strand specificity was achieved by replacing dTTP with dUTP in the Second Strand Marking Mix (SMM), followed by second strand cDNA synthesis using DNA polymerase I and RNase H. These cDNA fragments were then followed by A-tailing and adapter ligation reactions. The products were purified and enriched with PCR to create the final cDNA library. All libraries were quantified by Caliper and real-time qPCR and amplified on cBot to generate the clusters on the flowcell, and sequenced using HiSeq4000 (Illumina Inc., San Diego, CA, USA) using paired end (100 bp × 2) sequencing to a sequencing depth of 40M reads (or 8G data). Sequencing data were generated over two batches for each brain region. All data generation were conducted by laboratory personnel blinded as to the clinical phenotype. This dataset was used to perform correlation analysis of selected CpG probes and mRNA transcript to shed light on the potential consequence of DNA methylation.

Data Pre-processing
Epigenetic data were analyzed separately for each brain region/wave. Quality control of the epigenetic data was performed using ChAMP R package [44]. Probes that did not perform well (with detection p value ≥ 0.01 in one or more samples (n STG = 10,136 and n IFG = 10,920 for the STG and IFG samples, respectively), or with bead count < 3 in at least 5% of samples (n STG = 4220 and n IFG = 7179), probes with known SNP sites or with cross-reactivity [45] (n STG = 95,414 and n IFG = 94,915), non-CG probes (n STG = 2910 and n IFG = 2900), probes align to multiple locations on the genome [46] (n = 15), as well as probes located on the sex chromosomes (n STG = 16,400 and n IFG = 16,287) were filtered out. At the sample level, gender based on the methylation data was estimated using getSex function in the minfi (v1.28.4) R package and compared to that from the clinical phenotype. No discrepant gender was detected for the study samples. Since a subject may have samples from two brain regions assayed in this study, sample identity check was performed using R package ewastools (v1.6) [47]. All expected pairs of identity were confirmed, and all detected pairs of identity were expected.
The methylation levels were then normalized using Dasen method in R package wateRmelon [48]. The neuronal vs. non-neuronal cell composition was estimated using the estimateCellCounts function in minfi [49] which used a reference brain dataset of fluorescence activated cell sorting (FACS) sorted neuronal and nonneuronal nuclear fractions [50]. Surrogate variables are covariates constructed directly from high-dimensional data that could be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise [51,52]. We used sva (v3.30.1) [53,54] to detect and estimate surrogate variables for unknown sources of variation to remove artifacts in the high-throughput experiments. Removing batch effects using surrogate variables in differential analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility [55]. Samples with discrepant phenotype between sample label and the phenotype data linked to case ID on the sample label were excluded from downstream analysis.
To have a balanced age-matched study design, only samples from subjects aged between 60 and 89 inclusive were included in the analysis resulting in sample sizes of 127 samples (67 AD and 60 cognitively normal control) and 117 samples (60 AD and 57 cognitively normal control) for the STG and IFG, respectively, used in downstream analysis.

Identification of DMPs
We used M value in the statistical analysis to identify DMPs using limma [59] as M value was shown to provide better performance in detection rate and true positive rate for both highly methylated and unmethylated CpG sites and was more statistically valid than betavalue, despite beta-value was more biologically intuitive [60]. Epigenetic association model corrected the top five surrogate variables, sex, age, neuronal proportion, and Braak stage (as a continuous variable) was tested in a linear regression model to identify differentially methylated probes associated with Braak stage. Epigenome-wide association studies were prone to significant inflation and bias of test statistics, and a Bayesian method to control bias and inflation in EWAS based on estimation of the empirical null distribution was proposed and implemented in R package BACON [30]. We applied this Bayesian method as implemented in BACON v1.10.1 to control for inflation and lambda (l) inflation factors before and after correction was reported. A stringent threshold using Bonferroni correction was used to declare study-wide significance.
The discovered DMPs in each brain region were examined for consistency evidence in several ways. Firstly, we checked for consistency of effect size and directionality between the two brain regions in this study; secondly, we compared the effect sizes from this study to those reported in the recent meta-analysis [21]. Lastly, we attempted to replicate the published genome-wide significant DMPs and DMRs from the recent meta-analysis given that our data were generated using Illumina EPIC BeadChip, and the published studies were using the Illumina HumanMethylation450 BeadChip.

Identification and annotation of DMRs and genomic feature enrichment
DMRs were identified using comb-p [61] with a distance of 500 bp and a seeded p value of 1.0 × 10 -4 . The DMR analyses were carried out for all probes (irrespective of directionality of differential methylation), and DMRs with at least three probes and Sidak-corrected p less than 0.05 were considered significant and reported. The identified DMRs were annotated by HOMER [62]. HOMER first determined the distance of a DMR to the nearest transcription start site (TSS) and assigned the DMR to that gene, it then determined the genomic annotation of the region occupied by the center of the DMR and performed enrichment analysis of genomic features.

Gene set enrichment analysis
ORA [63] for genes near significant CpGs from Illumina's Infinium Human MethylationEPIC array was performed using missMethyl R package v1.16.0 [64], taking into account the differing number of probes per gene present on the array. Additionally, a GSEA [65] analysis was performed using R package methylGSA [66] adjusting for multiple p values of each gene by Robust Rank Aggregation (RRA), and then apply pre-ranked version of GSEA (GSEAPreranked) in gene set testing. Lastly, methylglm function within R package methylGSA was used for length bias correction using logistic regression [67].Gene ontology databases used included KEGG database [68] and c2.cp (a superset of c2.cp.biocarta, c2.cp.kegg, and c2.cp.reactome [69] and a few other data sources) (v7.0) subsets of Molecular signatures database (MSigDB) [70].
Over-representation analysis of genes implicated by DMR was performed using https ://www.gsea-msigd b.org/gsea/msigd b/compu te_overl aps.jsp which has a broader background gene set assumption and test overrepresentation at higher levels of the ontology hierarchy. Gene ontology databases used included c2.cp and c5 subsets of MSigDB [70].

Methylation-mRNA correlation analysis
In order to identify potential consequence of DNA methylation, paired methylation level-mRNA correlation analysis was performed using Partek Genomic Suite (Partek Inc, St Louis, MO, USA), which only examined the correlation between selected top CpG probes (n = 4) and the transcript level. We used fsva function in sva R package to perform frozen surrogate variable analysis [71] to remove nuisance batch effects from both methylation array and mRNA-Seq datasets and used the adjusted version of datasets for correlation analysis. Multiple testing correction was applied (p < 0.05/(4*2*(5772 + 3)) ~ 1.08 × 10 -6 to correlate for testing correlation with 5772 transcripts (only test the moderate to abundant transcripts) and 3 other CpG probes for 4 probes in 2 brain regions) as some of top CpG probes are in the homeobox transcription factors, and the functional consequence could be reflected in the targets of the transcriptional factors.