Design and subjects
This study was embedded within the Rotterdam Study, a population-based cohort study in Rotterdam, the Netherlands. The design of the Rotterdam Study has been previously described in detail elsewhere [24]. Briefly, residents of Ommoord, a district in Rotterdam, aged 45 years and older were invited to participate. The Rotterdam Study includes three sub-cohorts. We used data from the baseline and second visit of the third cohort (RSIII-1 and RSIII-2) and the third visit from the second cohort (RSII-3).
Discovery panel
We used the data from RSIII-1 as the discovery panel: between February 2006 and December 2008, 3932 participants were examined. EWAS measurements were performed on a random subset of 731 subjects, of whom 725 had fasting blood samples available and were included in the current analyses.
Replication panel
We sought replication in a set of 767 participants from RSII-3 and RSIII-2. Between February 2011 and February 2012, 1887 participants from RSII-3 were examined. Between March 2012 and December 2013, approximately 3000 participants from RSIII-2 were examined. From the participants included in the replication study, 760 had fasting blood samples available and were included for analyses. None of the participants included in the replication study were included in the discovery cohort.
DNA methylation
DNA was extracted from whole blood (stored in EDTA tubes) by standardized salting out methods. Genome-wide methylation levels were measured using the Illumina Infinium HumanMethylation450 Beadchip (Illumina Inc., San Diego, CA) [25]. Briefly, samples (500 ng of DNA per sample) were bisulfite treated with use of the Zymo EZ-96 DNA methylation kit (Zymo Research, Irvine, CA, USA). Thereafter, the samples were hybridized to the arrays according to the protocol of the manufacturer. During quality control in RSIII-1, samples showing incomplete bisulfite treatment were excluded (n = 5) as were samples with a low detection rate (<99%) (n = 7) and gender swaps (n = 4). Probes with a detection p value >0.01 in >1% of the samples were filtered out [26, 27]. In RSII-3 and RSIII-2, outlying samples were checked using the first two principal components obtained using principal component analysis (PCA). None of the samples failed the quality control checks, indicating high quality data. Per individual probe, participants with methylation levels higher than three times the inter-quartiles range (IQR) were excluded. The methylation proportion of a CpG site was reported as a beta value ranging from 0 to 1 [28]. We used the genome coordinates provided by Illumina (GRCh37/hg19) to identify independent loci.
mRNA expression data
Total RNA was isolated (PAXgene Blood RNA kits—Qiagen) from whole blood (PAXgene Tubes—Becton Dickinson). All RNA samples were analyzed using the LabChip GX (Caliper) according to the manufacturer’s instructions, to ensure a constant high quality of RNA preparations. Samples with an RNA quality score >7 were amplified and labeled (Ambion TotalPrep RNA) and hybridized to the Illumina HumanHT-12 v4 Expression BeadChips, as described by the manufacturer’s protocol. RNA samples were processed at the Genetic Laboratory of Internal Medicine, Erasmus University Medical Center Rotterdam. The dataset including 881 expression samples from RSIII-1 is available at GEO (Gene Expression Omnibus) public repository under the accession GSE338828. Gene expression data was quantile normalized to the median distribution and log2 transformed. Probe and sample means were centered to zero. Genes were considered significantly expressed when detection p values calculated by Genome Studio were less than 0.05 in more than 10% of all discovery samples, which added to a total number of 21,238 probes. The eQTL-mapping pipeline was used to perform quality control [29].
Blood lipids
All participants had blood samples taken during the visits to the research center. From the blood samples, concentrations of triglycerides, HDL-C, and total cholesterol were measured using an automated enzymatic method. LDL-C was calculated using the Friedewald formula (total cholesterol − HDL-C − triglycerides/5) [20]. Participants with non-fasting blood samples were excluded from the current analyses (n = 6).
Covariates
Height and weight were measured during the center visit, and BMI was calculated (kg/m2). During home visit interviews, data on tobacco smoking, dietary intake, and medication use were collected. Information on smoking history was acquired from questionnaires and categorized as never, former, or current smoking. Nutritional data was collected using semi-quantitative FFQs, and information on the intake of different types of fatty acids were obtained. Fat intake was reported as total fat, PUFA, MUFA, and SFA. Information regarding the use of lipid-lowering medication was derived from both structured home interviews and linkage to pharmacy records.
Statistical analysis
Triglyceride level was log transformed using a natural log to obtain a normal distribution. The associations between lipid levels and DNA methylation beta values were examined using linear mixed-effect models.
Discovery
All models were adjusted for sex, age, smoking (current, former, or never), white blood cell proportions, and technical covariates (model 1). Gender, age, and smoking were added to the model as fixed effects. To correct for cell mixture distribution, leukocyte proportions (CD8+ T cells, CD4+ T cells, NK cells, B cells, monocytes, and granulocytes) were estimated using the Houseman method and were added to the model as fixed effects [23]. Technical covariates included array number and position on the array, and these were added to the models as random effects. To account for multiple testing, we used a Bonferroni-corrected p value of 1.08 × 10−7 (0.05/463,456 probes).
Replication
Identified probes were replicated using the same models as in the discovery cohort, further adjusted for the cohort. The adjustment for cell counts (monocytes, granulocytes, and lymphocytes) was based on lab measurements rather than Houseman estimates. For the replication, we applied a Bonferroni-corrected significance threshold of 6.25 × 10−3 (0.05/8 probes).
Meta-analyses
To combine results from the discovery and replication cohorts, fixed-effect meta-analyses were performed in METAL, using an inverse variance weighted method. In subsequent analyses, models were further adjusted for lipid-lowering medication use (model 2), waist circumference (model 3), and other lipids (model 4).
Additional analyses
Lipid levels may be affected by dietary fat intake, and this might be mediated through DNA methylation. Therefore, we explored associations between different types of fatty acid intake and DNA methylation at significantly replicated CpG sites. To account for potential measurement error and confounding by total energy intake, we used the residual method to adjust the fatty acid intake for total energy intake. Briefly, linear regression analyses were used with energy intake as the independent variable and fatty acid intake as the dependent variable to calculate the energy-adjusted intake of individual fatty acids for each subject. We regressed out the estimated leukocyte proportions, age, sex, array number, and position on array on the beta values of the CpG sites using linear mixed models. The associations between energy-adjusted fatty acid intakes and the residuals of the DNA methylation beta values were examined using linear regression models. All models were adjusted for sex, age, total energy intake (kcal/day), and smoking. Furthermore, to test if there was an interaction between lipid-lowering medication or fat intake and methylation at novel loci on lipids, interaction terms were added to the regression model, with one of the lipid traits as the outcome. In post hoc analyses, we adjusted our models on HDL-C for the PCSK9 genotype, as DHCR24 and PCSK9 are located near each other. In these analyses, the top SNP from GWAS (rs17111503) was added to our main model (model 1). As it is recommended to use M values due to heteroscedasticity in beta values, we performed additional analyses in which we replaced beta values with M values for comparison [22].
Methylation risk score
A methylation risk score was calculated based on CpG sites that were associated with the phenotypes, using both newly identified CpG sites in the current study as well as the ones previously reported for the corresponding trait [7, 8]. First, CpG sites were checked for correlation and CpGs were pruned giving priority to the most significant CpGs reported by the largest studies using a correlation coefficient cutoff of 0.6. Second, linear regression analyses were performed in the replication cohort, using the lipids as the dependent variable and the included CpG sites as independent variables. Models were adjusted for age, sex, blood cell counts, and technical covariates. The effect estimates were used to build the methylation risk score using data from the discovery panel. With the use of linear regression models, we calculated the lipid variance explained by the methylation risk score.
Functional analyses
Considering that DNA methylation can have an effect on the expression of genes, we explored the association between DNA methylation at the statistically significant CpG sites identified by EWAS and mRNA expression of the corresponding genes. The DNA methylation proportions and mRNA expression levels of these genes were checked for association using linear mixed models, which were adjusted for age, sex, smoking, white blood cell proportions, and technical covariates (array number and position on array).