Tobacco smoking is associated with methylation of genes related to coronary artery disease

Tobacco smoking, a risk factor for coronary artery disease (CAD), is known to modify DNA methylation. We hypothesized that tobacco smoking modifies methylation of the genes identified for CAD by genome-wide association study (GWAS). We selected genomic regions based on 150 single-nucleotide polymorphisms (SNPs) identified in the largest GWAS on CAD. We investigated the association between current smoking and the CpG sites within and near these CAD-related genes. Methylation was measured with the Illumina Human Methylation 450K array in whole blood of 724 Caucasian subjects from the Rotterdam Study, a Dutch population based cohort study. A total of 3669 CpG sites within 169 CAD-related genes were studied for association with current compared to never smoking. Fifteen CpG sites were significantly associated after correction for multiple testing (Bonferroni-corrected p value <1.4 × 10−5). These sites were located in the genes TERT, SARS, GNGT2, SMG6, SKI, TOM1L2, SIPA1, MRAS, CDKN1A, LRRC2, FES and RPH3A. In 12 sites, current smoking was associated with a 1.2 to 2.4 % lower methylation compared to never smoking; and in three sites, it was associated with a 1.2 to 1.8 % higher methylation. The effect estimates were lower in 10 of the 15 CpG sites when comparing current to former smoking. One CpG site, cg05603985 (SKI), was found to be associated with expression of nearby CAD-related gene PRKCZ. Our study suggests an effect of tobacco smoking on DNA methylation of CAD-related genes and thus provides novel insights in the pathways that link tobacco smoking to risk of CAD.


Background
In recent years, large genome-wide association studies (GWAS) have been conducted to identify genetic risk factors for a vast amount of diseases including coronary artery disease (CAD). These GWAS have successfully identified tens of single-nucleotide polymorphisms (SNPs) located in genes and their vicinity that might play a role in the pathophysiology of CAD. The CARDI-oGRAMplusC4D consortium is the largest CAD-GWAS consortium comprising 63,746 CAD cases and 130,681 controls [1]. This consortium has found 46 susceptibility loci significantly associated with the risk of CAD and 104 loci suggestive for CAD (FDR < 0.05).
One of the major risk factors for CAD is tobacco smoking which accounts for 10-15 % of the risk [2]. Recent studies have shown that smoking can interact with genetic variation to increase the risk of CAD [3,4]. One of the potential mechanisms for this interaction is DNA methylation. DNA methylation is the attachment of a methyl group to a nucleotide which occurs most often at the cytosine nucleotide of CpG dinucleotides. Methylation has varying functions at different locations in the human genome including influence on gene expression [5].
Since studies have established an important role for smoking in DNA methylation [6][7][8], we hypothesized that tobacco smoking changes DNA methylation of genes in/near loci identified for CAD, which in turn could alter gene expression of these genes. We therefore investigated the association between DNA methylation of genes in/near CAD-GWAS loci in whole blood and tobacco smoking in the Rotterdam Study. Furthermore, we investigated the association between methylation and expression of CAD-related genes nearby the identified differentially methylated sites.

Results
Characteristics of the participants under study are summarized in Table 1. Of the 724 subjects in the study, 195 were current smokers and 201 were never smokers. The mean age was 59.9 years. Among the current smokers, 50 % was male, among the never smoker 37 % (p = 0.008).
The 150 SNPs identified in the CAD GWAS were annotated to 85 genes with an in-gene variant and 84 cisexpression-quantitative trait loci (cis-eQTL) within 1 Mb of the identified SNPs as found in a large publically available blood cis-eQTL database (FDR < 0.05) (Additional file 1: Table S1) [9]. These genes had 3669 methylation sites measured on the array within and near the gene as provided by Illumina (Additional file 1: Table S2). After correction for multiple testing, 15 CpG sites were significantly associated with current smoking (p < 1.4 × 10 −5 ) ( Table 2). Current tobacco smoking was associated with a 1.2 to 2.4 % lower DNA methylation compared to never smoking in 12 of the CpG sites. In three CpG sites, current tobacco smoking was associated with a 1.2 to 1.8 % higher DNA methylation. The effect estimates of the associations were robust to further adjustment for total cholesterol, triglyceride levels, systolic blood pressure, daily alcohol intake and type 2 diabetes mellitus as potential confounders or mediators. In a sensitivity analysis in current smokers, two sites were significantly associated with cumulative exposure to tobacco smoking (Table 3).
When comparing current to former smokers, the effect estimates were lower and the differences were no longer significant in 10 of the 15 CpG sites (Table 2). This was confirmed in a sensitivity analysis in former smokers, which showed that cessation time was associated with differences in methylation level in three of the identified CpG sites ( Table 3).
The two top CpG sites, cg24908166 and cg12324353, were annotated to TERT ( Table 4). The two CpG sites were positively correlated with each other (r = 0.27, p < 0.001) and were located within 1 kb from each other in an intron of TERT. Two other CpG sites cg05603985 and cg09469355 were located within 1 kb from each other in the first exon and intron of SKI. Methylation of these two sites was positively correlated (r = 0.57, p < 0.001). CpG sites cg09397246 and cg26405020 were located within 1500 base pairs from the transcription start site of FES. These sites were within two base pairs from each other and had a positive correlation (r = 0.88, p < 0.001). The other significant hits were annotated to SARS, GNGT2, SMG6, TOM1L2, SIPA1, MRAS, CDKN1A, LRRC2 and RPH3A. The beta-value distributions for all identified CpG sites stratified by the three smoking categories can be found in Additional file 2: Figure S1. The associations between the 15 CpG sites and mRNA expression of nearby CAD genes are shown in Additional file 1: Table S3. Increased methylation of cg05603985 (SKI) was associated with increased expression of cis-eQTL gene PRKCZ (estimate = 0.035, p = 1.4 × 10 −4 ) (Additional file 1: Table S3 and Additional file 2: Figure S2). In the mediation analysis, we did not find a significant proportion of the association between tobacco smoking and expression of PRKCZ to be mediated by methylation of cg05603985 (proportion mediated = 0.24, p = 0.79). The other CpG sites were not associated with gene expression.

Discussion
The results of the current study suggest an association between tobacco smoking and DNA methylation of 12 genes suggested to be associated to CAD via GWAS. One of these CpG sites was found to be associated with expression of nearby CAD-related gene PRKCZ (protein kinase C, zeta).   We found that the differences in DNA methylation between current and former smokers was lower compared to the difference between current and never smokers in 10 of the 15 CpG sites. This suggests that the effect of tobacco smoking on DNA methylation of these CpG sites is relatively sustained after smoking cessation. To explore the time needed for these CpG sites to recover, we did an analysis on methylation and cessation time and found that cessation time was associated with DNA methylation in 3 of the 15 CpG sites. These results are in line with findings from a recent paper on the relation between tobacco smoking cessation and DNA methylation [6]. We further found that 2 of the 15 CpG sites were associated with cumulative tobacco smoking exposure in current smokers, which is concordant with results from previous studies [6,7].
The top two CpG sites, cg24908166 and cg12324353, are located within TERT (telomerase reverse transcriptase). High levels of TERT expression are found in macrophages of human atherosclerotic lesions [10]. Two other CpG sites, cg09397246 and cg26405020, were located near the transcription start site of FES (FES protooncogene, tyrosine kinase), which has been identified by GWAS to be associated with blood pressure and hypertension [11]. Smoking was further associated with methylation of cg09469355 and cg05603985 within SKI (avian sarcoma viral oncogene homolog) which is a repressor of TGF-beta activity. Decreased TGF-beta activity is associated with atherosclerosis development and plaque instability [12,13]. This could be a plausible pathway through which smoking can increase the occurrence of CAD since smoking has already been associated with decreased plasma levels of TGF-beta and decreased expression of TGF-beta in bronchial cell lines [14,15].
PRKCZ might be the functional gene for rs10797416, the most significant SNP of this locus reported in CAD-GWAS. Rs10797416 has a cis-eQTL effect on PRKCZ. PRKCZ is involved in proliferation, differentiation and secretion of almost all cell types including cardiac myocytes and has been associated with cardiomyopathy in chromosome 1p36 deletions [16,17]. We found that tobacco smoking is associated with decreased methylation of cg05603985, which is in turn associated with decreased expression of PRKCZ. We propose that demethylation of cg05603985 might be involved in development of CAD through PRKCZ expression, although the causality and mechanism through which this might occur are unclear. Cg05603985 is located in the binding site of transcriptional repressor CTCF [ENCODE:GSM1022677, UW, HCM] [18]. Demethylation of this CTCF binding site facilitates binding of CTCF, resulting in decreased expression of PRKCZ. This is due to the fact that CTCF prevents the binding of enhancers to the promoter of PRKCZ [19].
None of the other CpG sites was associated with the expression of nearby CAD-related genes. The lack of an association does not necessarily mean that methylation of these sites has no effect on expression but could result from an insufficient statistical power. This also applies for the non-significant mediation analysis. Furthermore, mRNA expression is tissue specific and an association may therefore not be found in whole blood. In addition, not all methylation sites in the human genome have an effect on mRNA expression. It could be that these methylation sites function through histone modification or DNA stability which could not be studied in the current work. Finally, it could be that these sites are merely biomarkers of tobacco smoking [5]. The availability of DNA methylation and mRNA expression data from the same samples is a major strength of this study. Therefore, we were able to conduct an indepth exploration of the association between smoking, DNA methylation and mRNA expression of CADrelated genes. Our study involved methylation and expression data from whole blood samples and not from vascular or lung tissue. This could be a limitation, since methylation and expression might be tissue specific. However, the relationship between smoking and DNA methylation has been confirmed in other tissues including lung tissue [20]. Furthermore, in a study of atherosclerotic tissue, three of our identified genes (PRKCZ, LCCR2 and SMG6) were found to be differentially methylated [21]. This provides further evidence that our findings may be influential in the atherosclerotic pathway. However, this needs further investigation in functional studies. A second limitation is the challenge of gene annotation in GWAS. GWAS locate risk variants for the phenotype under study, but the underlying causal gene might be difficult to designate. To minimize this problem, we limited our analysis to in-gene variants and variants with known cis-eQTL effects. Therefore, the CAD-related genes in our study are more plausible to be actual causal variants for CAD, thus making the results more convincing.

Conclusions
Our study provides examples of CAD-related genes of which differential methylation is associated with tobacco smoking. Whether or not these genes are in the causal pathway between smoking and coronary artery disease needs further elucidation as well as further efforts in large samples.

Study population
The study was conducted using data from the third cohort of the Rotterdam Study. The design of the Rotterdam Study has previously been described elsewhere [22]. Briefly, all inhabitants from the neighbourhood Ommoord in Rotterdam aged 45 years and over were invited to participate. During the centre visit, 3934 participants were examined between February 2006 and December 2008. We performed the analyses on a random subset of 747 Caucasian subjects from the centre visit. The study was approved by the medical ethics committee at Erasmus University Rotterdam, Rotterdam, the Netherlands, and all examined participants gave written informed consent.

Data collection
Data on tobacco smoking was collected during home interviews. Participants were asked about past and present cigarette, cigar and pipe smoking behaviour and were then categorized into current, former and never tobacco smokers. Furthermore, information on starting age of smoking, stopping age and time of discontinuation were used to determine packyears in current smokers (median 25.6 packyears, interquartile range 11.4 to 38.0) and cessation time in former smokers (median 21.5 years, interquartile range 10.6 to 32.6). Seven participants had missing smoking status and were therefore excluded from any analysis. During the centre visit, weight and height were measured in standing position wearing normal cloths. All participants had blood samples taken during the visit to quantify DNA methylation, messenger RNA (mRNA) expression levels and other blood measurements.

DNA methylation data
DNA was extracted from whole peripheral blood (stored in EDTA tubes) by standardized salting out methods. Genome-wide DNA methylation levels were measured using Illumina Human Methylation 450K array [23]. In short, samples (500 ng of DNA per sample) were first bisulfite treated using the Zymo EZ-96 DNA-methylation kit (Zymo Research, Irvine, CA, USA). Next, they were hybridized to the arrays according to the manufacturer's protocol. The methylation percentage of a CpG site was reported as a beta-value ranging between 0 (no methylation) and 1 (full methylation).
Quality control of the samples was done with Genome Studio. A total number of 16 samples were removed: seven had a sample call rate below 99 %; five had incomplete bisulfite conversion and four had gender swaps. Quality control of the probes was done based on the detection p-value calculated with Genome Studio. Probes with a detection p-value of more than 0.01 in more than 1 % of the samples were excluded. This resulted in a total set of 474,528 probes which were normalized using the Dasen option of the WateRmelon R-package [24].

mRNA expression data
Whole-blood was collected (PAXGene Tubes -Becton Dickinson), and total RNA was isolated (PAXGene Blood RNA kits -Qiagen). To ensure a constant high quality of the RNA preparations, all RNA samples were analysed using the Labchip GX (Calliper) according to the manufacturer's instructions. Samples with an RNA quality score more than 7 were amplified and labelled (Ambion TotalPrep RNA) and hybridized to the Illumina HumanHT12v4 Expression Beadchips as described by the manufacturer's protocol. Processing of the Rotterdam Study RNA samples was performed at the Genetic Laboratory of Internal Medicine, Erasmus University Medical Centre Rotterdam. The RS-III expression dataset is available at GEO (Gene Expression Omnibus) public repository under the accession GSE33828: 881 samples are available for analysis.
Illumina gene expression data was quantile normalized to the median distribution and subsequently log2 transformed. The probe and sample means were centred to zero. Genes were declared significantly expressed when the detection p-values calculated by Genome Studio were less than 0.05 in more than 10 % of all discovery samples, which added to a total number of 21,238 probes. Quality control was done using the eQTL mapping pipeline.

Statistical analysis
The characteristics of the study population were compared between current and never smokers using IBM SPSS Statistics version 21.0.0.1 (IBM Corp.). The p-values were calculated using independent sample T-test for continuous variables and chi-square test for dichotomous variables.
Of the 150 SNPs discovered by CARDIoGRAM-plusC4D, 96 were located within a gene [1]. In addition, 58 SNPs had known effect on expression of a gene within 1 Mb as found in a large publically available blood cis-eQTL database (FDR < 0.05) [9]. We annotated these SNPs to 85 genes with an in-gene variant and 84 cis-eQTL genes (Additional file 1: Table S1). The methylation probes within and near these CAD-related genes as provided by Illumina were included in the analysis. We excluded probes from the Infinium HD methylation SNP list with a minor allele frequency above 1 % as provided by Illumina, since variations in these SNPs can cause bias in the methylation measurement [25]. We further excluded known cross-reactive probes, since they can introduce bias in the results [26]. A complete list of the probes considered for analysis including reasons for exclusion can be found in Additional file 1: Table S2. The remaining 3669 methylation probes were checked for association with tobacco smoking using a linear mixed model with the LME4 package in R version 3.1.0 with Dasen normalized beta-values of the CpG sites as outcome measure [27]. We first compared current to never smokers and then performed a sensitivity analysis on the identified CpG sites comparing current to former smokers. Covariates were selected based on known association with DNA methylation and different distributions between current and never smokers in our samples. The selected covariates with fixed effects were age, sex and BMI [28][29][30][31]. Houseman estimated white blood cell proportions were used as fixed effects to correct for cell mixture distribution [32]. Array number and position on array were added in the model as covariates with random effects to correct for batch effects. We corrected for multiple testing using a robust Bonferroni-corrected p-value of 1.4 × 10 −5 as the threshold for significance (0.05/3669 probes).
To decrease the possibility of confounding in our association, we further adjusted the model in a second analysis for other possible confounders and mediators. This analysis included total cholesterol, triglyceride levels, systolic blood pressure, daily alcohol intake and type 2 diabetes mellitus.
To assess the effect of cumulative exposure to tobacco smoking on DNA methylation, we performed a sensitivity analysis on the identified CpG sites in current smokers with packyears per 10 year as exposure. To assess the effect of discontinuation of tobacco smoking exposure, we performed a sensitivity analysis on the identified CpG sites in former smokers using cessation time per 10 years as exposure. The robust Bonferroni-corrected p-value for both analyses was 3.3 × 10 −3 (0.05/15).

Functional analysis
Since DNA methylation may have an effect on gene expression, we tested the association between DNA methylation and mRNA expression of nearby CAD-related genes. In the 724 individuals under study, 16 genes out of 26 candidates were found to be expressed in blood. First, we regressed out the Houseman-estimated white blood cell proportions, the erythrocytes and platelet cell counts, fasting state, RNA quality score, plate number, age and sex on the mRNA expression levels using a linear mixed model. We then regressed out the Housemanestimated white blood cell proportions, age, sex, array number and position on array on the Dasen normalized beta-values of the CpG sites using a linear mixed model. The residuals of the mRNA expression levels and the residuals of the Dasen normalized beta-values of the CpG sites were checked for association using a linear regression model. We analysed the associations between the 15 methylation sites and 16 nearby CAD genes. Additional file 1: Table S3 indicates the 35 CpG-gene combinations that were tested. The robust Bonferroni-corrected p-value threshold for a significant association was 1.4 × 10 −3 (0.05/35 tests). Significant associations were reviewed in a mediation analysis with current versus never smoking as exposure, Dasen normalized beta-values of the CpG site as potential mediator and mRNA expression levels as outcome using the mediation package in R [33]. The method applied in the mediation package calculates the proportion of the association between tobacco smoking and expression that is mediated by methylation. This is done by comparing the effect estimates of tobacco smoking on expression in subjects with different levels of methylation.