A combined epigenome- and transcriptome-wide association study of the oral masticatory mucosa assigns CYP1B1 a central role for epithelial health in smokers

Background The oral mucosa has an important role in maintaining barrier integrity at the gateway to the gastrointestinal and respiratory tracts. Smoking is a strong environmental risk factor for the common oral inflammatory disease periodontitis and oral cancer. Cigarette smoke affects gene methylation and expression in various tissues. This is the first epigenome-wide association study (EWAS) that aimed to identify biologically active methylation marks of the oral masticatory mucosa that are associated with smoking. Results Ex vivo biopsies of 18 current smokers and 21 never smokers were analysed with the Infinium Methylation EPICBeadChip and combined with whole transcriptome RNA sequencing (RNA-Seq; 16 mio reads per sample) of the same samples. We analysed the associations of CpG methylation values with cigarette smoking and smoke pack year (SPY) levels in an analysis of covariance (ANCOVA). Nine CpGs were significantly associated with smoking status, with three CpGs mapping to the genetic region of CYP1B1 (cytochrome P450 family 1 subfamily B member 1; best p = 5.5 × 10−8) and two mapping to AHRR (aryl-hydrocarbon receptor repressor; best p = 5.9 × 10−9). In the SPY analysis, 61 CpG sites at 52 loci showed significant associations of the quantity of smoking with changes in methylation values. Here, the most significant association located to the gene CYP1B1, with p = 4.0 × 10−10. RNA-Seq data showed significantly increased expression of CYP1B1 in smokers compared to non-smokers (p = 2.2 × 10−14), together with 13 significantly upregulated transcripts. Six transcripts were significantly downregulated. No differential expression was observed for AHRR. In vitro studies with gingival fibroblasts showed that cigarette smoke extract directly upregulated the expression of CYP1B1. Conclusion This study validated the established role of CYP1B1 and AHRR in xenobiotic metabolism of tobacco smoke and highlights the importance of epigenetic regulation for these genes. For the first time, we give evidence of this role for the oral masticatory mucosa. Electronic supplementary material The online version of this article (10.1186/s13148-019-0697-y) contains supplementary material, which is available to authorized users.


Background
Smoking is considered one of the most important environmental factors causing premature death by promoting a variety of malignant and non-malignant diseases. It increases the relative risk of dying by up to 2.8 and shortens lifetime expectancies by 10 years on average [1]. The oral mucosa is the barrier tissue that is most directly exposed to cigarette smoke. Accordingly, smoking is a strong risk factor for a variety of oral diseases such as oral cancer [2] and the widespread common oral inflammatory disease periodontitis (PD) [3]. The deleterious effects of smoking on tissue integrity are driven by a variety of biological mechanisms. In the recent past, evidence grew that some of these mechanisms correlate with changes in DNA methylation patterns. DNA methylation plays a particular role in cell differentiation and the maintenance of cell specificity [4] and represents an important mechanism for cells to react on persistent external stimuli, allowing somatic cells to adjust gene expression to a particular environment in a long-term manner that can be passed on to daughter cells [5,6]. Smoking is currently the best studied environmental factor altering DNA-methylation patterns [7]. A plethora of studies exist that show significantly different methylation patterns in smokers compared to nonsmokers in different tissues [2,[8][9][10][11][12][13][14][15][16][17][18][19][20][21][22] and in the context of smoke-related diseases like cancer [23] or chronic obstructive pulmonary disease (COPD) [21]. The bestreplicated gene-specific association of differential methylation with smoking is attributed to the aryl-hydrocarbon receptor repressor AHRR across various tissues and cell types [8, 10, 12-14, 16, 17, 20, 22, 24, 25]. AHRR is a core regulator in the aryl hydrocarbon receptor (AhR) signalling cascade, modulating dioxin toxicity pathways involving Cytochrome P450, most likely in a cell-specific manner [26]. Most EWAS that assessed the role of CpG methylation in tobacco-related xenobiotic metabolism analysed leukocytes, but some also analysed airway epithelial cells [27] and buccal cells [12,21,22]. These cell types are located at the interface to the environment and form a mechanical and physiological barrier. Most notably, a large comprehensive EWAS analysed buccal cells of 790 healthy individuals and identified a set of 1501 CpGs that were differentially methylated in smokers [22]. Buccal swabs largely consist of non-keratinized squamous epithelial cells detached from the inner facial cheeks, and immune cells from the saliva. In the current study, we chose to analyse the effects of cigarette smoke on the healthy keratinized oral mucosa, which is restricted to the gingiva, the tongue and the hard palate. It is the tissue that forms the mechanical barrier between the oral cavity and the hard tissues (alveolar bone and skull). It consists of an upper layer of highly keratinized stratified squamous epithelium, the underlying keratinized connective tissue and invaded immune cells, which make it a strong mechanical and immunological barrier against injuries and pathogens. In addition, the gingiva is the only interface of the human body that is penetrated by hard tissues (the teeth), forming a direct connection between the environment and the bone. In this context, the masticatory mucosa is unique. Accordingly, the gingiva is the tissue which is affected by the common inflammatory disease periodontitis, for which smoking is the strongest risk factor. In contrast, buccal cells are not involved in the pathophysiology of periodontitis. The different functions of the lining mucosa of the inner cheeks, where buccal cells are derived, and of the masticatory mucosa are characterised by pronounced differences in cell turnover [28], cellular differentiation [29], and differences in expression patterns and permeability [29,30]. Because of these special features in cell derivation, function and clinical significance, the objective of our study was to assess the putative differences of methylation and transcription of the oral masticatory mucosa between healthy smokers and never smokers. Knowledge of these differences will provide information on the regulatory responses of this interface to direct exposure to xenobiotics of tobacco smoke before periodontitis or oral cancer becomes a clinical manifestation, and if defective, may have a relevant role in the disease aetiology.
To this end, we performed an EWAS with 39 healthy participants, all of them showing no signs of oral inflammation as indicated by a periodontal screening index (PSI) < 2. Taking into account that the masticatory mucosa itself also differs in cell type composition across different oral regions [31], samples were extracted from a clearly defined site. In this very homogeneous set of ex vivo biopsies, methylation patterns were investigated using the Infinium Methylation EPICBeadChip. In order to investigate potential effects of differentially methylated CpG sites on gene expression, we additionally performed RNA-sequencing (RNA-Seq) in the same samples. With this study, we provide a genome-wide methylation reference set and a RNA-Sequencing data set for the healthy masticatory mucosa, the clinically affected tissue in periodontitis. For the first time, smoking-informed reference datasets for methylation and expression patterns are combined for the oral mucosa. With these datasets, we were able to validate associations of CpGs with smoking levels shown in buccal swabs [22] and add to the knowledge of potential tissuespecific differential methylation in smokers. We showed shifts in transcriptional patterns of smokers, which partly paralleled the differential methylation patterns.

DNA methylation differences of the masticatory mucosa between smokers and non-smokers
The DNA of 18 healthy smokers and 21 healthy nonsmokers of the same age was analyzed with the Infinium DNA MethylationEPIC BeadChip (Table 1). Next, 802, 254 DNA methylation probes that passed the QC criteria were investigated for statistical significant differences in methylation levels between smokers and never smokers in an ANCOVA adjusting for age, sex and batch effects. A QQ plot revealed some global inflation of the test statistics as compared to the expected distribution, with an inflation factor of λ = 1.12 (Fig. 1a). After correction for multiple testing, we found nine CpGs to be significantly hypomethylated in smokers, with cg04066994 in the AHRR (aryl-hydrogen receptor repressor) gene showing the strongest association (p = 5.9 × 10 −10 , Δβ = − 0.08 in smokers; Fig. 1b, Table 2). The overall effect sizes of differentially methylated loci in smokers are illustrated in Fig. 2. Additionally, two significant CpGs mapped to the intronic region of CYP1B1 and another one to an intergenic region 24 kb upstream of CYP1B1, three mapped to the long noncoding RNA LINC00673 and one was located in an intergenic regioñ 6 kb upstream of CYP1A2 (cytochrome P450 family 1 subfamily A member 2). Observed effect sizes were highest in CYP1B1, with Δβ = − 0.18 for cg02162897 (p = 5.5 × 10 −8 ).
Subsequently, the methylation levels were analysed for association with smoke pack-years (SPY), a measure for smoking intensity that includes information of the quantity of smoking. After correction for multiple testing, 54 CpGs showed significant hypomethylation and seven CpGs showed significant hypermethylation with increasing smoking levels, measured as SPY (q < 0.05; Figs. 3 and 4 and Table 3). The most significant differential methylation was observed in the 3′ untranslated region (3′UTR) of the gene CYP1B1 (cytochrome P450 family 1 subfamily B member 1), with an average change in methylation of Δβ = − 0.046 per ten SPY at cg01584760 (p = 4.0 × 10 −10 ; q = 3.2 × 10 −4 ). Additionally, four other CpGs in the intronic region of CYP1B1 showed significant hypomethylation with increasing SPY (e.g. cg20408276, Δβ = − 0.108 per ten SPY, p = 1.1 × 10 −7 ). Another three significant probes (q < 0.05) flanked the long noncoding antisense RNA CYP1B1-AS1, which is partially overlapping with CYP1B1 (e.g. cg10090109, Δβ = − 0.015 per ten SPY, p = 2.1 × 10 −8 ). Among the other significant probes (q < 0.05), six showed similar or higher effect sizes compared to those observed at CYP1B1. These mapped to the genes PIP4K2A (phosphatidylinositol-5-phosphate 4-kinase type 2 alpha; cg02030592, Δβ = − 0.106 per ten SPY, p = 5.3 × 10 −7 ), DLX5 (distal-less homeobox 5; e.g. cg11891395, Δβ = − 0.097 per ten SPY, p = 2.5 × 10 −6 ), FABP4 (fatty acid binding protein 4; cg07234508, Δβ = − 0.073 per ten SPY, p = 1.1 × 10 − 6 ), FABP5P3 (fatty acid binding protein 5 pseudogene 3; cg16171205, Δβ = − 0.047 per ten SPY, p = 6.4 × 10 −7 ) and C10orf46 (cg27009448, Δβ = − 0.046 per ten SPY, p = 3.0 × 10 −6 ). To account for our small sample size, which was very susceptible to false negative findings, we relaxed the significance threshold to q < 0.15 and compared the results of the SPY analysis to previously published results from a well-powered EWAS on smoking in buccal cells [22] and to a set of CpGs that was reported to be associated with smoking by multiple independent EWAS performed in blood [32]. Out of 352 CpGs that passed the lowered significance threshold in our data, 33 mapped to 18 loci that were associated in buccal cells, three of which were also reported in blood, among them AHRR (Table 4, Additional file 1). Moreover, two CpGs were associated with q < 0.15 that mapped to loci associated in blood, but not in buccal cells.

Estimation of different cell type fractions in the masticatory mucosa
The masticatory mucosa consists of a layer of keratinized epithelium, an underlying layer of connective tissue and invaded immune cells. We calculated the fraction of different cell types with the EpiDISH algorithm. The number of fibroblasts and immune cells in our samples was moderate (median 0.15 and 0.16, respectively) compared to epithelial cells (median 0.69; Fig. 5a, Additional file 2). The amount of epithelial cells in smokers and never smokers was highly similar (median 0.69 and 0.68, respectively). Similarly, we observed no differences in the estimated fractions of fibroblasts and immune cells, which seemed to be very slightly reduced in smokers (median of 0.15 and 0.14, respectively, compared to 0.16 for both cell types in never smokers).

Differential gene expression between smokers and never smokers in the masticatory mucosa
We performed RNA-Seq of the same samples that were used in the EWAS to quantify the gene expression in the masticatory mucosa. Among the ten highest expressed genes in the solid oral mucosa, six were keratin genes (KRT1, KRT6A, KRT10, KRT14, KRT16 and KRT76 with median TPM values between 118,399 and 355,686; Additional file 3). The other genes of the top ten highest expressed genes were S100A8 and S100A9 of the S100 protein family, which has a prominent role in the regulation of inflammatory processes and immune response, COX1 (cytochrome c oxidase subunit 1) and EEF1A1 (elongation factor 1-alpha 1). After adjustment for multiple testing according to Benjamini & Hochberg, 20 genes showed differential expression between smokers and never smokers ( Table 5). The most significant association of differential expression and smoking status was found for CYP1B1 (p = 2.2 × 10 −14 , q = 4.9 × 10 −10 , log2 fold change = 2.9). A KEGG pathway analysis showed a significant upregulation of two pathways, the Metabolism of Xenobiotics by Cytochrome P450 Pathway, (KEGG entry: rn00980; p = 3.7 × 10 −4 ; q = 0.034; Table 6) and the Aminoacyl-tRNA biosynthesis pathway (KEGG entry: hsa00970; p = 1.9 × 10 −4 ; q = 0.034). A Reactome pathway analysis showed 24 pathways to be significantly upregulated in smokers (q < 0.05). Out of these, 14 pathways belonged to the pathway classes of "cell cycle function" and/or "DNA replication" and three to the pathway class "metabolism" with respect to electron transport, most notably the top significant pathway involving the Citric Acid Cycle and Respiratory Electron Transport (Reactome identifier: R-HSA-1428517; p = 1.4 × 10 −5 ; q = 0.009). In both approaches, no significantly downregulated pathways were found (Additional file 4).
We next investigated whether changes in methylation patterns have a measurable effect on gene expression in vivo. At a significance threshold of FDR = 0.05, CYP1B1 was the only gene that showed correlation of hypomethylation with gene expression. Considering the limited statistical power of our sample that made it susceptible for false negative associations, we relaxed the significant  Genetic region previously reported to be associated with smoking in buccals [22] thresholds and correlated all CpGs with q < 0.15 to those genes that were differentially expressed in smokers with a nominal p value of < 0.05. This analysis suggested differential expression of 15 genes with 25 CpGs at q < 0.15 (Additional file 5). To balance the resulting increased risk of type I errors, we subsequently included only those genes that mapped to at least two associated CpGs (q < 0.15) in our EWAS and had previously been reported to be associated with smoking in buccal swabs or blood [22,32]. In total, four genes fulfilled these criteria: Identifiers are given for the CpGs that were significantly associated with smoking after adjustment for multiple testing CYP1B1, SLC7A5 (solute carrier family 7 member 5), EDC3 (enhancer of mRNA decapping 3) and LAMA3 (laminin subunit alpha 3), the latter of which showed opposite effect sizes for differential expression and differential methylation (Table 7). Because the differentially methylated lncRNA CYP1B1-AS1 was not covered by our RNA-Seq reads, we analysed the expression by qRT-PCR in 69 individuals (n = 32 smokers, 37 never smokers). CYP1B1-AS1 was stronger expressed in smokers when compared to never smokers with p = 0.0061.
In vitro effects of cigarette smoke extract on the expression of hypomethylated genes in primary gingival fibroblasts CYP1B1 had been shown to be induced in response to cigarette smoke exposure in the adenocarcinomic human alveolar basal epithelial cell line A549 from cancerous lung tissue [23]. To validate the regulatory effect of tobacco smoke on CYP1B1 gene expression for gingival cells, we exposed gingival fibroblasts to cigarette smoke extract (CSE) in vitro. Exposure to CSE significantly increased the mRNA levels of CYP1B1 and of the corresponding antisense RNA CYP1B1-AS with p = 0.019 and p = 0.023, respectively (Fig. 6).

Association of smoking-induced changes in methylation levels in oral squamous cell carcinoma
To investigate a potential role of smoking-induced CYP1B1 and AHRR hypomethylation in the development of oral cancer, we compared the methylation data from our healthy never smokers with the TCGA data for 16 oral squamous cell carcinoma (OSCC) biopsies that were specified as "gum" or "palate", similar to the samples used in our EWAS. As the exact extraction site was not specified for OSCC samples, we first investigated cell type compositions using EpiDISH in this potentially heterogeneous tissue source. The cell composition varied extremely among OSCC samples, irrespective of smoking status (Fig. 5b, Additional file 2), with an overall higher amount of epithelial cells compared to our EWAS never smokers (median of 0.80 in OSCC compared to 0.69 in our samples). The OSCC data are based on the 450k array, and of a total of 202 CpGs at the genetic region of AHRR and CYP1B1 that were analysed in our EPIC array dataset, 36% (n = 72) were not analysed with the 450k array. Yet, a total of 130 CpGs at AHRR (n = 98 CpGs) and CYP1B1 (n = 32 CpGs) were shared between the OSCC samples and our data. In the OSCC data, at CYP1B1, 1 CpG was significantly hypomethylated and 16 CpG were significantly hypermethylated (corrected  , and Δβ = 0.45 for cg09799983 in CYP1B1 (p = 7.5 × 10 −12 ), which both were not associated with smoking in our samples (Additional file 6). In our EWAS, five CpGs within CYP1B1 showed significant hypermethylation in smokers, two of which were not analysed on the 450k chip for OSCC. The remaining three were significantly hypermethylated in OSCC, with the highest effect size at cg02162897 with Δβ = 0.197 (p = 4.7 × 10 −6 , Table 8). For AHRR, two CpGs were significantly associated in our EWAS but were not covered by the 450k chip used for the OSCC data. This is why we investigated the CpGs that flanked our associated CpGs in the EWAS, albeit these were not associated in our EWAS. In OSCC, one of these adjacent CpGs, cg04551776, showed a significant hypermethylation (Δβ = 0.15, p = 2.7 × 10 −5 ).

Discussion
In the current EWAS, we generated a reference dataset for combined differential gene methylation and expression patterns of the healthy masticatory mucosa of never smokers and smokers, accompanied by corresponding estimations of cell type fractions. We identified 63 significant differentially methylated CpGs after adjustment for multiple testing. Of these, the differential methylation at the gene CYP1B1 had the highest effect size, with 11% hypomethylation per ten SPY. CYP1B1 is a member of the cytochrome P450 gene family, and the gene product is involved in detoxification of pro-carcinogens in tobacco smoke [33]. CYP1B1 was previously reported to be hypomethylated with epigenome-wide significance in EWAS of buccal swabs [22] and small airway epithelial cells of healthy smokers [27]. The significantly associated genes PIP4K2A and DLX5 showed similar effect sizes as CYP1B1. Interestingly, the GWAS catalogue reports an association with variants within PIP4K2A for several cancer types. For DLX5, which was also found to be significantly associated with SPY in buccal cells [22], the GWAS catalogue reports variants associated with bone mineral densities and facial morphology. There is substantial evidence that smokers suffer from lower bone mineral densities [34,35], possibly due to increased bone resorption [36]. Smoke-related toxins were described as inducing bone loss mediated by the AhR-pathway involving CYP1A1, CYP1A2 and CYP1B1 in mice [37]. Interestingly, in the analysis of smokers against never smokers as a categorical value, we found nine CpGs to be significantly differentially methylated, with the top  Shown are results for all CpGs that map to genes previously reported in a large EWAS on buccal swabs [22] or in a systematic review on EWAS on blood [32] and which were associated with SPY in our EWAS with q < 0.  We consider the association of the non-coding RNA LINC00673 as another interesting finding. This lncRNA was significantly hypomethylated in smokers in our study at three individual CpGs. This gene was shown to be upregulated in tongue squamous cell carcinoma [38]. Differential methylation of AHRR, CYP1B1, and LINC00673 was shown before in buccal cells of smokers [22]. Increased expression of CYP1B1 by cigarette smoke was shown in vivo in buccal cells after smoke stimulation [39], in bronchial airway epithelial cells from smokers [40] and in vitro in A549 lung adenocarcinoma cells [23]. CYP1B1-AS1 was also shown to be upregulated in vitro by CSE exposure in A549 cells [23].
We could validate these findings in our differential expression analysis, showing a significant upregulation of CYP1B1, and additionally of CYP1A1 in the masticatory mucosa of smokers and a CSE-induced upregulation of CYP1B1 and CYP1B1-AS in vitro. Likewise, pathway analysis of differentially expressed genes revealed a significant upregulation of genes involved in the Metabolism of Xenobiotics by Cytochrome P450-KEGG pathway. Smoking is a widespread risk factor for periodontitis, a common inflammatory disease of the supporting tissues of the oral cavity, which is characterised by severe inflammation of the gingiva and irreversible bone degradation leading to tooth loss. With the data provided by this study, we add evidence to the significance of smoking on differential DNA methylation of cytochrome-P450-mediated AhR- pathway, a pathway that is involved in smoke-mediated bone degradation [37]. Apart from the upregulation of the Cytochrome P450related pathway, Reactome pathway analysis of differentially expressed genes found 58% (14 out of 24) altered pathways relating to the classes DNA replication or cell cycle function, suggesting a connection of smoking and tumorigenesis.
Our data did not suggest a global correlation of the results from the differential methylation analysis with the differential expression analysis. However, we note that the majority of effect sizes in smoking EWAS are not large, with 12.5% hypomethylation in smokers for AHRR in our study. Therefore, the identification of subtle changes in expression caused by the methylation differences detected in EWAS requires very large sample sizes, even if assumed that the relationship between methylation and expression is linear. When accounting for other regulatory mechanisms that disturb this hypothesised linearity, the detection of a direct relationship between changes in methylation patterns and changes in expression patterns becomes even more Fig. 6 Gene expression of CYP1B1 and CYP1B1-AS is significantly increased by cigarette smoke extract in vitro. Primary gingival fibroblast cell lines were cultured from three independent ex vivo biopsies (FB1-3) and treated for 6 h with cigarette smoke extract (CSE), labelled as "+". Control cell lines without CSE treatment are labelled as "−". qRT-PCR data were obtained from five treatments with CSE. Significant upregulation upon CSE treatment were observed for CYP1B1 (p adj = 0.019) and CYP1B1-AS (p adj = 0.023). P values are Bonferroni-adjusted for two independent tests Genes nominally significant in the differential expression (DE) analysis that correspond to those CpGs that were associated with smoke pack years (SPY) with q < 0.15 in the differential methylation (DM) analysis and that lie within genetic regions previously reported to be differentially methylated in buccals [22]. EDC3 was additionally reported to be differentially methylated in blood [32] a NCBI build GRCh37 b p values are from the analysis of covariance (ANCOVA) with adjustment by age, sex and batch. DE differential expression, Log2FC log2fold change, Chr chromosome, bp basepairs, Add. CpGs additional CpGs significant with q < 0.15 at this locus, DM differential methylation, SPY smoke pack years c Effect sizes for DE and DM are contradictory difficult. Accordingly, EWAS that identified a large correlation of significant methylation with transcription differences are rare, and this despite the fact that hypomethylation is functionally associated with increased gene expression. However, we are aware of a large study that addressed the effects of smoking on methylation and transcription in 730 blood samples, which identified 254 CpGs showing significant correlation of differential methylation with expression values of nearby transcripts [13]. We presume that our non-finding of a global correlation is probably due to the lack of statistical power of the comparably small sample size. When filtering our expression data for genes with multiple associated CpGs and taking into account knowledge on previously published differentially methylated CpGs, in addition to CYP1B1, we identified two further genes, SLC7A5 and EDC3, that showed consistent effect sizes for differential methylation and differential expression, and which were supported by smoking EWAS on buccal cells or blood. Variants within EDC3 have been reported in a large meta-analysis on caffeine intake, together with AHR, CYP1A1 and CYP1A2 [41]. EDC3 is located 25 kb upstream of CYP1A1 and5 5 kb upstream of CYP1A2, which also harboured an associated CpG 6 kb upstream in our EWAS. Taken together, our results point towards an important role of differential methylation of several genes of the CYP-gene cluster on chromosome 15 to regulate the AhR-pathway in the masticatory mucosa of smokers.
In the comparison of CYP1B1 methylation patterns of our never smokers data with the OSCC samples, all five significant CpGs were hypomethylated in our EWAS but in the OSCC data, CYP1B1 showed global hypermethylation (16 out of 17 significant CpGs). For AHRR, findings are less clear because none of our CpG was analysed on the 450k chip used for the OSCC samples. Likewise, in 51 OSCC samples from various extraction sites (e.g. buccal mucosa, floor of mouth, lip, palate), significant downregulation of CYP1B1 transcript levels was reported [42]. This contradicts the observation that CYP1B1 was overexpressed in many cancers [43]. As suggested earlier [22,23], our observations may indicate that demethylation and upregulation of CYP1B1 in the oral mucosal tissues of smokers is not causal for the development of cancer, but in contrast, part of the normal cellular detoxification. In oral cancer, these pathways might be misregulated, which in turn would lead to impaired regulation of xenobiotic metabolism. However, these results contradict findings from other cancer types, where CYP1B1 is overexpressed and serves as a tumour marker [44]. It needs to be emphasised that a major limitation of the OSCC dataset is that extraction sites were not specified. Cell compositions vary among different regions of the oral cavity [45], which is also suggested by the results of our cell type deconvolution analysis, and which could account for a stratification of results. As OSCC and other oral tissue samples from TCGA most likely have a heterogeneous cellular background originating from diverse tissue extraction sites, they might not be a perfect data source for a straightforward identification of differential methylation patterns. To investigate cell type-specific and disease-relevant methylation patterns in specific parts of the oral mucosa, further cell type deconvolution studies from different extraction sites and cell types of the oral mucosa can improve future analyses. The main limitation of our EWAS was the small sample size. Yet, our samples  Shown are the results for all CpGs within CYP1B1 and AHRR that were significantly associated with SPY levels or smoking status in our EWAS. We compared AHRR and CYP1B1 methylation levels in 16 oral squamous cell carcinoma (OSCC) samples from TCGA to our healthy never smoker samples. Differential methylation analysis was performed using an ANCOVA adjusting for sex and unknown variation. The CpGs cg01584760, cg20408276, cg06036945 and cg04066994 were not covered by the 450k analysis of the OSCC samples. For AHRR, we added the results for cg04551776 and cg25648203, which flanked our EWAS-CpGs with a distance of 1503 bp and 553 bp, respectively, albeit these CpGs were not associated in our EWAS a NCBI build GRCh37 b Top significant CpG in the differential methylation analysis with SPY c Top significant CpG in the differential methylation analysis with smoking status. Chr chromosome, bp basepairs, SPY smoke pack years, P adj Bonferroni-adjusted for 130 tests allowed the detection of the most significant tobacco smoke-related differentially methylated loci, which indicates the power of this homogeneously collected sample set.

Conclusion
The current study adds evidence to previous EWAS that reported significant hypomethylation of AHRR and CYP1B1 in buccal and airway epithelium of smokers. We could show increased expression of CYP1B1 and CYP1B1-AS in the masticatory mucosa of smokers as compared to never smokers and validated upregulation of these genes by cigarette smoke in vivo and in vitro. We conclude that hypomethylation of AHRR and CYP1B1 is an important regulatory mechanism of xenobiotic metabolism of the masticatory mucosa in response to tobacco smoke exposure.

Study sample
A total of 39 healthy smokers and never smokers, i.e. free of systemic disease, and all of European descent, were enrolled in this study, which was conducted in accordance with the principles of Good Clinical Practice and approved by the Independent Ethics Committee of each participating University Hospital. Written informed consent was obtained from all subjects according to the Declaration of Helsinki. All study participants completed a detailed questionnaire to provide general personal information (e.g. sex, age, geographical and family descent), information on the general and oral health and smoking habits (current, former, never smoker and amount of smoking in cigarette packs per day and duration of smoking in years). We calculated smoking pack year (SPY) values by multiplying the number of smoked cigarette packs per day by the number of years smoked. Fifty percent of the smokers had > 10 SPY, and 50% had between two and ten SPY. The mean age in smokers was 38 years and 35 years in never smokers. All individuals underwent periodontal examination prior to tissue extraction to exclude the presence of PD. Only individuals with a periodontal screening index (PSI) < 2 were included, indicating the absence of severe oral inflammation.

Collection of ex-vivo tissue samples from the masticatory oral mucosa
Solid tissue samples from the oral masticatory mucosa were collected at a defined site from the hard palate adjacent to the 4th and 5th tooth by the use of a tissue puncher with 3 mm diameter and a depth of approximately 1 mm (for an illustration of the area of tissue sampling, see Additional file 7: Figure S1). We consider this site also representative for the gingival mucosa (i.e. often inflamed in periodontitis) because the gingival masticatory mucosa and the masticatory mucosa of the hard palate share the same histological cellular structures (Stratum basale, Str. spinosum, Str. granulosum, Str. corneum) and therefore the same cell types (Additional file 7: Figure S2). Likewise, indicating similar states of cellular differentiation, tissue collection at the specified sampling area is a common approach during periodontal surgery when free gingival grafts are excised and transplanted to other mucosal areas in the oral cavity. To stabilise DNA and RNA, the biopsies were stored in the AllProtect reagent (Qiagen, Hilden, Germany) immediately after punching, stored at 4°C for 24 h, and subsequently transferred to − 20°C.

DNA and RNA extraction
The conserved tissue samples were manually broken up into small pieces with a scalpel and subsequently homogenised using the automated mixer mill MM 400 (Retsch, Haan, Germany) using frozen beads (

Differential DNA methylation analysis
For analysis and quality control, the R environment (Version 3.5.1) and the R package minfi (1.26.2) [46] were used, including the R package limma (Version 3.36.5) for the differential methylation analysis [47]. The Red/Green channel of the array were converted into one methylation signal without any normalisation. Using the function plotQC, we estimated sample-specific quality control (QC) and did not remove any sample. The QC criteria for probes were filtering of probes with median detection p values > 0.05, probes that lay within 5 basepairs (bp) of a single nucleotide polymorphism (SNP) with > 5% minor allele frequency, probes on the sex chromosomes and cross-reactive probes using the R package maxprobes (https://rdrr.io/github/markgene/ maxprobes) according to Pidsley et al. [48] and McCartney et al. [49]. In total, 802,254 probes complied with the QC criteria and were included in the analysis. Methylation status was estimated according to the fluorescent intensity ratio, as any value between β = 0 (unmethylated) and 1 (completely methylated), and log2transformed into M values, which are considered a more statistically valid estimator [50]. Corresponding effect sizes are, for a better biological understanding, given as Δβ, unless otherwise stated. After quality control, we performed a functional normalisation using the prepro-cessFunnorm function in minfi [51]. To identify CpGs correlating significantly with SPY, an analysis of covariance (ANCOVA) was performed for SPY as a continuous variable with adjustment by age as a continuous variable and sex as a categorical variable. We used SPY in this model to account for the cumulative risk exposure conferred by long-term smoking as suggested by Teschendorff et al. 2015 [22]. We additionally investigated differential methylation correlating with smoking status by performing an ANCOVA with the covariates age and sex. In both analyses, we additionally adjusted for technical variation, i.e. slide and position on slide, by using Combat and the R package BatchQc [52]. Correction for multiple testing was performed using the method by Benjamini & Hochberg [53]. CpGs were annotated to genes according to GRCh37/hg19 as provided in the MethylationEPIC BeadChip manifest.

Cell type deconvolution of EWAS samples
To identify the presence of non-epithelial cells in our samples of oral mucosa, we used the EpiDISH algorithm (Version 3.9) [54], applying the centEpiFibIC.m.rda reference dataset on our beta-values, which includes centroids for epithelial cells, fibroblasts and immune cells.

RNA-sequencing
To validate the effects of the observed methylation differences between smokers and never smokers on gene expression in the oral mucosa, we performed RNA-sequencing (RNA-Seq) of the same samples that were used in the EWAS. This ensured a homogeneous genetic and cellular background of the methylation and expression analysis. The total mRNA of 38 biopsy samples that passed QC for RNA-Seq (> 600 ng of total RNA, RIN > 7) was sequenced with a coverage of 16 million 50 bp single-end reads per sample. One RNA-sample was discarded due to low amounts of RNA. In brief, the library preparation was performed with 600 ng RNA using the TruSeq Stranded mRNA Kit (Illumina). RNA-Seq was performed on a HiSeq3000 (Illumina) at the Institute for Clinical Molecular Biology of the Christian-Albrechts-University Kiel, Germany.

Differential expression analysis
RNA-seq samples were mapped to the transcriptome (Ensembl 96, GRCh38) using the transcript abundance quantification method Salmon [55]. Salmon estimates abundances without aligning reads. The transcript level estimates were then summarised for the gene level and imported into R using the package tximport [56]. Subsequently, the imported data was analysed for differential expression between smokers and never smokers using DESeq2 [57]. Genes with ≤ 2 counts in all samples combined were not considered in the analyses. To identify genes that show substantial evidence for a correlation of differential expression with differential methylation in smokers, we applied the following steps: (1) filtering of all differentially expressed transcripts with unadjusted p < 0.05; (2) filtering of transcripts with more than one CpG associated with q < 0.15 with SPY; (3) filtering of transcripts that were previously reported in a large EWAS on buccal swabs [22] or in a systematic review on EWAS on blood [32].

Gene set enrichment analysis of differentially expressed genes
Gene sets of KEGG (n = 186) and Reactome (n = 674) were gathered from the Broad Institute Molecular Signatures Database (MSigDB [58,59]). Together with the log2FoldChanges of the differential expression analysis, we performed an enrichment analysis using the R package gage [60]. Gene sets with q value < 0.05 were considered as being significantly enriched.
Preparation of cigarette smoke extract and RNA extraction of cultured cells Cigarette smoke extraction system and cigarette smoke extract (CSE) were prepared as described in [61]. CSE was prepared from Roth-Händle no filters cigarettes (Reemtsma, Hamburg, Germany). Smoke of three cigarettes was filtered using a 0.2-μM sterile filter and extracted into 10 mL cell culture medium. The CSEcontaining medium was directly added to the cells (3 mL per well).

CSE stimulation of primary gingival fibroblasts
For CSE stimulation, solid ex vivo biopsies of the masticatory oral mucosa of the hard palate were taken from three additional participants with 3 mm tissue punchers. Biopsies were dissected enzymatically to separate the epithelial cells from the fibroblasts by overnight incubation in 5 mg/mL dispase II (Sigma Aldrich) diluted in cell growth medium (DMEM, 1% Pen/Strep) at 4°C. The primary gingival fibroblast cells (pGFs) were subsequently cultured in cell growth medium (DMEM, 1% Amphotericin B, 1% Pen/Strep, 1% non-essential amino acids). Prior to CSE stimulation, the pGFs (passage 3-4) were seeded in 6-well tissue culture plates (TPP Techno Plastic Products, Trasadingen, Switzerland) 24 h before stimulation (2.2 × 10 5 cells per well). The CSE was freshly prepared at the day of stimulation. CSE induction was performed for 6 h in three biological replicates (each with five technical replicates) of pGFs, with aliquots of the same CSE. Three millilitre medium without CSE was added to the control cell plates (five technical replicates for each of the three donors). After 6 h of CSE incubation, the cells were washed twice with PBS. Cell disruption and total RNA extraction was carried out using the RNeasy Mini Kit (Qiagen) according to the manufacturer's instructions.

cDNA synthesis and qRT-PCR
To test differential expression in primary gingival fibroblasts upon CSE-stimulation, we performed quantitative reverse transcription polymerase chain reaction (qRT-PCR) for CYP1B1 and CYP1B1-AS. qRT-PCR was performed with an amount of 500 ng of total RNA of CSE-stimulated cells using the High-Capacity cDNA Reverse Transcription Kit and oligo-(dT)-primers (Thermo Fisher Scientific, Waltham, USA) in accordance with the manufacturer's guidelines, using oligo-(dT)-and dNTPprimers instead of random primers. Control PCR reactions contained water instead of cDNA. qRT-PCR experiments were performed using the CFX Connect System (Bio-Rad, Hercules, USA) in combination with SYBR Select Master Mix (Thermo Fisher Scientific) according to the manufacturer's instructions. The gene expression levels were normalised to the mRNA expression of GAPDH, and relative expression was calculated using the mathematical model delta-delta ct (GraphPad Prism Software/R). Primers were manufactured (metabion GmbH, Planegg/Steinkirchen, Germany) with the following sequences: CYP1B1 (forward: GAC GAC CCC GAG TTC CGT GA; reverse: AGC CAG GGC ATC ACG TCC AC) and CYP1B1-AS (forward: AGC CCT GAA AGA TGA ACA GTG GT; reverse: GGC ATG CCC ATT TCT TCC ACA). Because the lncRNA CYP1B1-AS1 was not covered by mRNA-Sequencing, we analysed expression in our exvivo biopsies from the EWAS and 17 additional biopsies that were collected as described above with qRT-PCR. In total, CYP1B1-AS-expression of 32 smokers and 37 never smokers was analysed.
Cell type deconvolution and differential methylation analysis of OSCC samples To identify a putative role of genes that showed differential methylation in oral mucosa of smokers in cancer-development, we analysed data from The Cancer Genome Atlas (TCGA) [62]. We downloaded Illumina 450k methylation bead chip data from 16 samples of oral squamous cell carcinoma (OSCC) of keratinized oral mucosa (sites specified as "palate" or "gums"; i.e., Hard Palate, Upper Gum, Lower Gum and Gum NOS in the TCGA). To account for cell type-specific methylation, we performed the cell type deconvolution analyses on the 450k OSCC data as described above. We then performed an ANCOVA for all 130 CpGs that were annotated on the Illumina 450k array to the genetic regions of CYP1B1 and AHRR with sex as a covariate and adjusting for technical variation using the R package sva [63] to compare OSCC samples to the healthy never smokers from the EWAS.