- Open Access
A combined epigenome- and transcriptome-wide association study of the oral masticatory mucosa assigns CYP1B1 a central role for epithelial health in smokers
Clinical Epigeneticsvolume 11, Article number: 105 (2019)
The oral mucosa has an important role in maintaining barrier integrity at the gateway to the gastrointestinal and respiratory tracts. Smoking is a strong environmental risk factor for the common oral inflammatory disease periodontitis and oral cancer. Cigarette smoke affects gene methylation and expression in various tissues. This is the first epigenome-wide association study (EWAS) that aimed to identify biologically active methylation marks of the oral masticatory mucosa that are associated with smoking.
Ex vivo biopsies of 18 current smokers and 21 never smokers were analysed with the Infinium Methylation EPICBeadChip and combined with whole transcriptome RNA sequencing (RNA-Seq; 16 mio reads per sample) of the same samples. We analysed the associations of CpG methylation values with cigarette smoking and smoke pack year (SPY) levels in an analysis of covariance (ANCOVA). Nine CpGs were significantly associated with smoking status, with three CpGs mapping to the genetic region of CYP1B1 (cytochrome P450 family 1 subfamily B member 1; best p = 5.5 × 10−8) and two mapping to AHRR (aryl-hydrocarbon receptor repressor; best p = 5.9 × 10−9). In the SPY analysis, 61 CpG sites at 52 loci showed significant associations of the quantity of smoking with changes in methylation values. Here, the most significant association located to the gene CYP1B1, with p = 4.0 × 10−10. RNA-Seq data showed significantly increased expression of CYP1B1 in smokers compared to non-smokers (p = 2.2 × 10−14), together with 13 significantly upregulated transcripts. Six transcripts were significantly downregulated. No differential expression was observed for AHRR. In vitro studies with gingival fibroblasts showed that cigarette smoke extract directly upregulated the expression of CYP1B1.
This study validated the established role of CYP1B1 and AHRR in xenobiotic metabolism of tobacco smoke and highlights the importance of epigenetic regulation for these genes. For the first time, we give evidence of this role for the oral masticatory mucosa.
Smoking is considered one of the most important environmental factors causing premature death by promoting a variety of malignant and non-malignant diseases. It increases the relative risk of dying by up to 2.8 and shortens lifetime expectancies by 10 years on average . The oral mucosa is the barrier tissue that is most directly exposed to cigarette smoke. Accordingly, smoking is a strong risk factor for a variety of oral diseases such as oral cancer  and the widespread common oral inflammatory disease periodontitis (PD) . The deleterious effects of smoking on tissue integrity are driven by a variety of biological mechanisms. In the recent past, evidence grew that some of these mechanisms correlate with changes in DNA methylation patterns. DNA methylation plays a particular role in cell differentiation and the maintenance of cell specificity  and represents an important mechanism for cells to react on persistent external stimuli, allowing somatic cells to adjust gene expression to a particular environment in a long-term manner that can be passed on to daughter cells [5, 6]. Smoking is currently the best studied environmental factor altering DNA-methylation patterns . A plethora of studies exist that show significantly different methylation patterns in smokers compared to non-smokers in different tissues [2, 8,9,10,11,12,13,14,15,16,17,18,19,20,21,22] and in the context of smoke-related diseases like cancer  or chronic obstructive pulmonary disease (COPD) . The best-replicated gene-specific association of differential methylation with smoking is attributed to the aryl-hydrocarbon receptor repressor AHRR across various tissues and cell types [8, 10, 12,13,14, 16, 17, 20, 22, 24, 25]. AHRR is a core regulator in the aryl hydrocarbon receptor (AhR) signalling cascade, modulating dioxin toxicity pathways involving Cytochrome P450, most likely in a cell-specific manner . Most EWAS that assessed the role of CpG methylation in tobacco-related xenobiotic metabolism analysed leukocytes, but some also analysed airway epithelial cells  and buccal cells [12, 21, 22]. These cell types are located at the interface to the environment and form a mechanical and physiological barrier. Most notably, a large comprehensive EWAS analysed buccal cells of 790 healthy individuals and identified a set of 1501 CpGs that were differentially methylated in smokers . Buccal swabs largely consist of non-keratinized squamous epithelial cells detached from the inner facial cheeks, and immune cells from the saliva. In the current study, we chose to analyse the effects of cigarette smoke on the healthy keratinized oral mucosa, which is restricted to the gingiva, the tongue and the hard palate. It is the tissue that forms the mechanical barrier between the oral cavity and the hard tissues (alveolar bone and skull). It consists of an upper layer of highly keratinized stratified squamous epithelium, the underlying keratinized connective tissue and invaded immune cells, which make it a strong mechanical and immunological barrier against injuries and pathogens. In addition, the gingiva is the only interface of the human body that is penetrated by hard tissues (the teeth), forming a direct connection between the environment and the bone. In this context, the masticatory mucosa is unique. Accordingly, the gingiva is the tissue which is affected by the common inflammatory disease periodontitis, for which smoking is the strongest risk factor. In contrast, buccal cells are not involved in the pathophysiology of periodontitis. The different functions of the lining mucosa of the inner cheeks, where buccal cells are derived, and of the masticatory mucosa are characterised by pronounced differences in cell turnover , cellular differentiation , and differences in expression patterns and permeability [29, 30]. Because of these special features in cell derivation, function and clinical significance, the objective of our study was to assess the putative differences of methylation and transcription of the oral masticatory mucosa between healthy smokers and never smokers. Knowledge of these differences will provide information on the regulatory responses of this interface to direct exposure to xenobiotics of tobacco smoke before periodontitis or oral cancer becomes a clinical manifestation, and if defective, may have a relevant role in the disease aetiology.
To this end, we performed an EWAS with 39 healthy participants, all of them showing no signs of oral inflammation as indicated by a periodontal screening index (PSI) < 2. Taking into account that the masticatory mucosa itself also differs in cell type composition across different oral regions , samples were extracted from a clearly defined site. In this very homogeneous set of ex vivo biopsies, methylation patterns were investigated using the Infinium Methylation EPICBeadChip. In order to investigate potential effects of differentially methylated CpG sites on gene expression, we additionally performed RNA-sequencing (RNA-Seq) in the same samples. With this study, we provide a genome-wide methylation reference set and a RNA-Sequencing data set for the healthy masticatory mucosa, the clinically affected tissue in periodontitis. For the first time, smoking-informed reference datasets for methylation and expression patterns are combined for the oral mucosa. With these datasets, we were able to validate associations of CpGs with smoking levels shown in buccal swabs  and add to the knowledge of potential tissue-specific differential methylation in smokers. We showed shifts in transcriptional patterns of smokers, which partly paralleled the differential methylation patterns.
DNA methylation differences of the masticatory mucosa between smokers and non-smokers
The DNA of 18 healthy smokers and 21 healthy non-smokers of the same age was analyzed with the Infinium DNA MethylationEPIC BeadChip (Table 1). Next, 802,254 DNA methylation probes that passed the QC criteria were investigated for statistical significant differences in methylation levels between smokers and never smokers in an ANCOVA adjusting for age, sex and batch effects. A QQ plot revealed some global inflation of the test statistics as compared to the expected distribution, with an inflation factor of λ = 1.12 (Fig. 1a). After correction for multiple testing, we found nine CpGs to be significantly hypomethylated in smokers, with cg04066994 in the AHRR (aryl-hydrogen receptor repressor) gene showing the strongest association (p = 5.9 × 10−10, Δβ = − 0.08 in smokers; Fig. 1b, Table 2). The overall effect sizes of differentially methylated loci in smokers are illustrated in Fig. 2. Additionally, two significant CpGs mapped to the intronic region of CYP1B1 and another one to an intergenic region 24 kb upstream of CYP1B1, three mapped to the long noncoding RNA LINC00673 and one was located in an intergenic region ~ 6 kb upstream of CYP1A2 (cytochrome P450 family 1 subfamily A member 2). Observed effect sizes were highest in CYP1B1, with Δβ = − 0.18 for cg02162897 (p = 5.5 × 10−8).
Subsequently, the methylation levels were analysed for association with smoke pack-years (SPY), a measure for smoking intensity that includes information of the quantity of smoking. After correction for multiple testing, 54 CpGs showed significant hypomethylation and seven CpGs showed significant hypermethylation with increasing smoking levels, measured as SPY (q < 0.05; Figs. 3 and 4 and Table 3). The most significant differential methylation was observed in the 3′ untranslated region (3′UTR) of the gene CYP1B1 (cytochrome P450 family 1 subfamily B member 1), with an average change in methylation of Δβ = − 0.046 per ten SPY at cg01584760 (p = 4.0 × 10−10; q = 3.2 × 10−4). Additionally, four other CpGs in the intronic region of CYP1B1 showed significant hypomethylation with increasing SPY (e.g. cg20408276, Δβ = − 0.108 per ten SPY, p = 1.1 × 10−7). Another three significant probes (q < 0.05) flanked the long noncoding antisense RNA CYP1B1-AS1, which is partially overlapping with CYP1B1 (e.g. cg10090109, Δβ = − 0.015 per ten SPY, p = 2.1 × 10−8). Among the other significant probes (q < 0.05), six showed similar or higher effect sizes compared to those observed at CYP1B1. These mapped to the genes PIP4K2A (phosphatidylinositol-5-phosphate 4-kinase type 2 alpha; cg02030592, Δβ = − 0.106 per ten SPY, p = 5.3 × 10−7), DLX5 (distal-less homeobox 5; e.g. cg11891395, Δβ = − 0.097 per ten SPY, p = 2.5 × 10−6), FABP4 (fatty acid binding protein 4; cg07234508, Δβ = − 0.073 per ten SPY, p = 1.1 × 10− 6), FABP5P3 (fatty acid binding protein 5 pseudogene 3; cg16171205, Δβ = − 0.047 per ten SPY, p = 6.4 × 10−7) and C10orf46 (cg27009448, Δβ = − 0.046 per ten SPY, p = 3.0 × 10−6). To account for our small sample size, which was very susceptible to false negative findings, we relaxed the significance threshold to q < 0.15 and compared the results of the SPY analysis to previously published results from a well-powered EWAS on smoking in buccal cells  and to a set of CpGs that was reported to be associated with smoking by multiple independent EWAS performed in blood . Out of 352 CpGs that passed the lowered significance threshold in our data, 33 mapped to 18 loci that were associated in buccal cells, three of which were also reported in blood, among them AHRR (Table 4, Additional file 1). Moreover, two CpGs were associated with q < 0.15 that mapped to loci associated in blood, but not in buccal cells.
Estimation of different cell type fractions in the masticatory mucosa
The masticatory mucosa consists of a layer of keratinized epithelium, an underlying layer of connective tissue and invaded immune cells. We calculated the fraction of different cell types with the EpiDISH algorithm. The number of fibroblasts and immune cells in our samples was moderate (median 0.15 and 0.16, respectively) compared to epithelial cells (median 0.69; Fig. 5a, Additional file 2). The amount of epithelial cells in smokers and never smokers was highly similar (median 0.69 and 0.68, respectively). Similarly, we observed no differences in the estimated fractions of fibroblasts and immune cells, which seemed to be very slightly reduced in smokers (median of 0.15 and 0.14, respectively, compared to 0.16 for both cell types in never smokers).
Differential gene expression between smokers and never smokers in the masticatory mucosa
We performed RNA-Seq of the same samples that were used in the EWAS to quantify the gene expression in the masticatory mucosa. Among the ten highest expressed genes in the solid oral mucosa, six were keratin genes (KRT1, KRT6A, KRT10, KRT14, KRT16 and KRT76 with median TPM values between 118,399 and 355,686; Additional file 3). The other genes of the top ten highest expressed genes were S100A8 and S100A9 of the S100 protein family, which has a prominent role in the regulation of inflammatory processes and immune response, COX1 (cytochrome c oxidase subunit 1) and EEF1A1 (elongation factor 1-alpha 1).
After adjustment for multiple testing according to Benjamini & Hochberg, 20 genes showed differential expression between smokers and never smokers (Table 5). The most significant association of differential expression and smoking status was found for CYP1B1 (p = 2.2 × 10−14, q = 4.9 × 10−10, log2 fold change = 2.9). A KEGG pathway analysis showed a significant upregulation of two pathways, the Metabolism of Xenobiotics by Cytochrome P450 Pathway, (KEGG entry: rn00980; p = 3.7 × 10−4; q = 0.034; Table 6) and the Aminoacyl-tRNA biosynthesis pathway (KEGG entry: hsa00970; p = 1.9 × 10−4; q = 0.034). A Reactome pathway analysis showed 24 pathways to be significantly upregulated in smokers (q < 0.05). Out of these, 14 pathways belonged to the pathway classes of “cell cycle function” and/or “DNA replication” and three to the pathway class “metabolism” with respect to electron transport, most notably the top significant pathway involving the Citric Acid Cycle and Respiratory Electron Transport (Reactome identifier: R-HSA-1428517; p = 1.4 × 10−5; q = 0.009). In both approaches, no significantly downregulated pathways were found (Additional file 4).
We next investigated whether changes in methylation patterns have a measurable effect on gene expression in vivo. At a significance threshold of FDR = 0.05, CYP1B1 was the only gene that showed correlation of hypomethylation with gene expression. Considering the limited statistical power of our sample that made it susceptible for false negative associations, we relaxed the significant thresholds and correlated all CpGs with q < 0.15 to those genes that were differentially expressed in smokers with a nominal p value of < 0.05. This analysis suggested differential expression of 15 genes with 25 CpGs at q < 0.15 (Additional file 5). To balance the resulting increased risk of type I errors, we subsequently included only those genes that mapped to at least two associated CpGs (q < 0.15) in our EWAS and had previously been reported to be associated with smoking in buccal swabs or blood [22, 32]. In total, four genes fulfilled these criteria: CYP1B1, SLC7A5 (solute carrier family 7 member 5), EDC3 (enhancer of mRNA decapping 3) and LAMA3 (laminin subunit alpha 3), the latter of which showed opposite effect sizes for differential expression and differential methylation (Table 7). Because the differentially methylated lncRNA CYP1B1-AS1 was not covered by our RNA-Seq reads, we analysed the expression by qRT-PCR in 69 individuals (n = 32 smokers, 37 never smokers). CYP1B1-AS1 was stronger expressed in smokers when compared to never smokers with p = 0.0061.
In vitro effects of cigarette smoke extract on the expression of hypomethylated genes in primary gingival fibroblasts
CYP1B1 had been shown to be induced in response to cigarette smoke exposure in the adenocarcinomic human alveolar basal epithelial cell line A549 from cancerous lung tissue . To validate the regulatory effect of tobacco smoke on CYP1B1 gene expression for gingival cells, we exposed gingival fibroblasts to cigarette smoke extract (CSE) in vitro. Exposure to CSE significantly increased the mRNA levels of CYP1B1 and of the corresponding antisense RNA CYP1B1-AS with p = 0.019 and p = 0.023, respectively (Fig. 6).
Association of smoking-induced changes in methylation levels in oral squamous cell carcinoma
To investigate a potential role of smoking-induced CYP1B1 and AHRR hypomethylation in the development of oral cancer, we compared the methylation data from our healthy never smokers with the TCGA data for 16 oral squamous cell carcinoma (OSCC) biopsies that were specified as “gum” or “palate”, similar to the samples used in our EWAS. As the exact extraction site was not specified for OSCC samples, we first investigated cell type compositions using EpiDISH in this potentially heterogeneous tissue source. The cell composition varied extremely among OSCC samples, irrespective of smoking status (Fig. 5b, Additional file 2), with an overall higher amount of epithelial cells compared to our EWAS never smokers (median of 0.80 in OSCC compared to 0.69 in our samples). The OSCC data are based on the 450k array, and of a total of 202 CpGs at the genetic region of AHRR and CYP1B1 that were analysed in our EPIC array dataset, 36% (n = 72) were not analysed with the 450k array. Yet, a total of 130 CpGs at AHRR (n = 98 CpGs) and CYP1B1 (n = 32 CpGs) were shared between the OSCC samples and our data. In the OSCC data, at CYP1B1, 1 CpG was significantly hypomethylated and 16 CpG were significantly hypermethylated (corrected for multiple testing, n = 130 tests). At AHRR, 30 CpGs were significantly hypomethylated and 22 CpGs were significantly hypermethylated. The effect sizes in OSCC were high, with Δβ = − 0.48 for cg12202185 in AHRR (p = 2.8 × 10−7), and Δβ = 0.45 for cg09799983 in CYP1B1 (p = 7.5 × 10−12), which both were not associated with smoking in our samples (Additional file 6). In our EWAS, five CpGs within CYP1B1 showed significant hypermethylation in smokers, two of which were not analysed on the 450k chip for OSCC. The remaining three were significantly hypermethylated in OSCC, with the highest effect size at cg02162897 with Δβ = 0.197 (p = 4.7 × 10−6, Table 8). For AHRR, two CpGs were significantly associated in our EWAS but were not covered by the 450k chip used for the OSCC data. This is why we investigated the CpGs that flanked our associated CpGs in the EWAS, albeit these were not associated in our EWAS. In OSCC, one of these adjacent CpGs, cg04551776, showed a significant hypermethylation (Δβ = 0.15, p = 2.7 × 10−5).
In the current EWAS, we generated a reference dataset for combined differential gene methylation and expression patterns of the healthy masticatory mucosa of never smokers and smokers, accompanied by corresponding estimations of cell type fractions. We identified 63 significant differentially methylated CpGs after adjustment for multiple testing. Of these, the differential methylation at the gene CYP1B1 had the highest effect size, with 11% hypomethylation per ten SPY. CYP1B1 is a member of the cytochrome P450 gene family, and the gene product is involved in detoxification of pro-carcinogens in tobacco smoke . CYP1B1 was previously reported to be hypomethylated with epigenome-wide significance in EWAS of buccal swabs  and small airway epithelial cells of healthy smokers . The significantly associated genes PIP4K2A and DLX5 showed similar effect sizes as CYP1B1. Interestingly, the GWAS catalogue reports an association with variants within PIP4K2A for several cancer types. For DLX5, which was also found to be significantly associated with SPY in buccal cells , the GWAS catalogue reports variants associated with bone mineral densities and facial morphology. There is substantial evidence that smokers suffer from lower bone mineral densities [34, 35], possibly due to increased bone resorption . Smoke-related toxins were described as inducing bone loss mediated by the AhR-pathway involving CYP1A1, CYP1A2 and CYP1B1 in mice . Interestingly, in the analysis of smokers against never smokers as a categorical value, we found nine CpGs to be significantly differentially methylated, with the top two significant CpGs mapping to AHRR and CYP1B1, and another CpG 6 kb downstream of CYP1A2.
We consider the association of the non-coding RNA LINC00673 as another interesting finding. This lncRNA was significantly hypomethylated in smokers in our study at three individual CpGs. This gene was shown to be upregulated in tongue squamous cell carcinoma .
Differential methylation of AHRR, CYP1B1, and LINC00673 was shown before in buccal cells of smokers . Increased expression of CYP1B1 by cigarette smoke was shown in vivo in buccal cells after smoke stimulation , in bronchial airway epithelial cells from smokers  and in vitro in A549 lung adenocarcinoma cells . CYP1B1-AS1 was also shown to be upregulated in vitro by CSE exposure in A549 cells . We could validate these findings in our differential expression analysis, showing a significant upregulation of CYP1B1, and additionally of CYP1A1 in the masticatory mucosa of smokers and a CSE-induced upregulation of CYP1B1 and CYP1B1-AS in vitro. Likewise, pathway analysis of differentially expressed genes revealed a significant upregulation of genes involved in the Metabolism of Xenobiotics by Cytochrome P450-KEGG pathway. Smoking is a widespread risk factor for periodontitis, a common inflammatory disease of the supporting tissues of the oral cavity, which is characterised by severe inflammation of the gingiva and irreversible bone degradation leading to tooth loss. With the data provided by this study, we add evidence to the significance of smoking on differential DNA methylation of cytochrome-P450-mediated AhR-pathway, a pathway that is involved in smoke-mediated bone degradation .
Apart from the upregulation of the Cytochrome P450-related pathway, Reactome pathway analysis of differentially expressed genes found 58% (14 out of 24) altered pathways relating to the classes DNA replication or cell cycle function, suggesting a connection of smoking and tumorigenesis.
Our data did not suggest a global correlation of the results from the differential methylation analysis with the differential expression analysis. However, we note that the majority of effect sizes in smoking EWAS are not large, with 12.5% hypomethylation in smokers for AHRR in our study. Therefore, the identification of subtle changes in expression caused by the methylation differences detected in EWAS requires very large sample sizes, even if assumed that the relationship between methylation and expression is linear. When accounting for other regulatory mechanisms that disturb this hypothesised linearity, the detection of a direct relationship between changes in methylation patterns and changes in expression patterns becomes even more difficult. Accordingly, EWAS that identified a large correlation of significant methylation with transcription differences are rare, and this despite the fact that hypomethylation is functionally associated with increased gene expression. However, we are aware of a large study that addressed the effects of smoking on methylation and transcription in 730 blood samples, which identified 254 CpGs showing significant correlation of differential methylation with expression values of nearby transcripts . We presume that our non-finding of a global correlation is probably due to the lack of statistical power of the comparably small sample size. When filtering our expression data for genes with multiple associated CpGs and taking into account knowledge on previously published differentially methylated CpGs, in addition to CYP1B1, we identified two further genes, SLC7A5 and EDC3, that showed consistent effect sizes for differential methylation and differential expression, and which were supported by smoking EWAS on buccal cells or blood. Variants within EDC3 have been reported in a large meta-analysis on caffeine intake, together with AHR, CYP1A1 and CYP1A2 . EDC3 is located 25 kb upstream of CYP1A1 and ~ 55 kb upstream of CYP1A2, which also harboured an associated CpG 6 kb upstream in our EWAS. Taken together, our results point towards an important role of differential methylation of several genes of the CYP-gene cluster on chromosome 15 to regulate the AhR-pathway in the masticatory mucosa of smokers.
In the comparison of CYP1B1 methylation patterns of our never smokers data with the OSCC samples, all five significant CpGs were hypomethylated in our EWAS but in the OSCC data, CYP1B1 showed global hypermethylation (16 out of 17 significant CpGs). For AHRR, findings are less clear because none of our CpG was analysed on the 450k chip used for the OSCC samples. Likewise, in 51 OSCC samples from various extraction sites (e.g. buccal mucosa, floor of mouth, lip, palate), significant downregulation of CYP1B1 transcript levels was reported . This contradicts the observation that CYP1B1 was overexpressed in many cancers . As suggested earlier [22, 23], our observations may indicate that demethylation and upregulation of CYP1B1 in the oral mucosal tissues of smokers is not causal for the development of cancer, but in contrast, part of the normal cellular detoxification. In oral cancer, these pathways might be misregulated, which in turn would lead to impaired regulation of xenobiotic metabolism. However, these results contradict findings from other cancer types, where CYP1B1 is overexpressed and serves as a tumour marker . It needs to be emphasised that a major limitation of the OSCC dataset is that extraction sites were not specified. Cell compositions vary among different regions of the oral cavity , which is also suggested by the results of our cell type deconvolution analysis, and which could account for a stratification of results. As OSCC and other oral tissue samples from TCGA most likely have a heterogeneous cellular background originating from diverse tissue extraction sites, they might not be a perfect data source for a straightforward identification of differential methylation patterns. To investigate cell type-specific and disease-relevant methylation patterns in specific parts of the oral mucosa, further cell type deconvolution studies from different extraction sites and cell types of the oral mucosa can improve future analyses. The main limitation of our EWAS was the small sample size. Yet, our samples allowed the detection of the most significant tobacco smoke-related differentially methylated loci, which indicates the power of this homogeneously collected sample set.
The current study adds evidence to previous EWAS that reported significant hypomethylation of AHRR and CYP1B1 in buccal and airway epithelium of smokers. We could show increased expression of CYP1B1 and CYP1B1-AS in the masticatory mucosa of smokers as compared to never smokers and validated upregulation of these genes by cigarette smoke in vivo and in vitro. We conclude that hypomethylation of AHRR and CYP1B1 is an important regulatory mechanism of xenobiotic metabolism of the masticatory mucosa in response to tobacco smoke exposure.
A total of 39 healthy smokers and never smokers, i.e. free of systemic disease, and all of European descent, were enrolled in this study, which was conducted in accordance with the principles of Good Clinical Practice and approved by the Independent Ethics Committee of each participating University Hospital. Written informed consent was obtained from all subjects according to the Declaration of Helsinki. All study participants completed a detailed questionnaire to provide general personal information (e.g. sex, age, geographical and family descent), information on the general and oral health and smoking habits (current, former, never smoker and amount of smoking in cigarette packs per day and duration of smoking in years). We calculated smoking pack year (SPY) values by multiplying the number of smoked cigarette packs per day by the number of years smoked. Fifty percent of the smokers had > 10 SPY, and 50% had between two and ten SPY. The mean age in smokers was 38 years and 35 years in never smokers. All individuals underwent periodontal examination prior to tissue extraction to exclude the presence of PD. Only individuals with a periodontal screening index (PSI) < 2 were included, indicating the absence of severe oral inflammation.
Collection of ex-vivo tissue samples from the masticatory oral mucosa
Solid tissue samples from the oral masticatory mucosa were collected at a defined site from the hard palate adjacent to the 4th and 5th tooth by the use of a tissue puncher with 3 mm diameter and a depth of approximately 1 mm (for an illustration of the area of tissue sampling, see Additional file 7: Figure S1). We consider this site also representative for the gingival mucosa (i.e. often inflamed in periodontitis) because the gingival masticatory mucosa and the masticatory mucosa of the hard palate share the same histological cellular structures (Stratum basale, Str. spinosum, Str. granulosum, Str. corneum) and therefore the same cell types (Additional file 7: Figure S2). Likewise, indicating similar states of cellular differentiation, tissue collection at the specified sampling area is a common approach during periodontal surgery when free gingival grafts are excised and transplanted to other mucosal areas in the oral cavity. To stabilise DNA and RNA, the biopsies were stored in the AllProtect reagent (Qiagen, Hilden, Germany) immediately after punching, stored at 4 °C for 24 h, and subsequently transferred to − 20 °C.
DNA and RNA extraction
The conserved tissue samples were manually broken up into small pieces with a scalpel and subsequently homogenised using the automated mixer mill MM 400 (Retsch, Haan, Germany) using frozen beads (3 mm, Retsch) for 90 s at 30 Hz. Subsequently, DNA and RNA was extracted simultaneously using the AllPrep DNA/RNA/miRNA Universal Kit (Qiagen, Hilden, Germany), ensuring the same cellular background of DNA and RNA for subsequent comparison of DNA-methylation and RNA-expression. Integrity of DNA and RNA was subsequently verified with a 2% agarose gel electrophoresis. Concentrations were measured using the Multiskan GO Microplate Spectrophotometer (Fisher Scientific, Hampton, USA). DNA and RNA samples were stored at − 80 °C.
Bisulfite conversion and hybridisation to the Infinium MethylationEPIC BeadChips
Five hundred nanograms DNA per sample was bisulfite converted with the EZ-96 DNA Methylation Kit (Zymo Research, Irvine, USA) and hybridised to the Infinium DNA MethylationEPIC BeadChip (Illumina, San Diego, USA) on an iScan Microarray Scanner (Illumina) at the Institute for Clinical Molecular Biology, Christian-Albrechts-University Kiel, Germany.
Differential DNA methylation analysis
For analysis and quality control, the R environment (Version 3.5.1) and the R package minfi (1.26.2)  were used, including the R package limma (Version 3.36.5) for the differential methylation analysis . The Red/Green channel of the array were converted into one methylation signal without any normalisation. Using the function plotQC, we estimated sample-specific quality control (QC) and did not remove any sample. The QC criteria for probes were filtering of probes with median detection p values > 0.05, probes that lay within 5 basepairs (bp) of a single nucleotide polymorphism (SNP) with > 5% minor allele frequency, probes on the sex chromosomes and cross-reactive probes using the R package maxprobes (https://rdrr.io/github/markgene/maxprobes) according to Pidsley et al.  and McCartney et al. . In total, 802,254 probes complied with the QC criteria and were included in the analysis. Methylation status was estimated according to the fluorescent intensity ratio, as any value between β = 0 (unmethylated) and 1 (completely methylated), and log2-transformed into M values, which are considered a more statistically valid estimator . Corresponding effect sizes are, for a better biological understanding, given as Δβ, unless otherwise stated. After quality control, we performed a functional normalisation using the preprocessFunnorm function in minfi . To identify CpGs correlating significantly with SPY, an analysis of covariance (ANCOVA) was performed for SPY as a continuous variable with adjustment by age as a continuous variable and sex as a categorical variable. We used SPY in this model to account for the cumulative risk exposure conferred by long-term smoking as suggested by Teschendorff et al. 2015 . We additionally investigated differential methylation correlating with smoking status by performing an ANCOVA with the covariates age and sex. In both analyses, we additionally adjusted for technical variation, i.e. slide and position on slide, by using Combat and the R package BatchQc . Correction for multiple testing was performed using the method by Benjamini & Hochberg . CpGs were annotated to genes according to GRCh37/hg19 as provided in the MethylationEPIC BeadChip manifest.
Cell type deconvolution of EWAS samples
To identify the presence of non-epithelial cells in our samples of oral mucosa, we used the EpiDISH algorithm (Version 3.9) , applying the centEpiFibIC.m.rda reference dataset on our beta-values, which includes centroids for epithelial cells, fibroblasts and immune cells.
To validate the effects of the observed methylation differences between smokers and never smokers on gene expression in the oral mucosa, we performed RNA-sequencing (RNA-Seq) of the same samples that were used in the EWAS. This ensured a homogeneous genetic and cellular background of the methylation and expression analysis. The total mRNA of 38 biopsy samples that passed QC for RNA-Seq (> 600 ng of total RNA, RIN > 7) was sequenced with a coverage of 16 million 50 bp single-end reads per sample. One RNA-sample was discarded due to low amounts of RNA. In brief, the library preparation was performed with 600 ng RNA using the TruSeq Stranded mRNA Kit (Illumina). RNA-Seq was performed on a HiSeq3000 (Illumina) at the Institute for Clinical Molecular Biology of the Christian-Albrechts-University Kiel, Germany.
Differential expression analysis
RNA-seq samples were mapped to the transcriptome (Ensembl 96, GRCh38) using the transcript abundance quantification method Salmon . Salmon estimates abundances without aligning reads. The transcript level estimates were then summarised for the gene level and imported into R using the package tximport . Subsequently, the imported data was analysed for differential expression between smokers and never smokers using DESeq2 . Genes with ≤ 2 counts in all samples combined were not considered in the analyses. To identify genes that show substantial evidence for a correlation of differential expression with differential methylation in smokers, we applied the following steps: (1) filtering of all differentially expressed transcripts with unadjusted p < 0.05; (2) filtering of transcripts with more than one CpG associated with q < 0.15 with SPY; (3) filtering of transcripts that were previously reported in a large EWAS on buccal swabs  or in a systematic review on EWAS on blood .
Gene set enrichment analysis of differentially expressed genes
Gene sets of KEGG (n = 186) and Reactome (n = 674) were gathered from the Broad Institute Molecular Signatures Database (MSigDB [58, 59]). Together with the log2FoldChanges of the differential expression analysis, we performed an enrichment analysis using the R package gage . Gene sets with q value < 0.05 were considered as being significantly enriched.
Preparation of cigarette smoke extract and RNA extraction of cultured cells
Cigarette smoke extraction system and cigarette smoke extract (CSE) were prepared as described in . CSE was prepared from Roth-Händle no filters cigarettes (Reemtsma, Hamburg, Germany). Smoke of three cigarettes was filtered using a 0.2-μM sterile filter and extracted into 10 mL cell culture medium. The CSE-containing medium was directly added to the cells (3 mL per well).
CSE stimulation of primary gingival fibroblasts
For CSE stimulation, solid ex vivo biopsies of the masticatory oral mucosa of the hard palate were taken from three additional participants with 3 mm tissue punchers. Biopsies were dissected enzymatically to separate the epithelial cells from the fibroblasts by overnight incubation in 5 mg/mL dispase II (Sigma Aldrich) diluted in cell growth medium (DMEM, 1% Pen/Strep) at 4 °C. The primary gingival fibroblast cells (pGFs) were subsequently cultured in cell growth medium (DMEM, 1% Amphotericin B, 1% Pen/Strep, 1% non-essential amino acids). Prior to CSE stimulation, the pGFs (passage 3–4) were seeded in 6-well tissue culture plates (TPP Techno Plastic Products, Trasadingen, Switzerland) 24 h before stimulation (2.2 × 105 cells per well). The CSE was freshly prepared at the day of stimulation. CSE induction was performed for 6 h in three biological replicates (each with five technical replicates) of pGFs, with aliquots of the same CSE. Three millilitre medium without CSE was added to the control cell plates (five technical replicates for each of the three donors). After 6 h of CSE incubation, the cells were washed twice with PBS. Cell disruption and total RNA extraction was carried out using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions.
cDNA synthesis and qRT-PCR
To test differential expression in primary gingival fibroblasts upon CSE-stimulation, we performed quantitative reverse transcription polymerase chain reaction (qRT-PCR) for CYP1B1 and CYP1B1-AS. qRT-PCR was performed with an amount of 500 ng of total RNA of CSE-stimulated cells using the High-Capacity cDNA Reverse Transcription Kit and oligo-(dT)-primers (Thermo Fisher Scientific, Waltham, USA) in accordance with the manufacturer’s guidelines, using oligo-(dT)- and dNTP-primers instead of random primers. Control PCR reactions contained water instead of cDNA. qRT-PCR experiments were performed using the CFX Connect System (Bio-Rad, Hercules, USA) in combination with SYBR Select Master Mix (Thermo Fisher Scientific) according to the manufacturer’s instructions. The gene expression levels were normalised to the mRNA expression of GAPDH, and relative expression was calculated using the mathematical model delta-delta ct (GraphPad Prism Software/R). Primers were manufactured (metabion GmbH, Planegg/Steinkirchen, Germany) with the following sequences: CYP1B1 (forward: GAC GAC CCC GAG TTC CGT GA; reverse: AGC CAG GGC ATC ACG TCC AC) and CYP1B1-AS (forward: AGC CCT GAA AGA TGA ACA GTG GT; reverse: GGC ATG CCC ATT TCT TCC ACA). Because the lncRNA CYP1B1-AS1 was not covered by mRNA-Sequencing, we analysed expression in our ex-vivo biopsies from the EWAS and 17 additional biopsies that were collected as described above with qRT-PCR. In total, CYP1B1-AS-expression of 32 smokers and 37 never smokers was analysed.
Cell type deconvolution and differential methylation analysis of OSCC samples
To identify a putative role of genes that showed differential methylation in oral mucosa of smokers in cancer-development, we analysed data from The Cancer Genome Atlas (TCGA) . We downloaded Illumina 450k methylation bead chip data from 16 samples of oral squamous cell carcinoma (OSCC) of keratinized oral mucosa (sites specified as “palate” or “gums”; i.e., Hard Palate, Upper Gum, Lower Gum and Gum NOS in the TCGA). To account for cell type-specific methylation, we performed the cell type deconvolution analyses on the 450k OSCC data as described above. We then performed an ANCOVA for all 130 CpGs that were annotated on the Illumina 450k array to the genetic regions of CYP1B1 and AHRR with sex as a covariate and adjusting for technical variation using the R package sva  to compare OSCC samples to the healthy never smokers from the EWAS.
Thun MJ, Carter BD, Feskanich D, Freedman ND, Prentice R, Lopez AD, et al. 50-year trends in smoking-related mortality in the United States. N Engl J Med. 2013;368(4):351–64.
Ferreira Antunes JL, Toporcov TN, Biazevic MG, Boing AF, Scully C, Petti S. Joint and independent effects of alcohol drinking and tobacco smoking on oral cancer: a large case-control study. PLoS One. 2013;8(7):e68132.
Tonetti MS, Claffey N, European Workshop in Periodontology group C. Advances in the progression of periodontitis and proposal of definitions of a periodontitis case and disease progression for use in risk factor research. Group C consensus report of the 5th European Workshop in Periodontology. J Clin Periodontol. 2005;32(Suppl 6):210–3.
Portela A, Esteller M. Epigenetic modifications and human disease. Nat Biotechnol. 2010;28(10):1057–68.
Feinberg AP. Phenotypic plasticity and the epigenetics of human disease. Nature. 2007;447(7143):433–40.
Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, Ballestar ML, et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A. 2005;102(30):10604–9.
Flanagan JM. Epigenome-wide association studies (EWAS): past, present, and future. Methods Mol Biol. 2015;1238:51–63.
Ambatipudi S, Cuenin C, Hernandez-Vargas H, Ghantous A, Le Calvez-Kelm F, Kaaks R, et al. Tobacco smoking-associated genome-wide DNA methylation changes in the EPIC study. Epigenomics. 2016;8(5):599–618.
Breitling LP, Yang R, Korn B, Burwinkel B, Brenner H. Tobacco-smoking-related differential DNA methylation: 27K discovery and replication. Am J Hum Genet. 2011;88(4):450–7.
de Vries M, van der Plaat DA, Nedeljkovic I, Verkaik-Schakel RN, Kooistra W, Amin N, et al. From blood to lung tissue: effect of cigarette smoke on DNA methylation and lung function. Respir Res. 2018;19(1):212.
Guida F, Sandanger TM, Castagne R, Campanella G, Polidoro S, Palli D, et al. Dynamics of smoking-induced genome-wide methylation changes with time since smoking cessation. Hum Mol Genet. 2015;24(8):2349–59.
Jessen WJ, Borgerding MF, Prasad GL. Global methylation profiles in buccal cells of long-term smokers and moist snuff consumers. Biomarkers. 2018;23(7):625–39.
Joubert BR, Haberg SE, Nilsen RM, Wang X, Vollset SE, Murphy SK, et al. 450K epigenome-wide scan identifies differential DNA methylation in newborns related to maternal smoking during pregnancy. Environ Health Perspect. 2012;120(10):1425–31.
Philibert RA, Beach SR, Brody GH. Demethylation of the aryl hydrocarbon receptor repressor as a biomarker for nascent smokers. Epigenetics. 2012;7(11):1331–8.
Rzehak P, Saffery R, Reischl E, Covic M, Wahl S, Grote V, et al. Maternal smoking during pregnancy and DNA-methylation in children at age 5.5 years: epigenome-wide-analysis in the European childhood obesity project (CHOP)-study. PLoS One. 2016;11(5):e0155554.
Sayols-Baixeras S, Lluis-Ganella C, Subirana I, Salas LA, Vilahur N, Corella D, et al. Identification of a new locus and validation of previously reported loci showing differential methylation associated with smoking. The REGICOR study. Epigenetics. 2015;10(12):1156–65.
Shenker NS, Polidoro S, van Veldhoven K, Sacerdote C, Ricceri F, Birrell MA, et al. Epigenome-wide association study in the European Prospective Investigation into Cancer and Nutrition (EPIC-Turin) identifies novel genetic loci associated with smoking. Hum Mol Genet. 2013;22(5):843–51.
Sun YV, Smith AK, Conneely KN, Chang Q, Li W, Lazarus A, et al. Epigenomic association analysis identifies smoking-related DNA methylation sites in African Americans. Hum Genet. 2013;132(9):1027–37.
Tsai PC, Glastonbury CA, Eliot MN, Bollepalli S, Yet I, Castillo-Fernandez JE, et al. Smoking induces coordinated DNA methylation and gene expression changes in adipose tissue with consequences for metabolic health. Clin Epigenetics. 2018;10(1):126.
Tsaprouni LG, Yang TP, Bell J, Dick KJ, Kanoni S, Nisbet J, et al. Cigarette smoking reduces DNA methylation levels at multiple genomic loci but the effect is partially reversible upon cessation. Epigenetics. 2014;9(10):1382–96.
Wan ES, Qiu W, Carey VJ, Morrow J, Bacherman H, Foreman MG, et al. Smoking-associated site-specific differential methylation in buccal mucosa in the COPDGene study. Am J Respir Cell Mol Biol. 2015;53(2):246–54.
Teschendorff AE, Yang Z, Wong A, Pipinikas CP, Jiao Y, Jones A, et al. Correlation of smoking-associated DNA methylation changes in buccal cells with DNA methylation changes in epithelial cancer. JAMA Oncol. 2015;1(4):476–85.
Stueve TR, Li WQ, Shi J, Marconett CN, Zhang T, Yang C, et al. Epigenome-wide analysis of DNA methylation in lung tissue shows concordance with blood studies and identifies tobacco smoke-inducible enhancers. Hum Mol Genet. 2017;26(15):3014–27.
Monick MM, Beach SR, Plume J, Sears R, Gerrard M, Brody GH, et al. Coordinated changes in AHRR methylation in lymphoblasts and pulmonary macrophages from smokers. Am J Med Genet B Neuropsychiatr Genet. 2012;159B(2):141–51.
Zeilinger S, Kuhnel B, Klopp N, Baurecht H, Kleinschmidt A, Gieger C, et al. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS One. 2013;8(5):e63812.
Vogel CFA, Haarmann-Stemmann T. The aryl hydrocarbon receptor repressor—more than a simple feedback inhibitor of AhR signaling: clues for its role in inflammation and cancer. Curr Opin Toxicol. 2017;2:109–19.
Buro-Auriemma LJ, Salit J, Hackett NR, Walters MS, Strulovici-Barel Y, Staudt MR, et al. Cigarette smoking induces small airway epithelial epigenetic changes with corresponding modulation of gene expression. Hum Mol Genet. 2013;22(23):4726–38.
Squier CA, Kremer MJ. Biology of oral mucosa and esophagus. J Natl Cancer Inst Monogr. 2001;2001(29):7–15.
Bergmeier LA. Oral mucosa in health and disease a concise handbook. 2018.
Lesch CA, Squier CA, Cruchley A, Williams DM, Speight P. The permeability of human oral mucosa and skin to water. J Dent Res. 1989;68(9):1345–9.
Winning TA, Townsend GC. Oral mucosal embryology and histology. Clin Dermatol. 2000;18(5):499–511.
Gao X, Jia M, Zhang Y, Breitling LP, Brenner H. DNA methylation changes of whole blood cells in response to active smoking exposure in adults: a systematic review of DNA methylation studies. Clin Epigenetics. 2015;7:113.
Sutter TR, Tang YM, Hayes CL, Wo YY, Jabs EW, Li X, et al. Complete cDNA sequence of a human dioxin-inducible mRNA identifies a new gene subfamily of cytochrome P450 that maps to chromosome 2. J Biol Chem. 1994;269(18):13092–9.
Hopper JL, Seeman E. The bone density of female twins discordant for tobacco use. N Engl J Med. 1994;330(6):387–92.
Law MR, Hackshaw AK. A meta-analysis of cigarette smoking, bone mineral density and risk of hip fracture: recognition of a major effect. BMJ. 1997;315(7112):841–6.
Szulc P, Garnero P, Claustrat B, Marchand F, Duboeuf F, Delmas PD. Increased bone resorption in moderate smokers with low body weight: the Minos study. J Clin Endocrinol Metab. 2002;87(2):666–74.
Iqbal J, Sun L, Cao J, Yuen T, Lu P, Bab I, et al. Smoke carcinogens cause bone loss through the aryl hydrocarbon receptor and induction of Cyp1 enzymes. Proc Natl Acad Sci U S A. 2013;110(27):11115–20.
Yu J, Liu Y, Gong Z, Zhang S, Guo C, Li X, et al. Overexpression long non-coding RNA LINC00673 is associated with poor prognosis and promotes invasion and metastasis in tongue squamous cell carcinoma. Oncotarget. 2017;8(10):16621–32.
Spivack SD, Hurteau GJ, Jain R, Kumar SV, Aldous KM, Gierthy JF, et al. Gene-environment interaction signatures by quantitative mRNA profiling in exfoliated buccal mucosal cells. Cancer Res. 2004;64(18):6805–13.
Spira A, Beane J, Shah V, Liu G, Schembri F, Yang X, et al. Effects of cigarette smoke on the human airway epithelial cell transcriptome. Proc Natl Acad Sci U S A. 2004;101(27):10143–8.
Cornelis MC, Monda KL, Yu K, Paynter N, Azzato EM, Bennett SN, et al. Genome-wide meta-analysis identifies regions on 7p21 (AHR) and 15q24 (CYP1A2) as determinants of habitual caffeine consumption. PLoS Genet. 2011;7(4):e1002033.
Pradhan S, Nagashri MN, Gopinath KS, Kumar A. Expression profiling of CYP1B1 in oral squamous cell carcinoma: counterintuitive downregulation in tumors. PLoS One. 2011;6(11):e27914.
Murray GI, Taylor MC, McFadyen MC, McKay JA, Greenlee WF, Burke MD, et al. Tumor-specific expression of cytochrome P450 CYP1B1. Cancer Res. 1997;57(14):3026–31.
Chang I, Mitsui Y, Kim SK, Sun JS, Jeon HS, Kang JY, et al. Cytochrome P450 1B1 inhibition suppresses tumorigenicity of prostate cancer via caspase-1 activation. Oncotarget. 2017;8(24):39087–100.
Brand RW, Isselhard DE. Anatomy of orofacial structures : a comprehensive approach. Eight edition. ed. St. Louis: Elsevier; 2019.
Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014;30(10):1363–9.
Smyth GK, Michaud J, Scott HS. Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics. 2005;21(9):2067–75.
Pidsley R, Zotenko E, Peters TJ, Lawrence MG, Risbridger GP, Molloy P, et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 2016;17(1):208.
McCartney DL, Walker RM, Morris SW, McIntosh AM, Porteous DJ, Evans KL. Identification of polymorphic and off-target probe binding sites on the Illumina Infinium MethylationEPIC BeadChip. Genom Data. 2016;9:22–4.
Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics. 2010;11:587.
Fortin JP, Labbe A, Lemire M, Zanke BW, Hudson TJ, Fertig EJ, et al. Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol. 2014;15(12):503.
Manimaran S, Selby HM, Okrah K, Ruberman C, Leek JT, Quackenbush J, et al. BatchQC: interactive software for evaluating sample and batch effects in genomic data. Bioinformatics. 2016;32(24):3836–8.
Benjamini Y, Hochberg Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57(1):289–300.
Zheng SC, Webster AP, Dong D, Feber A, Graham DG, Sullivan R, et al. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics. 2018;10(7):925–40.
Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14(4):417–9.
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 2015;4:1521.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.
Liberzon A, Birger C, Thorvaldsdottir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
Luo W, Friedman MS, Shedden K, Hankenson KD, Woolf PJ. GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics. 2009;10:161.
Gellner CA, Reynaga DD, Leslie FM. Cigarette smoke extract: a preclinical model of tobacco dependence. Curr Protoc Neurosci. 2016;77:9 54 1–9 10.
Cancer Genome Atlas Research N, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The cancer genome Atlas pan-cancer analysis project. Nat Genet. 2013;45(10):1113–20.
Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3.
The results published for the OSCC analysis are in parts based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
This study was funded by a research grant from the Deutsche Forschungsgemeinschaft (DFG), RI 2827/1-1, the Bundesministerium für Bildung und Forschung (01DL15002), and a grant of the DG PARO/CP GABA-Forschungsförderung.
Availability of data and materials
The generated datasets from RNA-Sequencing experiment will be made publicly available on and the gene expression omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) upon acceptation for publication. DNA methylation data from the EWAS experiment will be available from the corresponding author on reasonable request.
Ethics approval and consent to participate
This study was conducted in accordance with the principles of Good Clinical Practice and approved by the Independent Ethics Committee of the participating University Hospitals of Berlin (Germany), Würzburg (Germany), Vienna (Austria) and Coimbra (Portugal). Written informed consent was obtained from all subjects according to the Declaration of Helsinki.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
CpGs from the literature associated with SPY in our data. Results are shown for the ANCOVA with SPY for all CpGs with q < 0.15 in our EWAS that map to genes harbouring a significant CpG in an EWAS on buccal swabs  or were reported as replicated in at least two independent EWAS on blood . (XLSX 47 kb)
Results of the EpiDISH cell type deconvolution. Estimates for the different cell type fractions are given each for the EWAS samples and the OSCC samples, each also additionally divided into the subsets smokers and non-smokers. (XLSX 17 kb)
Median TPM values from RNA-Sequencing. Transcripts were ranked according to their median TPM value in the EWAS samples, with the highest expressed genes on top. (XLSX 1299 kb)
Results from the KEGG and Reactome Pathway Analyses of Differentially Expressed Genes. (XLSX 76 kb)
Correlation of associated CpGs with association of transcripts. Shown are all CpGs with q < 0.15 that map to genes that were differentially expressed in smokers with a nominal p value of < 0.05. (XLSX 14 kb)
Results from the ANCOVA of OSCC samples in AHRR and CYP1B1. Listed are all CpGs within the genetic regions of AHRR and CYP1B1 (n = 202) to indicate absence of CpGs in the 450 k analysis of OSCC. CpGs passing the adjusted significance threshold are marked in red. (XLSX 8961 kb)
Figure S1. Area of tissue sampling. The tissue samples were collected from the hard palate directly adjacent to the 4th and 5th tooth by the use of a tissue puncher (3 mm diameter). Figure S2. Histological features of the masticatory mucosa of the gingiva and the hard palate. The mucosa of both sites is characterised by the four layered orthokeratinised stratified squamous epithelium, overlaying the lamina propria. The lamina propria contains closely packed bundles of collagen fibres enabling the mucosa to resist heavy loading. The cells in the upper keratin layer have lost their nuclei. Because the function and appearance of the cells and cell layers from both extraction sites are similar, it is the broad agreement of the periodontologists participating in this study that both sites are comparable and are likely to share similar methylation patterns under normal conditions. (DOCX 1137 kb)