To identify recurrent epigenetic alterations in lung cancer and to assess their potential benefit for diagnostics, HumanMethylation450 BeadChip analyses were performed on biopsy samples collected during bronchoscopy (hereafter "biopsy samples"), lung cancer specimens from surgically resected primary tumors (hereafter "surgical specimens") and corresponding normal lung control specimens collected from the same patients (Additional file 2: Fig. S1).
While surgically resected tumor specimens usually contain plenty of tumor cells, allowing a reliable histological diagnosis, analyses of biopsy samples more often suffer from low sample quality, e.g. due to low tumor cell content or putatively altered histology, making diagnosis more difficult and error-prone. To address the question of whether DNA methylation analysis might add reliability to classical diagnostics in demanding cases, paired biopsy samples (i.e. one from a supposedly tumor-containing bronchus and one from a supposedly tumor-free contralateral bronchus) were collected from 55 patients during bronchoscopy. All samples were subsequently examined by experienced lung pathologists using standard histopathological procedures (Additional file 6: Table S1). A definite histopathological diagnosis could be reached for 74 of the 110 biopsies (37 lung cancer specimens and 37 controls with definite diagnosis). For the remaining 36 specimens, no definite diagnosis could be reached (13 lung cancer specimens and 23 nonmalignant samples without definite diagnosis), predominantly due to low tumor cell content. These cases are hereafter referred to as “indefinite” or “uncertain diagnosis”. Accordingly, 37 of the 55 patients received a definite diagnosis. DNA methylation values obtained with the HumanMethylation450 BeadChip for a subset of 13 CpG loci were verified by 608 bisulfite pyrosequencing reactions in 24 DNA samples isolated from 5 adenocarcinomas, 7 squamous cell carcinomas and 12 controls. The overall Pearson’s correlation coefficient between the two techniques was 0.89, demonstrating a high correlation of DNA methylation values determined by independent techniques (Additional file 3: Fig. S2 and Additional file 7: Table S2).
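The agreement between array-based and pyrosequencing-based methylation values is summarized by Pearson's correlation coefficient. A minimal Python sketch of that calculation follows; the function name and the paired values are invented placeholders for illustration and are not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# HYPOTHETICAL paired measurements (methylation fraction, 0..1) of the
# same locus/sample combinations by the two techniques
array_beta = [0.10, 0.35, 0.50, 0.72, 0.90]
pyroseq_fraction = [0.15, 0.30, 0.55, 0.65, 0.95]

r = pearson_r(array_beta, pyroseq_fraction)
print(f"Pearson r = {r:.2f}")
```

In a real comparison, each pair would hold the BeadChip value and the pyrosequencing value for one CpG locus in one DNA sample.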
Aberrant DNA methylation profiles in paired biopsy specimens
Differential DNA methylation analysis (DMA) of the paired biopsies with a definite diagnosis, available from 37 patients, identified 1303 loci (paired Wilcoxon test, FDR < 1 × 10⁻⁶, delta.beta > 0.25) aberrantly methylated in cancer-cell-containing samples as compared to tumor-free samples. A subsequent hierarchical cluster analysis of these 1303 loci including all 110 biopsies clearly separated the specimens with definite infiltration by cancer cells from the corresponding control samples with high specificity and sensitivity (Fig. 1). In the 36 samples labeled as “indefinite diagnosis” by histopathological investigation, the pathologist's assumption was confirmed in the vast majority of samples (32 of 36, 89%). Comparing the pathologist's assumption with the outcome of the DNA methylation analysis, Cohen's kappa coefficient ranged from 0.76 (considering only the samples with uncertain diagnosis; Cohen's κuncertain = 0.76) to 0.93 (including all 110 samples; Cohen's κoverall = 0.93). Only two samples with uncertain diagnosis (P05140125T and P10130074T) that were histologically considered most likely benign clearly clustered with the tumor-containing samples. Interestingly, patient P10130074T presented clinically with stage 4 lung carcinoma without histological confirmation, whereas specimen P05140125T turned out to be a metastasis of a renal carcinoma on a second histological evaluation. Thus, both tumors, which had been histologically misclassified as non-malignant, were correctly identified by the epigenetic approach.
In turn, two samples with uncertain diagnosis that were suspicious for cancer clustered between the benign and malignant samples; both were collected from the same patient (P09130061). These samples could not be definitively classified as malignant or benign, either by histology or by DNA methylation analysis.
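Cohen's kappa corrects the observed agreement between two raters for the agreement expected by chance. The Python sketch below shows the calculation for a 2 × 2 agreement table. The two tables are assumptions made for illustration only: they presume that the four discordant uncertain samples split evenly between the two histological groups and that all 74 samples with a definite diagnosis were concordant, neither of which is stated explicitly in the text.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of samples assigned category i by rater A
    (here: histology) and category j by rater B (here: methylation cluster).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: fraction of samples on the diagonal
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# ASSUMED tables; rows: histology (cancer, benign),
# columns: methylation cluster (tumor, control)
uncertain = [[11, 2],   # 13 uncertain samples presumed cancerous
             [2, 21]]   # 23 uncertain samples presumed nonmalignant
overall = [[48, 2],     # 37 definite + 13 uncertain cancer samples
           [2, 58]]     # 37 definite + 23 uncertain control samples

print(round(cohens_kappa(uncertain), 2))
print(round(cohens_kappa(overall), 2))
```

Under these assumptions the two values round to 0.76 and 0.93, which is consistent with the coefficients reported above; a different split of the discordant samples would give slightly different values.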
A more detailed analysis revealed that of the 1303 differentially methylated loci, only 63 loci were hypomethylated in the cancer-cell-containing biopsies as compared to the corresponding controls. These loci were located in 36 known genes encoding, e.g., transcription factors (ZNF423, PEG3, E2F6), chromosome-associated proteins (DCTN2, ZNF423, CSPP1), cell adhesion proteins (SIRPB1) and apoptosis-controlling factors (CASP8). Strikingly, four genes belong to the olfactory receptor family (OR8H2, OR8K1, OR2M7 and OR4K5). The 63 hypomethylated CpG loci were enriched for localization in the first exon (OR: 6.88, p = 2.55 × 10⁻⁶, χ² test) but depleted for CGIs (OR: 0, p = 2.52 × 10⁻⁷, χ² test) and DNaseI hypersensitive sites (OR: 0.12, p = 0.02, χ² test).
In contrast, the hypermethylated loci in tumor-containing biopsies mapped to 555 genes. A gene ontology search demonstrated that these genes contributed to known tumor and signaling pathways, i.e. to the TGF-beta signaling pathway (CREB, FBN1, GDF5, PITX2, RGMA, SMAD3, THSD4), the RAS signaling pathway (ABL1, INSR, NTRK1, PIK3CA, PIK3R1, PIK3R2, RAPGEF5, ZAP70), the TNF signaling pathway (DAB2, MAP3K14, MAPK14, PIK3CA, PIK3R1, PIK3R2, RIPK1, TNFAIP3, VCAM1) or apoptosis (CAPN1, DAB2IP, ITPR2, LMNA, MAP3K14, NTRK1, PIK3CA, PIK3R1, PIK3R2, RIPK1, TP53AIP1). In general, hypermethylated loci were enriched for gene bodies (OR: 1.27, p = 2.67 × 10⁻⁴, χ² test), 5′UTRs (OR: 1.38, p = 0.02, χ² test), CGIs (OR: 1.15, p = 0.02, χ² test), enhancers (OR: 2.48, p = 5.78 × 10⁻⁵⁹, χ² test) and DNaseI hypersensitive sites (OR: 1.71, p = 6.31 × 10⁻¹⁴, χ² test).
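Enrichment or depletion of a locus set in a genomic annotation can be expressed as an odds ratio from a 2 × 2 table and tested with a chi-square test. The following Python sketch illustrates one common way to do this; it uses the chi-square statistic without continuity correction, which may differ from the exact procedure used for the values above, and the counts are hypothetical because the underlying tables are not given in this section.

```python
import math

def enrichment_2x2(selected_in, selected_out, other_in, other_out):
    """Odds ratio and chi-square test (1 df, no continuity correction).

    selected_in / selected_out: selected loci inside / outside the annotation
    other_in / other_out: all remaining loci inside / outside the annotation
    """
    a, b, c, d = selected_in, selected_out, other_in, other_out
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the table must be non-empty")
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# HYPOTHETICAL counts: 30 of 100 selected loci and 100 of 900 remaining
# loci overlap the annotation of interest
odds_ratio, chi2, p_value = enrichment_2x2(30, 70, 100, 800)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

An odds ratio above 1 indicates enrichment and below 1 indicates depletion; an odds ratio of 0 arises when none of the selected loci overlap the annotation.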
Besides clinical presentation and, increasingly, mutational findings, therapy of lung cancer relies on histopathologic subtyping [7] into SCLC, AC and SQC. To study whether DNA methylation of biopsy samples from bronchoscopy might add to this subtyping, we first investigated differential DNA methylation between lung cancer entities in our cohort by performing an ANOVA. Due to limited sample numbers in the different groups (Additional file 6: Table S1), we focused on SCLC, AC and SQC. Hierarchical cluster analysis of the 300 differentially methylated loci identified (σ/σmax > 0.25, FDR < 1 × 10⁻⁶, ANOVA; corresponding to 170 individual genes) resulted in two major branches separating SCLC from NSCLC (AC and SQC) samples, with the exception of two samples (Fig. 2a, Additional file 9: Table S3; Cohen's κ = 0.86). These two samples had previously been classified as SCLC but belong to the set of specimens with an uncertain histological diagnosis (76701 and P10130072, Additional file 6: Table S1). However, these samples show a DNA methylation pattern different from that of the other SCLC samples but identical to that of the NSCLC specimens, suggesting a misclassification of these biopsies in the initial histological screening. In a second approach, we focused on the clinically most relevant groups of NSCLC, AC (n = 13 samples in the biopsy cohort) and SQC (n = 18). A t-test to identify loci differentially methylated between these entities revealed 15 CpG loci (q < 0.05, σ/σmax > 0.55; cg00129651, cg00370229, cg00400827, cg01188578, cg06922248, cg09451235, cg11965913, cg12861034, cg17178900, cg18367631, cg20395967, cg20668644, cg20691436, cg22061831, cg26631039) corresponding to nine individual genes (ARHGEF4, CALML3, GLI2, HADHA, MIR663, PM20D1, PRKAR1B, RAPGEFL1, ZDHHC1). Considering these loci only, a hierarchical cluster analysis of the methylation values separated AC from SQC (Fig. 2b).
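The σ/σmax threshold is not defined in this section. The Python sketch below assumes that it denotes a variance filter in which the standard deviation of each locus across samples is divided by the largest standard deviation observed for any locus, and only loci above the threshold are retained. The function name, locus names and methylation values are hypothetical.

```python
from statistics import stdev

def variance_filter(beta_by_locus, threshold):
    """Keep loci whose sigma / sigma_max exceeds the threshold.

    beta_by_locus: dict mapping locus ID -> list of methylation values
    (one value per sample, at least two samples per locus).
    """
    sigma = {locus: stdev(values) for locus, values in beta_by_locus.items()}
    sigma_max = max(sigma.values())
    if sigma_max == 0:
        return []  # no locus varies across samples
    return [locus for locus, s in sigma.items() if s / sigma_max > threshold]

# HYPOTHETICAL methylation values for three loci in four samples
beta_by_locus = {
    "locus_a": [0.1, 0.9, 0.1, 0.9],   # highly variable
    "locus_b": [0.5, 0.5, 0.5, 0.6],   # nearly constant
    "locus_c": [0.2, 0.6, 0.3, 0.7],   # moderately variable
}

print(variance_filter(beta_by_locus, 0.25))
print(variance_filter(beta_by_locus, 0.55))
```

Under this assumed definition, raising the threshold from 0.25 to 0.55 retains only the most variable loci, which matches the stricter setting used for the AC versus SQC comparison.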
Aberrant DNA methylation profiles in surgical lung cancer specimens and comparison to biopsy specimens
To compare the results obtained from bronchoscopic biopsy samples with the high-quality specimens commonly used in other studies on lung cancer, and to further validate our findings, we additionally collected an independent sample cohort of surgical cancer specimens. In all these samples, a firm histopathologic diagnosis could be reached (Additional file 6: Table S1). To minimize putative corruption of DNA methylation data from malignant cancer cells by adjacent non-malignant cells of the tumor microenvironment, tumor cell content was increased by macrodissection (resulting tumor cell content > 80%). In parallel, matched tumor-free control samples were isolated from the same surgically removed tissue.
In a first approach, we investigated how the panel of 1303 CpG loci obtained from the DMA of the paired bronchoscopic biopsy samples described above performed on the independent cohort of surgically resected specimens. Based on the methylation status of these loci, all surgical specimens were correctly separated into tumor and normal by both hierarchical cluster analysis and PCA (Fig. 3a, b). Therefore, the DMA results of the biopsy samples could be fully validated using the independent cohort of surgically obtained primary specimens (κ = 1). Additionally, we applied the set of 1303 loci to a DNA methylation data set provided by the TCGA including more than 800 lung cancer samples (439 AC and 369 SQC) and 26 non-malignant controls (see materials). Although this data set contains information on only 1162 of the 1303 loci, both a subsequently performed PCA (Fig. 4) and a hierarchical cluster analysis (Additional file 4: Fig. S3) separated lung cancer and control samples with only a minor number of exceptions. This further confirmed the results of our analysis of surgical specimens. Depending on the selected branch of the cluster analysis, the agreement between histological and epigenetic diagnosis is almost perfect (κ = 0.87, Cohen's kappa comparing the diagnostic outcome of histological and DNA methylation-based analyses).
Further analyses, in particular of the data set obtained from the surgical specimens, as well as a comparison with the results obtained from bronchoscopic biopsies, are provided in Additional file 1: Supplement.