The promoter methylome differentiates tumor tissue from adjacent non-tumor tissue in HCC
The clinicopathologic features of the 8 patients with HCC in this promoter-wide methylation study are described in Additional file 1: Table S1. The primary etiology in this group was HBV infection (7 of 8 patients). All patients had a single tumor, and most of the primary tumors (5 of 8) had moderately differentiated histology; 6 of 8 were stage II according to the American Joint Committee on Cancer (AJCC) TNM system.
An LHC-BS approach [14, 15] was then applied to profile the promoter methylome of the 8 sample pairs. Promoters were defined as the regions from −2200 bp to +500 bp relative to the transcriptional start site (TSS) [16]. Based on the hg19 human reference genome, a total of 150,407 capture probes were custom-designed from the Crick strand, covering 1.86 million CpG nucleotides in the promoters. With this design, the Watson strand can be captured, providing coverage of 31,372 (91.8 %) genes in the RefSeq database [15]. We obtained an average of 4.4 Gb of clean data per sample, corresponding to 23× read depth; 94.77 % of reads mapped to at least one genomic position, and 87.75 % mapped uniquely to the reference genome. Furthermore, 94.21 % of the uniquely mapped reads were located in the defined promoter regions (Additional file 2: Table S2). We then filtered out all CpG sites with less than 4× coverage in the 8 paired samples. The number of CpG sites retained per sample ranged from 994,997 to 1,685,393, with a median of 1,374,799.
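As an illustration of the promoter definition above, the following minimal sketch derives a promoter interval from a TSS position. The function name, the inclusive coordinate convention, and the strand handling are assumptions made for this example; the text specifies only the −2200 bp and +500 bp offsets.

```python
def promoter_interval(tss, strand, upstream=2200, downstream=500):
    """Return (start, end) of the promoter around a TSS.

    Assumption: coordinates are inclusive and "upstream" is interpreted
    relative to the direction of transcription, so the window is mirrored
    for genes on the minus strand.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


# A plus-strand gene with its TSS at position 10,000
print(promoter_interval(10000, "+"))
# The same TSS position on the minus strand
print(promoter_interval(10000, "-"))
```

Each interval spans the same 2700-bp offset range; only the orientation differs between strands.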
To identify differentially methylated CpG loci linked to HCC, we selected the 690,858 CpG sites with a minimum read coverage of 4× in all 16 samples and performed a hierarchical clustering analysis. Based on the average methylation level across the 500-bp region downstream of the TSS, tumor tissue was clearly separated from adjacent non-tumor tissue by significant changes in the pattern of the promoter methylome (Fig. 1a). Principal component analysis (PCA) consistently showed that HCC tissue exhibited greater variance than non-tumor tissue, and this was confirmed by chi-square tests (Additional file 3: Figure S1A, B). These findings suggest a generalized disruption of methylome integrity in HCC.
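The selection of CpG sites covered at 4× or more in every sample amounts to a set intersection across samples. The sketch below shows one way this could be done; the data layout (a dictionary of per-sample site depths) and the site identifiers are hypothetical and are not taken from the study's pipeline.

```python
def sites_covered_in_all(coverage_by_sample, min_depth=4):
    """Return the set of CpG sites with depth >= min_depth in every sample.

    coverage_by_sample: dict mapping sample ID -> dict of site ID -> read depth
    (a hypothetical layout chosen for this example).
    """
    shared = None
    for depths in coverage_by_sample.values():
        passing = {site for site, depth in depths.items() if depth >= min_depth}
        # Intersect with the sites that passed in all previous samples
        shared = passing if shared is None else shared & passing
    return shared or set()


coverage = {
    "T1": {"chr1:100": 10, "chr1:200": 3, "chr1:300": 5},
    "N1": {"chr1:100": 4, "chr1:200": 8, "chr1:300": 7},
}
print(sorted(sites_covered_in_all(coverage)))
```

A site missing from any one sample is treated as uncovered and is dropped, which is why the shared set (690,858 sites here) is smaller than the per-sample counts.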
The promoter regions of genes losing or acquiring DNA methylation show different CpG contents in HCC
To analyze the relationship between promoter DNA methylation and promoter activity as determined by CpG content, we applied a previously described classification of promoters as having high CpG content (HCP), intermediate CpG content (ICP), or low CpG content (LCP), based on the CpG ratio, GC content, and length of the CpG-rich region [16] (Additional file 4: Figure S2A). In line with a previous report using MeDIP technology, our analysis demonstrated that genes acquiring low DNA methylation levels in tumors were mostly characterized by the presence of HCP promoters (Additional file 4: Figure S2B). We further performed hierarchical clustering analyses of CpG islands (CGIs), and a chi-square analysis was then applied to select the top 1000 genes with the most variable CGI methylation based on the P values (Fig. 1b). In general, many of these 1000 genes had a substantially higher methylation ratio in tumor tissue than in non-tumor tissue (Fig. 1b). Interestingly, the majority of these 1000 CGIs were consistently hypo-methylated in the poorly differentiated tumor (ID NO. 388) compared with the moderately or well-differentiated tumors. One tumor (ID NO. 734) clustered closely with its adjacent tissue, which we suspect was due to contamination of the tumor sample with non-tumor tissue. Overall, the data support the possibility that enriched HCPs may be responsible for inhibiting the expression of the corresponding genes.
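To make the HCP/ICP/LCP scheme concrete, the sketch below computes the observed/expected CpG ratio and GC fraction in sliding 500-bp windows and assigns a class. The default thresholds (CpG ratio 0.75 and GC fraction 0.55 for HCP, CpG ratio 0.48 for LCP) follow one commonly used scheme and are assumptions here; the cutoffs actually applied in this study are those of reference [16], and the function names and step size are illustrative.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq) if seq else 0.0


def classify_promoter(seq, window=500, step=5,
                      hcp_ratio=0.75, hcp_gc=0.55, lcp_ratio=0.48):
    """Classify a promoter sequence as 'HCP', 'ICP', or 'LCP'.

    All thresholds are assumed defaults, not values confirmed by the text.
    """
    last_start = max(len(seq) - window, 0)
    windows = [seq[i:i + window] for i in range(0, last_start + 1, step)]
    ratios = [cpg_obs_exp(w) for w in windows]
    # HCP: at least one window is both CpG-rich and GC-rich
    if any(r > hcp_ratio and gc_fraction(w) > hcp_gc
           for r, w in zip(ratios, windows)):
        return "HCP"
    # LCP: no window exceeds the lower CpG-ratio threshold
    if all(r <= lcp_ratio for r in ratios):
        return "LCP"
    return "ICP"


print(classify_promoter("CG" * 250))          # CpG-dense toy sequence
print(classify_promoter("AT" * 250))          # toy sequence without CpGs
print(classify_promoter("CGAAAATTTT" * 50))   # CpG-enriched but GC-poor
```

Keeping the thresholds as parameters makes it straightforward to substitute the exact cutoffs from the cited classification.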
Comparisons of promoter CpG methylation between HCC tissue and adjacent tissue reveal differentially methylated regions and DEGs
We next applied a pair-wise comparison to identify differentially methylated regions (DMRs). In each comparison, a sliding window strategy was used to determine whether the region within the window exhibited differential methylation between the tumor and non-tumor samples (Additional file 5: Material and Methods). This approach generated an average of 2972 DMRs per sample pair, although the number varied between pairs, suggesting high inter-tumor heterogeneity in DNA methylation (Additional file 6: Figure S3B). Nevertheless, 77 genes with one or two DMRs were found in 6 of the 8 paired samples, and 67.5 % of these DMRs were hyper-methylated in the tumor tissue (Additional file 7: Table S3).
Promoter CGI methylation has frequently been associated with silencing of gene expression. To obtain expression data from the 8 HCC pairs, we used Illumina high-throughput RNA-seq technology to assess differentially expressed genes (DEGs). After removing low-quality reads, 84.55 % of reads aligned to previously annotated genes, and 78.13 % of reads mapped uniquely. Our analysis detected 18,850 genes with at least one uniquely mapped read. To identify DEGs, we next performed a pair-wise comparison between tumor and non-tumor tissue, requiring a fold change in reads per kb per million mapped reads (RPKM) greater than 2 and an FDR-adjusted P value less than 0.01 [17]. Using this approach, the median number of genes identified as DEGs across the 8 paired samples was 7019, and the majority showed down-regulated expression in HCCs (Additional file 6: Figure S3A). Only 93 DEGs were shared by 6 of the 8 paired samples (Additional file 8: Table S4).
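The DEG criteria can be sketched as follows, using the standard RPKM definition and the two cutoffs stated above. The function names, the small expression floor used to avoid division by zero, and the symmetric treatment of up- and down-regulation are assumptions of this example; the FDR-adjusted P value is taken as a given input, since its computation follows reference [17].

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def call_deg(rpkm_tumor, rpkm_normal, fdr,
             min_fold=2.0, max_fdr=0.01, floor=0.01):
    """Return 'up', 'down', or 'none' for one gene in one tumor/non-tumor pair.

    floor is an assumed lower bound on RPKM so that genes with zero
    expression in one tissue still yield a finite fold change.
    """
    if fdr >= max_fdr:
        return "none"
    ratio = max(rpkm_tumor, floor) / max(rpkm_normal, floor)
    if ratio > min_fold:
        return "up"
    if ratio < 1.0 / min_fold:
        return "down"
    return "none"


# 100 reads on a 2-kb gene in a library of 10 million mapped reads
print(rpkm(100, 2000, 10_000_000))
print(call_deg(rpkm_tumor=2.0, rpkm_normal=10.0, fdr=0.001))
```

Applying `call_deg` to every gene in each of the 8 pairs would give per-pair DEG lists whose sizes can then be summarized, for example by their median.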
We hypothesized that there would be a relationship between the presence of DMRs in specific promoters and the DEGs in the liver tumors. Matching the two datasets identified 24 genes that contained DMRs in their promoter regions and met the DEG selection criteria in at least 5 of the 8 sample pairs (Additional file 9: Table S5). Among these, 20 genes showed expression levels negatively associated with the DMR methylation status. These included 4 genes hypo-methylated in tumor tissue (CLCNKA, BAIAP2L2, CCL20, and NQO1) and 16 genes hyper-methylated in tumor tissue (IFITM1, SMAD6, TBX15, CHST4, LRRC4, PHYHD1, STEAP4, TACSTD2, NPC1L1, THRSP, KCNJ10, PALM3, FAM134B, TMEM100, PM20D1, and GRHL2).
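The matching of promoter DMRs to DEGs across sample pairs can be illustrated with a simple recurrence count. This is a simplified sketch: it applies the inverse-association requirement (hyper-methylated with down-regulated, or hypo-methylated with up-regulated) within each pair before counting, which may differ from the order of steps used in the study. The data layout and gene labels are hypothetical.

```python
from collections import Counter

# Methylation/expression combinations that count as an inverse association
INVERSE = {("hyper", "down"), ("hypo", "up")}


def recurrent_inverse_genes(pairs, min_pairs=5):
    """Return genes with an inverse DMR/expression call in >= min_pairs pairs.

    pairs: list with one dict per tumor/non-tumor pair, mapping
    gene -> (dmr_direction, expression_direction).
    """
    counts = Counter()
    for calls in pairs:
        for gene, call in calls.items():
            if call in INVERSE:
                counts[gene] += 1
    return sorted(gene for gene, n in counts.items() if n >= min_pairs)


# Toy input for 8 pairs with placeholder gene labels
pairs = (
    [{"GENE_A": ("hyper", "down"), "GENE_B": ("hyper", "up"),
      "GENE_C": ("hypo", "up")}] * 4
    + [{"GENE_A": ("hyper", "down"), "GENE_B": ("hyper", "up")}]
    + [{}] * 3
)
print(recurrent_inverse_genes(pairs))
```

In the toy input, GENE_B recurs in 5 pairs but is excluded because its methylation and expression change in the same direction, and GENE_C is excluded because it recurs in only 4 pairs.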
Selection of candidate genes and validation of methylation in 78 pairs of HCCs by MiSeq-BSP
We acquired an additional 78 paired samples (HCC and adjacent tissue) to validate the genes initially identified in the LHC-BS study. Since most of the primary tumors studied in the LHC-BS analysis had a well or moderately differentiated histology, we obtained 39 well-to-moderately and 39 moderately differentiated HCCs together with their matched adjacent tissues (Additional file 10: Table S6). The majority of the patients (96 %) were male; the average age at diagnosis and treatment of HCC was 47.6 ± 10.1 years; and 88 % had HBV infection. With regard to the common factors associated with HCC prognosis and recurrence, 83 % of the subjects had a single tumor, 65 % of the primary tumors were more than 5 cm in diameter, 58 % of patients had stage III tumors, and 41 % of patients had blood alpha-fetoprotein (AFP) levels greater than 4000 ng/ml. The subjects therefore represent a group of patients with hyper-vascular primary liver malignancy and a poor prognosis, associated with large tumor size and involvement of nearby or major vessels.
Following a comprehensive literature search on liver carcinogenesis, we manually selected 12 of the 20 genes showing an inverse relationship between promoter methylation and gene expression. These comprised 10 genes down-regulated in tumors with a hyper-methylated promoter (IFITM1, SMAD6, TBX15, CHST4, LRRC4, PHYHD1, STEAP4, TACSTD2, NPC1L1, and THRSP) and 2 genes up-regulated in tumors with a hypo-methylated promoter (CCL20 and NQO1), which were further validated using Illumina MiSeq sequencing-based bisulfite sequencing PCR (MiSeq-BSP). Libraries for the 12 genes were prepared and individually barcoded for high-throughput paired-end sequencing using MiSeq2500 (Additional file 5: Material and Methods; Additional file 9: Table S5). Deep sequencing of individual PCR fragments was achieved in a cost-effective way (Additional file 11: Table S7). We found that 7 genes (IFITM1, SMAD6, TBX15, CHST4, LRRC4, CCL20, and NQO1) showed significantly different promoter methylation levels between tumor and non-tumor tissue (P value < 0.001) in approximately 80 % of the 78 HCCs (Fig. 2a). In addition, 20–40 % of the examined HCCs showed a difference of at least 0.2 in mean methylation values (corresponding to a 20 % difference in methylation) (Fig. 2b), indicating a highly tumor-specific promoter methylome change in these genes. We further performed supervised PCA on these 7 genes, which clearly separated tumors from non-tumors (Fig. 2c). However, the methylation status of these genes was not associated with any of the clinicopathologic findings, including histological differentiation and TNM stage (Fig. 2d).
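The 0.2 threshold on the difference in mean methylation can be illustrated with a short calculation of the fraction of tumor/non-tumor pairs that reach it for a given gene. The function name and the toy values are hypothetical, and the use of an absolute difference (so that both hyper- and hypo-methylation count) is an assumption of this sketch.

```python
def fraction_with_min_delta(tumor_means, normal_means, min_delta=0.2):
    """Fraction of pairs whose absolute mean-methylation difference
    between tumor and matched non-tumor tissue is at least min_delta.

    Methylation means are expected on a 0-1 scale, one value per pair.
    """
    if len(tumor_means) != len(normal_means) or not tumor_means:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        abs(t - n) >= min_delta for t, n in zip(tumor_means, normal_means)
    )
    return hits / len(tumor_means)


# Toy mean methylation values for one gene in four sample pairs
tumor = [0.9, 0.6, 0.1, 0.5]
normal = [0.5, 0.55, 0.4, 0.5]
print(fraction_with_min_delta(tumor, normal))
```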
Validation of candidate gene expression in HCCs
Among the genes that exhibited aberrant methylation in HCCs, SMAD6 has at least two transcript variants, the full-length variant 1 (NM_005585.4) and the short variant 2 (NM_001142861.2). Genomic sequence alignment suggested that the observed promoter hyper-methylation occurred in the shorter spliced form, which lacks one in-frame exon compared with the full-length variant 1 (Fig. 3a). We therefore chose a primer pair specific for variant 2 and examined its expression in the 8 HCC pairs assessed by the LHC-BS assay. qRT-PCR analysis confirmed reduced SMAD6 variant 2 mRNA expression in all examined tumors, indicating HCC-specific down-regulation of variant 2 (Fig. 3b).
Western blot analysis was further performed on the 8 HCC pairs to confirm the protein expression of the candidate genes, including IFITM1, CHST4, TBX15, LRRC4, and NQO1. Compared with adjacent non-tumor tissue, we observed reduced protein expression of IFITM1 in 6 of 8 tumors, lower TBX15 levels in 7 of 8 tumors, and decreased CHST4 levels in 5 of 8 tumors (Fig. 3c, d). However, we could not detect alterations in the protein expression of LRRC4 or NQO1 in HCC tissue (data not shown).
Demonstration of epigenetic regulation of candidate gene transcription via demethylation assays in cell lines
DNMT1 and DNMT3B, which belong to the DNA methyltransferase (DNMT) family, control DNA methylation. To further evaluate the impact of promoter methylation on gene expression, we utilized two cancer cell lines, namely HCT116 wild type and HCT116 DNMT1−/− DNMT3B−/− double-knockout (DKO) cells [18]. Five of the 6 genes examined (CHST4, IFITM1, TBX15, LRRC4, and SMAD6 variant 2) showed high promoter methylation levels (>80 %) in HCT116, while more than 50 % of their methylation was lost in DKO cells as a consequence of DNMT loss. Correspondingly, CCL20, CHST4, IFITM1, and SMAD6 variant 2 showed elevated expression in HCT116 DKO cells, as confirmed by qRT-PCR (Fig. 4a and Additional file 12: Table S8).
For hyper-methylated or transcriptionally silenced genes, the DNA demethylating agent 5-aza-2′-deoxycytidine (DAC) is known to restore gene expression [19, 20]. We therefore analyzed candidate gene expression upon DAC treatment in two immortalized non-tumor liver cell lines (QSG-7701 and HL-7702) and two HCC cell lines (HLE and HLF). For CCL20, we observed elevated expression upon DAC treatment in the HCC cell lines, but not in the non-tumor cell lines, suggesting that CCL20 methylation status may be specifically associated with HCC prognosis. In agreement with the observations made in the HCT116 DKO study, IFITM1 expression in HCC cell lines was restored after treatment with DAC (Fig. 4b). However, TBX15, LRRC4, and CHST4 showed no systematic difference in expression between HCC and non-tumor liver cell lines (Additional file 13: Figure S4). Nonetheless, these observations in cell lines do not exclude the possibility that the expression of TBX15 and CHST4 is altered in some patients with HCC due to promoter hyper-methylation, particularly as the Western blot analysis of the 8 paired samples described above revealed that TBX15 levels were reduced in 7 of 8 tumors and CHST4 levels were decreased in 5 of 8 tumors (Fig. 3c, d).
Taken together, our results suggest that SMAD6 variant 2, IFITM1, TBX15, and CHST4 may act as tumor suppressor genes (TSGs) in HCC that are silenced by promoter hyper-methylation, whereas CCL20 may be epigenetically activated in tumors through promoter hypo-methylation, and its elevated expression may be associated with a poor prognosis in HCC.