Landscape of the genetic variation of histone modification regulators in colon cancer
Figure 1 represents the work flow of this study. We retrieved four reviews on histone acetylation and methylation modification [4, 7, 8, 12], and a total of 88 acknowledged histone modification regulators, including 14 acetyltransferases, 18 deacetylases, 34 methyltransferases, and 22 demethylases, were identified and summarized for subsequent analysis (Additional file 1: Table S1). To clarify the role of histone modification regulators in patients with colon cancer, the gene expression profile of 884 samples of colon tumor and 60 samples of nonneoplastic mucosa were collected from the GSE39582 and TCGA-COAD datasets. The comprehensive landscape of the expression pattern, prognostic significance, and interactions between these modifiers were depicted in the network plot (Fig. 2A–D, left). Most regulators demonstrated significant differential transcriptional expression between tumor and normal tissues and were significantly correlated with relapse-free survival (RFS, Additional file 1: Table S2), indicating that histone modification may play a crucial role in the pathogenesis and progression of colon cancer. As for somatic mutations, we found that 237 of the 399 included samples (59.4%) demonstrated at least one altered histone modification regulator (defined as the total mutation rate); nevertheless, the mutation rate of most regulators was less than 10% (Fig. 2A–D, right). Interestingly, we noticed that the four regulators with the highest mutation frequency in colon cancer, namely KMT2D (64/399, 16.0%), KMT2B (52/399, 13.0%), KMT2C (48/399, 12.0%), and SETD1B (46/399, 11.5%), were all histone H3 lysine 4 (H3K4) methyltransferases that belonged to the “Complex of Proteins Associated with Set1” family [13]. The landscape of the mutation positions of these genes is displayed in Additional file 2: Fig. S1A–D. Furthermore, tumor microenvironment (TME) analyses revealed significantly higher immune cell infiltration abundance, especially for cytotoxic cells, CD8+ T cells, activated dendritic cells, and Th2 cells, in samples from patients with mutations in any one of KMT2D, KMT2B, KMT2C, or SETD1B than in samples from patients without these mutations (Additional file 2: Fig. S1E). Consistently, the Sankey plot (Additional file 2: Fig. S1F) also showed that the mutations of these four “Complex of Proteins Associated with Set1” family genes were mainly concentrated in patients with the high microsatellite instability (MSI-H), CMS1(consensus molecular subtypes) [14], C2 (pan-immune, TCGA) [15], or HM-indel (pan-GI, TCGA) [15] colon cancer subtypes, which mostly indicate immune activation phenotypes. We also studied the prevalence of histone modifier gene alterations across tumor types (Additional file 3: Fig. S2A–D and Additional file 1: Table S3). The total mutation rate of histone modifiers in colon cancer samples (59.4%) was slightly lower than the average (64.7%) and was only higher than that in the liver (47.3%) and breast cancer (39.1%) samples. Interestingly, besides colon cancer, KMT2D also demonstrated a relatively high alteration frequency (> 10%) in most other tumor types. However, SETD1B mutation was only concentrated in colon cancer samples and was not detected in most other tumor types. Finally, we explored the potential role of histone modification regulators in regulating chemoresponses by comparing the expression of each histone modification regulator between fluorouracil response and nonresponse groups using a two-sided Student’s t test in the dataset merged by GSE39582 and TCGA-COAD. As shown in Fig. 2E, most regulators presented significant differential expressions among the nonresponse and response subgroups of fluorouracil, indicating that these regulators may affect the efficacy of adjuvant chemotherapy in patients with colon cancer. Collectively, the above results indicated that expression alterations and genetic variations in histone modifiers were important factors contributing to tumor heterogeneity and were closely linked with the initiation, progression, and therapeutic effect of colon cancer.
Identification of histone modifier expression patterns and exploration of their clinical relevance
As histone modifications were reported to play a crucial role in the tumorigenesis and progression of colon cancer by causing abnormal epigenomic reprogramming [4], we aimed to evaluate whether the transcriptional profiling of the 88 acetylation and methylation regulators can help classify patients with colon cancer. A total of 1372 patients diagnosed with stage I–III colon cancer from 5 GEO datasets (GSE17536, GSE33113, GSE37892, GSE38832, and GSE39582) and the TCGA-COAD dataset were enrolled (Additional file 1: Tables S4, S5). Unsupervised K-means clustering analyses of the meta-GEO cohort (990 patients), TCGA-COAD cohort (382 patients), and integrated meta-GEO and TCGA-COAD cohort (1372 patients) were conducted. Results have provided four distinct expression patterns of histone modifiers (Additional file 4: Fig. S3A–C), and the compositions of histone modifiers in these clusters were similar among all three cohorts (Additional file 4: Fig. S3D), indicating that the existence of these four clusters was stable. We termed these clusters HMC1 (n = 546), HMC2 (n = 280), HMC3 (n = 247), and HMC4 (n = 299). Among them, HMC1 exhibited a high expression abundance of nearly all histone modifiers, indicating that the activity and turnover of histone acetylation and methylation were intense in HMC1, while the remaining three clusters showed the enrichment of partial regulators (Fig. 3A). The distribution of each histone modifiers in the four clusters is shown in Additional file 4: Fig. S3E, F. Specifically, we noticed that HMC4 was characterized by the prominent expression of regulators enriched in the fluorouracil-nonresponse subgroup (Fig. 3A). A survival analysis revealed that HMC4 had a significantly shorter RFS time than did HMC1, HMC2, and HMC3 (HMC4 vs. HMC1–3: hazard radio [HR] = 1.63, 95% confidence interval [CI] 1.23–2.16, Fig. 3B). As for overall survival (OS), HMC4 also exhibited a higher mortality risk than the remaining clusters; however, this difference was statistically insignificant (Fig. 3C–E). It should be noted that the negative correlation between HMC4 and OS obviously increased in patients who underwent adjuvant chemotherapy (GSE39582: HR 1.92, 95%CI 1.11–3.30; TCGA-COAD: HR 3.09, 95%CI 0.97–9.84), suggesting that this pattern may be associated with chemotherapy resistance (Fig. 3E). To validate this hypothesis, we analyzed the relationship between HMC clusters and chemotherapy responses in both GSE39582 and TCGA-COAD datasets. As shown in Fig. 3F–G, adjuvant chemotherapy conduction did not provide any survival benefits to patients in the HMC4 cluster in both GSE39582 (HR 1.26, 95%CI 0.66–2.38) and TCGA-COAD (HR 1.03, 95%CI 0.40–2.67) cohorts and the fluorouracil-response rate was also the lowest in patients in the HMC4 cluster. Taken together, these data imply that the histone modification clusters were significantly correlated with patients’ prognosis and chemotherapy benefit, which might provide new insights on colon cancer classification system.
Biological characteristics of different histone modifier expression patterns
To further characterize and understand the biological differences between these intrinsic histone modification phenotypes, we performed a gene set variation analysis (GSVA) based on the “Hallmarker” gene set (Fig. 4A–D). Results indicated that there are some similarities in the biological pathway activation between HMC1 and HMC2. For example, the activation levels of DNA repair-, E2F-, mTORC1-, and MYC-related pathways in HMC1 and HMC2 samples were significantly higher than those in HMC3 and HMC4. Interestingly, the G2M checkpoint pathway score was the highest in HMC1 but the lowest in HMC3 in both the meta-GEO and TCGA-COAD cohorts, suggesting that cell cycle disorder may be an important mechanism underlying the tumorigenesis of patients with HMC1. In addition, the protein secretion pathway was also significantly inhibited in HMC3 patients. In both meta-GEO and TCGA-COAD cohorts, HMC2 was enriched in multiple cell metabolism pathways, including glycolysis, fatty acid metabolism, and oxidative phosphorylation, suggesting that inhibition of metabolism may be a potential treatment strategy for HMC2 patients. Meanwhile, the Wnt signaling pathway and Hedgehog pathway, two key signaling pathways that are crucial for stem and progenitor cell homeostasis and function, were lowest in HMC2 patients, suggesting that the stemness feature of HMC2 is weaker than that of other histone modifier expression patterns. Particularly, HMC4 represented a stromal/mesenchymal phenotype with many enriched pathways, including epithelial–mesenchymal transition (EMT), hypoxia, and TGFβ signaling. Consistent with the results of GSVA, the protein expression levels of the molecular markers involved in EMT and TGFβ signaling were significantly higher in HMC4 than in the remaining clusters (Fig. 4G). Intriguingly, we found that the KRAS pathway showed the highest activation degree in HMC4, and the number of patients in the HMC4 cluster with KRAS mutant tumors was significantly higher than those with KRAS wild-type tumors (Fig. 4H). We further calculated the level of OLFM4+ stem cells and mesenchymal cells using signatures obtained from single-cell sequences proposed by Gao et al. [16] and analyzed their associations with histone modifier expression patterns. Results have confirmed that the stem cell and mesenchymal cell signatures both had the highest enrichment in patients of the HMC4 cluster (Fig. 4E, F). Finally, based on the molecular subtypes of the GSE39582 and TCGA-COAD cohorts (Fig. 4I), we found that most patients with the C4 (CIT) [17], C6 (CIT) [17], CMS4, TMEC2 [18], Sub 3 [19], and C6 (Pan-Immune, TCGA) [20] subtypes, which mostly represent stromal/mesenchymal phenotypes, were assigned to the HMC4 cluster. Overall, these results suggest that histone modifier expression patterns were characterized by distinct biological pathway activation status.
Immune landscapes of different histone modifier expression patterns
We subsequently explored differences in the immune landscapes among all histone modification clusters. A single-sample gene set enrichment analysis (ssGSEA) was performed to obtain the infiltration abundance of TME cells as described in the “Methods” section. As shown in Fig. 5A–D, the TME features of patients in the HMC1 cluster were close to those of an “immune-desert” phenotype characterized by little immune cell infiltration; nevertheless, both HMC2 and HMC3 displayed moderate immune infiltration. However, helper T cells, especially Th2 cells, and central memory T cells were markedly downregulated in HMC3 compared with those in the remaining clusters, suggesting that antigen recognition was repressed in HMC3. Compared with the remaining clusters, HMC4 was characterized by significant increases in stromal cell infiltration, such as fibroblasts and endothelial cells. Moreover, innate immune cells with immunosuppressive properties, such as macrophages, neutrophils, mast cells, and B cells, also had the highest infiltration rate in HMC4. It is noteworthy that we found that CD8+ T cells, which are considered marker cells of adaptive immunity, were more abundant in HMC4 than in the remaining clusters in the TCGA-COAD cohort. Previous studies have revealed that the immune-excluded TME phenotype was characterized by an abundance of immune cells, with these immune cells being retained in the stroma surrounding tumor cell nests rather than in the parenchyma [21]. Therefore, we speculated that the TME feature of patients in the HMC4 cluster might be classified as the feature of the “immune-excluded” phenotype. Subsequent analyses have revealed that patients in the HMC4 cluster had the highest T cell exhaustion [22], tertiary lymphoid structure signatures [23], and stromal cell infiltration intensity score [24] (Fig. 5E, F, and Additional file 5: Fig. S4). Moreover, we obtained intratumoral heterogeneity (ITH), tumor purity, tumor mutation burden (TMB), and the number of neoantigen results from the study of Thorsson et al. [15] and analyzed their distribution across histone modifier expression patterns. Consistent with our earlier findings, patients in the HMC4 cluster exhibited the highest ITH and lowest tumor purity (Fig. 5G, H). However, there were no significant differences in TMB and number of neoantigen among HMC clusters (Fig. 5I, J). In summary, these data proved that the HMC4 cluster was closely related to the “immune-excluded” phenotype that was characterized by enrichment of both immune and stromal cell types.
Histone modification score (HM_score) construction
Since patients in the HMC4 cluster had the worst prognosis and lowest fluorouracil-response rate, we believe that developing a scoring model capable of individually quantifying histone modification status to identify patients in the HMC4 cluster may offer potential clinical application value. Therefore, we recognized differentially expressed genes (DEGs) in these four clusters. A total of 1003 DEGs (801 upregulated and 202 downregulated) in HMC4 were identified (Fig. 6A and Additional file 1: Table S6). A gene ontology analysis of these DEGs revealed that the upregulated genes were enriched in biological processes related to mesenchyme development, stromal activation, and cell response to external stress, whereas the downregulated DEGs were enriched in items related to cell metabolic processes (Fig. 6A and Additional file 1: Table S7). Subsequently, the Boruta algorithm was applied to reduce the dimension of these DEGs (Methods section), and we ultimately screened out 155 genes to form a histone modification-related signature termed as the HM_score (Fig. 6B, C and Additional file 1: Table S8). The boxplots (Additional file 6: Fig. S5A, B, left) have shown that the median HM_score value was highest in the HMC4 cluster in both meta-GEO and TCGA-COAD cohorts. A receiver operating characteristic (ROC) curve analysis further demonstrated that HM_score was a reliable index to distinguish patients in the HMC4 cluster with an area under the ROC curve (AUC) of 0.94 and 0.95 in the meta-GEO dataset and in the TCGA-COAD dataset, respectively (Additional file 5: Fig. S4A, B, right). In conclusion, the above results strongly suggested that the HM_score can effectively distinguish HMC4 patients.
Clinical relevance and biological characteristics of HM_score
We subsequently explored the prognostic impacts and predictive value of therapeutic benefits of the HM_score. Patients were divided into low- or high-score subgroups according to the cutoff values determined by the “survminer” package. The survival analyses indicated that groups with low HM_score had a significantly high RFS (HR 1.77, 95%CI 1.37–2.29) in the meta-GEO cohort (Fig. 7A). Moreover, the HM_score was validated as an independent prognostic biomarker for evaluating patient relapse using a multivariate Cox regression model (HR 1.49, 95%CI 1.02–2.16, Fig. 7B) controlled for age, gender, tumor stage, and CMS subtype. Similarly, we also noticed that the positive correlation between HM_score value and mortality rate was statistically significant in the subgroup of patients receiving chemotherapy in both the GSE39582 (HR 1.88, 95%CI 1.39–2.53) and TCGA-COAD (HR 2.54, 95%CI 1.47–4.38) cohorts (Fig. 7C), while chemotherapy conduction was a risk factor for unfavorable prognosis in patients within a high HM_score group (GSE39582: HR 1.38, 95%CI 0.81–2.33; TCGA-COAD: HR 1.33, 95%CI 0.52–3.36; Fig. 7D). The following boxplots also showed that HM_score was significantly higher in the fluorouracil-nonresponse and CMS4 subtype groups than in the remaining groups (Fig. 7E, F). GSVA and immune analyses demonstrated that the HM_score was markedly positively correlated with stromal activation processes and stromal cell infiltration, which is consistent with the results of the HMC4 cluster analysis (Fig. 7G, H). To confirm the clinical value and biological implication of the HM_score, we obtained bulk RNA-sequencing data from 30 additional patients with colon cancer from the Sun Yat-sen University Cancer Center (SYSUCC) as an external dataset (Additional file 1: Table S5). Patients were also grouped into the HMC4 and non-HMC4 using the nearest template prediction algorithm (GenePattern module “NTP,” https://cloud.genepattern.org) based on the DEGs in HMC4 we earlier identified. Consistent with the results of the meta-GEO and TCGA-COAD databases, the median HM_score value was significantly higher in the HMC4 than in the non-HMC4 cluster in the SYSUCC cohort (Additional file 5: Fig. S4C, left), and HM_score defined the HMC4 patterns with an AUC of 0.98 according to ROC curve analysis (Additional file 6: Fig. S5C, right). Furthermore, HM_score was also significantly higher in the fluorouracil-nonresponse and CMS4 subtype groups than in other groups (Fig. 7I), and there were strong positive associations between HM_score and stroma-relevant signatures (Fig. 7J) and stroma cell infiltration (Fig. 7K) in the SYSUCC cohort. The above results revealed that HM_score was a useful biomarker that could effectively predict survival and chemotherapy benefit in colon cancer patients and may offer potential clinical application value.
Whole-genome CRISPR screen reveals ZEB2 as the candidate driver gene for fluorouracil resistance in patients within HMC4 cluster
To identify critical genes driving fluorouracil resistance in patients of the HMC4 cluster, we performed CRISPR-based genome-wide loss-of-function screening in SW480 cells, using 2 μg/mL of fluorouracil as an effective selection pressure (Fig. 8A). From this screen, we discovered a subset of sgRNAs targeting 166 genes were significantly enriched in the fluorouracil-treated cells when compared to the vehicle control. These genes were identified as potential drivers for fluorouracil resistance (Fig. 8B). Moreover, among these genes, eight were highly expressed in patients in the HMC4 cluster (Fig. 8B, C). ZEB2, whose sgRNA was decreased the most in fluorouracil-treated populations, gained our attention. Subsequent analyses confirmed that patients in the HMC4 cluster in both the meta-GEO and TCGA-COAD cohorts had the highest ZEB2 mRNA expression distribution (Fig. 8D). There was also a strong positive correlation between ZEB2 transcriptional expression and HM_score in the SYSUCC cohort (Fig. 8E). A clinical relevance analysis has suggested that ZEB2 expression reflected the prognostic value of both RFS and OS, especially in patients who underwent adjuvant chemotherapy (Fig. 8G). The analysis also revealed that only patients in the low-ZEB2 group could significantly benefit from adjuvant chemotherapy (GSE39582: HR 0.53, 95%CI 0.32–0.88; TCGA-COAD: HR 0.29, 95%CI 0.08–0.98, Fig. 8G). The boxplots in Fig. 8F and I showed that ZEB2 mRNA expression was significantly elevated in the fluorouracil-nonresponse and CMS4 subtype groups. Pathway and immune analyses confirmed that the activation level of stroma pathways and stromal cell infiltration significantly increased as ZEB2 expressions increased (Fig. 8H).
To validate the CRISPR/Cas9 library screening results, we transfected siRNAs targeting ZEB2 and the ZEB2 overexpression plasmid in vitro into SW480 and HCT116, respectively, and performed a methyl-thiazolyl-tetrazolium (MTT) assay. As shown in Fig. 8K and Additional file 7: Fig. S6, the cell viability of the ZEB2 silencing group was significantly inhibited, while the cell viability of the ZEB2 overexpression group was significantly enhanced compared with the control group in each fluorouracil concentration gradient we tested. These data suggested that the cytotoxicity of fluorouracil to tumor cells was influenced by the transcriptional abundance of ZEB2. Interestingly, we uncovered several potential histone modification regulators of ZEB2 by differential expression analysis between the high- and low-ZEB2 groups (Fig. 8J), such as HDAC10, HDAC9, and KAT2B, indicating that these regulators might regulate ZEB2 expression in patients with HMC4. Collectively, based on the results from the analysis of these real-world cohorts, we are confident that the ZEB2 found by CRISPR library screening was indeed the core gene mediating chemoresistance in HMC4 patients.
The Connectivity Map analysis identifies potential molecular targets and compounds capable of reversing transcriptional characteristics in patients with HMC4
To identify candidate molecular targets and compounds that may be options to achieve chemosensitization in patients with HMC4, we analyzed the Connectivity Map project. Briefly, 147 significantly enriched molecular targets (Additional file 1: Table S9) and 91 compounds (Additional file 1: Table S10) were identified, and those identified in at least two cohorts were presented in the heatmap (Fig. 8L). Among these candidate molecular targets, nine were significantly enriched in all three cohorts, and five of them (TP53BP1 [25], RIPK2 [26], EHMT2 [27], IGFBP3 [28], and HMOX1 [29]) have been reported to have cancer- and chemoresistance-promoting activities simultaneously. Particularly, although EHMT2, a member of the histone methyltransferase family, was identified in all three cohorts, the transcription level of EHMT2 was significantly higher in the fluorouracil-response group (Fig. 2E). Of the 91 compounds, 13 were significantly enriched in all three cohorts. A mode-of-action analysis of these 13 compounds revealed 11 shared action mechanisms (Fig. 8M). Additionally, 2 compounds (SJ-172550 and RITA) shared the mode-of-action as the MDM inhibitor (Fig. 8M). The mode-of-action of the PKC inhibitor was also found in two other compounds (Fig. 8M). These findings provided a new perspective for developing effective chemosensitizing treatment strategies in HMC4 patients.