Alterations in histone modifications have been reported to be related to tumorigenicity and tumor progression. However, whether histone modification can aid the classification of patients or influence clinical behavior in patients with colon cancer remains unclear. Therefore, this study aimed to evaluate histone modifier expression patterns using the unsupervised clustering of the transcriptomic expressions of 88 histone acetylation and methylation regulators.
In this study, by consensus clustering analysis based on the transcriptome data of 88 histone modification regulators, we identified four distinct expression patterns of histone modifiers associated with different prognoses, intrinsic fluorouracil sensitivities, biological pathways, and tumor microenvironment characteristics among 1372 colon cancer samples. In these four clusters, the HMC4 cluster represented a stroma activation phenotype characterized by both the worst prognosis and lowest response rates to fluorouracil treatment. Then, we established a scoring scheme comprising 155 genes designated as “HM_score” by using the Boruta algorithm to distinguish colon cancer patients within the HMC4 cluster. Patients with a high HM_score were considered to have high stromal pathway activation, high stromal fraction, and an unfavorable prognosis. Further analyses indicated that a high HM_score also correlated with reduced therapeutic benefits from fluorouracil chemotherapy. Moreover, through CRISPR library screening, ZEB2 was found to be a critical driver gene that mediates fluorouracil resistance, which is associated with histone modifier expression patterns.
This study highlights that characterizing histone modifier expression patterns may help better understand the epigenetic mechanisms underlying tumor heterogeneity in patients with colon cancer and provide more personalized therapeutic strategies.
Colon cancer remains a major source of morbidity and mortality worldwide .
Similar to many other malignancies, Colon cancer is also a heterogeneous disease with distinct molecular properties, resulting in diverse clinical outcomes [2, 3]
Although several molecular classification strategies have been proposed to characterize distinct biological properties in colon cancer , more effective and clinically accessible classifiers remain to be explored.
Histone modification is an important epigenetic method used to regulate chromatin structure, DNA repair, and gene expression. It plays a crucial role in oncogenic transformation and variations in therapeutic responses . Although many types of histone modification have been reported so far, acetylation and methylation are the two most well-studied types , and there are functional interactions between them . Histone acetylation has been recognized as a fundamental process that regulates gene transcriptional activation by neutralizing the positive charge at unmodified lysine residues to diminish the electrostatic affinity between DNA and histones to enable transcription factors to more easily bind to the promoter region . It is also a dynamic and reversible process regulated by two kinds of enzymes with opposite effects: acetyltransferases (acetyl group transfer onto lysine residues) and deacetylases (acetyl group removal from lysine residues). Similarly, histone methylation is also tightly controlled by several methyltransferases (methyl group transfer onto lysine residues) and demethylase enzymes (methyl group removal from lysine residues) that function in concert to transfer and remove specific methyl groups critical for gene expression, cell fate, and genomic stability [7, 8]. However, compared with acetylation, histone methylation is more complex and subtle and is considered to be the most stable and inheritable chromatin modification form of all histone modifications [7, 8].
It has been widely reported that alterations in histone acetylation and methylation patterns and their interactions are linked with the initiation and progression of colon cancer [9,10,11]. However, most studies were conducted on one or two histone modification regulators due to technical limitations. The global effect of these regulators on biological outcomes and whether their interactions help classify patients from the perspective of histone modification in colon cancer remains unclear. Therefore, in this study, we comprehensively evaluated histone modifier expression patterns by clustering the transcriptomic expressions of 88 histone acetylation and methylation regulators in an unsupervised manner in an integrated cohort comprising 1372 patients with colon cancer from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Moreover, we established a scoring scheme capable of individually quantifying histone modification status and predicting the clinical outcomes and fluorouracil (the basic drug of adjuvant chemotherapy for colon cancer) responses, designated as “HM_score.” Moreover, by performing a genome-wide screening of the “Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)” library, we demonstrated that ZEB2 acts as a driver gene mediating the fluorouracil resistance related to histone modifier expression patterns.
Landscape of the genetic variation of histone modification regulators in colon cancer
Figure 1 represents the work flow of this study. We retrieved four reviews on histone acetylation and methylation modification [4, 7, 8, 12], and a total of 88 acknowledged histone modification regulators, including 14 acetyltransferases, 18 deacetylases, 34 methyltransferases, and 22 demethylases, were identified and summarized for subsequent analysis (Additional file 1: Table S1). To clarify the role of histone modification regulators in patients with colon cancer, the gene expression profile of 884 samples of colon tumor and 60 samples of nonneoplastic mucosa were collected from the GSE39582 and TCGA-COAD datasets. The comprehensive landscape of the expression pattern, prognostic significance, and interactions between these modifiers were depicted in the network plot (Fig. 2A–D, left). Most regulators demonstrated significant differential transcriptional expression between tumor and normal tissues and were significantly correlated with relapse-free survival (RFS, Additional file 1: Table S2), indicating that histone modification may play a crucial role in the pathogenesis and progression of colon cancer. As for somatic mutations, we found that 237 of the 399 included samples (59.4%) demonstrated at least one altered histone modification regulator (defined as the total mutation rate); nevertheless, the mutation rate of most regulators was less than 10% (Fig. 2A–D, right). Interestingly, we noticed that the four regulators with the highest mutation frequency in colon cancer, namely KMT2D (64/399, 16.0%), KMT2B (52/399, 13.0%), KMT2C (48/399, 12.0%), and SETD1B (46/399, 11.5%), were all histone H3 lysine 4 (H3K4) methyltransferases that belonged to the “Complex of Proteins Associated with Set1” family . The landscape of the mutation positions of these genes is displayed in Additional file 2: Fig. S1A–D. Furthermore, tumor microenvironment (TME) analyses revealed significantly higher immune cell infiltration abundance, especially for cytotoxic cells, CD8+ T cells, activated dendritic cells, and Th2 cells, in samples from patients with mutations in any one of KMT2D, KMT2B, KMT2C, or SETD1B than in samples from patients without these mutations (Additional file 2: Fig. S1E). Consistently, the Sankey plot (Additional file 2: Fig. S1F) also showed that the mutations of these four “Complex of Proteins Associated with Set1” family genes were mainly concentrated in patients with the high microsatellite instability (MSI-H), CMS1(consensus molecular subtypes) , C2 (pan-immune, TCGA) , or HM-indel (pan-GI, TCGA)  colon cancer subtypes, which mostly indicate immune activation phenotypes. We also studied the prevalence of histone modifier gene alterations across tumor types (Additional file 3: Fig. S2A–D and Additional file 1: Table S3). The total mutation rate of histone modifiers in colon cancer samples (59.4%) was slightly lower than the average (64.7%) and was only higher than that in the liver (47.3%) and breast cancer (39.1%) samples. Interestingly, besides colon cancer, KMT2D also demonstrated a relatively high alteration frequency (> 10%) in most other tumor types. However, SETD1B mutation was only concentrated in colon cancer samples and was not detected in most other tumor types. Finally, we explored the potential role of histone modification regulators in regulating chemoresponses by comparing the expression of each histone modification regulator between fluorouracil response and nonresponse groups using a two-sided Student’s t test in the dataset merged by GSE39582 and TCGA-COAD. As shown in Fig. 2E, most regulators presented significant differential expressions among the nonresponse and response subgroups of fluorouracil, indicating that these regulators may affect the efficacy of adjuvant chemotherapy in patients with colon cancer. Collectively, the above results indicated that expression alterations and genetic variations in histone modifiers were important factors contributing to tumor heterogeneity and were closely linked with the initiation, progression, and therapeutic effect of colon cancer.
Identification of histone modifier expression patterns and exploration of their clinical relevance
As histone modifications were reported to play a crucial role in the tumorigenesis and progression of colon cancer by causing abnormal epigenomic reprogramming , we aimed to evaluate whether the transcriptional profiling of the 88 acetylation and methylation regulators can help classify patients with colon cancer. A total of 1372 patients diagnosed with stage I–III colon cancer from 5 GEO datasets (GSE17536, GSE33113, GSE37892, GSE38832, and GSE39582) and the TCGA-COAD dataset were enrolled (Additional file 1: Tables S4, S5). Unsupervised K-means clustering analyses of the meta-GEO cohort (990 patients), TCGA-COAD cohort (382 patients), and integrated meta-GEO and TCGA-COAD cohort (1372 patients) were conducted. Results have provided four distinct expression patterns of histone modifiers (Additional file 4: Fig. S3A–C), and the compositions of histone modifiers in these clusters were similar among all three cohorts (Additional file 4: Fig. S3D), indicating that the existence of these four clusters was stable. We termed these clusters HMC1 (n = 546), HMC2 (n = 280), HMC3 (n = 247), and HMC4 (n = 299). Among them, HMC1 exhibited a high expression abundance of nearly all histone modifiers, indicating that the activity and turnover of histone acetylation and methylation were intense in HMC1, while the remaining three clusters showed the enrichment of partial regulators (Fig. 3A). The distribution of each histone modifiers in the four clusters is shown in Additional file 4: Fig. S3E, F. Specifically, we noticed that HMC4 was characterized by the prominent expression of regulators enriched in the fluorouracil-nonresponse subgroup (Fig. 3A). A survival analysis revealed that HMC4 had a significantly shorter RFS time than did HMC1, HMC2, and HMC3 (HMC4 vs. HMC1–3: hazard radio [HR] = 1.63, 95% confidence interval [CI] 1.23–2.16, Fig. 3B). As for overall survival (OS), HMC4 also exhibited a higher mortality risk than the remaining clusters; however, this difference was statistically insignificant (Fig. 3C–E). It should be noted that the negative correlation between HMC4 and OS obviously increased in patients who underwent adjuvant chemotherapy (GSE39582: HR 1.92, 95%CI 1.11–3.30; TCGA-COAD: HR 3.09, 95%CI 0.97–9.84), suggesting that this pattern may be associated with chemotherapy resistance (Fig. 3E). To validate this hypothesis, we analyzed the relationship between HMC clusters and chemotherapy responses in both GSE39582 and TCGA-COAD datasets. As shown in Fig. 3F–G, adjuvant chemotherapy conduction did not provide any survival benefits to patients in the HMC4 cluster in both GSE39582 (HR 1.26, 95%CI 0.66–2.38) and TCGA-COAD (HR 1.03, 95%CI 0.40–2.67) cohorts and the fluorouracil-response rate was also the lowest in patients in the HMC4 cluster. Taken together, these data imply that the histone modification clusters were significantly correlated with patients’ prognosis and chemotherapy benefit, which might provide new insights on colon cancer classification system.
Biological characteristics of different histone modifier expression patterns
To further characterize and understand the biological differences between these intrinsic histone modification phenotypes, we performed a gene set variation analysis (GSVA) based on the “Hallmarker” gene set (Fig. 4A–D). Results indicated that there are some similarities in the biological pathway activation between HMC1 and HMC2. For example, the activation levels of DNA repair-, E2F-, mTORC1-, and MYC-related pathways in HMC1 and HMC2 samples were significantly higher than those in HMC3 and HMC4. Interestingly, the G2M checkpoint pathway score was the highest in HMC1 but the lowest in HMC3 in both the meta-GEO and TCGA-COAD cohorts, suggesting that cell cycle disorder may be an important mechanism underlying the tumorigenesis of patients with HMC1. In addition, the protein secretion pathway was also significantly inhibited in HMC3 patients. In both meta-GEO and TCGA-COAD cohorts, HMC2 was enriched in multiple cell metabolism pathways, including glycolysis, fatty acid metabolism, and oxidative phosphorylation, suggesting that inhibition of metabolism may be a potential treatment strategy for HMC2 patients. Meanwhile, the Wnt signaling pathway and Hedgehog pathway, two key signaling pathways that are crucial for stem and progenitor cell homeostasis and function, were lowest in HMC2 patients, suggesting that the stemness feature of HMC2 is weaker than that of other histone modifier expression patterns. Particularly, HMC4 represented a stromal/mesenchymal phenotype with many enriched pathways, including epithelial–mesenchymal transition (EMT), hypoxia, and TGFβ signaling. Consistent with the results of GSVA, the protein expression levels of the molecular markers involved in EMT and TGFβ signaling were significantly higher in HMC4 than in the remaining clusters (Fig. 4G). Intriguingly, we found that the KRAS pathway showed the highest activation degree in HMC4, and the number of patients in the HMC4 cluster with KRAS mutant tumors was significantly higher than those with KRAS wild-type tumors (Fig. 4H). We further calculated the level of OLFM4+ stem cells and mesenchymal cells using signatures obtained from single-cell sequences proposed by Gao et al.  and analyzed their associations with histone modifier expression patterns. Results have confirmed that the stem cell and mesenchymal cell signatures both had the highest enrichment in patients of the HMC4 cluster (Fig. 4E, F). Finally, based on the molecular subtypes of the GSE39582 and TCGA-COAD cohorts (Fig. 4I), we found that most patients with the C4 (CIT) , C6 (CIT) , CMS4, TMEC2 , Sub 3 , and C6 (Pan-Immune, TCGA)  subtypes, which mostly represent stromal/mesenchymal phenotypes, were assigned to the HMC4 cluster. Overall, these results suggest that histone modifier expression patterns were characterized by distinct biological pathway activation status.
Immune landscapes of different histone modifier expression patterns
We subsequently explored differences in the immune landscapes among all histone modification clusters. A single-sample gene set enrichment analysis (ssGSEA) was performed to obtain the infiltration abundance of TME cells as described in the “Methods” section. As shown in Fig. 5A–D, the TME features of patients in the HMC1 cluster were close to those of an “immune-desert” phenotype characterized by little immune cell infiltration; nevertheless, both HMC2 and HMC3 displayed moderate immune infiltration. However, helper T cells, especially Th2 cells, and central memory T cells were markedly downregulated in HMC3 compared with those in the remaining clusters, suggesting that antigen recognition was repressed in HMC3. Compared with the remaining clusters, HMC4 was characterized by significant increases in stromal cell infiltration, such as fibroblasts and endothelial cells. Moreover, innate immune cells with immunosuppressive properties, such as macrophages, neutrophils, mast cells, and B cells, also had the highest infiltration rate in HMC4. It is noteworthy that we found that CD8+ T cells, which are considered marker cells of adaptive immunity, were more abundant in HMC4 than in the remaining clusters in the TCGA-COAD cohort. Previous studies have revealed that the immune-excluded TME phenotype was characterized by an abundance of immune cells, with these immune cells being retained in the stroma surrounding tumor cell nests rather than in the parenchyma . Therefore, we speculated that the TME feature of patients in the HMC4 cluster might be classified as the feature of the “immune-excluded” phenotype. Subsequent analyses have revealed that patients in the HMC4 cluster had the highest T cell exhaustion , tertiary lymphoid structure signatures , and stromal cell infiltration intensity score  (Fig. 5E, F, and Additional file 5: Fig. S4). Moreover, we obtained intratumoral heterogeneity (ITH), tumor purity, tumor mutation burden (TMB), and the number of neoantigen results from the study of Thorsson et al.  and analyzed their distribution across histone modifier expression patterns. Consistent with our earlier findings, patients in the HMC4 cluster exhibited the highest ITH and lowest tumor purity (Fig. 5G, H). However, there were no significant differences in TMB and number of neoantigen among HMC clusters (Fig. 5I, J). In summary, these data proved that the HMC4 cluster was closely related to the “immune-excluded” phenotype that was characterized by enrichment of both immune and stromal cell types.
Histone modification score (HM_score) construction
Since patients in the HMC4 cluster had the worst prognosis and lowest fluorouracil-response rate, we believe that developing a scoring model capable of individually quantifying histone modification status to identify patients in the HMC4 cluster may offer potential clinical application value. Therefore, we recognized differentially expressed genes (DEGs) in these four clusters. A total of 1003 DEGs (801 upregulated and 202 downregulated) in HMC4 were identified (Fig. 6A and Additional file 1: Table S6). A gene ontology analysis of these DEGs revealed that the upregulated genes were enriched in biological processes related to mesenchyme development, stromal activation, and cell response to external stress, whereas the downregulated DEGs were enriched in items related to cell metabolic processes (Fig. 6A and Additional file 1: Table S7). Subsequently, the Boruta algorithm was applied to reduce the dimension of these DEGs (Methods section), and we ultimately screened out 155 genes to form a histone modification-related signature termed as the HM_score (Fig. 6B, C and Additional file 1: Table S8). The boxplots (Additional file 6: Fig. S5A, B, left) have shown that the median HM_score value was highest in the HMC4 cluster in both meta-GEO and TCGA-COAD cohorts. A receiver operating characteristic (ROC) curve analysis further demonstrated that HM_score was a reliable index to distinguish patients in the HMC4 cluster with an area under the ROC curve (AUC) of 0.94 and 0.95 in the meta-GEO dataset and in the TCGA-COAD dataset, respectively (Additional file 5: Fig. S4A, B, right). In conclusion, the above results strongly suggested that the HM_score can effectively distinguish HMC4 patients.
Clinical relevance and biological characteristics of HM_score
We subsequently explored the prognostic impacts and predictive value of therapeutic benefits of the HM_score. Patients were divided into low- or high-score subgroups according to the cutoff values determined by the “survminer” package. The survival analyses indicated that groups with low HM_score had a significantly high RFS (HR 1.77, 95%CI 1.37–2.29) in the meta-GEO cohort (Fig. 7A). Moreover, the HM_score was validated as an independent prognostic biomarker for evaluating patient relapse using a multivariate Cox regression model (HR 1.49, 95%CI 1.02–2.16, Fig. 7B) controlled for age, gender, tumor stage, and CMS subtype. Similarly, we also noticed that the positive correlation between HM_score value and mortality rate was statistically significant in the subgroup of patients receiving chemotherapy in both the GSE39582 (HR 1.88, 95%CI 1.39–2.53) and TCGA-COAD (HR 2.54, 95%CI 1.47–4.38) cohorts (Fig. 7C), while chemotherapy conduction was a risk factor for unfavorable prognosis in patients within a high HM_score group (GSE39582: HR 1.38, 95%CI 0.81–2.33; TCGA-COAD: HR 1.33, 95%CI 0.52–3.36; Fig. 7D). The following boxplots also showed that HM_score was significantly higher in the fluorouracil-nonresponse and CMS4 subtype groups than in the remaining groups (Fig. 7E, F). GSVA and immune analyses demonstrated that the HM_score was markedly positively correlated with stromal activation processes and stromal cell infiltration, which is consistent with the results of the HMC4 cluster analysis (Fig. 7G, H). To confirm the clinical value and biological implication of the HM_score, we obtained bulk RNA-sequencing data from 30 additional patients with colon cancer from the Sun Yat-sen University Cancer Center (SYSUCC) as an external dataset (Additional file 1: Table S5). Patients were also grouped into the HMC4 and non-HMC4 using the nearest template prediction algorithm (GenePattern module “NTP,” https://cloud.genepattern.org) based on the DEGs in HMC4 we earlier identified. Consistent with the results of the meta-GEO and TCGA-COAD databases, the median HM_score value was significantly higher in the HMC4 than in the non-HMC4 cluster in the SYSUCC cohort (Additional file 5: Fig. S4C, left), and HM_score defined the HMC4 patterns with an AUC of 0.98 according to ROC curve analysis (Additional file 6: Fig. S5C, right). Furthermore, HM_score was also significantly higher in the fluorouracil-nonresponse and CMS4 subtype groups than in other groups (Fig. 7I), and there were strong positive associations between HM_score and stroma-relevant signatures (Fig. 7J) and stroma cell infiltration (Fig. 7K) in the SYSUCC cohort. The above results revealed that HM_score was a useful biomarker that could effectively predict survival and chemotherapy benefit in colon cancer patients and may offer potential clinical application value.
Whole-genome CRISPR screen reveals ZEB2 as the candidate driver gene for fluorouracil resistance in patients within HMC4 cluster
To identify critical genes driving fluorouracil resistance in patients of the HMC4 cluster, we performed CRISPR-based genome-wide loss-of-function screening in SW480 cells, using 2 μg/mL of fluorouracil as an effective selection pressure (Fig. 8A). From this screen, we discovered a subset of sgRNAs targeting 166 genes were significantly enriched in the fluorouracil-treated cells when compared to the vehicle control. These genes were identified as potential drivers for fluorouracil resistance (Fig. 8B). Moreover, among these genes, eight were highly expressed in patients in the HMC4 cluster (Fig. 8B, C). ZEB2, whose sgRNA was decreased the most in fluorouracil-treated populations, gained our attention. Subsequent analyses confirmed that patients in the HMC4 cluster in both the meta-GEO and TCGA-COAD cohorts had the highest ZEB2 mRNA expression distribution (Fig. 8D). There was also a strong positive correlation between ZEB2 transcriptional expression and HM_score in the SYSUCC cohort (Fig. 8E). A clinical relevance analysis has suggested that ZEB2 expression reflected the prognostic value of both RFS and OS, especially in patients who underwent adjuvant chemotherapy (Fig. 8G). The analysis also revealed that only patients in the low-ZEB2 group could significantly benefit from adjuvant chemotherapy (GSE39582: HR 0.53, 95%CI 0.32–0.88; TCGA-COAD: HR 0.29, 95%CI 0.08–0.98, Fig. 8G). The boxplots in Fig. 8F and I showed that ZEB2 mRNA expression was significantly elevated in the fluorouracil-nonresponse and CMS4 subtype groups. Pathway and immune analyses confirmed that the activation level of stroma pathways and stromal cell infiltration significantly increased as ZEB2 expressions increased (Fig. 8H).
To validate the CRISPR/Cas9 library screening results, we transfected siRNAs targeting ZEB2 and the ZEB2 overexpression plasmid in vitro into SW480 and HCT116, respectively, and performed a methyl-thiazolyl-tetrazolium (MTT) assay. As shown in Fig. 8K and Additional file 7: Fig. S6, the cell viability of the ZEB2 silencing group was significantly inhibited, while the cell viability of the ZEB2 overexpression group was significantly enhanced compared with the control group in each fluorouracil concentration gradient we tested. These data suggested that the cytotoxicity of fluorouracil to tumor cells was influenced by the transcriptional abundance of ZEB2. Interestingly, we uncovered several potential histone modification regulators of ZEB2 by differential expression analysis between the high- and low-ZEB2 groups (Fig. 8J), such as HDAC10, HDAC9, and KAT2B, indicating that these regulators might regulate ZEB2 expression in patients with HMC4. Collectively, based on the results from the analysis of these real-world cohorts, we are confident that the ZEB2 found by CRISPR library screening was indeed the core gene mediating chemoresistance in HMC4 patients.
The Connectivity Map analysis identifies potential molecular targets and compounds capable of reversing transcriptional characteristics in patients with HMC4
To identify candidate molecular targets and compounds that may be options to achieve chemosensitization in patients with HMC4, we analyzed the Connectivity Map project. Briefly, 147 significantly enriched molecular targets (Additional file 1: Table S9) and 91 compounds (Additional file 1: Table S10) were identified, and those identified in at least two cohorts were presented in the heatmap (Fig. 8L). Among these candidate molecular targets, nine were significantly enriched in all three cohorts, and five of them (TP53BP1 , RIPK2 , EHMT2 , IGFBP3 , and HMOX1 ) have been reported to have cancer- and chemoresistance-promoting activities simultaneously. Particularly, although EHMT2, a member of the histone methyltransferase family, was identified in all three cohorts, the transcription level of EHMT2 was significantly higher in the fluorouracil-response group (Fig. 2E). Of the 91 compounds, 13 were significantly enriched in all three cohorts. A mode-of-action analysis of these 13 compounds revealed 11 shared action mechanisms (Fig. 8M). Additionally, 2 compounds (SJ-172550 and RITA) shared the mode-of-action as the MDM inhibitor (Fig. 8M). The mode-of-action of the PKC inhibitor was also found in two other compounds (Fig. 8M). These findings provided a new perspective for developing effective chemosensitizing treatment strategies in HMC4 patients.
Histone modifications serve as regulatory markers that are essential to control transcription and architecture. Although histone modification deregulation (particularly the well-studied deregulation of acetylation and methylation modifications) has been widely reported to be vital epigenetic mechanisms underlying cancer progression , the correlation between the global profiling of histone modification regulator patterns and tumor heterogeneity due to pathway activation or TME infiltration has not been comprehensively recognized.
In this study, by the consensus clustering and analysis of the transcriptome data of 88 histone acetylation and methylation regulators, we, for the first time, identified four distinct different histone modifier expression patterns that were associated with different clinical outcomes, biological pathways, and TME characteristics. Each cluster was enriched by multiple regulators involved in acetylation or methylation processes, implying that these histone modification regulators contribute to the heterogeneous progression of colon cancer in a highly coordinated manner. Among the four clusters provided in this study, HMC4 gained our attention the most as it had the worst prognosis and lowest fluorouracil-response rate. Based on the functional and TME analyses, we observed that patients in the HMC4 cluster had the highest activation levels of EMT, TGFβ signals, and hypoxia pathways and the highest infiltration of stromal cells and immunosuppressive innate immune cells. HMC4 collectively harbored stromal/mesenchymal properties, which can explain the poor prognosis of patients in this cluster. This result is consistent with that of our previous studies, which stated that the stromal pathway activation level is a core determinant of negative chemotherapy outcomes in patients with colon cancer [18, 19]. Intriguingly, patients in the HMC4 cluster demonstrated higher KRAS mutation and KRAS signaling enrichment incidences than patients in other clusters. Since growing evidence has acknowledged the association between KRAS mutation and the adverse prognostic impacts on patients with colon cancer treated with fluorouracil-based chemotherapy [30, 31], we can conclude that the diminished benefits of chemotherapy in patients within the HMC4 cluster also resulted in the combined effects of KRAS mutation and KRAS signaling dysregulation.
Considering the special clinical features of patients in the HMC4 cluster, there is a need to develop a scoring scheme that can individually quantify the histone modification status to distinguish HMC4 from other histone modification subtypes. By applying the dimension reduction method, we successfully established a transcriptome-based quantification system named the “HM_score” to define HMC4 patterns with high accuracy. This finding validated that the abnormal transcriptional activation of oncogenes or, conversely, the repression of tumor suppressors is the main histone modification mechanism underlying tumor heterogeneity progression [4, 32]. Clinical analyses further highlighted that HM_score is an independent prognostic factor for colon cancer and associated with chemotherapeutic responses. This finding also verified our hypothesis that histone modifier expression patterns could be applied in clinical practice to guide therapeutic strategies more precisely for individual patients.
In addition to exploring the implications of the histone modification pattern on colon cancer treatment strategies, we also performed a genome-wide CRISPR screen and identified that ZEB2 is a potential driver gene contributing to the drug resistance in the background of histone modification alterations. ZEB2 is a known EMT regulator whose promoter experiences dynamic histone mark changes during cell transition toward mesenchymal features in response to EMT inducers, such as TGFβ [33,34,35,36]. Currently, several histone modification regulators, including DOT1L , DNMT1 , EZH2 , and KDM5B , have been reported to play a role in the TGFβ-stimulated ZEB2 transcriptional upregulation of many cancers. To the best of our knowledge, this study is the first to report an association between ZEB2 expression and global histone modifier expression patterns. Future studies to further elucidate the exact mechanism underlying the ZEB2-related histone modification process may be helpful to develop novel cancer therapies. However, PKC activation is involved in histone modification-dependent ZEB2 expression and EMT processes. Additionally, pan-PKC inhibitors suppress EMT by promoting the DNMT1-induced histone methylation of ZEB2 . Coincidentally, through a Connectivity Map analysis, two types of PKC inhibitors were significantly enriched and consequently considered as compounds for the chemosensitization of patients in the HMC4 cluster. Accordingly, systematic preclinical studies investigating the efficiency of PKC inhibitors as combined targeted therapies for patients in the HMC4 cluster are warranted.
Although our study is the first to establish molecular subtypes based on histone modification regulators, providing new insights on the epigenetic mechanisms underlying colon cancer heterogeneity, this study has some limitations. First, we only collected and analyzed regulators involved in histone acetylation and methylation. Some less prevalent or newly reported histone modification types, such as crotonylation , propionylation , butyrylation , and β-hydroxybutyrylation , are also reported to be linked with cancer. Second, we only focused on the transcriptional levels of histone acetylation and methylation regulators and did not integrate other omics data affecting gene expression to classify patients, such as copy number variations, gene mutations, and DNA methylation, meaning that the subtypes analyzed in this study are biased. Third, the various histone modification residues, which were also determinants of the biological functions of histone modifications , were not included in this study. Fourth, the method of interpreting the HM_score and the appropriate cutoff values need to be standardized to ensure that the role of this scoring model can be validated in future prospective studies. Last but not least, since the acquisition of ZEB2 comes from CRISPR screening of cell lines in vitro, its role in vivo still needs to be further verified through prospective clinical trials.
In conclusion, this study comprehensively evaluated the clinical behavior, molecular, and genetic factors associated with histone modifier expression patterns and consequently demonstrated several important insights on how tumor heterogeneity is generated and enhanced by mechanisms underlying epigenetic disorders, as well as proposed promising and effective opportunities for therapeutic intervention. In addition, the HM_score we developed was a clinically valuable tool for identifying patients in the HMC4 cluster precisely by individually quantifying histone modification status and predicting patient survival and chemotherapeutic benefit, thus providing more precise therapeutic guidance in colon cancer in the future.
Public data preparation
The procedure for data analysis was compiled into a flowchart (Additional file 8: Fig. S7). Public transcriptome data on colon cancer samples were retrospectively collected from the GEO (http://www.ncbi.nlm.nih.gov/geo/) and TCGA-COAD (https://cancergenome.nih.gov/) datasets. The demographic and clinical information were retrieved using the “GEOquery” package for GEO datasets or downloaded from the University of California Santa Cruz Xena database (https://xenabrowser.net). The following clinical information was collected: patients’ age, sex, TNM stage, primary tumor site, and chemotherapy performance. The endpoint analyzed in this study was RFS, defined as the interval between the date of diagnosis and date of tumor relapse, and OS, defined as the interval between the date of diagnosis and death. Besides transcriptome data, we also downloaded the somatic mutation data (MAF files, MuTect2 Variant Aggregation and Masking) of patients specimens from ten different tumor types using “TCGAbiolinks” packages to explore the genetic mutation landscape of histone modification regulators. Patient selection criteria for establishing patient cohorts of molecular typing and scoring model development and transcriptome data processing methods are described in Additional file 9: Materials and Methods.
RNA sequencing of samples from the Sun Yat-sen University Cancer Center cohort
This study was approved by the Nanfang Hospital Ethics Review Board. Thirty fresh samples histologically diagnosed with nonmetastatic colon cancer at the SYSUCC (Guangzhou, China) were included, and RNA extraction and sequencing were performed as described previously . The count values of RNA-sequencing data were transformed using the “voom” algorithm after gene symbol transformation (based on Ensembl ID) in order to convert count data to values similar to those resulting from microarrays .
Identification of histone modifier expression patterns by consensus clustering
The unsupervised clustering (K-means) method was used to identify different histone modifier expression patterns and classify patients for further analysis. A consensus clustering algorithm was used to evaluate clustering stability and select the optimal cluster number using the R package “ConsensusClusterPlus” with the following settings: maxK = 10, reps = 1000, pItem = 0.95, and pFeature = 1 .
Histone modification score generation
To develop a histone modification score to individually quantify the histone modification status, we first analyzed the differential expressed genes (DEGs) among distinct histone modifier expression patterns in the integrated meta-GEO and TCGA-COAD transcriptional profiling using the “limma” package. The adjusted p value for multiple testing was calculated using the Benjamini–Hochberg correction. The significance criterion for DEGs was set as an absolute “Log2FC” value > 0.5 and an adjusted p value < 0.01. Specifically, the DEGs that up- or downregulated in the HMC4 cluster were selected and termed as gene cluster A (801 DEGs upregulated in the HMC4) and cluster B (202 DEGs downregulated in the HMC4), respectively. The Boruta algorithm was employed for the dimension reduction of the gene cluster A and gene cluster B, respectively, using the “Boruta” package (settings: doTrace = 2, maxRuns = 100, ntree = 500) to screen out the most informative genes. The final score was defined as: HM_score = the average expression of final determined gene cluster A—the average expression of final determined gene cluster B.
The fluorouracil response of clinical samples was assessed using the R package “pRRophetic,” which implemented a built-in ridge regression model  and was qualified as the area under the dose–response curve (AUC), with lower AUC values indicating higher sensitivity to fluorouracil. Further details are provided in Additional file 9: Materials and Methods.
Biological process and tumor microenvironment characteristics analysis
The biological process and tumor microenvironment characteristics analysis were performed as previously described . Briefly, we utilized GSVA analysis comprising the gene set files of “hallmark gene sets” with the R package “GSVA” to measure biological process activity. An ssGSEA implemented in the “GSVA” R package was used to generate the infiltration scores of the TME cells. The special feature gene panels for marking immune cells , stromal cells , exhausted T cells , and tertiary lymphoid structure  were curated from the published literature. The abundance of each cell type was represented by an enrichment score of the gene set in a sample outputted by ssGSEA analysis based on gene expression profiles.
Cell culture, cell transfection, and MTT assay
Cell culture, cell transfection, and MTT assay were performed as described previously . Further details are provided in Additional file 9: Materials and Methods.
Genome-wide CRISPR/Cas9 knockout library screen.
The human GeCKO v2 CRISPR library A and library B were used to generate a mutant cell pool for high throughput screening. The criteria for screening candidate sgRNAs were: (1) The average count values of candidate sgRNA in both the fluorouracil-treated group and the vehicle group were greater than 1000; (2) the absolute “Log2FC” value calculated by difference analysis of sgRNA level between the vehicle group and fluorouracil-treated group was more than 0.5. Further details on this matter are provided in Additional file 9: Materials and Methods.
Connectivity Map analysis
A Connectivity Map analysis was performed to explore the potential molecular targets and specific compounds that could be used for chemosensitization of patients with HMC4 via an online tool (https://clue.io/). A total of 300 DEGs with the most significant fold changes (150 DEGs upregulated and 150 DEGs downregulated in the HMC4 cluster) were entered into the Connectivity Map database following the instructions provided by the website. In this study, the enrichment score generated by Connectivity Map analysis was set to < − 97 and < − 95 for the significant threshold of molecular targets and chemical compounds, respectively.
Statistical analysis was performed using R software version 3.6.0 or SPSS version 25.0 (IBM Corp., Armonk, N.Y., USA). The two-tailed Student’s t test, Mann–Whitney U test, Kruskal–Wallis test, one-way ANOVA test, Fisher’s exact test, Pearson’s correlation test, Spearman’s rank correlation test, Cox regression hazard model, and Kaplan–Meier method with the log-rank test were used where necessary. All p values were two-tailed, and statistical significance was set to p < 0.05 unless noted otherwise. Details are provided in Additional file 9: Materials and Methods.
Taniue K, Hayashi T, Kamoshida Y, Kurimoto A, Takeda Y, Negishi L, et al. UHRF1-KAT7-mediated regulation of TUSC3 expression via histone methylation/acetylation is critical for the proliferation of colon cancer cells. Oncogene. 2020;39(5):1018–30.
Gao S, Yan L, Wang R, Li J, Yong J, Zhou X, et al. Tracing the temporal-spatial transcriptome landscapes of the human fetal digestive tract using single-cell RNA-sequencing. Nat Cell Biol. 2018;20(6):721–34.
Marisa L, de Reynies A, Duval A, Selves J, Gaub MP, Vescovo L, et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med. 2013;10(5):e1001453.
Zhou R, Zeng D, Zhang J, Sun H, Wu J, Li N, et al. A robust panel based on tumour microenvironment genes for prognostic prediction and tailoring therapies in stage I–III colon cancer. EBioMedicine. 2019;42:420–30.
Woroniecka K, Chongsathidkiet P, Rhodin K, Kemeny H, Dechant C, Farber SH, et al. T-cell exhaustion signatures vary with tumor type and are severe in glioblastoma. Clin Cancer Res Off J Am Assoc Cancer Res. 2018;24(17):4175–86.
Zhou R, Wen Z, Liao Y, Wu J, Xi S, Zeng D, et al. Evaluation of stromal cell infiltration in the tumor microenvironment enable prediction of treatment sensitivity and prognosis in colon cancer. Comput Struct Biotechnol J. 2022;20:2153–68.
Bouwman P, Aly A, Escandell JM, Pieterse M, Bartkova J, van der Gulden H, et al. 53BP1 loss rescues BRCA1 deficiency and is associated with triple-negative and BRCA-mutated breast cancers. Nat Struct Mol Biol. 2010;17(6):688–95.
Jaafar R, Mnich K, Dolan S, Hillis J, Almanza A, Logue SE, et al. RIP2 enhances cell survival by activation of NF-kB in triple negative breast cancer cells. Biochem Biophys Res Commun. 2018;497(1):115–21.
Vural S, Palmisano A, Reinhold WC, Pommier Y, Teicher BA, Krushkal J. Association of expression of epigenetic molecular factors with DNA methylation and sensitivity to chemotherapeutic agents in cancer cell lines. Clin Epigenet. 2021;13(1):49.
Mofid MR, Gheysarzadeh A, Bakhtiyari S. Insulin-like growth factor binding protein 3 chemosensitizes pancreatic ductal adenocarcinoma through its death receptor. Pancreatol Off J Int Assoc Pancreatol. 2020;20(7):1442–50.
Ahmad IM, Dafferner AJ, O’Connell KA, Mehla K, Britigan BE, Hollingsworth MA, et al. Heme oxygenase-1 inhibition potentiates the effects of nab-paclitaxel-gemcitabine and modulates the tumor microenvironment in pancreatic ductal adenocarcinoma. Cancers. 2021;13(9):2264.
Lee DW, Kim KJ, Han SW, Lee HJ, Rhee YY, Bae JM, et al. KRAS mutation is associated with worse prognosis in stage III or high-risk stage II colon cancer patients treated with adjuvant FOLFOX. Ann Surg Oncol. 2015;22(1):187–94.
Cho MH, Park JH, Choi HJ, Park MK, Won HY, Park YJ, et al. DOT1L cooperates with the c-Myc-p300 complex to epigenetically derepress CDH1 transcription factors in breast cancer progression. Nat Commun. 2015;6:7821.
Enkhbaatar Z, Terashima M, Oktyabri D, Tange S, Ishimura A, Yano S, et al. KDM5B histone demethylase controls epithelial-mesenchymal transition of cancer cells by regulating the expression of the microRNA-200 family. Cell Cycle. 2013;12(13):2100–12.
Lee E, Wang J, Yumoto K, Jung Y, Cackowski FC, Decker AM, et al. DNMT1 regulates epithelial-mesenchymal transition and cancer stem cells, which promotes prostate cancer metastasis. Neoplasia. 2016;18(9):553–66.
Fang Y, Wang Y, Zeng D, Zhi S, Shu T, Huang N, et al. Comprehensive analyses reveal TKI-induced remodeling of the tumor immune microenvironment in EGFR/ALK-positive non-small-cell lung cancer. Oncoimmunology. 2021;10(1):1951019.
Ali HR, Chlon L, Pharoah PD, Markowetz F, Caldas C. Patterns of immune infiltration in breast cancer and their clinical implications: a gene-expression-based retrospective study. PLoS Med. 2016;13(12):e1002194.
Zeng D, Li M, Zhou R, Zhang J, Sun H, Shi M, et al. Tumor microenvironment characterization in gastric cancer identifies prognostic and immunotherapeutically relevant gene signatures. Cancer Immunol Res. 2019;7(5):737–50.
Yang C, Huang X, Li Y, Chen J, Lv Y, Dai S. Prognosis and personalized treatment prediction in TP53-mutant hepatocellular carcinoma: an in silico strategy towards precision oncology. Brief Bioinform. 2021;22(3):bbaa164.
Bindea G, Mlecnik B, Tosolini M, Kirilovsky A, Waldner M, Obenauf AC, et al. Spatiotemporal dynamics of intratumoral immune cells reveal the immune landscape in human cancer. Immunity. 2013;39(4):782–95.
Becht E, Giraldo NA, Lacroix L, Buttard B, Elarouci N, Petitprez F, et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 2016;17(1):218.
Sun H, Zhou R, Zheng Y, Wen Z, Zhang D, Zeng D, et al. CRIP1 cooperates with BRCA2 to drive the nuclear enrichment of RAD51 and to facilitate homologous repair upon DNA damage induced by chemotherapy. Oncogene. 2021;40(34):5342–55.
We would like to thank TCGA and GEO databases for their contribution.
This work was supported by the National Natural Science Foundation of China (No. 82073375 to Xiaoxiang Rong; No. 82102731 to Rui Zhou), Natural Science Foundation of Guangdong Province of China (2020A1515110686 to Rui Zhou; 2022A1515012418 to Rui Zhou; 2021A1515010700 to Zhenhua Huang), Science and Technology Planning Project of Guangzhou (No. 202102020532 to Xiaoxiang Rong), and China Postdoctoral Science Foundation (2019M663001 to Xiaoxiang Rong).
Rui Zhou, Fuli Xie and Kuncai Liu have are contributed equally
Authors and Affiliations
Department of Oncology, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, Guangdong, People’s Republic of China
Department of Radiotherapy, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong, People’s Republic of China
State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Hepatology Unit and Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, People’s Republic of China
Department of Pathology, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, 510515, Guangdong, People’s Republic of China
XR, ZH, and SX contributed to the planning of the study. RZ, FX, and KL drafted the manuscript. SX verified the numerical results by an independent implementation. RZ, FX, KL, and XZ prepared all the figures and tables. XC and JC contributed to interpretation of data and review of the manuscript. All the authors reviewed and approved the final manuscript.
: Fig. S1. Mutations of the “Complex of Proteins Associated with Set1” gene family. A–D Lollipop diagrams of the landscape of KMT2D (A), KMT2C (B), KMT2B (C), and SETD1B (D) mutation positions. (E) Boxplot of immune cell infiltration of “Complex of Proteins Associated with Set1” mutation and non mutation groups in the TCGA-COAD cohort. Boxes represent 25–75% of values, lines in boxes represent median values, whiskers represent 1.5 interquartile ranges, and black dots represent outliers. (F) Sankey diagram of “Complex of Proteins Associated with Set1” mutations in groups with different molecular subtypes in the TCGA-COAD cohort. MT, mutant type; WT, wild type; CMS, consensus molecular subtypes; MSI, microsatellite instability; MSS, microsatellite stability.
: Fig. S2. Alterations of histone modification regulators in the TCGA pan-cancer cohort. A–D Detailed heatmap of alteration frequencies (left) and mutation rates (right) in members of histone acetyltransferases (A), histone deacetylases (B), histone methyltransferases (C), and histone demethylases regulators (D) across solid tumors in the TCGA pan-cancer cohort.
: Fig. S3. The optimal cluster number as determined by the consensus clustering algorithm. A–C Consensus matrixes (left) of patients with colon cancer for k = 4 and line graphs (right) of relative changes in the area under the CDF curve according to the cluster number in the meta-GEO (A), TCGA-COAD (B), and integrated GEO and TCGA-COAD cohorts (C). D Heatmaps of the enrichment of histone modification regulators in different histone modifier expression patterns in the meta-GEO (left), TCGA-COAD cohort (middle), and integrated GEO and TCGA-COAD cohorts (right). (E–F) Boxplot of distribution of histone modifier expressions among different histone modifier expression patterns in the meta-GEO (E) and TCGA-COAD cohorts (F). Boxes represent 25–75% of values, lines in boxes represent median values, whiskers represent 1.5 interquartile ranges, and black dots represent outliers. *p < 0.05, **p < 0.01, ***p < 0.001; ns, not significant.
: Fig. S4. Distribution of SIIS value among HMC clusters. Boxplot of SIIS value in the four studied histone modifier expression patterns in the meta-GEO (left) and TCGA-COAD (right) cohorts. Boxes represent 25–75% of values, lines in boxes represent median values, whiskers represent 1.5 interquartile ranges, and black dots represent outliers. *p < 0.05, **p < 0.01, ***p < 0.001.
: Fig. S5. Association between HM_score and histone modifier expression patterns. A–C (left) Boxplot of the HM_score values of the different modification clusters in the meta-GEO (A), TCGA-COAD (B), and SYSUCC (C) cohorts. Boxes represent 25–75% of values, lines in boxes represent median values, whiskers represent 1.5 interquartile ranges, and black dots represent outliers. A–C (right) receiver operating characteristics curve of the HM_score model for distinguishing patients in the HMC4 cluster from those who are not in the meta-GEO (A), TCGA (B), and SYSUCC cohorts (C). Ref, reference; AUC, area under curve.
: Fig. S6. ZEB2 overexpression experiment. Dose–response curves of SW480 (left) and HCT116 cells (right) transfected with empty vectors or ZEB2 plasmid after fluorouracil treatment for 24 h. The mean ± standard deviation of the three replicates of each time point is shown. *p < 0.05, **p < 0.01, ***p < 0.001.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Zhou, R., Xie, F., Liu, K. et al. Cross talk between acetylation and methylation regulators reveals histone modifier expression patterns posing prognostic and therapeutic implications on patients with colon cancer.
Clin Epigenet14, 70 (2022). https://doi.org/10.1186/s13148-022-01290-y