Genome-wide DNA methylation profiling integrated with gene expression profiling identifies PAX9 as a novel prognostic marker in chronic lymphocytic leukemia

Background In chronic lymphocytic leukemia (CLL), epigenomic and genomic studies have expanded the existing knowledge about the disease biology and led to the identification of potential biomarkers relevant for implementation of personalized medicine. In this study, an attempt has been made to examine and integrate the global DNA methylation changes with gene expression profile and their impact on clinical outcome in early stage CLL patients. Results The integration of DNA methylation profile (n = 14) with the gene expression profile (n = 21) revealed 142 genes as hypermethylated-downregulated and; 62 genes as hypomethylated-upregulated in early stage CLL patients compared to CD19+ B-cells from healthy individuals. The mRNA expression levels of 17 genes identified to be differentially methylated and/or differentially expressed was further examined in early stage CLL patients (n = 93) by quantitative real time PCR (RQ-PCR). Significant differences were observed in the mRNA expression of MEIS1, PMEPA1, SOX7, SPRY1, CDK6, TBX2, and SPRY2 genes in CLL cells as compared to B-cells from healthy individuals. The analysis in the IGHV mutation based categories (Unmutated = 39, Mutated = 54) revealed significantly higher mRNA expression of CRY1 and PAX9 genes in the IGHV unmutated subgroup (p < 0.001). The relative risk of treatment initiation was significantly higher among patients with high expression of CRY1 (RR = 1.91, p = 0.005) or PAX9 (RR = 1.87, p = 0.001). High expression of CRY1 (HR: 3.53, p < 0.001) or PAX9 (HR: 3.14, p < 0.001) gene was significantly associated with shorter time to first treatment. The high expression of PAX9 gene (HR: 3.29, 95% CI 1.172–9.272, p = 0.016) was also predictive of shorter overall survival in CLL. Conclusions The DNA methylation changes associated with mRNA expression of CRY1 and PAX9 genes allow risk stratification of early stage CLL patients. This comprehensive analysis supports the concept that the epigenetic changes along with the altered expression of genes have the potential to predict clinical outcome in early stage CLL patients. Electronic supplementary material The online version of this article (doi:10.1186/s13148-017-0356-0) contains supplementary material, which is available to authorized users.


Background
Chronic lymphocytic leukemia (CLL) arises from a malignant clone of B cells due to altered control of apoptosis and dysregulated rate of proliferation. Its progression is characterized by clonal proliferation and accumulation of mature neoplastic CD5 + B lymphocytes [1]. The clinical course of CLL patients is extremely variable with some patients progressing rapidly as compared to others and ultimately, requiring therapeutic intervention. Several biomarkers including immunoglobulin heavy chain variable (IGHV) gene mutations that segregate CLL patients into low and high-risk clinical groups are widely used to assess the prognosis of these patients. Low-risk patients generally display mutated IGHV gene, low CD38, and low ζ chain associated protein kinase-70 (ZAP-70) expression, while high-risk cases exhibit the reverse pattern [2][3][4][5][6].
Altered DNA methylation is one of the hallmark events in cancer. The first evidence of DNA methylation in CLL was presented by Wahlfors et al. [7] in which a global loss of methylation was reported. In addition to global hypomethylation, hypermethylation of individual gene promoters has also been reported in CLL [7][8][9][10][11]. Methylation of TWIST2 and ZAP-70 exhibited a strong association with the IGHV-mutated status [9,12] whereas methylation of HOXA4 gene was predominantly associated with the IGHV unmutated status [13]. Further studies employing genome wide methylation profiling technologies have revealed association of differential methylation patterns with prognostic subgroups based on the IGHV mutation status [14][15][16], CD38 levels [17], ZAP-70 levels [16], immunogenetic subsets [18], and 17p-deletion status [19].
Earlier, DNA hypermethylation was thought to affect the expression of a gene negatively but the emerging research has suggested that the function and effect of DNA methylation is contextual, and the relationship between DNA methylation and transcription is more complex [20]. In CLL, although association of differential methylation patterns with specific prognostic subgroups in earlier reports highlights the potential of altered gene methylation as a tool to predict clinical outcome, further research is required to establish the relationship between the epigenome and the transcriptome. The present study was carried out to correlate the DNA methylation patterns with gene expression profile and to assess the prognostic implications of such correlations on clinical outcome in 93 early stage CLL patients.

Patient selection
Treatment naive early stage (Rai 0-II) CLL patients (n = 100) were enrolled in the study after obtaining informed consent as per the guidelines of the institute ethics committee. According to the staging criteria outlined by Rai et al. [21], 24 patients were in stage 0, 33 were in stage I and 43 were in stage II. Fourteen randomly selected CLL samples and pooled CD19+ B-cells from 10 healthy individuals were profiled for methylation. Gene expression profiling was carried out in 21 CLL samples and pooled CD19+ Bcells from 10 healthy individuals. All the CLL samples had at least ≥65% CLL phenotype cells. The clinical and laboratory characteristics of the CLL patients analysed using methylation and gene expression arrays are provided in Table

IGHV mutation status
IGHV gene family usage was evaluated as per BIOMED-2 protocol [23] and the patients were assigned to IGHV mutated or unmutated subgroups based on the IGHV sequence homology (cut-off = 98%) as determined by the international ImmunoGeneTics database (IMGT; http:// imgt.cines.fr, Montpellier, France).

Methylated CpG island microarrays
Genomic DNA was extracted from the peripheral blood mononuclear cells (PBMC) of CLL patients (n = 14) and CD19 + sorted cells pooled from 10 healthy individuals. To isolate the CD19+ cells, mononuclear cells isolated from peripheral blood of healthy individuals were incubated with CD19 + magnetic microbeads and processed according to the manufacturer's protocol (Milteneyi Biotech, Gladbach, Germany). In healthy individuals, CD19+ cells constitute 2-3% of the leukocyte fraction and therefore, sorted CD19+ B-cells from healthy individuals were used. In the CLL samples evaluated for microarrays, CD19+ cells constituted at least ≥65% of the leukocytes and the PBMC fraction from CLL patients was used for the study.
For methylated CpG island microarrays, 6 μg of genomic DNA was digested with Mse I restriction enzyme (New England Biolabs Inc., Ipswich, MA, USA) and labelled with anti-5 methyl cytidine antibody (Abcam, Cambridge, UK). One fraction of the labelled DNA was immunoprecipitated while the other was used as input DNA. Both the input and immunoprecipitated fractions were purified followed by whole genome amplification (WGA, Sigma Aldrich, St. Louis, MO, USA), labelled with Cy3-and Cy5-dUTP, respectively, and hybridized on 1x244K human promoter chIP-on-chip microarray slides as per the manufacturer's recommendations (Agilent Technologies, Santa Clara, CA, USA). The slides were washed and scanned on the Agilent DNA microarray scanner D and the data was extracted with Feature Extraction® software FE version 11.5 (Agilent Technologies, Santa Clara, CA, USA).

Gene expression microarray
Total RNA obtained from PBMC of CLL patients (n = 21) and CD19+ sorted cells pooled from 10 healthy individuals was amplified and simultaneously labelled with Cy3-CTP using low input quick amp labelling kit (Agilent Technologies, Santa Clara, CA, USA). The labelled product was finally hybridized to SurePrint G3 Human Gene Expression 8x60K microarray slide as per manufacturer's recommendation (Agilent Technologies, Santa Clara, CA, USA). The slides were washed and scanned on the Agilent DNA microarray scanner D and the data was extracted with Feature Extraction® software FE version 11.5 (Agilent Technologies, Santa Clara, CA, USA). These samples included seven CLL samples profiled for DNA methylation status.

Bisulfite genome sequencing
Genomic DNA (2 μg) was bisulfite modified and purified using Epitect Bisulfite kit as per the manufacturer's instructions (Qiagen, Hilden , Germany). The bisulfite converted DNA was amplified for two CpG islands in PAX9 gene as depicted in Fig. 1 and sequenced with BigDye Terminator v3.1 Cycle Sequencing kit (Applied Biosystems, CA, USA) with primers designed using MethPrimer (http://www.urogene.org/cgi-bin/methprimer/methprimer.cgi). The percent methylation levels were computed and further analysed with Bisulfite

Real-time quantitative PCR (RQ-PCR)
The mRNA expression based microarray findings were validated using RQ-PCR in an independant cohort of 93 early stage CLL patients for 17 genes with gene-specific primers (Additional file 1: Table S1). The experiments were performed using SYBR Green Master Mix according to the manufacturer's protocol on Mx3005P (Agilent Technologies, Santa Clara, CA, USA). The fold change was calculated using 2 -ΔΔCt method with beta-actin as an endogenous control. The Receiver's operating characteristic (ROC) curve-derived cut-off values were used to define high or low mRNA expression levels.

Bioinformatics analysis and statistics
Methylation array data was analyzed using Genomic Workbench version 7.0 (Agilent Technologies, Santa Clara, CA, USA). On the basis of melt temperature, log-ratio data for each probe was normalized. By taking into account the Gaussian-fit curves, Z score was generated for each sample and p values were calculated. The p values were then used to determine the log-odds score for each probe. The differentially hypermethylated and hypomethylated probes between groups were filtered based on the minimum value of log2-fold change (log2FC) between the groups =0.25, p < 0.05 and the false discovery rates (FDR) of 0.2 [24].The probes with log2FC ≤ (-)0.25 were considered hypomethylated and ≥ (+)0.25 were considered hypermethylated. The gene expression data across all arrays was log2 transformed and normalized using quantile normalization and analyzed by the Lima library from R-Bioconductor. Probes with an adjusted p-value less than 0.05 and log2FC of 1 were selected.
The correlation of log-odds values obtained from the DNA methylation arrays (p < 0.05, log2FC = 0.25) and the expression arrays for the identified genes was used as an indicator of the correlation between DNA methylation and . The probes used for methylation microarrays were specific for CpG islands 121, 129, 39, and 76. b MethPrimer based CpG prediction and primer design for bisulfite gene sequencing for two CpG islands (3 and 7) located at PAX9 upstream region. c Bisulfite sequencing of CpG islands 3 and 7 was performed in 21 and 23 CLL patients respectively and in five healthy controls. A representative electropherogram depicting two methylated (C) or unmethylated (T) CpG sites in island 3 located in 5' region of PAX9 is shown gene expression. The probes showing hypomethylation (log2FC ≤ (-)0.25, p < 0.05) in conjunction with higher expression (log2 FC > 1, p < 0.05) between any two compared conditions were identified. Similarly, the probes exhibiting hypermethylation (log2FC ≥ 0.25, p < 0.05) in conjunction with lower expression (log2 FC < (-)1, p < 0.05) were also identified.
Receiver's operating characteristic curve was used to calculate the cut-off value to determine the low and high expression of a particular gene. The differences in mRNA expression between the groups as obtained from RQ-PCR were compared using the Mann-Whitney Rank Sum test or Kruskal-Wallis One Way Analysis of Variance on Ranks. The relative risk (RR) of treatment initiation was assessed using the Chi-square statistic with Yate's continuity correction. The time to first treatment (TTFT) and overall survival (OS) were compared between the groups using the Kaplan-Meier survival analysis followed by the log-rank test. Hazard ratio (HR) for each variable was calculated using the Cox proportional hazard regression (Sigma Plot Version 13.0, Systat Software, Inc.).

Data access
The DNA methylation as well as the mRNA expression data generated in the study have been submitted to the NCBI Gene Expression Omnibus (GEO) (http:// www.ncbi.nlm.nih.gov/geo/) under accession number GSE81937.

Methylation profile
A comparison of differential methylation between CLL (n = 14) and normal CD19+ B-cells identified a total of 6129 probes to be differentially methylated which were further classified as hypermethylated (5254 probes, 2505 genes) or hypomethylated (875 probes, 753 genes). The differentially methylated probes that represented unknown genes, non-coding RNAs, hypothetical proteins, chromosomal loci, predicted open reading frames, and probes associated with sex-chromosomes were excluded from the downstream analysis. Among the differentially methylated probes, 53.8% of hypermethylated probes were located inside known gene bodies, 38.2% in the promoters, 2.6% in divergent promoters and 5.2% were located downstream of the known genes (Fig. 2a). The frequency distribution of the hypomethylated probes ( Fig. 2b) was comparable to the hypermethylated probes. Of the differentially methylated probes, CpG sites were found in 73% of the hypermethylated probes and in 81% of the hypomethylated probes. The details pertaining to these probes, including the gene name, chromosomal location and distribution are provided in Additional file 1: Tables S2A and S2B.

Correlation of methylation and gene expression analysis
To investigate the possible influence of CpG methylation status on the expression level of the corresponding genes, the gene expression profiles were integrated with the DNA methylation profiles and co-analyzed. On comparing the data of CLL patients with healthy individuals, a negative correlation in CpG methylation and gene expression was observed for 211genes (Additional file 1: Table S4). Of these, 149 genes were hypermethylated and downregulated and 62 genes were hypomethylated and upregulated including AXIN2, ID4, EBF1, SOX4, SOX7, TAL1, PMEPA1, SPRY1, CDK6, and MEIS1. Pathway analysis using the genes having negative correlation for DNA methylation and gene expression in CLL Vs. normal CD19+ cells identified significant enrichment of three KEGG pathways which included p53 signalling pathway (p = 0.002), pathways in cancer (p = 0.005), and the cell cycle pathway (p = 0.007).
Of the various genes found to be differentially methylated and/or differentially expressed, a total of 17 genes ( Table 2) were validated using RQ-PCR on a cohort of 93 (22 female: 71 male) early stage CLL patients and pooled CD19+ B cells from 10 healthy volunteers. The criteria for selection of these genes was negative correlation between CpG promoter methylation and gene expression in CLL Vs. normal (MEIS1, PMEPA1, SOX7, SPRY1, CDK6, ID4, AXIN2, TNRC18) and in the IGHV unmutated Vs. mutated subgroup (CRY1, VIPR1, PAX9, RIC8B). Other genes selected for validation included NFATC1 (hypomethylated in CLL), TBX2, TSHZ3 (hypermethylated in CLL), SPRY2 (upregulated in CLL) and BIK (downregulated in CLL). We focused on these five genes as they had previously been shown in the literature to be epigenetically influenced in CLL [NFATC1   [11]], or in other malignancies [BIK [34], SPRY2 [35], TBX2 [36,37], TSHZ3 [38,39]]. As expected, MEIS1, PMEPA1, SOX7, SPRY1, CDK6, TBX2 were significantly downregulated (p < 0.05) while SPRY2 (p = 0.016), VIPR1 (p = 0.04) and ID4 (p = 0.03) were significantly upregulated in CLL cells as compared to healthy B-cells. Though not significant, AXIN2 was upregulated and TNRC18, NFATC1 and BIK were downregulated in CLL as compared to healthy CD19+ cells ( Table 2). The expression of only CRY1 and PAX9 differed significantly (p < 0.05) with respect to the IGHV mutation status ( Table 2, Fig. 4) The status of hypomethylation of PAX9 among unmutated CLL was confirmed through bisulfite genome sequencing of CpG island 3 in close proximity to CpG110 The statistically significant p values are shown in italics ( Fig. 1).While CpG island 7 did not reveal any significant difference in methylation levels, the average % methylation at CpG island 3 was found to be 52.74% in mutated while 24.72% among unmutated CLL. This further corroborates with the reduced expression of PAX9 in unmutated group of CLL patients as established through microarray-based observations.

Association between gene expression and clinical outcome
Of the 17 genes evaluated for mRNA expression, CRY1 (p = 0.008) and PAX9 (p < 0.001) were expressed at higher levels in Rai stage I and II as compared to stage 0. A progressive increase in the expression of CRY1 (p = 0.004) and PAX9 (p < 0.001) was observed in increasing IPI score categories ranging from 1 to 4. We further explored the association between expression level of candidate genes with relative risk of treatment initiation, TTFT and OS ( Table 3). The relative risk of treatment initiation was significantly higher with high expression of PAX9 (p = 0.001) or CRY1 (p = 0.005). The high expressions of both PAX9 (HR 3.14, 95% CI 1.589-6.205, p < 0.001) as well as CRY1 (HR 3.53, 95% CI 1.789-6.987, p < 0.001) were significantly associated with shorter TTFT (Fig. 5). However, high expression of only PAX9 gene (HR 3.29, 95% CI 1.172-9.272, p = 0.016) was significantly associated with shorter OS (Fig. 6).

Discussion
Extremely variable clinical course of early stage CLL patients highlights the importance of well-described prognostic markers for clinical management of these patients. Various prognostic markers that are currently in use include IGHV mutational status [2], genomic abnormalities [3], expression of ZAP-70 [4], and CD38 [2]. Recent studies have associated specific DNA methylation signatures with specific prognostic subgroups in CLL [14][15][16]19]. The present study has dealt with the methylation profiling of early stage CLL patients on the basis of their IGHV mutational status. The study has identified differential methylation of several genes such as NCOR2, SIX3, CHRM1, NRF1, CRY1, KCNJ2, and SOX5 that have been reported earlier to be differentially methylated in the IGHV-gene based subgroups [16,28]. Besides, an association of promoter hypomethylation of MYLK with the IGHV unmutated cases was also observed in the current study. Since, higher expression of MYLK is known to be significantly correlated with poor clinical outcome [40], it is plausible that promoter hypomethylation of MYLK in the IGHV unmutated cases might be associated with poor prognosis. Furthermore, differential CpG promoter hypomethylation of two important hematopoietic transcription factors MEIS1 and TAL1 which are known methylation targets in B-cell ALL was also observed [41].
An analysis of signalling pathway network for genes with perturbed methylation profiles observed among IGHV unmutated patients indicated the involvement of calcium signalling pathway. Previous studies have suggested that altered Ca 2+ signalling contributes to major tumor progression events including proliferation, migration, invasion, and metastasis [42,43]. Recently, Muggen et al. [44] demonstrated an association of the IGHV mutational status with the level of basal Ca 2+ signalling in CLL. The present study provides evidence that aberrant methylation of genes involved in the calcium signalling pathway might be one of the mechanisms responsible for net differences in the basal Ca 2+ signalling events.
In the present analysis, an inverse correlation between methylation and gene expression was observed for 209 genes in CLL including transcription factors (ID4, NFATC1, TBX2, TAL1, MEIS1), SPRY family members (SPRY1, SPRY2) and SOX family members (SOX4, SOX7). Correlation of promoter methylation of ID4 gene with shortened patient survival has been already    [45]. An association of methylation of TBX2 [37] and SPRY2 [35] with disease progression has also been demonstrated in bladder cancer and B-cell diffuse lymphoma, respectively, but so far, no studies have been reported in CLL.
Screening of inversely correlated genes associated with IGHV mutation status revealed only 21 gene promoters to be significantly hypomethylated and upregulated in unmutated cases (Additional file 1: Table S5). One of these genes encodes for bone morphogenetic protein (BMP) receptor II which is a serine/threonine receptor kinase and has previously been shown to be involved in molecular pathogenesis of hematological malignancies including acute myelomonocytic leukemia, acute promyelocytic leukemia, multiple myeloma as well as CLL [46,47]. Cell surface expression of BMP receptors (BMPRIA and BMPRIB) have been shown to be elevated in advanced stages of CLL [47]. In-vitro studies have shown that co-expression of BMPRII facilitates BMP binding to its receptors and therefore contributes to downstream biological functions [48,49]. This is in line with the results of the present study wherein upregulated BMPRII gene expression and hypomethylation of BMPRII gene was noticed among unmutated subgroup of CLL patients. Alterations in methylation status and associated gene expression levels of another gene CRY1 have also been reported in prognostically distinct subsets of CLL [50] as well as in CML [51]. Our study confirms the possible influence of hypomethylation and upregulated expression of CRY1 in prognostically poor IGHV unmutated CLL and further emphasises its role as potential biomarker for relative risk of treatment initiation and TTFT in early stage CLL. In addition to CRY1, three other circadian rhythm genes NPAS2, BHLHE40, and ARNTL were also observed to be hypomethylated in the unmutated subgroup [52].
PAX9 is one of the nine members of "paired box" (PAX)-containing transcription factor family and its inhibition has been shown to induce apoptosis with increased cleavage of caspase-3 and PARP, increased expression of BAX and decreased expression of BCL-2 in oral squamous cell carcinoma [53]. In the recent years, it has emerged as one of the biomarkers of cell proliferation in lung cancer [54]. A significant association of PAX9 expression with stage, IPI score, relative risk of treatment initiation, TTFT and OS in the present study strengthens its role as an important marker of prognosis in CLL as well.
Since levels of expression of either PAX9 or CRY1 did not show significant difference in CLL patients when compared to healthy controls but rather between patients subgrouped on the basis of the IGHV mutational status, it is plausible that these two genes may be involved in progression of CLL rather than development of the disease. This explanation is further supported by progressively increasing gene expression levels of PAX9 and CRY1 in coherence with advanced Rai stage and higher IPI scores. The mechanism(s) underlying such an influence of these two genes in CLL pathogenesis are not known but might involve apoptotic [53,[55][56][57], or analogous pathways involved in cancer.
Besides, several aberrantly methylated genes were also identified in IGHV mutational status based subgroups which could serve as potential markers in CLL. The major limitation of the present study was that a limited number of genes were evaluated in a small cohort of early stage CLL patients. Further studies on large cohorts of early stage CLL patients for expression patterns of additional set of genes are required that may help in characterizing the functional role of the genes identified in the present study. Identification of relevant epigenetically influenced genes that have an impact on gene expression as well as clinical outcome may pave way for identification and development of therapeutically relevant drug targets.

Conclusions
The present study confirms the prognostic role of CRY1 in CLL as its aberrant methylation and expression is associated with high risk of treatment initiation and shorter time to first treatment. In addition, this study highlights PAX9 as a novel marker of prognostication in CLL as its expression was significantly associated with high risk of treatment initiation, shorter time to first treatment and overall survival.

Additional file
Additional file 1: Table S1. List of primers used in RQ-PCR studies. Table S2A. List of probes hypermethylated in CLL in comparison to CD19+ cells from healthy individuals. Table S2B. List of probes hypomethylated in CLL in comparison to normal CD19+ cells. Table S3A. List of probes hypermethylated in unmutated in comparison to mutated CLL. Table S3B. List of probes hypomethylated in unmutated CLL in comparison to mutated CLL. Table S4. List of genes having negative correlation for methylation and gene expression in CLL as compared to normal 19+ cells. Table S5. List of genes having negative correlation for methylation and gene expression in unmutated CLL as compared to mutated CLL. (XLS 1339 kb)