Serum methylation of GALNT9, UPF3A, WARS, and LDB2 as noninvasive biomarkers for the early detection of colorectal cancer and advanced adenomas
Clinical Epigenetics volume 15, Article number: 157 (2023)
Early detection has proven to be the most effective strategy to reduce the incidence and mortality of colorectal cancer (CRC). Nevertheless, most current screening programs suffer from low participation rates. A blood test may improve both the adherence to screening and the selection to colonoscopy. In this study, we conducted a serum-based discovery and validation of cfDNA methylation biomarkers for CRC screening in a multicenter cohort of 433 serum samples including healthy controls, benign pathologies, advanced adenomas (AA), and CRC.
First, we performed an epigenome-wide methylation analysis with the MethylationEPIC array using a sample pooling approach, followed by a robust prioritization of candidate biomarkers for the detection of advanced neoplasia (AN: AA and CRC). Then, candidate biomarkers were validated by pyrosequencing in independent individual cfDNA samples. We report GALNT9, UPF3A, WARS, and LDB2 as new noninvasive biomarkers for the early detection of AN. The combination of GALNT9/UPF3A by logistic regression discriminated AN with 78.8% sensitivity and 100% specificity, outperforming the commonly used fecal immunochemical test and the methylated SEPT9 blood test.
Overall, this study highlights the utility of cfDNA methylation for CRC screening. Our results suggest that the combination methylated GALNT9/UPF3A has the potential to serve as a highly specific and sensitive blood-based test for screening and early detection of CRC.
Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide and the second leading cause of cancer death in both sexes [1]. Diagnosis at advanced symptomatic stages is responsible for low survival (14% for stage IV) compared to 90% five-year survival for stages I and II [2]. Although the implementation of screening programs is associated with a reduction in CRC incidence and mortality [3], the overall participation rate in stool-based screening programs using the fecal immunochemical test (FIT) followed by a confirmatory colonoscopy remains modest (49.5% in Europe and 43.8% worldwide) [4]. In CRC screening settings, FIT shows high specificity (95%) and acceptable sensitivity (70–75%) for colorectal tumors [5, 6], but moderate-to-low sensitivity for AA (22–44%) [6,7,8]. The inconsistent sensitivity of the FDA-approved SEPT9 blood methylation test for the detection of CRC and AA [9, 10] argues against its use for screening.
Since the effectiveness of a screening test relies not only on the test performance but also on its acceptance by the target population, test preference for CRC screening has been evaluated. A survey-based study reported a blood test as the first choice over a stool test [11]; similarly, among screening-enrolled individuals who refused colonoscopy, 83% preferred a blood-based test, whereas 15% chose a fecal test [12]. Therefore, participation in screening programs could significantly improve by offering a noninvasive blood-based test.
Liquid biopsy has emerged as a noninvasive alternative to traditional procedures for sampling. Blood-based screening is easily available, repeatable, and minimally invasive [13]. Circulating cell-free DNA (cfDNA) can be detected in body fluids and reflects alterations occurring during neoplastic transformation, such as aberrant methylation in colorectal carcinogenesis [14, 15]. This, together with the stability of this epigenetic mark during DNA extraction, makes methylation a particularly interesting source of biomarkers for CRC. Indeed, alterations in DNA methylation arise during early stages of tumor progression, and the heterogeneity of the different pathways to CRC is already detectable in adenomas [16, 17]. The Illumina MethylationEPIC BeadChip array combined with sample pooling represents a particularly suitable strategy for the cost-effective analysis of large sample sets aiming to discover differentially methylated signatures. In this study, following a cfDNA pooling strategy, we aimed to identify noninvasive methylation biomarkers for the early detection of both CRC and premalignant advanced adenomas. Here, we report the discovery and independent validation of serum-based methylation biomarkers that provide a new highly specific and sensitive noninvasive test for the screening and early detection of CRC and advanced adenomas.
The study was conducted in three phases: (i) We first performed a high-throughput discovery analysis based on a sample pooling strategy, paired with a statistically robust biomarker prioritization, to identify candidate noninvasive methylation biomarkers for the joint detection of AA and CRC. Next, we designed targeted assays for the quantification of the candidate biomarkers in an independent cohort of patients (individual serum samples). The targeted analysis was divided into (ii) an evaluation of the candidate biomarkers, followed by a feature selection step by penalized logistic regression to obtain specific predictive biomarkers subsets, and (iii) a subsequent validation and final statistical model construction. The final classification models were also evaluated in non-colorectal tumors. An overview of the study design is shown in Fig. 1.
Patients and samples
Individuals were recruited from the following Spanish Hospitals: Complexo Hospitalario Universitario de Ourense, Hospital Clínic de Barcelona, Hospital Donostia, and Hospital General Universitario de Alicante. A total of 433 symptomatic or asymptomatic individuals between 50 and 75 years old were included. Exclusion criteria comprised a personal history of CRC, digestive cancer or inflammatory bowel disease, a severe synchronic illness, and a previous colectomy. All individuals underwent a colonoscopy, and blood samples were obtained immediately before colonoscopy. Blood samples were coagulated and centrifuged for serum collection. Circulating cell-free DNA (cfDNA) was extracted from 0.5 to 2 mL serum according to availability. Serum samples were stored at − 20 °C until cfDNA extraction.
Individuals were classified according to the most advanced colorectal finding. Advanced adenomas (AA) are defined as adenomas ≥ 1 cm, with villous components or high-grade dysplasia. ‘Advanced colorectal neoplasia’ (AN) was defined as AA or CRC. Individuals with no colorectal findings (NCF), benign pathologies (BEN: hemorrhoids and diverticula), and non-advanced adenomas (NAA) were considered together as ‘no neoplasia’ (NN). Lesions were considered ‘proximal’ when located only proximal to the splenic flexure of the colon and ‘distal’ when found only in the distal colon or in both distal and proximal colon.
We performed a stratified random sampling using colorectal finding and sex as stratifying variables. Strata were matched by age and recruitment hospital. This multicenter cohort was separated into two independent subsets: Discovery cohort (n = 280; 140 female and 140 male) and Biomarker validation sample set (n = 153; 73 female and 80 male). The latter was randomly split between a Biomarker evaluation cohort (30%, n = 48) and a Model validation cohort (70%, n = 105) (Table 1).
Methylation microarray data were compared with an external cohort of patients from Hospital Clínic de Barcelona (n = 71). Serum and tissue methylation of these patients was quantified by Reduced Representation Bisulfite Sequencing (RRBS), targeting CpG-rich regions of 20 AA and 27 CRC cases, with matched adenoma/tumor, healthy mucosa, and serum samples of each patient, and 24 serum samples from healthy controls.
The specificity of the biomarkers for the detection of CRC and AA was evaluated in an independent cohort of 16 patients with different cancer types (breast, kidney, lung, ovary, and prostate cancer) (Additional file 1: Table S1). Additionally, 8 pairs of matched serum and plasma samples were used to account for differences in the methylation levels between serum and plasma (3 NCF, 2 BEN, and 3 NAA).
DNA extraction and sample pooling
cfDNA from samples used for biomarker discovery was extracted with a phenol–chloroform protocol. DNA was quantified using Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). For methylation biomarker discovery, we followed a DNA pooling strategy as previously described. Twenty-eight independent cfDNA pooled samples were constructed using equal amounts of cfDNA from 5 men and 5 women from the same pathological group, matched by recruitment hospital and age. Pools were divided into NN and AN. The NN group comprised 13 cfDNA pooled samples: 3 pools of NCF individuals, 5 pools of BEN, and 5 pools of NAA. On the other hand, the 15 cfDNA pooled samples of AN included 5 pools of proximal AA (P-AA), 5 pools of distal AA (D-AA), 3 pools of CRC stages I-II, and 2 pools of CRC stages III-IV (Additional file 1: Table S2). The cfDNA pools were stored at –20 °C and were sent to the Cancer Epigenetics and Biology Program facilities at the Bellvitge Biomedical Research Institute (IDIBELL, Barcelona, Spain) for processing and methylation quantification.
The QIAamp Circulating Nucleic Acid Kit (Qiagen, Hilden, Germany) was used for cfDNA extraction from serum and plasma samples in the evaluation and validation phase. Individual cfDNA samples were bisulfite-converted using EZ DNA Methylation-Direct Kit (Zymo Research, Irvine, CA, USA) and stored at –80 °C.
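The equal-amount pooling described above amounts to drawing, from each individual, the volume that delivers a fixed cfDNA mass. The following is a minimal sketch of that arithmetic; the concentrations and the 10 ng contribution per individual are hypothetical values chosen for illustration, not data from this study.

```python
def pooling_volumes(concentrations_ng_per_ul, mass_per_sample_ng):
    """Volume (in µL) to draw from each sample so that every sample
    contributes the same cfDNA mass to the pool."""
    if any(c <= 0 for c in concentrations_ng_per_ul):
        raise ValueError("concentrations must be positive")
    return [mass_per_sample_ng / c for c in concentrations_ng_per_ul]

# Hypothetical Qubit readings (ng/µL) for the 10 individuals of one pool.
conc = [0.8, 1.2, 0.5, 2.0, 1.0, 0.4, 1.6, 0.25, 1.25, 0.625]
vols = pooling_volumes(conc, mass_per_sample_ng=10.0)

# Total cfDNA mass in the pool: 10 individuals x 10 ng each.
pool_mass = sum(v * c for v, c in zip(vols, conc))
print([round(v, 2) for v in vols], round(pool_mass, 1))
```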
Genome-wide DNA methylation measurements
DNA methylation of pooled samples was measured with the Infinium MethylationEPIC BeadChip array (Illumina, San Diego, CA, USA) following the manufacturer’s instructions. A total of 865,859 CpG sites were quantified throughout the genome. Importantly, to minimize the potential impact of batch effects and confounder variability, samples of each pathological group were randomly allocated to the slides. That is, for each beadchip array, we randomly selected one sample from each of the pathological groups, avoiding hybridizing in the same beadchip cfDNA pools from the same pathological group.
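The allocation rule above (no two pools of the same pathological group on the same beadchip) can be sketched as a randomized block assignment. This is an illustrative sketch only: the group sizes are taken from the pooling description, while the number of slides (5), the seed, and the pool labels are assumptions, not details reported for this study.

```python
import random

# Pools per pathological group, as listed in the pooling description.
GROUP_SIZES = {
    "NCF": 3, "BEN": 5, "NAA": 5,
    "P-AA": 5, "D-AA": 5, "CRC I-II": 3, "CRC III-IV": 2,
}

def allocate_pools(group_sizes, n_slides, seed=0):
    """Randomly assign pools to slides so that no slide holds two pools
    from the same pathological group."""
    rng = random.Random(seed)
    slides = [[] for _ in range(n_slides)]
    for group, size in group_sizes.items():
        if size > n_slides:
            raise ValueError(f"{group}: more pools than slides")
        # Draw a distinct slide for every pool of this group.
        chosen = rng.sample(range(n_slides), size)
        for pool_idx, slide_idx in enumerate(chosen, start=1):
            slides[slide_idx].append(f"{group}_pool{pool_idx}")
    for slide in slides:
        rng.shuffle(slide)  # randomize the position within each slide
    return slides

# Assumption: 5 slides, the smallest number that fits the 5-pool groups.
slides = allocate_pools(GROUP_SIZES, n_slides=5)
for i, slide in enumerate(slides, start=1):
    print(f"slide {i}: {slide}")
```

With seven groups, each slide receives at most seven pools, which fits the eight-sample format of the MethylationEPIC BeadChip.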
Methylation biomarker discovery
Illumina methylation data were preprocessed and analyzed using the R/Bioconductor environment (see Additional file 1 for details on quality control and preprocessing). To test for differentially methylated positions (DMPs) between AN and NN, linear models were fitted for each CpG site across all samples by generalized least squares, and an empirical Bayes method was used to compute the p values. Linear regression assumptions were checked for each model.
To select and prioritize the DMPs as candidate biomarkers, we first applied the constraint-based statistically equivalent signature (SES) algorithm for feature selection: multiple CpG sets with minimal size and maximal predictive power for the binary classification problem NN vs AN were obtained by iteratively comparing logistic regression models through a chi-square test. Then, the different CpG subsets were used to build classification models based on support vector machine, random forest, and logistic regression. Models were cross-validated to select candidate CpG biomarkers with maximal methylation differences and minimum prediction error for NN vs AN classification (see Additional file 1 for details).
Targeted methylation quantification by bisulfite pyrosequencing
The methylation levels of the candidate biomarkers, together with the v2 promoter region of the SEPT9 gene, were evaluated by bisulfite pyrosequencing in the 153 individual serum samples from the biomarker validation cohort. Bisulfite-converted cfDNA (2 μl) was subjected to PCR amplification using primers flanking the CpG candidate biomarkers. Multiplex reactions including 3–6 candidate markers were performed, followed by nested singleplex PCR reactions using a biotin-labeled primer. Primers and PCR conditions for multiplex and singleplex PCR are provided in Additional file 1 and Additional file 1: Table S3. Pyrosequencing was performed using a PyroMark Q96 ID pyrosequencer (Qiagen, Hilden, Germany). Data acquisition and methylation measurements were conducted at the Biomedical Research Institute of Malaga facilities (IBIMA, Málaga, Spain) using PyroMark Q96 ID software, CpG analysis mode (v1.0.11).
Biomarker selection and classification model development
Raw and log10-transformed methylation percentages were subjected to multivariate analyses for the development and validation of methylation-based classification models. The validation set of 153 individual samples was randomly divided for the multi-step process of methylation panel development (Table 1).
First, a preliminary evaluation of the methylation levels of the candidate biomarkers was carried out in a 30% sample subset (n = 48: 21 NN, 8 D-AA, 8 P-AA, and 11 CRC cases). Penalized logistic regressions, Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic net, were applied to the candidate biomarkers, age, and sex for feature selection, with the glmnet package. The minimum mean cross-validation error was used to define the penalty factor. Biomarkers present in the LASSO and Elastic net-derived models were selected for validation. Then, multivariate logistic regressions were fitted in the remaining 70% of samples (n = 105: 39 NN, 23 D-AA, 19 P-AA, and 24 CRC cases) to derive models based on the selected biomarkers.
All statistical analyses were performed with the R environment (v3.4.0). In the epigenome-wide methylation analyses, p values were adjusted for multiple testing with the Benjamini–Hochberg procedure to control the false discovery rate (FDR). One-sided Fisher’s exact tests were used to assess the significance of the enrichment of the DMPs in functionally annotated elements. To assess the performance of the classification models, receiver operating characteristic (ROC) curves were constructed by the leave-one-out cross-validation approach, and AUC, sensitivity, specificity, and negative and positive predictive values (NPV, PPV) were estimated with their corresponding 95% confidence intervals. The best cutoff values were determined by the Youden index method. Fisher’s exact tests were employed to compare the proportion of distal and proximal lesions detected. The Wilcoxon rank-sum test was used to compare the methylation levels between NN and AN in individual serum samples. The nonparametric Wilcoxon signed-rank test was used to compare methylation levels between matched serum and plasma samples.
A total of 28 cfDNA pooled samples (NN group, 13 pools; AN group, 15 pools; cfDNA quantity 62–403 ng; age range of the individuals 51–72 years) were used in the epigenome-wide methylation analysis for biomarker discovery (Additional file 1: Table S2). There was no statistically significant difference in the mean age between pools (ANOVA, p value > 0.05). The age range matches the USPSTF guideline recommendation for CRC screening.
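For reference, the penalized logistic regression objective documented for glmnet can be written as below. The notation is introduced here for illustration: y_i codes NN (0) vs AN (1) for sample i, x_i collects the candidate methylation values together with age and sex, N is the number of samples, and λ is the penalty factor chosen at the minimum mean cross-validation error. Setting α = 1 gives LASSO and 0 < α < 1 gives Elastic net; the α value used for the Elastic net fit is not stated in this section.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
 - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
\;+\; \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
 + \alpha\,\lVert\beta\rVert_1\Big]
```

The L1 term drives some coefficients exactly to zero, which is what yields the sparse biomarker subsets carried forward to validation.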
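The ROC-based evaluation described above can be illustrated with a short sketch that computes sensitivity and specificity at each candidate cutoff, selects the cutoff maximizing the Youden index (J = sensitivity + specificity - 1), and computes the AUC as the probability that a positive sample outranks a negative one. The scores and labels are hypothetical, and the sketch omits the leave-one-out model refitting and the confidence intervals used in the study.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for every distinct score,
    calling a sample positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points

def youden_cutoff(scores, labels):
    """Pick the threshold maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)

def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities (1 = AN, 0 = NN), illustration only.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

thr, sens, spec = youden_cutoff(scores, labels)
print(f"cutoff={thr:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"AUC={auc(scores, labels):.2f}")
```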
Epigenome-wide biomarker discovery
MethylationEPIC BeadChip was used for quantitative DNA methylation profiling in the 28 cfDNA pooled samples. We correctly detected 99.95% of the total array probes. Failed positions and probes not holding the assumptions for linear regression model fitting were discarded. After quality control and preprocessing, a total of 741,310 CpG sites mapped to the human genome assembly GRCh37/hg19 were left for differential methylation testing. No samples were removed due to quality issues.
Since the purpose of screening programs includes the early detection of preclinical CRC and the detection and removal of AA, differential methylation was assessed between NN and AN groups. This analysis revealed 376 differentially methylated positions (10% FDR, BH-adjusted p value) (Fig. 2A), annotated to a total of 290 gene regions and 183 CpG islands. Most CpG sites (326 DMPs, 86.7%) were found hypermethylated in AN (Fig. 2B). Concerning the distribution across functional elements, DMPs were mainly located in open sea regions (51.59%) and CpG islands (23.67%) (Fig. 2C). Differentially hypermethylated positions were significantly enriched in CpG islands, shelves, and gene body regions (Fig. 2D). Clustering analyses of all pooled samples based on the methylation values of the 376 DMPs (Fig. 3A) suggest the capacity of this differentially methylated signature to discriminate AN from NN controls.
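The Benjamini–Hochberg adjustment behind the 10% FDR threshold can be sketched as follows. The raw p values are hypothetical and stand in for per-CpG test results; the study itself performed this step in R.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for six CpG sites (illustration only).
raw = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
adj = benjamini_hochberg(raw)
significant = [p <= 0.10 for p in adj]  # 10% FDR threshold, as in the text
print([round(p, 4) for p in adj], significant)
```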
To identify the most relevant features, a robust strategy of selection was followed. First, the constraint-based SES algorithm was applied to the 376 DMPs to identify 3,256 combinations of CpG pairs whose performances for the NN vs AN classification were statistically equivalent. CpG sites were ranked according to their prediction error (logistic regression, random forest, and SVM models based on CpG pairs) and their absolute methylation differences. From the ranked list, we selected the top 15 CpG sites with greater methylation differences and present in models with minimum classification error. Finally, due to the limited sensitivity of FIT for the detection of AA, we also selected 3 additional CpG sites that presented 0% prediction error for the detection of AA.
In parallel, our data were compared to an external RRBS dataset, comprising serum and tissue samples from patients with CRC, AA, and healthy controls. We selected a subset of 8 CpG sites that reported more than 30% differences in the methylation levels by bisulfite sequencing, showing the same direction (hyper/hypomethylated) in our methylation microarray data.
Altogether, a total of 26 CpG positions were selected as candidate biomarkers to discriminate colorectal advanced neoplasia (Fig. 3B). Three of 26 markers were hypomethylated, while the rest were found hypermethylated in AN compared to NN (Fig. 4A). Description, regulatory features, and relation to CpG island are available in Additional file 1: Table S4.
Evaluation of candidate methylation biomarkers and further selection in individual samples
The methylation levels of the 26 candidate biomarkers and the v2 promoter region of the SEPT9 gene were first quantified in the individual samples from the biomarker evaluation cohort (n = 48). Pyrosequenced regions are detailed in Additional file 1: Table S3. The methylation levels of the 26 candidate biomarkers in both pooled (MethylationEPIC-derived) and individual cfDNA samples (bisulfite pyrosequencing) are shown in Figs. 4A and B, respectively. There was a significant positive correlation (Pearson’s r > 0.6, p value < 0.001) between cfDNA methylation in pooled and individual serum samples (Fig. 4C, D). The performance (ROC curves, AUC, sensitivity, and specificity) of the 26 biomarkers for the detection of AN in the biomarker evaluation cohort is shown in Additional file 1: Figure S1.
To further reduce the number of amplicons for validation, penalized logistic regressions, LASSO and Elastic net, were fitted to the complete list of 26 candidate biomarkers. Although only CG3, CG8, CG15, CG16, CG20, CG24, and CG25 reported statistically significant methylation differences (Wilcoxon rank-sum test p value < 0.05) between NN and AN (Fig. 4B), variable selection was applied to the whole set of 26 candidate biomarkers since it has been reported that prediction power is not always increased with variables significantly correlated with the outcome. We produced sparse models containing two (CG3-GALNT9 and CG15-UPF3A; AUC: 0.905, 95% CI 0.801–1) and four (CG3-GALNT9, CG15-UPF3A, CG5-WARS, and CG24-LDB2; AUC: 0.827, 95% CI 0.651–1) methylation biomarkers, derived by LASSO and Elastic net regularization, respectively. Also, the application of Elastic net to the 26 biomarkers, including age and sex, generated a model containing 20 biomarkers and sex (AUC: 0.820, 95% CI 0.675–0.966). The four biomarkers selected by the sparse models are included in the 20-biomarker signature. Hence, this 20-biomarker set (Table 2) was selected to proceed with the validation phase.
Validation of the selected biomarkers and final model construction
The final 20 selected methylation biomarkers were then quantified in the Model validation cohort (n = 105) (Fig. 5A). The performance (ROC curves, AUC, sensitivity, and specificity) of the 20 selected biomarkers for the detection of AN is presented in Additional file 1: Figure S2. Multivariate logistic regressions were fitted to the selected biomarker subsets obtained from the three best-performing models from the previous step, to derive three new models containing 2 (CG3-GALNT9 and CG15-UPF3A), 4 (CG3-GALNT9, CG15-UPF3A, CG5-WARS, and CG24-LDB2), and 20 biomarkers. Performances of the diagnostic prediction models are summarized in Table 3, while ROC curves are provided in Fig. 5B.
The model composed of GALNT9 and UPF3A yielded an AUC of 0.896 (95% CI 0.835–0.958), discriminating AN from NN with 78.8% sensitivity and 100% specificity. This model identified 33/42 AA cases (78.6%) and 19/24 CRC patients (79.2%), with notable detection of early-stage CRC (87.5% and 100% for stages I and II, respectively). This 2-biomarker panel detected 87% of distal AA and 68.4% of proximal AA. On the other hand, the model containing GALNT9, UPF3A, WARS, and LDB2 yielded an AUC of 0.864 (95% CI 0.798–0.931), with 62.1% sensitivity and 97.4% specificity for AN, detecting 28/42 AA (66.7%; 78.3% distal and 52.6% proximal), and 13/24 CRC cases (54.2%; 37.5% stage I and 75% stage II). These two models demonstrated no significant differences in the detection of distal AA compared to proximal ones (Fisher’s exact test p value > 0.05). Finally, the model containing 20 biomarkers and sex reported the highest sensitivity (92.4%) and the highest detection rate for AA and CRC (92.9% and 91.7%, respectively), with reduced specificity (17.9%). The logistic classification rules of the 4- and 2-biomarker panels are detailed in Additional file 1. CG3-GALNT9 was hypermethylated in AN, while CG15-UPF3A, CG5-WARS, and CG24-LDB2 showed hypomethylation in AN. Differences in methylation levels between NN and AN were statistically significant (Wilcoxon rank-sum test p value < 0.05; Fig. 5A). Since plasma samples are most commonly used as a source of cfDNA, methylation of GALNT9, UPF3A, WARS, and LDB2 was also evaluated in 8 pairs of matched serum and plasma samples. No statistical differences were found between serum and plasma methylation levels (Wilcoxon signed-rank test p value > 0.05) (Fig. 5E).
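As a worked check, the reported AN sensitivities follow directly from the AA and CRC detection counts given above. The sketch below recomputes them; the detected/total counts are those stated in the text, whereas the specificity count of 38/39 is an assumption inferred from the 39 NN samples in the model validation cohort.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)

# Detected / total cases per model, as stated in the text.
models = {
    "GALNT9/UPF3A": {"AA": (33, 42), "CRC": (19, 24)},
    "GALNT9/UPF3A/WARS/LDB2": {"AA": (28, 42), "CRC": (13, 24)},
}

results = {}
for name, groups in models.items():
    detected = sum(d for d, _ in groups.values())
    total = sum(t for _, t in groups.values())
    results[name] = {
        "AA": pct(*groups["AA"]),
        "CRC": pct(*groups["CRC"]),
        "AN sensitivity": pct(detected, total),  # AN = AA + CRC
    }
    print(name, results[name])

# Assumption: with 39 NN samples, 97.4% specificity corresponds to 38/39.
print(pct(38, 39))
```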
Analysis and performance of Septin9 methylation
None of the MethylationEPIC probes targeting the v2 promoter of SEPT9 (cg02884239, cg20275528, cg12783819) reported significance for the detection of AN in the cfDNA pooled samples, showing an average methylation difference of 0.14% between NN and AN (Fig. 4A). We reported hypomethylation of SEPT9 in AN in both the biomarker evaluation cohort (n = 48) (Fig. 4B) and the model validation cohort (n = 105) (Fig. 5A). The diagnostic performance of SEPT9 evaluated in the model validation cohort yielded an AUC of 0.504 (95% CI 0.389–0.618) for AN detection and sensitivity and specificity values of 15.2% and 100%, respectively (Table 3, Fig. 5D). Only 7.1% of AA and 29.2% of CRC were detected.
Evaluation of the classification models in non-colorectal tumors
To assess the ability of the models to specifically detect AA and CRC, methylation of GALNT9, UPF3A, WARS, and LDB2 was quantified in serum from patients with lung, breast, kidney, prostate, and ovarian cancer (n = 16) (Fig. 5C, Additional file 1: Table S1). The GALNT9/UPF3A model misclassified as AN 3 out of 16 cancer cases (18.75%: prostate, kidney, and breast cancer). On the other hand, the model composed of GALNT9, UPF3A, WARS, and LDB2 incorrectly identified 1 out of 16 cancer cases (6.25%: ovarian cancer).
Early detection has proven to be the most effective strategy to reduce both the incidence and mortality of CRC [3, 27]. FIT is the most widely used noninvasive screening strategy for CRC despite its modest sensitivity for the detection of premalignant AA [6,7,8]. Also, most existing FIT-based screening programs suffer from low participation rates (43.8% worldwide) [4, 28]. Currently, there is no ideal noninvasive biomarker for the early detection of CRC and AA. To this end, liquid biopsy technology has emerged as a promising approach for CRC screening, diagnosis, follow-up, and treatment guidance [13].
In this study, we first conducted an epigenome-wide analysis with the MethylationEPIC array using a cfDNA pooling approach to discover potential blood-based biomarkers for the joint detection of AA and CRC. The selection process of the candidate biomarkers was conducted by penalized logistic regression. After prioritizing candidate biomarkers and evaluating their methylation levels in individual samples from an independent cohort, we developed and cross-validated three prediction models for the detection of AN (AA and CRC). The first one, the 20 methylation biomarkers with sex, yielded a sensitivity of 92.4% for AN, at 17.9% specificity. Despite its high sensitivity and highest detection rate for AA (92.9%) and CRC (91.7%), such low specificity is not cost-effective for screening programs. Secondly, the model composed of GALNT9, UPF3A, WARS, and LDB2 reported 62.1% sensitivity and 97.4% specificity, while the GALNT9/UPF3A model discriminated AN with 78.8% sensitivity and 100% specificity, showing the best prediction performance for CRC screening. The sensitivities reported for our biomarkers are comparable to that of FIT for CRC detection (70–75%) and higher than that of FIT for AA (22–44%), with increased specificity (97.4–100% and 81–97% for our biomarkers and FIT, respectively) [5,6,7,8]. The GALNT9/UPF3A panel also fulfills the main objective of CRC screening, that is, the detection of preclinical CRC and premalignant AA, as it showed a suitable detection rate for AA (78.6%) and early CRC stages I and II (87.5% and 100%, respectively). Also, no statistically significant differences were reported between the detection of distal (87%) and proximal (68.4%) AA, in contrast with FIT, which performs better for distal lesions.
Among the four methylation biomarkers, only UPF3A has been previously related to CRC. UPF3A, located in a CpG island shelf, was hypomethylated in AN. High expression levels of this gene were associated with TNM stage, liver metastasis, and recurrence in CRC. GALNT9 is also located in a CpG island and showed hypermethylation in AN, which was also reported in brain metastasis from primary breast cancer. On the other hand, WARS and LDB2 show hypomethylation in AN but are located within open sea regions. High expression levels of WARS were found in high microsatellite-instable gastrointestinal adenocarcinomas, associated with poor prognosis, while a decreased expression of LDB2 was associated with a more favorable outcome in lung adenocarcinoma patients.
Nowadays, the only blood methylation biomarker approved by the FDA for CRC detection is SEPT9. Nevertheless, the diagnostic performance of SEPT9 is variable and inconsistent, with sensitivities ranging from 36–93% for CRC and 22–49% for AA, with 79–99% specificity [9, 10]. In an asymptomatic average-risk cohort, SEPT9 showed lower performance than FIT (sensitivity: 68% vs. 79%; specificity: 80% vs. 94%, respectively). In our study, we also evaluated the performance of SEPT9. The 3 CpG sites targeting SEPT9 interrogated in the MethylationEPIC BeadChip were not differentially methylated between NN and AN, and in the final validation cohort, the sensitivity for AN and AA was 15.2% and 7.1%, respectively, with 100% specificity. Results may not be fully comparable since the commercial test is based on plasma qPCR, while we quantified SEPT9 methylation by pyrosequencing in serum samples.
Several blood-based methylation biomarkers have emerged for CRC detection. Methylated markers such as BCAT1 and IKZF1; C9orf50, KCNQ5, and CLIP4; SFRP2 and SDC2; cg10673833; APC, MGMT, RASSF2A, and Wif-1; ALX4, BMP3, NPTX2, RARB, SDC2, and VIM; and NEUROG1 have reported sensitivities ranging from 66–91% for CRC detection and 5–58% for premalignant AA, with 73–99% specificity.
In the ongoing validation of CRC screening biomarkers, it is important to consider data on protocol acceptability. A blood-based test has the potential to improve compliance and participation in CRC screening, as reported by a randomized controlled trial and a screening study. To optimize participation in CRC screening, perhaps both a fecal and a blood-based test should be offered to target different preferences. In this way, a blood test could be proposed as a screening option to invitees refusing FIT. Another option for implementing a blood test in screening programs is triaging FIT-positive individuals for improved selection to colonoscopy. Figure 6 shows a schematic representation of the possible implementation of a blood test in CRC screening, both as an alternative to FIT aiming to increase participation rates, and as a triage approach to optimize selection to colonoscopy.
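The distal versus proximal comparison for the GALNT9/UPF3A panel can be reproduced approximately with a two-sided Fisher's exact test. In the sketch below, the counts 20/23 distal and 13/19 proximal AA detected are assumptions inferred from the reported percentages (87% and 68.4%) and the group sizes of the model validation cohort; they are not stated explicitly in the text.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Rows: distal AA, proximal AA; columns: detected, missed (inferred counts).
p_value = fisher_exact_two_sided(20, 3, 13, 6)
print(f"p = {p_value:.3f}")
```

Under these inferred counts, the p value lies above 0.05, in line with the absence of a significant location effect reported for this panel.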
To the best of our knowledge, this is the first study conducting a serum-based discovery and validation of cfDNA methylation biomarkers for CRC screening. Our study design, targeting the final sample format (serum), enhances the possibility to discover and translate robust noninvasive biomarkers. Additionally, our results underline the feasibility of cfDNA pooled samples as an affordable approach for biomarker discovery, increasing the DNA input when small amounts are available [18, 43, 44]. Our multicenter cohort includes colonoscopy-diagnosed individuals with CRC, AA, healthy controls, and also benign pathologies (non-advanced adenomas, hemorrhoids, and diverticula) typically found during screening programs that influence test specificity. The ability of our methylation biomarkers to specifically detect CRC was also confirmed.
Nevertheless, our study has some limitations. Firstly, most CRC cases were diagnosed in symptomatic individuals, and secondly, the proportions of CRC cases, tumor stages, and the rest of the pathologies are not fully representative of a screening population. The proposed methylation models should be validated in a large prospective average-risk screening setting.
We have discovered and reported GALNT9, UPF3A, WARS, and LDB2 as new noninvasive biomarkers for the early detection of CRC and AA, regardless of the location of the lesion. We propose that the combination of methylated GALNT9/UPF3A is the most promising to serve as a highly specific and sensitive blood-based test for screening and detection of CRC at an early and curable stage, even at the premalignant lesion phase.
Availability of data and materials
The Infinium MethylationEPIC data from all the pooled samples generated and analyzed during this study have been deposited in the NCBI Gene Expression Omnibus (GEO) (www.ncbi.nlm.nih.gov/geo) and are accessible through GEO Series accession number GSE186381 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE186381).
AUC: Area under the curve
BEN: Benign pathologies (hemorrhoids and diverticula)
cfDNA: Circulating cell-free DNA
Individuals with advanced adenomas of both distal and proximal locations
DMP: Differentially methylated position
FDR: False discovery rate
FIT: Fecal immunochemical test
NCF: No colorectal findings
NPV: Negative predictive value
PPV: Positive predictive value
ROC: Receiver operating characteristic
We would like to thank Dr. Martínez Zorzano and Professors Rodríguez Berrocal and Páez de la Cadena for their support and scientific advice. We would like to thank Dr. Gómez Zumaquero for the pyrosequencing support. The samples and clinical data of patients included in this study were provided by: the HGUA Biobank (PT13/0010/0044), integrated into the Red Nacional de Biobancos and in the Red Valenciana de Biobancos; SERGAS Biobanco A Coruña; Basque Biobank/Biodonostia Node; Biobank at the Galicia Sur Health Research Institute; Biobank of Hospital Clínic, Barcelona—IDIBAPS; and CHUS’s Biobank (University Hospital Complex of Santiago de Compostela) with the approval of the respective Ethical and Scientific Committees, and have been processed following standard procedures. We acknowledge CESGA (Fundación Pública Galega Centro Tecnolóxico de Supercomputación de Galicia) for providing access to computing facilities to analyze methylation microarray data. All authors have read both the Translational Research’s policy on disclosure of potential conflicts of interest and the authorship statement.
This work received funding from Plan Nacional I+D+I 2015–2018 (Acción Estratégica en Salud) Instituto de Salud Carlos III (Spain)-FEDER (PI15/02007) and Fundación Científica de la Asociación Española contra el Cáncer (GCB13131592CAST), and support from Centro Singular de Investigación de Galicia (Consellería de Cultura, Educación e Ordenación Universitaria) (ED431G/02, Xunta de Galicia and FEDER-European Union). María Gallardo-Gómez was supported by a predoctoral fellowship from Ministerio de Educación, Cultura y Deporte (Spanish Government) (FPU15/02350).
Ethics approval and consent to participate
Written informed consent was obtained from all patients with approval by the Ethics Committee for Clinical Research of Galicia (2018/008). The study was conducted according to the clinical and ethical principles of the Spanish Government and the Declaration of Helsinki.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Details about the bioinformatics preprocessing of methylation microarray data for biomarker discovery. Details about biomarker discovery analysis and robust biomarker prioritization. PCR conditions and primers for biomarker evaluation and validation in individual serum samples. Decision rules derived from the final models for the detection of colorectal advanced neoplasia.
Table S1. Epidemiological and clinical data of patients with other tumors (n = 16).
Table S2. Description of cfDNA pooled samples.
Table S3. Primers, PCR conditions, and amplicon details for biomarker evaluation by pyrosequencing.
Table S4. Description of the CpG candidate biomarkers obtained after the epigenome-wide methylation analysis.
Fig. S1. ROC curve analysis for the 26 candidate biomarkers and SEPT9 for NN versus AN classification in the biomarker evaluation cohort (n = 48).
Fig. S2. ROC curve analysis for the 20 selected biomarkers and SEPT9 for NN versus AN classification in the model validation cohort (n = 105).
Cite this article
Gallardo-Gómez, M., Rodríguez-Girondo, M., Planell, N. et al. Serum methylation of GALNT9, UPF3A, WARS, and LDB2 as noninvasive biomarkers for the early detection of colorectal cancer and advanced adenomas. Clin Epigenet 15, 157 (2023). https://doi.org/10.1186/s13148-023-01570-1