Skip to main content

Whole-blood methylation signatures are associated with and accurately classify multiple sclerosis disease severity



The variation in multiple sclerosis (MS) disease severity is incompletely explained by genetics, suggesting genetic and environmental interactions are involved. Moreover, the lack of prognostic biomarkers makes it difficult for clinicians to optimise care. DNA methylation is one epigenetic mechanism by which gene–environment interactions can be assessed. Here, we aimed to identify DNA methylation patterns associated with mild and severe relapse-onset MS (RMS) and to test the utility of methylation as a predictive biomarker.


We conducted an epigenome-wide association study between 235 females with mild (n = 119) or severe (n = 116) with RMS. Methylation was measured with the Illumina methylationEPIC array and analysed using logistic regression. To generate hypotheses about the functional consequence of differential methylation, we conducted gene set enrichment analysis using ToppGene. We compared the accuracy of three machine learning models in classifying disease severity: (1) clinical data available at baseline (age at onset and first symptoms) built using elastic net (EN) regression, (2) methylation data using EN regression and (3) a weighted methylation risk score of differentially methylated positions (DMPs) from the main analysis using logistic regression. We used a conservative 70:30 test:train split for classification modelling. A false discovery rate threshold of 0.05 was used to assess statistical significance.


Females with mild or severe RMS had 1472 DMPs in whole blood (839 hypermethylated, 633 hypomethylated in the severe group). Differential methylation was enriched in genes related to neuronal cellular compartments and processes, and B-cell receptor signalling. Whole-blood methylation levels at 1708 correlated CpG sites classified disease severity more accurately (machine learning model 2, AUC = 0.91) than clinical data (model 1, AUC = 0.74) or the wMRS (model 3, AUC = 0.77). Of the 1708 selected CpGs, 100 overlapped with DMPs from the main analysis at the gene level. These overlapping genes were enriched in neuron projection and dendrite extension, lending support to our finding that neuronal processes, rather than immune processes, are implicated in disease severity.


RMS disease severity is associated with whole-blood methylation at genes related to neuronal structure and function. Moreover, correlated whole-blood methylation patterns can assign disease severity in females with RMS more accurately than clinical data available at diagnosis.


Multiple sclerosis (MS) is an autoimmune, neurodegenerative disease that is highly heterogeneous between individuals. Variability in disease activity and trajectory is demonstrated in natural history as well as modern day cohorts, despite disease-modifying therapy (DMT) exposure [1]. Clinicians are currently unable to predict patients’ future disease severity due to a lack of prognostic biomarkers, and as such, selecting the most appropriate course of management for the individual remains a clinical challenge.

Genome-wide association studies (GWASes) have identified over 200 independent MS susceptibility variants [2]. Therefore, it is logical that heterogeneity of MS clinical outcomes could also be regulated by genetic factors. However, attempts to associate MS susceptibility variants with disease activity [3] or disease progression [4] have been largely unsuccessful. Similarly, discovery GWASes of disease severity using cross-sectional MS severity scale scores have also failed to identify any variants reaching genome-wide significance [5, 6]. Epidemiological studies demonstrate the impact of environmental factors on disease severity, most notably demonstrating that smoking [7] and vitamin D deficiency [8] are associated with an increased risk of more severe disease, while DMT exposure and pregnancy with a reduced risk [9]. This points us to consider epigenetic mechanisms as the missing link explaining the observed clinical heterogeneity in disease severity.

DNA methylation is the best understood epigenetic mechanism that involves the presence or absence of a methyl (CH3) group on cytosine–phosphate–guanine (CpG) dinucleotides in the DNA sequence. Importantly, methylation is impacted by environmental factors and, therefore, is a plausible mechanism by which genetic and environmental factors interact to drive disease susceptibility and severity.

In case–control epigenome-wide association studies (EWASes) of MS, differential methylation in the HLA gene is the strongest validated finding across studies and cell types [10,11,12,13]. EWASes comparing methylation patterns between MS subtypes are less common, therefore methylation signatures specific to relapsing–remitting MS (RRMS), secondary progressive MS (SPMS) or primary progressive MS (PPMS) are not validated. PPMS patients show general hypermethylation in whole blood compared to RRMS patients [14], while RRMS and SPMS patients exhibit differences in CD4+ [12, 15,16,17] and CD8+ T cell [17], CD14+ monocyte[17] and CD19+ B cell [17] methylation patterns. In CD4+ T cells, differential methylation at the Major Histocompatibility Complex (including HLA-DRB1 and HLA-DRB5) [12], hypomethylation of the HTR2A locus [16] and hypermethylation at the VMP1/MIR21 locus [15] in SPMS compared to RRMS has been shown. One study has demonstrated SPMS-specific signatures in CD8+ T cells, CD14+ monocytes and CD19+ B cells compared to RRMS and controls [17]. These signatures were enriched in neuronal and neurodegenerative genes, and myeloid cell function [17]. Despite these comparisons of MS subtypes, no studies to date have compared the methylation patterns between people at the extremes of relapse-onset MS (RMS) severity; where RMS refers to patients diagnosed with RRMS and living with either RRMS or SPMS at blood collection.

In this study, we aimed to identify whole-blood and immune cell-specific DNA methylation patterns associated with mild and severe RMS across autosomes. Additionally, we aimed to assess differences in methylation age acceleration between groups and further to test the utility of DNA methylation as a predictive biomarker in MS compared to clinical data available at diagnosis.


Cohort characteristics

This study included 235 females with relapse-onset MS across four study sites (Additional file 1: Fig. S1). Using MSBase Registry data, we determined disease severity using a longitudinal age-related multiple sclerosis severity (ARMSS) score [18] approach. For each patient, an ARMSS score was calculated for each visit with a relapse-independent Expanded Disability Status Scale (EDSS) [19] score recorded. The median of these ARMSS scores was the longitudinal ARMSS score used to classify severity. Patients with median longitudinal ARMSS scores at or below the cohort 20th percentile were categorised as mild, while those at or above the 80th percentile were categorised as severe. Mild and severe patients had median longitudinal ARMSS scores of 1.21 (range = 0.17–3.01) and 8.36 (range = 6.52–9.92), respectively (Table 1).

Table 1 Cohort summary statistics

Disease severity is associated widespread differential methylation in whole-blood

After methylation data preprocessing using the Chip Analysis Methylation Pipeline (ChAMP) Bioconductor package [20], approximately 748,000 of 867,000 (86%) probes remained for analysis (Additional file 1: Fig. S2). Batch effect analysis identified Plate, Sentrix ID and Sentrix Position as significant sources of technical variation (p < 0.01). Batch effect correction reduced these to negligible effects (Additional file :1 Fig. S3).

To identify differential methylation between mild and severe groups at the single CpG level [i.e. differentially methylated positions (DMPs)], we implemented a logistic model of methylation level at each probe and severity group, adjusted for Natural Killer (NK) cell proportions. NK cell proportions were significantly associated with 1514 CpGs (FDR < 0.05, Additional file :2 Table S1) in sensitivity analyses and were therefore adjusted for in the differential methylation analysis. We revealed 1472 DMPs with an FDR < 0.05 and methylation difference (Δmeth) > 1% between mild and severe groups, mapping to 812 genes and 660 unannotated genomic locations (major DMPS listed in Table 2, full list shown in Additional file 2: Table S2). Of these 1472 DMPs, 839 (57%) were hypomethylated and 633 (43%) were hypermethylated in the severe group relative to the mild group (Fig. 1A). Δmeth ranged from -14.02 to 14.04%. The majority of DMPs were in open sea regions (1039, 70.6%), with 264 (17.9%) in shores, 105 (7.1%) in shelves and 64 (4.3%) in islands.

Table 2 Major differentially methylated positions (DMPs, Δmeth > 5%)
Fig. 1
figure 1

Differentially methylated positions (DMPs) between mild and severe groups in whole-blood. A Of 1472 DMPs, 839 (57%) were hypomethylated (shown in blue) and 633 (43%) were hypermethylated (shown in red) in the severe group. 22 DMPs had a Δmeth above 5% (indicated by the dashed line). B Methylation quantitative trait loci (mQTLs) for each major DMP (Δmeth > 5%). Methylation beta value for each DMP is plotted by genotype at the most significant mQTL based on p-value. The plotted CpG-SNV pairs are: cg10070864-15:57665782 (p = 7.25 × 10–15), cg07199764-1:5348270 (p = 7.50 × 10–13), cg07499182-13:33824770 (p = 4.76 × 10–06), cg17015133-1:155106697 (p = 4.17 × 10–02), cg14433904-8:78519337 (p = 4.33 × 10–12), cg24833027-8:1902112 (p = 1.43 × 10–10), cg00401101-5:16508674 (p = 1.01 × 10–17), cg08434374-11:44471561 (p = 5.46 × 10–05), cg03453431-7:157226016 (p = 7.05 × 10–21), cg16958467-15:57655776 (p = 8.69 × 10–29), cg00541777-2:3652616 (p = 5.22 × 10–28), cg01156747-7:120429 (p = 9.31 × 10–36), cg17322118-1:85744472 (p = 2.14E-12), cg12257246-8:25240100 (p = 4.73E-17), cg00267320-10:111537595 (p = 3.73E-02), cg00570954-12:4281245 (p = 2.50 × 10–26), cg13883027-2:62887884 (p = 3.11 × 10–38), cg25261547-19:49364511 (p = 7.81 × 10–03), cg21728101-7:1976457 (p = 3.48 × 10–04), cg14859874-1:154239283 (p = 1.13 × 10–30), cg19192981-15:57667027 (p = 2.28 × 10–06). The list of mQTLs and p-values for each major DMP are listed in Additional file 2: Table S12. Δmeth delta beta, SNV single-nucleotide variant

No differentially methylated regions (DMRs) were identified using the DMRcate algorithm at an FDR threshold of 0.05. Therefore, we loosened the definition of a DMR to a region containing at least two DMPs with the same effect direction, within 1000 bp of the adjacent DMP/s and using an FDR < 0.01. With this definition, suggestive DMRs on chromosomes 11 and 15 were identified containing two DMPs each (hereafter referred to DMRChr11 and DMRChr15, Table 3). DMRChr11 was hypomethylated in the severe group with a maximum Δmeth of 1.8% (Additional file 1: Fig. S4A). The strongest DMP in this DMR was cg02877698 (Δmeth = 1.8%, FDR = 0.0087). DMRChr11 contained two DMPs within a 447-bp region that overlapped with the gene body of uncharacterised long noncoding RNA LOC101929295. DMRChr15 was hypermethylated in the severe group with a maximum Δmeth of 8.7% (Additional file 1: Fig. S4B). cg10070864 was the strongest DMP in this DMR (Δmeth = 8.7%, FDR = 0.0032). DMRChr15 contained two DMPs within a 214-bp unannotated, intergenic region.

Table 3 Differentially methylated regions (DMRs)

Disease severity is associated with CD8+ T cell methylation

As methylation can be cell-type-specific, differential methylation analysis of whole-blood signal may not be sensitive to cell-specific DMPs (csDMPs). To address this, we estimated and compared the proportion of immune cell types in mild and severe groups. Cell-type proportions were estimated from whole-blood data using reference-based statistical deconvolution with the CIBERSORT algorithm [21]. No differences were found in immune cell proportions (Additional file 1: Fig. S5). We identified csDMPs in all cell types tested using linear models (Fig. 2): 48 csDMPs in CD8+ T cells, eight in granulocytes, five in B cells, four in CD4+ T cells, four in NK cells and two in monocytes (Additional file :2 Tables S3–8).

Fig. 2
figure 2

Number of differentially methylated positions (DMPs) in each immune cell type. CD8 + T cells (n = 48), granulocytes (n = 8), B cells (n = 5), CD4 + T cells (n = 4), Natural Killer cells (n = 4), monocytes (n = 2)

Natural Killer cell proportions are associated with methylation patterns

In sensitivity analyses, NK cell proportions were significantly associated with 1,514 CpGs (FDR < 0.05, Additional file :2 Table S1). No CpGs were significantly associated with other covariates tested (data not shown), demonstrating no major effects of these covariates on differential methylation in our cohort and therefore were not adjusted for in the analysis. Of the 2622 smoking-associated CpGs form Joehanes (2016) and 1472 DMPs identified in this study, 127 overlapped and were removed prior to downstream analyses to avoid confounding (Additional file 2: Table S9).

Disease severity is associated with methylation independent of genetic effects

Methylation quantitative trait loci (mQTLs) refer to single-nucleotide variants (SNVs) that influence methylation levels at or near certain genetic loci. We conducted a targeted mQTL analysis by testing the relationship between genotype and methylation levels within a) a ± 5 kb window of each major DMP (Δmeth > 5%) and b) at each DMR.

There were 72 independent SNVs across the 22 major DMPs (Table 2, Fig. 1B). After quality control and filtering, 221 of 235 samples had genotype data for analysis. All major DMPs except cg25304129 contained at least one mQTL (Additional file 2: Table S10). We adjusted our differential methylation analysis for mQTL effects by modelling methylation and genotype as joint predictors of disease severity in a logistic model. The association between methylation level and disease severity at these major DMPs was not attenuated when adjusted for genotype (Additional file 2: Table S10).

Three independent SNPs in DMRChr11 and one independent SNP in DMRChr15 were assessed for mQTL effects. In DMRChr11, genotype at 11:94,886,463 and 11:94,886,632, but not 11:94,886,601, were correlated with methylation, suggesting mQTL effects at this DMR (Additional file 1: Fig. S6, Additional file 2: Table S11). Adjusted logistic models showed that DMRChr11 remains significantly associated with disease severity, independent of genotype at 11:94,886,463 (cg10357314, p = 1.61 × 10–05 and cg02877698, p = 2.15 × 10–06) and 11:94,886,632 (cg10357314, p = 0.00017 and cg02877698, p = 4.72 × 10–06). In DMRChr15, genotype at 15:57,664,572 was correlated with methylation (Additional file 1: Fig. S7, Additional file 2: Table S11). Accounting for 15:57,664,572 genotype in a logistic model demonstrated that DMRChr15 remains significantly associated with disease severity (cg10070864, p = 4.06 × 10–29 and cg19192981, p = 4.20 × 10–29).

Disease severity is associated with differential methylation at genes enriched in neuronal compartments and pathways

As the majority of DMPs had effect sizes below 5%, we conducted gene set enrichment analysis (GSEA) using ToppGene on all DMPs that mapped to a gene (812 of 965, 84%) to elucidate potentially cumulative effects of many DMPs with small effect sizes. ToppGene revealed that differential methylation, regardless of direction of effect, was primarily enriched in the three cellular components: synapse (ngenes = 101, FDRB&H = 1.53 × 10–5), supramolecular complex (ngenes = 116, FDRB&H = 1.53 × 10–5) and glutamatergic synapse (ngenes = 46, FDRB&H = 1.53 × 10–5, Fig. 3A, Additional file 2: Table S12). The top enriched pathway was B cell receptor signalling (ngenes = 15, FDRB&H = 0.00098, Fig. 3B, Additional file 2: Table S12), and notable pathways included the p75 neurotrophin receptor (NTR)-mediated signalling (ngenes = 13, FDRB&H = 0.016, Additional file 2: Table S12) and NRAGE signals death through c-Jun N-terminal kinases (JNKs) (ngenes = 11, FDRB&H = 0.007, Additional file 2: Table S12).

Fig. 3
figure 3

Gene Set Enrichment Analysis (GSEA) of differentially methylated positions (DMPs). GSEA was conducted using 812 genes. A Ten most significantly enriched cellular components. B Ten most significantly enriched pathways. Gene ratio is the ratio of the number of genes in the query list and the hit count for that gene set in the genome

Hypermethylated genes (n = 391) were enriched in neuron related cellular components such as neuron spine (ngenes = 20, FDRB&H = 6.48 × 10–5) and dendritic spine (ngenes = 20, FDRB&H = 6.48 × 10–5, Fig. 4A, Additional file 2: Table S13), as well as the Adherens junction pathway (ngenes = 12, FDRB&H = 4.81 × 1005, Fig. 4B, Additional file 2: Table S13). Hypomethylated genes (n = 440) were enriched in extracellular matrix (ngenes = 29, FDRB&H = 0.00813), and external encapsulating structure cellular components (ngenes = 29, FDRB&H = 0.00813, Fig. 5A, Additional file 2: Table S14), and the axon guidance mediated by netrin (NTN) pathway (ngenes = 9, FDRB&H = 0.0006) and B cell receptor signalling pathway (ngenes = 7, FDRB&H = 0.00443, Fig. 5B, Additional file 2: Table S14).

Fig. 4
figure 4

Gene Set Enrichment Analysis (GSEA) of hypermethylated differentially methylated positions (DMPs). GSEA was conducted using 391 hypermethylated genes. A Ten most significantly enriched cellular components. B Ten most significantly enriched pathways. Gene ratio is the ratio of the number of genes in the query list and the hit count for that gene set in the genome

Fig. 5
figure 5

Gene Set Enrichment Analysis (GSEA) of hypomethylated differentially methylated positions (DMPs). GSEA was conducted using 440 hypermethylated genes. A Ten most significantly enriched cellular components. B) Ten most significantly enriched pathways. Gene ratio is the ratio of the number of genes in the query list and the hit count for that gene set in the genome

Disease severity is associated with methylation age acceleration using PhenoAge

Methylation age is the estimation of biological age from methylation levels at a subset of CpGs associated with age, known as clock CpGs. Using two validated methylation age algorithms (PhenoAge [22] and GrimAge [23]) we did not find any evidence for significant differences in methylation age between severity groups (PhenoAge, p = 0.064; GrimAge, p = 0.343; data not shown). We did identify two PhenoAge clock CpGs as DMPs in our study (Additional file 2: Table S15). However, this overlap did not meaningfully contribute to differences in MAA between severity groups.

Methylation age acceleration (MAA) is a measure of the disparity between chronological and biological age, and can provide insight into an individual’s state of health and potential lifespan [22,23,24]. It is defined as the residual term from regressing chronological age on methylation age. The distribution of residual terms was normal for PhenoAge (p = 0.599) and GrimAge (p = 0.508), so t-tests were used to assess mean difference in MAA between mild and severe groups. There were significant differences in MAA between mild and severe groups using PhenoAge (Δμ = 1.36, p = 0.048), but not GrimAge (Δμ = 0.464, p = 0.375; Additional file 2Fig. S8).

Sex, smoking history, body mass index (BMI) and socioeconomic status (SES) are reported confounders of PhenoAge[25]. To investigate whether the disparate findings between PhenoAge MAA and GrimAge MAA were due to unaddressed confounder effects, we compared smoking history between sample groups. Mild and severe groups showed no significant differences in DNA methylation smoking pack years (DNAmPACKYRS, p = 0.4342, data not shown). Sex did not need to be addressed as all participants are female, while BMI and SES data was not available for this cohort.

Whole-blood methylation levels classify disease severity more accurately than clinical data

To understand the potential of methylation as a predictive biomarker of severity with clinical utility, we compared the accuracy of three models in classifying binary disease severity in our cohort using elastic net regression. Elastic net regression is a form of penalised regression that is useful for datasets with correlated features (e.g. methylation) and a greater number of features than subjects. Model 1 used clinical data available at diagnosis, including age at onset (AAO) and first symptoms (optic pathways, supratentorial, brainstem and/or spinal cord). Model 2 used genome-wide methylation data at 747,969 CpGs and immune cell-type proportion estimates. Model 3 used a weighted methylation risk score (wMRS) of DMPs identified in the main differential methylation analysis (n = 1472). All models were trained on 70% of the cohort (n = 164).

For model 1 and 2, we used a cross-validation approach to determine the optimal alpha and lambda values (model 3: alpha = 0.4, lambda = 0.20, model 3: alpha = 0.01, lambda = 0.02). Using these parameters model 1 selected all five clinical variables (Fig. 6A), with AAO as the most important clinical factor for classifying disease severity. Model 2 selected 1708 CpGs associated with disease severity (Fig. 6B, top 10 shown in Table 4, full list in Additional file 2: Table S16). The most important CpG in model 3, cg11445760 (variable importance = 100), is located on Chromosome 11 and maps to the Splicing Factor 1 (SF1) gene. cg11445760 is not located in the DMR we identified on the same chromosome (DMRChr11) in the primary differential methylation analysis. The 1708 CpGs in model 2 mapped to 1018 genes strongly enriched in nerve growth factor (NGF) signalling pathways (ngenes = 56, FDRB&H = 2.68 × 10–5, Additional file 2: Table S17). Comparison of features in model 2 with DMPs identified in the main differential methylation analysis highlighted five overlapping CpGs and 100 overlapping genes (Additional file 2: Table S18). Overlapping genes were enriched in dendrite extension biological processes (ngenes = 5, FDRB&H = 4.94 × 10–3) and neuron projection cellular components (ngenes = 21, FDRB&H = 9.736 × 10–3, Additional file 2: Table S19). In model 3, the wMRS was significantly different between mild and severe groups (p < 2.73 × 10–17, Fig. 6C).

Fig. 6
figure 6

Multifactor feature selection and classification modelling using clinical or methylation data. A Variable importance plot for model 1 (clinical data available at diagnosis). B Variable importance plot for model 2 (methylation data at 1708 CpGs). C Distribution of weighted methylation risk scores (wMRS) by Sample Group (p < 2.72 × 10–16). D Receiver Operator Curves (ROCs) of disease severity classification using clinical data (AUC = 0.74), methylation data at 1708 CpGs (AUC = 0.91) or wGRS (AUC = 0.77). Models were trained on 164 samples (70%) and tested on 71 samples (30%)

Table 4 Top ten CpGs associated with disease severity as selected by the elastic net model

We tested the classification capacity of each model on the remining 30% of the cohort (n = 71). Model 2 classified disease severity the most accurately (AUC = 0.91), followed by model 2 (AUC = 0.77) and model 1 (AUC = 0.74, Fig. 6D).


There is limited, but growing evidence of association between genetics and disease severity in MS [5], but well established relationships between environmental variables such as smoking [7], DMT exposure [9] and pregnancy [9] on long-term outcomes. Despite this, no published studies to date have examined the association between disease severity and epigenetic mechanisms, such as DNA methylation. We hypothesised that MS severity is associated with DNA methylation. To test this, we compared genome-wide methylation between 119 relapse-onset females with mild disease and 116 age-matched females with severe disease. Disease severity was determined with median longitudinal ARMSS scores using data from the MSBase Registry. We identified numerous differentially methylated CpG sites across the genome with small, but likely cumulative impacts on MS severity. We also identified differential methylation in immune cell types, mainly CD8+ T cells, using reference-based statistical deconvolution. Gene set enrichment analyses identified neuronal, rather than immune pathways, as differentially methylated between outcome extremes. Finally, using a machine learning approach, we were able to accurately assign disease severity demonstrating the potential of methylation as a prognostic biomarker in MS.

Our analysis of whole-blood methylation differences between groups at outcome extremes identified 1472 DMPs, the majority of which had small effect sizes below 5% (n = 1455). Of the 1472 DMPs, 55% mapped to genes enriched in CNS cellular components and pathways. Notably, hypermethylated genes were enriched in neuronal structures, including neuron spine, dendritic spine and neuronal projection. Recent studies show the specific vulnerability of excitatory projection neurons to chronic cortical inflammation, compared to inhibitory neurons [26]. Our findings lend support to a recent suggestion that excitatory cortical projection neurons could be a novel therapeutic target to combat MS-related neurodegeneration [27]. Hypomethylated genes in the severe group were enriched in axon guidance pathways. Research in both humans and mouse models of MS have shown that levels of NTN1, an axon guidance gene that reduces immune cell translocation into the CNS, are lower in MS compared to healthy comparators, particularly during relapse [28]. Further to this, the inhibition of repulsive guidance molecule-a, an axon guidance gene that prevents axon growth and immune regulation, has shown therapeutic efficacy in MS animal models [29]. In our study, we have shown hypomethylation, and subsequently potentially increased expression, of axon guidance genes in patients with severe disease. Although counterintuitive, hypomethylation of axon guidance pathways in those with severe MS may represent an adaptive mechanism aimed at countering MS-associated neurodegeneration, where neurodegenerative pathways regulated via the p75 neurotrophin receptor, and NRAGE-mediated cell death pathways were also identified as differentially methylated in our study. These findings add to a growing body of evidence implicating neuronal response to injury in disease severity. Jokubaitis et al. recently identified genetic variants associated with disease severity based on longitudinal ARMSS scores overrepresented in synaptic plasticity processes [6], while Kosa et al. identified synaptogenesis as a pathway differentially activated between patients with different levels of disability based on spinal cord damage [30].

We further showed that hypomethylated genes were enriched in the B cell receptor signalling pathway. Hypomethylation may indicate increased expression of B cell receptor signalling pathway genes, leading to more severe disease. There is a strong basis for a role of B cells in MS pathogenesis [31], further supported by the efficacy of monoclonal anti-CD20 antibodies as DMTs, by selectively depleting CD20+ B cells primarily via antibody-dependent cell-mediated cytotoxicity [32]. Given the high rates of response to B cell depleting therapies in patients with severe MS [32], it is unsurprising that this signalling pathway is implicated somewhat in disease severity. The enrichment of the B cell receptor signalling pathway in this study, paired with the efficacy of B cell depleting DMTs implicate a role for B cells in disease severity, additional to their putative role in pathogenesis [32]. Overall, our results support a dichotomy between mechanisms regulating MS risk and MS outcome, where we have overarchingly identified neurodegeneration-related pathways, rather than immune-related pathways, as dysregulated in severe MS. This is consistent with the current understanding of progressive MS, as demonstrated by the lack of effectiveness of most immunosuppressive treatments for secondary and primary progressive MS.

We identified two differentially methylated regions (DMRs) in whole-blood. DMRChr15 (Chr15:57,664,490–57,664,704) was hypermethylated in the severe group and located in an intergenic region. DMRChr11 (Chr11:94,886,261–94,886,708) contained two DMPs that were hypomethylated in the severe group. While cg10357314 is in an intergenic region, cg02877698 is in the gene body of uncharacterised long noncoding RNA (lncRNA) gene LOC101929295. lncRNAs are genes that are transcribed, but never translated (noncoding), as their primary role is in transcriptional, post-transcriptional and translational regulation of gene expression [33]. Numerous studies have described a role for lncRNAs in MS pathogenesis through the alteration of immune cell differentiation and activation [34]. Gupta et al. (2019) demonstrated the prognostic potential of lncRNAs by identifying eight upregulated lncRNAs in severe (n = 21) RRMS cases relative to mild (n = 43) RRMS cases. They also categorised disease severity using longitudinal ARMSS scores. Gupta et al. did not identify LOC101929295 as an upregulated lncRNA. It is possible that we did not validate the findings of Gupta et al. as our cohort was sex-matched and much larger, or due to post-transcriptional modifications. Nevertheless, when taken together these results demonstrate a potential role for epigenetically regulated lncRNA in disease severity.

Targeted mQTL analysis of major DMPs and DMRs identified genetic effects on methylation at these loci. However, these genetic effects did not attenuate the association between methylation and disease severity. It must be noted that the genetic architecture of the methylome is highly complex with many CpGs being influenced by both cis and trans acting mQTLs [35]. For this reason, we cannot rule out more complex genetic interactions with the major DMPs and DMRs identified in this study.

We identified cell-specific methylation differences between mild and severe patients by estimating cell-type proportions from whole-blood methylation data using statistical deconvolution. The majority of csDMPs were in CD8+ T cells (n = 48), a handful of which mapped to genes shown to be differentially expressed in animal models of neurodegenerative diseases (e.g. Dynactin Subunit 5[36], Isopentenyl-Diphosphate Delta Isomerase 1 [37], SLIT-ROBO Rho GTPase Activating Protein 1 [38] and Phosphodiesterase 4D [39]). Given the absence of DMRs in CD8+ T cells, and low number of csDMPs in other immune cell types, these results serve to guide future cell-type-targeted studies of disease severity, rather than demonstrating a mechanistic role for individual cell-type methylation in determining clinical outcomes.

Methylation age acceleration (MAA) is associated with increased all-cause morbidity and mortality. We identified slower MAA in mild patients, compared to severe, using the PhenoAge algorithm. We did not observe the same effect with the GrimAge algorithm which has been shown to be more robustly associated with clinical outcomes in comparison to PhenoAge, including walking speed, polypharmacy, frailty and mortality [25]. However, different results between the algorithms are not unexpected as they were designed using different clinical markers as ageing measures [25]. PhenoAge uses albumin, creatinine, serum glucose, c-reactive protein, lymphocyte per cent, mean cell volume, red cell distribution width, alkaline phosphatase and white blood cell count [22], while GrimAge uses adrenomedullin, beta-2-microglobulin, cystatin C, growth differentiation factor 15, leptin, plasminogen activation inhibitor 1, tissue inhibitor metalloproteinase 1 and smoking pack years [23]. It is possible that methylation patterns between mild and severe patients differ more at DNA methylation proxies of PhenoAge clinical variables, compared to the GrimAge clinical variables. Notably, the PhenoAge clinical variables lymphocyte per cent and white blood cell count are impacted by DMTs for MS. Therefore, PhenoAge may be more sensitive to methylation differences in people with MS than GrimAge. Alternatively, the disparity between PhenoAge and GrimAge acceleration in this paper may be driven by unaccounted confounder effects, including smoking history, BMI and SES [25]. We tested smoking history using an estimate of DNA methylation smoking pack years and showed no significant differences in smoking pack years between mild and severe groups. While differences in smoking history are not driving the significant differences in PhenoAge MAA between groups, we could not account for confounding by BMI and SES due to lack of data.

To understand the potential of whole-blood methylation as a prognostic biomarker, we compared the accuracy of three models in classifying binary disease severity. One model used only clinical data available at diagnosis (model 1), while two models used methylation data in the form of selected features from elastic net regression (model 2) or a weighted methylation risk score (wMRS) of DMPs (model 3). The importance of age at onset (AAO) in model 1 confirms current understanding that later AAO is associated with greater long-term disability [9]. For model 2, elastic net regression identified whole-blood methylation levels at 1708 CpGs to be associated with disease severity, of which five overlapped with the main differential methylation analysis at the CpG level and 100 CpGs overlapped at the gene level. These genes were enriched in neuronal components and processes, adding to a growing evidence that disease severity is driven by neuronal, rather than immune, mechanisms [6, 30]. The ability of the machine learning-trained elastic net regression model to identify 1702 additional DMPs to the main analysis demonstrate the presence of small, correlated and potentially cumulative effects of methylation on disease severity. While this requires a larger cohort to understand at a more granular level, it emphasises the value of accounting for correlated effects across the genome using methods such as penalised regression.

Model 2 classified disease severity the most accurately (AUC = 0.91). The improved accuracy of classifying disease severity using methylation levels at correlated CpGs (model 2), compared clinical data available at diagnosis (model 1), demonstrates the potential of whole-blood methylation as a clinical biomarker. At diagnosis, clinicians prognosticate based on AAO, presenting symptoms and MRI lesions. Our findings lend support to a growing body of evidence that genomic signatures may help clinicians prognosticate more accurately than clinical data alone [6]. We are confident that our models are not overfitted due to internal validation using training and testing sets with a conservative 70:30 split. Nevertheless, validation of the model in an independent mixed-sex cohort at diagnosis would confirm the robustness and generalisability of our multi-SNV signature as a true prognostic biomarker.

Ours is the largest study to date examining severity-associated differential methylation in a cohort of females with RMS. Further, ours is the only study reported to date that has utilised longitudinal severity data to ascribe severity status. Given the fluctuation of EDSS scores across disease course [40], the ability to robustly phenotype patients using longitudinal data, rather than cross-sectional data, reduces the probability of erroneous associations. Categorising disease severity by ARMSS score also allows comparison of cohorts a priori adjusted for age, a known factor that may confound methylation analyses. Our findings require validation in an independent cohort of females and, further, extension to males to confirm that differential methylation patterns between those with mild and severe RMS is consistent between sexes. The prognostic utility of correlated methylation patterns requires independent validation, although we did try to mitigate against overfitting using a 70:30 training:testing strategy. A range of environmental factors impact methylation, and we were underpowered to adjust for many of these, including BMI, SES and DMT at blood collection [41].


This is the first study to investigate the association between genome-wide methylation patterns and longitudinal disease severity in relapse-onset MS. We identified differences in whole-blood methylation patterns between 119 mild and 116 severe females with RMS, and demonstrate a dichotomy in signally pathways associated with MS severity (enriched in neuronal pathways and signalling) as opposed to past studies of risk. Using a machine learning strategy, we show that correlated methylation patterns classify disease severity more accurately than clinical data. Our findings support evidence that genomic signatures may help clinicians prognosticate more accurately than clinical data alone at diagnosis. Validation studies are needed to confirm our findings and assess the utility of differential methylation as a prognostic biomarker in relapse-onset MS.

Materials and methods

Clinical data collection

This study utilised clinical data from the MSBase Registry, an international, prospective, observational MS clinical outcomes register. Data are collected in a unified manner and include patient demographics, EDSS scores, relapse and treatment data, as previously described [42].

Participant recruitment, severity definitions and sample collection

Using MSBase Registry data, we assessed patients for study eligibility based on the following criteria: a diagnosis of relapse-onset MS, European ethnicity, female, Australian, minimum five years of clinical follow-up, minimum three relapse-independent EDSS scores recorded, and available genotype and whole-blood methylation data. We restricted participants to Australian females to reduce confounding by sex and geographical location as these are known to impact methylation [43].

Disease severity was determined using a longitudinal ARMSS [18] approach, as previously described [6]. Briefly, ARMSS scores were calculated for each relapse-independent Expanded Disability Status Scale (EDSS) score [19] available, and the median of these longitudinal ARMSS scores was calculated. Patients were categorised as mild or severe based on median longitudinal ARMSS scores. Mild patients were those with median longitudinal ARMSS scores at or below the cohort 20th percentile, while severe patients were those at or above the cohort 80th percentile. Our final cohort consisted of 119 mild patients and 116 age-matched severe patients (n = 235, Additional file 1: Fig. S1). Included study sites were Royal Melbourne Hospital (VIC, n = 81), Box Hill Hospital (VIC, n = 67), Flinders Medical Centre (SA, n = 51) and John Hunter Hospital (NSW, n = 36).

Methylation arrays

Whole-blood genomic (gDNA) was processed for methylation arrays at the Hunter Medical Research Institute. The Qbit (Invitrogen™, USA) and TapeStation (Agilent™, USA) were used to assess DNA quantity and quality, respectively. Samples were bisulfite converted using the EZ-DNA Methylation™ Kit (Zymo) kits and hybridised to Illumina MethylationEPIC BeadChip arrays (EPIC arrays). To avoid batch effects, samples were randomised by clinic site using the OSAT R package. We used an iScan (Illumina™) to read the EPIC arrays, which produced raw Idat files for analysis.

Genotyping arrays

Genotyping was performed at the Center for Genome Technology, John P. Hussman Institute for Human Genomics, University of Miami using Illumina Multiethnic genotyping arrays (MEGAEX). Genotype calling was performed using GenomeStudio v2.0 (Illumina).

DNA methylation analysis pipeline

We used the ChAMP Bioconductor package (version 2.22.0) [20] for methylation data preprocessing in the R statistical environment. We filtered raw Idat files to exclude low-quality samples (failed to successful probe ratio > 0.1), low-quality probes (detection p value > 0.01, bead count < 3 in ≥ 5% of samples), non-CpG probes, SNP-related probes, non-autosomal probes and multi-hit probes. We additionally excluded multi-hit probes based on Pidsley (2016) Additional file 2:Table S1 [44]. We normalised methylation beta values using the Beta-Mixture Quantile (BMIQ) method [45]. We identified batch effects at the array and chip level using singular value decomposition (SVD) analysis [46] and corrected these effects using the Combat algorithm [47].

Primary differential methylation analysis

We identified differential methylation between mild and severe groups at the single CpG level (i.e. differentially methylated positions (DMPs)) and at the genomic region level (i.e. DMRs). We implemented a linear model of methylation level at each probe and severity group, adjusted for NK cell proportions, using a filtered and normalised beta matrix as input. This was a modified version of the function champ.DMP from the ChAMP R package. We used a false discovery rate (FDR) threshold of 0.05 to assess statistical significance for all analyses. Methylation beta values equate to methylation percentage (i.e. Δmeth of 0.01 = 1%), therefore, we report methylation differences (Δmeth) in percentages below. To avoid false positives driven by technical error, we removed DMPs with an Δmeth < 1%.

We used two methods to identify DMRs. Firstly, the DMRcate R package (version 2.2.3) [48] with the following parameters: at least three DMPs within 1000 bp of the adjacent DMP, a DMP threshold of FDR < 0.05, an DMR threshold of FDR < 0.05. Secondly, DMRs were identified from the DMP list as ≥ 2 DMPs with (1) an FDR < 0.01, (2) the same direction of effect and (3) located within 1000 bp. Previous studies demonstrate the robustness of this strategy to identify DMRs in studies with small sample and/or effect sizes [12, 16].

Methylation patterns can be cell-type-specific. Therefore, we estimated immune cell-type proportions to confirm that differential methylation in whole-blood was not driven by cell-type proportion differences between groups. We estimated immune cell-type proportions with the EpiDISH R package (version 2.8.0) [49] and reference-based CIBERSORT algorithm [21], using methylation M-values as input. We then used a modified version the cellDMC function of EpiDISH, to identify csDMPs. Here, the outcome term of a linear model was methylation M-value, and the predictors were cell-type proportion estimate and an interaction term of cell-type proportion and severity. A genome-wide threshold of p ≤ 9 × 10–8 was used to assess statistical significance.

Sensitivity analyses

Methylation patterns are impacted by a range of clinical and environmental factors. Therefore, we performed sensitivity analyses to assess the potential impact of a series of demographic, clinical, biological and environmental covariates on methylation patterns, including: age at blood collection, symptom duration, annualised relapse rate, cell-type proportion estimates (B cells, CD4+ cells, CD8+ cells, NK cells, neutrophils and eosinophils) and methylation age acceleration. Additional categorical environmental factors were assessed, including treatment at blood collection (yes or no, specific treatment at blood collection available in Additional file 2: Table S20), smoking status at blood collection (ever or never) and parity at blood collection (nulliparous or parous). An FDR threshold of 0.05 was used to assess statistical significance.

Mild and severe patients were matched by age at blood collection (n = 214, 107 pairs), and the difference in methylation at each probe (Δmeth) was calculated. The correlation between each Δmeth and covariate was tested. Pearson’s correlation was used for continuous covariates and ANOVA for categorical covariates. For categorical variables (treatment, smoking history, parity), mild–severe pairs were required to have the same value for the correlation with methylation to be tested. Of 107 pairs in total, 17 pairs were on treatment at blood collection and 12 were off treatment, 25 pairs were nulliparous at blood collection and 17 were parous, and six pairs were ‘ever’ smokers at blood collection and four were ‘never’ smokers (Additional file 2: Table S21). Due to the known effect of smoking on the methylome and limited smoking data available for this cohort, DMPs were filtered for 2,622 known smoking-associated CpGs identified by Johanes and colleagues (2016) [50].

Single-nucleotide variant analysis

We performed quality control with PLINKv1.9 [51]. Single-nucleotide variants (SNVs) were excluded based on low call rate (< 95%), low minor allele frequency (MAF < 0.05), violation of Hardy–Weinberg equilibrium (p < 1 × 10–5), monomorphism and non-autosomal location. Samples were excluded based on sex inconsistencies, low call rate (< 95%) and relatedness (pi-hat > 0.05). We assessed relatedness using Identity by Descent (IBD) analysis in PLINKv1.9, followed by confirmation in KING [52]. Principal components (PC) analysis was implemented in EIGENSTRAT [53]. We projected PCs to 1000 Genomes Project [54] data to assess population stratification effects and exclude population outliers. We imputed genotypes using the Haplotype Reference Consortium [55] on the Michigan Imputation Server (!) and converted imputed genotypes to genotype calls in PLINKv1.9.

Targeted methylation quantitative trait loci (mQTL) analysis

To account for genetic effects on methylation levels, we conducted a targeted mQTL analysis by testing the relationship between genotype and methylation levels within a) a ± 5 kb window of each major DMP (i.e. Δmeth > 5%) and b) each DMR.

We used the KRIS R package (version 1.1.6)[56] to extract SNVs located within a ± 5 kb window of each major DMP and within the boundaries of each DMR. For SNVs on the same chromosome, we assessed linkage disequilibrium (LD) using bivariate correlations of genotype frequencies using a statistical significance threshold of p < 0.05.

We then assessed the relationship between methylation level (beta values) and genotype using Kruskal–Wallis tests due to the beta binomial (non-normal) distribution of methylation beta values. Lastly, we used binary logistic regression with severity group as the outcome and methylation and genotype as the predictors, to test whether differential methylation at major DMPs/DMRs were regulated by genetic effects rather than disease severity.

Gene set enrichment analysis (GSEA)

We performed gene set enrichment analysis (GSEA) with the the ToppGene online application programming interface (API) to explore biological processes and pathways enriched by differentially methylated genes. We used an FDR-ranked genes as input and analysed hypomethylated and hypermethylated genes both together and separately. A Benjamini–Hochberg adjusted p value threshold ≤ 0.05 was used to assess the statistical significance.

Methylation age acceleration analysis

Methylation age is the estimation of biological age from methylation levels at a subset of CpGs associated with age, known as clock CpGs. Multiple algorithms have been developed to perform this estimation, the most widely used algorithms being PhenoAge [22] and GrimAge [23]. Methylation age acceleration (MAA) is defined as the discrepancy between chronological and biological age, whereby an acceleration of biological age is associated with increased risk of various morbidities and mortality, as well as shorter lifespan [22,23,24].

We estimated the PhenoAge [22] of our samples with the methyAge function of the ENmix R package (version 2.8.0) [57]. GrimAge was calculated with the online calculator at MAA was defined as the residual term from regressing chronological age on methylation age estimates. We used Shapiro–Wilk normality tests to assess the normality of each MAA distribution and t-tests to assess mean difference in MAA between mild and severe groups.

To test smoking history as a potential confounder of PhenoAge, we used DNA methylation smoking pack years (DNAmPACKYRS) as a measure of smoking history, calculated using the GrimAge online tool: Shapiro–Wilk normality tests were used to assess the normality of the DNAmPACKYRS distribution in each group, and a Wilcoxon signed rank test was used to assess mean difference in DNAmPACKYRS between mild and severe groups.

Multifactor feature selection and classification modelling

For each model, we split samples into training (n = 164) and testing sets (n = 71) to limit overfitting. For models 1 and 2, we conducted multifactor feature selection in the training set using a cross-validation resampling method with 10 iterations via the train function of the caret R package (version 6.0–93) [58]. We then conducted a k-fold cross-validation elastic net regression using the optimal alpha value to identify the minimum lambda value with the cv.glmnet function of the glmnet R package (version 4.1–4) [59]. These alpha and lambda values were used in the final elastic net regression models that were applied to the testing set using the glmnet function [59]. We compared the CpGs selected in model 2 with DMPs from the main differential methylation analysis at the CpG and gene level. GSEA of the overlapping genes were performed using ToppGene.

For model 3, we calculated a weighted methylation risk score (wMRS) for each sample:

$$ {\text{wMRS}}_{{{\text{sample}}}} = \sum \beta z - {\text{score}}_{{{\text{Sample}} {\text{at}} {\text{DMPi}}}} \times \Delta \beta_{{{\text{DMPi}}}} $$

The wMRS for each sample was subsequently used in a general linear model as the predictor, with sample group as the outcome.

We tested the ability of each model to classify test samples as mild or severe using base R function, predict(). The performance function of the ROCR R package (version 1.0–11) [60] was used to test the performance of the classification model with the AUC measure.

Availability of data and materials

Supplemental files contain data supporting the conclusions in this article. For access to deidentified raw methylation data and code please contact Dr Vilija Jokubaitis ( Data sharing requests will be reviewed based on a research proposal. Sharing de-identified patient-level clinical data from MSBase Registry may be possible in principle, but will require permission/consent from each contributing data controller to protect participant confidentiality.



Age-related multiple sclerosis severity


Chip analysis methylation pipeline




Cell-specific differentially methylated position


Differentially methylated position


Differentially methylated region


Disease-modifying therapy


Expanded disability status scale


Epigenome-wide association study


False discovery rate


Gene set enrichment analysis


Genome-wide association study


c-Jun N-terminal kinase


Long noncoding RNA


Methylation age acceleration


Methylation quantitative trait loci


Multiple sclerosis


Nerve growth factor


Natural Killer




Neurotrophin receptor


Primary progressive multiple sclerosis


Relapse-onset multiple sclerosis


Relapsing–remitting multiple sclerosis


Weighted methylation risk score


Splicing factor 1


Single-nucleotide polymorphism


Secondary progressive multiple sclerosis


  1. Confavreux C, Vukusic S, Moreau T, Adeleine P. Relapses and progression of disability in multiple sclerosis. N Engl J Med. 2000;343(20):1430–8.

    Article  CAS  Google Scholar 

  2. International Multiple Sclerosis Genetics Consortium. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science. 2019 27;365(6460).

  3. Søndergaard HB, Petersen ER, Magyari M, Sellebjerg F, Oturai AB. Genetic burden of MS risk variants distinguish patients from healthy individuals but are not associated with disease activity. Mult Scler Relat Disord. 2017;13:25–7.

    Article  Google Scholar 

  4. Jokubaitis VG, Butzkueven H. A genetic basis for multiple sclerosis severity: Red herring or real? Mol Cell Probes. 2016;30(6):357–65.

    Article  CAS  Google Scholar 

  5. Giacalone G, Clarelli F, Osiceanu A, Guaschino C, Brambilla P, Sorosina M, et al. Analysis of genes, pathways and networks involved in disease severity and age at onset in primary-progressive multiple sclerosis. Mult Scler. 2015;21(11):1431–42.

    Article  CAS  Google Scholar 

  6. Jokubaitis VG, Campagna MP, Ibrahim O, Stankovich J, Kleinova P, Matesanz F, et al. Not all roads lead to the immune system: the genetic basis of multiple sclerosis severity. Brain. 2022.

    Article  Google Scholar 

  7. Ramanujam R, Hedström AK, Manouchehrinia A, Alfredsson L, Olsson T, Bottai M, et al. Effect of smoking cessation on multiple sclerosis prognosis. JAMA Neurol. 2015;72(10):1117–23.

    Article  Google Scholar 

  8. Ascherio A, Munger KL, White R, Köchert K, Simon KC, Polman CH, et al. Vitamin D as an early predictor of multiple sclerosis activity and progression. JAMA Neurol. 2014;71(3):306–14.

    Article  Google Scholar 

  9. Jokubaitis VG, Spelman T, Kalincik T, Lorscheider J, Havrdova E, Horakova D, et al. Predictors of long-term disability accrual in relapse-onset multiple sclerosis. Ann Neurol. 2016;80(1):89–100.

    Article  Google Scholar 

  10. Rhead B, Brorson IS, Berge T, Adams C, Quach H, Moen SM, et al. Increased DNA methylation of SLFN12 in CD4+ and CD8+ T cells from multiple sclerosis patients. PLoS ONE. 2018;13(10):e0206511.

    Article  Google Scholar 

  11. Kular L, Liu Y, Ruhrmann S, Zheleznyakova G, Marabita F, Gomez-Cabrero D, et al. DNA methylation as a mediator of HLA-DRB1*15:01 and a protective variant in multiple sclerosis. Nat Commun [Internet]. 2018 Jun 19 [cited 2020 Apr 26];9. Available from:

  12. Maltby VE, Lea RA, Sanders KA, White N, Benton MC, Scott RJ, et al. Differential methylation at MHC in CD4+ T cells is associated with multiple sclerosis independently of HLA-DRB1. Clin Epigenet. 2017;9(1):71.

    Article  Google Scholar 

  13. Graves M, Benton M, Lea R, Boyle M, Tajouri L, Macartney-Coxson D, et al. Methylation differences at the HLA-DRB1 locus in CD4+ T-Cells are associated with multiple sclerosis. Mult Scler. 2014;20(8):1033–41.

    Article  CAS  Google Scholar 

  14. Kulakova OG, Kabilov MR, Danilova LV, et al. Whole-genome DNA methylation analysis of peripheral blood mononuclear cells in multiple sclerosis patients with different disease courses. Acta Nat. 2016;8(3):103–10.

    Article  CAS  Google Scholar 

  15. Ruhrmann S, Ewing E, Piket E, Kular L, Cetrulo Lorenzi JC, Fernandes SJ, et al. Hypermethylation of MIR21 in CD4+ T cells from patients with relapsing-remitting multiple sclerosis associates with lower miRNA-21 levels and concomitant up-regulation of its target genes. Mult Scler. 2018;24(10):1288–300.

    Article  CAS  Google Scholar 

  16. Maltby VE, Lea RA, Burnard S, Xavier A, Van Cao T, White N, et al. Epigenetic differences at the HTR2A locus in progressive multiple sclerosis patients. Sci Rep. 2020;10(1):22217.

    Article  CAS  Google Scholar 

  17. Ewing E, Kular L, Fernandes SJ, Karathanasis N, Lagani V, Ruhrmann S, et al. Combining evidence from four immune cell types identifies DNA methylation patterns that implicate functionally distinct pathways during Multiple Sclerosis progression. EBioMedicine. 2019;30(43):411–23.

    Article  Google Scholar 

  18. Manouchehrinia A, Westerlind H, Kingwell E, Zhu F, Carruthers R, Ramanujam R, et al. Age related multiple sclerosis severity score: disability ranked by age. Mult Scler. 2017;23(14):1938–46.

    Article  Google Scholar 

  19. Kurtzke JF. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology. 1983;33(11):1444–52.

    Article  CAS  Google Scholar 

  20. Tian Y, Morris TJ, Webster AP, Yang Z, Beck S, Feber A, et al. ChAMP: updated methylation analysis pipeline for Illumina BeadChips. Bioinformatics. 2017;33(24):3982–4.

    Article  CAS  Google Scholar 

  21. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453–7.

    Article  CAS  Google Scholar 

  22. Levine ME, Lu AT, Quach A, Chen BH, Assimes TL, Bandinelli S, et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY). 2018;10(4):573–91.

    Article  Google Scholar 

  23. Lu AT, Quach A, Wilson JG, Reiner AP, Aviv A, Raj K, et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging (Albany NY). 2019;11(2):303–27.

    Article  CAS  Google Scholar 

  24. Gibson J, Russ TC, Clarke TK, Howard DM, Hillary RF, Evans KL, et al. A meta-analysis of genome-wide association studies of epigenetic age acceleration. PLoS Genet. 2019;15(11): e1008104.

    Article  Google Scholar 

  25. McCrory C, Fiorito G, Hernandez B, Polidoro S, O’Halloran AM, Hever A, et al. GrimAge outperforms other epigenetic clocks in the prediction of age-related clinical phenotypes and all-cause mortality. J Gerontol A Biol Sci Med Sci. 2020.

    Article  Google Scholar 

  26. Schirmer L, Velmeshev D, Holmqvist S, Kaufmann M, Werneburg S, Jung D, et al. Neuronal vulnerability and multilineage diversity in multiple sclerosis. Nature. 2019;573(7772):75–82.

    Article  CAS  Google Scholar 

  27. Beutel T, Dzimiera J, Kapell H, Engelhardt M, Gass A, Schirmer L. Cortical projection neurons as a therapeutic target in multiple sclerosis. Expert Opin Ther Targets. 2020;24(12):1211–24.

    Article  CAS  Google Scholar 

  28. Mulero P, Córdova C, Hernández M, Martín R, Gutiérrez B, Muñoz JC, et al. Netrin-1 and multiple sclerosis: A new biomarker for neuroinflammation? Eur J Neurol. 2017;24(9):1108–15.

    Article  CAS  Google Scholar 

  29. Tanabe S, Fujita Y, Ikuma K, Yamashita T. Inhibiting repulsive guidance molecule-a suppresses secondary progression in mouse models of multiple sclerosis. Cell Death Dis. 2018;9(11):1061.

    Article  Google Scholar 

  30. Kosa P, Lumbard K, Wang J, Liang CJ, Masvekar R, Kim Y, et al. Molecular mechanisms associated with multiple sclerosis progression, severity and phenotype. medRxiv. 2022.

    Article  Google Scholar 

  31. Zhong M, van der Walt A, Campagna MP, Stankovich J, Butzkueven H, Jokubaitis V. The Pharmacogenetics of Rituximab: potential Implications for Anti-CD20 therapies in multiple sclerosis. Neurotherapeutics. 2020.

    Article  Google Scholar 

  32. Margoni M, Preziosa P, Filippi M, Rocca MA. Anti-CD20 therapies for multiple sclerosis: current status and future perspectives. J Neurol. 2021;11:1–19.

    Google Scholar 

  33. Long Y, Wang X, Youmans DT, Cech TR. How do lncRNAs regulate transcription? Sci Adv. 2017;3(9):eaao2110.

    Article  Google Scholar 

  34. Yang X, Wu Y, Zhang B, Ni B. Noncoding RNAs in multiple sclerosis. Clin Epigenetics. 2018;10(1):149.

    Article  CAS  Google Scholar 

  35. Gaunt TR, Shihab HA, Hemani G, Min JL, Woodward G, Lyttleton O, et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 2016;17(1):61.

    Article  Google Scholar 

  36. Gispert S, Brehm N, Weil J, Seidel K, Rüb U, Kern B, et al. Potentiation of neurotoxicity in double-mutant mice with Pink1 ablation and A53T-SNCA overexpression. Hum Mol Genet. 2015;24(4):1061–76.

    Article  CAS  Google Scholar 

  37. Malik B, Fernandes C, Killick R, Wroe R, Usardi A, Williamson R, et al. Oligomeric amyloid-β peptide affects the expression of genes involved in steroid and lipid metabolism in primary neurons. Neurochem Int. 2012;61(3):321–33.

    Article  CAS  Google Scholar 

  38. Kanaan NM, Collier TJ, Cole-Strauss A, Grabinski T, Mattingly ZR, Winn ME, et al. The longitudinal transcriptomic response of the substantia nigra to intrastriatal 6-hydroxydopamine reveals significant upregulation of regeneration-associated genes. PLoS ONE. 2015;10(5): e0127768.

    Article  Google Scholar 

  39. Sunke R, Bankala R, Thirupataiah B, Ramarao EVVS, Kumar JS, Doss HM, et al. InCl3 mediated heteroarylation of indoles and their derivatization via CH activation strategy: discovery of 2-(1H-indol-3-yl)-quinoxaline derivatives as a new class of PDE4B selective inhibitors for arthritis and/or multiple sclerosis. Eur J Med Chem. 2019;15(174):198–215.

    Article  Google Scholar 

  40. Roxburgh RHSR, Seaman SR, Masterman T, Hensiek AE, Sawcer SJ, Vukusic S, et al. Multiple sclerosis severity score: using disability and disease duration to rate disease severity. Neurology. 2005;64(7):1144–51.

    Article  CAS  Google Scholar 

  41. Maltby VE, Lea RA, Ribbons KA, Sanders KA, Kennedy D, Min M, et al. DNA methylation changes in CD4+ T cells isolated from multiple sclerosis patients on dimethyl fumarate. Mult Scler J Exp Transl Clin [Internet]. 2018 Jul 17 [cited 2020 Apr 26];4(3). Available from:

  42. Butzkueven H, Chapman J, Cristiano E, Grand’Maison F, Hoffmann M, Izquierdo G, et al. MSBase: an international, online registry and platform for collaborative outcomes research in multiple sclerosis. Mult Scler. 2016;12(6):769–74.

    Article  Google Scholar 

  43. Feil R, Fraga MF. Epigenetics and the environment: emerging patterns and implications. Nat Rev Genet. 2012;13(2):97–109.

    Article  CAS  Google Scholar 

  44. Pidsley R, Zotenko E, Peters TJ, Lawrence MG, Risbridger GP, Molloy P, et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 2016;17(1):208.

    Article  Google Scholar 

  45. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics. 2013;29(2):189–96.

    Article  CAS  Google Scholar 

  46. Teschendorff AE, Menon U, Gentry-Maharaj A, Ramus SJ, Gayther SA, Apostolidou S, et al. An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS One [Internet]. 2009 Dec 18 [cited 2020 Aug 25];4(12). Available from:

  47. Müller C, Schillert A, Röthemeier C, Trégouët DA, Proust C, Binder H, et al. Removing batch effects from longitudinal gene expression-quantile normalization plus ComBat as best approach for microarray transcriptome data. PLoS One [Internet]. 2016 Jun 7 [cited 2020 Oct 21];11(6). Available from:

  48. Peters TJ, Buckley MJ, Statham AL, Pidsley R, Samaras KV, Lord R, et al. De novo identification of differentially methylated regions in the human genome. Epigenet Chromatin. 2015;8:6.

    Article  Google Scholar 

  49. Teschendorff A. Epigenetic dissection of intra-sample-heterogeneity [Internet]. 2017. Available from:

  50. Joehanes R, Just AC, Marioni RE, Pilling LC, Reynolds LM, Mandaviya PR, et al. Epigenetic signatures of cigarette smoking. Circ Cardiovasc Genet. 2016;9(5):436–47.

    Article  CAS  Google Scholar 

  51. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.

    Article  CAS  Google Scholar 

  52. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867–73.

    Article  CAS  Google Scholar 

  53. Patterson N, Price AL, Reich D. Population structure and Eigenanalysis. PLOS Genet. 2006;2(12): e190.

    Article  Google Scholar 

  54. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74.

    Article  Google Scholar 

  55. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48(10):1279–83.

    Article  CAS  Google Scholar 

  56. Chaichoompu K, Abegaz F, Sissades T, James Shaw P, Sakuntabhai A, Pereira L, et al. KRIS: keen and reliable interface subroutines for bioinformatic analysis. 2018.

  57. Xu Z, Niu L, Li L, Taylor JA. ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Res. 2016;44(3): e20.

    Article  Google Scholar 

  58. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28(1):1–26.

    Google Scholar 

  59. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1–22.

    Article  Google Scholar 

  60. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005;21(20):3940–1.

    Article  CAS  Google Scholar 

Download references


We thank all the people with MS who participated in this research without whom this work would not be possible. The authors would like to acknowledge Professor Trevor Kilpatrick from the Royal Melbourne Hospital, Associate Professor Mark Slee from Flinders Medical Centre, and research nurses Ms Jo Baker, Ms Jodi Haartsen, Ms Sandra Williams and Ms Lisa Taylor for assisting with blood collection for this study. We also acknowledge Dr Louise Laverick and Ms Malgorzata Krupa for research assistance.


This study was financially supported by an MSRA Project Grant (18–0424), RMH Home Lottery Grant (MH2013-055), MSBase Foundation Project Grant, Charity Works for MS Project Grant, Pennycook Foundation Grant 2018 and a Monash University Project Grant.

Author information

Authors and Affiliations



MPC conducted statistical analyses, drafted and revised the manuscript and aided in study design and data interpretation. AX conducted statistical analyses, aided in study design and data interpretation, and revised the manuscript. VEM aided in sample collection, data interpretation and revised the manuscript. MS and TK aided in sample collection and revised the manuscript. HB and JLS aided in sample collection, study design, data interpretation and revised the manuscript. RJS acquired funding, aided in data interpretation and revised the manuscript. RAL and VGJ designed and conceptualised the study, acquired study funding, aided in sample collection and data interpretation, and drafted and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Maria Pia Campagna.

Ethics declarations

Ethics approval and consent to participate

Ethics approval for the collection of demographic, clinical and treatment data via the MSBase Registry was obtained from the Alfred Health Human Research Ethics Committee (528/12) and from institutional review boards at all participating centres. The Australian National Mutual Acceptance Scheme approved the genetic component of this study (HREC/13/MH/189). Written informed consent was obtained from participants at each site as per local laws.

Consent for publication

Participants signed informed consent regarding publishing their data.

Competing interests

The authors report no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

. Supplementary figures.

Additional file 2.

. Supplementary tables.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Campagna, M.P., Xavier, A., Lea, R.A. et al. Whole-blood methylation signatures are associated with and accurately classify multiple sclerosis disease severity. Clin Epigenet 14, 194 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • DNA methylation
  • Epigenetics
  • Prognostics
  • Machine learning
  • Epigenome-wide association study
  • Genomics