Blood DNA methylation and COVID-19 outcomes

Background There are no prior reports that compare differentially methylated regions of DNA in blood samples from COVID-19 patients to samples collected before the SARS-CoV-2 pandemic using a shared epigenotyping platform. We performed a genome-wide analysis of circulating blood DNA CpG methylation using the Infinium Human MethylationEPIC BeadChip on 124 blood samples from hospitalized COVID-19-positive and COVID-19-negative patients and compared these data with previously reported data from 39 healthy individuals collected before the pandemic. Prospective outcome measures such as COVID-19-GRAM risk-score and mortality were combined with methylation data. Results Global mean methylation levels did not differ between COVID-19 patients and healthy pre-pandemic controls. About 75% of acute illness-associated differentially methylated regions were located near gene promoter regions and were hypo-methylated in comparison with healthy pre-pandemic controls. Gene ontology analyses revealed terms associated with the immune response to viral infections and leukocyte activation; and disease ontology analyses revealed a predominance of autoimmune disorders. Among COVID-19-positive patients, worse outcomes were associated with a prevailing hyper-methylated status. Recursive feature elimination identified 77 differentially methylated positions predictive of COVID-19 severity measured by the GRAM-risk score. Conclusion Our data contribute to the awareness that DNA methylation may influence the expression of genes that regulate COVID-19 progression and represent a targetable process in that setting. Supplementary Information The online version contains supplementary material available at 10.1186/s13148-021-01102-9.


Introduction
More than 3 million deaths worldwide have been attributed to COVID-19, primarily arising from acute respiratory distress syndrome (ARDS). The clinical course of SARS-CoV-2 infection is highly variable, ranging from an asymptomatic state to life-threatening infection [1][2][3][4]. Recent evidence indicates that disease severity predominantly depends on host factors [5][6][7][8][9][10], supporting the need to better resolve individual responses at the molecular level. We and others have recently described the multi-omic profile of COVID-19 patients in association with disease severity [10][11][12]. Analysis of mRNA sequencing from circulating leukocytes identified multiple expressed genes associated with worse outcomes [10,11,13].
Because almost every cell in an individual shares identical genomic sequence, distinct cellular phenotypes are established and maintained by epigenetic mechanisms [14,15], including DNA, histone and chromatin modifications and non-coding RNA expression that affect the chromatin landscape [16]. Although DNA cytosine methylation at 5′-C-phosphate-G-3′ (CpG) sites is pervasive in the genome and is considered relatively stable [15][16][17][18], it is highly sensitive to age and environmental factors [16,[19][20][21][22][23]. However, CpG rich regions (CpG islands) in gene promoter regions of actively transcribed genes are typically hypomethylated and selective hypermethylation of key cytosine residues may repress gene transcription by modifying transcription factor accessibility [16]. Moreover, methylation of cytosine residues outside the context of CpG sites may also affect gene transcription [16]. Critically ill patients exhibit altered circulating blood DNA methylation profiles [24,25]. Epigenetic marks affect gene expression profiles and increase individual vulnerability to viral infections [26]. For example, modulators of host-pathogen interactions including interferons are epigenetically regulated [27,28], and DNA methylation has been shown to underpin antigen-presentation following MERS-CoV infection [27][28][29]. To date, no methylomes of samples from COVID-19 patients have been compared to pre-pandemic sample sources, or to samples from patients with non-COVID-19 respiratory illness using a shared epigenotyping platform and facility. Moreover, it is currently unknown whether patients with worse outcomes and distinct transcriptomes [11,30,31] may be further distinguished by patterns of differential methylation. These data carry strong potential to illuminate mechanisms underlying COVID-19-associated gene expression and outcomes [32,33], and may facilitate the identification of sub-phenotypes likely to benefit from specific interventions [34][35][36]. For example, immunemodulating drugs such as corticosteroids, that are beneficial in COVID-19 patients [7,[37][38][39], interact with gene expression-response elements throughout the genome. Resolution of the differential methylome in COVID-19 patients offers potential insights into COVID-19 pathogenesis, susceptibility, diagnosis and prognosis.
Accordingly, we conducted a prospective cohort study involving 124 consecutive patients with and without COVID-19 diagnosis who were admitted to Albany Medical Center in Albany, New York. Thirty-nine healthy patient samples collected before the COVID-19 pandemic characterized with an identical epigenotyping platform provided reference methylomes (Fig. 1). We hypothesized that: (1) DNA methylation regions would Fig. 1 Diagram of the entire cohort involved in study: Notice that while the hospitalized patients' cohort contributed 128 patients, only 124 were part of the analyses due to inadequate quality of 4 samples; see diagram and details in the text differ in patients with COVID-19 diagnosis in comparison with pre-pandemic healthy control individuals; (2) DNA methylation regions would differ in patients with COVID-19 diagnosis in comparison with patients with respiratory illness of similar magnitude not caused by COVID-19; and 3) COVID-19 severity reflected by clinically validated outcome measures [40], would be associated with distinct patterns of DNA methylation in blood.

Sample cohort and experimental design
From 6 April 2020 through 1 May 2020, we collected blood samples from 128 adult patients admitted to the Albany Medical Center in Albany, NY for moderate to severe respiratory failure presumably related to infection with SARS-CoV-2 ( Fig. 1). In addition to acquisition of various clinical data (Table 1), a 10 ml blood sample was obtained at the time of enrollment. Patients who later tested positive (N = 102) and negative (N = 26) for SARS-CoV-2 infection were assigned to the COVID-19 and non-COVID-19 groups, respectively (see "Methods" for enrollment details). Females comprised 37.3% and 50.0% of the COVID-19 and non-COVID-19 patients, respectively. The average age of patients was similar: 60.5 (50.5-74.8) and 62  years in the COVID-19 group (females and males, respectively; p value = 0.28) compared the non-COVID-19 patients: 59.5  and 68.2 (63-82) years, (females and males respectively; p value = 0.09). The average number of days hospitalized before study enrollment was 3 and 1 for the COVID-19 and non-COVID-19 patients, respectively (Table 1). To identify DNA CpG methylation changes associated with COVID-19, we compared DNA methylation data from COVID-19 patients (N = 102) to DNA methylation data from a previously published study [21] that profiled DNA methylation from whole blood of healthy participants (N = 39) that was collected at least 3 years before the COVID-19 outbreak. An identical epigenotyping platform and facility (Genuity Science, Inc. Boston, MA) was used to obtain the methylation data. To test whether COVID-19 severity correlates with patterns of differential DNA methylation in blood, we used the COVID-19 specific GRAM risk score [40] and patient mortality. Other clinical data included: Acute Physiologic Assessment and Chronic Health Evaluation (APACHE II) score, Sequential Organ Failure Assessment (SOFA) score [41], SAPS II score, Charlson Comorbidity Index score [42], mechanical ventilation physiological parameters, need for admission to intensive care, and C-reactive protein (CRP), D-dimer, ferritin, lactate, procalcitonin, fibrinogen, and other levels ( Table 1). APACHE II, SOFA, and SAPS II severity scores assigned to patients in intensive care, exhibited similar distributions between the groups (Table 1). In keeping with previous reports, males predominated in the group requiring intensive care (66 vs. 33%) and mechanical ventilation (46.9 vs. 34.2%, Additional file 1: see clinical data Table S1).

DNA methylation in blood is altered in COVID-19 patients
Average DNA methylation abundance across the entire genome did not significantly differ between COVID-19 patients (58.8%) and healthy pre-pandemic controls (58.7%), indicating that no global changes in methylation abundance are related to COVID-19 ( Fig. 2A). To investigate locus-specific DNA methylation levels linked to COVID-19, methylome data were subjected to a linear regression model that accounted for batch effects, sex, and leukocyte proportions for downstream analyses [43]. This approach detected 1505 differentially methylated regions (DMRs) distributed across the entire genome comprising clusters of ≥ 5 CpGs (FDR p value < 0.05; Fig. 2B; Additional file 1: Table S2-1). A total of 416 hyper-methylated and 1089 hypo-methylated DMRs were distinguished, indicating that a majority of differentially methylated regions are hypo-methylated, as noted in a recent report of 7 COVID-19 positive patients [10]. A majority of DMRs (~ 75%) reside within or near gene promoter regions, denoting a potential role in gene regulation [16] (Fig. 2C). The 1,505 DMRs were annotated to 1,680 unique genes, indicating that several DMRs spanned two contiguous genes that harbor alterations in DNA methylation in the presence of SARS-CoV-2 infection. To test the relationships between the DMR-associated genes, we conducted a gene ontological analysis and found significant enrichments of immune-related terms, including immune responsivity, leukocyte activation, and defense responses, together with a diversity of recognized immune function genes (cytokines/chemokines and receptors (including IL-10, IL-1β, CXCR2/5/6), interferon-stimulated genes (IFIT3, ISG20), and signal transduction genes (TRAF2, ZAP70), (FDR p value < 0.05; Fig. 2D; Additional file 1: Table S2-2). A disease ontological analysis of methylation regions that differ between COVID-19 patients and healthy pre-pandemic controls indicated significant associations of DMR-associated genes with autoimmune diseases, including systemic lupus erythematosus and rheumatoid arthritis (FDR p value < 0.05; Fig. 2E; Additional file 1: Table S2-1). We observed no difference in the gap between chronologic age and "epigenetic clock" age between COVID-19 patients and healthy pre-pandemic controls, suggesting that there is no difference between the two groups in predisposition and resilience to an acute infection known to have enhanced severity in the elderly [44,45] (Additional file 2: Figure 1). These findings indicate that differential

DNA methylation in blood is specific to SARS-CoV-2 infection
To identify DNA methylation profiles that distinguish concurrently enrolled respiratory patients with and without COVID-19, we analyzed data of 128 patients, with (N = 102) and without (N = 26) COVID-19 diagnosis collected concurrently at Albany Medical Center ( Fig. 1, Table 1). Four samples (two COVID-19 and two non-COVID-19 patients) were removed due to unreliable methylation values ( Fig. 1) and 95,447 probes were removed leaving 770,412 for further analysis. Average Peptic ulcer disease   Fig. 2D; Additional file 1: Table S3-3).
These findings indicate that COVID-19 patients demonstrate an altered blood methylome compared to that of patients with respiratory illness arising from other causes, and that differences in DNA methylation occur at genes specific to COVID-19. These results indicate that the observed DMRs occur in genes that participate in the pathogenesis of inflammatory and leukocyte disorders

COVID-19 DNA methylation in blood and interferon-stimulated gene (ISG) expression
To narrow our focus on COVID-19 specific DMRs, we identified common DMRs from COVID-19 patients vs. healthy pre-pandemic control individuals, and DMRs from COVID-19 patients vs. patients with non-COVID-19 respiratory illness. Forty-seven DMRs are shared between the two datasets ( Fig. 4A; Additional file 1: Table S4-1). Twenty-five of the 47 DMRs are closely linked to B lymphocyte, T lymphocyte, macrophage, and neutrophil functions, including antiviral activity, cytokine production, inflammation, and innate and adaptive immunity (Additional file 1: Table S4-2). Gene ontology and pathway enrichment analysis revealed significant enrichment in terms related to host defense responses including interferon alpha and beta signaling, defense response to organisms, and activation of the immune system (Fig. 4B). DMRs were hypo-methylated in promoter regions and contiguous sites in 2 prototypical interferonstimulated genes, IFI27 and OAS2, (Fig. 4C, D), suggesting possible regulatory effects on gene expression. Both previously published RNA sequencing analysis of the same samples [11], and RT-qPCR experiments done for this project confirm that transcriptional products of IFI27 and OAS2 are upregulated in COVID-19 samples in comparison with non-COVID-19 control patients (Fig. 5A, B).
To gain insight into the effects of DMRs on gene expression, we compared DMRs between COVID-19 patients and patients with non-COVID-19 respiratory illness, with differentially expressed genes (DEGs) identified in our RNAseq analysis of circulating leukocytes from the same patients [11]. We identified 36 genes that were both differentially methylated and differentially expressed in COVID-19 patients. This gene set was highly enriched in the gene ontology term: defense response to virus (27/36 genes) and a Reactome gene set: interferon signaling (19/36 genes) (Additional file 1: Tables S5-1 and S5-2). Eight in the interferon pathway were upregulated in parallel with the presence of DMRs in their genes. All identified DMRs were hypo-methylated with at least 5 consecutive CpGs near promoter regions (Additional file 1: Table S5-1).

DNA methylation in blood and COVID-19 severity
The GRAM score is a validated outcome measure that defines the risk of deterioration in COVID-19 patients [40]. We obtained GRAM scores and mortality outcomes These results indicate that 254 DNA differentially methylated regions distinguish SARS-Cov-2 infection from non-COVID-19 respiratory illness. C Bar Graph of the top ten disease ontological (DO) biological processes related to the SARS-CoV-2-associted differentially methylated genes, ordered by statistical significance. The X-axis indicates the number of SARS-CoV-2 DMR-associated genes that contribute to each DO term. Bar color indicates the FDR p value using a Fischer test. These results indicate that the observed DMRs occur in genes that participate in inflammatory and host-defense processes. D Bar Graph of the top ten gene ontological (GO) processes related to the SARS-CoV-2-associated differentially methylated genes, ordered by statistical significance. The X-axis indicates the number of SARS-CoV-2 DMR-associated genes that contribute to each GO term. BAR color indicates the FDR P-value by using a Fischer test. These results indicate that the observed DMRs occur in genes that participate in the pathology of influenza, other viral infections and inflammatory disorders in our cohort, which allowed comparison of different disease burdens with DMRs in blood, and to test the potential value of DMR analysis as a predictor of patient prognosis. The GRAM-score risk percentage was dichotomized into a discrete variable (i.e., low [< 50%] and high [> 50%]) and DNA methylation data was regressed on this variable in the COVID-19 respiratory patients (N = 100). Because the GRAM-risk score has been validated for specific use in COVID-19 patients [40], only patients with COVID-19 were included in the analysis (Table 1). Nineteen DMRs with ≥ 3 consecutive differentially methylated CpGs were identified, (p value < 0.0001, Additional file 1: Table S7) between patients with low and high GRAMrisk scores. In total, the DMRs comprised 145 differentially methylated positions (DMPs), of which there were 84% located at gene promoter regions and ~ 65% were hyper-methylated (Fig. 6A). Evaluation of mortality as an outcome measure identified 18 DMRs comprising 113 DMPs, 62% of which were hyper-methylated.
To identify specific DMPs that best define GRAMscore risk, the DNA methylation levels at these 145 GRAM-risk score-associated DMPs were subjected to a recursive feature elimination analysis [46]. This algorithm  GAPDH was used as a reference gene; see methods for details. **; p value < 0.01 revealed 77 DMPs with methylation levels that distinguish COVID-19 severity in a hierarchical cluster analysis (Fig. 6B). These data suggest that worse outcomes are associated with hyper-methylation in promoter regions and that specific positions throughout the genome may potentially correlate with COVID-19 severity.

Discussion
In this prospective cohort study, we tested the hypothesis that COVID-19 patients demonstrate patterns of DNA methylation in blood that are different from prepandemic healthy individuals, and from patients with respiratory illness who did not have COVID-19. We also tested whether worse outcomes in COVID-19 patients were associated with DMRs and DMPs in blood.

DNA methylation in blood altered in COVID-19 patients
In samples obtained within days of acute SARS-CoV-2 infection, patients exhibit 1089 (72%) hypo-methylated regions and 416 (28%) hyper-methylated regions comprising 5 or more consecutive differentially methylated CpGs in comparison with healthy control blood samples collected before the COVID-19 pandemic (Fig. 2B). A recent report comparing patients with and without sepsis of unspecified origin indicates differential methylation at genes that participate in interferon-gammamediated (IFNγ) signaling, MHCII antigen processing and presentation, immunoglobulin production, and cell adhesion pathways [47]. In a limited study of 6 patients with SARS-CoV-2 infection, 6 DMPs (not DMRs) were observed in genes that encode proteins that participate in granulopoiesis and B-lymphocyte-to-granulocyte trans-differentiation [10]; and a very recent report of a larger cohort identifies 44 CpGs in 39 genes that were differentially methylated, including genes related to interferon response to viral infections [48]. It has also been previously reported that viral infections induce aberrant methylation patterns in host cells [33,49]. For instance, H5N1 influenza and Middle Eastern respiratory syndrome coronavirus (MERS-CoV) infections downregulate interferon-stimulated and antigen-presenting genes, which are associated with hyper-methylation of gene promoter regions in human airway epithelial cells in vitro [28,29]. The large number of DMRs identified by the very conservative criteria and inferential comparisons used here, and the diversity of their corresponding loci and pathways, are surprising in view of the short interval from infection to hospitalization in the enrolled patients, thereby potentially denoting the role of the methylome as a rapid responder to SARS-CoV-2 infection. Interestingly, a very recent report focused on pediatric critical illness demonstrates a rapid regulation of DNA methylation in circulating leukocytes, taking place within the first three days of hospitalization [50].
Genes comprising DMRs between patients with COVID-19 and healthy pre-pandemic controls include  (N = 100). DMRs were ascertained as regions with at least 3 consecutive CpGs where > 75% of the CpGs in the region had a FDR p value < 0.05 and all were either hyper-methylated or hypo-methylated. DNA methylation levels of the DMPs (N = 145) residing in the DMRs were subjected to recursive feature elimination to identify CpGs that best distinguish GRAM-score risk. Shown is a hierarchical cluster using the DNA methylation data from the 77 DMPS (see Additional file 1: Table S8), that are shown as a heatmap of the M-values. Low GRAM-score risk (gray) and high GRAM-score risk (black) are indicated. These results indicate that DNA methylation levels at these 77 DMPs may be useful as biomarkers of the severity of COVID-19 patients. (see Additional file 1: Table S6-1 and S6-2) IFN-stimulated genes (ISGs), with well-recognized antiviral activity such as IFI27 and OAS2. Differential methylation of type I IFN pathway genes in specific leukocyte subsets is associated with autoimmune disorders including Sjogren's syndrome, Lupus, Grave's disease, and rheumatoid arthritis [51][52][53][54][55], indicating a possible role for ISG methylation in the dysregulation of inflammatory processes, and autoimmunity as a potential contributor to COVID-19 pathogenesis [56,57]. Much less is known about the impact of ISG methylation in blood on the control of viral infections. Recently, a correlation between ISG methylation and the outcome of HIV infection has been reported, with hyper-methylation of interferon and antiviral genes correlated with improved HIV control [58]. In SARS-CoV-2 infection, differential methylation and expression of antiviral ISGs may influence viral replication and spread in leukocyte subsets [59], or contribute to COVID-19 pathogenesis by altering immune cell activation or function. Multiple DMRs reported here appear in genes recently found with dysregulated expression levels in samples from the identical patients [11]. As previously described, methylation of CpGs located at promoter regions is canonically associated with transcriptional repression [16]. Mechanistically, methylated CpGs recruit complexes holding methyl-CpG binding domain-containing proteins and other factors that aggregate into multiprotein repressive complexes to silence transcription [60,61]. Of note, while our previously reported RNA sequencing analysis [11] and the qPCR experiments presented here indicate that expression of ISGs, IFI27 and OAS2 is upregulated in COVID-19 patients, their differential hypo-methylation in gene promoter regions suggests that methylation may contribute to their transcriptional regulation. Future studies focused on cross-ome trajectory analyses combined with locusspecific interrogation of DNA methylation will add clarity to this possibility.

DNA methylation in blood and COVID-19 severity
To determine if disease severity in COVID-19 patients is associated with DMRs in blood, we tested the association of DMRs with clinical outcomes including the GRAM risk score [40] and mortality. We found that worse GRAM scores were associated with 19 DMRs comprising 145 differentially methylated positions (DMPs) in 18 genes. Sixty-three percent of the GRAM-score-associated DMPs were hyper-methylated. Mortality was associated with 18 DMRs comprising 113 DMPs in 17 genes (Additional file 1: Table S9). In this setting, 61% of the DMRs were hyper-methylated. Over 84% of the DMRs associated with outcomes were located in gene promoter regions; notably, promoter hyper-methylation is canonically associated with transcriptional repression [15,16,18]. Previous research indicates that non-permissive (immunosuppressive) transcriptomic states are associated with worse outcomes in critical illness [30,[62][63][64]. Moreover, protracted COVID-19 is associated with blockade of T-cells proliferation [65] and suppression of the innate immune system in circulating blood [13]. These data suggest that as COVID-19 severity increases, promoter-predominant hypermethylation may regulate transcriptional repression at critical genes, potentially influencing the pathophysiology of host response. Future investigations with animal models will enable further elucidation of these pathways, and test whether promoter hypermethylation induced by greater severity downregulates gene expression and is physiologically consequential.
Using recursive feature elimination, we identified 77 DMPs that discriminate clinical outcomes. Given the observed epigenetic changes were captured at the time of patients' enrollment, their combined use with other biochemical variables may support an accurate predictive instrument to guide care before clinical deterioration. This instrument may be of value for resource-allocation in a pandemic environment with an overwhelmed healthcare system [66]. Future studies based on whole genome methylation sequencing will provide the opportunity to capture other outcome-associated DMRs and DMPs and provide a more comprehensive instrument based on a shared principle [11,67,68].

Strengths and limitations of this study
While the global RNA transcriptomic profiles in blood have been previously reported in sepsis [30], acute respiratory distress syndrome [31,69] and COVID-19 [10,11,13], and there are recently reported small cohorts describing blood DNA methylation in COVID-19 [70], there are no prior reports that compare differentially methylated regions blood samples from COVID-19 patients to samples collected before the SARS-CoV-2 pandemic using a shared epigenotyping platform. Together, DNA methylation and RNA expression data may facilitate improved COVID-19 diagnosis, prognosis, and targeted treatments [71]. As well, we provide a first large, prospective cohort study that addresses the association of COVID-19 blood methylome and patientcentered outcomes including mortality. Future studies will investigate the association of DMRs in blood with longer-term outcomes. For example, COVID-19-induced DMRs may persist long after acute care, contributing to the post-ICU syndrome comprising physical and cognitive dysfunction [72][73][74][75]. Recent data indicate that blood DNA methylation profiles mediate worse neurocognitive development in the pediatric ICU population [25], which could be relevant in COVID-19 as well [76,77].
Our study has some limitations. First, COVID-19 patients were enrolled in a single center. Although our population is racially diverse, it does not necessarily replicate factors related to geography or population socioeconomic status elsewhere. Second, while we enrolled the patients near the time of hospital admission, we did not control the interval that elapsed between disease onset and the blood sampling (Table 1). While most observational clinical ICU research is based on inception cohort studies in which the timeline is arbitrarily defined by the moment of patient enrollment [78], the time elapsed between admission and enrollment could have an effect on the analyzed data. Of note, previous research on the transcriptome of patients with sepsis indicated that the timing of blood sampling in relation to ICU admission was not predictive of the patients' expression profiles [30]. Third, due to the urgency-driven pandemic environment and the initial lack of formal recommendations by medical society guidelines, we had no control of the various drugs administered to the patients including azithromycin, hydroxychloroquine, corticosteroids and others, which could have impacted the overall COVID-19 data generated. Fourth, our study relies on a cross-sectional blood sample per patient on admission and must be complemented with longitudinal sampling and trajectory analysis to ascertain the dynamics of DNA methylation in COVID-19. Lastly, our analysis is based on a highcapacity chip array which despite contributing valuable information, is limited to about 3.4% of the genome [79]. Future studies based on whole methylome sequencing analysis will assure a more highly resolved database of DMRs associated with COVID-19 severity.
In summary, we generated pre-and post-COVID-19 methylome maps, and have shown that while acute respiratory disease causes substantial changes in the DNA methylation status leading to a predominantly hypomethylated state, COVID-19 infection is associated with differentially methylated regions impacting, among others, IFN-stimulated genes (ISGs). Using RNA sequencing analysis from the same patients [11], and confirmed with qPCR experiments to interrogate two prototypal ISGs, we found that gene promoter regions of overexpressed transcripts are hypomethylated in COVID-19 versus non-COVID-19 patients, thereby suggesting an epigenetic regulatory mechanism played by CpG methylation. Moreover, we found that sicker patients exhibit a predominantly promoter hypermethylated profile, possibly suggesting a regulatory role of DMRs in the immunosuppressive gene profile already described in severe COVID-19 cases [13]. Finally, using a recursive feature elimination algorithm, we identified a limited number of DMPs that accurately discriminate clinical outcomes, suggestive the potential role of DNA methylation profiling in the early prognostication of COVID-19 patients.

Methodological considerations
The use of the Illumina Infinium MethylationEPIC 850,000 BeadChip facilitates comparisons of data between investigations that employ a shared platform comprising sites that span the genome. This approach, which predominantly captures circulating leukocytes DNA, has been recently used in the intensive care setting [25]. However, the array is intrinsically biased by a priori selection of regions targeted for interrogation and does not incorporate over 24,000,000 additional CpGs amenable to direct sequencing of the entire methylome. While use of mixed cell populations in whole blood is of high relevance in infectious disease diagnosis and prognosis [67], and has supported identification of actionable subphenotypes [34,35,71], it does not capture processes taking place in other tissue compartments or specific cell types relevant to the pathogenesis of the COVID-19 that may arise from tissue or cell type-specific DMRs.
Whereas the nucleotide sequence of the genome is remarkably stable from conception to death [15,16], our data contribute to the awareness that DNA methylation is rapidly dynamic, influences the expression of genes that regulate COVID-19 progression [11], and potentially modifiable by acute insults which could be reversed by targeted interventions [80].

Cohort characteristics Human subject enrollment
Albany Medical Center With approval of the Albany Medical College Committee on Research Involving Human Subjects (AMC IRB Study No. 5670-20), we conducted a single-center observational study of adult subjects admitted to either the medical floor or the medical intensive care unit (MICU) of Albany Medical Center in Albany, NY. Enrollment took place between April 6, 2020, and May 1, 2020, and follow-up continued until June 15, 2020. Patients were eligible for enrollment if they were older than 18 years and were admitted to the hospital for symptoms compatible with COVID-19. Exclusion criteria were imminent death or inability to provide consent, which was obtained from the patient or a legally authorized representative. Patients were assigned to the COVID-19 group only after receiving a positive test result via nasopharyngeal swab testing using the Abbott Realtime SARS-CoV-2 Assay ® (Abbott, IL). SARS-CoV-2 test negative participants were assigned to the non-COVID-19 respiratory patient group as controls. The cause of respiratory distress in the non-COVID-19 patients is presented in Additional file 1: Table S1. Pre-hospital co-morbidities determined using clinical history and hospital documentation were aggregated using the Charlson comorbidity index [42]. APACHE II, SOFA, and SAPS II scores were used to assess severity of critical illness on ICU admission [41]. Sex, age, and other relevant subject data are provided in Table 1 and the Additional file 1: Table S1.
Wisconsin Alzheimer's disease research center With approval of the University of Wisconsin Institutional Review Board (UW IRB Study No. 2015-0300), blood samples were collected before 2017 from 39 healthy normal control participants. Participants were recruited from the community by advertisements and outreach events, and served as healthy normal controls in a Wisconsin Alzheimer's Disease Research Center (WADRC) investigation [21]. The healthy normal control participants complete a yearly study visit consisting of a blood draw, medical history questionnaires, psychometric testing, a physical exam, and must have no known diseases that interfere with study participation over time. Demographic details of the healthy normal control participants are provided in Table 1.

Selection of outcome measures
We analyzed the data with an outcome measure that: (1) is able to combine the severity of disease with mortality in a single metric; (2) is applicable to both ICU and medical floor populations; (3) uses a timeframe that accounts for longer hospitalizations in COVID-19 patients with respiratory failure compared with non-COVID-19 individuals [3,81]; (4) accounts for COVID-19 linear deterioration that transitions from mild respiratory compromise to respiratory failure, followed by respiratory distress requiring mechanical ventilatory support and eventually death. Thus, we selected the composite outcome variable defined by the COVID-19 risk GRAM score [40]. Characteristics contributing to the determination of the COVID-19 risk GRAM score are shown in Additional file 1: Table S6-1 and 6-2. To simplify the analysis, patients were separated into two groups based on a calculated risk percentage below or above 50%. The secondary outcome measure was in-hospital mortality.

Sample collection and storage
At enrollment, blood samples were collected using BD EDTA Vacutainers ® . Whole blood was then aliquoted and frozen at − 80 C degrees for later processing and analysis.

DNA isolation and methylation microarray
DNA was isolated from 500µL of frozen whole blood using the GeneJET whole blood kit (Thermo Fisher Scientific, K0782) following the manufacturer's protocols. DNA concentration was determined using a Qubit fluorometer (Thermo Fisher Scientific) and normalized to 20 ng/µL for microarray analysis. Samples were shipped overnighted to Genuity Science Inc. (Boston, MA) for bisulfite conversion and methylation microarray analysis using the Illumina Infinium MethylationEPIC Beadchip array [82]. The shared collection and processing of the blood DNA methylation levels from the Wisconsin healthy individuals' cohort (WADRC) was previously published [21].

Leukocyte messenger RNA (mRNA) expression determination
Whole blood samples were simultaneously collected from all participants, and leukocytes were isolated using LeukoLOCK filters (Cat. No. AM1923; Thermo Fisher). RNA was then extracted from the filters following the manufacturer's instructions and as previously reported [83]. Five hundred nanograms of total RNA was used to prepare cDNA using Qiagen RT Master Mix at 42 °C (Cat. No. 210215; Qiagen) following the manufacturer's instructions. After RT reaction, 2 µL of cDNA was used per qPCR reaction. qPCR was performed on a CFX Connect (Bio-Rad) instrument using SYBR green-based universal iTaq supermix (Cat. No. 1725125; Bio-Rad) and 2 pmol primers (IDT). Fold induction was calculated using the ΔΔCt method using GAPDH as the reference gene. Each sample was assayed in triplicate, and a negative control was included in each experiment. Primer sequences can be found in Additional file 1: Table S10.

Illumina human MethylationEPIC data preprocessing
To identify methylation changes associated with COVID-19, we compared COVID-19 patients (N = 102) to methylation data from pre-pandemic participants [21] that were enrolled 3 or years before the SARS-CoV-2 outbreak (N = 39). Raw.idat files from all (N = 141) were imported to the R environment. R package minfi was used to parse and preprocess methylation microarray data [84]. The quality of raw data was assessed, and no samples were filtered due to high mean detection p value (i.e., mean > 0.05). Bisulfite conversion of samples was assessed for each sample by density and bean plots, and determinations, to assure that the distribution of betavalues was bimodal with the largest densities being centered on 0 or 1, and that the majority of data was either methylated or unmethylated. All samples were deemed to be successfully converted. Leukocyte proportions were estimated from methylation signatures, and cell counts were extracted for incorporation into models of differential methylation. Samples were normalized using functional normalization by background and dye correction following the normal-exponential out-of-band method [85]. Following normalization, sex prediction was generated using normalized values.
Two COVID-19 samples were removed due to improper sex prediction from the COVID-19 and non-COVID-19 cohorts each, suggesting unreliable methylation values from these samples. Probes were removed from remaining samples (N = 139) if any of the following criteria applied: probes measured methylation on sex chromosomes; probes contained or reported methylation at SNPs; probes measured methylation at CH sites; detection p value of a probe > 0.01 for at most one sample; and probes were known to be cross-reactive. This filtering approach removed 99,905 probes through quality processing, leaving 765,954 for further analysis. Betavalues and logit M-values from the remaining probe set were generated for differential analysis. A one-way ANOVA was used to determine significant differences between mean beta-values of patients between groups.
To identify methylation changes associated specifically with COVID-19 versus non-COVID-19 respiratory patients, or other variables of interest (i.e., GRAM score, and mortality), raw.idat files from the AMC cohort (N = 124) samples were imported to the R environment. R package minfi was used to parse and preprocess methylation microarray data [84]. The quality of raw data and bisulfite conversion were assessed, leukocyte proportions were estimated, and samples were normalized, as above. After normalization, sex prediction was generated using normalized values. Four samples were removed due to improper sex prediction, suggesting unreliable methylation values for these samples. Probes were removed from remaining samples (N = 124) using the criteria as above. Filtering removed 95,447 probes through quality processing, leaving 770,412 for further analysis. Beta-values and logit M-values were generated for differential analysis. A one-way ANOVA was used to determine significant differences between mean beta-values of patients between groups.

Model selection for differential analysis
Several potential models using available covariates were assessed to generate the best fit for the data. To compare COVID-19 (N = 100) samples with pre-pandemic samples (N = 39), models accounting for COVID-19 status (positive vs. negative), age, sex, and estimated leukocyte proportions (i.e., granulocytes, monocytes, natural killer cells, B lymphocytes, CD8 T lymphocytes, CD4 T lymphocytes) were generated. Model selection was based on BIC score criterion. Of the tested models, a model accounting for COVID-19 status, sex, and leukocyte proportions was preferential and used for downstream analyses. Batch effects between microarrays were adjusted using ComBat from the R package sva [86].
Batch-adjusted beta-and M-values were assessed by the R package sva to identify unknown confounders such as with other infections or complications. The surrogate variables found were adjusted for during model fitting.
When assessing methylation levels associated with mortality of COVID-19 patients (N = 100) and GRAM score (N = 100), model selection was performed as above. Based on BIC criterion, models adjusted for surrogate variables using sva were selected for downstream analysis. Two outlier samples were removed from the GRAM score analysis because their scores were greater than 3 standard deviations from the mean.

Detection of differentially methylated regions
R package DMRcate was used for the detection of differentially methylated regions (DMRs) [43]. M-value matrices were annotated to their chromosomal position, and test statistics were generated for variables of interest using models as described above. For comparisons of COVID-19 patients versus pre-pandemic healthy participants, and COVID-19 patients versus non-COVID respiratory patients, DMRs were identified using an FDR p value cutoff of 0.05 and a minimum of 5 CpG sites in the region. For the comparison of methylation levels to GRAM score, and mortality, criteria for DMR identification included a p value cutoff of 0.0001 and a minimum of 3 CpG sites in the region. Genes annotated to DMRs were extracted for downstream ontological analyses.

Ontological analyses
Genes comprising DMRs were assessed for ontological analyses of biological processes and diseases using the R package clusterProfiler [87]. A listing of background genes was generated from all tested regions from DMRcate (N = 20,899 genes). Gene symbols were converted to ENTREZIDs. Significant terms were determined using an FDR p value cutoff of 0.05, comparing differentially methylated genes to the background gene list.

Plot generation
Manhattan plot generation used R packages qqman and ggplot2. For the pie plot, R package ChIPseeker was used to annotate regions [87]. Bar plots of ontological terms were generated using the R package clusterProfiler. Hypergeometric tests in the R environment were used to identify enrichments of gene lists. Customized Circos plots were generated using the R package BioCircos [88]. For heatmap generation of dichotomous GRAM score data, the R package caret was used for backward feature selection, starting with a matrix