- Research
- Open access
- Published:
Multi-omics association study of DNA methylation and gene expression levels and diagnoses of cardiovascular diseases in Danish Twins
Clinical Epigenetics volume 16, Article number: 117 (2024)
Abstract
Background
Cardiovascular diseases (CVDs) are major causes of mortality and morbidity worldwide; yet the understanding of their molecular basis is incomplete. Multi-omics studies have significant potential to uncover these mechanisms, but such studies are challenged by genetic and environmental confounding—a problem that can be effectively reduced by investigating intrapair differences in twins. Here, we linked data on all diagnoses of the circulatory system from the nationwide Danish Patient Registry (spanning 1977–2022) to a study population of 835 twins holding genome-wide DNA methylation and gene expression data. CVD diagnoses were divided into prevalent or incident cases (i.e., occurring before or after blood sample collection (2007–2011)). The diagnoses were classified into four groups: cerebrovascular diseases, coronary artery disease (CAD), arterial and other cardiovascular diseases (AOCDs), and diseases of the veins and lymphatic system. Statistical analyses were performed by linear (prevalent cases) or cox (incident cases) regression analyses at both the individual-level and twin pair-level. Significant genes (p < 0.05) in both types of biological data and at both levels were inspected by bioinformatic analyses, including gene set enrichment analysis and interaction network analysis.
Results
In general, more genes were found for prevalent than for incident cases, and bioinformatic analyses primarily found pathways of the immune system, signal transduction and diseases for prevalent cases, and pathways of cell–cell communication, metabolisms of proteins and RNA, gene expression, and chromatin organization groups for incident cases. This potentially reflects biology related to response to CVD (prevalent cases) and mechanisms related to regulation and development of disease (incident cases). Of specific genes, Myosin 1E was found to be central for CAD, and DEAD-Box Helicase 5 for AOCD. These genes were observed in both the prevalent and the incident analyses, potentially reflecting that their DNA methylation and gene transcription levels change both because of disease (prevalent cases) and prior disease (incident cases).
Conclusion
We present novel biomarkers for CVD by performing multi-omics analysis in twins, hereby lowering the confounding due to shared genetics and early life environment—a study design that is surprisingly rare in the field of CVD, and where additional studies are highly needed.
Background
Globally, cardiovascular diseases (CVDs) remain the leading cause of death, especially deaths due to coronary artery disease (CAD) and stroke are frequent [1]. Overall, CVD comprise several complex heterogeneous diseases of the circulatory system caused by an interplay between biological variation, and physiological, environmental, and lifestyle factors [1]. Although the prevalence of CVD continues to increase globally, a detailed understanding of the molecular mechanisms underlying CVD is still incomplete.
Genome-wide association studies (GWASs) have identified several genetic loci associated with CVD [2,3,4,5,6]. For instance, GWAS of CAD have identified more than 160 genetic loci during the last decade [5], and new studies continue to detect novel loci. For instance, recently 95 loci were identified in 1,093,078 individuals (including 243,392 CAD cases) [6]. Moreover, polygenetic risk scores of CVD have been published, summarizing the risk related to the identified genetic variants [7]. Nevertheless, such GWAS provide little or no molecular evidence of gene causality. This realization led to the integration of genetic data with additional high-throughput molecular data designed to investigate the epigenome, transcriptome, proteome, or metabolome, thus enabling the integrative analysis of multi-omics data to potentially discover casual genes and identify essential molecular mechanisms involved in CVD progression [8].
Epigenetics is a regulatory mechanism that can alter gene activity without changing the DNA sequence, where the most studied epigenetic mechanism is DNA methylation [9]. DNA methylation plays a key role in gene expression and mainly occurs in cytosine contiguous with guanine (CpGs). Several epigenome-wide association studies (EWASs) have identified CpGs showing differential methylation patterns related to CVD [10,11,12] and its cardiometabolic risk factors such as blood pressure [13], lipid levels [14], and body mass index (BMI) [15]. For instance, a cohort study by Agha et. al. identified an increase of methylation of 52 CpGs to be associated with CAD in 11,461 individuals (including 1895 CAD cases) [11]. Transcriptomics, also known as gene expression profiling, refers to the study of gene expression itself as measured by RNA levels [16]. A limited number of transcriptomics-wide association studies (TWASs) have identified genes linked to CVD [17, 18]. For instance, Liao et al. showed that 190 mRNAs were upregulated, and 176 mRNAs were downregulated significantly in 40 CAD patients compared to 20 controls [18]. Lastly, the integration of DNA methylation data and gene expression data in relation to CVD is rare. To the best of our knowledge, only two studies have been published on this topic: a recent study by Xiaokang Z. et al. [19] found five genes (ATG7, BACH2, CDK1B, DHCR24, and MPO) that were regulated by DNA methylation and associated with CAD in 2,085 participants. The other study by Palou-Márquez et al. [20] identified three independent latent factors (named 9, 21, and 27) linked to CAD: factor 9 related to age- and cell-type proportions; factor 21 related to immunodeficiency virus infection-related pathway and inflammation; and factor 27 related to lifestyle factors like smoking, alcohol consumption, and BMI. The study was performed in 2620 and 1892 participants with DNA methylation and gene expression data, respectively.
However, one challenge in investigating the association between nongenetic biomarkers and CVD is that this association might be confounded by genetic variation. One way to control for this genetic bias is to investigate twin pairs that are discordant for the disease, i.e., where one twin develops the disease of interest, while the co-twin does not. This study design is known as the discordant twin pair design [21], and it is a statistically efficient approach to control for confounding factors [22]. For instance, for DNA methylation array data, it has been estimated that, compared to the classical case–control design, the discordant twin pair design needs down to only one-tenth the individuals for a trait with a heritability of 60% [22]. The discordant twin pair design also controls for differences in environmental factors potentially confounding the association, especially factors related to the shared early life environment. Hence, the aim of the current study was to integrate DNA methylation and gene expression data to detect molecular mechanisms associated with CVD in Danish twins. The integration of several omics layers provides the possibility to investigate the complexity of CVD and potentially reveal new predictive biomarkers and therapeutic targets [23]. To the best of our knowledge, this is the first twin study to integrate DNA methylation and gene expression data to identify molecular mechanisms associated with CVD.
Materials and methods
For the present study, genome-wide epigenetic data and genome-wide gene transcription data were available for 835 twins (including 412 complete twin pairs). These data were analyzed together with information on diagnoses of CVD obtained for the same individuals from the Danish National Patient Registry (DNPR) [24]. The study design is depicted in Fig. 1 and explained below, and additional information regarding biological data and data preparation can be found in Supplementary Methods.
Study population
The study population comprised 835 twins drawn from three population-based and nationwide surveys from the Danish Twin Registry [25]: The Longitudinal Study of Aging Danish Twins (LSADT), The Middle Age Danish Twin study (MADT), and the Birthweight-Discordant Study (hereon called LifeSpan). The LSADT, initiated in 1995, was a cohort-sequential study of Danish twins aged 70 years or more, where surviving twins were followed up every second year until 2005 [26]. In 2007, all twin pairs, in which both twins were still alive, were invited to participate. The MADT was initiated in 1998 and included twins randomly selected from birth cohorts spanning 1931–1952. In 2008–2011, a follow-up study was conducted of all eligible twin pairs originally enrolled [25]. The LifeSpan cohort was conducted in 2009–2011 and included twin pairs from the registry, who had a median intrapair difference in birth weight of 0.5 kg [27]. The LifeSpan samples have been analyzed for epigenome-wide association to birth-weight discordance by Tan et al. [28], who reported no significant CpGs. The present study population consists of twins for whom both genome-wide epigenetic and genome-wide transcriptomic data were available: 84 twins from the LSADT 2007 wave, 476 twins from the MADT 2008–2010 wave, and 275 twins from the Lifespan cohort. Zygosity of the present study population was determined either by microsatellite markers and/or genome-wide genetic data (LifeSpan and LSADT) or by four questions regarding physical similarity (MADT). The latter has been shown to correctly classify more than 95% of the pairs [29]. Characteristics for the current study population are summarized in Table 1.
For all cohorts, data were collected through comprehensive interview-based questionnaries and examinations, as well as blood sampling. Moreover, data on disease diagnoses and survival status were available through linkage to nationwide health registries (see below). Informed consent to take part in the cohorts was obtained from all participants, and the survey was approved by the Regional Scientific Ethical Committees for Southern Denmark (S-VF-19980072, S-VF-20040241, and S-20090033) and conducted in accordance with the Helsinki II declaration.
Cardiovascular disease diagnoses from the Danish National Patient Registry
The four disease groups analyzed in the present study are listed in Table 1.
International Classification of Diseases (ICD) codes were extracted from the DNPR, which contains all hospital discharges and outpatient visits from all Danish hospitals since 1977 [24]. In the present study, diagnoses were available until the 23rd of November 2022. All national health registries in Denmark are linked via a unique personal identification number assigned to all Danish residents [30]. DNPR holds ICD-8 and ICD-10 codes, with ICD-10 since 1994, while ICD-9 codes were never implemented in Danish registries. Here, ICD codes for all Diseases of the Circulatory Organs (i.e., ICD-10 I00–99 [31]) were defined in overall disease groups according to The Danish Health Data Authority browser [32], the Statistics Denmark, holding the historical information on disease grouping [33], as well as the definitions by the Danish Cause of Death Registry [34]. Consequently, eight groups were initially defined:
-
(1)
Rheumatic fever and chronic rheumatic heart diseases (ICD-8: 390–398 and ICD-10: I00–I09),
-
(2)
Hypertension (ICD-8: 400–404 and ICD-10: I10–I15),
-
(3)
Ischemic heart diseases (ICD-8: 410–414 and ICD-10: I20–I25),
-
(4)
Vascular diseases of the brain (ICD-8: 430–438 and ICD-10: I60–I69),
-
(5)
Diseases of arteries, arterioles, and capillaries (ICD-8: 440–448 and ICD-10: I28, I70–I79),
-
(6)
Other heart diseases (ICD-8: 420–429 and ICD-10: I27, I30–I52),
-
(7)
Diseases of the veins and lymphatic system (ICD-8: 450–457 and ICD-10: I26, I80–I89).
-
(8)
Other and unspecific diseases of the circulatory system (ICD-8: 458 and ICD-10: 95–99).
As blood pressure was to be included as a covariate in the present study (see below, cf. [35]), group No. 2 was not considered further. For each of the remaining groups, the number of individuals with at least one diagnosis within the disease group in question was counted, including both main and secondary diagnoses. No cases were found for group No.1, and for group No. 8, only diagnoses of low blood pressure (N = 16) or unspecific/post-operation diagnoses were found (N = 7). Hence, only the five remaining groups were considered in the analysis. Due to a limited sample size, group No. 5 (N = 84) was combined with group No. 6 (N = 234), similarly to [36]. This group is hereafter referred to as: Arterial and other cardiovascular diseases. Based on the definitions of the WHO [31], group No. 7 was kept as a separate group. Consequently, four disease groups were analyzed in the present study:
-
(a)
Vascular diseases of the brain, hereon called cerebrovascular diseases (CD),
-
(b)
Ischemic heart diseases, also known as coronary artery diseases (CADs),
-
(c)
Arterial and other cardiovascular diseases (AOCDs),
-
(d)
Diseases of the veins and lymphatic system (DVL).
For each of these four disease groups, the first occurring diagnose date was defined for each individual, and they were then divided into either prevalent or incident cases, i.e., whether the first occurring diagnosis date was before (prevalent) or after (incident) blood sample collection. The reason for this division of cases was to condition on the temporality between expose and outcome; CVD event occurring before blood sample can potentially be reflected in a blood sampling taken after a disease event, whereas biological variation identified in an analysis of CVD events occurring after blood sampling for individuals with no diagnosis prior to the blood sampling is in principle predictive of future CVD events. In the subsequent data analyses (see below) of each of the disease groups, analysis of prevalent cases was performed by comparing cases to the rest of the study population (i.e., the individuals with no diagnose before blood sampling within the disease group in question). Incident cases were analyzed based on time to first diagnosis (incident cases) or time to death or end-of-follow-up (individuals remaining diagnose free) and excluding the prevalent cases from the disease group in question. For analysis of incident cases, information on survival status was retrieved from the Danish Central Person Register, which is continuously updated [37]. Finally, within each of the four disease groups, the number of discordant twin pairs was identified. The prevalent discordant cases were twin pairs, in which one twin was diagnosed with the disease before blood sampling, while the co-twin was not. The incident discordant cases were twin pairs, in which one twin obtained a diagnosis after blood sampling, while the co-twin remained disease-free or died, as well as pairs, in which both twins obtain a diagnosis, although at different time points after blood sampling.
Genome-wide omics data obtained for DNA and RNA from the same individuals
Detailed descriptions of the laboratory methods and quality control of the Infinium HumanMethylation450K BeadChip (Illumina, San Diego, CA, USA) methylation data, and of the Agilent SurePrint G3 Human GE 8 × 60K Microarray (Agilent Technologies) transcriptomic data can be found in [38] and [39], respectively. A brief overview can be found in the Supplementary Methods. Annotation of the DNA methylation data was done using GRCh37/hg19 (as per year 2018), applying the annotation file from [40], as recommended by Illumina Inc. (Illumina, San Diego, CA, United States) (personal communication), while the transcriptomic data were annotated using GRCh37/hg19, applying the annotation file supplied by Agilent (as per November 2022). This annotation among others includes information of names of genes located in the specific genomic position of the CpG or probe, these gene names were used for the subsequent bioinformatics analyses (see below).
Lastly, for both genome-wide DNA methylation data and genome-wide gene expression data, the biological material from each of the three cohorts had been analyzed on different occasions, hence, each dataset had been processed individually before the datasets were merged in the present study (see Supplementary Methods for details). Only reoccurring CpGs and gene expression probes across the three cohorts were used for the present study, resulting in 451,547 CpGs and 33,869 probes for statistical analyses. To achieve a distribution more fit for statistical testing, the β-values of the DNA methylation data were transformed to M-values by quantile transformation [41].
Survey phenotype data considered as covariates
Based on the HeartScore algorithm, developed by the European Association of Preventive Cardiology [35], for estimation of the risk of CVD disease events, the following covariates were considered: age, sex, self-reported smoking status (never, former, and current smoker), systolic blood pressure (mmHg), and non-high-density lipoprotein cholesterol (nonHDL) level (mmol/L). The latter is defined as the level of high-density lipoprotein cholesterol (HDL) subtracted from the total cholesterol level in blood. Standard systolic blood pressure measurements of the upper right arm, and blood lipid levels, by routine hospital measurements, were available for 740 (89%) and 357 (43%), respectively, of the 835 individuals in the study population. In order to consider these two variables as covariates, imputation was performed using the Multiple Imputation by Chain Equations (MICE) method. Moreover, sensitivity analyses of real versus imputed values, and sensitivity analyses with or without inclusion of the two variables were performed and led to the same overall conclusions (see Supplementary Methods for details).
Moreover, blood cell counts were also considered, as blood cell composition can bias molecular epidemiology studies performed in blood samples. Data were available as hematological counts of leukocytes subtypes (neutrophils, monocytes, basophils, eosinophils, and lymphocytes) for 763 (91%) out of the 835 individuals. Cell counts for the remaining individuals had previously been imputed using epigenome-wide data with a modified version of the PredictCellComposition method (see [42] for details).
Before statistical analyses, an exploratory principal component analysis (PCA) was initially performed (similarly to [38]) of all survey and register data considered in the present study as well as of the DNA methylation data and the gene expression data, respectively, with the aim to examine the correlation between potential confounder variables and the principal components (PCs) of the omics data. Subsequently, either the confounder variable or the PC was included statistical analyses (see Supplementary Methods for details).
For descriptive purposes, information on height, weight, overall diabetes status, and medication use was included. Height and weight were measured for the LifeSpan and MADT individuals, whereas they were self-reported for LSADT individuals. With respect to medication, the use of statins (ATC code C10AA) was based on self-report and was defined following the ATC classification guideline [43].
Statistical analyses
All statistical analyses were performed in R version 4.1.0, while initial data handling and cleaning of survey and register data were conducted using STATA 17.0.
For all statistical models (see below), imputation of systolic blood pressure and nonHDL was performed using MICE by creating 20 datasets, 20 iterations for each dataset and pooling the results applying Rubin’s rules (see Supplementary Methods for details). The R libraries including tidyverse, ggpubr, parallel, mice, and broom.mixed were used. Adjusting for multiple testing was performed using Benjamini–Hochberg FDR correction method [44]. Lastly, volcano plots were constructed using the ggplot2 library to visualize patterns in the direction of estimates.
Analyses of prevalent cases
EWAS and TWAS were performed using linear regression models in the lme4 library with the single DNA methylation (CpG) level and the gene expression level, respectively, as the outcome and the given disease group status as the explanatory variable. EWAS and TWAS were conducted both at the individual-level and at the twin pair-level.
The individual-level analysis was performed applying a linear mixed effect regression model with twin pair ID as a random factor in order to take the within-pairs’ dependency into account. The following covariates were included for both EWAS and TWAS: age at blood sampling, sex, nonHDL, systolic blood pressure, and smoking status as fixed effects, and an omics dataset variable as an additional random effect. Moreover, PC3 and PC2 were included for EWAS and TWAS, respectively, and reflected the cell counts (see Supplementary Methods).
In the twin pair-level analysis, intrapair differences in twin pairs discordant for disease status were investigated. The intrapair difference was calculated for a given variable by subtracting the variable value of the co-twin without a diagnosis from that of the co-twin with a diagnosis. A linear regression model was applied with the intrapair difference at DNA methylation level and the gene expression level, respectively, as the outcomes, and with the intrapair differences in the following covariates as the exposures: nonHDL, systolic blood pressure, smoker status, and PC2 in TWAS or PC3 in EWAS. A dataset variable was not included in the intrapair analyses, as the biological samples from the same twin pair had been analyzed on the same array and within the same cohort.
Analyses of incident cases
For analysis of incident disease cases, Cox proportional hazards regression using the Survival library was conducted, both at the individual-level and at the twin pair-level.
The individual-level analysis was performed using age as the underlying timescale (delayed entry at blood collection) and adjusted for sex, nonHDL, systolic blood pressure, smoking status, and PC2 in TWAS or PC3 in EWAS. We used age as the underlying time scale to ensure proper age adjustment and to allow a nonlinear relation between the molecular marker analyzed and the risk of CVD. Within pair dependence was adjusted in the models to account for dependency between twins in each twin pair. Individuals who died during follow-up without obtaining a diagnosis in the disease group in question were censored at time of death.
The twin pair-level analysis was performed using the stratified Cox regression, where the baseline hazard functions was pair specific.
Bioinformatic analyses of overlapping genes
The CpG sites investigated in EWAS, and the probes investigated in TWAS were annotated. This annotation includes information of the names of genes located in the specific genomic position of the CpG or probe. These gene names were used in gene set enrichment analysis (GSEA) with the aim to explore the biological pathways in which the identified CpGs and probes take part. Specifically, for each analysis, i.e., within each of the four disease groups and within each of the four statistical models (individual-level analysis and twin pair-level analysis of prevalent cases as well as individual-level analysis and twin pair-level analysis of incident cases), the overlap in gene names of the CpGs (EWAS) with FDR < 0.05 and the gene names of the probes (TWAS) with FDR < 0.05 was explored. Similarly, overlapping genes were inspected using a cutoff p < 0.05. As some CpGs or probes might be annotated to different genes or different aliases, and as different CpGs and probes might be annotated to the same gene, the lists of genes used to find the overlaps included all unique genes identified for a given analysis. Subsequently, for these 16 lists of EWAS-TWAS overlapping gene names (i.e., 4 disease groups times 4 statistical models) the overlap in EWAS-TWAS overlapping gene names from the individual-level analysis, and EWAS-TWAS overlapping gene names from the twin pair analysis were found for each disease group and within prevalent and incident cases. Such an overlap holds the EWAS-TWAS overlapping genes identified in the individual-level analysis and confirmed in the twin pair-level analysis, i.e., the genes confirmed when reducing the potential confounding due to shared genetic factors and early life environment. Such genes must be considered the most relevant genes when exploring the association of molecular markers to CVD. These 8 lists of confirmed genes (i.e., 4 disease groups times prevalent or incident cases) were used for GSEA in the GSEA database [45,46,47] with application of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases. Lastly, the overlap in genes between analyses of prevalent and analyses of incident cases, as well as all cases, was also explored. The lists of genes identified and used in GSEA were below the maximum number of 500 genes, which the GSEA database can handle with certainty [48].
Finally, interaction networks were also identified for the 8 list of genes by application of StringApp in Cytoscape [49]. In the displayed networks, genes not connected to other genes were not visualized, and for complex networks only the most connected genes are visualized. Unconstrained networks can be found in the Supplementary Results.
Results
The study design of the present study is depicted in Fig. 1. In the present study, we explore the association between variation in DNA methylation and gene expression levels and diagnoses of the circulatory system in 835 twins. We perform association analyses both at the individual-level and at the twin pair-level, the latter reducing the potential confounding introduced by genetic variation and variation in early life environment. Descriptive characteristics of the study participants can be seen in Table 1. As seen from this table, between 4.8% and 13.2% of the study population had a cardiovascular disease event before blood sampling (prevalent cases), depending on the disease group. Of the individuals without disease diagnose at the time of blood sampling, between 7.5 and 22.8% experienced a cardiovascular disease event after blood sampling (incident cases), depending on the disease group.
Epigenome-wide association studies and transcriptome-wide association studies of cardiovascular disease groups
TWAS and EWAS were performed for four different groups of CVD, i.e., cerebrovascular diseases (CD), coronary artery disease (CAD), arterial and other cardiovascular diseases (AOCD), and diseases of the veins and lymphatic system (DVL) in the entire twin population. For each disease group, analyses of both prevalent and incident cases were performed, as well as analyses of individual and intrapair differences. In TWAS, 33,869 transcripts were studied, and, in EWAS, 451,547 CpG sites were investigated.
The results of the association analyses holding a p value below 0.05 are listed in Supplementary Results, part 1. The number of CpGs with a p value below 0.05 in the EWAS ranged from 10,690 to 53,417 (corresponding to 2.4–11.8% of all CpGs) depending on the disease group and statistical analysis, while for TWAS, the range was 741–5273 probes (corresponding to 2.2–15.6% of all probes) depending on the disease group and statistical analysis.
Correction for multiple testing (FDR < 0.05) revealed 3 CpGs and 1 probe in the individual-level analysis of prevalent cases (see Table 2): cg16541931, cg04329510, and cg09378783 were all found associated with CD and annotated to the promoter region of GPR158 (G protein-coupled receptor), the 3′UTR of C5AR1 (complement C5a receptor 1) and to the gene body of EFGR (epidermal growth factor receptor), respectively, while A_19_P00806441, associated to DVL, was annotated to the LOC105370365 RNA gene. Moreover, two CpGs displayed association in the individual-level analysis of incident CAD: cg06567227 and cg06714480 annotated to the gene body of SIM1 (SIM BHLH transcription factor 1) and to the 5′UTR of NEUROD1 (neuronal differentiation 1) and CERK (Ceramide Kinase), respectively. Finally, none of the CpGs or probes found to have a p value below 0.05 in the twin pair-level analyses passed correction for multiple testing.
Genes in overlap between EWAS and TWAS
As seen in the previous section, no genes held both a CpG and a gene expression probe with an FDR corrected p value below 0.05. Hence, the 10,690–53,417 CpGs found to have a p value below 0.05 in the EWAS, and, in the TWAS, the 741–5,273 probes found to have a p value below 0.05 were investigated. They were annotated to 8634–21,561 unique genes from EWAS, and to 621–4,332 unique genes from TWAS, respectively. These unique genes were used to inspect the overlap in genes between EWAS and TWAS analyses within each disease group and within prevalent and incident cases, with the purpose to investigate the biology of these overlapping genes in GSEA (see below). Firstly, 8 gene lists of EWAS-TWAS overlapping genes were found for the prevalent cases (i.e., 4 disease groups times 2 analyses (individual-level and twin pair-level analyses)). Similarly, 8 gene lists of EWAS-TWAS overlapping genes were found for analysis of incident cases (i.e., 4 disease groups times 2 analyses (individual-level and twin pair-level analyses)). These 16 lists of EWAS-TWAS overlapping genes are listed in Supplementary Results, part 2. The number of overlapping genes ranged between 217 and 2007 depending on disease group and analysis. As the present study population was twins, we further identified the overlap between the EWAS-TWAS overlapping genes of the individual-level analysis and the EWAS-TWAS overlapping genes of the twin pair-level analysis within each disease group and within prevalent and incident cases. Such genes are the genes identified in the individual-level analysis, yet also confirmed in the twin pair analysis, the latter effectively reducing the potential confounding introduced by genetic variation and variation in early life environment. As such, these genes can be considered the most interesting candidates when investigating the association between variation in DNA methylation and gene expression levels and a given phenotype of interest. These 8 lists of genes (4 disease groups in prevalent analysis, and 4 disease groups in incident analysis) are listed in Supplementary Results, part 2. For prevalent cases, 136, 447, 391, and 109 genes were found for the CD, CAD, AOCD, and DVL, respectively, whereas for the incident cases 34, 104, 306, and 42 genes were found for the CD, CAD, AOCD, and DVL, respectively. Subsequently, volcano plots with all CpGs or probes displaying a p value below 0.05 were constructed (see Supplementary Figures) with the aim to explore the direction of the effect of the CpGs or probes located in the overlapping genes. As seen from these volcano plots, no general pattern in the direction of effect was seen. Subsequently, the overlapping genes were investigated in GSEA and displayed in gene networks.
Gene network and gene set enrichment analyses of the genes identified for diagnosis after blood sampling
With the aim to identify biological variation associated with future CVD, the incident CVD cases were analyzed first. Such analysis potentially identifies predictive biomarkers of CVD. For the incident cases, the number of genes confirmed in the twin pair-level analysis was 34, 104, 306, and 42 for the CD, CAD, AOCD, and the diseases of DVL, respectively (see Supplementary Results, part 2).
Interaction networks for each of these four groups of genes are displayed in Supplementary Results, part 3. CD displayed no networks, whereas DVL showed networks with maximum two interactions. Figure 2 display the interaction networks for CAD and AOCD: as seen from the figure, the most interacting gene for CAD was MYO1E (myosin 1E), COMMD9 (the COMM Domain Containing 9), TPM1 (Tropomyosin 1), and TPM3 (Tropomyosin 3), whereas for AOCD, the most connected gene was HNRNPA1 (Heterogeneous Nuclear Ribonucleoprotein A1) followed by ATM (Ataxia-telangiectasia mutated), and DDX5 (DEAD-Box Helicase 5).
When performing GSEA on these gene lists, we found no pathways for CD, or CAD, whereas 4 pathways were found for the AOCD, and 2 pathways were observed for DVL (see Table 3). As seen from the table, the pathways reflected cell–cell communication, and metabolisms of proteins and RNA, gene expression, and chromatin organization.
Gene network and gene set enrichment analyses of the genes identified for diagnosis before blood sampling
Subsequently, the prevalent cases were analyzed. Genes identified in such analyses associate to CVD events occurring before blood sampling and, hence, potentially reflect physiological processes occurring as a consequence of disease. For the prevalent cases, the number of genes confirmed in the twin pair-level analysis were 136, 447, 391, and 109 for the CD, CAD, AOCD, and DVL, respectively (see Supplementary Results, part 2).
Full interaction networks for each of these four groups of genes can be found in Supplementary Results, part 3, while networks with the genes displaying the most connections are displayed in Fig. 3. As seen in the figure, the CD, CAD, and AOCD displayed rather dense networks, whereas the network for DVL was less dense. The three most connected genes for CD were ITGB1 (integrin subunit beta 1) and FYN (FYN proto-oncogene, Scr family tyrosine kinase) that were located in the same subcluster, and RPS27A (the ribosomal protein S27a gene) located in another subcluster. For CAD, the three most connected genes were ACTB (Actin Beta), CREBBP (CREB Binding Protein), and the CD19 (CD19 Molecule). For AOCD, the three most connected genes were HDAC1 (Histone Deacetylase 1), SIRT1 (Sirtuin 1), DDX5, and UBE2I (Ubiquitin-Conjugating Enzyme E2 I). For DVL, the most connected genes were the AGPAT1 (1-Acylglycerol-3-Phosphate O-Acyltransferase 1), CACNA1C (Calcium Voltage-Gated Channel Subunit Alpha1 C), and RPSA (Ribosomal Protein SA).
When performing GSEA using each of these gene lists 48, 100, and 100, we found pathways with an FDR < 0.05 for CD, CAD, and AOCD, respectively (see Supplementary Results, part 4). Generally, the majority of the identified pathways were involved in signal transduction, immune system, and disease. No pathways were found for the 109 genes found for DVL.
Overlap in genes identified in analyses of prevalent CVD and incident CVD within each cardiovascular disease group
Finally, the overlap in genes found for prevalent cases and for the incident cases were inspected per CVD group. For CAD, 6 overlapping genes were found, whereas for AOCD, 19 genes were found (see Supplementary Results part 2, sheet 3). Performing of GSEA of these overlapping genes revealed no significant pathways. Gene network analysis revealed no networks for CAD, while PIGG (Phosphatidylinositol Glycan Anchor Biosynthesis Class G) and UVSSA (UV-Stimulated Scaffold Protein A) were the only two genes to show connection for AOCD. Lastly, when similarly investigating the overlap in GSEA, we found pathways for prevalent cases and for the incident cases per CVD group, we found four Reactome pathways for AOCD. These were “Posttranslational protein modification (R-HSA-597592),” “RNA polymerase II transcription (R-HSA-73857),” “Metabolism of RNA (R-HSA-8953854),” and “Chromatin modifying enzymes (R-HSA-3247509).”
Discussion
Globally, cardiovascular diseases (CVDs) remain the leading cause of death [1]. Studies using DNA methylation as well as gene expression data to investigate CVD development are rare. In such studies, a major challenge is to reduce genetic confounding. Hence, application of twin data is ideal to decrease such confounding. Additionally, registry data are likely to be less biased than self-reported data. Thus, we here use twin register data of 835 twins to study the relationship between CVD development and DNA methylation and gene expression.
First, analyses of DNA methylation and gene expression data were performed individually. Three gene regions were found in EWAS of prevalent cerebrovascular diseases (CD) holding for multiple testing (FDR < 0.05): GPR158/GPR158-AS1, C5AR1, and EGFR, encoding a protein likely involved in G protein-coupled receptor activity/its antisense RNA, a receptor of complement 5a, and a growth factor receptor, respectively. GPR158 is highly expressed in the brain and associates to nervous system-related tumors [50] and cardiac fibrosis [51], whereas C5AR1 associates to atherosclerosis [52], coronary artery disease (CAD) [53, 54], and CD [55]. C5AR1 has also been presented as a potential drug target for reducing inflammation [56, 57]. EGFR is highly expressed in the neural stem cells and plays a role during embryogenesis and organogenesis, including the heart. In the central nervous system, EGFR is, among others, involved in the neural stem cells’ pool maintenance and axonal regeneration. Moreover, the two gene regions found in EWAS of incident CAD (FDR < 0.05), i.e., SIM1 and NEUROD1/CERKL, encode transcription factors (SIM1 and NEUROD1) or a protein with ceramide kinase-like domains, yet with unknown function (CERKL). Although these genes have not been linked directly to CVD, they are linked to CVD risk factors; SIM1 has a role in neuronal differentiation of the hypothalamus, which is critical for food consumption regulation [58], and loss-of-function mutations in SMI1 associate with increased risk of obesity [59]. NEUROD1 is expressed by pancreatic and nervous tissues and is essential for the development of beta cells and diabetes [60], which have been reported to influence cardiac function, as well as inflammation [61]. Lastly, LOC105370365 notably found in TWAS of prevalent diseases of the veins and lymphatic system (DVL) encodes a noncoding RNA of unknown function. Overall, although not all of these gene regions have previously been linked directly to the CVD phenotype, for which they display association in the present study, most do relate to relevant biology, CVD traits, or risk factors.
Furthermore, the overlap in genes identified in EWAS and TWAS was found, and, subsequently, the genes found both in the individual-level analysis and in the twin pair-level analysis were identified. The fact that these genes are confirmed, when the potential confounding effects of genetics and early life environment were reduced, makes them the most noteworthy genes for identifying nongenetic markers to CVD. These overlap genes were inspected by pathway and network analyses. Interestingly, for the many pathways found for the prevalent CVD events, the most frequent hierarchical groups were related to signal transduction, the immune system, diseases, metabolism and metabolism of RNA and proteins. While the fewer pathways found for the incident CVD events, the hierarchical groups were related to cell–cell communication, metabolisms of proteins and RNA, gene expression, and chromatin organization. This could potentially reflect an overall physiological response to disease in the prevalent cases, compared to a more regulatory effect for the incident cases. In any case, this difference indicates that investigation of prevalent cases and incident cases indeed reflect different biology, and it calls for longitudinal studies of incident cases to identify biomarkers with true predictive abilities. Finally, when constructing networks with the overlapping genes, more dense networks were in general found for prevalent cases than for incident cases, potentially reflecting differences in biology. For prevalent cases, the most connected genes in these dense networks were: ITGB1 for CD, ACTB for CAD, and HDAC1 for AOCD, encoding an integrin subunit, actin beta, and a histone deacetylase, respectively. Integrins are membrane receptors involved in cell adhesion and recognition in a variety of processes including hemostasis, tissue repair, and immune response, where studies have shown that ITGB1 has a protective role in CVD development [62,63,64]. Actins are highly conserved proteins involved in cell motility, structure, integrity, and intercellular signaling, where ACTB is found to be associated with CAD [65] and CD [66]. HDAC1 is involved in deacetylation of lysine residues on the N-terminal part of the core histones, giving a tag for epigenetic repression, and related to transcriptional regulation, as well as cell cycle progression. HDAC1 is a negative regulator of cardiomyocytes that is found to be associated with CVD development [67] and is suggested as a potential drug target [68]. For the incident analyses, the networks were less dense; for CAD, the largest network was centered around MYO1E, which encodes a non-muscle membrane-associated class 1 myosin, expressed in the kidneys, where it helps the capillaries resist hydrostatic pressure at the glomerular filtration barrier by generating tension [69]. MYO1E mutations are associated with focal segmental glomerulosclerosis, characterized by an increased risk of CVD, primary due to increase in dyslipidemia and hypertension [70]. In the network MYO1E was among others connected to TPM1 and TPM3, which encode Tropomyosin 1 and 3, respectively. Tropomyosin 1 plays a role in calcium mediated muscle contraction and associates to several heart diseases, including hypertrophic cardiomyopathy and CAD [71, 72]. Mutations in TPM3 can result in congenital muscle weakness due to reduced calcium-sensitivity [73], and potentially to an increased risk of CVD [74]. Another cluster for CAD held the COMMD9 as the most central gene. COMMD9 is expressed in the kidneys and is involved in sodium ion transport and cholesterol homeostasis [75]. Changes in sodium transportation can result in blood pressure alterations, which can lead to CVD [76]. Interestingly, the Na+–Ca2+ exchanger is a vital regulator of Ca2+ homeostasis in cardiomyocytes and thereby an essential modulator of cardiac contractile function [78]. The network for incident AOCD had HNRNPA1 as the most connected gene. HNRNPA1 encodes a mRNA splicing factor affecting gene expression, e.g., HNRNPA1 reduces expression of 3-hydroxy-3-methylglutaryl-Coenzyme A reductase (HMGCR), which encodes a rate-limiting enzyme of the cholesterol biosynthesis pathway. Reduced expression of HMGCR leads to reduced cholesterol synthesis and increased low-density lipoprotein (LDL) and apolipoprotein B (ApoB) uptake from the blood. Hence, HNRNPA1 has been suggested to have cholesterol lowering effects similar to statins [77]. HNRNPA1 is a regulator of vascular smooth muscle cell function and neointimal hyperplasia, i.e., the proliferation and migration of vascular smooth muscle cells in the innermost layer of the arteries, which results in thickening of the arterial wall and atherosclerosis [79]. Central for the same network were also ATM and DDX5, which encodes a DNA repair protein and an RNA helicase, respectively. ATM deficient cells are characterized by hyperlipidemia, which promotes atherosclerosis and CVD [80, 81], and several studies have associated ATM to CVD [82, 83]. DDX5 is expressed in the smooth muscle cells in arteries, among others, and is suggested to have a protective role in neointimal hyperplasia [84]. Consequently, DDX5 has been suggested as a potential target for vascular therapies [85]. Lastly, despite the fact that different genes, pathways, and networks were found in the analyses of either prevalent or incident cases, some genes were found in both. Among genes found in both analyses were MYO1E (for incident CAD) and DDX5 (for incident AOCD). Such genes might be considered generalist genes, as the DNA methylation and gene transcription levels change as a consequence both of the disease (prevalent cases) as well as prior disease (incident cases).
The strength of the present study is primarily the use of register data from a nationwide registry that makes it possible to investigate all CVD diagnoses, which are less biased than self-report, and, furthermore, enables the investigation of both prevalent and incident cases. Another great strength is the use of twin data, which enabled us to examine intrapair differences, facilitating a detailed study of the gene expression and DNA methylation differences associated with CVD, while at the same time controlling for genetics and shared early life environmental confounders. Lastly, epigenetic data and gene transcription data are available from the same individuals in the present study, which is indeed rare in this line of research. To investigate both types of data, it is beneficial to investigate biological mechanisms due to the integration of different omics layers providing a more biologically holistic view compared with single omics studies. However, the study also has a number of limitations; first of all, the use of blood samples means that the molecular markers measured reflect the overall biology of the individuals, and not the biology specific to for instance the heart muscle cells or brain cells. On the other hand, identification of molecular markers applicable in a blood sample would have great potential due to the non-invasive nature of the sampling. Moreover, we conducted an explorative study, as genes with a p value below 0.05, yet not holding for multiple testing, were investigated. Hence, we cannot exclude false positive findings among these genes, and replication studies are needed to verify the present findings. Nevertheless, considering previous estimations of the statistical power of the discordant twin pair design relative to the classical case control study [22], the power of the twin pair analyses of the present study including 28–70 twin pairs appears to be reasonably good (considering a power > 80, a p value threshold of 10–6, and a heritability of 60%) [22]. The individual-level analysis (including 40–165 cases) is, on the other hand mostly likely under powered for the most part. In any case, as the findings of the present study are conditioned on the findings of the twin pair analyses, this is less of an issue and highlights the strength of including twin pairs in molecular studies as compared to including singletons only. Lastly, the incident CVD cases in the present study were defined as not having an CVD diagnosis within the CVD group of interest before blood sampling. However, we cannot exclude that pathological changes were already present at blood sampling, as e.g., CAD develops over years. So, not even conditioning on absence of diagnosis before blood sampling can ensure that the biology measured does not, to some degree, reflect initial disease.
In conclusion, we here present a study of diagnoses in the entire group of Diseases of the Circulatory Organs obtained from a nationwide registry in 835 twins for which both genome-wide epigenetic and genome-wide gene transcription data were available. Investigation of genes found in both layers of biology, as well as in the analysis of both individuals and twin pairs points to novel candidate genes and pathways. These markers could potentially be beneficial in the detection of both existing and future CVD. However, verification in additional study populations is needed.
Data availability
No datasets were generated or analyzed during the current study.
Abbreviations
- ApoB:
-
Apolipoprotein B
- AOCD:
-
Arterial and other cardiovascular diseases
- BMI:
-
Body mass index
- CAD:
-
Coronary artery disease
- CD:
-
Cerebrovascular diseases
- CVD:
-
Cardiovascular diseases
- DNPR:
-
Danish National Patient Registry
- DVL:
-
Diseases of the veins and lymphatic system
- EWAS:
-
Epigenome-wide association studies
- GSEA:
-
Gene set enrichment analysis
- GWAS:
-
Genome-wide association studies
- HDL:
-
High-density lipoprotein cholesterol
- ICD:
-
International Classification of Diseases
- KEGG:
-
Kyoto Encyclopedia of Genes and Genomes
- LDL:
-
Low-density lipoprotein
- LSADT:
-
The Longitudinal Study of Aging Danish Twins
- MADT:
-
The Middle Age Danish Twin study
- MICE:
-
Multiple Imputation by Chain Equations
- NonHDL:
-
Non-high-density lipoprotein cholesterol
- PCA:
-
Principal component analysis
- TWAS:
-
Transcriptomic-wide association studies
- ACTB :
-
Actin Beta
- AGPAT1 :
-
1-Acylglycerol-3-Phosphate O-Acyltransferase 1
- ATM :
-
Ataxia-telangiectasia mutated
- C5AR1 :
-
Complement C5a receptor 1
- CACNA1C :
-
Calcium Voltage-Gated Channel Subunit Alpha1 C
- CERK :
-
Ceramide Kinase
- COMMD9 :
-
COMM Domain Containing 9
- CREBBP :
-
CREB Binding Protein
- DDX5 :
-
DEAD-Box Helicase 5
- EFGR :
-
Epidermal growth factor receptor
- FYN :
-
FYN proto-oncogene, Scr family tyrosine kinase
- GPR158 :
-
G protein-coupled receptor
- HDAC1 :
-
Histone Deacetylase 1
- HMGCR :
-
3-Hydroxy-3-methylglutaryl-Coenzyme A reductase
- HNRNPA1 :
-
Heterogeneous Nuclear Ribonucleoprotein A1
- ITGB1 :
-
Integrin subunit beta 1
- MYO1E :
-
Myosin 1E
- NEUROD1 :
-
Neuronal differentiation 1
- PIGG :
-
Phosphatidylinositol Glycan Anchor Biosynthesis Class G
- RPS27A :
-
Ribosomal protein S27a
- RPSA :
-
Ribosomal Protein SA
- SIM1 :
-
SIM BHLH transcription factor 1
- SIRT1 :
-
Sirtuin 1
- TPM1 :
-
Tropomyosin 1
- TPM3 :
-
Tropomyosin 3
- UBE2I :
-
Ubiquitin-Conjugating Enzyme E2 I
- UVSSA :
-
UV-Stimulated Scaffold Protein A
References
Moodie DS. The global burden of cardiovascular disease. Congenit Heart Dis. 2016;11(3):213.
Erdmann J, Kessler T, Munoz Venegas L, Schunkert H. A decade of genome-wide association studies for coronary artery disease: the challenges ahead. Cardiovasc Res. 2018;114(9):1241–57.
Traylor M, Farrall M, Holliday EG, Sudlow C, Hopewell JC, Cheng YC, et al. Genetic risk factors for ischaemic stroke and its subtypes (the METASTROKE collaboration): a meta-analysis of genome-wide association studies. Lancet Neurol. 2012;11(11):951–62.
Liu C, Mou S, Pan C. The FTO gene rs9939609 polymorphism predicts risk of cardiovascular disease: a systematic review and meta-analysis. PLoS ONE. 2013;8(8): e71901.
Arking DE, Chakravarti A. Understanding cardiovascular disease through the lens of genome-wide association studies. Trends Genet. 2009;25(9):387–94.
Tcheandjieu C, Zhu X, Hilliard AT, Clarke SL, Napolioni V, Ma S, et al. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat Med. 2022;28(8):1679–92.
German CA, Shapiro MD. Polygenic risk scores to identify cvd risk and tailor therapy: hope or hype? Curr Atheroscler Rep. 2021;23(9):47.
Leon-Mimila P, Wang J, Huertas-Vazquez A. Relevance of multi-omics studies in cardiovascular diseases. Front Cardiovasc Med. 2019;6:91.
Greenberg MVC, Bourc’his D. The diverse roles of DNA methylation in mammalian development and disease. Nat Rev Mol Cell Biol. 2019;20(10):590–607.
Fernández-Sanlés A, Sayols-Baixeras S, Subirana I, Degano IR, Elosua R. Association between DNA methylation and coronary heart disease or other atherosclerotic events: a systematic review. Atherosclerosis. 2017;263:325–33.
Agha G, Mendelson MM, Ward-Caviness CK, Joehanes R, Huan T, Gondalia R, et al. Blood leukocyte DNA methylation predicts risk of future myocardial infarction and coronary heart disease. Circulation. 2019;140(8):645–57.
Westerman K, Sebastiani P, Jacques P, Liu S, DeMeo D, Ordovás JM. DNA methylation modules associate with incident cardiovascular disease and cumulative risk factor exposure. Clin Epigenet. 2019;11(1):142.
Richard MA, Huan T, Ligthart S, Gondalia R, Jhun MA, Brody JA, et al. DNA methylation analysis identifies loci for blood pressure regulation. Am J Hum Genet. 2017;101(6):888–902.
Braun KVE, Dhana K, de Vries PS, Voortman T, van Meurs JBJ, Uitterlinden AG, et al. Epigenome-wide association study (EWAS) on lipids: the Rotterdam Study. Clin Epigenet. 2017;9:15.
Reed ZE, Suderman MJ, Relton CL, Davis OSP, Hemani G. The association of DNA methylation with body mass index: distinguishing between predictors and biomarkers. Clin Epigenet. 2020;12(1):50.
Cappola TP, Margulies KB. Functional genomics applied to cardiovascular medicine. Circulation. 2011;124(1):87–94.
Arunachalam G, Upadhyay R, Ding H, Triggle CR. MicroRNA signature and cardiovascular dysfunction. J Cardiovasc Pharmacol. 2015;65(5):419–29.
Liao J, Wang J, Liu Y, Li J, Duan L. Transcriptome sequencing of lncRNA, miRNA, mRNA and interaction network constructing in coronary heart disease. BMC Med Genom. 2019;12(1):124.
Zhang X, Wang C, He D, Cheng Y, Yu L, Qi D, et al. Identification of DNA methylation-regulated genes as potential biomarkers for coronary heart disease via machine learning in the Framingham Heart Study. Clin Epigenet. 2022;14(1):122.
Palou-Márquez G, Subirana I, Nonell L, Fernández-Sanlés A, Elosua R. DNA methylation and gene expression integration in cardiovascular disease. Clin Epigenetics. 2021;13(1):75.
Tan Q, Li W, Vandin F. Disease-concordant twins empower genetic association studies. Ann Hum Genet. 2017;81(1):20–6.
Li W, Christiansen L, Hjelmborg J, Baumbach J, Tan Q. On the power of epigenome-wide association studies using a disease-discordant twin design. Bioinformatics. 2018;34(23):4073–8.
Joshi A, Rienks M, Theofilatos K, Mayr M. Systems biology in cardiovascular disease: a multiomics approach. Nat Rev Cardiol. 2021;18(5):313–30.
Schmidt M, Schmidt SA, Sandegaard JL, Ehrenstein V, Pedersen L, Sørensen HT. The Danish National Patient Registry: a review of content, data quality, and research potential. Clin Epidemiol. 2015;7:449–90.
Pedersen DA, Larsen LA, Nygaard M, Mengel-From J, McGue M, Dalgård C, et al. The Danish Twin registry: an updated overview. Twin Res Hum Genet. 2019;22(6):499–507.
McGue M, Christensen K. The heritability of level and rate-of-change in cognitive functioning in Danish twins aged 70 years and older. Exp Aging Res. 2002;28(4):435–51.
Frost M, Petersen I, Brixen K, Beck-Nielsen H, Holst JJ, Christiansen L, et al. Adult glucose metabolism in extremely birthweight-discordant monozygotic twins. Diabetologia. 2012;55(12):3204–12.
Tan Q, Frost M, Heijmans BT, von Bornemann HJ, Tobi EW, Christensen K, et al. Epigenetic signature of birth weight discordance in adult twins. BMC Genom. 2014;15(1):1062.
Christiansen L, Frederiksen H, Schousboe K, Skytthe A, von Wurmb-Schwark N, Christensen K, et al. Age- and sex-differences in the validity of questionnaire-based Zygosity in Twins. Twin Res Hum Genet. 2003;6(4):275–8.
Authority TDHD. 2023 [Available from: https://sundhedsdatastyrelsen.dk/da/english.
(WHO) WHO. International Classification of Diseases 2023 [Available from: https://icd.who.int/en.
Authority TDHD. SKS-Browser 2023 [Available from: https://medinfo.dk/sks/.
Denmark S. 2023 [Available from: https://www.dst.dk.
Helweg-Larsen K. The danish register of causes of death. Scand J Public Health. 2011;39(7 Suppl):26–9.
group SW, collaboration ESCCr, 2021, SCORE2 risk prediction algorithms: new models to estimate 10-year risk of cardiovascular disease in Europe. European Heart Journal, 42(25), 2439–54.
Kaalby L, Skytthe A, Andersen-Ranberg K, Jeune B. Causes of death among 9000 Danish centenarians and semisuper-centenarians in the 1970–2012 period. In: Maier H, Jeune B, Vaupel JW, editors. Exceptional lifespans. Cham: Springer International Publishing; 2021. p. 85–102.
Pedersen CB. The Danish civil registration system. Scand J Public Health. 2011;39(7):22–5.
Soerensen M, Li W, Debrabant B, Nygaard M, Mengel-From J, Frost M, et al. Epigenome-wide exploratory study of monozygotic twins suggests differentially methylated regions to associate with hand grip strength. Biogerontology. 2019;20(5):627–47.
Nygaard M, Larsen MJ, Thomassen M, McGue M, Christensen K, Tan Q, et al. Global expression profiling of cognitive level and decline in middle-aged monozygotic twins. Neurobiol Aging. 2019;84:141–7.
Zhou W LPaSH. Basic annotation with suggested overall masking - HM450 2018 [02/02/2024]. Available from: https://zwdzwd.github.io/InfiniumAnnotation/EPIC_hm450_hg19.html.
Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinform. 2010;11:587.
Debrabant B, Soerensen M, Christiansen L, Tan Q, McGue M, Christensen K, et al. DNA methylation age and perceived age in elderly Danish twins. Mech Ageing Dev. 2018;169:40–4.
(NIPH) WCCfDSMNIoPH. Guidelines for ATC classification and DDD assignment 2020. 2019.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc: Ser B (Methodol). 1995;57(1):289–300.
Mootha VK, Lindgren CM, Eriksson K-F, Subramanian A, Sihag S, Lehar J, et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34(3):267–73.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci. 2005;102(43):15545–50.
Institute USDB. Gene Set Enrichment Analysis (GSEA) - Investigate Human Gene Sets 2023 [Available from: https://www.gsea-msigdb.org/gsea/msigdb/annotate.jsp.
Broad Institute I, Massachusetts Institute of Technology, and Regents of the University of California. Help with Investigating Gene Sets 2023 [Available from: https://www.gsea-msigdb.org/gsea/msigdb/help_annotations.jsp.
Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape stringapp: network analysis and visualization of proteomics data. J Proteome Res. 2019;18(2):623–32.
Li N, Zhang Y, Sidlauskas K, Ellis M, Evans I, Frankel P, et al. Inhibition of GPR158 by microRNA-449a suppresses neural lineage of glioma stem/progenitor cells and correlates with higher glioma grades. Oncogene. 2018;37(31):4313–33.
Lodder EM, Scicluna BP, Beekman L, Arends D, Moerland PD, Tanck MW, et al. Integrative genomic approach identifies multiple genes involved in cardiac collagen deposition. Circ Cardiovasc Genet. 2014;7(6):790–8.
Ricklin D, Reis ES, Lambris JD. Complement in disease: a defence system turning offensive. Nat Rev Nephrol. 2016;12(7):383–401.
Zheng YY, Xie X, Ma YT, Yang YN, Fu ZY, Li XM, et al. Association of C5aR1genetic polymorphisms with coronary artery disease in a Han population in Xinjiang. China Diagn Pathol. 2015;10:33.
Zheng Y, Xie X, Yang Y, Fu Z, Li X, Pan S, et al. GW26-e2365 association of c5ar1 genetic polymorphisms with pathogenesis and prognosis of coronary artery disease in xinjiang han and uygur population. J Am Coll Cardiol. 2015;66(16):C33-C.
Shi Y, Jin Y, Li X, Chen C, Zhang Z, Liu X, et al. C5aR1 mediates the progression of inflammatory responses in the brain of rats in the early stage after ischemia and reperfusion. ACS Chem Neurosci. 2021;12(21):3994–4006.
Weiss S, Rosendahl A, Czesla D, Meyer-Schwesinger C, Stahl RAK, Ehmke H, et al. The complement receptor C5aR1 contributes to renal damage but protects the heart in angiotensin II-induced hypertension. Am J Physiol-Renal Physiol. 2016;310(11):F1356–65.
Zhang J, Zhou R, Cao G, Zhang Y, Xu H, Yang H. Guhong injection prevents ischemic stroke-induced neuro-inflammation and neuron loss through regulation of C5ar1. Front Pharmacol. 2022;13: 818245.
Michaud JL, Rosenquist T, May NR, Fan CM. Development of neuroendocrine lineages requires the bHLH-PAS transcription factor SIM1. Genes Dev. 1998;12(20):3264–75.
Bonnefond A, Raimondo A, Stutzmann F, Ghoussaini M, Ramachandrappa S, Bersten DC, et al. Loss-of-function mutations in SIM1 contribute to obesity and Prader-Willi-like features. J Clin Invest. 2013;123(7):3037–41.
Romer AI, Singer RA, Sui L, Egli D, Sussel L. Murine perinatal β-cell proliferation and the differentiation of human stem cell-derived insulin-expressing cells require NEUROD1. Diabetes. 2019;68(12):2259–71.
García-Rivas G, Castillo EC, Gonzalez-Gil AM, Maravillas-Montero JL, Brunck M, Torres-Quintanilla A, et al. The role of B cells in heart failure and implications for future immunomodulatory treatment strategies. ESC Heart Failure. 2020;7(4):1387–99.
Henning C, Branopolski A, Follert P, Lewandowska O, Ayhan A, Benkhoff M, et al. Endothelial β1 integrin-mediated adaptation to myocardial ischemia. Thromb Haemost. 2021;121(6):741–54.
Li L, Guan Q, Dai S, Wei W, Zhang Y. Integrin β1 increases stem cell survival and cardiac function after myocardial infarction. Front Pharmacol. 2017;8:135.
Fujioka T, Kaneko N, Ajioka I, Nakaguchi K, Omata T, Ohba H, et al. β1 integrin signaling promotes neuronal migration along vascular scaffolds in the post-stroke brain. EBioMedicine. 2017;16:195–203.
Jin J, Zhu C, Wang J, Zhao X, Yang R. The association between ACTB methylation in peripheral blood and coronary heart disease in a case-control study. Front Cardiovasc Med. 2022;9: 972566.
Yang S, Zhao Y, Chen X, Lu X, Chen Y, Zhao X, et al. The ACTB variants and alcohol drinking confer joint effect to ischemic stroke in chinese han population. J Atheroscler Thromb. 2020;27(3):226–44.
Wang Z, Zhao YT, Zhao TC. Histone deacetylases in modulating cardiac disease and their clinical translational and therapeutic implications. Exp Biol Med (Maywood). 2021;246(2):213–25.
Han Y, Nie J, Wang DW, Ni L. Mechanism of histone deacetylases in cardiac hypertrophy and its therapeutic inhibitors. Front Cardiovasc Med. 2022;9: 931475.
Mele C, Iatropoulos P, Donadelli R, Calabria A, Maranta R, Cassis P, et al. MYO1E mutations and childhood familial focal segmental glomerulosclerosis. N Engl J Med. 2011;365(4):295–306.
Sethna CB, Ng DK, Jiang S, Saland J, Warady BA, Furth S, et al. Cardiovascular disease risk among children with focal segmental glomerulosclerosis: a report from the chronic kidney disease in children study. Pediatr Nephrol. 2019;34(8):1403–12.
Halder SS, Rynkiewicz MJ, Creso JG, Sewanan LR, Howland L, Moore JR, et al. Mechanisms of pathogenicity in the hypertrophic cardiomyopathy-associated TPM1 variant S215L. PNAS Nexus. 2023;2(3):pagd011.
England J, Granados-Riveron J, Polo-Parada L, Kuriakose D, Moore C, Brook JD, et al. Tropomyosin 1: multiple roles in the developing heart and in the formation of congenital heart defects. J Mol Cell Cardiol. 2017;106:1–13.
Yuen M, Cooper ST, Marston SB, Nowak KJ, McNamara E, Mokbel N, et al. Muscle weakness in TPM3-myopathy is due to reduced Ca2+-sensitivity and impaired acto-myosin cross-bridge cycling in slow fibres. Hum Mol Genet. 2015;24(22):6278–92.
Damluji AA, Alfaraidhy M, AlHajri N, Rohant NN, Kumar M, Malouf CA, et al. Sarcopenia and cardiovascular diseases. Circulation. 2023;147(20):1534–53.
Liu YF, Swart M, Ke Y, Ly K, McDonald FJ. Functional interaction of COMMD3 and COMMD9 with the epithelial sodium channel. Am J Physiol-Renal Physiol. 2013;305(1):F80–9.
Gagnon KB, Delpire E. sodium transporters in human health and disease. Front Physiol. 2020;11: 588664.
Shigekawa M, Iwamoto T. Cardiac Na<sup>+</sup>-Ca<sup>2+</sup> exchange. Circ Res. 2001;88(9):864–76.
Yu C-Y, Theusch E, Lo K, Mangravite LM, Naidoo D, Kutilova M, et al. HNRNPA1 regulates HMGCR alternative splicing and modulates cellular cholesterol metabolism. Hum Mol Genet. 2013;23(2):319–32.
Zhang L, Chen Q, An W, Yang F, Maguire EM, Chen D, et al. Novel pathological role of hnRNPA1 (heterogeneous nuclear ribonucleoprotein A1) in vascular smooth muscle cell function and neointima hyperplasia. Arterioscler Thromb Vasc Biol. 2017;37(11):2182–94.
Watters DJ. Oxidative stress in ataxia telangiectasia. Redox Rep. 2003;8(1):23–9.
Mercer JR, Cheng KK, Figg N, Gorenne I, Mahmoudi M, Griffin J, et al. DNA damage links mitochondrial dysfunction to atherosclerosis and the metabolic syndrome. Circ Res. 2010;107(8):1021–31.
Ding X, He Y, Hao Q, Chen S, Yang M, Leng SX, et al. The association of single nucleotide polymorphism rs189037C>T in ATM gene with coronary artery disease in Chinese Han populations: a case control study. Medicine (Baltimore). 2018;97(4): e9747.
Su Y, Swift M. Mortality rates among carriers of ataxia-telangiectasia mutant alleles. Ann Intern Med. 2000;133(10):770–8.
Weber DS. A DEAD-Box Stop of Vascular Remodeling. Circ Res. 2019;124(10):1405–7.
Fan Y, Chen Y, Zhang J, Yang F, Hu Y, Zhang L, et al. Protective role of RNA helicase DEAD-Box protein 5 in smooth muscle cell proliferation and vascular remodeling. Circ Res. 2019;124(10):e84–100.
Acknowledgements
Not applicable
Funding
Open access funding provided by University of Southern Denmark. The project was supported by the Fabrikant Vilhelm Pedersen og Hustrus Legat on recommendation by the Novo Nordisk Foundation, Else og Mogens Wedell Wedellsborgs Fond, Ulla og Mogens Folmer Andersens Fond, the Danish Council for Independent Research—Medical Sciences (DFF- 6110-00016), the European Union’s Seventh Framework Program (FP7/2007–2011) under grant Agreement No. 259679 and the National Program for Research Infrastructure 2007 grant 09-063256 from the Danish Agency, for Science Technology and Innovation.
Author information
Authors and Affiliations
Contributions
ACS performed all data and bioinformatics analyses and wrote the original draft. ACS and MS did conception and design of the study, interpretation of the results, project administration, and review and editing of the draft. AM, HCB, and QT did interpretation of the results, and review and editing of the draft. All authors have read and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Informed consent to take part in the cohorts was obtained from all participants, and the survey was approved by the Regional Scientific Ethical Committees for Southern Denmark (S-VF-19980072, S-VF-20040241 and S-20090033) and conducted in accordance with the Helsinki II declaration.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Skovgaard, A.C., Mohammadnejad, A., Beck, H.C. et al. Multi-omics association study of DNA methylation and gene expression levels and diagnoses of cardiovascular diseases in Danish Twins. Clin Epigenet 16, 117 (2024). https://doi.org/10.1186/s13148-024-01727-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13148-024-01727-6