Assessment of differentially methylated loci in individuals with end-stage kidney disease attributed to diabetic kidney disease: an exploratory study

Background A subset of individuals with type 1 diabetes mellitus (T1DM) are predisposed to developing diabetic kidney disease (DKD), the most common cause globally of end-stage kidney disease (ESKD). Emerging evidence suggests epigenetic changes in DNA methylation may have a causal role in both T1DM and DKD. The aim of this exploratory investigation was to assess differences in blood-derived DNA methylation patterns between individuals with T1DM-ESKD and individuals with long-duration T1DM but no evidence of kidney disease upon repeated testing to identify potential blood-based biomarkers. Blood-derived DNA from individuals (107 cases, 253 controls and 14 experimental controls) were bisulphite treated before DNA methylation patterns from both groups were generated and analysed using Illumina’s Infinium MethylationEPIC BeadChip arrays (n = 862,927 sites). Differentially methylated CpG sites (dmCpGs) were identified (false discovery rate adjusted p ≤ × 10–8 and fold change ± 2) by comparing methylation levels between ESKD cases and T1DM controls at single site resolution. Gene annotation and functionality was investigated to enrich and rank methylated regions associated with ESKD in T1DM. Results Top-ranked genes within which several dmCpGs were located and supported by functional data with methylation look-ups in other cohorts include: AFF3, ARID5B, CUX1, ELMO1, FKBP5, HDAC4, ITGAL, LY9, PIM1, RUNX3, SEPTIN9 and UPF3A. Top-ranked enrichment pathways included pathways in cancer, TGF-β signalling and Th17 cell differentiation. Conclusions Epigenetic alterations provide a dynamic link between an individual’s genetic background and their environmental exposures. This robust evaluation of DNA methylation in carefully phenotyped individuals has identified biomarkers associated with ESKD, revealing several genes and implicated key pathways associated with ESKD in individuals with T1DM. Supplementary Information The online version contains supplementary material available at 10.1186/s13148-021-01081-x.


Background
Type 1 diabetes mellitus (T1DM) is a polygenic disease characterised by autoimmune destruction of the insulin producing beta cells in the pancreas which subsequently Open Access *Correspondence: laura.smyth@qub.ac.uk 1 Molecular Epidemiology Research Group, Centre for Public Health, Queen's University Belfast, Belfast, UK Full list of author information is available at the end of the article leads to hyperglycaemia [1]. Up to 40% of individuals with T1DM are predisposed to diabetic kidney disease (DKD), a microvascular diabetic complication [2,3]. Diabetes remains the primary disease causing end-stage kidney disease (ESKD) [4,5]. Individuals with DKD are also at increased risk of developing cardiovascular disease and premature mortality [6].
Epigenetic changes in DNA methylation have been associated with T1DM [17] and DKD [18] using peripheral whole blood samples. Bell et al. [19] used Illumina's Infinium 27K array in 2010 to assess epigenetic profiles associated with DKD in T1DM within 192 individuals. Nineteen differently methylated CpG sites (dmCpGs) were reported and correlated with time to the development of DKD. Six genes of interest with at least two dmCpGs including CUX1, ELMO1, FKBP5, PRKAG2 and PTPRN2 were identified from an epigenome-wide association study (EWAS) published in 2014 [20]. This investigation included individuals with DKD, caused by either T1DM or type 2 diabetes mellitus (T2DM).
In 2014, Gu et al. [21] employed bisulphite pyrosequencing of 778 individuals with T1DM, with and without DKD, and reported a decrease in the DNA methylation levels within IGFBP1. In 2015, Swan et al. [22] assessed DNA methylation variation in genes which encode mitochondrial proteins using Illumina's 450K and 27K methylation arrays in 442 individuals with T1DM and DKD. In all, 46 dmCpGs were identified in individuals with both DKD and ESKD. The largest change in methylation was evident for cg03169527 within TAMM41. Previous EWAS have identified differential methylation features with associations to chronic kidney disease (CKD) [23][24][25][26]. Improvements in sequencing and profiling technologies have prompted a rise in the number of studies assessing diseases using these techniques for biomarker discovery [23,27].
The aim of the present exploratory analysis was to assess differences in blood-derived DNA methylation patterns (using a high-density array) between individuals with T1DM-ESKD (including individuals requiring dialysis, or those having received a kidney transplant) and individuals with at least 15 years of T1DM and no evidence of kidney disease to identify blood-based biomarkers for T1D-ESKD. Careful and precise phenotyping has provided the ability to minimise differences caused by dialysis treatment; we also evaluated differences in methylation between individuals with T1DM-ESKD who had received a kidney transplant and thus were receiving immunosuppressive medication, versus those with long duration of T1DM and no evidence of kidney disease.

Results
The Illumina Infinium MethylationEPIC BeadChip array examines 862,927 CpG sites. Each resulting.idat file generated from the iScan was assessed using Illumina's BACR software. This software assessed the data in connection with a pre-set standard set of thresholds [28] which are indicated alongside the BACR data for each analysis. QC was completed with no significant difference in intensity levels detected. On average, fewer than 30 probes failed per sample. Houseman estimates [29] were calculated for the proportional WCCs of each sample, and the results were included as Additional file 1 for each of the four discovery analyses; Additional file 1: Table S2, Table S10, Table S19 and Table S27. Proportional WCCs for each group were compared with a general population control group, NICOLA (Northern Ireland Cohort for the Longitudinal Study of Ageing).
Blood-derived DNA from 360 individuals was included in this analysis. Of these participants, 253 were control individuals with a long duration of T1DM and no evidence of kidney disease upon repeated testing. The mean age of these participants was 41 years and with an average duration of diabetes of 26 years and without anti-hypertensive medication usage. The remaining 107 individuals were diagnosed with T1DM-ESKD, 73 of whom had received a kidney transplant. The average age of these participants was 41 years. The participant characteristics are provided in Table 1.
This study was conducted using four complementary analyses; matching of the participants was required for Analyses 1 and 3. Individuals were matched for age at sample collection (differing by < 5 years), sex and duration of diabetes (differing by ≤ 10 years). Concordance plots were drawn for seven duplicate samples; average r 2 = 0.99; Additional file 2: Figure S1.

Analysis 1: individuals with T1DM and ESKD compared to matched control individuals with T1DM and no evidence of kidney disease
This first analysis included 107 matched pairs, where both the case and control individuals were matched on sex and age at diagnosis ( Table 1). The BACR QC report is included within Additional file 1: Table S1. Proportional WCCs were estimated for all included participant blood samples. Additional file 1: Table S2 shows the average WCCs for Analysis 1 and the results of a t test comparing these between the two sample groups. Two cells, granulocytes and monocytes showed significant differences at the 0.05 level, the most significant of which was 0.04; but after correction for multiple testing of the WCC analyses, these results are not significant (p > 0.002).
Comparison of the methylation patterns between cases and controls identified 4383 top-ranked dmCpGs (FDR p ≤ × 10 -8 , Additional file 1: Table S3). Two genes, ARID5B and SEPTIN9, contained ≥ 10 topranked dmCpGs. When the stringency was increased to include a larger change in methylation between cases and controls (FDR p ≤ × 10 -8 and FC ± 2), 489 CpG sites remained significant (Additional file 1: Table S4) of which 331 were gene centric. Of these, 22 genes contained at least two dmCpGs including HDAC4, ITGAL, LY9, PBX1, RPTOR, RUNX3 and SEPTIN9 (Additional file 1: Table S4). A Manhattan plot was drawn using the qqman R package for all CpG sites from this analysis (Additional file 2: Figure S2).
To further assess the functional significance of these changes in DNA methylation between case and control groups, a GO enrichment analysis was undertaken. This analysis assessed the biological processes, cellular components and molecular functions of the genes within which the 489 top-ranked CpG sites were located (FDR p ≤ × 10 -8 and FC ≥ ± 2). A total of 325 GO functions had an enrichment score ≥ 4 (p < 0.01, Additional file 1: Table S5 and Additional file 2: Figure S3). The processes with the top enrichment scores include cell surface receptor signalling pathway, immune system process and positive regulation of immune system processes, lymphocyte activation and T cell activation. The KEGG and REAC-TOME pathway databases were additionally searched to identify key pathways linked to the genes where the top-ranked dmCpGs were located. Eleven KEGG pathways were identified with an enrichment score of ≥ 2 and p ≤ 0.01 (Additional file 1: Table S6). This analysis of differentially methylated genes returned pathways including Th17 cell differentiation, human T-cell leukaemia virus 1 infection and natural killer cell-mediated cytotoxicity. Ten REACTOME pathways were identified including six related to RUNX1 or RUNX3; RUNX3 regulates RUNX1mediated transcription, transcriptional regulation by RUNX3, regulation of RUNX1 expression and activity, RUNX3 regulates immune response and cell migration, RUNX1 regulates transcription of genes involved in differentiation of myeloid cells and RUNX3 regulates p14-ARF (Additional file 1: Table S7). The eFORGE-TF database was searched for transcription factor (TF) motifs associated with the top-ranked dmCpGs (FDR p ≤ × 10 -8 and FC ≥ ± 2) from within the most biologically plausible genes (ARID5B, HDAC4, ITGAL, LY9, PBX1, PIM1, RUNX3, SEPT9 and UPF3A, available in Table 2). Of the 17 dmCpGs examined, eight reported TF motifs where q < 0.05. These are detailed in Additional file 1: Table S8.

Analysis 2: individuals with T1DM and ESKD compared to a larger cohort of unmatched control individuals with T1DM and no evidence of kidney disease
This second analysis included the same 107 individual cases from Analysis 1 with a larger sample size of (unmatched) control individuals with T1DM and no evidence of kidney disease ( Table 1). The BACR QC report and proportional WCCs for this analysis are included within Additional file 1: Table S9 and Additional file 1:  Table S10, respectively.
Following the same analysis path as previously described, comparison of methylation between case individuals and control individuals identified 13,926 top-ranked dmCpGs (FDR p ≤ × 10 -8 ; Additional file 1: Table S11). Two genes ETS1 and UBAC2 contained over 20 dmCpGs. When the stringency levels were increased (FDR p ≤ × 10 -8 and FC ± 2), 1106 CpG sites remained, of which 764 were gene centric (Additional file 1: Table S12). SEPTIN9 contained the largest number of dmCpGs (n = 5) with the criteria set to include FC ± 2. Comparison of results from Analyses 1 and 2 (FDR p ≤ × 10 -8 and FC ± 2) identified 325 dmCpGs (223 within genes) which overlapped. Each dmCpG demonstrated the same direction of effect (Additional file 1: Table S13). A Manhattan plot (Additional file 2: Figure S4) was drawn using the qqman R package for all CpG sites from this analysis. A QQ plot for combined p values from Analyses 1 and 2 is included in Additional file 2: Figure S5.
GO enrichment analysis was similarly undertaken to assess the functional significance of the 1,106 significant DNA methylation alterations between the case and control groups. A total of 505 GO functions had an enrichment score ≥ 4, alongside p < 0.01 (Additional file 1: Table S14 and Additional file 2: Figure S6). The processes with the top enrichment scores include several linked to immune responses including regulation of immune system processes and lymphocyte activation. The KEGG and REACTOME pathway analyses assessed the same list of genes in which the 1,112 dmCpGs were located. KEGG pathway analysis revealed 16 pathways (enrichment score of ≥ 2, and p ≤ 0.01; Additional file 1: Table S15), including cancer, acute myeloid leukaemia and natural killer cell-mediated cytotoxicity. REACTOME pathway analysis identified 18 pathways (p < 0.01) including transcriptional regulation of granulopoiesis and transcriptional activation of cell cycle inhibitor p21 (Additional file 1: Table S16).

Analysis 3: individuals with T1DM and ESKD who have received a kidney transplant compared to matched control individuals with T1DM and no evidence of kidney disease
The inclusion criterion for the case subjects was restricted to individuals with a functioning kidney transplant for the third analysis. Only blood-derived DNA samples from individuals who had received a kidney transplant were included to minimise potential confounding due to differences in medication and renal replacement therapy modalities (n = 73). The methylation status of the CpG sites for these individuals was compared to matched controls with T1DM and no evidence of kidney disease (n = 73). The BACR QC report and proportional WCCs for this analysis are included within Additional file 1: Tables S18 and S19, respectively.
In total, 1,515 top-ranked dmCpGs were different between cases and controls (FDR p ≤ × 10 -8 ; Additional file 1: Table S20). Of those, 132 of the top-ranked dmCpGs remained when the stringency levels were increased (FDR p ≤ × 10 -8 and FC ± 2), 81 of which were gene centric including two within MTURN and UPF3A (Additional file 1: Table S21). A Manhattan plot (Additional file 2: Figure S7) was drawn using the qqman R package for all CpG sites from this analysis.
Additional GO enrichment and pathway analyses were undertaken for the 132 top-ranked genes in which the dmCpGs were located (FDR p ≤ × 10 -8 and FC ± 2) to assess their functional significance. In total, 75 GO functions were enriched with a score ≥ 4, alongside p < 0.01 (Additional file 1: Table S22, Additional file 2: Figure  S8). The KEGG pathway analysis showed one resultthe TGF-β signalling pathway with an enrichment score of ≥ 2 and p ≤ 0.01, Additional file 1: Table S23) and four REACTOME pathways were identified (entities p < 0.01) including the nuclear receptor transcription pathway and signalling by TGF-β family members (Additional file 1: Table S24).
The eFORGE-TF database was searched for TF motifs associated with the top-ranked dmCpGs (FDR p ≤ × 10 -8 and FC ≥ ± 2) from within the most biologically plausible genes (ARID5B, FKBP5 and UPF3A, available in Table 2).  Of the four dmCpGs examined, one reported a TF motif (q < 0.05) which is detailed in Additional file 1: Table S25.

Analysis 4: Individuals with T1DM who had received a kidney transplant compared to unmatched control individuals with T1DM and no evidence of kidney disease
The methylation status of the CpG sites for individuals who had received a kidney transplant was compared to unmatched controls with T1DM and no evidence of kidney disease (n = 253). The BACR and proportional WCCs QC report are included within Additional file 1: Table S26 and Additional file 1: Table S27, respectively. In total, 13,654 top-ranked dmCpGs were identified between both groups (FDR p ≤ × 10 -8 , Additional file 1: Table S28). Of these, 9,064 dmCpGs were gene centric. When the stringency levels were increased (FDR p ≤ × 10 -8 and FC ± 2), 1,070 CpG sites remained of which 714 were located within genes (Additional file 1: Table S29). A Manhattan plot (Additional file 2: Figure S9) was drawn using the qqman R package for all CpG sites from this analysis.
Combined p values from Analyses 3 and 4 are presented in a QQ plot (Additional file 2: Figure S10) drawn using qqman. Seven genes including ACAD8, LIME1, RPTOR and SEPTIN9 each contained three dmCpGs. In total, 78 dmCpGs (48 were within genes) overlapped between Analyses 3 and 4 (FDR p ≤ × 10 -8 and FC ± 2, Additional file 1: Table S30). GO enrichment and pathway analyses were completed for the genes in which the 1,070 top-ranked dmCpGs were located (FDR p ≤ × 10 -8 and FC ± 2). A total of 679 GO functions had an enrichment score ≥ 4 and p < 0.01 (Additional file 1: Table S31 and Additional file 2: Figure S11). The processes with the top enrichment scores include regulation of cell activation, enzyme binding and immune system processes. Fourteen KEGG pathways were identified (enrichment score of ≥ 2, and p ≤ 0.01; Additional file 1: Table S32) including cancer and platelet activation, and 16 REACTOME pathways (entities p < 0.01) including the SUMOylation of intracellular receptors and nuclear receptor transcription pathway (Additional file 1: Table S33).

Significantly associated dmCpGs from analyses 1-4
Thirty-six dmCpGs were identified as significantly different between cases and controls in each of the four analyses (FDR p ≤ × 10 -8 and FC ± 2; Additional file 1: Table S35 and Additional file 2: Figure S12). The direction of fold change was consistent for each Analysis.

Search for overlapping SNPs
Previous studies have reported that SNPs could potentially impact on methylation status [30]. To assess this, we searched the Infinium HD Methylation SNP List [31] for any SNPs that could potentially impact the methylation array results if present in the test population. Of the top-ranked dmCpGs from this analysis (FDR p ≤ × 10 -8 , FC ± 2), five single CpG sites have the potential to be affected by SNPs (Additional file 1: Table S36). All but two of these SNPs, rs4788986 (SEPTIN9) and rs742232 (RUNX3) are very rare in European populations and would therefore be unlikely to impact in this study. Of All genes included in Table 2 are reported in at least two of the categories except PRKAG2, PTPRN2 and TAMM41. Where additional dmCpGs were located within a candidate gene listed, their FCs are indicated. Approved gene symbols are included in Table 2, the gene symbols as included in the manifest file from Illumina are included in the Additional file 1 CpGs cytosine-phosphate-guanine sites, dmCpGs differentially methylated CpG sites, eGFR estimated glomerular filtration rate, eQTL expression quantitative trait loci, ESKD end-stage kidney disease, FC fold change, FDR false discovery rate, GO gene ontology, KEGG Kyoto encyclopedia of genes and genomes, T1DM type 1 diabetes mellitus, T2DM type 2 diabetes mellitus a eQTL support (p < × 10 -5 ) in American Indians (T2DM); b glomerular kidney tissue support (p < 0.05); c tubular kidney tissue support (p < 0.05); d kidney tubule support for eGFR slope (p < 0.05) using 450K data; e kidney tubule support for fibrosis (p < 0.05) using 450K data; f blood support for eGFR (p < 0.05) using 450K data; g blood support for eGFR slope (p < 0.05) using 450K data note, multiple dmCpGs were identified in both SEPTIN9 and RUNX3 genes with functional support provided for both these genes, making it improbable that these genes are falsely identified. Additional searches were conducted for the presence of SNP-CpGs within the dmCpGs identified from within the most biologically plausible genes which are included in Table 2 using the BBMRI-NL atlas platform. Of the 28 dmCpGs examined, eight were shown to have been potentially affected by SNPs. These are included within Additional file 1: Table S37.

STRING functional analyses
Functional network analyses were undertaken using STRING v11 [32] for the list of genes in which the topranked dmCpGs were located. Those which showed an increase in FC in the individuals with T1DM-ESKD compared to individuals with T1DM were analysed separately to those genes which showed a decrease in FC. All pathway interactions are shown in Additional file 2: Figures S13-S20.

Functional data results
Functional support was sought from existing glomerular and tubular expression data alongside eGFR and fibrosis in both blood and kidney biopsy tissue using data ascertained from the 450K array (all p < 0.05) [24,33]. Seven differentially methylated genes were replicated in the glomerular database and six in the tubular (Additional file 1: Table S38; Table 2). The genes in which top-ranked dmCpGs were located, which also had the most biological plausibility for ESKD, were also assessed for eQTLs in an American Indian population with T2DM and known renal status, in the absence of a more suitable cohort with the same phenotype as our discovery population [34]. EQTLs demonstrated support for eight of the top-ranked dmCpG genes; CUX1, ELMO1, HDAC4, PIM1, PRKAG2, PTPRN2, RUNX3, SEPTIN9 (p ≤ 10 -5 , Additional file 1: Table S38) [24,33].

Look-ups of top-ranked methylation markers
In the absence of an adequately powered, independent replication cohort using an EWAS array, we sought alternatives to provide supporting data to the top-ranked methylation markers reported from the discovery analysis. We have examined the association between the methylation of top-ranked CpGs and kidney function from this study and previous studies which had been performed on subjects with T2DM and known renal status, including cohorts with blood sample methylome analysis and others where methylation was studied in microdissected human kidney tubule samples. We confirmed nominally significant association with kidney function at the CUX1, PTPRN2 and SEPTIN9 locus and with kidney function decline at AFF3, HDAC4 and SEPTIN9 regions. Furthermore, the methylation level at the AFF3, CUX1 and PIM1 regions correlated with the degree of fibrosis in micro-dissected human kidney tissue samples.

Summary
A summary of results is included within Table 2, Additional file 2: Figures S21 and S22, including specific details regarding the strongest biologically plausible candidates linked to T1DM-ESKD, the GO functions and pathways. Table 2 highlights the top-ranked genes and pathways of interest and states the number of significant results from each section of the analysis.

Discussion
Epigenetic alterations provide a dynamic link between genetic background and environmental exposures. These alterations have been proposed to play an important role in kidney disease [20,23] and T1DM [35]. Previous research assessing T1DM-DKD [19][20][21][22]36] has identified dmCpGs using bisulphite pyrosequencing and either the Illumina Infinium 450K or 27K methylation arrays. This manuscript describes differences in DNA methylation patterns between individuals with T1DM and those with T1DM-ESKD, utilising the MethylationEPIC array technology.
Several top-ranked dmCpGs identified through this study showed an increase in FC related to the more severe phenotype of T1DM-ESKD compared to the T1DM control population.
The genes from each analysis with the most biological plausibility having assessed the literature are included in Table 2 and Additional file 2: Figures S21 and S22. Of the 16 genes in which dmCpGs were located with a significance level of FDR p ≤ × 10 -8 , (Additional file 2: Figure S21), 11 retained this status when the stringency level was increased (FDR p ≤ × 10 -8 and FC ± 2; Additional file 2: Figure S22). DmCpGs were located within FKBP5 and RUNX3 in each of the analyses where FDR p ≤ × 10 -8 . When the stringency was increased (FDR p ≤ × 10 -8 and FC ± 2), they remained top-ranked genes in all but Analyses 1 and 3, respectively.
FKBP5, also known as FK506 binding protein 51, acts as a co-chaperone of hsp90 to aid the modulation of glucocorticoid receptor sensitivity in response to stress [37,38]. Polymorphisms in this gene lead to an extended stress hormone response following exposure [37]. Recently, genome-wide analyses of human blood found associations between FKBP5 mRNA and a pro-inflammatory profile [39]. Moreover, aberrant FKBP5 methylation has previously been implicated in the pathology of numerous diseases, particularly in diseases common in older populations. This hypermethylation is exemplified in myocardial infarction [39] and conditions such as T2DM [40] and CKD [20]. All top-ranked dmCpGs within this gene were located in either the gene body or 5′ UTR and showed increased methylation for individuals with T1DM-ESKD. As a hallmark feature of CKD is persistent, low-to-moderate levels of circulating inflammatory markers [41], with distinguishing features such as nephron loss with subsequent acceleration of organ fibrosis, further study is required to determine if FKBP5 plays a mechanistic role in CKD development.
The runt-related transcription factor 3 (RUNX3) plays a downstream role in the TGF-β signalling pathway [42]. Its suppression has been implicated in tumour growth, migration and invasion [43]. In 2019, Cen et al. reported that higher methylation levels in RUNX3 were associated with a shorter renal cell carcinoma survival time [44]. Additionally, they suggested independent predictors of heightened methylation levels of this gene included the presence of intra-tumour vascularity. We identified increased FC in dmCpGs within the RUNX3 gene body and the UTRs in association with T1DKD-ESKD.
As indicated in Additional file 2: Figure S22, a further six genes, HDAC4, ITGAL, LY9, PBX1, PIM1 and SEP-TIN9 had top-ranked dmCpGs from three of the four analyses. These genes did not reach statistical significance in Analysis 3, which had the smallest population which compared DNA methylation patterns in 73 individuals with T1DM-ESKD (transplant only) to 73 individuals, matched, with T1DM.
Histone deacetylases (HDACs) are a group of enzymes that are characterised into three defined classes, known as I, II and III [45,46]. They each have roles in removal of acetyl groups from histone and non-histone proteins, chromatin condensation and transcriptional repression [45,46]. They have the ability to impact cellular function via both epigenetic and non-epigenetic mechanisms [46]. HDAC4 (a member of class II) was reported to reduce kidney injury during in vivo animal studies [45][46][47]. Additionally, HDAC4 was found to be over-expressed in kidney epithelial cells of a murine kidney fibrosis model [48]. Subsequent treatment with HDAC inhibitors demonstrated that the development and progression of kidney fibrosis can be inhibited by suppressing the activation and expression of numerous pro-fibrotic molecules, such as fibronectin and collagen 1 [48]. In the present study, HDAC4 showed an increased FC for all dmCpGs except cg25569341 in individuals with ESKD undergoing dialysis or who had undergone a transplant.
Integrins are integral membrane proteins which are heterodimeric in nature. They are comprised of alpha and beta chains which together form the integrin lymphocyte function-associated antigen-1, expressed in leukocytes. ITGAL encodes the alpha L chain and its expression has been previously linked to renal cell carcinoma [49]. A second report has demonstrated that DNA methylation of the ITGAL gene is heavily methylated in fibroblasts and demethylated in T lymphocytes [50]. The top-ranked dmCpGs present within the gene body or north shelf of ITGAL showed an increase in FC in individuals with T1DM-ESKD who were immunosuppressed following transplantation, compared to those with T1DM with no evidence of kidney disease.
Lymphocyte antigen 9 (LY9) is a member of the signalling lymphocyte activation molecule family receptor and is involved in immune responses. In 2019, Parikova et al. [51] reported expression of LY9 to be significantly increased in individuals receiving long-term dialysis compared to those who had received dialysis over a shorter time period. Our findings support these observations showing an increased FC in LY9 dmCpGs located within either the gene body or 5′ UTR in individuals with T1DM-ESKD compared to those with no kidney disease.
PBX1 encodes a nuclear protein within the PBX homeobox family of transcription factors. Through this, it can affect the expression of several genes including those which regulate insulin action and glucose metabolism [52]. In the current study, dmCpGs within the PBX1 gene body showed a consistent decrease in FC at the highest stringency level (FDR p ≤ × 10 -8 , FC ± 2) in individuals with T1DM-ESKD across each of the four analyses. Previous analyses of this gene have linked differential methylation patterns to higher birth weight-for-gestational age [53] and a translocational rearrangement of this gene and TCF3/E2A has been associated with B-cell acute lymphoblastic leukaemia [54]. Additionally, PBX1 haploinsufficiency has been linked to congenital anomalies of the kidney and urinary tract [55], reported to be involved in the proliferation of cells in renal cell carcinoma [56] and has been recognised as a candidate gene for T2DM [52,57].
PIM1 belongs to the serine/threonine kinase family [58] and its overexpression has previously been implicated in diseases such as ovarian cancer [59] and breast cancer [60]. Overexpression of PIM1 appears to influence cancer development in three ways, preventing apoptosis, enhancing cellular proliferation and through promoting genomic instability [61]. More specific to kidney research, PIM1 is aberrantly overexpressed in renal cell carcinoma [62] and lupus nephritis [63]. In this analysis, we identified top-ranked dmCpGs present within the 3′ UTR and north shore of PIM1 which resulted in an increase in the FC in individuals with T1D-ESKD.
Septins are a group of GTPase proteins that play a role in cytoskeleton organisation through their links with microtubules and actin filaments [64]. Although some septins are associated with the advancement of kidney fibrosis [64], SEPTIN9 specifically has been implicated in a host of disease pathologies, such as prostate cancer [65], colorectal cancer [66], liver fibrosis [67] and T2DM [68]. Moreover, SEPTIN9 overexpression has been shown to promote kidney epithelial cell migration [69]. Additionally Dayeh et al. reported SEPTIN9, alongside PTPRN2, as one of the top-ranked differentially methylated genes when comparing pancreatic islets from individuals with and without T2DM [70]. In this analysis, 20 dmCpGs were located in SEPTIN9, 17 of which displayed an increase in FC in individuals with T1DM-ESKD.
An additional three top-ranked genes which contained dmCpGs were common in two analyses; ARID5B and UPF3A, which were identified from Analyses 1 and 3, and CUX1 which resulted from Analyses 2 and 4 (FDR p ≤ × 10 -8 and FC ± 2; Additional file 2: Figure S22).
Each of the dmCpGs within ARID5B were located within the gene body and all were consistently increased in cases with ESKD (both chronic dialysis and transplant recipients) compared with controls with long duration of T1DM and no evidence of kidney disease. It has been previously demonstrated that ARID5B has a regulatory role in the phenotypic change of smooth muscle cells and SNPs within ARID5B have been linked to T2DM and coronary artery disease [71]. Differential methylation of a six-probe region spanning 99 base pairs within ARID5B gene has also been reported in Alzheimer's disease [72]. ARID5B is also known to contribute to cell growth, the differentiation of B-lymphocyte progenitors and has additionally been linked to acute lymphoblastic leukaemia [73][74][75].
UPF3A encodes a protein involved in mRNA nuclear export and mRNA surveillance, it has a crucial role in downregulating aberrant mRNAs [76]. Gotoh et al. in 2014 reported that this gene alongside 14 others had significantly reduced mRNA expression levels in renal cell carcinoma compared to non-cancerous kidney cortex tissue [77]. Top-ranked dmCpGs present within UPF3A consistently displayed an increase in FC within individuals with T1DM and ESKD.
CUX1 is a protein coding member of the homeodomain family of DNA binding proteins and is involved in cell cycle regulation and kidney development through the inhibition of p27 which promotes cell proliferation in the nephrogenic zone [20,78,79]. Previously, significant DNA methylation alterations have been reported in CKD where more than one dmCpG site was located within CUX1 compared to individuals with no evidence of kidney disease [20]. Additionally, genetic abnormalities within CUX1 have been linked to polycystic kidney disease [80] and myelodysplastic syndrome [81] in mouse models. Significant dmCpGs from Analyses 2 and 4 were located within the body of this gene which additionally showed an increase in FC in individuals with T1DM-ESKD, but this gene has not previously been explored in individuals post-kidney transplant.
In 2012, the GENIE consortium conducted a metaanalysis of GWAS in T1D-DKD, which revealed an intronic SNP (rs7583877) located in the AF4/FMR2 family member 3 (AFF3) gene as significantly associated with ESKD [14,82]. Functional studies have indicated that AFF3 influences kidney tubule fibrosis through the TGF-β1 pathway [14]. The findings from our study show increased methylation levels in the dmCpGs within the body of AFF3 in the individuals with T1D-ESKD, which could result in decreased gene expression. A link between DNA methylation of this gene and T1DM has previously been considered as a mediator of the genetic risk [83]. Each of the top-ranked dmCpGs identified from this analysis were present within the body of AFF3 with increased methylation in individuals with T1DM-ESKD.
ELMO1 encodes a member of the engulfment and cell motility protein family and has been previously linked to T2DM [84][85][86], hepatocellular carcinoma [87] and inflammatory arthritis [88]. Previous methylation analyses of this gene have shown associations with gastric cancer [89] and CKD [20]. The eight top-ranked dmCpGs within ELMO1 were present within various regions of the gene and each dmCpG site reported an increase in the FC in individuals with T1DM-ESKD. One hypermethylated dmCpG site, cg01119452, had been previously reported in 2014, where it also showed hypermethylation in its association with CKD [20].
PRKAG2 is an important regulator of cellular energy status and has previously been associated with eGFR [90]. SNP rs7805747 located in PRKAG2 has been reported in association with both CKD and serum creatinine at genome-wide significance level [91]. Differential methylation within this gene has also previously been reported in association with CKD [20]. The top-ranked dmCpGs were present within the 5′ UTR or body of PRKAG2 and each showed an increase in FC in individuals with T1DM-ESKD compared to those with no evidence of kidney disease.
Identified as an auto-antigen in diabetes, PTPRN2 has previously been linked to CKD [20,92], fasting plasma glucose and obesity [93]. PTPRN2 encodes islet antigen (IA)-2β and together with IA-2, these are integral membrane proteins of dense core vesicles which are expressed throughout the body in neuroendocrine cells [94]. PTPRN2 was reported by Dayeh at el. as second of the top-ranked differentially methylated genes in a comparison of pancreatic islets from individuals with and without T2DM [70]. Each of the dmCpGs present within PTPRN2 showed a consistent increase in FC in association with T1DM-ESKD.
The function of TAMM41 in higher vertebrates still remains largely undetermined, yet it is known to play a critical role in yeast cell mitochondrial membrane maintenance [95]. Using zebrafish models of human CVD, it was determined that the developing heart overexpressed tamm41 [95,96]. Furthermore, CRISPR/Cas9-mediated knockout of the tamm41 gene resulted in immature heart valve formation [95]. Differential methylation in TAMM41 has previously been reported in both DKD and ESKD [22].
Several of the top-ranked pathways and genes defined by dmCpGs have been previously linked to leukaemia. This is not unexpected as an elevated risk of leukaemia during dialysis and after transplant failure has previously been reported in an Australian and New Zealand population [97,98]. Although rare, Alfano et al. suggested that leukaemia can occur during the post-transplant period [99].
Five of the genes in which top-ranked gene-centric dmCpGs were located from this analysis have been previously identified as strong biological candidates for chronic kidney disease in a previous investigation published by Smyth et al. [20]. The CUX1, ELMO1, FKBP5, PRKAG2 and PTPRN2 genes included CpGs with statistically significant differential methylation levels between cases and controls in both studies. Furthermore, four dmCpGs, two within ELMO1 cg08044454 and cg01119452, and two within FKBP5, cg03546163 and cg14284211 show evidence of differential methylation between cases and controls in both investigations. Different Infinium BeadChip arrays were used for the analyses. The Infinium 450K array utilised in 2014 does not cover all CpGs sites which are included in the analysis using the Infinium MethylationEPIC BeadChip.
T1DM is a polygenic disease with several candidate genes previously reported [100]. Of these candidate genes, 23 are found to include significantly dmCpGs in at least one of the four analyses. BACH2, a gene previously linked to immunodeficiency, is a candidate gene for T1DM [101,102] which showed increased methylation levels at dmCpGs resulting from Analyses 1 and 2.
SH2B3 encodes a member of the SH2B adaptor family of proteins, involved in a range of signalling activities and is involved in haematopoiesis [110,111]. Mutations within this gene have been shown to increase susceptibility for T1DM [104,112]. DmCpGs within this gene were identified within each of the four analyses undertaken, with each dmCpG site showing an increased level of methylation in cases compared to controls. STRING v11: protein-protein association networks were searched for all genes in which top-ranked dmCpGs from each analysis (FDR p ≤ × 10 -8 and FC ± 2) were located. Those with an increased FC were searched separately to those with a decreased FC when cases and controls were compared. A strong interaction network was identified between LCK and three proteins including MAPK8IP3, SLA2 and PRKCE, each gaining a score of ≥ 0.9 when the genes with a decreased FC from Analysis 1 were searched. Scores ≥ 0.95 were obtained for 20 interactions resulting from the genes with an increased FC from Analysis 1 including UPF1-UPF3A, ITGAL-ITGB2 and LCK-CD247. ITGAL and UPF3A were selected as two of the most biologically plausible genes linked to T1DM-ESKD, as discussed previously. CD247 [113] and LCK [114] have both been previously linked to diabetes and which could warrant further investigation. Furthermore, two interactions scored 0.999 from both Analyses 2 and 4: ITGAL-ITGB2 and BRE-UIMC1. Neither BRE nor UIMC1 have previously been linked to T1DM or ESKD.
In the absence of a complementary replication cohort with available data, further support for top-ranked genes was sought from eQTL analysis from an American Indian cohort where the individuals had T2DM and known renal status [34]. eQTL analysis supports associations for eight of the top-ranked genes: CUX1, ELMO1, HDAC4, PIM1, PRKAG2, PTPRN2, RUNX3, SEPTIN9 (p ≤ 10 -5 , Additional file 1: Table S33). Functional support was also generated for these top-ranked markers in kidney tissue (glomerular tissue and tubule tissue from two cohorts [33,34]) alongside supporting data from additional blood-derived DNA from individuals with kidney function assessed using the 450K array (unpublished data) [33]. Unfortunately, the 450K array for DNA methylation includes < 25% of the top-ranked dmCpGs identified by our study which used the more comprehensive EPIC array. It is not unexpected that the dmCpGs did not correlate across all top-ranked results due to the differences in material and technologies employed.
Through this analysis, we have demonstrated that similar association results are reported for individuals who have received a kidney transplant or persons who are receiving dialysis. This shows that transplant recipients can be analysed alongside individuals receiving dialysis to increase the power of future EWAS for ESKD. The results, genes, p values and FC directions are consistent across each of the four analyses. RPTOR is the only topranked gene linked to immunosuppressant medication for transplants [115].
We have reported associations with several biologically plausible genes. Employing a considered approach, including four analyses (two of which were matched and two were unmatched), we show broad overlap in results from the analyses. This indicates that for future largescale studies, it is not essential to stratify this analysis by age and sex matching the participants, which will facilitate a larger, more powerful unmatched case-control design.

Strengths and limitations
Overall, this study has several strengths. We have carefully defined phenotypes for each analysis, ensuring individuals were well matched for paired Analyses 1 and 3, with a larger cohort of individuals employed for unmatched Analyses 2 and 4. Analyses 3 and 4 represented an extension of the initial analyses to evaluate the differences between individuals with T1DM and DKD compared to those with a more extreme phenotype who had progressed to T1DM-ESKD and had received a kidney transplant.
We have extended previous investigations through assessing T1DM-ESKD using the most cost-effective high-density methylation array available, the Infinium MethylationEPIC [116]. Methylation signatures were assessed using peripheral blood samples, while considering estimates of proportional WCCs and cell heterogeneity. We also assessed both the beta and M values which were derived from the methylationEPIC arrays. Furthermore, only top-ranked dmCpGs that met a significance level of FDR p ≤ × 10 -8 were reported, a threshold previously reported to reduce the rate of false-positives in studies which use the Infinium MethylationEPIC array [117].
This study is limited in not having a complementary replication cohort; however, we took the approach that a larger discovery cohort using the most comprehensive array commercially available was the optimal study design given similar kidney transplant cohorts do not have EPIC array data available for replication. This is similar to the approach taken for GWAS, in that the largest possible discovery cohort is selected whilst supporting data are sought for top-ranked results. While we have provided details of all dmCpGs with FDR p ≤ × 10 -8 and FC ± 2 in supplementary material, this manuscript focuses mainly on genes that have previously been linked to similar phenotypes in the literature. A largerscale multi-omic analysis incorporating genetic variation, epigenetic alterations and gene expression would be required to further determine the markers of interest for this phenotype and improve understanding of the biological mechanisms involved. Additional investigations could be undertaken to assess this phenotype in different ethnicities, sex-specific effects and analyses could be conducted to compare these results with those derived from kidney biopsy samples.

Conclusion
Epigenetic alterations, unlike genetic changes, are potentially reversible, offering opportunistic therapeutic interventions. Through this exploratory investigation, we have reported associations between dmCpGs and genes with T1DM-ESKD, several of which suggest complementary genetic and epigenetic influences to alter gene expression. Eight top-ranked genes also showed eQTL support in a T2DM American Indian cohort and 13 were supported by gene expression and / or methylation data from kidney tubule or glomerular tissues. Additional prospective studies may help identify whether the underlying methylation influences are causal or consequential.
The identification of unique epigenetic profiles associated with developing ESKD could highlight additional biological mechanisms to study kidney disease. Epigenetic profiles may also help to identify patients at greater risk for progression to ESKD. Targeting healthcare resources to individuals at highest risk for ESKD remains an important clinical goal.

Samples
Each participant was recruited as part of the All Ireland-Warren 3-Genetics of Kidneys in Diabetes (GoKinD) United Kingdom Collection. All participants were previously recruited, had White ancestry and provided written informed consent for research. DNA was frozen in multiple aliquots following extraction from whole blood using the salting out method [118] and normalised using PicoGreen quantitation [119] using the CytoFluor ® Series 4000 (Applied Biosystems, Thermo Fisher Scientific, CA, USA).
Individuals with both T1DM and ESKD were defined as cases (n = 107). These individuals had ≥ 10 years duration of T1DM alongside a diagnosis of DKD defined as persistent macroalbuminuria (≥ 500 mg/24 hr), estimated glomerular filtration rate (eGFR) < 60 mL/min/m 2 calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) creatinine equation, hypertension (systolic/diastolic blood pressure ≥ 135/85 mmHg) and ESKD. They were not taking any anti-hypertensive medication. The control individuals had ≥ 15 years duration of T1DM and no evidence of kidney disease on repeat testing, i.e. they all had normal urinary albumin excretion and eGFR > 60 mL/min/m 2 (n = 253).

MethylationEPIC array
Blood-derived DNA from each participant was accurately quantitated using PicoGreen ® prior to normalisation. In total, 800 ng of DNA from each participant was bisulphite treated using the EZ Zymo Methylation Kit (D5002, Zymo Research, CA, USA) following the alternative overnight incubation conditions for use prior to the Illumina ® Infinium MethylationEPIC Kit provided in the published protocol. All samples were prepared and analysed using the Infinium MethylationEPIC Kit and BeadChips (Illumina, CA, USA) with no protocol deviations. All samples were processed in a consistent laboratory workstream by the same members of trained staff and methylation arrays were scanned using a dedicated iScan machine with regular monitoring of laser intensity levels. Case and control samples were randomly distributed across the BeadChip arrays. This array is a high-throughput platform which provides quantitative evaluation of methylation levels (β values) with single nucleotide resolution. In total, 862,927 sites were examined by the Infinium Methyla-tionEPIC array. The Infinium HD Methylation SNP List was searched for any SNPs that may have impacted upon the methylation array results [31]. Top-ranked dmCpGs located within biologically plausible genes were further examined using the web-based Biobanking and Biomolecular Resources Research Infrastructure, Netherlands (BBMRI-NL) database tool to identify potential SNP-CpGs [120].

Quality control
Each resulting.idat file generated from the iScan was assessed using BeadArray Controls Reporter (BACR) Software (Illumina) for quality control (QC). This software assessed the data in connection with a pre-set standard set of controls and evaluated the hybridisation, extension, dye specificity and bisulphite conversion process. An additional QC measure to determine the concordance of average beta values generated for seven duplicate samples was completed using GenomeStudio (Illumina) v1.8, methylation module including a sex check of all included individuals.
Proportional white cell counts (WCCs) were estimated following the Houseman method [29] using the raw.idat files output from the iScan machine. The minfi Bioconductor (v3.10) package was utilised to estimate six WCCs, CD8 + T, CD4 + T and CD19 + B lymphocytes, CD56 + natural killer cells, CD14 + monocytes and CD15 + granulocytes using the estimateCellCounts function for both the case and control groups, and a Northern Irish general population control group [121].

Differentially methylated loci analysis
Case and control groups were investigated for dmCpGs using Partek Genomics Suite (PGS) v7. 19.1125 (Partek, MO, USA) following functional normalisation. All software was used following the developer's instructions. Beta values were generated before M values were calculated. DmCpGs were determined using the M values for individuals with T1DM and ESKD compared to controls with long duration of T1DM and no evidence of kidney disease on repeat testing. Parameters were set at false discovery rate (FDR) adjusted p value threshold of ≤ × 10 -8 alongside a fold change (FC) ≥ ± 2, both calculated using PGS. Sex chromosomes were removed during the analysis (846,232 CpG sites were examined) alongside any probes with a poor detection p value (p > 0.01). Related genes for each of the CpG sites were annotated based on Homo sapiens hg19 genome build using PGS and the Infinium MethylationEPIC v1.0 B4 Manifest File. Four analyses were performed (Table 1), each assessing differential methylation patterns in individuals with T1DM-ESKD to those with T1DM and no evidence of renal disease (Fig. 1) 3. 73 matched pairs for age at sample collection (differing by < 5 years), sex and duration of diabetes (differing by ≤ 10 years). The case subjects were restricted to individuals with a functioning kidney transplant to minimise potential confounding due to differences in medication and renal replacement therapy modalities. These individuals were compared to matched controls with T1DM and no evidence of kidney disease. 4. The same 73 individuals from Analysis 3 were compared to a larger sample size of (unmatched) control individuals with T1DM and no evidence of kidney disease (n = 253).
Resulting dmCpGs which overlapped each of the four analyses were extracted and displayed in a heatmap which highlighted their degree of hyper-and hypo-methylation. Manhattan plots and QQ plots were drawn using the qqman R package [122].

Functional gene ontology and network analyses
Gene functionality was examined by gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis using PGS and the Reactome pathway database [123]. Solely genes in which top-ranked dmCpGs were located (FDR p ≤ × 10 -8 and FC ± 2) were examined. Enrichment score thresholds were set at an enrichment score ≥ 4 and p < 0.01 for GO, an enrichment score of ≥ 2, and p ≤ 0.01 for KEGG pathway analysis and p < 0.01 for the Reactome pathway database. An additional functional analysis was undertaken using STRING v11: protein-protein association networks [32]. For each of the four analyses, genes with top-ranked dmCpGs were categorised according to increased or decreased FC.

Tertiary analysis of dmCpGs and potential overlaps with transcription factors (TFs)
Top-ranked dmCpGs (FDR p ≤ × 10 -8 and FC ± 2) from biologically plausible genes (Table 2) were examined using eFORGE-TF, a software suite that assesses CpG sites for TF motif enrichment [124]. The CpGs were searched against previously acquired kidney data available in the online database. All significant motifs (q < 0.05) have been reported.

Kidney gene expression data
Participants for this study were American Indians with type-2 diabetes [125]. Protocol kidney biopsy samples were obtained from this cohort (n = 97). The study was approved by the Institutional Review Board of the National Institute of Diabetes and Digestive and Kidney Diseases and each participant signed an informed consent document. Expression profiling of the kidney biopsies was previously carried out using Affymetrix GeneChips (HumanGenome U133 Array and U133Plus2 Array and Affymetrix Human Gene ST GeneChip 2.1) as reported [126,127], and RNA-seq (Illumina) [13]. Expression quantitative trait (eQTL) mapping was performed using an EPACTS (https:// genome. sph. umich. edu/ Fig. 1 Illustration of groups samples were assigned to in order to complete the analysis wiki/ EPACTS) software tool using linear mixed model accounting for hidden familial relatedness, after inverse Gaussian transformation of expression levels, adjusting for age and sex. Significance threshold was set at p ≤ 10 -5 for eQTL and p <= 0.05 for association studies.

Analysis of top-ranked genes using the 450K array
In the absence of a complementary replication cohort, which is unavailable for a population with ESKD and T1DM, we sought Infinium 450K data from collaborators. In order to further assess the top-ranked methylation markers from the discovery cohorts, the top-ranked genes showing alterations in DNA methylation were assessed in 327 peripheral blood samples obtained from Pima Indians [24]. Blood samples for DNA isolation and cytosine methylation analysis were collected at the baseline examination from subjects. EGFR measures were available from clinical records before and after the nephrectomy. Among these subjects, 318 had longitudinal eGFR measurements with mean follow-up time of 9.8 years with a standard deviation (SD) of 3.9 years. The best linear unbiased predictor (BLUP) model was used to calculate eGFR slope for each subject.
Human kidney samples (n = 91) were obtained from surgical full/partial nephrectomies. Kidney samples were ~ 0.5 cm in diameter and were surrounded by at least 2 cm of normal tissue margins. Clinical data were obtained at the time of the sample collection and for a subset of samples eGFR measures were available from clinical records before and after the nephrectomy (n = 69). Only subjects with longitudinal eGFR measurements for at least 3 months post-nephrectomy were included for the analysis. The mean timespan of followup was 2.4 years (SD = 1.5 years).
DNA methylation was detected using Illumina 450K arrays and values were extracted using the minfi package. Several quality control measures were carried out and after the removal of CpG probes that (1) were in the proximity of regions with genetic variations, (2) were located on the sex chromosomes, (3) were known to cross-hybridise to other locations, (4) had poor detection p values (p > 0.01) or (5) were control probes, 321,473 CpG probes remained in the blood cohort used for 'lookups' cohort. The same pre-processing was performed for this kidney cohort which resulted in 355,141 CpG probes remaining.
In blood-derived DNA cohort, linear regression models were used to determine the association between DNA methylation, eGFR and eGFR slope, with covariates including sex, age, duration, mean arterial blood pressure, HbA1c, batch, converse and estimated cell fraction. Association between cytosine methylation level, eGFR and interstitial fibrosis was determined by linear regression models adjusted by age, sex, race, diabetes, hypertension, batch effect, bisulphite conversion efficiency and degree of lymphocytic infiltrate on histology. A p value of < 0.05 was used to determine significance in this population.