An EPIC predictor of gestational age and its application to newborns conceived by assisted reproductive technologies

Gestational age is a useful proxy for assessing developmental maturity, but correct estimation of gestational age is difficult using clinical measures. DNA methylation at birth has proven to be an accurate predictor of gestational age. Previous predictors of epigenetic gestational age were based on DNA methylation data from the Illumina HumanMethylation 27 K or 450 K array, which have subsequently been replaced by the Illumina MethylationEPIC 850 K array (EPIC). Our aims here were to build an epigenetic gestational age clock specific for the EPIC array and to evaluate its precision and accuracy using the embryo transfer date of newborns from the largest EPIC-derived dataset to date on assisted reproductive technologies (ART). We built an epigenetic gestational age clock using Lasso regression trained on 755 randomly selected non-ART newborns from the Norwegian Study of Assisted Reproductive Technologies (START), a substudy of the Norwegian Mother, Father, and Child Cohort Study (MoBa). For the ART-conceived newborns, the START dataset had detailed information on the embryo transfer date and the specific ART procedure used for conception. The predicted gestational age was compared to clinically estimated gestational age in 200 non-ART and 838 ART newborns using MM-type robust regression. The performance of the clock was compared to previously published gestational age clocks in an independent replication sample of 148 newborns from the Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) study, a prospective pregnancy cohort of Finnish women. Our new epigenetic gestational age clock showed higher precision and accuracy in predicting gestational age than previous gestational age clocks (R² = 0.724, median absolute deviation (MAD) = 3.14 days). Restricting the analysis to CpGs shared between 450 K and EPIC did not reduce the precision of the clock.
Furthermore, validating the clock on ART newborns with known embryo transfer date confirmed that DNA methylation is an accurate predictor of gestational age (R² = 0.767, MAD = 3.7 days). We present the first EPIC-based predictor of gestational age and demonstrate its robustness and precision in ART and non-ART newborns. As more datasets are being generated on the EPIC platform, this clock will be valuable in studies using gestational age to assess neonatal development.


Background
Accurate determination of gestational age is important for assessing fetal development and maturity. This is necessary for investigating the impact of prenatal factors on pregnancy outcomes and any deviation from normal fetal development [1,2]. Although gestational age at birth exhibits some normal variation, both preterm and post-term births are associated with an increased risk of adverse perinatal outcomes and health outcomes later in life [3][4][5][6][7]. The effects of gestational age at birth on health outcomes may be linked to epigenetic patterns established in utero or early in the postnatal period [8,9]. Changes in these patterns may interfere with critical developmental processes [10][11][12] and trigger phenotypic changes that persist throughout life. This may be even more pertinent to children conceived by assisted reproductive technologies (ART), because ART procedures coincide with the extensive epigenetic reprogramming in the early embryo [13,14].
DNA methylation (DNAm) is the most studied epigenetic mark in humans. It has, in recent years, been used to build gestational age clocks that can predict gestational age [15][16][17][18]. Earlier clocks were built using DNAm data from the Illumina HumanMethylation27 (27 K) or the Illumina HumanMethylation450 (450 K) BeadChip arrays, both of which have subsequently been replaced by the Illumina MethylationEPIC BeadChip (EPIC). EPIC has nearly twice (865,859 CpGs) as many CpGs as 450 K, and a stronger focus on regulatory elements [19]. Although EPIC includes over 90% of the probes on 450 K [19], six to eight of the CpGs included in existing gestational age clocks are not present on EPIC. This discrepancy may affect the precision of the published clocks in predicting gestational age when applied to DNAm data generated on EPIC [20]. Therefore, it is essential to develop a new gestational age clock that is updated and optimized for EPIC. Equally important is to elucidate whether the additional CpGs on EPIC enhance gestational age prediction.
A challenge in developing accurate gestational age clocks is the lack of information on the exact gestational age of the newborns. The standard approaches for estimating gestational age, based on ultrasound measurements or the last menstrual period (LMP), have thus far been used for training and testing epigenetic clocks. Ultrasound and LMP are widely used in clinical settings and have their individual advantages and limitations. While LMP can be informative, it suffers from large variability, in part due to varying length of the follicular phase. Ultrasound is much more precise but still depends on the size of the fetus at the time of ultrasound [1,21,22]. On the other hand, for children conceived by ART, the exact time when the embryo is transferred back to the uterus is known. Although there may be some differences in the days before fertilization and embryo transfer, and the developmental speed may differ in the in vitro setting, the embryo transfer date (ETD) provides a more direct estimate of gestational age [23]. Therefore, DNAm data from ART births is particularly advantageous for developing and validating gestational age clocks. To our knowledge, no gestational age clock has yet been developed using ETD, although its use has been called for previously [16].
In addition to gestational age prediction, gestational age clocks can be used to estimate gestational age acceleration (GAA), which is defined as the discrepancy between gestational age predicted from DNAm data and gestational age derived from clinical measurements [16,24]. Investigating GAA is important because of its reported association with several measures related to birth outcomes, such as the cerebroplacental ratio (a robust indicator of prenatal stress [25]), higher maternal body mass index, and larger birth size [26]. Although children conceived by ART have a higher risk of spontaneous preterm birth [27] and other adverse perinatal outcomes [28][29][30], only one small study has explored GAA in ART children [31].
To address these knowledge gaps, we developed a new gestational age clock based on EPIC-derived DNAm data from newborns in the Norwegian Study of Assisted Reproductive Technologies (START), which is a substudy within the Norwegian Mother, Father and Child Cohort Study (MoBa) [32]. We validated this clock in test sets of ART and non-ART newborns in START, and also in an external dataset from the Finnish Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) study [33], which was used as a replication cohort. We also used the new EPIC-based clock to explore differences in GAA between ART and non-ART newborns.

Results
The EPIC gestational age clock
We validated the resulting predictor, referred to as "EPIC GA clock" hereafter, in a test set of 200 non-ART newborns from START. The EPIC GA clock showed an R² of 0.713 and a median absolute deviation (MAD) of 3.59 days (Fig. 2, Table 2).

Comparison with previously published gestational age clocks in an external replication cohort (PREDO)
Using an external dataset of EPIC-derived DNAm data on 148 non-ART newborns from the PREDO study [33], we compared the performance of our EPIC GA clock with two published epigenetic gestational age clocks that were built on DNAm data from the previous methylation arrays: the Bohlin clock [15], based on 450 K, and the Knight clock [16], based on 27 K and [...]

Table 3. Bootstrapped differences in R², SE, and MAD between different clocks and GA estimation methods (*see Table 1).

[...] CpGs in the 450 K/EPIC overlap clock were also present in the EPIC GA clock.

Using the embryo transfer date (ETD) to predict gestational age
A great advantage of the ART dataset is that the ETD is known for the ART-conceived children. We thus developed a gestational age clock using the ETD of ART-conceived children to investigate whether it was possible to achieve a better predictor of gestational age. Six hundred and seventy-four ART newborns from START (Table 1, Fig. 1) were used to train the ETD-based clock. Additional file 1: Figure S1 shows the performance of the ETD-based clock for ultrasound- and ETD-estimated gestational age in the START ART training and test set, respectively. When compared to the EPIC GA clock in the non-ART test set from START, the ETD-based clock showed a similar performance, with an R² difference of 0.048 (95% CI: −0.041, 0.123) and a difference in MAD of 0.645 (95% CI: −0.181, 1.209) (Fig. 4; Table 3). The ETD-based GA clock contained 155 CpGs, and only 19 of them were in common with those of the EPIC GA clock.

Application of the EPIC GA clock to ART children
To assess the performance of the EPIC GA clock in ART children, we applied the EPIC GA clock to the cord-blood DNAm data of 838 newborns conceived by ART (Table 1, Fig. 1). We compared predicted gestational age to gestational age estimated by ultrasound measurements and by ETD, respectively (Fig. 5). Gestational age estimated by ultrasound measurement and ETD was predicted with similar precision (R² difference of 0.015 (95% CI: −0.003, 0.033); Fig. 5, Table 3) and accuracy (MAD difference of −0.102 (95% CI: −0.465, 0.174)).

Gestational age acceleration in ART children
To assess whether GAA is associated with ART, we first regressed gestational age predicted by the EPIC GA clock on gestational age estimated by ultrasound in 200 non-ART and 838 ART newborns from START. GAA was calculated using the residuals from this regression. Next, we analyzed the relationship between GAA and ART by performing a logistic regression of ART on GAA. We found no significant difference in GAA between the ART (n = 838) and non-ART (n = 200) newborns (p = 0.388, Fig. 6). Aside from ETD, another major advantage of the START dataset is that the specific ART procedure used for conception was known, i.e., whether in vitro fertilization (IVF) was used alone or together with intracytoplasmic injection of sperm (ICSI), and whether the embryo was transferred fresh or after being frozen. We found no significant difference in GAA between newborns conceived by IVF alone (n = 470) and those conceived by IVF in combination with ICSI (n = 338) (p = 0.976, Additional file 2: Figure S2). Furthermore, there was no significant difference between fresh (n = 693) and frozen (n = 115) embryo transfer (p = 0.274, Additional file 3: Figure S3).

Gene-enrichment analysis
To explore the biological significance of the 176 CpGs selected in our EPIC GA clock, we performed gene-enrichment analyses. WebGestalt [34] was used to perform the gene-enrichment analyses of the 154 genes annotated to these CpGs [35]. WebGestalt identified 78 categories as being significantly enriched at a false discovery rate (FDR) < 0.01. The category with the highest enrichment ratio was "regulation of platelet-derived growth factor receptor signaling pathway," containing LRP1, HIP1R, HGS, and SRC (enrichment ratio = 37; FDR = 0.003). Several of the significant hits were related to abnormal morphology of the eye, ear, nose, and other developmental categories, e.g., "plasma membrane-bounded cell projection organization" and "negative regulation of cellular biosynthetic process." The complete output of the WebGestalt analyses is provided in Additional file 5.

Discussion
We present the first EPIC-based predictor of gestational age and demonstrate its robustness and precision in ART versus non-ART newborns. This study benefited greatly from having the largest ART dataset to date, with detailed information on ETD and the specific procedure used for conception. Our EPIC GA clock, trained on the START dataset, outperformed previous cord blood-based gestational age clocks when compared in an independent Finnish test set (PREDO).
Previous DNAm-based clocks were developed using the now outdated 27 K and 450 K. EPIC has almost twice as many CpGs as 450 K, and while 27 K and 450 K mostly cover areas around genes and CpG-islands, some of the additional probes on EPIC target distal regulatory elements and intergenic regions [36]. We, therefore, hypothesized that the additional CpGs unique to EPIC might have enhanced the performance of the EPIC GA clock. However, when we developed a separate clock featuring only those probes that are shared between 450 K and EPIC, we observed a similar performance to the EPIC GA clock, indicating that the additional CpGs on EPIC did not significantly enhance the prediction of gestational age. This observation is consistent with recent findings on age prediction by Lee et al. [37]. Another plausible explanation for the superior performance of our EPIC GA clock might be related to the fact that eight CpGs in the Bohlin clock and six CpGs in the Knight clock are absent from the EPIC array. This discrepancy might have reduced the prediction accuracy of the earlier clocks when applied to EPIC data.
A substantial advantage of the START dataset is its large sample size combined with detailed information on ETD for the ART-conceived newborns and the specific ART procedures used for conception. Using ETD provides a more direct estimate of gestational age than estimates based on ultrasound measurement or LMP [23]. We thus checked whether a clock trained on gestational age estimated by ETD would lead to a further improvement in gestational age prediction. The results showed that the two clocks had similar performance, despite the low overlap in CpGs and genes. This suggests that using ETD-based gestational age estimates for training does not significantly enhance prediction compared to clocks trained on ultrasound-based estimates, further highlighting the precision of the EPIC GA clock.
A higher risk of spontaneous preterm birth and other adverse perinatal outcomes has been reported among ART-conceived children [28][29][30]. Given that the timing of ART procedures coincides with the extensive epigenetic remodeling in the gametes and early embryo, and, further that epigenetic alterations have been reported in ART embryos and children [38][39][40], we investigated whether the epigenetic gestational age of ART newborns differed significantly from that of non-ART newborns. When we applied the EPIC GA clock to ART newborns, the precision of the gestational age prediction remained similar to that of the non-ART newborns, indicating that the clock is also well suited for predicting gestational age in ART newborns. Furthermore, the EPIC GA clock predicted both ETD-based and ultrasound-based gestational age equally well, again underscoring the precision of the clock. Finally, we found no significant differences in GAA between ART and non-ART newborns.
ART is a collective term used to describe different procedures and categories that may have different impacts on fetal DNAm. It is therefore particularly important to investigate whether gestational age prediction differs according to the specific ART procedure used. For instance, embryos may be transferred to the uterus when they are fresh or after being frozen, and IVF may or may not involve ICSI. A previous study [31] examining GAA in ICSI newborns compared to non-ART newborns did not find any significant difference between the two groups. However, the authors detected a significant decrease in DNAm-predicted gestational age at birth among the ICSI newborns. To verify these findings in our dataset, we conducted another set of analyses to explore differences between IVF, ICSI, and non-ART newborns, as well as between fresh, frozen, and non-ART-conceived newborns. We found no significant differences in DNAm-predicted GA or GAA between any of the groups (Additional file 2: Figure S2 and Additional file 3: Figure S3), further strengthening the hypothesis that GAA is not associated with ART.
Although DNAm is strongly associated with gestational age, the mechanisms underlying this association are not well understood. A closer inspection of the specific CpGs selected for gestational age prediction and the overlap between different clocks may provide some answers. Of the 176 CpGs selected by the EPIC GA clock, only 11 were in common with the CpGs in the Bohlin clock, and none overlapped with the CpGs in the Knight clock. This could partly be explained by the 89 EPIC-specific CpGs. The lack of overlap in CpGs across different clocks has also been observed in age prediction models [41]. Our analyses showed little overlap between the EPIC GA clock and the ETD-based clock, even though both were trained on EPIC data. As Lasso regression and elastic net regression may select CpGs that are not associated with the outcome per se [42], dataset-specific CpGs could end up being included in the model. Furthermore, Lasso selects one CpG for each group of correlated (or neighboring) CpGs, whereas elastic net regression selects several CpGs, leading to a so-called "grouping effect" [43], which could lead to less overlap in CpGs between prediction models.
Unraveling the biological mechanisms underlying the gestational age clocks requires identifying the genes associated with the clock-specific CpGs and examining how they are related to gestational age. Our results revealed several genes in common across the different clocks. For example, 13 genes were shared between the EPIC GA clock and the Bohlin clock, while 15 genes were shared between the EPIC GA clock and the ETD-based clock. Some of the CpGs and genes in the EPIC GA clock appear to be stably associated with gestational age. For example, CpGs linked to Nuclear Receptor Corepressor 2 (NCOR2) and Insulin-Like Growth Factor 2 mRNA-binding protein 1 (IGF2BP1) were selected in both the EPIC GA clock and the Bohlin clock, and both of these genes have previously been identified in other studies of gestational age [44][45][46][47]. NCOR2 is involved in vitamin A metabolism and lung function [48], and IGF2BP1 plays an important role in embryogenesis and carcinogenesis [49]. The EPIC GA clock also identified CpGs related to Corticotropin-Releasing Factor-Binding Protein (CRHBP), consistent with previous studies of gestational age [8,50]. CRHBP levels rise throughout pregnancy but drop markedly when approaching term [51]. Furthermore, Mastorakos and Ilias [52] showed that CRHBP might prevent aberrant pituitary-adrenal stimulation in pregnancy. In addition to the genes mentioned here, several other genes linked to the CpGs in our clock have previously been implicated in gestational age, including Muscleblind Like Splicing Regulator 1 (MBNL1), CD82 molecule (CD82), Integrin Subunit Beta 2 (ITGB2), and Rap Guanine Nucleotide Exchange Factor 3 (RAPGEF3) [47,50]. Additional studies are needed to elucidate their roles in gestational age.
For a clock to be useful, it needs to be generalizable to other cohorts and populations. As with the Bohlin clock, our EPIC GA clock was trained on data from a relatively homogeneous cohort in terms of ethnicity, socioeconomic status, and age [32,53]. Our clock performed equally well in the independent Finnish PREDO cohort. However, while the use of a homogeneous training set may enhance the prediction model [42,54], it can also result in a cohort-specific clock that is less generalizable to other populations.
Exploring associations between specific neonatal outcomes and DNAm-based gestational age is still in its nascent stages [26,55], and there are many unanswered questions regarding neonatal development. The development of an EPIC-specific gestational age clock may offer additional insights into gestational age and neonatal development. As the 450 K array has been discontinued, we anticipate that future research on DNAm-based GA clocks will migrate to the more updated EPIC array. Research on GA-related topics and DNAm utilizing the 450 K array is expected to continue for some time, as many 450 K-based datasets are still in circulation and some are being used in consortia-led efforts. The clocks presented here may facilitate further research on DNAm-based clocks for both the 450 K and EPIC arrays.

Conclusions
The new EPIC GA clock presented here predicted gestational age precisely in both ART and non-ART newborns and outperformed previous cord blood-based gestational age clocks when validated in an independent test set. The increased performance was not due to the higher coverage of CpGs on the EPIC array. Furthermore, the use of ETD-estimated gestational age for training did not improve the precision of gestational age prediction significantly compared with clocks trained on ultrasound-estimated gestational age. This is reassuring, as most datasets on newborns only have ultrasound- or LMP-based measures of gestational age. Finally, we did not find any significant association between GAA and ART. With a growing number of epigenetic datasets currently being generated on the EPIC platform, we expect our EPIC GA clock to become increasingly valuable in assessing developmental maturity in studies of neonatal development and disease.

Study population
MoBa is an ongoing, population-based pregnancy cohort study conducted by the Norwegian Institute of Public Health (NIPH). In total, 114,500 children, 95,200 mothers, and 75,200 fathers were recruited from all over Norway from 1999 through 2008 [32]. The MoBa mothers consented to participation in 41% of the pregnancies. Extensive details on the MoBa cohort have been provided elsewhere [32,56]. START is a substudy of MoBa and consists of 1,995 newborns and their parents. Blood samples from the newborns were obtained from the umbilical cord at birth [56].
PREDO is a prospective pregnancy cohort of Finnish women who gave birth to a singleton live child between 2006 and 2010 [33]. The cohort comprises 1079 pregnant women; 969 of these had one or more known risk factors for preeclampsia and intrauterine growth restriction, whereas the rest had no such risk factors. The women were enrolled in the study when they arrived for their first ultrasound screening at 12-14 gestational weeks in 10 study hospitals in Southern and Eastern Finland. Blood samples were obtained from the cord blood of 998 newborns [57]. To validate the gestational age clocks, we used cord blood-based DNAm data from 148 newborns (Fig. 1).

DNAm profiling and quality control
Cord blood samples taken by a midwife immediately after birth were frozen [56]. Five hundred nanograms of DNA extracted from the cord blood of START newborns were shipped to LIFE & BRAIN GmbH in Bonn, Germany, for measurement of DNAm on the Illumina MethylationEPIC array (Illumina, San Diego, USA). The raw iDAT files were imported and processed in four batches using the R-package RnBeads [58]. A total of 44,210 cross-hybridizing probes [59] and approximately 10,000 probes with a high detection p-value (above 0.01) were removed. In addition, 16,117 probes with the last three bases overlapping with a single-nucleotide polymorphism (SNP) were excluded. The remaining DNAm signal was processed using BMIQ [60] to normalize the type I and type II probe chemistries. Control probes output from RnBeads were visually inspected for all samples, and those with low overall signals were removed. The Greedycut option [58] was used to remove outliers with markedly different DNAm signals than the rest of the samples. This resulted in the removal of 58 samples in total. For consistency, CpG sites removed from one batch, due to poor quality and detection p-value, were also removed from subsequent batches. After quality control, 770,586 autosomal CpGs and 1945 samples remained in the final dataset. The 1793 subjects for whom we had information on ultrasound-based gestational age were used to develop and validate the gestational age clocks in this study.
For the PREDO samples, DNA was extracted according to standard procedures. Methylation analyses were performed at the Max Planck Institute of Psychiatry in Munich, Germany. DNA samples were bisulfite-converted using the EZ-96 DNA Methylation kit (Zymo Research, Irvine, CA) and assayed on the Illumina Infinium MethylationEPIC array (Illumina, San Diego, USA). Three samples were excluded for being outliers based on their median intensity values. Another three samples showing discordant phenotypic and estimated sex were excluded. A further three samples were contaminated with maternal DNA and were also removed [61]. Methylation beta-values were normalized using the funnorm function [62] in the R-package minfi [63]. Three samples showed density artifacts after normalization and were removed from further analysis. We excluded probes on the sex chromosomes, probes containing SNPs, and cross-hybridizing probes according to previously published criteria [59,64,65]. Furthermore, CpGs with a detection p-value > 0.01 in at least 25% of the samples were also excluded. Finally, one duplicate sample was removed after quality control. The final dataset contained 812,987 CpGs and 148 samples. After normalization, no significant batch effects were identified.
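The probe-level detection filter described for PREDO (exclude a CpG when its detection p-value exceeds 0.01 in at least 25% of samples) can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the probe identifiers below are hypothetical and chosen only for illustration; the study itself used R-based pipelines (RnBeads, minfi).

```python
def probes_failing_detection(detection_p, p_cutoff=0.01, min_fail_fraction=0.25):
    """Return probe IDs whose detection p-value exceeds p_cutoff
    in at least min_fail_fraction of the samples.

    detection_p maps a probe ID to a list of per-sample detection p-values
    (hypothetical layout, not the format produced by any specific package).
    """
    flagged = []
    for probe, pvals in detection_p.items():
        n_fail = sum(p > p_cutoff for p in pvals)
        if n_fail / len(pvals) >= min_fail_fraction:
            flagged.append(probe)
    return flagged

# Made-up detection p-values for three hypothetical probes in four samples.
toy = {
    "cg_a": [0.001, 0.002, 0.5, 0.0001],   # fails in 1 of 4 samples (25%)
    "cg_b": [0.001, 0.001, 0.001, 0.001],  # never fails
    "cg_c": [0.02, 0.03, 0.001, 0.4],      # fails in 3 of 4 samples
}
flagged = probes_failing_detection(toy)
print(flagged)
```

Because the threshold is "at least 25%", a probe failing in exactly one of four samples is also flagged.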

Variables
For the START dataset, information on gestational age, sex, and ART status was extracted from the Medical Birth Registry of Norway (MBRN). Gestational age at birth was estimated by ultrasound measurements in week 18 of pregnancy. For the ART children, we used the date of egg retrieval plus 14 days to obtain a second estimate of gestational age. When the date of egg retrieval was not known, the date of embryo insertion was used instead, minus two days. For embryos that were frozen, we used the date of embryo insertion plus 14 days, and the number of days between egg retrieval and freezing. These three estimations of gestational age were combined into a variable called embryo transfer date (ETD). IVF and ICSI were defined as ART treatments, whereas children conceived by intrauterine insemination were defined as non-ART births.
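The three ETD rules above can be written out as a small Python sketch. This is one reading of the text, not the authors' code: the function name and arguments are hypothetical, gestational age is expressed in days at birth, and the 14-day offset is taken to represent the conventional interval between the last menstrual period and fertilization.

```python
from datetime import date
from typing import Optional

def etd_gestational_age_days(
    birth: date,
    egg_retrieval: Optional[date] = None,
    embryo_insertion: Optional[date] = None,
    days_retrieval_to_freezing: Optional[int] = None,
) -> int:
    """Gestational age at birth (days) under the three rules in the text."""
    if days_retrieval_to_freezing is not None:
        # Frozen embryo: days since insertion + 14 days
        # + days between egg retrieval and freezing.
        if embryo_insertion is None:
            raise ValueError("frozen transfers need the embryo insertion date")
        return (birth - embryo_insertion).days + 14 + days_retrieval_to_freezing
    if egg_retrieval is not None:
        # Fresh embryo with known egg retrieval date: retrieval + 14 days.
        return (birth - egg_retrieval).days + 14
    if embryo_insertion is not None:
        # Retrieval date unknown: approximate it as insertion minus two days.
        return (birth - embryo_insertion).days + 2 + 14
    raise ValueError("need an egg retrieval or embryo insertion date")

# Illustrative dates only (not study data).
birth = date(2010, 10, 1)
print(etd_gestational_age_days(birth, egg_retrieval=date(2010, 1, 15)))
print(etd_gestational_age_days(birth, embryo_insertion=date(2010, 1, 17)))
```

In this example, the second call gives the same result as the first, since an insertion date two days after retrieval maps back to the same retrieval date.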
For the PREDO dataset, information on gestational age and sex was extracted from the Finnish Medical Birth Register. Gestational age at birth was estimated by ultrasound measurements between 12 and 14 weeks of pregnancy.

Gestational age prediction
Figure 1 shows a flowchart of the analyses performed. Children conceived without ART (non-ART) were randomly split into two groups: a training set (~80%) for developing the clock and a test set (~20%) for validating the clock. We used Lasso regression from the R-package glmnet [66] to develop DNAm-based predictors of gestational age. Clinically estimated gestational age was regressed on the 770,586 remaining CpGs after quality control in the START dataset. For the "450 K/EPIC overlap clock," only the 397,473 CpGs that were in common between 450 K and EPIC were used. Missing probes were imputed using the median imputation procedure in the R-package Hmisc [67]. Tuning parameters α and λ were selected after tenfold cross-validation in the training set. For the "EPIC GA clock," Lasso regression selected 176 CpGs (α = 1, λ = 0.66), while for the 450 K/EPIC overlap clock and the "ETD-based clock," 173 CpGs (α = 1, λ = 0.63) and 156 CpGs (α = 1, λ = 0.62) were selected, respectively. Individual CpG sites and their corresponding coefficients are provided in Additional file 4.
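To make the training procedure concrete, the following is a minimal, self-contained Python sketch of Lasso regression (α = 1) fitted by coordinate descent, with λ chosen by k-fold cross-validation after an approximately 80/20 split. It is not the glmnet implementation used in the study: all function names are hypothetical, the data are synthetic, features are centred but not standardized, and fivefold rather than tenfold cross-validation is used to keep the toy example fast, so the λ values are not comparable to those reported above.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_lasso(X, y, lam, n_iter=100):
    """Minimise (1/2n)*RSS + lam*sum(|beta|) by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centred features
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    col_ss = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + beta[j] * col_ss[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, means))
    return intercept, beta

def predict(model, X):
    intercept, beta = model
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest k-fold cross-validated squared error."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
            preds = predict(model, [X[i] for i in fold])
            err += sum((p - y[i]) ** 2 for p, i in zip(preds, fold))
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Synthetic data (NOT from the study): 100 "newborns", 12 "CpGs", 3 informative.
rng = random.Random(42)
X = [[rng.random() for _ in range(12)] for _ in range(100)]
y = [280 + 40 * (r[0] - 0.5) - 30 * (r[1] - 0.5) + 20 * (r[2] - 0.5) + rng.gauss(0, 2)
     for r in X]

# ~80% training / ~20% test split, as in the text.
order = list(range(100))
rng.shuffle(order)
train_idx, test_idx = order[:80], order[80:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

grid = [0.05, 0.2, 0.8]  # hypothetical lambda grid
best_lambda = cv_select_lambda(X_train, y_train, grid)
clock = fit_lasso(X_train, y_train, best_lambda)
selected = [j for j, b in enumerate(clock[1]) if b != 0.0]
print("lambda:", best_lambda, "selected features:", selected)
```

The features with non-zero coefficients play the role of the CpGs "selected" by the clock; a larger λ shrinks more coefficients to exactly zero.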
The above clocks were used to estimate gestational age in (i) the START non-ART test set, (ii) the START ART newborns, and (iii) the non-ART newborns from PREDO (see Fig. 1 for more details). Predicted gestational age was regressed on clinically estimated gestational age using MM-type robust linear regression [68] from the R-package robustbase [69]. The precision of a given prediction model was defined as the proportion of variance explained by the model (i.e., by the R² value). Accuracy, on the other hand, was defined as the median absolute deviation (MAD) between observed and predicted gestational age.
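As a worked illustration of the two performance measures, the sketch below computes MAD and R² for a handful of made-up gestational ages (in days). One simplification should be noted: R² is computed here from an ordinary least-squares fit (the squared Pearson correlation), which stands in for the MM-type robust regression used in the study and will generally give a different value when outliers are present. The function names are hypothetical.

```python
from statistics import mean, median

def median_absolute_deviation(observed, predicted):
    """Median of |predicted - observed|, the accuracy measure in the text."""
    return median(abs(p - o) for o, p in zip(observed, predicted))

def r_squared_ols(observed, predicted):
    """R^2 of an ordinary least-squares regression of predicted on observed.
    Assumption: OLS replaces the MM-type robust fit used in the study."""
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy * sxy / (sxx * syy)

# Made-up values in days, for illustration only.
observed_ga = [266, 273, 280, 287, 294]
predicted_ga = [269, 272, 284, 285, 290]

mad_days = median_absolute_deviation(observed_ga, predicted_ga)
r2 = r_squared_ols(observed_ga, predicted_ga)
print(mad_days, round(r2, 3))
```

Here the absolute deviations are 3, 1, 4, 2, and 4 days, so the MAD is 3 days.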

Comparison of prediction parameters
To compare the performances of the different clocks and GA estimation methods, we calculated the differences in R², SE, and MAD when computed by two different clocks or GA methods. To assess the size and significance of the differences, we computed bootstrap confidence intervals for each difference. Since all three performance measures can be calculated from observed and predicted GA values, each bootstrap sample selected individuals randomly and used the observed and predicted GA values already calculated for those individuals. The pairs of R², SE, and MAD values were calculated from the same bootstrap sample to account for the same dataset being used in each comparison. Thus, we did not need to refit the full prediction model for each bootstrap sample.
The bootstrapping was performed using the R-package boot [70,71]. 95% confidence intervals of the bootstrap differences were standard percentile intervals, reported as type "perc" by the boot package. A difference was considered statistically significant when the corresponding confidence intervals did not include the value 0.
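The paired bootstrap described in this section can be sketched in Python as follows. Individuals are resampled with replacement, both clocks are evaluated on the same resample, and a percentile interval is taken over the differences. This is an illustration under stated assumptions rather than the boot-package procedure itself: the function names are hypothetical, only the MAD difference is shown, the data are simulated, and the interval endpoints are plain order statistics, whereas boot interpolates between them.

```python
import random
from statistics import median

def mad(observed, predicted):
    return median(abs(p - o) for o, p in zip(observed, predicted))

def bootstrap_mad_difference(observed, pred_a, pred_b, n_boot=1000, seed=1):
    """95% percentile interval for MAD(clock A) - MAD(clock B).
    Each resample reuses the predictions already computed per individual,
    so no prediction model is refitted."""
    rng = random.Random(seed)
    n = len(observed)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        obs = [observed[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(mad(obs, a) - mad(obs, b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper

# Simulated example: clock A has smaller prediction errors than clock B.
rng = random.Random(3)
observed = [rng.gauss(280, 10) for _ in range(150)]
clock_a = [o + rng.gauss(0, 2) for o in observed]
clock_b = [o + rng.gauss(0, 10) for o in observed]

ci = bootstrap_mad_difference(observed, clock_a, clock_b)
significant = not (ci[0] <= 0.0 <= ci[1])
print(ci, significant)
```

As in the text, the difference is called significant when the interval excludes 0.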

Gestational age acceleration analysis
GAA was defined as the residuals from a linear regression of DNAm gestational age predicted by the EPIC GA clock on ultrasound-estimated gestational age [16]. We tested for association between GAA and ART by performing a logistic regression of ART on GAA.
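A minimal sketch of this two-step analysis is given below: GAA is taken as the residual from a linear regression of DNAm-predicted on ultrasound-estimated gestational age, and ART status is then regressed on GAA with a logistic model. The code is an illustration under assumptions, not the authors' analysis: function names are hypothetical, the data are simulated with ART status assigned at random, the logistic model is fitted by a hand-written Newton-Raphson routine, and the p-value is a two-sided Wald test.

```python
import math
import random
from statistics import mean

def ols_residuals(x, y):
    """Residuals from a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def logistic_fit(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.
    Returns (b1, two-sided Wald p-value for b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # After convergence the last Hessian approximates the one at the estimate.
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))

# Simulated data (not study data); ART status is independent of GAA here.
rng = random.Random(7)
clinical_ga = [rng.gauss(280, 10) for _ in range(300)]
dnam_ga = [30 + 0.9 * g + rng.gauss(0, 4) for g in clinical_ga]
art = [1 if rng.random() < 0.5 else 0 for _ in range(300)]

gaa = ols_residuals(clinical_ga, dnam_ga)
coef, p_value = logistic_fit(gaa, art)
print(round(coef, 4), round(p_value, 3))
```

A positive GAA means the DNAm-predicted gestational age is higher than expected given the clinical estimate; a negative GAA means it is lower.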

Gene-enrichment analysis
The online functional enrichment software WebGestalt [34] was used to search for enrichment within the annotated genes of the EPIC GA clock. We identified 154 unique gene names annotated for the 176 CpGs selected in the EPIC GA clock using the annotation data from Illumina's Infinium MethylationEPIC v1.0 B4 Manifest file. We then performed an overrepresentation analysis on the 154 genes using Fisher's exact test [35], assigning a minimum of five genes per category, and using the genome as background. WebGestalt leverages data from the following databases for each category: gene ontology [72,73] (Biological Process, Cellular Component, Molecular Function), pathway (KEGG [74], Panther [75], Reactome [76], WikiPathway [77]), network (Kinase target, Transcription Factor target, miRNA target), disease (DisGeNET [78], GLAD4U [79], OMIM [80]), drug (DrugBank [81]), phenotype (Human Phenotype Ontology [82]), and chromosomal location (Cytogenic Band). The Benjamini-Hochberg procedure was applied to the p-values, and categories with a false discovery rate below 0.01 were declared significantly enriched.
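The statistical core of an overrepresentation analysis can be sketched in a few lines of Python: a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) per category, followed by the Benjamini-Hochberg adjustment. This is not WebGestalt's code. The function names are hypothetical, and the category size (15) and background size (20,000) in the example are invented for illustration; only the list size of 154 genes and the four overlapping genes come from the text.

```python
from math import comb

def enrichment_p_value(k, n_list, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n_list).
    k: listed genes in the category, n_list: genes in the list,
    K: category size in the background, N: background size."""
    upper = min(K, n_list)
    tail = sum(comb(K, i) * comb(N - K, n_list - i) for i in range(k, upper + 1))
    return tail / comb(N, n_list)

def enrichment_ratio(k, n_list, K, N):
    """Observed overlap divided by the overlap expected by chance."""
    return k / (n_list * K / N)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# 4 of 154 listed genes fall in a category; K and N are hypothetical.
p_cat = enrichment_p_value(k=4, n_list=154, K=15, N=20000)
ratio = enrichment_ratio(k=4, n_list=154, K=15, N=20000)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_cat, round(ratio, 1), fdr)
```

Categories whose adjusted p-value falls below 0.01 would be declared significantly enriched under the criterion stated above.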