A genome-wide DNA methylation signature for SETD1B-related syndrome

SETD1B is a component of a histone methyltransferase complex that specifically methylates Lys-4 of histone H3 (H3K4) and is responsible for the epigenetic control of chromatin structure and gene expression. De novo microdeletions encompassing this gene as well as de novo missense mutations were previously linked to syndromic intellectual disability (ID). Here, we identify a specific hypermethylation signature associated with loss of function mutations in the SETD1B gene which may be used as an epigenetic marker supporting the diagnosis of syndromic SETD1B-related diseases. We demonstrate the clinical utility of this unique epi-signature by reclassifying previously identified SETD1B VUS (variant of uncertain significance) in two patients.


Introduction
Currently, five patients have been described with a microdeletion 12q31.24 and comparable phenotypes [1][2][3][4][5]. The lost fragment of chromosome 12 varied in size and included multiple genes. Labonne et al. [5] identified the smallest overlapping region and proposed two histone modifiers, KDM2B and SETD1B, as the most probable candidates to be responsible for the microdeletion 12q24.31 syndrome. SETD1B encodes a SET domaincontaining protein, which is a part of a histone methyltransferase complex. The key role of this complex is methylation of histone 3 on lysine 4 (H3K4), which is enriched in gene promoters and is seen to be highly correlated to gene expression [6]. KDM2B is a member of the F-protein family and encodes an enzyme that demethylates H3K36me2/3 and H3K4me3 [7]. Labonne et al. [5] showed that the genetic organization of 12q24.31 is conserved between zebrafish and humans and that KDM2B and SETD1B were expressed in the brain tissue of both zebrafish and human, suggesting evolutionary conservation of the regulation of these genes [5]. More recently, three patients with de novo point mutations in SETD1B have been described [8,9]. Their phenotypes were similar to patients with a 12q24.31 microdeletion.
Since it has been shown that there is a strong relationship between the methylation of H3K4 and DNA methylation [10][11][12][13], we set out to determine whether the SETD1B and KDM2B aberrations can manifest with a specific DNA methylation signature. For this, a genome wide-methylation analysis was performed on DNA samples from 13 patients with either aberrations of 12q24 (including or not including KDM2B and/or SETD1B genes) or mutations in SETD1B (Table 1). This set of patients included previously described patients and additional cases identified in our laboratory or through GeneMatcher [14].

Results
Identification of a SETD1B-related specific methylation signature Genomic DNA was obtained from whole blood samples (13 patients and 60 controls), and genome methylation status was analyzed using the Infinium MethylationEPIC BeadChip. The determination of DNAm signature based on HumanMethylation array was previously validated and described in various studies [13,[15][16][17][18][19].
The principal component analysis (PCA) of the data obtained showed two outliers in our cohort: a patient with a microdeletion including SETD1B and KDM2B (3_ del12q; Batch1) and a healthy control (4 days old, batch 2). Estimation of the blood cell types in patient 3_del12q showed an unexpected distribution of cell types (99% of The minimal deletion/duplication within the given start and end position The maximal deletion-without the given start and end position (between) B lymphocytes). Both outliers were excluded from further group analysis. Quality control (QC) of the data, PCA analysis, and estimation of the blood cell type distribution are described in detail in the supplemental information and listed in Additional file 1: Table S1. Next, a group-based differential methylation analysis was carried out, comparing the DNAm of five patients with pathogenic variants in SETD1B to that in controls (n = 59). Variants were considered pathogenic if the following was observed: (i) variants were de novo and occurred in more than one patient or (ii) variants resulted in a premature stop codon. The patients included in the group analysis were patient 1_mut (p.Arg1301*), patients 2_mut and 3_mut (p.Arg1902Cys), and patients 4_mut and 5_mut (p.Arg1885Trp).
A shift of the genome-wide methylation toward hypermethylation was observed ( Fig. 1), which is reflected in the selected significant differentially methylated CpGs (adj. P-value_M < 0.05, absolute beta difference > 0.1). This analysis identified 3340 significant differentially methylated CpGs, out of which more than 82% had a positive beta difference. All significant differentially methylated CpGs identified in this analysis are listed in Additional file 2: Table S2. To further calculate the probability that we would have identified that these 3340 CpGs as significant by chance, we performed an additional permutation analysis on the group labels. 99.6% of 3340 significant differentially methylated CpGs displayed P value less than or equal to 0.05. Details of this analysis are described in the additional information and listed in the Additional file 6: Table S6.
Next, unsupervised hierarchical clustering of beta values of the identified significant CpG sites (3340 CpGs) for each individual of our cohort was created; 13 patients and 60 controls (Fig. 2). Eight of the 13 patients were clustered in a separate group. All five patients with pathogenic variants in SETD1B (patients included in the "SETD1B-related" group analysis); two patients with a deletion including KDM2B and SETD1B (2_del12q, 3_del12q) and one with a deletion including only SETD1B (4_del12q) fell into this cluster. Note that although patient 3_del12q had an aberrant blood cell composition, the methylation signature was detectable in this sample. These results demonstrate the robustness of the specific DNAm of the SETD1B aberrations/variations. Despite the many variables in the cohort that may have had an impact on the DNAm (different ethnicity, different aberrations/variations, a different method of DNA isolation small sample size, batch, age, and distribution of the cell types), there is a distinct SETD1B specific methylation signature. The methylation profile of the patients with a deletion excluding SETD1B (1_del12q and 5_ del12q_a), a patient that carried a duplication of the 12q region, and two patients with a variant of uncertain significance, in SETD1B (6_mut and 7_mut), did not show the SETD1B-specific signature.
Examination of the specificity of the SETD1B-related DNAm signature We examined whether the DNA methylation signature of SETD1B-related syndrome overlaps with that of other neurodevelopmental disorders or syndromes, which in Fig. 1 The volcano plot of the methylation difference between patients with certain pathogenic variation in SETD1B and healthy individuals (group analysis). The y-axis represents a negative log 10 of adj. P-values_M; the x-axis represents the different beta values between patients and controls. Each dot on the plot represents a single CpG site. The horizontal, dotted line represents the statistical significance threshold (adj. P-values_M = 0.05). The vertical, dotted lines show the effect-size threshold (− 0.1 and 0.1). CpGs with adj. P-value_M lesser than 0.05 and an absolute beta difference higher than 0.1 are highlighted in green some cases, are caused by mutations in the members of the epigenetic machinery. Using a multidimensional scaling of the methylation values across the CpGs differentially methylated in the SETD1B-related syndrome, we examined the methylation profile of a total of 502 individuals with a confirmed diagnosis of various syndromes with previously described epi-signatures including imprinting defect disorders [16,17,20] (Angelman syndrome, Prader-Willi syndrome, Silver-Russell syndrome, and Beckwith-Wiedemann syndrome), BAFopathies (Coffin-Siris and Nicolaides-Baraitser syndromes), Autosomal dominant cerebellar ataxia, deafness, and narcolepsy, Floating-Harbor syndrome, Cornelia de Lang syndrome, Claes-Jensen syndrome, ADNP syndrome, ATRX syndrome, Kabuki syndrome, CHARGE syndrome, Fragile X syndrome, trisomy 21, Williams syndrome, and Chr7 duplication syndrome (Fig. 3). All of these patients showed a DNA methylation pattern different from the SETD1B-related syndrome and were clustered with controls, indicating that the identified epi-signature is highly specific to SETD1B loss of function.  Identification of the SETD1B-related differentially methylated regions Using the "bumphunter" R-package, four genomic regions differentially methylated between patients with pathogenic variants in SETD1B (as defined above) and controls were identified (minimum three differentially methylated CpGs in a region; family-wise error rate (Fwer) < 0.05) ( Table 2). All four regions were hypermethylated in patients and located in the regulatory clusters of active promoters, enhancers, and DNAse hypersensitivity (UCSC Genome Browser on Human; GRCh37/hg19 [21]), three of which were annotated to genes (i) KLHL28, FAM179B; (ii) RUNX1; and (iii) BRD2.
Analysis of the genomic distribution of the CpG sites in the SETD1B DNAm signature An analysis of the genomic distribution of the CpG sites identified in the group analysis was conducted. This showed an over-representation of CpGs in the gene body, DNase hypersensitivity sites (DHS), CpG island S-shore, reprogramming differentially methylated regions (RDMR), and in promoter-associated sites (Fig. 4). These results demonstrate that the disrupted methylation related to the SETD1B function is enriched in the regulatory parts of the genome.

DNAm signature
To identify the processes involved in the development of the phenotype, ORA analysis based on gene names associated with the 3340 identified significant methylated CpGs using WEB-based GEne SeT AnaLysis Toolkit [22] was performed. The analysis for biological processes displayed enrichment for genes with a function in chromosome organization, regulation of organelle organization, cell cycle, and regulation of cell death. ORA for molecular function demonstrates enrichment for genes with a role in the regulation of gene activity, such as RNA binding, protein domain-specific binding, regulatory region nucleic acid binding, and transcription regulatory region DNA binding. ORA for the human phenotype (top 10 highest ranked features) showed enrichment in genes related to facial and posture abnormalities. The results of ORA are summarized in Table 3. Note that ORA analysis is very general and the results should be interpreted with caution.

Analysis of a KDM2B-related specific methylation signature
Only three patients in this cohort had a deletion of KDM2B (1_del12q, 2_del12q, 3_del12q), one of whom presented with a deletion excluding SETD1B (1_del12q). Furthermore, of these, patient 3_del12q was excluded from the group analysis due to the heavily disturbed blood cell-type distribution. Despite these limitations, an attempt was made to identify a KDM2B-specific signature, running the group analysis of only two patients (1_ del12q, 2_del12q) compared to 59 controls. This identified 697 significant differently methylated CpG sites (adj. P-value_M < 0.05 and absolute beta difference > 0.1). Nevertheless, the unsupervised hierarchical clustering (Fig. 5) of the 697 identified CpGs did not show any specific methylation signature related to KDM2B. The two patients (1_del12q and 2_del12q) were clustered separately from each other, other patients, and healthy controls. Moreover, the SETD1B-related specific signature was still strongly marked. All significant differentially methylated CpGs identified in this analysis are listed in Additional file 3: Table S3.

Identification of the KDM2B-related differentially methylated regions
The DMR analysis did not show any significant DMR (minimum of three differentially methylated CpGs in a region; Fwer < 0.05).

Clinical features
All patients with a SETD1B signature-positive methylation profile presented with an intellectual disability. Common features included language delay, epilepsy, and behavioral problems such as autism spectrum disorder and anxiety. Dysmorphisms included full cheeks, full lower lip, macroglossia, and tapering fingers. Delay in motor development was primarily present in patients with a deletion and absent in patients with a point mutation in SETD1B (Table 4). Value -represents the difference between patient end controls L-number of differentially methylated CpGs in the detected region, Cluster L-number of CpGs in the genomic cluster, Fwer-family-wise error rate

Discussion
Pathogenic changes within the SETD1B gene were found to have an associated specific DNAm signature. This specific DNAm was not substantially affected by differences in blood cell distribution and other variables such as technical differences and chromosomal aberrations. The specificity of the DNAm signature was highlighted by the lack of signature in patients carrying a deletion that did not include SETD1B or in a patient carrying a duplication of the region or patients with other neurodevelopmental disorders or syndromes. Moreover, we were able to assess the pathogenicity of two variants of unknown clinical significance: p.(Glu1692del) and p.(Glu1160Lys) in patients 6_ mut and 7_mut, respectively. The inheritance of variant p.(Glu1692del) in patient 6_ mut was unknown. This variant results in the loss of residue Glu1692. The p.(Glu1160Lys) variant in the 7_mut patient occurred de novo. It is a missense variant present at very low frequencies in the general population (5/ 187386 alleles in the GnomAD database [23]; MAF < 0.01; rs959370052) and affects a weakly conserved amino acid. The methylation profile of both patients did not display a specific SETD1B signature, suggesting both variants do not result in a loss of SETD1B function and are probably not pathogenic. While patients 6_mut and 7_mut display clinical features compatible with the phenotype caused by SETD1B mutations, this is not related to the specific SETD1B methylation pattern, indicating that they do not have a SETD1B-related disorder.
We detected the specific SETD1B-related DNAm signature based on the methylation status of three different pathogenic variants in five patients. An increased sample size would lead to the possibility of detecting differences in DNAm between variants.
Four hypermethylated DMRs were found to be associated with SETD1B. The region located on chromosome 6 (chr6: 26195488-26195995; hg19) was not assigned to any gene and was found to be characterized by high DNase hypersensitivity with promoter activity and located in Homo sapiens histone cluster 1. Histone 1 (H1) is responsible for chromatin condensation and DNA fragmentation during apoptosis [24,25]. Note that the apoptotic process, regulation of cell death, and chromatin condensation were enriched in ORA (biological processes) of CpG sites of the SETD1B-related DNAm-specific signature. Another hypermethylated region on chromosome 6 (chr6: 32942063-32943025; hg19) was assigned to the BRD2 gene. It displays promoter and enhancer activity and overlaps exon 3 of BRD2. Pathak et al. [26] reported hypermethylation in another locus (CPG75) near the promoter of BRD2 as implicated in juvenile myoclonic epilepsy (JME) [26]. Hypermethylation of this locus was found to be associated with a single nucleotide polymorphism (rs3918149). Schultz et al. [27] could not confirm this association in the German population. However, in 2007, Cavalleri et al. published the results of genotyping rs3918149 variant across five independent JME cohorts, observing a significant effect of this SNP on epilepsy in the British and the Irish cohorts, but reprogramming-specific differentially methylated region, promoter-associated, and promoter-associated cell-type specific not in those of the German, Australian, and Indian [28]. Although the association of BRD2 and epilepsy is not clear, we tentatively speculate that the hypermethylation detected in BRD2 in our cohort may play a role in the occurrence of epilepsy in these patients. Two other hypermethylated DMRs detected in the SETD1B-related group analysis were found to be located on chromosomes 14 and 21 (chr14: 45431885-45432516; chr21:36258423-36259797; hg19) and assigned to genes KLHL28, FAM179B, and RUNX1. The former covers a CpG island with promoter activity and a DNAse hypersensitivity cluster (exon 1 of FAM179B) while the latter corresponds to a CpG island with promoter activity at exon 4 of RUNX1 and a DNAse hypersensitivity cluster. The biological function of these genes could not be  (Table 4). Interestingly, two patients presenting with the microdeletion, involving also KDM2B, were initially diagnosed with Beckwith Wiedemann syndrome (BWS) because of overgrowth and macroglossia, which are typical for BWS (MIM 130650). Hiraide et al. (2018) suggested that the deletion of KDM2B could be a possible reason for an overgrowth phenotype in these two patients [8]. Moreover, a KDM2B missense mutation (c.2503G > A) was identified to be associated with "paunch calf syndrome" [29]. The characteristic features of this syndrome include abdominal distension and tongue protrusion that are comparable with abdominal wall defects and macroglossia, features that are characteristics for BWS [30].
The results of this study show a strong effect of SETD1B function on DNA methylation. SETD1B is a known histone modifier that produces trimethylated histone H3 at Lys4 (H3K4me3), which may play a role in blocking of the de novo DNA methylation in some genomic regions. DNMT3L ((cytosine-5)-methyltransferase 3-like), which stimulates de novo DNA methylation, interacts only with unmodified H3K4. The methylation of H3K4 disables this interaction [31]. The loss of the function of SETD1B may lead to the insufficient production of H3K4me3 and, thereby, hypermethylation of the DNA in specific loci. Indeed, 82% of differentially methylated CpGs in patients with a SETD1B pathogenic variant were hypermethylated. The 18% of differentially methylated CpGs that were hypomethylated remain unexplained by this mechanism, but these may be secondary effects, caused by altered expression of target genes of SETD1B.
Syndromic disorders have often similar clinical features. Genetic testing has multiple limitations. For instance, the resolution often prevents it from detecting low-frequency mosaicism. Moreover, the reason underlying the clinical features can occasionally not easily be inferred from the variants if variant occurs in non-coding regions, contiguous genes are deleted, or if they have been annotated as VUS. Examination of specific DNAm signatures was previously described as a powerful solution in the classification of various unresolved cases including syndromic Mendelian disorders, imprinting disorders, repeat expansion disorders, and uncertain clinical diagnosis with VUS [16,17] and has therefore been proposed as a novel molecular diagnostic test. Our results reinforce this observation indicating that the specific DNAm signature has a diagnostic value and can be used as an additional diagnostic test to resolve variants of unknown significance in SETD1B.
Due to the small sample, we were unable to determine whether the loss of the KDM2B caused a specific DNAm signature. Studies including a sufficient number of patients are needed to solve this. The other limitation of our study was the technical differences between samples. Different DNA isolation methods between samples may influence the results.

Patients
Whole blood DNA samples from 13 individuals were collected for the methylation study. Seven patients had point mutations in SETD1B, which were identified by whole-exome sequencing (WES), and five chromosomal 12q24.12-32 aberrations. One of the five patients had        NA not available, "+" feature present, "-" feature absent the deletion involving KDM2B (1_del12q), two the deletion of both KDM2B and SETD1B (2-del12q, 3_ del12q), one the deletion on SETD1B (4_del12q), one the deletion not involving KDM2B and SETD1B, and one the duplication of 12q24.12 not involving KDM2B and SETD1B. Table 1 shows the genetic aberrations and inheritance of the patients included in the analysis. Figure 6 depicts the comparison between the deleted regions and genes in patients with microdeletions of 12q24.31 from the cohort (according to Hg19). Informed consent was obtained for each patient.

Healthy controls
Whole blood DNA samples were collected from 60 healthy individuals. Cohort details are listed in Additional file 4: Table S4.

Methylation EPIC array
The samples were divided into two batches: the first contained seven DNA samples from the patients (two females and five males) and 40 samples from the healthy controls (20 females and 20 males) and the second contained six DNA samples from the patients (two females Fig. 6 Comparison between deleted regions in patients with a microdeletion of 12q24. 31 and four males) and 20 from the healthy controls (ten females and ten males). The samples were randomized and sent to GenomeScan in Leiden (ISO/IEC 17025 approved), where the bisulfite treatment and the hybridization to the Infinium Methylation EPIC array (Illumina) were processed. The raw methylation data were obtained and the quality (QC) of the data assessed using the MethylAid script in R. (GenomeScan's Guidelines for Successful Methylation Experiments Using the Illumina Infinium® HumanMethylation BeadChip).

Normalization and data analysis
The EPIC array data was loaded onto the R software and normalized using the preprocessFunnorm function of "minfi" R package [32]. All probes containing SNPs (MAF > 0.01), cross-hybridization probes, and probes located on the sex chromosomes were excluded; 776,920 probes remained for analysis. The beta values (ratio of the methylated probe intensity ranging from 0 to 1) were obtained for all the patients from the cohort. Row beta values were normalized and PCA carried out.

Estimation of the blood cell type distribution
White blood cell type was estimated for each patient using estimateCellCounts function in R "FlowSorted.Blood.E-PIC" package [33]. The counts were calculated for CD8T (cytotoxische T cell), CD4T(T helper cells), NK (natural killer cells), B cell (B lymphocytes), mono (monocytes), and gran (granulocytes). The P value was calculated for each patient of our cohort (13 patients), for each cell type (Crawford-Howell t test; R software). Subsequently, the Bonferroni correction was applied for 78 tests (six cell types × 13 patients). We assume that the distribution of the cell types was significantly disturbed if the Bonferronicorrected P value for the cell types was less than 0.05.
Group analysis and identification of CpG sites for the DNAm-specific signature DNA methylation of patients in the groups (five patients in the SETD1B-related group and two in the KDM2B-related group) were compared with methylation in a group of 59 healthy controls using the "minfi" R-package. The design model was corrected for age, gender, batch, and cell distribution. The beta values were obtained and logit transformed into M values. The adjusted P values for the M values were calculated, and the significance threshold was 0.05. Finally, to avoid false-positive results, CpG sites with an effect size of at least 10% difference in an average of DNAm between patient groups and the control group were selected. In this way, we identified 3340 and 697 differentially methylated CpGs in the SETD1B-related group and KDM2B-related group, respectively.

Analysis of a specific methylation signature
Beta values of CpGs selected in the group analyses were used to perform the unsupervised hierarchical clustering ("pheatmap" R-package). Two heatmaps were created, one for the SETD1B-related group and the other for the KDM2B-related group. Each heatmap was created for all individuals in the cohort (13 patients and 60 controls).
Examination of the specificity of the SETD1B-related DNAm signature Whole blood DNA samples were collected from 502 patients with various neurodevelopmental syndromes. To compare the methylation values of our cohort with these additional samples, we performed re-normalization, according to the Illumina normalization method, with background correction using the "minfi" R-package. To select significant differentially methylated SETD1B-related CpGs, we used similar filtering steps for these in the SETD1B-related group analysis namely, a corrected P value less than 0.05 and an effect size of at least 10% difference. Correlated probes with r 2 higher than 0.8 were removed from this analysis. Multidimensional scaling (MDS) was used to examine the DNA methylation profiles. All samples used in this analysis and the details of the method were fully described by Aref-Eshghi et al. [16,17]. The list of 502 samples used in this specific analysis is listed in Additional file 5: Table S5.

Identification of differentially methylated regions
To identify the DMRs between patient and control groups, a "bumphunter" R-package was used. The design model was corrected for age, gender, batch, and cell distribution. The P value for each region was calculated and multiple testing applied according to the family-wise error rate. The significant DMRs were selected based on the two filter steps: (i) Fwer < 0.05 and (ii) at least three differentially methylated CpGs within the region (L > 2).

ORA-WEB-based Gene Set Analysis Toolkit
ORA were carried out for the first and unique gene symbol annotated to the CpGs identified during group analysis (according to the Infinium MethylationEPIC v1.0 B4 Manifest File). Basic parameters were as follows: organism-human, method-ORA, functional database-gene ontology (biological process and molecular function), and reference set for enrichment analysis-genome protein-coding. Advanced parameters were as follows: minimum number of genes for a category-5, maximum number of genes for category-2000, multiple test adjustment-Benjamini-Hochberg (BH), significant level-top 10, number of categories expected from set cover-10, number of categories visualized in the report-40, and color in DAG-continuous.