DNA methylation epi-signature is associated with two molecularly and phenotypically distinct clinical subtypes of Phelan-McDermid syndrome

Phelan-McDermid syndrome is characterized by a range of neurodevelopmental phenotypes with incomplete penetrance and variable expressivity. It is caused by a variable size and breakpoint microdeletions in the distal long arm of chromosome 22, referred to as 22q13.3 deletion syndrome, including the SHANK3 gene. Genetic defects in a growing number of neurodevelopmental genes have been shown to cause genome-wide disruptions in epigenomic profiles referred to as epi-signatures in affected individuals. In this study we assessed genome-wide DNA methylation profiles in a cohort of 22 individuals with Phelan-McDermid syndrome, including 11 individuals with large (2 to 5.8 Mb) 22q13.3 deletions, 10 with small deletions (< 1 Mb) or intragenic variants in SHANK3 and one mosaic case. We describe a novel genome-wide DNA methylation epi-signature in a subset of individuals with Phelan-McDermid syndrome. We identified the critical region including the BRD1 gene as responsible for the Phelan-McDermid syndrome epi-signature. Metabolomic profiles of individuals with the DNA methylation epi-signature showed significantly different metabolomic profiles indicating evidence of two molecularly and phenotypically distinct clinical subtypes of Phelan-McDermid syndrome.


Background
Phelan-McDermid syndrome (PHMDS) is characterized by developmental delay, absent or impaired speech, neonatal hypotonia, autistic behaviors and mild dysmorphic features [1,2]. It is caused by a contiguous gene deletion of the distal long arm of chromosome 22 (also referred to as 22q13.3 deletion syndrome). This loss of genetic material can be caused by a terminal or interstitial deletion of chromosome 22, an unbalanced translocation that can be inherited or de novo, or from other complex structural rearrangements involving chromosome 22 [2]. The size of the 22q13.3 deletion in PHMDS ranges from < 50 kb to > 9 Mb, including complete or partial deletion of the SHANK3 (SH3 and multiple ankyrin repeat domains 3) gene in virtually all cases.
A limited number of individuals with loss-of-function intragenic variants in the SHANK3 gene have been reported. Most individuals with SHANK3 intragenic variants have been described as having autism spectrum disorder (ASD) and/or intellectual disability [3][4][5][6]. A recent study had phenotypically characterized 17 individuals with pathogenic variants in SHANK3, demonstrating that SHANK3 intragenic pathogenic variants are sufficient to cause a broad range of features associated with PHMDS [7]. However, even though intellectual disability and ASD were present in the majority of individuals with SHANK3 mutations, speech impairment and motor deficits were less severe than in 22q13 deletions, and renal abnormalities were absent [7]. Taken together, there is compelling evidence, suggesting that SHANK3 haploinsufficiency is responsible for the majority of neurological features, i.e., intellectual disability and autism, but not the full spectrum of phenotypes observed in PHMDS [8][9][10]. There are at least two cases reported with 22q13 interstitial deletions sparing SHANK3 and phenotypes similar to those in PHMDS [11], suggesting an independent role for genes in these genomic regions. Another candidate gene located near SHANK3, and deleted in most cases of the syndrome, is the autism-linked gene IB2 (islet brain 2) that plays an important role in synaptic transmission and neuronal morphology [12].
Genotype-phenotype relationships in PHMDS have been extensively studied [13][14][15][16][17]. For most phenotypes, the individuals with the smallest terminal deletions (22q13.33) were less severely affected than those with the largest terminal deletions (22q13.2), except for ASD which is commonly associated with smaller deletions. These studies suggested that the effect of SHANK3, a gene associated with ASD, may be attenuated as the deletion size increases and additional genes are deleted. This may reflect the difficulty in diagnosing ASD in individuals that are more severely affected. In addition, these studies mapped a large number of potential candidate genes within 22q13 region, including genes associated with dysmorphic features, speech and behavior. Taken together, these studies emphasize that PHMDS is a clinically heterogenous syndrome caused by haploinsufficiency of multiple candidate genes. The large range of deletion sizes observed resulting in a large range of phenotypical features, together with the lack of knowledge of all candidate genes within the deleted region, complicates the clinical and molecular diagnosis and/or prognosis for these individuals.
DNA methylation is one of the most widely studied epigenetic mechanisms and is involved in many physiological and disease mechanisms through control of gene expression and genomic stability. The establishment of the DNA methylation pattern occurs early during embryonic development and varies significantly across different tissues. Genomic DNA methylation profiles in peripheral blood, referred to as DNA methylation epi-signatures, have been associated with many human traits, including age, sex and disease status [18][19][20][21]. More recently, we and others have reported evidence of highly sensitive and specific peripheral blood epi-signatures associated with genetic mutations in a growing number of genes and genetic syndromes associated with developmental delay/intellectual disabilities and multiple congenital anomalies [22][23][24][25]. In addition, microarray-based genomic DNA methylation assessment has enabled simultaneous detection of imprinted disorders and Fragile X syndrome, which is often within the spectrum of differential diagnoses of these neurodevelopmental syndromes [26,27].
In this study we assessed genome-wide DNA methylation profiles in a cohort of individuals with PHMDS, including individuals with large (2 to 5.8 Mb) 22q13. 3 deletions, with small deletions (< 1 Mb) or with intragenic variants in SHANK3. Given the molecular and phenotypic complexity of this disease, we hypothesized a differentiating DNA methylation epi-signature that would enable assessment of variants of unknown clinical significance, such as smaller deletions at 22q13 or SHANK3 variants, as well as diagnosis of ambiguous clinical cases. We describe a highly sensitive and specific DNA methylation signature associated with large 22q13.3 deletions. This signature is not evident in individuals with small deletions or SHANK3 intragenic variants, hence differentiating these as separate molecular entities. Using these data, we build a classification algorithm for PHMDS and demonstrate its ability to specifically identify individuals with PHMDS-large deletions differentiating them from other neurodevelopmental conditions with overlapping phenotypes, as well as other genetic conditions with recognized DNA methylation epi-signature. Metabolic profiling of cells from individuals with large 22q13 deletions revealed a functional impact of the disrupted DNA methylation by showing, among other features, reduced capacity of cells to produce energy utilizing high-efficiency aerobic pathways, decreased ability to adjust to metabolic stress and abnormal response to hormones and cytokines regulating growth and inflammation, as compared to controls or cells with small 22q13 deletions or SHANK3 variants. The metabolic findings provide in vitro validation of the hypothesis that abnormal genome-wide DNA methylation and large 22q13 deletions in PHMDS lead to a larger number of disrupted pathways and more severe clinical phenotypes than in cases with small deletions or SHANK3 variants and suggest the existence of distinguishable subtypes of PHMDS.

Genetic profiling of individuals with Phelan-McDermid syndrome
This study included 22 subjects with confirmed clinical features consistent with PHMDS and molecular/cytogenetic diagnoses of PHMDS.r These subjects had previous chromosome microarray testing performed for 22q13 genomic deletion detection, as well as Sanger sequencing of SHANK3 gene in individuals without 22q13 deletion. The molecular description at diagnosis and demographics of all subjects is shown in Table 1. Eleven individuals carried a large deletion at chr22q13 (referred to as PHMDS-Large Del), spanning a region > 1 Mb and including SHANK3 gene region. Of the remaining individuals (included in the PHMDS-Small Del/Mut cohort), five presented with smaller deletions, ranging from 0.06 to 0.92 Mb, including SHANK3 region; five individuals had SHANK3 intragenic mutations; and one individual was mosaic for a large deletion at chr22q13.

Clinical features
Clinical data were collected from medical records and from the PHMDS International Registry and are listed in Table 2. The most recurrent features showed no major differences between the two cohorts of individuals and included ASD or autistic traits (reported in almost all individuals) and other behavioral or mental disorders. Other common neurological traits, such as seizures, sleep problems and hypotonia, were also fairly equally  distributed between the two cohorts, while symptoms affecting neurosensory regulation or different systems (gastrointestinal, kidney, muscular/skeletal) appear to be more represented in the cohort with large 22q13 deletions, although for some of these features clinical information was available only for a limited number of individuals.

DNA methylation epi-signature in PHMDS
Genome-wide DNA methylation analysis was performed on peripheral blood DNA from the 22 subjects with confirmed clinical and molecular diagnoses of PHMDS using Illumina Infinium EPIC arrays. The PHMDS samples all had fewer than 1000 failed probes, passing the quality control requirements. A comparison was made between PHMDS cohorts and age-, sex-and batchmatched controls in a ratio of 4:1 (4 controls for each PHMDS samples). Figure 1a-c shows initial comparison using all PHMDS samples, using PHMDS samples with large deletions (> 1 Mb), and using PHMDS samples with small deletions (< 1 Mb). Results of this analysis showed that only the PHMDS-Large Del cohort showed significant methylation difference when compared to controls. Whereas analysis including all PHMDS individuals showed only 11 probes with significant methylation differences (Fig. 1a), analysis of the PHMDS-Large Del cohort analysis showed 1022 differentially methylated probes (Fig. 1b). PHMDS-Small Del cohort not shows any methylation differences from control samples (Fig. 1c). Thus, identification of the epi-signature and creation of a classification model were performed for the PHMDS-Large Del cohort only (n = 11). Briefly, analysis including PHMDS-Large Del cases and matched controls identified 1022 probes with a minimum of 10% methylation difference between the two cohorts and a multiple testing corrected p value < 0.01 (limma multivariable regression modeling), adjusted for blood cell type compositions (Additional file 1: Table S1). Information about controls, including sex, age and blood cell compositions, is shown in Additional file 2: Table S2. Hierarchical clustering and multiple dimensional scaling (MDS) demonstrate that the 1022 selected probes strongly separate the PHMDS-Large Del individuals from a larger cohort of controls (Fig. 1e, f ). PHMDS cases with small deletions or intragenic mutations in SHANK3 gene do not present with the epi-signature, clustering together with the control group (Fig. 1e, f ). As for comparison, the 11 probes identified using all PHMDS samples were used for MDS analysis and were not able to clearly distinguish between PHMDS samples and controls (Fig. 1d). Cross-validation using PHMDS-Large Del samples was performed to validate sensitivity of our epi-signature. For each round of validation, nine of the eleven PHMDS large deletion samples were used for probe selection along with matched controls and the remaining two PHMDS large deletion samples were saved for testing. Multidimensional scaling showed that every time the two testing samples clustered with the other PHMDS sample (Additional file 3: Figure S1).

Identification of differentially methylated regions (DMR)
Using the DMRcate algorithm [28], we prioritized a total of 29 DMRs for the PHMDS-Large Del epi-signature based on the following criteria: three or more probes less than 1 kb apart, > 10% average regional methylation change and a false discovery rate (FDR) of < 0.01, adjusted for blood cell-type compositions (Additional file 4: Table S3, Fig. 2). The vast majority of the DMRs identified in the PHMDS-Large Del epi-signature involved hypermethylation events (n = 24). These regions are located across multiple chromosomes and could involve clinically relevant genes. Notably, some of these genes are involved in neurotransmission regulation (ARPP21, OMIM# 605488), brain development (EBF4, OMIM #609935), and few of them are associated with other Mendelian conditions (IFT140, OMIM# 614620 and LAMA1 OMIM#150320). Gene ontology enrichment analysis was performed for the DMR (Additional file 3: Figure S2). This analysis showed enrichment for pathways involved in cell morphogenesis, neural tube development, forebrain development and intracellular signal transduction. However, the number of DMRs is small to result in a statistically significant enrichment.

Development of the MVP score for Phelan-McDermid syndrome
The 11 PHMDS subjects with large deletions (PHMDS-Large Del cohort) and 44 matched controls, plus 75% of the remaining controls and 75% of the other syndrome samples from our EpiSign Knowledge Database (EKD) were used for model training. We limited the analysis to probes shared by both EPIC and 450 k platforms (n = 399,092). Probes were filtered to those with a minimum of 10% methylation difference from controls. The final probe list was used to train a multi-class support vector machine (SVM) with linear kernel on the training cohort. The methylation variant pathogenicity (MVP) score was set to generate a single score from 0 to 1, with 1 being a methylation pattern similar to the case samples and 0 being a methylation patter similar to the control samples. The class obtaining the greatest score determined the epi-signature classification. A series of tests were performed to challenge the reliability of the model. For classifier testing we used the 11 PHMDS small deletion samples plus the remaining 25% of controls and other syndrome samples from our EKD. First, . Each dot represents one probe and significantly changed probes are in red. No significantly changed were found for the PHDMS small deletion samples. d The 11 probes identified using all PHMDS samples were used for MDS analysis and were not able to clearly distinguish between PHMDS samples and controls. Besides the controls used for probe selection (training), additional controls from our database (testing) were tested. e The 1022 probes identified using the PHMDS large deletion samples were able to clearly distinguish between PHMDS and control samples by MDS. f Hierarchical clustering using the same probes and samples as in e. A distinct cluster mainly representing hypermethylation events, and the other 11 subjects cluster with controls. The top lane in the heatmap indicates the phenotype. The heatmap color scale from blue to red represents the range of the methylation levels (beta values) between 0 and 1 the entire PHMDS-Large Del cohort was classified by the model. The correct classifications were assigned to all subjects predicted to have PHMDS with typical large deletion, with scores close to 1 and significantly different from controls. PHMDS Small Del/Mut individuals were assigned a score close to 0, similar to controls. To measure the specificity of the classifier, we tested DNA methylation profiles from over 1500 subjects with a confirmed diagnosis of a neurodevelopmental disorders, including trinucleotide repeat expansion abnormalities, imprinting defect disorders, BAFopathies, Mendelian disorders of the epigenetic machinery, Down syndrome, as well as 185 subjects with non-syndromic ASD (Fig. 3). All samples were classified as controls, further confirming the specificity of the PHMDS-Large Del classifier.

CNV assessment from EPIC array
Using the normalized raw methylated and unmethylated intensities from EPIC array, we were able to identify Copy Number Variants (CNV) in this patient cohort. Of the 17 individuals with known deletions at 22q13, including 2 individuals with SHANK3 deletion only, we were able to identify all deletions, except for the mosaic case ( Table 3). The deletion sizes range from 0.013 to 5.98 Mb. Strong correlation of the genomic coordinates and deletion sizes from EPIC versus chromosome microarray (CMA) testing was observed, proving that EPIC array can accurately be used to detect microdeletions at 22q13 in addition to providing information regarding the epi-signature. The mosaic case is undetermined as we were unable to detect the deletion or the epi-signature. The level of mosaicism in this sample is unknown and could be under the limit of detection of EPIC array.

Identification of the PHMDS DNA methylation epi-signature critical genomic region
We next evaluated the genomic characteristics of PHMDS individuals that present with an epi-signature versus individuals without a signature. Figure 4 shows the genomic location of the 22q13 deletions in 16 individuals, of whom 11 presented with epi-signature (PHMDS-Large Del cohort), and 5 other who do not show epi-signature. This analysis lets us narrow down the critical genomic region (Chr22: 49,238,248,907) shared between all subjects with epi-signature and not shared with subjects without the epi-signature. The only fully contained protein-coding gene in the PHMDS DNA methylation epi-signature critical region is BRD1 (bromodomaincontaining protein 1; OMIM# 604589). In addition, this critical region of overlap contains the transcript C22orf34 and exon 1 of ZBED4 (NM_014838.2), as well as the nonprotein-coding RNAs LOC100128946 and LOC90834.

Metabolic profiling
When compared to 50 control lymphoblastoid cell lines (LCLs), the 9 cell lines from individuals with small 22q13 deletions (< 1 Mb) or pathogenic variants of SHANK3 showed only one well that reached statistical significance according to adjusted p value: α-d-glucose from the tryptophan (Trp) plate (adjusted p value = 0.006). However, in 11 cell lines from individuals with large 22q13 deletions (> 1 Mb), 340 out of the total 776 wells (43.8%)  Table S3.
reached an adjusted p value < 0.05. Of these 340 wells, all but two (containing sodium chloride at two different concentrations, adjusted p values = 0.027 and 0.033) showed reduced NADH production in the samples with large deletions (Additional file 3: Figure S3, Table 4, Additional file 5: Table S4, Aditional file 6: Table S5). Assessment of individual plates showed that no particular patterns emerged in the small-deletion cohort, except for a reduced capacity of utilizing amino acids as energy sources, including tryptophan from the Trp plate, lower levels of NADH in the presence of 3-isobutyl-1-methylxanthine (3 out of 6 wells) and higher levels in the presence of sodium chloride (3 out of 6 wells).
Alternatively, the large deletion cohort showed several patterns of disrupted metabolic pathways. The PM-M1 plate showed a pattern of reduced utilization as energy source of the main carbohydrates-α-d-Glucose (p = 0.01), confirmed also by the data from the Trp plate (p = 0.00007), fucose (p = 0.007), galactose (p = 0.002), mannose (p = 0.017), dextrin (p = 0.029)-and their phosphorylated or methylated forms, of intermediates A SVM classifier was used to generate a score from zero to one for each subject as the probability of having a DNA methylation profile similar to what is observed in the PHMDS epi-signature. The Y-axis represents scores generated for each of the patient/control individuals on the X-axis. Every circle represents a single sample. The PHMDS large deletion samples, the matched controls used for probe selection and 75% of all other samples (controls and samples from patients with other neurodevelopmental syndromes) were used for training the classifier (blue). The PHMDS small deletion samples and remaining 25% of the controls and neurodevelopmental samples were used for testing (grey). Other neurodevelopmental conditions include subjects diagnosed with imprinting defects (Angelman, Prader-Willi, Beckwith-Wiedemann and Silver-Russell syndromes), non-syndromic autism spectrum disorders, BAFopathies, RASopathies, autosomal dominant cerebellar ataxia, deafness, and narcolepsy, ATRX, Borjeson-Forssman-Lehmann syndrome, Coffin-Lowry, Cornelia de Lange, CHARGE, Claes-Jensen, Down, Dup7, Floating-Harbor, Fragile X, Genitopatellar, Kabuki, Kleefstra, Rett, Sotos, Weaver, Cerebellar ataxia, deafness, and narcolepsy, immunodeficiency-centromeric instability-facial anomalies syndrome 1, Epileptic encephalopathy, Koolen-De Vries syndrome, SBBYSS syndrome, Rubinstein-Taybi syndrome 1, Rhaman syndrome, Helsmoortel-van der Aa syndrome, Autosomal dominant Mental retardation, X-linked Mental retardation, Tatton-Brown-Rahman syndrome Wiedemann-Steiner syndrome and Williams syndromes of the Krebs cycle like pyruvic (p = 0.026) and a-ketobutyric acids (p = 0.019), ketone bodies like acetoacetic (p = 0.027) and β-hydroxy-butyric acid (p = 0.003) and molecules with important roles in the nervous tissue, such as n-acetyl-neuraminic acid (p = 0.018) and γ-amino-butyric acid, or GABA (p = 0.007). The large deletion cohort showed no significant differences in the utilization of alternative energy sources. PM-M2 to M4 and Trp plates in the large-deletion cohort demonstrated a pattern of generalized reduction of the utilization of amino acids and dipeptides as energy sources, with over-representation of large amino acids, such as Trp, Phe, Tyr, Val, Leu and Ile (p < 0.05). Exposure to many ionic species caused reduced NADH levels in cells carrying large 22q13 deletions, particularly the ones containing sodium, namely sodium chloride (p = 0.0001 to 0.045), sodium tungstate (p = 0.00007 to 0.007), sodium phosphate (p = 0.0006 to 0.003), sodium pyrophosphate (p = 0.00006 to 0.0003) and sodium nitrate (p = 0.0003 to 0.008), and the ones containing chloride, such as lithium chloride (p = 0.0008 to 0.01), ferric chloride (p = 0.007 to 0.049), ammonium chloride (p = 0.0003 to 0.007), zinc chloride (p = 0.018 to 0.046), copper chloride (p = 0.001 to 0.007) and cobalt chloride (p = 0.0007 to 0.017). Overall 171 out of 288 wells (59.4%) from the combined PM-M6 to M8 plates showed reduced energy production in cells carrying large 22q13 deletions (adjusted p values < 0.05). Among the metabolic effectors showing the most significantly abnormal values there were epinephrine (p = 0.0001 to 0.0008), steroids and related regulatory hormones, like progesterone (p = 0.0001 to 0.002), dexamethasone (p = 0.0003 to 0.01), 4,5-α-dihydrotestosterone (p = 0.001 to 0.005), LH (p = 0.0009 to 0.007) and LH-RH (p = 0.006 to 0.024), hormones involved in the thyroid function and regulation, such as triiodothyronine (p = 0.00002 to 0.005), TSH (p = 0.0009 to 0.01) and TRH (p = 0.016 to 0.035), hormones with anabolic or pro-digestive effects, like resistin (p = 0.0002 to 0.005), ghrelin (p = 0.0003 to 0.002) and gastrin (p = 0.0004 to 0.007), growth factors, like somatotropin (p = 0.0002 to 0.002) and fibroblast growth factor 1, FGF-1 (p = 0.0002 to 0.002), and cytokines, such as IL-1β (p = 0.0002 to 0.005), IL-2 (p = 0.007 to 0.03), IL-6 (p = 0.002 to 0.023) and IFN-γ (p = 0.012 to 0.031).

Discussion
This study describes a novel genome-wide DNA methylation epi-signature associated with PHMDS. By assessing a patient cohort with variable breakpoints within the 22q13.3 locus and individuals with SHANK3 mutations, we identified a critical region including the BRD1 gene which is required for the PHMDS epi-signature. Individuals with DNA methylation epi-signature showed significantly different metabolomic profiles when compared to the epi-signature negative individuals with small deletions sparing BRD1 and with SHANK3 mutation, indicating evidence of two clinical subtypes within PHMDS that are distinct at the molecular and phenotypic levels.
The genomic region associated with PHMDS varies in size and contains multiple protein-coding genes as well as miRNAs and other non-coding RNAs. Deletions, duplications and mutations involving the SHANK3 gene within the chromosomal region 22q13.33 are considered to be responsible for many of the neurological findings that can be seen in PHMDS subjects [2,7,8,29]. Severity of phenotype has been correlated with an increased deletion size in multiple studies [8,13,30]. However, there are reports of smaller deletions in subjects with some features of a more severe phenotype including expressive speech delay [31], as well as interstitial deletions not involving the SHANK3 gene in subjects with clinical features of PHMDS [11]. These and many additional studies have suggested that other, more proximal genes, are responsible for the severe phenotypic features observed in individuals with larger deletions of 22q13 [4,11,13,32]. Currently, it is unclear whether other genes within the 22q13 region contribute to the neurological and additional phenotypic features of the disorder. Our findings show that only individuals with large deletions (2 to 6 Mb) of 22q13 have a distinct epigenetic signature despite the disruption of SHANK3 in all individuals (Fig. 4). This suggest that SHANK3 is not responsible for the methylation alterations observed in PHMDS.
The common region of overlap within the large deletion cohort contains the genes BRD1, C22orf34 and the first exon of ZBED4, as well as the non-protein-coding RNAs LOC100128946 and LOC90834. Of these genes the bromodomain containing protein 1 (BRD1) gene at chromosome region 22q13.33 is the most likely candidate for the observed genome-wide DNA methylation defects observed in these individuals. BRD1 is a component of a histone acetyltransferase complex that can stimulate acetylation of histone H3 [33,34] through interactions with chromatin remodeling proteins, e.g., PBRM 1 and histone modifiers, e.g., MYST2 and SUV 420 H1 [33]. Disruption of histone acetylation mechanisms are consistent with our findings showing predominant DNA hypermethylation in PHMDS individuals whose deletions include the BRD1 gene.  Currently there is a limited genotype-phenotype association of BRD1 loss in individuals with PHMDS. In one study focused on identification of 22q13 genes most likely to contribute to PHMDS, BRD1 was estimated to result in a high loss of function intolerance in these individuals [17]. This gene has been implicated in susceptibility to schizophrenia and bipolar disorders due to its effect on transcriptional regulation of numerous genes and brain development [33,[35][36][37]. Although psychiatric presentation of PHMDS is not well characterized, there are a few cases reported with schizophrenia and other psychiatric features [38,39].
Evaluation of the differentially methylated regions identified in this cohort of individuals with large deletions at 22q13 indicates several genes that may play a role in the pathophysiology of this syndrome (Additional file 4: Table S3). For example, the LAMA1 gene is associated with autosomal recessive Poretti-Boltshauser syndrome, characterized by cerebellar dysplasia, myopia, variable retinal dystrophy and eye movement abnormalities ataxia, delayed motor development, language impairment and intellectual disability. IFT140 is associated with autosomal recessive Mainzer-Saldino syndrome, a disorder characterized by kidney disease, eye problems and skeletal abnormalities. EBF4 and ARPP21 are both involved in brain development and neurotransmission. Further studies are necessary to evaluate the effect of methylation change on transcriptional activity of the related proteins and their impact on the clinical phenotype in PHMDS.
Consistent with genome-wide disruption in DNA methylation, metabolic profiling of subjects with the large 22q13 deletions including BRD1 gene showed a clearly recognizable metabolic profile, characterized, among the other features, by a reduced capacity of the cells to produce energy in the presence of high-efficiency energy sources, a decreased ability to adjust to metabolic environments influenced by different concentrations of ionic species and an abnormal response to hormones and cytokines involved in regulating growth, proliferation, energy storage and inflammation. PHMDS subjects with the small 22q13 deletions sparing BRD1 gene or with SHANK3 pathogenic variants are metabolically undistinguishable from controls. The two cohorts shared only a reduced utilization of amino acids as energy sources, although the values were more significant in the cases with large deletions. This metabolic feature has been reported in cases with isolated ASD [40,41], and all 20 individuals tested by the Biolog arrays presented autistic traits; therefore, it is possible to infer that in individuals with PHMDS the presence of autism as well as the disruption of the metabolic pathways involved in amino acids utilization are not influenced significantly by the size of the 22q13 deletion. Taken together, our findings demonstrated that individuals with large deletions encompassing BRD1 gene present with robust epigenetic and metabolic alterations. The resulting metabolic alteration may be caused by either or both the epigenetic alteration, which can alter the expression of critical metabolic genes, and/or loss of a larger genetic material, which may contain genes for metabolic pathways.
Comparison of the clinical features presented by the individuals with PHMDS in the PHMDS Large Del versus

Table 4 Selected significant compounds identified by metabolic profiling of cell lines from the PHMDS (n = 11) and PHMDS Small Del/Mut (n = 9) cohort
For the selected relevant compounds, we considered trends based on unadjusted p values to highlight the clustering of metabolic findings suggestive of specific pathways the PHMDS Small Del/Mut cohorts showed no significant differences in the most common traits, involving neurobehavioral issues, such as ASD or autistic traits and other behavioral or mental disorders, and other neurological signs and symptoms, such as seizures, sleep disorders and hypotonia (Table 3). These findings are in line with the widely accepted mechanistic model suggesting a central role for SHANK3 haploinsufficiency or disruption in the pathogenesis of most neurological features of PHMDS [4,7,10]. Other neurological traits such as abnormal regulation of body temperature and high tolerance to pain seem to be more frequent among individuals with large 22q13 deletions, but clinical data about these traits were available only for a limited number of individuals. Finally, non-neurological problems, involving gastrointestinal, renal and muscular/skeletal systems, resulted more represented in the cohort with large deletions, although also in these cases information was available for a limited number of cases. As observed in the metabolic findings, this greater involvement of different systems in individuals with large 22q13 deletions is likely reflecting the disruption of more genes and, consequently, more pathways, due to the abnormal methylation mediated by the loss of BRD1. Taken together, we showed that PHMDS individuals with the large 22q13 deletions including the BRD1-critical region present with robust metabolic, genomic and epigenomic alterations that likely contribute to a more severe and variable phenotype in PHMDS. Identification of individuals with loss of function mutations in the BRD1 gene specifically will help more precisely determine the contribution of BRD1 to the more complex range of presentations in the genetically heterogeneous PHMDS subtypes.
Neurodevelopmental disorders, including autism, schizophrenia, Down syndrome, Rett and Fragile X syndromes, Phelan-McDermid, Sotos, Kleefstra, Coffin Lowry and ATRX syndromes, and the disorders of imprinting, Angelman and Prader-Willi syndromes, are accompanied by aberrant epigenetic regulation of processes critical for normal and brain development [42]. However, the direct role of this methylation alteration on the pathophysiology of many of these diseases still needs further investigation. Our group has previously identified epi-signatures that can be used to specifically diagnose subjects with a variety of neurodevelopmental conditions, including syndromes that may clinically overlap with PHMDS [22]. The use of DNA methylation signatures can help solve many clinically ambiguous cases presenting with a neurodevelopmental phenotype.
While whole-genome chromosome microarray and sequencing techniques remain the standard approach for assessment of individuals with suspected PHMDS, a genomic DNA methylation array may be used to augment this approach. Previous studies have highlighted the ability to detect copy number variants in DNA methylation arrays at comparable sensitivity to other established methods, e.g., chromosomal microarray [43][44][45]. We have shown the efficacy of applying computational methods to methylation data from the EPIC assay to determine copy number variants in individuals with PHMDS. There were no significant differences in the size or the coordinates identified by the EPIC assay versus the detection method used at time of diagnosis ( Table 2). Implementation of whole genome methylation array in PHMDS diagnosis and potentially in other microdeletion syndromes will enable the identification of epi-signatures as well as determination of the deletion/duplication in a single test.

Conclusion
The capability of the methylation array to detect both the methylation epi-signatures from a variety of clinically related neurodevelopmental disorders and copy number variants on the same array may have the potential to be applied as a more informative and cost-effective first-tier test to screen individuals with a broad range of developmental disorders. In addition, the identification of a PHMDS epi-signature in an individual may suggest the development of specific PHMDS phenotypes with a more severe and variable presentation, including risk for schizophrenia. Finally, investigation of the genes affected by the abnormal DNA methylation may lead to the identification of novel targets for more personalized treatment approaches.

Study cohort
This patient cohort included 22 individuals, 11 males and 11 females, with diagnosis of PHMDS referred for genetic testing at the Greenwood Genetic Center, USA, and Federico II University, Italy. The current diagnosis criteria for Phelan-McDermid syndrome include both clinical findings and detection of a heterozygous deletion of chromosome 22q13.3 with involvement of at least part of SHANK3, or a heterozygous pathogenic variant in SHANK3 on molecular genetic testing (https ://www. ncbi.nlm.nih.gov/books /NBK11 98/). The clinical criteria for the diagnosis include developmental and language delay, hypotonia, autistic traits and/or other behavioral problems, seizures, sleep disturbances and minor dysmorphic traits. The cases were reviewed by Drs. Curtis Rogers and Luigi Boccuto, who possess vast experience in PHMDS. After initial molecular/cytogenetic assessment, 15 individuals had 22q13 deletions, one of which was mosaic, with deletion sizes ranging from 0.06 to 5.87 Mb, 2 individuals with a SHANK3 gene deletion (0.013 Mb) and 5 individuals with SHANK3 loss-of-function pathogenic variants. Ages spanned from 3 to 19 years, with a median age of 10 years and a mean age of 10.3 years.
Genomic methylation analysis was performed on peripheral blood-extracted genomic DNA. Samples were subdivided into two cohorts: typical PHMDS cohort (referred to as PHMDS-Large Del), including 11 cases with large (> 1 Mb) deletions at 22q13; and small deletion/SHANK3 mutation cohort (referred to as PHMDS Small Del/Mut), including 10 cases with 22q13 deletions smaller than 1 MB and SHANK3 intragenic mutations. A mosaic case with typical deletion was included in the PHMDS Small Del/Mut cohort.
The set of controls that were used for mapping the episignatures, feature selection and model training were chosen from our EpiSign Knowledge Database (EKD). Genomic DNA methylation profiles from individuals with other congenital syndromes that are commonly involved on the differential diagnosis of PHMDS were also used for assessment of the specificity of the epi-signature (EpiSign Knowledge Database [22]) and included a cohort of individuals with Prader-Willi syndrome and Angelman syndrome, Fragile X syndrome, Rett syndrome, FG syndrome 1, Sotos syndrome and ASD.
Any subject used herein to represent a condition had a confirmed clinical diagnosis of the aforementioned syndrome and was screened for mutations in the related genes. The mutation report from every individual was reviewed according to the American College of Medical Genetics and Genomics (ACMG) guidelines for interpretation of genomic sequence variants [46], and only individuals confirmed to carry a pathogenic or likely pathogenic mutation together with the clinical diagnosis were used to represent a syndrome.

Cytogenetic analysis
Genetic rearrangements were measured from whole blood DNA specimens by chromosome microarray analysis (CMA) using a custom 4 × 44 K 60-mer oligo array designed to cover 22q12.3-terminus by Oxford Gene Technology (Oxford, UK) as described by Sarasua et al. [13]. CMA genomic coordinates of breakpoints were established according to the 2006 human genome build (NCBI 36/HG 18) and converted from hg18 to hg19 using the UCSC LiftOver tool [47].

Sanger sequencing
Sequencing of SHANK3 was completed using standard Sanger sequencing protocol including coding sequences ± 20 nucleotides intronic. All variants identified by direct sequencing were analyzed using available bioinformatic websites to assess their potentially deleterious effect on the proteins and classified according to ACMG guidelines [46].

Methylation array and quality control
DNA methylation protocol, analysis and epi-signature construction were performed according to our previously published protocol [22,23,48,49]. Peripheral whole blood DNA was extracted using standard techniques. Following bisulfite conversion, DNA methylation analysis of the samples was performed using the Illumina Infinium methylation EPIC bead chip arrays (San Diego, CA), according to the manufacturer's protocol. The resulting methylated and unmethylated signal intensity data were imported into R 3.5.1 for analysis. Normalization was performed using the Illumina normalization method with background correction using the minfi package [50]. Probes with detection p value > 0.01, those located on chromosomes X and Y, those known to contain SNPs at the CpG interrogation or single nucleotide extension, and probes known to cross-react with chromosomal locations other than their target regions were removed, resulting in 753,265 probes remaining for the analysis. Arrays with more than 5% failure probe rate were excluded from the analysis. Sex of the subjects was predicted using the median signal intensities of the probes on the X and Y chromosomes, and those samples discordant between the labeled and predicted sex were not used for analysis. All of the samples were examined for genome-wide methylation density, and those deviating from a bimodal distribution were excluded. Factor analysis using a principal component analysis (PCA) was performed to examine the batch effect and identify the outliers.

Selection of matched controls for methylation profiling
For mapping the epi-signature (probe and feature selection), matched controls were randomly selected from our EpiSign Knowledge Database (EKD). All of the PHMDS samples were assayed using the EPIC array. Therefore, all controls selected for epi-signature identification were analyzed using the same array type (EPIC). Matching was done by age, sex and batch using the MatchIt package. The control sample size was increased until both the matching quality and sample size were optimized and consistent across all analyses. This led to the determination of a control sample size four times larger than that of the cases in every analysis. Increasing the sample size beyond this value compromised the matching quality. After every matching trial, a PCA was performed to detect outliers and examine the data structures. Outlier samples and those with aberrant data structures were removed before a second matching trial was conducted. The iteration was repeated until no outlier sample was detected in the first two components of the PCA.

DNA methylation profiling of Phelan-McDermid syndrome
The methylation level for each probe was measured as a beta value, calculated from the ratio of the methylated signals versus the total sum of unmethylated and methylated signals, ranging between zero (no methylation) and one (full methylation). This value was used for biological interpretation and visualization. For linear regression modeling, beta values were logit transformed to M values using the following equation: log2(beta/(1 − beta)). A linear regression model using the limma package [51] was used to identify the differentially methylated probes. The analysis was adjusted for blood cell-type compositions, estimated using the algorithm developed by Houseman et al. [52]. The estimated blood cell proportions were added to the model matrix of the linear models as confounding variables. The generated p values were moderated using the eBayes function in the limma package and were corrected for multiple testing using the Benjamini and Hochberg method. Probes with a corrected p value < 0.01 and a methylation difference greater than 10% were considered significant. The effect size cutoff of 10% was chosen to avoid reporting of probes with low effect size and those influenced by technical or random variations as conducted in our previous studies [23,49]. The analysis was repeated 3 times: 22 PHMDS samples with 88 matched controls, 11 PHMDS-Large Del with 44 matched controls, 11-Small Del with 44 matched controls. Because only the cohort of individuals with PHMDS with large deletions (> 1 MB) showed significant methylation difference, this and the following steps were performed using only the cohort with large deletions (referred to as PHMDS Large Del).

Clustering and dimension reduction
Following every analysis, the selected probes were examined using a hierarchical clustering and a multiple dimensional scaling to examine the structure of the identified epi-signature. Hierarchical clustering was performed using Ward's method on Euclidean distance by the gplots package. Multiple dimensional scaling was performed by scaling of the pair-wise Euclidean distances between the samples.

Leave-2-out cross-validation using PHMDS large deletion samples
For each round of validation, nine of the eleven PHMDS large deletion samples were used for probe selection along with matched controls and the remaining two PHMDS large deletion samples were saved for testing. Multidimensional scaling was used to cluster the samples. This was repeated 55 times for each combination of pairs of PHMDS-Large Del samples.

Identification of the differentially methylated regions
To identify genomic regions harboring methylation changes (differentially methylated regions-DMRs), the DMRcate algorithm was used [28]. First, the p values were calculated for every probe using multivariable limma regression modeling. Next, these values were kernel smoothed to identify regions with a minimum of three probes no more than 1 kb apart and an average regional methylation difference > 10%. We selected regions with a Stouffer transformed false-discovery rate (FDR) < 0.01 across the identified DMRs. The analysis was performed on the same sets of cases and controls used for methylation profiling and was adjusted for blood celltype compositions. Gene ontology analysis with the differentially methylated genes was performed using http:// www.webge stalt .org/.

Construction of a classification model for Phelan-McDermid syndrome
A classification model, referred to as methylation variant pathogenicity (MVP) score, was created to assess the specificity of the identified methylation signature using all of the identified probes. We trained a support vector machine (SVM) with linear kernel on the PHMDS-Large Del cases and controls. For classifier/model training we compared the 11 PHMDS-Large Del samples to the same 44 controls from probe selection, plus 75% of the remaining controls and 75% of the other syndrome samples from our EKD. For classifier testing we used the 11 PHMDS small deletion samples plus the remaining 25% of controls and other syndrome samples from our EKD. The EKD samples include both EPIC and 450 K samples to ensure the classifier works with both array types. Given the majority of the samples to be tested later were assayed using 450 k array, we limited the analysis to probes shared by both array types. Training was done using the e1071 R package. To determine the best hyperparameter used in linear SVM (cost-C), and to measure the accuracy of the model, tenfold cross-validation was performed during the training. In this process, the training set was randomly divided into tenfold. Ninefold was used for training the model and onefold for testing. After tenfold repeating of this iteration, the mean accuracy was calculated, and the hyperparameters with the most optimal performance were selected. For every subject, the model was set to generate a score ranging 0-1, representing the confidence in predicting whether the subject has a DNA methylation profile similar to PHMDS-Large Del. Conversion of SVM decision values to these scores was done according to the Platt's scaling method [53].
The class obtaining the greatest score determined the predicted phenotype. A classification as PHMDS was made when a sample received the greatest score for that class (normally greater than 0.5). The final model was applied to both training a large cohort of individuals with other neurodevelopmental disorders as well as a group of healthy controls to determine the specificity of the signature.

Validation of the classification model
We ensured that the model is not sensitive to the batch structure of the methylation experiment by applying it to all of the samples assayed on the same batch as the cases used for training. To confirm that the classifier is not sensitive to the blood cell-type compositions, we downloaded methylation data from isolated cell populations of healthy individuals from GEO (GSE35069) [54], supplied them to the classification model for prediction and examined the degree to which the scores were varied across different blood cell types. Next, the model was applied to the patient cohort to evaluate the predictive ability of the model on affected subjects. To determine the specificity of the model, we supplied a large number of DNA methylation arrays from healthy subjects to the model. To understand whether this model was sensitive to other medical conditions presenting with neurodevelopmental disorders and intellectual disabilities, we tested a large number of subjects with a confirmed clinical and molecular diagnosis of such syndromes by the model. These subjects are part of the EpiSign Knowledge Database housed at London Health Sciences and include > 5000 individuals with various neurodevelopmental disorders and non-affected controls. Information about this database can be found in our previous publications [22,23].

Copy number variation (CNV) analysis using methylation array data
To estimate the copy number alterations in the PHMDS samples from the infinium methylation array, the raw methylated and unmethylated intensities from every PHM This may reflect the difficulty DS sample and the same number of controls from the same batch were summed, and quantile normalized using the preproc-essCore package (Bioconductor.org). The normalized matrix values in all samples were divided by the median values of every probe across the normal samples. The divided ratios were then log10 transformed, smoothed and segmented using the DNAcopy Bioconductor package (Bioconductor.org) to identify genomic regions in every sample showing a copy number change. A p value of < 0.01 obtained from 10,000 permutations was used to define a change point during segmentation. Neighboring segments with an average difference in ratio of < 0.05 were joined before a visual comparison of the ratio plots, and the identified breakpoints led to the determination of the CNV coordinates.

Metabolic profiling Lymphoblastoid cell lines (LCLs)
Peripheral blood samples were collected from 20 individuals with PHMDS by venipuncture (individuals MS2676 and MS2675 were excluded from this analysis). Lymphoblastoid cell lines (LCLs) were obtained by immortalization via Epstein-Barr virus of lymphocytes isolated from the blood samples. The lymphoblastoid cell lines were harvested in Sigma RPMI-1640 with 15% fetal bovine serum (FBS) from Atlanta Biological (Flowery Branch, GA, USA) and 2 mM l-Glutamine, 100 U/mL Penicillin and 100 µg/mL Streptomycin from Sigma-Aldrich (St. Louis, MO, USA).

Metabolic profiling via Biolog Phenotype Mammalian MicroArrays (PM-Ms).
Metabolic profiling was measured to assess impact of position effects on metabolism. The Phenotype Mammalian MicroArray (PM-M) developed by Biolog (Hayward, CA, USA) is designed to measure the cellular production of NADH (nicotinamide adenine dinucleotide, reduced form) in the presence of different compounds. Lymphoblastoid cell lines were used to measure metabolic dysregulation in Biolog Metabolic Arrays. These cell lines generated from the patient's blood sample via Epstein-Barr virus transfection were counted utilizing a TC20 ™ Automated Cell Counter in order to measure the amount and the percentage of viable cells. The methodology employs 96-well microplates with diverse molecules, which act either as energy sources (plates PM-M1 to M4) or as metabolic effectors (plates PM-M5 to M8). Each well contains a single chemical, and production of NADH per well is monitored using a colorimetric redox dye chemistry. The energy sources include carbohydrates, nucleotides, carboxylic acids and ketone bodies in plate PM-M1, amino acids, both alone and as dipeptides (plates PM-M2 to M4). The metabolic effectors include ions (PM-M5), hormones, growth factors and cytokines (PM-M6 to M8). These metabolic effectors (PM-M5-M8) were tested in different concentrations for each compound (Additional file 6: Table S5). The custom tryptophan plate (Trp) generated by Biolog in collaboration with Greenwood Genetics Center (GGC) was also employed in consideration of previously published data showing decreased utilization of tryptophan as energy source by cells from individuals with ASD [41]. This plate is constituted by twelve 8-well columns containing glucose, empty well, tryptophan alone and 5 dipeptides in which tryptophan is combined, respectively, with glycine, lysine, leucine, arginine and alanine. Overall, 776 wells were analyzed for each cell line: 96 for each plate from PM-M1 to PM-M8 (768) and 8 from the PM-Trp plate. A list of chemicals used in the 776 wells is included in Additional file 5: Table S4. PM-M plates were incubated with 20,000 lymphoblastoid cells per well (40,000/well for the Trp plate) in a volume of 50 μl, using the modified Biolog IF-M1 medium. The medium for plates PM-M1 to M4 was prepared by adding the following to 100 mL of Biolog IF-M1: 1.1 mL of 100 × penicillin/streptomycin solution, 0.16 mL of 200 mM Glutamine (final concentration 0.3 mM) and 5.3 mL of fetal bovine serum (final concentration 5%). For the Trp plate 1.1 mL of fetal bovine serum was added instead of 5.3 mL, for a final concentration of 1%. For plates PM-5 to M8, 5.5 mL of 100 mM glucose (final concentration 5%) was added in place of the fetal bovine serum. The cells were incubated for 48 h at 37 °C in 5% CO 2 . After this first incubation, Biolog Redox Dye Mix MB was added (10 μL/well) and the plates were incubated under the same conditions for an additional 24 h, during which time the cells metabolize the sole carbon source in the well. As the cells metabolize the carbon source, tetrazolium dye in the media is reduced, producing a purple color according to the amount of NADH generated. During the 24 h of exposure to the dye, the plates were incubated in the Omnilog system, which measured the optical density of each well every 15 min, generating 96 data points. The information collected during the 24 h was analyzed by the kinetic software of the system to generate kinetic curves of the NADH generation for each well and calculate kinetic parameters, such as slope, endpoint and area under the curve. At the end of the 24-h incubation, the plates were analyzed utilizing a microplate reader with readings at 590 nm and 750 nm. The first value (A 590 ) indicated the highest absorbance peak of the redox dye, and the second value (A 750 ) gave a measure of the background noise. The relative absorbance (A 590-750 ) was calculated per well.
For Phenotype Mammalian data, the absorbance endpoint readings and the 96 datapoints of kinetic optical density collected over the 24 h of incubation with the tetrazolium dye in the Omnilog system were used for data normalization and statistical analysis using R (opm R package) [55]. The readings were normalized using the triplicate absorbance readings from the corresponding empty plate (plates run with no cells, just media and dye). These values were then transformed to a logarithmic scale for the analysis and compared versus the average values generated by 50 lymphoblastoid cell lines from healthy controls. The control samples were obtained from peripheral blood samples of 50 typically developing subjects from North and South Carolina (USA), 24 males and 26 females (male-to-female ratio 0.92), whose age at blood sampling ranged from 1.2 to 10.3 years. Our goal was to identify the wells in which the levels of NADH generated by PHMDS cells were significantly different from the ones measured in controls. We utilized the R package to implement the nonparametric Mann-Whitney approach of a two-sided t test with the cutoff of p value ≤ 0.05 (https ://www.rdocu menta tion.org/packa ges/stats /versi ons/3.6.2/topic s/wilco x.test). The generated p values were corrected for multiple testing using the Benjamini and Hochberg method using the R package (https ://www.rdocu menta tion.org/packa ges/stats / versi ons/3.6.2/topic s/p.adjus t) to obtain adjusted p values. The Mann-Whitney test was performed for the large-and small-deletion cohort against the controls separately. Using this approach, we were able to identify the metabolites differentially metabolized between cases with PHMDS and controls.