Genome-wide epigenomic profiling for biomarker discovery

A myriad of diseases is caused or characterized by alteration of epigenetic patterns, including changes in DNA methylation, post-translational histone modifications, or chromatin structure. These changes of the epigenome represent a highly interesting layer of information for disease stratification and for personalized medicine. Traditionally, epigenomic profiling required large amounts of cells, which are rarely available with clinical samples. Also, the cellular heterogeneity complicates analysis when profiling clinical samples for unbiased genome-wide biomarker discovery. Recent years saw great progress in miniaturization of genome-wide epigenomic profiling, enabling large-scale epigenetic biomarker screens for disease diagnosis, prognosis, and stratification on patient-derived samples. All main genome-wide profiling technologies have now been scaled down and/or are compatible with single-cell readout, including: (i) Bisulfite sequencing to determine DNA methylation at base-pair resolution, (ii) ChIP-Seq to identify protein binding sites on the genome, (iii) DNaseI-Seq/ATAC-Seq to profile open chromatin, and (iv) 4C-Seq and HiC-Seq to determine the spatial organization of chromosomes. In this review we provide an overview of current genome-wide epigenomic profiling technologies and main technological advances that allowed miniaturization of these assays down to single-cell level. For each of these technologies we evaluate their application for future biomarker discovery. We will focus on (i) compatibility of these technologies with methods used for clinical sample preservation, including methods used by biobanks that store large numbers of patient samples, and (ii) automation of these technologies for robust sample preparation and increased throughput.


Background
Within fundamental and clinical research and in clinical practice, biomarkers play an important role to facilitate disease diagnosis, prognosis, and selection of targeted therapies in patients. As such, biomarkers are critical for personalized medicine to improve disease stratification: the identification of groups of patients with shared (biological) characteristics, such as a favorable response to a particular drug [1,2]. Biomarkers need to fulfill a number of requirements, the most important of which is to show high predictive value. From a practical perspective, the detection method for a biomarker must be accurate, relatively easy to carry out, and show high reproducibility [3].
Over the last decade, interest in biomarkers has increased owing to rapid developments in high-throughput molecular biology technologies capable of identifying "molecular biomarkers" [4,5]. Molecular biomarkers possess a critical advantage over more traditional biomarkers during the exploratory phase of biomarker discovery, as many candidate molecular biomarkers can be assayed in parallel. This particularly involves screening of (epi)genomic features at a genome-wide scale, often making use of powerful next-generation sequencing (NGS)-based technologies. These screens can assess very large numbers of loci for the presence or absence of a certain (epi)genomic feature. Subsequently, these loci can be evaluated as potential biomarkers by determining how well they discriminate between samples with different characteristics, for example, by comparing healthy versus diseased tissue.
To be suitable for biomarker discovery, (epi)genomic profiling assays need to fulfill a number of important requirements. To accommodate sample collection for batch processing, clinical samples are often preserved by freezing or by formaldehyde crosslinking. Therefore, an important requirement for (epi)genomic biomarker screening technologies is that these are compatible with processed samples. Additionally, this allows inclusion of clinical samples that have been processed for biobanking, or the use of such samples for replication or validation. Biobanks collect large numbers of samples such as tissues or DNA (deoxyribonucleic acid) and the associated patient information, which is highly valuable for retrospective biomarker studies [6][7][8][9]. Exploratory screens for candidate biomarkers mainly rely on patient specimens, which are obtained in small quantities, and biobanks likewise often contain limited quantities of patient material. Therefore, a second requirement is that assays used for biomarker discovery are compatible with miniaturization to allow processing of low-input samples. Furthermore, robust biomarker discovery depends on the screening of large numbers of samples due to the inherent clinical and biological variability between patient samples [10]. Assays used for biomarker discovery therefore benefit from automation and digitalization, facilitating upscaling while reducing the chance of errors due to human handling.
Genomic features that are utilized for molecular biomarker discovery can be separated into two categories: (i) changes in the DNA sequence itself, such as mutations and rearrangements, and (ii) changes in the epigenome, represented by molecules and structures associated with the DNA such as DNA methylation and post-translational histone modifications. This review will focus on the latter category, as recent developments in epigenetic profiling technologies have not only greatly increased our knowledge of epigenetic regulation, but also allow for large-scale discovery of molecular epigenetic biomarkers. The first section of this review provides an overview of epigenetic features and how these can be assayed. We discuss how misregulation of epigenetic processes may lead to disease, providing a mechanistic rationale for the use of epigenetic features as biomarkers. The feasibility of applying epigenetic biomarkers in the clinic is demonstrated by examples of DNA methylation biomarkers that have reached clinical stages. In the second part of this review, we will focus on current genome-wide epigenomic profiling technologies, and whether these are already or will likely become compatible with biomarker discovery in the near future. We will evaluate these approaches with three criteria in mind: (i) the possibility to use frozen or chemically fixed material in these assays, (ii) compatibility with miniaturization and single-cell profiling, and (iii) the current level of automation.

Main text
The epigenome
Within a eukaryotic cell, the DNA is packaged to fit into the small volume of the nucleus in a highly organized fashion. The basic unit of chromatin involves the DNA wrapped around nucleosomes consisting of two copies of each of the core histones H2A, H2B, H3, and H4: the so-called beads-on-string structure [11]. Subsequent compaction leads to higher order structures including the formation of very dense arrays of nucleosomes observed in heterochromatin [12,13]. Despite being tightly packed, the chromatin appears to be highly plastic to allow processes such as transcription, DNA damage repair, DNA remodeling, and DNA replication. This plasticity is facilitated by several factors that influence both local and global chromatin architectures. The most prominent features affecting chromatin structure are reversible covalent modifications of the DNA, e.g., cytosine methylation and hydroxymethylation mainly occurring within the genomic CG context (CpGs), and reversible post-translational modifications of histones, e.g., lysine acetylation, lysine and arginine methylation, serine and threonine phosphorylation, and lysine ubiquitination and sumoylation. These modifications are set by specific classes of enzymes: DNA methyltransferases (DNMTs) in case of cytosine methylation [14] or histone-modifying enzymes [15]. Besides facilitating chromatin compaction, modifications of the DNA and histones are read by adaptor molecules, chromatin-modifying enzymes, and transcription factors (TFs) that contribute to the regulation of transcription and other chromatin-related processes [15,16]. Next to modifications of DNA and histones, the three-dimensional (3D) conformation of the DNA within the nucleus imposes an additional regulatory layer of gene expression [17].
The chromatin state of a cell, including the genomic localization of modifications of DNA and histones, TF binding sites, and 3D DNA structure, is generally referred to as the epigenome. The epigenome is an important layer that regulates which parts of the genome are accessible and thereby active and which parts are condensed and hence inactive. As such, epigenetic changes are a major driver of development and important to gain and maintain cellular identity. Each of the approximately 200 distinct cell types in the human body has essentially the same genome but has a unique epigenome that serves to instruct specific gene expression programs present within the cells. To gain insight into this variation, the epigenetic features of these cell types (Fig. 1) are comprehensively studied at a genome-wide scale using high-resolution technologies as summarized in Table 1.
Most of the approaches are based on NGS, which generally yields higher sensitivity and resolution as compared to alternative readouts such as microarrays, and provides additional information such as allele specificity [18,19]. The International Human Epigenome Consortium (IHEC; http://www.ihec-epigenomes.org) and associated consortia such as BLUEPRINT and National Institutes of Health (NIH) Roadmap use these technologies to generate human reference data sets for a range of epigenetic features [20][21][22][23]. The aim of IHEC is to generate approximately 1000 reference epigenomes which are made publicly available. These data contain a wealth of information on the epigenetic mechanisms acting in healthy cells and serve as a valuable reference for comparisons with malignant cells and tissues [24,25].
Comparative analyses of epigenomes are complicated by the epigenetic variability that is present between individuals within a population. Genetic variation such as SNPs (single-nucleotide polymorphisms) or indels in regulatory sequences or mutations in epigenetic enzymes will have a direct effect on the epigenome [26][27][28][29]. Furthermore, environmental factors such as lifestyle, stress, and nutrition influence epigenetic patterns [30][31][32][33]. Also, epigenetic patterns change during aging. In fact, DNA methylation markers in saliva and blood can be used for accurate estimation of age [34][35][36][37]. Thus, epigenetic patterns are plastic and change during development and over time. The variability between individuals has to be accounted for in epigenetic studies including biomarker discovery, and hence large cohorts need to be studied to overcome this inter-individual variation. In this respect, it is important to note that the extent of the inter-individual variation is much smaller than the variation observed between tissues within individuals, at least for DNA methylation [38][39][40].
It has become increasingly clear that misregulation or mutations of epigenetic enzymes are at the basis of a broad range of syndromes and diseases [41]. Mutations in epigenetic enzymes are frequently observed in cancer [42], intellectual disability [43], neurological disorders such as Alzheimer's, Parkinson's, and Huntington's disease [44], and autoimmune diseases such as rheumatoid arthritis [45][46][47] and type 1 diabetes [48]. Most studies have been performed in cancer: ~30% of all driver genes characterized in cancer are related to chromatin structure and function [42]. Well-known examples of genes in which mutations can promote or drive tumorigenesis include DNMT3A and TET2, involved in DNA methylation and DNA demethylation, respectively, and EZH2, which is part of the polycomb repressive complex 2 (PRC2) that trimethylates lysine 27 on histone 3 (H3K27me3) [49][50][51]. Apart from mutations in epigenetic enzymes, mistargeting of epigenetic enzymes, such as the silencing of CDKN2A and MLH1 by aberrant promoter DNA methylation, is considered to drive tumor formation [52]. Given their prominent roles in cancer and various other diseases, epigenetic enzymes represent promising targets for therapeutic intervention. For example, small molecules targeting enzymes involved in the post-translational modifications of histones, such as SAHA (suberanilohydroxamic acid; Vorinostat), which inhibits histone deacetylases (HDACs), are effective as therapeutic drugs for a range of tumor types including, in the case of SAHA, T cell lymphomas [53][54][55]. See Rodriguez and Miller [56], Qureshi and Mehler [57], and various papers within this special issue for excellent recent reviews on the use of small molecules to target epigenetic enzymes and their current status in clinical applications.

Epigenetic biomarkers
Table 1 Summary of the main epigenetic features and the principles, caveats, and requirements of the main technologies used for their profiling
DNA methylation. DNA methylation is the process in which a methyl group is added to the 5 position (C5) of cytosines in the DNA, which mainly occurs within the context of CpGs. DNA methylation typically acts to repress gene transcription when located in a gene promoter, while gene-body methylation is positively correlated with expression [153][154][155][156][157]. Distal regulatory regions like enhancers generally contain low DNA methylation levels when active due to binding of TFs [158]. The role or consequence of DNA methylation at other places of the genome is less well understood [14]. Genome-wide profiling of DNA methylation generally relies on (i) affinity purification of methylated DNA fragments or (ii) the use of sodium bisulfite converting unmethylated cytosines into uracil. The technologies based on the first method, MBD-Seq/MethylCap-Seq (methyl-CpG binding domain protein-enriched sequencing/methylated DNA capture sequencing) [140,141,159] and MeDIP-Seq (methylation DNA immunoprecipitation sequencing) [160,161], utilize a methyl binding protein domain or an antibody raised against 5-methylcytosine, respectively, to affinity purify methylated DNA fragments from sheared genomic DNA. Although MethylCap-Seq/MeDIP-Seq provides accurate measurements of DNA methylation [162], an important caveat is the nonspecific background remaining after the affinity purification. This background might cause false positive results (in particular in case of copy number variations) if not properly controlled for.
The second method makes use of bisulfite on sheared genomic DNA to convert unmethylated cytosines into uracil, while leaving methylated cytosines unaffected [154]. After subsequent amplification to prepare the DNA for readout, the uracil (representing the unmethylated cytosine) is read as a thymine, while cytosines represent methylated cytosines in the original sample. The readout of bisulfite-based methods is mainly performed by microarrays (including the Infinium HumanMethylation450 BeadChip array ("450K array") covering 450,000 of the 28 million genomic CpGs) [163] or by sequencing, referred to as whole-genome bisulfite sequencing (WGBS). In light of the high sequencing costs associated with WGBS, reduced representation bisulfite sequencing (RRBS) selects for CpG-rich fragments before sequencing using methylation-insensitive restriction enzymes such as MspI [164]. An important advantage of bisulfite-based methods (450K array, WGBS, RRBS) over other DNA methylation profiling technologies is that these generate DNA methylation profiles at base-pair resolution. Furthermore, the input requirements for WGBS/RRBS (20 ng of DNA for low-input WGBS/RRBS profiling, equivalent to 3 × 10^3 cells [121]) are low as compared to the 450K array (500 ng; 7.5 × 10^4 cells) and MBD-Seq/MethylCap-Seq/MeDIP-Seq (1 μg DNA; 1.5 × 10^5 cells). Although dependent on sequencing depth, the coverage of WGBS is usually >90% of all CpGs in the genome [165,166], as compared to 60-90% for MBD-Seq/MethylCap-Seq/MeDIP-Seq and 2% for the 450K array. In view of the superior specifications, WGBS is considered the "gold standard" for determining the DNA methylome.
Protein binding sites. Characterization of the genomic locations of post-translational histone modifications, histone variants, TFs, and other chromatin-associated proteins is generally performed by chromatin immunoprecipitation (ChIP). ChIP relies on the use of a specific antibody to perform affinity purifications on sheared chromatin to isolate fragments bound by the protein of interest. In most workflows, proteins are crosslinked to the DNA by formaldehyde, after which the chromatin is fragmented by sonication or enzymatic digestion. However, in particular in case of histones, ChIP can also be performed on native (meaning non-crosslinked) chromatin fragmented by micrococcal nuclease (MNase) [167,168]. After ChIP, the purified DNA fragments are sequenced to determine the protein localization on a genome-wide scale (ChIP-Seq) [169,170]. Loci in the genome which are enriched for mapped sequencing reads (generally referred to as "peaks" according to their visual appearance in genome browsers) represent protein binding sites. ChIP-Seq heavily relies on the availability of antibodies that are specific for their endogenous target and that are compatible with the ChIP conditions. Since ChIP-Seq relies on an enrichment strategy, it generally requires a relatively high number of cells as input to distinguish specific signals from background. The number of input cells for ChIP-Seq is typically 0.5-5 × 10^6 cells, with profiling of histones requiring fewer cells than profiling of TFs [134].
Chromatin accessibility/footprinting. Transcriptional activation is tightly linked with disruption or eviction of nucleosome organization at control regions such as promoters and enhancers due to binding of TFs. Regulatory DNA thus coincides with open or accessible genomic sites in chromatin [171,172]. Profiling of these accessible sites is performed using the endonuclease deoxyribonuclease I (DNaseI) or using the Tn5 transposase on native chromatin, as both enzymes are able to target accessible genomic regions within chromatin. Selecting and sequencing short fragments (50-150 nt) after treatment with DNaseI (DNaseI-Seq) [173,174] or transposase (assay for transposase-accessible chromatin (ATAC)-Seq) enriches for TF binding sites, in contrast to larger fragments that might be derived from nucleosomes [175]. Similar to ChIP-Seq, loci in the genome which are enriched for mapped sequencing reads (referred to as "peaks") represent accessible sites. Within the ATAC-Seq procedure, the Tn5 transposase directly inserts the adapters for sequencing. Therefore, ATAC-Seq has an important advantage in that it requires a relatively small number of cells (5 × 10^4 cells) [175] to start with as compared to DNaseI-Seq (1-10 × 10^6 cells [172]). Both for ATAC-Seq and DNaseI-Seq, characterization of enriched DNA motifs within the accessible sites can be used to infer the identity of sequence-specific TFs. A complementary approach to infer the identity of TFs that are binding within accessible regions is the use of so-called "footprints." Sequence-specific TFs protect the genome from DNaseI and transposase digestion at the exact position where they are binding the DNA. This results in a unique, detectable footprint that can be used for characterization of the factor that is binding [174,176].
Nucleosome occupancy/positioning. Nucleosomes are the basic core particles of the chromatin, consisting of histones and approximately 147 base pairs of DNA wrapped around them. Although the DNA-protein binding within nucleosomes is very stable, nucleosomes can be remodeled or slide along the DNA, thereby facilitating or inhibiting chromatin-related processes such as transcription. Nucleosome positioning is usually determined with the use of MNase on native chromatin [171,177]. MNase is an endo-exonuclease that digests and cleaves DNA unless it is protected by proteins. Nucleosome position can be determined by sequencing the DNA fragments (115-195 bp in size) isolated from chromatin treated with MNase (MNase-Seq) [178,179]. A typical MNase-Seq profiling experiment requires 1-10 × 10^6 cells.
3D conformation of the genome. Chromatin loops and further higher-order chromatin structures are profiled using chromosome conformation capture [180]. Chromosome conformation capture relies on digestion of crosslinked chromatin using restriction enzymes, followed by ligation of the sticky ends. Sequencing of DNA ligation products reveals the proximity of the ligated fragments and provides insight into the 3D structure within the nucleus. Chromosomal loci that are far apart on a linear chromosome, but close together in nuclear space, can come into proximity and will hence be ligated [181]. For genome-wide profiling, two popular variants of chromosome conformation capture are circular chromosome conformation capture (4C-Seq) [182] and HiC-Seq [183,184]. 4C-Seq determines all genomic interaction partners of one specific locus in the genome (referred to as "bait") at high resolution and sensitivity. In HiC-Seq, all genomic interactions are profiled at low resolution and sensitivity, enabling a global 3D view on the genome. Using HiC-Seq, recent studies in mice and humans have revealed that chromosome territories are arranged into large megabase-sized topologically-associating domains (TADs) that are highly conserved and stable across cell types [183,185]. 4C-Seq experiments typically require 1 × 10^7 cells [186], while HiC-Seq experiments require 2.5 × 10^7 cells [187].

Molecular diagnosis and prognosis are traditionally often based on (immuno)histochemistry or immunoassays, for example by assaying prostate-specific antigen (PSA) when testing for prostate cancer [58]. Also, changes in RNA (ribonucleic acid) expression, genetic alterations, and chromosomal abnormalities represent powerful biomarkers in various diseases including cancer [59]. Notable examples are mutations in the BRCA1 and BRCA2 genes in breast and ovarian cancer or the presence of the Philadelphia chromosome in leukemia [60][61][62]. With the growing understanding that changes in the epigenome and chromatin are related to or causative in disease [41], it became clear that epigenetic alterations represent promising features to be used as biomarkers.
An important characteristic for their use as biomarkers is that epigenetic marks, in particular DNA methylation, are known to survive sample storage conditions reasonably well [63,64]. Another convenient characteristic is that almost every biological tissue sample or body fluid such as blood or saliva can be used for analysis of DNA methylation and other epigenetic marks [22,65,66]. This robustness makes the application of epigenetic biomarkers in a clinical environment attractive.
Over recent years, it has become clear that epigenetic features have a high predictive value during various stages of disease. These analyses have thus far mainly focused on DNA methylation, which has been shown to be informative for disease diagnosis, prognosis, and stratification. Some of the DNA methylation-based epigenetic biomarkers, such as the methylation status of VIM and SEPT9 for colorectal cancer, SHOX2 for lung cancer, and GSTP1 for prostate cancer, are in clinical use and diagnostic kits are commercially available [67][68][69][70][71]. In case of one of the best characterized biomarkers, GSTP1, a meta study (mainly using prostatectomy tissue or prostate sextant biopsies) showed that hypermethylation of the promoter allows diagnosis of prostate cancer with a sensitivity of 82% and a specificity of 95% [72]. Importantly, the use of multiple DNA methylation biomarkers (combining hypermethylation of GSTP1, APC, RASSF1, PTGS2, and MDR1) resulted in a sensitivity and specificity of up to 100% [73]. See Heyn and Esteller [74] for a recent comprehensive overview of DNA methylation biomarkers and their potential use in the clinic. In addition to its diagnostic potential, it has been well established that DNA methylation is informative for patient prognosis in terms of tumor recurrence and overall survival. For example, the hypermethylation of four genes, CDKN2A, cadherin 13 (CDH13), RASSF1, and APC, can be used to predict tumor progression of stage 1 non-small cell lung cancer (NSCLC) [75]. In addition to disease prognosis, DNA methylation has been shown to be valuable for patient stratification to predict response to chemotherapeutic treatment. A well-known example is hypermethylation of MGMT in glioblastoma, which renders the tumors sensitive to alkylating agents [76,77] such as carmustine and temozolomide.
Together, these examples show the power and feasibility of using epigenetic features, and in particular DNA methylation, as biomarkers. Epigenetic biomarkers are complementary to genetic biomarkers. Whereas genetic mutations can (among others) disrupt protein function due to amino acid changes, epigenetic alterations can de-regulate mechanisms such as transcriptional control, leading to the inappropriate silencing or activation of genes. Notably, epigenetic changes occur early and at high frequencies in a wide range of diseases including cancer [78]. It has been suggested that epigenetic alterations occur in a higher percentage of tumors than genetic variations, resulting in a higher sensitivity in the detection of tumors [79].
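The sensitivity and specificity figures quoted in this section can be made concrete with a minimal sketch. The sample calls below are hypothetical and only illustrate the definitions; the "any marker positive" panel rule is an assumption chosen for simplicity, not the method used in the cited studies.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired boolean lists.

    truth[i] is True for a diseased sample; predicted[i] is True when
    the biomarker calls the sample positive (e.g., hypermethylated).
    """
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    return tp / (tp + fn), tn / (tn + fp)


def panel_call(*marker_calls):
    """Hypothetical panel rule: positive if any single marker is positive."""
    return [any(calls) for calls in zip(*marker_calls)]


# Hypothetical cohort: five tumor samples followed by five normal samples.
truth = [True] * 5 + [False] * 5
marker_a = [True, True, True, True, False, False, False, False, False, False]
marker_b = [False, True, True, False, True, False, False, False, False, True]

for name, calls in [("marker A", marker_a),
                    ("marker B", marker_b),
                    ("panel A or B", panel_call(marker_a, marker_b))]:
    sens, spec = sensitivity_specificity(truth, calls)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy cohort the panel detects every tumor, but under an "any marker" rule it also inherits each marker's false positives, so specificity cannot exceed that of the least specific marker. Panels reported in the literature may combine markers differently.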

Genome-wide epigenetic profiling for DNA methylation biomarkers
Thus far, the discovery of the epigenetic biomarkers mostly relied on targeted approaches using individual gene loci known or suspected to be involved in the etiology or progression of the disease or other phenotype under study. Despite the challenges in the identification of biomarkers using such approaches, this yielded a number of important epigenetic biomarkers. However, these approaches require a priori knowledge for the selection of candidate biomarkers.
Genome-wide profiling technologies, which allow unbiased screens in the exploratory phase of biomarker discovery, have spurred molecular biomarker discovery (detailed information on epigenomic profiling assays is presented in Table 1). Using these technologies, the entire (epi)genome can be interrogated for potential biomarkers by comparing healthy versus diseased cells/tissue, malignant versus non-malignant tumors, or drug-sensitive versus drug-resistant tumors. This enables selection of candidate biomarkers that are most informative for disease detection, prognosis, or stratification. The use of genome-wide screens furthermore makes it possible to detect and evaluate combinations of (many) candidate loci, which often results in increased sensitivity and specificity of the biomarker. Importantly, the identification of individual genomic loci or genes as biomarkers from large datasets requires robust statistical testing such as multiple-testing correction (although traditional tests like the Bonferroni correction are over-conservative since there is often correlation between loci, i.e., they are not independent) or stringent false discovery rate (FDR) control (for example, by the Benjamini-Hochberg procedure) [80][81][82]. To define sets of biomarkers from large datasets, alternative statistical methods (such as sparse principal component analysis (PCA) or sparse canonical correlation analysis (CCA) [83,84]) are available as well. In light of (i) challenges with the experimental setup when using patient material, (ii) costs, and (iii) the extensive computational analysis associated with the exploratory phase of biomarker discovery, genome-wide screens are often performed on relatively small cohorts.
Independent of the (statistical) methods used, it is essential to validate (sets of) candidate biomarkers in follow-up studies on large cohorts using targeted epigenetic approaches before potential application in the clinic [85].
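As an illustration of the multiple-testing step mentioned above, the following minimal sketch contrasts the Bonferroni correction with the Benjamini-Hochberg procedure on a small list of per-locus p-values. The p-values are hypothetical, and the function names are introduced here for illustration only; in practice an established statistics library would normally be used.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag loci with p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag loci by the Benjamini-Hochberg step-up rule (controls the FDR).

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and flag every locus whose rank is at most k.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            flagged[idx] = True
    return flagged


# Hypothetical p-values for ten candidate loci (deliberately unsorted).
example_pvalues = [0.041, 0.001, 0.216, 0.008, 0.039,
                   0.060, 0.212, 0.042, 0.205, 0.074]

print("Bonferroni hits:", sum(bonferroni(example_pvalues)))
print("Benjamini-Hochberg hits:", sum(benjamini_hochberg(example_pvalues)))
```

With these illustrative values the Benjamini-Hochberg rule retains one locus more than Bonferroni, which reflects the more conservative nature of the latter. Real screens test hundreds of thousands of loci, where the difference is typically far larger.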
Recent years have seen an increasing number of studies using genome-wide epigenetic profiling to predict disease outcome. For a range of tumors, including childhood acute lymphoblastic leukemia [86], kidney cancer [87], NSCLC [88], rectal cancer [89], cervical cancer [90,91], breast cancer [92,93], and glioblastoma [94], DNA methylome analysis has been shown to be of prognostic value. Most of these studies define changes in DNA methylation at single sites or at small subsets of sites that represent potential disease signatures. Although these studies are often restricted to a subset of CpGs within the genome and mostly rely on relatively small sample sizes, they show the power of performing genome-wide biomarker screens.
Currently, the most popular platform used in the exploratory phase of DNA methylation biomarker discovery is the Infinium HumanMethylation450 BeadChip array (further referred to as "450K array"; see a short explanation of the 450K array within Table 1). The probes on the 450K array mainly represent functional CpG islands and functional elements such as promoters, enhancers, and TF binding sites. Main advantages of the 450K array for the detection of DNA methylation as compared to other DNA methylation platforms include (i) its high reproducibility, (ii) the straightforward analysis methods, (iii) the large number of samples that have been profiled using the 450K array thus far (which can be used for comparative purposes), and (iv) the relatively low costs. A disadvantage, as with all bisulfite-based methods (unless combined with additional chemical procedures), is that the 450K array is unable to distinguish between DNA methylation and DNA hydroxymethylation. Hydroxymethylated cytosine represents an intermediate step during demethylation of methylated cytosines but is relatively stable and is therefore likely to have specific biological functions as well [95]. It should be noted that levels of DNA hydroxymethylation are generally much lower than levels of DNA methylation (for example, DNA hydroxymethylation levels are >95% lower in peripheral blood mononuclear cells (PBMCs) [96]). A further disadvantage of the 450K array is that genetic differences between samples might result in false positives, in particular since a subset of probes on the 450K array target polymorphic CpGs that overlap SNPs [97,98]. For association studies using large cohorts, computational methods (based on principal components) have been developed to account for population stratification resulting from differences in allele frequencies [98][99][100].
To enable robust screening for a (set of) potential biomarker(s), most current studies apply the 450K array on up to several hundred samples. To narrow down and validate candidate biomarkers, more targeted DNA methylation assays are used on the same or a very similar-sized cohort [101]. Subsequently, the remaining candidate biomarkers are further validated on larger cohorts using targeted DNA methylation assays that are compatible with routine clinical use, for example, by amplicon bisulfite sequencing [85]. Using this powerful workflow, tumors for which prognostic biomarkers have been identified include rectal cancer [102], breast cancer [103], hepatocellular carcinoma [104], and chronic lymphocytic leukemia (CLL) [105,106]. Interestingly, using a similar workflow, sets of DNA methylation biomarkers have recently been identified that are prognostic for the aggressiveness of tumors in prostate cancer [107,108]. Such studies are very important for improving treatment of prostate cancer by avoiding (radical) prostatectomy in cases where careful monitoring of the tumor over time is preferred.

Biomarkers other than DNA methylation
The majority of epigenetic biomarkers identified thus far involve changes in DNA methylation. However, in light of the various types of epigenetic misregulation associated with diseases, changes in epigenetic features other than DNA methylation are likely to become powerful molecular biomarkers as well. ChIP-Seq profiling has revealed prominent differences in binding sites of posttranslational histone modifications and other proteins between healthy and cancer tissue, both in leukemia as well as in solid tumors. For example, localized changes in H3 acetylation have been reported in leukemia (see, for example, Martens et al. [109] and Saeed et al. [110]). For solid tumors, differential estrogen receptor (ER) binding and H3K27me3 as determined by ChIP-Seq has been shown to be associated with clinical outcome in breast cancer [111,112]. Also, androgen receptor (AR) profiling predicts prostate cancer outcome [113]. A recent study identified tumor-specific enhancer profiles in colorectal, breast, and bladder carcinomas using H3K4me2 ChIP-Seq [114]. Next to ChIP-Seq, DNAseI hypersensitivity assays have identified tumor-specific open chromatin sites for several types of cancer (see, for example, Jin et al. [115]). In terms of chromatin conformation, it has recently been shown that disruption of the 3D conformation of the genome can result in inappropriate enhancer activity causing mis-expression of genes including proto-oncogenes [116,117]. These examples show that, besides DNA methylation, changes in (i) protein binding sites (including post-translational histone modifications), (ii) accessible (open) chromatin, and (iii) the 3D conformation of the genome represent epigenetic features that are potential effective biomarkers (Fig. 1). The near absence of biomarkers based on these epigenetic features is mainly due to practical reasons. 
ChIP-Seq as well as other comprehensive epigenetic profiling technologies traditionally require (much) more input material than DNA methylation profiling to obtain robust results, up to 1 × 10^6 cells or more (Table 1). This is particularly challenging for (banked) patient samples, which are often available in small quantities that might not be compatible with epigenetic profiling other than DNA methylation profiling. Also, profiling of such epigenetic features often requires elaborate and delicate workflows (Table 1). Hence, quantitation and reproducibility of ChIP-Seq and other epigenetic profiling assays besides DNA methylation profiling are challenging. Furthermore, DNA methylation profiling is more compatible with (archived) frozen or fixed samples.
However, the last 2 years have seen spectacular progress in miniaturization of epigenetic profiling assays. In various instances, this included automation of (part of) the workflow, improving the robustness of the assays and their output. Also, improved workflows for epigenetic profiling of frozen or fixed samples have been reported. Although this involved proof-of-concept studies in basic research settings, these efforts are likely to give significant impetus to genome-wide epigenetic screens for candidate biomarkers. The remainder of this review will provide an overview of the current status of genome-wide epigenetic profiling and the technological advances that facilitate miniaturization, automation, and compatibility with preserved samples.

New developments in epigenetic profiling: compatibility with preservation methods
Most epigenetic profiling assays have been developed using fresh material in order to preserve the native chromatin architecture. However, epigenetic biomarker screens require the use of patient-derived clinical samples that are generally processed to preserve the samples as well as to allow convenient sample handling, for example, for sectioning of biopsies. Also, samples present in biobanks are fixed to allow long-term storage. In particular for retrospective studies, epigenetic profiling technologies that are applied for biomarker screens should therefore be compatible with methodologies that are routinely used for sample preservation: freezing and chemical fixation (in particular FFPE fixation) [118].

Chemical Fixation (FFPE)
Chemical fixation generally includes overnight crosslinking with formaldehyde at high concentrations (up to 10%), followed by dehydration and paraffin embedding (so-called "FFPE": formalin-fixed, paraffin-embedded) [128]. Although procedures for FFPE fixation are time-consuming, FFPE fixation has the advantage that samples can be stored at room temperature and that samples can be evaluated by morphology or immunohistochemistry (prior to possible further processing such as epigenetic profiling).
FFPE conditions do not affect DNA methylation, and also formaldehyde and paraffin do not interfere with the WGBS profiling procedure [129]. However, epigenetic assays other than bisulfite-based DNA methylation profiling are cumbersome with FFPE samples (Fig. 2). In case of ChIP-Seq, crosslinking generally occurs in much milder conditions (1% formaldehyde for 10 min) as compared to the harsh conditions used for FFPE fixation [120], which can complicate shearing and epitope accessibility. Pathology tissue (PAT)-ChIP has been reported to prepare FFPE samples for ChIP-Seq by the use of deparaffinization, rehydration, and MNase treatment followed by sonication at high power [130,131]. However, PAT-ChIP comes with various limitations including the long running time of the protocol (up to 4 days) and the fact that it is not compatible with all ChIP-grade antibodies. Interestingly, some of these issues have been resolved in the very recently developed fixed-tissue (FiT)-Seq procedure, which might open up new avenues for ChIP-Seq profiling of FFPE samples [114]. DNaseI-Seq on FFPE samples has been reported at the expense of a drop in signal-to-noise ratios of around 50% as compared to the use of fresh material [115].
Fig. 2 Compatibility of commonly used sample preservation methods with current epigenome profiling assays. A dotted line indicates that these assays would benefit from further optimization
Despite new developments for ChIP-Seq and DNaseI-Seq, this overview shows that DNA methylation is still the most robust of all epigenetic marks for profiling of samples that are processed by freezing or chemical fixation. Although most other epigenetic profiling assays are compatible with frozen samples (at the expense of signal-to-noise ratios for some of the assays), they are generally not or poorly compatible with FFPE specimens (Fig. 2). This also implies that for these assays, it is much more challenging to make use of laser microdissection to select specific regions of interest from specimens for epigenetic analysis, for example, to separate tumor cells from stromal cells [132,133]. An additional advantage of using DNA methylation for biomarker screening is that, in contrast to the other epigenetic profiling assays discussed, the profiling can be performed on isolated genomic DNA. This enables the use of genomic DNA from clinical DNA banks to be included in DNA methylation biomarker screens.
It should be noted that in contrast to retrospective studies, it might be feasible to use fresh or fresh-frozen patient material for screening in prospective biomarker studies. However, the use of fresh(-frozen) material in these studies could interfere with further development of potential biomarkers if it turns out that these biomarkers are incompatible with (FFPE-)fixed patient material present in the clinic. In all cases, when collecting patient samples for profiling of epigenetic marks, it is important to keep the time between surgical removal and fixation or freezing as short as possible to avoid epitope destruction and/or breakdown of the chromatin. It would therefore be helpful if the procedure time up to fixation would be documented for banked samples, so as to evaluate whether such banked samples are suitable for the epigenetic profiling technology of choice.

New developments in epigenetic profiling: miniaturization and automation
Recent years saw great progress in low-input epigenetic profiling without significantly affecting signal-to-noise ratios (Fig. 3). Also, all main genome-wide epigenetic profiling assays are now compatible with single-cell readouts. An overview of the main technological advances that allowed miniaturization and single-cell readout is given in Table 2. Besides miniaturization, various epigenetic profiling assays, in particular ChIP-Seq, have been (partly) automated to improve reproducibility and to allow higher throughput. In this section, we briefly evaluate these new technological developments with respect to biomarker discovery.

Miniaturization of epigenetic profiling
As summarized in Table 2, Fig. 3, and Table 3, the number of cells required for three of the main epigenetic profiling assays is currently well compatible with amounts present in patient-derived specimens or in banked patient samples. For bisulfite-based DNA methylation profiling, a starting amount of 7.5 × 10^4 cells for the 450K array or 3 × 10^3 cells for WGBS/RRBS is sufficient to obtain high-quality genome-wide profiles. For ChIP-Seq, the minimum amount of starting material is highly dependent on the protein to be profiled and the antibody that is used [134]. Although both histone modifications and TF binding sites (such as ER [111,112]) are potentially powerful as biomarkers, the minimum number of cells required for histone modification profiling (~1-5 × 10^4 cells) is much more compatible with patient samples than the number of cells required for TF profiling (generally 1 × 10^5 cells or more; Tables 1 and 2). ATAC-Seq and DNaseI-Seq are compatible with as few as 200 cells and 1 × 10^3 cells, respectively (Table 2) [115,135]. Together, this shows that the input requirements for bisulfite-based DNA methylation profiling, ChIP-Seq (in particular for histone modifications), and ATAC-Seq/DNaseI-Seq are well compatible with most clinical samples. The minimum number of cells currently required for 4C-Seq and HiC-Seq, at least 1 × 10^7 cells, is too high for clinical use.
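The input requirements quoted above can be turned into a simple lookup to see which assays a given sample might support. The following is a minimal sketch rather than a validated tool: the dictionary and function names are hypothetical, the thresholds are the minimum cell numbers stated in this section, and for histone modification ChIP-Seq the lower end of the quoted range is assumed.

```python
# Minimum input (number of cells) per assay, taken from the figures quoted in
# this section. Real requirements depend on antibody, protocol, and sample
# quality, so these values are illustrative thresholds only.
MIN_CELLS = {
    "450K array": 75_000,                         # 7.5 x 10^4
    "WGBS/RRBS": 3_000,                           # 3 x 10^3
    "ChIP-Seq (histone modifications)": 10_000,   # assumed: lower end of ~1-5 x 10^4
    "ChIP-Seq (transcription factors)": 100_000,  # generally 1 x 10^5 or more
    "ATAC-Seq": 200,
    "DNaseI-Seq": 1_000,                          # 1 x 10^3
    "4C-Seq/HiC-Seq": 10_000_000,                 # at least 1 x 10^7
}


def compatible_assays(available_cells: int) -> list[str]:
    """Return assays whose quoted minimum input does not exceed the sample size,
    ordered from lowest to highest input requirement."""
    return sorted(
        (assay for assay, minimum in MIN_CELLS.items() if available_cells >= minimum),
        key=MIN_CELLS.get,
    )


if __name__ == "__main__":
    for n_cells in (500, 5_000, 50_000):
        print(n_cells, compatible_assays(n_cells))
```

Under these assumed thresholds, a sample of a few thousand cells would support chromatin accessibility and WGBS/RRBS profiling but not TF ChIP-Seq or chromosome conformation assays, in line with the conclusion of the paragraph above.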
Fig. 3 Level of comprehensiveness of epigenetic data from global epigenetic profiling assays using an increasing number of cells as input
Interestingly, all main epigenetic profiling assays can now provide single-cell readouts (Table 2, Table 3). The possibility to assay individual cells within populations allows interrogation of heterogeneity which in "bulk" profiling would be averaged. This is very informative for clinical samples, which can be highly heterogeneous [136]. Single-cell profiling has been shown to be powerful in obtaining molecular signatures of heterogeneous populations that shift in cell type composition [137]. As such, an important clinical application of single-cell profiling is to screen for resistant versus non-resistant cells after drug treatment [138] or to monitor disease progression [139]. In terms of biomarker discovery, the use of single-cell assays will allow screening for cell types that are most informative for disease stratification. Also, the level of heterogeneity as measured by single-cell studies might by itself be informative for disease stratification. From a practical perspective, epigenetic profiling of single cells is challenging. Since one cell only contains two copies of each genomic locus to be assayed, any loss of material during washing or enrichment steps such as immunoprecipitations will significantly impact the outcome of the assay. Similarly, background signals are hard to distinguish from true signal. One of the main strategies to account for false negative signals as well as for nonspecific background is to include a large number of cells in single-cell epigenetic assays to enable proper statistics. However, this results in (very) large datasets, for which computational and statistical analysis are generally challenging.
For single-cell epigenetic profiling of clinical samples, there are two additional issues to consider: (i) generation of single-cell suspensions from patient samples might be challenging, and (ii) the number of cells required as input for single-cell epigenetic profiling is generally higher than for miniaturized epigenetic profiling in order to enable capturing of single cells (Fig. 4), which might affect compatibility with patient samples. Since single-cell technologies emerged very recently, further developments in technology (to increase sensitivity and specificity) and in computational analysis (for more robust statistical testing and model development) are to be expected. Once single-cell epigenetic profiling has fully matured, it will be very powerful for biomarker discovery in heterogeneous cell populations such as human blood samples and biopsies.

Table 2 Overview of the main technological advances that allowed miniaturization and single-cell readout of genome-wide epigenetic profiling assays
WGBS. Conventional WGBS profiling is compatible with a relatively low number of cells (Table 1). Recently, WGBS was adapted to enable single-cell profiling (scBS-Seq; single-cell bisulfite sequencing) [188]. Single cells were captured by fluorescence-activated cell sorting (FACS). To cope with the extensive DNA damage caused by the bisulfite treatment, Smallwood et al. [188] performed tagging of the DNA fragments with sequencing adaptors after bisulfite conversion as developed by Miura et al. [189], instead of before conversion as performed in traditional WGBS. scBS-Seq achieves coverage of up to 48.4% of all CpGs. A subsequent study by Farlik et al. [190] used a similar approach for scBS-Seq but adapted it such that the whole process of library preparation following bisulfite treatment and cleanup is performed in a single tube, minimizing DNA loss and reducing contamination risk.
ChIP-Seq. Traditionally, ChIP-Seq requires a large number of cells (at least several hundred thousand). However, improvements in the procedure used to prepare the ChIPped DNA for sequencing allowed ChIP-Seq profiling on 1 × 10^4 cells for H3K4me3 and H3K27me3 [191][192][193][194] and recently even 200 cells for H3K4me3 [195]. Also, the use of MNase for chromatin digestion has been shown to facilitate low-input ChIP-Seq [196,197]. An alternative approach for downscaling the number of cells for ChIP-Seq is to use carrier material, such as inert proteins and/or mRNA, which do not interfere with the ChIP-Seq procedure but increase efficiency and sensitivity [198]. This strategy allowed ChIP-Seq of the TF Estrogen Receptor (ER) on 1 × 10^4 cells. Similarly, bacterial DNA has been used as carrier, although this comes at the cost of increased sequencing depth as the bacterial DNA remains included in the sequencing procedure [199]. In more recent studies aiming to obtain ChIP-Seq information from low numbers of cells, barcodes or adaptors for sequencing are ligated or transposed before or during the ChIP procedure instead of after the ChIP. ChIPmentation, the use of transposase to add adaptors to DNA during the ChIP, was shown to be highly efficient and compatible with as few as 1 × 10^3 cells [200]. An alternative recent strategy for low-input ChIP-Seq relies on the addition of histone octamers during ChIP to outcompete unspecific binding [201]. Ligation of adaptors before the ChIP (indexing-first ChIP (iChIP)) allows multiple samples to be pooled during the ChIP-Seq procedure, after which the sequence tags can be mapped back to the original sample [202,203]. Bernstein and coworkers developed this further using direct adaptor ligation on MNase-treated chromatin in an automated droplet-based microfluidic device to obtain single-cell resolution for H3K4me3 and H3K4me2 ChIP-Seq [152]. Efficient immunoprecipitations were performed by pooling 100 single cells with the addition of carrier material that is not amplified during preparation of the ChIPped DNA for sequencing. This workflow enables the profiling of thousands of individual cells in parallel, mainly due to the continuous flow of droplets that is being generated to capture the individual cells (Fig. 4). Inherent to single-cell enrichment techniques, the coverage per single cell is sparse (~1000 unique reads per cell) and does not allow comprehensive analysis of protein binding sites in individual cells. However, the single-cell ChIP-Seq was shown to be very powerful in identifying functionally relevant subpopulations within embryonic stem cells [152].
ATAC-Seq/DNaseI-Seq. ATAC-Seq has been downscaled to less than 200 cells [135,175]. Next to this, Buenrostro et al. [151] reported ATAC-Seq to be compatible with single-cell profiling by performing transposition on single cells captured on a commercial microfluidics platform (Fluidigm C1; Fig. 4). This allows capturing of 96 single cells in parallel and subsequent processing steps toward a full library ready for sequencing. Together, this automated epigenetic platform represents the first of its kind in which a single-cell suspension is loaded on a platform that subsequently generates a full library for sequencing without any further manual intervention. An alternative approach for single-cell ATAC-Seq has been developed by Cusanovich et al. [204]. They performed the transposase reaction in intact nuclei on small pools, while simultaneously performing indexing of the tagged sites. Pooling followed by redistribution of the small cell numbers, combined with the introduction of a second barcode for each cell, allowed the tags obtained after sequencing to be mapped back to individual cells. The advantage of this strategy is that it allows for a higher throughput, as shown by the 15,000 individual cells profiled by Cusanovich et al. [204]. Recently, DNaseI-Seq has also been further developed to facilitate low-input profiling (between 1 × 10^2 and 1 × 10^4 cells) as well as single-cell profiling [115]. Critically, after FACS sorting of single cells followed by lysis and DNaseI digestion, large amounts of circular plasmid DNA were added during further sample preparation for sequencing. The genomic coverage of both DNaseI-Seq and ATAC-Seq in single cells is inherently low due to the fact that each cell only contains two copies of the genome. The average number of sequence reads per cell was about 317,000 reads for DNaseI-Seq [115] and 73,000 [151] or 35,000 [204] reads for ATAC-Seq after deep sequencing of the libraries. Clearly, these numbers of sequencing reads do not allow investigation of individual genomic loci within single cells. Rather, the computational analysis in both studies makes use of DNaseI hypersensitive sites (DHSs) determined in pools of cells in order to call DHSs in single cells. Despite this limitation, the single-cell chromatin accessibility assays were shown to be powerful in identifying cell-type specific transcription factors, and their variation in genomic binding within individual cells on a global scale [115,151].
4C-Seq and HiC-Seq. 4C-Seq and HiC-Seq are relatively new techniques [182][183][184], for which optimization to low cell numbers has not been extensively reported yet. However, it has been shown that HiC-Seq is compatible with single-cell profiling by performing in-nuclei DNA digestion and ligation and subsequent manual picking of individual nuclei. Using single-cell HiC-Seq it was shown that the large megabase-sized TADs that have been identified in large populations of cells are also present in single cells [205,206]. Furthermore, single-cell HiC-Seq was shown to be very powerful to reconstruct chromosome folding. Although providing information at single-cell resolution, the single-cell HiC-Seq protocol requires 1 × 10^7 cells as starting material to facilitate the early steps of the protocol. Inherent to the HiC-Seq protocol, the resolution obtained in individual cells is low. Currently, between 10,000 and 30,000 ligation events are profiled per cell [205].
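The sparsity of single-cell profiles follows from the fact that each diploid cell contributes only two copies of any locus. As a rough illustration, and not a model taken from any of the cited studies, the sketch below assumes that every copy of a locus is recovered independently with the same probability. The function name and the example efficiency of 10% are hypothetical, and the calculation further assumes that the locus carries the feature of interest in every copy.

```python
def locus_detection_probability(
    capture_efficiency: float,
    n_cells: int = 1,
    copies_per_cell: int = 2,
) -> float:
    """Probability that at least one copy of a locus is recovered.

    Simplifying assumptions (illustrative only): each copy is captured
    independently with the same probability, and every copy carries the
    feature of interest.
    """
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must be between 0 and 1")
    if n_cells < 1 or copies_per_cell < 1:
        raise ValueError("n_cells and copies_per_cell must be positive")
    total_copies = n_cells * copies_per_cell
    # Complement of "every copy is lost".
    return 1.0 - (1.0 - capture_efficiency) ** total_copies


if __name__ == "__main__":
    # Hypothetical per-copy capture efficiency of 10%.
    for cells in (1, 10, 100):
        print(cells, round(locus_detection_probability(0.10, n_cells=cells), 4))
```

Under these assumptions a single cell reports a given locus only about one time in five, whereas a pool of 100 cells almost always does, which is one way to see why single-cell analyses rely on many cells and on sites defined in pooled data.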

Automation of epigenetic profiling
The use of genome-wide epigenetic profiling for biomarker discovery strongly benefits from automated procedures that are compatible with upscaling to facilitate large-scale screens. Main advantages of automation include (i) a reduction in variability and batch effects, both of which are frequently observed in epigenetic profiling, (ii) increased throughput, (iii) reduced procedure and/or hands-on time, and (iv) lower error rates. In light of the limited number of cells within clinical samples, a combination of automation and miniaturization is likely to be beneficial in most cases. This comes with the additional advantage of reduced reagent cost, which can be substantial considering the high costs associated with epigenetic profiling. It should be noted that epigenetic profiling thus far is mainly being performed within basic research settings on relatively small sample sizes, which are well compatible with manual handling. Therefore, most automated platforms have been developed only recently to cope with the increasing sample sizes and the profiling of more challenging (clinical) samples. In this section, we focus on automation of bulk and miniaturized epigenetic profiling; information on automation of single-cell technologies is included in Table 2. Efforts to design automated workflows for epigenetic profiling have mainly been focused on ChIP-Seq and to a lesser extent on DNA methylation profiling. This can be explained by the fact that DNA methylation profiling, and chromatin accessibility profiling (ATAC-Seq/DNaseI-Seq) as well, is relatively straightforward and therefore well compatible with manual handling. Considering 4C-Seq and HiC-Seq, these are relatively new technologies for which automated workflows have not been reported yet. For DNA methylation profiling, (parts of) the workflow for MBD-Seq, MethylCap-Seq, and MeDIP-Seq have been designed on custom-programmed robotic liquid handling systems [140][141][142].
For ChIP-Seq, immunoprecipitations and subsequent sample preparation for sequencing have been designed on the same or similar robotic systems [143][144][145][146]. However, these robotic workflows require large amounts of starting material in the range of 1 × 10^6 cells or more. Clearly, with such input requirements, these platforms are not readily compatible with biomarker discovery.
More recently, miniaturized automated platforms have been described for ChIP-Seq using PDMS (polydimethylsiloxane)-based microfluidic devices that have been designed to perform automated immunoprecipitations. These platforms allow ChIP-Seq to be performed using as few as 1 × 10^3 cells [147] or 100 cells [148] due to very small reaction volumes, providing proof-of-principle that automated low-input ChIP-Seq profiling is feasible. However, to facilitate high-throughput profiling, it would be important to increase the number of samples that can be profiled in parallel, as currently these platforms can assay at most four samples in parallel [147,148]. Furthermore, integration with the labor-intensive DNA library preparation procedure would be desirable; stand-alone library preparation platforms on microfluidic devices have been reported [149,150]. For DNA methylation profiling, various commercial low-input bisulfite conversion kits have been shown to be compatible with automation. However, a fully automated miniaturized DNA methylation profiling platform has not been reported yet.
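To make the throughput limitation concrete, the following back-of-the-envelope sketch counts how many device runs a screen would need when only four samples can be processed in parallel. The function name and the cohort size of 300 (standing in for "several hundred samples") are illustrative assumptions; the calculation ignores replicates, controls, and failed runs.

```python
import math


def runs_needed(n_samples: int, samples_per_run: int = 4) -> int:
    """Number of device runs required to process all samples, assuming each
    run handles at most `samples_per_run` samples in parallel."""
    if n_samples < 0 or samples_per_run < 1:
        raise ValueError("n_samples must be >= 0 and samples_per_run >= 1")
    return math.ceil(n_samples / samples_per_run)


if __name__ == "__main__":
    # Hypothetical discovery cohort of 300 samples, one ChIP per sample.
    print(runs_needed(300))       # four samples per run, as reported for current devices
    print(runs_needed(300, 96))   # hypothetical 96-sample format, for comparison
```

Profiling several histone marks per sample multiplies the number of runs accordingly, which illustrates why higher parallelization is needed before such devices can support large-scale screens.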

Conclusions
Biomarkers are highly valuable and desirable in a wide range of clinical settings, ranging from pharmacodynamics to monitoring treatment. Here, we have provided an overview of recent developments within genome-wide profiling technologies that may enable future large-scale screens for candidate epigenetic biomarkers. When comparing compatibility with miniaturization, automation, and tissue preservation methods, bisulfite-based DNA methylation profiling is currently by far superior to other epigenetic profiling technologies for large-scale biomarker discovery. DNA methylation assays are technically less challenging than most other profiling assays, as they do not depend on delicate enzymatic reactions or on immunoprecipitation, but on chemical conversion. A critical advantage of DNA methylation profiling over other assays is that it is not affected by freezing or chemical fixation, and is therefore very well compatible with (archived) clinical samples. DNA methylation profiling has the additional advantage that it requires a relatively low number of cells as input. In line with these advantages, most of the epigenetic biomarkers that have been identified thus far involve changes in DNA methylation.
Despite the advantages of DNA methylation, various other epigenetic marks are promising biomarkers. Histone-modifying enzymes are frequently mutated in a range of diseases, often directly affecting epigenetic patterns of post-translational histone modifications. The main methodology to profile these post-translational histone modifications is ChIP-Seq. ChIP-Seq is challenging on samples containing low numbers of cells as well as on archived samples, often resulting in variability in signal-to-noise ratios. However, in view of the continuous improvements in ChIP-Seq procedures for (ultra-)low input samples and for fixed samples, large-scale ChIP-Seq-based screens for candidate biomarkers are likely to become feasible in the near future. These screens might benefit from the automated ChIP(-Seq) platforms that are currently being developed. The development of such automated platforms will also facilitate robust integration of ChIP assays as a diagnostic tool in clinical practice.
Of the remaining technologies discussed in this paper, ATAC-Seq and DNaseI-Seq seem most compatible with profiling of clinical samples, requiring as few as several hundred cells as input. Both ATAC-Seq and DNaseI-Seq are compatible with frozen patient samples [125][126][127][128], while DNaseI-Seq was recently successfully applied on FFPE samples [115]. However, as compared to DNaseI-Seq, the workflow of ATAC-Seq is much more straightforward as the adaptors for sequencing are inserted as part of the transposition. Also, at least for single-cell ATAC-Seq, a fully automated platform has been developed [151]. For biomarker discovery, compatibility of ATAC-Seq with FFPE samples would be highly desirable, as this would enable the inclusion of clinical samples from biobanks in large-scale ATAC-Seq profiling studies. This might be achieved by incorporating critical steps from the FFPE-compatible DNaseI-Seq. Although the use of open chromatin as an epigenetic biomarker has been rare thus far, the flexibility and ease of the recently developed ATAC-Seq (and possibly DNaseI-Seq) will undoubtedly boost the use of open chromatin in clinical research and clinical practice.
Together, this review shows that genome-wide epigenetic profiling technologies have very rapidly matured over the past decade. While originally these technologies were only compatible with large numbers of (in vitro cultured) cells, most of these can now be applied on samples containing very low numbers of primary cells down to single cells. Combined with an increasing number of sophisticated workflows and (automated) platforms, this will pave the way for large-scale epigenetic screens on clinical patient material. Such screens are essential to fill the need for new biomarkers for disease diagnosis, prognosis, and selection of targeted therapies, necessary for personalized medicine.