
Table 1 Summary of the main epigenetic features and the principles, caveats, and requirements of the main technologies used for their profiling

From: Genome-wide epigenomic profiling for biomarker discovery

DNA methylation. DNA methylation is the addition of a methyl group to the 5′ position of cytosines in the DNA, which mainly occurs in the CpG context. DNA methylation in a gene promoter typically represses transcription, whereas gene-body methylation is positively correlated with expression [153–157]. Active distal regulatory regions such as enhancers generally show low DNA methylation levels as a result of TF binding [158]. The role or consequence of DNA methylation elsewhere in the genome is less well understood [14]. Genome-wide profiling of DNA methylation generally relies on either (i) affinity purification of methylated DNA fragments or (ii) sodium bisulfite treatment, which converts unmethylated cytosines into uracil. The technologies based on the first approach are MBD-Seq/MethylCap-Seq (methyl-CpG binding domain protein-enriched sequencing/methylated DNA capture sequencing) [140, 141, 159] and MeDIP-Seq (methylated DNA immunoprecipitation sequencing) [160, 161]. They use a methyl-binding protein domain or an antibody raised against 5-methylcytosine, respectively, to affinity-purify methylated DNA fragments from sheared genomic DNA. Although MethylCap-Seq/MeDIP-Seq provides accurate measurements of DNA methylation [162], an important caveat is the nonspecific background that remains after affinity purification. If not properly controlled for, this background can cause false-positive results, particularly in regions with copy number variations. The second approach treats sheared genomic DNA with bisulfite to convert unmethylated cytosines into uracil while leaving methylated cytosines unaffected [154]. During the subsequent amplification that prepares the DNA for readout, uracil (representing an unmethylated cytosine) is read as thymine, whereas the remaining cytosines represent methylated cytosines in the original sample.
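The bisulfite readout logic can be made concrete with a small simulation. The sketch below assumes complete conversion, no sequencing errors and reads already aligned at offset 0. It converts every unmethylated C to T and then estimates each CpG's methylation level as the fraction of reads that still show C. The function names `bisulfite_read` and `cpg_methylation_levels` are hypothetical and are not part of any real tool.

```python
from typing import Dict, List, Set


def bisulfite_read(seq: str, methylated: Set[int]) -> str:
    """Return the sequence as read after bisulfite treatment and PCR.

    Assumes complete conversion: every unmethylated C becomes U, which
    amplification reads as T; C positions listed in `methylated` stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def cpg_methylation_levels(reference: str, reads: List[str]) -> Dict[int, float]:
    """Estimate the methylation level of each CpG as C / (C + T).

    Assumes every read is aligned to `reference` at offset 0 and spans
    its full length (a simplification; real aligners handle this).
    """
    ref = reference.upper()
    levels: Dict[int, float] = {}
    for i in range(len(ref) - 1):
        if ref[i] != "C" or ref[i + 1] != "G":
            continue  # only score cytosines in a CpG context
        methylated = sum(1 for r in reads if r[i] == "C")
        unmethylated = sum(1 for r in reads if r[i] == "T")
        if methylated + unmethylated:
            levels[i] = methylated / (methylated + unmethylated)
    return levels


# Usage: CpGs at positions 1, 5 and 8; two hypothetical reads.
ref = "ACGTTCGACG"
reads = [bisulfite_read(ref, {1, 5}), bisulfite_read(ref, {1})]
print(cpg_methylation_levels(ref, reads))  # {1: 1.0, 5: 0.5, 8: 0.0}
```

Real bisulfite aligners must also handle both strands, incomplete conversion and the reduced sequence complexity of converted reads. The sketch ignores all of these.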
The readout of bisulfite-based methods is mainly performed by microarrays or by sequencing. The microarrays include the Infinium HumanMethylation450 BeadChip array (“450K array”), which covers approximately 450,000 of the 28 million CpGs in the genome [163]. Sequencing-based readout is referred to as whole-genome bisulfite sequencing (WGBS). Because of the high sequencing costs associated with WGBS, reduced representation bisulfite sequencing (RRBS) selects for CpG-rich fragments before sequencing by using methylation-insensitive restriction enzymes such as MspI [164]. An important advantage of bisulfite-based methods (450K array, WGBS, RRBS) over other DNA methylation profiling technologies is that they generate DNA methylation profiles at base-pair resolution. Furthermore, the input requirements for WGBS/RRBS (20 ng of DNA for low-input WGBS/RRBS profiling, equivalent to 3 × 10³ cells [121]) are low compared with those of the 450K array (500 ng; 7.5 × 10⁴ cells) and MBD-Seq/MethylCap-Seq/MeDIP-Seq (1 μg of DNA; 1.5 × 10⁵ cells). Although coverage depends on sequencing depth, WGBS usually covers >90% of all CpGs in the genome [165, 166], compared with 60–90% for MBD-Seq/MethylCap-Seq/MeDIP-Seq and 2% for the 450K array. Given these superior specifications, WGBS is considered the “gold standard” for determining the DNA methylome.
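Two details of the paragraph above can be illustrated in code: the conversion from DNA mass to cell equivalents, and RRBS fragment selection. The sketch below assumes roughly 6.6 pg of DNA per diploid human cell. That approximate value is consistent with the ng-to-cell conversions quoted above but is not stated in the text. The RRBS part is an in-silico MspI digest: MspI recognizes CCGG and cleaves C^CGG. It keeps fragments between adjacent sites within an assumed 40–220 bp size window. The names `cell_equivalents` and `mspi_fragments`, and the size window, are hypothetical choices for illustration.

```python
from typing import List, Tuple

# Assumption: roughly 6.6 pg of DNA per diploid human cell.
PG_DNA_PER_DIPLOID_CELL = 6.6


def cell_equivalents(dna_ng: float) -> float:
    """Convert an input DNA mass (ng) into approximate diploid cell equivalents."""
    return dna_ng * 1000.0 / PG_DNA_PER_DIPLOID_CELL


def mspi_fragments(seq: str, min_len: int = 40,
                   max_len: int = 220) -> List[Tuple[int, int]]:
    """In-silico MspI digest (C^CGG) followed by size selection, RRBS-style.

    Returns half-open (start, end) coordinates of fragments flanked by MspI
    sites on both sides whose length lies in [min_len, max_len]. The default
    40-220 bp window is an assumption, not a value taken from the text.
    """
    seq = seq.upper()
    cuts = []
    site = seq.find("CCGG")
    while site != -1:
        cuts.append(site + 1)  # MspI cleaves between the first C and CGG
        site = seq.find("CCGG", site + 1)
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if min_len <= b - a <= max_len]


for ng in (20, 500, 1000):
    print(f"{ng} ng -> ~{cell_equivalents(ng):,.0f} cells")
# 20 ng -> ~3,030 cells
# 500 ng -> ~75,758 cells
# 1000 ng -> ~151,515 cells
```

Because every selected fragment starts and ends at a CCGG site, the selected fragments are CpG-rich by construction. This is why RRBS concentrates sequencing on CpG-dense regions.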

Protein binding sites. The genomic locations of post-translational histone modifications, histone variants, TFs, and other chromatin-associated proteins are generally characterized by chromatin immunoprecipitation (ChIP). ChIP uses a specific antibody to affinity-purify, from sheared chromatin, the fragments bound by the protein of interest. In most workflows, proteins are crosslinked to the DNA with formaldehyde, after which the chromatin is fragmented by sonication or enzymatic digestion. However, particularly for histones, ChIP can also be performed on native (non-crosslinked) chromatin fragmented by micrococcal nuclease (MNase) [167, 168]. After ChIP, the purified DNA fragments are sequenced to determine protein localization genome-wide (ChIP-Seq) [169, 170]. Genomic loci enriched for mapped sequencing reads represent protein binding sites; these loci are generally referred to as “peaks” because of their visual appearance in genome browsers. ChIP-Seq relies heavily on the availability of antibodies that are specific for their endogenous target and compatible with ChIP conditions. Because ChIP-Seq is an enrichment strategy, it generally requires a relatively high number of input cells to distinguish specific signal from background: typically 0.5–5 × 10⁶ cells, with histone profiling requiring fewer cells than TF profiling [134].
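The notion of a peak as a locus “enriched for mapped reads” can be sketched as a simple statistical test. The sketch below counts read start positions in fixed windows and flags windows whose count is improbably high under a uniform Poisson background. This is a deliberately simplified stand-in and not the algorithm of any particular peak caller; dedicated tools typically compare against a control (input) sample and model local background. The function names, the 200 bp window and the p-value cutoff are all hypothetical.

```python
import math
from typing import List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - CDF(k - 1).

    Adequate for a sketch; very small tail probabilities lose precision.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(read_starts: List[int], genome_length: int,
                     window: int = 200,
                     p_cutoff: float = 1e-5) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) for windows with more reads than a uniform
    background predicts. Read positions must lie in [0, genome_length)."""
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    lam = len(read_starts) / n_windows  # expected reads per window if uniform
    return [
        (i * window, min((i + 1) * window, genome_length), c)
        for i, c in enumerate(counts)
        if poisson_sf(c, lam) < p_cutoff
    ]


# Usage: one background read per window plus 30 extra reads in one window.
background = [i * 200 + 10 for i in range(50)]
peak = list(range(2000, 2060, 2))
print(enriched_windows(background + peak, genome_length=10_000))
# [(2000, 2200, 31)]
```

The sketch also shows why ChIP-Seq needs many input cells. With few reads, the expected background and the enriched signal become hard to separate statistically.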

Chromatin accessibility/footprinting. Transcriptional activation is tightly linked to the disruption or eviction of nucleosomes at control regions such as promoters and enhancers as a result of TF binding. Regulatory DNA therefore coincides with open, or accessible, genomic sites in chromatin [171, 172]. These accessible sites are profiled on native chromatin using either the endonuclease deoxyribonuclease I (DNaseI) or the Tn5 transposase, as both enzymes preferentially act on accessible genomic regions within chromatin. After treatment with DNaseI (DNaseI-Seq) [173, 174] or with the transposase (assay for transposase-accessible chromatin (ATAC)-Seq), short fragments (50–150 bp) are selected and sequenced. This selection enriches for TF binding sites, in contrast to larger fragments, which may be derived from nucleosomes [175]. As with ChIP-Seq, genomic loci enriched for mapped sequencing reads (referred to as “peaks”) represent accessible sites. In the ATAC-Seq procedure, the Tn5 transposase directly inserts the sequencing adapters. ATAC-Seq therefore has the important advantage of requiring a relatively small number of input cells (5 × 10⁴ cells) [175] compared with DNaseI-Seq (1–10 × 10⁶ cells [172]). For both ATAC-Seq and DNaseI-Seq, enriched DNA motifs within the accessible sites can be used to infer the identity of sequence-specific TFs. A complementary approach to identifying the TFs bound within accessible regions is the use of so-called “footprints.” Sequence-specific TFs protect the DNA from DNaseI and transposase cleavage at the exact position where they bind. This produces a distinct, detectable footprint that can be used to characterize the bound factor [174, 176].
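The footprint idea can be expressed as a depletion score. The input is a per-base profile of DNaseI or Tn5 cleavage events around a candidate motif. A bound factor should produce fewer cuts over the motif than in the accessible flanks. The sketch below computes `1 - center_mean / flank_mean` and includes a helper for the 50–150 bp fragment-size selection. Both functions (`short_fragments`, `footprint_score`) and the 10 bp flank width are hypothetical. The score is a simplified illustration, not the metric of any specific footprinting tool.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def short_fragments(fragments: List[Tuple[int, int]], min_len: int = 50,
                    max_len: int = 150) -> List[Tuple[int, int]]:
    """Keep (start, end) fragments in the size range that enriches for TF sites."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


def footprint_score(cuts: Sequence[int], motif_start: int, motif_end: int,
                    flank: int = 10) -> float:
    """Fractional depletion of cleavage over a motif relative to its flanks.

    cuts[i] is the number of cleavage events at position i. Values near 1
    suggest a protected (bound) motif, values near 0 no footprint, and
    negative values more cutting over the motif than in the flanks.
    """
    if motif_start - flank < 0 or motif_end + flank > len(cuts):
        raise ValueError("flanks extend beyond the cut profile")
    center = mean(cuts[motif_start:motif_end])
    flanks = mean(list(cuts[motif_start - flank:motif_start]) +
                  list(cuts[motif_end:motif_end + flank]))
    return 0.0 if flanks == 0 else 1.0 - center / flanks


# Usage: accessible flanks around a protected 6 bp motif at positions 10-15.
profile = [5] * 10 + [1] * 6 + [5] * 10
print(round(footprint_score(profile, 10, 16), 2))  # 0.8
```

In practice, enzyme sequence bias also shapes cut profiles, so raw depletion scores like this one can mislead without bias correction.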

Nucleosome occupancy/positioning. Nucleosomes are the basic core particles of chromatin, each consisting of a histone octamer with approximately 147 base pairs of DNA wrapped around it. Although DNA-protein binding within nucleosomes is very stable, nucleosomes can be remodeled or slide along the DNA, thereby facilitating or inhibiting chromatin-related processes such as transcription. Nucleosome positioning is usually determined using MNase on native chromatin [171, 177]. MNase is an endo-exonuclease that digests DNA unless it is protected by proteins. Nucleosome positions can therefore be determined by sequencing the DNA fragments (115–195 bp in size) isolated from MNase-treated chromatin (MNase-Seq) [178, 179]. A typical MNase-Seq experiment requires 1–10 × 10⁶ cells.
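Each nucleosome-protected MNase fragment is centered, approximately, on the nucleosome dyad. Nucleosome positions can therefore be estimated from fragment midpoints. The sketch below keeps fragments in the 115–195 bp range given above and counts their midpoints. It then greedily picks the strongest midpoints that are at least one nucleosome length apart. The function names and the 147 bp minimum spacing are assumptions for illustration. Real nucleosome callers typically smooth the signal and handle overlapping or fuzzy positions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def dyad_counts(fragments: Iterable[Tuple[int, int]], min_len: int = 115,
                max_len: int = 195) -> Counter:
    """Count midpoints of nucleosome-sized MNase fragments (putative dyads).

    Fragments are half-open (start, end) coordinates on one chromosome.
    """
    counts: Counter = Counter()
    for start, end in fragments:
        if min_len <= end - start <= max_len:
            counts[(start + end) // 2] += 1
    return counts


def call_nucleosomes(counts: Dict[int, int], min_spacing: int = 147) -> List[int]:
    """Greedily keep the strongest dyad positions at least min_spacing bp apart."""
    called: List[int] = []
    # Strongest positions first; ties broken by coordinate for determinism.
    for pos, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if all(abs(pos - p) >= min_spacing for p in called):
            called.append(pos)
    return sorted(called)


# Usage: two nucleosomes plus a short and a long fragment that are filtered out.
frags = [(1000, 1147)] * 3 + [(1001, 1148), (1300, 1447), (1300, 1447),
                              (1050, 1100), (2000, 2400)]
print(call_nucleosomes(dyad_counts(frags)))  # [1073, 1373]
```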

3D conformation of the genome. Chromatin loops and higher-order chromatin structures are profiled using chromosome conformation capture [180]. Chromosome conformation capture relies on digestion of crosslinked chromatin with restriction enzymes, followed by ligation of the sticky ends. Loci that are far apart on a linear chromosome but close together in nuclear space come into proximity and are hence ligated to each other [181]. Sequencing the ligation products therefore reveals which fragments were in proximity and provides insight into the 3D structure of the genome within the nucleus. Two popular genome-wide variants of chromosome conformation capture are circular chromosome conformation capture (4C-Seq) [182] and HiC-Seq [183, 184]. 4C-Seq determines all genomic interaction partners of one specific locus in the genome (referred to as the “bait”) at high resolution and sensitivity. HiC-Seq profiles all genomic interactions at lower resolution and sensitivity, enabling a global 3D view of the genome. Using HiC-Seq, recent studies in mouse and human have revealed that chromosome territories are organized into large, megabase-sized topologically associating domains (TADs) that are highly conserved and stable across cell types [183, 185]. 4C-Seq experiments typically require 1 × 10⁷ cells [186], whereas HiC-Seq experiments require 2.5 × 10⁷ cells [187].
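The difference between the two variants can be pictured as a contact matrix. HiC-Seq fills in the whole matrix of pairwise contacts, while 4C-Seq measures a single row of it, namely the contacts of the bait. The sketch below assumes read pairs already mapped to coordinates on a single chromosome. It bins them into a symmetric contact matrix and extracts a 4C-like profile for one bait bin. The function names and bin size are hypothetical. Real Hi-C pipelines also filter invalid ligation products and normalize the matrix, and this sketch does neither.

```python
from typing import List, Tuple


def contact_matrix(pairs: List[Tuple[int, int]], chrom_length: int,
                   bin_size: int) -> List[List[int]]:
    """Bin intra-chromosomal ligation read pairs into a symmetric contact matrix."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for a, b in pairs:
        i, j = a // bin_size, b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # contacts are unordered, so mirror them
    return matrix


def viewpoint_profile(matrix: List[List[int]], bait_bin: int) -> List[int]:
    """4C-like view: contacts of one bait bin with every bin of the chromosome."""
    return list(matrix[bait_bin])


# Usage: two long-range contacts between bins 0 and 9, plus two local contacts.
pairs = [(50, 950), (60, 940), (120, 180), (500, 510)]
m = contact_matrix(pairs, chrom_length=1000, bin_size=100)
print(viewpoint_profile(m, bait_bin=0))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
```

Read depth is spread over one row in 4C but over the full matrix in Hi-C. This is one way to see why 4C achieves higher resolution and sensitivity for its bait at comparable sequencing depth.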