Universal NicE-seq for high resolution accessible chromatin profiling for formaldehyde fixed and FFPE tissues

Accessible chromatin plays a central role in gene expression and chromatin architecture. Current accessible chromatin approaches depend on limited digestion/cutting and pasting adaptors at the accessible DNA, thus requiring additional materials and time for optimization. Universal NicE-seq (UniNicE-seq) is an improved accessible chromatin profiling method that negate the optimization step and is suited to a variety of mammalian cells and tissues. Addition of 5-methyldeoxycytidine triphosphate during accessible chromatin labeling and an on-bead library making step substantially improved the signal to noise ratio while protecting the accessible regions from repeated nicking in cell lines, mouse T cells, mouse kidney, and human frozen tissue sections. We also demonstrate one tube UniNicE-seq for FFPE tissue section for direct NGS library preparation without sonication and DNA purification steps. These refinements allowed reliable mapping of accessible chromatin for high resolution genomic feature studies.


Introduction
The eukaryotic nuclear genome is packaged into chromatin, consisting primarily of DNA, proteins and RNA, which is further condensed into larger folded chromosome structures during cell division. During Transposase-Accessible Chromatin using sequencing (ATAC-seq, 5) and nicking enzyme assisted sequencing (NicE-seq, 6). Although both DNase-seq and ATAC-seq are powerful methods, they both require specific reagents and cell type specific optimization including cell number to enzyme/Tn5 transposon concentration and time of incubation (4,5). While ATAC-seq works primarily on unfixed cells, mitochondrial DNA sequence contamination was a major issue untill a modified Omni-ATAC-seq protocol was developed (7). Our previously published NicE-Seq method also required similar enzyme titration in each cell type to determine an optimal enzyme to cell ratio for effective labeling and capture of accessible chromatin regions. Therefore, all the current methods required a careful titration of cells to enzyme and incubation time for optimal digestion to capture accessible chromatins. Here we report an improved, fast, accurate, and robust method, UniNicE-seq, which generates higher data quality for interrogation of accessible chromatin and eliminating the need for cell number to enzyme titration. We tested this method on a diverse set of cell lines, native and formaldehyde fixed tissue nuclei, fresh frozen formaldehyde fixed 5-10 µM human tissue sections along with FFPE tissue sections and observed markedly improved accessible chromatin signals. Taken together, Universal NicE-seq is a simple, cost effective, nick translation-based method to profile accessible chromatin using NextGen sequencing. Importantly, 5mC incorporated accessible chromatin remain resistant to further degradation eliminating time consuming titration and would allow automation on a variety of biological samples.

Universal NicE-seq protects labeled accessible chromatin against enzymatic degradation
During the previously published NicE-seq labeling reaction, the nucleotide mixture contained dNTPs supplemented with biotin-14-dATP and biotin-14-dCTP (6). This resulted in labeling accessible chromatin with biotinylated nucleotides for library preparation and enrichment. However, in the presence of excessive enzyme, the accessible regions were repeatedly nicked resulting in poor library quality and a higher signal to background ratio. In fact, the biochemical property of Nt.CviPII would allow nicking of an unmodified CCD but not mCCD (Fig 1A). To prevent repeated nicking at the same site, we substituted the dCTP in the labelling reaction with of 5-methyldeoxycytidine triphosphate . Thus, all available deoxycytidine triphosphates were indeed either 5-mdCTP or biotin-14-dCTP. This modification ensured incorporation of 5-mdCTP and/or biotin-14-dCTP at the nicking sites of Nt.CviPII by DNA Pol I. Modification of CCD sites at the 5' cytosine renders Nt.CviPII resistant to further nicking. The second modification step we implemented was on-bead library preparation which helped in reducing the background and enhancing the signal to noise ratio for the accessible region of the genome. For the UniNicE-seq, after labeling reaction the biotin labeled DNA of HCT116 cell was isolated, sonicated, and captured on streptavidin magnetic beads for library preparation and sequencing ( Fig 1B). A slight modification to this protocol by substituting dATP to Texas-Red-dATP allowed both visualization and sequencing (termed as NicE-viewSeq) of the accessible chromatin region (Fig 1C, Please contact the authors for step wise protocol).
To determine the effect of 5mC incorporation in accessible region for library preparation, we performed UniNicE-seq in the presence of 10X and 20X (25 and 50 units) excess Nt. CviPII nicking enzyme. Indeed, neither of the two reaction conditions showed loss of accessible chromatin in HCT116 cells (FRiP 0.19-0.14) demonstrating the robustness of 5mdC labeled accessible chromatin protection against degradation (Supp table 1). In a similar reaction condition using 25 and 50 units Nt.CviPII and original NicE-seq protocol we obtained FRiP score below 0.01 and loss of accessible chromatin peaks (Supp table 1). In addition, 2/3rd of peaks overlapped between universal NicE-seq libraries made with 1X, 10X and 20X nicking enzymes and signal to noise ratio were comparable (Fig 1D and E). These results unequivocally demonstrate the versatility of 5-mdCTP in the reaction to protect accessible regions without titration of cell to enzyme concentration.
To assess other improvements in our new method, we compared the number of accessible chromatin peaks detected and peak height which measures as the ratio of reads within the peak versus background.
First, we compared peak numbers between libraries made on beads (on bead) and control (off beads) either with 5-mdCTP or dCTP in the dNTP mix (Supp Fig 1A). The majority of accessible chromatin peaks are common across these libraries as shown in the Venn diagram suggesting that our modifications are not changing the distribution of identified regions ( Fig. 2A). Furthermore, the signal to noise ratio was improved in libraries made on beads as observed by peak height and distribution of fold change (FC) values ( Fig. 2B). For comparison between original NicE-seq (off bead C) and universal NicE-seq (on bead 5mC) the peak fold changes were better (6 vs. 8) and fraction of reads in the peaks (FRiP) were 0.095 and 0.26 respectively (Supp fig 1B).
We further compared the accessible chromatin reaction conditions and reads between UniNicE-seq, ATAC-seq and DNase-seq of HCT116 using Pearson correlation of peak read densities for data quality evaluation. In terms of data quality, UniNicE-seq and DNase-seq had the highest correlation (R=0.86) compared to UniNicE-seq with ATAC-seq (R=0.60) or ATAC-seq with DNase-seq (R=0.60), demonstrating most DHS (DNase Hypersensitive Sites) are indeed captured in this modified protocol (Fig. 2C). Since nucleosome-free DNA regions differentially affect distant communication in chromatin and accessible chromatins are a hallmark of active gene promoters, we compared accessible chromatin profile and transcription start sites (TSS) in duplicates datasets, across both off-bead and on-bead methods. Indeed, all the TSS were enriched with accessible chromatin tag densities (8). HCT116 cell nuclei labeling in the presence of 5mdCTP displayed highest tag densities and higher signal to noise ratio with high degree of correlation ~99% between technical duplicates (Supp Fig. 2A-C). Comparison between UniNicE-seq, ATACseq and DNase-seq peaks of HCT116 showed stronger peak signals in UniNicE-seq sequencing tracts that were derived from equal numbers of unique sequencing reads (Supp Fig 2D), with a high percentage of common accessible regions between UniNicE-seq and DNase-seq (Supp Fig . 2E). In order to define the accessible chromatin coverage of UniNicE-seq we also performed sequencing depth analysis and discovered accessible chromatin regions in promoters and other genic regions. Indeed, at a sequencing depth of 5 million reads we were able to identify most of the HCT116 promoters. At higher sequencing depth of 100 million reads, intronic and intergenic accessible regions were more prominently detectable, although the significance of these regions remains unclear (Fig. 2D). Since UniNiCE-seq and DNase-seq showed a high degree of correlation, we compared both datasets at a sequencing depth of 100 million reads with 10 million read increments and approximately 71% of the UniNicE-seq peaks overlapped with DHS, suggesting strong confidence in the UniNicE-seq method (Fig. 2E). To establish UniNicE-seq as a general protocol, we assayed two additional cell lines, MCF7 and K562, and compared accessible chromatin between libraries made on beads with either with 5-mdCTP or dCTP in the dNTP mix (Supp. table 2). We calculated fraction of reads in peaks (FRiP) analysis of all the peaks generated by UniNiCEseq to compare with HCT116 cells. The FRiP score was comparable amongst all three cell lines in both the total number of peaks and promoter peaks, suggesting substitution of 5mC for C and on bead library preparation improves the number of accessible chromatin peaks with identical sequencing depth of 11 million pair reads ( Fig 2F). As expected, the cell line specific unique and common peaks were also identified in UniNicE-seq (Fig. 2G, Supp Fig 3) with consistently higher numbers of identified TSS peaks (Supp table 2). Taken together, these results demonstrate universal NicE-seq is a superior and robust method than its predecessor NicE-seq protocol to profile accessible chromatin negating the need for enzyme titration and protecting accessible regions for NextGen sequencing.

Universal NicE-seq of mouse tissue
We further applied the UniNicE-seq to mouse T cells and kidney tissues. T cells were formaldehyde fixed and serial diluted from 25,000 to 500 cells for UniNicE-seq labeling, and libraries were made in duplicate.
The signal to noise ratio of all samples were consistent, and MACS2 peak calling was made with downsized 13M unique non-mitochondrial reads (Supp fig. 4). As expected, the majority of the accessible chromatin peaks were common in all replicates (Fig. 3A). The pairwise total read densities comparison between 500, 5K and 25K cells also displayed strong correlation of r value between 0.95-1. (500 vs. 5K, r=1; 5K vs. 25K, r=0.97; 500 vs. 25K, r=0.95). Since accessible DNA quantity varied between different cell numbers, we further tested if the read density between the common peaks are similar. For validation, average log2 (normalized reads) in 6062 peaks that were detected in all 6 samples were plotted on a 3dimensional scatter plot. The observation that all data points line up on a straight line in the 3dimensional space shows that signal in 25000, 5000 and 500 T-cells are highly correlated and that there was no loss of signal with decreased numbers of cells (Fig. 3B). A correlation matrix of Pearson values corroborates the above observation (Fig. 3B). A strong correlation was also observed when all reads were compared (Supp fig 4B). In addition to mouse T cells, we also interrogated and compared accessible chromatin data from formaldehyde fixed with unfixed mouse kidney cells. Accessible chromatin from 25K fixed cells were compared with 25K, 10K, 1K, 500 and 250 unfixed cells. The quality control metrics of UniNicE-seq libraries applied to mouse kidney tissues was good (Supp table 3). The peak numbers between fixed and unfixed cells were similar along with the signal to noise ratio between 25000-500 cells Table 3). We also observed that the nonfixed high number kidney nuclei (>1K) would form clumps and make the NicE-seq labeling reaction inefficient resulting in decrease in FRiP score (supp. Table   3). Direct comparison between common accessible region peaks of 25000 fixed and non-fixed cells

Universal NicE-seq of human 5-10 µM frozen tissue sections
Clinical samples from patients are often limited and increasingly obtained from fresh biopsies. Routinely, 5-10 µM tissue sections are used for immunopathological and other staining analysis. Currently, ATAC-seq method for chromatin accessibility studies requires 50,000 purified nuclei from patient brain and cartilage tissue samples (7,9). Indeed, 20 mgs of brain tissue samples were used from post-mortem human brain samples for nuclei preparation (7). However, obtaining large tissue volume from live patients is inconvinient, and often donor samples are limited. Therefore, we attempted UniNicE-seq using single frozen 5-10 µM tissue sections, mimicking biopsy samples. The protocol was slightly modified for this application, with a formaldehyde fixation step of the tissue section prior to the labeling reaction on the slide. The fixing ensured the attachment of tissue section for on-slide labeling reaction and subsequent washing steps. We than applied the UniNicE-seq protocol to make accessible chromatin libraries from tissue sections. Accessible chromatin regions were revealed, as expected, between two different lung tissue sections (Supp Fig 6A, Supp table 4). Total read counts between these two tissue sections displayed strong correlation r=0.97 (Supp Fig 6B). Similarly, all common accessible peak reads between both tissue sections also were highly correlated (r=0.97) demonstrating high quality of common accessible regions (Supp Fig . 6C). Cells undergo transcriptional changes during differentiation, especially the acquisition of tissue specific enhancers. Therefore, we profiled enhancers in fetal and adult lung tissues using known histone marks associated with enhancers. However, in fetal and adult lung there was a marked difference in accessibility of promoter and 5'UTR regions. When we compared adult vs. fetal tissues a varying degree of correlation was observed among all 3 different tissue types (Fig 4D). Accessibility of active enhancer elements in kidney was more enriched in fetal tissue compared to liver enhancers that predominated in the adult tissue. As expected, active enhancers in adult lung tissues were more enriched compared to fetal tissue suggesting a developmental cue for enhancer activation post birth (Fig 4E). Similarly, transcription factor consensus binding site near the UniNicE-seq derived accessible chromatin binding region displayed varying degrees of similarity and a contrast between fetal and adult tissues, with fetal tissue demonstrating developmental and functional programming of lung tissue (Fig 4F). Principal component analysis of read counts between fetal and adult tissue accessible chromatin displayed close similarity in liver compared to kidney and lung tissue (Supp Fig 6D). However, transcription start sites were always accessible in both fetal and adult tissue to varying degrees (Supp Fig 6E).

One-tube universal NicE-seq protocol of human 5-10 µM FFPE tissue sections
Change in chromatin accessibility underlie various diseases including cancers and are often available as FFPE samples. Therefore, we modified universal NicE-seq to be compatible with FFPE tissue sections. The major goal was to develop the reaction protocol that will negate DNA purification and sonication step and will directly feed into NGS library preparation (Fig 5A).
We introduced a few notable changes. For removal of paraffin we introduced warm paraffin oil wash in place of organic solvent, which is flammable and would need special handling. After paraffin oil based removal of paraffin, the tissue sections were hydrated and equilibrated in PBS for compartibility of labeling reaction. Once the labeling reaction was over, RNaseA was added and crosslinking was reversed at 65˚C incubation. The reaction mix was treated with proteinase K for removal of any protein to enrich DNA in the tube. In the next step proteinase K was heat inactivated and nicking enzyme Nt.CviPII was added again to the reaction. Since Nt.CviPII is sensitive to newly labeled accessible chromatin incorporated with 5mC, it would digest away all other DNA of the geome thus enriching only accessible regions. These accessible DNA fragments were captured on strpavidine beads for on-bead NGS library preparation and sequencing (Please contact the authors for step wise protocol).
To demonstate the proof of principle, two different human liver FFPE samples were profiled for accessible chromatin studies (Supp table 5). Both tissue sections displayed 60-70% common accessible regions and a strong correlation of read counts (Fig 5B, Supp fig. 7). Most accessible regions were enriched in promoter, exon, intron and intragenic regions as expected (Fig 5C, D, E). IGV browser display showed accessible chromatin peak correlation with active chromatin marks, H3K27ac and H3K4me3 along with active enhancer mark, H3K4me1 (Fig 4F). Therefore, we concluded that universal NicE-seq works well in single FFPE tissue section.

Discussion
In summary, we demonstrated the ease of application of UniNicE-seq to a variety of human cell lines, mouse tissues and human organ derived tissue section from both frozen-fixed and FFPE samples for For tissue samples, mouse T cells, human liver and mouse kidney cells, nuclei were prepared, formaldehyde fixed before labeling reaction, as described above. In experiments involving unfixed tissue nuclei, the nuclei suspension was in 1X PBS before labeling. Low input cell numbers between 250-500, library was amplified 12-13 cycles. The library was examined and quantitated with high-sensitive DNA chip (Agilent, 5067-4627).
In order to compare NicE-seq data generated from different protocols, different numbers of input cells, different samples or compare NicE-seq to other open-chromatin mapping methods (i.e., ATAC-seq, DNase-seq), we downsized the mapped reads from different experiments to the same number of mapped fragments (after excluding PCR duplicates and mitochondrial reads) through random sampling. Peaks were called using the same parameter with MACS2 as mentioned above. Bigwig files of normalized reads per million read in 10 bp non-overlapping windows across the genome were displayed in Integrated Genomics Viewer (13). All analysis were performed after removing ENCODE blacklists.

Fraction of reads in peaks (FRiP):
The FRiP score was calculated using the deepTools plotEnrichment function (14). Called peaks were classified into 2 groups: TSS peaks if they overlap with +/-500bp from annotated TSSs (based NCBI RefGene annotation); and distal peaks if otherwise. Correspondingly, reads that overlapped with the TSS peaks by at least 1 base were marked as "TSS" reads. Reads that overlapped with distal peaks were marked as "distal" reads. Reads that do not overlap with any called peaks were marked as "reads not in peaks".

Peak overlap analysis:
Peaks called from different experiments were compared using the Bedtools (15).
First peaks from all the samples are concatenated. Peaks that have at least one base pair overlapping are considered associated and are merged to form a union peak set. Then peaks of individual samples were compared to the union set and were marked as either "unique" or "common". Last the numbers of "unique" and "common" peaks were summarized from all the samples and were used to make Venn Diagrams in R.

Correlation analysis: Correlation analysis of UniNicE-seq open-chromatin signals were performed with
the DiffBind (16) package in R and deepTools (14) using two methods: occupancy (peak overlap) based method uses peak overlapping states and affinity (normalized read density) based method. The occupancy-based method determines the correlation coefficients based on the numbers of unique peaks and overlapping peaks. The affinity-based method first determines the number of normalized reads that overlap with a set of consensus peaks for individual samples and then calculates pearson correlation based on the normalized read count matrix. Correlation heat maps were generated using both occupancy and affinity methods with DiffBind and deepTools. PCA plots were generated from the normalized read count matrix by the affinity method.

Peak annotation and Gene/Genome Ontology analysis:
Functional annotation of called peaks was performed with HOMER (17) annotatePeaks.pl. After associating peaks with nearby genes and assigning peaks to different genomic features (e.g., promoter, exon, CpG islands, repetitive elements etc), we also conducted Gene Ontology enrichment analysis for selected sets of UniNicE-seq peaks (e.g., tissue-specific peaks) and tested for enrichment of UniNicE-seq peaks in associated genomic features with HOMER. As a control, the normalized coverage of a set of randomly sampled genomic regions were also plotted in the same way.
External datasets: TSS of mouse (mm10) and human (hg38) genomes were extracted from the NCBI RefGene gene table downloaded from the UCSC Table Browser. ChIP-seq datasets of cell-specific and tissue-specific CTCF binding, RNA polymerase II binding and histone marks (H3K27ac, H3K4me1, H3K4me3) were downloaded from the mouse and human Encyclopedia of DNA Elements (ENCODE) projects (Supp table 6). Human liver tissue-specific enhancers were acquired from the TiED database (http://lcbb.swjtu.edu.cn/TiED/) and EnhancerAtlas (18) . The original hg19 genome coordinates were converted to hg38 using the LiftOver tool.    Both of the TF binding motifs and the samples are organized by unsupervised k-means clustering method.
The p values of e-6 was considered for the cluster analysis.