Nucleated red blood cells impact DNA methylation and expression analyses of cord blood hematopoietic cells

Background: Genome-wide DNA methylation (DNAm) studies have proven extremely useful to understand human hematopoiesis. Due to their active DNA content, nucleated red blood cells (nRBCs) contribute to epigenetic and transcriptomic studies derived from whole cord blood. Genomic studies of cord blood hematopoietic cells isolated by fluorescence-activated cell sorting (FACS) may be significantly altered by heterotopic interactions with nRBCs during conventional cell sorting.

Results: We report that cord blood T cells, and to a lesser extent monocytes and B cells, physically engage with nRBCs during FACS. These heterotopic interactions resulted in significant cross-contamination of genome-wide epigenetic and transcriptomic data. Formal exclusion of erythroid lineage-specific markers yielded DNAm profiles (measured by the Illumina 450K array) of cord blood CD4 and CD8 T lymphocytes, B lymphocytes, natural killer (NK) cells, granulocytes, monocytes, and nRBCs that were more consistent with expected hematopoietic lineage relationships. Additionally, we identified eight highly differentially methylated CpG sites in nRBCs (false discovery rate <5%, |Δβ| >0.50) that can be used to detect nRBC contamination of purified hematopoietic cells or to assess the impact of nRBCs on whole cord blood DNAm profiles. Several of these erythroid markers are located in or near genes involved in erythropoiesis (ZFPM1, HDAC4) or immune function (MAP3K14, IFIT1B), reinforcing a possible immune regulatory role for nRBCs in early life.

Conclusions: Heterotopic interactions between erythroid cells and white blood cells can result in contaminated cell populations if not properly excluded during cell sorting. Cord blood nRBCs have a distinct DNAm profile that can significantly skew epigenetic studies. Our findings have major implications for the design and interpretation of genome-wide epigenetic and transcriptomic studies using human cord blood.
Electronic supplementary material The online version of this article (doi:10.1186/s13148-015-0129-6) contains supplementary material, which is available to authorized users.

Granulocytes were obtained from the bottom fraction of the Lymphoprep gradient during CBMC purification. This fraction was mixed with 3% dextran/0.9% saline solution to separate granulocytes from erythrocytes by sedimentation, followed by three steps of hypotonic lysis. Hypotonic lysis was achieved by incubating the granulocyte fraction with ice-cold 0.2% sodium chloride (NaCl) for 30 seconds to lyse remaining red blood cells. Following lysis, isotonicity was restored by adding an equal volume of 1.6% NaCl solution at room temperature. Microscopic analysis using Wright-Giemsa staining indicated that the collected granulocytes were >95% neutrophils (data not shown).
Cell images were captured after Wright staining using a Nikon Eclipse E400 microscope and a Canon VIXIA HFS20 camera.
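The isotonicity step in the granulocyte protocol above can be checked with a short mixing calculation: equal volumes of 0.2% and 1.6% NaCl average to 0.9%, the concentration of the saline used elsewhere in the protocol. The sketch below is illustrative only. The function name and the 10 mL volume are hypothetical, and the calculation ignores the volume and salt content of the cell fraction itself.

```python
def mixed_concentration(c1, v1, c2, v2):
    """Concentration (% w/v) after mixing two solutions of the same solute.

    c1, c2 are concentrations in percent; v1, v2 are volumes in the same unit.
    """
    return (c1 * v1 + c2 * v2) / (v1 + v2)


# Hypothetical 10 mL lysis volume; only the 1:1 ratio comes from the protocol.
final_nacl = mixed_concentration(0.2, 10.0, 1.6, 10.0)
print(f"Final NaCl concentration: {final_nacl:.2f}%")  # prints 0.90%
```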

RNA extraction and genome-wide expression profiling
Total RNA was extracted from the samples using QIAshredder columns and the RNeasy Mini Kit. For whole-genome expression profiling, the RNA samples were hybridized to the Illumina HumanHT-12_v4_BeadChip array according to the manufacturer's recommendations. The resulting data were transferred to GenomeStudio (Illumina), then further processed and normalized in R using the lumi package [1]. Gene probes with signal intensity <100 were considered background expression and removed from analysis, for a final dataset of 20,876 probes. The average log2(expression) of each gene in T cells collected by the standard sorting strategy was compared to the average log2(expression) of the same gene in T cells collected by the stringent sorting strategy.
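The background filter and the per-gene comparison described above can be sketched in a few lines of plain Python. This is not the authors' R/lumi code. The function names and probe labels are hypothetical, and the rule that a probe is dropped only when it is below 100 in every sample is an assumption, since the text does not say how the threshold was applied across samples.

```python
import math
from statistics import mean

BACKGROUND_INTENSITY = 100.0  # threshold stated in the text


def filter_background(intensities, threshold=BACKGROUND_INTENSITY):
    """Drop probes whose intensity is below the threshold in every sample.

    ASSUMPTION: the "every sample" rule is illustrative; the text does not
    specify how the threshold was applied across samples.
    `intensities` maps probe ID -> list of per-sample intensities.
    """
    return {
        probe: values
        for probe, values in intensities.items()
        if any(v >= threshold for v in values)
    }


def mean_log2(values):
    """Average of log2-transformed intensities for one probe."""
    return mean(math.log2(v) for v in values)


def log2_difference(standard, stringent):
    """Per-probe mean log2(expression): standard sort minus stringent sort."""
    return {
        probe: mean_log2(standard[probe]) - mean_log2(stringent[probe])
        for probe in standard
        if probe in stringent
    }


# Hypothetical intensities for two probes in two samples per sorting strategy.
standard = filter_background({"probe_A": [1024.0, 4096.0], "probe_B": [50.0, 80.0]})
stringent = {"probe_A": [256.0, 256.0], "probe_B": [40.0, 60.0]}
print(log2_difference(standard, stringent))
```

A large positive difference for a probe means higher expression in the standard-sort cells than in the stringent-sort cells, which is the pattern expected for erythroid transcripts carried over from nRBCs.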

Quality control and probe filtering of DNAm data from 450K array
The raw intensity data produced by the 450K array were background normalized in GenomeStudio (Illumina). Quality control was performed using the 835 control probes included in the array. The intensity data were then exported from GenomeStudio and converted into M values using the lumi package [1] in R software [2]. Sample identity and quality were then evaluated in three ways: (i) clustering with the 65 SNP probes provided on the array, with samples from the same individual grouping together as expected; (ii) clustering with probes on the X and Y chromosomes, with samples grouping by known sex as expected; and (iii) clustering based on all probes, producing groups based on cell type. Based on these checks, one NK cell sample was removed as an outlier. The 450K array targets 485,577 CpG sites, but probes were removed from analysis if they fell into any of the following categories: (i) probes that target SNPs (n = 65); (ii) probes that target or cross-hybridize with sites on the sex chromosomes (n = 11,648 and 11,359, respectively); and (iii) probes that target CpGs which may also contain SNPs (n = 19,271) [3]. Probes that had a detection p-value >0.01 or fewer than 3 bead replicates in more than one sample were also removed (n = 2,919), for a final dataset of 440,315 CpG sites.
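The probe counts above can be reconciled arithmetically, and the per-probe quality rule can be expressed as a small predicate. The following Python sketch is illustrative and is not the authors' R/lumi pipeline. All function and key names are hypothetical. The M-value formula shown is the standard logit relation to the β value, M = log2(β / (1 − β)); lumi computes M values from the methylated and unmethylated intensities, so treat this relation as an approximation.

```python
import math

TOTAL_PROBES = 485_577  # CpG sites targeted by the 450K array (from the text)

# Probes removed at each filtering step, as reported in the text.
REMOVED = {
    "snp_probes": 65,
    "sex_chromosome_targets": 11_648,
    "sex_chromosome_cross_hybridizing": 11_359,
    "cpg_with_possible_snp": 19_271,
    "detection_p_or_bead_count": 2_919,
}


def remaining_probes(total, removed):
    """Number of probes left after subtracting every filtered category."""
    return total - sum(removed.values())


def beta_to_m(beta):
    """Logit relation between a beta value (0 < beta < 1) and an M value."""
    return math.log2(beta / (1.0 - beta))


def fails_qc(detection_p, bead_counts, p_cutoff=0.01, min_beads=3, max_bad_samples=1):
    """True if a probe is unreliable in more than `max_bad_samples` samples.

    A sample counts as bad for this probe when its detection p-value exceeds
    `p_cutoff` or it has fewer than `min_beads` bead replicates.
    """
    bad = sum(
        1
        for p, beads in zip(detection_p, bead_counts)
        if p > p_cutoff or beads < min_beads
    )
    return bad > max_bad_samples


print(remaining_probes(TOTAL_PROBES, REMOVED))  # 440315, matching the text
```

The subtraction reproduces the final dataset size of 440,315 CpG sites, confirming that the reported categories are counted without overlap.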

Supplemental Data
Supplemental Table 1. Summary of surface antigens targeted to sort each cord blood hematopoietic cell type using the standard and stringent FACS protocols. "+" and "-" symbols indicate positive and negative gating, respectively, for a specific antigen. "•" indicates that the antigen was not used to sort a given cell type. Granulocytes were not sorted by FACS, but collected by density gradient separation, erythrocyte sedimentation, and erythrocyte hypotonic lysis.

Supplemental Figure 1. Standard and stringent FACS gating strategies. Schematic representation of (A) the standard cell sorting strategy used to purify whole (CD3+) T cells, nRBCs, and monocytes by FACS; and (B) the stringent cell sorting strategy used to purify CD4 and CD8 T cells, B cells, nRBCs, monocytes, and NK cells by FACS.

Supplemental Figure 2. Cord blood hematopoietic cells in a published Gene Expression Omnibus dataset (GSE24759) show high expression of hemoglobin genes. Expression of hemoglobin alpha, beta, and gamma genes in hematopoietic cells isolated from peripheral and cord blood by flow cytometry is reported on a log2 scale. Peripheral blood hematopoietic cells do not show hemoglobin gene expression above 6, the threshold for background expression. In contrast, cord blood hematopoietic cells display hemoglobin gene expression higher than background, indicating significant erythroid contamination of these isolated cell populations.

Supplemental Figure 3. Principal component analysis of T cells, monocytes, and nRBCs sorted by the standard and stringent FACS strategies. Principal components (PCs) 1 and 2 are both associated with cell type, and PC3 is associated with batch/sorting protocol. DNAm in nRBCs is strongly affected by the change in sorting protocol, but there is minimal impact on DNAm in T cells and monocytes. These PC plots are a direct comparison of cell populations from the two sorting protocols, unlike other DNAm analyses performed on these data; as such, all DNAm data were combined for SV adjustment to perform this PC analysis.

Supplemental Figure 5. DNAm in nRBCs is influenced by their proportion in whole cord blood, as measured by number of nRBCs/100 WBCs. (A) The 450K array-wide median methylation in nRBC samples differed significantly with nRBC proportion, showing a pattern of decreasing DNAm with increasing nRBC count (full y-axis scale of 0.00-1.00 on the left, narrower y-axis scale on the right for a closer look at the association). (B) DNAm in nRBCs, CBMCs, whole cord blood (WB), and CD4 T cells at two of the CpG sites with the strongest association between nRBC DNAm and nRBC proportion (FDR <5%, magnitude of regression coefficient >0.05). CBMCs and whole cord blood are included to display the potential impact these DNAm changes in nRBCs could have on cord blood cell mixtures; CD4 T cells are included as a reference for the other blood cell types, which do not show an association with nRBC count. (C) DNA methylation in nRBCs, CBMCs, whole cord blood, and CD4 T cells at the three of our eight identified erythroid DNAm marker CpGs that are significantly associated with nRBC proportion.
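The selection criterion in panel (B), FDR <5% combined with a regression coefficient magnitude >0.05, can be illustrated with a minimal sketch. The text does not state which multiple-testing procedure was used. The Benjamini-Hochberg adjustment below is an assumption, and the function names, CpG labels, coefficients, and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    ASSUMPTION: the source reports an FDR threshold but not the procedure;
    Benjamini-Hochberg is used here purely for illustration.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_sites(results, fdr_cutoff=0.05, min_abs_coef=0.05):
    """Return CpG IDs passing both the FDR and the effect-size thresholds.

    `results` maps CpG ID -> (regression coefficient, raw p-value).
    """
    names = list(results)
    qvalues = benjamini_hochberg([results[name][1] for name in names])
    return [
        name
        for name, q in zip(names, qvalues)
        if q < fdr_cutoff and abs(results[name][0]) > min_abs_coef
    ]


# Hypothetical regression results for three CpG sites.
example = {
    "cg_a": (0.10, 0.001),   # large effect, small p-value
    "cg_b": (0.01, 0.001),   # significant but effect too small
    "cg_c": (-0.20, 0.5),    # large effect but not significant
}
print(select_sites(example))  # ['cg_a']
```

Requiring both thresholds keeps sites that are statistically robust and also show a change in DNAm large enough to matter in cell mixtures.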