
Table 1 Major steps in the 450K array analysis pipeline

From: Establishing an analytic pipeline for genome-wide DNA methylation

Analysis

Rationale

Sample filtering

Experimental samples are compared against control probes present on the array to identify samples that fail to adequately detect DNAm. Samples with poor detection, often a consequence of poor sample quality, may yield inaccurate measurements and thus may be considered for exclusion from the dataset.
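A minimal Python sketch of this filter follows. It assumes that per-probe detection p-values (signal compared against background estimated from the array's control probes) have already been computed for each sample. The 0.01 p-value cutoff and the 1% failed-probe limit are illustrative assumptions, not values prescribed here, and `flag_poor_samples` is a hypothetical helper.

```python
from typing import Dict, List


def flag_poor_samples(
    detection_p: Dict[str, List[float]],
    p_threshold: float = 0.01,
    max_failed_fraction: float = 0.01,
) -> List[str]:
    """Return IDs of samples whose fraction of failed probes is too high.

    detection_p maps each sample ID to that sample's per-probe detection
    p-values. A probe "fails" in a sample when its p-value exceeds p_threshold
    (both thresholds are illustrative defaults, not recommendations).
    """
    flagged = []
    for sample_id, pvals in detection_p.items():
        if not pvals:
            # No data at all: treat as a failed sample.
            flagged.append(sample_id)
            continue
        failed = sum(1 for p in pvals if p > p_threshold)
        if failed / len(pvals) > max_failed_fraction:
            flagged.append(sample_id)
    return flagged


# Hypothetical data: S1 fails 1 of 100 probes (1%), S2 fails 10 of 100 (10%).
detection = {
    "S1": [0.001] * 99 + [0.05],
    "S2": [0.001] * 90 + [0.5] * 10,
}
print(flag_poor_samples(detection))  # ['S2']
```

The function only identifies candidates for exclusion. Whether to drop a flagged sample remains a study-specific decision.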

Probe filtering

Raw data must pass initial quality and data screening. Probes that fail to meet preset detection thresholds, along with otherwise failed probes, are removed from analysis because they are unreliable (see text). For example, some probes may cross-hybridize to other genomic regions or overlap with SNPs, either of which could confound results. Study aims should be considered when determining which probes to remove.
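The probe-level filter can be sketched the same way. This sketch assumes two inputs: a mapping from each probe ID to its per-sample detection p-values, and a user-supplied exclusion list (for example, SNP-overlapping or cross-hybridizing probes taken from a published annotation). `filter_probes`, its thresholds, and the probe IDs below are hypothetical. By default, a probe is dropped if it fails in any sample, which is a common but strict choice.

```python
from typing import Dict, Iterable, List


def filter_probes(
    detection_p: Dict[str, Dict[str, float]],
    exclude: Iterable[str] = (),
    p_threshold: float = 0.01,
    max_failed_fraction: float = 0.0,
) -> List[str]:
    """Return IDs of probes to keep.

    detection_p maps probe ID -> {sample ID -> detection p-value}.
    exclude lists probe IDs to drop regardless of detection, e.g. SNP-overlapping
    or cross-hybridizing probes taken from an external annotation.
    """
    excluded = set(exclude)
    kept = []
    for probe_id, by_sample in detection_p.items():
        if probe_id in excluded:
            continue
        pvals = list(by_sample.values())
        if not pvals:
            continue  # no measurements: cannot be assessed, so drop
        failed = sum(1 for p in pvals if p > p_threshold)
        # Keep the probe only if its failure rate is within the allowed fraction.
        if failed / len(pvals) <= max_failed_fraction:
            kept.append(probe_id)
    return kept


# Hypothetical probes: cg0001 passes everywhere, cg0002 fails in sample B,
# cg0003 passes detection but is on the exclusion list.
detection = {
    "cg0001": {"A": 0.001, "B": 0.002},
    "cg0002": {"A": 0.001, "B": 0.200},
    "cg0003": {"A": 0.001, "B": 0.001},
}
print(filter_probes(detection, exclude={"cg0003"}))  # ['cg0001']
```

Loosening `max_failed_fraction` keeps probes that fail in only a few samples. That trade-off is one of the study-specific choices noted above.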

Within-array normalization

This step removes “background” noise and corrects for technical differences within each array arising from the two dye channels (red/green), signal intensity, and the two probe types (I/II).
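Dedicated methods exist for each of these corrections: for example, minfi's `preprocessNoob` for background and dye bias, and SWAN or BMIQ for probe-type differences. The code below is not any of them. It is a simplified Python sketch of the idea for Infinium II probes, where the methylated signal is read in the green channel and the unmethylated signal in the red channel. The sketch makes three assumptions: a per-channel background is estimated as the mean of negative-control intensities, the red channel is rescaled to green using normalization-control means, and beta values use Illumina's offset of 100. All function names are hypothetical, and probe-type (I/II) adjustment is not shown.

```python
from statistics import mean
from typing import List


def background_correct(intensities: List[float], negative_controls: List[float]) -> List[float]:
    """Subtract the channel's mean negative-control intensity, flooring at zero."""
    background = mean(negative_controls)
    return [max(x - background, 0.0) for x in intensities]


def dye_bias_factor(green_norm_controls: List[float], red_norm_controls: List[float]) -> float:
    """Factor that rescales red-channel signal onto the green channel's level."""
    return mean(green_norm_controls) / mean(red_norm_controls)


def beta_values(meth: List[float], unmeth: List[float], offset: float = 100.0) -> List[float]:
    """Beta = M / (M + U + offset); the offset stabilizes low-intensity probes."""
    return [m / (m + u + offset) for m, u in zip(meth, unmeth)]


def normalize_type_ii(
    green: List[float],
    red: List[float],
    neg_green: List[float],
    neg_red: List[float],
    norm_green: List[float],
    norm_red: List[float],
) -> List[float]:
    """Simplified within-array pass for Infinium II probes (M in green, U in red)."""
    meth = background_correct(green, neg_green)
    unmeth = background_correct(red, neg_red)
    factor = dye_bias_factor(norm_green, norm_red)
    unmeth = [u * factor for u in unmeth]  # put red on the green scale
    return beta_values(meth, unmeth)


# Hypothetical intensities for a single probe; gives a beta of about 0.476.
print(normalize_type_ii([1100.0], [600.0], [100.0], [100.0], [2000.0], [1000.0]))
```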

Batch effects

This step assesses and accounts for variation that arises not from biological differences but from external sources (e.g., samples processed on different days or at different facilities).
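Empirical Bayes methods such as ComBat are widely used for this step. The Python sketch below swaps in a much simpler technique, per-batch mean-centering of one CpG's M-values, to show what accounting for a batch shift means. `center_by_batch` is hypothetical. If batch is confounded with the biological groups being compared, this adjustment also removes real signal, so balancing groups across batches at the design stage is important.

```python
from statistics import mean
from typing import Dict, List


def center_by_batch(values: List[float], batches: List[str]) -> List[float]:
    """Remove per-batch mean shifts from one CpG's M-values.

    Each value is shifted so that its batch's mean equals the overall mean,
    preserving within-batch differences between samples.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    grand_mean = mean(values)
    grouped: Dict[str, List[float]] = {}
    for value, batch in zip(values, batches):
        grouped.setdefault(batch, []).append(value)
    batch_means = {batch: mean(vals) for batch, vals in grouped.items()}
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical M-values: batch B reads about 4 units higher than batch A.
m_values = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
batch_ids = ["A", "A", "A", "B", "B", "B"]
print(center_by_batch(m_values, batch_ids))  # [3.0, 4.0, 5.0, 3.0, 4.0, 5.0]
```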

Cell composition

Whole blood contains multiple cell types with potentially different DNAm profiles. As different samples may contain varying proportions of cell types, statistical methods have been developed to estimate and correct for this cellular heterogeneity.
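Reference-based approaches, such as the Houseman method, estimate each sample's cell-type proportions from DNAm profiles of purified cell types. The Python sketch below reduces this to two cell types, where the least-squares proportion has a closed form that is then clipped to [0, 1]. Whole blood involves more cell types and requires constrained regression. The reference values and `estimate_two_cell_fraction` are hypothetical. The estimated proportions are then typically included as covariates in downstream models.

```python
from typing import List


def estimate_two_cell_fraction(
    sample: List[float], ref_a: List[float], ref_b: List[float]
) -> float:
    """Least-squares proportion p of cell type A in a two-cell-type mixture.

    Model: sample[j] ~ p * ref_a[j] + (1 - p) * ref_b[j] at each CpG j.
    Minimizing sum_j (sample[j] - ref_b[j] - p * (ref_a[j] - ref_b[j]))**2
    gives p = sum((y - b) * (a - b)) / sum((a - b)**2), clipped to [0, 1].
    """
    if not (len(sample) == len(ref_a) == len(ref_b)):
        raise ValueError("inputs must cover the same CpGs")
    num = sum((y - b) * (a - b) for y, a, b in zip(sample, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles must differ at some CpG")
    return min(max(num / den, 0.0), 1.0)


# Hypothetical beta values at three cell-type-discriminating CpGs.
ref_a = [0.9, 0.1, 0.8]
ref_b = [0.1, 0.9, 0.2]
mixture = [0.3 * a + 0.7 * b for a, b in zip(ref_a, ref_b)]
print(round(estimate_two_cell_fraction(mixture, ref_a, ref_b), 3))  # 0.3
```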

Differential DNAm positions and regions

Currently, many analytic pipelines assess DNAm differences both at specific positions and across broader regions. The DNAm positions interrogated on the array are not evenly distributed across the genome, and both differentially methylated positions (DMPs) and differentially methylated regions (DMRs) may yield clinically meaningful results.
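A simplified Python sketch of both levels of analysis follows, assuming beta values as input. It works in three stages:

- Beta values are converted to M-values, log2(beta / (1 - beta)), which are generally better suited to statistical testing.
- A Welch t-statistic is computed per CpG.
- Significant CpGs on one chromosome are merged into candidate regions when consecutive hits lie within `max_gap` base pairs.

The function names, the 1,000 bp gap, and the two-CpG minimum are illustrative assumptions. Converting t-statistics to p-values and correcting for multiple testing (e.g., Benjamini-Hochberg) are not shown. Dedicated tools, such as limma for positions or bumphunter and DMRcate for regions, handle these steps more rigorously.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def m_value(beta: float, eps: float = 1e-6) -> float:
    """Convert a beta value to an M-value, log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1 - eps)  # avoid log(0) and division by zero
    return math.log2(b / (1 - b))


def welch_t(group1: List[float], group2: List[float]) -> float:
    """Welch's t-statistic for one CpG; each group needs at least two samples."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


def group_regions(
    positions: List[int], hits: List[bool], max_gap: int = 1000, min_cpgs: int = 2
) -> List[Tuple[int, int]]:
    """Merge consecutive significant CpGs into (start, end) candidate regions.

    positions must be sorted and on one chromosome; hits flags significant CpGs.
    A region ends at a non-significant CpG or a gap larger than max_gap.
    """
    regions: List[Tuple[int, int]] = []
    run: List[int] = []
    for pos, hit in zip(positions, hits):
        if hit and run and pos - run[-1] <= max_gap:
            run.append(pos)
            continue
        if len(run) >= min_cpgs:
            regions.append((run[0], run[-1]))
        run = [pos] if hit else []
    if len(run) >= min_cpgs:
        regions.append((run[0], run[-1]))
    return regions


# Hypothetical data: one CpG measured in two groups of three samples.
cases = [m_value(b) for b in (0.80, 0.85, 0.90)]
controls = [m_value(b) for b in (0.50, 0.55, 0.60)]
print(round(welch_t(cases, controls), 2))

# Hypothetical CpG positions and significance calls on one chromosome.
positions = [100, 300, 500, 5000, 5200, 9000]
hits = [True, True, False, True, True, True]
print(group_regions(positions, hits))  # [(100, 300), (5000, 5200)]
```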

Biological and clinical interpretation

Various approaches may be necessary to accurately interpret differential methylation between groups. Tools for functional and regulatory enrichment analysis are available. Manual exploration of the literature and validation in a second cohort or by another method (e.g., bisulfite sequencing) remain viable options for interpretation.
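Many enrichment tools rest on an over-representation test. A minimal Python sketch of the hypergeometric version follows; `enrichment_p` and its argument names are hypothetical. On the 450K array, genes differ in how many CpGs cover them, which biases a naive test toward heavily covered genes. `gometh` in the Bioconductor package missMethyl is one tool that adjusts for this bias.

```python
from math import comb


def enrichment_p(background: int, in_set: int, selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    background: genes in the analysis universe (e.g., genes covered by the array)
    in_set:     background genes annotated to the pathway or term
    selected:   background genes linked to differentially methylated CpGs
    overlap:    selected genes that are also in the pathway
    """
    if not (0 <= in_set <= background and 0 <= selected <= background):
        raise ValueError("set sizes must lie between 0 and the background size")
    upper = min(in_set, selected)
    tail = sum(
        comb(in_set, k) * comb(background - in_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(background, selected)


# Hypothetical counts: all 3 selected genes fall in a 3-gene pathway out of 10.
print(enrichment_p(10, 3, 3, 3))  # 1/120, about 0.0083
```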