
Table 2 Qualitative and quantitative metrics used to assess normalisation efficacy. The table gives a brief description of each metric and lists the figures that show the results for that metric

From: Comparison of pre-processing methodologies for Illumina 450k methylation array data in familial analyses

 

Method

Description

Figure

1

Density plot: all samples

Bimodal distribution of Beta values, reflecting methylated and unmethylated signals. Each sample is represented by a single line. A batch effect is indicated when samples processed in the same batch share a similar distribution.

Fig. 2a, c, e

Additional file 5: Figure S4

Density plot: three groups of replicate samples

Bimodal distribution of Beta values, reflecting methylated and unmethylated signals. Samples are coloured by replicate group. As each replicate group contains the same biological information, differences in sample distribution within a group indicate technical bias.

Additional file 3: Figure S2 (A, C, E)

Density plot: probe I and II distribution

Bimodal distribution of Beta values, reflecting methylated and unmethylated signals, separated by Infinium I and II probe types. Provides information about probe-type normalisation, which is required for Infinium I and II signals to be combined in the same analysis.

Fig. 2b, d, f

2

MDS plot: all samples

Multidimensional scaling plots show a 2D projection of distances between samples. For these plots the 1000 most variable sites have been selected as they are the most biologically relevant for this type of analysis. Samples cluster by similarity and as such batch effects and familial clustering can be clearly discerned.

Fig. 3

Additional file 8: Figure S5

MDS plot: three groups of replicate samples

The 1000 most variable sites are again selected, with samples coloured by replicate group. As each replicate group contains the same biological information, close within-group clustering indicates minimal technical bias, while distantly clustered replicate samples indicate heightened technical bias.

Additional file 3: Figure S2 (B, D, F)

3

ANOVA of the first principal component for MDS plots

Provides a quantitative value for MDS plots. A lower p value indicates that the clustering is more strongly explained by batch, i.e. a larger p value after normalisation indicates a reduction in batch effect.

p values displayed on Fig. 3

4

Median absolute differences between replicate samples

For each replicate group, the median M value (the log2 ratio of Beta to 1 - Beta, i.e. the logit of the Beta value) across all probes was calculated, and the absolute difference was compared between replicate groups after the various normalisation methods. A smaller absolute difference indicates improved normalisation, as more technical bias is removed.

Additional file 6: Table S2

5

Imprinted regions: density plots

227 probes mapping to known imprinted, hemi-methylated regions can be used as a standard to measure changes in methylation levels after normalisation. Density plots have a single distribution peak since there is roughly 50 % methylation at these sites.

Additional file 4: Figure S3

Differentially methylated region standard error (DMRSE)

The DMRSE measures how each sample varies from the expected 50 % methylation. Smaller error/deviation from 50 % indicates less technical bias.

Additional file 1: Table S1

Additional file 4: Figure S3 (A, C, E)

6

Cluster dendrogram

Another tool to measure clustering by sample similarity. Samples are labelled by batch with batch effects clearly seen before normalisation and diminished after. Red stars indicate replicate samples that are expected to cluster most closely.

Additional file 2: Figure S1

7

meQTL association

Association between methylation at cg17749961 and SNPs in a 2-Mb window.

A significant association is maintained after normalisation and batch correction.

Additional file 5: Figure S4

8

Epigenome-wide methylation association with age

QQ plots depicting the association between epigenome-wide methylation and age.

Plots are performed on raw, normalised and batch-corrected data.

Additional file 9: Figure S6
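
As an illustration of metric 4, the following is a minimal sketch of one reading of the description: Beta values are converted to M values, the median M value across probes is taken for each replicate, and the absolute difference between the two medians is reported. The function names, the clamping constant eps and the Beta values are hypothetical and are not taken from the article; the authors' actual calculation may differ in detail.

```python
import math
from statistics import median


def beta_to_m(beta, eps=1e-6):
    """Convert a Beta value in [0, 1] to an M value: log2(beta / (1 - beta))."""
    # Clamp away from 0 and 1 so the logit stays finite (eps is an assumed choice).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def median_m(betas):
    """Median M value across all probes of one sample."""
    return median(beta_to_m(b) for b in betas)


def median_abs_difference(betas_a, betas_b):
    """Absolute difference between the median M values of two replicates."""
    return abs(median_m(betas_a) - median_m(betas_b))


# Hypothetical Beta values for two technical replicates (five probes each).
replicate_a = [0.10, 0.20, 0.50, 0.80, 0.90]
replicate_b = [0.12, 0.25, 0.80, 0.85, 0.90]

diff = median_abs_difference(replicate_a, replicate_b)
print(f"median absolute difference in M: {diff:.3f}")
```

Under this reading, a normalisation method that brings replicates closer together should shrink the reported difference towards zero.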
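
Metric 3 can be sketched in the same spirit. The example below computes the one-way ANOVA F statistic for the first-component coordinates of samples grouped by batch. It is a sketch under stated assumptions: the coordinates and batch groupings are invented, the function name is hypothetical, and the p value itself (obtained from the F distribution with k - 1 and n - k degrees of freedom) is left to a statistics library rather than computed here. A larger F statistic corresponds to a smaller p value, i.e. clustering that is more strongly explained by batch.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                    # number of batches
    n = sum(len(g) for g in groups)    # total number of samples
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the batch means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own batch mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical first-component coordinates, grouped by batch.
before_normalisation = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]  # batches well separated
after_normalisation = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]      # batches overlapping

f_before = one_way_anova_f(before_normalisation)
f_after = one_way_anova_f(after_normalisation)
print(f"F before: {f_before:.1f}, F after: {f_after:.1f}")
```

In this toy example the F statistic drops after normalisation, which corresponds to the larger p value that the table associates with a reduced batch effect.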