Data mining and overall features of the MethylCap-seq libraries of plasma cfDNA during hepatocellular carcinoma (HCC) development. (A) Analysis of the MethylCap-seq libraries generated 37,610,900; 37,072,952; 35,016,215; 33,286,609; and 33,002,633 raw reads (blue bars) for the healthy control (HC), chronic hepatitis B infection (CHB), liver cirrhosis (LC), HCC, and non-small cell lung cancer (NSCLC) samples, respectively. Nearly half of the raw reads (43.3-50.7%) could be mapped to the hg19 reference genome (brown bars), producing 180,000-260,000 methylation peaks (green bars). (B) Cluster analysis based on genome-wide DNA methylation (DNAm) similarities. Hierarchical clustering was conducted to show the similarities in DNAm among CHB, LC, HCC and NSCLC. The Euclidean distance was used to measure the similarities in methylation alterations. Hypermethylated regions were defined as regions in which the number of methylated reads was more than threefold higher than that observed in normal tissues, assessed with a 100-kb sliding window moved in 50-kb steps. (C) The density distribution of DNA methylation near the transcription start site (TSS). Alterations in DNAm were surveyed over a broad region of the gene (from 200 kb downstream to 200 kb upstream of the TSS). HCC had the highest methylation level in the TSS region, followed by LC, CHB and HC. NSCLC was plotted as a control and shows that HCC and NSCLC had the highest methylation levels near the TSS. The peaks, which overlap in the original data, are artificially separated so that each can be seen clearly. (D) Differentially methylated regions (DMRs) by category: total DMRs, gene-related DMRs, CpG island (CGI)-related DMRs, and DMRs that are both CGI- and gene-related. (E) General characterization of the gene structures, as determined from aberrantly methylated genomic loci, in the different stages of HCC development (CHB, LC and HCC). The y-axis shows the number of gene hits.
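
The hypermethylation criterion in panel (B) (a 100-kb sliding window, 50-kb steps, and a more than threefold excess of methylated reads over normal tissue) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input format (lists of read positions on one chromosome), and the pseudocount used to avoid division by zero are all assumptions made for illustration, and the sketch assumes the two read sets have already been normalized for library size.

```python
from bisect import bisect_left

WINDOW = 100_000  # window size stated in the caption (100 kb)
STEP = 50_000     # step size stated in the caption (50 kb)
FOLD = 3.0        # "more than threefold" threshold stated in the caption


def count_in_window(sorted_positions, start, end):
    """Count read positions p with start <= p < end (input must be sorted)."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def hypermethylated_windows(sample_reads, control_reads, chrom_length,
                            window=WINDOW, step=STEP, fold=FOLD,
                            pseudocount=1.0):
    """Return (start, end, sample_count, control_count) for flagged windows.

    A window is flagged when the sample-to-control read ratio exceeds `fold`.
    The pseudocount is an assumption of this sketch, not part of the caption;
    it keeps the ratio defined when the control has no reads in a window.
    """
    sample = sorted(sample_reads)
    control = sorted(control_reads)
    hits = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)  # last windows are truncated
        s = count_in_window(sample, start, end)
        c = count_in_window(control, start, end)
        ratio = (s + pseudocount) / (c + pseudocount)
        if ratio > fold:
            hits.append((start, end, s, c))
    return hits


if __name__ == "__main__":
    # Toy data: eight sample reads clustered near 60 kb, one control read.
    sample = [60_000 + i * 1_000 for i in range(8)]
    control = [70_000]
    for hit in hypermethylated_windows(sample, control, 200_000):
        print(hit)
```

Because consecutive windows overlap by 50 kb, a single cluster of reads is usually reported in two adjacent windows; merging overlapping hits into one region would be a natural follow-up step.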