- Open Access
Validating biomarkers and models for epigenetic inference of alcohol consumption from blood
Clinical Epigenetics volume 13, Article number: 198 (2021)
Information on long-term alcohol consumption is relevant for medical and public health research, disease therapy, and other areas. Recently, DNA methylation-based inference of alcohol consumption from blood was reported with high accuracy, but these results were based on employing the same dataset for model training and testing, which can lead to accuracy overestimation. Moreover, only subsets of alcohol consumption categories were used, which makes it impossible to extrapolate such models to the general population. By using data from eight population-based European cohorts (N = 4677), we internally and externally validated the previously reported biomarkers and models for epigenetic inference of alcohol consumption from blood and developed new models comprising all data from all categories.
By employing data from six European cohorts (N = 2883), we empirically tested the reproducibility of the previously suggested biomarkers and prediction models via ten-fold internal cross-validation. In contrast to previous findings, all seven models based on 144-CpGs yielded lower mean AUCs compared to the models with less CpGs. For instance, the 144-CpG heavy versus non-drinkers model gave an AUC of 0.78 ± 0.06, while the 5 and 23 CpG models achieved 0.83 ± 0.05, respectively. The transportability of the models was empirically tested via external validation in three independent European cohorts (N = 1794), revealing high AUC variance between datasets within models. For instance, the 144-CpG heavy versus non-drinkers model yielded AUCs ranging from 0.60 to 0.84 between datasets. The newly developed models that considered data from all categories showed low AUCs but gave low AUC variation in the external validation. For instance, the 144-CpG heavy and at-risk versus light and non-drinkers model achieved AUCs of 0.67 ± 0.02 in the internal cross-validation and 0.61–0.66 in the external validation datasets.
The outcomes of our internal and external validation demonstrate that the previously reported prediction models suffer from both overfitting and accuracy overestimation. Our results show that the previously proposed biomarkers are not yet sufficient for accurate and robust inference of alcohol consumption from blood. Overall, our findings imply that DNA methylation prediction biomarkers and models need to be improved considerably before epigenetic inference of alcohol consumption from blood can be considered for practical applications.
Alcohol consumption is a modifiable lifestyle factor associated with morbidity and mortality worldwide . It was estimated to be the seventh-leading risk factor for disability-adjusted life-years (DALYs) and deaths in 2016, accounting for 5.2% (95% CI 4.4–6.0) of deaths globally . Various diseases are caused or strongly influenced by excessive alcohol consumption, often in a dose-dependent manner, such as different forms of cancer, various liver diseases, cardiovascular disease, epilepsy, and unipolar depressive disorder .
Recent alcohol consumption is detectable by breathalyzers or direct measurement of the alcohol concentration in blood and urine; however, such measurements only provide information on few hours since the last alcohol consumption. For example, ethanol can be detected in urine within ten to twelve hours after the last drink, but not later . Blood-based toxicological tests for alcohol consumption are also available, which are based on direct or indirect biomarkers. A direct biomarker is the result from ethanol metabolism or its reaction with other substances in the body, including ethyl glucuronide (EtG), ethyl sulfate (EtS), and phospholipid phosphatidylethanol (PEth). Indirect biomarkers are derived from cellular processes that undergo changes as a response to alcohol consumption, including carbohydrate-deficient transferrin (CDT), mean corpuscular volume (MCV), aspartate-aminotransferase (AST), alanine aminotransferase (ALT), and gamma-glutamyl transferase activity (GGT) [4, 5]. It is important to note that these direct and indirect biomarkers are specifically useful to determine the extreme categories, including excessive alcohol consumption or abstinence, and for recent alcohol consumption . For example, CDT can distinguish excessive alcohol consumption of on average > 50–80 g ethanol per day over a period of 2 weeks . In contrast, there are no reliable biomarkers available that can determine overall alcohol consumption habits like to distinguish heavy and at-risk drinkers from light and non-drinkers, or drinkers from non-drinkers and that are informative for alcohol consumption for longer periods of time. Therefore, due to the limited progress in previous alcohol biomarker research, information on long-term alcohol consumption is typically still collected using self-reports, although they are known to be unreliable . Accurate and reliable biomarkers that reflect habitual alcohol consumption over months and years are needed to better diagnose and treat alcohol-related diseases and for objective exposure assessment in studies on alcohol consumption and health .
DNA methylation has been proposed as a biomarker for the detection of lifestyle factors in general  and several studies have already shown that alcohol consumption is associated with changes in DNA methylation levels in particular [9,10,11,12]. A few studies have also explored the possibility of epigenetic inference of alcohol consumption from blood [12,13,14,15]. A large benefit from epigenetic-based inference is the increasing availability of DNA methylation information in study participants, as DNA methylation is extensively studied for its association with diseases. The most extensive study investigating the epigenetic association and inference of alcohol consumption was done by Liu et al. . In this study, an epigenome-wide association study (EWAS) meta-analysis on alcohol consumption was conducted in 9643 individuals of European ancestry from blood-derived DNA . The authors identified 363 CpGs significantly associated (P < 1 × 10−7) with alcohol consumption levels used as a continuous variable (grams/day). A meta-analysis was performed for prediction marker discovery in a subset of 6926 participants of European ancestry, which identified 361 CpGs (P < 5 × 10−6). The study also reports impressively high prediction accuracies, expressed as area under the curve (AUC) estimates, for DNA methylation-based prediction models for categorical alcohol consumption based on sets of 5, 23, 78, or 144 CpG markers plus age, sex, and BMI. These models include pairwise combinations of four alcohol consumption categories with the highest AUC obtained for the models with the extreme categories. For instance, the reported 144-CpG model showed discrimination of heavy drinkers versus (vs.) non-drinkers with an AUC of 0.91–1.0 (an AUC of 1.0 means completely accurate inference) in the discovery dataset and all four replication cohorts as well as 0.86–1.0 for heavy drinkers vs. light drinkers . The authors demonstrated increase in AUC with increased number of CpG predictors included in the models.
The high prediction accuracies reported by Liu et al.  were questioned based on methodological grounds by Hattab et al. . Liu et al. were particularly criticized for not having used the coefficients from the discovery dataset to determine prediction accuracies in the replication datasets, but instead, they re-estimated these coefficients in each replication cohort using the same dataset for model training and testing. Hattab et al.  concluded that the prediction accuracies published by Liu et al. represent overestimates. However, Hattab et al. based their conclusions entirely on simulated data instead of empirical data. In a subsequent study, Yousefi et al.  found only half of the alcohol consumption variance explained by the DNA methylation markers in their independent data, compared to the explained variance values reported by Liu et al. . In addition, Yousefi et al.  generated DNA methylation-derived scores using the coefficients made available by Liu et al.; based on these coefficients, they obtained much lower AUCs for the same models as reported by Liu et al. For instance, for adults at midlife, the reported AUCs were between 0.48 to 0.57 for distinguishing heavy drinkers from non-drinkers and AUCs between 0.55 to 0.57 for heavy drinkers vs. light drinkers. Although the Yousefi et al. study used empirical data, an important limitation of the study is their relatively small sample size, comprising only of 14 heavy drinkers, 67 at-risk drinkers, 748 light drinkers, and 54 non-drinkers.
Another source for the Liu et al.  AUCs putatively reflecting overestimations is their use of category subsets, and therewith participant subsets, in their prediction modeling approach. For instance, for estimating AUC for heavy drinkers vs. non-drinkers, Liu et al. only used data from heavy and non-drinkers thereby excluding the data from light drinkers and at-risk drinkers. Such use of partial data in prediction modeling is expected to result in overestimated prediction outcomes compared to a model that would include all available categories. Moreover, models that exclude participants based on their non-considered categories cannot be applied to the general populations where people with the excluded categories exist but can never be inferred correctly because their category was excluded from the model.
In the current study, we firstly aimed at replicating the association between alcohol consumption and the 363 CpGs previously identified by Liu et al. , using data from 2042 independent participants from five cohorts [18,19,20,21,22]. Then, by using a total of 4677 individuals from eight European cohorts [18,19,20,21,22,23,24,25], we aimed to thoroughly validate the DNA methylation biomarker sets and prediction models for the epigenetic inference of alcohol consumption from blood previously used by Liu et al. . In addition, we trained and validated two new models including all alcohol consumption categories.
Study populations and data sets
For replicating the association between alcohol consumption and the 363 CpGs previously reported by Liu et al. , we used data from 2042 individuals of five European cohorts as part of the Biobank-based Integrative Omics Study (BIOS) consortium [18,19,20,21,22, 26].
For prediction model building and internal validation, we employed a total dataset of 2883 Europeans, including the 2042 individuals from the BIOS consortium  together with 841 participants from The Cooperative Health Research in the Region of Augsburg (KORA) study (F4) . Only participants with complete alcohol consumption data and DNA methylation data of all 144 predictive CpGs were included. Notably, there is no overlap between these data and those used by Liu et al.  in their prediction marker discovery EWAS. This makes our model building dataset completely independent from that of Liu et al. The KORA data included here were previously used by Liu et al. for prediction replication analysis; thus, its use for model building here provides no data dependency problem.
For external validation, we applied data from three European cohorts not applied for model training and internal validation: i.e., participants from the Rotterdam Study (sub-cohort RS-III-1)  (N = 648) not included in the BIOS consortium, from the Study of Health in Pomerania (SHIP)-Trend cohort (N = 433) , and two datasets from the TwinsUK Study, TwinsUK (N = 713) and TwinsUK2 (N = 442) . The TwinsUK2 (N = 442) dataset comprises a subset of the TwinsUK (N = 713) participants but with re-processed DNA methylation dataset and a different alcohol consumption collection method (Additional file 1: Supplementary Methods). Of note, the TwinsUK and RS-III-1 data were previously used by Liu et al.  in their prediction marker discovery EWAS that identified the 361 associated CpGs (P < 5 × 10–6). Testing the inference ability of these 361 alcohol-associated CpGs by Liu et al. was solely conducted in the Framingham Heart Study data , which identified the 5, 23, 78, and 144 CpG marker sets used for prediction modelling by Liu et al. and therefore here as well. However, since these data were used in the initial marker discovery EWAS, we cannot exclude an overestimation effect in our prediction accuracy estimates obtained from these two cohorts (see below).
An overview of the datasets included in each analysis step of our study is provided in Fig. 1, their characteristics are summarized in Table 1 and described in detail in Additional file 1: Supplementary Methods.
Replication of alcohol consumption associations
We aimed at replicating the association between alcohol consumption and the 363 CpGs previously identified by Liu et al. (P < 1 × 10–7) , using data from the BIOS consortium (N = 2042), which does not overlap with the Liu et al. data. This analysis revealed successful replication of 106 (29%) of these 363 CpGs after applying the Bonferroni-corrected significance threshold of P < 1.4 × 10−4 (0.05/363) and 283 (78%) CpGs based on the uncorrected nominal significance threshold of P < 0.05. All but one (cg06603309) of the 106 CpGs replicated after Bonferroni correction showed an inverse relationship with alcohol consumption, i.e., lower DNA methylation levels were associated with higher alcohol consumption, in line with the findings from the initial discovery EWAS by Liu et al. .
The top CpG in the Liu et al. discovery EWAS was cg02583484, annotated to the heterogeneous nuclear ribonucleoprotein A1 gene (P = 1.50 × 10–19, β = -0.0004), which was replicated in our independent dataset with a P-value of 1.16 × 10–13 and β = -0.0055. In our dataset, this CpG had a methylation range with a minimum DNA methylation beta-value of 0.1457 and a maximum of 0.4189. However, this marker was not among the 144 CpGs used by Liu et al. for inference and thus was not used by us for prediction validation (see below). Out of the 144 predictive CpGs, we replicated 29 CpGs (P < 1.4 × 10−4), which were all included in the 144-CpG model, 19 in the 78-CpG model, 6 in the 23-CpG model, and 3 in the 5-CpG model. A summary of the results is presented in Additional file 2: Table S1.
A total of 77 genes were annotated to the 106 replicated CpGs after Bonferroni correction. Gene ontology enrichment analysis via http://geneontology.org/page/go-enrichmentanalysis showed that these 77 genes were enriched in two biological processes. The ‘negative regulation of cellular macromolecule biosynthetic process’ included enrichment of 16 genes (3.49-fold, FDR = 4.91 × 10–2) and 25 genes were enriched in the ‘cellular response to chemical stimulus’ (2.63-fold, FDR = 3.80 × 10–2).
Internal validation of alcohol consumption prediction models
To test the reproducibility of the seven prediction models reported by Liu et al. , we performed internal validation in our model building dataset via ten-fold cross-validation. The CpGs included per marker set and their average DNA methylation β-value per alcohol consumption category are presented in Additional file 3: Table S2. The mean AUC ± SD obtained by the ten logistic regression models are denoted as ‘Internal Validation’ in Fig. 2, Additional file 4: Figures S1-S5 and Additional file 5: Tables S3-S9. The highest mean AUC of 0.83 ± 0.05 was obtained for both the 5 and 23-CpG model for heavy drinkers vs. non-drinkers (Fig. 2A and Supplemental Table S3). For the other six models, we obtained for all marker sets an average AUC ≤ 0.75 in three models and ≤ 0.70 in the other three models (Fig. 2). Among all predictive marker sets, the lowest AUC of 0.61 ± 0.04 was obtained for the 144-CpG model for light drinkers vs. non-drinkers (Supplemental Table S9 and Supplemental Figure S5). In all seven prediction models, we obtained lower mean AUCs based on 144-CpGs compared to the models with lower numbers of CpG predictors. For example, the 144-CpG model for heavy drinkers vs. non-drinkers yielded an AUC of 0.78 ± 0.06 compared to 0.83 ± 0.05 obtained in the 5 and 23-CpG models (Fig. 2A and Supplemental Table S3). Similar results were obtained in the other models, and for some of the 78-CpG models, as shown in Additional file 4: Figures S1-S5 and Additional file 5: Tables S3-S9. Notably, these findings contrasts with that of Liu et al., who reported increased prediction accuracies with increased numbers of CpG predictors .
External validation of alcohol consumption prediction models
Aiming to test the transportability of the prediction models trained in our complete model building dataset (N = 2883), we performed external validation using data from three European cohorts (N = 1794) not considered for model building and internal validation: the Rotterdam study (RS-III-1), SHIP-Trend, and two datasets from the TwinsUK study. The obtained AUCs are denoted as ‘External Validation’ in Fig. 2, Additional file 4: Figures S1-S5 and Additional file 5: Tables S3-S9. The AUCs obtained from external validation varied strongly per model between the external validation datasets and differed with those obtained in the internal cross-validation. For example, the 144-CpG model for the heavy vs. non-drinkers yielded an AUC of 0.80 in RS and 0.84 in SHIP-Trend, while in TwinsUK and TwinsUK2 they were considerably lower with 0.68 and 0.60, respectively, and the mean AUC in the internal cross-validation was 0.78 ± 0.06 (Fig. 2A and Additional file 5: Tables S3). Similarly, the 23-CpG heavy vs. non-drinker model yielded AUCs of 0.81, 0.87, 0.65, 0.61, and 0.83 ± 0.05, respectively. The high variance between obtained AUCs in the different external validation datasets was also observed in several other models as shown in Additional file 4: Figures S1-S5 and Additional file 5: Tables S3-S9. The high AUC variance we observed in the external validation between datasets indicate non-robust performance of these prediction models, when applied to independent datasets.
New models for epigenetic inference of alcohol consumption using all categories
Finally, we developed two new models for epigenetic inference of alcohol consumption from blood by considering all data from all individuals of all four alcohol consumption categories in our prediction models, thereby refraining from excluding categories from prediction modeling as was done by Liu et al. . To this end, we used all individuals from the model building dataset to build and internally validate via ten-fold cross-validation the models, as well as all individuals from our external validation datasets to externally validate the models. This was done for two different models. Model 1 comprised all heavy and at-risk drinkers combined vs. all light and non-drinkers combined. Model 2 included all heavy, at-risk, and light drinkers combined (i.e., all drinkers no matter the level of alcohol consumption) vs. all non-drinkers. The average AUCs ± SDs from internal cross-validation in the model building dataset were denoted as ‘Internal Validation’ and the four AUCs from the four external validation datasets as ‘External Validation’ (Fig. 3 and Additional file 5: Tables S10 and S11).
Regarding model 1 for inferring heavy and at-risk vs. light and non-drinkers, the (mean) AUCs from internal cross-validation and from external validations ranged between 0.67–0.68 and 0.60–0.70, respectively across all marker sets (Fig. 3A and Additional file 5: Table S10). Regarding model 2 for inferring all drinkers (heavy plus risk plus light) vs. non-drinkers, the AUCs from the two validation approaches based on the 5-CpG and the 23-CpG models were between 0.54–0.55 and 0.54–0.61, respectively. For the 78-CpG and the 144-CpG models, similarly low AUCs were seen in the internal validation, between 0.55–0.56, with slightly higher AUCs in the external validation datasets, between 0.57–0.63 (Fig. 3B and Additional file 5: Table S11). Thus, compared to the Liu et al. models based on an approach that leaves out data, the new models based on all data achieved generally lower AUCs, while the AUC variance in the external validation was much less pronounced between the datasets than observed for the Liu et al. models (Fig. 2, Additional file 4: Figures S1-S5, and Additional file 5: Table S3-S9).
In this study, we firstly performed replication analysis in an independent dataset of the EWAS results on alcohol consumption previously reported by Liu et al. , which delivered Bonferroni-corrected significant replication of close to one-third of the previously identified CpGs. Our smaller sample size of 2042 compared to 9642 in the Liu et al. study might be the reason why we only replicated one-third of the previously identified CpGs. However, using the nominal significance threshold (P < 0.05), we replicated the association of close to 80% of these CpGs.
Secondly, by using data from eight population-based cohorts, we performed in-depth validation of the biomarkers and models reported by Liu et al.  to infer alcohol consumption from blood. Reproducibility assesses the degree to which the model fits the real patterns rather than random noise in the data . To test for reproducibility of the models, we performed internal model validation by implementing a ten-fold cross-validation scheme. The heavy vs. non-drinkers model obtained the highest average AUCs of the seven models in the cross-validation. Interestingly, the 144 and 78-CpG models obtained a lower average AUC than the 5 and 23-CpG models. In addition, in all models, we observed a higher AUC for the models including less CpGs compared to the 144-CpG model and to some extent also for the 78-CpG models. In contrast, Liu et al. reported increased prediction accuracies for models with increased number of CpG predictors . Our findings provide evidence that these 144-CpG models are over-fitted and thus, likely not reproducible. This increased risk for overfitting by including an increasing number of CpGs was also suggested by Hattab et al. . Overfitting of a model is more likely to be observed when the ratio of the number of variables to the number of samples is small . In this context, Harrell et al.  suggested that for generalizable binary models, no more than one predictor per ten participants in the smallest outcome category should be examined when fitting a regression model. As some of the findings of the analysis come from partially fitting to the noise on top of the true signal, noise features may be assigned nonzero coefficients due to chance associations with response to the training set . Overall, the AUCs we achieved via internal cross-validation for the different models and marker sets were considerably lower than those reported by Liu et al. . Also, the results obtained in our internal validation were much lower compared to the results we obtained when applying the same methods as Liu et al., e.g. training and testing the model in the same dataset, in our model building dataset (see Additional file 6 for results). This confirms previous conclusions  that the prediction accuracies reported by Liu et al. represent overestimates.
The transportability of the prediction models was tested by applying the models (trained in the model building dataset) to four validation datasets from three cohorts. Three models yielded an AUC ≤ 0.75 in the internal validation and in all four external validation datasets across all marker sets. For the other four models, a large variability in AUCs was obtained between the different datasets. Overall, in these four models, we obtained similar to higher AUCs for the Rotterdam Study and SHIP-Trend compared to the internal validation, while both TwinsUK datasets provided lower AUCs than the internal validation. It is important to note that the datasets from the Rotterdam Study and the TwinsUK (N = 713) were both included in the EWAS for predictive marker discovery by Liu et al. . The use of the same participants here and by Liu et al. could have led to an overestimation of the prediction accuracies. Surprisingly, the AUCs we obtained in the TwinsUK (N = 713) in the current study are in most models much lower than the AUCs we obtain in SHIP-Trend and the Rotterdam study. The results obtained in the Rotterdam Study were overall more similar to those obtained by SHIP-Trend. These results suggest that the use of the same participant, here and by Liu et al., did not positively impact the prediction accuracies obtained in our study. The subset of the TwinsUK (N = 442) includes re-processed DNA methylation data and a different FFQ-based approach for alcohol consumption information. Nevertheless, also in this dataset we obtain lower AUCs compared to the Rotterdam Study and SHIP-Trend, with very similar result as for the total TwinsUK (N = 713) dataset. Notably, the AUCs from external validation were generally lower than the AUCs reported by Liu et al.  and as the similarly high AUCs we obtained from our model building dataset, when applying the same methods as Liu et al. (see Additional file 6 for results), providing further evidence that the prediction accuracies reported by Liu et al. represent overestimates. This is in line with our conclusion from internal validation and as suggested by Hattab et al. .
Yousefi et al.  estimated DNA methylation-derived scores using the coefficients made available by Liu et al.  in participants of the Accessible Resource for Integrated Epigenomic Studies (ARIES) parental generation at midlife cohort (N = 1049, mean age = 50.2 ± 5.4 SD) as discovery dataset. A limitation of the study by Yousefi et al.  was the relatively small sample size in the higher alcohol consumption categories, with only 14 heavy drinkers and 67 at-risk drinkers. As a result, the lower AUCs obtained by Yousefi et al.  compared to Liu et al.  could possibly be due to the small sample size rather than an accurate representation of the true model prediction accuracies. In the current study, however, we have implemented 2883 participants, including 495 non-drinkers, 1800 light drinkers, 370 at-risk drinkers, and 218 heavy drinkers, with an age range of 19–87 years (mean age 57.4 ± 13.8 SD). By including more participants, especially in the categories with higher alcohol consumption, we overcome this possible sample size limitation and thus provide a more reliable representation of the models’ prediction accuracies. Yousefi et al.  obtained low AUCs from 0.48 to 0.57 to distinguish heavy drinkers vs. non-drinkers and 0.55 to 0.57 for heavy drinkers vs. light drinkers in adults at midlife. In our external validation, we obtained AUCs from 0.80 to 0.89 in the Rotterdam Study, 0.68 to 0.87 in SHIP-Trend, 0.52 to 0.68 in TwinsUK (N = 713), and 0.50–0.63 in TwinsUK2 (N = 442) for distinguishing heavy vs. non-drinkers and 0.72 to 0.84 in the Rotterdam Study, 0.71 to 0.89 in SHIP-trend, 0.56 to 0.57 in TwinsUK (N = 713), and 0.52–0.55 in TwinsUK2 (N = 442) for heavy drinkers vs. light drinkers. The results in the Rotterdam Study and SHIP-Trend are overall higher than those obtained by Yousefi et al., while the results obtained in the TwinsUK are very similar to those obtained by Yousefi et al. In addition, the high variability in the obtained AUCs in our study and the close to random inference obtained by Yousefi et al.  study suggest that the tested CpGs are not as suitable as previously suggested for achieving transportable and accurate alcohol consumption prediction models.
The exclusion from prediction modelling of data from participants who did not fit the inferred categories, as done by Liu et al. , means that such models cannot be applied to the general population, where individuals with the excluded categories exist and can never be inferred correctly because their category was not considered in the prediction model. Therefore, for a prediction models to be applicable in cohort studies used in epidemiology research, or any practical applications in the clinic and beyond, should be designed in data that realistically reflects the general population. For that reason, we have developed two additional models in which data from all individuals of all alcohol categories were included and validated them internally via cross-validation as well as externally in independent datasets. The first model for heavy and at-risk drinkers vs. light and non-drinkers provided cross-validated AUCs between 0.67 and 0.68 across all four CpG marker sets. These results are close to the lower 95% CI of the 450-CpG based model previously developed by McCartney et al. , which had an AUC of 0.73 (95% CI = 0.69–0.78) to distinguish light-to-moderate drinkers from heavy drinkers. Four CpGs overlap between this 450-CpG model and the 23-CpG model: cg00252472, cg06690548, cg11613559, and cg12825509. In addition, two more CpGs overlap with the 144-CpG model; cg11376147 and cg18032812. In the external validation, we obtained AUCs in the range of 0.60–0.70 across all marker sets and all external validation cohorts. In the second model, which distinguishes heavy, at-risk, and light drinkers vs. non-drinkers, we obtained AUCs at 0.54–0.63 in both internal and external validation. Thus, when applying appropriate prediction methodology by not excluding participant data and performing external validation, the CpG marker sets reported by Liu et al.  yield much lower prediction accuracies as compared to the AUCs previously published and obtained here based on the previous approach.
Our study has strengths and limitations that should be considered when interpreting the results. The main strengths of our study are the use of a large dataset from several cohorts with similar numbers for the different categories as Liu et al. , and the use of four datasets for external model validation. Moreover, our findings agree with a previous validation study based on a different methodology , while our larger dataset improved the limitations of the limited data used in the previous validation study. The main limitation of our study, as well as in the previous studies, is that the alcohol consumption information is based on interviews or self-reported questionnaires, which are generally considered unreliable in terms of underestimating actual alcohol consumption. Regarding the putative inaccuracy of interviews and self-reported alcohol consumption used here as phenotypes, we cannot know how error-prone these reports are. In particular, it is possible that heavy drinkers might not be able to or might be hesitant or unwilling to accurately recall or report their high alcohol consumption. Also, there is variability in the questionnaires regarding the reference time window. For example, KORA participants were asked about alcohol consumption in the past few days, which may or may not be representative of the participants’ long-term alcohol consumption habits. Also, non-drinkers may include lifetime non-drinkers but also sober alcoholics; however, it is not yet clear how this could affect the obtained DNA methylation patterns. Because all available studies, including the EWAS that identified CpGs associated with alcohol consumption, used interviews or self-reported alcohol consumption information, this is a general limitation that cannot be easily solved, as methods to empirically measure alcohol concentrations are not suitable for estimating long-term alcohol consumption. Another source of uncertainty may lie in the calculation for alcohol consumption in grams/day, which presents a slight variation in the formula used between the different cohorts. The variation in alcohol consumption data collection between the cohorts might also play a role in the variance we obtain in the prediction AUCs.
Another shortcoming of our study was the inclusion of only participants from European ancestry. As DNA methylation patterns might differ between populations , the absence of non-European participants during marker discovery and model building might prohibit accurate model transportability to non-European populations. Hence, future studies would benefit from a trans-ethnic prediction marker discovery, model building, and validation.
Overall, our extensive validation testing of the different CpG sets reported by Liu et al.  for inferring alcohol consumption from blood demonstrates that using appropriate prediction methodology regarding both separating datasets for model building and model testing by performing internal cross-validation and external validation, and including all alcohol consumption categories and individuals in the prediction modelling, yields much lower prediction accuracies and with a high variance between validation cohorts for the Liu et al. models as were previously published. This allows us to conclude that the currently available DNA methylation predictors for alcohol consumption need to be improved considerably before epigenetic inference of alcohol consumption from blood can be considered for practical applications in the clinic and beyond. Our study implies that, currently, we are far away from epigenetic inference of alcohol consumption from blood in research and practical applications, despite EWASs having already delivered hundreds of associated CpGs. Thus, further EWASs on alcohol consumption are necessary to increase the number of associated CpGs, including replication studies for the identified CpGs. Established CpGs replicated in several independent studies could provide better predictive markers than CpGs identified in one large meta-analysis. These CpGs will need to be carefully tested for their value to improve the low accuracy in inferring alcohol consumption from blood achieved with the currently available marker sets.
This study was embedded within the Biobank-based Integrative Omics Study (BIOS) consortium , by including participants from the Rotterdam Study (sub-cohorts RS-II-3 and RS-III-2) (N = 611) , Cohort on Diabetes and Atherosclerosis Maastricht (CODAM) (N = 159) , the Netherlands Twin Register (NTR) (N = 617) , the Leiden Longevity Study (LLS) (N = 491) , and the Prospective ALS Study Netherlands (PAN) (N = 164) . Additionally, we included 841 participants from The Cooperative Health Research in the Region of Augsburg (KORA) study (F4) . External validation was conducted in independent samples (i.e., not used for model building and internal validation) from the Study of Health in Pomerania (SHIP)-Trend cohort (N = 433) , two datasets from the TwinsUK Study with overlapping participants; TwinsUK (N = 713) and TwinsUK2 (N = 442) , and participants from the Rotterdam Study sub-cohort RS-III-1 (N = 648) that are not included in the BIOS consortium. Alcohol consumption information was obtained via interviews or self-reported questionnaires. Cohort-specific data collection and dataset characteristics are summarized in Table 1 and described in detail in the Supplementary Methods (Additional file 1).
Microarray-based DNA methylation quantification
DNA was extracted from whole peripheral blood and analyzed with the Illumina Infinium Human Methylation 450 K BeadChip (Illumina Inc, San Diego, CA, USA) or the Infinium MethylationEPIC BeadChip (Illumina Inc, San Diego, CA, USA) to obtain the DNA methylation measurements. Details on cohort-specific methods are provided in Supplementary Methods (Additional file 1). The methylation proportion of a CpG site was reported as the methylation β-value in the range of 0 to 1.
Candidate-CpG association study for alcohol consumption and gene annotation
Using the data from the BIOS Consortium (N = 2042), we tested the association of the 363 CpGs previously found to be significantly associated (P < 1 × 10–7) with alcohol consumption . Alcohol consumption levels (grams/day) were right-skewed and contained non-drinkers; therefore, the log-transformed alcohol consumption (log (g per day + 1)) was used as the independent variable. The β-values of the 363 CpGs were included as the dependent variable and the analysis was adjusted for age, sex, BMI, batch effects (plate, plate location, and cohort ID), and Houseman-imputed white blood cell counts (WBC) for CD4T cells, CD8T cells, natural killer cells, B-cells, granulocytes, and monocytes . The Bonferroni multiple-test corrected 5% significance level of P < 1.4 × 10−4 (0.05/363) was applied. All analyses were performed using the statistical package R, version 3.4.3.
We obtained the genes annotated to the replicated CpGs using the annotation file provided by Illumina and performed Gene ontology (http://geneontology.org/page/go-enrichmentanalysis) enrichment analysis for these genes.
Validation of the previously published prediction models
The BIOS and KORA DNA methylation data were combined as the model building dataset (N = 2883) using the “ComBat” function  (R-package “sva” ) to adjust for the known batches via an empirical Bayesian framework adjusting for age and sex. Then, possible confounders were regressed out using linear regression models, obtaining the residuals for each CpG adjusted for age, sex, BMI, batch effects (plate, plate location, and cohort ID), and Houseman-imputed WBC (CpG = age + sex + BMI + batch effects + WBC).
The self-reported phenotypic data on alcohol consumption were categorized according to their alcohol consumption levels for which we used the same cut-off categories as described by Liu et al. , to allow for direct comparison of the models’ performance; non-drinkers: participants with no alcohol consumption; light drinkers: participants with alcohol consumption of 0 < g per day ≤ 28 in men and 0 < g per day ≤ 14 in women; at-risk drinkers: participants with alcohol consumption of 28 < g per day < 42 in men and 14 < g per day < 28 in women; heavy drinkers: participants with alcohol consumption of ≥ 42 g per day in men and ≥ 28 g per day in women.
The alcohol categories used in each model were inferred using the same seven prediction models as previously applied by Liu et al. with heavy drinkers vs. all other categories separately, i.e., heavy drinkers vs. (1) non-drinkers, (2) light drinkers, (3) pooled individuals of light or non-drinkers, (4) at-risk drinkers, as well as two-category combinations between the other categories including (5) at-risk drinkers vs. non-drinkers, (6) at-risk drinkers vs. light drinkers, and (7) light drinkers vs. non-drinkers. In all models, the former category was the ‘cases’ (coded as “1”) and the latter was the ‘control’ group (coded as “0”). Selecting a subset of categories in prediction modeling and AUC estimation, as was done by Liu et al., may limit the possibility to extrapolate the result to the general population. Hence, we replicated this approach solely for outcome compatibility reasons. All seven models were trained for the null model, which only includes age, sex, and BMI, and subsequently the null model combined with the residuals of the four CpG sets (5, 23, 78, or 144 CpGs). The CpGs included per model and the average DNA methylation β-values per category are presented in Additional file 3: Table S2.
Internal and external validation of the previously published prediction models
We tested the reproducibility (internal validation) and transportability (external validation) of the prediction models conducted by Liu et al. [12, 28]. First, we adopted a tenfold cross-validation scheme  in which the whole model building dataset (N = 2883) was randomly distributed into ten non-overlapping subsets. The logistic regression model was trained in a combination of nine subsets (90% of the data), which was then applied to the remaining subset (10% of the data) to infer the participants’ alcohol status. This method results in ten different training (90%) and testing (10%) sets. We trained the seven models in the training sets (90%) using binomial regression analysis with the alcohol categories (coded as 1/0) as the dependent variable and age, sex, and BMI without (the null model) or with a set of (the residuals of the) CpGs as the independent variables (Alcohol category = age + sex + BMI (+ ResCpGs5, 23, 78, 144)). For this purpose, the “glm” function with “binomial” as family and “logit” as link were used. The models were then applied to the test set (10%) using the “predict” function. The prediction performance of the models was assessed using “roc” (R-package “pROC”) that calculates the AUC per model. This method resulted in ten logistic regression models and consequently, ten AUCs from which average values were estimated and standard deviation were obtained.
Secondly, we externally validated the models that were trained in the complete model building dataset (N = 2883) by testing them in four external validation datasets, using the “predict” function. The “roc” function (R-package “pROC”) was again used to calculate the AUC per model. The independent cohorts used our previously described pre-processing procedure by regressing out the potential covariates. The TwinsUK study used a linear mixed model to additionally adjust for twin family structure and zygosity using random effects. Also, sex was not included in the pre-processing steps because solely women were included in the TwinsUK analysis. Notably, according to the above-described scenarios, both internal and external validations followed the same approach previously applied by Liu et al.  in that individuals not fitting the inferred categories were excluded from the prediction analysis.
Prediction modeling without excluding categories and data
Finally, we trained as well as internally and externally validated two new prediction models comprising all individuals in the prediction modeling, i.e., (1) heavy and at-risk drinkers vs. light and non-drinkers and (2) heavy, at-risk and light drinkers (i.e. all drinkers no matter how much) vs. non-drinkers. These two models were internally validated via tenfold cross-validation and externally validated in four datasets. As age, sex, and BMI are already accounted for in the residuals we solely included the four CpG marker sets in these models. The coefficients for these models are presented in Additional file 7: Table S21.
Availability of data and materials
The data from the BIOS Consortium supporting the conclusions of this article are available from the European Genome-phenome Archive (EGA) under accession number EGAC00001000277. Data from the Rotterdam Study can be requested at http://www.epib.nl/research/ergo.htm or contact M. Arfan Ikram (email@example.com). The informed consents given by KORA study participants do not cover data posting in public databases. However, data are available upon request from KORA Project Application Self-Service Tool (https://epi.helmholtz-muenchen.de/). Data requests can be submitted online and are subject to approval by the KORA Board. The data of the SHIP study cannot be made publically available due to the informed consent of the study participants, but it can be accessed through a data application form available at https://fvcm.med.uni-greifswald.de/ for researchers who meet the criteria for access to confidential data. The TwinsUK methylation dataset can be applied for through the TwinsUK data access procedures (described in detail at https://twinsuk.ac.uk/resources-for-researchers/access-our-data/).
The Atherosclerosis Risk in Communities study
Accessible Resource for Integrated Epigenomic Studies
Area under the curve
- BIOS Consortium:
Biobank-based Integrative Omics Study Consortium
Body mass index
Cohort on Diabetes and Atherosclerosis Maastricht
Epigenome-wide association study
The Framingham Heart Study
- KORA study:
The Cooperative Health Research in the Region of Augsburg
The Lothian Birth Cohort 1936
Leiden Longevity Study
The Multi-Ethnic Study of Atherosclerosis
The Netherlands Twin Register
Prospective ALS Study Netherlands
Study of Health in Pomerania-Trend
White blood cell counts
Collaborators GBDRF. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2017;390(10100):1345–422.
Shield KD, Parry C, Rehm J. Chronic diseases and conditions related to alcohol use. Alcohol Res. 2013;35(2):155–73.
Helander A, Eriksson CJ, State WISo, Trait Markers ofAlcohol U, Dependence I. Laboratory tests for acute alcohol consumption: results of the WHO/ISBRA Study on State and Trait Markers of Alcohol Use and Dependence. Alcohol Clin Exp Res. 2002;26(7):1070–7.
Helander A. Biological markers in alcoholism. J Neural Transm Suppl. 2003;66:15–32.
Andresen-Streichert H, Müller A, Glahn A, Skopp G, Sterneck M. Alcohol biomarkers in clinical and forensic contexts. Dtsch Arztebl Int. 2018;115(18):309–15.
Althubaiti A. Information bias in health research: definition, pitfalls, and adjustment methods. J Multidiscip Healthc. 2016;9:211–7.
Ladd-Acosta C. Epigenetic signatures as biomarkers of exposure. Curr Environ Health Rep. 2015;2(2):117–25.
Vidaki A, Kayser M. From forensic epigenetics to forensic epigenomics: broadening DNA investigative intelligence. Genome Biol. 2017;18(1):238.
Philibert RA, Plume JM, Gibbons FX, Brody GH, Beach SR. The impact of recent alcohol use on genome wide DNA methylation signatures. Front Genet. 2012;3:54.
Wilson LE, Xu Z, Harlid S, White AJ, Troester MA, Sandler DP, et al. Alcohol and DNA methylation: an epigenome-wide association study in blood and normal breast tissue. Am J Epidemiol. 2019;188(6):1055–65.
Xu K, Montalvo-Ortiz JL, Zhang X, Southwick SM, Krystal JH, Pietrzak RH, et al. Epigenome-wide DNA methylation association analysis identified novel loci in peripheral cells for alcohol consumption among european American male veterans. Alcohol Clin Exp Res. 2019;43(10):2111–21.
Liu C, Marioni RE, Hedman AK, Pfeiffer L, Tsai PC, Reynolds LM, et al. A DNA methylation biomarker of alcohol consumption. Mol Psychiatry. 2018;23(2):422–33.
Endo K, Li J, Nakanishi M, Asada T, Ikesue M, Goto Y, et al. Establishment of the MethyLight assay for assessing aging, cigarette smoking, and alcohol consumption. Biomed Res Int. 2015;2015:451981.
Philibert R, Miller S, Noel A, Dawes K, Papworth E, Black DW, et al. A four marker digital PCR toolkit for detecting heavy alcohol consumption and the effectiveness of its treatment. J Insur Med. 2019;48(1):90–102.
McCartney DL, Hillary RF, Stevenson AJ, Ritchie SJ, Walker RM, Zhang Q, et al. Epigenetic prediction of complex traits and death. Genome Biol. 2018;19(1):136.
Hattab MW, Clark SL, van den Oord E. Overestimation of the classification accuracy of a biomarker for assessing heavy alcohol use. Mol Psychiatry. 2018;23(11):2114–5.
Yousefi PD, Richmond R, Langdon R, Ness A, Liu C, Levy D, et al. Validation and characterisation of a DNA methylation alcohol biomarker across the life course. Clin Epigenet. 2019;11(1):163.
Ikram MA, Brusselle G, Ghanbari M, Goedegebure A, Ikram MK, Kavousi M, et al. Objectives, design and main findings until 2020 from the Rotterdam Study. Eur J Epidemiol. 2020;35(5):483–517.
van Greevenbroek MM, Jacobs M, van der Kallen CJ, Vermeulen VM, Jansen EH, Schalkwijk CG, et al. The cross-sectional association between insulin resistance and circulating complement C3 is partly explained by plasma alanine aminotransferase, independent of central obesity and general inflammation (the CODAM study). Eur J Clin Invest. 2011;41(4):372–9.
Willemsen G, Vink JM, Abdellaoui A, den Braber A, van Beek JH, Draisma HH, et al. The Adult Netherlands Twin Register: twenty-five years of survey and biological data collection. Twin Res Hum Genet. 2013;16(1):271–81.
Schoenmaker M, de Craen AJ, de Meijer PH, Beekman M, Blauw GJ, Slagboom PE, et al. Evidence of genetic enrichment for exceptional survival using a family approach: the Leiden Longevity Study. Eur J Hum Genet. 2006;14(1):79–84.
Huisman MH, de Jong SW, van Doormaal PT, Weinreich SS, Schelhaas HJ, van der Kooi AJ, et al. Population based epidemiology of amyotrophic lateral sclerosis using capture-recapture methodology. J Neurol Neurosurg Psychiatry. 2011;82(10):1165–70.
Holle R, Happich M, Lowel H, Wichmann HE, Group MKS. KORA–a research platform for population based health research. Gesundheitswesen. 2005;67 Suppl 1:S19-25.
Völzke H, Alte D, Schmidt CO, Radke D, Lorbeer R, Friedrich N, et al. Cohort profile: the study of health in Pomerania. Int J Epidemiol. 2011;40(2):294–307.
Moayyeri A, Hammond CJ, Valdes AM, Spector TD. Cohort Profile: TwinsUK and healthy ageing twin study. Int J Epidemiol. 2013;42(1):76–85.
Bonder MJ, Luijk R, Zhernakova DV, Moed M, Deelen P, Vermaat M, et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat Genet. 2017;49(1):131–8.
Kannel WB, Feinleib M, McNamara PM, Garrison RJ, Castelli WP. An investigation of coronary heart disease in families. The Framingham offspring study. Am J Epidemiol. 1979;110(3):281–90.
Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med. 1999;130(6):515–24.
Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15(4):361–87.
Harrell FE Jr, Lee KL, Califf RM, Pryor DB, Rosati RA. Regression modelling strategies for improved prognostic prediction. Stat Med. 1984;3(2):143–52.
Frank E, Harrell J. Regression modeling strategies; with applications to linear models, logistic and ordinal regression, and survival analysis. 2nd ed. New York: Springer; 2001.
Kader F, Ghai M. DNA methylation-based variation between human populations. Mol Genet Genomics. 2017;292(1):5–35.
Houseman EA, Kelsey KT, Wiencke JK, Marsit CJ. Cell-composition effects in the analysis of DNA methylation array data: a mathematical perspective. BMC Bioinform. 2015;16:95.
Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–27.
Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3.
Hastie TTR, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York: Springer; 2009.
The authors are grateful to the participants of the included cohorts; the Rotterdam Study (http://www.erasmus-epidemiology.nl/research/ergo.htm), the CODAM study (http://www.carimmaastricht.nl/), the Netherlands Twin Registry (http://www.tweelingenregister.org), the Leiden Longevity Study (http://www.leidenlangleven.nl), the PAN study (http://www.alsonderzoek.nl/), the KORA Study (https://www.helmholtzmuenchen.de/en/kora/index.html), SHIP-Trend (https://ship.community-medicine.de/) and TwinsUK (https://twinsuk.ac.uk/). Detailed cohort specific acknowledgments are included in Additional file 1: Supplementary Methods.
This work was performed within the framework of the BBMRI Metabolomics Consortium funded by BBMRI-NL, a research infrastructure financed by the Dutch government (NWO 184.021.007 and 184.033.111). A full list of the BIOS consortium investigators is available in Additional file 8. SCEM, AV, MG, and MK were supported by the Erasmus MC University Medical Center Rotterdam. AV was additionally supported with an EUR Fellowship by Erasmus University Rotterdam. Detailed cohort specific funding are included in the Supplementary Methods (Additional file 1). The researchers are independent from the funders. The study sponsors had no role in the study design, data collection, data analysis, interpretation of data and preparation, review or approval of the manuscript.
Ethics approval and consent to participate
The study was approved by the institutional review boards of the participating medical centers: RS by the Medical Ethics Committee of the Erasmus MC (registration number MEC 02.1015) and by the Dutch Ministry of Health, Welfare and Sport (Population Screening Act WBO, license number 1071272-159521-PG). The RS has been entered into the Netherlands National Trial Register (NTR; http://www.trialregister.nl) and into the WHO International Clinical Trials Registry Platform (ICTRP; http://www.who.int/ictrp/network/primary/en/) under shared catalogue number NTR6831. CODAM by the Medical Ethical Committee of the Maastricht University; NTR by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Centre; LLS by the Ethical committee of the Leiden University Medical Center (registration number P01.113); PAN by the Institutional review board of the University Medical Centre Utrecht; KORA by the Ethics Committee of the Bavarian Medical Association (REC reference number: #06068); SHIP by the medical ethics committee of the University of Greifswald; and the TwinsUK study by the St Thomas’ Hospital Local Ethics Committee. The experimental methods comply with the Helsinki Declaration. All participants provided written informed consent to participate in the study and to have their information obtained from treating physicians.
HJG has received travel grants and speakers honoraria from Fresenius Medical Care, Neuraxpharm, Servier and Janssen Cilag as well as research funding from Fresenius Medical Care.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
. Comprising the study cohort characteristics.
. Which shows the replication results for the candidate-CpG association study on alcohol consumption.
. Presenting the CpGs incorporated in the four prediction marker sets, together with the mean DNA methylation residual β-value per alcohol consumption category per CpG in the model building dataset, and in the four external validation datasets.
. Figure S1 shows the results for heavy drinkers vs. light and non-drinkers; Figure S2 shows the results for heavy drinkers vs. at-risk drinkers; Figure S3 shows the results for at-risk drinkers vs. non-drinkers; Figure S4 shows the results for at-risk drinkers vs. light drinkers; and Figure S5 shows the results for light drinkers vs. non-drinkers.
. Table S3 shows the results for heavy drinkers vs. non-drinkers; Table S4 shows the results for heavy drinkers vs. light drinkers; Table S5 shows the results for heavy drinkers vs. light and non-drinkers; Table S6 shows the results for heavy drinkers vs. at-risk drinkers; Table S7 shows the results for at-risk drinkers vs. non-drinkers; Table S8 shows the results for at-risk drinkers vs. light drinkers; Table S9 shows the results for light drinkers vs. non-drinkers; Table S10 shows the results for heavy and at-risk drinkers vs. light and non-drinkers; and Table S11 shows the results for heavy, at-risk and light drinkers vs. non-drinker.
. We provide the data, methods, and results regarding the replication of the Liu et al. approach; including accompanying supplementary Tables S12–S20, and supplementary Figures S6–S14. Table S12 shows the results for heavy drinkers vs. non-drinkers; Table S13 shows the results for heavy drinkers vs. light drinkers; Table S14 shows the results for heavy drinkers vs. light and non-drinkers; Table S15 shows the results for heavy drinkers vs. at-risk drinkers; Table S16 shows the results for at-risk drinkers vs. non-drinkers; Table S17 shows the results for at-risk drinkers vs. light drinkers; Table S18 shows the results for light drinkers vs. non-drinkers; Table S19 shows the results for heavy and at-risk drinkers vs. light and non-drinkers; and Table S20 shows the results for heavy, at-risk and light drinkers vs. non-drinker. Figure S6 shows the results for heavy drinkers vs. non-drinkers; Figure S7 shows the results for heavy drinkers vs. light drinkers; Figure S8 shows the results for heavy drinkers vs. light and non-drinkers; Figure S9 shows the results for heavy drinkers vs. at-risk drinkers; Figure S10 shows the results for at-risk drinkers vs. non-drinkers; Figure S11 shows the results for at-risk drinkers vs. light drinkers; Figure S12 shows the results for light drinkers vs. non-drinkers; Figure S13 shows the results for heavy and at-risk drinkers vs. light and non-drinkers; and Figure S14 shows the results for heavy, at-risk and light drinkers vs. non-drinker.
. Providing the coefficients for our two new models including four-categories for each marker set.
List of all authors included in the BIOS consortium.
About this article
Cite this article
Maas, S.C.E., Vidaki, A., Teumer, A. et al. Validating biomarkers and models for epigenetic inference of alcohol consumption from blood. Clin Epigenet 13, 198 (2021). https://doi.org/10.1186/s13148-021-01186-3
- DNA methylation
- Alcohol inference