Table 2 Performances of the models based on machine learning

From: Identification of DNA methylation-regulated genes as potential biomarkers for coronary heart disease via machine learning in the Framingham Heart Study

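| Features | Algorithm | Dataset | F1 | ACC | AUC (95% CI) | AP | KS | TP | FP | TN | FN | TPR | TNR | Kappa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Methylation | LightGBM | Training | 0.995 | 0.999 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 197 | 0 | 1364 | 2 | 0.990 | 1.000 | 0.994 |
| Methylation | LightGBM | Validation | 0.460 | 0.807 | 0.768 (0.694–0.843) | 0.538 | 0.540 | 43 | 77 | 378 | 24 | 0.642 | 0.831 | 0.353 |
| Methylation | XGBoost | Training | 1.000 | 1.000 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 199 | 0 | 1364 | 0 | 1.000 | 1.000 | 1.000 |
| Methylation | XGBoost | Validation | 0.429 | 0.770 | 0.756 (0.683–0.830) | 0.391 | 0.525 | 45 | 98 | 357 | 22 | 0.672 | 0.785 | 0.308 |
| Methylation | Random forest | Training | 0.995 | 0.999 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 197 | 0 | 1364 | 2 | 0.990 | 1.000 | 0.994 |
| Methylation | Random forest | Validation | 0.443 | 0.803 | 0.737 (0.656–0.818) | 0.611 | 0.517 | 41 | 77 | 378 | 26 | 0.612 | 0.831 | 0.334 |
| Expression | LightGBM | Training | 0.992 | 0.998 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 196 | 0 | 1364 | 3 | 0.985 | 1.000 | 0.991 |
| Expression | LightGBM | Validation | 0.447 | 0.801 | 0.709 (0.626–0.792) | 0.465 | 0.472 | 42 | 79 | 376 | 25 | 0.627 | 0.826 | 0.337 |
| Expression | XGBoost | Training | 0.997 | 0.999 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 198 | 0 | 1364 | 1 | 0.995 | 1.000 | 0.997 |
| Expression | XGBoost | Validation | 0.426 | 0.784 | 0.706 (0.646–0.766) | 0.538 | 0.494 | 42 | 88 | 367 | 25 | 0.627 | 0.807 | 0.309 |
| Expression | Random forest | Training | 1.000 | 1.000 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 199 | 0 | 1364 | 0 | 1.000 | 1.000 | 1.000 |
| Expression | Random forest | Validation | 0.283 | 0.592 | 0.647 (0.563–0.731) | 0.347 | 0.320 | 42 | 188 | 267 | 25 | 0.627 | 0.587 | 0.105 |
| Combination | LightGBM | Training | 1.000 | 1.000 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 199 | 0 | 1364 | 0 | 1.000 | 1.000 | 1.000 |
| Combination | LightGBM | Validation | 0.517 | 0.839 | 0.834 (0.770–0.897) | 0.616 | 0.615 | 45 | 62 | 393 | 22 | 0.672 | 0.864 | 0.427 |
| Combination | XGBoost | Training | 1.000 | 1.000 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 199 | 0 | 1364 | 0 | 1.000 | 1.000 | 1.000 |
| Combination | XGBoost | Validation | 0.439 | 0.780 | 0.807 (0.740–0.874) | 0.460 | 0.566 | 45 | 93 | 362 | 22 | 0.672 | 0.796 | 0.322 |
| Combination | Random forest | Training | 0.995 | 0.999 | 1.000 (1.000–1.000) | 1.000 | 1.000 | 197 | 0 | 1364 | 2 | 0.990 | 1.000 | 0.994 |
| Combination | Random forest | Validation | 0.452 | 0.791 | 0.818 (0.758–0.878) | 0.599 | 0.487 | 45 | 87 | 368 | 22 | 0.672 | 0.809 | 0.340 |
| FRS | Framingham 10-year risk scale | Training | – | 0.830 | 0.647 (0.606–0.687) | – | – | 39 | 105 | 1188 | 147 | 0.210 | 0.919 | – |
| FRS | Framingham 10-year risk scale | Validation | – | 0.797 | 0.610 (0.536–0.684) | – | – | 12 | 49 | 376 | 50 | 0.194 | 0.885 | – |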

ACC, accuracy; AUC, area under the receiver operating characteristic curve; CI, confidence interval; AP, average precision score; KS, Kolmogorov–Smirnov statistic; TP, true positives; FP, false positives; TN, true negatives; FN, false negatives; TPR, true positive rate; TNR, true negative rate; FRS, Framingham risk score. Dashes indicate metrics that are not applicable to the Framingham 10-year risk scale.
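Several of the reported metrics follow directly from the four confusion-matrix counts, so a row can be sanity-checked by hand. The sketch below is illustrative only and is not the authors' code: it assumes the standard definitions of accuracy, TPR, TNR and F1, and it assumes that "Kappa" refers to Cohen's kappa (the table legend does not spell this out). The class name `ConfusionCounts` is hypothetical. AUC, AP and KS depend on the full ranking of predicted scores and cannot be recovered from the counts alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical helper holding the TP/FP/TN/FN counts of one table row."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def accuracy(self) -> float:
        # Fraction of all samples classified correctly.
        return (self.tp + self.tn) / self.total

    def tpr(self) -> float:
        # True positive rate (sensitivity): TP / (TP + FN).
        return self.tp / (self.tp + self.fn)

    def tnr(self) -> float:
        # True negative rate (specificity): TN / (TN + FP).
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def kappa(self) -> float:
        # Cohen's kappa (assumed definition): observed agreement corrected
        # for the agreement expected by chance from the row/column totals.
        n = self.total
        observed = self.accuracy()
        expected = (
            (self.tp + self.fp) * (self.tp + self.fn)
            + (self.tn + self.fn) * (self.tn + self.fp)
        ) / (n * n)
        return (observed - expected) / (1 - expected)


# Counts taken from the Methylation / LightGBM / Validation row.
row = ConfusionCounts(tp=43, fp=77, tn=378, fn=24)
print(f"ACC={row.accuracy():.3f} TPR={row.tpr():.3f} TNR={row.tnr():.3f} "
      f"F1={row.f1():.3f} Kappa={row.kappa():.3f}")
# prints: ACC=0.807 TPR=0.642 TNR=0.831 F1=0.460 Kappa=0.353
```

Under these assumptions the recomputed values agree with the tabulated ones for that row. The methods divide by class totals, so a row with no positive or no negative samples would need a guard that this sketch omits.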