Biomarker screening | Discovery phase | Validation phase | | Combined dataset |
---|---|---|---|---|
Model development | Training set | Internal testing set | External testing set | |
Characteristics | TCGA | GSE75537 | GSE52793 | |
Number of samples | 499 | 53 | 82 | 634 |
Age at diagnosis (years) | 61.08 ± 11.92 | 49.36 ± 13.47 | – | – |
Age at diagnosis, N (%) | | | | |
 < 40 | 18 (3.6) | 16 (30.2) | – | 34 (6.2) |
 40–49 | 58 (11.6) | 15 (28.3) | – | 73 (13.2) |
 50–59 | 144 (28.9) | 7 (13.2) | – | 151 (27.4) |
 ≥ 60 | 279 (55.9) | 15 (28.3) | – | 294 (53.2) |
 Unknown | 0 | 0 | 82 | 82 |
Gender, N (%) | | | | |
 Male | 366 (73.3) | 42 (79.3) | – | 408 (73.9) |
 Female | 133 (26.7) | 11 (20.7) | – | 144 (26.1) |
 Unknown | 0 | 0 | 82 | 82 |
T stage, N (%) | | | | |
 I | 33 (6.8) | 13 (24.5) | – | 46 (8.6) |
 II | 142 (29.3) | 15 (28.3) | – | 157 (29.2) |
 III | 130 (26.9) | 12 (22.7) | – | 142 (26.4) |
 IV | 179 (37.0) | 13 (24.5) | – | 192 (35.8) |
 Unknown | 15 | 0 | 82 | 97 |
N stage, N (%) | | | | |
 0 | 238 (49.9) | 25 (47.2) | – | 263 (49.6) |
 1 | 80 (16.8) | 8 (15.1) | – | 88 (16.6) |
 2 | 152 (31.9) | 20 (37.7) | – | 172 (32.5) |
 3 | 7 (1.4) | 0 (0) | – | 7 (1.3) |
 Unknown | 22 | 0 | 82 | 104 |
M stage, N (%) | | | | |
 0 | 469 (99.0) | 45 (100.0) | – | 514 (99.0) |
 1 | 5 (1.0) | 0 (0) | – | 5 (1.0) |
 Unknown | 25 | 8 | 82 | 115 |
Smoking status, N (%) | | | | |
 Yes | 378 (77.3) | – | – | 378 (77.3) |
 No | 111 (22.7) | – | – | 111 (22.7) |
 Unknown | 10 | 53 | 82 | 145 |
Race, N (%) | | | | |
 White | 426 (87.8) | – | – | 426 (87.8) |
 Black or African American | 47 (9.7) | – | – | 47 (9.7) |
 Asian | 10 (2.1) | – | – | 10 (2.1) |
 American Indian or Alaska Native | 2 (0.4) | – | – | 2 (0.4) |
 Unknown | 14 | 53 | 82 | 149 |
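
The percentages in the table appear to use only samples with a known value as the denominator; for example, 33 of the 484 staged TCGA samples gives 6.8% for T stage I. The table does not state this rule, so it is an inference from the counts. A minimal Python sketch of that arithmetic, using a hypothetical helper `percentages` that is not from the source:

```python
def percentages(counts):
    """Return each count as a percentage of the known total, rounded to one decimal."""
    known = sum(counts.values())  # "Unknown" samples are assumed to be excluded
    return {label: round(100 * n / known, 1) for label, n in counts.items()}


# TCGA T-stage counts from the table; the 15 unknown samples are left out.
tcga_t_stage = {"I": 33, "II": 142, "III": 130, "IV": 179}
print(percentages(tcga_t_stage))
```

Applied to the TCGA T-stage counts, this reproduces the tabulated values (6.8, 29.3, 26.9, 37.0). A few other cells differ from this simple rounding by 0.1.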