Study design and study samples
This study was based on the Fangshan/Family-based Ischemic Stroke Study in China (FISSIC), which has been described in detail previously [36]. Briefly, FISSIC is a family-based genetic pedigree study designed to assess the roles of multiple genetic, epigenetic, and environmental risk factors in the etiology of ischemic stroke. We recruited ischemic stroke patients as probands, together with their surviving biological parents and/or siblings. The inclusion criteria for probands were: (1) confirmed ischemic stroke with full medical records, computerized tomography (CT), or magnetic resonance imaging (MRI); (2) age older than 40 years at enrollment; and (3) at least one surviving parent or sibling who could participate in the study. Because ischemic stroke typically has a late onset, most of the families recruited were proband–sibling families. By 2017, 2518 participants from 918 families had been recruited, of whom 1007 were ischemic stroke cases and 1151 were controls without ischemic stroke.
In the current study, we applied an additional, stricter inclusion criterion to limit the effect of age on the outcome: the age difference between the proband and the sibling could be no more than 2 years. In total, 118 proband–sibling families met these criteria. Because of budget constraints, we randomly selected 55 families from this eligible pool for DNA methylation analyses.
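The age filter and random draw described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the `Family` record, its field names, and the fixed seed are hypothetical, and each family is assumed to contribute one proband and one sibling.

```python
import random
from dataclasses import dataclass

MAX_AGE_GAP_YEARS = 2   # stricter criterion stated in the text
N_SELECTED = 55         # number of families analysed


@dataclass
class Family:
    """Hypothetical record for one proband-sibling family."""
    family_id: int
    proband_age: float
    sibling_age: float


def is_eligible(fam: Family) -> bool:
    """True when proband and sibling differ in age by at most 2 years."""
    return abs(fam.proband_age - fam.sibling_age) <= MAX_AGE_GAP_YEARS


def select_families(families, n=N_SELECTED, seed=0):
    """Draw n eligible families at random, without replacement.

    The seed is an assumption added here only to make the draw repeatable.
    """
    pool = [f for f in families if is_eligible(f)]
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))
```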
Data collection
Data for participants included questionnaire assessments, laboratory tests, and clinical examinations.
Trained investigators used a structured questionnaire in face-to-face interviews to collect general demographic characteristics (such as age and gender), lifestyle characteristics (such as smoking and alcohol-drinking habits), and medical history (diagnosis of hypertension and type 2 diabetes). Participants were categorized as smokers or non-smokers, where smokers included current and former smokers. Current smokers were defined as people who smoked at least one cigarette a day and had smoked for a cumulative total of 6 months or more. Former smokers were defined as people who had smoked regularly in the past and had quit smoking for at least 1 month. Non-smokers were participants who had never smoked. Drinkers were defined as people who drank at least 50 ml per week of any alcohol-containing liquor for at least half a year.
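The behavioral definitions above translate into simple rules, sketched below. The function and argument names are hypothetical, and the "unclassified" outcome is an assumption introduced here for cases the text does not define (for example, someone who quit less than 1 month ago).

```python
def smoking_status(ever_smoked, cigarettes_per_day=0.0,
                   months_smoked=0.0, months_since_quit=None):
    """Classify smoking status; months_since_quit is None if not quit."""
    if not ever_smoked:
        return "non-smoker"
    if months_since_quit is None:
        # Current smoker: >= 1 cigarette a day for >= 6 cumulative months.
        if cigarettes_per_day >= 1 and months_smoked >= 6:
            return "current smoker"
        return "unclassified"   # assumption: not covered by the text
    if months_since_quit >= 1:
        return "former smoker"
    return "unclassified"       # assumption: quit less than 1 month ago


def is_smoker(status):
    """Smokers comprise current and former smokers."""
    return status in {"current smoker", "former smoker"}


def is_drinker(ml_per_week, months_drinking):
    """Drinker: >= 50 ml of liquor per week for >= half a year."""
    return ml_per_week >= 50 and months_drinking >= 6
```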
Laboratory tests were performed at the molecular epidemiology laboratory in the Department of Epidemiology and Biostatistics, School of Public Health, Peking University. The participants were asked not to eat after 20:00 on the night before the survey. Blood samples were collected in EDTA tubes. All samples were analyzed on an auto-analyzer (Mindray BS-420; Shenzhen, China) using standard procedures. The tests included total cholesterol (TC), total triglyceride (TG), high-density lipoprotein (HDL), and low-density lipoprotein (LDL).
Clinical examinations included height, weight, cIMT, baPWV, and ABI. Height and body weight were measured by trained and certified observers using standard procedures. BMI was calculated as weight divided by height squared (kg/m²).
Carotid ultrasound was performed by one of two trained ultrasonographers using a high-resolution B-mode real-time ultrasound system (Acuson Inc., Mountain View, CA, USA) with a probe frequency of 7.5–10.0 MHz, according to the study protocol. cIMT was measured using vascular research tools (VRT) 6 DEM-O software. Three sites were measured on each side of the neck: the proximal end of the common carotid artery (CCA), the distal end of the CCA, and the carotid bifurcation. Each segment was 1 cm long. The maximum value measured in each segment was taken as that segment's value, and the average over the six segments was taken as the subject's cIMT. cIMT was determined by four trained professionals, with intra-class correlation coefficients of 0.8 or higher.
baPWV is a measure of systemic arterial stiffness, and ABI is valuable for screening for peripheral artery disease. Together with cIMT, they are indicators of atherosclerotic vascular disease. baPWV and ABI were measured with a BP–203 RPE III automatic arteriosclerosis detection device (Omron Health Medical Co., Ltd., China). Participants rested quietly for 3 min before testing. Cuffs were then wrapped around both upper arms and both ankles, and the pulse waves of the brachial and posterior tibial arteries were recorded using an automated oscillometric method. baPWV was calculated by dividing the distance between the two pulse wave measurement points by the time difference between the two pulse waves; the larger the value, the greater the degree of arteriosclerosis. The device automatically calculated and recorded baPWV, and the average of the left and right values was taken as the subject's baPWV. ABI was calculated by dividing the highest value obtained at each ankle by the highest of the arm values. The ABI of both legs was recorded, and the lower of the two was used to define peripheral artery disease. The methodology for baPWV and ABI measurement was the same for all participants.
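The derived measures described above reduce to a few arithmetic steps, sketched here for illustration. Function names, argument names, and units are assumptions; in the study the device computed baPWV and ABI automatically, and the "values" entering the ABI are assumed here to be blood pressure readings.

```python
from statistics import mean


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m ** 2


def cimt(segment_maxima):
    """Average of the six per-segment maxima (3 sites x 2 sides)."""
    if len(segment_maxima) != 6:
        raise ValueError("expected six segment values")
    return mean(segment_maxima)


def pulse_wave_velocity(path_length, transit_time):
    """Distance between measurement points over the time difference."""
    return path_length / transit_time


def bapwv(left_pwv, right_pwv):
    """Subject-level baPWV: average of the left and right values."""
    return mean([left_pwv, right_pwv])


def abi(ankle_readings, arm_readings):
    """Highest reading at one ankle over the highest arm reading."""
    return max(ankle_readings) / max(arm_readings)


def abi_for_pad(left_abi, right_abi):
    """The lower of the two leg values defines peripheral artery disease."""
    return min(left_abi, right_abi)
```

For example, `abi([130, 126], [120, 125])` divides 130 by 125, and `abi_for_pad` then keeps the smaller of the left and right results.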
Nucleic acid extraction and measurement of the DNA methylation level
Genomic DNA was isolated from peripheral blood leukocytes with a DNA extraction kit (DP319–01; Tiangen Biotech, Beijing, China) following the manufacturer’s instructions. Bisulfite conversion was performed using the EpiTect Bisulfite Kit (QIAGEN, Germany) according to the manufacturer’s instructions. PCR of bisulfite-converted DNA samples was performed using the PyroMark PCR Kit (QIAGEN, Germany). For all assays, amplification began with an initial activation step of 3 min at 95 °C, followed by 40 cycles of a three-step program of denaturation (94 °C for 30 s), annealing (56 °C for 30 s), and extension (72 °C for 1 min). PCR was completed with a final extension at 72 °C for 7 min. Methylation assays for the two promoter regions of ABCG1 and APOE were designed with PyroMark Assay Design 2.0 (QIAGEN, Germany). The genomic locations, primer sequences, and sequences for analysis of the PyroMark custom assays (QIAGEN, Germany) are presented in Additional file 1: Table S4. DNA methylation was assessed using a PyroMark Q96 ID system (QIAGEN, Germany). The nucleotide dispensation order was generated by entering the sequence for analysis into the PyroMark Q96 software (QIAGEN, Germany). A non-CpG cytosine was included in the nucleotide dispensation order to detect incomplete bisulfite conversion. The methylation at each CpG site was determined using the Pyro Q-CpG software set in CpG mode.
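As a quick check on the thermal cycling program described above, the programmed hold times can be tallied as below. The dictionary layout and names are hypothetical, and ramping time between temperatures is ignored, so the actual run takes longer than this total.

```python
# Temperatures in degrees Celsius, durations in seconds, as stated in the text.
PCR_PROGRAM = {
    "initial_activation": (95, 180),
    "cycle_steps": [
        ("denaturation", 94, 30),
        ("annealing", 56, 30),
        ("extension", 72, 60),
    ],
    "n_cycles": 40,
    "final_extension": (72, 420),
}


def total_hold_seconds(program):
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(seconds for _, _, seconds in program["cycle_steps"])
    return (program["initial_activation"][1]
            + program["n_cycles"] * per_cycle
            + program["final_extension"][1])


print(total_hold_seconds(PCR_PROGRAM) / 60)  # programmed minutes: 90.0
```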
We used a candidate-gene strategy, selecting genes and CpG sites on the basis of their functions and previous evidence of associations with DNA methylation, and also referred to BeadChip results that we obtained in a smaller population (not yet published). cg06500161 is the most widely studied methylation site in ABCG1 and was therefore included in this study [24, 37, 38]. Another CpG site in ABCG1 (cg02494239) was selected because it is located in the gene promoter region and, according to the BeadChip results, showed larger methylation differences and more significant P values between ischemic stroke cases and matched siblings than the other sites in this region. The cg14123992 site in APOE has previously been reported to be associated with late-onset disease and was therefore also included [15, 29]. The locations of the three CpG sites in the genes are shown in Fig. 3.
Statistical analysis
Normality of the data was tested using the Shapiro–Wilk test. Continuous variables were expressed as the mean and standard deviation (SD) if normally distributed; otherwise, they were expressed as the median and interquartile range (IQR), while categorical variables were reported as frequencies and percentages (%).
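The reporting rule above can be sketched as follows. The `is_normal` flag is assumed to come from a Shapiro–Wilk test run elsewhere (the Python standard library does not provide one), and the quartile interpolation used by `statistics.quantiles` may differ slightly from that of the statistical package used in the study.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def summarize_continuous(values, is_normal):
    """Mean and SD if normally distributed, otherwise median and IQR."""
    if is_normal:
        return {"mean": mean(values), "sd": stdev(values)}
    q1, _, q3 = quantiles(values, n=4)
    return {"median": median(values), "iqr": (q1, q3)}


def summarize_categorical(labels):
    """Frequency and percentage for each category."""
    counts = Counter(labels)
    n = len(labels)
    return {k: (c, 100 * c / n) for k, c in counts.items()}
```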
First, we compared the distributions of demographic characteristics, lifestyle behaviors, medical history, BMI, and plasma lipid levels between probands and their age-matched siblings. Differences were assessed with a paired chi-square test for qualitative variables, a paired t test for normally distributed quantitative variables, and a non-parametric test for skewed quantitative variables. We calculated the mean and SD, the median and IQR, and the range (minimum to maximum) of DNA methylation at ABCG1 and APOE for probands and their siblings.
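Two of the paired comparisons above can be sketched in plain Python. This assumes that the paired chi-square test is McNemar's test without continuity correction, which the text does not state explicitly; the non-parametric test is not named in the text and is therefore not sketched. Function names are hypothetical.

```python
import math
from statistics import mean, stdev


def mcnemar(pairs):
    """McNemar's test on (proband_exposed, sibling_exposed) boolean pairs.

    Returns the chi-square statistic (1 df, no continuity correction)
    and its P value.
    """
    pairs = list(pairs)
    b = sum(1 for p, s in pairs if p and not s)   # proband only
    c = sum(1 for p, s in pairs if s and not p)   # sibling only
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def paired_t(proband_values, sibling_values):
    """Paired t statistic and degrees of freedom for within-pair differences."""
    diffs = [p - s for p, s in zip(proband_values, sibling_values)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Only the discordant pairs contribute to McNemar's statistic, which is why it suits the proband–sibling design: concordant families carry no information about the within-family difference.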
Second, we established logistic mixed-effect models for each CpG site to test whether the methylation level was associated with ischemic stroke. The general formula of the mixed-effect regression model is:
$$ Y= X\beta + Z\mu +\epsilon $$
where Y is the vector of the outcome variable, X is the matrix of predictor variables for the fixed effects, Z is the matrix of covariates for the random effects, β is the vector of regression coefficients for the fixed effects, μ is the vector of random effects, and ϵ is the vector of residuals. Ischemic stroke (binary: yes or no) was the dependent variable in the logistic mixed model, methylation level was entered as an independent variable with a fixed effect, and family number was entered as a random effect. Gender, history of diabetes and hypertension, smoking, drinking, BMI, and blood lipid levels (TC, TG, HDL, and LDL) were added as fixed-effect covariates to obtain adjusted associations.
For a more detailed analysis, the methylation level at each CpG site was modeled in three forms: as a binary, a categorical, and a continuous variable. The binary variable was split at the median: values above the median were classed as hypermethylation and values below it as hypomethylation. The categorical variable was defined by the 25th percentile, the median, and the 75th percentile of the methylation values, where Q1 comprised 0–25% of the values, Q2 25–50%, Q3 50–75%, and Q4 75–100%. The continuous variable was the methylation value obtained in the experiment multiplied by 100, so that it was expressed as a percentage. When the binary variable was the independent variable, the hypomethylation group was the reference group; when the categorical variable was used, Q1 was the reference group and all other groups were compared with it. β was obtained directly from the model, and the odds ratio (OR) was calculated as exp(β). For the continuous methylation variable, the regression coefficient multiplied by 10 represents the change in the log odds of the outcome for every 10% increase in methylation, so the corresponding OR is exp(10β).
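The three codings of the methylation variable, and the conversion from β to an odds ratio, can be sketched as below. Names are hypothetical. Two details are assumptions not fixed by the text: values equal to a cut point are assigned to the lower group, and the quartile interpolation of `statistics.quantiles` may differ from that of the package used in the study.

```python
import math
from statistics import median, quantiles


def binary_code(values):
    """Median split: 'hyper' above the median, otherwise 'hypo' (reference)."""
    cut = median(values)
    return ["hyper" if v > cut else "hypo" for v in values]


def quartile_code(values):
    """Assign Q1 (reference) to Q4 using the 25th, 50th and 75th percentiles."""
    q1, q2, q3 = quantiles(values, n=4)

    def group(v):
        if v <= q1:
            return "Q1"
        if v <= q2:
            return "Q2"
        if v <= q3:
            return "Q3"
        return "Q4"

    return [group(v) for v in values]


def odds_ratio(beta, increment=1.0):
    """OR for a given increment in the predictor: exp(beta * increment)."""
    return math.exp(beta * increment)
```

With the continuous variable in percentage units, `odds_ratio(beta, 10)` gives the OR per 10% increase in methylation.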
Third, linear mixed-effect regression models were fitted to analyze the associations of ABCG1 and APOE methylation levels with cIMT, ABI, and baPWV, which are predictors of atherosclerosis. In this step, cIMT, ABI, and baPWV were each used in turn as the dependent variable. The definitions of the independent variables and covariates were the same as in the second step.
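Written out for a single participant, and assuming that the family effect enters as a random intercept (the text states only that family number was entered as a random effect), the linear model in this step would take a form such as:
$$ y_{ij} = \beta_0 + \beta_1 m_{ij} + \gamma^{\top} c_{ij} + \mu_j + \epsilon_{ij} $$
where y_ij is the cIMT, ABI, or baPWV of participant i in family j, m_ij is the methylation variable, c_ij is the vector of covariates with coefficients γ, μ_j is the random effect of family j, and ϵ_ij is the residual. The indices and the symbols m, c, and γ are notation introduced here for illustration. Under the same assumption, the logistic models of the second step use the same linear predictor without the residual term:
$$ \operatorname{logit} P(Y_{ij} = 1) = \beta_0 + \beta_1 m_{ij} + \gamma^{\top} c_{ij} + \mu_j $$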
The analyses were repeated separately for men and women. Results were considered statistically significant at a two-sided P value of less than 0.05. All statistical analyses were performed with STATA 13.0 software (StataCorp LP, 4905 Lakeway Drive, College Station, TX 77845, USA), and gene structure mapping was performed using Illustrator for Biological Sequences (IBS) software, version 1.0.3 [39].