Underestimated effect of intragenic HIV-1 DNA methylation on viral transcription in infected individuals

Background The HIV-1 proviral genome harbors multiple CpG islands (CpGIs), in both the promoter and intragenic regions. DNA methylation in the promoter region has been shown to be heavily involved in HIV-1 latency regulation in cultured cells. However, its exact role in proviral transcriptional regulation in infected individuals is poorly understood. Moreover, methylation at intragenic CpGIs has never been studied in depth. Results A large, well-characterized HIV-1 patient cohort (n = 72), consisting of 17 long-term non-progressors and 8 recent seroconverters (SRCV) without combination antiretroviral therapy (cART), 15 early cART-treated, and 32 late cART-treated patients, was analyzed using a next-generation bisulfite sequencing DNA methylation method. In general, we observed low levels of promoter methylation and higher levels of intragenic methylation. Additionally, SRCV showed increased promoter methylation and decreased intragenic methylation compared with the other patient groups. These data indicate that increased intragenic methylation could be involved in proviral transcriptional regulation. Conclusions In contrast to in vitro studies, our results indicate that intragenic hypermethylation of HIV-1 proviral DNA is an underestimated factor in viral control in HIV-1-infected individuals, showing the importance of analyzing the complete proviral genome in future DNA methylation studies.


Background
Current combination antiretroviral therapy (cART) can successfully control human immunodeficiency virus type 1 (HIV-1) infection and prevent disease progression to the acquired immunodeficiency syndrome (AIDS). However, a cure is not generally achievable due to the establishment of a latent reservoir of proviral HIV-1 DNA which remains dormant and fuels viral rebound upon treatment interruption [1][2][3][4]. Therefore, better insight into the mechanisms regulating HIV-1 latency is crucial in order to interfere with this latency state and to develop cure strategies. The state of HIV-1 latency can be defined as the transcriptional silencing of proviral genes caused by multiple transcriptional blocks after the stable integration of proviral DNA into the host genome [5]. Some of the major silencing mechanisms consist of epigenetic modifications, which have led to several clinical trials investigating reactivation of the latent viral reservoir with histone deacetylase inhibitors, albeit with limited success [6][7][8][9][10]. Other epigenetic modifications such as HIV-1 proviral DNA methylation have also been described in HIV-1 transcriptional silencing and have been explored as targets for HIV-1 latency-reversing strategies [11][12][13][14].
DNA methylation is a well-described epigenetic modification in which a methyl group is added at the number five carbon of the cytosine pyrimidine ring in CpG dinucleotides [15,16]. This modification plays a role in genome transcription regulation and is crucial in processes such as the development of multicellular organisms, cell differentiation, regulation of gene expression, X-chromosome inactivation, genomic imprinting, and the suppression of parasitic and other repeat sequences [15][16][17][18][19][20][21][22][23]. In general, reliable and stable transcriptional silencing results when CpG islands (CpGIs) in promoter regions are hypermethylated [12,15,16,24,25]. CpGIs are stretches of DNA that contain an increased frequency of CpG dinucleotides (CG content > 50% and observed/expected CpG ratio > 60%). Methylation of CpGIs within gene bodies (intragenic methylation) has been shown to be involved in regulation of intragenic promoters, alternative splicing, and cellular differentiation, but also in the activation of retroviruses, repetitive elements, and prevention of aberrant transcript production [26][27][28][29][30].
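The two CpGI thresholds quoted above can be checked for any candidate sequence. The following is a minimal Python sketch, not the procedure used in this study: the function names are hypothetical, the observed/expected ratio uses the common definition (number of CpG × sequence length) / (number of C × number of G), and minimum-length requirements that some CpGI definitions add are not checked.

```python
def cpg_island_stats(seq: str) -> dict:
    """Return GC content and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n
    # Observed/expected CpG = (#CpG * N) / (#C * #G); 0 if C or G is absent
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"gc_content": gc_content, "obs_exp_cpg": obs_exp}


def meets_cpgi_thresholds(seq: str, min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply the thresholds quoted in the text (GC > 50%, obs/exp > 60%)."""
    stats = cpg_island_stats(seq)
    return stats["gc_content"] > min_gc and stats["obs_exp_cpg"] > min_obs_exp
```

A sequence can be GC-rich and still fail the test: `"CCCCGGGG"` has 100% GC content but only one CpG, giving an observed/expected ratio of 0.5.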
To further understand the role of proviral HIV-1 DNA methylation in infected individuals, an NGS-based bisulfite assay was developed to characterize HIV-1 proviral DNA methylation profiles of both promoter and intragenic regions in the context of a large, well-characterized patient cohort (n = 72). This cohort comprises four different patient groups as described by Malatinkova et al. [39]: 15 early cART-treated individuals (ET), 32 late cART-treated individuals (LT), 17 long-term non-progressors (LTNP), and 8 acute seroconverters (SRCV).

Patient cohorts and DNA samples
HIV-1-positive patients were recruited from two clinical centers, the Ian Charleson Day Centre (Royal Free Hospital, London, UK) and the AIDS Reference Center (Ghent University Hospital, Ghent, Belgium) during the study performed by Malatinkova et al. [39]. Seventy-two HIV-1-positive PBMC samples from that study were selected. Patients were divided into four cohorts based on their disease status (Additional Figure 1). The detailed study design and inclusion criteria have been described previously [39]. Briefly, the cohorts comprised long-term cART-treated individuals (median treatment time of 10.77 years (interquartile range (IQR), 6.46-12.34 years)) who had initiated treatment either (1) during HIV-1 seroconversion (early treated (ET); n = 15) or (2) during the chronic phase of the infection (late treated (LT); n = 32); (3) cART-naïve long-term non-progressors (LTNPs, n = 17) who had maintained HIV-1 viral load (VL) ≤ 1000 copies/ml and CD4+ T cells > 500 cells/mm³ over > 7 years post-infection; and (4) cART-naïve seroconverters (SRCV, n = 8), who were sampled during the acute phase of the infection. Baseline characteristics and clinical parameters of these cohorts are summarized in Table 1. The Ethical Committees of Ghent University Hospital and the Royal Free Hospital approved this study (reference numbers: B670201317826 (Ghent) and 13/LO/0729 (London)), and all study subjects gave their written informed consent.
DNA from aliquots of 10⁷ PBMCs was isolated using the DNeasy® Blood & Tissue Kit (Qiagen, The Netherlands, 69504). Sample DNA concentration was determined with

Cell culture
Jurkat cells (human T cell leukemia line) and J-Lat 8.4 (Jurkat cells infected with one HIV-1 copy per cell [44]) were cultured in a humidified atmosphere of 37°C and 5% CO₂ in RPMI 1640 medium with GlutaMAX™ Supplement (Thermo Fisher Scientific, MA, USA, 61870-010), supplemented with 10% FCS and 100 μg/ml penicillin/streptomycin. The culture medium was renewed every 2 to 3 days. DNA was isolated as described in the previous section.

Primer design
Primers targeting the four major HIV-1 CpGIs were designed using two online primer design tools (MethPrimer [45] and Bisulfite Primer Seeker (Zymo Research, CA, USA, https://www.zymoresearch.com/pages/bisulfite-primer-seeker)). LTR primers were obtained from Trejbalova et al. [13] and ETR_1 primers from Weber et al. [37]. To evaluate primers in silico, the bioinformatics tool developed by Rutsaert et al. [46], which estimates the complementarity of each primer combination to all full-length HIV-1 sequences in the Los Alamos National Laboratory (LANL) database (www.hiv.lanl.gov) [47], was adapted: the database was transformed to the bisulfite-treated variant (C→T; CG→CG), and analysis of nested primer combinations and of combinations of multiple PCR assays was added. First, the in silico analysis was used to evaluate primer combinations obtained from the literature as well as those designed in-house. Primer combinations matching at least 50% of the LANL database and nested combinations with an overlap of at least 2/3 of the matched sequences were retained. Selected primers were tested in vitro using DNA from J-Lat 8.4 [44], diluted in Jurkat DNA at different concentrations to mimic patient samples (10,000, 5000, 1000, 500, 250, and 100 HIV-1 copies per 10⁶ cells). Finally, an additional in silico analysis was used to select four or fewer primer combinations per CpGI that targeted at least 60% of the LANL database. These final primer sequences are listed in Additional File 1.
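The database transformation and matching step can be illustrated with a short Python sketch. It is not the tool of Rutsaert et al. [46], only a simplified stand-in under stated assumptions: only the top strand is converted, primers are given as they would appear on that converted strand, a primer "matches" when it occurs with at most a chosen number of mismatches, and all function names are hypothetical.

```python
import re


def bisulfite_convert(seq: str) -> str:
    """In silico bisulfite conversion of the top strand: every C not
    followed by G becomes T, while CpG cytosines are kept (C->T; CG->CG)."""
    return re.sub(r"C(?!G)", "T", seq.upper())


def primer_hits(primer: str, target: str, max_mismatches: int = 0) -> bool:
    """True if the primer occurs in target with <= max_mismatches."""
    k = len(primer)
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        mismatches = sum(1 for a, b in zip(primer, window) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False


def coverage(primers, sequences, max_mismatches: int = 0) -> float:
    """Fraction of database sequences in which every primer matches
    after bisulfite conversion of the sequence."""
    if not sequences:
        return 0.0
    matched = 0
    for seq in sequences:
        converted = bisulfite_convert(seq)
        if all(primer_hits(p, converted, max_mismatches) for p in primers):
            matched += 1
    return matched / len(sequences)
```

With a coverage value in hand, the retention rule from the text reduces to a comparison such as `coverage(primers, database) >= 0.5`, where `database` would be the list of full-length sequences.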

Bisulfite treatment
A minimum of 5 × 1 μg of DNA per patient was bisulfite treated using the EpiTect Bisulfite Kit (Qiagen, The Netherlands, 59110), which is the least fragmenting commercial bisulfite kit available, according to a previous in-house comparison [18]. We used the standard protocol as provided by the manufacturer. The five aliquots per patient were pooled and immediately stored at -20°C.

Bisulfite-specific PCR
All PCR reactions were performed in triplicate to reduce the probability of preferential amplification of one specific amplicon that would dominate the output. Nested PCR reactions were performed using the FastStart™ Taq DNA Polymerase, 5 U/μl (Roche Applied Science, Belgium, 12032953001). A volume containing theoretically at least ten bisulfite-treated HIV-1 copies (based on the droplet digital PCR measurements as in Malatinkova et al. [39]) was added to the PCR mix containing 10 × PCR buffer, 2.5 U polymerase, 400 nM forward and reverse primers, and 3% DMSO in a final volume of 25 μl. Each CpGI was amplified with one nested primer combination, and after a failed PCR reaction, the subsequent primer combination was used (Additional File 1). Amplicons were visualized using 3% agarose gel electrophoresis. Depending on the selected primer, we used an in-house optimized PCR amplification protocol or one of the two previously published protocols [13,37], as described in Additional File 1.
Table 1 notes: Values are reported as median (interquartile range). SRCV seroconverters, LTNP long-term non-progressors, PBMCs peripheral blood mononuclear cells, CA cell-associated, usRNA unspliced RNA, cART combination antiretroviral therapy, VL viral load. * Total and integrated HIV-1 DNA measurements are performed using different assays and the absolute copies are therefore not directly comparable. To measure integrated HIV-1 DNA, an Alu-HIV-1 qPCR is used whereas digital PCR is used to determine the total number of HIV-1 DNA copies.

Statistical analysis
HIV-1-specific amplicons with coverage > 250 were normalized and divided into tiles (blocks of the HIV-1 genome containing the region of interest (LTR or env)). Differential methylation analysis per region was performed using the methylKit package (version 1.6.3) in R (version 3.5.1) [49,50], including correction for overdispersion. P values were calculated using the Chi-square test and corrected for multiple testing within each comparison using the false discovery rate (FDR) [51,52]. Spearman rank correlation analysis was performed to explore correlations between DNA methylation (LTR and env) and patient characteristics (HIV-1 reservoir and immunological parameters, obtained from Malatinkova et al. [39]). For this purpose, the methylation data of both regions of every individual were summarized by calculating an M value over all CpGs using the formula described by Du et al. [53]. Using stepwise regression model selection, linear regression models were developed for LTR and env methylation densities to determine which independent variables may explain variable DNA methylation in both regions.
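The M value summarization can be sketched in Python as follows. The M value of Du et al. [53] is the log2 ratio of methylated to unmethylated signal with a small offset, and it relates to the methylation fraction (Beta value) as M = log2(Beta / (1 − Beta)). How the counts were pooled over CpGs in this study is not stated here, so pooling read counts across all CpGs of a region, the offset `alpha = 1`, and the function names are assumptions made for illustration only.

```python
import math


def m_value(methylated: float, unmethylated: float, alpha: float = 1.0) -> float:
    """M = log2((methylated + alpha) / (unmethylated + alpha)).
    alpha is an offset that avoids division by zero (assumed value)."""
    return math.log2((max(methylated, 0) + alpha) / (max(unmethylated, 0) + alpha))


def region_m_value(cpg_counts, alpha: float = 1.0) -> float:
    """Summarize a region from per-CpG (methylated, unmethylated) read
    counts by pooling the counts over all CpGs (an assumption)."""
    counts = list(cpg_counts)
    meth = sum(m for m, _ in counts)
    unmeth = sum(u for _, u in counts)
    return m_value(meth, unmeth, alpha)


def beta_to_m(beta: float) -> float:
    """Convert a methylation fraction (0 < beta < 1) to an M value."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))
```

An M value of 0 corresponds to 50% methylation; negative values indicate that unmethylated reads dominate, which would be the expected sign for the sparsely methylated LTR region.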

Results
In silico, in vitro, and in vivo HIV-1 DNA methylation assay development
Three hundred thirty-eight different nested primer combinations (assays) (13 LTR, 303 NCR, 1 ENV, and 21 ETR) were subjected to an in silico analysis using an adapted version of the bioinformatics tool developed by Rutsaert et al. [46] to estimate the complementarity to the Los Alamos National Laboratory database, resulting in 70 nested PCR assays (2 LTR, 46 NCR, 1 ENV, and 21 ETR, Fig. 2a). The performance of these assays was subsequently tested by PCR amplification in undiluted and diluted J-Lat 8.4 DNA (down to 100 infected cells/10⁶ cells), resulting in 36 assays (2 LTR, 15 NCR, 1 ENV, and 18 ETR) that were capable of generating PCR products at the lowest dilutions (Fig. 2a). After a final in silico analysis, a set of 9 primer combinations (2 LTR, 3 NCR, 1 ENV, and 3 ETR; Fig. 2 and Additional File 1) was selected.
These nine assays were used to determine the HIV-1 methylation profile of HIV-1-positive blood samples. The percentage of patients for whom the primer combinations generated PCR amplicons is listed in Table 2. These data follow the trend expected from the in silico analysis, namely that, for certain primer combinations, a certain percentage of HIV-1 sequences would not be detected in patients due to HIV-1 sequence variation. The difference between the expected and the actual amplification percentage was 7.85%, 1.57%, 10.58%, and 3.57% for LTR, NCR, ENV, and ETR, respectively (Table 2).

Correlations between HIV-1 methylation status and reservoir markers
During the explorative correlation analysis, negative correlations were found between the DNA methylation density in the LTR region and the duration of viral suppression (ρ = − 0.34; p = 0.020) and CD4+ T cell count at time of collection (ρ = − 0.27; p = 0.043) (Fig. 4a). However, we observed a significantly positive association between DNA methylation in the env region and the CD4+ T cell count (ρ = 0.40; p = 0.0045) and cART duration (ρ = 0.39; p = 0.0055) (Fig. 4a). Moreover, env methylation decreased with increasing VL levels (ρ = − 0.39; p = 0.0063) and higher CD4+ T cell nadir (ρ = − 0.33; p = 0.020) (Fig. 4a). Based on the linear regression models, the only variable that was independently associated with DNA methylation in the LTR was the duration of VL suppression. Three variables were independently associated with the env methylation: VL, CD4 nadir, and CD4 count at time of sampling (Fig. 4b).
Table 2 note: In silico analysis is based on the bioinformatics primer evaluation tool as described by Rutsaert et al. [46].
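For readers less familiar with the ρ values reported above, the following Python sketch shows how a Spearman rank correlation coefficient is obtained: both variables are replaced by their ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. This is a generic illustration rather than the analysis code of this study; the function names are hypothetical and no p value is calculated.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

Applied to a per-patient list of env M values and the matching list of viral loads, a negative return value would correspond to the inverse relationship described in the text.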

Discussion
The lack of consensus about the role of proviral DNA methylation in HIV-1 transcriptional regulation illustrates the need for a reliable and widely applicable methylation assessment method. In this study, we first described an in silico procedure to accurately predict the complementarity of PCR assays to the HIV LANL database, and an in vitro validation protocol to test the sensitivity of the designed assays. This procedure resulted in nine functional DNA methylation assays, designed against the four most common CpGIs of the HIV-1 provirus, which were subsequently used to characterize HIV-1 DNA methylation in a large, well-characterized patient cohort. The in silico analysis was predictive of the number of patient samples leading to successfully amplified PCR products (Table 2), indicating that this is an effective approach to prioritize testing of primer sets in the context of HIV-1 or other pathogens with high sequence variability. In addition, as shown in the study of Cortés-Rubio et al. [14], by using an NGS-based approach, our method fulfills the need to analyze a large number of proviruses for each patient when compared with the established Sanger sequencing-based methods [54]. Across our four patient cohorts, we found that the HIV-1 provirus had low amounts of DNA methylation in the promoter region (average 2.94%, IQR 0.19-5.5%) but substantially higher levels of intragenic (env) methylation (average 28.86%, IQR 8.73-39.44%). When comparing the differential methylation between the cohorts, only SRCV showed distinct methylation profiles, with increased LTR and decreased env methylation.
Similarly, if patients were divided based on their VL status (detectable VL (VL > 40 HIV-1 copies/ml plasma), comprising all SRCV and 6/17 LTNPs, vs. undetectable VL (VL < 40 HIV-1 copies/ml plasma), comprising ET, LT, and 11/17 LTNPs), individuals with a detectable VL had higher DNA methylation density in the HIV-1 LTR region and a lower density in the env region compared with those with an undetectable VL. These observations indicate that specific methylation profiles may be associated with in vivo HIV-1 transcriptional control and latency maintenance.
Indeed, since the involvement of DNA methylation in HIV-1 latency was first described in 1987 [55], it has been confirmed in HIV-1-infected cultured cells and latency models that promoter methylation density is associated with silencing stability: DNA methylation induction can initiate/stabilize HIV-1 latency, while methylation inhibitors such as 5-aza-2′-deoxycytidine (5-aza-CdR) cause HIV-1 reactivation and display clear synergistic effects with other latency reversing agents [11-13, 32-34, 36, 56-58]. These studies reported a major role of promoter DNA methylation in latency regulation, which was in line with the general concept of transcription regulation by DNA methylation: hypermethylation of the promoter region suppresses both basal promoter activity and responses to activating stimuli, and hypomethylation is a transcription mark [57]. However, DNA methylation studies on patient-derived samples have shown, with the exception of some LTNPs, the same trend as in our present observation: low levels of DNA methylation in the promoter region, even in patients suppressing VL successfully, which does not follow the predictions from the in vitro experiments [37,38]. It has been shown that DNA methylation behavior in cell lines is often drastically different from that of in vivo cells due to completely different epigenetic environments and immortalization, sometimes producing unreliable results in terms of predicting in vivo DNA methylation events [59,60].
Fig. 3 HIV-1 proviral DNA methylation comparison between patient cohorts. a Summary of the methylation data in the LTR region (CpGI LTR + CpGI NCR) using average methylation over all CpGs in the region. b Summary of the methylation data in the env region (CpGI ENV + CpGI ETR) using average methylation over all CpGs in the region. q = FDR-corrected p values for multiple testing. LT late treated, ET early treated, SRCV acute seroconverter, LTNP long-term non-progressor
Some studies, however, do show increasing LTR DNA methylation over time [13], or dynamic profiles in patients when measured longitudinally [14]. We could not confirm these data since we only measured single time point samples of patients with similar treatment time/time of virological control (except for the SRCV). The low abundance of DNA methylation in the promoter region of HIV-1 indicates that other (epigenetic) factors such as integration site epigenetics or cell type might be more important for transcriptional regulation than promoter methylation. In previous DNA methylation studies in HIV-1 patients, the focus was on promoter methylation assessment [13,14,32,36-38]. In contrast to promoter methylation, the role of intragenic DNA methylation in general transcriptional regulation is less clearly described [26][27][28][29][30]. Studies outside of the HIV-1 field have suggested that intragenic methylation could have a role in the activation of retroviruses, repetitive elements, alternative splicing, transcription initiation in canonical promoters of embryonic stem cells, and prevention of aberrant transcript production [28][29][30]. Moreover, intragenic methylation has been shown to be a robust predictor of gene transcription in genes with a CpGI-containing promoter [61]. In our study, the decreased env methylation levels in individuals with active ongoing replication (SRCVs) suggest that intragenic methylation increases in the case of proviral transcriptional silencing, leading to higher methylation in latently infected cells or in those in which viral replication is blocked. Indeed, cART-treated patients and LTNP have lower viral transcription (measured as cell-associated unspliced RNA (CA usRNA)) than SRCV (Table 1), and env methylation shows an inverse correlation with CA usRNA within the SRCV cohort (ρ = − 0.81; p = 0.014).
Furthermore, intragenic methylation did correlate positively with the CD4+ T cell count, linking high intragenic methylation with viral control. Intragenic methylation was also negatively associated with the VL, a measure that indicates ongoing replication.
In contrast to what was proposed by LaMere et al. [54], we have found no statistical difference between proviral methylation in LTNP with undetectable VL (latent infection) and treated patients (cART-induced suppression) (LTR: Δ = 0.85%, q = 0.74; env: Δ = 2.29%, q = 0.94). This could be due to the low number of LTNPs with undetectable VL.
In general, the lack of promoter DNA methylation in HIV-1 proviral genomes in vivo suggests that this modification is of subordinate importance in the regulation of the viral life cycle compared with the more abundant, yet less studied intragenic DNA methylation. Our observations indicate that intragenic DNA methylation could be a late event during infection. Methylation of the proviral genome may occur stochastically during years of viral control, yet act as a stable epigenetic mark once established. This may subsequently affect transcription, including splicing, of viral transcripts, which could affect viral replication by interaction with transcriptional elongation (tat) or export of viral RNA (rev). Nevertheless, additional in vitro and in vivo experiments targeting the (intragenic) DNA methylation are required to evaluate the exact impact on the HIV-1 life cycle. Temporal changes of intragenic methylation would be especially informative, yet our study was limited by the lack of longitudinal sampling. Other limitations include the fact that, although the cohort was much larger than in previous studies [13,14,32,36-38], the patient groups described here were not balanced in size, sex, or age. Additionally, we did not specifically select CD4+ T cells. The use of PBMCs could potentially mask differential methylation since it has been shown that LRAs have cell-type-specific effects, indicating cell-type-specific epigenetic profiles [62]. Moreover, due to its targeted nature, the methodology cannot provide information about integration site methylation or replication competence of the analyzed provirus. Finally, we did not provide information about the fifth CpGI (3′ LTR), nor did we analyze non-CpGI CpGs.

Conclusions
Altogether, our study illustrates the underestimation of the role of intragenic proviral DNA methylation in patient samples. Previous studies have mainly focused on LTR methylation and have interpreted LTR methylation as a transcriptional regulatory factor, ignoring any potential role of env methylation [13,35,38]. We suggest that both env and LTR methylation are involved in HIV-1 transcription regulation and that env methylation could be an important predictor of viral transcription in vivo. However, we also suggest that proviral promoter methylation is hindered/inhibited in all HIV-1-positive patients, especially those on cART, but that its density still influences viral transcription rate.
The exact functions of DNA methylation of these two regions should be clarified by performing additional experiments using longitudinal follow-up studies to monitor proviral DNA methylation dynamics within patients, starting early during infection, and ideally continuing over a period of multiple years of cART. Different CD4+ T cell types should be analyzed separately to avoid cell-type-dependent bias of the data. If HIV-1-positive patients were to undergo treatment interruption, DNA methylation profiles should also be monitored in order to understand the methylation dynamics during viral rebound. Moreover, proviral intragenic non-CpGI methylation analysis could also provide a better understanding of HIV-1 latency regulation by DNA methylation. Here, we provide a useful tool to help design these studies and estimate the sample size they need. Altogether, these insights should be of paramount importance when looking at the various strategies to control HIV-1 after discontinuation of cART and for the HIV-1 cure field.

Additional file 1. Primers and PCR experiments.
Additional file 2: Figure 1. Overview of patient cohorts included in this study. Patients are divided into four groups based on their disease state: early treated, late treated, Long-Term Non-Progressor and acute seroconverter. Arrows depict moment of sampling. PHI = Primary HIV-1 Infection; cART = combination Anti-Retroviral Therapy.