Transcriptional, epigenetic and metabolic signatures in cardiometabolic syndrome defined by extreme phenotypes

This work is aimed at improving the understanding of cardiometabolic syndrome pathophysiology and its relationship with thrombosis by generating a multi-omic disease signature. We combined classic plasma biochemistry and plasma biomarkers with the transcriptional and epigenetic characterisation of cell types involved in thrombosis, obtained from two extreme phenotype groups (morbidly obese and lipodystrophy) and lean individuals to identify the molecular mechanisms at play, highlighting patterns of abnormal activation in innate immune phagocytic cells. Our analyses showed that extreme phenotype groups could be distinguished from lean individuals, and from each other, across all data layers. The characterisation of the same obese group, 6 months after bariatric surgery, revealed the loss of the abnormal activation of innate immune cells previously observed. However, rather than reverting to the gene expression landscape of lean individuals, this occurred via the establishment of novel gene expression landscapes. NETosis and its control mechanisms emerge amongst the pathways that show an improvement after surgical intervention. We showed that the morbidly obese and lipodystrophy groups, despite some differences, shared a common cardiometabolic syndrome signature. We also showed that this could be used to discriminate, amongst the normal population, those individuals with a higher likelihood of presenting with the disease, even when not displaying the classic features.

Background Cardiovascular disease (CVD) is the primary cause of death worldwide (17.9 million deaths in 2016, 31% of all deaths) [1] accompanied by an ever increasing number of overweight and obese individuals, which place a burden of hundreds of billions of dollars on healthcare systems each year [2,3]. Cardiometabolic syndrome (CMS) increases both CVD and type 2 diabetes (T2D) risk [4]. CMS is a cluster of interrelated features including: obesity, dyslipidaemia, hyperglycaemia, hypertension and non-alcoholic fatty liver disease [5]. These features have overlapping components, including visceral fat deposition, high triglycerides, high low-density lipoprotein (LDL)-cholesterol, high fasting blood glucose, hypertension, decreased high-density lipoprotein (HDL)-cholesterol and low-grade chronic inflammation [6][7][8]. The therapeutic approaches aim to mitigate these features and include: weight loss strategies [9], lipid lowering drugs [10], antiplatelet therapies [11], glucose lowering [12,13] and anti-inflammatory drugs [14]. The relationship between cardiometabolic health and body weight is complex [15]. CVD risk varies amongst individuals of similar body mass index (BMI) depending on adipose tissue (AT) distribution and functionality [16][17][18][19][20]. AT acts as an active endocrine organ [21,22] and, when dysfunctional, plays a major role in metabolic disorders inducing peripheral insulin resistance, contributing to low-grade chronic inflammation [23].
While the participation of platelets and neutrophils in thrombosis, and that of macrophages in atherosclerotic plaque formation, are well established [24][25][26], the role of these cell types in atherogenesis and CVD onset has been appreciated only recently [27]. Additionally, prolonged exposure to low-grade inflammation is known to modify the functional phenotype of monocytes (an effect named trained immunity [28]), platelets [29,30] and neutrophils [31,32]. The molecular characterisation of these phenotypic changes remains incomplete, motivating the need for extended molecular phenotyping of these cells performed here. Previous multi-omics studies in blood cells have identified pathways involved in CVD and obesity, and confirmed whole blood as a source of surrogate biomarkers able to delineate the metabolic status [33]. Several risk score algorithms have been developed to predict the risk of complications associated with obesity [34][35][36][37][38][39]. However, a number of questions still remain open. CVD may also occur in the absence of other comorbidities, and certain events have a better clinical outcome in overweight and obese patients compared with their leaner counterparts (the so-called obesity paradox) [40]. We speculated that extreme phenotypes could be used to determine disease signatures, including new features, that are informative of disease aetiology in the general population.
Here, we present the molecular characterisation of the transcriptional (RNA sequencing, RNA-Seq) and epigenetic (histone 3 lysine 27 acetylation, H3K27ac; reduced representation bisulfite sequencing, RRBS, and Illumina HumanMethylation450 BeadChip) changes in neutrophils, monocytes, macrophages and platelets in morbidly obese (BMI > 40 kg/m 2 ; no obvious genetic cause [41]) and in familiar partial lipodystrophy type 2 (hereafter lipodystrophy; causal mutations in PPARG or LMNA genes, as verified by whole genome sequence [41,42]) individuals. We also investigated the reversibility of these molecular changes in the obese group after bariatric surgery. We found that proinflammatory gene expression programmes were downregulated, alongside more modest differences in regulatory elements usage and almost no differences in DNA methylation profiles. Altogether, the data indicate a reduced ability of these cells to be activated and undergo extracellular traps (NETosis), which was further confirmed by neutrophil and platelet cell functional assays, which showed a reduced ability to adhere, the key initial step during their activation. Lastly, we indeed identified the molecular signatures for CMS and devised a penalised logistic regression approach to stratify individuals in the general population based on their CMS risk.
Compared to the other groups, the lipodystrophy group had elevated GLC, TC, TG, ALT, AST, insulin (and consequently HOMA-IR and AT-IR), whereas HDL-C and LDL-C were decreased. Instead, the obese group, compared to the other 3, had elevated LAR, LDL-C and hsCRP.
To visualise how these parameters separate obese, lipodystrophy and BD, we performed a principal component analysis (PCA), which showed that obese, lipodystrophy and BD groups were distributed over distinct, albeit partially overlapping, dimensions (Fig. 1a). The first two components (PC1 and PC2) were sufficient to distinguish the different groups (Obese versus Lipodystrophy: p value = 0.002; Obese versus BD: p value < 2.2e−16; Lipodystrophy versus BD: p value < 2.2e−16; Hotelling's T-squared test with F distribution). Lipodystrophy and obese were separated from BD along PC1, while they were separated from each other along PC2. Loading and contribution analysis (Fig. 1b) showed that the main contributors to the separation along PC1 were BW, LAR, hsCRP, AST, ALT, GLC, AT-IR, HOMA-IR and TG.
Additionally, BW, LAR, hsCRP separated the obese from the lipodystrophy groups in one direction along PC2, while AST, ALT, GLC, AT-IR, HOMA-IR and TG separated them in the opposite direction. We further characterised the differences between the obese and lipodystrophy groups by investigating plasma metabolites, whose levels are known to be influenced by both extreme phenotypes [44][45][46][47]. We identified and quantified 988 plasma metabolite species (Metabolon ⓡ -; METHODS)), and we performed a weighted gene coexpression network consensus analysis (WGCNA) [48] to identify groups of metabolites whose levels were correlated across samples. To recognise shared features and to reach the sample size as recommended for such analysis [48] the obese and lipodystrophy group were analysed together. This analysis identified 16 clusters of metabolites (named modules, M1 to M16; Table S2). To determine the relationship between modules, anthropometric traits and plasma biochemistry, we investigated whether any correlation existed. Of the 208 tested associations, we found that 11 modules showed significant associations with BW, LAR, TG, HDL-C, LDL-C, ALT and AST in the extreme phenotype groups (FDR adjusted Fisher p values < 0.05; Fig. 1c), while no associations were found in the BD cohort (not shown). To pinpoint which modules were associated with each of the two extreme phenotype groups, we analysed the modules eigenmetabolite adjacencies (Fig. 1d). The module's adjacencies formed different clusters, C1 and C2 were found using extreme phenotype groups, C3 and C4 using BD  . Each bar is colour coded to represent the different cell types as in (a). b represents results when comparing lean-Control and obese individuals. c represents results when comparing lean-Control individuals and lipodystrophy patients. d Functional GO term annotation of upregulated genes when comparing lean-control versus obese group (top) and lean-control individuals versus lipodystrophy group (bottom), colour coded by cell types as above. The numbers near each dot indicate, from left to right: number of submitted genes, number of genes overlapping with the category and number of genes in the category samples (Additional file 2: Fig. S2A). Plotting the average eigen-metabolite value for each cluster (Fig. 1e) we showed that C1 and C2 represented the obese and lipodystrophy groups, respectively, whereas clusters C3 and C4 could not discriminate between obese and lipodystrophy (Additional file 2: Fig. S2B). C1 metabolites were significantly enriched in alanine, aspartate and glutamate metabolism, phenylalanine metabolism, nitrogen metabolism and TCA cycle, whereas C2 metabolites we found glycine, serine and threonine metabolism and cysteine and methionine metabolism pathways (Table S2). Our analysis demonstrated that the two extreme phenotype groups could be identified by their metabolic signatures, associated with clinical parameters, which also set them apart from the general population represented by BD.

Extreme phenotypes influence innate immune cell types and platelets transcriptional and epigenetic signatures
Next, we determined the influence of the changes in plasma on neutrophils, monocytes, macrophages and platelets, as these are some of the key players in atherogenesis and thrombus formation [49] (Fig. 2a and Additional file 1: Fig. S1B). We compared gene expression (RNA sequencing), active chromatin (histone 3 lysine 27 acetylation distribution) and DNA methylation (reduced representation bisulfite sequencing and Illumina arrays) in lean-Control and extreme phenotype groups. For each assay we performed the following comparisons: lean-Control versus obese, lean-Control versus lipodystrophy and obese versus lipodystrophy ( Fig. 2a and Table S7). For each comparison we identified differentially expressed genes (DEG ; Table S8-S11), differentially acetylated regions (DAcR; Table S12-S14) and differentially methylated CpG islands (Table S15-S17) at 5% FDR.
Cell type-specific functional annotation by gene ontology (GO) terms enrichment analysis for the DEG between the lean-Control and obese groups (Fig. 2d) found an enrichment for GO terms related to interferon alpha/beta signalling pathway, as well as focal adhesion in DEG upregulated in macrophages (Table S18). In monocytes, upregulated DEG were enriched for GO terms related to inflammatory response and downregulated DEG were enriched in GO terms related to programmed cell death and ion homeostasis (Table S19). In neutrophils, downregulated DEG were enriched for genes responding to antithrombotic drugs (Table S20). In the comparison between the lean-Control and lipodystrophy groups (Fig. 2d), macrophages upregulated DEG were enriched in GO terms related to cholesterol biosynthesis and immune response activation. In monocytes and neutrophils, upregulated DEG were enriched in terms related to interferon and immune responses. To determine whether these transcriptomic and epigenetic changes are reversible after exposures removal, a second blood sample was taken from the same obese individuals 6 months after bariatric surgery, and the same assays were performed.

Effect of bariatric surgery on transcriptional and epigenetic landscapes, and cell functions
Bariatric surgery is effective in the management of extreme obesity and associated comorbidities, including CMS risk [55], with well-established long-term benefits on weight loss, diabetes, hypertension and dyslipidaemia [56]. While the effect of this intervention has already been reported [57,58], little is known about the underlying molecular mechanisms. Because we sampled the same individuals, robust pairwise comparisons could be used. In plasma biochemistry we observed a decrease for LAR, TG, hsCRP, AT-IR and AST and an increase of HDL-C (p values: 7.22*10 -6 , 2.63*10 -9 , 4.98*10 -4 , 2.51*10 -2 , 1.48*10 -3 and 1.86*10 -3 , respectively; conditional multiple logistic regression, adjusted for age and sex; Fig. 3a; Table S1). Transcriptional and epigenetic paired analyses (Fig. 3b) identified DEG in macrophages (599), monocytes (1931), neutrophils (2571) and platelets (2883 ; Table S8-S11), DAcR in monocytes (229) and neutrophils (788; Table S13-S14) and differentially  (201), monocytes [48] and neutrophils (198; Table S15-S17). DEG GO terms enrichment identified amongst the upregulated pathways: ribosome formation, metabolism of amino acid and proteins, several immune-related pathways and cytoplasm translation and amongst the downregulated pathways: cholesterol metabolic process (through SREBF and miR33 [59]) and mRNA processing pathways (Table S18-S21). We found genes whose expression was reduced in obese, to revert to the levels observed in the lean-Control group: nine in macrophages . These indicate that lipoprotein metabolism (LDL-R), fatty acid synthesis (FASN) and cholesterol transport (STARD4) are restored after surgery. We also found two genes in macrophages (SNHG5, EVI2A; overlap p value = 0.03) and three in monocytes (XXbac-BPG32J3.22, MEIS2, MS4A14; overlap p value = 0.03) that move in the directions. However, the majority of DEG either did not revert to the values observed in lean-Control individuals or were not differentially expressed in the comparison between the obese and lean-Control groups.
The effects of bariatric surgery at organism level were monitored with plasma proteomics. We quantified 3098 plasma proteins; 604 of which were found to be differentially abundant (DAP; Fig. 3c and Table S24) above ordinal Q-value of 1*10 -3 . Proteins whose levels increased after bariatric surgery (n = 72) were enriched in GO terms related to tight junction and WNT, PI3K/AKT, sphingolipid signalling pathways. Proteins whose abundance decreased after surgery (n = 532) were enriched in the following GO terms: cell cycle and DNA repair, ribosomal RNA metabolism and cell senescence, phagocytosis and T cell receptor signalling as well as FGF, IL2, VEGF and insulin signalling pathways (Table S25). Amongst these we also found NLRP3, a critical mediator of inflammation [61] and several histones, normally released by cells undergoing apoptosis and NETosis [62]. No changes in full blood count that could explain these changes were observed. We only noted an increase in mean platelet volume (MPV; p value = 0.03; paired t test) and a reduction of lymphocytes (p value = 0.03) and eosinophils (p value = 0.03; Table S1) counts. The plasma proteomic results showed that the changes after bariatric surgery were not limited to immune cells. To determine whether any of them could be ascribed to a specific tissue, we determined which genes were tissuespecific, using the GTEx project database [63] (Table S26; METHODS). Tibial, coronary and aortic arteries, heart atrial appendage, heart left ventricle and blood displayed an enrichment for tissue-specific genes amongst DAP (p values: 1.6*10 -2 , 8*10 -3 , 2*10 -2 , 1.8*10 -2 , 1.6*10 -2 and 5*10 -2 , respectively; hyper-geometric test; Table S26). Of the 13 blood-specific genes encoding a DAP, six were also differentially expressed in at least one of the cell types (Fig. 3d). These six genes have roles in immune response and leptin resistance [64], immune pathways [65], neutrophils recruitment during thrombosis [66] and macrophage differentiation and inflammatory response [67]. Furthermore, monocytes and macrophages data allowed us to explore the effect of bariatric surgery on trained immunity [53], which has been shown to play a role in atherosclerosis [68,69]. Genes displaying an active promoter (H3K4me3), with or without β-glucan treatment, significantly overlapped with DEG in the obese versus post-surgery comparison (p value = 4.5*10 -2 and p value = 7.7*10 -3 ; Table S27).
To determine the impact of the changes observed at molecular levels on the functional phenotypes of these cell types, we performed functional tests on neutrophils and platelets. After bariatric surgery, neutrophils showed a reduction in their ability to adhere both when unstimulated ( Fig. 3e), as well as, when subjected to a variety of stimuli (DTT, LBP, PAM3, PAF and fMLP; Fig. 3f ), but not when treated with TNF alpha or PMA. These results were accompanied by a reduction in the cell surface levels of CD16 and CD32, but not CD66b, CD63, CD62L or CD11b (paired t test, all result in Table S28). Alongside, we also performed platelet functional tests, which showed a reduction in P-selectin surface exposure upon collagen stimulation, but not upon ADP or thrombin stimulation. These results were accompanied by a reduction in the cell surface levels of fibrinogen receptor (CD61 and CD41b) and CD36, the thrombospondin receptor that acts as scavenger for oxidised LDL. No changes were observed for CD49b, CD42a, CD42b, CD29 and CD9 (paired t test, all results in Table S28).
Data collected across several layers of evidence suggested a diminished ability of the cells to use neutrophil extracellular traps (NETosis) [70] after bariatric surgery (Fig. 4). NETs are formed by chromatin (DNA and histones), granular antimicrobial proteins and cytoplasmic proteins, and are normally found at low levels in the circulation [71]; however, in the presence of pathogens or sterile inflammation, such as the increase of reactive oxygen species observed in obese individuals [72], NETs levels are increased. We observed decrease plasma levels of NLRP3 a critical mediator of the inflammasome [61], RAC2 a protein directly involved in NETs promotion [73], MYO1G, a protein-promoting immune cells interaction [74] and several histones, core component of the chromatin released during NETosis [74] (all in Fig. 3c). Additionally, the upregulated genes in obese individuals indicated increased activity of neutrophils and monocytes (Table S19-S20).
Lastly, we also observed a decrease in the ability of neutrophils to adhere, alongside changes in their surface protein levels, such as CD16 and CDE32, also previously associated with NETosis [75] (Table S28). In addition, genes associated with DAcR in neutrophils post-surgery showed enrichment in the T cell receptor signalling pathway, in particular Th17 cell differentiation (Table S22), which suggested a restored ability for neutrophils to activate T cell through NETosis.
The results of the comparisons between lipodystrophy and post-bariatric surgery and post-bariatric surgery and lean-Control (Additional file 3: Fig. S3) are available in Tables S8 to S17.

Multi-omic signature classification of extreme phenotypes
Six lean (METHODS; hereafter named "Lean-BD"; Additional file 1: Fig. S1A) and six obese individuals, for which we had complete measurements on all layers, in monocytes and in neutrophils, were used to define training sets (Additional file 1: Fig. S1C). Since multivariable selection approaches have provided an effective means to integrate multiple omics layers and elucidate disease signatures [76,77], we applied elastic-net-penalised logistic regression [78] to identify signatures associated with an increased probability of belonging to the obese group and therefore to have some or all features associated with CMS (Fig. 5a). We performed this analysis independently for each data layer (METHODS). The variables selected into each signature defined patterns characterising the groups ( Fig. 5b; Table S29-31), and the biometric variables were then used to construct multivariable logistic regression models, using either the variables selected from a single data layer or all selected variables, across omic layers (METHODS). All models, single-layer or multi-layer trained, allowed us to rank individuals according to their probability of belonging to the obese group ( Fig. 5c and Additional file 4: Fig.   S4A). We quantified the log loss [79] (or cross-entropy loss; Fig. 5c and Additional file 4: Fig. S4A) and demonstrated that the multi-layer model provided the greatest separation, followed by the models trained on the RNA-seq, then those trained on metabolites and methylation. Qualitatively similar results were obtained when using the lean-BD and  Prioritised lipids Fenland cohort BDcohort * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * * * * * * * * ** * * * * * * * * * * * * * * * * * * * * * ** * * * * * * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Log loss: 0.57 Heatmaps showing the mean of the Z-score distribution for each group, for all features selected in each layer. c Plots showing individuals ranked by their predicted probability of belonging to the obese group, using models trained using data from individual layers, as well as a multi-layer predictive model (as indicated by the plot titles). Plots are ordered by decreasing log loss, with smaller values corresponding to better discrimination of individuals in the extreme phenotype group from all other individuals. d Heatmap showing age and sex adjusted association values between (left) eight prioritised lipid species and risk factors measured in the Fenland and present cohorts; and (right) a negative control set of five unselected lipid species and the same risk factors. Black asterisks indicate significant associations after correcting for multitesting lipodystrophy individuals as training data (Fig S4B), suggesting that individuals belonging to the extreme phenotype groups tend to be more similar to one another, when taking into account all of the data layers, than to individuals outside these groups. We moreover noted that amongst the 20 BD predicted by the multi-layer model to belong to the obese group with highest probability, 4 were in the lowest quartile for weight and 8 were in the lowest quartile for LDL (Table S32), demonstrating that our multi-omic model is not simply recapitulating features of classic CMS presentation. External cohorts with similar data layers will be required to establish the predictive utility of our models and to fully validate the omic signatures identified in this study. However, because lipidomic data from external cohorts were available, we focused on the validation of the lipidomic signature. We prioritised a reduced set of nine lipid species from the signature to test for univariate association with known CMS risk factors (METHODS, Fig. 5d), including eight matched lipid species measured in a subset of 1507 participants in the Fenland study [80]. After correcting for multiple testing, 61% (225/368) of associations remained significant. Triacylglycerol 52:2 and 50:1 were positively associated with several risks factors (fasting plasma glucose, fasting insulin level, HOMA-IR, a fatty liver index, HbA1c, leptin, LDL-C, hsCRP, TG, BMI, fat mass, ALT, and ferritin; Table S33) and inversely associated with adiponectin and HDL-C. Phosphatidylcholine (40:7), (38:7), (38:6), (35:2) and O (36:2) were inversely associated with all factors except for adiponectin and HDL-C. Further supporting our findings, phosphatidylcholine (38:6 and 36:2) had previously been identified in obesity studies [81]; and triacylglycerol (50:1 and 52:2) had previously been linked to NAFLD [80] and NASH [82]. To assess the specificity of the results, we repeated the analysis with five lipid species randomly selected (METHODS) from those not included in the signature. Only 21% of associations were significant (49 out of 230 tests). The same pattern of associations was also found in our study ( Fig. 5d; Table S33), as well as, in a biopsy-confirmed non-alcoholic steatohepatitis (NASH) cohort comprising 73 individuals [82] (Additional file 5: Fig.  S5; Table S33). We showed the diagnostic value of the prioritised lipid species through their association with major cardiometabolic risk factors in the Fenland study and in the present study; as well as, albeit not significantly due to the small sample size, in the NASH cohort.

Discussion
Our overarching goal was to develop an integrative multiomic strategy to obtain a signature for CMS that would combine information collected across different -omics layers in order to account for the impact that genetic and environmental differences have on each of them. To obtain a CMS signature we generated data from extreme metabolic phenotype groups (morbidly obese referred for bariatric surgery and individuals affected by lipodystrophy) because we hypothesised that the signals would be stronger across all layers. We then used this signature to determine the cardiometabolic status of a group of individuals (BD) that are at increased risk of developing CMS due to age.
First, we explored the data to determine the similarities and differences between the two extreme phenotype groups. The comparison of anthropometric traits and the different blood plasma biochemistry assays showed that the separation of the extreme phenotype groups from the general population, represented by BD, is driven by BW, LAR, hsCRP, AT-IP, HOMA-IR, TG, GLC, ALT and AST. The directionality of the same parameters also separates obese from lipodystrophy, suggesting that while AT dysfunction is a shared feature, its influences were different in the two groups (Fig. 1a, b). Additional differences were observed in the Metabolon ⓡ assay data, where different groups of metabolites (modules) were associated with plasma biochemistry assays (Fig. 1c) and formed two larger clusters highlighting the role of the metabolism of different amino acids in separating the two conditions (Fig. 1a, b). Of note, these were not the branched chain amino acids which were found to be part of an obesity signature [83].
At the transcriptional (RNA-seq) and epigenetic levels (H3K27ac and DNA methylation), we observed modest changes in all comparisons (Fig. 2b, c and Additional file 3: Fig. S3B), whereas larger differences have been previously observed other tissues [50,51]. The largest number of changes was found in active chromatin (3616 DAcR) in the comparison between macrophages of the obese and lipodystrophy groups (Additional file 3: Fig. S3B) and was not accompanied by nearly as many changes in gene expression. This indicates that either similar transcriptional outputs were achieved using different regulatory landscapes [52], or that these cells have been differently primed to respond to acute stimuli [84]. These results were in agreement with the absence of overlaps between DEG, both in the lean versus obese and lean versus lipodystrophy comparisons, and genes previously associated with trained immunity [53]. The enrichment in GO terms linked to inflammation and inflammatory response in the DEG reflects the environment to which these cells are exposed. These findings expand on previous observations made in whole blood [54]. With the above exception in macrophages, these data showed that the two extreme phenotype groups were overall, as expected, more similar to each other than to the lean group, again reflecting the underlying AT dysfunction.
Because the morbidly obese individuals were selected amongst those referred for bariatric surgery, we had the opportunity to determine the effects of the procedure and the associated weight loss. Post-surgery we observed, not only the expected changes in plasma biochemistry parameters (Fig. 3a), but also changes in gene expression and chromatin modifications of a larger magnitude (Fig. 3b, c) than those observed when comparing obese and lean individuals. We found genes (LDL-R, FASN, STARD4) that reverted to the expression level observed in lean individuals, indicating that lipoprotein metabolism, fatty acid synthesis and cholesterol transport are restored after surgery. However, the majority of DEG observed in these four cell types were not highlighted in the comparison between obese and lean individuals and no enrichment was found for GO terms related to inflammation. This suggests that the reduction in inflammatory signatures after bariatric surgery was achieved with the establishment, at least in the time frame investigated, of novel gene expression landscapes.
A few notable observations can be made from these analyses: (1) the overall small number of observed changes in DNA methylation, together with the short life span of the hematopoietic cells analysed, indicated that the change in exposure had little effect on the hematopoietic stem cell epigenome, and that the effects observed in animal models [60] were either species specific or diluted and then lost with the turnover of the hematopoietic progenitor pool in 6 months. (2) Bariatric surgery had a positive impact on innate immune cells and indicated that trained immunity acts downstream of the hematopoietic stem cell pool; with its effects being diluted and eventually lost with the pool renewal. (3) The integration of gene expression and proteomic data showed that some of the changes at transcriptional level were directly involved in the reduction of the proinflammatory environment, but also highlighted a conspicuous involvement of other intermediate levels of regulation. (4) Several pathways involved in thrombus formation, including NETosis, have reduced ability to respond to stimuli after bariatric surgery, possibly explaining some of the beneficial effects of this procedure.
Given the limited number of changes identified using single-layer univariate comparisons, we sought to identify multivariable signatures by combining information across layers, to better discriminate between the extreme phenotype and lean groups. We found that the rank assigned to each participant was different depending on the -omics layer considered. Aside from the full model, the rank provided by the gene expression layer had the highest discriminatory potential and DNA methylation the lowest, as judged by the position of the lipodystrophy individuals on the model trained on obesity. Indeed, despite the differences between the two groups, lipodystrophy individuals had the highest probabilities of belonging to the obese group, with 8 out of 10 lipodystrophy individuals predicted to belong to the obese group in the complete model. These results indicated that our initial hypothesis that the extreme groups would share a CMS signature, and that different layers would capture its different components, was correct. Some differences were observed when comparing the models trained using the lipodystrophy individuals, these likely reflect the higher homogeneity in the latter, due to the high penetrance genetic causes. The strengths of the multi-omic approach have been extensively discussed in a separate manuscript [123], in which simulation studies to assess precision and recall under a variety of different settings were performed, and in which we compared our chosen method to alternative approaches. The approach presented here is to be preferred if we wish to select as many relevant predictors as possible (i.e. high sensitivity/ recall, low false negative rate), while also achieving good predictive (classification) performance. This approach is therefore highly suitable for hypothesis generation settings, in which we wish to provide many potentially relevant predictors. We are also aware of the limitations of the results we presented. Due to the limited sample size it is possible that using a larger number of participants some of the features we identified might not be significant while others might be pushed above the significance threshold. Moreover, we note the absence of other similar cohorts for a complete independent replication of our results. However, we have shown that the lipids signature is informative of disease features in two independent cohorts, with similar results.

Conclusions
Our overall goal was to develop an integrative multi-omic strategy to combine information collected across different -omics layers in order to account for the impact of genetic and environmental differences on each of them. We generated data from extreme metabolic phenotype groups to obtain a signature for CMS and then used this signature to determine the cardiometabolic status of a group of individuals (BD) that, due to age, are at increased risk of developing CMS. Substantial annotations in our analysis identified the reduction of inflammation and the reduction of the ability to form extracellular traps as key consequences of bariatric surgery in innate immune cells and platelets. Further investigation of the molecular basis underlying the priming of these innate immune cells will help to understand which features, such as small molecules or metabolites, promote abnormal inflammation and extracellular traps formation, providing possible avenues for future clinical treatments. Seyres et al. Clinical Epigenetics (2022) 14:39

Cell types isolation
Whole blood (50 ml) in citrate tubes was obtained after informed consent. Platelet-rich plasma (PRP) was separated from the cellular fraction by centrifugation (20' , 150 g and very gentle break) for platelet isolation. Platelets were then isolated from PRP after 2 more spins as above and leukodepleted using anti-CD45 Dynabeads (Thermofisher) following the manufacturer's instructions. Purified platelets were stored in TRIzol (Invitrogen) until RNA extraction. The remaining cells were resuspended in buffer 1 and separated on a Percoll gradient. Neutrophils were harvested from the red blood cell pellet after red cell lysis (4.15 g NH4Cl, 0.5 g KHCO3 and 18.5 mg EDTA (triplex III, 0.01%) in 500 ml of water) and aliquots prepared for RNA extraction (TRIzol), DNA extraction for RRBS (snap-frozen pellet) and ChIP-Seq (formaldehyde fixation, see below). Monocytes were isolated from the peripheral blood mononuclear cell (PBMC) layer by CD14 positive selection (Miltenyi) and aliquots prepared for RNA extraction (TRIzol), DNA extraction for RRBS (snap-frozen pellet) and ChIP-Seq (formaldehyde fixation, see below). Macrophages were cultured by plating 14*106 PBMC resuspended in 2 ml macrophage media (Macrophage-SFM [with L-Glutamine without Antibiotics], Fisher Scientific UK LTD). After 1 h, 30' non-adherent cells were removed and 1 ml fresh macrophage media added together with 400 μl of autologous serum. Culture media were replaced after 3 or 4 days. On day 7 cells were harvested for RNA extraction (TRIzol), DNA extraction for RRBS (snap-frozen pellet) and ChIP-Seq (formaldehyde fixation). Samples whose purity was below 90% were discarded. BD samples isolation has been extensively described in Chen et al. [43].

RNA extraction
RNA extraction from samples stored in TRIzol was performed following the manufacturer's instructions. Briefly, tubes were retrieved in small batches and thawed on ice. Prior to extraction samples were vortexed for 30″ to ensure complete lysis and let for 5' at room temperature. Samples were then transferred to heavy phase lock tubes (5prime) to separate RNA in the aqueous phase from the organic phase. RNA was precipitated from the former with isopropanol and glycogen. The RNA pellet was resuspended in RNase free water. Purified RNA was stored in single-use aliquots. Each sample was quality controlled by a Bioanalayser (Agilent) and quantified via Qubit (Thermofisher).

Library preparation and sequencing
For cell types isolated from obese and lipodystrophy patients and controls we used 100 ng of total RNA for neutrophils, monocytes and macrophages and 200 ng for platelets. o libraries were prepared for sequencing using the Kapa stranded RNA-Seq kit with riboerase (Roche) according to the manufacturer's instructions and sequenced 150 bp paired end on Illumina HiSeq 2500 or Illumina HiSeq 4000. BD RNA-Seq data (extensively described in Chen et al. [43]) were retrieved from European Genome-phenome Archive (EGA)-EMBL-EBI after application to the Data Access Committee.
Transcript-level abundance was estimated using Kallisto (v0.42) [85] with 100 bootstrap iterations in single-end mode for extreme phenotype samples in order to minimise technical batch effect with BD cohort. Transcript abundances were then summarised to gene level with Tximport R package (v1.9) [86] by using tximport function and Ensembl reference transcriptome (Ensembl Genes 96) [86,87]. This step provides an input count matrix for DESeq2 (v.1.21.21) [88]. DESeq2 was used to normalise counts by library size and transformed by variance stabilisation (VST). We corrected for sequencing batch effects by using Combat (from sva R package (v.3.29.1)) [88,89] and individual status as covariate. Non-autosomal genes and those with no or low variance across individuals were removed. The final gene sets (including coding and non-coding genes) were formed of 10,925 genes for monocytes and of 26,634 for neutrophils. Quality metrics are reported in Table S4.

Differential analysis
For differential analysis, transcript-level abundance was estimated by Kallisto with 100 bootstrap iterations in paired-end mode for each group (obese, post-surgery, lipodystrophy patients and control individuals) using Ensembl reference transcriptome (Ensembl Genes 96). Transcript abundances were then summarised to gene level with Tximport R package (v1.9) by using tximport function and DESeq2 object was created using DESeq-DataSetFromTximport function from DESeq2 R package (v. 1.21.21). Differential analysis was performed using the DESeq function from DESeq2 and we used age and gender as covariates. Log fold changes were corrected with the lfcShrink function from DESeq2. Genes with FDR < 5% were marked as differentially expressed. Lean individuals were selected from control group, and named hereafter "Lean-Control", by applying the following criteria: BMI < 25, glycaemia (GLUC) < 5.4 mmol/L, TG < 1.7 mmol/L, LDL < 2.59 mmol/L, HDL > 1 mmol/L for men and > 1. 3  and therefore performed a paired analysis by adding relationship information as covariate in the design formula.

Chromatin immunoprecipitation sequencing Sample preparation
Cells were fixed immediately after purification with 1% w/v formaldehyde for 10 min and quenched using 125 mM glycine before washing with PBS. Samples were sonicated using a Bioruptor (Diagenode), final SDS concentration of 0.1% w/v for 9 cycles of 30 s 'on' and 30 s 'off' , and immunoprecipitated using an IP-Star Compact Automated System (Diagenode) using the histone H3K27ac antibody C15410196 (lot 1723-0041D) Diagenode. Immunoprecipitated and input DNA were reverse cross-linked (65 °C for 4 h), treated with RNase and Proteinase K (65 °C for 30 min).
A combination of quality metrics was used to assess sample quality: number of uniquely mapped reads, number of called peaks, NSC (normalised strand crosscorrelation) and RSC (relative strand cross-correlation) computed with Phantompeakqualtools (v.1.2) [94,95], area under the curve (AUC), X-intercept and Elbow Point computed with plotFingerPrint function from deepTools suite (v.3.0.2) [96] with -skipZeros -numberOfSamples 50000 options. Peaks were called with MACS2 (v.2.1.1) with -nomodel -shift -100 -extsize 200, a qvalue threshold of 1e-3 options and celltype matching input file scaled to sample read number. We used the MACS2 randsample function to downscale inputs. We then computed a score by summing values obtained for each range of these metrics. We applied a threshold of -3 (total) to select the best quality data. To build the ChIP-Seq layer for integrative analysis, we defined a master set of peaks and quantified H3K27ac ChIP-Seq signals under these peaks. Peaks shared by at least 5 individuals were merged using R package DiffBind (v2.9) [97]. We obtained 67,763 and 49,188 peaks for monocytes and neutrophils, respectively. Minimum merged peak size was 244 bp and 235 bp, median peak size 1392 bp and 1648 bp and maximum peak size 75,534 bp and 60,528 bp for monocytes and neutrophils, respectively. We did not filter out very large merged peaks as they represent less than 3% of total peaks and indicate large acetylated regions. Read counts under merged peaks were TMM normalised using effective library size and logit transformed into count per million (CPM). Sequencing centre batch effect was corrected with Combat (from sva R package (v.3.29.1)) using individual status (Patient/Donor) as covariate. Non-autosomal and no or low variance peaks across individuals were removed. The final master set of peaks counted 25,595 regions in monocytes and 26,300 regions in neutrophils. Quality metrics are reported in Table S3. Seyres et al. Clinical Epigenetics (2022)  To retain even the smallest fragments and to minimise the loss of material, end preparation and adaptor ligation were performed in a singletube setup. End fill-in and A-tailing were performed by addition of Klenow Fragment 3' -> 5' exo-(Enzymatics) and dNTP mix (10 mM dATP, 1 mM dCTP, 1 mM dGTP New England Biolabs). After ligation to methylated Illumina TruSeq LT v2 adaptors using T4 DNA Ligase rapid (Enzymatics), the libraries were size selected by performing a 0.75 × clean-up with AMPure XP beads (Beckman Coulter). The libraries were pooled based on qPCR data and subjected to bisulfite conversion using the EZ DNA Methylation Direct Kit (Zymo Research) with changes to the manufacturer's protocol: conversion reagent was used at 0.9 × concentration, incubation performed for 20 cycles of 1 min at 95 °C, 10 min at 60 °C and the desulphonation time was extended to 30 min. These changes increase the number of CpG dinucleotides covered, by reducing double-strand break formation in larger library fragments. Bisulfite-converted libraries were enriched KAPA HiFi HS Uracil + RM (Roche). The minimum number of enrichment cycles was estimated based on a qPCR experiment. After a 1 × AMPure XP clean-up, library concentrations were quantified with the Qubit Fluorometric Quantitation system (Life Technologies) and the size distribution was assessed using the Bioanalyzer High Sensitivity DNA Kit (Agilent).

Differential analysis
For differential analysis, we used DiffBind with the builtin DESeq2 method for statistical analysis. We merged peaks present in at least 50% of individuals and asked that all individuals have a FRiP value (Fraction of Reads in Peaks) over 5%. We then applied a FDR threshold of 5% to select H3K27ac peaks differentially acetylated peaks. We used age and gender as covariates. For obese versus post-surgery comparison, we considered only paired samples and therefore performed a paired analysis by using the block factor in DESeq2. Differentially acetylated regions (DAcR) were annotated with HOMER (v.4.10) [98], annotatePeak function and Hg38 RefSeq genome annotation (http:// homer. ucsd. edu/ homer/ data/ genom es/ hg38. v6.0. zip).
Functional annotation was performed on genes within a window of 10 kb around each DAcR, taking into account fold change direction. Similarly to RNA-Seq, lists of genes were submitted to EnrichR interrogating the same databases. Annotation results are available in Table S22.

Illumina 450 K arrays and reduced representation bisulfite sequencing (RRBS) Arrays and libraries preparation and sequencing
BD Infinium Human Methylation 450 arrays (Illumina) were retrieved from the European Genome-phenome Archive (EGA)-EMBL-EBI. DNA extraction and array generation have been described in detail in Chen et al. [43]. Briefly, cells were lysed using guanidine hydrochloride, sodium acetate and protease lysis buffer. DNA was extracted using chloroform and precipitated in ethanol prior to washing and resuspension in ultra-pure water. 500 ng of DNA for each monocyte and neutrophil sample was randomly dispensed onto a 96-well plate to reduce batch effects. Samples were bisulfite-converted using an EZ-96 DNA Methylation MagPrep Kit (Zymo Research) following the manufacturer's instructions with optimised

Processing and quantification
All Infinium Human Methylation 450 array data preprocessing steps were carried out using established analytical methods incorporated in the R package RnBeads (v.1.13.4) [99]. First, we performed background correction and dye-bias normalisation using NOOB [100], followed by normalisation between Infinium probe types with SWAN [43,101]. Next, we filtered out probes based on the following criteria: median detection p value 0.01 in one or more samples; bead count of less than three in at least 5% of samples; ambiguous genomic locations [102]; cross-reactive and SNP-overlapping probes [103].
The RRBS samples were sequenced on Illumina HiSeq3000 platform in 50-bp single-end mode. Base calling was performed by Illumina Real Time Analysis (v2.7.7) software, and the base calls were converted to short reads using Illumina2bam (1.17.3 https:// github. com/ wtsi-npg/ illum ina2b am) tool before de-multiplexing (BamIndexDecoder) into individual, sample-specific BAM files. Trimmomatic (v0.32) [104] was used for trimming the adapter sequences. Trimmed short read sequences were aligned onto the GRCh38/hg38 human reference genome with BSMAP(v2.90) [105] aligner in RRBS mode which was optimised for aligning the RRBS data while being aware of the restriction sites and with the following options: -D C-CGG -D T-CGA -w 100 -v 0.08 -r 1 -p 4 -n 0 -s 12 -S 0 -f 5 -q 0 -u -V 2. R package RnBeads were used to filter out low confidence sites: sites overlapping any SNP, having a coverage lower than 5 and high coverage or missing in more than 5% or individuals were filtered out. Integration analysis required to attenuate technology effect between 450 K arrays and RRBS. To this goal, we generated RRBS data for 14 Blue-Print donors for which we already have 450 K array data in monocytes, and 9 in neutrophils. We first removed non-reproducible sites between technologies as follows: for monocytes and neutrophils, 1) liftover 450 K sites to Hg38 using UCSC liftover tool [106], 2) keep overlapping sites between array and RRBS, 3) filter out sites with high variation in methylation percentage observed in more than 70% of individuals. We excluded 844 and 1127 sites for monocytes and neutrophils, respectively. We have also excluded sites on sex chromosomes and imputed missing values using KNN networks (impute. knn function from impute R package (v.1.55.0)) [Hastie T, Tibshirani R, Narasimhan B, Chu G (2019). impute: impute: Imputation for microarray data.] with 10 nearest neighbours. Finally, we adjusted for batch effects using an empirical Bayesian framework, as implemented in the ComBat function of the R package SVA (v.3.29.1) and individual status as covariate, transformed beta values to M-values using beta2m function in R package lumi (v.2.33.0) [107,108], normalise by quantile using normalise.quantiles function from R package preprocessCore (v. 1 Table S5 and S6.

Differential analysis
For differential analysis, we used the methylKit R package (v.1.8.1) [109] and we compared only RRBS data. We first extracted methylation ratios from BSMAP mapping results with methratio.py python script provided with BSMAP. We then removed all sex chromosomes sites and filtered out non-retained sites from RnBeads RRBS processing. Finally, we used the methRead function from methylKit R package in CpGs context at base resolution to read in the input files and calculateDiffMeth function correcting for overdispersion (overdispersion = "MN") and applying Chisq test. We used age and gender as covariates. Q Values are then computed using the SLIM method [109,110]. We applied two thresholds: difference of methylation > 25 and qvalue < 0.05 and retrieved differentially methylated sites (DMS) with getMethylDiff function specifying type = "hypo" or type = "hyper" option to get down and up methylated CpGs , respectively.
Functional annotation was performed on genes within a window of 10 kb around each DMS, taking into account fold change direction. Similarly to RNA-Seq and ChIP-Seq, lists of genes were submitted to EnrichR interrogating the same databases. Annotation results are available in Table S23.

Plasma metabolites measurement Metabolites quantification
Metabolites profiling of obese and lipodistrophy patients, controls and blood donors (BD participants) was performed by Metabolon Inc. (https:// www. metab olon. com/) using their standard protocol. Briefly, Metabolon analytical platform incorporates two separate ultrahigh performance liquid chromatography/tandem mass spectrometry (UHPLC/MS/MS2) injections and one gas chromatography GC/MS injection per sample. The UHPLC injections are optimised for basic species and acidic species. The numbers of compounds of known structural identity (named biochemicals) as well as compounds of unknown structural identity (unnamed biochemicals) detected by this integrated platform were, respectively, of 793 and 362 for the first batch and 947 and 433 for the second batch (with an overlap of 786 and 359 compounds, respectively). All samples were rescaled to set the median to 1, and missing values were imputed using KNN networks (impute.knn function from impute R package (v.1.55.0) with the following options: number of nearest neighbours = 10, maximum missing values per metabolites < 50% and maximum missing values for individuals < 80%). Finally, we adjusted for batch effects using the ComBat function of the R package SVA (v.3.29.1) and individual status as covariate.

Plasma lipids measurement
Plasma was frozen in dry ice immediately after collection and stored at − 80C until analysis. Samples were prepared essentially as previously described [111]. Briefly, a 15 μL sample, controls and blanks were placed in a predefined random order across 96-well plates (Plate + , Esslab, Hadleigh, UK). To which, 750 µL methyl tert-butyl ether was added, along with 150 µl of internal standard mix, containing the following six internal standards (IS): 1,2-di-o-octadecyl-sn-glycero-3-phosphocholine (0.6 µM), 1,2-di-O-phytanyl-sn-glycero-3-phosphoethanolamine (1.2 µM), C8-ceramide (0.6 µM), N-heptadecanoyl-D-erythro-sphingosylphosphorylcholine (0.6 µM), undecanoic acid (0.6 µM), and trilaurin (0.6 µM), (Avanti Polar Lipids and Sigma-Aldrich). Quality controls were derived from pooling all samples and serially diluting with chloroform. Twenty-five microlitres of the sample/IS mixture was transferred to a glasscoated 384-well plate and 90 µl mass spectrometry (MS) mix [7.5 mM NH4Ac IPA:MeOH (2:1)] added and then sealed. Lipidomics was performed using chip-based nanospray with an Advion TriVersa Nanomate (Advion) interfaced to the Thermo Exactive Orbitrap (Thermo Scientific). Briefly, a mass acquisition window from 200 to 2000 m/z and acquisition in positive and negative modes were used with a voltage of 1.2 kV in positive mode and − 1.5 kV in negative mode and an acquisition time of 72 s. Raw spectral data were processed as previously described [112]. Raw data were then converted to.mzXML (usingMSconvert [113] with peakpick level 1), parsed with R, and 50 spectra per sample (scan from 20 to 70) were averaged using XCMS42, with a signal cutoff at 2000. The files were aligned using the XCMS [114,115] grouping function using "mzClust" with a m/z-window of 22 ppm and a minimum coverage of 60%. Compound annotation was automated using both an exact mass search in compound libraries as well as applying the referenced Kendrick mass defect approach. Signal normalisation was performed by summing the intensities of all detected metabolites to a fixed value to produce a correction factor for the efficiency of ionisation. Exact masses were fitted to the lipid species library and subsequently annotated to the peak as described before [82].

Plasma proteomics Sample preparation
Plasma was precleared by centrifugation at 3000g for 10 min and bound to 100 µL of calcium silicate matrix (CSM, 4 mg/mL) by rotation for 1 h. The sample was centrifuged at 14000g for 1 min, and the supernatant was removed for further analysis. The pellet was washed in ammonium bicarbonate (50 mMoL, 1 mL) 3 times using the same centrifugation settings. The sample was then reduced for 30 min at 65 °C using 200 µL of DL-dithiothreitol (DTT) premix (ADC 2%: ammonium bicarbonate 50 mMoL: DTT 1 MoL in the ratio of 50:49:1) and alkylated for 30 min in the dark with iodoacetamide (IAA) at 20 mMoL. Ammonium bicarbonate was added to dilute the ADC to 0.5%. Trypsin was added in the ratio of 1:25 trypsin to plasma and incubated overnight at 37 °C. The ADC was precipitated with 1% formic acid (FA) and centrifuged at 14000g for 10 min. The peptides were isolated using solid phase EMPORE C18 discs which had been washed with 1 stem of methanol and 3 stem of 0.1% FA. The sample was left to bind to the column for 30 min before washing with 0.1% FA and eluting with 60% acetonitrile (ACN) with 0.1% FA and then 80% ACN with 0.1% FA. The ACN was removed by speed vacuum for 1 h 15 min and freeze-dried overnight. Peptide suspended in 30 µL of 0.1% FA and a peptide assay was performed to calculate the amount of peptides. Ten microlitres of peptides was removed from each sample and 0.1% FA added to equalise the volume and spiked with an internal standard protein (yeast alcohol dehydrogenase, ADH), with a known amount of 50 fmol injected for each run.

Proteomic data processing and analysis
Progenesis QI for Proteomics (Nonlinear Dynamics, Waters Corporation, UK) was employed to identify and quantify proteins. The human database from UniProtKB was downloaded and used in FASTA format. The proteomic raw data were searched using strict trypsin cleavage rules with a maximum of two missed cleavages. Cysteine (Carbamidomethyl C) was set as a fixed modification. Deamination N, Oxidation M and Phosphoryl STY were selected as variable modifications. Minimum of 2 fragments per peptide, minimum of 5 fragments per protein and minimum of 2 peptides per protein were set for parameters of identification. The maximum protein mass was set to 1000 kDa. The false rate discovery (FDR) for protein identification was set at a maximum rate of 1%. Then, proteomic data generated from using the Progenesis QI were exported to Microsoft Excel for further data analysis.
For differential analysis, we used LIMMA (v.3.37.4) [116]. Because we compared obese and post-surgery patients, we performed a paired analysis. We then applied a threshold of 0.1% on ordinary q value.
To define whole blood-specific genes, we exported GTEx project [117] expression table (in TPMs), converted it into SummarizedExperiment container using SummarizedExperiment R package ((v.1.11.6); Morgan M, Obenchain V, Hester J, Pagès H SummarizedExperiment: SummarizedExperiment container. (2019)) and used teGeneRetrieval function from the TissueEnrich R package (v.1.2.1) [118]. This package relies on Human Protein Atlas [119] to grouped genes as follows: tissue enriched (genes with an expression level greater than 1 TPM that also have at least fivefold higher expression levels in a particular tissue compared to all other tissues), group enriched (genes with an expression level greater than 1 TPM that also have at least fivefold higher expression levels in a group of 2-7 tissues compared to all other tissues, and that are not considered tissue enriched) and tissue enhanced (genes with an expression level greater than 1 TPM that also have at least fivefold higher expression levels in a particular tissue compared to the average levels in all other tissues, and that are not considered tissue enriched or group enriched). With default parameters, we identified 693 whole blood-specific genes. Finally we intersected genes coding for differentially abundant proteins and whole blood-specific genes.

Weighted correlation network analysis (WGCNA)
WGCNA [48] is a correlation-based method that describes and visualises networks of data points, whether they are gene expression estimates, metabolite concentrations or other phenotypic data. To increase statistical power, we merged the patient groups under the assumption that they share similar associations of metabolites and phenotypic traits. We followed the protocols of WGCNA to create metabolic networks. Metabolites are clustered into co-abundant "modules". Low correlations can be suppressed either in a continuous ("soft") manner or the discontinuous ("hard") thresholding used in constructing unweighted networks. To maintain scale-free topology, we estimated an applied power by computing soft threshold with pickSoftThreshold function from WGCNA R package (v.1.64-1) [120]. To build network, we used blockwiseModules function with the following options: TOMType = "signed", minModuleSize = 20, reassignThreshold = 0, mergeCutHeight = 0.25 and cor-Type = "bicor". Each obtained module is notated by a unique colour. Additionally, we assigned a name to each consensus module. Each module abundance profile can be summarised by one representative metabolite: the module eigen metabolite. Specifically, the module eigen metabolite was defined as the first right-singular vector of the standardised module expression data [121]. We performed 3 analysis: extreme phenotypes (obese individuals and lipodystrophy patients were combined to get minimal sample size for network analysis), donors (all BD individuals) and a consensus analysis. We identified 8, 22 and 16 modules with donors, patients and consensus data, respectively. Regarding consensus analysis, we considered 988 metabolites; of these, 375 were assigned to 15 different modules and the remaining 613 were put in an ad hoc extra module because they did not show any correlation. We computed eigenmodule and biochemical parameters correlations (leptin-adiponectin ratio (LAR), glucose (GLC), triglycerides (TG), total cholesterol (TC), high-density lipoprotein (HDL-C), lowdensity lipoprotein (LDL-C), alanine amino-transferase (ALT), aspartate amino-transferase (AST), Homeostatic Model Assessment for Insulin Resistance (HOMA-IR) and adipose tissue insulin resistance (AT-IR) indexes and high-sensitivity C-reactive Protein (hsCRP) and also weight (WGT), BMI and age) using cor function from stats R base package (R version 3.5.0) and Pearson method (default). p value of each correlation was computed using corPvalueStudent function from WGCNA R package.
Pathways enrichment analysis was performed with MetaboAnalyst [122] and in particular Pathway analysis module by submitting combined list HMDB identifiers for clusters C1 and C2, hyper-geometric test, relativebetweenness centrality topology analysis and KEGG database. In addition, we submitted these lists to the Reactome database.

Multi-omic integration Training datasets
We identified 16 BD individuals, named hereafter as lean-BD, according to the following criteria: BMI < 25, glycaemia (GLUC) < 5.4 mmol/L, TG < 1.7 mmol/L, LDL < 2.59 mmol/L, HDL > 1 mmol/L for men and > 1.3 mmol/L for women, HOMA score < 2.2. For training the multi-omics predictive model (see below), we used a reduced training dataset comprising the subset of individuals having measurements across all omics layers. This reduced set comprised 6 lean-BD, 6 obese individuals and 10 lipodystrophy patients. For the clinical data, we first used multiple imputation by chained equations, as implemented in the mice R package (with default options) to impute missing values before construction of the training dataset. We used the same method to impute missing clinical values in the NASH cohort.

Variable selection: multivariable regression approach
For each of the omics layers considered independently, we used elastic net-penalised logistic regression as implemented in the glmnet R package to identify putative signatures that discriminated between all patients (i.e. lipodystrophy + obese) versus lean-BD. We adjusted for age and sex by including them as unpenalised covariates in the multivariable model. The elastic-net ɑ parameter was fixed at ɑ = 0.1, while the λ parameter was determined using cross-validation. Since different crossvalidation splits resulted in different choices for λ, we performed multiple rounds of cross-validation and used the value of λ that resulted in the maximum number of selections. We have provided a full assessment of this approach relative to possible alternatives elsewhere [123].

Single-layer predictive models
For each of the omics layers considered independently, we used the models fitted in the previous step to make predictions for the 96 individuals for which we had measurements across all omics layers.

Clinical predictive model
We trained a ridge-penalised logistic regression model predictive of the binary response (i.e. patient/lean-BD status) using the clinical training dataset.

Multi-omics predictive model
We used all omic variables selected by the multivariable approach described above (i.e. the full collection of selected variables, across all data layers), together with the clinical covariates, to train a ridge-penalised logistic regression model predictive of the binary response (i.e. patient/lean-BD status). We fitted this model using the reduced training dataset described above. We used this model to make predictions for the 96 individuals for which we had measurements across all omics layers. To allow us to make predictions for those individuals for which we only had measurements on a subset of the omics datasets, we additionally fitted models to each combination of subsets. A detailed analysis of this approach for selecting variables and training a multiomic predictive model, including simulation studies to assess both predictive performance and ability to identify relevant predictors, is provided in Cabassi et al. [123].

Validation of selected lipids
To further investigate the lipidomic signature, we prioritised a reduced set of nine lipid species that had been selected into the signature. These 9 species satisfied the following criteria: (1) they were selected into the lipidomic signature; and (2) using the Mann-Whitney test with Storey's q value method to correct for multiple testing, we were able to reject the null hypothesis of no difference in distribution for these lipids in all of the following comparisons: (1) obese versus lean-BD; (2) lipodystrophy versus lean-BD; and (3) {obese and lipod-ystrophy} versus lean-BD. All tests were performed using data from the present study only. Of these 9 species, we were able to match 8 with lipid species that had been quantified in a subset of 1507 participants of the Fenland study [80,82] which is a population-based cohort of 12,345 volunteers without diabetes born between 1950 and 1975 and recruited within the Cambridgeshire region between 2005 and 2015. We used linear regression analysis to test for association between plasma levels of the 8 lipid species selected into the lipidomic signature and all relevant CMS parameters quantified in both the reduced Fenland cohort, and the BD cohort, adjusting for age and sex, and using the Bonferroni method to control for multiple testing. To create a negative control set, we identified lipids that satisfied the following criteria: (1) they were not selected into the lipidomic signature; (2) they could be matched with lipid species that had been quantified in the reduced Fenland cohort; and (3) using the Mann-Whitney test with Storey's q-value method to correct for multiple testing, we were unable to reject the null hypothesis of no difference in distribution for these lipids in any of the following comparisons: (1) obese versus control; (2) lipodystrophy versus control; and (3) {obese and lipodystrophy} versus control. There were 37 lipid species that satisfied these criteria. We ranked these according to their mean absolute Pearson correlation with the 9 prioritised lipid species and selected the 5 lowest ranking as our negative control set.

Functional tests Neutrophils adhesion method
Polymorphonuclear granulocytes were isolated via density gradient (1.078 g/mL) from 3.2% sodium citrated whole blood within 2 h of venipuncture. Neutrophil purity was assessed by haematology analyser (Sysmex, XN-450) to ensure purity levels were satisfactory (≥ 90%) for subsequent functional assays. Isolated cells were incubated in a water bath at 37C for 30 min with fluorescently labelled Calcein-AM (4ug/mL, Molecular probes). Cells were washed twice with 1 × PBS and resuspended at 2 × 106/ml in HEPES complete medium supplemented with calcium (1 mM). 1.6 × 105 fluorescently labelled neutrophils were then added to relevant duplicate wells in a 96-well plate containing the following stimuli; fMLP, 10 µM; DTT, 10 mM; Pam3Cys, 20 µg/ml; LBP + LPS, 50 ng/mL and 20 ng/mL; PAF, 1 µM; PMA, 1 µg/mL; TNF, 10 ng/mL or HEPES only as a control in a final volume of in 100 µl. Cells were incubated for 30 min at 37C in a 5% CO2 incubator, after which they were washed twice using 1 × PBS before lysing in 100 µl PBS with 0.5% triton. A 100% adhesion control was generated by lysing 1.6 × 105 fluorescently labelled neutrophils in 0.5% triton. Fluorescent intensity was measured using a Tecan Infinite ® 200 PRO series plate reader (excitation of 485/20 nm and emission of 535/25 nm). The mean of duplicate values was calculated and the % adhesion over the hepes control calculated using the following formula: % adhesion = ((RFU stimuli − RFU HEPES)/ RFU 100% control) × 100.

CD63 expression
50ul of whole blood was incubated with antibodies: for 20 min, followed by a red cell lysis (BD FACS lyse) and resuspension in 0.2% formyl saline. Samples were analysed using flow cytometry (Beckman Coulter, FC500) within 4 h. Neutrophils were identified using scatter properties and CD16 positivity. BD CompBeads were used to generate compensation controls. The median fluorescence intensity (MFI) for each surface marker was calculated using Kaluza Analysis Software (Beckman Coulter).