Development of new bioinformatic tools to improve mass spectrometry-based analysis of the lipidome

María Isabel Alcoriza Balaguer

Author: María Isabel Alcoriza Balaguer
Thesis supervisor: Agustín Lahoz Rodríguez
Defended: Universitat de València (Spain), 2023
Language: English
Number of pages: 294
Thesis committee: Coral Barbas Arribas (chair), Xavier Domingo Almenara (secretary), María Vinaixa Crevillent (member)
Doctoral program: Programa de Doctorado en Biomedicina y Biotecnología por la Universitat de València (Estudi General)
Subjects:
- Chemistry
  - Analytical chemistry
    - Chromatographic analysis
  - Biochemistry
    - Lipids
Links
- Open-access thesis at: webges.uv.es (pdf)
Abstract
- Spanish (English translation)
  The growing interest in understanding the role of lipids in cell biology and disease has driven major advances in the field of lipidomics over recent decades. However, lipid identification remains the main bottleneck in the lipidomic analysis workflow, and the biological interpretation of the results obtained is still limited. The main objective of this thesis is to develop analytical methods and computational tools that facilitate the analysis of the human lipidome and help unravel the complex metabolic network underlying fatty acid metabolism. To this end, two different tools have been developed and evaluated in different relevant biological scenarios: LipidMS, a tool for data processing and lipid annotation in lipidomic analyses based on liquid chromatography coupled to mass spectrometry, and FAMetA, a tool dedicated to the analysis of the complex fatty acid metabolic network, based on the use of carbon-13 tracers and mass spectrometry analysis.
- English
  The development of bioinformatics and analytical technologies has led to the emergence of omics approaches. These profiling platforms aim to determine the set of biomolecules (genes, proteins, metabolites, etc.) that are part of a biological system. Among them, metabolomics aims to characterize the set of metabolites, low molecular weight molecules that act as precursors, intermediates or end products of metabolism. The levels of metabolites are determined by all the biochemical processes responsible for their production, consumption and elimination and are therefore a direct reflection of the physiological state of the biological system under study. The great diversity of physicochemical properties of metabolites, which largely determine which analytical techniques should be used for their characterization, has favored the emergence of subdisciplines within metabolomics focused on the analysis of a specific group of metabolites with shared characteristics. Lipids are a numerous and heterogeneous subgroup of metabolites that are characterized by their hydrophobic or amphiphilic nature and are of great biological importance as intermediates or products of signaling pathways, structural components of cell membranes and sources of energy. The holistic analysis of these lipids has led to the establishment of lipidomics as a subdiscipline of metabolomics with its own entity and characteristics. Lipid metabolism plays a central role in biological systems and its study can contribute to the understanding of the mechanisms underlying different pathological conditions. In recent years, alterations in general lipid profiles and in particular lipid species have been identified in highly prevalent diseases such as cancer, non-alcoholic fatty liver disease, diabetes, heart disease and neurological diseases. 
Currently, there is great interest not only in understanding the role of lipids in the pathophysiology of various diseases, but also in determining whether they could constitute new biomarkers for diagnosis, prognosis or response to treatment. However, most of the proposed lipid biomarkers have not been validated or are not useful as clinical biomarkers because they lack specificity or sensitivity. In addition, the biological interpretation of alterations in lipid metabolism is limited because the specific functions of most lipid species are still unknown. In most cases, only the overall levels of lipid classes and total free fatty acids are used to interpret the results, overlooking the fatty acid chain composition of complex lipids. Therefore, advances in analytical methods and bioinformatics tools that improve lipidome analysis are still required to fully understand lipid metabolism and its implications in each disease. Currently, liquid chromatography coupled to mass spectrometry (LC-MS) is the most widely used analytical technique for metabolome and lipidome analysis. In LC-MS, metabolites are first separated by liquid chromatography and then ionized and detected by mass spectrometry. The final result is a set of raw data characterized by three variables, retention time (RT), mass-to-charge ratio (m/z) and intensity, which must be processed to extract the signals associated with the different metabolites present in the samples. Depending on the objective of an LC-MS metabolomic analysis, two types of approaches can be distinguished: targeted metabolomics, whose objective is the quantification of a set of well-characterized metabolites, and untargeted metabolomics, whose objective is to achieve the widest possible coverage of the metabolome.
Targeted approaches are carried out with low-resolution mass spectrometers, such as a triple quadrupole (TQ). For each metabolite of interest, the characteristics to be used in its detection must be defined a priori, i.e. its molecular ion (precursor or parent ion) and the characteristic fragments generated after its fragmentation in the collision cell (fragments or daughter ions). These instruments usually work in multiple reaction monitoring (MRM) mode, in which multiple metabolites of interest are detected based on the aforementioned characteristics. In untargeted approaches, in the absence of a predefined set of metabolites of interest, the data must be processed with the aim of extracting signals from as many a priori unknown metabolites as possible. Metabolite identification is performed both on the basis of the exact mass of the detected molecular ion and on the basis of its structure, elucidated by molecular ion fragmentation. Therefore, untargeted analysis is usually performed with high-mass-resolution equipment that can also fragment the generated ions. In most cases the equipment has a quadrupole that allows the ions of interest to be filtered prior to their fragmentation in the collision cell and subsequent analysis. Depending on whether or not the ions are filtered in the quadrupole before they are introduced into the collision cell, we can distinguish between data dependent acquisition (DDA), in which a certain number of ions are selected in the quadrupole and subsequently fragmented, and data independent acquisition (DIA), in which all the ions that coelute at a given time are introduced into the collision cell. In data acquired in DDA there is a direct connection between the generated fragments and the precursor, whereas in DIA data analysis techniques must be used to establish the connection/correlation between the precursors and their corresponding fragments.
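The precursor-fragment linking problem in DIA can be illustrated with a minimal sketch. It assumes that fragments are assigned to a precursor when their elution profiles are strongly correlated; the function names, the toy intensity traces and the 0.8 correlation threshold are illustrative assumptions, not the procedure of any specific tool.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two intensity traces sampled at the same scans."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def link_fragments(precursor_trace, fragment_traces, min_corr=0.8):
    """Return the fragments whose elution profile follows the precursor's.

    min_corr is a hypothetical threshold chosen only for this example.
    """
    return [name for name, trace in fragment_traces.items()
            if pearson(precursor_trace, trace) >= min_corr]

# Toy extracted-ion chromatograms over seven consecutive scans.
precursor = [0, 5, 20, 50, 20, 5, 0]
fragments = {
    "frag_A": [0, 2, 9, 24, 10, 2, 0],   # same peak shape: likely from this precursor
    "frag_B": [30, 20, 10, 5, 2, 1, 0],  # different profile: likely from another ion
}
linked = link_fragments(precursor, fragments)
print(linked)
```

In DDA this step is unnecessary because each MS/MS spectrum is recorded for a known, isolated precursor.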
The most common instruments for untargeted metabolomic analysis are the quadrupole-time-of-flight (Q-TOF) and the quadrupole-orbitrap. Despite the great interest in lipidomics in recent years, the heterogeneity and size of the lipidome and the lack of commercial standards make it difficult to correctly identify the lipids detected by untargeted LC-MS analysis, which remains the main bottleneck in the advancement of lipidomics. Moreover, as already mentioned, the biological interpretation of the results is limited because the specific functions of most lipid species are still unknown. For this reason, the general objective of this thesis was the development of new methods and bioinformatics tools to facilitate the characterization of the lipidome and the study of lipid metabolism, particularly that of fatty acids. To this end, two main objectives were proposed: 1) Development of a tool to improve lipid annotation in untargeted LC-MS analysis. This tool should cover all the necessary steps for data processing and implement lipid annotation based on fragmentation rules for DDA and DIA data. 2) Development of a method that allows the study of the set of reactions involved in fatty acid biosynthesis based on the combined use of LC-MS and 13C tracers. This thesis is divided into two chapters, each of which explains in detail one of the two tools developed: LipidMS (Chapter 1), an R package for untargeted LC-MS data processing and lipid annotation, and FAMetA (Chapter 2), a tool based on isotopologue distributions for the comprehensive analysis of fatty acid metabolism. Both aim to improve mass spectrometry-based lipidome analysis. On the one hand, LipidMS was developed with the specific aim of improving lipid identification in LC-MS by using fragmentation rules.
As already mentioned, the size, complexity and heterogeneity of the lipidome, together with the lack of available lipid standards, make lipid annotation one of the most limiting and costly steps of data processing in LC-MS lipidomic studies. Accurate identification of any metabolite in LC-MS requires checking its RT, m/z and MS/MS spectra against a commercially available standard. In the case of lipids, due to the enormous variety of lipid species and the small number of available standards, this strategy cannot be applied in most cases. The definition of fragmentation patterns for different lipid classes has allowed the in silico construction of MS/MS spectral libraries that are used for lipid annotation through spectral matching algorithms. However, this strategy has some limitations. First, a single m/z value for a precursor is not sufficient to identify the molecular ion due to the large number of overlaps between isomeric and isobaric species, so a correct annotation of isotopes and adducts is of utmost importance in untargeted lipidomics. Moreover, although MS/MS information can help to resolve some of these overlaps, it is not sufficient in many cases where different lipid classes, or different species of the same class, yield common fragments. In addition, if the MS/MS spectrum contains a small number of fragments with high intensities, the similarity calculations between spectra may be biased, giving the same or very similar results for different isobaric and isomeric species. This is very common in lipids, where many species share class-specific fragments, which only report the class or subclass of a lipid (e.g. polar head fragments), or fragments corresponding to fatty acid chains, which only report the chain composition but not the class or subclass of the lipid species of interest.
Furthermore, when isobaric or isomeric compounds co-elute during chromatographic separation, which is also common due to the block-like structural nature of lipids, complex MS/MS spectra are obtained for both DDA- and DIA-acquired data, making lipid annotation difficult. As an alternative, lipid identification based on fragmentation rules and the presence or absence of the expected fragments for each lipid class has been implemented in a small number of bioinformatics tools. At the time this PhD thesis started, only a few tools, such as LDA or LipidMatch, were based on fragmentation rules, and most of them only worked with data acquired in DDA. MS-DIAL, in contrast, allowed working with data acquired in DIA, but its lipid annotation was based on spectral matching. In later versions MS-DIAL incorporated annotation based on fragmentation rules through LipidMatch. In this context, LipidMS was initially designed to annotate lipids in individual samples using data acquired in DIA and fragmentation rule-based annotation, although it was later extended to DDA, as it is the most commonly used acquisition mode. LipidMS also initially relied on external processing tools to analyze sequences of multiple samples. To overcome this limitation, the new versions of the package have incorporated the necessary functionalities to cover the entire data processing workflow: peak extraction, alignment, clustering and peak integration. Once the matrix with all the signals detected in the dataset has been generated, LipidMS performs lipid identification in those samples acquired in DIA or DDA using the information from both MS1 and MS2. Compared with other available tools, LipidMS incorporates two strategies that help maximize the number of correct assignments and minimize incorrect ones.
On the one hand, the set of fragmentation rules has been experimentally defined so that it prioritizes well-characterized class-specific fragments over more intense, but less specific, fragments such as fatty acid chains (which can be common to a large number of lipid classes). On the other hand, lipids often ionize in the form of multiple adducts (e.g. [M+H]+, [M+Na]+ and [M+NH4]+, in the case of ESI+). The adducts of a particular lipid species can often be confused with another species; therefore, correctly assigning all detected adducts of a particular lipid before analyzing the generated fragments makes the resulting identifications more robust and minimizes the number of incorrect annotations. The latest version of LipidMS includes predefined fragmentation rules for 28 lipid classes and allows customization of both the fragmentation rules and the building blocks used to generate the libraries required for identification. Depending on the fragments found, each identified species can be annotated with different levels of structural elucidation: at the class level, when only fragments characteristic of the lipid class or subclass have been found, confirming the lipid type and total carbon and double bond composition but not the chain composition; at the fatty acid chain composition level, when in addition to the class fragments, fragments specific to these chains have been found; and at the chain position level, when the relative intensities of the fragments corresponding to the chains allow elucidation of the position of each of the fatty acids within the complex lipid structure. LipidMS was evaluated by analyzing a commercial human serum, both spiked with a total of 68 lipid standards and unspiked, and compared with two of the most commonly used software packages for metabolomics and untargeted lipidomics data processing: XCMS and MS-DIAL.
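As an illustration of the adduct assignment step, the sketch below computes the expected m/z of the three ESI+ adducts named above for a neutral monoisotopic mass and looks for them in a peak list. The function names, the 10 ppm tolerance and the toy peak list are assumptions made for this example and do not reproduce the LipidMS API; a real workflow would also require the adducts to share a retention time.

```python
# Mass shifts (Da) of singly charged positive adducts relative to the neutral
# monoisotopic mass, with the electron mass already subtracted.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033826,
    "[M+Na]+": 22.989221,
}

def expected_adducts(neutral_mass):
    """Expected m/z of each adduct for a given neutral monoisotopic mass."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}

def match_adducts(neutral_mass, observed_mz, ppm_tol=10.0):
    """Return the adducts found in observed_mz within a ppm tolerance.

    ppm_tol is a hypothetical default chosen only for this example.
    """
    found = {}
    for name, mz in expected_adducts(neutral_mass).items():
        for obs in observed_mz:
            if abs(obs - mz) / mz * 1e6 <= ppm_tol:
                found[name] = obs
                break
    return found

# PC 34:1 (C42H82NO8P) has a monoisotopic mass of about 759.5778 Da.
peaks = [760.5853, 782.5671, 703.5750]  # toy MS1 peak list
found = match_adducts(759.5778, peaks)
print(found)
```

Finding two mutually consistent adducts for the same candidate gives more support to the assignment than a single m/z match would.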
First, the comparison with XCMS demonstrates that the processing algorithms implemented in the latest version of LipidMS work correctly, since the results obtained with both tools are similar. Second, the comparison with MS-DIAL shows that LipidMS reduces the number of incorrect identifications and improves the level of structural elucidation of the identified species, although MS-DIAL is able to annotate a much larger number of species, so LipidMS and MS-DIAL could be used in a complementary way. It is also important to highlight that LipidMS supports simultaneous processing of the following combinations of MS acquisition modes: all samples acquired in DIA; all samples acquired in DDA; combination of DIA and DDA samples; combination of full scan and DIA; combination of full scan and DDA; and combination of full scan, DDA and DIA, which allows easier and automatic integration of the annotations obtained in DIA and DDA with the rest of the data. Future improvements of LipidMS should include extending the lipid classes, fatty acid chains and sphingoid bases used in order to provide better coverage of the lipidome, standardizing LipidMS to make it compatible with other R packages, and adding the possibility to analyze lipid data labeled with isotopic tracers. FAMetA, in turn, emerged in response to the second objective of this thesis, which was to develop a tool to facilitate the study of fatty acid metabolism. The use of 13C tracers and MS-based detection is the reference method for the analysis of fatty acid metabolism.
This method is based on the successive incorporation of two-carbon units labeled with the stable carbon isotope 13C, via acetyl-CoA, into fatty acids during synthesis and elongation reactions, and on the subsequent analysis of the isotopologue distributions obtained (isotopologues are forms of the same molecule that differ only in mass because they incorporate 13C in place of 12C, the naturally predominant isotope). Thanks to the difference in mass between the pre-existing species, or those synthesized from unlabeled sources, and those generated from the 13C-containing source, metabolism can be analyzed on the basis of the isotopologue distribution. Although several algorithms and tools have been developed to extract information on fatty acid metabolism by modeling these isotopologue distributions, they still provide limited and difficult-to-interpret information. Most of these methods only provide information on de novo lipogenesis for fatty acids up to 16 or 18 carbons, or do not reflect the actual biological steps of elongation processes. Furthermore, desaturation is not taken into account for the complete fatty acid network. To overcome these limitations, we developed FAMetA, a tool that uses the fatty acid isotopologue distributions obtained by 13C-labeled acetyl-CoA incorporation to estimate each of the steps of most of the biosynthetic reactions involved in fatty acid metabolism: de novo lipogenesis (S), elongation (E), desaturation (Δ) and import (I). In addition, FAMetA allows estimation of the relative contribution of the tracer used to the acetyl-CoA pool (D0, D1 and D2, referring to acetyl-CoA containing 0, 1 or 2 13C atoms, respectively). Traditionally, de novo synthesis for fatty acids up to 16 carbons has been modeled using multinomial distributions that allow estimation of the parameters I, S and D0, D1, D2.
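The traditional multinomial model can be sketched as follows. The sketch assumes that a fatty acid built from n two-carbon units draws each unit independently with 0, 1 or 2 13C atoms (probabilities D0, D1, D2), that the imported fraction I = 1 - S is fully unlabeled, and that natural 13C abundance is ignored. The function names are hypothetical and this is a simplified illustration, not the FAMetA implementation.

```python
import numpy as np

def synthesized_distribution(n_units, d0, d1, d2):
    """Isotopologue distribution (M+0 ... M+2n) of a de novo synthesized fatty acid.

    Each acetyl unit adds 0, 1 or 2 labeled carbons with probabilities d0, d1, d2,
    so the total label count is the n-fold convolution of [d0, d1, d2].
    """
    unit = np.array([d0, d1, d2], dtype=float)
    dist = np.array([1.0])
    for _ in range(n_units):
        dist = np.convolve(dist, unit)
    return dist

def observed_distribution(n_units, s, d0, d1, d2):
    """Mix the synthesized fraction S with an unlabeled imported fraction I = 1 - S."""
    new = synthesized_distribution(n_units, d0, d1, d2)
    imported = np.zeros_like(new)
    imported[0] = 1.0  # imported molecules carry no tracer (assumption)
    return (1.0 - s) * imported + s * new

# Palmitate (16 carbons, 8 acetyl units) with half of the pool newly synthesized.
palmitate = observed_distribution(8, s=0.5, d0=0.6, d1=0.0, d2=0.4)
print(np.round(palmitate, 4))
```

Fitting S and D0, D1, D2 then amounts to finding the parameter values whose predicted distribution best matches the measured one, for example by least squares.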
However, in FAMetA we use quasi-multinomial distributions capable of modeling and quantifying the overdispersion (via the Φ parameter) usually observed in experimentally obtained distributions. For fatty acids longer than 16 carbons, in addition to the S and I parameters, up to five elongation terms are also estimated (En, where n=1 refers to the first elongation step for 18-carbon fatty acids and n=5 to the last step for 26-carbon fatty acids), each representing an individual elongation step from a precursor with X carbon atoms to a product of length X+2. Compared to previous tools, the way FAMetA calculates elongations better reflects how fatty acids elongate within cells, allowing a direct biological interpretation of the estimated elongation parameters. In addition, FAMetA incorporates indirect estimation of desaturation for the fatty acid metabolic network through a strategy that uses the estimated synthesis parameters for the precursor and product of the desaturation reaction instead of total labeling. Finally, the FAMetA workflow includes all the necessary functions for data processing, group comparisons and graphical results, which facilitates the interpretation of the results. To test the validity of the algorithms implemented in FAMetA, a set of isotopologue distributions was first simulated from known values of the different parameters calculated by FAMetA. It was verified that FAMetA is able to accurately determine the complete set of parameters of fatty acid synthesis (relative error < 15%, RSD < 15% for all parameters) provided that the relative contribution of the tracer (D2) and the parameters to be calculated for a given fatty acid, i.e., S, E1, E2, E3 and E4, are within the range 0.05-0.9, ensuring its applicability in a real biological scenario.
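A single elongation step from X to X+2 carbons can be sketched in the same spirit. The sketch assumes that the elongated fraction E of the product pool is the precursor's isotopologue distribution convolved with one two-carbon unit (D0, D1, D2), and that the remaining fraction 1 - E is unlabeled product. This two-component mixture, the function names and the example values are simplifying assumptions for illustration and are not FAMetA's actual parameterization.

```python
import numpy as np

def elongate(precursor_dist, d0, d1, d2):
    """Add one two-carbon unit to a precursor isotopologue distribution."""
    return np.convolve(np.asarray(precursor_dist, dtype=float), [d0, d1, d2])

def product_distribution(precursor_dist, e, d0, d1, d2):
    """Mix the elongated fraction E with an unlabeled fraction 1 - E (assumption)."""
    elongated = elongate(precursor_dist, d0, d1, d2)
    unlabeled = np.zeros_like(elongated)
    unlabeled[0] = 1.0
    return (1.0 - e) * unlabeled + e * elongated

# A fully unlabeled 16-carbon precursor (17 isotopologues, all signal at M+0)
# elongated to an 18-carbon product, with half of the product pool elongated.
c16 = np.zeros(17)
c16[0] = 1.0
c18 = product_distribution(c16, e=0.5, d0=0.6, d1=0.0, d2=0.4)
print(np.round(c18, 3))
```

Because the only label in this toy case comes from the added unit, all labeled signal appears at M+2, which is the pattern that lets an elongation step be separated from de novo synthesis.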
FAMetA was then evaluated in different biological scenarios, both in vivo and in vitro, with and without known inhibitors of specific reactions of fatty acid metabolism. These experiments showed that FAMetA can determine the parameters associated with these reactions across the complete metabolic network and that, when inhibitors are used, it is able to detect the specific changes induced in metabolism. Moreover, compared to FASA, the only tool that so far included the analysis of elongated fatty acids beyond 18 carbons, FAMetA provides a more complete characterization of the fatty acid biosynthetic network, a better and more intuitive description of each of the synthesis parameters and a more complete workflow ranging from data processing to group-based comparisons and graphical representation. Finally, the use of specific inhibitors combined with FAMetA analysis has allowed us to study in depth the metabolic network of fatty acid biosynthesis in A549 cells, identifying 33 a priori unknown fatty acids, 11 of which could be confirmed with commercial standards. In addition, 12 of them have not been previously described in mammals, although they belong to already described n/omega series. Future versions of FAMetA should incorporate the analysis of other types of tracers besides 13C, allow the use of labeled fatty acids as tracers, extend the reaction network to include odd-chain fatty acids, and address degradation.
In summary, compared to previous tools, FAMetA offers: (i) characterization of a broader fatty acid biosynthetic network, as it includes in a single tool the analysis of de novo synthesis, elongation and desaturation; (ii) the possibility to execute the necessary steps from data processing to fatty acid metabolism analysis and graphical representation in a single tool; (iii) a user-friendly environment due to its implementation as an R package and a web version with graphical interface; (iv) better fit to experimental data thanks to the implementation of a quasi-multinomial fit that includes the parameter Φ to account for overdispersion of the data; (v) better modeling of elongation reactions, allowing easier interpretation of the estimated parameters; and (vi) easy-to-interpret parameters and graphical representations that allow meaningful biological conclusions to be drawn.
