
Dialnet


Summary of Development of computational and experimental tools for the identification of small proteins in bacterial genomes

Samuel Miravet Verde

  • Small proteins (SEPs; <100 aa) are involved in essential processes such as cell homeostasis, signalling, and metabolism. However, they have been overlooked by both computational and experimental approaches, and their identification has mostly relied on serendipity. This leaves a knowledge gap in current reference genomes and causes genome complexity to be underestimated. To address these limitations, the objectives of this PhD thesis are: 1) To define new computational approaches for the efficient annotation of SEPs in bacterial genomes. 2) To critically assess the identification of small proteins by available technologies. 3) To standardize the bioinformatic analysis of transposon sequencing technologies, expanding their application in genome studies and improving essentiality studies. 4) To define a high-throughput experimental approach to identify expressed proteins, including SEPs.

    In this thesis, I first evaluate different '-omic' technologies, including mass spectrometry, RNA sequencing and ribosome profiling, to characterize their capability to recall SEPs. Their limitations are evaluated and overcome by defining RanSEPs, a machine learning bioinformatic tool that identifies SEPs using species-specific sequence features, homology information and random forest models. I show that this approach predicts validated SEPs with an accuracy of 95%, outperforming previous annotation algorithms. Moreover, running this tool on 109 bacterial genomes shows that the representation of SEPs in proteomes could increase from 10% to 25%. In addition, some annotated non-coding RNAs could encode SEPs. Finally, a functional bioinformatic evaluation of the predicted SEPs highlights an enrichment in membrane, translation, metabolism, and nucleotide-binding categories. Additionally, 9.7% of the SEPs include a predicted N-terminal signal peptide, indicating that these SEPs could play roles in quorum sensing or act as bacteriocins.

    Next, I describe two tools that I developed to aid the bioinformatic analysis of transposon sequencing (Tn-Seq) data: FASTQINS, a standardized pipeline to extract insertion profiles from raw data, and ANUBIS, a computational framework that covers every Tn-Seq data analysis step. Applying these tools under different sample conditions in Mycoplasma pneumoniae allows the recovery of unprecedented coverage levels (a resolution of 1.5 insertions per base), which in turn allows the characterization of specific artifacts. As a novelty in the field, we introduce a new model based on unsupervised clustering that provides essentiality estimates without prior knowledge of essentiality in the organism.

    Finally, I introduce ProTInSeq, which applies the standardized Tn-Seq approaches to the detection of SEPs. This methodology explores proteomes using ultra-deep sequencing and mutated transposon vectors in which a resistance or marker gene is expressed only when inserted in-phase with an ORF. Preliminary results from this library indicate that it can be used to perform quantitative protein studies, reveal membrane topology features and identify expressed SEPs.

    In conclusion, the proposed computational and experimental methodologies lay the foundation for future studies in the search for bioactive small proteins in available and new genome projects. Considering the roles small proteins can play, from homeostasis regulation to antimicrobial capacities, future findings in the field of small proteins will have great impact in areas such as rational genetic engineering and microbial therapies.
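
As a rough illustration of the <100 aa size criterion used above, the following minimal Python sketch enumerates candidate small ORFs on the forward strand of a nucleotide sequence. It is not the RanSEPs algorithm, which according to the abstract relies on species-specific sequence features, homology information and random forest models. The function name `find_small_orfs`, the `min_length_aa` parameter, the ATG-only start rule and the standard stop codon set are all assumptions made for this example. In particular, Mycoplasma species read TGA as tryptophan, so the stop set would need adjusting for them.

```python
# Illustrative sketch only; names and thresholds other than the <100 aa
# cut-off are hypothetical and do not come from the thesis.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
START_CODON = "ATG"                   # assumption: canonical start codon only
MAX_SEP_LENGTH_AA = 100               # SEPs are defined in the abstract as <100 aa


def find_small_orfs(sequence, min_length_aa=10):
    """Return (start, end, frame) tuples for forward-strand small ORFs.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    For each stop codon, the ORF begins at the first in-frame ATG seen
    since the previous stop, so only the longest ORF per stop is reported.
    """
    sequence = sequence.upper()
    hits = []
    for frame in range(3):
        orf_start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if orf_start is None and codon == START_CODON:
                orf_start = pos
            elif orf_start is not None and codon in STOP_CODONS:
                length_aa = (pos - orf_start) // 3  # residues, stop excluded
                if min_length_aa <= length_aa < MAX_SEP_LENGTH_AA:
                    hits.append((orf_start, pos + 3, frame))
                orf_start = None
    return hits
```

A complete candidate search would also scan the reverse complement and consider alternative start codons. Candidates found this way are only raw ORFs, and separating coding from non-coding ones is the classification problem that a tool such as RanSEPs addresses.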

