ARCIMBOLDO[1-4] is an ab initio phasing method for macromolecular crystallographic X-ray diffraction data. It combines the location of model fragments, such as polyalanine α-helices, with the program PHASER[5], and density modification and main chain autotracing with the program SHELXE[6]. The method is named after the Italian painter Giuseppe Arcimboldo (1526-1593), who composed portraits out of common objects such as fruits and vegetables (Figure 0-1). Following the analogy, ARCIMBOLDO composes an unknown structure by assembling small secondary structure elements, which are conserved across families of unrelated tertiary structure. Exploiting this method requires a multi-solution approach because correct solutions are difficult to recognize at early stages. Moreover, phasing a structure starting from the partial information provided by such a small percentage of the total model (around 10% of the main chain atoms) is challenging and requires the evaluation of alternative hypotheses under statistical constraints to avoid combinatorial explosion. ARCIMBOLDO methods have proven successful in many cases of previously unknown structures[3] and also on a pool of test structures[4]. The program can accept any Sohncke space group, and all the most frequent ones are represented in the pool of structures solved so far. In both studies data were collected in the most common protein space groups.

Figure 0-1. L'Ortolano, Giuseppe Arcimboldo. Civic Museum "Ala Ponzone", Cremona, Italy.

Data quality is crucial for phasing methods, and ARCIMBOLDO is particularly sensitive to it: low resolution (worse than 2.1 Å) and lack of completeness (less than 98%) drastically decrease the chance of success. Location of secondary structure elements is not indicated as a phasing method for large structures or complexes (over 400 residues) unless very long helices are present and high resolution data are available.
Such cases would require the placement of many fragments in order to assemble 10% of the main chain, which can lead to an unmanageable number of solutions. To address this scenario properly we have implemented dedicated methods in ARCIMBOLDO_BORGES[7] and ARCIMBOLDO_SHREDDER[8]. These programs exploit libraries of folds or large search models and are described later in the text.

The current implementation[4], coded in Python, is deployed as a standalone binary, freely available upon registration from http://chango.ibmb.csic.es/download. The binary is compatible with common Linux distributions and recent versions of the Mac OS X operating system. Users can find online manuals, tutorials and documentation on our website. As of 30th April 2015, it has been downloaded 664 times and distributed to 121 research groups; furthermore, it has been installed in many European synchrotron facilities such as the Alba Synchrotron in Spain, the Diamond Light Source in the United Kingdom and the SOLEIL Synchrotron in France. The software is also available through the SBGrid Consortium (https://sbgrid.org), a network of institutions across 19 countries, which provides a distributed grid network of computers to run structural biology software. We have recently started a collaboration with the San Diego Supercomputer Center (http://www.sdsc.edu) in California (USA) to develop optimized and dedicated versions of the programs for their platform, with the aim of addressing difficult phasing cases.

Owing to this recent spread in the crystallographic community, ARCIMBOLDO has been presented at many international conferences, such as the International Union of Crystallography Meeting in Madrid (ES) 2011 and in Montreal (CA) 2014 and the European Crystallographic Meeting in Bergen (NO) 2012 and Warwick (UK) 2013, and at many schools and workshops, such as the International School of Crystallography in Erice (IT) 2012 and the Macromolecular Crystallography School in Madrid (ES) 2014.
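As a rough illustration of the applicability limits quoted in this summary (resolution worse than 2.1 Å, completeness below 98%, more than 400 residues, about 10% of the main chain to be placed), the following Python sketch estimates how many helices would have to be located and flags unfavourable data. It is not part of ARCIMBOLDO: the function names, the 14-residue helix length and the treatment of the thresholds as hard cut-offs are assumptions made only for illustration.

```python
# Illustrative sketch only; none of these names come from ARCIMBOLDO itself.
# The numeric limits are the ones quoted in the text; treating them as hard
# cut-offs is a simplifying assumption.
RESOLUTION_LIMIT_A = 2.1        # data worse than this reduce the chance of success
COMPLETENESS_LIMIT_PCT = 98.0   # data less complete than this reduce it as well
SIZE_LIMIT_RESIDUES = 400       # larger structures need many fragments
TARGET_PERCENT = 10             # approximate share of the main chain to place


def fragments_needed(n_residues, helix_length=14, target_percent=TARGET_PERCENT):
    """Helices of `helix_length` residues needed to cover `target_percent` of the chain.

    Every residue contributes the same number of main-chain atoms, so the
    fraction of main-chain atoms equals the fraction of residues.
    The 14-residue default is an assumed, typical helix length.
    """
    needed = n_residues * target_percent   # residues to cover, scaled by 100
    per_fragment = 100 * helix_length      # residues per helix, scaled by 100
    return -(-needed // per_fragment)      # integer ceiling division


def data_warnings(resolution_a, completeness_pct, n_residues):
    """Return a list of reasons why a data set looks unfavourable."""
    warnings = []
    if resolution_a > RESOLUTION_LIMIT_A:
        warnings.append("resolution worse than 2.1 Å")
    if completeness_pct < COMPLETENESS_LIMIT_PCT:
        warnings.append("completeness below 98%")
    if n_residues > SIZE_LIMIT_RESIDUES:
        warnings.append("more than 400 residues")
    return warnings


if __name__ == "__main__":
    for size in (120, 400, 1000):
        print(size, "residues:", fragments_needed(size), "helices of 14 residues")
    for message in data_warnings(2.5, 95.0, 600):
        print("warning:", message)
```

Under these assumptions a 400-residue structure needs three 14-residue helices and a 1000-residue structure needs eight, which shows how the number of fragments, and with it the number of candidate solutions to evaluate, grows with the size of the structure.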
This thesis is organised in the standard scientific format comprising five main parts:
1. INTRODUCTION: introducing the theoretical topics directly or indirectly related to the contents of the thesis and discussing the state of the art of current scientific work related to the proposed objectives.
2. OBJECTIVES: listing all general goals and particular aims of the doctoral project.
3. MATERIALS AND METHODS: detailing the hardware and software environment, including third party software and algorithms employed in the project.
4. RESULTS AND DISCUSSION: presenting all the algorithms, software, experiments and tests produced in pursuit of the stated objectives.
5. CONCLUSION: summarising the whole project and listing its achievements by the end of the doctoral studies.

Chapter 1, in the Introduction, is dedicated to X-ray macromolecular crystallography. After introducing the relevance of this science to structural biology and consequently to biomedicine, Section 1.2 focuses on the phase problem and on the established methods employed to overcome it. Particular attention is paid to molecular replacement (Section 1.3) and ab initio methods (Section 1.5), presenting their strengths and limitations. Section 1.6 is dedicated to modelling, structure prediction and the general use of external fragments in the determination of a protein structure. Chapter 2 introduces technical computational environments, such as multiprocessing machines (Section 2.1), grid networks (Section 2.2) and supercomputers (Section 2.3), providing references for specialised reading.

Each Chapter in the Results and Discussion section is dedicated to one of the programs designed and implemented to address the stated objectives. Chapter 3 is dedicated to ARCIMBOLDO and its underlying method to phase structures through unspecific secondary structure elements. The algorithm is described in detail, including its underlying mathematics and geometry.
Both the lite version (Section 3.1), the simplest implementation of ARCIMBOLDO, and the extended distributed computing approach (Section 3.3) are explained. Section 3.2 deals with the testing of the deployed version, considering both computing resources and the generality of the method. The list of structures that, to our knowledge, have been solved with ARCIMBOLDO is presented in Section 3.4, discussing some of the cases that use particular features available in the program. Finally, the solution with ARCIMBOLDO_LITE of a previously unknown structure, a 13-fold superhelix[9], is described in Section 3.5.

Chapter 4 deals with ARCIMBOLDO_SHREDDER, a phasing method that exploits distant homologs. The method uses only crystallographic data to evaluate fragmented portions of the template model. The algorithm is described along with the statistics used and the formulas designed (Section 4.1). Section 4.2 is dedicated to the published case of MltE, which was solved with this method and from which the current program configuration has been derived.

Chapter 5 deals with BORGES, a program to define, extract, superpose and cluster libraries of small local folds. First, the central mathematical constructions and operations are described. The chapter introduces the novel concept of the Characteristic Vector[7] to describe secondary structure elements (Section 5.1), statistics to prove its generality (Section 5.2) and its use to describe local fragment distortion (Section 5.4). The method is presented in its first prototype version (Section 5.5), which allowed the generation of some basic libraries, described in Section 5.10. These libraries have been successfully exploited several times for phasing test and unknown structures. The present implementation (Section 5.6) reorganises procedures under the same general idea and introduces a new algorithm.
Its new features allow the creation of more complex folds described in Section 5.11, namely knowledge-based DNA binding motifs of contiguous fragments with exposed loops. Structural comparison, superposition (Section 5.7) and geometrical clustering (Section 5.8) are crucial in the creation of such libraries, and their algorithms and parameterization are then discussed.

Chapter 6 illustrates the ARCIMBOLDO_BORGES program. This software makes use of the BORGES libraries to enforce unspecific tertiary structure for phasing. Section 6.1 describes the single-machine implementation, while Section 6.3 elaborates on the supercomputing and distributed version. As for ARCIMBOLDO, this Chapter also discusses test cases (Section 6.2) and previously unknown phased structures (Section 6.4). Two cases of particular interest are considered: the coiled coil plectin fragment of the Rod domain (Section 6.5) and a virus structure that is the first success, for an ARCIMBOLDO-based method, on an unknown all-β structure (Section 6.6).

The last Chapter (7) is rather technical but may be of interest to developers; it details a series of procedures designed and implemented in the programs presented. The management of I/O (Section 7.3) and grid support for many middleware systems and for remote access (Section 7.4) are discussed, together with the algorithms and protocols established for them. This Chapter also describes the development environment (Section 7.2) and the mechanism for deploying the program binaries (Section 7.1).

The thesis also includes the following parts:
- OUTLOOK: anticipating the on-going projects and outlining possible developments that build on the objectives achieved.
- REFERENCES: listing all the bibliography cited in the text. Hyperlinks are provided for the digital version.
- APPENDICES: including personal scientific production, posters and communications presented, and attendance at schools and congresses.

1. Rodríguez, D.D., Grosse, C., Himmel, S., González, C., de Ilarduya, I.M., Becker, S., Sheldrick, G.M., and Usón, I., Crystallographic ab initio protein structure solution below atomic resolution. Nature Methods, 2009. 6: p. 651-653.
2. Rodríguez, D., Sammito, M., Meindl, K., de Ilarduya, I.M., Potratz, M., Sheldrick, G.M., and Usón, I., Practical structure solution with ARCIMBOLDO. Acta Crystallographica Section D Biological Crystallography, 2012. 68: p. 336-343.
3. Millán, C., Sammito, M., and Usón, I., Macromolecular ab initio phasing enforcing secondary and tertiary structure. IUCrJ, 2015. 2: p. 95-105.
4. Sammito, M., Millán, C., Frieske, D., Rodríguez-Freire, E., Borges, R.J., and Usón, I., ARCIMBOLDO_LITE: single workstation implementation and use. Acta Crystallographica Section D Biological Crystallography, 2015.
5. McCoy, A.J., Grosse-Kunstleve, R.W., Adams, P.D., Winn, M.D., Storoni, L.C., and Read, R.J., Phaser crystallographic software. Journal of Applied Crystallography, 2007. 40: p. 658-674.
6. Sheldrick, G.M., Macromolecular phasing with SHELXE. Zeitschrift für Kristallographie, 2002. 217: p. 644-650.
7. Sammito, M., Millán, C., Rodríguez, D.D., de Ilarduya, I.M., Meindl, K., De Marino, I., Petrillo, G., Buey, R.M., de Pereda, J.M., Zeth, K., Sheldrick, G.M., and Usón, I., Exploiting tertiary structure through local folds for crystallographic phasing. Nature Methods, 2013. 10: p. 1099-1101.
8. Sammito, M., Meindl, K., de Ilarduya, I.M., Millán, C., Artola-Recolons, C., Hermoso, J.A., and Usón, I., Structure solution with ARCIMBOLDO using fragments derived from distant homology models. FEBS Journal, 2014.
9. Schoch, G.A., Sammito, M., Millán, C., Usón, I., and Rudolph, M.G., Structure of a 13-fold superhelix (almost) determined from first principles. IUCrJ, 2015. 2: p. 177-187.