MIBiG

Minimum Information about a Biosynthetic Gene cluster (MIBiG)

Created: 06.11.2012. Last edited: 02.01.2014
Project Title Minimum Information about a Biosynthetic Gene cluster

Project Lead Marnix Medema, mhmedema@mpi-bremen.de, Max Planck Institute for Marine Microbiology, Bremen

Principal Investigators

Eriko Takano, eriko.takano@manchester.ac.uk, University of Manchester, UK
Rainer Breitling, rainer.breitling@manchester.ac.uk, University of Manchester, UK
Frank Oliver Glöckner (GSC), fog@mpi-bremen.de, Max Planck Institute for Marine Microbiology, Bremen, Germany

Initial Team members

Lead: Marnix Medema, Pelin Yilmaz, Renzo Kottmann, Rainer Breitling, Eriko Takano, Frank Oliver Glöckner

Researchers from the natural products community who contributed to MIBiG design thus far (in alphabetical order):

Stefano Donadio, KtedoGen s.r.l.
Wilfred van der Donk, University of Illinois at Urbana–Champaign
Pieter Dorrestein, University of California San Diego
Michael Fischbach, University of California San Francisco
Brad Moore, University of California San Diego
Jörn Piel, ETH Zürich
Tilmann Weber, DTU / Novo Nordisk Foundation Center for Biosustainability

Elevator Pitch Much has been achieved by the GSC in standardizing the contextual data associated with genome and metagenome sequences as well as marker gene sequences. This project proposes to extend these efforts into the realm of biotechnology, to gene clusters encoding the biosynthesis of important molecules that serve society as, e.g., antibiotics, biofuels and immunosuppressants.

Project Summary Bacteria, fungi and plants produce an enormous variety of small functional molecules with manifold biological activities, e.g., as antibiotics, immunosuppressants, and signaling molecules. The biosynthesis of such molecules is encoded by compact genomic units (biosynthetic gene clusters). Over the past decades, hundreds of biosynthetic gene clusters encoding the biosynthesis of secondary metabolites have been characterized. Although dozens of biosynthetic gene clusters are published and thousands are sequenced annually (with or without their surrounding genome sequence), very little effort has been put into structuring this information. Hence, it is currently very difficult to prioritize gene clusters for experimental characterization, to identify the fundamental architectural principles of biosynthetic gene clusters, to understand which ecological parameters drive their evolution, and to obtain an informative ‘parts registry’ of building blocks for the synthetic biology of secondary metabolite biosynthesis.
Therefore, developing a genomic standard for experimentally characterized biosynthetic gene clusters (e.g., Minimum Information about a BIosynthetic Gene cluster, MIBiG) would be of great value to the field of microbial secondary metabolism. Building on the MIxS standards for ecological and environmental contextualization, information on, e.g., enzyme function, substrate specificities, functional subclusters, regulatory and transport systems, operon structure, chemical moieties of the end compound and its intermediates, biosynthetic precursor compounds, compound bioactivity, molecular targets and compound toxicity could be added, allowing this information to be cross-linked to biochemistry, pharmaceutical properties, genomic structure and ecology. Using antiSMASH (http://antismash.secondarymetabolites.org/), an already developed computational pipeline for the analysis of biosynthetic gene clusters that has quickly become a standard in the field, information on characterized biosynthetic gene clusters will be linked to the untapped wealth of thousands of unknown gene clusters that have recently been unearthed by massive genome sequencing efforts. Taken together, this has the potential to guide the characterization of new metabolites by making it possible to optimize the sampling of diversity at different levels and to identify the biochemical, genomic and ecological parameters that are key predictors of pharmaceutically relevant biological activities. Moreover, it can transform the unordered pile of literature on secondary metabolites into a structured and annotated catalogue of parts that can be used as building blocks to design new biochemical pathways with synthetic biology.

What will this project aim to contribute to the GSC? It will address one of the key aims of the GSC, the better description and contextualization of sequence data, for an important class of sequence elements with concrete biotechnological applications. This will increase the visibility and applicability of the GSC and its mission in the fields of applied microbiology, synthetic biology, natural products chemistry and enzymology.

Have you spoken about the project already within GSC? The project was presented at the GSC 15 meeting at NIH (Bethesda, MD, USA).

Which existing projects, if any, does this one replace/complement/subsume/expand? The project will utilize the environmental and ecological parameters from the MIxS standards, but will complement these with specific biosynthetic, enzymological, and biochemical parameters to fulfill the needs of scientists studying secondary metabolite biosynthesis.
Further integration with the MIxS standards is foreseen, especially for those genomes that are sequenced primarily to explore their biosynthetic potential (and will therefore often need to adhere to both standards).

How does this project fit into GSC’s mission statement? It will provide and standardize contextual (meta)data for biosynthetic gene clusters, an important class of genomic elements of high biotechnological value.

Will you start a GSC working group? A GSC project team will be established to develop the standard involving the natural products / biosynthesis community outside GSC. Later on, a working group as part of the GSC compliance and interoperability working group could be envisioned.

How do you wish to further engage the GSC? Collaboration and support will be sought both within and outside the GSC for all relevant aspects (see also point 8). This includes requesting talks and sessions at future GSC meetings, presenting at relevant biosynthesis community meetings (see also point 15), and organizing transparent communication and discussion of the MIBiG standards development, including through teleconferences and mail exchanges.

Do you already have a website or do you wish to create a home page for the project in the GSC website? The MIBiG repository is available at http://mibig.secondarymetabolites.org. This database is linked to the existing tool antiSMASH (http://antismash.secondarymetabolites.org), likely at the same domain. Users can access the JSON specifications of the MIBiG schema and options, and a tarball with all MIBiG entries in raw JSON format is available for download.
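
As a rough illustration of how a tarball of JSON entries could be consumed, the following Python sketch iterates over the `.json` members of a tar archive and parses each one. It is a minimal sketch under stated assumptions, not the project's own tooling: the file names and the fields `compound` and `biosynthetic_class` are hypothetical placeholders and are not taken from the MIBiG schema, and a small in-memory archive stands in for the real download.

```python
import io
import json
import tarfile


def iter_json_entries(fileobj):
    """Yield (member_name, parsed_json) for every .json file in a tar archive."""
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".json"):
                extracted = archive.extractfile(member)
                yield member.name, json.load(extracted)


def build_demo_tarball(entries):
    """Build an in-memory gzip tarball from {file_name: dict}.

    This only stands in for the downloaded archive so the sketch is runnable.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in entries.items():
            data = json.dumps(payload).encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


# Hypothetical entries; real MIBiG records follow the published JSON schema.
demo = build_demo_tarball({
    "entry_0001.json": {"compound": "example A", "biosynthetic_class": "polyketide"},
    "entry_0002.json": {"compound": "example B", "biosynthetic_class": "NRP"},
})
loaded = dict(iter_json_entries(demo))
```

With a real download, the in-memory buffer would be replaced by a file opened in binary mode, and the field names would be taken from the JSON specification of the schema.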

What other resources might you like from what the GSC can offer? A mailing list would be a useful communication tool. Also, an issue tracker such as used for MIxS would be worthwhile to set up.

What kind of timeline are you working to for building consensus, releasing a first version etc? Nine months are planned for initiation of the project: acquiring funding, setting up the core team, investigating the needs of the community, and obtaining an initial consensus. One and a half years are then planned for reaching a final consensus, specifying the standard and implementing it in a database.

How is this work currently funded? An NWO Rubicon grant has been received for a two-year full-time postdoctoral research project to set up the standard and database. Additional support will come from both existing and new collaborations among both GSC members and the biosynthesis community.

What are your current plans for publishing/promoting the project? A community paper titled “Minimum Information about a Biosynthetic Gene cluster” was published in Nature Chemical Biology in 2015. The paper is open access.

References or relevant websites

http://mibig.secondarymetabolites.org – MIBiG database
http://antismash.secondarymetabolites.org – web server for automated identification and analysis of biosynthetic gene clusters. See Medema et al. (2011) antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Research 39: W339-W346.
Osbourn (2010) Secondary metabolic gene clusters: evolutionary toolkits for chemical innovation. Trends in Genetics 26: 449-457. (Review article on biosynthetic gene clusters.)
Nett, Ikeda & Moore (2009) Genomic basis for natural product biosynthetic diversity in actinomycetes. Natural Product Reports 26: 1362-1384. (Review on microbial biosynthetic gene clusters encoding the biosynthesis of a plethora of different compounds.)
Walsh & Fischbach (2010) Natural Products Version 2.0: connecting genes to molecules. Journal of the American Chemical Society 132: 2469-2493. (Perspective on the integration of natural products chemistry with genomics, bioinformatics, synthetic biology and ecology.)
Medema et al. (2011) Exploiting plug-and-play synthetic biology for drug discovery and production in microorganisms. Nature Reviews Microbiology 9: 131-137. (Perspectives article on the implementation of biosynthetic gene clusters in synthetic biology.)
Medema et al. (2012) Computational tools for the synthetic design of biochemical pathways. Nature Reviews Microbiology 10: 191-202. (Review article on future computationally guided design of biosynthetic gene clusters for synthetic biology.)