Public Data Resources as the Foundation for a Worldwide Metagenomics Data Infrastructure
Guy Cochrane, Maria J. Martin and Rolf Apweiler
from: Metagenomics: Theory, Methods and Applications (Edited by: Diana Marco). Caister Academic Press, U.K. (2010)
The public data resources serving nucleotide and protein sequence, functional annotation and sampling information provide the foundation for a worldwide bioinformatics data infrastructure. The traditional paradigm of genomic-level studies on isolated and identified organisms lies at the centre of these data resources. While metagenomics takes advantage of this existing paradigm, new methods and concepts are also required to meet the many novel challenges presented by metagenomic data. In this chapter, we cover the primary raw nucleotide sequencing data repositories, the annotated nucleotide sequence databases and the management of protein information from metagenomics studies, using the European Nucleotide Archive and the UniProt Universal Protein Resource as example resources. We provide details of the information that is available from these resources, the tools that support the use of this information, the resources for data providers and the analytical pipelines that enrich these largely unannotated datasets.