Guidelines to Statistical Analysis of Microbial Composition Data Inferred from Metagenomic Sequencing
Vera Odintsova, Alexander Tyakht and Dmitry Alexeev
from: Metagenomics: Current Advances and Emerging Concepts (Edited by: Diana Marco). Caister Academic Press, U.K. (2017) Pages: 17-36.
Metagenomics, the application of high-throughput DNA sequencing to surveys of environmental samples, has revolutionized our view of the taxonomic and genetic composition of complex microbial communities. An enormous richness of microbiota keeps unfolding in fields ranging from biomedicine and the food industry to geology. Primary analysis of metagenomic reads yields semi-quantitative data describing the community structure. However, such compositional data have specific statistical properties that must be taken into account during preprocessing, hypothesis testing and interpretation of the results of statistical tests. Failure to account for these properties may lead a survey to fundamentally wrong conclusions. Here we introduce researchers new to the field of metagenomics to the basic properties of microbial compositional data, including statistical power and proposed distribution models, review the publicly available software tools developed specifically for such data, and outline recommendations for applying these methods.