MetaProb: Accurate Metagenomic Reads Binning based on Probabilistic Sequence Signatures

MetaProb is a tool for metagenomic binning developed with the support of the Italian Ministry of Education, University and Research within the Project of National Interest PRIN 20122F87B2 "Compositional Approaches for the Analysis and Mining of Omics Data", PI Cinzia Pizzi.


MetaProb is a novel assembly-assisted tool for unsupervised metagenomic binning. Its novelty derives from solving a few important problems: how to divide reads into groups of independent reads, so that k-mer frequencies are not overestimated; how to convert k-mer counts into probabilistic sequence signatures that correct for the variable distribution of k-mers and for unbalanced groups of reads, in order to produce better estimates of the underlying genome statistics; and how to estimate the number of species in a dataset. MetaProb is more accurate and efficient than other state-of-the-art tools in binning both short-read datasets (F-measure 0.87) and long-read datasets (F-measure 0.97) for various abundance ratios. Its estimate of the number of species is also more accurate than that of MetaCluster. On a real human stool dataset, MetaProb identifies the most predominant species, in line with previous human gut studies.
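
To give a rough idea of what turning k-mer counts into a standardized signature can look like, here is a minimal Python sketch. It is not MetaProb's implementation: the function names (kmer_counts, standardized_signature), the i.i.d. nucleotide background model, and the binomial variance approximation are all assumptions made for illustration, and the grouping of independent reads is taken as given.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(reads, k):
    """Count every k-mer made only of A/C/G/T across a group of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set(ALPHABET):
                counts[kmer] += 1
    return counts


def standardized_signature(reads, k):
    """Return a dict mapping each of the 4**k k-mers to a z-score-like value.

    Assumption (illustrative only): the expected count of a k-mer comes from
    an i.i.d. model with nucleotide frequencies estimated from the reads, and
    its variance is approximated as binomial.
    """
    counts = kmer_counts(reads, k)
    total = sum(counts.values())

    base_counts = Counter(b for read in reads for b in read if b in ALPHABET)
    n_bases = sum(base_counts.values())

    all_kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    if total == 0 or n_bases == 0:
        # No usable k-mers: return an all-zero signature.
        return {kmer: 0.0 for kmer in all_kmers}

    base_freq = {b: base_counts[b] / n_bases for b in ALPHABET}

    signature = {}
    for kmer in all_kmers:
        p = 1.0
        for b in kmer:
            p *= base_freq[b]
        expected = total * p
        variance = expected * (1.0 - p)
        if variance > 0:
            signature[kmer] = (counts[kmer] - expected) / sqrt(variance)
        else:
            signature[kmer] = 0.0
    return signature


if __name__ == "__main__":
    group = ["ACGTACGT", "ACGTAC"]
    sig = standardized_signature(group, 2)
    # Print the three most over-represented 2-mers in this toy group.
    for kmer, score in sorted(sig.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(score, 2))
```

Standardizing by an expected count, rather than using raw frequencies, is one simple way to make signatures from groups of different sizes comparable; the actual model used by MetaProb is described in the paper cited below.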

Download

MetaProb was implemented by Samuele Girotto and is freely available for academic use at the MetaProb Bitbucket repository.
Contact address: Cinzia Pizzi.

Reference

If you use MetaProb, please cite:
S. Girotto, C. Pizzi, M. Comin: MetaProb: accurate metagenomic reads binning based on probabilistic sequence signatures.
Bioinformatics (2016) 32 (17): i567-i575.