MetaSNV is a pipeline for calling metagenomic single nucleotide variants (SNVs). It was designed to scale with the exponentially increasing number of metagenomic datasets and is capable of handling large multi-species references.
Input: MetaSNV takes a list of alignment files (BAM) as input. Each should represent one metagenomic sample aligned to a reference genome collection. The reference can be any multi-sequence FASTA file encompassing thousands of bacterial species.
Output: MetaSNV distinguishes two classes of variants: population SNVs (pSNVs) and individual SNVs (iSNVs). A pSNV is a non-reference nucleotide observed in more than 1% of all reads combined across all samples. If several different non-reference nucleotides exceed this frequency at the same position, each is reported independently. iSNVs are variants that fall below the 1% population threshold but are confidently observed in at least one sample: a position is considered an individual variant if it has a within-sample frequency above 10% in a sample where that position is covered by at least 10 reads.
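The two thresholds described above can be illustrated with a minimal Python sketch. This is not MetaSNV's own code: the function name `classify_position`, the constants, and the input layout (one dictionary of nucleotide read counts per sample) are assumptions made for illustration only.

```python
from typing import Dict, List

# Thresholds taken from the description above.
POP_FREQ = 0.01   # pSNV: more than 1% of all reads pooled across samples
IND_FREQ = 0.10   # iSNV: more than 10% of reads within a single sample
MIN_COV = 10      # iSNV: that sample must cover the position with >= 10 reads


def classify_position(ref: str, samples: List[Dict[str, int]]) -> Dict[str, str]:
    """Classify each non-reference nucleotide at one position.

    `samples` holds one {nucleotide: read_count} dict per sample
    (a hypothetical input layout). Returns {nucleotide: "pSNV" | "iSNV"};
    nucleotides that pass neither test are omitted.
    """
    total = sum(sum(s.values()) for s in samples)
    calls: Dict[str, str] = {}
    if total == 0:
        return calls

    alleles = {a for s in samples for a in s if a != ref}
    for allele in sorted(alleles):
        pooled = sum(s.get(allele, 0) for s in samples)
        if pooled / total > POP_FREQ:
            # Each qualifying nucleotide is reported independently.
            calls[allele] = "pSNV"
            continue
        for s in samples:
            cov = sum(s.values())
            if cov >= MIN_COV and s.get(allele, 0) / cov > IND_FREQ:
                calls[allele] = "iSNV"
                break
    return calls
```

For example, `classify_position("A", [{"A": 8, "T": 4}, {"A": 5000}])` reports `T` as an iSNV: it is rare in the pooled reads but makes up a third of the reads in a sample with 12-fold coverage.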
Analysis: MetaSNV offers scripts for analyzing the output. Taxon, sample and position filters are applied after SNV calling. For downstream processing, pair-wise distance matrices are computed between all samples.
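As a rough illustration of the pair-wise distance step, the sketch below computes a distance for every pair of samples from per-position allele frequencies. The metric (mean absolute frequency difference over positions covered in both samples), the function name `pairwise_distances`, and the input layout are assumptions chosen for the example; the text above does not specify which distance MetaSNV's scripts use.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def pairwise_distances(
    freqs: Dict[str, List[Optional[float]]]
) -> Dict[Tuple[str, str], Optional[float]]:
    """Compute a distance for every unordered pair of samples.

    `freqs` maps a sample name to its allele frequency at each position,
    with None where the position did not pass the filters in that sample
    (a hypothetical layout). The distance is None when two samples share
    no usable positions.
    """
    dist: Dict[Tuple[str, str], Optional[float]] = {}
    for a, b in combinations(sorted(freqs), 2):
        # Keep only positions that are usable in both samples.
        shared = [
            (x, y)
            for x, y in zip(freqs[a], freqs[b])
            if x is not None and y is not None
        ]
        if shared:
            dist[(a, b)] = sum(abs(x - y) for x, y in shared) / len(shared)
        else:
            dist[(a, b)] = None
    return dist
```

Restricting each comparison to positions shared by both samples keeps a poorly covered sample from appearing artificially close to, or far from, the others.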
Conclusion: MetaSNV is a slim and versatile application that is faster than comparable tools and, thanks to its modular design, user-friendly and easy to customize.
or clone metaSNV from its git repository and follow the instructions in the readme to install the pipeline package.
MetaSNV is mainly written in bash, C++, and Python, and needs to be compiled prior to use. Set up and compile the pipeline as explained in the GitLab repository or in the tutorial, which also contains a worked example.
3. Getting started
Read the detailed tutorial to get started with metaSNV.