snpToolkit is a computational framework written in Python 3. snpToolkit allows users to:
- Visualize the content of their VCF files.
- Filter SNPs based on multiple criteria:
    - Distance between SNPs
    - Coordinates of regions to exclude
    - Depth of coverage
    - Quality
    - The ratio of reads carrying the mutated allele to the total number of reads at that position
- Annotate SNPs using genome annotation data provided in a GenBank file.
- Extract the distribution of all indels according to genome annotation.
- Visualize and explore the annotated SNPs for all analyzed files.
- Combine all snpToolkit output files generated using the annotate option and produce:
    - A table storing the distribution of all SNPs across the samples
    - A FASTA file with all concatenated SNPs for each sample. This file can be used to build a phylogenetic tree.
- Analyse their data using two dimensionality reduction methods: PCA and UMAP.
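
To make the filtering criteria listed above concrete, here is a minimal sketch in plain Python. It is not snpToolkit's own code: the `Snp` record, the `filter_snps` function, the default thresholds, and the choice to apply the distance filter after the other criteria are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical SNP record, not a snpToolkit data structure."""
    pos: int        # 1-based coordinate on the reference
    depth: int      # total number of reads covering the position
    alt_reads: int  # reads carrying the mutated allele
    qual: float     # variant quality score


def filter_snps(snps, min_depth=3, min_qual=20.0, min_ratio=0.9,
                min_distance=10, excluded=()):
    """Keep SNPs that pass every criterion (thresholds are illustrative)."""
    kept = [
        s for s in snps
        if s.depth >= min_depth
        and s.qual >= min_qual
        and s.depth > 0
        and s.alt_reads / s.depth >= min_ratio
        # excluded is a sequence of inclusive (start, end) coordinate pairs
        and not any(start <= s.pos <= end for start, end in excluded)
    ]
    kept.sort(key=lambda s: s.pos)

    # Distance filter (assumed behaviour): drop every SNP that lies closer
    # than min_distance to a neighbouring SNP that survived the filters above.
    result = []
    for i, s in enumerate(kept):
        near_prev = i > 0 and s.pos - kept[i - 1].pos < min_distance
        near_next = i < len(kept) - 1 and kept[i + 1].pos - s.pos < min_distance
        if not (near_prev or near_next):
            result.append(s)
    return result
```

A call such as `filter_snps(snps, min_depth=10, excluded=[(1900, 2100)])` returns only the records that satisfy all five criteria at once.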

snpToolkit automatically detects whether the input VCF files were generated using samtools mpileup, gatk HaplotypeCaller or freebayes. VCF files can be gzipped or uncompressed.
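
As a rough illustration of how such detection could work, the sketch below opens a VCF file whether or not it is gzipped and guesses the variant caller from the header. It is not snpToolkit's actual logic: `open_vcf` and `guess_caller` are hypothetical names, and the header substrings being matched are assumptions about what each tool typically writes in its `##` meta-information lines.

```python
import gzip


def open_vcf(path):
    """Open a VCF in text mode, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "r")


def guess_caller(path):
    """Guess the variant caller from the header, or return None if unknown."""
    with open_vcf(path) as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # end of the meta-information lines
            lower = line.lower()
            # The substrings below are assumed header markers.
            if "freebayes" in lower:
                return "freebayes"
            if "haplotypecaller" in lower:
                return "gatk HaplotypeCaller"
            if "samtools" in lower:
                return "samtools mpileup"
    return None
```

Checking the first two bytes rather than the file extension means a gzipped file is still read correctly when it lacks a `.gz` suffix.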