R, Bioconductor
filterVcf: Extract Variants of Interest from a Large VCF File (Paul Shannon)
We demonstrate three methods: filtering by genomic region, filtering on attributes of
each specific variant call, and intersecting with known regions of interest (exons, splice sites, regulatory regions, etc.).
Java
SelectVariants -- Select a subset of variants from a larger callset (GATK SelectVariants)
Often, a VCF containing many samples and/or variants needs to be subset to facilitate certain analyses (e.g. comparing and contrasting cases vs. controls, extracting variant or non-variant loci that meet certain requirements, or displaying just a few samples in a browser like IGV). SelectVariants can be used for this purpose.
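As a rough sketch of what such a subsetting call can look like, the snippet below assembles a SelectVariants command for a single sample, assuming GATK4-style syntax (the `gatk` wrapper script). The file names and the sample name S1 are hypothetical placeholders, and the command is only printed so it can be checked before running.

```shell
# Hypothetical inputs: replace with real paths and a sample name from the VCF header.
REF=reference.fasta
IN=cohort.vcf.gz
SAMPLE=S1

# Assemble the command as a string so it can be inspected first.
# -sn keeps only the named sample; --exclude-non-variants drops sites
# that are no longer variant once the other samples are removed.
CMD="gatk SelectVariants -R $REF -V $IN -sn $SAMPLE --exclude-non-variants -O $SAMPLE.vcf.gz"
echo "$CMD"
```

Older GATK3 installations use a different invocation (`java -jar GenomeAnalysisTK.jar -T SelectVariants ...` with `-o` for output), so the flags above should be checked against the installed version.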
Biostars
Question: How To Split Multiple Samples In Vcf File Generated By Gatk?
I did variant calling using BWA + Picard + GATK and have just got the filtered VCF files from GATK. When running GATK, I used a list of inputs (11 samples), and most steps produced only one output file. Now I have two VCF files (one for SNPs and the other for indels), each of which contains 11 samples. I can see the names of the 11 samples in the header of the VCF files, and each sample seems to have one column of data. So I am wondering how to split each VCF file into individual per-sample VCF files?
bcftools
for file in *.vcf*; do for sample in $(bcftools view -h "$file" | grep "^#CHROM" | cut -f10-); do bcftools view -c1 -Oz -s "$sample" -o "${file/.vcf*/.$sample.vcf.gz}" "$file"; done; done
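The loop relies on two small steps: reading the sample names from columns 10 onward of the `#CHROM` header line, and deriving one output file name per sample. The sketch below illustrates both on a tiny made-up VCF using only standard tools, so it runs without bcftools. The file name cohort.vcf.gz and the samples S1 and S2 are invented for illustration.

```shell
# Write a tiny, made-up two-sample VCF for demonstration.
tmp=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$tmp"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$tmp"
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n' >> "$tmp"

# Sample names are columns 10 onward of the #CHROM header line.
samples=$(grep '^#CHROM' "$tmp" | cut -f10- | tr '\t' ' ')
echo "$samples"

# Derive one output name per sample by stripping everything from ".vcf" on.
file=cohort.vcf.gz
for s in $samples; do
  echo "${file%%.vcf*}.$s.vcf.gz"
done
rm -f "$tmp"
```

With bcftools installed, `bcftools query -l file.vcf.gz` lists the sample names directly, one per line, and could replace the grep/cut step.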
vcf-subset
vcf-subset -c S1 bigfile.vcf > S1.vcf
REF: