VAnDa: Variant Annotation Dashboard
Variant annotation server with advanced query builder
What is VAnDa?
VAnDa (Variant Annotation Dashboard) is a web-based platform for annotating genomic variants from VCF files. It integrates multiple annotation sources to provide comprehensive functional, clinical, and population-level information for each variant.
Annotation Sources
- VEP (Variant Effect Predictor) v114 — Functional consequences, transcript impact
- dbNSFP v5.2a — Deleteriousness scores (CADD, REVEL, MetaSVM) and population frequencies
- ClinVar — Clinical significance and disease associations
- COSMIC — Cancer mutation annotations and genome screen counts
- dbscSNV — Splice site predictions
- Orphanet — Rare disease associations with HPO categories
- gnomAD v4.1 — Population allele frequencies
Annotation Pipeline
The annotation process runs through the following steps:
Step 1: VCF Normalization
Normalizes variant representation using bcftools norm. Splits multiallelic variants, left-aligns indels, and ensures consistent representation.
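A minimal sketch of how such a normalization call could be assembled from Python. The file names and the helper name build_norm_command are hypothetical, and the exact options VAnDa passes are not documented here; the flags shown (-m -any, -f, -Oz, -o) are standard bcftools norm options. The function only builds the argument list and runs nothing.

```python
import shlex


def build_norm_command(input_vcf, reference_fasta, output_vcf):
    """Return a bcftools norm argument list (nothing is executed here)."""
    return [
        "bcftools", "norm",
        "-m", "-any",           # split multiallelic sites into biallelic records
        "-f", reference_fasta,  # reference FASTA, required to left-align indels
        "-Oz",                  # write bgzip-compressed VCF
        "-o", output_vcf,
        input_vcf,
    ]


# Hypothetical file names, for illustration only.
cmd = build_norm_command("sample.vcf.gz", "GRCh38.fa", "sample.norm.vcf.gz")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where bcftools is installed.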
Step 2: VCF Cleaning
Removes any pre-existing annotation fields (CSQ, ANN, GCSQ) using bcftools annotate to ensure a clean starting point.
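To illustrate what this cleaning means at the record level, here is a pure-Python sketch that drops the CSQ, ANN, and GCSQ keys from the INFO column of a single VCF data line. It is not the pipeline's implementation (which uses bcftools annotate, typically with an option such as -x INFO/CSQ,INFO/ANN,INFO/GCSQ); the function name strip_info_keys is hypothetical, and header definitions are not handled.

```python
def strip_info_keys(vcf_line, keys=("CSQ", "ANN", "GCSQ")):
    """Remove the given INFO keys from one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    # INFO is the 8th column; "." means the column is empty.
    entries = [] if fields[7] == "." else fields[7].split(";")
    kept = [e for e in entries if e.split("=", 1)[0] not in keys]
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields)


# Made-up record carrying annotations from an earlier run.
line = "chr1\t100\t.\tA\tG\t50\tPASS\tDP=20;CSQ=G|missense_variant;ANN=G|x"
cleaned = strip_info_keys(line)
print(cleaned)
```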
Step 3: VEP Annotation
Runs Ensembl VEP v114 with the following plugins and custom annotations:
- Plugins: dbNSFP (ALL fields), dbscSNV, CADD v1.7
- Custom tracks: ClinVar (CLNSIG, CLNDN, CLNDISDB), COSMIC (TIER, GENOME_SCREEN_SAMPLE_COUNT)
- Transcript selection: By default, all transcripts are reported. With the --severity flag, VAnDa picks the most severe consequence per variant.
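The idea of keeping only the most severe consequence per variant can be sketched as follows. This assumes severity is ranked by the VEP IMPACT category (HIGH, MODERATE, LOW, MODIFIER); the actual ordering used by the --severity flag is not specified here, and the names IMPACT_RANK and pick_most_severe are hypothetical.

```python
# Lower rank means more severe (assumed ordering, based on VEP IMPACT categories).
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def pick_most_severe(transcripts):
    """Return the transcript annotation with the most severe IMPACT.

    Unknown impact values sort last; ties keep the first entry seen.
    """
    return min(
        transcripts,
        key=lambda t: IMPACT_RANK.get(t.get("IMPACT"), len(IMPACT_RANK)),
    )


# Made-up annotations for one variant across three transcripts.
annotations = [
    {"Consequence": "intron_variant", "IMPACT": "MODIFIER"},
    {"Consequence": "missense_variant", "IMPACT": "MODERATE"},
    {"Consequence": "synonymous_variant", "IMPACT": "LOW"},
]
best = pick_most_severe(annotations)
print(best["Consequence"])
```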
Step 4: Gene-Level Annotation
Adds gene-level information from dbNSFP using annotateNSFP_gene.py, including gene intolerance scores (pLI, LOEUF) and RefSeq gene annotations.
Step 5: Variant Extraction
Extracts relevant fields from the annotated VCF into a TSV format, filtering by disease category if specified (all, rare-disease, or cancer).
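As a rough illustration of the extraction step, the sketch below turns a VEP-style CSQ entry into per-transcript dictionaries that could then be written out as TSV rows. It assumes the usual VEP header convention, where the CSQ description ends in "Format: " followed by pipe-separated field names; the helper names and the example header and record are made up, and VAnDa's own extraction code may differ.

```python
def csq_field_names(header_line):
    """Extract CSQ field names from a '##INFO=<ID=CSQ,...Format: A|B|C">' line."""
    format_part = header_line.split("Format: ", 1)[1]
    return format_part.rstrip().rstrip('">').split("|")


def parse_csq(info, names):
    """Return one dict per transcript annotation found in an INFO string."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            # Transcripts are comma-separated; fields within one are pipe-separated.
            return [dict(zip(names, t.split("|"))) for t in entry[4:].split(",")]
    return []


header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">'
)
info = "DP=20;CSQ=G|missense_variant|MODERATE|BRCA2,G|intron_variant|MODIFIER|BRCA2"

names = csq_field_names(header)
rows = parse_csq(info, names)
for row in rows:
    print("\t".join(row[name] for name in names))
```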
Step 6: HTML Report Generation
Generates an interactive DataTables HTML report with advanced query builder, HPO category badges, and clickable links to Orphanet and HPO ontologies.
Usage
Web Interface
The VAnDa web interface accepts two input modes:
- Upload VCF file — Drag & drop or browse for a .vcf or .vcf.gz file (max 50 MB)
- Submit variant list — Paste variant coordinates in chr,pos,ref,alt format
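For the variant-list mode, a pasted list in chr,pos,ref,alt format maps naturally onto minimal VCF records. The sketch below shows one way to do that conversion; how VAnDa handles the pasted list internally is not documented here, so the function name variant_list_to_vcf, the validation rules, and the VCF version in the header are assumptions.

```python
def variant_list_to_vcf(text):
    """Convert 'chr,pos,ref,alt' lines into a minimal VCF string."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for number, raw in enumerate(text.splitlines(), start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        parts = [p.strip() for p in raw.split(",")]
        if len(parts) != 4:
            raise ValueError(f"line {number}: expected chr,pos,ref,alt")
        chrom, pos, ref, alt = parts
        if not pos.isdigit() or int(pos) < 1:
            raise ValueError(f"line {number}: position must be a positive integer")
        # ID, QUAL, FILTER and INFO are unknown for a pasted variant.
        lines.append("\t".join([chrom, pos, ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


# Made-up coordinates, for illustration only.
vcf_text = variant_list_to_vcf("chr1,12345,A,G\n\nchr2, 67890, CT, C\n")
print(vcf_text, end="")
```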
Filter Options
- All variants — No disease-specific filter
- Rare Disease — Filters for variants with known rare disease associations (Orphanet, ClinVar)
- Cancer — Filters for variants with cancer annotations (COSMIC)
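The three filter options can be pictured as a predicate applied to each extracted row. This sketch assumes the category names all, rare-disease, and cancer, and treats a row as matching when the relevant field (Orphanet_id or ClinVar_CLNSIG for rare disease, Cancer for COSMIC) is non-empty. The real criteria may be stricter; the function names are hypothetical.

```python
def has_value(row, field):
    """True if the field is present and not an empty or '.' placeholder."""
    return row.get(field, "") not in ("", ".")


def keep_row(row, category="all"):
    """Decide whether a row passes the chosen disease-category filter."""
    if category == "all":
        return True
    if category == "rare-disease":
        return has_value(row, "Orphanet_id") or has_value(row, "ClinVar_CLNSIG")
    if category == "cancer":
        return has_value(row, "Cancer")
    raise ValueError(f"unknown category: {category}")


# Made-up rows with placeholder identifiers.
rows = [
    {"SYMBOL": "GENE_A", "Orphanet_id": "ORPHA:0000", "Cancer": "."},
    {"SYMBOL": "GENE_B", "Orphanet_id": ".", "Cancer": "COSV00000000"},
    {"SYMBOL": "GENE_C", "Orphanet_id": ".", "Cancer": "."},
]
for category in ("all", "rare-disease", "cancer"):
    kept = [r["SYMBOL"] for r in rows if keep_row(r, category)]
    print(category, kept)
```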
Genotype Control
- Keep sample genotypes — Toggle ON to include genotype information in the output
Email Notification
Optionally sends an email notification when the annotation job completes (jobs run on a SLURM cluster).
Output Files
Annotated VCF (.vcf.gz)
The full annotated VCF with all VEP, dbNSFP, ClinVar, COSMIC, and gene-level annotations in the INFO/CSQ field.
Variant TSV (_vars.tsv.gz)
Extracted tab-separated file with key annotation fields for downstream analysis.
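For downstream analysis, the gzipped TSV can be read with the Python standard library alone. The sketch below loads the table and keeps rows that look rare and potentially deleterious; the thresholds (CADD_PHRED at least 20, gnomADe_AF at most 0.01), the choice to treat a missing frequency as rare, and the demo data are all illustrative assumptions.

```python
import csv
import gzip
import os
import tempfile


def read_variant_tsv(path):
    """Load a gzipped, tab-separated variant table as a list of dicts."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def to_float(value):
    """Convert to float, returning None for missing or non-numeric values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def rare_and_deleterious(rows, min_cadd=20.0, max_af=0.01):
    """Keep rows with a high CADD score and a low (or missing) frequency."""
    selected = []
    for row in rows:
        cadd = to_float(row.get("CADD_PHRED"))
        af = to_float(row.get("gnomADe_AF"))
        if cadd is not None and cadd >= min_cadd and (af is None or af <= max_af):
            selected.append(row)
    return selected


# Demo on a tiny, made-up table written to a temporary file.
demo_rows = [
    {"CHROM": "chr1", "POS": "100", "CADD_PHRED": "28.1", "gnomADe_AF": "0.0001"},
    {"CHROM": "chr2", "POS": "200", "CADD_PHRED": "3.2", "gnomADe_AF": "0.2"},
    {"CHROM": "chr3", "POS": "300", "CADD_PHRED": "25.0", "gnomADe_AF": "."},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_vars.tsv.gz")
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(demo_rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(demo_rows)
    selected = rare_and_deleterious(read_variant_tsv(path))

print([row["CHROM"] for row in selected])
```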
HTML Report (_vars.tsv.html)
Interactive DataTables report featuring:
- Sortable, searchable columns
- Advanced query builder with AND/OR/NOT logic and grouping
- Expandable rows showing all annotation fields
- HPO category badges with links to OLS4 ontology browser
- Orphanet and ClinVar links for disease associations
- Export to CSV, Copy, Print functionality
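The AND/OR/NOT logic with grouping offered by the query builder can be thought of as a small expression tree evaluated against each row. The report itself runs in the browser, so the Python sketch below only illustrates the semantics; the query shape (dicts with "logic" and "rules", "not", or "field", "op", and "value" keys) and the operator set are hypothetical.

```python
import operator

# Supported comparison operators (an illustrative subset).
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "contains": lambda cell, value: value in cell,
}


def evaluate(query, row):
    """Recursively evaluate a nested AND/OR/NOT query against one row."""
    if "field" in query:  # leaf rule: compare one column to a value
        return OPS[query["op"]](row.get(query["field"], ""), query["value"])
    if "not" in query:  # negation of a nested query
        return not evaluate(query["not"], row)
    results = (evaluate(rule, row) for rule in query["rules"])  # group
    return all(results) if query["logic"] == "AND" else any(results)


# IMPACT is HIGH or MODERATE, AND ClinVar_CLNSIG does NOT contain "Benign".
query = {
    "logic": "AND",
    "rules": [
        {"logic": "OR", "rules": [
            {"field": "IMPACT", "op": "=", "value": "HIGH"},
            {"field": "IMPACT", "op": "=", "value": "MODERATE"},
        ]},
        {"not": {"field": "ClinVar_CLNSIG", "op": "contains", "value": "Benign"}},
    ],
}

row_a = {"IMPACT": "MODERATE", "ClinVar_CLNSIG": "Pathogenic"}
row_b = {"IMPACT": "HIGH", "ClinVar_CLNSIG": "Likely_Benign"}
print(evaluate(query, row_a), evaluate(query, row_b))
```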
Key Annotation Fields
Variant Identification
- CHROM — Chromosome
- POS — Genomic position (1-based)
- REF — Reference allele
- ALT — Alternate allele
Functional Impact
- Consequence — Sequence Ontology term (e.g., missense_variant)
- IMPACT — Impact category: HIGH, MODERATE, LOW, MODIFIER
- SYMBOL — Gene symbol (HGNC)
- Gene — Ensembl Gene ID
Pathogenicity Scores
- CADD_PHRED — Combined Annotation Dependent Depletion score
- phyloP100way_vertebrate — Basewise conservation score
Population Frequencies
- gnomADe_AF — gnomAD exome allele frequency (v4.1)
- dbNSFP_POPMAX_AF — Maximum population allele frequency
Clinical Significance
- ClinVar_CLNSIG — Clinical significance (Pathogenic, Benign, etc.)
- ClinVar_CLNDISDB — Associated disease databases
Cancer Annotations
- Cancer — COSMIC variant ID
- Cancer_TIER — Cancer gene tier classification
- Cancer_GENOME_SCREEN_SAMPLE_COUNT — Number of genome-screen samples carrying the variant
Rare Disease
- Orphanet_id — Orphanet disorder identifier
- Orphanet_disorder — Disease name from Orphanet
- HPO_id — Human Phenotype Ontology term IDs
- HPO_Categories — Disease class categories (e.g., NEUROLOGICAL, CARDIOVASCULAR)