VAnDa - Advanced Query Builder Documentation

What is VAnDa?

VAnDa (Variant Annotation Dashboard) is a web-based platform for annotating genomic variants from VCF files. It integrates multiple annotation sources to provide comprehensive functional, clinical, and population-level information for each variant.

Annotation Sources

VEP (Variant Effect Predictor) v114 — Functional consequences, transcript impact
dbNSFP v5.2a — Deleteriousness scores (CADD, REVEL, MetaSVM) and population frequencies
ClinVar — Clinical significance and disease associations
COSMIC — Cancer mutation annotations and genome screen counts
dbscSNV — Splice site predictions
Orphanet — Rare disease associations with HPO categories
gnomAD v4.1 — Population allele frequencies

Annotation Pipeline

The annotation process runs through the following steps:

Step 1: VCF Normalization
Normalizes variant representation with bcftools norm: splits multiallelic sites into one record per alternate allele and left-aligns indels, so that the same variant is always written the same way.
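
As a rough illustration of this step, the sketch below builds a bcftools norm command line from Python. The exact options VAnDa passes are not listed in this documentation, so the flag set here is an assumption based on standard bcftools usage, and the file names are hypothetical.

```python
from typing import List


def bcftools_norm_cmd(in_vcf: str, ref_fasta: str, out_vcf: str) -> List[str]:
    """Build a bcftools norm command (assumed flags, not VAnDa's confirmed ones)."""
    return [
        "bcftools", "norm",
        "-m", "-any",     # split multiallelic records, one ALT allele per line
        "-f", ref_fasta,  # reference FASTA, needed to left-align indels
        "-O", "z",        # write bgzip-compressed VCF
        "-o", out_vcf,
        in_vcf,
    ]


# Hypothetical file names, for illustration only.
cmd = bcftools_norm_cmd("input.vcf.gz", "reference.fa", "normalized.vcf.gz")
print(" ".join(cmd))
```

The list form can be handed to subprocess.run(cmd, check=True) without shell quoting issues.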

Step 2: VCF Cleaning
Removes any pre-existing annotation fields (CSQ, ANN, GCSQ) using bcftools annotate to ensure a clean starting point.
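
To make the effect of this step concrete, here is a minimal pure-Python sketch that drops the named keys from the INFO column of a single VCF data line. It is not how VAnDa does it (the pipeline uses bcftools annotate, typically in the form bcftools annotate -x INFO/CSQ,INFO/ANN,INFO/GCSQ); the function name is hypothetical, and unlike bcftools this sketch leaves header lines untouched.

```python
def strip_info_keys(vcf_line: str, keys=("CSQ", "ANN", "GCSQ")) -> str:
    """Return a VCF data line with the given INFO keys removed."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines are not rewritten in this sketch
    cols = vcf_line.rstrip("\n").split("\t")
    # INFO is the 8th column; entries look like KEY=VALUE or a bare FLAG.
    kept = [e for e in cols[7].split(";") if e.split("=", 1)[0] not in keys]
    cols[7] = ";".join(kept) if kept else "."  # "." marks an empty INFO column
    return "\t".join(cols)


line = "chr1\t100\t.\tA\tG\t.\tPASS\tDP=10;CSQ=G|missense_variant;ANN=x"
cleaned = strip_info_keys(line)
```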

Step 3: VEP Annotation
Runs Ensembl VEP v114 with the following plugins and custom annotations:

Plugins: dbNSFP (ALL fields), dbscSNV, CADD v1.7
Custom tracks: ClinVar (CLNSIG, CLNDN, CLNDISDB), COSMIC (TIER, GENOME_SCREEN_SAMPLE_COUNT)
Transcript selection: By default, all transcripts are reported. With the --severity flag, only the most severe consequence per variant is kept.
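
The idea behind picking the most severe consequence can be sketched as follows. This is a simplified stand-in, not VEP's algorithm: VEP ranks individual consequence terms, whereas this example ranks only by the coarser IMPACT category and keeps the first entry on ties. The function name and sample records are hypothetical.

```python
# Lower rank means more severe; categories follow the IMPACT field values.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def most_severe(transcripts):
    """Pick the transcript annotation with the most severe IMPACT."""
    # Unknown categories sort last; min() keeps the first entry on ties.
    return min(
        transcripts,
        key=lambda t: IMPACT_RANK.get(t["IMPACT"], len(IMPACT_RANK)),
    )


annotations = [
    {"Consequence": "intron_variant", "IMPACT": "MODIFIER"},
    {"Consequence": "stop_gained", "IMPACT": "HIGH"},
    {"Consequence": "missense_variant", "IMPACT": "MODERATE"},
]
picked = most_severe(annotations)
```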

Step 4: Gene-Level Annotation
Adds gene-level information from dbNSFP using annotateNSFP_gene.py, including gene intolerance scores (pLI, LOEUF) and RefSeq gene annotations.

Step 5: Variant Extraction
Extracts the relevant fields from the annotated VCF into a TSV file, optionally filtering by disease category (all, rare-disease, or cancer).

Step 6: HTML Report Generation
Generates an interactive DataTables HTML report with an advanced query builder, HPO category badges, and clickable links to the Orphanet and HPO ontologies.

Usage

Web Interface

The VAnDa web interface accepts two input modes:

Upload VCF file — Drag & drop or browse for a .vcf or .vcf.gz file (max 50 MB)
Submit variant list — Paste variant coordinates in chr,pos,ref,alt format
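
For the variant list mode, the sketch below shows one way a pasted chr,pos,ref,alt list could be validated and turned into minimal VCF text. How VAnDa handles this internally is not documented here; the function names and the VCF header used are assumptions.

```python
def parse_variant_line(line: str):
    """Parse one 'chr,pos,ref,alt' entry into a (chrom, pos, ref, alt) tuple."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected chr,pos,ref,alt but got: {line!r}")
    chrom, pos, ref, alt = parts
    if not pos.isdigit() or int(pos) < 1:
        raise ValueError(f"position must be a positive integer: {pos!r}")
    return chrom, int(pos), ref.upper(), alt.upper()


def variants_to_vcf(lines) -> str:
    """Convert pasted variant lines to a minimal, sites-only VCF string."""
    out = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in the pasted text
        chrom, pos, ref, alt = parse_variant_line(line)
        out.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(out) + "\n"


vcf_text = variants_to_vcf(["chr1,12345,A,G", "", "chr2, 500, ct, C"])
```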

Filter Options

All variants — No disease-specific filter
Rare Disease — Filters for variants with known rare disease associations (Orphanet, ClinVar)
Cancer — Filters for variants with cancer annotations (COSMIC)
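
The filter options can be read as a per-variant predicate. The sketch below is an assumption about that logic, using the field names listed under Key Annotation Fields; which exact fields VAnDa checks, and how it marks missing values, is not specified in this documentation.

```python
MISSING = ("", ".", None)  # assumed markers for "no annotation"


def has_value(row: dict, field: str) -> bool:
    """True if the row carries a non-missing value for the field."""
    return row.get(field) not in MISSING


def keep_variant(row: dict, category: str) -> bool:
    """Decide whether a variant passes the chosen disease filter."""
    if category == "all":
        return True
    if category == "rare-disease":
        # Assumed: any Orphanet or ClinVar annotation counts.
        return has_value(row, "Orphanet_id") or has_value(row, "ClinVar_CLNSIG")
    if category == "cancer":
        # Assumed: a COSMIC variant ID in the Cancer field counts.
        return has_value(row, "Cancer")
    raise ValueError(f"unknown category: {category!r}")
```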

Genotype Control

Keep sample genotypes — Toggle ON to include genotype information in the output

Email Notification

Optionally, enter an email address to be notified when the annotation job completes. Jobs run on a SLURM cluster.

Output Files

Annotated VCF (`.vcf.gz`)

The full annotated VCF with all VEP, dbNSFP, ClinVar, COSMIC, and gene-level annotations in the INFO/CSQ field.
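
For readers who want to work with the annotated VCF directly, here is a minimal sketch of unpacking the INFO/CSQ field. It relies on the standard VEP convention that the CSQ header line lists the pipe-separated field names after "Format:". The header and record shown are shortened, illustrative examples, not real VAnDa output.

```python
import re


def csq_field_names(header_line: str):
    """Extract the CSQ sub-field names from the ##INFO=<ID=CSQ...> header."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' section found in CSQ header")
    return match.group(1).split("|")


def parse_csq(info: str, fields):
    """Return one dict per transcript annotation found in an INFO string."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            # Annotations are comma-separated; sub-fields are pipe-separated.
            return [dict(zip(fields, rec.split("|"))) for rec in entry[4:].split(",")]
    return []


header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
info = "DP=10;CSQ=G|missense_variant|MODERATE|BRCA1,G|intron_variant|MODIFIER|BRCA1"
records = parse_csq(info, csq_field_names(header))
```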

Variant TSV (`_vars.tsv.gz`)

Extracted tab-separated file with key annotation fields for downstream analysis.
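
A typical downstream use of the TSV is to shortlist rare, predicted-deleterious variants. The sketch below assumes the columns CADD_PHRED and gnomADe_AF are present under those names; the thresholds (CADD 20, allele frequency 1%) are illustrative choices, and treating a missing frequency as "not seen in gnomAD" is an assumption.

```python
import csv
import gzip
import io


def to_float(value):
    """Convert a TSV cell to float, returning None for missing values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def rare_and_deleterious(rows, max_af=0.01, min_cadd=20.0):
    """Yield rows with CADD_PHRED >= min_cadd and gnomADe_AF <= max_af."""
    for row in rows:
        cadd = to_float(row.get("CADD_PHRED"))
        af = to_float(row.get("gnomADe_AF"))
        # Assumption: a missing frequency means the variant is absent from gnomAD.
        if cadd is not None and cadd >= min_cadd and (af is None or af <= max_af):
            yield row


def read_tsv(path):
    """Stream rows from a gzip-compressed TSV as dictionaries."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


# In-memory demo with made-up values instead of a real _vars.tsv.gz file.
demo = io.StringIO(
    "CHROM\tPOS\tCADD_PHRED\tgnomADe_AF\n"
    "chr1\t100\t27.1\t0.0002\n"
    "chr1\t200\t31.0\t0.15\n"
    "chr2\t300\t5.2\t.\n"
    "chr3\t400\t24.0\t.\n"
)
hits = list(rare_and_deleterious(csv.DictReader(demo, delimiter="\t")))
```

With a real file, pass read_tsv("sample_vars.tsv.gz") (a hypothetical file name) in place of the in-memory reader.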

HTML Report (`_vars.tsv.html`)

Interactive DataTables report featuring:

Sortable, searchable columns
Advanced query builder with AND/OR/NOT logic and grouping
Expandable rows showing all annotation fields
HPO category badges with links to OLS4 ontology browser
Orphanet and ClinVar links for disease associations
Export to CSV, copy to clipboard, and print
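
To show what AND/OR/NOT logic with grouping means in practice, the sketch below evaluates a nested query against a single variant row. The query structure (dictionaries with "logic", "not", and "rules" keys) is a hypothetical representation chosen for this example; the report's actual query builder runs in the browser and its internal format is not described here.

```python
import operator

OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "contains": lambda cell, target: target in cell,
}


def evaluate(node: dict, row: dict) -> bool:
    """Recursively evaluate a group or a single rule against one row."""
    if "rules" in node:  # a group: combine child results with AND or OR
        results = [evaluate(child, row) for child in node["rules"]]
        if node["logic"] == "AND":
            outcome = all(results)
        elif node["logic"] == "OR":
            outcome = any(results)
        else:
            raise ValueError(f"unknown logic: {node['logic']!r}")
        return not outcome if node.get("not") else outcome
    cell, target = row.get(node["field"]), node["value"]
    if cell is None:
        return False
    if isinstance(target, (int, float)):
        try:
            cell = float(cell)  # numeric rules compare as numbers
        except ValueError:
            return False  # missing or non-numeric cell never matches
    return OPS[node["op"]](cell, target)


# IMPACT = HIGH AND (CADD_PHRED >= 20 OR ClinVar says Pathogenic) AND NOT (AF > 1%)
query = {"logic": "AND", "rules": [
    {"field": "IMPACT", "op": "=", "value": "HIGH"},
    {"logic": "OR", "rules": [
        {"field": "CADD_PHRED", "op": ">=", "value": 20},
        {"field": "ClinVar_CLNSIG", "op": "contains", "value": "Pathogenic"},
    ]},
    {"logic": "AND", "not": True, "rules": [
        {"field": "gnomADe_AF", "op": ">", "value": 0.01},
    ]},
]}
row = {"IMPACT": "HIGH", "CADD_PHRED": "25.3",
       "ClinVar_CLNSIG": "Benign", "gnomADe_AF": "0.0001"}
matched = evaluate(query, row)
```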

Key Annotation Fields

Variant Identification

CHROM — Chromosome
POS — Genomic position (1-based)
REF — Reference allele
ALT — Alternate allele

Functional Impact

Consequence — Sequence Ontology term (e.g., missense_variant)
IMPACT — Impact category: HIGH, MODERATE, LOW, MODIFIER
SYMBOL — Gene symbol (HGNC)
Gene — Ensembl Gene ID

Pathogenicity Scores

CADD_PHRED — PHRED-scaled Combined Annotation Dependent Depletion (CADD) score
phyloP100way_vertebrate — Basewise conservation score

Population Frequencies

gnomADe_AF — gnomAD exome allele frequency (v4.1)
dbNSFP_POPMAX_AF — Maximum population allele frequency

Clinical Significance

ClinVar_CLNSIG — Clinical significance (Pathogenic, Benign, etc.)
ClinVar_CLNDISDB — Associated disease databases

Cancer Annotations

Cancer — COSMIC variant ID
Cancer_TIER — Cancer gene tier classification
Cancer_GENOME_SCREEN_SAMPLE_COUNT — Number of genome screen samples carrying the variant

Rare Disease

Orphanet_id — Orphanet disorder identifier
Orphanet_disorder — Disease name from Orphanet
HPO_id — Human Phenotype Ontology term IDs
HPO_Categories — Disease class categories (e.g., NEUROLOGICAL, CARDIOVASCULAR)
