Evidence Classification Tables for Regulatory Variants
The following tables outline the evidence classification used to score variants in RevUP. They were adapted from Van der Lee R, Correard S and Wasserman WW in “Deregulated Regulators: Disease-Causing cis Variants in Transcription Factor Genes” (Trends in Genetics, 2020).
Clinical Evidence - is there a causal link between genotype and phenotype?
Evidence Level | Code | Evidence for Pathogenicity | Examples |
---|---|---|---|
5 (High) | 5.2 Vb | Variant introduction (in a model organism) results in a phenotype that is consistent with the human disease | - Transgenic model organism developed using CRISPR genome editing |
5 (High) | 5.1 * V | Variant neutralization (in a model organism or cell line) rescues or reverses phenotype | - Knockout/knockdown/silencing of the variant in gain-of-function scenarios - Complementation in loss-of-function scenarios |
4 | 4.2 V | Variant results in a cellular phenotype consistent with the disease phenotype - Insights are only relevant if the endophenotype assayed is consistent with the disease phenotype | - Patient-derived tissue/cells/induced pluripotent stem cells (iPSCs) - Transgenic cell lines - In vivo assays of cellular function |
4 | 4.1 V | Variant observed in multiple, unrelated families with the same disease phenotype - Typically only relevant when multiple well-described pedigrees are available, and in which the variant segregates with disease | |
3 | 3.1 V | Variant shows familial segregation with the disease - Variant or locus segregates with affected and unaffected disease status in family pedigree - Should be considered stronger with increasing segregation data in larger families. For example, de novo variants identified in a family trio are weak segregation evidence. | - Trio whole-genome analysis - Linkage analysis |
2 | 2.5 @ V | Variant is a striking noncoding event - Often involves a large genomic alteration containing many candidate regulatory elements and variants | - Copy number changes, translocation/breakpoint mapping, aCGH, MLPA ... |
2 | 2.4 @ V | Variant is similar to another regulatory variant associated with the same suspected target gene and implicated in the same or a similar disease phenotype - Variants are often not exactly the same, but should be justifiably similar: for example, strongly overlapping or affecting the same TFBS. | - ClinVar, DECIPHER, GeneMatcher |
2 | 2.3 @ V | Variant is considered deleterious by computational prediction methods - Using in silico tools designed to include noncoding variants | - CADD, FATHMM, FunSeq2, ReMM, NCBoost, ncGERP ... |
2 | 2.2 G | Suspected target gene does not contain coding variants in the same individual - In the gene targeted by the regulatory variant or in other key genes for the phenotype under study - Assess accordance with expected inheritance of the phenotype; that is, present in one or both alleles | |
2 | 2.1 @ G | Suspected target gene has been implicated in the same or a similar disease phenotype, or is otherwise relevant - OMIM disease genes, literature, and gene function can provide insight - Dosage-sensitive and haploinsufficient genes may be of increased interest | - Other human patients - Knockout mice, biological link with phenotype, pLI metric |
1 (Low) | 1.3 @ * V | Variant or locus previously statistically associated with the same or a similar disease phenotype | - GWAS, risk alleles - Somatic recurrence in cancer |
1 (Low) | 1.2 @ V | Variant is rare in unaffected individuals in specific sets of controls or reference population databases - Variant absent in databases of unaffected/control individuals; or present with a frequency less than expected given the penetrance and expressivity of the disease - Publications preceding the appearance of large reference databases tend to depend on custom sets of control samples | - gnomAD - Control samples |
1 (Low) | 1.1 @ V | Variant position is evolutionarily conserved | - Genome browser conservation tracks |
Symbols: @, attainable with bioinformatics tools and databases; G, suspected target gene; R, regulatory region containing variant position; V, variant; *, conceptually possible but we did not find this in practice due to focus on rare disease
Functional Evidence - does the variant have a damaging effect on the gene?
Evidence Level | Code | Evidence for Pathogenicity | Examples |
---|---|---|---|
4 (High) | 4.1 V | Variant introduction (in a model organism) leads to changes in expression of target/reporter gene or chromatin environment - CRISPR genome editing, Cre-Lox recombination (endogenous) - Reporter gene constructs (exogenous) | - Any relevant technique listed under F2, below |
3 | 3.1 V | Variant introduction (in a cell line) leads to changes in expression of target/reporter gene or chromatin environment - Engineered cell line or in vitro model system | - Luciferase/LacZ reporter assay - MPRA/STARR-seq - Any relevant technique listed under F2 |
2 | 2.2 V | Variant leads to changes in expression of the target gene (in patient tissue) - Patient material compared with controls. Material from, for example, tissue biopsy, cultured cells, or iPSCs | - RT-qPCR/RNA-seq expression analysis - Allele-specific expression (ASE) - Staining, immunohistochemistry |
2 | 2.1 V | Variant causes a change in TF binding and/or chromatin environment - Strongest if the studied TF binding site and regulatory region are key proven regulators of target gene expression | - ChIP-seq/ChIP-qPCR/ChIP-MS (in vivo) - EMSA (in vitro) - Allele-specific binding (ASB) - Changes to chromatin domains/environment, DNA methylation |
1 (Low) | 1.5 * R-G | Regulatory region is shown to regulate gene expression of the target gene - Typically involves characterization of a newly identified enhancer | - Transgene reporter expression |
1 (Low) | 1.4 @ V-G | Variant is statistically associated with expression levels of the target gene | - cis-eQTL |
1 (Low) | 1.3 @ R-G | Regulatory region and target gene are directly linked based on annotation or experimental data | - Core and proximal promoter annotations - Chromosome conformation capture (e.g., 3C, 5C, Hi-C), ChIA-PET |
1 (Low) | 1.2 @ V | Variant localizes to a regulatory region based on genome annotations | - Known or predicted regulatory elements (enhancers, promoters) - Chromatin accessibility and histone ChIP-seq data |
1 (Low) | 1.1 @ V | Variant position is implicated in TF binding based on experimental data | - TF ChIP-seq datasets (e.g., ENCODE and ReMap) |
0 (No Value) | 0.2 @ V-G | Variant and target gene locate to the same structural domain according to chromatin annotations | - 3D genome browser, TADs |
0 (No Value) | 0.1 @ V | Variant predicted to change a TF binding (sequence) motif | - TFBS resources (e.g., JASPAR, CIS-BP, HOCOMOCO) |
Symbols: @, attainable with bioinformatics tools and databases; G, suspected target gene; R, regulatory region containing variant position; V, variant; *, conceptually possible but we did not find this in practice due to focus on rare disease
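Each code cell in the two tables above combines a numeric code (evidence level, then item within that level) with symbols from the legend. As a minimal sketch of how such a cell could be read mechanically, the following Python splits a cell like "1.3 @ * V" into its parts. The names `EvidenceCode` and `parse_evidence_code` are hypothetical and are not part of RevUP; the field meanings are taken only from the symbol legend.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceCode:
    level: int          # evidence level, e.g. 1 for "1.3"
    item: int           # item within the level, e.g. 3 for "1.3"
    attainable: bool    # "@": attainable with bioinformatics tools and databases
    not_observed: bool  # "*": conceptually possible but not found in practice
    subject: str        # "V", "G", "R-G" or "V-G", as in the legend


# level.item, optional "@" / "*" flags, then the subject symbol(s)
_CODE = re.compile(r"^(\d+)\.(\d+)((?:\s*[@*])*)\s*([VGR](?:-[VGR])?)$")


def parse_evidence_code(cell: str) -> EvidenceCode:
    """Parse one code cell such as '2.1 @ G' or '1.5 * R-G'."""
    match = _CODE.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised evidence code: {cell!r}")
    level, item, flags, subject = match.groups()
    return EvidenceCode(
        level=int(level),
        item=int(item),
        attainable="@" in flags,
        not_observed="*" in flags,
        subject=subject,
    )
```

For example, `parse_evidence_code("1.4 @ V-G")` gives a level of 1, an item of 4, the attainable flag set, and the subject "V-G". Cells that do not follow this pattern raise a `ValueError` rather than being guessed at.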
External Databases Used in the RevUP Scoring System
The following table outlines how several external databases were used as part of RevUP, including the source of the information, what information was extracted, what queries were used, and where they were used in the RevUP system.
External APIs accessed by RevUP
Information in RevUP | API | Information Extracted | Query |
---|---|---|---|
Evidence level C1.1: Variant position is evolutionarily conserved | UCSC REST API track: phyloP100way hg38 | PhyloP score | https://api.genome.ucsc.edu/getData/track?track=phyloP100way;genome=hg38;chrom=chr17;start=4987634;end=4987635 |
Evidence level C1.1: Variant position is evolutionarily conserved | UCSC REST API track: phastCons100way hg38 | PhastCons score | https://api.genome.ucsc.edu/getData/track?track=phastCons100way;genome=hg38;chrom=chr17;start=4987634;end=4987635 |
rsID | UCSC REST API track: snp151 hg38 | rsID (name) | https://api.genome.ucsc.edu/getData/track?track=snp151;genome=hg38;chrom=chr17;start=4987634;end=4987636 |
Reference Allele | UCSC REST API sequence query hg38 | Reference allele (dna) | https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chr17;start=4987634;end=4987635 |
Evidence level F1.2: Variant localizes to a regulatory region based on genome annotations | UCSC REST API GRCh38 | cCRE (ccre) Description (description) Name (name) | https://api.genome.ucsc.edu/getData/track?track=encodeCcreCombined;genome=hg38;chrom=chr17;start=4987634;end=4987635 |
Evidence level C1.2: Variant is rare in unaffected individuals in specific sets of controls or reference population databases | gnomAD GraphQL dataset: gnomad_r3 hg38 | Allele count (ac) Allele number (an) Number of homozygotes (homozygote_count) | query getVariant($variantId: String!) { variant( variantId: $variantId, dataset: gnomad_r3) { exome { ac an } genome { ac ac_hom an homozygote_count } rsid } } inputs: { "variantId": "1-55516888-G-GA" } |
ClinVar number | gnomAD GraphQL API dataset: gnomad_r3 hg38 | ClinVar ID (clinvar_variation_id) | query getClinvarVariant($variantId: String!) { clinvar_variant( variant_id: $variantId, reference_genome: GRCh38) { rsid clinvar_variation_id } } inputs: { "variantId": "8-101493333-G-T" } |
Evidence level C2.3: Variant is considered deleterious by computational prediction methods | CADD REST API version v1.0 GRCh38-v1.6 | CADD Phred Score (PHRED) | https://cadd.gs.washington.edu/api/v1.0/GRCh38-v1.6/17:4987635 |
Evidence level F1.1: Variant position is implicated in TF binding based on experimental data | ReMap (downloaded file) hg38 | CRM at variant position | Homo sapiens CRMs file downloaded from ReMap 2020 on this page |
Evidence level F1.3: Regulatory region and target gene are directly linked based on annotation or experimental data Evidence level F1.4: Variant is statistically associated with expression levels of the target gene | SCREEN GraphQL API GRCh38 | cCRE Method (ccre["details"]["linkedGenes"]["method"]) | { ccres( assembly: GRCh38 range: { chrom: "chr17", start: 4987634, end: 4987635 } ) { total, ccres { accession, details { linkedGenes { gene, method } } } } } |
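The queries in the table can be assembled from a single variant position. The sketch below is illustrative only and is not RevUP's own code: all function names are hypothetical, and it only builds request URLs and GraphQL payloads without sending them. It assumes that the UCSC queries use 0-based, half-open intervals while the CADD query and gnomAD variant IDs use 1-based positions, which matches the example rows (start=4987634;end=4987635 versus 17:4987635).

```python
UCSC_BASE = "https://api.genome.ucsc.edu/getData"
CADD_BASE = "https://cadd.gs.washington.edu/api/v1.0"


def ucsc_interval(pos_1based, width=1):
    """Convert a 1-based position to a 0-based, half-open (start, end) pair."""
    start = pos_1based - 1
    return start, start + width


def ucsc_track_url(track, chrom, pos_1based, genome="hg38", width=1):
    """Build a UCSC track query in the same form as the table's examples."""
    start, end = ucsc_interval(pos_1based, width)
    return (f"{UCSC_BASE}/track?track={track};genome={genome};"
            f"chrom={chrom};start={start};end={end}")


def ucsc_sequence_url(chrom, pos_1based, genome="hg38"):
    """Build a UCSC sequence query for the reference allele at one position."""
    start, end = ucsc_interval(pos_1based)
    return (f"{UCSC_BASE}/sequence?genome={genome};"
            f"chrom={chrom};start={start};end={end}")


def strip_chr(chrom):
    """Drop a leading 'chr', since the CADD and gnomAD examples use '17' and '1'."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def cadd_url(chrom, pos_1based, version="GRCh38-v1.6"):
    """Build a CADD query URL such as .../GRCh38-v1.6/17:4987635."""
    return f"{CADD_BASE}/{version}/{strip_chr(chrom)}:{pos_1based}"


def gnomad_variant_id(chrom, pos_1based, ref, alt):
    """Format a variant ID like '1-55516888-G-GA' for the gnomAD queries."""
    return f"{strip_chr(chrom)}-{pos_1based}-{ref}-{alt}"


def graphql_payload(query, variant_id):
    """Wrap a GraphQL query string and its variantId input as a JSON-ready dict."""
    return {"query": query, "variables": {"variantId": variant_id}}
```

Calling `ucsc_track_url("phyloP100way", "chr17", 4987635)` reproduces the first query in the table, and passing `width=2` with the `snp151` track reproduces the rsID query, whose interval is one base longer. The dict returned by `graphql_payload` would be sent as the JSON body of a POST request to the relevant GraphQL endpoint, with the query text taken from the table.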