Evidence Classification Tables for Regulatory Variants



The following tables outline the evidence classification used to score variants in RevUP. They were adapted from Van der Lee R, Correard S, and Wasserman WW, “Deregulated Regulators: Disease-Causing cis Variants in Transcription Factor Genes” (Trends in Genetics, 2020).



| Evidence Level | Sublevel | Symbols (a) | Evidence for Pathogenicity | Examples |
| --- | --- | --- | --- | --- |
| 5 (High) | 5.2 | Vb | Variant introduction (in a model organism) results in a phenotype that is consistent with the human disease. | Transgenic model organism developed using CRISPR genome editing |
| 5 (High) | 5.1 | *, V | Variant neutralization (in a model organism or cell line) rescues or reverses phenotype. | Knockout/knockdown/silencing of the variant in gain-of-function scenarios; Complementation in loss-of-function scenarios |
| 4 | 4.2 | V | Variant results in a cellular phenotype consistent with the disease phenotype. Insights are only relevant if the endophenotype assayed is consistent with the disease phenotype. | Patient-derived tissue/cells/induced pluripotent stem cells (iPSCs); Transgenic cell lines; In vivo assays of cellular function |
| 4 | 4.1 | V | Variant observed in multiple, unrelated families with the same disease phenotype. Typically only relevant when multiple well-described pedigrees are available in which the variant segregates with disease. | |
| 3 | 3.1 | V | Variant shows familial segregation with the disease. Variant or locus segregates with affected and unaffected disease status in family pedigree. Should be considered stronger with increasing segregation data in larger families; for example, de novo variants identified in a family trio are weak segregation evidence. | Trio whole-genome analysis; Linkage analysis |
| 2 | 2.5 | @, V | Variant is a striking noncoding event. Often involves a large genomic alteration containing many candidate regulatory elements and variants. | Copy number changes, translocation/breakpoint mapping, aCGH, MLPA ... |
| 2 | 2.4 | @, V | Variant is similar to another regulatory variant associated with the same suspected target gene and implicated in the same or a similar disease phenotype. Variants are often not exactly the same, but should be justifiably similar: for example, strongly overlapping or affecting the same TFBS. | ClinVar, DECIPHER, GeneMatcher |
| 2 | 2.3 | @, V | Variant is considered deleterious by computational prediction methods. Using in silico tools designed to include noncoding variants. | CADD, FATHMM, FunSeq2, ReMM, NCBoost, ncGERP ... |
| 2 | 2.2 | G | Suspected target gene does not contain coding variants in the same individual. In the gene targeted by the regulatory variant or in other key genes for the phenotype under study. Assess accordance with expected inheritance of the phenotype; that is, present in one or both alleles. | |
| 2 | 2.1 | @, G | Suspected target gene has been implicated in the same or a similar disease phenotype, or is otherwise relevant. OMIM disease genes, literature, and gene function can provide insight. Dosage-sensitive and haploinsufficient genes may be of increased interest. | Other human patients; Knockout mice, biological link with phenotype, pLI metric |
| 1 (Low) | 1.3 | @, *, V | Variant or locus previously statistically associated with the same or a similar disease phenotype. | GWAS, risk alleles; Somatic recurrence in cancer |
| 1 (Low) | 1.2 | @, V | Variant is rare in unaffected individuals in specific sets of controls or reference population databases. Variant absent in databases of unaffected/control individuals, or present with a frequency less than expected given the penetrance and expressivity of the disease. Publications preceding the appearance of large reference databases tend to depend on custom sets of control samples. | gnomAD; Control samples |
| 1 (Low) | 1.1 | @, V | Variant position is evolutionarily conserved. | Genome browser conservation tracks |

(a) Symbols: @, attainable with bioinformatics tools and databases; G, suspected target gene; R, regulatory region containing variant position; V, variant; *, conceptually possible but we did not find this in practice due to focus on rare disease.

| Evidence Level | Sublevel | Symbols (a) | Evidence for Pathogenicity | Examples |
| --- | --- | --- | --- | --- |
| 4 (High) | 4.1 | V | Variant introduction (in a model organism) leads to changes in expression of target/reporter gene or chromatin environment. CRISPR genome editing, Cre-Lox recombination (endogenous). Reporter gene constructs (exogenous). | Any relevant technique listed under F2, below |
| 3 | 3.1 | V | Variant introduction (in a cell line) leads to changes in expression of target/reporter gene or chromatin environment. Engineered cell line or in vitro model system. | Luciferase/LacZ reporter assay; MPRA/STARR-seq; Any relevant technique listed under F2 |
| 2 | 2.2 | V | Variant leads to changes in expression of the target gene (in patient tissue). Patient material compared with controls. Material from, for example, tissue biopsy, cultured cells, or iPSCs. | RT-qPCR/RNA-seq expression analysis; Allele-specific expression (ASE); Staining, immunohistochemistry |
| 2 | 2.1 | V | Variant causes a change in TF binding and/or chromatin environment. Strongest if the studied TF binding site and regulatory region are key proven regulators of target gene expression. | ChIP-seq/ChIP-qPCR/ChIP-MS (in vivo); EMSA (in vitro); Allele-specific binding (ASB); Changes to chromatin domains/environment, DNA methylation |
| 1 (Low) | 1.5 | *, R-G | Regulatory region is shown to regulate gene expression of the target gene. Typically involves characterization of a newly identified enhancer. | Transgene reporter expression |
| 1 (Low) | 1.4 | @, V-G | Variant is statistically associated with expression levels of the target gene. | cis-eQTL |
| 1 (Low) | 1.3 | @, R-G | Regulatory region and target gene are directly linked based on annotation or experimental data. | Core and proximal promoter annotations; Chromosome conformation capture (e.g., 3C, 5C, Hi-C), ChIA-PET |
| 1 (Low) | 1.2 | @, V | Variant localizes to a regulatory region based on genome annotations. | Known or predicted regulatory elements (enhancers, promoters); Chromatin accessibility and histone ChIP-seq data |
| 1 (Low) | 1.1 | @, V | Variant position is implicated in TF binding based on experimental data. | TF ChIP-seq datasets (e.g., ENCODE and ReMap) |
| 0 (No Value) | 0.2 | @, V-G | Variant and target gene locate to the same structural domain according to chromatin annotations. | 3D genome browser, TADs |
| 0 (No Value) | 0.1 | @, V | Variant predicted to change a TF binding (sequence) motif. | TFBS resources (e.g., JASPAR, CIS-BP, HOCOMOCO) |

(a) Symbols: @, attainable with bioinformatics tools and databases; G, suspected target gene; R, regulatory region containing variant position; V, variant; *, conceptually possible but we did not find this in practice due to focus on rare disease.
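The database table later in this document refers to evidence items by codes such as C1.1 and F1.2. Judging by the descriptions attached to those codes, the prefix C appears to point to the first table above and F to the second, followed by the sublevel number. The Python sketch below shows one way such codes could be parsed and ordered to mirror the table layout. The names `EvidenceCode` and `parse_evidence_code` are hypothetical, and nothing here is taken from the RevUP implementation.

```python
import re
from typing import NamedTuple


class EvidenceCode(NamedTuple):
    """Hypothetical container for a code such as 'C1.1' or 'F1.2'."""
    category: str   # 'C' (first table) or 'F' (second table); inferred, not stated in the source
    level: int      # evidence level, e.g. 1 (Low) up to 5 (High)
    sublevel: int   # position within the level, e.g. the '3' in 2.3


_CODE_PATTERN = re.compile(r"^([CF])(\d+)\.(\d+)$")


def parse_evidence_code(code: str) -> EvidenceCode:
    """Split a code like 'C2.3' into category, level, and sublevel."""
    match = _CODE_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"Not a recognised evidence code: {code!r}")
    category, level, sublevel = match.groups()
    return EvidenceCode(category, int(level), int(sublevel))


# Order a handful of codes the way the tables list them (highest level first).
codes = [parse_evidence_code(c) for c in ("C1.1", "F1.2", "C2.3")]
ranked = sorted(codes, key=lambda e: (e.level, e.sublevel), reverse=True)
```

Sorting by `(level, sublevel)` only reproduces the row order of the tables; how RevUP weighs or combines evidence items is not described here, so the sketch makes no attempt to compute a score.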

External Databases Used in the RevUP Scoring System



The following table outlines how several external databases were used as part of RevUP, including the source of the information, what information was extracted, what queries were used, and where they were used in the RevUP system.



| Information in RevUP | API | Information Extracted | Query |
| --- | --- | --- | --- |
| Evidence level C1.1: Variant position is evolutionarily conserved | UCSC REST API; track: phyloP100way; hg38 | PhyloP score | https://api.genome.ucsc.edu/getData/track?track=phyloP100way;genome=hg38;chrom=chr17;start=4987634;end=4987635 |
| Evidence level C1.1: Variant position is evolutionarily conserved | UCSC REST API; track: phastCons100way; hg38 | PhastCons score | https://api.genome.ucsc.edu/getData/track?track=phastCons100way;genome=hg38;chrom=chr17;start=4987634;end=4987635 |
| rsID | UCSC REST API; track: snp151; hg38 | rsID (name) | https://api.genome.ucsc.edu/getData/track?track=snp151;genome=hg38;chrom=chr17;start=4987634;end=4987636 |
| Reference Allele | UCSC REST API; sequence query; hg38 | Reference allele (dna) | https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chr17;start=4987634;end=4987635 |
| Evidence level F1.2: Variant localizes to a regulatory region based on genome annotations | UCSC REST API; GRCh38 | cCRE (ccre); Description (description); Name (name) | https://api.genome.ucsc.edu/getData/track?track=encodeCcreCombined;genome=hg38;chrom=chr17;start=4987634;end=4987635 |
| Evidence level C1.2: Variant is rare in unaffected individuals in specific sets of controls or reference population databases | gnomAD GraphQL API; dataset: gnomad_r3; hg38 | Allele count (ac); Allele number (an); Number of homozygotes (homozygote_count) | query getVariant($variantId: String!) { variant( variantId: $variantId, dataset: gnomad_r3) { exome { ac an } genome { ac ac_hom an homozygote_count } rsid } } inputs: { "variantId": "1-55516888-G-GA" } |
| ClinVar number | gnomAD GraphQL API; dataset: gnomad_r3; hg38 | ClinVar ID (clinvar_variation_id) | query getClinvarVariant($variantId: String!) { clinvar_variant( variant_id: $variantId, reference_genome: GRCh38) { rsid clinvar_variation_id } } inputs: { "variantId": "8-101493333-G-T" } |
| Evidence level C2.3: Variant is considered deleterious by computational prediction methods | CADD REST API; version v1.0; GRCh38-v1.6 | CADD Phred Score (PHRED) | https://cadd.gs.washington.edu/api/v1.0/GRCh38-v1.6/17:4987635 |
| Evidence level F1.1: Variant position is implicated in TF binding based on experimental data | ReMap (downloaded file); hg38 | CRM at variant position | Homo sapiens CRMs file downloaded from ReMap 2020 |
| Evidence level F1.3: Regulatory region and target gene are directly linked based on annotation or experimental data; Evidence level F1.4: Variant is statistically associated with expression levels of the target gene | SCREEN GraphQL API; GRCh38 | cCRE Method (ccre["details"]["linkedGenes"]["method"]) | { ccres( assembly: GRCh38 range: { chrom: "chr17", start: 4987634, end: 4987635 } ) { total, ccres { accession, details { linkedGenes { gene, method } } } } } |
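The queries in the table can be assembled programmatically. The Python sketch below builds a UCSC track URL and a gnomAD GraphQL request body in the same shape as the table entries, without sending any request. It assumes the variant position is supplied as a 1-based coordinate (as in the CADD query, 17:4987635) and converted to the 0-based, half-open interval that the UCSC URLs in the table use. The function names are hypothetical, the GraphQL query text is copied from the table rather than checked against the current gnomAD schema, and the allele-frequency helper (allele count divided by allele number) is an illustrative addition, not a described RevUP step.

```python
import json

UCSC_API = "https://api.genome.ucsc.edu/getData"

# Query text copied from the table above; field names are not re-verified here.
GNOMAD_VARIANT_QUERY = (
    "query getVariant($variantId: String!) { "
    "variant( variantId: $variantId, dataset: gnomad_r3) { "
    "exome { ac an } genome { ac ac_hom an homozygote_count } rsid } }"
)


def ucsc_track_url(track, chrom, pos, genome="hg38", width=1):
    """Build a UCSC track query URL for a 1-based position.

    UCSC intervals are 0-based and half-open, so position `pos`
    becomes start=pos-1 and end=pos (for width=1).
    """
    start = pos - 1
    params = [
        f"track={track}",
        f"genome={genome}",
        f"chrom={chrom}",
        f"start={start}",
        f"end={start + width}",
    ]
    # The table's URLs separate parameters with ';', so the same is done here.
    return f"{UCSC_API}/track?" + ";".join(params)


def gnomad_request_body(variant_id, query=GNOMAD_VARIANT_QUERY):
    """Serialise a GraphQL request (query plus variables) as a JSON string."""
    return json.dumps({"query": query, "variables": {"variantId": variant_id}})


def allele_frequency(ac, an):
    """Return allele count / allele number, or None when no alleles were called."""
    return ac / an if an else None


phylop_url = ucsc_track_url("phyloP100way", "chr17", 4987635)
snp_url = ucsc_track_url("snp151", "chr17", 4987635, width=2)
body = gnomad_request_body("1-55516888-G-GA")
```

With these inputs, `phylop_url` and `snp_url` match the phyloP100way and snp151 entries in the table. Sending `body` to a GraphQL endpoint and handling errors or rate limits is left out on purpose, since the source does not describe how RevUP issues its requests.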