Explanation of gavin_r0.5_calibvars.caddannot.clusters.tsv.gz.

#### Rows in the data (n = 399,393):

Each row represents one genetic variant (mutation) that has been classified as either PATHOGENIC or POPULATION. The PATHOGENIC variants are considered to be disease-causing by reputable sources. For POPULATION variants, pathogenicity has not been established, so they are presumed to be safe/benign. The POPULATION variants are selected based on gene-specific properties using the GAVIN method (see: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1141-7) and serve as a highly representative set of negatives.

#### Columns in the data:

1: GENEONTOL_cluster, a clustering of genes based on Gene Ontology annotations.
2: CADDPATHO_cluster, a clustering of genes based on aggregate CADD properties of pathogenic variants.

Basic variant annotation. This includes gene, group (PATHOGENIC/POPULATION), effect and impact from SnpEff, and the CADD scaled score (same as the very last column).

3: gene
4: chr
5: pos
6: ref
7: alt
8: group
9: effect
10: impact
11: cadd

All features of the CADD annotation. For a full description, and to see which features are used to train the original CADD scores, see: http://cadd.gs.washington.edu/static/ReleaseNotes_CADD_v1.3.pdf.

12: Chrom
13: Pos
14: Ref
15: Anc
16: Alt
17: Type
18: Length
19: isTv
20: isDerived
21: AnnoType
22: Consequence
23: ConsScore
24: ConsDetail
25: GC
26: CpG
27: mapAbility20bp
28: mapAbility35bp
29: scoreSegDup
30: priPhCons
31: mamPhCons
32: verPhCons
33: priPhyloP
34: mamPhyloP
35: verPhyloP
36: GerpN
37: GerpS
38: GerpRS
39: GerpRSpval
40: bStatistic
41: mutIndex
42: dnaHelT
43: dnaMGW
44: dnaProT
45: dnaRoll
46: mirSVR-Score
47: mirSVR-E
48: mirSVR-Aln
49: targetScan
50: fitCons
51: cHmmTssA
52: cHmmTssAFlnk
53: cHmmTxFlnk
54: cHmmTx
55: cHmmTxWk
56: cHmmEnhG
57: cHmmEnh
58: cHmmZnfRpts
59: cHmmHet
60: cHmmTssBiv
61: cHmmBivFlnk
62: cHmmEnhBiv
63: cHmmReprPC
64: cHmmReprPCWk
65: cHmmQuies
66: EncExp
67: EncH3K27Ac
68: EncH3K4Me1
69: EncH3K4Me3
70: EncNucleo
71: EncOCC
72: EncOCCombPVal
73: EncOCDNasePVal
74: EncOCFairePVal
75: EncOCpolIIPVal
76: EncOCctcfPVal
77: EncOCmycPVal
78: EncOCDNaseSig
79: EncOCFaireSig
80: EncOCpolIISig
81: EncOCctcfSig
82: EncOCmycSig
83: Segway
84: tOverlapMotifs
85: motifDist
86: motifECount
87: motifEName
88: motifEHIPos
89: motifEScoreChng
90: TFBS
91: TFBSPeaks
92: TFBSPeaksMax
93: isKnownVariant
94: ESP_AF
95: ESP_AFR
96: ESP_EUR
97: TG_AF
98: TG_ASN
99: TG_AMR
100: TG_AFR
101: TG_EUR
102: minDistTSS
103: minDistTSE
104: GeneID
105: FeatureID
106: CCDS
107: GeneName
108: cDNApos
109: relcDNApos
110: CDSpos
111: relCDSpos
112: protPos
113: relProtPos
114: Domain
115: Dst2Splice
116: Dst2SplType
117: Exon
118: Intron
119: oAA
120: nAA
121: Grantham
122: PolyPhenCat
123: PolyPhenVal
124: SIFTcat
125: SIFTval

Outcome of CADD model in raw SVM and logarithmically scaled PHRED scores.

126: RawScore
127: PHRED
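
As a quick sanity check on the layout described above, the following minimal Python sketch counts variants per `group` and reports how many rows have a `cadd` value that differs from `PHRED` (the description states these should be the same). It assumes the file is tab-separated, gzip-compressed, and starts with a header row whose names match the column list; this explanation does not confirm the header, so adjust the code if the file has none. The function name `summarize_variants` is hypothetical and used only for illustration.

```python
import csv
import gzip
from collections import Counter


def _same_score(a, b):
    """Compare two score fields numerically, falling back to text comparison."""
    try:
        return float(a) == float(b)
    except ValueError:
        # Non-numeric placeholders (for example an empty field) are compared as text.
        return a == b


def summarize_variants(path):
    """Return (variant count per group, number of rows where cadd != PHRED).

    Assumption: the first row of the file is a header containing the
    column names 'group', 'cadd' and 'PHRED'.
    """
    group_counts = Counter()
    mismatches = 0
    with gzip.open(path, mode="rt", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary data.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            group_counts[row["group"]] += 1
            if not _same_score(row["cadd"], row["PHRED"]):
                mismatches += 1
    return group_counts, mismatches
```

Calling `summarize_variants("gavin_r0.5_calibvars.caddannot.clusters.tsv.gz")` returns a `Counter` keyed by group and a mismatch count. If the assumptions hold, the group counts should sum to the 399,393 rows stated above.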