Development and Optimization of a Clinical Support Algorithm for Rapid Identification of Diagnostic Germline Variants
Date
2022-03
Abstract
Congenital disorders are the leading cause of death among infants in the U.S. and ultimately affect approximately 8% of the population. Next-generation sequencing methods have increased diagnostic yield in rare disease diagnostics; however, most patients referred to genetics departments still do not receive a diagnosis. By leveraging computational methods, candidate genetic variants can be ranked by their likelihood of causing the disease phenotype. LIRICAL, a likelihood ratio algorithm that combines a phenotype component and a genotype component, outputs the probability that each candidate variant is diagnostic, a format that is well suited to human interpretation. Natural language processing (NLP) algorithms can identify phenotype terms in unstructured clinical notes, but the large number of extracted terms overwhelms LIRICAL and compromises its accuracy. Here we compare our improved likelihood ratio algorithm, CAVaLRi, with LIRICAL and investigate the clinical utility of NLP-generated phenotype sets. Novel features of CAVaLRi include limiting input phenotype sets to only the most informative terms, incorporating parental genotypes, and assigning relative importance by weighting each likelihood ratio component. Genetic sequencing data from an internal cohort (n=611, solved=185) were obtained along with phenotype sets curated by clinical staff. Clinical notes from the electronic health record were passed to ClinPhen, an NLP phenotype extraction algorithm, to generate computational phenotype sets. When given clinician-curated phenotype sets, CAVaLRi significantly outperformed LIRICAL (ROC AUC improved from 0.80 to 0.94, average rank of solved cases improved from 11.4 to 5.7, p=7.97e-16). CAVaLRi accuracy was virtually identical when clinician-curated phenotype sets were replaced by ClinPhen-generated phenotype sets (ROC AUC remained unchanged at 0.94, average rank of solved cases increased trivially from 5.7 to 5.8, p=0.23).
The likelihood ratio paradigm extensions provided by CAVaLRi lead to highly significant gains in diagnostic variant classification accuracy compared to leading variant prioritization algorithms. CAVaLRi stands as the best available computational tool for ensuring diagnostic variants are not overlooked in clinical review.
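The two ideas named in the abstract, trimming a phenotype set to its most informative terms and weighting each likelihood ratio component, can be illustrated with a minimal sketch. This is not the CAVaLRi implementation. The function names, the term labels, the use of information content (negative log2 of term frequency) as the measure of "informative", and the way weights scale log10 likelihood ratios are all assumptions made for illustration.

```python
import math


def most_informative_terms(terms, frequency, k):
    """Keep the k terms with the highest information content.

    Assumption: informativeness is IC = -log2(frequency), where
    frequency[t] is the (hypothetical) fraction of diseases annotated
    with term t. Rarer terms therefore score higher.
    """
    return sorted(terms, key=lambda t: -math.log2(frequency[t]), reverse=True)[:k]


def posterior_from_weighted_lrs(prior, log10_lrs, weights):
    """Combine weighted log10 likelihood ratios into a posterior probability.

    Assumption: each component (e.g. phenotype, genotype) contributes
    weight * log10(LR), and the sum is applied to the prior odds.
    """
    total = sum(weights[name] * value for name, value in log10_lrs.items())
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * 10.0 ** total
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs: three extracted terms with made-up frequencies.
frequency = {"term_common": 0.5, "term_rare": 0.01, "term_medium": 0.1}
kept = most_informative_terms(list(frequency), frequency, k=2)

# Hypothetical component scores and weights for one candidate variant.
posterior = posterior_from_weighted_lrs(
    prior=0.5,
    log10_lrs={"phenotype": 1.0, "genotype": 1.0},
    weights={"phenotype": 1.0, "genotype": 1.0},
)
```

In this sketch, a weight of zero removes a component's influence entirely, so a candidate's posterior falls back to its prior when every weight is zero. How CAVaLRi actually selects terms and sets its weights is not stated in the abstract.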
Description
Poster Division: Health Sciences: 2nd Place (The Ohio State University Edward F. Hayes Graduate Research Forum)
Keywords
genetic, germline, rare disease, diagnostic variant, algorithm