Validation of genetic variants from NGS data using Deep Convolutional Neural Networks

Monday, June 14 at 11:30pm (PDT)
Tuesday, June 15 at 07:30am (BST)
Tuesday, June 15 at 03:30pm (KST)

SMB2021: Monday (Tuesday) during the "PS01" time block.

Marc Vaisband

University of Bonn
A crucial aspect of analysing next-generation sequencing (NGS) data from cancer patients lies in identifying mutations in the genetic code of tumor cells. This is done by considering the tumor DNA together with a reference germline sample, and inferring candidate somatic mutations by way of comparison. A multitude of tools exist for this purpose. In practice, however, sequencing artifacts or alignment errors are often mistakenly flagged as variants, necessitating extremely time-consuming manual validation by researchers. We demonstrate that this process can be largely automated using Deep Convolutional Neural Networks, whose utility has been a driving force behind many recent advances in applied machine learning. Using previously performed manual annotation as input data, we train a Deep Convolutional Neural Network of straightforward topology that recognises sequencing artifacts in called variants with high accuracy, achieving a score of 97.5% on a validation dataset. Moreover, its direct outputs are class probabilities instead of binary labels, and the remaining misclassified points lie in the region of low certainty, suggesting an effective modelling of the decision behaviour in manual annotation. This allows for a significant reduction in the workload for researchers, and can in the future be integrated into bioinformatics workflows for NGS data processing.
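One way to picture how probabilistic outputs could reduce manual workload is a simple triage step: confident predictions are accepted automatically, and only low-certainty calls are passed to a researcher. The sketch below is illustrative only and is not the authors' pipeline. The function name `triage`, the thresholds `low` and `high`, and the example probabilities are all hypothetical; the abstract does not state how the class probabilities are used downstream.

```python
from typing import List, Tuple


def triage(
    probs: List[float], low: float = 0.1, high: float = 0.9
) -> Tuple[List[int], List[int], List[int]]:
    """Split variant indices by predicted artifact probability.

    probs[i] is assumed to be the model's probability that called
    variant i is a sequencing artifact. The thresholds are hypothetical
    and would need tuning on annotated data.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")

    artifacts, genuine, review = [], [], []
    for i, p in enumerate(probs):
        if p >= high:
            artifacts.append(i)   # confidently an artifact
        elif p <= low:
            genuine.append(i)     # confidently a real variant
        else:
            review.append(i)      # low certainty: send to manual validation
    return artifacts, genuine, review


# Made-up probabilities for five called variants.
example_probs = [0.98, 0.03, 0.55, 0.91, 0.40]
artifacts, genuine, review = triage(example_probs)
print("artifacts:", artifacts)
print("genuine:", genuine)
print("manual review:", review)
```

With thresholds like these, only the variants in the uncertain band would still need manual inspection. This matches the abstract's observation that the remaining misclassifications lie in the low-certainty region.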

Hosted by SMB2021
Virtual conference of the Society for Mathematical Biology, 2021.