CRISPR-Embedding: CRISPR/Cas9 Off-Target Activity Prediction Using DNA k-Mer Embedding
18 Pages Posted: 31 Mar 2022 Publication Status: Under Review
Abstract
In the field of gene editing, CRISPR/Cas9 has been a revolutionary tool for biologists and researchers. However, the technology carries the risk of off-target effects, that is, editing at unintended sites, which may harm normal cell functions. Many computational approaches have therefore been proposed for accurate off-target prediction. Conventional approaches to feature and data handling have suffered from data imbalance, and many existing architectures are unnecessarily complex. In this paper, we devise a deep learning model, CRISPR-Embedding, which uses a 9-layer Convolutional Neural Network (CNN) to predict CRISPR/Cas9 off-targets and represents sequences with DNA k-mer embedding. In addition, using data augmentation and under-sampling, we produce a substantially cleaner dataset to mitigate data imbalance. Evaluated with 5-fold cross-validation, CRISPR-Embedding achieves an average accuracy of 94.07%. Furthermore, comparison with other state-of-the-art methods shows improved off-target activity prediction.
Keywords: CRISPR/Cas9, Deep learning, Word embeddings, Off-targets
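As an illustration of the k-mer representation mentioned in the abstract, the sketch below splits a DNA sequence into overlapping k-mers and maps each to an integer index, the usual input to a learned embedding layer. The abstract does not state the value of k, the vocabulary, or how guide and off-target sequences are paired, so the function names, k = 3, and the reserved index 0 for unknown k-mers are all assumptions made for the example rather than the authors' method.

```python
from itertools import product


def build_kmer_vocab(k, alphabet="ACGT"):
    """Map every possible k-mer to an integer index starting at 1.

    Index 0 is reserved (an assumption) for k-mers containing
    characters outside the alphabet, such as 'N'.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def kmer_tokens(seq, k):
    """Split a sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode(seq, vocab, k):
    """Turn a sequence into a list of k-mer indices for an embedding layer."""
    return [vocab.get(token, 0) for token in kmer_tokens(seq, k)]


if __name__ == "__main__":
    k = 3  # hypothetical choice; not given in the abstract
    vocab = build_kmer_vocab(k)
    print(kmer_tokens("ACGTA", k))
    print(encode("ACGTA", vocab, k))
```

A sequence of length n yields n - k + 1 indices, and the vocabulary holds 4^k entries plus the reserved index, which is the table size an embedding layer would need.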