Abstract
In many biological research studies that rely on DNA sequence data, calculating the edit distance between two sequences is a vital component. However, computing the edit distance requires dynamic programming, which is computationally intensive. To address this challenge, numerous works have focused on embedding sequences into a vector space while preserving the distance metric, so that the edit distance between two sequences is approximated by the distance between their corresponding vectors. In this study, we propose a novel Needleman-Wunsch Attention (NWA) framework for sequence embedding that leverages the relationship between the Needleman-Wunsch (NW) matrix and attention maps. Our approach applies to any deep learning-based sequence embedding network and provides a general way to improve the accuracy and efficiency of edit distance approximation methods. We validate the proposed method by applying it to various existing embedding networks, demonstrating improved edit distance-preserving embedding on a real dataset. The code is publicly available at https://github.com/thisislim/nw-attention/.
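For readers unfamiliar with the dynamic programming step mentioned in the abstract, the following is a minimal sketch of the classic edit distance (Levenshtein) recurrence. It is not the authors' NWA method or code. The function name `edit_distance` and the unit costs for insertion, deletion, and substitution are assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit (Levenshtein) distance with unit costs, computed row by row.

    Illustrative sketch only: unit costs are an assumption, and this is
    not the implementation from the paper's repository.
    """
    # Row 0: turning the empty prefix of `a` into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
```

The nested loops visit every cell of a `len(a) × len(b)` table, so the cost grows with the product of the two sequence lengths. That quadratic cost per pair is what embedding-based approximation aims to avoid when many pairs must be compared.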
Original language | English |
---|---|
Pages (from-to) | 69087-69096 |
Number of pages | 10 |
Journal | IEEE Access |
Volume | 12 |
Publication status | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
All Science Journal Classification (ASJC) codes
- General Computer Science
- General Materials Science
- General Engineering