Mutation hotspots are either solitary amino acid residues or stretches of amino acids that show elevated mutation frequency in cancer-related genes, but their prevalence and biological relevance are not completely understood. Here, we developed a Smith-Waterman algorithm-based mutation hotspot discovery method, MutClustSW, to identify mutation hotspots of either single or clustered amino acid residues. We identified 181 missense mutation hotspots from the COSMIC and TCGA mutation databases. In addition to 77 single amino acid residue hotspots (42.5%), including well-known mutation hotspots such as IDH1 p.R132 and BRAF p.V600, we identified 104 mutation hotspots (57.5%) as clusters or stretches of multiple amino acids; the hotspots on MUC2, EPPK1, KMT2C, and TP53 were larger than 50 amino acids. Twelve of 27 nonsense mutation hotspots (44.4%) were observed in four cancer-related genes, TP53, ARID1A, CDKN2A, and PTEN, suggesting that truncating mutations on some tumor suppressor genes are not randomly distributed as previously assumed. We also show that hotspot mutations have higher mutation allele frequency than non-hotspot mutations, and that the hotspot information can be used to prioritize cancer drivers. Together, the proposed algorithm and the mutation hotspot information can serve as valuable resources in the selection of functional driver mutations and associated genes.
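The abstract does not give MutClustSW's scoring scheme, so the following is only a minimal sketch of how a Smith-Waterman-style local score could pick out both single-residue and clustered hotspots along a protein. The function name `find_hotspots`, the per-residue score (mutation count minus a fixed `gap_penalty`), and the `min_score` threshold are all assumptions made for illustration; they are not the authors' implementation.

```python
from typing import List, Tuple


def find_hotspots(
    counts: List[int],
    gap_penalty: float = 1.0,   # hypothetical cost charged at every residue
    min_score: float = 3.0,     # hypothetical threshold for reporting a segment
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) segments using 1-based inclusive positions.

    As in Smith-Waterman local alignment, a running score is never allowed
    to drop below zero; each time it would, the current segment is closed
    and reported if its peak score reached `min_score`.
    """
    hotspots: List[Tuple[int, int, float]] = []
    score = 0.0          # running local score
    start = None         # first residue of the current segment
    best_score = 0.0     # peak score seen inside the current segment
    best_end = None      # residue at which the peak was reached

    for pos, count in enumerate(counts, start=1):
        s = count - gap_penalty  # assumed per-residue score
        if score + s <= 0:
            # Running score hits the zero floor: close the segment.
            if best_score >= min_score:
                hotspots.append((start, best_end, best_score))
            score, start, best_score, best_end = 0.0, None, 0.0, None
        else:
            if start is None:
                start = pos
            score += s
            if score > best_score:
                best_score, best_end = score, pos

    # Report a segment that is still open at the end of the protein.
    if best_score >= min_score:
        hotspots.append((start, best_end, best_score))
    return hotspots


# Toy per-residue mutation counts (illustrative data, not from the paper).
toy_counts = [0, 0, 5, 0, 0, 0, 0, 2, 3, 2, 0, 0]
print(find_hotspots(toy_counts))
```

On the toy input, the scan reports one single-residue segment at position 3 and one three-residue cluster spanning positions 8 to 10, which mirrors the two hotspot classes described above. Trimming each segment at its peak score keeps low-count trailing residues out of the reported span.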
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Published - 2019 Sept
Bibliographical note
Publisher Copyright:
© 2019 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
All Science Journal Classification (ASJC) codes
- Applied Mathematics