TY - JOUR
T1 - ASV-ID, a Proteogenomic Workflow to Predict Candidate Protein Isoforms on the Basis of Transcript Evidence
AU - Jeong, Seul Ki
AU - Kim, Chae Yeon
AU - Paik, Young Ki
N1 - Publisher Copyright:
Copyright © 2018 American Chemical Society.
PY - 2018/12/7
Y1 - 2018/12/7
AB - One of the goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to map and characterize the functions of protein isoforms produced by alternative splicing of genes. However, identifying alternative splice variants (ASVs) via mass spectrometry remains a major challenge because ASVs usually contain highly homologous peptide sequences. A routine protein sequence analysis suggests that more than half of the investigated proteins do not generate two or more uniquely mapping peptides that would enable their isoforms to be distinguished. Here, we develop a new proteogenomics method, named "ASV-ID" (alternative splicing variants identification), which enables identification of ASVs by using a cell type-specific protein sequence database that is supported by RNA-Seq data. Using this workflow, we identify 1935 distinct proteins under highly stringent conditions. For 841 of these proteins, transcript evidence helps us distinguish them from other isoforms, even though they are not predicted to yield two or more uniquely mapping peptides. We also demonstrate that ASV-ID enables detection of 19 differently expressed isoforms present in several cell lines. Thus, a new workflow using ASV-ID has the potential to map yet-to-be-identified difficult protein isoforms in a simple and robust way.
KW - RNA-sequencing
KW - alternative splicing variants
KW - cell type-specific sequence database
KW - proteogenomics
UR - http://www.scopus.com/inward/record.url?scp=85055175749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055175749&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.8b00548
DO - 10.1021/acs.jproteome.8b00548
M3 - Article
C2 - 30289715
AN - SCOPUS:85055175749
SN - 1535-3893
VL - 17
SP - 4235
EP - 4242
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 12
ER -