摘要
Diatoms comprise a diverse and ecologically important group of eukaryotic phytoplankton that signifi- cantly contributes to marine primary production and global carbon cycling. Phaeodactylum tricornutum is commonly used as a model organism for studying diatom biology. Although its genome was sequenced in 2008, a high-quality genome annotation is still not available for this diatom. Here we report the develop- ment of an integrated proteogenomic pipeline and its application for improved annotation of P. tricornutum genome using mass spectrometry (MS)-based proteomics data. Our proteogenomic analysis unambigu- ously identified approximately 8300 genes and revealed 606 novel proteins, 506 revised genes, 94 splice variants, 58 single amino acid variants, and a holistic view of post-translational modifications in P. tricor- nutum. We experimentally confirmed a subset of novel events and obtained MS evidence for more than 200 micropeptides in P. tricornutum. These findings expand the genomic landscape of P. tricornutum and provide a rich resource for the study of diatom biology. The proteogenomic pipeline we developed in this study is applicable to any sequenced eukaryote and thus represents a significant contribution to the toolset for eukaryotic proteogenomic analysis. The pipeline and its source code are freely available at https://sourceforge.net/projects/gapeproteogeno mic.
Diatoms comprise a diverse and ecologically important group of eukaryotic phytoplankton that signifi- cantly contributes to marine primary production and global carbon cycling. Phaeodactylum tricornutum is commonly used as a model organism for studying diatom biology. Although its genome was sequenced in 2008, a high-quality genome annotation is still not available for this diatom. Here we report the develop- ment of an integrated proteogenomic pipeline and its application for improved annotation of P. tricornutum genome using mass spectrometry (MS)-based proteomics data. Our proteogenomic analysis unambigu- ously identified approximately 8300 genes and revealed 606 novel proteins, 506 revised genes, 94 splice variants, 58 single amino acid variants, and a holistic view of post-translational modifications in P. tricor- nutum. We experimentally confirmed a subset of novel events and obtained MS evidence for more than 200 micropeptides in P. tricornutum. These findings expand the genomic landscape of P. tricornutum and provide a rich resource for the study of diatom biology. The proteogenomic pipeline we developed in this study is applicable to any sequenced eukaryote and thus represents a significant contribution to the toolset for eukaryotic proteogenomic analysis. The pipeline and its source code are freely available at https://sourceforge.net/projects/gapeproteogeno mic.
基金
This work was supported by the National Key Research and Development Program (2016YFA0501304), the National Natural Science Foundation of China (grant no. 31570829), and the Strategic Priority Research Program of the Chinese Academy of Sciences (grant no. XDB14030202).