Abstract
Proteins are the major carriers of biological processes, and the extant proteome contains tremendous diversity. However, the theoretical diversity of proteins far exceeds what is currently known, largely due to evolutionary constraints. Here, we propose that untouched protein space, comprising either extant proteins of unknown function or unnatural proteins, could contain many proteins with desired functions, and we outline a roadmap for exploring this space with artificial intelligence. In particular, with methods developed in natural language processing (NLP), we can first identify a large number of functional proteins and peptides encrypted in biological big data, for instance microbiome and virome data. Second, larger-scale mutation and directed evolution can be carried out and facilitated by NLP to improve the function of known proteins. Lastly, sampling random sequences and applying NLP might reveal a more complete landscape of protein functions and enable de novo protein design.
Source
hLife
2023, Issue 2, pp. 93-97 (5 pages)
Health Science (English edition)
Funding
Supported by the National Key Research and Development Program of China (2021YFC2300700),
the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB29020000),
the Program of the Beijing Natural Science Foundation (JQ22017),
and the Beijing Nova Program (202077 & 202120).