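Abstract: Protein function prediction is one of the core problems of bioinformatics in the post-genomic era. Protein function annotation databases usually record only that a protein has a given function (positive examples) and rarely record that a protein lacks a given function (negative examples). Current protein function prediction methods usually use only positive examples and pay little attention to negative examples, which are scarce but informative. To address this, a protein function prediction method based on positive and negative examples (protein function prediction using positive and negative examples, ProPN) is proposed. ProPN first constructs a directed signed hybrid graph that describes the known positive and negative associations between proteins and function labels, the interactions between proteins, and the correlations between function labels; it then predicts protein functions with a label propagation algorithm on the signed hybrid graph. Experiments on yeast, mouse, and human protein datasets show that ProPN not only outperforms existing algorithms at predicting negative examples for proteins with partially known function labels, but also achieves higher accuracy than other related methods at predicting the functions of proteins whose function labels are completely unknown.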
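The propagation idea can be illustrated with a much simplified sketch. It is not ProPN itself: it assumes an undirected protein interaction graph only (no function-label nodes and no directed edges), and the function name, the damping factor `alpha`, and the toy data are all hypothetical. Known positives are seeded with +1, known negatives with -1, and each protein repeatedly mixes its neighbours' average score with its own seed.

```python
def propagate_signed_labels(weights, seeds, alpha=0.8, iterations=50):
    """Spread signed seed scores over a protein interaction graph.

    weights: n x n list of non-negative interaction weights.
    seeds:   n x m list; +1 = known positive, -1 = known negative, 0 = unknown.
    alpha:   assumed damping factor; higher values trust neighbours more.
    """
    n = len(weights)
    m = len(seeds[0])
    # Row-normalise so each protein averages over its neighbours.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])
    scores = [[float(v) for v in row] for row in seeds]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            new_row = []
            for c in range(m):
                neighbour_avg = sum(transition[i][j] * scores[j][c] for j in range(n))
                new_row.append(alpha * neighbour_avg + (1 - alpha) * seeds[i][c])
            updated.append(new_row)
        scores = updated
    return scores


# Toy chain of four proteins (0 - 1 - 2 - 3) and a single function.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
seeds = [[1], [0], [0], [-1]]  # protein 0 is a positive, protein 3 a negative
scores = propagate_signed_labels(weights, seeds)
```

In this toy run the unlabeled protein next to the positive seed ends with a positive score and the one next to the negative seed with a negative score, which is the kind of signal a negative example contributes.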
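Abstract: Proteins are an essential material basis of life activities, and accurate annotation of their functions can greatly advance life science research. Existing protein function prediction methods usually focus only on information that a protein has certain functions (positive examples) and do not use information about functions that are irrelevant to a protein (negative examples). Previous studies have shown that incorporating negative examples can reduce the complexity of protein function prediction and improve prediction accuracy. This paper proposes a dimensionality-reduction-based method for predicting irrelevant functions of proteins (predicting irrelevant functions of proteins based on dimensionality reduction, IFDR). IFDR performs random walks on the adjacency matrix of the protein interaction network and on the protein-function label association matrix, respectively, to mine the intrinsic relationships between proteins and to estimate missing function labels; it then uses singular value decomposition to project each of the two matrices into a low-dimensional real-valued matrix, and finally uses semi-supervised regression to predict negative examples. Experiments on yeast, human, and Arabidopsis thaliana protein datasets show that IFDR predicts negative examples more accurately than existing related algorithms, and that reducing the dimensionality of both the interaction network and the function label space improves the accuracy of negative example prediction.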
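The random-walk step alone can be sketched as follows; the SVD projection and the semi-supervised regression are left out. The abstract does not give the walk's exact form, so this sketch assumes a simple truncated walk that averages the 1-step to `steps`-step transition probabilities. All function names and the toy data are hypothetical.

```python
def row_normalise(matrix):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


def matmul(a, b):
    """Plain-Python product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def random_walk_affinity(adjacency, steps=3):
    """Average the 1..steps-step transition probabilities between proteins."""
    transition = row_normalise(adjacency)
    power = transition
    total = [row[:] for row in transition]
    for _ in range(steps - 1):
        power = matmul(power, transition)
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    return [[v / steps for v in row] for row in total]


def estimate_missing_labels(adjacency, annotations, steps=3):
    """Score protein-function pairs from the labels of proteins reached by the walk."""
    return matmul(random_walk_affinity(adjacency, steps), annotations)


# Toy chain of four proteins (0 - 1 - 2 - 3) and two functions.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
annotations = [[1, 0], [0, 0], [0, 0], [0, 1]]  # protein 0 has f0, protein 3 has f1
affinity = random_walk_affinity(adjacency)
label_scores = estimate_missing_labels(adjacency, annotations)
```

Here the unannotated protein 1 scores higher for the function of its direct neighbour (protein 0) than for the function of the more distant protein 3; function pairs with persistently low scores are the natural candidates for negative examples.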
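Abstract: A multi-label learning method with label-specific features based on positive and negative label correlation (LIFTPNL) is proposed. Following the k-nearest-neighbor idea, global and local label information matrices are constructed, and the positive and negative correlations between pairs of labels are computed from these matrices. For each class label, a connection matrix is built from samples that belong to the same cluster and to different clusters, and sample similarity is computed by combining this matrix with the label's positive and negative correlations. Spectral clustering is then used to obtain cluster centers, and the original features are transformed into label-specific features. Finally, binary classifiers produce the classification results. Experimental results show that the proposed algorithm outperforms a number of multi-label classification methods.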
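The final feature transformation can be sketched in isolation. The sketch assumes the usual label-specific-feature construction, in which a sample is re-described by its Euclidean distances to the cluster centers found for one label. The centers are simply given here (hypothetical values), so the spectral clustering and the label-correlation weighting described above are not reproduced.

```python
import math


def label_specific_features(sample, positive_centers, negative_centers):
    """Distances from one sample to the cluster centers built for one label."""
    centers = list(positive_centers) + list(negative_centers)
    return [math.dist(sample, center) for center in centers]


# Hypothetical centers for a single label; in practice they would come from clustering.
positive_centers = [(0.0, 0.0)]
negative_centers = [(3.0, 4.0)]
features = label_specific_features((0.0, 0.0), positive_centers, negative_centers)
```

One such feature vector is built per label, and a separate binary classifier is then trained on each label's transformed features.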
Abstract: The recent worldwide spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1, has been endangering human lives all around the world. To understand the biological processes at the cellular level and to provide useful clues for developing antiviral drugs, information on the subcellular localization of Gram-positive bacterial proteins is vitally important. In view of this, a CNN-based protein subcellular localization predictor called “pLoc_Deep-mGpos” was developed. The predictor is particularly useful for multi-site systems, in which some proteins may occur simultaneously in two or more different organelles; such proteins are a current focus of the pharmaceutical industry. The global absolute true rate achieved by the new predictor is over 99%, and its local accuracy is around 92% to 99%. Both significantly exceed those of other existing state-of-the-art predictors. For the convenience of most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mGpos/, which the authors expect to become a powerful tool for developing effective drugs to fight the pandemic coronavirus and save mankind.
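The abstract does not define its metrics. Assuming "absolute true rate" means the exact-match rate commonly used for multi-label predictors (a protein counts as correct only if its whole predicted location set equals the true set), it can be computed as in the sketch below. The function name and the toy location sets are hypothetical.

```python
def absolute_true_rate(true_sets, predicted_sets):
    """Fraction of proteins whose predicted location set equals the true set exactly."""
    matches = sum(1 for t, p in zip(true_sets, predicted_sets) if set(t) == set(p))
    return matches / len(true_sets)


# Toy data: the second protein occurs at two sites, but only one is predicted.
true_sets = [{"cell wall"}, {"cytoplasm", "cell membrane"}, {"extracellular"}]
predicted_sets = [{"cell wall"}, {"cytoplasm"}, {"extracellular"}]
rate = absolute_true_rate(true_sets, predicted_sets)
```

Under this definition a partially correct prediction for a multi-site protein earns no credit, which makes the measure stricter than per-location accuracy.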
Funding: the National Natural Science Foundation of China (No. 61876144) and the Strategy Priority Research Program of Chinese Academy of Sciences (No. XDC02070600).
Abstract: Research on named entity recognition for label-scarce domains is becoming increasingly important. In this paper, a novel algorithm, positive unlabeled named entity recognition (PUNER) with multi-granularity language information, is proposed. It combines positive unlabeled (PU) learning and deep learning to obtain multi-granularity language information from a few labeled instances and many unlabeled instances in order to recognize named entities. First, PUNER selects reliable negative instances from the unlabeled data, uses the positive instances and a corresponding number of negative instances to train the PU learning classifier, and iterates until all unlabeled instances are labeled. Second, a neural network-based architecture implements the PU learning classifier and captures comprehensive text semantics through multi-granularity language information, which helps the classifier recognize named entities correctly. Performance tests of PUNER are carried out on three multilingual NER datasets: CoNLL 2003, CoNLL 2002, and SIGHAN Bakeoff 2006. Experimental results demonstrate the effectiveness of the proposed PUNER.
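The iterative PU loop can be sketched on toy feature vectors. This is an illustration under strong assumptions, not PUNER: a nearest-centroid rule stands in for the neural classifier, the "reliable negatives" are simply the unlabeled points farthest from the positives, later rounds reuse all predicted negatives, and all names and data are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(values) / len(points) for values in zip(*points)]


def pu_label(positives, unlabeled, max_rounds=10):
    """Iteratively label unlabeled vectors as positive (True) or negative (False)."""
    pos_center = centroid(positives)
    # Step 1: take the unlabeled points farthest from the positives as
    # reliable negatives, as many as there are positives.
    by_distance = sorted(unlabeled, key=lambda x: math.dist(x, pos_center), reverse=True)
    negatives = by_distance[:len(positives)]
    labels = {}
    for _ in range(max_rounds):
        # Step 2: "train" the stand-in classifier on the current negative set.
        neg_center = centroid(negatives)
        new_labels = {
            tuple(x): math.dist(x, pos_center) <= math.dist(x, neg_center)
            for x in unlabeled
        }
        if new_labels == labels:
            break  # predictions stopped changing
        labels = new_labels
        # Step 3: refresh the negative set from the latest predictions.
        negatives = [x for x in unlabeled if not labels[tuple(x)]] or negatives
    return labels


positives = [(0.0, 0.0), (0.2, 0.1)]
unlabeled = [(0.1, 0.0), (0.3, 0.2), (5.0, 5.0), (4.8, 5.2), (5.1, 4.9)]
labels = pu_label(positives, unlabeled)
```

On this toy data the two points near the positives are labeled positive and the three distant points negative after two rounds.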