Funding: Sponsored by the National Natural Science Foundation of China (Grant Nos. 61271346, 61571163, 61532014, 61402132 and 91335112)
Abstract: MicroRNAs (miRNAs) are reported to be associated with various diseases. Identifying disease-related miRNAs would benefit disease diagnosis and prognosis. However, in contrast with the widely available expression profiling data, the limited knowledge of molecular function restricts the development of previous methods based on network similarity measures. To construct reliable training data, a decision fusion method is used to prioritize the results of existing methods, and the performance of the decision fusion method is then validated. Furthermore, in consideration of the long-range dependencies among successive expression values, the Hidden Conditional Random Field (HCRF) model is selected and applied to miRNA expression profiling to infer disease-associated miRNAs. The results show that HCRF outperforms the previous methods. The results also demonstrate the power of using expression profiling for discovering disease-associated miRNAs.
Funding: Supported by the National Social Science Foundation of China (Grant No. 12CTQ032)
Abstract: Purpose: To annotate the semantic information and extract the research level information of research papers, we seek a method for developing an information extraction system. Design/methodology/approach: A semantic dictionary and a conditional random field model (CRFM) were used to annotate the semantic information of research papers. Based on the annotation results, the research level information was extracted through regular expressions. All the functions were implemented on the Sybase platform. Findings: According to the results of our experiment in carbon nanotube research, the precision and recall rates reached 65.13% and 57.75%, respectively, after the semantic properties of word class had been labeled, and the F-measure increased dramatically from less than 50% to 60.18% when semantic features were added. Our experiment also showed that the information extraction system for research level (IESRL) can extract performance indicators from research papers rapidly and effectively. Research limitations: Some text information, such as formatting and charts, might have been lost when the text was converted from PDF to TXT files. Semantic labeling of sentences could be insufficient because lexicons in the semantic dictionary carry many meanings. Research implications: The established system can help researchers rapidly compare the level of different research papers and find their implicit innovation value. It could also be used as an auxiliary tool for analyzing the research levels of various research institutions. Originality/value: In this work, we have established an information extraction system for research papers using a revised semantic annotation method based on CRFM and the semantic dictionary. Our system can analyze the information extraction problem at two levels, i.e., the sentence level and the noun (phrase) level of research papers. Compared with extraction methods based on knowledge engineering and those based on machine learning, our system shows the advantages of both.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11861045 and 62162040).
Abstract: Predicting essential proteins is crucial for understanding cellular organization and viability. We propose a biased random walk with restart algorithm for essential protein prediction, called BRWR. Firstly, the conventional random walk often sets the probabilities of a particle moving to each adjacent node to be equal, neglecting the influence of the similarity structure on the transition probability. To address this problem, we define a novel transition probability matrix by integrating gene expression similarity and subcellular location similarity. Particles then move with biased transition probabilities, so that the random walk further exploits the biological properties embedded in the network structure. Secondly, we use the gene ontology (GO) terms score and the subcellular score to calculate the initial probability vector of the random walk with restart. Finally, when the biased random walk with restart reaches its steady state, the protein importance score is obtained. To demonstrate the superiority of BRWR, we conduct experiments on the YHQ, BioGRID, Krogan and Gavin PPI networks. The results show that BRWR is superior to other state-of-the-art methods in essential protein recognition performance. In particular, compared with the competing methods, the improvements of BRWR in terms of ACC range over 1.4%–5.7%, 1.3%–11.9%, 2.4%–8.8%, and 0.8%–14.2%, respectively. Therefore, BRWR is effective and reasonable.
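The steps described in the BRWR abstract (a similarity-biased transition matrix, a prior-based initial vector, and iteration to a steady state) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the restart probability of 0.3, the way edge weights are formed (adjacency times a single precombined similarity value), and the toy four-protein network are all assumptions made for illustration, since the abstract does not specify them.

```python
def biased_transition(adj, sim):
    """Row-normalize edge weights adj[i][j] * sim[i][j] into transition probabilities.

    adj: 0/1 adjacency matrix of the PPI network (hypothetical toy input).
    sim: precombined similarity per protein pair (assumed; the abstract does
         not say how expression and subcellular similarities are merged).
    """
    n = len(adj)
    trans = []
    for i in range(n):
        weights = [adj[i][j] * sim[i][j] for j in range(n)]
        total = sum(weights)
        if total == 0:
            # No usable neighbor: keep the walker in place so the row still sums to 1.
            row = [1.0 if j == i else 0.0 for j in range(n)]
        else:
            row = [w / total for w in weights]
        trans.append(row)
    return trans


def random_walk_with_restart(trans, prior, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * trans^T p + restart * p0 until p stops changing."""
    n = len(prior)
    total = sum(prior)
    p0 = [x / total for x in prior]  # normalize the prior into a probability vector
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(p[i] * trans[i][j] for i in range(n)) + restart * p0[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy network with edges 0-1, 0-2, 1-2, 2-3 (made-up data, not from the paper).
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
sim = [
    [0.0, 0.9, 0.5, 0.0],
    [0.9, 0.0, 0.4, 0.0],
    [0.5, 0.4, 0.0, 0.2],
    [0.0, 0.0, 0.2, 0.0],
]
prior = [0.4, 0.2, 0.3, 0.1]  # stands in for the GO/subcellular prior scores

trans = biased_transition(adj, sim)
scores = random_walk_with_restart(trans, prior)
ranking = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
print(scores, ranking)
```

In this sketch the steady-state vector `scores` plays the role of the protein importance score, and `ranking` orders the proteins from most to least important. With a restart probability above zero the update is a contraction, so the iteration converges regardless of the starting vector.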
Abstract: Recent work on opinion mining typically focuses on subtasks such as aspect mining or polarity classification, ignoring the detailed explanatory evidence that accounts for a given user opinion. In this paper, we study the extraction of explanatory expressions by modeling the problem with a conditional random field (CRF). We compare the effectiveness of discrete and neural features, and further integrate them. We evaluate the models on two datasets from two different domains, both annotated with ground-truth explanatory expressions. Results show that the neural CRF model performs better than the discrete CRF. After combining the discrete and neural features, our final CRF model achieves the top-performing results.