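Abstract: Traditional keyword-based text clustering algorithms ignore the correlation between feature keywords, so the resulting text vectors express concepts inaccurately. To address this, a concept-vector-based text clustering algorithm, TCBCV (Text Clustering Based on Concept Vector), is proposed. It uses the concept attributes of HowNet and selects suitable sememes as the concepts of keywords, based on semantic field density and the weights of sememes in the concept tree, thereby mapping keywords to concepts. This mapping both strengthens the semantic relations between texts and reduces vector dimensionality; applied to text clustering, it can improve clustering quality. Experimental results show that the algorithm substantially improves both precision and recall in text clustering.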
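The keyword-to-concept mapping described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `TOY_LEXICON` is a hypothetical hand-made dictionary standing in for HowNet, and its single weight per sememe is a simplified placeholder for the paper's combination of semantic field density and concept-tree weight, which the abstract does not spell out.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for HowNet (NOT real HowNet data).
# Each keyword maps to candidate sememes with an illustrative weight.
TOY_LEXICON = {
    "car": {"vehicle": 0.9, "machine": 0.4},
    "truck": {"vehicle": 0.8, "transport": 0.5},
    "doctor": {"human": 0.6, "medical": 0.9},
    "nurse": {"human": 0.5, "medical": 0.8},
}


def pick_concept(keyword):
    """Return the highest-weighted sememe, or the keyword itself if unknown."""
    candidates = TOY_LEXICON.get(keyword)
    if not candidates:
        return keyword
    return max(candidates, key=candidates.get)


def concept_vector(tokens):
    """Collapse keyword counts into concept counts."""
    return Counter(pick_concept(token) for token in tokens)


doc = ["car", "truck", "doctor", "nurse", "car"]
keyword_vec = Counter(doc)         # one dimension per distinct keyword
concept_vec = concept_vector(doc)  # one dimension per selected concept
```

In this toy case four keyword dimensions collapse into two concept dimensions, and related keywords such as "car" and "truck" now share a dimension, which is the effect the abstract attributes to the mapping.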
Funding: Supported by the China Postdoctoral Science Foundation (Grant No. 20060400002), the Sichuan Youth Science and Technology Foundation of China (Grant No. 08JJ0109), the National Natural Science Foundation of China (Grant Nos. 60473051, 60503037), the National High-tech Research and Development of China (Grant No. 2006AA01Z230), and the Beijing Natural Science Foundation (Grant No. 4062018)
Abstract: The paper proposes a new text similarity computing method based on concept similarity for Chinese text processing. The method first converts a text into a word vector space model and then splits each word into a set of concepts. It obtains the similarity between words by computing the inner products between their concepts, and finally computes the similarity of texts from the similarity of their words. The contributions of the paper are: 1) a new formula for computing the similarity between words; 2) a new text similarity computing method based on word similarity; 3) a successful application of the method to similarity computing for Web news; and 4) validation of the method through extensive experiments.
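The pipeline in this abstract (words to concepts, concept inner products to word similarity, word similarity to text similarity) can be sketched as follows. This is not the paper's formula, which the abstract does not give: the `CONCEPTS` table is hypothetical, the normalized inner product (cosine) is one plausible reading of "inner products between concepts", and the best-match averaging in `text_similarity` is an assumed aggregation.

```python
import math

# Hypothetical concept weights per word (illustrative only).
CONCEPTS = {
    "football": {"sport": 1.0, "ball": 0.5},
    "soccer": {"sport": 1.0, "ball": 0.5},
    "match": {"sport": 0.6, "event": 0.8},
    "election": {"politics": 1.0, "event": 0.5},
}


def word_similarity(w1, w2):
    """Normalized inner product (cosine) of two words' concept vectors."""
    # Unknown words fall back to a one-concept vector named after the word.
    c1 = CONCEPTS.get(w1, {w1: 1.0})
    c2 = CONCEPTS.get(w2, {w2: 1.0})
    dot = sum(weight * c2.get(concept, 0.0) for concept, weight in c1.items())
    norm1 = math.sqrt(sum(weight * weight for weight in c1.values()))
    norm2 = math.sqrt(sum(weight * weight for weight in c2.values()))
    return dot / (norm1 * norm2)


def text_similarity(words_a, words_b):
    """Assumed aggregation: symmetric average of each word's best match."""
    if not words_a or not words_b:
        return 0.0

    def directed(source, target):
        best = [max(word_similarity(s, t) for t in target) for s in source]
        return sum(best) / len(source)

    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))


sports_pair = text_similarity(["football", "match"], ["soccer", "match"])
mixed_pair = text_similarity(["football", "match"], ["election"])
```

With these toy weights, two sports texts that share no surface word other than "match" still score higher than a sports text paired with a politics text, because "football" and "soccer" share the same concepts.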
Abstract: Background: In 2015, following a call for proposals from the Special Programme for Research and Training in Tropical Diseases (TDR), six scoping reviews on the prevention and control of vector-borne diseases in urban areas were conducted. Those reviews provided a clear picture of the available knowledge and highlighted knowledge gaps, as well as needs and opportunities for future research. Based on the research findings of the scoping reviews, a concept mapping exercise was undertaken to produce a list of priority research needs to be addressed. Methods: Members of the six research teams responsible for the “VEctor boRne DiseAses Scoping reviews” (VERDAS) consortium’s scoping reviews met for 2 days with decision-makers from Colombia, Brazil, Peru, the Pan-American Health Organization, and the World Health Organization. A total of 11 researchers and seven decision-makers (from ministries of health, city and regional vector control departments, and vector control programs) completed the concept mapping, answering the question: “In view of the knowledge synthesis and your own expertise, what do we still need to know about vector-borne diseases and other infectious diseases of poverty in urban areas?” Participants rated each statement on two scales from 1 to 5, one relative to ‘priority’ and the other to ‘policy relevance’, and grouped statements into clusters based on their own individual criteria and expertise. Results: The final map consisted of 12 clusters. Participants considered those entitled “Equity”, “Technology”, and “Surveillance” to have the highest priority. The cluster considered the most important concerns equity issues, confirming that these issues are rarely addressed in research on vector-borne diseases. On the other hand, the “Population mobility” and “Collaboration” clusters were considered to be the lowest priority but remained identified by participants as research priorities. The average policy relevance scores for each of the 12 clusters were roughly the same as the priority scores for all clusters. Some issues