Abstract: With the dramatic development of the Internet, information processing and management technology on the WWW has become an important branch of data mining and data warehousing. Text mining in particular is emerging rapidly and plays an important role in related fields, so it is worth summarizing the topic from its definition to the related methods and techniques. Building on comparatively mature data mining technology, this paper presents a definition of text mining and a multi-stage text mining process model. It also surveys the key areas of text mining and some of the more powerful text analysis techniques, including automatic word segmentation, feature representation, feature extraction, text categorization, text clustering, text summarization, information extraction, and pattern quality evaluation. These techniques cover the whole process from information preprocessing to knowledge acquisition.
Funding: Supported by Agridata, a sub-program of the National Science and Technology Infrastructure Program (Grant No. 2005DKA31800).
Abstract: Feature representation is one of the key issues in data clustering. Existing feature representations of scientific data are insufficient, which to some extent degrades the results of scientific data clustering. The paper therefore proposes the concept of a composite text description (CTD) and a CTD-based feature representation method for biomedical scientific data. The method applies different feature weighting algorithms to candidate features drawn from two types of data sources, then combines and finally strengthens the two feature sets. Experiments show that the method is more effective than traditional feature representation methods and can significantly improve the performance of biomedical data clustering.
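The "weight per source, then combine and strengthen" idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which weighting algorithms or which combination rule the authors use, so everything here is an assumption: relative term frequency for one source, smoothed TF-IDF for the other, and a hypothetical `boost` factor applied to features that occur in both sources. The function names are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tf_weights(tokens):
    """Relative term frequency of each token within one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def tfidf_weights(tokens, corpus):
    """TF-IDF with a smoothed IDF; corpus is a list of token lists."""
    n_docs = len(corpus)
    weights = {}
    for term, tf in tf_weights(tokens).items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = tf * idf
    return weights


def combine(source_a, source_b, boost=1.5):
    """Merge two weighted feature sets.

    Assumption: features present in both sources are 'strengthened'
    by multiplying their summed weight by `boost` (hypothetical rule).
    """
    merged = {}
    for term in set(source_a) | set(source_b):
        weight = source_a.get(term, 0.0) + source_b.get(term, 0.0)
        if term in source_a and term in source_b:
            weight *= boost
        merged[term] = weight
    return merged


if __name__ == "__main__":
    # Two hypothetical text sources describing the same data record.
    title = ["gene", "expression", "dataset"]
    description = ["gene", "expression", "measured", "in", "mouse", "liver"]
    corpus = [description, ["protein", "structure", "in", "yeast"]]

    features = combine(tf_weights(title), tfidf_weights(description, corpus))
    for term, weight in sorted(features.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {weight:.3f}")
```

The merged dictionary can then be turned into a vector for any standard clustering algorithm; terms supported by both sources end up with the largest weights.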
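Abstract: The traditional end-to-end memory neural network model has limited feature representation ability and cannot represent the relations among individual memories well, so its accuracy on the positional reasoning and path finding tasks of the bAbI dataset is low. To address this problem, a memory neural network that combines dense connections with a multilayer perceptron is proposed. The model uses dense connections and fully connected layers to obtain relational features, which strengthens its feature representation ability. The model's reasoning accuracy is evaluated on the bAbI dataset, and the experimental results show that, compared with the traditional memory neural network and the end-to-end memory neural network, the model effectively improves the accuracy of text reasoning.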
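As a rough illustration of what "dense connections" combined with fully connected layers means, the sketch below implements a generic densely connected block in plain Python: each layer receives the concatenation of the original input and the outputs of all earlier layers. It is not the architecture from the paper, whose layer sizes, memory representation, and training procedure are not given in the abstract. The names `dense_block` and `init_layers`, the ReLU activation, and the random initialisation are all assumptions made for illustration.

```python
import random


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def init_layers(in_dim, growth, n_layers, seed=0):
    """Create random weights; layer i sees in_dim + i * growth inputs."""
    rng = random.Random(seed)
    layers = []
    dim = in_dim
    for _ in range(n_layers):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                   for _ in range(growth)]
        layers.append((weights, [0.0] * growth))
        dim += growth
    return layers


def dense_block(x, layers):
    """Forward pass with dense connectivity.

    Every layer is applied to the concatenation of the block input and
    all previous layer outputs, and its own output is appended in turn.
    """
    features = list(x)
    for weights, bias in layers:
        out = relu(linear(features, weights, bias))
        features = features + out  # concatenate, do not replace
    return features


if __name__ == "__main__":
    # Hypothetical 4-dimensional memory vector passed through 2 layers.
    layers = init_layers(in_dim=4, growth=3, n_layers=2)
    output = dense_block([1.0, 2.0, 3.0, 4.0], layers)
    print(len(output))  # 4 input features + 2 layers * 3 new features
```

Because earlier features are concatenated and never overwritten, later layers can relate the raw input to every intermediate representation, which is the general motivation for dense connectivity.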