Abstract
Hot topic extraction in the science and technology (S&T) management field proceeds through three text-mining stages: data acquisition and cleaning, information extraction, and topic analysis. Hot topics are extracted with the TF-IDF information extraction algorithm, and topics are clustered with agglomerative (merge-based) clustering, one of the co-occurrence methods. Hot topic extraction, trend analysis and cluster analysis together make it possible to forecast hot work in the field in advance and to support scientific decision-making, which helps make information in the government-affairs field more intelligent and knowledge-based.
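The two algorithms named in the abstract can be illustrated with a minimal sketch. It assumes documents that have already been collected, cleaned and segmented into word tokens (Chinese text would first need a word segmenter). It scores each term by TF-IDF summed over the corpus, then merges terms bottom-up according to how often they occur in the same documents. The function names `hot_terms` and `cluster_terms`, the summed aggregation, the Jaccard co-occurrence similarity, the average-linkage rule and the stopping `threshold` are all hypothetical choices for illustration, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def hot_terms(docs, top_k):
    """Rank terms by TF-IDF summed over all documents (the summing is an assumption)."""
    n = len(docs)
    df = Counter()                              # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    score = Counter()
    for doc in docs:
        for term, c in Counter(doc).items():
            tf = c / len(doc)                   # normalized term frequency
            idf = math.log(n / df[term])        # log(N / df); 0 for terms in every document
            score[term] += tf * idf
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(score, key=lambda t: (-score[t], t))[:top_k]


def cluster_terms(docs, terms, threshold):
    """Agglomerative clustering of terms by document co-occurrence (hypothetical sketch)."""
    # Set of document indices in which each term occurs
    occurs = {t: {i for i, doc in enumerate(docs) if t in doc} for t in terms}

    def sim(a, b):
        # Jaccard co-occurrence: shared documents / documents containing either term
        union = occurs[a] | occurs[b]
        return len(occurs[a] & occurs[b]) / len(union) if union else 0.0

    clusters = [[t] for t in terms]             # start with one cluster per term
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise similarity between the two clusters
            s = sum(sim(a, b) for a in clusters[i] for b in clusters[j])
            s /= len(clusters[i]) * len(clusters[j])
            if s > best:
                best, pair = s, (i, j)
        if best < threshold:                    # stop when no pair is similar enough
            break
        i, j = pair                             # i < j, so deleting j keeps index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Toy, pre-tokenized corpus (made-up data for illustration only)
docs = [
    ["project", "funding", "project", "review"],
    ["project", "funding", "budget"],
    ["patent", "transfer", "patent", "review"],
    ["patent", "transfer", "license"],
]
top = hot_terms(docs, top_k=6)
print(top)
print(cluster_terms(docs, top, threshold=0.3))
```

On this toy corpus, "review" occurs in half the documents with low frequency and drops out of the top six, and the remaining hot terms split into a patent-related group and a project-related group. A real system would more likely use a library implementation and tune the similarity measure and threshold on its own data.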
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2012, No. 7, pp. 109-111, 140 (4 pages)
Funding
Special Project of the Gansu Province Science and Technology Research and Development Fund (0912TCYA026)
Keywords
S&T management; Text mining; Information extraction