Abstract
Information collection systems generally require users to set collection rules manually, and they return large amounts of unprocessed information. To simplify user operation and deliver the required results directly, an intelligent information acquisition system is proposed. Targeting expert information, the system builds on a search engine: given an expert's name, affiliation (work unit), and field keywords, it searches for web pages related to the expert, normalizes the HTML documents, and extracts the main text of each page. Expert information is then identified automatically through natural language processing and semantic relevancy computation. Test results show that the system's precision is about 83.87%.
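The abstract names the pipeline stages (query construction from name, affiliation and field keywords; HTML normalization; main-text extraction; relevance-based identification) but gives no algorithmic details. The following is a minimal sketch of that flow under explicit assumptions, not the authors' method. The function names, the "longest text block" main-text heuristic, the bag-of-words cosine similarity (a stand-in for the paper's semantic relevancy measure), and the 0.2 threshold are all hypothetical. Tokenizing with `\w+` assumes space-delimited text; Chinese pages would need word segmentation first. The actual search-engine call is omitted.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


def build_query(name, unit, field_keywords):
    """Hypothetical: quote name, affiliation and field keywords as one search query."""
    terms = [name, unit, *field_keywords]
    return " ".join(f'"{t}"' for t in terms if t)


class _BlockCollector(HTMLParser):
    """Normalize HTML into visible text blocks, dropping script/style content."""
    BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3", "article", "section"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks, self._buf, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:  # ignore text inside <script>/<style>
            self._buf.append(data)

    def _flush(self):
        # Collapse whitespace; keep the block only if it has visible text.
        text = " ".join(" ".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html):
    """Hypothetical heuristic: treat the longest text block as the page's main text."""
    parser = _BlockCollector()
    parser.feed(html)
    parser.close()
    return max(parser.blocks, key=len, default="")


def _tf(text):
    """Term-frequency vector over lowercase \\w+ tokens (assumes space-delimited text)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_relevance(profile_terms, text):
    """Bag-of-words cosine similarity, standing in for 'semantic relevancy' (0.0 to 1.0)."""
    a, b = _tf(" ".join(profile_terms)), _tf(text)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_expert_page(html, name, unit, field_keywords, threshold=0.2):
    """Accept a page if its main text names the expert and clears a hypothetical threshold."""
    main = extract_main_text(html)
    if name.lower() not in main.lower():
        return False
    return cosine_relevance([name, unit, *field_keywords], main) >= threshold


if __name__ == "__main__":
    page = """<html><head><script>var x = 1;</script></head><body>
<div>Home | News | Contact</div>
<p>Zhang San is a professor at Example University working on information acquisition
and web page recognition.</p>
</body></html>"""
    print(build_query("Zhang San", "Example University", ["information acquisition"]))
    print(is_expert_page(page, "Zhang San", "Example University",
                         ["information acquisition", "web page recognition"]))
```

In this toy page the navigation bar is shorter than the paragraph, so the paragraph is chosen as the main text, and it shares all nine profile tokens, giving a cosine score of 0.75. A real system would need a more robust main-text extractor and a genuine semantic measure rather than raw term overlap.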
Source
Acta Armamentarii (兵工学报)
EI
CAS
CSCD
Peking University Core Journal
2009, Issue S1, pp. 130-134 (5 pages)
Acta Armamentarii
Keywords
computer application technology
information acquisition
intelligentization
main text extraction
web page recognition