Abstract
Information extraction has become an important means of providing high-quality information services. This paper presents a mechanism for automatic information extraction and similarity retrieval over Chinese texts. The user's interests are represented as a semantic template, and the keywords are conceptually expanded so that an initial set of candidate texts can be obtained through a search engine. Building on a concept-trigger mechanism and partial parsing, the text semantic templates are filled using mapping rules from semantic relations to template slots, which yields a structured text database. Because information expressed in natural-language text is fuzzy, a similarity relation between user queries and text semantic templates is defined and used to implement similarity retrieval. The numerical features of a text can also be extended with fuzzy mathematics and included in the similarity computation. The mechanism is shown to improve the efficiency of user queries and to meet users' information needs more fully.
Source
《小型微型计算机系统》
CSCD
PKU Core Journal (北大核心)
2007, No. 11, pp. 2074-2079 (6 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 60373095, 60673039).
Keywords
information extraction
semantic templates
conceptual expansion
fuzzy semantics