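Abstract
Ontology construction is one of the key factors determining the success of the Semantic Web. Drawing on results from machine learning and natural language processing, this paper attempts to construct ontologies semi-automatically. Using specialized research papers as the corpus, it applies an N-Gram text representation to extract key concepts from the corpus and computes their topic degree (主题度) to obtain domain concepts. An improved hierarchical clustering algorithm then clusters the domain concepts to derive their hierarchy, and a method combining syntactic analysis with statistics extracts candidate subject-predicate-object patterns from the corpus as a reference for domain relations. Taking agricultural history as an example, an experimental system for semi-automatic domain ontology construction was designed and developed. The paper focuses on how the key techniques in ontology construction were implemented: concept acquisition, construction of hierarchical relations and domain relations, and formalization.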
The success of the Semantic Web depends strongly on the proliferation of ontologies, which requires fast and easy ontology engineering and avoidance of the knowledge acquisition bottleneck. This paper takes an approach to constructing ontologies automatically that integrates knowledge acquisition with machine learning techniques to make ontology construction more effective. The approach covers domain concept acquisition, taxonomic relation recognition, non-taxonomic relation recognition, and formal ontology description. Domain candidate concepts are acquired with a dictionary-free Chinese word segmentation technique based on N-Grams. Property relations among domain concepts are recognized with an NLP-based method that extracts the subject, predicate, and object of sentences. The resulting triples can be treated as datatype property and object property triples.
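As a rough illustration of dictionary-free, N-Gram-based candidate extraction, the sketch below counts character n-grams across sentences and keeps those that recur. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `char_ngrams` and `candidate_terms`, the n-gram lengths, the `min_count` threshold, and the sample sentences are all hypothetical, and plain frequency stands in for the topic-degree measure, whose formula the abstract does not give.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def candidate_terms(sentences, n_values=(2, 3, 4), min_count=2):
    """Count character n-grams over all sentences and keep the frequent ones.

    Assumption: raw frequency is used as a simple stand-in filter; the
    paper's topic-degree computation is not reproduced here.
    """
    counts = Counter()
    for sentence in sentences:
        for n in n_values:
            counts.update(char_ngrams(sentence, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical two-sentence corpus, for illustration only.
sentences = ["农业史研究", "中国农业史"]
terms = candidate_terms(sentences)
```

With this toy corpus only the n-grams shared by both sentences survive the threshold. A real corpus would need a stronger filter to discard fragments such as 业史 that recur only because they are part of a longer term.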
Source
《情报学报》
CSSCI
Peking University Core Journal (北大核心)
2009, Issue 2, pp. 201-207 (7 pages)
Journal of the China Society for Scientific and Technical Information
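Funding
Supported by "Research on the Digital Preservation and Utilization of China's Agricultural Science and Technology Heritage" (sub-project of a Ministry of Science and Technology Special Fund for Social Public Welfare project, 2005DIB6J028)
and by the Nanjing Agricultural University Youth Innovation Fund (Y200727).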
Keywords
domain ontology
semi-automatic construction
concept extraction
hierarchy relation
domain relation
S-P-O mode