Abstract
Entity-relation trigger word dictionaries are traditionally built either manually or with supervised expansion learning. Both approaches require substantial human effort, and the dictionary must be rebuilt whenever the relation types change. This paper proposes an unsupervised method for constructing an entity-relation trigger word dictionary automatically. First, the relation instance document set is modeled with a hierarchical Dirichlet process, and a candidate trigger word set is built by topic filtering and by filtering on word probability weights. Then, dependency parsing is used to filter the candidate set a second time, yielding the final trigger word dictionary. The method avoids the extensive human involvement that traditional dictionary construction requires. Experiments show that the method, based on the hierarchical Dirichlet process and dependency parsing, reduces manual annotation cost and achieves relatively high accuracy.
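The two filtering stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual filtering criteria, so everything below is an assumption made for illustration: the topic-word probabilities are taken as already produced by some hierarchical Dirichlet process implementation, the thresholds `min_weight` and `min_prob` are hypothetical, and the dependency rule (keep a candidate only if it is directly linked to an entity mention by a dependency arc) is one plausible reading, not the authors' stated rule. All function names and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input shapes (not taken from the paper):
#   topic_weights: topic id -> overall weight of the topic in the document set
#   topic_words:   topic id -> {word: probability of the word under that topic}
#   Sentence:      (entity mentions, dependency arcs as (head, dependent) pairs)
Sentence = Tuple[Set[str], List[Tuple[str, str]]]


def filter_topics(topic_weights: Dict[int, float], min_weight: float) -> Set[int]:
    """Topic filtering: keep topics whose weight reaches an assumed threshold."""
    return {topic for topic, weight in topic_weights.items() if weight >= min_weight}


def candidate_triggers(
    topic_words: Dict[int, Dict[str, float]],
    kept_topics: Set[int],
    min_prob: float,
) -> Set[str]:
    """Word probability weight filtering: keep high-probability words of kept topics."""
    return {
        word
        for topic in kept_topics
        for word, prob in topic_words.get(topic, {}).items()
        if prob >= min_prob
    }


def dependency_filter(candidates: Set[str], sentences: Iterable[Sentence]) -> Set[str]:
    """Keep candidates joined to an entity mention by at least one dependency arc."""
    confirmed: Set[str] = set()
    for entities, arcs in sentences:
        for head, dependent in arcs:
            if head in candidates and dependent in entities:
                confirmed.add(head)
            elif dependent in candidates and head in entities:
                confirmed.add(dependent)
    return confirmed


def build_trigger_dictionary(
    topic_weights: Dict[int, float],
    topic_words: Dict[int, Dict[str, float]],
    sentences: Iterable[Sentence],
    min_weight: float,
    min_prob: float,
) -> Set[str]:
    """Chain the three stages: topic filter, word filter, dependency filter."""
    kept = filter_topics(topic_weights, min_weight)
    candidates = candidate_triggers(topic_words, kept, min_prob)
    return dependency_filter(candidates, sentences)


# Toy data standing in for HDP output and parsed relation instances.
topic_weights = {0: 0.8, 1: 0.2}
topic_words = {
    0: {"appointed": 0.30, "hired": 0.25, "yesterday": 0.15, "the": 0.02},
    1: {"rain": 0.5},
}
sentences = [
    (
        {"Acme", "Alice"},
        [("appointed", "Acme"), ("appointed", "Alice"), ("appointed", "yesterday")],
    )
]

print(sorted(build_trigger_dictionary(topic_weights, topic_words, sentences, 0.5, 0.1)))
```

On this toy input, topic 1 is dropped by the topic filter, `the` is dropped by the probability filter, and `hired` and `yesterday` are dropped by the dependency filter because neither is attached to an entity mention, leaving only `appointed`. In practice the topic-word probabilities and dependency arcs would come from an HDP library and a dependency parser rather than hand-written dictionaries.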
Source
《计算机应用与软件》
CSCD
2016, No. 5, pp. 72-76 (5 pages)
Computer Applications and Software
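Funding
National High Technology Research and Development Program of China (2011AA7032030D)
PLA Military Postgraduate Research Project (Military Science, YJS1062)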
Keywords
Entity-relation trigger word dictionary
Hierarchical Dirichlet process
Dependency parsing