Abstract
Entity recognition is a fundamental problem in natural language processing (NLP) and the basis of information extraction; its accuracy directly affects the precision of downstream tasks such as syntactic analysis and discourse understanding. "Entropy" was originally a thermodynamic concept used to measure uncertainty: the greater the entropy, the greater the uncertainty. A maximum entropy model combines many features into a single model and, among all models that satisfy the constraints derived from those features, selects the one with the largest entropy. Because it can take into account diverse observed probabilistic knowledge, whether the pieces of evidence are related or unrelated, it achieves good results on many problems. This paper experimentally analyzes the characteristics of longest location entities (LLE) in news reports and applies the maximum entropy model to recognize them.
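The abstract does not state the paper's feature templates, tag set, or training algorithm. The Python sketch below is therefore only an illustration, under stated assumptions, of the two ideas the abstract relies on. First, entropy grows with uncertainty. Second, a conditional maximum entropy model has the log-linear form p(y|x) = exp(sum_i w_i f_i(x, y)) / Z(x), which is equivalent to multinomial logistic regression. The feature strings (such as `suffix=市`, `prev=在`), the labels `LOC`/`O`, and the toy training set are hypothetical. Training uses plain gradient ascent with L2 regularization rather than the iterative scaling methods (GIS/IIS) traditionally used for maxent models.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits; a larger value means more uncertainty."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


class MaxEntClassifier:
    """Conditional maximum entropy model p(y|x) = exp(sum_i w_i f_i(x,y)) / Z(x).

    Each feature is an indicator pair (feature_string, label). Weights are
    fit by gradient ascent on the L2-regularized conditional log-likelihood
    (an assumption for illustration, not necessarily the paper's method).
    """

    def __init__(self, labels, lr=0.5, epochs=200, l2=0.01):
        self.labels = list(labels)
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.w = defaultdict(float)  # (feature, label) -> weight

    def _scores(self, feats):
        return {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}

    def prob(self, feats):
        scores = self._scores(feats)
        m = max(scores.values())  # subtract the max for numerical stability
        exp = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exp.values())  # normalizer Z(x)
        return {y: e / z for y, e in exp.items()}

    def fit(self, data):
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.prob(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0  # empirical feature count
                    for y in self.labels:
                        grad[(f, y)] -= p[y]  # expected count under the model
            for k in set(grad) | set(self.w):
                self.w[k] += self.lr * (grad[k] - self.l2 * self.w[k])
        return self

    def predict(self, feats):
        p = self.prob(feats)
        return max(p, key=p.get)


# Hypothetical toy data: context features of a candidate token -> tag.
TRAIN = [
    ({"suffix=市", "prev=在"}, "LOC"),
    ({"suffix=省", "prev=位于"}, "LOC"),
    ({"suffix=县", "prev=在"}, "LOC"),
    ({"suffix=了", "prev=他"}, "O"),
    ({"suffix=说", "prev=记者"}, "O"),
]

if __name__ == "__main__":
    clf = MaxEntClassifier(["LOC", "O"])
    # With no active features the model is uniform: the maximum-entropy choice.
    print(clf.prob(set()), entropy(clf.prob(set()).values()))
    clf.fit(TRAIN)
    print(clf.predict({"suffix=市", "prev=到"}))
```

Before training, and for any input with no known features, the model returns the uniform distribution. This matches the maximum entropy principle: without constraints, the model commits to nothing beyond what the features support.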
Source
Journal of Guangdong University of Petrochemical Technology (《广东石油化工学院学报》)
2012, No. 4, pp. 40-42, 45 (4 pages)
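Funding
Zhanjiang Normal University (湛江师范学院) institutional project "Extraction and Study of Location Entity Mentions in Emergency Events" (QL1110)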
Keywords
Longest Location Entity (LLE)
Entity Recognition
Maximum Entropy