Abstract
Conditional random fields (CRFs) are among the best statistical models for attribute labeling of Web objects. A standard CRF, however, does not fully exploit the feature relationships between Web objects and attribute labels. To address this problem, the paper proposes a boosted constrained conditional random fields model. Motivated by the maximum margin criterion, the model adds constraints and a boosting factor to the original CRF to improve labeling accuracy. The weights of the feature functions are estimated by maximum likelihood, and the Viterbi algorithm is used for prediction. A validation set is introduced into the dataset to select the best boosting factor. Experimental results show that the model effectively improves attribute labeling accuracy for Web objects.
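The prediction step named in the abstract, Viterbi decoding, can be illustrated with a minimal sketch for a generic linear-chain model. This is not the paper's implementation: the abstract does not specify the feature functions, the constraints, or how the boosting factor enters the model, so the sketch assumes the weighted feature sums have already been collapsed into two hypothetical score arrays, `emission` and `transition`.

```python
import numpy as np


def viterbi(emission, transition):
    """Return the highest-scoring label sequence and its score.

    emission:   (T, K) array; emission[t, k] is the score of label k at position t.
    transition: (K, K) array; transition[i, j] is the score of moving from label i to j.
    Both arrays are assumed inputs holding unnormalized log-scores.
    """
    T, K = emission.shape
    score = np.empty((T, K))
    backptr = np.zeros((T, K), dtype=int)

    score[0] = emission[0]
    for t in range(1, T):
        # candidate[i, j]: best score ending in label i at t-1, then moving to j at t
        candidate = score[t - 1][:, None] + transition + emission[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score[t] = candidate.max(axis=0)

    # Follow the back-pointers from the best final label to the start.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(score[-1].max())
```

Working with additive log-scores keeps the recursion to a max over sums and avoids numerical underflow on long sequences; the run time is O(T·K²) for T positions and K labels.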
Source
《计算机科学与探索》
CSCD
2014, No. 9, pp. 1129-1136 (8 pages)
Journal of Frontiers of Computer Science and Technology
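Funding
National Natural Science Foundation of China
Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education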
Keywords
constrained conditional random fields
boosting factor
attribute labeling
Web object
maximum margin