Web Object Attribute Labeling Based on Constrained Conditional Random Fields
Abstract: Conditional random fields (CRFs) are currently among the best statistical models for labeling the attributes of Web objects. However, a CRF does not make full use of the feature relationships between Web objects and their attribute labels. To address this, the paper proposes a boosted constrained conditional random fields model. Drawing on the maximum-margin principle, the model adds constraints and a boosting factor to the original CRF to improve labeling accuracy. The weights of the model's feature functions are estimated by maximum likelihood, and labels are predicted with the Viterbi algorithm. A validation set is held out from the dataset to select the optimal boosting factor. Experimental results show that the model effectively improves the accuracy of Web object attribute labeling.
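The abstract relies on standard linear-chain CRF machinery: maximum likelihood training, Viterbi decoding, and tuning a scalar on a held-out validation set. The following is a minimal NumPy sketch of those three pieces. It assumes the weighted feature functions have already been collapsed into a per-position emission score matrix `emit` (T x K) and a label-transition matrix `trans` (K x K). `viterbi` and `log_likelihood` follow the textbook algorithms. `select_boost`, the `bonus` matrix, and the additive way the factor enters the scores are hypothetical illustrations of validation-set tuning only. They are not the paper's constrained, max-margin formulation, which the abstract does not spell out.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))


def path_score(emit, trans, labels):
    """Unnormalized score of one label sequence: emissions plus transitions."""
    s = emit[0, labels[0]]
    for t in range(1, len(labels)):
        s += trans[labels[t - 1], labels[t]] + emit[t, labels[t]]
    return float(s)


def log_likelihood(emit, trans, labels):
    """log p(labels | x) = score(labels) - log Z(x), with Z from the forward algorithm.

    Maximum likelihood estimation maximizes the sum of this quantity over the
    training sequences with respect to the weights that produce emit and trans."""
    alpha = emit[0].copy()  # log-sum of scores of all prefixes ending in each label
    for t in range(1, emit.shape[0]):
        alpha = _logsumexp(alpha[:, None] + trans + emit[t][None, :], axis=0)
    return path_score(emit, trans, labels) - float(_logsumexp(alpha, axis=0))


def viterbi(emit, trans):
    """Highest-scoring label sequence and its score for a linear-chain CRF."""
    T, K = emit.shape
    score = emit[0].copy()               # best score of a path ending in each label
    back = np.zeros((T, K), dtype=int)   # back-pointers to the best previous label
    for t in range(1, T):
        # cand[i, j]: best path ending in i at t-1, then i -> j, plus emission of j at t
        cand = score[:, None] + trans + emit[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(score.max())


def select_boost(val_set, trans, candidates):
    """Hypothetical: pick the factor with the best token accuracy on a validation set.

    val_set holds (emit, bonus, gold) triples; bonus[t, y] is an illustrative
    constraint indicator (1 if label y satisfies a constraint at position t)."""
    best_alpha, best_acc = None, -1.0
    for alpha in candidates:
        correct = total = 0
        for emit, bonus, gold in val_set:
            pred, _ = viterbi(emit + alpha * bonus, trans)
            correct += sum(p == g for p, g in zip(pred, gold))
            total += len(gold)
        acc = correct / total
        if acc > best_acc:  # strict '>' keeps the first candidate on ties
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc
```

Only the validation accuracy drives the choice of factor; training data never sees it. This matches the role the abstract gives the validation set, whatever form the real boosting factor takes.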
Authors: Wu Qin, Huang Yanjiao
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》, CSCD-indexed), 2014, No. 9, pp. 1129-1136 (8 pages)
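Funding: National Natural Science Foundation of China; Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education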
Keywords: constrained conditional random fields; boosting factor; attribute labeling; Web object; maximum margin