Research on Automatic Extraction of Product "Attribute-Value" Relations from Web Pages (cited by: 7)

Automatic Extraction of the Product "Attribute-Value" Pair from the Webpages
Abstract: Automatically mining product attributes and their corresponding values is valuable for many Web-based applications, such as market demand analysis and forecasting, product recommendation, supplier selection, and after-sales service. This paper proposes a template (pattern) construction method based on Web page titles that extracts complete product "attribute-value" pairs from structured or semi-structured Web pages. The approach has four key components: 1) acquire a domain-specific attribute lexicon from the titles of product Web pages in the same domain; 2) refine text nodes using a set of preset delimiters; 3) collect seed "attribute-value" pairs using the domain-specific attribute lexicon; 4) filter and construct high-quality patterns by combining page-specific layout information with character information. The experimental corpus covers two domains, digital cameras and mobile phones, on which the proposed method achieves 94.68% precision and 90.57% recall.
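The abstract gives only an outline of the pipeline, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It covers steps 2 to 4 in a heavily simplified form. The delimiter set, the `(layout_path, text)` node representation, and the "most frequent seed path" template rule are all hypothetical choices made here. Step 1 (building the attribute lexicon from page titles) is not shown, so the lexicon is passed in directly.

```python
import re
from collections import Counter

# Assumed delimiter set (ASCII and full-width colon); the abstract does not
# list the actual preset delimiters.
DELIMITERS = re.compile(r"\s*[:：]\s*")


def split_node(text):
    """Step 2 (simplified): split a text node at its first delimiter.

    Returns (attribute, value), or None if the node has no usable delimiter.
    """
    parts = DELIMITERS.split(text.strip(), maxsplit=1)
    if len(parts) == 2 and parts[0] and parts[1]:
        return parts[0], parts[1]
    return None


def seed_pairs(nodes, lexicon):
    """Step 3 (simplified): keep pairs whose attribute is in the lexicon."""
    seeds = []
    for path, text in nodes:
        pair = split_node(text)
        if pair and pair[0].lower() in lexicon:
            seeds.append((path, pair))
    return seeds


def extract_pairs(nodes, lexicon):
    """Step 4 (simplified): use the most common seed layout path as the
    template and extract every delimiter-split node that shares it."""
    seeds = seed_pairs(nodes, lexicon)
    if not seeds:
        return []
    template_path = Counter(path for path, _ in seeds).most_common(1)[0][0]
    pairs = []
    for path, text in nodes:
        pair = split_node(text)
        if path == template_path and pair:
            pairs.append(pair)
    return pairs


# Hypothetical page: each text node carries a made-up layout path.
nodes = [
    ("table/tr/td", "Pixels: 18 MP"),
    ("table/tr/td", "Zoom: 3x"),
    ("table/tr/td", "Weight: 570 g"),
    ("div/p", "Note: free shipping"),
    ("div/p", "Buy now"),
]
lexicon = {"pixels", "zoom"}  # assumed output of step 1
print(extract_pairs(nodes, lexicon))
# [('Pixels', '18 MP'), ('Zoom', '3x'), ('Weight', '570 g')]
```

In this toy page, "Weight" is not in the lexicon but is still extracted because it shares the layout path of the seed pairs, while "Note: free shipping" is rejected because it sits under a different path. This mirrors the idea of using seeds to induce a page-specific template. The character information that the abstract also mentions is not modeled here.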
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2013, No. 1, pp. 21-29, 38 (10 pages)
Funding: National Natural Science Foundation of China (60970057); National Natural Science Foundation of China (61003152); Natural Science Foundation of Suzhou (SYG201030)
Keywords: product "attribute-value" relation extraction; web data mining; template construction