Research on market data extraction and forecast on Web
Original title: Web中的行情数据获取与预测研究
Cited by: 2
Abstract: Extracting market data from Web pages for prediction and analysis is of considerable practical value. This paper proposes an algorithm for extracting market data from Web pages. It relies on empirical regularities such as "market data usually appear in the table that occupies the largest region of a Web page": the algorithm first identifies the largest data table automatically, then converts it into a DOM tree, and finally extracts the values of the tree's nodes. Unlike traditional algorithms, it locates the market data region automatically and does not require the user to define the extraction region. A prototype system for agricultural product price prediction is also designed and implemented: for a given agricultural product, it automatically retrieves price data from a specified website and forecasts monthly prices. Experiments indicate that the prediction performance is good.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2009, Issue 20, pp. 202-204, 248 (4 pages)
Funding: Anhui Province Scientific Research Project, No. KJ2008B033
Keywords: Web content mining; market data extraction; market data prediction
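
The extraction procedure summarized in the abstract (find the largest data table, then read out its cell values) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions: "largest" is approximated by cell count rather than rendered area, the standard library's event-based `html.parser` stands in for building a full DOM tree, and the names `TableCollector`, `largest_table`, and the sample page with its prices are all hypothetical.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in the order they are closed
        self._stack = []   # tables currently open (nested tables are kept apart)
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append([])
        elif tag == "tr" and self._stack:
            self._stack[-1].append([])
        elif tag in ("td", "th") and self._stack and self._stack[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            # Store the cell only if a row is open to receive it.
            if self._cell is not None and self._stack and self._stack[-1]:
                self._stack[-1][-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "table" and self._stack:
            self.tables.append(self._stack.pop())


def largest_table(html):
    """Return the table with the most cells (assumed proxy for 'largest region')."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    if not parser.tables:
        return []
    return max(parser.tables, key=lambda rows: sum(len(r) for r in rows))


# Hypothetical page: a small navigation table and a larger price table.
PAGE = """
<html><body>
<table><tr><td>Home</td><td>About</td></tr></table>
<table>
  <tr><th>Product</th><th>Date</th><th>Price</th></tr>
  <tr><td>Wheat</td><td>2009-01</td><td>1.62</td></tr>
  <tr><td>Wheat</td><td>2009-02</td><td>1.65</td></tr>
</table>
</body></html>
"""

rows = largest_table(PAGE)
for row in rows:
    print(row)
```

Under these assumptions the navigation table (2 cells) loses to the price table (9 cells), so no user-defined extraction region is needed. A production version would likely clean the markup first and measure table size differently.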