Research on market data extraction and forecast on Web (times cited: 2)

Abstract: Extracting market data from Web pages for prediction and analysis has considerable practical value. This paper proposes an algorithm for extracting market data from the Web. The algorithm relies mainly on empirical regularities such as "market data are usually displayed on a Web page as the data table that occupies the largest area". It first automatically detects the largest data table, then converts that table into a DOM tree, and finally extracts the values of the tree's nodes. Unlike traditional algorithms, it locates the market-data region automatically, so users do not need to specify a data extraction region. A prototype system for agricultural product price prediction is also designed and developed. For a given agricultural product, the system automatically retrieves price data from a specified website and predicts prices for the coming months. Experimental results show that the prediction performance is good.
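The three extraction steps named in the abstract (detect the largest table, convert it into a DOM tree, read the node values) can be sketched with Python's standard library. This is not the authors' implementation. The abstract measures a table by its displayed area, which cannot be computed without rendering the page. The sketch therefore assumes cell count (the number of `td`/`th` cells in the table's own rows) as a stand-in for area. It also assumes reasonably well-formed HTML in which every row and cell is explicitly closed. Messier pages could first be normalized with a cleaner such as HTML Tidy. The names `Node`, `TreeBuilder`, and `extract_largest_table` are hypothetical.

```python
from html.parser import HTMLParser

# Sketch only: cell count stands in for the "largest area" rule.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal DOM-like element; children are Node objects or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree; assumes rows and cells are explicitly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no content and no end tag
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def find_tables(node):
    """All <table> elements under node in document order, nested ones included."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == "table":
                found.append(child)
            found.extend(find_tables(child))
    return found


def own_rows(node):
    """<tr> rows belonging to this table, excluding rows of nested tables."""
    rows = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == "tr":
                rows.append(child)
            elif child.tag != "table":
                rows.extend(own_rows(child))  # descend into thead/tbody/tfoot
    return rows


def cells(row):
    return [c for c in row.children if isinstance(c, Node) and c.tag in ("td", "th")]


def raw_text(node):
    return "".join(c if isinstance(c, str) else raw_text(c) for c in node.children)


def text_of(node):
    """Node value: all descendant text with whitespace collapsed."""
    return " ".join(raw_text(node).split())


def extract_largest_table(html):
    """Rows of the table with the most cells, as lists of strings ([] if none)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    tables = find_tables(builder.root)
    if not tables:
        return []
    largest = max(tables, key=lambda t: sum(len(cells(r)) for r in own_rows(t)))
    return [[text_of(c) for c in cells(r)] for r in own_rows(largest)]


if __name__ == "__main__":
    # Toy page with made-up values: a navigation table and a price table.
    page = ("<table><tr><td>Home</td><td>News</td></tr></table>"
            "<table><tr><th>Month</th><th>Price</th></tr>"
            "<tr><td>2008-01</td><td><b>3.20</b></td></tr></table>")
    for row in extract_largest_table(page):
        print(row)
```

When a table is sized, rows of nested tables are excluded. This matters for layouts that wrap a data table in an outer table. The outer layout table does not win just by enclosing the price table, because only its own cells are counted.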

Authors: Yu Chunyan, Hu Xuegang

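Affiliations: School of Computer and Information, Hefei University of Technology; Department of Computer Science and Technology, Chuzhou University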

Source: Computer Engineering and Applications (indexed in CSCD; Peking University core journal), 2009, No. 20, pp. 202-204, 248 (4 pages)

Funding: Anhui Provincial Research Project No. KJ2008B033

Keywords: Web content mining; market data extraction; market data prediction

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

References (12)

[1] Kusumura Y, Hijikata Y, Nishida S. Extracting fixed information from miscellaneous documents on net auction[C]//17th International Conference on Advanced Information Networking and Applications (AINA 2003). 2003: 446-453.
[2] Doorenbos R B, Etzioni O, Weld D. A scalable comparison-shopping agent for the World-Wide Web[C]//Proceedings of the 1st International Conference on Autonomous Agents. ACM, 1997.
[3] Laender A H F, Ribeiro-Neto B A, da Silva A S. A brief survey of Web data extraction tools[J]. ACM SIGMOD Record, 2002, 31(2): 84-93.
[4] Wang Yalin, Hu Jianying. Detecting tables in Web documents[C]//DAS 2002, LNCS 2423. 2002: 249-260.
[5] Krüpl B, Herzog M, Gatterbauer W. Using visual cues for extraction of tabular data from arbitrary HTML documents[C]//Proceedings of the 14th International Conference on World Wide Web (WWW 2005). ACM, 2005: 1000-1001.
[6] Gan Y. Structured and semantic data extraction from Web pages[C]//Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai. IEEE, 2004: 26-29.
[7] Wang Shu. Research on instance extraction methods for semantic networks[D]. University of Science and Technology of China, 2005.
[8] Gupta S, Kaiser G, Neistadt D. DOM-based content extraction of HTML documents[C]//Proceedings of the 12th International Conference on World Wide Web (WWW 2003). IEEE, 2003: 207-214.
[9] Li Xiaodong, Gu Yuqing. DOM-based Web information extraction[J]. Chinese Journal of Computers, 2002, 25(5): 526-533.
[10] Raggett D. Clean up your Web pages with HTML TIDY[EB/OL]. [2007-09-18]. http://www.w3.org/People/Raggett/tidy/.

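Citing literature (2)

[1] Liang Juan, Chen Zhi. An XML-based preprocessing method for Web content mining[J]. Computer Era, 2011(6): 45-46.
[2] Cao Wei, Jiang Wenming. Research on the preferences of tourism microblog users based on big data analysis[J]. Journal of Chuzhou University, 2019, 21(1): 41-44.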