Web Mining Technology Research Based on the Mass Redundant Web Filter
Original title: 基于海量冗余网页过滤的Web挖掘技术研究
Cited by: 2
Abstract: When an intelligent teaching system gathers teaching resources by searching web pages for keywords, many garbage web pages contain the same keywords, which makes it hard to mine teaching resources quickly from the mass of web information. Traditional keyword lookup suffers from these garbage pages: the search volume becomes too large, and teaching resources are not obtained in a timely manner. To address this, the paper proposes applying Web information extraction to intelligent teaching resource mining. Relevant Web pages are first fetched in batches according to the resource requirements. The pages are then cleaned using the XPath language, combining the search request with the features of each page's topic information blocks. Finally, the required teaching resources are mined using a Web text feature model. Simulation experiments show that the method effectively overcomes interference from garbage web pages and completes teaching resource mining quickly, with satisfactory results.
Author: Zhao Xi (赵玺)
Source: Bulletin of Science and Technology (《科技通报》), PKU Core journal, 2013, No. 4, pp. 21-22, 25 (3 pages)
Keywords: intelligent teaching; garbage web page; information extraction
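
The cleaning step described in the abstract (XPath selection combined with the search request and topic-block features) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not give the actual XPath expressions, the topic-block features, or the Web text feature model. The sample page, the function names `block_text` and `clean_page`, the "contains a paragraph and enough words" heuristic, and the `min_words` threshold are all hypothetical. The sketch uses the limited XPath subset in Python's standard `xml.etree.ElementTree`, so it expects well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. Real pages are usually not valid XML
# and would need an HTML parser before any XPath query could be applied.
PAGE = """
<html><body>
  <div class="nav">Home | Courses | Login</div>
  <div class="ad">Cheap calculus answers, calculus calculus calculus</div>
  <div class="main">
    <h1>Calculus lecture notes</h1>
    <p>Limits and derivatives explained with worked examples for a first calculus course.</p>
    <p>Each chapter ends with exercises on derivatives.</p>
  </div>
</body></html>
"""


def block_text(elem):
    """Collect all text inside an element and normalize whitespace."""
    return " ".join("".join(elem.itertext()).split())


def clean_page(xml_text, query_terms, min_words=12):
    """Return the text of blocks that look like topic blocks and match the query.

    Assumed heuristic (not from the paper): a topic block is a <div> that has
    at least one direct <p> child and at least `min_words` words of text.
    """
    root = ET.fromstring(xml_text)
    kept = []
    # ".//div" is an XPath expression selecting every <div> below the root.
    for block in root.findall(".//div"):
        text = block_text(block)
        lowered = text.lower()
        looks_like_topic = block.find("p") is not None and len(text.split()) >= min_words
        matches_query = any(term.lower() in lowered for term in query_terms)
        if looks_like_topic and matches_query:
            kept.append(text)
    return kept


if __name__ == "__main__":
    for text in clean_page(PAGE, ["calculus"]):
        print(text)
```

In this sample the advertisement block repeats the keyword but is dropped because it has no paragraph structure, which is the kind of keyword-only garbage content the abstract is concerned with. The final mining step with a Web text feature model is not covered here.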
