Abstract
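With the rapid development of online information, the volume of available information keeps growing, which raises the question of how to obtain useful information from the vast Internet. Web text mining is an important application area of mining technology: given a classification scheme, it is the process of automatically determining the category of a web page from its content. This paper studies and discusses the key techniques involved, including the K-nearest neighbor (KNN) reference model, information extraction based on the Hidden Markov Model (HMM), and machine learning methods. It also presents the design and implementation of a text mining system based on information extraction, along with the priorities for further research.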
With the development of network technology, the Internet has spread ever more rapidly, and this ocean of information contains many types of complex data. Acquiring useful knowledge from it quickly is very difficult. Web-based text mining is a new research field that can address this problem effectively. This paper studies several key text mining techniques, including the K-Nearest Neighbor (KNN) model, Information Extraction (IE) based on the Hidden Markov Model (HMM), and machine learning. It also describes a text mining model based on IE and reports the results.
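To make the KNN classification step concrete, the following is a minimal, generic sketch of K-nearest-neighbor text categorization. It is not the paper's implementation: the abstract does not specify the feature weighting, similarity measure, or value of k, so whitespace tokenization, raw term-count vectors, cosine similarity, similarity-weighted voting, and the toy training corpus are all illustrative assumptions. Chinese web pages would additionally need a word segmenter before tokenization.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: text is already space-separated; Chinese would need word segmentation first.
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def knn_classify(doc, training, k=3):
    """Label doc by a similarity-weighted vote among its k most similar training documents."""
    query = Counter(tokenize(doc))
    scored = sorted(
        ((cosine(query, Counter(tokenize(text))), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim  # each neighbor votes with weight equal to its similarity
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus of (text, category) pairs under a fixed classification scheme.
training = [
    ("stock market shares rise investors", "finance"),
    ("bank interest rate inflation economy", "finance"),
    ("football match goal team player", "sports"),
    ("basketball team wins championship game", "sports"),
    ("new cpu chip faster processor released", "technology"),
    ("software update improves processor performance", "technology"),
]

if __name__ == "__main__":
    print(knn_classify("team scores goal in final match", training, k=3))
```

Weighting votes by similarity, rather than counting neighbors equally, keeps weakly related neighbors (here, documents with zero term overlap) from swinging the decision when k exceeds the number of truly similar documents.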
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2003, No. 34, pp. 167-169, 196 (4 pages)
Computer Engineering and Applications