This paper describes the functions, characteristics, and operating principles of search engines for Web text, as well as searching and data mining techniques for Web-based text information. Methods for computer-aided text clustering and abstracting are also presented. Finally, the paper offers guidelines for assessing search quality.
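Topic Detection and Tracking (TDT) is the task of processing information about popular ("hot") and sensitive topics on the Internet, and it has attracted wide attention from researchers. One of its subtasks is topic tracking: following hot and sensitive topics as new stories arrive. The key technologies of topic tracking are the topic/story representation model and the text classification algorithm. This paper therefore studies these key technologies and analyzes their strengths and weaknesses. It uses a topic/story representation model to represent topics and stories. It applies a text classification algorithm to judge whether each story is relevant to a topic, so that stories on the same topic can be tracked. It evaluates the tracking results with the TDT evaluation methodology. On this basis, it designs a general-purpose topic tracking system. The results indicate that the system has good application prospects.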
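The abstract does not say which representation model or classifier the system uses. As a minimal sketch, assume three things: a term-frequency vector space model for stories, a centroid of the training stories as the topic representation, and a cosine-similarity threshold as the relevance classifier. Under those assumptions, topic tracking and its evaluation could look like the code below. The evaluation uses the normalized detection cost from the NIST TDT evaluations (C_miss = 1, C_FA = 0.1, P_target = 0.02). The names `topic_model`, `track`, and `THRESHOLD`, and the threshold value, are hypothetical.

```python
import math
from collections import Counter

# Cost constants used in the NIST TDT evaluations.
C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02
THRESHOLD = 0.2  # hypothetical similarity threshold, normally tuned on held-out data


def story_vector(text):
    """Assumed story representation: raw term frequencies of whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(x * b.get(t, 0.0) for t, x in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_model(training_stories):
    """Assumed topic representation: mean term-frequency vector of the training stories."""
    center = Counter()
    for story in training_stories:
        for term, tf in story_vector(story).items():
            center[term] += tf / len(training_stories)
    return center


def track(topic, stories, threshold=THRESHOLD):
    """Return a YES (True) / NO (False) on-topic decision for each incoming story."""
    return [cosine(story_vector(s), topic) >= threshold for s in stories]


def normalized_detection_cost(decisions, labels):
    """TDT normalized detection cost from system decisions and ground-truth labels."""
    targets = sum(labels)
    non_targets = len(labels) - targets
    misses = sum(1 for d, y in zip(decisions, labels) if y and not d)
    false_alarms = sum(1 for d, y in zip(decisions, labels) if d and not y)
    p_miss = misses / targets if targets else 0.0
    p_fa = false_alarms / non_targets if non_targets else 0.0
    c_det = C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1 - P_TARGET)
    return c_det / min(C_MISS * P_TARGET, C_FA * (1 - P_TARGET))


topic = topic_model(["earthquake hits city", "earthquake rescue teams arrive in city"])
decisions = track(topic, ["city earthquake death toll rises", "football team wins cup"])
print(decisions)  # [True, False]
print(normalized_detection_cost(decisions, [True, False]))  # 0.0
```

A lower normalized cost is better. A system that always answers NO scores 1.0, so any useful tracker should stay below that.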
This paper presents a text classification algorithm that imitates how humans classify text. First, when building the feature vector, the algorithm increases the weight of title terms, on the assumption that a document's title reflects its content. Second, a weight parameter vector is introduced into the computation of the document cluster center to simulate human skimming and skipping behavior. In addition, the weights of features that occur in more positive examples than negative ones are increased. Experiments show that the algorithm substantially improves the performance of a text classification system.
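The following is a minimal, hypothetical sketch of the three ideas in this abstract, not the authors' implementation. It assumes a centroid (Rocchio-style) classifier over term-frequency vectors. The abstract gives no formulas, so the following are all illustrative assumptions:
- the constants `TITLE_BOOST` and `POSITIVE_BOOST`;
- the per-document `doc_weights`, which stand in for the skimming/skipping weight vector;
- the rule that a document goes to whichever class centroid it is closer to.

```python
import math
from collections import Counter

TITLE_BOOST = 2.0     # hypothetical: extra weight added per title occurrence of a term
POSITIVE_BOOST = 1.5  # hypothetical: multiplier for features seen in more positive docs


def doc_vector(title, body, title_boost=TITLE_BOOST):
    """Term-frequency vector in which title terms are up-weighted."""
    vec = Counter(body.lower().split())
    for term in title.lower().split():
        vec[term] += title_boost
    return vec


def weighted_centroid(vectors, doc_weights):
    """Cluster center as a weighted mean of document vectors.

    Assumption: one weight per document; a low weight mimics a document
    that a human reader would skim or skip.
    """
    total = sum(doc_weights)
    center = Counter()
    for vec, w in zip(vectors, doc_weights):
        for term, value in vec.items():
            center[term] += w * value / total
    return center


def boost_positive_features(center, pos_vectors, neg_vectors, boost=POSITIVE_BOOST):
    """Up-weight features whose document frequency is higher in positive examples."""
    pos_df = Counter(t for v in pos_vectors for t in v)
    neg_df = Counter(t for v in neg_vectors for t in v)
    return Counter({t: x * boost if pos_df[t] > neg_df[t] else x
                    for t, x in center.items()})


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(x * b.get(t, 0.0) for t, x in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(pos_docs, neg_docs, pos_weights=None, neg_weights=None):
    """Build positive and negative class centers from (title, body) pairs."""
    pos_vecs = [doc_vector(t, b) for t, b in pos_docs]
    neg_vecs = [doc_vector(t, b) for t, b in neg_docs]
    pos_weights = pos_weights or [1.0] * len(pos_vecs)
    neg_weights = neg_weights or [1.0] * len(neg_vecs)
    pos_center = boost_positive_features(
        weighted_centroid(pos_vecs, pos_weights), pos_vecs, neg_vecs)
    neg_center = weighted_centroid(neg_vecs, neg_weights)
    return pos_center, neg_center


def classify(title, body, model):
    """True if the document is closer to the positive center than to the negative one."""
    pos_center, neg_center = model
    vec = doc_vector(title, body)
    return cosine(vec, pos_center) >= cosine(vec, neg_center)


pos = [("stock market rally", "shares rose as the stock market rallied"),
       ("market gains", "investors bought shares and the market gained")]
neg = [("football final", "the team won the football final"),
       ("match report", "the striker scored twice in the match")]
model = train(pos, neg)
print(classify("market update", "shares and the stock market rose", model))  # True
```

Adding a fixed bonus for title terms, instead of multiplying their body frequency, keeps title words influential even when they never occur in the body.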