Abstract
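For a securities regulator, accurately classifying text drawn from the vast volume of online information is important for making day-to-day supervision more efficient. Building on data mining techniques and using the vector space model (VSM) to represent text, this paper proposes a multi-document feature extraction algorithm based on a cooperative coevolutionary genetic algorithm. The algorithm effectively reduces the dimensionality of the text feature vector and offers a feasible solution to multi-document feature acquisition problems such as obtaining text classification templates.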
Web text mining is a new research topic in knowledge discovery in databases (KDD) and has drawn great interest from many communities. This paper uses the vector space model (VSM) to describe Web text and presents a feature subset selection algorithm based on a cooperative evolution genetic algorithm. The algorithm can greatly reduce the dimension of the text feature vector and provides a new solution to multi-document feature vector extraction problems, such as acquiring a classification model for documents.
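The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes TF-IDF weights for the VSM, a nearest-centroid accuracy with a size penalty as the fitness, and a generic cooperative coevolution scheme in which the vocabulary is split into blocks and each block's bit mask is evolved by its own subpopulation. All function names and parameters (`build_vsm`, `fitness`, `coevolve`, `penalty`, `n_blocks`) are hypothetical.

```python
import math
import random
from collections import Counter

def build_vsm(docs):
    """Tokenise on whitespace and return (vocab, TF-IDF vectors)."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    df = Counter(t for doc in tokens for t in set(doc))
    n = len(docs)
    # Smoothed IDF (an assumption; the abstract names no weighting scheme).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokens:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def fitness(mask, vectors, labels, penalty=0.2):
    """Nearest-centroid accuracy on the masked vectors minus a size penalty."""
    keep = [i for i, bit in enumerate(mask) if bit]
    if not keep:
        return -1.0
    reduced = [[v[i] for i in keep] for v in vectors]
    centroids = {}
    for c in sorted(set(labels)):
        rows = [r for r, l in zip(reduced, labels) if l == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    hits = sum(
        max(centroids, key=lambda c: cosine(r, centroids[c])) == l
        for r, l in zip(reduced, labels)
    )
    return hits / len(labels) - penalty * len(keep) / len(mask)

def coevolve(vectors, labels, n_blocks=2, pop_size=8, generations=20, seed=0):
    """Evolve one subpopulation per vocabulary block; return the joined mask."""
    rng = random.Random(seed)
    n = len(vectors[0])
    bounds = [(k * n // n_blocks, (k + 1) * n // n_blocks) for k in range(n_blocks)]
    pops = [[[rng.randint(0, 1) for _ in range(hi - lo)] for _ in range(pop_size)]
            for lo, hi in bounds]
    best = [pop[0][:] for pop in pops]  # current representative of each block

    def assemble(k, part):
        # Combine a candidate for block k with the other blocks' representatives.
        full = []
        for j in range(n_blocks):
            full.extend(part if j == k else best[j])
        return full

    for _ in range(generations):
        for k, pop in enumerate(pops):
            scored = sorted(pop, key=lambda p: fitness(assemble(k, p), vectors, labels),
                            reverse=True)
            best[k] = scored[0][:]
            parents = scored[: pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
                child = a[:cut] + b[cut:]          # one-point crossover
                child[rng.randrange(len(child))] ^= 1  # one-bit mutation
                children.append(child)
            pops[k] = parents + children
    return [bit for part in best for bit in part]

# Toy usage with made-up documents and labels.
docs = ["stock price rises", "stock market falls",
        "football match score", "football team wins"]
labels = ["finance", "finance", "sport", "sport"]
vocab, vectors = build_vsm(docs)
mask = coevolve(vectors, labels)
selected = [t for t, bit in zip(vocab, mask) if bit]
```

Here `selected` holds the terms kept by the evolved mask. The size penalty is what pushes the search toward a lower-dimensional feature vector, which is the effect the abstract describes.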
Source
《计算机工程》
EI
CAS
CSCD
PKU Core (北大核心)
2005, Issue 4, pp. 85-87 (3 pages)
Computer Engineering
Keywords
Cooperative evolution
VSM
Genetic algorithm
Text feature extraction