Abstract
Automatic indexing systems for Web information and the index databases of search engines are mostly built using weighted term-frequency statistics, but the weights of the indexing sources are difficult to determine. To obtain a sound weighting scheme, and starting from the principle that index terms should reflect the subject content of a document, an improved scheme for setting indexing-source weights is proposed: automatic indexing of Web information based on a genetic algorithm. The scheme allows the indexing-source weights to be adjusted dynamically according to the content being indexed, which effectively improves the rationality and accuracy of the weight settings. The agreement rate between automatic and manual indexing results reaches 87.9%, indicating strong practical usability.
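The abstract does not give the chromosome encoding, fitness function, or genetic operators, so the following is only a minimal sketch of the general idea, under stated assumptions: a chromosome is a vector of non-negative weights, one per indexing source; fitness is the mean overlap between machine-selected and human-selected index terms (a stand-in for the human-machine agreement rate); selection is elitist truncation with one-point crossover and Gaussian mutation. The source names `title`, `heading`, and `body`, the toy corpus, and all function names are hypothetical and not taken from the paper.

```python
import random
from collections import Counter

# Hypothetical indexing sources (locations in a Web page); assumed, not from the paper.
SOURCES = ["title", "heading", "body"]

def score_terms(doc, weights):
    """Weighted term frequency: each occurrence adds the weight of its source."""
    scores = Counter()
    for source, weight in zip(SOURCES, weights):
        for term in doc.get(source, []):
            scores[term] += weight
    return scores

def top_terms(doc, weights, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = score_terms(doc, weights)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])

def fitness(weights, corpus):
    """Assumed fitness: mean share of human-chosen terms that the machine also picks."""
    total = 0.0
    for doc, human_terms in corpus:
        machine = top_terms(doc, weights, len(human_terms))
        total += len(machine & human_terms) / len(human_terms)
    return total / len(corpus)

def evolve(corpus, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Search for source weights with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.uniform(0, 10) for _ in SOURCES] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, corpus), reverse=True)
        survivors = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(SOURCES))  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:
                i = rng.randrange(len(child))
                child[i] = max(0.0, child[i] + rng.gauss(0, 1))  # keep weights non-negative
            children.append(child)
        population = survivors + children
    return max(population, key=lambda w: fitness(w, corpus))

# Toy corpus of (document, manually assigned index terms) pairs; purely illustrative.
TOY_CORPUS = [
    ({"title": ["genetic", "algorithm"], "heading": ["indexing"],
      "body": ["web", "web", "web", "page"]}, {"genetic", "algorithm"}),
    ({"title": ["search", "engine"], "heading": ["index"],
      "body": ["data", "data", "data", "search"]}, {"search", "engine"}),
]
```

Calling `evolve(TOY_CORPUS)` returns one weight per source. In this sketch the weights are fitted once against a fixed set of manually indexed documents; how the paper adjusts them dynamically per document is not stated in the abstract.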
Source
Journal of Jilin University (Information Science Edition) (《吉林大学学报(信息科学版)》)
CAS
2006, No. 5, pp. 542-547 (6 pages)
Keywords
automatic indexing
weight value
genetic algorithm