Abstract
The term-weighting formula of the Vector Space Model (VSM) considers only how often a term occurs in documents. To address this, the concept of a term's domain weight is proposed to improve VSM weight calculation. In addition, keyword distance and keyword order information are combined to compute sentence similarity. Comparative S@n tests on retrieval from the FAQ database of a specific course show that the improved similarity model increases the S@n value.
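The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea under stated assumptions. The hypothetical `domain_weight` table (term to weight) is assumed to be given and to multiply the usual tf-idf weight. The order and distance measures and the mixing weights `alpha`, `beta` and `gamma` are illustrative choices, not the authors' definitions. S@n is taken here to mean the share of queries whose correct FAQ entry appears among the top n retrieved results.

```python
import math
from collections import Counter


def weighted_vector(tokens, idf, domain_weight):
    """tf * idf * domain weight per term (multiplicative combination is an assumption)."""
    tf = Counter(tokens)
    return {t: c * idf.get(t, 1.0) * domain_weight.get(t, 1.0) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def shared_keywords(a, b):
    """Keywords present in both sentences, ordered by first occurrence in a."""
    in_b, seen, out = set(b), set(), []
    for t in a:
        if t in in_b and t not in seen:
            seen.add(t)
            out.append(t)
    return out


def order_similarity(a, b):
    """Hypothetical order measure: fraction of shared-keyword pairs kept in the same order."""
    common = shared_keywords(a, b)
    if len(common) < 2:
        return 1.0 if common else 0.0
    pos_b = {t: b.index(t) for t in common}  # first occurrence in b
    pairs = concordant = 0
    for i in range(len(common)):
        for j in range(i + 1, len(common)):
            pairs += 1
            if pos_b[common[i]] < pos_b[common[j]]:
                concordant += 1
    return concordant / pairs


def keyword_span(tokens, keywords):
    """Distance between the first and last shared keyword in a sentence."""
    pos = [i for i, t in enumerate(tokens) if t in keywords]
    return pos[-1] - pos[0] if pos else 0


def distance_similarity(a, b):
    """Hypothetical distance measure: ratio of the two keyword spans (1.0 = equally compact)."""
    common = set(shared_keywords(a, b))
    if len(common) < 2:
        return 1.0 if common else 0.0
    sa, sb = keyword_span(a, common), keyword_span(b, common)
    return min(sa, sb) / max(sa, sb)  # both spans >= 1 with two distinct keywords


def sentence_similarity(a, b, idf, domain_weight, alpha=0.6, beta=0.2, gamma=0.2):
    """Linear mix of weighted-VSM cosine, order and distance scores (weights are illustrative)."""
    vsm = cosine(weighted_vector(a, idf, domain_weight), weighted_vector(b, idf, domain_weight))
    return alpha * vsm + beta * order_similarity(a, b) + gamma * distance_similarity(a, b)


def s_at_n(rankings, gold, n):
    """Share of queries whose correct FAQ id is among the top-n ranked ids."""
    hits = sum(1 for q, ranked in rankings.items() if gold[q] in ranked[:n])
    return hits / len(rankings)
```

In use, each incoming question would be tokenized, scored with `sentence_similarity` against every FAQ question, and the FAQ ids sorted by score to build the per-query ranking passed to `s_at_n`. Giving domain terms a weight above 1.0 makes a shared domain keyword count for more of the cosine score than a shared common word.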
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journal (北大核心)
2008, No. 4, pp. 624-627 (4 pages)
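Funding
One of the research outcomes of a Provincial Natural Science Foundation project for higher education institutions in Anhui Province
Project No.: KJ2007B245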