Abstract
This paper addresses cross-language information retrieval by modeling the bilingual abstracts of biomedical literature with an improved Latent Semantic Indexing (LSI) method that combines SVD and NMF matrix factorization. The model maps English and Chinese abstracts into a single semantic space and establishes correspondences between the two languages without an external dictionary or knowledge base, which makes retrieval in the bilingual space straightforward. The method also exploits the anchor information in the bilingual abstract corpus: it builds several retrieval models with different values of k and computes a credibility score for each, so that every model contributes to the similarity between a query and a document. Term-to-term, document-to-document, and term-to-document similarities are computed in the semantic space to achieve cross-language retrieval over the bilingual abstracts. The experiments yield improved retrieval results.
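The retrieval idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how SVD and NMF are combined or how each model's credibility is computed, so this sketch makes its own assumptions: it uses plain truncated SVD from NumPy (no NMF step), treats each English abstract concatenated with its Chinese counterpart as one training document, and uses uniform weights as a stand-in for the credibility scores. The toy corpus and the names `fold_in` and `combined_similarity` are hypothetical.

```python
import numpy as np

# Toy bilingual training corpus (hypothetical): each training document is an
# English abstract concatenated with its Chinese counterpart.
train_docs = [
    ["heart", "disease", "心脏", "疾病"],
    ["heart", "surgery", "心脏", "手术"],
    ["lung", "disease", "肺", "疾病"],
    ["lung", "surgery", "肺", "手术"],
]

vocab = sorted({t for d in train_docs for t in d})
index = {t: i for i, t in enumerate(vocab)}


def to_vector(tokens):
    """Raw term-count vector over the shared bilingual vocabulary."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in index:
            v[index[t]] += 1.0
    return v


# Term-by-document matrix and its SVD (plain SVD only; assumption: no NMF).
A = np.column_stack([to_vector(d) for d in train_docs])
U, s, Vt = np.linalg.svd(A, full_matrices=False)


def fold_in(tokens, k):
    """Project a monolingual query or document into the k-dim semantic space."""
    return (U[:, :k].T @ to_vector(tokens)) / s[:k]


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def combined_similarity(query, doc, ks=(2, 3), weights=None):
    """Weighted sum of cosine similarities over several k-dimensional models.

    The weights stand in for per-model credibility; uniform weights are a
    placeholder, not the paper's formula. Every k must not exceed the rank
    of A (3 for this toy corpus).
    """
    weights = weights or [1.0 / len(ks)] * len(ks)
    return sum(
        w * cosine(fold_in(query, k), fold_in(doc, k))
        for k, w in zip(ks, weights)
    )


# English query against two Chinese documents.
same = combined_similarity(["heart", "disease"], ["心脏", "疾病"])
diff = combined_similarity(["heart", "disease"], ["肺", "手术"])
print(same, diff)
```

Because the English and Chinese terms co-occur in the paired training documents, an English query lands near Chinese documents on the same topic even though they share no surface tokens, and no dictionary is involved.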
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2010, Issue 3, pp. 105-111 (7 pages)
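Funding
National Natural Science Foundation of China (60673039, 60973068)
National High Technology Research and Development Program of China (863 Program) (2006AA01Z151)
Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education
Doctoral Program Foundation of the Ministry of Education (20090041110002)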
Keywords
computer application
Chinese information processing
improved latent semantic indexing
semantic space
cross-language IR
SVD
NMF