An Estimation of Query Language Model Based on Statistical Semantic Clustering (cited by: 3)
Abstract: How to generate document clusters effectively and use cluster information to improve retrieval is an important research topic in information retrieval. Assuming that a document contains a set of independent hidden topics, the document can be viewed as the result of those topics interacting with mixed-in noise. Based on this assumption, a semantic clustering technique using independent component analysis (ICA) is proposed. It relies on the good topic separation capability of ICA to group a set of documents into semantic clusters according to their actual hidden topics in semantic space. Within the language modeling framework, a semantic topic cluster is activated by the user's initial query according to a certain measure. The information in the activated semantic cluster is used to estimate a feedback semantic topic model, which is combined with the initial query model to form a new query model. Experiments on five TREC data sets show that the query model estimated from statistical semantic clusters significantly improves retrieval performance over traditional query models and other cluster-based language models. The main reason is that the query model is estimated from the semantic cluster most similar to the user's query.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2011, No. 2, pp. 224-231 (8 pages)
Funding: China State Scholarship Fund; U.S. National Science Foundation (NSF/IIS0704628)
Keywords: semantic clustering; independent component analysis; query model; relevance model; language model; pseudo relevance feedback
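The query-model estimation outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the ICA clustering step is skipped and the clusters are simply given as toy term counts; cosine similarity stands in for the unspecified activation measure; and the new query model is assumed to be a linear interpolation with a hypothetical weight `alpha`. All function names are illustrative.

```python
import math


def normalize(counts):
    """Turn raw term counts into a unigram distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse term distributions."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def activate_cluster(query_model, cluster_models):
    """Index of the cluster most similar to the query (assumed measure: cosine)."""
    return max(range(len(cluster_models)),
               key=lambda i: cosine(query_model, cluster_models[i]))


def interpolate(query_model, feedback_model, alpha=0.5):
    """Assumed combination: (1 - alpha) * query model + alpha * feedback model."""
    vocab = set(query_model) | set(feedback_model)
    return {w: (1 - alpha) * query_model.get(w, 0.0)
               + alpha * feedback_model.get(w, 0.0)
            for w in vocab}


# Toy data: in the paper the clusters would come from ICA over documents.
query = normalize({"jaguar": 1, "speed": 1})
clusters = [
    normalize({"jaguar": 3, "car": 4, "speed": 2, "engine": 1}),
    normalize({"jaguar": 3, "cat": 4, "jungle": 2, "prey": 1}),
]

best = activate_cluster(query, clusters)
new_query = interpolate(query, clusters[best], alpha=0.4)
```

Here the car-related cluster is activated because it shares both query terms, so the expanded query model gains probability mass on "car" and "engine" while remaining a proper distribution.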