Weakly-supervised training method about Chinese spoken language understanding (cited by 2)
Abstract: Acquiring annotated data is a persistent difficulty for supervised approaches. For intent recognition in Chinese spoken language understanding, this paper studies two weakly supervised training approaches: active learning combined with self-training, and co-training. Within a cascade framework, it proposes obtaining two independent feature subsets to serve as the two "views" for co-training: the semantic-class features produced by key semantic concept recognition, and the character features of the sentence itself. Experiments on a Chinese spoken language corpus show that, compared with passive learning and with active learning alone, combining active learning with self-training minimizes the amount of manual annotation required. With very little initial labeled data, co-training on the two feature subsets lowers the classification error rate obtained with the character feature subset alone by 0.52% on average.
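The abstract names the two weakly supervised schemes but not their exact algorithms. A minimal Python sketch of what each loop can look like follows, under stated assumptions: a hand-rolled multinomial naive Bayes stands in for the intent classifier (the abstract does not name one); in the active + self-training loop, the least-confident utterances below a confidence threshold go to a human annotator while utterances at or above it keep the model's own label; in the co-training loop, a character-view classifier and a semantic-class-view classifier take turns labeling their most confident utterances for a shared training pool. The names `NaiveBayes`, `rank`, `active_self_training`, `co_training`, the parameters `threshold`, `k_query`, `per_view`, and the toy utterances and tags are all hypothetical, not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical sketch: naive Bayes stands in for the paper's (unnamed) intent classifier.


class NaiveBayes:
    """Multinomial naive Bayes over bags of string features, add-one smoothing."""

    def fit(self, X, y):
        self.n, self.prior = len(y), Counter(y)
        self.counts = {c: Counter() for c in self.prior}
        for feats, c in zip(X, y):
            self.counts[c].update(feats)
        vocab = set().union(*self.counts.values())
        self.v = len(vocab) + 1  # +1 leaves probability mass for unseen features
        self.total = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict_proba(self, feats):
        logp = {}
        for c in self.prior:
            lp = math.log(self.prior[c] / self.n)
            for f in feats:
                lp += math.log((self.counts[c][f] + 1) / (self.total[c] + self.v))
            logp[c] = lp
        m = max(logp.values())  # subtract the max for numerical stability
        z = sum(math.exp(v - m) for v in logp.values())
        return {c: math.exp(v - m) / z for c, v in logp.items()}


def rank(clf, pool, view=None):
    """(confidence, index, predicted label) per pool item, most confident first."""
    out = []
    for i, x in enumerate(pool):
        p = clf.predict_proba(x if view is None else x[view])
        label = max(p, key=p.get)
        out.append((p[label], i, label))
    return sorted(out, reverse=True)


def active_self_training(labeled, unlabeled, oracle, rounds=3, k_query=1, threshold=0.8):
    """labeled: [(features, label)]; unlabeled: [features]; oracle(features) -> label."""
    L, U, n_queries = list(labeled), list(unlabeled), 0
    for _ in range(rounds):
        if not U:
            break
        ranked = rank(NaiveBayes().fit([f for f, _ in L], [y for _, y in L]), U)
        used = set()
        # Active learning: the least confident items below the threshold go to a human.
        for _, i, _ in [r for r in reversed(ranked) if r[0] < threshold][:k_query]:
            L.append((U[i], oracle(U[i])))
            used.add(i)
            n_queries += 1
        # Self-training: items at or above the threshold keep the model's own label.
        for conf, i, label in ranked:
            if conf < threshold:
                break
            L.append((U[i], label))
            used.add(i)
        U = [x for j, x in enumerate(U) if j not in used]
    return L, n_queries


def co_training(labeled, unlabeled, rounds=3, per_view=1):
    """labeled: [((chars, sem_classes), label)]; unlabeled: [(chars, sem_classes)].
    Each view's classifier labels its most confident items; both views train on them."""
    L, U = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not U:
                return L
            clf = NaiveBayes().fit([x[view] for x, _ in L], [y for _, y in L])
            picked = rank(clf, U, view)[:per_view]
            L += [(U[i], label) for _, i, label in picked]
            chosen = {i for _, i, _ in picked}
            U = [x for j, x in enumerate(U) if j not in chosen]
    return L


if __name__ == "__main__":
    # Hypothetical toy utterances: view 0 = characters, view 1 = semantic-class tags.
    seed = [((list("明天天气"), ["TIME", "WEATHER"]), "weather"),
            ((list("播放音乐"), ["ACTION", "MUSIC"]), "music")]
    pool = [(list("后天下雨吗"), ["TIME", "WEATHER"]),
            (list("放一首歌"), ["ACTION", "MUSIC"]),
            (list("天气好吗"), ["WEATHER"])]
    truth = {"后天下雨吗": "weather", "放一首歌": "music", "天气好吗": "weather"}
    L, n = active_self_training([(x[0], y) for x, y in seed], [x[0] for x in pool],
                                oracle=lambda chars: truth["".join(chars)])
    print(n, ["".join(f) + ":" + y for f, y in L])
    print(["".join(x[0]) + ":" + y for x, y in co_training(seed, pool)])
```

In this sketch the semantic-class tags are written by hand; in the paper that view is obtained from the key semantic concept recognition stage of the cascade. Removing each item from the pool once it is labeled keeps every round's ranking restricted to still-unlabeled utterances.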
Source: Journal of Computer Applications (CSCD; Peking University core journal), 2015, No. 7, pp. 1965-1968, 1974 (5 pages)
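Funding: National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National High Technology Research and Development Program of China (863 Program) (2012AA012503); Key Research Program of the Chinese Academy of Sciences (KGZD-EW-103-2); "Ten-Hundred-Thousand" Talent Training Project of Inner Mongolia Normal University; Natural Science Foundation of Inner Mongolia, General Program (2012MS0930, 2013MS0912); Scientific Research Project of Higher Education Institutions of Inner Mongolia Autonomous Region (NJZY12032, NJZY028); Research Start-up Fund for High-Level Talents Recruited by Inner Mongolia Normal University (2014YJRC036)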
Keywords: intention recognition; spoken language understanding; weakly-supervised training; co-training; active learning