摘要
传媒语言语料库是一项重要的语言资源和必要的现代化研究手段。目前,我们初步建立了一个以汉蒙影视剧语料库(80万字/词)和蒙古语新闻语料库(95万字)为主的蒙古语传媒语言文本语料库。本文主要介绍了该语料库的构建工作,包括总体规划、语料采集、加工标注、词典建设以及软件开发等。该工作的开展和深入将促进蒙古语传媒语言语料库的开发和利用,从而推动相关理论研究和应用技术的不断发展。
A media language corpus is an important linguistic resource and a necessary tool for modern research. We have built a preliminary Mongolian media language text corpus, consisting mainly of a Chinese-Mongolian film and TV drama corpus (800,000 characters/words) and a Mongolian news corpus (950,000 words). This paper describes the construction of the corpus, including overall planning, data collection, processing and annotation, dictionary building, and software development. Continuing and deepening this work will promote the development and use of the Mongolian media language corpus, and thereby advance related theoretical research and applied technology.
出处
《内蒙古师范大学学报(哲学社会科学版)》
2016年第4期70-74,共5页
Journal of Inner Mongolia Normal University:Philosophy and Social Sciences Edition
基金
蒙古语言文字信息化专项扶持资金项目(MW-2014-MGYWXXH-01)
内蒙古师范大学引进高层次人才项目(2013YJRC015)
关键词
蒙古语
传媒语言
文本
语料库
Mongolian language
media language
texts
corpus