Abstract
Yi word segmentation plays an important role in machine translation, automatic classification, and search engines, and it is a key step in Yi-language information processing. Building on current Yi word segmentation techniques, this paper constructs a Yi lexicon, uses a Yi web page acquisition platform to crawl Yi web page text, and draws on advantages specific to the Yi script. It analyzes in detail the segmentation lexicon, the segmentation algorithm, the structure and workflow, the system interface and modules, and the experimental results, and implements a word segmentation platform for Yi web page text. The results show that the platform achieves relatively high segmentation accuracy and good practicality and generality.
Authors
孙善通
王嘉梅
李炳泽
胡刚
SUN Shan-Tong, WANG Jia-Mei, LI Bing-Ze, HU Gang (School of Electrical and Information Technology, Yunnan Minzu University, Kunming 650500, China)
Source
《计算机系统应用》
2016, No. 11, pp. 243-246 (4 pages)
Computer Systems & Applications
Funding
National Natural Science Foundation of China (61363085)
Keywords
Yi web pages
dictionary-based word segmentation
lexicon
Yi word segmentation
word segmentation platform