Abstract
This paper proposes automatic language profiling techniques based on 3-tuple comparable corpora, which broaden the scope of the research field to include application-oriented studies for natural language processing. With engineering feasibility in mind, it takes the novel step of building 3-tuple comparable corpora and applying automatic extraction techniques for n-grams, keyword clusters, and semantic multi-word expressions. By contrasting the extracted patterns with Chinese-style English expressions, the approach uncovers a native-speaker English language model, with the goal of improving and advancing natural language processing applications such as machine translation and cross-language information retrieval.
Source
《情报理论与实践》
CSSCI
PKU Core Journal (北大核心)
2012, No. 4, pp. 94-98 (5 pages)
Information Studies: Theory & Application
Funding
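Interim research result of the 2011 Logistics Research Conditions Construction Project of the Headquarters of the PLA General Logistics Department, "Information Processing Platform for a Military Logistics Terminology Database and Bilingual Resource Database"
Project No.: 2011-ZHTJ-5031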
Keywords
machine translation
3-tuple comparable corpora
automatic language profiling
intelligent information processing