Abstract
A review of patent applications for language intelligence technology over the past 60 years (1960–2019) shows that the technology has advanced significantly in the past five years, and this period of rapid technological growth is expected to continue for a long time. This rapid development highlights the growing importance of language data governance. Focusing on currently prominent language data services empowered by intelligent technologies, namely machine translation, intelligent customer service, online public opinion monitoring, and multilingual resource construction, this paper points out two technical difficulties facing the language data governance system: (1) bias in language data; and (2) the shortcomings of classical language governance models. To resolve these difficulties and compensate for the shortcomings of classical data mining approaches, three language data governance models are proposed and comparatively analysed: point aggregation, linear combination, and multi-layer state of affairs. These models may serve as a reference for intelligent data governance.
Authors
Zhang Kai (张凯)
Xue Siyuan (薛嗣媛)
Zhou Jianshe (周建设)
Source
Chinese Journal of Language Policy and Planning (《语言战略研究》)
CSSCI
Peking University Core Journals (北大核心)
2022, No. 4, pp. 35-48 (14 pages)
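Funding
State Language Commission, 14th Five-Year Research Plan, 2021 Major Project "Research on the Current State and Innovation of China's Language Governance System" (ZDA145-1)
State Language Commission research projects "Improving Language Use Competence for Basic Education: Building Domain-Specific Sentiment Lexicons under Informatization Conditions" (YB135-163) and "AI-Empowered Chinese Learning Research: Representation and Intelligent Assessment of the Logical Structure of Chinese Discourse" (YB145-16)
Ministry of Science and Technology, Science and Technology Innovation 2030 Major Project "Research on Key Technologies for Recognition and Understanding of Handwritten Text and Images in Complex Layouts" (2020AAA0109700)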
Keywords
patent document analysis
language intelligence technology
language data governance
language data governance model