Abstract
Translating natural language into query statements that a database can execute is a core problem for intelligent interaction and human-computer dialogue systems. It is also a key difficulty in connecting the big-data application support platform for new power-supply trains to its application platforms, and in building a personalized operation and maintenance system for urban rail trains. Existing neural-network-based methods either do not use the semantically rich table content or use it only partially, which limits query (execution) accuracy. This paper studies how to improve the query accuracy of natural language query interfaces when table content is included in the input. First, it proposes a content-based table column embedding method, which represents each column by the content stored in it, and, building on this, a new structure for the model's embedding layer. Second, it proposes a content-based data augmentation method, which generates new training samples by replacing attribute values in a query with other records from the same table column. Finally, comparative experiments on both methods are conducted on the WikiSQL dataset. Compared with the best-performing existing model, each method improves query accuracy by 0.6% to 0.8% when used alone, and the two together improve it by nearly 1%, showing that the proposed column embedding and data augmentation methods yield a measurable gain in execution accuracy.
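The record does not spell out the exact formulation of either method, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes (1) that a column is embedded by concatenating the mean word vector of its header with the mean word vector of the tokens stored in its cells, and (2) that a training sample is a question plus WikiSQL-style (column index, operator index, value) conditions, where augmentation swaps a condition value for another record from the same column in both the question and the condition. The names `embed_column`, `augment`, `mean_pool`, the mean-pooling choice and the toy word vectors are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical WikiSQL-style WHERE condition: (column index, operator index, value).
Cond = Tuple[int, int, str]


def mean_pool(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean of the vectors; a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_column(header: str, cells: Sequence[str],
                 word_vecs: Dict[str, Vector], dim: int) -> Vector:
    """Assumed content-based column embedding (not the paper's exact formula).

    Returns [mean(header token vectors) ; mean(cell token vectors)], a vector
    of length 2*dim. Tokens are lower-cased whitespace splits; tokens missing
    from word_vecs are skipped.
    """
    header_vecs = [word_vecs[t] for t in header.lower().split() if t in word_vecs]
    cell_vecs = [word_vecs[t]
                 for cell in cells
                 for t in str(cell).lower().split()
                 if t in word_vecs]
    return mean_pool(header_vecs, dim) + mean_pool(cell_vecs, dim)


def augment(question: str, conds: List[Cond], rows: Sequence[Sequence[str]],
            max_samples: int) -> List[Tuple[str, List[Cond]]]:
    """Create new (question, conds) samples by same-column value substitution.

    For each condition whose value appears verbatim in the question, every
    other distinct record in that column yields one new sample in which both
    the question text and the condition value are swapped.
    Caveat: str.replace also matches substrings (e.g. "5" inside "15").
    """
    samples: List[Tuple[str, List[Cond]]] = []
    for k, (col, op, old) in enumerate(conds):
        if old not in question:
            continue  # value is not grounded in the question text; skip it
        alternatives: List[str] = []
        for row in rows:
            val = str(row[col])
            if val != old and val not in alternatives:
                alternatives.append(val)  # distinct values, first-seen order
        for new in alternatives:
            if len(samples) >= max_samples:
                return samples
            new_conds = list(conds)
            new_conds[k] = (col, op, new)
            samples.append((question.replace(old, new), new_conds))
    return samples


if __name__ == "__main__":
    vecs = {"city": [1.0, 0.0], "paris": [0.0, 1.0], "rome": [0.0, 4.0]}
    print(embed_column("City", ["Paris", "Rome", "Paris"], vecs, 2))
    # -> [1.0, 0.0, 0.0, 2.0]
    rows = [["Alice", "Paris"], ["Bob", "Rome"], ["Carol", "Paris"]]
    print(augment("Which player lives in Paris?", [(1, 0, "Paris")], rows, 10))
    # -> [('Which player lives in Rome?', [(1, 0, 'Rome')])]
```

Mean pooling is used here only because it keeps the column vector at a fixed size however many rows the table has; the paper's embedding layer may combine header and content differently.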
Authors
TIAN Ye
SHOU Li-dan
CHEN Ke
LUO Xin-yuan
CHEN Gang
(College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; Key Laboratory of Big Data Intelligent Computing of Zhejiang Province, Hangzhou 310027, China)
Source
Computer Science
CSCD
Peking University Core Journal
2020, Issue 9, pp. 60-66 (7 pages)
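Funding
National Key R&D Program of China (2017YFB1201001)
National Natural Science Foundation of China (61672455)
Natural Science Foundation of Zhejiang Province (LY18F020005)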
Keywords
Database query
Natural language processing
SQL
Word embedding