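In software engineering, code completion is one of the most useful features of integrated development environments (IDEs): it improves development efficiency and has become an important technique for accelerating modern software development. By predicting class names, method names, keywords, and similar tokens, code completion improves code conformity to some extent and reduces programmers' workload. In recent years, advances in artificial intelligence have driven progress in code completion. Broadly, intelligent code completion trains a deep learning network on source code, learns code features from a corpus, and makes recommendations and predictions based on the contextual code features at the position to be completed. Most existing code representations are based on program syntax and do not reflect the program's semantic information. In addition, the network architectures currently in use still handle long-distance dependencies poorly on long code sequences. A method is therefore proposed that represents code by combining program control dependencies with syntactic information, and that treats code completion as an abstract syntax tree (AST) node prediction problem based on a temporal convolutional network (TCN), so that the model can better learn the syntactic and semantic information of programs and capture longer-range dependencies. Experimental results show that the method improves accuracy by about 2.8% over existing methods.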
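The abstract above attributes the longer-range context to the TCN. A minimal sketch of the underlying mechanism, dilated causal convolution, is given below in plain Python. It is not the paper's model: the function names, the kernel size of 2, and the dilation schedule 1, 2, 4, 8 are assumptions chosen for illustration, and each layer is a single convolution without the residual blocks a full TCN would use.

```python
def causal_dilated_conv(seq, weights, dilation):
    """1-D causal convolution with implicit left zero-padding.

    output[t] depends only on seq[t], seq[t - dilation],
    seq[t - 2 * dilation], ... and never on future positions.
    """
    out = []
    for t in range(len(seq)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # positions before the start count as zero
                total += w * seq[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input positions one output position can see when
    one convolution per layer is stacked with the given dilations."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical schedule: doubling the dilation at each layer makes the
# receptive field grow exponentially with depth.
print(receptive_field(2, [1, 2, 4, 8]))  # 16
```

With this schedule, four layers already cover 16 input positions, which is why stacking dilated layers is a common way to reach distant context in long sequences such as flattened AST node streams.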
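Most current location prediction methods do not take user behavior information into account. Because a user's visit times, behavior patterns, and similar signals largely reflect where the user is, this information should be used when pre-training latent location vectors. In addition, finely sampled trajectories yield long sequences in which long-distance dependencies are hard to capture. To address these two problems, a hierarchical spatiotemporal long short-term memory network based on user behavior and contextual semantics (Hierarchical Spatiotemporal Long Short-Term Memory Based on User Behavior and Contextual Semantics, CHST-LSTM) is proposed. The model processes trajectory data with a Transformer encoder layer, takes user behavior information into account, fuses the contextual semantics of locations, and obtains location embeddings through pre-training. Trajectories are split into stages according to the user's behavior state; ST-LSTM is extended in a segmented, hierarchical manner using an encoder-decoder scheme; and a BiLSTM models global information, so that both long-term and short-term changes in the trajectory are handled and the long-distance dependency problem of long sequences is addressed. Real movement trajectories of food delivery couriers are analyzed and used in experiments: clustering reveals their characteristic working patterns, and working-pattern and visit-time information is added during pre-training to obtain location feature vectors for the prediction model. The results show that CHST-LSTM predicts a user's next location with higher accuracy.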
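The step of splitting a trajectory into stages by behavior state can be sketched as follows. This is only an illustration under stated assumptions: the `(location, state)` tuple format, the function name `split_by_state`, and the state names are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def split_by_state(trajectory):
    """Group consecutive trajectory points that share a behavior state.

    `trajectory` is a list of (location, state) pairs in time order.
    Returns a list of (state, [locations]) segments, so that a
    lower-level model can run within each segment and a higher-level
    model can run over the sequence of segments.
    """
    return [
        (state, [location for location, _ in points])
        for state, points in groupby(trajectory, key=lambda p: p[1])
    ]


# Hypothetical courier trajectory; state names are made up for illustration.
trajectory = [
    ("A", "pickup"), ("B", "pickup"),
    ("C", "deliver"),
    ("D", "idle"), ("E", "idle"),
]
segments = split_by_state(trajectory)
# segments == [("pickup", ["A", "B"]), ("deliver", ["C"]), ("idle", ["D", "E"])]
```

Segmenting first shortens the sequence each recurrent layer has to process, which is one plausible reading of how a hierarchical design eases long-distance dependencies.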
Semantic annotation of Web objects is a key problem for Web information extraction. The Web contains an abundance of useful semi-structured information about real-world objects, and an empirical study shows that Web information about objects of the same type across different Web sites exhibits strong two-dimensional sequence characteristics and correlative characteristics. Conditional Random Fields (CRFs) are state-of-the-art approaches that exploit sequence characteristics for labeling. However, because of the correlative characteristics between Web object elements, previous CRFs are limited for semantic annotation of Web objects and cannot handle the long-distance dependencies between Web object elements efficiently. To better incorporate long-distance dependencies, this paper first describes them by correlative edges, which are built using structured information and the characteristics of records from external databases. It then presents two-dimensional Correlative-Chain Conditional Random Fields (2DCC-CRFs) for semantic annotation of Web objects. This approach extends a classic model, two-dimensional Conditional Random Fields (2DCRFs), by adding correlative edges. Experimental results on a large amount of real-world data collected from diverse domains show that the proposed approach can significantly improve the semantic annotation accuracy of Web objects.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 90818001 and the Natural Science Foundation of Shandong Province of China under Grant No. Y2007G24.
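The effect of adding a long-distance (correlative) edge to a pairwise labeling model, as described in the 2DCC-CRF abstract, can be illustrated with a toy example. This is not the 2DCC-CRF itself: it uses a one-dimensional chain, hand-set scores instead of learned potentials, and exhaustive search instead of CRF inference. All names, labels, and numbers are assumptions made for illustration.

```python
from itertools import product


def labeling_score(labels, node_scores, edges, edge_score):
    """Sum of per-element scores plus pairwise scores over all edges."""
    total = sum(node_scores[i][lab] for i, lab in enumerate(labels))
    total += sum(edge_score(labels[i], labels[j]) for i, j in edges)
    return total


def best_labeling(node_scores, edges, edge_score, label_set):
    """Exhaustive search over labelings; only viable for tiny examples."""
    n = len(node_scores)
    return max(
        product(label_set, repeat=n),
        key=lambda labs: labeling_score(labs, node_scores, edges, edge_score),
    )


def agree(a, b):
    """Hypothetical pairwise score: reward linked elements that share a label."""
    return 1.0 if a == b else 0.0


# Four elements: the first clearly prefers label "A", the last leans to "B".
node_scores = [
    {"A": 2.0, "B": 0.0},
    {"A": 0.1, "B": 0.0},
    {"A": 0.1, "B": 0.0},
    {"A": 0.0, "B": 1.5},
]
chain_edges = [(0, 1), (1, 2), (2, 3)]
correlative_edges = [(0, 3)]  # links two distant elements directly

chain_only = best_labeling(node_scores, chain_edges, agree, "AB")
with_correlative = best_labeling(
    node_scores, chain_edges + correlative_edges, agree, "AB"
)
# chain_only       == ("A", "A", "A", "B")
# with_correlative == ("A", "A", "A", "A")
```

With chain edges alone, the last element keeps its locally preferred label. The extra edge lets evidence from the first element reach it directly, which is the intuition behind modeling long-distance dependencies with correlative edges.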