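Abstract: To examine the importance of auxiliary variables and time-lagged variables, and the effectiveness of embedding layers in neural networks for handling categorical variables, four experiments were designed on the basis of a fully connected neural network to build neural network models for 24 h maximum temperature forecasts. The experiments used output products of the high-resolution model (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF) from 15 January 2015 to 31 December 2020, together with a dataset of basic meteorological elements from 2238 national surface meteorological stations in China. The results show that the deep learning models, which add auxiliary and time-lagged variables as features and use a fully connected network structure with embedding layers, all correct the HRES daily maximum temperature forecast error: root mean square error is reduced by 29.72% to 47.82%, and temperature forecast accuracy is improved by 16.67% to 38.89%. Adding auxiliary variables processed by the embedding layer significantly raises the proportion of positive-skill stations with a mean absolute deviation of no more than 2 °C in the central-southern Tibetan Plateau and the eastern part of Southwest China (by 21.74% and 14.17%, respectively, compared with modeling on HRES predictors alone). Adding time-lagged variables on top of this raises that proportion significantly in the same two regions (by 40.98% and 20.33%, respectively, compared with modeling on HRES predictors alone), and forecast performance becomes more stable.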
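The idea of feeding categorical variables through an embedding layer into a fully connected network can be sketched as follows. This is a minimal, untrained illustration and not the authors' model: the abstract does not say which categorical variables are embedded, so the station index and month used here are assumptions, as are the embedding sizes, the hidden-layer width, and the three continuous predictors. Only the station count (2238) comes from the abstract.

```python
import random


class Embedding:
    """Lookup table that maps each category index to a dense vector.

    In a real model these vectors are trained together with the network;
    here they are only randomly initialised.
    """

    def __init__(self, num_categories, dim, rng):
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for _ in range(num_categories)]

    def __call__(self, index):
        return self.table[index]


def dense(x, weights, bias):
    """Fully connected layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def build_input(continuous, station_index, month_index, station_emb, month_emb):
    """Concatenate continuous predictors with the embedded categorical variables."""
    return list(continuous) + station_emb(station_index) + month_emb(month_index)


rng = random.Random(0)
station_emb = Embedding(num_categories=2238, dim=4, rng=rng)  # assumed: one row per station
month_emb = Embedding(num_categories=12, dim=2, rng=rng)      # assumed: month as a category

# Hypothetical continuous predictors (e.g. forecast maximum temperature, cloud fraction, pressure).
continuous = [31.2, 0.4, 1008.5]
x = build_input(continuous, station_index=17, month_index=6,
                station_emb=station_emb, month_emb=month_emb)

# One hidden layer followed by a single linear output (the forecast correction).
hidden_size = 8
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(len(x))] for _ in range(hidden_size)]
b1 = [0.0] * hidden_size
w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]]
b2 = [0.0]

correction = dense(relu(dense(x, w1, b1)), w2, b2)[0]  # untrained, so the value is arbitrary
```

Compared with one-hot encoding, the lookup keeps the input width small (here 4 values per station instead of 2238) and lets similar categories end up with similar vectors once the table is trained.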
Funding: Supported by the National Natural Science Foundation of China Youth Fund under Grant No. 61902001.
Abstract: Heterogeneous information networks, which consist of multi-typed vertices representing objects and multi-typed edges representing relations between objects, are ubiquitous in the real world. In this paper, we study the problem of entity matching for heterogeneous information networks based on distributed network embedding and a multi-layer perceptron with a highway network, and we propose a new method named DEM, short for Deep Entity Matching. In contrast to traditional entity matching methods, DEM uses the multi-layer perceptron with a highway network to explore hidden relations and improve matching performance. Importantly, we combine DEM with the network embedding methodology, enabling highly efficient computation in a vectorized manner. DEM's generic modeling of both the network structure and the entity attributes enables it to model various heterogeneous information networks flexibly. To illustrate its functionality, we apply the DEM algorithm to two real-world entity matching applications: user linkage in the social network analysis scenario, which predicts the same or matched users across different social platforms, and record linkage, which predicts the same or matched records across different citation networks. Extensive experiments on real-world datasets demonstrate DEM's effectiveness and rationality.
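For readers unfamiliar with highway networks, the standard highway layer computes y = T(x) * H(x) + (1 - T(x)) * x, where H is a nonlinear transform and T is a sigmoid "transform gate". The sketch below implements that generic layer in plain Python. It is not DEM's actual architecture, which the abstract does not specify; the function names, the tanh transform, and the toy weights are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(x, weights, bias):
    """Compute W x + b, with `weights` given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Standard highway layer: y = T(x) * H(x) + (1 - T(x)) * x, element-wise.

    Input and output must have the same dimension so that x can be carried through.
    """
    h = [math.tanh(z) for z in affine(x, w_h, b_h)]  # candidate transform H(x)
    t = [sigmoid(z) for z in affine(x, w_t, b_t)]    # transform gate T(x) in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Toy parameters (hypothetical): identity transform weights, zero gate weights.
x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# A strongly negative gate bias closes the gate, so the layer carries x through almost unchanged.
carried = highway_layer(x, identity, [0.0, 0.0], zeros, [-20.0, -20.0])

# A strongly positive gate bias opens the gate, so the output is close to tanh(x).
transformed = highway_layer(x, identity, [0.0, 0.0], zeros, [20.0, 20.0])
```

The carry path is what lets deeper perceptrons train stably: when the gate is near zero the layer behaves like an identity mapping, so gradients pass through unattenuated.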
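Abstract: To address incomplete semantic information in word vectors and polysemy during text feature extraction, a two-stage attention-weighting algorithm based on BERT (Bidirectional Encoder Representation from Transformer), named TARE, is proposed. First, in the word vector encoding stage, Q, K and V matrices are constructed and a self-attention dynamic encoding algorithm is used to capture, for the word vector of the current word, the semantic information of the preceding and following words in the text. Second, after the model outputs sentence-level feature vectors, position information markers are used to extract the corresponding parameters of the fully connected layer and construct a relation attention matrix. Finally, a sentence-level attention mechanism assigns a different attention score to each sentence-level feature vector, improving the noise resistance of sentence-level features. Experimental results show that on the NYT-10m dataset, compared with CIL (Contrastive Instance Learning), an algorithm based on a contrastive learning framework, TARE improves the F1 score by 4.0 percentage points and improves P@M, the mean of Precision@N over the top 100, 200 and 300 items sorted by descending confidence, by 11.3 percentage points. On the NYT-10d dataset, compared with PCNN-ATT (Piecewise Convolutional Neural Network algorithm based on ATTention mechanism), the area under the precision-recall curve (AUC) improves by 4.8 percentage points and P@M improves by 2.1 percentage points. In the mainstream distantly supervised relation extraction (DSER) task, TARE effectively improves the model's ability to learn data features.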
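The P@M metric described above (the mean of Precision@N at several cutoffs after sorting by descending confidence) can be computed as in the following sketch. The function names and the `(confidence, is_correct)` input format are assumptions for illustration; the toy example uses cutoffs of 2, 4 and 6 instead of 100, 200 and 300 so that the arithmetic can be checked by hand.

```python
def precision_at_n(predictions, n):
    """Precision among the n most confident predictions.

    `predictions` is a non-empty list of (confidence, is_correct) pairs.
    If fewer than n predictions exist, precision is taken over all of them.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    top = ranked[:n]
    return sum(1 for _, is_correct in top if is_correct) / len(top)


def mean_precision(predictions, cutoffs=(100, 200, 300)):
    """P@M: the average of Precision@N over the given cutoffs."""
    return sum(precision_at_n(predictions, n) for n in cutoffs) / len(cutoffs)


# Hypothetical toy predictions: (confidence, whether the predicted relation is correct).
toy = [(0.9, True), (0.8, True), (0.7, False),
       (0.6, True), (0.5, False), (0.4, False)]

p_at_2 = precision_at_n(toy, 2)               # 2 of the top 2 are correct -> 1.0
p_at_4 = precision_at_n(toy, 4)               # 3 of the top 4 are correct -> 0.75
p_at_m = mean_precision(toy, cutoffs=(2, 4, 6))  # (1.0 + 0.75 + 0.5) / 3 = 0.75
```

Because the metric only looks at the most confident predictions, it rewards models that rank correct relations first, which complements threshold-free measures such as the area under the precision-recall curve.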