Abstract
In credit scoring, user information contains both categorical and numerical data. Traditional artificial-intelligence-based credit scoring models usually transform the categorical data into one-hot vectors and concatenate them with the numerical data as the input of the discriminator. In contrast, this paper extracts vectors for the categorical data using word embedding techniques, which are widely used in natural language processing. The set of word vectors is then treated as a "sentence", and user features are extracted from this "sentence" with a self-attention mechanism. Finally, a Multi-Layer Perceptron (MLP) neural network predicts the probability of default. The new model is trained end-to-end with the backpropagation algorithm. Experimental results show that the proposed model achieves better performance than six baseline algorithms on three well-known benchmark datasets.
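The pipeline described in the abstract (embed each categorical value, treat the embeddings as a "sentence", apply self-attention, then score with an MLP) can be illustrated with a minimal forward-pass sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the field names and vocabularies, the embedding width, the single attention head, mean pooling over the attended tokens, and concatenating the numerical features just before the MLP. The weights are random and untrained, so the printed probability carries no meaning.

```python
import math
import random

random.seed(0)
D = 4        # embedding width (assumed)
HIDDEN = 8   # MLP hidden units (assumed)

def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical categorical fields; each value gets its own embedding vector.
FIELDS = {"housing": ["own", "rent"], "purpose": ["car", "education", "business"]}
EMBED = {f: {v: rand_matrix(1, D)[0] for v in vals} for f, vals in FIELDS.items()}
N_NUMERIC = 2  # number of numerical features (assumed)

W_Q, W_K, W_V = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)
W_1 = rand_matrix(HIDDEN, D + N_NUMERIC)
W_2 = rand_matrix(1, HIDDEN)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over a list of vectors."""
    qs = [matvec(W_Q, t) for t in tokens]
    ks = [matvec(W_K, t) for t in tokens]
    vs = [matvec(W_V, t) for t in tokens]
    out = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in ks]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, vs)) for i in range(D)])
    return out

def predict_default(categorical, numeric):
    """Return a default probability in (0, 1) for one applicant."""
    tokens = [EMBED[f][categorical[f]] for f in FIELDS]   # the "sentence"
    attended = self_attention(tokens)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]  # mean pooling (assumed)
    hidden = [max(0.0, h) for h in matvec(W_1, pooled + list(numeric))]  # ReLU layer
    logit = matvec(W_2, hidden)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid

p = predict_default({"housing": "rent", "purpose": "car"}, [0.3, 1.2])
print(p)
```

Because every step is differentiable, the same structure written in an automatic-differentiation framework could be trained end-to-end with backpropagation, which is the property the abstract highlights.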
Authors
刘欣阳
曲彦文
周琪云
LIU Xinyang; QU Yanwen; ZHOU Qiyun (School of Computer Information and Engineering, Jiangxi Normal University, Nanchang 330022, China)
Source
《计算机工程与应用》
CSCD
PKU Core (Peking University core journal list)
2019, Issue 13, pp. 36-41 (6 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (No. 61562041, No. 61866018)
Keywords
credit scoring
self-attention mechanism
word embedding
feature extraction
deep neural network