Abstract
Traditional empirical studies in finance have relied mostly on structured data such as financial statements and stock market data. In the era of big data, advances in computer technology have steadily broadened the types of data available, and researchers have begun to bring unstructured textual big data into finance research. These sources mainly include disclosure documents of listed companies, financial media reports, social network texts, internet search indices, and P2P online lending texts, and studies have examined the readability, tone, similarity, and semantic features of the text. This paper first introduces the steps and methods of textual big data mining in finance, covering corpus acquisition, preprocessing, document representation, and document feature extraction. It then reviews research progress on financial textual big data, organized by source of textual information. Finally, it discusses prospects for future research methods and topics in financial textual big data.
Authors
YAO Jiaquan (姚加权)
ZHANG Kunpeng (张锟澎)
LUO Ping (罗平)
Affiliations: Jinan University, Guangzhou, China; Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Source
Economic Perspectives (《经济学动态》)
CSSCI
Peking University Core Journal (北大核心)
2020, Issue 4, pp. 143-158 (16 pages)
Funding
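National Natural Science Foundation of China (71502152, U1811461)
Major Program of the National Social Science Fund of China (18ZDA092)
National Key R&D Program of China (2017YFB1002104)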
Keywords
Textual Big Data
Textual Analysis
Machine Learning
Deep Learning
Data Mining