Abstract
The bond market generates massive and complex information. Building a digital dictionary (that is, pre-trained word embeddings) that captures the complex semantics of the bond market is key to making full use of this information and to letting fintech empower business. At present, pre-trained embeddings specific to the bond domain are lacking, and evaluating word embeddings remains a major challenge. This study proposes BondJWE, a multi-granularity word embedding training framework for the bond domain that jointly uses component (sub-character), character, and word information. To evaluate the embeddings in a principled way, the study also designs a downstream text classification task based on the characteristics of the available data. The work fills a gap in research on pre-trained bond-specific word embeddings. Experimental results show that BondJWE outperforms the other baseline models, indicating that the multi-granularity embeddings offer better semantic expressiveness and robustness.
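To make the idea of "joint component, character, and word information" concrete, the following is a minimal sketch of how a word vector might be composed from three granularities. It is an illustration only, not the BondJWE implementation: the class `MultiGranularityEmbeddings`, the averaging rule, the vector dimension, and the toy `COMPONENTS` lookup are all assumptions made for this example, and the training objective is omitted entirely.

```python
import random

DIM = 8  # assumed toy dimension, chosen only for readability
random.seed(0)

# Assumed toy lookup: character -> sub-character components.
# Characters without an entry fall back to themselves.
COMPONENTS = {"债": ["亻", "责"]}


def new_vector(dim=DIM):
    """Return a small random vector, standing in for a trained embedding."""
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]


def mean(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class MultiGranularityEmbeddings:
    """Hypothetical container with one table per granularity."""

    def __init__(self):
        self.word, self.char, self.comp = {}, {}, {}

    def _get(self, table, key):
        # Create a vector on first use so repeated lookups are stable.
        if key not in table:
            table[key] = new_vector()
        return table[key]

    def compose(self, word):
        """Average the word, character, and component views of a word."""
        word_vec = self._get(self.word, word)
        char_vec = mean([self._get(self.char, c) for c in word])
        parts = [p for c in word for p in COMPONENTS.get(c, [c])]
        comp_vec = mean([self._get(self.comp, p) for p in parts])
        return mean([word_vec, char_vec, comp_vec])


emb = MultiGranularityEmbeddings()
vec = emb.compose("债券")  # "bond"
```

In a downstream evaluation such as text classification, vectors like `vec` would typically be fed to a classifier as input features; how the study actually wires this up is not stated in the abstract.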
Authors
HUA Jiao-jiao (华娇娇)
TANG Hua-yun (唐华云)
WANG Yan-zhao (王延昭)
SHANG Li-li (商丽丽)
Affiliations: Postdoctoral Research Workstation, China Central Depository & Clearing Co., Ltd., Beijing 100033, China; Blockchain Lab, Chinabond Financial and Information Technology Co., Ltd., Beijing 100004, China
Source
Computer Simulation (《计算机仿真》)
2024, Issue 3, pp. 260-266 (7 pages)
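Funding
Beijing Key Laboratory of Big Data Decision-Making for Green Development (绿色发展大数据决策北京市重点实验室), grant dm202103
Postdoctoral Science Foundation funded project (2022M723692)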
Keywords
Word embeddings
Text classification
Bond