Abstract
To study how the release of coal-related policy documents affects the coal sector of the stock market, a corpus that automatically generates policy data was constructed, and policy texts and stock price movements were analyzed for prediction. The policy texts were crawled from the public government documents of the National Development and Reform Commission, and the leading enterprise China Shenhua was chosen to represent coal stocks. A logistic regression model built by vectorizing the policy texts and manually adding label values achieved a prediction accuracy of 44.58%. After text normalization, dimensionality reduction by principal component analysis (PCA), and Birch text clustering, the resulting prediction model achieved an accuracy of 62.20%, a precision of 68.29%, and a recall of 60.87%; the Birch text clustering model recognized policy text information at a rate of 82.56%.
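The three evaluation figures quoted in the abstract (accuracy, precision, recall) follow the standard binary-classification definitions. The sketch below shows how they relate to one another. The function name `classification_metrics` and the confusion-matrix counts are hypothetical: the source does not report the underlying counts, and the values here were chosen only because they happen to reproduce the quoted percentages.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total                        # share of all predictions that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # correct among actual positives
    return accuracy, precision, recall


# Hypothetical counts (an assumption, not data from the paper).
acc, prec, rec = classification_metrics(tp=28, fp=13, fn=18, tn=23)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
# prints: accuracy=62.20% precision=68.29% recall=60.87%
```

Precision exceeding recall, as in the reported results, means the model's positive (rise) predictions are wrong less often than actual positive cases are missed.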
Authors
陈浩
于辰云
冯锡炜
吴建胜
侯伟
李品乐
王超琦
赵驰
桂亚飞
Chen Hao; Yu Chenyun; Feng Xiwei; Wu Jiansheng; Hou Wei; Li Pinyue; Wang Chaoqi; Zhao Chi; Gui Yafei (School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, China)
Source
《现代计算机》
2023, No. 19, pp. 42-47 (6 pages)
Modern Computer
Funding
General Project of the Educational Department of Liaoning Province, 2022 (LJKMZ20220741).
Keywords
corpus
stock market
logistic regression
policy information
text clustering