Abstract
Stock reviews and news have a strong influence on stock price movements. To select high-quality stocks and improve investment returns, this paper applies natural language processing (NLP) to stock review and news data. A text sentiment classification model based on naive Bayes is built, reaching a prediction accuracy of 84%, and is used to generate a stock review factor. News texts are modeled with the LDA topic model to extract their topics quickly, with perplexity used to find the optimal number of topics, which yields a news factor. The review factor and the news factor then serve as criteria for screening stocks, capturing the influences on the stock market carried by reviews and news and thereby optimizing the stock selection strategy. For stock fundamental data, a decision tree model is used to analyze factor importance and select the five most important factors; the model's prediction accuracy reaches 88%. The decision tree model helps identify more accurately which factors play a key role in stock price changes, which improves the effectiveness and accuracy of the selection strategy. Finally, principal component analysis (PCA) is used to reduce the dimensionality of the data, and stocks are selected according to their principal component values.
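The sentiment classification step described in the abstract can be illustrated with a minimal sketch of a multinomial naive Bayes classifier. This is not the authors' implementation: the toy training data, the labels "pos" and "neg", the Laplace smoothing parameter alpha, and the helper sentiment_factor (one possible way to turn predictions into a review factor) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = len(docs)
    model = {"log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_counts.items():
        model["log_prior"][label] = math.log(n_docs / total_docs)
        # Smoothed denominator: all word occurrences in the class plus alpha per vocab word.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def predict(model, tokens):
    """Return the label with the highest posterior log-probability."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        ll = model["log_likelihood"][label]
        # Words never seen during training are skipped.
        scores[label] = log_prior + sum(ll[w] for w in tokens if w in ll)
    return max(scores, key=scores.get)


def sentiment_factor(model, reviews):
    """Hypothetical review factor: share of positive minus share of negative reviews."""
    preds = [predict(model, r) for r in reviews]
    return (preds.count("pos") - preds.count("neg")) / len(preds)


# Toy, made-up training data (already tokenized); not taken from the paper.
train_docs = [
    ["strong", "earnings", "buy"],
    ["profit", "growth", "buy"],
    ["weak", "earnings", "sell"],
    ["loss", "risk", "sell"],
]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling predict(model, ["strong", "growth"]) classifies a single tokenized review, and sentiment_factor(model, reviews) aggregates the predictions for one stock into a value between -1 and 1. Real Chinese stock reviews would first need word segmentation, which this sketch leaves out.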
Authors
Wu Yanxin (吴彦昕)
Li Hongbin (李宏滨)
Hu Guanzhen (胡冠真)
School of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China
Source
Modern Computer (《现代计算机》)
2024, No. 3, pp. 76-82 (7 pages)
Keywords
natural language processing
text sentiment classification model
LDA topic model
decision tree model
principal component analysis