Abstract
With the continuing development of communication network engineering and new infrastructure technologies, China is gradually moving from a 4G society to a 5G society. With its low latency, large bandwidth, and wide connectivity, 5G has become an important technical foundation for building smart cities and digital villages. Smart city construction depends on large-scale 5G connectivity, which in turn requires a higher rate of 5G adoption among users. To address this problem, this paper uses data from a mobile big data platform to build machine learning classification models that predict potential 5G users, so that these users can be correctly identified and offered targeted service recommendations, raising 5G adoption in China and accelerating the upgrade of new smart cities. Building the prediction models involves data preprocessing, feature engineering, and model training and evaluation. First, the data undergo preprocessing and exploratory analysis, including data cleaning, removal of unique-value attributes, and data transformation. The features are then screened with the chi-square test, the independent-samples t-test, and the Pearson correlation coefficient, yielding 24 feature variables of high importance. Random forest, CatBoost, and LightGBM models are built on the selected features, and their parameters are tuned to find the optimal settings. The models are then built with the optimal parameters, tested on the test set, and evaluated by accuracy, recall, and AUC; the comparison shows that LightGBM generally outperforms the other models in predicting potential 5G users. In addition, feature importance scores are obtained from these models and the features are ranked by importance. The method in this paper identifies potential 5G users with reasonable accuracy, allowing operators to target marketing to different customers, move more users from 4G to 5G, and accelerate the continued development of China's 5G market and the construction of smart cities.
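The abstract names accuracy, recall, and AUC as the evaluation metrics. As a minimal sketch of what these three quantities measure, the following computes them in plain Python for a binary task where label 1 stands for a potential 5G user. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration; they are not taken from the paper, whose data and code are not shown here.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives (label 1) that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above a
    random negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and model scores for six users.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

Accuracy and recall depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason to report the three together when comparing classifiers.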
Source
《数据挖掘》
2023, No. 2, pp. 173-184 (12 pages)
Hans Journal of Data Mining