Abstract
In lithology prediction with a supervised machine-learning classifier, the training set may contain too few samples of the target lithology and too many samples of non-target lithology. A classifier trained on such an imbalanced set produces predictions biased toward the non-target lithology, so the target lithology is predicted with low accuracy. To address this problem, a Random Forests lithology prediction method for imbalanced data sets is proposed. First, a lithology data set is constructed. Lithology from mud logging provides the sample labels, and seismic attributes of the near-well trace together with rock elastic parameters provide the sample features. Second, the near miss (NM) algorithm is combined with the synthetic minority over-sampling technique (SMOTE) to form the NM-SMOTE algorithm, which is used to balance the lithology data set. Then, a Random Forests classifier is trained on the balanced data set to establish the nonlinear relationship between lithology and multiple seismic attributes and elastic parameters. Finally, the seismic attributes and elastic parameters of the target exploration area are input into the trained classifier, which predicts lithology from the nonlinear relationship learned during training. Tests on field data show that an excess of non-target lithology samples in the training set degrades the Random Forests classifier, giving a lithology prediction accuracy of only 38%. After the training set is balanced with NM-SMOTE, the accuracy rises to 83%, and the resulting lithology volume agrees better with the seismic data.
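The balancing step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a binary set (target vs. non-target lithology), the NearMiss-1 variant of near miss, Euclidean distance on the feature vectors, and a default class size at the midpoint of the two original class sizes. The function names `near_miss_1`, `smote` and `nm_smote` are hypothetical.

```python
import numpy as np


def _pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def near_miss_1(X_maj, X_min, n_keep, k=3):
    """NearMiss-1 undersampling: keep the n_keep majority samples whose mean
    distance to their k nearest minority samples is smallest."""
    d = _pairwise_dist(X_maj, X_min)
    k = min(k, len(X_min))
    mean_knn = np.sort(d, axis=1)[:, :k].mean(axis=1)  # one value per majority row
    keep = np.argsort(mean_knn, kind="stable")[:n_keep]
    return X_maj[keep]


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE oversampling: create n_new synthetic minority samples, each on the
    segment between a random minority sample and one of its k nearest minority
    neighbours. Needs at least 2 minority samples."""
    rng = np.random.default_rng(rng)
    if n_new <= 0:
        return np.empty((0, X_min.shape[1]))
    d = _pairwise_dist(X_min, X_min)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def nm_smote(X, y, minority_label, n_target=None, rng=None):
    """Balance a binary set: NearMiss-1 shrinks the majority class and SMOTE
    grows the minority class until both hold n_target samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    X_min, X_maj = X[is_min], X[~is_min]
    maj_label = y[~is_min][0]
    if n_target is None:
        # Assumption: meet halfway between the two class sizes.
        n_target = (len(X_min) + len(X_maj)) // 2
    X_maj_kept = near_miss_1(X_maj, X_min, n_target)
    X_syn = smote(X_min, n_target - len(X_min), rng=rng)
    X_bal = np.vstack([X_maj_kept, X_min, X_syn])
    y_bal = np.concatenate([
        np.full(len(X_maj_kept), maj_label),
        np.full(len(X_min) + len(X_syn), minority_label),
    ])
    return X_bal, y_bal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_maj = rng.normal(0.0, 1.0, size=(90, 2))  # stand-in for non-target lithology
    X_min = rng.normal(3.0, 0.5, size=(10, 2))  # stand-in for target lithology
    X = np.vstack([X_maj, X_min])
    y = np.array([0] * 90 + [1] * 10)
    X_bal, y_bal = nm_smote(X, y, minority_label=1, rng=0)
    print(np.bincount(y_bal))  # -> [50 50]
```

The balanced arrays would then be used to fit a random forest, for example scikit-learn's `RandomForestClassifier`. The abstract does not say how the paper divides the work between undersampling and oversampling, so the midpoint default above is only one possible choice.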
Authors
WANG Guangyu, SONG Jianguo, XU Fei, ZHANG Wen, LIU Jiong, CHEN Feixu
School of Geosciences, China University of Petroleum (East China), Qingdao, Shandong 266580, China; School of Earth and Space Sciences, University of Science and Technology of China, Hefei, Anhui 230026, China; SINOPEC Petroleum Exploration and Production Research Institute, Beijing 100083, China; Research Institute of Petroleum Exploration and Development, PetroChina Tarim Oilfield Company, Korla, Xinjiang 841000, China
Source
Oil Geophysical Prospecting, 2021, No. 4, pp. 679-687 and I0007 (10 pages)
Indexed in: EI, CSCD, Peking University Core Journals
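Funding
National Science and Technology Major Project "Geophysical Identification and Prediction Methods for Sweet Spots of Continental Shale Oil" (2017ZX05049-002)
National Natural Science Foundation of China General Program "Prestack Data Mining and Nonlinear Prediction of Reservoir Parameters" (41674125)
PetroChina Major Science and Technology Project "Key Technologies of Seismic Velocity Modeling and Imaging for Deep, Complex, High-Steep Structures and Carbonate Reservoirs in the Tarim Basin" (ZD2019-181-003). These projects jointly funded this work.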
Keywords
lithology prediction
machine learning
Random Forests classification
imbalanced data sets
class balancing techniques