Abstract: Feature selection is essential for obtaining meaningful and interpretable results from a clustering analysis. In soil data clustering, how clustering performance responds to different feature subsets is not well understood. In this paper, we compared the performance of the k-means, fuzzy c-means, and spectral clustering algorithms on different feature subsets of soil data sets. The experimental results showed that spectral clustering generally outperformed k-means and fuzzy c-means across the feature subsets. Feature subsets containing environmental attributes improved clustering performance more than those containing spatial attributes and produced more accurate and meaningful clustering results. These results suggest that combining spectral clustering with feature subsets that contain environmental rather than spatial attributes may be a better choice for soil data clustering.
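The comparison described above (cluster the same samples on different feature subsets, then score each result) can be sketched as follows. This is a minimal illustration, not the paper's procedure: only k-means is implemented (fuzzy c-means and spectral clustering are omitted), purity against known classes is an assumed evaluation measure, and the six samples, their column names, and class labels are synthetic. The toy data were deliberately built so that the environmental columns carry the class signal, so the output illustrates the workflow and is not evidence for the paper's finding.

```python
import math
from collections import Counter

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; the first k points seed the centroids, so runs are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, truth):
    """Fraction of samples that belong to the majority true class of their cluster."""
    total = 0
    for cluster in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(truth)

# Synthetic toy data (assumption). Columns: x, y (spatial); elevation, moisture (environmental).
samples = [
    (0.10, 0.90, 0.10, 0.20),
    (0.20, 0.80, 0.90, 0.80),
    (0.80, 0.20, 0.15, 0.25),
    (0.90, 0.10, 0.85, 0.75),
    (0.50, 0.50, 0.12, 0.22),
    (0.45, 0.55, 0.88, 0.82),
]
soil_class = ["A", "B", "A", "B", "A", "B"]  # hypothetical reference classes
subsets = {"spatial": (0, 1), "environmental": (2, 3)}

scores = {}
for name, cols in subsets.items():
    points = [[row[c] for c in cols] for row in samples]
    scores[name] = purity(kmeans(points, k=2), soil_class)
    print(name, scores[name])
```

Swapping in another clustering function with the same `(points, k) -> labels` shape would allow the same loop to compare algorithms as well as feature subsets.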
Funding: Supported by the Institute of Geophysics, University of Tehran.
Abstract: Seismic attributes are conventionally used as features for classification in feature space, for purposes such as seismic facies analysis, in seismic interpretation. In some applications, however, seismic data may have no attributes, or it may be hard to define a small and relevant set of attributes. Techniques that perform facies modeling without attributes are therefore necessary. In this paper we present a new method for facies modeling of seismic data with missing attributes, called dissimilarity-based classification. In this method, classification is based on dissimilarities, and facies modeling is done in dissimilarity space, where dissimilarities serve as new features in place of real features. A support vector machine, a powerful classifier, was employed in both feature space (feature-based) and dissimilarity space (feature-less) for facies analysis. The proposed feature-less and feature-based classifications were applied to real seismic data from an Iranian oil field. Facies modeling using seismic attributes provides better results, but the outcome of the feature-less classification is also satisfactory and the facies correlation is acceptable. Indeed, because attributes have the power to discriminate between facies, facies analysis using attributes provides more reliable results than feature-less facies analysis.
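The idea of classifying in dissimilarity space can be sketched as follows: each object is described by its dissimilarities to a set of prototype objects, and a classifier is trained on those vectors. This is a sketch under stated assumptions, not the authors' implementation. Euclidean distance between raw trace windows is an assumed dissimilarity measure, the training traces double as the prototype set, and a nearest-class-mean classifier stands in for the support vector machine used in the paper. The traces and the labels `facies_a` and `facies_b` are made up.

```python
import math

def dissimilarity(a, b):
    """Euclidean distance between two equal-length traces (an assumed choice of measure)."""
    return math.dist(a, b)

def to_dissimilarity_space(objects, prototypes):
    """Represent each object by its dissimilarities to every prototype."""
    return [[dissimilarity(x, r) for r in prototypes] for x in objects]

def fit_nearest_mean(vectors, labels):
    """Stand-in classifier (not an SVM): store the mean vector of each class."""
    means = {}
    for lab in set(labels):
        rows = [v for v, l in zip(vectors, labels) if l == lab]
        means[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def predict(means, vector):
    """Assign the class whose mean vector is closest in dissimilarity space."""
    return min(means, key=lambda lab: math.dist(vector, means[lab]))

# Synthetic trace windows (assumption); no seismic attributes are computed.
train_traces = [
    [0.0, 1.0, 0.0, -1.0],
    [0.1, 0.9, 0.0, -0.9],
    [1.0, 0.0, -1.0, 0.0],
    [0.9, 0.1, -0.9, 0.0],
]
train_labels = ["facies_a", "facies_a", "facies_b", "facies_b"]
test_traces = [
    [0.0, 0.8, 0.1, -1.0],
    [1.0, 0.1, -0.8, 0.0],
]

prototypes = train_traces  # the training set doubles as the representation set
model = fit_nearest_mean(to_dissimilarity_space(train_traces, prototypes), train_labels)
predictions = [predict(model, v) for v in to_dissimilarity_space(test_traces, prototypes)]
print(predictions)
```

Any classifier that accepts fixed-length numeric vectors could replace `fit_nearest_mean` and `predict`, because the mapping to dissimilarity space is independent of the classifier.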
Abstract: Derivative and volatility attributes can be usefully calculated from recorded gamma ray (GR) data to enhance lithofacies classification in wellbores penetrating multiple lithologies. Such attributes extract information about the log curve shape that cannot be readily discerned from the recorded well log data. A logged wellbore section for which 8911 data records are available for the three recorded logs (GR, sonic (DT) and bulk density (PB)) is evaluated. That section demonstrates the value of the GR attributes for machine learning (ML) lithofacies predictions. Five feature selection configurations are considered. The 9-var configuration, which includes GR, DT, PB and six GR attributes, and the 7-var configuration of GR and the six GR attributes provide the most accurate and reproducible lithofacies predictions. The other three feature configurations evaluated do not include the GR attributes, only one to three of the recorded log features. The results of seven ML models and two regression models reveal that K-nearest neighbor (KNN), random forest (RF) and extreme gradient boosting (XGB) are the best performing models. They generate between 14 and 23 misclassifications from 8911 data records for the 9-var model. Multi-layer perceptron (MLP) and support vector classification (SVC) do not perform well with the 7-var model, which lacks the PB feature that displays the highest correlation with facies class. Annotated confusion matrices reveal that the KNN, RF and XGB models can effectively distinguish all facies classes for the 9-var and 7-var configurations (which include the GR attributes), whereas none of the models can achieve that outcome for the 3-var configuration (which excludes the GR attributes). Accurately distinguishing lithofacies using well-log data in sedimentary sections is an important objective in applied geoscience. The straightforward GR-attribute method proposed improves confidence in ML lithofacies classifications based on limited recorded well-log data.
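As a rough illustration of how curve-shape attributes might be derived from a GR log, the sketch below computes a first derivative, a trailing moving average, and a trailing volatility (standard deviation of the derivative). The abstract does not define its six GR attributes, so these three, the window length, and the function names are assumptions for illustration only. The GR values are synthetic.

```python
import statistics

def first_derivative(gr):
    """Backward difference between consecutive samples; the first sample gets 0.0."""
    return [0.0] + [b - a for a, b in zip(gr, gr[1:])]

def trailing_windows(values, window):
    """Windows ending at each sample; shorter at the start of the log."""
    return [values[max(0, i - window + 1): i + 1] for i in range(len(values))]

def moving_average(gr, window):
    """Trailing mean of the GR curve."""
    return [statistics.fmean(w) for w in trailing_windows(gr, window)]

def volatility(gr, window):
    """Trailing population standard deviation of the GR derivative."""
    deriv = first_derivative(gr)
    return [statistics.pstdev(w) for w in trailing_windows(deriv, window)]

# Synthetic GR readings (assumption), with a sharp shift between two lithologies.
gr_log = [40.0, 42.0, 45.0, 90.0, 95.0, 93.0]
window = 3  # assumed window length

deriv = first_derivative(gr_log)
smooth = moving_average(gr_log, window)
vol = volatility(gr_log, window)

# One feature row per depth sample: recorded GR plus the derived attributes.
feature_rows = list(zip(gr_log, deriv, smooth, vol))
for row in feature_rows:
    print(row)
```

Rows built this way could be passed to any classifier alongside, or instead of, other recorded logs, which is the kind of configuration comparison the abstract describes.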