Abstract: Profile hidden Markov models (HMMs), which are based on classical HMMs, have been widely applied to protein sequence identification. In profile HMMs, the forward and backward variables are formulated under the statistical independence assumption of probability theory. We propose a fuzzy profile HMM that overcomes the limitations of this assumption and achieves improved alignment of protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, thereby further extending the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. Protein structures involve strong correlations and sequence preferences, and fuzzy sets can handle uncertainty better than classical methods. This makes the fuzzy-architecture-based model a suitable candidate for building profiles of a given family.
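The Python sketch below illustrates the two fuzzy-measure ingredients named above. It is not the paper's full fuzzy forward recursion. First it solves for the Sugeno λ parameter from a set of fuzzy densities. Then it computes a Choquet integral of nonnegative values, for example the terms that a classical forward step sums over predecessor states. The function names, densities and values are hypothetical. When the densities sum to 1, λ = 0 and the Choquet integral reduces to an ordinary weighted sum, so additive (probabilistic) aggregation appears as a special case.

```python
import math

def sugeno_lambda(densities, iterations=200):
    """Solve prod(1 + lam * g_i) = 1 + lam for the Sugeno lambda-measure.

    Assumes at least two densities, each strictly between 0 and 1.
    """
    s = sum(densities)
    if abs(s - 1.0) < 1e-12:
        return 0.0  # densities are already additive
    f = lambda lam: math.prod(1.0 + lam * g for g in densities) - (1.0 + lam)
    if s < 1.0:
        lo, hi = 1e-12, 1.0  # the nonzero root is positive
        while f(hi) < 0:
            hi *= 2.0
    else:
        lo, hi = -1.0 + 1e-12, -1e-12  # the nonzero root lies in (-1, 0)
    for _ in range(iterations):  # plain bisection on the sign change
        mid = 0.5 * (lo + hi)
        if (f(mid) < 0) == (f(lo) < 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def choquet(values, densities):
    """Choquet integral of nonnegative values w.r.t. the Sugeno lambda-measure."""
    lam = sugeno_lambda(densities)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total, g_prev = 0.0, 0.0
    for rank, i in enumerate(order):
        # Measure of the set of the rank+1 largest elements, built recursively.
        g_cur = densities[i] + g_prev + lam * densities[i] * g_prev
        nxt = values[order[rank + 1]] if rank + 1 < len(order) else 0.0
        total += (values[i] - nxt) * g_cur
        g_prev = g_cur
    return total
```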
Funding: Supported by the Zhanjiang Branch of CNOOC Ltd.; the National Science and Technology Projects (No. 2011ZX05025-002-02-02); the Natural Science Foundation of China (NSFC) (Nos. 41202074 and 41272122); and the Key Laboratory of Tectonics and Petroleum Resources (CUG) of the Ministry of Education Open Issue (No. TPR-2013-13).
Abstract: To unveil coal-bearing source rocks in terrestrial-marine transitional sequences, we investigated the sequence stratigraphic framework and sedimentary facies of the Lower Oligocene Yacheng Formation in the Qiongdongnan Basin. The main data were seismic profiles, complemented by well bores and cores. Three third-order sequences are identified on the basis of unconformities on the basin margins and correlative conformities in the basin center: SQYC3, SQYC2 and SQYC1 from bottom to top. Coal measures in the Yacheng Formation were deposited within a range of facies associations, from delta plain/tidal zone to neritic sea. Within the sequence stratigraphic framework, three types of sedimentary facies associations favourable for coal measures were established: braided delta plain and alluvial fan, lagoon and tidal flat, and fan delta and coastal plain. The results show that, within the third-order sequences, coal accumulation in landward areas of the study area (such as the delta plain) correlates predominantly with the interval from the early transgressive systems tract (TST) to the middle highstand systems tract (HST). In seaward areas (such as tidal flat-lagoon settings), it correlates with the early TST and the middle HST. The most promising coal-bearing source rocks formed where the accommodation creation rate (Ra) and the peat-accumulation rate (Rp) reached a state of balance, which varied among sedimentary settings. Intense tectonic subsidence and frequent alternation between marine and continental conditions during the middle rift stage of the Yacheng Formation were the main reasons for the character of the coal beds: they are numerous, individually thin, and change rapidly in the lateral direction. The proposed sedimentary facies associations may aid in predicting the distribution of coal-bearing source rocks.
This study also demonstrates that analyzing controlling factors with sequence stratigraphy and sedimentology may serve as an effective approach for characterizing coal-bearing features in the lower exploration
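The Ra/Rp balance criterion can be made concrete with a simple ratio test. The Python sketch below is a hypothetical illustration. The tolerance of 0.2 is an assumption, not a value from the study. The comments on the unbalanced cases follow the interpretation commonly used in coal sequence stratigraphy: when accommodation outpaces peat growth, the mire tends to drown; when peat growth outpaces accommodation, the peat becomes prone to exposure and degradation.

```python
def peat_regime(ra, rp, tol=0.2):
    """Classify the balance between accommodation creation rate (Ra) and peat-accumulation rate (Rp).

    tol is a hypothetical tolerance on |Ra/Rp - 1| for calling the two rates balanced.
    """
    if ra < 0 or rp <= 0:
        raise ValueError("expected ra >= 0 and rp > 0")
    ratio = ra / rp
    if abs(ratio - 1.0) <= tol:
        return "balanced"  # favourable for thick, laterally continuous peat (future coal)
    if ratio > 1.0:
        return "accommodation-dominated"  # space outpaces peat growth; mire tends to drown
    return "peat-dominated"  # peat outgrows available space; exposure and degradation likely
```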
Funding: Support from the Key R&D Program of Shandong Province (Grant No. 2019JZZY010431), the National Natural Science Foundation of China (Grant No. 52175130), the Sichuan Science and Technology Program (Grant No. 2022YFQ0087), and the Sichuan Science and Technology Innovation Seedling Project Funding Project (Grant No. 2021112) is gratefully acknowledged.
Abstract: In uncertainty analysis and reliability-based multidisciplinary design and optimization (RBMDO) of engineering structures, the saddlepoint approximation (SA) method can be used to improve the accuracy and efficiency of reliability evaluation. However, SA requires that the random variables involved be easy to handle and that the corresponding saddlepoint equation not be complicated. Both requirements limit the application of SA to engineering problems. The moment method can construct an approximate cumulative distribution function of the performance function from its first few statistical moments, but the traditional moment matching method is generally not very accurate. To combine the advantages of the SA method and the moment matching method, and thereby improve the efficiency of design and optimization, a fourth-moment saddlepoint approximation (FMSA) method is introduced into RBMDO. In FMSA, an approximate cumulant generating function is constructed from the first four moments of the limit state function. The probability density function and cumulative distribution function are then estimated from this approximate cumulant generating function. Furthermore, the FMSA method is integrated into RBMDO within the framework of sequential optimization and reliability assessment, which is based on the performance measure approach. Two engineering examples are used to verify the effectiveness of the proposed method.
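The Python sketch below is a hedged illustration of the general saddlepoint idea, not the specific FMSA construction of the paper. It builds a quartic approximation of the cumulant generating function K(t) from the first four cumulants, which are derived from the mean, standard deviation, skewness and kurtosis of the limit state function. It then solves the saddlepoint equation K'(t) = x by Newton's method and evaluates the CDF with the Lugannani-Rice formula. Truncating K(t) at fourth order is an assumption. The result is only meaningful where K''(t) > 0, which the code checks. For a normal input (zero skewness, kurtosis 3) the sketch reproduces the normal CDF exactly.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cumulants_from_moments(mean, std, skew, kurt):
    """First four cumulants from mean, standard deviation, skewness and (non-excess) kurtosis."""
    return (mean, std ** 2, skew * std ** 3, (kurt - 3.0) * std ** 4)

def saddlepoint_cdf(x, kappas, tol=1e-12, max_iter=100):
    """Approximate P(G <= x) from a fourth-order truncated CGF (Lugannani-Rice formula)."""
    k1, k2, k3, k4 = kappas
    # Truncated cumulant generating function K(t) and its first two derivatives.
    K = lambda t: k1 * t + k2 * t**2 / 2 + k3 * t**3 / 6 + k4 * t**4 / 24
    dK = lambda t: k1 + k2 * t + k3 * t**2 / 2 + k4 * t**3 / 6
    d2K = lambda t: k2 + k3 * t + k4 * t**2 / 2

    # Solve the saddlepoint equation K'(t) = x with Newton's method from t = 0.
    t = 0.0
    for _ in range(max_iter):
        curv = d2K(t)
        if curv <= 0:
            raise ValueError("K''(t) <= 0: the truncated CGF is not valid here")
        step = (dK(t) - x) / curv
        t -= step
        if abs(step) < tol:
            break
    if d2K(t) <= 0 or abs(dK(t) - x) > 1e-8 * (1.0 + abs(x)):
        raise ValueError("saddlepoint equation not solved")

    if abs(t) * math.sqrt(k2) < 1e-6:
        # At the mean the formula is 0/0; use its known limit instead.
        return 0.5 + k3 / (6.0 * math.sqrt(2.0 * math.pi) * k2 ** 1.5)
    w = math.copysign(math.sqrt(max(0.0, 2.0 * (t * x - K(t)))), t)
    u = t * math.sqrt(d2K(t))
    return norm_cdf(w) + norm_pdf(w) * (1.0 / w - 1.0 / u)
```

For a limit state function g with failure defined as g < 0, the failure probability would be estimated as saddlepoint_cdf(0.0, kappas).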
Funding: The NSF of China (10571073) and the 985 Program of Jilin University.
Abstract: In this article, we prove, under suitable assumptions, upper large deviations for the empirical measure generated by a stationary mixing random sequence, as well as upper large deviations for the mixing random sequence itself.
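For readers unfamiliar with the terminology, the block below sketches the generic form of an upper large deviation bound for the empirical measure. Here X_1, X_2, ... is the random sequence taking values in a space S, and M_1(S) denotes the probability measures on S. The rate function I and the mixing assumptions under which such a bound holds are those of the article and are not reproduced here. Some results state the bound only for compact sets rather than closed ones.

```latex
L_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i},
\qquad
\limsup_{n \to \infty} \frac{1}{n} \log P\left(L_n \in F\right)
  \le -\inf_{\nu \in F} I(\nu)
\quad \text{for closed } F \subseteq \mathcal{M}_1(S).
```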
Abstract: Web log mining is the analysis of web log files that contain sequences of web page accesses. Discovering user access patterns from web access data is necessary for building adaptive web servers, improving e-commerce, carrying out cross-marketing, personalizing the web, and predicting web access sequences. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interests and to determine their motivation for visiting a website. With this approach, web usage mining proceeds through four stages: data cleaning, preprocessing, pattern discovery and pattern analysis. The results show that this approach produces tighter usage clusters than existing web usage mining techniques. To reduce computational complexity, the clustering process uses a similarity measure instead of traditional distance-based clustering. The paper also addresses the problem of assessing the quality of user session clusters. Cluster validity is measured with a statistical test that compares the distances between cluster distributions to infer their dissimilarity and degree of distinction. By these statistical measures, the proposed method reaches a cluster accuracy (validity measure) of 0.83, compared with 0.26 for existing k-means clustering, 0.56 for FCM (Fuzzy C-Means) clustering, and 0.54 for rough-set-based clustering. Generating dense clusters is essential for finding interesting patterns needed for further mining and analysis.
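The paper's own similarity measure and validity test are not given in the abstract. The Python sketch below is therefore only a hypothetical illustration of similarity-based agglomerative clustering of user sessions. Each session is treated as a set of visited pages, and sessions are compared with Jaccard similarity. Clusters are merged by average linkage, and merging stops once the most similar pair falls below a threshold (0.5 by default, an assumed value). The function names and page names are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two page sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_sessions(sessions, threshold=0.5):
    """Average-linkage agglomerative clustering of sessions; merging stops below threshold."""
    pages = [set(s) for s in sessions]
    clusters = [[i] for i in range(len(sessions))]

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        return sum(jaccard(pages[i], pages[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return clusters
```

For example, two sessions ["home", "shoes", "cart"] and ["home", "shoes", "cart", "pay"] have Jaccard similarity 0.75 and are merged at the default threshold. Sessions with no pages in common are never merged.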
Funding: Project supported by the National Natural Science Foundation of China (Nos. 10572057 and 10251001) and the Science Foundation of Nanjing University of Aeronautics and Astronautics.
Abstract: Under weak conditions, we study the existence of solutions to the initial value problem for a second-order impulsive integro-differential equation with infinitely many moments of impulse effect on the positive half real axis in Banach spaces. Using the recurrence method, the Tonelli sequence and the locally convex topology, we obtain new existence theorems that improve related results obtained by Guo Da-jun.
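The Tonelli sequence mentioned above is a classical way to build approximate solutions. It delays the argument of the integral so that each approximation can be computed explicitly forward in time. For the scalar problem u' = f(t, u), u(0) = u0, the n-th approximation is u_n(t) = u0 for t <= 1/n and u_n(t) = u0 + ∫_0^{t-1/n} f(s, u_n(s)) ds otherwise. The Python sketch below illustrates only this construction on that first-order scalar problem, not the second-order impulsive problem in Banach spaces studied in the paper. The function name, the grid step h, and the representation of the delay 1/n as delta_steps grid steps are hypothetical choices. The integral is evaluated with the trapezoidal rule.

```python
def tonelli_approximation(f, u0, T, h, delta_steps):
    """Discretized Tonelli approximation for u' = f(t, u), u(0) = u0 on [0, T].

    The delay 1/n of the classical construction is represented by delta_steps * h.
    """
    if delta_steps < 1:
        raise ValueError("the delay must be at least one grid step")
    n = int(round(T / h))
    ts = [i * h for i in range(n + 1)]
    u = [u0] * (n + 1)
    integral = [0.0] * (n + 1)  # integral[i] approximates the integral of f(s, u(s)) over [0, ts[i]]
    for i in range(n + 1):
        j = i - delta_steps  # grid index of ts[i] minus the delay
        # Only indices j < i are used, so every value is already known: the scheme is explicit.
        u[i] = u0 + integral[j] if j >= 0 else u0
        if i > 0:
            integral[i] = integral[i - 1] + 0.5 * h * (f(ts[i - 1], u[i - 1]) + f(ts[i], u[i]))
    return ts, u
```

With f(t, u) = u and u0 = 1 on [0, 1], shrinking the delay moves the end value closer to e, consistent with the Tonelli approximations converging as n grows.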
Funding: Supported by the Australian Research Council Linkage Project under Grant No. LP0775041 and the Early Career Researcher Grant under Grant No. 2007002448 from the University of Technology, Sydney, Australia.
Abstract: From a data mining perspective, sequence classification consists of building a classifier from frequent sequential patterns. However, mining the complete set of sequential patterns in a large dataset can be extremely time-consuming, and the large number of patterns discovered also makes pattern selection and classifier building very time-consuming. In sequence classification, discovering discriminative patterns is much more important than discovering a complete pattern set. In this paper, we propose a novel hierarchical algorithm that builds sequential classifiers from discriminative sequential patterns. First, we mine the sequential patterns that are most strongly correlated with each target class. In this step, an aggressive strategy is employed to select a small set of sequential patterns. Second, pattern pruning and a serial coverage test are applied to the mined patterns. The patterns that pass the serial test are used to build the sub-classifier at the first level of the final classifier. Third, the training samples that are not covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space can be reduced dramatically while good classification performance is maintained. The proposed algorithm is tested in a real-world business application for debt prevention in the social security area. It proves effective and efficient at predicting debt occurrences from customer activity sequence data.
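The Python sketch below shows the level-by-level loop described above in compact form. It is not the authors' algorithm, and several simplifications are assumptions:

- Patterns are restricted to gapped subsequences of length at most 2.
- "Strongly correlated" is approximated by rule confidence.
- Pattern pruning is folded into a CBA-style coverage test: a rule is kept only if it correctly covers at least one still-uncovered sample, and everything it covers is then removed.
- The parameter update between levels (lowering the minimum support) is a hypothetical choice.

All function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def subsequences(seq, max_len=2):
    """Distinct order-preserving subsequences (gaps allowed) of length 1..max_len."""
    out = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            out.add(tuple(seq[i] for i in idx))
    return out

def contains(seq, pattern):
    """True if pattern occurs in seq as a subsequence (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)

def mine_rules(samples, min_support, min_conf):
    """Rules (pattern, label, confidence) strongly associated with a single class."""
    total, per_class = Counter(), Counter()
    for seq, label in samples:
        for p in subsequences(seq):
            total[p] += 1
            per_class[(p, label)] += 1
    rules = [(p, label, n / total[p]) for (p, label), n in per_class.items()
             if n >= min_support and n / total[p] >= min_conf]
    # Highest confidence first, then longer (more specific) patterns, then a deterministic tie-break.
    rules.sort(key=lambda r: (-r[2], -len(r[0]), r[0], r[1]))
    return rules

def build_levels(samples, min_support=2, min_conf=0.8, max_levels=5):
    """Build one rule list per level; uncovered samples feed the next level."""
    levels, remaining = [], list(samples)
    while remaining and len(levels) < max_levels:
        kept, uncovered = [], list(remaining)
        for p, label, conf in mine_rules(remaining, min_support, min_conf):
            hit = [s for s in uncovered if contains(s[0], p)]
            # Serial coverage test: keep a rule only if it correctly covers an uncovered sample.
            if any(lbl == label for _, lbl in hit):
                kept.append((p, label, conf))
                uncovered = [s for s in uncovered if not contains(s[0], p)]
        if not kept:
            break
        levels.append(kept)
        remaining = uncovered  # uncovered samples are fed back to the next level
        min_support = max(1, min_support - 1)  # hypothetical parameter update
    return levels

def classify(levels, seq, default=None):
    """Apply the first matching rule, searching level by level."""
    for rules in levels:
        for p, label, _ in rules:
            if contains(seq, p):
                return label
    return default
```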