Abstract: There is currently no effective means to analyze in depth and utilize the large volume of domestic data on agricultural product quality and safety tests in China. Following the principles of combination modeling, the neural network algorithm, the classification and regression tree algorithm, and the Bayesian network algorithm were selected, used to build models separately, and then combined, innovatively establishing a combination model with relatively high precision, strong robustness, and better interpretability for predicting the results of perishable food transportation deterioration monitoring. A relatively optimal prediction model for the perishable food transportation deterioration monitoring system was thus obtained. This prediction model can guide practical sampling work on food quality and safety by forecasting the occurrence of unqualified food, so that typical and effective samples are selected for testing. This improves the efficiency and effectiveness of sampling work, prevents deteriorated perishable food from reaching the market, and ensures the quality and safety of perishable food transportation, building a solid protective wall for the health of perishable food consumers.
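The combination step described above can be sketched as a weighted vote over the three base models' predictions. The abstract does not specify the combination rule, so the voting scheme, the weights, and the model names below are illustrative assumptions, not the authors' method:

```python
# Hedged sketch: combining three base models' predictions by weighted
# majority vote. Labels are 0/1, with 1 = sample predicted unqualified.
# Weights and predictions here are invented for illustration only.

def combine_predictions(preds_by_model, weights):
    """preds_by_model: dict model_name -> list of 0/1 predictions.
    Returns the combined 0/1 labels under a weighted-vote rule."""
    n = len(next(iter(preds_by_model.values())))
    total = sum(weights.values())
    combined = []
    for i in range(n):
        score = sum(w * preds_by_model[m][i] for m, w in weights.items())
        combined.append(1 if score / total >= 0.5 else 0)
    return combined

# Three hypothetical base models: neural network, CART, Bayesian network.
preds = {
    "neural_net": [1, 0, 1, 0],
    "cart":       [1, 1, 0, 0],
    "bayes_net":  [1, 0, 0, 0],
}
weights = {"neural_net": 0.4, "cart": 0.35, "bayes_net": 0.25}
print(combine_predictions(preds, weights))  # -> [1, 0, 0, 0]
```

Only the first sample clears the 0.5 weighted-vote threshold; a single dissenting model with less than half the total weight cannot flip the outcome.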
Abstract: Manifold data consist of arc-shaped or ring-shaped clusters, characterized by large variation in the distances between samples of the same cluster. The density peaks clustering (DPC) algorithm cannot effectively identify the cluster centers of manifold clusters, and its allocation of remaining samples easily triggers chains of consecutive misassignments. To address this, this paper proposes a density peaks clustering algorithm based on shared nearest neighbors for manifold datasets (DPC-SNN). A sample similarity definition based on shared nearest neighbors is proposed, so that the similarity between samples of the same manifold cluster is as high as possible. Local density is then defined on this similarity, so that the density contribution of samples far from the cluster center is not ignored, which better distinguishes the cluster centers of manifold clusters from other samples. Finally, the remaining samples are assigned according to sample similarity, avoiding consecutive misassignment. Comparative experiments against the DPC, FKNNDPC, FNDPC, DPCSA, and IDPC-FA algorithms show that DPC-SNN can effectively find the cluster centers of manifold data and complete clustering accurately, and it also achieves good clustering results on real-world and face datasets.
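The shared-nearest-neighbor similarity at the heart of DPC-SNN can be sketched as follows. Counting common k-nearest neighbors is a minimal version of the idea; the paper's exact definition may additionally weight by distance, so treat this as an assumption-laden illustration:

```python
import numpy as np

def snn_similarity(X, k=5):
    """Minimal shared-nearest-neighbor similarity:
    sim(i, j) = |kNN(i) ∩ kNN(j)|, the number of k-nearest
    neighbors the two samples have in common."""
    n = len(X)
    # pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # k nearest neighbors of each point (excluding the point itself)
    knn = [set(np.argsort(d[i])[1:k + 1]) for i in range(n)]
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            sim[i, j] = len(knn[i] & knn[j])
    return sim

# Two well-separated toy clusters: samples inside a cluster share
# neighbors, samples across clusters share none.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
sim = snn_similarity(X, k=3)
print(sim[0, 1], sim[0, 4])  # within-cluster pair > cross-cluster pair
```

Defining local density on such a similarity, instead of on raw distance, is what lets points far along the same arc still contribute to each other's density.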
Funding: CAS Project of Brain and Mind Science, the National High-Tech Research and Development Program of China (863 Program), the National Basic Research Program of China (973 Program), the National Natural Science Foundation of China, and the Natural Science Foundation of Hunan Province
Abstract: One of the obstacles to efficient association rule mining is the explosive expansion of data sets, since it is costly or impossible to scan large databases, especially multiple times. A popular solution for improving the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database. However, how to effectively define and efficiently estimate the degree of error with respect to the outcome of the algorithm, and how to determine the needed sample size, have remained open research questions. In this paper, an effective and efficient algorithm based on PAC (Probably Approximately Correct) learning theory is given to measure and estimate sample error. Then, a new adaptive, online, fast sampling strategy, multi-scaling sampling, is presented, inspired by MRA (Multi-Resolution Analysis) and the Shannon sampling theorem, for quickly obtaining acceptably approximate association rules at an appropriate sample size. Both theoretical analysis and empirical study show that the sampling strategy can achieve a very good speed-accuracy trade-off.
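The sample-size question the paper addresses can be illustrated with the classical two-sided Hoeffding bound, which gives a PAC-style guarantee on the support estimate of a single itemset. The paper's error measure and multi-scaling strategy are more refined; this is only the textbook bound:

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Smallest n such that, for one itemset, the sampled support
    deviates from the true support by more than epsilon with
    probability at most delta, by the two-sided Hoeffding bound:
        n >= ln(2 / delta) / (2 * epsilon**2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# e.g. support accurate to within 1 percentage point, 95% confidence:
print(hoeffding_sample_size(0.01, 0.05))  # -> 18445
```

Note how the bound is independent of the database size, which is exactly why sampling pays off on very large databases, and how halving epsilon quadruples the required sample, which motivates an adaptive scheme that stops enlarging the sample once the mined rules stabilize.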
Funding: Supported by the National Natural Science Foundation of China (60542004)
Abstract: Constraint pushing techniques have been developed for mining frequent patterns and association rules. However, existing techniques in frequent pattern mining cannot handle multiple constraints. In this paper, a new algorithm, MCFMC (mining the complete set of frequent itemsets with multiple constraints), is introduced. The algorithm takes advantage of the fact that a convertible constraint can be pushed into the mining algorithm to reduce the search space. Using a sample database, the algorithm selects an optimal method to convert the multiple constraints into multiple convertible constraints, joined by conjunction and/or disjunction, and then partitions these constraints into two parts. One part is pushed deep inside the mining process to reduce the search space for frequent itemsets; the other part, which cannot be pushed into the algorithm, is used to filter the complete set of frequent itemsets and obtain the final result. Results from our detailed experiments show the feasibility and effectiveness of the algorithm.
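The idea of pushing a single convertible constraint into itemset growth can be sketched for avg(price) >= v: with items ordered by descending price, appending a cheaper item can only lower the running average, so a prefix that already violates the constraint can be pruned. The items, prices, and threshold below are invented for illustration, and the frequency check of a real miner is omitted; this is not the MCFMC algorithm itself:

```python
# Hedged sketch of convertible-constraint pushing for avg(price) >= v.
# Items are enumerated in descending-price order, so every extension of
# a prefix has price <= the prefix minimum; once avg(prefix) < v, every
# extension also violates the constraint and the branch is pruned.

def grow(prefix, remaining, prices, v, out):
    for idx, item in enumerate(remaining):
        candidate = prefix + [item]
        avg = sum(prices[i] for i in candidate) / len(candidate)
        if avg < v:       # convertible anti-monotone under this order: prune
            continue
        out.append(candidate)
        grow(candidate, remaining[idx + 1:], prices, v, out)
    return out

prices = {"a": 9, "b": 7, "c": 4, "d": 2}
items = sorted(prices, key=prices.get, reverse=True)  # descending price
result = grow([], items, prices, v=5, out=[])
print(len(result))  # 10 itemsets survive the constraint
```

Without the descending-price ordering, avg(price) >= v is neither monotone nor anti-monotone, which is precisely what "convertible" means: the item order converts it into a prunable form.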
Funding: Supported by the National Natural Science Foundation of China (No. 51075391)
Abstract: Incomplete data samples have a serious impact on the effectiveness of data mining. Aiming at historical LRE test samples, and based on correlation analysis of the condition parameters, this paper introduces principal component analysis (PCA) and proposes a PCA-based completion method for incomplete samples. First, the covariance matrix of the complete data set is calculated; then, with the corresponding eigenvalues sorted in descending order, a principal matrix composed of the eigenvectors of the covariance matrix is constructed; finally, the vacant data are estimated from the principal matrix and the known data. Comparison with a traditional method validates that the method proposed in this paper completes test samples more effectively. An application example shows that the suggested method can increase the value in use of historical test data.
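The three steps above (covariance matrix, descending eigenvalues, estimation from the principal matrix and the known data) can be sketched with an iterative project-and-reconstruct fill-in. This is a common simplification of PCA-based imputation, not necessarily the paper's exact estimation formula:

```python
import numpy as np

def pca_impute(X_complete, x_partial, k=1, iters=50):
    """Estimate the np.nan entries of x_partial using the top-k
    principal axes learned from the complete samples X_complete.
    Missing entries start at the column means and are repeatedly
    replaced by their PCA reconstruction until they stabilize."""
    mu = X_complete.mean(axis=0)
    cov = np.cov(X_complete, rowvar=False)       # step 1: covariance matrix
    vals, vecs = np.linalg.eigh(cov)             # eigenvalues, ascending
    W = vecs[:, ::-1][:, :k]                     # step 2: top-k eigenvectors
    x = x_partial.copy()
    miss = np.isnan(x)
    x[miss] = mu[miss]                           # initial guess: mean
    for _ in range(iters):                       # step 3: estimate vacant data
        z = W.T @ (x - mu)                       # project onto principal axes
        x_hat = mu + W @ z                       # reconstruct
        x[miss] = x_hat[miss]                    # keep observed entries fixed
    return x

# Toy data where column 2 = 2 * column 1, so the missing second
# entry of [3.0, nan] should be recovered as roughly 6.0.
X_complete = np.array([[1., 2.], [2., 4.], [3., 6.], [4., 8.]])
filled = pca_impute(X_complete, np.array([3.0, np.nan]))
print(filled)
```

On perfectly correlated toy data the fixed point of the iteration lies on the first principal axis, so the vacant entry converges to the value implied by the observed one; real condition-parameter data would of course recover it only approximately.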