Funding: Supported by the Research on Key Technologies and Typical Applications of Big Data in Railway Production and Operation (P2023S006) and the Fundamental Research Funds for the Central Universities (2022JBZY023).
Abstract: Associating fault knowledge through efficient data mining is important for improving the efficiency of railway production and operation. However, high-utility quantitative frequent pattern mining algorithms still suffer from poor time and memory performance and do not scale easily. To address this, we propose a related-degree-based frequent pattern mining algorithm, Related High Utility Quantitative Itemset Mining (RHUQI-Miner), for effective mining of railway fault data. The algorithm constructs an item related-degree structure for the fault data and applies a pruning optimization strategy to find frequent patterns with higher related degrees, reducing redundant and invalid frequent patterns. It then uses a fixed pattern length strategy to modify the utility information of items during mining, so that the length of the output frequent patterns can be controlled according to the actual data, further improving the algorithm's performance and practicality. Experimental results on a real fault dataset show that RHUQI-Miner effectively reduces time and memory consumption during mining, thus providing data support for differentiated and precise maintenance strategies.
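To make the notion of itemset utility concrete, here is a minimal sketch of plain high-utility itemset mining restricted to one fixed pattern length. It is not RHUQI-Miner: the abstract does not define the related-degree structure, the pruning rule, or how utility information is modified, so none of those are modeled. The toy fault records, the per-unit utilities, the threshold `min_util`, and both function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps a fault item to its observed quantity.
transactions = [
    {"A": 2, "B": 1},
    {"A": 1, "C": 3},
    {"A": 1, "B": 2, "C": 1},
]
# Hypothetical per-unit utility (for example, a repair-cost weight) of each item.
unit_utility = {"A": 5, "B": 2, "C": 1}


def itemset_utility(itemset, transactions, unit_utility):
    """Sum quantity * unit utility over transactions that contain every item of the itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_utility[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_utility, length, min_util):
    """Brute-force enumeration of itemsets of one fixed length with utility >= min_util."""
    items = sorted(unit_utility)
    result = {}
    for itemset in combinations(items, length):
        u = itemset_utility(itemset, transactions, unit_utility)
        if u >= min_util:
            result[itemset] = u
    return result
```

Calling `high_utility_itemsets(transactions, unit_utility, length=2, min_util=15)` on the toy data keeps only the pair `("A", "B")`, whose utility is 21. Brute-force enumeration is exponential in the number of items, which is the cost that pruning strategies such as the one the abstract describes aim to avoid.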
Funding: Supported by Key Projects of Universities for Foreign Cultural and Educational Experts Employment Plan in 2018 (T2018013) and by the Special Funds for Sustainable Development of Science and Technology Platform for Fundamental Research Business Expenses of Central Universities (2572018CP05).
Abstract: Although pruning is important for obtaining high-quality, large-diameter timber, the effects of pruning on nonstructural carbohydrates (NSC) in the aboveground organs of many timber species are not well understood. Three intensities of pruning (none, moderate and severe) were tested on poplars (Populus alba × P. talassica) in the arid desert region of northwest China to compare the concentrations of soluble sugar (SS), starch (ST) and total nonstructural carbohydrate (TNC) in leaves, branches and trunks during the growing season. The concentrations of NSC components followed similar seasonal patterns under all pruning intensities, increasing slowly at the beginning of the growing season, declining continuously in the middle, then gradually recovering by the end of the growing season. The monthly mean NSC concentration in poplar differed significantly among the three pruning intensities (p < 0.05). The SS concentration in pruned trees was higher than in unpruned trees (p < 0.05). For moderately pruned trees, the concentrations of ST and TNC in trunks and branches were higher than in unpruned and severely pruned trees (p < 0.05). Compared with no pruning, pruning changed the seasonal variation in NSC concentration. The order of SS and TNC concentrations in aboveground organs was leaf > branch > trunk, while the order of ST concentration was trunk > leaf > branch, which was related to functional differences among plant organs. The annual average growth in height of unpruned, moderately pruned and severely pruned poplars was 0.21 ± 0.06, 0.45 ± 0.09 and 0.24 ± 0.05 m, respectively, and the annual average growth in diameter at breast height (DBH) was 0.92 ± 0.04, 1.27 ± 0.06 and 1.02 ± 0.05 cm, respectively. Our results demonstrate that moderate pruning may effectively increase the annual growth in tree height and DBH while avoiding the damage that excessive pruning causes to the tree body. Therefore, moderate pruning may increase NSC storage and improve the growth of timber species.
Abstract: This work hypothesizes, tests and confirms that a dominant pattern can exist which affects the performance of a neural-network-based pattern recognition system and its operation in terms of correct and accurate classification, pruning and optimization. Two sets of data, subjected to the same ranking process using four main features, are used to train a neural network engine separately and jointly. Data transformation and statistical pre-processing are carried out on the datasets before they are fed into a specifically designed multi-layer neural network employing the Weight Elimination Algorithm with Back Propagation (WEA-BP). The dynamics of the classification and weight elimination processes are correlated and used to prove the dominance of one dataset. The results show that one dataset acted aggressively towards the system and displaced the first dataset, making its classification almost impossible. This modulation of the relationships among the selected features of the affected dataset resulted in a mutated pattern and a subsequent rearrangement in the ranking of its members.
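The abstract does not spell out the weight elimination term. The sketch below assumes it is the commonly used weight-elimination penalty, in which each weight w contributes (w/w0)^2 / (1 + (w/w0)^2) to the cost, added to the backpropagated error with a coefficient. The scale `w0`, the coefficient `lam`, the learning rate, the pruning threshold, and all function names are illustrative assumptions, not values from the paper.

```python
def weight_elimination_penalty(weights, w0=1.0):
    """Penalty sum((w/w0)^2 / (1 + (w/w0)^2)); each term lies in [0, 1)."""
    return sum((w / w0) ** 2 / (1.0 + (w / w0) ** 2) for w in weights)


def weight_elimination_gradient(weights, w0=1.0):
    """Derivative of the penalty with respect to each weight."""
    return [(2.0 * w / w0 ** 2) / (1.0 + (w / w0) ** 2) ** 2 for w in weights]


def penalized_step(weights, error_grads, lr=0.1, lam=0.01, w0=1.0):
    """One gradient-descent step on error + lam * penalty.

    error_grads are assumed to come from ordinary backpropagation.
    """
    pen_grads = weight_elimination_gradient(weights, w0)
    return [w - lr * (g + lam * p) for w, g, p in zip(weights, error_grads, pen_grads)]


def prune(weights, threshold=1e-2):
    """Zero out weights whose magnitude has decayed below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]
```

Under this penalty, weights much smaller than `w0` are pushed towards zero while large weights are penalized almost as a constant, so the number of effective weights shrinks during training. This is one plausible way the weight elimination dynamics could be tracked alongside classification.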
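Abstract: Ensemble pruning (selective ensemble) selects a subset of base classifiers to participate in the ensemble, improving the generalization ability of the ensemble classifier and reducing prediction cost. However, existing ensemble pruning algorithms are generally time-consuming. Applying data mining techniques to ensemble pruning, this paper proposes a fast ensemble pruning algorithm based on the FP-Tree (frequent pattern tree): CPM-EP (coverage based pattern mining for ensemble pruning). The algorithm organizes the base classifiers' classification results on the validation set into a transaction database, so that ensemble pruning can be recast as a problem of processing a transaction dataset. For every possible ensemble size, CPM-EP first derives a reduced transaction database and builds an FP-Tree to store its contents; it then uses this FP-Tree to obtain an ensemble of the corresponding size. Among all the ensembles obtained, the one with the highest prediction accuracy on the validation set is the algorithm's output. Experimental results show that CPM-EP achieves superior generalization ability at very low computational cost: its classifier selection time is about 1/19 that of GASEN and 1/8 that of Forward-Selection, its generalization ability is significantly better than that of the other methods compared, and the resulting ensembles contain fewer base classifiers.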
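The transaction-database view described above can be sketched as follows. This is a simplified illustration, not CPM-EP itself: the FP-Tree and the reduced database are replaced by brute-force enumeration, and two details are assumptions because the abstract does not specify them. Each transaction is taken to be the set of classifiers that label one validation sample correctly, and an ensemble is counted as correct on a sample when a strict majority of its members are. The predictions, labels, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical predictions of 4 base classifiers on 6 validation samples.
predictions = {
    "c1": [1, 0, 1, 1, 0, 0],
    "c2": [1, 1, 0, 1, 0, 1],
    "c3": [0, 0, 1, 1, 1, 1],
    "c4": [1, 0, 1, 0, 0, 1],
}
labels = [1, 0, 1, 1, 0, 1]


def build_transactions(predictions, labels):
    """One transaction per validation sample: the set of classifiers that got it right."""
    return [
        {name for name, preds in predictions.items() if preds[i] == y}
        for i, y in enumerate(labels)
    ]


def coverage(ensemble, transactions):
    """Fraction of samples on which a strict majority of the ensemble is correct."""
    hits = sum(1 for t in transactions if 2 * len(t & ensemble) > len(ensemble))
    return hits / len(transactions)


def select_ensemble(predictions, labels):
    """For every ensemble size, score each subset; return the best one overall."""
    transactions = build_transactions(predictions, labels)
    names = sorted(predictions)
    best, best_score = None, -1.0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            score = coverage(set(subset), transactions)
            if score > best_score:  # strict '>' keeps the smaller ensemble on ties
                best, best_score = subset, score
    return best, best_score
```

On the toy data, `select_ensemble(predictions, labels)` returns `(("c1", "c2", "c3"), 1.0)`: no single classifier is right on all six samples, but a three-member majority is. Enumerating every subset is exponential in the number of classifiers, and this is the cost that an FP-Tree over a reduced transaction database is meant to avoid.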
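Abstract: Because of the rapid growth in data scale, the efficiency of high-utility sequential pattern mining algorithms has dropped severely. To address this, a MapReduce-based high-utility sequential pattern mining algorithm, HusMaR, is proposed. Built on the MapReduce framework, the algorithm uses a utility matrix to generate candidates efficiently, a random mapping strategy to balance computing resources, and a domain-based pruning strategy to prevent combinatorial explosion. Experimental results show that the algorithm achieves high parallel efficiency on large-scale datasets.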