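Abstract: Ranking is a key component of information retrieval. More than a hundred features have been proposed for constructing ranking functions, and how to use them to build more effective ranking functions has become an active research question. As a result, learning to rank, a field at the intersection of information retrieval and machine learning, is receiving growing attention. The way ranking features are constructed makes it clear that they are not fully independent of one another, yet existing work on learning to rank has seldom built ranking functions by first analyzing the features and then recombining and selecting them. To address this, the paper proposes the following framework: analyze the feature set used to build the ranking function, recombine and select features, and then learn the ranking function with a learning-to-rank method. Within this framework, four feature-processing algorithms are proposed: feature recombination based on principal component analysis, and feature selection based on MAP, on forward selection, and on the selection implicit in the learning-to-rank algorithm. Experiments show that ranking functions learned after feature processing generally outperform the original ranking functions.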
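The forward-selection step in this framework can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that MAP means mean average precision, that a document is scored by the plain sum of the selected features (a stand-in for a learned ranking function), and that queries arrive as hypothetical `(docs, labels)` pairs. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical data layout: one (docs, labels) pair per query, where docs[i]
# is the feature vector of document i and labels[i] is 1 if it is relevant.
Query = Tuple[Sequence[Sequence[float]], Sequence[int]]


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision of the ranking induced by sorting scores descending."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Query], features: Sequence[int]) -> float:
    """MAP when each document is scored by the sum of the chosen features."""
    aps = []
    for docs, labels in queries:
        scores = [sum(doc[f] for f in features) for doc in docs]
        aps.append(average_precision(scores, labels))
    return sum(aps) / len(aps)


def forward_select(queries: Sequence[Query], n_features: int) -> Tuple[List[int], float]:
    """Greedily add the feature that raises MAP most; stop when none helps."""
    selected: List[int] = []
    best = 0.0
    remaining = set(range(n_features))
    while remaining:
        candidate, candidate_map = None, best
        for f in sorted(remaining):
            m = mean_average_precision(queries, selected + [f])
            if m > candidate_map:
                candidate, candidate_map = f, m
        if candidate is None:
            break  # no remaining feature improves MAP
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_map
    return selected, best
```

Calling `forward_select(queries, n_features)` returns the chosen feature indices and the MAP they reach. In a fuller pipeline the summed score would presumably be replaced by a ranking function retrained on each candidate subset.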
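Abstract: High-dimensional data arise in many data analysis tasks. Feature selection aims to find the most representative features in the original high-dimensional data, but because class labels are absent, choosing suitable features is much harder in unsupervised learning than in supervised settings. Traditional unsupervised feature selection methods usually score features according to some criterion and treat all samples identically in the process. This does not fully capture the intrinsic structure of the data: different samples should differ in importance, and sample weights and feature weights are linked by a dual relationship in which each influences the other. The paper therefore proposes an unsupervised feature selection algorithm based on dual manifold re-ranking (DMRR), which builds separate similarity matrices to describe the sample-sample, feature-feature, and sample-feature manifold structures and re-ranks on these manifolds starting from initial sample and feature scores. DMRR is compared with three original unsupervised feature selection algorithms and two unsupervised feature selection post-processing algorithms; the results indicate that sample importance information and the duality between samples and features help achieve better feature selection.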
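The abstract does not spell out DMRR's update rules, so the sketch below shows only the generic manifold-ranking iteration that re-ranking on a similarity graph is commonly built on: initial scores `y` are propagated over a symmetrically normalized similarity matrix until they settle. The function names, the damping value `alpha`, and the fixed iteration count are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def normalize(W: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric normalization S = D^(-1/2) W D^(-1/2) of a similarity matrix."""
    n = len(W)
    d = [sum(row) for row in W]
    return [
        [W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def manifold_rank(W: Sequence[Sequence[float]], y: Sequence[float],
                  alpha: float = 0.5, iters: int = 200) -> List[float]:
    """Iterate f <- alpha * S f + (1 - alpha) * y starting from f = y."""
    n = len(W)
    S = normalize(W)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f
```

On a four-node chain graph with all initial mass on the first node, the resulting scores decay with graph distance from that node, which is the behavior re-ranking relies on. A dual scheme would presumably run coupled updates of this kind for samples and for features, but that coupling is not reproduced here.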
Abstract: Responding to complex analytical queries in the data warehouse (DW) is one of the most challenging tasks that require prompt attention. The problem of materialized view (MV) selection relies on selecting the optimal views that can respond to more queries simultaneously. This work introduces an approach that combines constraint handling with metaheuristics to select the optimal subset of views from DWs. The proposed work initially refines the solution to enable a feasible selection of views using the ensemble constraint handling technique (ECHT). Self-adaptive penalty, the epsilon (ε)-parameter, and stochastic ranking (SR) are considered for constraint handling. These techniques helped the proposed model select the finest views that minimize the objective function. Further, a novel and effective combination of the Ebola and coot optimization algorithms, named hybrid Ebola with coot optimization (CHECO), is introduced to choose the optimal MVs. Ebola and Coot are recently introduced metaheuristics that identify the globally optimal set of views from the given population. By combining these two algorithms, the proposed framework produced a highly optimized set of views with minimized costs. Several cost functions are described to enable the algorithm to choose the finest solution from the problem space. Finally, extensive evaluations are conducted to assess the performance of the proposed approach against existing algorithms. The proposed framework resulted in a view maintenance cost of 6,329,354,613,784, a query processing cost of 3,522,857,483,566, and an execution time of 226 s when analyzed using the TPC-H benchmark dataset.
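Of the three constraint-handling techniques named, stochastic ranking is the easiest to show in a few lines. The sketch below follows the commonly described bubble-sort-style procedure: adjacent candidates are compared by objective value when both are feasible or with probability `pf`, and by constraint violation otherwise. The function name, the default `pf`, and the encoding of violations as non-negative numbers are assumptions; how ECHT combines this with the other two techniques is not shown.

```python
import random
from typing import List, Optional, Sequence


def stochastic_rank(objective: Sequence[float], violation: Sequence[float],
                    pf: float = 0.45, sweeps: Optional[int] = None,
                    seed: int = 0) -> List[int]:
    """Return candidate indices ordered best-first (minimization).

    violation[i] == 0 means candidate i is feasible; larger is worse.
    """
    rng = random.Random(seed)
    n = len(objective)
    order = list(range(n))
    for _ in range(sweeps if sweeps is not None else n):
        swapped = False
        for j in range(n - 1):
            a, b = order[j], order[j + 1]
            both_feasible = violation[a] == 0 and violation[b] == 0
            if both_feasible or rng.random() < pf:
                worse = objective[a] > objective[b]  # compare by objective
            else:
                worse = violation[a] > violation[b]  # compare by violation
            if worse:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:
            break  # a full sweep without swaps means the order is stable
    return order
```

Setting `pf` to 0 ranks infeasible candidates purely by violation, while values near 1 let the objective dominate. In a view-selection setting the violation could, for instance, measure how far a candidate set of views exceeds a storage or maintenance budget, although the abstract does not state the constraints used.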
Funding: Funded by the Fundamental Research Grant Scheme under the Ministry of Higher Education Malaysia, FRGS/1/2019/STG06/UMP/02/9.
Abstract: Intuitionistic fuzzy numbers incorporate the membership and non-membership degrees. In contrast, Z-numbers consist of restriction components, with the existence of a reliability component describing the degree of certainty for the restriction. The combination of intuitionistic fuzzy numbers and Z-numbers produces a new type of fuzzy number, namely intuitionistic Z-numbers (IZN). The strength of IZN is their capability of handling uncertainty better than Zadeh's Z-numbers, since both components of Z-numbers are characterized by the membership and non-membership functions, exhibiting the degree of hesitancy of decision-makers. This paper presents the application of such numbers in fuzzy multi-criteria decision-making problems. A decision-making model is proposed using the trapezoidal intuitionistic fuzzy power ordered weighted average as the aggregation function and the ranking function to rank the alternatives. The proposed model is then implemented in a supplier selection problem. The obtained ranking is compared to the existing models based on Z-numbers. The results show that the ranking order is slightly different from the existing models. Sensitivity analysis is performed to validate the obtained ranking. The sensitivity analysis result shows that the best supplier is obtained using the proposed model with 80% to 100% consistency despite the drastic change of criteria weights. Intuitionistic Z-numbers play a very important role in describing the uncertainty in the decision makers' opinions in solving decision-making problems.
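The ordered weighted average at the core of the aggregation function can be shown on crisp numbers. This is a deliberate simplification: the paper's operator works on trapezoidal intuitionistic fuzzy values and adds power weighting, neither of which is reproduced here. The function names and the dictionary layout of alternatives are hypothetical.

```python
from typing import Dict, List, Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def rank_alternatives(ratings: Dict[str, Sequence[float]],
                      weights: Sequence[float]) -> List[str]:
    """Order alternatives by their aggregated OWA score, best first."""
    return sorted(ratings, key=lambda name: owa(ratings[name], weights),
                  reverse=True)
```

Because the weights attach to positions in the sorted list rather than to particular criteria, the weight vector controls how optimistic or pessimistic the aggregation is: placing most weight on the first position emphasizes each alternative's best rating, and placing it on the last position emphasizes its worst.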