Abstract
User-supplied runtime estimates are usually inaccurate. This paper combines classification with instance-based learning, using both template similarity and numerical similarity to find jobs in the scheduling history that resemble the current job, and uses their historical information to predict the current job's runtime. Training and prediction use seven attributes from the scheduling history: user name, group name, queue name, application name, requested number of processors, requested (estimated) runtime, and requested memory. The parameters of the algorithm are determined by a genetic algorithm. Numerical experiments show that, compared with existing work, the proposed method achieves a similar underestimate rate with fewer parameters and a lower mean absolute error. On the HPC2N04 and HPC2N05 log datasets, the mean absolute error is reduced by 43% and 77%, respectively. The paper also studies the effect on job scheduling of replacing user estimates with online predictions, gives a preliminary analysis of the results, and points out directions for future improvement.
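The abstract does not give the paper's templates, distance function, or aggregation rule, so the following is only a minimal sketch of the general idea under stated assumptions: a template of categorical attributes (here, hypothetically, user and application) selects candidate jobs, a weighted relative distance over the three numeric attributes ranks them, and the mean runtime of the k nearest candidates is the prediction. The names `Job`, `TEMPLATE`, `predict_runtime`, the fallback to the user estimate, and the hand-picked `k` and `weights` are all illustrative; in the paper such parameters are tuned with a genetic algorithm. The two metrics named in the abstract are included as plain definitions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Job:
    user: str
    group: str
    queue: str
    app: str
    req_procs: int
    req_runtime: float    # user-requested (estimated) runtime, seconds
    req_memory: float
    runtime: float = 0.0  # actual runtime, known only for finished jobs


# Assumption: categorical attributes that must match exactly (the "template").
TEMPLATE = ("user", "app")
# Numeric attributes compared by distance.
NUMERIC = ("req_procs", "req_runtime", "req_memory")


def template_matches(history, job, template=TEMPLATE):
    """Return historical jobs that agree with `job` on every template attribute."""
    return [h for h in history
            if all(getattr(h, a) == getattr(job, a) for a in template)]


def numeric_distance(a, b, weights):
    """Weighted sum of relative differences, so units do not dominate."""
    total = 0.0
    for attr, w in zip(NUMERIC, weights):
        x, y = getattr(a, attr), getattr(b, attr)
        scale = max(abs(x), abs(y), 1e-9)
        total += w * abs(x - y) / scale
    return total


def predict_runtime(history, job, k=3, weights=(1.0, 1.0, 1.0)):
    """Mean runtime of the k most similar template-matching jobs.

    Falls back to the user's own estimate when no similar job exists
    (an assumption of this sketch, not a claim about the paper).
    """
    candidates = template_matches(history, job)
    if not candidates:
        return job.req_runtime
    nearest = sorted(candidates,
                     key=lambda h: numeric_distance(h, job, weights))[:k]
    return mean(h.runtime for h in nearest)


def mean_absolute_error(predicted, actual):
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def underestimate_rate(predicted, actual):
    """Fraction of jobs whose predicted runtime is below the actual runtime."""
    return sum(p < a for p, a in zip(predicted, actual)) / len(actual)


# Made-up history purely for illustration.
history = [
    Job("alice", "g1", "batch", "lammps", 16, 3600, 8, runtime=1200),
    Job("alice", "g1", "batch", "lammps", 16, 3600, 8, runtime=1400),
    Job("alice", "g1", "batch", "lammps", 64, 7200, 32, runtime=5000),
    Job("bob", "g2", "batch", "vasp", 32, 3600, 16, runtime=3000),
]
new_job = Job("alice", "g1", "batch", "lammps", 16, 3600, 8)
prediction = predict_runtime(history, new_job, k=2)
```

Underestimates matter for scheduling because a job that outlives its predicted runtime may be killed or may disrupt backfilling, which is why the abstract reports the underestimate rate alongside the mean absolute error.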
Authors
Xu Lunfan
Xiong Min
Xiao Yonghao
Institute of Computer Application, China Academy of Engineering Physics, Mianyang, Sichuan 621900, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal
2020, Issue 3, pp. 763-767 (5 pages)
Funding
Supported by the National Key Research and Development Program of China (2016YFB0201504).
Keywords
application runtime prediction
job scheduling
genetic algorithm
K-nearest neighbor