Abstract
Event log sampling can improve the efficiency of model discovery. However, existing sampling methods remain inefficient on large-scale event logs and cannot guarantee the quality of the discovered model. To address this, an event log sampling approach oriented toward log completeness is proposed, comprising four methods: brute-force (full traversal) sampling, set coverage sampling, trace length-based sampling, and trace frequency-based sampling. The methods are implemented in the open-source process mining toolkit ProM. Experiments on 9 public event log datasets, covering both time performance analysis and model quality evaluation, show that the proposed methods greatly improve log sampling efficiency while preserving the quality of the mined models.
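The abstract names the sampling methods but does not define them, so the following is only a minimal sketch of what a set coverage style sampler could look like. It assumes that "log completeness" means the sample preserves every directly-follows relation of the original log, and that a log is a list of traces, each a list of activity names. The function names `directly_follows` and `set_cover_sample` are hypothetical and are not taken from the paper or from ProM.

```python
from collections import Counter


def directly_follows(trace):
    """Return the set of directly-follows pairs (a, b) occurring in one trace."""
    return set(zip(trace, trace[1:]))


def set_cover_sample(log):
    """Greedily select trace variants until every directly-follows pair
    of the full log is covered (assumed completeness criterion)."""
    # Count how often each distinct trace variant occurs.
    variants = Counter(tuple(trace) for trace in log)
    relations = {v: directly_follows(v) for v in variants}

    # All directly-follows pairs that the sample must preserve.
    uncovered = set()
    for pairs in relations.values():
        uncovered |= pairs

    sample = []
    while uncovered:
        # Pick the variant covering the most uncovered pairs;
        # break ties in favour of the more frequent variant.
        best = max(
            variants,
            key=lambda v: (len(relations[v] & uncovered), variants[v]),
        )
        sample.append(list(best))
        uncovered -= relations[best]
    return sample


if __name__ == "__main__":
    log = [["a", "b", "c"], ["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
    print(set_cover_sample(log))
```

Greedy selection is a standard approximation for set cover and is used here only for illustration; the paper's own procedure, and its length-based and frequency-based variants, may order or select traces differently.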
Authors
SU Xuan; LIU Cong; ZHANG Shuaipeng; ZENG Qingtian; LI Caihong (School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China; College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China)
Source
Computer Integrated Manufacturing Systems (《计算机集成制造系统》)
EI
CSCD
Peking University Core Journal
2022, No. 10, pp. 3156-3165 (10 pages)
Funding
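National Natural Science Foundation of China (61902222)
Taishan Scholars Program of Shandong Province (tsqn201909109)
Excellent Youth Fund of the Natural Science Foundation of Shandong Province (ZR2021YQ45)
Youth Innovation Science and Technology Program (Innovation Team) of Higher Education Institutions in Shandong Province (2021KJ031)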