Abstract
To address the low parsing accuracy and poor generalization that result from the insufficient discriminative power of the log feature representations used in existing heuristic log parsing methods, this paper proposes PosParser, a heuristic online log parsing method. The method uses function token sequences (FTS), derived from the concept of trigger words, as feature representations. It comprises a two-stage detection method that addresses the over-parsing to which complex logs are prone, and a post-processing procedure that handles logs with variable-length parameters. PosParser achieves an average parsing accuracy of 0.952 on 16 real-world log datasets. The results indicate that FTS discriminates well between logs and that PosParser is effective and robust.
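As a rough illustration of using a function token sequence as a grouping feature, the sketch below keys each log message on the ordered function-like tokens it contains and then derives a template per group. This is a toy under stated assumptions and not the PosParser algorithm: the word list `FUNCTION_WORDS` is a hypothetical stand-in for the part-of-speech tagging named in the keywords, the helper names are invented for illustration, and neither the two-stage detection nor the variable-length post-processing is modeled.

```python
from collections import defaultdict

# Hypothetical stand-in for a POS tagger: a small hand-picked set of
# function-like words. The paper's actual selection rule is not given here.
FUNCTION_WORDS = frozenset({"from", "to", "for", "on", "of", "by", "in"})


def function_token_sequence(message):
    """Return the ordered tuple of function-like tokens in a log message."""
    return tuple(
        tok.lower() for tok in message.split() if tok.lower() in FUNCTION_WORDS
    )


def group_by_signature(messages):
    """Group messages that share the same function token sequence."""
    groups = defaultdict(list)
    for msg in messages:
        groups[function_token_sequence(msg)].append(msg)
    return dict(groups)


def extract_template(messages):
    """Replace positions that differ across a group with the wildcard <*>.

    Assumes every message in the group has the same token count;
    variable-length groups are deliberately out of scope for this sketch.
    """
    rows = [m.split() for m in messages]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("variable-length group; not handled in this sketch")
    return " ".join(
        col[0] if len(set(col)) == 1 else "<*>" for col in zip(*rows)
    )


logs = [
    "Accepted password for alice from 10.0.0.1",
    "Accepted password for bob from 10.0.0.2",
    "Connection closed by 10.0.0.3",
]
groups = group_by_signature(logs)
templates = {sig: extract_template(msgs) for sig, msgs in groups.items()}
```

Here the two login messages share the signature `("for", "from")` and collapse into one template, while the third message forms its own group. A real implementation would need a principled way to choose function tokens and to handle groups whose messages differ in length.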
Authors
Jiang Jinzhao (蒋金钊)
Fu Yuanyuan (傅媛媛)
Xu Jian (徐建)
School of Computer Science & Engineering, Nanjing University of Science & Technology, Nanjing 210094, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal
2024, Issue 1, pp. 217-221 (5 pages)
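Funding
Stable Support Project for National Defense Science and Technology Key Laboratories, National Defense Basic Scientific Research Program (WDZC20225250405)
National Natural Science Foundation of China (61872186)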
Keywords
log analysis
log parsing
trigger word extraction
part-of-speech tagging
system maintenance