Abstract
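Because labeled datasets of real network traffic are lacking, this paper proposes an unsupervised anomalous traffic detection method. Based on an analysis of traffic behavior at the network egress of Sichuan University, a user behavior feature set is constructed. An improved k-means++ cosine clustering method is then used to build a model of normal traffic behavior, and anomalous traffic is identified by measuring how far a traffic behavior deviates from that normal behavior model. Feature extraction, the improved k-means algorithm, and anomaly detection were implemented on the Spark big data processing platform, and experiments verify the feasibility and effectiveness of the method. The experimental results show that the proposed method detects anomalous traffic behavior with relatively high accuracy and sensitivity.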
Real network environments lack labeled data sets, so traditional anomalous traffic detection methods that depend on labeled data are unsuitable for actual large-scale networks. To address this, the paper proposes an improved k-means anomalous traffic detection method for unlabeled data sets. First, traffic at the Sichuan University network outlet is collected and stored in a distributed file system. Second, a user behavior feature set is constructed on the basis of network flow analysis, and the relevant features are extracted with the Spark big data processing platform. Drawing on group principles to decide which clusters in the actual traffic represent normal behavior, a normal traffic behavior model is built with an improved k-means++ cosine clustering method. Finally, the cosine distance between the normal behavior model and a user's actual traffic behavior is calculated to detect anomalous traffic behavior. The feasibility and validity of the method are verified by attack experiments, and the experimental results show that the normal traffic behavior model detects anomalous traffic with high accuracy.
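The pipeline summarized above (k-means++ seeding, clustering under cosine distance, selecting "normal" clusters, then flagging traffic whose cosine distance to the normal model is too large) can be illustrated with a minimal single-machine sketch in plain Python rather than Spark. It assumes feature vectors have already been extracted. The specific choices here are illustrative assumptions, not the paper's exact method: squared cosine distance as the k-means++ seeding weight, the arithmetic mean as the centroid update, a hypothetical `min_share` cluster-size rule standing in for the "group principle", and the largest member-to-centroid distance as the anomaly threshold. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def kmeans_pp_init(points: List[Vector], k: int, rng: random.Random) -> List[List[float]]:
    """k-means++ seeding, using squared cosine distance as the sampling weight (assumption)."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(cosine_distance(p, c) for c in centers) ** 2 for p in points]
        if sum(d2) == 0:  # all points coincide with existing centers
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def assign(points: List[Vector], centers: List[List[float]]) -> List[int]:
    """Label each point with the index of its nearest center under cosine distance."""
    return [min(range(len(centers)), key=lambda j: cosine_distance(p, centers[j])) for p in points]


def kmeans_cosine(points: List[Vector], k: int, iters: int = 20, seed: int = 0):
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = assign(points, centers)
    for _ in range(iters):
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # mean of members; an empty cluster keeps its old center
                dim = len(members[0])
                new_centers.append([sum(m[i] for m in members) / len(members) for i in range(dim)])
            else:
                new_centers.append(centers[j])
        centers = new_centers
        new_labels = assign(points, centers)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return centers, labels


def build_normal_model(points: List[Vector], k: int, min_share: float = 0.1,
                       seed: int = 0) -> Tuple[List[List[float]], float]:
    """Keep clusters holding at least min_share of the data as 'normal' (hypothetical rule)."""
    centers, labels = kmeans_cosine(points, k, seed=seed)
    normal = [j for j in range(k) if labels.count(j) >= min_share * len(points)]
    # Threshold: largest distance of any normal-cluster member to its own center.
    radius = max(cosine_distance(p, centers[lab]) for p, lab in zip(points, labels) if lab in normal)
    return [centers[j] for j in normal], radius


def is_anomalous(x: Vector, normal_centers: List[List[float]], radius: float) -> bool:
    """Flag x if its deviation from every normal center exceeds the threshold."""
    return min(cosine_distance(x, c) for c in normal_centers) > radius
```

For example, with training vectors clustered around two directions, `build_normal_model(pts, k=2)` returns two normal centers, and a vector pointing in an unseen direction such as `[0.0, 0.0, 1.0]` is flagged by `is_anomalous`. Cosine distance compares the direction of a feature vector rather than its magnitude, so a user with more traffic but the same mix of behaviors stays close to a normal center.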
Source
Netinfo Security (《信息网络安全》), 2016, No. 11, pp. 45-51 (7 pages)
Funding
National Natural Science Foundation of China [61272447]