Abstract
Clustering the IP addresses in the backbone traffic of a campus network gives an overview of how the addresses accessed by network users are distributed. It can therefore reveal useful knowledge for profiling traffic flows and user behavior. However, popular clustering algorithms are not directly applicable to IP addresses. Most of them treat an IP address as an ordinary number and ignore its characteristic attributes, so the clusters they generate are inconsistent with the partitioning of the IP address space and are difficult to interpret. To overcome this shortcoming, an improved algorithm for IP address clustering is proposed. First, initial clusters are obtained using longest prefix matching and an improved nearest neighbor rule algorithm. Then the idea of stepwise-optimal hierarchical clustering is applied to merge the closest initial clusters, where the similarity between two clusters is determined by the longest prefix shared by the IP addresses they contain. Finally, the algorithm automatically yields meaningful clusters that accord with the characteristics of the IP addresses in the traffic flows. Experimental results show that, compared with previous algorithms, the proposed algorithm improves clustering quality, is accurate and feasible for clustering IP addresses, and is robust to the input order of the data.
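The two-stage procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It handles IPv4 only and measures similarity as the number of leading bits two addresses share. It uses a plain nearest neighbor rule rather than the paper's improved variant, whose details are not given here. The thresholds passed as `min_prefix` and the stopping rule of the merge step are hypothetical parameters, and all function names are likewise hypothetical.

```python
import ipaddress
from itertools import combinations


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))


def common_prefix_len(a: int, b: int) -> int:
    """Number of leading bits two 32-bit addresses share (0..32)."""
    return 32 - (a ^ b).bit_length()


def cluster_prefix_len(cluster: list[int]) -> int:
    """Longest prefix shared by every address in the cluster.

    For integers, the prefix shared by all members equals the prefix
    shared by the smallest and the largest member.
    """
    return common_prefix_len(min(cluster), max(cluster))


def to_cidr(cluster: list[int]) -> str:
    """Describe a cluster by its shared prefix in CIDR notation."""
    net = ipaddress.IPv4Network((min(cluster), cluster_prefix_len(cluster)), strict=False)
    return str(net)


def initial_clusters(addrs: list[int], min_prefix: int) -> list[list[int]]:
    """Stage 1 (assumed plain nearest neighbor rule): each address joins the
    cluster containing its nearest neighbor, i.e. the member with the longest
    shared prefix, or opens a new cluster if that prefix is shorter than the
    hypothetical threshold min_prefix."""
    clusters: list[list[int]] = []
    for a in addrs:
        best, best_len = None, -1
        for c in clusters:
            l = max(common_prefix_len(a, m) for m in c)
            if l > best_len:
                best, best_len = c, l
        if best is not None and best_len >= min_prefix:
            best.append(a)
        else:
            clusters.append([a])
    return clusters


def merge_stepwise(clusters: list[list[int]], min_prefix: int) -> list[list[int]]:
    """Stage 2: stepwise-optimal merging. At each step, merge the pair of
    clusters whose union keeps the longest shared prefix; stop when no pair
    keeps at least min_prefix bits (hypothetical stopping rule)."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # Evaluate every pair; keep the merge that loses the fewest prefix bits.
        best_pair, best_len = None, -1
        for i, j in combinations(range(len(clusters)), 2):
            l = cluster_prefix_len(clusters[i] + clusters[j])
            if l > best_len:
                best_pair, best_len = (i, j), l
        if best_len < min_prefix:
            break
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid
    return clusters


if __name__ == "__main__":
    ips = ["10.1.1.5", "10.1.1.9", "10.1.2.7",
           "192.168.0.1", "192.168.0.20", "10.1.1.200"]
    addrs = [ip_to_int(s) for s in ips]
    init = initial_clusters(addrs, min_prefix=24)
    print([to_cidr(c) for c in init])
    # ['10.1.1.0/24', '10.1.2.7/32', '192.168.0.0/27']
    final = merge_stepwise(init, min_prefix=16)
    print([to_cidr(c) for c in final])
    # ['10.1.0.0/22', '192.168.0.0/27']
```

With these sample inputs, the nearest neighbor pass yields 10.1.1.0/24, 10.1.2.7/32 and 192.168.0.0/27. The merge step then combines the first two into 10.1.0.0/22, because their union still shares 22 leading bits, while 192.168.0.0/27 stays separate. Describing each cluster by its shared prefix keeps the result aligned with how address blocks are partitioned, which is the property that purely numeric clustering tends to lose.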
Source
Journal of Computer Applications (《计算机应用》)
CSCD (Chinese Science Citation Database)
Peking University Core Journal (北大核心)
2007, No. 8, pp. 1862-1864, 1867 (4 pages)
Funding
Supported by the University Science Research Foundation of the Jiangsu Provincial Department of Education (No. 03KJD520073)
Keywords
IP address clustering
nearest neighbor rule
longest prefix match (LPM)
stepwise-optimal hierarchical clustering