Abstract
To detect gambling websites accurately and in batches, this work departs from traditional gambling website detection techniques and, based on the characteristics of current gambling websites, takes domain names as the object of study. Links on known gambling websites are crawled, and the newly discovered domain names are extracted as the data to be classified. The new domain names are clustered with a community discovery algorithm and then ranked by the PR values obtained from the PageRank algorithm, which allows gambling websites and whitelist websites to be classified in batches. In the end, more than 60% of gambling websites can be obtained in a single batch.
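The pipeline described in the abstract (link graph of crawled domains, community discovery, then PR-value ranking) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the abstract does not name the community discovery algorithm, so a simple label propagation scheme is swapped in here, and the function names (`pagerank`, `label_propagation`, `rank_communities`), the damping factor, and the `.example` domains are all hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) domain links."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    total = len(nodes)
    rank = {n: 1.0 / total for n in nodes}
    for _ in range(iterations):
        # Rank held by domains without out-links is spread evenly over all domains.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1.0 - damping) / total + damping * dangling / total
                    for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def label_propagation(edges, max_rounds=20):
    """Assumed stand-in for community discovery: each domain repeatedly adopts
    the most common label among its neighbours (links treated as undirected)."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    labels = {n: n for n in neighbours}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            counts = defaultdict(int)
            for nb in neighbours[node]:
                counts[labels[nb]] += 1
            # Most frequent label wins; ties go to the alphabetically smallest.
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def rank_communities(edges):
    """Group domains by community, each group sorted by descending PR value."""
    ranks = pagerank(edges)
    labels = label_propagation(edges)
    groups = defaultdict(list)
    for domain, label in labels.items():
        groups[label].append(domain)
    return {label: sorted(members, key=lambda d: (-ranks[d], d))
            for label, members in groups.items()}


# Hypothetical crawl result: links between newly discovered domains.
edges = [
    ("bet1.example", "bet2.example"),
    ("bet2.example", "bet3.example"),
    ("bet3.example", "bet1.example"),
    ("news1.example", "news3.example"),
    ("news2.example", "news3.example"),
    ("news3.example", "news1.example"),
]
for label, members in rank_communities(edges).items():
    print(label, members)
```

In this sketch each community is a candidate batch; deciding whether a batch is a gambling cluster or a whitelist cluster (for example by checking its top-ranked domains against known labels) is left out, since the abstract does not describe that step in detail.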
Authors
XUE Wan-yue
HONG Lei
CHEN Wei-jie
CHENG Xin
(Department of Computer Information and Network Security, Jiangsu Police Academy, Nanjing 210031; Network Security Detachment, Nanjing Public Security Bureau, Nanjing 210031)
Source
Modern Computer (《现代计算机》)
2020, No. 2, pp. 3-7 (5 pages)
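Funding
Jiangsu Province Modern Educational Technology Research Key Project (No. 2017-R-59195)
Jiangsu Police Academy Education and Teaching Reform Research Project (No. 2019A30)
Undergraduate Practice, Innovation and Entrepreneurship Training Program Project: Research on Automatic Identification of Gambling Websites Based on Deep Learning (No. 201910329040Y)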