Fraud is a major challenge facing the telecommunications industry. Fraudsters, who have developed a variety of techniques and strategies to defraud service providers, cause the loss of substantial revenue. For a service provider to remain in the industry, the expected loss from fraudulent activity must be minimized, if not eliminated entirely. However, the huge volume of data and the millions of subscribers involved make these fraudsters very difficult to detect. This calls for an optimal classifier and a predictive probability model that capture both the present and past history of subscribers and classify them accordingly. In this paper, we develop several predictive models and an optimal classifier. We simulated a sample of eighty (80) subscribers, recording their number of calls and call durations, and divided it into four sub-samples of twenty (20) subscribers each. We obtained the prior and posterior probabilities of the groups and arranged these posterior probability distributions into two-sample multivariate data with two variates each. We then developed a linear classifier that discriminates between genuine and fraudulent subscribers. The optimal classifier (βA+B) has a posterior probability of 0.7368, and subscribers are classified based on this optimal point. The paper focuses on domestic subscribers, and the parameters of interest are the number of calls per hour and the duration of the calls.
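The abstract names two ingredients, posterior group probabilities and a linear classifier over two variates, but does not say how the optimal classifier (βA+B) is built. The block below is a minimal sketch of those two generic ingredients, not the authors' method. It has two parts. `posterior` applies Bayes' rule. `fisher_direction` fits Fisher's linear discriminant, which is swapped in here as a standard stand-in, with a midpoint threshold. All function names and the toy data in the usage note are hypothetical.

```python
from statistics import mean


def posterior(priors, likelihoods):
    """Bayes' rule: P(g | x) = P(g) P(x | g) / sum_h P(h) P(x | h)."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def fisher_direction(fraud, genuine):
    """Fisher's linear discriminant for two groups of 2-D points.

    Returns (w, c): w = S_w^{-1} (m_fraud - m_genuine), and c is the
    projection of the midpoint between the two group means (hypothetical
    threshold choice; the paper's optimal point is not reproduced here)."""
    m_f = [mean(col) for col in zip(*fraud)]
    m_g = [mean(col) for col in zip(*genuine)]
    # Pooled within-group scatter matrix S_w (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((fraud, m_f), (genuine, m_g)):
        for x in group:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (m_f[0] - m_g[0], m_f[1] - m_g[1])
    # Closed-form 2 x 2 inverse of S_w applied to the mean difference.
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det)
    c = 0.5 * (w[0] * (m_f[0] + m_g[0]) + w[1] * (m_f[1] + m_g[1]))
    return w, c


def classify(x, w, c):
    """Label a (calls_per_hour, call_duration) pair by its side of the boundary."""
    return "fraud" if w[0] * x[0] + w[1] * x[1] > c else "genuine"
```

Usage with made-up points follows. Fit the discriminant with `w, c = fisher_direction([(10, 20), (12, 25), (9, 22), (11, 18)], [(2, 3), (3, 4), (2, 5), (4, 3)])`, then call `classify((11, 21), w, c)`. The projection-and-threshold form mirrors the abstract's idea of classifying subscribers against a single cut-off point.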
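Faults are hard to identify when the feature attributes of multiple fault types overlap. To address this, a new metric function is proposed that considers the probability that adjacent points are nearest neighbors of each other. The proposed Nearby Probability Distance (NPD) is applied to the Locality Preserving Projection (LPP) algorithm and to the K-Nearest Neighbor (KNN) classifier. This yields the Nearby Probability Distance Locality Preserving Projection (NPDLPP) algorithm and the Nearby Probability Distance K-Nearest Neighbor (NPDKNN) classifier. The method proceeds in three steps. First, time-domain and frequency-domain feature extraction converts the vibration signals into a high-dimensional feature dataset. Next, NPDLPP reduces this dataset to a low-dimensional space. Finally, the resulting low-dimensional sensitive feature set is fed into NPDKNN for pattern recognition. Validation on a set of vibration signals from a double-span rotor system shows that the proposed dimensionality-reduction algorithm is effective and achieves better separation of the fault types. The study shows that, compared with the conventional Euclidean distance measure, the proposed nearby probability distance better minimizes within-class scatter and maximizes between-class separation.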
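The abstract does not give the NPD formula, so the sketch below only shows where such a metric would plug into the two algorithms it modifies. `heat_kernel_weights` builds the standard LPP affinity matrix, with W_ij = exp(-d(x_i, x_j)^2 / t) for k-nearest-neighbor pairs. `knn_predict` is an ordinary majority-vote KNN classifier. Both take a `distance` callable that defaults to Euclidean distance (`math.dist`, Python 3.8+). Passing an NPD implementation in its place is the assumed substitution. The function names and parameters `k` and `t` are hypothetical, and the LPP eigen-decomposition step is omitted.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3, distance=math.dist):
    """Majority vote among the k training points closest to `query`.

    `distance` is any callable d(a, b); a nearby-probability distance
    would be substituted here in place of the Euclidean default."""
    order = sorted(range(len(train_x)), key=lambda i: distance(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def heat_kernel_weights(points, k=2, t=1.0, distance=math.dist):
    """LPP-style affinity matrix: W[i][j] = exp(-d(i, j)^2 / t) when j is
    among the k nearest neighbours of i (or i among those of j), else 0."""
    n = len(points)
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: distance(points[i], points[j]))
        neighbours.append(set(others[:k]))
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in neighbours[i] or i in neighbours[j]):
                w[i][j] = math.exp(-distance(points[i], points[j]) ** 2 / t)
    return w
```

Keeping the metric as a parameter lets the Euclidean baseline and a probability-based distance be compared on the same feature set, which is the comparison the abstract reports.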
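In text classification, choosing an efficient classification algorithm is key to improving accuracy and shortening classification time. This paper proposes a class-specific multinomial naive Bayes classification algorithm based on the exponential family (exponential family-multinomial naive Bayes, EF-MNB). The algorithm works as follows:

1. The distributions of the N classes are constructed from the multinomial model.
2. A class-specific feature selection algorithm yields the feature subset of the Nth class and the corresponding class feature probability density function (PDF).
3. The original PDF estimate expressions for the N classes are constructed from the exponential family.
4. Given training sets for the N classes, the optimal PDF estimate for the Nth class is obtained.
5. A classification rule is formulated from Bayes' theorem.

EF-MNB was compared with three baselines: a hierarchical classification algorithm based on latent Dirichlet allocation and a support vector machine (LDA-SVM), an improved hyper-sphere support vector machine (IHS-SVM) text classification algorithm, and a hybrid classification algorithm based on principal component analysis and k-nearest neighbor (PCA-KNN). Simulation results show that EF-MNB achieves higher classification accuracy in less time than all three.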
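The EF-MNB steps build on the ordinary multinomial naive Bayes model and a Bayes'-theorem decision rule. The sketch below shows only that baseline, not the class-specific, exponential-family estimator the paper proposes. It uses add-alpha (Laplace) smoothing and scores each class in log space. The function names, the `alpha` parameter and the choice to skip unseen tokens are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit a baseline multinomial naive Bayes model with add-alpha smoothing.

    docs: list of token lists; labels: one class label per document."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    log_prior = {y: math.log(n / len(docs)) for y, n in class_docs.items()}
    log_lik = {}
    for y in class_docs:
        total = sum(counts[y].values()) + alpha * len(vocab)
        log_lik[y] = {w: math.log((counts[y][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def predict_mnb(model, doc):
    """Bayes' rule in log space: argmax_y log P(y) + sum_w log P(w | y).

    Tokens never seen in training are ignored."""
    log_prior, log_lik = model
    scores = {y: log_prior[y] + sum(log_lik[y][w] for w in doc if w in log_lik[y])
              for y in log_prior}
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many per-token probabilities are multiplied. In this baseline every class shares one vocabulary. A class-specific method like EF-MNB instead selects a separate feature subset per class.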