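A Semi-Supervised Self-Training Method Based on Density Peak Clustering and Relative Distance
1
Authors: 孙洁, 景志敏, 周欢. 《统计与决策》, CSSCI, PKU Core (北大核心), 2024, No. 17, pp. 53-58 (6 pages)
Semi-supervised self-training is a type of semi-supervised self-labeling method that trains a classifier using both labeled and unlabeled samples. Mislabeling, however, is a problem that self-training cannot ignore. To address it, this paper proposes a semi-supervised self-training method based on density peak clustering and relative distance (STDPRD). In each self-training iteration, STDPRD first uses density peak clustering to select high-confidence unlabeled samples and labels them; second, it uses relative distance to filter out samples that were mislabeled during the iteration; third, it adds the correctly labeled samples to the labeled set; finally, it trains the given classifier on the expanded labeled set and, once training is complete, outputs the trained classifier. Simulation experiments show that STDPRD outperforms four popular semi-supervised self-training methods on real-world datasets.
Keywords: semi-supervised learning; semi-supervised classification; relative distance; mislabeling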
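The abstract does not say how STDPRD turns density peak clustering into a confidence score, so the sketch below covers only the two standard quantities that density peak clustering computes for each point: a local density `rho` and the distance `delta` to the nearest denser point. The function name `density_peaks`, the cutoff `d_c`, and the toy points are illustrative assumptions, not taken from the paper.

```python
from math import dist

def density_peaks(points, d_c):
    """Return (rho, delta) per point; a generic sketch, not the STDPRD procedure."""
    n = len(points)
    # rho[i]: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # distances from point i to every strictly denser point
        higher = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            # no denser point exists (ties included): use the largest distance
            delta.append(max(dist(points[i], points[j]) for j in range(n) if j != i))
    return rho, delta

# Hypothetical toy data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
rho, delta = density_peaks(points, d_c=0.5)
```

Points with both high `rho` and high `delta` are the usual candidates for cluster centers; how unlabeled samples are then ranked and labeled is specific to the paper and is not reproduced here.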
Detecting mislabeling and identifying unique progeny in Acacia mapping population using SNP markers (Cited by: 1)
2
Authors: Asif Javed Muhammad, Mohd Zaki Abdullah, +1 more author, Norwati Muhammad, Wickneswari Ratnam. 《Journal of Forestry Research》, SCIE, CAS, CSCD, 2017, No. 6, pp. 1118-1126 (9 pages)
Acacia hybrids offer great potential for the paper industry in Southeast Asia due to their fast growth and ability to grow on abandoned or marginal lands. Breeding Acacia hybrids with desirable traits can be achieved through marker-assisted selection (MAS) breeding. Developing a MAS program requires linkage maps and QTL analysis. Two mapping populations were developed through interspecific hybridization for linkage mapping and QTL analysis. All seeds per pod were cultured initially to improve hybrid yield, as the quality and density of linkage mapping are affected by the size of the mapping population. Progenies from the two mapping populations were field planted for phenotypic and genotypic evaluation at three locations in Malaysia: (1) the Forest Research Institute Malaysia field station at Segamat, Johor; (2) the Borneo Tree Seeds and Seedlings Supplies Sdn. Bhd. (BTS) field trial site at Bintulu, Sarawak; and (3) the Asiaprima RCF field trial site at Lancang, Pahang. During field planting, mislabeling was reported at Segamat, Johor, and a similar problem was suspected for Bintulu, Sarawak. Early screening with two isozymes effectively selected hybrid progenies, and these hybrids were subsequently confirmed using species-specific SNPs. During field planting, clonal mislabeling was reported and later confirmed using a small set of STMS markers. A large set of SNPs was also used to screen all ramets in both populations. In total, 65.36% of ramets were mislabeled in the wood density population and 60.34% in the fibre length mapping population.
No interpopulation pollen contamination was detected because all ramets found their match within the same population in question. However, mislabeling was detected among ramets of the same population. Mislabeled individuals were identified and grouped, as they originated from 93 pods for the wood density and 53 pods for the fibre length mapping populations. On average, 2 meiotically unique seeds per pod (179 seeds/93 pods) for wood density and 3 meiotically unique seeds per p...
Keywords: tree breeding; SNP markers; mislabeling; linkage mapping; quantitative trait loci (QTL) mapping
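An Empirical Study of Misconduct in Funding Acknowledgments in Social Science Journal Papers: Based on 343 Papers from the Social Science Journals of Eight "211" Normal Universities (Cited by: 3)
3
Author: 徐红萍. 《苏州教育学院学报》, 2020, No. 3, pp. 56-63 (8 pages)
This paper samples 343 papers from the social science editions of the journals of eight "211" normal universities to examine the current state of misconduct in funding-project acknowledgments in social science journal papers. The problems found include a single paper listing multiple projects, submission dates that precede the project approval date or follow the project completion date, incomplete project names and numbers, and project numbers that are wrong or do not exist. After analyzing the causes of such misconduct, the paper proposes countermeasures in five areas.
Keywords: social science journals; funded projects; acknowledgment misconduct; university journals; countermeasures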
DNA Barcoding and Mini-DNA Barcoding Reveal Mislabeling of Salmonids in Different Distribution Channels in the Qingdao Area (Cited by: 3)
4
Authors: HAN Cui, DONG Shuanglin, +2 more authors, LI Li, GAO Qinfeng, ZHOU Yangen. 《Journal of Ocean University of China》, SCIE, CAS, CSCD, 2021, No. 6, pp. 1537-1544 (8 pages)
There is an increasing demand for salmonid authentication due to the globalization of the salmonid trade. DNA barcoding and mini-DNA barcoding are widely used for identifying fish species based on a fragment of the mitochondrial cytochrome c oxidase subunit I (COI) sequence. In this study, rainbow trout (Oncorhynchus mykiss), steelhead trout (O. mykiss), and Atlantic salmon (Salmo salar) collected from two salmonid aquaculture bases in China were authenticated by DNA barcoding (about 650 bp) and mini-DNA barcoding (127 bp) to evaluate the accuracy of the two methods in identifying different salmonid species. The results revealed that both methods could distinguish O. mykiss from S. salar with 100% accuracy. However, the two methods failed to separate rainbow trout (O. mykiss) and steelhead trout (O. mykiss), which are the same species but cultured in different water environments. Moreover, salmonid samples from three main distribution channels in the Qingdao area (traditional supermarkets, online supermarkets, and sushi bars) were identified by the two methods. Substitution of S. salar with O. mykiss was discovered, and the 27.78% overall substitution rate of salmonids in the Qingdao area was higher than those reported for other regions in previous studies. In addition, the mislabeling rates of salmonids from traditional supermarkets, online supermarkets, and sushi bars were compared. The mislabeling rate was significantly greater in sushi bars (50%) than in the other two channels (16.67%), suggesting that stronger monitoring and enforcement measures are necessary for the aquatic food catering industry.
Keywords: salmonid; DNA barcoding; mini-DNA barcoding; species authentication; mislabeling rate
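The three rates quoted in the abstract can be checked against each other with a little arithmetic. The sketch below assumes that the rounded percentages correspond to the exact fractions 5/18, 1/2 and 1/6, and that 16.67% is the rate across the two non-sushi channels taken together; neither assumption is stated in the abstract, and no sample counts are given there.

```python
from fractions import Fraction

sushi_rate = Fraction(1, 2)     # 50% reported for sushi bars
other_rate = Fraction(1, 6)     # 16.67% reported for the other two channels (assumed exact)
overall_rate = Fraction(5, 18)  # 27.78% reported overall (assumed exact)

# overall = w * sushi + (1 - w) * other  =>  w = (overall - other) / (sushi - other)
sushi_share = (overall_rate - other_rate) / (sushi_rate - other_rate)
print(sushi_share)  # share of all samples that would have come from sushi bars
```

Under these assumptions the three figures are mutually consistent if about one third of the samples were bought in sushi bars.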
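A Knowledge-Graph-Based Denoising Method for Distantly Supervised Relation Extraction (Cited by: 2)
5
Authors: 赵晋斌, 王琦, +1 more author, 马黎雨, 李学思. 《火力与指挥控制》, CSCD, PKU Core (北大核心), 2023, No. 10, pp. 160-169 (10 pages)
Research on relation extraction usually requires large amounts of manually annotated training data. Distant supervision reduces the cost and burden of manual annotation by constructing training data automatically, but automatically constructed datasets suffer from severe mislabeling. To address this problem, a knowledge-graph-based denoising method for distantly supervised relation extraction is proposed. A generative adversarial network is used to clean the dataset; entity information from the knowledge graph is incorporated to build a heterogeneous information graph; finally, a graph attention network encodes the heterogeneous information graph to perform relation extraction. On the public NYT10 dataset, the method improves precision, recall, and F1 over the best mainstream models, demonstrating the importance of knowledge graph information for distantly supervised relation extraction.
Keywords: relation extraction; distant supervision; knowledge graph; mislabeling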
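To see why automatically built training sets are noisy, it helps to look at the usual distant-supervision heuristic: any sentence that mentions both entities of a knowledge-base triple is labeled with that triple's relation. The sketch below illustrates only this labeling step, not the paper's GAN or graph attention components; the toy knowledge base, the sentences, and the function name `distant_label` are assumptions made for illustration.

```python
# Hypothetical toy knowledge base: (head, tail) -> relation
kb = {("Barack Obama", "Hawaii"): "born_in"}

sentences = [
    "Barack Obama was born in Hawaii .",
    "Barack Obama visited Hawaii last week .",
]

def distant_label(sentence, kb):
    """Label a sentence with every KB relation whose entity pair it mentions."""
    return [(head, rel, tail) for (head, tail), rel in kb.items()
            if head in sentence and tail in sentence]

labels = [distant_label(s, kb) for s in sentences]
```

Both sentences receive the `born_in` label, although only the first one expresses that relation; the second is exactly the kind of mislabeled instance that a denoising step has to remove.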
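A Method for Identifying Mislabeled Samples in Small-Sample Data
6
Authors: 秦瑞斌, 郑浩然, 周宏. 《北京生物医学工程》, 2012, No. 6, pp. 574-578 (5 pages)
Objective: To address mislabeling in small-sample data, this paper proposes a weighted mislabeled-sample identification algorithm (UCL-stability) based on the CL-stability algorithm. Methods: In UCL-stability, a voting weight is defined from the number of differential features that can be selected from the data after a sample's label is flipped; the weight measures how flipping the labels of different samples affects classification. Results: Experiments on two cancer gene expression datasets show that both UCL-stability and CL-stability effectively identify suspicious samples. Further experiments with artificially mislabeled samples show that UCL-stability achieves higher precision and recall than CL-stability, which has no voting weights. Conclusion: UCL-stability considers not only the effect of a single sample's labeling error on classifier design in small-sample data, but also the differences in how labeling errors in different samples affect classification results. By introducing feature information to measure these differences, UCL-stability achieves better results.
Keywords: mislabeling; small-sample data; microarray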
Molecular Identification of Dried Shellfish Products Sold on the Market Using DNA Barcoding
7
Authors: SUN Shao’e, ZHANG Xiaojie, +1 more author, KONG Lingfeng, LI Qi. 《Journal of Ocean University of China》, SCIE, CAS, CSCD, 2021, No. 4, pp. 931-938 (8 pages)
Dried shellfish products, rich in nutrients and low in calories, are popular foods in China, especially in coastal areas. However, the species of dried shellfish products on the market are usually unknown, as the taxonomic features are removed during production. This study describes the application of DNA barcoding to the identification of 100 dried shellfish (scallop, squid, octopus and cuttlefish) products sold in markets. Samples were authenticated by comparing mitochondrial cytochrome oxidase subunit I (COI) gene and 16S ribosomal RNA (16S rRNA) gene sequences with public reference taxonomic databases. The results showed that all 100 products could be identified at species level. Sixty-four scallop adductor products were processed from the bay scallop, Argopecten irradians, and one was from the Portuguese oyster, Crassostrea angulata. All 27 squid, 2 cuttlefish and 6 octopus products were produced from the jumbo flying squid, Dosidicus gigas. The neighbour-joining tree agrees with the results of the DNA barcoding analysis. The 64 scallop samples formed one A. irradians cluster, leaving Sca65 clustered with the reference oyster sequence C. angulata (MH997922). All 35 cephalopod (squid, octopus and cuttlefish) samples formed a D. gigas cluster. This investigation revealed a low variety of dried shellfish products sold on the market and highlighted the high rate of mislabeling and species substitution. The work provides a practical method for tracing and authenticating shellfish products.
Keywords: dried shellfish product; DNA barcoding; species identification; mislabeling; species substitution
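A Mislabeled-Data Detection Method Based on Sparse Reconstruction Weights (Cited by: 3)
8
Authors: 吴敬生, 王靖, 杜吉祥. 《计算机工程与科学》, CSCD, PKU Core (北大核心), 2017, No. 11, pp. 2115-2121 (7 pages)
The accuracy of data classification depends on the quality and quantity of labeled data; when training data are mislabeled, classification accuracy suffers greatly. To address this, a method for detecting mislabeled data based on sparse reconstruction weights is proposed. First, the k-nearest-neighbor method is used to find the neighbors of each point in a dataset containing mislabeled data. Then, the local sparse reconstruction weights of each labeled sample are computed by solving a least-squares model with an L1-norm term, and these weights are used to compute a confidence score for each labeled sample. Finally, mislabeled data are detected adaptively by locating the point of maximum curvature on the confidence curve. Experiments on real data verify the effectiveness of the proposed algorithm.
Keywords: sparse reconstruction weights; mislabeling; confidence; detection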
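The abstract names a least-squares model with an L1-norm term but does not print it. A generic form of such a local reconstruction problem is sketched below; the symbols (sample $x_i$, neighbor index set $N_k(i)$, weight vector $w_i$, regularization parameter $\lambda$) are assumptions chosen for illustration, and the paper's exact objective and constraints may differ.

```latex
\min_{w_i \in \mathbb{R}^{k}} \;
\Bigl\| x_i - \sum_{j \in N_k(i)} w_{ij}\, x_j \Bigr\|_2^2
+ \lambda \, \| w_i \|_1 , \qquad \lambda > 0
```

The L1 term drives most entries of $w_i$ to zero, so each sample is reconstructed from only a few of its $k$ neighbors. How the resulting weights are converted into a confidence score is not described in the abstract and is therefore left out.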
A review of addressing class noise problems of remote sensing classification (Cited by: 1)
9
Authors: FENG Wei, LONG Yijun, +1 more author, WANG Shuo, QUAN Yinghui. 《Journal of Systems Engineering and Electronics》, SCIE, EI, CSCD, 2023, No. 1, pp. 36-46 (11 pages)
The development of image classification is one of the most important research topics in remote sensing. The prediction accuracy depends not only on the appropriate choice of the machine learning method but also on the quality of the training datasets. However, real-world data is not perfect and often suffers from noise. This paper gives an overview of noise filtering methods. Firstly, the types of noise and the consequences of class noise on machine learning are presented. Secondly, class noise handling methods at both the data level and the algorithm level are introduced. Then ensemble-based class noise handling methods including class noise removal, correction, and noise robust ensemble learners are presented. Finally, a summary of existing data-cleaning techniques is given.
Keywords: class noise; label noise; mislabeled; classification; ensemble learning; remote sensing
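One common pattern for ensemble-based class noise removal is a majority-vote filter: several classifiers predict each training sample's label without seeing that sample, and the sample is flagged when most of them disagree with its given label. The sketch below uses leave-one-out k-NN voters with different values of k as a stand-in ensemble on a two-class toy set; the voters, the values in `ks`, and the data are illustrative assumptions, not methods singled out by the review.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def majority_vote_filter(data, ks=(1, 3, 5)):
    """Flag samples whose label is rejected by most leave-one-out k-NN voters."""
    flagged = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # hold the sample itself out
        wrong_votes = sum(knn_predict(rest, x, k) != y for k in ks)
        if wrong_votes > len(ks) / 2:           # strict majority disagrees
            flagged.append(i)
    return flagged

# Hypothetical toy data: index 4 sits inside the "a" group but is labeled "b".
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
        ((0.5, 0.5), "b"),
        ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"), ((11, 11), "b")]
flagged = majority_vote_filter(data)
```

Requiring a majority rather than a single dissenting vote keeps clean samples that happen to lie next to a noisy one, at the cost of missing some mislabeled samples; a consensus filter (all voters must disagree) is the more conservative variant.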
NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images
10
Authors: Luo-yang XUE, Qi-rong MAO, +1 more author, Xiao-hua HUANG, Jie CHEN. 《Frontiers of Information Technology & Electronic Engineering》, SCIE, EI, CSCD, 2020, No. 9, pp. 1321-1333 (13 pages)
Large-scale datasets are driving the rapid development of deep convolutional neural networks for visual sentiment analysis. However, annotating large-scale datasets is expensive and time consuming. Instead, it is easy to obtain weakly labeled web images from the Internet. However, noisy labels still lead to seriously degraded performance when images from the web are used directly for training networks. To address this drawback, we propose an end-to-end weakly supervised learning network that is robust to mislabeled web images. Specifically, the proposed attention module automatically eliminates the distraction of samples with incorrect labels by reducing their attention scores during training. In addition, the special-class activation map module is designed to stimulate the network by focusing on the significant regions of samples with correct labels in a weakly supervised learning approach. Besides feature learning, regularization is applied to the classifier to minimize the distance between samples within the same class and maximize the distance between different class centroids. Quantitative and qualitative evaluations on well-labeled and mislabeled web image datasets demonstrate that the proposed algorithm outperforms the related methods.
Keywords: visual sentiment analysis; weakly supervised learning; mislabeled samples; significant sentiment regions
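A Height of the Non-Identification System: The Architectural World of Geoffrey Bawa (Cited by: 7)
11
Authors: 庄慎, 华霞虹. 《建筑学报》, PKU Core (北大核心), 2014, No. 11, pp. 27-35 (9 pages)
The architectural achievements of the Sri Lankan architect Geoffrey Bawa are usually regarded as a model of the deep fusion of modernity and locality in South Asia. This paper attempts to move beyond the mainstream architectural value system centered on modernism and its identifiable labels, and to examine Bawa's architectural world from the perspective of a non-identification system: his distinctive world of drawings and world of everyday life; his relativist stance, strategy of appropriation, and pragmatist method; his hybrid construction system; and the spiritual space within secular life. It also argues that the architectural height Bawa reached can serve as an important reference for contemporary architects seeking to extend architecture beyond identification systems.
Keywords: Geoffrey Bawa; Sri Lanka; non-identification system; drawings; labeled as modernist; hybridity; secular life; spiritual space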
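A Group Decision Classification Method Using Evidence Chain Reasoning Under Class Mislabeling (Cited by: 4)
12
Authors: 余海燕, 沈江, 徐曼. 《系统工程与电子技术》, EI, CSCD, PKU Core (北大核心), 2015, No. 11, pp. 2546-2553 (8 pages)
To address class mislabeling in the interpretable reasoning information used for group decision classification, a group decision classification method based on evidence chain reasoning under class mislabeling is proposed. Using the consistency and convexity of the belief function, and taking the attribute association similarity between the query case and the evidence chains as the weights of the group decision information sources, the method builds a mixed-integer optimization model based on evidence chain reasoning. The model maximizes the discriminative ability of decision classification while obtaining the most interpretable set of evidence chains. It accounts for mislabeled decision classes and, based on the concept of belief ordering, uses the derived fused belief as the criterion for evaluating the interpretability of inferences about the query case. A diagnostic example with multi-source sensing data demonstrates the effectiveness and soundness of the method.
Keywords: group decision analysis; evidence chain; association similarity; mislabeling; belief ordering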
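A Self-Training Algorithm Based on Dynamic Thresholds and Difference Testing
13
Authors: 吕佳, 邱鸿波, 肖锋. 《智能系统学报》, CSCD, PKU Core (北大核心), 2024, No. 4, pp. 839-852 (14 pages)
Self-training algorithms struggle to select high-confidence samples effectively while iteratively training a classifier, and errors from mislabeled samples accumulate. To address these problems, this paper proposes a self-training algorithm based on dynamic thresholds and difference testing. The local outlier factor of each sample is introduced to remove outliers from the labeled samples and to sort the unlabeled samples into categories; the unlabeled samples are then processed in batches according to these categories so that the model can more easily select high-confidence unlabeled samples. A dynamic membership threshold function is designed from the number of newly added pseudo-labeled samples and the change in contrast membership, improving the quality of the high-confidence samples. A dense distance is defined to measure differences between samples: the sums of dense distances between each pseudo-labeled sample and the samples of the same class and of different classes are computed to find pseudo-labeled samples with high uncertainty, which are returned to the unlabeled set for the next training round, mitigating the accumulation of errors from mislabeled samples. Experimental results show that the algorithm achieves satisfactory results on all 12 UCI benchmark datasets.
Keywords: self-training algorithm; mislabeled samples; high-confidence samples; dynamic threshold; difference testing; local outlier factor; contrast membership; dense distance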
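Robust Ordinal Mislabel Logistic Regression Based on γ-Divergence
14
Authors: 郭美君, 任明旸, +1 more author, 李仕明, 张三国. 《中国科学院大学学报(中英文)》, CSCD, PKU Core (北大核心), 2022, No. 3, pp. 289-301 (13 pages)
Ordinal multi-class classification methods have been widely studied. Traditional methods assume that sample class labels contain no mislabeling. In practice, however, because data are complex and human expertise is limited, obtaining perfectly labeled samples is unrealistic, so traditional methods have limitations. This paper proposes an ordinal mislabel logistic regression method based on γ-divergence that is highly robust for ordinal multi-class problems with mislabeling: a sample receives less weight in parameter estimation when it is mislabeled than when it is correctly labeled. The method builds the model by minimizing the γ-divergence and solves it with gradient descent; it is highly robust and does not need to model the mislabeling probabilities. Simulation studies and real-data analysis both show that the method performs well on ordinal classification problems with mislabeling.
Keywords: γ-divergence; logistic regression; mislabeling; ordinal classification; robustness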
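A Generalized CL-stability Algorithm for Identifying Mislabeled and Abnormal Samples in Microarray Cancer Data
15
Authors: 周柚, 张琛, +2 more authors, 吴春国, 时小虎, 梁艳春. 《吉林大学学报(理学版)》, CAS, CSCD, PKU Core (北大核心), 2008, No. 3, pp. 509-511 (3 pages)
Based on the characteristics of microarray cancer data, a generalized CL-stability algorithm is proposed that identifies mislabeled samples and abnormal samples in a dataset. The algorithm uses CL-stability as its basic operator and identifies mislabeled or abnormal samples through the global stability of samples. Experimental results show that the generalized CL-stability algorithm outperforms existing algorithms at identifying mislabeled samples in microarray cancer data and provides information for distinguishing mislabeled samples from abnormal samples.
Keywords: mislabeled sample identification; abnormal sample identification; microarray; generalized CL-stability algorithm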