Abstract
The pharmaceutical field lacks effective datasets, which hinders efforts to make pharmacy more intelligent. To address this, this paper constructs a traditional Chinese medicine (TCM) dataset, using Selenium to overcome obstacles encountered during data acquisition. During annotation, the paper introduces a human-in-the-loop labeling method. The resulting TCM recognition dataset contains 6,112 images. The semi-automatic labeling mode labels up to 64% of the data automatically, and a sampling check found a labeling error rate of only 1.4%.
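The abstract does not say how the semi-automatic mode decides which images are labeled without a human. One common human-in-the-loop pattern, assumed here rather than taken from the paper, works as follows. A model pre-labels each image. Predictions whose confidence clears a threshold are auto-accepted, and the rest go to a human annotator. The auto-label error rate is then estimated by spot-checking a random sample. In the sketch below, `Prediction`, `route_predictions`, `automation_rate`, `estimate_error_rate`, and the 0.9 threshold are all hypothetical names and values.

```python
import random
from dataclasses import dataclass


@dataclass
class Prediction:
    """A model's pre-label for one image (hypothetical structure)."""
    image_id: str
    label: str
    confidence: float


def route_predictions(preds, threshold=0.9):
    """Auto-accept confident predictions; queue the rest for a human annotator.

    The 0.9 threshold is an illustrative assumption, not a value from the paper.
    """
    auto, review = [], []
    for p in preds:
        if p.confidence >= threshold:
            auto.append(p)
        else:
            review.append(p)
    return auto, review


def automation_rate(auto, review):
    """Fraction of images labeled without human intervention."""
    total = len(auto) + len(review)
    return len(auto) / total if total else 0.0


def estimate_error_rate(auto, verified, sample_size, seed=0):
    """Spot-check a random sample of auto-labels against human-verified labels.

    `verified` maps image_id -> correct label and must cover every sampled image.
    """
    k = min(sample_size, len(auto))
    if k <= 0:
        return 0.0
    rng = random.Random(seed)  # seeded so the spot-check is reproducible
    sample = rng.sample(auto, k)
    errors = sum(1 for p in sample if verified[p.image_id] != p.label)
    return errors / k
```

In this framing, the abstract's 64% figure would correspond to the automation rate and its 1.4% figure to the spot-check error estimate. The paper's actual pre-labeling model and threshold are not stated.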
Authors
WU Nan (吴楠)
LOU Jie (娄洁)
LV Juan (吕娟)
Yunnan Medical Health College, Kunming 650101; Medical School, Yunnan College of Business Management, Kunming 650106
Source
Office Informatization (《办公自动化》)
2021, No. 21, pp. 15-17 (3 pages)
Funding
Scientific Research Fund Project of Yunnan Medical Health College (2020Y004)