‘Long tail’ data is the difficult-to-get-at data that sits in libraries, institutes and on the computers of individual scientists. Informatics specialists like to contrast it with the smaller number of large, more accessible data sets (e.g. Sinha et al., 2013). The name ‘long tail’ derives from graphs that plot the size of data sets against their number: there are relatively few large data sets and a great many smaller ones.
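In the big-data era, crowdsourcing systems must aggregate data from multiple data providers to obtain accurate truths. Focusing on fingerprint-based Bluetooth localization, and drawing on a study of the long-tail characteristics, sequential continuity and geographic relationships of the data, this work proposes a mechanism for processing sequential long-tail data with geographic correlation: an accurate data aggregation mechanism processing sequential long-tail data with spatial relativity (DAST-SR). To capture the long-tail characteristic of the data, the mechanism estimates each data source's reliability from the upper confidence bound of that source's error. To capture the sequential characteristic and geographic correlation of the data, the mechanism jointly uses the data provided by the sources, together with the aggregated truth from the previous time step and the aggregated truths of correlated entities as virtual sources, to aggregate the truth. Simulations on a synthetic dataset show that, compared with an accurate data aggregation mechanism incorporating sequential long-tail characteristics (DAST), Dynamic Truth Discovery (DynaTD) and truth discovery on correlated entities (TD-corr), DAST-SR yields the lowest mean absolute error and root mean square error, i.e. the most accurate aggregation results.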
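The aggregation loop described above (confidence-bound reliability, two virtual sources, MAE/RMSE evaluation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DAST-SR algorithm itself: it assumes a CATD-style weight (a chi-square lower quantile divided by the source's sum of squared errors, i.e. the inverse of an upper confidence bound on its error variance), approximates that quantile with the Wilson-Hilferty formula to stay within the standard library, and simply weights the two virtual sources like real ones. The function names (`source_weights`, `aggregate`, etc.), the `neighbors` graph and the toy data are hypothetical.

```python
import math
from statistics import NormalDist, mean

# Hypothetical sketch only; the paper's exact weighting may differ.


def chi2_quantile(p, k):
    """Wilson-Hilferty approximation of the p-quantile of a chi-square
    distribution with k degrees of freedom (stdlib only, no SciPy).
    Floored at a tiny positive value so weights never become exactly 0."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * k)
    return max(k * max(1.0 - h + z * math.sqrt(h), 0.0) ** 3, 1e-6)


def source_weights(claims, truths, alpha=0.05):
    """Assumed CATD-style weight: chi2_{alpha/2}(k) / sum of squared errors,
    the inverse of an upper confidence bound on the error variance. A source
    with few claims (small k) gets a small quantile, hence a small weight."""
    weights = {}
    for src, obs in claims.items():
        k = len(obs)
        if k == 0:
            continue  # nothing to estimate from
        sq_err = sum((v - truths[e]) ** 2 for e, v in obs.items())
        weights[src] = chi2_quantile(alpha / 2, k) / max(sq_err, 1e-9)
    return weights


def aggregate(claims, prev_truths, neighbors, alpha=0.05, iters=10):
    """claims:      {source: {entity: value}} for the current time step
    prev_truths: {entity: value} aggregated at the previous time step
    neighbors:   {entity: [spatially correlated entities]} (assumed given)"""
    entities = sorted({e for obs in claims.values() for e in obs})
    # Start from the plain mean of the real sources' claims.
    truths = {e: mean(obs[e] for obs in claims.values() if e in obs)
              for e in entities}
    for _ in range(iters):
        # Virtual source 1: last time step's aggregated truth (sequentiality).
        prev_src = {e: prev_truths[e] for e in entities if e in prev_truths}
        # Virtual source 2: mean current truth of correlated entities (spatial).
        neigh_src = {
            e: mean(truths[n] for n in neighbors.get(e, []) if n in truths)
            for e in entities
            if any(n in truths for n in neighbors.get(e, []))
        }
        all_claims = {**claims, "__prev__": prev_src, "__neigh__": neigh_src}
        w = source_weights(all_claims, truths, alpha)
        # Weighted average over every (real or virtual) source claiming e.
        truths = {
            e: sum(w[s] * obs[e] for s, obs in all_claims.items() if e in obs)
            / sum(w[s] for s, obs in all_claims.items() if e in obs)
            for e in entities
        }
    return truths


def mae(est, gt):
    """Mean absolute error over the ground-truth entities."""
    return mean(abs(est[e] - gt[e]) for e in gt)


def rmse(est, gt):
    """Root mean square error over the ground-truth entities."""
    return math.sqrt(mean((est[e] - gt[e]) ** 2 for e in gt))
```

In this sketch a source that submits a single, badly wrong value receives a near-zero weight, because the lower chi-square quantile for one degree of freedom is tiny. That is the long-tail effect the confidence bound is meant to capture: sparse sources are not trusted merely because their few errors happen to be small or large.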