Abstract
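Combining computational social science with big data lets ground truth guide data mining: it helps remedy the shortcomings of data mining, and it lets the questions and theories of social science research guide the algorithms. Taking the identification of industry leaders in China's venture capital (VC) industry as an example, and building on methods for finding the most influential nodes in complex networks, the first step is to use industry interviews and the Delphi method to obtain a definition of "industry leader" and a list recognized within the industry, which is then used to check whether the results of later data-mining models have explanatory power. Next, VC co-investment data from the 清科 (Zero2IPO) database are used to build a dynamic network of the industry structure and to compute various network indicators. Finally, data-mining methods are used to find a reasonable scheme for grouping firms directly from the dynamic data, thereby validating indicators for finding "industry leaders" that are meaningful in practice. Qualitative and quantitative data collection guided by social science theory provides the ground truth for big-data mining, which gives big-data research that previously dealt with sound, images, landscapes and the like more room to develop in topics, methods, and theory. The core method of computational social science is precisely this repeated back-and-forth dialogue between algorithms guided by social science theory and ground truth obtained through social science methods, so that the algorithms come ever closer to the ground truth.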
The purpose of this paper is to illustrate the importance of ground truth in data mining. The Chinese venture capital (VC) industry network is used as an example to show how ground truth can help identify industry leaders. Guanxi Circle theory states that leaders have a group of followers who form an ego-centered network. In the Chinese VC industry, important companies generally lead investments and have their own network of companies that act as their followers; in other words, a Guanxi Circle. To identify these leaders, this research uses the Delphi method. We interviewed four experts to build our list of VC industry leaders, which serves as this paper's ground truth. We then used various predictors and computing methods to find the model that gives the most accurate predictions. In the final section, we discuss the influence of ground truth on both theoretical and methodological development.
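The workflow described in the abstract (build a co-investment network, compute a network indicator, check the top-ranked firms against an expert list) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the firm names, the toy deals, the expert list, and the choice of degree centrality as the indicator are hypothetical and are not taken from the paper, which does not specify its indicators or models in this abstract.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each deal is the set of VC firms that co-invested.
deals = [
    {"A", "B", "C"},
    {"A", "D"},
    {"A", "B"},
    {"E", "F"},
    {"A", "E"},
]

# Hypothetical expert-validated list of leaders (the ground truth).
ground_truth = {"A", "B"}


def co_investment_graph(deals):
    """Link every pair of firms that appear together in at least one deal."""
    neighbors = defaultdict(set)
    for deal in deals:
        for u, v in combinations(sorted(deal), 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


def degree_centrality(neighbors):
    """Share of the other firms that each firm has co-invested with."""
    n = len(neighbors)
    return {firm: len(adj) / max(n - 1, 1) for firm, adj in neighbors.items()}


def precision_recall(predicted, truth):
    """Compare a predicted leader set with the ground-truth list."""
    hits = len(predicted & truth)
    return hits / len(predicted), hits / len(truth)


graph = co_investment_graph(deals)
scores = degree_centrality(graph)

# Take the two highest-scoring firms; ties are broken alphabetically.
top_k = set(sorted(scores, key=lambda f: (-scores[f], f))[:2])
precision, recall = precision_recall(top_k, ground_truth)
print(top_k, precision, recall)
```

In this framing the expert list is what makes the indicator testable: a different indicator or cutoff can be swapped in and scored against the same list.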
Source
《探索与争鸣》
CSSCI
PKU Core Journal (北大核心)
2018, Issue 7, pp. 94-102 (9 pages)
Exploration and Free Views
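Funding
Tsinghua University Initiative Scientific Research Program, "Mining Tie Strength and Social Capital from Communication Data" (20175080105)
Tencent Research Project, "Analyzing Personal Networks with WeChat and QQ Big Data" (20162001703)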
Keywords
Industry Leader
Network Analysis
Ground Truth
Venture Capital
Data Mining