Abstract
Modern software development relies heavily on class libraries and third-party frameworks, so developers often need to find application programming interfaces (APIs) that solve a specific problem and learn how to use them from example code. However, because the vocabulary of a developer's problem description frequently differs from that of the relevant APIs and their usage code, direct code search tends to perform poorly. Crowdsourced Q&A sites such as Stack Overflow contain many development questions and suggested solutions, which often include APIs and other code elements, and can therefore serve as a bridge between problem descriptions and code. Based on this idea, we propose and implement an API usage code search method based on crowdsourced Q&A information. The method first uses the Q&A information to map a problem description to related code elements and generates a code skeleton that contains structural information. It then searches and matches a code base (e.g., open source project code) against the code skeleton to produce the search results. To verify the effectiveness of the method, we collected more than 1.37 million Q&A records from Stack Overflow and 300 million lines of Java source code from GitHub, and ran experiments on 30 API-related questions. The results show that for 96.6% of the questions a correct answer appears in the top ten results, for 40% of the questions the first result is a correct answer, and all questions return results within 2 seconds.
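The two-stage idea in the abstract (map a question to code elements through Q&A data, then search a code base with those elements) can be illustrated with a toy sketch. This is not the authors' implementation: the word-overlap scoring, the function names `related_api_elements` and `search_code`, and the sample data are all assumptions made for illustration, and the paper's code skeleton with structural information is reduced here to a flat list of API names.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately naive tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def related_api_elements(query, qa_posts, top_k=3):
    """Map a natural-language query to API names via Q&A posts.

    Each post is a (text, api_names) pair. A post is weighted by its
    word overlap with the query, and every API name accumulates the
    weights of the posts that mention it (hypothetical scoring).
    """
    query_tokens = tokenize(query)
    scores = Counter()
    for text, api_names in qa_posts:
        overlap = len(query_tokens & tokenize(text))
        if overlap == 0:
            continue
        for name in api_names:
            scores[name] += overlap
    return [name for name, _ in scores.most_common(top_k)]


def search_code(api_names, code_base):
    """Rank snippets by how many of the API names they mention."""
    ranked = []
    for snippet_id, source in code_base.items():
        hits = sum(1 for name in api_names if name in source)
        if hits:
            ranked.append((hits, snippet_id))
    # More hits first; ties broken by snippet id for determinism.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [snippet_id for _, snippet_id in ranked]


# Made-up sample data standing in for Stack Overflow and GitHub.
qa_posts = [
    ("How do I read a text file line by line in Java?",
     ["BufferedReader", "FileReader"]),
    ("How to parse a JSON string", ["JSONObject"]),
]
code_base = {
    "ReadLines.java": "BufferedReader r = new BufferedReader(new FileReader(path));",
    "Parse.java": "JSONObject o = new JSONObject(text);",
    "Copy.java": "FileReader in = new FileReader(src);",
}

apis = related_api_elements("read file line by line", qa_posts)
results = search_code(apis, code_base)
```

In this sketch the query never mentions `BufferedReader`, yet the matching Q&A post supplies it, which is the bridging role the abstract attributes to crowdsourced Q&A data.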
Authors
Li Yukun, Peng Xin, Zhao Wenyun (Software School, Fudan University, Shanghai 201203, China; Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China)
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2018, Issue 7, pp. 43-51 (9 pages)
Keywords
Code search
Code feature extraction
Text summarization