High-Impact Bug Report Identification with Imbalanced Learning Strategies
  • Authors: Xin-Li Yang; David Lo; Xin Xia; Qiao Huang…
  • Keywords: high-impact bug; imbalanced learning; bug report identification
  • Journal: Journal of Computer Science and Technology
  • Publication year: 2017
  • Publication date: January 2017
  • Volume: 32
  • Issue: 1
  • Pages: 181-198
  • Journal category: Computer Science
  • Journal subjects: Computer Science, general; Software Engineering; Theory of Computation; Data Structures, Cryptology and Information Theory; Artificial Intelligence (incl. Robotics); Information Systems Applications (
  • Publisher: Springer US
  • ISSN: 1860-4749
Abstract
In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on those that are highly impactful. In the literature, the term high-impact bugs refers to bugs that appear at unexpected times or locations and bring more unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among thousands of bug reports in a bug tracking system is no easy feat. An automated technique that identifies high-impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high-impact bugs, identifying high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one imbalanced learning strategy with one classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms, and conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants perform differently, and that the best-performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, achieve higher F1-scores than the two state-of-the-art approaches by Thung et al. and Garcia and Shihab.
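
To make the idea of a "variant" concrete, the following is a minimal sketch of one pairing named in the abstract, RUS + NB, evaluated with the F1-score. It is not the authors' implementation: the toy bug reports, the whitespace tokenizer, the Laplace smoothing, and all function and class names are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random
from collections import Counter


def random_under_sample(texts, labels, seed=0):
    """RUS: keep every minority-class report and an equal-sized random
    subset of the majority class (labels are assumed to be 0 or 1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [texts[i] for i in keep], [labels[i] for i in keep]


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, Laplace-smoothed."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels))
                      for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.counts[y].update(text.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _score(self, text, c):
        # Log prior plus smoothed log likelihood of each known token.
        score = self.prior[c]
        for tok in text.lower().split():
            if tok in self.vocab:  # tokens unseen in training are skipped
                score += math.log((self.counts[c][tok] + 1)
                                  / (self.total[c] + len(self.vocab)))
        return score

    def predict(self, texts):
        return [max(self.classes, key=lambda c: self._score(t, c))
                for t in texts]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy reports: 1 = high-impact, 0 = ordinary (imbalanced 2:6).
texts = [
    "crash on startup after update",
    "data loss when saving file",
    "typo in help text",
    "button color slightly off",
    "minor typo in tooltip",
    "update docs for new option",
    "rename menu label",
    "tooltip text too long",
]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

bal_texts, bal_labels = random_under_sample(texts, labels, seed=0)
model = NaiveBayes().fit(bal_texts, bal_labels)
print(model.predict(["crash when saving file"]))
```

Swapping `random_under_sample` for another resampling step, or `NaiveBayes` for another classifier, yields a different variant in the sense used above; resampling is applied to the training data only, so the evaluation set keeps its original class ratio.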
