用户名: 密码: 验证码:
Label propagation based semi-supervised learning for software defect prediction
详细信息    查看全文
文摘
Software defect prediction can automatically predict defect-prone software modules for efficient software test in software engineering. When the previous defect labels of modules are limited, predicting the defect-prone modules becomes a challenging problem. In static software defect prediction, there exist the similarity among software modules, a software module can be approximated by a sparse representation of the other part of the software modules, and class-imbalance problem, the number of defect-free modules is much larger than that of defective ones. In this paper, we propose to use graph based semi-supervised learning technique to predict software defect. By using Laplacian score sampling strategy for the labeled defect-free modules, we construct a class-balance labeled training dataset firstly. And then, we use a nonnegative sparse algorithm to compute the nonnegative sparse weights of a relationship graph which serve as clustering indicators. Lastly, on the nonnegative sparse graph, we use a label propagation algorithm to iteratively predict the labels of unlabeled software modules. We thus propose a nonnegative sparse graph based label propagation approach for software defect classification and prediction, which uses not only few labeled data but also abundant unlabeled ones to improve the generalization capability. We vary the size of labeled software modules from 10 to 30 % of all the datasets in the widely used NASA projects. Experimental results show that the NSGLP outperforms several representative state-of-the-art semi-supervised software defect prediction methods, and it can fully exploit the characteristics of static code metrics and improve the generalization capability of the software defect prediction model.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700