An integrated graph neural network model for joint software defect prediction and code quality assessment.
Ping Dai, HongJun Zhu, Jinhua Wu, Hao He
Abstract
Current software defect prediction and code quality assessment methods treat these inherently related tasks independently and so fail to leverage their complementary information. Existing graph-based approaches do not jointly model structural dependencies and quality characteristics, which limits their ability to capture the relationships between defect patterns and code quality indicators. This paper proposes a novel integrated model that addresses both objectives simultaneously, using graph neural networks (GNNs) to exploit the inherent graph structure of software systems. Our novelty lies in the first-of-its-kind integration of multi-level graph representations, built from abstract syntax trees (AST), control flow graphs (CFG), and data flow graphs (DFG), with a dual-branch attention-based GNN architecture for simultaneous defect prediction and quality assessment. The multi-level representations capture both syntactic and semantic relationships in source code. The dual-branch architecture combines shared representation learning, attention mechanisms, and multi-task optimization to exploit the complementary information between the two tasks. Comprehensive experiments on six real-world software projects demonstrate significant improvements over traditional methods: the model achieves an F1-score of 0.811 and an AUC of 0.896 for defect prediction, and a 9.3% average improvement in code quality assessment accuracy across multiple quality dimensions. The integration strategy proves effective in capturing complex structural dependencies and provides actionable insights for software development teams, establishing a foundation for intelligent software engineering tools that deliver comprehensive code analysis.
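The pipeline summarized above (merged AST/CFG/DFG edges, a shared representation, attention, and two task heads trained with a joint loss) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the mean-aggregation step, the dot-product attention pooling, the linear heads, and the weighting factor `lam` are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import defaultdict

def merge_views(ast_edges, cfg_edges, dfg_edges):
    """Merge three edge lists over shared node ids into one typed edge list."""
    typed = []
    for kind, edges in (("ast", ast_edges), ("cfg", cfg_edges), ("dfg", dfg_edges)):
        for src, dst in edges:
            typed.append((src, dst, kind))
    return typed

def message_pass(features, typed_edges):
    """One round of mean aggregation over a node and its incoming neighbours.
    (Assumption: edge types are ignored here; a real model would use them.)"""
    incoming = defaultdict(list)
    for src, dst, _kind in typed_edges:
        incoming[dst].append(src)
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in incoming[node]]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def attention_pool(node_vecs, query):
    """Softmax-weighted sum of node vectors, scored against a query vector."""
    scores = [dot(query, v) for v in node_vecs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, node_vecs))
            for i in range(len(query))]

def joint_loss(shared, defect_w, quality_w, defect_label, quality_target, lam=0.5):
    """Binary cross-entropy (defect head) plus lam * squared error (quality head)."""
    p = 1.0 / (1.0 + math.exp(-dot(defect_w, shared)))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    bce = -(defect_label * math.log(p) + (1 - defect_label) * math.log(1 - p))
    mse = (dot(quality_w, shared) - quality_target) ** 2
    return bce + lam * mse

# Toy example: three nodes, hypothetical edges and weights.
edges = merge_views([(0, 1), (0, 2)], [(1, 2)], [(2, 1)])
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
hidden = message_pass(feats, edges)
shared = attention_pool(list(hidden.values()), query=[0.5, 0.5])
loss = joint_loss(shared, [0.3, -0.2], [0.1, 0.4], defect_label=1, quality_target=0.8)
```

Both heads read the same pooled vector `shared`, so a gradient from either task would update the common representation; that is the sense in which the two tasks can share complementary information in this sketch.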