Applying GNN for Source Code Analysis: Code Smell Identification

CEBAN, Dan

Collection: Facultatea Calculatoare, Informatică şi Microelectronică → Master's theses → Study program: Ingineria software (IS) → 2026

dc.contributor.advisor	CATRUC, Mariana
dc.contributor.author	CEBAN, Dan
dc.date.accessioned	2026-02-26T09:04:44Z
dc.date.available	2026-02-26T09:04:44Z
dc.date.issued	2026
dc.identifier.citation	CEBAN, Dan. Applying GNN for Source Code Analysis: Code Smell Identification. Master's thesis. Study program: Ingineria software (Software Engineering). Scientific advisor: CATRUC Mariana, university lecturer. Universitatea Tehnică a Moldovei. Chișinău, 2026.	en_US
dc.identifier.uri	https://repository.utm.md/handle/5014/35484
dc.description	The attached file contains: Abstract, Contents, Introduction, Bibliography.	en_US
dc.description.abstract	The central premise of this study is that source code is inherently graph-structured, so automated analysis must operate on representations that preserve its grammatical hierarchy, control-flow semantics, and data dependencies. Linear or token-based representations flatten these dimensions, causing significant information loss and leaving models unable to capture the non-local interactions that are crucial for understanding software quality. To address this gap, the thesis adopts graph-based representations grounded in compiler theory, in particular the Code Property Graph, which integrates Abstract Syntax Trees, Control-Flow Graphs, and Data-Flow Graphs into a single, expressive structure. This format models programs as interconnected systems rather than as isolated sequences of instructions, giving a more accurate picture of how software is designed and how it executes. Building on this structural foundation, the research proposes a GNN-based learning framework that detects code smells as graph-level or node-level anomalies in software systems. Whereas vulnerabilities are usually small, localized logical flaws, code smells are broader design problems such as excessive coupling, insufficient cohesion, overly complex control structures, or an excessive concentration of responsibilities. These traits are inherently relational and topological, which makes them well suited to graph-based analysis. The proposed method therefore frames code smell detection as a structural learning problem in which the GNN learns to associate particular graph patterns and neighborhood configurations with known design anti-patterns. The results show that GNNs can learn design principles without those principles being specified as explicit rules, by learning representations consistent with established software engineering practice. Beyond raw performance, the study also examines the interpretability of the models.
It shows that the learned representations can be used to identify the most influential nodes and edges in the code graph. This capability is essential for practical use, because it lets developers trace predictions back to concrete design problems in the source code. In summary, this thesis presents empirical evidence that Graph Neural Networks, combined with comprehensive graph representations of source code, provide a robust and scalable solution for automated code smell detection. By bringing together traditional static analysis and modern machine learning, the proposed method advances the state of the art in software quality assessment. The findings indicate that graph-based learning models could underpin next-generation developer tools designed to proactively address technical debt, enhance maintainability, and ultimately improve the long-term sustainability of complex software systems.	en_US
dc.language.iso	en	en_US
dc.publisher	Universitatea Tehnică a Moldovei	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	source code	en_US
dc.subject	Code Property Graph	en_US
dc.subject	Graph Neural Network	en_US
dc.title	Applying GNN for Source Code Analysis: Code Smell Identification	en_US
dc.type	Thesis	en_US
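The abstract describes the pipeline only in prose. The minimal Python sketch below illustrates the general technique under explicit assumptions; it is not the thesis's implementation. It overlays AST, CFG and DFG edges on one shared node set (a toy Code Property Graph), runs a simplified relational message-passing layer with a single hand-set scalar weight per edge type (a stand-in for learned per-relation weights), mean-pools node embeddings into a graph-level smell score, and ranks nodes by occlusion, i.e. how much the score changes when a node's features are zeroed. All function names (`build_cpg`, `message_pass`, `smell_score`, `node_importance`), the toy program, its node features, and all weights are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical edge-type labels, one per layer merged into the Code Property Graph.
EDGE_TYPES = ("ast", "cfg", "dfg")


def build_cpg(ast_edges, cfg_edges, dfg_edges):
    """Overlay AST, CFG and DFG edges on one shared node set.

    Returns node -> list of (neighbor, edge_type). Edges are stored in both
    directions so messages can flow either way (a simplifying assumption).
    """
    adj = defaultdict(list)
    for etype, edges in zip(EDGE_TYPES, (ast_edges, cfg_edges, dfg_edges)):
        for src, dst in edges:
            adj[src].append((dst, etype))
            adj[dst].append((src, etype))
    return dict(adj)


def message_pass(features, adj, edge_weight):
    """One typed mean-aggregation layer: h_v' = tanh(h_v + mean_u(w_type * h_u))."""
    updated = {}
    for node, h in features.items():
        agg = [0.0] * len(h)
        neighbors = adj.get(node, [])
        for nbr, etype in neighbors:
            for i, x in enumerate(features[nbr]):
                agg[i] += edge_weight[etype] * x
        if neighbors:
            agg = [a / len(neighbors) for a in agg]
        updated[node] = [math.tanh(x + a) for x, a in zip(h, agg)]
    return updated


def smell_score(features, adj, edge_weight, readout, bias, layers=2):
    """Graph-level prediction: message passing, mean pooling, logistic output."""
    h = features
    for _ in range(layers):
        h = message_pass(h, adj, edge_weight)
    dim = len(readout)
    pooled = [sum(vec[i] for vec in h.values()) / len(h) for i in range(dim)]
    z = sum(w * p for w, p in zip(readout, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def node_importance(features, adj, edge_weight, readout, bias):
    """Occlusion attribution: score change when one node's features are zeroed."""
    base = smell_score(features, adj, edge_weight, readout, bias)
    scores = {}
    for node, vec in features.items():
        occluded = dict(features)  # shallow copy; the original dict is untouched
        occluded[node] = [0.0] * len(vec)
        scores[node] = base - smell_score(occluded, adj, edge_weight, readout, bias)
    return scores


# Toy method (hypothetical): 0 = method declaration, 1 = `x = a()`,
# 2 = `if x:`, 3 = `b(x)`. Features: [is_call, branch_count].
features = {0: [0.0, 1.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 0.0]}
adj = build_cpg(
    ast_edges=[(0, 1), (0, 2), (2, 3)],
    cfg_edges=[(1, 2), (2, 3)],
    dfg_edges=[(1, 2), (1, 3)],
)
# Hand-set weights for illustration only; a real model would learn these.
edge_weight = {"ast": 0.5, "cfg": 1.0, "dfg": 1.5}
readout, bias = [0.8, 1.2], -0.5

score = smell_score(features, adj, edge_weight, readout, bias)
importance = node_importance(features, adj, edge_weight, readout, bias)
print(f"smell score: {score:.3f}")
for node, delta in sorted(importance.items(), key=lambda kv: -abs(kv[1])):
    print(f"node {node}: {delta:+.3f}")
```

In a trained model the edge-type weights, readout vector and bias would be learned from labeled examples. Occlusion is only one simple attribution scheme; gradient-based or GNNExplainer-style methods are common alternatives, and this record does not state which technique the thesis uses.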