Evaluating Whisper’s speech-to-text performance for Romanian using audio from diverse domains

CERNEI, Ion

dc.contributor.advisor	BEȘLIU, Corina
dc.contributor.author	CERNEI, Ion
dc.date.accessioned	2026-01-15T06:25:19Z
dc.date.available	2026-01-15T06:25:19Z
dc.date.issued	2026
dc.identifier.citation	CERNEI, Ion. Evaluating Whisper’s speech-to-text performance for Romanian using audio from diverse domains. In: Conferinţa Tehnico-Ştiinţifică a Colaboratorilor, Doctoranzilor şi Studenţilor = The Technical Scientific Conference of Undergraduate, Master and PhD Students, 14-16 Mai 2025. Universitatea Tehnică a Moldovei. Chişinău: Tehnica-UTM, 2026, vol. 1, pp. 771-774. ISBN 978-9975-64-612-3, ISBN 978-9975-64-613-0 (PDF).	en_US
dc.identifier.isbn	978-9975-64-612-3
dc.identifier.isbn	978-9975-64-613-0
dc.identifier.uri	https://repository.utm.md/handle/5014/34447
dc.description.abstract	This study evaluates the Whisper model’s performance on Romanian speech-to-text transcription and investigates how transcription accuracy varies across audio domains. The audio sources (audiobooks, news broadcasts, and official public speeches) were selected because each has a verified reference text, which allows transcripts to be aligned and scored reliably. Each domain has distinct linguistic and acoustic characteristics: the structured, clear narration of audiobooks, the dynamic and occasionally noisy environments of live news, and the formal rhetoric of political discourse. Transcription performance is measured with two standard metrics, Word Error Rate (WER) and Character Error Rate (CER), so that results are comparable across domains. By focusing on Romanian, a low-resource language in automatic speech recognition, the study offers insight into Whisper’s effectiveness and into the influence of the audio domain on transcription quality, contributing to speech recognition for under-resourced languages. Results show that Whisper performs best on scripted, high-quality audio such as audiobooks, while accuracy decreases in more variable and spontaneous contexts, highlighting the model’s sensitivity to content structure and recording conditions.	en_US
dc.language.iso	en	en_US
dc.publisher	Universitatea Tehnică a Moldovei	en_US
dc.relation.ispartofseries	Conferinţa tehnico-ştiinţifică a studenţilor, masteranzilor şi doctoranzilor = The Technical Scientific Conference of Undergraduate, Master and PhD Students: 14-16 mai 2025;
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	automatic speech recognition	en_US
dc.subject	low-resource languages	en_US
dc.subject	error metrics	en_US
dc.subject	speech analysis	en_US
dc.subject	domain-specific evaluation	en_US
dc.title	Evaluating Whisper’s speech-to-text performance for Romanian using audio from diverse domains	en_US
dc.type	Article	en_US
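
The abstract names Word Error Rate (WER) and Character Error Rate (CER) as the evaluation metrics, but this record does not describe the scoring procedure. The following is a minimal sketch of how the two metrics are conventionally computed, as edit distance divided by reference length. Whitespace tokenization, the absence of any text normalization, the function names, and the sample sentences are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: fewest substitutions, deletions, and insertions."""
    # prev[j] is the distance between the reference prefix processed so far
    # (before the current item) and the first j hypothesis items.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into nothing costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: plain whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sentence pair, written without diacritics for simplicity.
    reference = "ana are mere rosii"
    hypothesis = "ana are pere rosii"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this sample one of four words is wrong, so the WER is 1/4, while only one of eighteen characters differs, so the CER is 1/18. The gap shows why the two metrics are reported together: a single misrecognized letter counts as a whole-word error in WER but as a small error in CER. In practice, scores also depend on how case, punctuation, and Romanian diacritics are normalized before comparison, which this sketch leaves out.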