Transfer Learning for Speaker Diarization on Bangla Audio Dataset

dc.contributor.advisorMs. Tanjila Farah
dc.contributor.authorA.N.M Fahim Faisal
dc.contributor.authorUmnoon Binta Ali
dc.contributor.authorMd. Rayhan Talukder
dc.contributor.authorMd. Afikur Rahman
dc.contributor.id1711758642
dc.contributor.id1713013042
dc.contributor.id1712356642
dc.contributor.id1711035042
dc.coverage.departmentElectrical and Computer Engineering
dc.date.accessioned2025-07-23
dc.date.accessioned2025-07-23T04:58:11Z
dc.date.available2025-07-23T04:58:11Z
dc.date.issued2021
dc.description.abstractSound classification is an active area of artificial intelligence research, and speaker diarization is one of the tasks within it. The field has expanded significantly with the introduction of deep learning, which has transformed research and practice across speech applications. Speaker diarization is the process of labeling audio data according to the speaker's identity, which is useful for identifying audio information. Very little work has been done on speaker diarization in Bangla. This project aimed to build a Bangla dataset for diarization. We describe our deep learning model for the speaker diarization problem and demonstrate how transfer learning can be used to train a model quickly with minimal performance loss compared to a fully trained one. To make the model more broadly applicable, we focused on transfer learning and tuned the model manually across the AMI and Bangla datasets. We also worked on improving the Diarization Error Rate (DER) and experimented with other embedding generation networks. Our transfer learning variant trained on the Bangla dataset obtained a DER of 0.24.
dc.description.degreeUndergraduate
dc.identifier.cd600000614
dc.identifier.print-thesisTo be assigned
dc.identifier.urihttps://repository.northsouth.edu/handle/123456789/1314
dc.language.isoen
dc.publisherNorth South University
dc.rights© NSU Library
dc.titleTransfer Learning for Speaker Diarization on Bangla Audio Dataset
dc.typeThesis
oaire.citation.endPage75
oaire.citation.startPage1
Files
Original bundle
Name:
600000614.Abstract.pdf
Size:
134.81 KB
Format:
Adobe Portable Document Format
Name:
600000614.pdf
Size:
1.78 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.93 KB
Format:
Item-specific license agreed to upon submission