Lip Reading App

Jubair Mahmud Pulock

Files

600000587.Abstract.pdf(231.83 KB)

600000587.pdf(2.23 MB)

Date

2023

Abstract

Effective communication is essential for promoting inclusivity and for closing the gap between people who speak different languages in today's globally connected world. However, hearing-impaired people frequently have trouble understanding spoken language and must rely heavily on visual cues to follow conversations. To address this problem, we introduce the Lip Reading App, a state-of-the-art mobile application that uses computer vision and artificial intelligence (AI) to enable real-time speech interpretation. The app uses deep learning algorithms to monitor lip movements and translate them into understandable text or audio output. It detects and tracks facial landmarks using facial recognition technology, capturing the subtle movements and gestures involved in speech production. By processing these visual cues and comparing them to a large library of phonetic and linguistic patterns, the app can produce accurate transcriptions or audio renderings of the spoken content.

This project presents the development and training of a character-level sequence-to-sequence language model using recurrent neural networks (RNNs). The primary objective of the model is to generate text predictions from input sequences of characters. The model employs the 'binary_crossentropy' loss function, although categorical cross-entropy may be more suitable for multi-class classification tasks. Training is performed over multiple epochs, with a learning rate of 1.0e-4 (0.0001) and an average epoch time of 35026 seconds. The model's progress is evaluated by the loss value, which decreases steadily with each epoch; by the end of 100 epochs, the average loss stands at 63.765. Despite the potential for improvement in the choice of loss function, the model shows promising capability in generating coherent and meaningful text predictions. Further fine-tuning and optimization could enhance the performance and versatility of the language model for various text generation tasks.
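
The abstract's remark about the loss function can be illustrated with a small sketch. The following plain-Python example compares the two losses on a single next-character prediction. The four-character vocabulary, the target, and the predicted probabilities are hypothetical values chosen for illustration and are not taken from the thesis; the binary variant is assumed to average over classes, as Keras-style implementations do.

```python
import math

def categorical_crossentropy(target, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and a probability distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))

def binary_crossentropy(target, probs, eps=1e-12):
    """Mean of independent per-class binary cross-entropies (assumed averaging)."""
    total = 0.0
    for t, p in zip(target, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(target)

# Hypothetical 4-character vocabulary; the true next character is index 2.
target = [0, 0, 1, 0]
probs = [0.1, 0.2, 0.6, 0.1]  # illustrative model output, sums to 1

cce = categorical_crossentropy(target, probs)
bce = binary_crossentropy(target, probs)
print(f"categorical: {cce:.4f}  binary: {bce:.4f}")
```

For a one-hot target, the categorical loss depends only on the probability assigned to the correct character, while the binary loss treats each class as a separate yes/no decision and averages the results. The two values are therefore not directly comparable, which is one reason the choice of loss matters when reading reported loss figures.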

Department Name

Electrical and Computer Engineering

Publisher

North South University

Printed Thesis

To be determined

CD

600000587

URI

https://repository.northsouth.edu/handle/123456789/1101

Collections

Theses - Undergraduate
