Lip Reading App

Date
2023
Student ID
Research Supervisor
Editor
Journal Title
Volume
Issue
Journal ISSN
Volume Title
Abstract
Effective communication is essential for promoting inclusivity and closing the gap between people who speak different languages in today's globally connected world. Hearing-impaired people, however, frequently have trouble understanding spoken language and must rely heavily on visual cues to follow conversations. To address this problem, we introduce the Lip Reading App, a mobile application that uses computer vision and artificial intelligence (AI) to interpret speech in real time. The app uses deep learning algorithms to track lip movements and translate them into readable text or audio output. It detects and tracks facial landmarks using facial recognition technology, capturing the subtle movements involved in speech production. By processing these visual cues and comparing them to a large library of phonetic and linguistic patterns, the app can produce accurate transcriptions or audio renderings of the spoken content.
This project presents the development and training of a character-level sequence-to-sequence language model based on recurrent neural networks (RNNs). The primary objective of the model is to generate text predictions from input sequences of characters. The model is trained with the 'binary_crossentropy' loss function, although categorical cross-entropy would likely be more suitable for this multi-class classification task. Training runs for 100 epochs with a learning rate of 1.0e-4 (0.0001) and an average epoch time of 35026 seconds. Progress is evaluated by the loss value, which decreases steadily with each epoch; by the end of 100 epochs, the average loss stands at 63.765. Despite the room for improvement in the choice of loss function, the model shows promising capabilities in generating coherent and meaningful text predictions. Further fine-tuning and optimization could enhance the performance and versatility of the language model for various text generation tasks.
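The abstract's remark about the loss function can be illustrated with a small sketch. This is not the thesis code: the four-character vocabulary, the predicted probabilities, and both function names are assumptions made for illustration only. It compares categorical cross-entropy, which scores only the probability assigned to the true next character, with binary cross-entropy, which treats every character in the vocabulary as a separate yes/no decision and averages the results.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for a one-hot target over mutually exclusive classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of per-class binary cross-entropies (each class scored independently)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical 4-character vocabulary; the true next character is index 2.
y_true = [0, 0, 1, 0]
# Hypothetical softmax output of a character-level model (sums to 1).
y_pred = [0.1, 0.2, 0.6, 0.1]

cce = categorical_crossentropy(y_true, y_pred)
bce = binary_crossentropy(y_true, y_pred)

print(f"categorical cross-entropy: {cce:.4f}")
print(f"binary cross-entropy:      {bce:.4f}")
```

In this toy case the categorical value is simply the negative log of the probability given to the correct character, while the binary value is diluted by the three "absent" classes. With a realistic vocabulary of dozens of characters the dilution grows, which is one reason categorical cross-entropy is the usual choice for next-character prediction with a softmax output.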
Description
Keywords
Citation
Department Name
Electrical and Computer Engineering
Publisher
North South University
Printed Thesis
DOI
ISSN
ISBN