Bangla Plagiarism Checker using Machine Learning-Multiple line Approach

Date
2023
Abstract
Plagiarism is the practice of presenting another person's ideas or work as one's own, with or without that person's consent, by incorporating them into one's work without giving due credit. Excellent tools such as Turnitin and Grammarly are available to identify plagiarism. However, although many plagiarism detection techniques exist, most are domain-specific and designed for English text, and plagiarism is not limited to a single language. With about 300 million native speakers and 37 million second-language speakers, Bengali is the most widely spoken language in Bangladesh and among the most widely spoken in India. In this project, we designed and developed a plagiarism detection method that uses Natural Language Processing (NLP) techniques to deliver more precise results than traditional methods. We used existing Bangla datasets, which consist mainly of articles from Bangla newspapers, and we also created our own dataset using a scraper. We preprocessed the collected datasets and merged them with the scraped dataset, which we had categorized. We then trained different models on the combined dataset and tested the results. After performance testing, we built a user interface (UI) for the Bangla Plagiarism Checker so that users can check the similarity of their text against our training dataset.
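The abstract does not name the models or the similarity measure used, so the following is only a minimal sketch of how a line-by-line similarity check could look, not the authors' method. It assumes character trigram counts compared with cosine similarity, sentence splitting on the Bangla danda (।), and a threshold of 0.8. The function names `split_sentences`, `char_ngrams`, `cosine`, and `check_document` are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumption: sentences end at the Bangla danda, '!', '?', or a newline.
SENTENCE_END = re.compile(r"[।!?\n]+")


def split_sentences(text):
    """Split a document into non-empty, stripped sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def char_ngrams(sentence, n=3):
    """Count overlapping character n-grams (by Unicode code point)."""
    compact = " ".join(sentence.split())  # normalize whitespace
    if len(compact) < n:
        return Counter([compact])
    return Counter(compact[i:i + n] for i in range(len(compact) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def check_document(document, corpus, threshold=0.8):
    """For each sentence, report its closest corpus sentence and score.

    Returns a list of (sentence, best_match, score, flagged) tuples.
    The threshold is an illustrative assumption, not a tuned value.
    """
    corpus_vectors = [(ref, char_ngrams(ref)) for ref in corpus]
    report = []
    for sentence in split_sentences(document):
        vec = char_ngrams(sentence)
        best_score, best_match = 0.0, None
        for ref, ref_vec in corpus_vectors:
            score = cosine(vec, ref_vec)
            if score > best_score:
                best_score, best_match = score, ref
        report.append((sentence, best_match, best_score, best_score >= threshold))
    return report


if __name__ == "__main__":
    # Toy reference corpus standing in for the newspaper dataset.
    corpus = ["আমি বাংলায় গান গাই", "আজ আবহাওয়া খুব ভালো"]
    document = "আমি বাংলায় গান গাই। আমরা কাল ঢাকা যাব।"
    for sentence, match, score, flagged in check_document(document, corpus):
        print(f"{score:.2f} flagged={flagged} | {sentence} | {match}")
```

Character n-grams are used here because they need no Bangla-specific tokenizer or stemmer. A trained model, as described in the abstract, would replace the `cosine` scoring step.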
Department Name
Electrical and Computer Engineering
Publisher
North South University