Bangla Plagiarism Checker using Machine Learning-Multiple line Approach

Md Saiful Islam

Date

2023

Abstract

Plagiarism is the practice of presenting another person's ideas or work as your own, with or without that person's consent, by incorporating them into your work without giving due credit. Several good tools exist for identifying plagiarism, such as Turnitin and Grammarly. However, although many techniques are available for detecting plagiarism in a document, most of them are domain-specific and designed for English text, while plagiarism is not limited to a single language. With 300 million native speakers and 37 million second-language speakers, Bengali is the most widely spoken language in Bangladesh and the second most widely spoken language in India. In this project, we designed and developed a plagiarism detection method that uses Natural Language Processing (NLP) techniques to deliver more precise results than traditional methods. We used existing Bangla datasets, composed mainly of articles from Bangla newspapers, and also built our own dataset with a web scraper. We processed the collected datasets and merged them with our scraped dataset after categorizing it. We then trained different models on the combined dataset and tested their results. After performance testing, we built a user interface for the Bangla Plagiarism Checker so that users can check the similarity of a text against our training dataset.
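
The abstract does not say which models or similarity measures the thesis uses. As a minimal, hypothetical sketch of a line-by-line ("multiple line") check, the code below swaps in plain bag-of-words cosine similarity instead of any trained model. It compares each sentence of a suspect Bangla text with every sentence of a reference corpus and reports the fraction of suspect sentences whose best match reaches a threshold. The function names `sentences`, `tokens`, `cosine`, and `plagiarism_report` and the 0.8 threshold are illustrative assumptions, not the author's implementation.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: bag-of-words cosine similarity, not the thesis's trained models.
# Bangla sentences usually end with the danda "।"; also split on ?, ! and newlines.
SENTENCE_SPLIT = re.compile(r"[\u0964?!\n]+")
# Split words on whitespace and punctuation rather than matching \w, because
# Python's \w does not cover Bangla vowel signs (combining marks).
WORD_SPLIT = re.compile(r"[\s\u0964\u0965,;:?!.\"'()\-]+")


def sentences(text):
    """Split a text into non-empty, stripped sentences."""
    return [s.strip() for s in SENTENCE_SPLIT.split(text) if s.strip()]


def tokens(sentence):
    """Split a sentence into lower-cased word tokens."""
    return [t for t in WORD_SPLIT.split(sentence.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def plagiarism_report(suspect, corpus, threshold=0.8):
    """Return (score, matches): the fraction of suspect sentences whose best
    corpus match reaches `threshold`, plus (sentence, match, similarity) tuples."""
    reference = [s for doc in corpus for s in sentences(doc)]
    ref_tokens = [tokens(s) for s in reference]
    suspect_sents = sentences(suspect)
    matches = []
    for sent in suspect_sents:
        toks = tokens(sent)
        best_score, best_ref = 0.0, None
        for ref, rt in zip(reference, ref_tokens):
            score = cosine(toks, rt)
            if score > best_score:
                best_score, best_ref = score, ref
        if best_score >= threshold:
            matches.append((sent, best_ref, best_score))
    score = len(matches) / len(suspect_sents) if suspect_sents else 0.0
    return score, matches


if __name__ == "__main__":
    corpus = ["আমি ভাত খাই। সে বই পড়ে।"]
    suspect = "আমি ভাত খাই। আজ বৃষ্টি হবে।"
    score, matches = plagiarism_report(suspect, corpus)
    print(score)  # 0.5: one of the two suspect sentences is copied
    for sent, ref, sim in matches:
        print(sent, "->", ref, round(sim, 2))
```

Comparing sentence by sentence, rather than whole documents, lets a checker point to the specific copied lines. A trained model would replace `cosine` with a learned similarity score while keeping the same reporting loop.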

Department Name

Electrical and Computer Engineering

Publisher

North South University

Printed Thesis

To be determined

CD

600000383

URI

https://repository.northsouth.edu/handle/123456789/1149

Collections

Theses - Undergraduate