Road Accident Prediction System Using Machine Learning Model

Road accidents are among the most frequent problems in today's transportation sector. They have become a global concern, resulting in loss of life, injuries, and other direct and indirect costs. The World Health Organization ranks car accidents ninth among the top 10 causes of mortality. They affect many families as well as our economies, so a comprehensive analysis is needed to manage the problem. This study made extensive use of machine learning approaches to examine traffic accidents and predict their severity. We developed models on two different datasets using a combination of machine learning and deep learning methods. The first is the US Accident Dataset, obtained from Kaggle, which contains US accident records from 2016 through 2021, with 47 attributes and about 4,000,000 instances. The second is the Accident Dataset of Bangladesh (BD Accident Dataset), which was compiled from accident reports from 2000 to 2003 and obtained from the Accident Research Institute (ARI), BUET, Dhaka; it has 20 features and about 460 samples. Using the open-source US accident dataset, we predicted the severity of traffic accidents and analyzed the number of crashes by year, by state, and by day of the week, the ratio of fatal crashes between rural and urban areas, the age of the injured individuals, and the deadliest times to drive. Using the BD accident dataset, we likewise predicted accident severity and analyzed driver and passenger casualties and the number of accidents by month, year, vehicle type, and district. We focused on predicting traffic accident severity on both datasets and compared the outcomes between the US and BD datasets.
During data pre-processing, we removed any feature with too many null values and, because the dataset contains a large number of instances, filled the remaining missing values with the feature means. We employed the Synthetic Minority Oversampling Technique (SMOTE) together with hyperparameter optimization. We developed Decision Tree, Logistic Regression, Random Forest, KNN, ANN, SVM, CNN, and LSTM models to predict accident severity, and applied Voting Classifiers to enhance their performance. We also used the explainable AI technique LIME (Local Interpretable Model-agnostic Explanations), which explains a machine learning model one prediction at a time so that each prediction can be understood on its own. On the US dataset, CNN performed best of all models, with a training score of 99%, a validation score of 98%, and an F1-score of 99%. On the BD accident dataset, by contrast, Random Forest performed best, with a training accuracy of 99%, a validation accuracy of 94%, and an F1-score of 99%.
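The pre-processing, oversampling, and voting steps described above can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation, which is not given here and most likely relied on library routines. The function names, the 0.5 null-fraction threshold, the neighbour count `k`, and the integer severity labels are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def drop_sparse_and_impute(rows, max_null_fraction=0.5):
    """Drop columns whose share of None values exceeds the (assumed)
    threshold, then fill the remaining None values with the column mean."""
    n_rows = len(rows)
    keep, means = [], {}
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        null_fraction = 1 - len(present) / n_rows
        if present and null_fraction <= max_null_fraction:
            keep.append(j)
            means[j] = sum(present) / len(present)
    return [[r[j] if r[j] is not None else means[j] for j in keep]
            for r in rows]


def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def hard_vote(predictions):
    """Majority (hard) voting: predictions holds one list of labels per
    model; ties go to the label seen first among the models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy data: the middle column is mostly missing and is dropped.
rows = [[1.0, None, 3.0], [None, None, 5.0], [3.0, 2.0, None]]
cleaned = drop_sparse_and_impute(rows)

# Hypothetical minority-class samples and three models' severity predictions.
extra = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=5)
combined = hard_vote([[0, 1, 2], [0, 2, 2], [1, 1, 2]])
```

In practice, oversampling of this kind is normally applied only to the training split, so that synthetic samples do not leak into validation data.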
Department Name
Electrical and Computer Engineering
North South University
Printed Thesis