A Distributed Big Data Analytics Models for Traffic Accidents Classification and Recognition based SparkMlLib Cores

Imad El Mallahi; Jamal Riffi; Hamid Tairi; Abderrahamane Ez-Zahout; Mohamed Adnane Mahraz

doi:10.14313/JAMRIS/4-2022/34

Authors

Imad El Mallahi, Sidi Mohammed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz, Department of Computer Sciences, LISAC Laboratory, Fez, Morocco
Jamal Riffi, Sidi Mohammed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz, Department of Computer Sciences, LISAC Laboratory, Fez, Morocco
Hamid Tairi, Sidi Mohammed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz, Department of Computer Sciences, LISAC Laboratory, Fez, Morocco
Abderrahamane Ez-Zahout, Mohamed V University, Faculty of Sciences, Intelligent Processing Systems & Security Team (IPSS), Computer Science Department, Rabat, Morocco
Mohamed Adnane Mahraz, Sidi Mohammed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz, Department of Computer Sciences, LISAC Laboratory, Fez, Morocco

Keywords: big data, machine learning, traffic accident, severity prediction, convolutional neural network

Abstract

This paper addresses the prediction of traffic accident severity, an important step forward in road accident management. The problem has direct implications for emergency logistical planning in urban areas. To assess accident severity in congested settings, we analyze the potential consequences of accidents, with the aim of making accident management protocols more effective.

In this study, we present a real-time big data project. We implement and compare three machine learning algorithms, Random Forest, Support Vector Machine (SVM), and Artificial Neural Network (ANN), to classify and predict the severity of traffic accidents. Incoming datasets are captured in real time and stored in a Hadoop Distributed File System (HDFS) cluster. We then use the core functionality of Spark MLlib, making use of pre-implemented lambda functions. Throughout the project, classification and recognition tasks are carried out under the data locality processing paradigm.

To validate our approach, we use a confusion matrix, which lets us gauge the interclass effects among three classes: pedestrian, vehicle or pillion passenger, and driver or rider. For empirical validation, we use the TRAFFIC ACCIDENTS_2019_LEEDS dataset sourced from the Road Safety Department of Transport. This dataset supports the classification of severity predictions into these three categories. In our experiments, Random Forest achieves an accuracy of 93%, outperforming SVM at 82% and ANN at 87%. In terms of precision-recall metrics, Random Forest also leads with a score of 93.82%, compared with 82.22% for SVM and 87.88% for ANN.
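
As an illustration of the validation step, the following minimal Python sketch shows how a three-class confusion matrix yields accuracy and per-class precision and recall. It is not the authors' Spark MLlib implementation: the label strings and the function names `confusion_matrix`, `accuracy`, and `precision_recall` are assumptions introduced here, and the sketch runs on plain Python lists and not on a distributed dataset.

```python
# Hypothetical label strings for the three classes named in the abstract.
CLASSES = ["pedestrian", "vehicle_or_pillion_passenger", "driver_or_rider"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Return matrix[i][j]: number of samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def precision_recall(matrix):
    """Return one (precision, recall) pair per class, in class order."""
    n = len(matrix)
    results = []
    for k in range(n):
        true_positives = matrix[k][k]
        predicted_as_k = sum(matrix[i][k] for i in range(n))  # column sum
        actually_k = sum(matrix[k])                           # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results
```

Off-diagonal entries of the matrix show which classes are confused with one another, which is the interclass information the abstract refers to. Any labels passed to these functions for experimentation are toy inputs and not values from the Leeds dataset.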
