Distributed Big Data Analytics Models for Traffic Accident Classification and Recognition Based on Spark MLlib Cores
Authors
Abstract
This paper focuses on predicting the severity of traffic accidents, a task with direct implications for road accident management and for emergency logistical planning in urban areas. To assess accident severity in congested settings, we analyze the potential consequences of accidents, with the aim of making accident management protocols more effective.
We present a real-time big data project that implements and compares three machine learning algorithms: Random Forest, Support Vector Machine (SVM), and Artificial Neural Network (ANN). The objective is to accurately classify and predict the severity of traffic accidents. Incoming datasets are captured in real time and stored in a Hadoop Distributed File System (HDFS) cluster. We then use the core functionality of Spark MLlib, relying on pre-implemented lambda functions. Throughout the project, classification and recognition tasks are carried out following the data locality processing paradigm.
To validate our approach, we use a confusion matrix, which lets us gauge the interclass impacts among three classes: pedestrian, vehicle or pillion passenger, and driver or rider. For empirical validation, we use the TRAFFIC ACCIDENTS_2019_LEEDS dataset sourced from the Road Safety Department of Transport. This dataset supports classifying severity predictions into the same three categories. In our experiments, the Random Forest algorithm achieves an accuracy of 93%, outperforming SVM at 82% and ANN at 87%. In terms of precision-recall metrics, Random Forest also scores highest at 93.82%, compared with 82.22% for SVM and 87.88% for ANN.
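As a minimal sketch of how accuracy, precision, and recall can be read off a three-class confusion matrix of the kind described above, consider the following plain Python example. The counts in `CONFUSION` are hypothetical and chosen only for illustration; they are not the results reported in this paper. The function names are likewise assumptions, not part of Spark MLlib.

```python
from typing import Dict, List

LABELS = ["Pedestrian", "Vehicle or pillion passenger", "Driver or rider"]

# Hypothetical counts for illustration only; NOT the paper's results.
# Rows are true classes, columns are predicted classes.
CONFUSION = [
    [8, 1, 1],
    [1, 7, 2],
    [0, 2, 18],
]


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(
    matrix: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Precision and recall for each class, taken one-vs-rest."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        metrics[label] = {
            "precision": true_positives / predicted if predicted else 0.0,
            "recall": true_positives / actual if actual else 0.0,
        }
    return metrics


if __name__ == "__main__":
    print(f"accuracy = {accuracy(CONFUSION):.3f}")
    for label, m in per_class_metrics(CONFUSION, LABELS).items():
        print(f"{label}: precision={m['precision']:.3f}, recall={m['recall']:.3f}")
```

Off-diagonal entries show which classes are confused with one another, which is the interclass effect the confusion matrix is meant to expose.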