Development of a Distributed Outlier Detection Method based on the Alternating Direction Method of Multipliers
Authors
Abstract
Outlier detection is one of the most studied problems in data mining. It consists of identifying "unusual" data points in a dataset that are suspected of having been generated by a different mechanism than the rest of the data. Its applications include discovering novel information, detecting bank fraud, and identifying system intrusions, among others. However, large volumes of data, known as big data, pose a challenge for outlier detection algorithms because the resources of a single computer may not be sufficient to process them efficiently. Furthermore, datasets are often stored in distributed environments.
The goal of this work is to develop a new distributed outlier detection algorithm that solves the support vector data description (SVDD) problem using the alternating direction method of multipliers (ADMM). The implementation relies mainly on mathematical optimization methods and Python libraries. The result is the design and distributed implementation of the proposed algorithm, validated on several test datasets, where it yields satisfactory results that are competitive with existing methods.
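For orientation, the standard support vector data description problem and one generic way to split it across nodes can be written as follows. This is only a sketch: the first problem is the usual soft-margin formulation, while the consensus splitting, the number of nodes $N$, the local data blocks $D_j$, and the local copies $(R_j, a_j)$ are assumptions introduced here for illustration and are not necessarily the formulation adopted in this work.

```latex
% Standard soft-margin SVDD: smallest sphere (center a, radius R)
% enclosing the data x_1, ..., x_n, with slacks \xi_i and penalty C > 0.
\begin{aligned}
\min_{R,\, a,\, \xi} \quad & R^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & \lVert x_i - a \rVert^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.
\end{aligned}

% Assumed consensus form over N nodes: node j holds the block D_j and
% local copies (R_j, a_j), tied to shared global variables (R, a).
\begin{aligned}
\min_{\{R_j,\, a_j,\, \xi\},\, R,\, a} \quad & \sum_{j=1}^{N} \Bigl( \tfrac{1}{N} R_j^2 + C \sum_{i \in D_j} \xi_i \Bigr) \\
\text{s.t.} \quad & \lVert x_i - a_j \rVert^2 \le R_j^2 + \xi_i, \quad \xi_i \ge 0, \quad i \in D_j, \\
& a_j = a, \quad R_j = R, \quad j = 1, \dots, N.
\end{aligned}
```

In a splitting of this kind, the alternating direction method of multipliers would alternate between local updates that each node computes on its own data, an averaging step for the shared variables, and a dual update on the consensus constraints, so raw data points would not need to leave the node that stores them.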