Towards Explainable Graph Spectral Clustering for BERT Embeddings
Authors
Abstract
Artificial Intelligence algorithms are increasingly applied to tasks in Natural Language Processing, including document clustering. As these algorithms grow more complex (e.g., transformer-based embeddings such as BERT) and/or are of a ``black-box'' nature, as is the case for Graph Spectral Clustering (GSC) algorithms, the demand for explaining their results becomes ever more pressing.
In this paper, we propose a model-aware method to explain the results of GSC in the context of BERT-based embeddings.
We present a novel theoretical methodology for explanation, based on the premise that document similarity in GSC is computed as the cosine similarity of the documents' BERT embeddings.
We demonstrate the validity of this methodology by presenting strong GSC clustering results that recover the human-made assignment of hashtags to tweets. We show that GSC based on BERT embeddings outperforms approaches using the Term Vector Space and GloVe embeddings. The resulting explanations are therefore also expected to be of higher quality.
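The pipeline sketched in the abstract — cosine similarity of document embeddings used as the affinity matrix for spectral clustering — can be illustrated as follows. This is a minimal sketch, not the authors' implementation: random vectors stand in for BERT document embeddings, and scikit-learn's `SpectralClustering` with a precomputed affinity plays the role of GSC.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for BERT document embeddings: two well-separated groups of
# 5 "documents" each, in an 8-dimensional embedding space.
rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(loc=1.0, size=(5, 8)),
    rng.normal(loc=-1.0, size=(5, 8)),
])

# Affinity matrix = pairwise cosine similarity, shifted from [-1, 1]
# to [0, 1] so that all affinities are non-negative.
S = (cosine_similarity(emb) + 1.0) / 2.0

# Spectral clustering on the precomputed cosine-similarity affinity.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(S)
```

With well-separated groups, the recovered labels split the first five documents from the last five, mirroring how GSC over cosine similarities of BERT embeddings is expected to group topically related tweets.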



