Towards Explainable Graph Spectral Clustering for BERT Embeddings
DOI: https://doi.org/10.14313/jamris-2026-005

Keywords: Explainable Machine Learning, Natural Language Processing, Graph Spectral Clustering, Document Embedding versus Explainability, BERT and GloVe and TVS Embedding

Abstract
Artificial Intelligence algorithms are increasingly applied to tasks in Natural Language Processing, including document clustering. As these algorithms grow more complex (e.g., transformer-based embeddings such as BERT) and/or are of a ``black-box'' nature, such as Graph Spectral Clustering (GSC) algorithms, the demand for explaining their results is becoming ever more urgent.
In this paper, we propose a model-aware method to explain the results of GSC in the context of BERT-based embeddings.
We present a novel theoretical methodology for explanation, based on the premise that document similarity in GSC is computed as cosine similarity of BERT embeddings of documents.
We demonstrate the validity of this methodology by presenting strong GSC clustering results that recover the human-made assignment of hashtags to tweets. We show that GSC based on BERT embeddings outperforms approaches using Term Vector Space and GloVe embeddings; the resulting explanations are therefore also expected to be of higher quality.
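The core similarity computation the explanation rests on can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: documents are represented by embedding vectors (here toy random vectors standing in for BERT embeddings), pairwise similarity is their cosine similarity, and a spectral bipartition is obtained from the Fiedler vector of the normalized graph Laplacian.

```python
import numpy as np

def cosine_similarity_matrix(X):
    # Row-normalize the embeddings; the Gram matrix then holds cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    return Xn @ Xn.T

def spectral_bipartition(S):
    # Graph affinities must be non-negative, so clip negative cosines to zero.
    W = np.clip(S, 0.0, None)
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.clip(d, 1e-12, None))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    # The sign of the Fiedler vector (second-smallest eigenvector) splits the graph.
    return (vecs[:, 1] > 0).astype(int)

# Toy stand-in for two clusters of document embeddings along orthogonal directions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 8)) + np.eye(8)[0],
               rng.normal(0, 0.1, (5, 8)) + np.eye(8)[1]])
labels = spectral_bipartition(cosine_similarity_matrix(X))
```

For more than two clusters, the standard extension takes the k smallest eigenvectors of the Laplacian and applies k-means to their rows.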
License
Copyright (c) 2026 Mieczysław Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta, Piotr Borkowski, Dariusz Czerski

Authors retain copyright and grant the journal a non-exclusive right to publish the article. Articles are published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) licence.


