Identification of Keywords for Legal Documents Categories using SOM

Paulina Puchalska; Kacper Krzemiński; Maksymilian Lis; Rafal Scherer; Paweł Drozda; Kajetan Komar-Komarowski; Konrad Szałapak; Andrzej Sobecki; Tomasz Zymkowski; Julian Szymański

doi:10.14313/jamris-2025-004

Authors

Paulina Puchalska Gdańsk University of Technology, Poland
Kacper Krzemiński Gdańsk University of Technology, Poland
Maksymilian Lis Gdańsk University of Technology, Poland
Rafal Scherer Czestochowa University of Technology, Poland
https://orcid.org/0000-0001-9592-262X
Paweł Drozda University of Warmia and Mazury in Olsztyn, Poland
https://orcid.org/0000-0003-3163-9408
Kajetan Komar-Komarowski Lex Secure, Poland
Konrad Szałapak Lex Secure, Poland
Andrzej Sobecki Gdańsk University of Technology, Poland
Tomasz Zymkowski Gdańsk University of Technology, Poland
Julian Szymański Gdańsk University of Technology, Poland
https://orcid.org/0000-0001-5029-6768

DOI:

https://doi.org/10.14313/jamris-2025-004

Keywords:

Document classification, RoBERTa, NLP, GloVe, NER, SOM

Abstract

This study aims to support the decision-making process in categorizing legal documents by identifying keywords that characterize each legal domain class. It combines the Kohonen Self-Organizing Map (SOM) method with the Global Vectors for Word Representation (GloVe) model to build a document classification system, which achieved a classification accuracy of 71.69%, a result the authors consider satisfactory. The article also discusses alternative approaches implemented to improve classification accuracy, such as Named Entity Recognition (NER) tools and the RoBERTa model, and compares their effectiveness. Finally, it notes the challenges posed by the uneven distribution of categories in the dataset and outlines potential directions for further research on improving the classification of legal documents.
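To illustrate the general idea of grouping word vectors with a Kohonen map, a minimal sketch follows. It is not the authors' implementation: the grid size, the decay schedules, the function names `train_som` and `best_matching_unit`, and the two-dimensional `toy_vectors` (stand-ins for real GloVe embeddings) are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, v):
    """Return the grid coordinate whose weight vector is closest to v (squared Euclidean)."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], v)))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small Kohonen map; returns a dict mapping (row, col) -> weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in vectors:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])  # pull node towards the input
            step += 1
    return weights


# Hypothetical 2-D "embeddings"; real GloVe vectors have 50 to 300 dimensions.
toy_vectors = {
    "contract": [0.9, 0.1],
    "lease": [0.8, 0.2],
    "verdict": [0.1, 0.9],
    "appeal": [0.2, 0.8],
}

som = train_som(list(toy_vectors.values()))
clusters = {word: best_matching_unit(som, vec) for word, vec in toy_vectors.items()}
print(clusters)
```

Words that land on the same or neighbouring map nodes can then be inspected as candidate keywords for a category; how the study itself selects keywords from the map is not specified in the abstract.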
