Krasnov F.V., Smaznevich I.S.

The explicability factor of the algorithm in the problems of searching for the similarity of text documents

Giving any user a comprehensive explanation of why an applied intelligent information system considers certain texts similar in meaning places significant requirements on the underlying algorithms. The article reviews the full set of technologies involved in solving the text clustering problem and draws several conclusions.

Matrix decomposition aimed at reducing the dimension of the vector representation of a corpus does not give the user a clear explanation of the algorithmic principles. Ranking with the TF-IDF function and its modifications finds few documents that are similar in meaning; however, this method is the easiest for users to comprehend, since algorithms of this type detect specific matching words in the compared texts. Topic modeling methods (LSI, LDA, ARTM) assign high similarity values to texts that share only a few matching words, in cases where a person can easily tell that the general subject of the texts is the same. Yet explaining how topic modeling works requires additional effort to interpret the detected topics. This interpretation becomes easier as model quality grows, and the quality can be optimized by the model's average coherence. The experiment demonstrated that the absolute value of document similarity is not invariant across different intelligent algorithms, so the optimal similarity threshold must be set separately for each problem to be solved.
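The point about TF-IDF being easy to explain can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the whitespace tokenizer, the smoothed IDF formula, and the function names (`tfidf_vectors`, `cosine`, `explain`) are all assumptions made for illustration. The sketch computes cosine similarity between TF-IDF vectors and lists the matching words that produced the score, which is the kind of explanation a user can verify directly.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts mapping term -> weight)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Assumption: a smoothed IDF variant; other modifications exist.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def explain(u, v):
    """Return the shared terms and their contribution to the dot product,
    largest contribution first. This is the user-facing explanation."""
    shared = {t: u[t] * v[t] for t in u if t in v}
    return sorted(shared.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus, not taken from the article's experiment.
docs = [
    "the cat sat on the mat",
    "a cat lay on a mat",
    "stock prices fell sharply today",
]
vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])
sim_02 = cosine(vectors[0], vectors[2])
shared_01 = explain(vectors[0], vectors[1])

print("similarity(doc0, doc1):", round(sim_01, 3))
print("similarity(doc0, doc2):", round(sim_02, 3))
print("matching words for doc0/doc1:", [term for term, _ in shared_01])
```

In this sketch two documents with no words in common always score zero, however close their subjects are. That is the limitation the abstract attributes to TF-IDF ranking, and the case in which topic models can still report a high similarity.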

The results of the work can be further used to assess which of the various methods developed to detect meaning similarity in texts can be effectively implemented in applied information systems and to determine the optimal model parameters based on the solution explicability requirements.

Keywords: explainable artificial intelligence, XAI, ranking function, document similarity

doi: 10.25743/ICT.2020.25.5.009

Krasnov Fedor Vladimirovich
Office: NAUMEN R and D
Address: 620028, Russia, Ekaterinburg, 49A, Tatishcheva street
SPIN-code: 8650-1127

Smaznevich Irina Sergeevna
Position: business analyst
Office: NAUMEN R and D
Address: 620028, Russia, Ekaterinburg, 49A, Tatishcheva street

