Korbicz, Józef (1951- ) - ed. ; Uciński, Dariusz - ed.
Title:
Group publication title:
Subject and keywords: text mining ; discussion forums ; text representation ; document classification ; word embedding
Abstract: Despite the rapid growth of other types of social media, Internet discussion forums remain a highly popular communication channel and a useful source of text data for analyzing user interests and sentiments. Because they are suited to richer, deeper, and longer discussions than microblogging services, they reflect particularly well topics of long-term, persistent involvement and areas of specialized knowledge or experience. Discovering and characterizing such topics and areas with text mining algorithms is therefore an interesting and useful research direction. This work presents a case study in which selected classification algorithms are applied to posts from a Polish discussion forum devoted to psychoactive substances obtained from home-grown plants, such as hashish or marijuana. The utility of two vector text representations is examined: the simple bag-of-words representation and the more refined global vectors (GloVe) word embedding representation. While the former is found to work well for the multinomial naive Bayes algorithm, the latter turns out to be more useful for the other classification algorithms: logistic regression, SVMs, and random forests. The results suggest that post classification can be applied to measuring the publication intensity of particular topics and, in the case of forums related to psychoactive substances, to monitoring the risk of drug-related crime.
Publisher: Zielona Góra: Uniwersytet Zielonogórski
Date issued:
Resource type:
DOI:
Pages:
Source: AMCS, volume 28, number 4 (2018)
Language:
License CC BY 4.0:
Rights to dispose of the publication:
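
The abstract pairs the bag-of-words representation with the multinomial naive Bayes algorithm. The following is a minimal sketch of that pairing in plain Python, not the authors' implementation: the whitespace tokenizer, the Laplace smoothing parameter alpha, the class and function names, and the four toy posts with their "growing" and "legal" labels are all assumptions made for illustration and do not come from the forum data used in the study.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Map a post to word counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


class MultinomialNB:
    """Multinomial naive Bayes over bag-of-words counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, posts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for post, label in zip(posts, labels):
            self.word_counts[label].update(bag_of_words(post))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, post):
        n_posts = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(count / n_posts)  # log prior of the class
            for word, freq in bag_of_words(post).items():
                if word in self.vocab:  # words unseen in training are skipped
                    p = (self.word_counts[label][word] + self.alpha) / denom
                    score += freq * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy posts and topic labels, invented for this sketch.
posts = [
    "seeds germination soil light",
    "soil watering light schedule",
    "police fine court penalty",
    "court penalty lawyer advice",
]
labels = ["growing", "growing", "legal", "legal"]

model = MultinomialNB().fit(posts, labels)
prediction = model.predict("which soil and light for seeds")
```

Counting how often each predicted label occurs over a stream of posts would give a rough measure of the publication intensity of a topic, which is the application the abstract suggests.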