Stefanowski, Jerzy - ed. ; Krawiec, Krzysztof - ed. ; Wrembel, Robert - ed.
When data-mining algorithms are run on big data platforms, a parallel, distributed framework such as MapReduce may be used. In a parallel framework, however, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. To induce a single consistent model, ensemble algorithms such as majority voting aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by the parallel nodes and chooses the model that is most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases, the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
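The selection step described in the abstract (compare the locally induced models pairwise and keep the one most similar to the rest) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not define the syntactic similarity measure, so Jaccard similarity over sets of flattened "path -> label" rule strings is used as a stand-in, and the names `jaccard`, `pick_representative`, and `local_models` are hypothetical, not taken from the paper.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two rule sets.

    Stand-in measure only; it is NOT the SySM similarity from the paper.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_representative(models: list) -> int:
    """Return the index of the model with the highest total similarity
    to all other models (ties go to the lowest index)."""
    n = len(models)
    if n == 0:
        raise ValueError("at least one model is required")
    totals = [0.0] * n
    # Each unordered pair is compared once and credited to both models.
    for i, j in combinations(range(n), 2):
        s = jaccard(models[i], models[j])
        totals[i] += s
        totals[j] += s
    return max(range(n), key=lambda i: totals[i])


# Hypothetical trees from four nodes, each flattened to a set of rules.
local_models = [
    frozenset({"age<=30 -> no", "age>30 & inc<=50 -> no", "age>30 & inc>50 -> yes"}),
    frozenset({"age<=30 -> no", "age>30 & inc<=50 -> no", "owner -> yes"}),
    frozenset({"age<=30 -> no", "age>30 & inc<=50 -> no",
               "age>30 & inc>50 -> yes", "owner -> yes"}),
    frozenset({"region=north -> yes"}),
]

best = pick_representative(local_models)
print(best)
```

In this toy input, the third model shares the most rules with the others, so it is the one selected. Only that single tree would then be used to classify new records, which is the source of the compactness and interpretability the abstract reports relative to an ensemble.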
Zielona Góra: Uniwersytet Zielonogórski
AMCS, volume 27, number 4 (2017)
Biblioteka Uniwersytetu Zielonogórskiego
14 Jul 2025
7 Jul 2025
18
https://zbc.uz.zgora.pl/repozytorium/publication/100713
| Edition name | Date |
|---|---|
| Interpretable decision-tree induction in a big data parallel framework | 14 Jul 2025 |