Parallel and Distributed Pattern Mining

Ishak H.A Meddah, Nour El Houda REMIL

Source Title: International Journal of Rough Sets and Data Analysis (IJRSDA) 6(3)

ISSN: 2334-4598 | EISSN: 2334-4601 | EISBN13: 9781522568469 | DOI: 10.4018/IJRSDA.2019070101

MLA

Meddah, Ishak H.A, and Nour El Houda REMIL. "Parallel and Distributed Pattern Mining." IJRSDA vol.6, no.3 2019: pp.1-17. http://doi.org/10.4018/IJRSDA.2019070101

APA

Meddah, I. H. A., & Remil, N. E. H. (2019). Parallel and Distributed Pattern Mining. International Journal of Rough Sets and Data Analysis (IJRSDA), 6(3), 1-17. http://doi.org/10.4018/IJRSDA.2019070101

Chicago

Meddah, Ishak H.A, and Nour El Houda REMIL. "Parallel and Distributed Pattern Mining," International Journal of Rough Sets and Data Analysis (IJRSDA) 6, no.3: 1-17. http://doi.org/10.4018/IJRSDA.2019070101

Abstract

Processing large volumes of data is difficult, and the MapReduce framework offers one solution to this problem. The framework can be used to analyze and process vast amounts of data by distributing the computational work across a cluster of virtual servers running in a cloud or across a large set of machines. Process mining provides an important bridge between data mining and business process analysis; its techniques allow information to be extracted from event logs. Process mining generally involves two steps: correlation definition or discovery, and inference or composition. First, the authors mine small patterns from log traces using existing techniques. These patterns represent the execution traces recorded in the log file of a business process. Each pattern is represented by a finite state automaton or its regular expression, and the final model is a combination of only two types of patterns, represented by the regular expressions (ab)* and (ab*c)*. Second, they compute these patterns in parallel and then combine them using the Hadoop framework. This computation has two steps: a map step, in which patterns are mined from execution traces, and a reduce step, in which these small patterns are combined. The results show that the approach is scalable, general, and precise, and that it minimizes execution time through the use of the Hadoop framework.
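The map and reduce steps described in the abstract can be illustrated with a minimal, single-machine sketch. It is not the authors' implementation and does not use Hadoop. It assumes that a pair of events (a, b) satisfies the (ab)* pattern in a trace when the trace, restricted to those two events, matches the regular expression, and that the reduce step simply counts how many traces support each pair. The function names `project`, `map_trace`, `reduce_counts`, and `mine` and the sample event names are hypothetical.

```python
import re
from collections import defaultdict
from itertools import permutations

# The alternation pattern (ab)* named in the abstract.
ALTERNATION = re.compile(r"(ab)*")


def project(trace, a, b):
    """Keep only events a and b, renamed to the letters 'a' and 'b'."""
    return "".join("a" if event == a else "b" for event in trace if event in (a, b))


def map_trace(trace):
    """Map step (assumed form): emit ((a, b), 1) for every ordered pair of
    distinct events whose projection of this trace matches (ab)*."""
    events = sorted(set(trace))
    for a, b in permutations(events, 2):
        if ALTERNATION.fullmatch(project(trace, a, b)):
            yield (a, b), 1


def reduce_counts(pairs):
    """Reduce step (assumed form): sum the emitted values per event pair."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def mine(traces):
    """Run the map step over every trace, then reduce the emitted pairs."""
    emitted = (kv for trace in traces for kv in map_trace(trace))
    return reduce_counts(emitted)


if __name__ == "__main__":
    log = [
        ["lock", "unlock", "lock", "unlock"],
        ["lock", "unlock"],
        ["lock", "lock", "unlock"],  # violates strict alternation
    ]
    print(mine(log))
```

In a real Hadoop job, each call to `map_trace` would run on a separate split of the log and the framework would group the emitted keys before the reduce step; the sketch performs that grouping in memory. The (ab*c)* pattern could be handled the same way with a three-event projection.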