Inspecting and Cleaning an Event Log
Before applying any mining technique to an event log, we recommend that you first get an idea of the information it contains. The main reason is that you can only answer certain questions if the required data is in the log. For instance, you cannot calculate the throughput time of cases if the log does not record the dates and times at which tasks were executed. Additionally, you may want to remove unnecessary information from the log before you start mining. For instance, you may be interested only in the cases that have completed. In our running example, every case whose last task is not an archiving task is still running and should not be considered. The cleaning step is usually a projection of the log onto the data you are interested in. In this section we therefore show how you can inspect and clean (or pre-process) an event log in ProM 6. Furthermore, we show how you can save the cleaned log, so that you avoid redoing work.
The questions answered in this section are summarized in the table below. As you can see, the first section shows how to answer questions related to log inspection, and the second section explains how to filter an event log and how to save your work. Note that the list of questions in the table is not exhaustive, but it is enough to give you an idea of the features offered by ProM 6 for log inspection and filtering.
|Question|Section|
|---|---|
|How many cases (or process instances) are in the log?|First|
|How many tasks (or audit trail entries) are in the log?|First|
|How many resources are in the log?|First|
|Are there running cases in the log?|First|
|Which resources work on which tasks?|First|
|How can I filter the log so that only completed cases are kept?|Second|
|How can I see the result of my filtering?|Second|
|How can I save the pre-processed log so that I do not have to redo work?|Second|
Inspecting the Log
The first thing you need to do to inspect or mine a log is to load it into ProM 6. In this tutorial we use the repairExample.xes log. This log has process instances of the running example described earlier.
To open this log, do the following:
- Download the log file for the running example and save it on your computer.
- Import the log by clicking “Import…” and selecting your saved copy of the log file for the running example.
Once your file has been imported as an event log, you should get a screen like the one in the figure below. Now that the log has been imported, we can proceed with the actual log inspection. Recall that we want to answer the following questions:
- How many cases (or process instances) are in the log?
- How many tasks (or audit trail entries) are in the log?
- How many resources are in the log?
- Are there running cases in the log?
- Which resources work on which tasks?
The five questions can be answered by viewing the log summary. To view this summary, select the “repairExample.mxml” entry in your workspace (as shown in the figure above) and select the “Visualize” button in the details view of this entry. This will open a view on the log entry. In this view, select the “Summary” tab. Can you now answer the first four questions of the list above? If so, you have probably noticed that this log has 104 running cases and 1000 completed cases. You can see this in the table “MXML Legacy Classifier / End events” of the log summary (cf. the figure below). Note that only 1000 cases end with the task “Archive Repair”.
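The counts shown in the log summary can also be reproduced outside ProM. The following is a minimal Python sketch, not ProM's own implementation: it assumes the log has already been loaded into a plain dictionary, and the toy data (case ids, the task “Repair”, the resources “System”, “alice” and “bob”) is invented for illustration.

```python
from collections import Counter

# Hypothetical toy log: case id -> ordered list of (task, event type, resource).
# All values are invented; only "Register" and "Archive Repair" come from the text.
toy_log = {
    "case1": [("Register", "complete", "System"),
              ("Repair", "start", "alice"),
              ("Repair", "complete", "alice"),
              ("Archive Repair", "complete", "System")],
    "case2": [("Register", "complete", "System"),
              ("Repair", "start", "bob")],
    "case3": [("Register", "complete", "System"),
              ("Repair", "start", "alice"),
              ("Repair", "complete", "alice"),
              ("Archive Repair", "complete", "System")],
}

def summarize(log):
    """Count cases, events and resources, and tally the last event of each case."""
    events = [event for trace in log.values() for event in trace]
    end_events = Counter(
        f"{trace[-1][0]}+{trace[-1][1]}" for trace in log.values() if trace
    )
    return {
        "cases": len(log),
        "events": len(events),
        "resources": len({resource for _, _, resource in events}),
        "end_events": end_events,
    }

summary = summarize(toy_log)
# A case counts as running when it does not end with the archiving task.
running = summary["cases"] - summary["end_events"]["Archive Repair+complete"]
print(summary["cases"], summary["events"], summary["resources"], running)
```

The tally of end events plays the role of the “End events” table: any case whose last event is not the archiving task is treated as still running.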
The fifth question of the list above can be answered by checking the “Event Name AND Resource” section of the log summary. Based on this section, can you identify which resources perform the same tasks in the running example log? If so, you have probably also noticed that there are 3 people in each of the teams in the Repair department of the company. The employees with login “SolverC...” deal with the complex defects, while the employees with login “SolverS...” handle the simple defects.
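The grouping behind an “event name and resource” overview can be sketched as follows. This is only an illustration in Python: the task names and the logins (“simple_1”, “complex_1”, and so on) are hypothetical and do not come from the example log.

```python
from collections import defaultdict

# Hypothetical (task, resource) pairs, one per event in a log.
events = [
    ("Repair simple", "simple_1"),
    ("Repair simple", "simple_2"),
    ("Repair complex", "complex_1"),
    ("Repair simple", "simple_1"),
    ("Repair complex", "complex_2"),
]

def resources_per_task(pairs):
    """Map each task to the sorted list of distinct resources that executed it."""
    table = defaultdict(set)
    for task, resource in pairs:
        table[task].add(resource)
    return {task: sorted(resources) for task, resources in table.items()}

task_resources = resources_per_task(events)
for task, resources in task_resources.items():
    print(task, resources)
```

Resources that appear under the same task form a candidate team, which is how the split between simple and complex defects becomes visible.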
Take your time to inspect this log and find out more information about it. If you like, you can also inspect the individual cases by first clicking the “Inspector” tab.
Cleaning the Log
In this tutorial, we will use process mining techniques to gain insight into the process for repairing telephones. Since our focus is on the process as a whole, we will base our analysis on the completed process instances only. Note that it does not make much sense to talk about the most frequent path if it is not complete, or to reason about the throughput time of cases when some of them are still running. In short, we need to pre-process (or clean or filter) the log.
In ProM 6, a log can be filtered by applying specific actions. From the description of our running example, we know that the completed cases are the ones that start with a task to register the phone and end with a task to archive the instance. Thus, to filter the completed cases, you need to execute the following procedure:
Select the “repairExample.mxml” entry in your workspace, and select the “Action” button. This will open the Action view (see the figure below) with the log preselected as an input for the action to perform. In this view, only actions that take a log as input are listed, where actions that only take a log as input are colored green (all inputs are available) and actions that require additional inputs are colored yellow (some but not all inputs are available).
Select the “Filter Log using Simple Heuristics” action (it should be green) and select the “Start” button. This will start the “Filter Log using Simple Heuristics” action on the example log. This action combines a number of log filters that can be configured individually using a wizard.
The first log filter to configure is the event type filter, which allows us to select the type of events (or tasks or audit trail entries) that we want to consider while mining the log. For our running example, the log has tasks with two event types: complete and start. For each event type, you can choose one of three options:
- “keep” keeps all tasks with that event type,
- “remove” omits the tasks with that event type from each trace, and
- “discard instance” discards every trace that contains a task with that event type. This last option may be useful when you have aborted cases etc.
Options can be selected by clicking on an event type. When done, select the “Next” button.
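The effect of the three options can be sketched in Python. This is an illustration of the described behaviour, not ProM's code. The function name `filter_event_types`, the event type “abort” and the task “Repair” are hypothetical, and treating unlisted event types as “keep” is an assumption of this sketch.

```python
def filter_event_types(traces, policy):
    """Apply a per-event-type policy: 'keep', 'remove' or 'discard instance'.

    Each trace is a list of (task, event_type) pairs. Event types that are
    missing from `policy` are kept (an assumption of this sketch).
    """
    result = []
    for trace in traces:
        types_in_trace = {event_type for _, event_type in trace}
        # One event of a discarded type removes the whole trace.
        if any(policy.get(t, "keep") == "discard instance" for t in types_in_trace):
            continue
        # Events of a removed type are dropped; the rest of the trace stays.
        result.append(
            [(task, event_type) for task, event_type in trace
             if policy.get(event_type, "keep") == "keep"]
        )
    return result

traces = [
    [("Register", "complete"), ("Repair", "start"), ("Repair", "complete")],
    [("Register", "complete"), ("Repair", "abort")],  # hypothetical aborted case
]
policy = {"start": "remove", "abort": "discard instance"}
filtered = filter_event_types(traces, policy)
print(filtered)
```

Note that this sketch keeps a trace even if removing events leaves it empty.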
The second filter is the start event filter, which filters the log in such a way that only the traces (or cases) that start with the indicated tasks are kept. The slider at the bottom allows us to select the most frequent start events. For example, if this slider is set to “80%”, then the most frequent start events will be selected until at least 80% of the traces are covered. As all traces start with “Register+complete”, this step is straightforward. When done, select the “Next” button.
The third filter is the end event filter (see the figure below), which filters the log in such a way that only the traces (or cases) that end with the indicated tasks are kept. The slider at the bottom allows us to select the most frequent end events. The figure shows that over 80% of the traces end with “Archive repair+complete”. If you want to select more end events, you can either select them manually or use the slider. When done, select the “Next” button.
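The slider behaviour of the start and end event filters (select the most frequent events until a given share of the traces is covered) can be sketched as below. The function names and the toy traces are hypothetical, and how ProM orders events of equal frequency is not specified here, so the tie-breaking of `Counter.most_common` is simply an artefact of this sketch.

```python
from collections import Counter

def most_frequent_until(counts, threshold):
    """Pick the most frequent keys until at least `threshold` (0..1) of the total is covered."""
    total = sum(counts.values())
    selected, covered = set(), 0
    for key, count in counts.most_common():
        if covered >= threshold * total:
            break
        selected.add(key)
        covered += count
    return selected

def filter_by_boundary(traces, threshold, position):
    """Keep traces whose first (position=0) or last (position=-1) event is selected."""
    counts = Counter(trace[position] for trace in traces if trace)
    allowed = most_frequent_until(counts, threshold)
    return [trace for trace in traces if trace and trace[position] in allowed]

# Hypothetical log: 9 archived cases and 1 case that stops early.
traces = [["Register+complete", "Archive Repair+complete"]] * 9 \
       + [["Register+complete", "Repair+start"]]

completed = filter_by_boundary(traces, 0.8, position=-1)  # end event filter at 80%
everything = filter_by_boundary(traces, 1.0, position=-1)  # slider at 100%
print(len(completed), len(everything))
```

With the threshold at 80%, the single most frequent end event already covers enough traces, so the case that stops early is filtered out. At 100% every end event is selected and no trace is removed.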
The fourth filter is the event filter, which filters all unselected events from the log. The slider at the bottom allows us to select the most frequent events. If, as a result, all events are removed from a trace, then the entire trace will be removed (no empty traces remain). Move the slider to 100% and select the “Finish” button.
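The event filter can be sketched in the same style. Again this is a hedged illustration rather than ProM's implementation, with a hypothetical function name and invented trace data.

```python
def filter_events(traces, selected):
    """Drop all events not in `selected`, then drop traces that became empty."""
    projected = ([event for event in trace if event in selected] for trace in traces)
    return [trace for trace in projected if trace]

traces = [
    ["Register+complete", "Repair+start", "Archive Repair+complete"],
    ["Repair+start"],
]
selected = {"Register+complete", "Archive Repair+complete"}
result = filter_events(traces, selected)
print(result)
```

The second trace contains only unselected events, so it disappears entirely instead of remaining as an empty trace.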
If you now inspect the resulting log (cf. the first section), you will notice that the log contains fewer cases (Can you say how many?) and all the cases indeed start with the task “Register (complete)” and finish with the task “Archive Repair (complete)”.
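If the filtered log is available outside ProM, the same check can be automated. The sketch below assumes each trace is a list of event labels; the helper name `is_clean` is hypothetical.

```python
def is_clean(traces, start, end):
    """True if every trace is non-empty, starts with `start` and ends with `end`."""
    return all(
        bool(trace) and trace[0] == start and trace[-1] == end
        for trace in traces
    )

filtered_log = [
    ["Register+complete", "Repair+complete", "Archive Repair+complete"],
    ["Register+complete", "Archive Repair+complete"],
]
ok = is_clean(filtered_log, "Register+complete", "Archive Repair+complete")
print(ok)
```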
Although the log filters we have presented so far are very useful, they have some limitations. For instance, you cannot rename tasks (or events) in a log. For reasons like this, ProM 6 provides more powerful log filters. We strongly advise you to spend some time trying them out to get a better feeling for how they work. Our experience shows that the advanced log filters are especially useful when handling real-life logs. These filters allow not only for projecting data in the log, but also for adding data to the log.
Once you are done with the filtering, you can export the filtered log by selecting it in your workspace and selecting the “Export to disk” button. If you like, you can export the filtered log for our running example. Can you open this exported log in ProM 6? What do you notice when inspecting it? Note that your log should contain only 1000 cases, and they should all start with the same task and end with the same task.
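To give an idea of what a saved log looks like, the sketch below writes a tiny log as simplified XES-style XML and reads it back using only the Python standard library. It is an assumption-laden illustration: it uses the standard XES attribute keys `concept:name` and `lifecycle:transition`, but it omits the extension, global and classifier declarations of a full XES file, it has not been checked against ProM's importer, and files that declare an XML namespace would need namespace-aware lookups. The function names and the case id are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_xes(log):
    """Serialize {case id: [(task, event type), ...]} as simplified XES-style XML."""
    root = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in log.items():
        trace = ET.SubElement(root, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for task, event_type in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": task})
            ET.SubElement(event, "string",
                          {"key": "lifecycle:transition", "value": event_type})
    return ET.tostring(root, encoding="unicode")

def from_xes(text):
    """Parse the simplified format produced by to_xes (no XML namespace)."""
    log = {}
    for trace in ET.fromstring(text).findall("trace"):
        case_id = trace.find("string[@key='concept:name']").get("value")
        log[case_id] = [
            (event.find("string[@key='concept:name']").get("value"),
             event.find("string[@key='lifecycle:transition']").get("value"))
            for event in trace.findall("event")
        ]
    return log

# Hypothetical one-case log.
cleaned = {"case1": [("Register", "complete"), ("Archive Repair", "complete")]}
xml_text = to_xes(cleaned)
restored = from_xes(xml_text)
print(restored == cleaned)
```

Reading the exported text back and comparing it with the original is the scripted counterpart of re-importing the exported log and inspecting it again.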