The second step, after ‘getting to know’ the data, is to filter it. Real data is usually not ready to be mined: some cases are not yet complete, different types of cases (with different processes) are combined, and so on. Based on the findings of the previous discussion, what filtering do you recommend for this data?
Note that there are several plug-ins available for filtering:
- Filter Log using Simple Heuristics
- Filter Log on Event Attributes
- Filter Log on Trace Attributes
ProM Lite offers an additional way of filtering that we did not cover in this course. If you are interested, you can also use the ‘Explore event log (Trace variants / searchable)’ visualization on the event log. This visualizer contains some advanced filtering approaches, such as keeping only traces that do, or do not, contain a particular activity. However, the three filtering plug-ins discussed in this course are already sufficient to filter this event log for further analysis.
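To make the idea of keeping traces that do or do not contain a particular activity concrete, here is a minimal sketch in plain Python. It does not use ProM or any of its plug-ins; the toy log, the activity names, and the helper `filter_by_activity` are all made up for illustration.

```python
# Toy event log (hypothetical data): case id -> ordered list of activity names.
log = {
    "case1": ["register", "check", "pay"],
    "case2": ["register", "check", "reject"],
    "case3": ["register", "pay"],
}


def filter_by_activity(log, activity, keep_if_present=True):
    """Keep traces that contain the activity, or those that do not.

    Hypothetical helper, not a ProM function.
    """
    return {
        case: trace
        for case, trace in log.items()
        if (activity in trace) == keep_if_present
    }


# Traces in which "check" occurs at least once.
with_check = filter_by_activity(log, "check")
# Traces in which "check" never occurs.
without_check = filter_by_activity(log, "check", keep_if_present=False)

print(sorted(with_check))
print(sorted(without_check))
```

Note that the filter keeps or drops whole traces; it never removes individual events from a trace.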
Please explain why you believe your choice is a good filtering method, and describe its effect (for instance, how many traces and/or events are removed?). Also consider the implications of your filtering: which conclusions can you still draw, and which ones can you no longer draw?
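One way to quantify the effect of a filter is to compare the number of traces and events before and after applying it. The sketch below does this in plain Python for a filter that drops incomplete cases. It is an illustration only, not ProM code: the toy log, the choice of "pay" and "reject" as valid end activities, and the helpers `log_size` and `keep_completed` are assumptions.

```python
# Toy event log (hypothetical data): case id -> ordered list of activity names.
log = {
    "case1": ["register", "check", "pay"],
    "case2": ["register", "check"],            # not completed yet
    "case3": ["register", "check", "reject"],
    "case4": ["register"],                     # not completed yet
}


def log_size(log):
    """Return (number of traces, number of events) of a log."""
    return len(log), sum(len(trace) for trace in log.values())


def keep_completed(log, end_activities):
    """Keep only traces whose last event is one of the given end activities."""
    return {
        case: trace
        for case, trace in log.items()
        if trace and trace[-1] in end_activities
    }


# Assumption: a case counts as completed when it ends with "pay" or "reject".
filtered = keep_completed(log, {"pay", "reject"})

traces_before, events_before = log_size(log)
traces_after, events_after = log_size(filtered)
removed_traces = traces_before - traces_after
removed_events = events_before - events_after

print(f"traces: {traces_before} -> {traces_after} ({removed_traces} removed)")
print(f"events: {events_before} -> {events_after} ({removed_events} removed)")
```

In this toy log the filter removes half of the traces but only a third of the events, because the incomplete cases are short. After such a filter, any conclusion about running or abandoned cases can no longer be drawn from the remaining data.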