discuss the alpha algorithm for process mining and its limitations
discuss its improved version called alpha plus
create event logs from existent Petri nets
explore different software for process mining that implements the alpha plus algorithm
check whether the Petri net generated from the algorithm is a workflow net through reachability graph

In this work the following software is used:

EMiT - a tool that implements the Alpha Plus algorithm
PIPE2 for constructing reachability graphs

This work is organized in the following way: first we briefly introduce the important concepts; then discuss the Alpha algorithms and its drawbacks; after that we introduce the improved version of this algorithm called Alpha Plus. In the third section we describe a tool that implements the Alpha Plus algorithm called EMiT. In the fourth section we describe how we generate logs from existing Petri Next. The fifth section is devoted to a tool for Petri Nets that also can generate reachability graph called PIPE2. And finally, the last section illustrates all discussed algorithms and software with an example.

Definitions

Petri net is a graphical and mathematical notation for describing and analysing concurrent systems. It consists of places (circles) and transitions (squares): places serve as input and output for transitions, and transitions represent actions. Each place can contain tokens, and a transition can fire if there is a token in all its input places. When firing, it consumes one token from each input place and puts one token to each output place.

A workflow net is a special form of a Petri net that is suitable for expressing workflows. It has a clear start and a clear end that are expressed with dedicated input and output places. Every other node in a workflow net should be on the path from input to output.

Reachability graph represents all the state changes of a Petri net by nodes and labeled arcs. The nodes correspond to the markings of Petri net and the labeled arcs are the transitions change one state from the other state. Reachability graph helps verify Petri net is workflow net or not.

The following is a simple example of a Petri net that is also a workflow net:

Consider the initial place as P0, intermediate as P1 and end place as P2.

For generating a reachability graph, our marking will correspond to {P0,P1,P2}

Here is our reachability graph, as we have one token at P0, we have initial marking as {1,0,0}

{1,0,0} →A fires→ {0,1,0} →C→ {0,0,1} similarly,

{1,0,0} →B→ {0,1,0}→C → {0,0,1}

The tool we use is PIPE2^[1], generates reachability graph as following -

Where S0, S1 and S2 are states (or markings) contain three places which are P0, P1 and P2. It provides details for each state and it includes - Marking, Edges From (Transition that lead to S1) and Edges To (Transition that will lead from S1).

Process Mining or Process Discovery is a technique for extracting workflow nets from logs. An event log consists of log traces - sequences of event in the process that happened.

In this work we do not provide the formal mathematical definitions and instead refer to (Aalst, 2011).

The Alpha Algorithm

α algorithm (Aalst, Weijters, Maruster, 2004) is one of the first Process Mining algorithm that discovers Workflow Nets from logs. It consists of three phases: (1) pre-processing (inferring relations between the transitions), (2) processing (execution of the alpha algorithm) and (3) post-processing.

However there is one important caveat: implicit places. A place is implicit if adding or removing it does not cause the behavior of a workflow.

The following workflow does not contain any implicit places:

But if we connect a and d with a new place - this place will be implicit:

These places cannot be seen from the logs since they have no affect on the behavior and typically mining algorithm cannot find such places. Therefore we will not consider such places while rediscovering workflow nets.

Complete Log

Another important notion for the alpha algorithm is log completeness. A log of a workflow net N is complete if it contains all other possible logs of N and it contains all transitions t from this N.

For example, for the following net the complete log is [abcd, acbd, ef] since it covers all possible traces of this net.

Relations

In order to discover a workflow net from logs, we need to establish the ordering between the transitions of this workflow. These relations will later be used in order to find places and connection between the transitions and these places.

We define the following relations between transitions in the log:

direct succession x > y. We say x > y if we see in logs sub-traces ...xy...
causality x→y. We say x → y ⟺ x > y ∧ y ≯ x - if there are traces ...xy... and no traces ...yx.... This relation may mean that we will need to put a place between x and y.
parallel x || y. We say x || y ⟺ x > y ∧ y > x - if we can see both ...xy... and ...yx.... In this case we cannot put a place for such x and y - if we placed, we'd impose some order on them. This is symmetric relation: a || b → b || a.
unrelated x # y. We say x # y ⟺ x ≯ y ∧ y ≯ x - when there are no traces ...xy... nor ...yx.... This is also a symmetric relation: x # y → y # x

For example, consider the following workflow net:

From the log is [abcd, acbd, ef] we can infer the following

a > b, a > c, b > c, b > d, c > b, c > d, e > f
a → b, a → c, b → d, c → d, e → f
b || c

α Algorithm

With these relations we define the α algorithm as follows.

α(L):

extract all transition names from L to set T
let TI be the set of all initial transitions and TO the set of all end transitions
find all pairs of sets (A,B) such that

tA ∈ A should be connected to all tB ∈ B via some place p
∀a ∈ A and ∀b ∈ B holds that a → b
∀a1,a2 ∈ A: a1 # a2 and ∀b1,b2∈ B: b1 # b2

in this example: a # b, c # d, a → c, a → d, b → c, b → d
in other words: all transitions from A put tokens to the place p, and transitions of B take tokens from p

once found all such sets, we retain only the maximal ones

a maximal set contains the maximal possible number of elements that can be connected via single place

for each such pair (A, B) we connect all elements from A with all elements from B with one single place p(A,B)
then we also connect appropriate transitions with the input and output places

Limitations

There are some limitations of the α algorithm. First of all, it cannot handle short loops of length one and two. Also it cannot find long-term dependencies. We will discuss these limitation in more details.

Loops of Length One

If there are short loops of length one in a workflow net, α cannot re-discover them.

We will illustrate this problem with an example. Suppose we have a log [ac, abc, abbc, abbbc], which was generated by this net:

But from this log the α algorithm generated a model which is different:

The reason is that we cannot see the step for finding pairs of sets A and B. Here we want to have b ∈ A and b ∈ B, but it cannot happen because b ¬ # b (¬ means not).

Loops of Length Two

Another limitation of the α algorithm is that it cannot rediscover loops of length two. We also illustrate this problem with an example.

Suppose we have a model that generates the log [abd, abcbd, abcbcbd]. But here is what α finds:

The reason for that is that we think that b || c, since b > c and c > b, but it's not the case in this situation.

Non-Local Dependencies

And the last problem of this algorithm discussed here is dealing with non-local dependencies. Such dependencies are usually used to represent some process constraints.

For instance, consider the workflow net that generated log [acd,bce]

Process constraints that can be represented in such a way:

If I came to work by car (p1), I will leave by car;
If I came to work by train (p2), I will leave by train.

The blue places represent non-local dependencies.

There is a problem with them: these constraints cannot be captured by the α algorithm and the reason is that they are not visible in the logs. It's actually the problem of many Process Mining algorithms, not just α.

We see that there are clear limitations of this algorithm, but some of them can be dealt with, and the improved version of the alpha algorithm, called the alpha plus algorithm, specifically handles the issue with short loops. However it doesn’t improve the algorithm in all other aspects. We discuss this algorithm in the following section.

The Alpha Plus Algorithm

α+ algorithm, presented in (Medeiros, Dongen et al 04), is an extension of α algorithm where we are able to address the issues of loops, discussed below.

Loops of Length Two

First, we discuss how to handle loops of length two. Recall the example of a workflow net that cannot be handled by the alpha algorithm:

We have identified that the algorithm can’t identify loops of length two due to incorrect assumption b || c: the logs shows that b > c and c > b, therefore we cannot say b → c based only on the previously defined relations. Thus we need to define new relations that are able to capture such behavior.

For that we firstly introduce another notion of log completeness. A log for a workflow net is loop-complete if (1) it is complete and (2) there's enough information to detect loops of length two (i.e. we are able to see sequences of ...t1t2t1... s.t. t1 ≠ t2 in the logs).

For example, for the following workflow the loop-complete log should contain one or more traces with ...cdc... and ...dcd....

New Ordering Relations

To correctly capture loops of length two and distinguish the relations a → b and a || b, we re-define them and introduce new relations.

In the same way as previously we define direct succession and unrelated relations:

a > b ⟺ we see a trace ...ab...
a # b ⟺ a ≯ b ∧ b ≯ a

Also we introduce new relations:

a △ b ⟺ there is a subsequence ...aba... in the logs
a ♢ b ⟺ there are sequences ...aba... and ...bab...

And finally with new relation we re-define causality and parallel

a → b ⟺ a > b ∧ (b ≯ a ∨ a ♢ b): this way it is now possible correctly identify the follow relation when there's a loop of length two.
a || b ⟺ a > b ∧ b > a ∧ a ¬♢ b: in this case by adding the last condition we don't misidentify the parallel relation.

However, there is one caveat. Short loops of length one also can produce such sequences that resembles sequences produced by short loops of length two.

Consider this workflow:

It is clear that this workflow can produce sequences such as ...dcd... and ...cdc....

Therefore at this step we assume that the net is one-loop-free, i.e. it does not contain loops of length one. If there are such loops in a net, we can turn it into one-loop-free by removing all transitions that create these loops. In the following section we will discuss why it is possible and how to handle these short loops.

Also note that for one-loop-free workflow nets a △ b → b △ a.

Loops of Length One

In this section we will address the problem of handling loops of length one. A transition is one-loop if it participates in a loop of length one.

An one-loop transition is always connected to a single place (assuming that we don’t have implicit places) and this cannot be the input place nor the output place: otherwise such a network would not be a workflow net.

(a) one length transition connected to output

(b) one length transition connected to input

Since they are always connected to one place, we can safely remove such a transition from a workflow net: it will clearly not affect all other transitions. What is more, if a workflow net was sound before removal, it will remain sound afterwards. Additionally, removal of such transitions will not change the overall behavior of a workflow net. This statement can be proved by checking the reachability graph of the net with these transitions and without them, and in both cases the states of the reachability graphs will be the same.

Consider the following Petri Net:

Let us create a reachability graph for it. Assume p1 is the initial place, p2 is the intermediate and p3 is the final place.

[p1] → b fires → [p2] → a fires → [p2] → a fires → [p2] → c fires → [p3]

As we can see firing a has no effect on marking with a loop of length one.

Let us create a reachability graph for same net without the loop of a transition, it will be as follows:

[p1] → b fires → [p2] → c fires → [p3]


(a) reachability graph before removing the transition a	(b) reachability graph after removing the transition a

So, its clear, we have same set of markings, i.e., [p1] → [p2] → [p3] without the loop.

Therefore it is possible to remove such transitions during the pre-processing stage of the algorithm and return them back during the post-processing stage.

To determine the place p to which a one-loop transition t should be connected, we find the transitions a and b s.t.

t > a ∧ a ≯ t : transitions a that are followed by t, but don't follow t themselves (p should be an input place for a and output place for t)
b > t ∧ t ≯ b: transitions b that follow t, but are not followed by t (p should be an output place for b and input place for t)

α+ Algorithm

α+(W):

let T be all transitions found in the log W
identify all one-loop transitions and put them into set L1L
let T′ be all non-one-loop transitions: T′ ← T − L1L
let FL1L be the set of all arcs to transitions from L1L
it consists of:

all transitions a that happen before t: a ∈ T′ s.t. a > t
all transitions b that happen after t: b ∈ T′ s.t. t > b

now remove all occurrences of transitions t ∈ L1L from the log W, let the result be W−L1L
run the α algorithm on W−L1L: α(W−L1L)
reconnect the one-loop transitions back: add all the transitions and from L1L and arcs from FL1L to the transitions and arcs discovered by α

Examples

Examples of petri-nets that α+ can re-discover, but α cannot:

Following section shows that this is indeed true by testing these workflow nets with EMiT.

EMiT: A process mining tool

EMiT, introduced by B.F. van Dongen and , W.M.P. van der Aalst (Dongen, Boudewijn, Aalst 04), is a tool which we identified to implement α+ algorithm. Enhanced Mining Tool (EMiT) is a windows based tool to rediscover workflow processes. The process of constructing a Petri net from an event log is divided into three sub processes. First to read the log file, and then convert the log file into relations. Finally, it generates Petri net from the relations. Let us go through them using a simple example.

Operating Procedure

1. Converting log to XML

As we are going to concentrate on Petri net, select Pnet+ in conversion module; open the respective log file (.dtd) for the Petri net we need .

2. Building relations using XML -

After log conversion we build a relation from the XML. Export the relation.

3. Using relation to construct the Petri net -

Now we open the relation and construct Petri net from it.

Then we have an option to export the constructed Petri net in tab 3, the options that are available through export are:

Low detail dot (can be used in AT&T Graphviz tool),
High detail dot (can be used in AT&T Graphviz tool),
Woflan (can be used in Prom)

Out of these three, we will later concentrate on Woflan (*.tpn) format, this format can be used as an input to Prom, and we can use few packages to perform analysis or recreate with modifications.

Let us explain in detail the mining process of EMiT, as α algorithm, this one is also divided in three sub processes - pre-processing, processing, post-processing.

EMiT Detail

1. Pre-processing

In the pre-processing phase, the log is read into EMiT and the log based ordering relations are inferred. In order to build these relations, the assumption is made that the transitions in the log are totally ordered. However, several refinements can be made.

The format of the log is such that, for a transition A, it can capture when it was scheduled, when it started when it was completed or suspended or aborted or withdrawn. We will be targeting only “complete” part for a transition, but it’s a feature of EMiT to read the various stages of a given transition and generate the respective log. We have to make sure that log has all the details with the timing information.

The parallel relations can be inferred based on time-information present in the log. The two transitions A and B are considered to be in parallel if and only if in the log, A is directly followed by B at least once and B is directly followed by A at least once. However, here the assumption is made that A and B are atomic transitions, while in real situations transitions span a certain time interval. The beginning and the end of such a transition however can be considered atomic. EMiT is capable of inferring parallelism for all events of a certain transition. In this phase relations are built as well as loops of length one or two are identified.

2. Processing

In the next phase, the core α-algorithm is called. However, since the α-algorithm cannot deal with loops of length one and two, some refinements are made. First of all, all the transitions that are identified as a loop of length one are taken out of the set of transitions . These loops are then plugged back in later in the processing phase. Second, for all transitions that are identified as loops of length two, the relations are changed. If two transitions are identified as being a loop of length two together, say transition A and transition B, then a parallel relation between the two exists, A || B. This relation is removed and two new relations are added in the following way: A → B and B → A. When the Petri net is built, the loops of lengths one are added to it. And now EMiT is ready for post processing phase.

3. Post-processing

In this phase, the original log is loaded into the program again. Using the original log and the Petri net generated in the processing step, additional information can be derived. Besides, the relations inferred in the pre-processing phase can be altered. To calculate additional information for the Petri net, the original log is used. So basically in post-processing phase we derive additional information from the log and petri-net, output of processing phase, to add extra meaning to the net. We have limited the EMiT use to our project scope of regenerating the net from log, hence we did not dive into much details of timing information and other features of EMiT.

The Limitations Solved

Now let us show how EMiT tackles the problem of α algorithm. We have created the log for the respective problems, in EMiT readable format, and then performed the necessary steps to generate the petri-net using EMiT.

We consider the traces of net and have not taken into account the number of times a single trace is getting executed.

1. Loop of length 1 - Consider the log [ < a d d d b >, < a d d c c b >, < a d c d c b >, < a c c b >, < a b > ], after loading this log to EMiT, we get following output -

Which is the desired output, and cannot be produced by α algorithm.

2. Loop of length 2 - Consider the log [ < a b c b c b c b d >, < a b d > ] and the output -

3. Loop of length 1 and length 2 - Consider the log [ < a b b b c d c d b b c e e d b b c f >, < a b c d b c e d c d c e f >, < a c d c d c f >, < a c d c d c e e f >, < a c f > ] and the output -

This shows that EMiT indeed eliminates the limitations of α algorithm.

EMiT and Prom

When we export the generated petri-net, we have main-process.tpn file. Let us explore this in Prom.

1. We import the EMiT generated .tpn file

2. We click use resources (play button)

3. The packages for which this file can act as an input are -

a. Analyze Model Using Uma; allowed us to check the soundness of the net

b. Analyze Model Using LoLA; allows us to check boundedness, deadlock in net

c. Analyze Structural Property of Petri net

d. Configure Visibility of Transitions; allows us to remove (make invisible) any transition

e. Create Accepting Petri net

f. Create Final Marking

g. Create Initial Marking

h. Create/Edit Petri net with Data

i. Synthesize Partner using Wendy

j. Analyze with Woflan; allowed us to check the soundness of workflow net

h. Convert Petri net to CPF model

k. Convert Petri net to Flexible model

l. Reduce All Transitions

k. Remove edgepoints; replaced the loops with double arrowed arc

m. Unpack aggregate type.

This way we were able to analyse on our Petri net created through EMiT in Prom, not all of the packages, few packages such as Create Accepting Petri net, recreated the net with initial and final markings.

Generating an Event Log

Since in this work we want to assess the quality of a mining algorithm, we need to set a baseline for our evaluation. Generating the event log from a known Petri net and then comparing them can be such a measure that allows us to see visually how well an algorithm works.

For that purpose we have decided to evaluate the quality based on four workflow Petri nets, created during the course of Business Process Management as our first assignment. They all implement some real-life business case and should be able to serve as a good source of event traces.

There have been studies that try to address this problem. In [Medeiros, Gunther 05] they use the CPN tools^[2] to generate logs based on a Petri net, but unfortunately we were not able to use their approach, and the foremost reason is that the prepared Petri nets could not be opened with this editor.

Instead, we implement the log extraction in a semi-manual manner:

First, we rename all our transitions to capital letters (A, B, C, …)
Second, we manually play out a Petri net to capture its behavior in a graph
Then we traverse this graph to collect all traces it produces
And finally with a simple script we generate a log that is opened with EMiT

Converting to Graph

For sequential executing conversion to a graph is straightforward: if one transition A is followed by another transition B, for correspondings nodes in the graph put an edge from A to B.

(a) part of a Petri net with sequential flow

(b) resulting graph

However the translation is more difficult for transitions that are executed in parallel because graphs are not well-suited to represent such behavior. To overcome this problem we create several nodes that list the sequence in which two or more parallel transitions are executed. This approach may become cumbersome as the number of possible orderings grows, but in our situation we consider it reasonable.

(a) part of a Petri net with parallel flow

(b) resulting graph

Traversing a Graph

For graph traversal we use a simple modification of Breadth-First Search^[3] in which we allow to visit a node no more than two times. All nodes that are executed before the final place in a Petri net we mark as “final” in our graph and output only the traces that have reached these final nodes.

In this example “X” is a final node and we show it with a double circle.

These generated sequences with small additional transformation can be loaded by EMiT.

Please refer to Case Study for a bigger example.

We understand that a different approach should be preferred for generating logs from Petri nets with many parallel transitions. One of such approaches could be generating a reachability graph from a Petri net and then traversing it in similar manner, but instead of keeping a list of visited nodes, record labels of visited edges.

Now we have the Petri net generated from log. Therefore, we are going to verify whether this graph is workflow net by reachability graph.

PIPE2 : Verifying the Workflow Net Generated from Log

PIPE2 provides some functions to model, verify the Petri net originally created by Imperial College London. Petri net have many variations, for example, free choice net, marked graph, state machine, and this article will focus on workflow net which models the business process. If the Petri net generated from the log has created, however, is it workflow net? Does it have the properties of workflow net? PIPE (Platform Independent Petri net Editor) provides reachability graph of the Petri net and it helps verify workflow net. Thanks to the reachability graph, we can check the properties of workflow net.

Operating Procedure

Transforming Petri net into PNML format

In the courses, the improper completion and live lock cases introduced the examples of bad workflow net. These examples will be used for the demonstration of PIPE2.

Those cases are in pflow format which is not compatible with PIPE2. Therefore, first thing is to transform the pflow format into pnml format (XML version of Petri net). PNeditor can deal with this format problem by exporting the diagram and then selecting pnml format in PNeditor interface.

export the diagram

transform the pflow format into pnml format

Interface

PIPE2 has many functions to analyze the Petri net. Classification can determine the model whether it is the marked graph or state machine. Incidence Analysis can generate the rule matrix. Invariant Analysis can analyze the invariants. Siphons and Traps can identify the transitions are continually producing tokens or consuming tokens. To analyze the workflow net only Reachability Graph is introducing, and ignoring the other functions. In addition, the firing sequence can be animated showed by following producer and consumer example through Toggle Animations Mode.

toggle animation mode

The red transitions show that it can be fired, and the fire sequence recorded on the left side.

firing sequence example

Reachability Graph

Workflow Net

The improper completion is a bad workflow net introduced in the class. With PIPE2, the reachability graph explicitly shows the feature.

improper completion

The reachability graph of the workflow net must have one start marking and one end marking. Therefore, the following graph is improper completion because of two end markings. The tangible state and vanishing state are used for stochastic Petri net so it isn't described here.

reachability graph for improper completion

The other case is livelock. It can be seen through reachability graph because of no explicit end markings.

Petri net with livelock and its reachability graph

After seeing above cases, the reachability graph of workflow net clearly will have following property. A start marking can go along the firing sequence to arrive an end marking.

Rule Matrix (Incident Matrix)

Markings and the sequence of markings by firing sequence of transitions are two goals to generate reachability graph. The rule matrix (or incident matrix) can represent Petri net in linear algebra to get the marking, presented by T.Murata[T.Murata, 89]. For example, the rule matrix of improper completion case can be expressed through Incidence & Marking function in PIPE2. The meaning of the rule matrix is, if “select payment method” transition fires, p1 will lose one token and p2 will get one token from t1, and the other places won’t change because they don’t have the arcs linked to the “select payment method” transition.

rule matrix

The rule matrix generates the reachability graph through following formula.

Next marking = Initial marking + Firing sequence[i] * Rule matrix, i belongs to firing sequence

To present briefly with an example, the names of places and transitions are aliased to P and T respectively. Initial Marking :[p1, 0, 0, 0, 0, 0, 0, 0 ]

Rule matrix :

	t1	t2	t3	t4	t5	t6	t7	t8	t9
p1	-1	0	0	0	0	0	0	0	0
p2	1	-1	-1	0	0	0	0	0	0
p3	0	0	0	1	1	-1	-1	0	0
p4	0	1	0	-1	0	0	0	0	0
p5	0	0	1	0	-1	0	0	0	0
p6	0	0	0	0	0	0	1	-1	-1
p7	0	0	0	0	0	1	0	1	-1
p8	0	0	0	1	1	0	0	-1	0

One example of firing sequence:

t1	1
t2	0
t3	0
t4	0
t5	0
t6	0
t7	0
t8	0
t9	0

Rule matrix Firing sequence = [-1,1,0,0,0,0,0,0], so the next marking is [0,p2,0,0,0,0,0,0 ].

Program can apply Breadth-First Search or Depth-First Search for all firing sequence, and then build reachability graph.

Case Study: A Housing Agency Workflow Net

In this section we illustrate all the discussed concepts and software with an example. The model we use is a workflow net of a housing agency where customers asks to find them an apartment to rent, and while they live there the apartment is checked periodically. This model is big enough and has interesting features that we want to check. Namely, it contains a long loop, parallel processing and many short loops of length two.

(a) a Petri net

(b) constructed graph (firing sequence of transitions)

The first step of processing the Petri Net is to construct a graph that can be used for generating firing sequences. These firing sequences are later used as log traces and fed into EMiT.

For this Petri net we generated 1234 log traces. For example,

ABCDX
ABCDFBCEHGJRTYZ
ABCEHGJKLOPKNRTYZ
ABCEGHJRTWSKNKLOQZ
ABCEHGJKLMLMNKNRSRTWTYZ
and so on

After applying the Alpha+ algorithm with EMiT the following workflow net was obtained:

This net was tested by replaying the generated logs and EMiT reported no problems. Note that EMiT correctly mined all the short loops of length two that were present in the original workflow.

The rediscovered Petri Net was then tested with PIPE2 by space testing analysis. We can see that it is a sound workflow net. Besides, the reachability graphs of the original Petri net and of the Petri net generated from Alpha+ algorithm are identical (up to renaming edges and nodes).

The reachability graph of the discovered workflow net

The reachability graph of the original workflow net

Analysis using PIPE2

Drawbacks of EMiT and PIPE2

EMiT and PIPE2 each has one drawback for format problem. As described before, this article uses EMiT to generate Petri net from log. But EMiT can’t read the following format of log and it asks more information of log file.

ABCDX

ABCDFBCDX

ABCEHGJRTYZ

ABCEGHJRTYZ

ABCEHGJKLOQZ

ABCEGHJKLOQZ

Therefore, we transform the above format of log into following format of log to let EMiT can read the log.

A_schedule;0;A_start;0;A_complete;0;B_schedule;0;B_start;0;B_complete;0;C_schedule;0;C_start;0; C_complete;0;D_schedule;0;D_start;0;D_complete;0;X_schedule;0;X_start;0;X_complete;0;

A_schedule;0;A_start;0;A_complete;0;B_schedule;0;B_start;0;B_complete;0;C_schedule;0;C_start;0; C_complete;0;D_schedule;0;D_start;0;D_complete;0;F_schedule;0;F_start;0;F_complete;0;B_schedule;0; B_start;0;B_complete;0;C_schedule;0;C_start;0;C_complete;0;D_schedule;0;D_start;0;D_complete;0; X_schedule;0;X_start;0;X_complete;0;

A_schedule;0;A_start;0;A_complete;0;B_schedule;0;B_start;0;B_complete;0;C_schedule;0;C_start;0; C_complete;0;E_schedule;0;E_start;0;E_complete;0;H_schedule;0;H_start;0;H_complete;0;G_schedule;0; G_start;0;G_complete;0;J_schedule;0;J_start;0;J_complete;0;R_schedule;0;R_start;0;R_complete;0; T_schedule;0;T_start;0;T_complete;0;Y_schedule;0;Y_start;0;Y_complete;0;Z_schedule;0;Z_start;0; Z_complete;0;

The other drawback is we could not read the PNML output generated from ProM in PIPE2. We have tried to manually change some parts of XML generated from ProM, but PIPE2 was unable read the same. Thus, to read a workflow net generated from EMiT we had to redraw the diagram.

Conclusions

The goals of this work are to learn about the Alpha plus algorithm and explore a tool that implements it.

To achieve this goal we reviewed the alpha algorithm and discussed its limitations such as being unable to mine short loops; then we described its improved version that tackles these limitation and installed and explored a tool called EMiT that implements this algorithm. We find this tool easy to use, especially compared to big process mining tools such as ProM. Later we found a way to generate logs from an existing Petri net in a semi-manual manner so they can be read by EMiT.

Then, we installed and explored a tool called PIPE2 to verify the Petri net generated from Alpha plus algorithm. Based on reachability graph, we know this Petri net is workflow net and the output of Alpha plus algorithm is workflow net. Thus, Alpha plus algorithm removes some transitions at first and adds them at final step and also defines new relationship should be logical correct because this algorithm can keep soundness of the workflow net and the Petri net of those are the same.

Finally we illustrated how the Alpha Plus algorithm, as implemented in EMiT, can rediscover an existent model with some interesting features, such as loops of length two. These features cannot be found by the Alpha Algorithm, but were successfully discovered with its extended version. Besides, the reachability graph of the model shows the rediscovered model is workflow net, and it ensures the correctness of Alpha Plus algorithm.

References

Murata, Tadao. "Petri nets: Properties, analysis and applications." Proceedings of the IEEE 77.4 (1989): 541-580.
Van der Aalst, Wil MP, and Wil van der Aalst. Process mining: discovery, conformance and enhancement of business processes. Springer, 2011.
De Medeiros, AK Alves, and Christian W. Günther. "Process mining: Using CPN tools to create test logs for mining algorithms." Proceedings of the sixth workshop on the practical use of coloured Petri nets and CPN tools (CPN 2005). Vol. 576. 2005. [pdf].
A. de Medeiros, A K and van Dongen, B F and van der Aalst, W M P and Weijters, A J M M (2004). "Process mining: extending the α-algorithm to mine short loops". [pdf].
van Dongen, Boudewijn F., and Wil MP van der Aalst. "EMiT: A process mining tool." Applications and Theory of Petri Nets 2004. Springer Berlin Heidelberg, 2004. 454-463. [pdf].
Business Process Management course, ULB, the offering of autumn 2013 by Prof. T. Calders.
Weske, Mathias. Business process management: concepts, languages, architectures. Springer, 2012.
W.M.P. van der Aalst, A.J.M.M. Weijters, and L. Maruster. Workflow Mining: Discovering Process Models from Event Logs. IEEE Transactions on Knowledge and Data Engineering, 16(9):1128–1142, 2004

[1] "Platform Independent Petri net Editor 2." 2004. 22 Jan. 2014 <http://pipe2.sourceforge.net/>

[2] http://cpntools.org/

[3] http://en.wikipedia.org/wiki/Breadth-first_search

	t1	t2	t3	t4	t5	t6	t7	t8	t9
p1	-1	0	0	0	0	0	0	0	0
p2	1	-1	-1	0	0	0	0	0	0
p3	0	0	0	1	1	-1	-1	0	0
p4	0	1	0	-1	0	0	0	0	0
p5	0	0	1	0	-1	0	0	0	0
p6	0	0	0	0	0	0	1	-1	-1
p7	0	0	0	0	0	1	0	1	-1
p8	0	0	0	1	1	0	0	-1	0

	t1	t2	t3	t4	t5	t6	t7	t8	t9
p1	-1	0	0	0	0	0	0	0	0
p2	1	-1	-1	0	0	0	0	0	0
p3	0	0	0	1	1	-1	-1	0	0
p4	0	1	0	-1	0	0	0	0	0
p5	0	0	1	0	-1	0	0	0	0
p6	0	0	0	0	0	0	1	-1	-1
p7	0	0	0	0	0	1	0	1	-1
p8	0	0	0	1	1	0	0	-1	0

	t1	t2	t3	t4	t5	t6	t7	t8	t9
p1	-1	0	0	0	0	0	0	0	0
p2	1	-1	-1	0	0	0	0	0	0
p3	0	0	0	1	1	-1	-1	0	0
p4	0	1	0	-1	0	0	0	0	0
p5	0	0	1	0	-1	0	0	0	0
p6	0	0	0	0	0	0	1	-1	-1
p7	0	0	0	0	0	1	0	1	-1
p8	0	0	0	1	1	0	0	-1	0