There are two ways to prepare data for process mining:

Preparation type, with advantage and disadvantage:

1. Single eventlog
  • Advantage: Fast solution by sticking to one table
  • Disadvantage: Unnecessary replication of data

2. Activities table, cases table & additional information tables
  • Advantage: Minimizing data replication and improving performance; some tables might already be at hand
  • Disadvantage: Additional work with establishing foreign key relations between tables

1.      Single eventlog

It is possible to prepare a single eventlog and add as many columns as you wish in order to include additional information. However, as the number of columns grows and case-specific data is replicated for each activity in the eventlog, performance might suffer. On the other hand, this is the easiest way of preparing data.

1.1. Minimal eventlog

In order to set up an eventlog for SAP Process Mining by Celonis 4, you need at least the following information in three columns:

  • Case ID – the numeric identifier of a case
  • Activities – the specification of actions taken
  • Timestamps – the precise date and/or time of every action taken

A Sorting column is optional.

The minimal eventlog consisting of three columns.
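
As an illustration of the three required columns, here is a minimal sketch in Python. The column names (`case_id`, `activity`, `timestamp`), the helper `missing_columns`, and the sample rows are assumptions made for this example; the software does not prescribe them.

```python
from datetime import datetime

# Hypothetical column names for the three required columns.
REQUIRED_COLUMNS = ("case_id", "activity", "timestamp")

# Illustrative rows only, not real data.
eventlog = [
    {"case_id": 1, "activity": "Scan invoice", "timestamp": datetime(2020, 1, 6, 9, 15)},
    {"case_id": 1, "activity": "Book invoice", "timestamp": datetime(2020, 1, 7, 11, 0)},
    {"case_id": 1, "activity": "Pay invoice", "timestamp": datetime(2020, 1, 20, 16, 30)},
    {"case_id": 2, "activity": "Scan invoice", "timestamp": datetime(2020, 1, 8, 8, 45)},
]


def missing_columns(rows, required=REQUIRED_COLUMNS):
    """Return the required columns that are absent or empty in at least one row."""
    missing = set()
    for row in rows:
        for column in required:
            if row.get(column) is None:
                missing.add(column)
    return sorted(missing)
```

Running a check like `missing_columns(eventlog)` before loading the file can catch events that lack a case ID, an activity, or a timestamp.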


Case ID

The definition of a case always depends on the process. The chosen definition should suit the purpose of the analysis. Some examples in this context:

  • In an IT service desk, the journey of a ticket can represent one case.
  • In a product assembly line, all the steps of production for one item/product can represent one case.
  • In a purchasing process, all actions of handling an order item can represent one case.

A case ID is a unique identifier that is assigned only to the events belonging to one case.

 

Activities

Each case of a process consists of activities that name the steps which happen within the process. For instance, some activities in an accounts payable process would be:

  • Scan invoice
  • Book invoice
  • Pay invoice

 

Timestamps

A timestamp specifies the exact date (and time) when an activity was performed. Each activity in the eventlog must have a timestamp in order to visualize the process.

 

Sorting

The sorting is an integer. Whenever two events have the exact same timestamp, the sorting makes the activity with the lower number appear first in the process. Hence, you should number the activities according to the expected procedure. A sorting is recommended, for instance, if the data only provides dates without an exact time.
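
The tie-breaking effect of a sorting column can be sketched in Python as follows. The mapping `SORTING`, the column names, and the sample events are assumptions for illustration; the sketch only mimics the ordering rule described above and is not the software's implementation.

```python
from datetime import date

# Hypothetical expected order of the activities in the process.
SORTING = {"Scan invoice": 1, "Book invoice": 2, "Pay invoice": 3}

# Illustrative events with date-only timestamps; two events share the same date.
events = [
    {"case_id": 1, "activity": "Pay invoice", "timestamp": date(2020, 1, 7)},
    {"case_id": 1, "activity": "Book invoice", "timestamp": date(2020, 1, 6)},
    {"case_id": 1, "activity": "Scan invoice", "timestamp": date(2020, 1, 6)},
]

# Attach the sorting integer to every event.
for event in events:
    event["sorting"] = SORTING[event["activity"]]

# Order by case, then timestamp; the sorting integer breaks ties on equal timestamps.
ordered = sorted(events, key=lambda e: (e["case_id"], e["timestamp"], e["sorting"]))
ordered_activities = [e["activity"] for e in ordered]
```

Without the sorting value, the order of "Scan invoice" and "Book invoice" would be undetermined because both carry the same date.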

1.2.  Additional information

When analyzing processes, topics of analysis are not restricted to the process flow itself. Therefore, additional information can be useful. In the case of a single eventlog, this information has to be attached directly in additional columns.

 

Additional information columns in an eventlog file. The data in the last three columns is replicated due to the one-file eventlog structure.


2.      Activities table, cases table & additional information tables

Most often, additional information (such as materials, countries, currencies, etc.) will not differ within one case. Therefore, it is reasonable to split the eventlog into an activities table and a cases table.

2.1. Activities table

The activity table usually has the same structure as a minimal eventlog (potentially including a sorting column); see section 1.1 for details.

Activity-specific information can also be added to the activity table, such as the user who performed an activity in the IT system.

 

Activity table consisting of minimal eventlog, sorting and one activity-specific information column.

 

2.2.  Case table

The case table may contain case-specific information and is linked to the activity table via a foreign key relation. The Case ID must be the primary key of the case table (i.e. each Case ID appears only once in the case table). Hence, the information is stored only once per case, which prevents unnecessary data replication.

Case table with three case-specific columns.
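
If the data already exists as a single eventlog, the split into an activity table and a case table could be prepared along the following lines. This is a sketch under assumptions: the function `split_eventlog`, the column names, and the sample rows are hypothetical, and the case-specific columns are assumed to hold the same value in every row of a case.

```python
# Illustrative single eventlog with replicated case-specific columns.
single_eventlog = [
    {"case_id": 1, "activity": "Scan invoice", "timestamp": "2020-01-06", "country": "DE", "currency": "EUR"},
    {"case_id": 1, "activity": "Book invoice", "timestamp": "2020-01-07", "country": "DE", "currency": "EUR"},
    {"case_id": 1, "activity": "Pay invoice", "timestamp": "2020-01-20", "country": "DE", "currency": "EUR"},
    {"case_id": 2, "activity": "Scan invoice", "timestamp": "2020-01-08", "country": "FR", "currency": "EUR"},
]

# Hypothetical choice of columns that are constant within a case.
CASE_COLUMNS = ("country", "currency")


def split_eventlog(rows, case_columns):
    """Split a single eventlog into an activity table and a case table."""
    activity_table = []
    cases = {}  # case_id -> case-specific values, stored once per case
    for row in rows:
        case_values = {column: row[column] for column in case_columns}
        known = cases.setdefault(row["case_id"], case_values)
        if known != case_values:
            raise ValueError(f"case {row['case_id']} has conflicting case-specific values")
        activity_table.append({k: v for k, v in row.items() if k not in case_columns})
    case_table = [{"case_id": case_id, **values} for case_id, values in cases.items()]
    return activity_table, case_table


activity_table, case_table = split_eventlog(single_eventlog, CASE_COLUMNS)
```

Because the case table is built from a dictionary keyed by case ID, each Case ID appears only once in it, which matches the primary key requirement.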

 

2.3. Linking activities and cases

The activity table and the cases table are linked via the Case ID. The link can be established by clicking the “add foreign key” button in the software.

Establishing a foreign key relation.
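
Before establishing the relation in the software, it can help to verify outside the tool that the two tables actually fit together. The following Python sketch is an assumption-based pre-check, not part of the software: the function `check_foreign_key`, the column name `case_id`, and the sample rows are hypothetical.

```python
from collections import Counter


def check_foreign_key(activity_table, case_table, key="case_id"):
    """Return (duplicates, orphans) for a planned foreign key relation.

    duplicates: key values that appear more than once in the case table
                (these violate the primary key requirement).
    orphans:    key values in the activity table with no row in the case table.
    """
    counts = Counter(row[key] for row in case_table)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    orphans = sorted({row[key] for row in activity_table} - set(counts))
    return duplicates, orphans


# Illustrative tables only, not real data.
activities = [
    {"case_id": 1, "activity": "Scan invoice"},
    {"case_id": 2, "activity": "Scan invoice"},
    {"case_id": 3, "activity": "Scan invoice"},
]
cases = [
    {"case_id": 1, "country": "DE"},
    {"case_id": 2, "country": "FR"},
    {"case_id": 2, "country": "FR"},
]

duplicates, orphans = check_foreign_key(activities, cases)
```

In this sample, case 2 is listed twice in the case table and case 3 has no case table row, so both lists are non-empty and the data should be cleaned before linking.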

 

2.4. Additional information tables

Any further information that is related to the process can be added in additional tables as well. For each table that is supposed to be part of the analysis, a foreign key relation has to be established in the same way as described under section 2.3.

Linking two additional information tables to the case table.

 
