Tuning abap parallel cursor

Today I want to share with you how tuning abap parallel cursor works and when you would need to change your algorithm, because of different data distributions in the inner and outer table.

This post takes off where this one stopped. I highly recommend you reading it first, because here I take that setup and technique knowledge for granted.

different data distribution -> different performance

Here we have a parallel cursor implementation with a read table statement:

And this is our data generation logic:

1k Head datasets and 2,5 Mio Item datasets. In the following measurements I am also including the necessary sort of both tables.

Time taken:          151.398 Microseconds

Items / microsecond: 16,78

This is our baseline. Now I will change the distribution of datasets between the head and the items a bit:

Now we have 100k head datasets and 2,5Mio item datasets. Each head has now only has 25 item datasets.

Time taken:          324.033 Microseconds

Items / microsecond: 7,72

We suffered a runtime increase of over 117%. This is a problem.

Why is it slowing down?

As I changed the ratio of items<->head, I also increased the amount of cursor switches in the algorithm. Now I have to read 100times more often than I did before. This impacts my performance notably, as the read operation is still costly.

If I want to improve this, I have to get rid of my read mechanic somehow…

no read statement

There actually is a parallel cursor implementation without a read involved:

I ran it without the read statement:

Time taken:          215.838 Microseconds

Items / microsecond: 11,58

This implementation has improved my performance by 50%.

As we removed the read statement as a switch mechanic in our algorithm, we now do not receive such a big performance penalty when we switch to a new head dataset.

Is it always faster?

No, not always. Let’s look at our first data pattern:

Time taken:          149.364 Microseconds

Items / microsecond: 16,74

With the usage of our read statement we processed 16,78 Items per microsecond.

So both algorithms have a bottom line where they both deliver the same performance. But the less items a head dataset has, the more an implementation without a read statement makes sense.

I tested both implementations with a lot of different data distributions and I never saw the read statement implementation to ever clearly beat the implementation without it. So if you want to get on the safe side of things, implement a parallel cursor without a read statement.

Take care,