Tuning the ABAP parallel cursor

Today I want to share how tuning the ABAP parallel cursor works and when you need to change your algorithm because of different data distributions in the inner and outer table.

This post takes off where this one stopped. I highly recommend reading it first, because here I take that setup and technique knowledge for granted.

Different data distribution -> different performance

Here we have a parallel cursor implementation with a read table statement:
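The original post showed the code as an image. A minimal sketch of such an implementation, with hypothetical table and field names (lt_head with an id field, lt_item with a head_id field), might look like this:

```abap
" Both tables must be sorted by the join key for the binary search to work.
SORT lt_head BY id.
SORT lt_item BY head_id.

LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).

  " Locate the first item of the current head via binary search.
  READ TABLE lt_item TRANSPORTING NO FIELDS
       WITH KEY head_id = <ls_head>-id
       BINARY SEARCH.
  CHECK sy-subrc = 0.
  DATA(lv_from) = sy-tabix.

  " Process the items sequentially until the key changes.
  LOOP AT lt_item ASSIGNING FIELD-SYMBOL(<ls_item>) FROM lv_from.
    IF <ls_item>-head_id <> <ls_head>-id.
      EXIT.
    ENDIF.
    " ... process <ls_item> here ...
  ENDLOOP.

ENDLOOP.
```

The READ TABLE with BINARY SEARCH positions the cursor at the start of each head's item block instead of scanning the whole item table per head.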

And this is our data generation logic:
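That code was also shown as an image. A sketch of comparable data generation, under the same assumed names and the distribution used below (1,000 heads with 2,500 items each):

```abap
TYPES: BEGIN OF ty_head,
         id TYPE i,
       END OF ty_head,
       BEGIN OF ty_item,
         head_id TYPE i,
         payload TYPE i,
       END OF ty_item.

DATA: lt_head TYPE STANDARD TABLE OF ty_head WITH EMPTY KEY,
      lt_item TYPE STANDARD TABLE OF ty_item WITH EMPTY KEY.

" 1,000 heads with 2,500 items each = 2.5 million item datasets.
DO 1000 TIMES.
  DATA(lv_head_id) = sy-index.
  APPEND VALUE ty_head( id = lv_head_id ) TO lt_head.
  DO 2500 TIMES.
    APPEND VALUE ty_item( head_id = lv_head_id
                          payload = sy-index ) TO lt_item.
  ENDDO.
ENDDO.
```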

1k head datasets and 2.5 million item datasets. The following measurements also include the necessary sort of both tables.

Time taken:          151,398 microseconds

Items / microsecond: 16.78

This is our baseline. Now I will change the distribution of datasets between the head and the items a bit:

Now we have 100k head datasets and 2.5 million item datasets. Each head now has only 25 item datasets.

Time taken:          324,033 microseconds

Items / microsecond: 7.72

We suffered a runtime increase of about 114%. This is a problem.

Why is it slowing down?

As I changed the item-to-head ratio, I also increased the number of cursor switches in the algorithm: with 25 items per head instead of 2,500, I now have to read 100 times more often than before. This impacts performance notably, as the read operation is still costly.

If I want to improve this, I have to get rid of my read mechanism somehow…

No read statement

There actually is a parallel cursor implementation without a read involved:
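This code was shown as an image as well. A hedged sketch of the read-free variant, continuing with the same hypothetical names: instead of searching for each head's first item, the item cursor simply resumes where the previous head stopped.

```abap
" Same precondition: both tables sorted by the join key.
SORT lt_head BY id.
SORT lt_item BY head_id.

DATA(lv_tabix) = 1.

LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).

  " Resume the item loop where the previous head left off -
  " no READ TABLE is needed to find the starting index.
  LOOP AT lt_item ASSIGNING FIELD-SYMBOL(<ls_item>) FROM lv_tabix.
    IF <ls_item>-head_id <> <ls_head>-id.
      " Remember the cursor position for the next head and leave.
      lv_tabix = sy-tabix.
      EXIT.
    ENDIF.
    " ... process <ls_item> here ...
  ENDLOOP.

ENDLOOP.
```

Note that this variant assumes both tables are sorted consistently and every item has a matching head; otherwise the cursor falls out of sync.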

I ran it on the second data distribution (100k head datasets):

Time taken:          215,838 microseconds

Items / microsecond: 11.58

This implementation improved my performance by 50%.

Since we removed the read statement as the switching mechanism in our algorithm, we no longer pay such a big performance penalty when we switch to a new head dataset.

Is it always faster?

No, not always. Let’s look at our first data pattern:

Time taken:          149,364 microseconds

Items / microsecond: 16.74

With the read statement we had processed 16.78 items per microsecond.

So there is a point where both algorithms deliver the same performance. But the fewer items a head dataset has, the more an implementation without a read statement makes sense.

I tested both implementations with a lot of different data distributions, and I never saw the read statement implementation clearly beat the implementation without it. So if you want to be on the safe side, implement a parallel cursor without a read statement.

Take care,


Improving performance before the squeeze

Improving performance only comes up as a relevant topic when things have gotten really bad.

When organisations are in a squeeze situation, the one thing that was disregarded in the past suddenly becomes topic number one.

The first squeeze

Unfortunately a squeeze situation puts you in a very weak negotiating position. And when reality catches up, everybody realizes that performance tuning is not something that is done on a weekend right before go-live. Sometimes these situations are “solved” by throwing a huge amount of expensive hardware at the problem. Combined with collecting the quick wins (which are in essence developer-created fuck-ups, because you cannot analyze and fix a complex design-driven problem in 2 days), the situation can sometimes be brought to a point where the pain is just below the critical point of collapse. This is then called a “rough” and “exciting” or “engaging” go-live.

The resurrection

Years later these applications (breathing imitations of Frankenstein's monster) rise up again to cause yet another “engaging” experience. By this point everyone has adapted to everything; the run department knows that the world is not perfect but has accepted the situation.

But now that application is so slow in production that people call for help and solutions. Unfortunately this is often again a squeeze situation (if it weren't considered urgent by management, it would not have been brought up), so the drama develops again. Experts are invited to fix it “as fast as you can” and new hardware is being ordered (despite nobody ever having analyzed that application in depth). But now things are different, actually worse. You do not have any low-hanging fruit; those were collected by the initial go-live fire department. So we have an application with almost exclusively design-driven performance problems which require an in-depth analysis, without the time to do so. In addition, you do not have a fresh application anymore – years in production have added a ton of changes, workarounds and features. Summed up: piece of cake!

Change your perception of performance

This does not have to be the way things develop, really! I know my view on performance tuning differs from that of my colleagues, but what I also try to communicate is that performance is a currency. A currency you can stack up when it is cheap to do so.

You can save up for unfavourable changes in hardware prices (anyone checked RAM prices recently?), for new features you would like to design and implement, or simply to have a buffer on the operational level that gives your organisation the ability to recover from unexpected errors without appearing in the news. After all, people do not often complain about the horsepower reserves in their car, even if they do not use them every day.

Really good performance tuning results do require effort and the right people, but since you will probably have to do it at some point in time, do it before you actually get into a squeeze situation. Quality work takes time, and you do not have that luxury in a squeeze.


Take care,