Tuning the ABAP parallel cursor

Today I want to share how tuning the ABAP parallel cursor works and when you need to change your algorithm because of different data distributions in the inner and outer table.

This post takes off where this one stopped. I highly recommend reading it first, because here I take that setup and technique knowledge for granted.

Different data distribution -> different performance

Here we have a parallel cursor implementation with a READ TABLE statement:
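For reference, a minimal sketch of this variant; the structures ty_head / ty_item, the tables lt_head / lt_item and the key field head_id are hypothetical stand-ins for the actual setup:

TYPES: BEGIN OF ty_head,
         id TYPE i,
       END OF ty_head,
       BEGIN OF ty_item,
         head_id TYPE i,
         payload TYPE i,
       END OF ty_item.

DATA: lt_head TYPE STANDARD TABLE OF ty_head WITH EMPTY KEY,
      lt_item TYPE STANDARD TABLE OF ty_item WITH EMPTY KEY.

" Parallel cursor with READ TABLE: for every head, find the first
" matching item via binary search, then loop from that index until
" the key changes. Requires lt_item to be sorted by head_id.
LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).
  READ TABLE lt_item WITH KEY head_id = <ls_head>-id
       BINARY SEARCH TRANSPORTING NO FIELDS.
  IF sy-subrc <> 0.
    CONTINUE. " head without items
  ENDIF.
  DATA(lv_from) = sy-tabix. " index of the first matching item
  LOOP AT lt_item ASSIGNING FIELD-SYMBOL(<ls_item>) FROM lv_from.
    IF <ls_item>-head_id <> <ls_head>-id.
      EXIT. " left this head's block of items
    ENDIF.
    " ... process the item ...
  ENDLOOP.
ENDLOOP.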

And this is our data generation logic:
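A sketch of what this could look like, reusing the hypothetical structures from above:

DO 1000 TIMES.
  DATA(lv_head_id) = sy-index.
  APPEND VALUE ty_head( id = lv_head_id ) TO lt_head.
  DO 2500 TIMES.
    APPEND VALUE ty_item( head_id = lv_head_id
                          payload = sy-index ) TO lt_item.
  ENDDO.
ENDDO.

" the sort of both tables is part of the measured runtime
SORT lt_head BY id.
SORT lt_item BY head_id.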

1k head datasets and 2.5 million item datasets. The following measurements also include the necessary sort of both tables.

Time taken:          151,398 microseconds

Items / microsecond: 16.78

This is our baseline. Now I will change the distribution of datasets between the head and the items a bit:

Now we have 100k head datasets and 2.5 million item datasets. Each head now only has 25 item datasets.

Time taken:          324,033 microseconds

Items / microsecond: 7.72

We suffered a runtime increase of over 117% per item processed. This is a problem.

Why is it slowing down?

As I changed the item <-> head ratio, I also increased the number of cursor switches in the algorithm. Now I have to execute the READ TABLE 100 times more often than before. This impacts performance notably, as the read operation is still costly.

If I want to improve this, I have to get rid of my read mechanic somehow…

No read statement

There actually is a parallel cursor implementation without a read involved:
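A sketch of it, with the hypothetical structures from above. The item index lv_tabix is carried across head switches instead of being re-determined by a READ TABLE for every head:

" Assumes both tables are sorted by the same key and every item
" belongs to an existing head (true for the generated data above).
DATA lv_tabix TYPE sy-tabix VALUE 1.
LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).
  LOOP AT lt_item ASSIGNING FIELD-SYMBOL(<ls_item>) FROM lv_tabix.
    IF <ls_item>-head_id <> <ls_head>-id.
      lv_tabix = sy-tabix. " the next head's items start here
      EXIT.
    ENDIF.
    " ... process the item ...
  ENDLOOP.
ENDLOOP.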

I ran it on the same 100k / 2.5 million distribution:

Time taken:          215,838 microseconds

Items / microsecond: 11.58

This implementation has improved my performance by 50%.

Since we removed the read statement as the switch mechanic in our algorithm, we no longer pay such a big performance penalty when switching to a new head dataset.

Is it always faster?

No, not always. Let's run it on our first data pattern (1k heads, 2.5 million items):

Time taken:          149,364 microseconds

Items / microsecond: 16.74

With the read statement we had processed 16.78 items per microsecond.

So there is a data distribution at which both algorithms deliver the same performance. But the fewer items a head dataset has, the more an implementation without a read statement makes sense.

I tested both implementations with a lot of different data distributions, and I never saw the read statement implementation clearly beat the one without it. So if you want to be on the safe side, implement the parallel cursor without a read statement.

Take care,

Dmitrii

Parallel cursor vs. secondary key

This one bugged me for a while. Whenever I implement an inner loop that depends on the outer loop, I start to think about which solution to implement. So I decided to answer the question of parallel cursor vs. secondary key for myself, and make a post about it in the process 🙂

Initial Setup

Let me introduce my mighty initial setup…
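As a sketch, reusing the hypothetical head/item structures from the first post: a plain nested loop whose inner loop selects via a generic WHERE condition.

LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).
  " generic WHERE: scans the whole item table for every head
  LOOP AT lt_item ASSIGNING FIELD-SYMBOL(<ls_item>)
       WHERE head_id = <ls_head>-id.
    " ... process the item ...
  ENDLOOP.
ENDLOOP.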

… a simple head <-> item loop logic. And this is my data generation for this:
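The exact counts are not spelled out here; for the sketch I assume 100k heads with 25 items each, which matches the 25 matching datasets per head discussed further below.

DO 100000 TIMES.
  DATA(lv_head_id) = sy-index.
  APPEND VALUE ty_head( id = lv_head_id ) TO lt_head.
  DO 25 TIMES.
    APPEND VALUE ty_item( head_id = lv_head_id
                          payload = sy-index ) TO lt_item.
  ENDDO.
ENDDO.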

Before we start the comparison, I measured the initial solution:

sequential: [runtime measurement]

parallel: [runtime measurement]

So now we know what we are tuning for.

Parallel cursor – Take 1

So I wanted to implement the parallel cursor solution first, although this was not my preferred solution. I am a creature of habit, and once I have picked one up, it takes some work to change it. Remember that for this to work, the inner table has to be sorted beforehand:
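Presumably a single SORT on the (hypothetical) key field:

SORT lt_item BY head_id.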

This is the implementation:
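A sketch of this first take: a READ TABLE ... BINARY SEARCH determines the starting index, and the inner loop runs FROM that index, still filtered with a WHERE condition (names hypothetical as before):

LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).
  READ TABLE lt_item WITH KEY head_id = <ls_head>-id
       BINARY SEARCH TRANSPORTING NO FIELDS.
  IF sy-subrc <> 0.
    CONTINUE.
  ENDIF.
  DATA(lv_from) = sy-tabix.
  LOOP AT lt_item ASSIGNING FIELD-SYMBOL(<ls_item>)
       FROM lv_from
       WHERE head_id = <ls_head>-id.
    " ... process the item ...
  ENDLOOP.
ENDLOOP.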

This gave me a slight boost (the sort was also measured).

sequential: [runtime measurement]

parallel: [runtime measurement]

But I was actually quite disappointed; I had expected way more. The improvement was not even a factor of 2, and I remembered this being a very potent technique.

Secondary key

So now my hopes were with a secondary key on the table, which I set up after the subroutine was called:
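A sketch of this variant; the secondary key name head is made up, the structures are the hypothetical ones from above:

TYPES ty_item_tab TYPE STANDARD TABLE OF ty_item
      WITH EMPTY KEY
      WITH NON-UNIQUE SORTED KEY head COMPONENTS head_id.

DATA lt_item_sec TYPE ty_item_tab.
lt_item_sec = lt_item.

LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).
  " the WHERE on the key component is optimized via the sorted
  " secondary key, so there is no full table scan per head
  LOOP AT lt_item_sec ASSIGNING FIELD-SYMBOL(<ls_item>)
       USING KEY head
       WHERE head_id = <ls_head>-id.
    " ... process the item ...
  ENDLOOP.
ENDLOOP.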

Implementing my secondary key was very simple and straightforward. Now I was excited to see the runtime.

sequential: [runtime measurement]

parallel: [runtime measurement]

Now this seems right. This result is way more desirable: we got a performance improvement of 6,271%. Scaling also improved. While the runtime difference between the sequential work process and an individual parallel work process was +95% in the initial solution, we improved to "just" +73%.

But this seemed like far too heavy a difference. I was getting second thoughts… this seemed too good to be true. My favorite technique won, but it won by such a large margin that it also seemed ridiculous to me.

And in my experience, something that seems too good to be true usually isn't. So my conclusion was that I must have messed something up. So I searched, and I found it.

Parallel cursor – Take 2

I did discover what I had messed up. I even found it interesting enough to share, because I imagine more people could fall into this trap without realizing how much potential they are leaving behind.

Let me highlight the magical spot:
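It sits in the inner loop of the Take 1 sketch from above:

LOOP AT lt_item ASSIGNING FIELD-SYMBOL(<ls_item>)
     FROM lv_from
     WHERE head_id = <ls_head>-id. " <-- here
  " ... process the item ...
ENDLOOP.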

Take a look at my WHERE condition. Notice anything unusual? Let me help.

Tell me, what happens when we are done looping over the 25 matching datasets in the item table? Does it quit? Really?

It actually does not quit, because the loop does not know that we are done after the first 25 datasets. It has to make sure there are no further matches in another part of the table, so it scans on to the end. And this is a problem.

Remember our initial code?

This code was slow, and it was slow for a reason: searching with a generic key forces a full scan of the item table for every head, so the total runtime grows quadratically with the data volume. And this is exactly what I forced in my first parallel cursor implementation.

With the READ TABLE I just gave it another starting point. But the search type stays the same and so does the runtime curve, which makes this implementation pointless.

I then made a little change and made the code rely on my previous sort:
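A sketch of the corrected version: the WHERE clause is gone, and an IF / EXIT ends the inner loop as soon as the sorted block of matching items is over:

LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).
  READ TABLE lt_item WITH KEY head_id = <ls_head>-id
       BINARY SEARCH TRANSPORTING NO FIELDS.
  IF sy-subrc <> 0.
    CONTINUE.
  ENDIF.
  DATA(lv_from) = sy-tabix.
  LOOP AT lt_item ASSIGNING FIELD-SYMBOL(<ls_item>) FROM lv_from.
    " the IF replaces the WHERE clause; place it first in the loop body
    IF <ls_item>-head_id <> <ls_head>-id.
      EXIT. " this head's block is done, no need to scan on
    ENDIF.
    " ... process the item ...
  ENDLOOP.
ENDLOOP.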

I deleted the WHERE clause in the inner loop and replaced it with the IF statement. You should place it right at the beginning of the loop body in order to avoid bugs.

This worked very nicely:

sequential: [runtime measurement]

parallel: [runtime measurement]

This solution beat my secondary key solution by 398%. And it is 31,654% faster than the initial solution.

So if you have a similar situation, I would recommend a parallel cursor solution.

And I have to change my habit; after all, this blows my previous favorite out of the water…

Take care

Dmitrii