Today I want to share with you how tuning an ABAP parallel cursor works, and when you need to change your algorithm because of different data distributions in the inner and outer table.
This post takes off where this one stopped. I highly recommend reading it first, because here I take that setup and technique knowledge for granted.
different data distribution -> different performance
Here we have a parallel cursor implementation with a read table statement:
loop at it_head assigning <ls_head>.
* Position the cursor: find the first item of the current head.
  read table it_item assigning <ls_item>
    with key head_id = <ls_head>-head_id
    binary search.
  lv_from = sy-tabix.
  loop at it_item assigning <ls_item> from lv_from.
* Please insert bug...
    if <ls_item>-head_id <> <ls_head>-head_id.
* First item of the next head reached - leave the inner loop.
      exit.
    endif.
  endloop.
endloop.
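For the binary search (and the sequential cursor) to work, both tables must be sorted by head_id. The snippet doesn't show that step; it would simply be (assuming lt_head and lt_item are the tables behind it_head and it_item):

sort lt_head by head_id.
sort lt_item by head_id.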
And this is our data generation logic:
do 1000 times.
  ls_head-head_id = sy-index. " the do loop counter as key
  do 2500 times.
    ls_item-head_id = ls_head-head_id.
    append ls_item to lt_item.
  enddo.
  append ls_head to lt_head.
enddo.
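The declarations behind these snippets are not shown in the post; a minimal sketch of what they could look like (only head_id appears in the snippets — the type names, the integer key, and the standard tables are my assumptions):

types: begin of ty_head,
         head_id type i,
       end of ty_head,
       begin of ty_item,
         head_id type i,
       end of ty_item.

data: lt_head type standard table of ty_head,
      lt_item type standard table of ty_item,
      ls_head type ty_head,
      ls_item type ty_item,
      lv_from type sy-tabix.

field-symbols: <ls_head> type ty_head,
               <ls_item> type ty_item.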
That gives us 1k head records and 2.5 million item records. The following measurements also include the necessary sort of both tables by head_id.
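For reference, a harness like the following would produce such numbers; this is my reconstruction, not the post's actual code (get run time reports microseconds):

data: lv_start type i,
      lv_end   type i,
      lv_time  type i.

get run time field lv_start.

* The sort is part of the measured time.
sort lt_head by head_id.
sort lt_item by head_id.

* ... parallel cursor loop from above ...

get run time field lv_end.
lv_time = lv_end - lv_start.
write: / 'Time taken:', lv_time, 'microseconds'.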
Time taken: 151,398 microseconds
Items / microsecond: 16.78
This is our baseline. Now I will change the distribution of records between head and items a bit:
do 100000 times.
  ls_head-head_id = sy-index.
  do 25 times.
    ls_item-head_id = ls_head-head_id.
    append ls_item to lt_item.
  enddo.
  append ls_head to lt_head.
enddo.
Now we have 100k head records and still 2.5 million item records, but each head now has only 25 items.
Time taken: 324,033 microseconds
Items / microsecond: 7.72
Measured per item, we suffered a runtime increase of over 117% (16.78 down to 7.72 items per microsecond). This is a problem.
Why is it slowing down?
As I changed the ratio of items to heads, I also increased the number of cursor switches in the algorithm: the read now runs 100,000 times instead of 1,000, i.e. 100 times more often, and its cost is amortized over only 25 items per head instead of 2,500. This impacts performance notably, as each read is still a costly binary search.
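A back-of-the-envelope estimate (my own, not a measurement from the post): a binary search over 2.5 million rows needs about log2(2,500,000) ≈ 21 comparisons, so the pure switch overhead grows roughly from

1,000 reads × ~21 comparisons ≈ 21,000 comparisons
to
100,000 reads × ~21 comparisons ≈ 2,100,000 comparisons.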
If I want to improve this, I have to get rid of my read mechanic somehow…
no read statement
There actually is a parallel cursor implementation without a read involved:
lv_from = 1. " the item cursor starts at the first row
loop at it_head assigning <ls_head>.
  loop at it_item assigning <ls_item> from lv_from.
* Please insert bug...
    if <ls_item>-head_id <> <ls_head>-head_id.
* Remember where the next head's items start, then switch heads.
      lv_from = sy-tabix.
      exit.
    endif.
  endloop.
endloop.
I ran it against the same 100k head distribution:
Time taken: 215,838 microseconds
Items / microsecond: 11.58
This implementation improved throughput by 50% (11.58 instead of 7.72 items per microsecond).
As we removed the read statement as the switch mechanic in our algorithm, switching to a new head record no longer carries a big penalty: it now costs a single failed comparison instead of a binary search.
Is it always faster?
No, not always. Let’s look at our first data pattern again:
do 1000 times.
  ls_head-head_id = sy-index.
  do 2500 times.
    ls_item-head_id = ls_head-head_id.
    append ls_item to lt_item.
  enddo.
  append ls_head to lt_head.
enddo.
Time taken: 149,364 microseconds
Items / microsecond: 16.74
With the read statement variant we processed 16.78 items per microsecond, so the two implementations are effectively tied here.
So there is a point where both algorithms deliver the same performance. But the fewer items a head record has, the more an implementation without a read statement makes sense.
I tested both implementations with many different data distributions, and I never saw the read statement implementation clearly beat the implementation without it. So if you want to be on the safe side, implement your parallel cursor without a read statement.
Take care,
Dmitrii