Tuning ABAP DELETE performance

Today I would like to discuss how you can improve the performance of ABAP DELETE operations. As there are different kinds of solutions to this problem, I will work with one example setup throughout the whole post, improving the performance step by step.

Initial setup

Our initial setup deals with deleting all non-essential datasets from our data stream. For a dataset to be considered relevant and not be deleted, it has to pass validation.

And this is what the data structure looks like:
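A minimal sketch of such a line type, assuming a flat structure with a numeric key and a character payload (the names ty_data and gt_data are illustrative, not the original definition):

TYPES: BEGIN OF ty_data,
         id      TYPE i,            " assumed numeric key
         payload TYPE c LENGTH 200, " assumed character payload to be searched
       END OF ty_data.

DATA gt_data TYPE STANDARD TABLE OF ty_data WITH EMPTY KEY.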

So we simply search for a pattern in the payload. If we find it, our task is not to deliver that dataset back, which we achieve by deleting it.
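A hedged sketch of how the initial solution might look, assuming each hit is removed through a generic WHERE condition (gc_pattern and the field names are assumptions, not the original code):

CONSTANTS gc_pattern TYPE string VALUE 'REMOVE_ME'. " assumed search pattern

LOOP AT gt_data ASSIGNING FIELD-SYMBOL(<ls_data>).
  IF <ls_data>-payload CS gc_pattern.
    " generic WHERE condition: the kernel searches the whole table
    " for matching rows on every single deletion
    DELETE gt_data WHERE id = <ls_data>-id.
  ENDIF.
ENDLOOP.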

Before we start changing the solution, let us measure the runtime.

[Runtime measurements for the initial solution: 1 WP / 1k lines, 8 WP / 1k lines, 1 WP / 10k lines, 8 WP / 10k lines]

First thoughts

Now, I initially wanted to go beyond 100k lines, but I was honestly afraid that this thing would never come back to me. The non-linear runtime curve scares me, and getting rid of it will be our main target. A non-linear response forced by the generic WHERE condition…

… forces an internal search through the table for the right target(s). Even if there is only one target, the whole table still has to be scanned, and the runtime curve stays non-linear. We have to change this as fast as possible.

The scaling also concerns me a bit. In full parallel mode we take almost double the time per work process compared to the single process. Unfortunately, I cannot spot anything I could do to reduce the load on the memory system: I do not use deep data types in my data structure, and I have no way to reduce the width of my table structure. 🙁

Stage 1

Stage 1 has to get rid of the non-linear response to table size. The solution here is to replace the target specification of the deletion with an index value.

Why an index access? Because its performance does not depend on the amount of data in the table. In this regard it behaves like a hashed table, with a similar or mostly even lower constant access time.
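A sketch of what Stage 1 could look like, reusing the assumed names from above; the deletion now addresses the current row directly through its index in sy-tabix:

LOOP AT gt_data ASSIGNING FIELD-SYMBOL(<ls_line>).
  IF <ls_line>-payload CS gc_pattern.
    " delete the current row by its index; no search through the table is needed
    DELETE gt_data INDEX sy-tabix.
  ENDIF.
ENDLOOP.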

So let us look at the runtime:

[Runtime measurements for Stage 1: 1 WP / 1k lines, 8 WP / 1k lines, 1 WP / 10k lines, 8 WP / 10k lines]

Much better!

Judging Stage 1

Sequential performance improved by over 630% at 1k lines and by over 610,000% at 10k lines. Removing the non-linear search does pay off really well here.

Scaling behavior has also improved slightly, although that was not intended. Probably because we no longer grind through the entire table every time we find something interesting.

But there is still one aspect of this solution that I am not happy with: the DELETE statement itself.

Table index rebuild

When we delete an entry from a table, we change its composition. With that action comes the necessity for the table to update its own composition status; after all, we expect the table to know which rows it still has. That means that through our deletion process we force a rebuild of the table's index. It has to update its own status in order to stay consistent. This operation takes time, and I want to know how much performance there is to gain if we avoid that rebuild.

Stage 2

I changed the mechanic by removing the DELETE statement and turning the solution into an append-only one. If something does not fit our criteria, it simply does not get appended. After the loop, I replace the initial table with my result.

This removes the index rebuild as a performance drain.
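A sketch of what Stage 2 could look like under the same assumed names; matching rows are simply never copied into the result table, so nothing ever has to be deleted:

DATA lt_result LIKE gt_data.

LOOP AT gt_data ASSIGNING FIELD-SYMBOL(<ls_row>).
  " keep only the rows that do not contain the pattern (NS = contains no string)
  IF <ls_row>-payload NS gc_pattern.
    APPEND <ls_row> TO lt_result.
  ENDIF.
ENDLOOP.

" swap the filtered result in; no DELETE, therefore no index rebuild per removed row
gt_data = lt_result.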

[Runtime measurements for Stage 2: 1 WP / 1k lines, 8 WP / 1k lines, 1 WP / 10k lines, 8 WP / 10k lines]

Judging Stage 2

Sequential performance improved by over 340% at 1k lines and by over 370% at 10k lines compared to Stage 1. So the rebuild of the table index does have a significant impact on the performance of a deletion operation.

Compared to the initial solution, we achieved a performance improvement of over 282,000% at 1k lines and an improvement in runtime of over 2,897,000% at 10k lines.

As a side note, I must admit that the last performance-improvement figure does look ridiculous… but it proves a point: never build a solution whose runtime responds non-linearly to the amount of input data.

Take care,

Dmitrii