Tuning ABAP delete performance

Today I would like to discuss how you can improve your ABAP delete performance. As there are different kinds of solutions to this problem, I will work with one example setup throughout the whole post, improving the performance step by step.

Initial setup

Our initial setup deals with deleting all of the non-essential datasets from our data stream. For a dataset to be considered relevant and not be deleted, it has to pass validation.

And this is what the data structure looks like:

So we just search for a pattern in the payload. If we find it, our task is not to deliver that dataset back, which we achieve by deleting it.
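To make the discussion concrete, here is a minimal sketch of the initial construct. The type, the field names and the pattern are illustrative; only the shape of the loop and the generic WHERE condition matter.

  TYPES: BEGIN OF ty_dataset,
           guid    TYPE c LENGTH 32,
           payload TYPE c LENGTH 255,
         END OF ty_dataset.

  DATA lt_stream TYPE STANDARD TABLE OF ty_dataset.

  FIELD-SYMBOLS <ls_set> TYPE ty_dataset.

  LOOP AT lt_stream ASSIGNING <ls_set>.
    IF <ls_set>-payload CS 'PATTERN'.
      " generic WHERE condition: the table has to search for the target itself
      DELETE lt_stream WHERE guid = <ls_set>-guid.
    ENDIF.
  ENDLOOP.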

Before we start changing the solution, let us measure the runtime.

1 WP, 1k lines

8 WP, 1k lines

1 WP, 10k lines

8 WP, 10k lines

First thoughts

I initially wanted to go beyond 100k lines, but I was honestly afraid that this thing would never come back to me. The non-linear runtime curve scares me, and it will be our main target to get rid of. A non-linear response, forced by the generic WHERE condition…

… forces an internal search through the table for the right target(s). Even if there is only one target per deletion, a full scan per DELETE makes the overall runtime grow quadratically with the table size instead of linearly. We have to change this as fast as possible.

The scaling also concerns me a bit. Per workprocess, we take almost double the time in full parallel mode compared to the single process. Unfortunately, I cannot spot anything I could do to reduce the load on the memory system: I do not use deep data types in my data structure, and I also have no possibility to reduce the width of my table structure. 🙁

Stage 1

Stage 1 has to get rid of the non-linear response to table size. The solution here is to replace the generic deletion target with an index value.

Why an index access? Because its performance does not depend on the amount of data in the table. In this regard it behaves like a hashed table access, with a similar or mostly even lower constant access time.
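Under the same assumptions as the sketch above, Stage 1 could look like this; the only change is that the DELETE now targets the current line directly by its index.

  DATA lv_index TYPE sy-tabix.

  LOOP AT lt_stream ASSIGNING <ls_set>.
    lv_index = sy-tabix.
    IF <ls_set>-payload CS 'PATTERN'.
      " index access: constant cost, independent of the table size
      DELETE lt_stream INDEX lv_index.
    ENDIF.
  ENDLOOP.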

So let us look at the runtime:

1 WP, 1k lines

8 WP, 1k lines

1 WP, 10k lines

8 WP, 10k lines

Much better!

Judging Stage 1

Sequential performance improved by over 630% at 1k lines and by over 610,000% at 10k lines. Removing the search through the table for every single deletion really pays off here.

The scaling behavior has also improved slightly, although that was not intended, probably because we no longer grind through the entire table every time we find something interesting.

But there is still one aspect of this solution which I am not happy with: the DELETE statement itself.

Table Index rebuild

When we delete an entry from a table, we change its composition, and with that action comes the necessity for the table to update its own administrative information. After all, we expect the table to know which lines it still holds. That means that through our deletion process we force a rebuild of the table's index. It has to update its own state in order to stay consistent. This operation takes time, and I want to know how much performance there is to gain if we avoid that rebuild.

Stage 2

I changed the mechanic by removing the DELETE statement and turning the whole thing into an append-only mechanic. If something does not fit our criteria, it simply does not get appended. After the loop I replace the initial table with my result.

This removes the index rebuild as a performance drain.
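Again under the same assumptions, the Stage 2 mechanic could look like this: no DELETE at all, the survivors are appended to a result table, and the result replaces the input afterwards.

  DATA lt_result LIKE lt_stream.

  LOOP AT lt_stream ASSIGNING <ls_set>.
    IF <ls_set>-payload CS 'PATTERN'.
      CONTINUE.                      " not relevant: simply do not append it
    ENDIF.
    APPEND <ls_set> TO lt_result.
  ENDLOOP.

  lt_stream = lt_result.             " replace the original with the filtered result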

1 WP, 1k lines

8 WP, 1k lines

1 WP, 10k lines

8 WP, 10k lines

Judging Stage 2

Sequential performance improved by over 340% at 1k lines and by over 370% at 10k lines, compared to Stage 1. So the rebuild of the table index does have a significant impact on the performance of a deletion operation.

Compared to the initial solution, we achieved a runtime improvement of over 282,000% at 1k lines and of over 2,897,000% at 10k lines.

As a side note, I must admit that the last performance improvement figure does look ridiculous… but it proves a point: never build a solution whose runtime grows non-linearly with the amount of input data.

Take care,

Dmitrii

 

DD03L remap performance

Today I would like to show you how I improve the performance of a DD03L-based remap into CSV format. I will start with a simple setup and tune it step by step, thereby showing you different tuning possibilities.

Starting setup

Our starting setup is pretty basic. We have an lcl_file_service class which performs the reformat operation for us. We can feed it any table we want, as long as it is described in DD03L.

So this is where we start. From the outside, I feed it the request to remap the whole content of the DD02L table.
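In simplified form, the starting point looks roughly like this. The names, the semicolon separator and the SELECTs are illustrative (in the real class the field catalog and the data arrive as method parameters such as it_dd03l); only the overall shape reflects the solution discussed below.

  DATA: lt_dd03l  TYPE STANDARD TABLE OF dd03l,
        lt_dd02l  TYPE STANDARD TABLE OF dd02l,
        lt_csv    TYPE STANDARD TABLE OF string,
        ls_dd03l  TYPE dd03l,
        lv_line   TYPE string,
        lv_buffer TYPE string.

  FIELD-SYMBOLS: <ls_row>   TYPE any,
                 <lv_field> TYPE any.

  SELECT * FROM dd03l INTO TABLE lt_dd03l
         WHERE tabname = 'DD02L'
         ORDER BY position.                " field catalog of the table
  SELECT * FROM dd02l INTO TABLE lt_dd02l. " the data to be remapped

  LOOP AT lt_dd02l ASSIGNING <ls_row>.
    CLEAR lv_line.
    LOOP AT lt_dd03l INTO ls_dd03l.        " workarea copy per field
      ASSIGN COMPONENT ls_dd03l-fieldname
             OF STRUCTURE <ls_row> TO <lv_field>.   " access by field name
      IF sy-subrc <> 0.
        CONTINUE.                          " did the assignment work at all?
      ENDIF.
      lv_buffer = <lv_field>.              " buffer for non-character-like fields
      IF sy-tabix = lines( lt_dd03l ).     " branch for the last field
        CONCATENATE lv_line lv_buffer INTO lv_line.
      ELSE.
        CONCATENATE lv_line lv_buffer ';' INTO lv_line.
      ENDIF.
    ENDLOOP.
    INSERT lv_line INTO TABLE lt_csv.      " INSERT instead of APPEND
  ENDLOOP.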

Initial runtime:

Stage 1

There are a few things that catch my attention here right away:

  • Looping at it_dd03l into a workarea. That is a common sight, but really slow. I want to change this to a field symbol.
  • Assigning components by field name instead of by position. ASSIGN COMPONENT can also work with the position of the target field, which is usually faster.
  • We are inserting our result instead of appending it. If an insert is not strictly necessary, we should not use it. Switching to APPEND here.
  • We have a branch which checks whether the end of the file line was reached, in order to skip the last separator. I do not like branches in hot loops, I really don’t. If you have any possibility to pull branches out of your hot loops, do it. Although your processor will likely predict the outcome of this branch correctly most of the time, I would not bet on it. Therefore I will only loop up to the next-to-last field and handle the last field separately. This way I avoid the branch entirely.

So let us implement these conclusions.
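A sketch of how the Stage 1 version could look, reusing the declarations from the sketch above. One extra assumption: the field catalog is sorted by position and contains no includes, so the loop index matches the component position.

  DATA: lv_last TYPE i,
        lv_stop TYPE i.

  FIELD-SYMBOLS <ls_dd03l> TYPE dd03l.

  lv_last = lines( lt_dd03l ).
  lv_stop = lv_last - 1.

  LOOP AT lt_dd02l ASSIGNING <ls_row>.
    CLEAR lv_line.
    LOOP AT lt_dd03l ASSIGNING <ls_dd03l> TO lv_stop.    " all but the last field
      ASSIGN COMPONENT sy-tabix OF STRUCTURE <ls_row> TO <lv_field>.  " by position
      lv_buffer = <lv_field>.
      CONCATENATE lv_line lv_buffer ';' INTO lv_line.    " no branch any more
    ENDLOOP.
    ASSIGN COMPONENT lv_last OF STRUCTURE <ls_row> TO <lv_field>.
    lv_buffer = <lv_field>.
    CONCATENATE lv_line lv_buffer INTO lv_line.          " last field, no separator
    APPEND lv_line TO lt_csv.                            " APPEND instead of INSERT
  ENDLOOP.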

Now let us check our performance.

Stage 1 runtime:

We improved our performance by 125%.

 

But I am still not happy. I actually do not like the whole approach of the solution. Why do I have to do things over and over again?

Take, for example, the mapping of the individual fields: I do an ASSIGN COMPONENT every time I touch a field in a data row.

And every time I have to check whether that assignment actually worked.

Then I have to go through that buffer variable so that I do not blow myself up when I touch data types which are not compatible with a simple CONCATENATE operation.

And if that was not enough, I have to keep track of where I am in the file line so that I do not destroy the output format. 🙁

Stage 2

I do not want to do all of those things above over and over again. If I really have to do them, I want to do them only once.

In order to reach that goal, we have to change our approach. Let me present my ideal remap solution:

Now I know that is quite abstract, but it is essentially all the effort I am willing to invest. I want to enter a magical loop where everything has already been taken care of. The input is just waiting to be mapped to the correct output. Both are in one line, so I do not have to jump around like a fool. Every necessary input check has also been taken care of; after all, everything I would need for those checks is contained in the DD03L table and one single row of input data.

I also no longer have any stupid branches to worry about, and the formatting has been taken care of as well.

Unfortunately the real solution is not that simple or lean, but it still follows the same approach:

As you can see, we have a new method involved, build_remap_customizing, which takes care of all the work I do not want to keep repeating.

The result table has a structure with only 4 components:

  • source_offset type i
  • source_width type i
  • target_offset type i
  • target_width type i

Nothing more is needed.

This is what our remap loop looks like now. The whole assigning and input-to-output matching has already been taken care of.
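A sketch of what this Stage 2 loop does, again reusing lt_dd02l, lt_csv and <ls_row> from the earlier sketches. The offset/width structure is the one listed above; the buffer lengths, the names and the precomputed separators are illustrative assumptions.

  TYPES: BEGIN OF ty_remap,
           source_offset TYPE i,
           source_width  TYPE i,
           target_offset TYPE i,
           target_width  TYPE i,
         END OF ty_remap.

  DATA: lt_remap  TYPE STANDARD TABLE OF ty_remap,
        lv_source TYPE c LENGTH 2000,    " flat copy of one data row
        lv_target TYPE c LENGTH 4000,    " output line, separators already in place
        lv_soff   TYPE i,
        lv_slen   TYPE i,
        lv_toff   TYPE i,
        lv_tlen   TYPE i.

  FIELD-SYMBOLS <ls_remap> TYPE ty_remap.

  " lt_remap and the separator positions in lv_target are computed once by
  " build_remap_customizing( ) - not shown here.
  LOOP AT lt_dd02l ASSIGNING <ls_row>.
    lv_source = <ls_row>.                " flat, character-like row, see below
    LOOP AT lt_remap ASSIGNING <ls_remap>.
      lv_soff = <ls_remap>-source_offset.
      lv_slen = <ls_remap>-source_width.
      lv_toff = <ls_remap>-target_offset.
      lv_tlen = <ls_remap>-target_width.
      lv_target+lv_toff(lv_tlen) = lv_source+lv_soff(lv_slen).
    ENDLOOP.
    APPEND lv_target TO lt_csv.
  ENDLOOP.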

Now maybe you have noticed that I do not work on the input and output directly, but through buffer variables. The reason is that I want my source and target to be character-like, so I can work with parts of them via offset and length. When I copy the data row into my input structure (well, it is a long char field, actually…), I make sure that every field is right where I expect it to be. The same applies to the output field; here the separators have also been precomputed.

So, what’s the performance?

We have increased performance by 98% compared to Stage 1 and by 350% compared to the initial solution.

Conclusion

Basic improvements do help and can provide solid performance gains. But sometimes we need a little change of perspective to come up with different and better solutions.

Do take care of the basics, but also try to precompute and optimize your whole solution approach from time to time; it pays off.

Take care,

Dmitrii

ABAP PP-Framework documentation

I would like to introduce you to the ABAP PP-Framework – a parallel processing framework, which I use for all of my batch applications.

It was developed in the year 2002 by SAP. The main author is Thomas Bollmeier.

This framework was recommended to me by my mentor back when I started my education – and I still use it. It is old, but gold.

If you do not require recursive parallelization, this framework does the job pretty well. Not very fancy, but very reliable. Because of its availability in every NetWeaver system it has become a standard choice for parallelizing SAP applications.

After some of my colleagues asked me for the documentation, I decided to just put it online.

As a side note, I have to say that the documentation is in German.

Take care

Dmitrii

 

Framework für die Parallelverarbeitung (Framework for Parallel Processing)

Einführung in das Parallelverarbeitungstool (Introduction to the Parallel Processing Tool)

 

ABAP – Why parallelization through DIA processes is a bad idea

Today I would like to talk about how your program's parallelization fits into the bigger concept of an ABAP system's design and specification.

For this, I assume you know that you have multiple process types available for parallelization on your ABAP system.

Whenever I see programs that can run in a parallel fashion, I look at what kind of processes those programs use. Often I see a simple function call with “STARTING NEW TASK” added: the report is pumping out dialog processes to distribute the workload. And this is precisely what I do not want to see as a solution. The reason is not so much the code itself, but the consequences it has for the overall system behavior and the specification-driven strategy of the ABAP system.
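To be clear about the pattern I mean, here is a minimal sketch. The function module, its parameter and the package table are placeholders, not a real interface.

  DATA: lt_packages TYPE STANDARD TABLE OF string,   " placeholder work packages
        lv_task     TYPE c LENGTH 32.

  FIELD-SYMBOLS <lv_package> TYPE string.

  LOOP AT lt_packages ASSIGNING <lv_package>.
    lv_task = |TASK_{ sy-tabix }|.
    CALL FUNCTION 'Z_PROCESS_PACKAGE'      " placeholder RFC-enabled function module
      STARTING NEW TASK lv_task
      EXPORTING
        iv_package = <lv_package>.
  ENDLOOP.
  " Every one of these calls occupies a DIA workprocess, even when the
  " surrounding report itself runs as a batch job.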

Let us take a step back and imagine for a moment that we are not developers; let us change our perspective. You now have the task of configuring your ABAP system to meet a certain business requirement. This business requirement is simple:

“Up to X Users per Minute must be able to use the reporting functions, apart from that we have these jobs which need to run every hour. Those jobs must run. If they do not, we lose money and clients. Budget is X€ per month.”

Now let us assume that the sizing of the machine has already been taken care of and the budget limit has been respected. Your job now is to make sure that those jobs can indeed run every hour with a minimal chance of failure. How do you do that?

Your only reliable solution is to adapt your ABAP server memory configuration. Your ABAP instance has multiple types of memory, but for now we just want to focus on two:

HEAP

HEAP memory is mostly allocated by BTCH processes. A BTCH process starts by allocating HEAP until it either hits its individual limit (parameter value) or the global HEAP limit (parameter value). If your BTCH process hits either of those limits, it starts allocating EM memory (I have skipped the roll memory, but it is insignificant for this case) until it hits one of the corresponding limits (individual or global).

 

EM

EM memory is mostly allocated by DIA processes. A DIA process starts by allocating EM until it either hits its individual limit (parameter value) or the global EM limit (parameter value). If your DIA process hits either of those limits, it starts allocating HEAP memory (again skipping the roll memory) until it hits one of the corresponding limits (individual or global). As soon as a DIA process allocates HEAP memory, it goes into the so-called private (PRIV) mode. In that mode the workprocess is not freed when the program is finished: in order to hold the memory allocated in the HEAP, it has to keep occupying the workprocess. If you have too much of this, you do not have any spare DIA workprocesses left and your system becomes unusable.

 

Those are the basics in a very simplified form. Your DIA and your BTCH processes each have a memory area where they feel at home. Apart from the split having an interesting technical background (for example, EM memory handles fast context switches better and is therefore better equipped for masses of short workloads), it also gives you the possibility of sealing off user-driven workloads from batch-driven workloads by restricting each to its starting memory type. In that case your DIA processes would be prevented from allocating more than a minimal amount of HEAP, and your BTCH processes could not allocate EM past a technical minimum.
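To make the "(parameter value)" remarks above concrete, these are the standard profile parameters involved. The concrete values depend entirely on your sizing, so treat this as a pointer rather than a recommendation:

  • abap/heap_area_dia: HEAP quota of a single DIA workprocess (kept small to seal off dialog)
  • abap/heap_area_nondia: HEAP quota of a single non-dialog (BTCH) workprocess
  • abap/heap_area_total: global HEAP limit of the instance
  • ztta/roll_extension: EM quota of a user context (newer kernels also offer separate _dia and _nondia variants)
  • em/initial_size_MB: size of the extended memory pool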

This is what enables you to make sure that those jobs are never short on memory and are always in a position to start. Even if your DIA users start to demand an extreme amount of memory, your BTCH processes are shielded. Although you cannot compensate for an undersized machine, you can protect your important business processes.

After having our core processes dump in the night because a user was trying to allocate 40 GB of memory, I am a big fan of sealing off those workloads from each other.

And this is precisely the point where parallelizing programs with DIA processes becomes a big problem. If I, as a “run-oriented” person, start BTCH jobs, I expect them to run in batch, not to pump out DIA processes. You cannot seal that off, and you will therefore fail to deliver. Your batch workload has to be immune to dialog-user-driven actions. When it gets serious, I would rather have a few dialog users receive a memory dump than have my core jobs fail.

 

Take care

Dmitrii

 

 

How to wreck ABAP Performance

Read statement with generic key                                                         

My all-time favorite. Really. This construct blows up so often and with such force that you need a special name for it.

If both tables reach or surpass a line count of 10k, the death spiral starts. If you go beyond 100k, your runtime can go from a few minutes to multiple hours. After all, you are using a search whose total cost grows quadratically with the table sizes - enjoy the ride!
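In code, the construct looks roughly like this (made-up names, two plain standard tables):

  TYPES: BEGIN OF ty_order,
           order_id   TYPE i,
           partner_id TYPE i,
         END OF ty_order,
         BEGIN OF ty_partner,
           partner_id TYPE i,
           name       TYPE c LENGTH 40,
         END OF ty_partner.

  DATA: lt_orders   TYPE STANDARD TABLE OF ty_order,
        lt_partners TYPE STANDARD TABLE OF ty_partner.

  FIELD-SYMBOLS: <ls_order>   TYPE ty_order,
                 <ls_partner> TYPE ty_partner.

  LOOP AT lt_orders ASSIGNING <ls_order>.
    " every READ scans lt_partners from the top: n * m comparisons in total
    READ TABLE lt_partners ASSIGNING <ls_partner>
         WITH KEY partner_id = <ls_order>-partner_id.
  ENDLOOP.

A sorted table read with BINARY SEARCH, or a hashed table with a unique key, turns each of these lookups into a logarithmic or constant-time access.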

One special characteristic of this construct is that it does not surface until you put real pressure on the application. This leads to an interesting sequence of events:

  1. Developers develop software driven by the business case.
  2. Test with a small amount of test data.
  3. If scope is complete or budget has been consumed, rollout begins.
  4. The customer installs and tries the application with a small test set, in order to confirm that the application does the task right.
  5. Application enters production.
  6. Just when the workload is high, the pressure on the team is high, the timing is really bad and the margin for errors in production is next to zero - it blows up.

This setup is really easy to spot, but it should be changed before it tears you apart.

 

Modify/Delete Statement with generic key

This follows the same pattern as our first setup. The problem here is that such a delete operation requires a search in order to find out what is supposed to be deleted, and this search intrinsically has the same non-linear runtime curve as the generic read statement from the first setup. As a bonus, you get your table index (something has to hold the internal table together) being forcibly rebuilt over and over again. This does not make it faster.
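For completeness, the same problem in its MODIFY and DELETE forms, reusing the tables from the sketch above; every call triggers a full scan, and the DELETE additionally forces the index maintenance just described.

  DATA ls_partner TYPE ty_partner.

  ls_partner-name = 'NEW NAME'.
  MODIFY lt_partners FROM ls_partner TRANSPORTING name
         WHERE partner_id = 4711.                 " generic key: full scan per call
  DELETE lt_partners WHERE partner_id = 4711.     " generic key: full scan plus index rebuild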

And of course, you get the same pleasant pattern of surfacing only when the workload is high - perfect for occasions like end of year processing and other mission critical processes.

Looping into a workarea

This is pretty common. Although this is nothing that obliterates your production machine, it does hurt and should be avoided.

Always replace this with a field symbol or a reference (there are differences in performance between those two, but they only matter if you like to exploit your CPU cache). Fixing this is cheap and can improve your performance significantly.
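The before/after is as simple as it gets; I am assuming a reasonably wide line type here, because that is where the copy hurts.

  TYPES: BEGIN OF ty_row,
           id      TYPE i,
           payload TYPE c LENGTH 500,        " wide lines make the copy expensive
         END OF ty_row.

  DATA: lt_rows TYPE STANDARD TABLE OF ty_row,
        ls_row  TYPE ty_row.

  FIELD-SYMBOLS <ls_row> TYPE ty_row.

  LOOP AT lt_rows INTO ls_row.               " copies every single line
  ENDLOOP.

  LOOP AT lt_rows ASSIGNING <ls_row>.        " no copy, works directly on the line
  ENDLOOP.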

 

Select loops

SELECT … ENDSELECT loops have their uses, but they should be avoided like the plague.

If there is any possibility to use a standard bulk SQL statement instead - use it. Everything is better than this.

Either fetch your whole package in one go or push the work down into your DB, but do not use this construct in anything even remotely dependent on performance.
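A minimal illustration, using MARA as an arbitrary example table:

  DATA: ls_mara TYPE mara,
        lt_mara TYPE STANDARD TABLE OF mara.

  " row-by-row processing with a select loop ...
  SELECT * FROM mara INTO ls_mara
         WHERE mtart = 'FERT'.
    " each row is handed to the ABAP program individually
  ENDSELECT.

  " ... versus one array fetch:
  SELECT * FROM mara INTO TABLE lt_mara
         WHERE mtart = 'FERT'.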

Select single in loops

If you want to trash your database with a high management cost to result ratio, you should use this as often as you possibly can.

Great for slowing your application down for no reason. Avoid the FOR ALL ENTRIES addition of Open SQL, because it might speed you up and shorten your lunch break significantly.
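A sketch of both variants, with made-up document data read against KNA1:

  TYPES: BEGIN OF ty_doc,
           vbeln TYPE c LENGTH 10,
           kunnr TYPE c LENGTH 10,
         END OF ty_doc,
         BEGIN OF ty_name,
           kunnr TYPE c LENGTH 10,
           name1 TYPE c LENGTH 35,
         END OF ty_name.

  DATA: lt_docs  TYPE STANDARD TABLE OF ty_doc,
        lt_names TYPE STANDARD TABLE OF ty_name,
        lv_name  TYPE c LENGTH 35.

  FIELD-SYMBOLS <ls_doc> TYPE ty_doc.

  " the pattern to avoid: one database access per loop iteration
  LOOP AT lt_docs ASSIGNING <ls_doc>.
    SELECT SINGLE name1 FROM kna1 INTO lv_name
           WHERE kunnr = <ls_doc>-kunnr.
  ENDLOOP.

  " one bulk read instead
  IF lt_docs IS NOT INITIAL.    " FOR ALL ENTRIES with an empty table selects everything
    SELECT kunnr name1 FROM kna1 INTO TABLE lt_names
           FOR ALL ENTRIES IN lt_docs
           WHERE kunnr = lt_docs-kunnr.
  ENDIF.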

 

Not using mass update or insert

Some people tend to micromanage things. But there are folks that take it way too far.

I assure you that the compiler and the database can manage things well on their own. You can even go a step further and trust them with inserting into or updating a database table from an internal table in just one command! Really. I promise you, they will not lose anything important. There is a reason databases do not have a lost property office...
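A minimal sketch, assuming a custom transparent table zmy_table (placeholder) and a matching internal table:

  DATA lt_rows TYPE STANDARD TABLE OF zmy_table.
  FIELD-SYMBOLS <ls_row> TYPE zmy_table.

  " instead of inserting row by row ...
  LOOP AT lt_rows ASSIGNING <ls_row>.
    INSERT zmy_table FROM <ls_row>.      " one statement per row
  ENDLOOP.

  " ... hand the whole internal table over in one command:
  INSERT zmy_table FROM TABLE lt_rows.
  " UPDATE zmy_table FROM TABLE lt_rows works the same way.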

 

Committing too often

An excellent strategy for preserving the turning hourglass is to commit after every single interaction with the database.

It is important to take your time...
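Sticking with the placeholder table from the previous sketch, this is the difference I mean:

  " the hourglass preservation programme:
  LOOP AT lt_rows ASSIGNING <ls_row>.
    INSERT zmy_table FROM <ls_row>.
    COMMIT WORK.                         " one database commit per row
  ENDLOOP.

  " one commit after the logical unit of work is complete:
  INSERT zmy_table FROM TABLE lt_rows.
  COMMIT WORK.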

 

Take care

Dmitrii