NonTech | Why software can run slow – Part 4/5 | Data structure

Welcome to one of my NonTech Posts. This type of post is targeting a non-developer audience and tries to explain specific Topics on a high-flier level.

 

When reduced to a minimum, an application transforms data from one form into another.

Algorithms are very important for that, but algorithms are designed to work with data. And data is structured.

How an application structures it‘s data has a great influence on it‘s performance. But why?

Imagine two trucks, both need to transport cars from Point A to Point B.

Both have the same amount of cars to transport.

The difference is that the first Trucker did not manage to load all of his cars onto his truck for his first run, he must drive two times to transport all of his cars.

The second Trucker does, so he needs to drive only once. He managed to structure his payload better and is therefore faster.

Now imagine the same situation with two applications. Both request data from RAM, but the first application has a data structure that allows it to fetch only once.

The second application does not access such a tight packaged data structure and therefore must either request twice, or fill up the internal Bus system to get that data.

What is a Bus System? Oversimplified, it is what connects your RAM, CPU and all other components. Think of it as the highway, which you have to take to move data around.

In a modern computer, CPU Power is not what holds you back. Actually your CPU is ridicilously fast.

The problem is that your applications do not manage to get that data through the Bus System fast enough. Your CPU typically waits 50%-75% of runtime for data.

So the goal is to design data structures which have a very high information density.

This is especially important in regard to the very small but fast L1 and L2 Caches of your CPU.

What are L1 and L2 Caches?

Think of it as very small, fast and expensive Memory right next to the CPU. L1 Cache is smaller, faster and more expensive than the L2 Cache. It also is closer to the CPU.

The Goal is to keep the data you touch often in a L1 or L2 Cache, because RAM access is 100 times slower than an access to L1 Cache.

If your data structure is chubby and has a low information density, you will not be able to fit it in one of these very fast caches and performance will suffer.

Your application will then move data in and out of RAM and thereby not only fail to provide the CPU with data in time, it will also fill that Highway up so no other processes can get their data through as well (Think of really bad L.A. Traffic).

As a side note your data structures also influence your algorithms. If your data structures are chubby, your algorithms will have to do more work to access all of the relevant data (or change it).

Leave a Reply

Your email address will not be published.