Understanding how .NET memory allocation works and why it is so (dangerously) fast

If you need to improve the performance of .NET applications, then at some point you will need to understand how .NET memory management works. Both memory allocation and garbage collection are frequent sources of performance problems, usually because developers do not fully understand how they work.

In fact, .NET allocates new objects so efficiently that it reinforces the idea that developers should not worry about allocation at all. Unfortunately, the more allocations you make, the more collections you will need, and that can become a performance problem.
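The link between allocation volume and collections can be observed directly. The following is a minimal sketch, not a benchmark: the class name, the method name, and the allocation counts are hypothetical choices for illustration, and how many collections you see depends on the runtime, the GC mode, and the machine.

```csharp
using System;

static class AllocationPressure
{
    // Stored in a static field so the allocations are not optimized away.
    static object _sink;

    // Allocates `count` small arrays and returns how many gen0 collections ran meanwhile.
    public static int Gen0CollectionsWhileAllocating(int count)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < count; i++)
        {
            _sink = new byte[64]; // becomes garbage on the next iteration
        }
        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        Console.WriteLine("gen0 collections, 1,000 allocations:      "
            + Gen0CollectionsWhileAllocating(1000));
        Console.WriteLine("gen0 collections, 20,000,000 allocations: "
            + Gen0CollectionsWhileAllocating(20000000));
    }
}
```

Each individual allocation is cheap, but the second call should report noticeably more gen0 collections than the first, because every allocation consumes part of the gen0 budget.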

This post was inspired by the excellent book Writing High-Performance .NET Code (now in an excellent second edition, which I strongly recommend reading).

What is so innovative about how allocation works on .NET

There are notable differences between how typical native heaps work (like in C++) and how CLR heaps work.

Here is the minimum you need to know about how native heaps work in order to appreciate the .NET GC heap implementation:

  • The native Windows heap implementation maintains free lists to know where to put new allocations.
  • Long-running native applications frequently struggle with fragmentation.
  • The time spent in memory allocation gradually increases as the allocator spends more time looking for open spots.

It is common for “native” developers to replace the default implementation of malloc with custom allocation schemes that work hard to reduce fragmentation.

If you want to learn more about how allocation works on Native Heaps, I recommend this page on StackOverflow.

.NET memory allocation works differently.

  • When you create an object instance, it is usually placed at the end of a memory segment, and the allocation takes only a few “cheap” instructions.
  • There is no need to traverse a “free list,” and there is (almost) no fragmentation.
  • GC heaps have improved locality. Since objects allocated close together in time are also placed close together, they tend to be near each other on the heap.
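The points above describe what is often called bump-pointer allocation. The sketch below is a simplified model under stated assumptions, not the CLR's actual code: `BumpSegment` is a hypothetical name, addresses are plain integers, and a full segment simply returns -1 where a real runtime would fall back to a slower path.

```csharp
using System;

// Toy model of allocating at the end of a segment.
sealed class BumpSegment
{
    private int _allocPtr;        // address where the next object will be placed
    private readonly int _limit;  // end of the segment

    public BumpSegment(int size)
    {
        _allocPtr = 0;
        _limit = size;
    }

    // One addition and one comparison: no list to search.
    public int Allocate(int size)
    {
        int newPtr = _allocPtr + size;
        if (newPtr > _limit) return -1; // segment exhausted
        int address = _allocPtr;
        _allocPtr = newPtr;             // "bump" the pointer past the new object
        return address;
    }
}

static class BumpDemo
{
    static void Main()
    {
        var segment = new BumpSegment(64);
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine("allocation " + i + " -> " + segment.Allocate(16));
        }
    }
}
```

Objects allocated one after another receive adjacent addresses, which is where the locality benefit comes from, and the cost of an allocation does not depend on how many objects were allocated before it.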

From the Book of the Runtime:

The managed heap is a set of managed heap segments. A heap segment is a contiguous block of memory that is acquired by the GC from the OS. The heap segments are partitioned into small and large object segments, given the distinction of small and large objects. On each heap the heap segments are chained together. There is at least one small object segment and one large segment – they are reserved when CLR is loaded.

There’s always only one ephemeral segment in each small object heap, which is where gen0 and gen1 live. This segment may or may not include gen2 objects. In addition to the ephemeral segment, there can be zero, one or more additional segments, which will be gen2 segments since they only contain gen2 objects.

There are 1 or more segments on the large object heap.

A heap segment is consumed from the lower address to the higher address, which means objects of lower addresses on the segment are older than those of higher addresses. Again there are exceptions that will be described below.

Heap segments can be acquired as needed. They are deleted when they don’t contain any live objects, however the initial segment on the heap will always exist. For each heap, one segment at a time is acquired, which is done during a GC for small objects and during allocation time for large objects. This design provides better performance because large objects are only collected with gen2 collections (which are relatively expensive).

Heap segments are chained together in order of when they were acquired. The last segment in the chain is always the ephemeral segment. Collected segments (no live objects) can be reused instead of deleted and instead become the new ephemeral segment. Segment reuse is only implemented for small object heap. Each time a large object is allocated, the whole large object heap is considered. Small object allocations only consider the ephemeral segment.
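The split between small and large objects described in the quote can be observed from managed code. This is a small sketch with assumptions: it relies on the default large object threshold of 85,000 bytes (configurable on newer runtimes) and on the fact that `GC.GetGeneration` reports large objects as gen2. The class and method names are hypothetical.

```csharp
using System;

static class SmallVsLargeObjects
{
    // Allocates a byte array of the given length and reports its generation.
    public static int GenerationOfNewArray(int length)
    {
        byte[] data = new byte[length];
        return GC.GetGeneration(data);
    }

    static void Main()
    {
        Console.WriteLine("64-byte array:      gen" + GenerationOfNewArray(64));
        Console.WriteLine("100,000-byte array: gen" + GenerationOfNewArray(100000));
    }
}
```

A freshly allocated small array normally starts in gen0, while the large array is reported as gen2 immediately, which is consistent with large objects being collected only during gen2 collections.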

Show me the code

Let’s dig into the .NET allocation process with a straightforward example. Consider the following program:

class Foo
{
    private int X;
    private int B;
}
class Program
{
    static void Main(string[] args)
    {
        new Foo();
    }
}

Let’s debug it using WinDbg.

sxe ld clrjit
g
.loadby sos clr
!bpmd Allocations.exe Program.Main
g

Here is the main method (JITed).

Note that the actual addresses will be different each time you execute the program.

The relevant part is:

mov ecx, 3394D9Ch
call 031830f4
ret

The value 3394D9Ch in this execution is the address of the method table for Foo. Let’s check it:

!dumpmt -md 3394D9Ch

This is the method table:

Stepping into the call, we reach this code (the JITed allocation code that runs for new Foo()):

The relevant part:

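mov eax,dword ptr [ecx+4]    ; ds:002b:03394da0=00000010 (eax = object size read from the method table)
mov edx,dword ptr fs:[0E28h] ; edx = this thread's CLR data, read from thread-local storage
add eax,dword ptr [edx+40h]  ; eax = current allocation pointer + object size
cmp eax,dword ptr [edx+44h]  ; compare against the allocation limit
ja 0318310f                  ; past the limit: go to the slow path
mov dword ptr [edx+40h],eax  ; store the advanced allocation pointer
sub eax,dword ptr [ecx+4]    ; eax = start address of the new object
mov dword ptr [eax],ecx      ; write the method table pointer into the object
ret                          ; return the new object's address in eax
jmp clr!JIT_New (730b7d40)   ; slow path, the target of the ja above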

In summary:

  • The allocation code reads the size of the new object from the method table dumped above (10h = 16 bytes).
  • [edx+40h] holds the address where the new object will be stored.
  • [edx+44h] holds the end of the memory currently available for new .NET objects.
  • If there is not enough space (the new object would go past that limit), the CLR takes a slower allocation path (jmp clr!JIT_New (730b7d40)).
  • If there is enough space, the allocation pointer is advanced, the method table pointer is written into the new object, and the work is done.
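The 16-byte size read from the method table can be cross-checked from managed code. The following is a sketch with assumptions: it uses `GC.GetAllocatedBytesForCurrentThread`, which is available on .NET Core 2.0 and later and on .NET Framework 4.8, and `ObjectSizeProbe` is a hypothetical helper. `Foo` mirrors the class from the example, with public fields only to avoid compiler warnings.

```csharp
using System;

public class Foo
{
    public int X;
    public int B;
}

static class ObjectSizeProbe
{
    // Returns the average number of bytes allocated per Foo instance.
    public static long BytesPerFoo(int count)
    {
        // Allocated before measuring; storing the objects keeps them reachable
        // so the allocations cannot be optimized away.
        Foo[] keep = new Foo[count];

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < keep.Length; i++)
        {
            keep[i] = new Foo();
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(keep);
        return (after - before) / count;
    }

    static void Main()
    {
        Console.WriteLine("pointer size:  " + IntPtr.Size + " bytes");
        Console.WriteLine("bytes per Foo: " + BytesPerFoo(100000));
    }
}
```

In a 32-bit process I would expect about 16 bytes per object (a 4-byte header, a 4-byte method table pointer, and two 4-byte fields), matching the 10h seen in the debugger. In a 64-bit process the header and the pointer are 8 bytes each, so the figure should be about 24.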

Conclusions

.NET is extremely efficient at allocating objects (especially compared with the standard implementation of the malloc function). That is great!

The downside is that, sooner or later, objects will need to be discarded. The GC is efficient as well, but only if your code behaves according to its design.

Cover image: Max Lakutin

Do you need help improving the performance of your code? Let me know; I can help you.

1 Comment
  1. Thiago Borba

    Hi Elemar,

    Great post! I would like to ask what your thoughts are regarding immutability in functional thinking, where more objects will be allocated (probably more fragmentation, gen0 collections, GC pressure, etc.). And what about immutability of large objects?

    Best Regards,
    Thiago Borba
