I’d like to get back to writing about things that will, hopefully, improve your understanding of XNA and C#, so I thought I’d discuss something fundamental to the performance and resource usage of your application: garbage. This is a topic which many C# programmers can probably get away without fully understanding – and many probably do – but to be able to write efficient and performant code, you need to know what’s going on under the hood.
Those of you who really want the nitty gritty can probably skip these articles and read this one, this one, and (for the Compact Framework, used on Xbox 360) this one, but they can be pretty heavy going. I aim to provide a more abstract explanation, and will be always trying to emphasise the implications for your game code.
We should probably start at the beginning. Computer programs are all about manipulating data, and data is stored in memory. A huge variety of data structures have been developed to control that memory, but at the highest level (and with particular importance for C#) some memory is set aside for the stack, and the rest is used by the heap. (There is also constant memory and static memory, where your constants and statics are stored).
The stack is strictly limited in size, and it’s the area of memory set aside for your code to run in. Every time you call a function, the stack grows and information necessary for the running of the program is stored in it, including the return address (so the function knows where to go back to when it completes), function parameters, and local variables. It is possible to write software that only uses the stack, and in fact safety-critical software usually does exactly that – the last thing you need in your nuclear reactor control code is a memory allocation failure. On the other hand, the fact that the stack is of limited size makes some algorithms a bad idea in software – in particular, recursion (where a function repeatedly calls itself) can make the stack grow very rapidly, potentially until it runs out of memory. It is usually therefore preferred to use iteration (a single function containing a loop) rather than recursion.
The heap is limited in size only by the total amount of memory available to the process, which may be close to the total amount of physical memory in the machine. Where data on the stack is temporary in nature, data on the heap can be more permanent. The flipside of the extra longevity is that the rules for controlling data in heap memory are more complex. While stack memory can be reclaimed as soon as the function using that part of the stack exits, heap memory must be allocated and deallocated at less predictable points in the code. In C++, for example, memory for a data object is explicitly requested using operator new, and released using operator delete; it is all too easy to forget to delete something, leading to a memory leak, or attempt to delete it twice, leading to (often) a crash. In addition, allocating and deallocating memory can be slow.
C# and other managed languages attempt to solve these problems by using a garbage collector to control the lifetime of data on the heap. This strategy makes it somewhat easier to write correct programs that do not leak and removes some classes of bugs entirely, but changes how code needs to be written to achieve good performance. In particular, allocating memory in a managed environment is much quicker – but while unallocating it is done automatically and is more likely to be done correctly, it can cause severe performance problems if you’re not careful. Hence why garbage collection is mentioned so often in discussions about C# in general, and XNA in particular (games being more performance-sensitive than a typical software application).
Basically (the articles above go into excruciating detail) data on the heap is stored in a single block. Every time an allocation is requested, the garbage collector expands the size of the block by the necessary amount and returns a reference to that data. Note that this is very fast – it’s literally just incrementing a number (the size of the heap), and that objects allocated at the same time exist next to one another in memory (which can bring performance benefits due to data locality). Also note that, by the nature of software, the reference that is returned will be stored somewhere – either on the stack (in a local variable perhaps) or somewhere else on the heap (as a member variable of an object that exists on the heap).
You can see from the diagram how this works. The stack will grow and shrink in size as functions are called and exited, while the heap will continually grow in size. References, on the stack or the heap, point at data on the heap (you never get a reference to data on the stack – there’s more to say about that later). If you look closely you’ll notice that if you start with a reference on the stack (a “root”) to the object it references, and then each reference in the object to the others in turn, you can build a list of which objects are still (directly or indirectly) referenced by a root, and which are not.
(Note that in the real world, static data is also a root, and heap objects can themselves act as roots under certain circumstances as well; but we’ll keep things simple in this discussion).
For example, you can see that stack object S2 references heap object H4, which in turn references H2 and H5. However, although H3 references H7, neither H3 nor H7 themselves are referenced by anything on the stack (the function that allocated them must have exited). They are therefore said to be “unrooted” and have been coloured slightly differently to show that. Think about what this means. The stack represents our running program. So if a heap object is unrooted, it must mean that nothing in our running program has any way of affecting, or being affected by, that heap object. It’s no use to us any more.
I mentioned that the heap will continually grow in size. This is clearly something that cannot be allowed to continue forever! At some point we must reclaim some memory. That point will come when we request another memory allocation, and the garbage collector decides that it can’t allocate any more without making room. It might do this when it has literally run out of spare memory, but will usually do it earlier, after some value of memory has been allocated. (This value varies on Windows and Xbox).
This point is important because it has several implications. Firstly, garbage collections cannot happen at any arbitrary moment. They only occur when you request more memory – if you don’t create any more heap objects using new, you will not get a garbage collection happening. This may mean garbage collections are unpredictable, but they are fully deterministic. Secondly, garbage collections happen on the thread that is allocating the memory, inside the new operator, not on a separate “garbage collecting” thread. This means that new is almost always very fast, until a garbage collection is necessary at which point it is very slow (then goes back to being very fast again).
When the garbage collection occurs, it builds a list of rooted and unrooted objects, very much as I described above – starting with roots on the stack, and following references through objects on the heap. It won’t visit any object more than once, so is efficient and can cope very happily with circular references (which, for example, smart pointers in C++ often do not). On Windows, it uses a technique called “generational garbage collection” to speed things up still further. This is incredibly clever, but not available on Xbox so I won’t go into detail here; again see the links above if you’re interested. Once it has worked out which objects are still rooted, it shuffles them together in memory, copying over the unrooted ones. This frees up more space so program execution can continue. Here’s what it will look like after the garbage collector runs on the memory in the diagram above.
Again there are some important consequences of this. We’ve already seen that allocating a new object is (almost always) very fast. However, we can begin to understand that having lots of references in our data means the garbage collector has to follow more links when working out what is rooted and unrooted, which is a small performance cost. It’s also more likely we’ll forget to set a reference to null, so objects may still be rooted when we actually don’t care about them any more. (Note though that a C++-style “true” memory leak is impossible). On the plus side, we never suffer from fragmentation, and we can see that objects allocated together (for good memory locality) will stay together due to the way they are shuffled. Less happily, objects that aren’t actually needed any more could stay in memory for a long time, until the next garbage collection, so our software is more memory hungry than may strictly be required.
Most of all, though, we can begin to understand why a garbage collection is slow. When the collector decides that a collection is necessary (when new is called and certain conditions have been met), it must: (1) Wait for other threads to get to a safe state and halt them, so they don’t interfere. (2) Iterate through all roots, following all references, building a list of rooted and unrooted objects. (3) Shuffle the rooted objects together on the heap, writing over unrooted objects. (4) Iterate through all references in all objects on the stack and heap, updating the pointers they contain to the new locations in memory. (In fact there are a few more complications relating to finalisers and pinning that I won’t cover here). Then, and only then, can it allocate the new object and return it to your code. That’s a lot of work – and that is why it’s so easy for garbage collections to cause your game to run unplayably slowly.
I hope you will now have a better grasp of exactly why the garbage collector exists, what it does during a garbage collection, and why it does it when it does it. If you’re coming from a C++ background, you should probably feel relieved that you don’t have to worry about having to delete data yourself, though it would be only human to be a bit concerned about the fact you have so little control over when the garbage collection runs.
You do have some control, though. As I mentioned above – if you don’t call new to create an object on the heap, you won’t suffer a garbage collection. That’s probably the most important lesson to get from all this. Create new heap objects as infrequently as possible, and you will see great benefits in the performance of your game. One common and very sensible approach is to create all your heap objects at level start, keep them in memory throughout the level gameplay, then forcibly induce a garbage collection during the next level load, when no-one cares about performance. You may be able to think of other approaches for non-level-based games. But we’re getting ahead of ourselves – how do you control what is created on the stack, and what is created on the heap? I’ll talk about that in part two.
“640KB of memory is more than anyone will ever need.” – Bill Gates (attributed – but not true!)
PS. If you think garbage may be causing you problems, the CLRProfiler is your friend.