December 6, 2008


Filed under: Games Development,XNA — bittermanandy @ 12:04 am

Wow, what a time. After having barely played a game all year, I spent weeks playing Fable 2 (very good, though could have been even better), and I’m about to get cracking on Banjo Kazooie: Nuts and Bolts (the last game I worked on before leaving Rare) and Banjo Kazooie XBLA. I’ll be getting Tomb Raider Underworld and the new Prince of Persia for Christmas, and sometime in January, I’ll be surrendering every waking moment to Empire: Total War (seriously – I plan to take a week off work when it comes out). I’ve not so much as had a spare second to check out the games on Xbox Live Community Games, which probably makes me a terrible hypocrite who will be punished in the afterlife. Frankly it’s beyond a joke. I don’t even have enough time to play all of these games, all coming out within two months of each other – never mind time for my other hobbies!

Fortunately the other day I managed to squeeze in a few hours with XNA and spent the time fixing up a few odds and ends I’d been meaning to look at for a while. Specifically, I rearranged the structure of my engine and game code, for clarity and usability. It’s well worth emphasising a couple of points here: first, while I’ve built up this structure after a fair bit of experience with professional game engines and a lot of careful thought, there are undoubtedly other ways of doing it. I encourage you to hunt around, look at what others have done, and deeply consider what will work for you. Second, this is very much a work in progress. I will undoubtedly be tweaking things yet further as time goes by, and I’ve had to make a few decisions “blind” – I’m pretty sure they’re the right choices but it’s entirely possible I’ll reverse them later as experience dictates.

So. At the highest level, the XNA code I write is split into two general groups: engine code (shared code, usable by more than one game, written in such a way as to be reasonably general purpose without being so abstract as to be useless); and game code and assets (specific to one particular game, though sometimes code can start life here before being moved into the engine later). In an ideal world, all your code would exist in the engine, and there would be no game-specific code at all. Games would be differentiated only by their assets – the single executable built by the engine would search for an XML file or similar, that would define what other assets to load, and by a combination of scripting and data-driven design, the game would run with no unique code at all. Realistically, that’s not an achievable goal, and a game will always have game-specific code. The trick is to find a balance.

(First a plea. Don’t “write an engine”. The XNA Framework is so close to being an engine that such a task is pointless, and you’ll also find that no-one will use it – because you won’t have solved the problems you’d encounter if you wrote a game. Hobbyist DirectX programmers all set out to write an engine, and barely any of them ever complete a game. In fact, it wouldn’t be bad advice to write a game first, then only when it’s finished, go back and look at how it can be separated into engine and game code, ready for your next game).

Anyway, let’s have a look at the folder structure of the engine and the games in turn. You’ll want to store all your stuff in a single folder – mine are in Documents\Code\XNA – Visual Studio will encourage you to put everything under the Documents\Visual Studio 2008 folder, but that’s a really bad idea because when Visual Studio 2010 and XNA 4.0 come out you’ll just have to move everything, which apart from being generally inconvenient can be a royal pain if you’re using source control!

Engine Folder Layout

My engine is codenamed Kensei, and is therefore stored in the Kensei folder. Here, you will find the Kensei.sln Visual Studio solution file, as well as my Kensei.FxCop project (with CustomDictionary.xml) and a couple of handy batch files which I’ll come to in a minute. Everything else is stored in subfolders, corresponding to the code projects that are referenced in Kensei.sln (which in turn correspond to the assemblies that can be used by my games).

Kensei – the “core” project containing the majority of the code I write. In here exists my audio engine, camera management code, handy development utility code, command pattern implementation, and everything else I’ll come up with as I add features. Each logical unit is stored in a subfolder of this folder, so for example my audio code exists in Documents\Code\XNA\Kensei\Kensei\Audio – but it’s all built together in a single C# project called Kensei. (I’ve worked on games where each logical module had its own project. It just meant we ended up with fifty-odd projects and fifty-odd assemblies where one would do. It might make it easier to keep things decoupled, but adds headaches via complexity, especially as C# makes circular references awkward or worse. Don’t bother).

Kensei.EmbeddedResources – there are a few things in the Kensei engine that require specific assets. For example, Kensei.Dev uses a font for simple text and a shader for simple shape rendering. At first, these assets had to be copied into the content project of each game prototype I created but that quickly became a pain. The answer (as Shawn Hargreaves describes expertly) is embedded resources.

Kensei.Pipeline and Kensei.Pipeline.Runtime – XNA makes working with assets an absolute dream and most non-trivial games will extend the content pipeline to support custom asset types. Kensei.Pipeline extends the content pipeline itself, while Kensei.Pipeline.Runtime acts as a bridge between the content pipeline and the game – again, Shawn explains in full.

DebugContentBuild – it’s an inevitable fact that all code has bugs, so I followed Stephen’s advice and created a project that vastly simplifies the process of debugging my content pipeline extensions. Basically, running this project attaches to MSBuild.exe and runs the content pipeline on the assets in its content project – allowing me to step into my Kensei.Pipeline code and see where it’s going wrong.

(Other Code Projects) – one of the things I love about the XNA community is the easy willingness with which people share their work. Make use of it! I don’t want to spend my hobby time reinventing the wheel or rewriting something someone else has already done perfectly well (unless, of course, it’s the main differentiating point of my game, or something I have a specific interest in doing or learning. Always in-source your USPs, but don’t even attempt to do everything yourself!). Therefore, my Kensei engine solution contains projects for JigLibX, the Curve Editor Control, and a few other public components, as well as code supplied with various books I own. Which books should you own? Start here!

External – of course, some code isn’t supplied with source, but as a standalone assembly. For example, I use Perforce for source control (accept no substitute! It’s the best there is, period – it’s so good, I use it even though I don’t have to, and I’ve not yet come across any other source control I’d say that about) and the rather wonderful P4.NET lets me write the ability to work with Perforce into my tools. This allows my game editor to automatically check out assets that I work on, for instance. This, and other such assemblies, are all stored in the External folder (with a handy text file so I don’t forget where I got them from…).

Now, with all these projects and assemblies floating about, each built into their own build directory, finding them from the individual games becomes awkward! It’s much easier when all the engine assemblies are in the same place. With this in mind, I wrote the following PostBuild.bat batch file. Every project in the solution runs this as a post-build step (details in the script itself). It copies the output files of the project into the bin folder, so when I create a new game, I only have to look in that one folder for all my references.

@echo off
rem ------------------------------------------------------------------------
rem Run this script as a Post Build Step from every Kensei library to be
rem used by a project or game outside of the Kensei solution.
rem It will place all built files into solution\bin\platform\configuration.
rem This will make them easier to find when adding references.
rem ------------------------------------------------------------------------
rem Post Build Step:
rem call "$(SolutionDir)PostBuild.bat" "$(SolutionDir)" "$(PlatformName)" "$(ConfigurationName)" "$(TargetDir)" "$(TargetName)"
rem ------------------------------------------------------------------------

echo ---- Post Build Copy ----
echo Copying %4%5.* to %1bin\%2\%3...
xcopy   %4%5.*   %1bin\%2\%3   /Y   /I   /F

Finally, with all these assemblies floating around, it’s nice to be able to clear everything out and know that a build will start completely afresh. (Why does “Clean” do nothing on a C# project? Alright, Visual Studio handles C# a hundred times better than C++, but I still sometimes want to know that everything that isn’t a source file has been deleted). So I wrote a CleanAll.Bat (warning, use at your own risk, especially if you make use of folders named ‘obj’ or ‘bin’):

@echo off
rem Script to completely start afresh without *anything* still hanging around from previous builds.
rem This is useful to, for example, check what would happen if it was checked out on a clean machine.

echo --------------------------------
echo Delete Visual Studio local files
echo --------------------------------

del /Q /S *.csproj.user
del /Q /S /AH *.suo

echo ------------------------------------
echo Delete intermediate and output files
echo ------------------------------------

FOR /R %%i IN (*) DO IF EXIST "%%i\..\obj" rmdir /Q /S "%%i\..\obj"
FOR /R %%i IN (*) DO IF EXIST "%%i\..\bin" rmdir /Q /S "%%i\..\bin"


That’s pretty much it for the Kensei engine (well, more or less, as you’ll see in a moment). After seven or eight years of playing around with Visual Studio and trying to arrange large game projects in a way that preserves my sanity, I’m pretty confident that this layout is the easiest to work with.

Game Folder Layout

The whole point of the above is to make creating new game prototypes quick and easy. I’ve got ten or twelve game ideas in my head right now, although Pandemonium remains my main focus, and every time I get a new one, I like to be able to spend one evening slapping something together that just works with a SpriteBatch (for 2D) or Kensei.Dev.Shapes (for 2D and 3D) without spending any time at all plugging the lower-level stuff together. So, all my games exist in the Documents\Code\XNA\Games folder, and reference assemblies in the Documents\Code\XNA\Kensei\bin and External folders. Taking Pandemonium as an example, Documents\Code\XNA\Games\Pandemonium contains (as you’ll have come to expect) the Pandemonium.sln Visual Studio solution, and the Pandemonium.FxCop file (and CustomDictionary.xml) as well. Then there are the following projects in the following folders – again you should be able to guess what they are, and they should be fairly self-explanatory when read in combination with the above:

Pandemonium – the game itself. Of course, the content project and all assets are stored in here too.
Pandemonium.Pipeline – content pipeline extensions for the game.
Pandemonium.Pipeline.Runtime – a bridge between the two.

It really is as simple as that, and if I were just writing a quick prototype there would probably only be the first one.


There’s one project I didn’t mention above in the Kensei engine. I’ve only just started doing things this way, and although so far it’s working well I’m a little concerned it may prove inflexible in the long run. I’d encourage you to experiment with it – after all, if a particular game finds it hard work fitting into this model, just don’t use it. It will save you a lot of time on those games that work alright with it.

Basically, I’ve created a project named Kensei.Game, and it contains a single class, Kensei.Game, which derives from Microsoft.Xna.Framework.Game. (Unimaginative names, I’m afraid…) If you’re wondering why it’s a separate project and not part of the Kensei project, that’s because it uses code from all the different assemblies and it was easier to keep it apart than try to prevent them from referencing one another in a circular manner. The idea is that all of my games can inherit from Kensei.Game instead of (more directly) Microsoft.Xna.Framework.Game, and Kensei.Game can handle all of the things that each individual game would otherwise have to do for itself.

Allow me to provide an example. Kensei.Game provides the following (sealed!) implementation of the Draw method: 

/// <summary>
/// This is called when the game should draw itself.
/// </summary>
/// <param name="gameTime">Provides a snapshot of timing values.</param>
protected sealed override void Draw( GameTime gameTime )
{
    // Prepare the game for drawing

    if ( Kensei.Dev.Options.GetOption( "Draw.Wireframe" ) )
    {
        GraphicsDevice.RenderState.FillMode = FillMode.WireFrame;
    }

    if ( Kensei.Dev.Options.GetOption( "Draw.CullModeNone" ) )
    {
        GraphicsDevice.RenderState.CullMode = CullMode.None;
    }

    Color clearColour = Color.Black;

    if ( Kensei.Dev.Options.GetOption( "Draw.CornflowerBlue", Kensei.Dev.Options.BehaviourIfNotPresent.ReturnTrue ) )
    {
        clearColour = Color.CornflowerBlue;
    }

    GraphicsDevice.Clear( ClearOptions.Target | ClearOptions.DepthBuffer, clearColour, 1.0f, 0 );

    GraphicsDevice.RenderState.DepthBufferEnable = true;
    GraphicsDevice.RenderState.AlphaBlendEnable = false;
    GraphicsDevice.RenderState.AlphaTestEnable = false;

    // Perform game-specific drawing

    DrawGame( gameTime );

    // Complete drawing

    Kensei.Dev.Manager.Draw( GraphicsDevice,

        Window.ClientBounds.Height );

    base.Draw( gameTime );
}

/// <summary>
/// This is called when the game should draw itself.
/// </summary>
/// <param name="gameTime">Provides a snapshot of timing values.</param>
protected abstract void DrawGame( GameTime gameTime );

What this means is that each individual game I write has much, much less to do in its (abstract) DrawGame method. I can’t forget to call base.Draw, and I always get the benefits of the same Kensei.Dev options and Kensei.Dev.DevText and Kensei.Dev.Shapes drawing – completely for free. The DrawGame method in Pandemonium is simply this (admittedly, it’s very much work in progress and there’s not much to draw yet, but you should get the idea of how little code there is to write per-game, or per-game-state when it’s a little more developed):

/// <summary>
/// This is called when the game should draw itself.
/// </summary>
/// <param name="gameTime">Provides a snapshot of timing values.</param>
protected override void DrawGame( GameTime gameTime )
{
    m_background.Draw( m_camPosition, ViewMatrix, ProjectionMatrix );
    m_angel.Draw( m_camPosition, ViewMatrix, ProjectionMatrix );
}

Update, Initialize and other methods are similarly simplified for the individual game, with as much complexity and common behaviour as is feasible moved into the engine.

Incidentally, when you have a base method that needs to do things before and/or after the derived method, I much prefer this pattern of sealing the base method (Draw here; in a class hierarchy of your own it could equally be a public non-virtual method) and having it call into protected virtual (or abstract) methods (DrawGame here). This is much better than having a single protected virtual function and relying on your more-derived class to remember to call the base version of the method. It’s too easy to forget, and too easy to get subtle bugs as a result! Of course, there is a cost. It may be, as I develop my games, that I discover I need to call some game-specific code after base.Draw (or base.Update, or what have you). That would become difficult or impossible in this pattern. If things turn out that way, I’ll have to change it. For now, I’m enjoying the simplicity of my game-specific methods. I can, quite literally, create a new XNA project, change the main class to inherit from Kensei.Game instead of Microsoft.Xna.Framework.Game, and I’ve got all the functionality of my Kensei engine instantly available. It’s very powerful.
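Stripped of the XNA specifics, the pattern looks something like the sketch below. EngineGame and MyGame are hypothetical names used only for illustration (they are not the real Kensei.Game), and the Console output simply stands in for the real rendering work:

```csharp
using System;

// Hypothetical stand-in for an engine base class; not the real Kensei.Game.
public abstract class EngineGame
{
    // Non-virtual, so derived classes cannot override it and the
    // "before" and "after" steps always run in the right order.
    public void Draw()
    {
        Console.WriteLine( "engine: clear and set render state" );
        DrawGame();
        Console.WriteLine( "engine: draw debug overlay" );
    }

    // The only piece an individual game has to supply.
    protected abstract void DrawGame();
}

public sealed class MyGame : EngineGame
{
    protected override void DrawGame()
    {
        Console.WriteLine( "game: draw background and characters" );
    }
}

public static class Program
{
    public static void Main()
    {
        EngineGame game = new MyGame();
        game.Draw();
    }
}
```

If a game ever did need to run code after the engine's final step, one possible escape hatch would be a second, empty virtual hook called at the end of Draw, though I haven't needed one so far.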

(My original plan was going to be even more powerful; I’d create a simple project using Kensei.Game, and with all the necessary references to the Kensei\bin assemblies, and use that as the basis of a Visual Studio Project Template. Then, when creating a new project, I’d select “New… Kensei Game” instead of “New… Microsoft XNA Framework 3.0 Game”. It nearly worked! Everything was perfect, in fact, except that the Export Template wizard didn’t pick up the content project. That’s very unfortunate, and means I can’t use this trick as I’d wanted, but if anyone knows a way around it, please let me know.)

I think that’s enough for now. I hope what I’ve written makes sense, and gives you some ideas on how to structure your own engine and games, code and assets. I don’t claim for a second that this is the only way of doing it, and in fact if you’ve got some better ideas, I’d love to hear about them – leave a comment. This has been a fairly high-level view of the problem and my solution to it, and there are so many tradeoffs to be made at a lower level – something for a future article perhaps, though I’ve tried a number of different structures over the years, and have heard of still more without trying them out yet, and have yet to settle on a favourite. Something to experiment with, for sure. Anyway – let me know if this was useful to you.


November 12, 2008

Easy Upgrade

Filed under: XNA — bittermanandy @ 10:08 pm

I was planning to spend the next couple of nights upgrading my Kensei engine and Pandemonium game to XNA 3.0, and making notes of the problems I encountered, so that I may write an article explaining how I overcame them and others may more easily overcome the same problems if they encounter them too.

Unfortunately, I can’t do that – because if I did, the article would consist of the following: “It just worked”.

Yes, it really was as easy as loading up VS 2008 and loading my existing solutions into it. It took a few minutes, to be sure, but I only had to press OK a few times and everything worked. It was so easy that I didn’t believe it at first – I did a batch build of all projects, then a batch rebuild of all projects, then double checked all the references to ensure it really was using XNA 3.0 and hadn’t just pretended to upgrade from 2.0. But it was true – the upgrade really did take zero effort.

So I thought I’d kick things up a notch – bam! – and investigate what Microsoft had seen fit to provide us with in an area that was sorely lacking in 2.0: installation and distribution of my game on Windows. Please understand that this is only a preliminary look at what’s offered, and I might easily have missed some subtleties. Basically you’ve got three choices:

Package as CCGame: this particular option seems to be quietly glossed over in the docs so I suspect it’s just a hangover from XNA 2.0. It’s got the same old drawbacks, in that the recipient has to have the XNA redistributable already installed, and it doesn’t offer any choice of installation location or creating shortcuts on the desktop or Start menu. However, on the plus side, the final output is a single compressed file. It’s much easier to give someone one file than several files and folders.

Setup Project: assuming you have VS 2008 Professional or higher, you can now create a VS setup project that creates an MSI file. (Instructions are also provided to use a third party installer such as Wise, should you so desire). This would also create a single file, as well as giving total control over installation folders, shortcuts, the ability to show licensing agreements; but the downside is that it doesn’t automatically recognise content project files as dependencies, which would need to be added individually. There could be thousands of these in a typical game. Not going to happen. You’ll drive yourself mad trying to keep it in synch with your game. Personally I’ll be avoiding this option like the plague.

ClickOnce: obviously the favoured choice at Microsoft, it’s got a lot of good things going for it. You can set version numbers (and make them update automatically), it installs the .NET and XNA prerequisites for you, you can massage which files get included and define download locations or whether to autorun the CD you burn it onto… brilliant. Plus you get all the expected shortcuts in the Start menu (but not the Vista Games folder, oddly). The downside? The final output isn’t a single file! It’s an exe, and a manifest, and a folder containing all your game files with the extensions changed to .deploy. I don’t really get what that’s supposed to save you. Sure, you can zip it up – but then your user has to open a zip file and run the right exe. You could just as easily zip your game from the bin folder and tell them to run the XNA installer separately, you’ve not saved anything.

Overall these options are a big leap forward from XNA 2.0 but I still wish they’d taken just one more step and either made ClickOnce create just one file, or made the Setup Project more intelligent. At least now there’s a better-than-evens chance that whoever you give your game to will be able to install and play it first time, but it’s still not quite as simple as it could be. It’s possible that I’m missing something, but where the upgrade from 2.0 to 3.0 was a perfectly smooth 10/10 experience, I’d still have to give the Windows distributable options no more than 8/10.

One thing I do like isn’t strictly an XNA feature at all, but part of Visual C# 2008. Select View -> Code Definition Window, and every time you select a symbol in your code (class, function, namespace, enum…) the window will update to show you the code definition of that symbol. It’s priceless and I officially love it. I’m looking forward to using all the other new features of the IDE and the language.

(Though bizarrely, the Open Containing Folder feature on the active item tab only worked the first time I used it, and never since. If anyone can tell me why I’d be very grateful as I’d use that all the time).

Garbage, Part Two (Director’s Cut): oh, alright then

Filed under: Tools and Software Development — bittermanandy @ 12:27 am

You twisted my arm.

Let’s very quickly go over what you should know already (remember, this isn’t necessarily a beginner’s guide to garbage, rather an attempt to explain why what you know about garbage is like it is, and what it means for your code):

1. Value types (ints, floats, and other in-built types, as well as any user-defined struct type, eg. Microsoft.Xna.Framework.Vector3 or a struct you define yourself) are created on the stack, so you don’t need to worry about garbage.

2. Reference types (any class) are created on the heap, so should be used carefully as they will be garbage collected, which can be bad for the frame rate of your game if it happens at an inconvenient time. Part one explains why, in more detail than you really need to know.

As you may be able to guess, it isn’t quite that simple.

First, it’s not completely accurate to say that value types are always created on the stack. They will be, if they are created within a function, but not if they are themselves a member of a class, in which case they will exist (as part of the class object) on the heap. It’s a minor and fairly obvious point but it’s important to be correct.

Second, it’s not completely accurate to say that value types never generate garbage. They can, of course, contain reference members, in which case creating a new struct object can indirectly create a new class object, which generates garbage exactly as though you’d created the class object yourself. It’s fair to say that this is a slightly unusual thing to do (I’m not sure I can think of an example?) but it’s entirely legal and something to watch out for.
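For what it's worth, here is one way it could happen. Squad is a made-up struct for illustration; the point is only that a struct holding a collection (or any other class instance) still causes a heap allocation whenever that member is created:

```csharp
using System.Collections.Generic;

// Hypothetical struct, for illustration only.
public struct Squad
{
    public int Id;                 // stored inline in the struct
    public List<int> MemberIds;    // a reference; the list itself lives on the heap

    public Squad( int id )
    {
        Id = id;
        MemberIds = new List<int>();   // heap allocation, even though Squad is a struct
    }
}

public static class Program
{
    public static void Main()
    {
        for ( int i = 0; i < 3; ++i )
        {
            Squad s = new Squad( i );  // each iteration allocates a new List<int>
            s.MemberIds.Add( i );
        }
        // The three lists are now unreachable, so they are garbage.
    }
}
```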

Third, remember how I said value types are created on the stack unless they’re members of a class, and how that was a minor point? Actually it’s not that minor. Value types contained within reference types are allocated on the heap. For example, arrays always catch out new C#/XNA programmers:

int a = 0;               // System.Int32 is value type. Allocated on the stack
int[] b = new int[5];    // Array is reference type! Allocated on the heap (garbage!)

Slightly confusingly, passing value types by reference (using ‘ref’ or ‘out’) does not “turn them into” reference types. It simply means that the object’s value can (or must, with ‘out’) be changed within the function and those changes are reflected in the original object that exists outside the function. It also means that no temporary object is created to act as the function parameter. For example, v1 below is a copy of another Vector3; creating a new object and copying onto it can be marginally slower than just passing by reference, as with v2, and the difference becomes more pronounced the larger the object. (Remember a reference is four bytes in size in 32-bit code like XNA, so any struct four bytes or smaller won’t benefit from passing by reference at all). For this reason, functions like Vector3.CatmullRom, that take many struct parameters, are provided in two flavours: one, which is more convenient to use, that takes (copies) the inputs by value and returns the result; and another, which is more performant, that takes the inputs by reference and the result is an out parameter. It’s a pattern you may want to use in your own code though 90% of the time you’ll just call the more convenient version.

void F( ref int a, out int b, Vector3 v1, ref Vector3 v2 )
{
    a = 1;        // No garbage here
    b = 2;        // Or here either
    m_v1 = v1;    // Or here… but v1 is a temporary
    m_v2 = v2;    // Or here… but v2 is passed by reference
}

Of course, a function that is handed a reference type can always change the object it refers to (that’s one thing I miss about C++: const!), so you will rarely need the ‘ref’ keyword for reference types, though it is legal. It only makes a difference when the function has to make the caller’s variable refer to a different object altogether. (FxCop rightly questions it though).
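The one case where ‘ref’ on a reference type changes anything is reassignment of the caller's variable. A small sketch, using StringBuilder purely as a convenient mutable class (the method names are made up for the example):

```csharp
using System;
using System.Text;

public static class Program
{
    // The reference is copied, but both copies point at the same object,
    // so changes made through it are visible to the caller.
    static void Append( StringBuilder sb )
    {
        sb.Append( " world" );
    }

    // Reassigning the parameter only changes the local copy of the reference.
    static void Replace( StringBuilder sb )
    {
        sb = new StringBuilder( "replaced" );
    }

    // With 'ref', the caller's variable itself is reassigned.
    static void ReplaceByRef( ref StringBuilder sb )
    {
        sb = new StringBuilder( "replaced" );
    }

    public static void Main()
    {
        StringBuilder text = new StringBuilder( "hello" );

        Append( text );
        Console.WriteLine( text );      // hello world

        Replace( text );
        Console.WriteLine( text );      // hello world

        ReplaceByRef( ref text );
        Console.WriteLine( text );      // replaced
    }
}
```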

There’s another difference between value and reference types – more specifically, between structs and classes – than simply where they live. Reference types can inherit from other reference types (and you therefore get inheritance, polymorphism, and all the other clever object-oriented stuff) while value types cannot inherit at all. And yet – you are probably aware that any and every type in C# is considered to ultimately derive from System.Object. You will undoubtedly have seen functions like this (particularly in .NET 1.x, before generics came along):

void G( object obj )
{
    // …
}

If obj can be anything, including a value type, like an int, a float, or a Vector3 – but value types can’t derive from anything – how come you’re allowed to call G( 3 )? Well, when such a call is made, a temporary object which does derive from System.Object and is known as a “box” is created on the heap, and a copy of the value type object placed inside it. Putting the value type in the box is called “boxing”, and taking it back out (via a cast) is known as “unboxing”. Critically, the temporary box object is garbage. This means that using value types in functions or containers that rely on type System.Object generates garbage. As a result, boxing is probably the second most common cause of unexpected garbage. To avoid it, avoid using or writing functions or classes that use System.Object – prefer generics instead – though you can still get caught out if you’re not careful, as my previous post on Reflector showed.
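A quick sketch of the difference, using the old non-generic ArrayList against the generic List<int>. The loop counts are arbitrary; the only point is where the boxes appear:

```csharp
using System.Collections;
using System.Collections.Generic;

public static class Program
{
    public static void Main()
    {
        // ArrayList stores System.Object, so every int added is boxed.
        ArrayList boxed = new ArrayList();
        for ( int i = 0; i < 100; ++i )
        {
            boxed.Add( i );             // allocates a box on the heap each time
        }
        int first = (int)boxed[ 0 ];    // unboxing requires a cast

        // List<int> stores the ints directly, so adding does not box.
        List<int> unboxed = new List<int>( 100 );
        for ( int i = 0; i < 100; ++i )
        {
            unboxed.Add( i );
        }
        int second = unboxed[ 0 ];      // no cast, no box
    }
}
```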

So, usually, you will want to use reference types (classes) for your data that stays in memory for a significant amount of time. You’ll want to be careful with when you allocate it (remember, a garbage collection can only ever happen on an allocation). Value types (structs) are most useful for small, lightweight data. Think about Vector3 – it only contains three floats and has a handful of methods. The XNA team could have written an abstract base class Vector and specialised it with Vector2, Vector3, Vector4, and VectorN, but what a piece of over-engineering that would have been! More to the point, using vectors (very common objects in 3D games) would have generated masses of garbage. As a struct instead of a class, Vector3 is much more elegant – you can write things like SetPosition( new Vector3( 1.0f, 2.0f, 3.0f ) ), a thousand times a frame if you like, and know that there’s no chance of garbage. On the other hand, your main character object is bound to be of class type. It’s likely to derive from things (interfaces if not classes) and needs to be kept in memory, which means on the heap.

This also implies that if you have an object of class type, you should keep it around if you’re likely to be able to reuse it. For example, this kind of trick is useful all over the place:

void F1()
{
    MyClass myObject = new MyClass( 5 );    // Bad! Generates garbage every call!
}

static MyClass myStaticObject = new MyClass( 5 );

void F2()
{
    myStaticObject.DoSomething();            // Good! Does not generate garbage!
}

Just a couple more things to think about, though I’m a bit short on space. Some types can be considered as “atomic”. Basically what that means is, that if they change, they become a different object. So a Vector3 is non-atomic – you can change v.X and you still have the same Vector3, just in a slightly different place. But if you have a user defined type PlayerDetails, and you change any of the fields, you’ve got a completely different “thing” to deal with:

PlayerDetails player = new PlayerDetails( "Andy", "Patrick" );
player.FirstName = "Fred";    // Look out! Now "Fred Patrick", that's not right!
player.LastName = "Bloggs";

The player details for me are fundamentally a different “thing” to the player details for Fred Bloggs. (Language fails me slightly, here, and I’m also not sure I’ve picked a great example). Furthermore, if anything unexpected happens in the middle – the object gets viewed on another thread, or setting LastName throws an exception – the object can be seen in an invalid state. To prevent this, such atomic types should be immutable (I’m just not going to stop linking to those books!), which means you can’t change any single field, and once an object has been created, it never changes. This is really good software engineering (it allows you to write more correct and more secure code) but can lead to surprising results:

string message = "Player ";
message += playerNum.ToString();
message += " wins by ";
message += points.ToString();
message += " points!";

That short code snippet generates not one, but seven objects: the final message object contains “Player 1 wins by 6 points!” for example, while the strings “Player “, “1”, “Player 1”, “Player 1 wins by “, “6”, and “Player 1 wins by 6” are all garbage! System.String is an immutable, atomic, reference type. This was absolutely the right choice by the .NET architects (strings in .NET are wondrous things of great beauty compared to strings in C++) but leads careless game programmers to watch helplessly as their frame rate plummets. Code like the above is probably the number one cause of unwanted garbage – watch out for it. If you must build strings piece by piece, use a StringBuilder object. (And, if you find yourself creating a type that is atomic and immutable, consider creating a non-atomic mutable MyTypeBuilder class to work with it).
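As a rough sketch of the StringBuilder approach, the same message could be built like this. The class layout, the s_message field and the BuildMessage name are my own choices for the example, and it won't necessarily remove every allocation (the final ToString still creates the result string), but the intermediate strings are gone:

```csharp
using System;
using System.Text;

public static class Program
{
    // Created once and reused, so the builder itself is not re-allocated per call.
    static readonly StringBuilder s_message = new StringBuilder( 64 );

    static string BuildMessage( int playerNum, int points )
    {
        s_message.Length = 0;           // reset the builder without allocating a new one
        s_message.Append( "Player " );
        s_message.Append( playerNum );  // Append( int ) overload, so no boxing
        s_message.Append( " wins by " );
        s_message.Append( points );
        s_message.Append( " points!" );
        return s_message.ToString();    // the final string
    }

    public static void Main()
    {
        Console.WriteLine( BuildMessage( 1, 6 ) );  // Player 1 wins by 6 points!
    }
}
```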

I know what you’re thinking. “It would generate less garbage if you didn’t call playerNum.ToString() and points.ToString(), and just passed in playerNum and points!” Nice try, but no. When one operand isn’t a string, the compiler turns the concatenation into a call to System.String.Concat, which takes System.Object parameters and calls ToString() on each of them. That means, if you passed playerNum into it, it would box playerNum and still generate the extra string – boxing and strings, the two worst garbage generators happening at once – you’ve made your garbage problem worse, not better!

There’s a lot to understand about garbage but I hope the last couple of articles have opened your eyes somewhat. I’ve approached it in a slightly unorthodox way – most writers will start with an article like this one, explaining the dos and don’ts, and probably never even cover the topics in the previous article which showed the whys and wherefores. Personally, I like to know what’s going on “under the hood”. You can memorise the rules relating to garbage and write good code, but until you understand the reasons behind the rules you might not write brilliant code. There is one important thing related to garbage I’ve not covered – that’s the Dispose pattern. This book (there it is again!) explains what it is and how to implement it, much better than I can; the only thing it doesn’t really cover is the detail of how objects waiting for disposal relate back to the mechanics of garbage collection I covered in Part One, so I might go over that at some point but I’m not promising anything.

I hope this appeases those of you who wanted a follow-up to Part One. I misjudged what people wanted and apologise for that. There is one way to make sure it doesn’t happen again – your feedback is always welcome, so let me know what’s useful, if anything wasn’t clear, and what you’d like me to write about in future (though I reserve the right to choose not to :-).

“There’s four and twenty million doors, on life’s endless corridor” – Oasis

November 11, 2008

Garbage, Part Two: anticlimax

Filed under: Tools and Software Development — bittermanandy @ 12:12 am

Oooooookaaaaaaaaaay, it seems that Part One wasn’t very interesting as nobody commented on it and not many people even read it. I personally think that the way C# manages garbage is fascinating and immensely clever. Perhaps I’m the only one!

So I was going to go into a similar level of detail for much more garbage-related stuff, and spread it out across three or four posts; but that seems a bit pointless given the previous response, a lot of effort for not much gain. I could instead write about it at a very high level, but you can find better resources elsewhere. Sorry… bit of a let-down this post. I’ll try to make up for it with the next one.

I’m planning to get Pandemonium (such as it is so far) running on XNA 3.0 over the next few days, and hopefully I’ll be able to write a bit about that. I’m looking forward to getting some actual gameplay up and running, not just the basic systems, which I intend to show as I go. I have a few other topic ideas too. Of course, if there’s anything you’d like me to cover, say the word. Check out the kind of things I’ve written about so far to see the kind of level I’m writing at (and for).

Back to normal good service soon…

October 19, 2008

Garbage, Part One: Stack and Heap

Filed under: Tools and Software Development — bittermanandy @ 1:06 pm

I’d like to get back to writing about things that will, hopefully, improve your understanding of XNA and C#, so I thought I’d discuss something fundamental to the performance and resource usage of your application: garbage. This is a topic which many C# programmers can probably get away without fully understanding – and many probably do – but to be able to write efficient and performant code, you need to know what’s going on under the hood.

Those of you who really want the nitty gritty can probably skip these articles and read this one, this one, and (for the Compact Framework, used on Xbox 360) this one, but they can be pretty heavy going. I aim to provide a more abstract explanation, and will always try to emphasise the implications for your game code.

We should probably start at the beginning. Computer programs are all about manipulating data, and data is stored in memory. A huge variety of data structures have been developed to control that memory, but at the highest level (and with particular importance for C#) some memory is set aside for the stack, and the rest is used by the heap. (There is also constant memory and static memory, where your constants and statics are stored).

The stack is strictly limited in size, and it’s the area of memory set aside for your code to run in. Every time you call a function, the stack grows and information necessary for the running of the program is stored in it, including the return address (so the function knows where to go back to when it completes), function parameters, and local variables. It is possible to write software that only uses the stack, and in fact safety-critical software usually does exactly that – the last thing you need in your nuclear reactor control code is a memory allocation failure. On the other hand, the fact that the stack is of limited size makes some algorithms a bad idea in software – in particular, recursion (where a function repeatedly calls itself) can make the stack grow very rapidly, potentially until it runs out of memory. It is usually therefore preferred to use iteration (a single function containing a loop) rather than recursion.
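To make the recursion-versus-iteration point concrete, here is a small sketch. Summing, SumRecursive and SumIterative are hypothetical names chosen for illustration. Both methods add up the numbers from 1 to n, but the recursive one needs a new stack frame per step, while the loop runs in a single frame however large n gets.

```csharp
public static class Summing
{
    // Recursive: every call pushes another frame (return address,
    // parameter, temporaries), so stack use grows in step with n.
    public static long SumRecursive(int n)
    {
        if (n <= 0) return 0;
        return n + SumRecursive(n - 1);
    }

    // Iterative: one frame and one local, regardless of n.
    public static long SumIterative(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += i;
        }
        return total;
    }
}
```

For small inputs the two are interchangeable. For a very large n the recursive version risks exhausting the stack, where the iterative one simply takes a little longer.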

The heap is limited in size only by the total amount of memory available to the process, which may be close to the total amount of physical memory in the machine. Where data on the stack is temporary in nature, data on the heap can be more permanent. The flipside of the extra longevity is that the rules for controlling data in heap memory are more complex. While stack memory can be reclaimed as soon as the function using that part of the stack exits, heap memory must be allocated and deallocated at less predictable points in the code. In C++, for example, memory for a data object is explicitly requested using operator new, and released using operator delete; it is all too easy to forget to delete something, leading to a memory leak, or attempt to delete it twice, leading to (often) a crash. In addition, allocating and deallocating memory can be slow.

C# and other managed languages attempt to solve these problems by using a garbage collector to control the lifetime of data on the heap. This strategy makes it somewhat easier to write correct programs that do not leak and removes some classes of bugs entirely, but changes how code needs to be written to achieve good performance. In particular, allocating memory in a managed environment is much quicker – but while deallocating it is done automatically and is more likely to be done correctly, it can cause severe performance problems if you’re not careful. That is why garbage collection is mentioned so often in discussions about C# in general, and XNA in particular (games being more performance-sensitive than a typical software application).

Basically (the articles above go into excruciating detail) data on the heap is stored in a single block. Every time an allocation is requested, the garbage collector expands the size of the block by the necessary amount and returns a reference to that data. Note that this is very fast – it’s literally just incrementing a number (the size of the heap) – and that objects allocated at the same time exist next to one another in memory (which can bring performance benefits due to data locality). Also note that, by the nature of software, the reference that is returned will be stored somewhere – either on the stack (in a local variable perhaps) or somewhere else on the heap (as a member variable of an object that exists on the heap).

You can see from the diagram how this works. The stack will grow and shrink in size as functions are called and exited, while the heap will continually grow in size. References, on the stack or the heap, point at data on the heap (you never get a reference to data on the stack – there’s more to say about that later). If you look closely you’ll notice that if you start from a reference on the stack (a “root”), follow it to the object it references, and then follow each reference in that object to the others in turn, you can build a list of which objects are still (directly or indirectly) referenced by a root, and which are not.

(Note that in the real world, static data is also a root, and heap objects can themselves act as roots under certain circumstances as well; but we’ll keep things simple in this discussion). 

For example, you can see that stack object S2 references heap object H4, which in turn references H2 and H5. However, although H3 references H7, neither H3 nor H7 themselves are referenced by anything on the stack (the function that allocated them must have exited). They are therefore said to be “unrooted” and have been coloured slightly differently to show that. Think about what this means. The stack represents our running program. So if a heap object is unrooted, it must mean that nothing in our running program has any way of affecting, or being affected by, that heap object. It’s no use to us any more.
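The walk from roots to rooted objects can be sketched in a few lines. This is a toy model, not how the CLR is implemented: HeapObject, Reachability and FindRooted are hypothetical names, and a "heap object" here is just a name plus a list of references.

```csharp
using System.Collections.Generic;

// Toy stand-in for an object on the heap (illustrative only).
public sealed class HeapObject
{
    public readonly string Name;
    public readonly List<HeapObject> References = new List<HeapObject>();

    public HeapObject(string name) { Name = name; }
}

public static class Reachability
{
    // Starts at the objects the roots point to and follows references
    // outward, visiting each object at most once.
    public static HashSet<HeapObject> FindRooted(IEnumerable<HeapObject> roots)
    {
        var rooted = new HashSet<HeapObject>();
        var pending = new Stack<HeapObject>(roots);

        while (pending.Count > 0)
        {
            HeapObject current = pending.Pop();
            if (!rooted.Add(current)) continue;   // already visited

            foreach (HeapObject next in current.References)
            {
                pending.Push(next);
            }
        }
        return rooted;
    }
}
```

Feeding it the part of the example described in the text (H4 as the object S2 points at, H4 referencing H2 and H5, H3 referencing H7) gives back H4, H2 and H5 as rooted. H3 and H7 never appear, because nothing reachable from a root leads to them.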

I mentioned that the heap will continually grow in size. This is clearly something that cannot be allowed to continue forever! At some point we must reclaim some memory. That point will come when we request another memory allocation, and the garbage collector decides that it can’t allocate any more without making room. It might do this when it has literally run out of spare memory, but will usually do it earlier, after some threshold amount of memory has been allocated. (This threshold differs between Windows and Xbox).

This point is important because it has several implications. Firstly, garbage collections cannot happen at just any moment. They only occur when you request more memory – if you don’t create any more heap objects using new, you will not get a garbage collection happening. Garbage collections may feel unpredictable, but they are fully deterministic. Secondly, garbage collections happen on the thread that is allocating the memory, inside the new operator, not on a separate “garbage collecting” thread. This means that new is almost always very fast, until a garbage collection is necessary at which point it is very slow (then goes back to being very fast again).

When the garbage collection occurs, it builds a list of rooted and unrooted objects, very much as I described above – starting with roots on the stack, and following references through objects on the heap. It won’t visit any object more than once, so is efficient and can cope very happily with circular references (which, for example, smart pointers in C++ often do not). On Windows, it uses a technique called “generational garbage collection” to speed things up still further. This is incredibly clever, but not available on Xbox so I won’t go into detail here; again see the links above if you’re interested. Once it has worked out which objects are still rooted, it shuffles them together in memory, copying over the unrooted ones. This frees up more space so program execution can continue. Here’s what it will look like after the garbage collector runs on the memory in the diagram above.

Again there are some important consequences of this. We’ve already seen that allocating a new object is (almost always) very fast. However, we can begin to understand that having lots of references in our data means the garbage collector has to follow more links when working out what is rooted and unrooted, which is a small performance cost. It’s also more likely we’ll forget to set a reference to null, so objects may still be rooted when we actually don’t care about them any more. (Note though that a C++-style “true” memory leak is impossible). On the plus side, we never suffer from fragmentation, and we can see that objects allocated together (for good memory locality) will stay together due to the way they are shuffled. Less happily, objects that aren’t actually needed any more could stay in memory for a long time, until the next garbage collection, so our software is more memory hungry than may strictly be required.

Most of all, though, we can begin to understand why a garbage collection is slow. When the collector decides that a collection is necessary (when new is called and certain conditions have been met), it must: (1) Wait for other threads to get to a safe state and halt them, so they don’t interfere. (2) Iterate through all roots, following all references, building a list of rooted and unrooted objects. (3) Shuffle the rooted objects together on the heap, writing over unrooted objects. (4) Iterate through all references in all objects on the stack and heap, updating the pointers they contain to the new locations in memory. (In fact there are a few more complications relating to finalisers and pinning that I won’t cover here). Then, and only then, can it allocate the new object and return it to your code. That’s a lot of work – and that is why it’s so easy for garbage collections to cause your game to run unplayably slowly.
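Steps (3) and (4) are easier to picture with a toy model. In the sketch below, the "heap" is a list of slots in allocation order, references are slot indices, and a bool array says which slots survived the marking in step (2). Slot, Compactor and Compact are hypothetical names, and none of this is the real collector's code. It only shows why moving objects forces every reference to be rewritten.

```csharp
using System.Collections.Generic;

// Toy heap slot: a name plus the indices of the slots it references.
public sealed class Slot
{
    public readonly string Name;
    public readonly int[] References;

    public Slot(string name, params int[] references)
    {
        Name = name;
        References = references;
    }
}

public static class Compactor
{
    // heap: slots in allocation order. live[i]: whether heap[i] is rooted.
    // Assumes a live slot only ever references other live slots.
    public static List<Slot> Compact(List<Slot> heap, bool[] live)
    {
        // Work out where each surviving slot will land after the shuffle.
        int[] newIndex = new int[heap.Count];
        int next = 0;
        for (int i = 0; i < heap.Count; i++)
        {
            newIndex[i] = live[i] ? next++ : -1;
        }

        // Slide survivors together, rewriting each reference as we go.
        var compacted = new List<Slot>(next);
        for (int i = 0; i < heap.Count; i++)
        {
            if (!live[i]) continue;

            int[] oldRefs = heap[i].References;
            int[] fixedRefs = new int[oldRefs.Length];
            for (int r = 0; r < oldRefs.Length; r++)
            {
                fixedRefs[r] = newIndex[oldRefs[r]];
            }
            compacted.Add(new Slot(heap[i].Name, fixedRefs));
        }
        return compacted;
    }
}
```

In the real thing, references held on the stack need the same remapping, which is part of why other threads have to be halted first. Every object that survives costs a copy plus a pass over its references, so the work grows with how much is still alive, not with how much garbage there was.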

I hope you will now have a better grasp of exactly why the garbage collector exists, what it does during a garbage collection, and why it does it when it does it. If you’re coming from a C++ background, you should probably feel relieved that you don’t have to worry about having to delete data yourself, though it would be only human to be a bit concerned about the fact you have so little control over when the garbage collection runs.

You do have some control, though. As I mentioned above – if you don’t call new to create an object on the heap, you won’t suffer a garbage collection. That’s probably the most important lesson to get from all this. Create new heap objects as infrequently as possible, and you will see great benefits in the performance of your game. One common and very sensible approach is to create all your heap objects at level start, keep them in memory throughout the level gameplay, then forcibly induce a garbage collection during the next level load, when no-one cares about performance. You may be able to think of other approaches for non-level-based games. But we’re getting ahead of ourselves – how do you control what is created on the stack, and what is created on the heap? I’ll talk about that in part two.
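Here is one way the "allocate at level start, collect during the load" approach might look. It is a sketch under assumptions: Bullet, BulletPool and LevelLoading are invented for the example, and a real game would size and structure its pools to suit. GC.Collect() is the standard .NET call for forcing a collection.

```csharp
using System;

public sealed class Bullet
{
    public float X, Y;
    public bool Active;
}

// Hypothetical fixed-size pool: every Bullet is created up front.
public sealed class BulletPool
{
    private readonly Bullet[] m_bullets;

    public BulletPool(int capacity)
    {
        m_bullets = new Bullet[capacity];
        for (int i = 0; i < capacity; i++)
        {
            m_bullets[i] = new Bullet();
        }
    }

    // Hands out an inactive bullet without calling new.
    // Returns null if every bullet is already in use.
    public Bullet Spawn(float x, float y)
    {
        for (int i = 0; i < m_bullets.Length; i++)
        {
            Bullet b = m_bullets[i];
            if (b.Active) continue;
            b.X = x;
            b.Y = y;
            b.Active = true;
            return b;
        }
        return null;
    }

    public void Despawn(Bullet bullet)
    {
        bullet.Active = false;   // back in the pool, nothing for the GC to do
    }
}

public static class LevelLoading
{
    // Meant to run behind a loading screen, where a pause does not matter.
    public static BulletPool LoadLevel(int maxBullets)
    {
        BulletPool pool = new BulletPool(maxBullets);
        GC.Collect();   // pay for the collection now rather than mid-gameplay
        return pool;
    }
}
```

During gameplay, Spawn and Despawn only flip a flag on objects that already exist, so they never trigger a collection by themselves. The trade-off is a hard cap on how many bullets can exist at once.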

“640KB of memory is more than anyone will ever need.” – Bill Gates (attributed – but not true!)

PS. If you think garbage may be causing you problems, the CLRProfiler is your friend.

October 18, 2008

There’s No “I” In “Team”

Filed under: Games Development,Personal — bittermanandy @ 12:57 am

I’m starting to get a little bit excited about “the new Xbox experience” and in particular (at last!) the live launch of the whole Community Games thing. It’s been a long time coming, and it still remains to be seen what solution XNA 3.0 will have for publishing games on Windows (which, in XNA 2.0, amounts to black magic and a reliance on extreme goodwill from your target audience); but I’m massively in favour of what Microsoft are trying to achieve here, and I still aim to get Pandemonium out there at some point.

There is one major issue that the system has that I really wish MS would do something to address. It’s a bit of a tricky one, to be sure, but the whole idea of making the 360 open to the hobbyist games development community hasn’t exactly been a walk in the park, so I’m sure they could do something if they tried.

The problem that Xbox 360 Community Games has is this: all games provide a free time-limited demo (so far so good) but all games must charge between 200 and 800 Microsoft Points for the full version. That is: you are not able to release your game for free.

Don’t get me wrong – money is a good thing! I don’t think many people will get rich from their Community Games, but there is definitely the potential for those people who put in a lot of effort to make a little bit of cash out of it. It’s actually quite nice to be allowed to charge the same kind of money as a full XBLA game, because some Community Games will be as good as the best XBLA games. (The worst XBLA games, unfortunately, are very poor).

However. I look at my own game: Pandemonium. There is nothing I would like more than to open up development of Pandemonium to other people. A few have even offered to help me out with various things, be it code (which I should be able to manage so long as I keep things simple, but more help would allow it to be a more complex game) or art (at which I am awful, and any help would be gratefully received, were it practical). If I were able to release Pandemonium for free, I would love to make it a team effort.

The problem is this. Pandemonium, when the time comes for it to be released on Xbox 360 Community Games, and like every other Community Game, will cost real money.

So… if I get the help of an artist, and Pandemonium sells a million copies and makes me rich (here’s hoping!), that artist would have every right to say: “hey. I helped you out there! Give me a share of the money.” To the best of my understanding, there is nothing in Xbox 360 Community Games to facilitate that. I could obviously “miss out the middle man” and transfer money into his bank account – but then, how much? The artist in question might say, “there’s two of us, so give me 50%” – but I might be of the opinion that I put in more time and effort and there wouldn’t even be a game without me, so only offer 20%. We would end up in dispute – and again, as far as I can tell Microsoft have washed their hands of the whole problem.

As the team grows larger, the problem grows worse. By the time you’ve got even five people contributing, you’re going to need to start writing legally binding contracts if you think your game will make any significant money. But then – I contract an artist to produce some character models and animations. Unfortunately, the results are awful and they don’t make it into the game. He did the work – he’ll want to get paid! But they didn’t make it into the game, so I’m unlikely to want to pay him.

It would also put obligations on me. The last few weeks, there have been (fun and exciting) things going on in my social life that have meant development on Pandemonium, and updates to this blog, have effectively frozen for a while. This is, after all, just a hobby project. If I’d contracted an artist to produce models, which he’d done – but then my distractions meant Pandemonium was delayed or unfinished, preventing him from getting any income – he might start getting pretty annoyed.

Real games teams have producers and management and business experts to handle all this kind of thing. (You see? I said “producers” without spitting. Aren’t you proud?) Hobbyist developers have nothing, and that’s fine when your games are free. You can make it clear from day one: “you want to contribute? That’s great! But this game is going to be free, so you won’t get any money, and neither will anyone else.” When you have to charge – there isn’t any choice in the matter, every game will be at least 200 Microsoft Points, of which up to 70% goes to the developer – it gets more complicated. Who gets the money? What share of it? What do they have to do to entitle them to that share? What happens if they don’t do it? Who do you complain to if you never get the payment that was agreed upon? How do you know the person making the payments is being honest – in fact, can you be sure you will ever get a payment at all?

In my opinion, if Microsoft are going to mandate that Community Games must be premium, they should also provide some kind of framework for answering these questions, and arbitrating on them. (The MS system need not be compulsory, but without it, where do you start?). I cannot, in all good conscience, accept offers of help on Pandemonium, because I cannot, with any level of honesty, promise to fulfil my side of the bargain (which would be much less of a problem with no money involved). Even if I did, and Pandemonium started generating income, I would be deeply uncomfortable with attempting to control who gets a share of what – and unwilling to be blamed if someone felt they received an unfair share. I would love to make and distribute Pandemonium on 360 for free, but that’s simply not an option. So, it will remain a one-man effort – with all the implications that has – and that’s a real shame. Xbox 360 Community Games will, I predict, be a massive success; but part of a community involves working together, as well as sharing the fruits of your labour, and there are very real obstacles to that under the current system.

I’d love to be told I’ve overlooked something in the literature, but I fear I haven’t.

“An army cannot be commanded from within. A nation cannot be governed from without.” – Sun Tzu

September 2, 2008

Tools of the Trade – Part Five: Pot Pourri

Filed under: Games Development,Tools and Software Development,XNA — bittermanandy @ 11:02 pm

I think that FxCop, Reflector, CLRProfiler, and PIX – all of which I’ve previously discussed in some detail – are the most useful, bread-and-butter tools you’ll use in your XNA development. (In fact, all but perhaps PIX are useful regardless of whether you’re writing games or applications). There are of course many, many more tools, each with their uses, and I’d like to summarise some of them here. The sign of a good craftsman is using the right tool for the job, so I’d encourage you to explore all the different options available to you. Put bluntly, if you’re spending all your time performing repetitive tasks, or going through endless tweak-test-tweak-test cycles to try and hunt down problems in your code, you’re wasting your time. Your time may not be worth a lot to you, but mine is worth a lot to me, so I for one am always looking out for new tools and I think you should too.

Your primary development tool is Visual Studio itself (the Express version of which is available as a free download – possibly the most amazing free thing ever). I’m going to assume you already have it or you’d not be coding in XNA! However, I can guarantee that you are not using it to its full potential. I know I’m not. The reason I can make this guarantee is that the full potential of Visual Studio is huuuuuuuuuge. Almost every day, certainly every week, I discover new things it can do and think “that’s amazing!” Start by keeping up with the Visual Studio Tip of the Day, learn how to write macros, take some time to explore for yourself (especially consider keyboard shortcuts), and look out for plug-ins and extensions too. Project Line Counter is a personal favourite.

We’ve already looked at a couple of profiling tools but Perfmon (free with Windows) is the daddy of them all. As an example, hit Start then Run, and type “perfmon”. When the tool loads, select “Performance Monitor”, right-click on the counter list at the bottom and select “Add Counters…”. Select, say, the “.NET CLR Memory” category, then, for example, “% Time in GC”. Choose your game as the selected object and click “Add >>” then “OK”. Hey presto! A line displaying the exact processor cost you are paying for garbage collection will be added to the graph. There are hundreds of counters like this one, and much more that Perfmon can do.

I’m a bit loath to mention this next one because, frankly, I hate it, but there are some bugs that can only be identified using WinDbg (or “windbag”, free with the Debugging Tools for Windows). Running out of memory and not sure why? Take a memory dump of your game, load sos.dll, call !DumpHeap -stat to see what’s live on the heap, call !DumpHeap -type <type> on the most memory-expensive type it lists to see all the items of that type, and call !GCRoot with the address of one of those objects to see exactly what is keeping it in memory and why. Sometimes there’s just no other way to work out what’s happening to your memory. WinDbg is an advanced tool and it’s an absolute swine to work with, but if the debugger in Visual Studio can’t solve your problem, WinDbg will.

I previously wrote about Reflector and described how it can reveal any assembly’s code to you. How does the code you write get translated from C# to the CLR and IL, and finally JIT-compiled into machine code? Well, I can’t help you with the JITter but IldAsm (free with Visual Studio) can provide a fascinating insight into the Intermediate Language stage of your code’s existence. Much in the same way that you don’t need to understand assembly language to write or use C++, but knowledge of assembly can help you fine-tune your C++ and fix the really tricky problems, knowledge of IL and an understanding of the translation process – while not essential – will make you a better C# programmer.

There’s a whole bunch of tools that are a bit more specialised or esoteric:

– Perforce is absolutely the best choice for source control. I can’t live without Perforce now, it’s as though it has become a part of me. It’s free for up to two users, though very expensive for larger teams than that (absolutely worth it if you’re a professional company, perhaps less so if you’re a group of hobbyists, in which case try Subversion).

– If you’re a pro developer and aren’t using continuous integration, you face months of torment in an endless death-march of crunch at the end of the project. Do yourself a favour and use CruiseControl .NET (free).

– Continuous integration becomes even more useful when a build is run against a set of unit tests, and in fact they’re useful for finding mistakes early which is good for anyone, pro or hobbyist alike. I’ve heard good things about NUnit (free)… do as I say, not as I do, and use it… not doing unit testing is my worst programming habit that one day I will get out of. Don’t fall into that trap.

– Perfmon can tell you when you’re slow on the CPU and CLRProfiler can tell you if it’s garbage at fault, but if not and you want to know which specific functions are slow (and you very often do!) NProf is the tool for you, and it’s free.

– Finally, I’ve not used it yet but RPM (Remote Performance Monitor for Xbox, free with XNA) looks to be pretty damn useful for working out why you’re running fine on PC but slow on 360.

The best thing about all of these tools? Like XNA itself, they’re all free. That’s the kind of money I’m OK with spending! It means you have no excuse for not becoming familiar with them and, hopefully, rather than staying up bleary-eyed until 5am trying to find the bug in your code, you can fire up the appropriate tool, find and fix the bug and be home in time to see your family and get a good night’s sleep. Everyone’s a winner!

There’s only one major category of tool I’m missing, and that’s a decent bug database. I’ve tried Bugzilla and OnTime, and at work we have to use Sunrise, and I hate all of them as well as some others. By far the best defect tracking system I used was Product Studio, when I was at Microsoft, but despite being brilliant it is only available with the Team System version of Visual Studio which is very expensive. If anyone can recommend a good, usable, simple bug database that is not web-based and has a good UI, please let me know.

In fact, undoubtedly many of you out there will have your own favourite tools. Why not share the love, leave a comment and let me and everyone else know which tools make your life easier?

“If the foreman knows and deploys his men well the finished work will be good.” – Miyamoto Musashi

September 1, 2008

Tools of the Trade – Part Four: PIX

Filed under: Games Development,Tools and Software Development,XNA — bittermanandy @ 9:44 pm

There is one more tool that I want to cover in a little bit more detail before presenting a round-up of the best of the rest (there really are so many good ones out there that this mini-series could last for months or years if I wrote a post for each one). The last article presented the CLRProfiler, a tool to help you manage your garbage and ensure it is being collected properly. Careless garbage collection is often the cause of poor performance on the CPU – but the CPU is only half the story, and to find out what’s causing poor performance on the GPU, you will need to use PIX (available free as part of the DirectX SDK).

Those of you who downloaded and used my Kensei Dev library might have noticed this comment in the Dev Shapes code:

// TODO I have noted some performance issues with this code when drawing very large

// numbers of shapes, but have not had time to profile it and fix it up yet, sorry!

Recently I had a bit of spare time so decided to go back and revisit this section. I set up a very simple test within Pandemonium, to draw lots and lots of spheres at random positions. I discovered that drawing 1000 spheres, or about 860,000 triangles (which doesn’t seem that many to me), caused the frame rate to plummet to only about 6Hz:

Lots of spheres!

Using tricks that I’ve covered and linked to previously it didn’t take long to determine that the GPU was the bottleneck. (For example, returning early out of the game’s Update method, therefore dropping CPU usage as close to zero as possible, had zero effect on the frame rate). So my next port of call was PIX itself.

PIX (originally an acronym for Performance Investigator for Xbox) is an immensely powerful tool and we’re only going to scratch the surface of it here. At the most basic level, you can think of it as a recorder for absolutely everything that happens on the GPU. You can see exactly when every single function that used the GPU was called, and exactly how long it took. You can even rebuild a frame of your game method call by method call, seeing the results rendered step by step, instead of within a sixtieth of a second.

In this case, I want to see which functions are taking so long within a frame. I therefore chose to sample a single frame, as all frames are likely to be pretty much equal in this case. (If, for example, I was seeing a generally solid frame rate with occasional stutters, I’d have had to have chosen a different option).



After starting the experiment and getting to a point where the frame rate was low, I hit F12 to capture a frame (this can take a second or two). After I’d shut down my game, PIX generated a report:

A PIX report

There’s quite a lot going on in this image so let’s take a look at each section in turn.

The top window shows a graphical timeline. It’s not obvious from this picture, but you’ll see it very clearly when you run PIX for yourself, that the bars on the timeline indicate time when the GPU and CPU are busy doing things. As you click along the timeline, the arrows indicate where the GPU and CPU synchronise to the same call. With some classes of performance problem, you’ll see big gaps in one or other processor – these indicate whether you are CPU or GPU bound. For example, if you are GPU bound, you’ll see gaps in the timeline for the CPU where it was waiting for the GPU to catch up. The red circle in the top right of the picture shows the range of calls within our sampled frame (which occurred about 48 seconds in) – it looks mostly empty in the screenshot, but zooming reveals more details.

The middle window shows the DirectX resources in use (remember, XNA is just a layer on top of DirectX) including pixel and vertex shaders, vertex buffers, surfaces and such like. Not of much interest to us at this point.

In the bottom right I’ve selected the Render window. This shows us a preview of the frame as it was constructed. As you advance the cursor along the timeline, this preview is updated – initially getting cleared to Cornflower Blue, then having more and more things drawn onto it. This can be invaluable for detecting overdraw, and is really interesting in its own right. One of my favourite features is the ability to “Debug This Pixel”, which shows every call that affected the colour of any given pixel in the frame. This kind of thing is very useful when investigating transparencies, occluders, quadtrees etc.

Finally, in the bottom left is a list of GPU events, in sequence. Here you can see every call made to the GPU during the sample (note how they are all Direct3D calls, as mentioned above). Using the timeline view, I was able to visually identify which function call was the most expensive. Clicking on that call in the timeline synchronised it in the events window. I’ve circled the call in question. You can see from the StartTime of each event that the call to IDirect3DDevice9::DrawPrimitiveUP took 107349677 nanoseconds, or 107 milliseconds. When you consider that a whole frame normally completes in just 17 or 33 milliseconds, this one function call taking 107ms is a massive limiting factor on my frame rate.
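The arithmetic behind that comparison is simple enough to write down. FrameBudget and its methods are hypothetical names used only for this sketch, and the figures come straight from the PIX capture described above.

```csharp
public static class FrameBudget
{
    // Time available per frame at a given target frame rate.
    public static double BudgetMs(double framesPerSecond)
    {
        return 1000.0 / framesPerSecond;
    }

    // PIX reports times in nanoseconds.
    public static double NanosToMs(long nanoseconds)
    {
        return nanoseconds / 1000000.0;
    }

    // Best possible frame rate if a frame takes at least this long.
    public static double MaxFramesPerSecond(double frameMs)
    {
        return 1000.0 / frameMs;
    }
}
```

BudgetMs(60) is roughly 16.7ms and BudgetMs(30) roughly 33.3ms, which is where the "17 or 33 milliseconds" comes from. NanosToMs(107349677) is about 107.3ms, and MaxFramesPerSecond of that is a little over 9, so this one call alone rules out anything better than about 9Hz before the rest of the frame is even counted.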

Using a combination of intuition, logic, common sense, and the Render window (clicking on the call previous to DrawPrimitiveUP removed all the spheres from the preview, so it’s obvious what it was drawing!) I identified the corresponding code in my XNA program:


        PrimitiveType.TriangleList, s_triangle3DVerticesArray,
        0, s_triangle3DVertices.Count / 3 );

You may not think this tells me very much. I already knew that the Kensei.Dev rendering code was slow, that’s why I fired up PIX in the first place! In fact, this is hugely valuable information. I know exactly which line of code is causing my GPU to run like a dog with no legs.

As this is a call to DrawUserPrimitives, it seems likely that the reason for this call being so slow lies in the User part of the method name. That is to say, the Kensei.Dev code builds up an array of vertices (s_triangle3DVerticesArray) each frame, and passes that into the function. This involves copying all 860,000 triangles from main memory into GPU memory every frame, in contrast to using a vertex buffer, which lives on the GPU. If I can find a way to use native GPU resources and avoid the User methods, I may get a substantial speed boost; on the other hand, the User methods exist for exactly the scenario I have here: vertices that are controlled by the CPU and can arbitrarily change position from frame to frame.

Alternatively, it was suggested on the XNA Creators forums that I may be expecting the GPU to do too much in one go, and that splitting up the calls into smaller batches may improve performance. This is somewhat contrary to my understanding of modern GPUs, which, I was led to believe, vastly prefer to perform fewer operations on larger datasets than more operations on smaller datasets; nevertheless I am far from a GPU expert so will be taking that advice, and experimenting with splitting the vertex array/buffer into smaller pieces to see if this improves matters.
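For anyone wanting to try the batching experiment, a minimal sketch of the splitting logic might look like the following. It is deliberately independent of XNA so it can be run anywhere: the TriangleBatcher name and the callback shape are assumptions of mine, and the batch size is something to find by measurement, not a recommendation.

```csharp
using System;

public static class TriangleBatcher
{
    // Calls draw(vertexOffset, primitiveCount) once per batch of at most
    // maxTrianglesPerBatch triangles, and returns the number of batches.
    // vertexCount is the total number of vertices in a triangle list
    // (three per triangle; any leftover vertices are ignored).
    public static int ForEachBatch(int vertexCount, int maxTrianglesPerBatch, Action<int, int> draw)
    {
        if (maxTrianglesPerBatch <= 0)
        {
            throw new ArgumentOutOfRangeException("maxTrianglesPerBatch");
        }

        int totalTriangles = vertexCount / 3;
        int batches = 0;

        for (int firstTriangle = 0; firstTriangle < totalTriangles; firstTriangle += maxTrianglesPerBatch)
        {
            int count = Math.Min(maxTrianglesPerBatch, totalTriangles - firstTriangle);
            draw(firstTriangle * 3, count); // offset is in vertices, count is in triangles
            ++batches;
        }

        return batches;
    }
}
```

In an XNA program the callback would wrap the real call, something like device.DrawUserPrimitives( PrimitiveType.TriangleList, vertices, vertexOffset, primitiveCount ), where device and vertices are placeholders for your own GraphicsDevice and vertex array. If you do this every frame, create the delegate once and reuse it, otherwise the experiment will generate garbage of its own.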

There are a few more possibilities as well. I’d like to say this story has a happy ending, but it doesn’t, at least not yet – I am hopeful for the future. I am still trying to solve this problem and find how to avoid this bottleneck. However, whenever investigating performance it is absolutely essential to base your observations and lines of inquiry on hard evidence. At the beginning of this article, I knew that “something in Kensei.Dev is slow”. PIX has since revealed that “DrawUserPrimitives is taking over 100ms to draw 860,000 triangles”. This will allow me to precisely focus my efforts, and hopefully find a correct, performant fix for the problem.

PIX has an awful lot more to offer than just single-frame samples and as your game nears completion you will probably find a lot of value in it. There are a lot of bugs that simply can’t be solved any other way, and if you are doing anything remotely clever with your graphics I strongly encourage you to learn about what PIX can do for you.

“My mistress’ eyes are nothing like the sun…” – William Shakespeare

August 20, 2008

Tools of the Trade: Part Three – CLRProfiler

Filed under: Tools and Software Development — bittermanandy @ 12:11 am

Over the last few days I’ve come to realise just how many excellent C#/XNA/programming tools there are out there that I use regularly, and therefore how large this mini-series will become if I continue with this pattern of one article per tool. Still, there remain a few that I think are important enough to talk about in detail, before I summarise the rest more concisely (though leaving open the possibility of returning to them later).

So by now, you will hopefully be keeping the compiler warning level as high as possible, checking your code for style-based or semantic errors with FxCop, and when necessary, checking how the assemblies you’re interacting with do the things they do with Reflector. Now, perhaps, you’re starting to wonder about what the code you’re writing is doing at run time.

Probably the most notable thing your XNA/C# game will be doing differently from a game written in, say, C++ (which is, or at least has been for many years, the de facto language of the games industry) is dealing with memory in a completely different way. C# is a managed language. This means that where in C++ you use “new” to allocate memory and “delete” to deallocate it (with the possibility of leaks, scribbles and access violations), in C# you use “new” to get a reference to an object, and the garbage collector tidies it up for you when you’re finished with it.

There is a lot that I could write about the garbage collector. It’s probably fair to say that dealing with garbage correctly is key to programming in C#. In fact, until two minutes ago this article was about 1000 words longer, until I realised that I’d written a whole bunch of stuff more relevant to GC itself than the tool the article is about, and even then had either gone into too much or not enough detail depending on your point of view. (Perhaps something for a future article, or even series, if there is demand). In summary:

– GC on Windows is not a good thing and you should try to avoid generating garbage where reasonably convenient.
– GC on Xbox is a very bad thing, and if your game is intended for Xbox 360 you should go out of your way to avoid it everywhere possible.

The other day I was working on Pandemonium and noticed that my frame rate had plummeted from 1800 Hz in the morning to 200 Hz by the evening. Now, 200 Hz is more than enough – for a finished game! But I only had one character running around a simple background so it seemed a bit low. Losing 1600 Hz in a day is not a good day. Luckily, using some of the Kensei.Dev.Options I’ve shown before, it took just a couple of mouse clicks to realise that I was suffering lots of garbage collections. This was confirmed when I used another Kensei.Dev.Option to show more details:

    if ( Options.GetOption( "Profile.ShowGarbage" ) )
    {
        Kensei.Dev.DevText.Print( "Max Garbage Generation: " + GC.MaxGeneration.ToString(), Color.LimeGreen );

#if !XBOX
        for ( int i = 0; i <= GC.MaxGeneration; ++i )
        {
            Kensei.Dev.DevText.Print( "Generation " + i.ToString() + ": " + GC.CollectionCount( i ).ToString() + " collections", Color.LimeGreen );
        }
#endif

        Kensei.Dev.DevText.Print( "Total Memory: " + GC.GetTotalMemory( false ).ToString( "n0" ), Color.LimeGreen );
    }

It was clear that something was generating lots of garbage. Suspicion soon fell on my background hits implementation, as that’s what I’d worked on that day. It was deliberately simple, just an array of triangles, each with a BoundingBox, against which I would test lines and swept spheres. I know an octree or kd-tree would be better but that’s something I can address later. It might not be a very quick method, but it shouldn’t be generating garbage.
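A quick way to test a suspicion like this, before reaching for a profiler, is to count collections around the code in question. The following is only a sketch under my own naming (GarbageProbe is hypothetical, not part of Kensei.Dev); it relies on GC.CollectionCount, so like the loop in the snippet above it is a Windows-only diagnostic.

```csharp
using System;

public sealed class GarbageProbe
{
    // Generation 0 collection count at the time of the previous sample.
    private int lastGen0Count = GC.CollectionCount(0);

    // Returns the number of generation 0 collections since the previous call
    // (or since construction, for the first call).
    public int Gen0CollectionsSinceLastSample()
    {
        int current = GC.CollectionCount(0);
        int delta = current - lastGen0Count;
        lastGen0Count = current;
        return delta;
    }
}
```

Create a probe, run the suspect code for a few hundred frames, then take a sample. A count that climbs steadily while only the suspect code is running is a strong hint, though it tells you nothing about which allocation is responsible; that is what the profiler is for.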

Visual inspection of the code revealed nothing so to find the cause, I fired up the CLRProfiler (top tip: make sure to use version 2.0 and run as Administrator) and asked it to run my game. The first thing you should note is that it is invasive: my frame rate dropped from about 200 Hz without it, to less than 10 Hz with it running.

The CLR Profiler


So, clear the “Profiling Active” checkbox, only then “Start Application”, get yourself to the part of your game that’s slow, and only then tick “Profiling Active” for a few seconds before clearing it again. That will generate a report.

A CLR Profiler report


109 Generation 0 garbage collections is quite a lot in the few seconds I ran the game for. Ideally, a game would have zero garbage collections during gameplay. (That’s not possible if you’re using XACT for sound, but it’s a good ideal to get as close to as possible). The Time Line button in the centre gives the best view of when GCs happen and why.

The CLR Profiler Timeline


The coloured peaks show garbage, which you can see was increasing very sharply before dropping as each collection occurred. The vertical black line can be moved to show what the garbage consists of at each time slice; here I placed it just before a GC, and it shows me that 2MB (99% of my garbage) was from System.SZArrayHelper.SZGenericArrayEnumerator<T>. The rest was basically strings generated by the Kensei.Dev prints, listed above, and can be ignored. But – where did this SZGenericArrayEnumerator come from?

Back on the report page, I was interested in Heap Statistics, and wanted to see what had been allocated, so clicked Allocation Graph at the top. This shows a diagram indicating where all the garbage comes from. The highest level is on the left, and you go progressively further right to get more detail. So, Program::Main is the first significant box on the left and will have created 100% of the garbage – so all the garbage in your program was created by your program, go figure – but what’s lower down (i.e. to the right)?

The Allocation Graph


As expected, 99% of the garbage is SZGenericArrayEnumerator – which comes from SZArrayHelper::GetEnumerator – which comes in turn from BoundingBox::CreateFromPoints. There was only one place this appeared in my code, and yes, it was bad. In order to provide a (small) speedup to my collision detection, which I’ve already mentioned I knew to be non-optimal but wasn’t ready to spend time on yet, I’d done a Line-BoundingBox check to reject misses before doing a Line-Triangle check. And, being lazy and naughty, I’d created a new BoundingBox every test:

    foreach ( Triangle triangle in data.Triangles )
    {
        BoundingBox box = BoundingBox.CreateFromPoints( triangle.Vertices );
        // ... Line-BoundingBox test to reject misses, then the Line-Triangle test ...
    }

Just to be clear, here. foreach has gotten a really bad reputation for garbage because in CLR 1.0 the enumerator that foreach uses to traverse the container was of reference type. Some people, wrongly, claim that you should avoid it for that reason. That’s nonsense – in CLR 2.0 onwards, foreach is safer, easier to read, and at least as performant as a for loop. You should definitely prefer foreach in almost all cases. All my garbage was coming not from foreach, but from BoundingBox.CreateFromPoints.

The fix was easy – instead of creating a new BoundingBox for each Triangle for each hit test, I’d store the BoundingBox with the Triangle when they were created in the Content Pipeline. No more garbage, or at least, none at runtime; and the Content Pipeline doesn’t care about garbage. Really, I ought to have done it that way in the first place, so a definite slap on the wrist for me.
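The shape of that fix can be sketched as below. To keep the example runnable without XNA installed I’ve used System.Numerics.Vector3 as a stand-in for the XNA Vector3, stored the bounds as a min/max pair instead of a BoundingBox, and computed them in a constructor instead of in the Content Pipeline; CachedTriangle is a made-up name, not my real Triangle type.

```csharp
using System.Numerics; // stand-in for Microsoft.Xna.Framework.Vector3

public sealed class CachedTriangle
{
    public readonly Vector3 A, B, C;

    // Bounds are computed once, when the triangle is created,
    // and then reused by every hit test.
    public readonly Vector3 BoundsMin, BoundsMax;

    public CachedTriangle(Vector3 a, Vector3 b, Vector3 c)
    {
        A = a;
        B = b;
        C = c;
        BoundsMin = Vector3.Min(a, Vector3.Min(b, c));
        BoundsMax = Vector3.Max(a, Vector3.Max(b, c));
    }
}
```

The hit test then reads BoundsMin and BoundsMax directly and allocates nothing per test.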

Only one thing still niggled. BoundingBox is a value type (created on the stack) so why was BoundingBox.CreateFromPoints creating garbage? The answer comes when you look back at the screenshot of Reflector in the previous article. BoundingBox.CreateFromPoints has a foreach loop in the middle of it, and foreach creates garbage.

“But Andy, wait!” I hear you cry. “You just said foreach doesn’t create garbage, in fact, you said that was nonsense!” Well, yes, though I did insert the word “almost” in a key position. The truth is that foreach does not create garbage for arrays, Lists, LinkedLists and any container other than Collection<T>. However, BoundingBox.CreateFromPoints has been designed to handle any container via the IEnumerable interface, which means the enumerator has to be boxed. Boxing means it is moved onto the heap, and is therefore garbage.

I’m still a little surprised that an array-specialised version of BoundingBox.CreateFromPoints isn’t provided but then, I guess it’s not exactly difficult to write if you desperately need it – especially given that Reflector shows how! In any case, I hope that this demonstration of how CLRProfiler helped me with a garbage problem, has shown how it can help you, too. Leave a comment and let me know how helpful you’re finding these articles.
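For what it’s worth, an array-specialised version might look something like this sketch. It is my own guess at a reasonable implementation, not the XNA Framework code: BoundsHelper is a hypothetical name, System.Numerics.Vector3 again stands in for the XNA Vector3, and the result comes back as a min/max pair.

```csharp
using System;
using System.Numerics; // stand-in for Microsoft.Xna.Framework.Vector3

public static class BoundsHelper
{
    // Computes the axis-aligned bounds of an array of points.
    public static void CreateFromPoints(Vector3[] points, out Vector3 min, out Vector3 max)
    {
        if (points == null)
        {
            throw new ArgumentNullException("points");
        }
        if (points.Length == 0)
        {
            throw new ArgumentException("At least one point is required.", "points");
        }

        min = points[0];
        max = points[0];

        // Indexing the array directly means no enumerator is ever requested
        // through the IEnumerable interface, so nothing lands on the heap.
        for (int i = 1; i < points.Length; ++i)
        {
            min = Vector3.Min(min, points[i]);
            max = Vector3.Max(max, points[i]);
        }
    }
}
```

In XNA itself the two results could be passed straight to the BoundingBox( min, max ) constructor.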

“First, solve the problem. Then, write the code.” – John Johnson

August 16, 2008

Tools of the Trade – Part Two: Reflector

Filed under: Tools and Software Development — bittermanandy @ 11:51 pm

I had originally been intending to write about something completely different in this second part of the Tools of the Trade mini-series, but I encountered a problem today that I simply could not have solved without Reflector. It reminded me of just how cool this tool is, and I just had to make sure everyone else knew how cool it is too.

Following on from my article about FxCop, I decided to have a crack at writing some custom rules to use alongside the default Microsoft ones. The first sign that this might not be as easy as I had assumed it to be was when I realised there was a total absence of documentation about it. There weren’t even many articles – the best ones I did find are here, here and here. I will cut a long story short by saying that the task of setting up my custom rules was more arduous than I expected. (That said, it’s working now and I’m very happy with the results).

I eventually reached the stage where I believed my custom rules library was ready to add into FxCop, but unfortunately when I attempted to do so, it complained that the XML file I had embedded as a resource in my Kensei.FxCopRules assembly was incorrectly formatted. And that’s about all it said – the error messages were not exactly verbose in their description of the problem. I spent a significant amount of time trying to work out what was wrong.

After some head-scratching, I realised that the default rules libraries that ship alongside FxCop must (obviously) contain a correctly-formatted XML file. If only there were some way for me to examine those XML files in the assemblies themselves! In the Bad Old Days, this might have been possible by loading the assembly into a binary viewer and scouring it by eye, looking for readable phrases; otherwise it would be a lengthy and painful process of reverse engineering. Happily this is Space Year 2008 and .NET has made everything so much easier.

Enter Reflector.

This wonderful little tool uses reflection (surprisingly enough) to analyze any .NET assembly you care to throw at it, and expose its innards in glorious detail. All I had to do to solve my problem was open up one of the FxCop assemblies, browse through it until I found the Resources section, find the XML file and click “View Resource”:

An FxCop rules library XML file viewed in Reflector.


And there it was. A correctly-formatted FxCop custom rules XML file laid out in all its glory. From there it was mere seconds to see where I had been going wrong, and just a couple of minutes to fix it. I now have custom FxCop rules up and running and catching errors in my code – and I genuinely believe it would have been, if not impossible, at least very difficult and time-consuming without Reflector.
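If you are curious how a tool can get at embedded resources at all, the standard reflection API exposes them directly. This is only a sketch of the idea, and I make no claim that Reflector does it this way internally; ResourceDump is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ResourceDump
{
    // Names of every resource embedded in the assembly's manifest.
    public static string[] ListResources(Assembly assembly)
    {
        return assembly.GetManifestResourceNames();
    }

    // Reads one embedded resource as text; returns null if it does not exist.
    public static string ReadResourceText(Assembly assembly, string resourceName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            if (stream == null)
            {
                return null;
            }

            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Load the assembly you are interested in with Assembly.LoadFrom, list its resource names, and read whichever one looks like the rules XML. Reflector is still far more convenient, but it is nice to know there is no magic involved.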

Of course, Reflector’s awesomeness doesn’t stop there. As well as showing you resources that have been embedded into assemblies, it can show you the code they contain as well. And I don’t mean “code” as in machine code, assembler, or even MSIL – I mean it can show you what is for all intents and purposes the same C# the assembly was originally written in. And it will do it for any assembly you care to throw at it. That’s pretty damn cool. To spell it out – if, say, you’re writing an XNA game and are curious as to how the actual XNA Framework code was written, just fire up Reflector and fill your boots:

The Microsoft.XNA.Framework assembly in Reflector.


If that’s not the single coolest tool anyone has written in the history of computing, I don’t know what is. And, again – just like FxCop, and indeed XNA and (should you not have the full version) Visual C# 2005 Express, it’s completely free. (Incidentally, if you’re wondering why I was inspecting BoundingBox.CreateFromPoints – all will be revealed in Part Three of this mini-series).

I feel as though I should spend longer explaining why Reflector is so amazing but honestly, if what you’ve seen already doesn’t excite you I’m not sure what will. (Perhaps the add-ins will convince you?) In any case, it is unquestionably a tool that any XNA (or indeed C# or .NET) programmer should be familiar with, and you will probably find that you use it often. I certainly do.

“Like the carpenter, the samurai looks after his own tools.” – Miyamoto Musashi
