Pandemonium

September 2, 2008

Tools of the Trade – Part Five: Pot Pourri

Filed under: Games Development,Tools and Software Development,XNA — bittermanandy @ 11:02 pm

I think that FxCop, Reflector, CLRProfiler, and PIX – all of which I’ve previously discussed in some detail – are the most useful, bread-and-butter tools you’ll use in your XNA development. (In fact, all but perhaps PIX are useful regardless of whether you’re writing games or applications). There are of course many, many more tools, each with their uses, and I’d like to summarise some of them here. The sign of a good craftsman is using the right tool for the job, so I’d encourage you to explore all the different options available to you. Put bluntly, if you’re spending all your time performing repetitive tasks, or going through endless tweak-test-tweak-test cycles to try and hunt down problems in your code, you’re wasting your time. Your time may not be worth a lot to you, but mine is worth a lot to me, so I for one am always looking out for new tools and I think you should too.

Your primary development tool is Visual Studio itself (the Express version of which is available as a free download – possibly the most amazing free thing ever). I’m going to assume you already have it or you’d not be coding in XNA! However, I can guarantee that you are not using it to its full potential. I know I’m not. The reason I can make this guarantee is that the full potential of Visual Studio is huuuuuuuuuge. Almost every day, certainly every week, I discover new things it can do and think “that’s amazing!” Start by keeping up with the Visual Studio Tip of the Day, learn how to write macros, take some time to explore for yourself (especially consider keyboard shortcuts), and look out for plug-ins and extensions too. Project Line Counter is a personal favourite.

We’ve already looked at a couple of profiling tools but Perfmon (free with Windows) is the daddy of them all. As an example, hit Start then Run, and type “perfmon”. When the tool loads, select “Performance Monitor”, right-click on the counter list at the bottom and select “Add Counters…”. Select, say, the “.NET CLR Memory” category, then, for example, “% Time in GC”. Choose your game as the selected object and click “Add >>” then “OK”. Hey presto! A line displaying the exact processor cost you are paying for garbage collection will be added to the graph. There are hundreds of counters like this one, and much more that Perfmon can do.
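For the curious, the same counter can also be read from code. The sketch below is mine rather than anything Perfmon generates: it assumes the full .NET Framework on Windows, and that the counter instance name matches the process name (the usual case, though it can differ when several copies of a program are running). "MyGame" is a hypothetical name, and the read will throw if no such instance exists.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class GcCounterSketch
{
    static void Main(string[] args)
    {
        // Hypothetical instance name: pass your game's process name (without .exe).
        string instance = args.Length > 0 ? args[0] : "MyGame";

        // Same category and counter names as chosen in the Perfmon UI above.
        using (var timeInGc = new PerformanceCounter(
            ".NET CLR Memory", "% Time in GC", instance, true))
        {
            // Sample once a second for ten seconds while the game is running.
            for (int i = 0; i < 10; i++)
            {
                Console.WriteLine("% Time in GC: {0:F1}", timeInGc.NextValue());
                Thread.Sleep(1000);
            }
        }
    }
}
```

This is only useful for quick logging; for anything exploratory the Perfmon graph is far more convenient.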

I’m a bit loath to mention this next one because, frankly, I hate it, but there are some bugs that can only be identified using WinDbg (or “windbag”, free with the Debugging Tools for Windows). Running out of memory and not sure why? Take a memory dump of your game, load sos.dll, call !DumpHeap -stat to see what’s live on the heap, call !DumpHeap -type <type> on the most memory-expensive type it lists to see all the items of that type, and call !GCRoot with the address of one of those objects to see exactly what is keeping it in memory and why. Sometimes there’s just no other way to work out what’s happening to your memory. WinDbg is an advanced tool and it’s an absolute swine to work with, but if the debugger in Visual Studio can’t solve your problem, WinDbg will.

I previously wrote about Reflector and described how it can reveal any assembly’s code to you. How does the code you write get translated from C# into IL, and finally JIT-compiled by the CLR into machine code? Well, I can’t help you with the JITter but IldAsm (free with Visual Studio) can provide a fascinating insight into the Intermediate Language stage of your code’s existence. Much in the same way that you don’t need to understand assembly language to write or use C++, but knowledge of assembly can help you fine-tune your C++ and fix the really tricky problems, knowledge of IL and an understanding of the translation process – while not essential – will make you a better C# programmer.

There’s a whole bunch of tools that are a bit more specialised or esoteric:

– Perforce is absolutely the best choice for source control. I can’t live without Perforce now; it’s as though it has become a part of me. It’s free for up to two users, though very expensive for larger teams (absolutely worth it if you’re a professional company, perhaps less so if you’re a group of hobbyists, in which case try Subversion).

– If you’re a pro developer and aren’t using continuous integration, you face months of torment in an endless death-march of crunch at the end of the project. Do yourself a favour and use CruiseControl .NET (free).

– Continuous integration becomes even more useful when a build is run against a set of unit tests, and in fact they’re useful for finding mistakes early which is good for anyone, pro or hobbyist alike. I’ve heard good things about NUnit (free)… do as I say, not as I do, and use it… not doing unit testing is my worst programming habit that one day I will get out of. Don’t fall into that trap.

– Perfmon can tell you when you’re slow on the CPU and CLRProfiler can tell you if it’s garbage at fault, but if not and you want to know which specific functions are slow (and you very often do!) NProf is the tool for you, and it’s free.

– Finally, I’ve not used it yet but RPM (Remote Performance Monitor for Xbox, free with XNA) looks to be pretty damn useful for working out why you’re running fine on PC but slow on 360.

The best thing about all of these tools? Like XNA itself, they’re all free. That’s the kind of money I’m OK with spending! It means you have no excuse for not becoming familiar with them and, hopefully, rather than staying up bleary-eyed until 5am trying to find the bug in your code, you can fire up the appropriate tool, find and fix the bug and be home in time to see your family and get a good night’s sleep. Everyone’s a winner!

There’s only one major category of tool I’m missing, and that’s a decent bug database. I’ve tried Bugzilla and OnTime, and at work we have to use Sunrise, and I hate all of them as well as some others. By far the best defect tracking system I used was Product Studio, when I was at Microsoft, but despite being brilliant it is only available with the Team System version of Visual Studio which is very expensive. If anyone can recommend a good, usable, simple bug database that is not web-based and has a good UI, please let me know.

In fact, undoubtedly many of you out there will have your own favourite tools. Why not share the love, leave a comment and let me and everyone else know which tools make your life easier?

“If the foreman knows and deploys his men well the finished work will be good.” – Miyamoto Musashi

September 1, 2008

Tools of the Trade – Part Four: PIX

Filed under: Games Development,Tools and Software Development,XNA — bittermanandy @ 9:44 pm

There is one more tool that I want to cover in a little bit more detail before presenting a round-up of the best of the rest (there really are so many good ones out there that this mini-series could last for months or years if I wrote a post for each one). The last article presented the CLRProfiler, a tool to help you manage your garbage and ensure it is being collected properly. Careless garbage collection is often the cause of poor performance on the CPU – but the CPU is only half the story, and to find out what’s causing poor performance on the GPU, you will need to use PIX (available free as part of the DirectX SDK).

Those of you who downloaded and used my Kensei Dev library might have noticed this comment in the Dev Shapes code:

// TODO I have noted some performance issues with this code when drawing very large
// numbers of shapes, but have not had time to profile it and fix it up yet, sorry!

Recently I had a bit of spare time so decided to go back and revisit this section. I set up a very simple test within Pandemonium, to draw lots and lots of spheres at random positions. I discovered that drawing 1000 spheres, or about 860,000 triangles (which doesn’t seem that many to me), caused the frame rate to plummet to only about 6Hz:

Lots of spheres!


Using tricks that I’ve covered and linked to previously it didn’t take long to determine that the GPU was the bottleneck. (For example, returning early out of the game’s Update method, therefore dropping CPU usage as close to zero as possible, had zero effect on the frame rate). So my next port of call was PIX itself.

PIX (originally an acronym for Performance Investigator for Xbox) is an immensely powerful tool and we’re only going to scratch the surface of it here. At the most basic level, you can think of it as a recorder for absolutely everything that happens on the GPU. You can see exactly when every single function that used the GPU was called, and exactly how long it took. You can even rebuild a frame of your game method call by method call, seeing the results rendered step by step, instead of within a sixtieth of a second.

In this case, I want to see which functions are taking so long within a frame. I therefore chose to sample a single frame, as all frames are likely to be pretty much equal in this case. (If, for example, I was seeing a generally solid frame rate with occasional stutters, I’d have had to have chosen a different option).

PIX


After starting the experiment and getting to a point where the frame rate was low, I hit F12 to capture a frame (this can take a second or two). After I’d shut down my game, PIX generated a report:

A PIX report


There’s quite a lot going on in this image so let’s take a look at each section in turn.

The top window shows a graphical timeline. It’s not obvious from this picture, but you’ll see it very clearly when you run PIX for yourself: the bars on the timeline indicate when the GPU and CPU are busy doing things. As you click along the timeline, the arrows indicate where the GPU and CPU synchronise to the same call. With some classes of performance problem, you’ll see big gaps in one or other processor’s timeline, and these indicate whether you are CPU or GPU bound. For example, if you are GPU bound, you’ll see gaps in the timeline for the CPU where it was waiting for the GPU to catch up. The red circle in the top right of the picture shows the range of calls within our sampled frame (which occurred about 48 seconds in) – it looks mostly empty in the screenshot, but zooming reveals more details.

The middle window shows the DirectX resources in use (remember, XNA is just a layer on top of DirectX) including pixel and vertex shaders, vertex buffers, surfaces and such like. Not of much interest to us at this point.

In the bottom right I’ve selected the Render window. This shows us a preview of the frame as it was constructed. As you advance the cursor along the timeline, this preview is updated – initially getting cleared to Cornflower Blue, then having more and more things drawn onto it. This can be invaluable for detecting overdraw, and is really interesting in its own right. One of my favourite features is the ability to “Debug This Pixel”, which shows every call that affected the colour of any given pixel in the frame. This kind of thing is very useful when investigating transparencies, occluders, quadtrees etc.

Finally, in the bottom left is a list of GPU events, in sequence. Here you can see every call made to the GPU during the sample (note how they are all Direct3D calls, as mentioned above). Using the timeline view, I was able to visually identify which function call was the most expensive. Clicking on that call in the timeline synchronised it in the events window. I’ve circled the call in question. You can see from the StartTime of each event that the call to IDirect3DDevice9::DrawPrimitiveUP took 107349677 nanoseconds, or 107 milliseconds. When you consider that a whole frame normally completes in just 17 or 33 milliseconds, this one function call taking 107ms is a massive limiting factor on my frame rate.
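To put those numbers side by side, here is the arithmetic as a throwaway sketch. The class and method names are mine, purely for illustration; the only inputs are the figures quoted above.

```csharp
using System;

static class FrameBudgetSketch
{
    // Convert a duration in nanoseconds (as PIX reports it) to milliseconds.
    public static double NanosToMillis(long nanoseconds)
    {
        return nanoseconds / 1000000.0;
    }

    // Time available per frame, in milliseconds, at a given frame rate.
    public static double FrameBudgetMillis(double framesPerSecond)
    {
        return 1000.0 / framesPerSecond;
    }

    static void Main()
    {
        double drawCall = NanosToMillis(107349677);   // the DrawPrimitiveUP call
        double at60 = FrameBudgetMillis(60.0);        // about 16.7 ms
        double at30 = FrameBudgetMillis(30.0);        // about 33.3 ms

        Console.WriteLine("Draw call: {0:F1} ms", drawCall);
        Console.WriteLine("Budget at 60Hz: {0:F1} ms, at 30Hz: {1:F1} ms", at60, at30);
        Console.WriteLine("Draw call uses {0:F1}x the 60Hz budget", drawCall / at60);
    }
}
```

On its own that one call would cap the frame rate at roughly 9Hz, so the rest of the frame must account for the remainder of the time at the observed rate of about 6Hz.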

Using a combination of intuition, logic, common sense, and the Render window (clicking on the call previous to DrawPrimitiveUP removed all the spheres from the preview, so it’s obvious what it was drawing!) I identified the corresponding code in my XNA program:

    device.DrawUserPrimitives<VertexPositionColor>(
        PrimitiveType.TriangleList, s_triangle3DVerticesArray,
        0, s_triangle3DVertices.Count / 3 );

You may not think this tells me very much. I already knew that the Kensei.Dev rendering code was slow, that’s why I fired up PIX in the first place! In fact, this is hugely valuable information. I know exactly which line of code is causing my GPU to run like a dog with no legs.

As this is a call to DrawUserPrimitives, it seems likely that the reason for this call being so slow lies in the User part of the method name. That is to say, the Kensei.Dev code builds up an array of vertices (s_triangle3DVerticesArray) each frame, and passes that into the function. This involves copying all those 860,000 triangles from main memory into the GPU memory, and is in contrast to using a vertex buffer, which lives on the GPU. If I can find a way to use native GPU resources and avoid the User methods, I may get a substantial speed boost; on the other hand, the User methods exist for the very usage scenario I’m using here, which is of vertices that can arbitrarily change position from frame to frame and which are controlled by the CPU.
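A rough sense of scale helps here. The sketch below estimates how much vertex data that call pushes to the GPU each frame. It assumes an unindexed triangle list (three vertices per triangle) and 16 bytes per VertexPositionColor (a 12-byte Vector3 plus a 4-byte packed Color); the triangle count is the approximate figure from above, and the names are mine.

```csharp
using System;

static class UploadSizeSketch
{
    // Assumed size of one VertexPositionColor: Vector3 (3 floats) + packed Color (4 bytes).
    public const int BytesPerVertex = 3 * sizeof(float) + 4;

    // Bytes of vertex data in an unindexed triangle list.
    public static long TriangleListBytes(long triangleCount)
    {
        return triangleCount * 3 * BytesPerVertex;
    }

    static void Main()
    {
        long bytesPerFrame = TriangleListBytes(860000);
        Console.WriteLine("{0} bytes per frame", bytesPerFrame);
        Console.WriteLine("{0:F1} MB per frame", bytesPerFrame / 1000000.0);
    }
}
```

Under those assumptions that is about 41MB copied every frame, before the GPU has drawn a single triangle.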

Alternatively, it was suggested on the XNA Creators forums that I may be expecting the GPU to do too much in one go, and that splitting up the calls into smaller batches may improve performance. This is somewhat contrary to my understanding of modern GPUs, which, I was led to believe, vastly prefer to perform fewer operations on larger datasets than more operations on smaller datasets; nevertheless I am far from a GPU expert so will be taking that advice, and experimenting with splitting the vertex array/buffer into smaller pieces to see if this improves matters.
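The batching experiment boils down to carving one long triangle list into ranges. Here is a minimal sketch of that bookkeeping, kept independent of XNA so it can run as a console program. The Batch struct, the SplitTriangles name and the batch size are all hypothetical; each resulting range would supply the vertex offset and primitive count for one draw call.

```csharp
using System;
using System.Collections.Generic;

struct Batch
{
    public int VertexOffset;    // index of the first vertex in this batch
    public int PrimitiveCount;  // number of triangles in this batch
}

static class BatchSketch
{
    // Split an unindexed triangle list into batches of at most maxTriangles each.
    public static List<Batch> SplitTriangles(int vertexCount, int maxTriangles)
    {
        if (maxTriangles <= 0)
            throw new ArgumentOutOfRangeException("maxTriangles");

        var batches = new List<Batch>();
        int totalTriangles = vertexCount / 3;   // ignore any trailing partial triangle

        for (int first = 0; first < totalTriangles; first += maxTriangles)
        {
            Batch batch;
            batch.VertexOffset = first * 3;
            batch.PrimitiveCount = Math.Min(maxTriangles, totalTriangles - first);
            batches.Add(batch);
        }
        return batches;
    }

    static void Main()
    {
        // 860,000 triangles in batches of 65,535 (an arbitrary size for illustration).
        foreach (Batch b in SplitTriangles(860000 * 3, 65535))
            Console.WriteLine("offset {0}, triangles {1}", b.VertexOffset, b.PrimitiveCount);
    }
}
```

In the real code each range would replace the 0 and s_triangle3DVertices.Count / 3 arguments of the DrawUserPrimitives call shown earlier. Whether that is actually any faster is exactly what needs measuring.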

There are a few more possibilities as well. I’d like to say this story has a happy ending, but it doesn’t, at least not yet – I am hopeful for the future. I am still trying to solve this problem and find how to avoid this bottleneck. However, whenever investigating performance it is absolutely essential to base your observations and lines of inquiry on hard evidence. At the beginning of this article, I knew that “something in Kensei.Dev is slow”. PIX has since revealed that “DrawUserPrimitives is taking over 100ms to draw 860,000 triangles”. This will allow me to precisely focus my efforts, and hopefully find a correct, performant fix for the problem.

PIX has an awful lot more to offer than just single-frame samples and as your game nears completion you will probably find a lot of value in it. There are a lot of bugs that simply can’t be solved any other way, and if you are doing anything remotely clever with your graphics I strongly encourage you to learn about what PIX can do for you.

“My mistress’ eyes are nothing like the sun…” – William Shakespeare
