Optimization is FUN!

Wednesday, October 15, 2008

Well, not really - but it's something every project needs to stop and work on every once in a while to make sure things run smoothly. As we race towards the deadline for submission to the IGF, we have had to come to terms with the fact that performance was not where we needed it to be with 16 players on the screen. In a game that focuses so much on physics, flying, and tumbling around, not to mention driving at high speeds, you really need steady performance in order to have the most fun.

So for the last week and a half we have been overhauling certain parts of the graphics in order to speed things up. With 16 players on screen at once, we were getting around 20 fps on our decent rigs, so we had a lot to work on.

The main thing we had to work on was the "batch count" in the game, which is basically the number of separate draw submissions the computer has to hand to the video card in order to render everything on the screen. Grouping as many things as you can into each batch is one of the best ways to make things faster. So we set about figuring out where every single one of our batches was going. Right off the bat, we found that we could treat the level entirely as one piece, and significantly reduce the number of batches it took to render. This was our first easy fix, and as anyone in development will tell you, there is nothing quite so sweet as discovering an easy fix. Once we addressed that, things were running really fast with nobody playing. We knew the performance issues were in large part tied to the number of players in the game, so Brian hooked up a key to pop a fake client into the game. Now it was easy to drop 15 other players into the game and test performance.
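To make the idea of grouping concrete, here is a rough sketch in Python of how merging static pieces that share a material cuts the batch count. The piece and material names are made up for illustration, and this is not how Ogre or our exporter actually does it. It only models the counting.

```python
from collections import defaultdict

def count_batches(pieces):
    """Unmerged: every (name, material) piece is submitted on its own."""
    return len(pieces)

def merge_by_material(pieces):
    """Group pieces that share a material so each group can be one batch."""
    merged = defaultdict(list)
    for name, material in pieces:
        merged[material].append(name)
    return dict(merged)

# Hypothetical level: six static pieces that use two materials between them.
level = [
    ("wall_a", "brick"),
    ("wall_b", "brick"),
    ("pillar", "brick"),
    ("floor", "concrete"),
    ("ramp_1", "concrete"),
    ("ramp_2", "concrete"),
]

before = count_batches(level)
after = len(merge_by_material(level))
print(before, after)  # 6 2
```

This only works for geometry that never moves relative to the rest of the group, which is why the static level was the easy win.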

Not too surprisingly, performance was still underwhelming, although a bit better because of our changes to the environments. We decided to use NVIDIA's PerfHUD to step through all the rendering steps, which was pretty easy to do since Ogre already supports the tool.

After some testing we figured that every new player added to the game averaged around 20 new batches to display, which multiplied by 16 added up to a lot. This was due to the fact that each player's stuff was made up of a lot of different customizable items: there are separate karts, characters, hats, accessories, and wheels. Those are only 5 different items (8 if you count every wheel), but each item could have any number of materials on it (in order to make it look awesome, of course), and the kart and the wheels were also casting shadows, each of which made up its own batch.
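As a back-of-the-envelope illustration of that math, here is a small Python sketch. The per-item material counts are hypothetical numbers picked so they add up to the roughly 20 batches we measured. Only the 16 players, the 5 item types, and the kart-plus-wheels shadows come from the description above.

```python
PLAYERS = 16

# Hypothetical material counts per item (one batch per material).
# "wheels" covers all four wheels at one material each.
materials_per_item = {
    "kart": 4,
    "character": 3,
    "hat": 2,
    "accessory": 2,
    "wheels": 4,
}

# The kart and each of the four wheels cast a shadow, one batch apiece.
shadow_casters = 1 + 4

def batches_per_player(materials, shadows):
    """Material batches plus shadow batches for a single player."""
    return sum(materials.values()) + shadows

per_player = batches_per_player(materials_per_item, shadow_casters)
total = per_player * PLAYERS
print(per_player, total)  # 20 320
```

So at about 20 batches each, a full 16-player game spends around 320 batches per frame on players alone, before the level is drawn.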

So, optimally we needed each player to use the absolute minimum number of batches, which would be 5, since each player loads 5 meshes. To achieve this, we created a simple shader that would let us do the effects we were using, such as color masks, environment mapping, and rim lighting, but all in one material. Each character, kart, hat, etc. could now be rendered in one batch, since it was all inside one material. Since we are shader noobs, it took us a few days to get the shader working, as well as a few days to move all of the items we had previously made to this new material format and make sure they would work. Brian also created a shader especially for the wheels that made use of Ogre's model instancing to render all 4 wheel models in one batch instead of four. After all these changes, we succeeded in reducing each player to 5 batches instead of 20! Huzzah!
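For the curious, here is a minimal Python sketch of the kind of per-pixel math a combined material like this might do. It is not our actual shader: the function name, the parameters, the order of the effects, and the exact formulas are all assumptions, using common textbook versions of each effect (mask-weighted tinting, a blend towards a reflected color, and a rim term based on the angle between the normal and the view direction).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lerp(a, b, t):
    """Blend colour a towards colour b by factor t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def saturate(x):
    """Clamp to [0, 1], like the shader intrinsic of the same name."""
    return max(0.0, min(1.0, x))

def shade(base, masks, reflected, reflectivity,
          normal, view_dir, rim_color, rim_power):
    """Hypothetical single-material pixel colour (all inputs are assumptions).

    masks is a list of (mask_value, tint) pairs, one per colour mask.
    normal and view_dir are expected to be unit-length vectors.
    """
    color = base
    # Colour masks: where a mask is 1, the base texture is multiplied by its tint.
    for mask_value, tint in masks:
        tinted = tuple(b * t for b, t in zip(base, tint))
        color = lerp(color, tinted, mask_value)
    # Environment mapping: blend towards the colour fetched from the env map.
    color = lerp(color, reflected, reflectivity)
    # Rim lighting: strongest where the surface faces away from the viewer.
    rim = (1.0 - saturate(dot(normal, view_dir))) ** rim_power
    return tuple(saturate(c + rim * r) for c, r in zip(color, rim_color))

# A fully masked grey pixel tinted red, facing the camera (so no rim light).
pixel = shade(
    base=(0.5, 0.5, 0.5),
    masks=[(1.0, (1.0, 0.0, 0.0))],
    reflected=(0.0, 0.0, 0.0),
    reflectivity=0.0,
    normal=(0.0, 0.0, 1.0),
    view_dir=(0.0, 0.0, 1.0),
    rim_color=(0.25, 0.25, 0.25),
    rim_power=2.0,
)
print(pixel)  # (0.5, 0.0, 0.0)
```

The point of folding everything into one function is the same as folding it into one material: every effect an item needs is available in a single pass, so the item no longer has to be split into several materials and therefore several batches.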

We also decided to move away from the stencil shadows we were using to create shadows under the karts and items. They looked great, but the way they are created became more and more of a bottleneck the more players were on screen. We dug into Ogre's texture shadow system and set up some render-to-texture shadows that, while not totally inexpensive, at least perform better with many shadows being cast at once.

All that work, and hopefully nobody will ever know about it when they go to play the final game! Here is a screenshot with myself and 15 test players onscreen, with Ogre's performance display.

We still have a lot of improvements to make, the soonest of which will probably be hooking up our new shader to utilize hardware skinning in order to move some of the animation cost to the GPU. But we are off to a good start! Already the game feels a lot more responsive with lots of players, and that's fun for everybody!

P.S. Here is the 3DMark score for my computer, which this screenshot was taken on.


Wingman said...

Keep up the great work. I've been following you guys for quite some time now, and I can't wait to see the final project. Need any help for testing builds?

Brian Cronin said...

Thanks, Wingman. We will probably do some testing next week. If you are interested in helping out, join our Steam group here: http://steamcommunity.com/groups/zerogear (You need Steam installed first). We post announcements there when we need more testers. We may also post on the blog if we need even more testers.

Anonymous said...

Why not merge all the mesh bits into a single mesh at load time?

Brian Cronin said...

I assume you are talking about merging all the player items (kart, wheels, character, hat, and accessory) into one mesh at load time. Each of these items has its own texture, masks, colors, and other properties, so they cannot be rendered in the same batch. There would still be 5 batches. Also, the hats, accessories, karts, and characters all have different animations, so it wouldn't be possible for us to merge those animations together.

The 4 wheels are being merged together at load time however.

The static levels are merged together as best as possible when they are exported from the art tool.

The goal was to merge all the mesh data together that we could without sacrificing too much. We did go from 6 colors per item down to 4 however.

Edward Elwood said...

Efficiency is a beautiful thing! (:

bod said...

Hey guys. This is looking really nice! Just thought I'd chuck my 2p in, in case it might be of use to you.

I haven't done much graphics programming for a while (particularly with high end PC cards), so these points may not still hold true but they might be worth looking into if you become pixel shader bound.

You could try moving the texture reads. Do these as early as possible as the operation that uses the result will stall until the data has been fetched, and fetches are slow. (e.g. the use of reflectedColor is immediately after the texture read. If you put the read at the start it is more likely to be done by the time you use it later in the shader).

Also, conditionals in shaders used to be extremely slow. If this is still the case doing 4 of them may be painful!

Good luck at the IGF :)

Brian Cronin said...

Hey bod,
Thanks for the tips! I moved the texture fetches like you recommended. I know about the conditionals :( I need to get rid of them soon...

Thanks again!