You may have heard of data-oriented game engine design, a relatively new concept that proposes a different mindset to the more traditional object-oriented design. In this article, I'll explain what DOD is all about, and why some game engine developers feel it could be the ticket for spectacular performance gains.
A Bit of History
In the early years of game development, games and their engines were written in old-school languages, such as C. They were a niche product, and squeezing every last clock cycle out of slow hardware was, at the time, the utmost priority. In most cases, there was only a modest number of people hacking at the code of a single title, and they knew the entire codebase by heart. The tools they were using had been serving them well, and C provided the performance benefits that enabled them to get the most out of the CPU. As these games were still by and large bound by the CPU, which drew to its own frame buffers, this was a very important point.
With the advent of GPUs that do the number-crunching work on the triangles, texels, pixels, and so on, we've come to depend less on the CPU. At the same time, the gaming industry has seen steady growth: more and more people want to play more and more games, and this in turn has led to more and more teams coming together to develop them.
Bigger teams needed better cooperation. Before long, game engines, with their complex level, AI, culling, and rendering logic, required coders to be more disciplined, and their weapon of choice was object-oriented design.
At big companies, software tends to be written by large (and frequently changing) teams of mediocre programmers. Object-oriented programming imposes a discipline on these programmers that prevents any one of them from doing too much damage.
Whether we like it or not, this has to be true to some degree—bigger companies started to deploy bigger and better games, and as the standardization of tools emerged, the hackers working on games became parts that could be swapped out way more easily. The virtue of a particular hacker became less and less important.
Problems With Object-Oriented Design
While object-oriented design is a nice concept that helps developers on big projects, such as games, create several layers of abstraction and have everyone work on their target layer, without having to care about the implementation details of the ones underneath, it's bound to give us some headaches.
We see an explosion of parallel programming, with coders harvesting all the processor cores available to deliver blazing computation speeds. At the same time, game scenery is becoming more and more complex, and if we want to keep up with that trend and still deliver the frames per second our players expect, we need to do it, too. By using all the speed we have at hand, we can open doors to entirely new possibilities: using CPU time to reduce the amount of data sent to the GPU altogether, for example.
In object-oriented programming, you keep state within an object, which requires you to introduce concepts like synchronization primitives if you want to work on it from multiple threads. You have one new level of indirection for every virtual function call you make. And the memory access patterns generated by code written in an object-oriented manner can be awful—in fact, Mike Acton (Insomniac Games, ex-Rockstar Games) has a great set of slides casually explaining one example.
Similarly, Robert Harper, a professor at Carnegie Mellon University, put it this way:
Object-oriented programming is [...] both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum.
Talking about OOP like this is tricky, because OOP encompasses a huge spectrum of properties, and not everyone agrees what OOP means. In this sense, I'm mostly talking about OOP as implemented by C++, because that's currently the language that vastly dominates the game engine world.
So, we know that games need to become parallel because there is always more work that the CPU can (but doesn't have to) do, and spending cycles waiting for the GPU to finish processing is just wasteful. We also know that common OO design approaches require us to introduce expensive lock contention, and at the same time, can violate cache locality or cause unnecessary branching (which can be costly!) in the most unexpected circumstances.
This raises the question: should we rethink our paradigms altogether?
Enter: Data-Oriented Design
Some proponents of this methodology have called it data-oriented design, but the truth is that the general concept has been known for much longer. Its basic premise is simple: construct your code around the data structures, and describe what you want to achieve in terms of manipulations of these structures.
We've heard this kind of talk before: Linus Torvalds, the creator of Linux and Git, said in a Git mailing list post that he is a huge proponent of "designing the code around the data, not the other way around", and credits this as one of the reasons for Git's success. He even goes on to claim that the difference between a good programmer and a bad one is whether she worries about data structures or about the code itself.
The task may seem counterintuitive at first, because it requires you to turn your mental model upside down. But think of it this way: a game, while running, captures all the user's input, and all performance-heavy pieces of it (the ones where it would make sense to ditch the standard "everything is an object" philosophy) do not rely on outside factors, such as network or IPC. For all you know, a game consumes user events (mouse moved, joystick button pressed, and so on) and the current game state, and churns these up into a new set of data: for example, batches that are sent to the GPU, PCM samples that are sent to the audio card, and a new game state.
This "data churning" can be broken down into a lot more sub-processes. An animation system takes the next keyframe data and the current state and produces a new state. A particle system takes its current state (particle positions, velocities, and so on) and a time advancement and produces a new state. A culling algorithm takes a set of candidate renderables and produces a smaller set of renderables. Almost everything in a game engine can be thought of as manipulating a chunk of data to produce another chunk of data.
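To make the "state in, state out" idea concrete, here is a minimal sketch of a particle step written as a plain function. The names ParticleState and stepParticles, the two-dimensional layout, and the simple Euler integration are all assumptions made for illustration; they don't come from any particular engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle state: one flat array per attribute.
struct ParticleState {
    std::vector<float> posX, posY;
    std::vector<float> velX, velY;
};

// Takes the current state and a time advancement and returns the new state.
// The result depends only on the arguments; the input is left untouched.
ParticleState stepParticles(const ParticleState& in, float dt, float gravityY) {
    assert(in.posX.size() == in.posY.size());
    assert(in.posX.size() == in.velX.size());
    assert(in.posX.size() == in.velY.size());

    ParticleState out = in;
    const std::size_t n = in.posX.size();
    for (std::size_t i = 0; i < n; ++i) {
        out.velY[i] = in.velY[i] + gravityY * dt;     // apply acceleration
        out.posX[i] = in.posX[i] + in.velX[i] * dt;   // move along X
        out.posY[i] = in.posY[i] + out.velY[i] * dt;  // move along Y with the new velocity
    }
    return out;
}
```

Because the function touches nothing but its arguments, it can be called from any thread and checked in isolation.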
Processors love locality of reference and utilization of cache. So, in data-oriented design, we tend to, wherever possible, organize everything in big, homogeneous arrays, and, also wherever possible, run good, cache-coherent brute-force algorithms in place of potentially fancier ones (which have a better Big O cost, but fail to embrace the architectural limitations of the hardware they run on).
When performed per frame (or multiple times per frame), this potentially gives huge performance rewards. For example, the folks at Scalyr report searching log files at 20GB/sec using a carefully crafted but naive-sounding brute-force linear scan.
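As a rough illustration of what such a brute-force pass can look like in an engine, the sketch below culls bounding spheres against a set of planes with one linear scan over flat arrays. Plane, BoundingSpheres, and cullSpheres are hypothetical names, and the "inside means n·p + d >= 0" convention is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Points p with nx*p.x + ny*p.y + nz*p.z + d >= 0 count as inside (assumed convention).
struct Plane { float nx, ny, nz, d; };

// Hypothetical storage: one contiguous array per sphere attribute.
struct BoundingSpheres {
    std::vector<float> x, y, z, radius;
};

// Brute force: test every sphere against every plane, front to back through memory.
// Returns the indices of spheres that are inside or touching all planes.
std::vector<std::uint32_t> cullSpheres(const BoundingSpheres& s,
                                       const std::vector<Plane>& planes) {
    assert(s.x.size() == s.y.size() && s.y.size() == s.z.size());
    assert(s.x.size() == s.radius.size());

    std::vector<std::uint32_t> visible;
    visible.reserve(s.x.size());
    for (std::size_t i = 0; i < s.x.size(); ++i) {
        bool inside = true;
        for (const Plane& p : planes) {
            const float dist = p.nx * s.x[i] + p.ny * s.y[i] + p.nz * s.z[i] + p.d;
            inside = inside && (dist >= -s.radius[i]);
        }
        if (inside) visible.push_back(static_cast<std::uint32_t>(i));
    }
    return visible;
}
```

A spatial tree could skip many of these tests, but it would also chase pointers through memory. Which one wins depends on the scene and the hardware, so it is worth measuring rather than assuming.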
Examples
Data-oriented design has us thinking all about data, so let's do something also a bit different from what we usually do. Consider this piece of code:
void MyEngine::queueRenderables() {
    // Test every renderable and queue only the visible ones.
    for (auto it = mRenderables.begin(); it != mRenderables.end(); ++it) {
        if ((*it)->isVisible()) {
            queueRenderable(*it);
        }
    }
}
Although simplified a lot, this pattern is common in object-oriented game engines. But wait: if visible and hidden renderables are mixed unpredictably, we run into a lot of branch mispredictions, which cause the processor to throw away instructions that it had speculatively executed in the hope that a particular branch would be taken.
For small scenes, this obviously isn't an issue. But how many times do you do this particular thing, not just when queuing renderables, but when iterating through scene lights, shadow map splits, zones, or the like? How about AI or animation updates? Multiply all that you do throughout the scene, see how many clock cycles you expend, compute how much time your processor has available to deliver all the GPU batches for a steady 120 FPS rhythm, and you see that these things can add up to a considerable amount.
It would be funny if, for instance, a hacker working on a web app even considered such minuscule micro-optimizations, but we know that games are real-time systems where resource constraints are incredibly tight, so this consideration is not misplaced for us.
To prevent this, let's think about it in another way: what if we kept the list of visible renderables in the engine? Sure, we would sacrifice the neat syntax of myRenderable->hide() and violate quite a few OOP principles, but we could then do this:
void MyEngine::queueRenderables() {
    // Every entry is already known to be visible, so no per-item test is needed.
    for (auto it = mVisibleRenderables.begin(); it != mVisibleRenderables.end(); ++it) {
        queueRenderable(*it);
    }
}
Hooray! No mispredicted visibility branch, and assuming mVisibleRenderables is a nice std::vector (which is a contiguous array), we could just as well have rewritten this as a fast memcpy call (with a few extra updates to our data structures, probably).
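Those "few extra updates" could look something like the sketch below, which keeps the visible list dense by moving the last entry into the slot of whichever renderable gets hidden. VisibleSet, RenderableId, and the kHidden sentinel are hypothetical names; the article's MyEngine would own something along these lines instead of asking each object whether it is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using RenderableId = std::uint32_t;

// Sentinel meaning "this renderable is not in the visible list" (illustrative choice).
constexpr std::uint32_t kHidden = 0xFFFFFFFFu;

class VisibleSet {
public:
    explicit VisibleSet(std::size_t renderableCount) : mSlot(renderableCount, kHidden) {}

    void show(RenderableId id) {
        assert(id < mSlot.size());
        if (mSlot[id] != kHidden) return;  // already visible
        mSlot[id] = static_cast<std::uint32_t>(mVisible.size());
        mVisible.push_back(id);
    }

    void hide(RenderableId id) {
        assert(id < mSlot.size());
        const std::uint32_t slot = mSlot[id];
        if (slot == kHidden) return;       // already hidden
        const RenderableId last = mVisible.back();
        mVisible[slot] = last;             // fill the hole with the last entry
        mSlot[last] = slot;
        mVisible.pop_back();
        mSlot[id] = kHidden;
    }

    // Dense, contiguous list: iterate it without any visibility test.
    const std::vector<RenderableId>& visible() const { return mVisible; }

private:
    std::vector<RenderableId> mVisible;  // ids of visible renderables, in no particular order
    std::vector<std::uint32_t> mSlot;    // per id: position in mVisible, or kHidden
};
```

The trade-off is that the order of the visible list is not stable, which is usually fine if the batches get sorted before submission anyway.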
Now, you may call me out on the sheer cheesiness of these code samples and you will be quite right: this is simplified a lot. But to be honest, I haven't even scratched the surface yet. Thinking about data structures and their relationships opens us up to a whole lot of possibilities we haven't thought about before. Let's look at some of them next.
Parallelization and Vectorization
If we have simple, well-defined functions that operate on large data chunks as base building blocks for our processing, it's easy to spawn four, or eight, or 16 worker threads and give each of them a piece of data to keep all the CPU cores busy. No mutexes, atomics or lock contention, and once you need the data, you need only to join on all the threads and wait for them to finish. If you need to sort data in parallel (a very frequent task when preparing stuff to be sent to the GPU), you have to think about this from a different perspective—these slides might help.
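Here is a minimal sketch of that fork-and-join pattern using std::thread. The function name scaleInParallel and the trivial "multiply by a factor" workload are placeholders for illustration; a real engine would more likely hand chunks to a persistent job system than spawn threads on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Scales every element in place. Each worker gets its own contiguous,
// non-overlapping chunk, so no mutexes or atomics are needed.
void scaleInParallel(std::vector<float>& data, float factor, unsigned workerCount) {
    assert(workerCount > 0);
    const std::size_t n = data.size();
    const std::size_t chunk = (n + workerCount - 1) / workerCount;  // round up

    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (unsigned w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // fewer elements than workers
        workers.emplace_back([&data, factor, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
        });
    }
    // The only synchronization point: wait for every chunk to finish.
    for (std::thread& t : workers) t.join();
}
```

Calling scaleInParallel(values, 2.0f, std::thread::hardware_concurrency()) would use one worker per reported core, although hardware_concurrency() may return 0, so a fallback is needed.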
As an added bonus, inside one thread you can use SIMD vector instructions (such as SSE/SSE2/SSE3) to achieve an additional speed boost. Sometimes, you can accomplish this only by laying out your data in a different way, such as placing vector arrays in a structure-of-arrays (SoA) manner (like XXX...YYY...ZZZ...) rather than the conventional array-of-structures (AoS; that would be XYZXYZXYZ...). I'm barely scratching the surface here; you can find more information in the Further Reading section below.
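The two layouts can be sketched as follows. Vec3, PositionsAoS, PositionsSoA, and the helper functions are illustrative names only. The loops use no intrinsics; the point is that in the SoA form each loop reads consecutive floats, which compilers can often vectorize on their own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: memory looks like XYZXYZXYZ...
struct Vec3 { float x, y, z; };
using PositionsAoS = std::vector<Vec3>;

// SoA: memory looks like XXX...YYY...ZZZ...
struct PositionsSoA {
    std::vector<float> x, y, z;
};

// Converts the conventional layout into the structure-of-arrays layout.
PositionsSoA toSoA(const PositionsAoS& aos) {
    PositionsSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    for (const Vec3& v : aos) {
        soa.x.push_back(v.x);
        soa.y.push_back(v.y);
        soa.z.push_back(v.z);
    }
    return soa;
}

// Touches only the X array; the Y and Z data never enter the cache.
void translateX(PositionsSoA& p, float dx) {
    for (std::size_t i = 0; i < p.x.size(); ++i) p.x[i] += dx;
}

// Element i of each array is combined independently of every other element.
std::vector<float> lengthsSquared(const PositionsSoA& p) {
    assert(p.x.size() == p.y.size() && p.y.size() == p.z.size());
    std::vector<float> out(p.x.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = p.x[i] * p.x[i] + p.y[i] * p.y[i] + p.z[i] * p.z[i];
    return out;
}
```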
Unit Testing You Didn't Know Was Possible
Having simple functions with no external effects makes them easy to unit-test. This can be especially useful as a form of regression testing for algorithms you'd like to swap in and out easily.
For example, you can build a test suite for a culling algorithm's behavior, set up an orchestrated environment, and measure exactly how it performs. When you devise a new culling algorithm, you run the same test again with no changes. Because you measure both performance and correctness, you have an objective assessment at your fingertips.
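One possible shape for such a regression check is sketched below. It compares any candidate culler against a simple reference implementation on the same input. The Culler signature (distances in, kept indices out) is a deliberately tiny, made-up interface, and cullReference and matchesReference are hypothetical helpers, not part of any test framework.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Assumed interface: object distances from the camera in, indices of kept objects out.
using Culler = std::function<std::vector<std::uint32_t>(
    const std::vector<float>& distances, float maxDistance)>;

// Reference implementation: a plain linear scan that is easy to trust.
std::vector<std::uint32_t> cullReference(const std::vector<float>& distances,
                                         float maxDistance) {
    std::vector<std::uint32_t> kept;
    for (std::size_t i = 0; i < distances.size(); ++i)
        if (distances[i] <= maxDistance) kept.push_back(static_cast<std::uint32_t>(i));
    return kept;
}

// True when the candidate keeps exactly the same objects as the reference.
// Output order is ignored, so the candidate is free to reorder its results.
bool matchesReference(const Culler& candidate,
                      const std::vector<float>& distances, float maxDistance) {
    assert(candidate != nullptr && "candidate culler must be callable");
    std::vector<std::uint32_t> expected = cullReference(distances, maxDistance);
    std::vector<std::uint32_t> actual = candidate(distances, maxDistance);
    std::sort(expected.begin(), expected.end());
    std::sort(actual.begin(), actual.end());
    return expected == actual;
}
```

The same harness can time each candidate by wrapping the call in std::chrono::steady_clock measurements, which gives the performance half of the comparison.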
As you get more into the data-oriented design approaches, you'll find it easier and easier to test aspects of your game engine.
Combining Classes and Objects With Monolithic Data
Data-oriented design is by no means opposed to object-oriented programming, just to some of its ideas. As a result, you can quite neatly use ideas from data-oriented design and still get most of the abstractions and mental models you're used to.
Take a look, for example, at the work on OGRE version 2.0: Matias Goldberg, the mastermind behind that endeavor, chose to store data in big, homogeneous arrays and to have functions that iterate over whole arrays, as opposed to working on only one datum, in order to speed up OGRE. According to a benchmark (which he admits is very unfair, but the performance advantage measured cannot be only because of that), it now runs three times faster. Not only that: he retained a lot of the old, familiar class abstractions, so the API was far from a complete rewrite.
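The general idea of keeping a familiar class on top of array storage can be sketched like this. To be clear, this is not OGRE's actual API: NodeStorage and Node are hypothetical names, and the example only shows a lightweight object that forwards to flat arrays while bulk work runs over the arrays directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Owns all node positions in flat arrays (hypothetical manager class).
class NodeStorage {
public:
    // Appends a node and returns its index in the arrays.
    std::size_t create(float x, float y) {
        mX.push_back(x);
        mY.push_back(y);
        return mX.size() - 1;
    }
    float& x(std::size_t i) { assert(i < mX.size()); return mX[i]; }
    float& y(std::size_t i) { assert(i < mY.size()); return mY[i]; }

    // Bulk operation: runs over the whole array, not one object at a time.
    void translateAll(float dx, float dy) {
        for (float& v : mX) v += dx;
        for (float& v : mY) v += dy;
    }

private:
    std::vector<float> mX, mY;
};

// The familiar object-style facade: just a pointer and an index.
class Node {
public:
    Node(NodeStorage& storage, float x, float y)
        : mStorage(&storage), mIndex(storage.create(x, y)) {}

    void setPosition(float x, float y) {
        mStorage->x(mIndex) = x;
        mStorage->y(mIndex) = y;
    }
    float x() const { return mStorage->x(mIndex); }
    float y() const { return mStorage->y(mIndex); }

private:
    NodeStorage* mStorage;  // not owned; must outlive the Node
    std::size_t mIndex;     // position of this node's data in the arrays
};
```

Code that wants the old node.setPosition(...) style keeps working, while hot paths such as per-frame transform updates can go through the storage in one pass.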
Is It Practical?
There is a lot of evidence that game engines can and will be developed in this manner.
The development blog of Molecule Engine has a series named Adventures in Data-Oriented Design, and contains a lot of useful advice regarding where DOD was put to use with great results.
DICE seems to be interested in data-oriented design, as they have employed it in Frostbite Engine's culling system (and got significant speed-ups, too!). Some of their other slides also cover employing data-oriented design in the AI subsystem and are worth looking at, too.
Besides that, developers like the aforementioned Mike Acton seem to be embracing the concept. There are a few benchmarks which prove that it does gain a lot in performance, but I haven't seen a lot of activity on the data-oriented design front in quite some time. It could, of course, be just a fad, but its main premises seem very logical. There sure is a lot of inertia in this business (and any other software development business, for that matter) so this may be hindering large-scale adoption of such a philosophy. Or maybe it's not such a great idea as it seems to be. What do you think? Comments are very welcome!
Further Reading
- Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)
- Introduction to Data Oriented Design [DICE]
- A rather nice discussion on Stack Overflow
- An online book by Richard Fabian explaining a lot of the concepts
- A benchmark showing the other side of the story, with a seemingly counterintuitive result
- Mike Acton's review of OgreNode.cpp, revealing some common OOP game engine development pitfalls