|Page (1) of 1 - 09/28/11||email article||print page|
How to Build a Scalable 3D Particle System
Particle systems are an ideal candidate for multithreading in games. Most games have particle systems and their general nature of independent entities lends well to parallelism. However, a naive approach won’t load balance well on modern architectures. There are two complementary approaches -- task-based threading and SSE -- that are ideally suited for particle systems and will obtain maximum performance from multicore processors.
A particle system is ideal for threading because it’s essentially a big loop that operates on a bunch of independent objects. Because the objects don’t need to interact (they don’t write to shared data), they can easily be spread across multiple threads.
Here is an example of a loop that updates all particles:
Threading these loops is trivial. OpenMP is supported by all major compilers and allows simple for-loops to be parallelized. OpenMP will automatically divide the loop and run it across all available cores on the machine.
However, this is not an ideal approach. This method does not support good load balancing in a complex system such as a real game. For example, if spawned threads generate more threads (nested threading), then it’s possible to oversubscribe the system. Oversubscription causes a performance hit because there is an overhead associated with swapping the execution of the threads.
There is a better way to thread particles than the simple fork-and-join approach of OpenMP’s parallel for. It’s very simple to divide the work to run as tasks. Using tasks provides several benefits. Once you have a tasking system set up, it’s easier to add new tasks to increase parallelism throughout the code. Also, it’s easier to load balance and be platform-agnostic. If the task scheduler manages all parallel tasks, the program will avoid oversubscription. In this example, we don’t have to wait for all the particles for a given emitter to finish before moving to the next emitter and scheduling more tasks.
To convert the loop above to use tasks, the code needs to divide the work into several tasks and submit them to a task scheduler. These tasks will define the range of particles to update and includes all required information.
Tasking is a great way to get scaling with a particle system. However, it’s also important to make sure the code fully uses the CPU cores it is running on. Developers should consider using SIMD (single instruction, multiple data) with SSE instructions. For floating point, SSE instructions operate on four floating points in a single instruction. Obviously, this has the potential to increase throughput by up to four times.
There are multiple ways to use SSE. For developers interested in maximum control, intrinsics are the best way to utilize SSE. Intrinsics are compiler-specific functions that generate inlined highly efficient machine instructions. For developers targeting DirectX on PC or Xbox, the XNA Math library wraps the use of intrinsics in a library that already supports vectors and matrices.
There are a few things to keep in mind when using the XNA Math library. First, be careful when accessing individual elements. Getting and setting elements inside an SSE vector isn’t free. It’s best to put data into XMVECTORS and keep it there as long as possible. Also, make sure you are using properly aligned data.
Both multicore and SSE give significant benefits. Hyperthreading takes advantage of the fact that all execution units for a CPU core might not be fully utilized by a single thread. Multiple execution units allow multiple instructions to be executed simultaneously and be pipelined. Execution units perform operations such as loads, stores, integer operations, floating-point operations and SSE operations. With more SSE instructions, there is a better utilization of all the processors resources and therefore better use of hyperthreading.
If you are afraid that adding a highly parallel particle system will Duke Nuke your schedule, you can always investigate middleware options. For example, the team at Fork Particle has a parallel particle systems backed with solid content creation tools.
With higher core counts, it’s possible to scale with compute power and show a larger number of particles. This would give users a perceivable difference for high-end machines, without punishing players with lesser gaming hardware.
Copyright (c) 2011 Studio One Networks. All rights reserved.>
Related Keywords:Games, Game Technology,