Using multi-threading to animate and speed up loading screens

One feature that almost all games have in common is loading screens.

Many small games can get away with a single loading screen at the start of the game to load all or most assets.

Most larger games have one before every level or environment, or at least for major transitions. The only way to get around loading screens entirely is to load only a small core engine at the beginning of the game, and to then load everything else (other engine parts and frameworks, assets, script files, and so on) on the fly.

To make the latter feasible without causing the game to freeze at times, concurrency is a necessity – content has to be loaded in the background while different threads update the gamestate, render it, and keep the interface responsive.

This is a complex topic however, and too big to tackle in this post.

Instead we will look at a smaller problem and how to solve it.

What do we want and why do we care?

In this post we will look at how to make a loading screen that loads a considerable amount of assets, but still remains responsive.

What this means is that our loading screen has to load our content while still running a regular update loop.

In the simplest case that update loop will simply show a small animation to let the user know that the game is still loading, which is an important thing to communicate from a usability standpoint. But if we can go that far, we could also allow the user to interact with the screen, maybe by giving them something interesting to do or look at while they are waiting.

While we do not want to do anything so complex that it prolongs the loading process noticeably, too many games literally keep the user waiting to join the action, instead of making a larger effort to respect their time, for example by showing mission details or interesting background information.

How will we make this work?

One solution – and possibly the easiest – to both load game content and also keep an update loop running is to split up the loading process into lots of small tasks, and do just a few of them every update frame.

The main drawback of that method is that not every task can be easily split up, or task lengths estimated, which could lead to long frames that would freeze the screen and provide a less than optimal user experience.
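To make this first approach concrete, here is a minimal sketch of running queued loading tasks inside a per-frame time budget. It is written in Python for brevity rather than in the C# used by the game discussed later, and the function name, the budget value and the example tasks are all made up for illustration.

```python
import time
from collections import deque


def run_tasks_for_frame(tasks, budget_seconds=0.004, clock=time.perf_counter):
    """Run queued loading tasks until this frame's time budget is used up.

    At least one task runs per call so loading cannot stall. That also means a
    single slow task can still overrun the budget, which is the drawback
    described above.
    """
    deadline = clock() + budget_seconds
    completed = 0
    while tasks:
        task = tasks.popleft()
        task()
        completed += 1
        if clock() >= deadline:
            break
    return completed


if __name__ == "__main__":
    loaded = []
    # Hypothetical loading steps; a real game would read and parse files here.
    tasks = deque(
        (lambda name=name: loaded.append(name))
        for name in ["fonts", "sprites", "levels"]
    )
    frames = 0
    while tasks:
        run_tasks_for_frame(tasks)
        frames += 1  # the game would update and draw the loading screen here
    print(f"loaded {loaded} in {frames} frame(s)")
```

The budget only limits when a new task may start. It cannot interrupt a task that is already running, so this works best when every task is known to be short.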

Another solution is to use multiple threads, updating the screen on one while loading content on another.

While this may seem like a great solution, there can still be performance issues, especially when it comes to graphical assets. The reason for this is, as mentioned last week, that OpenGL uses a thread-specific context to manage all resources. (For this post we will assume that our game runs on OpenGL, but similar issues can occur with other APIs as well.)

This means that we cannot render our loading screen on one thread, while also loading graphical assets – like shaders or textures – on another.

To circumvent this problem, we will combine these two approaches.

Using, for example, the multi-threaded queue I developed last week, we can do most of the work related to loading our content on a separate thread, while efficiently scheduling our update loop thread to execute all necessary calls to the OpenGL API while it is otherwise idle.

While this can still lead to freezes, if we make sure never to do too much work on the OpenGL thread in a single frame, it is unlikely that the user will notice anything.
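The combined approach could look roughly like the following Python sketch. It is a simplified stand-in, not the queue from last week's post: the class name, the budget and the fake "uploads" are assumptions, and no real OpenGL calls are made.

```python
import queue
import threading
import time


class MainThreadQueue:
    """Lets worker threads hand small jobs to the thread that owns the GL context."""

    def __init__(self):
        self._jobs = queue.Queue()

    def post(self, job):
        """Called from any thread: schedule job() to run on the main thread."""
        self._jobs.put(job)

    def run_pending(self, budget_seconds=0.002):
        """Called from the main thread once per frame; returns the number of jobs run."""
        deadline = time.perf_counter() + budget_seconds
        ran = 0
        while True:
            try:
                job = self._jobs.get_nowait()
            except queue.Empty:
                break
            job()
            ran += 1
            if time.perf_counter() >= deadline:
                break  # leave the rest for the next frame
        return ran


if __name__ == "__main__":
    gl_queue = MainThreadQueue()
    uploaded = []

    def loader():
        for name in ["ship.png", "enemy.png", "bloom.glsl"]:
            # Stand-in for a texture or shader upload that needs the GL context.
            gl_queue.post(lambda name=name: uploaded.append(name))

    worker = threading.Thread(target=loader, name="loader")
    worker.start()
    while worker.is_alive() or len(uploaded) < 3:
        gl_queue.run_pending()
        time.sleep(0.001)  # stands in for updating and drawing one frame
    worker.join()
    print(uploaded)
```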

An example

There are many possible ways to schedule a multi-threaded loading process like this. Instead of going through a lot of different options, the best choice of which may depend on a variety of factors, we will step through and discuss a successful implementation in a real game: Roche Fusion.

See this video for how the screen turned out once we implemented it as explained below:

Basic setup

The first thing we do when a player launches Roche Fusion is create our OpenGL context and window on the main thread. This, including loading and just-in-time compiling the game’s most important code (Roche Fusion is written entirely in C#), tends to take only a split second.

Before we show the window to the user we make sure it is set to a small size and borderless style, and its colour is cleared to black.

Then – and still on the main thread – we load the most basic resources of the game. This includes shaders, fonts and a few simple sprites, all of which we need to start drawing the loading screen. All but one of the textures are also used in the game itself which means we waste very little time.

This first step, with the loading screen still black, usually takes no more than a second or two on most configurations.

Right after this, we initialise our loading screen visuals and do a quick fade into our animation, while also showing a small text message informing the user that the game is loading. That message is randomised and includes many humorous variants (or so we would like to think) to make the short waiting time less boring in case the user’s attention is focused entirely on the game.

Also note how the style of the loading screen is the same as that of the game itself – using the same kind of custom fading animations and particle and post processing effects to already start setting the mood for the game.

At this point we start a new thread that is in charge of loading all the game’s content, while the main thread goes into a regular update and render loop.

As mentioned above, the loading thread calls back to the main thread for all OpenGL related operations. To see how we execute these calls in a thread-safe way and also do our best to keep the main thread from freezing, check last week’s post.
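When the loading thread needs the result of an OpenGL call (a texture handle, say), it has to wait for the main thread to run that call. The following Python sketch shows one way such a blocking call-back could work. It is not the implementation from last week's post, and `call_on_main_thread`, `pump_main_thread_jobs` and `create_texture` are hypothetical names.

```python
import queue
import threading
from concurrent.futures import Future

_main_jobs = queue.Queue()  # hypothetical queue that the main thread drains


def call_on_main_thread(func, *args):
    """Called from the loading thread: run func on the main thread, wait for its result."""
    future = Future()
    _main_jobs.put((future, func, args))
    return future.result()  # blocks the caller until the main thread has run the job


def pump_main_thread_jobs():
    """Called from the main thread each frame: run every job that is waiting."""
    ran = 0
    while True:
        try:
            future, func, args = _main_jobs.get_nowait()
        except queue.Empty:
            return ran
        try:
            future.set_result(func(*args))
        except Exception as error:
            future.set_exception(error)  # re-raised in the waiting thread
        ran += 1


if __name__ == "__main__":
    def create_texture(name):
        # Stand-in for real OpenGL calls; returns a made-up texture id.
        return len(name)

    ids = []
    loader = threading.Thread(
        target=lambda: ids.append(call_on_main_thread(create_texture, "ship.png"))
    )
    loader.start()
    while loader.is_alive():
        pump_main_thread_jobs()  # the main loop would also update and draw here
    print(ids)
```

The blocking `future.result()` is where the loading thread can lose time: it sits idle until the main thread next drains the queue.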

This works great! We now have an animated loading screen that is much less boring than just a blank window.

However, since our loading thread sometimes has to wait for our main thread to execute OpenGL calls, our loading process may now actually be slower than it was before.

With Roche Fusion we did not see any such problems in practice. That is because rendering the loading screen takes very little time, and so the main thread is usually ready to do all the OpenGL calls we require of it.

However, why not see if we can actually improve the situation, and use the same concepts to speed up our loading process while making it less boring at the same time?

Making it faster

Since we are already using multiple threads, why not go all the way?

These days virtually every computer has multiple cores, and four or more cores are common, even in consumer products – and certainly in gaming rigs. Add to that Intel’s hyper-threading technology and we have a lot of (virtual) cores to help us out.

Since loading our content involves a number of different tasks, many of which can be separated fairly cleanly from one another, we can take advantage of the hardware’s capabilities and split our loading process across a number of different threads.

There are again many ways to do this, but to make things easier and keep them practical we will again look at how we do it in Roche Fusion.

Roche Fusion’s multi-threaded loading

Roche Fusion loading screen thread diagram

As explained above, in Roche Fusion the OpenGL thread starts the main loading thread which is responsible for loading all content.

That thread takes a prepared queue of tasks to complete, and executes them as follows.

Steam

First, we launch yet another thread which is responsible for setting up our connection to Steam. Roche Fusion uses Steam for achievements, statistics and leaderboards, which are all important parts of the game. Since initialising the appropriate APIs can take several seconds, however, we do this work on a different thread while continuing to load the game’s content.

Critical Systems

Then the loading thread initialises a number of critical systems, like our singleton audio manager and some other resources embedded in the executable, including parts of the gameplay logic.

While some of this could be moved to a separate thread, some of the later steps rely on these systems being ready, and initialising them usually takes no more than a few dozen milliseconds.

Content verification

We then load the main content file of the game – this includes all essential script files, definitions for weapons, enemies, and more – into memory.

The file is then checksummed and compared to an encrypted signature file.

Since this content file is little more than a zip archive in disguise, we can thus find out if the file was modified, to prevent the user from cheating.
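As a rough illustration of the checksum step, here is a Python sketch. The post does not say which hash function is used or how the signature file is encrypted, so SHA-256 is an assumption here, and the sketch further assumes that the expected digest has already been recovered from the signature file as a hex string.

```python
import hashlib
import hmac


def content_is_unmodified(content: bytes, expected_digest_hex: str) -> bool:
    """Compare the SHA-256 digest of the in-memory content file with the expected one."""
    actual = hashlib.sha256(content).hexdigest()
    # compare_digest takes the same time wherever the two strings differ.
    return hmac.compare_digest(actual, expected_digest_hex.strip().lower())


if __name__ == "__main__":
    content = b"pretend this is the whole content file"
    # Assumption: in a real game this value would come from the signature file.
    expected = hashlib.sha256(content).hexdigest()
    print(content_is_unmodified(content, expected))
    print(content_is_unmodified(content + b" (tampered)", expected))
```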

Content loading

We then use the file – still loaded into memory – to load our content from.

Right at the start of this process we create yet another thread which is solely responsible for loading all the game’s sound effects, while we proceed to load the rest of the content on the main loading thread.

Most of the files we have to deal with are JSON files defining enemies, weapons and other things. These are generally loaded and parsed very quickly. You can find more information on how we use JSON in my introduction to Json.NET.
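Since the content file is essentially a zip archive that is already in memory, parsing the JSON definitions needs no further disk access. The Python sketch below shows the idea with the standard library. The entry names are invented, and the real game uses Json.NET and its own file layout.

```python
import io
import json
import zipfile


def load_json_definitions(archive_bytes: bytes) -> dict:
    """Parse every .json entry of an in-memory zip archive, keyed by entry name."""
    definitions = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                definitions[name] = json.loads(archive.read(name))
    return definitions


if __name__ == "__main__":
    # Build a tiny archive in memory to stand in for the real content file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("enemies/drone.json", '{"health": 3}')
        archive.writestr("weapons/laser.json", '{"damage": 1.5}')
    print(load_json_definitions(buffer.getvalue()))
```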

In terms of file size, most of our content consists of textures however. The loading thread does not load these itself, but instead pushes this work to the OpenGL thread.

Note that while we did not do so, that process could be further optimised by extracting the texture images themselves on the loading thread – or even yet another thread – and only doing the final OpenGL calls on the OpenGL thread.

However, we were already very happy with the results we got so far, and did not consider doing this worth the effort in the case of Roche Fusion.

Once all our content is loaded, we wait until the audio thread has completed as well – which it usually has at that point – and with that all of Roche Fusion’s content is loaded.
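The pattern of starting the audio thread, loading everything else, and then waiting for the audio thread can be sketched in a few lines of Python. The two loader callables are placeholders for the real work, and the error handling is just one possible choice.

```python
import threading


def load_all_content(load_sounds, load_everything_else):
    """Load sounds on their own thread while the calling thread loads the rest."""
    sounds = {}
    errors = []

    def audio_worker():
        try:
            sounds.update(load_sounds())
        except Exception as error:
            errors.append(error)  # reported to the caller after join()

    audio_thread = threading.Thread(target=audio_worker, name="audio-loader")
    audio_thread.start()
    other = load_everything_else()  # runs while the audio thread is busy
    audio_thread.join()             # returns at once if audio already finished
    if errors:
        raise errors[0]
    return sounds, other


if __name__ == "__main__":
    # Hypothetical loaders returning tiny placeholder results.
    sounds, other = load_all_content(
        lambda: {"laser": b"\x00\x01"},
        lambda: ["enemies.json", "weapons.json"],
    )
    print(sorted(sounds), other)
```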

Last steps

After our content is ready, we use it to initialise and populate various systems involved with Roche Fusion’s procedural generation. Since almost all our content is contained in statically typed objects at this point, this again only takes a few dozen milliseconds.

In the meantime, the Steam thread has been busy. It has done a lot of asynchronous operations, callbacks and network calls, accessing the Steam API.

It also usually had the time to download the player’s achievements and current statistics. Note that we sometimes have to ask for these more than once, since Steam seems to forget about our request from time to time.
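Asking again when no answer arrives can be sketched as a small retry loop. The Python below deliberately avoids any real Steam API: `send_request` and `poll_result` are hypothetical callables standing in for whatever the Steam wrapper provides, and the attempt count and timings are arbitrary.

```python
import time


def request_with_retries(send_request, poll_result, attempts=3,
                         wait_seconds=2.0, poll_interval=0.05):
    """Re-send a request whenever no result arrives within wait_seconds.

    Returns the first non-None result, or None if every attempt timed out.
    """
    for _ in range(attempts):
        send_request()
        deadline = time.monotonic() + wait_seconds
        while time.monotonic() < deadline:
            result = poll_result()
            if result is not None:
                return result
            time.sleep(poll_interval)
    return None


if __name__ == "__main__":
    sent = []

    def send():
        sent.append(1)

    def poll():
        # Pretend the first request is lost and only the second one is answered.
        return {"kills": 10} if len(sent) >= 2 else None

    print(request_with_retries(send, poll, wait_seconds=0.1, poll_interval=0.01))
```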

Before we are done with this part of the loading process, we need to check whether the user is allowed to unlock any new achievements, modify their statistics or participate in the leaderboards. We do this by waiting for the main loading thread to check whether the content file has been modified or not.

If it has been modified, we disable these features, but still allow the player to play the game. This allows them to mod the game if they wish, but protects the integrity of the Steam community features.
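One simple way for the Steam thread to wait for the verification result is an event that the loading thread sets once the check is done. The Python sketch below illustrates that hand-off. The class and function names are invented, and the real game may synchronise the two threads differently.

```python
import threading


class ContentVerification:
    """Publishes the result of the content check from the loading thread."""

    def __init__(self):
        self._done = threading.Event()
        self._modified = False

    def publish(self, modified: bool):
        """Called once by the loading thread when the check has finished."""
        self._modified = modified
        self._done.set()

    def wait_until_known(self, timeout=None) -> bool:
        """Block until the check has finished; True means the content was modified."""
        if not self._done.wait(timeout):
            raise TimeoutError("content verification did not finish in time")
        return self._modified


def steam_features_enabled(verification, timeout=None) -> bool:
    """Achievements, statistics and leaderboards stay on only for unmodified content."""
    return not verification.wait_until_known(timeout)


if __name__ == "__main__":
    verification = ContentVerification()
    results = []
    steam_thread = threading.Thread(
        target=lambda: results.append(steam_features_enabled(verification, timeout=5))
    )
    steam_thread.start()
    verification.publish(modified=True)  # the loading thread found a modified file
    steam_thread.join()
    print("Steam features enabled:", results[0])
```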

Usually the Steam thread finishes all its tasks before all content is loaded.

In either case, however, once both are complete the last few systems are initialised, most notably the game’s screen manager, which manages the screen in which the actual gameplay runs.

And then we are done.

All that remains is to dispose of the Steam and master loading threads, reset the game’s window state and proceed into the proper game update loop on our original main thread.

Performance comparison

During the development of Roche Fusion’s multi-threaded loading screen we unfortunately never bothered to take detailed performance measurements.

However, while the game previously took roughly 20 seconds to load, the new system did the same job in 5 to 7 seconds, which makes it roughly three to four times as fast.

This could be improved even further by splitting up loading individual elements onto even more threads, and by doing as little work as possible on the OpenGL thread, but we were very happy with our results, and Roche Fusion loads its content like this to this day.

In fact, the significant performance gain was a surprise to us, and some of our beta testers reported loading times of just 3 seconds. We had to speed up the loading screen’s animation to make sure all users got to see the game’s name and logo fade in properly.

Conclusion

I hope this has given you some insight and some ideas regarding the possibility of using concurrency for loading screens that are both better looking and more efficient.

Naturally, your mileage may vary, depending heavily on how exactly you apply these techniques, what kind of content you apply them to, and the configuration you run them on.

One main consideration is the trade-off between being CPU bound or disk bound – whether your loading process has to wait for files to be loaded before being able to process them or not.

Usually the processing done to the content takes the most time, but it is still important to avoid unnecessary disk accesses. That is why we combine most of Roche Fusion’s content into a single file. This allows us to read the entire file into memory in one sequential read, a task that both older hard disks and modern solid state disks excel at.

Feel free to drop a comment below if you have come across other important aspects that should be taken into account, or of course if you have any questions.

Enjoy the pixels!
