first, many thanks to world_of_cactus for finding this very interesting video. In it, Maksim Baryshnikov explains how the server architecture works, during a lecture in Minsk on 13.12.2014 (at the Wargaming Developer Contest event). And it's… interesting. I wish this was in English, but it is not, and the guy speaks TERRIBLY (hard to understand), but I'll do my best to bring you the info from it. So, here's the video.
There are various servers within each cluster (for EU, they are in Amsterdam and Frankfurt). For some reason, the RU cluster also has two servers in Europe (one in Amsterdam, one in Frankfurt), and they are tied together as such:
Amsterdam has two datacenters with around 40 servers, Frankfurt has 70 servers. For comparison, Moscow has 5 datacenters with 250+ servers (Novosibirsk has 80 servers, Krasnoyarsk has 70 servers).
Oddly enough, the game is distributed: for Russians, the login server and the garage are in Amsterdam, while the game server is in a separate location. Each datacenter works as a cluster by itself. It's problematic to access various spots at the same time. Here he gets technical, but the general meaning as I understood it (okay, I always say this is not my thing, but I know some of the basics) is that the lower functions of the clusters are not synchronized (they are asynchronous and distributed): considering that the login server for Krasnoyarsk is in Amsterdam, the distance is such that latency becomes an issue, hence the asynchronicity.
A battle can run on multiple nodes (server machines in the datacenter): one half of the players is on one node and the other half on another (this is not an ideal case, but it is possible). At this point he gets really technical, but basically he states that probably the most important feature of the Wargaming-developed system is its ability to work with desynchronization (reliability). In order to ensure swift database access, there are limits on working with the databases: for example, each database has to be loaded into memory, with no disk access while working with it. There are more limitations, but I just can't understand exactly how this works (sorry; if someone from the IT crowd who speaks Russian is watching this, perhaps they could explain). The result is that, given the number of database transactions per day, Wargaming now sees about three really bad/strange cases per day caused by database data inconsistency. One example: a player saw himself as not being in a clan, while the clan still saw him as a member. This state was caused by a series of extremely low-probability events, which nonetheless do happen, since the amount of data transactions on the WG servers is so high.
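To illustrate the kind of low-probability inconsistency described above, here is a minimal Python sketch (all names are hypothetical, not Wargaming code): two services each hold their own copy of clan membership and are updated one after the other rather than atomically, so a read that lands between the two updates sees a contradictory state.

```python
# Hypothetical sketch: two services hold independent copies of clan
# membership. They receive the same update asynchronously, so a read
# made between the two updates observes an inconsistent state.

class Service:
    """A service with its own local copy of clan membership."""
    def __init__(self, name):
        self.name = name
        self.members = {"clan1": {"player42"}}

    def remove_member(self, clan, player):
        self.members[clan].discard(player)

    def is_member(self, clan, player):
        return player in self.members[clan]

game = Service("game")      # what the player himself sees
clans = Service("clans")    # what the clan roster shows

# The player leaves the clan; the game service is updated first...
game.remove_member("clan1", "player42")

# ...and before the clans service applies the same update, a read arrives:
print(game.is_member("clan1", "player42"))   # False: player sees "no clan"
print(clans.is_member("clan1", "player42"))  # True: clan still lists him

# Once the second update is applied, the services converge again.
clans.remove_member("clan1", "player42")
print(clans.is_member("clan1", "player42"))  # False
```

With very high transaction volumes, even a tiny window between such updates is hit a few times per day, which matches the "three strange cases per day" figure from the talk.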
More than 50 components of World of Tanks are handled outside the game infrastructure itself, among them registration, SSO, game data access, clans, CW, portals, forums, shops and others. These components talk to each other using an API. The fact that the entire system is split into components is advantageous for Wargaming, as it allows them to develop and test each component separately.
The disadvantage of such a distributed system is that, for example, your clan status is handled by the clan service component, and the game service has to "request" it. It is thus possible for various components to fall out of sync. Since it is not possible to sync everything in real time (or on every change), WG had to develop a system where the data is propagated from the master component (the one that owns it) to all the other components that use it, so that you don't get desyncs. Some components, however, can get updated later than the more important ones; that's why in the game you can see yourself in a clan (if you just entered one) while the portal gets updated later via the API. In any case, each component gets rigorously tested, even for really strange cases (some other component stopping working, etc.).
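The master-component pattern described above can be sketched roughly like this in Python (a hedged illustration with hypothetical names, not the actual Wargaming API): one component owns the data and pushes changes to every subscribed consumer, with higher-priority consumers notified first, so a low-priority consumer like the web portal may briefly lag behind.

```python
# Hypothetical sketch of the "data master" pattern: the master owns
# clan membership and pushes changes to subscribed consumers over an
# API. High-priority consumers are notified first; in a real system
# the low-priority pushes would be asynchronous and could arrive late.

class Consumer:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority   # lower number = updated sooner
        self.clan_of = {}          # local cache: player -> clan

    def on_update(self, player, clan):
        self.clan_of[player] = clan

class ClanMaster:
    """Owns clan membership; the single source of truth."""
    def __init__(self):
        self.clan_of = {}
        self.consumers = []

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def set_clan(self, player, clan):
        self.clan_of[player] = clan
        # Notify consumers in priority order (game before portal).
        for c in sorted(self.consumers, key=lambda c: c.priority):
            c.on_update(player, clan)

master = ClanMaster()
game = Consumer("game", priority=0)
portal = Consumer("portal", priority=9)
master.subscribe(portal)
master.subscribe(game)

master.set_clan("player42", "clan1")
print(game.clan_of["player42"])    # clan1 (updated first)
print(portal.clan_of["player42"])  # clan1 (would arrive later in reality)
```

The key design point is that consumers never write clan data themselves; they only cache what the master pushes, which is what keeps the system convergent despite the temporary lag.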