Personal rating proposal

Author: Hillin (Chinese server)

Hello everyone,

Today we have an interesting guest article by Hillin, who is from China and plays on their private server. It’s his personal rating proposal and, whatever you might think of it, it’s very well thought through. It took me a while to grasp, though, so 3 minutes won’t do.

- SS

Hillin:

As we all know, patch 8.8 brought an official personal rating system into the game, and many players disliked it. Edrard has also written a post discussing it. In China, most players use a modified XVM algorithm, which in my opinion is not a very good one. So I finally decided to write an article to show what my ideal personal rating system would look like. Here it is. I hope it can be published on your FTR site, so that it can be shared and discussed by people all around the world. Sorry for any wording or grammar problems, as English is not my mother tongue.

A Personal Rating System: My Ideal Design

Discussion Principles

1. No personal rating (PR) system could be sufficient to measure the overall skill level of a player.

A PR system is implemented as a simple or complex computation over gameplay statistics, but many in-game events or actions that reflect a player’s strategic and tactical ability cannot be captured by these statistics. A PR system can only measure a player’s ability within a certain framework and with a certain algorithm, and it can only ensure fairness to all players within that framework. This topic discusses only how to provide such a framework and an ideal algorithm; anything else, including whether a PR system should be used at all, is out of scope.

2. Implementability is not discussed in principle.

The major unofficial PR systems lack some statistics that are considered important in reflecting one’s battle skill, such as overall assisted damage (damage dealt to targets you are spotting, by teammates who are not spotting them themselves), which makes it difficult to rate a scout player. The PR system designed in this topic makes heavy use of existing statistics, many of which, however, cannot be retrieved by unofficial developers. This topic tries to inspire future PR system designs, so as a matter of principle we do not discuss implementability much.

3. No detailed formula will be given.

As said above, many parameters of our PR system are not accessible, so it doesn’t make sense to give a detailed formula for the system. However, I will try to give draft formulas, with constants left unfilled, to help readers understand this rating system more easily.

4. Examples in this topic are chosen arbitrarily.

Design Principles

1. The PR value does not simply reflect the personal ability of a player. The contribution to the team is more important.

This is easy to understand, because World of Tanks is a team-based game. Measures of personal ability, such as damage dealt, are important; however, a widely applied PR system will affect how players behave in battle and, in turn, the overall atmosphere of the game (e.g. IMO the official PR system actually encourages players to play more negatively). Since this is a team-based game, the PR system should encourage people to contribute to the team as much as possible.

2. Design from the viewpoint of followers.

The PR value is, on the one hand, a tool for a player to find out which aspects of his own play could be improved and, on the other hand, a window for getting to know your opponents. We should start from these two points of view, in order to provide accurate information for these two kinds of followers. Other parameters should be rejected.

a) Employ benchmark values based on statistics rather than on experience; use relative values rather than absolute ones.

Major PR systems employ absolute benchmark values (such as the base damage for each tier) chosen by experience. However, I consider a benchmark value based on relative statistics to be more reliable. Such a value is calculated by comparing one player’s stats with the overall level, which reflects how the player is performing amongst all players.

b) In a single battle, Rating per Tank (RT) of a player makes more sense than his overall PR value.

This should be easy to understand: a player who is good at sniping might play badly while scouting. When you are fighting an opponent in a certain tank, you care more about how he has performed in that tank.

c) The PR value should represent the recent performance of a player, rather than his whole history.

Everyone begins as a noob. What you care about is your opponent’s current skill level; his history means little to you.

The Design

1. The Absolute Rating per Battle (RB-abs) value

This value represents how a player performs in a single battle. It is the weighted sum of two values, the common part and the tank-specific part, which is finally adjusted by the battle result.

a) The common part

The common part is basically the raw experience you get (with premium account and special premium vehicle bonuses excluded), with the average Rating per Tank (RT) value of your teammates and opponents taken into account.

Raw battle experience (Xraw) is a composite factor, based on almost all sorts of battle statistics and computed by a complex official algorithm. It represents the performance of a player in an official way, that is, according to how the creators of this game want us to play. This official algorithm, in my opinion, carries much more weight than any other algorithm. Just as Edrard, the inventor of the Effective Value system, said in his post: if the raw experience per battle of a player were displayed in his personal statistics and could be viewed by anyone else, there would be no need to invent another rating system.

The average RT of your teammates (RT-teammate) and opponents (RT-opponent) are balance factors. The higher your enemies’ average RT and the lower your teammates’ average RT, the more rating points you get from the battle. That is, you get a bonus when you are fighting a strong enemy or leading a relatively weak team, and vice versa.

My draft formula for the common part of RB-abs is:

f1

b) The tank-specific part

This part is based on several battle statistics, such as damage dealt, potential damage received, enemies spotted and assisted damage. However, different kinds of tanks give different weights to these factors. Here we do not categorize tanks in the narrow sense of light/medium/heavy tanks, TDs and SPGs, for there are many crossover tanks in the game, such as the Crusader (which acts more like a medium but is categorized as a light tank) and the Bat Chatillon 25t (generally both a scout and a damage dealer, unlike many other mediums). Instead, weight schemas are defined on a per-tank basis, according to each tank’s characteristics.

The factors are:

Enemy spotted (S): the number of enemies you spotted (for the first time they are detected) is an important factor for scouts. This factor is heavily weighted for most light and medium tanks, less for others.

Damage dealt (D): an important factor for almost all kinds of tanks, especially for damage dealers. Weighted heavily for TDs and SPGs, less for heavies and mediums. Combat light tanks (such as the AMX 13 90 and WZ-132) also have a relatively high weight on this factor, whereas pure scouts (e.g. the M24 Chaffee) have a minimal weight on it.

Assisted damage dealt by spotting enemy (Aspot): Very important for scouts. Heavily weighted for most light and medium tanks, less for others.

Assisted damage dealt by detracking (Adetrack): A minor factor for all tanks.

Potential damage received (H): Important for tankers, mostly heavy tanks and heavily armored TDs. Less weighted for lighter heavies (such as the Tiger and the AMX 50 series) and tough mediums (such as the E-50 and T-54), minimal for other types of tanks.

Capture points and defense points (C and F): Equally important for all tanks. These two factors are very important for the team, with defense points being even more important.

While recording and calculating the damage-related factors, the RT and tier (T) of the related target must be taken into account. The related target is, for example, the tank you hit, or the source of the potential damage you received. The factor score is multiplied by the ratio of his (RT × T) value to yours. The higher the target’s RT and T, the higher the score you get from this factor. This is to encourage you to fight high-value targets rather than kicking noobs around, which is in your team’s interest.
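As a rough illustration of the target scaling described above, here is a minimal Python sketch. The function name and arguments are hypothetical and not part of the proposal; it only assumes that a raw factor score is multiplied by the ratio of the target's (RT × T) to your own.

```python
def scaled_factor_score(raw_score, target_rt, target_tier, own_rt, own_tier):
    """Scale a damage-related score by the target's (RT x T) relative to your own.

    Hypothetical helper: the proposal only states that the ratio of the two
    (RT x T) values is multiplied onto the factor score.
    """
    own_value = own_rt * own_tier
    if own_value <= 0:
        raise ValueError("own RT x tier must be positive")
    return raw_score * (target_rt * target_tier) / own_value


# 300 damage dealt by an RT-100 player in a tier 8 tank:
print(scaled_factor_score(300, target_rt=150, target_tier=10, own_rt=100, own_tier=8))  # 562.5
print(scaled_factor_score(300, target_rt=80, target_tier=6, own_rt=100, own_tier=8))    # 180.0
```

Under this assumption, the same 300 damage is worth about three times as much against a strong tier 10 opponent as against a weak tier 6 one.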

The formula for this part might look like:

f2

Where Tbattle is the tier of battle (1~12);

ks is the basic score of each enemy spotted, for example, 20;

WC and WF are weight factors for capture points and defense points, respectively. They are the same for all tanks; their values might be, for example, 0.5 and 1.5 respectively.

For instance, the weights for the Bat Chatillon 25t might be (the sum of the weights does not necessarily have to be 1.0):

f3

And the weights for E-100 might be:

f4

c) Battle result adjustment

The battle result, which accumulated over many battles is literally the win rate, is a global factor in RB-abs. Although Edrard did not agree that win rate is a valuable factor, in my opinion everyone on the team is responsible for the battle result and, conversely, the battle result reflects in its own way how a player performed. Furthermore, according to Design Principle 1, our rating system should encourage players to treat the interest of the team as the most important thing, that is, to win.

The final RB-abs value thus will be:

f5

Where kresult is the battle result influence value, for example, it could be 1.0 on a win, 0.75 on a draw and 0.5 on a loss;

Wcommon and Wtank are weight factors. Their values, for example, could be 0.3 and 0.7 respectively.
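The combination described in this section can be sketched in Python as follows. This is only one reading of the text: it assumes that kresult multiplies the weighted sum of the two parts, and it uses the example constants quoted above. The function and constant names are made up for illustration.

```python
# Example constants taken from the text; the real values are left open.
K_RESULT = {"win": 1.0, "draw": 0.75, "loss": 0.5}
W_COMMON = 0.3
W_TANK = 0.7


def rb_abs(common_part, tank_part, result):
    """Weighted sum of the two parts, scaled by the battle result.

    Assumption: the battle result adjustment is applied as a multiplier.
    """
    weighted_sum = W_COMMON * common_part + W_TANK * tank_part
    return K_RESULT[result] * weighted_sum


# The same performance scores half as much on a loss as on a win.
print(rb_abs(1000, 2000, "win"))
print(rb_abs(1000, 2000, "loss"))
```

With a multiplier like this, a strong individual game in a lost battle can still outscore a weak game in a won one, while winning remains the single largest lever.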

2. The Rating per Tank (RT) value

A player’s RT value for a tank is his time-aware average RB-abs value in this tank, normalized against the distribution over all players.

Simply speaking, the RT value should be within an interval of 0 to 200 (or 0 to 1000 or any other interval, it is not important). Most players have an RT value near 100. First-class players might reach 190, but virtually no one could reach 200.
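One possible way to obtain such a scale is sketched below in Python. It is an assumption, not the author's formula: each player's average RB-abs on a tank is converted to a z-score over all players of that tank, mapped to 100 plus 30 points per standard deviation (so three standard deviations above the mean gives 190), and clipped to 0..200. All names and the value 30 are hypothetical.

```python
from statistics import mean, pstdev


def rt_values(avg_rb_abs_by_player, center=100.0, points_per_sigma=30.0):
    """Map each player's average RB-abs on one tank onto a 0..200 scale."""
    values = list(avg_rb_abs_by_player.values())
    mu = mean(values)
    sigma = pstdev(values)
    result = {}
    for player, value in avg_rb_abs_by_player.items():
        # If everyone has the same average, everyone sits at the center.
        z = 0.0 if sigma == 0 else (value - mu) / sigma
        result[player] = max(0.0, min(200.0, center + points_per_sigma * z))
    return result


ratings = rt_values({"alice": 1000, "bob": 1500, "carol": 2000})
print(ratings["bob"])  # 100.0, exactly average
```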

The term “time-aware” means that RB-abs values from recent battles weigh more than older ones. This is discussed in Design Principle 2c.
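A time-aware average could, for example, be an exponentially decayed mean. The Python sketch below is only one option under that assumption; the decay scheme and the parameter name `decay` are not from the proposal.

```python
def time_aware_average(rb_abs_values, decay=0.98):
    """Average RB-abs values (ordered oldest to newest), favouring recent ones.

    Each battle weighs `decay` times as much as the battle played after it.
    """
    if not rb_abs_values:
        raise ValueError("need at least one battle")
    n = len(rb_abs_values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_total = sum(w * v for w, v in zip(weights, rb_abs_values))
    return weighted_total / sum(weights)


# A player who improved recently rates above his plain average of 1500.
print(time_aware_average([1000, 1000, 2000, 2000], decay=0.5))
```

A `decay` close to 1.0 behaves almost like a plain average, while a small value makes the rating react quickly to the latest battles.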

Most popular PR systems give absolute rating values, which do not take the ratings of other players into account, making it difficult to tell how a player is performing amongst all players. As mentioned in Design Principle 2a, a normalized value helps followers understand this much faster and more easily.

In practice, the RT value might be very inaccurate on a newly bought tank. In that case we could use the personal rating (PR, which is discussed in the following section) instead, until a certain number of battles (say, 20) have been played in this tank.

Another practical problem is that the RT is a distribution-based value, so if one player’s RB-abs changes, all players’ RT values change with it. It is virtually impossible for the server to calculate all players’ RT on the fly. This can, however, be resolved by applying a periodic update mechanism, which will not be discussed in this topic.

3. The Personal Rating (PR) value

The PR value is simply the time-aware and tier-weighted average of all of a player’s RB-abs values. The weight is related to the tier of the tank, so playing at low tiers too much will result in a low PR value; playing at higher tiers is encouraged. The PR value is mostly used to show oneself off; in some cases, it is used to temporarily replace the RT value, as discussed in the section above.
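As a sketch of how such an average might be computed, the Python snippet below weights each battle by the tank's tier and by an exponential recency factor. Both choices (weight proportional to tier, exponential decay) are assumptions made for illustration, as are all names.

```python
def personal_rating(battles, decay=0.98):
    """Time-aware, tier-weighted average of RB-abs values.

    `battles` is a list of (rb_abs, tier) pairs ordered oldest to newest.
    """
    if not battles:
        raise ValueError("need at least one battle")
    n = len(battles)
    numerator = 0.0
    denominator = 0.0
    for i, (rb, tier) in enumerate(battles):
        weight = tier * decay ** (n - 1 - i)  # higher tier and newer count more
        numerator += weight * rb
        denominator += weight
    return numerator / denominator


# With no decay, a tier 8 battle counts four times as much as a tier 2 battle.
print(personal_rating([(1000, 2), (2000, 8)], decay=1.0))  # 1800.0
```

Note that in a weighted average like this, low-tier play only lowers the result if low-tier battles also yield lower RB-abs values, which is plausible because raw experience tends to grow with tier.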

Conclusion

In this topic, I have demonstrated my ideal design of a personal rating system for World of Tanks. The system is complex: many factors are considered while calculating the rating value, from common battle statistics to value distribution. But the purpose is simple: it tries to measure how a player is performing in the game. It is virtually impossible for a third-party developer to implement this system; the official developers might be able to, but there is little chance that they will. However, this does not mean that the system makes no sense. I hope its design will help and inspire the future design and development of rating systems, perhaps even the official one. Anyway, thanks for reading.

124 thoughts on “Personal rating proposal”

  1. Looking at some posts about how “hard” is the matchmaking to understand from the point of view of some WG.net employees and the fact that they hate restricted matchmaking, I am surprised they actually made a rating system but that is not my point, my point is they are so bad at coding a good matchmaking system, how can they actually code a better rating system?

    • If I wasn’t too clear, all I am trying to say is just look at the competition: War Thunder in arcade battles, even though it has 20 tiers and fewer players than WoT, has still implemented a 2 tier (0-1, 2-3, 4-5 … etc.) spread matchmaking instead of a 4 tier spread like we have here (-2/+2; even WoWp has a 4 tier spread, which is bad in a game where the planes have HP and each +2 tiers has double the amount of DPS of -2 tiers, and I’m not even gonna start on tier 10 TDs vs t8 tanks), and I do not care if once upon a time there was a -4/+4 tier spread, for Pete’s sake there are constantly 100k players on each European server at “rush” hour, don’t tell me there are not enough players for a better matchmaking system now.

      • I believe the reason for the +2/-2 tier spread is not ineptitude, but greed.

        It is much easier to have guys feel invincible when having to fight lower tier enemies, and thus gets them hooked easier, ergo more cash for WG.

        • I don’t think that’s really true. You can do really much with a -2 tier tank in any fight. It’s true, that if you are in the wrong spot at the wrong time, then you are dead no matter what you do, but that’s the thing of it: not to be there, finding how to be at the right spot!

          It’s not easy, but as I’m advancing more and more in the game I find myself a lot of times enjoying this kind of challenge more than playing with equal tiers. Here every bad move you make might be your last, you have to be the little invisible mosquito to slowly drain the health and nerves of the enemy, always aiming for the weak spots.
          It’s true also that this can be done more easily with medium and light tanks, but a heavy can also do this, you just maybe have to have a bit more patience.

          PS: every player is able to afford a few gold shells in their guns, then +2 tier armor should not be a problem anyways.

          I think this is a great system, it would be more boring with +/-1 spread.
          (now let the rage flame begin)

          • Have fun in a tier 8 Heavy “scouting” for your team or be in a better “position” vs 4 tier 10 TDs and 4 tier 9 TDs, ’cause I’m 100% sure that’s why you like playing 4 tier spread Matchmaking.

            • Why do you try to scout with a heavy? It’s not meant for that.

              The thing you search is patience.

              (true: you will need a minimal amount of teamwork from your teammates which is not always present, but that’s true with equal tiers random games too)

            • When there happens to be no low tier lights and just a few mediums, somebody has to pick up the slack. And don’t count on t9 meds, more often than not they’ll be practicing the patience thing along the heavy tanks.

            • *shrug* And then in the next match you have equal odds to be the top dog in a 6-to-8 tier spread. Cry some more, whiner.

      • I dont get it please explain.

        In WT as a T2-3 I can see T0-1 and T4-5 …
        Where is the difference?

        • Every T8 heavy has a gun capable of damaging every T10 tanks although some are tough ofc…its just a matter of reading the setup of the Teams and then acting with that in mind…the +-2 Tierspread is good enough, and good player will always pawn :) On top you get much more xp out of a T10 game with a T8 tank if you do dmg…

          greets
          Rich

          • All tier 8 tanks disagree with the statement vs MAUS. My T 28 proto was the only one who could penetrate him one in 2 shots from the side, and my T 32, T 69 and Pershing did not have a clue even from the back without gold shells (the T 32 managed to punch through into an unangled back once).

            • Armor is pretty much the only thing the Mouse *has* to recommend itself, so it’s kind of a given it should be a tough nut to crack. Flank and load prem shells. Can’t? Then the other team is doing something better than yours.

            • This can’t be true, I can pen the back with the T20, and it has less pen than any of the tanks you said (160mm).

  2. This is one of the first people, who when creating a PR system/formula, actually realises upfront that it cannot truly mean anything, as I have been stating on the Wot forum for months and getting negs for..

    As he states, major, and I mean, MAJOR, data is not available for public use, spotting being a prime example..
    ..and the fact that different tanks require ENTIRELY different skills, some people excel at scouting, some heavy brawling, some REMFs (I mean artillery..), and cannot play other tanks that well…

    A single, overall, PR system will never be correct.

    btw… his English shames most native English speakers, especially considering the nature of the discussion..

    PS Donkey says “bite the pillow, I’m going in dry….!

    • Donkey was a bitch. Forced me to read the whole article.

      Some really good thoughts. Would love to see this system in practice.

    • This guy went way too far on a road towards increased complexity of the ranking. Why? Because adjusting multiple weights for all tanks means this ranking will never be better than the weights assigned, and with so many factors taken out of thin air – it is useless.

      Equally dangerous is the fact that such a ranking assumes one optimal playstyle for every tank – which is quite contrary to what WoT should be about – use the tank the way YOU find it useful, as long as it works.

      There will never be an agreement about which factor is important and which is not, and how many base capture points are equal to 1000 damage done. That’s why 1 number will never be efficient.

      Basically, this idea, if it would ever be possible to implement, would lead to the magic number that would be praised by those who have it high and regarded stupid by those who have it low.
      Sounds like every player ranking so far :D

      There is an upside to this proposition – using the numbers compared to the average in the playerbase not some stiff factors, however his suggestion goes so far that it is a bit redundant.

      And to top that off – going into who you shoot in every battle is very complicated, while I think it averages out a lot with a large enough number of battles.

      What I would take from his proposition?
      I would merge the chart graphs like this circles
      http://www.vbaddict.net/player/phalynx-eu-0af79b837139a524d1df5b2289bcc213
      with his idea of using not absolute values but rather how you look compared to the others.

      The best way to visually represent your skill is not with some kind of an absolute number, but with a number from 0 to 100, with 0 meaning you are the worst on a server, 100 you are the best, 75.33 meaning that 75.33% of players are worse than you and so on.
      Downside of this is you need to know all the results on any given statistics, upside is, that you really know how good are your scores.

      So:
      For each possible statistic (WR, damage, damage upon spotting, you name it) lets do:
      …… For each tank lets do:
      ………….. check how many players have on average a lower appropriate score on a given tank

      and then for each statistic use the average value, maybe with some smart weights to exclude or reduce the weight of tanks you haven’t played much.

      Then the result you can show as a graph or you can compile one easy to grasp number from it according to a more or less flawed formula.

      That’s what I would be looking for.
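The loop outlined in this comment could look roughly like the Python sketch below. It is a hedged illustration only: the data layout, the function names and the choice of battle counts as weights are all assumptions, and it handles a single statistic.

```python
from bisect import bisect_left


def percentile_rank(all_scores, player_score):
    """Percentage of players whose score is strictly lower than player_score."""
    ordered = sorted(all_scores)
    return 100.0 * bisect_left(ordered, player_score) / len(ordered)


def combined_percentile(per_tank):
    """Battle-weighted mean of per-tank percentiles for one statistic.

    `per_tank` holds (all_scores, player_score, battles_played) per tank, so
    tanks the player has barely played contribute little.
    """
    total_battles = sum(battles for _, _, battles in per_tank)
    weighted = sum(
        percentile_rank(scores, score) * battles
        for scores, score, battles in per_tank
    )
    return weighted / total_battles


print(percentile_rank([10, 20, 30, 40], 30))  # 50.0: half of the players are lower
```

Repeating this for each statistic (win rate, damage, spotting damage and so on) would give the per-statistic values that the comment suggests showing as a graph.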

      on a related side note:
      XWN (WN6 transferred to 0-100) would be way better if it was not translated as WN6 of A = 0, WN6 of B = 100, with linear dependence in between, but instead was a function that works in a way that XWN = c means c% of players have a lower WN6 than this player.

      • Denier has no shits to give to the XVM fanboy, so he continues to play for fun and relaxing and not some numbers.
        (still oneshotting u with T92 :D )

        • “play 4 fun and relaxing” players is the worst kind of players in this game…if you want to fucking relax then go and play minesweeper or solitaire and not ruin the games for people who actually care to win

      • dummie probably thinks XVM is the reason why he’s shit…even if you somehow made all the “XVM fanboys” to read the article that wouldn’t prove your point….article is about even more complex rating system(advanced XVM) and no matter what parts of gameplay it uses in its calculations you would still suck measured by it….better go and practice in training rooms instead of crying about XVM fanboys rustling your jimmies

          • don’t worry…some day someone will make XVM for retards where red will be good and purple will be bad….then you can brag around how awesome you are…until then practice and only practice

    • Yep, that and several thousand years of statesmanship, war doctrine and awareness of their history, vast land and own strengths. ANY nation on earth, I’d even dare say a pact of a few strong ones too, could not hope to win against China. In a dick swinging contest, size does matter.

      • Several thousand years of statesmanship … this doesn’t seem to be helping the Italians too much at the present. Or the Greeks.

        Awareness of history … the same. What happened when Europeans came knocking at the doors ? Civilised, clean and sophisticated Chinese had to watch grubby, constantly warmongering Europeans sail within gun range of the Imperial Palace and force upon them a not so beneficial “trade treaty”.

        War doctrine … they used gunpowder as fireworks for crying out loud !!!

        Own strength supposedly means huge population ? Human wave tactics such as in the Korean War ? Send a platoon to take hill XZ. If in half an hour we don’t get news that they succeeded presume dead and send another platoon. Continue until we know that hill XZ is taken or we run out of platoons.

        On the other hand it would be hands down crazy for anyone to want to attack the place where all of our shit is made.

        Also, military dick swinging contest in today’s world means local confrontations of limited scale with huge technological input and limited outright destruction. The end or continuation of further prestige damaging confrontations is used then as a bargaining chip in UN mediated negotiations. No superpower is ready to confront another with high tech toys because they are too well tied and risk international ridicule if one’s superweapons are defeated by the other’s.

        • So, the fact that those things you stated arent helping others mean they also are not in the mind of Chinese leadership and military, and not effective? False argument.

          1. The fact that the Chinese political clique is STILL controlling a billion+ population with a system that failed in almost every other place shows they do know how to use the statesmanship and tradition they have inherited. There may have been idiots (Mao’s Great Leap Forward anyone?) but the current ones are anything but. They learn from their mistakes. Fast.

          2. The Europeans may have dominated once, but as you will probably live to see, the Chinese have learned, and definitely did not forget. I predict, and some politicians and a lot of economists agree, that the world of 30 years from now will be dominated by China. History works for those who remember it even if its negative for them. As an example of not letting it happen again.

          3. War doctrine is not war technology, so please learn to use a dictionary. Sun Tzu’s principles are equally applicable in cyber warfare as they are in the field, and the tools (gunpowder, APCR ammo, ICBMs, computer viruses…) are just that, TOOLS to implement a doctrine. And guess who currently has control of their own cyberspace, probably the biggest army of hackers and the means to quickly adapt to tech changes in the IT world? Its not hard, it starts with C.

          4. Your outdated concept is definitely not what I had in mind, but feel free to assume. It does make an ass out of you but not me. The fact that China has billion+ people also means that it has the option to SELECT among a billion+ individuals, and find experts. Not to mention the fact that you yourself admitted their economic power. Guess who can make the most weaponry within the least amount of time? IF they wanted to, that is…

          On one thing we agree, there will not likely be a war, not even a small scale conflict you describe, involving China. There will be a slow, inevitable advance of the Chinese by subtle, economic, political means. Until they get what they want. And IF they ever go to war, the results of that conflict will be a done deal.
          What i

          • You funny guy. Also calm down and stop with the ad hominem already, I didn’t attack you, I just put a commentary under your claim.
            I still do not see how millennia of statesmanship have anything to do with the world today, and I gave two examples that it doesn’t. Are Italy and Greece the exception to the rule, or is China the exception to the contrary rule ? Debatable. That is why I say that a tradition in statesmanship doesn’t guarantee anything.

            PRC has been around for what, 64 years ? The USSR lasted ~69 years. The DPRK for 67 years. What makes China special here ? The 1.35 billion people ? Why ? Controlling 200 million and controlling 1 billion is so different do you think ? Is there a critical mass for population in these cases ? I don’t agree with your argument.

            War doctrine involves the proper development and use of technology. I don’t need any dictionary thank you very much. The fact that the Europeans saw a bloody application for black powder and applied it in mass before anyone else is a testament to the constant state of warfare that has been present on this continent for almost the last thousand years. In that time China was kind of enjoying peace and prosperity. What is so bad about that ? Also, to say that the principles of Sun Tzu are unique is silly. Alexander the Great used the same. The Romans used the same. Napoleon too. By the way, ever heard of a guy called von Clausewitz ? Nevermind then.
            The Iraqis had the principles in mind very well in ’91, that didn’t stop the bombs from falling. The Germans had a very good grasp of the theory in ’44, that wouldn’t resupply their troops. Theory in itself is fine, but you better back it up with the practical tech or it’s useless.

            With the rest we may be pretty much in agreement, but for one thing. These are all speculations. Hitler also did an economic miracle for Germany before the war. It mostly meant to borrow money he didn’t plan on giving back. Nobody argues that the leadership in China might know their trade but you have to be realistic about these things. Time will tell.
            Secondly, a large population means you can surely gather tons of experts on anything, what it doesn’t give you back is time. When someone has a stealth fighter they’ve been developing for 40+ years, you can’t say that with triple the amount of experts you will equal their tech in 10. Given enough time all armies might achieve technology parity but I was talking about the short to mid future when, despite its size, the PLA doesn’t really have quantities of high tech stuff.
            Thirdly, don’t forget to relax sometimes and don’t take things so seriously. Not everyone you meet on the Internet is a moron.

            • Just a correction, but the Chinese most assuredly wasted no time finding military applications for gunpowder. Weaponised rocketry, bombs, fire-lances which eventually evolved into guns small and large, you name it. Where they fell short was *further development* of such applications; by the time the Europeans were huffing and puffing the front door down (with artillery) the Chinese gunpowder weaponry was still essentially the same as it had been around the 1600s, and their military methodology and organisation may actually have been *worse*.

            • Well, on some issues we could debate for a while, and the Internet is most definitely not a place best suited for it… Yes, Herr Carl Von Clausewitz is on my bookshelf, thank you very much, I guess this ad hominem is deserved as I started :)
              .
              Napoleon did not win by using Sun Tzu-like principles per se, as most sound generals discovered those on their own, as you correctly pointed out. The novelty was that he created the first semi-autonomous army detachment of corps size with all parts of a huge army on a smaller scale (infantry, cavalry, artillery), and controlled several of those with a headquarters administrative machine, aides-de-camp and his megalomaniac energy (look up Martin van Creveld, Command in War). That won him wars. His megalomania lost them, as with most of the dictators in history.
              As for the Romans, I could go on for a while longer on the factors influencing their development and downfall (consul system, winning by attrition etc. etc.), but there is no point, is there, as you probably know most of it…

              On the doctrine, and issues of development:
              the doctrine advantage works when all sides have similar level of tech, and the tech gets developed as per the needs of the times. China’s isolation from the hotpot of European small kingdoms and principalities (their constant warring supporting rapid war tech development) and the vast areas allowing time to react to Asian invaders meant they were sorely unprepared for the Europeans, true. Yet they are still here and were never a real colony, that must mean something. Some other great nations colonized by the same warmongering Europeans (India anyone?) have managed to spring back only after they left.

              There are certain very unique qualities tied to China that make it an exception to rules of downfall of other great nations, and the reasons why it never became a world superpower in history before. Their treasure fleets were on the road to do that once, but as Kellomies points out below, the insanely reactionary leadership stopped what could have become a world dominance practice.

      • “Several thousand years of statesmanship” really amounted to a somewhat regular unification-affluence-collapse-interregnum cycle you know, as well as a rigidly reactionary ruling-class mindset that flat out failed to keep up with the times and duly in part contributed to the eventual final demise of both itself and the Empire. And “war doctrine”? Bitchplease, we’re talking about a land that got fairly regularly overrun by steppe nomads and could mainly handle the Japanese (on at least two occasions) mainly by relying on a truly overwhelming resource base because in terms of military ability it sure wasn’t cutting it.

        ‘Sides, Egypt and Mesopotamia have even stronger claims to such and fat lot of good is that doing them.

  3. All too complicated when all that matters is win rate but taking out the effects of using OP tanks, stock tanks, stronk platoons, seal clubbing, TCs and CW (though I doubt those matter in all but a few cases).

    • Rating should only take account of solo random games and have a separate rating for platoons and companies. The difference between a players solo rating and the platoon rating would be very revealing.

      • Yeah, all that you would need to have 1 number to wave your dick over would be win ratio in solo random battles.
        Nothing else really matters.

        Of course we can get a bit complicated and let’s say exclude first 50 battles on any tank.

        Or go even further than that and compare your win ratio on any tank with general win ratio of that tank to minimise the OPdness of tanks you play.
        But that’s a detail.
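
        A minimal sketch of those two refinements, assuming you have a chronological win/loss list for one tank and the server-wide win ratio for that tank. The names `adjusted_win_ratio` and `tank_average` are made up for illustration and do not come from any WG API:

```python
def adjusted_win_ratio(results, tank_average, skip_first=50):
    """Win ratio on one tank relative to that tank's general win ratio.

    results      -- chronological list of True/False for solo random battles
    tank_average -- general win ratio of the tank (assumed to be known)
    skip_first   -- battles ignored at the start (stock grind, learning)
    """
    counted = results[skip_first:]
    if not counted:
        return None  # not enough battles on this tank yet
    player_ratio = sum(counted) / len(counted)
    # Above 1.0: the player wins more often than the tank's average driver.
    return player_ratio / tank_average


# Hypothetical player: 50 learning losses, then 60 wins and 40 losses.
history = [False] * 50 + [True] * 60 + [False] * 40
print(adjusted_win_ratio(history, tank_average=0.50))
```

        Dividing by the tank's own average is what takes out the "OPdness": a 60% win ratio counts for less on a tank that averages 55% than on one that averages 47%.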

  4. I really like the article, and the idea behind it. A deep bow to the author for managing to use borderline comprehensible terminology in a non-native language while dealing with arcane design principles and mathematics formulas, and making it all somewhat interesting while proving his point.

    A lone objection, though:

    “Assisted damage dealt by detracking (Adetrack): A minor factor for all tanks.”

    - Completely wrong approach. This should be weighted at least the same as spotting dmg is, as a skilled medium or scout player playing as a low tier can and will turn the tide of battle by tracking a high tier enemy and allowing his teammates to kill him/her more easily.

    • Tracking can be very important in some situations, while in some other situations it has basically 0 meaning. Yet we get the same raw number from each of these situations, so you’d have to think about the general average meaningfulness of tracking and weight it with that value. Also I feel this should have more weight in medium style tanks than in high alpha tank destroyers.

      In the end, not knowing and not being able to weight the statistical numbers is and will be the major downside of any rating done by machine. If you want great rating… you need real people with good knowledge of the issue (the game here) to do the rating (and even that is not perfect) – computer will always fall short.

      • in that matter tracking assisted damage is similar to cap points.

        If you sneak behind enemy lines, play hide and seek on enemy cap and get 100 cap points while your team loses 3:13, those cap points are important.
        If you stood 1 minute in cap circle when the team was looking for a last camping enemy those points are worthless.

        In the same manner, if you detracked an enemy scout passing by on his way to your 3 arties – this detrack is worth more than damaging this scout.
        On the other hand, if there is slow enemy tank in a middle of an empty field with 9 tanks aiming at him, detracking him doesn’t matter.

        Its worth is situational and does not depend on what tank you are playing at the moment.

        • Ehh, I’d say even if there is a heavy in an open field and three of your friendlies are behind him, it is still valuable to track him. It gives your friends a better chance to aim, stops him from rotating (and thus getting his gun on target as quickly) and basically ruins his chances of doing anything to stop your buddies from destroying him.

          The moments where tracking isn’t as useful are usually one-on-one moments where you need to do damage, not immobilise the enemy, but that point is moot as you’d get no tracking assist damage alone anyway.

  5. Very interesting read. Thx for sharing.
    Due to its complexity I doubt that anything like that will be implemented though.

    The complexity leads to accuracy but is hard to implement correctly.
    Imo the tracking damage should get a major influence on the rating though, as tracking generally is a very effective way to kill targets. Either keep em tracked and finish them off in a 1on1, or track them and doom them to die in a bulletstorm of your allies.
    It’s often way more effective to track a target so that the team can finish it instead of shooting at the target for 1x your alpha. Imo give it the weight of spotting damage.

  6. Too much time is spent by players worrying about stats and crap these days. It’s meant to be a game you play for fun! I really do think the game and community in general would be much better if they’d just get rid of all in-game stats completely. There would be less camping, less abuse, no-one suiciding at the start because they got a crap team etc.

    • Well… maybe pro gamers and WN6 grinders would play more freely and more “nicely”, but I think average and below average players would play even more dumb; only the worst players, who don’t get the game altogether, wouldn’t be affected.

      Personally I think it would be a bad idea, but I could live with that.

  7. World Of Maths.

    We are assuming that the skill parameter consists of one scalar value which interacts with the battle set-up to give an expectation of a win or loss. What if skill is a two (or more) dimensional? Like shooting ability vs positioning etc.

    Also, the arguments against winrate seem to be the particular exceptions which aren’t very hard to detect on their own. Why not start the rating with a winrate and then adjust that to give a more accurate image of the player. In any case, winrate is the best picture of the win expectation without taking the tank choice into account. (Before a player clicks Battle!)

    Instead we are looking at a bunch of garbage stats like spotted and shot accuracy. Even damage done is a bad stat inherently. Imagine a player of constant skill playing the KV-1S (a consistent scrub, if you prefer). He did X damage on average and that was staying roughly the same and led to a skill rating of Y. Then the patch comes in that introduces the TOG II*. The average hp pool of the enemy teams has increased somewhat, which has raised the damage that this player does. He has stayed at the same skill level (still playing the KV-1S), but the rating has increased.

    • You are implying that suddenly everybody and their mother who plays World of Tanks now drives a TOG and TOGs with 1400hp are the only thing that appears on the enemy team. If that was the case then you would actually make sense, otherwise no.

  8. I appreciate the amount of time you have invested in formulating this. The thing is, you missed out the ‘kills’ value, which is important as well.
    You took damage into consideration, for example, which is a very strong indicator, but if you deal 1990 damage to a target with 2000 HP, that means you did not kill it.

    For example, let’s take a case where one player from each team is left alive. Both of them are at full HP with 2000 points. The one who deals 2000 and kills the other, who dealt 1990 damage, will have only a marginally higher PR score. That’s why it’s important to measure the kill as well: who actually brought the victory.

      • So which is worse?

        1) The player who won’t shoot the last 3 HP off a tank because he won’t get enough damage for his stats?

        2) The player who stalks others to steal kills?

        3) A game that seems *designed* to have the penultimate shot come up just a few HP short of a kill?

        Seriously… kills are very important. Taking guns off the field wins games. It would be trivial to put in a negative adjustment factor for someone who had high kills and low damage.

    • This might be however covered by the match outcome parameter. If you get the kill, a win is more likely and that will be reflected in the final PR. In your example the difference in PR won’t be marginal because the one that did 2000 dmg will also benefit from the victory multiplier and the other will not.

      • kills have some meaning, usually killing a 3 hp target is better for a team than dealing 900 damage to a full hp target (not always, if there are 5 tanks already trying to finish that 3hp one :D)

        but then we reach a spot “how much damage has equal worth as 1 kill, taking killstealing into an account?”
        And we will never agree on that :D

        For me personally, important stats would be:
        win ratio in solo random battles
        damage upon spotting and damage dealt (half of the damage dealt if it is dealt with someone else’s assistance) with some coefficients

        In WoT spotting a target is MORE important than being the one who damages it. Why? Because spotting a target is more difficult and dangerous than sniping from far away.
        So if we have 2 guys, 1 spotting, 1 sniping, their contribution to the team is equal, but the spotter will die faster and the sniper will be able to do some more damage.

        In the same way, for me personally, when I compare a player with high survival ratio and one with low survival ratio, I have in mind that the low SR player did all his damage, spotted and influenced the outcome of the battle in a shorter time and under more difficult conditions, while the high SR player scored part of his kills and damage while mopping up enemy leftovers.

        But the relative importance of each factor is impossible to measure objectively.

        • So true! Spotting is vital! That’s why it is hard, and that’s why they nerfed t-50-2 to shit, and that’s why there are very few good scout players.

  9. meh, it uses cap points, which have been proven to have very little correlation with winrate, also, from what I’ve seen wn8 will be a fking masterpiece.

  10. “Everyone begins as a noob. What you care is your opponent’s current skill level, his history makes no sense to you.”
    Totally agree! That’s why I always look at the recent stats on Noobmeter (not only because my recent stats are quite good and my overall is slightly above-average :D )

    • As I don’t have time atm I just scanned through the mathematics, but at first glance it looks like a well thought-out work.

  11. @Hillin:

    Copy paste much??? The term “PR” (performance rating) is already used by Mr. Noobmeter for his rating. Why am I not surprised that you are from China?

    And reading only your introduction makes the impression, you also read some WN-Threads

    crawl back behind your big wall!

    • Derp did you even read what PR meant in his article? And reading the introduction somehow tells you what the rest of the article is about?

      You are obviously not a very intelligent person

      • Derp die derp …

        if someone doesnt even understand that avg XP (without prem bonus) is useless in its current form, I dont have to go further into the rating.

        why is it useless?

        - it includes team and victory bonus (remember u want to measure a single player not a team)
        - the formula is more or less a black box, and what little is known doesnt fill me with confidence (close combat bonus as example)

        • “a) Employ benchmark values based on statistics, rather than by experience; use relative values rather than absolute ones.” Clearly you didn’t bother to try to understand. I know not everybody has time to read through this, but accusing the author of copy pasting and telling him to crawl back behind the wall is a bit too much. Because I would love to see you come up with something better.

        • He already mentions that this PR concept is not possible to implement by third party sources, but by WG themselves because to them the black box is not black … or closed. They can extract individual contributions like damage upon spotting and damage upon a detracked target. RTFA !

          • What happens when WG implements a rating we saw a few weeks ago … fail, because they want to massage the ego of every noob, as long as he plays and pays much.

      • I read enough to understand that the author of this, doesnt understand how the game works.

        “1. The PR value does not simply reflect the personal ability of a player. The contribution to the team is more important.

        It is easy to understand, because World of Tanks is a team-based game. Personal ability, such as damaged dealt, are important, however, a large-scale applied PR system will affect the battle behavior of players, and eventually the global atmosphere of the game in turn (e.g. IMO the official PR system actually encourages players to play more negatively). As for a team-based game, the PR system should encourage people to contribute to the team as much as possible.”

        This isnt 15vs15, but 1vs 29 in most cases. Words like “contribution to the team” and such usually come from muppets on the NA/EU forum, who will never understand, that you dont protect arti by parking your tank 50m in front of it.

        “b) In a single battle, Rating per Tank (RT) of a player makes more sense than his overall PR value.”

        And what do you do, if the number of battles in a given tank is below 500, which already is pretty low??? Per tank ratings are of very limited value, because most players dont play enough games in them.

        Example: a good player with a WN1800 plays a new tank and had a few bad games in it – low rating. That wont change the fact, that this player will direct his front to the enemy if possible.

        Thats why overall ratings make much more sense. As long as u deal with a small sample size, per tank rating is BS.

        “c) The PR value should represent the recent performance of a player, rather than his whole history.

        Everyone begins as a noob. What you care is your opponent’s current skill level, his history makes no sense to you.”

        Well not always. Imo a player with 1400 WN7 after 5k games is more dangerous than a player with 1400WN7 after 10K games.

        “The average RT of your teammates (RT-teammate) and opponents (RT-opponent) are balance factors. The higher your enemies’ average RT and the lower your teammates’ average RT are, the higher rating point you will get from this battle. That is, you get bonus when you are fighting a strong enemy, or leading a relatively weak team, vice versa.”

        evens out over a larger number of battles. Sometime you fight morons and sometimes you fight unicums. So why include in the first place, everyone faces the same kind of opponents over a larger period of time?!

        “Enemy spotted (S): the number of enemies you spotted (for the first time they are detected) is an important factor for scouts. This factor is heavily weighted for most light and medium tanks, less for others.”

        It isnt the number of enemies u spot first thats important … fucking retarded suicide scouts. And the author really displays lacking understanding of the game here … .

        next paragraph Damage… Chaffee pure scout … lol … what kind of BS is that. It was the best damage dealer among t5 scouts, when it still had competition there (VK2801, T50-2).
        Damage should be compared to other players who played the same tank.

        “Potential damage received (H): Important for tankers, mostly heavy tanks and heavy-armored TDs. Less weighted for lighter heavies (such as Tiger and AMX 50 series) and tough mediums (such as E-50 and T-54), minimum for other type of tanks.”

        Sadly for most battles it’s only an indicator for the number of noobs in the enemy team. Sure it takes skill to position your tank in a way that makes it harder to penetrate, but that amount of skill (if u want to give it a number) gets lost in noise (inability of the avg tanking noob). The same principle as with cap points, most of them are totally useless and have nothing to do with skill. Thats why you dont include them in any rating – they increase the inaccuracy of your rating.

        “While recording and calculating the damage-related factors, the RT and tier (T) of its related target must be taken in count. For example, the target you hit, or the source of potential damage you received. The ratio of his and your (RT × T) value will be multiplied to the factor score. The higher the target’s RT and T is, the more score you will get from this factor. This is to encourage you to fight high-value targets rather than kicking around noobs, which is beneficial to your team’s interest.”

        I’m not sure about that one. But what I’m sure about is that taking out enemy guns is in my and my team’s interest. If I have an enemy IS3 and KV1s shooting at my KingTiger and they are both at full HP, I’ll often (not always) try to kill the KV1s first. Sure, the IS3 in this scenario is the high value target, but taking out guns and/or eyes of the enemy (scouts/meds) will make winning much easier.

        ….

        • Agreed on enemy spotted. That should rank as zero for any kind of rating to eliminate the early suicides, and damage from spotting should be weighed enough to make up for it.

  12. And I think that every “rating” is and will be bullshit. If you are a commander preparing for CW, you assign your players to different roles, on different tanks, and no single number will tell you everything. For that you have the full set of stats. And there is no ultimate best player in everything, you just work with what you have to fit best.

    What would be very valuable is “30 days” stats available on your game profile – or last 100/300 battles if there is not enough data to call for statistical analysis, but for the hardcore players that a commander is interested in, there will be more than enough data from 30 days. Nobody is really interested in how bad you were a year ago or whether your proficiency today has the handbrake of your bad start and learning curve applied.

    All this rating is a marketing tool to steer the community towards certain behaviors – let’s not split hairs and forget about the whole WPR thing. Frank, please start lobbying here on implementing 30 days stats into the game ;) .
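
    One way to read the “30 days, or last 100/300 battles” idea as code. This is only a sketch: the fallback rule, the threshold of 100 and the `(timestamp, won)` record layout are all assumptions, not anything WG provides:

```python
from datetime import datetime, timedelta


def recent_window(battles, now, days=30, min_sample=100):
    """Pick the battles a 'recent stats' figure would be based on.

    battles    -- chronological list of (timestamp, won) tuples (assumed layout)
    days       -- length of the recent period
    min_sample -- assumed minimum for a meaningful sample; if the period
                  holds fewer battles, fall back to the last min_sample ones
    """
    cutoff = now - timedelta(days=days)
    recent = [b for b in battles if b[0] >= cutoff]
    if len(recent) >= min_sample:
        return recent
    return battles[-min_sample:]


def recent_win_rate(battles, now):
    """Win rate over the recent window, or None without any battles."""
    window = recent_window(battles, now)
    if not window:
        return None
    return sum(1 for _, won in window if won) / len(window)


# Hypothetical casual player: one battle per day for 200 days, all wins.
now = datetime(2013, 10, 1)
casual = [(now - timedelta(days=d), True) for d in range(199, -1, -1)]
print(len(recent_window(casual, now)), recent_win_rate(casual, now))
```

    A hardcore player never hits the fallback, which matches the point above: for the people a commander cares about, 30 days is plenty of data.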

    • Everyone (that understands how it works) knows that WG PR is a marketing tool meant to make you play more. Better PR padding is possible through more matches. Which is what WG wants from us, to play play play. As long as it doesn’t “do” anything, such as serve in a possible skill based MM for 7/42, so that its purpose is objectivity to the highest degree technically possible, no WG rating will mean anything.

      • Yes, WPR is bad because of the “marketing” factor – but at the same time it serves a good role in inciting players NOT to make a second, third and tenth account for the purpose of statpadding. That’s what the battle count part is for. That’s kind of protection. At the same time it helps them sell you the product ;) , but nothing is perfect.
        It is bad for comparing players one against another. It is good for boosting your interest into the game, even if you are bad. Some learn faster, others slower – if you can’t see progress you simply quit. Is it good if somebody quits? You won’t have opponents to sealclub…
        You just compare against your former self.

        30days stats would give additional protection – as they give no advantage in 10th account statpadding. They give no incentive to keep your account too ;) , so they can keep WPR in use while providing us more data, please.

    • I don’t know who you play for but my CW team doesn’t work like that (EFE-X). You play the tanks you have available, you’re expected to be proficient in anything you can field and what you’re expected to field is what the clan desires because they work well together.

      However I agree, the WPR system is just a way to get people to play lots of higher tier battles. This means more money for WG. No wonder people skip the mid tiers, which are often far more fun than Tier X.

      • That’s exactly what I’ve said. And the only way to compare and rate your proficiency (besides playing together and seeing what and how you play with) is your recent stats. Not some made up rating, not previous year stats.

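  13. China’s efficiency (XVM) plugin is considerably more accurate and advanced than W7. It also reads data such as the strength comparison between the two teams, spotting damage, weights, potential damage and so on, and the latest version of XVM shows the per-tank strength gain and rating and no longer focuses on the overall score. It also reads only the battle data from recent matches, which reflects a player’s recent performance more objectively.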

  14. I strongly don’t agree with weighting by tank or by tank class. Damage is damage, no matter who did it, and spotting damage is spotting damage, no matter who did it. The inherent tank statistics make sure that equally skilled player in a TD will out-damage an equally skilled player in a light tank of same tier.

  15. It’s all well and good until you realise that until Wargaming publishes the statistics needed or adds them to an API, this guy’s idea is a pipe dream.

    Until this happens WN7 seems to be the most valid system. Efficiency is just too prone to abuse and noobmeter won’t publish their algorithm.

  16. I hope that wargaming will publish a new version of the api. I really want to get the statistics for each tank e.g. player A did 1500 avg damage, 1.3 avg cap points on his VK 3601 and so on. With this it is possible to create a more accurate rating and avoid stat pushing with low tiers.

    • fuck, I’m cringing so bad watching this

      the fake acting and enthusiasm makes it really awkward to watch… it looks like she doesn’t even want to be there, she just wants to go home and do girl things.

      “this tank computer game is boring, so many smelly nerdy men around me, why am I promoting this game, I want to go home”

      • It’s just a Japan thing, you won’t understand this.
        Still, GUP seems even more absurd now, doesn’t it :) ?

  17. Even just glancing over these formulas, I can tell this isn’t going anywhere.

    Why? No exponents, that’s why. Is someone with 4000 avg damage twice as good as someone with 2000? I think not. Rather, I’d say he’s 4x as good, easily. The difference factor needs to be squared. The same applies to cap and decap points. Getting 10 cap points may mean you sat on the cap while the rest of the team killed everything. Getting 100 means you win the game.

    The second reason is even simpler. This rating formula tries to put all aspects of the game in a single-component number. For example, a score between 0 and 9999 like the current ratings. That, I believe, is a mistake that has yet to be fixed by anyone. Rather, multiple numbers, or a multiple-component number, is required to show people’s strong and weak points. Say our new rating is a 4-component (4 digits with each digit relating to a different aspect) number. These 4 components could relate to, for example:
    -Average scouted per battle (S)
    -Average cap points per battle (C)
    -Average decap points per battle (D)
    -Damage taken/dealt ratio, or in one word: lethality (L)

    Each would be rated between 0-9. A coarse rating, but then again, there’s no practical difference between 1200 and 1201 efficiency/WN7/etc either.
    An example of 3629 in this “SCDL”-rating would mean the player is not much of a scout (first digit), is more aggressive (2nd digit) than defensive (3rd digit) and has great capability to destroy enemy tanks (4th digit).

    On the other hand, SCDL-8173 would mean a great scout that knows when to go back to base to decap, but isn’t much of a threat when you encounter him in a big heavy tank. Despite being much higher than the previous number, it doesn’t necessarily mean this player is better, simply that his strengths lie elsewhere. As such, the whole point of statpadding is also diminished (though not eliminated) because the highest number doesn’t mean as much anymore (of course 9999 is still better than 1111).

    Other possible components could be:
    -Average tier played (rounded)
    -Kill/death ratio
    -Hit ratio (pointless, but possible)
    -Win ratio (global or last 1000)
    -etc.
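
    A rough sketch of how such an “SCDL” number could be assembled. The comment does not say how each 0-9 digit is derived, so the linear scaling between a server-wide lowest and highest value is purely an assumption, as are all names and the example figures:

```python
def to_digit(value, lowest, highest):
    """Map a raw per-battle average onto a coarse 0-9 digit.

    Linear scaling between the server's lowest and highest values is an
    assumption made for this sketch.
    """
    if highest <= lowest:
        return 0
    fraction = (value - lowest) / (highest - lowest)
    fraction = max(0.0, min(1.0, fraction))  # clamp outliers
    return min(9, int(fraction * 10))        # the top value maps to 9, not 10


def scdl(stats, ranges):
    """Build the four-digit S-C-D-L string from a player's averages."""
    return "".join(
        str(to_digit(stats[key], *ranges[key])) for key in ("S", "C", "D", "L")
    )


# Hypothetical server-wide (lowest, highest) values and one player's averages.
ranges = {"S": (0, 100), "C": (0, 100), "D": (0, 100), "L": (0, 100)}
player = {"S": 35, "C": 65, "D": 25, "L": 100}
print("SCDL-" + scdl(player, ranges))
```

    The digits are kept as a string on purpose: the components are never added up, so nobody can trade scouting for lethality inside one total.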

  18. the approach is wrong,

    You’re trying to find the “correct” formula to represent some X, but the problem is that X is not a real property; it’s compounded and calculated from a big set of distinct parameters. So what’s the point if people can’t agree on what matters? What is more important for player skill, this param or that param?

    better use of your time would be to download
    http://creativemachines.cornell.edu/eureqa

    and let the software find the formulas in the data,
    good luck.

  19. Well, I didn’t read it, ’cause I have no interest in how the formula works, only that it works. So I agree with you that the current system is like shit.

  20. Just wanted to comment a lot of these ideas are being used in the development of WN8. Per-tank ratings (or actually the best we can do with the stats we have) are used, so light tanks are no longer bonkered by WN8 as they were in WN7, but it is pretty surprising how many similar ideas are in this post to what we are developing. Of course, implementation limitations apply, so many nice things like DUD can´t be used because they are not available.

    Anyways, Eureqa uses neural networks to find relationships, in WN8 so far we have been using evolutionary algorithms, which are a bit more primitive, but can do the job. Nice find on Eureqa though, didn´t know there was such a user-friendly, GUI-based implementation of neural network algorithms.

    • What do you even relate the datapoints to?

      That is, if your formula is z=f(x,y) then all the stats are x,y, but what do you select as z to get the formula?

      • We correlate to a corrected version of winrate, as in (winrate on each tank / expected winrate on each tank) - 0.70. (0.70 corresponds to the winrate ratio for the 0.5% lowest winrate players in our database.)
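
        As a rough sketch of that correlation target: only the ratio and the 0.70 offset come from the comment above, while the function and constant names are hypothetical and the example winrates are invented:

```python
# Winrate ratio of the 0.5% lowest-winrate players, as stated in the comment.
BASELINE_RATIO = 0.70


def corrected_winrate(winrate, expected_winrate):
    """Target value z for one tank: how far the player's winrate ratio
    sits above the bottom-of-the-database baseline."""
    return winrate / expected_winrate - BASELINE_RATIO


# Invented example: 55% on a tank where 50% is expected.
print(corrected_winrate(0.55, 0.50))
# A player right at the baseline scores (about) zero.
print(corrected_winrate(0.35, 0.50))
```

        Subtracting the baseline is what moves the zero point to the worst players in the database, so ratios between two values start to mean something.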

        • In fact, what this does is exactly what you are talking about regarding “exponents”. By setting a baseline for each stat, we turn the interval scale into a ratio scale, which really measures how many times people are better at making their team win. Using this ratio scale, WN8 scales from 0-4000 instead of 500-2500 like WN7 does. This also means that someone with 1200 WN8 is twice as good at making his team win as a 600 WN8 player.

          This is obviously not true for WN7, as a unicum (1800 WN7) is hardly only 1.5 times as skilled as a decent (1200 WN7) player, and top players (2400 WN7) are hardly only 2 times as good.

          On WN8, a unicum (approx. 2400 WN8) is twice as good as a decent player (1200), and top players (3600) are approximately 3 times as good as a decent player. Makes more sense, doesn’t it?

          I believe this is basically what you were asking for a few posts above isn´t it? Ratio scales provide added significance to the rating, as it scales better throughout the range of values.

          • That’s part of it, yes. Sadly, it’s still only one number. The problem with one number is that it doesn’t tell you HOW a certain player contributes to his battles.
            For example, on Murovanka, it’s not so much the snipers (although important) but the one who scouts the magic forest who really wins the game for the team. Knowing the enemy has a light tank player with very high WN7/8, you might go and intercept him, to avoid getting steamrolled. With multi-component ratings you might notice he got that high WN7 from doing damage, and he is actually a shit scout, so you would then realise intercepting him is a waste of time and HP.
            Smart use of this data allows you to pick your engagements and targets with better care.
            For example, if enemy team has a lot of people with high capture-rating, you may decide to go defend the cap. If they have low cap-rating, go right the other way. While not reliable, it will increase your chances of intercepting enemy lemming-trains and breakthroughs.

            Not just that, it makes for more reliable calculation of pre-battle win-chance.
            With global WN7/8, if a player has high stats, you’ll be forced to assume he’s good in his current tank too. Multi-component rating could tell you he is a good scout…. but in this battle he plays TD, so his contribution to win-chance formula is decreased. However, if he plays a light tank, his factor is increased.

            • Finally, one of the most important points: all stat ratings as they currently exist are subject to padding/twinking. This is often done by focussing on one aspect of the rating. In old efficiency, this was cap points. In WN, this is damage. People will then ignore all other aspects, and wonder why they lost despite doing over 9000 damage. That is, if you dont finish people off because they have too little HP to be “worth” a shell, they’re gonna keep killing your noobteam faster, and you cant win. So despite increasing rating, it doesnt tell you how good a person is.

              Component stats avoid this altogether, because to have high ratings, you’ll have to be good at everything, and stat-padders are easily picked out as a result.

          • Actually, I just read your post again and I would like to clarify my original intent with the exponents in the formula.

            The desired result here is a logarithmic scale, as opposed to a linear one. On a scale of 0-9, someone with an 8 is not 2x as good as someone with a 4, but rather 2^(8-4) = 2^4 = 16x as good. This needs not be base 2, but it serves as an example. In reality, you might find base 1.2 works better, or base 3, I don’t know. This needs to be evaluated with the top rating of the server being a 9 and the lowest rating a 0.
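
            A small sketch of that scale, with base 2 taken from the example above and everything else (names, the inverse helper) assumed for illustration:

```python
import math


def times_as_good(rating_a, rating_b, base=2.0):
    """How many times better a player rated rating_a is than one rated
    rating_b on a logarithmic 0-9 scale."""
    return base ** (rating_a - rating_b)


def rating_gap(skill_ratio, base=2.0):
    """Inverse direction: rating points separating two players whose
    underlying skill differs by skill_ratio."""
    return math.log(skill_ratio) / math.log(base)


print(times_as_good(8, 4))            # the 8-versus-4 example, base 2
print(times_as_good(8, 4, base=1.2))  # a gentler base shrinks the gap
print(rating_gap(16))                 # roughly 4 rating points for base 2
```

            Picking the base is then a calibration question: choose it so the best player on the server lands on 9 and the worst on 0.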

            One more question: will WN8 be based on global ratings, or the last n-thousand battles? Because I see great merit in the latter. Global ratings are “diluted” with people’s noob-time in the game. That gives a bad representation of what the player is capable of and promotes making new accounts solely for statpadding. Using recent battles would thereby eliminate the inaccuracy that statpadders introduce into the win-chance formula, which makes it vastly more reliable.

  21. I love the fact you state that no PR system will adequately serve as a comprehensive tool for player comparison. I hope everyone who touts WN and other Eff Ratings pay attention to your wise words.

    I also like the fact you include AID, cap and def points on your formula.

    I disagree, however, with your statement that Def points are more important than Cap points. My own personal experience is that I get more Def points in losses than in wins. The majority of wins neither produce nor require any Def points at all.

    Cap points are a tougher issue because capping is a multiple edged sword. Frequently smart players cap not to produce a cap victory but to mandate a response by the enemy team, often shifting the outcome of the battle or easing a victory. Such cap points are every bit as valid in terms of TEAM play as are those produced by a cap victory.

    To me, Wargaming would best help players improve or evaluate their personal play by creating comparative stats for specific important parameters of play. You have already done this to an extent with your Kill/Death and Damage Caused/Received meters on player’s profile pages on the portal sites.

    Such an idea should be made TANK SPECIFIC and include various additional parameters such as (not listed in order of relevance):

    1) Damage
    2) Spots
    3) AID
    4) Cap
    5) Def
    6) XP
    7) Win Rate
    8) Kills

    Giving players such tools would allow them to measure their performance against everyone else playing that tank on their server. Player could then easily see how they measure up to other players using THAT TANK on their server and see where they needed to focus on improving.

    Trying to give us a holistic rating which covers our entire WOT career is just downright goofy – and accomplishes nothing useful. Give us stats that have actual substance instead!

  22. Hi All,

    My name is Matt Fleming and I work for Nutonian.

    I just wanted to clear one thing up. Eureqa uses symbolic regression as opposed to neural networks. I’m thrilled to see your feedback on the product!

    Best,

    -Matt

    • You know, it would be a great tool to finally decipher the XP formula… And then we would know what to do and how to boost the income ;) . Somebody get at that.
      It is not valid to create any rating, who came up with that idea?