Kirb your enthusiasm!

WEBSITE HOSTED AT: www.3plusplus.net

"Pink isn't a color. It's a lifestyle." - Chumbalaya
"...generalship should be informing list building." - Sir Biscuit
"I buy models with my excess money" - Valkyrie whilst a waitress leans over him


Monday, July 25, 2011

The Facts of Tournament Results

We have a guest article, finally, from Messanger of Death of Imperial Life lack-of-game. I've been asking him to do this article for a while now in relation to taking tournament results as canon, i.e. 'this army is good cause it won a tournament herp a derp.' Hopefully there will be a follow-up with some more lay explanations, but this is otherwise a highly recommended read for all players.

Hello all my Imaginary Friends. Today we will take a look at how to determine whether the results of a tournament apply to you. And hopefully by the end of this article you will understand why using tournaments as a means to compare armies is not only flawed, but just plain stoopid. So sit back and be prepared to have your brains melted from extreme levels of boredom.



Knowledge is power: it allows us to build balanced lists and play the game we all love. However, most of our understanding of the game mechanics comes through intuition and reasoning, as there is an absence of supportable or confirmable data. To fill this gap, some players use the results of tournaments as a source of information. And this is a problem. The lists taken to tournaments are mainly produced through trial and error, where someone has thrown several lists together until one works for them. This is a haphazard approach where the results may not be reproduced a second time (think rock-paper-scissors). Some players do this hundreds of times to hone their skills and lists... in a way they are conducting their own research, and we, the player base, analyse the tournament results.

However, it is never good enough to just conduct research. In order to use or apply research it is necessary to make a judgement about the quality of the research and its relevance to a particular context or purpose. Or in layman's terms, you need to know how good the information is and how practical the tournament results are before you use them.

To determine the quality of the research we need to know how the researcher (in our case, the tournament organiser) controlled the study. A researcher controls their study by imposing rules so as to decrease the possibility of error and thus increase the probability that the study's findings are an accurate reflection of reality. In a tabletop setting this is done with the rulebook, codices and tournament design. Through control, the researcher/tournament organiser can reduce the influence or confounding effect of extraneous variables on the study variables. To do this they need to ensure the missions are balanced, there is a suitable amount and mix of terrain, and that the player scoring and seeding don't allow the system to be gamed. Without a rigorous tournament design the results are neither valid nor reliable, as they are unlikely to be reproduced again.

To determine the relevance to a particular context or purpose, we need to look at the sample, the sample size and just how generalisable the results are.

A sample is the subset of a population selected to participate in a research study. For this discussion the sample of a tournament is the army builds that are being played. In order for a sample to be representative, it must be like the target population in as many ways as possible. Composition scoring and list tailoring for biased missions can mess around with the sample to the point that it no longer represents the population; the inclusion of comp automatically excludes a tournament from analysis. The sample size is, obviously, the size of the sample. As a general rule, the larger the sample, the more representative it is of the population, while the smaller the sample, the larger the sampling error.
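To make the sampling-error point concrete, here is a quick, purely illustrative simulation (the 55% win rate and the code are a made-up example, not tournament data) showing how far an observed win rate typically drifts from the true one at different sample sizes:

    # Illustration only: how far an observed win rate tends to drift from the
    # "true" win rate of a hypothetical build at different sample sizes.
    import random

    TRUE_WIN_RATE = 0.55  # made-up "true" strength of a hypothetical build

    def average_error(games, trials=10000):
        """Average absolute gap between the observed and true win rate."""
        total = 0.0
        for _ in range(trials):
            wins = sum(random.random() < TRUE_WIN_RATE for _ in range(games))
            total += abs(wins / games - TRUE_WIN_RATE)
        return total / trials

    for n in (5, 30, 128, 1000):  # from a club weekend up to a very large event
        print(f"{n:>4} games: typical error ~ {average_error(n):.3f}")

The fewer games in the sample, the bigger that typical gap, which is exactly why a single event tells you so little on its own.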

Generalisation is the ability to apply study results from the sample to the population. This is where everything comes together to show just how stoopid it is to think that the results from any tournament, of any size, are applicable to how you play with your war dollies. Very few tournaments have missions that don't screw around with the game mechanics. This year's Lords of Terra is one of the more extreme examples of this, where the missions *self-edit* mechanised players. But even the most balanced missions with a large enough sample won't help, simply because the larger the sample size, the worse the terrain gets. AdeptiCon 2011 is possibly the best example of this: even with thousands of hours of volunteer work they still didn't have a suitable mix of terrain for the 128 tables. No knock against them. Even if there is a tournament where everything comes together, the results can't be generalised to the population. Why? I will give you two reasons.

Firstly, it is a single tournament. The hallmark of a good research study is that the results can be reproduced. Unless everybody replays that tournament we won't know if the tournament results are reliable and valid. Secondly, you may not be the target population. The build you play may not have been a part of that population, or the player using it may not be at your skill level... even with over a hundred players the sample size is still too small to reflect the Global Gaming Community™.

I will skip a conclusion and instead help revive your brain with this...

Comments (25)

Congratulations, you just summarised the "research methods" module of my degree. Have you by any chance done the same module? :P
Nonono, Lords of Terra had foot Eldar come *both* first and second, and Necrons a close third! You don't understand how my army works man. They win games. And that's High Lords too, you know, the guys who play really well.

:p
3 replies · active 714 weeks ago
heard you got best general though so well done *pats*.
I worked out that if you paid someone to paint your army to a 30/30, played like a champ, and brought an army that would give vt2 the shakes, you'd still come a minimum of 4th out of 30 (i.e. 5 massacres/losses).
Not sure if 4 tanks counts as foot Eldar.
I thoroughly enjoyed this post. An obviously educated individual taking a (true) scientific approach to the hobby, for those that simply read tournament lists and believe they are the end-all, be-all of gaming. Having taken a few psychology classes myself, I too recognize and understand your references to research methods. Hopefully this post will be helpful to those players who are feeling down on their luck with their current armies. Looking forward to your next post regarding the topic. ;)
TL;DR: Tourneys don't show skill.

However, I do feel it worth noting that the power of dance might.

General Smooth · 714 weeks ago

My one problem with this is that by this argument, all attempts at assessing a codex are confounded, as every tournament will have different missions, hence preventing generalisation. If you create a theoretical framework whereby you assess a codex's value, you then lose environmental validity, as it is not transferable because tournaments mess with the missions.

If the context you are concerned with is an army's performance in a tournament, a spread of tournament results is still the best GUIDELINE. However, these guidelines are sufficient to deny certain sweeping statements and suggest others, but not to establish anything as a fact.

So we all get to say anything and claim to be right. Horrrrayyyyyyyyy!!!!!!!!!!!!!
11 replies · active 714 weeks ago
It is over 800 words long. Anything longer and not even a beer commercial will be able to save people's brains. Kirby has suggested that I do a follow up article that goes into more detail on certain aspects.

Funny thing is this article is the result of people arguing that Orks winning tournaments must mean that they are good.

Messanger
Not ALL attempts, just practical ones.

Hypothetically, if you could create a venue where the following could happen, then you would have an excellent and sound method of determining various codices' strengths and weaknesses:

- You provide an army of 1800-2000pts of every tournament archetype for each codex. Lash, Air Cav, Mechdar, Jetseer, Jumpers, Rodeo etc etc etc. Obviously there are going to be small variations, but so long as the overall archetype is kept intact then it shouldn't matter too much.

- You have a gaming table with exactly the correct amount of suggested terrain coverage, and make sure there is an equal mix of LoS blocking, area, ruins, forests and so on. The terrain is deployed equally and fairly and does not change between games.

- You find two players who are considered equally skilled and have them play each other. They get five or more "practice" games every time they pick up a new list/codex, and then 9 "real" games where the results are logged and counted. These 9 games use the missions from the rulebook, with every combination of mission and deployment type (the 3 x 3 grid is sketched below).
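
(Spelling that out: the nine "real" games are simply every rulebook mission crossed with every deployment type. A throwaway sketch, using the 5th edition names:)

    # The 9 "real" games: every rulebook mission paired with every
    # deployment type (3 x 3 = 9 combinations).
    from itertools import product

    missions = ["Seize Ground", "Capture and Control", "Annihilation"]
    deployments = ["Pitched Battle", "Spearhead", "Dawn of War"]

    for game, (mission, deployment) in enumerate(product(missions, deployments), 1):
        print(f"Game {game}: {mission} / {deployment}")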

If you could accomplish that, you could start to get some kind of useful data. As it stands, we just have to put up with people claiming that CSM are still good because they win tourneys from time to time...

general smooth · 714 weeks ago

'Fraid not, as this is completely irrelevant in the context of real tournament play, since such a situation would never exist. It would simply serve for some form of purist discussion on the web.

Something like this http://www.rankingshq.com/rankings/default.aspx?G... if it had enough data (maybe the yanks could do it) might be useful for guidance. Can't be bothered looking up the US version.
Problem with RHQ is that it includes all tournaments of any type, and so it is not a reflection of which armies do well but of which people do well over any tournament format. This includes comp, paint and quite often sports scores rather than a direct W/L.

general smooth · 714 weeks ago

Not denying any faults it may have. Personally I trust my own judgement more to pick out what's relevant. However I wouldn't really expect anyone else to trust my judgement so if you're looking for evidence it is still better than anything else I have seen.
Rankings HQ fails on an epic level in providing data... how many of those tournaments had the same mission design? What about terrain coverage? What about sample size?

Remember the hallmark of a good research study is the ability to reproduce the same results...

Messanger

general smooth · 714 weeks ago

Well, you're kind of echoing my point, but maybe I didn't explain it properly. Firstly, it is based on the premise that we are interested in how armies hold up in tournaments, as this is the environment most focused on performance, or the environment where people are most interested in taking an army that performs well. The second thing is that a non-flawed environment absent of confounding factors, as covered in the article, doesn't exist, as all tournaments are flawed environments for a pure scientific endeavour. No 40k codex exists in a void. This then means that the ecological validity (the ability of the research to predict results when applied to the real world) of a sterile environment is next to meaningless to most people as an army-wide assessment.

We will do mathhammer on units and even lists (the fad being Niko's system, which I use a little myself), but as a method of highlighting things we may have missed or as an indicator of things we should examine. Take psychometric testing as an example: these are most powerfully used in recruitment as decision aids, not decision-making tools. Why? Because we as individuals exist in a context, and any outcome we want to predict is almost impossibly multi-factorial.
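
(For anyone unfamiliar, "mathhammer" here just means basic expected-value arithmetic. A minimal sketch, with a completely hypothetical shooting profile:)

    # Minimal mathhammer sketch: expected unsaved wounds from a volley of shots.
    # The numbers below are hypothetical, purely to show the kind of decision aid meant above.
    def expected_wounds(shots, p_hit, p_wound, p_fail_save):
        """Expected unsaved wounds = shots x P(hit) x P(wound) x P(failed save)."""
        return shots * p_hit * p_wound * p_fail_save

    # e.g. 10 shots hitting on 3+, wounding on 4+, into a 4+ armour save
    print(expected_wounds(10, p_hit=4/6, p_wound=3/6, p_fail_save=3/6))  # ~1.67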

In fact, all these confounding variables addressed in the article are an important part of the context. E.g. if an army repeatedly performs well, maybe it is suited to a large number of the preferred missions used in tournaments, or perhaps it performs well against popular builds. This is not pure science. (As an add-on, an interesting thought: if everyone looked at this data and behaved accordingly, it would of course cause a shift in the data, thus immediately being made useless by the people using it correctly. Would you have broken that vase if I hadn't said anything????)

Now I offer examples and make these points as a contrarian view but I am not polarised on the issue. I don't think I have ever looked at that data before. Nonetheless this sort of stuff is worth looking at when assessing the competitiveness of an army in a tournament environment, as flawed as it may be. Not because it is proof but because it is an indicator!
Assuming that any tournament that matters has a reasonable amount of terrain and sticks closely to rulebook missions, I think that it would be very relevant indeed.

The value of research is not just to say "X is better than Y in this specific situation" but rather to say "X is better than Y generally speaking". For example Mechdar has a terrible matchup against Razorwolves - mitigated slightly in DoW missions, but otherwise fairly constant. Unless something out of the ordinary happens, Mechdar will force a draw or lose. Considering that what, 80% of all games of 40k are 1750-2000pts using rulebook missions, I think it would give a great baseline for gauging which armies are "Good", which are "Better" than each other, and which are "Sucky beyond belief".
Not all attempts at assessing a codex... just using tournaments as a source of data in evaluating codices and army builds.

If you look at what I say you will notice that the spread of tournament results will not be valid... there is a lack of control by the Tournament Organiser. The only exception to this is the NOVA design by Mike. However, a single NOVA tournament does not provide enough data, as the sample size is far too small to ensure that it is representative...

Messanger
I wouldn't say NOVA is an exception. Rather, of all the tournament formats it is the most valid in determining 'good' and 'bad' armies. We again have all the other associated issues of player skill, match-ups, dice and personal differences, etc. but what MVB has done is make an excellent system to control some of the factors we have control over.

fredbob524 · 714 weeks ago

tl;dr
1 reply · active 714 weeks ago
I know right... 3 minutes of my life lost.
This is a truly useless post. Provide solutions, don't just regurgitate what you learned in class and stick 40k stickers all over it. If you had bothered with a conclusion this might have been worth reading (with vast amounts of editing).
The purpose of the article isn't to provide solutions. It is to point out how stoopid it is for players to use tournament results in their arguments over which of their war dollies are better. The target audience isn't those who already have a basic understanding of evidence-based research... it is for those that don't.

Messanger
1 reply · active less than 1 minute ago
I'd say you failed at your goal by not providing a better alternative solution. Essentially you said, "you are dumb for doing what you do" without providing a better way of doing it. This is not convincing.

My main issue with the post is that, absent a better objective method to judge an army or a player, tournament results are the best method and not something to be derided.
There were 10 tournaments. 7 of them were won by Space Wolves. Ergo, Space Wolves are good.
