The Dangers of Playtesting

Posted on February 20, 2012

You don’t win a Pro Tour by not doing any playtesting. Just ask Brian Kibler or Paulo Vitor Damo da Rosa, who met in the finals of Pro Tour: Dark Ascension after spending the week playtesting with Team ChannelFireball at a house on the beach. On the other hand, it’s not like they were the only two players who spent a lot of time testing before the PT. The point is that it’s not enough to playtest. You need to have a good process, make good testing decisions, and most importantly, draw good conclusions and make good choices based on that testing. There are many common mistakes that need to be avoided when playtesting.

By now, many of you will have heard about the two testing beach houses that featured Team ChannelFireball in one and Jon Finkel’s testing team in the other. They weren’t the only ones with the testing-beach-house idea, however. Pro Tour Hall of Famer Rob Dougherty also spent the week camped out testing with qualified players at a beach house. His many housemates included lesser known players such as Melissa DeTora, Craig Edwards, James Searles, and Steve Guillerm. By the end of the week, there were twelve players working together in their house.

In a playtesting situation like this, you need to decide on a starting point. Some will remember that a couple of weeks ago, I mentioned that I felt W/B Tokens would be among the first decks people tried:

"W/B Tokens"

Planeswalkers (1)
1 Sorin, Lord of Innistrad

Well, Rob agreed that it was the obvious deck, and he decided that it would be his starting point. The next major step in the process is developing a testing gauntlet. Your testing will be almost useless if you don’t test against relevant decks. Before the changes to Standard, the two big decks were Wolf Run Ramp and various Delver decks, so those decks were immediately designated to be key parts of their testing pool of decks.

One of the biggest goals of any major testing effort, especially when the environment is having new cards added, is to break the format. While a worthy goal, it’s one that comes with many pitfalls. But here’s one thing to consider: How often are Pro Tours won by a deck that breaks the format? Let’s look at recent Standard Pro Tour–winning decks for example:

2012 – Hawaii – Wolf Run Ramp

2011 – San Francisco – Wolf Run Ramp

2011 – Paris – Caw-Go

2010 – Chiba – U/B Control

2010 – San Diego – Jund

2009 – Rome – Naya

2009 – Kyoto – 5-Color Control

2008 – Memphis – Faeries

2008 – Hollywood – Elves

The way I determine whether a deck broke the format is with two tests. First, is the deck good enough to win the biggest event in the format? All of the decks above meet that standard by virtue of winning a Pro Tour. Second, is the deck new and innovative compared to the decks that had been doing well in the format at the time? Most of the decks above don’t pass this test. Charles Gindy’s Elf deck that won PT: Hollywood and Gabriel Nassif’s 5-Color Control deck that won PT: Kyoto are probably the two decks that were the most innovative. The other seven decks were all well-known entities.
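For readers who like to see the tally spelled out, here is a minimal sketch of the two tests applied to the list above. The `innovative` flags simply encode my own judgment from the previous paragraph, and the names `WINNERS` and `broke_format` are made up for illustration.

```python
# Standard Pro Tour winners as listed above.
# The last field is the "new and innovative" judgment call from the article,
# not an objective measurement.
WINNERS = [
    (2012, "Hawaii", "Wolf Run Ramp", False),
    (2011, "San Francisco", "Wolf Run Ramp", False),
    (2011, "Paris", "Caw-Go", False),
    (2010, "Chiba", "U/B Control", False),
    (2010, "San Diego", "Jund", False),
    (2009, "Rome", "Naya", False),
    (2009, "Kyoto", "5-Color Control", True),
    (2008, "Memphis", "Faeries", False),
    (2008, "Hollywood", "Elves", True),
]


def broke_format(won_event: bool, innovative: bool) -> bool:
    """A deck breaks the format only if it passes both tests."""
    return won_event and innovative


# Every deck in the list won its event, so the first test is always True.
breakers = [
    (year, city, deck)
    for year, city, deck, innovative in WINNERS
    if broke_format(True, innovative)
]

share = len(breakers) / len(WINNERS)
print(f"{len(breakers)} of {len(WINNERS)} winners broke the format ({share:.0%})")
```

Changing a flag or two would shift the count, but it is hard to argue the list into a majority of format breakers.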

The best example of domination without inventing a new deck may even be the Wolf Run deck that Brian Kibler just won PT: Dark Ascension with. That’s because it’s the same archetype that Jun’ya Iyanaga just won Worlds in San Francisco with. This trend of winning Standard Pro Tours with known decks is important to keep in mind when testing.

Obviously, not everyone is going to win a Pro Tour. You need to keep your goals in mind when testing. If your goal is to win the event, perhaps breaking the format shouldn’t be your priority. If your goal is to become famous as a deck innovator, go for it. Too many playtesters try to do both and fail to accomplish either.

One of the problems experienced by Rob’s loyal band of beach-house testers was that none of them wanted to adopt a known deck to champion. One of the advantages of having a large group of playtesters is the ability to have each tester try to perfect a different deck. Generally, if a deck doesn’t have a tester championing it, the best version of it won’t be discovered, and its maximum potential won’t be unlocked.

The good news was that Rob’s team built Delver and Wolf Run to have available as test decks. The bad news was that nobody really wanted to play them. Everyone seemed to have an idea for a new deck that they wanted to break the format with, and no one really wanted to be the Delver or Ramp player in testing. This led to two problems.

First, there was the classic problem of inbred testing. Too many games were played between new decks that were under consideration, and not enough games were played against the previously existing metagame. Not only were both of the PT finalists playing Ramp, but Delver was the most played deck at the event. Thus, the testers weren’t as experienced playing against the dominant decks at the PT as they could and should have been. They might also not have had a realistic impression of those matchups, since they weren’t testing against someone who believed in those decks.

The second problem was that none of the testers seriously considered playing Delver or Ramp. Given that the Top 8 of the PT included three Ramp decks and four Delver decks, perhaps they would have been good decks to consider playing. It’s worth noting that the one player at Rob’s house who played Ramp in the event, Simon Harnden, ended up with a solid fifty-seventh-place finish.

Rob’s band of rising stars also ran into another major, yet common, playtesting problem. They over-evolved their metagame. If you’re going to have a house filled with dedicated playtesters for an entire week, it’s safe to say you’re going to be doing more testing than the average competitor. Most PT players can’t afford the time or the money to spend an entire week at the site ahead of time. Between that and the advantages of playing a deck that you’re familiar with, it’s pretty common for Standard events to be filled with known decks, as happened in Hawaii.

As a result, when doing the kind of intensive testing that Rob’s merry band was doing, there is the danger of over-tuning. In this case, Rob started with a W/B token deck fairly similar to the one I listed above. Initially, he crushed all of his testing opponents. They naturally concluded that Rob was playing the deck to beat.

At this point, I find that the correct move is usually for everyone to climb on the bandwagon and start testing the initially dominant deck against the existing metagame. Then, the whole team can work together until they have the best version and everyone has perfected playing and sideboarding it. In this case, Rob’s mates stuck with their preferred archetypes and just tried to tune them to beat Rob’s deck.

Their first instinct was understandable, and that’s why it’s such a common mistake. As I’ve said many times, there are great benefits to playing a deck that’s your style and/or that you have experience with. Many of the players Rob was testing with had a style preference for blue control. This was among the reasons that none of them was especially inclined to play ramp, Delver, or tokens in the PT. The other thing that scared most of them away from tokens was the overreaction of their metagame. After Rob started smashing his fellow testers with tokens, they suddenly started testing decks featuring four Ratchet Bombs and various other mass-removal spells.

By the time the event rolled around and it was time to choose what deck to play, Rob’s deck was no longer dominating, and it had even morphed away from being so dedicated to tokens. He was now even main-decking four Loyal Cathar in order to help against what now seemed like a metagame stacked with mass removal. Now Rob was forced to look back at his week of testing and draw a conclusion about what he should play.

I believe Rob’s vast Pro Tour experience was of great assistance to him here. He looked back at the week of testing done by his team and realized that they were lacking in process—and that perhaps their metagame had become over-evolved. Maybe he even thought back to the testing that Team Your Move Games did for PT: Chicago 2000. At that event, we quickly determined that the R/G Fires of Yavimaya deck was the best deck. At that point, Rob went on vacation with his family while the rest of the team continued testing. Rob ended up making the Top 8 with a basic Fires deck while Dave Humpherys and I struggled with decks that had formed out of an overdeveloped metagame.

So, Rob decided to go back to the deck that had dominated at the beginning of his testing, and this is what he played:

"Rob Dougherty’s W/B Tokens"

As I predicted in my last article, Ratchet Bomb decks may be the future of the format, but an aggressive token deck was a fine choice for this event. Rob finished 7–2–1 in Standard, with the draw being intentional, to ensure a good finish at the end of the event. He ended up in a strong thirty-ninth place to lead his testing group.

While tokens started as the deck to beat in their metagame, only Rob stuck with it. The rest of the testers went with various other options, such as W/U/B control, Tezzeret control, G/W aggro, and of course, the one Ramp deck. While Rob wasn’t the only member of his testing team who ended up being happy with the deck he played in Standard, they pretty much all agreed that there was a need for them to improve their process for the next Pro Tour.

TAGS articles, standard, theory, constructed, deck lists, testing, darwin kastle, wisdom
