Groundhog Day, or, the Problem with A/B Testing

One of my all-time favorite movies…watched it over 20 times. :slight_smile:

I think your argument is a tad off for A/B testing, because you ignore the fact that Phil should actually be dating hundreds or thousands of women (his target audience) to get accurate data (a sample large enough to give you a 95% confidence level). Some would fall off (dump him along the way, not answer his calls) and some would convert right to the end of the funnel. But the important thing the test(s) would tell him is which variables are most likely to lead to “catching” the woman or “hitting a home run”, or whatever desired outcome he defined at the beginning for his ideal woman (aka conversion).
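To put rough numbers on the sample-size point, here is a minimal sketch of a two-proportion z-test, which is one common way (not the only one) to check whether a difference between variants A and B clears a 95% confidence threshold. The function name and the conversion counts are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are sample sizes.
    All inputs here are hypothetical illustration values.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Same underlying rates (10% vs 15%), very different sample sizes.
_, p_small = two_proportion_z_test(2, 20, 3, 20)
_, p_large = two_proportion_z_test(200, 2000, 300, 2000)

print("20 dates per variant:   significant at 95%?", p_small < 0.05)
print("2000 dates per variant: significant at 95%?", p_large < 0.05)
```

With a couple of dozen dates per variant, a 10% vs 15% difference is indistinguishable from noise; with thousands it is not. That is the gap between Phil's one-woman experiment and a real test.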

Jeff, I’ve heard you mention religion and purpose a few times now on the podcast and on this blog. You should really read Chesterton:

http://www.gutenberg.org/cache/epub/130/pg130.html

I disagree completely.

The problem with Phil’s success (or lack thereof) isn’t a limitation of A/B testing. It also isn’t because he was ‘faking’ it.

The problem is that when you give someone a series of A/B tests, all you end up with is the result of those tests.

In other words, she got EXACTLY what she THOUGHT she wanted.

We see this ALL THE TIME, with all sorts of things. For example, why would you hire an interior decorator? You can use a series of tests to determine what, in your own opinion, is the best at each step of the way.

It’s not ‘fake’. It’s not that you are lying to yourself or failing to buy the paint color you thought you wanted.

The problem is that people rarely know what they want. We’re so much better at knowing when we have something we don’t want, i.e., ‘God, this room is ugly’.

Hi Jeff,

Have you ever actually done A/B testing? I find you often preach from the pulpit.

-H

Knowing that Groundhog Day was a big-name Hollywood production, it most likely went through a whole bunch of A/B testing of its own before it was released. And seeing as you liked the result, we can conclude that A/B testing works just fine.

@Ben Anderson
Thanks for pointing this out. I did not know that Chesterton was public domain. I have been meaning to read this for a while and just downloaded it to my iTouch. From what I have heard about his stuff and based on Jeff’s comments, I too would recommend that he read it.

“It has no feeling, no empathy, and at worst, it’s dishonest.”

I didn’t know testing was used for psychiatric counseling, or for friendship, or for ‘winning hearts and minds’.

I thought it was about producing reliable software.

It’s important to notice that in the movie Groundhog Day, Phil takes up ice sculpting. This is also quite a beautiful symbol of his eventual balance. Ice sculpting by its nature is eternally evanescent. You work long and hard, but ultimately your art will always un-create itself.

For Phil, he could have been carving in titanium, and yet he’d have the same outcome. His actions only had the meaning of that moment. There’s an acceptance of change that is symbolized by his talent.

imho

I think you’re missing the point with A/B testing. It’s largely used to convince clients (and prospective clients) that the sizable sum of money they’ve just paid you, or are going to pay you, to redesign their website is actually worth it and generates more turnover. The people who make the money decisions want this; you can’t just tell them ‘it looks nicer’. If you can’t prove that your work generates results, they aren’t going to hire you.

A-B testing - If there were one example to confirm the merits of A-B testing, wouldn’t it be the astounding variety, beauty and well-adaptedness of the millions of life forms on this planet? Isn’t every living thing you see around you a product of A-B testing? Consider your brain, that wonderfully sophisticated device that can contemplate A-B testing, read this article and understand these comments. Isn’t it just a product of a very long series of A-B tests? Isn’t evolution the ultimate confirmation of the power of A-B testing? There were many comments in this series saying that A-B testing is limited by your ingenuity in coming up with good variations to test. Nature in fact eschews purposeful design of new variations. In evolution, variations are introduced completely randomly on A to create B, without design and without regard to the consequences they will have. The results are nevertheless spectacular.

Doesn’t evolution also refute the sandpaper analogy? After all, Homo sapiens, blue whales, centipedes, blue-green algae, and tube worms in undersea volcanic vents that consume sulfur for energy all have the same great-great-grandmother. It seems to be just a question of how many iterations you are willing to do and perhaps whether you are willing to try truly random variations.

Evolution works through a form of A/B testing. That seems to be working pretty well.

I should have read the other comments first. Several people already presented the evolution argument:)

Whilst I agree with the fundamentals of this post and the problems of A/B testing I think the analogy with Groundhog Day is completely wrong.

The “Perfect Date” was never the perfect outcome of Groundhog Day and therefore achieving it did not end the movie. Once that particular goal was achieved it produced another failure. The true perfect outcome was for Groundhog Day to end. This did eventually happen after many more outcomes were achieved and this is when the A/B testing scenario happened.

So whilst A/B testing can be considered “bad”, Groundhog Day itself is an absolutely perfect example of how it could work, not how it couldn’t.

One thing about movies is although the writers may want to send a positive message, that message won’t get out there if it isn’t believable. It may be subconscious, but there is usually some kind of truth hidden in there - just not what it appears to be on the surface.

In Groundhog Day, Phil eventually succeeded because (1) he impressed Rita indirectly, by impressing everyone else, and (2) he had decades of experience to develop the knowledge and abilities he needed to impress everyone else.

Of course he was also relaxed, natural and happy that last day - but that’s a social status symbol too (hard to be relaxed and happy if you’re under stress and not coping). Also, it only emphasizes just how much practice he’d had finding helpful things to do, learning the piano, and generally impressing people.

Groundhog Day had that superficial nice message, but it was believable precisely because it wasn’t nice at all. Intuition isn’t magic. Rita wasn’t some superior being. We’re supposed to believe Phil changed, but we only saw one good day after however many years of failure. Rita was fooled by one good day and the fact that he impressed lots of other people during that, for her, single 24 hour period.

As for intuition - it’s not magic. It’s information processing in the brain. Even expert systems developers have found that it’s usually a lot easier to invent a heuristic solution that’ll probably work than to justify that solution rationally. Evolution had the same issue to deal with. But rely on intuition and to a large extent you’re relying on the Cro-Magnon instinct. Just because it more-or-less worked for Cro-Magnons doesn’t mean it’ll work here and now, and a lot of the Cro-Magnon solutions really weren’t very nice.

The difference with the evolution argument is that evolution is based on random mutations, sometimes drastically different in a single iteration. The reason A/B tests arrive only at local maxima is that we don’t often throw big random changes at them – we make a small change and the numbers move slightly in the direction we want, so we follow that up with successive changes in the same vein, usually stopping when we can’t find any (minor) modification that gives better numbers.
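A toy sketch of that local-maximum effect, for anyone who wants to see it happen. Everything here is an assumption made up for illustration: the `conversion` curve (a small peak near 2, a taller one near 8), the `iterate` helper, and the step sizes. The only difference between the two runs is how big a change each test is allowed to make.

```python
import math
import random

def conversion(x):
    """Hypothetical response curve: small peak near x=2, taller peak near x=8."""
    return math.exp(-(x - 2) ** 2) + 2 * math.exp(-(x - 8) ** 2)

def iterate(start, max_change, rounds=500, seed=0):
    """Repeated A/B tests: try a random variation, keep it only if it scores better."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best = start
    for _ in range(rounds):
        candidate = best + rng.uniform(-max_change, max_change)
        # Ignore variations outside the design space [0, 10].
        if 0 <= candidate <= 10 and conversion(candidate) > conversion(best):
            best = candidate
    return best

timid = iterate(start=0.0, max_change=0.1)   # small tweaks only
bold = iterate(start=0.0, max_change=10.0)   # big random changes allowed

print(f"small changes: x={timid:.2f}, score={conversion(timid):.2f}")
print(f"big changes:   x={bold:.2f}, score={conversion(bold):.2f}")
```

The small-step run climbs the nearest hill and stays there, because every tweak that would take it toward the taller peak first makes the numbers worse. The run that is allowed large jumps can land on the far side of the valley. The testing loop is identical in both cases; only the size of the variations differs.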

I’m astonished that no one’s mentioned ‘The Tunnel Under the World’ (Frederik Pohl). It’s a story where a town lives the same day over and over, to test marketing strategies, and is therefore very applicable :slight_smile: Also a very good story, and out of copyright: http://www.gutenberg.org/etext/31979

Jeff, why did you break quoting? I hate Twitter-style “@whoever” syntax.

I’ve seen your photo on wikipedia. You look like a child molester. LOL

Genius- how you elegantly explained the danger of A-B testing all the while drawing reference to a popular movie and philosophy. Not only am I more dissuaded from falling into the A-B testing trap, but I’m forced to reflect on the larger truths of purpose and plan. Thanks for posting this.