Why Rating Wine Is Bad Science

Vinted on May 16, 2012 binned in best of, commentary

[ Editor’s note: this article is not an easy one to follow, because the topic is not an easy one to get your head around; intrepid readers will want to stick with it, though, because I think the conclusions are fodder for some amazing discussion about their implications for wine criticism. ]

The world of wine critique is fraught with logical contradictions.

Case in point: take this excerpt from a recent interview with critic James Suckling on Liv-Ex.com on the topic of evaluating wines while they are still in the barrel, as is often done during En Primeur in Bordeaux (emphasis mine):

“The key thing to remember is that the nose isn’t important at all. I learnt that from Daniel Lawton, one of the great negociants of Bordeaux. The important thing is the texture – the quality of the tannins and how they relate to the acidity and alcohol – and then the finish. Wines with long seamless finishes are really the great wines. It’s not all about power. It takes a long time before you can taste En Primeur properly. There’s a hierarchy in Bordeaux that helps as you can kind of figure out what should taste good. But to really understand how wines evolve you need a good 10 years of tasting.”

The logic issue here is that we know scientifically that the vast majority of our sensory experience in tasting wine comes aromatically and retro-nasally. So one (but not the only!) interpretation of the above quote is that En Primeur ratings are meaningless, or at least limited in value to consumers, because the aromas – and therefore the majority of the wine’s sensory experience – cannot be fully evaluated. The contradiction is that the wine world largely treats those ratings as if they had no such limitation.

Issues like that one crop up all over the place in the wine world, if you’re willing to look hard enough. And so it should be of little surprise to many of you when I tell you that the act of rating wines falls squarely into what is commonly called “bad science” in the scientific world…

To explain this, we first need to explore what’s meant by “bad science.” David Deutsch lays out a compelling definition in his excellent book The Beginning of Infinity: bad science offers explanations that are easy to vary, and therefore are not actually explanations of how things truly work.

For example, saying that someone lost a vintage of grapes to hail because of bad luck is bad science. It’s bad science even if that grapegrower has had many random unfavorable circumstances befall him before, a string of seemingly convincing “data” to bolster this claim. The data in this case are irrelevant because a) those random events cannot directly predict or explain the subsequent unpleasant events and b) we can substitute any manner of explanations in place of “bad luck” (“the will of the gods” for example), all of which are equally incorrect. In other words, the explanation is too easy to vary – accepting it is no better than acting on blind faith. That’s an over-simplification, but you get the drift.

Now, if we were able to trace back the meteorological events leading to that hail storm, and were able to gather enough data and causal links from the formation of the clouds to the weather events that triggered the hail above the poor guy’s vineyard, we would end up with an explanation and evidence (the formation of hail) that’s pretty difficult to vary. This is “good science” – it explains what happened, and the results could be used to accurately predict future similar events.

What does this have to do with criticism?

Wine ratings are most often presented via scales that imply scientific precision; however, they are measuring something for which we have no scientifically reliable calibration: how people sense (mostly) qualitative aspects of wine. Yes, there may be objective qualities about a wine that can indeed be somewhat calibrated (the presence of faults, for example) but even with these we have varying thresholds of detection between critics. That’s important because it means that the objective (i.e., measurable) quantities of those elements are not perceived the same way by two different reviewers, and so their perception of the levels of those elements cannot reliably be calibrated.

But it’s the subjective stuff that really throws the monkey wrench into the works here. How we perceive those – and measure our enjoyment of them – will likely not be fully explainable in our lifetimes by science. That is because they are what is known as qualia: like happiness, depression, pain, and pleasure, those sensations can be described but cannot effectively be measured across individuals in any meaningful way scientifically.

Yes, we can come to pretty good agreement on a wine’s color, and on the fact that it smells like, say, strawberries. After that, the qualia perception gets pretty tricky, however: my perception on how vibrantly I perceive that strawberry aroma might be quite different from yours. Once that factors into how you and I would “rate” that wine’s aroma, we start to diverge, and potentially quite dramatically at that.

It’s a bit like you and I cutting our fingers accidentally when trying to open a wine bottle: ask us how we’d rate the pain on a scale of one to ten, and we might both say “2” but there is NO calibration possible to ensure that our twos are meaningfully comparable – it only appears that way because we chose arbitrarily to use a numeric scale in reporting our perceptions of the pain. Substitute our perception of a wine’s balance for finger pain in that example and you can see we have a problem.

I am grossly oversimplifying this, by the way, in order to prevent this piece from ending up as ten thousand words (the qualia rabbit hole is very, very deep). The point is that the qualia can directly impact the rating, and so are not irrelevant.

Either mistaking qualia perception as accurately measurable or dismissing it as scientifically irrelevant has led to a lot of “bad” science in the wine world, in some cases coming from what would otherwise qualify as “good” scientific data.

Interestingly, fine wine is acutely susceptible to this problem of differing qualia perceptions specifically because of how wonderful it is – it contains more complexity in offering more qualia (styles, flavors, primary and secondary aromas, textures, tannins, acidic structure, etc., and the senses of balance between them) and therefore a lot more room for error (of the scientific kind, I mean) between individuals.

A relatively recent example of this can be found in the Journal of Wine Economics’ publication of an article by Robert T. Hodgson titled An Analysis of the Concordance Among 13 U.S. Wine Competitions. In that article, Hodgson gathered data on wines submitted to medal competitions and concluded that “the probability of winning a Gold medal at one competition is stochastically independent of the probability of receiving a Gold at another competition, indicating that winning a Gold medal is greatly influenced by chance alone.” The trouble was that Hodgson a) removed all of the factors that feed into how the judges came to their ratings, on the grounds that they are statistically irrelevant, which led to him b) treating the qualia perception differences between judges and competitions as fungible when in reality they are not – they simply cannot yet be measured accurately (and may not be in our lifetimes, given the scientific complexity of that task). So in effect Hodgson made a critical error in assuming that the ratings in each competition are reached in a way that can be measured scientifically and interchangeably. But they aren’t – the qualia rule the day when it comes to wine reviewing. Good approach, but the conclusion is bad science.
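For anyone wondering what “stochastically independent” means in practice, here is a minimal sketch in Python. It is not Hodgson’s method and the medal lists are toy data I made up purely for illustration; the function name `gold_rates` is mine as well. The idea is simply that if two competitions award Golds independently, the share of wines winning Gold at both should sit close to the product of each competition’s own Gold rate.

```python
def gold_rates(golds_a, golds_b):
    """Compare the observed double-Gold rate with the rate expected
    if the two competitions awarded Golds independently.

    golds_a, golds_b: lists of booleans, one entry per wine, in the
    same wine order (True means the wine won Gold at that competition).
    Returns (observed_joint_rate, expected_rate_if_independent).
    """
    if not golds_a or len(golds_a) != len(golds_b):
        raise ValueError("need two non-empty lists of equal length")
    n = len(golds_a)
    p_a = sum(golds_a) / n  # Gold rate at competition A
    p_b = sum(golds_b) / n  # Gold rate at competition B
    p_both = sum(1 for a, b in zip(golds_a, golds_b) if a and b) / n
    return p_both, p_a * p_b


# Hypothetical toy data: ten wines entered in two competitions.
comp_a = [True, True, False, False, False, True, False, False, True, False]
comp_b = [True, False, True, False, False, False, True, False, False, False]

observed, expected = gold_rates(comp_a, comp_b)
print(f"observed double-Gold rate: {observed:.2f}")
print(f"expected if independent:   {expected:.2f}")
```

With only ten made-up wines this proves nothing, of course; a real analysis would need far more data and a proper significance test. Note also that the comparison says nothing about how any judge arrived at a medal, which is exactly the part I am arguing cannot be treated as interchangeable.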

What does this mean for wine ratings?

Wine ratings (mine included!) are all at least partially “bad” science, and are not meaningfully comparable between reviewers, at least not scientifically, because all of them will at least partially perceive the qualia presented in those wines differently in ways that cannot be measured. This is true despite the fact that the scales are often comparable, and despite the fact that the wine world occasionally sports a serious hard-on for comparing ratings from different critics. It is true despite the fact that the wine business does not want it to be true. It is true because if wine ratings cannot be reliably compared in a scientifically meaningful way, then they should NOT be treated as fungible (even if everyone happens to be treating them that way now). They are NOT accurate science.

“So what?” you might ask, “aren’t they all close enough for government work? Don’t lots of wine critics appear to agree at least somewhat on wine ratings for many wines?”

The answer to that is that, while many critics appear to agree, it is NOT close enough for any meaningful comparison, and we cannot safely assume that any two critics reach equivalent scores in exactly the same way (or even objectively). Why? Because we have no way to accurately measure their qualia perceptions, and so we have no way of estimating how close to objective accuracy those ratings are individually, let alone across individual critics. This assumes that there is some objective accuracy to those qualia perceptions (interestingly, The Beginning of Infinity does suggest that there might be objective qualia like beauty – but we are likely many generations away from getting a handle on that scientifically).

What does this mean for everyday imbibers? It means that you shouldn’t assume that different reviewers are rating wines in the same way or even on the same scale, even if those scales appear to be identical superficially.

Accepting any wine review, or even a collection of such reviews on the same wines, as an ultimate determinant of that wine’s present and future enjoyment across individuals is “bad” science, and no better than accepting them on blind faith.

There is a light at the end of this tunnel, though.

If you have found that a reviewer’s ratings and descriptors and – importantly! – their relative perception of the levels of those elements in a given wine at least seem to closely mirror your own, then you may have found someone who likely perceives fine wine qualia similarly to you. Seek out those reviewers and listen to them. Just don’t take their 95, A, 5 puffs, or any other scale to be replaceable by anyone else’s.

Cheers!

    Comments

  • willybuoy


    Wow…fascinating post. I felt like I was in a class that combined wine, philosophy, science, statistics and vocabulary building! The best was getting exposed to "qualia", a fascinating concept I cannot wait to explore. Reading this also reminded me of trying to determine how art is perceived and judged, a topic I have often thought about. This gives me a new framework in which to place my thoughts in that area. Thanks Joe!

    • 1WineDude


      Thanks, Wee Ree San! Hope all is well!

  • Thomas Pellechia


    Joe,

    Here, you veer not into bad science, but maybe into psychology:

    "If you have found that a reviewer’s ratings and descriptors and – importantly! – their relative perception of the levels of those elements in a given wine at least seem to closely mirror your own, then you may have found someone who likely perceives fine wine qualia similarly to you."

    Maybe because the first one or two times with that critic produced agreement on your part, you have fooled yourself into believing most everything thereafter, mainly because you are looking for agreement. One of the things I've noticed as a wine judge (ex-wine judge) is that many judges, including me, have a bent that causes them to seek certain things in a wine, often at the expense of other attributes.

    Bad science extends to the belief that we are in control of what is generally unknown or up for interpretation.

    • 1WineDude


      Thomas – true. But I see some light at the end of that tunnel. If you find yourself in agreement with a critic based on your own preferences, then I suppose you're minimizing the chances of being led vs making your own informed decisions?

  • @michaelamstein


    Very Interesting post Joe. I just discovered your stuff after DLW and have thoroughly enjoyed it thus far.

    I've often wondered, in today's world of mass spectrometry, has there not been any science to identify a combination of stereo isomers and chemical isotopes that make "good wine"? Or moreover a reverse engineering of commonly agreed upon "good wine" to identify its chemical characteristics, and then produce a franken-wine based on that data to see if it rates the same? Time to bring some samples to Abby Sciuto and let Major mass spec take a shot at identifying a 95…

    Hmmm, maybe I need to fire up my grant pen. UC Davis, are you listening?

    • Mark Cochard


      Michael, have you heard of Leo McCloskey and Enologix? http://www.enologix.com/

    • 1WineDude


      @michaelamstein – thanks. I know a robot was developed not too long ago that does something similar in terms of evaluating wine on that level, but I don't think it made the jump to score prediction.

  • Winowill


    Turning wine ratings into scientific conclusions is difficult to rationalize, even when using the term 'bad science'.

    A rating is a number therefore looks scientific, however, if I gave my opinion of a politician a ranking or number does this 'opinion' then have a scientific foundation? If I rank a restaurant or hotel on the basis of Service, Decor, Value, etc. I can explain each element and give it a number, sum them, determine an average to find a Rating – is it Science?

    Composed of individual opinions of a wine's color, its aroma, its texture and staying power, and lack of artificiality I choose to use a number system to put these into perspective. The sum of these separate opinions is still an opinion.

    Your last paragraph has, in my view, the correct view of critic ratings… each wine tasting is independent of the next. My rating of the article as to how it stimulates thought about wine ratings is 94/100, totally unscientific.

    • 1WineDude


      Thanks, Winowill. I suppose the answer to your question is “probably not” but I support your rating of the article anyway :).

  • Charlie Olken


    Joe–

    I wish you had started with your conclusion as your initial thesis because no one has ever said that wine ratings, whether in 20 points or 200, whether in three stars or ten chopsticks, were scientific. It is not different from movie reviews. Each critic has a unique view.

    What wine consumers should do, and I would argue, most do, is to find a voice or voices that they respect and to listen to, not worship, those voices. Wine reviews are not bad science, and that is the major quibble I have here. As I said at the top. They are not science at all, and no one in the wine critic world thinks they are. Indeed, very few of our readers, including your readers, think that ratings are science.

    • Michael


      I agree. I would, however, take this one step further. While I don't think scores are absolute or necessarily comparable from one reviewer to another, that a lot of reviewers like something is meaningful to me.

      Movies are a good analogy here. If I know that a movie has gotten good reviews from pretty much everyone who has reviewed it, it's probably a pretty good movie. The fact that reviewer A really likes it — 5 stars! — is really worthless, unless I tend to like the same sorts of movies as that reviewer does. (Even more so in that, judging by the promos, almost every movie can find some reviewer somewhere willing to say something nice about the movie. "Best buddy cop movie this week!")

      I can't help but think there is a similarity to judging figure skating or gymnastics at the Olympics. That said, two differences stand out. First, the criteria for judging there are specifically delineated — .2 off for this, .2 off for that, etc. Second, in those events typically they have a number of judges, and then they throw out the high and low scores, and average the rest. So no judge's score is absolutely right or wrong, but they are looking for some sort of objective consensus on the performance.

      In my view the same can apply, to some extent, to wine. As you note, one of the things that is missing in wine reviews is an objective, measurable, universal set of standards for grading — 1 point off for this, 1 point off for that, etc. Second, relying on one judge's appraisal is dangerous. But by the same token, if I know that three or four different reviewers like a particular wine, maybe there is some consensus that this is a pretty good bottle.

      I use CellarTracker similarly. A wine that has 2 reviews that are both in the 90s is pretty meaningless (although I might still read the notes to see if they are consistent and might give a hint as to what the wine tastes like). But if there are two dozen notes, all over 90, with generally positive reactions to the wine, then maybe there is a consensus in the community that this is a pretty good wine.

      Consensus, however, is not a scientific principle.

    • 1WineDude


      Papa Olken – my experience has been that the rest of the world treats ratings as having objective value beyond what's reasonable. So while we might not encourage that, it still happens and this is my little way of chipping away at that stone, I suppose.

  • doug wilder


    Joe must have been on a walkabout and then finally came to a conclusion that most people already accept and understand. Stay hydrated, Joe!

    • 1WineDude


      Doug – I wish! :)

  • @WineCharlatan


    You could however, with a large enough sample size, average out all the qualia to get something that would be, or approach, good science. If 100 experienced wine critics, all with their own qualia preferences, rated the same wine, on the same day, under the same conditions (serving temperature, breathing time, etc…). Average out their scores. Even do some other statistical tests to determine the statistical relevance of the results.

    What the wine world needs then is a touring group of 100 qualified raters, all from different commercial publication entities (to avoid certain biases and pay offs), globetrotting en masse. Presto, science! …or close enough. :)

    Note: I have zero scientific basis for using 100 raters. Maybe 35 could be enough or maybe it needs to be 187? I'm not a statistician.

    I like the user ratings on websites like cellartracker.com, when there are often a good number of reviews to be had for a particular wine. It shows average and mean ratings and gross number of ratings. It is however a mishmash of users with varying degrees of tasting/scoring experience, but interesting results nonetheless.

    Also websites like Snooth will show a few ratings from certain acclaimed experts plus mix in user ratings. But total review numbers tend to be a bit low still (10 or under)

    • 1WineDude


      @WineCharlatan ah, but aren't those sites falling prey to the problem in averaging the scores? I agree that these are useful, I'm just saying that they're not as useful as we think they are at first view.

    • MyrddinGwin


      From my understanding of today's post, it's more that qualitative descriptions are not scientific. Describing a wine as smelling like strawberries is qualitative. There's nothing wrong with qualitative descriptions–they can be lovely, and give good descriptions of the experience, but they do honestly suck for measuring things. Qualitative description isn't meant for that.
      Quantitative descriptions, on the other hand, ARE scientific. They actually measure something. Describing a wine as containing 5 ng/l methyl cinnamate, 8 ng/l methyl butyrate, 6 ng/l isobutyl acetate, 4 ng/l ethyl formate, 12 ng/l ethyl butyrate, and 2 ng/l benzyl acetate would be very quantitative. I may have missed the detection threshold completely for every one of those esters, but if you get them in sufficient quantity in the wine, it could be qualitatively described as smelling like strawberries.
      Since few of us are analytical to that level to want to know every ester, acid, phenol, tannin, alcohol, etc., in a particular wine as well as each one's concentration, we tend to find qualitative descriptions more useful. One day, when we perfect wine-tasting robots, we may be able to quantitatively describe wines. Even then, though, we probably would prefer the robot to give us qualitative descriptions after its analysis.
      At this point in history, wine-tasting is still firmly an art, rather than a science. Unless, of course, in the next few years, we develop a ridiculously analytical wine-taster with a particular interest in chemistry and very accurate olfactory/gustatory sensors…

      • 1WineDude


        @MyrddinGwin – well stated.

  • Thomas Pellechia


    Good grief!

    The way we go on one would think that wine is some sort of life force without which humanity would wither.

    Wait a minute: maybe that is the point.

    Joe, I failed to mention earlier that while reading this post I felt like you had channeled Jeff LeFevre.

    • 1WineDude


      Thomas – I take that as a compliment! :)

  • @UCBeau


    I read through the post and at the end thought; "doesn't the conclusion go without saying?". Maybe that was bad science though…

    Joking aside, I think as more people educate themselves and move up the wine-drinking hierarchy, they'll stop viewing critics' scores as scientifically quantifiable numbers, stars, letters, grape clusters, or whatever.

    • 1WineDude


      UC – Let's hope! And I obviously felt it went with saying :).

  • Dan


    Did it really take 10,000 words to say that "Wine ratings are subjective"?

    • 1WineDude


      Dan – apparently Yes without an editor. Just adding further potential cement to the foundation for that view, hopefully.

  • 1WineDude


    Michael – thanks for that fascinating perspective. Part of me is wondering how much confirmation bias comes into play with the crowd-sourced reviews, but if they're all arrived at independently then I think it is meaningful in the way that you've described. Cheers!

  • gabe


    Great post Joe.

    I feel like there is some sort of cognitive dissonance between people's reaction to the topic of wine ratings versus their behavior. I mean, every person on this comment thread has said something negative about wine ratings, but if I told you about a 94-point wine that cost $16, I'm willing to bet the same people would want more details.

    I think the point I am trying to make is that ratings do have value. Just like a recommendation from a sommelier or wine-shop employee, a good rating serves as a recommendation from someone who drinks wine for a living. I agree that the 100-point scale is a far cry from scientific, and I guess that was the point you were trying to make. I just think that if we did away with the 100-point scale, recommendations from wine journalists would still have value. The scale is just the language they use to make their recommendations.

    As one last aside – I recently stopped by a local winery that was pouring a 95-point pinot noir. They said it sold-out the minute it got the score, and they saved a few cases for the tasting room. I stuck my nose in the glass, and the first thing I smelled was VA (volatile acidity). While VA is a minor flaw that occurs in many great wines, it showed me that good-scoring wines aren't always scientifically "good". But would you rather get your wine recommendations from a robot or a human?

    • MyrddinGwin


      Reviews can be helpful, I agree. If you've never tried a wine before, it can be nice to know what the wine is supposed to be like before you open it. Qualitative descriptors in the form of specific aromas, flavours, and mouthfeel really do come in useful for me.
      There can be vague use of qualitative descriptors, though, like writing "Tasty!!" and nothing else. I think I might be willing to argue that a wine score is really a vague qualitative descriptor, as well, since a number above a particular threshold describes about as much as "Tasty!!" does, and a threshold above that means "Really Tasty!!" It's more useful to see the number paired with other qualitative descriptors: sour, velvety cherries –94 says a lot more to me than a 96 pulled from the æther, and all it took were three words.
      I do suspect, though, that not all wine writers are given as long as they'd like to describe the wines. It's probably more like a wine industry trade show, where you have 110 wineries to try, and only 120 minutes to try as many as you can, while hoping desperately to not run into anyone you know or get caught in lines. Writing numbers down can be like a shorthand. When I've been to a trade show, myself, my hopes of writing down intricate notes in shorthand were hopelessly dashed, and I essentially started writing down numbers as quickly as I could taste and spit. The ones that really stood out, I remembered, but the rest kind of were a fuzzy blur.
      As a taster, I do try to stay somewhat objective. That said, there is a style of wine I really don't like, though there are other people who do enjoy that style of wine. By some sort of masochism, because it is part of my school and job, I do try many examples of this style, anyway, and try to learn to tell the good from the bad. Now, that said, though I do try to be fair, my own biases would creep in at some point, and I'd either unfairly rate a wine too low, or over-compensate and rate it too highly. And as a human, my thresholds for detecting particular things vary from other people's, and I'm more or less sensitive to those things as compared to other people. Really, though I do try to stay objective, subjectivity does creep in.
      At this point, wine-tasting is still an art, since it still is subjective. Since it probably always will have the qualitative judgement of good or bad, then it likely will remain an art. Wine-making, since it does have the subjective judgement calls, will likely remain an art, using scientific techniques, rather than as a pure science.
      Also, if robots do develop to a point where they can appreciate beauty, I probably wouldn't mind robots coming to wine tastings, especially if they come recreationally. There's no use turning away potential customers. As long as they don't drive while intoxicated or hurt someone, I'll be open-minded about it.

      • 1WineDude


        MyrddinGwin – Regarding the robots, Skynet is overdue to become self-aware, right? ;-)

        • MyrddinGwin


          I think so, but voluntarily sharing the booze before the robots even think of rampaging probably should count as a mitigating factor in humanity's favour. ;)

    • 1WineDude


      Thanks, Gabe. I'll go with human. For now. :)

  • Dwight Furrow


    Joe,

    Your points about the objectivity of wine ratings are well-taken. But I should point out that the usefulness of qualia (the subjective quality of consciousness) for understanding our experience is contested by many cognitive scientists and philosophers. The dominant position in the field would argue that qualia are reducible to brain states that have a quantitative and thus measurable dimension (although the neurophysiology of the brain is still a very new science). This is not my primary field of expertise so I can't predict with any accuracy how soon (if ever) science will offer an account of the kinds of qualia wine tasters discuss.

    But if you want to get to the bottom of this issue, prepare to climb into an fMRI machine.

    • 1WineDude


      Dwight – no doubt. Someday we will be there, but probably not for a long time!

The Fine Print

This site is licensed under Creative Commons. Content may be used for non-commercial use only; no modifications allowed; attribution required in the form of a statement "originally published by 1WineDude" with a link back to the original posting.

Play nice! Code of Ethics and Privacy.

Contact: joe (at) 1winedude (dot) com
