There’s been a good bit of discussion lately on the Global Interwebs over a recent blog post by the wine-data-focused David Morrison (to which I was alerted by intrepid friend-of-1WD Bob Henry).
In that post, Morrison puts the scores of two of Wine Spectator’s then-critics, both named James (James Laube and James Suckling), through the data-analysis wringer, focusing on scores they gave to wines as part of WS’s “Cabernet Challenge” of 1996.
Generally speaking, Morrison’s blog post, while enviably thorough, can justifiably be criticized as much ado about nothing, considering that no one in their right mind could draw any statistically meaningful conclusions from such a small data set. The summary version is that he found a high level of disagreement in the scores that the two Jameses gave to the same wines. Morrison draws out some interesting suggestions from this finding, though, primarily about the use of numbers when evaluating wine quality; to wit (emphasis is mine):
“The formal explanation for the degree of disagreement is this: the tasters are not using the same scoring scheme to make their assessments, even though they are expressing those assessments using the same scale. This is not just a minor semantic distinction, but is instead a fundamental and important property of anything expressed mathematically. As an example, it means that when two tasters produce a score of 85 it does not necessarily imply that they have a similar opinion about the wine; and if one produces 85 points and the other 90 then they do not necessarily differ in their opinion.”
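For the numerically inclined, Morrison’s point can be illustrated with a toy sketch (entirely hypothetical, not his actual data or analysis): give two critics the same 100-point scale but different internal mappings from perceived quality to a score, and identical numbers stop meaning identical opinions.

```python
# Hypothetical illustration of "same scale, different scoring scheme."
# Perceived quality is an internal 0.0-1.0 judgment; each critic maps
# it onto the same 100-point scale, but differently.

def critic_a_score(quality: float) -> int:
    """Critic A spreads scores across 80-100."""
    return round(80 + 20 * quality)

def critic_b_score(quality: float) -> int:
    """Critic B compresses scores into 88-98."""
    return round(88 + 10 * quality)

def critic_a_quality(score: int) -> float:
    """Invert Critic A's mapping: what opinion does a score imply?"""
    return (score - 80) / 20

def critic_b_quality(score: int) -> float:
    """Invert Critic B's mapping."""
    return (score - 88) / 10

# The same internal opinion (quality 0.5) yields different numbers:
print(critic_a_score(0.5), critic_b_score(0.5))   # 90 vs. 93

# And the same number (90 points) implies different opinions:
print(critic_a_quality(90), critic_b_quality(90)) # 0.5 vs. 0.2
```

So a 90 from one critic and a 90 from another can mask a real disagreement, while an 85 and a 90 can mask agreement, exactly as Morrison describes.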
So… where have we heard that before?
Morrison gets to his point a different way than I did (and by that, I mean not only via data analysis, but also more eloquently and in about one-third as many words), but the point remains the same: specific numeric values are just a sucky way to talk about subjective experiences (something that the medical field has known for a long, long time), and wine criticism will always have large subjective elements baked into it.
Here’s a recap of my version of a similar conclusion (with newly-added emphasis):
“Wine ratings are most often presented via scales that imply scientific precision, however they are measuring something for which we have no scientifically reliable calibration: how people sense (mostly) qualitative aspects of wine. Yes, there may be objective qualities about a wine that can indeed be somewhat calibrated (the presence of faults, for example) but even with these we have varying thresholds of detection between critics. That’s important because it means that the objective (i.e., measurable) quantities of those elements are not perceived the same way by two different reviewers, and so their perception of the levels of those elements cannot reliably be calibrated.
But it’s the subjective stuff that really throws the monkey wrench into the works here. How we perceive those – and measure our enjoyment of them – will likely not be fully explainable in our lifetimes by science. That is because they are what is known as qualia: like happiness, depression, pain, and pleasure, those sensations can be described but cannot effectively be measured across individuals in any meaningful way scientifically.
Yes, we can come to pretty good agreement on a wine’s color, and on the fact that it smells like, say, strawberries. After that, the qualia perception gets pretty tricky, however: my sense of how vibrant that strawberry aroma is might be quite different from yours. Once that factors into how you and I would “rate” that wine’s aroma, we start to diverge, and potentially quite dramatically at that.”
Add to this quagmire the penchant of humans to treat numeric values as fungible (see Morrison article quote above), and you have a recipe for a not-so-great consumer experience when using specific numbers to rate a wine, and then comparing those specific numbers across critics, particularly when those numbers are stripped of their original context (which is, oh, just about every time they are presented…).