Apparently, the hockey stick debate has been finally laid to rest, at least according to WTFIUWT.
This is why I refuse to label some people as “skeptics” – a true skeptic hasn’t already decided and isn’t merely looking for evidence to support their position. They don’t latch onto every new paper before it has even been published, reviewed by the experts and assessed. The true skeptic waits and evaluates all new data before deciding.
BREAKING: New paper makes a hockey sticky wicket of Mann et al 98/99/08
NOTE: this will be the top post at WUWT for a couple of days, see below for new stories – Anthony
Sticky Wicket – phrase, meaning: “A difficult situation”.
Oh, my. There is a new and important study on temperature proxy reconstructions (McShane and Wyner 2010) submitted into the Annals of Applied Statistics and is listed to be published in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to the proxy reconstructions of Mann. It seems watertight on the surface, because instead of trying to attack the proxy data quality issues, they assumed the proxy data was accurate for their purpose, then created a bayesian backcast method. Then, using the proxy data, they demonstrate it fails to reproduce the sharp 20th century uptick.
I haven’t read the entire article, nor do I claim to have the expertise to comprehend its validity and value. Sad to see so many people who are like me but who do feel enabled. They are not skeptics.
I’ll post more response in the coming days, but in the meantime, feel free to post your response to the paper and the ensuing blogospew.
Here’s the conclusion to the new paper:
Research on multi-proxy temperature reconstructions of the earth’s temperature is now entering its second decade. While the literature is large, there has been very little collaboration with university-level, professional statisticians (Wegman et al., 2006; Wegman, 2006). Our paper is an effort to apply some modern statistical methods to these problems. While our results agree with the climate scientists’ findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement.
On the one hand, we conclude unequivocally that the evidence for a “long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data. The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature.
Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run up in temperatures recorded in the 1990s, even in-sample. As can be seen in Figure 15, our estimate of the run up in temperature in the 1990s has a much smaller slope than the actual temperature series. Furthermore, the lower frame of Figure 18 clearly reveals that the proxy model is not at all able to track the high gradient segment. Consequently, the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.
Our main contribution is our efforts to seriously grapple with the uncertainty involved in paleoclimatological reconstructions. Regression of high dimensional time series is always a complex problem with many traps. In our case, the particular challenges include (i) a short sequence of training data, (ii) more predictors than observations, (iii) a very weak signal, and (iv) response and predictor variables which are both strongly autocorrelated. The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.
Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been overconfident in their models. We have shown that time dependence in the temperature series is sufficiently strong to permit complex sequences of random numbers to forecast out-of-sample reasonably well fairly frequently (see, for example, Figure 9). Furthermore, even proxy based models with approximately the same amount of reconstructive skill (Figures 11, 12, and 13), produce strikingly dissimilar historical backcasts: some of these look like hockey sticks but most do not (Figure 14).
Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries. Nonetheless, paleoclimatological reconstructions constitute only one source of evidence in the AGW debate.
Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural proxies. Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.
So, if I understand this correctly, the authors created a model based on proxy records using some new or spiffy statistical methods and did a ‘backcast’ in order to see if the model could accurately reconstruct temperatures of the most recent instrumental period and the MBH98/99 and 08 hockey sticks, either with a flat long blade to 1000 AD or the more wobbly spaghetti graph. They claim that their model failed to do either and that there is far more uncertainty in the proxy records and paleoclimate recons than is acknowledged or understood.
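Point (iv) in the quoted conclusion, strong autocorrelation, is easier to appreciate with numbers. The sketch below is my own illustration, not anything from the paper: it assumes a simple AR(1) process (which the authors say the real data does not follow neatly) and uses the textbook approximation n_eff = n(1 - r)/(1 + r), where r is the lag-1 autocorrelation. The series length, the coefficient 0.9 and all function names are arbitrary choices made for the example.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

def effective_sample_size(x):
    """Approximate number of independent observations, assuming AR(1)."""
    r = lag1_autocorr(x)
    return len(x) * (1.0 - r) / (1.0 + r)

def simulate_ar1(n, phi, rng):
    """Generate x[t] = phi * x[t-1] + white noise (synthetic data only)."""
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(0)
series = simulate_ar1(n=119, phi=0.9, rng=rng)  # illustrative values, not real data
n_eff = effective_sample_size(series)
print(f"nominal n = {len(series)}, lag-1 r = {lag1_autocorr(series):.2f}, "
      f"effective n = {n_eff:.1f}")
```

With autocorrelation this strong, the effective sample size lands well below the nominal length of the series, which is the sense in which the authors worry it "may be just too small".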
McI is off to Italy and so his post on the matter is short and sweet, just posting a link and a brief quote from the paper. He does note that the paper mentions MM… However short his post is, his acolytes have chimed in with quite the giddy glee. For those who do not venture to that place, here I present a tray of goodies for your reading enjoyment:
Halle-bloody-lujah. It’s about time statisticians took a detailed interest in the scientific kludge that is proxy thermometry.
Steve has been the lonely voice in this field for years, has single-handedly carried the fight for honest audits to the very core of the field, and has bravely withstood the resulting vicious opprobrium that has disgraced science.
So, hats off to Blakeley McShane and Abraham Wyner for standing up with Steve and taking up the rescue of scientific integrity. Until now, apart from Edward Wegman, it has been sorely neglected by their colleagues.
Agreed Pat. All I’d add is that Steve’s voice has been made less lonely by the terrific community here at Climate Audit. Never has something as humble as WordPress been used for something so significant. Every constructive critic and online supporter should consider some of the glory from the Annals of Applied Statistics as duly reflected on them today. It’s telling that Steve only knew of this from one of CA’s followers. I salute every one of you, friends.
VINDICATION ! ! ! ! !
This is very interesting. A paper by two heavy duty statisticians, both at top 5 business schools, in a highly respected mainstream peer-reviewed statistical journal. The paper appears a vindication of MM and a complete repudiation of Wahl and Ammann (and to a fair extent of Mann). To my mind, the money quote is:
“…The major difference between our model and those of climate scientists, however, can be seen in the large width of our uncertainty bands. Because they are pathwise and account for the uncertainty in the parameters (as outlined in Section 5.3), they are much larger than those provided by climate scientists. In fact, our uncertainty bands are so wide that they envelop all of the other backcasts in the literature. Given their ample width, it is difficult to say that recent warming is an extraordinary event compared to the last 1,000 years. For example, according to our uncertainty bands, it is possible that it was as warm in the year 1200 AD as it is today. In contrast, the reconstructions produced in Mann et al. (2008) are completely pointwise…”
It will be interesting to see how the team reacts to this, but it is a harsh blow to the hockey stick.
You can almost hear the smear campaign being formulated.
“This wasn’t published in a scientific journal.”
“The authors weren’t scientists, let alone climate scientists.”
They will be subjected to all of the same irrelevant and underhanded attacks as Wegman.
It won’t budge the faithful, but it will make a difference in scientific circles that acknowledge the dependence of scientific inquiry on legitimate statistical analysis.
Whether it will find traction in the popular media is another matter entirely.
They’re actually speculating that people are working to prevent this paper from being published!
> “Any chance of some Team member still “going to town” to keep this from being actually published?”
Forecast calls for 100% chance that the thumb screws are being turned as we speak. The question is will they be successful? This might be a case where being successful might be more damaging to their cause than failing. You don’t just smack down top end talent with a track record without raising some eyebrows. If the paper gets flushed without an actual serious flaw being found, it will definitely pop up somewhere else. And the brew haha will make people even more curious.
They’re pretty giddy right now, but I did enjoy a bit of humor:
Blakeley McShane is from the Kellogg School of Management and is obviously funded by big corn.
LOL! Gotta love a sense of humor, even in a CA acolyte. ;)
Update: (Aug 18, 2010)
A couple of substantive looking comments on the part of Eli and Chris Watkins:
The new McShane and Wyner paper due to appear in Ann. Stats. is clearly going to be much discussed, so I thought I would get in with a few comments, after scanning it briefly.
Let me say first that it is great news that some stats journals are taking a look at climate reconstructions. Unfortunately the first half of this paper is very silly, and the second half is slightly more sensible, and the most plausible reconstruction they produce…..looks rather like the hockey-stick.
In the first half, they take 1200 temperature proxy series (treated as independent variables) and fit them to 119 temperature measurements (keeping overlapping holdout sequences of 30 yearly temperature measurements). Fitting 1200 coefficients to 119 data points is of course hopeless without further assumptions. Instead of doing some form of thoughtful data reduction, they employ the lasso to do the regression directly, with strong sparsity constraints.
They justify their choice of the lasso by saying:
“…the Lasso has been used successfully in a variety of p >> n contexts and because we repeated the analyses in this section using modeling strategies other than the Lasso and obtained the same general results.”
Both parts of this statement are wrong, and the first part is a MORONIC thing for statisticians to say. They give absolutely no reasons to suppose that the Lasso, a method that makes _very_strong_ implicit assumptions about the data, is in any way appropriate for this problem.
The Lasso _is_ appropriate in certain cases where you believe that only a small subset of your variables are relevant. To use it as a substitute for any data reduction with 1200 variables and 119 data points, when _all_ the temperature proxy series are presumed to be relevant to some degree, and all are thought to be noisy, is simply stupid.
Not surprisingly, they find they can’t predict anything at all using the Lasso. (It is a completely inappropriate technique for the problem.)
In the second half of the paper, they do something which is almost sensible, (but less sensible than what the climate modellers do). They take 93 proxy series that go back a thousand years, and do OLS regression on various numbers of principal components of these series. Regressing on just one PC gives more or less Mann’s curve (ironically this is probably the most defensible prediction from all the ones they try)– when they regress on 10, they back-cast historical upward trends. If they were being agnostic statisticians, then I suspect that from the cross-validations they show, the most conservative model they could choose would be a model predicting on one or a very few principal components.
Hey presto, they’ve recovered Mann’s hockey stick as a most plausible estimate. As Garfield would say, Big Hairy Do.
That’s the bulk of the paper. Some but not all of the points they make about over-tight confidence intervals in the previous literature seem valid.
In my opinion they do not introduce any useful new techniques into palaeoclimate reconstruction: their main contribution is to show that using the Lasso with no prior dimension reduction is as useless an idea as any sensible person would expect it to be.
This paper shows, if proof were needed, that it is possible to get ill-considered papers into good peer reviewed journals, especially if they are on hot topics.
So, three points: First, the Lasso approach is useless and shows nothing except that it is useless. Second, doing OLS regression on the first PC produces an MBH hockey sticky looking thing. Oh, and the first half is silly.
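For anyone who wants to see what "OLS regression on the first PC" amounts to mechanically, here is a minimal sketch. It is not the authors' code and uses no real proxy data: the function name `pc_regression` and the toy inputs are my own assumptions, and the toy proxies are built so that a single principal component carries all the signal.

```python
import numpy as np

def pc_regression(X_train, y_train, X_new, k):
    """Regress y on the first k principal components of X, then predict for X_new."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu                        # centre each proxy series
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                             # loadings of the first k PCs
    scores = Xc @ V                          # PC scores over the training period
    design = np.column_stack([np.ones(len(y_train)), scores])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)  # ordinary least squares
    new_scores = (X_new - mu) @ V            # project new data onto the same PCs
    return np.column_stack([np.ones(len(X_new)), new_scores]) @ coef

# Toy data (hypothetical): three "proxies" that are exact multiples of one signal.
s = np.linspace(-1.0, 1.0, 20)
w = np.array([1.0, -2.0, 0.5])
X = np.outer(s, w)
y = 2.0 * s + 1.0

s_new = np.array([0.0, 0.3])
pred = pc_regression(X, y, np.outer(s_new, w), k=1)
print(pred)
```

Because the toy proxies share a single underlying signal, regressing on one PC recovers 2*s + 1 up to rounding. Real proxies are far noisier, so the choice of k is where the judgment (and the argument) lies.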
Here is a further elucidation of his comment:
What the Lasso does, intuitively put, is to force quite a lot of the regression coefficients to be zero. This means that if you have (as in this case) about 120 data points, and you have a much larger number of proxy series (over a thousand), then _if_ you believe that there is a linear combination of just a small number of the proxy series that can correctly fit the data, then a Lasso is a good technique to try. In other words, use it if you believe that there may be a good prediction rule based on a small number of the series.
Now, is this reasonable for climate proxies? Well … perhaps … but common sense might indicate not. After all, all the proxy series are thought to be noisy, and it would seem reasonable (if they can be used at all to predict temperatures) that you would want a rule that combined a lot of them so as to average out errors in any individual series.
Hence, for this type of data, it does not seem plausible that taking over 1000 series and selecting a few of them to fit the (short) temperature series is going to produce a good predictor.
Well, this paper seems to show that, indeed, using the Lasso doesn’t work well for this problem.
That’s a valuable thing to show. It’s reasonable to try the Lasso — but I don’t think we should be surprised that it didn’t work here. Other methods which make different assumptions about the data might work much better.
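The intuition in that comment can be tried out on made-up data. The sketch below is only an illustration under loud assumptions: the "temperature" is white noise, every synthetic proxy is that signal plus heavy independent noise, the sizes (119 years, 300 proxies, a 30-year hold-out) are chosen for speed, and the Lasso is a hand-written coordinate descent rather than any library's implementation. None of it reproduces the paper's analysis.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values inside [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=50):
    """Minimise (1/2n)*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()           # equals y - X @ beta while beta is zero
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * (new_bj - beta[j])
            beta[j] = new_bj
    return beta

rng = np.random.default_rng(1)
n_years, n_proxies, holdout = 119, 300, 30   # hypothetical sizes
temp = rng.standard_normal(n_years)          # synthetic "temperature"
proxies = temp[:, None] + 3.0 * rng.standard_normal((n_years, n_proxies))

train = slice(0, n_years - holdout)
test = slice(n_years - holdout, n_years)
mu, sd = proxies[train].mean(axis=0), proxies[train].std(axis=0)
Z = (proxies - mu) / sd                      # standardise using training years only
y_mean = temp[train].mean()
y_train = temp[train] - y_mean

# Smallest penalty at which the Lasso sets every coefficient to zero.
alpha_max = np.max(np.abs(Z[train].T @ y_train)) / (n_years - holdout)
beta = lasso_cd(Z[train], y_train, alpha=0.5 * alpha_max)
lasso_pred = Z[test] @ beta + y_mean

# Alternative: average all proxies, then fit one slope and intercept.
composite = Z.mean(axis=1)
slope, intercept = np.polyfit(composite[train], temp[train], 1)
composite_pred = slope * composite[test] + intercept

n_selected = int(np.count_nonzero(beta))
r_lasso = np.corrcoef(lasso_pred, temp[test])[0, 1]
r_composite = np.corrcoef(composite_pred, temp[test])[0, 1]
print(f"Lasso kept {n_selected} of {n_proxies} proxies, hold-out r = {r_lasso:.2f}")
print(f"Composite of all proxies, hold-out r = {r_composite:.2f}")
```

On synthetic data like this, where every proxy carries the same weak signal, the plain average typically tracks the hold-out years more closely than the few series the Lasso keeps. That is the comment's point in miniature; whether real proxies behave this way is exactly what is in dispute.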
Here’s Eli via Martin Vermeer:
My explanation at link for why the M&W reconstruction is erroneous, was a little too simple. It’s the Wabett who gets it completely right: the fundamental error is calibration only against a hemispheric average, when local data – the 5×5 degree grid cells of instrumental data as used by Mann et al. –provide a so much richer source of variability — i.e., signal — to calibrate against.
It is this poor signal/noise ratio that helps the calibration pick up spurious low vs. high latitude temp difference “signal”, which in the reconstruction interacts with the Earth axis tilt change effect.
What stands is the observation that doing the calibration against the instrumental PC1 (instead of hemispheric average) will give you back pretty exactly the genuine Mann stick(TM) even in spite of this.
And here is some fellow called William M. Briggs — Statistician to the Stars! who provides a play-by-play analysis of the paper:
This is just as beautiful as a shorthanded goal. It means we cannot tell, using these data and this model, with any reasonable certainty if temperatures have changed plus or minus 1°C over the last thousand years.
McShane and Wyner don’t skate off the ice error free. They suggest, but only half-heartedly, that “the proxy signal can be enhanced by smoothing various time series before modeling.” Smoothing data before using it as input to a model is a capital no-no (see this, this, and this).
Finally, we have our Amen, the touch of Grace, the last element of a Gordie Howe hat trick, and worthy of an octopus tossed onto the ice:
Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been overconfident in their models.
So that’s just a sampling of discussion on the paper.