Apparently, the hockey stick debate has finally been laid to rest, at least according to WTFIUWT.
This is why I refuse to label some people as “skeptics” – a true skeptic hasn’t already made up their mind and gone looking only for evidence to support that position. They don’t latch onto every new paper before it has even been published, reviewed by the experts, and assessed. The true skeptic waits and evaluates all the new data before deciding.
From WTFIUWT:
BREAKING: New paper makes a hockey sticky wicket of Mann et al 98/99/08
Posted on August 14, 2010 by Anthony Watts
NOTE: this will be the top post at WUWT for a couple of days, see below for new stories – Anthony
Sticky Wicket – phrase, meaning: “A difficult situation”.
Oh, my. There is a new and important study on temperature proxy reconstructions (McShane and Wyner 2010), submitted to the Annals of Applied Statistics and listed for publication in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to the proxy reconstructions of Mann. It seems watertight on the surface because, instead of trying to attack the proxy data quality issues, they assumed the proxy data was accurate for their purpose, then created a Bayesian backcast method. Then, using the proxy data, they demonstrate that it fails to reproduce the sharp 20th century uptick.
I haven’t read the entire article, nor do I claim to have the expertise to judge its validity and value. Sad to see so many people who are in the same boat but who nevertheless feel enabled to pronounce on it. They are not skeptics.
I’ll post more response in the coming days, but in the meantime, feel free to post your response to the paper and the ensuing blogospew.
Update:
Here’s the conclusion to the new paper:
Research on multi-proxy temperature reconstructions of the earth’s temperature is now entering its second decade. While the literature is large, there has been very little collaboration with university-level, professional statisticians (Wegman et al., 2006; Wegman, 2006). Our paper is an effort to apply some modern statistical methods to these problems. While our results agree with the climate scientists’ findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement.
On the one hand, we conclude unequivocally that the evidence for a “long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data. The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature.
Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run up in temperatures recorded in the 1990s, even in-sample. As can be seen in Figure 15, our estimate of the run up in temperature in the 1990s has a much smaller slope than the actual temperature series. Furthermore, the lower frame of Figure 18 clearly reveals that the proxy model is not at all able to track the high gradient segment. Consequently, the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.
Our main contribution is our efforts to seriously grapple with the uncertainty involved in paleoclimatological reconstructions. Regression of high dimensional time series is always a complex problem with many traps. In our case, the particular challenges include (i) a short sequence of training data, (ii) more predictors than observations, (iii) a very weak signal, and (iv) response and predictor variables which are both strongly autocorrelated. The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process, it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.
Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been overconfident in their models. We have shown that time dependence in the temperature series is sufficiently strong to permit complex sequences of random numbers to forecast out-of-sample reasonably well fairly frequently (see, for example, Figure 9). Furthermore, even proxy-based models with approximately the same amount of reconstructive skill (Figures 11, 12, and 13) produce strikingly dissimilar historical backcasts: some of these look like hockey sticks but most do not (Figure 14).
Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries. Nonetheless, paleoclimatological reconstructions constitute only one source of evidence in the AGW debate.
Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural proxies. Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.
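Since I’m trying to educate myself here, let me unpack that effective-sample-size point. For an AR(1)-type series there is a textbook rule of thumb, n_eff ≈ n(1 − ρ)/(1 + ρ), where ρ is the lag-1 autocorrelation. The sketch below is mine, not the paper’s, and the numbers are invented purely for illustration:

```python
import numpy as np

def effective_sample_size(series):
    """Rough effective sample size for an AR(1)-like series:
    n_eff ~= n * (1 - rho) / (1 + rho), rho = lag-1 autocorrelation."""
    x = series - series.mean()
    rho = np.dot(x[:-1], x[1:]) / np.dot(x, x)
    return len(series) * (1 - rho) / (1 + rho)

rng = np.random.default_rng(0)
n, rho_true = 119, 0.8  # roughly the length of the instrumental period
x = np.zeros(n)
for t in range(1, n):   # simulate a strongly autocorrelated "temperature"
    x[t] = rho_true * x[t - 1] + rng.normal()
print(f"nominal n = {n}, effective n ~= {effective_sample_size(x):.0f}")
```

With ρ around 0.8, 119 annual values carry the information of only a dozen or so independent ones, which is the paper’s “effective sample size may be just too small” point in miniature.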
So, if I understand this correctly, the authors created a model based on proxy records using some new or spiffy statistical methods and did a ‘backcast’ in order to see if the model could accurately reconstruct temperatures of the most recent instrumental period and the MBH98/99 and 08 hockey sticks, whether with the long flat handle extending back to 1000 AD or the wobblier spaghetti graph. They claim that their model failed to do either, and that there is far more uncertainty in the proxy records and paleoclimate recons than is acknowledged or understood.
McI is off to Italy, so his post on the matter is short and sweet: just a link and a brief quote from the paper. He does note that the paper mentions MM… However short his post, his acolytes have chimed in with quite the giddy glee. For those who do not venture to that place, here I present a tray of goodies for your reading enjoyment:
Pat Frank
Halle-bloody-lujah. It’s about time statisticians took a detailed interest in the scientific kludge that is proxy thermometry.
Steve has been the lonely voice in this field for years, has single-handedly carried the fight for honest audits to the very core of the field, and has bravely withstood the resulting vicious opprobrium that has disgraced science.
So, hats off to Blakeley McShane and Abraham Wyner for standing up with Steve and taking up the rescue of scientific integrity. Until now, apart from Edward Wegman, it has been sorely neglected by their colleagues.
Richard Drake:
Agreed Pat. All I’d add is that Steve’s voice has been made less lonely by the terrific community here at Climate Audit. Never has something as humble as WordPress been used for something so significant. Every constructive critic and online supporter should consider some of the glory from the Annals of Applied Statistics as duly reflected on them today. It’s telling that Steve only knew of this from one of CA’s followers. I salute every one of you, friends.
Stephen Richards:
VINDICATION ! ! ! ! !
SOI:
This is very interesting. A paper by two heavy-duty statisticians, both at top-5 business schools, in a highly respected mainstream peer-reviewed statistical journal. The paper appears to be a vindication of MM and a complete repudiation of Wahl and Ammann (and to a fair extent of Mann). To my mind, the money quote is:
“…The major difference between our model and those of climate scientists, however, can be seen in the large width of our uncertainty bands. Because they are pathwise and account for the uncertainty in the parameters (as outlined in Section 5.3), they are much larger than those provided by climate scientists. In fact, our uncertainty bands are so wide that they envelop all of the other backcasts in the literature. Given their ample width, it is difficult to say that recent warming is an extraordinary event compared to the last 1,000 years. For example, according to our uncertainty bands, it is possible that it was as warm in the year 1200 AD as it is today. In contrast, the reconstructions produced in Mann et al. (2008) are completely pointwise…”
It will be interesting to see how the team reacts to this, but it is a harsh blow to the hockey stick.
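SOI’s money quote turns on the pathwise vs. pointwise distinction, which is a real one and easy to illustrate. Here’s a toy simulation of my own (random walks standing in for posterior draws; nothing here comes from the paper’s actual model):

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_times = 10000, 100
paths = np.cumsum(rng.normal(size=(n_paths, n_times)), axis=1)  # fake posterior draws

# Pointwise 95% band: quantiles computed at each time step separately.
lo, hi = np.quantile(paths, [0.025, 0.975], axis=0)
whole_path_coverage = np.mean(np.all((paths >= lo) & (paths <= hi), axis=1))
print(f"whole trajectories inside the pointwise band: {whole_path_coverage:.0%}")
# Well under 95%: a band that covers each year 95% of the time covers entire
# histories much less often, hence pathwise bands must be wider.
```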
Lance:
You can almost hear the smear campaign being formulated.
“This wasn’t published in a scientific journal.”
“The authors weren’t scientists, let alone climate scientists.”
They will be subjected to all of the same irrelevant and underhanded attacks as Wegman.
It won’t budge the faithful, but it will make a difference in scientific circles that acknowledge the dependence of scientific inquiry on legitimate statistical analysis.
Whether it will find traction in the popular media is another matter entirely.
They’re actually speculating that people are working to prevent this paper from being published!
JV:
> Any chance of some Team member still “going to town” to keep this from being actually published?
Forecast calls for a 100% chance that the thumbscrews are being turned as we speak. The question is: will they be successful? This might be a case where succeeding is more damaging to their cause than failing. You don’t just smack down top-end talent with a track record without raising some eyebrows. If the paper gets flushed without an actual serious flaw being found, it will definitely pop up somewhere else. And the brouhaha will make people even more curious.
They’re pretty giddy right now, but I did enjoy a bit of humor:
GrantB:
Blakeley McShane is from the Kellogg School of Management and is obviously funded by big corn.
LOL! Gotta love a sense of humor, even in a CA acolyte. 😉
Update: (Aug 18, 2010)
A couple of substantive-looking comments, from Eli and Chris Watkins:
The new McShane and Wyner paper due to appear in Ann. Stats. is clearly going to be much discussed, so I thought I would get in with a few comments, after scanning it briefly.
Let me say first that it is great news that some stats journals are taking a look at climate reconstructions. Unfortunately the first half of this paper is very silly, the second half is slightly more sensible, and the most plausible reconstruction they produce… looks rather like the hockey stick.
In the first half, they take 1200 temperature proxy series (treated as independent variables) and fit them to 119 temperature measurements (keeping overlapping holdout sequences of 30 yearly temperature measurements). Fitting 1200 coefficients to 119 data points is of course hopeless without further assumptions. Instead of doing some form of thoughtful data reduction, they apply the Lasso to the regression directly, with strong sparsity constraints.
They justify their choice of the lasso by saying:
“…the Lasso has been used successfully in a variety of p >> n contexts and because we repeated the analyses in this section using modeling strategies other than the Lasso and obtained the same general results.” Both parts of this statement are wrong, and the first part is a MORONIC thing for statisticians to say. They give absolutely no reasons to suppose that the Lasso — a method that makes _very strong_ implicit assumptions about the data — is in any way appropriate for this problem. The Lasso _is_ appropriate in certain cases where you believe that only a small subset of your variables are relevant. To use it as a substitute for any data reduction with 1200 variables and 119 data points, when _all_ the temperature proxy series are presumed to be relevant to some degree, and all are thought to be noisy, is simply stupid.
Not surprisingly, they find they can’t predict anything at all using the Lasso. (It is a completely inappropriate technique for the problem.)
In the second half of the paper, they do something which is almost sensible (but less sensible than what the climate modellers do). They take 93 proxy series that go back a thousand years and do OLS regression on various numbers of principal components of these series. Regressing on just one PC gives more or less Mann’s curve (ironically, this is probably the most defensible prediction of all the ones they try); when they regress on 10, they back-cast historical upward trends. If they were being agnostic statisticians, then I suspect that, from the cross-validations they show, the most conservative model they could choose would be one predicting on one or a very few principal components.
Hey presto, they’ve recovered Mann’s hockey stick as a most plausible estimate. As Garfield would say, Big Hairy Do.
That’s the bulk of the paper. Some but not all of the points they make about over-tight confidence intervals in the previous literature seem valid.
In my opinion they do not introduce any useful new techniques into palaeoclimate reconstruction: their main contribution is to show that using the Lasso with no prior dimension reduction is as useless an idea as any sensible person would expect it to be.
This paper shows, if proof were needed, that it is possible to get ill-considered papers into good peer reviewed journals, especially if they are on hot topics.
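Before going on, here’s what that p >> n trap looks like in code. This is my own toy illustration using scikit-learn’s Lasso, with pure-noise “proxies” and made-up sizes matching the ones described above; it is not M&W’s code:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
n_years, n_proxies = 119, 1200              # sizes as described above
X = rng.normal(size=(n_years, n_proxies))   # pure-noise "proxies"
y = np.cumsum(rng.normal(size=n_years))     # an autocorrelated "temperature"

model = Lasso(alpha=0.1).fit(X, y)          # L1 penalty -> strong sparsity
kept = int(np.sum(model.coef_ != 0))
print(f"in-sample R^2 = {model.score(X, y):.2f}, proxies kept = {kept}")
# Pure noise can fit well in-sample when p >> n; the question is always
# out-of-sample (holdout) skill, which is what the argument is about.
```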
So, three points: First, the Lasso approach is useless and shows nothing except that it is useless. Second, doing OLS regression on the first PC produces an MBH hockey-sticky-looking thing. Oh, and the first half is silly.
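And here is the second point in miniature: OLS on the first k principal components. Again a sketch of my own on invented data, not the authors’ actual pipeline:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_years, n_proxies = 119, 93     # the 93 long proxies mentioned above
proxies = rng.normal(size=(n_years, n_proxies))
temp = np.cumsum(rng.normal(size=n_years))

for k in (1, 10):                # regress temperature on the first k PCs
    pcs = PCA(n_components=k).fit_transform(proxies)
    r2 = LinearRegression().fit(pcs, temp).score(pcs, temp)
    print(f"k = {k:2d} PCs: in-sample R^2 = {r2:.2f}")
# More PCs always fit better in-sample; whether k = 10 holds up
# out-of-sample is exactly what is being disputed here.
```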
Here is a further elucidation of his comment:
What the Lasso does — intuitively put — is force quite a lot of the regression coefficients to be zero. This means that if you have (as in this case) about 120 data points, and you have a much larger number of proxy series (over a thousand), then _if_ you believe that there is a linear combination of just a small number of the proxy series that can correctly fit the data, then the Lasso is a good technique to try. In other words, use it if you believe that there may be a good prediction rule based on a small number of the series.
Now, is this reasonable for climate proxies? Well … perhaps … but common sense might indicate not. After all, all the proxy series are thought to be noisy, and it would seem reasonable (if they can be used at all to predict temperatures) that you would want a rule that combined a lot of them so as to average out errors in any individual series.
Hence, for this type of data, it does not seem plausible that taking over 1000 series and selecting a few of them to fit the (short) temperature series is going to produce a good predictor.
Well, this paper seems to show that, indeed, using the Lasso doesn’t work well for this problem.
That’s a valuable thing to show. It’s reasonable to try the Lasso — but I don’t think we should be surprised that it didn’t work here. Other methods which make different assumptions about the data might work much better.
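His averaging point is easy to demonstrate with a toy example (mine; the noise level is made up). Every “proxy” below is the same weak signal plus independent noise, and a plain average of all of them beats picking a handful:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 119, 1000
signal = np.cumsum(rng.normal(size=n))                       # "true temperature"
proxies = signal[:, None] + 5.0 * rng.normal(size=(n, p))    # all weak and noisy

few = proxies[:, :5].mean(axis=1)    # "select a few series" strategy
many = proxies.mean(axis=1)          # "combine them all" strategy
for name, est in (("5 proxies", few), ("all 1000", many)):
    rmse = np.sqrt(np.mean((est - signal) ** 2))
    print(f"{name:>10}: RMSE vs. true signal = {rmse:.2f}")
# Averaging p independent noise terms shrinks the noise by ~sqrt(p), which
# is why a rule combining many noisy series beats a sparse selection here.
```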
Here’s Eli via Martin Vermeer:
My explanation at link for why the M&W reconstruction is erroneous was a little too simple. It’s the Wabett who gets it completely right: the fundamental error is calibrating only against a hemispheric average, when local data — the 5×5 degree grid cells of instrumental data as used by Mann et al. — provide a much richer source of variability — i.e., signal — to calibrate against.
It is this poor signal-to-noise ratio that helps the calibration pick up a spurious low- vs. high-latitude temp difference “signal”, which in the reconstruction interacts with the effect of the change in the Earth’s axial tilt.
What stands is the observation that doing the calibration against the instrumental PC1 (instead of the hemispheric average) will give you back pretty much exactly the genuine Mann stick(TM), even in spite of this.
Congrats Eli!
And here is some fellow called William M. Briggs — Statistician to the Stars! — who provides a play-by-play analysis of the paper:
This is just as beautiful as a shorthanded goal. It means we cannot tell—using these data and this model—with any reasonable certainty if temperatures have changed plus or minus 1°C over the last thousand years.
McShane and Wyner don’t skate off the ice error free. They suggest, but only half-heartedly, that “the proxy signal can be enhanced by smoothing various time series before modeling.” Smoothing data before using it as input to a model is a capital no-no (see this, this, and this).
Finally, we have our Amen, the touch of Grace, the last element of a Gordie Howe hat trick, and worthy of an octopus tossed onto the ice:
Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been overconfident in their models.
So that’s just a sampling of discussion on the paper.
Gave it a quick read-through, and find the part about calibration/validation the weakest; I think it’s wrong. They use a 30-year validation block and move it through the instrumental period — so mostly it will be bookended between two pieces of calibration block. Most of the time you’re interpolating! And no year in the validation block is more than 15 years from the edge (Mann: 50 years), so temporal autocorrelation becomes a bigger issue. The Mann approach is much cleaner, and tells us what we’re interested in: how well can we infer temps at a distance from the calibration range.
About the Bayesian thingy, yes, that looks interesting… actually the result is not so very different from the Mann curve, which is pointed out. BTW, the differences between the curves in Figure 14 look suspiciously like the signature of the Earth’s axis tilt change over time, cf. Kaufman et al. I suspect this means something…
Remember when looking at the late 20th C issue that this reconstruction uses only the 93 proxies that extend back to 1000 AD (i.e., a “frozen” recon). Compare with Mann et al.’s Figure 2c, the light green curve. Yes, it goes outside the 95% bounds at the end… but then look at the other curves, esp. the orange one from 1500 onward, based on many more proxies. I would say it’s a fluke.
The good news is that this isn’t a “stupid” paper. I’m finding new things all the time re-reading it. But any paper citing the Wegman report as if it were serious science makes my tooth ache.
Back to reading and thinking…
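That interpolation point about the sliding 30-year block can be made concrete with a few lines of arithmetic. This little sketch is my own (the window lengths are the ones under discussion; everything else is assumption):

```python
# Worst-case distance from a held-out year to the nearest calibration year:
# (a) a 30-yr block slid through the interior of a 119-yr record,
# (b) a 50-yr block at the end (the scheme attributed to Mann above).
def worst_gap(n_years, block_len, start):
    held_out = set(range(start, start + block_len))
    calib = [y for y in range(n_years) if y not in held_out]
    return max(min(abs(y - c) for c in calib) for y in held_out)

n = 119
interior = max(worst_gap(n, 30, s) for s in range(1, n - 30))
end_block = worst_gap(n, 50, n - 50)
print(f"30-yr interior blocks: worst year is {interior} yrs from calibration")
print(f"50-yr end block:       worst year is {end_block} yrs from calibration")
# ~15 vs. 50 years: the sliding interior block mostly tests interpolation,
# while the end block tests real extrapolation away from the calibration data.
```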
Took a quick read through it. My thoughts on the paper (sorry, not synthesized). A little bit loosey-goosey, as I’m not a technical expert in the field.
1. Overall: The paper can be thought of as 3 parts: background on climate recons; correlation tests of proxies and pseudoproxies against instrumental temperature (and hence implications for their predictive ability further out); and their own hindcast. Of these, the middle part is most important and the last part is a nice touch. The beginning part is a little flawed, but not central to the contributions.
2. Background: There’s something sad in that, even as a layperson following this stuff on the blogs and scanning occasional papers, I have a better feel for the history of the disputes and the overall field than the paper’s authors. There are a few mistakes in describing Mike’s methods (one PC). Also in the history of the disputes. Finally, the disputes and Mike’s methods are both intricate… so it’s not surprising that an outsider, even one doing good work, does not follow them all. Yet it would make me feel more confident about their work itself if they had things like this “buttoned up”, which is really just attention to detail.
3. Tone of background: In addition, there are things that are phrased in a slightly non-neutral manner (e.g. “alarming”). Even for tactical reasons (let alone fairness, inviting subjects of dispute to view the rest of the paper), it would be better to have been more bland here. This does not affect the stats work and I hope it does not become a distraction.
4. Extraneous background issues: There are some things, like the McI full-acf, that are mentioned and implicitly defended, but which are not central to the work, and I suspect the criticisms of such are not fully understood by the writers (iow, that they are overly similar to signal-containing proxies). In particular, this criticism of the McI work was not (at least from me) a critique that there is no “mining” within Mike’s methods, but that McI exaggerates the effect and the cause. This is similar to the (very good) Huybers comment where he shows McI confounding two separate issues (skew PCA and standard deviation normalization): Huybers actually agrees (I’ve corresponded with him) that skew PCA has a hockey stick selection bias, just that McI exaggerates the amount by mixing in a different issue.
5. Happy that they wrote a paper: I’ve long felt that what is being argued by McI is essentially an issue of data-method space. Iow, certain methods (mathematical functions) have certain behaviors on certain shapes (mathematical data-time relations). McI has always resisted doing real exploration of this “space” to show where it has issues and where it does not, etc., just to catalog the landscape. Instead McI has gone with one-offs, blog posts, gotchas, dotcom stocks and the like. In contrast, McShayne tries to survey the landscape.
6. Burger work ignored: Burger and Cubasch did nice examinations of method variability of recons, in Tellus and GRL. Also, Burger did an examination of different training periods (very similar to McShayne’s work) in COTPD. But neither paper is cited or discussed.
7. Nice touch to look at within-instrument prediction: It’s well explained and intuitively apt to see how well different calibration/verification schemes function. IOW, just to test within the instrumental temp regime: “are the proxies proxies?” I’ve long had a concern that the degrees of freedom in the instrument period are not that high and there is a danger of just fitting two “upgoing” trends to each other. Zorita has also described this concern more mathematically. Just looking at individual proxies, you don’t see a quick match of temp and most proxies within the wiggles. In contrast, something like Kim Cobb’s coral shows just a visual correspondence, including at the annual scale, to ups and downs, at least of local temp.
8. Issue of global temp versus local: It’s a real concern to have the proxies calibrated to, or predictive of, global as opposed to local temp (what Mike usually does and what was done more in this paper as well). We know that regional variations can occur irrespective of global temps. Thus any physical relationship to local temp is likely to be even more tenuous to global temp. Not zero. But less. In particular, I suspect you will lose a lot of the smaller time-scale wiggles and the ability to wiggle-match… which sort of interacts with point 7 to reduce your ability to do reliable proxy matching, and basically makes you tend to pick time series that generally go up to match an instrumental period that generally goes up. But that’s a single degree of freedom. Gives less confidence of a real relationship that will persist out of sample, versus a “wiggle match” with several degrees of freedom.
9. Generalizability: The McShayne paper would seem to have most relevance to Mann’s work. Other recons have used other methods. Even Mann uses different methods at different times. While the general issue of calibrating proxies against instruments to predict back may be more general than just Mike, I’m not sure how generalizable McShayne’s work is. Also, perhaps the paper is a little overglorified if it is criticizing one person’s work but saying it is correcting the field’s work. (I don’t even know, really – just a concern here.)
10. Approve of the recon being done: I think doing a recon and giving a best estimate, but showing the wider error bars, is really the best way to challenge one’s self to think about the methods and how much we can/do know. This is an improvement over McI’s critiques. It’s not that I want to stop McI from finding errors, or even to transform him into a working scientist. It’s just that, honestly, this exercise would concentrate the thinking on the subject, even in a fault-finding mode. Capisce?
11. Detail of alternatives: In several places, McShayne talks about alternate methods or choices, but says they do “not affect the conclusions qualitatively”. I think actually showing these would be helpful: if possible in summarized form in the paper; if not, in the SI. Also, what are “the conclusions” (can he number them)? And a bit more color would be helpful and fair, even if he can’t show all the details of every choice. For instance, instead of “not affecting the conclusions qualitatively”, some comment on the amount (or at least direction) by which “the conclusions” are weakened or strengthened.
12. I’m not a statistician and haven’t read the paper that closely, so I can’t really engage on the details. However, even if there are issues with the paper, I think it’s a useful step toward starting to catalog the data-method parameter “space”.
13. One tiny trial proxy suggestion: I think an interesting proxy to try would be just a few instrumental records. How well do they work at predicting global temp (if calibrated to global temp)? Capisce?
14. McI comment: Annals of Applied Stats was founded in 2007. I think “high-ranked journal within statistics” is a bit of hyperbole. But I really don’t care. Specialty journals are a good place to really get into things regardless.
It hasn’t been replicated yet, has it? Until then, I believe nobody gets to say anything either way, so all the proclamations of the death of the hockey stick seem a tad hurried, even frenzied, where I see it waved around as if it were a smoking gun.
I believe both the proclamations of death and the defenses against them, from both sides over the years, have been non-thoughtful. I’m not saying this out of wanting to be a peacemaker or to split the difference. It’s my honest assessment. I think the way to think about these issues is in terms of data-method space. People like Zorita, Huybers, and Burger are going after the thing right. That’s not to say any of those 3 understand it all or are correct in all their papers, but they are really trying to understand the space. Vice kill the stick with gotchaz (McI, Watts) or prop it up with a bunch of doesn’t-matters or with a lack of interest in real methodology discussions (Mike… he spends WAY too much time in Science/Nature/PNAS on the recon itself, way too LITTLE time in real methods journals talking about the methods themselves, with an almost disinterested stance on what the answer is, even what the field is).
I did a first read of the McShane & Wyner paper. I am not a climatologist or a statistician.
I think everyone agrees there are large uncertainties when “backcasting” global temps a thousand years. M&W take issue with how proxies are calibrated with the instrumental record. If you delete a block of data you can try to “predict” it by using the remaining temp data (and its derivative) or using your proxies. You’d like the proxies to do better than interpolation. What M&W call pseudo-proxies or “fake” data is really a form of interpolation. They aren’t really using pure noise. Climatologists, according to M&W, block out a 50-year window in the middle of the instrumental record. M&W block out various 30-year windows. With the shorter window, interpolation does as well as the proxies, they claim, and therefore the proxies are of little use. It makes sense to me that interpolation would do better with a shorter window length. So I don’t see the paper as punching that big a hole in the “hockey stick” constructions. I look forward to seeing what people who know more about this than I do have to say.
Thought some more, and realized that the fatal flaw in their reconstruction is that they calibrate against a single global-mean instrumental temp curve.
Look at how Mann does it: he reconstructs the temperature in each grid cell individually (using 5×5 grid cells of instrumental values). Only then does he construct the global average, by averaging over these cells. What these authors use for calibration is a single global average; all the detailed information (which is useful for calibration, and which Mann uses!) is thrown away. The most important part of this info is, I think, the latitude-dependent temp variation patterns.
It doesn’t matter then if you use one or ten PCs; you’re not gonna find this information. The authors also try using, instead of the average, the first PC of the instrumental record. Probably better at getting the essential info out, and it is very, very interesting to note that this (the blue solution in Figure 14) is like two peas in a pod with Mann’s EIV NH land curve.
One wonders how these authors justify choosing PC10 as their standard solution. With local PC1 + PC10 — which also according to their whisker plot is clearly better — the curve pivots down 0.3C, and I bet the significance levels of 1998 and the last 30 years relative to the last millennium would have gone through the roof. This is one question I really would like to hear an answer to (or a leaked email would do, sorry, can’t resist).
Mike: climatologists, specifically Mann, according to the paper (and in reality), do not block out 50 years in the middle of the instrumental record, but at the start or the end. An important difference!
I’m on shaky ground, since I’m just a blog reader, not a scientist, but my impression was that in essence MBH98/99 use global (or hemispheric) instrument temps for training in the regressions. Also that the EIV in M08 used global (there’s discussion of varying the analysis to use local, but the impression I have is that the figures shown are back to global/hemispheric).
I talked about the issue with fitting to global, not preserving regional variation for training, in 8 above.
I’m straining the NE Audit memory cells, but I think Burger tried an approach of looking at a lot of missing years, maybe even noncontiguous ones, to think about how well the proxies really predict. Something that gets us past the degree or two or three of freedom implicit in matching a placid first 50 and a rising last 50.
Sorry for abusing the authors’ names. McShane and Wyner.
Also am informed that the journal is a split from a longer more established stats journal.
Finally, when I say “sad” about me following the kerfuffle more than the authors, I didn’t mean to imply they were “sad” for not following it more, but that it’s sad that the history is so intricate and that I followed so much of it!
TCO, I feel the same way. I’ve only been following the hockey stick debates since 2007 and am nearly burnt out thinking that we’re still talking about it and that people believe so much is riding on it. How must others feel who’ve been at this longer?
I just wish my stats were up to snuff, but I’m just a policy wonk, not a stats geek.
TCO,
If I recall Burger and Cubasch (I assume that is what you are referring to), they looked at the various choices which could have been made in the MBH 98/99 analysis using synthetic proxies. I believe the takeaway was that the method MBH chose tended to underestimate variability on both decadal and centennial scales. This seems a reasonable conclusion, one Mann himself reached.
MBH used the late calibration (as far as I can recall) and used the rather “placid” early period for verification.
One real weakness of the B-school boys, besides approaching this with no real knowledge of the field, is that they did not benchmark their new technique against synthetic proxy data derived from GCM runs. This has become a rather common practice amongst people in the paleo community when evaluating new techniques. Indeed, and to their credit, MW actually cited a paper by Li, Nychka and Ammann which also used a BHM, but they were benchmarking it to see whether it actually worked (it did, to a certain extent).
Good point. They could have added that into the whole section where they had a bunch of different pseudoproxies.
BTW, I used to just figure you for some hoi polloi, but you seem pretty up on this stuff. What gives?
Rattus makes an interesting point (but does not take it further): McShane and Wyner refer to the lack of involvement of professional statisticians, but as Rattus notes, the B-school boys approached this “with no real knowledge of the field”. That little aside is what bugs me: people criticise the paleo community for supposedly not involving enough statisticians, and then two statisticians look at paleoclimate data… without involving any paleoclimatologists!
Yeah. And no bloggerz or blog commenterz either. 😦
TCO,
I am a hoi polloi, but I am pretty much up on this stuff. Hey, it’s interesting. The one question I had about Li et al. was that they only used 4:1 noise, which seems to me to be pretty generous. IIRC, Mann has done some investigations with 10:1 noise but was still able to get reasonable results with synthetic proxies.
Not to get into the Tiljander argument, but that would explain his choice of ~0.15 correlation for proxies to local temp in Mann et al. 2009. This would let darksum and thickness through, but it appears that lightsum and XRD might have gotten through too, although I am not even sure there.
And then they proceed to cite a paper which involves two statisticians who have experience with paleo recons (they also included a paleo person on that paper). John Mashey has a few links to some presos at NCAR (you can find the links at Deep Climate, Wegman and Said thread) in which a statistician (I believe his name is Barber) brings up a good point about the limitations on getting more statisticians involved in climate work. The main limitation: the time it takes to understand the issues faced by scientists working in the given field. He was referring to model evaluation, but it surely applies to paleo also.
I got the last bit in my response to TCO wrong. XRD and lightsum are negatively correlated to temp, so should have been tossed. Classifying them as garden-variety varve proxies was wrong.
> XRD and lightsum are negatively correlated to temp so should have been tossed.
My understanding as well
BTW, continuing from the above: what McShane and Wyner also get wrong is doing the RMSE test against the (aggregated) global-mean temp curve, when they should have used the full field over time. Very different things. As experienced statisticians they should have known that 🙂
BTW, I am pretty sure Mann et al. used 5×5 grid cells according to the code I’ve seen. They place the proxies with their data and the grid cells with the instrumental data in one honking big data table, and start “imputing” the missing values using Tapio Schneider’s RegEM — essentially multiple regression, like these authors do. And only then, global/hemispheric averaging.
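For a flavour of what that “honking big data table” step looks like, here is a toy sketch using scikit-learn’s IterativeImputer. To be clear, that is not RegEM, just the same regression-imputation idea in spirit, and the table shapes here are invented:

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn, hence this import:
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n_years, n_cells, n_proxies = 1000, 20, 30
table = rng.normal(size=(n_years, n_cells + n_proxies))  # one honking big table
table[:-119, :n_cells] = np.nan   # grid-cell temps exist only for the last 119 yrs

# Regression-impute the missing grid-cell values from everything else,
# then average the completed cells into a hemispheric mean.
filled = IterativeImputer(max_iter=10, random_state=0).fit_transform(table)
hemispheric_mean = filled[:, :n_cells].mean(axis=1)
print(hemispheric_mean.shape)   # (1000,) -- a mean for every year
```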
The four proxies of thickness, XRD, LS and DS all need to be boiled down to two. Probably thickness and XRD. LS and DS are somewhat out-there measurements that are really aspects of XRD. I feel better about using the more intrinsic physical measurement, XRD, unless there is a large literature to say why LS and DS are better than XRD. XRD is related to percent transmittance. LS and DS just have to do with pixels of light and dark on an X-ray film (or a printed version of one, if done electronically) and should intrinsically be related to XRD itself.
TCO,
I’ll buy that. I’d need to reread the Boreas paper to confirm your observations, but still, thickness should have been allowed through. Toss the other three and what happens? I don’t know, but my guess is probably not much, certainly not enough to invalidate the paper, or even the claim of CPS skill back to 1500 w/o dendro, which was not invalidated by leaving out all four of the Tiljander proxies. Quite frankly, I don’t get the rather surreal argument over this. It might be possible to use LS and XRD to disambiguate the anthro influences on thickness, but damned if I know how it might be done.
I wrote a lot about the Boreas paper in a somewhat tedious (for others) evolving discussion with Amac. My impression from the Boreas paper is that the varves are even chancier than the tree rings, and at least in the Tiljander case, that she really did not have a good impression of what was changing why/when. The M08 paper honestly did have a lot of emphasis on an old, tree-less recon (that was not all of the paper, but it was a lot of the claim for novelty, was in the abstract and such… I know, I checked… because I suspected McI of pulling a fast one). Take away those claims and the paper is a lot less novel and a lot less of a PNAS paper. That’s not to say you’re screwed. As long as you believe the bcps, you’re fine. Just that was the situation before.
Just looking at the series and reading the paper, my impression was that land use was driving the pretty strong and persistent trend from 1700 on. But I am not an expert. It really looked kind of different from a tree or coral proxy, though. Tiljander did not assign a meaning to thickness, so it’s hard to say how to treat it. Yeah… it’s a time series of a naturally occurring thing. But it would be a bit better if it were like a tree ring, that is, had the collector believing it was suitable for a temp reconstruction. Not sure if the hesitancy is Tiljander’s inability to figure out the materials or just that the stuff is a mess. It is well known that varves may sometimes be better precip proxies or land-use proxies than temp ones. So really, the best thing would be an independent varveologist’s view of the stuff. Mike should have had one on his paper, probably. And a cavesickleologist, too. (He had tree rings covered.)
By the way, TCO, did you notice that McShane and Wyner try to do something similar to what Burger and Cubasch did, if I remember well: study the full space of all the alternative processing options. And funnily enough, it appears to suffer from the same flaw: some of the included processing alternatives are just wrong.
I don’t know that what Burger and Cubasch did was “wrong”. I’m not an expert, but the sensitivity of RE to alternate method choices in MBH seems to me to be a danger sign. Almost like “method” fitting as another step in the regression.
Also, I think it is closer to what Burger did in 2007 in terms of looking at multiple (I seem to remember the term “all”) RE/CE locations in the instrumental record.
> negatively correlated to temp
A good correlation is useful, whether negative or positive. The key is to detect the change in correlation. The years in which the correlation was poor could tell you something interesting was going on if looked at carefully. Say removing a forest by fire or logging, and replacing it with grassland, brush, or agriculture — each change would have a signature in the sediments, as well as changing the time of year during which organic vs. inorganic material was washing off the land into the lake bed.
“Negative” does not mean “bad” correlation.
Thanks! I reread page 12.
1) This is ~ Wegman Report, the Sequel …
Among other things, even some oddities of wording seem to derive from the WR.
Start with Bradley(1999), p.1, first sentence.
“Paleoclimatology is the study of climate prior to the period of instrumental measurements…”
See Wegman Report, specifically p., which essentially/reasonably summarizes Bradley(1999) pp.1-10. It starts:
“Paleoclimatology focuses on climate, principally temperature, prior to the era when instrumentation was available to measure climate artifacts.”
*artifacts* is odd terminology that I don’t recall seeing elsewhere.
McShane & Wyner have:
“Paleoclimatology is the study of climate and climate change over the scale of the entire history of earth. A particular area of focus is temperature…. The key idea is to use various artifacts of historical periods which were strongly influenced by temperature and which survive to the present.”
2) Although not as bad as the WR, the Bibliography is suspect:
– BBC
– AIT
– Green, K.C. Armstrong, J.S., and Soon… (recall that Armstrong is at Wharton)
– 3 WSJ OpEds on climategate
– Lamb(1990)
– Matthes (1939)
– Rothstein, http://www.nytimes.com/2008/10/17/arts/design/17clim.html?pagewanted=all cited as alarming both populace and policy makers
3) They cite Wegman often. I believe they will come to regret that.
4) The results will stand or be demolished on their own merits, but as background, who are these guys?
McShane is a 2010 PhD from Wharton; Wyner is his dissertation director.
WYNER:
Pubs, through 2003.
Pubs, Google Scholar, including 3 on baseball (or Bayesball).
For several years, he contributed (as “Adi”) to a group blog, now dormant, Politically Incorrect Statistics, occasionally touching upon climate:
http://picstat.blogspot.com/2008/08/is-it-really-so-simple.html August 27, 2008
http://picstat.blogspot.com/2008/07/vanishing-temperature-trends.html July 29, 2008
http://picstat.blogspot.com/2008/05/sea-ice-continued.html May 04, 2008
“Back in 1975, when we were at the end of a 30 year period of declining global temperatures, the consensus among the climate scientists was a coming ice age…”
Zeke Hausfather explains in comments…
http://picstat.blogspot.com/2008/05/sea-ice-continued.html May 04, 2008 Sea ice variations are just normal, and besides, Antarctic ice is growing…
http://picstat.blogspot.com/2008/05/southern-hemisphere-sea-ice.html May 01, 2008 NSIDC must be wrong…
http://picstat.blogspot.com/2006_08_01_archive.html August 07, 2006 (not by Wyner, but by his colleague Dean Foster)
http://picstat.blogspot.com/2005/11/greenhouse-gases-increasing-but-still.html November 25, 2005 (again, not by Wyner, but by Dean Foster)
Like Wegman, who often gave talks to audiences unlikely to have much climate expertise, Wyner gave a talk in March 2010:
Click to access Abraham%20Wyner%20-%20Title%20and%20Abstract.pdf
People may wish to read this, but put down your coffee first. I’m sure all will be pleased to see:
“The relationship between proxies and temperature is weak”
MCSHANE:
Ph.D. in Statistics, May 2010 (Anticipated)
Thesis: Integrating Machine Learning Methods with Hidden Markov Models: A New Approach to Categorical Time Series Analysis with Application to Sleep Data
Thesis Advisor: Abraham Wyner, Department of Statistics
Marketing Advisor: Eric Bradlow, Department of Marketing
His C.V. is eclectic, including modeling sleep in mice and several baseball papers. It is impressive that he managed to become a paleoclimate expert also.
“Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?” Presented February 2009 at Information Theory and Applications Workshop, San Diego, CA. That’s:
http://ita.ucsd.edu/workshop.php?submitted=1
http://ita.ucsd.edu/workshop/09/talks/ Wyner organized session, McShane presented…
This conference covers a vast range of topics.
Cross posted from DC:
http://deepclimate.org/2010/08/12/open-thread-%C2%A05/#comment-5034
There are a *lot* of problems with McShane and Wyner.
First there are the obvious problems with sections 1 and 2, where M&W demonstrate an abysmal grasp of paleoclimatology in an exposition based on dimly understood and hilariously misinterpreted portions of MM and the Wegman Report.
Section 3 is probably the low point where the authors use a toy strawman model (Lasso) to “prove” that random noise will validate within the instrumental period as well or better than the actual proxies from Mann et al 08.
There are many issues glossed over here. Off the top of my head:
– M&W complain about the short verification windows, and yet have shortened them from 50 years to 30!
– It’s not clear to me yet what relationship is assumed between proxies and temperature. Is it assumed to be positive? Or whatever pops out?
Either way it’s problematic because the real recon model assumes different a priori relationships for different proxies. You’ve got to wonder: how would their random “proxies” have compared to the real proxies if they had each been run through a real validation engine that does a real screening/mini-reconstruction within the instrumental sub-window?
On section 4 (the actual reconstruction), as many have already pointed out, the choice of k=10 PCs is absurd. There are only 90 or so proxies back to 1000! I believe this issue has been covered in the literature.
And as RN points out at SheWonk, it is now standard to benchmark recon models/methodologies against pseudo-proxies generated from a GCM based temperature record. This allows one to judge the performance of a methodology where the “correct” answer is already known.
I doubt M&W are actually aware of any of this, though – I don’t think they have even read Mea 2008.
(And, yes, I’m doing a post on this later today. Stay tuned.)
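While we wait for DC’s post, his pseudoproxy point deserves a sketch. As I understand the standard exercise (all the details below are my invention): take a temperature series where the truth is known, degrade it into noisy pseudoproxies, run your reconstruction method, and score it against that truth:

```python
import numpy as np

rng = np.random.default_rng(11)
n_years, n_proxies = 1000, 93
true_temp = np.cumsum(rng.normal(scale=0.1, size=n_years))  # stand-in for a GCM run

# Degrade the known "truth" into noisy pseudoproxies at a chosen SNR.
snr = 0.5
noise_sd = true_temp.std() / snr
pseudo = true_temp[:, None] + noise_sd * rng.normal(size=(n_years, n_proxies))

# Reconstruct with a deliberately simple method: calibrate the proxy mean
# against the last 119 "instrumental" years, then backcast the rest.
calib = slice(n_years - 119, n_years)
coeffs = np.polyfit(pseudo[calib].mean(axis=1), true_temp[calib], 1)
recon = np.polyval(coeffs, pseudo.mean(axis=1))

rmse = np.sqrt(np.mean((recon[:n_years - 119] - true_temp[:n_years - 119]) ** 2))
print(f"backcast RMSE against the known truth: {rmse:.3f}")
```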
I mostly agree with PolyTCO in #23 supra on the Tiljander proxies.
I’ve done a more careful compilation of these data series and put up some pretty pictures as a blog post, The Tiljander Data Series: Data and Graphs. Gee, elegant title, now that I see it in print…
Aaah… I’m such a sneaky bastard. I asked Ed Zorita to give me a review of M&W (I know I can’t find stats flaws and regressions and all). Just like I knew MMH was hosed up in very key aspects of the hypothesis testing and trend variance definition… and knew James Annan was the one to help me. I feel like a spider in a web. I don’t even understand the technical issues… but I can sometimes smell what might be an issue and then appeal to whoever will be able to smoke out and understand the complications. Way more lazy and fun than “learning R”, as Ross and Moshpit have been after me to do for years.
🙂 🙂
I actually pulled out my grad stats text and tried to work through some of the stats. I’m trying, but at this point I’ll read what DC and Zorita have to say.
Returning from vacation, and while watching every Spiderman video on Youtube for my little fella in another window, I’ll simply note that:
– I truly enjoy the coverage of the comments on CA, which shows that the CA readership is as ruthless as any other.
– I truly dislike analogies that lack any plausibility. Gordie Howe is one of the most celebrated hockey players. He had a very, very long career. And for the hockey illiterate: elbowing is illegal.
Maybe McShane will become the Gordie Howe of statistics, but maybe we should start by comparing him with a gritty rookie, or stick to the Hanson brothers from **Slap Shot**. If we really want to make him shine like a young Stanley Cup ring-bearer who likes to get in front of the net, we could perhaps compare him to Dustin Byfuglien.
“The true skeptic waits and evaluates all new data before deciding.”… Nice, so the debate has been going on for over 20 yrs. and you still don’t have an opinion regarding the validity of climate science? Or are you specifically referring to this paper? In which case, you could actually read the paper and decide for yourself before trying to weigh people’s opinions that would be no more informed than yours. I don’t mean to be critical, but at some point we have to learn for ourselves without being held to the interpretations of people who may or may not have “dogs in the fight”.
I am not a skeptic. I was trying to point out that a true skeptic is interested in the evidence, and not just select bits of it that validate a particular position. In other words, Watts and others are not true skeptics. At this point in time, and given the preponderance of evidence, the only legitimate reason to maintain a skeptical position is that you personally feel unable to judge. This means you must try to educate yourself so you can decide.
>>”I haven’t read the entire article, nor do I claim to have the expertise to comprehend its validity and value. Sad to see so many people who are like me but who do feel enabled.”
I completely agree with this, which made some of the comments above especially funny.