But, a complaint was raised concerning my calculations. The calculations were done correctly, but the premise of the calculations was what was called into question. Namely, I used every month as being random when in fact, given one 12-month period, the next 12-month period (and previous one, as well) would have 11 months in common with it. So, it is not a truly random series. This, the objector said, would make a difference. Once it was pointed out, I agreed and went to another source to help me out - Dr. John Grego of the University of South Carolina.

Dr. Grego and his colleagues did a similar study addressing the fact that all 15 years since the year 2000 have been in the top-20 hottest months. They wanted to know what are the chances of that happening on a random basis and came up with the odds of it happening randomly are 1.5 quadrillion to 1. This study was the actual inspiration for my posting mentioned above, so I thought this would be a good place to check. Dr. Grego was kind enough to take the time to figure the odds and this is his response:

Hello Dr. Keating--it's straightforward enough to write down the correlation structure for 12-month running means, but trying to account for that in an urn-model probability calculation looked formidable to me. So I simulated the process instead, while still using an underlying assumption that monthly temperatures were independent normal random variables, which I would consider to be a naive, but instructive approach. I appended R code at the end of this email. Basically, I generated 1632 (136X12) independent normal random variables, computed their 1621 (1632-11) running averages, sorted them from highest to lowest and then tallied how many of the highest 10 (those with rankings from 1612 to 1621) occurred in the last two years (indicated by an index ranging from 1598 to 1621). I first ran this simulation 1000 times, then 100000, then 1000000. I thought I might need to run much larger simulations if probabilities of some of the events below turned out to be very small, but that proved unnecessary.

The table lists each of the possible outcomes (0 of the 10 hottest months occurred in the last two years, 1 of the 10 hottest months occurred in the last two years,...,10 of the 10 hottest months occurred in the last two years) and the proportion of simulations for which each outcome occurred for each of the three simulation series I ran—results were surprisingly consistent as I upped the number of runs. As you can see, the proportion of extreme events is not as small as one might think, apparently driven by the very high autocorrelation in the series of running averages. Note how even a single value in the top 10 is quite small, but then the distribution is quite heavy-tailed and dies off very slowly--almost linearly!

(Listed proportion for test simulations of 1000, 100000, 1000000)

> 0 .92600, .90819, .90902

> 1 .03900, .03889, .03854

> 2 .01000, .02112, .02051

> 3 .01200, .01241, .01262

> 4 .00700, .00778, .00794

> 5 .00200, .00517, .00488

> 6 .00200, .00295, .00300

> 7 .00200, .00187, .00186

> 8 .00000, .00088, .00090

> 9 .00000, .00051, .00051

> 10 .00000, .00023, .00022

>

>

> --John Grego

>

> #Backward-looking 12-month running average--10000 simulations.

>

> a=filter(x,rep(1/n,n), sides=1)

> return(a)

> }

> tx=1632 #1632=136*12

> nhot=NULL

> for(nrep in 1:10000){

> x=rnorm(tx)

> ax=runavg(x)

> sort_ax=order(ax,na.last=NA)

> nhot[nrep]=sum(order(ax)[1612:1621]>=1598)

> }

> table(nhot)

Essentially, what Dr. Grego did was to run the data and determined just how many times this event actually occurred. The three columns demonstrate the outcome when running it 1000 times, 100,000 times and 1 million times. Comparing the columns, we can see the numbers for each occurrence is converging on some percentage. For the case of nine of the ten warmest months occurring on a random basis in a 24-month period the percentage is converging on .00051.

That means, the odds of this happening on a random basis are about 51 out of 100,000 (about 1 in 2000).

Those odds are certainly not as bad as the 1 in 1.6 x 10^17 I previously calculated, but they are sufficient to demonstrate this record could not possibly be random.

## No comments:

## Post a Comment