David Kirkpatrick

June 13, 2010

World Cup fans of Spain …

don’t start celebrating just yet.

From the link:

The World Cup offers fans of the globe’s most popular sport the chance to thrill and agonize over the ups and downs of their nations’ teams. For scientists, whether or not they are fans, it’s another chance to collect data and test hypotheses about how close the final match results reflected the relative skill and performance of the two teams — and if they used the best possible winning strategies.

When the dust clears after the World Cup concludes next month, it’s likely that the champion will not be the team that played the best, said Gerald Skinner, an astrophysicist at the University of Maryland in College Park.

Following up on a lunchroom discussion with his avid fan tablemates, Skinner, who admits not being a great sports enthusiast, published a research paper in 2009 that worked out the details of his claim using statistical techniques familiar to astronomers. The findings backed up his posturing.

“It’s not entirely a , but the result of an individual football match has got a very large element of chance and  in it,” said Skinner.
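The excerpt doesn't spell out Skinner's actual method, so take the following only as a rough illustration of why a single match is so noisy. It assumes each side's goals follow an independent Poisson distribution, a common simplification that is my assumption and not necessarily what the paper does. The scoring rates (1.5 and 1.0 goals per match) are made-up numbers.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the average is lam goals per match."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_outcome_probs(rate_a, rate_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) under independent Poisson scoring.

    rate_a and rate_b are hypothetical average goals per match; scorelines
    beyond max_goals per side are ignored because their probability is tiny.
    """
    win_a = draw = win_b = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, rate_a)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, rate_b)
            if a > b:
                win_a += p
            elif a == b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# Made-up example: team A is clearly better (1.5 vs 1.0 goals per match).
win_a, draw, win_b = match_outcome_probs(1.5, 1.0)
print(f"A wins {win_a:.2f}, draw {draw:.2f}, B wins {win_b:.2f}")
```

Under these assumptions the stronger side is the favorite, but the draw and the upset together still take a large share of the probability, which is the flavor of Skinner's point.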

October 8, 2009

Can outlier detection protect banking?

Because we know everything in place up to the end of last year failed miserably.

The release:

Banking on outlier detection

Simple computer model could act as early warning system for failing banks

Recent bank failures point to the continuing need for vigilance by regulators and investors. Now, a report in the International Journal of Operational Research discusses the possibility of an early-warning system that spots the outliers before they fail.

The downfall of dozens of banks and financial organizations across the globe has been in the headlines since the meltdown of the subprime mortgage market, but even during the decade before, 1997 to 2007, more than forty banks failed in the US.

Randall Kimmel of the Department of Finance, at Kent State University, and colleagues David Booth and Stephane Elise Booth explain that there are numerous financial computer models that can predict specific outcomes for a given bank. However, these programs require large amounts of data available only to the banks themselves and the regulators and this data requires lots of preparation and manipulation for the model to work properly.

Bank regulators have the resources to handle the data and to use the complex computer models. But these are significant barriers for researchers and individual investors. Kimmel and colleagues have now shown that a simple model with minimal data demands can be effective in the early detection of potentially troubled banks.

They suggest that banks that are similar to one another in size, based on total assets or total loans, should be similar to one another in terms of the associated returns or risk factors, namely net operating income and net loan losses respectively. “The further from its peers a bank is in terms of these variables, the more likely it is a potentially troubled bank,” Kimmel explains, “In terms of statistical analysis, such a bank would be an outlier.”

A problem bank can be defined as a bank with lots of high risk assets compared with total reserves. It is usually one that is performing poorly relative to other banks of a similar size – it is a statistical outlier, in other words. Such outliers distort the traditional statistical analyses, making it difficult to spot them among the more average performers.

The KSU team has now developed a new application of a mathematical model, a Locally Weighted Scatter Plot Smooth, which they say requires minimal data preparation. More critically, it can be run in many off-the-shelf statistical software packages. Their model could be very effective as an early warning system for detecting potential bank failures.

“Our solution to this problem is to use a special type of regression, called LOESS,” explains Kimmel, “It gives more weight to the information about a particular bank the more similar it is to a peer bank, using this to build a profile (prediction equation) of what a bank should look like.” He adds that one then compares the actual information from the banks, which is in the public domain, with this profile. “Our research indicates that this technique, using readily available information and statistical software, is able to identify potentially problematic banks,” says Kimmel.

The researchers suggest that the same approach could be extended to the analysis of other industries, especially those that are highly regulated like utilities.


“The analysis of outlying data points by robust Locally Weighted Scatter Plot Smooth: a model for the identification of problem banks” in Int. J. Operational Research, 2010, 7, 1-15
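For the curious, here is a minimal sketch of the idea described in the release: fit a locally weighted regression of a return measure on bank size, then flag banks whose actual figures sit far from the fitted profile. This is not the KSU team's code. The bank names and numbers are invented, and the tricube and bisquare weights, the `frac` setting, and the 3.5 cutoff on a median-based score are my own assumptions for illustration.

```python
import math
from statistics import median


def tricube(u):
    """Distance weight: near 1 for close peers, 0 at or beyond the bandwidth."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    """Robustness weight: shrinks toward 0 for points with large residuals."""
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0


def loess_fit(xs, ys, frac=0.6, robust_iters=2):
    """Locally weighted linear fit evaluated at every x, with robust reweighting."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))  # number of neighbours per local fit
    robust = [1.0] * n
    fitted = [0.0] * n
    for _ in range(robust_iters + 1):
        for i, x0 in enumerate(xs):
            # Bandwidth is the distance to the k-th nearest point (self included).
            h = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
            w = [tricube((x - x0) / h) * r for x, r in zip(xs, robust)]
            sw = sum(w)
            if sw <= 0:
                continue  # keep the previous fitted value
            mx = sum(wi * x for wi, x in zip(w, xs)) / sw
            my = sum(wi * y for wi, y in zip(w, ys)) / sw
            sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
            sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
            slope = sxy / sxx if sxx > 1e-12 else 0.0
            fitted[i] = my + slope * (x0 - mx)
        resid = [y - f for y, f in zip(ys, fitted)]
        s = median(abs(r) for r in resid)
        if s == 0:
            break
        robust = [bisquare(r / (6.0 * s)) for r in resid]
    return fitted


def outlier_scores(names, xs, ys, **fit_args):
    """Score each bank by its residual from the fitted profile, scaled by the MAD."""
    fitted = loess_fit(xs, ys, **fit_args)
    resid = [y - f for y, f in zip(ys, fitted)]
    med = median(resid)
    mad = median(abs(r - med) for r in resid) or 1e-12
    return {name: 0.6745 * (r - med) / mad for name, r in zip(names, resid)}


# Invented data: total assets (x) and net operating income (y) for 12 banks.
names = ["Bank " + c for c in "ABCDEFGHIJKL"]
assets = [float(i) for i in range(1, 13)]
noise = [0.02, -0.01, 0.01, -0.02, 0.0, 0.01, -0.01, 0.02, -0.02, 0.01, 0.0, -0.01]
income = [0.10 * a + e for a, e in zip(assets, noise)]
income[6] -= 1.0  # Bank G earns far less than peers of its size

scores = outlier_scores(names, assets, income)
flagged = [name for name, z in scores.items() if abs(z) > 3.5]
print(flagged)
```

The robust reweighting step matters here: without it, a badly behaved bank drags the local fit toward itself and partly hides its own residual, which is the distortion the release describes.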

September 26, 2009

Is one pollster cooking the books?

Maybe. If true, this would rock the polling industry and how it gets its results published. Think about it: groups pay for polls all the time, and the media dutifully reports those results, comparing them to other results. Those polls might even get aggregated into trend lines at places like Pollster.com.

Statistician Nate Silver of 538 has long had issues with polls from Strategic Vision because the firm wouldn’t release its methodology, a disclosure that is pretty much standard within the industry, and now he’s found very possible evidence the company is purely creating polling results out of whole cloth.

Stats aren’t very sexy, and polling is, as they say, an inexact science, but this allegation is very serious and Silver wouldn’t put his budding punditry on the line if he weren’t pretty sure of its veracity.

From the second link:

I posed that question largely as a hypothetical yesterday. But today, I pose it much more literally. Certain statistical properties of the results reported by Strategic Vision, LLC suggest, perhaps strongly, the possibility of fraud, although they certainly do not prove it and further investigation will be required.

The specific evidence in question is as follows. I looked at all polling results reported by Strategic Vision LLC since the beginning of 2005; results from 2008 onward are available at their website; other polls were recovered through archive.org. This is a lot of data — well over 100 polls, each of which asked an average of about 15-20 questions.

Like I said, very serious allegations. If you are interested in the gritty details, here’s a link to the original post Silver alluded to in the excerpt, and a follow-up post.

From the “follow-up post” link above:

Bottom line: It is highly unlikely, in my opinion, that the distribution of the results from the Strategic Vision polls are reflective of any sort of ordinary and organic, mathematical process.

That does not necessarily mean that they simply made these numbers up.

November 28, 2008

Take research papers with a grain of salt

I post a lot of science press releases and many are on research papers. This post from the excellent new blog, Secular Right, makes a great point.

Just because something was published does not make it correct. I’m not too sure about the two-thirds figure, since it looks like a casual sample, but it should remind you to keep your skeptical mindset whatever the source.

From the link:

Just a quick addendum to my previous post where I advised caution about skepticism of science. A biomedical scientist recently told me that the journal Virology had a statistician audit all their papers within a 1 year interval with statistics to see if they were using them correctly. Turned out that 2/3 of the papers which had statistics made basic elementary errors! The moral here is to be very cautious of, and therefore skeptical of, new science, especially sexy new science. Junk statistics are especially an issue with medical science because of the incentive structure of this research.

(And on another note for all those at Secular Right — thanks for the shoutout for my shoutout. That’s right, I’m thanking you for thanking me for thanking you for starting the blog. Er, or something.)

November 5, 2008

FiveThirtyEight.com rocked

Not complete, but look at these comparisons from 538:

