A Head in the Polls

You can’t go anywhere at the minute without hearing about polls, so I thought I’d write a few words on how these figures are calculated, and why some polling companies tend to give different results to others. I hope for this to be neutral (though I appreciate the timing could be better), so please trust that I have avoided partisanship as far as possible.

By some means (I am not entirely sure how, as I have never been asked and don’t know anyone who has, but it involves people filling out a survey online), the polling company gathers a number of responders to its survey, some of whom (let’s say x) intend to vote No and some of whom (y) intend to vote Yes. These raw figures, however, are not what is reported, and rightly so, because they may be biased. A bias occurs when the demographic make-up of the responders is not representative of the population as a whole.

For example, if our sample is 25% women and 75% men, it is biased, as the ratio should be closer to 50:50. To correct for this, the polling companies multiply the number of votes in each demographic by a particular number, called a weight, to get a more representative total. In the case described above, each woman counts as two people, and each man as 2/3 of a person.
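To make the arithmetic behind those two weights explicit, here is a minimal sketch. It assumes the weight for a group is simply its share of the population divided by its share of the sample, and that the target really is 50:50; the function and variable names are my own, not anything a polling company publishes.

```python
# Shares taken from the 25% women / 75% men example in the text.
sample_share = {"women": 0.25, "men": 0.75}
# Assumed target: a 50:50 split in the population as a whole.
population_share = {"women": 0.50, "men": 0.50}


def demographic_weights(sample, population):
    """Weight for each group = population share / sample share."""
    return {group: population[group] / sample[group] for group in sample}


weights = demographic_weights(sample_share, population_share)
for group, weight in weights.items():
    print(f"{group}: each responder counts as {weight:.2f} people")
```

An under-represented group gets a weight above 1 and an over-represented group gets a weight below 1, which is where the "two people" and "2/3 of a person" figures come from.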

So, if in a survey of some choice A versus another choice B we survey 250 women, of whom 100 vote A and 150 vote B, and 750 men, of whom 500 vote A and 250 vote B, the A-B ratio is 60%-40% before weighting and roughly 53%-47% after.
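The before and after figures can be checked with a few lines of code. This is only a sketch of the calculation described here, using the counts and the weights (2 for women, 2/3 for men) from the example; the function name `vote_shares` is made up for illustration.

```python
# Counts from the worked example: votes for A and B within each group.
responses = {
    "women": {"A": 100, "B": 150},
    "men": {"A": 500, "B": 250},
}
# Weights from the 25%/75% sample: women count double, men count as 2/3.
weights = {"women": 2.0, "men": 2.0 / 3.0}


def vote_shares(responses, weights=None):
    """Return each choice's share of the (optionally weighted) vote."""
    totals = {}
    for group, votes in responses.items():
        w = 1.0 if weights is None else weights[group]
        for choice, count in votes.items():
            totals[choice] = totals.get(choice, 0.0) + w * count
    grand_total = sum(totals.values())
    return {choice: total / grand_total for choice, total in totals.items()}


raw = vote_shares(responses)
weighted = vote_shares(responses, weights)
print(f"Unweighted: A {raw['A']:.0%}, B {raw['B']:.0%}")
print(f"Weighted:   A {weighted['A']:.0%}, B {weighted['B']:.0%}")
```

Nobody's answer changes; the women's answers simply count for more and the men's for less, which pulls the headline figure towards how the women voted.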

Similar things happen with age, employment status, and so on, and it may be the case that for every demographic, the resulting numbers are a little bit “off” what they should be (as a result of the competing weights used). 
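Here is a rough illustration of how competing weights can leave every demographic slightly off. The age split and the 50:50 age target are invented for the example, and I am assuming, purely for illustration, that a weight is worked out separately for sex and for age and the two are multiplied together. Real polling companies may well combine their weights differently.

```python
# Hypothetical responder counts by (sex, age band); invented for illustration.
counts = {
    ("women", "young"): 100,
    ("women", "old"): 150,
    ("men", "young"): 500,
    ("men", "old"): 250,
}
# Assumed population targets for each characteristic (index 0 = sex, 1 = age).
targets = {
    0: {"women": 0.5, "men": 0.5},
    1: {"young": 0.5, "old": 0.5},
}


def margin(cells, axis):
    """Share of the total falling in each level of one characteristic."""
    total = sum(cells.values())
    shares = {}
    for key, value in cells.items():
        shares[key[axis]] = shares.get(key[axis], 0.0) + value / total
    return shares


def single_axis_weights(cells, axis, target):
    """Weights that would fix one characteristic if it were the only one."""
    observed = margin(cells, axis)
    return {level: target[level] / observed[level] for level in target}


sex_w = single_axis_weights(counts, 0, targets[0])
age_w = single_axis_weights(counts, 1, targets[1])

# Apply both weights at once by multiplying them for each responder type.
weighted = {(s, a): n * sex_w[s] * age_w[a] for (s, a), n in counts.items()}

sex_after = margin(weighted, 0)
age_after = margin(weighted, 1)
print(f"Women after weighting: {sex_after['women']:.1%} (target 50%)")
print(f"Young after weighting: {age_after['young']:.1%} (target 50%)")
```

Each weight would hit its own target exactly if used alone, but because sex and age are not independent in this made-up sample, applying both together leaves each split a few points away from 50:50.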

The reason there are differences between polling companies is that they calculate the weightings differently: each has to decide what counts as a valid “class” (a demographic group which shares a common weight), and then how to calculate the correct weight for each class.

The different weights applied to classes combine to form a “model” (in statistical terms, a model is a way of using data to represent a reality; the better the model, the closer to reality it is). Finding a model that approximates reality is a statistician’s holy grail (and if you do get a model of 100% accuracy, it’s best to approach it with an air of suspicion).

Demographic data such as sex, age, and so on is fairly easy to calculate using census data (although it will require some minor adjustment, as it will be three years out of date), but most polling companies will also classify responders based on political leanings, which they’ll decide based on who they voted for in 2010 (General Election), 2011 (Scottish Parliament election) or 2014 (European Parliament election). Newspaper readership may also come into play.

By way of example, two polls came out on Saturday night, this one from YouGov showing a slight advantage for “Yes”, and this one from Panelbase showing a slightly-less-slight advantage for “No”.

The YouGov one included newspaper readership, while Panelbase did not.

Panelbase went for results in the 2011 and 2014 elections, while YouGov opted for the more general “political party affiliation” (taking into account that many voters vote Labour in Westminster, but SNP in Holyrood).

These factors combined contribute towards the differences, and you can hopefully see how the weightings have been applied from the various tables.

Ultimately, however, some variation is unavoidable simply because each poll is a random sample. For a good guide to actually reading polls, I recommend Nate Silver’s website FiveThirtyEight and his book “The Signal and the Noise”. Also worth checking out is my old flatmate’s post on how we should interpret results, here.
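To put a rough number on that random-sampling variation, here is a sketch of the usual textbook margin of error for a proportion at 95% confidence. It assumes a simple random sample of 1,000 people (the size used in the A versus B example) and ignores the extra uncertainty that weighting introduces, so treat it as a lower bound rather than what any polling company actually reports.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n responders.

    Assumes a simple random sample; z = 1.96 is the normal-distribution
    value for 95% confidence.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# A close race (p = 0.5) is the worst case for the margin of error.
moe = margin_of_error(0.5, 1000)
print(f"Margin of error: +/- {100 * moe:.1f} percentage points")
```

On these assumptions, a lead of a couple of points in a single poll of this size falls within what sampling noise alone could produce.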

  1. markdavo says:

    Also worth pointing out in the YouGov poll is that more people said they were going to vote “No” than “Yes”, but because the sample was felt to have an insufficient number of SNP voters, the weighted sample gave Yes a 2% lead. Strangely, a higher percentage of people said “No” in YouGov’s unweighted sample than in Panelbase’s, yet because of the way each company weights its votes, pretty different conclusions were reached.

