How to Extract Prejudicial Data from a Political Survey, by Yael Ben-Shachar

How can you extract prejudicial data from a political survey? That was the challenge I faced when I began my summer internship with Tobias Konitzer of Stanford's Department of Communication. At first, I was unsure how a mathematician like me could contribute to a study about politics. But I was both surprised and delighted to find out that math was the secret ingredient in solving the problem.

Before digging into the work itself, I first had to master a challenging statistical programming language called “R”, which would play a major role in helping us squeeze bias out of existing poll results. I also had to learn the ins and outs of a proprietary algorithm that Tobi had developed for collecting and organizing large-scale data quickly and accurately.

Still, I had my questions about what we were attempting to do. I asked the project head: “How can polls be biased when the data is a reflection of the people being polled? And, if there is bias, how are we supposed to ferret it out?”

“Most people view polls or surveys as sources of scientifically developed data,” Tobi explained. “However, the history of political polling tells us otherwise, because results frequently fall short of our expectations. For example, pollsters were far off base in the recent Brexit vote by British citizens. Furthermore, while the average national results of the Obama vs. Romney presidential election were largely accurate, many individual polls were consistently wrong.”

“If polling is a science, how could so many polls provide contradictory results, and how could polls such as those in the United Kingdom be so far off the mark?” I asked.

Tobi had the answer: “Bias of one form or another is often built into the polling instruments themselves,” he told me. “Such bias can result from the choice of questions posed by pollsters, how those questions are phrased, the groups that are selected for the sample, the size of the sample, and whether respondents select themselves or are randomly selected in a scientific manner by a third party.”

Now, this project was getting interesting, and Tobi had my full attention.

The goal of my summer internship at Stanford was to use “R” to mathematically strip as much bias as possible out of a poll for the upcoming presidential election, and thus produce a more accurate result. The data we used was biased towards one side of the political spectrum because the poll was published on a website viewed almost exclusively by voters who shared that point of view. I used Tobi's algorithm to manipulate big data sets containing demographic data for both Republicans and Democrats. Then, I put my math skills to work, using “R” to squeeze out biases. After a substantial amount of work, we began to see different results, and my concerns about our ability to actually find and remove bias faded.
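The adjustment itself is not spelled out here. One common way to correct a lopsided sample like this is post-stratification weighting: each respondent is weighted by how under- or over-represented their group is compared with the real population. The sketch below shows that idea in Python rather than “R”. It is not Tobi's algorithm, and the group names, sample sizes, and population shares are all made up for illustration.

```python
from collections import Counter


def raw_support(responses):
    """Unweighted share of respondents who support the candidate."""
    return sum(1 for _, supports in responses if supports) / len(responses)


def poststratified_support(responses, population_shares):
    """Weighted share of support, using post-stratification weights.

    responses: list of (group, supports) pairs, where supports is a bool.
    population_shares: dict mapping each group to its share of the
        real population (hypothetical values in this example).
    """
    n = len(responses)
    sample_counts = Counter(group for group, _ in responses)

    # Weight = population share / sample share. Groups that are
    # over-represented in the sample get a weight below 1.
    weights = {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }

    weighted_yes = sum(weights[g] for g, supports in responses if supports)
    total_weight = sum(weights[g] for g, _ in responses)
    return weighted_yes / total_weight


# Made-up toy sample: 80 respondents from group_a (72 support the
# candidate) and 20 from group_b (2 support the candidate).
responses = (
    [("group_a", True)] * 72
    + [("group_a", False)] * 8
    + [("group_b", True)] * 2
    + [("group_b", False)] * 18
)

# Assumed population split, also made up for illustration.
population_shares = {"group_a": 0.5, "group_b": 0.5}

raw = raw_support(responses)
adjusted = poststratified_support(responses, population_shares)

print(f"raw estimate:      {raw:.2f}")       # 0.74
print(f"adjusted estimate: {adjusted:.2f}")  # 0.50
```

In this toy sample the raw figure is inflated because group_a makes up 80% of respondents but only half of the assumed population. Reweighting pulls the estimate back toward the other side, which is the same kind of shift described in the next paragraph.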

Several weeks into the project, I was thrilled to find that the polling data began to shed its built-in favoritism and actually lean towards the opposite side of the political spectrum, as was reflected in better-regarded polls. With additional work, the data would eventually contain almost no bias, making the polling much more objective and reliable.

Although the results we were seeking seemed counterintuitive at first, it turned out that the meticulous process we used, helped along by my love of and expertise in math, could achieve what had seemed impossible when we began. Additionally, I realized that I had developed a new skill set using “R” and Tobi’s algorithm for data collection and analysis. These skills could have applications in many other areas, including data gathering for school assignments, or analyzing future polling results.

I now realize that our work could have a genuine impact on the accuracy of critical information and that math could be even more powerful than I thought. Meanwhile, I’ve personally learned to take most polling data with a grain of salt.

ABOUT THE AUTHOR:
Yael Ben-Shachar is a senior at a Silicon Valley high school. She volunteers for the Boys and Girls Club, teaching students math and reading skills and training other volunteers. She also works year-round with special needs children, specifically a boy with autism.
At Stanford, she learned how to squeeze prejudicial data out of large polls using sophisticated statistical programs.
She is a journalist for her school newspaper.