Big Data: The role of citizen scientists in the age of information abundance

May 3, 2013
A group of 36 students in Western University's Master of Arts in Journalism class has spent three months studying and reporting on citizen science. Over the past three weeks we have been sharing our citizen science stories -- how it emerged and evolved, where it stands now and where it's going. Visit our Citizen Science page to read previous articles. The series concludes next week.

Stephen Brabin is very good at folding proteins. He twists and contorts their structures, bending them to his will. He does this not for a living, but for fun.

Brabin, a citizen scientist, is one of the top-ranked players of Foldit, an online computer puzzle game that asks users to manipulate virtual proteins.

It might be a game for Brabin, but for the people at Foldit it means real-world medical advances. The results from the game could help develop AIDS medication, among other breakthroughs.

And Brabin's solution is just one of roughly 500,000 that the people behind Foldit will have to wade through. This massive amount of information is 'big data.'

Big Data? 

Big data refers to the vast and unprecedented amounts of data that scientists now have access to because of advancements in scientific technologies. In addition, citizen scientists are collecting large amounts of data every day using communications technologies such as smartphones.

Human and natural activities are also generating a lot of information. Everything from cell phone signals and scientific experiments to social media and many other processes is adding to big data.

Andrea Wiggins, of the data archiving project DataONE, describes big data as "a general set of practices and conditions around having way more data than we've ever had before."

"But it's not just that we have more data now," she said. "We also have a lot more access to it." And "more data is more knowledge," she added.

In cancer research, for example, sophisticated technology such as genetic sequencing allows scientists to analyze tumour samples faster and in much more detail than in the past. However, this generates much more data than researchers can handle.

When the data is analyzed thoroughly, scientists can make new and groundbreaking discoveries that would not have been possible without big data. They can also identify trends and gain insights into what the information is saying. And with the help of citizen scientists, they can now do it much more quickly.

Galaxy Zoo is one way the public is helping scientists analyze their big data. It's an interactive website that displays pictures of galaxies and asks questions about shape, size and colour. This allows scientists to create databases of accurately classified galaxies for their research.

With millions and millions of galaxies in a single data set, scientists have to rely on the help of citizen scientists to comb through and identify them one at a time. (Listen to an interview with Kevin Schawinski here.)

Why humans and not machines?

Advancements in scientific technology have made it possible to capture much more data than we can even imagine. But if we have the technology to collect it, why can't we use that same technology to analyze it?

Some scientists have looked at a number of methods -- such as automated algorithms -- to see if they can train a computer to analyze big data faster.

However, computer algorithms simply don't have the same ability to recognize patterns as the human eye, said Dr. Joanna Owens of Cancer Research UK.

Much of the data researchers are looking at contains subtle changes and shifts in patterns, or differences in colour, that the human eye is great at distinguishing but that are difficult to train a computer to detect accurately.

"We believe there's a lot of potential for getting the collective eye of the public to help us analyze research faster and maybe more effectively," said Owens.

Cancer Research UK has an interactive website called Cell Slider, which is similar to Galaxy Zoo. But instead of analyzing galaxies, citizens here help to examine cancer cells.

By simply spotting how many cancer cells are in a sample, how many of them are stained yellow and what proportion of those are very bright, citizens help scientists gauge how successful a breast cancer treatment has been.

"So we're getting data back that is helping us link what we're seeing under the microscope with the outcome of a woman's treatment for breast cancer," said Owens.

Interactive websites like Cell Slider have easy-to-understand tutorials.

And because the process is structured in a way that harnesses the human brain's natural ability to recognize patterns, the websites are easy to work with, said Kevin Schawinski, founder of Galaxy Zoo.

Getting it right 

 With so much data coming in, and a lack of experience among those working with it, some people may be concerned about the accuracy of the projects.

In terms of collection, it's about understanding the potential sources of errors, said Wiggins. There are two methods to ensure accuracy: quality assurance and quality control.

Quality assurance is a procedure that happens before the data is collected, said Wiggins.

"It's a specific protocol we follow. We control the data entry fields so that bogus data can't be entered."

She said that citizens follow procedures that produce useful information.

Quality control happens once the data is collected. It is a method that throws out any data that has too many errors or inconsistencies.
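The difference between the two checks can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used by any project mentioned here: the field names, the list of allowed shapes and the `max_count` cut-off are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for one data-entry field.
ALLOWED_SHAPES = {"spiral", "elliptical", "irregular"}


@dataclass
class Observation:
    observer: str
    shape: str
    count: int


def accept_entry(observer, shape, count):
    """Quality assurance: refuse bogus values before they are stored."""
    if shape not in ALLOWED_SHAPES:
        raise ValueError(f"unknown shape: {shape!r}")
    if count < 0:
        raise ValueError("count cannot be negative")
    return Observation(observer, shape, count)


def quality_control(observations, max_count=1000):
    """Quality control: drop already-collected records that look inconsistent.

    The max_count threshold is an assumed rule chosen for illustration.
    """
    return [obs for obs in observations if obs.count <= max_count]
```

In this sketch, `accept_entry` plays the role of the controlled entry form, while `quality_control` runs afterwards over whatever made it into the data set.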

After the data is collected, a new challenge arises.

When it comes to data analysis, "repeat observation is like a gold mine," said Wiggins.

For instance, with Galaxy Zoo so many people analyze the same data that the probability of getting an accurate result increases, explained Schawinski.

"One thing we've discovered is that because we have 20 to 40 people independently look at each galaxy, the Galaxy Zoo classifications are very accurate," added Karen Masters, a professional astronomer. (Listen to our interview with Karen Masters here.)
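One simple way to combine repeat observations is a majority vote. The sketch below assumes that approach purely for illustration; the article does not describe how Galaxy Zoo actually merges its classifications, and the function name `consensus` and the sample votes are made up.

```python
from collections import Counter


def consensus(labels):
    """Return the most common label and the share of volunteers who chose it."""
    if not labels:
        raise ValueError("need at least one classification")
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


# Twenty hypothetical volunteers classify the same galaxy.
votes = ["spiral"] * 15 + ["elliptical"] * 5
label, agreement = consensus(votes)
print(label, agreement)  # spiral 0.75
```

The agreement figure gives a rough sense of how much to trust each result: a galaxy on which volunteers split evenly could be flagged for a closer look.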

Big Data simplified 

 Only trained professionals have the background necessary to understand scientific data. In order to engage the general public and get useful information in return, developers have to find ways to simplify complex content.

Before launching Cell Slider, the scientists conducted surveys to determine how the presentation of their data affected people's tendency to come back and play, and how comfortable people felt about looking at cancer cells. They inverted the colours of the cells and added some abstract imagery to make the pictures more beautiful.

"The big challenge for us was getting people to return to science. We wanted to make it exciting and engaging enough so that people keep coming back," said Owens.

And in the case of Foldit, the developers turned the complex process of protein manipulation into a video game.

"One of the big issues was basically showing the players only the required information," explained Tamir Husain, one of the game's developers.

The team stripped protein folding down to its core. In the game, proteins appear as 3D cartoon images that users can manipulate in space. Problem areas, called clashes, are represented as red balls with spikes.

To a scientist, the clash represents a complex scientific problem, but to a video game player it's simply a hurdle that needs to be overcome before getting the high score.

Discussing Big Data 

 So what happens after you're done folding proteins and classifying galaxies? There are forums and research boards where citizens can get more information about the research they are helping with.

There, they can ask researchers questions and discuss the project with their fellow citizen scientists. It's an opportunity for citizen scientists to stay involved with the project beyond the games.

Big Data, big results 

 One of the biggest breakthroughs in terms of citizen science and big data came in 2011, when Foldit players solved the Mason-Pfizer Monkey Virus puzzle.

Scientists had been agonizing over the protein's structure for years, but by crowdsourcing the data to players, they had their solution in a matter of days. Their discovery could have major implications for developing AIDS medication.

In 2007, a citizen scientist stumbled across a unique and potentially groundbreaking astronomical object hidden in a Galaxy Zoo image.

Schawinski said that it's an object that scientists had never seen before and they are still trying to work out exactly what it is.

This is an example of a discovery that likely would have been overlooked by a computer algorithm, and it underscores the benefit of using the human eye to process big data.

Other scientists credit citizens for cutting their research time in half and, in some cases such as Cell Slider, by much more.

"We have sped up the time it would take to carry out that data from 18 months to three months -- freeing up our scientists to carry out more research," said Owens.

Looking ahead 

 Scientists say that this is just the beginning of an exciting collaboration with the public. "I believe that it's an area that has huge potential. We are already looking at other types of data that we might be able to ask the public to help us analyze," said Owens.

Schawinski believes the value of using citizen scientists to process big data is something that is going to evolve and become more prominent in the future. And he says the way it will be done will likely change. "Humans and machines are going to collaborate more," said Schawinski.

He expects to see platforms with machines that can decide which objects need to be looked at closely by humans. In turn, humans will make sure that the machines are working sensibly and that nothing odd or unique is missed, he said.
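A division of labour like the one Schawinski describes could look something like the sketch below. It assumes a machine classifier that reports a confidence score with each label and a fixed cut-off of 0.9; both are hypothetical choices made for illustration, not details of any existing platform.

```python
def route(items, threshold=0.9):
    """Split machine results into auto-accepted labels and items for human review.

    Each item is a (item_id, label, confidence) tuple. The 0.9 default
    threshold is an assumed value, not one taken from a real project.
    """
    auto, review = [], []
    for item_id, label, confidence in items:
        if confidence >= threshold:
            auto.append((item_id, label))
        else:
            review.append(item_id)
    return auto, review


# Hypothetical machine output for three galaxies.
results = [("g1", "spiral", 0.98), ("g2", "elliptical", 0.55), ("g3", "spiral", 0.91)]
auto, review = route(results)
print(auto)    # [('g1', 'spiral'), ('g3', 'spiral')]
print(review)  # ['g2']
```

Only the objects the machine is unsure about reach volunteers, which is one way humans could still catch the odd or unique cases.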

For now, the human mind remains a unique tool for dissecting information. Computers have immense power, but they lack the pattern-recognition skills at which humans excel.

"It is one of our goals to try and capture the intuition that players have and turn that back into algorithms that can be run on computers," said developer Jeff Flatten of Foldit.

In the meantime, the data continues to pile up and scientists are relying on normal citizens like Stephen Brabin to keep helping them out.

 

Brendan McConnell, Brent Boles and Lola Fakinlede are journalism students at the University of Western Ontario -- part of a team producing this Citizen Science series for co-publication by rabble.ca and The Tyee. Listen to their podcast on this topic here.

 

Coming up next week 

The terms citizen journalism and citizen science are relatively new in our vocabulary and could not exist today without the everyday person.

Both continue to arouse the interest and curiosity of the public, particularly those who are engaged in exploring new realms of knowledge. And you don't have to look far to find similarities between the two.

Citizen journalism and citizen science both attract passionate people who care about their communities and collect data that professionals can't or won't.

Citizen journalists often pick up where the mainstream media leave off, telling stories that would otherwise be missed, and citizen scientists collect data in areas that would otherwise go unstudied.

Despite efforts to ensure accuracy by checking and re-checking their facts, both citizen scientists and citizen journalists face questions about the legitimacy of their work. But neither shows any sign of going away soon.

Which prompts some big questions: What does this mean for societies that need good journalism and good science to function well? What can one discipline tell us about the other? And what frustrations and benefits do they share?

Join us next week as we explore all of these questions and more.

 
