Visiting Fellow Interview | Lucy Fortson
Lucy Fortson is an observational astrophysicist and a Professor of Physics at the University of Minnesota. She joins Keble this year as a Visiting Fellow and is also a Leverhulme Visiting Professor at the Department of Physics. In addition to her core astrophysics expertise in very-high-energy gamma rays, Dr Fortson is a pioneer in citizen science, combining human and AI strengths to create algorithms and address Big Data challenges.
Let’s start with the basics – what is citizen science?
Citizen science actually goes by many names, including Public Participation in Science, Participatory Research, Crowdsourcing Science, and Volunteer Monitoring. All of these names have in common the invitation for members of the general public to contribute to research, taking advantage of the idea that “many hands make light work”. Typically, citizen science projects fall into one of two modes: data collection or data analysis, with both relying on mechanisms that distribute the tasks to as many people as possible. Often, the aims of data collection citizen science research are related to monitoring the impacts of climate change on specific locations, where the information can be used locally as well as being fed into more global assessments. For example, by submitting their observation records, bird watchers contribute to research into understanding migratory patterns of specific species, both locally and globally.
The data analysis mode of citizen science directly addresses the Big Data challenge of having too few people to analyse huge amounts of data. Thus, the aims of data analysis citizen science projects are typically to collect annotations on data coming in from the full range of sensors contributing to Big Data, including telescopes, microscopes, cameras, biomedical devices—and even the data coming in from data collection citizen science projects. While artificial intelligence (AI) algorithms are becoming increasingly capable of analysing much of this data, a large proportion remains complex and still needs human intervention for analysis and interpretation. What may surprise some people is that contributions from non-experts are very accurate if the tasks are structured properly and the contributions from multiple people are combined.
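To illustrate how combining answers works, here is a minimal sketch (a hypothetical example, not any particular project's actual pipeline; the function name and thresholds are invented) of aggregating independent volunteer labels for a single image by majority vote:

```python
from collections import Counter

def consensus_label(classifications, min_votes=5, agreement_threshold=0.8):
    """Combine independent volunteer labels for one image.

    classifications: list of labels, e.g. ["spiral", "spiral", "elliptical", ...]
    Returns (label, agreement) when enough volunteers agree, otherwise
    (None, agreement) so the image can be shown to more people.
    """
    if len(classifications) < min_votes:
        return None, 0.0  # too few answers yet to trust a consensus
    label, count = Counter(classifications).most_common(1)[0]
    agreement = count / len(classifications)
    if agreement >= agreement_threshold:
        return label, agreement
    return None, agreement

# Hypothetical example: seven volunteers classify the same image
votes = ["spiral", "spiral", "spiral", "elliptical", "spiral", "spiral", "spiral"]
print(consensus_label(votes))  # -> ('spiral', 0.857...)
```

Real projects often go further, for example by weighting volunteers according to how consistent their past answers have been, but even simple majority voting over several independent answers is surprisingly robust.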
Tell us about the Galaxy Zoo project. What were some of the challenges and major successes?
The Galaxy Zoo project grew out of the need to analyse one million images of galaxies taken by the Sloan Digital Sky Survey, which, starting in 2000, was the first large-scale sky survey to process its images digitally rather than via photographic plates. In 2007, Chris Lintott, who was then a post-doctoral scholar here at Oxford, and a graduate student working with him wanted to understand the relationship between the shapes of galaxies (their morphology) and their colour. Galaxies come in two basic shapes—beautiful grand design spirals that are typically (but not always!) blue, due to the large amount of ongoing star formation within them, or round blobs, more elliptical in shape, that are typically red because their star formation has ceased. It turns out that to really understand how galaxies evolve with cosmic time, we need to understand the interplay between their shapes and colours. Colour is an easy attribute for a computer to measure. Shapes—especially complex shapes like the morphology of an individual galaxy—are not. So it was easy to compile a list of the million Sloan galaxy images that registered as “blue” but very hard to compile a list of, say, red spirals or blue ellipticals—exactly the examples needed to best test certain theories of galaxy evolution.

Inspired by the NASA citizen science project Stardust@Home, which had garnered about 20,000 members of the public to help analyse images of dust particles from a comet, the Oxford team built a website to present the Sloan galaxy images to the public, asking them to classify each galaxy as a spiral or an elliptical. Galaxy Zoo went live on 11 July 2007, collecting nearly 30 million classifications from over 100,000 volunteers within two months and changing the way many scientists think about how best to carry out research in the era of Big Data. One of the key successes, apart from the sheer popularity of the project, was the ability of volunteers to find unusual objects in the images and make discoveries that to this day have an enormous impact on galaxy evolution research. For example, Green Peas (small, round galaxies that are green and look like… well, green peas) were noticed by a Dutch elementary school teacher; after discussion with other volunteers on the Galaxy Zoo Forum, and eventually with astronomers from the Galaxy Zoo team, it became clear that these objects were representative of very early galaxies—except they were all in the local universe.
After the success of Galaxy Zoo, you decided to expand on your work with Zooniverse, which is now the largest citizen science portal on the web. Can you tell us a bit about how it works, and what sorts of projects utilise it?
At the time Galaxy Zoo went live, I was Vice President for Research at the Adler Planetarium in Chicago, and had established a Center for Citizen Science working with the Sloan data and other projects. Chris Lintott and I joined forces in June 2008, setting up the partnership that took Galaxy Zoo to the Zooniverse. We first asked volunteers to do more detailed classifications of the Sloan galaxy images—were there barred shapes present, how many spiral arms, how tightly were they wound, and so on. So we knew the volunteers could do more complex tasks. Then, based on the results of a survey suggesting that the main motivation of Galaxy Zoo participants was to contribute to research, we were convinced that the public would be interested in projects other than classifying shapes of galaxies. Zooniverse was launched on 12 December 2009, and within a year we had projects that evaluated simulations of merging galaxies, discovered supernovae (exploding stars) in galaxies, looked for peculiar star-formation bubbles in our Milky Way Galaxy, and searched for planets outside our own Solar System. Closer to home, there were projects that classified images of solar storms (giant eruptions on the Sun that can have damaging impacts if they hit Earth), counted and measured the sizes of craters on the Moon, and asked volunteers to transcribe weather records from World War I Royal Navy ships’ logs, extending temperature records back in time to improve climate change models. And, of course, Galaxy Zoo continued, but with data from the Hubble Space Telescope.

By 2015, Zooniverse had over 50 projects and had branched out well beyond astrophysics, with many projects in ecology, biomedicine and the humanities. Each of these projects took quite a lot of effort for web developers to build so that the research team would get the right type and quality of data back from the crowd. However, even though the data could be wildly different, it was clear that most research teams wanted very similar tasks for the analysis of their data—a question or decision tree, simple marking (a point, a line, a circle or a box), transcription of text, or a survey (where people select from a list, e.g. which animal species is present in an image). In July 2015, we provided a Project Builder toolkit that lets anyone link together sets of tasks, upload their data, create tutorials to train the volunteers on their data and tasks, and, in general, put together their own Zooniverse project. By then, Zooniverse had well over a million volunteers, and the number was still growing. The Project Builder led to an explosion of growth: today, nearly 500 projects have been hosted on the Zooniverse and nearly 3 million volunteers contribute to them. About a billion classifications have been collected across all of our domains.
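To make the idea of linking tasks together concrete, here is a toy sketch of how a workflow built from those task types might be represented. The schema and field names here are invented for illustration; this is not the actual Zooniverse Project Builder format.

```python
# A toy workflow of linked tasks: a question, a marking (drawing) task,
# and a survey-style task. Field names are hypothetical.
workflow = {
    "first_task": "T0",
    "tasks": {
        "T0": {  # question task: a single multiple-choice decision
            "type": "question",
            "question": "Is the galaxy smooth or does it have features?",
            "answers": {"smooth": "T2", "features": "T1"},
        },
        "T1": {  # drawing task: volunteers mark structures on the image
            "type": "drawing",
            "instruction": "Mark the centre of each spiral arm.",
            "tool": "point",
            "next": "T2",
        },
        "T2": {  # survey-style task: pick from a list, then finish
            "type": "survey",
            "question": "Anything unusual in this image?",
            "choices": ["nothing", "overlapping galaxies", "lens or arc", "other"],
            "next": None,
        },
    },
}

def next_task(workflow, task_id, answer=None):
    """Follow the workflow: return the id of the task that comes next."""
    task = workflow["tasks"][task_id]
    if task["type"] == "question":
        return task["answers"][answer]
    return task["next"]

print(next_task(workflow, "T0", answer="features"))  # -> 'T1'
```

The point of a structure like this is that a research team only has to describe their questions and marking tools; the platform takes care of serving images, recording answers, and combining them across volunteers.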
You’ve said that humans still have the edge over machines, which is a comforting thought, but can you tell us a bit about the role machine learning does have to play?
Interestingly, the history of the Zooniverse parallels the advancements in AI, in particular as applied to science. Many of the Zooniverse classifications have been used to train AI algorithms that now stand as the de facto AI for their field, such as for determining animal presence in the billions of camera trap images flooding the world. Over the past decade, since my move to the University of Minnesota, I have been working to incorporate machine intelligence alongside its human counterparts in the platform, an approach known as “human-in-the-loop” AI. We need to find ways to keep up with the ever-growing amounts of data—our new astronomical surveys are producing billions of galaxy images, compared to the million images from the Sloan Digital Sky Survey back when we first started. Even if we had all the people on the planet participating, we would not be able to look at all these images individually. So we must apply AI to accelerate the analysis of the data. But this is where things get interesting—AI is only as good as its training data. It only knows what it has been told about. Machines are good at brute-force work; they are not good at looking at an image and recognising that there is something unusual in it, or at reasoning about what that unusual something might be and whether it really is a scientific discovery, such as the Green Pea galaxies. We need (still!) humans to find the scientific needles in the giant haystacks of data.

So we are working out ways for the machine to confidently classify all of the common things, but send to the humans those images that are most likely to contain things it doesn’t know about. But to guide the machine in figuring out what it needs help with, we are also learning to quantify how our volunteers collectively make decisions about what they think is unusual. The machine can then select the images with the highest probability of containing something interesting for vetting by the volunteers—and then, once one example of something new has been found by the humans, the machine can look for all similar examples in the data set, giving these new examples to the volunteers for further vetting. Even in the near-ish future, when machines get really good at figuring out what is unusual in a data set, we will still need humans to interpret what those unusual things might be. After all, though we may be aided more and more by machines to gather and make sense of data, science is really a human endeavour at its heart.
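As a rough illustration of that routing idea, here is a minimal sketch of splitting a batch of model predictions into images the machine handles on its own and images sent to volunteers, with the least confident (and therefore most likely unusual) images shown first. The function, the confidence cutoff, and the example numbers are all invented for illustration; this is not the Zooniverse implementation.

```python
import numpy as np

def route_images(class_probabilities, confident_cutoff=0.95):
    """Split model predictions into a machine queue and a human-review queue.

    class_probabilities: array of shape (n_images, n_classes) holding the
    classifier's probability for each known class on each image.
    Images the model is very sure about are accepted automatically; the rest,
    which are more likely to contain something the model was never trained on,
    are routed to volunteers, least confident first.
    """
    top_prob = class_probabilities.max(axis=1)
    machine_done = np.flatnonzero(top_prob >= confident_cutoff)
    needs_humans = np.flatnonzero(top_prob < confident_cutoff)
    needs_humans = needs_humans[np.argsort(top_prob[needs_humans])]
    return machine_done, needs_humans

# Hypothetical batch of four images and three known classes
probs = np.array([[0.98, 0.01, 0.01],   # confident -> machine keeps it
                  [0.40, 0.35, 0.25],   # unsure -> volunteers see it first
                  [0.96, 0.02, 0.02],   # confident -> machine keeps it
                  [0.60, 0.30, 0.10]])  # somewhat unsure -> volunteers, later
machine, humans = route_images(probs)
print(machine, humans)  # -> [0 2] [1 3]
```

Once volunteers confirm that one of those uncertain images really does contain something new, its characteristics can be used as a query to pull similar images out of the rest of the data set for further vetting, as described above.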