Short: A Portugal-based group is suggesting that winemakers could have more useful information for choosing a yeast strain if scientists did a better job of putting together data from different kinds of experiments.
Scientific research generates a lot of different shapes and sizes of data. How does anyone make it work together?
Contemporary scientific research has a lot of big challenges, but here are three: funding, replicability, and integration. Funding is a great big gory topic for another day.
Replicability has seen a lot of attention in recent science news: scientists across disciplines have been reporting difficulty duplicating their colleagues’ results when they try to repeat the same experiments. This is worrisome. (Most) science is supposed to be about making observations about the world that remain the same regardless of who makes them. Two careful people should be able to do the same experiment in two different places and obtain the same results. Well-trained scientists, however, are finding themselves unable to replicate the results described in scientific papers, and the community isn’t sure what to do about it.
Integration – how to fit together large amounts of many different kinds of data – looks like a separate kind of problem. Scientists (microbiologists, biochemists, systems biologists, geneticists, physicists…) study a thing – yeast, say – in many, many different ways. They generate data in many different shapes and sizes, using all manner of instruments to produce numbers that don’t line up tidily with each other. But, at least in theory, all of those data are about the same thing – the same yeast – and so finding ways to integrate data from different kinds of experiments should massively improve our understanding of how yeast works as a whole.
The problem is a bit like trying to compile lots of different kinds of pictures of a large building – photos from outside and from inside, satellite images, historic accounts of parties hosted there, watercolors of the grounds, plumber’s bills, paint chips from the last remodel – into a single detailed, coherent model of the structure. Sometimes a floor plan and a picture of the outside are enough to decide to rent a house. But occasionally you’re going to move in, realize that the living room is wallpapered pink or that every room smells like cigar smoke, and find that you have a disaster on your hands that more information could have averted.
A Portugal-based group of molecular biologists and biotechnologists has suggested that winemakers might have fewer fermentation disasters if scientists did a better job of integrating the different kinds of pictures they take of wine yeast. This, they note, is a “data resource” problem. Solutions lie not necessarily in doing better or different scientific research,* but in using computational or informatic tools to find points of alignment across existing kinds of data. The method they offer stands out because it can find correlations across not just two kinds of data, but three or more, and across large amounts of each. One of the interesting things about the example they use to demonstrate the method is that it aligns data about yeast behavioral characteristics – qualities like low hydrogen sulfide production** – with data about genetic variability. This kind of information might help wine yeast developers increase genetic variability in yeast strains by making it easier to assess large numbers of potential strains for the right combination of good winemaking characteristics and genetic diversity. And, consequently, their analyses could help winemakers form a more complete idea of what to expect from the yeast they choose to use.
What’s most interesting about this paper, though, is the way it points out that integration and replicability aren’t entirely separate issues. Yes, scientists doing precisely the same thing should arrive at precisely the same results. But how often do scientists do precisely the same thing? Even in trying to repeat “the same experiment,” unaccounted-for differences might interfere and yield different results. And maybe those kinds of differences are more troublesome when other living things – like yeast cells – are also participating in the experiment, compelled or willing to cooperate with the scientist to some extent but still also doing their own thing. So, a different but related question is: can the results of multiple sets of experiments make sense together? Having better computational methods for lining up different kinds of data makes it easier to find out.
*Though experiments could surely be designed so that results are easier to put together with the results of other experiments, which is very much a scientific problem.
**Important if you want to avoid making wine that smells like rotten eggs.