Sunday, 22 January 2017

Birdwatching

The buildings where the statistics classes are held are out in the countryside. After a really productive morning it looked like we'd finish early, so a friend and I went for a walk around a local lake. I've done a lot of garden birdwatching over the years, but water birds are mostly a mystery, so I hoped to see something new and exciting.

There was a lot of ice about. We got excited when we heard movement in the trees down a bank, but it turned out to be an ice sheet moving underneath, catching low-hanging twigs. We then found some mallards, but, y'know, mallards.

Eventually, the ice gave way to open water, and we came upon half a dozen great crested grebes and a big flock of tufted duck. I'd only seen grebes from a distance before, so that was exciting, and tufted duck were completely new. The males are black with bold white flanks, and both sexes have the eponymous tuft off the backs of their heads. Fabulous!

My friend is a keen photographer. Here are a couple to set the scene:

 



A beautiful close-up of two of the grebes:

And a lovely shot of the two of us:
Here at Organising Life, we respect privacy.

Wednesday, 18 January 2017

Modelling and statistics

This week we've been back in the wonderful world of statistics. I quite like this stuff. One particularly important concept that's really been driven home is that anything beyond the most basic of statistics is actually a type of modelling. But modelling involves models, which by definition aren't real, so how do they help to analyse real data?

The most basic: descriptive statistics
First, a quick reminder of the sorts of statistics most people are familiar with. These are called descriptive because they simply look at the data and tell you exactly what's there.

These describe the centrality (average):
  • Mean ('average')
  • Median (middle value if you listed them all in order)
  • Mode (most common value)

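These describe the spread (dispersion):
  • Range (difference between the largest and smallest)
  • Variance (roughly the average squared distance of each value from the mean)
  • Standard deviation (square root of the variance, so it's in the same units as the data)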
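As a rough illustration (a sketch, not anything from the course), Python's built-in statistics module can calculate all six of these. The measurements and the helper name `describe` are made up for the example.

```python
import statistics

def describe(data):
    """Return the descriptive statistics listed above for a list of numbers."""
    return {
        # Centrality
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Spread
        "range": max(data) - min(data),
        # Sample variance: sum of squared deviations from the mean, divided by n - 1
        "variance": statistics.variance(data),
        # Standard deviation: square root of the variance, in the data's own units
        "standard deviation": statistics.stdev(data),
    }

# Hypothetical measurements, invented purely for illustration
measurements = [3, 5, 5, 6, 8, 9, 13]
for name, value in describe(measurements).items():
    print(f"{name}: {value:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` use the sample formula (dividing by n − 1); `statistics.pvariance` and `statistics.pstdev` are the versions for a whole population.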

These might be simple, but they can be very helpful. I'm always finding the means and standard deviations of things. Their limitation is that they do just describe the data, whereas in biology we usually want to be able to take our data and make predictions from it. You might have studied the amount of cabbage eaten by caterpillars reared at 10°C, 15°C and 20°C, but what about temperatures in between? It's sensible to use your data to predict that, but descriptive statistics won't help you do it.

The ones that use models: parametric statistics
Here's some fake data about a particularly difficult video game:

16 people each played one of the game's eight levels, and the number of times their character died before they managed to finish the level was recorded. It seems that, in general, the higher levels were more difficult.




Unfortunately, the researcher had left her glasses at home and didn't realise that only every other level was tested. What about the missing levels?

The statistically knowledgeable reader might suggest drawing a line of best fit:

Now we aren't limited to reading up from existing x values. We can read from anywhere, because the line of best fit should represent the general pattern of the data. These in-between values aren't real, but from what we already know we can make a reasonable guess at what they are. For example, it looks like someone playing level three can expect to die seven or eight times.
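Here's a minimal sketch of drawing a line of best fit by ordinary least squares in Python. The original data table isn't reproduced here, so the deaths below are invented, assuming four players on each of levels 2, 4, 6 and 8; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = how x and y vary together, divided by how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented data: four players on each of the tested levels 2, 4, 6 and 8
levels = [2] * 4 + [4] * 4 + [6] * 4 + [8] * 4
deaths = [4, 6, 5, 7, 9, 11, 8, 10, 12, 15, 13, 14, 18, 16, 20, 17]

intercept, slope = fit_line(levels, deaths)
print(f"deaths = {intercept:.3f} + {slope:.4f} * level")
for level in (3, 5, 7):  # the untested levels: read predictions off the line
    print(level, round(intercept + slope * level, 1))
```

With these made-up numbers, level 3 comes out at about 7.5 deaths, in line with the "seven or eight" read off the graph.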

This is exactly what a model does: the line of best fit is a model of the data. I've been learning to use linear models, which are lines drawn to fit the data as closely as possible. The better the model fits the data, the more faithful its predictions will be, so the more we can trust them. Parametric statistics are so named because they summarise the data with a small, fixed set of parameters, like the mean and the variance (or, for a straight line, its slope and intercept).


A line fits this data pretty well, but what about data that follows different shapes, like arcs? Mathematically, you describe an arc in a similar way to a straight line, but with an extra parameter: a straight line, y = a + bx, has two parameters (the intercept a and the slope b), while an arc such as y = a + bx + cx² needs a third. To make a line with multiple bends you add more parameters, and so on. The video game data isn't a perfect straight line, so you might extend its model to end up with something like this:
Or even this:


The last one might describe the data perfectly. However, it's going to be extremely complicated and difficult to make predictions from. This model is as impenetrable as real life. That's not the point.
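To see the trade-off in numbers, here's a sketch using numpy's `polyfit` on four invented level averages (not real data; `fit_polynomial` is a hypothetical helper name). Each extra parameter lets the curve bend once more, and the residual sum of squares (how much of the data the model misses) shrinks until, with as many parameters as data points, it is essentially zero.

```python
import numpy as np

# Invented average deaths for the four tested levels (not real data)
levels = np.array([2, 4, 6, 8])
mean_deaths = np.array([5.0, 10.0, 12.0, 18.0])

def fit_polynomial(degree):
    """Least-squares polynomial fit with (degree + 1) parameters.

    Returns the coefficients (highest power first, as numpy.polyfit does)
    and the residual sum of squares of the fitted values.
    """
    coeffs = np.polyfit(levels, mean_deaths, degree)
    fitted = np.polyval(coeffs, levels)
    rss = float(np.sum((mean_deaths - fitted) ** 2))
    return coeffs, rss

# Straight line (2 parameters), arc (3), double bend (4)
for degree in (1, 2, 3):
    coeffs, rss = fit_polynomial(degree)
    print(f"{degree + 1} parameters: RSS = {rss:.3f}, coefficients = {np.round(coeffs, 3)}")
```

The four-parameter curve threads through every point, but only because it has one parameter per data point: it has memorised these four invented averages rather than the general trend.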

So, when you make a model to pull out the gist of data, there's a trade-off between making it realistic enough for trustworthy predictions and simple enough to be useful. Statistics involves using numbers at exactly the right level of pretendness.

Monday, 9 January 2017

Lactose tolerance: how humans evolved to drink milk

A big part of my course is working with DNA, but you don't just use DNA to identify things. An interesting topic that I've been revising lately is how you can see which genes in a population have been evolving recently by comparing DNA between individuals. Genomes (the entire DNA sequence of an individual) collected in the present can say quite a lot about the past. One particularly nice story is how some humans recently evolved to digest milk as adults, something which only matters to people who herd animals. These people adapted to a new environment through natural selection, and that new environment was a step away from hunter-gathering and towards agriculture.

What's important about digesting milk?
All mammals start life getting all of their nutrition from milk. The main source of carbohydrate in milk is the molecule lactose, which is built from two simple sugars (galactose and glucose) joined together. Inside the baby animal, the enzyme lactase breaks lactose molecules apart to release the simple sugars, which are small enough to be absorbed through the gut wall. Once the animal can feed itself and stops drinking milk, it no longer needs to produce lactase because lactose isn't found in other foods, so lactase production stops.
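In equation form, the reaction lactase speeds up is a hydrolysis: one water molecule is used to split lactose into its two component sugars, which happen to share the same molecular formula.

```latex
\mathrm{C_{12}H_{22}O_{11}} + \mathrm{H_2O}
\xrightarrow{\text{lactase}}
\underbrace{\mathrm{C_6H_{12}O_6}}_{\text{galactose}} + \underbrace{\mathrm{C_6H_{12}O_6}}_{\text{glucose}}
```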

The same is true of most humans. Give a typical adult a drink of milk, and the lactose will pass through the small intestine undigested. They won't be able to absorb the sugars, so they won't get as much energy from the milk as a baby would. Too much unabsorbed lactose can also cause problems [1]:
  • It lowers the water potential of the gut contents, so more water is drawn into the gut by osmosis. This can cause diarrhoea.
  • It can be broken down by certain gut bacteria, which release hydrogen and carbon dioxide as by-products, causing pain and bloating.
These are symptoms of lactose intolerance, and they're not pleasant. But because humans have spent almost all of our evolutionary history not consuming lactose as adults, it wasn't an issue until recently.

Why did lack of lactase become an issue?
The difference came when some people began herding animals and recognised milk as a source of food for everyone rather than just infants. Pastoralism (animal husbandry) developed independently in the Middle East about 10,000 years ago and in Africa about 8,000 years ago, and has flourished since.

Just because your culture keeps cattle doesn't mean you need to drink milk to survive, but one argument I read (too long ago to find the source again) is that milk would have been an important hardship food. Even when staple plant foods are hard to find, livestock can keep producing milk (to an extent). In times of hardship, people who could get by on milk and not much else would have had a significant advantage over those who got ill by doing so, even if they performed equally when there were plenty of alternative foods. Milk could also have been a necessary long-term supplement to a poor diet.

In any case, lactase persistence evolved independently about five times in humans, so digesting milk must have been important. We say that there was a selection pressure: natural selection favoured those who, by some genetic quirk, produced lactase for longer. Those people tended to have more children (because they weren't ill or dead), so their genetic quirk for lactase persistence also persisted.
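A toy way to see how "more children" turns into "persisted": in a simplified one-copy (haploid) model of selection, which is my assumption and ignores that humans carry two copies of each gene, a variant whose carriers leave (1 + s) offspring for every one left by everyone else grows in frequency each generation. `next_frequency` and `generations_to_reach` are hypothetical helper names, and the 1% starting frequency and 5% advantage are made-up numbers, not estimates for lactase persistence.

```python
def next_frequency(p, s):
    """One generation of selection in a simplified haploid model.

    Carriers (frequency p) leave (1 + s) offspring for every 1 left by
    non-carriers, so the carriers' share of the next generation is
    p(1 + s) / (p(1 + s) + (1 - p)).
    """
    return p * (1 + s) / (p * (1 + s) + (1 - p))

def generations_to_reach(p0, target, s):
    """Count generations until the variant's frequency reaches the target."""
    p, generations = p0, 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations

# Hypothetical: a rare variant (1%) with a 5% reproductive advantage
print(generations_to_reach(0.01, 0.5, 0.05))
```

With those invented numbers the variant goes from 1% to over half the population in under a hundred generations; a smaller advantage just stretches the timescale.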

How do we know this happened?
Because some people today aren't lactose intolerant.

Yes, but what's the science?
I'm glad you asked. A key feature of a gene that's been under recent natural selection is that it's very similar across the population. Imagine a population where individuals come in many different colours. Their hypothetical 'colour gene' would be very varied, because across the population there would be some blue copies, some red copies, some green copies, and so on. Now imagine a terrible disease arrives that wipes out every individual apart from the yellow ones. The survivors might be quite a mixed bunch, but they would all have one thing in common: the yellow variant of the colour gene. The colour gene now has no variation at all, because all of its copies are the same.
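To put a number on "variation", one simple measure (chosen here for illustration; it isn't necessarily the one used in the study described next) is gene diversity: the chance that two copies of the gene picked at random are different variants, 1 − Σpᵢ², where pᵢ is the share of copies that are variant i. A minimal Python sketch of the colour thought experiment, assuming each individual carries a single copy and starting from a made-up even mix of four colours:

```python
from collections import Counter

def gene_diversity(copies):
    """Probability that two copies drawn at random (with replacement) differ: 1 - sum(p_i^2)."""
    n = len(copies)
    counts = Counter(copies)
    return 1 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical 'colour gene' copies before the disease: an even mix of four colours
before = ["blue"] * 25 + ["red"] * 25 + ["green"] * 25 + ["yellow"] * 25
# The disease wipes out everyone except the yellow individuals
after = [colour for colour in before if colour == "yellow"]

print(gene_diversity(before))  # 0.75
print(gene_diversity(after))   # 0.0
```

Selection on the colour gene takes its diversity from 0.75 to zero, which is the signature the next paragraph looks for in real genomes.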

A similar thing happens in real populations. One study [2] compared the genomes of people from Kenya and Tanzania. Kenyan people generally have a high frequency of lactase persistence, and Tanzanians virtually none. The amount of variation in the lactase gene was much lower in the persistent population, showing that the gene had been under recent selection there. There hadn't been any clear selection on it in the non-persistent population, because they hadn't been under pressure to produce lactase differently. The researchers could even use the amount of variation to date when most people in the Kenyan population became lactase-persistent: 5,000-8,000 years ago. This matches really well with the date of pastoralism on the continent.

Conclusion
Lactase-persistent populations are better adapted than non-persistent populations to drinking milk, so they're better adapted to an environment where drinking milk is important. Natural selection was still driving evolution in humans even in the first stages of agriculture, which was just a very, very short time ago in the grand scheme of things.

References
  1. Lomer, Parkes & Sanderson (2007) "Review article: lactose intolerance in clinical practice – myths and realities" http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2036.2007.03557.x/full
  2. Tishkoff et al. (2006) "Convergent adaptation of human lactase persistence in Africa and Europe" http://www.nature.com/ng/journal/v39/n1/full/ng1946.html

Mystery plant thing II

After three weeks, has anything grown from that enigmatic lump of compost?

Yes, but probably not what the manufacturer intended. There are about half a dozen little seedlings, all sprouting from the compost shell rather than the little mesh bag. But often seeds take longer than this, so there's a chance the main player might still be alive. I'm considering digging it out before I go back to London to see if I can identify it, and maybe give it more appropriate conditions. This windowsill is not known for keeping plants alive!