Think Stats – Chapter 2 – Exercise 3

In this exercise we will look at how we can use different plots to aid our analysis of the gestation period for live births.

In the last post we represented the data as a probability mass function (pmf). Now we will look at further visualisation techniques to aid our analysis.

As usual, you will find the code for this exercise on GitHub.

Let's start by looking at the difference between representing the data as a histogram and as a pmf.

[Figures: nsfg_hist, nsfg_pmf]

As you can see, the shapes of the histogram and the pmf are comparable, as expected (thank goodness for that). The pmf is simply the histogram normalised by the total number of observations, so its probabilities sum to 1. The description of both plots given in the previous post still holds, so we will leave this visualisation at that.
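To make the normalisation concrete, here is a minimal sketch of going from a histogram to a pmf. It is not the code from the repository: the helper names `make_hist` and `make_pmf` are hypothetical, and the gestation lengths are made up for illustration rather than taken from the NSFG data.

```python
from collections import Counter


def make_hist(values):
    """Count how often each value occurs (hypothetical helper)."""
    return Counter(values)


def make_pmf(hist):
    """Divide each count by the total so the probabilities sum to 1."""
    total = sum(hist.values())
    return {value: count / total for value, count in hist.items()}


# Made-up gestation lengths in weeks, for illustration only (not NSFG data).
weeks = [38, 39, 39, 39, 40, 40, 41, 39]

hist = make_hist(weeks)
pmf = make_pmf(hist)
```

The two objects hold the same information up to a constant factor, which is why the two plots have the same shape and differ only in the scale of the vertical axis.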

Now that we have an idea of what is going on with the distributions, let's focus our attention on the week-by-week difference between first and other live births. I will represent this visually by taking the difference of the two pmfs, multiplying it by 100 to express it in percentage points, and plotting it using our friend matplotlib!
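The calculation described above could be sketched roughly as follows. The function names, the sign convention (first births minus others) and the toy pmfs are all assumptions made for illustration; they are not the code or the data behind the plot in this post.

```python
def pmf_diff_percent(first_pmf, other_pmf, weeks):
    """Difference of two pmfs in percentage points, week by week.

    Assumed sign convention: first births minus others, so a negative
    value means 'others' are more likely in that week.
    """
    return [
        100 * (first_pmf.get(week, 0.0) - other_pmf.get(week, 0.0))
        for week in weeks
    ]


def plot_diffs(weeks, diffs):
    """Draw the differences as a bar chart (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.bar(weeks, diffs)
    plt.xlabel("weeks")
    plt.ylabel("difference (percentage points)")
    plt.show()


# Toy pmfs, invented for illustration only (not NSFG data).
first_pmf = {38: 0.2, 39: 0.5, 40: 0.3}
other_pmf = {38: 0.2, 39: 0.6, 40: 0.2}
weeks = [38, 39, 40]

diffs = pmf_diff_percent(first_pmf, other_pmf, weeks)
# plot_diffs(weeks, diffs)  # uncomment to display the bar chart
```

Using `dict.get` with a default of 0.0 means a week that appears in only one of the two pmfs is treated as having zero probability in the other, rather than raising an error.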

[Figure: nsfg_diffs]

The plot above shows the relationship between first borns and others. We can see that other babies are about 6 percentage points more likely than first borns to be born in the 39th week. After 41 weeks, first borns are more likely to be born. Of course, we have not considered the statistical errors on these bins, or the systematic uncertainties in how the data were recorded.

I am currently a Particle Physics PhD working with the ATLAS Collaboration at CERN
