
# Student's t distribution

• This is a widely used distribution in hypothesis testing that plays the central role in the very popular t-test.
• The t distribution describes the standardized sample mean of samples drawn from a normally distributed population when the population standard deviation is unknown. The larger the sample size, the more closely the t distribution resembles the normal distribution.
• You can read historical and scientific details of the distribution here (Wikipedia).

### t probability density function

                               gamma((df+1)/2)
t.pdf(x, df) = ---------------------------------------------------
               sqrt(pi*df) * gamma(df/2) * (1+x**2/df)**((df+1)/2)
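As a quick check, the density above can be transcribed directly with `scipy.special.gamma` and compared against `t.pdf` (a small sketch, not part of the original notebook; the helper name `t_pdf_manual` is made up):

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import t

def t_pdf_manual(x, df):
    # Direct transcription of the density formula above
    return gamma((df + 1) / 2) / (
        np.sqrt(np.pi * df) * gamma(df / 2) * (1 + x**2 / df) ** ((df + 1) / 2)
    )

x = np.linspace(-4, 4, 9)
print(np.allclose(t_pdf_manual(x, 20), t.pdf(x, df=20)))  # True
```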

### df: degrees of freedom

• Besides loc and scale, the t distribution takes one important shape parameter: df ('degrees of freedom').
• Simply put, 'df' in statistics is the number of values in a calculation that are free to vary without changing the result of the calculation.
• For example, when the mean M of a sample of size N is held fixed, only N-1 of the sample values can vary freely; the N-th value is determined by the other N-1.
• df is a shape parameter in both the t and F distributions, which are commonly used in hypothesis tests.
• Read more on degrees of freedom here (Wikipedia).
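The N-1 idea from the example above can be illustrated with a tiny sketch (the sample values here are made up):

```python
import numpy as np

sample = np.array([2.0, 5.0, 9.0, 4.0])  # a made-up sample, N = 4
M = sample.mean()                        # M = 5.0

# Fix M and the first N-1 values: the N-th value is then determined
forced = len(sample) * M - sample[:-1].sum()
print(forced == sample[-1])  # True
```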

### Sample statistics of t

In [1]:
from scipy.stats import t

# Freeze for df=20, loc=0, scale=1 and get the first four moments from stats()
rv = t(df=20, loc=0, scale=1)
mean, var, skew, kurt = rv.stats(moments='mvsk')
mean, var, skew, kurt

Out[1]:
(array(0.0), array(1.1111111111111112), array(0.0), array(0.375))
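These moments can be cross-checked against the closed-form expressions for the t distribution, variance = df/(df-2) for df > 2 and excess kurtosis = 6/(df-4) for df > 4 (a small sketch, not part of the original notebook):

```python
from scipy.stats import t

df = 20
rv = t(df=df)
mean, var, skew, kurt = rv.stats(moments='mvsk')

# Closed-form moments of the t distribution:
#   variance = df / (df - 2)        (defined for df > 2)
#   excess kurtosis = 6 / (df - 4)  (defined for df > 4)
print(float(var), df / (df - 2))   # 1.1111111111111112 for both
print(float(kurt), 6 / (df - 4))   # 0.375 for both
```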

### Plotting t

In [2]:
import numpy as np
from scipy.stats import t
import matplotlib.pyplot as plt
%matplotlib inline

rv = t(df=20, loc=0, scale=1)
x = np.linspace(rv.ppf(0.0001), rv.ppf(0.9999), 100)
y = rv.pdf(x)

plt.xlim(-5, 5)
plt.plot(x, y)

Out[2]:
[<matplotlib.lines.Line2D at 0x816b9b0>]
• For practice, freeze a normal distribution and plot it on the same graph.
• Then rerun the code for various df values and observe how t gets closer to norm as df increases.
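One possible sketch of the practice exercise above, overlaying a frozen normal on a low-df t (df=3 is chosen here, an assumption of this example, so the heavier tails are clearly visible):

```python
import numpy as np
from scipy.stats import t, norm
import matplotlib.pyplot as plt

rv_t = t(df=3, loc=0, scale=1)   # low df: tails differ visibly from the normal
rv_n = norm(loc=0, scale=1)

x = np.linspace(-5, 5, 200)
plt.plot(x, rv_t.pdf(x), label='t, df=3')
plt.plot(x, rv_n.pdf(x), label='norm')
plt.legend()

# The normal peaks higher at 0, while t carries more mass in the tails
print(rv_n.pdf(0) > rv_t.pdf(0), rv_t.pdf(4) > rv_n.pdf(4))  # True True
```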

### Probability to pass a cutoff value

In [3]:
a = 0.05  # significance level (the cutoff)

rv = t(df=20, loc=0, scale=1)
x = np.random.normal(size=1)

p = rv.sf(x)  # equal to, but sometimes more accurate than, '1 - rv.cdf(x)'

if p < a:
    print('Cutoff at: ', x, p)
else:
    print('No cutoff', x, p)

No cutoff [ 0.46958638] [ 0.32186492]
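A related sketch: rather than checking one random draw, the cutoff can be expressed in x-space with ppf, the inverse of the cdf (the two-sided critical value shown here is an illustration added to the original cell):

```python
from scipy.stats import t

rv = t(df=20, loc=0, scale=1)
a = 0.05

# ppf inverts the cdf: draws beyond +/-crit are rejected at level a (two-sided)
crit = rv.ppf(1 - a / 2)
print(round(float(crit), 3))  # 2.086
```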