
Student's t distribution¶

Student's t distribution is widely used in hypothesis testing and plays a central role in the popular t-test.
It describes the standardized mean of a sample drawn from a normally distributed population when the population standard deviation is unknown and has to be estimated from the sample. The larger the sample (and hence the degrees of freedom), the more closely the t distribution resembles a normal distribution.
You can read historical and scientific details of the distribution here (Wikipedia).

t probability density function¶

                                  gamma((df+1)/2)
    t.pdf(x, df) = ---------------------------------------------------
                   sqrt(pi*df) * gamma(df/2) * (1+x**2/df)**((df+1)/2)
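
As a sanity check, the formula above can be evaluated directly and compared with scipy's own density. This is a minimal sketch: the helper name t_pdf_manual and the test grid are illustrative choices, not part of scipy.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import t


def t_pdf_manual(x, df):
    """Evaluate the t density term by term from the formula (illustrative helper)."""
    numerator = gamma((df + 1) / 2)
    denominator = (
        np.sqrt(np.pi * df) * gamma(df / 2) * (1 + x**2 / df) ** ((df + 1) / 2)
    )
    return numerator / denominator


x = np.linspace(-4, 4, 9)      # a small grid of test points
df = 20
manual = t_pdf_manual(x, df)   # formula written out by hand
library = t.pdf(x, df)         # scipy's implementation

print(np.allclose(manual, library))
```

For large df the gamma terms overflow, so the hand-written version is only suitable for checking small cases like this one.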

df: degrees of freedom¶

Besides loc and scale, the t distribution takes one important shape parameter: df (short for 'degrees of freedom').
Simply put, 'df' in statistics is the number of values in a calculation that are free to vary without changing the result of the calculation.
- For example, when the mean M of a sample of size N is held fixed, only N-1 of the values in the sample can vary freely. The N-th value is determined by the other N-1 values.
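
The mean example can be checked with a few lines of arithmetic. The sample values below are made up purely for illustration; the point is that once M and N-1 values are known, the last value is no longer free.

```python
import numpy as np

sample = np.array([4.0, 7.0, 1.0, 9.0, 4.0])  # made-up values, N = 5
N = sample.size
M = sample.mean()

# Pretend only the mean and the first N-1 values are known.
known = sample[:-1]

# The sum of all values must equal N * M, so the last value is fixed.
last_value = N * M - known.sum()

print(last_value, sample[-1])
```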
df is a shape parameter in both the t and F distributions, which are commonly used in hypothesis tests.
Read more on degrees of freedom here (Wikipedia).

Sample statistics of t¶

from scipy.stats import t

# Freeze for df=20, loc=0, scale=1 and get the first four moments from stats()
rv = t(df=20, loc=0, scale=1)
mean, var, skew, kurt = rv.stats(moments='mvsk')
mean, var, skew, kurt

(array(0.0), array(1.1111111111111112), array(0.0), array(0.375))
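
The variance and kurtosis above match the standard closed forms for the t distribution: the variance is df/(df-2) for df > 2, and the excess kurtosis (which is what stats() reports) is 6/(df-4) for df > 4. A small sketch that compares the two, with variable names chosen here for illustration:

```python
import numpy as np
from scipy.stats import t

df = 20
mean, var, skew, kurt = t(df).stats(moments='mvsk')

expected_var = df / (df - 2)    # 20 / 18
expected_kurt = 6 / (df - 4)    # 6 / 16, excess kurtosis

print(float(var), expected_var)
print(float(kurt), expected_kurt)
```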

Plotting t¶

import numpy as np
from scipy.stats import t
import matplotlib.pyplot as plt
%matplotlib inline

rv = t(df=20, loc=0, scale=1)
x = np.linspace(rv.ppf(0.0001), rv.ppf(0.9999), 100)
y = rv.pdf(x) 

plt.xlim(-5,5)
plt.plot(x,y)

[<matplotlib.lines.Line2D at 0x816b9b0>]

For practice, freeze a normal distribution and plot it in the same graph.
Then run the code for various values of df and observe how t gets closer to norm as df increases.
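
As a numeric companion to the plot, the gap between the two densities can be measured directly. This is only a sketch: the grid and the list of df values are arbitrary choices, and the largest pointwise difference is just one possible measure of closeness.

```python
import numpy as np
from scipy.stats import norm, t

x = np.linspace(-5, 5, 501)   # evaluation grid (arbitrary choice)
dfs = [1, 5, 20, 100]         # degrees of freedom to compare

# Largest pointwise difference between the t and standard normal densities
gaps = [np.max(np.abs(t.pdf(x, df) - norm.pdf(x))) for df in dfs]

for df, gap in zip(dfs, gaps):
    print(f"df={df:>3}: max |t.pdf - norm.pdf| = {gap:.4f}")
```

The printed gap shrinks as df grows, which is the behaviour the plot should show visually.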

Probability to pass a cutoff value¶

a = 0.05  # set the cutoff

rv = t(df=20, loc=0, scale=1)
x = np.random.normal(size=1)

p = rv.sf(x)  # survival function: equals 1 - rv.cdf(x), but is sometimes more accurate

if p < a:
    print('Cutoff at: ', x, p)
else:
    print('No cutoff', x, p)

No cutoff [ 0.46958638] [ 0.32186492]
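
The accuracy difference between sf and 1 - cdf only shows up far out in the tail. The sketch below uses a deliberately extreme value (x_far is a hypothetical choice for illustration): there cdf is so close to 1 that the subtraction loses the tail probability, while sf still returns a small positive number.

```python
from scipy.stats import t

rv = t(df=20, loc=0, scale=1)

x_far = 50.0                   # hypothetical extreme value, far in the upper tail
p_sf = rv.sf(x_far)            # tail probability computed directly
p_naive = 1 - rv.cdf(x_far)    # cdf rounds to (nearly) 1.0, so the tail is lost

print(p_sf, p_naive)

# For moderate values the two approaches agree closely.
x_mid = 1.0
print(rv.sf(x_mid), 1 - rv.cdf(x_mid))
```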
