Student's t distribution

  • Student's t distribution is widely used in hypothesis testing and plays a central role in the popular t-test.
  • The t distribution arises when the mean of a normally distributed population is estimated from a small sample whose standard deviation is unknown: the sample mean, standardized with the sample standard deviation, follows a t distribution with N-1 degrees of freedom, where N is the sample size. The larger the degrees of freedom, the more the t distribution resembles the standard normal distribution.
  • Historical and scientific details of the distribution can be found on Wikipedia.
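The link between small normal samples and the t distribution can be checked by simulation. The sketch below is a standard-library-only illustration, not part of the original tutorial: the helper `t_statistic`, the population parameters (mean 10, standard deviation 3) and the sample size of 5 are arbitrary assumptions. It draws many samples of size 5 from a normal population, computes the one-sample t statistic for each, and counts how often the statistic falls beyond the two-sided 5% critical values of the t distribution (df = 4) and of the standard normal distribution.

```python
import math
import random

def t_statistic(sample, mu):
    """One-sample t statistic: (mean - mu) / (s / sqrt(n)), with s the n-1 sample std."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((v - m) ** 2 for v in sample) / (n - 1))
    return (m - mu) / (s / math.sqrt(n))

random.seed(0)
mu, sigma = 10.0, 3.0   # assumed population parameters (arbitrary)
n = 5                   # small sample size, so df = n - 1 = 4
reps = 20000

t_values = [t_statistic([random.gauss(mu, sigma) for _ in range(n)], mu)
            for _ in range(reps)]

# Two-sided 5% critical values from standard tables
t_crit = 2.776  # t distribution, df = 4
z_crit = 1.960  # standard normal

frac_beyond_t = sum(abs(v) > t_crit for v in t_values) / reps
frac_beyond_z = sum(abs(v) > z_crit for v in t_values) / reps
print(frac_beyond_t)  # expected to be close to 0.05
print(frac_beyond_z)  # expected to be well above 0.05 (heavier tails)
```

The t critical value captures about 5% of the simulated statistics, while the normal critical value lets through roughly 12%, which is why the t distribution rather than the normal one is used for small samples.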

t probability density function

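$$
\mathrm{t.pdf}(x, \mathrm{df}) = \frac{\Gamma\left(\frac{\mathrm{df}+1}{2}\right)}{\sqrt{\pi\,\mathrm{df}}\;\Gamma\left(\frac{\mathrm{df}}{2}\right)\left(1 + \frac{x^{2}}{\mathrm{df}}\right)^{\frac{\mathrm{df}+1}{2}}}
$$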

df: degrees of freedom
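The density formula can be evaluated directly in plain Python. This is a minimal sketch using only the standard library; `t_pdf` and `integrate` are illustrative helper names, not scipy functions. It works in log space with `math.lgamma`, since `math.gamma` overflows once df/2 exceeds about 171. Two sanity checks follow: for df = 1 the t distribution is the Cauchy distribution, whose density at 0 is 1/pi, and the density should integrate to about 1.

```python
import math

def t_pdf(x, df):
    """Student's t density (loc=0, scale=1), evaluated from the formula above in log space."""
    log_norm = (math.lgamma((df + 1) / 2)
                - math.lgamma(df / 2)
                - 0.5 * math.log(math.pi * df))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def integrate(f, a, b, steps=40000):
    """Simple trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * (f(a) + f(b)) + inner)

# df = 1 is the Cauchy distribution: density at 0 equals 1/pi
print(t_pdf(0.0, 1), 1 / math.pi)

# The density for df = 20 should integrate to ~1 (tails beyond +/-200 are negligible)
area = integrate(lambda x: t_pdf(x, 20), -200.0, 200.0)
print(area)
```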

  • Besides loc and scale, the t distribution takes one shape parameter: df (degrees of freedom).
  • Simply put, df in statistics is the number of values in a calculation that are free to vary without changing the result of the calculation.
    • For example, if the mean M of a sample of size N is held fixed, only N-1 of the sample values can vary freely; the N-th value is determined by M and the other N-1 values.
  • df is a shape parameter of both the t and F distributions, which are commonly used in hypothesis tests.
  • More on degrees of freedom can be found on Wikipedia.
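The mean example above can be made concrete with a small made-up sample (the values are arbitrary). Once the mean and N-1 of the values are fixed, the last value is forced. The same counting is why the sample variance divides the sum of squared deviations by N-1 rather than N, and why the one-sample t statistic has N-1 degrees of freedom.

```python
import statistics

sample = [4.0, 7.0, 1.0, 8.0, 5.0]  # arbitrary example values
n = len(sample)
mean = sum(sample) / n              # M = 5.0

# Fix M and the first n-1 values: the last value is no longer free.
free_values = sample[:-1]
last_value = n * mean - sum(free_values)
print(last_value)                   # 5.0, identical to sample[-1]

# One value is "used up" by the mean, so the sample variance divides by n - 1.
ss = sum((v - mean) ** 2 for v in sample)
print(ss / (n - 1), statistics.variance(sample))  # 7.5 7.5
```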

Statistics (moments) of t

In [1]:
from scipy.stats import t

# Freeze for df=20, loc=0, scale=1 and get the first four moments from stats()
rv = t(df=20, loc=0, scale=1)
mean, var, skew, kurt = rv.stats(moments='mvsk')
mean, var, skew, kurt
(array(0.0), array(1.1111111111111112), array(0.0), array(0.375))
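The values returned by `stats()` agree with the closed-form moments of a standard t distribution (loc=0, scale=1). For df > 4 all four are finite: the mean and skewness are 0, the variance is df/(df-2), and the excess (Fisher) kurtosis is 6/(df-4). The sketch below restricts itself to that case; `t_moments` is a hypothetical helper, not a scipy function.

```python
def t_moments(df):
    """Mean, variance, skewness and excess kurtosis of a standard t distribution.

    This sketch only covers df > 4; for smaller df some of these moments are
    infinite or undefined.
    """
    if df <= 4:
        raise ValueError("this sketch only covers df > 4")
    return 0.0, df / (df - 2), 0.0, 6 / (df - 4)

print(t_moments(20))  # (0.0, 1.1111111111111112, 0.0, 0.375)
```

For df = 20 this gives variance 20/18 and kurtosis 6/16, matching the output of `rv.stats(moments='mvsk')`.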

Plotting t

In [2]:
import numpy as np
from scipy.stats import t
import matplotlib.pyplot as plt
%matplotlib inline

rv = t(df=20, loc=0, scale=1)
# x values spanning the 0.01% to 99.99% quantiles of the distribution
x = np.linspace(rv.ppf(0.0001), rv.ppf(0.9999), 100)
y = rv.pdf(x)

plt.plot(x, y)
  • For practice, freeze a normal distribution and plot it on the same graph as t.
  • Then rerun the code with various df values and observe how t approaches norm as df increases.
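A numeric version of this exercise can also be sketched with the standard library alone, instead of scipy's frozen `t` and `norm` objects. The helpers `t_pdf` and `norm_pdf` are hypothetical names that compute the two densities from their formulas; the code reports the largest absolute gap between them on a grid from -5 to 5 for several df values.

```python
import math

def t_pdf(x, df):
    # Student's t density (loc=0, scale=1), via log-gamma to avoid overflow
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(math.pi * df))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def norm_pdf(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

grid = [-5 + i * 0.01 for i in range(1001)]  # x from -5 to 5

gaps = {}
for df in [1, 2, 5, 10, 30, 100]:
    # largest absolute difference between the two densities on the grid
    gaps[df] = max(abs(t_pdf(x, df) - norm_pdf(x)) for x in grid)
    print(df, round(gaps[df], 4))
```

The gap shrinks steadily as df grows, which is what the overlaid plots should show visually.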

Probability of exceeding a cutoff value

In [3]:
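a = 0.05  # significance level used as the cutoff for p

rv = t(df=20, loc=0, scale=1)
x = np.random.normal(size=1)  # a random test value (drawn from a standard normal)

p = rv.sf(x)  # P(T > x); equivalent to 1 - rv.cdf(x), but more accurate in the far tail

if p < a:
    print('Cutoff at: ', x, p)
else:
    print('No cutoff', x, p)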
No cutoff [ 0.46958638] [ 0.32186492]
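The reason `rv.sf(x)` can beat `1 - rv.cdf(x)` is floating-point cancellation: far in the tail the CDF rounds to exactly 1.0, so subtracting it from 1 leaves 0.0, while a survival function computed directly keeps the tiny tail probability. Python's `math` module has `erf` and `erfc` but no t CDF, so this stdlib sketch illustrates the effect with the standard normal distribution rather than t; `norm_cdf` and `norm_sf` are hypothetical helper names.

```python
import math

def norm_cdf(x):
    # standard normal CDF written with erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_sf(x):
    # survival function 1 - CDF written with erfc, avoiding the subtraction
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in [1.0, 5.0, 10.0]:
    print(x, 1.0 - norm_cdf(x), norm_sf(x))
```

At x = 1 both approaches agree, while at x = 10 the subtraction returns 0.0 and the direct survival function still returns a value of about 7.6e-24.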
