
- The t distribution is widely used in hypothesis testing and plays a central role in the very popular t-test.
- A t distribution describes the standardized mean of a sample drawn from a normally distributed population when the population standard deviation is estimated from the sample. The larger the sample, the more closely the t distribution resembles a normal distribution.
- You can read historical and scientific details of the distribution on Wikipedia.

```
t.pdf(x, df) = gamma((df+1)/2) / (sqrt(pi*df) * gamma(df/2) * (1+x**2/df)**((df+1)/2))
```
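As a quick sanity check, the density above can be evaluated by hand and compared with `t.pdf`. This is a minimal sketch for the standard case (loc=0, scale=1); the helper name `t_pdf_manual` is made up for illustration and is not part of SciPy.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import t


def t_pdf_manual(x, df):
    """Evaluate the standard t density (loc=0, scale=1) directly from the formula."""
    numerator = gamma((df + 1) / 2)
    denominator = (
        np.sqrt(np.pi * df) * gamma(df / 2) * (1 + x**2 / df) ** ((df + 1) / 2)
    )
    return numerator / denominator


x = np.linspace(-4, 4, 9)
df = 20
# True when the hand-written formula agrees with SciPy at every point in x
matches = np.allclose(t_pdf_manual(x, df), t.pdf(x, df))
print(matches)
```

If the two agree, the formula and the library are describing the same curve, which is a useful check before relying on either one.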

- Besides loc and scale, the t distribution takes one important shape parameter: **df** (from 'degrees of freedom'). Simply put, 'df' in statistics is the *number of values in a calculation that are free to vary without changing the result of the calculation*.
- For example, when calculating the mean value M of a sample of size N, if M is held constant then only N-1 values in the sample can vary freely. The N-th value is determined by the other N-1 values.

- df is a shape parameter in both the t and F distributions, which are commonly used in hypothesis tests.
- Read more on degrees of freedom on Wikipedia.
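The "free to vary" idea can be made concrete with a tiny example. The numbers below are made up purely for illustration: once the mean M and N-1 of the values are fixed, the last value has no freedom left.

```python
N = 5                                # sample size
M = 5.0                              # the mean we hold constant
free_values = [4.0, 7.0, 1.0, 8.0]   # N-1 values chosen freely (illustrative)

# The N-th value is forced by the constraint sum(sample) == N * M
last_value = N * M - sum(free_values)
sample = free_values + [last_value]

print(last_value)        # the only value that keeps the mean at M
print(sum(sample) / N)   # recomputed mean, equal to M
```

Changing any of the four free values changes `last_value`, but the mean stays at M, so this sample has N-1 = 4 degrees of freedom.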

In [1]:

```
from scipy.stats import t
# Freeze for df=20, loc=0, scale=1 and get the first four moments from stats()
rv = t(df=20, loc=0, scale=1)
mean, var, skew, kurt = rv.stats(moments='mvsk')
mean, var, skew, kurt
```

Out[1]:
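The moments returned by `stats()` can be cross-checked against the known closed forms for a standard t distribution: the mean and skewness are 0, the variance is df/(df-2) for df > 2, and the excess kurtosis is 6/(df-4) for df > 4. The sketch below assumes df=20, as in the cell above; the `expected_*` names are introduced here for illustration only.

```python
from scipy.stats import t

df = 20
mean, var, skew, kurt = t(df=df, loc=0, scale=1).stats(moments='mvsk')

# Closed-form values for the standard t distribution
expected_var = df / (df - 2)    # valid for df > 2
expected_kurt = 6 / (df - 4)    # excess kurtosis, valid for df > 4

print(float(var), expected_var)
print(float(kurt), expected_kurt)
```

Note that SciPy reports excess kurtosis, so a normal distribution would give 0 rather than 3.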

In [2]:

```
import numpy as np
from scipy.stats import t
import matplotlib.pyplot as plt
%matplotlib inline
rv = t(df=20, loc=0, scale=1)
x = np.linspace(rv.ppf(0.0001), rv.ppf(0.9999), 100)
y = rv.pdf(x)
plt.xlim(-5,5)
plt.plot(x,y)
```

Out[2]:

- For practice, freeze a normal distribution and plot it in the same graph as the t distribution.
- Then run the code for various values of df and observe how the t distribution gets closer to the normal distribution as df increases.
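Before plotting, the convergence can also be checked numerically. The following is a rough sketch, not a full solution to the exercise: it measures the largest gap between the two densities on a grid, with the grid and the df values chosen arbitrarily for illustration.

```python
import numpy as np
from scipy.stats import norm, t

x = np.linspace(-5, 5, 1001)
norm_pdf = norm(loc=0, scale=1).pdf(x)

max_gap = {}
for df in (1, 5, 20, 100):
    t_pdf = t(df=df, loc=0, scale=1).pdf(x)
    # Largest absolute difference between the two curves on the grid
    max_gap[df] = float(np.max(np.abs(t_pdf - norm_pdf)))
    print(f"df={df:>3}: largest |t.pdf - norm.pdf| = {max_gap[df]:.4f}")
```

The gap shrinks as df grows, which is what the overlaid plots should show visually.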

In [3]:

```
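import numpy as np
from scipy.stats import t

a = 0.05                        # set the cutoff (significance level)
rv = t(df=20, loc=0, scale=1)
x = np.random.normal(size=1)    # draw one random value to test
# Upper-tail probability; equivalent to '1-rv.cdf(x)' but sometimes more accurate
p = rv.sf(x)
if p < a:
    print('Cutoff at: ', x, p)
else:
    print('No cutoff', x, p)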
```
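The same one-sided decision can be phrased in terms of a critical value instead of a p-value. As a hedged sketch: `isf` is the inverse of `sf`, so comparing `x` with `rv.isf(a)` should give the same answer as comparing `rv.sf(x)` with `a`. The test points 1.0 and 2.0 are arbitrary choices for illustration.

```python
from scipy.stats import t

a = 0.05                        # same cutoff as above
rv = t(df=20, loc=0, scale=1)

# Critical value: the point whose upper-tail probability equals a
t_crit = rv.isf(a)
print(t_crit)

for x in (1.0, 2.0):
    # Both tests describe the same one-sided rejection region
    print(x, rv.sf(x) < a, x > t_crit)
```

Working with a critical value is convenient when many observations are compared against one fixed cutoff, since `isf` only needs to be evaluated once.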

- Read more on the t distribution at Stat Trek.
