F distribution

The F distribution (also known as the Fisher–Snedecor distribution) is commonly used in analysis of variance (ANOVA), where it is the reference distribution of the F-test.
Because the F statistic is computed as a ratio of two variance estimates, the distribution has two degrees-of-freedom (df) shape parameters:
- dfn: the degrees of freedom of the variance estimate in the numerator, and
- dfd: the degrees of freedom of the variance estimate in the denominator.
You can read historical and mathematical details of the F distribution here (Wikipedia).
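To make the variance-ratio description concrete, the sketch below simulates many pairs of samples and compares the distribution of their variance ratio with the exact F quantiles from scipy. It assumes both samples come from the same normal population (a standard normal here), which is the setting in which the ratio follows an F distribution; the sample sizes 5 and 13 are arbitrary choices that give dfn = 4 and dfd = 12.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(42)
reps = 200_000
n1, n2 = 5, 13  # sample sizes, so dfn = n1 - 1 = 4 and dfd = n2 - 1 = 12

# Draw reps pairs of samples from the same standard normal population
s1 = rng.normal(size=(reps, n1)).var(axis=1, ddof=1)  # numerator variance estimates
s2 = rng.normal(size=(reps, n2)).var(axis=1, ddof=1)  # denominator variance estimates
ratio = s1 / s2  # one simulated F statistic per pair

# Compare simulated quantiles with the exact quantiles of F(4, 12)
qs = [0.25, 0.5, 0.75, 0.95]
sim_q = np.quantile(ratio, qs)
exact_q = f(dfn=n1 - 1, dfd=n2 - 1).ppf(qs)
print(np.round(sim_q, 3))
print(np.round(exact_q, 3))
```

The two printed rows should agree to within simulation noise; the "n - 1" in each df comes from using `ddof=1` (the unbiased sample variance).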

f probability density function

                     df2**(df2/2) * df1**(df1/2) * x**(df1/2-1)
F.pdf(x, df1, df2) = --------------------------------------------
                     (df2+df1*x)**((df1+df2)/2) * B(df1/2, df2/2)

where B is the beta function, df1 = dfn, df2 = dfd, and the density is defined for x > 0.
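One way to check the formula is to type it in directly and compare it with `f.pdf`. The helper `f_pdf` below is a hypothetical name used only for illustration; it uses `scipy.special.beta` for B.

```python
import numpy as np
from scipy.special import beta
from scipy.stats import f

def f_pdf(x, df1, df2):
    """Evaluate the F pdf formula term by term (valid for x > 0)."""
    num = df2**(df2 / 2) * df1**(df1 / 2) * x**(df1 / 2 - 1)
    den = (df2 + df1 * x)**((df1 + df2) / 2) * beta(df1 / 2, df2 / 2)
    return num / den

x = np.linspace(0.1, 5, 50)
manual = f_pdf(x, 4, 12)
library = f.pdf(x, dfn=4, dfd=12)
print(np.allclose(manual, library))  # True
```

This direct form is fine for small df, but powers such as df2**(df2/2) overflow for large df, which is one reason library implementations work with logarithms instead.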

Moments of f

from scipy.stats import f

# Freeze for dfn=4, dfd=12, loc=0, scale=1 and get the first four moments from stats()
rv = f(dfn=4, dfd=12, loc=0, scale=1)
mean, var, skew, kurt = rv.stats(moments='mvsk')
mean, var, skew, kurt

(array(1.2), array(1.26), array(3.2071349029490923), array(26.142857142857135))

Why are these quantities returned as (zero-dimensional) arrays rather than plain scalars?
Read an explanation here (stackoverflow blog)
and here: Scalars in numpy (scipy docs).
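A short sketch of what this means in practice: each value has zero dimensions (depending on your SciPy version it may print as a 0-d array or a NumPy scalar), `float()` converts it to a plain Python float, and the same values follow from the closed-form moment formulas of F(dfn, dfd). Note that `kurt` is Fisher's (excess) kurtosis, which is 0 for a normal distribution. The helper `f_moments` is a hypothetical name for illustration.

```python
import numpy as np
from scipy.stats import f

def f_moments(dfn, dfd):
    """Closed-form mean, variance, skewness and excess kurtosis of F(dfn, dfd).

    The formulas require dfd > 2, 4, 6 and 8 respectively.
    """
    mean = dfd / (dfd - 2)
    var = 2 * dfd**2 * (dfn + dfd - 2) / (dfn * (dfd - 2)**2 * (dfd - 4))
    skew = ((2 * dfn + dfd - 2) * np.sqrt(8 * (dfd - 4))
            / ((dfd - 6) * np.sqrt(dfn * (dfn + dfd - 2))))
    kurt = (12 * (dfn * (5 * dfd - 22) * (dfn + dfd - 2) + (dfd - 4) * (dfd - 2)**2)
            / (dfn * (dfd - 6) * (dfd - 8) * (dfn + dfd - 2)))
    return mean, var, skew, kurt

mean, var, skew, kurt = f(dfn=4, dfd=12).stats(moments='mvsk')
print(np.ndim(mean))  # 0: a zero-dimensional value, not a 1-element array

# float() turns each 0-d value into a plain Python float
scipy_vals = [float(v) for v in (mean, var, skew, kurt)]
closed_form = f_moments(4, 12)
print(scipy_vals)
print(closed_form)

# Array-valued shape parameters broadcast: one mean per dfd value
means = f(dfn=4, dfd=[6, 12, 30]).stats(moments='m')
print(means)
```

The broadcasting in the last lines is why scipy works with arrays internally: the same call handles one parameter set or many.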

Plotting f

import numpy as np
from scipy.stats import f, norm
import matplotlib.pyplot as plt
%matplotlib inline

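# first f: dfn=3, dfd=15
rv1 = f(dfn=3, dfd=15, loc=0, scale=1)
# evaluate the pdf between the 0.01% and 99.99% quantiles
x = np.linspace(rv1.ppf(0.0001), rv1.ppf(0.9999), 100)
y = rv1.pdf(x)

plt.xlim(0, 5)
plt.plot(x, y, 'b-', label='F(3, 15)')

# second f: dfn=10, dfd=50
rv2 = f(dfn=10, dfd=50, loc=0, scale=1)
x = np.linspace(rv2.ppf(0.0001), rv2.ppf(0.9999), 100)
y = rv2.pdf(x)

plt.plot(x, y, 'r--', label='F(10, 50)')
plt.xlabel('x')
plt.ylabel('pdf')
plt.legend()
plt.show()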


For practice, freeze a normal distribution and plot it in the same graph.
Then rerun the code for various dfn and dfd values and observe how f changes relative to norm.
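Besides eyeballing the plots, one way to quantify "how f varies in relation to norm" is the largest gap between the F CDF and the CDF of a normal with the same mean and standard deviation. Matching mean and sd is our own choice of comparison, and `max_cdf_gap` is a hypothetical helper; as both dfn and dfd grow, the F distribution becomes less skewed and the gap should shrink.

```python
import numpy as np
from scipy.stats import f, norm

def max_cdf_gap(dfn, dfd, n_points=4001):
    """Largest |CDF difference| between F(dfn, dfd) and a normal with the same mean and sd."""
    rv = f(dfn=dfn, dfd=dfd)
    m, sd = rv.mean(), rv.std()  # std requires dfd > 4
    approx = norm(loc=m, scale=sd)
    # grid of +/- 6 sd around the mean; F's cdf is 0 for x < 0
    x = np.linspace(m - 6 * sd, m + 6 * sd, n_points)
    return np.max(np.abs(rv.cdf(x) - approx.cdf(x)))

for dfn, dfd in [(3, 15), (10, 50), (50, 200), (200, 200)]:
    print(dfn, dfd, round(max_cdf_gap(dfn, dfd), 4))
```

For small df much of the gap comes from the normal putting probability on negative values, where the F distribution has none.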

Probability of exceeding a cutoff value

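a = 0.05  # significance level: upper-tail probability that counts as "past the cutoff"

# a random test value drawn from a standard normal (if negative, sf(x) = 1)
x = np.random.normal(size=1)

rv1 = f(dfn=3, dfd=15, loc=0, scale=1)
rv2 = f(dfn=10, dfd=50, loc=0, scale=1)

# sf(x) is the survival function P(F > x) = 1 - cdf(x)
p1 = rv1.sf(x)
if p1 < a:
    print('F1 cutoff at: ', x, p1)
else:
    print('F1 No cutoff', x, p1)

p2 = rv2.sf(x)
if p2 < a:
    print('F2 cutoff at: ', x, p2)
else:
    print('F2 No cutoff', x, p2)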

F1 No cutoff [ 1.58750964] [ 0.23403218]
F2 No cutoff [ 1.58750964] [ 0.13785191]
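Instead of testing one random value at a time, the cutoff itself can be computed with the inverse survival function `isf`, which returns the x for which P(F > x) = a. A value lies past the cutoff exactly when its `sf` is below a. The sketch reuses the value 1.58750964 from the run shown above; any other test value works the same way.

```python
from scipy.stats import f

a = 0.05  # significance level

rv1 = f(dfn=3, dfd=15)
rv2 = f(dfn=10, dfd=50)

# isf is the inverse survival function: the x with P(F > x) = a
crit1 = rv1.isf(a)
crit2 = rv2.isf(a)
print(crit1, crit2)

# An observed value is past the cutoff exactly when its sf is below a
x_obs = 1.58750964  # value drawn in the run shown above
print(x_obs > crit1, rv1.sf(x_obs) < a)
print(x_obs > crit2, rv2.sf(x_obs) < a)
```

`rv.isf(a)` gives the same value as `rv.ppf(1 - a)`, but `isf` avoids the loss of precision from computing 1 - a when a is very small.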
