
F distribution

  • The F distribution (also called the Fisher–Snedecor distribution) is commonly used in analysis of variance, where it underlies the F-test
  • Because the F statistic is a ratio of two variance estimates, the distribution has two degrees-of-freedom (df) shape parameters:
    • dfn (degrees of freedom of the variance estimate in the numerator) and
    • dfd (degrees of freedom of the variance estimate in the denominator)
  • Historical and scientific details of the F distribution are available on Wikipedia
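
The variance-ratio form described above can be checked with a small simulation. This is only a sketch under stated assumptions: the sample sizes (5 and 13), the seed, and the number of trials are arbitrary choices made for illustration, and both samples are assumed to be normal with equal variance. Under those assumptions the ratio of the two sample variances should follow an F distribution with dfn = 4 and dfd = 12.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility

n_num, n_den = 5, 13   # hypothetical sample sizes, giving dfn = 4 and dfd = 12
n_trials = 200_000     # number of simulated variance ratios

# Two independent sets of standard normal samples (equal true variance)
num = rng.normal(size=(n_trials, n_num))
den = rng.normal(size=(n_trials, n_den))

# Ratio of the unbiased sample variances, one ratio per trial
ratios = num.var(axis=1, ddof=1) / den.var(axis=1, ddof=1)

dfn, dfd = n_num - 1, n_den - 1
crit = f.isf(0.05, dfn, dfd)             # value that F(dfn, dfd) exceeds with probability 0.05
tail_fraction = np.mean(ratios > crit)   # simulated share of ratios above that value

print(dfn, dfd, tail_fraction)
```

If the ratios really follow F(4, 12), the printed fraction should land close to 0.05, up to simulation noise.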

f probability density function

                     df2**(df2/2) * df1**(df1/2) * x**(df1/2-1)
F.pdf(x, df1, df2) = --------------------------------------------
                     (df2+df1*x)**((df1+df2)/2) * B(df1/2, df2/2)

where B is the Beta function, x > 0, and df1 and df2 are the numerator and denominator degrees of freedom (dfn and dfd above)
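
As a quick sanity check, the density formula above can be typed in directly and compared with `f.pdf`. The helper name `f_pdf_manual` and the evaluation grid are illustrative assumptions, not part of scipy; `scipy.special.beta` supplies the Beta function.

```python
import numpy as np
from scipy.special import beta
from scipy.stats import f

def f_pdf_manual(x, df1, df2):
    """Hypothetical helper: the F density written out term by term."""
    numerator = df2**(df2 / 2) * df1**(df1 / 2) * x**(df1 / 2 - 1)
    denominator = (df2 + df1 * x)**((df1 + df2) / 2) * beta(df1 / 2, df2 / 2)
    return numerator / denominator

x = np.linspace(0.1, 5, 50)        # arbitrary grid of positive values
manual = f_pdf_manual(x, 4, 12)    # formula evaluated by hand
library = f.pdf(x, 4, 12)          # scipy's implementation

max_abs_diff = np.max(np.abs(manual - library))
print(max_abs_diff)
```

The two curves should agree to within floating-point rounding.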

Sample statistics of f

In [1]:
from scipy.stats import f

# Freeze for dfn=4, dfd=12, loc=0, scale=1 and get the first four moments from stats()
rv = f(dfn=4, dfd=12, loc=0, scale=1)
mean, var, skew, kurt = rv.stats(moments='mvsk')
mean, var, skew, kurt
Out[1]:
(array(1.2), array(1.26), array(3.2071349029490923), array(26.142857142857135))
  • Why are these quantities considered arrays and not scalars?
  • Read an explanation here (stackoverflow blog)
  • And here: Scalars in numpy (scipy docs)
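
If plain Python numbers are more convenient, the returned values can be converted with `float()`. The sketch below also compares them with the standard closed-form mean and variance of the F distribution (valid for dfd > 2 and dfd > 4 respectively). The variable names are illustrative, and the exact return type of `stats()` may differ between scipy versions, which is why the conversion is written to work on both zero-dimensional arrays and NumPy scalars.

```python
from scipy.stats import f

dfn, dfd = 4, 12
rv = f(dfn=dfn, dfd=dfd)
mean, var, skew, kurt = rv.stats(moments='mvsk')

# float() accepts zero-dimensional arrays as well as NumPy scalars
mean_f, var_f = float(mean), float(var)

# Closed-form expressions for comparison
mean_formula = dfd / (dfd - 2)                                               # needs dfd > 2
var_formula = 2 * dfd**2 * (dfn + dfd - 2) / (dfn * (dfd - 2)**2 * (dfd - 4))  # needs dfd > 4

print(mean_f, mean_formula)
print(var_f, var_formula)
```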

Plotting f

In [2]:
import numpy as np
from scipy.stats import f, norm
import matplotlib.pyplot as plt
%matplotlib inline

# first f
rv1 = f(dfn=3, dfd=15, loc=0, scale=1)
x = np.linspace(rv1.ppf(0.0001), rv1.ppf(0.9999), 100)
y = rv1.pdf(x) 

plt.xlim(0,5)
plt.plot(x,y, 'b-')

# second f 
rv2 = f(dfn=10, dfd=50, loc=0, scale=1)
x = np.linspace(rv2.ppf(0.0001), rv2.ppf(0.9999), 100)
y = rv2.pdf(x) 

plt.plot(x,y, 'r--')
Out[2]:
[<matplotlib.lines.Line2D at 0x831ae10>]
  • For practice, freeze and plot in the same graph a normal distribution
  • Then run the code for various dfn and dfd and observe how f varies in relation to norm
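
Alongside the plots, one numerical way to see how f relates to norm is to compare skewness: a normal distribution has zero skewness, while f is right-skewed. The sketch below is not a replacement for the plotting exercise. The first two parameter pairs come from the plot above, and the third, larger pair is an assumption added for illustration.

```python
from scipy.stats import f, norm

# (dfn, dfd) pairs: the first two are used in the plot, the third is a hypothetical extra
pairs = [(3, 15), (10, 50), (100, 500)]

skews = []
for dfn, dfd in pairs:
    # stats() returns mean, variance, skewness, kurtosis; index 2 is the skewness
    moments = f(dfn=dfn, dfd=dfd).stats(moments='mvsk')
    skews.append(float(moments[2]))

norm_skew = float(norm().stats(moments='mvsk')[2])  # skewness of the standard normal

for (dfn, dfd), s in zip(pairs, skews):
    print(dfn, dfd, s)
print('norm', norm_skew)
```

For these pairs the skewness shrinks as both degrees of freedom grow, which matches the more symmetric shape of the second curve in the plot.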

Probability of exceeding a cutoff value

In [3]:
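a = 0.05  # significance level (threshold on the upper-tail probability)

# Draw one test value from a standard normal.
# Negative draws fall outside the support of f, where sf() returns 1.
x = np.random.normal(size=1)

rv1 = f(dfn=3, dfd=15, loc=0, scale=1)
rv2 = f(dfn=10, dfd=50, loc=0, scale=1)

# sf(x) is the survival function, P(F > x) = 1 - cdf(x)
p1 = rv1.sf(x)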
if p1 < a:
    print('F1 cutoff at: ', x, p1)
else:
    print('F1 No cutoff', x, p1)
    
p2 = rv2.sf(x)
if p2 < a:
    print('F2 cutoff at: ', x, p2)
else:
    print('F2 No cutoff', x, p2)
F1 No cutoff [ 1.58750964] [ 0.23403218]
F2 No cutoff [ 1.58750964] [ 0.13785191]
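
The comparison above can also be turned around: instead of computing the tail probability of a given value, `isf` (the inverse survival function) returns the value whose upper-tail probability equals the chosen level. A minimal sketch, reusing the same two frozen distributions; the names `crit1` and `crit2` are illustrative.

```python
from scipy.stats import f

a = 0.05  # upper-tail probability threshold, as in the cell above

rv1 = f(dfn=3, dfd=15)
rv2 = f(dfn=10, dfd=50)

# Critical values: the points that each distribution exceeds with probability a
crit1 = rv1.isf(a)
crit2 = rv2.isf(a)

print('F1 critical value:', crit1, 'tail probability:', rv1.sf(crit1))
print('F2 critical value:', crit2, 'tail probability:', rv2.sf(crit2))
```

Any observed value above the critical value has an upper-tail probability below a. The value shown in the output above lies below both critical values, which is consistent with the two "No cutoff" lines.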
