
F distribution

• The F distribution (or the Fisher–Snedecor distribution) is commonly used in analysis of variance (relevant to the F-test)
• Because the F statistic is calculated as a ratio of two variance estimates, the distribution has two degrees-of-freedom (df) shape parameters:
• dfn (degrees of freedom of the variance estimate in the numerator) and
• dfd (degrees of freedom of the variance estimate in the denominator)
• You can read historical and scientific details of the F distribution here (Wikipedia)
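The variance-ratio form described above can be sketched as follows. This is only an illustration under stated assumptions: the two samples are simulated from normal distributions with equal variance, the sample sizes (8 and 12) and the seed are arbitrary, and the names sample_a, sample_b, F_stat and p_upper are made up for this example.

```python
import numpy as np
from scipy.stats import f

# Two simulated samples (assumption: normal data with equal true variance)
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=8)
sample_b = rng.normal(loc=0.0, scale=1.0, size=12)

# Unbiased variance estimates (ddof=1) and their ratio
var_a = sample_a.var(ddof=1)
var_b = sample_b.var(ddof=1)
F_stat = var_a / var_b

# Each variance estimate contributes (sample size - 1) degrees of freedom
dfn = sample_a.size - 1   # numerator
dfd = sample_b.size - 1   # denominator

# Upper-tail probability of the observed ratio under f(dfn, dfd)
p_upper = f.sf(F_stat, dfn, dfd)
print(F_stat, dfn, dfd, p_upper)
```

Under these assumptions the ratio of the two sample variances follows an F distribution with dfn = 7 and dfd = 11.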

f probability density function

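$$
\mathrm{F.pdf}(x, \mathrm{df}_1, \mathrm{df}_2) = \frac{\mathrm{df}_2^{\,\mathrm{df}_2/2}\;\mathrm{df}_1^{\,\mathrm{df}_1/2}\;x^{\,\mathrm{df}_1/2 - 1}}{(\mathrm{df}_2 + \mathrm{df}_1 x)^{(\mathrm{df}_1 + \mathrm{df}_2)/2}\;B(\mathrm{df}_1/2,\ \mathrm{df}_2/2)}
$$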



where B is the Beta function, and df1 and df2 are the numerator and denominator degrees of freedom (dfn and dfd above)
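As a quick check of the density formula, it can be evaluated directly and compared with scipy's own pdf. This is a minimal sketch: f_pdf_manual is a hypothetical helper written for this example (not part of scipy), and the grid of x values and the choice df1=4, df2=12 are arbitrary.

```python
import numpy as np
from scipy.special import beta
from scipy.stats import f

def f_pdf_manual(x, df1, df2):
    """Evaluate the F density term by term from the formula (hypothetical helper)."""
    numerator = df2**(df2 / 2) * df1**(df1 / 2) * x**(df1 / 2 - 1)
    denominator = (df2 + df1 * x)**((df1 + df2) / 2) * beta(df1 / 2, df2 / 2)
    return numerator / denominator

# Compare with scipy on a grid of positive x values
x = np.linspace(0.1, 5.0, 50)
manual = f_pdf_manual(x, 4, 12)
reference = f.pdf(x, 4, 12)

max_abs_diff = np.max(np.abs(manual - reference))
print(max_abs_diff)
```

The two results should agree up to floating-point rounding.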

Sample statistics of f

In [1]:
from scipy.stats import f

# Freeze for dfn=4, dfd=12, loc=0, scale=1 and get the first four moments from stats()
rv = f(dfn=4, dfd=12, loc=0, scale=1)
mean, var, skew, kurt = rv.stats(moments='mvsk')
mean, var, skew, kurt

Out[1]:
(array(1.2), array(1.26), array(3.2071349029490923), array(26.142857142857135))
• Why are these quantities considered arrays and not scalars?
• Read an explanation here (stackoverflow blog)
• And here: Scalars in numpy (scipy docs)
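If plain Python numbers are more convenient than zero-dimensional arrays, one option is to convert each moment with float(). This is a small sketch, not the only way to do it; the exact type returned by stats() may differ between scipy versions, but float() should work in either case.

```python
from scipy.stats import f

rv = f(dfn=4, dfd=12)

# Convert each returned moment to a built-in Python float
mean, var, skew, kurt = (float(m) for m in rv.stats(moments='mvsk'))
print(mean, var, skew, kurt)
```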

Plotting f

In [2]:
import numpy as np
from scipy.stats import f, norm
import matplotlib.pyplot as plt
%matplotlib inline

# first f
rv1 = f(dfn=3, dfd=15, loc=0, scale=1)
x = np.linspace(rv1.ppf(0.0001), rv1.ppf(0.9999), 100)
y = rv1.pdf(x)

plt.xlim(0,5)
plt.plot(x,y, 'b-')

# second f
rv2 = f(dfn=10, dfd=50, loc=0, scale=1)
x = np.linspace(rv2.ppf(0.0001), rv2.ppf(0.9999), 100)
y = rv2.pdf(x)

plt.plot(x,y, 'r--')

Out[2]:
[<matplotlib.lines.Line2D at 0x831ae10>]
• For practice, freeze and plot in the same graph a normal distribution
• Then run the code for various dfn and dfd and observe how f varies in relation to norm
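Alongside the plots, a numerical summary can hint at what to look for. The sketch below assumes that both dfn and dfd are increased together (the pairs chosen are arbitrary) and prints the mean, variance and skewness of f for each pair. A normal distribution has skewness 0, so a shrinking skewness suggests a more symmetric, more normal-looking curve.

```python
from scipy.stats import f

# Arbitrary (dfn, dfd) pairs with both parameters growing together
pairs = [(3, 15), (10, 50), (100, 500), (1000, 5000)]

summary = []
for dfn, dfd in pairs:
    # Mean, variance and skewness of the frozen distribution
    mean, var, skew = (float(m) for m in f(dfn, dfd).stats(moments='mvs'))
    summary.append((dfn, dfd, mean, var, skew))
    print(dfn, dfd, mean, var, skew)
```

Note that this only covers the case where both parameters grow; with dfn held fixed the distribution stays skewed however large dfd becomes.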

Probability to pass a cutoff value

In [3]:
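a = 0.05  # significance level used as the decision threshold

# Draw one test value; a negative draw lies outside the support of f, so sf returns 1
x = np.random.normal(size=1)

rv1 = f(dfn=3, dfd=15, loc=0, scale=1)
rv2 = f(dfn=10, dfd=50, loc=0, scale=1)

# sf(x) = 1 - cdf(x): probability that an F variate exceeds x
p1 = rv1.sf(x)
if p1 < a:
    print('F1 cutoff at: ', x, p1)
else:
    print('F1 No cutoff', x, p1)

p2 = rv2.sf(x)
if p2 < a:
    print('F2 cutoff at: ', x, p2)
else:
    print('F2 No cutoff', x, p2)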

F1 No cutoff [ 1.58750964] [ 0.23403218]
F2 No cutoff [ 1.58750964] [ 0.13785191]
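The reverse question, which value of x leaves exactly probability a in the upper tail, can be answered with the inverse survival function isf. This is a minimal sketch that reuses the two parameter pairs from the cell above; the names crit1 and crit2 are made up for this example.

```python
from scipy.stats import f

a = 0.05  # upper-tail probability

rv1 = f(dfn=3, dfd=15)
rv2 = f(dfn=10, dfd=50)

# isf(a) returns the x for which sf(x) == a
crit1 = rv1.isf(a)
crit2 = rv2.isf(a)
print(crit1, crit2)

# Round trip: the survival function at the critical value gives back a
print(rv1.sf(crit1), rv2.sf(crit2))
```

Any x above the corresponding critical value would give sf(x) below a in the comparison above.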