The normal (a.k.a. 'Gaussian', 'bell curve') distribution is of major importance in statistics, and in hypothesis testing specifically. In scipy, the probability density function of the standard normal distribution (mean 0, SD 1) is written as:
norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
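As a quick sanity check, the formula can be evaluated by hand with NumPy and compared against norm.pdf. The grid of x values below is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import norm

# Evaluate the standard normal density by hand, using the formula above
x = np.linspace(-3, 3, 7)
manual = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Compare with scipy's implementation
print(np.allclose(manual, norm.pdf(x)))
# The peak of the curve, at x = 0, is 1/sqrt(2*pi)
print(norm.pdf(0))
```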
import scipy.stats as stats  # imports the entire scipy.stats module (all distributions)
from scipy.stats import norm  # imports only the normal distribution
from scipy.stats import t  # imports only Student's t distribution
from scipy.stats import f  # imports only Fisher's F distribution
# Get help by printing a distribution's docstring, as follows:
# print(stats.norm.__doc__)  # or print(stats.t.__doc__), etc.
Import norm and use some basic distribution statistical functions like:
- median => median()
- mean => mean()
- standard deviation => std()
- variance => var()
stats() returns up to the first four moments of the distribution: mean, variance, skew and (excess) kurtosis. Which of them are returned is chosen with the moments argument, using the letters 'm', 'v', 's' and 'k'; the default is 'mv'.
from scipy.stats import norm
# get median, mean, standard deviation and var
print(norm.median(), norm.mean(), norm.std(), norm.var())
# or use stats() function
m, v, s, k = norm.stats(loc=0, scale=1, moments='mvsk')
print(m, v, s, k)
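- z = (x - M)/SD, where M is the mean and SD the standard deviation
Thus, z-scores do not depend on the units of any specific measurement; they only describe how far the value x lies from M, in units of SD.
In scipy, norm is the standard normal distribution (M = 0, SD = 1); the loc and scale keywords shift and stretch it:
- loc represents the mean
- scale represents the SD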
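The link between z-scores and loc/scale can be checked numerically: evaluating the cdf at x with loc=M and scale=SD gives the same result as evaluating the standard normal cdf at the z-score. The measurement values below (M = 100, SD = 15, x = 130) are hypothetical, chosen only for illustration.

```python
from scipy.stats import norm

# Hypothetical measurement: x = 130 from a population with M = 100, SD = 15
M, SD, x = 100, 15, 130
z = (x - M) / SD  # 2.0: x lies two SDs above the mean

# cdf with loc/scale equals the standard normal cdf at the z-score
p_scaled = norm.cdf(x, loc=M, scale=SD)
p_standard = norm.cdf(z)
print(z, p_scaled, p_standard)

# The density is rescaled by 1/SD so that it still integrates to 1
print(norm.pdf(x, loc=M, scale=SD), norm.pdf(z) / SD)
```

In other words, passing loc and scale is the same as standardizing x first and then using the standard normal.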
# loc and scale examples
print(norm.var(loc=2, scale=4))  # prints the variance of a norm distro with M=2 and SD=4
# print the mean and variance of a norm distro with M=1.5 and SD=0.5
m, v = norm.stats(loc=1.5, scale=0.5, moments='mv')
print(m, v)
rv = norm(loc=2, scale=4) # Use 'rv' to refer to the frozen norm distro with M=2 and SD=4
m, v, s, k = rv.stats(moments='mvsk')
print(m, v, s, k)
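Freezing fixes loc and scale once, so they do not have to be repeated on every call. A minimal check, with arbitrarily chosen x values, that the frozen object gives the same results as the unfrozen calls:

```python
import numpy as np
from scipy.stats import norm

rv = norm(loc=2, scale=4)  # frozen: loc and scale are fixed once

x = np.array([-2.0, 2.0, 6.0])
# The frozen object's methods match the unfrozen calls with the same loc/scale
print(np.allclose(rv.pdf(x), norm.pdf(x, loc=2, scale=4)))
print(np.allclose(rv.cdf(x), norm.cdf(x, loc=2, scale=4)))
print(rv.mean(), rv.std())
```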
from scipy.stats import norm
obs = norm.rvs(loc=0, scale=1, size=10)
obs
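rvs() draws a new random sample each time it is called. Passing random_state makes the draws reproducible, and with a large sample the sample mean and SD come close to loc and scale. The seeds (42, 0) and the loc=5, scale=2 values are arbitrary choices for this sketch.

```python
import numpy as np
from scipy.stats import norm

# random_state makes the draws reproducible (any fixed integer works)
obs1 = norm.rvs(loc=0, scale=1, size=10, random_state=42)
obs2 = norm.rvs(loc=0, scale=1, size=10, random_state=42)
print(np.array_equal(obs1, obs2))  # same seed, same sample

# With a large sample, the sample mean and SD approach loc and scale
big = norm.rvs(loc=5, scale=2, size=100_000, random_state=0)
print(big.mean(), big.std())
```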
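import numpy as np
from scipy import stats
s = np.random.standard_normal(100)  # the old scipy.randn alias is deprecated; draw with NumPy instead
stats.describe(s)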
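describe() returns a named tuple, so its fields can be read by name. The sketch below seeds a NumPy Generator (seed 0, an arbitrary choice) to make the sample repeatable, and checks two of the defaults: variance is the sample variance (ddof=1) and kurtosis is Fisher's (excess) kurtosis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
s = rng.standard_normal(100)

res = stats.describe(s)
# Fields: nobs, minmax, mean, variance, skewness, kurtosis
print(res.nobs, res.minmax, res.mean)
# variance is the sample variance (ddof=1) by default
print(np.isclose(res.variance, s.var(ddof=1)))
# kurtosis is Fisher's (excess) kurtosis, expected to be near 0 for normal data
print(res.skewness, res.kurtosis)
```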
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
# Jupyter magic for inline plots; omit this line in a plain Python script
%matplotlib inline
# A simplistic way to do it
x = np.linspace(-5,5,100) # define a big enough x interval
nd = norm.pdf(x) # get the norm.pdf for x interval
plt.plot(x,nd) # plot it!
- ...for any distribution, ppf() (the percent point function, i.e. the inverse of cdf()) returns the x value at which the cumulative probability equals a given q, so that P(X <= x) = q.
- ...near the lower or upper end of the distribution's range of values, the cumulative probability (cdf) becomes very small (close to 0) or very large (close to 1), respectively.
Thus, the interval [ppf(0.0001), ppf(0.9999)], which holds 99.98% of the probability, spans x values close enough to the lower and upper ends of the distribution. So, we rewrite the code as follows:
# A better approach
rv = norm(loc=0, scale=1) # Freeze the norm distribution
x = np.linspace(rv.ppf(0.0001), rv.ppf(0.9999), 100)
y = rv.pdf(x)
plt.plot(x,y)
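The two facts about ppf() used above can be checked directly: cdf(ppf(q)) returns q, and the frozen distribution's interval() method gives the same endpoints for the central 99.98% of the probability.

```python
import numpy as np
from scipy.stats import norm

rv = norm(loc=0, scale=1)

# ppf is the inverse of cdf: cdf(ppf(q)) returns q
q = np.array([0.0001, 0.5, 0.9999])
x = rv.ppf(q)
print(x)  # roughly -3.719, 0.0 and 3.719
print(np.allclose(rv.cdf(x), q))

# interval() returns the endpoints of the central 99.98% of the mass
lo, hi = rv.interval(0.9998)
print(np.isclose(lo, rv.ppf(0.0001)), np.isclose(hi, rv.ppf(0.9999)))
```

interval() is a convenient shortcut when the plotting range should be symmetric around the median.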
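# One-sided test: draw a random z value and check whether its upper-tail probability is below the cutoff
a = 0.05  # set the cutoff (significance level)
rv = norm(loc=0, scale=1)
x = np.random.normal()  # a single standard normal draw (a float)
p = rv.sf(x)  # equal to 1 - rv.cdf(x), but more accurate far in the upper tail
if p < a:
    print('Cutoff at: ', x, p)
else:
    print('No cutoff', x, p)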
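The test above is one-sided: it only looks at the upper tail. A sketch of the two-sided version, and of the equivalent decision via critical values obtained with isf() (the inverse of sf()), follows. The observed value z = 2.1 is hypothetical, chosen only for illustration.

```python
from scipy.stats import norm

a = 0.05
rv = norm(loc=0, scale=1)

z = 2.1  # hypothetical observed z-score, for illustration only

# One-sided (upper-tail) p-value, as in the code above
p_one = rv.sf(z)
# Two-sided p-value: chance of a value at least this far from 0 in either direction
p_two = 2 * rv.sf(abs(z))

# Equivalent decision via critical values
z_crit_one = rv.isf(a)      # about 1.645
z_crit_two = rv.isf(a / 2)  # about 1.960

print(p_one, p_one < a, z > z_crit_one)
print(p_two, p_two < a, abs(z) > z_crit_two)
```

Comparing the p-value with a and comparing z with the critical value always lead to the same decision; the two-sided test simply splits a between both tails.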