Examples of use:
- In educational research: Suppose you want to investigate the impact of an instructional method when tasks of varied difficulty are assigned to students. You randomly assign your students to three groups and implement the instructional method differently for each group: group-1: no tasks at all, group-2: introductory level tasks, group-3: advanced level tasks. Later you run a post-test questionnaire to record students' learning performance. You apply a one-way ANOVA to statistically compare the post-test performance of the three student groups. In your design you have one factor (task difficulty) with three levels (none, introductory, advanced).
- ANOVA tells us about the 'main effect' of the factor, that is, whether there is a statistically significant difference somewhere among the means of the groups, but it does NOT tell us which pair (or pairs) of groups is responsible for this significance.
- Therefore, after a significant ANOVA, post hoc pairwise comparisons (usually Tukey's test, or t-tests with a Bonferroni correction) also have to be applied (this is discussed at the end of this section).
- IV: Background music (three conditions: 1) no background music, 2) soft tune, 3) modern tune)
- DV: Learning performance as measured by some reliable and validated test (continuous variable, scale 0-100)
- Population: students of specific age and background studying in a multimedia elearning environment
- Research Design: Three groups post-test only design
- Groups:
- Control (C-group): N0=40 students (randomly selected) studying without background music
- Treatment1 (T1-group): N1=42 students (randomly selected) studying with soft background music
- Treatment2 (T2-group): N2=43 students (randomly selected) studying with modern background music
- Null hypothesis H0: "Students studying with background music (either soft or modern) will perform the same on an appropriate knowledge test as students studying without background music" (non-directional)
import pandas as pd
import scipy.stats as stats
# The keyword is sheet_name; the older spelling 'sheetname' is rejected by current pandas
data = pd.read_excel('../../data/researchdata.xlsx', sheet_name="anova")
data.head()
data.tail()
dC = data.Control.dropna()
dT1 = data.Treatment1.dropna()
dT2 = data.Treatment2.dropna()
print('Control group\n')
print(dC.describe())
print('\nTreatment-1 group\n')
print(dT1.describe())
print('\nTreatment-2 group\n')
print(dT2.describe())
- The normality criterion: each group compared should come from a population following the normal distribution.
- The variance criterion (or 'homogeneity of variances'): samples should come from populations with the same variance.
- Independent samples: performance (the dependent variable) in each sample should not be affected by the conditions in other samples.
# Shapiro-Wilk normality test
stats.shapiro(dC), stats.shapiro(dT1), stats.shapiro(dT2)
# Levene variance test
stats.levene(dC, dT1, dT2)
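Each of the two checks above returns a test statistic and a p-value. A minimal sketch of how those p-values might be read is given below. It uses synthetic scores in place of the spreadsheet: the group sizes mirror the design above, but the means, the spread, the seed, and the helper name `check_assumptions` are assumptions made purely for illustration.

```python
import numpy as np
import scipy.stats as stats

def check_assumptions(groups, alpha=0.05):
    """Return (normality_ok, variance_ok) for a dict of name -> scores.

    A p-value above alpha means the test found no evidence against the
    assumption; it does not prove that the assumption holds.
    """
    normality_ok = {}
    for name, scores in groups.items():
        w, p = stats.shapiro(scores)           # Shapiro-Wilk, one group at a time
        normality_ok[name] = bool(p > alpha)
    stat, p_levene = stats.levene(*groups.values())  # Levene, all groups together
    return normality_ok, bool(p_levene > alpha)

# Synthetic data (hypothetical values, NOT the researchdata.xlsx scores)
rng = np.random.default_rng(0)
groups = {
    'Control': rng.normal(60, 10, 40),
    'Treatment1': rng.normal(65, 10, 42),
    'Treatment2': rng.normal(58, 10, 43),
}

normality_ok, variance_ok = check_assumptions(groups)
print('Normality not rejected:', normality_ok)
print('Equal variances not rejected:', variance_ok)
```

If either check fails, a one-way ANOVA may not be appropriate as it stands, and the result should be read with caution.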
F, p = stats.f_oneway(dC, dT1, dT2)
print('F statistic = {:5.3f} and probability p = {:5.3f}'.format(F, p))
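To see what the F statistic returned by `f_oneway` measures, it can be rebuilt from the between-group and within-group sums of squares. The sketch below does this on three small made-up samples and compares the result with SciPy. The numbers and the function name `f_by_hand` are assumptions for illustration only.

```python
import numpy as np
import scipy.stats as stats

def f_by_hand(*groups):
    """One-way ANOVA F statistic and p-value from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = np.concatenate(groups).mean()
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    F = (ss_between / (k - 1)) / (ss_within / (n - k))
    p = stats.f.sf(F, k - 1, n - k)        # upper tail of the F distribution
    return F, p

# Hypothetical scores, not taken from the spreadsheet
a = [62, 58, 71, 65, 60]
b = [70, 74, 68, 77, 72]
c = [55, 61, 59, 52, 57]

F_manual, p_manual = f_by_hand(a, b, c)
F_scipy, p_scipy = stats.f_oneway(a, b, c)
print('by hand: F = {:5.3f}, p = {:5.3f}'.format(F_manual, p_manual))
print('scipy:   F = {:5.3f}, p = {:5.3f}'.format(F_scipy, p_scipy))
```

A large F means the group means differ by more than the spread within the groups would suggest.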
# Apply ttest_ind() to each pair of groups (no correction for multiple comparisons)
t, p = stats.ttest_ind(dC, dT1)
print('Control vs T1:', t, p)
t, p = stats.ttest_ind(dC, dT2)
print('Control vs T2:', t, p)
t, p = stats.ttest_ind(dT1, dT2)
print('T1 vs T2:', t, p)
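Running three separate t-tests, each at alpha = 0.05, raises the chance of at least one false positive above 5%. One simple remedy is the Bonferroni correction mentioned earlier: divide alpha by the number of comparisons. A minimal sketch on synthetic scores follows (the data, the seed, and the variable names are assumptions, not the spreadsheet values):

```python
from itertools import combinations

import numpy as np
import scipy.stats as stats

# Synthetic data (hypothetical values, NOT the researchdata.xlsx scores)
rng = np.random.default_rng(1)
groups = {
    'Control': rng.normal(60, 10, 40),
    'T1': rng.normal(65, 10, 42),
    'T2': rng.normal(58, 10, 43),
}

pairs = list(combinations(groups, 2))   # every unordered pair of group names
alpha = 0.05
alpha_adj = alpha / len(pairs)          # Bonferroni-adjusted threshold

results = {}
for g1, g2 in pairs:
    t, p = stats.ttest_ind(groups[g1], groups[g2])
    results[(g1, g2)] = (t, p, bool(p < alpha_adj))
    print('{} vs {}: t = {:6.3f}, p = {:6.4f}, significant = {}'.format(
        g1, g2, t, p, p < alpha_adj))
```

The Bonferroni correction is easy to apply but conservative; Tukey's test is the more common choice after a one-way ANOVA.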
So, to implement Tukey's test we need to:
- (a) Import from statsmodels.stats the multicomp module
- (b) Call the MultiComparison class constructor and pass as arguments:
- an array of data ('Score' in the example code below)
- an array of labels ('Group' in the example code below)
- (c) Finally, call the tukeyhsd() method of the MultiComparison object and print the outcome.
Read more about the MultiComparison class in the statsmodels documentation.
import pandas as pd
import statsmodels.stats.multicomp as ml
# Note: the data in this sheet have been preformatted into Group and Score columns
dtuk = pd.read_excel('../../data/researchdata.xlsx', sheet_name="multicomp")
print(dtuk.head(),'\n')
mcobj = ml.MultiComparison(dtuk.Score, dtuk.Group)
# Run Tukey's HSD test at the 0.05 significance level
out = mcobj.tukeyhsd(alpha=0.05)
print(out)
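The 'multicomp' sheet already holds the data in long format (one row per student). If only the wide layout of the 'anova' sheet were available (one column per group, padded with missing values), it could be reshaped with pandas first. The sketch below uses a tiny made-up frame: the values are hypothetical, and only the column names follow the sheets above.

```python
import numpy as np
import pandas as pd

# Wide layout: one column per group, shorter groups padded with NaN
wide = pd.DataFrame({
    'Control':    [61.0, 55.0, np.nan],
    'Treatment1': [70.0, 66.0, 72.0],
    'Treatment2': [58.0, 63.0, np.nan],
})

# Long layout: one row per observation, with a label column and a value column
long_df = (wide.melt(var_name='Group', value_name='Score')
               .dropna(subset=['Score'])
               .reset_index(drop=True))
print(long_df)
```

`long_df.Score` and `long_df.Group` can then be passed to `MultiComparison` in the same way as `dtuk.Score` and `dtuk.Group`.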