  • A hypothesis (plural: hypotheses; from Greek 'υπόθεσις') is a specific statement of prediction relevant to the phenomenon under study.
  • Hypothesis testing refers to the procedure of applying appropriate statistical controls that help formulate conclusions regarding the "truthfulness" of the stated hypotheses.
  • Hypothesis tests (or 'statistical controls') are a whole range of statistical algorithms for data processing that return:
    • (a) a value for the 'test statistic' we need to compute, and
    • (b) the probability of obtaining a value at least this extreme when the null hypothesis is true (the 'p-value')
  • Based on this probability we either:
    • 'reject' the null hypothesis and support an 'alternative' hypothesis that better interprets the data, or
    • 'fail to reject' the null hypothesis in which case we stay with it, until -of course- new data may lead us to reject it (note that, in general, we avoid the expression "to accept the null hypothesis").

Stating hypotheses: an example

Suppose you are an educational technology researcher and you are interested in the following research topic:

  • Does background music in a multimedia learning environment have a positive/negative impact on students who use this environment to learn?

Furthermore, suppose that you have no reason to believe that background music has any impact on student learning whatsoever. Then you may state a null hypothesis as follows:

  • "Background music has no impact whatsoever on students' learning"

The above is your "null hypothesis" (H0). As noted earlier, based on the calculated probability you may either:

  • 'reject' the null hypothesis and adopt an alternative hypothesis (Ha) that better explains the data, for example "Background music has a statistically significant impact on students' learning".
  • 'fail to reject' the null hypothesis: you declare that your experiment did not provide enough evidence to reject it.

Hypothesis directionality

  • 'non-directional' hypothesis: does not predict the direction of impact (positive/negative).

    • "Background music has no impact whatsoever on students' learning" is a non-directional hypothesis
  • 'directional' hypothesis: it predicts the direction of impact (positive/negative).

    • "Background music has a positive impact on students' learning" is a directional hypothesis
  • We usually state an alternative hypothesis as directional, in which case the respective null hypothesis should be stated so that (H0) and (Ha), taken together, are mutually exclusive and exhaustive (they leave no other alternative).

    • "Background music does not have a positive impact on students' learning" is the null hypothesis for the above directional Ha
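
In symbols, the two kinds of hypothesis pair might be written as follows. The symbols \mu_{music} and \mu_{silence} are assumptions introduced here for illustration; they stand for the mean learning score of students who study with and without background music.

```latex
\begin{align*}
\text{Non-directional:}\quad & H_0:\ \mu_{\text{music}} = \mu_{\text{silence}}, & H_a:\ \mu_{\text{music}} \neq \mu_{\text{silence}} \\
\text{Directional:}\quad & H_0:\ \mu_{\text{music}} \le \mu_{\text{silence}}, & H_a:\ \mu_{\text{music}} > \mu_{\text{silence}}
\end{align*}
```

In each row H0 and Ha cover every possible case between them and never overlap, which is the "mutually exclusive and exhaustive" requirement.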

The rationale for hypothesis testing

  • 'Signal to noise' ratio: essentially all statistical hypothesis tests try to elicit order out of chaos by estimating a 'signal to noise' ratio. At the core of this conceptualization lies data variability.

    In statistics we usually take measurements in sample groups. For example, we measure the performance of a group of students after study, the prices of a group of stocks at a certain moment, the answers that a group of users give in a user-satisfaction survey, etc. These group measurements exhibit a certain variability (not all students in a group get the same performance grade; grades vary). Group measurements also vary across groups (the mean grade is most probably different across student groups studying with or without background music).

  • 'Variability' refers to the property of data sets exhibiting variations (random or not) around some value of reference (for example, the mean value of the set). Considering this unavoidable variation we introduce the 'noise' and 'signal' concepts as follows:

    • 'Noise': Data in a set vary from the reference value (the mean value) due to various uncontrollable factors that may affect any single measurement. Variability in this case is considered as a natural and unavoidable 'noise' that interferes in data measurement.
    • 'Signal': when comparing different data sets, mean values will almost always appear different. However, if this difference is big enough this might signal that important variability exists due to some factor totally different from the factors producing the effect of the anticipated random noise.
  • So, now we have a way of estimating the 'signal to noise ratio':

    • 'Noise' (denominator) is the expected variation within the groups. As explained, this represents the variability in measurement which appears in our sample groups (within-group variability)
    • 'Signal' (numerator) is a possibly exceptional variability identified when comparing two or more groups. Thus, the signal is the variability between the two (or more) sample groups (between-group variability)

  • We can calculate the above measure as long as we know how to estimate the between-group and within-group variability (and we do). Thus, we have an indicator showing how big the between-group variability is compared to the within-group variability. But is this indicator useful on its own?
  • Simply estimating the above measure is not enough. Suppose you get the value of 2, or 2.5, or 4, or 128. How do you know how important each value is?
  • Here is where distributions come into play.
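
The signal/noise measure described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the measure is the ratio of between-group to within-group variance (the F ratio of a one-way analysis of variance); the function name `signal_to_noise` and the grades are hypothetical, invented only for illustration.

```python
from statistics import mean

def signal_to_noise(groups):
    """Return between-group variance divided by within-group variance."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total number of measurements
    grand_mean = mean([x for g in groups for x in g])

    # Signal: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Noise: how far each measurement lies from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)      # between-group variance
    ms_within = ss_within / (n_total - k)  # within-group variance
    return ms_between / ms_within

# Hypothetical grades, not real data.
with_music = [62, 70, 68, 75, 66]
without_music = [71, 78, 74, 80, 77]

ratio = signal_to_noise([with_music, without_music])
print(round(ratio, 2))
```

A ratio near 0 means the group means barely differ relative to the spread inside each group; a larger ratio means the difference between groups stands out from the noise.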


  • A probability distribution (or simply 'distribution') of a variable is an expression (in mathematical, graphical, computational notation, etc.) of the probability that when measuring the variable the result will lie within a specific range of values.
  • For example, suppose you throw a fair die. What is the probability that the die shows 1, 2, 3, 4, 5 or 6? You would say 1/6 in each case. So, the probability distribution across all possible outcomes is constant: 1/6
  • Another example: if you ask what the "age at death" distribution is in a certain population, you may get a distribution like the following. In this case the probability varies across ages.

  • We further explore distributions in the 'Distributions' section, but right now what is important to us is that a distribution allows us to estimate the probability that a measurement will be within a specific range of values. For example, from the distribution above it is easy to estimate the probability that someone dies before reaching the age of 90 (recall that the area under a probability distribution curve over a range of values is the probability of falling within that range).
  • Going back to our "measure = signal/noise" story we easily understand now that if we know a probability distribution expressing the variability of our measure then we can estimate what the probability is that a measurement outcome will lie within a certain range of values.
  • To be specific: suppose we estimated a measure value of 2.5. Then by looking at the relevant distribution we see that the probability of measuring this value or a greater one (in other words: a value in the [2.5, +inf) interval) is, say, 0.035.
  • Now, we are getting somewhere:
    • when to 'reject': if we believe that this is quite a small probability then it's reasonable to assume that the measure (signal/noise ratio) did not get its exceptional value by pure chance. Some other factor might have had a significant impact during the measurement. Typically, in a situation like this, we 'reject' the original null hypothesis (H0) and look for an alternative Ha. In educational studies a probability cutoff value of 0.05 (the α level) is commonly considered acceptable, so the above p = 0.035 value would lead us to this line of thinking.
    • when to 'fail to reject': if, however, we still think that this is a high probability (as in critical medical studies, where the α cutoff level may be set as low as 0.001) then we continue to argue that we 'failed to reject' H0.
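
The step from a measured value to a probability, and from there to a decision, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the difference between two group means as the measure (a simplification of the signal/noise ratio), and it approximates the relevant distribution by a permutation (label-shuffling) approach instead of a theoretical distribution. The function names and the grades are hypothetical.

```python
import random
from statistics import mean

def mean_difference(a, b):
    """Test statistic: mean of group b minus mean of group a."""
    return mean(b) - mean(a)

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Estimate the probability of a statistic >= the observed one under H0.

    Under H0 the group labels carry no information, so shuffling them
    shows which values the statistic takes by chance alone.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean_difference(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if mean_difference(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return observed, hits / n_shuffles

def decide(p_value, alpha):
    """Compare the probability with the chosen cutoff (alpha level)."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Hypothetical grades, not real data.
with_music = [62, 70, 68, 75, 66]
without_music = [71, 78, 74, 80, 77]

observed, p_value = permutation_p_value(with_music, without_music)
print(observed, p_value)

# The same probability leads to different decisions under different cutoffs.
print(decide(0.035, alpha=0.05))   # reject H0
print(decide(0.035, alpha=0.001))  # fail to reject H0
```

The last two lines mirror the example in the text: p = 0.035 is below the 0.05 cutoff but above the stricter 0.001 cutoff.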


  • Overall, hypothesis testing:
    • a) Defines a "signal/noise measure" and provides a means (algorithm) for its computation
    • b) Uses a "measure-relevant distribution" to provide the probability that a measure lies within a range of values

The rest is the researcher's decision
