Definition and Purpose

  • Analysis of variance (ANOVA) is a commonly used statistical methodology for partitioning the total variance within a data set into components, allowing researchers to estimate and test hypotheses about population means.
  • It is primarily applied to examine the difference in means of three or more independent groups or populations.
  • The null hypothesis ($H_0$) in ANOVA states that all $k$ population means are equal ($\mu_1 = \mu_2 = \cdots = \mu_k$), so any observed differences between the group means are attributable to random sampling error alone.
  • The alternative hypothesis ($H_1$) states that at least one population mean differs from the others.

Partitioning of Variability

  • The core theme of ANOVA is dividing the total variation of all data into different identifiable sources of variation.
  • Total Variation (SST): The overall variation between each individual value and the overall mean, which may be caused by the treatment factor or by random error.
  • Variation Between Groups (SSB): The variation between the mean of each specific group and the overall mean, which represents the effect of the treatment or intervention.
  • Variation Within Groups (SSE): The variation between individual values within a group and their respective group mean, which is exclusively caused by random error or individual differences.
  • The mathematical relationship is expressed as Total Sum of Squares (SST) = Between Groups Sum of Squares (SSB) + Within Groups Sum of Squares (SSE).
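
The partition above can be checked numerically. The following is a minimal sketch in plain Python; the group labels and values are made-up illustration data, and the helper name `partition_sums_of_squares` is hypothetical rather than part of any library.

```python
# Hypothetical example data: three treatment groups (values are made up).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def partition_sums_of_squares(groups):
    """Return (SST, SSB, SSE) for a dict mapping group name -> list of values."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)

    # SST: squared deviation of every value from the overall mean.
    sst = sum((y - grand_mean) ** 2 for y in all_values)
    # SSB: squared deviation of each group mean from the overall mean,
    # weighted by the group size.
    ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # SSE: squared deviation of every value from its own group mean.
    sse = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)
    return sst, ssb, sse


sst, ssb, sse = partition_sums_of_squares(groups)
print(sst, ssb, sse)  # 30.0 24.0 6.0
```

For this data the between-group and within-group components (24 and 6) add up to the total (30), which is the identity SST = SSB + SSE.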

Test Statistic and The F-Ratio

  • ANOVA utilizes the F-statistic to determine if the means are significantly different.
  • The F-statistic is computed as the ratio of the Mean Square Between groups (MSB) to the Mean Square Within groups (or Mean Square Error, MSE).
  • Each mean square is obtained by dividing its sum of squares by its degrees of freedom: $k - 1$ between groups and $N - k$ within groups, for $k$ groups and $N$ total observations.
  • If the null hypothesis is true (the treatment has no effect), MSB and MSE both estimate the same random-error variance, so the F-value is expected to be close to 1.
  • If the F-statistic exceeds the critical value of the F distribution (with $k - 1$ and $N - k$ degrees of freedom) at the chosen significance level, the between-group variation is larger than random error alone would plausibly produce, and the null hypothesis is rejected.
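
As an illustration of the steps above, here is a minimal sketch that computes the F-ratio by hand in plain Python. The function name `one_way_f` and the sample values are assumptions made for this example; in practice a statistics package would normally be used and would also report a p-value.

```python
def one_way_f(samples):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Sums of squares between groups and within groups.
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)

    # Mean squares: each sum of squares divided by its degrees of freedom.
    df_between = k - 1
    df_within = n_total - k
    msb = ssb / df_between
    mse = sse / df_within
    return msb / mse, df_between, df_within


# Hypothetical data: three groups of three observations each.
samples = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_value, df_b, df_w = one_way_f(samples)
print(f_value, df_b, df_w)  # 12.0 2 6
```

Here MSB = 24 / 2 = 12 and MSE = 6 / 6 = 1, giving F = 12 on (2, 6) degrees of freedom. The tabulated critical value of F at the 0.05 level for those degrees of freedom is about 5.14, so this made-up data set would lead to rejecting the null hypothesis.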

Assumptions of ANOVA

  • For an ANOVA to be statistically valid, the data must satisfy three primary assumptions.
| Assumption | Description | Verification / Action if Violated |
|---|---|---|
| Independence | The observed values must be completely independent of each other. | Ensure appropriate study design and random sampling. |
| Normality | The dependent variable should be approximately normally distributed within each group. | Verified by histograms or Shapiro-Wilk tests; if violated, use non-parametric equivalents (e.g., Kruskal-Wallis). |
| Homogeneity of Variance | The population variance of each group should be approximately equal. | Verified by Levene’s or Bartlett’s test; if violated, use Welch’s adjusted test. |
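
The checks in the table can be sketched with SciPy, assuming it is installed. The calls used (`scipy.stats.shapiro`, `levene`, `f_oneway`, `kruskal`) are real SciPy functions, but the data, the 0.05 threshold, and the simple decision rule are assumptions made for illustration. Welch's adjusted test is only named here, not computed.

```python
from scipy import stats

# Hypothetical measurements for three independent groups (values are made up).
samples = {
    "A": [5.1, 4.9, 5.6, 5.8, 6.0, 5.3],
    "B": [6.2, 6.8, 5.9, 7.1, 6.5, 6.4],
    "C": [7.9, 8.3, 7.4, 8.8, 8.1, 7.7],
}
alpha = 0.05  # assumed significance threshold

# Normality within each group (Shapiro-Wilk test).
normality_p = {name: stats.shapiro(v).pvalue for name, v in samples.items()}

# Homogeneity of variance (Levene's test; SciPy centers on the median by default).
levene_p = stats.levene(*samples.values()).pvalue

normal_ok = all(p > alpha for p in normality_p.values())
equal_var_ok = levene_p > alpha

# Pick the test suggested by the table's "Action if Violated" column.
if not normal_ok:
    decision = "Kruskal-Wallis"
    p_value = stats.kruskal(*samples.values()).pvalue
elif not equal_var_ok:
    decision = "Welch's adjusted test (not computed here)"
    p_value = None
else:
    decision = "one-way ANOVA"
    p_value = stats.f_oneway(*samples.values()).pvalue

print(normality_p, levene_p, decision, p_value)
```

Independence cannot be tested this way; as the table notes, it has to come from the study design.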

Types of ANOVA Models

  • Different variations of ANOVA are utilized based on the specific research design and the number of independent variables (factors) being tested.
| Type of ANOVA | Clinical Application | Non-Parametric Equivalent |
|---|---|---|
| One-Way ANOVA | Compares means across three or more independent groups, evaluating a single continuous outcome against one categorical independent variable. | Kruskal-Wallis H test. |
| Repeated Measures ANOVA | Applied when paired or matched data are collected by measuring the same subjects multiple times (e.g., pre-test, post-test, follow-up). | Friedman test. |
| Two-Way ANOVA | Evaluates the independent effects and the interaction effect of two distinct categorical independent variables (factors) on a single continuous outcome. | N/A (often requires data transformation or complex modelling). |

Multiple Comparisons (Post-Hoc Testing)

  • A significant F-value in ANOVA indicates only that at least one group mean differs from the rest; it does not specify which groups differ from each other.
  • To identify the exact differences, researchers must conduct multiple pairwise comparisons between the groups.
  • Running a series of unadjusted t-tests is inappropriate because the probability of making at least one Type I error (false positive) grows with the number of comparisons.
  • Post-hoc tests adjust the significance level to safeguard against this error inflation while identifying the specific differing pairs.
  • Commonly utilized post-hoc procedures include the Bonferroni correction, Tukey’s Honestly Significant Difference (HSD) test, the Student-Newman-Keuls (SNK-q) test, the Least Significant Difference (LSD-t) test, and the Dunnett-t test (used specifically to compare multiple treatment groups against a single control group).
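
To make the error inflation concrete, the sketch below counts the pairwise comparisons for four hypothetical groups and compares the familywise error rate with and without a Bonferroni adjustment. It treats the comparisons as independent, which is a simplifying assumption (pairwise comparisons that share a group are not strictly independent), so the numbers are illustrative rather than exact.

```python
from itertools import combinations

alpha = 0.05  # assumed per-test significance level
group_names = ["A", "B", "C", "D"]  # hypothetical group labels

# Every unordered pair of groups is one comparison: k * (k - 1) / 2 in total.
pairs = list(combinations(group_names, 2))
m = len(pairs)

# Chance of at least one false positive across m tests, each run at alpha
# (assuming independent tests).
fwer_uncorrected = 1 - (1 - alpha) ** m

# Bonferroni correction: run each comparison at alpha / m instead.
bonferroni_alpha = alpha / m
fwer_bonferroni = 1 - (1 - bonferroni_alpha) ** m

print(m, round(fwer_uncorrected, 3), round(bonferroni_alpha, 4), round(fwer_bonferroni, 3))
```

With four groups there are six comparisons, and under these assumptions the uncorrected familywise error rate is roughly 26% rather than 5%. Testing each pair at alpha / 6 brings it back to just under 5%. The other procedures listed above control the error rate in different ways and are not shown here.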