Analysis of Variance: Foundations and Calculation
Analysis of Variance (ANOVA) is a statistical technique used to examine differences between the means of two or more groups. It partitions the total variability observed in a dataset into different sources, allowing for the assessment of the significance of group differences.
Fundamental Principles of ANOVA
- Partitioning Variance: Decomposes the total sum of squares into components attributable to different sources of variation (e.g., between-group and within-group).
- Hypothesis Testing: Tests the null hypothesis that the means of all groups are equal against the alternative hypothesis that at least one group mean is different.
- F-Statistic: Calculates an F-statistic, the ratio of between-group variance to within-group variance. A large F-statistic indicates that the variability between group means is large relative to the variability within groups, which suggests a genuine difference among the means.
- Assumptions: Relies on assumptions such as normality of data within groups, homogeneity of variances across groups, and independence of observations. Violation of these assumptions can affect the validity of the results.
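The assumptions above can be checked before running an ANOVA. The sketch below uses SciPy's Shapiro-Wilk test for within-group normality and Levene's test for homogeneity of variances; the three groups and the 0.05 threshold are hypothetical choices for illustration.

```python
# Sketch of checking ANOVA assumptions for three hypothetical groups.
from scipy import stats

group_a = [23.1, 25.4, 24.8, 26.0, 24.2]
group_b = [27.5, 28.1, 26.9, 29.3, 27.8]
group_c = [22.0, 23.4, 21.8, 22.9, 23.1]

# Shapiro-Wilk test for normality within each group
for name, g in [("A", group_a), ("B", group_b), ("C", group_c)]:
    stat, p = stats.shapiro(g)
    print(f"Group {name}: Shapiro-Wilk p = {p:.3f}")  # p > 0.05: no evidence against normality

# Levene's test for homogeneity of variances across groups
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene's test p = {p:.3f}")  # p > 0.05: variances look comparable
```

Independence of observations cannot be tested this way; it must be ensured by the study design (e.g., random assignment, no repeated measurements of the same subject across groups).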
Key Components and Calculations
Sum of Squares (SS)
Measures variability as sums of squared deviations; the total is partitioned into between-group and within-group components.
- Total Sum of Squares (SST): Measures the total variability in the dataset.
- Between-Groups Sum of Squares (SSB): Measures the variability between the group means.
- Within-Groups Sum of Squares (SSW): Measures the variability within each group (also known as Sum of Squares Error, SSE).
- Relationship: SST = SSB + SSW
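The decomposition SST = SSB + SSW can be verified numerically. Below is a minimal sketch in plain Python; the three groups are hypothetical data chosen so the arithmetic is easy to follow.

```python
# Minimal sketch of the sum-of-squares decomposition (hypothetical data).
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]

all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# SST: squared deviations of every observation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_obs)

# SSB: group size times squared deviation of each group mean from the grand mean
ssb = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)

# SSW: squared deviations of observations from their own group mean
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

print(sst, ssb, ssw)  # → 60.0 54.0 6.0
assert abs(sst - (ssb + ssw)) < 1e-9  # SST = SSB + SSW
```

Here most of the total variability (54 of 60) lies between the groups, which foreshadows a large F-statistic for these data.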
Degrees of Freedom (df)
Represents the number of independent pieces of information used to estimate a parameter.
- Total Degrees of Freedom (dfT): N - 1, where N is the total number of observations.
- Between-Groups Degrees of Freedom (dfB): k - 1, where k is the number of groups.
- Within-Groups Degrees of Freedom (dfW): N - k.
- Relationship: dfT = dfB + dfW
Mean Square (MS)
Calculated by dividing the sum of squares by its corresponding degrees of freedom. Represents an estimate of variance.
- Between-Groups Mean Square (MSB): SSB / dfB
- Within-Groups Mean Square (MSW): SSW / dfW
F-Statistic Calculation
The F-statistic is calculated as the ratio of the between-groups mean square to the within-groups mean square: F = MSB / MSW.
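Putting the pieces together, the full chain SS → df → MS → F can be computed by hand. The sketch below uses three small hypothetical groups:

```python
# Sketch of the full ANOVA table computation for hypothetical data.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]

all_obs = [x for g in groups for x in g]
n, k = len(all_obs), len(groups)          # N observations, k groups
grand_mean = sum(all_obs) / n

ssb = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_b, df_w = k - 1, n - k                 # between- and within-group degrees of freedom
msb, msw = ssb / df_b, ssw / df_w         # mean squares: SS divided by matching df

f_stat = msb / msw                        # F = MSB / MSW
print(f_stat)  # → 27.0
```

With 9 observations in 3 groups, dfB = 2 and dfW = 6, giving MSB = 54/2 = 27 and MSW = 6/6 = 1, hence F = 27.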
P-Value
The p-value represents the probability of observing an F-statistic as extreme as, or more extreme than, the one calculated, assuming the null hypothesis is true. A small p-value (typically less than 0.05) indicates strong evidence against the null hypothesis, suggesting significant differences between group means.
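In practice the p-value is read off the F distribution with (dfB, dfW) degrees of freedom. The sketch below obtains it from SciPy's F survival function and cross-checks against `scipy.stats.f_oneway`, which performs the whole one-way ANOVA in one call; the data are hypothetical.

```python
# Sketch of computing the ANOVA p-value (hypothetical data).
from scipy import stats

groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]

# One-call reference: scipy.stats.f_oneway returns (F, p)
f_stat, p_reference = stats.f_oneway(*groups)

df_b = len(groups) - 1
df_w = sum(len(g) for g in groups) - len(groups)

# P(F >= f_stat) under the null, via the F distribution's survival function
p_value = stats.f.sf(f_stat, df_b, df_w)

print(f_stat, p_value)
assert abs(p_value - p_reference) < 1e-12
```

Both routes agree; for these data p ≈ 0.001, well below the conventional 0.05 threshold.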
Common ANOVA Designs
- One-Way ANOVA: Examines the effect of one independent variable (factor) with two or more levels on a continuous dependent variable.
- Two-Way ANOVA: Examines the effects of two independent variables and their interaction on a continuous dependent variable.
- Repeated Measures ANOVA: Used when the same subjects are measured multiple times under different conditions.
Post-Hoc Tests
Used after a significant ANOVA result to determine which specific group means differ significantly from each other. Common post-hoc tests include:
- Tukey's Honestly Significant Difference (HSD)
- Bonferroni correction
- Scheffé's method
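As an illustration of the Bonferroni approach, the sketch below runs all pairwise t-tests and divides the significance threshold by the number of comparisons; the groups and the 0.05 alpha are hypothetical choices.

```python
# Sketch of Bonferroni-corrected pairwise comparisons (hypothetical data).
from itertools import combinations
from scipy import stats

groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}

alpha = 0.05
pairs = list(combinations(groups, 2))
corrected_alpha = alpha / len(pairs)  # Bonferroni: divide alpha by number of comparisons

for a, b in pairs:
    t_stat, p = stats.ttest_ind(groups[a], groups[b])
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{a} vs {b}: p = {p:.4f} ({verdict})")
```

Note that the Bonferroni correction is conservative: with small samples a pair can fail the corrected threshold even when the overall ANOVA is significant. Tukey's HSD is usually preferred when all pairwise comparisons are of interest.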