# Analysis of Variance

Posted April 6, 2009

Analysis of variance (ANOVA) is a statistical technique that was invented by Fisher, and it is therefore sometimes called Fisher's ANOVA. In survey research, ANOVA is most often used to compare the means of more than two populations, although it can also be used to compare two sample means.

With two groups, ANOVA gives the same result as the t-test. For example, if we want to compare income by gender group, the t-test and ANOVA lead to the same conclusion, because the F statistic equals the square of the pooled-variance t statistic. With more than two groups, we could run a t-test on every pair of groups, but this procedure is long and the chance of a false positive grows with the number of comparisons. Thus, ANOVA is the preferred technique when the independent variable has more than two groups. Before performing ANOVA, we should consider the assumptions on which this test is based:
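
The two-group equivalence can be checked numerically. The sketch below uses made-up income figures (the data and the function names `pooled_t` and `anova_f` are assumptions for illustration only) and computes both statistics by hand with the Python standard library.

```python
from math import sqrt
from statistics import mean

def pooled_t(a, b):
    """Two-sample t statistic using a pooled (equal-variance) estimate."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def anova_f(groups):
    """One way ANOVA F-ratio: mean square between / mean square within."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    bss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (bss / (len(groups) - 1)) / (wss / (n_total - len(groups)))

# Hypothetical incomes (in thousands) for two gender groups.
group_1 = [30.0, 32.0, 34.0]
group_2 = [36.0, 38.0, 40.0]

t = pooled_t(group_1, group_2)
f = anova_f([group_1, group_2])
print(round(t ** 2, 6), round(f, 6))  # 13.5 13.5
```

The squared t statistic and the F-ratio agree, so both tests give the same p-value for two groups.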

Assumptions:

1. **Independence of cases:** The observations of the dependent variable should be independent of one another. In practice, this means the sample should be selected randomly, with no pattern in the selection of cases.

2. **Normality:** The distribution of each group should be normal. The Kolmogorov-Smirnov or the Shapiro-Wilk test may be used to check the normality of each group.

3. **Homogeneity:** Homogeneity means that the variance within each group should be the same across groups. Levene's test is used to test the homogeneity of variances between groups.
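
As a rough illustration of the homogeneity check, Levene's statistic in its original mean-centred form is simply a one way ANOVA F-ratio computed on the absolute deviations of each observation from its own group mean. The sketch below is a hand-rolled version under that definition; the function name `levene_w` and both data sets are assumptions for illustration. In practice a library routine (for example `scipy.stats.levene`, or `scipy.stats.shapiro` for normality) would normally be used and would also report a p-value.

```python
from statistics import mean

def levene_w(groups):
    """Levene's W (mean-centred form): an ANOVA F-ratio on |x - group mean|."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    z_grand = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (mean(g) - z_grand) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return ((n_total - k) / (k - 1)) * between / within

# Hypothetical data: the second set has one group with a much wider spread.
similar_spread = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
unequal_spread = [[4.0, 5.0, 6.0], [2.0, 7.0, 12.0], [8.0, 9.0, 10.0]]

print(levene_w(similar_spread))
print(levene_w(unequal_spread))
```

A larger W is stronger evidence against equal variances. W is compared with the F distribution on (k - 1, N - k) degrees of freedom, where k is the number of groups and N the total sample size.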

If the data satisfy the above assumptions, then ANOVA is an appropriate technique to compare the means of two or more populations. ANOVA has three types.

**One way analysis of variance (ANOVA):** When we compare groups (typically three or more) based on one factor variable, it is said to be one way ANOVA. For example, if we want to test whether or not the mean output of three workers is the same based on the working hours of the three workers alone, it is a one way ANOVA.

**Two way analysis of variance (ANOVA):** When there are two factor variables, it is said to be two way ANOVA. For example, based on working condition and working hours, we can compare whether or not the mean output of three workers is the same. In this case, it is a two way ANOVA.

**K way analysis of variance (ANOVA):** When there are k factor variables, it is said to be k way ANOVA.
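
To make the two way case concrete, here is a minimal sketch of a balanced two way ANOVA with replication, written with the standard library only. It assumes every cell has the same number of observations; the function name `two_way_anova`, the factor labels and the output figures are all hypothetical.

```python
from statistics import mean

def two_way_anova(cells):
    """F-ratios for factor A, factor B and the A x B interaction.

    `cells` maps (level_of_A, level_of_B) -> list of observations.
    Assumes a balanced design: the same number of observations per cell.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))  # replicates per cell
    grand = mean(x for obs in cells.values() for x in obs)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {key: mean(obs) for key, obs in cells.items()}

    ss_a = r * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = r * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_ab = r * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_error = sum((x - cell_mean[key]) ** 2
                   for key, obs in cells.items() for x in obs)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    ms_error = ss_error / (len(a_levels) * len(b_levels) * (r - 1))
    return {
        "A": (ss_a / df_a) / ms_error,
        "B": (ss_b / df_b) / ms_error,
        "AxB": (ss_ab / (df_a * df_b)) / ms_error,
    }

# Hypothetical output by working condition (A) and working hours (B).
output = {
    ("good", "short"): [10.0, 12.0],
    ("good", "long"): [14.0, 16.0],
    ("poor", "short"): [6.0, 8.0],
    ("poor", "long"): [10.0, 12.0],
}
result = two_way_anova(output)
print(result)  # {'A': 16.0, 'B': 16.0, 'AxB': 0.0}
```

In this made-up data both factors shift the mean output, while the interaction F-ratio is zero because the effect of working hours is identical under both working conditions.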

Key terms and concepts:

**Sum of squares between groups:** To get the sum of squares between groups, we calculate the mean of each group and the grand mean of all observations, then take the deviation of each group mean from the grand mean. Each deviation is squared and multiplied by the number of observations in that group, and the results are summed over all groups.

**Sum of squares within groups:** To get the sum of squares within groups, we take the deviation of each observation from the mean of its own group. Each deviation is squared, and the squares are summed over all observations in all groups.
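
The two sums of squares can be sketched in a few lines. Here the between-groups sum measures how far each group mean lies from the grand mean, and the within-groups sum measures how far each observation lies from its own group mean. The worker output figures and the function name `sums_of_squares` are assumptions for illustration.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (between, within, total) sums of squares for a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between: squared deviation of each group mean from the grand mean,
    # weighted by the group size.
    bss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: squared deviation of each observation from its own group mean.
    wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total: squared deviation of each observation from the grand mean.
    tss = sum((x - grand_mean) ** 2 for g in groups for x in g)
    return bss, wss, tss

# Hypothetical output of three workers, three observations each.
workers = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
bss, wss, tss = sums_of_squares(workers)
print(bss, wss, tss)  # 24.0 6.0 30.0
```

The between and within sums add up to the total sum of squares, which is a useful check on any hand calculation.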

**F-ratio:** To calculate the F-ratio, the mean square between groups is divided by the mean square within groups. Each mean square is the corresponding sum of squares divided by its degrees of freedom.

**Degrees of freedom:** The degrees of freedom for the sum of squares between groups are the number of groups minus one. The degrees of freedom for the sum of squares within groups are the total number of observations minus the number of groups.

BSS df = (g-1), where BSS is the between-groups sum of squares, g is the number of groups, and df is the degrees of freedom.

WSS df = (N-g), where WSS is the within-groups sum of squares and N is the total sample size.
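
Putting the degrees of freedom to work, the F-ratio is the ratio of the two mean squares, each being a sum of squares divided by its df. The sketch below assumes hypothetical values BSS = 24 and WSS = 6 from three groups of three observations; the function name `f_ratio` is also an assumption.

```python
def f_ratio(bss, wss, n_groups, n_total):
    """Return (F, df_between, df_within) from the two sums of squares."""
    df_between = n_groups - 1        # BSS df = g - 1
    df_within = n_total - n_groups   # WSS df = N - g
    ms_between = bss / df_between    # mean square between groups
    ms_within = wss / df_within      # mean square within groups
    return ms_between / ms_within, df_between, df_within

f, df_b, df_w = f_ratio(bss=24.0, wss=6.0, n_groups=3, n_total=9)
print(f, df_b, df_w)  # 12.0 2 6
```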

**Significance:** At a predetermined level of significance (usually 5%), we compare the calculated F-ratio with the critical value from the F table. Today, however, computers can automatically calculate the probability value (p-value) for the F-ratio. If the p-value is less than the predetermined significance level, we conclude that at least one group mean differs from the others. If the p-value is greater than the predetermined significance level, we cannot conclude that the group means differ.
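
For the special case of exactly three groups (2 degrees of freedom between groups), the upper-tail probability of the F distribution has a simple closed form, which allows a small sketch of the decision rule without any statistical library. The F-ratio of 12 with 6 within-group degrees of freedom is a hypothetical value, and `p_value_three_groups` is an illustrative name. For other numbers of groups a library routine such as `scipy.stats.f.sf` would be needed.

```python
def p_value_three_groups(f, df_within):
    """Upper-tail probability of F with (2, df_within) degrees of freedom.

    Valid only when there are exactly three groups (df_between = 2).
    """
    return (1 + 2 * f / df_within) ** (-df_within / 2)

alpha = 0.05  # predetermined significance level
p = p_value_three_groups(f=12.0, df_within=6)
reject_null = p < alpha
print(round(p, 4), reject_null)  # 0.008 True
```

Here the p-value is below 5%, so we would conclude that at least one of the three group means differs.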

**Analysis of variance (ANOVA) in SPSS:** In SPSS, ANOVA can be performed in several ways. A one way ANOVA is available through the "One-Way ANOVA" option in the "Compare Means" menu. For two way or higher ANOVA, we can use the "Univariate" option in the "General Linear Model" (GLM) menu. SPSS gives additional results as well, such as partial eta squared, power, the regression model, post hoc tests and homogeneity tests. A post hoc test is performed when there is a significant difference between groups and we want to know exactly which group means differ significantly from the others.

Extensions of analysis of variance (ANOVA):

**MANOVA:** ANOVA is performed when we have one metric dependent variable and one or more nominal independent variables. However, when we have more than one dependent variable and one or more independent variables, we use multivariate analysis of variance (MANOVA).

**ANCOVA:** Analysis of covariance (ANCOVA) is used to test whether or not certain factors have an effect on the outcome variable after removing the variance accounted for by quantitative predictors (covariates).


Thanks for the great info. I have 2 data sets (season A and B). Each season has environmental data and organism data. I would like to know whether there is significant or insignificant variation between the 2 seasons. What kind of ANOVA can be used for this study? Please also help me to interpret the analysis result. Thank you very much for your help. aida

The following is the data set:

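| Variable | Season A | Season B |
| --- | --- | --- |
| Temp. | 29.3 | 30.6 |
| Sal. | 33 | 35.5 |
| Oxygen | 5.9 | 4.8 |
| PO4 | 0.012 | 0.009 |
| NO4 | 0.939 | 0.854 |
| TOM | 41.71 | 20.22 |
| Silica | 0.91 | 0.8 |
| Chl-a | 0.92 | 1.41 |
| Tot. Phytoplankton | 39052 | 2344 |
| Diatoms | 38404 | 1692 |
| Dinoflagellates | 648 | 652 |
| Tot. Zooplankton | 504 | 2412 |
| Copepods | 496 | 2172 |
| Molluscs | 4 | 132 |
| Others | 4 | 108 |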

Hi Aida,

We can certainly assist you with selecting the appropriate statistics for your study. Please call or email us to set up a time to speak with Dr. Lani about your research. We look forward to assisting you!

Hello. We ran an experiment using 8 different plots: 4 were at the edge of a forest fragment (almost open canopy) and the other 4 were inside the forest (closed canopy). In the study plots we set bags of seeds to germinate (with two treatments: scarified versus non-scarified seeds); this variable was expressed as germination percentages. We wanted to see the effect of closed or open canopy and the condition of the seeds (scarified, non-scarified) on the germination percentages (arcsine transformed in order to meet the assumption of normality). We applied a two way ANOVA. The interaction of the two factors (independent variables) was not significant. The p-value for the condition of seeds was 0.02, and the Tukey test showed two significantly different groups, with the highest mean germination for the scarified seeds. Differences were also observed for the plots (p = 0.04), but when we applied the Tukey test, the means of the 8 plots appeared in only one column, as if they were a single group. Only when we applied the Fisher index did we see different groups… but we are not supposed to use Fisher in this case, only Tukey, right? What would be a good way to analyze our data? I forgot to mention that these ANOVAs were run with Statistica (StatSoft). Thanks very much for your help. Sorry about my English.