10 comparing means
“Are these two things the same?”
Often, we want to know if our study subjects contain different amounts of certain analytes. For example, “Does this lake over here contain more potassium than that lake over there?” For this, we need statistical tests. Here, we will have a look at comparing mean values for analyte abundance in situations with two samples and in situations with more than two samples.
I find many of the concepts discussed in this chapter easier to think about with an example in mind. For that, suppose that you are an analytical chemist in Hawaii who is studying the chemistry of the island’s aquifers. You have the data set hawaii_aquifers. You can see in the output below the structure of the data set - we have 990 measurements of 9 different analytes in multiple wells that draw on a set of 10 aquifers.
hawaii_aquifers
## # A tibble: 990 × 6
## aquifer_code well_name longitude latitude analyte abundance
## <chr> <chr> <dbl> <dbl> <chr> <dbl>
## 1 aquifer_1 Alewa_Heights_Spring NA NA SiO2 11
## 2 aquifer_1 Alewa_Heights_Spring NA NA Cl 20
## 3 aquifer_1 Alewa_Heights_Spring NA NA Mg 7.6
## 4 aquifer_1 Alewa_Heights_Spring NA NA Na 14
## 5 aquifer_1 Alewa_Heights_Spring NA NA K 0.6
## 6 aquifer_1 Alewa_Heights_Spring NA NA SO4 7
## 7 aquifer_1 Alewa_Heights_Spring NA NA HCO3 51
## 8 aquifer_1 Alewa_Heights_Spring NA NA dissolved_s… 118
## 9 aquifer_1 Alewa_Heights_Spring NA NA Ca 6.3
## 10 aquifer_1 Beretania_High_Service NA NA SiO2 30
## # … with 980 more rows
unique(hawaii_aquifers$aquifer_code)
## [1] "aquifer_1" "aquifer_2" "aquifer_3" "aquifer_4" "aquifer_5"
## [6] "aquifer_6" "aquifer_7" "aquifer_8" "aquifer_9" "aquifer_10"
Importantly, there are many wells that draw on each aquifer, as shown in the graph below.
hawaii_aquifers %>%
  select(aquifer_code, well_name) %>%
  group_by(aquifer_code) %>%
  summarize(n_wells = length(unique(well_name))) -> aquifers_summarized
aquifers_summarized
## # A tibble: 10 × 2
## aquifer_code n_wells
## <chr> <int>
## 1 aquifer_1 12
## 2 aquifer_10 7
## 3 aquifer_2 5
## 4 aquifer_3 3
## 5 aquifer_4 16
## 6 aquifer_5 4
## 7 aquifer_6 12
## 8 aquifer_7 9
## 9 aquifer_8 3
## 10 aquifer_9 30
ggplot(aquifers_summarized) + geom_col(aes(x = n_wells, y = aquifer_code))
10.1 definitions
populations and independent measurements: When we are comparing means, we are comparing two sets of values. It is important to consider where these values came from in the first place. In particular, it is usually useful to think of these values as representatives of larger populations. In the example of our aquifer data set, we can think of the measurements from different wells drawing on the same aquifer as independent measurements of the “population” (i.e. the aquifer).
the null hypothesis: When we conduct a statistical test, we are testing the null hypothesis. The null (think “default”) hypothesis is that there is no difference between the means (hence the name “null”). In the example of our aquifers, let’s say that we’re interested in whether two aquifers have different abundances of potassium - in this case the null hypothesis is that they do not differ, in other words, that they have the same amount of potassium.
the p value: The p value represents the probability of getting data at least as extreme as our results if the null hypothesis is true. In other words - the p value is the probability that we would observe the differences we did (or larger ones), if in fact there were no difference in the means at all. To continue with our example: suppose we measure potassium levels in 10% of the wells that access each aquifer and find that aquifer_1 has potassium levels of 14 +/- 2 and aquifer_2 has potassium levels of 12 +/- 1. Suppose that we then conduct a statistical test and get a p value of 0.04. This means that, assuming the aquifers have the same potassium levels (i.e. assuming the null hypothesis is true), there is only a 4% chance that we would obtain a difference as large as the one we measured. In other words, IF the aquifers have the same potassium abundance, it is pretty unlikely that we would have obtained the measurements that we did.
Please note that the p value is not the probability of a detected difference being a false positive. The probability of a false positive requires additional information in order to be calculated. For further discussion please see the end of this chapter.
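To make the idea of a p value concrete, here is a minimal simulation sketch (not part of the aquifer analysis; the sample size of 10 wells, the shared mean of 13, and the standard deviation of 2 are made-up values loosely based on the example above). It asks: if two aquifers truly had the same potassium abundance, how often would sampling variation alone produce a difference in means at least as large as the one we observed?
# simulate the null hypothesis: both "aquifers" share the same true mean
set.seed(1)
observed_difference <- 2              # hypothetical observed difference in means (14 - 12)
null_differences <- replicate(10000, {
  a <- rnorm(10, mean = 13, sd = 2)   # 10 wells sampled from the shared population
  b <- rnorm(10, mean = 13, sd = 2)   # 10 more wells from that same population
  abs(mean(a) - mean(b))              # difference that arises by chance alone
})
mean(null_differences >= observed_difference) # fraction of chance differences this extreme
The fraction returned on the last line plays the role of a p value: the smaller it is, the less plausible it is that the observed difference arose by chance under the null hypothesis.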
10.2 test selection
There are many different types of statistical tests. Below is a flow chart illustrating how it is recommended that statistical tests be used in this course. You can see that there are three regimes of tests: variance and normality tests (blue), parametric tests (green), and non-parametric tests (orange):
When we are comparing means, we need to first determine what kind of statistical tests we can use with our data. If (i) our data can be reasonably modelled by a normal distribution and (ii) the variances about the two means are similar, then we can use the more powerful “parametric” tests (i.e. tests that are more likely to detect a difference in means, assuming one exists). If either of these criteria is not met, then we need to use less powerful “non-parametric” tests.
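As a rough summary of that decision logic, here is a small, hypothetical helper function (not part of rstatix or any other package) that encodes the flow chart: parametric tests only if every group looks normal (all Shapiro p values above 0.05) and the variances are similar (Levene p value above 0.05); otherwise non-parametric tests. The p values passed to it below are made up.
suggest_test_family <- function(shapiro_p_values, levene_p_value) {
  # parametric tests require normality in every group AND similar variances
  if (all(shapiro_p_values > 0.05) && levene_p_value > 0.05) {
    "parametric (e.g. t-test, ANOVA)"
  } else {
    "non-parametric (e.g. Wilcox, Kruskal)"
  }
}
suggest_test_family(shapiro_p_values = c(0.12, 0.34), levene_p_value = 0.21)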
We can check our data for normality and similar variances using the Shapiro test and the Levene test. Let’s use the hawaii_aquifers
data as an example, and let’s consider only the element potassium:
K_data <- hawaii_aquifers %>%
  filter(analyte == "K")
K_data
## # A tibble: 110 × 6
## aquifer_code well_name longitude latitude analyte abundance
## <chr> <chr> <dbl> <dbl> <chr> <dbl>
## 1 aquifer_1 Alewa_Heights_Spring NA NA K 0.6
## 2 aquifer_1 Beretania_High_Service NA NA K 3.1
## 3 aquifer_1 Beretania_Low_Service NA NA K 3.6
## 4 aquifer_1 Kuliouou_Well -158. 21.3 K 3.9
## 5 aquifer_1 Manoa_Well_II -158. 21.3 K 0.6
## 6 aquifer_1 Moanalua_Wells_Pump_1 -158. 21.4 K 2.9
## 7 aquifer_1 Moanalua_Wells_Pump_2 -158. 21.4 K 2.8
## 8 aquifer_1 Moanalua_Wells_Pump_3 -158. 21.4 K 3.4
## 9 aquifer_1 Nuuanu_Aerator_Well -158. 21.4 K 0.4
## 10 aquifer_1 Palolo_Tunnel -158. 21.3 K 0.7
## # … with 100 more rows
To work with two means, let’s just look at aquifers 1 and 6:
K_data_1_2 <- K_data %>%
  filter(aquifer_code %in% c("aquifer_1", "aquifer_6"))
K_data_1_2
## # A tibble: 24 × 6
## aquifer_code well_name longitude latitude analyte abundance
## <chr> <chr> <dbl> <dbl> <chr> <dbl>
## 1 aquifer_1 Alewa_Heights_Spring NA NA K 0.6
## 2 aquifer_1 Beretania_High_Service NA NA K 3.1
## 3 aquifer_1 Beretania_Low_Service NA NA K 3.6
## 4 aquifer_1 Kuliouou_Well -158. 21.3 K 3.9
## 5 aquifer_1 Manoa_Well_II -158. 21.3 K 0.6
## 6 aquifer_1 Moanalua_Wells_Pump_1 -158. 21.4 K 2.9
## 7 aquifer_1 Moanalua_Wells_Pump_2 -158. 21.4 K 2.8
## 8 aquifer_1 Moanalua_Wells_Pump_3 -158. 21.4 K 3.4
## 9 aquifer_1 Nuuanu_Aerator_Well -158. 21.4 K 0.4
## 10 aquifer_1 Palolo_Tunnel -158. 21.3 K 0.7
## # … with 14 more rows
ggplot(K_data_1_2, aes(x = aquifer_code, y = abundance)) +
geom_boxplot() +
geom_point()
Are these data normally distributed? Do they have similar variance? Let’s get a first approximation by looking at a plot:
K_data_1_2 %>%
  ggplot(aes(x = abundance)) +
  geom_histogram(bins = 30) +
  facet_wrap(~aquifer_code) +
  geom_density(aes(y = ..density..*10), color = "blue")
Based on this graphic, it’s hard to say! Let’s use a statistical test to help. When we run the Shapiro test, we are checking whether each group is normally distributed (here the groups are defined by aquifer_code, i.e. aquifer_1 and aquifer_6). This means we need to group_by(aquifer_code)
before we run the test:
K_data_1_2 %>%
  group_by(aquifer_code) %>%
  shapiro_test(abundance)
## # A tibble: 2 × 4
## aquifer_code variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 aquifer_1 abundance 0.885 0.102
## 2 aquifer_6 abundance 0.914 0.239
Both p-values are above 0.05! This means that the distributions are not significantly different from a normal distribution. What about the variances about the two means? Are they similar? For this we need a Levene test. With that test, we are not looking within each group, but rather across groups - this means we do NOT need to group_by(aquifer_code)
and should specify a y ~ x
formula instead:
K_data_1_2 %>%
  levene_test(abundance ~ aquifer_code)
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.289 0.596
The p-value from this test is 0.596! This means that their variances are not significantly different.
10.3 two means
Now, since our data passed both tests, we can use a normal t-test. A t-test is a parametric test, meaning that it relies on modelling the data with a normal distribution in order to make comparisons. It is also a powerful test, meaning that it is likely to detect a difference in means, assuming one is present. Let’s try it out:
K_data_1_2 %>%
  t_test(abundance ~ aquifer_code)
## # A tibble: 1 × 8
## .y. group1 group2 n1 n2 statistic df p
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 abundance aquifer_1 aquifer_6 12 12 -2.75 20.5 0.0121
A p-value of 0.012! This is below 0.05, so we reject the null hypothesis: if the two aquifers really had the same mean potassium abundance, there would be only a 1.2% chance of obtaining a difference as large as the one we measured. Suppose that our data had not passed the Shapiro and/or Levene tests. We would then need to use a Wilcox test. The Wilcox test is a non-parametric test, which means that it does not use a normal distribution to model the data in order to make comparisons. This means that it is a less powerful test than the t-test, i.e. it is less likely to detect a difference in the means, assuming there is one. For fun, let’s try that one out and compare the p-values from the two methods:
K_data_1_2 %>%
  wilcox_test(abundance ~ aquifer_code)
## # A tibble: 1 × 7
## .y. group1 group2 n1 n2 statistic p
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 abundance aquifer_1 aquifer_6 12 12 33.5 0.0282
A p-value of 0.028! This is higher than the value given by the t-test (0.012). That is because the Wilcox test is a less powerful test: it is less likely to detect differences in means, assuming they exist.
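If you want the two p-values side by side rather than reading them off separate outputs, a small sketch like the following works (assuming dplyr and rstatix are loaded, as elsewhere in this chapter; both test functions return a tibble with a p column):
bind_rows(
  K_data_1_2 %>% t_test(abundance ~ aquifer_code) %>% mutate(test = "t-test"),
  K_data_1_2 %>% wilcox_test(abundance ~ aquifer_code) %>% mutate(test = "Wilcox")
) %>%
  select(test, group1, group2, p) # compare the two p-values directly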
10.4 more than two means
In the previous section we compared two means. What if we want to compare means from more than two study subjects? The first step is again to determine which tests to use. Let’s consider our hawaii aquifer data again, though this time let’s use all the aquifers, not just two:
K_data <- hawaii_aquifers %>%
  filter(analyte == "K")
K_data
## # A tibble: 110 × 6
## aquifer_code well_name longitude latitude analyte abundance
## <chr> <chr> <dbl> <dbl> <chr> <dbl>
## 1 aquifer_1 Alewa_Heights_Spring NA NA K 0.6
## 2 aquifer_1 Beretania_High_Service NA NA K 3.1
## 3 aquifer_1 Beretania_Low_Service NA NA K 3.6
## 4 aquifer_1 Kuliouou_Well -158. 21.3 K 3.9
## 5 aquifer_1 Manoa_Well_II -158. 21.3 K 0.6
## 6 aquifer_1 Moanalua_Wells_Pump_1 -158. 21.4 K 2.9
## 7 aquifer_1 Moanalua_Wells_Pump_2 -158. 21.4 K 2.8
## 8 aquifer_1 Moanalua_Wells_Pump_3 -158. 21.4 K 3.4
## 9 aquifer_1 Nuuanu_Aerator_Well -158. 21.4 K 0.4
## 10 aquifer_1 Palolo_Tunnel -158. 21.3 K 0.7
## # … with 100 more rows
ggplot(data = K_data, aes(y = aquifer_code, x = abundance)) +
geom_boxplot() +
geom_point(color = "maroon", alpha = 0.6, size = 3)
Let’s check visually to see if each group is normally distributed and to see if they have roughly equal variance:
K_data %>%
  group_by(aquifer_code) %>%
  ggplot(aes(x = abundance)) +
  geom_histogram(bins = 30) +
  facet_wrap(~aquifer_code) +
  geom_density(aes(y = ..density..*10), colour = "blue")
Again, it is somewhat hard to tell visually if these data are normally distributed. It seems pretty likely that they have different variances about the means, but let’s check using the Shapiro and Levene tests. Don’t forget: with the Shapiro test, we are looking within each group and so need to group_by()
, with the Levene test, we are looking across groups, and so need to provide a y~x
formula:
K_data %>%
  group_by(aquifer_code) %>%
  shapiro_test(abundance)
## # A tibble: 10 × 4
## aquifer_code variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 aquifer_1 abundance 0.885 0.102
## 2 aquifer_10 abundance 0.864 0.163
## 3 aquifer_2 abundance 0.913 0.459
## 4 aquifer_3 abundance 0.893 0.363
## 5 aquifer_4 abundance 0.948 0.421
## 6 aquifer_5 abundance 0.902 0.421
## 7 aquifer_6 abundance 0.914 0.239
## 8 aquifer_7 abundance 0.915 0.355
## 9 aquifer_8 abundance 0.842 0.220
## 10 aquifer_9 abundance 0.786 0.00000866
K_data %>%
  levene_test(abundance ~ aquifer_code)
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 9 100 3.12 0.00239
Based on these tests, it looks like the data for aquifer 9 are significantly different from a normal distribution (Shapiro test p = 0.000008), and the variances are certainly different from one another (Levene test p = 0.002).
Let’s assume for a second that our data had passed these tests. In that case we could reasonably model our data with normal distributions and use a parametric test - an ANOVA - to test for differences in means.
10.4.1 ANOVA, Tukey tests
We will use the anova_test function from the package rstatix. It will tell us if any of the means in the data are statistically different from one another. However, if there are differences between the means, it will not tell us which of them are different.
K_data %>%
  anova_test(abundance ~ aquifer_code)
## Coefficient covariances computed by hccm()
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 aquifer_code 9 100 10.021 7.72e-11 * 0.474
A p-value of 7.7e-11! There are definitely some significant differences among these means. But WHICH are different from one another? For this, we need to run Tukey’s Honest Significant Difference test (implemented using tukey_hsd). This will essentially run a t-test on every pair of study subjects that we can derive from our data set (in this example, aquifer_1 vs. aquifer_2, aquifer_1 vs. aquifer_3, etc.). After that, it will correct the p-values according to the number of comparisons that it performed. This controls the rate of type I error that we can expect from the test. These corrected values are provided to us in the p.adj column.
K_data %>%
  tukey_hsd(abundance ~ aquifer_code)
## # A tibble: 45 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 aquifer_code aquifer_1 aquif… 0 0.00357 -2.00 2.01 1 e+0
## 2 aquifer_code aquifer_1 aquif… 0 1.44 -0.668 3.55 4.56e-1
## 3 aquifer_code aquifer_1 aquif… 0 0.375 -2.35 3.10 1 e+0
## 4 aquifer_code aquifer_1 aquif… 0 -1.15 -2.75 0.437 3.68e-1
## 5 aquifer_code aquifer_1 aquif… 0 -0.845 -3.09 1.40 9.68e-1
## 6 aquifer_code aquifer_1 aquif… 0 1.98 0.261 3.71 1.15e-2
## 7 aquifer_code aquifer_1 aquif… 0 2.70 0.837 4.56 3.56e-4
## 8 aquifer_code aquifer_1 aquif… 0 -0.125 -2.85 2.60 1 e+0
## 9 aquifer_code aquifer_1 aquif… 0 -0.378 -1.78 1.03 9.97e-1
## 10 aquifer_code aquifer_10 aquif… 0 1.44 -0.910 3.79 6.13e-1
## # … with 35 more rows, and 1 more variable: p.adj.signif <chr>
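To build intuition for what “correcting the p-values according to the number of comparisons” means, here is a minimal illustration using base R’s p.adjust() with a Bonferroni correction on some made-up raw p values. Note that tukey_hsd applies its own family-wise correction (Tukey’s HSD), not Bonferroni; this sketch only shows why adjusted p values are larger than raw ones.
raw_p <- c(0.010, 0.040, 0.200)        # hypothetical unadjusted p values from three comparisons
p.adjust(raw_p, method = "bonferroni") # each value is multiplied by the number of tests (capped at 1)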
Using the output from our Tukey test, we can determine which means are similar. We can do this using the p_groups function:
groups_based_on_tukey <- K_data %>%
  tukey_hsd(abundance ~ aquifer_code) %>%
  p_groups()
groups_based_on_tukey
## treatment group spaced_group
## aquifer_1 aquifer_1 ab ab
## aquifer_10 aquifer_10 abc abc
## aquifer_2 aquifer_2 acd a cd
## aquifer_3 aquifer_3 abcd abcd
## aquifer_4 aquifer_4 b b
## aquifer_5 aquifer_5 ab ab
## aquifer_6 aquifer_6 cd cd
## aquifer_7 aquifer_7 d d
## aquifer_8 aquifer_8 abc abc
## aquifer_9 aquifer_9 ab ab
We can use the output from p_groups
to annotate our plot:
ggplot(data = K_data, aes(y = aquifer_code, x = abundance)) +
geom_boxplot() +
geom_point(color = "maroon", alpha = 0.6, size = 3) +
geom_text(data = groups_based_on_tukey, aes(y = treatment, x = 9, label = group))
Excellent! This plot shows us, using the letters on the same line with each aquifer, which means are the same and which are different. If a letter is shared between the labels of two aquifers, it means that their means do not differ significantly. For example, aquifer 2 and aquifer 6 both have “c” in their labels, so their means are not significantly different - and the same letter also appears for aquifers 3 and 10.
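If you want to confirm what the letter display implies for a particular pair, you can pull that comparison straight out of the Tukey output - a small sketch using the same dplyr verbs as above:
K_data %>%
  tukey_hsd(abundance ~ aquifer_code) %>%
  filter(group1 %in% c("aquifer_2", "aquifer_6"),
         group2 %in% c("aquifer_2", "aquifer_6")) %>%
  select(group1, group2, estimate, p.adj) # p.adj should be above 0.05 for this pair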
10.4.2 Kruskal, Dunn tests
The above ANOVA example is great, but remember - our data did not pass the Shapiro or Levene tests. This means that not all our data can be modelled by a normal distribution and that we need to use a non-parametric test. The non-parametric alternative to the ANOVA is called the Kruskal test. Like the Wilcox test, it is less powerful than its parametric relative, meaning that it is less likely to detect differences, should they exist. However, since our data do not pass the Shapiro/Levene tests, we have to resort to the Kruskal test. Let’s try it out:
K_data %>%
  kruskal_test(abundance ~ aquifer_code)
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 abundance 110 57.7 9 0.0000000037 Kruskal-Wallis
A p-value of 3.7e-9! This is higher than the p-value from running ANOVA on the same data (remember, the Kruskal test is less powerful). Nevertheless, the value is still well below 0.05, meaning that some of the means are different. So, how do we determine WHICH are different from one another? When we ran ANOVA, the follow-up test (the post hoc test) was Tukey’s HSD. After the Kruskal test, the post hoc test we use is the Dunn test. Let’s try:
K_data %>%
  dunn_test(abundance ~ aquifer_code)
## # A tibble: 45 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 abundance aquifer_1 aquifer_10 12 7 -0.205 0.838 1 ns
## 2 abundance aquifer_1 aquifer_2 12 6 2.25 0.0242 0.702 ns
## 3 abundance aquifer_1 aquifer_3 12 3 0.911 0.362 1 ns
## 4 abundance aquifer_1 aquifer_4 12 17 -2.70 0.00702 0.232 ns
## 5 abundance aquifer_1 aquifer_5 12 5 -1.15 0.252 1 ns
## 6 abundance aquifer_1 aquifer_6 12 12 2.53 0.0113 0.351 ns
## 7 abundance aquifer_1 aquifer_7 12 9 3.02 0.00254 0.0967 ns
## 8 abundance aquifer_1 aquifer_8 12 3 0.182 0.855 1 ns
## 9 abundance aquifer_1 aquifer_9 12 36 -0.518 0.605 1 ns
## 10 abundance aquifer_10 aquifer_2 7 6 2.20 0.0278 0.777 ns
## # … with 35 more rows
This gives us adjusted p-values for all pairwise comparisons. Once again, we can use p_groups()
to give us a compact letter display for each group, which can then be used to annotate the plot:
groups_based_on_dunn <- K_data %>%
  dunn_test(abundance ~ aquifer_code) %>%
  p_groups()
groups_based_on_dunn
## treatment group spaced_group
## aquifer_1 aquifer_1 abcd abcd
## aquifer_10 aquifer_10 abcd abcd
## aquifer_2 aquifer_2 abc abc
## aquifer_3 aquifer_3 abcd abcd
## aquifer_4 aquifer_4 d d
## aquifer_5 aquifer_5 acd a cd
## aquifer_6 aquifer_6 ab ab
## aquifer_7 aquifer_7 b b
## aquifer_8 aquifer_8 abcd abcd
## aquifer_9 aquifer_9 cd cd
ggplot(data = K_data, aes(y = aquifer_code, x = abundance)) +
geom_boxplot() +
geom_point(color = "black", alpha = 0.4, size = 2) +
scale_x_continuous(name = "Potassium abundance", breaks = seq(0,10,1)) +
scale_y_discrete(name = "Aquifer code") +
geom_text(data = groups_based_on_dunn, aes(y = treatment, x = 9, label = group)) +
theme_bw()
Note that these groupings are different from those generated by ANOVA/Tukey.
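To make that comparison easier, here is a small sketch (assuming dplyr is loaded) that places the two letter displays side by side:
groups_based_on_tukey %>%
  select(treatment, tukey_group = group) %>%
  left_join(
    groups_based_on_dunn %>% select(treatment, dunn_group = group),
    by = "treatment"
  ) # one row per aquifer, with the Tukey-based and Dunn-based letters side by side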
10.5 pairs of means
Oftentimes we have more than two means to compare, but rather than wanting to compare all means at once, we want to compare them in a pairwise fashion. For example, suppose we want to know if any of the aquifers contain different amounts of Na and Cl. We are not interested in testing for differences among all values of Na and Cl, rather, we want to test all pairs of Na and Cl values arising from each aquifer. That is to say, we want to compare the means in each facet of the plot below:
hawaii_aquifers %>%
  filter(analyte %in% c("Na", "Cl")) %>%
  ggplot(aes(x = analyte, y = abundance)) +
  geom_violin() +
  geom_point() +
  facet_grid(. ~ aquifer_code)
Fortunately, we can use an approach that is very similar to what we’ve learned in the earlier portions of this chapter, just with minor modifications. Let’s have a look! We start with the Shapiro and Levene tests, as usual (note that we group using two variables when using the Shapiro test so that each analyte within each aquifer is considered as an individual distribution):
hawaii_aquifers %>%
  filter(analyte %in% c("Na", "Cl")) %>%
  group_by(analyte, aquifer_code) %>%
  shapiro_test(abundance)
## # A tibble: 20 × 5
## aquifer_code analyte variable statistic p
## <chr> <chr> <chr> <dbl> <dbl>
## 1 aquifer_1 Cl abundance 0.900 1.59e- 1
## 2 aquifer_10 Cl abundance 0.486 1.09e- 5
## 3 aquifer_2 Cl abundance 0.869 2.24e- 1
## 4 aquifer_3 Cl abundance 0.75 0
## 5 aquifer_4 Cl abundance 0.903 7.49e- 2
## 6 aquifer_5 Cl abundance 0.767 4.22e- 2
## 7 aquifer_6 Cl abundance 0.741 2.15e- 3
## 8 aquifer_7 Cl abundance 0.893 2.12e- 1
## 9 aquifer_8 Cl abundance 0.878 3.17e- 1
## 10 aquifer_9 Cl abundance 0.414 7.58e-11
## 11 aquifer_1 Na abundance 0.886 1.06e- 1
## 12 aquifer_10 Na abundance 0.593 2.26e- 4
## 13 aquifer_2 Na abundance 0.884 2.88e- 1
## 14 aquifer_3 Na abundance 0.822 1.69e- 1
## 15 aquifer_4 Na abundance 0.933 2.41e- 1
## 16 aquifer_5 Na abundance 0.782 5.71e- 2
## 17 aquifer_6 Na abundance 0.764 3.80e- 3
## 18 aquifer_7 Na abundance 0.915 3.51e- 1
## 19 aquifer_8 Na abundance 0.855 2.53e- 1
## 20 aquifer_9 Na abundance 0.544 2.09e- 9
Looks like some of those distributions are significantly different from normal! Let’s run the Levene test anyway. Note that for this particular case of the Levene test, we are interested in testing whether each pair of distributions has similar variances. For that we need to feed the Levene test data that is grouped by aquifer_code (so that it tests each pair as a group), then we need to specify the y ~ x formula (which in this case is abundance ~ analyte):
hawaii_aquifers %>%
  filter(analyte %in% c("Na", "Cl")) %>%
  group_by(aquifer_code) %>%
  levene_test(abundance ~ analyte)
## # A tibble: 10 × 5
## aquifer_code df1 df2 statistic p
## <chr> <int> <int> <dbl> <dbl>
## 1 aquifer_1 1 22 10.5 0.00375
## 2 aquifer_10 1 12 0.0535 0.821
## 3 aquifer_2 1 10 0.0243 0.879
## 4 aquifer_3 1 4 0.320 0.602
## 5 aquifer_4 1 32 1.57 0.219
## 6 aquifer_5 1 8 0.474 0.511
## 7 aquifer_6 1 22 1.03 0.322
## 8 aquifer_7 1 16 1.54 0.232
## 9 aquifer_8 1 4 0.515 0.512
## 10 aquifer_9 1 70 1.07 0.304
It looks like the Na and Cl distributions in aquifer 1 have significantly different variances. So we definitely need to use non-parametric testing. If this were a simple case of two means we would use the wilcox_test, but we have many pairs, so we will use pairwise_wilcox_test (note that with this test there are options for various styles of controlling for multiple comparisons, see: ?pairwise_wilcox_test):
hawaii_aquifers %>%
  filter(analyte %in% c("Na", "Cl")) %>%
  group_by(aquifer_code) %>%
  pairwise_wilcox_test(abundance ~ analyte)
## # A tibble: 10 × 10
## aquifer_code .y. group1 group2 n1 n2 statistic p p.adj
## * <chr> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 aquifer_1 abundance Cl Na 12 12 99.5 0.119 1.19e-1
## 2 aquifer_10 abundance Cl Na 7 7 14 0.198 1.98e-1
## 3 aquifer_2 abundance Cl Na 6 6 36 0.005 5 e-3
## 4 aquifer_3 abundance Cl Na 3 3 3 0.658 6.58e-1
## 5 aquifer_4 abundance Cl Na 17 17 189 0.129 1.29e-1
## 6 aquifer_5 abundance Cl Na 5 5 17.5 0.329 3.29e-1
## 7 aquifer_6 abundance Cl Na 12 12 53 0.285 2.85e-1
## 8 aquifer_7 abundance Cl Na 9 9 42 0.929 9.29e-1
## 9 aquifer_8 abundance Cl Na 3 3 6 0.7 7 e-1
## 10 aquifer_9 abundance Cl Na 36 36 248. 0.00000554 5.54e-6
## # … with 1 more variable: p.adj.signif <chr>
Excellent! It looks like there is a statistically significant difference between the means of the abundances of Cl and Na in aquifer_2 and (surprisingly?) in aquifer_9 (perhaps due to the large number of observations?).
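To pull out just those significant comparisons programmatically, here is a small sketch (the same pipeline as above, with one extra filter on the adjusted p values):
hawaii_aquifers %>%
  filter(analyte %in% c("Na", "Cl")) %>%
  group_by(aquifer_code) %>%
  pairwise_wilcox_test(abundance ~ analyte) %>%
  filter(p.adj < 0.05) # keep only the aquifers with a significant Cl vs. Na difference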
What would we have done if our Shapiro and Levene tests had revealed no significant differences? Well, a pairwise_t_test of course!
hawaii_aquifers %>%
  filter(analyte %in% c("Na", "Cl")) %>%
  group_by(aquifer_code) %>%
  pairwise_t_test(abundance ~ analyte) -> test_output
test_output
## # A tibble: 10 × 10
## aquifer_code .y. group1 group2 n1 n2 p p.signif p.adj
## * <chr> <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl>
## 1 aquifer_1 abundance Cl Na 12 12 0.0469 * 0.0469
## 2 aquifer_10 abundance Cl Na 7 7 0.882 ns 0.882
## 3 aquifer_2 abundance Cl Na 6 6 0.0000375 **** 0.0000375
## 4 aquifer_3 abundance Cl Na 3 3 0.683 ns 0.683
## 5 aquifer_4 abundance Cl Na 17 17 0.103 ns 0.103
## 6 aquifer_5 abundance Cl Na 5 5 0.145 ns 0.145
## 7 aquifer_6 abundance Cl Na 12 12 0.566 ns 0.566
## 8 aquifer_7 abundance Cl Na 9 9 0.521 ns 0.521
## 9 aquifer_8 abundance Cl Na 3 3 0.428 ns 0.428
## 10 aquifer_9 abundance Cl Na 36 36 0.948 ns 0.948
## # … with 1 more variable: p.adj.signif <chr>
Excellent, now we see how to run parametric and non-parametric pairwise comparisons. How do we annotate plots with the output of these tests? Here is an example:
anno <- data.frame(
  xmin = test_output$group1,
  xmax = test_output$group2,
  y_position = c(150, 150, 150, 175, 80, 50, 300, 150, 50, 125),
  text = test_output$p.signif,
  text_size = 10,
  text_vert_offset = 10,
  text_horiz_offset = 1.5,
  tip_length_xmin = 5,
  tip_length_xmax = 5,
  aquifer_code = test_output$aquifer_code
)
hawaii_aquifers %>%
  filter(analyte %in% c("Na", "Cl")) %>%
  ggplot(aes(x = analyte, y = abundance)) +
  geom_violin(fill = "gold", color = "black") +
  geom_point(shape = 21, fill = "maroon", color = "black") +
  facet_grid(. ~ aquifer_code) +
  geomSignif(data = anno) +
  scale_x_discrete(name = "Analyte") +
  scale_y_continuous(name = "Abundance") +
  theme_bw() +
  theme(
    text = element_text(size = 16)
  )
10.6 further reading
For more on comparing multiple means in R: www.datanovia.com
For more on parametric versus non-parametric tests: Statistics by Jim
For more on interpreting p values: The p value wars (again) by Ulrich Dirnagl
10.7 exercises
Using the hawaii_aquifers
data set, please complete the following:
Choose one analyte and filter the data so only the rows for that analyte are shown.
Choose two of the aquifers. Are the mean abundances for your chosen analyte different in these two aquifers? Don’t forget to test your data for normality and homogeneity of variance before selecting a statistical test. Use a plot to illustrate whether the means are similar or different.
Choose a second analyte, different from the first one you chose. Considering all the aquifers in the dataset, do any of them have the same abundance of this analyte? Again, don’t forget about normality and homogeneity of variance tests. Use a plot to illustrate your answer.
Repeat #3 above, but switch the type of test used (i.e. use non-parametric if you used parametric for #3 and vice-versa). Compare the p values and p groups obtained by the two methods. Use a graphic to illustrate this. Why are they different?