comparing means
“Are these two things the same?”
Often, we want to know if our study subjects contain different amounts of certain analytes. For example, “Does this lake over here contain more potassium than that lake over there?” For this, we need statistical tests. Here, we will have a look at comparing mean values for analyte abundance in situations with two samples and in situations with more than two samples.
I find many of the concepts discussed in this chapter easier to think about with an example in mind. For that, suppose that you are an analytical chemist on Hawaii that is studying the chemistry of the island’s aquifers. you have the data set hawaii_aquifers
. You can see in the output below the structure of the data set - we have 990 measurements of a 9 different analytes in multiple wells that draw on a set of 10 aquifers.
hawaii_aquifers
## # A tibble: 954 × 6
## aquifer_code well_name longitude latitude analyte
## <chr> <chr> <dbl> <dbl> <chr>
## 1 aquifer_1 Alewa_Heights_Sp… NA NA SiO2
## 2 aquifer_1 Alewa_Heights_Sp… NA NA Cl
## 3 aquifer_1 Alewa_Heights_Sp… NA NA Mg
## 4 aquifer_1 Alewa_Heights_Sp… NA NA Na
## 5 aquifer_1 Alewa_Heights_Sp… NA NA K
## 6 aquifer_1 Alewa_Heights_Sp… NA NA SO4
## 7 aquifer_1 Alewa_Heights_Sp… NA NA HCO3
## 8 aquifer_1 Alewa_Heights_Sp… NA NA dissol…
## 9 aquifer_1 Alewa_Heights_Sp… NA NA Ca
## 10 aquifer_1 Beretania_High_S… NA NA SiO2
## # ℹ 944 more rows
## # ℹ 1 more variable: abundance <dbl>
unique(hawaii_aquifers$aquifer_code)
## [1] "aquifer_1" "aquifer_2" "aquifer_3" "aquifer_4"
## [5] "aquifer_5" "aquifer_6" "aquifer_7" "aquifer_8"
## [9] "aquifer_9" "aquifer_10"
Importantly, there are many wells that draw on each aquifer, as shown in the graph below.
hawaii_aquifers %>%
select(aquifer_code, well_name) %>%
group_by(aquifer_code) %>%
summarize(n_wells = length(unique(well_name))) -> aquifers_summarized
aquifers_summarized
## # A tibble: 10 × 2
## aquifer_code n_wells
## <chr> <int>
## 1 aquifer_1 12
## 2 aquifer_10 7
## 3 aquifer_2 5
## 4 aquifer_3 3
## 5 aquifer_4 16
## 6 aquifer_5 4
## 7 aquifer_6 12
## 8 aquifer_7 9
## 9 aquifer_8 3
## 10 aquifer_9 30
definitions
populations and independent measurements: When we are comparing means, we are comparing two sets of values. It is important to consider where these values came from in the first place. In particular, it is usually useful to think of these values as representatives of larger populations. In the example of our aquifer data set, we can think of the measurements from different wells drawing on the same aquifer as independent measurements of the “population” (i.e. the aquifer).
the null hypothesis: When we conduct a statistical test, we are testing the null hypothesis. The null (think “default”) hypothesis is that there is no difference bewteen the means (hence the name “null”). In the example of our aquifers, let’s say that we’re interested in whether two aquifers have different abundances of potassium - in this case the null hypothesis is that they do not differ, in other words, that they have the same amount of potassium.
the p value: The p value represents the probability of getting data as extreme as our results if the null hypothesis is true. In other words - the p value is the probability that we would observe the differences we did, if in fact there were no differences in the means at all. To continue with our example: suppose we measure potassium levels in 10% of the wells that access each aquifer and find that aquifer_1 has potassium levels of 14 +/- 2 and aquifer_2 has potassium levels of 12 +/- 1. Suppose that we then conduct a statistical test and get a p value of 0.04. This means that, assuming the aquifers have the same magneisum levels (i.e. assuming the null hypothesis is true), there is a 4% chance that we would get the measured values that we did. In other words, IF the aquifers have the same potassium abundance, it is pretty unlikely that we would have obtained the measurements that we did.
Please note that the the p value is not the probability of a detected difference being a false positive. The probability of a false positive requires additional information in order to be calculated. For further discussion please see the end of this chapter.
test selection
There are many different types of statistical tests. Below is a flow chart illustrating how it is recommended that statistical tests be used in this course. You can see that there are three regimes of tests: variance and normality tests (blue), parametric tests (green), and non-parametric tests (orange):
When we are comparing means, we need to first determine what kind of statistical tests we can use with our data. If (i) our data can be reasonably modelled by a normal distribution and (ii) the variances about the two means are similar, then we can use the more powerful “parametric” tests (i.e. tests that will be more likely to detect a difference in means, assuming one exists). If one of these criteria are not met, then we need to use less powerful “non-parametric” tests.
We can check our data for normality and similar variances using the Shapiro test and the Levene test. Let’s use the hawaii_aquifers
data as an example, and let’s consider only the element potassium:
K_data <- hawaii_aquifers %>%
filter(analyte == "K")
K_data
## # A tibble: 106 × 6
## aquifer_code well_name longitude latitude analyte
## <chr> <chr> <dbl> <dbl> <chr>
## 1 aquifer_1 Alewa_Heights_Sp… NA NA K
## 2 aquifer_1 Beretania_High_S… NA NA K
## 3 aquifer_1 Beretania_Low_Se… NA NA K
## 4 aquifer_1 Kuliouou_Well -158. 21.3 K
## 5 aquifer_1 Manoa_Well_II -158. 21.3 K
## 6 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 7 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 8 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 9 aquifer_1 Nuuanu_Aerator_W… -158. 21.4 K
## 10 aquifer_1 Palolo_Tunnel -158. 21.3 K
## # ℹ 96 more rows
## # ℹ 1 more variable: abundance <dbl>
To work with two means, let’s just look at aquifers 1 and 6:
K_data_1_6 <- K_data %>%
filter(aquifer_code %in% c("aquifer_1", "aquifer_6"))
K_data_1_6
## # A tibble: 24 × 6
## aquifer_code well_name longitude latitude analyte
## <chr> <chr> <dbl> <dbl> <chr>
## 1 aquifer_1 Alewa_Heights_Sp… NA NA K
## 2 aquifer_1 Beretania_High_S… NA NA K
## 3 aquifer_1 Beretania_Low_Se… NA NA K
## 4 aquifer_1 Kuliouou_Well -158. 21.3 K
## 5 aquifer_1 Manoa_Well_II -158. 21.3 K
## 6 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 7 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 8 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 9 aquifer_1 Nuuanu_Aerator_W… -158. 21.4 K
## 10 aquifer_1 Palolo_Tunnel -158. 21.3 K
## # ℹ 14 more rows
## # ℹ 1 more variable: abundance <dbl>
ggplot(K_data_1_6, aes(x = aquifer_code, y = abundance)) +
geom_boxplot() +
geom_point()
Are these data normally distributed? Do they have similar variance? Let’s get a first approximation by looking at a plot:
K_data_1_6 %>%
ggplot(aes(x = abundance)) +
geom_histogram(bins = 30) +
facet_wrap(~aquifer_code) +
geom_density(aes(y = ..density..*10), color = "blue")
Based on this graphic, it’s hard to say! Let’s use a statistical test to help. When we want to run the Shaprio test, we are looking to see if each group has normally distributed here (here group is “aquifer_code”, i.e. aquifer_1 and aquifer_6). This means we need to group_by(aquifer_code)
before we run the test:
K_data_1_6 %>%
group_by(aquifer_code) %>%
shapiroTest(abundance)
## # A tibble: 2 × 4
## aquifer_code variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 aquifer_1 abundance 0.885 0.102
## 2 aquifer_6 abundance 0.914 0.239
Both p-values are above 0.05! This means that the distributions are not significantly different from a normal distribution. What about the variances about the two means? Are they similar? For this we need a Levene test. With that test, we are not looking within each group, but rather across groups - this means we do NOT need to group_by(aquifer_code)
and should specify a y ~ x
formula instead:
K_data_1_6 %>%
leveneTest(abundance ~ aquifer_code)
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.289 0.596
The p-value from this test is 0.596! This means that their variances are not significantly different. If their shape is the same (normality) and their variances are the same (equal variances), then the only thing that can be different is their means. This allows us to set up our null hypothesis and calculate the probability of obtaining these different means if the two sets of observations come from an identical source.
two means
Now, since our data passed both test, this means we can use a normal t-test. A t-test is a parametric test. This means that it relies on modelling the data using a normal distribution in order to make comparisons. It is also a powerful test. This means that it is likely to detect a difference in means, assuming one is present. Let’s try it out:
K_data_1_6 %>%
tTest(abundance ~ aquifer_code)
## # A tibble: 1 × 8
## .y. group1 group2 n1 n2 statistic df p
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 abundance aquif… aquif… 12 12 -2.75 20.5 0.0121
A p-value of 0.012! This is below 0.05, meaning that there is a 95% chance that the two means are different. Suppose that our data had not passed the Shapiro and/or Levene tests. We would then need to use a Wilcox test. The Wilcox test is a non-parametric test, which means that it does not use a normal distribution to model the data in order to make comparisons. This means that is a less powerful test than the t-test, which means that it is less likely to detect a difference in the means, assuming there is one. For fun, let’s try that one out and compare the p-values from the two methods:
K_data_1_6 %>%
wilcoxTest(abundance ~ aquifer_code)
## # A tibble: 1 × 7
## .y. group1 group2 n1 n2 statistic p
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 abundance aquifer_1 aquifer_6 12 12 33.5 0.0282
A p-value of 0.028! This is higher than the value given by the t-test (0.012). That is because the Wilcox test is a less powerful test: it is less likely to detect differences in means, assuming they exist.
more than two means
In the previous section we compared two means. What if we want to compare means from more than two study subjects? The first step is again to determine which tests to use. Let’s consider our hawaii aquifer data again, though this time let’s use all the aquifers, not just two:
K_data <- hawaii_aquifers %>%
filter(analyte == "K")
K_data
## # A tibble: 106 × 6
## aquifer_code well_name longitude latitude analyte
## <chr> <chr> <dbl> <dbl> <chr>
## 1 aquifer_1 Alewa_Heights_Sp… NA NA K
## 2 aquifer_1 Beretania_High_S… NA NA K
## 3 aquifer_1 Beretania_Low_Se… NA NA K
## 4 aquifer_1 Kuliouou_Well -158. 21.3 K
## 5 aquifer_1 Manoa_Well_II -158. 21.3 K
## 6 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 7 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 8 aquifer_1 Moanalua_Wells_P… -158. 21.4 K
## 9 aquifer_1 Nuuanu_Aerator_W… -158. 21.4 K
## 10 aquifer_1 Palolo_Tunnel -158. 21.3 K
## # ℹ 96 more rows
## # ℹ 1 more variable: abundance <dbl>
ggplot(data = K_data, aes(y = aquifer_code, x = abundance)) +
geom_boxplot() +
geom_point(color = "maroon", alpha = 0.6, size = 3)
Let’s check visually to see if each group is normally distributed and to see if they have roughly equal variance:
K_data %>%
group_by(aquifer_code) %>%
ggplot(aes(x = abundance)) +
geom_histogram(bins = 30) +
facet_wrap(~aquifer_code) +
geom_density(aes(y = ..density..*10), colour = "blue")
Again, it is somewhat hard to tell visually if these data are normally distributed. It seems pretty likely that they have different variances about the means, but let’s check using the Shapiro and Levene tests. Don’t forget: with the Shaprio test, we are looking within each group and so need to group_by()
, with the Levene test, we are looking across groups, and so need to provide a y~x
formula:
K_data %>%
group_by(aquifer_code) %>%
shapiroTest(abundance)
## # A tibble: 10 × 4
## aquifer_code variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 aquifer_1 abundance 0.885 0.102
## 2 aquifer_10 abundance 0.864 0.163
## 3 aquifer_2 abundance 0.913 0.459
## 4 aquifer_3 abundance 0.893 0.363
## 5 aquifer_4 abundance 0.948 0.421
## 6 aquifer_5 abundance 0.993 0.972
## 7 aquifer_6 abundance 0.914 0.239
## 8 aquifer_7 abundance 0.915 0.355
## 9 aquifer_8 abundance 0.842 0.220
## 10 aquifer_9 abundance 0.790 0.0000214
K_data %>%
leveneTest(abundance ~ aquifer_code)
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 9 96 2.95 0.00387
Based on these tests, it looks like the data for aquifer 9 is significantly different from a normal distribution (Shaprio test p < 0.05), and the variances are certainly different from one another (Levene test p > 0.05).
Let’s assume for a second that our data passed these tests. This means that we could reasonably model our data with normal distributions and use a parametric test to compare means. This means that we can use an ANOVA to test for differences in means.
ANOVA, Tukey tests
We will use the anovaTest
function from the package rstatix
. It will tell us if any of the means in the data are statistically different from one another. However, if there are differences between the means, it will not tell us which of them are different.
K_data %>%
anovaTest(abundance ~ aquifer_code)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 aquifer_code 9 96 9.486 3.28e-10 * 0.471
A pretty small p-value! There are definitely some significant differences among this group. But, WHICH are different from one another though? For this, we need to run Tukey’s Honest Significant Difference test (implemented using tukey_hsd
). This will essentially run t-test on all the pairs of study subjects that we can derive from our data set (in this example, aquifer_1 vs. aquifer_2, aquifer_1 vs. aquifer_3, etc.). After that, it will correct the p-values according to the number of comparisons that it performed. This controls the rate of type I error that we can expect from the test. These corrected values are provided to us in the p.adj
column.
K_data %>%
tukey_hsd(abundance ~ aquifer_code)
## # A tibble: 45 × 9
## term group1 group2 null.value estimate conf.low
## * <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 aquifer_code aquifer… aquif… 0 0.00357 -2.04
## 2 aquifer_code aquifer… aquif… 0 1.44 -0.708
## 3 aquifer_code aquifer… aquif… 0 0.375 -2.40
## 4 aquifer_code aquifer… aquif… 0 -1.15 -2.78
## 5 aquifer_code aquifer… aquif… 0 -0.875 -3.36
## 6 aquifer_code aquifer… aquif… 0 1.98 0.228
## 7 aquifer_code aquifer… aquif… 0 2.70 0.801
## 8 aquifer_code aquifer… aquif… 0 -0.125 -2.90
## 9 aquifer_code aquifer… aquif… 0 -0.349 -1.80
## 10 aquifer_code aquifer… aquif… 0 1.44 -0.954
## # ℹ 35 more rows
## # ℹ 3 more variables: conf.high <dbl>, p.adj <dbl>,
## # p.adj.signif <chr>
# K_data %>%
# tukeyHSD(abundance ~ aquifer_code)
Using the output from our tukey test, we can determine which means are similar. We can do this using the pGroups function:
groups_based_on_tukey <- K_data %>%
tukey_hsd(abundance ~ aquifer_code) %>%
pGroups()
groups_based_on_tukey
## treatment group spaced_group
## aquifer_1 aquifer_1 ab ab
## aquifer_10 aquifer_10 abc abc
## aquifer_2 aquifer_2 acd a cd
## aquifer_3 aquifer_3 abcd abcd
## aquifer_4 aquifer_4 b b
## aquifer_5 aquifer_5 ab ab
## aquifer_6 aquifer_6 cd cd
## aquifer_7 aquifer_7 d d
## aquifer_8 aquifer_8 abcd abcd
## aquifer_9 aquifer_9 ab ab
We can use the output from pGroups
to annotate our plot:
ggplot(data = K_data, aes(y = aquifer_code, x = abundance)) +
geom_boxplot() +
geom_point(color = "maroon", alpha = 0.6, size = 3) +
geom_text(data = groups_based_on_tukey, aes(y = treatment, x = 9, label = group))
Excellent! This plot shows us, using the letters on the same line with each aquifer, which means are the same and which are different. If a letter is shared among the labels in line with two aquifers, it means that their means do not differ significantly. For example, aquifer 2 and aquifer 6 both have “b” in their labels, so their means are not different - and are the same as those of aquifers 3 and 10.
Kruskal, Dunn tests
The above ANOVA example is great, but remember - our data did not pass the Shapiro or Levene tests. This means not all our data can be modelled by a normal distribution and taht we need to use a non-parametric test. The non-parametric alternative to the ANOVA is called the Kruskal test. Like the Wilcox test, it is less powerful that its parametric relative, meaning that it is less likely to detected differences, should they exist. However, since our data do not pass the Shapiro/Levene tests, we have to resort to the Kruskal test. Let’s try it out:
K_data %>%
kruskalTest(abundance ~ aquifer_code)
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 abundance 106 55.9 9 0.00000000807 Kruskal-Wal…
A pretty small p-value! This is higher than the p-value from running ANOVA on the same data (remember, the Kruskal test is less powerful). Never the less, the value is still well below 0.05, meaning that some of the means are different. So, how do we determine WHICH are different from one another? When we ran ANOVA the follow-up test (the post hoc test) was Tukey’s HSD. After the Kruskal test, the post hoc test we use is the Dunn test. Let’s try:
K_data %>%
dunnTest(abundance ~ aquifer_code)
## # A tibble: 45 × 9
## .y. group1 group2 n1 n2 statistic p p.adj
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 abunda… aquif… aquif… 12 7 -0.194 0.846 1
## 2 abunda… aquif… aquif… 12 6 2.24 0.0254 0.736
## 3 abunda… aquif… aquif… 12 3 0.866 0.387 1
## 4 abunda… aquif… aquif… 12 17 -2.65 0.00806 0.266
## 5 abunda… aquif… aquif… 12 4 -1.12 0.263 1
## 6 abunda… aquif… aquif… 12 12 2.51 0.0121 0.388
## 7 abunda… aquif… aquif… 12 9 3.01 0.00257 0.100
## 8 abunda… aquif… aquif… 12 3 0.143 0.886 1
## 9 abunda… aquif… aquif… 12 33 -0.470 0.639 1
## 10 abunda… aquif… aquif… 7 6 2.17 0.0296 0.830
## # ℹ 35 more rows
## # ℹ 1 more variable: p.adj.signif <chr>
This gives us adjusted p-values for all pairwise comparisons. Once again, we can use pGroups()
to give us a compact letter display for each group, which can then be used to annotate the plot:
groups_based_on_dunn <- K_data %>%
dunnTest(abundance ~ aquifer_code) %>%
pGroups()
groups_based_on_dunn
## treatment group spaced_group
## aquifer_1 aquifer_1 abcd abcd
## aquifer_10 aquifer_10 abcd abcd
## aquifer_2 aquifer_2 abc abc
## aquifer_3 aquifer_3 abcd abcd
## aquifer_4 aquifer_4 d d
## aquifer_5 aquifer_5 acd a cd
## aquifer_6 aquifer_6 ab ab
## aquifer_7 aquifer_7 b b
## aquifer_8 aquifer_8 abcd abcd
## aquifer_9 aquifer_9 cd cd
ggplot(data = K_data, aes(y = aquifer_code, x = abundance)) +
geom_boxplot() +
geom_point(color = "black", alpha = 0.4, size = 2) +
scale_x_continuous(name = "Potassium abundance", breaks = seq(0,10,1)) +
scale_y_discrete(name = "Aquifer code") +
geom_text(data = groups_based_on_dunn, aes(y = treatment, x = 9, label = group)) +
theme_bw()
Note that these groupings are different from those generated by ANOVA/Tukey.
pairs of means
Oftentimes we have more than two means to compare, but rather than wanting to compare all means at once, we want to compare them in a pairwise fashion. For example, suppose we want to know if any of the aquifers contain different amounts of Na and Cl. We are not interested in testing for differences among all values of Na and Cl, rather, we want to test all pairs of Na and Cl values arising from each aquifer. That is to say, we want to compare the means in each facet of the plot below:
hawaii_aquifers %>%
filter(analyte %in% c("Na", "Cl")) %>%
ggplot(aes(x = analyte, y = abundance)) + geom_violin() + geom_point() + facet_grid(.~aquifer_code)
Fortunately, we can use an approach that is very similar to the what we’ve learned in the earlier portions of this chapter, just with minor modifications. Let’s have a look! We start with the Shapiro and Levene tests, as usual (note that we group using two variables when using the Shapiro test so that each analyte within each aquifer is considered as an individual distribution):
hawaii_aquifers %>%
filter(analyte %in% c("Na", "Cl")) %>%
group_by(analyte, aquifer_code) %>%
shapiroTest(abundance)
## # A tibble: 20 × 5
## aquifer_code analyte variable statistic p
## <chr> <chr> <chr> <dbl> <dbl>
## 1 aquifer_1 Cl abundance 0.900 1.59e- 1
## 2 aquifer_10 Cl abundance 0.486 1.09e- 5
## 3 aquifer_2 Cl abundance 0.869 2.24e- 1
## 4 aquifer_3 Cl abundance 0.75 0
## 5 aquifer_4 Cl abundance 0.903 7.49e- 2
## 6 aquifer_5 Cl abundance 0.849 2.24e- 1
## 7 aquifer_6 Cl abundance 0.741 2.15e- 3
## 8 aquifer_7 Cl abundance 0.893 2.12e- 1
## 9 aquifer_8 Cl abundance 0.878 3.17e- 1
## 10 aquifer_9 Cl abundance 0.420 2.68e-10
## 11 aquifer_1 Na abundance 0.886 1.06e- 1
## 12 aquifer_10 Na abundance 0.593 2.26e- 4
## 13 aquifer_2 Na abundance 0.884 2.88e- 1
## 14 aquifer_3 Na abundance 0.822 1.69e- 1
## 15 aquifer_4 Na abundance 0.933 2.41e- 1
## 16 aquifer_5 Na abundance 0.827 1.61e- 1
## 17 aquifer_6 Na abundance 0.764 3.80e- 3
## 18 aquifer_7 Na abundance 0.915 3.51e- 1
## 19 aquifer_8 Na abundance 0.855 2.53e- 1
## 20 aquifer_9 Na abundance 0.531 3.97e- 9
Looks like some of those distributions are significantly different from normal! Let’s run the levene test anyway. Note that for this particular case of the Levene test, we are interested in testing whether each pair of distributions has similar variances. For that we need to feed the Levene test data that is grouped by aquifer_code (so that it tests each pair as a group), then we need to specify the y ~ x formula (which in this case is abundance ~ analyte):
hawaii_aquifers %>%
filter(analyte %in% c("Na", "Cl")) %>%
group_by(aquifer_code) %>%
leveneTest(abundance ~ analyte)
## # A tibble: 10 × 5
## aquifer_code df1 df2 statistic p
## <chr> <int> <int> <dbl> <dbl>
## 1 aquifer_1 1 22 10.5 0.00375
## 2 aquifer_10 1 12 0.0535 0.821
## 3 aquifer_2 1 10 0.0243 0.879
## 4 aquifer_3 1 4 0.320 0.602
## 5 aquifer_4 1 32 1.57 0.219
## 6 aquifer_5 1 6 2 0.207
## 7 aquifer_6 1 22 1.03 0.322
## 8 aquifer_7 1 16 1.54 0.232
## 9 aquifer_8 1 4 0.515 0.512
## 10 aquifer_9 1 64 1.10 0.298
It looks like the variances of the pair in aquifer 1 have significantly different variances. So - we for sure need to be using non-parametric testing. If this were a simple case of two means we would use the wilcox_test
, but we have may pairs, so we will use pairwise_wilcox_test
(note that with this test there are options for various styles of controlling for multiple comparisons, see: ?pairwise_wilcox_test
):
hawaii_aquifers %>%
filter(analyte %in% c("Na", "Cl")) %>%
group_by(aquifer_code) %>%
pairwiseWilcoxTest(abundance~analyte)
## # A tibble: 10 × 10
## aquifer_code .y. group1 group2 n1 n2 statistic
## * <chr> <chr> <chr> <chr> <int> <int> <dbl>
## 1 aquifer_1 abundan… Cl Na 12 12 99.5
## 2 aquifer_10 abundan… Cl Na 7 7 14
## 3 aquifer_2 abundan… Cl Na 6 6 36
## 4 aquifer_3 abundan… Cl Na 3 3 3
## 5 aquifer_4 abundan… Cl Na 17 17 189
## 6 aquifer_5 abundan… Cl Na 4 4 13
## 7 aquifer_6 abundan… Cl Na 12 12 53
## 8 aquifer_7 abundan… Cl Na 9 9 42
## 9 aquifer_8 abundan… Cl Na 3 3 6
## 10 aquifer_9 abundan… Cl Na 33 33 195
## # ℹ 3 more variables: p <dbl>, p.adj <dbl>,
## # p.adj.signif <chr>
Excellent! It looks like there is a statistically significant difference between the means of the abundances of Cl and Na in aquifer_2 and (surprisingly?) in aquifer_9 (perhaps due to the large number of observations?).
What would we have done if our Shaprio and Levene tests had revealed no significant differences? Well, a pairwise_TTest
of course!
hawaii_aquifers %>%
filter(analyte %in% c("Na", "Cl")) %>%
group_by(aquifer_code) %>%
pairwiseTTest(abundance~analyte) -> test_output
test_output
## # A tibble: 10 × 10
## aquifer_code .y. group1 group2 n1 n2 p
## * <chr> <chr> <chr> <chr> <int> <int> <dbl>
## 1 aquifer_1 abundance Cl Na 12 12 4.69e-2
## 2 aquifer_10 abundance Cl Na 7 7 8.82e-1
## 3 aquifer_2 abundance Cl Na 6 6 3.75e-5
## 4 aquifer_3 abundance Cl Na 3 3 6.83e-1
## 5 aquifer_4 abundance Cl Na 17 17 1.03e-1
## 6 aquifer_5 abundance Cl Na 4 4 9.75e-2
## 7 aquifer_6 abundance Cl Na 12 12 5.66e-1
## 8 aquifer_7 abundance Cl Na 9 9 5.21e-1
## 9 aquifer_8 abundance Cl Na 3 3 4.28e-1
## 10 aquifer_9 abundance Cl Na 33 33 8.96e-1
## # ℹ 3 more variables: p.signif <chr>, p.adj <dbl>,
## # p.adj.signif <chr>
Excellent, now we see how to run parametric and non-parametric pairwise comparisons. How do we annotate plots with the output of these tests? Here is an example:
anno <- data.frame(
xmin = test_output$group1,
xmax = test_output$group2,
y_position = c(150, 150, 150, 175, 80, 50, 300, 150, 50, 125),
text = test_output$p.signif,
text_size = 10,
text_vert_offset = 10,
text_horiz_offset = 1.5,
tip_length_xmin = 5,
tip_length_xmax = 5,
aquifer_code = test_output$aquifer_code,
hjust = 0.5,
vjust = 0.5
)
hawaii_aquifers %>%
filter(analyte %in% c("Na", "Cl")) %>%
ggplot(aes(x = analyte, y = abundance)) +
geom_violin(fill = "gold", color = "black") +
geom_point(shape = 21, fill = "maroon", color = "black") +
facet_grid(.~aquifer_code) +
geomSignif(data = anno, orientation = "horizontal") +
scale_x_discrete(name = "Analyte") +
scale_y_continuous(name = "Abundance") +
theme_bw() +
theme(
text = element_text(size = 16)
)
further reading
For more on comparing multiple means in R: www.datanovia.com
For more on parametric versus non-parametric tests: Statistics by Jim
For more on interpreting p values: [The p value wars (again) by Ulrich Dirnagl]
Ten common statistical mistakes and their solutions: Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript
How to think about very small p-values: Reporting p Values, by Wolfgang Huber
exercises
Using the hawaii_aquifers
data set or the tequila_chemistry
data set, please complete the following:
Choose one analyte and filter the data so only the rows for that analyte are shown.
Choose two of the aquifers (or bottles of tequila). Are the mean abundances for your chosen analyte different in these two aquifers? Don’t forget to test your data for normality and homogeneity of variance before selecting a statistical test. Use a plot to illustrate whether the means are similar or different. If you wish, you can use the figure caption to help illustrate whether means are significantly different or not. Either way, be sure to describe the tests you ran in your figure caption, and how you interpreted the results of the tests. For example, you should include a statment like “Groups sharing a letter label (a, b) are not significantly different (ANOVA/Tukey HSD tests, p > 0.05). Groups with different letters indicate a significant difference (p < 0.05).”, or, if you are using asterisks then perhaps something like “Significant differences between groups are indicated by asterisks: * = p < 0.05, ** = p < 0.01 (Kruskal / Dunn tests).”
Choose a second analyte, different from the first one you chose. Considering all the aquifers (or all bottles of tequila) in the dataset, do any of them have the same abundance of this analyte? Again, don’t forget about normality and homogeneity of variance tests. Use a plot to illustrate your answer. Be sure to describe the tests you ran in your figure caption, and how you interpreted the results of the tests.
Repeat #3 above, but switch the type of test used (i.e. use non-parametric if you used parametric for #3 and vice-versa). Compare the p values and p groups obtained by the two methods. Use a graphic to illustrate this. Why are they different?