The data are rolling in!

It is very exciting to see the first data sets from G-TREE sites that were established in 2013! With the arrival of data comes the next step: analyses. 

Here is some information on how we will be analyzing G-TREE data, followed by some tips and recommendations for individual site analyses:

Site-level versus global analyses:

Analyses of G-TREE data will occur at (at least) two spatial scales: site and global. Each G-TREE group will analyze and interpret the data for their area. A small group of G-TREE'ers will analyze the global dataset, and will rely on the entire G-TREE group to help with interpretation. An example: G-TREE Davos (Swiss Alps) had a high incidence of seedling winter-kill. Perhaps in the global analyses, we will find those data to be an outlier compared to other sites' over-winter survival. We will then rely on the Davos group for more detailed information. Perhaps during their site-level analyses, they discovered a late-spring frost event, or an unusually cold autumn before snowfall, which would help us to interpret their data point.

Tips for site-level analyses:

1. Explore your data. We recommend the following paper as a starting point:

Zuur, A. F., Ieno, E. N. and Elphick, C. S. (2010), A protocol for data exploration to avoid common statistical problems. Methods in Ecology and Evolution, 1: 3–14. doi: 10.1111/j.2041-210X.2009.00001.x
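To make a few of those exploration steps concrete, here is a minimal sketch in base R. It runs on simulated numbers, not G-TREE data, and the column names (site, cage, treatment, plot, seedling.count) and treatment labels are assumptions for illustration only; swap in whatever your own data sheet uses.

```r
# Simulated stand-in for a site data set; all names and values are made up.
set.seed(1)
mydata <- expand.grid(
  site      = c("forest", "transition", "tundra"),
  cage      = c("absent", "present"),
  treatment = c("control", "seeded", "scarified", "seeded.scarified"),
  plot      = 1:5
)
mydata$seedling.count <- rpois(nrow(mydata), lambda = 2)

# Outliers: numeric summary of the response
# (dotchart(mydata$seedling.count) would draw a Cleveland dotplot instead)
summary(mydata$seedling.count)

# Zeros: share of plots with no seedlings at all
prop.zero <- mean(mydata$seedling.count == 0)

# Balance: number of replicate plots in each treatment combination
design <- xtabs(~ site + cage + treatment, data = mydata)

# Mean and variance per group; for counts, a variance far above the mean
# is an early hint of overdispersion
group.stats <- aggregate(seedling.count ~ site + cage, data = mydata,
                         FUN = function(x) c(mean = mean(x), var = var(x)))
```

This covers only part of the protocol (outliers, zeros, balance); the paper also walks through collinearity, interactions and independence.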

2. Determine your response and explanatory variables (okay, perhaps do this first!). Do you know (a) exactly how many seeds germinated and didn't germinate, or (b) only the count of germinated seedlings (but no count of ungerminated seeds)?

(a) If you know exactly how many seeds you sowed and the exact number of seedlings that germinated (you definitely know that), then you can analyze the data as proportion data using a binomial distribution. What stats program are you using? If you are using R, you can use the “glmer” function in the “lme4” package and specify the family as "binomial". For proportion data with this function, the response variable in the model is entered as the paired counts of germinated and ungerminated seeds, combined with cbind(), like this:

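# glmer() needs at least one random-effect term; (1|plot) is a placeholder for your own grouping factor
model1<-glmer(cbind(germ.positive, germ.negative)~treatment1+treatment2+etc+(1|plot), family=binomial, data=mydata)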
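If your data sheet holds seeds sown and seedlings germinated rather than the two columns above, the ungerminated count is just the difference. The sketch below shows that step on simulated data and fits the model with base R's glm() as a fixed-effects stand-in, so it runs without extra packages. The sown column, the cage factor and the germination probabilities are all assumptions for illustration. With glmer() the same cbind() response is used, with a random-effect term such as (1|plot) added to the formula.

```r
# Simulated data: 20 plots, 50 seeds sown in each (made-up values)
set.seed(2)
mydata <- data.frame(
  cage = factor(rep(c("absent", "present"), each = 10)),
  sown = 50
)
# Simulate germination with a higher probability inside cages
mydata$germ.positive <- rbinom(20, size = mydata$sown,
                               prob = ifelse(mydata$cage == "present", 0.3, 0.1))
# Ungerminated seeds = seeds sown minus seeds germinated
mydata$germ.negative <- mydata$sown - mydata$germ.positive

# Binomial model on the paired counts (fixed effects only)
model1 <- glm(cbind(germ.positive, germ.negative) ~ cage,
              family = binomial, data = mydata)

# Coefficients are on the logit scale; plogis() converts them to probabilities
p.uncaged <- plogis(coef(model1)[["(Intercept)"]])
p.caged   <- plogis(sum(coef(model1)))
```

Because cage is the only predictor here, the two back-transformed probabilities should match the pooled germination proportions in each group, which is a quick sanity check on the model.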

(b) If you don’t know exactly how many seeds you sowed, but just have the number of seedlings germinated (this is most likely the case), then you can analyze the data as count data using a Poisson distribution. In R, again you would use the “glmer” function but this time specify “poisson” as the family. That would be:

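# glmer() needs at least one random-effect term; (1|plot) is a placeholder for your own grouping factor
model2<-glmer(seedling.count~treatment1+treatment2+etc+(1|plot), family=poisson, data=mydata)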
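Count data are often more variable than a Poisson model assumes, so it is worth checking for overdispersion after fitting. Here is a minimal sketch of one common check, the Pearson chi-square divided by the residual degrees of freedom. It uses simulated counts and base R's glm() as a stand-in so it runs without extra packages; the cage factor and all values are assumptions for illustration.

```r
# Simulated counts drawn from a negative binomial, so they are deliberately
# more variable than a Poisson model expects (made-up values)
set.seed(3)
mydata <- data.frame(cage = factor(rep(c("absent", "present"), each = 30)))
mydata$seedling.count <- rnbinom(60, mu = ifelse(mydata$cage == "present", 6, 3),
                                 size = 1)

# Poisson model (fixed effects only)
model2 <- glm(seedling.count ~ cage, family = poisson, data = mydata)

# Dispersion statistic: sum of squared Pearson residuals / residual df.
# Values close to 1 are consistent with Poisson; values well above 1
# suggest overdispersion.
dispersion <- sum(residuals(model2, type = "pearson")^2) / df.residual(model2)
dispersion
```

If the statistic is well above 1, the chapters on count data in Zuur et al. 2009 discuss alternatives such as quasi-Poisson and negative binomial models.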

Explanatory variables included will depend on the level of protocol you did, but may include seed-substrate treatment, site (forest, transition, arctic/alpine tundra), cage (present/absent), etc.

Check out Zuur et al. 2009. Mixed Effects Models and Extensions in Ecology with R. It is very useful even if you aren’t using R or mixed effects models: there are very good chapters on analyzing proportion and count data, and on checking model assumptions and fit.

3. Dig a little deeper!  

While you explore your data, I would also recommend looking at the various treatments on their own. E.g., are more seedlings found in caged than in uncaged plots at a site, regardless of seeding/scarifying treatment? You could look at that by comparing caged to uncaged plots at the transition site, for example, without including the seed/substrate treatments in the model. What site-specific conditions may be affecting your results? Do you need to look at climate data for your site to help explain what you found?
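The caged versus uncaged comparison at a single site could be sketched like this. Again, the data are simulated and the column names and treatment labels are assumptions, and glm() is used as a simple stand-in so the example needs only base R.

```r
# Simulated stand-in data with more seedlings in caged plots (made-up values)
set.seed(4)
mydata <- expand.grid(
  site      = c("forest", "transition", "tundra"),
  cage      = c("absent", "present"),
  treatment = c("control", "seeded", "scarified", "seeded.scarified"),
  plot      = 1:5
)
mydata$seedling.count <- rpois(nrow(mydata),
                               lambda = ifelse(mydata$cage == "present", 4, 2))

# Keep only the transition site
transition <- subset(mydata, site == "transition")

# Raw mean counts per cage level, pooled over seed/substrate treatments
cage.means <- tapply(transition$seedling.count, transition$cage, mean)

# Cage-only model: seed/substrate treatments are deliberately left out
model3 <- glm(seedling.count ~ cage, family = poisson, data = transition)

# exp() of the cage coefficient is the ratio of mean counts, caged / uncaged
rate.ratio <- exp(coef(model3)[["cagepresent"]])
```

Leaving the seed/substrate treatments out pools their effects into the residual variation, so treat this as an exploratory look rather than a replacement for the full model.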

This is a good forum to discuss different statistical approaches, or to ask questions/address problems, so please feel free to comment below and let's start analyzing those data!

Finally, have fun. This is exciting!!