We use our findings about the sample to draw inferences about the environment that we are truly interested in. *Correspondence: Amy D. Willis, adwillis@uw.edu. This manuscript has been released as a preprint via bioRxiv (Willis, 2017). The unique property of microbiome experiments and alpha diversity analysis is that samples do not faithfully represent the entire microbial community under study. Suppose I conduct an experiment in which I take a sample from Environment A and count the number of different microbial taxa present in my sample. In this article, I discuss why unequal sample sizes appear to cause special problems in the analysis of alpha diversity. This article is based on course notes presented by the author at the Marine Biological Laboratory at the STAMPS course in 2013, 2014, 2015, 2016, 2017, and 2018. Bias in Estimating and Comparing Alpha Diversity. Department of Biostatistics, University of Washington, Seattle, WA, United States. Because technical replicates in microbiome experiments yield different numbers of reads, different community compositions, and different levels of alpha diversity, we have measurement error in microbial experiments. In the case where the environments have equal richness (Figures 1E-H), this approach correctly detects equal richness, even when the abundance structures differ. Implicitly, this model acknowledges that we can assess the flux with high precision; that is, the margin for error for determining flux is negligible. The first practice is using biased estimates of alpha diversity indices. Figure 1.
Furthermore, not all information collected from the samples was used in making the comparison. While the focus of the examples is microbiome data analysis, the issues and discussion are equally applicable to macroecological data analysis. I then take a sample from Environment B, count the number of different taxa in that sample, and compare it to the number of taxa in Environment A. I am likely to observe higher numbers of different taxa in the sample with more microbial reads. Here I advocate for a third strategy: adjust the sample richness of each ecosystem by adding to it an estimate of the number of unobserved species, estimate the variance in the total richness estimate, and compare the diversities relative to these errors (Figure 1D). I introduce a statistical perspective on the estimation of alpha diversity, and argue that a common view of diversity indices is causing fundamental issues in comparing samples. Received: 19 August 2019; Accepted: 07 October 2019; Published: 23 October 2019. However, since estimates for alpha diversity metrics are heavily biased when taxa are unobserved, comparing alpha diversity using either raw or rarefied data should not be undertaken. I argue that latent variable models can address issues with variance, but bias corrections need to be utilized as well. Attempting to address this problem using rarefaction actually induces more bias. The second practice is treating alpha diversity estimates as precisely observed quantities that do not have measurement error. In the flux experiment, this would involve measuring the flux of the same soil sites again using the same experimental conditions.
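The observation above, that the sample with more reads tends to show more taxa, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the community of 200 taxa, its abundance curve, and the read depths are all hypothetical, and the helper names `simulate_reads` and `observed_richness` are introduced here for illustration only.

```python
import random


def simulate_reads(relative_abundances, n_reads, rng):
    """Draw n_reads taxon labels with probability proportional to abundance."""
    taxa = range(len(relative_abundances))
    return rng.choices(taxa, weights=relative_abundances, k=n_reads)


def observed_richness(reads):
    """Number of distinct taxa seen in a collection of reads."""
    return len(set(reads))


# Hypothetical community (an assumption, not data from the article):
# 200 taxa with a long tail of rare taxa.
rng = random.Random(0)
true_richness = 200
abundances = [1.0 / (rank + 1) for rank in range(true_richness)]

# Two library sizes drawn from the SAME environment.
deep_sample = simulate_reads(abundances, 5000, rng)
shallow_sample = deep_sample[:500]

print("shallow:", observed_richness(shallow_sample))
print("deep:   ", observed_richness(deep_sample))
print("truth:  ", true_richness)
```

Both samples come from one environment, so any gap in observed richness here reflects library size alone, and neither count can exceed the true richness.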
While alpha diversity estimation for microbiomes is an active area of research in statistics (Arbel et al., 2016; Zhang and Grabchak, 2016; Willis and Martin, 2018), there remain many features of microbial ecosystems (such as crosstalk between samples and spatial organization of microbes) that are not yet incorporated into statistical methodology for alpha diversity estimation. Imagine that we had complete knowledge of every microbe in existence, including identity, abundance and location. In order to draw meaningful conclusions regarding comparisons of microbial communities, it is necessary to use measurement error models to adjust for the uncertainty in the estimation of alpha diversity. Understanding the drivers of diversity is a fundamental question in ecology. The set-up where an estimate of a quantity converges to the correct value as more samples are obtained is also well understood in statistics. Similarly, when comparing the response of different treatment groups in clinical trials, the number of subjects in each treatment group is accounted for in a comparison of the overall treatment effect. We take samples from environments, and investigate the microbial community present in the sample. First proposed by Sanders (1968), rarefaction involves selecting a specified number of reads that is equal to or less than the number of reads in the smallest sample, and then randomly discarding reads from larger samples until the number of remaining reads is equal to this threshold (see Hurlbert, 1971 for a deterministic version). Copyright © 2019 Willis.
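The rarefaction procedure described above can be sketched in a few lines. This is a minimal illustration, assuming each sample is stored as a mapping from taxon name to read count; the function name `rarefy` and the two toy samples are hypothetical and are not taken from the article or from any particular package.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a taxon count table."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the table into one entry per read, then keep a random subset.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


rng = random.Random(1)
# Hypothetical samples with very different library sizes.
sample_a = {"taxon1": 40, "taxon2": 9, "taxon3": 1}                   # 50 reads
sample_b = {"taxon1": 300, "taxon2": 150, "taxon3": 45, "taxon4": 5}  # 500 reads

# Rarefy to the size of the smallest library.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_b = rarefy(sample_b, depth, rng)
discarded = sum(sample_b.values()) - sum(rarefied_b.values())

print("reads kept from B:", sum(rarefied_b.values()))
print("reads discarded from B:", discarded)
print("taxa still observed in B:", len(rarefied_b))
```

In this toy case 450 of the 500 reads in the larger sample are thrown away, which makes concrete the concern that rarefied comparisons do not use all of the information collected.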
The strategy outlined here for modeling richness after adjusting for missing species adjusts for both bias and variance, thus accounting for library size differences and incomplete microbial surveys. The first method, Figure 1B, is to use the estimates cA1, cA2, cB1, and cB2, and perform modeling and hypothesis testing (such as ANOVA) as if both the bias and variance of these estimates were zero (see, for example, Makipaa et al., 2017). In meta-analyses, larger studies need to be given more weight in determining the overall effect size, and this is incorporated into a meta-analysis via the smaller standard errors on the effect size estimates. The editor and reviewer's affiliations are the latest provided on their Loop research profiles and may not reflect their situation at the time of review. However, this is not generally true, because environments can be identical with respect to one alpha diversity metric, but the different abundance structures will induce different biases when rarefied. I describe the state of the statistical literature for addressing these problems, focusing on the analysis of microbial diversity.
The samples are not of particular interest, except that they reflect the environment from which they were sampled. We would adjust for the measurement error by adding 5 units to each measurement before comparing them. While the example discussed here is richness, this approach to estimating and comparing alpha diversity using a bias correction (incorporating unobserved taxa) and a variance adjustment (measurement error model) could apply to any alpha diversity metric. The library sizes can dominate the biology in determining the result of the diversity analysis (Lande, 1996). To this criticism, I add that misapplying statistical tools is undermining many analyses of alpha diversity. Despite this, alpha diversity estimates that account for unobserved taxa and provide variance estimates are vastly preferable to both plug-in and rarefied estimates, which neither account for unobserved taxa nor provide variance estimates. I encourage ecologists to use estimates of diversity that account for unobserved species, and to use measurement error models to compare diversity across ecosystems. Suppose we are interested in modeling the CO2 flux of soil treated with different amendments. However, detecting a difference between the effects of amendment on flux would be more challenging statistically: we would require more samples to detect a true difference compared to the case without measurement error. To clarify this discussion, I will focus on taxonomic richness (the simplest case), and later generalize the argument to other alpha diversity metrics.
Extensive literature discusses different methods for describing diversity and documenting its effects on ecosystem health and function. Based on these subsamples of equal size, diversity metrics can be calculated that can contrast ecosystems fairly, independent of differences in sample sizes (Weiss et al., 2017). Rarefaction is a method that adjusts for differences in library sizes across samples to aid comparisons of alpha diversity. While measurement error in microbiome studies affects all analyses of microbiome data, alpha diversity is particularly affected because commonly used estimates of alpha diversity are heavily biased compared to other estimation problems in microbial ecology (such as estimating relative abundances). Unfortunately, determining how to meaningfully estimate and compare alpha diversity is not trivial. Comparing sample taxonomic richness can therefore often lead to incorrect conclusions about true richness (B, F). For example, the Chao-Bunge (Chao and Bunge, 2002) and breakaway (Willis and Bunge, 2015) estimators of taxonomic richness provide variance estimates, account for unobserved taxa, and are not overly sensitive to the singleton count (the number of species observed once).
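To make the idea of adding an estimate of unobserved taxa to the observed richness concrete, the sketch below uses the classical bias-corrected Chao1 estimator. This is a swapped-in illustration, not one of the estimators named above: Chao1 is chosen only because its formula is short, and it depends directly on the singleton and doubleton counts, which is exactly the sensitivity the text warns about. It also returns no variance estimate here. The count vector is hypothetical.

```python
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts.

    Estimate = S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)),
    where f1 and f2 are the numbers of taxa seen exactly once and twice.
    """
    observed = [n for n in counts if n > 0]
    frequency_counts = Counter(observed)
    f1 = frequency_counts[1]  # singletons
    f2 = frequency_counts[2]  # doubletons
    unobserved = f1 * (f1 - 1) / (2 * (f2 + 1))
    return len(observed) + unobserved


# Hypothetical per-taxon read counts; the trailing 0 is a taxon never observed.
counts = [25, 10, 4, 2, 2, 1, 1, 1, 0]
print("observed richness:", sum(1 for n in counts if n > 0))
print("Chao1 estimate:   ", chao1(counts))
```

With three singletons and two doubletons the correction adds one taxon to the eight observed. Changing a single singleton count shifts the estimate, which is why estimators that are less sensitive to singletons are preferred for microbiome data.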
Alpha diversity metrics summarize the structure of an ecological community with respect to its richness (number of taxonomic groups), evenness (distribution of abundances of the groups), or both. The second method is to generate a normalized, or rarefied, sample by randomly discarding reads from all samples until each sample has nA1 reads (the number of reads in the smallest sample), Figure 1C. Furthermore, this discussion applies equally to diversity analyses performed at the strain, species, or other taxonomic level. We would measure the flux of equally sized soil sites treated with the different amendments, performing biological replicates using multiple sites for each amendment. While the example employed here concerns microbial richness, the same argument applies to macroecological richness, as well as other alpha diversity indices. In contrast, the coverage adjusted entropy estimator of the Shannon index (Chao and Shen, 2003) provides variance estimates and accounts for unobserved taxa, but is extremely sensitive to the singleton count, which is often difficult to determine in microbiome studies. Measurement Error and Variance in Microbiome Studies. As we sample more and more of the environment using larger samples, we get closer to understanding the true and total microbial community of interest.
We currently do not account for measurement error in microbial diversity studies. This is sometimes justified by claiming that rarefied estimates are equally biased. Alpha diversity could be compared exactly, because we would know entire microbial populations with perfect precision. Adjusting for unobserved taxa and accounting for uncertainty in the estimate correctly detects both true (D) and false (H) differences in richness. In order to draw meaningful conclusions about the entire microbial community, it is necessary to adjust for inexhaustive sampling using statistically-motivated parameter estimates for alpha diversity. There is unadjusted error in using our samples as proxies for the entire community. AW is supported by start-up funds awarded by the Department of Biostatistics at the University of Washington, and the National Institutes of Health (R35GM133420). The author is grateful to Berry Brosi, the MBL, the STAMPS course directors, and the STAMPS participants for countless discussions on this topic. Unfortunately, we do not have knowledge of every microbe. To decide if measurement error must be accounted for when observations are made in an experiment, it is necessary to consider the effect of repeating the observational process on the same experimental unit. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
I discuss a statistical perspective on diversity, framing the diversity of an environment as an unknown parameter, and discussing the bias and variance of plug-in and rarefied estimates. 10:2407. doi: 10.3389/fmicb.2019.02407. Modeling parameters observed with estimation error is not a new suggestion: this approach is from the field of statistical meta-analysis, where the results of multiple studies estimating the same effect size are compared (Demidenko, 2004; Willis et al., 2016; Washburne et al., 2018). As may commonly occur in practice, cA1 < cA2 < cB1 < cB2. AW wrote the manuscript and performed the data analysis. To illustrate, consider the following example where the alpha diversity metric of interest is strain-level richness of a microbial community (the total number of strain variants present in the environment). To illustrate this distinction, I contrast microbial diversity experiments with a non-microbial experiment. Let cij be the observed richness of environment i on replicate j. However, richness estimation has a well-studied statistical literature, and richness estimators that are adapted to microbiome data exist (see Bunge et al., 2014 for a review). This option has the advantages of leveraging all observed reads, comparing estimates of the actual parameter of interest (taxonomic richness), and accounting for experimental noise.
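The meta-analysis idea above, in which estimates with smaller standard errors receive more weight, can be sketched as an inverse-variance weighted comparison. This is a deliberately simplified fixed-effect illustration and not the modeling approach of the cited works: it assumes the standard errors are known and ignores biological variability between replicates, which a full measurement error model would also estimate. All richness estimates and standard errors below are hypothetical, as are the function names.

```python
import math


def pooled_estimate(estimates, std_errors):
    """Inverse-variance weighted mean of the estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    mean = sum(w * x for w, x in zip(weights, estimates)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))


def compare(group_a, group_b):
    """z-statistic for the difference between two pooled estimates."""
    mean_a, se_a = pooled_estimate(*group_a)
    mean_b, se_b = pooled_estimate(*group_b)
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical richness estimates (already adjusted for unobserved taxa)
# and their standard errors, two replicates per environment.
env_a = ([520.0, 480.0], [40.0, 40.0])
env_b = ([300.0, 340.0], [60.0, 30.0])

print("pooled A:", pooled_estimate(*env_a))
print("pooled B:", pooled_estimate(*env_b))
print("z for A minus B:", compare(env_a, env_b))
```

In Environment B the second replicate has half the standard error of the first, so it carries four times the weight and pulls the pooled estimate toward 340. No reads are discarded to achieve this adjustment.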
Because many perturbations to a community affect the alpha diversity of a community, summarizing and comparing community structure via alpha diversity is a ubiquitous approach to analyzing community surveys. Consider the setting in Figure 1A, where we are investigating two different environments, and Environment A's richness (call it CA) is higher than Environment B's richness (CB). The author also thanks Thea Whitman and two referees for many thoughtful suggestions on the manuscript. There are currently two commonly used methods for comparing alpha diversity. Keywords: bioinformatics, computational biology, ecological data analysis, latent variable model, reproducibility, measurement error. Citation: Willis AD (2019) Rarefaction, Alpha Diversity, and Statistics. The resulting rarefied richness levels are then cA1, cA2, cB1, and cB2. Observing small samples from a large population is not an experimental set-up unique to microbial ecology: it is almost universal in statistics. Adjusting for sample size when comparing different groups of observations without discarding data is widely prevalent in the sciences, and discarding data to adjust for unequal sample sizes is the exception. Unfortunately, rarefaction is neither justifiable nor necessary, a view framed statistically by McMurdie and Holmes (2014) in the context of comparison of relative abundances.
However, it is widely believed that diversity depends on the intensity of sampling. The same is not true for other alpha diversity metrics.