Conservation Genetics and Detection of Rare Alleles in Finite Populations

PER SJÖGREN
PER-IVAN WYÖNI
Department of Genetics, Uppsala University, Box 7003, S-750 07 Uppsala, Sweden

Paper submitted October 22, 1992; revised manuscript accepted July 2, 1993.

An increasing number of biological studies concern finite populations. Habitat fragmentation, a recurrent conservation issue, subdivides populations into smaller units subjected to demographic and genetic hazards. Investigating the genetic status of "target" populations is a natural part of a population viability analysis (Gilpin & Soulé 1986; Gilpin 1987; Shaffer 1990); low variability could be detrimental to threatened species because it would constrain their adaptive potential and might be associated with inbreeding depression (Frankel & Soulé 1981; Allendorf & Leary 1986; Vrijenhoek 1989). Thus, many authors who have scored virtually no electrophoretic variation in a species have speculated extensively regarding its status (see O'Brien et al. 1983, 1985; Lesica et al. 1988), but few consider the effects of their sampling procedure on these results.

Theoretical studies in which loci were sampled independently in populations have shown that data from few individuals (n ≤ 25) but many loci (≥ 40) are required to assess low heterozygosity levels (H) (Archie 1985). But rather than sampling loci independently for detailed heterozygosity estimates, comparative population studies generally set out to detect polymorphism at particular, homologous loci. Based on this approach for finite populations, we present a model that shows that the effect of sample size (n) in detecting genetic variation may have been underrated; the numbers of individuals usually examined in genetic studies do not even suffice to match a coarse polymorphism criterion for populations of size N > 30. We apply our model to some studies in conservation genetics to illustrate the problem.
Models and Results

Among the previous models, Gregorius (1980) calculated the sample sizes required to detect all k alleles, each with the frequency 1/k, with a given probability at a particular locus in infinite diploid populations. But because loci with allele frequencies inversely proportional to their number (k) constitute rare exceptions (Chakraborty et al. 1980; Grant & Ståhl 1988), his results unfortunately apply to very few, if any, natural populations.

Another model for infinite diploid populations was used for a diallelic locus by Tave (1986) and Swofford and Berlocher (1987). They calculated the probability (P-) of losing a rare allele with frequency q in a sample of n individuals as

    P^{-} = (1 - q)^{2n}    (1)

In this way, and with n = N_e (the effective population size), Tave (1986) sampled haploid genomes from an infinite gamete pool in a breeding program. When samples are taken without replacement from small populations, however, the model underestimates the probability of detecting variation. Statistically, this is a "conservative" error but, from a conservation standpoint, oversampling of small and endangered populations may be hazardous and unacceptable.

Conservation Biology, Pages 267-270, Volume 8, No. 1, March 1994

Therefore, using the same scenario as Tave (1986) and Swofford and Berlocher (1987) did, we developed a similar model for finite populations. In our model, 2n genes of a diallelic locus are sampled from a pool of 2N gene copies (N = population size) according to a hypergeometric distribution. (This is a good approximation of sampling diploid genotypes, provided their frequencies do not deviate significantly from a Hardy-Weinberg distribution.) At this locus, the predominating allele has the frequency p.
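As a point of comparison for the finite model, the infinite-population calculation of equation 1 can be sketched in a few lines of Python. This is only an illustration, not the authors' program; the function names p_miss_infinite and min_sample_infinite are hypothetical, and the 0.95 detection threshold is the criterion used in this paper.

```python
def p_miss_infinite(q: float, n: int) -> float:
    """Equation 1: probability that a sample of n diploid individuals
    (2n gene copies from an infinite pool) contains no copy of an
    allele with frequency q."""
    return (1.0 - q) ** (2 * n)


def min_sample_infinite(q: float, p_detect: float = 0.95) -> int:
    """Smallest n for which P+ = 1 - P- reaches p_detect (hypothetical helper)."""
    n = 1
    while 1.0 - p_miss_infinite(q, n) < p_detect:
        n += 1
    return n


# Rare-allele frequencies matching p = 0.95, 0.99, and 0.999.
for q in (0.05, 0.01, 0.001):
    print(q, min_sample_infinite(q))
```

Under these assumptions the loop returns 30, 150, and 1498 individuals for q = 0.05, 0.01, and 0.001, which is the limit a finite-population calculation should approach as N grows very large.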
Similar to equation 1, but for this finite population, we calculate the probability of sampling only the predominating allele (P-) as

    P^{-} = \prod_{i=0}^{2n-1} \frac{p2N - i}{2N - i}    (2)

where p2N must be an integer. Provided that p >> q, the probability of detecting variation (P+), that is, of scoring at least one variant allele, becomes P+ = 1 - P-.

We used a computer program to solve for the minimum number of diploid individuals (n) that need to be sampled from a population of size N in order to detect variation with P+ ≥ 0.95 at given values of p; the program is available on request. Our results are shown in Table 1. We note that with sample sizes of less than 20, which are not uncommon in electrophoretic studies, this statistical criterion for detecting polymorphism at particular loci is not satisfied in populations of size N > 30, even when a coarse polymorphism criterion is used (p ≤ 0.95 below). For very large populations (N → ∞), our n values become identical with those calculated for infinite populations in equation 1. Contrasting the results of equations 1 and 2, however, finite population size needs to be considered in cases with very rare alleles (q < 0.05) and small populations (Table 1).

Table 1. Sample size (n individuals) required from a diploid population of size N to detect variation at a diallelic locus with one predominating allele (frequency = p) with a probability P+ ≥ 0.95.*

N         n (p = 0.95)   n (p = 0.99)   n (p = 0.999)
10             10              -               -
20             16              -               -
30             19              -               -
40             21              -               -
50             23             48               -
60             24              -               -
80             25              -               -
100            26             78               -
120            26              -               -
140            27              -               -
150            27             95               -
200            28            106               -
300            28            118               -
400            29            125               -
500            29            129             475
1000           29            139             777
1500           29            142             948
2000           29            144            1054
2500           30            145            1127
10^9           30            150            1498

* Dashes indicate impossible cases.

Discussion

Reviews on electrophoretic variation in natural populations (Fuerst et al. 1977; Chakraborty et al.
1978, 1980; see also Sarich 1977) have found that allele frequencies show a U-shaped distribution (most alleles being either very common or rare) and that homologous loci tend to be polymorphic in different conspecific populations. All this indicates that we address a problem relevant to studies in genetics and conservation. Our results show that small samples yield a significant risk of scoring "monomorphism" in polymorphic populations. Therefore, we suggest a statistical criterion for monomorphism where the probability of sampling only the predominating allele by chance in a sample of n individuals is assessed as in statistical tests (P- < 0.05* or < 0.01**, etc.); this should be done with reference to the adopted allele frequency criterion for polymorphism, such as p ≤ 0.95 (Table 1). If no such criterion is used, we suggest that the "resolution" of the analysis is quantified as the frequency of the rarest allele, q, at a hypothetical diallelic locus, that is detected with P+ ≥ 0.95 in the actual sample; q can also represent the sum of frequencies of rare alleles (Σ q_i << p) in a similar multiallele situation. Other authors (such as Swofford & Berlocher 1987; Archie et al. 1989) have also stressed the importance of evaluating sample size effects in genetic studies, as well as scoring an adequate number of loci (Archie 1985).

When multiple loci are examined, P- decreases. Assuming that loci are independent observations, the compound probability that no variation is detected at any of the studied loci is the product of the individual probabilities. Empirical studies have shown that most enzyme loci are monomorphic, while a few are highly variable (Fuerst et al. 1977); the compound probability of detecting variation thus becomes a function of both sample size per locus and the probability of scoring polymorphic loci.
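The finite-population quantities discussed so far (equation 2, the minimum sample size, the "resolution" of a sample, and the compound probability over independent loci) can be sketched as follows. This is a minimal illustration under stated assumptions and not the GENESAMP program: all function names are hypothetical, the rare allele is specified as an integer number of gene copies (so that p2N is an integer by construction), and exact fractions are used to avoid rounding at the 0.95 threshold.

```python
from fractions import Fraction
import math


def p_miss_finite(N, n, variant_copies):
    """Equation 2: probability that 2n gene copies drawn without
    replacement from 2N contain only the predominating allele."""
    if not 0 < n <= N:
        raise ValueError("n must satisfy 0 < n <= N")
    total = 2 * N
    common = total - variant_copies        # p * 2N, an integer
    prob = Fraction(1)
    for i in range(2 * n):
        prob *= Fraction(common - i, total - i)
        if prob == 0:                      # all common copies already drawn
            break
    return prob


def min_sample_finite(N, variant_copies, p_detect=Fraction(95, 100)):
    """Smallest n with P+ = 1 - P- >= p_detect, or None if no n works."""
    for n in range(1, N + 1):
        if 1 - p_miss_finite(N, n, variant_copies) >= p_detect:
            return n
    return None


def resolution(N, n, p_detect=Fraction(95, 100)):
    """Frequency q of the rarest allele detected with P+ >= p_detect
    when n of N individuals are sampled."""
    for k in range(1, 2 * N + 1):
        if 1 - p_miss_finite(N, n, k) >= p_detect:
            return Fraction(k, 2 * N)


def compound_miss(per_locus_p_miss):
    """Compound P- over loci treated as independent observations."""
    return math.prod(per_locus_p_miss)


print(min_sample_finite(20, 2))   # N = 20, p = 0.95 (2 variant copies of 40)
print(min_sample_finite(50, 1))   # N = 50, p = 0.99 (1 variant copy of 100)
print(resolution(12, 7))          # 7 individuals sampled from N = 12
```

With these definitions the first two calls give 16 and 48, in agreement with Table 1, and the last call gives q = 1/6 for a sample of 7 individuals from N = 12. Working with integer copy numbers also makes the "impossible cases" of Table 1 explicit: p = 0.99 cannot be realized unless 2N is a multiple of 100.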
For calculating the compound probability, the frequency of the predominating allele at each locus has to be known; usually, this is not the case. An approximation could be made if these allele frequencies were known from a reference population; hence, P- would become the probability that the study sample detects no variation given that the two populations have similar allele frequencies. If no reference data exist, we advocate the statistically more conservative approach of restricting the probability analysis to a single-locus situation.

Examples

Repeated sampling of subadults in a local pool frog population (Rana lessonae) in Sweden by Sjögren (1991) revealed a mean heterozygosity (H) of 0.0047 in 28 allozyme loci, compared to 0.0497 in a Polish population; the Swedish frogs effectively originated from different parental combinations and two different generations (Sjögren 1991). No genetic variation was detected in the first generation (n = 9 and 24 per locus), but in the second, two alleles with frequencies q = 0.025 and 0.050 were detected in two different loci (n = 60). With ≈8000 juveniles per generation, we find samples of 9, 24, and 60 individuals to detect alleles with frequencies q ≥ 0.154, 0.061, and 0.025, respectively, with P+ ≥ 0.95. In this perspective, the rare alleles probably escaped detection by chance in the first samples, and there is no reason to believe they were supplied by immigration (Sjögren 1991).

Among studies in conservation genetics with population size estimates, Lesica et al. (1988) found no electrophoretic variation in 18 loci examined in an endangered plant, Howellia aquatilis.
With sample sizes ranging from 5 to 63 per locus in four populations of sizes ranging from less than 1000 to 10,000, we find that the smallest sample only detects alternative alleles with q ≥ 0.259 and P+ ≥ 0.95 (5 individuals sampled from N = 2000), whereas alleles with q ≥ 0.024 are detected with the same probability in the largest sample (63 individuals from N > 5000). We do not doubt that these populations have low heterozygosity levels (H), but because populations numbering 10^3 to 10^4 in size are particularly likely to harbor alleles with q < 0.024, we find the conclusions of Lesica et al. about "lack of genetic variability" to be premature.

In studies of smaller populations, Sherwin et al. (1991) failed to detect variation in a population of 633 bandicoots (Perameles gunnii) with sample sizes of 2, 10, 26, and 30 individuals. They attributed this result to the possibility that too few loci were investigated; we also find that alleles with frequencies q ≤ 0.137 and 0.047 could escape detection by chance (P- > 0.05) at loci with samples of 10 and 30, respectively.

Wayne et al. (1991) found that seven Isle Royale gray wolves, of the present N = 12, were monomorphic in three out of five loci that are polymorphic in mainland populations. If the allele frequencies of the island population were similar to those of the mainland samples (pooled), variation would not have escaped detection at all three loci (compound P- ≈ 0.0). At any individual locus, however, an allele with q ≤ 0.125 would escape detection with P- > 0.05 in Wayne et al.'s sample. They correctly concluded that more individuals needed to be sampled to assess the number of alleles lost.

Finally, Triggs et al. (1989) sampled two and five kakapos (Strigops habroptilus) from two remaining insular populations with N = 5 and 45, respectively.
They found three out of 27 loci to be polymorphic in these populations, and scored two additional polymorphic loci in a recently introduced population on a third island (n = 6, N = 22). Triggs et al. (1989) briefly discussed possible effects of their small samples, but they felt confident that their "large percentages" (12-40%) of each population would not cause any major error. With two out of five birds sampled, we find that alleles with a frequency q = 0.30 are detected with P+ = 0.90 in a single-locus situation; similarly, with 5 out of 45 birds sampled from the largest population, only alleles with q ≥ 0.256 are detected with P+ ≥ 0.95. In fact, even if one compares the first population with the third over multiple loci (using the latter as reference), the lack of variation in Est-4, Gpi-1, and Mpi-1 in the first population could be a sampling artifact (compound P- = 0.07). Thus, Triggs et al.'s (1989) conclusion about less variation in the first population, and speculations about absence of certain alleles in the individual populations, are presently not well founded.

Conclusions

Rare alleles and finite populations are realities for studies in genetics and conservation. In this respect, and also in the context of estimating gene flow between subdivided populations (Slatkin 1985), our model provides a useful tool for determining sample sizes required for confident analyses and for evaluating sample-size effects in earlier studies. In conservation genetics, the model can be used to secure rare alleles in breeding programs (see Tave 1986) and to design sampling so as to minimize disturbance in the study populations; sampling can also be avoided if it is impossible to settle a matter under acceptable conditions.
Moreover, we advise against labelling a population as "monomorphic" when sample sizes do not suffice to match the polymorphism criterion adopted (such as p ≤ 0.95 or 0.99), if nothing else, for the sake of the population (Sjögren 1991).

Acknowledgments

We thank Pekka Pamilo, Staffan Ulfstrand, Terry Ashley, and two anonymous reviewers for comments on earlier drafts of this paper. The study was supported by grants from the Sven and Lilly Lawski foundation to Per-Ivan Wyöni, and from the Swedish Environmental Protection Agency to Per Sjögren, from whom the computer program GENESAMP is available with the submission of an IBM- or Macintosh-formatted diskette (indicate format!).

Literature Cited

Allendorf, F. W., and R. F. Leary. 1986. Heterozygosity and fitness in natural populations of animals. Pages 57-76 in M. E. Soulé, editor. Conservation biology: The science of scarcity and diversity. Sinauer Associates, Sunderland, Massachusetts.

Archie, J. W. 1985. Statistical analysis of heterozygosity data: Independent sample comparisons. Evolution 39:623-637.

Archie, J. W., C. Simon, and A. Martin. 1989. Small sample size does decrease the stability of dendrograms calculated from allozyme-frequency data. Evolution 43:678-683.

Chakraborty, R., P. A. Fuerst, and M. Nei. 1978. Statistical studies on protein polymorphism in natural populations. II. Gene differentiation between populations. Genetics 88:367-390.

Chakraborty, R., P. A. Fuerst, and M. Nei. 1980. Statistical studies on protein polymorphism in natural populations. III. Distribution of allele frequencies and the number of alleles per locus. Genetics 94:1039-1063.

Frankel, O. H., and M. E. Soulé. 1981. Conservation and evolution. Cambridge University Press, Cambridge, England.

Fuerst, P. A., R. Chakraborty, and M. Nei. 1977. Statistical studies on protein polymorphism in natural populations. I. Distribution of single locus heterozygosity.
Genetics 86:455-483.

Gilpin, M. E. 1987. Spatial structure and population vulnerability. Pages 125-139 in M. E. Soulé, editor. Viable populations for conservation. Cambridge University Press, Cambridge, England.

Gilpin, M. E., and M. E. Soulé. 1986. Minimum viable populations: Processes of species extinction. Pages 13-34 in M. E. Soulé, editor. Conservation biology: The science of scarcity and diversity. Sinauer Associates, Sunderland, Massachusetts.

Grant, W. S., and G. Ståhl. 1988. Evolution of Atlantic and Pacific cod: Loss of genetic variation and gene expression in Pacific cod. Evolution 42:138-146.

Gregorius, H. 1980. The probability of losing an allele when diploid genotypes are sampled. Biometrics 36:643-652.

Lesica, P., R. F. Leary, F. W. Allendorf, and D. E. Bilderback. 1988. Lack of genic diversity within and among populations of an endangered plant, Howellia aquatilis. Conservation Biology 2:275-282.

O'Brien, S. J., D. E. Wildt, D. Goldman, C. R. Merril, and M. Bush. 1983. The cheetah is depauperate in genetic variation. Science 221:459-462.

O'Brien, S. J., M. E. Roelke, L. Marker, A. Newman, C. A. Winkler, D. Meltzer, L. Colly, J. F. Evermann, M. Bush, and D. E. Wildt. 1985. Genetic basis for species vulnerability in the cheetah. Science 227:1428-1434.

Sarich, V. M. 1977. Rates, sample sizes, and the neutrality hypothesis for electrophoresis in evolutionary studies. Nature 265:24-28.

Shaffer, M. L. 1990. Population viability analysis. Conservation Biology 4:39.

Sherwin, W. B., N. D. Murray, J. A. Marshall Graves, and P. Brown. 1991. Measurement of genetic variation in endangered populations: Bandicoots (Marsupialia: Peramelidae) as an example. Conservation Biology 5:103-108.

Sjögren, P. 1991. Genetic variation in relation to demography of peripheral pool frog populations (Rana lessonae). Evolutionary Ecology 5:248-271.

Slatkin, M. 1985. Rare alleles as indicators of gene flow. Evolution 39:53-65.
Swofford, D. L., and S. H. Berlocher. 1987. Inferring evolutionary trees from gene frequency data under the principle of maximum parsimony. Systematic Zoology 36:293-325.

Tave, D. 1986. Genetics for fish hatchery managers. The AVI Publishing Company, Westport, Connecticut.

Triggs, S. J., R. G. Powlesland, and C. H. Daugherty. 1989. Genetic variation and conservation of Kakapo (Strigops habroptilus: Psittaciformes). Conservation Biology 3:92-96.

Vrijenhoek, R. C. 1989. Population genetics and conservation. Pages 89-98 in D. Western and M. Pearl, editors. Conservation for the twenty-first century. Oxford University Press, Oxford, England.

Wayne, R. K., D. A. Gilbert, N. Lehman, K. Hansen, A. Eisenhawer, D. Girman, R. O. Peterson, L. D. Mech, P. J. P. Gogan, U. S. Seal, and R. J. Krumenaker. 1991. Conservation genetics of the endangered Isle Royale gray wolf. Conservation Biology 5:41-51.