ICES Journal of Marine Science: Journal du Conseil Advance Access originally published online on November 13, 2006
ICES Journal of Marine Science: Journal du Conseil 2007 64(1):97-109; doi:10.1093/icesjms/fsl013
Optimizing a stratified sampling design when faced with multiple objectives
1 Quantitative Ecology and Resource Management Program, University of Washington, Box 352182, Seattle, WA 98195, USA
2 School of Aquatic and Fisheries Sciences, University of Washington, PO Box 355020, Seattle, WA 98195, USA
3 National Marine Fisheries Service, Alaska Fisheries Science Center, 7600 Sand Point Way NE, Seattle, WA 98115, USA
Correspondence to T. J. Miller: Present address: Large Pelagics Research Center, Zoology Department, University of New Hampshire, Durham, NH 03824, USA. tel: +1 603 862 2897; fax: +1 603 862 2888; e-mail: tim.miller{at}unh.edu
Miller, T. J., Skalski, J. R., and Ianelli, J. N. 2007. Optimizing a stratified sampling design when faced with multiple objectives. ICES Journal of Marine Science, 64: 97–109.
For many stratified sampling designs, the data collected are used by multiple parties with different estimation objectives. Quantitative methods to determine allocation of sampling effort to different strata to satisfy the often disparate estimation objectives are lacking. Analytical results for the sampling fractions and sample sizes for primary units within each stratum of a stratified (multi-stage) sampling design that are optimal with respect to a weighted sum of relative variances for the estimation objectives are presented. Further, an approach for assessing gains or losses for each estimation objective by changing allocation of sample sizes to each stratum is provided. As an illustration, the analytical results are applied to determine optimal observer sampling fractions (coverage rates) for the North Pacific Groundfish Observer Programme (NPGOP), for which the multiple objectives are assumed to be bycatch (seabird, marine mammal, and non-targeted fish species) and total catch, and catch-at-length and -age of targeted fish species. Simultaneously optimizing a criterion that defines the strata of the NPGOP sampling design is also considered. When observer coverage rates are allowed to be gear-specific for the NPGOP design, the optimized objective function is between 10% and 28% less than the value corresponding to current sampling for annual data (2000–2003) and 12% less when optimized over all years combined.
Keywords: multi-parameter, North Pacific, observer coverage, optimal sampling, stratified sampling
Received 27 February 2006; accepted 13 September 2006; advance access publication 13 November 2006.
| Introduction |
|---|
Investigations of efficient sampling have been an important part of developing aquatic resource surveys or catch sampling designs for several decades (Ketchen, 1949; Southward, 1976; Manly et al., 2002). Of course, researchers in other sciences are also concerned with maximizing precision of their sampling designs, and it is not uncommon to measure multiple attributes on the population being sampled, whatever the field. The usual approach is to minimize the variance of an estimator for a single attribute given a fixed cost or vice versa (Cochran, 1977; Särndal et al., 1992), but this does not suffice for multipurpose surveys. In the field of statistics, there have been many studies on optimal sampling with respect to multiple parameters for specific designs, and some analytical results can be found in sampling theory references (Jessen, 1978; Yates, 1981; Särndal et al., 1992).
In fisheries management, there have been extensive investigations of optimal sampling designs with respect to a special type of multiple objective problem: proportions at age or length of commercially important species. Early efforts for optimal estimation of age or length composition in commercial catches usually focused on one age class or length class at a time (Ketchen, 1949; Tanaka, 1953; Kutkuhn, 1963). Kimura (1977) then used an objective function that was the sum of the variances of estimated age-class proportions to compare two-phase sampling and simple random sampling without replacement (SRS) (Cochran, 1977; Särndal et al., 1992). Lai (1987, 1993) and Smith (1989) generalized Kimura's objective function by including covariance of proportion estimators. Bayesian approaches to the optimal estimation of age-class proportions have also been considered (Smith and Sedransk, 1982; Jinn et al., 1987).
Estimation of age- or length-class proportions is an important objective in fisheries management, but for the general problem of multiple estimation objectives, the parameters to be estimated from the sampling design may be very different in nature. Optimizing designs with respect to multiple unrelated parameters is rare, but Manly et al. (2002) recently focused on optimizing the variances of density estimates for several shellfish species simultaneously. They used an iterative method for finding optimal within-strata sample sizes in an unusual two-phase stratified design, where the objective function is a weighted sum of the mean coefficient of variation (CV), the maximum CV, and the mean of all CVs over an arbitrary threshold.
We present an approach for determining sampling fractions and sample sizes for each stratum within a stratified sampling design that is optimal with respect to multiple parameters that may be heterogeneous in nature. The type of optimality we consider is related closely to those treated by Manly et al. (2002) and Lai (1987, 1993). We use scaled measures of variance like Manly et al. (2002), and our objective function is a weighted sum of these scaled variance measures, like Lai (1987, 1993). The sampling design we consider for illustration of our analytical results is that used by the North Pacific Groundfish Observer Programme (NPGOP), which deploys scientifically trained personnel (observers) on vessels to collect data as catches are made.
Observer programmes are implemented in many fisheries around the world and are important for bycatch investigations in fisheries ranging in scale from artisanal (e.g. D'Agrosa et al., 2000; Ambrose et al., 2005) to industrial (e.g. Trippel et al., 1996; Stratoudakis et al., 1999; Rochet et al., 2002). Perhaps the most visible uses of observer programme data are studies concerning seabird, marine mammal, and sea turtle bycatch (e.g. Klaer and Polacheck, 1998; Lewison et al., 2004; Vinther and Larsen, 2004). However, estimation of bycatch of fish species, assessment of targeted stocks, and studies of ecological aspects of marine systems are also important uses of observer-collected data (Romanov, 2002; Spencer et al., 2002; Worm et al., 2003).
In the illustration, we focus on sampling fractions (observer coverage rates) for different strata, because NPGOP observer coverage is currently defined as a percentage of fishing effort for the regulated fisheries, but the equations for optimal sample sizes for each stratum are also provided for applicability to any stratified sampling design. A further extension we explore is simultaneous optimization of the criterion that determines the stratification used in the design. We provide example results based on data collected by NPGOP observers between 2000 and 2003.
| Observer programme sampling design and optimization strategy |
|---|
The NPGOP oversees observers deployed on vessels fishing in federal waters off of Alaska (AFSC, 2005). These observers collect information on a wide variety of catch attributes, including bycatch of seabirds and marine mammals, catch of targeted and untargeted fish species, and age and length composition of targeted fish species. The NPGOP uses a stratified multi-phase design for observer deployment and catch sampling (Särndal et al., 1992, pp. 100–104 and 343–350). The catches made by each vessel within each 3-month period (vessel-quarter) are the strata, and there is multi-phase sampling within each vessel-quarter (Miller and Skalski, 2006a, b). The first two phases are sampling of trips within vessel-quarters and hauls within trips, but further phases of sampling occur within hauls, depending on what information is collected. With current regulations, longline and trawl vessels >38.1 m long (large vessels) must have observers on board for all days spent fishing (100% observer coverage). Each longliner and trawler between 18.3 and 38.1 m long (medium vessels) must have observers on board for 30% of their fishing days each quarter (30% observer coverage). All vessels longer than 18.3 m that fish pot gear require just 30% observer coverage (recently, regulations changed the focus of coverage to pots fished rather than days fished). No observer coverage is required for any vessels <18.3 m long (small vessels), so we do not address this sector here (AFSC, 2005).
For the NPGOP, the goal is to optimize the sampling effort allocated to the size classes, which equates to the expected rates of observer coverage for each vessel-quarter. We assume that levels of sampling effort at subsequent phases are satisfactory and hold sampling effort within trips constant.
| Sampling variance for a stratified sampling design |
|---|
Determining optimal sample sizes and sampling fractions for each stratum of a stratified design requires a functional relationship of the sample sizes or sampling fractions to sampling variance. We begin with the sampling variance of the estimator for a given parameter total in a stratum ($\tau_v$), and we assume that the set of sampled elements in a stratum is obtained by SRS. For the NPGOP sampling design, this would correspond to the sampling variance for a given catch parameter total in the $v$th vessel-quarter and an SRS of all trips made in that vessel-quarter (for definitions of notation used throughout the development of the optimal sampling results, see Table 1). In general, there may be subsampling within each primary unit of a stratum (which is the case for the NPGOP). The sampling variance of the estimator for a given parameter total in the stratum when there is further subsampling within each stratum element is

$$\mathrm{Var}(\hat{\tau}_v) = N_v^2\left(1-\frac{n_v}{N_v}\right)\frac{S_{\tau,v}^2}{n_v} + \frac{N_v}{n_v}\sum_{t=1}^{N_v}\mathrm{Var}(\hat{\tau}_{vt}), \tag{1}$$

where $S_{\tau,v}^2 = \sum_{t=1}^{N_v}(\tau_{vt}-\bar{\tau}_v)^2/(N_v-1)$, $\bar{\tau}_v = \sum_{t=1}^{N_v}\tau_{vt}/N_v$, and $n_v$ and $N_v$ are the number of sampled and total elements in the $v$th stratum. The form of $\mathrm{Var}(\hat{\tau}_{vt})$ depends on the sampling procedures used within the $t$th element (Särndal et al., 1992, p. 142). In the North Pacific groundfish fisheries, the sampling procedures within the $t$th trip depend primarily on the gear type used by the vessel. A form for the sampling variance [Equation (1)] more useful for determining optimal sampling fractions is

$$\mathrm{Var}(\hat{\tau}_v) = \frac{A_v}{f_v} - B_v,$$

where $A_v = N_v S_{\tau,v}^2 + \sum_{t=1}^{N_v}\mathrm{Var}(\hat{\tau}_{vt})$, $B_v = N_v S_{\tau,v}^2$, and $f_v = n_v/N_v$. For stratified SRS designs without subsampling, we have $\hat{\tau}_{vt} = \tau_{vt}$, and $\mathrm{Var}(\hat{\tau}_{vt}) = 0$.
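As a quick numerical check on the algebra, the following minimal sketch evaluates the variance of a stratum total twice: once in the two-stage SRS form assumed here for Equation (1), and once in the $A/f - B$ form. The element totals and within-element variances are hypothetical values chosen only for illustration, and the function names are not part of the original analysis.

```python
import numpy as np

def stratum_variance_direct(tau, var_within, n):
    """Two-stage form: SRS of n of the N elements, subsampling inside each element."""
    N = len(tau)
    S2 = np.var(tau, ddof=1)  # between-element variance S^2
    return N**2 * (1 - n / N) * S2 / n + (N / n) * np.sum(var_within)

def stratum_variance_AB(tau, var_within, n):
    """The same variance written as A / f - B."""
    N = len(tau)
    S2 = np.var(tau, ddof=1)
    A = N * S2 + np.sum(var_within)
    B = N * S2
    f = n / N  # sampling fraction
    return A / f - B

# Hypothetical element (trip) totals and within-element sampling variances.
tau = np.array([12.0, 7.5, 20.0, 3.0, 9.5])
var_within = np.array([1.0, 0.5, 2.0, 0.2, 0.8])

direct = stratum_variance_direct(tau, var_within, n=2)
ab_form = stratum_variance_AB(tau, var_within, n=2)
```

The two values agree because the finite-population term $N(1-f)S^2/f$ splits into $NS^2/f - NS^2$; only $A$ is divided by the sampling fraction, which is why $B$ plays no role in the optimization.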
For the NPGOP, the sampling fraction of trips ($f_v$) in vessel-quarters associated with medium-size vessels will generally not equal the observer coverage rate (30%) in any given instance, but we assume that the sampling fraction equals the observer coverage rate, on average [i.e. $E(n_v) = 0.3N_v$] (Miller and Skalski, 2006a, b). Because the observer coverage rate is based on the number of days spent fishing each quarter for longliners and trawlers or total pots fished for pot vessels, our assumption implies that there is no relationship between trip duration or the amount of fishing effort and the presence of an observer. However, these assumptions are always implicit when extrapolating catch characteristics from observed fishing effort to unobserved fishing effort. Furthermore, the average sampling fraction (observer coverage rate) is identical for all vessel-quarters attributed to medium-size vessels because current regulations stipulate a common observer coverage for a given size class of vessels ($f_v = f_s$).
In stratified sampling designs, sampling in one stratum is independent of that in others, and the sampling variance of the estimator for a given parameter total over all elements in all strata is just the sum of the sampling variances for estimators within each stratum:

$$\mathrm{Var}(\hat{\tau}) = \sum_{s=1}^{S}\sum_{v=1}^{V_s}\mathrm{Var}(\hat{\tau}_{sv}) = \sum_{s=1}^{S}\left(\frac{A_s}{f_s} - B_s\right), \tag{2}$$

where $A_s = \sum_{v=1}^{V_s} A_{sv}$, $B_s = \sum_{v=1}^{V_s} B_{sv}$, and $S$ is the number of arbitrary groups of strata.
| Determining optimal sampling fractions and sample sizes |
|---|
The ultimate goal is to minimize a function of the sampling variances corresponding to the estimation objectives with respect to the sampling fraction, $f_s$, or sample size, $n_s$. However, there are always limited resources available for sampling, and we require a cost constraint that defines the total number of elements that can be sampled. We minimize the sampling variance for the parameter estimator over all strata, given this cost constraint. For the NPGOP, we reallocate the number of sampled trips to minimize the sampling variance over vessel-quarters in all $S$ size classes given the cost constraint. Consider a general cost function,

$$C = \sum_{s=1}^{S}\sum_{v=1}^{V_s}\sum_{t=1}^{n_{sv}} c_{svt},$$

where $c_{svt}$ is the cost of sampling the $t$th element in the $v$th stratum in the $s$th group of strata (the $t$th trip in the $v$th vessel-quarter in the $s$th size class for the NPGOP). Realistically, in the NPGOP sampling design, the cost of sampling each trip increases with the duration of the trip, because the vessel pays for the time that the observer is on board, but cost to the NPGOP is further complicated by the fact that observers are hired for set periods of time by contracting companies (MRAG, 2000; AFSC, 2005). A simpler cost function that assumes that a set number of elements can be sampled (trips can be observed) and that the cost is the same for each element ($T_n$) is

$$T_n = \sum_{s=1}^{S} f_s N_s. \tag{3}$$
The objective function [Equation (2)] and the cost function [Equation (3)] can be used to determine the optimal sampling fractions (expected observer coverage rates) for each group of strata (vessel size class) analytically:

$$f_s = \frac{T_n\sqrt{A_s/N_s}}{\sum_{s'=1}^{S}\sqrt{A_{s'}N_{s'}}}, \tag{4}$$

where $N_s = \sum_{v=1}^{V_s} N_{sv}$ is the total number of elements (trips) in the $s$th group of strata and $T_n = \sum_{s=1}^{S}\sum_{v=1}^{V_s} n_{sv}$ is the total number of trips that may be sampled under the cost constraint (see Appendix 1 for a derivation). As the number of sampled elements in the $s$th group of strata is $n_s = f_s N_s$, the optimal sample sizes are

$$n_s = \frac{T_n\sqrt{A_s N_s}}{\sum_{s'=1}^{S}\sqrt{A_{s'}N_{s'}}}. \tag{5}$$

For hierarchically stratified sampling designs, we may wish to set sampling fractions for groups of strata that are different from those currently defined. For example, the NPGOP may wish to set sampling fractions (observer coverage rates) for each combination of gear type and size class rather than by just size class, as is currently done. That is, for a given size class, unique coverage rates for trawl, longline, and pot-gear vessels could be allowed rather than the same coverage rate for all gear types of a given size class. When this is of interest, the same result [Equation (4)] is applied, but $S$ denotes the number of different combinations of gear type and size class, and $A_s$, $N_s$, and $f_s$ pertain to the $s$th combination.
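The single-objective allocation can be sketched in a few lines. The example below assumes the Lagrange-multiplier solution $f_s \propto \sqrt{A_s/N_s}$ scaled so that $\sum_s f_s N_s = T_n$; all input values and function names are hypothetical, and the sketch does not handle the case in which a computed fraction exceeds 1.

```python
import numpy as np

def optimal_fractions(A, N, T_n):
    """f_s = T_n * sqrt(A_s / N_s) / sum over groups of sqrt(A_s * N_s)."""
    A = np.asarray(A, dtype=float)
    N = np.asarray(N, dtype=float)
    return T_n * np.sqrt(A / N) / np.sum(np.sqrt(A * N))

def objective(f, A, B):
    """Total sampling variance: sum over groups of A_s / f_s - B_s."""
    return float(np.sum(np.asarray(A, dtype=float) / f - np.asarray(B, dtype=float)))

# Hypothetical inputs for two groups of strata (e.g. two size classes).
A = [400.0, 900.0]
B = [300.0, 700.0]
N = np.array([100.0, 50.0])   # total trips per group
T_n = 60.0                    # trips that can be sampled in total

f_opt = optimal_fractions(A, N, T_n)
n_opt = f_opt * N             # optimal sample sizes per group

# Moving one sampled trip from the second group to the first should not help.
f_shifted = (n_opt + np.array([1.0, -1.0])) / N
```

Because the objective is convex in the sampling fractions, any feasible reallocation away from `f_opt`, such as `f_shifted`, gives a larger total variance.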
| Multiple estimation objectives |
|---|
Commonly, there are multiple estimation objectives for a given sampling design, and determining sampling fractions or sample sizes that are near optimal for many or all of the catch parameters is a more appropriate objective. The results for a single estimation objective [Equations (4) and (5)] could be used for each catch parameter, but how should the optimal sampling fractions or sample sizes for each catch parameter be averaged? Several approaches exist to deal with optimal allocation when multiple parameters or characteristics are of interest. These typically involve minimizing the sum or average of variances of the estimators for the parameter totals of interest (Cochran, 1977; Särndal et al., 1992). An extension discussed by Kimura (1977) and Lai (1987, 1993) in the context of sampling for age or length composition is to allow weights ($w_p$) to be applied to different parameters. When there are $P$ total parameters being estimated and $\hat{\tau}_p$ is the estimator for the $p$th parameter, the objective function is

$$\sum_{p=1}^{P} w_p\,\mathrm{Var}(\hat{\tau}_p). \tag{6}$$

A problem intrinsic to Equation (6) for unrelated estimation objectives is that there may be a high degree of heterogeneity in sampling variance, $\mathrm{Var}(\hat{\tau}_p)$, across parameters, merely because they are heterogeneous in scale. In the North Pacific groundfish fisheries, total catch of walleye pollock in the Bering Sea is large relative to the proportion of individuals in a given length or age class, and this relationship will also exist for respective sampling variances. Comparisons of precision for these types of estimation objectives are better made using a scaled measure of variance. The most commonly used scaled variance measure is the CV. However, numerical methods are required to optimize sampling effort when CVs for each catch parameter are used instead of the variances in Equation (6), because there is no analytical solution for the optimal observer coverage in each stratum. Sukhatme and Sukhatme (1970) suggest the use of the sum of relative variances (RVs). The RV is the square of the CV and is therefore also a scaled measure of variance. Using the RVs in Equation (6) will implicitly give greater weight to estimators with larger CVs, which could well be an important a priori goal. Specifically, the objective function is

$$\sum_{p=1}^{P} w_p\,\mathrm{RV}(\hat{\tau}_p), \tag{7}$$

where $\mathrm{RV}(\hat{\tau}_p) = \mathrm{Var}(\hat{\tau}_p)/\tau_p^2$ is the RV of the estimator ($\hat{\tau}_p$). The ability to assign weights to the catch parameters is also important in the context of multiple estimation objectives, because they can reflect the consensus of parties with different priorities on the relative importance of each estimation objective.
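The sampling fractions for the $s$th group of strata that are optimal with respect to all $P$ parameters are

$$f_s = \frac{T_n\sqrt{\sum_{p=1}^{P} w_p A_{ps}/(\tau_p^2 N_s)}}{\sum_{s'=1}^{S}\sqrt{N_{s'}\sum_{p=1}^{P} w_p A_{ps'}/\tau_p^2}}, \tag{8}$$

where $A_{ps}$ is the value of $A_s$ for the $p$th parameter, and the corresponding optimal sample sizes are

$$n_s = \frac{T_n\sqrt{N_s\sum_{p=1}^{P} w_p A_{ps}/\tau_p^2}}{\sum_{s'=1}^{S}\sqrt{N_{s'}\sum_{p=1}^{P} w_p A_{ps'}/\tau_p^2}}. \tag{9}$$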
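A minimal sketch of the multi-objective allocation follows. It assumes that the weighted RV objective collapses to the single-objective form once each $A_{ps}$ is scaled by $w_p/\tau_p^2$ and summed over parameters; the array layout, the function name, and all numbers are illustrative assumptions, and fractions above 1 are not handled.

```python
import numpy as np

def multi_objective_fractions(A_ps, tau, w, N, T_n):
    """Fractions minimizing sum_p w_p * RV_p for a fixed total sample size T_n."""
    A_ps = np.asarray(A_ps, dtype=float)  # shape (P, S): parameters by groups
    scale = np.asarray(w, dtype=float) / np.asarray(tau, dtype=float) ** 2
    A_pooled = scale @ A_ps               # shape (S,): sum_p w_p A_ps / tau_p^2
    N = np.asarray(N, dtype=float)
    return T_n * np.sqrt(A_pooled / N) / np.sum(np.sqrt(A_pooled * N))

# Hypothetical inputs: two parameters on very different scales, two groups.
A_ps = [[400.0, 900.0],
        [1.0, 4.0]]
tau = [100.0, 2.0]
w = [0.5, 0.5]
N = np.array([100.0, 50.0])
T_n = 60.0

f_multi = multi_objective_fractions(A_ps, tau, w, N, T_n)
f_single = multi_objective_fractions([A_ps[0]], [tau[0]], [1.0], N, T_n)
```

With a single parameter the scaling factor cancels and the result reduces to the single-objective fractions, which is a convenient sanity check on any implementation.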
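In practice, we only have estimates of the sampling variances for each parameter rather than true values, and we have to use estimates of $A_{ps}$ in either Equation (8) or Equation (9). Recall that, for the $p$th parameter total,

$$A_{ps} = \sum_{v=1}^{V_s}\left[N_{sv}S_{\tau_p,sv}^2 + \sum_{t=1}^{N_{sv}}\mathrm{Var}(\hat{\tau}_{psvt})\right]$$

(for the estimation of $A_{ps}$, see Appendix 2).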
Because estimates of sampling variance are always used in equations for optimal sample sizes or sampling rates, the equations are, themselves, estimators with (typically unconsidered) corresponding variances. For the NPGOP design, in particular, there is a further complication that the total number of fishing trips is unknown for vessel-quarters with less than 100% observer coverage (Miller and Skalski, 2006a, b), and we use an estimate of $N_v$ for those vessel-quarters based on the number of observed trips and current coverage requirements, $\hat{N}_v = 10n_v/3$. The estimates of the total number of elements (trips) per stratum are independent of the sampling variance estimates, and their use in Equation (8) or Equation (9) will increase the variance of the estimators for optimal sample sizes or sampling rates, but the expected values of these estimators should be negligibly affected. Furthermore, several changes to the NPGOP have been recommended that would provide the number of trips for each stratum as well as reduce reliance on model-based methods, so that the stratified sampling variance formula [Equation (2)] might be used for inference (Miller and Skalski, 2006a, b).
| Further ways to optimize the NPGOP design |
|---|
Because length criteria are used to allocate vessels into size classes, a useful extension of the analytical result for optimal coverage is to optimize also the length criterion determining vessel size class. In other words, we simultaneously optimize both the length criterion that groups vessels into size classes and the sampling fractions (observer coverage rates) for the resulting size classes. For the NPGOP, we implement the simultaneous optimization by determining the conditional minimum of Equation (7) at each length criterion with Equation (8), and the length criterion with the smallest minimum is the global minimum.
A further extension is to let observer coverage rates be unique for each gear type, so that we simultaneously optimize the length criterion and coverage rates for vessel-quarters in each resulting category of gear type and size class. The global minimum for the objective function [Equation (7)] is found using the same approach described in the previous paragraph. The difference here is that S denotes the six categories of gear type and size class (three gear types, two size classes) instead of just two size-class categories.
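The search over length criteria described above can be sketched as a simple grid search. For each candidate criterion, vessel-quarters are regrouped into two size classes, the conditional optimum is computed in closed form, and the criterion with the smallest conditional minimum is kept. The data, the candidate grid, and the function name are hypothetical; each `A` value is assumed to be already weighted and scaled across parameters, the $B$ terms are omitted because their total does not depend on the grouping, and the 100% cap on coverage is not enforced.

```python
import numpy as np

def conditional_minimum(lengths, A, N, criterion, T_n):
    """Minimum of sum_s A_s / f_s for two size classes split at `criterion`."""
    large = lengths >= criterion
    A_s = np.array([A[~large].sum(), A[large].sum()])   # medium, large
    N_s = np.array([N[~large].sum(), N[large].sum()])
    f = T_n * np.sqrt(A_s / N_s) / np.sum(np.sqrt(A_s * N_s))
    return float(np.sum(A_s / f)), f

# Hypothetical vessel-quarters: vessel length (m), pooled A value, total trips.
lengths = np.array([20.0, 25.0, 33.0, 40.0, 52.0, 60.0])
A = np.array([0.8, 0.5, 0.9, 0.2, 0.1, 0.1])
N = np.array([30.0, 25.0, 20.0, 15.0, 10.0, 10.0])
T_n = 40.0

candidates = [24.0, 30.0, 38.1, 45.0, 55.0]
results = {c: conditional_minimum(lengths, A, N, c, T_n) for c in candidates}
best = min(results, key=lambda c: results[c][0])
best_value, best_fractions = results[best]
```

Extending the sketch to gear-specific coverage would only change the grouping step: each vessel-quarter would be assigned to one of six gear-by-size categories instead of two size classes.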
| Assessing the effect for each parameter of changing sample allocation |
|---|
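The analytical results [Equations (8) and (9)] are useful for determining optimal sampling fractions and sample sizes, respectively, but we may wish to know how different the current RVs for each parameter are from corresponding values using optimal sample allocation. This difference in RVs is directly related to the difference in the components of the objective function for each parameter at current and optimal sample allocation. Letting $f_s^{(c)}$ and $f_s^{(o)}$ denote the current and optimal sampling fractions for the $s$th group of strata, the difference for the $p$th parameter is

$$w_p\left[\mathrm{RV}_{c}(\hat{\tau}_p) - \mathrm{RV}_{o}(\hat{\tau}_p)\right] = \frac{w_p}{\tau_p^2}\sum_{s=1}^{S} A_{ps}\left(\frac{1}{f_s^{(c)}} - \frac{1}{f_s^{(o)}}\right). \tag{10}$$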
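The per-parameter comparison can be sketched as follows, assuming that the $B$ terms cancel when RVs at two allocations are differenced, so that only the $A_{ps}$ values and the two sets of sampling fractions are needed. The function name and all inputs are hypothetical.

```python
import numpy as np

def objective_component_change(A_ps, tau, w, f_current, f_optimal):
    """Per-parameter w_p * (RV_current - RV_optimal); positive values are gains."""
    A_ps = np.asarray(A_ps, dtype=float)  # shape (P, S)
    diff = 1.0 / np.asarray(f_current, dtype=float) - 1.0 / np.asarray(f_optimal, dtype=float)
    scale = np.asarray(w, dtype=float) / np.asarray(tau, dtype=float) ** 2
    return scale * (A_ps @ diff)

# Hypothetical inputs: two parameters, two size classes (medium, large).
A_ps = [[400.0, 900.0],
        [1.0, 4.0]]
tau = [100.0, 2.0]
w = [0.5, 0.5]
f_current = [0.30, 1.00]    # e.g. 30% and 100% coverage
f_candidate = [0.35, 0.90]  # an alternative allocation to compare against

change = objective_component_change(A_ps, tau, w, f_current, f_candidate)
```

A positive entry means the parameter would be estimated more precisely under the candidate allocation, and a negative entry means it would lose precision, so the output can be read parameter by parameter.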
| An example: optimizing NPGOP observer coverage rates |
|---|
When the observer coverage rates are changed from the currently regulated 30% and 100% for medium-size and large vessel-quarters, respectively, there may be very little change in precision within a large vicinity of the optimal sample allocation for some catch parameters, whereas for other catch parameters substantial changes may be observed. A graphical display of the change in RV of catch parameter estimates with changes in observer coverage rates is useful to see how strongly RV and observer coverage rates are related (Figure 1). On the right side of the line denoting current coverage rates in these plots, the curve represents the RV at a fixed number of observed trips that are reallocated to the different size classes to achieve the given coverage rates. As the number of trips sampled and coverage rate increase for the medium size class (bottom axis), the number and the rate sampled in the large size class (top axis) must decrease when the total sample size is fixed. To the left of the current coverage line, the curve still represents the RV, but the number of observed trips is necessarily decreasing. This is the reason why the large size class coverage rate is always 1.00 (100%) to the left of the current coverage line. Although we can reduce sample size from the medium size class, we cannot reallocate this sampling effort to the large size class when all trips in this size class are already sampled. The line denoting optimal coverage is obtained using Equation (4). In fact, at the current length criterion, the optimal coverage for vessel-quarters attributed to medium-size vessels will always be ≥30%, because precision cannot be gained by reducing the number of observed trips in the medium size class and keeping coverage in the large size class constant at 100%.
For walleye pollock (Theragra chalcogramma) and Pacific cod (Gadus macrocephalus), the estimated RVs are similar to that at the optimal observer coverage rates for a wide range of coverage rates (Figure 1). On the other hand, it appears that estimates of chinook salmon (Oncorhynchus tshawytscha) and black-footed albatross (Phoebastria nigripes) bycatch would be much more precise if coverage were shifted to the medium size class of vessel, because there was no observed mortality for these species aboard vessels of the large size class. Hence, observer coverage rates that are optimal for different catch parameters can be contradictory, and the relationship of precision to observer coverage can be quite variable among catch parameters.
To illustrate optimal sample allocation with respect to multiple estimation objectives, we use Equation (8) to obtain optimal coverage rates with several catch parameters that make use of the various types of data NPGOP observers collect (Table 2). In practice, the weights that would be used in Equation (8) would be determined by consensus among various users of the data collected under the sampling design. For our illustration, we use weights such that estimates of the same parameter types are treated equally and those of different parameter types are also treated equally. For example, catches of Pacific cod in the Aleutian Islands, Gulf of Alaska, and Bering Sea are treated equally, as are black-footed albatross bycatches in the same regions, but the more general parameters, Pacific cod catches and black-footed albatross bycatch, are also treated equally. As we treat different parameter groups equally, as well as specific parameters within each group, our weights impose a type of nested equal treatment of parameters. In a management setting with multiple users of observer data, this type of weighting represents equal treatment of different categories of observer data usage and equal treatment among subcategories of usage. Moreover, the weighting we use implies that estimation for economically valuable species is treated equally to that for species with little or no direct economic value.
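One way to construct such nested equal weights is sketched below: every parameter group receives the same total weight, and that weight is split evenly among the parameters within the group. Normalizing the weights to sum to 1 is an assumption made here for convenience, and the group labels are illustrative rather than taken from Table 2.

```python
from collections import Counter

def nested_equal_weights(groups):
    """Weight for each parameter: 1 / (number of groups * size of its group)."""
    sizes = Counter(groups)
    n_groups = len(sizes)
    return [1.0 / (n_groups * sizes[g]) for g in groups]

# Hypothetical parameter list: each entry names the group a parameter belongs to.
groups = [
    "Pacific cod catch",           # Aleutian Islands
    "Pacific cod catch",           # Gulf of Alaska
    "Pacific cod catch",           # Bering Sea
    "black-footed albatross bycatch",  # Gulf of Alaska
    "black-footed albatross bycatch",  # Bering Sea
]
weights = nested_equal_weights(groups)
```

Here each cod parameter gets weight 1/6 and each albatross parameter gets weight 1/4, so both groups carry a total weight of 1/2 regardless of how many regional parameters they contain.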
When we use the nested equal weighting scheme (Table 2) with the objective function [Equation (7)] and restrict observer coverage rates (sampling fractions) to be constant across vessels of the same size class (except that pot vessels are all included in the medium size class), optimal coverage rates change from year to year between 2000 and 2003 (Figure 2). For the medium size class, optimal coverage rates range from the current regulations (30%) for 2001 to nearly 36% for 2003. As in the single parameter examples, optimal coverage of the medium size class must be ≥30%, and the coverage in the medium size class cannot ever reach 100% because there are more trips made in vessel-quarters in this size class than there are total observed trips in either size class. Furthermore, the relationship between observer coverage rates for the large and medium size classes also changes from year to year (quarter to quarter as well) because of varying proportions of total numbers of fishing trips in the two size classes. The annual reduction in the objective function value by shifting to optimal observer coverage rates ranges between 0% and 2.3%.
When we optimize the length criterion that distinguishes which vessels are in the two size classes as well as the observer coverage rates in the resulting groups, we have a surface with respect to length criterion (for longliners and trawlers only) and medium size class observer coverage. Using the same catch parameters and weighting scheme (Table 2), there is more variability in the optimal observer coverage rates from year to year (Figure 3) relative to the results with a fixed length criterion (Figure 2). In 2002, we now have an optimal coverage of <30% for the medium size class because the optimal length criterion (37.80 m) is slightly less than the current length criterion (38.1 m), which reduces the total number of trips in this size class. Note also that the fixed length criterion results (Figure 2) are represented by the horizontal slices through the surface plots (Figure 3) at the current length criterion. Optimizing over both the length criterion and observer coverage rates results in a reduction in the objective function (between 6% and 19%) that is greater than when observer coverage is considered with the current length criterion.
In the surface plots (Figure 3), we only show the objective as a function of medium-sized vessel observer coverage, because the large vessel coverage is directly related to that of the medium size class and this relation will change with length criterion. Let $f_L$ and $f_M$ denote observer coverage rates in the large and medium size classes, respectively, $N_L$ and $N_M$ denote total number of trips made in the large and medium size classes, respectively, and $n$ denote the total number of observed trips. The observer coverage for the large size class is then

$$f_L = \frac{n - f_M N_M}{N_L}.$$
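The relation between the two coverage rates, together with the 100% cap on large-vessel coverage noted for Figure 1, can be sketched as follows. The trip counts are hypothetical, and the function assumes that the total number of observed trips is held at $n$ whenever the cap is not binding.

```python
def large_class_coverage(f_M, N_M, N_L, n):
    """Coverage left for the large class after the medium class, capped at 100%."""
    return min(1.0, (n - f_M * N_M) / N_L)

# Hypothetical fleet: 1000 medium-class trips and 500 large-class trips.
N_M, N_L = 1000.0, 500.0
n = 0.3 * N_M + 1.0 * N_L   # trips observed under 30% and 100% coverage

f_L_current = large_class_coverage(0.3, N_M, N_L, n)
f_L_more_medium = large_class_coverage(0.4, N_M, N_L, n)
f_L_less_medium = large_class_coverage(0.2, N_M, N_L, n)
```

Raising medium-class coverage to 40% leaves 80% coverage for the large class, whereas lowering it to 20% cannot push large-class coverage above 100%, so the total number of observed trips falls instead.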
Yearly optimization results (Figure 3) are useful for assessing the magnitude of variability in optimal observer coverage rates over time, but it is probably preferable to have coverage rates change infrequently so that complexity of regulations is reduced. Furthermore, for short periods, optimal observer coverage rates can change dramatically from one period to another. Hence, a suitable approach would be to find the optimal coverage rates and length criterion over multiple years (Figure 4). With such a multi-year approach, we accept that exact optimality will not be achieved in any given period, but rather that estimates will be optimal "on average". As for most of the yearly surfaces, the multi-year result (Figure 4) suggests that an increase in length criterion and an increase in coverage in the medium size class are warranted given the catch parameters and weighting scheme (Table 2). With respect to 2000–2003 data combined, the reduction in the objective function [Equation (7)] by shifting to optimal coverage and length criteria (7%) is intermediate to the range of year-specific reductions.
In the final analysis, we allow different observer coverage rates for each gear type and the same catch parameters and weights as above (Table 2). The optimal length criterion is fairly stable (41.5–48.5 m) at a value greater than the current length criterion (Table 3). However, there can be many local minima corresponding to different length criteria (Figure 5). In 2000, a span of vessel length criteria (approximately 40–70 m) would provide near optimal RVs of the catch parameters. The yearly optimal observer coverage rates for the resulting size classes of trawlers are consistent relative to other gear types and similar to those under current regulations. In contrast, the yearly coverage rates for the medium size class in the longline sector are higher than the current rate (30%), and those for the pot vessels of either size class are generally lower than the current rate (30%). Other than longline and pot vessels in the medium size class, the optimal length criterion and observer coverage rates based on data from all years (2000–2003) are all within the range of corresponding yearly results (Table 3), and the relationship of the optimized objective function to length criterion exhibits a local minimum similar to the yearly results for 2001 and 2002 (Figure 6). Between 2000 and 2003, the values of the objective function at the optimal length criteria and gear-specific observer coverage rates are 10–28% less than the values at current observer coverage rates and length criteria. As with the multi-year result where the observer coverage rates are the same across gear types (Figure 4), the gear-specific optimization carried out over multiple years yields a reduction in the objective function value that is intermediate to the range of year-specific reductions (12%).
With the multi-year optimization (i.e. the bottom row of Table 3), the catch parameter estimates that stand to gain most by shifting to the optimal length criterion and observer coverage rates are those for laysan albatross (P. immutabilis) bycatch in the Aleutian Islands and Bering Sea (Table 2). Large gains are also expected for Steller sea lion (Eumetopias jubatus) bycatch estimates in the Aleutian Islands and Gulf of Alaska and black-footed albatross bycatch estimates in the Gulf of Alaska. The largest loss in RV is expected for sockeye salmon (O. nerka) bycatch estimates in the Gulf of Alaska, but the loss is small relative to the largest gains. Length-class estimates for Pacific cod and age-class estimates for walleye pollock would be affected relatively little, on average (Table 2).
| Discussion |
|---|
A method to objectively determine appropriate sample allocations to various strata in a stratified sampling design when there are multiple estimation objectives provides a crucial tool for researchers with limited resources at hand. Observer programmes, or other catch sampling designs, provide data critical to sound management of multiple resources and, as such, are obvious settings where the analytical results we provide can be used. The basis for optimal allocation we use is also appealing, because the sampling fractions (observer coverage rates) or sample sizes are determined methodically and with a high degree of transparency for data users as well as any parties that may be impinged upon by the actual sampling. In the North Pacific groundfish fisheries, the groundfish industry is impacted logistically and financially by observer coverage, and a well-documented rationale for rates of coverage that individual vessels must accept could strengthen relations between industry and fishery managers. Moreover, under any of the different scenarios (optimizing coverage with the current length criterion, optimizing coverage and length criterion, or optimizing length criterion and gear-specific coverage), the gains and losses for any catch parameters (whether they are included in the objective function or not) by shifting to optimal coverage can be assessed efficiently by different data users through Equation (10).
The objective function [Equation (7)] and the corresponding analytical solution for optimal sampling fractions in various strata [Equation (8)] also have utility beyond observer deployment. These results apply to any sampling design that is stratified and SRS within the strata (there may also be multi-phase or multi-stage sampling within each element of a given stratum). These types of sampling designs are commonly used in studies of natural resources such as forest surveys or the periodic surveys carried out in marine systems for fishery-independent information. For example, these analytical results could be used to find rates of sampling for trips within strata in stratified landings sampling programmes that account for all the various targets of estimation (e.g. Sen, 1986; Crone, 1995). Furthermore, the objective function could also be used to optimize fishery-independent surveys that also employ stratified designs and provide estimates of (relative) abundance for several different stocks or species. In fact, Equation (7) is closely related to Equations (23) and (24) discussed by Schnute and Haigh (2003).
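As a rough illustration of this kind of allocation, the sketch below assumes a simplified form in which the part of the weighted objective that depends on the sampling fraction in stratum s is A_s/f_s, and total cost is linear in the fractions. The names `pooled_coefficients`, `optimal_fractions`, `objective`, `A`, `b`, and `budget`, and all numbers, are hypothetical. Equations (7) and (8) include components (such as within-element sampling variance) that are not reproduced here, and the sketch does not enforce f_s ≤ 1, so any result should be checked against that bound.

```python
from math import sqrt

def pooled_coefficients(weights, N, S2, theta):
    """Combine objectives into one coefficient per stratum (assumed form).

    A_s = sum_j w_j * N_s * S2[j][s] / theta_j**2, so that sum_s A_s / f_s is
    the fraction-dependent part of a weighted sum of relative variances.
    """
    return [
        sum(w * N[s] * S2[j][s] / theta[j] ** 2 for j, w in enumerate(weights))
        for s in range(len(N))
    ]

def optimal_fractions(A, b, budget):
    """Fractions minimizing sum(A_s / f_s) subject to sum(b_s * f_s) == budget."""
    scale = budget / sum(sqrt(a_s * b_s) for a_s, b_s in zip(A, b))
    return [scale * sqrt(a_s / b_s) for a_s, b_s in zip(A, b)]

def objective(A, f):
    """Fraction-dependent part of the weighted relative variance."""
    return sum(a_s / f_s for a_s, f_s in zip(A, f))

# Hypothetical inputs: two strata, two estimation objectives.
N = [200, 800]                       # primary units (e.g. trips) per stratum
S2 = [[4.0, 0.25], [100.0, 900.0]]   # S2[j][s]: variance for objective j in stratum s
theta = [50.0, 1000.0]               # parameter being estimated for each objective
weights = [0.5, 0.5]                 # importance weights for the objectives
b = [1.0 * n for n in N]             # cost coefficient per stratum (unit cost of 1)

A = pooled_coefficients(weights, N, S2, theta)
f_current = [0.3, 0.3]               # current (equal) sampling fractions
budget = sum(b_s * f_s for b_s, f_s in zip(b, f_current))
f_opt = optimal_fractions(A, b, budget)

change = objective(A, f_opt) / objective(A, f_current) - 1.0
print("optimal fractions:", [round(f, 3) for f in f_opt])
print("relative change in objective:", round(change, 4))
```

Because the budget is set to the cost of the current allocation, the comparison isolates the effect of redistributing effort among strata rather than of sampling more.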
In addition to the utility of the optimal sampling results, Equations (8) and (9), we observed an important relationship between the value of the optimized objective function and the constraints on the optimization. In the NPGOP example, as the constraints on the length criterion and the distribution of observer coverage are relaxed, the optimized objective function falls further below its value under the current allocation of observer coverage. In other words, the reduction was greater when we changed the length criterion and allowed observer coverage rates to differ for each combination of gear type and size class (Table 3) than when we kept observer coverage rates the same within a size class across gear types (Figure 3). Therefore, researchers or resource managers might consider changes to the sampling design that are more substantial than reshuffling sample sizes among existing strata to improve the objective function further.
Because the sampled population changes over time, and earlier data may be less representative of the population in the immediate future, we recommend that optimization be based on recently collected data. However, a high degree of variability in results for consecutive periods would argue for optimizing over as many periods as possible (e.g. multiple years). Optimization over multiple periods will not give optimal allocations for any specific year, but would instead provide an allocation that is optimal over the long term.
Two limitations of sample allocation optimization are important to keep in mind: optimality is always based on data already collected, and the population we wish to sample in future will likely differ from the one that produced those data. With these limitations, we cannot expect that our "optimal allocation" will actually be optimal for future sampling. Rather, we are making an informed guess of the optimal allocation for the future population.
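The relationship between constraints and the attainable minimum can be illustrated with a small sketch. It assumes the same simplified objective, sum of A_s/f_s under a linear budget, and compares three designs: one common rate, one rate per size class, and one rate per stratum. The function `min_objective`, the strata labels, and all coefficients are hypothetical, and the sketch does not vary the length criterion that defines the strata.

```python
from math import sqrt

def min_objective(A, b, budget, groups):
    """Minimum of sum(A_s / f_s) subject to sum(b_s * f_s) == budget when all
    strata carrying the same group label must share one sampling fraction."""
    pooled = {}
    for a_s, b_s, g in zip(A, b, groups):
        a_g, b_g = pooled.get(g, (0.0, 0.0))
        pooled[g] = (a_g + a_s, b_g + b_s)   # strata sharing a rate act as one stratum
    return sum(sqrt(a_g * b_g) for a_g, b_g in pooled.values()) ** 2 / budget

# Hypothetical strata defined by (size class, gear type).
strata = [("small", "trawl"), ("small", "longline"),
          ("large", "trawl"), ("large", "longline")]
A = [0.9, 0.2, 0.3, 0.6]              # assumed variance coefficients
b = [300.0, 500.0, 150.0, 250.0]      # assumed cost coefficients
budget = 360.0

one_rate = min_objective(A, b, budget, ["all"] * len(strata))
by_size = min_objective(A, b, budget, [size for size, _ in strata])
by_stratum = min_objective(A, b, budget, strata)

print("one common rate:      ", round(one_rate, 3))
print("rate per size class:  ", round(by_size, 3))
print("rate per stratum:     ", round(by_stratum, 3))
```

Each relaxation enlarges the feasible set, so the minimum cannot increase; how much it decreases depends on how different the strata within a group are.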
When optimal sample sizes rather than sampling rates are desired for a stratified design, we will also need to know the total number of elements in each stratum for the future population and an appropriate cost function for future sampling. This information is not likely to be available at the beginning of a sampling period for some sampling plans such as fishery observer programmes, because fishing vessels may not know how much they will fish during the sampling period.
| Appendix 1 |
|---|
Derivation of optimal observer coverage rates
The optimal sampling fractions can be found using the Cauchy–Schwarz inequality or Lagrange multipliers. The second component of Equation (2) and the first component of Equation (3) are not functions of $f_s$, so the minimum of Equation (2) with respect to $f_s$ will not be affected by those components. By the Cauchy–Schwarz inequality (Cochran, 1977, pp. 96–98; Casella and Berger, 2002, p. 187),
[Equation (11): not recovered from the source]
Letting [Equation (12): not recovered from the source], we can substitute in Equation (11) to obtain [result not recovered from the source].
The same approach is used to obtain the optimal sampling fractions with multiple estimation objectives. Noting that the objective function [Equation (7)] can be written as [expression not recovered from the source].
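The Cauchy–Schwarz argument can be sketched in generic notation as follows. The symbols $a_s$, $b_s$, and $C$ are assumptions introduced for illustration (a positive variance coefficient, a positive cost coefficient, and a fixed budget); they are not the paper's notation, and the expressions are not a reconstruction of Equations (11) and (12).

```latex
% Generic problem (assumed notation): minimise \sum_s a_s / f_s
% subject to \sum_s b_s f_s = C, with a_s, b_s, C > 0.
\[
\Bigl(\sum_s \frac{a_s}{f_s}\Bigr)\Bigl(\sum_s b_s f_s\Bigr)
\;\ge\; \Bigl(\sum_s \sqrt{a_s b_s}\Bigr)^{2},
\]
% with equality if and only if \sqrt{a_s / f_s} is proportional to
% \sqrt{b_s f_s}, i.e. f_s \propto \sqrt{a_s / b_s}. Imposing the budget gives
\[
f_s^{*} = \frac{C\,\sqrt{a_s / b_s}}{\sum_t \sqrt{a_t b_t}},
\qquad
\min_{f} \sum_s \frac{a_s}{f_s} = \frac{1}{C}\Bigl(\sum_s \sqrt{a_s b_s}\Bigr)^{2}.
\]
```

With several objectives, the same steps would apply after the weighted coefficients for each stratum are summed into a single $a_s$, provided every fraction-dependent term has the form coefficient divided by $f_s$.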
| Appendix 2 |
|---|
Derivation of the unbiasedness of Â

The estimator

,
2 has a random component:
![]() |
![]() |
![]() |
t are independent across elements in the stratum. We also rely on the first and second-order inclusion probabilities in SRS, n
/N
and n
(n
1)/N
(N
1), respectively, which are proven elsewhere (Särndal et al., 1992, pp. 3132). Now,
![]() |
(i.e. E(Â
)=N
E(
,
2)=A
). | Acknowledgements |
|---|
We thank the editor, Verena Trenkel, and two anonymous reviewers for their thoughtful comments, which greatly improved this paper. We also thank the North Pacific Groundfish Observer Programme and the scientific observers working in the North Pacific Groundfish fisheries for collecting and maintaining such valuable data. The study was partially funded by the Joint Institute for the Study of the Atmosphere and Ocean under NOAA Cooperative Agreement No. NA17RJ1232 and both the School of Aquatic and Fishery Sciences and Quantitative Ecology and Resource Management Program at the University of Washington.
| References |
|---|
AFSC (Alaska Fisheries Science Center). (2005) North Pacific Groundfish Observer Manual. (North Pacific Groundfish Observer Program, Seattle, Washington) 388 pp. AFSC, 7600 Sand Point Way NE 98115.
Ambrose E. E., Solarin B. B., Isebor C. E., Williams A. B. (2005) Assessment of fish by-catch species from coastal artisanal shrimp beam trawl fisheries in Nigeria. Fisheries Research 71:125–132.
Casella G. and Berger R. L. (2002) Statistical Inference. (Duxbury, Pacific Grove, California) 660 pp.
Cochran W. G. (1977) Sampling Techniques. 3rd edn. (John Wiley & Sons, New York) 428 pp.
Crone P. R. (1995) Sampling design and statistical considerations for the commercial ground-fish fishery of Oregon. Canadian Journal of Fisheries and Aquatic Sciences 52:716–732.
D'Agrosa C., Lennert-Cody C. E., Vidal O. (2000) Vaquita bycatch in Mexico's artisanal gillnet fisheries: driving a small population to extinction. Conservation Biology 14:1110–1119.
Jessen R. J. (1978) Statistical Survey Techniques. (John Wiley & Sons, New York) 520 pp.
Jinn J. H., Sedransk J., Smith P. (1987) Optimal two-phase stratified sampling for estimation of the age composition of a fish population. Biometrics 43:343–353.
Ketchen K. S. (1949) Stratified subsampling for determining age distributions. Transactions of the American Fisheries Society 79:205–212.
Kimura D. K. (1977) Statistical assessment of the age–length key. Journal of the Fisheries Research Board of Canada 34:317–324.
Klaer N. and Polacheck T. (1998) The influence of environmental factors and mitigation measures on by-catch rates of seabirds by Japanese longline fishing vessels in the Australian region. Emu 98:305–316.
Kutkuhn J. H. (1963) Estimating absolute age composition of California salmon landings. Fish Bulletin, California Department of Fish and Game 120:47.
Lai H-L. (1987) Optimum allocation for estimating age composition using the age-length key. Fishery Bulletin US 85:179–185.
Lai H-L. (1993) Optimal sample design for using the age-length key to estimate age composition of a fish population. Fishery Bulletin US 91:382–388.
Lewison R. L., Freeman S. A., Crowder L. B. (2004) Quantifying the effects of fisheries on threatened species: the impact of pelagic longlines on loggerhead and leatherback sea turtles. Ecology Letters 7:221–231.
Manly B. F. J., Akroyd J-A. M., Walshe K. A. R. (2002) Two-phase stratified random surveys on multiple populations at multiple locations. New Zealand Journal of Marine and Freshwater Research 36:581–591.
Miller T. J. and Skalski J. R. (2006a) Estimation of seabird bycatch for North Pacific longline vessels using design- and model-based methods. Canadian Journal of Fisheries and Aquatic Sciences 63:1878–1889.
Miller T. J. and Skalski J. R. (2006b) Integrating design- and model-based inference to estimate length and age composition in North Pacific longline catches. Canadian Journal of Fisheries and Aquatic Sciences 63:1092–1114.
MRAG (Marine Resources Assessment Group Americas, Inc.). (2000) Independent review of the North Pacific Groundfish Observer Program. National Marine Fisheries Service, Report to Alaska Fisheries Science Center. 134 pp.
Rochet M-J., Péronnet I., Trenkel V. M. (2002) An analysis of discards from the French trawler fleet in the Celtic Sea. ICES Journal of Marine Science 59:538–552.
Romanov E. V. (2002) Bycatch in the tuna purse-seine fisheries of the western Indian Ocean. Fishery Bulletin US 100:90–105.
Särndal C-E., Swensson B., Wretman J. H. (1992) Model Assisted Survey Sampling. (Springer, New York) 694 pp.
Schnute J. and Haigh R. (2003) A simulation model for designing groundfish trawl surveys. Canadian Journal of Fisheries and Aquatic Sciences 60:640–656.
Sen A. R. (1986) Methodological problems in sampling commercial rockfish landings. Fishery Bulletin US 84:409–421.
Smith P. J. (1989) Is two-phase sampling really better for estimating age composition? Journal of the American Statistical Association 84:916–921.
Smith P. J. and Sedransk J. (1982) Bayesian optimization of the estimation of the age composition of a fish population. Journal of the American Statistical Association 77:707–713.
Southward G. M. (1976) Sampling landings of halibut for age composition. Scientific Report, International Pacific Halibut Commission 58:31.
Spencer P. D., Wilderbuer T. K., Zhang C. I. (2002) A mixed-species yield model for eastern Bering Sea shelf flatfish fisheries. Canadian Journal of Fisheries and Aquatic Sciences 59:291–302.
Stratoudakis Y., Fryer R. J., Cook R. M., Pierce G. J. (1999) Fish discarded from Scottish demersal vessels: estimators of total discards and annual estimates for targeted gadoids. ICES Journal of Marine Science 56:592–602.
Sukhatme P. V. and Sukhatme B. V. (1970) Sampling Theory of Surveys with Applications. (Iowa State University Press, Ames, Iowa) 452 pp.
Tanaka S. (1953) Precision of age-determination of fish estimated by double sampling method using the length for stratification. Bulletin of the Japanese Society of Scientific Fisheries 19:657–670.
Trippel E. A., Wang J. Y., Strong M. B., Carter L. S., Conway J. D. (1996) Incidental mortality of harbour porpoise (Phocoena phocoena) by the gill-net fishery in the lower Bay of Fundy. Canadian Journal of Fisheries and Aquatic Sciences 53:1294–1300.
Vinther M. and Larsen F. (2004) Updated estimates of harbour porpoise (Phocoena phocoena). Journal of Cetacean Research and Management 6:19–24.
Worm B., Lotze H. K., Myers R. A. (2003) Predator diversity hotspots in the blue ocean. Proceedings of the National Academy of Sciences of the USA 100:9884–9888.
Yates F. (1981) Sampling Methods for Censuses and Surveys. 4th edn. (MacMillan, New York) 458 pp.