
ICES Journal of Marine Science: Journal du Conseil 2003 60(2):297-303; doi:10.1016/S1054-3139(03)00008-0
© 2003 by ICES/CIEM International Council for the Exploration of the Sea/Conseil International pour l'Exploration de la Mer

On estimating the age composition of the commercial catch of Northeast Arctic cod from a sample of clusters

Sondre Aanes* and Michael Pennington

Institute of Marine Research PO Box 1870 Nordnes, N-5817 Bergen, Norway

*Correspondence to S. Aanes; tel: +47 55 238627; fax: +47 55 238687. e-mail: sondre.aanes@imr.no; michael.pennington@imr.no.

Assessment of Northeast Arctic cod is based on estimates of the commercial catch in numbers at age. The age structure of the catch is estimated by sampling fish from commercial fishing trips. Although it is commonly assumed that a sample of individuals is a random sample from the population, fish sampled from the same trip (i.e. from a "cluster" of fish) tend to be more similar in age than those in the total catch. For Northeast Arctic cod, the intracluster correlation for age is positive, and therefore the effective sample size is much smaller than the number of fish aged. Given the number of fish aged, the precision of the estimated age distribution is rather low, and the number of fish aged from each trip could be reduced from approximately 85 to 20 without a significant loss in precision.

Keywords: age distribution, cluster sampling, effective sample size, intracluster correlation

Received 9 October 2002; accepted 18 December 2002.


    Introduction
The fishery for Northeast Arctic cod (Gadus morhua) is the largest commercial cod fishery in the world. Total catches peaked in the 1950s, with an average annual landing of 800 000 t (Nakken, 1994). Catches have declined since the 1970s as a consequence of decreasing stock size and of the quota regulations introduced in that decade. Recent landings have ranged from 730 000 t in 1996 to 400 000 t in 2000. Russia and Norway each land approximately half the total catch.

A critical input to fish stock assessments is an estimate of catch at age in numbers (Gudmundsson, 1994; Skagen and Hauge, 2002). Consequently, a substantial effort is made to collect samples from commercial catches to determine their age distribution. However, the precision of the resulting estimates, the amount of information in the samples, and the sampling scheme for Northeast Arctic cod have not been examined, and the uncertainty in the assessments has therefore not been quantified.

When determining the precision of the estimated age distribution of a fish population on the basis of a sample of age readings, it is usually assumed that the age readings are a random sample from the population (see, for example, Hoenig and Heisey, 1987; Richards et al., 1992; Worthington et al., 1995). Because it is generally impossible to sample a fish population randomly, samples of fish for age determination are taken from a number of clusters, e.g. from individual trawls or fishing trips. The ages of fish caught together tend to be more similar than those in the entire population (i.e. intracluster correlation is positive), and, therefore, the resulting sample will often contain much less information on the age distribution than an equal number of fish sampled at random. If it is assumed that the sample of fish collected from clusters is a random sample of individuals, then the estimated age distribution will appear to be more precise than it actually is (Pennington and Vølstad, 1994; Pennington et al., 2002).

In this article, we examine the precision of estimates of the age distribution of the Norwegian commercial catch of Northeast Arctic cod obtained by two different estimators. Fish were aged from each trip in a sample of fishing trips assumed to be random; the sample therefore consists of a number of clusters, each subsampled from a larger cluster (i.e. the fish caught during a trip). In addition, because age determination of fish is time-consuming, we examined whether the number of fish aged from each trip could be reduced without significantly reducing the precision of the estimates.


    Commercial age data
A census of the commercial catch, conducted by the Norwegian Directorate of Fisheries, provides the weight of Northeast Arctic cod caught in the various statistical regions of the Barents Sea and adjacent areas, by season and gear type. This census does not collect information on the age or length composition of the catch. To estimate catch characteristics, such as the catch at age in numbers or weight, the Institute of Marine Research (IMR) collects fish from selected fishing trips. For practical reasons, not all combinations of region, season and gear are sampled, and the sampling design does not take region into account.

When a sample from a fishing trip is landed, the weight and the length of each fish are recorded along with the size of the catch taken during the trip. To determine the age of a fish, otoliths are removed and stored, and later placed under a microscope for the growth zones to be counted. Age determination is both time-consuming and difficult; currently, the average handling time for sampling a cod and determining its age is approximately 7–10 min. In 2000, 126 catches were sampled by IMR, and some 85 fish were subsampled from each catch for age determination.

In this work, we assume that, in each of the four quarters of the year, the sampled catches (fishing trips) are a random sample of all catches of cod. We also assume that the fish were chosen randomly from each catch and that ages were determined without error.


    Estimating the age composition
Given a random sample of $n$ clusters (landings) and a random subsample of $m_i$ fish from the $M_i$ fish in cluster $i$, the estimator based on the sampling design,

$$\hat{\mu}_1 = \frac{\sum_{i=1}^{n} M_i x_i}{\sum_{i=1}^{n} M_i}, \qquad (1)$$

is an approximately unbiased and consistent estimator of the mean age of the population if $x_i$ is the average age of the sample of $m_i$ fish from cluster $i$, or of the proportion at age in the population if $x_i$ is the estimated proportion of fish of a specific age in cluster $i$ (Skinner et al., 1989). This is a weighted average of the $x_i$, with the cluster sizes as weights. Because both numerator and denominator are random variables, this is a ratio-type estimator (Cochran, 1977), and no exact variance formula exists. The variance may be approximated using a Taylor expansion of Equation (1) or by resampling techniques, such as non-parametric bootstrapping (Efron, 1983).
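A minimal sketch of Equation (1), assuming NumPy and a per-trip data layout chosen for illustration; the function names `weighted_estimate` and `proportion_at_age` are hypothetical and not taken from the original study. The same function serves for mean age and for proportion at age; only the per-trip statistic $x_i$ changes.

```python
import numpy as np


def weighted_estimate(M, x):
    """Design-based ratio estimator of Eq. (1): sum(M_i * x_i) / sum(M_i).

    M: catch sizes M_i of the sampled trips (clusters).
    x: per-trip statistic x_i, e.g. the mean age of the fish aged from
       trip i, or the proportion of those fish at a given age.
    """
    M = np.asarray(M, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(M * x) / np.sum(M))


def proportion_at_age(ages, age):
    """x_i for proportion at age: fraction of the aged fish in one trip with this age."""
    return float(np.mean(np.asarray(ages) == age))


# Hypothetical example: two trips with catch sizes 1000 and 3000 fish.
ages_by_trip = [np.array([3, 4, 4, 5]), np.array([5, 6, 6, 7])]
M = [1000, 3000]
mean_age = weighted_estimate(M, [a.mean() for a in ages_by_trip])
p_age5 = weighted_estimate(M, [proportion_at_age(a, 5) for a in ages_by_trip])
```

Here the trip means are 4.0 and 6.0, so the catch-weighted mean age is (1000 × 4 + 3000 × 6)/4000 = 5.5, pulled towards the larger catch.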

An alternative to the design-based estimator is the unweighted average of the $x_i$:

$$\hat{\mu}_2 = \frac{1}{n}\sum_{i=1}^{n} x_i. \qquad (2)$$

An estimate of the variance of $\hat{\mu}_2$ is given by $\widehat{V}(\hat{\mu}_2) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(x_i - \hat{\mu}_2)^2$. Bootstrapping techniques can also be used to assess the precision of $\hat{\mu}_2$. In general, the unweighted estimator $\hat{\mu}_2$ may be biased, and this bias need not decrease with increasing sample size; but if $x_i$ and $M_i$ are uncorrelated, then $\hat{\mu}_2$ may be an acceptable estimator (Cochran, 1977).
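A short sketch of Equation (2) together with the usual variance estimate for a mean of $n$ cluster statistics, assuming NumPy; `unweighted_estimate` is a hypothetical name used only for illustration.

```python
import numpy as np


def unweighted_estimate(x):
    """Eq. (2) and its variance estimate for n per-trip statistics x_i.

    Returns (mu2, var_mu2), where var_mu2 = sum((x_i - mu2)^2) / (n (n - 1)).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two trips to estimate the variance")
    mu2 = x.mean()
    var_mu2 = np.sum((x - mu2) ** 2) / (n * (n - 1))
    return float(mu2), float(var_mu2)


# Hypothetical per-trip mean ages.
mu2, var_mu2 = unweighted_estimate([4.0, 6.0, 5.0, 5.0])
```

Before relying on $\hat{\mu}_2$, the correlation between the $x_i$ and $M_i$ can be inspected, for instance with `np.corrcoef(x, M)[0, 1]`; a clearly non-zero value is a warning that the unweighted average may be biased.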

To illustrate the factors that affect the relative precision of $\hat{\mu}_1$ and $\hat{\mu}_2$, consider the standard random effects model

$$x_{ij} = \mu + A_i + \varepsilon_{ij}, \qquad (3)$$

where $x_{ij}$ is the measured quantity for individual $j$ in cluster $i$, $\mu$ is the overall mean, $A_i$ is the random effect for cluster $i$, and $\varepsilon_{ij}$ is the residual error; $A_i$ and $\varepsilon_{ij}$ are independent of cluster size. To simplify the algebra, consider the case where sampling is proportional to cluster size, i.e. $m_i \propto M_i$, or the entire cluster is sampled, i.e. $m_i = M_i$. The samples are then self-weighting, and $\hat{\mu}_1$ can be rewritten as

$$\hat{\mu}_1 = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{ij}}{\sum_{i=1}^{n} m_i} = \frac{\sum_{i=1}^{n} m_i x_i}{\sum_{i=1}^{n} m_i}. \qquad (4)$$
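A quick numerical check of the self-weighting identity in Equation (4), assuming NumPy and made-up data in which every trip is subsampled at the same hypothetical rate of 1 fish in 50 (so $m_i \propto M_i$). Weighting by $M_i$, weighting by $m_i$, and pooling all aged fish should then give the same estimate.

```python
import numpy as np

# Hypothetical data: catch sizes M_i and subsample sizes m_i = M_i / 50,
# i.e. sampling proportional to cluster size (m_i proportional to M_i).
rng = np.random.default_rng(42)
M = np.array([500, 1500, 2500, 4000])
m = M // 50                                   # 10, 30, 50, 80 fish aged per trip
samples = [rng.normal(5.0, 1.5, size=k) for k in m]
x = np.array([s.mean() for s in samples])     # per-trip means x_i

mu1_design = np.sum(M * x) / np.sum(M)                # Eq. (1), weights M_i
mu1_self = np.sum(m * x) / np.sum(m)                  # Eq. (4), weights m_i
mu1_pooled = np.concatenate(samples).sum() / m.sum()  # plain mean of all aged fish
```

The three quantities agree up to floating-point rounding, because $m_i x_i$ is simply the sum of the ages read from trip $i$.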

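Both $\hat{\mu}_1$ and $\hat{\mu}_2$ are model-unbiased, and it may be shown (Pennington and Vølstad, 1994), given model (3), that

$$V(\hat{\mu}_1 \mid \mathbf{m}) = \frac{\sigma_x^2}{n\bar{m}}\left[1 + \rho\left(\bar{m}\left(1 + \frac{s_m^2}{\bar{m}^2}\right) - 1\right)\right], \qquad (5)$$

where $\mathbf{m}$ is the vector of the $n$ sample sizes; $\sigma_x^2 = \sigma_A^2 + \sigma_\varepsilon^2$, where $\sigma_A^2$ and $\sigma_\varepsilon^2$ denote the variances of $A_i$ and $\varepsilon_{ij}$, respectively; $\bar{m}$ and $s_m^2$ are the mean and variance (with divisor $n$) of the $m_i$, respectively; and $\rho = \sigma_A^2/(\sigma_A^2 + \sigma_\varepsilon^2)$ is the (model-based) intracluster correlation coefficient (Searle et al., 1992). The conditional variance of $\hat{\mu}_2$ is given by

$$V(\hat{\mu}_2 \mid \mathbf{m}) = \frac{\sigma_\varepsilon^2}{n^2}\sum_{i=1}^{n}\frac{1}{m_i} + \frac{\sigma_A^2}{n}. \qquad (6)$$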

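To compare the precision of the weighted average $\hat{\mu}_1$ and the unweighted average $\hat{\mu}_2$, Equation (5) can be expressed as

$$V(\hat{\mu}_1 \mid \mathbf{m}) = \frac{\sigma_\varepsilon^2}{n\bar{m}} + \frac{\sigma_A^2}{n}\left(1 + \frac{s_m^2}{\bar{m}^2}\right). \qquad (7)$$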
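The comparison between Equations (6) and (7) can be explored numerically. The sketch below, assuming NumPy, codes both conditional variances (and Equation (5) as a cross-check on Equation (7)); the function names and the sample-size vector are hypothetical, chosen to mimic "many small samples and a few large ones".

```python
import numpy as np


def var_weighted(m, sigma_A2, sigma_e2):
    """Eq. (7): conditional variance of the self-weighting estimator mu_1."""
    m = np.asarray(m, dtype=float)
    n, mbar = m.size, m.mean()
    s2 = m.var()                      # divisor n, as in Eq. (5)
    return sigma_e2 / (n * mbar) + sigma_A2 / n * (1 + s2 / mbar**2)


def var_weighted_eq5(m, sigma_A2, sigma_e2):
    """Eq. (5): the same quantity written in terms of sigma_x^2 and rho."""
    m = np.asarray(m, dtype=float)
    n, mbar, s2 = m.size, m.mean(), m.var()
    sx2 = sigma_A2 + sigma_e2
    rho = sigma_A2 / sx2
    return sx2 / (n * mbar) * (1 + rho * (mbar * (1 + s2 / mbar**2) - 1))


def var_unweighted(m, sigma_A2, sigma_e2):
    """Eq. (6): conditional variance of the unweighted estimator mu_2."""
    m = np.asarray(m, dtype=float)
    n = m.size
    return sigma_e2 / n**2 * np.sum(1.0 / m) + sigma_A2 / n


m = [2] * 18 + [200, 300]   # hypothetical: many small samples, a few large
v1_rho0 = var_weighted(m, sigma_A2=0.0, sigma_e2=1.0)
v2_rho0 = var_unweighted(m, sigma_A2=0.0, sigma_e2=1.0)
v1 = var_weighted(m, sigma_A2=0.5, sigma_e2=1.0)
v2 = var_unweighted(m, sigma_A2=0.5, sigma_e2=1.0)
```

In this made-up configuration $\hat{\mu}_1$ is more precise when $\sigma_A^2 = 0$ (about 0.0019 versus 0.0225), but the ordering reverses when $\sigma_A^2 = 0.5$ (about 0.228 versus 0.048), because the factor $1 + s_m^2/\bar{m}^2$ in Equation (7) inflates the between-trip term when the $m_i$ are very unequal.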

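If the intracluster correlation coefficient, $\rho$, is 0, then $\sigma_A^2 = 0$, and it follows from the Cauchy–Schwarz inequality that $\frac{1}{n\bar{m}} \le \frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{m_i}$, i.e. $V(\hat{\mu}_1 \mid \mathbf{m}) \le V(\hat{\mu}_2 \mid \mathbf{m})$, for all $\mathbf{m}$, with equality only if the sample sizes $m_i$ are all equal. Therefore, if $\rho = 0$, then for any $\mathbf{m}$ the variance of $\hat{\mu}_1$ will be less than or equal to the variance of the unweighted estimator $\hat{\mu}_2$.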
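One way to make the Cauchy–Schwarz step explicit is to apply the inequality to the vectors with entries $\sqrt{m_i}$ and $1/\sqrt{m_i}$:

$$n^2 = \left(\sum_{i=1}^{n}\sqrt{m_i}\cdot\frac{1}{\sqrt{m_i}}\right)^2 \le \left(\sum_{i=1}^{n} m_i\right)\left(\sum_{i=1}^{n}\frac{1}{m_i}\right) = n\bar{m}\sum_{i=1}^{n}\frac{1}{m_i}.$$

Dividing both sides by $n^3\bar{m}$ gives

$$\frac{1}{n\bar{m}} \le \frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{m_i},$$

and multiplying by $\sigma_\varepsilon^2$ shows that, when $\sigma_A^2 = 0$, the right-hand side of Equation (7) is no larger than that of Equation (6). Equality in Cauchy–Schwarz holds only when the two vectors are proportional, i.e. $\sqrt{m_i} \propto 1/\sqrt{m_i}$, which forces all $m_i$ to be equal.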

If $\rho > 0$, then which of $V(\hat{\mu}_1 \mid \mathbf{m})$ and $V(\hat{\mu}_2 \mid \mathbf{m})$ is smaller will depend on the sizes of the various components in Equations (6) and (7). The second term in Equation (7) is never smaller than the second term in Equation (6), because $1 + s_m^2/\bar{m}^2 \ge 1$, whereas the first term in Equation (6) can be considerably larger than the first term in Equation (7) when the sample sizes include many small values and a few very large ones, which is often the case for scientific trawl surveys of fish stocks (Pennington et al., 2002). Further, it should be stressed that $\hat{\mu}_2$ may be biased; hence its variance can be smaller than that of $\hat{\mu}_1$ while its mean-squared error is larger. For more details on the possible drawbacks of model-based estimators, see Smith (1990).

To assess the information contained in the data, it is useful to consider the effective sample size, $m_{\text{eff}}$ (Kish, 1965; Skinner et al., 1989). The effective sample size is defined as the number of fish that would need to be sampled at random for simple random sampling to yield estimates with the same precision as those obtained under the more complex sampling scheme. To estimate the effective sample size, we first estimated the mean age of the catch, $\hat{\mu}_a$, its variance, $\widehat{V}(\hat{\mu}_a)$, and the variance of the age distribution of the catch, $\hat{\sigma}_a^2$. The estimated effective sample size is then defined by

$$m_{\text{eff}} = \frac{\hat{\sigma}_a^2}{\widehat{V}(\hat{\mu}_a)}. \qquad (8)$$
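A minimal sketch of Equation (8), using only the Python standard library; the helper names and the input values are hypothetical. The second helper shows why a naive confidence interval is too narrow: plugging the number of fish aged into the simple-random-sampling formula understates the half-width, whereas plugging in $m_{\text{eff}}$ recovers $z\sqrt{\widehat{V}(\hat{\mu}_a)}$.

```python
import math


def effective_sample_size(sigma2_age, var_mean_age):
    """Eq. (8): variance of the age distribution / variance of the mean-age estimate."""
    return sigma2_age / var_mean_age


def ci_half_width(sigma2_age, sample_size, z=1.96):
    """Approximate 95% CI half-width for a mean from a simple random sample of this size."""
    return z * math.sqrt(sigma2_age / sample_size)


# Hypothetical inputs: age variance 4.0 (years^2), bootstrap Var(mean age) 0.02,
# and 2000 fish actually aged.
m_eff = effective_sample_size(4.0, 0.02)   # about 200 "random" fish
naive = ci_half_width(4.0, 2000)           # pretends the 2000 fish are a random sample
honest = ci_half_width(4.0, m_eff)         # equals 1.96 * sqrt(0.02)
```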

If the effective sample size is small, then this implies that the estimate of the entire age distribution is rather imprecise (Pennington and Vølstad, 1994; Pennington et al., 2002).

As the usual formula for estimating the variance of a ratio estimator, which is based on a Taylor approximation, tends to overestimate the precision, especially for small sample sizes (Cochran, 1977; Pennington and Vølstad, 1994), non-parametric bootstrapping was used to estimate the variance (Efron, 1983). Each replicate was generated by first sampling the trips at random with replacement, then a number of individual fish ages from each selected trip were sampled with replacement. The effect of reducing the number of fish aged on the precision of the estimated age distribution was assessed by reducing the number aged from each trip in the resampling procedure. The intracluster correlation coefficients were estimated as in Searle et al. (1992).
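The two-stage resampling described above can be sketched as follows, assuming NumPy. The function name, the `(M_i, ages_i)` data layout, and the `m_sub` parameter (which mimics aging fewer fish per trip, and still draws `m_sub` ages with replacement even if a trip has fewer readings) are our assumptions, not the authors' code. The default of 500 replicates matches the number reported for Table 1.

```python
import numpy as np


def bootstrap_se(trips, n_boot=500, m_sub=None, seed=0):
    """Two-stage non-parametric bootstrap standard errors of the weighted (Eq. 1)
    and unweighted (Eq. 2) estimators.

    trips : list of (M_i, ages_i) pairs: catch size of trip i and a 1-D array
            of the ages read from that trip. Passing indicator arrays
            (ages_i == a) instead gives proportion at age a.
    m_sub : number of ages resampled per trip; None keeps each trip's own m_i.
    """
    rng = np.random.default_rng(seed)
    n = len(trips)
    est1 = np.empty(n_boot)
    est2 = np.empty(n_boot)
    for b in range(n_boot):
        # Stage 1: resample trips (clusters) with replacement.
        idx = rng.integers(0, n, size=n)
        M = np.empty(n)
        x = np.empty(n)
        for k, i in enumerate(idx):
            M_i, ages = trips[i]
            size = len(ages) if m_sub is None else m_sub
            # Stage 2: resample fish ages within the selected trip, with replacement.
            x[k] = rng.choice(ages, size=size, replace=True).mean()
            M[k] = M_i
        est1[b] = np.sum(M * x) / np.sum(M)   # weighted, Eq. (1)
        est2[b] = x.mean()                    # unweighted, Eq. (2)
    return float(est1.std(ddof=1)), float(est2.std(ddof=1))
```

Dividing each standard error by the corresponding point estimate gives the ECV used in Figure 1, and rerunning with decreasing `m_sub` traces how precision changes as fewer fish are aged per trip.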


    Results
The estimate of the intracluster correlation coefficient was significantly greater than 0 in all quarters (Table 1). Because $x_i$ was not significantly correlated with $M_i$, for either average age or proportion at age, both the weighted estimator ($\hat{\mu}_1$, Equation 1) and the unweighted estimator ($\hat{\mu}_2$, Equation 2) were applied. Based on bootstrapped estimates of the variance, $\widehat{V}(\hat{\mu}_1)$ appeared to be larger than $\widehat{V}(\hat{\mu}_2)$ for all quarters except quarter 4 (Table 1).
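The intracluster correlation coefficients were estimated by a variance-components method (Searle et al., 1992). One common choice from that literature, sketched below under the assumption that it is representative (we cannot confirm it is the exact variant used here), is the one-way ANOVA (method-of-moments) estimator for unbalanced groups. It assumes NumPy, and `anova_icc` is a hypothetical name.

```python
import numpy as np


def anova_icc(groups):
    """ANOVA estimator of the intracluster correlation for unbalanced
    one-way random-effects data (one array of ages per trip)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = len(groups)
    m = np.array([g.size for g in groups], dtype=float)
    N = m.sum()
    grand = np.concatenate(groups).mean()
    ssa = sum(g.size * (g.mean() - grand) ** 2 for g in groups)   # between trips
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)        # within trips
    msa = ssa / (n - 1)
    mse = sse / (N - n)
    n0 = (N - np.sum(m**2) / N) / (n - 1)   # effective group size for unequal m_i
    sigma_A2 = (msa - mse) / n0             # can be negative in small samples
    return float(sigma_A2 / (sigma_A2 + mse))
```

With unequal group sizes, $n_0$ replaces the common group size of the balanced case; a negative $\hat{\sigma}_A^2$ (and hence a negative $\hat{\rho}$) is possible and is usually read as "no detectable clustering".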


Table 1 Summary statistics for estimating the mean age of the Northeast Arctic cod catch in 2000 using estimators $\hat{\mu}_1$ and $\hat{\mu}_2$ (see text). The parameter $n$ is the number of fishing trips sampled, $m$ the total number of fish aged from the $n$ trips, $\hat{\mu}_i$ and se($\hat{\mu}_i$) the estimated mean age and its standard error for $i = 1, 2$, respectively, $m_{i,\text{eff}}$ the effective sample size for $i = 1, 2$, and $\hat{\rho}$ the estimated intracluster correlation coefficient. Approximate 95% confidence intervals are in parentheses. The estimated standard errors and confidence intervals are based on 500 bootstrap replicates.

 
The effective sample size, which is based on estimates of the average age and its variance (Equation 8), was much smaller than the number of fish aged in each quarter (Table 1). For example, if it were possible to sample fish at random from the total catch in quarter 2, it would have been sufficient to determine the ages of slightly more than 200 fish (211 and 213 for $\hat{\mu}_1$ and $\hat{\mu}_2$, respectively), instead of the 2277 fish that were aged, to obtain the same precision for the estimate of mean age. To further illustrate the effect of a low effective sample size, the 95% confidence interval for the mean age in quarter 2 based on $\hat{\mu}_1$ is approximately 5.33 ± 0.04 if the fish are assumed to be a random sample from the population, whereas the correct confidence interval is approximately 5.33 ± 0.13.
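As a rough consistency check on these quarter-2 figures (assuming the standard error scales with one over the square root of the sample size), the ratio of the correct to the naive half-width should be close to $\sqrt{2277/211}$:

$$\sqrt{\frac{2277}{211}} \approx \sqrt{10.79} \approx 3.28, \qquad 0.04 \times 3.28 \approx 0.13,$$

which agrees with the corrected half-width, allowing for the rounding of 0.04.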

The effect of reducing the number of fish aged from each catch during quarter 1 on the error coefficient of variation (ECV, the estimated standard error divided by the estimated mean) for cod aged 3 to 11+ is shown in Figure 1. The weighted average appeared to be less precise than the unweighted average in quarter 1. Regardless of the estimator applied, the curves for ages 4 through 10 were rather flat between 20 and 85 fish, and increased fairly rapidly when fewer than 20 fish were sampled. The ECV appeared to be highest for age-3 fish, but age-3 cod are not fully recruited to the fishery. For ages 11+, the ECV was large even when 85 fish were sampled per catch (0.30 for $\hat{\mu}_1$ and 0.19 for $\hat{\mu}_2$). Therefore, little appears to be gained by sampling more than approximately 20 fish per catch. Results were similar for the other three quarters, except that the weighted average appeared to be more precise in quarter 4 (Table 1).


Figure 1 Error coefficient of variation (ECV) for the estimated proportion at age plotted against the number of cod aged per fishing trip during quarter 1 of 2000. The dotted line represents estimator $\hat{\mu}_1$ and the solid line $\hat{\mu}_2$.

 
Figure 2 shows the estimated age distributions generated by the two estimators for each quarter and the stratified estimates for the entire year. The inner brackets denote the bootstrapped 95% confidence interval based on all fish aged, and the outer brackets the 95% intervals when the number of fish sampled within a catch was reduced from approximately 85 to 20. Again, it is apparent that reducing the number of cod aged from each catch would only slightly affect precision. An analysis of catch data from several other years, which included evaluating various post-stratification schemes, e.g. by gear, gave similar results: relatively small effective sample sizes and little loss of precision when the number of fish aged from each catch was reduced to 20.


Figure 2 Estimated age distribution of the Norwegian commercial catch of cod in 2000 for each quarter and the whole year. The inner brackets denote the approximate 95% confidence interval based on all age readings, and the outer brackets the approximate 95% confidence intervals when the number of fish aged per catch was reduced from approximately 85 to 20. The open circles denote $\hat{\mu}_1$ and the filled circles $\hat{\mu}_2$.

 
The effect of varying the number of trips sampled and the number of fish sampled per trip when the intracluster correlation is positive is shown in Figure 3. The precision of the estimate was much more sensitive to the number of trips sampled than to the number of fish sampled per trip. In particular, if only the number of fish sampled per trip were increased, the variance would tend towards an asymptote equal to $(\sigma_x^2/n)\rho$.
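The trade-off in Figure 3 follows from Equation (5) with equal sample sizes ($s_m^2 = 0$), where the variance reduces to $\frac{\sigma_x^2}{nm}\left[1 + \rho(m - 1)\right]$. The sketch below uses that expression with made-up values of $\sigma_x^2$ and $\rho$ (not the quarter-1 estimates, which are not reproduced here); only $n = 70$ and $m = 85$ come from the Figure 3 caption.

```python
def var_equal(n, m, sigma_x2, rho):
    """Eq. (5) with equal per-trip sample sizes (s_m^2 = 0)."""
    return sigma_x2 / (n * m) * (1 + rho * (m - 1))


sigma_x2, rho = 4.0, 0.1   # hypothetical values, not the quarter-1 estimates
current = var_equal(70, 85, sigma_x2, rho)      # 70 trips x 85 fish = 5950 aged
fewer_fish = var_equal(70, 20, sigma_x2, rho)   # 70 trips x 20 fish = 1400 aged
more_trips = var_equal(140, 20, sigma_x2, rho)  # 140 trips x 20 fish = 2800 aged
floor = rho * sigma_x2 / 70                     # limit as m grows with n = 70 fixed
```

With these illustrative parameters, 20 fish from each of 140 trips (2800 fish, variance about 0.0041) beats 85 fish from each of 70 trips (5950 fish, about 0.0063), and no number of fish per trip can push the 70-trip variance below about 0.0057.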


Figure 3 Variance of the estimate of mean age [Equation (5)] with equal cluster sizes and parameters as estimated for quarter 1 as a function of the number of trips, n, sampled (left panel) and the number of fish sampled per trip, m (right panel). The dots correspond to the sampling intensity in quarter 1 (n=70, m=85).

 

    Discussion
If it were feasible to select, at random, fish from the entire catch in each quarter, then a sample of about 500–600 cod would have provided an estimate of the mean age that was approximately as precise as the one based on 11 000 fish. However, even if an estimate of mean age is not needed for assessing a stock, a low effective sample size implies that the estimate of the entire age distribution is rather imprecise (Pennington and Vølstad, 1994; Pennington et al., 2002). For example, even though the ages of some 11 000 cod caught during 126 trips were determined for 2000, the estimate of the age distribution of the commercial catch was rather imprecise (Figure 2).

The low effective sample sizes were caused by the tendency for cod caught together to be more similar in age than those in the total catch. Similar results hold for scientific trawl surveys, which are conducted worldwide to assess the status of fish stocks. These surveys sample clusters of animals (a trawl catch), and because of intracluster correlation and variable density, the effective sample sizes for estimates of biological characteristics are small compared with the number of animals sampled (Pennington and Vølstad, 1994; Pennington et al., 2002).

The method used to subsample a cluster, e.g. stratify by length and then select fish to age, will not significantly increase or decrease the effective sample size if there is intracluster correlation. For instance, even if all fish are measured during a bottom trawl survey, the effective sample sizes are still extremely small (Pennington and Vølstad, 1994).

Except in quarter 4, the unweighted estimator $\hat{\mu}_2$ appeared to be more precise than the weighted estimator $\hat{\mu}_1$ for estimating the mean age and the proportion of catch at age for this data set (Table 1, Figure 2). However, care should be taken when applying the unweighted estimator because it may be biased and inconsistent.

We assumed that the age of cod was determined without error. The strength of year classes varies considerably and, therefore, even low rates of misclassification error can cause severe overestimation of the proportion in the catch of a small year class and, to a lesser degree, underestimation of an abundant cohort (Richards et al., 1992; Worthington et al., 1995).

For routine surveys of the commercial catch of Northeast Arctic cod, one way to reduce possible classification errors would be to decrease the number of fish sampled from each catch and to use the time saved to determine the age of each cod more rigorously. As shown, such a reduction in the number of cod sampled within each cluster would only marginally decrease the precision of the estimates, while the time saved by collecting and reading fewer otoliths is substantial. For example, reducing the number of cod aged from each catch sampled in 2000 from about 85 to 20 would have reduced the number of fish sampled by Norway from 11 000 to 2500, which could have saved some 150 workdays of reading and handling. Part of the time saved could be used to collect samples from more landings, which is the only practical way to improve the precision of the surveys. Motivated by these results, IMR will in future sample fewer fish from each catch and increase the number of landings sampled.
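The order of magnitude of this saving can be checked from the handling time quoted earlier (7–10 min per fish), under the added assumption of an 8-hour (480-min) workday:

$$(11\,000 - 2500)\ \text{fish} \times (7\text{–}10)\ \text{min} = 59\,500\text{–}85\,000\ \text{min} \approx \frac{59\,500}{480}\text{–}\frac{85\,000}{480} \approx 124\text{–}177\ \text{workdays},$$

a range that brackets the figure of about 150 workdays.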


    Acknowledgements
 
We thank Dr Kristin Helle (IMR) and Prof. Dag Tjøstheim (Department of Mathematics, University of Bergen) for their comments, Per Ågotnes (IMR) for an introduction to the practical side of collecting the catch data, and two anonymous reviewers for their constructive comments on the manuscript submitted. The Norwegian Research Council provided financial support for the work.


    References

    Cochran W.G. (1977) Sampling Techniques 3rd edition (Wiley, New York).

    Efron B. (1983) The Jackknife, the Bootstrap and Other Resampling Plans 2nd edition (Society for Industrial and Applied Mathematics, Philadelphia).

    Gudmundsson G. (1994) Time series analysis of catch at age observations. Applied Statistics – Journal of the Royal Statistical Society, Series C 43:117–126.

    Hoenig J.M. and Heisey D.M. (1987) Use of a log-linear model with the EM algorithm to correct estimates of stock composition and to convert length to age. Transactions of the American Fisheries Society 116:232–243.

    Kish L. (1965) Survey Sampling (Wiley, New York).

    Nakken O. (1994) Causes of trends and fluctuations in the Arcto-Norwegian cod stock. ICES Marine Science Symposia 198:212–228.

    Pennington M., Burmeister L.M., Hjellvik V. (2002) Assessing the precision of frequency distributions estimated from trawl-survey samples. Fishery Bulletin, US 100:74–81.

    Pennington M. and Vølstad J.H. (1994) Assessing the effect of intra-haul correlation and variable density on estimates of population characteristics from marine surveys. Biometrics 50:725–732.

    Richards L.J., Schnute J.T., Kronlund A.R., Beamish R.J. (1992) Statistical models for the analysis of ageing error. Canadian Journal of Fisheries and Aquatic Sciences 49:1801–1815.

    Searle S.R., Casella G., McCulloch C.E. (1992) Variance Components (Wiley, New York).

    Skagen D. and Hauge K.H. (2002) Recent development of methods for analytical fish stock assessment within ICES. ICES Marine Science Symposia 215:523–531.

    Skinner C.J., Holt D., Smith T.M.F. (Eds.) (1989) Analysis of Complex Surveys (Wiley, New York).

    Smith S.J. (1990) Use of statistical models for the estimation of abundance from groundfish survey data. Canadian Journal of Fisheries and Aquatic Sciences 47:894–903.

    Worthington D.G., Fowler A.J., Doherty P.J. (1995) Determining the most efficient method of age determination for estimating the age structure of a fish population. Canadian Journal of Fisheries and Aquatic Sciences 52:2320–2326.

