Skip Navigation

ICES Journal of Marine Science: Journal du Conseil 2004 61(2):218-230; doi:10.1016/j.icesjms.2003.12.006
© 2004 by ICES/CIEM International Council for the Exploration of the Sea/Conseil International pour l'Exploration de la Mer
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Laslett, G. M.
Right arrow Articles by Polacheck, T.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Laslett, G. M.
Right arrow Articles by Polacheck, T.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Fitting growth models to length frequency data

Geoff M. Lasletta,*, J. Paige Evesonb and Tom Polacheckb

a CSIRO Mathematical and Information Sciences Private Bag 10, Clayton South MDC, Clayton, Victoria 3169, Australia
b CSIRO Marine Research GPO Box 1538, Hobart, Tasmania 7001, Australia

*Correspondence to G. M. Laslett: tel: +61 3 9545 8018; fax: +61 3 9545 8080. e-mail: geoff.laslett{at}csiro.au; paige.eveson{at}csiro.au; tom.polacheck{at}csiro.au.

A novel two-stage procedure for fitting growth curves to length frequency data collected from commercial fisheries is described. The method is suitable for species in which cohorts are spawned over a limited time period, and samples of length frequency data are collected regularly (e.g. in weekly, fortnightly, or monthly time intervals) over an extended time period. In the first stage of analysis, Gaussian mixtures are fitted separately to the data for each time interval, and summary statistics (component means and standard errors) are extracted. In the second stage, parametric growth models, such as the von Bertalanffy seasonal growth curve, are fitted to the summary data. The error structure in this second stage of analysis incorporates random between-year effects, random within-year age-group effects, random within-year time-interval effects, random within-year age-group and time-interval interactions, and sampling errors. This complex error structure incorporating unbalanced crossed and nested random effects acknowledges that commercial fishing is not an exercise in random sampling, and allows for the inevitable additional sources of random variation in such an enterprise. The method is applied to South Australian southern bluefin tuna length frequency data collected from 1964 to 1989, and leads to the conclusion that juvenile tuna grew faster in the 1980s than in the 1960s, with the 1970s being a decade of highly variable growth.

Keywords: maximum likelihood, mixture decompositions, variance components

Received 10 May 2003; accepted 17 December 2003.


    1 Introduction
 Top
 1 Introduction
 2 The data
 3 Fitting Gaussian mixtures
 4 Fitting a growth...
 5 Discussion
 Appendix A
 References
 
Valuable information about the growth of fish can often be extracted from length data that have been collected regularly over an extended time period. Such data often exist for commercially harvested species where routine length sampling of the catch occurs. If a species has a restricted spawning period, then fish belonging to the same cohort and caught around the same time will exhibit a limited range of lengths. For young fish, which are growing quickly, the overlap in the length ranges between ages is often small enough so that the length frequency distribution will show obvious modes. For older fish, the overlap in length ranges becomes progressively greater so that the modes become more difficult to distinguish. The progression of modal lengths over time can be tracked to give information on the growth of young fish.

Length frequency data provide information on two aspects of growth. First, yearly growth can be estimated by comparing the average length of one-year-olds, two-year-olds, three-year-olds, and so on caught at the same time. Second, seasonal growth can be inferred by following the growth of a particular age group within a year. Other data sources, such as tag–recapture surveys and direct ageing data from hard parts analyses, often do not exist on a regular enough time scale to be able to provide detailed information on seasonal growth. Length frequency data are important for this reason.

Extracting the information on growth from length frequency data is not straightforward. First, length frequency data do not come with any independent age attribution so the researcher has to assign the fish to age classes, either explicitly or statistically. Second, the spawning period for a species may be several months and there may be peaks in spawning activity within this period. Such variable spawning can complicate the modal decomposition, and also the growth analysis because growth patterns of fish that were spawned early in the season may differ from those spawned later. Third, length data are often collected from commercial fisheries. In one sense, fisheries data are more informative than data from scientific research programs because they are more abundant and more consistent over time. However, fishing is not designed as a random sampling exercise and, consequently, caution must be used in treating the length data as an unbiased random sample of the population. Finally, measurement error is endemic and may be dependent on the measurer. It is important to develop methods of data analysis that capture these sources of variation.

This article presents our method for extracting growth information from length frequency data. Our method has some features in common with other methods presented in the quantitative fisheries literature (Fournier et al., 1990; Leigh and Hearn, 2000), but departs from them in significant ways. In particular, we develop a two-stage analysis. In the first stage, each length frequency distribution is decomposed into age groups using a Gaussian mixture model and relevant summary statistics are extracted. In the second stage, the summary statistics are used as raw data for growth modelling. This approach allows us to explore and visualize the sources of variation in the data prior to final modelling. More direct (i.e. single-stage) methods are likely to overlook the many possible complications in real length frequency data.

We illustrate our method on southern bluefin tuna length data collected from the South Australian surface fishery. The surface fishery operates annually from approximately November to July of the following year, and is the largest fishery in Australia. The catches are sampled for length regularly throughout each fishing season and aggregated half-monthly, so a consistent and long-term time-series of length data exists. Additionally, the surface catches consist predominantly of juvenile fish aged 1–5, and therefore are ideal for modal length analysis.

In this article, we first discuss the form of the southern bluefin tuna length frequency data used in our analysis and some of its features. We outline our method of analysis, in which we fit a growth model to summary statistics derived from fitting mixture models to each length frequency sample. Finally, we apply the method to southern bluefin data and discuss the results and the method in general.


    2 The data
 Top
 1 Introduction
 2 The data
 3 Fitting Gaussian mixtures
 4 Fitting a growth...
 5 Discussion
 Appendix A
 References
 
We focus on length frequency data collected from southern bluefin tuna caught in the South Australian surface fishery from 1964 through 1989. During this time, fish were caught for the canning market using purse-seine vessels, which use nets that catch the majority of fish in a school and do not have the ability to select by size within a school. In the 1990s, the focus of the fishery switched to catching tuna for the sashimi market using pole and line vessels, which target larger fish within schools. In order to avoid biases from size-selective fishing, we only include data prior to 1990 in our analysis.

The length data were collected using a two-phase sampling procedure. First, within each half-month, landings of tuna were selected for detailed study. Second, a sample of tuna was taken from each selected landing and the fork length of each fish was measured. The sample data from a landing were first scaled up by the ratio of the weight of fish in that landing to the weight of fish in the sample; then the resulting data were scaled up by the ratio of the weight of the catch from all landings in the half-month to the weight of the catch from all selected landings to allow for landings that were not sampled. The result was an estimate of the length frequency distribution for the entire catch in each half-month. Sampling protocols were established to avoid sampling biases. For details of the sampling and scaling procedures, consult Majkowski (1982) and Majkowski and Morris (1986).

Only the scaled-up data were available to us, so we could not apply conventional statistical analysis to the recorded catch, because it is larger than the sampled catch. Standard error estimates from treating the recorded data as raw data would, in general, be too small. To correct for this, we adopted the weighting factors derived in Appendix 1 of Leigh and Hearn (2000), which are based on the variance of the mean appropriate for the two-phase sampling procedure. Let {psi} denote the weighting factor for a given half-month. If ri is the recorded frequency of fish in length class i for that half-month, then mi=ri/{psi} is an estimate of the equivalent number of fish in class i that would have been observed under simple random sampling. The weighting factors had a median of 23.9 and an interquartile range of 13.4–44.3, and the effective sample sizes (mi's) had a median of 1710 and an interquartile range of 821–2996.

The half-months always start on day 1 or 16 of each month, so that the second half-month can have between 13 days (in a normal February) and 16 days (in a 31-day month). In South Australia, the fishing season is defined as November to July, with the bulk of the fishing taking place in December to March. For the purposes of data analysis, we assigned half-month indices of –3,...,20 to the 24 half-months from November to October. This allowed us to follow the seasonal growth of tuna through the southern summer. In practice, the index range –2 to 13 covered the fishing season.

Southern bluefin tuna are spawned in Indonesian waters off the northwest coast of Australia. Spawning occurs over a wide but restricted range of the year, predominantly from September to March (Davis and Nurhakim, 2001), with 1 January as the estimated midpoint. As such, the length frequency distributions generally show obvious modes for the younger, fast-growing fish (age classes 1, 2, 3, and 4), but the modes cannot easily be distinguished for older, slow-growing fish. The data used in our analysis range in length from 38 to 200 cm, although over 99% of the data are less than 130 cm (approximately age 6). We included data of 130 cm or less in our analysis, because for larger (older) fish the data are sparse and irregular and the modes are difficult to distinguish.


    3 Fitting Gaussian mixtures
 Top
 1 Introduction
 2 The data
 3 Fitting Gaussian mixtures
 4 Fitting a growth...
 5 Discussion
 Appendix A
 References
 
In the first step of our analysis, we fitted a Gaussian mixture to the length frequency data from each half-month separately. For a given half-month, suppose there are mi fish of length li for i=1,...,n. Here and elsewhere, mi is the recorded catch count divided by the Leigh and Hearn (2000) weighting factor. The lengths are assumed to be generated from a K component mixture in which the probability density of component k is gk(li), where


Formula

We assume that all components within a half-month have the same standard deviation {sigma}. Although it is technically possible to fit a different standard deviation for each component, particularly if we model it as a smooth function, we did not find this necessary. For the age classes we are considering, the standard deviation in length does not increase markedly with age (as one might expect if older fish were included).

The log-likelihood is


Formula 1

(1)
where {pi}k is the proportion of fish belonging to component k.

To estimate the parameters, we maximize the log-likelihood h. After considerable experimentation on our data, we concluded that it is necessary to use an optimization method that uses first and second derivatives. Derivative-free optimization methods for this problem are slow and do not necessarily converge to the maximum likelihood estimates. The first and second derivatives are set out explicitly in Appendix 7 of Polacheck et al. (2003). As outlined in this appendix, we reparameterized the proportions {pi}k using the logistic transform commonly adopted in the analysis of multinomial data. This guaranteed that the {pi}k are non-negative and sum to 1. Also, it is easy to differentiate the logistic function.

We applied this method of analysis to each half-month of data separately. However, before we could do so, we needed to make some decisions. First, we needed to decide the number of components K to fit. Hunt and Jorgensen (1999) discuss this issue for fisheries' length frequency data, highlighting the point that some older groups may have no fish or very few fish in the sample. Hence, K is impossible to choose from sample data alone. They conclude, and we concur, that K should be chosen by the modeller. Our strategy was to eliminate data greater than 130 cm, and to start with K=5. Groups 1–4 correspond to cohorts of ages 1–4, respectively, and group 5 represents fish of age 5 years and more. For a given K, the fitted mixture model was superimposed on a histogram of the data to confirm that the fit was reasonable and biologically plausible. If there was graphical evidence of overfitting – almost always this was simultaneously indicated by at least one parameter estimate on a boundary of the parameter space – K was reduced by 1, and the model refitted. It was quite common to end up with K=3 or K=4, but occasionally K=2 or even K=1.

Second, we needed to determine starting values for the parameters to be estimated. The starting values for µk were derived from a growth model fitted to corresponding tag–recapture data. To do so, each component in a half-month was assigned an average age equal to its age class plus the fractional time of the year between the midpoint of the sample period and the midpoint of the spawning period (taken to be 1 January). Specifically, the k-year-old group was assigned the average age k+(j–0.5)/24 years, where j is the half-month index and 24 is the number of half-months. For example, component k in half-month 2 of a given year was assigned the age k+0.0625, where 0.0625=1.5/24. By evaluating a seasonal VB log k growth model fitted to tag–recapture data (Laslett et al., 2002) at these ages, we obtained estimated mean lengths to be used as starting values for µk, which we will denote by Formula . Note that in obtaining the starting values, we used decade-specific growth curves; i.e., for a sample from the 1960s we used a seasonal VB log k growth model fitted to 1960s' tag–recapture data, and likewise for the 1970s and 1980s. In order that the optimization routine could not switch groups, the µk were assumed to lie between the bounds ak and bk, where Formula , Formula for k≥2, bk=ak+1 for 1≤k≤K–1 and Formula . The initial values of µk and bounds that we used for samples collected in the 1960s are presented in Table 1. The starting value for {sigma} was generally taken to be {sigma}=4, and the proportions were initially assumed to be equal.


View this table:
[in this window]
[in a new window]

 
Table 1 Initial estimates of µk and bounds used for samples from half-months 2 collected in the 1960s.

 
Proceeding in this way, we fitted a Gaussian mixture model to each available half-month of South Australian length frequency data from 1964/1965 to 1988/1989. The results for selected half-months in the 1981/1982 season are shown in Figure 1. The patterns seen are typical: the data display a number of modes, although the number of components of the fitted mixture does not necessarily equal the number of modes; the fit is sometimes excellent, and sometimes problematic; when the half-month has only a few fish, the fit can appear very bad, and the fitted components are little more than mathematical artefacts. In all but a few half-months, the allocation of fitted components to age groups was easy and unequivocal. Occasionally, an extra age group (usually a young one-year-old group in the latter part of the season) appeared, perhaps due to two peak times in spawning (Davis and Nurhakim, 2001). In such cases, we usually fitted a single age component to the bimodal group (for example, see Figure 1, half-month 7).


Figure 1
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 1 South Australian 1981/1982 length frequency data and fitted Gaussian mixtures. The scaled length frequency data are plotted for the first half of December, January, February, March, April, and June (half-months –1, 1, 3, 5, 7, and 11, respectively). The locations of the fitted modes are indicated with vertical dashed lines and the corresponding age classes are given at the top of the lines.

 
The data retained for the next phase of analysis were the fitted component means and their standard errors, estimated by inverting the observed information matrix. We estimated the number of fish in each age group by multiplying the effective sample size ({sum}mi) by the estimated proportions in each age group. As an example, the summary statistics for the most complex panel in Figure 1, half-month 7 of 1982, are set out in Table 2. Groups with less than 50 fish were not included in the next phase of analysis – informal graphical analysis suggested that the maximum likelihood estimates of the standard errors, which rely on asymptotic theory, were not reliable for smaller groups.


View this table:
[in this window]
[in a new window]

 
Table 2 Fitted components of a Gaussian mixture, half-month 7 in 1982. (s.e. = standard error).

 
The fitted component means (for groups estimated to have at least 50 fish) for the 1960s, 1970s, and 1980s are shown in Figures 2, 3, and 4, respectively. Here, the estimated means Formula from the maximum likelihood fitting program are plotted against half-month, and means for different age groups are distinguished by different symbols. The three figures show a consistent pattern. Although there are inconsistencies between years, on average, the means for a particular age group tend to increase up to about half-month 7 and then flatten off – this pattern is evident to the eye, but would be even more pronounced if a non-parametric trend curve was added to each age group's data. This is consistent with a seasonal growth pattern in which growth is most pronounced over summer, but is quite minimal over winter. Also, comparing between plots, it is evident that by age 4, fish were larger, on average, in the 1980s than in the previous decades, and that growth was highly variable in the 1970s.


Figure 2
View larger version (6K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 2 Fitted means for the age groups 1, 2, 3, 4, and 5+ in the 1960s plotted against half-month. The estimated means from the Gaussian mixture fitting program have been assigned to age groups, with 1-, 2-, 3-, and 4-year-olds plotted as circles, triangles, plusses, and crosses, respectively. (Diamonds correspond to 5+ year-olds, but we do not use these data in subsequent analyses). The 2- and 3-year-olds are numerically dominant, and on average, their means tend to increase up to half-month 7 and then flatten off, reflecting faster summer growth.

 


Figure 3
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 3 Fitted means for age groups 1, 2, 3, 4, and 5+ in the 1970s plotted against half-month. The symbols are the same as in Figure 2. The seasonal growth pattern is still evident; however, the separation between age groups is not as clear as in Figure 2.

 


Figure 4
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 4 Fitted means for age groups 1, 2, 3, 4, and 5+ in the 1980s plotted against half-month. The symbols are the same as in Figure 2. There is good separation between age groups and a clear seasonal growth pattern in the numerically dominant 2-year-olds.

 
Note that although the age 5+ component means are shown in these figures, they are not included in subsequent growth modelling because they encompass all fish of age 5 years and more present in the sample.


    4 Fitting a growth model
 Top
 1 Introduction
 2 The data
 3 Fitting Gaussian mixtures
 4 Fitting a growth...
 5 Discussion
 Appendix A
 References
 
We assume that we have performed a mixture decomposition on the data for each half-month, and we have generated a mean and an accompanying standard error for each age group. Denote these by Formula and s, respectively. Let i index the fishing season, j the half-month, and k the age group. We assume that we can allocate a mean age ajk=k+(j–0.5)/24 to the fish at half-month j in age-group k (as discussed in the earlier paragraph on starting values). Then our initial model is


Formula 2

(2)
where µ(a) is the mean length of fish of age a, and {tau}, {nu}, {omega}, e, and {varepsilon} are all independent random effects. We assume that {tau}~N(0,{sigma}{tau}2), {nu}~N(0,{sigma}{nu}2), {omega}~N(0,{sigma}{omega}2), e~N(0,{sigma}e2), and {varepsilon}~N(0,s2), where {tau} represents a random between-season effect, {nu} a within-season random half-month effect, {omega} a within-season random age effect, and e is a within-season half-month age interaction. Finally, {varepsilon} represents sampling error, and its variance is assumed to be known (and equal to the standard error estimate obtained in the previous step). Strictly speaking, {varepsilon}ijk and {varepsilon}ijk' for k!=k' and a given i and j should be correlated because Formula and Formula have been derived from the same mixture decomposition. In our experience, such correlations are usually weak, and are effectively captured by the within-season half-month random effect {nu}.

Before proceeding, we mention that some years may be more favourable for growth than others due to, say, better environmental conditions or reduced competition (e.g. smaller population size). The random between-season effect is meant to capture this. We might also expect some cohorts to grow faster than others. It would be possible to include random cohort effects in the model as well as, or instead of, the between-season effects, although fitting the model is then more complicated and the between-season effects and the cohort effects are likely to be confounded. We believe that random between-season variations are more natural, and therefore adopt the simpler model given in Equation (2). If desired, non-random cohort effects could be included in the growth model by making growth rate parameters vary as a smooth function of cohort. We hope to outline this generalization in a future publication.

The mean growth curve can depend on several (at least three) parameters. A minimal model is the von Bertalanffy growth curve:


Formula

where µ{infty}, {kappa}, and a0 are parameters to be estimated. They represent mean asymptotic length, rate of growth, and the (theoretical) age at which a fish has length zero, respectively. It would be possible to add random effects to µ{infty}, to make {kappa} depend on cohort (as mentioned above) and to make a0 vary with age group, for example, but we prefer to fit the most parsimonious model we can to the data. However, one complication cannot be ignored. We have already seen that growth within seasons is faster in summer than in winter, and any model fitted to length frequency data should capture this effect. We incorporate a within-season growth pattern into the von Bertalanffy growth curve using a sine function as follows:


Formula

where t is the fractional time of year since 1 January, us is the amplitude of the seasonal growth pattern, and ws is the phase. We impose the condition –1≤us≤1 to guarantee that growth is monotonic, and also that –0.5<ws≤0.5 (any bounds with a span of one could have been chosen due to the periodicity of the function).

We estimate the parameters by maximizing the likelihood. The data for each season are assumed to be independent, so we can add up the log-likelihoods for each season. We can write the model in vector form as


Formula

where Formula is a vector of fitted means for all retained age-group half-month combinations in season i and µi is the vector of mean lengths from the growth curve. We shall let m index the data items (µ and s for each age-group half-month combination m). Also, 1 is a vector of 1's of the same length as Formula , and X{nu} and X{omega} are design matrices; that is, X{nu},mj=1 if datum m belongs to half-month j and is zero otherwise, and X{omega},mk=1 if datum m belongs to age-group k and is zero otherwise.

We need to compute the likelihood within a season. We drop the subscript i for notational convenience. The log-likelihood is


Formula

where V is the variance–covariance matrix of the data. Now,


Formula

where D{varepsilon} is the diagonal matrix with diagonal elements dmm=sm2. We can write V as


Formula

where D={sigma}e2I+D{varepsilon} is a diagonal matrix and Formula is a design matrix. Many popular software packages are unable to solve the likelihood equations for this model, either because the user cannot specify in advance the value of any variance component or because the fitting routines do not allow unbalanced crossed and nested random effects. In Appendix A, we indicate how to maximize the likelihood from first principles.

Once the log-likelihood Formula of the full data set over all seasons has been maximized, we need to compute the estimated random effects. It is customary to use the best linear unbiased predictors (BLUPs):


Formula



Formula



Formula

and


Formula

where Formula

Approximate standard errors for the parameter estimates can also be calculated by evaluating the inverted observed information matrix, Formula , where {theta} is the vector of parameters, at the maximum of the log-likelihood function.

Trials of this method proved reasonably satisfactory. The parameter estimates from fitting a seasonal von Bertalanffy growth model to the 1960s', 1970s', and 1980s' summary statistics are set out in Table 3, along with their estimated standard errors. We fixed µ{infty} at 185 cm because the longest fish in the analysis of the length frequency data belonged to four-year-old tuna, which are still between 50% and 60% of their asymptotic length. Analyses of tag–recapture data and direct ageing data (Laslett et al., 2002; Polacheck et al., 2003) both suggested that 185 cm is a reasonable value for µ{infty}.


View this table:
[in this window]
[in a new window]

 
Table 3 Fitted von Bertalanffy growth model parameters for South Australian length frequency data. Standard error estimates are shown in parentheses below the parameter estimates. The negative log-likelihood value is given in the last column.

 
We also fitted the model allowing µ{infty} to be a free parameter. In all decades, Formula , Formula , and Formula differed significantly when µ{infty} was a free parameter versus when it was fixed; however, the resulting mean growth curves were almost indistinguishable within the limited range of the data (from ages 1 to 4). When µ{infty} was free, the parameter estimates were not biologically plausible; for example, µ{infty} was estimated to be 280 cm in the 1980s, which is far too large to be realistic given the maximum length of SBT ever caught in adult fisheries.

The fits of the growth model with µ{infty} fixed at 185 cm were examined using residual plots (Figure 5). Overall, the fits look quite good; however, there is a slight convex pattern in the 1980s' residuals, and to a lesser degree in the 1960s' residuals. This may reflect the findings of previous studies (Laslett et al., 2002; Hearn and Polacheck, 2003), which showed that the von Bertalanffy growth curve cannot adequately capture the fast early growth of southern bluefin tuna. Fitting a more complex growth curve, such as the VB log k model introduced in Laslett et al. (2002), may fix the pattern in the residuals, but at the cost of more parameters to estimate.


Figure 5
View larger version (12K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 5 Residuals from fitting the growth model to the 1960s', 1970s', and 1980s' summary length frequency data sets.

 
In any case, the simple von Bertalanffy fits were adequate to make some general observations. The growth parameters for the 1960s and 1970s are very similar, but growth appears to be faster in the 1980s. The seasonal parameters us and ws appear to be quite plausible; the ws values suggest that growth is fastest during February/March, which is consistent with our belief that southern bluefin tuna experience a period of fast growth during the southern summer. The estimated standard deviations of the random effects vary in magnitude between decades, but suggest that most of the random effects contribute significantly to explaining variation in growth. Only {sigma}{tau} in the 1960s and {sigma}{nu} in the 1980s did not differ significantly from zero (the former having a wide confidence interval that encompasses zero and the latter converging to the set lower bound of 0.01); dropping the {tau} term from the growth model in the 1960s and the {nu} term from the model in the 1980s led to insignificant increases in the negative log-likelihood values.

An interesting pattern of random effects emerged from a combined analysis of all the length frequency data (from all decades). The estimated between-season effects Formula are shown in Figure 6. According to the model, these are estimated realizations of independent random effects, but they follow a reasonably smooth trend. It seems likely that growth has been changing systematically over these three decades, and that this should be incorporated into a time-dependent growth model. We attempted this in Appendix 8 of Polacheck et al. (2003). However, the method outlined in this article is considerably faster, simpler, and more flexible, and represents an easy way to elicit quite complex growth trends in the data without imposing preconceptions about the patterns of growth. The pattern seen in Figure 6 supports our previous finding that growth was faster in the 1980s than in the 1960s, and that the 1970s was a period of highly variable growth. The complex pattern seen in the 1970s may explain the poor separation between age groups in Figure 3.


Figure 6
View larger version (4K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 6 Estimated between-season effects plotted against season, obtained from fitting the growth model to the combined data from all decades.

 
A residual plot for the combined fit (Figure 7) shows more explicitly that age 1 fish in the 1980s were similar in length to age 1 fish in the previous two decades, but that these fish grew faster and were noticeably larger by age 4. The larger variability in the 1970s is also obvious in the residuals.


Figure 7
View larger version (9K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 7 Residuals from fitting the growth model to the length frequency summary data from all the decades combined. The open triangles indicate points from the 1960s, plusses indicate the 1970s, and solid dots indicate the 1980s. The fact that fish grew faster in the 1980s is evident, and so is the large variability in the 1970s.

 
Finally, for completeness, we fitted the growth model to the combined data with a random cohort effect instead of a random between-season effect. The pattern of estimated cohort effects was almost identical to that in Figure 6. Both models suggest that there are extended periods of good and bad growth, often lasting for several years. For fish of ages 1–4, which have their year of birth and year of capture close together, this would make cohort effects and between-season effects highly confounded and difficult to separate.


    5 Discussion
 Top
 1 Introduction
 2 The data
 3 Fitting Gaussian mixtures
 4 Fitting a growth...
 5 Discussion
 Appendix A
 References
 
The results of this study confirm the usefulness of length frequency data for understanding growth processes, in particular for detecting and modelling within-season growth. A number of previous methods have been proposed for extracting growth information from length frequency data, the most commonly cited being MULTIFAN (Fournier et al., 1990). Our method emphasizes features that are not considered in MULTIFAN and other methods.

Our method is a two-stage analysis in which a modal decomposition is performed in the first stage without making any assumptions about the pattern of growth. Single-stage methods, such as MULTIFAN and that of Leigh and Hearn (2000), constrain the modal estimates to lie on (or near) a parametric growth curve. In our experience, length frequency growth patterns do not conform tightly to parametric growth curves. There appear to be significant additional sources of variation operating between seasons, between age groups within seasons, between half-months within seasons, and between age groups and half-months within seasons (interactive effects). We suspect that some of these stem from temporal and spatial heterogeneity among age classes and changing fishing practices. As such, we doubt that it is possible to offer explanations for all of these effects in terms of covariates, and suggest that initially they should be modelled as independent hierarchical and crossed random effects. Otherwise, standard errors of growth parameters derived from length frequency data are likely to be optimistically small.

In the second stage of our analysis, we fit a growth model to the summary statistics derived in the first stage. This allows us to explore the choice of growth curve and to investigate the possible sources of variation. For example, broad changes in growth from year to year can be estimated by incorporating between-season random effects. For southern bluefin tuna, these estimated between-season effects appear to have been rather smooth, and suggest that they should be modelled as systematic effects rather than as random effects. Quantitative modelling of this trend is an extensive topic. Some initial ideas for modelling such trends are presented in Appendices 8 and 10 of Polacheck et al. (2003). Year-to-year variation in growth, as well as other sources of variation, has not been explored in previously documented methods.

Despite the complexity in growth of southern bluefin tuna, each decade of data consistently exhibits a seasonal pattern in which growth is fastest over the summer, then flattens off in autumn. We propose that this seasonal growth can be modelled using a sine curve with amplitude and phase parameters estimated from the data. In contrast, Leigh and Hearn (2000) recommended a combined analysis of each season's data using a separate linear model within each age group. Their approach ignores seasonal growth patterns. Likewise, the documented and publicly available version of MULTIFAN does not incorporate seasonal growth (although, the software could undoubtedly be modified so that it does).

There are some minor disadvantages to our two-stage method of analysis. For the overwhelming majority of half-months, the attribution of mixture components to age groups was straightforward, but presented difficulties in a few cases. For example, occasionally double modes attributable to only a single age group were evident in the data. Single-stage methods, such as MULTIFAN, that constrain the modes to a parametric growth curve do not generally have such difficulties. However, while this is a convenient feature, it can also potentially bias the modal estimates and mask sources of variation in the data. It would be of interest, if only for comparison, to develop a single-stage method of analysis in which mixtures with additional random effects are fitted to the whole data set in one sweep. The likelihood to do so involves multidimensional integration. We devised a Markov Chain Monte Carlo method that carries out the integrations implicitly, but it suffered from the problem of "label switching" (Titterington et al., 1985, p. 92), leading to false convergence on simulated data. The usual remedy of placing sensible bounds on component means (see our paragraph on starting values) did not work for the more general model. We recommend this as a problem for future research. Another disadvantage of our approach is that it is not useful for older fish. Single-stage methods that integrate the fitting of the growth model with the modal decomposition can make use of the length information for older fish.

Many growth curves, including the familiar von Bertalanffy curve used here, are parameterized in terms of asymptotic length; however, length frequency data tend to lack information on older individuals. When data on older fish are missing, allowing the asymptotic length parameter to be estimated freely can still lead to good length predictions within the age range of the data, but will often give very unrealistic predictions outside this range. Furthermore, the parameter estimates of µ{infty} and k are often unrealistic and cannot be interpreted meaningfully. We found this to be true in our analysis of the southern bluefin tuna data; therefore, we chose to use external information to fix µ{infty} at a realistic value. We used a value of 185 cm based on previous growth studies that used other data sources, but the results were not sensitive to small changes in the value chosen. By fixing µ{infty}, we obtained meaningful estimates of the growth rate parameter k that could be compared between decades to see how growth rates of tuna have changed over time. Had we fitted the growth model to each decade with µ{infty} as a free parameter, it would have been misleading to compare parameter values between decades.

One referee suggested that we could have avoided the need to fix µ{infty} by using an alternative parameterization of the von Bertalanffy growth curve that specifies the curve in terms of length at any two ages rather than asymptotic length (e.g. Schnute, 1981). This is not true. If we fit a von Bertalanffy growth model using a different parameterization, the fitted growth curve will be exactly the same as the curve derived by fitting the model using the standard parameterization, so long as all parameters are estimated freely in both models. The predicted lengths would still be unrealistic outside the range of the data, and the parameter values (k in particular) would still lack a meaningful interpretation. The problem stems from a lack of data, and no parameterization can solve that problem. The main advantage of using an alternative parameterization, such as the one proposed by Schnute (1981), is that parameter estimation tends to be more stable.

Length frequency data provide unique and valuable information for modelling growth, but as just discussed, they are usually inadequate by themselves to model growth over the lifespan of a species. Other sources of growth information can be useful in this regard. For many species, such as southern bluefin tuna, direct ageing of hard parts can provide direct age and length information on older fish, and hence the mean asymptotic length. Tag–recapture data can provide unique information on individual fish variation because there are two measurements per fish rather than one. A comprehensive model of growth that integrates length frequency, tag–recapture, and direct ageing data is presented in Eveson et al. (in press) and applied to southern bluefin tuna data. The length frequency model described in the current paper makes up one component of the integrated model.


    Appendix A
 Top
 1 Introduction
 2 The data
 3 Fitting Gaussian mixtures
 4 Fitting a growth...
 5 Discussion
 Appendix A
 References
 
To calculate the likelihood from first principles, we assume that we are in an iteration trial of an optimization routine, so that all parameter values are known. When calculating the likelihood, we need to compute |D+XX'|, where X is an nxp matrix, n is the number of data values in the season, and p is the number of design levels. Generally, n>p and sometimes n>>p. For example, the values of n and p for the 1980s' length frequency data from South Australia are presented in Table 4. Thus, {sum}ni=269 and {sum}pi=148. The number of linear algebra operations involving matrices of order n is generally proportional to n3, so if we can make the matrices of order p rather than n, we should increase the speed by about (269/148)3{approx}6 times. For our computer, it turns out that this means about 2 or 3min instead of 15min, which is substantial.


View this table:
[in this window]
[in a new window]

 
Table 4 Dimension of X by season.

 
We can use two well-known identities to reduce the amount of computation in this way. First, |D+XX'|=|D||Ip+X'D–1X|, where |A| denotes the determinant of the square matrix A and Ir is the rxr identity matrix (Graybill, 1983, p. 494). Now, |D+XX'| is the determinant of an nxn matrix and |Ip+X'D–1X| is that of a pxp matrix, so the latter requires much less computer time to calculate.

Second, we need to compute Formula . It can easily be checked that


Formula

according to Brown (1993, Appendix D). Set Formula and ß=X'D–1{alpha}. Then


Formula

where {gamma} is the solution of the p equations (Ip+X'D–1X){gamma}=ß. We can thus compute the log-likelihood using matrices of order p.

It is sometimes helpful to calculate the gradient of the likelihood as well as the likelihood itself. This can speed up optimization routines. For a growth parameter {theta}j, we have


Formula

For a given suite of parameters, we first compute Formula efficiently. The vector of partial derivatives {partial}µ/{partial}{theta} is then computed – sometimes there are specific features of the growth curve that can be exploited to help calculate these quickly.

The variance parameters are slightly more problematic. For a generic additive error structure,


Formula

the general rule is that


Formula

where tr(·) stands for the trace of a matrix. Usually, Vj is either diagonal or of the form XjX'j, where the number of columns in Xj is small. The second term on the right is just v'µVjvµ, and we have already computed vµ, so this term is readily computed for either form of Vj. If Vj=Dj is diagonal, then


Formula

If Vj=XjX'j, then


Formula

The trace of a matrix M is the sum of its diagonal elements, but it is not always necessary to compute all elements of M and then sum the diagonals. Thus, for nxp matrices A and B it is readily verified that Formula . This can be used to reduce the computational burden in most cases.


    Acknowledgements
 
We are grateful for the cooperation of the southern bluefin tuna industry and the many individuals involved in collecting length samples over the history of the South Australian fishery. We would also like to thank Dr Peter Jones for useful comments on this manuscript and Dr Bill Hearn for initial discussions regarding the data and for providing the stimulus to include a seasonal component in the growth curve. Lastly, we wish to acknowledge the Fisheries Research & Development Corporation (FRDC) for their funding contribution to this project.


    References
 Top
 1 Introduction
 2 The data
 3 Fitting Gaussian mixtures
 4 Fitting a growth...
 5 Discussion
 Appendix A
 References
 

    Brown P.J. (1993) Measurement, Regression and Calibration(Clarendon Press, Oxford) 201 pp.

    Davis T. L. O. and Nurhakim S. (2001) Catch monitoring of the fresh tuna caught by the Bali-based longline fishery. CCSBT-SC/0108/11. 12 pp.

    Eveson J. P., Laslett G. M., Polacheck T. (2004) An integrated model for growth incorporating tag–recapture, length-frequency and direct ageing data. Canadian Journal of Fisheries and Aquatic Sciences (in press).

    Fournier D.A., Sibert J.R., Majkowski J., Hampton J. (1990) MULTIFAN: a likelihood based method for estimating growth and age composition from multiple length frequency data sets illustrated using data from southern bluefin tuna (Thunnus maccoyii). Canadian Journal of Fisheries and Aquatic Sciences 47:301–313.

    Graybill F.A. (1983) Matrices with Applications in Statistics 2nd edn (Wadsworth, Belmont, California) 461 pp.

    Hearn W.S. and Polacheck T. (2003) Estimating long-term growth-rate changes of southern bluefin tuna (Thunnus maccoyii) from two periods of tag–return data. Fishery Bulletin 101:58–74.[Web of Science]

    Hunt L. and Jorgensen M. (1999) Mixture model clustering using the MULTIMIX program. Australian & New Zealand Journal of Statistics 41:153–171.[Web of Science]

    Laslett G.M., Eveson J.P., Polacheck T. (2002) A flexible maximum likelihood approach for fitting growth curves to tag–recapture data. Canadian Journal of Fisheries and Aquatic Sciences 59:976–986.

    Leigh G.M. and Hearn W.S. (2000) Changes in growth of juvenile southern bluefin tuna (Thunnus maccoyii): an analysis of length-frequency data from the Australian fishery. Marine and Freshwater Research 51:143–154.[CrossRef][Web of Science]

    In Majkowski J. (Ed.). CSIRO database for southern bluefin tuna (Thunnus maccoyii (Castlenau)). (1982) CSIRO Marine Laboratories Report No. 142. ISBN 0 643 02761 0. 23 pp.

    In Majkowski J. and Morris G. (Eds.). Data on southern bluefin tuna (Thunnus maccoyii (Castelnau)): Australian, Japanese and New Zealand systems for collecting, processing and accessing catch, fishing effort, aircraft observation and tag release/recapture data. (1986) CSIRO Marine Laboratories Report No. 179. ISBN 0 643 03965 1. 95 pp.

    Polacheck T., Laslett G. M., Eveson J. P. (2003) An integrated analysis of the growth rates of southern bluefin tuna for use in estimating the catch at age matrix in the stock assessment. Final report. FRDC Project No. 99/104. ISBN 1 876996 382. 411 pp.

    Schnute J. (1981) A versatile growth model with statistically stable parameters. Canadian Journal of Fisheries and Aquatic Sciences 38:1128–1140.

    Titterington D.M., Smith A.F.M., Makov U.E. (1985) Statistical Analysis of Finite Mixture Distributions(Wiley, Chichester) 243 pp.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
ICES J. Mar. Sci.Home page
T. Russo, S. Mariani, P. Baldi, A. Parisi, G. Magnifico, L. W. Clausen, and S. Cataudella
Progress in modelling herring populations: an individual-based model of growth
ICES J. Mar. Sci., September 1, 2009; 66(8): 1718 - 1725.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Laslett, G. M.
Right arrow Articles by Polacheck, T.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Laslett, G. M.
Right arrow Articles by Polacheck, T.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?