© 2004 by ICES/CIEM International Council for the Exploration of the Sea/Conseil International pour l'Exploration de la Mer
Fitting growth models to length frequency data
a CSIRO Mathematical and Information Sciences Private Bag 10, Clayton South MDC, Clayton, Victoria 3169, Australia
b CSIRO Marine Research GPO Box 1538, Hobart, Tasmania 7001, Australia
*Correspondence to G. M. Laslett: tel: +61 3 9545 8018; fax: +61 3 9545 8080. e-mail: geoff.laslett{at}csiro.au; paige.eveson{at}csiro.au; tom.polacheck{at}csiro.au.
A novel two-stage procedure for fitting growth curves to length frequency data collected from commercial fisheries is described. The method is suitable for species in which cohorts are spawned over a limited time period, and in which samples of length frequency data are collected regularly (e.g. weekly, fortnightly, or monthly) over an extended time period. In the first stage of analysis, Gaussian mixtures are fitted separately to the data for each time interval, and summary statistics (component means and standard errors) are extracted. In the second stage, parametric growth models, such as the von Bertalanffy seasonal growth curve, are fitted to the summary data. The error structure in this second stage of analysis incorporates random between-year effects, random within-year age-group effects, random within-year time-interval effects, random within-year age-group and time-interval interactions, and sampling errors. This complex error structure, incorporating unbalanced crossed and nested random effects, acknowledges that commercial fishing is not an exercise in random sampling, and allows for the inevitable additional sources of random variation in such an enterprise. The method is applied to South Australian southern bluefin tuna length frequency data collected from 1964 to 1989, and leads to the conclusion that juvenile tuna grew faster in the 1980s than in the 1960s, with the 1970s being a decade of highly variable growth.
Keywords: maximum likelihood, mixture decompositions, variance components
Received 10 May 2003; accepted 17 December 2003.
| 1 Introduction |
|---|
Valuable information about the growth of fish can often be extracted from length data that have been collected regularly over an extended time period. Such data often exist for commercially harvested species where routine length sampling of the catch occurs. If a species has a restricted spawning period, then fish belonging to the same cohort and caught around the same time will exhibit a limited range of lengths. For young fish, which are growing quickly, the overlap in the length ranges between ages is often small enough so that the length frequency distribution will show obvious modes. For older fish, the overlap in length ranges becomes progressively greater so that the modes become more difficult to distinguish. The progression of modal lengths over time can be tracked to give information on the growth of young fish.
Length frequency data provide information on two aspects of growth. First, yearly growth can be estimated by comparing the average length of one-year-olds, two-year-olds, three-year-olds, and so on caught at the same time. Second, seasonal growth can be inferred by following the growth of a particular age group within a year. Other data sources, such as tag-recapture surveys and direct ageing data from hard-part analyses, are often not collected on a regular enough time scale to provide detailed information on seasonal growth. Length frequency data are important for this reason.
Extracting the information on growth from length frequency data is not straightforward. First, length frequency data do not come with any independent age attribution so the researcher has to assign the fish to age classes, either explicitly or statistically. Second, the spawning period for a species may be several months and there may be peaks in spawning activity within this period. Such variable spawning can complicate the modal decomposition, and also the growth analysis because growth patterns of fish that were spawned early in the season may differ from those spawned later. Third, length data are often collected from commercial fisheries. In one sense, fisheries data are more informative than data from scientific research programs because they are more abundant and more consistent over time. However, fishing is not designed as a random sampling exercise and, consequently, caution must be used in treating the length data as an unbiased random sample of the population. Finally, measurement error is endemic and may be dependent on the measurer. It is important to develop methods of data analysis that capture these sources of variation.
This article presents our method for extracting growth information from length frequency data. Our method has some features in common with other methods presented in the quantitative fisheries literature (Fournier et al., 1990; Leigh and Hearn, 2000), but departs from them in significant ways. In particular, we develop a two-stage analysis. In the first stage, each length frequency distribution is decomposed into age groups using a Gaussian mixture model and relevant summary statistics are extracted. In the second stage, the summary statistics are used as raw data for growth modelling. This approach allows us to explore and visualize the sources of variation in the data prior to final modelling. More direct (i.e. single-stage) methods are likely to overlook the many possible complications in real length frequency data.
We illustrate our method on southern bluefin tuna length data collected from the South Australian surface fishery. The surface fishery operates annually from approximately November to July of the following year, and is the largest fishery in Australia. The catches are sampled for length regularly throughout each fishing season and aggregated half-monthly, so a consistent and long-term time-series of length data exists. Additionally, the surface catches consist predominantly of juvenile fish aged 1 to 5 years, and are therefore ideal for modal length analysis.
In this article, we first discuss the form of the southern bluefin tuna length frequency data used in our analysis and some of its features. We outline our method of analysis, in which we fit a growth model to summary statistics derived from fitting mixture models to each length frequency sample. Finally, we apply the method to southern bluefin data and discuss the results and the method in general.
| 2 The data |
|---|
We focus on length frequency data collected from southern bluefin tuna caught in the South Australian surface fishery from 1964 through 1989. During this time, fish were caught for the canning market using purse-seine vessels, which use nets that catch the majority of fish in a school and do not have the ability to select by size within a school. In the 1990s, the focus of the fishery switched to catching tuna for the sashimi market using pole and line vessels, which target larger fish within schools. In order to avoid biases from size-selective fishing, we only include data prior to 1990 in our analysis.
The length data were collected using a two-phase sampling procedure. First, within each half-month, landings of tuna were selected for detailed study. Second, a sample of tuna was taken from each selected landing and the fork length of each fish was measured. The sample data from a landing were first scaled up by the ratio of the weight of fish in that landing to the weight of fish in the sample; then the resulting data were scaled up by the ratio of the weight of the catch from all landings in the half-month to the weight of the catch from all selected landings to allow for landings that were not sampled. The result was an estimate of the length frequency distribution for the entire catch in each half-month. Sampling protocols were established to avoid sampling biases. For details of the sampling and scaling procedures, consult Majkowski (1982) and Majkowski and Morris (1986).
Only the scaled-up data were available to us, so we could not apply conventional statistical analysis directly to the recorded catch: the recorded catch is larger than the sampled catch, so standard error estimates obtained by treating the recorded data as raw data would, in general, be too small. To correct for this, we adopted the weighting factors derived in Appendix 1 of Leigh and Hearn (2000), which are based on the variance of the mean appropriate for the two-phase sampling procedure. Let $w$ denote the weighting factor for a given half-month. If $r_i$ is the recorded frequency of fish in length class $i$ for that half-month, then $m_i = r_i / w$ is an estimate of the equivalent number of fish in class $i$ that would have been observed under simple random sampling. The weighting factors had a median of 23.9 and an interquartile range of 13.4–44.3, and the effective sample sizes ($m_i$'s) had a median of 1710 and an interquartile range of 821–2996.
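The conversion from recorded (scaled-up) frequencies to effective counts is a single division per length class. The sketch below assumes the half-month weighting factor has already been obtained from the Leigh and Hearn (2000) formula, which is not reproduced here; the function name and the recorded frequencies are hypothetical, and only the weighting factor (the median value quoted above) comes from the text.

```python
def effective_counts(recorded, weight):
    """Convert recorded (scaled-up) frequencies r_i for one half-month into
    effective simple-random-sampling counts m_i = r_i / weight."""
    if weight <= 0:
        raise ValueError("weighting factor must be positive")
    return [r / weight for r in recorded]


# Hypothetical recorded frequencies for three length classes, divided by the
# median weighting factor reported in the text (23.9).
m = effective_counts([239.0, 478.0, 717.0], 23.9)
print([round(x, 6) for x in m])  # [10.0, 20.0, 30.0]
```

The effective counts $m_i$, not the recorded frequencies, are what enter the mixture likelihood below, so that standard errors reflect the amount of fish actually measured.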
The half-months always start on day 1 or 16 of each month, so the second half-month can have between 13 days (in a normal February) and 16 days (in a 31-day month). In South Australia, the fishing season is defined as November to July, with the bulk of the fishing taking place in December to March. For the purposes of data analysis, we assigned half-month indices −3, ..., 20 to the 24 half-months from November to October, so that index 1 corresponds to 1–15 January. This allowed us to follow the seasonal growth of tuna through the southern summer. In practice, the index range −2 to 13 covered the fishing season.
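The indexing convention can be written as a small function. This is a sketch assuming consecutive indices starting at −3 for 1–15 November (which is what 24 indices running from −3 to 20 imply); the function name is hypothetical.

```python
def half_month_index(month, day):
    """Half-month index on the November-October scale.

    -3 = 1-15 November, -2 = 16-30 November, -1 and 0 = the halves of
    December, 1 = 1-15 January, 2 = 16-31 January, ..., 20 = 16-31 October.
    Half-months start on day 1 or day 16 of each month.
    """
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("invalid month or day")
    half = 1 if day <= 15 else 2
    # Treat November and December as months -1 and 0 so that January is month 1.
    shifted = month - 12 if month >= 11 else month
    return 2 * (shifted - 1) + half


print(half_month_index(11, 3), half_month_index(1, 20), half_month_index(7, 10))
# -3 2 13
```

Anchoring index 1 at the first half of January keeps the index aligned with the assumed spawning midpoint of 1 January, which simplifies the age assignment used when choosing starting values.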
Southern bluefin tuna are spawned in Indonesian waters off the northwest coast of Australia. Spawning occurs over a wide but restricted range of the year, predominantly from September to March (Davis and Nurhakim, 2001), with 1 January as the estimated midpoint. As such, the length frequency distributions generally show obvious modes for the younger, fast-growing fish (age classes 1, 2, 3, and 4), but the modes cannot easily be distinguished for older, slow-growing fish. The data used in our analysis range in length from 38 to 200 cm, although over 99% of the data are less than 130 cm (approximately age 6). We included data of 130 cm or less in our analysis, because for larger (older) fish the data are sparse and irregular and the modes are difficult to distinguish.
| 3 Fitting Gaussian mixtures |
|---|
In the first step of our analysis, we fitted a Gaussian mixture to the length frequency data from each half-month separately. For a given half-month, suppose there are $m_i$ fish of length $l_i$ for $i = 1, \dots, n$. Here and elsewhere, $m_i$ is the recorded catch count divided by the Leigh and Hearn (2000) weighting factor. The lengths are assumed to be generated from a $K$-component mixture in which the probability density of component $k$ is

$$g_k(l_i) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(l_i - \mu_k)^2}{2\sigma^2}\right\}.$$
We assume that all components within a half-month have the same standard deviation $\sigma$. Although it is technically possible to fit a different standard deviation for each component, particularly if it is modelled as a smooth function, we did not find this necessary: for the age classes we are considering, the standard deviation in length does not increase markedly with age (as one might expect it to if older fish were included).
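The log-likelihood for the half-month is

$$h = \sum_{i=1}^{n} m_i \log\left\{\sum_{k=1}^{K} \pi_k\, g_k(l_i)\right\}, \qquad (1)$$

where $\pi_k$ is the proportion of fish belonging to component $k$.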
To estimate the parameters, we maximize the log-likelihood $h$. After considerable experimentation on our data, we concluded that it is necessary to use an optimization method that uses first and second derivatives; derivative-free optimization methods for this problem are slow and do not necessarily converge to the maximum likelihood estimates. The first and second derivatives are set out explicitly in Appendix 7 of Polacheck et al. (2003). As outlined in that appendix, we reparameterized the proportions $\pi_k$ using the logistic transform commonly adopted in the analysis of multinomial data. This guarantees that the $\pi_k$ are non-negative and sum to 1, and the logistic function is easy to differentiate.
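To make the weighted mixture likelihood and the logistic reparameterization concrete, here is a minimal sketch that only evaluates $h$. It does not reproduce the derivative-based optimizer described above. It assumes that the last component is the reference category of the logistic transform and uses a log-sum-exp step for numerical stability; the function names and example data are hypothetical.

```python
import math


def proportions_from_logits(eta):
    """Logistic (softmax) map from K-1 unconstrained values to K proportions.

    Assumption: the last component is the reference category (eta_K = 0).
    """
    full = list(eta) + [0.0]
    top = max(full)                          # subtract the max for stability
    weights = [math.exp(e - top) for e in full]
    total = sum(weights)
    return [w / total for w in weights]


def mixture_loglik(lengths, counts, means, sigma, eta):
    """Weighted log-likelihood h = sum_i m_i log(sum_k pi_k g_k(l_i)) for a
    Gaussian mixture whose components share the standard deviation sigma."""
    pis = proportions_from_logits(eta)
    log_norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    h = 0.0
    for l, m in zip(lengths, counts):
        # log(pi_k g_k(l)) for every component, combined with log-sum-exp.
        terms = [math.log(p) + log_norm - 0.5 * ((l - mu) / sigma) ** 2
                 for p, mu in zip(pis, means)]
        top = max(terms)
        h += m * (top + math.log(sum(math.exp(t - top) for t in terms)))
    return h


# Hypothetical two-component example (lengths in cm, effective counts m_i).
lengths = [48.0, 50.0, 52.0, 70.0, 72.0]
counts = [12.0, 30.0, 11.0, 8.0, 9.0]
print(mixture_loglik(lengths, counts, means=[50.0, 71.0], sigma=4.0, eta=[0.5]))
```

Because any real vector `eta` maps to valid proportions, an optimizer can work on unconstrained values. The analytic first and second derivatives referred to above would be supplied on this same scale.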
We applied this method of analysis to each half-month of data separately. Before doing so, however, we needed to make some decisions. First, we needed to decide the number of components $K$ to fit. Hunt and Jorgensen (1999) discuss this issue for fisheries length frequency data, highlighting the point that some older groups may have no fish or very few fish in the sample. Hence, $K$ cannot be chosen from the sample data alone. They conclude, and we concur, that $K$ should be chosen by the modeller. Our strategy was to eliminate data greater than 130 cm and to start with $K = 5$. Groups 1–4 correspond to cohorts of ages 1–4, respectively, and group 5 represents fish of age 5 years and more. For a given $K$, the fitted mixture model was superimposed on a histogram of the data to confirm that the fit was reasonable and biologically plausible. If there was graphical evidence of overfitting (almost always indicated simultaneously by at least one parameter estimate lying on a boundary of the parameter space), $K$ was reduced by 1 and the model refitted. It was quite common to end up with $K = 3$ or $K = 4$, but occasionally $K = 2$ or even $K = 1$.
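The step-down rule for $K$ can be sketched as a loop that refits with one fewer component while a fit has a boundary estimate. This automates only the boundary indicator; the graphical check described above has no direct code equivalent. `fit` is a hypothetical callable standing in for a per-half-month mixture fit, and the tolerances are illustrative assumptions.

```python
def on_boundary(means, lower, upper, props, tol=1e-6, min_prop=1e-6):
    """True if any fitted mean lies on (within tol of) its bound, or any
    fitted proportion is effectively zero."""
    for mu, a, b in zip(means, lower, upper):
        if mu - a <= tol or b - mu <= tol:
            return True
    return any(p <= min_prop for p in props)


def choose_k(fit, k_start=5):
    """Start at k_start and drop one component while the fit has a boundary
    estimate. `fit(k)` is hypothetical and must return
    (means, lower_bounds, upper_bounds, proportions) for a k-component fit."""
    for k in range(k_start, 0, -1):
        means, lower, upper, props = fit(k)
        if not on_boundary(means, lower, upper, props):
            return k
    return 1
```

In practice the boundary flag and a plot of the fitted mixture over the histogram would be inspected together, as in the procedure described above.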
Second, we needed to determine starting values for the parameters to be estimated. The starting values for $\mu_k$ were derived from a growth model fitted to corresponding tag-recapture data. To do so, each component in a half-month was assigned an average age equal to its age class plus the fraction of the year between the midpoint of the sample period and the midpoint of the spawning period (taken to be 1 January). Specifically, the $k$-year-old group was assigned the average age $k + (j - 0.5)/24$ years, where $j$ is the half-month index and 24 is the number of half-months in a year. For example, component $k$ in half-month 2 of a given year was assigned the age $k + 0.0625$, where $0.0625 = 1.5/24$. By evaluating a seasonal VB log k growth model fitted to tag-recapture data (Laslett et al., 2002) at these ages, we obtained estimated mean lengths to be used as starting values for $\mu_k$, which we denote by $\tilde\mu_k$. Note that in obtaining the starting values we used decade-specific growth curves; i.e. for a sample from the 1960s we used a seasonal VB log k growth model fitted to 1960s' tag-recapture data, and likewise for the 1970s and 1980s. So that the optimization routine could not switch groups, each $\mu_k$ was constrained to lie between a lower bound $a_k$ and an upper bound $b_k$, with $b_k = a_{k+1}$ for $1 \le k \le K - 1$, so that the intervals for adjacent groups abut but do not overlap. The initial values of $\mu_k$ and the bounds that we used for samples collected in the 1960s are presented in Table 1. The starting value for $\sigma$ was generally taken to be 4 cm, and the proportions were initially assumed to be equal.
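The age assignment and the non-overlapping bounds can be sketched as follows. The growth model used to turn ages into starting lengths (the seasonal VB log k model) is not reproduced, and the lower bounds are taken as inputs, for example as read from a table like Table 1. The function names and the numbers in the usage lines are hypothetical, not the values in Table 1.

```python
def assigned_age(k, j):
    """Average age (years) of age class k in half-month index j, measured
    from the assumed spawning midpoint of 1 January: k + (j - 0.5) / 24."""
    return k + (j - 0.5) / 24.0


def group_bounds(lower, upper_last):
    """Build intervals [a_k, b_k] with b_k = a_{k+1} for k = 1..K-1 and
    b_K = upper_last; `lower` holds a_1..a_K in strictly increasing order."""
    if not lower:
        raise ValueError("need at least one lower bound")
    if any(x >= y for x, y in zip(lower, lower[1:])) or upper_last <= lower[-1]:
        raise ValueError("bounds must be strictly increasing")
    upper = list(lower[1:]) + [upper_last]
    return list(zip(lower, upper))


print(assigned_age(1, 2))                        # 1.0625
print(group_bounds([30.0, 60.0, 85.0], 130.0))   # hypothetical bounds in cm
```

Sharing each internal bound between two neighbouring groups is what prevents the optimizer from swapping component labels during the fit.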
Proceeding in this way, we fitted a Gaussian mixture model to each available half-month of South Australian length frequency data from 1964/1965 to 1988/1989. The results for selected half-months in the 1981/1982 season are shown in Figure 1. The patterns seen are typical: the data display a number of modes, although the number of components of the fitted mixture does not necessarily equal the number of modes; the fit is sometimes excellent, and sometimes problematic; when the half-month has only a few fish, the fit can appear very bad, and the fitted components are little more than mathematical artefacts. In all but a few half-months, the allocation of fitted components to age groups was easy and unequivocal. Occasionally, an extra age group (usually a young one-year-old group in the latter part of the season) appeared, perhaps due to two peak times in spawning (Davis and Nurhakim, 2001). In such cases, we usually fitted a single age component to the bimodal group (for example, see Figure 1, half-month 7).
The data retained for the next phase of analysis were the fitted component means and their standard errors, estimated by inverting the observed information matrix. We estimated the number of fish in each age group by multiplying the effective sample size ($\sum_i m_i$) by the estimated proportion in that age group. As an example, the summary statistics for the most complex panel in Figure 1, half-month 7 of 1982, are set out in Table 2. Groups with fewer than 50 fish were not included in the next phase of analysis: informal graphical analysis suggested that the maximum likelihood estimates of the standard errors, which rely on asymptotic theory, were not reliable for smaller groups.
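The bookkeeping for this step, estimating the fish in each group and dropping groups below 50, can be sketched as below. The standard errors are taken as inputs because they come from the inverted information matrix of the mixture fit. The function name and example numbers are hypothetical.

```python
def retained_groups(effective_counts, proportions, means, ses, min_fish=50.0):
    """Estimate fish per age group as (sum of effective counts) x proportion
    and keep (group, mean, se, n) only for groups with at least min_fish fish."""
    total = sum(effective_counts)
    kept = []
    for k, (p, mu, se) in enumerate(zip(proportions, means, ses), start=1):
        n_k = total * p
        if n_k >= min_fish:
            kept.append((k, mu, se, n_k))
    return kept


# Hypothetical half-month: 400 effective fish split 50% / 40% / 10%.
for row in retained_groups([150.0, 250.0], [0.5, 0.4, 0.1],
                           [52.1, 71.8, 90.3], [0.3, 0.4, 1.5]):
    print(row)
```

Here the third group (about 40 fish) is dropped, so only the first two summaries would pass to the growth-modelling stage.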
The fitted component means (for groups estimated to have at least 50 fish) for the 1960s, 1970s, and 1980s are shown in Figures 2, 3, and 4, respectively. Here, the estimated means
Note that although the age 5+ component means are shown in these figures, they are not included in subsequent growth modelling because they encompass all fish of age 5 years and more present in the sample.
| 4 Fitting a growth model |
|---|
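We assume that we have performed a mixture decomposition on the data for each half-month, and that we have generated a mean and an accompanying standard error for each age group. Denote these by $y_{ijk}$ and $s_{ijk}$, respectively, where $i$ indexes season, $j$ half-month, and $k$ age group. We model the component means as

$$y_{ijk} = \mu(a_{ijk}; \boldsymbol\theta) + \alpha_i + \beta_{ij} + \gamma_{ik} + \varepsilon_{ijk} + \delta_{ijk}, \qquad (2)$$

where $\mu(a; \boldsymbol\theta)$ is the mean growth curve with parameter vector $\boldsymbol\theta$, $a_{ijk}$ is the age assigned to age group $k$ in half-month $j$ of season $i$, and $\alpha_i$, $\beta_{ij}$, $\gamma_{ik}$, $\varepsilon_{ijk}$, and $\delta_{ijk}$ are all independent random effects. We assume that $\alpha_i \sim N(0, \sigma_\alpha^2)$, $\beta_{ij} \sim N(0, \sigma_\beta^2)$, $\gamma_{ik} \sim N(0, \sigma_\gamma^2)$, $\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$, and $\delta_{ijk} \sim N(0, s_{ijk}^2)$, where $\alpha_i$ represents a random between-season effect, $\beta_{ij}$ a within-season random half-month effect, $\gamma_{ik}$ a within-season random age effect, and $\varepsilon_{ijk}$ a within-season half-month by age interaction. Finally, $\delta_{ijk}$ represents sampling error; its variance is assumed to be known, equal to the square of the standard error $s_{ijk}$ obtained in the previous step. Strictly speaking, $\delta_{ijk}$ and $\delta_{ijk'}$ for $k \ne k'$ and a given $i$ and $j$ should be correlated, because the component means of a half-month are estimated jointly from the same mixture fit; the model ignores this correlation.

Before proceeding, we note that some years may be more favourable for growth than others due to, say, better environmental conditions or reduced competition (e.g. smaller population size). The random between-season effect is meant to capture this. We might also expect some cohorts to grow faster than others. It would be possible to include random cohort effects in the model as well as, or instead of, the between-season effects, although fitting the model is then more complicated and the between-season effects and the cohort effects are likely to be confounded. We believe that random between-season variations are more natural, and therefore adopt the simpler model given in Equation (2). If desired, non-random cohort effects could be included in the growth model by making growth rate parameters vary as a smooth function of cohort. We hope to outline this generalization in a future publication.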
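To make the crossed and nested structure of model (2) concrete, the sketch below simulates one season of component means with numpy. One between-season draw is shared by every datum, one draw per half-month and per age group is shared within those levels, and the interaction and sampling errors are drawn per datum. The argument names and the dictionary keys are illustrative assumptions.

```python
import numpy as np


def simulate_season(mean_curve, half_month, age_group, se, sd, rng):
    """Draw one season of component means from the random-effects model.

    mean_curve : growth-curve means mu(a_jk), one per datum
    half_month, age_group : integer labels j and k, one per datum
    se : known sampling standard errors s_jk, one per datum
    sd : dict of standard deviations with keys 'alpha', 'beta', 'gamma', 'eps'
    """
    half_month = np.asarray(half_month)
    age_group = np.asarray(age_group)
    n = len(half_month)
    alpha = rng.normal(0.0, sd["alpha"])                  # shared by the season
    hm_levels, hm_idx = np.unique(half_month, return_inverse=True)
    ag_levels, ag_idx = np.unique(age_group, return_inverse=True)
    beta = rng.normal(0.0, sd["beta"], size=len(hm_levels))[hm_idx]    # per half-month
    gamma = rng.normal(0.0, sd["gamma"], size=len(ag_levels))[ag_idx]  # per age group
    eps = rng.normal(0.0, sd["eps"], size=n)              # half-month x age interaction
    delta = rng.normal(0.0, np.asarray(se, dtype=float))  # sampling error
    return np.asarray(mean_curve, dtype=float) + alpha + beta + gamma + eps + delta


rng = np.random.default_rng(42)
y = simulate_season([52.0, 71.0, 55.0, 74.0], [1, 1, 5, 5], [1, 2, 1, 2],
                    [0.3, 0.5, 0.3, 0.6],
                    {"alpha": 1.0, "beta": 0.8, "gamma": 0.5, "eps": 0.4}, rng)
print(y)
```

Simulating from the model in this way is a cheap check that a fitting routine can recover known variance components before it is applied to real summaries.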
The mean growth curve typically depends on several (at least three) parameters. A minimal model is the von Bertalanffy growth curve:

$$\mu(a) = \mu_\infty\left\{1 - \exp\left[-k(a - a_0)\right]\right\},$$

where $\mu_\infty$, $k$, and $a_0$ are parameters to be estimated. They represent mean asymptotic length, rate of growth, and the (theoretical) age at which a fish has length zero, respectively. It would be possible to add random effects to $\mu_\infty$, to make $k$ depend on cohort (as mentioned above), and to make $a_0$ vary with age group, for example, but we prefer to fit the most parsimonious model we can to the data. However, one complication cannot be ignored. We have already seen that growth within seasons is faster in summer than in winter, and any model fitted to length frequency data should capture this effect. We incorporate a within-season growth pattern into the von Bertalanffy growth curve using a sine function as follows:

$$\mu(a) = \mu_\infty\left\{1 - \exp\left[-k(a - a_0) - \frac{u_s k}{2\pi}\Bigl(\sin 2\pi(a - w_s) - \sin 2\pi(a_0 - w_s)\Bigr)\right]\right\},$$

where $u_s$ and $w_s$ are amplitude and phase parameters. We require $0 \le u_s \le 1$ to guarantee that growth is monotonic, and also $-0.5 < w_s \le 0.5$ (any bounds with a span of one could have been chosen because of the periodicity of the sine function).
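A sketch of the standard and seasonal von Bertalanffy curves follows. It assumes the common sine-modulated form $\mu(a) = \mu_\infty\{1 - \exp[-k(a - a_0) - (u_s k/2\pi)(\sin 2\pi(a - w_s) - \sin 2\pi(a_0 - w_s))]\}$. In this form the exponent increases at rate $k\{1 + u_s\cos 2\pi(a - w_s)\}$, which is non-negative whenever $0 \le u_s \le 1$. Apart from $\mu_\infty = 185$ cm, the parameter values in the usage lines are hypothetical.

```python
import math


def vb(a, mu_inf, k, a0):
    """Standard von Bertalanffy mean length at age a."""
    return mu_inf * (1.0 - math.exp(-k * (a - a0)))


def seasonal_vb(a, mu_inf, k, a0, u, w):
    """Sine-modulated (seasonal) von Bertalanffy mean length at age a.

    The exponent grows at rate k * (1 + u * cos(2*pi*(a - w))), so the curve
    is non-decreasing for 0 <= u <= 1; the seasonal multiplier on that rate
    peaks when a - w is a whole number.
    """
    if not (0.0 <= u <= 1.0):
        raise ValueError("amplitude u must lie in [0, 1]")
    two_pi = 2.0 * math.pi
    season = (u * k / two_pi) * (math.sin(two_pi * (a - w)) - math.sin(two_pi * (a0 - w)))
    return mu_inf * (1.0 - math.exp(-k * (a - a0) - season))


# Hypothetical parameters with mu_inf fixed at 185 cm; ages in years from 1 January.
for age in (1.0, 1.25, 1.5, 1.75, 2.0):
    print(age, round(seasonal_vb(age, 185.0, 0.25, -0.6, 0.8, 0.15), 2))
```

Setting `u = 0` recovers the standard curve exactly, which is a convenient check when moving between the two models.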
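We estimate the parameters by maximizing the likelihood. The data for different seasons are assumed to be independent, so the total log-likelihood is the sum of the season log-likelihoods. Within season $i$, we can write the model in vector form as

$$\mathbf{y}_i = \boldsymbol\mu_i + \alpha_i\mathbf{1} + X_\beta\boldsymbol\beta_i + X_\gamma\boldsymbol\gamma_i + \boldsymbol\varepsilon_i + \boldsymbol\delta_i,$$

where $\mathbf{1}$ is a vector of ones and $X_\beta$ and $X_\gamma$ are design matrices; that is, $(X_\beta)_{mj} = 1$ if datum $m$ belongs to half-month $j$ and is zero otherwise, and $(X_\gamma)_{mk} = 1$ if datum $m$ belongs to age group $k$ and is zero otherwise.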
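The indicator design matrices for one season can be built directly from the half-month and age-group labels of its data. This numpy sketch creates columns only for levels observed in the season; the function name is hypothetical.

```python
import numpy as np


def design_matrices(half_month, age_group):
    """Indicator design matrices for one season.

    Returns (X_beta, X_gamma, hm_levels, age_levels) with
    X_beta[m, j] = 1 if datum m is in the j-th observed half-month and
    X_gamma[m, k] = 1 if datum m is in the k-th observed age group.
    """
    hm_levels, hm_idx = np.unique(np.asarray(half_month), return_inverse=True)
    ag_levels, ag_idx = np.unique(np.asarray(age_group), return_inverse=True)
    n = len(hm_idx)
    X_beta = np.zeros((n, len(hm_levels)))
    X_gamma = np.zeros((n, len(ag_levels)))
    X_beta[np.arange(n), hm_idx] = 1.0   # one 1 per row: the datum's half-month
    X_gamma[np.arange(n), ag_idx] = 1.0  # one 1 per row: the datum's age group
    return X_beta, X_gamma, hm_levels, ag_levels


Xb, Xg, hm, ag = design_matrices([-1, -1, 2, 2, 2], [1, 2, 1, 2, 3])
print(Xb)
print(Xg)
```

Each row of either matrix contains exactly one 1, which is why the products $X_\beta X_\beta'$ and $X_\gamma X_\gamma'$ simply mark pairs of data sharing a half-month or an age group.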
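We need to compute the likelihood within a season; we drop the subscript $i$ for notational convenience. The log-likelihood is

$$\ell = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|V| - \frac{1}{2}(\mathbf{y} - \boldsymbol\mu)'V^{-1}(\mathbf{y} - \boldsymbol\mu),$$

where $n$ is the number of data values in the season,

$$V = \sigma_\alpha^2\mathbf{1}\mathbf{1}' + \sigma_\beta^2 X_\beta X_\beta' + \sigma_\gamma^2 X_\gamma X_\gamma' + \sigma_\varepsilon^2 I + \Delta,$$

and $\Delta$ is the diagonal matrix with diagonal elements $d_{mm} = s_m^2$. We can write $V$ as

$$V = D + XX',$$

where $D = \sigma_\varepsilon^2 I + \Delta$ is a diagonal matrix and $X = [\sigma_\alpha\mathbf{1},\ \sigma_\beta X_\beta,\ \sigma_\gamma X_\gamma]$.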
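For checking purposes, the season log-likelihood can be evaluated directly with dense $n \times n$ algebra. This numpy sketch assumes $V = D + XX'$ with $D = \sigma_\varepsilon^2 I + \mathrm{diag}(s^2)$ and $X = [\sigma_\alpha\mathbf{1}, \sigma_\beta X_\beta, \sigma_\gamma X_\gamma]$; the function and argument names are illustrative.

```python
import numpy as np


def season_loglik(y, mu, X_beta, X_gamma, s, sig_alpha, sig_beta, sig_gamma, sig_eps):
    """Gaussian log-likelihood of one season's component means with
    V = D + X X',  D = sig_eps^2 I + diag(s^2),
    X = [sig_alpha * 1, sig_beta * X_beta, sig_gamma * X_gamma]."""
    y = np.asarray(y, dtype=float)
    r = y - np.asarray(mu, dtype=float)
    n = len(y)
    X = np.hstack([sig_alpha * np.ones((n, 1)), sig_beta * X_beta, sig_gamma * X_gamma])
    V = np.diag(sig_eps**2 + np.asarray(s, dtype=float) ** 2) + X @ X.T
    sign, logdet = np.linalg.slogdet(V)
    if sign <= 0:
        raise ValueError("V is not positive definite")
    quad = r @ np.linalg.solve(V, r)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
```

This direct version costs on the order of $n^3$ operations per evaluation, which motivates the order-$p$ reformulation given in Appendix A.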
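Once the log-likelihood of the full data set over all seasons has been maximized, we need to compute the estimated random effects. It is customary to use the best linear unbiased predictors (BLUPs), evaluated at the parameter estimates; within each season,

$$\hat\alpha = \hat\sigma_\alpha^2\,\mathbf{1}'\hat V^{-1}(\mathbf{y} - \hat{\boldsymbol\mu}), \qquad \hat{\boldsymbol\beta} = \hat\sigma_\beta^2\,X_\beta'\hat V^{-1}(\mathbf{y} - \hat{\boldsymbol\mu}),$$

$$\hat{\boldsymbol\gamma} = \hat\sigma_\gamma^2\,X_\gamma'\hat V^{-1}(\mathbf{y} - \hat{\boldsymbol\mu}), \qquad \hat{\boldsymbol\varepsilon} = \hat\sigma_\varepsilon^2\,\hat V^{-1}(\mathbf{y} - \hat{\boldsymbol\mu}).$$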
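The BLUPs share the single vector $V^{-1}(\mathbf{y} - \boldsymbol\mu)$, so they can be computed together. A numpy sketch for one season follows, taking residuals, design matrices, the fitted $V$, and the fitted standard deviations as inputs. Names are illustrative.

```python
import numpy as np


def blups(r, X_beta, X_gamma, V, sig_alpha, sig_beta, sig_gamma, sig_eps):
    """BLUPs of the random effects in one season, given residuals
    r = y - mu_hat and the fitted covariance matrix V."""
    v = np.linalg.solve(V, np.asarray(r, dtype=float))   # V^{-1} r, computed once
    alpha_hat = sig_alpha**2 * v.sum()                  # sigma_alpha^2 1' V^{-1} r
    beta_hat = sig_beta**2 * (X_beta.T @ v)             # one value per half-month
    gamma_hat = sig_gamma**2 * (X_gamma.T @ v)          # one value per age group
    eps_hat = sig_eps**2 * v                            # one value per datum
    return alpha_hat, beta_hat, gamma_hat, eps_hat
```

When the between-season variance is very large relative to the other components, $\hat\alpha$ approaches the (precision-weighted) mean residual for the season, which is a useful sanity check on an implementation.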
Approximate standard errors for the parameter estimates can also be calculated by evaluating the inverted observed information matrix, $\left[-\partial^2\ell/\partial\boldsymbol\psi\,\partial\boldsymbol\psi'\right]^{-1}$, where $\boldsymbol\psi$ is the vector of parameters, at the maximum of the log-likelihood function.
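When analytic second derivatives are not available, the observed information can be approximated numerically. The sketch below substitutes central finite differences for any analytic Hessian; the function name and step size are illustrative assumptions. The usage check is a normal mean with known standard deviation 2 and 100 observations, for which the standard error should be $2/\sqrt{100} = 0.2$.

```python
import numpy as np


def observed_info_se(loglik, psi_hat, h=1e-4):
    """Standard errors from the inverse of the observed information
    -d^2 loglik / d psi d psi', approximated by central finite differences."""
    psi_hat = np.asarray(psi_hat, dtype=float)
    p = len(psi_hat)
    H = np.zeros((p, p))
    for a in range(p):
        for b in range(p):
            ea = np.zeros(p); ea[a] = h
            eb = np.zeros(p); eb[b] = h
            H[a, b] = (loglik(psi_hat + ea + eb) - loglik(psi_hat + ea - eb)
                       - loglik(psi_hat - ea + eb) + loglik(psi_hat - ea - eb)) / (4.0 * h * h)
    cov = np.linalg.inv(-H)          # inverted observed information
    return np.sqrt(np.diag(cov))


x = np.linspace(0.0, 10.0, 100)
se = observed_info_se(lambda psi: -np.sum((x - psi[0]) ** 2) / 8.0, [x.mean()])
print(se)  # approximately [0.2]
```

Finite differences are sensitive to the step size and to how precisely the optimizer located the maximum. Analytic derivatives, where available, are preferable.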
Trials of this method proved reasonably satisfactory. The parameter estimates from fitting a seasonal von Bertalanffy growth model to the 1960s', 1970s', and 1980s' summary statistics are set out in Table 3, along with their estimated standard errors. We fixed $\mu_\infty$ at 185 cm because the oldest (and longest) fish included in the growth analysis were four-year-olds, which are still only between 50% and 60% of their asymptotic length. Analyses of tag-recapture data and direct ageing data (Laslett et al., 2002; Polacheck et al., 2003) both suggested that 185 cm is a reasonable value for $\mu_\infty$.
We also fitted the model allowing $\mu_\infty$ to be a free parameter. In all decades, […] was a free parameter versus when it was fixed; however, the resulting mean growth curves were almost indistinguishable within the limited range of the data (ages 1 to 4). When $\mu_\infty$ was free, the parameter estimates were not biologically plausible; for example, $\mu_\infty$ was estimated to be 280 cm in the 1980s, which is far too large to be realistic given the maximum length of southern bluefin tuna (SBT) ever caught in adult fisheries.
The fits of the growth model with $\mu_\infty$ fixed at 185 cm were examined using residual plots (Figure 5). Overall, the fits look quite good; however, there is a slight convex pattern in the 1980s' residuals and, to a lesser degree, in the 1960s' residuals. This may reflect the findings of previous studies (Laslett et al., 2002; Hearn and Polacheck, 2003), which showed that the von Bertalanffy growth curve cannot adequately capture the fast early growth of southern bluefin tuna. Fitting a more complex growth curve, such as the VB log k model introduced in Laslett et al. (2002), may remove the pattern in the residuals, but at the cost of more parameters to estimate.
In any case, the simple von Bertalanffy fits were adequate to make some general observations. The growth parameters for the 1960s and 1970s are very similar, but growth appears to be faster in the 1980s. The seasonal parameters $u_s$ and $w_s$ appear to be quite plausible; the $w_s$ values suggest that growth is fastest during February/March, which is consistent with our belief that southern bluefin tuna experience a period of fast growth during the southern summer. The estimated standard deviations of the random effects vary in magnitude between decades, but suggest that most of the random effects contribute significantly to explaining variation in growth. Only one random-effect standard deviation in the 1960s and one in the 1980s did not differ significantly from zero (Table 3): the former has a wide confidence interval that encompasses zero, and the latter converged to the set lower bound of 0.01. Dropping the corresponding term from the 1960s model and from the 1980s model led to insignificant increases in the negative log-likelihood values.
An interesting pattern of random effects emerged from a combined analysis of all the length frequency data (from all decades). The estimated between-season effects are shown in Figure 6. According to the model, these are estimated realizations of independent random effects, but they follow a reasonably smooth trend. It seems likely that growth has been changing systematically over these three decades, and that this should be incorporated into a time-dependent growth model. We attempted this in Appendix 8 of Polacheck et al. (2003). However, the method outlined in this article is considerably faster, simpler, and more flexible, and represents an easy way to elicit quite complex growth trends in the data without imposing preconceptions about the patterns of growth. The pattern seen in Figure 6 supports our previous finding that growth was faster in the 1980s than in the 1960s, and that the 1970s were a period of highly variable growth. The complex pattern seen in the 1970s may explain the poor separation between age groups in Figure 3.
A residual plot for the combined fit (Figure 7) shows more explicitly that age 1 fish in the 1980s were similar in length to age 1 fish in the previous two decades, but that these fish grew faster and were noticeably larger by age 4. The larger variability in the 1970s is also obvious in the residuals.
Finally, for completeness, we fitted the growth model to the combined data with a random cohort effect instead of a random between-season effect. The pattern of estimated cohort effects was almost identical to that in Figure 6. Both models suggest that there are extended periods of good and bad growth, often lasting for several years. For fish of ages 1 to 4, whose year of birth and year of capture are close together, this makes cohort effects and between-season effects highly confounded and difficult to separate.
| 5 Discussion |
|---|
The results of this study confirm the usefulness of length frequency data for understanding growth processes, in particular for detecting and modelling within-season growth. A number of previous methods have been proposed for extracting growth information from length frequency data, the most commonly cited being MULTIFAN (Fournier et al., 1990). Our method emphasizes features that are not considered in MULTIFAN and other methods.
Our method is a two-stage analysis in which a modal decomposition is performed in the first stage without making any assumptions about the pattern of growth. Single-stage methods, such as MULTIFAN and that of Leigh and Hearn (2000), constrain the modal estimates to lie on (or near) a parametric growth curve. In our experience, length frequency growth patterns do not conform tightly to parametric growth curves. There appear to be significant additional sources of variation operating between seasons, between age groups within seasons, between half-months within seasons, and between age groups and half-months within seasons (interactive effects). We suspect that some of these stem from temporal and spatial heterogeneity among age classes and changing fishing practices. As such, we doubt that it is possible to offer explanations for all of these effects in terms of covariates, and suggest that initially they should be modelled as independent hierarchical and crossed random effects. Otherwise, standard errors of growth parameters derived from length frequency data are likely to be optimistically small.
In the second stage of our analysis, we fit a growth model to the summary statistics derived in the first stage. This allows us to explore the choice of growth curve and to investigate the possible sources of variation. For example, broad changes in growth from year to year can be estimated by incorporating between-season random effects. For southern bluefin tuna, these estimated between-season effects appear to have been rather smooth, and suggest that they should be modelled as systematic effects rather than as random effects. Quantitative modelling of this trend is an extensive topic. Some initial ideas for modelling such trends are presented in Appendices 8 and 10 of Polacheck et al. (2003). Year-to-year variation in growth, as well as other sources of variation, has not been explored in previously documented methods.
Despite the complexity in growth of southern bluefin tuna, each decade of data consistently exhibits a seasonal pattern in which growth is fastest over the summer, then flattens off in autumn. We propose that this seasonal growth can be modelled using a sine curve with amplitude and phase parameters estimated from the data. In contrast, Leigh and Hearn (2000) recommended a combined analysis of each season's data using a separate linear model within each age group. Their approach ignores seasonal growth patterns. Likewise, the documented and publicly available version of MULTIFAN does not incorporate seasonal growth (although, the software could undoubtedly be modified so that it does).
There are some minor disadvantages to our two-stage method of analysis. For the overwhelming majority of half-months, the attribution of mixture components to age groups was straightforward, but presented difficulties in a few cases. For example, occasionally double modes attributable to only a single age group were evident in the data. Single-stage methods, such as MULTIFAN, that constrain the modes to a parametric growth curve do not generally have such difficulties. However, while this is a convenient feature, it can also potentially bias the modal estimates and mask sources of variation in the data. It would be of interest, if only for comparison, to develop a single-stage method of analysis in which mixtures with additional random effects are fitted to the whole data set in one sweep. The likelihood to do so involves multidimensional integration. We devised a Markov Chain Monte Carlo method that carries out the integrations implicitly, but it suffered from the problem of "label switching" (Titterington et al., 1985, p. 92), leading to false convergence on simulated data. The usual remedy of placing sensible bounds on component means (see our paragraph on starting values) did not work for the more general model. We recommend this as a problem for future research. Another disadvantage of our approach is that it is not useful for older fish. Single-stage methods that integrate the fitting of the growth model with the modal decomposition can make use of the length information for older fish.
Many growth curves, including the familiar von Bertalanffy curve used here, are parameterized in terms of asymptotic length; however, length frequency data tend to lack information on older individuals. When data on older fish are missing, allowing the asymptotic length parameter to be estimated freely can still lead to good length predictions within the age range of the data, but will often give very unrealistic predictions outside this range. Furthermore, the parameter estimates of $\mu_\infty$ and $k$ are often unrealistic and cannot be interpreted meaningfully. We found this to be true in our analysis of the southern bluefin tuna data; therefore, we chose to use external information to fix $\mu_\infty$ at a realistic value. We used a value of 185 cm based on previous growth studies that used other data sources, but the results were not sensitive to small changes in the value chosen. By fixing $\mu_\infty$, we obtained meaningful estimates of the growth rate parameter $k$ that could be compared between decades to see how growth rates of tuna have changed over time. Had we fitted the growth model to each decade with $\mu_\infty$ as a free parameter, it would have been misleading to compare parameter values between decades.
One referee suggested that we could have avoided the need to fix $\mu_\infty$ by using an alternative parameterization of the von Bertalanffy growth curve that specifies the curve in terms of length at any two ages rather than asymptotic length (e.g. Schnute, 1981). This is not the case. If we fit a von Bertalanffy growth model using a different parameterization, the fitted growth curve will be exactly the same as the curve derived by fitting the model using the standard parameterization, so long as all parameters are estimated freely in both models. The predicted lengths would still be unrealistic outside the range of the data, and the parameter values ($k$ in particular) would still lack a meaningful interpretation. The problem stems from a lack of data, and no parameterization can solve that problem. The main advantage of using an alternative parameterization, such as the one proposed by Schnute (1981), is that parameter estimation tends to be more stable.
Length frequency data provide unique and valuable information for modelling growth, but as just discussed, they are usually inadequate by themselves to model growth over the lifespan of a species. Other sources of growth information can be useful in this regard. For many species, such as southern bluefin tuna, direct ageing of hard parts can provide direct age and length information on older fish, and hence the mean asymptotic length. Tag-recapture data can provide unique information on individual fish variation because there are two measurements per fish rather than one. A comprehensive model of growth that integrates length frequency, tag-recapture, and direct ageing data is presented in Eveson et al. (in press) and applied to southern bluefin tuna data. The length frequency model described in the current paper makes up one component of the integrated model.
Appendix A
To calculate the likelihood from first principles, we assume that we are at a given iteration of an optimization routine, so that all parameter values are known. When calculating the likelihood, we need to compute $|D + XX'|$, where X is an $n \times p$ matrix, n is the number of data values in the season, and p is the number of design levels. Generally, $n > p$, and sometimes $n \gg p$. Table 4 presents the values of n and p for the 1980s length frequency data from South Australia; for example, $n_i = 269$ and $p_i = 148$. The number of linear algebra operations involving matrices of order n is generally proportional to $n^3$, so if we can work with matrices of order p rather than n, we should increase the speed by a factor of about $(269/148)^3 \approx 6$. On our computer, this means about 2 or 3 min instead of 15 min, which is a substantial saving.
|
We can use two well-known identities to reduce the amount of computation in this way. First, $|D + XX'| = |D|\,|I_p + X'D^{-1}X|$, where $|A|$ denotes the determinant of the square matrix A and $I_r$ is the $r \times r$ identity matrix (Graybill, 1983, p. 494). Now, $|D + XX'|$ is the determinant of an $n \times n$ matrix and $|I_p + X'D^{-1}X|$ is that of a $p \times p$ matrix, so the latter requires much less computer time to calculate.
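A minimal numerical check of the determinant identity, assuming (as the computational argument suggests, though it is not stated explicitly) that D is diagonal, so that $|D|$ and $D^{-1}$ are trivial to obtain. The matrix sizes match the $n_i = 269$, $p_i = 148$ quoted earlier, but D and X are random stand-ins rather than the tuna data, and the function names are hypothetical. Log-determinants are used to avoid overflow.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 269, 148
d = rng.uniform(0.5, 2.0, size=n)          # diagonal of D (assumed diagonal)
X = rng.normal(scale=0.3, size=(n, p))     # random stand-in for the n x p matrix X

def logdet_full(d, X):
    """log|D + XX'| computed from the full n x n matrix."""
    _, logdet = np.linalg.slogdet(np.diag(d) + X @ X.T)
    return logdet

def logdet_reduced(d, X):
    """log|D| + log|I_p + X'D^{-1}X|: only a p x p determinant is needed."""
    p = X.shape[1]
    Dinv_X = X / d[:, None]                # D^{-1} X without forming D^{-1}
    _, logdet_small = np.linalg.slogdet(np.eye(p) + X.T @ Dinv_X)
    return np.sum(np.log(d)) + logdet_small

print(logdet_full(d, X), logdet_reduced(d, X))
```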
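Second, we need to compute the quadratic form $(y - \mu)'(D + XX')^{-1}(y - \mu)$, where y is the vector of data values and µ is its expected value under the model. It can easily be checked that

$$(D + XX')^{-1} = D^{-1} - D^{-1}X\,(I_p + X'D^{-1}X)^{-1}X'D^{-1}.$$

Let $\beta = X'D^{-1}(y - \mu)$. Then

$$(y - \mu)'(D + XX')^{-1}(y - \mu) = (y - \mu)'D^{-1}(y - \mu) - \beta'\gamma,$$

where γ is the solution of the p equations $(I_p + X'D^{-1}X)\gamma = \beta$. We can thus compute the log-likelihood using matrices of order p.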
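A sketch of the order-p log-likelihood calculation, combining the determinant identity with $(D + XX')^{-1} = D^{-1} - D^{-1}X(I_p + X'D^{-1}X)^{-1}X'D^{-1}$ (the Woodbury identity). It assumes D is diagonal; y and µ stand for the data vector and its model mean, and the function names and synthetic data are hypothetical. The direct $n \times n$ version is included only as a check.

```python
import numpy as np

def loglik_full(y, mu, d, X):
    """Gaussian log-likelihood with covariance D + XX', using n x n matrices."""
    V = np.diag(d) + X @ X.T
    r = y - mu
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))

def loglik_reduced(y, mu, d, X):
    """Same log-likelihood, but only p x p systems are solved."""
    r = y - mu
    p = X.shape[1]
    Dinv_r = r / d
    A = np.eye(p) + X.T @ (X / d[:, None])   # I_p + X'D^{-1}X
    beta = X.T @ Dinv_r                      # beta = X'D^{-1}(y - mu)
    gamma = np.linalg.solve(A, beta)         # solves A gamma = beta
    quad = r @ Dinv_r - beta @ gamma         # (y - mu)'(D + XX')^{-1}(y - mu)
    _, logdet_A = np.linalg.slogdet(A)
    logdet = np.sum(np.log(d)) + logdet_A    # log|D + XX'|
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

# Synthetic data drawn with covariance D + XX' (illustration only).
rng = np.random.default_rng(1)
n, p = 200, 40
d = rng.uniform(0.5, 2.0, size=n)
X = rng.normal(scale=0.3, size=(n, p))
mu = rng.normal(100.0, 10.0, size=n)
y = mu + np.sqrt(d) * rng.normal(size=n) + X @ rng.normal(size=p)
print(loglik_full(y, mu, d, X), loglik_reduced(y, mu, d, X))
```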
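It is sometimes helpful to calculate the gradient of the likelihood as well as the likelihood itself, because this can speed up optimization routines. For a growth parameter $\theta_j$, which affects only the mean vector µ, the derivative of the log-likelihood ℓ is

$$\frac{\partial \ell}{\partial \theta_j} = \left(\frac{\partial \mu}{\partial \theta_j}\right)'(D + XX')^{-1}(y - \mu),$$

where y is the vector of data values. The vector $(D + XX')^{-1}(y - \mu)$ can again be obtained by solving a system of order p, and $\partial\mu/\partial\theta_j$ is then computed; sometimes there are specific features of the growth curve that can be exploited to help calculate these derivatives quickly.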
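A sketch of this gradient for the ordinary (non-seasonal) von Bertalanffy mean $\mu(a) = \mu_\infty\{1 - e^{-k(a - a_0)}\}$, used here only as an example growth curve; the seasonal curve would need its own derivatives. As an instance of the shortcut mentioned, all three columns of $\partial\mu/\partial\theta$ share the factor $e^{-k(a - a_0)}$, which is evaluated once. The vector $(D + XX')^{-1}(y - \mu)$ is formed as $D^{-1}\{(y - \mu) - X\gamma\}$, where γ solves $(I_p + X'D^{-1}X)\gamma = X'D^{-1}(y - \mu)$. D is assumed diagonal, names and data are hypothetical, and the analytic gradient is checked against central finite differences.

```python
import numpy as np

def vb_mean(theta, ages):
    mu_inf, k, a0 = theta
    return mu_inf * (1.0 - np.exp(-k * (ages - a0)))

def vb_jacobian(theta, ages):
    """Columns are d mu/d mu_inf, d mu/d k and d mu/d a0."""
    mu_inf, k, a0 = theta
    e = np.exp(-k * (ages - a0))             # shared factor, computed once
    return np.column_stack([1.0 - e, mu_inf * (ages - a0) * e, -mu_inf * k * e])

def vinv_times(r, d, X):
    """(D + XX')^{-1} r via the p x p system (I_p + X'D^{-1}X) gamma = X'D^{-1} r."""
    A = np.eye(X.shape[1]) + X.T @ (X / d[:, None])
    gamma = np.linalg.solve(A, X.T @ (r / d))
    return (r - X @ gamma) / d

def loglik(theta, y, ages, d, X):
    V = np.diag(d) + X @ X.T
    r = y - vb_mean(theta, ages)
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))

def grad_growth(theta, y, ages, d, X):
    """d loglik/d theta_j = (d mu/d theta_j)' (D + XX')^{-1} (y - mu)."""
    r = y - vb_mean(theta, ages)
    return vb_jacobian(theta, ages).T @ vinv_times(r, d, X)

rng = np.random.default_rng(2)
n, p = 60, 10
ages = rng.uniform(1.0, 5.0, size=n)
d = rng.uniform(0.5, 2.0, size=n)
X = rng.normal(scale=0.3, size=(n, p))
theta = np.array([150.0, 0.25, -0.4])
y = vb_mean(theta, ages) + np.sqrt(d) * rng.normal(size=n) + X @ rng.normal(size=p)

h = 1e-5  # central finite-difference step
numeric = np.array([
    (loglik(theta + h * np.eye(3)[j], y, ages, d, X)
     - loglik(theta - h * np.eye(3)[j], y, ages, d, X)) / (2 * h)
    for j in range(3)
])
print(grad_growth(theta, y, ages, d, X), numeric)
```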
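The variance parameters are slightly more problematic. For a generic additive error structure, in which each variance parameter $\psi_m$ enters $V = D + XX'$ linearly through a term $\psi_m V_m$ with $V_m$ a known matrix,

$$\frac{\partial \ell}{\partial \psi_m} = -\tfrac{1}{2}\operatorname{tr}\!\left(V^{-1}V_m\right) + \tfrac{1}{2}(y - \mu)'V^{-1}V_mV^{-1}(y - \mu).$$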
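A sketch of the variance-parameter gradient $\partial\ell/\partial\psi_m = -\tfrac{1}{2}\operatorname{tr}(V^{-1}V_m) + \tfrac{1}{2}r'V^{-1}V_mV^{-1}r$, with $r = y - \mu$, the standard derivative of a Gaussian log-likelihood with respect to a variance parameter. It assumes a hypothetical structure $V = D + \sum_m \psi_m Z_m Z_m'$, where each $Z_m$ is a 0/1 design matrix for one random effect (here a crossed year and time-interval effect), so $V_m = Z_m Z_m'$ and the trace becomes $\operatorname{tr}(Z_m'V^{-1}Z_m)$. Names and data are illustrative, and full $n \times n$ solves are used for clarity rather than the order-p reductions.

```python
import numpy as np

def build_V(d, Zs, psis):
    """V = D + sum_m psi_m Z_m Z_m' with D diagonal."""
    V = np.diag(d)
    for Z, psi in zip(Zs, psis):
        V = V + psi * (Z @ Z.T)
    return V

def loglik(psis, r, d, Zs):
    V = build_V(d, Zs, psis)
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (len(r) * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))

def grad_variance(psis, r, d, Zs):
    """-tr(V^{-1} V_m)/2 + r'V^{-1} V_m V^{-1} r/2 for each V_m = Z_m Z_m'."""
    V = build_V(d, Zs, psis)
    Vinv_r = np.linalg.solve(V, r)
    grads = []
    for Z in Zs:
        trace_term = np.sum(Z * np.linalg.solve(V, Z))   # tr(Z_m' V^{-1} Z_m)
        u = Z.T @ Vinv_r                                  # Z_m' V^{-1} r
        grads.append(-0.5 * trace_term + 0.5 * (u @ u))
    return np.array(grads)

rng = np.random.default_rng(3)
n = 50
Z_year = np.eye(5)[np.arange(n) // 10]        # hypothetical year effect (5 levels)
Z_interval = np.eye(10)[np.arange(n) % 10]    # hypothetical time-interval effect
Zs = [Z_year, Z_interval]
d = rng.uniform(0.5, 2.0, size=n)
psis = np.array([0.7, 0.3])
r = rng.multivariate_normal(np.zeros(n), build_V(d, Zs, psis))

h = 1e-6  # central finite-difference step
numeric = np.array([
    (loglik(psis + h * np.eye(2)[m], r, d, Zs)
     - loglik(psis - h * np.eye(2)[m], r, d, Zs)) / (2 * h)
    for m in range(2)
])
print(grad_variance(psis, r, d, Zs), numeric)
```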
Acknowledgements
We are grateful for the cooperation of the southern bluefin tuna industry and the many individuals involved in collecting length samples over the history of the South Australian fishery. We would also like to thank Dr Peter Jones for useful comments on this manuscript and Dr Bill Hearn for initial discussions regarding the data and for providing the stimulus to include a seasonal component in the growth curve. Lastly, we wish to acknowledge the Fisheries Research & Development Corporation (FRDC) for their funding contribution to this project.
References
Brown P.J. (1993) Measurement, Regression and Calibration (Clarendon Press, Oxford) 201 pp.
Davis T.L.O. and Nurhakim S. (2001) Catch monitoring of the fresh tuna caught by the Bali-based longline fishery. CCSBT-SC/0108/11. 12 pp.
Eveson J.P., Laslett G.M., Polacheck T. (2004) An integrated model for growth incorporating tag-recapture, length-frequency and direct ageing data. Canadian Journal of Fisheries and Aquatic Sciences (in press).
Fournier D.A., Sibert J.R., Majkowski J., Hampton J. (1990) MULTIFAN: a likelihood based method for estimating growth and age composition from multiple length frequency data sets illustrated using data from southern bluefin tuna (Thunnus maccoyii). Canadian Journal of Fisheries and Aquatic Sciences 47:301–313.
Graybill F.A. (1983) Matrices with Applications in Statistics, 2nd edn (Wadsworth, Belmont, California) 461 pp.
Hearn W.S. and Polacheck T. (2003) Estimating long-term growth-rate changes of southern bluefin tuna (Thunnus maccoyii) from two periods of tag-return data. Fishery Bulletin 101:58–74.
Hunt L. and Jorgensen M. (1999) Mixture model clustering using the MULTIMIX program. Australian & New Zealand Journal of Statistics 41:153–171.
Laslett G.M., Eveson J.P., Polacheck T. (2002) A flexible maximum likelihood approach for fitting growth curves to tag-recapture data. Canadian Journal of Fisheries and Aquatic Sciences 59:976–986.
Leigh G.M. and Hearn W.S. (2000) Changes in growth of juvenile southern bluefin tuna (Thunnus maccoyii): an analysis of length-frequency data from the Australian fishery. Marine and Freshwater Research 51:143–154.
Majkowski J. (Ed.) (1982) CSIRO database for southern bluefin tuna (Thunnus maccoyii (Castelnau)). CSIRO Marine Laboratories Report No. 142. ISBN 0 643 02761 0. 23 pp.
Majkowski J. and Morris G. (Eds) (1986) Data on southern bluefin tuna (Thunnus maccoyii (Castelnau)): Australian, Japanese and New Zealand systems for collecting, processing and accessing catch, fishing effort, aircraft observation and tag release/recapture data. CSIRO Marine Laboratories Report No. 179. ISBN 0 643 03965 1. 95 pp.
Polacheck T., Laslett G.M., Eveson J.P. (2003) An integrated analysis of the growth rates of southern bluefin tuna for use in estimating the catch at age matrix in the stock assessment. Final report. FRDC Project No. 99/104. ISBN 1 876996 382. 411 pp.
Schnute J. (1981) A versatile growth model with statistically stable parameters. Canadian Journal of Fisheries and Aquatic Sciences 38:1128–1140.
Titterington D.M., Smith A.F.M., Makov U.E. (1985) Statistical Analysis of Finite Mixture Distributions (Wiley, Chichester) 243 pp.