Skip Navigation


ICES Journal of Marine Science: Journal du Conseil Advance Access originally published online on January 8, 2007
ICES Journal of Marine Science: Journal du Conseil 2007 64(2):218-233; doi:10.1093/icesjms/fsl024
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
64/2/218    most recent
fsl024v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Schnute, J. T.
Right arrow Articles by Haigh, R.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Schnute, J. T.
Right arrow Articles by Haigh, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2007 International Council for the Exploration of the Sea. Published by Oxford Journals. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Compositional analysis of catch curve data, with an application to Sebastes maliger

Jon T. Schnute and Rowan Haigh

Fisheries and Oceans Canada, Pacific Biological Station, 3190 Hammond Bay Road, Nanaimo, British Columbia, Canada V9T 6N7

Correspondence to J. T. Schnute: tel: +1 250 756 7146; fax: +1 250 756 7053; e-mail: schnutej{at}pac.dfo-mpo.gc.ca

Schnute, J. T. and Haigh, R. 2007. Compositional analysis of catch curve data, with an application to Sebastes maliger. – ICES Journal of Marine Science, 64: 218–233.

This paper applies modern compositional analysis to catch curve data from a quillback rockfish (Sebastes maliger) population in British Columbia, Canada. Bubble plots and ternary diagrams portray variable age distributions and highlight distinctions between commercial and survey sample data. The models formalize important historical issues in catch curve analysis related to selectivity and recruitment variability, where a particular model corresponds to a prescribed vector of design parameters. The roles that compositional distributions (multinomial, Dirichlet, logistic-normal) can play in fishery data analysis are described, and Bayesian methods are used to examine how the distribution of a key mortality parameter depends on model choice. The framework provides a direct link between model designs and policy outcomes that depend on estimated mortalities or mortality ratios.

Keywords: age composition, catch curve, compositional analysis, Dirichlet distribution, logistic-normal distribution, mortality, quillback rockfish

Received 9 December 2005; accepted 5 October 2006; advance access publication 8 January 2007.


    Introduction
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Estimates of mortality rates in fish populations often come from age and size composition data. Historically, techniques for obtaining these estimates played an important role in the development of quantitative fishery science. For example, Ricker (1975, pp. 29–73) devoted the second chapter of his influential handbook on the statistics of fish populations to this problem. In the simplest case, the proportion of fish declines exponentially with age, and the rate of decline gives a measure of total mortality rate Z = F + M, where F and M denote mortality rates attributable to fishing and natural causes. Ricker (1975, p. 33) cites Baranov (1918) for giving the name catch curve to the graph of log frequency against age or size. Theoretically, some right-hand portion of this curve should follow a descending straight line with slope equal to –Z.

Although Ricker clearly recognized the importance of catch curve analysis for stock assessment, he gave many reasons for applying the technique with caution. For example, younger fish typically experience lower selectivity in the sampling process, and variable recruitment occasionally produces strong year classes that do not fit the exponential decay predicted by theory. Time trends in recruitment and fishing mortality can introduce further complications. These problems generate a profusion of methods for selecting some portion of the age distribution where exponential decay might represent the total mortality. Quinn and Deriso (1999, Section 8.2.2) discuss these problems further and point out (p. 322) that "catch curve analysis from a single year can be a dangerous procedure that should be used only if one knows that no trend in recruitment has been present."

Changes in computing technology and statistical theory now make it possible to extend Ricker's analyses into a more comprehensive framework (Schnute, 2006). Both frequentist and Bayesian methods have evolved to take into account the many recognized sources of error simultaneously. Furthermore, new theories suggest better methods for exploring and analysing data on the composition of a population (Aitchison and Shen, 1980; Aitchison 1985, 1986). Although these techniques cannot circumvent the problems discussed earlier, they do make it possible to examine their consequences systematically.

The analytical framework discussed here can facilitate such investigations. It includes a family of models that admit some of the catch curve scenarios contemplated by Ricker and other authors. Statistical theory plays a major role, where the data consist of proportions yi (i = 1, ... , g) in group i among a total of g groups. Although the observed vector has length g, its effective length is g–1, because the constraint {Sigma}i yi = 1 determines yg from (y1, ... , yg–1). Just as age compositions indicate properties of fish populations, mineral compositions reveal features of geological structures. Aitchison's (1986) theory of compositional data analysis, used particularly by geologists, fits well with the application here. Following his approach, we define relevant terminology, such as a composition operator, subcomposition, amalgamation, and ternary diagram. Our examples use ternary diagrams as devices for exploratory data analysis.

To assess uncertainty, we need to compare observed proportions yi with predicted proportions pi from a specified model. We discuss three approaches to this problem, based on the multinomial, Dirichlet, and logistic-normal distributions. We adopt an exploratory approach by comparing results obtained from a variety of ad hoc trial models. If different models give similar outcomes for a fixed data set, then at least we know that the results are robust to that set of model assumptions. On the other hand, if results vary substantially with model choice, then we have discovered different biological scenarios, consistent with the available data. We use formal criteria, such as Akaike's information criterion (AIC), to guide model selection (Burnham and Anderson, 2002).

The statistical theory described here applies not only to catch curves, but more generally to age- and size-structured population models with proportion data for one or more years. Our model focuses primarily on the data-limited context associated with data from a single year. Our worked example, however, uses a richer data set on quillback rockfish (Sebastes maliger) from British Columbia (BC), Canada. This allows us to examine the results of single-year analyses in a broader context and to illustrate the utility of compositional techniques, such as ternary diagrams.


    Catch curve models
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Our catch curve model generates theoretical proportions pa of fish at age a, based on the combined effects of survival, selectivity, and recruitment. Table 1 summarizes the required notation, and the model itself appears in Table 2. We confine fish ages to the range k ≤ a ≤ A, where k is the youngest age of interest and A denotes a plus class for all ages a ≥ A. Internally, the model computes pa for a broad range of ages up to B >> A, then uses (T2.5) to accumulate the proportion pA in the plus class.


View this table:
[in this window]
[in a new window]

 
Table 1. Notation for the catch curve model.

 


View this table:
[in this window]
[in a new window]

 
Table 2. Catch curve model with age-dependent survivals Sa, selectivities ßa, and recruitments Ra. Calculations depend on a fixed design vector {phi} in (1), and the predicted proportions pa({theta}) vary with the parameter vector {theta} in (2).

 
The total mortality parameter Z determines the exponentially decaying survival curve (T2.1), where cumulative survival Sa from age k to A has the initial value Sk = 1. The model assumes that fish are fully selected by the fishery or sampling programme at age b0. For younger ages the selectivity ßa is given by the asymptotic curve (T2.2). This depends on two parameters: the selectivity ßk (0 < ßk < 1) on the youngest age k, and a positive exponent {alpha} that determines how rapidly ßa increases from ßk to 1 as a increases from k to b0. Across this range, the selectivity curve is concave, linear, or convex when {alpha} > 1, {alpha} = 1, or 0 < {alpha} < 1, respectively. In our examples, we constrain {alpha} between 2 and 25 to ensure a concave selectivity curve. Schnute and Richards (1995) used a similar curve with b0 = A, but we give the model here greater flexibility by allowing b0 to be set arbitrarily.

The model assumes a constant base level of recruitment Ra to each age a, but this can be modified to include m anomalies at specified ages bh (h = 1, ... , m). In actual samples, if a strong year class appears at age bh, then ageing error tends to increase the proportion of fish at nearby ages a. Our model uses a single parameter {tau} to capture this effect, where {tau} increases as the influence of ageing error spreads to a broader range of nearby ages. Although we could allow a different parameter {tau}h for each anomaly, a single value {tau} works reasonably well in our examples below. If circumstances permit, {tau} might be set or assigned a prior distribution from independent reader data. The model uses (T2.3) to calculate the combined effect of all recruitment anomalies, where a standard base level Ra = 1 is incremented by the amount {rho}h at age bh and this effect is spread to nearby ages with the profile of a normal distribution.

The three model equations (T2.1–T2.3) describe the effects of survival, selectivity, and recruitment. Each is normalized to a base level by requiring that Sk = 1, ßa = 1 for large a, and Ra = 1 in the absence of recruitment anomalies. The products SaßaRa determine the profile of age proportions pa(a = k, ... , B). Explicitly, the transformation (T2.4) converts these positive quantities to model predictions pa with the property {Sigma}a = kB pa = 1. Finally, the aggregation (T2.5) gives the proportion pA in the plus class, as mentioned earlier.

In practice, an analyst applying this model would specify the design vector


Formula 024M1

(1)
with m + 5 components, partitioned by semicolons into quantities associated with the age range, selectivity, and recruitment. Predictions {pa}a = kA then depend on the parameter vector


Formula 024M2

(2)
with m + 4 components associated with mortality, selectivity, and recruitment. Because catch curve analysis focuses primarily on Z, the remaining components of {theta} act essentially as nuisance parameters. As discussed in the introduction, an exploratory analysis typically involves numerous trial designs (1), such as choices bh that correspond to prominent modes in the observed data. The model can be used to investigate how the estimates of Z vary among these potential interpretations.

Any useful version of the model includes the survival component Sa in (T2.1), but the selectivity and recruitment anomaly components a, Ra) can be turned on or off. This suggests four cases of particular interest:
Case 1: b0 = k, m = 0; (Sa)
Case 2: b0 > k + 3, m = 0; (Sa, ßa)
Case 3: b0 = k, m ≥ 1; (Sa, Ra)
Case 4: b0 > k + 3, m ≥ 1; (Sa, ßa, Ra)

where labels on the right indicate active components. In particular, the choice b0 = k effectively removes selectivity from the model, because then ßa = 1 for every a. Similarly, recruitment anomalies do not occur when m = 0, so that Ra = 1 for every a. If selectivity is active, the portion of the curve (T2.2) governed by the two parameters ({alpha}, ßk) covers the age range from k to k + b0–1. The condition b0 > k + 3 in Cases 2 and 4 requires that this range should include at least three ages to estimate these two parameters.

Figure 1 illustrates Case 4 for a hypothetical catch curve model when m = 3. Three panels show plots of the components Sa, ßa, and Ra in relation to ages in the range k = 5 to A = 40. The fourth panel shows the resulting proportions pa for each a. Features in the final panel reflect various aspects of the underlying components, such as the three anomalous recruitments. Although the graph represents ages in the range from k to A, the model's internal calculations go up to age B = 200, so the graphs of Sa, ßa, and Ra are not shown for their entire range. However, the results of this extended calculation account for the plus class at age A = 40 in the graph of pa. For proper calculation of pA, age B must be chosen large enough to include fish with a very small probability of survival. For example, if {varepsilon} represents a small number like 10–5, then the requirement SB < {varepsilon} in (T2.1) implies that


Formula 024M3

(3)
In Figure 1, where Z = 0.1, this estimate suggests using B >> 120. We have used a higher value B = 5A = 200, with SB = 3.4 x 10–8.


Figure 1
View larger version (22K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 1. Plots of (a) survival Sa, (b) selectivity ßa, (c) recruitment Ra, and (d) predicted proportions pa as functions of age a in the range k ≤ a ≤ A, based on the catch curve model (T2.1)–(T2.5). This example uses the design vector {phi} in (1) specified by k = 5, A = 40, B = 200, b0 = 20, m = 3, b1 = 10, b2 = 18, and b3 = 30. The parameter vector {theta} in (2) has components Z = 0.1, ßk = 0.5, {alpha} = 3, {tau} = 1.5, {rho}1 = 2, {rho}2 = 6, and {rho}3 = 5. Broken lines indicate (b) the age b0 of full selectivity, (c) ages bh with recruitment anomalies, and (d) proportions pa based on survival only (Case 1 of the catch curve model).

 

    Compositional data
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Catch curve analysis typically begins with age measurements from a sample of n fish. Assume as before that ages a lie in the range k ≤ a ≤ A, with a plus class at age A. Then n = {Sigma}a = kA na, where na denotes the number of fish with age a. The age composition data (nk, ... , nA) give the observed proportions


Formula 024M4

(4)
of fish at each age a, where {Sigma}a = kA ya = 1.

We often need to amalgamate age proportions into g groups, based on a prescribed sequence of ages ai (i = 0, ... , g) with


Formula 024M5

(5)
By definition, group i (i = 1, ... , g) includes all fish with age a in the range ai–1 < a ≤ ai, and


Formula 024M6

(6)
denotes the proportion of fish in this group. We use different indices a and i to distinguish the proportion ya at actual age a from the amalgamated proportion yi within group i. This notation lacks mathematical elegance, because subscripts usually act as dummy variables, but it works effectively in the context here. Similarly, proportions pa calculated from a catch curve model can be amalgamated to give the theoretical proportion pi in group i:


Formula 024M7

(7)

The transition (4), from an age composition vector (nk, ... , nA) to a vector of proportions (yk, ... , yA), represents one of several compositional operators mentioned in the introduction. Because these transformations play an important role in the theory of compositional data analysis, we define them mathematically here. In general, let u = (u1, ... , ug), v = (v1, ... , vg), and y = (y1, ... , yg) denote vectors of real numbers, positive numbers, and proportions, respectively. Thus, –{infty} < ui < +{infty}, 0 < vi < +{infty}, 0 < yi < 1, and {Sigma}i = 1g yi = 1. The three transformations


Formula 024M8

(8)


Formula 024M9

(9)


Formula 024M10

(10)
take real numbers to positives, positives to proportions, and reals to proportions, respectively. We use both (9) and (10) to design probability models for a random proportion vector y, such as the age proportion data in a catch curve analysis.

Aitchison and Shen (1980) used the relationship (9) to define the composition operator C, where y = C[v]. Their theory found a natural home in geology, where C transforms a vector v of observed mineral components to a corresponding vector y of proportions. Similarly, our transformation (4) illustrates an application of C. Geologists also use subcompositions, defined by selecting particular components of a composition. For example, the first three components define the subcomposition y' = (y'1, y'2, y'3) = C[(y1, y2, y3)] = (y1, y2, y3)/(y1+y2+y3). Our transformation (6) represents an amalgamation, which preserves all population components, but lumps some of them together.

When population components are amalgamated into g = 3 groups, vectors of proportions can be portrayed in a graph called a ternary diagram (Aitchison, 1986, p. 5). As illustrated in Figure 2, the diagram begins with an equilateral triangle that has vertices labelled "1", "2", and "3". A vector p = (p1, p2, p3) of proportions can then be represented as a point within this triangle, where the perpendicular distance to the side opposite vertex "i" is proportional to pi. The example here shows the point p = C[(3, 2, 1)] = (1/2, 1/3, 1/6). This lies closest to vertex "1", because of the relatively large proportion p1 with a corresponding large distance p1 from the opposite side between "2" and "3".


Figure 2
View larger version (9K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 2. Ternary diagram for compositions amalgamated into g = 3 groups. The indicated point represents a vector of proportions. Solid lines perpendicular to the sides of an equilateral triangle have lengths proportional to pi (i = 1, 2, 3). Dotted lines facilitate a proof that this method actually works, given the constraint {Sigma}i = 13 pi = 1.

 
Why does this work? Figure 2 illustrates the proof. We need to show that a single constant c can be used to scale any vector p of three proportions and represent it in this way. Suppose that the equilateral triangle has side length a, altitude {surd}3a/2, and area {surd}3a/4. Three dotted lines from the vertices to the point representing p separate the entire triangle into three component triangles, with base a, altitude cpi, and area acpi/2 (i = 1, 2, 3). Equating the area of the triangle to the sum of these three component areas gives


Formula 024M11

(11)
independent of the vector p of proportions. Consequently, the constant c = {surd}3(a/2) scales the proportions pi to the perpendicular distances shown in Figure 2.

Ternary diagrams have the additional property that a straight line from one vertex to the opposite side corresponds to proportions p with a constant odds ratio between the other two components. For example, a line through vertex "3" to the side joining "1" and "2" represents proportions with a constant odds ratio p1/p2. Equivalently, the subcomposition C[(p1, p2)] remains the same along this line. Readers inclined towards geometry can prove this by drawing the figure and observing that any two points on the line define similar triangles with a common ratio p1/p2. Appendix A gives the formulae needed to construct ternary diagrams in a graphical computer language like R or S–PLUS (R Development Core Team 2005; Venables and Ripley, 2000).


    Compositional statistics
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
The catch curve model in Table 2 predicts the proportion pa of fish at each age a, and samples from a survey or commercial fishery give observed proportions ya. Although the observations might be 0 for some ages a, a suitable amalgamation (6) guarantees that each age group i has a positive observed proportion yi. The corresponding amalgamation (7) gives comparable predictions pi. Statistical analysis requires a probability distribution P(y | p) that generates observation vectors y = (y1, ... , yg) from the predicted vector p = (p1, ... , pg). The distribution should have properties that link random observations to the predictions, such as the expected value


Formula 024M12

(12)

Perhaps the simplest probability model for catch curve analysis is the multinomial distribution (Table 3). Given a sample size n and probability vector p, the multinomial model (T3.1) generates a vector v = (n1, ... , ng) representing the number ni of fish observed in each age group i, where n = {Sigma}ini. The composition (T3.2) transforms these observations to proportions. The function P(y | p, n) in (T3.3) gives the explicit probability of observing y, given p and n. Statistical inference depends on the negative log-likelihood function


Formula 024M13

(13)
where K represents any convenient constant that does not depend on the parameters p. In particular, the distribution (T3.3) and constant K = log n!–{Sigma}i log[(nyi)!] give the function {ell}(p | y, n) in (T3.4). The observations yi(i = 1, ... , g) have means (T3.5), variances (T3.6), and covariances (T3.7) determined by the predictions pi and sample size n, where (T3.5) corresponds to the relationship (12).


View this table:
[in this window]
[in a new window]

 
Table 3. The multinomial distribution as a statistical model for the vector of observed proportions y, given the predicted proportions p and the sample size n.

 
In spite of its attractive simplicity, the multinomial distribution has at least two limitations that keep it from dealing adequately with catch curve analysis. First, it assumes a definite sample size n and generates proportions yi that necessarily take discrete fractional values in the set


Formula 024M14

(14)
If the data yi come from more complex sampling schemes, such as a stratified random sample, n may not be properly defined, and the observations ya can take arbitrary values in the interval [0, 1]. Second, according to (T3.5)–(T3.6), the variances Var[yi] should become small as the sample size n becomes large and the observations yi should become close to their expected values E[yi] = pi. In other words, the data should fit the catch curve model almost perfectly if the sample size is large enough. In reality, the model itself is not entirely appropriate, and a realistic analysis requires a distribution that allows potentially large errors, regardless of sample size.

The composition transformation (9) suggests a general technique for modelling stochastic proportions y. Start by generating a random vector v with positive components; then set y = C[v]. The Dirichlet distribution (Table 4) results from this approach, where the vector v has components vi drawn independently from the gamma distributions


Formula 024M15

(15)
with shape parameters npi and a single scale parameter n, as indicated in (T4.1). In our formulation, n plays the role of an effective sample size, where y becomes less variable as n increases. The model allows any value of n >> 0, and uncertainty increases dramatically as n -> 0. The Dirichlet probability distribution (T4.3) formally resembles the multinomial distribution (T3.3), given the linkage


Formula 024M16

(16)
between the factorial and gamma functions. Essentially, the roles of p and y are reversed between the two distributions. For this reason, in a Bayesian framework, the Dirichlet is the conjugate prior distribution for the parameters of the multinomial (Gelman et al., 1995, p. 482). The moment formulae (T4.6)–(T4.8) and (T3.5)–(T3.7) also bear a striking resemblance to each other.


View this table:
[in this window]
[in a new window]

 
Table 4. The Dirichlet distribution as a statistical model for the vector of observed proportions y, given the predicted proportions p and an effective sample size n. The model applies when y comes from the composition (T4.2) of independent variates (T4.1) drawn from gamma distributions with a common scale parameter n >> 0. The approximate modal estimate n in (T4.5) depends on Stirling's approximation (17) to the gamma function.

 
The Dirichlet distribution improves the multinomial for catch curve analysis in two key ways. First, the proportions yi are now continuous variables with 0 < yi < 1, rather than discrete fractions in the set (14). Second, the Dirichlet parameter n can actually be estimated from the observed data y, unlike the prescribed value n for the multinomial. Estimation depends on the negative log-likelihood (T4.4), where (13) defines {ell} from P with K = {Sigma}i log yi. Stirling's approximation


Formula 024M17

(17)
implies the approximate modal estimate (T4.5) for n (Appendix B), where n -> {infty} as the data conform more closely to the model (yi -> pi for each i).

A third model for compositional data starts with a random vector u of real numbers, which are then transformed by (10) into a vector y of proportions. In particular, a multivariate normal vector u generates a logistic-normal distribution (Table 5) for y. Our formulation uses the simplifying assumption (T5.1) that the components ui are drawn independently with mean log pi and common variance {sigma}2. A more general approach would allow u to have an arbitrary covariance matrix, but the application here does not require this added level of complexity. In Appendix C, we explain precisely how the model here fits into the broader context of compositional distributions. We also prove that assumptions (T5.1) and (T5.2) give the relatively simple probability distribution (T5.3). This, combined with definition (13), gives {ell}(p, {sigma} | y) in (T5.4), where K = –[(g–1/2]log(2{pi})–{surd}g({Sigma}ilogyi).


View this table:
[in this window]
[in a new window]

 
Table 5. The logistic-normal distribution as a statistical model for the vector of observed proportions y, given the predicted proportions p and a standard deviation {sigma}. The model applies when y comes from the logistic transformation (T5.2) of independent variates (T5.1) drawn from normal distributions with a common standard deviation {sigma}. Calculations involve the geometric means Y and p of y and p, respectively. The modal estimate Formula in (T5.5) is exact. The approximations (T5.9)–(T5.11) apply when {sigma} is small.

 
Exact values (T5.6)–(T5.8) for the means and covariances of log odds ratios log(pi/pj) can be computed from the parameters p and {sigma}, as can the modal estimate (T5.5) for {sigma}. Notice that Formula -> 0 as yi -> pi for each i, so that a good model fit corresponds to a small value of {sigma}. Analytically, the moments (T5.9)–(T5.11) have only approximate values when {sigma} is small (Appendix C), in contrast with the exact moments (T3.5)–(T3.7) for the multinomial and (T4.6)–(T4.8) for the Dirichlet. Aitchison (1986, p. 64) argues that log odds ratios have greater statistical relevance than the proportions themselves, as we discuss further in Appendix C.


    Estimation and model selection
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Our catch curve models generate predicted proportions pi({theta}) for a specified parameter vector {theta}. We can compare these predictions with observations yi, using the multinomial, Dirichlet, or logistic-normal distributions. At best, our models represent only approximations to reality with


Formula 024M18

(18)
As we have discussed, the multinomial model implies that the approximation (18) approaches equality as the sample size n increases. (The variances (T3.6)–(T3.7) approach 0 as n -> {infty}.) In effect, the multinomial allows only for measurement error, which should become negligibly small with large sample sizes.

To address this limitation, various authors have suggested the alternatives discussed here: logistic-normal (Schnute and Richards, 1995) or Dirichlet (Williams and Quinn, 1998). Each of these distributions involves an extra parameter related to the approximation (18), which becomes more exact as the logistic-normal {sigma} decreases or the Dirichlet n increases. Effectively, {sigma} and n leave room for the approximation (18) to include both measurement and process error. Even very precise observations yi need not closely match the predictions pi({theta}). Nature's "true" model might not lie in the family defined by Table 2.

Let {Theta} denote the complete parameter vector: ({theta}, {sigma}) for the logistic-normal model or ({theta}, n) for the Dirichlet model. Equations (T4.4) and (T5.4) define the negative log-likelihood function {ell}({Theta} | y) for a given data set y, as illustrated for the Dirichlet distribution by the calculation


Formula 024UM1

From (13), the function {ell} defines the posterior distribution


Formula 024M19

(19)
where P0({Theta}) denotes the prior distribution for {Theta}.

Given a data set y, we want to explore models that correspond to various choices of design vector {phi}. From a Bayesian perspective, we could generate a posterior sample (Appendix D) of {Theta} from (19) for each model in question. As suggested in the discussion of catch curve models, it might suffice to look at the resulting sample distribution of Z, where all other parameters are considered nuisance parameters. Analysis would be guided by investigating how model choice influences the Z distribution.

A frequentist approach typically begins by finding the maximum likelihood estimate Formula that minimizes the negative log-likelihood (Appendix D), where


Formula 024M20

(20)
As discussed by Burnham and Anderson (2002, pp. 61–66), this can be used to calculate Akaike's information criterion


Formula 024M21

(21)
and a corrected criterion


Formula 024M22

(22)
where N and g denote lengths of the vectors {Theta} and y, respectively. The derivation of (21) comes from a theory in which the "true" model of nature lies outside the parametric family that defines {ell}({Theta} | y), a theory appropriate to the application here.

Based on information theory, the AIC measures relative distances of various proposed models to the (unknown) true model. Intuitively, the calculation (21) balances model fit (a low minimum in (20)) with model complexity (a large number N of parameters {Theta}). The minimum decreases as N increases, and the final term in (21) becomes a penalty associated with the number of parameters. Given a set of proposed models, the one with lowest AIC seems most credible.

The corrected AICc in (22) also involves the effective number g–1 of observations y. For large values g, the AIC and AICc are essentially the same, but the penalty in (22) increases as g decreases. Intuitively, when the number of observations is small, it takes a greater improvement of fit to justify adding a new parameter to the model. Our examples illustrate situations in which the model selected by AIC differs from that selected by AICc.


    Application to quillback rockfish
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
The genus Sebastes comprises a diverse group of marine fish generally known as rockfish. More than 60 rockfish species live in the northeast Pacific Ocean (Love et al., 2002). According to Hart (1973), 35 of these inhabit waters along Canada's west coast in BC. The BC commercial fishery captures at least 28 rockfish species (Yamanaka et al., 2004). In particular, quillback rockfish (S. maliger) belong to a group loosely termed "inshore rockfish" owing to their prevalence in shallow waters (0–200 m). They occupy rocky reefs, where they are caught primarily by hook and line. Like most rockfish, quillback can live a long time, with ages recorded up to 90 years (Munk, 2001).

Historically, large populations of quillback rockfish occurred in BC coastal waters between Vancouver Island and the mainland, including the Strait of Georgia. In the late 1970s, fishers began targeting rockfish in response to developing local markets for live fish. Easily accessible populations in the Strait of Georgia became depleted, and fishing pressure shifted north to more remote areas. The fishery reached Johnstone Strait by the mid-1980s, and this prompted the introduction of a survey to target that population while it was still relatively unfished (Richards and Cass, 1987).

Handline surveys have been conducted in Johnstone Strait and adjacent waterways (126°37'W to 126°53'W, 50°32'N to 50°39'N) since 1986. Yamanaka and Richards (1993) describe surveys conducted in 1986, 1987, 1988, and 1992. In 2001, the Rockfish Selective Fishery Study (Berry, 2001) targeted quillback rockfish for experiments on improving survival after capture by hook and line. The resulting data subsequently have been incorporated into the survey data series. The most recent survey in 2004 essentially repeated the 1992 survey design.

Commercial handline fishery samples have been collected from a larger region (126°35'W to 127°39'W, 50°32'N to 50°59'N) in the years 1984–1993, 1996, and 2000–2001. Bubble plots of age proportions (Figure 3) portray all available survey data, plus commercial data (if available) from years when no surveys were conducted (1984–1985, 1989–1991, 1993, 1996, and 2000). We include the commercial data partly to complete the time-series and partly to compare data from the two sources. Commercial samples sometimes reinforce major recruitment anomalies identified by survey samples (e.g. the 1985 year class). In some years, however, a low number of commercial specimens (e.g. 1991 with n = 50) shows at best highly smeared age structure patterns.


Figure 3
View larger version (47K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 3. Bubble plot representing observed age proportion ya in various years for quillback rockfish in Johnstone Strait, BC, Canada. Background shading indicates columns with data from surveys, and the remaining data come from the commercial fishery. Diagonal lines highlight possible strong cohorts born in the years indicated. Numbers below the horizontal line at age 0 show the number n of fish aged each year.

 
Overall, Figure 3 suggests the decline of older year classes from 1984 to 2004, with a tendency for surveys to capture younger fish than the commercial fishery. Data early in the time-series indicate the consistent presence of fish with ages up to 50 and occasionally beyond. By 2004, the age distribution lies almost entirely below age 30. This apparent decline is consistent with the history of the fishery, in which 1986 was the first year of a new license category that applied direct quotas to quillback rockfish (Yamanaka et al., 2004). Previously, the species was caught only as bycatch in other fisheries. The 1986 survey gives baseline data from a relatively unfished population that can be compared with data from the fished population sampled by the 2004 survey.

Ternary diagrams (Figure 4) show age proportions amalgamated into three groups (ages 0–12, 13–20, 21–69) corresponding to fish that could be considered young, mid-aged, and old. We distinguish survey data (Figure 4a) from commercial data (Figure 4b), where Figure 4b portrays commercial data from all available years. The time trend in both panels shows a pronounced movement away from vertex 3 (old fish). A large influx of new recruits from the 1985 year class appears as a high proportion p1 near vertex 1 for the survey year 1992 (Figure 4a). This shift to younger fish does not appear in the commercial data (Figure 4b), probably because the fishery is less selective for young fish.


Figure 4
View larger version (10K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 4. Ternary diagrams for data from (a) surveys and (b) commercial fisheries, including some commercial data not represented in Figure 3. Ages a are amalgamated into three groups defined by 0 < a ≤ 12, 12 < a ≤ 20, 20 < a ≤ 69. Within each panel, a dashed line indicates points with equal proportions in the first two age groups. A dotted line represents points where the ratio between these two age groups equals the geometric mean of ratios in the observed data.

 
Each panel of Figure 4 shows a vertical dashed line from vertex "3" to the base of the triangle. Along this line, the proportions p1 and p2 of young and mid-aged fish are equal (i.e. p1/p2 = 1). Dotted lines are also drawn from vertex 3 to indicate a constant ratio p1/p2 = p1/p2, where pi denotes the geometric mean of observed proportions pi in group i. All points on this line have a constant ratio between young and mid-aged fish. The survey shows a larger proportion of young fish, so that the dotted line lies left of the dashed line in Figure 4a. In contrast, the dotted line lies right of the dashed line in Figure 4b, indicating a larger proportion of mid-aged fish in the fishery.

We use the start and end years, 1986 and 2004, of the survey series to illustrate the models described here. Table 6 lists design parameters {phi} for three examples. Two of these use the 1986 data, with either m = 2 or m = 4 recruitment anomalies, and the third uses the 2004 data with m = 2 anomalies. Our analyses involve the model parameters {theta}, plus an additional parameter n for the Dirichlet distribution or {sigma} for the logistic-normal. We constrain each parameter to lie in a specified interval (Table 7). From a frequentist perspective, these constraints bound the region of interest in parameter space. In a Bayesian context, they define uniform prior distributions.


View this table:
[in this window]
[in a new window]

 
Table 6. Design parameters {phi} in (1) for three examples based on quillback rockfish data from surveys in 1986 and 2004.

 


View this table:
[in this window]
[in a new window]

 
Table 7. Constraints on the parameters {theta}, n, and {sigma} used in all analyses in this paper.

 
For two reasons, we like to start our analyses using uniform priors. First, in this context, the maximum likelihood estimate corresponds exactly to the posterior mode. Second, a uniform prior makes it fairly easy to see the impact of the prior on the posterior. The data provide at least some information about a parameter if its posterior distribution lies well within the prior interval. By contrast, the data appear uninformative if the posterior lies rather evenly scattered across the interval.

As described earlier, the model in Table 2 has four cases associated with possible combinations of survival, selectivity, and recruitment anomalies. Combining these with the three examples in Table 6 gives a total of ten distinct analyses, six for the 1986 data and four for the 2004 data. Tables 8 and 9 list maximum likelihood parameter estimates obtained from the Dirichlet and logistic-normal distributions, respectively. Figure 5 provides some insight into the biological interpretation of the estimates in Table 8. Vertical bars in Figures 5a, 5c, and 5e represent the observed age proportions ya for the three examples in Table 6, where both Figures 5a and 5c portray the 1986 data. Curves in each panel show estimated age proportions pa associated with the four cases of the model. Corresponding curves in Figures 5b, 5d, and 5f portray the cumulative probability distributions for the data and the proportions estimated from each case of the model.


Figure 5
View larger version (35K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 5. Observed and estimated proportions ya and pa from the posterior mode of the Dirichlet distribution for Examples 1–3 in Table 6, and Cases 1–4 of the catch curve model. (a), (c), (e) Results for Examples 1–3, respectively, where vertical lines represent ya, and curves show pa for Cases 1–4 (dotted, short-dashed, long-dashed, solid). (b), (d), (f) Cumulative curves corresponding to the proportions in (a), (c), (e).

 


View this table:
[in this window]
[in a new window]

 
Table 8. Maximum likelihood estimates of parameters {theta} in (2) and n from the Dirichlet distribution for the three examples in Table 6 and Cases 1–4 of the catch curve model. Analyses apply to survey data from 1986 (Examples 1 and 2) and 2004 (Example 3). AIC and AICc values are computed from (21)(22). For comparison with n, examples 1 and 2 have sample size n = 313; example 3 has n = 174.

 


View this table:
[in this window]
[in a new window]

 
Table 9. Maximum likelihood estimates of parameters {theta} in (2) and {sigma} from the logistic-normal distribution for the three examples in Table 6 and Cases 1–4 of the catch curve model. Analyses apply to survey data from 1986 (Examples 1 and 2) and 2004 (Example 3). AIC and AICc values are computed from (21)(22).

 
Rockfish age structure data typically suggest multiple recruitment anomalies, as in the bubble plot of Figure 3. The 1986 survey data shown in Figures 5a and 5c suggest more anomalies than the 2004 survey data in Figure 5e, although the latter spans a smaller group of ages because of the reduced presence of old fish. Our three examples (Table 6) are motivated by patterns in the data. Cumulative model estimates pa match the cumulative observations ya in 1986 with m = 4 anomalies (Figure 5d) and in 2004 with m = 2 anomalies (Figure 5f). Estimates of Z appear fairly robust to model choices, with Z varying from 0.047 to 0.060 in 1986 (Table 8, Examples 1 and 2) and from 0.083 to 0.162 in 2004 (Table 8, Example 3). In particular, these estimates suggest a higher total mortality Z in 2004.

For the 1986 survey data, the AIC and AICc values in Tables 8 and 9 suggest choosing Case 3 with m = 4 recruitment anomalies. Both the Dirichlet and logistic-normal models lead to the same conclusion, although with somewhat different parameter estimates. The 1986 data set has a relatively large number of observations from g = 37 age groups (Table 6). By contrast, the survey data set for 2004 has only g = 15 groups, because of the reduced number of fish ages present in the sample. The AIC indicates Case 4 with m = 2 anomalies, but the AICc suggests Case 3. Intuitively, the number of observations is too small to justify the extra selectivity parameters ({alpha}, ßk). Again, the Dirichlet and logistic-normal models produce the same conclusions.

Figure 6 represents a Bayes sample drawn from the Dirichlet posterior distribution for Example 3 (1986 data), Case 4. Unlike the point estimates listed in Table 8 and portrayed in Figure 5, this analysis examines the uncertainty in all seven parameters (Z, ßk, {alpha}, {rho}1, {rho}2, {tau}, n). Modal values from Table 8 appear as triangles in Figure 6, and these often lie near the edge of the scatterplot. Mean parameter values, shown as squares, are generally in a more central position. Scatterplots for the selectivity parameters ({alpha}, ßk) indicate very little structure, with each parameter varying almost like the uniform prior distribution across the range specified in Table 7. This result reinforces the preference for Case 3, based on the AIC and AICc in Table 8.


Figure 6
View larger version (71K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 6. Pairs plot showing 500 sample points from the posterior Dirichlet distribution for Example 3, Case 4. These come from a systematic subsample of every 200th point in a convergent Markov chain with length 100 000. Panels on the diagonal contain histograms representing the distributions of individual parameters. Other panels show a loess line through the scatterplots for parameter pairs. Triangles and squares designate points corresponding to the posterior mode and mean, respectively.

 
Histograms in the diagonal panels of Figure 6 show varying degrees of skewness in the model parameters, where the distribution of Z appears almost normal. Loess lines suggest that Z is inversely related to all other model parameters except n. In particular, the perceived value of Z will decrease as the parameters ({rho}1, {rho}2, {tau}) become larger, corresponding to higher and broader recruitment anomalies.

In the framework here, a choice of design parameters {phi} determines a particular model structure. That, combined with a data set, a Dirichlet or logistic-normal error distribution, and a choice of prior distributions implies a posterior distribution for Z. How much does the choice of model influence this distribution?

Figures 7a and 7b answer this question for Examples 2 and 3 (the 1986 and 2004 survey data), respectively. Four cases of the model, coupled with a choice of Dirichlet or logistic-normal error, give eight outcomes for each example. We portray the resulting Z distributions as boxplots, which often overlap considerably and therefore suggest similar conclusions about Z. Logistic-normal estimates usually have a wider distribution than those obtained from the Dirichlet. Consistent with Figure 6, the mode Z sometimes lies in the periphery of the Z distribution, typically in the upper range for the examples here. Case 4 of the model, with all components present, tends to give the most consistent posteriors between the two error assumptions.


Figure 7
View larger version (25K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 7. Boxplots of 500 sample values Z from the posterior distribution for (a) Example 2 (1986) and (b) Example 3 (2004). As in Figure 6, these come from a systematic subsample of a convergent Markov chain with length 100 000. Each panel shows boxplots for Cases 1–4, based on the Dirichlet (D) and logistic-normal (L) error distributions. A dotted line represents the mean Z of all samples in (a) and (b). (c) Boxplots of 500 ratios Z2004/Z1986 from values Z in (a) and (b). These lie almost entirely above the value 1, highlighted by a dashed line. Bounds marked as "95% limits" correspond to the 2.5 and 97.5% quantiles of the posterior distribution.

 
Figure 7 also allows us to address an essential question for the biology of this stock. Has Z changed between 1986 and 2004? Dotted lines in Figures 7a and 7b represent the mean of all Z values in both panels. Boxplots from 1986 lie almost entirely below this line, whereas those from 2004 lie predominantly above. This suggests a higher Z in 2004 than in 1986, and we carry the analysis one step further. For each year, we have equal sized, independent samples from the two distributions. Thus, we can obtain a sample distribution for the ratios


Formula 024M23

(23)
These lie almost entirely above 1 for all models and choices of error distribution. Given the assumptions of our catch curve model, Z has apparently increased by a factor near 2 for S. maliger in Johnstone Strait over the two decades since a quota fishery was initiated.


    Discussion
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Our model formalizes issues with catch curve analysis raised by Ricker (1975, Chapter 2). Different choices of the design vector {phi} produce different posterior distributions for Z. If these all appear similar, then the information on Z is robust to the choice of model. If not, then the analyst can examine reasons why different models lead to different interpretations. Graphs such as those in Figure 5 act as diagnostic tools for assessing model fit at the maximum likelihood estimate or posterior mode. Age distribution plots (e.g. Figure 5a) and cumulative curves (e.g. Figure 5b) show how well the model fits the observed data.

Both our theory and the worked examples show the relevance of techniques from compositional data analysis. Visual methods, such as bubble plots and ternary diagrams, help reveal patterns in the data. The Dirichlet and logistic-normal distributions give rigour and biological meaning to the results, where estimates n and {sigma} provide measures of model error. A good model fit corresponds to a high value of n or a low value of {sigma}. Appendix D describes software to implement our analyses.

The model in Table 2 deals only with scenarios limited to constant effects of survival, selectivity, and recruitment. It could be extended in various ways, such as allowing a distinct standard deviation {tau}h for each recruitment anomaly at age bh. The ages bh could also be treated as free parameters, similar to parameters that determine modes of a mixture distribution (Schnute and Fournier, 1980). Historical patterns could be represented as systematic trends in natural or fishing mortality. Such models might be used to examine the problems with catch curve analysis discussed by Quinn and Deriso (1999), cited in the introduction.

As models become more complex, the number N of parameters increases. The AIC provides a guide for evaluating model fit, given the value of N. Moreover, the AICc takes account of the effective number g–1 of observations. Statistical analysis requires more observations than parameters; the denominator on the right side of (22) is positive only if g–1 > N + 1. This constraint highlights an essential issue for catch curve analysis. The available data may not be adequate to estimate all parameters of interest. We propose the framework here as an exploratory tool for investigating potential biological interpretations of limited data sets. The examples in Tables 8 and 9 and in Figure 7 by no means exhaust the possibilities for the 1986 and 2004 survey data sets.

The likelihood (T3.3) for the multinomial distribution can be calculated even if yi = 0 for some observations i (0! = 1 and p0 = 1 for p >> 0). However, for reasons discussed earlier, the multinomial fails to give an appropriate statistical description of the data vector y. The more realistic Dirichlet and logistic-normal distributions both require yi > 0 for every observation i. We use an amalgamation (one of several common compositional transformations) to obtain data that meet this requirement. A good general strategy gives as many groups as possible, subject to a specified minimum value for the amalgamated proportions yi. To some extent, an amalgamation smooths the observed data by combining small values of the observed proportions ya. Excessive smoothing corresponds to using a small value of g, which weakens a test based on the AICc.

More complete data, such as the age structures for two decades in Figure 3, suggest using a full catch-at-age analysis that tracks the population through time. For example, Schnute and Richards (1995) describe such analyses with a logistic-normal model for the observed age proportions. Here we have focused on separate analyses for each year, with a comparison between years early and late in the time-series. If we assume that the 1986 stock had experienced very little fishing mortality (F = 0), then the mortality ratio


Formula 024M24

(24)
in (23) relates simply to the ratio between fishing and natural mortality. In particular, the value r = 2 corresponds to a policy of setting fishing mortality F at the same level as natural mortality M.

The Bayesian approach in Figures 6 and 7 illustrates a focus on a single key parameter in a model with many additional nuisance parameters. In this case, the total mortality Z or the ratio (24) act like reference parameters that could be linked closely with policy decisions. Schnute and Haigh (2006) describe a similar theory of management strategies, where the allowable catch quota Q acts as the key parameter. Catch curve analyses might play a similar strategic role in helping define policies associated with a target mortality ratio F/M. The framework here provides a direct link between model designs, encapsulated by design vectors {phi}, and policy outcomes that depend on estimated mortalities or mortality ratios.


    Appendix A
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Ternary diagrams
This appendix gives the formulae required to draw ternary diagrams, such as those in Figures 2 and 4. The proofs, not given here, depend on standard arguments from analytical geometry. In xy-coordinates with a 1:1 aspect ratio, start by drawing the equilateral triangle with vertices (0, 0), (1, 0), and (1/2, {surd}3/2). This corresponds to the choices a = 1 and c = {surd}3/2 in (11). The point p has coordinates


Formula 024M25

(A.1)

For each vertex "i", let (xi, yi) denote the point on the opposite side of the triangle, where a perpendicular line connects that side to p = (x, y). Then


Formula 024M26

(A.2)


Formula 024M27

(A.3)


Formula 024M28

(A.4)

For distinct indices (i, j, k) and a given odds ratio rjk = pj/pk, define p by the coordinates


Formula 024M29

(A.5)
Then a line through vertex "i" with constant odds ratio pj/pk connects "i" to the point (x, y) in (A.1) determined by p in (A.5).


    Appendix B
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Proofs: Dirichlet distribution
Gelman et al. (1995, pp. 476–477) give the expressions (T3.3) and (T4.3) for the multinomial and Dirichlet distributions. Their version of the Dirichlet uses parameters {alpha}i equivalent to npi here. They also give moment formulae equivalent to (T4.6)–(T4.8). Applying Stirling's approximation (17) to the expression (T4.4) for {ell} gives (after algebraic simplification)


Formula 024UM2

where the constant K = (1/2)[(g–1) log 2{pi}{Sigma}i log pi] does not depend on n. The estimate n that minimizes this approximation satisfies the derivative condition d{ell}'/dn = 0, that is


Formula 024M30

(B.1)
Solving (B.1) for n gives (T4.5).


    Appendix C
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Proofs: logistic-normal distribution
The logistic-normal distribution (T5.3) appears less commonly in the literature than the Dirichlet, particularly in the simplified form given here. At the end of this appendix, we sketch the key steps in its derivation. The underlying model (T5.1) and (T5.2) implies that


Formula 024M31

(C.1)
Taking expected values of both sides of (C.1) gives the result (T5.6). Similarly, for i != j


Formula 024UM3

because {varepsilon}i and {varepsilon}j are independent. This proves (T5.7). For three distinct indices (i, j, k), another moment formula comes from the calculation


Formula 024UM4

again because {varepsilon} has independent components. This proves (T5.8).

Aitchison (1986, p. 64) argues forcefully that log odds ratios, like those in (T5.6)–(T5.8), have more relevance to statistical analysis than the original proportions. Although the moment formulae (T4.6)–(T4.8) for the Dirichlet might look attractive, the notion of covariance seems a bit inappropriate for variates such as yi confined to the interval (0, 1). In an amusing analogy, he points out that a barbecue, although an excellent tool in the wide open spaces of North America (comparable to real space in g dimensions), might not be a suitable concept for cooking in a cramped Hong Kong apartment (comparable to the simplex associated with g proportions). For small values {sigma} in the model here, a power series expansion in the nonlinear transformation (T5.1) and (T5.2) gives


Formula 024M32

(C.2)
The approximate moments (T5.9)–(T5.11) follow from (C.2), after some algebra not shown here. Readers can test these results using a small value {sigma} in the simulation (T5.1) and (T5.2).

We come finally to the derivation of the likelihood (T5.3). Because the observed vector y sums to 1, the probability distribution properly belongs to a region in g–1 dimensions. The subvector yg = (y1, ... , yg–1) defines a suitable region, namely the simplex where yi > 0 and {Sigma}i = 1g–1 yi < 1. (By convention, a negative subscript indicates removal of a particular component from a vector.) We transform y to an equivalent vector z of real numbers with zg = 0, where the transformation and its inverse are


Formula 024M33

(C.3)


Formula 024M34

(C.4)
These also define a map between yg and zg, where it is understood that yg = 1–{Sigma}i = 1g–1 yi in (C.3) and zg = 0 in (C.4).

It follows from (C.1) that


Formula 024M35

(C.5)
The assumed normality of {varepsilon} implies that zg is multivariate normal. Aitchison (1986, p. 115, Formula 6.3) shows that


Formula 024M36

(C.6)
where µ and V denote the mean vector and covariance matrix of zg. These have dimensions (g–1) x 1 and (g–1) x (g–1), respectively. The result (C.6) takes account of the Jacobian


Formula 024M37

(C.7)
Aitchison (1986, p. 77) calls V the logratio covariance matrix, although his notation differs somewhat from ours.

The moment formulae (T5.6)–(T5.8) and the definition (C.3) allow us to evaluate µ and V explicitly. In particular, (T5.6) gives


Formula 024M38

(C.8)
Similarly, (T5.7) and (T5.8) imply that Vii = 2{sigma}2 and Vij = {sigma}2 when i != j. In matrix notation,


Formula 024M39

(C.9)
where Ig–1 is the identity matrix, Jg–1 is a matrix with all entries 1, and subscripts emphasize that these matrices have dimension (g–1) x (g–1). Moreover, the determinant and inverse of V can be calculated explicitly to give


Formula 024M40

(C.10)


Formula 024M41

(C.11)

The result (C.11) allows the quadratic form in the exponential factor of (C.6) to be written explicitly in terms of the residuals x = zgµ:


Formula 024M42

(C.12)


Formula 024M43

(C.13)


Formula 024M44

(C.14)


Formula 024M45

(C.15)
The symbol {delta}ij in (C.12) refers to the usual indicator variable, with {delta}ii = 1 and {delta}ij = 0 when i != j. By definition, zg = 0 in (C.3), and similarly (C.8) implies µg = 0. Thus xg = zg–µg = 0, and this extension of x allows us to extend the range of summation from g–1 to g in (C.13). The mean x in (C.14) refers to the full vector (x1, ... , xg), where


Formula 024M46

(C.16)

Because xi = zi–µi = log(yi/yg)–log(pi/pg), it follows from (C.16) that


Formula 024M47

(C.17)
Combining (C.6), (C.10), (C.12)–(C.15), and (C.17) gives the logistic-normal distribution in (T5.3).


    Appendix D
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 
Implementation methods
Analyses such as those described here require appropriate software. We used "AD Model Builder" (http://otter-rsch.com/admodel.htm) to obtain modal parameter estimates and to generate Bayes posterior distributions. This software package primarily requires code to calculate the negative log-likelihood {ell}, as defined here by (T4.4) and (T5.4). Constraints such as those in Table 7 can readily be incorporated. The package automates calculation of the posterior mode, and it generates posterior Markov chain samples that start at the mode. For the examples presented here, formal tests (not shown) indicated rapid convergence. We believe that samples of size 500 drawn systematically from a chain of length 100 000 give reasonable representations of the posterior, although other examples might require longer chains. In part, we restricted posterior sample sizes to produce legible graphics in Figure 6. We used R and S–PLUS (Venables and Ripley, 2000) to generate all Figures.

Users interested in software available without charge can investigate our library "PBS Modelling" (Schnute et al., 2006) available on the Comprehensive R Archive Network (CRAN, http://cran.r-project.org/). We include an example that uses native R code to obtain maximum likelihood estimates. We also generate posterior samples using the R "BRugs" library and the program "OpenBUGS" (Bayesian inference Using Gibbs Sampling, http://mathstat.helsinki.fi/openbugs/). This includes native support for the Dirichlet distribution. Furthermore, as described in Appendix C, suitable transformations convert the logistic-normal distribution into a multivariate normal distribution, which is also supported by "OpenBUGS".

Another CRAN package "PBS Mapping" (Schnute et al., 2004) supports the fixed aspect ratios needed to produce ternary diagrams from the formulae in Appendix A.


    Acknowledgements
 
We thank the teams of field workers who, under the leadership of Laura Richards and Lynne Yamanaka, collected data on inshore rockfish along the coast of BC, Canada. Carl Schwarz provided useful background on the history and application of compositional data analysis. Terry Quinn, Franz Mueter, and a third anonymous reviewer provided helpful comments leading to an improved final draft.


    References
 Top
 Introduction
 Catch curve models
 Compositional data
 Compositional statistics
 Estimation and model selection
 Application to quillback...
 Discussion
 Appendix A
 Appendix B
 Appendix C
 Appendix D
 References
 

    Aitchison J. (1985) A general class of distributions on the simplex. Journal of the Royal Statistical Society 47:136–146 Series B.

    Aitchison J. (1986) The Statistical Analysis of Compositional Data. (The Blackburn Press, Caldwell, NJ)416 reprinted in 2003.

    Aitchison J. and Shen S. M. (1980) Logistic-normal distributions: some properties and uses. Biometrika 67:261–272.[Abstract/Free Full Text]

    Baranov F. I. (1918) [On the question of the biological basis of fisheries.] Izvestâ Otdela rybovodstva i naucno-promyslovyh issledovanii. 1:181–128 in Russian; cited after Ricker, 1975.

    Berry M. D. (2001) Area 12 (Inside) Rockfish Selective Fishery Study. Science Council of British Columbia, Project Number FS00–05.

    Burnham K. P. and Anderson D. R. (2002) Model Selection and Multivariate Inference: a Practical Information—Theoretic Approach 2nd edn. (Springer Science and Business Media, Inc., New York, NY) pp. 488.

    Gelman A., Carlin J. B., Stern H. S., Rubin D. B. (1995) Bayesian Data Analysis(Chapman & Hall/CRC, Boca Raton, FL) pp. 526 reprinted in 2000.

    Hart J. L. (1973) Pacific fishes of Canada. Bulletin of the Fisheries Research Board of Canada 180: pp. 740 reprinted in 1988.

    Love M. S., Yoklavich M., Thorsteinson L. (2002) The Rockfishes of the Northeast PacificUniversity of California Press pp. 405.

    Munk K. M. (2001) Maximum ages of groundfishes in waters off Alaska and British Columbia and considerations of age determination. Alaska Fisheries Research Bulletin 8:112–21.

    Quinn T. J. II and Deriso R. B. (1999) Quantitative Fish Dynamics(Oxford University Press, New York, NY) pp. 542.

    R Development Core Team. (2005) R: A language and environment for statistical computing. (R Foundation for Statistical Computing, Vienna, Austria) ISBN 3-900051-07-0, URL http://www.R-project.org.

    Richards L. J. and Cass A. J. (1987) 1986 Research catch and effort data on nearshore reef-fishes in British Columbia statistical areas 12, 13, and 16. pp. 119 Canadian Manuscript Report of Fisheries and Aquatic Sciences, 1903.

    Ricker W. E. (1975) Computation and interpretation of biological statistics of fish populations. Bulletin of the Fisheries Research Board of Canada, 191 382.

    Schnute J. T. (2006) Curiosity, recruitment, and chaos: a tribute to Bill Ricker's inquiring mind. Environmental Biology of Fishes 75:95–110.[CrossRef]

    Schnute J. T., Boers N., Haigh R. (2004) PBS Mapping 2: User's Guide. 126 Canadian Technical Report of Fisheries and Aquatic Sciences, 2549.

    Schnute J. T., Couture-Beil A., Haigh R. (2006) PBS Modelling 1: User's Guide. 111 Canadian Technical Report of Fisheries and Aquatic Sciences, 2674.

    Schnute J. and Fournier D. (1980) A new approach to length-frequency analysis: growth structure. Canadian Journal of Fisheries and Aquatic Sciences 37:1337–1351.

    Schnute J. T. and Haigh R. (2006) Reference points and management strategies: lessons from quantum mechanics. ICES Journal of Marine Science 63:4–11.

    Schnute J. T. and Richards L. J. (1995) The influence of error on population estimates from catch-age models. Canadian Journal of Fisheries and Aquatic Sciences 52:2063–2077.

    Venables W. N. and Ripley B. D. (2000) S Programming. (Springer, New York, NY)264.

    Williams E. H. and Quinn T. J. II. (1998) A parametric bootstrap of catch-age compositions using the Dirichlet distribution. Fishery stock assessment models for the 21st century Alaska Sea Grant College Program371–384 AK-SG-98-01.

    Yamanaka K. L., Lacko L. C., Lochead J. K., Marin J., Haigh R., Grandin C., West K. (2004) Stock assessment framework for inshore rockfish. Canadian Science Advisory Secretariat 63 Research Document, 2004/068.

    Yamanaka K. L. and Richards L. J. (1993) 1992 Research catch and effort data on nearshore reef-fishes in British Columbia Statistical Area 12. 77 Canadian Manuscript Report of Fisheries and Aquatic Sciences, 2184.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
64/2/218    most recent
fsl024v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Schnute, J. T.
Right arrow Articles by Haigh, R.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Schnute, J. T.
Right arrow Articles by Haigh, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?