Skip Navigation


ICES Journal of Marine Science: Journal du Conseil Advance Access originally published online on October 4, 2007
ICES Journal of Marine Science: Journal du Conseil 2007 64(9):1723-1734; doi:10.1093/icesjms/fsm149
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
64/9/1723    most recent
fsm149v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Roa-Ureta, R.
Right arrow Articles by Niklitschek, E.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Roa-Ureta, R.
Right arrow Articles by Niklitschek, E.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2007 International Council for the Exploration of the Sea. Published by Oxford Journals. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Biomass estimation from surveys with likelihood-based geostatistics

Rubén Roa-Ureta1, and Edwin Niklitschek2

1 Department of Fisheries, PO Box 598, Stanley, Falkland Islands, Departamento de Oceanografía, Universidad de Concepción, PO Box 160-C, Concepción, Chile
2 Centro Trapananda, Universidad Austral de Chile, Portales 73, Coyhaique, Chile

Correspondence to R. Roa-Ureta: tel: +56 41 2203765; fax: +56 41 2256571; e-mail: rroa{at}udec.cl

Roa-Ureta, R., and Niklitschek, E. 2007. Biomass estimation from surveys with likelihood-based geostatistics. – ICES Journal of Marine Science, 64.

A likelihood-based geostatistical method for estimating fish biomass from survey data is presented. Biomass estimates from analysis of a positive random variable with an additional discrete probability mass at zero means that the method accommodates null observations and positive fish density. The positive fish density data were used to estimate mean fish density in the subareas where the stock was present. A presence/absence representation of the data in the survey area was modelled with a generalized linear spatial model of the binomial family, leading to an estimate of the area effectively occupied by the stock. As an extension, a procedure is proposed to accommodate extra sources of correlation, such as multiple surveys or multiple vessels. The new methodology was applied to three cases. The simplest case is a scallop trawl survey for which only the positive density data need to be analysed. The intermediate case is a trawl survey of highly mobile squid where the stock area and the mean density inside the stock area are analysed. The most complex case is in estimating the biomass of very localized orange roughy, for which repeat surveys create dependence in the data in addition to spatial correlation.

Keywords: geostatistics, likelihood, mixed models, stock assessment, survey

Received 22 October 2006; accepted 3 August 2007; advance access publication 4 October 2007.


    Introduction
 Top
 Introduction
 Material and methods
 Applications
 Discussion
 References
 
The estimation of fish biomass from survey data usually involves the use of an expansion estimator, where mean fish density estimated over the field covered by the survey is multiplied by the area over which the mean fish density estimate is considered to be valid. The survey biomass estimate is a relative biomass estimate, related to absolute biomass through a proportionality constant (Equation (3) of Hilborn, 2000), the "survey catchability". In general, both the mean fish density and the area over which this density is valid, referred to here as the effective stock area, or simply the stock area, could be targets of statistical estimation, because data are used to calculate both, and the resulting quantities contain uncertainty. Geostatistics is a technique powerful enough to accomplish this task because it takes into account spatial correlation in population density or sampling design. Currently in fisheries research and ecology, two branches of geostatistics are recognized, the transitive and intrinsic theories, both established by Matheron (1971) and introduced to fisheries by Conan (1985), Foote and Stefánsson (1993), and Petitgas (1993). Transitive and intrinsic geostatistical theories belong, respectively, to the design- and model-based schools of inference in finite population theory (Hansen et al., 1983, and discussions in Smith, 1990; Valliant et al., 2000). Owing to its model-based inferential basis, the intrinsic theory of geostatistics does not require the application of a probabilistic sampling design for inference from sample space (Aubry and Debouzie, 2000). Both transitive and intrinsic geostatistics differ from most model-based statistical techniques in that the likelihood function for the post-experimental description of the data in terms of model parameters is absent. Therefore, results such as maximum likelihood estimates, measures of precision derived from the shape of the likelihood function about the maximum, and model selection criteria, such as the Akaike Information Criterion (AIC), cannot be calculated. Here we present fisheries applications of a third geostatistical approach, based upon likelihood theory (Diggle et al., 1998, 2002, 2003), to estimate mean fish density and effective stock area, using explicit probability models for continuous and discrete spatial representations of the data, respectively, obtained from standard fisheries surveys. These two quantities are combined to estimate relative biomass, using the theory of a positive random variable with a spike of probability mass at the origin (Aitchison, 1955).

The effective stock area is often unknown because its perimeter is not observed completely and/or because there are "holes" inside the stock area where the density is zero. In intrinsic geostatistics, the effective stock area is supposed to be known and independent from the inner distribution of fish, allowing the concept of intrinsic properties of fish distribution to be advanced. With this approach, the effective stock area can be the area inside a perimeter defined by a cut-off density value based on the density surface reconstructed by kriging (e.g. Simard et al., 1992), or simply the area of a polygon drawn from the data. This approach is not entirely satisfactory because the stock area is considered as known, whereas in fact there might be considerable uncertainty associated with its boundaries. Alternatively, the transitive theory can be used either on presence/absence data to estimate the stock area only (Petitgas and Lafont, 1997), or on null and positive fish density observations jointly (Bez, 2002). Here we propose a likelihood-based method of estimating the effective stock area using a generalized linear spatial model on presence/absence data and the logit link function.

Survey data may have dependence other than spatial correlation, for example when repeat surveys are used to improve the precision of estimates in a given area, or when multiple vessels are used to increase spatial coverage in the narrowest possible time window. A further purpose of our work is to show an integrated approach to model simultaneously spatial and categorical sources of dependence in the data within the generalized linear mixed model framework described by Searle (1987; with applications and further developments in Littell et al., 1996).

We present three applications of the likelihood-based geostatistical methodology. The first is the simplest example: the perimeter of the stock area is observed and there are no zero density areas inside the stock area. The data come from a trawl survey on the Holberg bank of a species of scallop (Zygochlamys patagonica) off the east coast of the Falkland Islands. The second is an example of likelihood-based estimation of the effective stock area with presence/absence data and mean density with the positive density data, with no additional sources of dependence. Data are derived from a trawl survey for Patagonian squid (Loligo gahi) off the east coast of the Falkland Islands. The third application is more complex, including estimation of mean density from positive observations and the effective stock area from presence/absence data, with additional sources of dependence in the data. The purpose here is to estimate the biomass of orange roughy (Hoplostethus atlanticus) from repeated hydro-acoustic surveys over a seamount off the coast of central Chile.


    Material and methods
 Top
 Introduction
 Material and methods
 Applications
 Discussion
 References
 
First, we show our approach to estimating biomass including positive fish density data and null observations of fish density, with the theory of a positive variable having a spike of probability at zero (Aitchison, 1955). Second, we reproduce the statistical model of Ribeiro et al. (2003), valid only for positive observations of fish density, and develop a proposal to introduce sources of dependence other than spatial proximity. Third, we use our methodology to estimate effective stock area using a presence/absence representation of the full data and the generalized spatial linear model of Christensen (2004). The full modelling approach is shown in schematic fashion in Figure 1, and all symbols and notation are shown in Table 1. Fourth, we indicate briefly how our methods can be implemented in widely used programming systems, and finally, we give details of the three applications.


Figure 1
View larger version (21K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 1. Schematic representation of the modelling approach to estimating biomass from surveys using likelihood-based geostatistics.

 


View this table:
[in this window]
[in a new window]

 
Table 1. Glossary of mathematical symbols for the spatial models used.

 
Estimating the Gaussian model
In the Gaussian likelihood-based approach of Diggle et al. (2003), there are three conceptualizations for the spatial process. First, there is the true spatial signal, the true density surface, referred to as the "latent signal process" by Ribeiro et al. (2003). This is a continuous positive random variable conceptualized as a two-dimensional stochastic process. We call it Z(x, y). Second, the data are represented by a random variable Z(x, y) = f{lambda}–1 Z(x, y)) , where Z is the observable variable with realization z(x, y). Third, Z(x, y) is a normally distributed variable, with f{lambda} the Box–Cox transformation with parameter {lambda}. The variable Z is subject to geostatistical analysis.

Consider a two-dimensional region A. Any two points in A, {x, y} and {x, y}', are separated by the Euclidean distance h = ||{x, y}–{x, y}'||. For simplicity of exposition we consider Z(·) to be a stationary and isotropic Gaussian process, with EA[Z(x, y)] = ß, VarA[Z(x, y)] = {sigma}2, and spatial correlation function {rho}(h). The expectation of Z can be other than a constant, for example to account for a spatial trend. This will increase the number of parameters to be estimated with the likelihood function. Z(x, y) is conditionally normal and independent, given Z(x, y) with expectation z(x, y) and variance {tau}2. Finally, the observable variable Z(x, y)isinR+ is the inverse transform of Z(x, y). This specification allows for the fact that the realizations may not be distributed normally. Also, it assumes that the sampling design, i.e. the selection of observed localities, is fixed or that it is random but independent of Z(x, y).

Inside A there might be subregions unoccupied by the stock. This may happen for reasons of habitat unsuitability or the species might be schooling. However, the specification in the previous paragraph only refers to ziisinR+. To generalize it to the case of presence of null density observations, we adopt the approach of Aitchison (1955) and Pennington (1983), where the observable variable is considered as a positive random variable with a spike of probability mass at zero. Under this approach, the null density observations are treated separately (Pennington, 1983), rather than jointly with the positive density observations. Examples of applications of this approach are not rare in fisheries. For example, Maunder and Punt (2004) describe it as the Delta-lognormal approach in the standardization of catch and effort data to construct indices of relative abundance, and refer to it as a "hurdle model". In the generalized specification, Z distributes f{lambda}–1 (Z) conditionally on Z > 0 with mean {theta} = f{lambda}–1 (ß) and probability p of being positive. Recall that ß is the mean of the spatial process transformed to follow a normal distribution. Therefore, if {theta} and {gamma} are the mean and variance of the distribution of Z conditionally on Z > 0, then the unconditional mean and variance of Z are, respectively (Aitchison, 1955):


Formula 149M1

(1)

Let a be the surface area of A. Then the biomass in A becomes


Formula 149M2

(2)

The surface area a is assumed known exactly. For example, it could be the total survey area (the area inside the convex hull defined by the whole set of observations), or an area delimited by natural boundaries. When all marginal observations are null and all other observations positive, we calculate the total area a as the area inside the perimeter defined by passing a line at mid-distance between the last positive density observations inside A and the closest null observations outside (see below, Application 1).

Aitchison (1955) was interested in finding the best unbiased estimators for both {chi} and {eta} in Equation (1). Here we are interested in a maximum likelihood estimator (MLE) for {chi} only, and a measure of precision for it. Therefore, we are interested in MLEs for p, and {theta}, p and Formula from spatially correlated data. By functional invariance (Zhena, 1966; Berk, 1967), the estimator


Formula 149M3

(3)

is an MLE of true biomass with estimated variance


Formula 149M4

(4)

In Equation (3), the product ap can be interpreted as the area occupied by the stock in A. This is a parameter of interest in its own right, and we refer to it as the effective stock area {alpha}.

Estimating mean fish density in the effective stock area  
The effective stock area corresponds to the subset of locations where fish are present. Therefore, fish density (Z) is a positive variable, with density units such as g m–2. It is transformed using the function f{lambda}(.) which belongs in the Box–Cox family:


Formula 149M5

(5)

This introduces a new parameter, {lambda}, which plays the role of making the observations closest to normal. Under the basic stationary and isotropic Gaussian model for the random variable Z(xi, yi ) = f{lambda} (Zi), the model can be formulated as


Formula 149M6

(6)

where the {epsilon}i are iid normal random errors with mean zero and variance {tau}2. Thus, owing to the spatial correlation function, the vector Z is multivariate normal:


Formula 149M7

(7)

where 1 is an m times 1 vector of 1s, R is a matrix whose (i, i')th element is {rho}(hi,i'|{kappa},{phi}), and I is the m times m identity matrix (Diggle et al., 2003). A non-spatial model corresponds to a model where the {sigma}2R summand in Equation (7) is removed, i.e. the spatial correlation function is a "pure nugget" model, and Z distribute multivariate normal but {Sigma} is diagonal. The log-likelihood function for all model parameters {psi} = [{lambda} ß {phi} {sigma}2 {tau}2 ... ] is


Formula 149M8

(8)

Several features of the basic model need comment. First, in connection with the traditional approach of intrinsic geostatistics, parameters {sigma}2, {tau}2, and {phi} are the sill, the nugget, and the range of the model variogram. Second, the basic spatial model can be extended to include non-stationarity and anisotropy (Ribeiro et al., 2003). Third, parameter ß is the mean level of the Gaussian spatial process, related through {lambda} to the original mean fish density. Fourth, the construction of an empirical variogram is not necessary to estimate the parameters of the Gaussian process, in particular the parameter of interest in biomass estimation, ß. This is because this parameter is not estimated as the mean of the kriged surface. In fact the biomass can be estimated without carrying out kriging. In the likelihood approach of Diggle et al. (2003), kriging is mainly a form of prediction carried out using Bayesian inference (Ribeiro et al., 2003).

An important extension of the basic model is the case where there is an additional source of dependence, e.g. when repeated acoustic surveys are carried out over the same stock or when several vessels are used simultaneously. In this case, the model in Equation (6) could be


Formula 149M9

(9)

where the additional term is composed of the random effects design matrix F and a vector w of realizations of a random vector W, representing the random effects. W is distributed multivariate normally with mean zero and covariance matrix G. Under the extended model of Equation (9), the distribution of Z becomes


Formula 149M10

(10)

Estimating the probability of observing the stock in the survey area
Let the survey area A of surface area a be divided into K cells of equal size, and let Mk be the number of successes in observing the stock in cell k (k = 1,K) when Nk observations have been made in that cell. The observation at each cell k is composed of the set, {k, mk, nk}, where k is the latitude and longitude in the centre of the cell. The model we adopt is a binomial spatial process with a common observation probability p:


Formula 149M11

(11)

The binomial observation probability is connected to the underlying unobserved Gaussian process through the logit link function


Formula 149M12

(12)

Observation probability p is assumed to be a constant, i.e. the process is first-order stationary. This is the same assumption as in Equation (7), where the mean process ß is constant across space. The constant g(p) is the mean of the binomial observation process in the generalized linear specification, so we can write ßM = g(p) with the subscript M indicating that this ß corresponds to the binomial observation process. This two-dimensional specification of a spatial binomial process is similar to a one-dimensional logistic regression where the logit link is equal to the intercept only, i.e. a constant. The unobserved underlying multivariate normal vector Mk is then


Formula 149M13

(13)

The corresponding null model is the non-spatial model where the {sigma}2MRM term in the variance parameter has been removed, a pure nugget model. The likelihood function for this type of generalized linear spatial model is, in general, not expressible in closed form (Diggle et al., 2003; Christensen, 2004). Therefore, Diggle et al. (2003) proposed using a Monte Carlo Markov Chain (MCMC) to simulate from the conditional distribution of M given M, and to use the simulation runs to approximate the likelihood function, a procedure regarded as MCMC maximum likelihood by Geyer and Thompson (1992). Christensen (2004) implemented this proposal for generalized linear spatial models in the Poisson and binomial families.

The ML estimate for p is derived from that of ßM, and its variance is obtained using Taylor series:


Formula 149M14

(14)

Computer implementation
When there are no additional sources of dependence in the data apart from spatial proximity (Applications 1 and 2 below), all analyses can be carried out in the statistical programming system R (R Development Core Team, 2005), specifically using the packages geoR (Ribeiro and Diggle, 2001), with positive fish density, and geoRglm, with the presence/absence data (Ribeiro et al., 2003). When there are additional sources of dependence, the analysis can be carried out in the macro-Glimmix written for SAS (SAS Institute, 1999) by Littell et al. (1996). We can provide illustrative sessions from both systems on request.


    Applications
 Top
 Introduction
 Material and methods
 Applications
 Discussion
 References
 
Estimating total scallop biomass from a trawl survey
Between 10 and 24 January 2005, a scallop (Z. patagonica) trawl survey was conducted on the northeast continental shelf of the Falkland Islands on board the Uruguayan factory vessel "Holberg". A total of 103 hauls (66 positive hauls) was carried out in the Holberg Scallop Bed with twin otter trawl gear (Figure 2). A local depletion experiment (details omitted for brevity) carried out with a separate dataset produced a maximum likelihood estimate for gear capture efficiency, r (the fraction of the stock within the area swept by the net that is captured and brought on board for processing) of r= 0.084, and this estimate had a measure of precision s(r) = 0.011. If Formula* is the MLE of mean scallop density before gear-efficiency correction, then the gear-efficiency corrected back-transformed mean density is Formula'=Formula*/r. As this application included an estimate of survey catchability, the biomass estimate can be regarded as absolute.


Figure 2
View larger version (20K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 2. Locations of observations of the scallop bed (Z. patagonica) trawl survey in the Falkland Islands. Polygons in the left map indicate rocky bottoms. Empty circles are null observations, and filled circles are positive density observations.

 
The scallop bed presents the extreme case where all external observations are null, so its perimeter was observed completely (Figure 2). Moreover, the bed has no holes of distribution, i.e. all internal observations of density are positive. Therefore, the stock area can be taken as known exactly, which reduces the spatial analysis to estimation of mean density [Equations (6)–(8)]. Estimating the survey catchability introduced extra uncertainty. Taking into account the fact that the estimates Formula* and r are independent random variables by construction, an approximate measure of precision of biomass by Taylor series is


Formula 149M15

(15)

The univariate distribution of density was asymmetrical (Figure 3a), but became nearly symmetrical when the parameter of the Box–Cox transformation was {lambda} = 0.3 (Figure 3b). Therefore, 0.3 was the starting value for {lambda}. The plot of density against each spatial dimension separately suggested there was no trend (Figures 3c an 3d), so the model for the mean of the process was assumed to consist of a single parameter, ß. The maximum distance between any two observations was 5.32 km. A variogram constructed with the transformed data allocated into 24 arbitrary distance bins of 112.5 m each in four orthogonal directions showed no evidence of anisotropy (Figure 3e). This plot also showed that 5 and 10 were good starting values for the variance owing to incomplete observation (the nugget, {tau}2) and the variance attributable to the spatial process (the sill, {sigma}2), respectively. Additionally, this plot and the plot of the raw variogram data (i.e. the variogram cloud with neither arbitrary distance bins nor distance limits; Figure 3f) showed that at about 1 km distance, the spatial correlation flattened, so the starting value of the range parameter {phi} was 1. In practice, we tried all combinations resulting from adopting two starting values close to those indicated above, for each of the four parameters.


Figure 3
View larger version (26K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 3. Positive scallop bed (Z. patagonica) density data in the trawl survey in the Falkland Islands: (a) raw density frequency distribution; (b) Box–Cox transformation of raw density data; (c) east–west density profile; (d) north–south density profile; (e) directional binned variogram; (f) omnidirectional cloud variogram.

 
The form of the spatial correlation function was chosen as the general 3-parameter Whittle–Matérn function. This function arises naturally when extending the classic one-dimensional and unidirectional time-series theory into a two-dimensional and bidirectional spatial series theory (Whittle, 1954). It was generalized by Matérn (1987) to account for a range of smoothness in the underlying Gaussian process. The function has mathematical properties that make it suitable for the type of calculation necessary to maximize the likelihood and to compute the curvature of the likelihood about the maximum. The function has the form


Formula 149M16

(16)

where {kappa} is a parameter that determines the degree of smoothness of the spatial process Z(x, y), K{kappa}(·) is the modified Bessel function of the second kind, order {kappa}, and {Gamma} is the gamma function. Determination of a starting value for the smoothness parameter {kappa} is less straightforward than for the other parameters. In general, {kappa} is lower when the variogram cloud appears more exponential and is higher when it appears more Gaussian. We tried several values from 0.5 to 4.5.

The spatial model was superior to a non-spatial model (>15 units of difference in the AIC) where the three spatial correlation parameters have been dropped (Table 2). A visual impression of data and model fit can be obtained by plotting binned density data over the two-dimensional field and reconstructing the scallop bank by spatial prediction (simple kriging using all data for interpolation) with fixed parameter values (Figure 4). Clearly, the model reflects the observed variation, with two nuclei of higher abundance in the north and the south of the bank. The estimate was ~15 000 t of scallops, with a CV of 30% (Table 2).


Figure 4
View larger version (23K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 4. (a) Spatial distribution of transformed density data of the scallop bed (Z. patagonica) in the Falkland Islands; (b) spatial prediction by the likelihood-based geostatistical model (simple kriging using all data for interpolation).

 


View this table:
[in this window]
[in a new window]

 
Table 2. Maximum likelihood estimation of biomass of scallops (Z. patagonica) from the Falkland Islands in January 2005 using a geostatistical model and additional results from a gear-efficiency experiment.

 
Estimating the relative biomass of female squid from a trawl survey
Between 1 and 14 February 2005, a trawl survey for Patagonian squid (L. gahi) was conducted over the eastern continental shelf of the Falkland Islands on board the Falklands factory vessel "Capricorn". We obtained 482 observations of local density, 328 of which were positive, from 49 trawls (Figure 5) by applying a catch allocation scheme described elsewhere (Roa-Ureta and Arkhipkin, 2007). Those authors argue that the grouping of local density observation into trawls (nearly 10 observations per trawl) did not introduce extra dependence into the data. Moreover, the survey was carried out by a single vessel that passed over each locality once. Therefore it would appear to be adequate to treat this application as one where dependence in the data comes from spatial correlation only. In computing the standard deviation of the biomass estimate, we ignored the uncertainty derived from applying the proportion of females in a random sample to the total catch of that trawl. The geostatistical analysis shown here only dealt with the local density of females.


Figure 5
View larger version (37K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 5. Locations of observations of female squid (L. gahi) in the trawl survey in the Falkland Islands. Polygons indicate rocky seabed, black circles positive density, and grey circles null density.

 
The positive density data were analysed in geoR, as reported for the first application, and we fitted a Gaussian correlation model [{kappa}->{infty} in Equation (16)]. This spatial model was far superior to a non-spatial model in which the range and spatial variance parameters have been dropped (Table 3). The fitted value of the Box–Cox transformation parameter indicated that the original variable was distributed approximately log normally.


View this table:
[in this window]
[in a new window]

 
Table 3. Maximum likelihood estimation of female L. gahi biomass from the Falkland Islands in February 2005.

 
This application is an example of a situation where the perimeter of the biological process was not observed completely and there were holes in the observed region where the density observed was zero (Figure 5). Therefore, the effective stock area could not be determined unambiguously and had to be considered unknown and a target of estimation. We defined a grid of 166 cells of 5 x 5 km each including only the ground covered during the survey, and counted the number of trials (Figure 6a) and the number of successes (Figure 6b) in observing the squid stock in each cell. The size of the cells was fixed as a trade-off between the number of points for estimation of spatial correlation (166 squares) and the number of observations in each cell. The analysis was carried out in geoRglm (Christensen and Ribeiro, 2002). We also used the package coda for diagnostics on convergence and mixing of the Markov chain. As in the case of the positive density data, we fitted a Gaussian correlation model. The estimated probabilities of observing the stock are shown in Figure 6c, and detailed numerical results in Table 3. Clearly, the spatial model was much superior to a non-spatial model (91 AIC units difference). The range of the spatial correlation function can be compared between the spatial model for the positive continuous densities and the spatial model for the binomial counts. It was similar, as expected, because of its origin in the same stock, but somewhat higher in the binomial counts model probably because of a loss of spatial resolution (Table 3). Taking the results from the two models together leads to a relative biomass estimate of around 11 000 t of female squid, with a CV of 34% (Table 3).


Figure 6
View larger version (15K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 6. Counts of (a) trials, and (b) successes in observing the Patagonian squid stock (L. gahi) during trawl surveys in the Falkland Islands, and predictions of the probability of observing the stock. (c) The survey area has been gridded into 5 x 5 km cells.

 
Estimating orange roughy spawning stock size from an acoustic survey
On 8 and 9 August 2004, three repeated acoustic surveys were conducted over the seamount JF3 near the Chilean Archipelago of Juan Fernandez. All surveys were conducted aboard the factory vessel "Betanzos", using a SIMRAD EK60 transceiver and a SIMRAD ES38B (38 kHz) transducer. The adaptive sampling design was based upon semi-random transects (Jolly and Hampton, 1990), which were arbitrarily divided into 100 m sections, yielding a total of 1358 elementary sampling units (ESUs). After covering the whole seamount during the first survey, a high-density stratum was defined and covered during two subsequent surveys.

In all, 22 discrete echo traces were identified, involving 95 ESUs (Figure 7). Hence, this is a case where (i) high spatial correlation was expected since the observed stock was highly aggregated and fragmented; (ii) there was a need to deal with stratification and repeated surveys, which were likely to impose an additional source of (intra-survey) dependence in the data; (iii) concern existed about correlation within transects as a result both of the temporal correlation in data collection and the potential weather effects of surveying while sailing against (westward) or along (eastward) the swell; and (iv) there was interest in exploring a fourth source of correlation that might exist among ESUs belonging to the same echo trace, beyond what was captured by the spatial correlation itself. For simplicity we ignored other sources of uncertainty typical of acoustic observations in deepwater, such as those derived from estimating target strength, mean fish length, species identification, and dead zone. The model for the positive density corresponded to Equations (9)–(10), with an additional variance term FGF', with F the design matrix identifying the survey, transect, and echo trace. The transformation parameter {lambda} was estimated to be 0.01, then fixed to 0 for convenience in terms of back-transformation. Hence, we assumed a lognormal distribution for the sampling probability for the positive values of the echo-integration variable, i.e. the nautical area scattering coefficient (SA, m2 nautical mile–2). The variable SA was transformed into fish density using standard hydroacoustic procedures (MacLennan and Simmonds, 1992), and a target strength constant for each fish size following Hampton and Soule (2002). In this application we had several competing models available to us, depending upon which sources of data-dependence were included (Table 4). For instance, a full spatial mixed model for positive mean fish density was somewhat superior (AIC = 275.6) to a purely spatial model (AIC = 279.3), and both were better than a null (pure nugget) model (AIC = 286.7). However, a simpler spatial mixed model with survey as the only extra source of dependence in the data was similar (AIC = 274.8) to the full spatial mixed model. Therefore, we chose the last model with spatial distance and survey factors, and used it to estimate mean fish density with positive density observations and the effective stock area. To estimate effective stock area (Equations (11)–(14) plus the term FGF' in the covariance matrix), we first attempted to model a Bernoulli process, by assigning point locations to the observation process with a 0/1 response variable. However, we experienced convergence problems when mixing spatial and additional sources of correlation. Therefore, we grouped the presence/absence data collected in each survey into 500 x 500 m cells, counting successes and trials as we did in Application 2. Thereafter we used comparable models to those used for acoustic density: pure nugget (null), spatial only, and spatial mixed models with survey as the only random source of correlation. For estimating mean acoustic density and effective stock area, we used a Gaussian spatial correlation function. All analyses were carried out in SAS©, by means of macro-Glimmix (Littell et al., 1996).


Figure 7
View larger version (39K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 7. Hydroacoustic sampling tracks for orange roughy on the JF3 seamount, off the central Chilean coast. Grey dots correspond to sampling units where the fish were absent, and black dots to positive density samples.

 


View this table:
[in this window]
[in a new window]

 
Table 4. Maximum likelihood estimation of orange roughy (H. atlanticus) biomass for the seamount JF3 off Robinson Crusoe Island (Chile) in August 2004, using models of increasing complexity.

 
The best-fit mixed model provided a biomass estimation of 1500 t (CV = 60%; Table 4). The null model produced a higher estimate of 1700 t (CV = 31%). The smaller CV of the null model suggested that uncertainty would be underestimated if full independence was assumed for the data. The higher biomass estimated under the null model was caused by overestimating both mean density and stock area (Table 4). Compared with the mixed model, the spatial-only model underestimated both the biomass and its uncertainty, yielding a biomass estimate of 1100 t ± 580 t (s.e.). The reduction resulted from underestimating both the stock area by 7% and the mean density by 17%.


    Discussion
 Top
 Introduction
 Material and methods
 Applications
 Discussion
 References
 
Transitive and intrinsic geostatistics have provided a powerful theoretical framework to analyse spatial data from surveys ever since they were introduced into fisheries research. Likelihood-based geostatistics is a third geostatistical approach to modelling spatial data from surveys, and as such it increases the range of analytical tools available for stock assessment. The underlying spatial model in intrinsic and likelihood geostatistics is the same; the difference lies in the manner in which inferences are drawn. This inferential difference stems from the fact that in the likelihood-based approach, the data are realizations of a random vector with an explicit probability model (Diggle et al., 2003; Ecker, 2003), whereas in the intrinsic approach the data are realizations of a random function without an explicit probability model. In the transitive approach the data are conceived as a set of constants, and probability considerations only enter through the actions of the researcher when selecting locations to observe.

The methodology presented here is based on an underlying Gaussian model, although it seems natural to develop alternative theories based on different distributions. For example, the Gamma density is the distribution of a positive random variable, so a theory based on it may not require transformation to analyse the positive density data. The general likelihood theory of the Gaussian linear spatial model for continuous random variables and the generalized linear models in the binomial and Poisson families for counts have been developed by Diggle et al. (1998, 2003), Ribeiro et al. (2003), and Christensen (2004). Here we described this new theory and proposed (i) a methodology to estimate the stock area using Christensen's (2004) generalized spatial lineal model, (ii) an extension of the basic likelihood model to account for extra sources of dependence in the data, and (iii) a methodology to estimate fish biomass by combining the stock area estimate with an estimate of mean fish density in the areas where the stock is present, using Aitchison's (1955) approach, also called a "hurdle" model. It would be better to have a joint likelihood model for both spatial processes, namely the binomial counting process and the positive fish density process. With such a model, all the parameters, and the biomass index itself, could be estimated simultaneously by maximization of a joint likelihood function, instead of analysing each process separately and combining the results with Aitchison's (1955) approach (see Figure 1). Also, a joint likelihood model could account for possible interactions between the binomial counting process and the positive fish density process.

Many concepts from intrinsic geostatistics are important too in likelihood geostatistics. For example, the binned and the cloud variograms can be used in likelihood geostatistics to find starting values for likelihood maximization. Another example is kriging, recognized as spatial prediction in the likelihood approach and usually analysed using Bayesian inference (e.g. Ribeiro et al., 2003). In fact, simple kriging is equivalent to conditional expectation (Ribeiro and Diggle, 2000, p. 7). We used kriging to apprehend visually the fit of the spatial model for the scallop bed example.

Ecker (2003) provides a thorough review of practical and theoretical differences between intrinsic and likelihood-based geostatistics. Our purpose here was explanatory rather than comparative, but we emphasize a few more aspects of a comparison between likelihood-based and intrinsic geostatistics that could be relevant to fishery scientists. The first aspect is the ease with which additional sources of dependence in the data can be accommodated within the likelihood-based approach. It is not apparent to us how this can be done in intrinsic geostatistics. This is important because surveys are often carried out in such a way that the data present some structure that may produce additional dependence. For example, a hydroacoustic survey may pass repeatedly over a small area where the stock is aggregated (e.g. the third application in this paper), or several vessels may cover different sections of a large area. A second aspect is that the likelihood approach allows use of a model-based inference framework for both mean positive fish density and the probability of observing the stock. By contrast, in the model-based intrinsic approach, the stock area is usually not estimated (the spatial process is assumed to be realized within a field of fixed geometry) and when it is estimated, authors use the design-based transitive theory with the indicator variable (Journel and Huijbregts, 2004, pp. 428–435; Petitgas and Lafont, 1997) or with null and positive fish density observations jointly (Bez, 2002). The transitive theory is based on an entirely different inference theory, and it requires adherence to a specific sampling design (Bez, 2002). However, likelihood-based estimation of the effective stock area comes at the cost of better use of the stationarity assumption, and this assumption may be more easily violated with binomial counts than with positive density data. A third aspect is related to the estimation of mean fish density, and the precision of this estimate and consequently of the biomass estimate. In intrinsic geostatistics, mean fish density is the mean of the kriged surface, i.e. the mean of the spatial prediction. The precision of mean fish density is estimated from the idea of extensive variances (Journel and Huijbregts, 2004), whereby the repetitions of small parts in space become a proxy for repetitions over the realizations of the random function at a given point. In likelihood-based geostatistics, on the other hand, the mean fish density estimate is the mean of the spatial process, not the mean of the spatial prediction. The mean of the spatial process is a parameter in the likelihood function and, as such, its precision can be estimated directly from the curvature of the likelihood about the maximum.

Treating the extra sources of dependence as random factors in a spatial mixed model is adequate for truly independent factors, such us multiple vessels surveying the same area at the same time. However, for factors such as multiple surveys carried out sequentially in time (by the same vessel), the mixed model approach is only a crude approximation, because the surveys are actually ordered in time and therefore not really random; the levels of this factor show temporal serial correlation, and in the mixed model approach the information regarding the order of the series is lost. Nevertheless, given the limitations of available theories and computational methods, the proposed approach represents at the very least an improvement in the process of making explicit acknowledgment of the time factor when repeated surveys are conducted on the same stock.

A biomass estimate from a survey is related to the total biomass in the stock by a scaling factor, 1/q in Hilborn (2000). In the case of trawl surveys, there is usually a fraction of escapement of the stock from the fishing gear or the fishing gear path, so q < 1, although herding can lead to q > 1. In our first application this fraction was quantified by means of a depletion experiment, so the biomass estimate for the scallop bank can be considered an estimate of absolute biomass. In fact, in the Falkland Islands this estimate was used to compute a fishing quota by directly applying an exploitation rate to the estimate. By contrast, the biomass estimate in Application 2 provides only an index of abundance, because no effort was made to estimate the scaling factor. Roa-Ureta and Arkhipkin (2007) compared survey biomass estimates with estimates from models of daily commercial catch and effort, and it appeared that the q parameter was of the order of 0.35. In the case of hydroacoustic surveys, the scaling factor represents the fraction of the stock that escapes from the gear by avoiding the vessel, migrating out of the targeted zone (dial migrations) and "hiding" in the dead zone (the unobservable layer located immediately above the bottom). Therefore, the biomass estimate for orange roughy presented in Application 3 should also be considered relative.


    Acknowledgements
 
We thank Alexander Arkhipkin, who read and commented on an early version of the manuscript, and two anonymous reviewers, who suggested major improvements. Also, Nicolas Bez carried out (twice) a thorough technical review that contributed greatly to clarification of many of our ideas and suggested the reference to Ribeiro and Diggle (2000) on the equivalence of conditional expectation and simple kriging. RR-U is especially grateful to Paulo Ribeiro and Ole F. Christensen for their help in the use of their R packages geoR and geoRglm, respectively, and to Eino Uikkanen for his assistance with his program, GeoConv, for coordinate transformation. The method to estimate effective stock area is part of a PhD dissertation in statistics by RR-U. The orange roughy application was funded by a grant of the Chilean Fisheries Research Fund, FIP 2004–13, to EN.


    References
 Top
 Introduction
 Material and methods
 Applications
 Discussion
 References
 

    Aitchison J. On the distribution of a positive random variable having a discrete probability mass at the origin. Journal of the American Statistical Association (1955) 50:901–908.[CrossRef][Web of Science]

    Aubry P., Debouzie D. Geostatistical estimation variance for the spatial mean in two-dimensional systematic sampling. Ecology (2000) 81:543–553.[Web of Science]

    Berk R. Review of "Invariance of maximum likelihood estimators" by P.W. Zehna. Mathematical Reviews (1967) 33:342–343.

    Bez N. Global fish abundance estimation from regular sampling: the geostatistical transitive method. Canadian Journal of Fisheries and Aquatic Sciences (2002) 59:1921–1931.

    Christensen O. F. Monte Carlo maximum likelihood in model-based geostatistics. Journal of Computational and Graphical Statistics (2004) 13:702–718.[CrossRef][Web of Science]

    Christensen O. F., Ribeiro P. J. geoRglm: a package for generalized linear spatial models. R-NEWS (2002) 2:26–28.

    Conan G. Y. Assessment of shelfish stocks by geostatistical techniques. ICES Document, CM 1985/K: 30 (1985) 24.

    Diggle P. J., Moyeed R. A., Rowlingson B., Thomson M. Childhood malaria in Gambia: a case-study in model-based geostatistics. Applied Statistics (2002) 51:493–506.

    Diggle P. J., Ribeiro P. J., Christensen O. F. An introduction to model-based Geo statistics. In: Spatial Statistics and Computational Methods—Møller J., ed. (2003) 173. New York: Springer. 43–86. Lecture Notes in Statistics. 216 pp.

    Diggle P. J., Tawn J. A., Moyeed R. A. Model-based geostatistics. Applied Statistics (1998) 47:299–350.

    Ecker M. D. Geostatistics: past, present, and future. In: Environmetrics. Developed under the auspices of UNESCO. Encyclopedia of Life Support Systems (EOLSS)—El-Shaarawi A. H., Jureckova J., eds. (2003) Oxford, UK: Eolss Publishers. http://www.eolss.net.

    Foote K. G., Stefánsson G. Definition of the problem of estimating fish abundance over an area from acoustic line-transect measurements of density. ICES Journal of Marine Science (1993) 50:369–381.[Abstract/Free Full Text]

    Geyer C. J., Thompson E. A. Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society, Series B (1992) 54:657–699.

    Hampton I., Soule M. Acoustic survey of orange roughy biomass on the north east Chatam Rise. Marine Fisheries Surveys Pty Ltd. In: Report to the New Zealand Orange (2002) Nelson, New Zealand: Roughy Management Company. 63.

    Hansen M. H., Madow W. G., Tepping B. J. An evaluation of model-dependent and probability-sampling inferences in sample surveys. Journal of the American Statistical Association (1983) 78:776–793.[CrossRef][Web of Science]

    Hilborn R. Calculation of biomass trend, exploitation rate, and surplus production from survey and catch data. Canadian Journal of Fisheries and Aquatic Sciences (2000) 58:579–584.

    Jolly G. M., Hampton I. A stratified random transect design for acoustic surveys of fish stocks. Canadian Journal of Fisheries and Aquatic Sciences (1990) 47:1282–1291.

    Journel A. G., Huijbregts Ch. J. Mining Geostatistics (2004) New Jersey: Blackburn Press. 600.

    Littell R. C., Milliken G. A., Stroup W. W., Wolfinger R. D. SAS System for Mixed Models (1996) Cary, NC: SAS Institute Inc. 633.

    MacLennan D. N., Simmonds E. J. Fisheries Acoustics (1992) London: Chapman & Hall. 325.

    Matérn B. Spatial Variation. In: Lecture Notes in Statistics (1987) 2nd. 151.

    Matheron G. The Theory of Regionalized Variables and its Applications. Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau. Mimeo (1971) 211.

    Maunder M. N., Punt A. E. Standardizing catch and effort data: a review of recent approaches. Fisheries Research (2004) 70:141–159.[CrossRef][Web of Science]

    Pennington M. Efficient estimators of abundance, for fish and plankton surveys. Biometrics (1983) 39:281–286.[CrossRef][Web of Science]

    Petitgas P. Geostatistics for fish stock assessment: a review and an acoustic application. ICES Journal of Marine Science (1993) 50:285–298.[Abstract/Free Full Text]

    Petitgas P., Lafont T. EVA2: estimation variance. Version 2. A geostatistical software on Windows 95 for the precision of fish stock assessment surveys. ICES Document 1997/Y: 22 (1997).

    R: A language and environment for statistical computing. R Foundation for Statistical Computing. (2005) Vienna, Austria. R Development Core Team.

    Ribeiro P. J., Christensen O. F., Diggle P. J. geoR and geoRglm: software for model-based geostatistics. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna—Hornik K., Leisch F., Zeileis A., eds. (2003) 16.

    Ribeiro P. J., Diggle P. J. Bayesian inference in Gaussian model-based geostatistics. Technical Report ST-99-08 (2000) 46.

    Ribeiro P. J., Diggle P. J. geoR: a package for geostatistical analysis. R-NEWS (2001) 1:15–18.

    Roa-Ureta R., Arkhipkin A. Short-term stock assessment of the squid Loligo gahi of the Falkland Islands: sequential use of stochastic biomass projection and stock depletion models. ICES Journal of Marine Science (2007) 64:3–17.[Abstract/Free Full Text]

    SAS OnlineDoc®, Version 8, SAS Institute Inc. Cary, North Carolina, U.S.A. (1999) SAS Institute.

    Searle S. R. Linear Models for Unbalanced Data. John Wiley and Sons. (1987) New York.

    Simard Y., Legendre P., Lavoie G., Marcotte D. Mapping, estimating biomass, and optimizing sampling programs for spatially autocorrelated data: case study of the northern shrimp (Pandalus borealis). Canadian Journal of Fisheries and Aquatic Sciences (1992) 49:32–45.

    Smith S. J. Use of statistical models for the estimation of abundance from groundfish trawl survey data. Canadian Journal of Fisheries and Aquatic Sciences (1990) 47:894–903.

    Valliant R., Dorfman A. H., Royall R. M. Finite Population Sampling and Inference. A Prediction Approach. (2000) New York: John Wiley.

    Whittle P. On stationary processes in the plane. Biometrika (1954) 41:434–449.[Free Full Text]

    Zehna P. W. Invariance of maximum likelihood estimators. Annals of Mathematical Statistics (1966) 37:744.[Web of Science]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
64/9/1723    most recent
fsm149v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Roa-Ureta, R.
Right arrow Articles by Niklitschek, E.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Roa-Ureta, R.
Right arrow Articles by Niklitschek, E.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?