ICES Journal of Marine Science: Journal du Conseil Advance Access originally published online on January 31, 2007
ICES Journal of Marine Science: Journal du Conseil 2007 64(3):559-569; doi:10.1093/icesjms/fsl045
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Geostatistical simulations of eastern Bering Sea walleye pollock spatial distributions, to estimate sampling precision
Alaska Fisheries Science Center, NMFS, NOAA, 7600 Sand Point Way NE, Building 4; Seattle WA 98115, USA
tel: +1 206 526 4681; fax: +1 206 526 6723; e-mail: paul.walline{at}noaa.gov
Walline, P. D. 2007. Geostatistical simulations of eastern Bering Sea walleye pollock spatial distributions, to estimate sampling precision. ICES Journal of Marine Science, 64: 559569.Sequential Gaussian and sequential indicator geostatistical simulation methods were used to estimate confidence intervals (CIs) for biomass estimates from six echo-integration trawl surveys of eastern Bering Sea walleye pollock (Theragra chalcogramma) biomass. Uncertainty in the acoustic and the length frequency data was combined in the calculation of CIs. Sampling in 2002 provided evidence for isotropy in the spatial distribution. Variogram models were characterized by long ranges (75122 nautical miles for non-zero acoustic data, for example) compared with the smallest dimension of the survey area (
100 nautical miles) and small nugget effects (
20% of the semi-variance in transformed normal space for acoustic data). The 95% CIs obtained for the abundance estimates did not vary greatly between years and were similar to those from a one-dimensional transitive geostatistical analysis, i.e. ± 59% of estimated total biomass.
Keywords: confidence intervals, geostatistical simulation, hydroacoustic survey precision, walleye pollock
Received 30 May 2006; accepted 17 December 2006; advance access publication 31 January 2007.
| Introduction |
|---|
|
|
|---|
For more than 20 years, scientists from the Alaska Fisheries Science Center (AFSC) have conducted periodic echo integration/trawl surveys of the eastern Bering Sea (EBS) to assess the distribution and abundance of walleye pollock (Theragra chalcogramma). The surveys consist of equally spaced parallel transects, a design chosen to obtain the most precise estimate of abundance possible in the presence of local positive autocorrelation (Simmonds and Fryer, 1996; Harbitz and Aschan, 2003; Simmonds and MacLennan, 2005). This choice was made despite the difficulties created in the estimation of precision, because high precision of the estimator of abundance was considered more important. Random placement of transects would allow straightforward estimation of variance (Jolly and Hampton, 1990), but would produce less precise estimates of abundance.
When sampling is not random, classical methods for estimating variance cannot be used without correction for spatial correlation in data, unless the minimum distance between the observations is larger than the correlation length. In contrast, geostatistical methods can be used with most sample designs, including systematic ones. According to Simmonds and MacLennan (2005), geostatistical methods are "now accepted as a reliable way... . for estimating the sampling variance when the survey design is not strictly random". The ICES-sponsored Workshop on Survey Design and Data Analysis (ICES, 2005) also recommended a systematic survey design with a geostatistics-based estimation of variance as optimal for an acoustic survey to estimate the abundance of a single species.
Consistent with these recommendations, a geostatistical transitive method for calculating the variance of the abundance estimate for acoustic surveys with equally spaced transects (Petitgas, 1993a; Bez, 2002) has been applied to the analysis of EBS surveys (Williamson and Traynor, 1996). Because the transitive method used at AFSC is based on transect sums of abundance, i.e. each transect is treated as a single observation, it is referred to as a one-dimensional (1D) geostatistical method. The transitive geostatistical method gives a relative estimation error (estimation variance0.5 total abundance1), which provides an indication of the precision of the surveys, but it cannot directly be used to construct confidence intervals (CIs) about the biomass estimate without making assumptions about the probability distribution function (pdf) of the estimates (Rivoirard et al., 2000). Use of twice the relative estimation error as a CI implies that the distribution is normal, which may not be true for highly skewed data typical of acoustic surveys. Another drawback is that the sampling error associated with trawl sampling is not included in this estimate of variance. The spatial distribution of the length frequency distributions from the trawl samples, needed for the conversion of acoustic samples to biomass by species, is assumed to be measured without error when the transitive method is applied using transect sums of biomass.
In theory, the estimation variance for a survey can also be obtained through global block kriging methods (Journel and Huijbregts, 1978). If calculated for a single block encompassing the entire survey area, practical problems can arise because of the need to invert a large covariance matrix (Goovaerts, 1999). The estimation variance from smaller blocks can be aggregated providing that the block estimates are not autocorrelated (Kern and Coyle, 2000). However, even if the block estimates are uncorrelated, the drawbacks of the use of the transitive geostatistical method are shared by block kriging methods. The form of the pdf for block kriging estimates of abundance is unknown, so CIs cannot be determined unless the pdf is assumed to be normal.
Recently, Gimona and Fernandes (2003) presented a method using conditional sequential Gaussian (SG) geostatistical simulations to estimate CIs for the abundance estimates from an acoustic survey. Their method allows the combined variance of the acoustic and fish length measurements to be estimated. In addition, the pdf of abundance estimates is obtained and can be used to determine CIs empirically. Despite the advantages of the method, Gimona and Fernandes (2003) concluded that the estimates obtained are likely to be biased because of the transformation algorithm, which is an integral part of the analysis. In the present study, all the pitfalls mentioned by Gimona and Fernandes (2003) are addressed. An attempt is made to eliminate the bias of the transformation algorithm by simulating the spatial distribution of zeros and non-zero data separately and then making the simulation only at points where pollock are present. In addition, the FORTRAN code was modified to avoid the problem of estimating abundance at points outside the survey area and to eliminate a problem in the transformation algorithm within the simulation FORTRAN code.
For comparison, CIs were also estimated using conditional sequential indicator (SI) simulations (Goovaerts, 1997). The SI method makes fewer assumptions than the SG method and deals with highly skewed distributions, typical of acoustic data, by fitting separate variogram models to high density areas and to moderate or low density areas (Petitgas, 1993b) rather than using a transformation or a robust estimator of the variogram (Cressie and Hawkins, 1980; Maravelias et al., 1996). The SI method may also be more conservative than the SG method because indicator simulations exhibit larger fluctuations in realization statistics, such as variograms or the stationary mean, than do Gaussian simulations (Deutsch and Journel, 1998).
The simulation techniques employed are based on the spatial distribution of survey data as characterized by model variograms (see Rivoirard et al., 2000, or Webster and Oliver, 2001, for discussion of variograms and variogram models). Because survey designs in the years considered here consisted of parallel transects, no samples were taken between transects, so the short-range (inter-transect) isotropy of the distributions could not be evaluated directly. In a recent study walleye pollock distributions at some locations in the EBS were found to be isotropic at scales of 0.14 nautical miles (Horne and Walline, 2005), hereafter referred to as "miles" for brevity. In the present study, the distributions were assumed to be isotropic, i.e. the spatial correlation between transects was assumed to be the same as that along transects. However, during the 2002 survey, data were collected specifically to evaluate this assumption, and the results are presented here.
Two geostatistical simulation methods are applied to EBS walleye pollock survey data to obtain CIs for the abundance estimates. Those intervals are compared with intervals obtained from a 1D transitive geostatistical method, assuming normality. The relative importance of the errors in acoustic and length frequency data is evaluated, and the CIs obtained are compared. Implications of the results for survey design and suggestions for extension of the simulation methods to calculate CIs by size or age and to calculate CIs for the precision estimates are briefly discussed.
| Material and methods |
|---|
|
|
|---|
Survey data
All six AFSC EBS surveys conducted between 1994 and 2002 were analysed. Survey methodology was consistent for those surveys (Karp and Walters, 1994; Traynor, 1997; Honkalehto et al., 2002), although survey tracks differed slightly. The survey design consisted of parallel NS transects spaced 20 miles apart, sampled over a 2-month period in summer. The southern end of each transect was either near shore (Alaska Peninsula or Aleutian Islands) or beyond the shelf break, in water 5001000 m deep. The northern ends of transects were fixed on the basis of historical pollock distributions, in 5085 m of water or at the USRussia Convention Line. Transects were extended north if fish echo sign was present. Acoustic data for all these surveys were collected with a calibrated Simrad. EK 500 echosounder (Bodholt et al., 1989) operating at 38 kHz. (Reference to product names does not imply endorsement by the National Marine Fisheries Service, NOAA.) The nominal pulse length was 1 ms, beam width was 7°, and ping rate was 1 s1. Data were analysed and echo sign classified to species with the Simrad BI500 software (Knudsen, 1990). Calibrations were performed before, during, and after each survey following standard procedures described by Foote et al. (1987).
Surveys also included trawling to collect biological samples to partition the echo sign by species and length and to convert echo-integration data to biomass. The trawls were conducted in areas with strong echo sign attributed to walleye pollock. For the analyses described here, trawl length frequency data were summarized by calculating the root mean square length (RMSL, cm) for each trawl. RMSL is obtained by taking the square root of the average squared fish length:
|
|
Nautical area backscattering coefficient (sA, following the notation of MacLennan et al., 2002) attributed to walleye pollock of all sizes combined was averaged for each 0.5 nautical mile length segment. The acoustic data were collected and integrated from an upper depth of 14 m to within 0.5 m of the bottom. RMSL and sA data were combined to obtain the density of fish:
|
| (1) |
a is the density in number per square mile and
bs the backscattering cross-section (MacLennan et al., 2002), related to target strength (TS):
|
|
Target strength is related to length through the empirically derived equation for walleye pollock:
|
|
For each survey, descriptive statistics were calculated for the original sA and RMSL data points, as determined in the surveys. These statistics included the mean, standard deviation, upper and lower quartiles, skewness, count, and range. For the sA data set, the percentage of zeros was calculated and tabulated. CIs under the assumption of normality (estimate ± 2 x estimation error) from a 1D geostatistical procedure were also calculated for each survey.
Simulations
Three separate simulations of the spatial distribution of sA were made for each survey: (i) a conditional SG simulation, (ii) a conditional SG simulation with zeros treated separately, and (iii) a conditional SI simulation. For the second approach, the locations where walleye pollock was present were determined using an indicator simulation method with only two categories, presence and absence. Finally, a conditional SG simulation was made of RMSL of walleye pollock from the trawl samples. All processes were assumed to be homogeneous/stationary, i.e. the correlations involved were assumed to be dependent on the distance between two observation points only and not on the location itself.
The domain over which simulations were made was chosen as the area bounded by the ends of the transects for each survey (Figure 1). The domain was rasterized by dividing it into 2.7 x 2.7 mile (5 x 5 km) blocks, with the points at the centre of these blocks composing the grid matrix used to represent the survey area. Stochastic geostatistical simulations were made of the sA and RMSL fields, closely following Goovaerts (1997) and Gimona and Fernandes (2003), using GSLIB FORTRAN routines (Deutsch and Journel, 1998; code available at www.gslib.com). Data were transformed to standard normal N(0,1) space by a non-parametric normal score transform, which is a global one-to-one or rank-preserving transform. This transform can be seen as a correspondence table between equal probability quantiles of the sample distribution and a normal distribution with a mean of 0 and variance of 1 (for more details, see Goovaerts, 1997). A model variogram (with nugget) was fitted to the empirical variogram in normal score space for sA and RMSL. Then a random sample for all grid points (nodes) in the domain was generated by conditional simulation. Both sA and RMSL simulations were conditional, i.e. they maintained the observed values at nodes with data. The normal score values were then back-transformed to observation space using the GSLIB backtr FORTRAN program (Deutsch and Journel, 1998), which is the inverse of the normal score transformation. The transformation table generated in the normal score procedure was applied in the back-transformation, with linear interpolation for simulated values falling between values in the table. No extrapolation beyond the maximum value in the data set was included.
|
To calculate biomass for a realization, the value for sA at each node was combined with the RMSL value for the same node using Equation (1). This procedure was repeated 1000 times, resulting in 1000 equi-probable realizations of biomass spatial distribution. Summing the biomass over all grid points for a realization and multiplying by the area represented by a grid point (7.3 miles2) produced an estimate of total walleye pollock biomass for a realization. CIs were then determined empirically from the frequency distribution of the 1000 biomass estimates.
This procedure assumes that the processes determining RMSL and sA distributions are independent. If these variables are correlated, a more complicated method would be necessary to ensure that the correlation is preserved in the simulated fields. A priori, there is no reason to suppose that RMSL is related to the density of fish (sA), and the correlation between RMSL and sA was low (average r2 was 0.06) in the six survey data sets. In addition, the use of the conditioning (sample) data in the simulations should force the realizations to display at least some correlation, if it is present. Therefore, it was considered unnecessary to employ a more complicated simulation procedure, such as simulated annealing or using one of the variables as secondary information in the simulation of the other.
The SG approach is based on an assumption that the data, in this case sA and RMSL data, are multivariate Gaussian. A normal score transformation was first made to ensure that the univariate cumulative distribution functions (cdf) were normal (Goovaerts, 1997). To assess the assumption of binormality of the normally transformed sA and RMSL variables, indicator variograms were compared with the theoretically derived bivariate normal values (Deutsch and Journel, 1998). The normal score transformation ensures that univariate normality is achieved, but this transform has no effect on the bivariate cdf (Deutsch and Journel, 1998). As Deutsch and Journel (1998) stress, this check is not a formal statistical test, and there are no criteria for acceptance or rejection of the hypothesis of bivariate normality. The consequences of failure to meet by various degrees the requirement for multivariate normality are not known. Still, according to the same authors, "heuristic considerations are enough to make the Gaussian model the privileged choice for modeling continuous variables, unless proven inappropriate". In the present case, although some of the indicator variograms appeared to conform closely to the expected bivariate normal values (Figure 2a), others did not (Figure 2b).
|
Gimona and Fernandes (2003) suggested that a correction such as that presented by Saito and Goovaerts (2000) for multi-Gaussian kriging might be a way to adjust for bias in the normal score back-transformation algorithm. However, such a correction does not appear to be necessary for simulations, because the value obtained through simulation at a node is not a weighted average of normal score values, i.e. the kriged mean. Instead, a single value is drawn from the appropriate cdf in the simulation procedure, with mean and variance determined from simple kriging with the normal score variogram model. Any averaging (e.g. to calculate the average value at a node over all 1000 realizations) is done in the original data space. If normal scores at a node were averaged over many realizations and then back-transformed, the result would be biased, and a correction procedure such as that suggested by Saito and Goovaerts (2000) would be necessary.
Gimona and Fernandes (2003) suggested that the normal score transform could be a problem for data sets with many zeros, so the data were blocked from the initial 0.5-mile grid to a 2.7-mile (5 km) grid eventually used in the study, reducing skewness and the percentage of zeros. In a further attempt to address the potential bias caused by the presence of zeros in the data, a two-stage procedure in which zeros and non-zeros were treated separately was developed. The presence/absence of walleye pollock (or zeros) was modelled using an indicator simulation of the categorical indicator variable, coded with a 1 for non-zero walleye pollock sA and with a 0 where there was no pollock (Deutsch and Journel, 1998). A new variogram model was obtained by fitting a curve to the experimental variogram of the normal-score-transformed non-zero data. Then, the sA values at nodes where pollock occurred, as determined from one of the realizations of presence/absence, were obtained by SG simulation. A different presence/absence realization was used as a mask for each SG realization, so only non-zero nodes from the indicator simulation were simulated in the SG simulation with non-zero data. The average sA over the survey area was then calculated for each of 1000 SG realizations of the spatial distribution of non-zero sA made using one of 1000 realizations of presenceabsence.
Empirical variograms were calculated for normal-score-transformed sA values for each survey for the full data sets with zeros included, and for the sets with the zeros removed (S+ for Windows, Version 6.1, Insightful Corp., 2002). Exponential or spherical models with nugget were fitted to the empirical variograms using a weighted least squares objective function (Cressie, 1993). For the 1994 and 1996 surveys, a single model provided a poor fit at the shortest lag distances. For these two cases, a good fit was obtained using two nested models (exponential and spherical) with differing ranges. These two fits were made visually, using EVA2 software (Petitgas and Lafont, 1997). A single model with nugget was used to fit all non-zero sA empirical variograms.
For the SI method, data were transformed to a set of binary variables corresponding to each probability quantile (p-quantile). For each sampled grid node and each p-quantile, a value of 1 was assigned if the sA was greater than or equal to the p-quantile cut-off threshold, and a value of 0 was assigned if the sA was less than the cut-off threshold. For each data set (year), nine indicator variograms were fitted to the empirical variograms obtained using the indicator coded data, one for each p-quantile (0.1, 0.2, ..., 0.9). The variograms were fitted automatically using a weighted least squares objective function (S+). In one instance (out of 54 in total), a manual adjustment was made because the automatic fit was deemed to be incorrect.
Modified versions of the GSLIB FORTRAN routines sgsim and sisim (Deutsch and Journel, 1998) were used to produce simulated values at each node in the survey area sequentially. The modifications to the FORTRAN routines (code supplied by P. Goovaerts, pers. comm.) allowed simulations to be made only at points within the domain, which is preferable to making simulations at all points in a rectangular grid, then discarding values outside the survey area. For the SG simulations, a newly simulated value at a node is chosen at random from a Gaussian conditional cdf generated by kriging the input data values and any nearby previously simulated values. A similar procedure was followed for the SI simulations, using full indicator kriging (simple kriging) at each grid location to establish the conditional cdf from which simulated values were randomly drawn. The nodes are visited in random order, so a variable number of previously simulated nodes is available to be used in estimating the cdf at the new node.
Other parameters that must be chosen when making a series of simulations, such as maximum search radii, minimum and maximum original data for the simulation, and the number of simulated nodes to use were kept constant for all surveys with few exceptions. Most of these parameters are intended to allow control of the size of the search neighbourhood and the number of points to be used in simulating a new node. They had little effect on these simulations, because data points from the closest transects are so heavily weighted that they dominate, even if the search radius or the number of original data points to be used is increased in an attempt to include points from transects farther away.
In all, 1000 replicate data sets of RMSL (SG method) and sA (SG, SG no zeros, and SI methods) were simulated for each survey, and the empirical 95% CIs for mean sA and mean RMSL, i.e. the 25th smallest means and the 25th largest means (25/1000 = 2.5% in each tail) were obtained. For each survey, 1000 sA (SI) and 1000 RMSL simulated data sets were combined to calculate walleye pollock density at each node in the resulting 1000 grids. Numbers were then converted to biomass using the lengthweight relationship for walleye pollock. The biomass values at each node were summed to produce a biomass of walleye pollock within the domain. In addition, the percentage of nodes with zero sA and the skewness was calculated for each simulated sA data set, to compare with the input data values.
Isotropy
Following the 2002 survey, three transects (from north to south: 184, 127, and 172 miles long) were made in a direction (approximately EW) orthogonal to the survey transect orientation (NS) to check for anisotropy (Figure 1). These three transects together with the two connecting cross-transects (105 and 189 miles long) are collectively referred to herein as the isotropy transects. Spherical models were fit to empirical variograms of the normal-score-transformed 0.5-mile-averaged sA data from the isotropy transects and compared with variogram models developed from the block-averaged 2002 survey data. An additional data set consisting of the isotropy transects and three survey transects (Transects 22, 19, and 9; 310, 217, and 185 miles, respectively) crossing near the middle of the three orthogonal transects was also analysed. Each data set was examined for geometric anisotropy by comparing the range of the spherical model fit to the data in two directions: one the direction with the longest range (major range), the other orthogonal to it (minor range).
| Results |
|---|
|
|
|---|
For these surveys, 782510 600 points (0.5 miles averages) of sA were available within the domain selected for analysis and simulation. After block-averaging, the number of data points was reduced to a low of 1590 in 1994 and a high of 2147 in 2000 (Table 1). Skewness was reduced by block-averaging, especially for the 1999 survey, which dropped from 15.5 to 8.5. In the other years, the reduction was not as large. The percentage of zeros in the data sets was also reduced by block-averaging, and ranged from a low of 4% to a high of >22% (Table 1).
|
There was a high degree of autocorrelation in all surveys because the nugget was only 1322% of the sill (Table 2). The sA empirical variograms converged on a sill in all cases, providing support for the assumption of stationarity. This was less apparent for the RMSL variograms, which had much larger ranges than did the sA variograms (Figures 2c and 2d). An exponential model was fitted to the sA empirical variograms for surveys from 1997 through 2000. A spherical model was a better fit for the 2002 survey. As explained, two components were used to model the 1994 and 1996 empirical variograms: one with a range of 40.5 miles and a second longer-range component of around 216 miles. The variogram for the 2002 sA data set, which had the lowest percentage of zeros, was nearly identical to that for the same data set excluding zeros. The rest of the variograms for the sA data without zeros had shorter ranges and higher nugget effects than did the corresponding variograms for sA with zeros (Table 2).
|
As would be expected, the empirical indicator variograms showed a high degree of autocorrelation (Table 3). The variograms were regular and well behaved, in that variation about the model fit was low (Figures 2e and 2f). The range of the model variogram generally decreased with p-quantile, i.e. higher values of sA were correlated for shorter distances than were smaller values, but this relationship was not observed in 1996. The range of the 0.9 p-quantile, the category of the largest sA values, was between 30 and 63 miles (Table 3, Figure 2f).
|
For the RMSL variograms, a spherical model was chosen in two cases (1997 and 2002). The ranges and nuggets for the model fits were greater than for the sA variograms (Table 2), and the model fits were poorer (Figure 2b).
No evidence of geometric anisotropy was apparent from the analysis of the 2002 data set collected for that purpose. The difference between the major and minor ranges for all three data sets examined was <3 miles, only <4% of the ranges (Table 4). The ranges for all three sets were about the same, 6770 miles.
|
The simulated sA data sets contained nearly the same percentage of zeros and had almost the same skewness (Table 5) as the input data sets. The frequency distributions of simulated mean sA estimates were nearly normal for all surveys. That observed for the 2000 survey was representative (Figure 3a). Skewness was low, ranging from 0.02 to a maximum of 0.14 for the SI simulations and from a low of 0.04 in 1996 and 1997 to a high of 0.17 in 2000 for the SG simulations. As were the distributions of simulated mean sA estimates, the frequency distributions of mean RMSL for the survey simulations (e.g. Figure 3b) were close to normal and characterized by low values of skewness (0.20 to 0.06), resulting in 95% CIs that were nearly symmetrical about the mean. When the sA and RMSL simulated data sets were combined to estimate CIs for total biomass, the resulting frequency distributions in some cases deviated from a normal distribution more than did the sA or RMSL distributions (Figure 3c).
|
|
As a result of the low skewness observed in the distributions of mean sA, RMSL, and total biomass, the empirically derived 95% confidence limits were more or less symmetrical (Table 6). For all three simulation methods, the largest CIs were found for the 1999 survey. The smallest observed were for the SG and SI methods applied to the 1997 survey data. The mean sA for all simulation methods in all years was smaller than the mean of the input data, except for the SI method in 1994, where the simulation mean equalled the mean of the input data. The SI method generated the means closest to the input data means (Table 6), so these results were used with RMSL simulations to produce biomass estimates.
|
Overall, the CIs estimated through simulations and those obtained from a 1D geostatistical method assuming normality were similar, the smallest being ± 5% of the mean for the 1D method and the largest being ± 9% of the mean for both the simulation and 1D methods in 1999. Both methods identified the 1999 survey as the most imprecise, and CIs from the simulation method were larger than those from the 1D method in four of the six years.
The widths of the survey biomass CIs were not obviously related to any of the descriptive statistics. For example, the 2002 survey was characterized by the single highest data value (sA of 25 852 for a 2.7-mile block average) and highest skewness (8.8) observed, but it did not have the widest CI for mean sA or total biomass (Tables 1 and 5). The greatest variability (widest CI) appeared in the 1999 simulations. That result could not be explained by poor model fit to the empirical variograms. On the contrary, the fits for both sA and RMSL variograms were exceptionally good. The 1999 survey did have the second highest skewness, block-averaged data value, and standard deviation. Although the 1999 survey had the highest observed sA value for a single 0.5-mile ESDU (71 824), replacing this value with the mean had almost no effect on the sA simulations. The calculated CIs remained essentially unchanged.
| Discussion |
|---|
|
|
|---|
All three simulation methods gave similar results for CIs about the estimate of mean sA. The results from the more complicated SG method of treating zeros separately did not seem to justify the additional effort. The CIs produced were indistinguishable from those obtained using the SG method on all the data, despite the differences in the normal-score-transformed variograms.
The CIs about the estimates of biomass obtained from the simulations were generally larger than those using the 1D method and assuming normality. However, in both cases, the CIs were < ± 10% of the mean for all the surveys. The CIs from the simulation methods were nearly symmetrical about the abundance estimators, because the frequency distributions were nearly normal.
The small interannual differences in sampling precision (CIs) were not considered significant. Simmonds and Fryer (1996) demonstrated, by sampling from simulated data sets, that the estimation of variance is much less precise than the estimate of abundance for a systematic survey such as the EBS walleye pollock surveys. The precision of the variance estimates from geostatistical simulations can be investigated by resampling from the conditional realizations or from unconditional realizations created using variograms inferred from the survey data (Journel, 1994). In either case, it would be necessary to fit variogram models to hundreds of empirical variograms made from resampled data sets. For the indicator simulations, the number of model fits needed would be in the thousands. It might prove difficult to establish criteria that could be used to judge the model fits and that would be equivalent to the manual inspection normally used to ensure that the fits are proper.
The mean sA from SG simulations was always smaller than the input data sA mean. The difference was smaller than that observed by Gimona and Fernandes (2003), but mean sA from the input data was outside the 95% CIs for the simulated mean sA in the last 4 y surveyed. This was not thought to be caused by the transformation procedure, because the difference persisted for the simulations treating zeros separately and occurred before back-transformation, i.e. the normal score mean for realizations was consistently less than 0. When the range parameter of the variogram is large compared with the dimensions of the domain, discrepancies ("ergodic fluctuations" in geostatistics jargon) between realization statistics and the corresponding model statistics are larger than when the range is small compared with the size of the simulation field (Goovaerts, 1997). The range parameter for variograms of normal score-transformed sA was large compared with the minimum dimension (width) of the survey area. The consistent underestimate of mean sA is therefore likely attributable to the similarity between years in the number of survey transects, survey area, and variography after normal score transformation. The ranges for the indicator variograms used in the SI simulations were shorter, especially for the class of highest abundance (0.9 p-quantile). For that class, ranges were <42 miles for all years except 2002, so the effect would be expected to be smaller, as observed.
The assumption of isotropy was supported by the limited data set obtained from special transects made after the 2002 survey. This result was expected on the Bering Sea shelf, where the depth gradient is small. However, only 1 y was sampled with only a few transects in the EW direction. Interannual differences in the thermal structure of the EBS cause differences between years in the distribution of walleye pollock (Kotwicki et al., 2005), so small-scale anisotropy may have existed in some of the years analysed here or in other areas of the EBS.
The isotropic model variograms for the normal-score-transformed sA data sets were remarkably consistent between years, especially when the zeros were removed. Differences in variogram behaviour at distances up to the transect spacing (20 miles) were small for the surveys from 1997 to 2002. The 1994 and 1996 surveys differed, but the longer ranges observed for those surveys tended partially to offset the effect of the larger nuggets observed then, making the difference between these two surveys and the last four smaller. The fit of the models to the empirical variograms was very good for all surveys. These characteristics were in marked contrast to those reported for herring by Gimona and Fernandes (2003), who also had to cope with a much higher percentage of zeros (>50%). These attributes (large range and small nugget) of walleye pollock spatial distribution make geostatistical approaches especially well suited to estimation of variance of the abundance estimator and account for the small CIs calculated.
The variogram parameters can be used to describe and quantify fish aggregation patterns (Wilson et al., 2003; Mello and Rose, 2005). The range parameter for walleye pollock distributions in the EBS is larger than for pollock in the Gulf of Alaska reported by Sullivan (1991), and may be the largest reported for any fish. The large range parameter is explained by structures in the spatial distribution that reach lengths as great as 110 miles and is consistent with the observation of layers of pollock that extend for 50100 miles. The interpretation of variogram model parameters as they relate to fish aggregation patterns is difficult if the variogram models consist of nested components, as was the case in 1994 and 1996. When a single model is used for these 2 y, the fit to the empirical variogram is not as good, but the range is close to that of the subsequent four surveys. The large range and small nugget indicates strong autocorrelation, and the similarity of variogram parameters suggests that the aggregation pattern is consistent between years at this season and over the range of population sizes observed.
These relatively small differences between years and the consistencies in the variography make it reasonable to generalize from these results and to apply conclusions to future surveys. For example, the range of the indicator variograms for both the 0.8 and 0.9 p-quantiles, the highest density patches of pollock, was consistently >30 miles. Therefore, transect spacing (currently 20 miles) could be increased to 25 or 30 miles without exceeding the range of any of the variograms observed in these six surveys, which is the minimum sampling requirement necessary for mapping (Simmonds and MacLennan, 2005).
An isotropic spatial distribution for walleye pollock also has implications for survey design, especially transect orientation. In the presence of anisotropy, transects should be orientated in the direction of greatest change, e.g. perpendicular to shore or to depth contours. If the distribution is spatially isotropic, transect orientation can be selected to minimize errors attributable to changes, especially migration, during the survey (Simmonds and MacLennan, 2005). Although ignored in the simulations because of lack of information, migration could have a significant effect in these surveys, because of their length (>2 months). Both biomass and its variance would tend to be biased by a survey that proceeds in the direction of a migration or opposite to it (Simmonds and MacLennan, 2005). Walleye pollock in the EBS appear to move northwards and shorewards during summer, when these surveys are made (Kotwicki et al., 2005). If so, a large component of the movement would be perpendicular to the direction in which the survey proceeds, and the effect on estimates of biomass and their variance would be small. Potentially, underway acoustic Doppler current profiler measurements of fish schools could be used to determine the speed and direction of their movements (on average) and might provide some useful information for evaluating the impact of migration on EBS walleye pollock biomass estimates from acoustic surveys.
The observed range in sA was nearly five orders of magnitude, whereas RMSL varied by a factor of 2 or 3 at most. The variance (as indicated by the CIs in terms of percentage of the mean) of sA in the simulated data sets were much higher than that for RMSL. When the two sets of simulated spatial distributions were combined to estimate total biomass, CIs were only slightly greater than those for sA alone. It was concluded that most of the variance in the estimates of total biomass, as determined by the simulation methods used in this study, were due to the variance of the acoustic return (sA). This is to be expected, as variations in RMSL affecting the number of fish have a lesser effect on the total biomass because changes in numbers are partially compensated for by corresponding reciprocal changes in the weight of individual fish.
Because the mean sA obtained from the SI simulations was close to that of the input data, differences between the biomass obtained from the normal survey methods and those obtained from the simulations were attributed to the use of RMSL to summarize length frequency distributions. As used here, RMSL has the advantage of propagating the variability from the trawl sampling in the computation of biomass from acoustic data. However, using RMSL to represent a complete length frequency distribution does not allow the determination of CIs by size or age class, although it does a surprisingly good job in the calculation of biomass from sA, judging from the close agreement between the survey estimates of biomass and the estimates obtained from the simulations. During AFSC walleye pollock surveys, locations for trawl hauls are chosen by experienced survey scientists and are not distributed randomly. The 100 or so hauls made on a typical EBS survey are insufficient to allow geostatistical simulation of each length increment separately. Bootstrapping methods, such as those used by Simmonds (2003), offer an alternative to the geostatistical method used in this study for including the variance of the trawl length frequency data. However, bootstrapping samples must be drawn from randomly sampled and uncorrelated data sets, which is not the case for the trawls made during an EBS survey. The variance attributable to the length frequency sampling would therefore likely be underestimated by bootstrapping. Such a refinement would, however, allow the estimation of precision of biomass estimates by size or age group, but would not be expected to change the conclusion reached in this study that patchiness of acoustic return accounts for more of the variability in total biomass estimates than does uncertainty in estimation of length frequency distributions.
The framework used here for combining the sampling variance of sA and length frequency could be extended to include additional sources of error in acoustic surveys designed to produce an abundance index. The method would be especially useful for propagating the error from those additional sources of uncertainty that vary spatially, such as hydrographic conditions. Systematic errors (bias) attributable to factors such as migration out of the sampling area, inaccuracy of the relationship between target strength and fish length, and variation in catchability as a function of fish length would also contribute to the variance of the biomass estimate if they vary spatially or temporally. At present, insufficient is known about these sources of bias and their variation to include them in these simulations. EBS pollock surveys are designed to control these factors as much as possible, so that they have a much smaller effect on the abundance treated as an index than they do on estimates of absolute abundance. Surveys are made at the same time each year, and sampling is made only during daylight, so if they exist, some of the systematic errors associated with fish behaviour may be relatively stable between years. The low sampling variability observed and the consistent survey strategies employed result in a very precise index of EBS pollock abundance.
| Acknowledgements |
|---|
I thank Rick Towler for work on the FORTRAN code used, Pierre Goovaerts for supplying the modified GSLIB FORTRAN programs, Neal Williamson for helpful discussions and editing, and AFSC reviewers M. Dorn, J. Millstein, and J. Ver Hoef and two anonymous reviewers for comments which helped improve the manuscript.
| References |
|---|
|
|
|---|
-
Bez N. (2002) Global fish abundance estimation from regular sampling: the geostatistical transitive method. Canadian Journal of Fisheries and Aquatic Sciences 59:19211931.
Bodholt H., Nes H., Solli H. (1989) A new echo sounder system. Proceedings of the Institute of Acoustics of the UK 11: pp. 123130.
Cressie N. (1993) Statistics for Spatial Data, revised edn. (John Wiley, New York)928.
Cressie N. A. C. and Hawkins D. M. (1980) Robust estimation of the variogram, 1. Mathematical Geology 12:115125.[CrossRef]
Deutsch C. V. and Journel A. G. (1998) GSLIB: Geostatistical Software Library and User's Guide. 2nd edn (Oxford University Press, New York)369.
Foote K. G. and Traynor J. J. (1988) Comparison of walleye pollock target strength estimates determined from in situ measurements and calculations based on swimbladder form. Journal of the Acoustical Society of America 88:917.[CrossRef]
Foote K. G., Knudsen H. P., Vestnes G., MacLennan D. N., Simmonds E. J. (1987) Calibration of acoustic instruments for fish density estimation: a practical guide. 144:69 ICES Cooperative Research Report.
Gimona A. and Fernandes P. G. (2003) A conditional simulation of acoustic survey data: advantages and potential pitfalls. Aquatic Living Resources 16:123129.[CrossRef][Web of Science]
Goovaerts P. (1997) Geostatistics for Natural Resources Evaluation. (Oxford University Press, New York)483.
Goovaerts P. (1999) Geostatistical tools for deriving block-averaged values of environmental attributes. Journal of Geographical Information Sciences 5:8896.
Harbitz A. and Aschan M. (2003) A two-dimensional geostatistic method to simulate the precision of abundance estimates. Canadian Journal of Fisheries and Aquatic Sciences 60:15391551.
Honkalehto T., Patton W., de Blois S., Williamson N. (2002) Echo integrationtrawl survey results for walleye pollock (Theragra chalcogramma) on the Bering Sea shelf and slope during summer 2000. 66 U.S. Department of Commerce, NOAA Technical Memorandum, NMFS-AFSC-126.
Horne J. K. and Walline P. D. (2005) Spatial and temporal variance of walleye pollock (Theragra chalcogramma) in the eastern Bering Sea. Canadian Journal of Fisheries and Aquatic Sciences 62:28222831.
. ICES. (2005) Report of the Workshop on Survey Design and Data Analysis (WKSAD). 913 May 2005, Sète, France. ICES Document CM 2005/B: 07. 170 pp.
Jolly G. M. and Hampton I. (1990) A stratified random transect design for acoustic surveys of fish stocks. Canadian Journal of Fisheries and Aquatic Sciences 47:12821291.
Journel A. G. (1994) Resampling from stochastic simulations. Environmental and Ecological Statistics. 1:6391.
Journel A. G. and Huijbregts Ch. J. (1978) Mining Geostatistics. (Academic Press, London)600.
Karp W. A. and Walters G. E. (1994) Survey assessment of semi-pelagic gadoids: the example of walleye pollock, Theragra chalcogramma, in the eastern Bering Sea. 56:822 Marine Fisheries Review.
Kern J. W. and Coyle K. O. (2000) Global block kriging to estimate biomass from acoustic surveys for zooplankton in the western Aleutian Islands. Canadian Journal of Fisheries and Aquatic Sciences 57:21122121.
Knudsen H. P. (1990) The Bergen echo integrator: an introduction. ICES Journal of Marine Science 47:167174.
Kotwicki S., Buckley T. W., Honkalehto T., Walters G. (2005) Variation in the distribution of walleye pollock (Theragra chalcogramma) with temperature and implications for seasonal migration. Fishery Bulletin US 103:574587.
MacLennan D. N., Fernandes P. J., Dalen J. (2002) A consistent approach to definitions and symbols in fisheries acoustics. ICES Journal of Marine Science 59:365369.
Maravelias C. D., Reid D. G., Simmonds E. J., Haralabous J. (1996) Spatial analysis and mapping of acoustic survey data in the presence of high local variability: geostatistical application to North Sea herring (Clupea harengus). Canadian Journal of Fisheries and Aquatic Sciences 53:14971505.
Mello L. G. S. and Rose G. A. (2005) Using geostatistics to quantify seasonal distribution and aggregation patterns of fishes: an example of Atlantic cod (Gadus mohua). Canadian Journal of Fisheries and Aquatic Sciences 62:659670.
Petitgas P. (1993a) Geostatistics for fish stock assessments: a review and an acoustic application. ICES Journal of Marine Science 50:285298.
Petitgas P. (1993b) Use of a disjunctive kriging to model areas of high pelagic fish density in acoustic fisheries surveys. Aquatic Living Resources 6:201209.[CrossRef]
Petitgas P. and Lafont T. (1997) Eva2: estimation of variance version 2. A geostatistical package on Windows 95 for estimating precision of fish stock assessment surveys. 22:22 ICES Document CM 1997/Y.
Rivoirard J., Simmonds J., Foote K. G., Fernandes P., Bez N. (2000) Geostatistics for Estimating Fish Abundance. (Blackwell Science, Oxford)206.
Saito H. and Goovaerts P. (2000) Geostatistical interpolation of positively skewed and censored data in a dioxin-contaminated site. Environmental Science and Technology 34:42284235.
Simmonds E. J. (2003) Weighting of acoustic- and trawl-survey indices for the assessment of North Sea herring. ICES Journal of Marine Science 60:463471.
Simmonds E. J. and Fryer R. J. (1996) Which are better, random or systematic acoustic surveys? A simulation using North Sea herring as an example. ICES Journal of Marine Science 53:3950.
Simmonds E. J. and MacLennan D. (2005) Fisheries Acoustics: Theory and Practice. Blackwell Science, Oxford pp. 437.
Sullivan P. J. (1991) Stock abundance estimation using depth-dependent trends and spatially correlated variation. Canadian Journal of Fisheries and Aquatic Sciences 48:16911703.
Traynor J. J. (1996) Target strength measurements of walleye pollock (Theragra chalcogramma) and Pacific whiting (Merluccius productus). ICES Journal of Marine Science 53:253258.
Traynor J. J. ( January 1997 1997) Midwater fish surveys at the AFSC. Alaska Fisheries Science Center Quarterly Report. 19.
Webster R. and Oliver M. A. (2001) Geostatistics for Environmental Scientists. (John Wiley, Chichester, UK)271.
Williamson N. J. and Traynor J. J. (1996) Application of a one-dimensional geostatistical procedure to fisheries acoustic surveys of Alaskan pollock. ICES Journal of Marine Science 53:423428.
Wilson C. D., Hollowed A. B., Shima M., Walline P., Stienessen S. (2003) Interactions between commercial fishing and walleye pollock. Alaska Fishery Research Bulletin 10:6177.
This article has been cited by other articles:
![]() |
K. M. Boswell, B. M. Roth, and J. H. Cowan Jr Simulating the effects of side-aspect fish orientation on acoustic biomass estimates ICES J. Mar. Sci., July 1, 2009; 66(6): 1398 - 1403. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Woillez, J. Rivoirard, and P. G. Fernandes Evaluating the uncertainty of abundance estimates from acoustic surveys using geostatistical simulations ICES J. Mar. Sci., July 1, 2009; 66(6): 1377 - 1383. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. M. Burgos and J. K. Horne Characterization and classification of acoustically detected fish spatial distributions ICES J. Mar. Sci., October 1, 2008; 65(7): 1235 - 1247. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||



