This analyzes a dataset from the Conne River, Newfoundland collected in 1992. Smolts are tagged and released at one location in the river. A fence at a second station also captures fish. Data are stratified on a weekly level.
A stratified-Petersen estimator (Darroch, 1961; Plante et al. 1996) is found for the original data and some pooling of the rows/columns.
This will load the SPAS fitting functions and any associated packages needed by SPAS.
The data should be stored as an \(s+1\) by \(t+1\) (before pooling) matrix. The \(s \times t\) upper left matrix is the number of animals released in row stratum \(i\) and recovered in column stratum \(j\). Row \(s+1\) contains the total number of UNMARKED animals recovered in column stratum \(j\). Column \(t+1\) contains the number of animals marked in each row stratum but not recovered in any column stratum. The \(rawdata[s+1, t+1]\) is not used and can be set to 0 or NA. The sum of the entries in each of the first \(s\) rows is then the number of animals marked in each row stratum. The sum of the entries in each of the first \(t\) columns is then the number of animals captured (marked and unmarked) in each column stratum. The row/column names of the matrix may be set to identify the entries in the output.
Here is the raw data for the Conne River:
conne.data.csv <- textConnection("
 149,  126,    0,    0,    0,    0,      1561
   0,  308,   65,    0,    0,    0,      1235
   0,    0,  161,   77,    0,    0,       884
   0,    0,    0,   67,    7,    0,       215
   0,    0,    0,    0,   17,    3,        71
   0,    0,    0,    0,    0,   13,        16
1895, 8503, 2184,  525,  155,  118,         0")
conne.data <- as.matrix(read.csv(conne.data.csv, header=FALSE))A total of \(149+126+1561=1836\) fish were tagged and released in week 1. Of these fish, \(149\) were recovered down stream in column stratum 1; 126 were recovered in column stratum 2; and \(1561\) were never seen again. A total of \(1895\) UNMARKED fish were recovered in column stratum 1.
You can add row and column names to the matrix which will used when the data are printed.
The Stratified-Petersen model is fit to the above data object. The code to fit the model is:
The row.pool.in and col.pool.in inform the
function on which rows or columns to pool before the analysis proceeds.
Both parameters use a vector of codes (length \(s\) for row pooing and length \(t\) for column pooling) where rows/columns
are pooled that share the same value.
For example. row.pool.in=c(1,2,3,4,5,6) would imply that
no rows are pooled, while
row.pool.in=c('a','a','a','b','b','b') would imply that the
first three rows and last three rows are pooled. The entries in the
vector can be numeric or character; however, using character entries
implies that the final pooled matrix is displayed in the order of the
character entries. I find that using entries such as '123'
to represent pooling rows 1, 2, and 3 to be easiest to use.
The SPAS system only fits models were the number of rows after pooling is less than or equal to the number of columns after pooling.
The results of the model fit is a LARGE list. But the function
SPAS.print.model produces a nice report
SPAS.print.model(mod1)
#> Model Name: No restrictions 
#>    Date of Fit: 2025-02-05 23:06 
#>    Version of OPEN SPAS used : SPAS-R 2025.2.1 
#>  
#> Raw data 
#>        V1   V2   V3  V4  V5  V6   V7
#> [1,]  149  126    0   0   0   0 1561
#> [2,]    0  308   65   0   0   0 1235
#> [3,]    0    0  161  77   0   0  884
#> [4,]    0    0    0  67   7   0  215
#> [5,]    0    0    0   0  17   3   71
#> [6,]    0    0    0   0   0  13   16
#> [7,] 1895 8503 2184 525 155 118    0
#> 
#> Row pooling setup : 1 2 3 4 5 6 
#> Col pooling setup : 1 2 3 4 5 6 
#> Physical pooling  : TRUE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  71956  (SE  1969  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>       pool1 pool2 pool3 pool4 pool5 pool6   V7
#> pool1   149   126     0     0     0     0 1561
#> pool2     0   308    65     0     0     0 1235
#> pool3     0     0   161    77     0     0  884
#> pool4     0     0     0    67     7     0  215
#> pool5     0     0     0     0    17     3   71
#> pool6     0     0     0     0     0    13   16
#>        1895  8503  2184   525   155   118    0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  838.2935 
#> Condition number of XX' after logical pooling                   838.2935 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 127001.6    ;  np: 48 ;  AICc: -253907.3 
#> 
#>   Code/Message from optimization is:  1 singular convergence (7) 
#> 
#> Estimates
#>              pool1 pool2 pool3 pool4 pool5 pool6  psi cap.prob exp factor
#> pool1          149   126     0     0     0     0 1561    0.073       12.7
#> pool2            0   308    65     0     0     0 1235    0.043       22.4
#> pool3            0     0   161    77     0     0  884    0.181        4.5
#> pool4            0     0     0    67     7     0  215    0.275        2.6
#> pool5            0     0     0     0    17     3   71    0.111        8.0
#> pool6            0     0     0     0     0    13   16    0.122        7.2
#> est unmarked  1895  8503  2184   525   155   118    0       NA         NA
#>              Pop Est
#> pool1          25186
#> pool2          37634
#> pool3           6193
#> pool4           1052
#> pool5            822
#> pool6            238
#> est unmarked   71127
#> 
#> SE of above estimates
#>              pool1 pool2 pool3 pool4 pool5 pool6  psi cap.prob exp factor
#> pool1         12.2  11.2   0.0   0.0   0.0   0.0 39.5    0.006        1.1
#> pool2          0.0  17.5   8.1   0.0   0.0   0.0 35.1    0.003        1.5
#> pool3          0.0   0.0  12.7   8.8   0.0   0.0 29.7    0.044        1.3
#> pool4          0.0   0.0   0.0   8.2   2.6   0.0 14.7    0.130        1.7
#> pool5          0.0   0.0   0.0   0.0   4.1   1.7  8.4    0.027        2.2
#> pool6          0.0   0.0   0.0   0.0   0.0   3.6  4.0    0.037        2.5
#> est unmarked    NA    NA    NA    NA    NA    NA  0.0       NA         NA
#>              Pop Est
#> pool1           1987
#> pool2           2347
#> pool3           1512
#> pool4            498
#> pool5            204
#> pool6             72
#> est unmarked    2246
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 1.510711e-18 
#> Chisquare gof df      : 0 
#> Chisquare gof p       : NAThe original data, the data after pooling, estimates and their standard errors are shown. Here the stratified-Petersen estimate of the total number of smolts passing the at the first sampling station is 71,127 with a standard error of 2,246.
Each entry in the output is available in fitted model object. You will need to explore the structure
cat("Names of objects at highest level\n")
#> Names of objects at highest level
names(mod1)
#>  [1] "version"        "date"           "input"          "kappa"         
#>  [5] "kappa.after.lp" "est"            "vcv"            "se"            
#>  [9] "fit.setup"      "conditional"    "model.info"     "gof"
cat("\n\nNames of estimates (both beta and real)\n")
#> 
#> 
#> Names of estimates (both beta and real)
names(mod1$est)
#> [1] "N.Chapman" "beta"      "real"
cat("\n\nNames of real estimates\n")
#> 
#> 
#> Names of real estimates
names(mod1$est$real)
#> [1] "psi"          "theta"        "cap"          "all.expanded" "N"           
#> [6] "N.stratum"    "exp.factor"The N entries refer to the population size; the
N.stratum entries refer to the individual stratum
population sizes; the cap entries refer to the estimated
probability of capture in each row stratum; the exp.factor
entries refer to (1-cap)/cap, or the expansion factor for
each row; the psi entries refer to the number of animals
tagged but never seen again (the right most column in the input data);
and the theta entries refer to the expected number of
animals that were tagged in row stratum \(i\) and recovered in column stratum \(j\) (after pooling).
As noted by Darroch (1961), the stratified-Petersen will fail if the matrix of movements is close to singular. This often happens if two rows are proportional to each other. In this case, there is no unique MLE for the probability of capture in the two rows, and they should be pooled. A detailed discussion of pooling is found in Schwarz and Taylor (1998).
Let us now pool the first two rows and the last two rows, but leave rows 3 and 4 alone. Similar, let us pool the last two columns.
The code is
mod2 <- SPAS.fit.model(conne.data, model.id="Pooling some rows",
                       row.pool.in=c("12","12","3","4","56","56"),
                       col.pool.in=c(1,2,3,4,56,56))Notice how we specify the pooling for rows and columns and the choice of entries for the two corresponding vectors. I used character entries for the row pooling to ensure that the pooled matrix is sorted properly (see below); the numeric entries for the columns is ok as is.
The results of the model fit is
SPAS.print.model(mod2)
#> Model Name: Pooling some rows 
#>    Date of Fit: 2025-02-05 23:06 
#>    Version of OPEN SPAS used : SPAS-R 2025.2.1 
#>  
#> Raw data 
#>        V1   V2   V3  V4  V5  V6   V7
#> [1,]  149  126    0   0   0   0 1561
#> [2,]    0  308   65   0   0   0 1235
#> [3,]    0    0  161  77   0   0  884
#> [4,]    0    0    0  67   7   0  215
#> [5,]    0    0    0   0  17   3   71
#> [6,]    0    0    0   0   0  13   16
#> [7,] 1895 8503 2184 525 155 118    0
#> 
#> Row pooling setup : 12 12 3 4 56 56 
#> Col pooling setup : 1 2 3 4 56 56 
#> Physical pooling  : TRUE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  71956  (SE  1969  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>        pool1 pool2 pool3 pool4 pool56   V7
#> pool12   149   434    65     0      0 2796
#> pool3      0     0   161    77      0  884
#> pool4      0     0     0    67      7  215
#> pool56     0     0     0     0     33   87
#>         1895  8503  2184   525    273    0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  178.6469 
#> Condition number of XX' after logical pooling                   178.6469 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 129431.9    ;  np: 28 ;  AICc: -258807.7 
#> 
#>   Code/Message from optimization is:  0 relative convergence (4) 
#> 
#> Estimates
#>               pool1  pool2 pool3 pool4 pool56  psi cap.prob exp factor Pop Est
#> pool12        108.5  474.5    65     0      0 2796    0.053       17.8   64869
#> pool3           0.0    0.0   161    77      0  884    0.136        6.4    8263
#> pool4           0.0    0.0     0    67      7  215    0.657        0.5     440
#> pool56          0.0    0.0     0     0     33   87    0.109        8.2    1099
#> est unmarked 1935.0 8463.0  2184   525    273    0       NA         NA   74671
#> 
#> SE of above estimates
#>              pool1 pool2 pool3 pool4 pool56  psi cap.prob exp factor Pop Est
#> pool12           5  19.8   8.1   0.0    0.0 52.9    0.002        0.8    2614
#> pool3            0   0.0  12.7   8.8    0.0 29.7    0.020        1.1    1243
#> pool4            0   0.0   0.0   8.2    2.6 14.7    0.675        1.6     451
#> pool56           0   0.0   0.0   0.0    5.7  9.3    0.018        1.5     185
#> est unmarked    NA    NA    NA    NA     NA  0.0       NA         NA    2284
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 19.59421 
#> Chisquare gof df      : 1 
#> Chisquare gof p       : 9.575909e-06Here the stratified-Petersen estimate of the total number of smolts passing the at the first sampling station is 74,671 with a standard error of 2,284. which is a slight reduction from the unpooled estimates.
You can pool to a single row (and multiple columns) or a single row and single column (which is equivalent to the pooled Petersen estimator). The code and output follow:
mod3 <- SPAS.fit.model(conne.data, model.id="A single row",
                       row.pool.in=rep(1, nrow(conne.data)-1),
                       col.pool.in=c(1,2,3,4,56,56))SPAS.print.model(mod3)
#> Model Name: A single row 
#>    Date of Fit: 2025-02-05 23:06 
#>    Version of OPEN SPAS used : SPAS-R 2025.2.1 
#>  
#> Raw data 
#>        V1   V2   V3  V4  V5  V6   V7
#> [1,]  149  126    0   0   0   0 1561
#> [2,]    0  308   65   0   0   0 1235
#> [3,]    0    0  161  77   0   0  884
#> [4,]    0    0    0  67   7   0  215
#> [5,]    0    0    0   0  17   3   71
#> [6,]    0    0    0   0   0  13   16
#> [7,] 1895 8503 2184 525 155 118    0
#> 
#> Row pooling setup : 1 1 1 1 1 1 
#> Col pooling setup : 1 2 3 4 56 56 
#> Physical pooling  : TRUE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  71956  (SE  1969  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>   pool1 pool2 pool3 pool4 pool56   V7
#> 1   149   434   226   144     40 3982
#>    1895  8503  2184   525    273    0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  1 
#> Condition number of XX' after logical pooling                   1 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 132850.1    ;  np: 7 ;  AICc: -265686.1 
#> 
#>   Code/Message from optimization is:  0 relative convergence (4) 
#> 
#> Estimates
#>               pool1  pool2  pool3 pool4 pool56  psi cap.prob exp factor Pop Est
#> 1             141.2  617.4  166.5  46.2   21.6 3982    0.069       13.5   72010
#> est unmarked 1903.0 8320.0 2243.0 623.0  291.0    0       NA         NA   72010
#> 
#> SE of above estimates
#>              pool1 pool2 pool3 pool4 pool56  psi cap.prob exp factor Pop Est
#> 1              5.3    20   6.1   2.3    1.4 63.1    0.002        0.4    1973
#> est unmarked    NA    NA    NA    NA     NA  0.0       NA         NA    1973
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 320.8279 
#> Chisquare gof df      : 4 
#> Chisquare gof p       : 3.475759e-68mod4 <- SPAS.fit.model(conne.data, model.id="Pooled Peteren",
                       row.pool.in=rep(1, nrow(conne.data)-1),
                       col.pool.in=rep(1, ncol(conne.data)-1))SPAS.print.model(mod4)
#> Model Name: Pooled Peteren 
#>    Date of Fit: 2025-02-05 23:06 
#>    Version of OPEN SPAS used : SPAS-R 2025.2.1 
#>  
#> Raw data 
#>        V1   V2   V3  V4  V5  V6   V7
#> [1,]  149  126    0   0   0   0 1561
#> [2,]    0  308   65   0   0   0 1235
#> [3,]    0    0  161  77   0   0  884
#> [4,]    0    0    0  67   7   0  215
#> [5,]    0    0    0   0  17   3   71
#> [6,]    0    0    0   0   0  13   16
#> [7,] 1895 8503 2184 525 155 118    0
#> 
#> Row pooling setup : 1 1 1 1 1 1 
#> Col pooling setup : 1 1 1 1 1 1 
#> Physical pooling  : TRUE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  71956  (SE  1969  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>       1   V7
#> 1   993 3982
#>   13380    0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  1 
#> Condition number of XX' after logical pooling                   1 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 148636.7    ;  np: 3 ;  AICc: -297267.3 
#> 
#>   Code/Message from optimization is:  0 relative convergence (4) 
#> 
#> Estimates
#>                  1  psi cap.prob exp factor Pop Est
#> 1              993 3982    0.069       13.5   72010
#> est unmarked 13380    0       NA         NA   72010
#> 
#> SE of above estimates
#>                 1  psi cap.prob exp factor Pop Est
#> 1            31.5 63.1    0.002        0.4    1973
#> est unmarked   NA  0.0       NA         NA    1973
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 4.229177e-14 
#> Chisquare gof df      : 0 
#> Chisquare gof p       : NABoth models have an estimated abundance of 72,010 with a standard error of 1,973. which is a slight reduction from the unpooled estimates.
Unfortunately, because pooling actually changes the data (internally), it is not possible to do likelihood ratio testing or to use AICc to compare physical different poolings.
Darroch, J. N. (1961). The two-sample capture-recapture census when tagging and sampling are stratified. Biometrika, 48, 241–260. https://www.jstor.org/stable/2332748
Plante, N., L.-P Rivest, and G. Tremblay. (1988). Stratified Capture-Recapture Estimation of the Size of a Closed Population. Biometrics 54, 47-60. https://www.jstor.org/stable/2533994
Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238