Integration with existing packages

Maciej Beręsewicz

1 Setup

library(blocking)
library(reclin2)

2 Data

In this example we use the same datasets (census and cis) as in the "Blocking records for record linkage" vignette.

data(census)
data(cis)
# add row identifiers; they correspond to the .x and .y indices of the pairs
census[, x := 1:.N]
cis[, y := 1:.N]

3 Integration with the reclin2 package

The package provides the pair_ann() function, which integrates blocking with the reclin2 package. It works as follows.

pair_ann(x = census[1:1000], 
         y = cis[1:1000], 
         on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"), 
         deduplication = FALSE) |>
  head()
.x .y block
204 1 1
204 176 1
204 375 1
204 391 1
204 405 1
204 424 1
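Conceptually, each row pairs a record of x (.x, its row number in x) with a record of y (.y) that was assigned to the same block. The base R sketch below illustrates this reading of the output on hypothetical toy block labels. It is not how pair_ann() works internally (there the blocks come from the ANN search); it only shows how candidate pairs follow from block assignments.

```r
# Hypothetical block labels for 3 records of x and 4 records of y
# (in pair_ann() these would come from the ANN-based blocking step).
blocks_x <- data.frame(.x = 1:3, block = c(1, 1, 2))
blocks_y <- data.frame(.y = 1:4, block = c(1, 2, 2, 3))

# Candidate pairs: every x-y combination that shares a block.
pairs <- merge(blocks_x, blocks_y, by = "block")
pairs <- pairs[order(pairs$.x, pairs$.y), c(".x", ".y", "block")]
rownames(pairs) <- NULL
print(pairs)

# Total number of candidate pairs (4 here, versus 3 * 4 = 12 without blocking).
nrow(pairs)
```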

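Each row is a candidate pair: .x and .y are the row numbers of the records in x and y, and block is the block they share, so the number of rows gives the total number of candidate pairs. This object can be passed directly to the reclin2 pipeline, as below, where we also switch to a different approximate nearest neighbour (ANN) algorithm, HNSW (ann = "hnsw").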

vars <- c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc")

# candidate pairs from HNSW-based blocking
pair_ann(x = census[1:1000],
         y = cis[1:1000],
         on = vars,
         deduplication = FALSE,
         ann = "hnsw") |>
  # compare the variables with the Jaro-Winkler similarity
  compare_pairs(on = vars, comparators = list(cmp_jarowinkler())) |>
  # sum the per-variable comparisons into a simple score
  score_simple("score", on = vars) |>
  # flag pairs according to a score threshold of 6
  select_threshold("threshold", score = "score", threshold = 6) |>
  # attach the census and cis variables to the selected pairs
  link(selection = "threshold") |>
  head()
.y .x person_id.x pername1.x pername2.x sex.x dob_day.x dob_mon.x dob_year.x hse_num enumcap.x enumpc.x str_nam cap_add census_id x txt.x person_id.y pername1.y pername2.y sex.y dob_day.y dob_mon.y dob_year.y enumcap.y enumpc.y cis_id y txt.y
11 945 DE256NG039003 HARRIET THOMSON F 12 1 1995 39 39 SPRINGFIELD ROAD DE256NG Springfield Road 39, Springfield Road CENSDE256NG039003 945 HARRIETTHOMSONF121199539 SPRINGFIELD ROADDE256NG DE256NG039003 HARRIET THOMSON F 12 1 39 SPRINGFIELD ROAD DE256NG CISDE256NG039003 11 HARRIETTHOMSONF12139 SPRINGFIELD ROADDE256NG
71 427 DE159QA062001 LEWIS GREEN M 23 3 1973 62 62 CHURCH ROAD DE159QA Church Road 62, Church Road CENSDE159QA062001 427 LEWISGREENM233197362 CHURCH ROADDE159QA DE159QA062001 LEWIS GREEN M 23 3 62 CHURCH ROAD DE159QA CISDE159QA062001 71 LEWISGREENM23362 CHURCH ROADDE159QA
83 720 DE237GG025002 IMOGEN DARIS F 6 4 1968 25 25 WOODLANDS ROAD DE237GG Woodlands Road 25, Woodlands Road CENSDE237GG025002 720 IMOGENDARISF64196825 WOODLANDS ROADDE237GG DE237GG025002 IMOGEW DAVIS F 6 4 25 WOODLANDS ROAD DE237GG CISDE237GG025002 83 IMOGEWDAVISF6425 WOODLANDS ROADDE237GG
99 136 DE125LU022001 DANIEC MICCER M 21 4 1947 22 22 PARK LANE DE125LU Park Lane 22, Park Lane CENSDE125LU022001 136 DANIECMICCERM214194722 PARK LANEDE125LU DE125LU022001 DAMIEL HILLER M 21 4 22 PARK LANE DE125LU CISDE125LU022001 99 DAMIELHILLERM21422 PARK LANEDE125LU
154 949 DE256NG040002 CHLOE WILSON F 5 7 1978 40 40 SPRINGFIELD ROAD DE256NG Springfield Road 40, Springfield Road CENSDE256NG040002 949 CHLOEWILSONF57197840 SPRINGFIELD ROADDE256NG DE256NG040002 CHLOE WILSOM F 5 7 40 SPRINGFIELD ROAD DE256NG CISDE256NG040002 154 CHLOEWILSOMF5740 SPRINGFIELD ROADDE256NG
156 549 DE159QY035002 AVA KING F 7 7 1969 35 35 CHURCH ROAD DE159QY Church Road 35, Church Road CENSDE159QY035002 549 AVAKINGF77196935 CHURCH ROADDE159QY DE159QY035002 AVA KING F 7 7 35 CHURCH ROAD DE159QY CISDE159QY035002 156 AVAKINGF7735 CHURCH ROADDE159QY
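To make the scoring, selection and linking steps more concrete, the base R sketch below mimics score_simple(), select_threshold() and link() on hypothetical toy records. Two simplifications are assumptions of this sketch, not reclin2's documented behaviour: each variable contributes 1 for exact agreement and 0 for disagreement or a missing value (replacing the Jaro-Winkler similarity used above), and a pair is flagged when its score is strictly greater than the threshold.

```r
# Hypothetical toy records; the vignette uses census and cis instead.
x <- data.frame(pername1 = c("ANNA", "JOHN", "MARY"),
                pername2 = c("SMITH", "BROWN", "JONES"),
                sex      = c("F", "M", "F"),
                stringsAsFactors = FALSE)
y <- data.frame(pername1 = c("ANNA", "JON", "MARY"),
                pername2 = c("SMITH", "BROWN", "JONES"),
                sex      = c("F", "M", NA),
                stringsAsFactors = FALSE)
on <- c("pername1", "pername2", "sex")

# Candidate pairs as row numbers (.x in x, .y in y), e.g. from a blocking step.
pairs <- data.frame(.x = 1:3, .y = 1:3)

# Compare: 1 for exact agreement, 0 for disagreement or a missing value
# (a stand-in for the Jaro-Winkler similarity used in the vignette).
for (v in on) {
  agree <- x[[v]][pairs$.x] == y[[v]][pairs$.y]
  pairs[[v]] <- ifelse(is.na(agree), 0, as.numeric(agree))
}

# Simple score: sum of the per-variable comparison values.
pairs$score <- rowSums(pairs[on])

# Flag pairs whose score exceeds the threshold (strict '>' is an assumption).
threshold <- 2
pairs$threshold <- pairs$score > threshold

# Link: attach the variables of both records to each selected pair,
# suffixing them with .x and .y.
sel <- pairs[pairs$threshold, c(".x", ".y")]
linked <- cbind(sel,
                setNames(x[sel$.x, , drop = FALSE], paste0(names(x), ".x")),
                setNames(y[sel$.y, , drop = FALSE], paste0(names(y), ".y")))
rownames(linked) <- NULL
print(linked)
```

Here the second pair loses a point for the typo in the first name and the third for the missing sex, so only the first pair passes the threshold; in the vignette, the Jaro-Winkler similarity instead gives partial credit for near-matches such as DAMIEL and DANIEC.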

5 Usage with the RecordLinkage package

To use the blocking results with the RecordLinkage package, add the block column to your data and pass it via the blockfld argument of compare.dedup() (deduplication) or compare.linkage() (linkage of two datasets). Note that RecordLinkage expects the block column to be stored as a character vector, not as a numeric or integer vector.
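A minimal base R sketch of this preparation, assuming a data frame with one row per record and a hypothetical integer vector of block labels (for example taken from the blocking output). The labels are converted to character before the data are handed to RecordLinkage. The compare.dedup() call is only shown as a comment, passing the block column by its column index.

```r
# Hypothetical records and the block each record was assigned to.
df <- data.frame(fname = c("ANNA", "ANA", "JOHN", "JON"),
                 lname = c("SMITH", "SMITH", "BROWN", "BROWN"),
                 stringsAsFactors = FALSE)
block_ids <- c(1L, 1L, 2L, 2L)

# RecordLinkage expects the blocking field as character, not numeric/integer.
df$block <- as.character(block_ids)
str(df)

# Column index of the block field, to be passed to blockfld.
block_col <- which(names(df) == "block")

# With RecordLinkage installed, the comparison could then be run as:
# library(RecordLinkage)
# rpairs <- compare.dedup(df, blockfld = block_col)
```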