fuzzylink: Probabilistic Record Linkage Using Pretrained Text Embeddings
Links datasets through fuzzy string matching using pretrained text embeddings. Produces more accurate record linkage when lexical string distance metrics are a poor guide to match quality (e.g., "Patricia" is more lexically similar to "Patrick" than it is to "Trish"). Capable of performing multilingual record linkage. Methods are described in Ornstein (2025) <https://joeornstein.github.io/publications/fuzzylink.pdf>.
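The lexical-distance claim in the description can be checked with base R alone. The sketch below is illustrative and does not call fuzzylink itself. It assumes plain Levenshtein distance (via `utils::adist`) as a stand-in for a typical lexical metric, and the variable names are hypothetical.

```r
# Illustrative only: plain Levenshtein distance from base R (utils::adist),
# used here as a stand-in for a typical lexical string distance metric.
target     <- "Patricia"
candidates <- c("Patrick", "Trish")

# adist() returns a 1 x 2 matrix; flatten it and label each entry by candidate.
lexical_distance <- as.vector(utils::adist(target, candidates))
names(lexical_distance) <- candidates

print(lexical_distance)
# Patrick   Trish
#       2       6
```

By edit distance, "Patrick" ranks as the closer match even though "Trish" is the nickname. This is the kind of case where a lexical metric is a poor guide to match quality.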
Version: 0.2.1
Depends: R (≥ 4.1.0)
Imports: stats, utils, dplyr, Rfast, reshape2, stringdist, stringr, httr, jsonlite, httr2, ranger
Published: 2025-06-14
Author: Joe Ornstein [aut, cre, cph]
Maintainer: Joe Ornstein <jornstein at uga.edu>
BugReports: https://github.com/joeornstein/fuzzylink/issues
License: MIT + file LICENSE
URL: https://github.com/joeornstein/fuzzylink
NeedsCompilation: no
Materials: README, NEWS
CRAN checks: fuzzylink results
Linking: Please use the canonical form https://CRAN.R-project.org/package=fuzzylink to link to this page.