This webinar will introduce Splink, a software package developed for probabilistic record linkage at scale.
This free software provides a toolkit for linking datasets of tens or even hundreds of millions of records, and guides the user through the various stages of linkage. The toolkit is written in Python and uses PySpark so that it can run on massive datasets. It was developed by analysts at the UK Ministry of Justice (MoJ) as part of the Data First programme, and is used to link some of the MoJ’s largest datasets.