oagdedupe
Dedupe is a Python library for scalable entity resolution, using active learning to learn blocking configurations and clasify matches. See Getting Started for installation and setup instructions. See User Guide for more detailed methodology.
The diagaram below shows an overview of the pipeline.
Note
This project is under active development.