oagdedupe

Dedupe is a Python library for scalable entity resolution. It uses active learning to learn blocking configurations and to classify matches. See Getting Started for installation and setup instructions, and the User Guide for a more detailed description of the methodology.
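The paragraph above names two ideas: blocking and active learning. The generic sketch below illustrates both under stated assumptions. It is not oagdedupe's API. The rule functions `first3_name` and `zip_code`, the helpers `candidate_pairs` and `most_uncertain`, and the toy records are all hypothetical. Uncertainty sampling is assumed here as one common active-learning strategy, not necessarily the one this library uses.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical blocking rules: each maps a record to a blocking key.
def first3_name(rec):
    return rec["name"][:3].lower()

def zip_code(rec):
    return rec["zip"]

def candidate_pairs(records, conjunction):
    """Return index pairs (i, j), i < j, whose keys agree on every rule in `conjunction`."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rule(rec) for rule in conjunction)
        blocks[key].append(idx)
    pairs = set()
    for members in blocks.values():
        # Only records that share a block are compared, instead of all n^2 pairs.
        pairs.update(combinations(members, 2))
    return pairs

def most_uncertain(pairs, predict_proba):
    """Uncertainty sampling (assumed strategy): pick the pair whose match probability is closest to 0.5."""
    return min(pairs, key=lambda p: abs(predict_proba(p) - 0.5))

records = [
    {"name": "Johnson", "zip": "10001"},
    {"name": "Johnsen", "zip": "10001"},
    {"name": "Jones", "zip": "10001"},
    {"name": "Johnson", "zip": "94105"},
]

strict = candidate_pairs(records, (first3_name, zip_code))
loose = candidate_pairs(records, (first3_name,))
print(sorted(strict))  # [(0, 1)]
print(sorted(loose))   # [(0, 1), (0, 3), (1, 3)]

# Toy match probabilities standing in for a trained classifier.
scores = {(0, 1): 0.9, (0, 3): 0.55, (1, 3): 0.2}
print(most_uncertain(loose, scores.get))  # (0, 3)
```

The stricter conjunction yields fewer candidate pairs but may miss true matches such as records 0 and 3. The looser one keeps them at the cost of more comparisons. Learning a good blocking configuration amounts to balancing this trade-off, while the uncertainty-sampled pair is the one a human labeler would be asked about next.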

The diagram below gives an overview of the pipeline.

Figure: overview of the dedupe pipeline (_images/dedupe.png)

Note

This project is under active development.

Contents