Blocker
block.blocking
- class oagdedupe.block.blocking.Blocking(repo: oagdedupe.db.base.BaseRepositoryBlocking, conj: oagdedupe.block.base.BaseConjunctions = <class 'oagdedupe.block.learner.Conjunctions'>, forward: oagdedupe.block.base.BaseForward = <class 'oagdedupe.block.forward.Forward'>, pairs: oagdedupe.block.base.BasePairs = <class 'oagdedupe.block.pairs.Pairs'>, optimizer: typing.Optional[oagdedupe.block.base.BaseConjunctions] = None)
General interface for blocking:
  - forward: constructs forward indices
  - conjunctions: learns best conjunctions
  - pairs: generates pairs from inverted indices
- __init__(repo: oagdedupe.db.base.BaseRepositoryBlocking, conj: oagdedupe.block.base.BaseConjunctions = <class 'oagdedupe.block.learner.Conjunctions'>, forward: oagdedupe.block.base.BaseForward = <class 'oagdedupe.block.forward.Forward'>, pairs: oagdedupe.block.base.BasePairs = <class 'oagdedupe.block.pairs.Pairs'>, optimizer: typing.Optional[oagdedupe.block.base.BaseConjunctions] = None) -> None
- _abc_impl = <_abc_data object>
- conj
alias of Conjunctions
- optimizer: BaseConjunctions = None
- pairs
alias of Pairs
- repo: BaseRepositoryBlocking
- save(full: bool = False)
Saves comparison pairs using the conjunctions list.
If using a sample, all forward indices are built first; otherwise, each forward index is built as needed.
- save_comparisons(table: str, n_covered: int) -> None
Iterates through the conjunctions from best to worst.
For each conjunction, appends comparisons to “comparisons” or “full_comparisons” (if using full data).
Stops if (a) a subsequent conjunction yields a reduction ratio below the minimum rr setting or (b) the number of comparison pairs gathered exceeds n_covered.
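The stopping rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `ConjunctionStats`, `gather_comparisons`, and `min_rr` are hypothetical names, pairs are held in memory rather than appended to a database table, and the order of the two checks (reduction ratio before appending, pair count after) is an assumption.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Pair = Tuple[int, int]


class ConjunctionStats(NamedTuple):
    """Hypothetical stand-in for one entry of the conjunctions list."""
    name: str
    rr: float          # reduction ratio achieved by this conjunction
    pairs: Set[Pair]   # comparison pairs this conjunction would generate


def gather_comparisons(
    conjunctions: Iterable[ConjunctionStats],
    min_rr: float,
    n_covered: int,
) -> Set[Pair]:
    """Collect pairs from conjunctions assumed to be sorted best to worst."""
    gathered: Set[Pair] = set()
    for conj in conjunctions:
        # (a) stop once a conjunction falls below the minimum reduction ratio
        if conj.rr < min_rr:
            break
        gathered |= conj.pairs
        # (b) stop once more than n_covered pairs have been gathered
        if len(gathered) > n_covered:
            break
    return gathered
```

Using a set means a pair produced by several conjunctions is counted once; whether the real tables deduplicate in the same way is not stated in this reference.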
block.forward
This module contains objects used to construct blocks by creating forward indices.
- class oagdedupe.block.forward.Forward(repo: BaseRepositoryBlocking, settings: Settings)
Used to build forward indices. A forward index is a table where rows are entities, columns are block schemes, and values contain signatures.
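To make the table layout concrete, here is a small in-memory sketch of a forward index and of how it can be inverted to produce candidate pairs. The records, the two block schemes (`first_3_chars`, `last_token`), and the helper names are all hypothetical; the library builds these structures through its repository layer, which is not shown here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical records: entity id -> name
records: Dict[int, str] = {1: "Jane Doe", 2: "Jane Dough", 3: "John Smith"}

# Hypothetical block schemes: each maps a value to a signature
schemes: Dict[str, Callable[[str], str]] = {
    "first_3_chars": lambda s: s[:3].lower(),
    "last_token": lambda s: s.split()[-1].lower(),
}

# Forward index: one row per entity, one column per scheme, signature as value
forward_index: Dict[int, Dict[str, str]] = {
    entity_id: {scheme: fn(name) for scheme, fn in schemes.items()}
    for entity_id, name in records.items()
}


def invert(index: Dict[int, Dict[str, str]], scheme: str) -> Dict[str, List[int]]:
    """Inverted index for one scheme: signature -> ids of entities sharing it."""
    inverted: Dict[str, List[int]] = defaultdict(list)
    for entity_id, signatures in index.items():
        inverted[signatures[scheme]].append(entity_id)
    return dict(inverted)


def pairs_from_inverted(inverted: Dict[str, List[int]]) -> Set[Tuple[int, int]]:
    """Every pair of entities that falls into the same block."""
    return {
        pair
        for ids in inverted.values()
        for pair in combinations(sorted(ids), 2)
    }
```

In this toy data, entities 1 and 2 share the `first_3_chars` signature, so they form the only candidate pair for that scheme, while `last_token` yields no pairs.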
- repository
- Type
BaseRepositoryBlocking
- _abc_impl = <_abc_data object>
- build_forward_indices(rl: str = '', full: bool = False, conjunction: Optional[Tuple[str]] = None) -> None
Build forward indices for train or full datasets
- repo: BaseRepositoryBlocking
block.learner
This module contains objects used to learn the best block scheme conjunctions and use these to generate comparison pairs.
- class oagdedupe.block.learner.Conjunctions(optimizer: BaseOptimizer, settings: Settings)
For each block scheme, gets the best block scheme conjunctions of lengths 1 to k using a greedy dynamic programming approach.
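One way to picture a greedy search of this kind is sketched below. It is an assumption-laden illustration rather than the library's algorithm: `greedy_conjunctions`, `scheme_pairs`, and the caller-supplied `score` function are hypothetical, and a conjunction is modelled as the intersection of the pair sets of its member schemes. The pair set of the length-n conjunction is reused to build the length-(n+1) one, which is the only sense in which this sketch is "dynamic programming".

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[int, int]


def greedy_conjunctions(
    start: str,
    scheme_pairs: Dict[str, Set[Pair]],
    score: Callable[[Set[Pair]], object],
    k: int,
) -> List[Tuple[str, ...]]:
    """Return conjunctions of length 1..k that begin with `start`.

    At each step the scheme whose addition gives the highest score is
    appended; earlier choices are never revisited (greedy).
    """
    current: Tuple[str, ...] = (start,)
    current_pairs = scheme_pairs[start]
    results = [current]
    for _ in range(k - 1):
        candidates = [s for s in scheme_pairs if s not in current]
        if not candidates:
            break
        # Score each candidate by the pairs left after intersecting with it
        best = max(candidates, key=lambda s: score(current_pairs & scheme_pairs[s]))
        current = current + (best,)
        current_pairs = current_pairs & scheme_pairs[best]  # reused next round
        results.append(current)
    return results
```

A score might, for example, reward coverage of labelled matching pairs first and fewer total pairs second, by returning a tuple such as `(len(pairs & positives), -len(pairs))`.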
- optimizer
- Type
BaseOptimizer
- property conjunctions_list: List[StatsDict]
Flattens, dedupes, and sorts the list of conjunctions.
- Return type
List[StatsDict]
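The flatten, dedupe, and sort steps might look roughly like the following. The `Stats` dataclass and its fields are hypothetical stand-ins for `StatsDict`, whose actual fields are not listed in this reference; treating conjunctions as equal regardless of scheme order, and sorting by reduction ratio and then positives, are assumptions as well.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Stats:
    """Hypothetical stand-in for StatsDict."""
    scheme: Tuple[str, ...]  # block schemes making up the conjunction
    rr: float                # reduction ratio
    positives: int           # labelled matching pairs covered


def conjunctions_list(nested: List[List[Stats]]) -> List[Stats]:
    # Flatten: one list of conjunctions per starting scheme -> single stream
    flat = chain.from_iterable(nested)
    # Dedupe: ("a", "b") and ("b", "a") describe the same conjunction
    unique: Dict[FrozenSet[str], Stats] = {frozenset(s.scheme): s for s in flat}
    # Sort best to worst (assumed key: reduction ratio, then positives)
    return sorted(unique.values(), key=lambda s: (s.rr, s.positives), reverse=True)
```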
- optimizer: BaseOptimizer