API

class oagdedupe.api.BaseModel(settings: oagdedupe.settings.Settings, cluster: oagdedupe.base.BaseCluster = <class 'oagdedupe.cluster.cluster.ConnectedComponents'>)

Abstract base class from which all model classes inherit. All descendant classes must implement the predict, train, and candidates methods.
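The inheritance contract described above can be sketched with Python's standard abc and dataclasses modules. This is a minimal illustration, not oagdedupe's actual source: StubSettings, StubCluster, SketchBaseModel and SketchDedupe are hypothetical stand-ins, and only initialize is marked abstract here because that is the only method this page lists as abstract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Type


class StubCluster:
    """Hypothetical stand-in for a BaseCluster implementation."""


@dataclass
class StubSettings:
    """Hypothetical stand-in for oagdedupe.settings.Settings."""
    name: str = "demo"


@dataclass
class SketchBaseModel(ABC):
    """Mirrors the documented shape: settings, a cluster class, abstract initialize()."""
    settings: StubSettings
    cluster: Type[StubCluster] = StubCluster

    @abstractmethod
    def initialize(self, *args, **kwargs) -> None:
        """Subclasses must override this method."""


@dataclass
class SketchDedupe(SketchBaseModel):
    def initialize(self, records: list) -> None:
        # A real model would learn p(match) here; the sketch only stores the input.
        self.records = records


# A concrete subclass can be instantiated and initialized.
model = SketchDedupe(settings=StubSettings())
model.initialize([{"name": "a"}, {"name": "b"}])

# The abstract base itself cannot be instantiated.
try:
    SketchBaseModel(settings=StubSettings())
    instantiated = True
except TypeError:
    instantiated = False
```

Passing the cluster as a class rather than an instance matches the "alias of ConnectedComponents" entry listed for the cluster attribute, although how the real library instantiates it is not shown on this page.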

__init__(settings: oagdedupe.settings.Settings, cluster: oagdedupe.base.BaseCluster = <class 'oagdedupe.cluster.cluster.ConnectedComponents'>) -> None
__post_init__()
cluster

alias of ConnectedComponents

fit_blocks() -> None
abstract initialize()
predict() -> Union[DataFrame, Tuple[DataFrame]]

The FastAPI service trains the model on the latest labels, then submits the scores to Postgres.

The clusterer then loads the scores and uses the comparison indices and predicted probabilities to generate clusters.

Returns

  • df (pd.DataFrame) – for deduplication, a single dataframe

  • df, df2 (tuple) – for record linkage, a tuple of two dataframes
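To illustrate how comparison indices and predicted probabilities can be turned into clusters, here is a minimal connected-components sketch using union-find. It is not the library's ConnectedComponents implementation: the function name, the 0.5 threshold, and the input layout (a list of index pairs with a parallel list of scores) are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple


def connected_components(
    pairs: Iterable[Tuple[int, int]],
    probabilities: Iterable[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from oagdedupe
) -> List[List[int]]:
    """Group record ids linked by pairs whose score is at least `threshold`."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), p in zip(pairs, probabilities):
        if p >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two components

    # Records that never appear in an above-threshold pair are not returned.
    groups: Dict[int, List[int]] = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())


pairs = [(0, 1), (1, 2), (3, 4), (2, 3)]
scores = [0.9, 0.8, 0.95, 0.1]
clusters = connected_components(pairs, scores)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

The low-scoring pair (2, 3) is ignored, so records 0 to 2 and records 3 to 4 stay in separate clusters. A single threshold makes clustering transitive: one confident link is enough to merge two groups, which is the defining behaviour of connected components.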

settings: Settings
class oagdedupe.api.Dedupe(settings: oagdedupe.settings.Settings, cluster: oagdedupe.base.BaseCluster = <class 'oagdedupe.cluster.cluster.ConnectedComponents'>)

General dedupe block, inherits from BaseModel.

__init__(settings: oagdedupe.settings.Settings, cluster: oagdedupe.base.BaseCluster = <class 'oagdedupe.cluster.cluster.ConnectedComponents'>) -> None
__post_init__()
initialize(df: DataFrame) -> None

Learn p(match), the probability that a pair of records is a match.

settings: Settings
class oagdedupe.api.Fapi(settings: oagdedupe.settings.Settings, cluster: oagdedupe.base.BaseCluster = <class 'oagdedupe.cluster.cluster.ConnectedComponents'>)

General dedupe block, inherits from BaseModel.

__init__(settings: oagdedupe.settings.Settings, cluster: oagdedupe.base.BaseCluster = <class 'oagdedupe.cluster.cluster.ConnectedComponents'>) -> None
__post_init__()
initialize() -> None

Learn p(match), the probability that a pair of records is a match.

settings: Settings
class oagdedupe.api.RecordLinkage(settings: oagdedupe.settings.Settings, cluster: oagdedupe.base.BaseCluster = <class 'oagdedupe.cluster.cluster.ConnectedComponents'>)

General record linkage block, inherits from BaseModel.

__init__(settings: oagdedupe.settings.Settings, cluster: oagdedupe.base.BaseCluster = <class 'oagdedupe.cluster.cluster.ConnectedComponents'>) -> None
__post_init__()
initialize(df: DataFrame, df2: DataFrame) -> None

Learn p(match), the probability that a pair of records is a match.

settings: Settings