Cluster
- class oagdedupe.cluster.cluster.ConnectedComponents(repo: BaseRepository, settings: Settings)[source]
Uses a graph to retrieve the connected components of matched candidate pairs.
- _abc_impl = <_abc_data object>
- get_connected_components(scores: DataFrame) DataFrame[source]
Builds a graph from “matched” candidate pairs, with edges weighted by p(match).
Note: the weights are not yet considered when generating connected components; support for this still needs to be added.
- Parameters
scores (pd.DataFrame) – dataframe with pair indices and match scores
- Returns
dataframe mapping cluster index to entity index
- Return type
pd.DataFrame
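To illustrate the idea behind get_connected_components, here is a minimal standard-library sketch. It is not oagdedupe's implementation: the function name, the (idx_l, idx_r, score) tuple layout, and the 0.5 threshold are all assumptions made for illustration, and union-find stands in for whatever graph routine the library actually uses. As in the description above, the score only decides whether a pair counts as "matched"; it does not weight the components.

```python
from collections import defaultdict


def connected_components(pairs, threshold=0.5):
    """Group entity indices into clusters from scored candidate pairs.

    pairs: iterable of (idx_l, idx_r, score) tuples (hypothetical layout).
    Returns a list of (cluster, idx) rows, one row per clustered entity.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for idx_l, idx_r, score in pairs:
        if score >= threshold:  # keep only "matched" pairs as edges
            root_l, root_r = find(idx_l), find(idx_r)
            if root_l != root_r:
                # Attach the larger root to the smaller one so the result
                # is deterministic.
                parent[max(root_l, root_r)] = min(root_l, root_r)

    # Collect the members of each component under its root.
    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)

    rows = []
    for cluster, root in enumerate(sorted(groups)):
        for idx in sorted(groups[root]):
            rows.append((cluster, idx))
    return rows


scores = [(1, 2, 0.9), (2, 3, 0.8), (4, 5, 0.95), (6, 7, 0.1)]
print(connected_components(scores))
```

Entities 1, 2 and 3 end up in one cluster because the matches chain together, while the low-scoring pair (6, 7) never enters the graph and so appears in no cluster.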
- get_connected_components_link(scores: DataFrame) DataFrame[source]
For record linkage:
Builds a graph from “matched” candidate pairs, with edges weighted by p(match).
Keeps track of whether each index comes from the left or the right dataframe.
Note: the weights are not yet considered when generating connected components; support for this still needs to be added.
- Parameters
scores (pd.DataFrame) – dataframe with pair indices and match scores
- Returns
dataframe mapping cluster index to entity index
- Return type
pd.DataFrame
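The record-linkage variant can be sketched the same way. The snippet below is an illustrative assumption, not the library's code: it tags every node with a hypothetical side label ("l" or "r") so that index 1 in the left dataframe and index 1 in the right dataframe remain distinct nodes. The function name, tuple layout, and threshold are likewise made up for the example.

```python
from collections import defaultdict


def connected_components_link(pairs, threshold=0.5):
    """Cluster matched pairs while tracking which dataframe each index is from.

    pairs: iterable of (idx_l, idx_r, score) tuples, where idx_l indexes the
    left dataframe and idx_r the right one (hypothetical layout).
    Returns a list of (cluster, side, idx) rows with side in {"l", "r"}.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for idx_l, idx_r, score in pairs:
        if score >= threshold:  # keep only "matched" pairs as edges
            # Nodes are (side, idx) tuples, so equal indices on different
            # sides never collide.
            root_l, root_r = find(("l", idx_l)), find(("r", idx_r))
            if root_l != root_r:
                parent[max(root_l, root_r)] = min(root_l, root_r)

    # Collect the members of each component under its root.
    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)

    rows = []
    for cluster, root in enumerate(sorted(groups)):
        for side, idx in sorted(groups[root]):
            rows.append((cluster, side, idx))
    return rows


link_scores = [(1, 1, 0.9), (2, 1, 0.8), (3, 4, 0.7), (5, 6, 0.2)]
print(connected_components_link(link_scores))
```

Here left records 1 and 2 both match right record 1, so all three share a cluster, and the side label keeps left index 1 separate from right index 1 in the output.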
- get_df_cluster(**kwargs)
- repo: BaseRepository