Scalable Matching and Clustering of Entities with FAMER

Erhard Rahm, Eric Peukert, Markus Nentwig, Alieh Saeedi

Abstract

Entity resolution identifies semantically equivalent entities, e.g., records describing the same product or customer. It is especially challenging for Big Data applications, where large volumes of data from many sources have to be matched and integrated. We therefore introduce a scalable entity resolution framework called FAMER (FAst Multi-source Entity Resolution system) that is based on Apache Flink for distributed execution and can holistically match entities from multiple sources. For the latter purpose, FAMER includes multiple clustering schemes that group matching entities from different sources into clusters. In addition to previously known clustering schemes, FAMER includes new approaches tailored to multi-source entity resolution. We perform a detailed comparative evaluation of eight clustering schemes on different real-life and synthetically generated datasets. The evaluation considers both match quality and scalability for different numbers of machines and data sizes.
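To make the idea of grouping matched entities into clusters concrete, here is a minimal single-machine sketch. It assumes that pairwise matching has already produced a list of matching entity pairs, and it groups them by connected components using a union-find structure. This is only an illustration of the general idea: the abstract does not say which clustering schemes FAMER uses, and the function name `cluster_matches` and the `source:id` entity labels are hypothetical.

```python
from collections import defaultdict


def cluster_matches(entity_ids, match_pairs):
    """Group entities into clusters via connected components.

    entity_ids:  iterable of hashable entity identifiers (hypothetical format).
    match_pairs: iterable of (a, b) pairs judged to be matches.
    Returns a sorted list of sorted clusters.
    """
    entity_ids = list(entity_ids)
    parent = {e: e for e in entity_ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in match_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = defaultdict(set)
    for e in entity_ids:
        clusters[find(e)].add(e)
    return sorted(sorted(c) for c in clusters.values())


# Illustrative data: entities from three sources A, B, C.
entities = ["A:1", "A:2", "B:7", "B:9", "C:3"]
matches = [("A:1", "B:7"), ("B:7", "C:3"), ("A:2", "B:9")]
clusters = cluster_matches(entities, matches)
```

Connected components is the simplest possible grouping: a single wrong match link merges two clusters, which is one reason more refined clustering schemes are of interest for multi-source data.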

Benchmarks

Benchmark | Methodology | Metrics
entity-resolution-on-musicbrainz20k | FAMER-Split | F1: 0.840
entity-resolution-on-musicbrainz20k | FAMER-SplitMerge | F1: 0.880
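For readers unfamiliar with the metric, the sketch below shows one common way to compute an F1 score for a clustering result: compare the set of entity pairs that share a predicted cluster with the set of pairs that share a ground-truth cluster. The page does not state how the F1 values above were computed, so the pairwise definition is an assumption here, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def pairs_from_clusters(clusters):
    """Return the set of unordered entity pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f1(predicted, truth):
    """Pairwise F1: harmonic mean of pair precision and pair recall."""
    pred_pairs = pairs_from_clusters(predicted)
    true_pairs = pairs_from_clusters(truth)
    true_positives = len(pred_pairs & true_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy example: one of three predicted pairs is correct,
# and one of two true pairs is found.
predicted = [["a", "b", "c"], ["d"]]
truth = [["a", "b"], ["c", "d"]]
score = pairwise_f1(predicted, truth)
```

In this toy example precision is 1/3 and recall is 1/2, so the score is 0.4.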
