Dec 12 · 3 min read

Tips for Configuring CleanData Duplicate Identification Rules

Key Takeaways:

Start simple and small, then gradually progress to more complexity
Prioritize objects and fields useful in identifying duplicate records
Involve data owners in the process
Determine what is best for your business

Introduction

When setting up your Detect & Merge rules in ActivePrime CleanData to find duplicates, it is important to follow a structured approach to ensure accuracy and efficiency. Here are five key tips to guide you through the process:

Start Simple and Small

It is tempting to dive straight into complex rules, but it is better to start with a small, manageable dataset and only a few rule criteria. For instance, begin with exact matches before advancing to fuzzy matching, and limit the number of rules per object so you keep accuracy and control. This incremental approach lets you measure how effective your rules are early on, before committing more resources to larger datasets.
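As a rough illustration of starting small, the Python sketch below applies a single exact-match criterion to a small sample of records and reports the duplicate groups it finds. It is not how CleanData evaluates rules internally; the record layout and the `find_exact_duplicates` helper are hypothetical.

```python
from collections import defaultdict

def find_exact_duplicates(records, field):
    """Group record Ids whose value in `field` is identical (no normalization)."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(field)
        if value:  # skip blanks so empty fields do not all "match" each other
            groups[value].append(record["Id"])
    # Keep only values shared by two or more records
    return {value: ids for value, ids in groups.items() if len(ids) > 1}

# A small, hypothetical sample pulled from the Account object
sample = [
    {"Id": "001", "Name": "Acme Corp"},
    {"Id": "002", "Name": "Acme Corp"},
    {"Id": "003", "Name": "ACME Corporation"},
    {"Id": "004", "Name": "Globex"},
]

print(find_exact_duplicates(sample, "Name"))  # {'Acme Corp': ['001', '002']}
```

Note that "ACME Corporation" is not caught. Reviewing gaps like this on a small sample is what tells you whether a fuzzy rule is worth adding next.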

Prioritize Objects and Fields

Objects: Concentrate on the objects central to your business processes, such as Account, Contact, and Lead. These core objects are deeply integrated into many Salesforce processes, so it is essential that their data remains accurate and clean. Also be aware that cross-object duplicates, such as a Lead and a Contact that represent the same person, may require cross-object rules to identify and resolve.

Fields: Avoid including too many fields in your matching rules. For the Account object, for example, focus on fields that are useful for identifying duplicates, such as Name, Phone, and Address; these typically provide the most value in determining duplicates. It is also important to understand how users enter data when choosing the right type of matching rule. For example, if there are no validation rules on the Phone field in Salesforce, it can be useful to ignore special characters so that the same number is matched even when it is entered in different formats, such as “(555) 123-4567” and “555.123.4567”.
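To make the phone example concrete, here is a minimal Python sketch of what ignoring special characters can mean: keep only the digits before comparing. It is an illustration under our own assumptions, not CleanData's matching logic, and the `normalize_phone` and `phones_match` helpers are hypothetical.

```python
import re

def normalize_phone(raw):
    """Keep digits only, so '(555) 123-4567' and '555.123.4567' compare equal."""
    return re.sub(r"\D", "", raw or "")

def phones_match(a, b):
    """Match on digits only; two blank phone numbers are never a match."""
    na, nb = normalize_phone(a), normalize_phone(b)
    return bool(na) and na == nb

print(phones_match("(555) 123-4567", "555.123.4567"))    # True
print(phones_match("+1 555 123 4567", "555-123-4567"))   # False: extra country code digit
```

The second case shows a limit of stripping characters alone: a leading country code still produces different digit strings, so you may need an additional rule or data standardization to handle it.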

Exact and Fuzzy Matching

Exact Match: Whenever possible, use Exact Match rules for high-confidence fields such as the Country, State, and Postal Code address fields. These fields are often validated against external sources, which keeps their values consistent and reduces the chance of false positives.
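The Python sketch below shows an exact match across several address fields at once. The field names follow the standard Salesforce Account billing address fields, but the `exact_address_match` helper is hypothetical and does not represent how CleanData stores or runs rules.

```python
# Standard Account billing address fields, used here for illustration
ADDRESS_FIELDS = ("BillingCountry", "BillingState", "BillingPostalCode")

def exact_address_match(a, b):
    """True only if every address field is populated and identical (case-sensitive)."""
    key_a = tuple(a.get(f) for f in ADDRESS_FIELDS)
    key_b = tuple(b.get(f) for f in ADDRESS_FIELDS)
    # all() rejects blank or missing values, so two empty addresses never match
    return all(key_a) and key_a == key_b

a = {"BillingCountry": "US", "BillingState": "CA", "BillingPostalCode": "94105"}
b = {"BillingCountry": "US", "BillingState": "CA", "BillingPostalCode": "94105"}
c = {"BillingCountry": "US", "BillingState": "CA", "BillingPostalCode": "94107"}

print(exact_address_match(a, b))  # True
print(exact_address_match(a, c))  # False
```

Because the comparison is strict, “CA” and “ca” would not match. That is why exact matching suits validated fields, where values are already standardized.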

Smart Fuzzy: Smart Fuzzy matching can help identify duplicates in fields that users enter manually, such as Name, Email, Website, and Street. This type of matching accounts for spelling errors, phonetic spellings, and variations in the way data is entered, for example “Robert” vs. “Bob” or “William” vs. “Bill.” It can also match street addresses such as “1-123 Main St.” vs. “123 Main Street Unit 1” vs. “123 Main Street #1”.
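The article does not describe how Smart Fuzzy works internally, so the Python sketch below swaps in a much simpler technique, rule-based normalization, just to show how the examples above can reduce to the same value. The lookup tables are hypothetical and deliberately tiny, and the unit handling assumes “1-123” means unit 1 at number 123.

```python
import re

# Hypothetical, deliberately small lookup tables; real matching needs far larger ones
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "will": "william"}
STREET_ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

def normalize_first_name(name):
    key = name.strip().lower()
    return NICKNAMES.get(key, key)

def normalize_street(street):
    s = street.lower().replace(".", "")
    unit = None
    # Assumed unit-first style: "1-123 main st" means unit 1 at 123 main st
    prefix = re.match(r"^(\w+)-(\d+)\s+(.*)$", s)
    if prefix:
        unit, s = prefix.group(1), f"{prefix.group(2)} {prefix.group(3)}"
    # Trailing unit designators: "... unit 1", "... apt 1", "... suite 1", "... #1"
    suffix = re.search(r"\s+(?:unit|apt|suite|#)\s*(\w+)$", s)
    if suffix:
        unit, s = suffix.group(1), s[: suffix.start()]
    words = [STREET_ABBREVIATIONS.get(w, w) for w in s.split()]
    return " ".join(words) + (f" unit {unit}" if unit else "")

variants = ["1-123 Main St.", "123 Main Street Unit 1", "123 Main Street #1"]
print({normalize_street(v) for v in variants})                # {'123 main street unit 1'}
print(normalize_first_name("Bob") == normalize_first_name("Robert"))  # True
```

A production fuzzy matcher would combine normalization like this with similarity scoring and phonetic algorithms; this sketch covers only the specific examples in the text.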

Because the Smart Fuzzy matching percentage is configured field by field, start each field with a high confidence threshold (e.g., 90%) and lower it gradually only if necessary. As a best practice, avoid going too low (e.g., below 80%) to minimize the likelihood of false positives.
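The per-field threshold idea can be sketched in Python as follows. The similarity score here comes from the standard library's `difflib.SequenceMatcher`, used only as a stand-in for CleanData's own scoring, and the helper names are hypothetical. The 90% starting point and 80% floor come from the guidance above.

```python
from difflib import SequenceMatcher

START_THRESHOLD = 0.90  # suggested starting point for each field
MIN_THRESHOLD = 0.80    # suggested floor to limit false positives

def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; a stand-in for the product's scoring."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def validate_thresholds(thresholds):
    """Reject any per-field threshold below the floor (or above 100%)."""
    for field, value in thresholds.items():
        if not MIN_THRESHOLD <= value <= 1.0:
            raise ValueError(f"{field}: threshold {value} is outside [{MIN_THRESHOLD}, 1.0]")
    return thresholds

def fuzzy_fields_match(a, b, thresholds):
    """True only if every configured field meets its own threshold."""
    return all(similarity(a[f], b[f]) >= t for f, t in thresholds.items())

thresholds = validate_thresholds({"Name": START_THRESHOLD, "Website": START_THRESHOLD})

a = {"Name": "Acme Corp", "Website": "acme.com"}
b = {"Name": "Acme Corp.", "Website": "acme.com"}
c = {"Name": "Acme Corporation", "Website": "acme.com"}

print(fuzzy_fields_match(a, b, thresholds))       # True
print(round(similarity(a["Name"], c["Name"]), 2))  # 0.72
```

“Acme Corporation” scores 0.72, below even the 80% floor. This suggests that handling abbreviations explicitly is safer than lowering the threshold far enough to catch it.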

Monitor and Refine Rules

Rules that work well at first may become less effective as your Salesforce data grows and changes, or as new fields are added to support new business processes. Regularly assess and adjust your rules so they remain relevant and keep identifying duplicates as your business and data evolve.
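One way to make this regular assessment measurable is to track, for each rule, what share of its flagged matches reviewers confirm as true duplicates. The Python sketch below assumes you record review outcomes as simple (rule name, confirmed) pairs. That data shape, the rule names, and the 95% target are all hypothetical.

```python
from collections import Counter

def precision_by_rule(reviews):
    """reviews: iterable of (rule_name, is_true_duplicate) from manual review."""
    flagged, confirmed = Counter(), Counter()
    for rule, is_duplicate in reviews:
        flagged[rule] += 1
        if is_duplicate:
            confirmed[rule] += 1
    return {rule: confirmed[rule] / flagged[rule] for rule in flagged}

def rules_needing_attention(reviews, target=0.95):
    """Rules whose confirmed-match rate has dropped below the (assumed) target."""
    return sorted(rule for rule, p in precision_by_rule(reviews).items() if p < target)

reviews = [
    ("Account exact name", True), ("Account exact name", True),
    ("Account fuzzy name 85%", True), ("Account fuzzy name 85%", False),
]
print(precision_by_rule(reviews))        # {'Account exact name': 1.0, 'Account fuzzy name 85%': 0.5}
print(rules_needing_attention(reviews))  # ['Account fuzzy name 85%']
```

This only measures false positives. Duplicates that a rule misses entirely need a separate check, such as spot-reviewing a sample of records the rules did not flag.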

Involve Data Owners and Users

Business users and data owners often possess crucial insights into how data is input and maintained. Collaborate with them when creating rules and analyzing their effectiveness. Also, ensure that stakeholders understand the logic behind the rules, especially if you are automating processes such as merging duplicate records. This collaboration can reduce the risk of records being unintentionally merged.

Conclusion

By following these five tips, you can set up effective duplicate identification rules in ActivePrime CleanData and keep your Salesforce data cleaner and more accurate. Start small, focus on key objects and fields, and continuously refine your rules for optimal results.

Final Thoughts

There are no hard and fast rules when it comes to identifying duplicates. These are general guidelines, and it is more important to determine what works best for your business. By building and continually refining your deduplication rules, you will be able to identify and resolve duplicate records in your environment more easily, ultimately improving data accuracy across your organization.
