Pasadena, CA – April 01, 2010 – The Association for the Advancement of Artificial Intelligence will publish an article co-authored by Clint Bidlack entitled “Exceptional Data Quality using Intelligent Matching and Retrieval”. The paper responds to rapid advances in enterprise web-based software, specifically the fast-growing Software as a Service (SaaS) model, which has created a need for new, sophisticated yet user-friendly data quality solutions. One of the leading SaaS applications is CRM (Customer Relationship Management), and data quality has become the number one issue limiting return on investment for these applications. As the volume of data swells, the pain experienced from poor quality data grows more acute. Data quality has been an ongoing issue in the IT industry for the past 30 years, fueling the growth of the data quality segment to $1B in 2008. Experian estimates that companies are losing 6% of sales due to poor management of customer data (QAS 2006). Experience over the past several years shows that the share of duplicate records in corporate CRM systems ranges from 10% to a staggering 70%. Additionally, it is common for at least 50% of CRM records to contain faulty data, for instance, misspelled customer names or addresses.
The paper discusses how poor data quality can easily arise from either manual entry or mass imports. At the point of entry, the individual (either the end consumer on a web page or a Sales / Marketing staff member) may use their own abbreviations, and when the record is synchronized without intelligent matching, the fact that it is a duplicate will not be caught.
The paper illustrates the five points of entry for the majority of data that enters a CRM system: manual data entry, batch imports from external sources, data synchronization from other software, web entry by the consumer, and migration from legacy systems. All of these points of entry must be closely managed to keep duplicates from being created.
Artificial Intelligence methodologies can mitigate these data issues. The challenge in the new era of SaaS deployment is not only to design data quality solutions that do the job but also, because the application runs online, to keep response times finely tuned.
The methodologies discussed in the article are:
- Search Space Reduction - Robust inexact matching algorithms compare the closeness of two strings.
- Histogram-Based Pruning is a technique that rapidly identifies whether two strings are beyond an acceptable edit-distance limit, reducing runtime, sometimes by well over 70%.
- Search-Based Pruning - Standard edit-distance algorithms calculate the distance by finding the shortest path. Since we do not care about the exact distance if the match is unacceptable, we can often terminate the search early, once we have ruled out all paths of length T or less, where T is the acceptable distance limit.
- Inexact Indexing can enable speedups in processing of several orders of magnitude by grouping the records into a large number of buckets, and then any specific string comparison searches only over the data in a small number of relevant buckets, effectively pruning the search space.
- Querying Field Ordering - Because of varying search times, the order of querying can be adjusted to take advantage of these differences by dynamically detecting the optimal field search order.
- Query Optimization is deployed to enable real-time matching of a record to a remote database. For instance, with the CleanEnter product, while entering a new record into a CRM system, the user first searches to identify if the new record already exists. The user is comparing the one new record to the entire CRM database, searching for anything that looks similar.
The combination of these techniques answers today’s challenge of performing such matching on large volumes of data very quickly.
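As a rough illustration of three of the techniques listed above (histogram-based pruning, search-based pruning and inexact indexing), the following Python sketch shows how they might fit together. It is not the implementation described in the paper or used in ActivePrime products. All names (`histogram_lower_bound`, `within_distance`, `LengthBucketIndex`) are hypothetical, and bucketing by string length is a deliberately simple stand-in for whatever indexing scheme the authors use.

```python
from collections import Counter, defaultdict


def histogram_lower_bound(a, b):
    """Lower bound on the edit distance, using only character counts."""
    count_a, count_b = Counter(a), Counter(b)
    surplus_a = sum((count_a - count_b).values())  # characters a has in excess of b
    surplus_b = sum((count_b - count_a).values())  # characters b has in excess of a
    # One edit can lower each surplus by at most one, so the larger surplus is a bound.
    return max(surplus_a, surplus_b)


def within_distance(a, b, limit):
    """Return True if the Levenshtein distance between a and b is <= limit."""
    # Histogram-based pruning: reject cheaply before running the full search.
    if histogram_lower_bound(a, b) > limit:
        return False
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        # Search-based pruning: stop once every partial path exceeds the limit.
        if min(curr) > limit:
            return False
        prev = curr
    return prev[-1] <= limit


class LengthBucketIndex:
    """Toy inexact index (an assumption for this sketch): one bucket per string length."""

    def __init__(self, records):
        self.buckets = defaultdict(list)
        for record in records:
            self.buckets[len(record)].append(record)

    def search(self, query, limit):
        """Compare the query only against buckets that could contain a match."""
        q = query.lower()
        matches = []
        for length in range(len(query) - limit, len(query) + limit + 1):
            for candidate in self.buckets.get(length, ()):
                if within_distance(q, candidate.lower(), limit):
                    matches.append(candidate)
        return matches


index = LengthBucketIndex(["McDuff", "MacDuff", "Johnson", "Johnstone"])
matches = index.search("Mcduff", 1)
```

In this sketch a query only visits buckets whose length is within the limit of its own, since strings that differ in length by more than the limit cannot match. A production index would likely use a more selective key.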
Additional information about the author, Clint Bidlack:
- Award: Innovative Applications of Artificial Intelligence at 2009 AAIC Conference
- Paper: Enabling Data Quality with Lightweight Ontologies, AAAI, 2009
ActivePrime has a heritage of almost 20 years in researching and fine-tuning Artificial Intelligence to bring smarter searches and cleaner data to CRM users. This enhances efficiency for all customer-facing teams, as duplicate data or inexact search matches can take the equivalent of one day per month away from selling time, which negatively impacts revenue streams.
The key to the success and efficiency of the ActivePrime Clean Data Suite is the balance it strikes between a complete fuzzy data search and the speed required to return results in today’s SaaS and online models.
Additional benefits from the ActivePrime Clean Data Suite include standardization of Fortune 1000 company names as well as addresses within the US. The search algorithms find nicknames and alternatives of names such as “Cathy” or “Kathi”, “Johnson” or “Johnstone”, “MacDuff” or “McDuff” and “1st” or “First”.
Rosaline Gulati, 617-247-9908 ext 22 or firstname.lastname@example.org