Purpose
- Find duplicate records within a single Data Source using the specified matching method.
- From the duplicate records, choose one of the records as the surviving record.
- The results can then be processed further, for example by using a merge function to eliminate the non-surviving duplicate records.
Tips
- You can use more than one attribute to determine if records are duplicates. For each attribute, use exact matching or fuzzy matching for duplicate identification.
- For fuzzy matching, the fuzziness factor can be further fine-tuned via advanced configuration.
- Fuzzy matching can result in false-positive matches, so be sure to review your results closely to make sure your job produces the results you intend.
- The surviving record within a group of duplicates is identified by elimination: you define a set of criteria that are applied in priority order to narrow the group down to a single surviving record.
- You can use the "Manual review of de-duplication result" section to specify a data source to store the de-duplication results and then review it using the "Data -> Review De-dupe" option.
- The de-duplication task supports multiple actions to perform sequential de-duplication. The result of each action is stored in the attributes "Duplicates1", "Duplicates2", etc. and "Merged Attributes1", "Merged Attributes2", etc., where the number is the position of the corresponding action from top to bottom. The final results are stored in the attributes "Duplicates" and "Merged Attributes".
- Please note that this task does not update your target system. Your job will need to push the deduplication request to your target system to complete the process. Refer to the following task templates to merge records: SFDC Merge Records, Marketo Merge Leads.
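The matching tips above can be illustrated with a minimal Python sketch. This is not Openprise's implementation: the function names, the record layout, the rule format, and the use of `difflib.SequenceMatcher` with a 0.85 threshold as a stand-in for the fuzziness factor are all assumptions made for illustration. It also assumes that empty values never count as a match.

```python
from difflib import SequenceMatcher

def values_match(a, b, mode="exact", threshold=0.85):
    """Compare two attribute values, ignoring case and surrounding whitespace."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    if not a or not b:
        return False  # assumption: empty values never count as a match
    if mode == "exact":
        return a == b
    # Fuzzy: similarity ratio in [0, 1]; `threshold` stands in for the fuzziness factor.
    return SequenceMatcher(None, a, b).ratio() >= threshold

def is_duplicate(rec_a, rec_b, rules):
    """rules maps attribute -> (mode, threshold); every attribute must match."""
    return all(
        values_match(rec_a.get(attr), rec_b.get(attr), mode, threshold)
        for attr, (mode, threshold) in rules.items()
    )

def group_duplicates(records, rules):
    """Greedy grouping: a record joins the first group whose first member it matches."""
    groups = []
    for rec in records:
        for group in groups:
            if is_duplicate(group[0], rec, rules):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return [g for g in groups if len(g) > 1]  # keep only groups with duplicates

# Hypothetical sample data: exact match on Email, fuzzy match on Company.
records = [
    {"Id": 1, "Email": "ann@example.com", "Company": "Acme Inc"},
    {"Id": 2, "Email": "ANN@example.com", "Company": "Acme Inc."},
    {"Id": 3, "Email": "bob@example.com", "Company": "Globex"},
]
rules = {"Email": ("exact", None), "Company": ("fuzzy", 0.85)}
groups = group_duplicates(records, rules)
duplicate_ids = [[r["Id"] for r in g] for g in groups]
print(duplicate_ids)  # [[1, 2]]
```

Lowering the threshold in a sketch like this makes more pairs match, which is how false positives arise and why reviewing the results matters.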
Merging
- The merge process is controlled by a default option, which can be overridden for individual attributes. There are four options for merging data from non-surviving records into the surviving record:
- "Fill only if empty from non-surviving records" - This option will fill in empty attributes of the surviving record using available values from non-surviving records to achieve maximum completeness.
  - The non-surviving records are processed in order of any date attribute, earliest or latest first according to your setting.
  - Once all the available values are harvested from the first non-surviving record, the next non-surviving record in that order is harvested.
  - This process is repeated until the surviving record has no more empty attributes, or until all the non-surviving records have been harvested.
- "Always overwrite from non-surviving records" - This option will overwrite the data in the surviving record with data from one record in the duplicate group, selected in order of any date attribute according to your earliest / latest setting. Note that the surviving record is itself part of the duplicate group. If your logic selects the surviving record, no data is overwritten.
- "Append values from non-surviving records" - This option will append all the data from non-surviving records into the surviving record using a specified delimiter.
- "Never merge values from non-surviving records" - This option will prevent any data merge into the surviving record.
- Use the "Advanced Configuration" and "Add Exception" buttons to override the default merge logic on a per attribute level. Each exception can use different merge logic to achieve even the most complicated merge requirements.
- Thoughtfully select only the attributes that are required for merging. Many attributes are populated by other processes and do not need to be merged.
- Please note that the merge option creates a merged record in Openprise, but does not update your target system. Your job will need to push the merged records to your target system to update the winning record. Refer to the task template Export: Add / Update for details on updating records.
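The four merge options and the per-attribute exceptions described above can be sketched in Python as follows. This is an illustration only, not Openprise's merge logic: the function `merge_group`, the option keys (`fill_if_empty`, `always_overwrite`, `append`, `never`), and the record layout are hypothetical. It assumes dates are ISO-formatted strings, and that "always overwrite" takes the first record in date order that actually has a value for the attribute.

```python
def merge_group(survivor, non_survivors, attributes, default_option,
                exceptions=None, date_attr="Modified Date",
                latest_first=True, delimiter="; "):
    """Return a merged copy of the survivor; the inputs are not modified."""
    exceptions = exceptions or {}
    by_date = lambda r: r[date_attr]  # ISO dates sort correctly as strings
    donors = sorted(non_survivors, key=by_date, reverse=latest_first)
    whole_group = sorted([survivor] + non_survivors, key=by_date, reverse=latest_first)
    merged = dict(survivor)
    for attr in attributes:
        option = exceptions.get(attr, default_option)
        if option == "fill_if_empty":
            if not merged.get(attr):
                # Harvest from donors in date order until a value is found.
                merged[attr] = next((r[attr] for r in donors if r.get(attr)),
                                    merged.get(attr))
        elif option == "always_overwrite":
            # Assumption: take the first record in date order that has a value.
            # The survivor is part of the group, so it may select itself.
            source = next((r for r in whole_group if r.get(attr)), None)
            if source is not None:
                merged[attr] = source[attr]
        elif option == "append":
            values = [merged.get(attr)] + [r.get(attr) for r in donors]
            merged[attr] = delimiter.join(v for v in values if v)
        elif option != "never":
            raise ValueError(f"unknown merge option: {option}")
    return merged

# Hypothetical duplicate group.
survivor = {"Id": 2, "Email": "ann@example.com", "Phone": "", "City": "Austin",
            "Notes": "Met at expo", "Modified Date": "2022-01-15"}
non_survivors = [
    {"Id": 1, "Email": "ann@example.com", "Phone": "555-0100", "City": "Dallas",
     "Notes": "Requested demo", "Modified Date": "2023-05-02"},
    {"Id": 4, "Email": "ann@example.com", "Phone": "555-0199", "City": "",
     "Notes": "", "Modified Date": "2021-11-30"},
]
merged = merge_group(
    survivor, non_survivors,
    attributes=["Phone", "City", "Notes"],   # merge only what is needed
    default_option="fill_if_empty",
    exceptions={"City": "always_overwrite", "Notes": "append"},
)
print(merged["Phone"], merged["City"], merged["Notes"], sep=" | ")
# 555-0100 | Dallas | Met at expo; Requested demo
```

Passing an explicit `attributes` list mirrors the advice to merge only the attributes that are required. Anything not listed, such as Email here, is left untouched.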
Examples
Find duplicate records where Email Address and Company Name attributes are identical.
- The surviving record is the one where both Lead Source and Job Title have a value. If multiple records still remain after applying those criteria, use the record with the earliest Created Date.
- By default, fill in the surviving record's empty attributes from records within the duplicates group using the data from the latest Modified Date record first.
- However, for the contact information attributes Address, City, State, Zip, Country, and Phone Number, always overwrite them with data from the record in the duplicates group that has the latest Modified Date.
- Also, for the Notes attribute, always append all the data from the duplicates group into the surviving record.
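The survivor-selection part of this example (Lead Source has value, then Job Title has value, then earliest Created Date) can be sketched as an elimination loop in Python. The helper names `has_value`, `earliest`, and `pick_survivor`, and the sample records, are hypothetical. The sketch also makes two assumptions that the description above does not state: a criterion that no candidate satisfies is skipped, and any remaining tie falls back to input order.

```python
def has_value(attr):
    """Criterion factory: keep candidates whose attribute is non-empty."""
    return lambda candidates: [r for r in candidates if r.get(attr)]

def earliest(attr):
    """Criterion factory: keep the candidate(s) with the smallest value of attr."""
    def criterion(candidates):
        best = min(r[attr] for r in candidates)
        return [r for r in candidates if r[attr] == best]
    return criterion

def pick_survivor(group, criteria):
    """Apply criteria in priority order, narrowing the candidates each time."""
    candidates = list(group)
    for criterion in criteria:
        if len(candidates) == 1:
            break
        narrowed = criterion(candidates)
        if narrowed:  # assumption: a criterion nobody satisfies is skipped
            candidates = narrowed
    return candidates[0]  # assumption: remaining ties fall back to input order

# Hypothetical duplicate group; dates are ISO strings, so they compare correctly.
group = [
    {"Id": 1, "Lead Source": "Web", "Job Title": "", "Created Date": "2020-02-01"},
    {"Id": 2, "Lead Source": "Event", "Job Title": "CTO", "Created Date": "2022-06-10"},
    {"Id": 3, "Lead Source": "Web", "Job Title": "VP Sales", "Created Date": "2021-09-30"},
]
criteria = [has_value("Lead Source"), has_value("Job Title"), earliest("Created Date")]
survivor = pick_survivor(group, criteria)
print(survivor["Id"])  # 3
```

Record 1 has the earliest Created Date overall, but it is eliminated by the Job Title criterion before the date is considered, which is the effect of applying criteria in priority order.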
Support Contacts
If you have any additional questions, please feel free to contact us at help@openprisetech.com.