Duplicates in Dynamics 365 lead to poor experiences and poor data quality.
Unfortunately, it’s a lot easier to accidentally create duplicates than it is to fix them.
While Dynamics 365 has tools to merge duplicate records, they are time-consuming, manual, and not appropriate when facing hundreds or thousands of duplicates.
To resolve this problem for Help Musicians, I built a Python script to bulk merge duplicates in Dynamics 365, with customisable merge logic and comprehensive audit trails. I’ve now open sourced the script so others can use and build on it.
Contents
- Why Bulk Merge Duplicates?
- The Challenge with Manual Merging
- Key Features
- Step-by-Step Guide to Bulk Merge Duplicates
Why Bulk Merge Duplicates?
There are many scenarios where duplicate records accumulate in Dynamics 365:
- Data migrations from legacy systems often introduce duplicates
- Multiple data entry points (web forms, manual entry, integrations) create duplicate contacts
- Historical data may have accumulated duplicates before detection rules were implemented
While Dynamics 365 has built-in duplicate detection, the process for resolving them through the UI is impractical and error-prone.
The Challenge with Manual Merging
The standard Dynamics 365 merge process requires:
- Manually opening each duplicate pair
- Deciding which record to keep
- Selecting which fields to merge
- Clicking through confirmation dialogs
- Repeating this process for every duplicate
For organizations with thousands of duplicates, this could take weeks of tedious manual work. Additionally, the manual process lacks:
- Consistent merge logic – decisions may vary
- Audit trails – no record of what was merged and why
- Testing capabilities – no way to preview results before executing
- Rollback options – manual merges are difficult to undo
Key Features
This Python tool addresses these challenges by providing:
- Bulk processing – merge thousands of duplicates in minutes instead of weeks
- Customisable merge logic – define exactly how conflicting data should be handled
- Intelligent grouping – automatically groups chains of duplicates (if A=B and B=C, then A, B and C are grouped and merged together)
- Comprehensive audit trails – CSV files showing before/after state of every merge
- Preview mode – review and refine merge logic before executing
- Target selection – choose whether the oldest or newest record remains active
- Secure authentication – uses interactive Azure AD login (no stored credentials)
- Progress tracking – monitor merge progress in real-time
Step-by-Step Guide to Bulk Merge Duplicates
Step 1: Define Duplicate Detection Rules
Before using this Python tool, you need to configure duplicate detection rules in Dynamics 365.
This is critical: By default, the tool will merge ALL records identified by your duplicate detection job, so your rules must be strict and only find genuine duplicates.
Creating Duplicate Detection Rules:
Navigate to your Dynamics 365 environment, then go to Settings > Data Management > Duplicate Detection Rules.
When creating rules for bulk merging:
- Be strict – use exact matches only (e.g., identical email addresses, not similar names)
- Test thoroughly – run detection jobs on test environments or a subset of data
- Exclude inactive matching records – to only detect duplicates among active records
For comprehensive guidance on creating duplicate detection rules, refer to:
- Microsoft’s official documentation on setting up duplicate detection rules
- Step-by-step instructions at Advantage’s guide
Step 2: Clone the Repository
Clone the repository from GitHub onto your local machine:
git clone https://github.com/YesWeCandrew/dynamics_duplicate_resolver
cd dynamics_duplicate_resolver
Install the required Python packages:
pip install -r requirements.txt
The tool uses these libraries:
- requests – for making the API requests
- json – for reading and writing JSON
- pandas – for data transformation
- dotenv – for reading the dotenv file
- msal – for authenticating to Dynamics
Step 3: Create Your Environment File
Create a .env file in the root directory with your Dynamics 365 environment details:
CLIENT_ID = 51f81489-12ee-4a9e-aaae-a2591f45987d
TENANT_ID = your-tenant-id-here
ENVIRONMENT_URI = https://yourorg.crm11.dynamics.com/
AUTHORITY_BASE = https://login.microsoftonline.com
SCOPE_SUFFIX = user_impersonation
This example uses the default developer Dynamics Client ID. I recommend you use this too, so you only need to replace the TENANT_ID and the ENVIRONMENT_URI.
This uses the same configuration and authentication process that I built for my generic Dynamics Python Wrapper. If you want to authenticate with your own app, follow the Azure App instructions here.
You can find:
- TENANT_ID – In Azure Active Directory > Overview
- ENVIRONMENT_URI – In the URL of your Dynamics instance.
As always, you should use a Development or Sandbox environment first to test your merge logic before running on Production data.
Step 4: Run the Application
Start the application:
python main.py
You’ll be prompted to specify your .env file name (or press Enter to use the first .env the system can find). A browser window will open for Azure AD authentication.
The menu provides five options:
- Create duplicate detection job – Initiates a bulk duplicate detection in Dynamics
- Group duplicates – Downloads and intelligently groups duplicate records
- Prepare duplicates – Applies merge logic and generates preview files
- Merge duplicates – Executes the actual merge operations
- Clear all and start again – Resets variables to begin a new process
You’re now ready to start the process!
Step 5: Create a Duplicate Detection Job
You have two options for creating a duplicate detection job:
Option A: Use the Tool (Recommended)
Enter 1 to create the duplicate detection job from the menu.
Provide:
- Entity name: The entity to detect duplicates for (e.g., contact, account)
- View name: A view that limits which records to check (e.g., “Active Contacts for Duplicate Checking”). This can be a personal or system view.
The tool will:
- Create the detection job in Dynamics 365
- Return a Job ID (asyncoperationid)
- Provide a URL so you can monitor job progress
Wait for the job to complete before proceeding. You can monitor the job in Settings > Data Management > Duplicate Detection Jobs.
Option B: Create Manually
Alternatively, create a duplicate detection job manually by going to Settings > Data Management > Duplicate Detection Jobs > + New.
After the job completes, you’ll need the asyncoperation ID. You can find it in the URL of the Detection Job as ...&id={asyncoperationid}.
Step 6: Download and Group Duplicates
Once you have a completed Duplicate Detection Job, enter 2 to download and group duplicates.
If you haven’t already, you’ll be prompted to enter:
- Entity name: Same entity from Step 5
- Folder name: A folder name to store data (e.g., contact_merge_2025)
- Optional max duplicates: An optional limit for testing or staggering (leave blank for all)
- Job ID: The asyncoperationid from Step 5
The tool will:
- Download all duplicate records identified by the job
- Group duplicates intelligently (if Record A duplicates B, and B duplicates C, they’re grouped as A+B+C)
- Save records as JSON files in {folder}/group/record_data/
- Create a duplicate_groups.json file mapping the groups of recordids
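The transitive grouping described above can be sketched in a few lines of Python. This is an illustration of the idea, not the tool’s actual implementation:

```python
def group_duplicates(pairs):
    """Group duplicate pairs transitively: if A~B and B~C, then {A, B, C}."""
    groups = []
    for a, b in pairs:
        merged = {a, b}
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group  # absorb any existing group sharing a member
            else:
                remaining.append(group)
        remaining.append(merged)
        groups = remaining
    return groups

# Two chained pairs collapse into one group; the unrelated pair stays separate
print(group_duplicates([("A", "B"), ("B", "C"), ("X", "Y")]))
```

Because grouping is transitive, a chain of pairwise matches always collapses into a single group with one surviving target record.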
Step 7: Prepare the Duplicates
Enter 3 from the menu to prepare the duplicates.
This will run the merge logic over the data and present you with what the expected output will be. Importantly, no merges will actually be executed until you decide to do so at the next step.
If you haven’t already, you’ll be prompted to provide:
- Entity name: Same entity
- Folder name: Same folder from Step 6
- Target selection: oldest_record or newest_record
The tool will:
- Select a target record (the one that remains active) based on createdon date
- Apply the merge logic you’ve written to determine what data to update
- Generate audit CSV files in {folder}/prepare/audit_records/
- Create merges.csv with all planned merge operations
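Target selection by createdon date can be sketched like this (illustrative only, assuming ISO-formatted createdon values; the tool’s code may differ):

```python
from datetime import datetime

def select_target(records, strategy='oldest_record'):
    """Pick the record that remains active, based on createdon (sketch)."""
    created = lambda r: datetime.fromisoformat(r['createdon'])
    if strategy == 'oldest_record':
        return min(records, key=created)
    return max(records, key=created)

records = [
    {'contactid': 'a', 'createdon': '2019-03-01T10:00:00'},
    {'contactid': 'b', 'createdon': '2023-07-15T09:30:00'},
]
print(select_target(records)['contactid'])                   # → a (oldest)
print(select_target(records, 'newest_record')['contactid'])  # → b (newest)
```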
It’s important that you update the merge logic to meet your needs. You determine how to handle conflicting data between the target and the subordinate record.
The default merge logic is a useful starting point, but you should change it to reflect your needs. Here’s how it works.
Default Merge Logic
The default merge logic in prepare_duplicates.py:
If a field is empty on target, but populated on the subordinate:
- Copy values from subordinate to target
If both records have a field, and they are different:
- Emails: If different, demote subordinate’s email to emailaddress2
- Phone numbers: Demote conflicting phones to telephone2
- Boolean flags (IsVIP, IsMember, DoNotEmail): Always take TRUE
- Addresses: If postcodes match, fill missing address parts; if different, demote to address2
- Other fields: Default to taking subordinate’s value
This default logic will not meet all your needs – you must customise it in Step 8.
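In outline, the default rules above amount to something like the following. This is a simplified sketch of the behaviour described, not the exact code in prepare_duplicates.py:

```python
def create_merge_result(target_row, subordinate_row, fields_to_ignore=()):
    """Simplified sketch of the default field-merge rules."""
    result = {}
    for field, sub_val in subordinate_row.items():
        if field in fields_to_ignore or sub_val is None:
            continue
        tgt_val = target_row.get(field)
        if tgt_val is None:
            result[field] = sub_val                 # fill gaps on the target
        elif tgt_val != sub_val:
            if field == 'emailaddress1':
                result['emailaddress2'] = sub_val   # demote conflicting email
            elif field == 'telephone1':
                result['telephone2'] = sub_val      # demote conflicting phone
            elif isinstance(tgt_val, bool):
                result[field] = tgt_val or sub_val  # boolean flags: take TRUE
            else:
                result[field] = sub_val             # default: take subordinate's value
    return result

target = {'emailaddress1': 'new@example.com', 'telephone1': None, 'donotemail': False}
subordinate = {'emailaddress1': 'old@example.com', 'telephone1': '01234 567890', 'donotemail': True}
print(create_merge_result(target, subordinate))
# → {'emailaddress2': 'old@example.com', 'telephone1': '01234 567890', 'donotemail': True}
```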
Step 8: Review and Refine
This is the most critical step. Open the audit files in {folder}/prepare/audit_records/ – there’s one CSV per subordinate record showing:
- Row 1 (subordinate): Original subordinate record values
- Row 2 (target): Current target record values
- Row 3 (merge_function): What will be updated on target, via the API call
- Row 4 (target_after_merge): Final expected result after merge
Review these files carefully to ensure:
- Critical data isn’t being lost
- Merge logic handles your specific scenarios
- Conflicting data is resolved appropriately
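With many audit files, part of this review can be scripted. A hedged sketch of one such sanity check, flagging any column that was populated on the subordinate but ends up empty after the merge (the CSV layout is assumed from the row descriptions above; adapt the column names to your entity):

```python
import csv
import io

# A hypothetical audit file, one row per state as described above
audit_csv = """row,emailaddress1,telephone1
subordinate,old@example.com,01234 567890
target,new@example.com,
merge_function,,01234 567890
target_after_merge,new@example.com,01234 567890
"""

rows = {r["row"]: r for r in csv.DictReader(io.StringIO(audit_csv))}

# Flag fields where the subordinate had a value but the merged target does not
lost = [
    field for field in rows["subordinate"]
    if field != "row"
    and rows["subordinate"][field]
    and not rows["target_after_merge"][field]
]
print(lost)  # → [] — every column populated on the subordinate is still populated
```

Run the same check in a loop over every file in the audit folder to spot merges that silently drop data.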
You should change the code in prepare_duplicates.py and re-run it as many times as needed to get your expected results. Just re-run main.py and follow the prompts.
Ignore fields
There will be some fields you want to ignore and always leave as they are on the target. Add these to the fields_to_ignore list.
Fields that are not valid for update, such as modifiedon, will automatically be excluded.
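For example (the field names here are illustrative, not defaults shipped with the tool; use the ones relevant to your schema):

```python
# Fields to always leave as-is on the target (illustrative names)
fields_to_ignore = [
    'ownerid',      # keep the target's owner
    'statuscode',   # keep the target's status
    'description',  # never overwrite free-text notes
]
```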
Some customisations you might want to consider
To make changes to how conflicts are handled, edit the _create_merge_result function in prepare_duplicates.py.
A few examples you might want to consider:
Example 1: Handle Multiple Email Addresses
```python
if col == 'emailaddress1':
    # Check if target already has emailaddress2
    if target_row['emailaddress2'] is None:
        result['emailaddress2'] = subordinate_val
    else:
        # Put this email address in emailaddress3 instead
        result['emailaddress3'] = subordinate_val
```
Example 2: Conditional Field Updates
```python
# Only update preferred contact method if the subordinate's last contact date is newer
if col == 'preferredcontactmethodcode':
    subordinate_date = subordinate_row['lastcontacteddate']
    target_date = target_row['lastcontacteddate']
    if subordinate_date and target_date:
        if subordinate_date > target_date:
            result[col] = subordinate_val
```
Example 3: Address Handling
```python
# Only move address if subordinate address is more complete
address_fields = ['address1_line1', 'address1_city', 'address1_postalcode']
subordinate_address_fields = [f for f in address_fields if subordinate_row[f] is not None]
if len(subordinate_address_fields) >= 3:
    # Address is complete enough to consider
    # Apply your logic here
    pass
```
After saving changes to prepare_duplicates.py:
- Run main.py again
- Select the .env file
- Choose 3 to re-run the logic
- Follow the prompts to specify the data folder location. If you pick the same folder as before, the previous results will be overwritten. You can copy folders if you want to compare before and after.
- Review the new audit files, and make changes to the code in prepare_duplicates.py until you are happy
- Repeat until the merge logic produces the desired results
Pro tip: Start with a small subset (use the max_number_of_dupes parameter when grouping) to test your logic before processing thousands of records.
Step 9: Execute the Merges
Once you’re satisfied with the results and ready to execute the merges, it’s time to merge them in Dynamics.
Enter 4 from the menu to merge duplicates.
You’ll be prompted to supply:
- Entity name: Same entity
- Folder name: Same folder
- Perform parenting check: True or False (choose True if records have parent accounts – this prevents merging contacts with different parent accounts)
- Optional max duplicates: For final testing, specify a small number; leave blank for all
The tool will:
- Execute merge API calls against Dynamics 365
- Show real-time progress (% complete)
- On completion, store results in {folder}/update/merge_actions_made.json
- Report success/failure counts
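Under the hood, record merging is exposed through the Dynamics 365 Web API’s Merge action, which takes a Target, a Subordinate, the content to update, and a parenting-check flag. A minimal sketch of building that request body (assuming the contact entity; the tool’s exact implementation may differ):

```python
import json

def build_merge_payload(entity, target_id, subordinate_id, update_content, parenting_check):
    """Build the body for the Dynamics 365 Web API Merge action (sketch)."""
    odata_type = f"Microsoft.Dynamics.CRM.{entity}"
    return {
        "Target": {"@odata.type": odata_type, f"{entity}id": target_id},
        "Subordinate": {"@odata.type": odata_type, f"{entity}id": subordinate_id},
        "UpdateContent": {"@odata.type": odata_type, **update_content},
        "PerformParentingChecks": parenting_check,
    }

# The request is then POSTed to {ENVIRONMENT_URI}api/data/v9.x/Merge with the
# Azure AD bearer token in the Authorization header.
payload = build_merge_payload(
    "contact",
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
    {"emailaddress2": "old@example.com"},
    True,
)
print(json.dumps(payload, indent=2))
```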
Some general notes from my experience:
- The merge operation cannot be easily undone – take your time to make sure you’re happy with the results before proceeding
- Always test on a Development/Sandbox environment first
- Use the merge_actions_made.json file for audit purposes and to analyse any merges that fail
- Failures are usually due to parenting conflicts, mistakes in the prepare logic, or missing permissions
Monitoring Results
After execution:
- Review merge_actions_made.json for any HTTP errors
- Check Dynamics 365 to verify records merged correctly
- Related records (activities, transactions) automatically reassign to the target
Conclusion
Bulk merging duplicates in Dynamics 365 is a common challenge that usually requires a lot of manual effort.
This Python tool automates the process while giving you complete control over merge logic, providing comprehensive audit trails, and enabling preview-before-commit workflows.
I used this tool to successfully merge thousands of duplicate contacts, saving weeks of manual work while maintaining higher data quality than manual merging could achieve.
The complete code is available on GitHub, and I welcome contributions through pull requests. Whether you’re dealing with a one-time migration cleanup or regular duplicate management, this tool can significantly streamline your Dynamics 365 data quality initiatives.
Have questions or suggestions? Feel free to open an issue on GitHub or contribute improvements to the repository.