Duplicates in Dynamics 365 lead to poor experiences and poor data quality.
Unfortunately, it’s a lot easier to accidentally create duplicates than it is to fix them.
While Dynamics 365 has tools to merge duplicate records, they are time-consuming, manual, and not appropriate when facing hundreds or thousands of duplicates.
To resolve this problem for Help Musicians, I built a Python script to bulk merge duplicates in Dynamics 365, with customisable merge logic and comprehensive audit trails. I’ve now open sourced the script so others can use and build on it.
Contents
- Why Bulk Merge Duplicates?
- The Challenge with Manual Merging
- Key Features
- Step-by-Step Guide to Bulk Merge Duplicates
Why Bulk Merge Duplicates?
There are many scenarios where duplicate records accumulate in Dynamics 365:
- Data migrations from legacy systems often introduce duplicates
- Multiple data entry points (web forms, manual entry, integrations) create duplicate contacts
- Historical data may have accumulated duplicates before detection rules were implemented
While Dynamics 365 has built-in duplicate detection, the process for resolving them through the UI is impractical and error-prone.
The Challenge with Manual Merging
The standard Dynamics 365 merge process requires:
- Manually opening each duplicate pair
- Deciding which record to keep
- Selecting which fields to merge
- Clicking through confirmation dialogs
- Repeating this process for every duplicate
For organizations with thousands of duplicates, this could take weeks of tedious manual work. Additionally, the manual process lacks:
- Consistent merge logic – decisions may vary
- Audit trails – no record of what was merged and why
- Testing capabilities – no way to preview results before executing
- Rollback options – manual merges are difficult to undo
Key Features
This Python tool addresses these challenges by providing:
- Bulk processing – merge thousands of duplicates in minutes instead of weeks
- Customisable merge logic – define exactly how conflicting data should be handled
- Intelligent grouping – automatically groups chains of duplicates (if A=B and B=C, then A, B and C are grouped and merged together)
- Comprehensive audit trails – CSV files showing before/after state of every merge
- Preview mode – review and refine merge logic before executing
- Target selection – choose whether the oldest or newest record remains active
- Secure authentication – uses interactive Azure AD login (no stored credentials)
- Progress tracking – monitor merge progress in real-time
Step-by-Step Guide to Bulk Merge Duplicates
Step 1: Define Duplicate Detection Rules
Before using this Python tool, you need to configure duplicate detection rules in Dynamics 365.
This is critical: By default, the tool will merge ALL records identified by your duplicate detection job, so your rules must be strict and only find genuine duplicates.
Creating Duplicate Detection Rules:
Navigate to your Dynamics 365 environment, then go to Settings > Data Management > Duplicate Detection Rules.
When creating rules for bulk merging:
- Be strict – use exact matches only (e.g., identical email addresses, not similar names)
- Test thoroughly – run detection jobs on test environments or a subset of data
- Exclude inactive matching records – to only detect duplicates among active records
For comprehensive guidance on creating duplicate detection rules, refer to:
- Microsoft’s official documentation on setting up duplicate detection rules
- Step-by-step instructions at Advantage’s guide
Step 2: Clone the Repository
Clone the repository from GitHub onto your local machine:
git clone https://github.com/YesWeCandrew/dynamics_duplicate_resolver
cd dynamics_duplicate_resolver
Install the required Python packages:
pip install -r requirements.txt
The tool uses these libraries:
- requests – for making the API requests
- json – for reading and writing JSON
- pandas – for data transformation
- dotenv – for reading the dotenv file
- msal – for authenticating to Dynamics
Step 3: Create Your Environment File
Create a .env file in the root directory with your Dynamics 365 environment details:
CLIENT_ID = 51f81489-12ee-4a9e-aaae-a2591f45987d
TENANT_ID = your-tenant-id-here
ENVIRONMENT_URI = https://yourorg.crm11.dynamics.com/
AUTHORITY_BASE = https://login.microsoftonline.com
SCOPE_SUFFIX = user_impersonation
This example uses the default developer Dynamics Client ID. I recommend you use this too, so you only need to replace the TENANT_ID and the ENVIRONMENT_URI.
This uses the same configuration and authentication process that I built for my generic Dynamics Python Wrapper. If you want to authenticate with your own app, follow the Azure App instructions here.
You can find:
- TENANT_ID – In Azure Active Directory > Overview
- ENVIRONMENT_URI – In the URL of your Dynamics instance.
As always, you should use a Development or Sandbox environment first to test your merge logic before running on Production data.
Step 4: Run the Application
Start the application:
python main.py
You’ll be prompted to specify your .env file name (or press Enter to use the first .env the system can find). A browser window will open for Azure AD authentication.
The menu provides five options:
- Create duplicate detection job – Initiates a bulk duplicate detection in Dynamics
- Group duplicates – Downloads and intelligently groups duplicate records
- Prepare duplicates – Applies merge logic and generates preview files
- Merge duplicates – Executes the actual merge operations
- Clear all and start again – Resets variables to begin a new process
You’re now ready to start the process!
Step 5: Create a Duplicate Detection Job
You have two options for creating a duplicate detection job:
Option A: Use the Tool (Recommended)
Enter 1 to create the duplicate detection job from the menu.
Provide:
- Entity name: The entity to detect duplicates for (e.g., contact, account)
- View name: A view that limits which records to check (e.g., “Active Contacts for Duplicate Checking”). This can be a personal or system view.
The tool will:
- Create the detection job in Dynamics 365
- Return a Job ID (asyncoperationid)
- Provide a URL so you can monitor job progress
Wait for the job to complete before proceeding. You can monitor the job in Settings > Data Management > Duplicate Detection Jobs.
Option B: Create Manually
Alternatively, create a duplicate detection job manually by going to Settings > Data Management > Duplicate Detection Jobs > + New.
After the job completes, you’ll need the asyncoperation ID. You can find it in the URL of the Detection Job as ...&id={asyncoperationid}.
Step 6: Download and Group Duplicates
Once you have a completed Duplicate Detection Job, enter 2 to download and group duplicates.
If you haven’t already, you’ll be prompted to enter:
- Entity name: Same entity from Step 5
- Folder name: A folder name to store data (e.g., contact_merge_2025)
- Optional max duplicates: An optional limit for testing or staggering (leave blank for all)
- Job ID: The asyncoperationid from Step 5
The tool will:
- Download all duplicate records identified by the job
- Group duplicates intelligently (if Record A duplicates B, and B duplicates C, they’re grouped as A+B+C)
- Save records as JSON files in {folder}/group/record_data/
- Create a duplicate_groups.json file mapping the groups of recordids
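The transitive grouping described above can be sketched in a few lines of Python. This is an illustration of the idea, not the tool’s actual implementation:

```python
def group_duplicates(pairs):
    """Group duplicate pairs transitively: if A~B and B~C, then {A, B, C}."""
    groups = []
    for a, b in pairs:
        merged = {a, b}
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group  # absorb any existing group sharing a member
            else:
                remaining.append(group)
        remaining.append(merged)
        groups = remaining
    return groups

# Two chained pairs collapse into one group; the unrelated pair stays separate
print(group_duplicates([("A", "B"), ("B", "C"), ("X", "Y")]))
```

Because grouping is transitive, a chain of pairwise matches always collapses into a single group with one surviving target record.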
Step 7: Prepare the Duplicates
Enter 3 from the menu to prepare the duplicates.
This will run the merge logic over the data and present you with what the expected output will be. Importantly, no merges will actually be executed until you decide to do so at the next step.
If you haven’t already, you’ll be prompted to provide:
- Entity name: Same entity
- Folder name: Same folder from Step 6
- Target selection: oldest_record or newest_record
The tool will:
- Select a target record (the one that remains active) based on createdon date
- Apply the merge logic you’ve written to determine what data to update
- Generate audit CSV files in {folder}/prepare/audit_records/
- Create merges.csv with all planned merge operations
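Target selection by createdon date can be sketched like this (illustrative only, assuming ISO-formatted createdon values; the tool’s code may differ):

```python
from datetime import datetime

def select_target(records, strategy='oldest_record'):
    """Pick the record that remains active, based on createdon (sketch)."""
    created = lambda r: datetime.fromisoformat(r['createdon'])
    if strategy == 'oldest_record':
        return min(records, key=created)
    return max(records, key=created)

records = [
    {'contactid': 'a', 'createdon': '2019-03-01T10:00:00'},
    {'contactid': 'b', 'createdon': '2023-07-15T09:30:00'},
]
print(select_target(records)['contactid'])                   # → a (oldest)
print(select_target(records, 'newest_record')['contactid'])  # → b (newest)
```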
It’s important that you update the merge logic to meet your needs. You determine how to handle conflicting data between the target and the subordinate record.
The default merge logic is a useful starting point, but you should change it to reflect your needs. Here’s how it works.
Default Merge Logic
The default merge logic in prepare_duplicates.py:
If a field is empty on target, but populated on the subordinate:
- Copy values from subordinate to target
If both records have a field, and they are different:
- Emails: If different, demote subordinate’s email to emailaddress2
- Phone numbers: Demote conflicting phones to telephone2
- Boolean flags (IsVIP, IsMember, DoNotEmail): Always take TRUE
- Addresses: If postcodes match, fill missing address parts; if different, demote to address2
- Other fields: Default to taking subordinate’s value
This default logic will not meet all your needs – you must customise it in Step 8.
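In outline, the default rules above amount to something like the following. This is a simplified sketch of the behaviour described, not the exact code in prepare_duplicates.py:

```python
def create_merge_result(target_row, subordinate_row, fields_to_ignore=()):
    """Simplified sketch of the default field-merge rules."""
    result = {}
    for field, sub_val in subordinate_row.items():
        if field in fields_to_ignore or sub_val is None:
            continue
        tgt_val = target_row.get(field)
        if tgt_val is None:
            result[field] = sub_val                 # fill gaps on the target
        elif tgt_val != sub_val:
            if field == 'emailaddress1':
                result['emailaddress2'] = sub_val   # demote conflicting email
            elif field == 'telephone1':
                result['telephone2'] = sub_val      # demote conflicting phone
            elif isinstance(tgt_val, bool):
                result[field] = tgt_val or sub_val  # boolean flags: take TRUE
            else:
                result[field] = sub_val             # default: take subordinate's value
    return result

target = {'emailaddress1': 'new@example.com', 'telephone1': None, 'donotemail': False}
subordinate = {'emailaddress1': 'old@example.com', 'telephone1': '01234 567890', 'donotemail': True}
print(create_merge_result(target, subordinate))
# → {'emailaddress2': 'old@example.com', 'telephone1': '01234 567890', 'donotemail': True}
```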
Step 8: Review and Refine
This is the most critical step. Open the audit files in {folder}/prepare/audit_records/ – there’s one CSV per subordinate record showing:
- Row 1 (subordinate): Original subordinate record values
- Row 2 (target): Current target record values
- Row 3 (merge_function): What will be updated on target, via the API call
- Row 4 (target_after_merge): Final expected result after merge
Review these files carefully to ensure:
- Critical data isn’t being lost
- Merge logic handles your specific scenarios
- Conflicting data is resolved appropriately
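With many audit files, part of this review can be scripted. A hedged sketch of one such sanity check, flagging any column that was populated on the subordinate but ends up empty after the merge (the CSV layout is assumed from the row descriptions above; adapt the column names to your entity):

```python
import csv
import io

# A hypothetical audit file, one row per state as described above
audit_csv = """row,emailaddress1,telephone1
subordinate,old@example.com,01234 567890
target,new@example.com,
merge_function,,01234 567890
target_after_merge,new@example.com,01234 567890
"""

rows = {r["row"]: r for r in csv.DictReader(io.StringIO(audit_csv))}

# Flag fields where the subordinate had a value but the merged target does not
lost = [
    field for field in rows["subordinate"]
    if field != "row"
    and rows["subordinate"][field]
    and not rows["target_after_merge"][field]
]
print(lost)  # → [] — every column populated on the subordinate is still populated
```

Run the same check in a loop over every file in the audit folder to spot merges that silently drop data.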
You should change the code in prepare_duplicates.py and re-run it as many times as needed to get your expected results. Just re-run main.py and follow the prompts.
Ignore fields
There will be some fields you want to ignore and always leave as they are on the target. Add these to the fields_to_ignore list.
Fields that are not valid for update, such as modifiedon, will automatically be excluded.
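For example (the field names here are illustrative, not defaults shipped with the tool; use the ones relevant to your schema):

```python
# Fields to always leave as-is on the target (illustrative names)
fields_to_ignore = [
    'ownerid',      # keep the target's owner
    'statuscode',   # keep the target's status
    'description',  # never overwrite free-text notes
]
```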
Some customisations you might want to consider
To make changes to how conflicts are handled, edit the _create_merge_result function in prepare_duplicates.py.
A few examples you might want to consider:
Example 1: Handle Multiple Email Addresses
```python
if col == 'emailaddress1':
    # Check if target already has emailaddress2
    if target_row['emailaddress2'] is None:
        result['emailaddress2'] = subordinate_val
    else:
        # Put this email address in emailaddress3 instead
        result['emailaddress3'] = subordinate_val
```
Example 2: Conditional Field Updates
```python
# Only update preferred contact method if the subordinate's last contact date is newer
if col == 'preferredcontactmethodcode':
    subordinate_date = subordinate_row['lastcontacteddate']
    target_date = target_row['lastcontacteddate']
    if subordinate_date and target_date:
        if subordinate_date > target_date:
            result[col] = subordinate_val
```
Example 3: Address Handling
```python
# Only move address if subordinate address is more complete
address_fields = ['address1_line1', 'address1_city', 'address1_postalcode']
subordinate_address_fields = [f for f in address_fields if subordinate_row[f] is not None]
if len(subordinate_address_fields) >= 3:
    # Address is complete enough to consider
    # Apply your logic here
    pass
```
After saving changes to prepare_duplicates.py:
- Run main.py again
- Select the .env file
- Choose 3 to re-run the logic
- Follow the prompts to specify the data folder location. If you pick the same folder as before, the previous results will be overwritten. You can copy folders if you want to compare before and after.
- Review the new audit files, and make changes to the code in prepare_duplicates.py until you are happy
- Repeat until the merge logic produces the desired results
Pro tip: Start with a small subset (use the max_number_of_dupes parameter when grouping) to test your logic before processing thousands of records.
Step 9: Execute the Merges
Once you’re satisfied with the results and ready to execute the merges, it’s time to merge them in Dynamics.
Enter 4 from the menu to merge duplicates.
You’ll be prompted to supply:
- Entity name: Same entity
- Folder name: Same folder
- Perform parenting check: True or False (choose True if records have parent accounts – this prevents merging contacts with different parent accounts)
- Optional max duplicates: For final testing, specify a small number; leave blank for all
The tool will:
- Execute merge API calls against Dynamics 365
- Show real-time progress (% complete)
- On completion, store results in {folder}/update/merge_actions_made.json
- Report success/failure counts
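Under the hood, record merging is exposed through the Dynamics 365 Web API’s Merge action, which takes a Target, a Subordinate, the content to update, and a parenting-check flag. A minimal sketch of building that request body (assuming the contact entity; the tool’s exact implementation may differ):

```python
import json

def build_merge_payload(entity, target_id, subordinate_id, update_content, parenting_check):
    """Build the body for the Dynamics 365 Web API Merge action (sketch)."""
    odata_type = f"Microsoft.Dynamics.CRM.{entity}"
    return {
        "Target": {"@odata.type": odata_type, f"{entity}id": target_id},
        "Subordinate": {"@odata.type": odata_type, f"{entity}id": subordinate_id},
        "UpdateContent": {"@odata.type": odata_type, **update_content},
        "PerformParentingChecks": parenting_check,
    }

# The request is then POSTed to {ENVIRONMENT_URI}api/data/v9.x/Merge with the
# Azure AD bearer token in the Authorization header.
payload = build_merge_payload(
    "contact",
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
    {"emailaddress2": "old@example.com"},
    True,
)
print(json.dumps(payload, indent=2))
```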
Some general notes from my experience:
- The merge operation cannot be easily undone – take your time to make sure you’re happy with the results before proceeding
- Always test on a Development/Sandbox environment first
- Use the merge_actions_made.json file for audit purposes and to analyse any merges that fail
- Failures are usually due to parenting conflicts, mistakes in the prepare logic, or missing permissions
Monitoring Results
After execution:
- Review merge_actions_made.json for any HTTP errors
- Check Dynamics 365 to verify records merged correctly
- Related records (activities, transactions) automatically reassign to the target
Conclusion
Bulk merging duplicates in Dynamics 365 is a common challenge that usually requires a lot of manual effort.
This Python tool automates the process while giving you complete control over merge logic, providing comprehensive audit trails, and enabling preview-before-commit workflows.
I used this tool to successfully merge thousands of duplicate contacts, saving weeks of manual work while maintaining higher data quality than manual merging could achieve.
The complete code is available on GitHub, and I welcome contributions through pull requests. Whether you’re dealing with a one-time migration cleanup or regular duplicate management, this tool can significantly streamline your Dynamics 365 data quality initiatives.
Have questions or suggestions? Feel free to open an issue on GitHub or contribute improvements to the repository.