Why is data migration considered the most underestimated phase of an ERP project?

Data migration is responsible for over 75% of ERP implementation problems, yet it is consistently underfunded and delayed. Clients routinely underestimate the timeline, and the true scope only becomes apparent once the team opens legacy files and encounters merged cells, inconsistent date formats, duplicate customer records, and reused product codes for discontinued items.

How many migration dry runs should a team complete before ERP go-live, and what does each one test?

The post recommends running at least three migration dry runs before go-live. The first run surfaces structural errors such as wrong column mappings and missing required fields. The second run catches business logic errors like out-of-range dates and unbalanced amounts. The third run is the dress rehearsal — it must complete cleanly and match expected record counts; if it fails, the project is not ready to go live.

Who is responsible for data cleansing, and what does a practical validation process look like?

Data cleansing is a business task with IT support, not purely an IT task — the business owns and must validate its data. For each entity type (customers, vendors, products, employees), a business data steward is assigned ownership of a structured validation template: a spreadsheet with one row per record, required-field columns, and a status column (Clean, Needs Review, Duplicate, Delete). Cleansing must be complete before UAT begins.

How much historical data should Indonesian SMEs migrate into the new ERP?

For most Indonesian SMEs, migrating the last 2–3 years of transactions is sufficient. Older data can be archived in its original format and accessed when needed. Attempting to migrate 10 years of messy historical data adds months to the project and introduces quality risks; the recommended approach is to define a clear historical data cutoff date upfront and treat the old system as the archive.

ERP Data Quality and Migration Prep: The Work Nobody Wants to Do

Q: What three validation checks confirm that a migration run is correct?

After each dry run, three checks must all pass: record count validation (source record count equals target count minus intentional exclusions), financial balance validation (AR balances, AP balances, and inventory values in the new system match the last closed period from the old system), and spot-check validation (20–30 randomly sampled records across each entity type are manually verified by the business data steward).

Data migration is responsible for over 75% of ERP implementation problems, yet it's consistently the most underfunded and delayed phase of every project. In every ERP implementation I've done as a freelance ERP consultant, the data migration timeline was underestimated by the client, underscoped by the project plan, and only truly understood when I opened the first Excel file and found merged cells, inconsistent date formats (some dd/mm/yyyy, some mm-dd-yy, some just '1 Jan 2022'), duplicate customer records with different names ('PT Maju Jaya', 'Maju Jaya PT', 'pt maju jaya'), and product codes that had been reused for discontinued items. This post is a practitioner's guide to doing data migration right.

The Data Audit: Start on Day One

Before a single line of ERP code is written, run a data audit. The audit answers four questions: Where is your data? What format is it in? How dirty is it? How much of it do you actually need in the new system? Most Indonesian SMEs have data in three to five places: a primary accounting spreadsheet, a separate HR file (usually a shared Excel workbook), vendor and customer master data in another spreadsheet or an old desktop accounting system, and historical transaction records in email attachments and WhatsApp conversations. Map all of these sources before estimating migration complexity.

Common Data Quality Issues in Indonesian SMEs

The most common data quality problems I encounter in Indonesian SME migrations: duplicate records (the same customer appearing 3–4 times with slight name variations), missing required fields (no NPWP number for vendor records, no email for customer contacts), inconsistent coding (product codes changed over time but old codes weren't retired, creating ambiguity), date format inconsistency (multiple formats used across files), and 'dead' records (suppliers no longer active, products discontinued years ago) that haven't been cleaned up. Each of these requires a specific remediation strategy before migration.

The Data Cleansing Process

Data cleansing is not an IT task — it's a business task with IT support. The business owns the data and the business must validate it. For each entity type (customers, vendors, products, employees, historical transactions), assign a business data steward responsible for cleaning their domain. Provide them with a structured data validation template — a spreadsheet with one row per record, columns for required fields, and a status column (Clean, Needs Review, Duplicate, Delete). Set a hard deadline: data cleansing must be complete before UAT begins, not after.

ERP Data Migration Pipeline (Excel/Access → PostgreSQL)

  SOURCE SYSTEMS                STAGING DB               TARGET ERP
  ┌───────────────┐             ┌──────────────┐         ┌──────────────┐
  │ Excel Files   │──extract──▶│  customers   │         │              │
  │ (many sheets) │            │  vendors     │──load──▶│  ERP         │
  ├───────────────┤            │  products    │         │  PostgreSQL  │
  │ Access DB     │──extract──▶│  invoices    │         │  (clean)     │
  │ (.mdb/.accdb) │            │  employees   │         │              │
  ├───────────────┤            └──────┬───────┘         └──────────────┘
  │ Old ERP       │──export──▶        │
  │ (CSV export)  │            TRANSFORM STAGE
  └───────────────┘            ┌──────▼───────────────────────────┐
                               │ 1. Remove duplicates              │
                               │ 2. Standardize date formats       │
                               │ 3. Validate required fields       │
                               │ 4. Map old codes → new codes      │
                               │ 5. Validate referential integrity │
                               │ 6. Flag records needing review    │
                               └──────────────────────────────────┘

DRY RUN CHECKLIST (run 3 times before go-live):
  Run 1: Fix structural errors (column mappings, missing fields)
  Run 2: Fix business logic errors (dates, balances, codes)
  Run 3: Dress rehearsal — must complete cleanly ✓

From my experience implementing ERPs in my freelance ERP consulting work: run a data migration dry run at least three times before go-live. The first run reveals the structural errors (wrong column mappings, missing required fields). The second run reveals the business logic errors (dates outside valid range, amounts that don't balance). The third run is the dress rehearsal — it should complete cleanly and match the expected record counts. If the third run fails, you're not ready for go-live.

Technical Migration Architecture

A robust migration pipeline has three stages: Extract (pull data from source systems), Transform (clean, map, and validate), and Load (insert into the target ERP database). For Excel and Access source systems, the extract stage uses Python scripts to read files and normalize data into a staging database. The transform stage applies business rules: merge duplicate records, fill missing fields from secondary sources, convert date formats, validate referential integrity. The load stage uses the ERP's import API or direct database inserts, with comprehensive error logging for every rejected record.

# Python: Extract from Excel → Staging DB → Validate

import pandas as pd
import psycopg2
from datetime import datetime

# Stage 1: Extract
df = pd.read_excel("vendor_master.xlsx", sheet_name="Vendors")

# Stage 2: Profile — identify quality issues
print("=== DATA PROFILE ===")
print(f"Total rows: {len(df)}")
print(f"Null NPWP: {df['npwp'].isna().sum()}")
print(f"Duplicate vendor_code: {df['vendor_code'].duplicated().sum()}")
print(f"Invalid email: {df[~df['email'].str.contains('@', na=False)].shape[0]}")

# Stage 3: Transform
def standardize_date(val):
    """Handle multiple date formats from Indonesian Excel files"""
    formats = ['%d/%m/%Y', '%d-%m-%Y', '%Y-%m-%d', '%d %b %Y']
    for fmt in formats:
        try:
            return datetime.strptime(str(val), fmt).date()
        except (ValueError, TypeError):
            continue
    return None  # Flag for review

df['created_date'] = df['created_date_raw'].apply(standardize_date)
df['vendor_code'] = df['vendor_code'].str.upper().str.strip()

# Remove duplicates — keep most recent
df = df.sort_values('created_date', ascending=False)
df = df.drop_duplicates(subset=['vendor_code'], keep='first')

# Stage 4: Validate
invalid_rows = df[df['npwp'].isna() | df['email'].isna()]
print(f"Rows needing review: {len(invalid_rows)}")
invalid_rows.to_excel("vendor_review_needed.xlsx", index=False)

# Stage 5: Load clean rows
clean_df = df[df['npwp'].notna() & df['email'].notna()]
print(f"Clean rows ready to load: {len(clean_df)}")

Migration Validation: How to Know Your Data Is Right

After each migration dry run, validate with three checks. Record count validation: the number of records in the source equals the number in the target (minus intentional exclusions). Financial balance validation: the sum of AR balances, AP balances, and inventory values in the new system matches the last closed period from the old system. Spot check validation: manually verify 20–30 random records across each entity type, confirming the data looks correct to the business data steward. All three checks must pass before you can declare the migration ready.

A common mistake is trying to migrate every historical record into the new ERP. For most Indonesian SMEs, migrating the last 2–3 years of transactions is sufficient — older data can be archived in its original format and accessed when needed. Migrating 10 years of messy historical data adds months to the project and introduces quality risks to the new system. Define a clear historical data cutoff date upfront and stick to it. The new system starts clean; the old system becomes the archive.

Managing the Cutover Migration

The final migration — from the last day of parallel running to go-live — is the highest-stakes data operation in the project. It must be completed within a defined cutover window (typically a weekend) because business operations stop while it runs. Plan the cutover migration in a detailed run book: every step, who executes it, how long it takes, how you verify it completed, and what you do if it fails. The run book should be rehearsed in the staging environment at least once before go-live weekend. Assign a dedicated migration engineer to execute the run book while the rest of the team monitors system health.

Post-Migration Data Quality: The Ongoing Work

Data quality doesn't end at go-live. New data enters the system every day, and without data governance processes, quality degrades over time. Establish data entry standards: required fields, approved code formats, validation rules in the UI. Appoint data stewards per entity type who run monthly data quality audits. Set up automated alerts for data quality violations — records with missing required fields, transactions with unusual amounts, or new vendor records that duplicate existing ones. Good data quality is the foundation of useful ERP reporting; poor data quality makes every report unreliable.

Sources & Further Reading

Frequently Asked Questions

ERP Data Quality and Migration Prep: The Work Nobody Wants to Do

Frequently Asked Questions

ERP Data Quality and Migration Prep: The Work Nobody Wants to Do

The Data Audit: Start on Day One

Common Data Quality Issues in Indonesian SMEs

The Data Cleansing Process

Technical Migration Architecture

Migration Validation: How to Know Your Data Is Right

Managing the Cutover Migration

Post-Migration Data Quality: The Ongoing Work

The Data Audit: Start on Day One

Common Data Quality Issues in Indonesian SMEs

The Data Cleansing Process

Technical Migration Architecture

Migration Validation: How to Know Your Data Is Right

Managing the Cutover Migration

Post-Migration Data Quality: The Ongoing Work