Data migration is responsible for over 75% of ERP implementation problems, yet it's consistently the most underfunded and delayed phase of every project. In every ERP implementation I've been part of at Commsult Indonesia, the data migration timeline was underestimated by the client, underscoped by the project plan, and only truly understood when the team opened the first Excel file and found merged cells, inconsistent date formats (some dd/mm/yyyy, some mm-dd-yy, some just '1 Jan 2022'), duplicate customer records with different names ('PT Maju Jaya', 'Maju Jaya PT', 'pt maju jaya'), and product codes that had been reused for discontinued items. This post is a practitioner's guide to doing data migration right.
Before a single line of ERP code is written, run a data audit. The audit answers four questions: Where is your data? What format is it in? How dirty is it? How much of it do you actually need in the new system? Most Indonesian SMEs have data in three to five places: a primary accounting spreadsheet, a separate HR file (usually a shared Excel workbook), vendor and customer master data in another spreadsheet or an old desktop accounting system, and historical transaction records in email attachments and WhatsApp conversations. Map all of these sources before estimating migration complexity.
The most common data quality problems I encounter in Indonesian SME migrations: duplicate records (the same customer appearing 3–4 times with slight name variations), missing required fields (no NPWP number for vendor records, no email for customer contacts), inconsistent coding (product codes changed over time but old codes weren't retired, creating ambiguity), date format inconsistency (multiple formats used across files), and 'dead' records (suppliers no longer active, products discontinued years ago) that haven't been cleaned up. Each of these requires a specific remediation strategy before migration.
Data cleansing is not an IT task — it's a business task with IT support. The business owns the data and the business must validate it. For each entity type (customers, vendors, products, employees, historical transactions), assign a business data steward responsible for cleaning their domain. Provide them with a structured data validation template — a spreadsheet with one row per record, columns for required fields, and a status column (Clean, Needs Review, Duplicate, Delete). Set a hard deadline: data cleansing must be complete before UAT begins, not after.
ERP Data Migration Pipeline (Excel/Access → PostgreSQL)
SOURCE SYSTEMS STAGING DB TARGET ERP
┌───────────────┐ ┌──────────────┐ ┌──────────────┐
│ Excel Files │──extract──▶│ customers │ │ │
│ (many sheets) │ │ vendors │──load──▶│ ERP │
├───────────────┤ │ products │ │ PostgreSQL │
│ Access DB │──extract──▶│ invoices │ │ (clean) │
│ (.mdb/.accdb) │ │ employees │ │ │
├───────────────┤ └──────┬───────┘ └──────────────┘
│ Old ERP │──export──▶ │
│ (CSV export) │ TRANSFORM STAGE
└───────────────┘ ┌──────▼───────────────────────────┐
│ 1. Remove duplicates │
│ 2. Standardize date formats │
│ 3. Validate required fields │
│ 4. Map old codes → new codes │
│ 5. Validate referential integrity │
│ 6. Flag records needing review │
└──────────────────────────────────┘
DRY RUN CHECKLIST (run 3 times before go-live):
Run 1: Fix structural errors (column mappings, missing fields)
Run 2: Fix business logic errors (dates, balances, codes)
Run 3: Dress rehearsal — must complete cleanly ✓From my experience implementing ERPs at Commsult: run a data migration dry run at least three times before go-live. The first run reveals the structural errors (wrong column mappings, missing required fields). The second run reveals the business logic errors (dates outside valid range, amounts that don't balance). The third run is the dress rehearsal — it should complete cleanly and match the expected record counts. If the third run fails, you're not ready for go-live.
A robust migration pipeline has three stages: Extract (pull data from source systems), Transform (clean, map, and validate), and Load (insert into the target ERP database). For Excel and Access source systems, the extract stage uses Python scripts to read files and normalize data into a staging database. The transform stage applies business rules: merge duplicate records, fill missing fields from secondary sources, convert date formats, validate referential integrity. The load stage uses the ERP's import API or direct database inserts, with comprehensive error logging for every rejected record.
# Python: Extract from Excel → Staging DB → Validate
import pandas as pd
import psycopg2
from datetime import datetime
# Stage 1: Extract
df = pd.read_excel("vendor_master.xlsx", sheet_name="Vendors")
# Stage 2: Profile — identify quality issues
print("=== DATA PROFILE ===")
print(f"Total rows: {len(df)}")
print(f"Null NPWP: {df['npwp'].isna().sum()}")
print(f"Duplicate vendor_code: {df['vendor_code'].duplicated().sum()}")
print(f"Invalid email: {df[~df['email'].str.contains('@', na=False)].shape[0]}")
# Stage 3: Transform
def standardize_date(val):
"""Handle multiple date formats from Indonesian Excel files"""
formats = ['%d/%m/%Y', '%d-%m-%Y', '%Y-%m-%d', '%d %b %Y']
for fmt in formats:
try:
return datetime.strptime(str(val), fmt).date()
except (ValueError, TypeError):
continue
return None # Flag for review
df['created_date'] = df['created_date_raw'].apply(standardize_date)
df['vendor_code'] = df['vendor_code'].str.upper().str.strip()
# Remove duplicates — keep most recent
df = df.sort_values('created_date', ascending=False)
df = df.drop_duplicates(subset=['vendor_code'], keep='first')
# Stage 4: Validate
invalid_rows = df[df['npwp'].isna() | df['email'].isna()]
print(f"Rows needing review: {len(invalid_rows)}")
invalid_rows.to_excel("vendor_review_needed.xlsx", index=False)
# Stage 5: Load clean rows
clean_df = df[df['npwp'].notna() & df['email'].notna()]
print(f"Clean rows ready to load: {len(clean_df)}")After each migration dry run, validate with three checks. Record count validation: the number of records in the source equals the number in the target (minus intentional exclusions). Financial balance validation: the sum of AR balances, AP balances, and inventory values in the new system matches the last closed period from the old system. Spot check validation: manually verify 20–30 random records across each entity type, confirming the data looks correct to the business data steward. All three checks must pass before you can declare the migration ready.
A common mistake is trying to migrate every historical record into the new ERP. For most Indonesian SMEs, migrating the last 2–3 years of transactions is sufficient — older data can be archived in its original format and accessed when needed. Migrating 10 years of messy historical data adds months to the project and introduces quality risks to the new system. Define a clear historical data cutoff date upfront and stick to it. The new system starts clean; the old system becomes the archive.
The final migration — from the last day of parallel running to go-live — is the highest-stakes data operation in the project. It must be completed within a defined cutover window (typically a weekend) because business operations stop while it runs. Plan the cutover migration in a detailed run book: every step, who executes it, how long it takes, how you verify it completed, and what you do if it fails. The run book should be rehearsed in the staging environment at least once before go-live weekend. Assign a dedicated migration engineer to execute the run book while the rest of the team monitors system health.
Data quality doesn't end at go-live. New data enters the system every day, and without data governance processes, quality degrades over time. Establish data entry standards: required fields, approved code formats, validation rules in the UI. Appoint data stewards per entity type who run monthly data quality audits. Set up automated alerts for data quality violations — records with missing required fields, transactions with unusual amounts, or new vendor records that duplicate existing ones. Good data quality is the foundation of useful ERP reporting; poor data quality makes every report unreliable.