Why should you audit the source Excel files before writing any migration code?

A data audit — taking at least one week — answers critical questions about deduplication, date formats, required vs optional fields, status column values, and cross-references between sheets. Without this baseline, migration scripts are built on assumptions that break during testing. The audit also surfaces common Indonesian business data issues like NPWP numbers stored with dots and dashes, and dates written as Indonesian text strings that standard date libraries cannot parse.

How do ExcelJS and Zod work together in the migration script?

ExcelJS handles reading real-world Excel files in Node.js, including merged cells, rich text, and formula values. Zod then validates each parsed row against a defined schema, reporting exactly which field failed and why for every invalid row. Together they separate the concerns of reading raw spreadsheet data from enforcing the data shape required by the ERP.

Why must the migration script be idempotent, and how is that achieved?

Idempotency means the script can be run multiple times against the same database without creating duplicate records. This is essential because the post recommends running the script at least three times before go-live: once for a baseline, once after data cleaning, and once on go-live day itself. It is achieved by using TypeORM's upsert with a unique conflict key, so reruns update existing records rather than inserting new ones, and wrapping bulk inserts in a database transaction so any failure rolls back the entire batch.

What is the risk of importing open records — such as unpaid invoices — at the wrong status?

Open records like unpaid invoices must be imported at exactly the right state so that ERP automation picks them up correctly. The post calls out importing an overdue invoice as DRAFT status as the most dangerous migration mistake, because it will not trigger any payment reminders or downstream workflows. Closed records such as paid invoices carry less risk since they do not feed into live automation.

What does the parallel running phase involve, and why is it required before hard cutover?

During parallel running, which should last at least two weeks, transactions are entered in both the old Excel system and the new ERP simultaneously. At the end of each week, key totals are reconciled between the two systems, and any discrepancy triggers an investigation before cutover is allowed to proceed. This phase ensures that data integrity is confirmed in a live environment before the Excel files are locked and the ERP becomes the single source of truth.

Migrating from Excel to a Custom ERP: Data Mapping, Validation, and Cutover Strategy

For most Indonesian SMEs, the starting point for an ERP project is a collection of Excel files — one for customers, one for vendors, one for each year's invoices, and probably several versions of each with names like 'Final_v3_FINAL_USE_THIS.xlsx'. Migrating this data into a new ERP system is one of the most underestimated and most critical tasks in the entire project. Get it wrong and you go live with corrupted data that erodes user trust immediately. Get it right and users arrive on day one to find their familiar customer list, invoice history, and open balances already in the system.

Phase 1: Data Audit — Know What You're Working With

Before writing a single line of migration code, spend at least one week auditing the source Excel files. The audit should answer: how many unique customers exist across all files (with deduplication), what date formats are used, what fields are required vs optional, what values appear in 'status' columns, and what cross-references exist between sheets.

Common Excel Data Problems in Indonesian Businesses

The most common data quality issues in Indonesian business Excel files are: NPWP numbers stored as text with dots and dashes needing normalization to 15 digits; dates entered as Indonesian text strings ('15 Maret 2023') that standard date libraries can't parse; duplicate customer records with slightly different names ('PT Maju Bersama' vs 'PT. Maju Bersama'); currency amounts stored as text with 'Rp' prefix; and merged cells causing rows to appear empty.

Data Mapping: Excel Columns to ERP Fields

Create a mapping document that shows every Excel column, its target ERP field, the transformation required, and validation rules. This document becomes the specification for your migration script and the agreement between the technical team and business stakeholders about what data is moving and how. For each target field, document whether it's required or optional, its data type, allowed values (for enums), and what happens when the source value is missing or invalid.

Excel → Custom ERP Migration Pipeline

  ┌─────────────────┐
  │  SOURCE: Excel  │  Multiple sheets, ad-hoc formulas,
  │  (.xlsx files)  │  merged cells, inconsistent formats
  └────────┬────────┘
           │  Step 1: Extract & Parse
           ▼
  ┌─────────────────────────────────────┐
  │        EXTRACT (Node.js / exceljs)  │
  │  • Read all sheets                  │
  │  • Strip formatting, get raw values │
  │  • Map columns → field names        │
  └────────────────┬────────────────────┘
                   │  Step 2: Validate
                   ▼
  ┌─────────────────────────────────────┐
  │         VALIDATE (Zod schemas)      │
  │  • Required fields present?         │
  │  • Date formats parseable?          │
  │  • FK references exist? (customer)  │
  │  • Duplicate invoice numbers?       │
  │  • Amount totals match?             │
  └──────┬──────────────────────┬───────┘
         │ PASS                 │ FAIL
         ▼                      ▼
  ┌─────────────┐        ┌─────────────────┐
  │  TRANSFORM  │        │  ERROR REPORT   │
  │  & LOAD     │        │  (.xlsx output) │
  │  PostgreSQL │        │  Row + reason   │
  └─────────────┘        └─────────────────┘

  Step 3: Cutover
  ┌────────────────────────────────────┐
  │  Parallel run: Excel + ERP (2 wk) │
  │  → Reconcile totals daily         │
  │  → Hard cutover on month-end      │
  └────────────────────────────────────┘

Run your migration script against the real Excel files at least three times before go-live: once at the start of data preparation to establish a baseline, once after data cleaning to measure improvement, and once on go-live day with the final data export. Each run should produce an error report — the goal is to drive that report to zero rows.

Building the Migration Script with ExcelJS and Zod

We use ExcelJS for reading Excel files in Node.js — it handles merged cells, rich text, formula values, and all the quirks of real-world Excel files. Zod provides schema validation with TypeScript type inference — you define the expected shape of each row, and Zod tells you exactly which field failed and why for each invalid row.

Bulk Insert with Upsert for Idempotent Reruns

The migration script should be idempotent — you should be able to run it multiple times against the same database without creating duplicates. Use TypeORM's upsert with a unique conflict key so that reruns update existing records rather than creating new ones. Wrap the bulk insert in a database transaction so that if any row fails, the entire batch is rolled back.

// NestJS: Excel import service using exceljs + Zod validation
import * as ExcelJS from 'exceljs';
import { z } from 'zod';

const CustomerRowSchema = z.object({
  customerCode: z.string().min(1),
  name:         z.string().min(1),
  taxId:        z.string().regex(/^d{15,16}$/, 'NPWP must be 15-16 digits').optional(),
  creditLimit:  z.coerce.number().min(0),
  paymentTerms: z.enum(['NET_7', 'NET_14', 'NET_30', 'NET_45', 'NET_60']),
  phone:        z.string().optional(),
  email:        z.string().email().optional(),
});

@Injectable()
export class ExcelImportService {
  async importCustomers(buffer: Buffer): Promise<ImportResult> {
    const workbook = new ExcelJS.Workbook();
    await workbook.xlsx.load(buffer);
    const sheet = workbook.getWorksheet('Customers');

    const errors: ImportError[] = [];
    const valid: CustomerRow[] = [];
    const HEADER_ROW = 1;

    sheet.eachRow({ includeEmpty: false }, (row, rowNumber) => {
      if (rowNumber === HEADER_ROW) return; // skip header

      const rawData = {
        customerCode: row.getCell('A').text.trim(),
        name:         row.getCell('B').text.trim(),
        taxId:        row.getCell('C').text.trim() || undefined,
        creditLimit:  row.getCell('D').value,
        paymentTerms: row.getCell('E').text.trim().toUpperCase(),
        phone:        row.getCell('F').text.trim() || undefined,
        email:        row.getCell('G').text.trim() || undefined,
      };

      const result = CustomerRowSchema.safeParse(rawData);
      if (result.success) {
        valid.push(result.data);
      } else {
        errors.push({
          row: rowNumber,
          data: rawData,
          issues: result.error.issues.map(i => i.message),
        });
      }
    });

    // Bulk insert valid rows in a transaction
    if (valid.length > 0) {
      await this.dataSource.transaction(async manager => {
        await manager.upsert(Customer, valid, ['customerCode']);
      });
    }

    return { imported: valid.length, failed: errors.length, errors };
  }
}

Handling Historical Invoices and Open Balances

Historical data comes in two flavors: closed records (paid invoices, completed leave requests) and open records (unpaid invoices, pending approvals). Open records need to be imported at exactly the right state so that the automation picks them up correctly. The most dangerous migration mistake is importing an overdue invoice as DRAFT status — it won't trigger any reminders.

Never delete the source Excel files after migration. Archive them in read-only storage and keep them for at least 12 months after go-live. Users will inevitably find discrepancies between the ERP and what they remember from Excel, and you'll need the original files to investigate. Also, before go-live, take a full PostgreSQL database dump and store it offline — if something catastrophic happens in the first 48 hours, you need a clean rollback point.

The Parallel Running Phase

After the initial migration, run the old Excel system and the new ERP in parallel for at least 2 weeks before the hard cutover. During parallel running, transactions are entered in both systems. At the end of each week, reconcile key totals between the two systems. Any discrepancy triggers an investigation before cutover.

Cutover Day: A Practical Checklist

Schedule go-live for a Friday afternoon or a quiet mid-month day. Final steps: export the last Excel data on Thursday evening, run the migration script one final time Friday morning, verify record counts match, run reconciliation totals, have the finance manager sign off, disable Excel file editing, send company-wide announcement, and have the implementation team on-site or on call for the full first week.

Sources & Further Reading

Frequently Asked Questions

Migrating from Excel to a Custom ERP: Data Mapping, Validation, and Cutover Strategy

Frequently Asked Questions

Migrating from Excel to a Custom ERP: Data Mapping, Validation, and Cutover Strategy

Phase 1: Data Audit — Know What You're Working With

Common Excel Data Problems in Indonesian Businesses

Data Mapping: Excel Columns to ERP Fields

Building the Migration Script with ExcelJS and Zod

Bulk Insert with Upsert for Idempotent Reruns

Handling Historical Invoices and Open Balances

The Parallel Running Phase

Cutover Day: A Practical Checklist

Phase 1: Data Audit — Know What You're Working With

Common Excel Data Problems in Indonesian Businesses

Data Mapping: Excel Columns to ERP Fields

Building the Migration Script with ExcelJS and Zod

Bulk Insert with Upsert for Idempotent Reruns

Handling Historical Invoices and Open Balances

The Parallel Running Phase

Cutover Day: A Practical Checklist