An estimated 30% of ERP projects experience critical failures during go-live week, and the first 48 hours are the highest-risk period. I've been through several ERP go-lives at Commsult Indonesia, some smooth and some terrifying, and the single biggest predictor of success is not the quality of the code. It's the completeness of the preparation. This checklist distills what I've learned from both successful and failed go-lives into 40 actionable checkpoints, organized by category. Run through it in the two weeks before your go-live date. If any item is red, you're not ready.
The technical readiness items (1–8) are the infrastructure and system checks that confirm the environment is production-ready. Verify each one in the production environment, not in staging.
1. Production server provisioned and performance-tested under expected user load.
2. SSL certificates installed and valid for at least 12 months.
3. Database backups configured and tested (actually restore a backup; don't just assume it works).
4. Monitoring and alerting set up (Grafana, UptimeRobot, or equivalent).
5. All environment variables and secrets confirmed in the production config.
6. Application performance tested with realistic data volumes.
7. All third-party integrations (payment gateway, email, SMS) tested in production.
8. Rollback procedure documented and rehearsed.
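Item 3 is the one teams most often skip. The sketch below shows what "restore a backup" means in practice, assuming SQLite as a stand-in for the production database (a real ERP would use its own database's dump and restore tooling); the `invoices` table is hypothetical. It takes a backup to a file, restores that file into a fresh, empty database, and compares row counts table by table.

```python
import os
import sqlite3
import tempfile


def table_counts(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Names come from sqlite_master itself, so quoting them here is safe.
    return {n: conn.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0] for n in names}


def verify_restore(source: sqlite3.Connection, backup_path: str) -> bool:
    """Back up `source` to a file, restore it into an empty database, compare counts."""
    # 1. Take the backup with sqlite3's online backup API.
    backup_file = sqlite3.connect(backup_path)
    source.backup(backup_file)
    backup_file.close()
    # 2. Restore: load the backup file into a fresh, empty database.
    restored = sqlite3.connect(":memory:")
    backup_file = sqlite3.connect(backup_path)
    backup_file.backup(restored)
    backup_file.close()
    # 3. Verify: every table must be present with the same row count.
    ok = table_counts(restored) == table_counts(source)
    restored.close()
    return ok


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    prod.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
    prod.executemany("INSERT INTO invoices (amount) VALUES (?)", [(100.0,), (250.5,)])
    prod.commit()
    with tempfile.TemporaryDirectory() as tmp:
        print(verify_restore(prod, os.path.join(tmp, "backup.db")))  # True
```

Row counts are only the first gate; a real restore test would also open the application against the restored database.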
The data readiness items (9–16) confirm that the migrated data is accurate and complete.
9. Final data migration dry run completed successfully.
10. Record counts validated (source = target for all entity types).
11. Financial balances validated (AR, AP, and inventory values match the last period close).
12. Spot-check validation completed by business data stewards for 20+ records per entity.
13. Data cutover run book written and rehearsed.
14. Cutover migration window scheduled (weekend preferred).
15. Old system freeze date confirmed (no new transactions after this point).
16. Archive of old system data confirmed in secure, accessible storage.
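Items 10 and 11 boil down to the same comparison: a per-key total from the old system against the same total from the new one. A minimal sketch, assuming both sides have already been exported as plain dictionaries (how you extract them depends on your systems). Money uses `Decimal` so the balance comparison is exact rather than subject to float rounding. The entity names and figures are made up for illustration.

```python
from decimal import Decimal


def mismatches(source: dict, target: dict) -> list:
    """Return one line per key whose value differs or is missing on either side."""
    problems = []
    for key in sorted(source.keys() | target.keys()):
        s, t = source.get(key), target.get(key)
        if s != t:
            problems.append(f"{key}: source={s} target={t}")
    return problems


# Item 10: record counts per entity type (illustrative numbers).
counts_old = {"customers": 1200, "invoices": 8450, "items": 3100}
counts_new = {"customers": 1200, "invoices": 8449, "items": 3100}
print(mismatches(counts_old, counts_new))  # ['invoices: source=8450 target=8449']

# Item 11: control totals at last period close, compared exactly.
balances_old = {"AR": Decimal("152340.75"), "AP": Decimal("98210.00")}
balances_new = {"AR": Decimal("152340.75"), "AP": Decimal("98210.0")}
print(mismatches(balances_old, balances_new))  # [] (trailing zeros don't affect Decimal equality)
```

An entity missing on one side shows up as `None`, which catches a table that was never migrated at all.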
The user readiness items (17–23) confirm that users are trained and ready.
17. All users have login credentials and have successfully logged in to production.
18. Role-based permissions tested for every user role.
19. Training completed for 100% of users who will use the system on Day 1.
20. Training videos accessible and indexed in the internal wiki.
21. Quick reference guides printed and distributed to key users.
22. Department champions confirmed and briefed on the Day 1 support plan.
23. Support escalation path documented (who to call, in what order, for what type of issue).
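For item 18, "tested for every user role" is easiest to enforce as a diff between the permission matrix the business signed off on and what production actually grants. A minimal sketch, assuming you can export each role's granted permissions from the ERP as a set of strings; the role and permission names are hypothetical.

```python
def permission_gaps(expected: dict, actual: dict) -> dict:
    """Compare role -> set-of-permissions maps; report missing and extra grants per role."""
    gaps = {}
    for role in sorted(expected.keys() | actual.keys()):
        want = expected.get(role, set())
        have = actual.get(role, set())
        missing, extra = want - have, have - want
        if missing or extra:
            gaps[role] = {"missing": sorted(missing), "extra": sorted(extra)}
    return gaps


# Signed-off matrix vs. what production grants (hypothetical names).
expected = {
    "ap_clerk": {"bill.create", "vendor.view"},
    "sales_rep": {"order.create", "customer.view"},
}
actual = {
    "ap_clerk": {"bill.create", "vendor.view", "payment.approve"},
    "sales_rep": {"order.create"},
}
print(permission_gaps(expected, actual))
# {'ap_clerk': {'missing': [], 'extra': ['payment.approve']},
#  'sales_rep': {'missing': ['customer.view'], 'extra': []}}  (wrapped here)
```

The check runs in both directions because an extra grant is as much a finding as a missing one.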
ERP Go-Live Master Checklist (40 items)
TECHNICAL READINESS (items 1–8)
[ ] 1. Production server load-tested at 2× expected peak users
[ ] 2. SSL certificate installed, valid 12+ months, auto-renew configured
[ ] 3. Database backup restored successfully from a recent backup
[ ] 4. Monitoring: Grafana/UptimeRobot alerts configured + tested
[ ] 5. All env vars in production config (no dev values)
[ ] 6. App performance tested with full production data volume
[ ] 7. Third-party integrations tested in prod (Midtrans, email, SMS)
[ ] 8. Rollback procedure documented, rehearsed, team knows the steps
DATA READINESS (items 9–16)
[ ] 9. Final migration dry run #3 completed, zero errors
[ ] 10. Record counts: source = target for all entity types
[ ] 11. Financial balances match last period close (AR, AP, inventory)
[ ] 12. Spot check: 20+ records per entity validated by data stewards
[ ] 13. Cutover run book written, rehearsed in staging environment
[ ] 14. Cutover window scheduled (weekend, non-peak period)
[ ] 15. Old system freeze date confirmed + communicated to all users
[ ] 16. Old system data archived in secure, accessible storage
USER READINESS (items 17–23)
[ ] 17. 100% of Day-1 users have production login credentials
[ ] 18. Role permissions verified for every user role (not just admins)
[ ] 19. Training complete for every user who works on Day 1
[ ] 20. Training videos accessible in internal wiki
[ ] 21. Quick reference cards printed + posted at workstations
[ ] 22. All dept champions briefed and confirmed available Day 1
[ ] 23. Support escalation path: who to call, in what order
BUSINESS READINESS (items 24–30)
[ ] 24. All dept heads have signed UAT sign-off documents
[ ] 25. Go-live date communicated company-wide
[ ] 26. Customer communications prepared (if AR process changes)
[ ] 27. Vendor communications prepared (if AP payment process changes)
[ ] 28. Payroll schedule reviewed — no conflict with go-live week
[ ] 29. No month-end/year-end within 2 weeks of go-live date
[ ] 30. Executive sponsor confirmed available for go-live week
DAY OF GO-LIVE (items 31–35)
[ ] 31. War room / coordination channel established (Slack/WhatsApp)
[ ] 32. 08:00 morning check-in call scheduled with all key personnel
[ ] 33. First real transaction confirmed successful within Hour 1
[ ] 34. System performance dashboard visible to technical team
[ ] 35. Issue log started — log every report immediately
POST-GO-LIVE (items 36–40)
[ ] 36. Daily standup with dept champions for first 14 days
[ ] 37. Issue resolution: P1 <2hrs, P2 <24hrs, P3 weekly sprint
[ ] 38. Adoption metrics reviewed weekly (DAU, error rate, tickets)
[ ] 39. 30-day formal review meeting scheduled (with dept heads)
[ ] 40. Hypercare end date defined and communicated upfront
From my experience implementing ERPs at Commsult: run a go-live simulation the week before. Treat it exactly like the real go-live: execute the cutover run book, have all users log in and complete their Day 1 tasks, and time every step. Anything that fails in the simulation gives you one week to fix it. Anything that fails on actual go-live gives you a panic-filled weekend.
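Timing every step of the simulation is easier if the run book is driven by a small harness rather than a stopwatch. A minimal sketch, assuming each run book step can be wrapped as a Python callable (for example, one that shells out to your migration scripts); the step names are hypothetical. It stops at the first failure, on the assumption that later cutover steps depend on earlier ones.

```python
import time
from typing import Callable


def run_simulation(steps: list) -> list:
    """Run (name, callable) steps in order; record (name, seconds, status).

    Stops at the first failure, because later run book steps depend on it.
    """
    results = []
    for name, step in steps:
        start = time.perf_counter()
        try:
            step()
            status = "ok"
        except Exception as exc:  # a failed step is a finding to fix this week
            status = f"FAILED: {exc}"
        results.append((name, time.perf_counter() - start, status))
        if status != "ok":
            break
    return results


def restore_backup() -> None:
    raise RuntimeError("backup file not found")  # simulated failure


simulation_steps: list = [
    ("Freeze old system", lambda: None),
    ("Restore backup into ERP", restore_backup),
    ("Smoke-test first transaction", lambda: None),
]
step_type: Callable[[], None] = restore_backup  # each step takes no args, returns None
report = run_simulation(simulation_steps)
for name, seconds, status in report:
    print(f"{name:<30} {seconds:8.3f}s  {status}")
```

The printed durations give you a realistic length for the real cutover window (item 14).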
The business readiness items (24–30) confirm that the business is operationally ready for the transition.
24. All department heads have formally signed off on their UAT results.
25. Go-live date communicated to all staff, and to vendors and customers if applicable.
26. Customer-facing communications prepared if AR workflows change for customers.
27. Vendor-facing communications prepared if the AP payment process changes.
28. Payroll calendar reviewed to confirm go-live doesn't conflict with payroll processing.
29. Month-end and year-end calendar reviewed; go-live should not fall within 2 weeks of a period close.
30. Executive sponsor has confirmed availability for go-live week.
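Items 28 and 29 are date arithmetic, so once the calendars exist they can be checked mechanically. A minimal sketch, assuming "within 2 weeks of period close" means 14 days on either side of the close date and "go-live week" means the Monday-to-Sunday week containing the go-live date. Both interpretations are assumptions. The dates are illustrative.

```python
from datetime import date, timedelta


def calendar_conflicts(go_live: date, period_closes: list, payroll_runs: list,
                       close_buffer_days: int = 14) -> list:
    """Flag period closes within the buffer and payroll runs in go-live week."""
    issues = []
    for close in period_closes:
        gap = abs((close - go_live).days)
        if gap <= close_buffer_days:
            issues.append(f"period close {close} is {gap} days from go-live")
    week_start = go_live - timedelta(days=go_live.weekday())  # Monday
    week_end = week_start + timedelta(days=6)                 # Sunday
    for run in payroll_runs:
        if week_start <= run <= week_end:
            issues.append(f"payroll run {run} falls in go-live week")
    return issues


print(calendar_conflicts(date(2025, 3, 3), [date(2025, 2, 28)], [date(2025, 3, 5)]))
# ['period close 2025-02-28 is 3 days from go-live', 'payroll run 2025-03-05 falls in go-live week']
print(calendar_conflicts(date(2025, 3, 17), [date(2025, 2, 28)], [date(2025, 3, 5)]))
# []
```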
# ERP Go-Live Rollback Decision Tree
# Both inputs are judgment calls made by the go-live team, passed in as booleans.
def rollback_decision(critical_failure_detected: bool, fixable_within_4_hours: bool):
    """Return the response plan for a P1 failure, or None if there is none."""
    if not critical_failure_detected:
        return None
    # P1: System completely down or core process broken
    steps = [
        "Notify executive sponsor and implementation team immediately",
        "Assess: Can the issue be fixed within 4 hours?",
    ]
    if fixable_within_4_hours:
        # Hot fix: keep system up, deploy fix
        action = ["Apply fix to production while users are on hold"]
        comms = "Notify all users: 'System temporarily unavailable, ETA Xhrs'"
    else:
        # Roll back to the old system. Log transactions entered since cutover
        # before restoring anything, so none are lost and they can be re-entered.
        action = [
            "Log all transactions entered since go-live cutover",
            "Restore pre-go-live database backup",
            "Bring old system back online",
            "Notify all users to revert to old system",
            "Schedule emergency post-mortem for next business day",
        ]
        comms = "All-staff: 'ERP go-live postponed. Continue using [old system].'"
    # Rollback is not failure; skipping rollback when needed is failure.
    # Having a tested rollback plan is what makes go-live safe to attempt.
    return {"steps": steps, "action": action, "comms": comms}
The day-of-go-live items (31–35) are the Day 1 execution steps.
31. War room or coordination channel established (Slack, Teams, or a WhatsApp group with all key personnel).
32. Morning check-in call scheduled for 8:00 AM on go-live day.
33. First real transaction confirmed successful within the first hour.
34. System performance dashboard visible to the technical team throughout the day.
35. Issue log started: every reported problem logged immediately with severity, user, and timestamp.
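Item 35 asks for severity, user, and timestamp on every report, and checklist item 37 attaches resolution targets to severity (P1 under 2 hours, P2 under 24 hours). A minimal sketch of an issue-log entry that records both and can say when it is overdue. The field names are hypothetical, and treating the P3 "weekly sprint" as a 7-day target is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Resolution targets per severity (checklist item 37).
# P3 "weekly sprint" is approximated as 7 days, which is an assumption.
RESOLUTION_TARGET = {
    "P1": timedelta(hours=2),
    "P2": timedelta(hours=24),
    "P3": timedelta(days=7),
}


@dataclass
class Issue:
    severity: str      # "P1", "P2" or "P3"
    reported_by: str   # the user who raised it
    summary: str
    # Defaults to "now" in UTC so timestamps compare unambiguously.
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.severity not in RESOLUTION_TARGET:
            raise ValueError(f"unknown severity {self.severity!r}")

    @property
    def due_by(self) -> datetime:
        return self.reported_at + RESOLUTION_TARGET[self.severity]

    def is_overdue(self, now: datetime) -> bool:
        return now > self.due_by


t0 = datetime(2025, 3, 3, 9, 0, tzinfo=timezone.utc)
issue = Issue("P1", "finance.user", "Cannot post sales invoices", reported_at=t0)
print(issue.due_by.isoformat())                   # 2025-03-03T11:00:00+00:00
print(issue.is_overdue(t0 + timedelta(hours=3)))  # True
```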
I've heard 'we'll fix it after go-live' many times. It almost never happens. Post-go-live, the team is in firefighting mode, users are overwhelmed, and the implementation partner is wrapping up. Known defects that are accepted as 'post-go-live fixes' become permanent workarounds. Every item on your go-live checklist that you mark 'acceptable risk' should have a named owner, a specific deadline, and a defined impact if not fixed. Anything critical that cannot be fixed before go-live is a reason to delay go-live — not to proceed and hope.
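The rule in the paragraph above is mechanical enough to enforce in whatever tracks your open defects. A minimal sketch, assuming each accepted risk is a record with the three required fields plus a critical flag. The class, fields, sample items, and names are all hypothetical. Anything the function returns is a reason not to sign off on go-live yet.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AcceptedRisk:
    item: str
    owner: Optional[str] = None             # a named owner
    deadline: Optional[date] = None         # a specific date, not "after go-live"
    impact_if_unfixed: Optional[str] = None
    critical: bool = False


def go_live_blockers(risks: list) -> list:
    """Return every accepted risk that should stop go-live as currently recorded."""
    blockers = []
    for r in risks:
        if r.critical:
            blockers.append(f"{r.item}: critical, fix before go-live or delay")
            continue
        required = (("owner", r.owner), ("deadline", r.deadline), ("impact", r.impact_if_unfixed))
        missing = [name for name, value in required if not value]
        if missing:
            blockers.append(f"{r.item}: missing {', '.join(missing)}")
    return blockers


risks = [
    AcceptedRisk("PDF invoice logo misaligned", "Rina", date(2025, 3, 14), "Cosmetic only"),
    AcceptedRisk("Stock transfer rounding", owner="Budi"),
    AcceptedRisk("Tax calculation wrong for export orders", critical=True),
]
for line in go_live_blockers(risks):
    print(line)
# Stock transfer rounding: missing deadline, impact
# Tax calculation wrong for export orders: critical, fix before go-live or delay
```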
The first 48 hours require dedicated technical coverage. Monitor server CPU and memory under real user load (flag anything above 80% sustained); database query performance (real data volumes will surface slow queries that staging didn't); error rates in application logs (flag anything above 1% of requests); email delivery success for any automated notifications; and user-reported issues coming through the support channel. Resolve P1 issues (system down, core process broken) within 2 hours, and P2 issues (a workflow fails for a subset of users) within 24 hours.
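The two numeric thresholds in the paragraph above translate directly into alert rules. A minimal sketch, assuming your monitoring stack can hand you recent utilization samples and request/error counts. Here "sustained" is read as the last N consecutive samples all exceeding 80%. The default N=5 is arbitrary, not a figure from the text.

```python
def sustained_above(samples: list, threshold: float, window: int) -> bool:
    """True only if each of the last `window` samples exceeds `threshold`."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = samples[-window:]
    return len(recent) == window and all(s > threshold for s in recent)


def go_live_alerts(cpu_pct: list, mem_pct: list, errors: int, requests: int,
                   window: int = 5) -> list:
    """Apply the first-48-hours thresholds: 80% sustained CPU/memory, 1% errors."""
    alerts = []
    if sustained_above(cpu_pct, 80.0, window):
        alerts.append("CPU above 80% sustained")
    if sustained_above(mem_pct, 80.0, window):
        alerts.append("Memory above 80% sustained")
    rate = errors / requests if requests else 0.0
    if rate > 0.01:
        alerts.append(f"Error rate {rate:.1%} exceeds 1% of requests")
    return alerts


print(go_live_alerts([70, 85, 88, 91, 86, 90], [60] * 6, errors=20, requests=1000))
# ['CPU above 80% sustained', 'Error rate 2.0% exceeds 1% of requests']
```

Requiring the whole window to exceed the threshold keeps a single spike from paging the team at 3 AM; a short dip resets it.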
Hypercare is the structured 30–90-day period after go-live during which the implementation team provides intensive support. Plan it before you go live: who staffs it, what the support hours are, what the escalation path is, and when hypercare officially ends. At Commsult, our standard hypercare covers 30 days of dedicated support, with daily check-ins for the first 2 weeks and weekly reviews for weeks 3–4. Usage metrics, error rates, and user-reported issues are reviewed daily. Hypercare doesn't end at day 30 because the calendar says so; it ends when adoption metrics hit targets and the business is stable.
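The exit rule above (adoption and stability, not the calendar) can be written as a gate that reaching the 30-day mark alone never satisfies. A minimal sketch. The metric definitions and targets are hypothetical placeholders, not Commsult's figures: 90% of Day-1 users active, an error rate at or below 1%, and no open P1 issues. Each team should set its own targets during hypercare planning.

```python
from dataclasses import dataclass


@dataclass
class HypercareMetrics:
    active_user_ratio: float  # daily active users / Day-1 users (hypothetical metric)
    error_rate: float         # failed requests / total requests
    open_p1: int              # unresolved P1 issues


# Hypothetical exit targets; agree on real ones before go-live.
MIN_ACTIVE_RATIO = 0.90
MAX_ERROR_RATE = 0.01


def hypercare_exit_check(days_since_go_live: int, m: HypercareMetrics,
                         planned_days: int = 30) -> list:
    """Return the reasons hypercare cannot end yet; an empty list means it can."""
    reasons = []
    if days_since_go_live < planned_days:
        reasons.append(f"day {days_since_go_live} of {planned_days} planned")
    if m.active_user_ratio < MIN_ACTIVE_RATIO:
        reasons.append(f"adoption {m.active_user_ratio:.0%} below {MIN_ACTIVE_RATIO:.0%}")
    if m.error_rate > MAX_ERROR_RATE:
        reasons.append(f"error rate {m.error_rate:.1%} above {MAX_ERROR_RATE:.0%}")
    if m.open_p1 > 0:
        reasons.append(f"{m.open_p1} open P1 issue(s)")
    return reasons


# Day 30 has arrived, but adoption is lagging, so hypercare continues.
print(hypercare_exit_check(30, HypercareMetrics(0.72, 0.004, 0)))
# ['adoption 72% below 90%']
```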