According to Deloitte, 79% of business leaders report that their team's productivity is hindered by disconnection between systems. Nowhere is this truer than in document management: invoices emailed as PDFs, contracts stored on someone's hard drive, technical drawings scattered across Google Drive folders. When documents live outside the ERP, they can't be linked to transactions, can't trigger workflows, and can't be found quickly during an audit. At Commsult, I built an integrated document management module that links documents directly to ERP entities: contracts to vendors, invoices to AP entries, technical drawings to product records. This post covers the architecture.
Every document in our DMS has: document_id, entity_type (vendor, product, invoice, project, employee), entity_id (the linked ERP record), document_type (contract, invoice, certificate, drawing, photo), filename, storage_path (Google Cloud Storage key), file_size, mime_type, version (integer, starting at 1), uploaded_by, uploaded_at, and an optional expires_at (for compliance documents with expiry dates). This metadata enables: filtering documents by type and entity, expiry tracking for compliance, version history, and usage attribution.
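As a rough illustration of the metadata above, here is a minimal TypeScript sketch of the record shape and an entity/type filter. The type names, camelCase field names, and the `filterDocuments` helper are assumptions made for this example, not the production schema.

```typescript
// Hypothetical types mirroring the metadata fields listed above.
type EntityType = 'vendor' | 'product' | 'invoice' | 'project' | 'employee';
type DocumentType = 'contract' | 'invoice' | 'certificate' | 'drawing' | 'photo';

interface DocumentRecord {
  documentId: string;
  entityType: EntityType;
  entityId: string;        // ID of the linked ERP record
  documentType: DocumentType;
  filename: string;
  storagePath: string;     // GCS object key, not the file contents
  fileSize: number;        // bytes
  mimeType: string;
  version: number;         // integer, starting at 1
  uploadedBy: string;
  uploadedAt: Date;
  expiresAt?: Date;        // only set for compliance documents
}

// Return the documents linked to one ERP entity, optionally narrowed by type.
function filterDocuments(
  docs: DocumentRecord[],
  entityType: EntityType,
  entityId: string,
  documentType?: DocumentType,
): DocumentRecord[] {
  return docs.filter(
    (d) =>
      d.entityType === entityType &&
      d.entityId === entityId &&
      (documentType === undefined || d.documentType === documentType),
  );
}
```

Calling `filterDocuments(docs, 'vendor', vendorId, 'certificate')` would list only that vendor's certificates, which is the kind of lookup the expiry tracking relies on.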
All document files are stored in Google Cloud Storage, not in the database. We store only the GCS key (path) in PostgreSQL. Files are organized by folder structure: {entity_type}/{entity_id}/{document_type}/{version}/{filename}. Access is via GCS signed URLs with configurable TTL: 1 hour for in-app viewing, 24 hours for download links sent via email. Signed URLs prevent direct access without ERP authentication — files are not publicly accessible even if someone knows the storage path.
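The path layout and the two TTLs can be sketched as pure helpers. This is only an illustration: `buildStorageKey`, `signedUrlExpiry`, and the filename sanitising step are hypothetical, and the version segment follows the `v${version}` form used in the upload handler later in this post.

```typescript
// TTLs from the text: 1 hour for in-app viewing, 24 hours for emailed links.
const VIEW_TTL_MS = 60 * 60 * 1000;
const EMAIL_TTL_MS = 24 * 60 * 60 * 1000;

type LinkPurpose = 'view' | 'email';

// Build the object key {entity_type}/{entity_id}/{document_type}/{version}/{filename}.
function buildStorageKey(
  entityType: string,
  entityId: string,
  documentType: string,
  version: number,
  filename: string,
): string {
  // Assumption: replace path separators so a filename cannot add extra folders.
  const safeName = filename.replace(/[\\/]/g, '_');
  return [entityType, entityId, documentType, `v${version}`, safeName].join('/');
}

// Compute the absolute expiry time to request for a signed URL.
function signedUrlExpiry(purpose: LinkPurpose, now: Date): Date {
  const ttl = purpose === 'view' ? VIEW_TTL_MS : EMAIL_TTL_MS;
  return new Date(now.getTime() + ttl);
}
```

With the official Node.js client, the computed expiry would typically be passed as the `expires` option of `file.getSignedUrl({ version: 'v4', action: 'read', expires })`. V4 signed URLs are limited to 7 days, so both TTLs fit comfortably.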
ERP Document Management Architecture

ERP Entity (vendor, invoice, project, employee)
                       │  linked via entity_type + entity_id
                       ▼
┌─────────────────────────────────────────────────────┐
│  documents table (PostgreSQL)                       │
│                                                     │
│  id            │ UUID                               │
│  entity_type   │ 'vendor' | 'invoice' | 'project'   │
│  entity_id     │ UUID (FK to entity)                │
│  document_type │ 'contract' | 'invoice' | 'cert'    │
│  storage_path  │ GCS key (not the file itself)      │
│  version       │ INT (increments on replacement)    │
│  expires_at    │ TIMESTAMPTZ (nullable)             │
│  text_content  │ tsvector (for full-text search)    │
└──────────────────────┬──────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────────┐
│ GCS Storage  │ │ OCR Extract  │ │ Expiry Monitor   │
│ (files)      │ │ Vision API   │ │ → alerts 60/30/  │
│ Signed URLs  │ │ → tsvector   │ │   7 days before  │
└──────────────┘ └──────────────┘ └──────────────────┘
Full-text search:

SELECT *
FROM documents
WHERE text_content @@ to_tsquery('english', 'Sumber & Makmur')
  AND uploaded_at >= '2025-01-01'
  AND uploaded_at <  '2025-04-01';  -- half-open range, so uploads made during March 31 are included

From my experience building ERP systems at Commsult: implement document versioning from day one. The most common document management request after go-live is 'I uploaded the wrong version of the contract. Can I replace it without losing the old one?' If you don't have versioning, the answer is ugly. Our system creates a new version row for each replacement and retains all previous versions. The 'active' version is the row with the highest version number. Since implementing versioning, we've never had to run a data recovery because of a bad document upload.
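The versioning rule (append a row per replacement, highest number wins) can be sketched in a few lines. This is an in-memory illustration only: `VersionRow`, `nextVersion`, `activeVersion`, and `replaceDocument` are hypothetical names, and a real implementation would live in the repository layer.

```typescript
interface VersionRow {
  version: number;
  storagePath: string;
}

// Next version number: one above the current maximum, starting at 1.
function nextVersion(rows: VersionRow[]): number {
  return rows.reduce((max, r) => Math.max(max, r.version), 0) + 1;
}

// The "active" version is simply the row with the highest version number.
function activeVersion(rows: VersionRow[]): VersionRow | undefined {
  return rows.reduce<VersionRow | undefined>(
    (best, r) => (best === undefined || r.version > best.version ? r : best),
    undefined,
  );
}

// Replacing a document appends a row; earlier versions are never modified.
function replaceDocument(rows: VersionRow[], storagePath: string): VersionRow[] {
  return [...rows, { version: nextVersion(rows), storagePath }];
}
```

Because the read-then-increment step is not atomic in a real database, one reasonable safeguard (an assumption, not something described above) is a unique constraint on entity, document type, and version, so that two concurrent replacements cannot both claim the same number.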
Documents should trigger and participate in workflows. When a vendor uploads an NPWP certificate via the vendor portal, the document appears in the ERP procurement team's review queue. When an AP invoice is approved, the attached PDF invoice is automatically archived to the vendor's document folder. When a document of type CONTRACT is uploaded to a project, the system extracts the contract value and populates the project budget. These workflow integrations turn the DMS from a filing cabinet into an active participant in business processes.
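One way to picture these integrations is a small routing function that maps a document event to the workflow steps it should start. The event shapes and action names below are invented for illustration; they are not the module's real API.

```typescript
// Hypothetical events covering the three workflows described above.
type DocumentEvent =
  | { kind: 'vendor-portal-upload'; vendorId: string; documentType: string }
  | { kind: 'ap-invoice-approved'; vendorId: string; invoiceId: string }
  | { kind: 'project-upload'; projectId: string; documentType: string };

interface WorkflowAction {
  action: string; // hypothetical action name
  target: string; // ID of the ERP record the action applies to
}

// Decide which workflow steps a document event should trigger.
function planActions(event: DocumentEvent): WorkflowAction[] {
  switch (event.kind) {
    case 'vendor-portal-upload':
      return [{ action: 'queue-procurement-review', target: event.vendorId }];
    case 'ap-invoice-approved':
      return [{ action: 'archive-invoice-pdf', target: event.vendorId }];
    case 'project-upload':
      // Only contracts feed the project budget.
      return event.documentType === 'contract'
        ? [
            { action: 'extract-contract-value', target: event.projectId },
            { action: 'populate-project-budget', target: event.projectId },
          ]
        : [];
  }
}
```

Keeping the routing as a pure function makes it easy to unit test separately from the queue or event bus that actually executes the actions.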
Raw PDF files are not searchable unless you extract their text. We use Google Cloud Vision API for OCR on uploaded PDFs and images: the extracted text is stored as a PostgreSQL tsvector (the text_content column used in the diagram and code in this post). Full-text search uses PostgreSQL's to_tsquery, which handles Indonesian and English text. When an auditor asks 'find all invoices from PT Sumber Makmur between January and March 2025', the search runs in under a second and returns the matching documents with highlighted excerpts.
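User input has to be turned into valid to_tsquery syntax before it reaches the database. The sketch below shows one possible approach, with search terms and dates passed as bound parameters; `toTsQueryInput` and `buildSearchQuery` are hypothetical helpers, and the table and column names are taken from the diagram in this post.

```typescript
interface SearchQuery {
  text: string;     // SQL with $1..$3 placeholders
  values: string[]; // bound parameters, never interpolated into the SQL
}

// Convert free text into to_tsquery input: strip operators and punctuation, AND the terms.
function toTsQueryInput(userInput: string): string {
  return userInput
    .split(/\s+/)
    .map((term) => term.replace(/[^A-Za-z0-9]/g, ''))
    .filter((term) => term.length > 0)
    .join(' & ');
}

// Build a parameterized search over a half-open upload date range.
function buildSearchQuery(
  userInput: string,
  fromInclusive: string,
  toExclusive: string,
): SearchQuery | null {
  const tsQuery = toTsQueryInput(userInput);
  if (tsQuery === '') return null; // nothing searchable in the input
  return {
    text:
      "SELECT id, storage_path FROM documents " +
      "WHERE text_content @@ to_tsquery('english', $1) " +
      "AND uploaded_at >= $2 AND uploaded_at < $3",
    values: [tsQuery, fromInclusive, toExclusive],
  };
}
```

PostgreSQL's built-in `plainto_tsquery` and `websearch_to_tsquery` do similar normalisation on the server side and may be a simpler choice when custom operators are not needed.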
// NestJS: Document upload with GCS storage + OCR
// Imports belong at the top of the controller file; GetUser, User, and
// UploadDocumentDto are application-specific.
import { Body, Post, UploadedFile, UseInterceptors } from '@nestjs/common';
import { FileInterceptor } from '@nestjs/platform-express';

// Inside the documents controller class:
@Post('/documents/upload')
@UseInterceptors(FileInterceptor('file'))
async uploadDocument(
  @UploadedFile() file: Express.Multer.File,
  @Body() dto: UploadDocumentDto,
  @GetUser() user: User,
) {
  // 1. Determine storage path and version.
  //    Note: read-then-increment is not atomic, so concurrent uploads for the
  //    same entity and document type could compute the same version number.
  const existing = await this.docRepo.findLatestVersion(
    dto.entityType, dto.entityId, dto.documentType
  );
  const version = (existing?.version ?? 0) + 1;
  const storagePath = [
    dto.entityType, dto.entityId, dto.documentType,
    `v${version}`, file.originalname
  ].join('/');

  // 2. Upload to GCS
  await this.storageService.upload(storagePath, file.buffer, file.mimetype);

  // 3. Save the document metadata
  const doc = await this.docRepo.save({
    entityType: dto.entityType,
    entityId: dto.entityId,
    documentType: dto.documentType,
    storagePath, version,
    fileSize: file.size,
    mimeType: file.mimetype,
    uploadedBy: user.id,
    expiresAt: dto.expiresAt,
  });

  // 4. Queue OCR extraction; the text is extracted later by the job processor
  await this.ocrQueue.add('extract-text', { documentId: doc.id, storagePath });

  return { documentId: doc.id, version, storagePath };
}
// OCR processor: updates tsvector for full-text search
import { Process } from '@nestjs/bull';
import { Job } from 'bull';

@Process('extract-text')
async extractText(job: Job<{ documentId: string; storagePath: string }>) {
  const imageBytes = await this.storageService.download(job.data.storagePath);
  // textDetection expects image bytes; PDF and TIFF files go through Vision's
  // batchAnnotateFiles / asyncBatchAnnotateFiles instead.
  const [result] = await this.visionClient.textDetection({
    image: { content: imageBytes },
  });
  const text = result.fullTextAnnotation?.text ?? '';
  // Bind the text as a parameter (assumes docRepo is a TypeORM repository).
  // Interpolating JSON.stringify(text) yields a double-quoted identifier, not a
  // string literal, and is open to SQL injection.
  await this.docRepo
    .createQueryBuilder()
    .update()
    .set({ textContent: () => "to_tsvector('english', :text)" })
    .where('id = :id', { id: job.data.documentId })
    .setParameter('text', text)
    .execute();
}

The DMS backend is a NestJS DocumentModule with a StorageService (wraps GCS), DocumentService (metadata CRUD), and SearchService (PostgreSQL full-text). The front end is a React document viewer using react-pdf for PDF rendering and a folder tree view built with react-arborist. Documents open in a side panel without leaving the current ERP page, so the user can view a contract while editing the vendor record it's linked to. We also implemented a bulk download feature that streams a ZIP archive of the selected documents.
Document storage costs accumulate faster than expected. A company that stores all invoices, contracts, photos, and reports for 5 years can easily accumulate 500GB-1TB of files. Google Cloud Storage costs for 1TB in the asia-southeast2 (Jakarta) region are approximately $20/month for storage, plus egress costs for downloads. Implement a document lifecycle policy: move documents older than 1 year to Coldline storage, which costs several times less per GB for rarely accessed archives but adds retrieval fees and a 90-day minimum storage duration. Also enforce maximum file sizes (we cap at 50MB per document and 500MB per entity) to prevent engineers from accidentally uploading video files.
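The two size caps are easy to enforce before the file ever reaches GCS. A minimal sketch follows, assuming 1MB = 1024 × 1024 bytes and that the caller already knows how many bytes the entity currently stores; `checkUploadSize` is a hypothetical helper.

```typescript
// Caps from the text: 50MB per document, 500MB per entity.
const MAX_DOCUMENT_BYTES = 50 * 1024 * 1024;
const MAX_ENTITY_BYTES = 500 * 1024 * 1024;

type SizeCheck = { ok: true } | { ok: false; reason: string };

// Validate an upload against the per-document and per-entity limits.
function checkUploadSize(fileSize: number, entityUsedBytes: number): SizeCheck {
  if (fileSize > MAX_DOCUMENT_BYTES) {
    return { ok: false, reason: 'document exceeds the 50MB limit' };
  }
  if (entityUsedBytes + fileSize > MAX_ENTITY_BYTES) {
    return { ok: false, reason: 'entity would exceed the 500MB limit' };
  }
  return { ok: true };
}
```

Running this check in the upload handler, before the storage call, avoids paying for an upload that is about to be rejected.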
Indonesian regulatory compliance requires tracking documents with expiry dates: PKP certificates (renewable annually), SIUP (business license), API (import license), SNI certifications, ISO certificates, and staff qualification certificates. Our DMS sends email alerts 60 days, 30 days, and 7 days before a compliance document expires. The procurement module checks document validity before issuing POs to vendors — if a vendor's PKP certificate has expired, new PO creation is blocked. This automated compliance tracking replaced a manual calendar reminder system that frequently missed expirations.
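The alert schedule and the PO check reduce to simple date arithmetic. The sketch below assumes a job that runs once per day and compares calendar dates in UTC; `daysUntil`, `dueAlert`, and `canIssuePurchaseOrder` are hypothetical names.

```typescript
const ALERT_THRESHOLD_DAYS = [60, 30, 7];
const DAY_MS = 24 * 60 * 60 * 1000;

// Midnight UTC of the given date, so time-of-day does not affect the day count.
function utcMidnight(d: Date): number {
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
}

// Whole calendar days from `today` until `expiresAt` (negative once expired).
function daysUntil(expiresAt: Date, today: Date): number {
  return Math.round((utcMidnight(expiresAt) - utcMidnight(today)) / DAY_MS);
}

// Return the alert threshold that falls on today, or null if no alert is due.
function dueAlert(expiresAt: Date, today: Date): number | null {
  const remaining = daysUntil(expiresAt, today);
  return ALERT_THRESHOLD_DAYS.includes(remaining) ? remaining : null;
}

// Block new POs once the vendor's certificate has expired.
function canIssuePurchaseOrder(certExpiresAt: Date, now: Date): boolean {
  return certExpiresAt.getTime() > now.getTime();
}
```

Matching on the exact day count means a missed daily run skips an alert, so a production job might instead record which alerts were already sent and send any that are overdue.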
During an audit (tax audit by DJP, or client-required compliance audit), auditors request specific documents: all purchase invoices for a period, all contracts with a specific vendor, evidence of approval for transactions above a threshold. Our DMS supports: bulk export by date range, entity type, and document type; audit trail showing who accessed each document and when; and a read-only auditor access level that grants view-only access to specific document sets without exposing the rest of the ERP. The eDiscovery export runs as a background job and emails the requester a download link when complete.
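A scoped auditor export combined with an access trail might look roughly like this. It is an in-memory sketch under assumed shapes: `AuditorScope`, `AuditDoc`, `AccessLogEntry`, and `exportForAuditor` are hypothetical, and a real system would persist the log and stream the files.

```typescript
interface AuditDoc {
  documentId: string;
  entityType: string;
  documentType: string;
  uploadedAt: Date;
}

// The document set a read-only auditor is allowed to see.
interface AuditorScope {
  entityTypes: string[];
  documentTypes: string[];
  from: Date;        // inclusive
  toExclusive: Date; // exclusive
}

interface AccessLogEntry {
  documentId: string;
  userId: string;
  action: 'export';
  at: Date;
}

// True when the document falls inside the auditor's scope.
function inScope(doc: AuditDoc, scope: AuditorScope): boolean {
  return (
    scope.entityTypes.includes(doc.entityType) &&
    scope.documentTypes.includes(doc.documentType) &&
    doc.uploadedAt.getTime() >= scope.from.getTime() &&
    doc.uploadedAt.getTime() < scope.toExclusive.getTime()
  );
}

// Select the in-scope documents and record one audit-trail entry per document.
function exportForAuditor(
  docs: AuditDoc[],
  scope: AuditorScope,
  userId: string,
  now: Date,
  log: AccessLogEntry[],
): AuditDoc[] {
  const selected = docs.filter((d) => inScope(d, scope));
  for (const d of selected) {
    log.push({ documentId: d.documentId, userId, action: 'export', at: now });
  }
  return selected;
}
```

Writing the log entry in the same function that selects the documents keeps the trail complete: nothing can be exported without leaving a record of who requested it and when.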