VPC networking is the part of cloud infrastructure that most developers avoid until something breaks — a service can't reach the database, a container can't pull from the registry, or two services in different VPCs need to communicate. I avoided deep VPC knowledge for the first year of managing cloud infrastructure at Commsult Indonesia, relying on default settings and hoping things would work. Then I had to debug a GCP Cloud SQL connection issue that turned out to be a Private Service Access configuration problem — something I'd never heard of. After that, I spent two weeks systematically learning cloud networking. This guide is what I wish I'd had then.
A Virtual Private Cloud (VPC) is a logically isolated network within a cloud provider's infrastructure. Unlike a physical network, a VPC is software-defined: its topology, routing, and firewall rules are all configuration, not hardware. On GCP, a VPC is a global resource: a single network can span all regions, with each regional subnet belonging to the same network. On AWS and DigitalOcean, VPCs are regional, so you create a separate VPC per region. This architectural difference matters: in GCP, a VM in us-central1 can reach a VM in asia-southeast1 over internal IPs within the same VPC without any additional routing or peering configuration (subject to firewall rules).
Subnets divide a VPC's IP address space into smaller blocks, each tied to a region (GCP) or an availability zone (AWS). A 10.0.1.0/24 subnet contains 256 addresses. Classical networking counts 254 of them as usable, but cloud providers reserve a few more per subnet (four on GCP, five on AWS), so plan for roughly 250. Public subnets route to the internet via an internet gateway, so resources in them can receive inbound traffic from the internet if firewall rules allow it. Private subnets have no direct route to an internet gateway: their resources can reach the internet only through a NAT gateway and cannot receive unsolicited inbound connections. The best practice: deploy application servers and databases in private subnets, and place only load balancers and bastion hosts in public subnets.
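The subnet arithmetic can be checked with Python's standard ipaddress module. This is a minimal sketch: the helper name usable_addresses and its reserved parameter are hypothetical, and the per-provider reservation counts (four on GCP, five on AWS) are worth confirming against current provider documentation.

```python
import ipaddress


def usable_addresses(cidr: str, reserved: int = 2) -> int:
    """Addresses left in a subnet after subtracting the reserved ones.

    reserved=2 is the classical count (network + broadcast address).
    Cloud providers typically reserve more per subnet.
    """
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - reserved, 0)


print(usable_addresses("10.0.1.0/24"))     # 254 (classical count)
print(usable_addresses("10.0.1.0/24", 4))  # 252 (assuming 4 reserved, as on GCP)
print(usable_addresses("10.0.1.0/24", 5))  # 251 (assuming 5 reserved, as on AWS)
```

Running the same helper against a /28 or /29 shows how quickly the reserved addresses eat into very small subnets.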
Cloud firewall rules operate on a default-deny model for ingress (inbound) traffic — unless a rule explicitly allows traffic, it's blocked. Egress (outbound) traffic is allowed by default in most configurations. GCP uses ingress/egress firewall rules with priority values (lower priority number = evaluated first). DigitalOcean uses Cloud Firewalls with explicit allow rules. The critical discipline: never create firewall rules that allow all inbound traffic (0.0.0.0/0) except for specific justified exceptions like a public-facing load balancer on ports 80 and 443. Database ports should never be exposed to 0.0.0.0/0.
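To make the evaluation order concrete, here is a toy model of priority-based ingress filtering with an implied default deny. It is a simplified sketch, not how any provider implements its firewall: the IngressRule class, the evaluate_ingress function, and the tie-breaking choice (deny beats allow at equal priority) are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    priority: int  # lower number = higher priority
    action: str    # "allow" or "deny"
    source: str    # source CIDR, e.g. "0.0.0.0/0"
    port: int      # destination TCP port


def evaluate_ingress(rules, src_ip, port):
    """Return "allow" or "deny" for one inbound connection (toy model)."""
    addr = ipaddress.ip_address(src_ip)
    matching = [
        r for r in rules
        if r.port == port and addr in ipaddress.ip_network(r.source)
    ]
    if not matching:
        return "deny"  # no rule matched: implied default deny for ingress
    # Lowest priority number wins; on a tie, deny beats allow (assumption).
    winner = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return winner.action


rules = [
    IngressRule(1000, "allow", "0.0.0.0/0", 443),     # public HTTPS
    IngressRule(900, "deny", "203.0.113.0/24", 443),  # blocked range
]
print(evaluate_ingress(rules, "198.51.100.10", 443))   # allow
print(evaluate_ingress(rules, "203.0.113.7", 443))     # deny
print(evaluate_ingress(rules, "198.51.100.10", 5432))  # deny
```

The last call is the important one: nothing mentions port 5432, so the database port is closed without anyone having written a deny rule for it.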
From my experience: use GCP's network tags for firewall rules instead of individual VM IP addresses. Tag your web server VMs with tag 'web-server' and your database VMs with tag 'database'. Write a firewall rule that allows traffic from tag 'web-server' to tag 'database' on port 5432. When you add new web servers, applying the 'web-server' tag automatically grants them database access — no firewall rule editing required. This scales to dozens of VMs without manual IP management and is far more readable than IP-based rules.
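The reason tags scale better than IP lists can be shown with a small model. This is only a sketch of the matching idea; the VM and TagRule classes and the is_allowed helper are hypothetical names, not part of any GCP API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    tags: frozenset


@dataclass(frozen=True)
class TagRule:
    source_tag: str
    target_tag: str
    port: int


def is_allowed(rule, src, dst, port):
    """True if the rule admits traffic from src to dst on the given port."""
    return (
        port == rule.port
        and rule.source_tag in src.tags
        and rule.target_tag in dst.tags
    )


web_to_db = TagRule("web-server", "database", 5432)
db = VM("db-1", frozenset({"database"}))
web_new = VM("web-7", frozenset({"web-server"}))  # newly added VM
worker = VM("worker-1", frozenset({"batch"}))

print(is_allowed(web_to_db, web_new, db, 5432))  # True
print(is_allowed(web_to_db, worker, db, 5432))   # False
```

The rule never changes when web-7 is added; only the VM's tag set does, which is exactly the property that removes manual IP management.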
Resources in private subnets have no public IP, so they cannot initiate connections to the internet on their own. But they often need to: to download packages, call external APIs, or pull container images. A NAT (Network Address Translation) gateway handles this by translating outbound connections from private resources so that their traffic appears to come from the gateway's public IP. In the classic design the gateway sits in a public subnet. GCP's Cloud NAT is instead a managed, highly available service attached to a Cloud Router, with no VM to manage. On DigitalOcean, the usual approach is a Droplet configured as a NAT gateway (or their managed NAT add-on). Always use a NAT gateway for private subnet resources rather than giving them public IPs.
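Source NAT is easier to picture as a lookup table. The sketch below is a deliberately tiny model (the SourceNat class, its port allocation scheme, and the example addresses are all assumptions for illustration); real NAT gateways also track protocol, destination, and timeouts.

```python
import itertools


class SourceNat:
    """Toy source-NAT table: maps a private (ip, port) to a public port."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._table = {}

    def translate(self, private_ip, private_port):
        """Return the (public_ip, public_port) the outside world sees."""
        key = (private_ip, private_port)
        if key not in self._table:
            # First packet of a new flow: allocate the next free public port.
            self._table[key] = next(self._ports)
        return (self.public_ip, self._table[key])


nat = SourceNat("203.0.113.5")
print(nat.translate("10.0.2.11", 40000))  # ('203.0.113.5', 1024)
print(nat.translate("10.0.2.12", 40000))  # ('203.0.113.5', 1025)
print(nat.translate("10.0.2.11", 40000))  # ('203.0.113.5', 1024)
```

Both private hosts appear externally as 203.0.113.5, and because the table is only populated by outbound flows, there is no entry an unsolicited inbound connection could match.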
┌───────────────────────────────────────────────────────┐
│               VPC Network Architecture                │
├───────────────────────────────────────────────────────┤
│                                                       │
│     Internet                                          │
│        │                                              │
│    ┌───▼──────────────────────────┐                   │
│    │ Public Subnet 10.0.1.0/24    │                   │
│    │ [Load Balancer] [Bastion]    │                   │
│    └───┬──────────────────────────┘                   │
│        │ NAT Gateway                                  │
│    ┌───▼──────────────────────────┐                   │
│    │ Private Subnet 10.0.2.0/24   │                   │
│    │ [App Servers] [Workers]      │                   │
│    └───┬──────────────────────────┘                   │
│        │                                              │
│    ┌───▼──────────────────────────┐                   │
│    │ Private Subnet 10.0.3.0/24   │                   │
│    │ [Database] [Redis]           │                   │
│    └──────────────────────────────┘                   │
└───────────────────────────────────────────────────────┘
When two VPCs need to communicate (for example, a main application VPC and a services VPC, or your VPC and a Google-managed services VPC), you use VPC peering. VPC peering creates a direct, private connection between two VPCs without traffic traversing the public internet. In GCP, Private Service Access is a specific type of VPC peering that connects your VPC to Google's service producer network; it is required for Cloud SQL private IP, Memorystore Redis, and other managed services. Setting up Private Service Access requires enabling the servicenetworking.googleapis.com API, reserving an internal IP range for Google's services in your VPC, and creating the private connection that uses that range.
I spent three hours debugging a failed VPC peering setup between a client VPC (10.0.0.0/16) and our management VPC (10.0.0.0/24). The peering creation succeeded in the console but traffic wouldn't flow — routes weren't being imported. The cause: both VPCs used overlapping IP ranges, and GCP VPC peering requires non-overlapping CIDR ranges. The fix required renumbering one VPC, which meant recreating subnets and migrating VMs. Lesson: plan your IP addressing scheme before provisioning anything. Use non-overlapping ranges across all VPCs you might ever need to peer: 10.0.0.0/16 for VPC A, 10.1.0.0/16 for VPC B, 10.2.0.0/16 for VPC C.
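An overlap like this can be caught before anything is provisioned. The following is a minimal sketch using Python's standard ipaddress module; find_overlaps and the VPC names are hypothetical, and the ranges are the ones from the story above.

```python
import ipaddress
from itertools import combinations


def find_overlaps(vpc_ranges):
    """Return pairs of VPC names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_ranges.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


# The setup that failed: the /24 sits entirely inside the /16.
broken = {"client": "10.0.0.0/16", "management": "10.0.0.0/24"}
# The addressing plan recommended above.
planned = {"vpc-a": "10.0.0.0/16", "vpc-b": "10.1.0.0/16", "vpc-c": "10.2.0.0/16"}

print(find_overlaps(broken))   # [('client', 'management')]
print(find_overlaps(planned))  # []
```

A check like this fits naturally into a pre-provisioning script or a CI step that reads the planned ranges from wherever the addressing scheme is recorded.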
Within a GCP VPC, all VM instances get an internal DNS name based on their instance name and zone: my-instance.asia-southeast1-b.c.my-project.internal. Services can communicate via these DNS names rather than hardcoded IP addresses. For Kubernetes workloads in GKE, Kubernetes DNS (CoreDNS) provides service-level DNS within the cluster. For service mesh scenarios where you need fine-grained traffic control, Istio or GCP's Anthos Service Mesh adds a sidecar proxy model on top of VPC networking. For most applications, internal DNS and VPC-native network tags are sufficient without a full service mesh.
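In application code, the practical consequence is that a connection target can be derived from names rather than stored as an IP. A small sketch, assuming the zonal DNS format shown above; the helper name zonal_dns_name is hypothetical.

```python
def zonal_dns_name(instance: str, zone: str, project: str) -> str:
    """Build the internal zonal DNS name for a VM in the format described above."""
    return f"{instance}.{zone}.c.{project}.internal"


# The name only resolves from inside the VPC, so this sketch just builds it.
host = zonal_dns_name("my-instance", "asia-southeast1-b", "my-project")
print(host)  # my-instance.asia-southeast1-b.c.my-project.internal
```

If the VM is recreated and receives a different internal IP, configuration that uses this hostname keeps working unchanged.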
# Create a custom-mode VPC (no auto-created subnets; ranges are set per subnet below)
gcloud compute networks create my-vpc --subnet-mode=custom --bgp-routing-mode=regional
# Create private subnet for app servers
gcloud compute networks subnets create app-subnet --network=my-vpc --region=asia-southeast1 --range=10.0.2.0/24 --enable-private-ip-google-access
# Firewall rule using network tags (not IP addresses)
gcloud compute firewall-rules create allow-web-to-db --network=my-vpc --direction=INGRESS --action=ALLOW --rules=tcp:5432 --source-tags=web-server --target-tags=database
# Enable Cloud NAT for private subnet outbound internet access
gcloud compute routers create my-router --network=my-vpc --region=asia-southeast1
gcloud compute routers nats create my-nat --router=my-router --region=asia-southeast1 --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges
In a properly configured VPC, database servers and application backends sit in private subnets with no public IP. SSH access for administration must go through a bastion host (jump server): a small VM in the public subnet with a public IP, hardened with fail2ban, SSH restricted to specific source IPs, and MFA enabled. The bastion is the single entry point to the private network. In GCP, the preferred modern alternative is an Identity-Aware Proxy (IAP) tunnel: you SSH to private VMs through Google's IAP without a public IP on the target VM and without a bastion at all. IAP tunnels are authenticated via Google account credentials and IAM, so port 22 never has to be opened to the internet; the only firewall rule needed allows SSH ingress from Google's IAP forwarding range, 35.235.240.0/20.
Before creating any network resources, draw a diagram. Which subnets are public? Which are private? Which services need internet access? Which services need to communicate with each other? Which external services need to reach into your VPC? This 30-minute design step prevents the overlapping CIDR problems, the firewall rule sprawl, and the 'why can't my Cloud Run service reach Cloud SQL' debugging sessions that consume hours. I now use draw.io to maintain a current VPC architecture diagram for every GCP project we manage at Commsult Indonesia — it's the first document I look at when debugging network issues.