The Bug That Wasn't a Bug
It Worked on My Machine
The most dangerous sentence in software engineering. And I said it, out loud, to myself, at 2 AM.
I was building an OAuth login flow with NextAuth and Prisma on Vercel. Google OAuth. Standard stuff. User clicks sign in, Google authenticates, callback fires, account gets linked in the database, session starts. I've done this before. It's not complicated.
Locally, it worked perfectly. Every time. No delays, no errors, no drama. Push to Vercel? The callback endpoint hangs for 10 seconds and dies. User record gets created fine. But the moment NextAuth tries to link the OAuth account — the step where it inserts the access token, refresh token, and ID token into the Account table — nothing. The query never completes.
The Prisma linkAccount() adapter method just... hangs. Indefinitely. Until Vercel's serverless function times out.
The Rabbit Hole
So I did what any reasonable engineer would do. I blamed the ORM.
Maybe it's Prisma's connection pooling. Maybe @prisma/adapter-pg doesn't play well with serverless. Maybe the pg Pool is creating connections that go stale during cold starts. I tried everything:
- Custom connection pool with explicit timeouts — no change
- Raw SQL with $executeRaw instead of prisma.account.create() — still hangs
- Increased pool size from 1 to 10 — no change
- Stripped the adapter down to the bare PrismaAdapter — still hangs
- Wrapped linkAccount() in a 10-second timeout to at least see it fail faster — confirmed the hang
I added logging everywhere. The adapter method gets called. The account data is there. The Prisma query is constructed. And then... silence. The query never reaches PostgreSQL. I checked pg_stat_activity — no hanging connections. I checked the PostgreSQL logs — no INSERT for the Account table ever appears. The query leaves Prisma and enters a void.
Meanwhile, everything else works. User creation? Fast. Profile updates? Fast. Session writes? Fast. API routes that manually write to the Account table? Fast. Only linkAccount() during the OAuth callback hangs. Only on Vercel. Only in production.
I spent hours on this. Claude Code couldn't solve it. Stack Overflow had nothing. I went to bed.
Sleep Matters
I woke up and did something I should have done earlier: I stopped blaming the application layer and started looking at the network.
After sleep, I ran a quick sanity check:
- Connected my local app to the production database — linkAccount works fine from localhost
- Logged into the production app on Vercel — session-based operations work fine
- Updated profile data through the Vercel-hosted API — writes to the database work fine
So Vercel can write to the database. Vercel can write to the Account table. The database can accept connections from Vercel. The only thing that fails is one specific operation, in one specific context, and only when the payload is large enough.
That last part was the key. I just didn't know it yet.
tcpdump Doesn't Lie
I fired up tcpdump on the database server:
```
tcpdump -i ens18 'tcp port 5432' -n
```

And there it was. Vercel connects to PostgreSQL. TCP handshake completes. Initial data exchange works. Then at a certain point, the database sends a SACK — a selective acknowledgment — indicating a missing packet. A 1398-byte packet. And Vercel retransmits. And retransmits. And retransmits. Forever.
The packet never arrives. PostgreSQL never sees the query. The connection just sits there, both sides waiting, until the function times out.
1438 > 1420
Here's my infrastructure setup:
- Vercel serverless function → WireGuard VPN tunnel → Self-hosted PostgreSQL
WireGuard adds 80 bytes of overhead per packet. My external interface (ens18) has a standard MTU of 1500 bytes. The WireGuard tunnel (wg0) has an MTU of 1420 bytes — 1500 minus the 80-byte WireGuard overhead.
The OAuth token INSERT — which includes the full id_token, access_token, and refresh_token from Google — produces a data payload of 1398 bytes. Add 40 bytes for TCP/IP headers, and you get a total packet size of 1438 bytes.
1438 is greater than 1420.
WireGuard silently dropped the packet. No ICMP "fragmentation needed" message. No error. No log entry. Just a packet that entered the tunnel and never came out the other side.
That's it. That's the entire bug. An 18-byte overflow.
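The arithmetic is small enough to sanity-check in a few lines. A minimal sketch, using the numbers from my setup (the 80-byte WireGuard overhead and 40-byte TCP/IP header figures assume IPv4 with no TCP options; yours may differ):

```python
# Packet-size arithmetic for the WireGuard tunnel (numbers from my setup).
PHYSICAL_MTU = 1500        # ens18 interface MTU
WIREGUARD_OVERHEAD = 80    # per-packet WireGuard encapsulation cost
TUNNEL_MTU = PHYSICAL_MTU - WIREGUARD_OVERHEAD   # wg0 MTU: 1420

TCP_IP_HEADERS = 40        # 20-byte IPv4 header + 20-byte TCP header, no options
OAUTH_PAYLOAD = 1398       # bytes of OAuth token data in the INSERT

packet_size = OAUTH_PAYLOAD + TCP_IP_HEADERS     # 1438 bytes on the wire
overflow = packet_size - TUNNEL_MTU              # 18 bytes too big for wg0

print(f"packet: {packet_size} bytes, tunnel MTU: {TUNNEL_MTU}, overflow: {overflow}")
```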
Why Only OAuth?
This is why the bug was so deceptive. Most database operations produce small packets:
| Operation | Approximate Size | Result |
|---|---|---|
| User profile update | ~200-500 bytes | Fits in MTU |
| Session write | ~300-600 bytes | Fits in MTU |
| API key insert | ~400-800 bytes | Fits in MTU |
| OAuth account link | ~1400+ bytes | Exceeds MTU |
OAuth tokens are massive. Google's id_token alone is a JWT that can be 800+ bytes. Add the access_token and refresh_token, and you're well over the WireGuard MTU limit. Every other database operation I tested used small payloads that fit comfortably within 1420 bytes. Only the OAuth flow carried enough data to trigger the silent drop.
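A quick sketch of the same comparison, using the approximate upper-end sizes from the table above (rough estimates, not measured values):

```python
TUNNEL_MTU = 1420          # wg0 MTU
TCP_IP_HEADERS = 40        # IPv4 + TCP, no options

def fits_in_tunnel(payload_bytes: int) -> bool:
    """True if a single TCP segment carrying this payload fits through wg0."""
    return payload_bytes + TCP_IP_HEADERS <= TUNNEL_MTU

# Approximate upper-end payload sizes from the table above (not measured).
assert fits_in_tunnel(500)       # user profile update
assert fits_in_tunnel(600)       # session write
assert fits_in_tunnel(800)       # API key insert
assert not fits_in_tunnel(1398)  # OAuth account link: 1438 > 1420
```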
This is why it "worked on my machine." Locally, there's no WireGuard tunnel. No MTU restriction. Packets go straight to PostgreSQL over the local network with a 1500-byte MTU. The 1438-byte packet fits just fine.
The Fix: TCP MSS Clamping
The solution is two iptables rules that force TCP to negotiate a smaller Maximum Segment Size, keeping all packets within the WireGuard MTU:
```
# Clamp MSS for traffic entering the WireGuard tunnel (to PostgreSQL)
iptables -t mangle -A FORWARD -p tcp --dport 5432 --tcp-flags SYN,RST SYN -o wg0 -j TCPMSS --set-mss 1360

# Clamp MSS for traffic leaving the WireGuard tunnel (from PostgreSQL)
iptables -t mangle -A FORWARD -p tcp --sport 5432 --tcp-flags SYN,RST SYN -o ens18 -j TCPMSS --set-mss 1360
```

(The TCPMSS target only operates on SYN packets, where the MSS is negotiated, so iptables requires the --tcp-flags SYN,RST SYN match.) Why 1360? WireGuard MTU is 1420. TCP/IP headers are 40 bytes. That gives a maximum safe MSS of 1380. I use 1360 for margin — TCP options can add extra bytes, and I'd rather waste 20 bytes per packet than debug this again.
With MSS clamping, TCP knows the maximum segment size before sending. It splits large payloads into smaller chunks that fit within the tunnel. No silent drops. No retransmissions. No hanging queries.
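A back-of-the-envelope sketch of what the clamp buys you, using the 1398-byte OAuth payload and the 1360 MSS from the rules above:

```python
import math

TUNNEL_MTU = 1420      # wg0 MTU
TCP_IP_HEADERS = 40    # IPv4 + TCP, no options
CLAMPED_MSS = 1360     # the value from the iptables rules

# With the clamp, the 1398-byte OAuth payload is split into MSS-sized segments.
payload = 1398
segments = math.ceil(payload / CLAMPED_MSS)                   # 2 segments
largest_packet = min(payload, CLAMPED_MSS) + TCP_IP_HEADERS   # 1400 bytes

assert segments == 2
assert largest_packet <= TUNNEL_MTU   # every packet now fits through wg0
```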
After adding these rules, linkAccount() completed in 240ms on Vercel. First try.
The Lesson
The bug was never in Prisma. Never in NextAuth. Never in Vercel's serverless runtime. Never in PostgreSQL. It was in the 18-byte gap between a packet's actual size and what my WireGuard tunnel could carry.
I spent hours reading NextAuth source code, Prisma adapter internals, and serverless cold start documentation. The answer was in tcpdump. It always is, eventually, when the application layer looks innocent and the symptoms don't make sense.
If your database queries hang in production but work locally, and your traffic passes through any kind of tunnel or VPN — check your MTU. Check your MSS. The most maddening bugs are the ones that aren't bugs at all. They're infrastructure whispering that something doesn't fit, and you have to be quiet enough to hear it.
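One quick way to probe path MTU is ping with the don't-fragment flag set. The ICMP payload you pass to -s is the target packet size minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header) — a small helper for the arithmetic (the -M do flag syntax shown is Linux ping; macOS and BSD differ):

```python
IP_HEADER = 20     # IPv4, no options
ICMP_HEADER = 8

def ping_payload_for(target_packet_size: int) -> int:
    """ICMP payload size that makes ping emit a packet of the given total size."""
    return target_packet_size - IP_HEADER - ICMP_HEADER

# To test whether a 1438-byte packet survives the path unfragmented (Linux):
#   ping -M do -s 1410 <db-host>
print(ping_payload_for(1438))
```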
Sleep helps with that part.
This post originated from a Stack Overflow question I asked (and answered myself): NextAuth PrismaAdapter linkAccount hangs indefinitely on Vercel but works locally