The Bug That Wasn't a Bug
It Worked on My Machine
The most dangerous sentence in software engineering. And I said it, out loud, to myself, at 2 AM.
I was building an OAuth login flow with NextAuth and Prisma on Vercel. Google OAuth. Standard stuff. User clicks sign in, Google authenticates, callback fires, account gets linked in the database, session starts. I've done this before. It's not complicated.
Locally, it worked perfectly. Every time. No delays, no errors, no drama. Push to Vercel? The callback endpoint hangs for 10 seconds and dies. User record gets created fine. But the moment NextAuth tries to link the OAuth account — the step where it inserts the access token, refresh token, and ID token into the Account table — nothing. The query never completes.
The Prisma linkAccount() adapter method just... hangs. Indefinitely. Until Vercel's serverless function times out.
The Rabbit Hole
So I did what any reasonable engineer would do. I blamed the ORM.
Maybe it's Prisma's connection pooling. Maybe @prisma/adapter-pg doesn't play well with serverless. Maybe the pg Pool is creating connections that go stale during cold starts. I tried everything:
- Custom connection pool with explicit timeouts — no change
- Raw SQL with $executeRaw instead of prisma.account.create() — still hangs
- Increased pool size from 1 to 10 — no change
- Stripped the adapter down to the bare PrismaAdapter — still hangs
- Wrapped linkAccount() in a 10-second timeout to at least see it fail faster — confirmed the hang
I added logging everywhere. The adapter method gets called. The account data is there. The Prisma query is constructed. And then... silence. The query never reaches PostgreSQL. I checked pg_stat_activity — no hanging connections. I checked the PostgreSQL logs — no INSERT for the Account table ever appears. The query leaves Prisma and enters a void.
Meanwhile, everything else works. User creation? Fast. Profile updates? Fast. Session writes? Fast. API routes that manually write to the Account table? Fast. Only linkAccount() during the OAuth callback hangs. Only on Vercel. Only in production.
I spent hours on this. Claude Code couldn't solve it. Stack Overflow had nothing. I went to bed.
Sleep Matters
I woke up and did something I should have done earlier: I stopped blaming the application layer and started looking at the network.
After sleep, I ran a quick sanity check:
- Connected my local app to the production database — linkAccount works fine from localhost
- Logged into the production app on Vercel — session-based operations work fine
- Updated profile data through the Vercel-hosted API — writes to the database work fine
So Vercel can write to the database. Vercel can write to the Account table. The database can accept connections from Vercel. The only thing that fails is one specific operation, in one specific context, and only when the payload is large enough.
That last part was the key. I just didn't know it yet.
tcpdump Doesn't Lie
I fired up tcpdump on the database server:
```
tcpdump -i ens18 'tcp port 5432' -n
```

And there it was. Vercel connects to PostgreSQL. TCP handshake completes. Initial data exchange works. Then at a certain point, the database sends a SACK — a selective acknowledgment — indicating a missing packet. A 1398-byte packet. And Vercel retransmits. And retransmits. And retransmits. Forever.
The packet never arrives. PostgreSQL never sees the query. The connection just sits there, both sides waiting, until the function times out.
1438 > 1420
Here's my infrastructure setup:
- Vercel serverless function → WireGuard VPN tunnel → Self-hosted PostgreSQL
WireGuard adds 80 bytes of overhead per packet. My external interface (ens18) has a standard MTU of 1500 bytes. The WireGuard tunnel (wg0) has an MTU of 1420 bytes — 1500 minus the 80-byte WireGuard overhead.
The OAuth token INSERT — which includes the full id_token, access_token, and refresh_token from Google — produces a data payload of 1398 bytes. Add 40 bytes for TCP/IP headers, and you get a total packet size of 1438 bytes.
1438 is greater than 1420.
WireGuard silently dropped the packet. No ICMP "fragmentation needed" message. No error. No log entry. Just a packet that entered the tunnel and never came out the other side.
That's it. That's the entire bug. An 18-byte overflow.
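The arithmetic is small enough to sanity-check in a few lines. A minimal sketch, using the numbers from my setup (the 80-byte WireGuard overhead and 40-byte TCP/IP header figures assume IPv4 with no TCP options; yours may differ):

```python
# Packet-size arithmetic for the WireGuard tunnel (numbers from my setup).
PHYSICAL_MTU = 1500        # ens18 interface MTU
WIREGUARD_OVERHEAD = 80    # per-packet WireGuard encapsulation cost
TUNNEL_MTU = PHYSICAL_MTU - WIREGUARD_OVERHEAD   # wg0 MTU: 1420

TCP_IP_HEADERS = 40        # 20-byte IPv4 header + 20-byte TCP header, no options
OAUTH_PAYLOAD = 1398       # bytes of OAuth token data in the INSERT

packet_size = OAUTH_PAYLOAD + TCP_IP_HEADERS     # 1438 bytes on the wire
overflow = packet_size - TUNNEL_MTU              # 18 bytes too big for wg0

print(f"packet: {packet_size} bytes, tunnel MTU: {TUNNEL_MTU}, overflow: {overflow}")
```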
Why Only OAuth?
This is why the bug was so deceptive. Most database operations produce small packets:
| Operation | Approximate Size | Result |
|---|---|---|
| User profile update | ~200-500 bytes | Fits in MTU |
| Session write | ~300-600 bytes | Fits in MTU |
| API key insert | ~400-800 bytes | Fits in MTU |
| OAuth account link | ~1400+ bytes | Exceeds MTU |
OAuth tokens are massive. Google's id_token alone is a JWT that can be 800+ bytes. Add the access_token and refresh_token, and you're well over the WireGuard MTU limit. Every other database operation I tested used small payloads that fit comfortably within 1420 bytes. Only the OAuth flow carried enough data to trigger the silent drop.
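A quick sketch of the same comparison, using the approximate upper-end sizes from the table above (rough estimates, not measured values):

```python
TUNNEL_MTU = 1420          # wg0 MTU
TCP_IP_HEADERS = 40        # IPv4 + TCP, no options

def fits_in_tunnel(payload_bytes: int) -> bool:
    """True if a single TCP segment carrying this payload fits through wg0."""
    return payload_bytes + TCP_IP_HEADERS <= TUNNEL_MTU

# Approximate upper-end payload sizes from the table above (not measured).
assert fits_in_tunnel(500)       # user profile update
assert fits_in_tunnel(600)       # session write
assert fits_in_tunnel(800)       # API key insert
assert not fits_in_tunnel(1398)  # OAuth account link: 1438 > 1420
```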
This is why it "worked on my machine." Locally, there's no WireGuard tunnel. No MTU restriction. Packets go straight to PostgreSQL over the local network with a 1500-byte MTU. The 1438-byte packet fits just fine.
The Fix: TCP MSS Clamping
The solution is two iptables rules that force TCP to negotiate a smaller Maximum Segment Size, keeping all packets within the WireGuard MTU:
```
# Clamp MSS for traffic entering the WireGuard tunnel (to PostgreSQL)
iptables -t mangle -A FORWARD -p tcp --dport 5432 --tcp-flags SYN,RST SYN -o wg0 -j TCPMSS --set-mss 1360

# Clamp MSS for traffic leaving the WireGuard tunnel (from PostgreSQL)
iptables -t mangle -A FORWARD -p tcp --sport 5432 --tcp-flags SYN,RST SYN -o ens18 -j TCPMSS --set-mss 1360
```

(The TCPMSS target only operates on SYN packets, where the MSS is negotiated, so iptables requires the --tcp-flags SYN,RST SYN match.) Why 1360? WireGuard MTU is 1420. TCP/IP headers are 40 bytes. That gives a maximum safe MSS of 1380. I use 1360 for margin — TCP options can add extra bytes, and I'd rather waste 20 bytes per packet than debug this again.
With MSS clamping, TCP knows the maximum segment size before sending. It splits large payloads into smaller chunks that fit within the tunnel. No silent drops. No retransmissions. No hanging queries.
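A back-of-the-envelope sketch of what the clamp buys you, using the 1398-byte OAuth payload and the 1360 MSS from the rules above:

```python
import math

TUNNEL_MTU = 1420      # wg0 MTU
TCP_IP_HEADERS = 40    # IPv4 + TCP, no options
CLAMPED_MSS = 1360     # the value from the iptables rules

# With the clamp, the 1398-byte OAuth payload is split into MSS-sized segments.
payload = 1398
segments = math.ceil(payload / CLAMPED_MSS)                   # 2 segments
largest_packet = min(payload, CLAMPED_MSS) + TCP_IP_HEADERS   # 1400 bytes

assert segments == 2
assert largest_packet <= TUNNEL_MTU   # every packet now fits through wg0
```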
After adding these rules, linkAccount() completed in 240ms on Vercel. First try.
The Lesson
The bug was never in Prisma. Never in NextAuth. Never in Vercel's serverless runtime. Never in PostgreSQL. It was in the 18-byte gap between a packet's actual size and what my WireGuard tunnel could carry.
I spent hours reading NextAuth source code, Prisma adapter internals, and serverless cold start documentation. The answer was in tcpdump. It always is, eventually, when the application layer looks innocent and the symptoms don't make sense.
If your database queries hang in production but work locally, and your traffic passes through any kind of tunnel or VPN — check your MTU. Check your MSS. The most maddening bugs are the ones that aren't bugs at all. They're infrastructure whispering that something doesn't fit, and you have to be quiet enough to hear it.
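One quick way to probe path MTU is ping with the don't-fragment flag set. The ICMP payload you pass to -s is the target packet size minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header) — a small helper for the arithmetic (the -M do flag syntax shown is Linux ping; macOS and BSD differ):

```python
IP_HEADER = 20     # IPv4, no options
ICMP_HEADER = 8

def ping_payload_for(target_packet_size: int) -> int:
    """ICMP payload size that makes ping emit a packet of the given total size."""
    return target_packet_size - IP_HEADER - ICMP_HEADER

# To test whether a 1438-byte packet survives the path unfragmented (Linux):
#   ping -M do -s 1410 <db-host>
print(ping_payload_for(1438))
```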
Sleep helps with that part.
This post originated from a Stack Overflow question I asked (and answered myself): NextAuth PrismaAdapter linkAccount hangs indefinitely on Vercel but works locally