Possible HTTP response read bug in Nginx 1.0.12

Hi,

I have an nginx server setup that terminates SSL connections from
clients and forwards plain HTTP requests to backend servers. Some of
these requests can last pretty long (up to 1 minute) and usually get a
very short 15-byte reply. (Ajax polling)
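
For reference, here is a minimal sketch of the kind of configuration in
use; the certificate paths and timeout value are hypothetical examples,
while the server name and backend address are taken from the error log
below.

    # SSL termination in front, plain HTTP to the backend behind.
    server {
        listen 443 ssl;
        server_name secure.example.com;

        ssl_certificate     /etc/nginx/ssl/secure.example.com.crt;
        ssl_certificate_key /etc/nginx/ssl/secure.example.com.key;

        location / {
            proxy_pass http://127.0.0.1:8082;
            # Polling requests can be held up to a minute, so the
            # read timeout must cover that window.
            proxy_read_timeout 75s;
        }
    }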

However, we often see errors like the following, which indicate that
the backend server disconnected the socket while writing the reply.

2012/03/03 06:46:38 [error] 13394#0: *2100 upstream prematurely closed
connection while reading response header from upstream, client:
XX.XX.XX.XX, server: secure.example.com, request: "GET
/url/example?par=12345 HTTP/1.1", upstream:
"http://127.0.0.1:8082/url/example?par=12345", host:
"secure.example.com", referrer: "https://secure.example.com/"

Strangely, I can see the corresponding 200 OK responses in our server
logs at the same timestamp, so at least the backend server is trying to
respond correctly.
We then captured the traffic between nginx and the backend server with
Wireshark and noticed that when this happens, the response from the
server is actually sent in two TCP packets.

The first packet has the headers and the second packet has the small
15-byte HTTP body. I suspect that nginx fails to read the second packet
under some conditions and thinks the backend server disconnected
prematurely.

Any insights? Thanks!

Hello!

On Tue, Apr 03, 2012 at 07:07:04AM -0400, tikonen wrote:

> 2012/03/03 06:46:38 [error] 13394#0: *2100 upstream prematurely closed
> [...]
> We then captured the traffic between nginx and the backend server with
> Wireshark and noticed that when this happens, the response from the
> server is actually sent in two TCP packets.
>
> The first packet has the headers and the second packet has the small
> 15-byte HTTP body. I suspect that nginx fails to read the second
> packet under some conditions and thinks the backend server
> disconnected prematurely.
>
> Any insights? Thanks!

Please try 1.1.x to see if this is already resolved. If not, please
provide a traffic capture and the corresponding debug log; see
Debugging | NGINX
Maxim D.

Hi,

Thanks for the advice, I’ll try 1.1.x next and see if that helps.

In case 1.1.x does not solve the problem: the server load is pretty
high and only a minor percentage of requests hit the error, so the
debug log and capture dump would grow huge before catching even one
case. (Still, these errors are frequent enough to cause ip_hash to flag
the node as failed and cause havoc in the cluster.)
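
For context, whether ip_hash flags a node as failed is governed by the
max_fails and fail_timeout parameters of each server entry; a
hypothetical upstream block (the addresses and values are examples, not
our actual configuration):

    # With the defaults (max_fails=1, fail_timeout=10s) a single
    # "prematurely closed" error is enough to mark a node as failed
    # for 10 seconds; raising max_fails makes the flagging less eager.
    upstream backend {
        ip_hash;
        server 10.0.0.1:8082 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:8082 max_fails=3 fail_timeout=30s;
    }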

To narrow down the data needed, can you recommend proper debugging
settings? Should I enable debugging at the server or global level?
Also, should I make the debug log against 1.1.x or 1.0.12?

Thanks, Teemu

Maxim D. Wrote:

> Please try 1.1.x to see if this is already resolved. If not, please
> provide a traffic capture and the corresponding debug log; see
> Debugging | NGINX

I can confirm that the problem also exists with nginx 1.1.18.

Running now on nginx 1.2.0, and I haven’t seen this error anymore in
normal production.

Now I see these errors only when non-gracefully restarting upstream
nodes, but of course in that case these “prematurely closed” errors are
expected and normal.
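
As a sketch (an assumption on my side, not something we have verified):
the impact of such restarts can be softened by letting nginx retry the
request on another node via proxy_next_upstream:

    location / {
        proxy_pass http://backend;
        # On a connection error or timeout, transparently retry the
        # request on the next server of the upstream group. "error
        # timeout" is also the built-in default.
        proxy_next_upstream error timeout;
    }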

Hello!

On Tue, Apr 03, 2012 at 09:06:12PM -0400, tikonen wrote:

> To narrow down the data needed, can you recommend proper debugging
> settings? Should I enable debugging at the server or global level?

It’s better to have debugging activated at the global level, to be
able to see events reported by the kernel.
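
For example, a sketch of a global debug log (the path is a hypothetical
example; the binary must be built with --with-debug for the debug level
to take effect):

    # In the main context of nginx.conf, outside any http/server block:
    error_log /var/log/nginx/error_debug.log debug;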

> Also, should I make the debug log against 1.1.x or 1.0.12?

1.1.x would be better.

Maxim D.