File download results in "110: Connection timed out" on nginx; Download continues until Linux Kernel

Hello everyone,

when trying to download a static file using curl with a limit rate of
4000 (to simulate slow consumers) the download aborts after a certain
time. A workaround to make that work is to increase the send_timeout
to 3600. Nevertheless I would like to understand the behaviour and
hope that you can help:

  • File download starts with via curl with --limit-rate 4000.
  • After exactly 60s connection on nginx host goes from ESTABLISHED to
    FIN_WAIT_1 (verified with ‘ss’). Having nginx on debug gives the
    following:

2016/01/29 17:32:19 [debug] 9037#0: *1 http run request: “/8m?”
2016/01/29 17:32:19 [debug] 9037#0: *1 http writer handler: “/8m?”
2016/01/29 17:32:19 [debug] 9037#0: *1 http output filter “/8m?”
2016/01/29 17:32:19 [debug] 9037#0: *1 http copy filter: “/8m?”
2016/01/29 17:32:19 [debug] 9037#0: *1 image filter
2016/01/29 17:32:19 [debug] 9037#0: *1 xslt filter body
2016/01/29 17:32:19 [debug] 9037#0: *1 http postpone filter “/8m?”
0000000000000000
2016/01/29 17:32:19 [debug] 9037#0: *1 write old buf t:0 f:1
0000000000000000, pos 0000000000000000, size: 0 file: 624692, size:
7763916
2016/01/29 17:32:19 [debug] 9037#0: *1 http write filter: l:1 f:0
s:7763916
2016/01/29 17:32:19 [debug] 9037#0: *1 http write filter limit 0
2016/01/29 17:32:19 [debug] 9037#0: *1 sendfile: @624692 7763916
2016/01/29 17:32:19 [debug] 9037#0: *1 sendfile: 777600, @624692
777600:7763916
2016/01/29 17:32:19 [debug] 9037#0: *1 http write filter
00000000024CA578
2016/01/29 17:32:19 [debug] 9037#0: *1 http copy filter: -2 “/8m?”
2016/01/29 17:32:19 [debug] 9037#0: *1 http writer output filter: -2,
“/8m?”
2016/01/29 17:32:19 [debug] 9037#0: *1 event timer: 12, old:
1454088799035, new: 1454088799119
2016/01/29 17:33:06 [debug] 9037#0: *1 post event 00000000024FFEF8
2016/01/29 17:33:06 [debug] 9037#0: *1 delete posted event
00000000024FFEF8
2016/01/29 17:33:06 [debug] 9037#0: *1 http run request: “/8m?”
2016/01/29 17:33:06 [debug] 9037#0: *1 http writer handler: “/8m?”
2016/01/29 17:33:06 [debug] 9037#0: *1 http output filter “/8m?”
2016/01/29 17:33:06 [debug] 9037#0: *1 http copy filter: “/8m?”
2016/01/29 17:33:06 [debug] 9037#0: *1 image filter
2016/01/29 17:33:06 [debug] 9037#0: *1 xslt filter body
2016/01/29 17:33:06 [debug] 9037#0: *1 http postpone filter “/8m?”
0000000000000000
2016/01/29 17:33:06 [debug] 9037#0: *1 write old buf t:0 f:1
0000000000000000, pos 0000000000000000, size: 0 file: 1402292, size:
6986316
2016/01/29 17:33:06 [debug] 9037#0: *1 http write filter: l:1 f:0
s:6986316
2016/01/29 17:33:06 [debug] 9037#0: *1 http write filter limit 0
2016/01/29 17:33:06 [debug] 9037#0: *1 sendfile: @1402292 6986316
2016/01/29 17:33:06 [debug] 9037#0: *1 sendfile: 518400, @1402292
518400:6986316
2016/01/29 17:33:06 [debug] 9037#0: *1 http write filter
00000000024CA578
2016/01/29 17:33:06 [debug] 9037#0: *1 http copy filter: -2 “/8m?”
2016/01/29 17:33:06 [debug] 9037#0: *1 http writer output filter: -2,
“/8m?”
2016/01/29 17:33:06 [debug] 9037#0: *1 event timer del: 12:
1454088799035
2016/01/29 17:33:06 [debug] 9037#0: *1 event timer add: 12:
60000:1454088846743
2016/01/29 17:34:06 [debug] 9037#0: *1 event timer del: 12:
1454088846743
2016/01/29 17:34:06 [debug] 9037#0: *1 http run request: “/8m?”
2016/01/29 17:34:06 [debug] 9037#0: *1 http writer handler: “/8m?”
2016/01/29 17:34:06 [info] 9037#0: *1 client timed out (110:
Connection timed out) while sending response to client, client:
123.123.123.123, server: localhost, request: “GET /8m HTTP/1.1”, host:
“xxx.xxx.xxx”
2016/01/29 17:34:06 [debug] 9037#0: *1 http finalize request: 408,
“/8m?” a:1, c:1
2016/01/29 17:34:06 [debug] 9037#0: *1 http terminate request count:1
2016/01/29 17:34:06 [debug] 9037#0: *1 http terminate cleanup count:1
blk:0
2016/01/29 17:34:06 [debug] 9037#0: *1 http posted request: “/8m?”
2016/01/29 17:34:06 [debug] 9037#0: *1 http terminate handler count:1
2016/01/29 17:34:06 [debug] 9037#0: *1 http request count:1 blk:0
2016/01/29 17:34:06 [debug] 9037#0: *1 http close request
2016/01/29 17:34:06 [debug] 9037#0: *1 http log handler
2016/01/29 17:34:06 [debug] 9037#0: *1 run cleanup: 00000000024D4AD0
2016/01/29 17:34:06 [debug] 9037#0: *1 file cleanup: fd:13
2016/01/29 17:34:06 [debug] 9037#0: *1 free: 00000000024D3B40, unused: 0
2016/01/29 17:34:06 [debug] 9037#0: *1 free: 00000000024CA250, unused:
3145
2016/01/29 17:34:06 [debug] 9037#0: *1 close http connection: 12
2016/01/29 17:34:06 [debug] 9037#0: *1 reusable connection: 0
2016/01/29 17:34:06 [debug] 9037#0: *1 free: 00000000024D3730
2016/01/29 17:34:06 [debug] 9037#0: *1 free: 00000000024C5780, unused: 0
2016/01/29 17:34:06 [debug] 9037#0: *1 free: 00000000024EA910, unused:
128

  • Taking a look at the tcpdump file taken on the client side during
    the download NO FIN has been received.
  • Client keeps download and nginx keeps providing data. The tcp
    connection on client side remains in ESTABLISHED state.
  • After a while: the download still works, the connection on NGINX
    host goes from FIN_WAIT_1 to FIN_WAIT_2. After that client side goes
    to CLOSE_WAIT.
  • Waiting a few minutes and the connection is closed (it seems to me
    that this matches with tcp_fin_timeout = 180) and the download is
    stopped with an error only on client side. nginx doesn’t print
    anything in the logs.

Note: This only produced the problem when accessing HTTP via the
internet (in a local network everything worked fine).

Thanks a lot for your help in advance!
Markus

To reproduce the problem one can do the following (on a linux system):

  • Create a file about 8m big e.g.

    dd if=/dev/random of=/tmp/abc bs=1024 count=0 seek=$[1024*8]

and make it available for download via

  • curl -# --limit-rate 4000 --verbose -O --trace-ascii trace.out
    –trace-time http://PUBLIC-IP/8m

  • monitor connections with

    watch -n 2 " ss ‘( dport = :http or sport = :http )’ "

as well as the error log


nginx -V

nginx version: nginx/1.4.6 (Ubuntu)
built by gcc 4.8.2 (Ubuntu 4.8.2-16ubuntu6)
TLS SNI support enabled
configure arguments: --with-cc-opt=‘-g -O2 -fstack-protector
–param=ssp-buffer-size=4 -Wformat -Werror=format-security
-D_FORTIFY_SOURCE=2’ --with-ld-opt=‘-Wl,-Bsymbolic-functions
-Wl,-z,relro’ --prefix=/usr/share/nginx
–conf-path=/etc/nginx/nginx.conf
–http-log-path=/var/log/nginx/access.log
–error-log-path=/var/log/nginx/error.log
–lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid
–http-client-body-temp-path=/var/lib/nginx/body
–http-fastcgi-temp-path=/var/lib/nginx/fastcgi
–http-proxy-temp-path=/var/lib/nginx/proxy
–http-scgi-temp-path=/var/lib/nginx/scgi
–http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug
–with-pcre-jit --with-ipv6 --with-http_ssl_module
–with-http_stub_status_module --with-http_realip_module
–with-http_addition_module --with-http_dav_module
–with-http_geoip_module --with-http_gzip_static_module
–with-http_image_filter_module --with-http_spdy_module
–with-http_sub_module --with-http_xslt_module --with-mail
–with-mail_ssl_module

uname -a

Linux 7e355bbce7a7 4.2.0-17-generic #21-Ubuntu SMP Fri Oct 23 19:56:16
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

This looks normal. Definition of ‘send_timeout’ is in:

http://nginx.org/en/docs/http/ngx_http_core_module.html#send_timeout

“Sets a timeout for transmitting a response to the client. The timeout
is set only between two successive write operations, not for the
transmission of the whole response. If the client does not receive
anything within this time, the connection is closed.”

Here the tricky part is “write operations” phrase. In your config (and
from your logs), I see that you are using “sendfile” which is a single
write operation for static files from nginx perspective. With
4000/sec, it’ll take 34 minutes for the download (8MB) to complete, so
send timeout fires.

Cheers,
Tolga

Thanks a lot for the clarification. I did read the docs but the part
with
the sendfile and no successive writes was what i did not connect. I
thought
that because of “not for the
transmission of the whole response” it didn’t apply in that case.

I tried to fix that by adding ‘sendfile_max_chunk 32k’ but this didn’t
help.
To me when doing that it seems to be weird that many timers are added
and
deleted when starting the transfer

2016/01/30 12:54:11 [debug] 9259#0: *1 event timer del: 12:
1454158511891
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer add: 12:
1:1454158451892
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer del: 12:
1454158451892
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer add: 12:
1:1454158451894
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer del: 12:
1454158451894
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer add: 12:
120000:1454158571894
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer del: 12:
1454158571894
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer add: 12:
1:1454158451922
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer del: 12:
1454158451922
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer add: 12:
1:1454158451923
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer del: 12:
1454158451923
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer add: 12:
1:1454158451924
2016/01/30 12:54:11 [debug] 9259#0: *1 event timer del: 12:
1454158451924
… (many more follow)

So the only way to fix that for the moment to me is to set ‘sendfile
off’
and also increase the ‘send_timeout’ to a more reasonable value in that
case. Again thanks a lot for the clarification!

Cheers,
Markus

Posted at Nginx Forum: