Workers CPU leak

Hello, I have a strange issue with nginx workers. Some time after starting
nginx, I notice that some worker processes cause high CPU load (mostly sys
CPU).

First, I captured syscall traces from one such process:

futex(0x157d914, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x157d910, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
epoll_wait(38, {{EPOLLIN, {u32=7156096, u64=7156096}}}, 512, -1) = 1
epoll_ctl(38, EPOLL_CTL_ADD, 178, {EPOLLOUT|EPOLLET, {u32=3888102096, u64=140028411886288}}) = 0
epoll_wait(38, {{EPOLLOUT, {u32=3888102096, u64=140028411886288}}}, 512, -1) = 1
epoll_ctl(38, EPOLL_CTL_DEL, 178, 7ffda2bc7f30) = 0
futex(0x157d914, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x157d910, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
epoll_wait(38, {{EPOLLIN, {u32=7156096, u64=7156096}}}, 512, -1) = 1
epoll_ctl(38, EPOLL_CTL_ADD, 178, {EPOLLOUT|EPOLLET, {u32=3888102096, u64=140028411886288}}) = 0
epoll_wait(38, {{EPOLLOUT, {u32=3888102096, u64=140028411886288}}}, 512, -1) = 1
epoll_ctl(38, EPOLL_CTL_DEL, 178, 7ffda2bc7f30) = 0
futex(0x157d914, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x157d910, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x157d8d0, FUTEX_WAKE_PRIVATE, 1) = 1

The epoll_wait, epoll_ctl, and futex calls repeat in a tight loop.

Then I ran lsof on the process to see what file descriptor 38 refers to:

nginx 18862 www 38u a_inode 0,9 0 6 [eventpoll]

I also see several CLOSE_WAIT sockets:

nginx 18862 www 101u IPv4 85643376 0t0 TCP 154.59.82.194:http->105.107.179.210:24519 (CLOSE_WAIT)
nginx 18862 www 133r REG 8,3 0 4743 /mnt/ssd1/wwwroot/71/7/27394667.mp4 (deleted)
nginx 18862 www 178u IPv4 86054929 0t0 TCP 154.59.82.194:http->5adc98ed.bb.sky.com:45665 (CLOSE_WAIT)
nginx 18862 www 179r REG 8,3 0 5098 /mnt/ssd1/wwwroot/21/9/29603499.mp4 (deleted)

The nginx version and build configuration are:

nginx version: nginx/1.9.11
built with OpenSSL 1.0.2f 28 Jan 2016
TLS SNI support enabled
configure arguments: --prefix=/usr --conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error_log --pid-path=/run/nginx.pid
--lock-path=/run/lock/nginx.lock --with-cc-opt=-I/usr/include
--with-ld-opt=-L/usr/lib64 --http-log-path=/var/log/nginx/access_log
--http-client-body-temp-path=/var/lib/nginx/tmp/client
--http-proxy-temp-path=/var/lib/nginx/tmp/proxy
--http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi
--http-scgi-temp-path=/var/lib/nginx/tmp/scgi
--http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --with-file-aio
--with-ipv6 --with-pcre --with-threads --without-http_autoindex_module
--without-http_fastcgi_module --without-http_geo_module
--without-http_limit_req_module --without-http_limit_conn_module
--without-http_memcached_module --without-http_uwsgi_module
--with-http_flv_module --with-http_gzip_static_module
--with-http_mp4_module --with-http_perl_module
--add-module=external_module/headers-more-nginx-module-0.261
--add-module=external_module/ngx_estreaming_module-0.01
--add-module=external_module/ngx_slice_module-0.01
--with-http_ssl_module --without-mail_imap_module
--without-mail_pop3_module --without-mail_smtp_module
--user='www --group=www'

It is used for video streaming.

Has anyone encountered such behavior? Please help.

Posted at Nginx Forum:

P.S.: we are running Gentoo with a 4.4.1 kernel on an Intel X3330 @ 2.66GHz
CPU (GenuineIntel, GNU/Linux).

Posted at Nginx Forum:

Sorry for the late reply, but we have been doing some tests and noticed that
the problem appears when thread_pool is enabled.

thread_pool default threads=128 max_queue=1024

We need to use thread_pool and unfortunately can't permanently disable it.

Posted at Nginx Forum:

On Friday 04 March 2016 07:12:09 vizl wrote:

Sorry for the late reply, but we have been doing some tests and noticed that
the problem appears when thread_pool is enabled.

thread_pool default threads=128 max_queue=1024

We need to use thread_pool and unfortunately can't permanently disable it.

[…]

Do you modify files that are served by nginx?
Do you have open_file_cache enabled?

wbr, Valentin V. Bartenev

On Wednesday 24 February 2016 09:17:01 vizl wrote:

u64=140028411886288}}) = 0
epoll_ctl(38, EPOLL_CTL_DEL, 178, 7ffda2bc7f30) = 0

--http-client-body-temp-path=/var/lib/nginx/tmp/client
--add-module=external_module/headers-more-nginx-module-0.261
--add-module=external_module/ngx_estreaming_module-0.01
--add-module=external_module/ngx_slice_module-0.01 --with-http_ssl_module
--without-mail_imap_module --without-mail_pop3_module
--without-mail_smtp_module --user='www --group=www'

It is used for video streaming.

Has anyone encountered such behavior? Please help.

[…]

Could you provide a minimal configuration that causes the problem, together
with a debug log? See: A debugging log

wbr, Valentin V. Bartenev

On 3/4/16 3:20 PM, Valentin V. Bartenev wrote:

Do you modify files that are served by nginx?
Do you have open_file_cache enabled?

Can we ask you again about the nginx configuration?


Maxim K.

user www;
worker_processes 16;
thread_pool default threads=128 max_queue=1024;
worker_rlimit_nofile 65536;
###timer_resolution 100ms;

#error_log /home/logs/error_log.nginx error;
error_log /home/logs/error_log.nginx.debug debug;

events {
    worker_connections 30000;
    use epoll;
}

http {
    include mime.types;
    default_type application/octet-stream;
    index index.html index.htm;

    output_buffers 2 256k;
    read_ahead 256k; # was 1m;
    aio threads=default;
    aio on;
    sendfile on;
    sendfile_max_chunk 256k;

    server {
        listen *:80 default rcvbuf=32768 backlog=2048 reuseport deferred;
        listen *:443 ssl default rcvbuf=32768 backlog=2048 reuseport deferred;
        server_name localhost;
        access_log /home/logs/access.log;
        error_log /home/logs/error.log warn;
        root /mnt;
        expires 20m;

        location ~ ^/crossdomain.xml { }
        location ~ \.[Ff][Ll][Vv]$ {
            flv;
        }
        location ~ \.[Mm][Pp]4$ {
            mp4;
        }
    }
}

Posted at Nginx Forum:

Sorry, my mistake.

The config is without "aio on;",

only "aio threads=default;".

do you or some tool periodically change the files?
No, the files are unchanged; periodically some new ones are added and some
expired ones are deleted.

Posted at Nginx Forum:

On Friday 04 March 2016 08:25:43 vizl wrote:

worker_connections 30000;
aio threads=default;
aio on;

This is an invalid configuration due to the duplicated "aio" directive.
I assume it's not the configuration with which your nginx is
currently running.

expires 20m;

The question remains the same: do you or some tool periodically
change the files?

wbr, Valentin V. Bartenev

On Friday 04 March 2016 09:42:45 vizl wrote:

Sorry, my mistake.

The config is without "aio on;",

only "aio threads=default;".

do you or some tool periodically change the files?
No, the files are unchanged; periodically some new ones are added and some
expired ones are deleted.

Could you provide the debug log?

wbr, Valentin V. Bartenev

Debug log for the hung PID 7479:
http://dev.vizl.org/debug.log.txt

Posted at Nginx Forum:

Hello!

On Wed, Mar 09, 2016 at 09:28:20AM -0500, vizl wrote:

Debug log for the hung PID 7479: http://dev.vizl.org/debug.log.txt

This looks like a threads + sendfile() loop due to non-atomic updates of
the underlying file, similar to the one recently reported on the Russian
mailing list.

The correct solution would be to fix your system to update files
atomically instead of overwriting them in place.
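
For illustration, here is a minimal sketch (editorial, with hypothetical
paths, not part of the original message) of what an atomic update can look
like: the new content is written to a temporary file on the same filesystem
and then rename()d over the old name, so a reader such as nginx only ever
sees the old file or the complete new one.

/*
 * Hypothetical sketch of an atomic file update: write a temporary file
 * on the same filesystem, then rename() it over the destination.
 * rename(2) replaces the target atomically, so the served file is never
 * observed in a partially written or truncated state.
 */
#include <stdio.h>

int
main(void)
{
    const char  *tmp = "/mnt/ssd1/wwwroot/21/9/29603499.mp4.tmp";  /* hypothetical temp name */
    const char  *dst = "/mnt/ssd1/wwwroot/21/9/29603499.mp4";

    FILE *f = fopen(tmp, "wb");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }

    /* ... write the complete new content into tmp here ... */

    if (fclose(f) != 0) {
        perror("fclose");
        return 1;
    }

    if (rename(tmp, dst) != 0) {   /* atomic replacement of dst */
        perror("rename");
        return 1;
    }

    return 0;
}

The same idea applies at the shell level: mv within one filesystem is a
rename() and therefore atomic, whereas truncating or rewriting the served
file in place is not.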

The patch below will resolve the CPU hog and will log an alert
instead:

# HG changeset patch
# User Maxim D. [email protected]
# Date 1457536139 -10800
#      Wed Mar 09 18:08:59 2016 +0300
# Node ID e96e5dfe4ff8ffe301264c3eb2771596fae24d38
# Parent  93049710cb7f6ea91fa9bd707e88fbe79d82d0ef
Truncation detection in sendfile() on Linux.

This addresses connection hangs as observed in ticket #504, and
CPU hogs with "aio threads; sendfile on" as reported in the mailing list,
see http://mailman.nginx.org/pipermail/nginx-ru/2016-March/057638.html.

The alert is identical to one used on FreeBSD.

diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
--- a/src/os/unix/ngx_linux_sendfile_chain.c
+++ b/src/os/unix/ngx_linux_sendfile_chain.c
@@ -292,6 +292,19 @@ eintr:
         }
     }
 
+    if (n == 0) {
+        /*
+         * if sendfile returns zero, then someone has truncated the file,
+         * so the offset became beyond the end of the file
+         */
+
+        ngx_log_error(NGX_LOG_ALERT, c->log, 0,
+                      "sendfile() reported that \"%s\" was truncated at %O",
+                      file->file->name.data, file->file_pos);
+
+        return NGX_ERROR;
+    }
+
     ngx_log_debug3(NGX_LOG_DEBUG_EVENT, c->log, 0, "sendfile: %z of %uz @%O",
                    n, size, file->file_pos);
 
@@ -349,6 +362,19 @@ ngx_linux_sendfile_thread(ngx_connection
             return NGX_ERROR;
         }
 
+        if (ctx->err != NGX_AGAIN && ctx->sent == 0) {
+            /*
+             * if sendfile returns zero, then someone has truncated the file,
+             * so the offset became beyond the end of the file
+             */
+
+            ngx_log_error(NGX_LOG_ALERT, c->log, 0,
+                          "sendfile() reported that \"%s\" was truncated at %O",
+                          file->file->name.data, file->file_pos);
+
+            return NGX_ERROR;
+        }
+
         *sent = ctx->sent;
 
         return (ctx->sent == ctx->size) ? NGX_DONE : NGX_AGAIN;
    


Maxim D.
http://nginx.org/

Thank you.
We don't make any changes to files or overwrite them while sendfile is
processing them.
We only create a temp file and then mv it.

Maybe it is the same bug related to threads aio as in this message,
https://forum.nginx.org/read.php?21,264701,265016#msg-265016
and it will be fixed in the future?

Posted at Nginx Forum:

Hello!

On Wed, Mar 09, 2016 at 10:56:40AM -0500, vizl wrote:

Thank you.
We don't make any changes to files or overwrite them while sendfile is
processing them.
We only create a temp file and then mv it.

The debug log suggests this is not true.

Maybe it is the same bug related to threads aio as in this message,
https://forum.nginx.org/read.php?21,264701,265016#msg-265016
and it will be fixed in the future?

Yes, and the patch provided links to the very same thread and
resolves the problem in nginx, i.e., the CPU hog. With the patch an
alert will be correctly logged.

Note well that the root cause of the observed CPU hog is non-atomic
file updates. You will still see other problems (including data
corruption) until this is resolved, even with the patch.


Maxim D.
http://nginx.org/

We found that we are running 'truncate -s 0' on files before removing
them.
Can it potentially cause the problems mentioned above?

Posted at Nginx Forum:

On 3/20/16 11:34 PM, vizl wrote:

We found that we are running 'truncate -s 0' on files before removing them.
Can it potentially cause the problems mentioned above?

Yes, quite possible.

The fix was committed to the mainline branch and will be available
in 1.9.13:

http://mailman.nginx.org/pipermail/nginx-devel/2016-March/008012.html


Maxim K.

One thing: when we disable threads in the config and leave only "sendfile
on;", the problem goes away.
So if the problem is caused by sendfile and non-atomic file updates, why
does it not appear when threads are off?

Posted at Nginx Forum:

Hello!

On Sun, Mar 20, 2016 at 04:34:14PM -0400, vizl wrote:

We found that we are running 'truncate -s 0' on files before removing them.
Can it potentially cause the problems mentioned above?

Yes, for sure. This is a modification of a file being served, and
it's expected to cause the CPU hog in question when using sendfile
in threads on Linux. In nginx 1.9.13 an alert will be logged
instead; see this commit:

http://hg.nginx.org/nginx/rev/4df3d9fcdee8
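
For illustration, a minimal standalone sketch (editorial, hypothetical
paths, not nginx code and not part of the original message) of the behavior
the patch detects: once the source file has been truncated so that the
requested offset lies past its new end, sendfile() returns 0 with no error,
which is what the unpatched worker kept retrying on.

/*
 * Hypothetical demonstration: sendfile(2) returns 0 when the in-file has
 * been truncated and the requested offset is now beyond the end of the
 * file. A caller that treats 0 as "not done yet, retry" will spin, which
 * matches the observed CPU hog.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <unistd.h>

int
main(void)
{
    int in = open("/tmp/sendfile-demo.in", O_RDWR | O_CREAT | O_TRUNC, 0644);
    int out = open("/tmp/sendfile-demo.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (in < 0 || out < 0) {
        perror("open");
        return 1;
    }

    /* give the source file some size, then cut it to 0 as "truncate -s 0" would */
    if (ftruncate(in, 1024 * 1024) < 0 || ftruncate(in, 0) < 0) {
        perror("ftruncate");
        return 1;
    }

    off_t   offset = 4096;   /* an offset that is now beyond the end of the file */
    ssize_t n = sendfile(out, in, &offset, 4096);

    /* prints "sendfile returned 0": no data transferred and no error reported */
    printf("sendfile returned %zd\n", n);

    close(in);
    close(out);
    return 0;
}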


Maxim D.
http://nginx.org/

Thank you. Waiting for the 1.9.13 release.

Posted at Nginx Forum: