Worker processes not shutting down

Following an upgrade from nginx 1.0.2 to 1.0.11 I’ve experienced a
problem with worker processes that are not shutting down. In the list
of running processes the worker processes appear to be shutting down
but never actually exit.

I’ve been using the Ubuntu nginx PPAs for Maverick.

As the worker processes don’t shut down, nginx ends up consuming too
much memory and is ultimately unable to spawn new worker processes.

Below is a list of processes showing the problem:

www-data 560 0.5 4.2 376376 321760 ? S 20:04 0:07 nginx: worker process is shutting down
www-data 562 0.1 4.2 376376 321720 ? S 20:04 0:01 nginx: worker process is shutting down
www-data 1029 0.2 4.2 377712 323092 ? S 20:10 0:02 nginx: worker process is shutting down
www-data 1030 0.0 4.2 377712 322992 ? S 20:10 0:00 nginx: worker process is shutting down
www-data 1093 0.2 4.2 377972 323300 ? S 20:11 0:02 nginx: worker process is shutting down
www-data 1094 0.0 4.2 377972 323260 ? S 20:11 0:00 nginx: worker process is shutting down
www-data 1095 0.0 4.2 377972 323300 ? S 20:11 0:00 nginx: worker process is shutting down
www-data 1158 0.0 4.2 377972 323272 ? S 20:12 0:00 nginx: worker process is shutting down
www-data 1342 0.4 4.2 377972 323300 ? S 20:14 0:03 nginx: worker process is shutting down
www-data 1879 0.1 4.2 376928 322220 ? S 20:21 0:00 nginx: worker process is shutting down
www-data 1881 0.4 4.2 376928 322260 ? S 20:21 0:01 nginx: worker process is shutting down
www-data 2190 0.1 4.2 377972 323052 ? S 20:25 0:00 nginx: worker process is shutting down
www-data 2193 2.4 4.2 377972 323348 ? S 20:25 0:04 nginx: worker process is shutting down
www-data 2194 0.5 4.2 377972 323272 ? S 20:25 0:01 nginx: worker process is shutting down
www-data 2376 0.3 4.1 375884 321060 ? S 20:28 0:00 nginx: worker process
www-data 2377 0.5 4.1 375884 320320 ? S 20:28 0:00 nginx: worker process
www-data 2378 0.2 4.1 375884 319780 ? S 20:28 0:00 nginx: worker process
www-data 2379 4.0 4.1 375884 321192 ? S 20:28 0:02 nginx: worker process
ubuntu 2576 0.0 0.0 7964 900 pts/1 R+ 20:29 0:00 grep --color=auto nginx
root 3185 8.2 4.1 373960 318980 ? Rs Jan09 1854:37 nginx: master process nginx
www-data 25687 0.1 4.2 377420 322804 ? S 18:28 0:12 nginx: worker process is shutting down
www-data 25688 0.1 4.2 377420 322768 ? S 18:28 0:10 nginx: worker process is shutting down
www-data 25689 0.2 4.2 377420 322804 ? S 18:28 0:17 nginx: worker process is shutting down
www-data 26682 0.1 4.2 377676 323076 ? S 18:41 0:10 nginx: worker process is shutting down
www-data 27287 0.2 4.2 377004 322344 ? S 18:49 0:15 nginx: worker process is shutting down
www-data 27289 0.0 4.2 377004 322276 ? S 18:49 0:04 nginx: worker process is shutting down
www-data 27290 0.1 4.2 377004 322356 ? S 18:49 0:10 nginx: worker process is shutting down
www-data 29293 0.0 4.2 376608 322116 ? S 19:15 0:04 nginx: worker process is shutting down
www-data 29619 0.0 4.2 376608 322076 ? S 19:20 0:02 nginx: worker process is shutting down
www-data 29620 0.1 4.2 376608 322116 ? S 19:20 0:04 nginx: worker process is shutting down
www-data 30116 0.0 4.2 376608 322068 ? S 19:26 0:02 nginx: worker process is shutting down
www-data 30497 0.0 4.2 376608 322064 ? S 19:31 0:03 nginx: worker process is shutting down
www-data 31235 0.1 4.2 376536 321904 ? S 19:41 0:05 nginx: worker process is shutting down
www-data 31817 0.0 4.2 376544 321896 ? S 19:49 0:01 nginx: worker process is shutting down
www-data 31938 0.1 4.2 376460 321848 ? S 19:50 0:03 nginx: worker process is shutting down
www-data 31939 0.0 4.2 376460 321724 ? S 19:50 0:01 nginx: worker process is shutting down
www-data 32368 0.5 4.2 376412 321800 ? S 19:56 0:10 nginx: worker process is shutting down


The problem occurs when reloading the configuration, which is when
nginx would normally exit the existing worker processes gracefully.


Hello!

On Tue, Jan 24, 2012 at 03:42:49PM -0500, runesoerensen wrote:

Below is a list of processes showing the problem:

www-data 560 0.5 4.2 376376 321760 ? S 20:04 0:07 nginx:
worker process is shutting down
www-data 562 0.1 4.2 376376 321720 ? S 20:04 0:01 nginx:
worker process is shutting down

[…]

What does “nginx -V” show? If there are any 3rd party
modules/patches - do you see the same behaviour with vanilla
nginx?

Maxim D.

nginx -V returns:

nginx version: nginx/1.2.0
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-client-body-temp-path=/var/lib/nginx/body
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi
--http-log-path=/var/log/nginx/access.log
--http-proxy-temp-path=/var/lib/nginx/proxy
--http-scgi-temp-path=/var/lib/nginx/scgi
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi
--lock-path=/var/lock/nginx.lock
--pid-path=/var/run/nginx.pid --with-debug --with-http_addition_module
--with-http_dav_module --with-http_geoip_module
--with-http_gzip_static_module --with-http_image_filter_module
--with-http_realip_module --with-http_stub_status_module
--with-http_ssl_module --with-http_sub_module --with-http_xslt_module
--with-ipv6 --with-sha1=/usr/include/openssl
--with-md5=/usr/include/openssl
--with-mail --with-mail_ssl_module
--add-module=/build/buildd/nginx-1.2.0/debian/modules/nginx-auth-pam
--add-module=/build/buildd/nginx-1.2.0/debian/modules/nginx-echo
--add-module=/build/buildd/nginx-1.2.0/debian/modules/nginx-upstream-fair
--add-module=/build/buildd/nginx-1.2.0/debian/modules/nginx-dav-ext-module

I’ve only seen the issue in a production environment, where it’s
difficult to test with vanilla nginx.


Hello!

On Tue, Jan 24, 2012 at 12:42 PM, runesoerensen wrote:

Following an upgrade from nginx 1.0.2 to 1.0.11 I’ve experienced a
problem with worker processes that are not shutting down. In the list
of running processes the worker processes appear to be shutting down
but never actually exit.

Tools like pstack and strace are your good friends here :) You can try
both of these tools to inspect your problematic worker processes and
see what they’re spinning on. You can paste the results here if you
still can’t make sense of the output.

Best regards,
-agentzh

Hi,

I have the same issue and would like to find a solution. This is in
production, so a couple of 3rd party modules are compiled in. It
happens when we reload the nginx config.

ps -ef | grep nginx

root 10163 1 0 Feb08 ? 00:00:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 14434 10163 0 06:39 ? 00:00:00 nginx: worker process is shutting down
nginx 15664 10163 0 06:40 ? 00:00:00 nginx: worker process
nginx 15665 10163 0 06:40 ? 00:00:00 nginx: worker process
nginx 15666 10163 0 06:40 ? 00:00:00 nginx: worker process
nginx 15667 10163 0 06:40 ? 00:00:00 nginx: worker process
root 17489 9311 0 06:43 pts/3 00:00:00 grep nginx
nginx 23887 10163 0 Feb08 ? 00:00:12 nginx: worker process is shutting down
nginx 23888 10163 0 Feb08 ? 00:00:08 nginx: worker process is shutting down
nginx 23892 10163 0 Feb08 ? 00:00:20 nginx: worker process is shutting down
nginx 32240 10163 0 Feb11 ? 00:00:15 nginx: worker process is shutting down
nginx 32241 10163 0 Feb11 ? 00:00:16 nginx: worker process is shutting down
nginx 32244 10163 0 Feb11 ? 00:00:13 nginx: worker process is shutting down
nginx 32245 10163 0 Feb11 ? 00:00:19 nginx: worker process is shutting down

pstack 32245

#0 0x00000039dced4863 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x0000000000425e57 in ngx_epoll_process_events ()
#2 0x000000000041cc2e in ngx_process_events_and_timers ()
#3 0x0000000000423d7d in ngx_worker_process_cycle ()
#4 0x000000000042236d in ngx_spawn_process ()
#5 0x00000000004232cc in ngx_start_worker_processes ()
#6 0x0000000000424a9e in ngx_master_process_cycle ()
#7 0x00000000004069fd in main ()

pstack 32244

#0 0x00000039dced4863 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x0000000000425e57 in ngx_epoll_process_events ()
#2 0x000000000041cc2e in ngx_process_events_and_timers ()
#3 0x0000000000423d7d in ngx_worker_process_cycle ()
#4 0x000000000042236d in ngx_spawn_process ()
#5 0x00000000004232cc in ngx_start_worker_processes ()
#6 0x0000000000424a9e in ngx_master_process_cycle ()
#7 0x00000000004069fd in main ()

pstack 32241

#0 0x00000039dced4863 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x0000000000425e57 in ngx_epoll_process_events ()
#2 0x000000000041cc2e in ngx_process_events_and_timers ()
#3 0x0000000000423d7d in ngx_worker_process_cycle ()
#4 0x000000000042236d in ngx_spawn_process ()
#5 0x00000000004232cc in ngx_start_worker_processes ()
#6 0x0000000000424a9e in ngx_master_process_cycle ()
#7 0x00000000004069fd in main ()

strace -p 32244

Process 32244 attached - interrupt to quit
epoll_wait(21,

strace -p 32241

Process 32241 attached - interrupt to quit
epoll_wait(19,

strace -p 32245

Process 32245 attached - interrupt to quit
epoll_wait(24, {{EPOLLIN|EPOLLOUT, {u32=412706096, u64=412706096}}}, 512, 9351728) = 1
recvfrom(447, "HTTP/1.1 200 OK\r\nDate: Tue, 12 F"..., 2048, 0, NULL, NULL) = 611
write(444, "\27\3\1\2\200\224T\204\354\353w\344+\267c9\314\270\30I\216\200\314\354\376\3\323\202\332(:\323"..., 645) = 645
recvfrom(447, 0x1a0e98f0, 2048, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(24, {{EPOLLIN|EPOLLOUT, {u32=412730864, u64=412730864}}}, 512, 9351656) = 1
read(443, "\27\3\1\0 \t\355\205\202\267>\274\240\324qR\305\334\36\223\355\0251\31[A\366\225~\217\220\252"..., 34821) = 698
read(443, 0x1a0d0600, 34821) = -1 EAGAIN (Resource temporarily unavailable)
sendto(446, "GET /CometServer/cometd/connect?"..., 621, 0, NULL, 0) = 621
epoll_wait(24, {{EPOLLIN|EPOLLOUT, {u32=412783472, u64=412783472}}}, 512, 9351648) = 1
recvfrom(446, "HTTP/1.1 200 OK\r\nDate: Tue, 12 F"..., 1024, 0, NULL, NULL) = 1024
write(443, "\27\3\1\4 +T@\376\260\306q\vrq\1\240T\244\377\227\23g\5\340\262FYs:]o"..., 1061) = 1061
recvfrom(446, "|TRT\"},\"DAT\":{\"DQ\":[\"PFX|1"..., 1024, 0, NULL, NULL) = 1024
write(443, "\27\3\1\4 \262\202\372V\326\251S\252?\353\266\307\272\257\240\306\247\205\345\307\320L\31W\223\337h"..., 1061) = 1061
recvfrom(446, "\":\"{\"RT\":\"4\",\"HED\":{\"DQ\""..., 1024, 0, NULL, NULL) = 1024
write(443, "\27\3\1\4 9(\241\235\377\350\220\4\320&\25\34|;#\215\212\220\"\263\vFz\350\360ks"..., 1061) = 1061
recvfrom(446, "a\":{\"scope\":\"public\",\"server\":\"P"..., 1024, 0, NULL, NULL) = 1024

cat /etc/redhat-release

Red Hat Enterprise Linux Server release 5.7 (Tikanga)

uname -rop

2.6.18-274.el5 x86_64 GNU/Linux

nginx -V

nginx version: nginx/1.2.6
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-50)
TLS SNI support disabled
configure arguments: --prefix=/etc/nginx/ --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid
--lock-path=/var/run/nginx.lock
--http-client-body-temp-path=/var/cache/nginx/client_temp
--http-proxy-temp-path=/var/cache/nginx/proxy_temp
--http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp
--http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp
--http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx
--group=nginx
--with-http_ssl_module --with-http_realip_module
--with-http_addition_module
--with-http_sub_module --with-http_dav_module --with-http_flv_module
--with-http_gzip_static_module --with-http_random_index_module
--with-http_secure_link_module --with-http_stub_status_module
--with-file-aio --without-mail_pop3_module --without-mail_imap_module
--without-mail_smtp_module --with-debug
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/ngx_devel_kit-master
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/echo-nginx-module-master
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/set-misc-nginx-module-master
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/srcache-nginx-module-master
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/nginx-sticky-module-1.1
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/nginx_upstream_check_module-master
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/memc-nginx-module-master
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/nginx_cross_origin_module-master
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/nginx_tcp_proxy_module-master
--add-module=/usr/local/hirantha/rpmbuild/BUILD/nginx-1.2.6/contrib/naxsi-core-0.48/naxsi_src
--with-cc-opt='-O2 -g -m64 -mtune=generic'

Thanks in advance.


Hi,

Did you solve that problem? I’m having the same issue with nginx/1.4.3.

Could it be related to websockets that are still connected?

Some browsers can’t connect until that worker is killed. Does this
happen for you too?

Thanks


Hello,

You should read the end of the 1st paragraph of the following section to
find your answer:
http://nginx.org/en/docs/control.html#reconfiguration

If you do not wish to wait for all the workers’ ‘graceful shutdown’
conditions to be met, look at the list of signals they handle to find
out how to force it:
http://nginx.org/en/docs/control.html
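
For instance, a small hypothetical helper along those lines (just a sketch, not an official nginx tool; a plain kill -TERM <pid> on each lingering PID achieves the same) would send TERM, i.e. a fast shutdown, to every worker whose title says it is shutting down. Be aware that this drops whatever connections (websockets included) those old workers are still serving:

#!/usr/bin/env python
# force_old_workers.py -- hypothetical helper, not part of nginx.
# Run as root (or a user allowed to signal the nginx workers).
# Finds workers whose process title says "shutting down" and sends them
# SIGTERM (fast shutdown); open connections held by them are dropped.
import os
import signal
import subprocess

def stuck_worker_pids():
    # Parse `ps -eo pid,cmd` and keep only old workers still draining.
    out = subprocess.check_output(['ps', '-eo', 'pid,cmd']).decode()
    pids = []
    for line in out.splitlines()[1:]:
        pid, _, cmd = line.strip().partition(' ')
        if 'nginx: worker process is shutting down' in cmd:
            pids.append(int(pid))
    return pids

if __name__ == '__main__':
    for pid in stuck_worker_pids():
        print('sending TERM to old worker %d' % pid)
        os.kill(pid, signal.SIGTERM)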

Happy controlling,

B. R.

Are you able to reproduce the problem? Could you provide steps to do so?

Based on what you said, I would suspect a conflict between new and old
workers. I do not see in your report where the problem could come from,
so I suppose it is related to that.

Do you know which worker was receiving the data?
Why did the problem solve itself? Did that happen at the same time as
the old workers finally died?

What is the difference between handling HTTP and HTTPS in your
configuration? Is there any difference on that particular point between
the old and the new configuration?

B. R.

Hi BR,

This helps a lot, but I can’t find an explanation for my (temporary)
problem.

About an hour ago one of our developers couldn’t access port 80, but
everything went fine over https. My nginx is listening on both http and
https on the same server.

With the help of tcpdump, we could see packets coming in, but nothing
going out.

In my newbie understanding, the new workers are ready and have taken
over, while the old worker is waiting for all of its sockets to be
closed before it quits. Is that it?

But why was this user left without a response over http for such a long
period (more than 20 minutes)? After that, everything went back to
working fine.

Thanks again

Igor


Any hang or error should show up in your log files.

Cross-check the access and error logs and you will probably find the
unusual entries you are looking for.

B. R.

Hello

As another tool to analyze the problem, similar to strace but more
powerful, I suggest you try sysdig:

http://www.sysdig.org/

https://www.google.com/search?client=ubuntu&channel=fs&q=sysdig&ie=utf-8&oe=utf-8

You can do a trace of everything in your system.

Greetings,

Oscar


Hi,

Which one do you use to reload the config? The restart or the reload
command?

On 9/20/2014 02:50, igorhmm wrote:

[…]


Hi BR,

I don’t know how to reproduce it, not yet :)

I couldn’t identify which worker was responding either, but with strace
I can see warnings in the old worker about EAGAIN (Resource temporarily
unavailable). I can see that because the old workers are still running:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nginx 6009 0.0 0.2 71020 18532 ? S Sep18 0:37 nginx: worker process is shutting down
nginx 6010 0.0 0.2 71028 18496 ? S Sep18 0:12 nginx: worker process is shutting down
nginx 6289 0.2 0.3 75672 23188 ? S 07:54 0:58 nginx: worker process is shutting down
nginx 6290 0.0 0.2 71932 19248 ? S 07:54 0:15 nginx: worker process is shutting down
nginx 9182 0.0 0.2 70872 18380 ? S 10:20 0:02 nginx: worker process is shutting down
nginx 9295 0.0 0.2 70952 18380 ? S 10:26 0:02 nginx: worker process is shutting down
nginx 9297 0.0 0.2 70368 17856 ? S 10:26 0:02 nginx: worker process is shutting down
nginx 9302 0.0 0.2 70804 18296 ? S 10:26 0:01 nginx: worker process is shutting down
nginx 10132 0.2 0.2 74776 22280 ? S 10:53 0:47 nginx: worker process is shutting down
nginx 10133 0.0 0.2 71484 18972 ? S 10:53 0:09 nginx: worker process is shutting down
nginx 13690 0.2 0.2 72876 20296 ? S 14:22 0:10 nginx: worker process
nginx 13691 0.1 0.2 71492 19088 ? S 14:22 0:07 nginx: worker process
nginx 13692 0.0 0.0 57292 3180 ? S 14:22 0:00 nginx: cache manager process
root 29863 0.0 0.0 57292 4048 ? Ss Sep11 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 30956 0.0 0.2 72924 20336 ? S Sep18 0:23 nginx: worker process is shutting down

Looking at our users’ usage, these workers will stay online for a few
more hours :)

The difference between the old and the new configuration is just a
“down” flag on one of the servers in our websockets pool (upstream).
You can see a simplified version of the config at: nginx conf - Pastebin.com

Thanks a lot for your attention.

Igor


Hi people,

@BR: I haven’t found anything in the logs related to this problem, but
I’m still investigating and trying to reproduce it.

@oscaretu: this looks like a nice tool, thanks for the recommendation.

@dewanggaba: I’m using the reload command. We can’t use restart because
that would kill all established connections.

Thanks to all


Hello!

On Fri, Sep 19, 2014 at 12:50 PM, igorhmm wrote:

I don’t know how to reproduce it, not yet :)

I couldn’t identify which worker was responding either, but with strace
I can see warnings in the old worker about EAGAIN (Resource temporarily
unavailable). I can see that because the old workers are still running:

Nginx workers usually take forever to quit because of pending timers.

One suggestion is to dump out all the pending timers’ handlers so that
we can know what parts of nginx are responsible for this. To be more
specific, you can traverse the rbtree rooted at the C global variable
“ngx_event_timer_rbtree” and, for each tree node, obtain the
ngx_event_t object by doing the pointer arithmetic “((char *) cur -
offsetof(ngx_event_t, timer))”, then check the function pointed to by
“ev->handler” [1]. All these checks can be done in a gdb script or a
systemtap script inspecting a typical nginx worker that is pending
shutdown.

[1] You can take this piece of C code from the ngx_lua module for such
an example:

But you need to rewrite it in gdb’s Python extension language or
systemtap’s stap scripting language for online dynamic tracing.
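
For what it’s worth, a rough, untested gdb Python sketch of that traversal could look like the following. It is only a sketch, not the ngx_lua code referred to above; it assumes the worker binary still carries debug symbols, and the symbol and field names are the ones used in the nginx source tree:

# dump_timers.py -- run inside gdb against a worker stuck in
# "shutting down", e.g.:  gdb -p <worker_pid> -x dump_timers.py
# Walks the rbtree rooted at ngx_event_timer_rbtree and prints the
# handler of every pending ngx_event_t, which hints at the part of
# nginx (or a third-party module) keeping the old worker alive.
import gdb

char_ptr = gdb.lookup_type('char').pointer()
event_ptr = gdb.lookup_type('ngx_event_t').pointer()

# offsetof(ngx_event_t, timer), computed with the usual null-pointer trick
timer_off = int(gdb.parse_and_eval('(unsigned long)&((ngx_event_t *)0)->timer'))

def walk(node, sentinel):
    # In-order traversal; every leaf points at the shared sentinel node.
    if node == sentinel:
        return
    for n in walk(node['left'], sentinel):
        yield n
    yield node
    for n in walk(node['right'], sentinel):
        yield n

rbtree = gdb.parse_and_eval('ngx_event_timer_rbtree')
for node in walk(rbtree['root'], rbtree['sentinel']):
    # ev = (ngx_event_t *) ((char *) node - offsetof(ngx_event_t, timer))
    ev = (node.cast(char_ptr) - timer_off).cast(event_ptr)
    print('timer key=%s handler=%s' % (node['key'], ev['handler']))

Once the handler addresses are resolved to function names, it is usually clear which kind of pending event (an upstream connection, a long-lived client connection, a module timer, etc.) the old worker is still waiting on.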