From what I see, it’s your function ngx_rbtree_next() that
segfaults, not nginx. It tries to identify the root node by checking
its parent against NULL, which is not the correct way to do it; you
should instead compare the node pointer against the tree root pointer.
If you still think that nginx is affected, please provide
another test case. One that uses nginx itself, not code borrowed
from it, is preferred - see nginx-tests for
some framework and test samples.
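To illustrate the point about comparing against the tree root pointer, here is a minimal sketch of an in-order "next" step. The demo_node_t and demo_tree_t types and the demo_* functions are hypothetical stand-ins invented for this example, loosely shaped like a sentinel-based tree; they are not the nginx definitions and not the poster's code. The only assumption is that leaves are marked by a shared sentinel node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal types for illustration only (not nginx's). */
typedef struct demo_node_s  demo_node_t;

struct demo_node_s {
    int           key;
    demo_node_t  *left;
    demo_node_t  *right;
    demo_node_t  *parent;   /* may hold a stale value for the root */
};

typedef struct {
    demo_node_t  *root;
    demo_node_t  *sentinel; /* shared leaf marker, never NULL */
} demo_tree_t;

/* Leftmost node of the subtree rooted at node. */
demo_node_t *
demo_tree_min(demo_node_t *node, demo_node_t *sentinel)
{
    while (node->left != sentinel) {
        node = node->left;
    }

    return node;
}

/* In-order successor of node, or NULL if node holds the largest key. */
demo_node_t *
demo_tree_next(demo_tree_t *tree, demo_node_t *node)
{
    demo_node_t  *parent;

    assert(tree != NULL && node != NULL);

    /* A right subtree exists: the successor is its leftmost node. */
    if (node->right != tree->sentinel) {
        return demo_tree_min(node->right, tree->sentinel);
    }

    /*
     * Otherwise climb until we arrive from a left child.  The root is
     * recognised by pointer comparison, so root->parent is never read.
     */
    for ( ;; ) {
        if (node == tree->root) {
            return NULL;
        }

        parent = node->parent;

        if (node == parent->left) {
            return parent;
        }

        node = parent;
    }
}
```

Because the loop stops on node == tree->root, it does not matter what root->parent happens to contain, which is the property the reply asks for.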
The parent pointer of the root node of the nginx rbtree is always NULL,
because “ngx_rbtree_insert()” guarantees it. Only if the rbtree
has 2 elements and you delete the root node does “parent” of the
new root node point to the deleted element. Why not fix it?
On Sat, Mar 20, 2010 at 04:43:56AM -0400, double wrote:
Hello,
The parent pointer of the root node of the nginx rbtree is always NULL,
because “ngx_rbtree_insert()” guarantees it. Only if the rbtree
has 2 elements and you delete the root node does “parent” of the
new root node point to the deleted element. Why not fix it?
As you already pointed out, there is at least one place where
root->parent becomes non-NULL (and I’m not sure it’s the only
place where it happens). And this doesn’t cause any harm as nginx
doesn’t assume it should be NULL.
While I tend to think that it’s a good idea to keep it NULL, at least
with NGX_DEBUG defined (to simplify debugging), there is no bug
here. The bug is in your tree traversal code, which relies on the
assumption that root->parent == NULL. And even if your patch is
applied (it’s up to Igor anyway), your tree traversal code
should be fixed if you are planning to use it somewhere in
production.
With a standard (i.e. no options) installation of 0.8.34 on my Linux
machine I get a segfault if the resolver named in the conf file is a
loopback address/IP, but the resolver does not exist. Other
non-existing resolvers don’t cause a problem (they just hang, and
probably will time out), only loopback ones.
On Sun, Mar 21, 2010 at 04:21:38AM +0200, Marcus C. wrote:
Hi,
With a standard (i.e. no options) installation of 0.8.34 on my Linux
machine I get a segfault if the resolver named in the conf file is a
loopback address/IP, but the resolver does not exist. Other
non-existing resolvers don’t cause a problem (they just hang, and
probably will time out), only loopback ones.
Could you please provide a full debug log (i.e. switched on at the global
level, not http/server/location)? E.g. between the above lines there
should be some valuable resolving information which isn’t logged in the
request context but in the global one instead.
but will do if no-one gets around to it at some point.
Config and backtrace would be helpful too. Unfortunately I’m not able
to reproduce the problem.
Not only a loopback resolver address, but any address with nothing
listening on UDP port 53 can trigger this segfault. I’ve confirmed the
problem on my Linux box. Here’s the backtrace of nginx-0.8.49 when the
segfault happens:
#0 0x08059acd in ngx_log_error_core (level=4, log=0x810c16c, err=111,
   fmt=0x80e92cd "recv() failed") at src/core/ngx_log.c:93
#1 0x08062442 in ngx_connection_error (c=0x812c538, err=111,
   text=0x80e92cd "recv() failed") at src/core/ngx_connection.c:1015
#2 0x0806ef06 in ngx_udp_unix_recv (c=0x812c538, buf=0xbfffdcfc
   "m\220", size=4096) at src/os/unix/ngx_udp_recv.c:99
#3 0x080689b1 in ngx_resolver_read_response (rev=0x81424f8) at
   src/core/ngx_resolver.c:952
#4 0x08073035 in ngx_epoll_process_events (cycle=0x810d528, timer=5000,
   flags=) at src/event/modules/ngx_epoll_module.c:642
#5 0x0806baa2 in ngx_process_events_and_timers (cycle=0x810d528) at
   src/event/ngx_event.c:245
#6 0x08070edc in ngx_single_process_cycle (cycle=0x810d528) at
   src/os/unix/ngx_process_cycle.c:306
#7 0x08059576 in main (argc=1, argv=0xbfffef74) at src/core/nginx.c:398
The bug exists in all versions later than nginx-0.8.36 but not in
nginx-0.7.x versions.
This problem is due to the incorrect copying of cycle->new_log when
ngx_resolver_create() initializes udp_connection->log. Because
cycle->new_log only gets initialized in ngx_init_cycle() after all
configuration has been processed, ngx_resolver_create() is
called BEFORE cycle->new_log holds anything meaningful. And because it
COPIED cycle->new_log, udp_connection->log stays invalid even
after cycle->new_log is initialized properly later. When the resolver is
used to resolve names and fails to connect to the specified nameserver,
nginx tries to log the error using the invalid log structure
stored in udp_connection, and then boom…
IMHO, changing the definition of ngx_udp_connection_t to make the log
field a pointer instead of an ngx_log_t structure (along with a bunch of
related reference modifications, of course) would make this bug go away
painlessly.
The attached patch should fix the bug.
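For readers without the patch at hand, the copy-versus-pointer problem described above can be sketched in isolation. Everything here (demo_log_t, the two demo_conn_* types and the helper functions) is hypothetical and invented for this example; it is not the nginx code and not the attached patch. The only assumption is that the log object is filled in after the connection has been set up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only (not nginx structures). */
typedef struct {
    int          level;
    const char  *name;      /* NULL until the log is initialised */
} demo_log_t;

typedef struct {
    demo_log_t   log;       /* embedded copy, frozen at setup time */
} demo_conn_by_value_t;

typedef struct {
    demo_log_t  *log;       /* pointer, follows later initialisation */
} demo_conn_by_ptr_t;

/* Setup that copies the log structure: later changes are not seen. */
void
demo_conn_init_by_value(demo_conn_by_value_t *c, const demo_log_t *src)
{
    assert(c != NULL && src != NULL);
    c->log = *src;
}

/* Setup that stores a pointer: later changes are seen. */
void
demo_conn_init_by_ptr(demo_conn_by_ptr_t *c, demo_log_t *src)
{
    assert(c != NULL && src != NULL);
    c->log = src;
}

/* Late initialisation of the shared log object. */
void
demo_log_init(demo_log_t *log, int level, const char *name)
{
    assert(log != NULL);
    log->level = level;
    log->name = name;
}

/* A log counts as usable once its name has been set. */
int
demo_log_usable(const demo_log_t *log)
{
    return log != NULL && log->name != NULL;
}
```

If both connections are set up from a still-empty log and demo_log_init() runs afterwards, only the pointer variant ends up with a usable log; the embedded copy keeps the empty contents it was given at setup time.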