Weird timeouts, not sure if I've set the right thresholds

It is proxying (basically just load balancing) to 3 upstream nginx
webservers which do FastCGI/PHP/normal static files.

I get a lot of “upstream timed out” errors and I can’t determine why
exactly. Perhaps my buffers are too high or too low, or the timeouts
are too high or too low? I bumped the timeouts on my proxy machine
higher than I normally would so it wouldn’t time out as often, but it
still does. All the machines are quad-core Xeons with 4GB RAM, SATA2
disks, dedicated to web/FastCGI/PHP, all connected via a private
gigabit VLAN… The files are hosted on NFS, but I’m not seeing any
errors related to that in the logs either…

nginx PROXY:

user www-data www-data;
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;
working_directory /var/run;
error_log /var/log/nginx/error.log error;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    upstream webservers {
        server web01:80;
        server web02:80;
        server web03:80;
    }

    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    client_max_body_size 100m;
    client_header_buffer_size 8k;
    large_client_header_buffers 12 6k;
    keepalive_timeout 5;
    gzip on;
    gzip_static on;
    gzip_proxied any;
    gzip_min_length 1100;
    #gzip_http_version 1.0;
    gzip_comp_level 2;
    gzip_types text/plain text/html text/css application/x-javascript
               text/xml application/xml application/xml+rss;
    gzip_disable "MSIE [1-6].";
    gzip_vary on;
    server_names_hash_max_size 4096;
    server_names_hash_bucket_size 128;

    server {
        listen 80;
        access_log off;
        location / {
            proxy_pass http://webservers;
            proxy_next_upstream error timeout http_500 http_503
                                invalid_header;
            proxy_max_temp_file_size 0;
            proxy_read_timeout 50;
            proxy_connect_timeout 30;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_ignore_client_abort on;
        }
    }
}

nginx WEBSERVERS:

user www-data www-data;
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;
working_directory /var/run;
error_log /var/log/nginx/error.log debug;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    set_real_ip_from 10.13.5.16;
    access_log off;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    client_max_body_size 100m;
    client_header_buffer_size 8k;
    large_client_header_buffers 12 6k;
    keepalive_timeout 5;
    server_tokens off;
    gzip off;
    gzip_static off;
    server_names_hash_max_size 4096;
    server_names_hash_bucket_size 128;

Then an example vhost block:

server {
    listen 80;
    server_name michaelshadle.com www.michaelshadle.com;
    index index.php;
    root /home/mike/web/michaelshadle.com/;
    location ~ .php {
        include /etc/nginx/fastcgi.conf;
        fastcgi_pass 127.0.0.1:11000;
        fastcgi_index index.php;
    }
    if (!-e $request_filename) {
        rewrite ^(.+)$ /wordpress/index.php?q=$1 last;
    }
}

Any thoughts? Would turning off some buffers on the proxy make it
better, or would turning off buffering on the webservers make it
better? Not quite sure where the best place to change things would be
(if anywhere…)

Thanks…
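For reference, the buffering knobs being asked about are ordinary nginx directives; this is only a sketch of where each one would go, not a recommendation, and the values are illustrative:

```nginx
# On the proxy: controls whether nginx buffers the backend's response
# before relaying it to the client.
proxy_buffering off;          # or: on, with the sizes below tuned
proxy_buffers 8 16k;          # number and size of response buffers
proxy_buffer_size 16k;        # buffer for the response header

# On the webservers: the analogous FastCGI-side switches.
fastcgi_buffers 8 16k;
fastcgi_buffer_size 16k;
```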

On Wed, Apr 30, 2008 at 09:37:09PM -0700, mike wrote:

The timeout errors have a “while …” string that describes the
conditions under which the error happened.

if (!-e $request_filename) {
    rewrite ^(.+)$ /wordpress/index.php?q=$1 last;
}

Instead of this “if” it’s better to use:

  location / {
      error_page 404 = /wordpress/index.php?q=$uri;
  }
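In later nginx versions the same fallback is usually written with `try_files`, which avoids `if` entirely; a sketch against this vhost:

```nginx
location / {
    # Serve the file or directory if it exists, otherwise hand the
    # request to the WordPress front controller.
    try_files $uri $uri/ /wordpress/index.php?q=$uri;
}
```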


Any thoughts? Would turning off some buffers on the proxy make it
better, or would turning off buffering on the webservers make it
better? Not quite sure here where the best place would be to change
(if anywhere…)

The timeout errors have no relation to buffers. These errors usually
mean that the backends are too slow.
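One cheap way to quantify how slow is to log per-request upstream timing on the proxy. `$request_time` and `$upstream_response_time` are standard nginx variables; the log-format name and file path here are made up for the sketch:

```nginx
# In the http{} block on the proxy: record how long each backend took.
log_format upstream_timing '$remote_addr "$request" '
                           'upstream=$upstream_addr '
                           'rt=$request_time urt=$upstream_response_time';

# In the server{} block (access_log is currently off there):
access_log /var/log/nginx/upstream_timing.log upstream_timing;
```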

On 4/30/08, Igor S. [email protected] wrote:

The timeout errors have no relation to buffers. These errors usually
mean that the backends are too slow.

What should I look at then to speed them up?

The backends are Linux, NFS server is FBSD7, proxy server is Linux.
Any pointers are appreciated.

The load on the machines seems to stay quite low…

22:07:38 up 7:12, 2 users, load average: 0.08, 0.08, 0.08

On Wed, Apr 30, 2008 at 10:07:59PM -0700, mike wrote:

On 4/30/08, Igor S. [email protected] wrote:

The timeout errors have no relation to buffers. These errors usually
mean that the backends are too slow.

What should I look at then to speed them up?

The backends are Linux, NFS server is FBSD7, proxy server is Linux.
Any pointers are appreciated.

Could you show some timeout messages?

On the proxy. I’ve changed the IPs and hostnames to protect the
innocent (except for the upstream IPs - which are the 10.13.5.x ones)

2008/04/30 22:02:44 [error] 24799#0: *77643721 upstream timed out
(110: Connection timed out) while reading upstream, client:
21.5.4.247, server: lvs01.myhosting.net, request: "GET
/gallery/videos/FLV_LG/ics2.flv HTTP/1.1", upstream:
"http://10.13.5.14:80/gallery/videos/FLV_LG/ics2.flv", host:
"media.clientdomain.net", referrer:
"http://clientdomain.net/images/tempVid_lg.swf"

2008/04/30 22:04:45 [error] 24801#0: *77661724 upstream timed out
(110: Connection timed out) while reading response header from
upstream, client: 12.107.23.215, server: lvs01.myhosting.net, request:
"GET /forum/aus-classifieds/171768-e46-sub-box-2-x-r-f-p2-subs.html
HTTP/1.1", upstream:
"http://10.13.5.14:80/forum/aus-classifieds/171768-e46-sub-box-2-x-r-f-p2-subs.html",
host: "www.clientdomain.net", referrer:
"http://images.google.com/imgres?imgurl=http://memimage.cardomain.net/member_images/12/web/2124000-2124999/2124693_10_full.jpg&imgrefurl=http://www.clientdomain.net/forum/aus-classifieds/171768-e46-sub-box-2-x-r-f-p2-subs.html&start=100&h=431&w=575&sz=170&tbnid=vF8z4lznXJyOxM:&tbnh=100&tbnw=134&hl=en&prev=/images%3Fq%3De46%26start%3D80%26imgsz%3Dlarge%257Cxlarge%257Cxxlarge%257Chuge%26gbv%3D1%26hl%3Den%26sa%3DN"

2008/04/30 22:08:18 [error] 24798#0: *77681622 upstream timed out
(110: Connection timed out) while reading response header from
upstream, client: 66.29.2.76, server: lvs01.myhosting.net, request:
"GET /forum/general-discussion/117997-how-hard-repaint-car.html
HTTP/1.1", upstream:
"http://10.13.5.10:80/forum/general-discussion/117997-how-hard-repaint-car.html",
host: "www.clientdomain.net"

I’m not actually seeing anything for that request in the backend error
log. Sometimes I see “client aborted connection” etc., but I enabled
“fastcgi_ignore_client_abort on;” on the backend and
“proxy_ignore_client_abort on;” on the proxy…

On 4/30/08, Igor S. [email protected] wrote:

On Wed, Apr 30, 2008 at 09:37:09PM -0700, mike wrote:

location ~ .php {

BTW, this regex should be:

 location ~ \.php$ {

I originally had that. The problem is some of my apps do a rewrite from

foo.com/bar/baz → foo.com/index.php/bar/baz

In which case, FastCGI won’t be executed because php$ is not matched.
I had to remove the restriction on it ending with PHP for that. (Yes,
nowadays I know better, but I can’t change those apps right now)

I could at least add in that \ before the period though, I suppose.

On Wed, Apr 30, 2008 at 10:26:16PM -0700, mike wrote:

(110: Connection timed out) while reading response header from
upstream, client: 66.29.2.76, server: lvs01.myhosting.net, request:
"GET /forum/general-discussion/117997-how-hard-repaint-car.html
HTTP/1.1", upstream:
"http://10.13.5.10:80/forum/general-discussion/117997-how-hard-repaint-car.html",
host: "www.clientdomain.net"

The first line is an error in the middle of the ics2.flv transfer.
The last two did not get responses at all: “while reading response
header”.
I think those two are handled by PHP, and it may be slow.
But the first timeout, on a static FLV, is strange. Are you sure that
it was handled by nginx and not by PHP?

Yeah, I issued the request manually; there is no PHP there.

That could have been a client timeout or something for all I know. I
am sure with a few million requests per day there will be a handful of
crappy connections, proxies, Tor exit nodes changing, etc…

On Wed, Apr 30, 2008 at 10:53:59PM -0700, mike wrote:

Yeah, I issued the request manually; there is no PHP there.

What do you mean by “manually”?

That could have been a client timeout or something for all I know. I
am sure with a few million requests per day there will be a handful of
crappy connections, proxies, Tor exit nodes changing, etc…

No, these are timeout errors between nginx and its backend.

On 4/30/08, Igor S. [email protected] wrote:

What do you mean by “manually”?

I mean I loaded it manually in my browser to double-check that it
loads, and there is no handler on that vhost that invokes FastCGI in
any case.

No, these are timeout errors between nginx and its backend.

Oh, hmm…

Not that time. I think ever since I set the ignore-client-abort
options I don’t get the corresponding errors.

On Wed, Apr 30, 2008 at 11:55:32PM -0700, mike wrote:

Not that time. I think ever since I set the ignore-client-abort
options I don’t get the corresponding errors.

You should look for upstream errors on the proxy nginx side and for
client errors on the other nginx side: the proxy nginx is the client
of the second-level nginxes.
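A sketch of how to line the two logs up, assuming the default error-log format seen elsewhere in the thread: extract the timestamp and client IP from each "upstream timed out" line on the proxy, then grep the backend's error log around the same moment. A sample line stands in for the real log here.

```shell
# Write one sample proxy error line to a scratch file (stands in for
# /var/log/nginx/error.log on the proxy).
cat > /tmp/sample_error.log <<'EOF'
2008/05/02 00:39:18 [error] 30191#0: *88449678 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 1.2.3.4, server: lvs01.myhost.net
EOF

# Print date, time and client IP for every upstream timeout, so the same
# moment can be searched for in the backend's error log.
awk '/upstream timed out/ { ip = $(NF - 2); sub(/,$/, "", ip); print $1, $2, ip }' /tmp/sample_error.log
```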

On Wed, Apr 30, 2008 at 11:27:21PM -0700, mike wrote:

On 4/30/08, Igor S. [email protected] wrote:

What do you mean by manually ?

I mean I loaded it manually in my browser to double-check that it
loads, and there is no handler on that vhost that invokes FastCGI in
any case.

Are there corresponding messages on the other nginx?

Just got a ton of these on the proxy:

2008/05/02 00:39:18 [error] 30191#0: *88449678 upstream timed out
(110: Connection timed out) while reading response header from
upstream

Nothing in the upstream error log, but it looks like NFS might have
frozen.

Shouldn’t the upstream’s error log report something about a failure to
read a file?
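Not necessarily: on a hard NFS hang the backend worker blocks inside the read/sendfile system call, and nginx can't log anything until the call returns, so only the proxy (whose proxy_read_timeout does fire) sees an error. If NFS is the suspect, one thing to experiment with on the backends is serving those files without sendfile; these are ordinary nginx directives and the values are illustrative:

```nginx
# Reading NFS-hosted files through nginx's own buffers instead of
# sendfile() doesn't prevent a hang, but it makes behaviour on a flaky
# NFS mount easier to observe and tune.
sendfile off;
output_buffers 2 64k;   # buffers used when sendfile is off
```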

and lots of these, in groups:

2008/05/02 01:26:08 [error] 30190#0: *88628174 recv() failed (104:
Connection reset by peer) while reading response header from upstream,
client: 52.94.24.43, server: lvs01.myhost.net, request: "GET
/i/rate/100/294/100294392_48e1c_h.jpg HTTP/1.1", upstream:
"http://10.13.5.10:80/i/rate/100/294/100294392_48e1c_h.jpg", host:
"media.client.com", referrer: "http://www.client.com/photos.php"

This is from the proxy server error log.

Below are entries from the upstream error log.

Here are some weird ones:

2008/05/02 01:24:37 [info] 6656#0: *562998 client timed out (110:
Connection timed out) while reading client request line, client:
10.13.5.16, server: 0.0.0.0:80
2008/05/02 01:24:38 [info] 6658#0: *563002 client timed out (110:
Connection timed out) while reading client request line, client:
10.13.5.16, server: 0.0.0.0:80

10.13.5.16 is my proxy server (nginx)

Yes, that’s the setup.

I get timeouts from client ↔ nginx1 (which is probably normal),
and nginx1 ↔ nginx2,
and nginx2 ↔ NFS (which I am trying to address; I know some of the
timeouts are related, but I don’t believe all of them are).
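For reference, the nginx1 ↔ nginx2 hop is governed by three proxy-side timers. The connect and read values below mirror the proxy config posted earlier; proxy_send_timeout was not set there, so its 60s default applies:

```nginx
proxy_connect_timeout 30;   # establishing the TCP connection to nginx2
proxy_send_timeout 60;      # between two successive writes of the request
proxy_read_timeout 50;      # between two successive reads of the response
```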

So I don’t repost them again, here are the configs:
http://article.gmane.org/gmane.comp.web.nginx.english/4738

On Fri, May 02, 2008 at 10:46:44PM +1000, Dave C. wrote:

You have a setup like this?

client -> nginx1 -> nginx2

so nginx2 is waiting for nginx1 to send it something, but nginx1 is
expecting nginx2 to send it something; nginx2 gets tired of waiting
and closes the connection.

No. nginx1 gets the whole client request, then it connects to nginx2
and passes the full request to it. Then nginx1 waits for nginx2’s
response.

You have a setup like this?

client -> nginx1 -> nginx2

so nginx2 is waiting for nginx1 to send it something, but nginx1 is
expecting nginx2 to send it something; nginx2 gets tired of waiting
and closes the connection.

What are your proxy settings? grep proxy nginx.conf

Cheers

Dave