499 error reverse proxy

Hi folks, wondering if I could get your input.

I have nginx 1.7.3 configured as a reverse proxy. The system seems to work fine when not under heavy load. However, there are periods when we have extreme load because of e-commerce promotions.

During these periods the $request_time gets up to 15 seconds; normally the $request_time is only a few seconds. The client machine is making a web service call to the nginx reverse proxy, and the client web service making the call is designed to abort the connection if it takes longer than 15 seconds. That is why you see all the $request_time values stop at 15 seconds.

The $upstream_addr field is populated with "-". What would cause the $request_time to take longer than 15 seconds? It seems the nginx reverse proxy never passes the connection on to the upstream server.
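Since $upstream_addr is "-", nginx apparently never completed (or never started) an exchange with an upstream for these requests. One way to narrow down where the 15 seconds go is to extend the access log with the upstream timing variables (a sketch; the format name and log path below are my own placeholders, not from the running config):

```nginx
# Sketch: extended access log to show where request time is spent.
# The variables are standard nginx variables; the format name and
# log path are placeholders.
log_format upstream_timing '$remote_addr [$time_local] "$request" $status '
                           'req_time=$request_time '
                           'up_addr=$upstream_addr '
                           'up_status=$upstream_status '
                           'up_resp_time=$upstream_response_time '
                           'req_len=$request_length';

access_log /var/log/nginx/timing.log upstream_timing;
```

If up_resp_time stays "-" while req_time climbs, the time is being spent before nginx ever talks to an upstream, e.g. while reading the request from the client.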

10.8.165.116 - - [13/Dec/2014:07:58:12 -0600] "POST /Common/service/GateService/GateService.svc HTTP/1.1" 499 0 "-" "JAX-WS RI 2.1.6 in JDK 6" "-" LB_req_Time: 15.000 WebSrv_rspTime: - Req_size: 8292 HTTP_content_size: 7875
10.8.165.112 - - [13/Dec/2014:07:58:13 -0600] "POST /Common/service/GateService/GateService.svc HTTP/1.1" 499 0 "-" "JAX-WS RI 2.1.6 in JDK 6" "-" LB_req_Time: 14.814 WebSrv_rspTime: - Req_size: 12024 HTTP_content_size: 11606
10.8.165.113 - - [13/Dec/2014:07:58:13 -0600] "POST /Common/service/GateService/GateService.svc HTTP/1.1" 499 0 "-" "JAX-WS RI 2.1.6 in JDK 6" "-" LB_req_Time: 14.808 WebSrv_rspTime: - Req_size: 6230 HTTP_content_size: 5813
10.8.165.117 - - [13/Dec/2014:07:58:13 -0600] "POST /Common/service/GateService/GateService.svc HTTP/1.1" 499 0 "-" "JAX-WS RI 2.1.6 in JDK 6" "-" LB_req_Time: 15.000 WebSrv_rspTime: - Req_size: 6249 HTTP_content_size: 5832

I have done some tuning on the kernel params; here they are:
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.icmp_echo_ignore_broadcasts = 1
kernel.exec-shield = 1
kernel.randomize_va_space = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_window_scaling = 1
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 1440000
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_congestion_control = cubic
fs.file-max = 3000000
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_fin_timeout = 15

##########nginx.conf############

user nginx;
worker_processes 4;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

worker_rlimit_nofile 40960;
events {
    use epoll;
    # accept X connections per worker process
    worker_connections 10240;
    # accept more than 1 connection at a time
    multi_accept on;
}
timer_resolution 500ms;

######### server config ##################

upstream secureProd {
    #sticky;
    least_conn;
    server server1:443 weight=10 max_fails=3 fail_timeout=3s;
    server server2:443 weight=10 max_fails=3 fail_timeout=3s;
    server server3:443 weight=10 max_fails=3 fail_timeout=3s;
    server server4:443 weight=10 max_fails=3 fail_timeout=3s;

    keepalive 100;
}

    proxy_connect_timeout 75;
    proxy_send_timeout 75;
    proxy_read_timeout 75;

    # do not transfer http request to next server on timeout
    proxy_next_upstream off;

    client_max_body_size 10m;
    proxy_buffering on;
    client_body_buffer_size 10m;
    proxy_buffer_size 32k;
    proxy_buffers 1024 32k;
    large_client_header_buffers 20 8k;


    location / {
            index index.html;
            # needed for HTTPS
            #client_body_buffer_size 2m;

            #client_max_body_size 10m;
            #proxy_buffering on;
            #client_body_buffer_size 10m;
            #proxy_buffer_size 16k;
            #proxy_buffers 2048 8k;

            proxy_set_header X-Forwarded-Proto https;
            proxy_redirect off;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_http_version 1.1;
            proxy_max_temp_file_size 0;
            proxy_pass https://secureProd;

    } #end location

} #end server
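One thing worth noting about the timeouts above: they only cover the upstream side. If a client stalls while sending the POST body, nginx waits up to client_body_timeout (60 s by default) between successive reads, which is longer than the clients' 15-second abort. Lowering it (a sketch; the values are illustrative, not from the posted config) would make nginx close such connections itself with a 408, so "client slow to send" cases become distinguishable from genuine proxy delays in the logs:

```nginx
# Sketch (not in the original config; values are illustrative).
# client_body_timeout applies between two successive read operations
# on the request body, not to the whole transfer. If it fires, nginx
# returns 408 instead of logging a client-initiated 499.
client_header_timeout 10s;
client_body_timeout 10s;
```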

netstat -s

Ip:
832857527 total packets received
324844 with invalid addresses
0 forwarded
0 incoming packets discarded
790582949 incoming packets delivered
969556973 requests sent out
21248 outgoing packets dropped
1516 fragments dropped after timeout
12544 reassemblies required
2920 packets reassembled ok
1516 packet reassembles failed
2981 fragments received ok
5962 fragments created
Icmp:
33476 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
destination unreachable: 25335
timeout in transit: 31
echo requests: 8110
12995 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 3023
echo request: 1862
echo replies: 8110
IcmpMsg:
InType3: 25335
InType8: 8110
InType11: 31
OutType0: 8110
OutType3: 3023
OutType8: 1862
Tcp:
68091793 active connections openings
17998625 passive connection openings
23993 failed connection attempts
12419 connection resets received
69 connections established
765636836 segments received
952207983 segments send out
563253 segments retransmited
6445 bad segments received.
39498 resets sent
Udp:
20177272 packets received
1 packets to unknown port received.
0 packet receive errors
14411997 packets sent
UdpLite:
TcpExt:
3965 invalid SYN cookies received
20824 resets received for embryonic SYN_RECV sockets
1 packets pruned from receive queue because of socket buffer overrun
7 ICMP packets dropped because they were out-of-window
14642125 TCP sockets finished time wait in fast timer
217 packets rejects in established connections because of timestamp
38764858 delayed acks sent
177 delayed acks further delayed because of locked socket
Quick ack mode was activated 42592 times
40 packets directly queued to recvmsg prequeue.
4 packets directly received from prequeue
143786688 packets header predicted
220243967 acknowledgments not containing data received
81227601 predicted acknowledgments
2200 times recovered from packet loss due to SACK data
2 bad SACKs received
Detected reordering 1 times using SACK
Detected reordering 2 times using time stamp
2 congestion windows partially recovered using Hoe heuristic
TCPDSACKUndo: 27138
380316 congestion windows recovered after partial ack
1348 TCP data loss events
TCPLostRetransmit: 4
14291 timeouts after SACK recovery
1252 timeouts in loss state
1658 fast retransmits
2316 forward retransmits
8351 retransmits in slow start
525882 other TCP timeouts
1005 sack retransmits failed
48 packets collapsed in receive queue due to low socket buffer
42624 DSACKs sent for old packets
131 DSACKs sent for out of order packets
462648 DSACKs received
37 DSACKs for out of order packets received
31 connections reset due to unexpected data
5596 connections reset due to early user close
136 connections aborted due to timeout
TCPDSACKIgnoredNoUndo: 19299
TCPSackShifted: 239
TCPSackMerged: 455
TCPSackShiftFallback: 484195
TCPChallengeACK: 5832
TCPSYNChallenge: 6445
IpExt:
InMcastPkts: 24767376
OutMcastPkts: 2360767
InBcastPkts: 39238283
InOctets: 309264166239
OutOctets: 352578759904
InMcastOctets: 1643443932
OutMcastOctets: 94433873
InBcastOctets: 5296847155

Posted at Nginx Forum:

Hello!

On Wed, Dec 17, 2014 at 08:02:47PM -0500, tatroc wrote:

> is why you see all the $request_time's stop at 15 seconds.
>
> The $upstream_addr field is populated with "-". What would cause the
> $request_time to take longer than 15 seconds?
> It seems the nginx reverse proxy never passes the connection on to the
> upstream server.
>
> 10.8.165.116 - - [13/Dec/2014:07:58:12 -0600] "POST /Common/service/GateService/GateService.svc HTTP/1.1" 499 0 "-" "JAX-WS RI 2.1.6 in JDK 6" "-" LB_req_Time: 15.000 WebSrv_rspTime: - Req_size: 8292 HTTP_content_size: 7875

[…]

As the requests are POSTs, likely the problem is with sending the request body, probably due to network problems.
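If the body transfer itself needs to be observed, the debug_connection directive can record a full debug log for just the affected clients (a sketch; the subnet is taken from the client IPs in the posted log, and it requires nginx built with --with-debug):

```nginx
# Sketch: debug-level logging for selected clients only.
# Requires an nginx binary built with --with-debug; connections from
# other addresses keep the configured error_log level.
events {
    worker_connections 10240;
    debug_connection 10.8.165.0/24;
}
```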


Maxim D.
http://nginx.org/