Nginx / LRO on vmxnet3 / missing ACKs

Hello,

I’m currently investigating an issue with Linux (3.13.0), nginx (1.6.2),
vmxnet3 (1.2.0.0-k-NAPI), IPv6 connections and large receive offload
(LRO) enabled. The workflow we are investigating is a POST of a small
file (JPEG) to a php5-fpm pool.

From a network (tcpdump) point of view, it seems that when LRO is
disabled on the vmxnet3 interface, all TCP packets are ACKed correctly
after reception. However, when LRO is enabled, only the request part of
the POST is ACKed at TCP level before the client retransmits the
packets.
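To make the LRO on/off comparison repeatable, it helps to confirm the offload state from a script before each capture. Below is a minimal sketch. It assumes the standard `ethtool` CLI is installed and that its `-k` output contains a `large-receive-offload:` line. The interface name is a placeholder you must supply; it is not taken from the setup above.

```python
import subprocess
from typing import Optional


def lro_enabled(ethtool_k_output: str) -> Optional[bool]:
    """Parse `ethtool -k` output: True/False for LRO, None if the line is missing."""
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name == "large-receive-offload":
            # Typical values: "on", "off", or "off [fixed]" when the driver pins it.
            fields = value.split()
            return bool(fields) and fields[0] == "on"
    return None


def query_lro(iface: str) -> Optional[bool]:
    """Run `ethtool -k IFACE` and return the parsed LRO state (raises if ethtool fails)."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return lro_enabled(result.stdout)
```

Usage: call `query_lro("<iface>")` with the vmxnet3 interface name. To flip the setting between captures, the usual route is `ethtool -K <iface> lro off` (or `lro on`) as root. Whether the driver honors the change at runtime is something to verify on the VM itself.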

I couldn’t reproduce the issue reliably yet on another server with a
simpler config, or with another daemon. The issue only occurs over IPv6.

We use sendfile/tcp_nopush/tcp_nodelay.

I know there’s not enough detail here to solve the issue, but can
someone tell me if there’s any specific code path or configuration value
that could trigger this kind of behavior on nginx’s side?

The curl command we use to test (nothing special):
curl -6 -F foo=abcd -F bar=dcba -F file=@/tmp/jpeg-home.jpg 'http://server/v1/images' -lv --trace-time > /dev/null

LRO enabled:
12:01:15.094022 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [S], seq 3856909919, win 65535, options [mss 1400,nop,wscale 5,nop,nop,TS val 963369552 ecr 0,sackOK,eol], length 0
12:01:15.094043 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [S.], seq 1346488220, ack 3856909920, win 28560, options [mss 1440,sackOK,TS val 228121950 ecr 963369552,nop,wscale 7], length 0
12:01:15.100478 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], ack 1, win 4120, options [nop,nop,TS val 963369558 ecr 228121950], length 0
12:01:15.101827 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [P.], seq 1:238, ack 1, win 4120, options [nop,nop,TS val 963369558 ecr 228121950], length 237: HTTP: POST /v1/images HTTP/1.1
12:01:15.101837 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 238, win 232, options [nop,nop,TS val 228121952 ecr 963369558], length 0
12:01:15.101873 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [P.], seq 1:26, ack 238, win 232, options [nop,nop,TS val 228121952 ecr 963369558], length 25: HTTP: HTTP/1.1 100 Continue
12:01:15.109132 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], ack 26, win 4119, options [nop,nop,TS val 963369566 ecr 228121952], length 0
12:01:15.109846 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [P.], seq 238:580, ack 26, win 4119, options [nop,nop,TS val 963369566 ecr 228121952], length 342: HTTP
12:01:15.114752 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 580:3356, ack 26, win 4119, options [nop,nop,TS val 963369566 ecr 228121952], length 2776: HTTP
12:01:15.114762 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 3356:4744, ack 26, win 4119, options [nop,nop,TS val 963369566 ecr 228121952], length 1388: HTTP
12:01:15.147172 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 580, win 240, options [nop,nop,TS val 228121964 ecr 963369566], length 0
[problem starts here: data up to seq 4744 was received, but only 580 is
acked]

12:01:15.160117 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 4744:6132, ack 26, win 4119, options [nop,nop,TS val 963369611 ecr 228121952], length 1388: HTTP
12:01:15.160138 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 580, win 263, options [nop,nop,TS val 228121967 ecr 963369566,nop,nop,sack 1 {4744:6132}], length 0
12:01:15.421491 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 580:1968, ack 26, win 4119, options [nop,nop,TS val 963369870 ecr 228121967], length 1388: HTTP
12:01:15.421523 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 1968, win 285, options [nop,nop,TS val 228122032 ecr 963369870,nop,nop,sack 1 {4744:6132}], length 0
12:01:15.435450 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 1968:4744, ack 26, win 4119, options [nop,nop,TS val 963369884 ecr 228122032], length 2776: HTTP
12:01:15.739853 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 1968:3356, ack 26, win 4119, options [nop,nop,TS val 963370186 ecr 228122032], length 1388: HTTP
12:01:15.739879 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 3356, win 307, options [nop,nop,TS val 228122112 ecr 963370186,nop,nop,sack 1 {4744:6132}], length 0
12:01:15.751112 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 3356:4744, ack 26, win 4119, options [nop,nop,TS val 963370197 ecr 228122112], length 1388: HTTP
12:01:15.751131 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 6132, win 330, options [nop,nop,TS val 228122114 ecr 963370197], length 0
12:01:15.751136 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 6132:7520, ack 26, win 4119, options [nop,nop,TS val 963370197 ecr 228122112], length 1388: HTTP
12:01:15.751141 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 7520, win 352, options [nop,nop,TS val 228122115 ecr 963370197], length 0
[…]

LRO disabled:
15:21:23.262654 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [S], seq 1600883588, win 65535, options [mss 1400,nop,wscale 5,nop,nop,TS val 975315949 ecr 0,sackOK,eol], length 0
15:21:23.262680 IP6 2001:db8::2.http > 2001:db8::1.51774: Flags [S.], seq 1261254534, ack 1600883589, win 28560, options [mss 1440,sackOK,TS val 231123992 ecr 975315949,nop,wscale 7], length 0
15:21:23.269919 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], ack 1, win 4120, options [nop,nop,TS val 975315956 ecr 231123992], length 0
15:21:23.273546 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [P.], seq 1:238, ack 1, win 4120, options [nop,nop,TS val 975315956 ecr 231123992], length 237: HTTP: POST /v1/images HTTP/1.1
15:21:23.273563 IP6 2001:db8::2.http > 2001:db8::1.51774: Flags [.], ack 238, win 232, options [nop,nop,TS val 231123995 ecr 975315956], length 0
15:21:23.273586 IP6 2001:db8::2.http > 2001:db8::1.51774: Flags [P.], seq 1:26, ack 238, win 232, options [nop,nop,TS val 231123995 ecr 975315956], length 25: HTTP: HTTP/1.1 100 Continue
15:21:23.279832 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], ack 26, win 4119, options [nop,nop,TS val 975315965 ecr 231123995], length 0
15:21:23.281329 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [P.], seq 238:580, ack 26, win 4119, options [nop,nop,TS val 975315965 ecr 231123995], length 342: HTTP
15:21:23.285367 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], seq 580:1968, ack 26, win 4119, options [nop,nop,TS val 975315965 ecr 231123995], length 1388: HTTP
15:21:23.285379 IP6 2001:db8::2.http > 2001:db8::1.51774: Flags [.], ack 1968, win 263, options [nop,nop,TS val 231123998 ecr 975315965], length 0
15:21:23.285440 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], seq 1968:3356, ack 26, win 4119, options [nop,nop,TS val 975315965 ecr 231123995], length 1388: HTTP
15:21:23.285463 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], seq 3356:4744, ack 26, win 4119, options [nop,nop,TS val 975315965 ecr 231123995], length 1388: HTTP
15:21:23.285469 IP6 2001:db8::2.http > 2001:db8::1.51774: Flags [.], ack 4744, win 307, options [nop,nop,TS val 231123998 ecr 975315965], length 0
15:21:23.297518 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], seq 4744:6132, ack 26, win 4119, options [nop,nop,TS val 975315977 ecr 231123998], length 1388: HTTP
15:21:23.297690 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], seq 6132:7520, ack 26, win 4119, options [nop,nop,TS val 975315977 ecr 231123998], length 1388: HTTP
15:21:23.297701 IP6 2001:db8::2.http > 2001:db8::1.51774: Flags [.], ack 7520, win 352, options [nop,nop,TS val 231124001 ecr 975315977], length 0
15:21:23.297827 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], seq 7520:8908, ack 26, win 4119, options [nop,nop,TS val 975315977 ecr 231123998], length 1388: HTTP
15:21:23.299739 IP6 2001:db8::1.51774 > 2001:db8::2.http: Flags [.], seq 8908:10296, ack 26, win 4119, options [nop,nop,TS val 975315977 ecr 231123998], length 1388: HTTP
15:21:23.299747 IP6 2001:db8::2.http > 2001:db8::1.51774: Flags [.], ack 10296, win 397, options [nop,nop,TS val 231124002 ecr 975315977], length 0
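The gap annotated in the LRO-enabled trace can also be measured mechanically. Below is a rough sketch, not part of the original report. For each pure ACK sent by the server, it compares the acknowledged sequence number with the highest client byte tcpdump has already shown, and it also picks up the first SACK block if there is one. It assumes tcpdump's default relative sequence numbers and one packet per line. The sample input is a few lines copied verbatim from the LRO-enabled capture. The helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches the fields this sketch needs from a tcpdump IPv6/TCP line.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) IP6 (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\]"
    r"(?:, seq (?P<start>\d+):(?P<end>\d+))?"
    r"(?:, ack (?P<ack>\d+))?"
)
# Only the first SACK block is captured; enough for the traces in this thread.
SACK_RE = re.compile(r"sack \d+ \{(\d+):(\d+)\}")

Report = Tuple[str, int, int, Optional[Tuple[int, int]]]


def ack_lag(lines: List[str], server_port: str = "http") -> List[Report]:
    """Return (timestamp, ack, bytes_seen_but_unacked, first_sack) per server pure ACK."""
    highest_end = 0  # highest client sequence end tcpdump has shown so far
    report: List[Report] = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # tcpdump prints port 80 as "http" unless run with -nn.
        from_server = m["src"].endswith("." + server_port)
        if not from_server and m["end"]:
            highest_end = max(highest_end, int(m["end"]))
        elif from_server and m["ack"] and not m["start"]:
            ack = int(m["ack"])
            sack = SACK_RE.search(line)
            first_sack = tuple(int(x) for x in sack.groups()) if sack else None
            report.append((m["ts"], ack, highest_end - ack, first_sack))
    return report


TRACE = [
    "12:01:15.109846 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [P.], seq 238:580, ack 26, win 4119, options [nop,nop,TS val 963369566 ecr 228121952], length 342: HTTP",
    "12:01:15.114752 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 580:3356, ack 26, win 4119, options [nop,nop,TS val 963369566 ecr 228121952], length 2776: HTTP",
    "12:01:15.114762 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 3356:4744, ack 26, win 4119, options [nop,nop,TS val 963369566 ecr 228121952], length 1388: HTTP",
    "12:01:15.147172 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 580, win 240, options [nop,nop,TS val 228121964 ecr 963369566], length 0",
    "12:01:15.160117 IP6 2001:db8::1.49870 > 2001:db8::2.http: Flags [.], seq 4744:6132, ack 26, win 4119, options [nop,nop,TS val 963369611 ecr 228121952], length 1388: HTTP",
    "12:01:15.160138 IP6 2001:db8::2.http > 2001:db8::1.49870: Flags [.], ack 580, win 263, options [nop,nop,TS val 228121967 ecr 963369566,nop,nop,sack 1 {4744:6132}], length 0",
]

for ts, ack, lag, sack in ack_lag(TRACE):
    print(f"{ts} ack={ack} unacked_bytes_seen={lag} sack={sack}")
```

On this excerpt it reports the server ACKing 580 while 4164 and then 5552 bytes beyond it have already been captured. The later ACK carries a SACK for 4744:6132, so the receiving stack treats 580:4744 as missing. For comparison, feeding it the LRO-disabled excerpt above gives a lag of 0 for every server ACK.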

Thanks,
Best regards,
Aurélien

Posted at Nginx Forum:

> However, when LRO is enabled, only the request part of the POST is acked
> at tcp-level before the client retransmits the packets.
This is clearly not a userspace issue. It’s either a kernel or a
hypervisor issue.

I would start by using a supported and up-to-date kernel, because
3.13.0 is neither.

Thanks for this confirmation.

These were my next steps anyway.

I will update this post if I find any definite indication that
something is amiss in userspace.

Best regards,
Aurélien
