I’m having an issue with nginx writing to cephfs. Often I’m getting:
writev() "/home/ceph/temp/44/94/1/0000119444" failed (4: Interrupted system call) while reading upstream
Looking at it with strace, this is what happens:
…
write(65, "e\314\366\36\302"..., 65536) = ? ERESTARTSYS (To be restarted)
It happens after the first 4 MB (exactly) are written: the next write
gets ERESTARTSYS. Sometimes, but more rarely, it fails after the first
32 or 64 MB, etc. Apparently nginx doesn’t expect this and doesn’t
handle it, so it cancels the write and deletes the partial file.
Looking at the code, I saw that it doesn’t handle ERESTARTSYS any
differently from other write errors. Shouldn’t it retry writing the
same data a couple of times before finally giving up and returning an
error? Do you have any suggestions on how to resolve this? I’m using
the latest stable nginx.
It looks more like a bug in cephfs. writev() should never return
ERESTARTSYS.
I’ve talked to the ceph people. They say ERESTARTSYS shows up in the
strace output but is handled by the kernel, and that writev(2) is
interrupted by SIGALRM, which indeed appears in the strace output just
after writev fails.
I also failed to reproduce this error by doing the same thing as nginx
with dd. dd always succeeded, so it happens due to the combination of
nginx and cephfs.
Here’s full strace output (2 examples from 2 differently configured
servers):
“ngx_write_fd() is just a write(), which, when interrupted by SIGALRM,
fails with EINTR because SA_RESTART is not set. We can try digging
further, but I think nginx should retry in this case.”
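For illustration, the retry suggested in the quote above could look roughly like the sketch below. This is not nginx code and not the patch discussed in this thread. write_full() is a hypothetical helper name, and the sketch assumes a plain blocking descriptor. It simply calls write() again when it fails with EINTR and continues after short writes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of nginx): write the whole buffer,
 * retrying when write() is interrupted by a signal before any data
 * was transferred (EINTR) and continuing after short writes.
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t
write_full(int fd, const void *buf, size_t size)
{
    const char  *p = buf;
    size_t       left = size;

    assert(buf != NULL || size == 0);

    while (left > 0) {
        ssize_t  n = write(fd, p, left);

        if (n == -1) {
            if (errno == EINTR) {
                continue;       /* interrupted by a signal: try again */
            }

            return -1;          /* real error, errno is left for the caller */
        }

        p += n;                 /* short write: advance and keep going */
        left -= (size_t) n;
    }

    return (ssize_t) size;
}
```

A caller would use it as a drop-in for a single write(), for example write_full(fd, data, len), and treat -1 as a hard failure.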
Your timer_resolution setting is the root cause of the interrupts.
Every 50ms it signals nginx, and the signal can interrupt any
interruptible syscall (writing to a file usually is not interruptible,
but that seems to be different for cephfs).
You should avoid using timer_resolution, or try this patch: