In our production environment, we have several nginx servers running on Ubuntu that serve files from a very large (several PBs) NFS-mounted storage.
Usually the storage responds pretty fast, but occasionally it can be very slow. Since we are using aio, a slow file read isn't that bad - the specific HTTP request will just take longer to complete.
However, we have seen cases in which the file open itself takes a long time to complete. Since the open is synchronous, a slow open delays all active requests on the same nginx worker process.
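For reference, a minimal sketch of the kind of serving configuration involved (the path and directio threshold are illustrative, not our actual config):

```nginx
# Illustrative origin config; /mnt/nfs and the threshold are made up.
location /videos/ {
    root /mnt/nfs;       # NFS-mounted storage
    aio on;              # reads are asynchronous (requires direct I/O on Linux)
    directio 512;
    # ...but open() itself is still performed synchronously by the
    # worker process, which is where the stalls occur.
}
```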
One way to mitigate the problem may be to increase the number of nginx workers to some number well above the number of CPU cores. Each worker would then handle fewer requests, so fewer requests would be delayed by a slow open, but this solution sounds far from ideal.
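As a sketch, that mitigation is just something like (the value is arbitrary):

```nginx
# Illustrative: run many more workers than cores, so a worker blocked
# on a slow open() stalls a smaller share of the active requests.
worker_processes 64;   # e.g. on an 8-core machine; number is arbitrary
```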
Another possibility (one that requires some development) may be to create a thread that performs the task of opening the file. The main thread would wait on the completion of the open thread asynchronously, remaining available to handle other requests until the open completes. The child thread could either be created per request (many requests probably won't even need to open a file, thanks to the caching of open file handles), or alternatively some fancier thread-pooling mechanism could be developed.
I’d love to hear any thoughts / ideas on this subject
Without knowing everything in the mix, my first thought would be that the NFS head node is being tapped out and can't keep up. Generally you'd solve this with some type of caching, either at the CDN level, or you could look at the SlowFS module. I haven't checked whether it still compiles against the current releases, but if you're dealing with short-lived hot data or a consistent group of commonly accessed files, either solution would make a significant impact in reducing NFS load without having to resort to other potentially dodgy solutions.
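Roughly the sort of setup I mean, sketched from memory of the ngx_slowfs_cache directives (check the module docs before relying on this; paths, zone size and validity are made-up examples):

```nginx
# Sketch only - verify directive names against the ngx_slowfs_cache docs.
http {
    slowfs_cache_path /var/cache/nginx levels=1:2 keys_zone=fscache:10m;
    slowfs_temp_path  /var/cache/nginx_temp 1 2;

    server {
        location / {
            root /mnt/nfs;                 # slow NFS origin
            slowfs_cache       fscache;    # copy served files to local disk
            slowfs_cache_key   $uri;
            slowfs_cache_valid 1d;
        }
    }
}
```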
Since all three replies suggest some form of caching, I'll respond to them together here -
The nginx servers that I mentioned in my post do not serve client requests directly; the clients always hit the CDN first (we use mostly Akamai), and the CDN then pulls from these nginx servers. In other words, these servers act as the CDN origin. Therefore, hot / popular content is already taken care of - I have no problem there.
Since the files we serve are large (video), the CDN isn't caching them for too long (we send a caching header of 3 months, but the files usually stay cached for only a couple of days), so the servers are getting quite a few requests, and these requests hardly repeat themselves. Each server is delivering roughly 1/2 TB of data per day, so to get any hits on an NFS cache we'd probably need a very large cache. And even if we do that, we'll still have this problem with the non-popular content (e.g. videos that are watched on average once a week) - such a request may hang the worker process if opening the file takes a long time.
On Tue, 2014-12-30 at 02:34 -0500, erankor2 wrote:
Eran
I’m a bit confused here. Are you saying that the CDN is pulling from
NFS? If so, then surely the solution is under your control… deliver
all this content from a single server. If the web servers never deliver
it, then mount this content via NFS on them so they know it exists, but
no more.
An update on this - I ended up implementing support for asynchronous file open, based on the thread pool feature that was added in nginx 1.7.11.
I copied nginx's ngx_open_file_cache.c (from 1.9.0) and made it asynchronous; the source code is here:
(you can diff it with ngx_open_file_cache.c to see the changes)
If there are any nginx core developers on this thread - I would really love to see this feature make its way into the nginx core, so that I won't have this code duplication with the built-in ngx_open_file_cache. The feature was very thoroughly tested for race conditions etc.; the test script is here:
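For context, the stock thread-pool mechanism from 1.7.11 that this builds on is configured along these lines (pool name and sizes are just examples); the patched open_file_cache offloads open() through the same pool:

```nginx
# Standard nginx >= 1.7.11 thread pool (name and sizes illustrative).
thread_pool disk_io threads=32 max_queue=65536;

http {
    server {
        location /videos/ {
            root /mnt/nfs;
            aio threads=disk_io;   # reads are dispatched to the pool
        }
    }
}
```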
On Mon, 2014-12-29 at 15:52 -0500, erankor2 wrote:
Thanks,
Eran
As a generic SysAdmin, I would say the first place to start is to look into installing something like cachefs, which will keep local copies of the remote files, so once the cache is filled, the problem should go away.
There are also kernel and mount options that can be tuned to help a bit.
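A sketch of what that looks like with FS-Cache / cachefilesd on Ubuntu (package name, server/export path and option values are illustrative; tune the mount options for your workload):

```shell
# Illustrative only - enable a local FS-Cache for the NFS mount:
apt-get install cachefilesd
sed -i 's/#RUN=yes/RUN=yes/' /etc/default/cachefilesd   # enable the daemon
service cachefilesd start

# Remount with 'fsc' to use the cache, plus common tuning knobs:
mount -t nfs \
    -o fsc,rsize=1048576,wsize=1048576,actimeo=60,noatime \
    storage:/export /mnt/nfs
```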
I would/do use this approach in preference to the (still too new IMO)
alternatives like GlusterFS.
In my experience, serving any volume of files over NFS will always be a
bottleneck.