Request for thoughts / feedback: Guide on Nginx Monitoring

addis_a · February 12, 2015, 11:11pm

Hi everyone - first off, many thanks for the wealth of knowledge on the
forum / mailing list. I’ve been learning the nitty gritty of Nginx over the
past few months, and this has been a hugely valuable resource.

I’ve put together a guide on monitoring production Nginx systems (along with
some background information on metrics and system variables that goes a bit
deeper than the official docs). I’d love to get some feedback on things
I’ve gotten wrong / right / sorta right. Any feedback at all is appreciated.

Also - hope this info is helpful to someone in some context, either now or
in the future.

https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
https://www.scalyr.com/community/guides/an-in-depth-guide-to-nginx-metrics

–Noah

Posted at Nginx Forum:

noahlh · February 13, 2015, 8:35am

Hi Noah,

Thanks for your guides; interesting read.

For everyone else:

There is a Nagios plugin for monitoring the stub_status output:
https://bitbucket.org/maresystem/dogtown-nagios-plugins/overview

Besides monitoring, it also extracts all data from the status page and
returns it as performance data for graphing and as a source for
warning/critical notifications.

Performance data:

NginxStatus.Check OK | ac=1;acc=64; han=64; req=64; err=0; rpc=1; rps=0; cps=0; dreq=1; dcon=1; read=0; writ=1; wait=0; ct=6ms;

    ac      -> active connections
    acc     -> total accepted connections
    han     -> total handled connections
    req     -> total requests
    err     -> difference between acc and han, thus errors
    rpc     -> requests per connection (req/han)
    rps     -> requests per second (calculated) from last checkrun vs actual values
    cps     -> connections per second (calculated) from last checkrun vs actual values
    dreq    -> request delta from last checkrun vs actual values
    dcon    -> accepted-connection delta from last checkrun vs actual values
    read    -> reading requests from clients
    writ    -> reading request body, processing request, or writing response to a client
    wait    -> keep-alive connections; this is ac - (read + writ)
    ct      -> check time (connection time) for this check
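
The derived values in that list can be recomputed from the raw stub_status page. What follows is only a rough sketch in Python, not the plugin's own code: the function names `parse_stub_status` and `derive` are made up for illustration, and it assumes the standard plain-text stub_status layout ("Active connections", the "server accepts handled requests" counters, then "Reading / Writing / Waiting"). How the plugin actually stores the previous check run is not shown here, so `prev` and `interval` are simply passed in by the caller.

```python
import re

# Matches the standard plain-text stub_status page.
STATUS_RE = re.compile(
    r"Active connections:\s*(\d+)\s+"
    r"server accepts handled requests\s+"
    r"(\d+)\s+(\d+)\s+(\d+)\s+"
    r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)"
)


def parse_stub_status(text):
    """Parse stub_status text into the short names used in the list above."""
    m = STATUS_RE.search(text)
    if m is None:
        raise ValueError("unrecognized stub_status output")
    ac, acc, han, req, read, writ, wait = map(int, m.groups())
    return {"ac": ac, "acc": acc, "han": han, "req": req,
            "read": read, "writ": writ, "wait": wait}


def derive(cur, prev=None, interval=None):
    """Add err and rpc, plus deltas and rates when a previous sample exists.

    prev is the parsed sample from the last check run and interval is the
    number of seconds between the two runs (both supplied by the caller).
    """
    out = dict(cur)
    out["err"] = cur["acc"] - cur["han"]           # accepted but not handled
    out["rpc"] = cur["req"] / cur["han"] if cur["han"] else 0.0
    if prev is not None and interval:
        out["dreq"] = cur["req"] - prev["req"]     # requests since last run
        out["dcon"] = cur["acc"] - prev["acc"]     # connections since last run
        out["rps"] = out["dreq"] / interval
        out["cps"] = out["dcon"] / interval
    return out
```

A caller would fetch the status page on each check run, keep the parsed result somewhere persistent, and pass it back in as `prev` on the next run. Because the counters are cumulative since Nginx started, a restart between two runs would produce negative deltas, which a real check would need to handle.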

cheers,

mex


noahlh · February 19, 2015, 8:25pm

Thanks for the kind words, mex, and thank you for the Nagios plugin info.
Adding that to my monitoring toolbelt.

Some more specific questions for you and others re: the guide I wrote:

Are there any “essential” metrics I’m missing? I listed the 14 that I
think are the most critical.

Are there any Nginx features related to monitoring that I haven’t covered
(and should cover)?

I want to make sure I also have good coverage for monitoring potential
“breaking” issues with Nginx: things that can overflow / top out. I’ve got
open file handles as one of them, and the standard server-related stuff
(CPU, memory, disk, bandwidth). Is there anything Nginx-specific I’m
missing that is critical and could cause the application to sputter?
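
For the open file handles check, this is roughly what I have in mind, as a minimal Python sketch rather than a finished tool. It assumes Linux (it reads `/proc/<pid>/fd` and `/proc/<pid>/limits`) and enough permission to read those entries for the Nginx worker process. The function names and the 0.8 alert threshold are just placeholders I picked for illustration.

```python
import os
import re


def parse_max_open_files(limits_text):
    """Return the soft "Max open files" limit from /proc/<pid>/limits text.

    Returns None when the line is missing or the limit is "unlimited".
    """
    m = re.search(r"^Max open files\s+(\S+)\s+(\S+)", limits_text, re.MULTILINE)
    if m is None or m.group(1) == "unlimited":
        return None
    return int(m.group(1))


def fd_usage(pid):
    """Return (open_fds, soft_limit) for one process. Linux only."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        limit = parse_max_open_files(fh.read())
    return open_fds, limit


def fd_alert(open_fds, limit, threshold=0.8):
    """True when open descriptors reach the given fraction of the soft limit."""
    return limit is not None and open_fds >= threshold * limit
```

Running `fd_usage` against each worker PID and feeding the result to `fd_alert` would give a simple warning before the limit is actually hit.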

Many continued thanks.

–Noah
