Hi,
I'm stuck analyzing this and cannot reproduce it in a test environment.
We see the ‘/’ location “freeze” from time to time as nginx stops updating the
cache. This lasts from 30 minutes to an hour.
Regular request flow:
HIT
…
HIT
UPDATING
UPDATING
EXPIRED (every 30 seconds)
HIT
…
So under normal conditions a page is served from the cache; the cache record
expires; the next request goes to the upstream, locking the cache record;
requests are still served from the cache with the UPDATING cache status while
the updating request is in flight; on return from the upstream the cache is
updated and unlocked, and the EXPIRED cache status shows up in the log. Back
to step one - serving from the cache.
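
For reference, a minimal configuration that produces this HIT/UPDATING/EXPIRED
pattern would look something like the sketch below (illustrative only, not our
exact config; the zone name, cache path, upstream name and timings are made
up):

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=main:100m
                     inactive=60m max_size=10g;

    location / {
        proxy_pass            http://backend;   # upstream name is illustrative
        proxy_cache           main;
        # 30s validity would match the EXPIRED entries we see every ~30 seconds
        proxy_cache_valid     200 30s;
        # "updating" is what makes concurrent requests get the UPDATING status
        # while a single request refreshes the expired entry
        proxy_cache_use_stale updating error timeout;
    }
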
The abnormal flow we see under nginx 1.6.x:
HIT
…
HIT
UPDATING
STALE
UPDATING
UPDATING
STALE
UPDATING
STALE
STALE
…
continues for a while
…
HIT
We first noticed this when we attempted an upgrade to 1.8.0, where the
situation was much worse:
HIT
…
HIT
UPDATING
UPDATING
UPDATING
UPDATING
…
continues until the cache is cleaned
…
We downgraded nginx back to 1.6, and after some time realized that this
situation also happens on 1.6, but the page doesn't freeze forever.
I'm not saying it only happens with ‘/’; it's just the busiest page, so we
probably don't notice other pages. On 1.8 this also happened on top-level
sections such as /football/. I feel it's related to the amount of pressure on
the page.
Our error log has “ignore long locked inactive cache entry” alerts, but I
really couldn't match them to the “defreeze” event. The access log has
STALE/UPDATING requests between the alert and the EXPIRED (cache updating)
request.
Any help on hunting it down would be appreciated.