Rate limit with good bot IPs whitelisted

addis_a · November 22, 2014, 5:08am

I am trying to figure out if there is any way to rate limit all traffic
except Googlebot, msnbot, yandex and baidu bots. Here is what I have
started with:

# Whitelisted IPs
geo $rate_limit_ip {
    default $binary_remote_addr;
    127.0.0.1 "";
    10.0.0.0/8 "";
}

# Rate limit
limit_req_zone $rate_limit_ip zone=publix:10m rate=10r/s;

I can add the Googlebot, msnbot, Yandex and Baidu IP ranges to the
whitelist manually, but that will make the lookup table big. I am not sure
whether this approach will work for high traffic, around 1200
requests/second distributed across 20 nginx hosts. Any ideas on such a
setup would be really helpful.

Also, can such host lookups be done in real time for every request? I am
guessing that may not be efficient for each request, but I was wondering
if there are any solutions.

Appreciate all your help.

thanks,
N

neubyr · November 22, 2014, 4:34pm

Yesterday Nov 21, 2014 at 20:07 neubyr wrote:

Rate limit

limit_req_zone $rate_limit_ip zone=publix:10m rate=10r/s;

It will not work as you expect.
Geo does not support variables in values.
You need something like this:

geo $whitelist {
    default 0;
    127.0.0.1 1;
    ...
}
map $whitelist $rate_limit_ip {
    default $binary_remote_addr;
    1 "";
}
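
Put together, a minimal sketch of the whole setup might look like the following. The zone name and rate come from the original post; the server and location blocks and the burst=20 value are assumptions added for illustration only. Requests whose key is an empty string are not accounted for by limit_req_zone, which is why whitelisted addresses are mapped to "".

```nginx
http {
    # 1 for whitelisted addresses, 0 for everyone else.
    geo $whitelist {
        default     0;
        127.0.0.1   1;
        10.0.0.0/8  1;
    }

    # map (unlike geo) can return a variable as its value.
    map $whitelist $rate_limit_ip {
        default  $binary_remote_addr;
        1        "";   # empty key: request is not rate limited
    }

    limit_req_zone $rate_limit_ip zone=publix:10m rate=10r/s;

    server {
        listen 80;

        location / {
            # burst value is an assumption, tune it for your traffic.
            limit_req zone=publix burst=20;
        }
    }
}
```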

I can add googlebot, msnbot, yandex and baidu IP ranges manually to the
whitelist, but that will make lookup table big. I am not sure whether
this approach will work for high traffic like - 1200 requests/second
distributed across 20 nginx hosts. Any ideas on such setup will be
really helpful.

Nginx parses this data and loads it into an in-memory radix tree at
startup, so lookups stay fast even with a large list of ranges.
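
One way to keep a long list of crawler ranges manageable is the geo block's include parameter, which reads addresses and values from a separate file. A sketch, assuming a hypothetical file name bot_ranges.conf and using documentation-only placeholder networks rather than real crawler ranges:

```nginx
geo $whitelist {
    default     0;
    127.0.0.1   1;
    10.0.0.0/8  1;

    # Hypothetical file holding the crawler ranges,
    # one "network value;" entry per line.
    include     bot_ranges.conf;
}

# Contents of bot_ranges.conf (placeholder networks, not real crawler ranges):
#   192.0.2.0/24     1;
#   198.51.100.0/24  1;
```

The file could be regenerated from whatever ranges the search engines publish and picked up with a configuration reload.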

Also, can such host lookups be done in real-time for every request? I am
guessing that may not be efficient for each request, but I was wondering if
there are any solutions.

All variables are evaluated when they are used in a request.

–
WNGS-RIPE

neubyr · November 22, 2014, 6:43pm

Thank you Oleksandr!!

On Sat, Nov 22, 2014 at 7:33 AM, Oleksandr V. Typlyns’kyi wrote:

geo $whitelist {
    default 0;
    127.0.0.1 1;
    ...
}
map $whitelist $rate_limit_ip {
    default $binary_remote_addr;
    1 "";
}

I am not sure how, but it only works for me with geo alone defining the
IP addresses. I can see HTTP 503 on the client side and also ‘limiting
requests, excess: 10.033 by zone’ in the error logs. Nginx version:
nginx/1.6.0

geo $rate_limit_ip {
    default $binary_remote_addr;
    127.0.0.1 1;
    10.0.0.0/8 1;
}

All variables are evaluated when they are used in request.

I was wondering if a hostname lookup of the remote IP can be done before
rate-limiting it. For example, I don’t want to block IPs coming from
baidu.com. Can I do such an IP-to-hostname lookup before rate-limiting?
Will it be efficient, or what are the other options?

Thanks again for detailed reply.

N

neubyr · November 22, 2014, 9:54pm

I was wondering if remote ip’s hostname lookup can be done before
rate-limiting it. For example, I don’t want to block IPs coming from

Nginx does not lookup remote hostnames at all.

neubyr · November 25, 2014, 5:59pm

On Sat, Nov 22, 2014 at 12:53 PM, itpp2012 wrote:

I was wondering if remote ip’s hostname lookup can be done before
rate-limiting it. For example, I don’t want to block IPs coming from

Nginx does not lookup remote hostnames at all.

GitHub - flant/nginx-http-rdns: Nginx HTTP rDNS module

Thank you! Looks interesting.

N

neubyr · November 22, 2014, 8:59pm

Today Nov 22, 2014 at 09:42 neubyr wrote:

geo $rate_limit_ip {
    default $binary_remote_addr;
    127.0.0.1 1;
    10.0.0.0/8 1;
}

Here you define the key as the literal string "$binary_remote_addr" (a
string, not a variable), so all clients that match the default share one
limit.

I was wondering if remote ip’s hostname lookup can be done before
rate-limiting it. For example, I don’t want to block IPs coming from
baidu.com. Can I do such IP-hostname lookup before rate-limiting? Will it
efficient or what are other options?

Nginx does not look up remote hostnames at all.

–
WNGS-RIPE