Question about proxy_store

I am having problems with proxy being unable to store pages that do
not have a file extension (such as a directory or “nice url”).

  1. User requests http://domain.com/page/hello/
  2. nginx looks in the root and cannot find the page.
  3. nginx uses the error page, which then calls a proxy pass inside a
    location.
  4. nginx fetches the page.
  5. nginx cannot save the result, because the target path is /page/hello/.
    nginx does create the /page/hello/ directory inside the proxy_store
    directory, though.

It could very well be a problem with my configuration. It works for
URLs with file extensions.

    location ^~ /fetch/ {
        internal;
        include  proxy.conf;
        proxy_set_header  X-Real-IP $remote_addr;
        proxy_pass http://127.0.0.1:9001/;
        proxy_store /var/www/data/domain.com$uri;
        proxy_store_access user:rw group:rw all:r;
    }

    location / {
        root /var/www/data/domain.com/fetch;
        error_page 404 = /fetch$uri;
    }

Hi Cactus!

On Sat, 2008-01-26 at 19:18 -0800, eliott wrote:

I am having problems with proxy being unable to store pages that do
not have a file extension (such as a directory or “nice url”).

Have you tried with just

proxy_store on;

rather than explicitly specifying a path? Unfortunately I don’t have a
way to test this at the moment, but it’s worth a try.

Cliff
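
For reference, a minimal sketch of what the "proxy_store on;" variant
might look like, assuming the same /fetch/ location and paths as in the
original post. It is untested here. With "on", nginx builds the file
name from the root (or alias) directive plus the request URI, so under
these assumptions it should resolve to the same path as the explicit
form, and may not by itself change how URIs ending in a slash are
handled.

    location ^~ /fetch/ {
        internal;
        include  proxy.conf;
        proxy_set_header  X-Real-IP $remote_addr;
        proxy_pass http://127.0.0.1:9001/;
        # With "on", the stored file name is root (or alias) + URI,
        # e.g. /var/www/data/domain.com + /fetch/page.html
        root /var/www/data/domain.com;
        proxy_store on;
        proxy_store_access user:rw group:rw all:r;
    }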

On 1/27/08, Cliff W. wrote:

Hi Cactus!

Hello! :)

Have you tried with just

proxy_store on;

rather than explicitly specifying a path? Unfortunately I don’t have a
way to test this at the moment, but it’s worth a try.

I haven’t tried that.
I will give it a shot tomorrow and see if it changes the behavior at
all.

That might work with some tweaking.

My problem is that the upstream cache doesn’t have a target for a
http://domain.com/foo/index.html request, only for
http://domain.com/foo/
(a Django app with Django URL routing).

Maybe I can add another rewrite clause in the /fetch/ location, to
strip off an ending index.html before the request hits the proxy.
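
A rough, untested sketch of that rewrite idea, assuming the /fetch/
location receives URIs such as /fetch/foo/index.html. The variable name
$store_uri is made up for illustration. The original URI is saved first
so the stored file keeps its index.html name, and proxy_pass is written
without a URI part so that the rewritten URI is what goes upstream.

    location ^~ /fetch/ {
        internal;
        include  proxy.conf;
        proxy_set_header  X-Real-IP $remote_addr;

        # Remember the URI before it is rewritten (hypothetical name).
        set $store_uri $uri;

        # /fetch/foo/index.html -> /foo/
        # Note: this also strips a genuine index.html from the request.
        rewrite ^/fetch(.*/)index\.html$ $1 break;
        # Everything else only loses the /fetch prefix.
        rewrite ^/fetch(/.*)$ $1 break;

        # No URI part here, so the rewritten URI is sent to the backend.
        proxy_pass http://127.0.0.1:9001;
        proxy_store /var/www/data/domain.com$store_uri;
        proxy_store_access user:rw group:rw all:r;
    }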

On Sat, Jan 26, 2008 at 07:18:26PM -0800, eliott wrote:

    location / {
        root /var/www/data/domain.com/fetch;
        error_page 404 = /fetch$uri;
    }

You may try

      location / {
          root /var/www/data/domain.com/fetch;
          set     $fetch   /fetch$uri;
          error_page 404 = $fetch;
      }

      location ~ /$ {
          index   index.html;
          root /var/www/data/domain.com/fetch;
          set     $fetch   /fetch${uri}index.html;
          error_page 403 404 = $fetch;
      }

      location ^~ /fetch/ {
          internal;
          include  proxy.conf;
          proxy_set_header  X-Real-IP $remote_addr;
          proxy_pass http://127.0.0.1:9001/;
          proxy_store /var/www/data/domain.com$fetch;
          proxy_store_access user:rw group:rw all:r;
      }

On Sun, Jan 27, 2008 at 10:31:54PM -0800, eliott wrote:

That might work with some tweaking.

My problem is that the upstream cache doesn’t have a target for a
http://domain.com/foo/index.html request, only for
http://domain.com/foo/
(a Django app with Django URL routing).

Then:

       location / {
           root /var/www/data/domain.com/fetch;
           error_page 404 = /fetch$uri;
       }

       location  /fetch/ {
           internal;
           include  proxy.conf;
           proxy_set_header  X-Real-IP $remote_addr;
           proxy_pass http://127.0.0.1:9001/;
           proxy_store /var/www/data/domain.com$uri;
           proxy_store_access user:rw group:rw all:r;
       }

       location ~ /$ {
           index   index.html;
           root /var/www/data/domain.com/fetch;
           error_page 403 404 = @fetch;
       }

       location @fetch {
           internal;
           include  proxy.conf;
           proxy_set_header  X-Real-IP $remote_addr;
           proxy_pass http://127.0.0.1:9001;
           proxy_store /var/www/data/domain.com/fetch${uri}index.html;
           proxy_store_access user:rw group:rw all:r;
       }

eliott wrote:

I am having problems with proxy being unable to store pages that do
not have a file extension (such as a directory or “nice url”).

  1. User requests http://domain.com/page/hello/
  2. nginx looks in the root and cannot find the page.
  3. nginx uses the error page, which then calls a proxy pass inside a location.
  4. nginx fetches the page.
  5. nginx cannot save the result, because the target path is /page/hello/.
    nginx does create the /page/hello/ directory inside the proxy_store
    directory, though.

     location / {
             proxy_pass http://somewhere.com/;
             proxy_set_header X-Real-IP $remote_addr;
             proxy_store /home/arcade/www2/$uri#;
             proxy_set_header Host $host;
     }

The trick is to append some symbol to every path, so that any URL maps
to a valid file name. If we pick a symbol that can never occur in a URI,
we can use it as an escape symbol.

Works for me.

On Mon, Jan 28, 2008 at 10:55:19AM +0200, Volodymyr K. wrote:

The trick is to append some symbol to every path, so that any URL maps
to a valid file name. If we pick a symbol that can never occur in a URI,
we can use it as an escape symbol.

Works for me.

I do not think that http://domain.com/page/hello/%23 is a “nice URL”.

On Mon, 2008-01-28 at 12:36 +0300, Igor S. wrote:

The trick is to append some symbol to every path, so that any URL maps
to a valid file name. If we pick a symbol that can never occur in a URI,
we can use it as an escape symbol.

Works for me.

I do not think that http://domain.com/page/hello/%23 is a “nice URL”.

But how will the browser ever see the %23 in this case? I assume it
doesn’t, so does it make any difference how “nice” it is?

Regards,
Cliff

I actually tested both methods presented above, and got them both to
work.

########
Method 1 was copied nearly verbatim from Igor’s example.
Thanks Igor!

    location / {
      root /var/www/data/example.com/fetch;
      error_page 404 = /fetch$uri;
    }

    location  ^~ /fetch/ {
      internal;
      include  proxy.conf;
      proxy_set_header  X-Real-IP $remote_addr;
      proxy_pass http://127.0.0.1:9001;
      proxy_store /var/www/data/example.com$uri;
      proxy_store_access user:rw group:rw all:r;
    }

    location ~ /$ {
      index   index.html;
      root /var/www/data/example.com/fetch;
      error_page 403 404 = @fetch;
    }

    location @fetch {
      internal;
      include  proxy.conf;
      proxy_set_header  X-Real-IP $remote_addr;
      proxy_pass http://127.0.0.1:9001;
      proxy_store /var/www/data/example.com/fetch${uri}index.html;
      proxy_store_access user:rw group:rw all:r;
    }

########
Method 2 required some minor adjustment.
Thanks Volodymyr!

    location / {
      root /var/www/data/example.com/fetch;
      if (-f $request_filename/_cache.html) {
          rewrite (.*) $1/_cache.html;
      }
      error_page 403 404 = /fetch$uri;
    }

    location  ^~ /fetch/ {
      internal;
      include  proxy.conf;
      proxy_set_header  X-Real-IP $remote_addr;
      proxy_pass http://127.0.0.1:9001;
      proxy_store /var/www/data/example.com${uri}_cache.html;
      proxy_store_access user:rw group:rw all:r;
    }

########

I am not sure which method is better. The first method produces a
proper site structure that you could tar up and move to another box,
or serve from another web server, and it would still be functional.

The second method is shorter and simpler, but it is tied more
closely to custom rewrite rules and an odd file extension.

For now I think I will go with the first method, because its output
is more portable.
I can rsync the output around if I want to.

Thanks for all the help in this thread.
It was a nice first experience on the mailing list for me.
:)

Igor S. wrote:

The trick is to append some symbol to every path, so that any URL maps
to a valid file name. If we pick a symbol that can never occur in a URI,
we can use it as an escape symbol.

Works for me.

I do not think that http://domain.com/page/hello/%23 is a “nice URL”.

This seems to work anyway. The file is stored on the local disk, and
in the remote logs I see the 200 responses turn into 304. The browser is
unaffected and keeps displaying the original URL.