URL decode rewrite handling: + vs. %20

Ubuntu Natty
nginx/0.8.54
PHP 5.3.5 / FPM / FastCGI

I’m just beginning to work with nginx. I’m converting my home server
(very few hits) as an experiment, with an eye toward broader-scale
testing and implementation on work servers (billions of hits per
month).

Everything went beautifully… until I began working on an older app:
Gallery 2. Some of the photo filenames in my Gallery contain spaces, so
they’re encoded in the old-but-still-accepted + format. However, when I
make these requests through nginx, it passes the path info through to
FastCGI as an escaped + (%2b) instead of a space (+ or %20).

The path is something like this:

http://www.example.com/gallery/foo+bar.jpg.html
which gets rewritten to
http://www.example.com/gallery/main.php?g2_path=foo+bar.jpg.html

In a path, + should be handled literally per the URI RFC (RFC 3986). In
a query string, it’s supposed to be interpreted as a space per
application/x-www-form-urlencoded. This is the disparity: what was once
path is now query string. Because of the conversion, Apache translates
the + from a literal string to an encoded space during the rewrite.
nginx simply encodes the literal string.
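
To make the two decoding rules concrete, here is a minimal PHP sketch. It relies on PHP's built-in rawurldecode() (percent-escapes only, which matches how a path is treated) and urldecode() (form-style decoding, where + also becomes a space). The variable names are only for illustration, and the filename is the one from the example URL above.

```php
<?php
// Path rules: "+" is an ordinary character; only %XX escapes are decoded.
$asPath = rawurldecode('foo+bar.jpg.html');

// Form-encoding rules (query strings): "+" is decoded to a space.
$asQuery = urldecode('foo+bar.jpg.html');

// An escaped plus (%2b) decodes to a literal "+" under both rules.
$escapedPlus = urldecode('foo%2bbar.jpg.html');

echo $asPath, "\n";      // foo+bar.jpg.html
echo $asQuery, "\n";     // foo bar.jpg.html
echo $escapedPlus, "\n"; // foo+bar.jpg.html
```

So the same characters mean "foo bar" or "foo+bar" depending on which part of the URL they sit in, which is why moving them from the path into the query string during a rewrite is ambiguous.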

I don’t know if there’s a solid answer as to which one is strictly
“correct”… but what I do need to know is whether there’s a way to
achieve the same behavior in nginx.

This is an interesting edge case…

Posted at Nginx Forum:

Could you edit the PHP script to do something like this before the rest
of the script runs?

$_GET['g2_path'] = urldecode($_GET['g2_path']);

splitice Wrote:

edit the php script to do something like this
before the rest of the script

$_GET['g2_path'] = urldecode($_GET['g2_path']);

?

That’s not the problem: query parameters are automatically decoded by
PHP before processing even begins. nginx is turning a + (space) into a
%2b (plus). Once the request is handed off to the application, the
application can’t tell the difference between a filename that actually
had a + and one that was converted by nginx. The conversion needs to
happen before the request is handed off to FastCGI.
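
A small PHP sketch of why this can't be untangled inside the application. The two query strings are assumptions based on the behavior described in this thread (nginx sending %2b, Apache sending +), and parse_str() stands in for the decoding PHP performs when it fills $_GET.

```php
<?php
// Assumed query string from nginx for the legacy URL /gallery/foo+bar.jpg.html
parse_str('g2_path=foo%2bbar.jpg.html', $fromNginx);

// Assumed query string from Apache for the same request
parse_str('g2_path=foo+bar.jpg.html', $fromApache);

// Decoding a second time turns the literal "+" into a space...
$decodedTwice = urldecode($fromNginx['g2_path']);

echo $fromNginx['g2_path'], "\n";  // foo+bar.jpg.html
echo $fromApache['g2_path'], "\n"; // foo bar.jpg.html
echo $decodedTwice, "\n";          // foo bar.jpg.html

// ...but a file whose name really contains "+" would arrive as the same
// "foo+bar.jpg.html" string, so the second decode would corrupt that name.
```

The extra urldecode() does recover the legacy URLs, but only by giving up the ability to serve names that genuinely contain a plus sign.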

I’ve already modified G2 to present URLs which are more accurately
encoded, but that doesn’t resolve the issue since these URLs have been
published for years. They’re in search engines, blog posts, etc.

Posted at Nginx Forum:

You can also use PCRE named captures in the virtual host configuration
to capture both SCRIPT_FILENAME and PATH_INFO without URL encoding:

# Requests that end at the script itself
location ~ ^(?<SCRIPT_FILENAME>.+\.php)$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$SCRIPT_FILENAME;
    fastcgi_pass …;
}

# Requests with extra path info after the script name
location ~ ^(?<SCRIPT_FILENAME>.+\.php)(?<PATH_INFO>.+)$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$SCRIPT_FILENAME;
    fastcgi_param PATH_INFO $PATH_INFO;
    fastcgi_param PATH_TRANSLATED $document_root$PATH_INFO;
    fastcgi_pass phpfarm;
}
