How to do substitutions (like Perl's s/// operator) in rewrites?

I recently migrated a web site from Apache to Nginx (and from CMS
x to CMS y).
Almost all of the rewrites work in the new CMS, except for one older
class of article URLs, for example:
http://example.com/News/Articlepage-News/This-is-the-best-News-EVER.htm

The direct mapping from these literal URLs to the new URLs is lost.
I do however have a file (200k+ entries) that maps the titles to
article IDs, like so:

old-class-urls.txt (the commented lines are not actually in the file):

    # all lowercase title without spaces and hyphens    article-id
    thisisthebestnewsever 123456;

In Apache I used a RewriteMap to rewrite these URLs:

httpd.conf:

    RewriteMap articles prg:/etc/httpd/rewrites/old-class-article-urls.pl

old-class-article-urls.pl:

    #!/usr/bin/perl

    $| = 1;    # unbuffered output, needed for a prg: RewriteMap

    ###############################################
    # code to be executed at startup of webserver
    ###############################################

    # NOTE: the file name was lost from the original post; this path is a placeholder
    open(TEXTFILE, "<", "/path/to/old-class-urls.txt") or die "cannot open map file: $!";
    @lines = <TEXTFILE>;
    close TEXTFILE;

    foreach $line (@lines) {    # load the data in an associative array for fast lookup
        ($keyword, $article_id) = split(/\s+/, $line);
        $article_id =~ s/;$//;  # drop the trailing ';' of the map file format, if present
        $keys{$keyword} = $article_id;
    }

    ###################################################
    # code to be executed once for every requested URL
    ###################################################

    while (<STDIN>) {
        $url = $_;
        chomp($url);
        if ($url =~ m{/([^/]+)\.htm$}) {    # a match could be made
            $keyword = lc($1);
            $keyword =~ s/-//g;
            if ($keys{$keyword}) {
                print 'old-class-url.php?articleid=' . $keys{$keyword} . "\n";
                next;
            }
        }
        print "Not found\n";    # no match could be made
    }

I was hoping something similar or even easier could be done with Nginx:

    map $uri $old_class_url {
        include /etc/nginx/rewrites/old-class-urls.txt;
    }

    server {

        location ~* ^/News/Articlepage-News/.*htm {
            rewrite ^/News/Articlepage-News/(.*)htm $1;

            # change $uri to lowercase and remove the hyphens…
            # I am looking for something equivalent to this in Perl:
            #   s/-//g;
            #   s/.*/\L{$1}/;

            if ($old_class_url) {
                rewrite ^ /old-class-url.php?articleid=$old_class_url permanent;
            }
        }
    }

I have seen questions about similar functionality in the forums, but
none with a solution.

Is it possible to solve this issue this way, or do you recommend a
different solution?

Thanks in advance,
Bart

Posted at Nginx Forum:

hi

the map module is case insensitive, it uses ngx_hash_strlow to produce
the key internally so this is not a problem

for the hyphens problem i would generate a map file with two keys for
each article id, one with and one without hyphens, like:

    title-1 1;
    title1 1;

cheers, bernd


Hello!

On Sun, Jan 10, 2010 at 09:08:48AM -0500, bartschipper wrote:


> […]
>
> I am looking for something equivalent like in perl:
>
> s/-//g;
> s/.*/\L{$1}/;

There is no easy way to do this without perl as of now. With
embedded perl it’s trivial though.

Maxim D.

Thank you for your responses!

I think Maxim is right and embedded Perl is the way to go.

Bernd’s idea would work if there was an easy way to generate a map file
with hyphens from the source file.
However the source file has 200k+ entries that look like this:

thisisthebestnewsever 123456;
expertsadvicetoeatmorefish 123457;

I cannot think of an automated way to convert this to:

this-is-the-best-news-ever 123456;
thisisthebestnewsever 123456;
experts-advice-to-eat-more-fish 123457;
expertsadvicetoeatmorefish 123457;

Knowing that the map module is case-insensitive may save me from
confusion in the future.

Thanks again,
Bart


Issue solved!

I followed Maxim’s advice and compiled embedded perl in. However, I
found the configuration not as trivial as Maxim claimed: embedded perl
documentation is scarce and examples are rare.

The following configuration solved it for me and it may serve as an
example for others of using the map module together with embedded perl:

    http {

        # use the map module to include a list with keys and corresponding
        # article IDs (see the first post for a sample of the contents)
        map $uri $old_class_url {
            include /etc/nginx/rewrites/old-class-urls.txt;
        }

        # lowercase the URI's file name part and remove dashes with Perl;
        # return the result in the nginx variable $old_uri
        perl_set $old_uri 'sub {
            my $r = shift;
            my $uri = $r->uri;
            if ($uri =~ m{/([^/]+)\.htm$}) {
                $uri = lc($1);
                $uri =~ s/-//g;
            }
            return $uri;
        }';

        server {

            location ~* ^/News/Articlepage-News/.*htm$ {

                # rewrite $uri to the value of $old_uri
                rewrite ^ $old_uri;

                # redirect with the article ID if the rewritten $uri appears in the map
                if ($old_class_url) {
                    rewrite ^ /old-class-url.php?articleid=$old_class_url permanent;
                }
            }

        }
    }

I welcome any suggestion for a simpler solution.

Bart Schipper
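
One possible simplification, offered only as an untested sketch: since the map module looks up whatever variable it is given, the map could be keyed directly on the variable computed by perl_set, which would make the intermediate "rewrite ^ $old_uri" step unnecessary. The variable names $old_key and $old_article_id are made up for this illustration, and the sketch assumes nginx is built with the embedded Perl module and that the map file has the format shown in the first post.

```nginx
http {

    # Normalized title: file name part of the URI, lowercased, hyphens removed.
    # Returns an empty string when the URI does not end in ".htm".
    # ($old_key is a hypothetical name.)
    perl_set $old_key 'sub {
        my $r = shift;
        if ($r->uri =~ m{/([^/]+)\.htm$}) {
            my $key = lc($1);
            $key =~ s/-//g;
            return $key;
        }
        return "";
    }';

    # Look up the normalized title instead of $uri.
    # ($old_article_id is a hypothetical name; it stays empty when no key matches.)
    map $old_key $old_article_id {
        default "";
        include /etc/nginx/rewrites/old-class-urls.txt;
    }

    server {

        location ~* ^/News/Articlepage-News/.*\.htm$ {
            # Redirect only when the title was found in the map.
            if ($old_article_id) {
                rewrite ^ /old-class-url.php?articleid=$old_article_id permanent;
            }
        }
    }
}
```

With this layout a request whose title is not in the map keeps its original URI, so it falls through to whatever the location would normally do (for example a 404) rather than being rewritten to the bare normalized title.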
