

Started: 2013-03-10 20:17:40

Submitted: 2013-03-10 20:44:38

Visibility: World-readable

In which the intrepid narrator does his part to keep the Internet free of the scourge of link rot

I don't believe in breaking links (although sometimes it feels like I'm the only person on the Internet who believes this), so when Kiesa migrated from Serendipity to Wordpress I resolved to set up a robust set of redirects. I couldn't use the Redirect directive directly because Serendipity generated urls that put the per-entry arguments in the query string (in the form "index.php?/archives/1234-something.html"), and Redirect matches only the url path, never the query string. So I ended up using mod_rewrite, which let me match the query string inside a RewriteCond statement. Here's the relevant part of my Apache configuration for redirecting the RSS and Atom feeds (inside a VirtualHost):

RewriteEngine   on

# First, a rule to redirect the feeds (both rss and atom) to the RSS
# feed provided by Wordpress
RewriteCond     %{QUERY_STRING} ^/feeds/
RewriteRule     ^/serendipity/index\.php http://kiesa.festing.org/wordpress/feed/? [R=301,L]
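
To see why the path and the query string have to be matched separately, it can help to split one of the old urls by hand. This is only an illustrative Python sketch, not part of the server setup, and the host name example.org is a placeholder.

```python
from urllib.parse import urlsplit

# Hypothetical host; the path and query follow the form described above.
old_url = "http://example.org/serendipity/index.php?/archives/1234-something.html"

parts = urlsplit(old_url)
url_path = parts.path       # the part a RewriteRule pattern is matched against
query_string = parts.query  # the part %{QUERY_STRING} holds in a RewriteCond

print(url_path)
print(query_string)
```

Everything after the "?" lands in the query string, which is why the RewriteCond tests %{QUERY_STRING} while the RewriteRule pattern only ever sees /serendipity/index.php.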

Kiesa used RSS to import all of her posts from Serendipity to Wordpress, and Wordpress picked up the guids used by Serendipity (which intelligent RSS readers will use to uniquely identify content), so once I set up the redirect Google Reader picked up the new feed, used the guids to tell what was old and what was new, and showed me only the new content in the feed.

Apache's default redirect code is 302, moved temporarily; since this is a permanent move I elected to use code 301, moved permanently, just in case an RSS reader wanted to update its link, but so far the web logs suggest that Google Reader is still fetching the old url and being redirected.

Redirecting the individual blog posts was a bit more complicated; their urls were in the form "index.php?/archives/1234-something.html", so I needed first to match the post id in the query string, then look up the old post id in some sort of database to resolve the new url. I am apparently not the first person to ever have this problem, so Apache has a handy directive, RewriteMap, that lets me import a space-separated text file (one key and one value per line) and use it as a database. Wordpress discarded the old post ids, but I was able to match them to the new urls using the new Wordpress RSS feed -- which (you may recall) kept the old guids, which conveniently used the post id. So I wrote a Perl script to parse the RSS output, cross-reference the old guids to the new urls, and write a text file that Apache could load with the RewriteMap directive. I matched the post id in a RewriteCond directive and looked up the new url in the RewriteRule directive, using "%1" to refer back to the group captured by the regular expression in the RewriteCond directive; if the id isn't in the map, the lookup falls back to the url after the "|".

# Next, a rule to redirect individual posts from Serendipity to
# Wordpress. This is accomplished with a text file that maps the
# Serendipity ids to the new Wordpress urls.
RewriteMap      ids txt:/home/kiesa/serendipity-id-map.txt
RewriteCond     %{QUERY_STRING} archives/(\d+)
RewriteRule     ^/serendipity/index\.php ${ids:%1|http://kiesa.festing.org/wordpress/}? [R=301,L]
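
The Perl script that produced the map file isn't reproduced here. As a rough idea of what such a script does, here is a minimal sketch in Python instead. It assumes each guid in the Wordpress feed embeds the old id in the form "archives/<id>" and that each item's link element holds the new url; the sample feed, its urls, and the function names are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: each guid embeds the old Serendipity id as "archives/<id>".
GUID_ID = re.compile(r"archives/(\d+)")

# Hypothetical two-item feed, standing in for the real Wordpress RSS output.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <link>http://example.org/wordpress/hypothetical-second-post/</link>
    <guid isPermaLink="false">http://example.org/serendipity/index.php?/archives/102-guid.html</guid>
  </item>
  <item>
    <link>http://example.org/wordpress/hypothetical-first-post/</link>
    <guid isPermaLink="false">http://example.org/serendipity/index.php?/archives/7-guid.html</guid>
  </item>
</channel></rss>"""


def build_id_map(rss_xml):
    """Return {old_id: new_url} for every item whose guid carries an id."""
    root = ET.fromstring(rss_xml)
    id_map = {}
    for item in root.iter("item"):
        guid = item.findtext("guid", default="")
        link = item.findtext("link", default="").strip()
        match = GUID_ID.search(guid)
        if match and link:
            id_map[match.group(1)] = link
    return id_map


def format_map(id_map):
    """Render the mapping as "key value" lines, one pair per line."""
    pairs = sorted(id_map.items(), key=lambda pair: int(pair[0]))
    return "".join(f"{old_id} {url}\n" for old_id, url in pairs)


map_text = format_map(build_id_map(SAMPLE_FEED))
print(map_text, end="")
```

Saving that output to the file named in the RewriteMap directive gives Apache one "id url" pair per line to look up with ${ids:%1}.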

The last rule was easy: Redirect everything else (that didn't match any other rule) straight to the root of the Wordpress install.

# Finally, a default rule for everything else; just redirect to the
# root of the Wordpress site
RewriteRule     ^/serendipity/index\.php http://kiesa.festing.org/wordpress/? [R=301,L]
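
Because the three rules are tried in order and each one ends processing when it matches, the order matters. The following Python sketch mimics that cascade so the expected targets can be checked without touching the server. It is a simplified model rather than a reimplementation of mod_rewrite, and the example.org host and the map entry are hypothetical.

```python
import re
from urllib.parse import urlsplit

WORDPRESS_ROOT = "http://kiesa.festing.org/wordpress/"


def resolve(old_url, id_map):
    """Return the redirect target the three rules would pick, or None."""
    parts = urlsplit(old_url)
    # All three rules share the same RewriteRule pattern on the url path.
    if not re.search(r"^/serendipity/index\.php", parts.path):
        return None
    # Rule 1: feeds go to the Wordpress feed.
    if re.search(r"^/feeds/", parts.query):
        return WORDPRESS_ROOT + "feed/"
    # Rule 2: individual posts, with the map lookup and its default.
    match = re.search(r"archives/(\d+)", parts.query)
    if match:
        return id_map.get(match.group(1), WORDPRESS_ROOT)
    # Rule 3: everything else goes to the root of the Wordpress site.
    return WORDPRESS_ROOT


# Hypothetical map entry standing in for the generated text file.
sample_map = {"1234": WORDPRESS_ROOT + "hypothetical-post/"}
base = "http://example.org/serendipity/index.php"

for query in ("/feeds/index.rss2", "/archives/1234-something.html",
              "/archives/9999-missing.html", "/categories/1-misc"):
    print(query, "->", resolve(base + "?" + query, sample_map))
```

An id that is missing from the map and a url that matches neither of the first two rules both end up at the Wordpress root, just by different routes.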

I'm not sure anyone else really cares, but I at least know I did my part to prevent link rot -- and got to dig into some neat features of Apache I didn't know existed.