LiveJournal has a public feed where voyeurs and the merely bored can watch all the data posted to the site. One page I know of that uses this feed shows you the last 40 images posted to LJ. It's not for the faint of heart; people post some truly unpleasant stuff. But, by and large, they post things they simply want to share with others, and much of that is amusing even without the context of the poster.
While I don't think I'm particularly voyeuristic, I do enjoy opening a closed box for the simple joy of discovery as much as anyone. I found myself visiting the page enough that I needed to know: how'd they do that?
Turns out there's an image-only XML page on LJ, meaning the RSS feed doesn't even have to be parsed. Making an image viewer should be easy.
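I haven't seen formal documentation for the page, but judging from the attribute names the scripts below pull out of it, each entry presumably looks something like this (the URLs here are placeholders):

```xml
<recent-image img="http://example.com/some-picture.jpg"
              url="http://www.livejournal.com/users/someuser/"/>
```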
My first thought was AJAX. Alas, the XMLHttpRequest object doesn't like to span domains, so the idea of a standalone browser page is shot. I could proxy the feed in, but at that point I'm writing a server-side script anyway, so...
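For what it's worth, the proxy route wouldn't take much: a tiny CGI wrapper that re-serves the feed from your own domain would let a same-origin XMLHttpRequest read it. A sketch, assuming wget and a CGI-enabled server (the function name is mine, and I haven't run this against LJ):

```shell
#!/bin/sh
# Hypothetical CGI proxy sketch: re-serve the LJ feed from this host so a
# browser-side XMLHttpRequest can fetch it without crossing domains.
serve_feed() {
    echo "Content-type: text/xml"
    echo
    # Pass the feed body straight through; -q keeps wget's chatter out of it.
    wget -q -O - http://www.livejournal.com/stats/latest-img.bml 2>/dev/null || true
}
serve_feed
```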
I wanted Perl, but the XML library dependencies were torturing me. PHP's libraries seemed more standard and did what I needed. So, the PHP 5 version looks like so:
<?php
$xsltProc = new XsltProcessor();
$xml = DOMDocument::loadXML( file_get_contents("http://www.livejournal.com/stats/latest-img.bml") );
$xsl = new DomDocument;
$xsl->loadXML('<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/"><xsl:apply-templates select="//recent-image"/></xsl:template>
<xsl:template match="recent-image"></xsl:template>
<xsl:template match="recent-image[count(preceding::recent-image)&lt;20]">
<p><a href="{@url}"><img src="{@img}" alt="feed"/></a></p>
</xsl:template>
</xsl:stylesheet>');
$xsltProc->importStyleSheet($xsl);
?>
<html><head><title>Live Journal Image Feed</title></head><body>
<h1>Live Journal Image Feed</h1>
<?php
print $xsltProc->transformToXML($xml);
unset($xsltProc);
?>
</body></html>
Of course, that's the PHP 5 version; the PHP 4 version that the Red Hat box supports is different and ultimately lacks some libraries in the base. As I started searching for RPMs, I got annoyed. Why, after all this time, is XML support so damn sketchy? I should be able to do this with a shell script...
So, mind filled with *nix propaganda, I got this:
#!/bin/sh
echo Content-type: text/html
echo
echo "<html><head><title>Live Journal Image Feed</title></head>"
echo "<body><h1>Live Journal Image Feed</h1>"
wget -q http://www.livejournal.com/stats/latest-img.bml -O - |
grep '<recent-image' |
head -n 45 |
sed 's/^.*recent-image img=.\(.*\). url=.\(.*\). .*/\1\t\2/' |
awk 'BEGIN { FS="\t" } { print "<p><a href=\"" $2 "\"><img src=\"" $1 "\" alt=\"feed\"/></a></p>" }'
echo "</body></html>"
I have no doubt a script guru will explain how this and more could be done in awk alone, but my awk is rusty and Perl had already pissed me off. The regex is awful, because the command line didn't want another single quote, but it mostly works.
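For the record, the standard contortion for smuggling a single quote into a single-quoted command is to close the quote, emit an escaped quote, and reopen (the `'\''` dance), which would let the sed pattern match the attribute quotes directly instead of using `.`:

```shell
# The '\'' trick: end the quoted string, add a literal ', start quoting again.
# The sed program below is s/don't/do not/ once the shell is done with it.
echo "don't panic" | sed 's/don'\''t/do not/'
# prints: do not panic
```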
Still, it's brittle, making assumptions about the XML data that may not hold. Strangely, there is a program called xsltproc that comes with a standard parsing library and runs fine on most distros. Its flaw is that it won't process stdin, but I can live with that.
#!/bin/sh
echo Content-type: text/html
echo
TMPFILE=`mktemp /var/tmp/ljfeed.XXXXXXXXXX` || exit 1
wget -q http://www.livejournal.com/stats/latest-img.bml -O $TMPFILE
xsltproc --nonet --nowrite ljfeed.xsl $TMPFILE
rm -f $TMPFILE
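One small hardening worth noting: with rm at the end, the temp file lingers if xsltproc dies first. An EXIT trap cleans up no matter how the script leaves. A sketch of the same script, reordered (the fetch and transform are commented out so the skeleton stands alone):

```shell
#!/bin/sh
# Same script, with trap-based cleanup: the temp file is removed even if
# wget or xsltproc fails partway through.
TMPFILE=$(mktemp /var/tmp/ljfeed.XXXXXXXXXX) || exit 1
trap 'rm -f "$TMPFILE"' EXIT
echo "Content-type: text/html"
echo
# wget -q http://www.livejournal.com/stats/latest-img.bml -O "$TMPFILE"
# xsltproc --nonet --nowrite ljfeed.xsl "$TMPFILE"
```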
It works well. It originally used the XSL style sheet I embedded in the first script, but I modified that sheet a little to suit. It now looks like so:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html><head><title>LJ Image Feed</title></head>
<body>
<h1>Live Journal Image Feed</h1>
<a name="top"/>
<hr/>
<xsl:apply-templates select="//recent-image"/>
<hr/>
</body>
</html>
</xsl:template>
<xsl:template match="recent-image"></xsl:template>
<xsl:template match="recent-image[count(preceding::recent-image)&lt;20]">
<p>
<a href="{@url}">
<img src="{@img}" alt="feed"/>
</a>
<br/>
<a href="#top" onclick="history.go(0)">Refresh</a>
</p>
</xsl:template>
</xsl:stylesheet>
Well, that's my lazy Saturday of scripting. Funny how the simple problems can spawn so many answers. Or, how, with such problems we're often driven to ferret out those answers.