My blog software

Published: 2012-03-28T21:13Z
Tags: blog
License: CC-BY

In this post I will explain the software behind my blog, since several visitors have asked about it. But I will have to disappoint those of you hoping for fancy Haskell code: it is written in PHP. So no pandoc, no hakyll, no happstack and no Yesod.

Some reasons for picking PHP are:

From files to blog posts

The code is inspired by blosxom, which I used before. Blog posts, as well as most other pages on the website, are stored in text files. The script blog.php calls a function Resolver::find_all_pages, which does a simple directory listing to find all the .txt and .lhs files in the blog subdirectory.
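The directory listing step is small enough to sketch. The blog does this in PHP inside Resolver::find_all_pages; the version below is a rough Python equivalent, where the function signature and the sort order are assumptions made for illustration.

```python
from pathlib import Path

# Extensions named in the post; everything else in the directory is ignored.
PAGE_EXTENSIONS = {".txt", ".lhs"}


def find_all_pages(blog_dir):
    """Return the page files directly inside blog_dir, sorted by name."""
    return sorted(
        path
        for path in Path(blog_dir).iterdir()
        if path.is_file() and path.suffix in PAGE_EXTENSIONS
    )
```

Because the listing is computed from the directory on each call, publishing a post really is nothing more than copying a file into place.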

For me this makes it very easy to write a new blog post. I just fire up a text editor, and save the file under the right name. I can then view it on localhost. To publish to the rest of the internet I copy the file to the webserver.

The parser for the text files is also quite simple. First comes a header block, with lines like

title: My blog software
tags: blog, bananas
date: 2012-03-28 23:13 CEST

After that is the page body, using a markup syntax strongly inspired by MediaWiki.

-- This is a header --
Body text with ''italic'', @embedded haskell@, <a href="#">embedded html</a>

> haskell_code = 1 + 1
>   where more haskell = undefined

]> also Haskell, but ignored by the literate haskell preprocessor
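Splitting a file into its header block and body, as described above, might look roughly like this in Python. It is only a sketch: the name parse_page is hypothetical, and it assumes that the header ends at the first blank line or at the first line without a colon, which the post does not spell out.

```python
def parse_page(text):
    """Split page source into (header dict, body string)."""
    header = {}
    lines = text.splitlines()
    index = 0
    # Header: consecutive "key: value" lines at the top of the file.
    while index < len(lines) and lines[index].strip():
        key, sep, value = lines[index].partition(":")
        if not sep:
            break  # not a "key: value" line, so the body starts here
        header[key.strip().lower()] = value.strip()
        index += 1
    # Skip the blank line(s) separating header and body.
    while index < len(lines) and not lines[index].strip():
        index += 1
    return header, "\n".join(lines[index:])
```

Splitting on the first colon only means that values such as dates, which contain colons themselves, come through intact.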

To parse the markup, I use a state machine. I loop over the lines, and determine the line's type. For example, lines starting with "> " indicate Haskell code, -- (.*) -- indicates a header, etc. I then just output the appropriate html code for that line. The state machine comes in when merging multiple lines of code into one <pre> tag. I just keep a variable with the last used open tag. If the previous line uses the same open tag, then do nothing, otherwise insert the close tag for that state, and the open tag for the new state. The details are in WikiFormat.php.
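To make the state machine concrete, here is a minimal Python sketch of the idea. It is not the code from WikiFormat.php: the line types, the choice of <h2>, <pre> and <p> tags, and the names classify and render are all assumptions, and inline markup such as ''italic'' is left out.

```python
import html
import re

# (open tag, close tag) for line types that span several lines.
# Which tags the real code emits is an assumption.
TAGS = {
    "code": ("<pre>", "</pre>"),
    "text": ("<p>", "</p>"),
}


def classify(line):
    """Return (line type, html for that single line)."""
    if line.startswith("> "):
        return "code", html.escape(line[2:])
    match = re.fullmatch(r"-- (.*) --", line)
    if match:
        return "header", "<h2>" + html.escape(match.group(1)) + "</h2>"
    if not line.strip():
        return "blank", ""
    return "text", html.escape(line)


def render(lines):
    out = []
    state = None  # line type whose open tag was emitted last
    for line in lines:
        kind, content = classify(line)
        if kind != state:
            # Type changed: close the old block, open the new one.
            if state in TAGS:
                out.append(TAGS[state][1])
            if kind in TAGS:
                out.append(TAGS[kind][0])
            state = kind
        if content:
            out.append(content)
    if state in TAGS:
        out.append(TAGS[state][1])  # close whatever is still open
    return "\n".join(out)
```

The only state carried between lines is the type of the previous line, which is what lets consecutive "> " lines end up inside a single <pre> block.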

Then the source code itself. It needs to be turned into fancy syntax highlighted html. That is done with a simple hand-written lexer. Lexing is surprisingly easy if you have access to a built-in regular expression library. Just repeatedly look for the first match after the current index for a set of possible token regexes.
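The "first match after the current index" loop can be sketched in a few lines of Python. The token classes and regexes here are made up for illustration and are much cruder than what a real Haskell highlighter needs; lex is a hypothetical name.

```python
import re

# Illustrative token classes only, not the blog's actual set.
TOKEN_PATTERNS = [
    ("comment", re.compile(r"--.*")),
    ("keyword", re.compile(r"\b(?:where|let|in|case|of)\b")),
    ("number", re.compile(r"\d+")),
    ("name", re.compile(r"[A-Za-z_][A-Za-z0-9_']*")),
]


def lex(source):
    """Return (token class, text) pairs; text matching no pattern gets None."""
    tokens = []
    pos = 0
    while pos < len(source):
        best = None
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.search(source, pos)
            # Earliest start wins; on a tie the pattern listed first wins.
            if match and match.end() > match.start():
                if best is None or match.start() < best[1].start():
                    best = (kind, match)
        if best is None:
            tokens.append((None, source[pos:]))  # nothing left to match
            break
        kind, match = best
        if match.start() > pos:
            tokens.append((None, source[pos:match.start()]))
        tokens.append((kind, match.group()))
        pos = match.end()
    return tokens
```

Turning the result into highlighted html is then a matter of wrapping each classified token in a <span> with a matching class and escaping the text.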

There are some backdoors in the lexer, to allow arbitrary html inside code blocks, so

]> !!!<span style="background:red">wrong</span>!!!

gets rendered as

wrong (displayed with a red background)

This sometimes comes in handy when writing blog posts. Usually I add these backdoors as they are needed.

One issue is that all this on-the-fly parsing can be a bit slow. For that I use a cache. I just capture the entire rendered page, and save it in a file. Then before rendering, and in fact before even loading the file, I check if the cache is up to date. If it is, output the cache contents and exit.
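A sketch of such a cache check in Python follows. How the blog decides that the cache is up to date is not stated in this post; comparing file modification times is an assumption here, and the names cache_is_fresh and get_page are hypothetical.

```python
import os


def cache_is_fresh(source_path, cache_path):
    """True if the cache file exists and is at least as new as the source."""
    try:
        return os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    except OSError:
        return False  # a missing file counts as a stale cache


def get_page(source_path, cache_path, render):
    """Return rendered html, reusing the cached copy when it is fresh."""
    if cache_is_fresh(source_path, cache_path):
        # Cheap path: the source file is never opened or parsed.
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    with open(source_path, encoding="utf-8") as f:
        page = render(f.read())
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(page)
    return page
```

A real setup would also have to invalidate the cache when something other than the source file changes, for example when a new comment arrives or the rendering code itself is edited.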

Finally the comments, which again use a simple hand-written solution. I just store the comments for each post in a single text file. New comments are appended at the end. The comments use the same markup parser as the article bodies. The most annoying part of the comment system is actually the spam filter. I have a blacklist of words and URLs that are not allowed, and a script for retroactively removing spam posts. But some spam does get through.
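As a rough illustration of the append-only comment file with a blacklist check, consider the Python sketch below. The blacklist entries, the on-disk format and the function names are placeholders invented for this example, not the blog's real ones.

```python
# Placeholder blacklist entries; the real list is not shown in the post.
BLACKLIST = ["cheap pills", "spam.example"]


def is_spam(comment_text, blacklist=BLACKLIST):
    """True if any blacklisted word or URL fragment occurs in the text."""
    lowered = comment_text.lower()
    return any(term.lower() in lowered for term in blacklist)


def add_comment(comments_path, author, body):
    """Append a comment to the post's comment file unless it looks like spam."""
    if is_spam(author + "\n" + body):
        return False
    # One text file per post; new comments go at the end (format is assumed).
    with open(comments_path, "a", encoding="utf-8") as f:
        f.write("author: " + author + "\n" + body + "\n\n")
    return True
```

A plain substring blacklist like this is easy to maintain but also easy to evade, which fits the observation that some spam still gets through.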

That's it. The code is on GitHub, if anyone is interested.

Comments

Steven Shaw
Date: 2012-04-02T01:15Z

Your feed appears to be broken:

Notice: Undefined property: DirFeed::$show_comments in /home/deb40374/domains/twanvl.nl/public_html/lib/Cache.php on line 23

Warning: Cannot modify header information - headers already sent by (output started at /home/deb40374/domains/twanvl.nl/public_html/lib/Cache.php:23) in /home/deb40374/domains/twanvl.nl/public_html/lib/Feed.php on line 37

http://twanvl.nl/blog 2012-03-28T21:13:00Z Twan van Laarhoven blog@twanvl.nl http://twanvl.nl/blog/2012-03-28-blog-software 2012-03-28T21:13:00Z

In this post I will explain the software behind my blog, since several visitors have asked about it. But I will have to disappoint those of you hoping for fancy Haskell code: it is written in PHP. So no pandoc, no hakyll, no happstack and no Yesod.

Some reasons for picking php are:

  • PHP works pretty m...
