Dumbest guestbook known to man

After years of being blissfully ignorant about this page, I finally decided to do something about it. I totally rewrote the ABOUT page, which felt to me like a bad attempt at corpo speech, I brought back and updated the LINKS, and I implemented a GUEST BOOK as well as this BLOG page. As I write these lines, the BLOG is not really something I would consider to be in a functional state...more like good enough for the time being. As for the GUEST BOOK...I figured I might as well explain here how it works. If not for others, then at least for me to remember how that pile of PHP spaghetti code works. I'm going to assume that the reader is familiar with at least the most basic principles of HTML and PHP (basic forms, the difference between POST and GET, PHP syntax, ...), but don't expect much, because the last time I touched anything with PHP was in high school, which was over 6 years ago. I also won't go into CSS and website styling, as that's not something I feel comfortable with at this moment.

What really is a guestbook (philosophical, I know)? To answer that, we can pull from the most trustworthy source, the Wikipedia page:

A guestbook (also guest book, visitor log, visitors' book, visitors' album) is a paper or electronic means for a visitor to acknowledge a visit to a site, physical or web-based, and leave details such as their name, postal or electronic address and any comments. Such paper-based ledgers or books are traditional in churches, at weddings, funerals, B&Bs, museums, schools, institutions and other private facilities open to the public.

[Image: Fancy guestbook] How it feels to sign my name to a particularly inspiring website.

Personally I would add sick webpages to that list, but that's just me. From this we can conclude that an electronic guestbook should provide the means to capture a name, an electronic address, and a comment (collectively called a signature from here on), and also to record and persist this signature, indefinitely if possible. It should also prevent the user from recording more than a single signature, either indefinitely or within a set timeout. In the implementation on this site, I chose to record a nickname, an optional website, and a message. The user can submit this information once every week. Why a week? Because I came up with it on the spot.

So how is this information captured? With some tragically unorganized PHP/HTML code. The HTML page has the following form:

<form action="guestbook.php" method="POST">
    <label for="name">Nickname</label>
    <input type="text" id="name" name="name">
    <label for="website">Website (optional)</label>
    <input type="text" id="website" name="website">
    <label for="content">Message</label>
    <textarea id="content" name="content"></textarea>
    <input type="submit">
</form>

No rocket science to be found here, just a plain ol' HTML POST form sending a request with the information to guestbook.php. So what does the PHP script do? For that I need to zoom out a little bit and explain the general method of persistence and spam prevention in this system. If I asked a typical developer how to implement simple persistence in their website, they might answer with a database like SQLite or MySQL, or perhaps something more complicated and modern like MongoDB. And how about spam prevention? Cloudflare or Google captcha would probably be the first ideas, both of which make my hair stand on end. None of these were used here, however. The guestbook writes all the signatures and user information into plain XML files, which are queried using XPath.

The site keeps two XML files for persistence and spam prevention: guestbook.xml and history.xml. The first one, guestbook.xml, acts as a record of all the messages, A.K.A. an XML database. Here's a slightly altered snippet of the file with my first inserted signature:

<?xml version="1.0"?>
<signatures>
    ...
    <signature id="0">
        <author>omicron</author>
        <content>Shweet</content>
        <date>2025-07-28 21:02:28</date>
        <web>https://omicronsetup.eu</web>
    </signature>
    ...
</signatures>
XML really fits everything
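
To make the file format a bit more concrete, here is a minimal sketch of how one such signature could be appended with SimpleXML. This is not the site's actual code: the function name appendSignature, the rule that the next id equals the current number of signatures, and the in-memory document are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: appends one <signature> to a <signatures> document.
// Assumption: ids start at 0 and simply count upwards.
function appendSignature(
    SimpleXMLElement $signatures,
    string $author,
    string $content,
    string $date,
    string $web = ''
): SimpleXMLElement {
    // Count the existing entries before adding the new one.
    $nextId = count($signatures->signature);

    $entry = $signatures->addChild('signature');
    $entry->addAttribute('id', (string) $nextId);

    // Property assignment lets SimpleXML escape characters such as '&' itself.
    $entry->author = $author;
    $entry->content = $content;
    $entry->date = $date;
    if ($web !== '') {
        $entry->web = $web; // the website is optional
    }

    return $entry;
}

// Usage with an empty in-memory document instead of the real guestbook.xml.
$signatures = new SimpleXMLElement('<signatures/>');
appendSignature($signatures, 'omicron', 'Shweet', '2025-07-28 21:02:28', 'https://omicronsetup.eu');
echo $signatures->asXML();
```

In a real script the document would presumably be loaded with simplexml_load_file() and written back with asXML('guestbook.xml'), but that part is left out here to keep the sketch free of file access.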

One may notice that the file contains a list of XML-encoded signatures, each with fields corresponding to the author, content, date, and an optional website URL. This is exactly what the guestbook.php script writes into the file when a user sends the request. That is, of course, if the script allows it, which in this context means that the user's last signature is more than one week old. So how do we check this in a way that's (hopefully) secure, but also dumb enough to be written by me? One might suggest going through all the signatures with the provided name, getting the latest date, and confirming that the last message was not within this timespan. That would be nice, except for the fact that each user can enter whatever name they wish, and I would not like to spend my evening clearing messages from john1, john2, john3, etc. So how do we identify a user without the user's intervention? By IP of course. The file history.xml contains exactly that: the IP of the user, and a timestamp of their last signature. Below is another snippet of this file, with the obfuscated IP that was recorded on my first signature:

<?xml version="1.0"?>
<history>
    ...
    <event>
        <timestamp>1753729348</timestamp>
        <ip>XX.XX.XX.XX</ip>
    </event>
    ...
</history>

The ip element of the XML file is fairly self-explanatory, but to some the timestamp may seem confusing. It is recorded as UNIX epoch time, which is the number of seconds since 1970-01-01 00:00:00 UTC. Why is it stored here in UNIX epoch time and above as a normal date string? Because I came up with it on the spot...but also because I find it easier and (in my mind) more efficient to check whether or not the user is within the timespan by finding their record in the XML file, adding 604800 to it (the number of seconds in a week), and checking whether this value is greater than the current UNIX epoch time. That's pretty much the entire spam prevention: the PHP script writes down the last signature timestamp under the IP, and checks against it if the user would like to post another one later. One might argue that using a proxy or a VPN would completely negate this protection...and of course they would be absolutely right! I just trust that people wouldn't go to such lengths, but if I suddenly notice a lot of random signatures being thrown at the site, the first thing I'm doing is mass-blocking public proxies and VPNs.
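
The weekly check described above boils down to a single comparison. Here is a minimal sketch of it; the function name isInCooldown and its parameters are made up for this example and are not taken from the real script.

```php
<?php
// Hypothetical helper, not the site's actual code.
// Returns true while the visitor is still inside the cooldown window.
function isInCooldown(int $lastTimestamp, int $now, int $cooldown = 604800): bool
{
    // Same comparison as in the text: last signature + one week > current time.
    return $lastTimestamp + $cooldown > $now;
}

$last = 1753729348; // timestamp from the history.xml snippet

var_dump(isInCooldown($last, $last + 3600));   // one hour later: bool(true)
var_dump(isInCooldown($last, $last + 604800)); // exactly a week later: bool(false)
echo gmdate('Y-m-d H:i:s', $last), " UTC\n";   // 2025-07-28 19:02:28 UTC
```

The last line also shows that the epoch value decodes to 19:02:28 UTC, which lines up with the 21:02:28 in guestbook.xml if that date string was written in a UTC+2 local time (my assumption). In the live script the current time would come from time().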

I talked a lot about finding some records inside these XML files, but how exactly do I go about that? As mentioned previously, the script uses something called XPath, which is a simple language for querying XML documents. As it turns out, PHP's simple XML class, subtly named SimpleXML, provides an interface for XPath 1.0. The following example is a query that finds the history.xml record with the user's IP:

$ip = $_SERVER["REMOTE_ADDR"];
...
$userHistory = $history->xpath("/history/event/ip[text() = \"".$ip."\"]/..");

The script discovers the user's IP from $_SERVER["REMOTE_ADDR"], and slots it into the XPath query field. Because PHP needs to escape certain characters and because the way it concatenates strings is horrendous to look at, we can imagine that the IP is 127.0.0.1, in which case the XPath would resolve to: /history/event/ip[text() = "127.0.0.1"]/..

XPath walks through the XML tree by looking up elements separated by the '/' character. In the example, this means starting at the root /history (of which there is just one), then going through each /event and finding an /ip element that fits the [text() = "127.0.0.1"] constraint. This constraint goes through the IPs of each event and selects every one whose content matches the provided IP. That alone would return the ip element itself, but we want the entire event entry. This can be accessed by going one level up with the /.., much like in a typical filesystem path, hence the name XPath, as we form a path to the desired data.
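
To try the query without touching the real files, here is a small self-contained sketch. The sample document, the function name lastSignatureTimestamp, and the extra filter_var() sanity check are my own additions for illustration, and the sketch assumes there is at most one event per IP.

```php
<?php
// Hypothetical helper: looks up the last signature timestamp for an IP,
// or returns null when the IP has no record yet.
function lastSignatureTimestamp(SimpleXMLElement $history, string $ip): ?int
{
    // Defensive check so that only a well-formed IP ends up inside the query.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return null;
    }

    // Same query as in the text: match the <ip>, then step up to its <event>.
    $events = $history->xpath('/history/event/ip[text() = "' . $ip . '"]/..');
    if (empty($events)) {
        return null; // no match (or the query failed)
    }

    // Assumption: one event per IP, so the first match is the one we want.
    return (int) $events[0]->timestamp;
}

// In-memory stand-in for history.xml with made-up addresses.
$history = new SimpleXMLElement(
    '<history>'
    . '<event><timestamp>1753729348</timestamp><ip>127.0.0.1</ip></event>'
    . '<event><timestamp>1753800000</timestamp><ip>192.0.2.7</ip></event>'
    . '</history>'
);

var_dump(lastSignatureTimestamp($history, '127.0.0.1'));   // int(1753729348)
var_dump(lastSignatureTimestamp($history, '203.0.113.9')); // NULL
```

The same event could also be selected without the trailing /.. by putting the condition on the event itself, as in /history/event[ip = "127.0.0.1"]. Both forms are valid XPath 1.0; which one reads better is a matter of taste.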

So the script is able to persist signatures by writing to and reading from one XML file, and check for previous user interaction by writing to and reading from another. That's pretty much the entire logic, except for a few little details that crop up naturally as the system is written. One of those is the ability to respond to the user with useful error messages in case something goes wrong, in situations like exceeding the weekly signature limit, not including a mandatory field, or exceeding the length of one of the fields. In each of those cases the correct response needs to be displayed. This is communicated by redirecting back to the original GUEST BOOK page and setting a GET field err to an error message that the site writes down. Nothing complicated. In PHP this can be done by calling the header function inside a die call, which changes the Location header, sets the GET field, and stops the script. Below is an example of code that returns an error message when the maximum number of characters is exceeded in any of the fields:

// urlencode() escapes the message for use in a URL. htmlentities() is meant for HTML,
// and on PHP 8.1+ it turns the apostrophe into &#039;, whose '&' and '#' cut the parameter short.
if(strlen($nickname) > 25){
    die(header("Location: guests.xhtml?err=".urlencode("Nickname can't exceed 25 characters.")));
}
if(strlen($message) > 250){
    die(header("Location: guests.xhtml?err=".urlencode("Message can't exceed 250 characters.")));
}
if(strlen($website) > 100){
    die(header("Location: guests.xhtml?err=".urlencode("Website link can't exceed 100 characters.")));
}
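
The length checks above can also be separated from the redirect, which makes them easy to try out on their own. The following is only a sketch: validateSignature and errorRedirectUrl are hypothetical names, the wording of the mandatory-field message is invented, and rawurlencode() is my choice for escaping the query-string value.

```php
<?php
// Hypothetical helper: returns an error message, or null when the input is fine.
// The limits (25, 250, 100) are the ones used in the checks above.
function validateSignature(string $nickname, string $message, string $website): ?string
{
    if ($nickname === '' || $message === '') {
        return 'Nickname and message are required.'; // invented wording
    }
    // strlen() counts bytes, as in the original checks.
    if (strlen($nickname) > 25) {
        return "Nickname can't exceed 25 characters.";
    }
    if (strlen($message) > 250) {
        return "Message can't exceed 250 characters.";
    }
    if (strlen($website) > 100) {
        return "Website link can't exceed 100 characters.";
    }
    return null;
}

// Hypothetical helper: builds the redirect target carrying the error message.
function errorRedirectUrl(string $error): string
{
    // rawurlencode() escapes spaces, quotes, '&' and '#' for use in a URL.
    return 'guests.xhtml?err=' . rawurlencode($error);
}

$error = validateSignature(str_repeat('a', 26), 'Shweet', '');
if ($error !== null) {
    echo errorRedirectUrl($error), "\n";
    // guests.xhtml?err=Nickname%20can%27t%20exceed%2025%20characters.
}
```

In the actual script the resulting URL would go into header('Location: ...') followed by stopping the script, exactly as in the snippet above.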

Another very important thing to remember is XSS prevention, because we take and display user-provided information. XSS, or Cross-Site Scripting, is a type of web application attack that enables an attacker to inject scripts, HTML, or otherwise harmful custom data, which is then executed in another user's browser when they open the page. There are two kinds of XSS to protect against here: Stored XSS and DOM Based XSS, the first of which is more serious than the other. Stored XSS happens when the page displays a stored data field that may contain harmful JavaScript (as if there's a JS that isn't harmful) or other executable data. This is obviously the case with all the signature data fields, which are displayed back to the user on the site itself. To protect against unsolicited JS execution, the provided data can be passed through a filtering function before it is shown back to the user. In this case, the PHP function htmlspecialchars is used. It converts the characters that have a special meaning in HTML (&, <, >, and quotes) into HTML entities, which can be displayed without any execution. The following example shows the conversion of an infamous XSS testing script into an HTML displayable form:

php > var_dump(htmlspecialchars("<script>alert(1)</script>"));
string(37) "&lt;script&gt;alert(1)&lt;/script&gt;"

The encoded entities &lt; and &gt; are shown as < and > when rendering the page, thus preventing the execution of the script (check out the source of this sentence to see it firsthand).
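
Here is a rough sketch of how a stored signature could be turned into HTML with this kind of escaping. The markup, the helper names esc, isHttpLink and renderSignature, and the http/https-only rule for the website link are all assumptions for the example. The link check is there because htmlspecialchars alone does not stop a javascript: URL from ending up inside an href attribute.

```php
<?php
// Hypothetical helper: escapes text for use in HTML content and attributes.
function esc(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical helper: accepts only http and https links.
function isHttpLink(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), ['http', 'https'], true);
}

// Hypothetical renderer: the markup is invented, not the site's actual layout.
function renderSignature(string $author, string $content, string $web = ''): string
{
    $name = esc($author);
    if ($web !== '' && isHttpLink($web)) {
        // Only link the name when the website uses an allowed scheme.
        $name = '<a href="' . esc($web) . '">' . $name . '</a>';
    }
    return '<p><b>' . $name . '</b>: ' . esc($content) . '</p>';
}

echo renderSignature('omicron', '<script>alert(1)</script>', 'https://omicronsetup.eu'), "\n";
// <p><b><a href="https://omicronsetup.eu">omicron</a></b>: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Escaping at output time like this keeps the XML file holding the raw text, so the same data could later be shown in another format without double-encoding.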

The second type of XSS is DOM Based XSS, which is much less serious, but still important to prevent. It only allows the attacker to execute their code by altering the DOM of the page directly. This typically requires them to craft the GET parameters of a URL, which is then sent to the victim. Once this URL is opened, the GET parameter alters the page in a harmful way, executing the script and making our day worse. The prime candidate for this here is of course the GET parameter returned from the script containing error messages. Preventing this is exactly the same as with the first XSS case, that is, filtering the rendered string into a form that can't be executed by the browser. Try clicking on this link to check it out: guests.xhtml?err=<script>alert(1)</script>.

I think this covers all of the things I wanted to write about. I'm curious how durable this system is and for how long it will be able to remain online. If you somehow managed to read all the way down here and still haven't signed your name, give it a try! Feel free to write what's on your mind while you're at it!