A brief update on my web programming project.
I have preferred to create online text by editing simple text files; so I only need a text editor and an FTP client as management tools. My ‘old’ personal and business web pages are currently created dynamically in the following way:
[Code for including a script (including other scripts)]
[Content of the article in plain HTML = inner HTML of content div]
[Code for writing footer]
The main script(s) create layout containers, meta tags, navigation menus etc.
Meta information about pages or about the whole site is kept in CSV text files. There are e.g. files with tables…
- … listing all pages in each site and their attributes – such as title, keywords, or hover texts for navigation links – or
- … tabulating all main properties of all web sites – such as ‘tag lines’ or the name of the CSS file.
A bunch of CSV files / tables can be accessed like a database by defining the columns in a schema.ini file, and using a text driver (on my Windows web server). I am running SQL queries against these text files, and it would be simple to migrate my CSV files to a grown-up database. But I tacked on RSS feeds later; these XML files are hand-crafted and basically a parallel ‘database’.
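For reference, a schema.ini of the sort the text driver expects could look like the fragment below. The file and column names are just the examples mentioned above, not my actual files:

```ini
[pages.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=title Text
Col2=keywords Text
Col3=hovertext Text
```

With such a file in place next to the CSVs, each CSV behaves like a table, and a query like SELECT title FROM pages can be run through the text driver.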
This CSV file database is not yet what I mean by flat-file database: In my new site the content of a typical ‘article file’ should be plain text, free from code. All meta information will be included in each file, instead of putting it into the separate CSV files. A typical file would look like this:
title: Some really catchy title
headline: Some equally catchy, but a bit longer headline
date_created: 2015-09-15 11:42
date_changed: 2015-09-15 11:45
author: elkement
[more properties and meta tags]
content: Text in plain HTML.
The logic for creating formatted pages with header, footer, menus etc. has to be contained in code separate from these files, and the text files need to be parsed for meta data and content. The set of files has effectively become ‘the database’, the plain text content being just one of many attributes of a page. Folder structure and file naming conventions are part of the ‘database logic’.
I figured this was all an unprofessional hack until I found many so-called flat-file / database-less content management systems on the internet, intended to be used with smaller sites. They comprise some folders with text files, named according to a pre-defined schema, plus parsing code that extracts meta data from the files’ contents.
Motivated by that find, I created the following structure in VB.NET from scratch:
- Retrieving a set of text files from the file system based on search criteria – e.g. for creating the menu from all pages, or for finding the one specific file that represents the current page – current as per the URL the user entered.
- Code for parsing a text file for lines having a [name]: [value] structure
- Processing the ‘nice’ URL entered by the user to make the web server pick the correct text file.
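The parsing step can be sketched in a few lines. This is Python rather than my VB.NET code, and the file layout is simply the example shown above (header lines of the form ‘name: value’, with everything after the ‘content:’ line treated as the article body):

```python
import re

def parse_article(text):
    """Split an article file into a dict of name/value meta data
    plus the remaining plain-HTML content."""
    meta = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        m = re.match(r"^(\w+):\s*(.*)$", line)
        if m is None:
            continue  # not a 'name: value' line
        name, value = m.group(1), m.group(2)
        if name == "content":
            # everything from here on belongs to the article body
            meta["content"] = "\n".join([value] + lines[i + 1:]).strip()
            break
        meta[name] = value
    return meta

article = """title: Some really catchy title
author: elkement
content: <p>Text in plain HTML.</p>"""

print(parse_article(article)["title"])  # Some really catchy title
```

A real parser has to be a bit more defensive – e.g. about colons appearing inside the HTML content – which is why the ‘content:’ marker terminates the meta data section here.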
Speaking about URLs, so-called ASP.NET Routing came in handy: Before, I had used a few folders whose default page redirects to an existing page (such as /heatpump/ redirecting to /somefolder/heatpump.asp). Otherwise my URLs all corresponded to existing single files.
I use a typical blogging platform’s schema with the new site: If a user enters
the server accesses a text file whose name contains language and year, such as:
… and displays the content at the nice URL.
‘Language’ is part of the URL: If a user with a German browser explicitly accesses a URL starting with /en/, the language is effectively set to English. However, if the main page is hit, I detect the language from the header sent by the client.
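That decision logic can be sketched as follows – again in Python, assuming the client header in question is the standard Accept-Language header, and ignoring the q-weights a full parser would honor:

```python
def pick_language(url_path, accept_language, supported=("en", "de"), default="en"):
    """An explicit /en/ or /de/ prefix in the URL wins; otherwise
    fall back to the client's Accept-Language header."""
    first = url_path.strip("/").split("/")[0]
    if first in supported:
        return first
    # e.g. "de-DE,de;q=0.9,en;q=0.8" -> try each language range in listed order
    for part in accept_language.split(","):
        code = part.split(";")[0].strip().split("-")[0].lower()
        if code in supported:
            return code
    return default

print(pick_language("/en/2015/09/some-post", "de-DE,de;q=0.9"))  # en
print(pick_language("/", "de-DE,de;q=0.9"))                      # de
```

So a German browser hitting the main page gets German, but the same browser following an /en/ link stays on the English pages.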
I am not overly original: I use two categories of content – posts and pages – corresponding to text files organized in two different folders in the file system, and following different conventions for file names. Learning from my experience with hand-crafted menu pages in this blog here, I added:
- A summary text included in the file, to be displayed in a list of posts per category.
- A list of posts in a single category, displayed on the category / menu page.
The category is assigned to the post simply as part of the file name; moving a post to another category is done by renaming it.
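As a small illustration of the rename-equals-recategorize idea – with an entirely hypothetical naming convention, since the post does not spell out the real one:

```python
from pathlib import Path

# Hypothetical convention: <language>_<date>_<category>_<slug>.txt
def category_of(filename):
    """Read the category straight from the file name."""
    return Path(filename).stem.split("_")[2]

def move_to_category(filename, new_category):
    """'Moving' a post to another category is just computing a new name."""
    parts = Path(filename).stem.split("_")
    parts[2] = new_category
    return "_".join(parts) + Path(filename).suffix

print(category_of("en_2015-09-15_physics_heat-pump.txt"))  # physics
print(move_to_category("en_2015-09-15_physics_heat-pump.txt", "software"))
# en_2015-09-15_software_heat-pump.txt
```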
Since I found that having to add my Google+ posts to just a single Collection was a nice exercise, I deliberately limit myself to one category per post.
Having built all the required search patterns and functions – for creating lists of posts, menus, or recent posts, or for extracting information from specific pages such as the current page or its counterpart in the other language – I realized that I needed a clear-cut separation between a high-level query for a set of attributes of any files meeting some criteria, and the lower level doing the search, file retrieval, and parsing.
So why not use genuine SQL commands at the top level – to be translated into file searches and file content parsing on the lower level?
I envisaged building the menu of all pages e.g. by executing something like
SELECT title, url, headline FROM pages WHERE isMenu=TRUE
and creating the list of recent posts on the home page by running
SELECT * FROM posts WHERE date_created > [some date]
This would also allow for a smooth migration to an actual relational database system if the performance of the file-based database turned out not to be that great after all.
I underestimated the effort of ‘building your own database engine’, but the main logic is finally done. My file system recordset class has this functionality (and I think I finally got the hang of classes and objects):
- Parse a SQL string to check if it is well-formed.
- Split it into pieces and translate pieces to names of tables (from FROM) and list of fields (from SELECT and WHERE).
- For each field, check (against my schema) if the field is encoded in the file’s name or if it is part of the name / value attributes in the file contents.
- Build a file search pattern string with * at the right places from the file name attributes.
- Get the list of files meeting this part of the WHERE criteria.
- Parse the contents of each file and exclude those not meeting the ‘content fields’ criteria specified in the WHERE clause.
- Stuff all attributes specified in the SELECT statement into a table-like structure (a DataTable in .NET) and return a recordset object that can be queried and handled like recordsets returned by standard database queries – that is: check for End Of File, MoveNext, or return the value of a specific cell in a column with a specific name.
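The steps above can be compressed into a Python sketch. My implementation is VB.NET, and the schema dictionary, the file naming convention, and the equality-only WHERE handling here are all simplifying assumptions – a toy version of the pipeline, not the real engine:

```python
import re
from pathlib import Path

# Hypothetical schema: which fields live in the file name (by position),
# everything else is a name/value pair inside the file.
SCHEMA = {
    "posts": {
        "name_fields": ["language", "date", "category", "slug"],
        "folder": "posts",
    }
}

def parse_query(sql):
    """Steps 1+2: check the SQL string is well-formed and split it into
    field list, table name, and WHERE conditions (equality only)."""
    m = re.match(r"SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(.+))?$", sql, re.I)
    if m is None:
        raise ValueError("malformed query: " + sql)
    fields = [f.strip() for f in m.group(1).split(",")]
    table = m.group(2)
    where = {}
    if m.group(3):
        for cond in m.group(3).split(" AND "):
            name, value = cond.split("=", 1)
            where[name.strip()] = value.strip().strip("'")
    return fields, table, where

def run_query(sql, root="."):
    fields, table, where = parse_query(sql)
    schema = SCHEMA[table]
    name_fields = schema["name_fields"]
    # Steps 3+4: build a search pattern, with * where a name field is unconstrained
    pattern = "_".join(where.get(f, "*") for f in name_fields) + ".txt"
    rows = []
    for path in Path(root, schema["folder"]).glob(pattern):
        # name-encoded attributes come for free from the file name
        record = dict(zip(name_fields, path.stem.split("_")))
        # Steps 5+6: parse the contents, drop files failing the rest of WHERE
        for line in path.read_text().splitlines():
            if ":" in line:
                name, value = line.split(":", 1)
                record[name.strip()] = value.strip()
        if all(record.get(k) == v for k, v in where.items()):
            # Step 7: keep only the SELECTed attributes
            rows.append(record if fields == ["*"]
                        else {f: record.get(f) for f in fields})
    return rows
```

A query like SELECT title FROM posts WHERE language='en' then turns into the search pattern en_*_*_*.txt, and only the content-level conditions still require parsing each candidate file – which is exactly the split between ‘file name attributes’ and ‘content fields’ described above.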
Now I am (re-)creating all collections of pages and posts using my personal SQL engine. In parallel, I am manually sifting through old content and turning my web pages into articles. To do: the tag cloud and handling tags in general, and the generation of the RSS XML file from the database.
The new site is not publicly available yet. At the time of writing of this post, all my sites still use the old schema.
- I don’t claim this is the best way to build a web site / blog. It’s also a fun project – done for the sake of having fun with developing it, exploring the limits of flat-file databases, and forcing myself to deal with potential performance issues.
- It is a deliberate choice: My hosting space allows for picking from different well-known relational databases, and I have done a lot of SQL Server programming in other projects in the past months.
- I have a licence of Visual Studio. Using only a text editor instead is a deliberate choice, too.