Internet of Things. Yet Another Gloomy Post.

Technically, I work with Things, as in the Internet of Things.

As outlined in Everything as a Service many formerly ‘dumb’ products – such as heating systems – become part of service offerings. A vital component of the new services is the technical connection of the Thing in your home to that Big Cloud. It seems every energy-related system has got its own Internet Gateway now: Our photovoltaic generator has one, our control unit has one, and the successor of our heat pump would have one, too. If vendors don’t bundle their offerings soon, we’ll end up with substantial electricity costs for powering a lot of separate gateways.

Experts have warned for years that the Internet of Things (IoT) comes with security challenges. Many Things’ owners still keep default or blank passwords, but the most impressive threat is my opinion is not hacking individual systems: Easily hacked things can be hijacked to serve as zombie clients in a botnet and lauch a joint Distributed Denial of Service attack against a single target. Recently the blog of renowned security reporter Brian Krebs has been taken down, most likely as an act of revenge by DDoSers (Crime is now offered as a service as well.). The attack – a tsunami of more than 600 Gbps – was described as one of the largest the internet had seen so far. Hosting provider OVH was subject to a record-breaking Tbps attack – launched via captured … [cue: hacker movie cliché] … cameras and digital video recorders on the internet.

I am about the millionth blogger ‘reporting’ on this, nothing new here. But the social media news about the DDoS attacks collided with another social media micro outrage  in my mind – about seemingly unrelated IT news: HP had to deal with not-so-positive reporting about its latest printer firmware changes and related policies –  when printers started to refuse to work with third-party cartridges. This seems to be a legal issue or has been presented as such, and I am not interested in that aspect here. What I find interesting is the clash of requirements: After the DDoS attacks many commentators said IoT vendors should be held accountable. They should be forced to update their stuff. On the other hand, end users should remain owners of the IT gadgets they have bought, so the vendor has no right to inflict any policies on them and restrict the usage of devices.

I can relate to both arguments. One of my main motivations ‘in renewable energy’ or ‘in home automation’ is to make users powerful and knowledgable owners of their systems. On the other hand I have been ‘in security’ for a long time. And chasing firmware for IoT devices can be tough for end users.

It is a challenge to walk the tightrope really gracefully here: A printer may be traditionally considered an item we own whereas the internet router provided by the telco is theirs. So we can tinker with the printer’s inner workings as much as we want but we must not touch the router and let the telco do their firmware updates. But old-school devices are given more ‘intelligence’ and need to be connected to the internet to provide additional services – like that printer that allows to print from your smartphone easily (Yes, but only if your register it at the printer manufacturer’s website before.). In addition, our home is not really our castle anymore. Our computers aren’t protected by the telco’s router / firmware all the time, but we work in different networks or in public places. All the Things we carry with us, someday smart wearable technology, will check in to different wireless and mobile networks – so their security bugs should better be fixed in time.

If IoT vendors should be held accountable and update their gadgets, they have to be given the option to do so. But if the device’s host tinkers with it, firmware upgrades might stall. In order to protect themselves from legal persecution, vendors need to state in contracts that they are determined to push security updates and you cannot interfere with it. Security can never be enforced by technology only – for a device located at the end user’s premises.

It is horrible scenario – and I am not sure if I refer to hacking or to proliferation of even more bureaucracy and over-regulation which should protect us from hacking but will add more hurdles for would-be start-ups that dare to sell hardware.

Theoretically a vendor should be able to separate the security-relevant features from nice-to-have updates. For example, in a similar way, in smart meters the functions used for metering (subject to metering law) should be separated from ‘features’ – the latter being subject to remote updates while the former must not. Sources told me that this is not an easy thing to achieve, at least not as easy as presented in the meters’ marketing brochure.

Linksys's Iconic Router

That iconic Linksys router – sold since more than 10 years (and a beloved test devices of mine). Still popular because you could use open source firmware. Something that new security policies might seek to prevent.

If hardware security cannot be regulated, there might be more regulation of internet traffic. Internet Service Providers could be held accountable to remove compromised devices from their networks, for example after having noticed the end user several times. Or smaller ISPs might be cut off by upstream providers. Somewhere in the chain of service providers we will have to deal with more monitoring and regulation, and in one way or other the playful days of the earlier internet (romanticized with hindsight, maybe) are over.

When I saw Krebs’ site going offline, I wondered what small business should do in general: His site is now DDoS-protected by Google’s Project Shield, a service offered to independent journalists and activists after his former pro-bono host could not deal with the load without affecting paying clients. So one of the Siren Servers I commented on critically so often came to rescue! A small provider will not be able to deal with such attacks.

WordPress.com should be well-protected, I guess. I wonder if we will all end up hosting our websites at such major providers only, or ‘blog’ directly to Facebook, Google, or LinkedIn (now part of Microsoft) to be safe. I had advised against self-hosting WordPress myself: If you miss security updates you might jeopardize not only your website, but also others using the same shared web host. If you live on a platform like WordPress or Google, you will complain from time to time about limited options or feature updates you don’t like – but you don’t have to care about security. I compare this to avoiding legal issues as an artisan selling hand-made items via Amazon or the like, in contrast to having to update your own shop’s business logic after every change in international tax law.

I have no conclusion to offer. Whenever I read news these days – on technology, energy, IT, anything in between, The Future in general – I feel reminded of this tension: Between being an independent neutral netizen and being plugged in to an inescapable matrix, maybe beneficial but Borg-like nonetheless.

Have I Seen the End of E-Mail?

Not that I desire it, but my recent encounters of ransomware make me wonder.

Some people in say, accounting or HR departments are forced to use e-mail with utmost paranoia. Hackers send alarmingly professional e-mails that look like invoices, job applications, or notifications of postal services. Clicking a link starts the download of malware that will encrypt all your data and ask for ransom.

Theoretically you could still find out if an e-mail was legit by cross-checking with open invoices, job ads, and expected mail. But what if hackers learn about your typical vendors from your business website or if they read your job ads? Then they would send plausible e-mails and might refer to specific codes, like the number of your job ad.

Until recently I figured that only medium or larger companies would be subject to targeted attacks. One major Austrian telco was victim of a Denial of Service attacked and challenged to pay ransom. (They didn’t, and were able to deal with the attack successfully.)

But then I have encountered a new level of ransomware attacks – targeting very small Austrian businesses by sending ‘expected’ job applications via e-mail:

  • The subject line was Job application as [a job that had been advertised weeks ago at a major governmental job service platform]
  • It was written in flawless German, using typical job applicant’s lingo as you learn in trainings.
  • It was addressed to the personal e-mail of the employee dealing with applications, not the public ‘info@’ address of the business
  • There was no attachment – so malware filters could not have found anything suspicious – but only a link to a shared cloud folder (‘…as the attachments are too large…’) – run by a a legit European cloud company.
  • If you clicked the link (which you should not so unless you do this on a separate test-for-malware machine in a separate network) you saw a typical applicant’s photo and a second file – whose name translated to JobApplicationPDF.exe.

Suspicious features:

  • The EXE file should have triggered red lights. But it is not impossible that a job application creates a self-extracting archive, although I would compare that to wrapping your paper application in a box looking like a fake bomb.
  • Google’s Image Search showed that the photo has been stolen from a German photographer’s website – it was an example for a typical job applicant’s photo.
  • Both cloud and mail service used were less known ones. It has been reported that Dropbox had removed suspicious files so it seemed that attackers tuned to alternative services. (Both mail and cloud provider reacted quickly and sht down the suspicious accounts)
  • The e-mail did not contain a phone number or street address, just the pointer to the cloud store: Possible but weird as an applicant should be eager to encourage communications via all channels. There might be ‘normal’ issues with accessing a cloud store link (e.g. link falsely blocked by corporate firewall) – so the HR department should be able to call the applicant.
  • Googling the body text of the e-mail gave one result only – a new blog entry of an IT professional quoting it at full length. The subject line was personalized to industry sector and a specific job ad – but the bulk of the text was not.
  • The non-public e-mail address of the HR person was googleable as the job ad plus contact data appeared on a job platform in a different language and country, without the small company’s consent of course. So harvesting both e-mail address and job description automatically.

I also wonder if my Everything as a Service vision will provide a cure: More and more communication has been moved to messaging on social networks anyway – for convenience and avoiding false negative spam detection. E-Mail – powered by old SMTP protocol with tacked on security features, run on decentralized mail servers – is being replaced by messaging happening within a big monolithic block of a system like Facebook messaging. Some large employer already require their applications to submit their CVs using their web platforms, as well as large corporations demand that their suppliers use their billing platform instead of sending invoices per e-mail.

What needs to be avoided is downloading an executable file and executing it in an environment not controlled by security policies. A large cloud provider might have a better chance to enforce security, and viewing or processing an ‘attachment’ could happen in the provider’s environment. As an alternative all ‘our’ devices might be actually be part of a service and controlled more tightly by centrally set policies. Disclaimer: Not sure if I like that.

Iconic computer virus - from my very first small business website in 1997. Image credits mine.

(‘Computer virus’ – from my first website 1997. Credits mine)

 

Anniversary 4 (4 Me): “Life Ends Despite Increasing Energy”

I published my first post on this blog on March 24, 2012. Back then its title and tagline were:

Theory and Practice of Trying to Combine Just Anything
Physics versus engineering
off-the-wall geek humor versus existential questions
IT versus the real thing
corporate world’s strangeness versus small business entrepreneur’s microcosmos knowledge worker’s connectedness versus striving for independence

… which became

Theory and Practice of Trying to Combine Just Anything
I mean it

… which became

elkemental Force
Research Notes on Energy, Software, Life, the Universe, and Everything

last November. It seems I have run out of philosophical ideas and said anything I had to say about Life and Work and Culture. Now it’s not Big Ideas that make me publish a new post but my small Big Data. Recent posts on measurement data analysis or on the differential equation of heat transport  are typical for my new editorial policy.

Cartoonist Scott Adams (of Dilbert fame) encourages to look for patterns in one’s life, rather than to interpret and theorize – and to be fooled by biases and fallacies. Following this advice and my new policy, I celebrate my 4th blogging anniversary by crunching this blog’s numbers.

No, this does not mean I will show off the humbling statistics of views provided by WordPress 🙂 I am rather interested in my own evolution as a blogger. Having raked my virtual Zen garden two years ago I have manually maintained lists of posts in each main category – these are my menu pages. Now I have processed each page’s HTML code automatically to count posts published per month, quarter, or year in each category. All figures in this post are based on all posts excluding reblogs and the current post.

Since I assigned two categories to some posts, I had to pick one primary category to make the height of one column reflect the total posts per month:Statistics on blog postings: Posts per month in each main category

It seems I had too much time in May 2013. Perhaps I needed creative compensation – indulging in Poetry and pop culture (Web), and – as back then I was writing a master thesis.

I had never missed a single month, but there were two summer breaks in 2012 and 2013 with only 1 post per month. It seems Life and Web gradually have been replaced by Energy, and there was a flash of IT in 2014 which I correlate with both nostalgia but also a professional flashback owing to lots of cryptography-induced deadlines.

But I find it hard to see a trend, and I am not sure about the distortion I made by picking one category.

So I rather group by quarter:

Statistics on blog postings: Posts per quarter in each main category

… which shows that posts per quarter have reached a low right now in Q1 2016, even when I would add the current posting. Most posts now are based on original calculations or data analysis which take more time to create than search term poetry or my autobiographical vignettes. But maybe my anecdotes and opinionated posts had just been easy to write as I was drawing on ‘content’ I had in mind for years before 2012.

In order to spot my ‘paradigm shifts’ I include duplicates in the next diagram: Each post assigned to two categories is counted twice. Since then the total number does not make sense I just depict relative category counts per quarter:

Statistics on blog postings: Posts per quarter in each category, including the assignment of more than one category.

Ultimate wisdom: Life ends, although Energy is increasing. IT is increasing, too, and was just hidden in the other diagram: Recently it is  often the secondary category in posts about energy systems’ data logging. Physics follows an erratic pattern. Quantum Field Theory was accountable for the maximum at the end of 2013, but then replaced by thermodynamics.

Web is also somewhat constant, but the list of posts shows that the most recent Web posts are on average more technical and less about Web and Culture and Everything. There are exceptions.

Those trends are also visible in yearly overviews. The Decline Of Web seems to be more pronounced – so I tag this post with Web.

Statistics on blog postings: Posts per year in each main category

Statistics on blog postings: Posts per year in each category, including the assignment of more than one category.

But perhaps I was cheating. Each category was not as stable as the labels in the diagrams’ legends do imply.

Shortcut categories refer to
1) these category pages: EnergyITLifePhysicsPoetryWeb,
2) and these categories EnergyITLifePhysicsPoetryWeb, respectively, manually kept in sync.

So somehow…

public-key-infrastructure became control-and-it

and

on-writing-blogging-and-indulging-in-web-culture is now simply web

… and should maybe be called nerdy-web-stuff-and-software-development.

In summary, I like my statistics as it confirms my hunches but there is one exception: There was no Poetry in Q1 2016 and I have to do something about this!

________________________________

The Making Of

  • Copy the HTML content of each page with a list to a text editor (I use Notepad2).
  • Find double line breaks (\r\n\r\n) and replace them by a single one (\r\n).
  • Copy the lines to an application that lets you manipulate strings (I use Excel).
  • Tweak strings with formulas / command to cut out date, url, title and comment. Use the HTML tags as markers.
  • Batch-add the page’s category in a new column.
  • Indicate if this is the primary or secondary category in a new column (Find duplicates automatically before so 1 can be assigned automatically to most posts.).
  • Group the list by month, quarter, and year respectively and add the counts to new data tables that will be used for diagrams (e.g. Excel function COUNTIFs, using only the category or category name  + indicator for the primary category as criteria).

It could be automated even better – without having to maintain category pages by simply using the category feeds (like this: https://elkement.wordpress.com/category/physics/feed) or by filtering the full blog feed for categories. I have re-categorized all my posts so that categories matches menu page lists, but I chose to use my lists as

  1. I get not only date and headline, but also my own additional summary / comment that’s not part of the feed. For our German blog, I actually do this in reverse: I create the HTML code of a a sitemap-style overview page on wordpress.com from an Excel list of all posts plus custom comments and then copy the auto-generated code to the HTML view of the respective menu page on the blog.
  2. the feed provided by WordPress.com can have 150 items maximum no matter which higher number you try to configure. So you need to start analyzing before you have published 150 posts.
  3. I can never resist to create a tool that manipulates text files and automates something, however weird.

Shortest Post Ever

… self-indulgent though, but just to add an update on the previous post.

My new personal website is  live:

elkement.subversiv.at

I have already redirected the root URLs of the precursor sites radices.net, subversiv.at and e-stangl.at. Now I am waiting for Google’s final verdict; then I am going to add the rewrite map for the 1:n map of old ASP files and new ‘posts’. This is also the pre-requisite for informing Google about the move officially.

The blog-like structure and standardized attributes like Open Graph meta tags and a XML sitemap should make my site more Google-likeable. With the new site – and one dedicated host name only – I finally added permanent redirects (HTTP 301). Before I used temporary (HTTP 302) redirects, to send requests from the root directory to subfolders, which (so the experts say) is not search-engine-friendly.

On the other hand the .at domain will not help: You can pick a certain country as preferred audience for a non-country domain, but I have to stick with Austria here, even if the language is set to English in all the proper places (I hope).

I have discovered that every WordPress.com Tag or Category has its own feed – just add /feed/ to the respective URLs – and I will make use this in order to automate some of my link curation, like this. This list of physics postings has been created from this feed of selected postings:
https://elkement.wordpress.com/category/science-and-technology/physics/feed/
Of course this means re-tagging and re-categorizing here! Thanks WordPress for the Tags to Categories (and vice versa) Conversion Tools!

It is fun to watch my server’s log files more closely. Otherwise I would have missed that SQL injection attack attempt, trying to put spammy links on my website (into my database):

SQL injection by spammer-hackers

Random Things I Have Learned from My Web Development Project

It’s nearly done (previous episode here).

I have copied all the content from my personal websites, painstakingly disentangling snippets of different ‘posts’ that were physically contained in the same ‘web page’, re-assigning existing images to them, adding tags, consolidating information that was stored in different places. Raking the Virtual Zen Garden – again.

New website: A 'post.'

Draft of the layout, showing a ‘post’. Left and right pane vanish in responsive fashion if the screen gets too small.

… Nothing you have not seen in more elaborate fashion elsewhere. For me the pleasure is in creating the whole thing bottom up not using existing frameworks, content management systems or templates – requiring an FTP client and a text editor only.

I spent a lot of time on designing my redirect strategy. For historical reasons, all my sites use the same virtual web server. Different sites have been separated just by different virtual directories. So in order to display the e-stangl.at content as one stand-alone website, a viewer accessing e-stangl.at is redirected to e-stangl.at/e/. This means that entering [personal.at]/[business] would result in showing the business content at the personal URL. In order to prevent this, the main page generation script used checks for the virtual directory and redirects ‘bottom-up’ to [business.at]/[business].

In the future, I am going to use a new hostname for my website. In addition, I want to have the option to migrate only some applications while keeping the others tied to the old ASP scripts temporarily. This means more redirect logic, especially as I want to test all the redirects. I have a non-public test site on the same server, but I have never tested redirects as it means creating loads of test host names; but due to the complexity of redirects to come I added names like wwwdummy for every domain, redirecting to my new main test host name, in the same way as the www URLs would redirect to my new public host name.

And lest we forget I am obsessed with keeping old URLs working. I don’t like it if websites are migrated to a new content management system, changing all the URLs. As I mentioned before, I already use ASP.NET Routing for having nice URLs with the new site: A request for /en/2014/10/29/some-post-title does not access a physical folder but the ‘flat-file database engine’ I wrote from scratch will search for the proper content text file based on a SQL string handed to it, retrieve attributes from both file name and file content, and display HTML content and attributes like title and thumbnail image properly.

New website: Flat-file database.

Flat-file database: Two folders, ‘pages’ and ‘posts’. Post file names include creation date, short relative URL and category. Using the ascx extension (actually for .NET ‘user controls’ as the web server will not return these files directly but respond with 404. No need to tweak permissions.)

The top menu, the tag cloud, the yearly/monthly/daily archives, the list of posts on the Home page, XML RSS Feed and XML sitemap  are also created by querying these sets of files.

New web site: File / database entry

File representing a post: Upper half – meta tags and attributes, lower half – after attribute ‘content’: Actual content in plain HTML.

Now I want to redirect from the old .asp files (to be deleted from the server at some point in the future) to these nice URLs. My preferred solution for this class of redirects is using a rewrite map hard-coded in the web server’s config file. From my spreadsheet documentation of the 1:n relation of old ASP pages to new ‘posts’ I have automatically created the XML tags to be inserted in the ‘rewrite map’.

Now the boring part is over and I scared everybody off (But just in case you can find more technical information on the last update on the English version of all website, e.g. here) …

… I come up with my grand insights, click-bait X-Things-You-Need-To-Know-About-Seomthing-You-Should-Not-Do-and-Could-Not-Care-Less-Style:

It is sometimes painful to read really old content, like articles, manifestos and speeches from the last century. Yet I don’t hide or change anything.

After all, this is perhaps the point of such a website. I did not go online for the interaction (of social networks, clicks, likes, comments). Putting your thoughts out there, on the internet that does never forget, is like publishing a book you cannot un-publish. It is about holding yourself accountable and aiming at self-consistency.

I am not a visual person. If I would have been more courageous I’d use plain Courier New without formatting and images. Just for the fun of it, I tested adding dedicated images to each post and creating thumbnails from them – and I admit it adds to the content. Disturbing, that is!

I truly love software development. After a day of ‘professional’ software development (simulations re physics and engineering) I am still happy to plunge into this personal web development project. I realized programming is one of the few occupations that was part of any job I ever had. Years ago, soul-searching and preparing for the next career change, I rather figured the main common feature was teaching and know-how transfer – workshops and acedemic lectures etc. But I am relieved I gave that up; perhaps I just tried to live up to the expected ideal of the techie who will finally turn to a more managerial or at least ‘social’ role.

You can always find perfect rationales for irrational projects: Our web server had been hacked last year (ASP pages with spammy links put into some folders) and from backlinks in the network of spammy links I conclude that classical ASP pages had been targeted. My web server was then hosted on Windows 2003, as this time still fully supported. I made use of Parent Paths (../ relative URLs) which might have eased the hack. Now I am migrating to ASP.NET with the goal to turn off Classical ASP completely, and I already got rid of the Parent Paths requirement by editing the existing pages.

This website and my obsession with keeping the old stuff intact reflects my appreciation of The ExistingBeing Creative With What You Have. Re-using my old images and articles feels like re-using our cellar as a water tank. Both of which are passions I might not share with too many people.

My websites had been an experiment in compartmentalizing my thinking and writing – ‘Personal’, ‘Science’, ‘Weird’, at the very beginning the latter two were authored pseudonymously – briefly. My wordpress.com blog has been one quick shot at Grand Unified Theory of my Blogging, and I could not prevent my personal websites to become more an more intertwined, too, in the past years. So finally both do reflect my reluctance of separating my personal and professional self.

My website is self-indulgent – in content and in meta-content. I realize that the technical features I have added are exactly what I need to browse my own stuff for myself, not necessarily what readers might expect or what is considered standard practice. One example is my preference for a three-pane design, and for that infinite (no dropdown-menu) archive.

New website: Category page.

Nothing slows a website down like social media integration. My text file management is for sure not the epitome of efficient programming, but I was flabbergasted by how fast it was to display nearly 150 posts at once – compared to the endless sending back and forth questionable stuff between social networks, tracking, and ad sites (watch the status bar!).

However, this gives me some ideas about the purpose of this blog versus the purpose of my website. Here, on the WordPress.com blog, I feel more challenged to write self-contained, complete, edited, shareable (?) articles – often based on extensive research and consolidation of our original(*) data (OK, there are exceptions, such as this post), whereas the personal website is more of a container of drafts and personal announcements. This also explains why the technical sections of my personal websites contain rather collections of links than full articles.

(*)Which is why I totally use my subversive sense of humour and turn into a nitpicking furious submitter of copyright complaints if somebody steals my articles published here, on the blog. However, I wonder how I’d react if somebody infringed my rights as the ‘web artist’ featured on subversiv.at.

Since 15 years I spent a lot of time on (re-)organizing and categorizing my content. This blog has also been part of this initiative. That re-organization is what I like websites and blogs for – a place to play with structure and content, and their relationship. Again, doing this in public makes me holding myself accountable. Categories are weird – I believe they can only be done right with hindsight. Now all my websites, blogs, and social media profiles eventually use the same categories which have evolved naturally and are very unlike what I might have planned ‘theoretically’.

Structure should be light-weight. I started my websites with the idea of first and second level ‘menu’s and hardly any emphasis on time stamps. But your own persona and your ideas seem to be moving targets. I started commenting on my old articles, correcting or amending what I said (as I don’t delete, see above). subversiv.at has been my Art-from-the-Scrapyard-Weird-Experiments playground, before and in addition to the Art category here and over there I enjoyed commenting in English on German articles and vice versa. But the Temporal Structure, the Arrow of Time was stronger; so I finally made the structure more blog-like.

Curated lists … were most often just ‘posts’. I started collecting links, like resources for specific topics or my own posts written elsewhere, but after some time I did not considered them so useful any more. Perhaps somebody noticed that I have mothballed and hidden my Reading list and Physics Resources here (the latter moved to my ‘science site’ radices.net – URLs do still work of course). Again: The arrow of time wins!

I loved and I cursed the bilingual nature of all my sites. Cursed, because the old structure made it too obvious when the counter-part in the other language was ‘missing’; so it felt like a translation assignment. However, I don’t like translations. I am actually not even capable to really translate the spirit of my own posts. Sometimes I feel like writing in English, sometimes I feel like writing in German. Some days or weeks or months later I feel like reflecting in the same ideas, using the other language. Now I came up with that loose connection of an English and German article, referencing each other via a meta attribute, which results in an unobtrusive URL pointing to the other version.

Quantitative analysis helps to correct distorted views. I thought I wrote ‘so much’. But the tangle of posts and pages in the old sites obscured that actually the content translates to only 138 posts in German and 78 in English. Actually, I wrote in bursts, typically immediately before and after an important change, and the first main burst 2004/2005 was German-only. I think the numbers would have been higher had I given up on the menu-based approach earlier, and rather written a new, updated ‘post’ instead of adding infinitesimal amendments to the existing pseudo-static pages.

Analysing my own process of analysing puts me into this detached mode of thinking. I have shielded myself from social media timelines in the past weeks and tinkered with articles, content written long before somebody could have ‘shared’ it. I feel that it motivates me again to not care about things like word count (too long), target groups (weird mixture of armchair web psychology and technical content), and shareability.

My Flat-File Database

A brief update on my web programming project.

I have preferred to create online text by editing simple text files; so I only need a text editor and an FTP client as management tool. My ‘old’ personal and business web pages are currently created dynamically in the following way:
[Code for including a script (including other scripts)]
[Content of the article in plain HTML = inner HTML of content div]
[Code for writing footer]

The main script(s) create layout containers, meta tags, navigation menus etc.

Meta information about pages or about the whole site are kept in CSV text files. There are e.g. files with tables…

  • … listing all of pages in each site and their attributes – like title, key words, hover texts for navigation links or
  • … tabulating all main properties of all web sites – such as ‘tag lines’ or the name of the CSS file.

A bunch of CSV files / tables can be accessed like a database by defining the columns in a schema.ini file, and using a text driver (on my Windows web server). I am running SQL queries against these text files, and it would be simple to migrate my CSV files to a grown-up database. But I tacked on RSS feeds later; these XML files are hand-crafted and basically a parallel ‘database’.

This CSV file database is not yet what I mean by flat-file database: In my new site the content of a typical ‘article file’ should be plain text, free from code. All meta information will be included in each file, instead of putting it into the separate CSV files. A typical file would look like this:

title: Some really catchy title
headline: Some equally catchy, but a bit longer headline
date_created: 2015-09-15 11:42
date_changed: 2015-09-15 11:45
author: elkement
[more properties and meta tags]
content:
Text in plain HTML.

The logic for creating formatted pages with header, footer, menus etc. has to be contained in code separate from these files; and text files needs to be parsed for meta data and content. The set of files has effectively become ‘the database’, the plain text content being just one of many attributes of a page. Folder structure and file naming conventions are part of the ‘database logic’.

I figured this was all an unprofessional hack until I found many so-called flat-file / database-less content management systems on the internet, intended to be used with smaller sites. They comprise some folders with text files, to be named according to a pre-defined schema plus parsing code that will extract meta data from files’ contents.

Motivated by that find, I created the following structure in VB.NET from scratch:

  • Retrieving a set of text files based on a search criteria from the file system – e.g. for creating the menu from all pages, or for searching for one specific file that should represent the current page – current as per the URL the user entered.
  • Code for parsing a text file for lines having a [name]: [value] structure
  • Processing nice URL entered by the user to make the web server pick the correct text file.

Speaking about URLs, so-called ASP.NET Routing came in handy: Before, I had used a few folders whose default page redirects to an existing page (such as /heatpump/ redirecting to /somefolder/heatpump.asp). Otherwise my URLs all corresponded to existing single files.

I use a typical blogging platform’s schema with the new site: If users enters

/en/2015/09/15/some-cool-article/

the server accesses a text text file whose name contains language, year, such as:

2015-09-15_en_some-cool-article.txt

… and displays the content at the nice URL.

‘Language’ is part of the URL: If a user with a German browsers explicitly accesses an URL starting with /en/ , the language is effectively set to English. However, If the main page is hit, I detect the language from the header sent by the client.

I am not overly original: I use two categories of content – posts and pages – corresponding to text files organized in two different folders in the file system, and following different conventions for file names. Learning from my experience with hand-crafted menu pages in this this blog here, I added:

  • A summary text included in the file, to be displayed in a list of posts per category.
  • A list of posts in a single category, displayed on the category / menu page.

The category is assigned to the post simply as part of the file name; moving a post to another category is done by renaming it.

Since I found that having to add my Google+ posts to just a single Collection was a nice exercise I limit myself to one category per post deliberately.

Having built all the required search patterns and functions for creating lists of posts or menus or recent posts, or for extracting information from specific pages as the current or the corresponding page in the other language …  I realized that I needed a better and clear-cut separation of a high-level query for a bunch of attributes for any set of files meeting some criteria from the lower level doing the search, file retrieval, and parsing.

So why not using genuine SQL commands at the top level – to be translated to file searches and file content parsing on the lower level?

I envisaged building the menu of all pages e.g. by executing something like

SELECT title, url, headline from pages WHERE isMenu=TRUE

and creating the list of recent posts on the home page by running

SELECT * FROM posts WHERE date_created < [some date]

This would also allow for a smooth migration to an actual relational database system if the performance of file-based database would not be that great after all.

I underestimated the efforts of ‘building your own database engine’, but finally the main logic is done. My file system recordset class has this functionality (and I think I finally got the hang of classes and objects):

  • Parse a SQL string to check if it is well-formed.
  • Split it into pieces and translate pieces to names of tables (from FROM) and list of fields (from SELECT and WHERE).
  • For each field, check (against my schema) if the field should be encoded in the file’s name of if it was part of the name / value attributes in the file contents.
  • Build a file search pattern string with * at the right places from the file name attributes.
  • Get the list of files meeting this part of the WHERE criteria.
  • Parse the contents of each file and exclude those not meeting the ‘content fields’ criteria specified in the WHERE clause.
  • Stuff all attributes specified in the SELECT statement into a table-like structure (a dataTable in .NET) and return a recordset object –  that can be queried and handled like recordsets returned by standard database queries – that is: Check for End Of File, or MoveNext, return the value of a specific cell in a column with specific name.

Now I am (re-)creating all collections of pages and posts using my personal SQL engine, In parallel I am manually sifting through old content and turning my web pages into articles. To do: The tag cloud and handling tags in general, and the generation of the RSS XML file from the database.

The new site is not publicly available yet. At the time of writing of this post, all my sites still use the old schema.

Disclaimers:

  • I don’t claim this is the best way to build a web site / blog. It’s also a fun project for the sake of having fun with developing it, exploring the limits of flat-file databases, forcing myself to deal with potential performance issues.
  • It is a deliberate choice: My hosting space allows for picking from different well-known relational databases and I have done a lot of SQL Server programming in the past months in other projects.
  • I have a licence of Visual Studio. Using only a text editor instead is a deliberate choice, too.

Interrupting Regularly Scheduled Programming …

(… for programming.)

Playing with websites has been a hobby of mine since nearly two decades. What has intrigued me was the combination of different tasks, appealing to different moods – or modes:

  • Designing the user interface and organizing content.
  • Writing the actual content, and toggling between creative and research mode.
  • Developing the backend: database and application logic.

I have distributed different classes of content between my three personal sites, noticed how they drifted apart or become similar again, and have migrated my content over and over when re-doing the underlying software.

e-stangl: screenshotsubversiv-screenshotradices.net: screenshotCurrently the sites run on outdated ASP scripts accessing CSV files as database tables via SQL. This was not a corporate software project – or too similar to one: I kept tacking on new features as I went, indulging in organically grown code. I hand-craft my XML feeds!

It is time to consolidate all this. I feel entitled, motivated, or perhaps even forced to migrate to a new ‘platform’, finally based on true object-oriented programming. Our other three sites run on the same legacy code, which I don’t want to support forever – I will migrate those sites as well in the long run.

So: I am developing a new .NET site from scratch, and I am going to merge my three personal sites into one.

However, I cannot bring myself to re-doing the code only and trying to migrate the content unchanged and as automated as possible. Every old article brings up memories and challenges me to comment on it and to reply to former self. I have to deal with all the three aspects listed above!

As for the layout, the challenge is to preserve the spirit and colors of all three sites – perhaps using something silly as three different layouts that visitors (especially: myself) can pick from, changing the layout based on category, or based on something random.

This is just a draft - but it seems I prefer to build on the 'subversive' layout.

This is just a first draft – building on the ‘subversive’ layout.

I will dedicate most of my ‘online time’ to this project; so I am taking a break from my usual blogging here and there – except from progress reports on this web migration project – and I will not be very active on social media.