Ice Storage Hierarchy of Needs

Data Kraken – the tentacled, tangled pieces of software for data analysis – has a secret theoretical sibling, an older one: Before we built our heat source from a cellar, I developed numerical simulations of the future heat pump system. Today this simulation tool comprises, among other things, a model of our control system, real-life weather data, energy balances of all storage tanks, and a solution to the heat equation for the ground surrounding the water/ice tank.
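For the curious: the core of the ground model is nothing more exotic than marching the heat equation forward in time on a grid. Here is a toy 1D sketch in VB.NET – made-up material parameters and grid, and definitely not the actual simulation code:

Imports System

Module GroundHeatToy
    Sub Main()
        ' Toy 1D explicit finite-difference scheme: dT/dt = alpha * d2T/dx2.
        ' All numbers below are assumptions for illustration only.
        Dim alpha As Double = 0.0000007          ' thermal diffusivity of moist soil, m^2/s (assumed)
        Dim dx As Double = 0.1                   ' grid spacing, m
        Dim dt As Double = 600                   ' time step, s (stable as long as alpha*dt/dx^2 < 0.5)
        Dim T(100) As Double                     ' temperatures along a line into the ground, °C

        For i As Integer = 0 To T.Length - 1
            T(i) = 10.0                          ' undisturbed ground at 10 °C
        Next
        T(0) = 0.0                               ' boundary: tank wall at 0 °C while ice is forming

        For n As Integer = 1 To 1000             ' march forward in time
            Dim Tnew(T.Length - 1) As Double
            Tnew(0) = T(0)
            Tnew(T.Length - 1) = T(T.Length - 1)
            For i As Integer = 1 To T.Length - 2
                Tnew(i) = T(i) + alpha * dt / (dx * dx) * (T(i + 1) - 2 * T(i) + T(i - 1))
            Next
            T = Tnew
        Next

        Console.WriteLine("Temperature 1 m from the tank wall: " & T(10).ToString("F2") & " °C")
    End Sub
End Module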

I can model the change of the tank temperature and ‘peak ice’ in a heating season. But the point of these simulations is rather to find out which parameters the system’s performance is particularly sensitive to: Will the storage tank be large enough in a worst-case scenario?

A seemingly fascinating aspect was how peak ice ‘reacts’ to input parameters: It is quite sensitive to the properties of the ground and of the solar/air collector. If you make either the ground or the collector just ‘a bit worse’, ice grows out of proportion. Taking a step back, I realized that I could have come to that conclusion using simple energy accounting instead of differential equations – once I had long-term data for the average energy harvesting power of the collector and the ground. Caveat: The simple calculation only works if these estimates are reliable for a chosen system – and this depends e.g. on hydraulic design, control logic, the shape of the tank, and the heat transfer properties of ground and collector.

For the operation of the combined tank+collector source, the critical months are the ice months Dec/Jan/Feb, when the air temperature is too low to harvest all the energy from air. Before and after that period, the solar/air collector is nearly the only source anyway. As I have emphasized on this blog again and again, even during the ice months the collector is still the main source and delivers most of the ambient energy the heat pump needs (if properly sized) in a typical winter. The rest has to come from energy stored in the ground surrounding the tank or from freezing water.

I am finally succumbing to trends of edutainment and storytelling in science communications – here is an infographic:

Ambient energy needed in Dec/Jan/Feb – approximate contributions of collector, ground, ice

(Add analogies to psychology here.)

Using some typical numbers, I am illustrating 4 scenarios in the figure below, for a system with these parameters:

  • A cuboid tank of about 23 m³
  • Required ambient energy for the three ice months is ~7,000 kWh
    (about 9,330 kWh of heating energy at a performance factor of 4)
  • ‘Standard’ scenario: The collector delivers 75% of the ambient energy, ground delivers about 18%.
  • ‘Worse’ scenarios: Collector energy, ground energy, or both are reduced by 25% compared to the standard.

Contributions of the three sources add up to the total ambient energy needed – this is yet another way of combining different energies in one balance.

Contributions to ambient energy in ice months - scenarios.

Ambient energy needed by the heat pump in Dec+Jan+Feb, as delivered by the three different sources. Latent ‘ice’ energy is also translated to the percentage of water in the tank that would be frozen.

Neither collector nor ground energy changes much in relation to the baseline. But latent energy has to fill the gap: As the total collector energy is much higher than the total latent energy content of the tank, an increase in the gap is large in relation to the base ice energy.

If collector and ground both ‘underdelivered’ by 25%, the tank in this scenario would be frozen completely, instead of only 23% of it.
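For readers who want to check these percentages: it is really just bookkeeping with three numbers. A sketch in VB.NET, using the round numbers quoted above and a latent heat of fusion of water of about 334 kJ/kg:

Imports System

Module IceScenarioCheck
    Sub Main()
        Dim ambientEnergy As Double = 7000                      ' kWh needed in Dec+Jan+Feb
        Dim latentCapacity As Double = 23 * 1000 * 334 / 3600   ' kWh if all 23 m3 of water freeze (~2134 kWh)

        CheckScenario("Standard", 0.75, 0.18, ambientEnergy, latentCapacity)
        CheckScenario("Collector and ground both -25%", 0.75 * 0.75, 0.18 * 0.75, ambientEnergy, latentCapacity)
    End Sub

    Sub CheckScenario(name As String, collectorShare As Double, groundShare As Double,
                      ambientEnergy As Double, latentCapacity As Double)
        ' Whatever collector and ground do not deliver has to come from freezing water.
        Dim iceEnergy As Double = ambientEnergy * (1 - collectorShare - groundShare)
        Dim frozenFraction As Double = iceEnergy / latentCapacity
        Console.WriteLine(String.Format("{0}: {1:F0} kWh from ice = {2:F0}% of the tank frozen",
                                        name, iceEnergy, 100 * frozenFraction))
    End Sub
End Module

The ‘standard’ scenario gives about 490 kWh of latent energy, or 23% of the tank – the worst case eats up nearly all of the roughly 2,100 kWh the tank can provide.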

The ice energy is just the peak of the total ambient energy iceberg.

You could call this system an air-geothermal-ice heat pump then!

My Data Kraken – a Shapeshifter

I wonder if Data Kraken is only used by German speakers who translate our hackneyed Datenkrake – is it a word like eigenvector?

Anyway, I need this animal metaphor, even though this post is not about Facebook or Google. It’s about my personal Data Kraken – which is a true shapeshifter, like all octopuses are:

(… because they are spineless, but I don’t want to over-interpret the metaphor…)

Data Kraken’s ability to change its shape is a blessing, given ongoing challenges:

When the Chief Engineer is fighting with other intimidating life-forms in our habitat, he focuses on survival first and foremost … and sometimes he forgets to inform the Chief Science Officer about fundamental changes to our landscape of sensors. Then Data Kraken has to be trained again to learn how to detect if the heat pump is on or off in a specific timeslot. Use the signal sent from control to the heat pump? Or to the brine pump? Or better use brine flow and temperature difference?

It might seem like a dull and tedious exercise to calculate ‘averages’ and other performance indicators that require only very simple arithmetic. But with the exception of room or ambient temperature, most of the ‘averages’ only make sense if some condition is met, like: The average heating water inlet temperature should only be calculated over times when the heating circuit pump is on. But the temperature of the cold water, when the same floor loops are used for cooling in summer, should not be included in this average of ‘heating water temperature’. Above all, false sensor readings – like 0, NULL, or whatever value (like 999) a vendor chooses to indicate an error – have to be excluded. And sometimes I rediscover eternal truths, like the ratio of averages not being equal to the average of ratios.
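Put into code, such a conditional average is unspectacular but easy to get subtly wrong. A sketch in VB.NET – field names, thresholds, and error codes are made up, and in the real Kraken this logic lives in SQL:

Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Class Reading
    Public Property PumpOn As Boolean
    Public Property InletTemperature As Double?   ' Nothing stands in for NULL from the logger
End Class

Public Module ConditionalAverage
    ' Average the heating water inlet temperature only while the heating circuit pump is on,
    ' and exclude values the logger uses as error markers (0, NULL, 999, ...).
    Public Function AverageInletTemperature(readings As IEnumerable(Of Reading)) As Double?
        Dim valid = From r In readings
                    Where r.PumpOn AndAlso r.InletTemperature.HasValue
                    Let t = r.InletTemperature.Value
                    Where t > 0 AndAlso t < 100
                    Select t
        If Not valid.Any() Then Return Nothing    ' no valid readings in this time slot
        ' Note: performance factors are ratios of summed energies, not averages of daily ratios.
        Return valid.Average()
    End Function
End Module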

The Chief Engineer is tinkering with new sensors all the time: In parallel to using the old & robust analog sensor for measuring the water level in the tank…

Level sensor: The old way

… a multitude of level sensors was evaluated …

Level sensors: The precursors

… until finally Mr. Bubble won the casting …

Mr. Bubble, the bubble measuring tube

… and the surface level is now measured via the pressure, which increases linearly with depth. For the Big Data Department this means adding some new fields to the Kraken database, calculating new averages … and smoothly transitioning from the volume of ice calculated from ruler readings to the new values.
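The new calculated field is straightforward – hydrostatic pressure grows linearly with depth, p = ρ·g·h, so the level is just the gauge pressure divided by ρ·g. A sketch, with calibration offsets and sensor details omitted:

Imports System

Public Module LevelFromPressure
    Private Const Rho As Double = 1000.0   ' density of water, kg/m3
    Private Const G As Double = 9.81       ' gravitational acceleration, m/s2

    ' Water level above the sensor in m, from gauge pressure in Pa: h = p / (rho * g).
    Public Function LevelInMetres(gaugePressurePa As Double) As Double
        Return gaugePressurePa / (Rho * G)
    End Function
End Module

E.g. a gauge pressure of about 19,600 Pa corresponds to 2 m of water above the sensor.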

Change is the only constant in the universe, to paraphrase Heraclitus [*]. Sensors morph in purpose: The heating circuit, formerly known (to the control unit) as the radiator circuit, became a new wall heating circuit, and the radiator circuit was virtually reborn as a new circuit.

I am guilty of adding new tentacles all the time, too, herding a zoo of meters added in 2015, each of them adding a new log file containing data taken at different points in time, at different intervals. This year I let Kraken put tentacles into the heat pump:

Data Kraken: Tentacles in the heat pump!

But the most challenging data source to integrate is the most unassuming source of logging data: The small list of data that the Chief Engineer had recorded manually until recently (until the advent of Miss Pi CAN Sniffer and Mr. Bubble). Reason: He had refused to take data at exactly 00:00:00 every single day, so I learned things I never wanted to know about SQL to deal with the odd time intervals.
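For the record, the crux of those odd intervals: a plain average treats every reading alike, no matter how long the value was ‘valid’. One typical workaround – a sketch, not necessarily what my scripts actually do – is a time-weighted average:

Imports System
Imports System.Collections.Generic

Public Module IrregularReadings
    ' Weight each reading by the time span until the next one, instead of taking a plain average.
    Public Function TimeWeightedAverage(samples As IList(Of Tuple(Of DateTime, Double))) As Double
        If samples.Count = 0 Then Throw New ArgumentException("No samples given.")
        Dim weightedSum As Double = 0
        Dim totalHours As Double = 0
        For i As Integer = 0 To samples.Count - 2
            Dim hours As Double = (samples(i + 1).Item1 - samples(i).Item1).TotalHours
            weightedSum += samples(i).Item2 * hours   ' the value 'holds' until the next reading
            totalHours += hours
        Next
        If totalHours = 0 Then Return samples(0).Item2
        Return weightedSum / totalHours
    End Function
End Module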

To be fair, the Chief Engineer has been dedicated to data recording! He never shunned true challenges, like a legendary white-out in our garden, at a time when measuring ground temperatures was not yet automated:

The challenge

White Out

Long-term readers of this blog know that ‘elkement’ stands for a combination of nerd and luddite, so I try to merge a dinosaur scripting approach with real-world global AI Data Krakens’ wildest dreams: I wrote scripts that create scripts that create scripts [[[…]]], all based on a small proto-Kraken – a nice-to-use documentation database containing the history of sensors and calculations.

The mutated Kraken is able to eat all kinds of log files, including clients’ ones, and above all, it can be cloned easily.

I’ve added all the images and anecdotes to justify why an unpretentious user interface like the following is my true Christmas present to myself – ‘easily clickable’ calculated performance data for days, months, years, and heating seasons.

Data Kraken: UI

… and diagrams that can be changed automatically, by selecting interesting parameters and time frames:

Excel for visualization of measurement data

The major overhaul of Data Kraken turned out to be prescient, as a seemingly innocuous firmware upgrade not only changed log file naming conventions and the publication schedule but also shuffled all the fields in the log files. My Data Kraken has to be capable of rebuilding the SQL database from scratch, based on a documentation of those ever-changing fields and the raw log files.
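The ‘documentation of those ever-changing fields’ boils down to a mapping from a log file’s date to the column layout valid at that time, so raw files can be re-parsed from scratch at any point. A sketch of the idea – field names and dates are invented:

Imports System
Imports System.Collections.Generic

Public Class FieldLayout
    Public Property ValidFrom As DateTime
    Public Property Fields As String()
End Class

Public Module LogReparser
    ' One entry per firmware era; the parser picks the layout valid at the file's date.
    Private ReadOnly Layouts As New List(Of FieldLayout) From {
        New FieldLayout With {.ValidFrom = New DateTime(2015, 1, 1),
                              .Fields = {"Timestamp", "BrineIn", "BrineOut", "HeatingWater"}},
        New FieldLayout With {.ValidFrom = New DateTime(2016, 11, 1),
                              .Fields = {"Timestamp", "HeatingWater", "BrineOut", "BrineIn", "Power"}}
    }

    Public Function LayoutFor(fileDate As DateTime) As FieldLayout
        Dim current As FieldLayout = Nothing
        For Each layout As FieldLayout In Layouts
            If layout.ValidFrom <= fileDate Then current = layout
        Next
        Return current
    End Function
End Module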

_________________________________

[*] It was hard to find the true original quote, as the internet is cluttered with change management coaches using it, and Heraclitus speaks to us only through secondary sources. But anyway, what this philosophy website says about Heraclitus applies very well to my Data Kraken:

The exact interpretation of these doctrines is controversial, as is the inference often drawn from this theory that in the world as Heraclitus conceives it contradictory propositions must be true.

In my world, I also need to deal with intriguing ambiguity!

My Flat-File Database

A brief update on my web programming project.

I have preferred to create online text by editing simple text files, so I only need a text editor and an FTP client as management tools. My ‘old’ personal and business web pages are currently created dynamically in the following way:
[Code for including a script (including other scripts)]
[Content of the article in plain HTML = inner HTML of content div]
[Code for writing footer]

The main script(s) create layout containers, meta tags, navigation menus etc.

Meta information about pages or about the whole site is kept in CSV text files. There are e.g. files with tables…

  • … listing all pages in each site and their attributes – like title, keywords, hover texts for navigation links – or
  • … tabulating all main properties of all web sites – such as ‘tag lines’ or the name of the CSS file.

A bunch of CSV files / tables can be accessed like a database by defining the columns in a schema.ini file, and using a text driver (on my Windows web server). I am running SQL queries against these text files, and it would be simple to migrate my CSV files to a grown-up database. But I tacked on RSS feeds later; these XML files are hand-crafted and basically a parallel ‘database’.
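For the record, the setup is unspectacular: a schema.ini next to the CSV files defines the columns, and the files can then be queried like tables. A .NET sketch using the Jet OLE DB text provider – my old pages do this from classic ASP via a text driver, and all folder, file, and column names here are made up:

' schema.ini sits in the same folder as the CSV files, e.g.:
'   [pages.csv]
'   ColNameHeader=True
'   Format=CSVDelimited
'   Col1=title Text
'   Col2=url Text
'   Col3=isMenu Bit
Imports System
Imports System.Data.OleDb

Public Module CsvQuery
    Public Sub ListMenuPages()
        Dim connectionString As String = _
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\web\meta;" & _
            "Extended Properties=""text;HDR=Yes;FMT=Delimited"""
        Using connection As New OleDbConnection(connectionString)
            connection.Open()
            Using command As New OleDbCommand("SELECT title, url FROM [pages.csv] WHERE isMenu = TRUE", connection)
                Using reader As OleDbDataReader = command.ExecuteReader()
                    While reader.Read()
                        Console.WriteLine(reader("title").ToString() & " -> " & reader("url").ToString())
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module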

This CSV file database is not yet what I mean by a flat-file database: For my new site, the content of a typical ‘article file’ should be plain text, free from code. All meta information will be included in each file, instead of being put into separate CSV files. A typical file would look like this:

title: Some really catchy title
headline: Some equally catchy, but a bit longer headline
date_created: 2015-09-15 11:42
date_changed: 2015-09-15 11:45
author: elkement
[more properties and meta tags]
content:
Text in plain HTML.

The logic for creating formatted pages with header, footer, menus etc. has to be contained in code separate from these files, and the text files need to be parsed for metadata and content. The set of files has effectively become ‘the database’, the plain text content being just one of many attributes of a page. Folder structure and file naming conventions are part of the ‘database logic’.

I figured this was all an unprofessional hack until I found many so-called flat-file / database-less content management systems on the internet, intended to be used with smaller sites. They comprise some folders with text files, to be named according to a pre-defined schema, plus parsing code that extracts metadata from the files’ contents.

Motivated by that find, I created the following structure in VB.NET from scratch:

  • Retrieving a set of text files from the file system based on search criteria – e.g. for creating the menu from all pages, or for finding the one specific file that should represent the current page – current as per the URL the user entered.
  • Code for parsing a text file for lines having a [name]: [value] structure (a sketch of this follows the list).
  • Processing the nice URL entered by the user to make the web server pick the correct text file.
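The parsing part is the least spectacular piece. A simplified sketch of it (the real class does a bit more validation): everything up to the content: marker is treated as name/value pairs, the rest is the page body.

Imports System
Imports System.Collections.Generic
Imports System.IO

Public Class ParsedArticle
    Public Property Attributes As New Dictionary(Of String, String)
    Public Property Content As String = ""
End Class

Public Module ArticleParser
    Public Function Parse(path As String) As ParsedArticle
        Dim article As New ParsedArticle()
        Dim bodyLines As New List(Of String)
        Dim inContent As Boolean = False

        For Each line As String In File.ReadAllLines(path)
            If inContent Then
                bodyLines.Add(line)                                ' everything after 'content:' is the page body
            ElseIf line.Trim().ToLowerInvariant() = "content:" Then
                inContent = True
            Else
                Dim pos As Integer = line.IndexOf(":"c)            ' header lines: name: value
                If pos > 0 Then
                    article.Attributes(line.Substring(0, pos).Trim()) = line.Substring(pos + 1).Trim()
                End If
            End If
        Next

        article.Content = String.Join(Environment.NewLine, bodyLines)
        Return article
    End Function
End Module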

Speaking of URLs, so-called ASP.NET Routing came in handy: Before, I had used a few folders whose default page redirects to an existing page (such as /heatpump/ redirecting to /somefolder/heatpump.asp). Otherwise my URLs all corresponded to existing single files.

I use a typical blogging platform’s schema for the new site: If a user enters

/en/2015/09/15/some-cool-article/

the server accesses a text file whose name encodes the language, the date, and the title, such as:

2015-09-15_en_some-cool-article.txt

… and displays the content at the nice URL.
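Under the hood this is one route registration plus some string assembly. A minimal sketch, assuming ASP.NET Web Forms routing via MapPageRoute – the route name and the target page are placeholders:

Imports System
Imports System.Web.Routing

Public Class Global_asax
    Inherits System.Web.HttpApplication

    Sub Application_Start(sender As Object, e As EventArgs)
        ' /en/2015/09/15/some-cool-article/ -> Article.aspx, which then looks up the matching text file
        RouteTable.Routes.MapPageRoute("ArticleRoute",
                                       "{lang}/{year}/{month}/{day}/{slug}",
                                       "~/Article.aspx")
    End Sub
End Class

Public Module ArticleFileName
    ' lang=en, year=2015, month=09, day=15, slug=some-cool-article -> 2015-09-15_en_some-cool-article.txt
    Public Function FromRoute(lang As String, year As String, month As String,
                              day As String, slug As String) As String
        Return String.Format("{0}-{1}-{2}_{3}_{4}.txt", year, month, day, lang, slug)
    End Function
End Module

In the target page the placeholders are then available via Page.RouteData.Values, so picking the correct file boils down to assembling that name and checking whether it exists.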

‘Language’ is part of the URL: If a user with a German browser explicitly accesses a URL starting with /en/, the language is effectively set to English. However, if the main page is hit, I detect the language from the header sent by the client.

I am not overly original: I use two categories of content – posts and pages – corresponding to text files organized in two different folders in the file system, and following different conventions for file names. Learning from my experience with hand-crafted menu pages in this blog here, I added:

  • A summary text included in the file, to be displayed in a list of posts per category.
  • A list of posts in a single category, displayed on the category / menu page.

The category is assigned to the post simply as part of the file name; moving a post to another category is done by renaming it.

Since I found that having to add my Google+ posts to just a single Collection was a nice exercise, I deliberately limit myself to one category per post.

Having built all the required search patterns and functions – for creating lists of posts, menus, or recent posts, or for extracting information from specific pages such as the current page or its counterpart in the other language – I realized that I needed a clear-cut separation between a high-level query for a bunch of attributes of any set of files meeting some criteria, and the lower level doing the search, file retrieval, and parsing.

So why not use genuine SQL commands at the top level – to be translated into file searches and file content parsing at the lower level?

I envisaged building the menu of all pages e.g. by executing something like

SELECT title, url, headline from pages WHERE isMenu=TRUE

and creating the list of recent posts on the home page by running

SELECT * FROM posts WHERE date_created < [some date]

This would also allow for a smooth migration to an actual relational database system if the performance of the file-based database turned out not to be that great after all.

I underestimated the effort of ‘building your own database engine’, but finally the main logic is done. My file system recordset class has the following functionality (and I think I finally got the hang of classes and objects) – a condensed code sketch follows the list:

  • Parse a SQL string to check if it is well-formed.
  • Split it into pieces and translate pieces to names of tables (from FROM) and list of fields (from SELECT and WHERE).
  • For each field, check (against my schema) if the field is encoded in the file’s name or if it is part of the name / value attributes in the file contents.
  • Build a file search pattern string with * at the right places from the file name attributes.
  • Get the list of files meeting this part of the WHERE criteria.
  • Parse the contents of each file and exclude those not meeting the ‘content fields’ criteria specified in the WHERE clause.
  • Stuff all attributes specified in the SELECT statement into a table-like structure (a DataTable in .NET) and return a recordset object that can be queried and handled like recordsets returned by standard database queries – that is: check for end of file, MoveNext, or return the value of a specific cell in a column with a specific name.
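To make this a bit more concrete, here is a heavily condensed sketch of that flow. The schema is hypothetical (posts named date_language_slug.txt, everything else in the name/value header), and only the simplest SELECT … FROM … WHERE field=value queries are handled – the real class covers more:

Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions

Public Module FileRecordset

    ' Hypothetical schema: these fields are encoded in the file name, in this order.
    Private ReadOnly FileNameFields As String() = {"date", "language", "slug"}

    Public Function Query(sql As String, folder As String) As DataTable
        ' 1. Check that the statement is well-formed and split it into its pieces.
        Dim m As Match = Regex.Match(sql,
            "^\s*SELECT\s+(?<fields>.+?)\s+FROM\s+(?<table>\w+)(\s+WHERE\s+(?<where>.+))?\s*$",
            RegexOptions.IgnoreCase)
        If Not m.Success Then Throw New ArgumentException("Not a well-formed query: " & sql)

        Dim selectFields = m.Groups("fields").Value.Split(","c).Select(Function(f) f.Trim()).ToList()
        Dim criteria As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        If m.Groups("where").Success Then
            For Each condition As String In m.Groups("where").Value.Split(New String() {" AND "}, StringSplitOptions.None)
                Dim parts = condition.Split("="c)
                criteria(parts(0).Trim()) = parts(1).Trim().Trim("'"c)
            Next
        End If

        ' 2. Turn the file-name criteria into a search pattern, e.g. *_en_*.txt
        Dim pattern As String = String.Join("_",
            FileNameFields.Select(Function(f) If(criteria.ContainsKey(f), criteria(f), "*"))) & ".txt"

        ' 3. Fetch candidate files, parse them, filter on the remaining criteria, fill a DataTable.
        Dim contentCriteria = criteria.Where(Function(c) Not FileNameFields.Contains(c.Key.ToLowerInvariant())).ToList()
        Dim result As New DataTable(m.Groups("table").Value)
        For Each field As String In selectFields
            result.Columns.Add(field)
        Next

        For Each path As String In Directory.GetFiles(folder, pattern)
            Dim attributes = ParseAttributes(path)
            If contentCriteria.All(Function(c) attributes.ContainsKey(c.Key) AndAlso attributes(c.Key) = c.Value) Then
                Dim row As DataRow = result.NewRow()
                For Each field As String In selectFields
                    If attributes.ContainsKey(field) Then row(field) = attributes(field)
                Next
                result.Rows.Add(row)
            End If
        Next
        Return result
    End Function

    ' Minimal header parser: every 'name: value' line before the 'content:' marker becomes an attribute.
    Private Function ParseAttributes(path As String) As Dictionary(Of String, String)
        Dim attributes As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        For Each line As String In File.ReadAllLines(path)
            If line.Trim().ToLowerInvariant() = "content:" Then Exit For
            Dim pos As Integer = line.IndexOf(":"c)
            If pos > 0 Then attributes(line.Substring(0, pos).Trim()) = line.Substring(pos + 1).Trim()
        Next
        Return attributes
    End Function
End Module

Calling Query("SELECT title, url FROM posts WHERE language='en'", pathToPostsFolder) then returns a DataTable that can be walked through like any other recordset.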

Now I am (re-)creating all collections of pages and posts using my personal SQL engine. In parallel, I am manually sifting through old content and turning my web pages into articles. To do: The tag cloud and handling tags in general, and the generation of the RSS XML file from the database.

The new site is not publicly available yet. At the time of writing of this post, all my sites still use the old schema.

Disclaimers:

  • I don’t claim this is the best way to build a web site / blog. It’s a fun project for its own sake – having fun with developing it, exploring the limits of flat-file databases, and forcing myself to deal with potential performance issues.
  • It is a deliberate choice: My hosting space allows for picking from different well-known relational databases and I have done a lot of SQL Server programming in the past months in other projects.
  • I have a licence of Visual Studio. Using only a text editor instead is a deliberate choice, too.

Interrupting Regularly Scheduled Programming …

(… for programming.)

Playing with websites has been a hobby of mine for nearly two decades. What has intrigued me is the combination of different tasks, appealing to different moods – or modes:

  • Designing the user interface and organizing content.
  • Writing the actual content, and toggling between creative and research mode.
  • Developing the backend: database and application logic.

I have distributed different classes of content between my three personal sites, noticed how they drifted apart or became similar again, and have migrated my content over and over when re-doing the underlying software.

e-stangl, subversiv.at, radices.net: screenshots of my three personal sites.

Currently the sites run on outdated ASP scripts accessing CSV files as database tables via SQL. This was not a corporate software project – or perhaps it was all too similar to one: I kept tacking on new features as I went, indulging in organically grown code. I hand-craft my XML feeds!

It is time to consolidate all this. I feel entitled, motivated, or perhaps even forced to migrate to a new ‘platform’, finally based on true object-oriented programming. Our other three sites run on the same legacy code, which I don’t want to support forever – I will migrate those sites as well in the long run.

So: I am developing a new .NET site from scratch, and I am going to merge my three personal sites into one.

However, I cannot bring myself to re-doing only the code and migrating the content unchanged and as automatically as possible. Every old article brings up memories and challenges me to comment on it and to reply to my former self. I have to deal with all three aspects listed above!

As for the layout, the challenge is to preserve the spirit and colors of all three sites – perhaps using something as silly as three different layouts that visitors (especially: myself) can pick from, changing the layout based on category, or based on something random.

This is just a first draft – building on the ‘subversive’ layout.

I will dedicate most of my ‘online time’ to this project, so I am taking a break from my usual blogging here and there – except for progress reports on this web migration project – and I will not be very active on social media.

Waging a Battle against Sinister Algorithms

I have felt a disturbance of the force.

As you might expect from a blog about anything, this one has a weird collection of unrelated top pages and posts. My WordPress Blog Stats tell me I am obviously an internet authority on how rodents get into kitchen appliances, on the physics of a spinning toy, on the history of the first heat pump, and most recently on how to sniff router traffic. But all those posts and topics are eclipsed by the meteoric rise of the single most popular article ever, which was a review of a book on a subfield of theoretical physics. I am not linking this post or quoting its title for reasons you might understand in a minute.

Checking out Google Webmaster Tools, the effect is even more pronounced. Some months ago this textbook review attracted by far the most Google search impressions and clicks. Looking at the data from the perspective of a bot, it might appear as if my blog had been created just to promote that book. Which is what I believe might actually have happened.

Judging from historical versions of the book author’s website (on archive.org), the page impressions of my review started to surge when he put a backlink to my post on his page, sometime in spring this year.

But then in autumn this happened.

Page impressions for this blog on Google Webmaster Tools, Sept to Dec.

These are the impressions for searches from desktop computers (‘Web’), without image or mobile search. A page impression means that the link has been displayed on a Google search results page to some user. The curve does not change much if I remove the filter for Web.

For this period of three months, that article I Shall Not Quote is the top page in terms of impressions, right after the blog’s default page. I wondered about the reason for this steep decline as I usually don’t see any trend within three months on any of my sites.

If I decrease the time slot to the past month, that infamous post suddenly vanishes from the top posts:

Page impressions and top pages in the last month

It was eradicated quickly – which can only be recognized by decreasing the time slot step by step. Within a few days at the end of October / the beginning of November, the entry seems to have been erased from the list of impressions.

I sorted the list of results shown above by the name of the page, not by impressions. Since WordPress posts’ names are prefixed with dates, you would expect to see any of your posts in that list somewhere, some of them of course with very low scores. Actually, that list also includes obscure early posts from 2012 that nobody ever clicks on.

The former top post, however, did not get a single impression anymore in the past month. I have highlighted the posts before and after it in the list, and I have removed all filters for this one, so image and mobile search are also taken into account. The post’s name started with /2013/12/22/:

Last month, top pages, recent top post missing

Checking the status of indexed pages in total confirms that links have recently been removed:

Index status of this blog

For my other sites and blog this number is basically constant – as long as a website does not get hacked, as our business site actually was a month ago. Yes, I only mention this in passing, as I am less worried about that hack than about the mysterious penalizing of this blog.

I learned that your typical hack of a website is less spectacular than what hacker movies make you believe: If you are not a high-profile target, hacker-spammers leave your site intact but place additional spammy pages with cross-links on it to promote their links. You recognize this immediately by a surge in the number of URLs and indexing activities, and – in case your hoster is as vigilant as mine – a peak in 404 Not Found errors after the spammy pages have been removed. This is the intermittent spike in spammy pages on our business page crawled by Google:

Crawl stats after the hack

I used all tools at my disposal to clean up the mess the hackers caused – those pages had actually been indexed already. It will take a while until things like ‘fake Gucci belts’ are removed from our top content keywords, after I removed the links from the index by editing robots.txt, and using the Google URL removal tool and the URL parameters tool (the latter comes in handy as the spammy pages have been indexed with various query strings, that is: parameters).

I had expected the worst, but Google has not penalized me for that intermittent link spam attack (yet?). Numbers are now back to normal after a peak in queries for that fake brand stuff:

Queries back to normal after clean-up.

It was an awful lot of work to clean those URLs popping up again and again every day. I am willing to fight the sinister forces without too much whining. But Google’s harsh treatment of the post on this blog freaks me out. It is not only the blog post itself that was affected, but also the pages for the tags, categories, and archive entries. Nearly all of these pages – thus all the pages linking to the post – did not get a single impression anymore.

Google Webmaster Tools also tells me that the number of so-called Structured Data items for this blog has been reduced to nearly zero:

Structured data on this blog

Structured Data are useful for pages that show e.g. product reviews or recipes – anything that should have a pre-defined structure that might be presented according to that structure in Google search results, via nicely formatted snippets. My home-grown websites do not use those, but the spammer-hackers had used such data in their link spam pages – so on our business site we saw a peak in structured data at the time of the hack.

Obviously WP blogs use those by design. Our German blog is based on the same WP theme – but the number of structured data items there has been constant. So if anybody out there is using the Twenty Eleven theme, I would be happy to learn about your encounters with structured data.

I have read a lot: what I never wanted to know about search engine optimization. This also included hackers’ Black SEO. I recommend the book Spam Nation by renowned investigative reporter and IT security insider Brian Krebs, published recently – whose page and book I will, again, not link.

What has happened? I can only speculate.

Spammers build networks of shady backlinks to promote their stuff. So common knowledge is, of course, that you should not buy links or create such network scams. Ironically, I have cross-linked all my own sites like hell for many years. Not for SEO purposes, but in my eternal quest for organizing my stuff – keeping things separate, yet adding the right pointers, raking the virtual Zen garden, etc. Never ever did this backfire. I was always concerned about the effect of my links and resources pages (links to other pages, mainly tech and science). Today my site radices.net, once an early German predecessor of this blog, is my big link dump – but still, these massive link collections are not voted down by Google.

Maybe Google considers my post and the physics book author’s website part of such a link scam. I have linked to the author’s page several times – to sample chapters, generously made available as PDF downloads – and the author linked back to me. I had refused to tie my blog to my Google+ account and claim ‘Google authorship’ so far, as I didn’t want to trade elkement for my real name on G+. Via Webmaster Tools Google knows about all my domains, but they might suspect that I – a pseudo-anonymous elkement, using an @subversiv.at address on G+ – also own the book author’s domain, which I – diabolically smart – did not declare in Webmaster Tools.

As I said before, from a most objective perspective Google’s rationale might not be that unreasonable. I don’t write book reviews that often; my most recent ones were about The Year Without Pants and The Glass Cage. I rather write posts triggered by one idea in a book, maybe not even the main one. When I write about books I don’t use Amazon affiliate marketing – as professional reviewers such as Brain Pickings or Farnam Street do. I write about unrelated topics. I might not match the expected pattern. This is amusing as long as only a blog is concerned, but in principle it is similar to being interviewed by the FBI at an airport because your travel pattern just can’t be normal (as detailed in the book Bursts, on modelling human behaviour – a book I also sort of reviewed last year).

In short, I sometimes review and ‘promote’ books without any return on that. I simply don’t review books I don’t like, as I think blogging should be fun. Maybe in an age of gamified reviews and fake forum posts with spammy signatures Google simply doesn’t buy into that. I sympathize. I learned that forum websites should add a nofollow tag to any hyperlinks users post so that Google will not downvote the link targets. So links in discussion groups are considered spammy per se, and you need to do something about it so that they don’t hurt what you – as a forum user – are probably trying to discuss or recommend in good faith. I already live in fear that those links some tinkerers set in DIYers’ forums (linking to our business site or my posts on our heating system) will be considered paid link spam.

However, I cannot explain why I can still find my book review post on Google (thus generating an impression) when searching for site:[URL of the post]. Perhaps consolidation takes time. Perhaps there is hope. I even see the post when I use Tor Browser and a foreign IP address, so this is not related to my preferences as a logged-on Google user. But unless there is a glitch in Webmaster Tools, no other typical searcher encounters this impression. I am aware of the tool for disavowing URLs, but I don’t want to report a perfectly valid backlink. In addition, that backlink from the author’s site does not even show up in the list of external backlinks, which is another enigma.

I know that this seems to be an obsession with a first world problem: This was a post on a topic in which I don’t claim expertise and that I don’t consider strategically important. But whatever happens to this blog could happen to other sites I am more concerned about, business-wise. So I hope it is just a bug and/or Google’s bots will read this post and release my link. Just in case I mentioned your book or blog here, even if indirectly, please don’t backlink.

Perhaps Google did not like my ranting about encrypted search terms, not available to the search term poet. I dared to display the Bing logo back then. Which I will do again now as:

  • Bing tells me that the infamous post generates impressions and clicks.
  • Bing recognizes the backlink.
  • The number of indexed pages is increasing gradually with time.
  • And Bing did not index the spammy pages in the brief period they were on our hacked website.

Bing logo (2013)

Update 2014-12-23 – it actually happened twice:

Analyzing the impressions from the last day, I realize that Google has also given the same treatment to my physics resources page Physics Books on the Bedside Table. Page impressions dropped, and now that page, which was the top one (after the review had plummeted), is gone, too. I had already considered moving this page to my site that hosts all those lists of links (without issues, so far): radices.net, and I will complete this migration in a minute. Now of course Google might think I, the link spammer, am frantically moving on to another site.

Update 2014-12-24 – now at least results are consistent:

I cannot see my own review post anymore when I search for the title of the book. So finally the results from Webmaster Tools are in line with my tests.

Update 2015-01-23 – totally embarrassing final statement on this:

WordPress has migrated their hosted blogs to https only. All my traffic was hiding in the statistics for the https version which has to be added in Google Webmaster Tools as a separate website.

What I Never Wanted to Know about Security but Found Extremely Entertaining to Read

This is in praise of Peter Gutmann’s book draft Engineering Security, and the title is inspired by his talk Everything You Never Wanted to Know about PKI but were Forced to Find Out.

Chances are high that any non-geek reader is already intimidated by the acronym PKI – sharing the links above on LinkedIn, I have been asked: Oh. Wait. What the %&$%^ is PKI??

This reaction is spot-on, as this post is more about usability and end-users’ perception of technology – despite, or because, I have worked for more than 10 years at the geeky end of Public Key Infrastructure. In summary, PKI is a bunch (actually a ton) of standards that should allow for creating the electronic counterparts of signatures, of issuing passports, and of transferring data in locked cabinets. Basically, it should solve all security issues.

The following images from Peter Gutmann’s book might evoke some memories.

Security warnings designed by geeks look like this:

Peter Gutmann, Engineering Security, certificate warning - What the developers wrote

Peter Gutmann, Engineering Security, book draft, available at https://www.cs.auckland.ac.nz/~pgut001/pubs/book.pdf, p.167. Also shown in Things that Make us Stupid, https://www.cs.auckland.ac.nz/~pgut001/pubs/stupid.pdf, p.3.

As a normal user, you might rather see this:

Peter Gutmann, Engineering Security, certificate warning - What the user sees

Peter Gutmann, Engineering Security, book draft, available at https://www.cs.auckland.ac.nz/~pgut001/pubs/book.pdf, p.168.

The funny thing was that I picked this book to take a break from books on psychology and return to the geeky stuff – and then I was back to all kinds of psychological biases and Kahneman’s Prospect Theory for example.

What I appreciate in particular is the diverse range of systems and technologies considered – Apple, Android, UNIX, Microsoft, … – all evaluated agnostically, plus the diverse range of interdisciplinary research taken into account. Now that’s what I call true erudition with a modern touch. Above all, I enjoyed the conversational and irreverent tone – never before have I started reading a book for technical reasons and then been unable to put it down because it was so entertaining.

My personal summary – which resonates a lot with my experience – is:
In trying to make systems more secure, you might not only make them more unusable and obnoxious, but also more insecure.

A concise summary is also given in Gutmann’s talk Things that Make Us Stupid. I liked in particular the ignition key as a real-world example of a device that is smart and easy to use, and that provides security as a by-product – very different from the interfaces of ‘security software’.

Peter Gutmann is not at all siding with ‘experts’ who always chide end-users for being lazy and dumb – writing passwords down and sticking the post-its on their screens – and who state that all we need is more training and user awareness. Normal users use systems to get their job done, and they apply risk management in an intuitive way: Should I waste time following an obnoxious policy, or should I try to pass that hurdle as quickly as possible to do what I am actually paid for?

Geeks are weird – that’s a quote from the lecture slides linked above. Since Peter Gutmann is an academic computer scientist and obviously a down-to-earth practitioner with ample hands-on experience – which would definitely qualify him as a Geek God – his critique is even more convincing. In the book he quotes psychological research showing that geeks really do think differently (as per standardized testing of personality types). Geeks constitute a minority of people (7%) who tend to make decisions – such as Should I click that pop-up? – in a ‘rational’ manner, as the simple and mostly wrong theories on decision making have proposed. One example Gutmann uses is testing for a basic understanding of logic, such as: Does ‘All X are Y’ imply ‘Some X are Y’? Across cultures, the majority of people think that this is wrong.

Normal people – and I think also geeks when they don’t operate in geek mode, e.g. in the wild, not in their programmer’s cave – fall for many so-called fallacies and biases.

Our intuitive decision-making engine runs on autopilot, and we get conditioned to click away EULAs, next-next-finish the dreaded install wizards, or click away pop-ups, including the warnings. As users we don’t generate testable hypotheses or calculate risks, but act unconsciously based on our experience of what has worked in the past – and usually the click-away-anything approach works just fine. You would need US Navy-style constant drilling in order to be alert enough not to fall for those fallacies. This does exactly apply to anonymous end users using their home PCs for online banking.

Security indicators like padlocks and browser address bar colors change with every version of popular browsers. Not even tech-savvy users are able to tell from those indicators if they are ‘secure’ now. But here is what is extremely difficult: Users would need to watch out for the lack of an indicator (one that’s barely visible when it is there). And we are – owing to confirmation bias – extremely bad at spotting the negative, the lack of something. Gutmann calls this the Simon Says problem.

It is intriguing to see how biases about what ‘the others’ – the users or the attackers – would do enter technical designs. For example, it is often assumed that a client machine or user who has authenticated itself is more trustworthy – and servers are more vulnerable to a malformed packet sent after successful authentication. The Stuxnet attack used digitally signed malware (signed with stolen keys) – ‘if it’s signed it has to be secure’.

To make things worse, users are even conditioned for ‘insecure’ behavior: When banks use all kinds of fancy domain names to market their latest products, lure their users into clicking on links to those fancy sites in e-mails, and have them log on with their banking user accounts via these sites, they train users to fall for phishing e-mails – despite the fact that the same e-mails half-heartedly warn against clicking arbitrary links in e-mails.

I want to stress that I believe that, in relation to systems like PKI – which require you to run some intricate procedure only every few years (these are called ceremonies for a reason), but then it is extremely critical – admins should also be considered ‘users’.

I have spent many hours discussing proposed security features like Passwords need to be impossible to remember and never written down with people whose job it is to audit, draft policies, and read articles on what Gutmann calls conference-paper attacks all day. These are not the people who have to run systems, deal with helpdesk calls or costs, and with requests from VIP users such as top-level managers who, on the one hand, are extremely paranoid about system administrators sniffing their e-mails, but who, on the other hand, need instant 24/7 support with the recovery of encrypted e-mails. (This should be given a name, like the Top Managers’ Paranoia Paradox.)

As a disclaimer I’d like to add that I don’t underestimate cyber security threats, risk management, policies etc. It is probably the current media hype on governments spying on us that makes me advocate a contrarian view.

I could back this up with tons of stories, many of them too good to be made up (but unfortunately NDA-ed): security geeks – in the sense of ‘designers’ and ‘policy authors’ – often underestimate the time and effort required to run their solutions on a daily basis. It is often the so-called trivial and simple things that go wrong, such as: The documentation of that intricate process to be run every X years cannot be found, the only employee who really knew about the interdependencies is long gone, or allegedly simple logistics go wrong (Now we are locked in the secret room to run the key ceremony… BTW, did anybody think of having the media ready to install the operating system on that highly secure, isolated machine?).

A large European PKI setup failed (it made headlines) because the sacred key of a root certification authority had been destroyed – which is the expected behavior of so-called Hardware Security Modules when they are tampered with, or at least when their sensors say so – and there was no backup. The companies running the project and running operations blamed each other.

I am not quoting this to make fun of others, although the typical response here is to state that projects or operations have been badly managed and that you just need to throw more people and money at them to run secure systems in a robust and reliable way. This might be true, but it simply does not reflect the budget, time constraints, and lack of human resources that typical corporate IT departments have to deal with.

There is often a very real, palpable risk of trading off business continuity and availability (that is: safety) for security.

Again, I don’t want to downplay the risks associated with broken algorithms and the NSA reading our e-mail. But as Peter Gutmann points out, cryptography is the last thing an attacker would target (even if a conference-paper attack had shown it to be broken) – the implementation of cryptography rather guides attackers along the lines of where not to attack. Just consider the spectacular recent ‘hack’ of a prestigious one-letter Twitter account, which actually came down to blackmailing the user after the attacker had gained control of the user’s custom domain through social engineering – most likely of underpaid call-center agents who had to face the dilemma of meeting the numbers in terms of customer satisfaction versus following the security awareness training they might have had.

Needless to say, encryption, smart cards, PKI etc. would not have prevented that type of attack.

Peter Gutmann says of himself that he is throwing rocks at PKIs, and I believe you can illustrate a particularly big problem using a perfect real-life metaphor: Digital certificates are like passports or driver licenses to users – signed by a trusted agency.

Now consider the following: A user might commit a crime, and his driver license is seized. PKI’s equivalent of that seizure is to have the issuing agency regularly publish a blacklist, listing all the bad guys. Police officers on the road need to have access to that blacklist in order to check drivers’ legitimacy. What happens if a user isn’t blacklisted but the blacklist publishing service is not available? The standard makes this check optional (as it does many other things – the norm when an ancient standard is retrofitted with security features), but let’s assume the police app follows the recommendation of what it SHOULD do. If the list is unavailable, the user is considered an alleged criminal and has to exit the car.

You could also imagine something similar happening to train riders who have printed out an online ticket that cannot be validated (e.g. distinguished from forgery) by the conductor due to a failure in the train’s IT systems.

Any ‘emergency’ / ‘incident’ related to digital certificates I was ever called upon to support was related to false negatives – users blocked from doing what they needed to do because of missing, misconfigured, or (temporarily) unavailable certificate revocation lists (CRLs). The most important question in PKI planning is typically how to work around or prevent inaccessible CRLs. I am aware of how petty this problem may appear to readers – what’s the big deal in monitoring a web server? But have you ever noticed how many alerts (e.g. via SMS) a typical administrator gets – and how many of them are false negatives? When I ask what will happen if the PKI / the CRL signing / the web server breaks on Dec. 24 at 11:30 (in a European country), I am typically told that we need to plan for at least some days until recovery. This means that the revocation information on the blacklist will be stale, too, as CRLs can be cached for performance reasons.
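For what it’s worth, in the Windows/.NET world this trade-off is literally one setting in certificate chain validation: treat an unreachable CRL as a hard failure, or ignore the unknown revocation status. A sketch to illustrate the two stances (not a recommendation for either):

Imports System
Imports System.Security.Cryptography.X509Certificates

Public Module RevocationCheckDemo
    Public Function Validate(certificate As X509Certificate2, failOpen As Boolean) As Boolean
        Dim chain As New X509Chain()
        chain.ChainPolicy.RevocationMode = X509RevocationMode.Online   ' fetch the CRL - the 'blacklist'

        If failOpen Then
            ' Business-continuity stance: a missing or stale CRL does not block the user.
            chain.ChainPolicy.VerificationFlags =
                X509VerificationFlags.IgnoreCertificateAuthorityRevocationUnknown Or
                X509VerificationFlags.IgnoreEndRevocationUnknown
        End If
        ' Strict stance (failOpen = False): if the CRL cannot be fetched, Build() fails with
        ' RevocationStatusUnknown / OfflineRevocation in the chain status - and the user is locked out.

        Return chain.Build(certificate)
    End Function
End Module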

As you can imagine, most corporations rather tend to follow the reasonable approach of putting business continuity over security, so they want to make sure that a glitch in the web server hosting those blacklists will not stop 10,000 employees from accessing the wireless LAN, for example. Of course any weird standard can be worked around given infinite resources. The point I wanted to make is that these standards have been designed with something totally different in mind, by PKI Theologians in the 1980s.

Admittedly though, digital certificates and cryptography are a great playground for geeks. I think I was a PKI theologian myself many years ago, until I morphed into what I tongue-in-cheek call an anti-security consultant – trying to help users (and admins) keep on working despite new security features. I often advocated not using certificates and proposed alternative approaches, boiling a potential PKI project down to a few hours of work – against the typical consultant’s mantra of trying to make yourself indispensable in long-term projects and of designing black boxes the client will never be able to operate on his own. Not only because of the PKI overhead, but because the alternatives were just as secure – just not as hyped.

So in summary I am recommending Peter Gutmann’s terrific resources (check out his Crypto Tutorial, too!) to anybody who is torn between geek enthusiasm for some obscure technology and questioning its value nonetheless.

Rusty Padlock

No post on PKI, certificates, and keys would be complete without an image like this. I found the rusty one particularly apt here. (Wikimedia, user Garretttaggs)

Fragile Technology? (Confessions of a Luddite Disguised as Tech Enthusiast)

I warn you – I am in the mood for random long-winded philosophical ramblings.

As announced, I have recently graduated again, declining the cap-and-gown costume as I detest artificial Astroturf traditions such as re-importing academic rituals from the USA to Europe. A Subversive El(k)ement fond of uniforms would not be worth the name.

However, other than that I realize that I have probably turned into a technophobe luddite with a penchant for ancestral traditions.

Long-term followers might know what I am getting at again, as I could only have borrowed a word like ancestral from Nassim N. Taleb. I have re-read Taleb’s The Black Swan and Antifragile. The most inspirational books are those that provide you with words and a framework to re-phrase what you already know:

Authors theorize about some ancestry of my ideas, as if people read books then developed ideas, not wondering whether perhaps it is the other way around; people look for books that support their mental program. –Nassim N. Taleb, Antifragile, Kindle Locations 3405-3406.

I have covered Antifragile at length in an earlier article. In a nutshell, antifragility is the opposite of fragility. This definition goes beyond robustness – it is about systems gaining from volatility and disorder. I will not be able to do this book justice in a blog post, not even a long one. Taleb’s speciality is tying his subject matter expertise (in many fields) to personal anecdotes and convictions (in many fields) – which is why some readers adore his books and others call them unscientific.

I am in the former camp, as hardly any other author takes consistency of personal biography, professional occupation, and writing that far. I was most intrigued by the notion of Skin in the Game, which is about being held 100% accountable, about practicing what you preach.

I eat my own cooking. I have only written, in every line I have composed in my professional life, about things I have done, and the risks I have recommended that others take or avoid were risks I have been taking or avoiding myself. I will be the first to be hurt if I am wrong. –Nassim N. Taleb, Antifragile, Kindle Locations 631-633

Taleb has the deepest respect for small business owners and artisans – and so do I. He is less kind to university professors, particularly those specialized in economics and employed managers, particularly those of banks.

Some of Taleb’s ideas appear simple (to comprehend, not necessarily to put into practice), often of the What my grandmother told me variety – which he does not deny. But he can make a nerd like me wonder if some things are probably – simply that simple. In case you are not convinced: he also publishes scientific papers loaded with math jargon. Taleb mischievously mentions that ideas of his that had been called too trivial and obvious were taken seriously after he translated them into formal jargon.

I don’t read his books as a detached scientist – it is more like talking to somebody, comparing biographies and ideas, and suddenly feeling vindicated.

A mundane example: At times I had given those woman-in-tech-as-a-role-model interviews – despite some reluctance. One time my hesitation was justified. Talking about my ‘bio’, I pointed out that I am proud of having thrived for some years as an entrepreneur in a narrow niche in IT. In the written version the interviewers rather put emphasis on the fact that I had been employed by a well-known company years before. Fortunately I was given a chance to review and correct it.

When I asked for their rationale, they made it worse: I was told that it is an honor to be employed by such a big brand-name company. Along similar lines, I found it rather disturbing that admirers of my academic track record told me (in retrospect of course, when I was back on a more prestigious track) that working as a consultant for small businesses was just not appropriate.

What is admirable about being the ant in the big anthill?

I had considered my own life and career an attempt – or many attempts – to reconcile, unite, or combine opposite things. Often in a serial fashion. In my pre-Taleb reading era I used to quote Randy Komisar’s Portfolio of Passions or Frank Levinson’s 1000 ideas you need to have (and discard again) as a business owner.

Taleb introduced optionality to my vocabulary, borrowed from traders’ jargon: An option is the right, but not the obligation, to engage in a transaction. Thus you should avoid personal and career decisions that put you on a track of diminishing options. This is exactly what I felt about staying in academia too long – becoming a perpetual post-doc, finally too old and too specialized for anything else.

Nassim Taleb does not respect nerdiness and smartness as we define it the academic way.

If you “have optionality,” you don’t have much need for what is commonly called intelligence, knowledge, insight, skills, and these complicated things that take place in our brain cells. For you don’t have to be right that often. –Nassim N. Taleb, Antifragile, Kindle Locations 3097-3099.

He suggests just passing exams with the minimum score. I, nerd of stellar grades and academic fame, declare defeat – I have already repented here. But let me add a minor remark from a cultural perspective: I feel that academic smartness is more revered in North America than it is in central Europe, although America values hands-on, non-academic risk taking more, as Taleb points out correctly. I had been surrounded by physicists with an engineering mindset – theoretical physics was for the socially awkward nerds and not a domain you become a rockstar in.

It would not do me good to brag about any sort of academic achievement in my ancestral country – it rather puts you under pressure to prove that you are a genuine human being and still capable of managing daily life’s challenges, such as exchanging a light bulb, despite your absent-minded professor’s attitude. Probably it can be related to our strong tradition of non-academic, secondary education – something Taleb appreciates in his praise of Switzerland’s antifragility.

I have been torn between two different kinds of aspirations ever since: I was that bookish child cut out for academia or any sort of profession concerned with analyzing, writing, staying on the sidelines, fence-sitting, and commenting. But every time I revisited my career decisions, I went for the more tangible, more applied, more hands-on – and the more mundane. Taleb’s writings vindicate my propensity.

I had always felt at home in communities of self-educated tinkerers – both in IT and in renewable energy. I firmly believe that any skill of value in daily professional life is self-taught anyway, no matter how many courses in subjects like project management you have been forced to take.

For I am a pure autodidact, in spite of acquiring degrees. –Nassim N. Taleb, Antifragile, Kindle Locations 4132-4133.

Blame it on my illiteracy, but Taleb is the first author who merges (for me) deep philosophical insights with practical, so to speak ‘capitalist’, advice – perfectly reflecting my own experiences:

My experience is that money and transactions purify relations; ideas and abstract matters like “recognition” and “credit” warp them, creating an atmosphere of perpetual rivalry. I grew to find people greedy for credentials nauseating, repulsive, and untrustworthy. –Nassim N. Taleb, Antifragile, Kindle Locations 678-680

I’d rather work some not-too-glorious jobs based on a simple feedback loop – people want something badly, I do it, they pay me – and I’d rather not (anymore) write applications for research grants in order to convince a committee, or execute the corporate plan to meet the numbers.

Taleb provided very interesting historical evidence that so-called innovation has actually been triggered by now-forgotten self-educated tinkerers rather than by science applying Soviet-Harvard-style planning. You might object to those theories, probably arguing that we would never have had a man on the moon or the Dreamliner airplane without Soviet-Harvard-style research, let alone the LHC and the discovery of the Higgs boson. I might object to this objection by hypothesizing that the latter probably does not result in products we really need (which includes big airplanes and business travel).

But I do know the counter-arguments – Einstein and GPS, Faraday and the allegedly useless electromagnetic waves that would one day be taxed, the WWW and CERN – and I don’t hold very strong opinions on this.

Because of the confirmation problem, one can argue that we know very little about our natural world; we advertise the read books and forget about the unread ones. Physics has been successful, but it is a narrow field of hard science in which we have been successful, and people tend to generalize that success to all science. It would be preferable if we were better at understanding cancer or the (highly nonlinear) weather than the origin of the universe. –Nassim N. Taleb, The Black Swan, Kindle Locations 3797-380

I absolutely do love theoretical physics – when other people listen to meditation music, do yoga, go to church, take walks at sunset, wax poetic, read Goethe, or are bamboozled by Renaissance art, I read textbooks on quantum field theory. There is joy in knowledge for the sake of knowledge. So academics should be paid by the public for providing the raw material.

But I know that Taleb’s analysis is true when applied to some research I have some personal familiarity with. Austria has been a pioneer in solar thermal energy – many home owners have installed glazed solar collectors on their roofs. The origin of that success is tinkering by hobbyists – and solar collectors are still subject to DIY tinkering. Today academics do research in solar thermal energy, building upon those former hobbyist movements. And I know from personal experience and training that academics in applied sciences are really good at dressing up their tinkering as science.

Nassim Taleb also believes that organized education and organized science follow wealth, not the other way round. Classical education in the sense of true erudition is something you acquire because you want to become a better human being. Sending your kids to school in order to boost GDP is a rather modern, post-WWII approach.

Thus I believe in the value of fundamental research in science in the same way as I still believe in the value of a well-rounded education and reading the ancients, as Nassim Taleb does. But it took me several attempts to read Taleb’s book and to write this post to realize that I am skeptical about the sort of tangible value of some aspects of science and technology as they relate to my life here and now.

I enjoyed Taleb’s ramblings on interventionism in modern medicine – one of the chapters in Antifragile that probably polarizes the most. Taleb considers anything living and natural superior to anything artificial and planned by Soviet-Harvard-style research – something better not tinkered with unless the odds for positive results are extremely high. Surgery in life-threatening situations is legitimate; cholesterol- and blood-pressure-reducing medication is not. Ancestral and religious traditions may get it right even if their rationales are wrong: Fasting, for example, may provide the right stimuli for a human body that is not designed for an over-managed, regular, life-hacker’s, over-optimizer’s lifestyle along the lines of those five balanced daily meals your smartphone app reminds you of. As a disclaimer I have to add: Just like Taleb, I am not at all into alternative medicine.

Again, I don’t have very strong opinions about medical treatments and the resolution to the conflict might be as simple as: Probably we don’t get the upsides of life-saving surgery without the downsides of greedy pharmaceuticals selling nice-to-have drugs that are probably even harmful in the long run.

But – again – I find Taleb’s ideas convincing when I try to carry them over to other fields in the history of science and technology of which I have at least the faintest clue. Software vendors keep preaching to us – and I was in that camp for some time, admittedly – that software makes us more productive. As a mere user of software forced upon me by legal requirements, I have often wondered whether ancient accountants, literally keeping books, were any less productive.

I found anecdotal evidence last year that users of old tools and software are still just as productive – having become skilled in their use, even if they do accounting on clay tablets. This article demonstrates that hopelessly outdated computer hardware and software is still in use today. I haven’t been baffled by ancient computers in the military and in research, but I have been delighted to read this:

Punch-Card Accounting
Sparkler Filters of Conroe, Texas, prides itself on being a leader in the world of chemical process filtration. If you buy an automatic nutsche filter from them, though, they’ll enter your transaction on a “computer” that dates from 1948. Sparkler’s IBM 402 is not a traditional computer, but an automated electromechanical tabulator that can be programmed (or more accurately, wired) to print out certain results based on values encoded into stacks of 80-column Hollerith-type punched cards.
Companies traditionally used the 402 for accounting, since the machine could take a long list of numbers, add them up, and print a detailed written report. In a sense, you could consider it a 3000-pound spreadsheet machine.

I guess the operators of this computer are smiling today, when reading about the NSA spying on us and Russian governmental authorities buying typewriters again.

IBM 403 accounting machine

The machine in the foreground is an IBM 403 accounting machine where the input are punched cards; the machine in the center is an IBM 514 Reproducing Punch apparently connected to the foreground 403 as a summary punch, and the one in the background is another 403 or 402 accounting machine. (Wikimedia, Flickr user ArnoldReinhold)

I don’t advocate reverting to ancient technology – but I don’t take progress and improvements for granted either. Nicholas Carr, author of The Shallows: What the Internet is Doing to Our Brains, plans to release his new book in 2014, titled The Glass Cage: Automation and Us. In his related essay in The Atlantic, Carr argues:

It reveals that automation, for all its benefits, can take a toll on the performance and talents of those who rely on it. The implications go well beyond safety. Because automation alters how we act, how we learn, and what we know, it has an ethical dimension. The choices we make, or fail to make, about which tasks we hand off to machines shape our lives and the place we make for ourselves in the world. That has always been true, but in recent years, as the locus of labor-saving technology has shifted from machinery to software, automation has become ever more pervasive, even as its workings have become more hidden from us. Seeking convenience, speed, and efficiency, we rush to off-load work to computers without reflecting on what we might be sacrificing as a result.

Probably productivity enhancements kick in exactly when the impacts outlined by Carr take effect. But I would even doubt the time-saving effects and positive impacts on productivity in many domains where they are marketed so aggressively today.

Show me a single company whose sales people or other road warriors do not complain about having to submit reports and enter the numbers into that infamous productivity tool. As a small business owner, I do complain about ever-increasing reporting and forecasting duties inflicted upon me by governmental agencies, enterprise customers, or big suppliers – a main driver for me to ‘go small’ in every aspect of my business, by the way. Of course software would ease our bureaucratic pains if the requirements were the same as when double-entry accounting was invented by Pacioli in the 15th century. But the more technology John Doe is expected to use today, the more ideas CEOs and bureaucrats dream up – about data they need, because John Doe ought to deliver them anyway in an effortless way.

Reading all the articles about the NSA makes me wonder whether the painful, tedious extra work caused by the technology we ought to use is something marginal that only I rant about. I have often said it in pre-public-NSA-paranoia times: I would love to see that seamless governmental spying at work, to free me from that hassle. I had been confronted with interfaces and protocols not working, and with things so secure that people locked themselves out of the system.

So in summary I often feel like an anti-technology consultant, indulging in helping people work productively despite technology. Since this seems quite a negative approach, I enjoy making wild speculative connections and misusing interdisciplinary writings such as Taleb’s to make my questionable points.