My Data Kraken – a Shapeshifter

I wonder if Data Kraken is only used by German speakers who translate our hackneyed Datenkrake – is it a word like eigenvector?

Anyway, I need this animal metaphor, although this post is not about Facebook or Google. It’s about my personal Data Kraken – which is a true shapeshifter, like all octopuses are:

(… because they are spineless, but I don’t want to over-interpret the metaphor…)

Data Kraken’s shapeability is a blessing, given ongoing challenges:

When the Chief Engineer is fighting with other intimidating life-forms in our habitat, he focuses on survival first and foremost … and sometimes he forgets to inform the Chief Science Officer about fundamental changes to our landscape of sensors. Then Data Kraken has to be trained again to learn how to detect if the heat pump is on or off in a specific timeslot. Use the signal sent from control to the heat pump? Or to the brine pump? Or better use brine flow and temperature difference?

It might seem like a dull and tedious exercise to calculate ‘averages’ and other performance indicators that require only very simple arithmetic. But with the exception of room or ambient temperature, most of the ‘averages’ only make sense if some condition is met. For example, the average heating water inlet temperature should only be calculated from readings taken while the heating circuit pump is on. But the temperature of the cold water, when the same floor loops are used for cooling in summer, should not be included in this average of ‘heating water temperature’. Above all, false sensor readings, like 0, NULL or any value (like 999) a vendor chooses to indicate an error, have to be excluded. And sometimes I rediscover eternal truths, like the ratio of averages not being equal to the average of ratios.
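Here is a minimal Python sketch of such conditional averaging. It is not Data Kraken's actual code: the sentinel values, the function name `conditional_mean` and all numbers are made up for illustration. The last lines show the eternal truth about ratios with two invented days of heat and electricity readings.

```python
from statistics import mean

# Assumed error markers; a real vendor may use different ones.
ERROR_VALUES = {0, 999}

def conditional_mean(readings, condition):
    """Average only valid readings taken while the condition holds."""
    valid = [
        value
        for value, is_on in zip(readings, condition)
        if is_on and value is not None and value not in ERROR_VALUES
    ]
    return mean(valid) if valid else None

# Made-up heating water inlet temperatures and pump states per timeslot.
inlet_temp = [35.0, 36.0, 999, 21.0, None, 34.0]
pump_on = [True, True, True, False, True, True]

# Only 35.0, 36.0 and 34.0 survive the filters.
print(conditional_mean(inlet_temp, pump_on))

# Ratio of averages versus average of ratios (made-up daily energies in kWh).
heat = [6.0, 2.0]
electricity = [2.0, 1.0]
ratio_of_averages = mean(heat) / mean(electricity)                  # 4.0 / 1.5
average_of_ratios = mean(h / e for h, e in zip(heat, electricity))  # (3.0 + 2.0) / 2
print(ratio_of_averages, average_of_ratios)
```

The two results differ because the first weights each day by its electricity consumption, while the second gives every day the same weight.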

The Chief Engineer is tinkering with new sensors all the time: In parallel to using the old & robust analog sensor for measuring the water level in the tank…

Level sensor: The old way

… a multitude of level sensors was evaluated …

Level sensors: The precursors

… until finally Mr. Bubble won the casting …

Level sensor: Mr. Bubble

… and the surface level is now measured via the pressure, which increases linearly with depth. For the Big Data Department this means adding some new fields to the Kraken database, calculating new averages … and smoothly transitioning from the volume of ice calculated from ruler readings to the new values.
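The linear relation is the hydrostatic one, p = ρ·g·h, so the depth follows from the measured gauge pressure by a single division. A small sketch, assuming plain water at a density of 1000 kg/m³ and a hypothetical function name:

```python
RHO_WATER = 1000.0  # kg/m^3, assumed density of the water in the tank
G = 9.81            # m/s^2, standard gravity (rounded)

def level_from_pressure(gauge_pressure_pa):
    """Depth in metres below the surface at which the pressure is measured."""
    return gauge_pressure_pa / (RHO_WATER * G)

# A gauge pressure of 9810 Pa corresponds to about 1 m of water.
print(level_from_pressure(9810.0))
```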

Change is the only constant in the universe, paraphrasing Heraclitus [*]. Sensors morph in purpose: The heating circuit formerly known (to the control unit) as the radiator circuit became a new wall heating circuit, and the radiator circuit was virtually reborn as a new circuit.

I am guilty of adding new tentacles all the time, too, herding a zoo of meters added in 2015. Each of them adds a new log file, containing data taken at different points in time and at different intervals. This year I let Kraken put tentacles into the heat pump:

Data Kraken: Tentacles in the heat pump!

But the most challenging data source to integrate is the most unassuming source of logging data: the small list of data that the Chief Engineer had recorded manually until recently (until the advent of Miss Pi CAN Sniffer and Mr Bubble). Reason: He had refused to take data at exactly 00:00:00 every single day, so I learned things I never wanted to know about SQL programming to deal with the odd time intervals.
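One way to tame such odd intervals is to interpolate the readings to midnight and then take differences. The sketch below does this in Python rather than SQL, and it assumes that the manual readings are cumulative meter values that grow roughly linearly between two readings. The function name and all numbers are invented.

```python
from datetime import datetime

def value_at(readings, when):
    """Linearly interpolate a cumulative reading at the time `when`.

    `readings` is a list of (datetime, value) pairs sorted by time.
    """
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        if t0 <= when <= t1:
            fraction = (when - t0) / (t1 - t0)  # timedelta / timedelta is a float
            return v0 + fraction * (v1 - v0)
    raise ValueError("timestamp outside the recorded range")

# Made-up manual readings taken at inconvenient times.
readings = [
    (datetime(2015, 1, 1, 18, 0), 100.0),
    (datetime(2015, 1, 2, 6, 0), 112.0),
    (datetime(2015, 1, 3, 12, 0), 142.0),
]

# Daily value for January 2: difference between two interpolated midnights.
start = value_at(readings, datetime(2015, 1, 2))
end = value_at(readings, datetime(2015, 1, 3))
print(end - start)
```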

To be fair, the Chief Engineer has been dedicated to data recording! He never shunned true challenges, like a legendary white-out in our garden, at a time when measuring ground temperatures was not automated yet:

The challenge

White Out

Long-term readers of this blog know that ‘elkement’ stands for a combination of nerd and luddite, so I try to merge a dinosaur scripting approach with real-world global AI Data Krakens’ wildest dream: I wrote scripts that create scripts that create scripts [[[…]]] that were based on a small proto-Kraken – a nice-to-use documentation database containing the history of sensors and calculations.

The mutated Kraken is able to eat all kinds of log files, including those of clients, and above all, it can be cloned easily.

I’ve added all the images and anecdotes to justify why an unpretentious user interface like the following is my true Christmas present to myself – ‘easily clickable’ calculated performance data for days, months, years, and heating seasons.

Data Kraken: UI

… and diagrams that can be changed automatically, by selecting interesting parameters and time frames:

Excel for visualization of measurement data

The major overhaul of Data Kraken turned out to be prescient, as a seemingly innocuous firmware upgrade not only changed log file naming conventions and the publication schedule but also shuffled all the fields in the log files. My Data Kraken has to be capable of rebuilding the SQL database from scratch, based on a documentation of those ever-changing fields and the raw log files.
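The idea of rebuilding from documentation plus raw logs can be sketched in a few lines of Python with an in-memory SQLite database. Everything here is hypothetical: the firmware labels, the field names, the table layout and the sample log lines. The real Kraken generates its scripts from a documentation database instead of a hard-coded dictionary.

```python
import csv
import io
import sqlite3

# Assumed documentation: column order of the log file per firmware version.
FIELD_LAYOUTS = {
    "v1": ["timestamp", "brine_in", "brine_out"],
    "v2": ["timestamp", "brine_out", "brine_in"],  # same fields, shuffled
}

def rebuild(raw_logs):
    """Create a fresh database from (firmware, csv_text) pairs."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE readings (timestamp TEXT, brine_in REAL, brine_out REAL)"
    )
    for firmware, text in raw_logs:
        layout = FIELD_LAYOUTS[firmware]
        for row in csv.reader(io.StringIO(text)):
            # Map each value to its documented field name, whatever the order.
            record = dict(zip(layout, row))
            db.execute(
                "INSERT INTO readings VALUES (:timestamp, :brine_in, :brine_out)",
                record,
            )
    return db

raw_logs = [
    ("v1", "2015-12-01 00:00,2.5,0.5\n"),
    ("v2", "2015-12-02 00:00,0.4,2.4\n"),
]
db = rebuild(raw_logs)
rows = db.execute(
    "SELECT brine_in, brine_out FROM readings ORDER BY timestamp"
).fetchall()
print(rows)
```

Because the raw logs are never modified, a corrected field layout only requires running the rebuild again.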

_________________________________

[*] It was hard to find the true original quote for that, as the internet is cluttered with change management coaches using that quote, and Heraclitus speaks to us only through secondary sources. But anyway, what this philosophy website says about Heraclitus applies very well to my Data Kraken:

The exact interpretation of these doctrines is controversial, as is the inference often drawn from this theory that in the world as Heraclitus conceives it contradictory propositions must be true.

In my world, I also need to deal with intriguing ambiguity!

Anniversary 4 (4 Me): “Life Ends Despite Increasing Energy”

I published my first post on this blog on March 24, 2012. Back then its title and tagline were:

Theory and Practice of Trying to Combine Just Anything
Physics versus engineering
off-the-wall geek humor versus existential questions
IT versus the real thing
corporate world’s strangeness versus small business entrepreneur’s microcosmos
knowledge worker’s connectedness versus striving for independence

… which became

Theory and Practice of Trying to Combine Just Anything
I mean it

… which became

elkemental Force
Research Notes on Energy, Software, Life, the Universe, and Everything

last November. It seems I have run out of philosophical ideas and said everything I had to say about Life and Work and Culture. Now it’s not Big Ideas that make me publish a new post but my small Big Data. Recent posts on measurement data analysis or on the differential equation of heat transport are typical of my new editorial policy.

Cartoonist Scott Adams (of Dilbert fame) encourages us to look for patterns in our lives, rather than to interpret and theorize, and be fooled by biases and fallacies. Following this advice and my new policy, I celebrate my 4th blogging anniversary by crunching this blog’s numbers.

No, this does not mean I will show off the humbling statistics of views provided by WordPress 🙂 I am rather interested in my own evolution as a blogger. Since raking my virtual Zen garden two years ago, I have manually maintained lists of posts in each main category – these are my menu pages. Now I have processed each page’s HTML code automatically to count posts published per month, quarter, or year in each category. All figures in this post are based on all posts excluding reblogs and the current post.

Since I assigned two categories to some posts, I had to pick one primary category to make the height of one column reflect the total posts per month:

Statistics on blog postings: Posts per month in each main category

It seems I had too much time in May 2013. Perhaps I needed creative compensation – indulging in Poetry and pop culture (Web) – as back then I was writing a master’s thesis.

I have never missed a single month, but there were two summer breaks in 2012 and 2013 with only 1 post per month. It seems Life and Web have gradually been replaced by Energy, and there was a flash of IT in 2014 which I correlate with both nostalgia and a professional flashback owing to lots of cryptography-induced deadlines.

But I find it hard to see a trend, and I am not sure about the distortion I introduced by picking one category.

So I’d rather group by quarter:

Statistics on blog postings: Posts per quarter in each main category

… which shows that posts per quarter have reached a low right now in Q1 2016, even if I added the current post. Most posts now are based on original calculations or data analysis, which take more time to create than search term poetry or my autobiographical vignettes. But maybe my anecdotes and opinionated posts had just been easy to write as I was drawing on ‘content’ I had in mind for years before 2012.

In order to spot my ‘paradigm shifts’ I include duplicates in the next diagram: Each post assigned to two categories is counted twice. Since the total number then no longer makes sense, I just depict relative category counts per quarter:

Statistics on blog postings: Posts per quarter in each category, including the assignment of more than one category.
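The numbers behind such a diagram could be computed along these lines. This is a Python sketch with a made-up list of posts (the actual counting was done in Excel): every category of a post is counted, so posts with two categories appear twice, and the counts are then divided by the total per quarter.

```python
from collections import Counter, defaultdict
from datetime import date

# Made-up posts: (publication date, list of assigned categories).
posts = [
    (date(2015, 10, 5), ["Energy", "IT"]),
    (date(2015, 11, 20), ["Physics"]),
    (date(2016, 1, 15), ["Energy"]),
    (date(2016, 2, 28), ["Energy", "IT"]),
]

def quarter(d):
    """Label like '2016-Q1' for a given date."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def relative_counts(posts):
    """Share of each category per quarter, counting double assignments twice."""
    counts = defaultdict(Counter)
    for published, categories in posts:
        counts[quarter(published)].update(categories)
    return {
        q: {cat: n / sum(c.values()) for cat, n in c.items()}
        for q, c in counts.items()
    }

shares = relative_counts(posts)
print(shares)
```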

Ultimate wisdom: Life ends, although Energy is increasing. IT is increasing, too, and was just hidden in the other diagram: Recently it is often the secondary category in posts about energy systems’ data logging. Physics follows an erratic pattern. Quantum Field Theory accounted for the maximum at the end of 2013, but was then replaced by thermodynamics.

Web is also somewhat constant, but the list of posts shows that the most recent Web posts are on average more technical and less about Web and Culture and Everything. There are exceptions.

Those trends are also visible in yearly overviews. The Decline Of Web seems to be more pronounced – so I tag this post with Web.

Statistics on blog postings: Posts per year in each main category

Statistics on blog postings: Posts per year in each category, including the assignment of more than one category.

But perhaps I was cheating. The categories were not as stable as the labels in the diagrams’ legends imply.

Shortcut categories refer to
1) these category pages: Energy, IT, Life, Physics, Poetry, Web,
2) and these categories: Energy, IT, Life, Physics, Poetry, Web, respectively, manually kept in sync.

So somehow…

public-key-infrastructure became control-and-it

and

on-writing-blogging-and-indulging-in-web-culture is now simply web

… and should maybe be called nerdy-web-stuff-and-software-development.

In summary, I like my statistics as they confirm my hunches, but there is one exception: There was no Poetry in Q1 2016 and I have to do something about this!

________________________________

The Making Of

  • Copy the HTML content of each page with a list to a text editor (I use Notepad2).
  • Find double line breaks (\r\n\r\n) and replace them by a single one (\r\n).
  • Copy the lines to an application that lets you manipulate strings (I use Excel).
  • Tweak strings with formulas / commands to cut out date, URL, title and comment. Use the HTML tags as markers.
  • Batch-add the page’s category in a new column.
  • Indicate in a new column if this is the primary or secondary category (find duplicates automatically first, so that 1 can be assigned automatically to most posts).
  • Group the list by month, quarter, and year respectively and add the counts to new data tables that will be used for diagrams (e.g. Excel function COUNTIFS, using only the category, or the category name + indicator for the primary category, as criteria).
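For those who prefer a script to Notepad2 and Excel, the steps above can be sketched in Python. The line format below is an assumption: the real menu pages may be laid out differently, so the regular expression would need to be adapted to the actual HTML tags used as markers.

```python
import re
from collections import Counter

# Hypothetical format of one list entry on a menu page.
LINE = re.compile(
    r'<li>(?P<date>\d{4}-\d{2}-\d{2}): '
    r'<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>'
    r'(?: - (?P<comment>.*?))?</li>'
)

def parse_page(html, category):
    """Cut date, URL, title and comment out of each entry and add the category."""
    posts = []
    for match in LINE.finditer(html):
        post = match.groupdict()
        post["category"] = category
        posts.append(post)
    return posts

def posts_per_month(posts):
    """Count posts per 'YYYY-MM', similar to grouping with COUNTIFS."""
    return Counter(post["date"][:7] for post in posts)

# Made-up page content with Windows line breaks.
html = (
    '<li>2016-01-10: <a href="https://example.com/a">Post A</a> - first comment</li>\r\n'
    '<li>2016-01-25: <a href="https://example.com/b">Post B</a></li>\r\n'
    '<li>2016-02-03: <a href="https://example.com/c">Post C</a> - third</li>\r\n'
)
posts = parse_page(html, "Energy")
print(posts_per_month(posts))
```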

It could be automated even better, without having to maintain category pages, by simply using the category feeds (like this: https://elkement.wordpress.com/category/physics/feed) or by filtering the full blog feed for categories. I have re-categorized all my posts so that categories match the menu page lists, but I chose to use my lists as

  1. I get not only date and headline, but also my own additional summary / comment that’s not part of the feed. For our German blog, I actually do this in reverse: I create the HTML code of a sitemap-style overview page on wordpress.com from an Excel list of all posts plus custom comments and then copy the auto-generated code to the HTML view of the respective menu page on the blog.
  2. the feed provided by WordPress.com can have 150 items maximum, no matter which higher number you try to configure. So you need to start analyzing before you have published 150 posts.
  3. I can never resist creating a tool that manipulates text files and automates something, however weird.
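For comparison, the feed-based alternative might look roughly like the following Python sketch. It filters a tiny, made-up RSS 2.0 snippet by its `category` elements. The sample content is invented and no real feed is fetched here.

```python
import xml.etree.ElementTree as ET

# Made-up feed excerpt in RSS 2.0 style, with categories per item.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Post A</title>
<category><![CDATA[Energy]]></category><category><![CDATA[IT]]></category></item>
<item><title>Post B</title>
<category><![CDATA[Poetry]]></category></item>
<item><title>Post C</title>
<category><![CDATA[Energy]]></category></item>
</channel></rss>"""

def items_in_category(feed_xml, category):
    """Return the titles of all feed items tagged with the given category."""
    root = ET.fromstring(feed_xml)
    titles = []
    for item in root.iter("item"):
        categories = [c.text for c in item.findall("category")]
        if category in categories:
            titles.append(item.findtext("title"))
    return titles

print(items_in_category(SAMPLE_FEED, "Energy"))
```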