Waging a Battle against Sinister Algorithms

I have felt a disturbance in the Force.

As you might expect from a blog about anything, this one has a weird collection of unrelated top pages and posts. My WordPress Blog Stats tell me I am obviously an internet authority on how rodents get into kitchen appliances, the physics of a spinning toy, the history of the first heat pump, and most recently how to sniff router traffic. But all those posts and topics are eclipsed by the meteoric rise of the single most popular article ever, which was a review of a book on a subfield of theoretical physics. I am not linking this post or quoting its title for reasons you might understand in a minute.

In Google Webmaster Tools the effect is even more pronounced. Some months ago this textbook review attracted by far the most Google search impressions and clicks. Looking at the data from the perspective of a bot, it might appear as if my blog had been created just to promote that book. Which is what I believe the bot might actually have concluded.

Judging from historical versions of the book author’s website (on archive.org), the page impressions of my review started to surge when he put a backlink to my post on his page, sometime in spring this year.

But then in autumn this happened.

Page impressions for this blog on Google Webmaster Tools, Sept to Dec.

These are the impressions for searches from desktop computers (‘Web’), without image or mobile search. A page impression means that the link had been displayed on Google Search Results pages to some user. The curve does not change much if I remove the filter for Web.

For this period of three months, that article I Shall Not Quote is the top page in terms of impressions, right after the blog’s default page. I wondered about the reason for this steep decline as I usually don’t see any trend within three months on any of my sites.

If I narrow the time slot to the past month, that infamous post suddenly vanishes from the top posts:

Page impressions and top pages in the last month

It was eradicated quickly – which can only be recognized when narrowing the time slot step by step. Within a few days at the end of October / beginning of November the entry seems to have been erased from the list of impressions.

I sorted the list of results shown above by the name of the page, not by impressions. Since WordPress posts’ names are prefixed with dates you would expect to see any of your posts in that list somewhere, some of them of course with very low scores. Actually, that list also includes obscure early posts from 2012 that nobody ever clicks on.

The former top post, however, did not get a single impression anymore in the past month. I have highlighted the posts before and after in the list, and I have removed all filters for this one, thus also image and mobile search are taken into account. The post’s name started with /2013/12/22/:

Last month, top pages, recent top post missing

Checking the status of indexed pages in total confirms that links have been recently removed:

Index status of this blog

For my other sites and blog this number is basically constant – as long as a website does not get hacked. As our business site actually was a month ago. Yes, I only mention this in passing as I am less worried about that hack than about the mysterious penalizing of this blog.

I learned that your typical hack of a website is less spectacular than what hacker movies let you believe: If you are not a high-profile target, hacker-spammers leave your site intact, but place additional spammy pages with cross-links on your site to promote their links. You recognize this immediately by a surge in the number of URLs and in indexing activities, and – in case your hoster is as vigilant as mine – a peak in 404 not found errors after the spammy pages have been removed. This is the intermittent spike in spammy pages on our business page crawled by Google:

Crawl stats after hack

I used all tools at my disposal to clean up the mess the hackers caused – those pages had actually been indexed already. It will take a while until things like ‘fake Gucci belts’ are removed from our top content keywords, after I removed the links from the index by editing robots.txt and using the Google URL removal tool and the URL parameters tool (the latter comes in handy as the spammy pages have been indexed with various query strings, that is: parameters).
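A minimal sketch of how such a clean-up could be double-checked, assuming Python's standard urllib.robotparser module. The robots.txt rule, the directory name and the URLs are hypothetical and only stand in for the real spammy pages; the sketch shows which URLs a given rule blocks and how the many query-string variants collapse into a single path.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory name is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /spam-dir/
"""

def blocked_paths(robots_txt, urls, agent="Googlebot"):
    """Return the distinct paths (query strings stripped) that the rules block for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = set()
    for url in urls:
        if not parser.can_fetch(agent, url):
            # Variants that differ only in their parameters end up as one path.
            blocked.add(urlsplit(url).path)
    return sorted(blocked)

# Hypothetical URLs: two parameter variants of one spammy page and one legitimate page.
urls = [
    "http://example.com/spam-dir/belts.html?id=1",
    "http://example.com/spam-dir/belts.html?id=2",
    "http://example.com/about.html",
]
result = blocked_paths(ROBOTS_TXT, urls)
print(result)
```

This only checks the rules locally; it says nothing about when a search engine will actually drop the pages.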

I had expected the worst, but Google has not penalized me for that intermittent link spam attack (yet?). Numbers are now back to normal after a peak in queries for that fake brand stuff:

Queries back to normal after clean-up.

It was an awful lot of work to clean those URLs popping up again and again every day. I am willing to fight the sinister forces without too much whining. But Google’s harsh treatment of the post on this blog freaks me out. It is not only the blog post that was affected but also the pages for the tags, categories and archive entries. Nearly all of these pages – thus all the pages linking to the post – did not get a single impression anymore.

Google Webmaster Tools also tells me that the number of so-called Structured Data items for this blog has been reduced to nearly zero:

Structured data on this blog

Structured Data are useful for pages that show e.g. product reviews or recipes – anything that should have a pre-defined structure that might be presented according to that structure in Google search results, via nicely formatted snippets. My home-grown websites do not use those, but the spammer-hackers had used such data in their link spam pages – so on our business site we saw a peak in structured data at the time of the hack.

Obviously WP blogs use those by design. Our German blog is based on the same WP theme – but the number of structured data there has been constant. So if anybody out there is using the theme Twenty Eleven I would be happy to learn about your encounters with structured data.

I have read a lot: what I never wanted to know about search engine optimization. This also included hackers’ Black SEO. I recommend the book Spam Nation by renowned investigative reporter and IT security insider Brian Krebs, published recently. Whose page and book I will again not link.

What has happened? I can only speculate.

Spammers build networks of shady backlinks to promote their stuff. So common knowledge is of course that you should not buy links or create such network scams. Ironically, I have cross-linked all my own sites like hell for many years. Not for SEO purposes but in my eternal quest for organizing my stuff, keeping things separate but still adding the right pointers, raking the virtual Zen garden etc. Never ever did this backfire. I was always concerned about the effect of my links and resources pages (links to other pages, mainly tech and science). Today my site radices.net, which was once an early German predecessor of this blog, is my big link dump – but still these massive link collections are not voted down by Google.

Maybe Google considers my posting and the physics book author’s website part of such a link scam. I have linked to the author’s page several times – to sample chapters, generously made available via download as PDFs, and the author linked back to me. I had refused to tie my blog to my Google+ account and claim ‘Google authorship’ so far as I didn’t want to trade elkement for my real name on G+. Via Webmaster Tools Google knows about all my domains but they might suspect I – a pseudo-anonymous elkement, using an @subversiv.at address on G+ – might also own the book author’s domain that I – diabolically smart – did not declare in Webmaster Tools.

As I said before, from a most objective perspective Google’s rationale might not be that unreasonable. I don’t write book reviews that often; my most recent were about The Year Without Pants and The Glass Cage. I rather write posts triggered by one idea in a book, maybe not even the main one. When I write about books I don’t use Amazon Affiliate marketing – as professional reviewers such as Brain Pickings or Farnam Street do. I write about unrelated topics. I might not match the expected pattern. This is amusing as long as only a blog is concerned but in principle it is similar to being interviewed by the FBI at an airport because your travel pattern just can’t be normal (as detailed in the book Bursts, on modelling human behaviour – a book I also sort of reviewed last year).

In short, I sometimes review and ‘promote’ books without any return on that. I simply don’t review books I don’t like as I think blogging should be fun. Maybe in an age of gamified reviews and fake forum posts with spammy signatures Google simply doesn’t buy into that. I sympathize. I learned that forum websites should add a nofollow attribute (rel="nofollow") to any hyperlinks users post so that Google will not downvote the link targets. So links in discussion groups are considered spammy per se and you need to do something about it so that they don’t hurt what you – as a forum user – are probably trying to discuss or recommend in good faith. I already live in fear that those links some tinkerers set in DIYers’ forums (linking to our business site or my posts on our heating system) will be considered paid link spam.
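What forum software does to user-posted links can be sketched roughly as follows. This is a simplified illustration, not how any particular forum package works: the function name add_nofollow is made up, the example link is hypothetical, and a regular expression is a crude substitute for a real HTML parser.

```python
import re

# Matches an opening <a ...> tag and captures its attributes (simplified, not a full HTML parser).
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)

def add_nofollow(html):
    """Add rel="nofollow" to every link in user-posted HTML that has no rel attribute yet."""
    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # leave links with an existing rel attribute untouched
        return f'<a{attrs} rel="nofollow">'
    return A_TAG.sub(fix, html)

# Hypothetical forum post linking to some article.
post = 'See <a href="http://example.com/heat-pump">this post</a>.'
marked = add_nofollow(post)
print(marked)
```

The link still works for human readers; the attribute is only a hint to search engines about how to treat it.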

However, I cannot explain why I can find my book review post on Google (thus generating an impression) when searching for site:[URL of the post]. Perhaps consolidation takes time. Perhaps there is hope. I even see the post when I use Tor Browser and a foreign IP address, so this is not related to my preferences as a logged-in Google user. But if there isn’t a glitch in Webmaster Tools, no other typical searcher encounters this impression. I am aware of the tool for disavowing URLs but I don’t want to report a perfectly valid backlink. In addition, that backlink from the author’s site does not even show up in the list of external backlinks, which is another enigma.

I know that this seems to be an obsession with a first world problem: This was a post on a topic I don’t claim expertise in and don’t consider strategically important. But whatever happens to this blog could happen to other sites I am more concerned about, business-wise. So I hope it is just a bug and/or Google Bots will read this post and will release my link. Just in case I mentioned your book or blog here, even if indirectly, please don’t backlink.

Perhaps Google did not like my ranting about encrypted search terms, not available to the search term poet. I dared to display the Bing logo back then. Which I will do again now as:

  • Bing tells me that the infamous post generates impressions and clicks
  • Bing recognizes the backlink
  • The number of indexed pages is increasing gradually with time.
  • And Bing did not index the spammy pages in the brief period they were on our hacked website.

Bing logo (2013)

Update 2014-12-23 – it actually happened twice:

Analyzing the impressions from the last day I realize that Google has treated my physics resources page Physics Books on the Bedside Table the same way. Page impressions dropped and now that page, which was the top one (after the review had plummeted), is gone, too. I had already considered moving this page to my site that hosts all those lists of links (without issues, so far): radices.net, and I will complete this migration in a minute. Now of course Google might think I, the link spammer, am frantically moving on to another site.

Update 2014-12-24 – now at least results are consistent:

I cannot see my own review post anymore when I search for the title of the book. So finally the results from Webmaster Tools are in line with my tests.

Update 2015-01-23 – totally embarrassing final statement on this:

WordPress has migrated their hosted blogs to https only. All my traffic was hiding in the statistics for the https version which has to be added in Google Webmaster Tools as a separate website.
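A small sketch of what I should have checked earlier: adding up the numbers for the http and the https version of each page before concluding that impressions have vanished. The URLs and figures below are made up, and merge_by_page is a hypothetical helper working on rows copied by hand from the two Webmaster Tools properties.

```python
from collections import Counter
from urllib.parse import urlsplit

def merge_by_page(rows):
    """Sum impressions per host and path, ignoring whether the URL is http or https."""
    totals = Counter()
    for url, impressions in rows:
        parts = urlsplit(url)
        totals[parts.netloc + parts.path] += impressions
    return dict(totals)

# Made-up example rows: (page URL, impressions) taken from the http and the https property.
rows = [
    ("http://example.wordpress.com/2013/12/22/some-post/", 3),
    ("https://example.wordpress.com/2013/12/22/some-post/", 940),
]
merged = merge_by_page(rows)
print(merged)
```

Seen this way, a page that looks erased in the http statistics may simply have moved to the https ones.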

25 thoughts on “Waging a Battle against Sinister Algorithms”

  1. Pingback: All My Theories Have Been Wrong. Fortunately! | Theory and Practice of Trying to Combine Just Anything

  2. Pingback: Looking for Patterns | Theory and Practice of Trying to Combine Just Anything

  3. In the later part of 2014 the parameters for search engine inclusions changed so rapidly that I finally began to tune it out. Facebook also made several changes to the way it moves posts through our readers, and Google+ changed the rules again, too. I decided it was time for me to strip back to some basics. When I first started blogging–a while before I turned up here with this WordPress blog–I decided that one way to learn how to do SEO was to practice the opposite and do everything (but block the Google bots from indexing my site) that didn’t optimize search engine results. It was a good learning experience, and oddly more difficult to do than you’d think! I considered doing the same with all the recent changes but then chose to simply opt out of Google. My traffic on WordPress has come to a stop now… I don’t publicize on social media either. I chose this deliberately, to de-clutter my professional life for the next few months. I also dumped my portfolio links, and right now my LinkedIn looks pretty empty. I removed one recent job, too, due mostly to the high rate of complaints I’ve encountered about the company since I left working there (not wanting to be mixed up with all that mess that has accumulated). Ironically, removing that has increased the LinkedIn traffic by a bit, and there have been some opportunities with work, too. All I can say is life is weird.

    If you are talking about the book by your much-discussed favourite author, your WP post came up today when I searched it. I needed to add your name to the search parameters, and even then Dan’s post on the same author was one return in front of yours. Your comment was indexed. I wonder if the Christmas season was interfering with the normal flow of the internet? Unless it’s an online shopping site, December is not a web-friendly time for most of us. 😉

    • Thanks, Michelle! Since you mention Dan I suppose you searched for another post / author who writes about improbable events? My cursed post was about a physics text book that Dan never referred to.
      But the conclusion is still true: After I added the comment I have also noticed that adding one additional term (such as ‘elkement’) makes my post come up. But just searching for the title – which showed the post in the third position or so – did not make it show up any more. Since most people searching for the book will just search for the title this could explain why there are no page impressions any more.

      The chilling thing was (is) how extreme the effect was (is), as Google seems to have penalized all the pages, not just the review. The page impressions for the blog are now still at their all-time low – less than 10 per day, compared to nearly 1000 per day a few months ago. This is lower than the results for my old static web pages which I update at greater intervals with much shorter ‘articles’. I have followed the gradual change of Google page impressions over the years and I have never seen an effect as drastic as that: A reduction by a factor of a hundred within a month.

      In a sense I am happy that it happened to this blog and not to our German blog. The impressions on the other blog really translate into clicks and into inquiries by clients although we haven’t planned or expected that. We write a lot about the configuration of the specific control unit we use with the heating system and nearly all of the impressions and clicks are related to the name of this product or its vendor. I sincerely hope that this company will never link back to us as this would look exactly like my alleged link scam here. And again I am ‘promoting’ something just because I find it cool not because I gain from the link.

      • Ah, so I guessed wrong about the book!

        I always wondered about the viability of intense social media development, and the dissemination of links throughout the various platforms. It seemed that eventually this might start to cause problems for search engines as well as most readers, given so much of this is simply ‘clutter.’ It’s unfortunate that a non-violating blog fell victim to the sudden link culling. I wonder if your blog will become ‘less penalized’ if you removed the linked-to post? You can remove without deleting by sending it back to drafts, or resetting its published status to ‘pending.’ Perhaps in a few weeks your blog will begin to recover?

        That’s quite a sucker-punch to the gut, considering how much work and time has gone into developing your blog. I feel sad about it! Ah, I see you have a post in the reader. I must go to drive kids around for after-school activities, and can’t wait to get back to read it! I will think on this Google problem; I have some post links in google+ on some of these social media changes. Perhaps there’s something I failed to notice before that will help now.

        • Links posted in forums (and perhaps on social media) automatically get a nofollow attribute which makes search engines ignore them… as many people ‘spam’ their links to forums, e.g. by adding their links in their signatures. I have read that there are religious discussions about this: One argument is that users should not have to do the search engines’ job – that is finding out which content is spammy and which is not by proving upfront that their content is legitimate.

          I have migrated my second ‘offending’ page – my curation of physics links (which also had a link to the author’s page) – to another site. But I had planned to do so anyway. Otherwise I think if Google downvotes a legitimate post, so be it – I will not play that silly game of trying to correct what was not wrong or ask the author to remove the backlink. It is as futile as asking those ‘PDF sharing platforms’ to remove your copyrighted content (which they always do, immediately, so I think my requests are justified… but every week another PDF is uploaded or new websites of that sort emerge…)

          The next post you have seen in the reader was my way of coping with this 🙂

          • One quick remark: I participated in that week-long blogging seminar WordPress did at the end of October. One of the things we were prompted to do was create an index page of our best posts, and to embed links in it so readers could hit on the good stuff. I set one up… I was a bit concerned, but I still liked the idea as I am planning on letting the blog sit a little and not do much on it over the next few months.

          • I think such a list is a good idea, and I believe it should not hurt (Google-wise). I like such lists more than categories and tags. I have thought about a Best Of last year but finally decided to list all posts on my ‘menu pages’. This enormous number of links did not harm Google impressions.

  4. By the way, I have a question for you. How were you able to verify your ww.com site? The normal process is to upload a HTML file but, at least for my account, WordPress will not allow this file type. Is there an upgrade that permits it or did you use a different verification method?

  5. Pingback: Google Translational Poetry – Austrian Christmas Edition | Theory and Practice of Trying to Combine Just Anything

  6. All this super-cyber-stealth-hacker-stuff is going over my head. Are you saying that folks interested in selling stuff can somehow piggyback my site by simply sending an automatic comment that gets filtered into my Spam? And then that simple fact somehow ends up, in some magical way, as my site acting as some sort of confederate site for theirs? Or they do this with search terms or links? I’m confused (perhaps I should have read the entire post another time). And, are you saying that Google and other search engines are somehow unwitting partners in this? That, in the regular course of doing what they do, the hackers and spammers piggyback in some way? I need all of this distilled into one or two sentences so I can sleep! D

    • There are different ways of trying to ‘optimize’ search results via spammy links, illegal to different extents. I have referred to these two – it cannot be done by sending spammy comments in either case:

      1) By setting up different websites on different domains and linking between them, or by paying somebody else to link back to you. This does not require hacking the other website as you control all of them. This is what – I believe – Google falsely accuses me of having done: I have reviewed a book and linked to the author’s website. The author (whom I have never been in touch with directly) obviously saw my positive review and linked back to my post. His page is ranked high in Google so that backlink gave my blog a boost. From an outsider’s (or Google bot’s) perspective it might be hard to distinguish between a legitimate backlink and a link scam – I (or the author) might have set up this blog for the sole purpose of promoting his book and ‘elkement’ might be just a fake identity. This would be similar to writing fake reviews on amazon.com.
      I am nonplussed as I feel you are suspicious if you seem to ‘promote’ something without gaining anything. So Google is not a partner in this, rather the police policing a bit too much.

      2) By really hacking another website you don’t control. This actually happened to my other web server, and it requires an attacker to exploit a vulnerability in the web server’s software or some misconfiguration which allows the hackers to obtain high privileges at the server (so that they can write to directories). It was the former in the case of my server, and chances that this happens to a site like wordpress.com are low though not zero.

      The hacker-spammers don’t touch your content, as they really just want to piggyback on your site as you say correctly. They place new web pages with lots of links on your server, perhaps in a directory that is not so easy to find. You don’t walk through all your files every day and look for one more file that should not be there – I only noticed the hack as I accidentally checked the log files and found that Google bot was trying to access weird URLs on my server… then I saw all that activity in Webmaster Tools.
      BTW I had seen similar hacks of popular e-mail accounts like Hotmail, too. If you aren’t a celebrity what hackers are interested in is basically sending spam via your e-mail account.

      Using a fully hosted and managed solution like wordpress.com is in my point of view the best option for outsourcing the risk completely. I recall a recent warning about a vulnerability in wordpress.org – the software you download and run on a self-hosted WordPress blog. If you host all the stuff yourself you also need to be very vigilant or pay some service provider to keep the software up to date.

  7. I have several good take-aways here. First, I need to take a look at the webmaster tools and will do so some time soon. Second, the whole idea of the various algorithms used to provide some measure of so-called intelligence in machines and of their unintended effects. I am constantly left shaking my head on what sites come up with. In particular vendor sites; they never even come close to what I may want in the future. The assumption that what I do in the future can be predicted from what I did in the past is so invalid for so many reasons: 1-maybe my visit was not part of a pattern 2-maybe that purchase ended my interest in that area 3-maybe the purchase was done on behalf of someone else, etc.
    So, too, with your experience. The bottom line is that an earnest attempt on your behalf to bring a measure of order to your thoughts was misinterpreted as an effort to game the ratings system. How absolutely, maddeningly ludicrous is that!
    In my case I admit to becoming increasingly suspicious at Google’s lack of transparency about what they are doing with all the personal data I seem to be so willingly giving them. Their response–redefining evil. Well, now!

    • Thanks, Maurice! As far as I can tell my predicament is related to that backlink from the book author (just looking for patterns – what is special about this post?) – so not even to something I did myself but to another site … All my own cross-linking of my own stuff back and forth went un-penalized for years, perhaps because all my other sites have low PageRanks. But the author’s site has a PageRank of 3 (as well as my blog, I guess it is typical for WP blogs) – so Google might have paid more attention to the alleged spammy cross-links.

      What I find most frustrating is that obviously you are suspicious if you write a positive review and don’t gain anything from it… or Google tries to figure out how you absolutely have to gain something in a malicious way.

    • I’ve certainly got to improve my site web mastering skills too; there’s just so much that I have to get around to doing, though, and so little time to do it.

      All I want is to earn a living on my mathematics blogging to millions of eager and passionate readers; why does this have to be hard?

      • Hi Joseph! Yes, I also find all this web stuff fascinating! Or at least I am trying to see the geeky / interesting aspects although Google has penalized me for something perfectly legitimate … and something I cannot even control (the backlink to my site).

        I am ‘happy’ that it happened to this blog, and not to our German engineering blog as that one is actually found by clients. I also think it is hard to live on blogging (especially if you don’t want to run any ads or participate in affiliate programs etc.) but it is possible that a ‘traditional business’ is found better if you blog about some very specific stuff. (It really needs to be very, very specific according to my anecdotal experience.)

        Originally I planned some end-of-year-looking-back post, and I might have mentioned how delighted I am about the unexpected effect of our German blog (which is somewhat weird in terms of humor). We would not work with some clients hundreds of kilometers away had they not googled our blog. But the same that happened to this blog might happen to the other: The modest success of the German blog is based on search terms for a specific brand of control unit and related software. It is just the stuff we happen to work with and specialize in – and we offer quite detailed technical descriptions on our website and blog for free. I hope that the small Austrian company manufacturing this device never links back to us (of course we had linked to their site several times). This would be the perfect equivalent of me linking to the physics book website and that site linking back to me. We don’t have any special deals with that company, yet we might appear to ‘promote’ something.

        I don’t expect much from blogging as I ‘blogged’ for more than 10 years on my old school home-grown websites that don’t allow for any sort of interaction. I just don’t want to be penalized for writing a positive review I gained nothing from. It’s so ironic that Google obviously doesn’t believe that you don’t want to ‘make money from blogging’. One might say I gained a temporary increase in clicks – true, but now the page impressions are actually lower than ever before – even lower than those of some of my old websites that I update less frequently and whose structure, multiple language features etc. are definitely not best practices. So I believe I would be better off now if I hadn’t written that infamous post at all.
