TL;DR
I recently read Mostly Harmless, the fifth part of Douglas Adams’s Hitchhiker’s Guide to the Galaxy trilogy. This post is about one section of it that talks about software testing. And yes, I hit the publish button instead of the preview button.
“Click, hum. The huge grey Grebulon reconnaissance ship moved silently through the black void. It was travelling at fabulous, breathtaking speed, yet appeared, against the glimmering background of a billion distant stars to be moving not at all. It was just one dark speck frozen against an infinite granularity of brilliant night. On board the ship, everything was as it had been for millennia, deeply dark and Silent. Click, hum. At least, almost everything. Click, click, hum. Click, hum, click, hum, click, hum. Click, click, click, click, click, hum. Hmmm.”
This paragraph perfectly describes one of my testing sessions. I was testing a feature that involved an embedded video player. The player was embedded at the bottom of the page, so it was not visible when the page opened. When I opened that page, I heard, for less than a second:
Click, hum.
What was the root cause? The new feature involved playing a YouTube video from any position. The developer fast-forwarded the video to that position but forgot to mute it first, so the short fast-forward ran with the sound on and produced the click, hum. A very annoying sound!
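The bug and its fix can be sketched with a small model. This is only an illustration: `FakePlayer` and its `mute`, `unmute` and `seek` methods are hypothetical names, not the real YouTube player API, and the assumption that an unmuted seek leaks a short burst of audio is taken from what I heard during the session.

```python
from dataclasses import dataclass, field


@dataclass
class FakePlayer:
    """Hypothetical stand-in for an embedded video player (not a real API)."""
    muted: bool = False
    position: float = 0.0
    heard: list = field(default_factory=list)  # what the user would hear

    def mute(self):
        self.muted = True

    def unmute(self):
        self.muted = False

    def seek(self, seconds):
        # Assumption: seeking with the sound on leaks a short burst of audio.
        if not self.muted:
            self.heard.append("click, hum")
        self.position = seconds


def start_at_buggy(player, seconds):
    """What the developer did: seek without muting first."""
    player.seek(seconds)


def start_at_fixed(player, seconds):
    """Mute around the seek, then restore the previous mute state."""
    was_muted = player.muted
    player.mute()
    player.seek(seconds)
    if not was_muted:
        player.unmute()


buggy = FakePlayer()
start_at_buggy(buggy, 90.0)

fixed = FakePlayer()
start_at_fixed(fixed, 90.0)

print(buggy.heard)  # ['click, hum']
print(fixed.heard)  # []
```

A check like `fixed.heard == []` is the click, hum heuristic turned into an assertion: a page that has just loaded should not make any sound the user did not ask for.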
“A low level supervising program woke up a slightly higher level supervising program deep in the ship’s semi-somnolent cyberbrain and reported to it that whenever it went click all it got was a hum.”
And this is how an application written in the Elixir programming language (an OTP application) operates. Programs (processes) are typically started under a supervising program. That supervisor has only one task: to monitor its child programs and restart them when they fail, nothing more!
The chapter continues with a very interesting root cause analysis.
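To make the idea concrete, here is a toy model of a supervisor. It is written in Python rather than Elixir, it is not how OTP is implemented, and every name in it (`Worker`, `Supervisor`, `check_children`) is made up for illustration. It only mimics the behaviour OTP calls the one_for_one strategy: when a child dies, only that child is restarted.

```python
class Worker:
    """Hypothetical child program: does the real work and may crash."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.handled = []

    def handle(self, message):
        if message == "hum":
            # Simulated crash: an unexpected input kills the worker.
            self.alive = False
            return
        self.handled.append(message)


class Supervisor:
    """Only job: start children, notice dead ones, restart them."""

    def __init__(self, child_specs):
        # child_specs maps a child name to a zero-argument factory.
        self.child_specs = dict(child_specs)
        self.children = {name: make() for name, make in self.child_specs.items()}
        self.restarts = {name: 0 for name in self.child_specs}

    def check_children(self):
        for name, child in list(self.children.items()):
            if not child.alive:
                # Restart only the dead child; siblings keep their state.
                self.children[name] = self.child_specs[name]()
                self.restarts[name] += 1


sup = Supervisor({
    "player": lambda: Worker("player"),
    "logger": lambda: Worker("logger"),
})
sup.children["logger"].handle("click")
sup.children["player"].handle("click")
sup.children["player"].handle("hum")   # the player "crashes"
sup.check_children()                   # the supervisor replaces it
sup.children["player"].handle("sigh")

print(sup.restarts)                    # {'player': 1, 'logger': 0}
print(sup.children["player"].handled)  # ['sigh']
print(sup.children["logger"].handled)  # ['click']
```

Note that the supervisor knows nothing about clicks, hums or sighs. It holds the recipe for starting each child and a restart counter, which is exactly the knowledge the Grebulon supervisors had lost.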
“The higher level supervising program asked it what it was supposed to get, and the low level supervising program said that it couldn’t remember exactly, but thought it was probably more of a sort of distant satisfied sigh, wasn’t it? It didn’t know what this hum was. Click, hum, click, hum. That was all it was getting. The higher level supervising program considered this and didn’t like it. It asked the low level supervising program what exactly it was supervising and the low level supervising program said it couldn’t remember that either, just that it was something that was meant to go click, sigh every ten years or so, which usually happened without fail. It had tried to consult its error look-up table but couldn’t find it, which was why it had alerted the higher level supervising program to the problem. The higher level supervising program went to consult one of its own look-up tables to find out what the low level supervising program was meant to be supervising. It couldn’t find the look-up table. Odd.”
“It looked again. All it got was an error message. It tried to look up the error message in its error message look-up table and couldn’t find that either. It allowed a couple of nanoseconds to go by while it went through all this again. Then it woke up its sector function supervisor.”
“The sector function supervisor hit immediate problems. It called its supervising agent which hit problems too. Within a few millionths of a second virtual circuits that had lain dormant, some for years, some for centuries, were flaring into life throughout the ship. Something, somewhere, had gone terribly wrong, but none of the supervising programs could tell what it was. At every level, vital instructions were missing, and the instructions about what to do in the event of discovering that vital instructions were missing, were also missing. Small modules of software – agents – surged through the logical pathways, grouping, consulting, re-grouping. They quickly established that the ship’s memory, all the way back to its central mission module, was in tatters. No amount of interrogation could determine what it was that had happened. Even the central mission module itself seemed to be damaged.”
“This made the whole problem very simple to deal with. Replace the central mission module. There was another one, a backup, an exact duplicate of the original. It had to be physically replaced because, for safety reasons, there was no link whatsoever between the original and its backup. Once the central mission module was replaced it could itself supervise the reconstruction of the rest of the system in every detail, and all would be well. Robots were instructed to bring the backup central mission module from the shielded strong room, where they guarded it, to the ship’s logic chamber for installation. This involved the lengthy exchange of emergency codes and protocols as the robots interrogated the agents as to the authenticity of the instructions. At last the robots were satisfied that all procedures were correct. They unpacked the backup central mission module from its storage housing, carried it out of the storage chamber, fell out of the ship and went spinning off into the void. This provided the first major clue as to what it was that was wrong.”
“Further investigation quickly established what it was that had happened. A meteorite had knocked a large hole in the ship. The ship had not previously detected this because the meteorite had neatly knocked out that part of the ship’s processing equipment which was supposed to detect if the ship had been hit by a meteorite. The first thing to do was to try to seal up the hole. This turned out to be impossible, because the ship’s sensors couldn’t see that there was a hole, and the supervisors which should have said that the sensors weren’t working properly weren’t working properly and kept saying that the sensors were fine. The ship could only deduce the existence of the hole from the fact that the robots had clearly fallen out of it, taking its spare brain, which would have enabled it to see the hole, with them. The ship tried to think intelligently about this, failed, and then blanked out completely for a bit. It didn’t realise it had blanked out, of course, because it had blanked out. It was merely surprised to see the stars jump. After the third time the stars jumped the ship finally realised that it must be blanking out, and that it was time to take some serious decisions.”
“It relaxed. Then it realised it hadn’t actually taken the serious decisions yet and panicked. It blanked out again for a bit. When it awoke again it sealed all the bulkheads around where it knew the unseen hole must be. It clearly hadn’t got to its destination yet, it thought, fitfully, but since it no longer had the faintest idea where its destination was or how to reach it, there seemed to be little point in continuing. It consulted what tiny scraps of instructions it could reconstruct from the tatters of its central mission module. “Your !!!!! !!!!! !!!!! year mission is to !!!!! !!!!! !!!!! !!!!!, !!!!! !!!!! !!!!! !!!!!, land !!!!! !!!!! !!!!! a safe distance !!!!! !!!!! ….. ….. ….. …., land ….. ….. ….. monitor it. !!!!! !!!!! !!!!!…””
“All of the rest was complete garbage. Before it blanked out for good the ship would have to pass on those instructions, such as they were, to its more primitive subsidiary systems. It must also revive all of its crew. There was another problem. While the crew was in hibernation, the minds of all of its members, their memories, their identities and their understanding of what they had come to do, had all been transferred into the ship’s central mission module for safe keeping. The crew would not have the faintest idea of who they were or what they were doing there. Oh well. Just before it blanked out for the final time, the ship realised that its engines were beginning to give out too. The ship and its revived and confused crew coasted on under the control of its subsidiary automatic systems, which simply looked to land wherever they could find to land and monitor whatever they could find to monitor. As far as finding something to land on was concerned, they didn’t do very well. The planet they found was desolately cold and lonely, so achingly far from the sun that should warm it, that it took all of the Envir-O-Form machinery and LifeSupport-O-Systems they carried with them to render it, or at least enough parts of it, habitable. There were better planets nearer in, but the ship’s Strateej-O-Mat was obviously locked into Lurk mode and chose the most distant and unobtrusive planet and, furthermore, would not be gainsaid by anybody other than the ship’s Chief Strategic Officer. Since everybody on the ship had lost their minds no one knew who the Chief Strategic Officer was or, even if he could have been identified, how he was supposed to go about gainsaying the ship’s Strateej-O-Mat. As far as finding something to monitor was concerned, though, they hit solid gold.”
Takeaway
A supervisor program should do one task and one task only, and the supervisor hierarchy must be flat!
Click hum heuristic. Supervisor hierarchy error recovery.
Given that Douglas Adams had no background in tech at all, and that when he did get involved in IT projects, well after HHGTTG became popular, he was completely self-taught, this is quite an insightful piece on testing!