True Stories
NAPA Auto Parts, 2022

The story

Today, at a NAPA Auto Parts store, I got into a conversation with two of the people who work there.

  • They told me that whenever the company upgrades the POS (point-of-sale) system, the upgrade starts as soon as the stores in the EASTERN time zone close. This typically results in 1 to 2 hours of downtime WHILE THE STORES WEST OF THEM ARE STILL OPEN (at the end of the day, and on Sundays), so no sales are possible until the upgrade completes. My thoughts:
    • As the live-action Grinch movie put it: "You're an idiot." Why would a North American company run upgrades during North American business hours just because head office in the East is closed? If an upgrade HAS to take more than 30 seconds, do NOT do it during hours when ANY of your stores are open.
    • Why does a typical upgrade take the system down for more than 1 second? (Design, of course.) I do realize that SOMETIMES an upgrade needs the system down for longer than that, but routinely? That is just bad design. No wonder people hate upgrades.
  • Their software is bilingual: French/English. He showed me that, apparently for years, a search runs against the French data and then displays the matching English records! As a result, search results are very inconsistent. Update: I have been told that, because of the language police in Quebec, it is against the law for the system to be anything other than 'French first' in everything. Maybe the person was being facetious, but in this case it looks like the programmers decided it was better to be safe than sorry, since the company has branches in Quebec.
  • When the internet is down, there is NOTHING they can do. The system is 100% in the cloud, with no offline mode at all.
  • Because it is 100% in the cloud, EVERYTHING is ALWAYS slow (when they showed me their definition of 'slow', what I observed was latency). The system was probably designed and tested on a local network, and nobody ever thought about the reality of latency. This is one argument for forcing developers and testers to work from home (WFH). It is also an argument for why the owner of anything bigger than a mom-and-pop company shouldn't let his son run the software development teams just because he took 'a computer course' and is therefore an 'expert'.
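To put rough numbers on the latency point, here is a back-of-the-envelope sketch. The figures are hypothetical, not measurements of NAPA's system: a screen that makes 20 sequential requests to the server, assuming about 1 ms per round trip on a local network and about 80 ms to a cloud server.

```python
# Back-of-the-envelope latency arithmetic. All figures are hypothetical
# illustrations, not measurements of any real system.

def screen_wait_ms(round_trips: int, latency_ms: float, server_work_ms: float = 0.0) -> float:
    """Time a user waits when a screen makes `round_trips` sequential requests."""
    return round_trips * (latency_ms + server_work_ms)

LAN_LATENCY_MS = 1.0         # assumed round trip on a local (test) network
INTERNET_LATENCY_MS = 80.0   # assumed round trip to a cloud server

chatty_lan = screen_wait_ms(20, LAN_LATENCY_MS)          # 20 ms: feels instant
chatty_cloud = screen_wait_ms(20, INTERNET_LATENCY_MS)   # 1600 ms: feels slow
batched_cloud = screen_wait_ms(1, INTERNET_LATENCY_MS)   # 80 ms: one batched request

print(chatty_lan, chatty_cloud, batched_cloud)
```

Under these assumptions the same "chatty" screen is 80 times slower from the cloud than in the lab, while sending one batched request brings it back to 80 ms.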

How does MCe avoid this?

I look at all of those and say: we can make most of those problems go away simply through how we do things. Well, and also by not being stupid: a search by an English user should search the English data, not the French data.
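A minimal sketch of searching the right language, assuming a hypothetical `Part` record that stores one description per language code. The search matches against the user's own language and displays that same language's text, so an English user searching for "brake" never depends on what the French description says.

```python
from dataclasses import dataclass

@dataclass
class Part:
    sku: str
    descriptions: dict[str, str]  # hypothetical: language code -> description

def search(parts: list[Part], query: str, lang: str) -> list[tuple[str, str]]:
    """Match `query` against the description in `lang` and return (sku, text in `lang`)."""
    q = query.casefold()
    results = []
    for part in parts:
        text = part.descriptions.get(lang, "")
        if q in text.casefold():  # search and display use the same language
            results.append((part.sku, text))
    return results

catalog = [
    Part("BRK-100", {"en": "Brake pad set, front", "fr": "Jeu de plaquettes de frein, avant"}),
    Part("FLT-200", {"en": "Oil filter", "fr": "Filtre à huile"}),
]

print(search(catalog, "brake", "en"))   # [('BRK-100', 'Brake pad set, front')]
print(search(catalog, "filtre", "fr"))  # [('FLT-200', 'Filtre à huile')]
print(search(catalog, "filtre", "en"))  # [] (French words are not searched for English users)
```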

First, whenever possible, design an upgrade so that it does not require significant downtime.

If it does require downtime, then, unless it is an emergency fix, do it when every branch is closed and has probably been closed for more than an hour, in case someone is working late. We plan for evenings; on our SaaS we plan for late at night for the westernmost user on the system, on a Saturday or Sunday.
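A minimal sketch of the "every branch is closed" check, assuming hypothetical `Branch` records with fixed UTC offsets and opening/closing times (a real scheduler would use IANA time zones through Python's `zoneinfo` to handle daylight saving time). A proposed start time passes only if every branch closed at least an hour earlier and has not reopened yet.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone

@dataclass
class Branch:
    name: str
    utc_offset: timedelta  # fixed offset for simplicity; real code would use zoneinfo
    opens: time            # local opening time
    closes: time           # local closing time (assumed well before midnight)

def is_safe_start(start_utc: datetime, branches: list[Branch],
                  buffer: timedelta = timedelta(hours=1)) -> bool:
    """True only if every branch closed at least `buffer` ago and has not reopened."""
    for b in branches:
        local = start_utc.astimezone(timezone(b.utc_offset))
        opens_at = datetime.combine(local.date(), b.opens, local.tzinfo)
        closed_at = datetime.combine(local.date(), b.closes, local.tzinfo)
        if local < opens_at:
            continue  # early morning: this branch has not opened yet today
        if local < closed_at + buffer:
            return False  # still open, or closed too recently (someone may be working late)
    return True

branches = [
    Branch("Toronto", timedelta(hours=-4), time(8), time(18)),
    Branch("Vancouver", timedelta(hours=-7), time(8), time(18)),
]

# 19:00 in Toronto is only 16:00 in Vancouver.
print(is_safe_start(datetime(2022, 6, 5, 23, 0, tzinfo=timezone.utc), branches))  # False
# 23:00 in Toronto, 20:00 in Vancouver: both closed for over an hour.
print(is_safe_start(datetime(2022, 6, 6, 3, 0, tzinfo=timezone.utc), branches))   # True
```

The first call is exactly the NAPA situation: the East has closed, the West has not.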

How to design upgrades that do not require significant downtime:

By running SQL upgrade scripts ahead of the upgrade. This way the software can run as soon as it is installed, instead of potentially waiting hours for the SQL scripts to run.
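One common way to make it safe to run the SQL scripts early is to keep them backward compatible, the "expand" half of an expand/contract migration: the script only adds things (here, a new column with a default) that the old software never reads. A minimal sketch using Python's built-in `sqlite3`; the `parts` table and `core_charge` column are hypothetical, and this is not necessarily how MCe's scripts are written.

```python
import sqlite3

def old_app_add_part(db: sqlite3.Connection, sku: str, price: float) -> None:
    # Version N of the software names its columns explicitly, so extra columns don't break it.
    db.execute("INSERT INTO parts (sku, price) VALUES (?, ?)", (sku, price))

def pre_upgrade_script(db: sqlite3.Connection) -> None:
    # Run ahead of the release, while version N is still live.
    # Additive only: a new column with a default, which version N never reads.
    db.execute("ALTER TABLE parts ADD COLUMN core_charge REAL NOT NULL DEFAULT 0")
    db.commit()

def new_app_total(db: sqlite3.Connection, sku: str) -> float:
    # Version N+1 relies on the column already existing, so it can start immediately.
    price, core = db.execute(
        "SELECT price, core_charge FROM parts WHERE sku = ?", (sku,)
    ).fetchone()
    return price + core

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parts (sku TEXT PRIMARY KEY, price REAL NOT NULL)")
old_app_add_part(db, "BAT-1", 150.0)

pre_upgrade_script(db)                # schema changes while the old version keeps running
old_app_add_part(db, "BAT-2", 180.0)  # old code still works after the script
print(new_app_total(db, "BAT-2"))     # 180.0
```

In that pattern, steps that would break the old version (dropping or renaming columns) wait until every client is on the new version.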

By being offline-capable. This means that if the server is down for a while, short or long, the client software (the software on your desktop, laptop, tablet or cellphone) can often keep running without you noticing, in any practical way, that the server is down. Changes you make are cached on the device and sent when the server comes back up.
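A minimal sketch of the client-side cache, assuming a hypothetical `send` function that raises `ConnectionError` while the server is unreachable. Changes go into an outbox and are delivered in order once the server answers again. A real client would keep the outbox in local storage so it survives a restart; this one lives in memory.

```python
from collections import deque
from typing import Callable

class OfflineQueue:
    """Client-side outbox: changes stay on the device until the server accepts them."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send  # hypothetical transport; raises ConnectionError when the server is down
        self._pending: deque = deque()

    def record(self, change: dict) -> None:
        self._pending.append(change)  # the user keeps working; nothing waits on the server
        self.flush()

    def flush(self) -> int:
        """Deliver pending changes in order, stopping at the first failure. Returns the count sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # server still down: keep the change and retry later
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)

server_up = False
received: list = []

def fake_send(change: dict) -> None:
    if not server_up:
        raise ConnectionError("server unreachable")
    received.append(change)

outbox = OfflineQueue(fake_send)
outbox.record({"sku": "BRK-100", "qty": 2})  # server down: cached on the device
outbox.record({"sku": "FLT-200", "qty": 1})
print(outbox.pending)  # 2

server_up = True       # the server comes back
print(outbox.flush())  # 2
```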

By loading the new software on the server while the old version is still running, then switching to it once the new version is fully in place.
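A minimal sketch of one common way to do this on a POSIX server: each version is installed into its own directory, and a `current` symbolic link is repointed with an atomic rename once the new version is complete. The directory layout and function names are hypothetical; this is a generic technique, not a description of MCe's installer.

```python
import os
import tempfile
from pathlib import Path

def install_release(releases_dir: Path, version: str, files: dict) -> Path:
    """Write the new version into its own directory while the old one keeps serving."""
    target = releases_dir / version
    target.mkdir(parents=True)
    for name, content in files.items():
        (target / name).write_text(content)
    return target

def switch_to(current_link: Path, release: Path) -> None:
    """Repoint `current` at `release` atomically (POSIX rename over the old symlink)."""
    tmp_link = current_link.with_name(current_link.name + ".tmp")
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(release, target_is_directory=True)
    os.replace(tmp_link, current_link)  # `current` is never missing or half-updated

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    current = root / "current"
    v1 = install_release(root / "releases", "v1", {"app.txt": "version 1"})
    switch_to(current, v1)
    v2 = install_release(root / "releases", "v2", {"app.txt": "version 2"})  # v1 still live
    before = (current / "app.txt").read_text()
    switch_to(current, v2)
    after = (current / "app.txt").read_text()

print(before, "->", after)  # version 1 -> version 2
```

The running server processes still have to be restarted or reloaded to pick up the new code, but the files are never in a half-installed state.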

By having the client download the new software in the background, then watch for a 'safe' point to switch to it, such as when you go to or from the home page or switch modules. Indeed, the switch is so quick and transparent that MCe users almost never even notice the upgrade happened.
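A minimal sketch of the 'safe point' idea, assuming a hypothetical client that tracks which screen and module the user is on. A downloaded update is only staged; the switch happens when the user goes to or from the home page or changes module, never in the middle of a task.

```python
from typing import Optional

SAFE_SCREENS = {"home"}  # hypothetical: screens where no unsaved work is in progress

class ClientUpdater:
    """Holds a downloaded update until the user reaches a safe point, then switches."""

    def __init__(self, running_version: str) -> None:
        self.running_version = running_version
        self.staged_version: Optional[str] = None
        self.current_screen = "home"
        self.current_module = "sales"

    def on_download_complete(self, version: str) -> None:
        # The download happens in the background; nothing changes for the user yet.
        self.staged_version = version

    def navigate(self, screen: str, module: str) -> None:
        # Safe points: going to or from the home page, or switching modules.
        safe = (screen in SAFE_SCREENS or self.current_screen in SAFE_SCREENS
                or module != self.current_module)
        self.current_screen, self.current_module = screen, module
        if safe and self.staged_version is not None:
            self.running_version = self.staged_version  # a real client would load the new code here
            self.staged_version = None

client = ClientUpdater("2022.1")
client.navigate("invoice", "sales")        # leaving home, but nothing staged yet
client.on_download_complete("2022.2")
client.navigate("invoice-lines", "sales")  # mid-task in the same module: not a safe point
print(client.running_version)  # 2022.1
client.navigate("home", "sales")           # back to the home page: switch now
print(client.running_version)  # 2022.2
```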