Let’s assume your app is currently in production and has a non-trivial number of users. By non-trivial I mean a number that makes it impractical for you to write a personalized apology email to each and every one of them when you lose their data. Once you reach that sort of penetration, every time your developers touch the UI, or anything directly adjacent to it, they are bound to break someone’s workflow.
You might think you are fixing a long-standing UI bug, or making the user interface more consistent and therefore more user-friendly, but it does not matter. At least one of your users has probably worked the side effects of that bug into the way they do things, and to them the software will appear broken afterwards.
Let me give you a few examples from personal experience. This is not a project I am personally involved in on the development side. For once I am actually sitting at the user end and watching the fireworks. Let me set the stage for you: we have been using a third-party time tracking tool for ages now. When we first deployed it, it was a self-hosted application that we had to maintain ourselves. This involved periodically rebooting the server due to memory leaks and applying the infrequent patches and upgrades. Prior to every upgrade we would first test it on a dummy instance and have the folks who used the tool extensively for scheduling and for processing time and expenses give it a once-over before we deployed it to production. If there were issues, we would work with the vendor to iron them out prior to deployment. It worked well.
Unfortunately, about a year ago they discontinued support and licensing for the self-hosted version, and we had to upgrade to their “state of the art” cloud-based service. This was nice for me because it meant we no longer had to expend time and resources maintaining the tool internally. The end users were also happy because they would be getting all kinds of new bells and whistles to play with. The vendor promised that the cloud version was developed and improved very aggressively based on user suggestions, and that their new agile development process could deploy fixes and custom patches much faster than before. It sounded great on paper, but it turned out to be a disaster.
The vendor likes to push out minor updates and patches every other Monday, and like clockwork this results in our ticketing system getting clogged up with issues related to the timesheet software. We verify each of these, then tag-team grouping and compiling them into support requests that get forwarded to the vendor’s support team and cc’d to our account manager. This is our third account manager since the switch, and I suspect our company single-handedly got the last two fired by maintaining an unstoppable barrage of open tickets and constant demands for discounts and downtime compensation.
Most of the problems we are having stem from trivial “fixes” that make perfect sense if you are on the development team. For example, someone recently noticed that the box you use to specify how many hours you worked could accept negative values. There was no validation, so the system wouldn’t even blink if you entered, say, negative five hours on a Monday. So they went in, added input validation, and, just to be on the safe side, fixed it in their database too. And by fixed, I mean they took the absolute value of the relevant column and then changed its datatype to unsigned integer. Because if there were negative values in there, they had to be there by mistake, right? Who in their right mind would use negative time? Well, it turns out it was my team. Somehow they had figured out a way to use this bug to easily fudge time balances on the admin site. For example, if someone was supposed to work five hours on a Monday but had an emergency and left three hours early, the admin would just go in and add -3 work hours to the timesheet with a comment. This gave them both a record of what the person was supposed to do and what actually happened. Needless to say, after the “fix” all our reports were wrong.
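To put numbers on it, here is a minimal Python sketch of what that kind of migration does to a single day. The row layout, comments, and values are hypothetical, invented for illustration; the only part taken from the story is the “take the absolute value of the hours column” step.

```python
# Hypothetical timesheet rows for one employee on one Monday: (hours, comment).
# The layout and values are illustrative, not the vendor's actual schema.
entries = [
    (5, "scheduled shift"),
    (-3, "left early: family emergency"),  # the admin's negative-hours adjustment
]

def total_hours(rows):
    """Sum the hours column, the way a payroll report would."""
    return sum(hours for hours, _ in rows)

def vendor_fix(rows):
    """Mimic the migration: take the absolute value of every hours entry
    so the column can be converted to an unsigned type."""
    return [(abs(hours), comment) for hours, comment in rows]

before = total_hours(entries)
after = total_hours(vendor_fix(entries))
print(f"before fix: {before} hours, after fix: {after} hours")
# prints "before fix: 2 hours, after fix: 8 hours"
```

Nothing errors out: the migration succeeds, the new validation is happy, and the report silently overstates that day by six hours.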
More recently, they noticed that there were two ways for people to request time off in the system. You could create a time-off request ahead of time (which had to be approved by a supervisor), or you could submit it as part of your timesheet by putting in 8 hours as “personal day” or whatever. Someone on the vendor’s dev team decided to “streamline” the process and removed the ability to enter time off from the timesheet page. To them it made perfect sense to have only a single system pathway for entering time off. Unfortunately, my team relied on that functionality. We had a special use case for the hourly contractors which simply required them to record their downtime as “unpaid leave” (don’t ask me why; I did not come up with that). Before, they could do that by simply filling out their timesheet. After the upgrade they had to go to the time-off tab, fill out a time-off request for every partial day that week, and then have that request approved by a supervisor before they could actually submit a timesheet. So their workflow went from clicking on a box and typing in a few numbers to going through three to five multi-stage dialog boxes and then waiting for an approval.
To the vendor’s credit, they are addressing most of these problems in a timely manner, and their rapid development cycle means we don’t have to wait long for the patches. They do, however, have serious issues with feature creep, and each “fix” creates three new problems on average.
The majority of these problems stem from the fact that our users are not using the software the way the developers intended. They are using the application wrong… But whose fault is that? Should paying customers be punished, or even chastised, for becoming power users and employing the software in new, emergent ways rather than using it the way you imagined they would? Every botched, incomplete or ill-conceived UI element or behavior in your software is either an exploit or a power user “feature” in waiting.
I guess the point I’m trying to make is that once you deploy your software into production and make it available to a non-trivial number of users, it is no longer yours. From that point on, any “bug fix” can and will affect entire teams of people who rely on it. A shitty feature you’ve been campaigning to remove is probably someone’s favorite thing about your software. A forgotten validation rule is probably some team’s “productivity crutch”, and they are hopeless without it.
Full test coverage may help limit the number of “holes” your users can creatively take advantage of, but it only takes you so far. There is no way to automate testing for something you never anticipated users doing. You won’t even discover these emergent, colorful “power user tricks” by dog-fooding your app, because your team will use it as intended rather than randomly flail around until they find a sequence of bugs that triggers an interesting side effect and then make it the core of their workflow. This is something you can only find out if you work with genuine end users who treat your software like a magical, sentient black box that they are a little scared of.