Sunday, 13 April 2008

Going Bug Hunting

Bugs are usually thought of as programming errors, but I would extend the concept to include all unexpected results, whether they are due to programming or due to the system being used for a purpose it wasn't designed for. Here are a few tips to help you in your hunt.

Document

The easier you make it for a developer to isolate and correct the problem, the faster the problem will get solved and the more reliable the result. The very first step is to make a copy of the screen showing the error message. This may take some training because the typical response is just to click on OK without even reading the error message. Paste the screenshot into a Word document and email it to the support person.

Follow Up

Keep track of your support requests via a support log to be sure they are followed up and addressed in a timely manner. If a description of all the support requests is kept, then it can be used to point to a solution should the same issue crop up again.
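
A support log does not need to be elaborate. As a rough sketch only, here is what a minimal one could look like in Python. The record fields, the function names and the two sample entries are all made up for illustration; a spreadsheet with the same columns would do the job just as well.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SupportRequest:
    """One line of a hypothetical support log."""
    opened: date
    description: str
    resolution: Optional[str] = None  # None while the request is still open


def open_requests(log):
    """Return the requests that still need to be followed up."""
    return [r for r in log if r.resolution is None]


def find_similar(log, keyword):
    """Return resolved requests whose description mentions the keyword."""
    needle = keyword.lower()
    return [r for r in log if r.resolution and needle in r.description.lower()]


# Invented sample entries, for illustration only.
log = [
    SupportRequest(date(2008, 3, 3), "Posting error on foreign currency invoice",
                   "Corrected exchange rate setup"),
    SupportRequest(date(2008, 4, 7), "Report printed on wrong printer"),
]

for request in open_requests(log):
    print("Still open:", request.description)
```

Calling `find_similar(log, "currency")` returns the earlier request along with its resolution, which is exactly the kind of pointer you want when the same issue crops up again.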

Re-create

Should a solution not be immediately available, see if you can make the error happen again. Microsoft Dynamics, like most accounting systems, comes with a sample company. Re-creating the error in the sample company has the advantage that the programmer has access to the same system. It also rules out your data being the culprit.

Frustration

Let's say that the support representative looks at your issue and says that they are unable to re-create the problem. Furthermore, you can't either. But then it happens again. This is when you need to be rigorous and scientific in your approach. The worst kind of bug to find is the intermittent error. You need to comb through every instance of the error looking for a common thread or a pattern.
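
If you have recorded the same details for each occurrence, looking for a common thread can be partly mechanical. The sketch below assumes each incident is written down as a set of field and value pairs. The field names and sample incidents are invented, and the function simply reports whatever every incident has in common.

```python
from collections import Counter


def common_threads(incidents):
    """Return the (field, value) pairs shared by every recorded incident."""
    counts = Counter()
    for incident in incidents:
        # Each incident contributes each of its (field, value) pairs once.
        counts.update(incident.items())
    return {pair for pair, n in counts.items() if n == len(incidents)}


# Invented sample incidents, for illustration only.
incidents = [
    {"user": "ANNE", "module": "Payables", "currency": "USD"},
    {"user": "BOB", "module": "Payables", "currency": "USD"},
    {"user": "ANNE", "module": "Payables", "currency": "EUR"},
]

print(common_threads(incidents))
```

In this made-up data the only thing all three incidents share is the module, which at least tells you where to look next. A pair shared by most but not all incidents can be just as telling, so it is worth looking at the raw counts too.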

Like everyone in systems work, I have lots of stories about obscure errors and the amount of hair I lost trying to sort them out. It comes down to patience and luck. In general, I would say that half the time it was the system's users who figured out where the actual problem was.

Hardware

When the answer is elusive, I try to eliminate sources of error. The first thing to eliminate is hardware. One defective router dropping or corrupting messages from one user's computer to the server can cause serious issues in the whole system. At one client we had printouts going to (apparently) random printers. It turned out that new users were being set up by copying an existing user. The copy included the computer identification number, so there were duplicate identification numbers in the system simultaneously. When routing printed reports to printers, the server would choose the first computer to log in with that identification number, causing an intermittent error.
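
A check for this kind of duplicate is easy to sketch once the user set up can be exported as a list. The example below assumes a simple mapping of user name to computer identification number. The names and numbers are invented, and it says nothing about how any particular system actually stores them.

```python
from collections import defaultdict


def duplicate_ids(users):
    """Map each computer identification number to the users sharing it."""
    by_id = defaultdict(list)
    for user, computer_id in users.items():
        by_id[computer_id].append(user)
    # Keep only the identification numbers held by more than one user.
    return {cid: names for cid, names in by_id.items() if len(names) > 1}


# Invented sample data: "carl" was set up as a copy of "anne".
users = {"anne": 101, "bob": 102, "carl": 101}

print(duplicate_ids(users))
```

Any identification number that comes back from this check is shared by more than one user and deserves a closer look.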

Security and Set Up

Another source of error to eliminate is the user set up. The more flexible a security system is, the more complex the user set up. Watch to see if the error happens to more than one user. Also check whether it is tied to a particular time of day. At one client, the system slowed to a crawl every day around 12:00 pm. It turned out that the warehouse staff were playing internet radio stations during their lunch break.
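
If you note the time of each occurrence, a quick tally by hour will show whether the problem clusters at a particular time of day. This is only a sketch; the timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def incidents_by_hour(timestamps):
    """Count incidents per hour of the day, busiest hour first."""
    counts = Counter(ts.hour for ts in timestamps)
    return counts.most_common()


# Invented sample timestamps, for illustration only.
timestamps = [
    datetime(2008, 4, 7, 12, 5),
    datetime(2008, 4, 8, 12, 10),
    datetime(2008, 4, 9, 9, 30),
    datetime(2008, 4, 9, 12, 1),
]

for hour, count in incidents_by_hour(timestamps):
    print(f"{hour:02d}:00  {count} incident(s)")
```

A spike at one hour does not tell you the cause, but it does tell you when to go and watch what is happening on the network and on the floor.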

Within the accounting system, try to determine which modules are involved. For example, a transaction that works well in the local currency may cause a problem in a foreign one. Pore over all of the set up to see if anything attached to the transaction causing the error message was unusual (the vendor, inventory item, general ledger account, etc.). If everything looks good, then document everything you can and wait for the situation to recur.

Patience

Patience is your best ally in this quest. Bugs are often a source of finger pointing between people who are convinced that the answer lies with someone else. The message you need to keep repeating is that we are all on the same side. We are all working towards the same goal: bug elimination.
