Hunting a bug, or why cable relief is a good idea

It's been a long time since my post about the Split-Flap Display, and it took me a while to get around to writing this one, but I finally managed to hunt down the bug I described in the previous post.

As you may remember, I had two problems with the display. The first was that some of the modules just died randomly and didn't respond to any signals. The second was that some of the modules had a small offset. There were also a few modules that would start rotating on command as they should, but wouldn't stop until disconnected from power.

But I can finally say that I have found the root of all the problems and now know how to fix them.

First, the dying modules. I tried swapping the socketed ICs as some people suggested, but they all seemed fine. I went on to disassemble two of the dead modules to check for faults in the PCB or the mechanics. Nothing. I just plugged two modules with an offset into the slots and went home.

After being frustrated and not working on it for about two weeks, I decided that I needed to continue. I started all over again by checking the electronics for broken connections or small cracks, but couldn't find anything that could be the cause of the failure. I checked the motors and mechanics again and found that they were all good. I was astonished by the state the motors were in. If I hadn't known that they were probably over 20 years old, I would have thought they were fresh from the store (except for the bit of dust one had accumulated over time). That's when I found the first problem: the optical sensors inside were a bit dirty. Although this can't be the cause of the modules dying or having an offset, it could be the reason some modules don't stop, since the controller can't detect the movement of the motor.

But I still didn't know why they stopped working. After a few more hours of searching for problems, I decided I had had enough for the day and went home, leaving two more of the defective modules on top of the box they belong in, with their two slots left empty.

Two days later, when I wanted to continue, I walked in on someone running a script that displayed random messages every 2 minutes. Someone had also put the modules back in the slots, and one of them was running like a charm. That was the moment it dawned on me that I had been looking for the problem in the wrong place. A few tests later I knew that the problem was the soldered connections at the back of the address-modules. They were just not connected well enough and had come loose when everything was put into the box. So always remember to make your cables a bit longer than they need to be, so they don't get strained when you move things around.

Now one was fixed, but the other module still gave no sign of life, even though the connections were good on this one. Interestingly, it worked when put in another slot, and other modules didn't want to work in this slot either. After not being able to find any problem with the address-module, I put everything back together and started brainstorming with some other people about what the problem might be.

That was the moment the second "miracle" of the day happened. Since the display was powered and online during the whole process, the script was pushing new messages all the time. While we were brainstorming, the script sent an extra long message to the display, and suddenly the module in question DID MOVE! It was showing a ".", which wasn't what it was supposed to show, but what the module 8 positions further down the display was showing. I gave the latter one a few commands, and interestingly both modules rotated and showed the same characters. I checked the 8 digit PCB switches, and the addresses were all correct. I concluded that the Shift-Register had failed and was somehow always reading a high value at the 4th input pin, which set the 0x08 bit of the address.
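To make the symptom a bit more concrete, here is a minimal sketch of what a stuck-high 0x08 bit does to the address a slot reports. The function name and the example addresses are made up for illustration; only the 0x08 bit comes from what I observed.

```python
STUCK_BIT = 0x08  # the address bit that always reads high in the faulty slot


def effective_address(switch_address: int, stuck_mask: int = STUCK_BIT) -> int:
    """Address the module actually answers to when some bits are stuck high.

    switch_address is what the switches are set to (hypothetical values here),
    stuck_mask is the set of bits the faulty input always reads as 1.
    """
    return switch_address | stuck_mask


# A slot set to address 3 answers as 11, i.e. 8 positions further down,
# so it mirrors whatever the real module 11 is told to show.
print(effective_address(3))   # 11
# An address that already has the 0x08 bit set is unaffected.
print(effective_address(11))  # 11
```

This would also explain why the slot only reacted once a message was long enough to reach the module 8 positions further down, and why the slot still behaves when it is set to an address that has the 0x08 bit set anyway.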

So all the dead modules were fixed (or at least I know how they can be fixed, since I need to buy real connectors at some point and remove my soldering work). Time to look at the modules with the offset.

The problem here is how the controller determines the position of the rotor. It uses two optical sensors that are triggered by fins on the gears. One of these fins sits directly at the motor and counts the steps the motor makes. The second is on the gear at the back of the module, which rotates at the same speed as the rotor in the front. So the system actually checks the position by rotating until the sensor in the back is triggered, and from then on it counts the steps the motor makes.
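As a rough sketch of that scheme (this is my mental model, not the controller's actual firmware), the logic could look something like this. The class name, method names and the steps-per-revolution value are all assumptions for illustration.

```python
class PositionTracker:
    """Toy model: home on the rear sensor, then count motor steps."""

    def __init__(self, steps_per_revolution: int):
        # Hypothetical value; the real step count per rotor turn is not known to me.
        self.steps_per_revolution = steps_per_revolution
        self.homed = False
        self.steps_since_home = 0

    def on_home_sensor(self) -> None:
        """Rear sensor triggered: this is the reference point."""
        self.homed = True
        self.steps_since_home = 0

    def on_motor_step(self) -> None:
        """Motor sensor triggered: one more step since the reference point."""
        if self.homed:
            self.steps_since_home = (self.steps_since_home + 1) % self.steps_per_revolution

    def position(self):
        """Steps since the reference point, or None if not homed yet."""
        return self.steps_since_home if self.homed else None


tracker = PositionTracker(steps_per_revolution=10)
tracker.on_motor_step()      # ignored, position is unknown before homing
tracker.on_home_sensor()     # reference found
for _ in range(3):
    tracker.on_motor_step()
print(tracker.position())    # 3
```

In this model a motor sensor that never fires (for example because it is dirty) means the step count never advances, which would fit the modules that keep rotating until they are unplugged.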

But since there are two other gears between the gear with the pin and the gear connected to the rotor, it is not possible to determine the position just from the sensor being triggered. The controller needs to know the offset between the two gears. There is a calibration routine, but I don't have enough information about the system to calibrate the modules the normal way. What I do know is that there is an I2C EEPROM on the board and that it's the only non-volatile memory in the system. So at the moment I'm trying to dump the EEPROM from a few of the boards, compare the dumps to one another to identify the calibration data, and change the data until I get rid of the offset.
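The comparison step can be sketched in a few lines. This assumes the dumps are already available as equally sized byte strings; how they get read off the boards is left out, and the example bytes below are made up, not real EEPROM contents.

```python
def differing_offsets(dumps):
    """Return the byte offsets at which the given dumps do not all agree."""
    if not dumps:
        return []
    length = len(dumps[0])
    if any(len(d) != length for d in dumps):
        raise ValueError("all dumps must have the same length")
    # An offset is interesting if more than one distinct value shows up there.
    return [i for i in range(length) if len({d[i] for d in dumps}) > 1]


# Made-up example data standing in for dumps from three boards.
board_a = bytes([0x10, 0x20, 0x30, 0x40])
board_b = bytes([0x10, 0x25, 0x30, 0x40])
board_c = bytes([0x10, 0x21, 0x30, 0x47])

print(differing_offsets([board_a, board_b, board_c]))  # [1, 3]
```

Bytes that are identical on every board are unlikely to be per-module calibration data, so the offsets that differ are the first candidates to look at. Some of them may of course turn out to be something else entirely.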

So in the end I now know for sure that, except for one Shift-Register, there is no hardware fault in the system (and the Shift-Register still works when set to a specific address). I also know that I will need to do a lot of work recalibrating the modules.

But if there is one lesson to be learned here, it is to always give your cables some relief.