It’s been longer than I care to admit since my last post, but sit comfortably because this is a tale worth telling. It is about how all the planning and thought in the world can be let down by one careless oversight.
We use a fairly inexpensive HiPot tester (Clare H101) at work to check that some passive circuits are isolated from each other. This involves putting 850V on one circuit and checking that the leakage to an adjacent circuit is less than 5mA. We currently use a custom switch box to manually dial between tests and perform the hipot test, remembering which combination of tests fail to determine which circuits require rework. This is a fairly quick (40 second) test but relies on the operator to connect the unit under test (UUT) correctly, dial through the tests correctly, record the results correctly, and stamp the correct section of the associated paperwork. With all the workplace distractions it is easy to forget (or overlook) one of those steps. Granted, the operator is working with equipment that has the potential (pun intended) to kill someone, so you’d be forgiven for thinking that additional care must be taken. But as with all things, complacency settles in pretty quickly.
So a switch box was designed and software was written to control the hipot tester (using a partially documented protocol) and switch between circuits. This would check for the presence of the UUT (although various constraints prevented continuity checks of the individual circuits), and only proceed with the test if the UUT was connected. The H101 included a guard circuit that is designed as a dead-man’s switch, but ours was fitted with a wire link instead.
The guard circuit calls for a no-volt switch to be used, whereby the test would only start if the contacts were joined. The connector was physically located with Mains parts (IEC inlet, fuses, 230/120V selector) and the connector was rated to 230V with L and N labels on the screw posts, but there was no mention of what voltage the guard circuit operated on.
It was decided that it would be safer if the hipot tester couldn’t initiate a test without software control, and that breaking the guard circuit would achieve this goal. The circuit was designed with track separation for 230V and a 230V relay was spec’ed for use with the guard circuit. All testing was carried out by bypassing the guard circuit until confirmation was received from the manufacturer of the working voltage of the guard circuit. This was received this morning, and all wiring/connectors/etc. needed to be rated to at least 5V 20mA. Perfect! The relay and wiring were completely overspec’ed but it meant that I could use a panel mount 3-pole 3.5mm TRS connector instead of a large 230V rated connector. The wiring was finalised, and everything was soldered, connected, and screwed together for the final test.
The first test went through alright, but then the communications to the hipot tester went down. Maybe there’d been a software issue after all. Hardware and software were restarted but the comms were still down. Time to crack out RealTerm as an ASCII protocol had been used. Still nothing. Maybe the comms settings had been changed or corrupted, but everything checked out. What followed was an EE’s (almost) worst nightmare – the smell of magic smoke. Oh dear. Something going wrong is completely manageable; you can examine everything, evaluate possible failure modes, determine what was the cause and propose a fix. But what magic smoke does is alert everyone in the room that you have messed up. Everything was quickly powered off and unplugged, to the sound of cheering from around the office. The only thing that had been changed was connecting the guard circuit so that seemed like a good place to start. Even if there was a solder defect or etching problem on the board then the worst thing was that the relay contacts were shorted together, which wouldn’t cause magic smoke. The connectors were taken apart and all the wiring was checked. Everything seemed alright. Time to take the hipot tester apart. The hipot tester was now already broken, so the ‘VOID if removed’ sticker wasn’t going to stop me.
The Clare H101 is available for around £2300, but accidents happen and I was outside of my probation period so I wasn’t fearful for my job. Opening the hipot tester revealed 2 screws rolling around the case. Maybe it was my lucky day, maybe it was just a coincidence that I was plugging something in for the first time at the same time as it went bang. Unlikely… but possible. It didn’t take long to discover a slightly charred and cracked isolated DC-DC converter that powered the external interfaces (remote buttons, lights, beacon, serial interface, and guard circuit). I didn’t really want to send a unit back for a £300 fixing charge when a £5 component had failed (rest assured that my colleagues also picked up on my re-framing of “I’ve blown up a £2300 bit of kit” to “this £5 component has failed”). But what caused it to fail?
I looked over everything again. The connectors had no stray bits of wire, the soldering was perfect, the relay contacts were switching like they should, the COMMON terminal was connected to 0V… WHAT?! Why is that connected to 0V? I opened the schematics and PCB artwork; the relay was only connected to a 5.08mm pitch connector. There was no way that this relay could be attached to 0V. I’d even checked this before and there were no shorts then. What else had I changed? Something must be different. And then it occurred to me: I had added an Earth bonding wire between the front and rear panels. My panel mount 3-pole 3.5mm TRS connector also happened to be metal, and so had shorted the sleeve (what I had designated common on the relay) to ground. Obviously when the relay switched across to close the guard circuit I had inadvertently shorted the isolated 5V of the hipot tester to ground (with the isolated 0V connected to the PC through the comms cable). The isolated power supply did not like this, and promptly died. I held my hands up to this. I had even added a cable gland to not use the TRS connector but decided against it at the last minute.
This is where it pays to understand the system as a whole. Yes, I was the only engineer to work on this and so I should’ve known better. What this meant was the avoidance of the fruitless exercise of software engineers blaming electronic engineers blaming mechanical engineers etc – I had to work with myself to ignore blame and work out what and why something had gone wrong. It was my fault that the Clare 5V was shorted to ground but I would learn from that mistake and make sure that it wouldn’t happen again. What actually happened was that I blamed Clare for not designing a more protected interface.
I don’t have access to any circuit diagram, but it is clear that the guard circuit did not include sufficient protection. Any inputs from the outside world should limit the voltage and current (as much as possible) before interfacing with anything sensitive like a micro-controller or logic gate. I tend to use the following circuit.
This limits the voltage and current to the gate of a MOSFET where I can then have voltage level conversion to my micro-controller VDD. This is by no means the only method, and other people may have other ideas, but it is a good place to start. However, having this alone will not protect against what actually happened, which is that the voltage output drew so much current that the regulator burned out. Again, there are many ways of preventing this. As a starting point I would use a regulator that had over-current protection or thermal cutout. The hipot tester used a Murata NE0505MC for around £4.80 in 1000’s. A cursory check has turned up a Burr-Brown part, DCP010505BP, for only £1 more. This features thermal cutout that would prevent the component failure. However, this is only part of it. What happens if the guard circuit is connected to something outputting 24V (like a light gate), or accidentally shorted to ground? Again, the output should then be current limited (using a resistor or PTC fuse) along with diodes to clamp the voltage. This obviously wouldn’t protect against connecting the circuit to mains voltage but it is a start.
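To put some rough numbers on the resistor-plus-clamp idea, here is a minimal Python sketch. The 1 kΩ series resistor and the 5.6V clamp level are values I have assumed purely for illustration (they are not taken from the Clare design or from my own circuit), and `series_fault` is just a hypothetical helper name.

```python
def series_fault(v_fault, v_clamp, r_series):
    """Return (current in A, resistor power in W) when v_fault is applied
    through r_series into a node that a clamp holds at v_clamp."""
    # Below the clamp level no fault current flows through the clamp.
    v_across = max(v_fault - v_clamp, 0.0)
    current = v_across / r_series
    return current, v_across * current


# Assumed values: 1 kΩ series resistor, clamp holding the input at 5.6 V.
for v in (5.0, 24.0):
    i, p = series_fault(v, v_clamp=5.6, r_series=1000.0)
    print(f"{v:.1f} V applied: {i * 1000:.1f} mA, {p * 1000:.1f} mW in the resistor")
```

With these assumed values a 24V misconnection pushes roughly 18mA through the clamp and dissipates about a third of a watt in the resistor, so the resistor's power rating matters as much as its value.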
If you have read this far then please take two things from this. Firstly, if you are interfacing with the outside world then please use protection. Protect what is going out and what is going in. You don’t need to go overboard, but if there is a chance that something will get shorted to ground or a power rail then limit that current. If you are powering with a DC socket, then include over-voltage and reverse polarity protection. A diode, resistor, or MOSFET are a lot easier and cheaper to replace than every IC on the board. Secondly, if you are the outside world, do not assume that the other designer has read this. Before plugging something in, check, check, and check again. If you are connecting to something that says it requires a no-volt connection then don’t short it to a rail, just provide a relay. Obviously I could’ve taken the 5V into my circuit and then supplied my own 5V output, but in this case a relay was supposed to be safer as I may not have had the same 0V reference. Even though you are sure, check continuity between the relay contacts and any current source or sink – that means your voltage rails, case, ground, any IO etc. Read the manual and email the manufacturer for clarification. If something smells hot then be prepared to switch it off quickly. Limit current if you can. The manufacturer said that the wiring had to be capable of withstanding 5V 20mA so I could’ve included a resistor to limit that current. Would it have saved the isolated DC-DC converter? It’s tough to say, but it might have dragged the voltage down enough to affect communications and point to a potential issue.
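For what it’s worth, the resistor I could have fitted is a one-line sum. The sketch below assumes the manufacturer’s 5V 20mA figure is the limit to design to, and rounds up to the E12 series of preferred values. The function names are mine, and I haven’t checked whether the guard input would still register as closed with this much series resistance.

```python
import math

# E12 preferred values for one decade.
E12 = (1.0, 1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 3.9, 4.7, 5.6, 6.8, 8.2)


def min_series_resistor(v_supply, i_max):
    """Smallest resistance that keeps a dead short below i_max (Ohm's law)."""
    return v_supply / i_max


def next_e12(r_min):
    """Smallest E12 value that is at least r_min."""
    decade = 10 ** math.floor(math.log10(r_min))
    for base in E12:
        if base * decade >= r_min:
            return base * decade
    return 10.0 * decade  # roll over into the next decade


r_min = min_series_resistor(5.0, 0.020)   # 5V into a short, 20mA limit
r_fit = next_e12(r_min)
i_short = 5.0 / r_fit                     # current if the far end is shorted
p_short = 5.0 * i_short                   # power the resistor has to dissipate
print(f"minimum {r_min:.0f} ohm, fit {r_fit:.0f} ohm: "
      f"{i_short * 1000:.1f} mA, {p_short * 1000:.1f} mW into a short")
```

That works out to a 250 ohm minimum, so a 270 ohm part, which would sit comfortably within an ordinary quarter-watt rating even into a dead short.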
I hope this has been informative and/or entertaining. To finish the story, my boss had a good laugh at my expense, we chalked it up to a learning experience and a replacement DC-DC converter is on order for me to fit. It’s great being a double-E.