Tag Archives: rant

Bad day at work? – How I destroyed a £2300 piece of equipment

It’s been longer than I care to admit since my last post, but sit comfortably because this is a tale worth telling. It is about how all the planning and thought in the world can be let down by one careless oversight.

We use a fairly inexpensive HiPot tester (Clare H101) at work to check that some passive circuits are isolated from each other. This involves putting 850V on one circuit and checking that the leakage to an adjacent circuit is less than 5mA. We currently use a custom switch box to manually dial between tests and perform the hipot test, remembering which combination of tests fail to determine which circuits require rework. This is a fairly quick (40 second) test but relies on the operator to connect the unit under test (UUT) correctly, dial through the tests correctly, record the results correctly, and stamp the correct section of the associated paperwork. With all the workplace distractions it is easy to forget (or overlook) one of those steps. Granted, the operator is working with equipment that has the potential (pun intended) to kill someone, so you’d be forgiven for thinking that additional care must be taken. But as with all things, complacency settles in pretty quickly.
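To put the 5mA limit in perspective, it can be turned into an equivalent isolation resistance with Ohm's law. This is only a small sketch using the 850V and 5mA figures quoted above; the function name is made up for illustration.

```python
# Figures quoted in the post.
TEST_VOLTAGE_V = 850.0
LEAKAGE_LIMIT_A = 5e-3


def min_isolation_resistance_ohms(test_voltage_v, leakage_limit_a):
    """Resistance at which the leakage current would equal the limit (R = V / I)."""
    return test_voltage_v / leakage_limit_a


if __name__ == "__main__":
    r = min_isolation_resistance_ohms(TEST_VOLTAGE_V, LEAKAGE_LIMIT_A)
    print(f"Leakage stays under the limit when isolation exceeds {r / 1e3:.0f} kOhm")
```

In other words, a pair of circuits passes as long as the isolation between them is better than roughly 170kΩ at the test voltage.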

So a switch box was designed and software was written to control the hipot tester (using a partially documented protocol) and switch between circuits. This would check for the presence of the UUT (although various constraints prevented continuity checks of the individual circuits), and only proceed with the test if the UUT was connected. The HAL 101 included a guard circuit that is designed as a dead-man’s switch, but ours was fitted with a wire link instead.

The guard circuit calls for a no-volt switch to be used, whereby the test would only start if the contacts were joined. The connector was physically located with Mains parts (IEC inlet, fuses, 230/120V selector) and the connector was rated to 230V with L and N labels on the screw posts, but there was no mention of what voltage the guard circuit operated on.

It was decided that it would be safer if the hipot tester couldn’t initiate a test without software control, and that breaking the guard circuit would achieve this goal. The circuit was designed with track separation for 230V and a 230V relay was spec’ed for use with the guard circuit. All testing was carried out by bypassing the guard circuit until confirmation was received from the manufacturer of the working voltage of the guard circuit. This was received this morning, and all wiring/connectors/etc. needed to be rated to at least 5V 20mA. Perfect! The relay and wiring were completely overspec’ed but it meant that I could use a panel mount 3-pole 3.5mm TRS connector instead of a large 230V rated connector. The wiring was finalised, and everything was soldered, connected, and screwed together for the final test.
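The intended sequence (confirm the UUT is present, close the guard circuit under software control, step through the circuit pairs, then open the guard circuit again) can be sketched roughly as below. This is not the software that was actually written: the three callables are hypothetical stand-ins for the switch box and tester interfaces, which are not described here, and the 5mA limit is the only figure taken from the post.

```python
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[str, str]  # hypothetical labels for two adjacent circuits


def run_hipot_sequence(
    pairs: Iterable[Pair],
    uut_present: Callable[[], bool],               # hypothetical: True if the UUT is detected
    set_guard: Callable[[bool], None],             # hypothetical: True closes the guard relay
    measure_leakage_a: Callable[[Pair], float],    # hypothetical: selects a pair, returns amps
    limit_a: float = 5e-3,
) -> Dict[Pair, bool]:
    """Return a pass/fail result per circuit pair, so nothing has to be remembered."""
    if not uut_present():
        # Never close the guard circuit if the UUT is missing.
        raise RuntimeError("UUT not detected; test not started")

    results: Dict[Pair, bool] = {}
    set_guard(True)
    try:
        for pair in pairs:
            # A pair passes only if the leakage is below the limit.
            results[pair] = measure_leakage_a(pair) < limit_a
    finally:
        # Always open the guard circuit again, even if a measurement raises.
        set_guard(False)
    return results
```

The `try`/`finally` is the part that matters: the tester is only able to start a test while the software is actively holding the guard circuit closed.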

The first test went through alright, but then the communications to the hipot tester went down. Maybe there’d been a software issue after all. Hardware and software were restarted but the comms were still down. Time to crack out RealTerm as an ASCII protocol had been used. Still nothing. Maybe the comms settings had been changed or corrupted, but everything checked out. What followed was an EE’s (almost) worst nightmare: the smell of magic smoke. Oh dear. Something going wrong is completely manageable; you can examine everything, evaluate possible failure modes, determine what was the cause and propose a fix. But what magic smoke does is alert everyone in the room that you have messed up. Everything was quickly powered off and unplugged, to the sound of cheering from around the office. The only thing that had been changed was connecting the guard circuit so that seemed like a good place to start. Even if there was a solder defect or etching problem on the board then the worst thing was that the relay contacts were shorted together, which wouldn’t cause magic smoke. The connectors were taken apart and all the wiring was checked. Everything seemed alright. Time to take the hipot tester apart. The hipot tester was now already broken, so the ‘VOID if removed’ sticker wasn’t going to stop me.

The Clare H101 is available for around £2300, but accidents happen and I was outside of my probation period so I wasn’t fearful for my job. Opening the hipot tester revealed 2 screws rolling around the case. Maybe it was my lucky day, maybe it was just a coincidence that I was plugging something in for the first time at the same time as it went bang. Unlikely… but possible. It didn’t take long to discover a slightly charred and cracked isolated DC-DC converter that powered the external interfaces (remote buttons, lights, beacon, serial interface, and guard circuit). I didn’t really want to send a unit back for a £300 fixing charge when a £5 component had failed (rest assured that my colleagues also picked up on my re-framing of “I’ve blown up a £2300 bit of kit” to “this £5 component has failed”). But what caused it to fail?

I looked over everything again. The connectors had no stray bits of wire, the soldering was perfect, the relay contacts were switching like they should, the COMMON terminal was connected to 0V… WHAT?! Why is that connected to 0V? I opened the schematics and PCB artwork; the relay was only connected to a 5.08mm pitch connector. There was no way that this relay could be attached to 0V. I’d even checked this before and there were no shorts then. What else had I changed? Something must be different. And then it occurred to me: I had added an Earth bonding wire between the front and rear panels. My panel mount 3-pole 3.5mm TRS connector also happened to be metal, and so had shorted the sleeve (what I had designated common on the relay) to ground. Obviously when the relay switched across to close the guard circuit I had inadvertently shorted the isolated 5V of the hipot tester to ground (with the isolated 0V connected to the PC through the comms cable). The isolated power supply did not like this, and promptly died. I held my hands up to this. I had even added a cable gland to not use the TRS connector but decided against it at the last minute.

This is where it pays to understand the system as a whole. Yes, I was the only engineer to work on this and so I should’ve known better. What this meant was the avoidance of the fruitless exercise of software engineers blaming electronic engineers blaming mechanical engineers etc – I had to work with myself to ignore blame and work out what and why something had gone wrong. It was my fault that the Clare 5V was shorted to ground but I would learn from that mistake and make sure that it wouldn’t happen again. What actually happened was that I blamed Clare for not designing a more protected interface.

I don’t have access to any circuit diagram, but it is clear that the guard circuit did not include sufficient protection. Any inputs from the outside world should limit the voltage and current (as much as possible) before interfacing with anything sensitive like a micro-controller or logic gate. I tend to use the following circuit. (Figure: input-protection circuit.)

This limits the voltage and current to the gate of a MOSFET where I can then have voltage level conversion to my micro-controller VDD. This is by no means the only method, and other people may have other ideas, but it is a good place to start. However, having this alone will not protect against what actually happened, which is that the voltage output had so much current drawn from it that the regulator burned out. Again, there are many ways of preventing this. As a starting point I would use a regulator that had over-current protection or thermal cutout. The hipot tester used a Murata NTE0505MC for around £4.80 in 1000’s. A cursory check has turned up a Burr-Brown part, DCP010505BP, for only £1 more. This features thermal cutout that would prevent the component failure. However, this is only part of it. What happens if the guard circuit is connected to something outputting 24V (like a light gate), or accidentally shorted to ground? Again, the output should be current limited (using a resistor or PTC fuse) along with diodes to clamp the voltage. This obviously wouldn’t protect against connecting the circuit to mains voltage but it is a start.

If you have read this far then please take two things from this. Firstly, if you are interfacing with the outside world then please use protection. Protect what is going out and what is going in. You don’t need to go overboard, but if there is a chance that something will get shorted to ground or a power rail then limit that current. If you are powering with a DC socket, then include over-voltage and reverse polarity protection. A diode, resistor, or MOSFET is a lot easier and cheaper to replace than every IC on the board. Secondly, if you are the outside world, do not assume that the other designer has read this. Before plugging something in, check, check, and check again. If you are connecting to something that says it requires a no-volt connection then don’t short it to a rail, just provide a relay. Obviously I could’ve taken the 5V into my circuit and then supplied my own 5V output, but in this case a relay was supposed to be safer as I may not have had the same 0V reference. Even though you are sure, check continuity between the relay contacts and any current source or sink: that means your voltage rails, case, ground, any IO etc. Read the manual and email the manufacturer for clarification. If something smells hot then be prepared to switch it off quickly. Limit current if you can. The manufacturer said that the wiring had to be capable of withstanding 5V 20mA so I could’ve included a resistor to limit that current. Would it have saved the isolated DC-DC converter? It’s tough to say, but it might have dragged the voltage down enough to affect communications and point to a potential issue.
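As a rough worked example of that last point, the series resistor that would cap a dead short at the quoted 20mA follows directly from Ohm's law. The sketch below uses only the 5V and 20mA figures from the manufacturer; the function names are invented for illustration, and whether the guard input would still register a closed contact through that much resistance is unknown.

```python
# Figures quoted by the manufacturer for the guard circuit.
SUPPLY_V = 5.0
MAX_CURRENT_A = 20e-3


def series_resistor_ohms(supply_v, max_current_a):
    """Smallest series resistance that keeps a dead short at or below max_current_a."""
    return supply_v / max_current_a


def short_circuit_power_w(supply_v, resistance_ohms):
    """Power dissipated in the resistor when the far end is shorted (P = V^2 / R)."""
    return supply_v ** 2 / resistance_ohms


if __name__ == "__main__":
    r = series_resistor_ohms(SUPPLY_V, MAX_CURRENT_A)
    p = short_circuit_power_w(SUPPLY_V, r)
    print(f"Series resistor: {r:.0f} ohm, dissipating {p * 1e3:.0f} mW into a short")
```

That works out at 250Ω dissipating 100mW in the worst case, so an ordinary quarter-watt resistor would cope.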

I hope this has been informative and/or entertaining. To finish the story, my boss had a good laugh at my expense, we chalked it up to a learning experience and a replacement DC-DC converter is on order for me to fit. It’s great being a double-E.

The Perils of Windows Update (Error 80070002)

I switched my PC on this morning, and a little message popped up asking me to update my Windows 7 machine. I have configured my machine to notify me when an update was available and to not install it itself – and those wishes were indeed complied with. Now an up-to-date machine is a happy machine so I went through the motions only to be greeted with this…

Apparently Error 80070002 is caused by a mismatch in update databases, and a resolution is to stop the update service, delete some temporary update files, and resume the service. Well, I don’t need to tell you that this did not work.
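For reference, the usual form of that advice looks something like the sketch below. Two assumptions to flag: the post doesn't say which files were deleted, so the `SoftwareDistribution` folder (where Windows Update keeps its downloads and database) is a guess at what was meant, and the folder is renamed rather than deleted so the step can be undone. `wuauserv` is the Windows Update service name. It defaults to a dry run that only prints the commands; running them for real needs an elevated prompt.

```python
import ntpath
import subprocess


def build_reset_steps(windows_dir=r"C:\Windows"):
    """Commands to stop Windows Update, set its cache aside, and start it again."""
    cache = ntpath.join(windows_dir, "SoftwareDistribution")  # assumed target folder
    return [
        ["net", "stop", "wuauserv"],
        # Rename instead of delete so the old cache can be restored if needed.
        ["cmd", "/c", "ren", cache, "SoftwareDistribution.old"],
        ["net", "start", "wuauserv"],
    ]


def run_steps(steps, dry_run=True):
    """Print each command, or execute it when dry_run is False (Windows only)."""
    for step in steps:
        if dry_run:
            print("would run:", " ".join(step))
        else:
            subprocess.run(step, check=True)


if __name__ == "__main__":
    run_steps(build_reset_steps())
```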

If stopping and restarting services isn’t your idea of fun, then don’t worry as there is a nice and easy solution known as Fix It. This will run through the process automatically and should fix whatever problem it thinks you have. Once again, this did not work. I suspect it was due to the Fix It solution being identical to the advice I had already received. I blame this on Microsoft’s continued attempt to make things “easier” for the non-technical; a side effect being everything is harder for the technical.

If you recall, I have already had issues with Microsoft’s decisions regarding the move from Windows XP to Windows 7. The first was the lack of HyperTerminal, and the second is the requirement to use Windows Update (located in the Start Menu, or in Control Panel) when installing updates. There used to be the option to install updates using Internet Explorer, but not any more. Users are forced down either a single path or off the road entirely.

Finally I found Autopatcher. Released in 2003, this is a free alternative to Microsoft Update. I think it is worth mentioning that Autopatcher originally used a separate server to host the updates, and was subsequently informed by Microsoft Legal to stop, due to concerns about unauthorised updates and malware. Since then, an agreement was made for Autopatcher to utilise Microsoft’s own servers to retrieve the updates. One important feature of this for network managers is that it allows the update to be downloaded once, and installed to many PCs. Autopatcher first gives you a list of update modules you might want to use and then allows updates to be selected, indicating already installed updates in blue. Once the appropriate updates are checked, they will be installed.

Unfortunately my “Error 80070002” still exists, but I am happy with the knowledge that my PC has current updates. Additionally, I can archive these updates, speeding up the inevitable rebuild that accompanies Windows machines. Happy days.

Rules Are Made To Be Abused

In case you weren’t aware, BT is now blocking access to Newzbin. That isn’t stopping the MPA though, as they are now looking to get Newzbin blocked by other major ISPs including TalkTalk and VirginMedia.

Let me start by saying I am against blocking content on the internet. Sure there are things that I don’t think should be on the internet, but the burden of responsibility should lie with the company hosting that data and not the company owning the pipes to your house. BT is merely providing a service moving data around, in the same way as they provide a service connecting phone lines together. To make BT responsible for the data would be like making them responsible for every crime committed that was enabled using the BT phone network.

But my annoyance is not due to the blocking, but instead the method of blocking. BT is using its own CleanFeed software to stop any access to Newzbin. CleanFeed was developed by BT in 2003 in an attempt to block illegal material as identified by the Internet Watch Foundation, specifically related to child pornography. Due to the success of this software, all ISPs were required to implement a similar system by the end of 2007. I think we can all agree that this is a noble endeavour to protect the vulnerable.

But the MPA is not vulnerable. The MPA contains some of the biggest movie companies in the world, including Walt Disney Motion Pictures, Paramount, Sony, Twentieth Century Fox, Universal and Warner Brothers. This group represents a lot of money, and a lot of legal weight. An example of this is Walt Disney Group successfully lobbying for the Copyright Term Extension Act just to protect its Mickey Mouse copyright.

Add into the mix that Newzbin does not host any material, but instead allows people to search for POTENTIALLY infringing material. There are two points here: Newzbin is a search engine, and not all searchable material is protected by copyright (and in some cases is freely distributed). Firstly, I can Google any number of illegal things, from how to broadcast without a license, to how to construct a nail bomb, and yet Google (acting in the same manner as Newzbin) is free from prosecution. Secondly, I could use Newzbin to search and download an obscure Linux distribution, or Jonathan Coulton’s latest album. Again, neither of these would be breaking the law, and yet CleanFeed makes no distinction.

The problem is that the people in power generally do not have an understanding of how technology works. They rely on the advice of “experts” whose advice can be bought by the likes of the MPA. And if the advice is not actually sought, then the MPA can lobby politicians. These laws are not properly thought through, or get abused by people in power. Another example of this is a recent case where a man was taking a picture of his child using his phone. This innocent action took place inside a shopping centre, and the man was told by a security guard that photographs were not permitted, and to delete the picture. The man explained he had already posted the pictures on Facebook, and for some reason the Police were called. Apparently, one officer claimed that he could confiscate the phone under the Prevention of Terrorism Act. Clearly, the Prevention of Terrorism Act was not devised to stop people taking pictures of their own children, and yet that is how it is being used.

But as I try and drag my train of thought back towards BT, CleanFeed and Newzbin, I am reminded that Newzbin have developed a system of their own. This enables users to circumvent CleanFeed and render it useless. I can’t help but think that CleanFeed’s misuse as a tool to protect MPA’s interests has actually made child pornography more accessible.