Tag Archives: foolish

Bad day at work? – How I destroyed a £2300 piece of equipment

It’s been longer than I care to admit since my last post, but sit comfortably because this is a tale worth telling. It is about how all the planning and thought in the world can be let down by one careless oversight.

We use a fairly inexpensive HiPot tester (Clare H101) at work to check that some passive circuits are isolated from each other. This involves putting 850V on one circuit and checking that the leakage to an adjacent circuit is less than 5mA. We currently use a custom switch box to manually dial between tests and perform the hipot test, remembering which combination of tests fail to determine which circuits require rework. This is a fairly quick (40 second) test but relies on the operator to connect the unit under test (UUT) correctly, dial through the tests correctly, record the results correctly, and stamp the correct section of the associated paperwork. With all the workplace distractions it is easy to forget (or overlook) one of those steps. Granted, the operator is working with equipment that has the potential (pun intended) to kill someone, so you’d be forgiven for thinking that additional care must be taken. But as with all things, complacency settles in pretty quickly.

So a switch box was designed and software was written to control the hipot tester (using a partially documented protocol) and switch between circuits. This would check for the presence of the UUT (although various constraints prevented continuity checks of the individual circuits), and only proceed with the test if the UUT was connected. The HAL 101 included a guard circuit that is designed as a dead-man’s switch, but ours was fitted with a wire link instead.
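A minimal sketch of that flow, assuming the switch box and tester are hidden behind two hypothetical callables (`uut_present` and `measure_leakage_ma`); the real protocol is only partially documented and is not reproduced here. It also assumes every pair of circuits is tested, which may be more than the real sequence covers.

```python
from itertools import combinations

LEAKAGE_LIMIT_MA = 5.0  # pass threshold quoted in the post


def run_hipot_sequence(circuits, uut_present, measure_leakage_ma):
    """Test each pair of circuits and return the pairs that fail.

    uut_present and measure_leakage_ma are hypothetical stand-ins for the
    switch box and the hipot tester.
    """
    if not uut_present():
        # Refuse to apply high voltage when nothing is connected.
        raise RuntimeError("UUT not connected; refusing to start test")

    failures = []
    for a, b in combinations(circuits, 2):
        leakage = measure_leakage_ma(a, b)
        if leakage >= LEAKAGE_LIMIT_MA:
            failures.append((a, b, leakage))
    return failures
```

Returning the failing pairs, rather than a single pass/fail flag, replaces the step where the operator has to remember which combinations failed.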

The guard circuit calls for a no-volt switch to be used, whereby the test would only start if the contacts were joined. The connector was physically located with Mains parts (IEC inlet, fuses, 230/120V selector) and the connector was rated to 230V with L and N labels on the screw posts, but there was no mention of what voltage the guard circuit operated on.

It was decided that it would be safer if the hipot tester couldn’t initiate a test without software control, and that breaking the guard circuit would achieve this goal. The circuit was designed with track separation for 230V and a 230V relay was spec’ed for use with the guard circuit. All testing was carried out by bypassing the guard circuit until confirmation was received from the manufacturer of the working voltage of the guard circuit. This was received this morning, and all wiring/connectors/etc. needed to be rated to at least 5V 20mA. Perfect! The relay and wiring were completely overspec’ed but it meant that I could use a panel mount 3-pole 3.5mm TRS connector instead of a large 230V rated connector. The wiring was finalised, and everything was soldered, connected, and screwed together for the final test.

The first test went through alright, but then the communications to the hipot tester went down. Maybe there’d been a software issue after all. Hardware and software were restarted but the comms were still down. Time to crack out RealTerm as an ASCII protocol had been used. Still nothing. Maybe the comms settings had been changed or corrupted, but everything checked out. What followed was an EE’s (almost) worst nightmare – the smell of magic smoke. Oh dear. Something going wrong is completely manageable; you can examine everything, evaluate possible failure modes, determine what was the cause and propose a fix. But what magic smoke does is alert everyone in the room that you have messed up. Everything was quickly powered off and unplugged, to the sound of cheering from around the office. The only thing that had been changed was connecting the guard circuit so that seemed like a good place to start. Even if there was a solder defect or etching problem on the board then the worst thing was that the relay contacts were shorted together, which wouldn’t cause magic smoke. The connectors were taken apart and all the wiring was checked. Everything seemed alright. Time to take the hipot tester apart. The hipot tester was now already broken, so the ‘VOID if removed’ sticker wasn’t going to stop me.

The Clare H101 is available for around £2300, but accidents happen and I was outside of my probation period so I wasn’t fearful for my job. Opening the hipot tester revealed 2 screws rolling around the case. Maybe it was my lucky day, maybe it was just a coincidence that I was plugging something in for the first time at the same time as it went bang. Unlikely… but possible. It didn’t take long to discover a slightly charred and cracked isolated DC-DC converter that powered the external interfaces (remote buttons, lights, beacon, serial interface, and guard circuit). I didn’t really want to send a unit back for a £300 fixing charge when a £5 component had failed (rest assured that my colleagues also picked up on my re-framing of “I’ve blown up a £2300 bit of kit” to “this £5 component has failed”). But what caused it to fail?

I looked over everything again. The connectors had no stray bits of wire, the soldering was perfect, the relay contacts were switching like they should, the COMMON terminal was connected to 0V… WHAT?! Why is that connected to 0V? I opened the schematics and PCB artwork; the relay was only connected to a 5.08mm pitch connector. There was no way that this relay could be attached to 0V. I’d even checked this before and there were no shorts then. What else had I changed? Something must be different. And then it occurred to me: I had added an Earth bonding wire between the front and rear panels. My panel mount 3-pole 3.5mm TRS connector also happened to be metal, and so had shorted the sleeve (what I had designated common on the relay) to ground. Obviously when the relay switched across to close the guard circuit I had inadvertently shorted the isolated 5V of the hipot tester to ground (with the isolated 0V connected to the PC through the comms cable). The isolated power supply did not like this, and promptly died. I held my hands up to this. I had even added a cable gland to not use the TRS connector but decided against it at the last minute.

This is where it pays to understand the system as a whole. Yes, I was the only engineer to work on this and so I should’ve known better. What this meant was the avoidance of the fruitless exercise of software engineers blaming electronic engineers blaming mechanical engineers etc – I had to work with myself to ignore blame and work out what and why something had gone wrong. It was my fault that the Clare 5V was shorted to ground but I would learn from that mistake and make sure that it wouldn’t happen again. What actually happened was that I blamed Clare for not designing a more protected interface.

I don’t have access to any circuit diagram, but it is clear that the guard circuit did not include sufficient protection. Any inputs from the outside world should limit the voltage and current (as much as possible) before interfacing with anything sensitive like a micro-controller or logic gate. I tend to use the following circuit. [Figure: input protection circuit]

This limits the voltage and current to the gate of a MOSFET where I can then have voltage level conversion to my micro-controller VDD. This is by no means the only method, and other people may have other ideas, but it is a good place to start. However, having this alone will not protect against what actually happened, and that is that the voltage output drew so much current that the regulator burned out. Again, there are many ways of preventing this. As a starting point I would use a regulator that had over-current protection or thermal cutout. The hipot tester used a Murata NTE0505MC for around £4.80 in 1000’s. A cursory check has turned up a Burr-Brown part, DCP010505BP, for only £1 more. This features thermal cutout that would prevent the component failure. However, this is only part of it. What happens if the guard circuit is connected to something outputting 24V (like a light gate), or accidentally shorted to ground? Again, the output should be current limited (using a resistor or PTC fuse) along with diodes to clamp the voltage. This obviously wouldn’t protect against connecting the circuit to mains voltage but it is a start.

If you have read this far then please take two things from this. Firstly, if you are interfacing with the outside world then please use protection. Protect what is going out and what is going in. You don’t need to go overboard, but if there is a chance that something will get shorted to ground or a power rail then limit that current. If you are powering with a DC socket, then include over-voltage and reverse polarity protection. A diode, resistor, or MOSFET is a lot easier and cheaper to replace than every IC on the board. Secondly, if you are the outside world, do not assume that the other designer has read this. Before plugging something in, check, check, and check again. If you are connecting to something that says it requires a no-volt connection then don’t short it to a rail, just provide a relay. Obviously I could’ve taken the 5V into my circuit and then supplied my own 5V output, but in this case a relay was supposed to be safer as I may not have had the same 0V reference. Even though you are sure, check continuity between the relay contacts and any current source or sink – that means your voltage rails, case, ground, any IO etc. Read the manual and email the manufacturer for clarification. If something smells hot then be prepared to switch it off quickly. Limit current if you can. The manufacturer said that the wiring had to be capable of withstanding 5V 20mA so I could’ve included a resistor to limit that current. Would it have saved the isolated DC-DC converter? It’s tough to say, but it might have dragged the voltage down enough to affect communications and point to a potential issue.
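As a rough worked example of that last point, the series resistor that would hold a dead short on a 5V rail to 20mA follows from Ohm's law. The function name is illustrative, and whether the guard input would still register a closed contact through that resistance is an open question this sketch does not answer.

```python
def series_resistor(v_supply, i_limit):
    """Return (ohms, watts) for a resistor that limits a dead short to i_limit.

    ohms  = V / I  (Ohm's law)
    watts = V * I  (worst-case dissipation with the full rail across it)
    """
    ohms = v_supply / i_limit
    watts = v_supply * i_limit
    return ohms, watts


# Figures quoted by the manufacturer in the post: 5V, 20mA.
ohms, watts = series_resistor(5.0, 0.020)
print(f"{ohms:.0f} ohm, {watts * 1000:.0f} mW worst case")
```

That works out at 250 ohms dissipating 100mW, so an ordinary quarter-watt resistor would cope with a sustained short.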

I hope this has been informative and/or entertaining. To finish the story, my boss had a good laugh at my expense, we chalked it up to a learning experience and a replacement DC-DC converter is on order for me to fit. It’s great being a double-E.

Hello Caller, You Are Through To The Helpdesk Part3

Following the advice of the people at www.openipcam.com I downloaded the entire contents of flash using Kermit95. Kermit95 is a program developed by Columbia University, but was retired in July of this year. The full source code is available, but as of writing this, there have been no compiled binaries. There exists a command “d” that dumps the data in a specified memory location and Kermit95 was used along with a script found at www.openipcam.com to record every result.

As I’ve mentioned before, the flash chip is split into 5 areas: bootloader, uCLinux, romfs, settings, and webui. 99.9% of bootloaders for the Foscam IP camera (and its derivatives, and clones) use the same bootloader, and by comparing bootloaders it is possible to see some address and data line problems. My flash chip is a 16bit device, meaning it deals with data in 16bits – and has 16 data lines. The bootloader sits in the bottom 64k, and so only exercises the lowest address lines (15 of them, since 64k of 16-bit words is 32k locations). If the two bootloader sections are identical then you can safely assume that the 16 data lines and those low address lines are intact and working correctly.
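One way to make that comparison concrete is to XOR the two bootloader images word by word and OR the differences together, which leaves a mask of the data lines that ever disagreed. This is only a sketch: the function name is made up, and it assumes both dumps are already raw bytes of equal length with little-endian 16-bit words.

```python
def differing_data_lines(reference: bytes, dump: bytes) -> int:
    """Return a 16-bit mask of the data lines that differ between two dumps.

    Bit n of the result is set if data line Dn disagreed in at least one
    16-bit word. Assumes little-endian words (an assumption, not a fact
    taken from the camera).
    """
    if len(reference) != len(dump):
        raise ValueError("dumps must be the same length")

    mask = 0
    for offset in range(0, len(reference) - 1, 2):
        ref_word = int.from_bytes(reference[offset:offset + 2], "little")
        got_word = int.from_bytes(dump[offset:offset + 2], "little")
        mask |= ref_word ^ got_word
    return mask
```

A result of zero means the two images match; a single set bit points at one suspect data line, while widespread differences suggest an addressing fault or a different bootloader version instead.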

Moving on to the uCLinux partition, and this is where things got interesting – and we found the fault. We know that the uCLinux is in a ZIP archive, because the camera tried to unzip “image 7” (our uCLinux file). Dumping from 0x7f020000 to 0x7F0DF700 gives the *.ZIP file that contains our uCLinux data, as evidenced by the presence of a ZIP header – “50 4B 03 04”, which begins with “PK” in ASCII. PK stands for Phil Katz, best known for his work on, you guessed it, ZIP compression algorithms. Part of the header includes the file size, and so we could convert the dump.txt file into a HEX, and onwards to a binary ZIP file, just to check if the unzip functionality of the bootloader was borked. Although I went through the motions, I came to the same conclusion as the bootloader; the file was knackered. As it turns out, memory location 0x7f030000 contained some interesting information.
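For anyone wanting to check a dump the same way, the ZIP local file header is a fixed 30-byte little-endian structure followed by the file name. The sketch below parses it from raw bytes; it assumes the dump has already been converted to binary, and the function name is illustrative.

```python
import struct

# Signature, version, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
ZIP_MAGIC = b"PK\x03\x04"  # the bytes 50 4B 03 04


def parse_zip_local_header(data: bytes) -> dict:
    """Decode the first ZIP local file header found at the start of data."""
    if len(data) < LOCAL_HEADER.size:
        raise ValueError("not enough data for a ZIP local header")

    (sig, _version, flags, method, _mtime, _mdate,
     crc32, comp_size, uncomp_size, name_len, _extra_len) = LOCAL_HEADER.unpack_from(data)
    if sig != ZIP_MAGIC:
        raise ValueError("ZIP signature 'PK\\x03\\x04' not found")

    name_start = LOCAL_HEADER.size
    name = data[name_start:name_start + name_len].decode("ascii", errors="replace")
    return {
        "name": name,
        "flags": flags,
        "method": method,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
    }
```

If bit 3 of the flags is set, the sizes in this header are zero and the real values follow the compressed data, so a zero size here is not on its own proof of corruption.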

Upon power up the camera loads the bootloader, and then looks for the BOOT_INFO file at location 0x7f010000. BOOT_INFO contains the camera’s MAC address, IP address, DHCP information, serial number etc. If the camera can’t find this information, it would query the other hardware, load the values into memory location 0x7f010000 and carry on with its boot sequence. Now imagine for a second what happens if some damage had happened to the pins responsible for those memory locations. One particular failure mode would be that the ARM would look in the flash at memory location 0x7f010000, but due to an addressing failure, the flash would return the data at location 0x7f030000 (pin A16 being tied with pin A15). Of course 0x7f030000 didn’t contain the BOOT_INFO and so the ARM polled the attached devices for information, and set up a new BOOT_INFO at 0x7f010000. Which would be great, apart from the fact that the flash actually wrote the new data back into 0x7f030000. For those that missed it, the uCLinux partition starts at 0x7f020000 and so NOW has a hole 8k wide. By all accounts, the camera did this to itself.
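The aliasing can be reproduced with a few lines of bit arithmetic. This sketch assumes the short behaves like a wired-OR (either line going high drags the other high) and works in CPU byte addresses, where bits 16 and 17 correspond to the chip's A15 and A16 on a 16-bit wide device; both are assumptions rather than measurements.

```python
def short_address_bits(address: int, bit_a: int, bit_b: int) -> int:
    """Return the address the flash would see with two address bits shorted.

    Models the short as a wired-OR: if either bit is high, both are high.
    """
    mask = (1 << bit_a) | (1 << bit_b)
    return address | mask if address & mask else address


BOOT_INFO = 0x7F010000  # where the ARM thinks BOOT_INFO lives

seen_by_flash = short_address_bits(BOOT_INFO, 16, 17)
print(hex(seen_by_flash))
```

Under that model the request for 0x7f010000 lands on 0x7f030000, which is inside the uCLinux partition, matching the damage described above.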

There was no way I would be getting that 8k back, and the only way to move forward was to write a working program to flash. I was 99% certain that I had fixed the hardware fault, and that I would not only write to flash, but to the correct location in flash too. But that 1% kept niggling away at me. If for some reason I managed to overwrite the bootloader in flash then I had convinced myself that it would be the end. As it turns out that never happened, and I could always read back the bootloader code and verify it against a previously dumped version – a dodgy address line wouldn’t even matter, as I wrote to the wrong location, so reading from that wrong location is easy. What wouldn’t be easy though is doing anything in Windows 7. TeraTerm was supposed to have XMODEM capabilities, but for the life of me I couldn’t get it to work. Fortunately I managed to download an alternative program (ClearTerminal) so all was not lost. Side note: If the developers of RealTerm ever happen to read this, please implement XMODEM.

It took me some time to rationalise my following action; the camera was not usable in its current state – by deleting the current uCLinux there is the possibility that I could find a working version and all would be well with the world. So I started trawling OpenIPCam for a compatible dump. I now knew that my camera was a branded Storage Options, and that it was a clone of an EasyN IP camera, which reduced the number of possible candidates. All that is left to do is find a compatible and appropriate version of software – how hard can that be?

 

 

Hello Caller, You Are Through To The Helpdesk Part2

Hands up who thought I would have fixed it by now. You are not alone, I figured this would be an easy fix too. If you remember last post (or care to click on previous post), you would remember that I was concerned by those few pins that looked damaged. Well damaged they were, and a quick touch-up with a soldering iron soon fixed that problem. Did I say fixed; I clearly meant changed.

I have jumped the first hurdle, only to be greeted by a brick wall. Yes, I can see more of the flash. No, the flash is not complete – or at least it is corrupt. I will take this opportunity to thank the people over at http://www.openipcam.com as they have been/are being fantastically helpful. It is pretty obvious that the flash chip is responsible for all my woes – and also for forcing me to understand ARM processors a bit more. These work differently to how a PIC or Atmel (including the Arduino) works. The 4M flash chip is split into a number of sections; the bootloader, uCLinux, the file system, and additional settings or website interface. The ARM takes these pieces of data and copies them onto the 16M SDRAM chip, and runs the software from there. This realisation led me into thinking of this tiny embedded system as a mini-computer instead of a microcontroller; something that I am sure is either entirely obvious to people, or lost on them. Either way, my thinking had changed.

I still had a problem that needed breaking down and I came up with a number of different theories. Either the ARM, flash, or SDRAM couldn’t communicate properly, or the flash had gotten corrupted. I knew that some of the flash was fine, and some of the communication was working as I already had the bootloader loading up, and now I could see what was actually in the flash. However, I couldn’t trust anything I was seeing from the flash, as I knew part of it wasn’t working. If only there was a way to load the SDRAM directly. Luckily we can do just that.

Using the command “mx 0x8000” it is possible to upload a version of uCLinux directly into SDRAM, and “mx 0x700000” allows us to upload a new file system. On XP this is easy. All I need to do is open HyperTerminal, click Transfer -> Send File, and remember that I am using the XMODEM protocol. HyperTerminal was actually written by a company called Hilgraeve, and was licensed to Microsoft to use in their communications package. Apparently Microsoft did not want to continue paying a fee for this incredibly useful piece of software, and so HyperTerminal is no longer packaged with either Windows Vista or Windows 7. To get around this, I use RealTerm as my serial communication program, and fall back on TeraTerm when I need to use the XMODEM protocol. These tools are purely my preference, and not the only ones capable. Uploading the two files takes some time, but once they were done all that was left was to type “g 0x8000”. This tells the ARM to start running whatever is found at address 0x8000, which just happens to be where I stored the uCLinux program. I was able to ping Google, and so I was happy that the SDRAM was working properly, and I started on the long journey of sorting the flash out.
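For the curious, this is roughly what the terminal program sends for each 128-byte chunk of the file in classic XMODEM. It is a sketch of the original checksum variant only; whether this bootloader expects the checksum or the CRC variant is not something established here, and the function name is illustrative.

```python
SOH = 0x01  # start of a 128-byte XMODEM block
PAD = 0x1A  # conventional filler for a short final block
BLOCK_SIZE = 128


def xmodem_packet(block_number: int, payload: bytes) -> bytes:
    """Build one classic (checksum) XMODEM packet.

    Layout: SOH, block number, 255 - block number, 128 data bytes,
    then the sum of the data bytes modulo 256.
    """
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload must be at most 128 bytes")

    data = payload.ljust(BLOCK_SIZE, bytes([PAD]))
    seq = block_number & 0xFF  # block numbers start at 1 and wrap at 256
    checksum = sum(data) & 0xFF
    return bytes([SOH, seq, 0xFF - seq]) + data + bytes([checksum])
```

The receiver answers each packet with ACK or NAK, which is part of why a multi-hundred-kilobyte upload over a serial link takes a while.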

People always talk about data as “ones and noughts”, or “zeros and ones”. This is fine as that is how a computer stores that data – but it is not how the data is represented. Letters are just a collection of straight and curved lines, and yet that is not how we think of them. I am not going to go into bytes, nybbles, hexadecimal as you already know it, or you don’t want to know it. I will say that the flash chip has numerous address and data lines; the idea being you punch in the address that you want and the chip will pump out the data held in that address. The “ones and noughts” become important now as they make or break those address and data values. Imagine we wanted data location 13. This becomes 00001101, so we would set our address lines high or low depending on those values. Now unfortunately we have a fault on the board such that bit5 is always tied high. Although we want 00001101, the flash chip sees 00011101 (or location 29) and gives us the data held in that location. This works the other way too, as writing to location 13 would not only overwrite location 29, but also leave 13 alone. There is no easy way to diagnose these problems, other than a multimeter and a steady hand. Alternatively, I could solder flying leads to each of the pins of the flash chip, and the appropriate pins on the ARM, and check the transitions with my Open Bench Logic Sniffer.
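The location 13 versus 29 example can be modelled with a toy memory whose address bus has one line stuck high. The class is purely illustrative; it counts bits from zero, so the post's "bit5" (the fifth bit) is index 4.

```python
class FaultyMemory:
    """Toy memory with one address line permanently stuck high."""

    def __init__(self, size: int, stuck_high_bit: int):
        self.cells = [0] * size
        self.stuck_mask = 1 << stuck_high_bit

    def _effective(self, address: int) -> int:
        # The chip only ever sees the address with the stuck line set.
        return address | self.stuck_mask

    def read(self, address: int) -> int:
        return self.cells[self._effective(address)]

    def write(self, address: int, value: int) -> None:
        self.cells[self._effective(address)] = value


memory = FaultyMemory(size=256, stuck_high_bit=4)
memory.write(13, 0xAB)   # intended for location 13
print(memory.cells[13])  # the real location 13 is untouched
print(memory.cells[29])  # the value actually landed in location 29
print(memory.read(13))   # reading back through the same fault looks fine
```

The last line is the nasty part: a write followed by a read through the same fault verifies perfectly, so the corruption only shows up when something else depends on the location that was silently overwritten.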

Again, if you checked my last post, you would see that I gave up on using the Logic Sniffer. Or should I say I gave up on it for the night. The next day I took it to work, followed the same steps and “hey presto” it worked first time. I am now running the latest PIC and FPGA code, and it works great…at work. For some reason, it still doesn’t play nice with my home PC. There must be some quirk with the combination of Windows 7 and a 64bit OS that means it just won’t cooperate. It appears as a comms port, but software doesn’t appear to be able to talk to it. I am chalking this up to be an issue with the Microchip drivers, as I can’t open the port in any other program either. I went as far as downloading Microchip’s latest Application Library (containing the latest USB Framework), in an attempt to get a working virtual com port, but even that didn’t work.

Moving back to the camera. Although the boss is aware that I am bringing this problem into work, I set it apart just for breaks and lunch time, and the occasional early start and late finish. As part of the process I have had to take the PCB out of the case. This makes probing a lot easier, but also has the added benefit of disguising its true purpose if the MD walks past my desk.

 

Hello Caller, You Are Through To The Helpdesk Part1

I was getting ready to have an enjoyable weekend catching up on my free Stanford University courses when I received a phone call from my father-in-law. This usually means that some piece of technology has gone wrong and I take the role of helpdesk.  Before you all start drawing conclusions from this; he is fairly technical and so this call means something has gone seriously wrong, and I actually enjoy these challenges. Today’s challenge was a no-name pan/tilt IP Camera.

 

The first step to troubleshooting a problem is to isolate and repeat it. Normally upon power-up the camera would perform a single sweep on both axes, and sit in a centre position. This was not happening – so either the thing wasn’t powered, or the firmware wasn’t running. Plugging a network cable in lit up the status and activity LEDs so at least we knew that it was getting power. I didn’t really want to open the camera yet (purely as it wasn’t mine), so I opened up AngryIPScanner and Wireshark. I couldn’t find the device on any subnet despite the activity light flickering. There was no other option, I had to crack this bad boy open.

 

There are a number of chips on the PCB (# ESTESPCB613M136), including a Nuvoton W90N745CDG (main ARM processor), Winbond W9812G6JH (16M SDRAM), DM9161 (PHY chip), and a WM8731 (audio codec). The other side included a Spansion S29GL032 (the flash chip) and a WiFi daughter-board. All in all, not an overly complex board. Due to a poor design choice, the threaded mounting hole is directly above the ARM chip. There was a light witness mark on the ARM, but I was more concerned by the marking on a number of pins – almost as if the solder was squashed. Unfortunately, I don’t have a decent iron at home, so a fix will have to wait until I get back to work. I turned to Google to get more information on the camera. I didn’t have access to the box, so I had no idea who the manufacturer was, but I found out that Foscam used to OEM extremely similar looking cameras in the past, and the likelihood is that this is either one of the OEM variants or a clone thereof.

A typical feature of the Foscam camera range is the presence of a UART header connection, and this was no exception. Although it wasn’t labelled, a quick check of the N745 chip datasheet identified the pins. This enables a terminal connection to the ARM’s bootloader upon power-up. I was using my only remaining Max232 in a different project so I took the opportunity to revive my Open Bench Logic Sniffer. It needed updating to bring it to the current revision, but once that was done I could start on the hard stuff. And here we hit a snag. I updated the Logic Sniffer, and now the USB won’t enumerate. Clearly I am not having a very good day.

 

So before I could start fixing the IP Camera I now had to fix my Logic Sniffer. This was as easy as soldering some header pins to the ICSP and using my PICkit3 to reprogram the PIC. And by easy, I mean I gave up and hit the forums. I had to resort to hunting down my old PICDEM 2 PLUS board (circa 2002) and using the MAX232 onboard. I had to use the same trick to get the keys off my 360 DVD-drive, but that is another story. With fresh wires soldered onto the camera board, and plugged onto the PICDEM2 board, I loaded up RealTerm. Powering up the camera board revealed that the ARM is working (at least part of it), as I get a nice serial interface.

 

Rooting around in the terminal revealed that only the bootloader is present, and there is no sign of the linux.bin or romfs.img files that I was expecting. Given this I can draw one of two conclusions; either the data on the flash chip has become corrupt, or the pins with witness marks are actually broken and the ARM can’t talk to the flash chip.

Stay tuned for Part2, where we find out if the camera can be recovered.

Toddler doing a Triathlon

I want to design and develop games and let’s be honest, who wouldn’t want to do that. In truth I want to design video games; I can’t think of a single person I shared a lecture theatre for 3 years with who wouldn’t want to be involved with game design in one form or another. I’ve tried my hand at game programming before (as you will see in the future), but that has always been homebrew; nothing official, and always centred around algorithms or simulations. Video games are great, but the number of different skills required makes it difficult for a one-man-band, and unless you can drag someone else along with the same dream and different skill set I am afraid you are on your own. Making a game is relatively easy, but thinking of a game is the problem I have. So I shelved that dream.

Until I stumbled upon Kickstarter, that is. For those that don’t know, Kickstarter is probably the biggest crowdfunding website out there. The idea is that you post your project, and people fund it in return for some arbitrary deliverable. One example would be developing a new card game, and those deciding on funding $500 (it’s a US site) would receive an uncut sheet of the original design signed by all those involved – down to $1 receiving a thank you on the website.

At which point my brain perked up and said “I can do that” and set out developing the next big thing. I will tell you that it was HARD. My career path involves me bringing ideas to life, but there is always a brief to work to, and at worst my industry market to aim at. I literally had nothing to work from other than “I can do that”. I immersed myself in the culture, from reading www.dicecreator.com and www.stormthecastle.com, to signing up to various game designer forums. I wish I was joking, but all I could see was an excuse to build a laser engraver for custom dice, and crafting my own inch high figurines for game tokens. Thankfully, I took a step back. This wasn’t just running before walking; I was a toddler doing a triathlon. I knew my weakness was having the idea, and I was willing to spend hours learning how to model figurines for a game that didn’t exist. This was something that my GCSE in Fine Art did not prepare me for. I knew I needed help, so I did what any self-respecting engineer would do – I downloaded an eBook.

Challenges for Game Designers by Brenda Brathwaite and Ian Schreiber became my Bible. This was 317 pages of pure gold, and I felt like I was at university again. I would fall asleep with it in hand, wake up the next morning and continue where I left off. This book set out weekly assignments and group exercises (not that I had a group). It broke game design down into manageable blocks, and it teaches you how a game works; not how to make a game. That is just what I wanted. I know what a sprite is, and I know how to move it around a screen. I know the importance of non-blocking calls, and how to have a responsive UI.

After moving to Northampton, my selection of available focus groups dropped to 1; my children. There is a point in a child’s life when Mousetrap and Snakes and Ladders become boring. A realisation dawns that there is nothing that child can do to influence a game; the likelihood of defeating a rival sibling is equal to being beaten. This is the eternal battle between skill and luck. You will realise this too when your adorable child morphs into demon spawn purely because they “never win”. I didn’t want to be the one who designs a game responsible for that metamorphosis. But at the same time, my children aren’t ready for the strategic requirements of Risk – which probably explains why I keep winning.

And so The Planting Game was born. The idea is to cover an 8×8 grid with counters of your colour.

You will need:

  • One 8×8 grid (i.e. a Chess board)
  • One 6 sided die
  • Coloured counters (at most 64 of each type)

Prepare your die by taping -1 and -2 over 5 and 6 respectively. Each player has a side of the board, and must place counters either in a square directly adjoining that player’s side, or next to a square already in that player’s possession. When the die is thrown and a positive number is rolled, the player places that many counters on the board. If a negative number is rolled, the player removes that many of an opponent’s counters. If any counters are orphaned, that is, unable to trace a route back to their player’s side, then those counters are removed from play. The person with the most counters at the end wins.
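The orphan rule is the only fiddly part to check by hand, and it amounts to a flood fill from the player's home edge. The sketch below assumes counters connect only through up, down, left and right neighbours (the rules above do not say whether diagonals count), and all names are illustrative.

```python
from collections import deque

SIZE = 8
DIE_FACES = [1, 2, 3, 4, -1, -2]  # a d6 with 5 and 6 taped over


def find_orphans(board, player, home_cells):
    """Return the player's counters that cannot trace a route to home_cells.

    board is an 8x8 list of lists holding a player id or None.
    home_cells are the (row, col) squares along that player's side.
    """
    # Start the flood fill from every home square the player occupies.
    queue = deque(cell for cell in home_cells if board[cell[0]][cell[1]] == player)
    connected = set(queue)

    while queue:
        row, col = queue.popleft()
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            in_bounds = 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
            if in_bounds and nxt not in connected and board[nxt[0]][nxt[1]] == player:
                connected.add(nxt)
                queue.append(nxt)

    owned = {(r, c) for r in range(SIZE) for c in range(SIZE) if board[r][c] == player}
    return owned - connected
```

After an opponent removes counters, calling `find_orphans` for the affected player with the squares along their side gives the set of counters to take off the board.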

The game is still rough around the edges, but the children love it. Using dice means that luck is never far away, but tile placement along with tile removal introduces that level of skill. Do you always plant three across to reduce the chance of losing additional tiles as orphans, or do you race towards the opponent to stunt their growth? An interesting choice for a 5 year old to make.