As an embedded software developer and designer, when things go wrong you will often need to poke around the hardware or trace through the code to find out what has happened. This is often seen as the drudgery of the job, and it can be challenging. However, there are tools and techniques that help you prevent bugs before they happen, or resolve them quickly when they do.
Key obstacles include the tight memory and CPU limits of many embedded systems. Limited CPU capacity reduces the cycles available for tracing and logging, and limited memory leaves little room to store the trace and log data those activities produce. On top of this, the most complex bugs are often extremely difficult to reproduce. Two bugs you are especially likely to face are memory leaks and deadlocks.
Memory leaks are a common firmware bug that you will have to deal with as an embedded software developer. A leak gradually exhausts the available memory, in the same way a bucket with a hole in it will eventually run dry, and heap mismanagement can also lead to legitimate areas of memory being overwritten. The problem is particularly acute when the system uses dynamic memory allocation, because avoiding it becomes a question of ownership management. Memory leaks can be avoided by explicitly defining the ownership pattern, or lifetime, of every kind of heap-allocated object. A common and clear ownership pattern uses a buffer pool: a producer task takes a buffer from the pool and sends it through a message queue to a consumer task, which destroys the object and returns the memory to the buffer pool.
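To make the buffer-pool ownership pattern concrete, here is a minimal single-threaded sketch in C. It assumes a statically allocated pool and a simple array-based queue; all names and sizes (`pool_get`, `pool_put`, `queue_send`, `POOL_SIZE` and so on) are hypothetical, and on an RTOS you would use the kernel's own message queue and protect the pool from concurrent access. The point is that every buffer always has exactly one owner: the pool, the producer, the queue or the consumer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
#define POOL_SIZE  4
#define QUEUE_SIZE 4
#define BUF_BYTES  32

typedef struct {
    unsigned char data[BUF_BYTES];
    size_t len;
} buffer_t;

/* Statically allocated pool: a buffer that is never returned shows up
   as a shrinking free_count, which is easy to watch in a debugger. */
static buffer_t pool_storage[POOL_SIZE];
static buffer_t *free_list[POOL_SIZE];
static size_t free_count;

static void pool_init(void) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        free_list[i] = &pool_storage[i];
    }
    free_count = POOL_SIZE;
}

/* Caller becomes the owner of the returned buffer (NULL if exhausted). */
static buffer_t *pool_get(void) {
    return (free_count > 0) ? free_list[--free_count] : NULL;
}

/* Owner gives the buffer back; returning more buffers than exist is a bug. */
static void pool_put(buffer_t *buf) {
    assert(free_count < POOL_SIZE);
    free_list[free_count++] = buf;
}

/* Minimal single-threaded message queue of buffer pointers. */
static buffer_t *queue[QUEUE_SIZE];
static size_t q_head, q_tail, q_count;

static int queue_send(buffer_t *buf) {
    if (q_count == QUEUE_SIZE) return 0;
    queue[q_tail] = buf;
    q_tail = (q_tail + 1) % QUEUE_SIZE;
    q_count++;
    return 1;
}

static buffer_t *queue_receive(void) {
    if (q_count == 0) return NULL;
    buffer_t *buf = queue[q_head];
    q_head = (q_head + 1) % QUEUE_SIZE;
    q_count--;
    return buf;
}

/* Producer: owns the buffer from pool_get() until queue_send() succeeds. */
static int producer_task(unsigned char value) {
    buffer_t *buf = pool_get();
    if (buf == NULL) return 0;       /* pool exhausted: a classic leak symptom */
    buf->data[0] = value;
    buf->len = 1;
    if (!queue_send(buf)) {
        pool_put(buf);               /* send failed, so we still own it */
        return 0;
    }
    return 1;                        /* ownership has passed to the queue */
}

/* Consumer: owns the buffer from queue_receive() and must return it. */
static int consumer_task(void) {
    buffer_t *buf = queue_receive();
    if (buf == NULL) return -1;
    int value = buf->data[0];
    pool_put(buf);                   /* ownership ends here */
    return value;
}
```

Because ownership transfers only at well-defined points, a leak shows up as `free_count` drifting downward over time, which is easy to check in a debugger or a periodic health check.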
Deadlock is the other common bug. It occurs when there is a circular dependency between two or more tasks: each task is blocked waiting for a resource that another task holds, so none of them can make progress. Deadlocks can happen in many kinds of systems, so knowing how to prevent them is important. First, avoid acquiring two or more mutexes at the same time. If a task holds one mutex and blocks waiting for another, it can deadlock with a task that holds the second mutex and is waiting for the first. Holding a single mutex on its own cannot create this kind of circular wait, so make sure each task only ever holds one mutex at a time.
The way to achieve this is to push the acquisition and release of mutexes down into the leaf nodes of your code, such as device drivers and reentrant libraries. Keeping lock handling in these leaf nodes removes it from the task-level algorithms and minimises the amount of code that runs inside critical sections.
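The following sketch illustrates the leaf-node approach, using POSIX threads and mutexes as stand-ins for RTOS tasks (an assumption; the principle is the same). The "drivers" `uart_write` and `log_record` are hypothetical. Each one acquires its own mutex, does its work and releases the mutex before returning, so the task-level code never holds one lock while waiting for another.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical drivers: each owns exactly one mutex internally. */
static pthread_mutex_t uart_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t log_lock  = PTHREAD_MUTEX_INITIALIZER;
static int uart_bytes_sent;
static int log_entries;

/* Leaf node: lock, do the work, unlock. Never calls out while locked. */
static void uart_write(int nbytes) {
    pthread_mutex_lock(&uart_lock);
    uart_bytes_sent += nbytes;
    pthread_mutex_unlock(&uart_lock);
}

static void log_record(void) {
    pthread_mutex_lock(&log_lock);
    log_entries++;
    pthread_mutex_unlock(&log_lock);
}

/* Task-level code holds no mutex itself; it only calls the leaves. */
static void *task_a(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        uart_write(1);
        log_record();
    }
    return NULL;
}

static void *task_b(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        log_record();      /* opposite order is safe: the locks never nest */
        uart_write(2);
    }
    return NULL;
}

/* Starts both tasks and waits for them; returns 0 on success. */
static int run_tasks(void) {
    pthread_t a, b;
    if (pthread_create(&a, NULL, task_a, NULL) != 0) return -1;
    if (pthread_create(&b, NULL, task_b, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

Note that `task_a` and `task_b` call the drivers in opposite orders. If each task instead held one mutex while locking the other, that difference in ordering is exactly what could produce a circular wait.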
Whatever bugs you are facing, there are some key tips and techniques that can reduce the bug count during development and speed up debugging when bugs do appear. They include the following.
When you are trying to debug a problem, first narrow down where the issue lies so that you can understand what is happening. A solid understanding of how everything works will save you time from the outset. Start by verifying that the lowest-level functions work, then work your way up through the layers of abstraction: first the low-level A/D interface, the physical memory interface and the physical communication bus interface; then the next layer up, including routers and data handlers; and finally the top level, the data logger.
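One way to make the bottom-up approach repeatable is a small self-test table ordered from the lowest layer to the highest. In this sketch the check functions are hypothetical placeholders that simply return `true`, and `router_fault` simulates a failure; on real hardware each check would exercise its own interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simulated fault, for illustration only. */
static bool router_fault = false;

/* Hypothetical layer checks. On real hardware each would exercise its
   interface, e.g. read a known A/D channel or write and read back RAM. */
static bool check_adc_interface(void)    { return true; }
static bool check_memory_interface(void) { return true; }
static bool check_comms_bus(void)        { return true; }
static bool check_router(void)           { return !router_fault; }
static bool check_data_handler(void)     { return true; }
static bool check_data_logger(void)      { return true; }

typedef struct {
    const char *name;
    bool (*check)(void);
} layer_check_t;

/* Ordered from the lowest layer of abstraction to the top level. */
static const layer_check_t checks[] = {
    { "A/D interface",     check_adc_interface },
    { "memory interface",  check_memory_interface },
    { "communication bus", check_comms_bus },
    { "router",            check_router },
    { "data handler",      check_data_handler },
    { "data logger",       check_data_logger },
};
#define N_CHECKS (sizeof checks / sizeof checks[0])

/* Returns the index of the first failing layer, or -1 if all pass.
   Stops at the first failure, since higher layers depend on lower ones. */
static int verify_bottom_up(void) {
    for (size_t i = 0; i < N_CHECKS; i++) {
        if (!checks[i].check()) {
            printf("FAIL: %s\n", checks[i].name);
            return (int)i;
        }
        printf("ok:   %s\n", checks[i].name);
    }
    return -1;
}
```

Stopping at the first failing layer keeps you from chasing symptoms in the data logger that are really caused by, for example, the bus interface.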
Replace any complicated analog inputs with known, repeatable, synthetic digital data. A counting pattern often works well, so start with that if you can. If the algorithm uses coefficients, as filters do, replace the production coefficients with a simpler set, for example all zeros except for a single full-scale coefficient, so that the expected output is easy to predict.
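As an illustration of both ideas, the sketch below feeds a counting pattern into a small FIR filter whose coefficients are all zero except for one unity tap. With that set-up, the output should simply be the input delayed by the tap index, so any mismatch points to a bug. The 12-bit wrap, the four-tap length, `DEBUG_TAP` and the use of a unity coefficient (rather than a format-specific full-scale value) are assumptions made to keep the arithmetic exact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NTAPS     4
#define DEBUG_TAP 2   /* index of the single non-zero coefficient */

/* Synthetic, repeatable input: a counting pattern instead of live A/D data.
   The 12-bit wrap mimics a 12-bit converter (an assumption). */
static int32_t synthetic_sample(uint32_t n) {
    return (int32_t)(n & 0x0FFFu);
}

/* Direct-form FIR step; history[0] holds the newest sample. */
static int32_t fir_step(const int32_t coeff[NTAPS], int32_t history[NTAPS], int32_t x) {
    for (size_t k = NTAPS - 1; k > 0; k--) {
        history[k] = history[k - 1];
    }
    history[0] = x;
    int64_t acc = 0;
    for (size_t k = 0; k < NTAPS; k++) {
        acc += (int64_t)coeff[k] * history[k];
    }
    return (int32_t)acc;
}

/* Debug coefficients: all zero except one unity tap, so the output
   should equal the input delayed by DEBUG_TAP samples. */
static const int32_t debug_coeff[NTAPS] = { [DEBUG_TAP] = 1 };

/* Runs n_samples through the filter and counts outputs that differ from
   the expected delayed counting pattern (zero before the delay fills). */
static size_t count_mismatches(uint32_t n_samples) {
    int32_t history[NTAPS] = { 0 };
    size_t errors = 0;
    for (uint32_t n = 0; n < n_samples; n++) {
        int32_t y = fir_step(debug_coeff, history, synthetic_sample(n));
        int32_t expected = (n >= DEBUG_TAP) ? synthetic_sample(n - DEBUG_TAP) : 0;
        if (y != expected) {
            errors++;
        }
    }
    return errors;
}
```

Once this passes, moving the non-zero tap or changing its value checks each part of the filter in turn before the production coefficients go back in.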
The goal is to regain control of the problem, and speeding things up rarely helps, so slow down clocks and data rates. Running more slowly can reveal whether one algorithm has fallen out of sync with another, whether a timing requirement is not being met, whether an algorithm that should be data-driven has been coded as clock-driven, or whether data is being pushed between devices too quickly.
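A counting pattern also makes timing problems visible. The sketch below is a simulation rather than real hardware: a producer updates a value at one rate and a reader samples it at another. A clock-driven reader that runs faster than the data sees repeated values, while data arriving faster than it is read shows up as gaps. The function names and the tick-based model are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_TICKS 1000u

typedef struct {
    size_t repeats;  /* same value read twice: reader ran faster than the data */
    size_t gaps;     /* values never read: data arrived faster than it was read */
} sync_report_t;

/* Scans a received counting pattern for signs of a rate mismatch. */
static sync_report_t check_counting_pattern(const uint32_t *rx, size_t n) {
    sync_report_t r = { 0, 0 };
    for (size_t i = 1; i < n; i++) {
        uint32_t step = rx[i] - rx[i - 1];   /* unsigned arithmetic is wrap-safe */
        if (step == 0) {
            r.repeats++;
        } else if (step > 1) {
            r.gaps += step - 1;
        }
    }
    return r;
}

/* Simulated link: the producer writes the next count every producer_period
   ticks and the reader runs every reader_period ticks. A clock-driven
   reader (data_driven == 0) samples on every run; a data-driven reader
   only samples when the new-data flag is set. */
static size_t simulate(uint32_t producer_period, uint32_t reader_period,
                       int data_driven, uint32_t *rx, size_t max_rx) {
    uint32_t data_reg = 0;
    int new_data = 0;
    size_t n = 0;
    for (uint32_t tick = 1; tick <= SIM_TICKS && n < max_rx; tick++) {
        if (tick % producer_period == 0) {
            data_reg++;
            new_data = 1;
        }
        if (tick % reader_period == 0 && (!data_driven || new_data)) {
            rx[n++] = data_reg;
            new_data = 0;
        }
    }
    return n;
}
```

With `simulate(10, 3, 0, ...)` the clock-driven reader reports repeats; making it data-driven (`data_driven = 1`) removes them, and making the producer slower than the reader removes the gaps.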
When multiple inputs are involved, change only one variable at a time; otherwise, if you do find a fix, it will be hard to know which change was responsible. While you are looking for the problem, isolate the inputs, make a single change, and observe how the system responds, so that you can identify which inputs trigger a change in the output.
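The one-variable-at-a-time discipline can be automated with a small harness. In this sketch, `system_under_test` is a hypothetical stand-in with three inputs; each trial starts from the same baseline and changes exactly one input, so any change in the output can be attributed to that input alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define N_INPUTS 3

/* Hypothetical inputs to the system under test. */
static const char *const input_names[N_INPUTS] = { "gain", "offset", "threshold" };

/* Stand-in for the real system: an alarm output that fires when the
   scaled input exceeds a threshold. Purely illustrative. */
static int system_under_test(const int in[N_INPUTS]) {
    int value = 10 * in[0] + in[1];
    return value > in[2];
}

/* For each input in turn, start from the baseline, change only that
   input to its trial value, and record whether the output changed. */
static void one_at_a_time(const int baseline[N_INPUTS],
                          const int trial[N_INPUTS],
                          int changed[N_INPUTS]) {
    int reference = system_under_test(baseline);
    for (size_t i = 0; i < N_INPUTS; i++) {
        int in[N_INPUTS];
        memcpy(in, baseline, sizeof in);
        in[i] = trial[i];              /* the only difference from baseline */
        changed[i] = (system_under_test(in) != reference);
        printf("%-9s -> output %s\n", input_names[i],
               changed[i] ? "changed" : "unchanged");
    }
}
```

With a baseline of `{1, 0, 20}` and trial values of `{3, 5, 5}`, only the gain and threshold changes alter the output, which tells you where to look next.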
Rather than working on an algorithm in real time, where a careless change can cause knock-on issues, it is usually better to build off-line models to work from. A model that closely matches the on-line implementation lets you experiment without affecting the running system. This is particularly useful for non-intuitive transformations such as matrix operations, because working off-line makes it easier to see what the output should look like for a given input.
If you're not sure you have the solution, ask someone else to take a second look. Another pair of eyes costs little and can often spot things you have missed after studying the problem for a long time.
Ultimately, a good grasp of debugging techniques is something employers look for when hiring embedded software engineers, designers and developers. It shows that you can improve the efficiency of product design and that you have the experience to step in when something goes wrong.
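As an example of an off-line model for a matrix operation, the sketch below compares a double-precision reference matrix-vector product with a fixed-point version of the kind that might run on the target. The Q15 format, the 2x2 size and the function names are assumptions for illustration. Running both on the same inputs shows what the output should look like and how far the fixed-point result may legitimately differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 2   /* matrix dimension (illustrative) */

/* Off-line reference model: double-precision y = M x, M stored row-major. */
static void ref_matvec(const double *m, const double *x, double *y) {
    for (size_t i = 0; i < N; i++) {
        y[i] = 0.0;
        for (size_t j = 0; j < N; j++) {
            y[i] += m[i * N + j] * x[j];
        }
    }
}

/* Converts to Q15, rounding half away from zero and saturating. */
static int16_t to_q15(double v) {
    double s = v * 32768.0;
    s = (s >= 0.0) ? s + 0.5 : s - 0.5;
    if (s > 32767.0) s = 32767.0;
    if (s < -32768.0) s = -32768.0;
    return (int16_t)s;
}

/* Fixed-point version as it might run on the target (Q15 assumed). */
static void q15_matvec(const int16_t *m, const int16_t *x, int16_t *y) {
    for (size_t i = 0; i < N; i++) {
        int64_t acc = 0;
        for (size_t j = 0; j < N; j++) {
            acc += (int32_t)m[i * N + j] * x[j];
        }
        acc /= 32768;                 /* back to Q15, truncating toward zero */
        if (acc > INT16_MAX) acc = INT16_MAX;
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[i] = (int16_t)acc;
    }
}

/* Runs both versions on the same input and returns the largest absolute
   difference between them, in real (not Q15) units. */
static double max_abs_error(const double *m, const double *x) {
    int16_t mq[N * N], xq[N], yq[N];
    double yref[N], worst = 0.0;
    for (size_t k = 0; k < N * N; k++) mq[k] = to_q15(m[k]);
    for (size_t k = 0; k < N; k++) xq[k] = to_q15(x[k]);
    ref_matvec(m, x, yref);
    q15_matvec(mq, xq, yq);
    for (size_t i = 0; i < N; i++) {
        double err = yref[i] - yq[i] / 32768.0;
        if (err < 0.0) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

A sensible tolerance is a few least-significant bits of the fixed-point format; an error much larger than that suggests a bug, such as a wrong shift or an overflow, rather than normal rounding.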
It also means you will be able to train other employees, which makes you a particularly valuable member of the team. With the right tools and techniques in hand, you can help save your employer money, resolve projects more quickly and exceed customer expectations.
Georgina has in-depth expertise within tech and innovation. You can find Georgina using her skills to help fellow creative industry leaders through inspiring research pieces or lecturing at Oxford College of Marketing.