The Top 5 Tenets for Writing Robust Firmware

At Promenade Software, we write a lot of firmware, and we also help clients who are struggling with their own. Across several of these engagements we have seen the same situation repeated: the firmware has intermittent problems that the client simply cannot figure out. The reason is always clear upon code review: some fundamental firmware tenets have been broken. This blog is by no means a complete tutorial for writing good firmware, but it shares the 5 main tenets for resolving intermittent bugs.

For simplicity, we will focus on small single-core processors running bare-metal code: one thread running in a main loop, with interrupt handlers. The concepts can be extended to a threaded RTOS or even a multi-core system.

Tenet 1: Make it clear what is run from an interrupt.

The first thing we do when we receive the code is to clarify what is being called from an interrupt. We follow the flow from every interrupt and append “ISR” (Interrupt Service Routine) to the name of each function it calls. We rename the non-local variables those functions read or write in the same way. Because every remaining use of the old names now fails to compile, the compiler tells us what is being shared between an interrupt and the main thread. This naming convention also helps with maintenance of the code. Often the problem was introduced later by someone who was not aware that a function was run from an interrupt.
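As a small illustration of the convention, here is a sketch in C. All of the names (`uart_rx_ISR`, `rx_byte_count_ISR`, `read_rx_count`) are hypothetical and chosen only for this example; they are not taken from any real project or vendor library.

```c
#include <stdint.h>

/* Written by the interrupt and read by the main loop.
   The _ISR suffix marks it as shared with interrupt context. */
static volatile uint8_t rx_byte_count_ISR = 0;

/* Hypothetical handler: called only from the UART receive interrupt. */
void uart_rx_ISR(void)
{
    rx_byte_count_ISR++;
}

/* Called only from the main loop. Any _ISR name that shows up in a
   non-ISR function like this one is a shared access to review. */
uint8_t read_rx_count(void)
{
    /* Assumption: a single aligned byte read is one instruction on the target. */
    return rx_byte_count_ISR;
}
```

With names like these, a reviewer can search for `_ISR` and see every point where the two contexts touch.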

Tenet 2: Make sure functions shared between running in interrupt and main thread are reentrant.

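A reentrant function can be interrupted partway through and called again from the interrupt without either call corrupting the other. That means it must not rely on shared resources such as hardware registers or static and global memory. Stack-based local variables are fine. If a function does not actually need to be shared, it is best to make one version for the ISR and one for the main thread, for future-proofing.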
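The difference can be sketched with a pair of small helpers. Both functions below are hypothetical examples written for this post: the first keeps its result in a static buffer, so it is not safe to call from both contexts, while the second works only on memory supplied by the caller.

```c
#include <stdint.h>

/* NOT reentrant: every caller shares one static buffer. If an interrupt
   calls this while the main thread is still using the returned string,
   the main thread's result is overwritten. */
const char *format_hex_shared(uint8_t value)
{
    static char buf[3];
    static const char digits[] = "0123456789ABCDEF";
    buf[0] = digits[value >> 4];
    buf[1] = digits[value & 0x0F];
    buf[2] = '\0';
    return buf;
}

/* Reentrant: the caller supplies the output buffer, and the only static
   data is a read-only lookup table. */
void format_hex(uint8_t value, char out[3])
{
    static const char digits[] = "0123456789ABCDEF";
    out[0] = digits[value >> 4];
    out[1] = digits[value & 0x0F];
    out[2] = '\0';
}
```

If the ISR and the main thread each pass their own buffer to `format_hex`, neither call can disturb the other.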

Tenet 3:  Make sure that variables shared in an interrupt and the main thread are accessed either atomically (one machine instruction) or have critical sections (interrupts disabled) around their access.

For example, if the interrupt is filling in an array of data, the consumer of the data in the main thread needs to disable interrupts, copy the data into local variables, and then re-enable interrupts.

The unsafe way to read what the interrupt collected in the main thread would be:

     temp1 = temperature_data_ISR[0];

     temp2 = temperature_data_ISR[1];

The safe way to read it would be:

     ENTER_CRITICAL_SECTION(); // disable interrupts

     int temp1 = temperature_data_ISR[0];

     int temp2 = temperature_data_ISR[1];

     EXIT_CRITICAL_SECTION(); // re-enable interrupts, if they were enabled when you came in.
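One way the ENTER_CRITICAL_SECTION and EXIT_CRITICAL_SECTION macros could honor the "if they were enabled when you came in" rule is to save the interrupt state on entry and restore it on exit. The sketch below is an assumption about how such macros might be built, not the macros from any particular project. The functions `irq_save_and_disable` and `irq_restore` are hypothetical, and the interrupt-enable bit is simulated with a plain variable so the example runs on a PC.

```c
#include <stdbool.h>

/* Simulated interrupt-enable bit. On real hardware this would be a CPU
   status flag, read and written with the vendor's intrinsics. */
static bool sim_interrupts_enabled = true;

/* Hypothetical port function: disable interrupts, return the prior state. */
static bool irq_save_and_disable(void)
{
    bool was_enabled = sim_interrupts_enabled;
    sim_interrupts_enabled = false;
    return was_enabled;
}

/* Hypothetical port function: put back whatever state was saved on entry. */
static void irq_restore(bool was_enabled)
{
    sim_interrupts_enabled = was_enabled;
}

/* The macro pair declares one local variable, so use one pair per scope. */
#define ENTER_CRITICAL_SECTION() bool irq_state_ = irq_save_and_disable()
#define EXIT_CRITICAL_SECTION()  irq_restore(irq_state_)

/* Filled in by an interrupt on real hardware; fixed values here. */
volatile int temperature_data_ISR[2] = { 21, 22 };

/* Copies both samples as one consistent snapshot. */
void read_temperatures(int *temp1, int *temp2)
{
    ENTER_CRITICAL_SECTION();
    *temp1 = temperature_data_ISR[0];
    *temp2 = temperature_data_ISR[1];
    EXIT_CRITICAL_SECTION();
}
```

Restoring the saved state, rather than unconditionally enabling interrupts, keeps the function safe to call from code that already has interrupts disabled.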

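Make sure that simple variable access is atomic. For example, reading or writing a 32-bit variable on an 8-bit or 16-bit processor takes more than one machine instruction, and a read-modify-write such as an increment is often several instructions even at the processor's native width. Interrupts can happen between instructions unless the access is protected with a critical section.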

Even something innocuous like the following can be a problem:

Thread writes a value, ISR reads the value:

     mysharedvar = 1;

This looks like it would be an atomic action, but we have seen the compiler optimizer turn this into a clear and increment, which is faster than moving the value from flash memory. In one case we saw, the interrupt would occasionally fire between the clear and the increment. The interrupt saw the value as 0, even though the thread logic was continually writing only 1.

Tenet 4: Review the interrupt functions for time consuming actions.  

An interrupt should not erase flash, read a slow ADC, wait for a bus to send or receive, or do CPU-intensive work. Interrupts need to be quick: in and out. If they are not, other interrupts may be dropped and data lost because they were not serviced in time (e.g., for Bluetooth or serial buses). Use state flags so that the main loop can pick up the work.
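The flag hand-off can be sketched as follows. Everything here is hypothetical: `adc_done_ISR` stands in for a real ADC-complete handler, and it takes the sample as a parameter only so the example can run without hardware (a real handler would read it from a peripheral register). The averaging line is a placeholder for whatever slow work the application needs.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile bool     adc_ready_ISR = false; /* set by ISR, cleared by main loop */
static volatile uint16_t adc_raw_ISR   = 0;     /* latest raw sample from the ISR   */
static uint32_t filtered = 0;                   /* touched by the main loop only    */

/* Hypothetical interrupt handler: store the sample, raise the flag, leave. */
void adc_done_ISR(uint16_t sample)
{
    adc_raw_ISR = sample;
    adc_ready_ISR = true;
}

/* Called on every pass of the main loop. Returns true if work was done. */
bool process_adc_if_ready(void)
{
    if (!adc_ready_ISR) {
        return false;
    }
    /* On real hardware these two lines belong inside a critical section
       (Tenet 3) so the ISR cannot fire between them. */
    uint16_t raw = adc_raw_ISR;
    adc_ready_ISR = false;

    /* Placeholder for the time-consuming work, done outside the interrupt. */
    filtered = (filtered * 3u + raw) / 4u;
    return true;
}

uint32_t filtered_value(void)
{
    return filtered;
}
```

The interrupt does only two stores, so its execution time stays short and predictable no matter how expensive the processing becomes.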

Tenet 5:  Know your main loop worst-case time.

We want to avoid the situation in which the main loop takes longer than expected and everything gets pushed out. We generally measure this with an available GPIO pin and a logic analyzer while exercising the worst case. Avoid delays in the main loop, even if you think you have time. That will help future-proof your code. For example, do not set an output, spin for 100 ms, and then clear it. Instead, set up a state table of actions and times (based on a timer tick), and manage the states each time the main loop comes around.
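The 100 ms pulse from the example above can be sketched as a two-state machine driven by a millisecond tick. This is a minimal illustration rather than a full state table: the names (`pulse_start`, `pulse_task`, `pulse_output`) are hypothetical, the tick value is passed in by the caller, and a plain variable stands in for the GPIO output.

```c
#include <stdbool.h>
#include <stdint.h>

#define PULSE_WIDTH_MS 100u

typedef enum { PULSE_IDLE, PULSE_ACTIVE } pulse_state_t;

static pulse_state_t pulse_state = PULSE_IDLE;
static uint32_t pulse_start_ms = 0;
static bool output_pin = false;  /* stand-in for a real GPIO output */

/* Set the output and note the start time. Returns immediately. */
void pulse_start(uint32_t now_ms)
{
    output_pin = true;
    pulse_start_ms = now_ms;
    pulse_state = PULSE_ACTIVE;
}

/* Call once per main-loop pass with the current tick count. Never blocks. */
void pulse_task(uint32_t now_ms)
{
    /* Unsigned subtraction gives the correct elapsed time even when the
       tick counter wraps around. */
    if (pulse_state == PULSE_ACTIVE &&
        (uint32_t)(now_ms - pulse_start_ms) >= PULSE_WIDTH_MS) {
        output_pin = false;
        pulse_state = PULSE_IDLE;
    }
}

bool pulse_output(void)
{
    return output_pin;
}
```

Each call costs a comparison or two, so the pulse adds almost nothing to the worst-case loop time, and further timed actions can be added as more states or table rows.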

Those are our top 5 tenets for bare-metal firmware. Contact us if you have any questions or comments!

Frances Cohen

Frances Cohen is President of Promenade Software Inc., a leading software services firm specializing in medical device and safety-critical system software. Frances has more than 20 years of experience leading software teams for medical device software. Starting with heart defibrillators for Cardiac Science and following with Source Scientific LLC and BIT Analytical Instruments Inc., Frances has overseen dozens of projects through development and the FDA, including IDEs, 510(k)s, and PMAs.  

Frances has a B.S. in computer engineering from the Technion, Israel Institute of Technology.
