TI MSP430 software developers need a little help

I’ve been working for a while on using MSP430 microcontrollers. We selected them for a bunch of reasons, including price, availability, physical size, USB support, and preloaded bootloader. As it turns out, a few of those weren’t quite in the shape we expected.

Programming difficulties

First, the bootloader is really a bare minimum. It does not cover such features as programming the flash memory on its own, so tools like python-msp430-tools download a secondary bootloader into RAM in order to function. That bootloader was presented as a binary blob, although with much searching it is possible to find the Custom Bootstrap Loader sources via a link within application note SLAA450 “Creating a Custom Flash-Based Bootstrap Loader (BSL)”. It’s also explicitly mentioned at the Software Tools section, with a link titled Open Source, but that goes to a wiki which does not provide this link. In the end, however, I gave up on that because not only is it obviously not free software or even open source, it completely failed to communicate once loaded. I ended up writing a workaround based on the user guide and BSL protocol guide (buried in an mbox file here, if you need it).

USB framework

The MSP430 USB Developers Package provides example code for USB. In fact, it contains no less than 52 copies of the same USB library – in turn divided into 4 USB device class sections and one common, all with lots of code duplication. It makes it all too clear that no forethought has gone into what’s a common section, as there’s no common code for talking to other endpoints than number 0; the rest is not only duplicated among classes, but present multiple times for each.

Once I got my code at least partially working with the USB library, I found some odd bugs – for instance, the MCU could hang when sent a bunch of data in quick succession. I tracked this down to an issue that’s not limited to the USB code, but in fact present already in SLAA294 “MSP430 Software Coding Techniques” – the starting point for pretty much all MSP430 code.

The above flowchart is based on one in SLAA294, and illustrates the combination of interrupts to react to events with a main thread that uses power saving sleep modes. The USB code didn’t even manage to follow this much, by the way; it was more sensitive to interrupt timing because the “Clear flag_1” portion was moved to after the “Execute flag_1 handler” section, meaning it could miss if this flag was applied again. However, this is only part of the problem.

There are two fundamental errors in the flowchart. First, there is no exit shown from the “Enter sleep” step, although it does continue to the first flag test once woken up. Secondly, the interrupts do not cause control to flow into that same test; they will return to wherever the main thread was. This could be anywhere within the main loop, including just before the “enter sleep” step – in which case the MCU will dutifully sleep until the next interrupt. For a simple example of this, consider what happens if first event 2 occurs, then event 1 while the main thread is at the flag_2 test step the second time around.

I propose a fairly simple solution. We add one more variable, let’s call it SR_sleep, containing the SR value used to enter sleep mode. When the ISRs decide to wake the main thread, they set not only the SR but also SR_sleep to active mode. Then the Enter sleep step is modified to contain precisely two instructions: One copies SR_sleep into SR, which will cause the main thread to sleep if SR_sleep has not been modified by an interrupt. The second sets SR_sleep to the sleep value. This acts similarly to the flags used to identify specific events, except there is no test; SR_sleep is set back to the sleep state immediately on wakeup, as we already know an interrupt must have occurred. This completely removes the window of time in which an interrupt may set a flag but fail to stop the main thread going to sleep. The trick revolves around the MSP430 not interrupting within an instruction, and being capable of loading SR from a memory variable in one instruction.

It gets somewhat more complicated if the main thread uses multiple sleep levels. In that case, the instruction resetting SR_sleep should read from a variable where the desired sleep mode is stored.

Slight update on parallel processing

I’ve written previously on parallel programming; once on APIs, and twice on smaller hardware implementations (1) (2). As is bound to happen, I missed some, made some mistakes, and the world moved on.

You’ll be glad to know all the major PLD developers offer gratis synthesis tools now, including Xilinx, Altera, Lattice and Actel (now MicroSemi). The latter two don’t have their own tools, though, which complicates matters a bit; the software vendors insist on tacking on sabotage systems like FlexLM, some options are time restricted, and even from the big two support for the largest chips isn’t included – but then, those require a hefty budget in the first place. That’s why I haven’t bought a Lattice ECP3 kit already; the software is only covered for a few months, after which it costs as much every year as the kit did in the first place. And that’s a low cost one.

OpenCL is alive and well, with company backed implementations from Apple, AMD, nVidia, Intel, IBM, S3/VIA and Zii Labs, and properly free software in pocl (Portable OpenCL) and Clover (for Gallium 3D). Admittedly the quality of these may vary, but it’s great to see it moving into budget devices (S3/VIA), non-GPU systems (Intel, AMD, pocl) and even low-power mobile devices (Zii Labs).

Speaking of Zii Labs, you may recall my negative comments regarding their blatant lies in marketing. They seem to have moved on (I don’t even find those materials now), as there are now some details (extremely little, but some), devices exist (although very few), and with OpenCL support (albeit in a restricted beta they haven’t replied to my inquiry about) their processing arrays become usable with portable code. I really hope they launch a good device this year, because the old ZiiO tablet isn’t quite worth the asking price where I live.

I’m still very annoyed when companies lie at me instead of presenting their products. One of the devices recently brought to my attention, Venray Technology’s TOMI, suffers from this. At its core, it’s a low instruction set computer with tightly coupled DRAM. It’s not a parallel processor at all, but the design is aimed at systems with multiple chips. It features four memory access units (including the instruction fetcher), eight general purpose registers, and one operation unit (with ARM-like preshifting in front of an ALU). It’s interesting in that it tries to deal with the memory bandwidth limited processing by distributing the processors (calling it a CPU would be way off). But the front and center marketing is, simply put, bullshit. Stop lying to your prospective customers.

I’d also failed to remember Ubicom in my list of parallel chips. It appears to be a barrel processor much like the XMOS ones, but in a higher end system on chip with ready designs for routers and an “internet radio” player. They’ve stayed away from video, however, so it’s perhaps not that remarkable in actual performance; more likely the architecture helps with responsiveness.

Literate programming

That’s right, I’m finally beginning to take my first small steps towards literacy. I’ve known of the concept for quite some time, joining documentation and program code together into a unified document, but haven’t really been using it. Sure, I’ve used plenty of automatically extracted API documentation, but rarely (if ever) written any. And today, I needed something slightly different – I needed a report on a programming project.

As with earlier reports, I fired up LyX, because I’m a sucker for easy interfaces. I’m not really at home in LaTeX, and had prior experience that LyX could make entry of formulae, tables and such easier. This time, though, I needed some circuits, state diagrams, and above all, source code. So, looking at the options, I found LyX now supports both Noweb and Listings. So I sat about writing bits, documenting the circuit using CIRC, and inserting code with Noweb “scraps” as LyX calls them. Pretty soon, this got me tired.

LyX provided me with two options for the source code: scraps, where I had to use Ctrl+Enter to get halfway reasonable spacing, and had no indentation or syntax assistance, or Listings, where code was reformatted for printing but not in the editing view. Besides, my CIRC drawing was just literal code anyhow, so LyX didn’t help very much in the WYSIWYG department. Even looking at the file, it was clear that LyX was just adding overhead – my document would be cleaner in Noweb directly.

Having written just a little code inside LyX, I now knew I wanted back to a proper programmer’s editor. That meant Emacs or Vim. Emacs did open Noweb documents happily, but the syntax highlighting turned out to be a bit bipolar. It was switching, depending on the cursor (point?), between TeX and C sub-modes, and reinterpreting the whole document each time – which destroyed the source context. I did find a simple workaround by using /* and */ in TeX comments, letting the C mode know the text wasn’t code. Not really a big deal, but I’m not used to Emacs, and this swapping (reminescent of per window palette switching in X) was annoying either way. Vim is usually my editor of choice, but it didn’t recognize Noweb at all. For Vim, I found a few scripts, and the highest rated one actually worked. It’s not perfect – it has a few hardcoded languages it can recognize within Noweb – but it’s easy enough to modify if needed, and it does the job.

Noweb style programming is a considerable change for me. My code is now migrating from lots of different files into one larger document, within which I’m writing the structure of the code in an easier, modular fashion. It’s not perfect, but I’m learning. The current question is why double dashes (as in decrements in C) are converted to single ones in print. The same thing even happens here in wordpress. Still, a few steps forward.

Fossil: project management on the quick

Sooner or later, development projects need some revision tracking. Usually right about when you either need an experimental branch for a new feature or sharing the project, which would include releases. You’ll also need to document the work, and if you’re maintaining it at all, probably track issues. Even better if this can all be done publically.
Traditionally, all these tasks are done in central repositories with specialized tools – perhaps RCS (with descendants like CVS and Subversion), Bugzilla, and so on. They’ve been more or less difficult to set up and serve, which lead to services like Sourceforge, Github, and Google Code. There are tools to handle the combination, like Trac. Most of these work, and sometimes they’re just the thing – because you know you’ll want to share the project and spend the time to set up that infrastructure.
Other times, you’re just doing a quick hack. And then you give it to someone. And, two months later, you run into an indirect friend who’s using that same hack, with their own changes, and experiencing an issue you solved later on.. but the code has grown so much you can’t easily track down the changes needed, let alone figure out which release their version is based on.

We’ve seen a move lately towards distributed revision control, with the likes of Git, Mercurial, Darcs, Bazaar and so on. They can, and do, solve the issue of independent development – but only if people use them. Mostly that tends to get stuck on either learning how to use them, or having the tool available. The first is an issue mostly because each tool is different, and the second because they have varying requirements. This is not at all unique to revision control; people hesitate all the time to install software because of complex requirements and so on.

Fossil is a project management tool intended to solve some of these issues. It’s not necessarily best at anything it does, but it does it with a minimum of setup. It has a discoverable web interface, works as one program file, stores data in self-contained files, and offers revision control, a wiki, account management for access, and issue tracking. All set up at a moment’s notice, anywhere. Of course there’s a command line interface too.

I intend to use it for a few minor projects so I get a good sense of how it’s used. At this moment, the most nagging question is if it does anything like Git’s bisection (also available in Mercurial), which is very convenient when tracking down regressions.

OpenCL – now actually usable!

I’ve been experimenting a little bit with parallel programming, using a bunch of different interfaces – MPI, PVM, OpenMP, POSIX threads, parallel Haskell, Occam-π, and most recently OpenCL. I’ve also been looking at a few others, including XC and Spin. Of them all, OpenCL is by far the most promising when it comes to number crunching, for one simple reason – GPUs. It also has the advantages of being vendor neutral, C based, and openly published. The main downside would seem to be a lack of implementations, but it’s rapidly changing. It doesn’t by itself cover distribution over hosts (although nothing in the API prevents it), but it’s possible to combine with MPI or PVM, which do. If you only need CPU support, though, it’s likely easier to use OpenMP as it’s a more direct extension of C – and OpenMP programs reduce without modification to single threaded ones.
As for implementations, there are three big ones out for public use right now – Apple (in Mac OS X 10.6), AMD/ATI Stream, and nVidia (via CUDA). There’s mention of some others, of which the Gallium one interests me most as I am a free software enthusiast. The reason I’m writing this post is that I’ve finally been able to use nVidia’s implementation.
When I first looked into OpenCL, it was primarily to avoid the proprietary CUDA. I found nVidia did have OpenCL code in their GPU Computing SDK, but to my dismay, it was specific to an old driver, known to be buggy. I picked it up again because the most recent nVidia driver beta – 195.36.15 – contained new OpenCL libraries. With a bit of fiddling, this version actually functions on both of my computers that have a modern enough graphics card. There was just one snag while testing, and that is that OpenCL contexts must be created with a CL_CONTEXT_PLATFORM property. No really big deal, as I can just extract that from whatever device I find.

Here’s my simple OpenCL Hello World. It’s an excellent example of what sort of task you don’t leave for the GPU to do, as it’s a ridiculously small dataset and the code is full of conditionals while very low on actual processing. However, it does work, and has no extra dependencies. For some reason, that latter was one thing I didn’t find when looking about at examples. If you’re going to use OpenCL seriously, I suggest you check for errors and use something that can display them, for instance CLCC.