TI MSP430 software developers need a little help

I’ve been working for a while on using MSP430 microcontrollers. We selected them for a bunch of reasons, including price, availability, physical size, USB support, and preloaded bootloader. As it turns out, a few of those weren’t quite in the shape we expected.

Programming difficulties

First, the bootloader is really a bare minimum. It cannot program the flash memory on its own, so tools like python-msp430-tools download a secondary bootloader into RAM in order to function. That bootloader was presented as a binary blob, although with much searching it is possible to find the Custom Bootstrap Loader sources via a link within application note SLAA450 “Creating a Custom Flash-Based Bootstrap Loader (BSL)”. It’s also explicitly mentioned in the Software Tools section, with a link titled Open Source, but that goes to a wiki which does not provide this link. In the end, however, I gave up on that because not only is it obviously not free software or even open source, it completely failed to communicate once loaded. I ended up writing a workaround based on the user guide and BSL protocol guide (buried in an mbox file here, if you need it).

USB framework

The MSP430 USB Developers Package provides example code for USB. In fact, it contains no fewer than 52 copies of the same USB library – in turn divided into 4 USB device class sections and one common, all with lots of code duplication. It makes it all too clear that no forethought has gone into what belongs in a common section, as there’s no common code for talking to endpoints other than number 0; the rest is not only duplicated among classes, but present multiple times for each.

Once I got my code at least partially working with the USB library, I found some odd bugs – for instance, the MCU could hang when sent a bunch of data in quick succession. I tracked this down to an issue that’s not limited to the USB code, but is in fact already present in SLAA294 “MSP430 Software Coding Techniques” – the starting point for pretty much all MSP430 code.

[Figure: Interrupt flow chart]
The above flowchart is based on one in SLAA294, and illustrates the combination of interrupts to react to events with a main thread that uses power saving sleep modes. The USB code didn’t even manage to follow this much, by the way; it was more sensitive to interrupt timing because the “Clear flag_1” portion was moved to after the “Execute flag_1 handler” section, meaning it could miss an event if the flag was set again while the handler was running. However, this is only part of the problem.
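To make the ordering problem concrete, here is a minimal sketch in Python rather than MSP430 code. It models one pass through the flag_1 branch, with further events arriving while the handler runs. The function name `run_pass` and its parameters are made up for illustration; this is not the USB library's actual code.

```python
def run_pass(clear_first, events_during_handler=1):
    """Simulate one pass through the flag_1 branch of the main loop.

    One event has already set flag_1 before the test. While the handler
    runs, `events_during_handler` further interrupts set the flag again.
    Returns True if flag_1 is still set afterwards, meaning the later
    events will be noticed on the next pass.
    """
    flag_1 = True  # set by the ISR for the first event

    def handler():
        nonlocal flag_1
        # The ISR preempts the handler and raises the flag again.
        for _ in range(events_during_handler):
            flag_1 = True

    if flag_1:
        if clear_first:
            flag_1 = False   # SLAA294 order: clear, then handle
            handler()
        else:
            handler()        # reordered: handle, then clear
            flag_1 = False   # wipes out the events that arrived meanwhile
    return flag_1


print("clear first, event still pending:", run_pass(clear_first=True))
print("clear last, event still pending: ", run_pass(clear_first=False))
```

With the clear placed first, the second event survives as a pending flag. With the clear placed last, it is silently discarded.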

There are two fundamental errors in the flowchart. First, there is no exit shown from the “Enter sleep” step, although it does continue to the first flag test once woken up. Secondly, the interrupts do not cause control to flow into that same test; they will return to wherever the main thread was. This could be anywhere within the main loop, including just before the “enter sleep” step – in which case the MCU will dutifully sleep until the next interrupt. For a simple example of this, consider what happens if first event 2 occurs, then event 1 while the main thread is at the flag_2 test step the second time around.
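The example scenario can be traced with a small Python simulation. This is a sketch under simplifying assumptions: every box in the flowchart takes one tick, an interrupt always lands between two boxes, and after a handler runs the loop goes back to the first test. The names (`run_flowchart`, the tick numbers) are invented for the illustration.

```python
def run_flowchart(interrupts, ticks=10):
    """Step through the flowchart's main loop one box per tick.

    `interrupts` maps a tick number to the flag (1 or 2) whose ISR fires
    just before the main thread's step at that tick.
    Returns (trace, flags, asleep) after `ticks` ticks.
    """
    flags = {1: False, 2: False}
    asleep = False
    pc = "sleep"  # the main thread starts by going to sleep
    trace = []
    for tick in range(ticks):
        if tick in interrupts:
            flags[interrupts[tick]] = True
            asleep = False  # the ISR clears the sleep bits in the saved SR
        if asleep:
            trace.append("zzz")
            continue
        trace.append(pc)
        if pc == "sleep":
            asleep = True      # sleeps regardless of any pending flag
            pc = "test1"       # where execution resumes after wakeup
        elif pc == "test1":
            if flags[1]:
                flags[1] = False   # clear flag_1 and run its handler
            else:
                pc = "test2"
        elif pc == "test2":
            if flags[2]:
                flags[2] = False   # clear flag_2 and run its handler
                pc = "test1"
            else:
                pc = "sleep"
    return trace, flags, asleep


# Event 2 at tick 2, then event 1 just as the main thread reaches the
# flag_2 test for the second time (tick 5).
trace, flags, asleep = run_flowchart({2: 2, 5: 1})
print(trace)
print("asleep:", asleep, "flag_1 pending:", flags[1])
```

The run ends with the main thread asleep while flag_1 is still set, so event 1 waits until some unrelated interrupt happens to wake the MCU.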

I propose a fairly simple solution. We add one more variable, let’s call it SR_sleep, containing the SR value used to enter sleep mode. When the ISRs decide to wake the main thread, they set not only the SR but also SR_sleep to active mode. Then the Enter sleep step is modified to contain precisely two instructions: One copies SR_sleep into SR, which will cause the main thread to sleep if SR_sleep has not been modified by an interrupt. The second sets SR_sleep to the sleep value. This acts similarly to the flags used to identify specific events, except there is no test; SR_sleep is set back to the sleep state immediately on wakeup, as we already know an interrupt must have occurred. This completely removes the window of time in which an interrupt may set a flag but fail to stop the main thread going to sleep. The trick revolves around the MSP430 not interrupting within an instruction, and being capable of loading SR from a memory variable in one instruction.
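As a rough check of this idea, the following Python sketch simulates both the original "Enter sleep" step and the two-instruction SR_sleep variant. It fires event 2 at a fixed tick and then tries event 1 at each later tick. It is a model, not MSP430 code: one tick per instruction or flowchart box, interrupts only between ticks, and all names (`simulate`, `lost_wakeups`, the `rearm` step) are assumptions for the illustration.

```python
SLEEP, ACTIVE = "sleep", "active"


def simulate(interrupts, fixed, ticks=20):
    """Return True if the run ends asleep with a flag still pending."""
    flags = {1: False, 2: False}
    sr_sleep = SLEEP
    asleep = False
    pc = "enter_sleep"
    for tick in range(ticks):
        if tick in interrupts:      # ISR runs between two instructions
            flags[interrupts[tick]] = True
            asleep = False          # wake the main thread
            sr_sleep = ACTIVE       # unused by the original scheme
        if asleep:
            continue
        if pc == "enter_sleep":
            # Fixed scheme: SR is loaded from SR_sleep in one instruction,
            # so the thread only sleeps if no ISR has run since the rearm.
            if not fixed or sr_sleep == SLEEP:
                asleep = True
            pc = "rearm" if fixed else "test1"
        elif pc == "rearm":
            sr_sleep = SLEEP        # second instruction of the sleep step
            pc = "test1"
        elif pc == "test1":
            if flags[1]:
                flags[1] = False    # clear and handle event 1
            else:
                pc = "test2"
        elif pc == "test2":
            if flags[2]:
                flags[2] = False    # clear and handle event 2
                pc = "test1"
            else:
                pc = "enter_sleep"
    return asleep and any(flags.values())


def lost_wakeups(fixed, ticks=20):
    """Event 2 at tick 2, then event 1 at each later tick; list the losses."""
    return [t for t in range(3, ticks - 6)
            if simulate({2: 2, t: 1}, fixed, ticks)]


print("original scheme loses event 1 at ticks:", lost_wakeups(fixed=False))
print("SR_sleep scheme loses event 1 at ticks:", lost_wakeups(fixed=True))
```

In this model the original scheme loses the wakeup for the two ticks just before the sleep instruction, while the SR_sleep variant loses none. The price is at most one extra trip around the loop when an interrupt arrives while the main thread is already awake.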

It gets somewhat more complicated if the main thread uses multiple sleep levels. In that case, the instruction resetting SR_sleep should read from a variable where the desired sleep mode is stored.

Slight update on parallel processing

I’ve written previously on parallel programming; once on APIs, and twice on smaller hardware implementations (1) (2). As is bound to happen, I missed some, made some mistakes, and the world moved on.

You’ll be glad to know all the major PLD developers offer gratis synthesis tools now, including Xilinx, Altera, Lattice and Actel (now Microsemi). The latter two don’t have their own tools, though, which complicates matters a bit; the software vendors insist on tacking on sabotage systems like FlexLM, some options are time restricted, and even from the big two support for the largest chips isn’t included – but then, those require a hefty budget in the first place. That’s why I haven’t bought a Lattice ECP3 kit already; the software is only covered for a few months, after which it costs as much every year as the kit did in the first place. And that’s a low cost one.

OpenCL is alive and well, with company backed implementations from Apple, AMD, nVidia, Intel, IBM, S3/VIA and Zii Labs, and properly free software in pocl (Portable OpenCL) and Clover (for Gallium 3D). Admittedly the quality of these may vary, but it’s great to see it moving into budget devices (S3/VIA), non-GPU systems (Intel, AMD, pocl) and even low-power mobile devices (Zii Labs).

Speaking of Zii Labs, you may recall my negative comments regarding their blatant lies in marketing. They seem to have moved on (I don’t even find those materials now), as there are now some details (extremely little, but some), devices exist (although very few), and with OpenCL support (albeit in a restricted beta they haven’t replied to my inquiry about) their processing arrays become usable with portable code. I really hope they launch a good device this year, because the old ZiiO tablet isn’t quite worth the asking price where I live.

I’m still very annoyed when companies lie to me instead of presenting their products. One of the devices recently brought to my attention, Venray Technology’s TOMI, suffers from this. At its core, it’s a low instruction set computer with tightly coupled DRAM. It’s not a parallel processor at all, but the design is aimed at systems with multiple chips. It features four memory access units (including the instruction fetcher), eight general purpose registers, and one operation unit (with ARM-like preshifting in front of an ALU). It’s interesting in that it tries to deal with memory bandwidth limited processing by distributing the processors (calling it a CPU would be way off). But the front and center marketing is, simply put, bullshit. Stop lying to your prospective customers.

I’d also failed to remember Ubicom in my list of parallel chips. It appears to be a barrel processor much like the XMOS ones, but in a higher end system on chip with ready designs for routers and an “internet radio” player. They’ve stayed away from video, however, so it’s perhaps not that remarkable in actual performance; more likely the architecture helps with responsiveness.

More parallel computing chips

Somewhat over a year ago I jotted down some notes on parallel microcontrollers. I hadn’t heard or done much since, but a few things have happened. I ended the note with a plea for more options, and today it was finally – albeit indirectly – answered. Slashdot picked up some PR from Intel regarding highly multi-core processors, and a comment regarding other brands mentioned two I had not yet heard of.

GreenArrays has started offering some of their larger chips for sale. They’re another product I suspect will be relegated to niche status and forgotten, which is really a pity as they have some very good ideas. The problems aren’t very complex, and not necessarily crippling. First, the whole design is based on the creator’s favourite language, Forth. It is a 1970s language, and hasn’t changed much since. As such, the grand interactive development system is... well... like an 80s microcomputer. It simply doesn’t scale well, and that’s a problem when scaling is what it’s all about – they offer 144-core chips! The other drawback is the lack of communications routing, as all those cores must programmatically shuffle data between them (and yes, the entire layout has to be done manually for now). Finally, don’t expect a hobbyist foothold when only large BGA models are available, nor much of an industrial one while they’re the only source and porting costs would be immense. Where the design shines is in power efficiency, and it’s fairly impressive when it comes to speed and code density, but it just doesn’t seem enough.

Picochip multi-core DSPs fall in the hybrid chip category. They feature a reconfigurable section, but instead of the bit-level FPGA design they have a bunch of DSPs, while ARM cores handle the general purpose computing.

As for the Icera chips, on the other hand, I found no actual details about them. They remind me of Zii – there’s some DSP going on, but they won’t tell what.

The Zii Plaszma is actually being sold, with plenty of marketspeak claiming it’s revolutionary, but they seem more focused on making up analogies and buzzwords rather than admitting anything about the architecture or specifications. In fact, they’re so busy making these up that they’re outright lying about what other things do. Their marketing has convinced me not to trust them.

Parallel microcontrollers

As you no doubt know, I’m fascinated with both computing and electronics. I particularly like to learn about the borderlands of these fields, and started mucking about with microcontrollers some years ago, some CPLDs, and recently FPGAs. With my limited budget, I necessarily keep looking for the best a hobbyist can get, and that means manual soldering, two-layer PCBs, and manual via wiring. Surface mounted technology means less drilling, but some variants are too hard to handle, particularly BGAs.

I started my foray into self built controller boards by purchasing a few Atmel ATtiny2313 AVR microcontrollers. These have a very well thought out processor, fair amounts of memory built in, and plenty of specialized I/O devices to assist in various tasks. The weaknesses lie in resilience (I’ve seen output drivers fail), programming sensitivity (unintentional configurations can make a chip useless), and the fact that they’re 8-bit sequential machines. Anything that happens has to pass through the single CPU. The main weakness is that unless there’s an already included peripheral to do what you need, you can only update one or eight pins (in port groups) at a time, and the sequential operation means you must be careful to handle events fast enough to not miss others. Recently a variant called the Xmega was introduced, resolving some of this by adding DMA, transfer channels, and an event system; all leading to more direct connections and lower latencies.

From the other end of the spectrum I got a few Xilinx CPLDs and eventually a Spartan (3A) FPGA starter kit. These are programmable logic, allowing you to build anything – as long as you have the logic cells – but also requiring you to do so. The ready built parts are few, from none in the CPLDs to the Spartan’s multipliers, block RAMs and clock managers. Example extra modules may be found at sites like Open Cores. Xilinx was chosen for a very simple reason – they’re the only PLD brand I know of with free to use development tools (Webpack). I tried getting started on Atmel’s hybrid FPGA/MCU devices, FPSLIC, but they have no synthesis tools. You’re left to attempt renting access to Mentor Graphics tools, at rates far surpassing any hobbyist budget. I did try to get an evaluation license a few times, but they never produced one that worked.

At the same time, multicore computing has seen a rise in popularity, and a few manufacturers have jumped on this bandwagon even for microcontrollers. Right now I can think of two, Parallax Propeller and Xmos XS1.

The Propeller is really a marvel of engineering – with misguided preconceptions. Parallax’s best known product is without a doubt their Basic Stamp controller boards. They took the ease of programming from 80s era microcomputers – that is, built in Basic interpreters – and applied it to a popular microcontroller, the Microchip PIC family. PICs are very dominant in the MCU market, but they’re not the best design. Instructions take multiples of 4 cycles, so the higher frequencies aren’t really impressive. That latter feature was kept in the Propeller, as was the interpreted language, while “hard to use” features like interrupts were discarded. Basically, the Propeller gives you eight microcontrollers in one, each having two timers, but everything has to be bit-banged and polled or waited for, and you can’t run at very high speeds – when you try to access shared memory, each cog has its dedicated 1/8th timeslot. And that includes the ROM, richly sprinkled with useful items like a font and sine tables. Per-cog memory is 512 words, half what my AVRs chosen for being cheap had. At least you get the important feature of 8(!) NTSC video generators. Meanwhile, the development tools are thrown together, and you get a “high level” language no one else uses, with trivial optimizations left out because the compiler itself is too difficult to maintain.

Xmos XS1 looks much more promising, to me. These engineers have learned from the past. Like Sun’s Niagara architecture, the processor itself is multithreaded. Like MIPS, it avoids interlocking pipeline stages, giving consistent instruction timing. From the Transputer, it inherits a multiprocessing model that’s easy to analyze and extend, with links for interconnecting multiple cores. We’re at the first generation of chips, with up to four cores, and there’s a sample board showing 512 hardware threads (using 16 chips, each with 4 cores, each with 8 threads). It’s still four clock cycles for one instruction, but by having each pipeline stage work for a different thread, four threads can run at full speed on a single core. And the clock speed is 400MHz here, so the instruction rate is 100MIPS – compared to the Propeller’s 20. Admittedly these are top ratings, as memory contention is an issue; memory access heavy code will need extra cycles for instruction loading, as may branching. Programming in familiar languages is available by leveraging free software development tools like LLVM. Occam’s parallel programming features have been translated into a very C-like language, giving a more familiar layout. But the important part comes in the I/O blocks, where we have precise timing, hardware assisted shifting and strobing, and events which can trigger interrupts or wake threads. The mere concept of a sleeping thread gives us automatic power saving (which the Propeller also has, to a degree), as opposed to the sleep management which is always a challenge on traditional MCUs.
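The figures in this comparison are easy to check with a few lines of Python. The XS1 numbers come from the text above. The Propeller's 80 MHz clock is my assumption here (the text only gives the resulting 20 MIPS), combined with the four cycles per instruction mentioned earlier.

```python
# XS1 figures as stated in the text.
chips, cores_per_chip, threads_per_core = 16, 4, 8
xs1_clock_hz = 400_000_000
cycles_per_instruction = 4

# Sample board: total hardware threads.
hardware_threads = chips * cores_per_chip * threads_per_core

# Each thread issues one instruction every four clock cycles.
xs1_thread_mips = xs1_clock_hz // cycles_per_instruction // 1_000_000

# Four threads, one per pipeline stage, keep a core fully busy.
xs1_core_mips = xs1_thread_mips * 4

# Assumption, not from the text: a Propeller clocked at 80 MHz.
propeller_clock_hz = 80_000_000
propeller_cog_mips = propeller_clock_hz // cycles_per_instruction // 1_000_000

print("hardware threads on the sample board:", hardware_threads)
print("XS1 MIPS per thread:", xs1_thread_mips)
print("XS1 MIPS per core with four threads:", xs1_core_mips)
print("Propeller MIPS per cog:", propeller_cog_mips)
```

These are peak rates, so the caveat about memory contention applies to all of them.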

But it eventually must come to complete designs to be interesting. 3.3V builds are now commonplace, so that’s no longer an obstacle, but the Spartan 3 family of chips requires two to three different voltages for power, which must all be routed to the chip in at least four places. External clock sources are necessary for many tasks. And the Xmos XS1 comes either in a BGA package, or a QFP where the only ground connection is a pad on the bottom, which means you must solder under the component. It’s doable, but not easy. That’s where the Microchip and Parallax designs stand out; they make breadboard friendly PDIP components.

In all, this ranting summation wasn’t planned out. What I’d really like is some suggestions – are there any other options out there?