As you no doubt know, I’m fascinated with both computing and electronics. I particularly like to learn about the borderlands of these fields, and started mucking about with microcontrollers some years ago, then some CPLDs, and recently FPGAs. With my limited budget, I necessarily keep looking for the best a hobbyist can get, and that means manual soldering, two-layer PCBs, and manual via wiring. Surface-mount technology means less drilling, but some variants are too hard to handle, particularly BGAs.
I started my foray into self-built controller boards by purchasing a few Atmel ATtiny2313 AVR microcontrollers. These have a very well thought out processor, fair amounts of memory built in, and plenty of specialized I/O devices to assist in various tasks. The weaknesses lie in resilience (I’ve seen output drivers fail), programming sensitivity (unintentional configurations can make a chip useless), and the fact that they’re 8-bit sequential machines. Anything that happens has to pass through the single CPU. The main weakness is that unless there’s an already included peripheral to do what you need, you can only update one pin or eight pins (a port group) at a time, and the sequential operation means you must handle each event fast enough not to miss the next one. Recently a variant called the Xmega was introduced, resolving some of this by adding DMA, transfer channels, and an event system, all leading to more direct connections and lower latencies.
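To make the "fast enough" point concrete, here is a toy model in Python (not AVR code). It assumes a single loop that samples one pin, and treats every number as an abstract cycle count. The names `poll_events`, `poll_period` and `handler_time` are made up for this illustration. A pulse is only noticed if the loop happens to sample the pin while it is high.

```python
def poll_events(pulses, poll_period, handler_time):
    """Return the indices of the pulses a single sequential loop notices.

    pulses: list of (start, width) pairs; the pin is high during
            [start, start + width), measured in abstract cycles.
    poll_period: cycles spent on one idle pass through the loop.
    handler_time: cycles spent handling a pulse once it is noticed.
    """
    seen = []
    t = 0
    end = max(start + width for start, width in pulses)
    while t < end:
        # Sample the pin: is a not-yet-handled pulse high right now?
        hit = next(
            (i for i, (start, width) in enumerate(pulses)
             if start <= t < start + width and i not in seen),
            None,
        )
        if hit is None:
            t += poll_period      # nothing to do, poll again
        else:
            seen.append(hit)
            t += handler_time     # the CPU is busy and blind to the pin
    return seen


pulses = [(0, 10), (20, 10), (100, 10)]
fast = poll_events(pulses, poll_period=4, handler_time=8)
slow = poll_events(pulses, poll_period=4, handler_time=50)
print(fast, slow)
```

With the short handler all three pulses are noticed; with the long one the second pulse comes and goes while the CPU is still busy with the first. A hardware peripheral or an interrupt latches the event instead, which is exactly what you lose when everything has to be polled.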
From the other end of the spectrum I got a few Xilinx CPLDs and eventually a Spartan (3A) FPGA starter kit. These are programmable logic, allowing you to build anything, as long as you have the logic cells, but also requiring you to do so. The ready-built parts are few, from none in the CPLDs to the Spartan’s multipliers, block RAMs and clock managers. Example extra modules may be found at sites like OpenCores. Xilinx was chosen for a very simple reason: they’re the only PLD brand I know of with free to use development tools (WebPACK). I tried getting started on Atmel’s hybrid FPGA/MCU devices, FPSLIC, but they have no synthesis tools. You’re left to attempt renting access to Mentor Graphics tools, at rates far surpassing any hobbyist budget. I did try to get an evaluation license a few times, but they never produced one that worked.
At the same time, multicore computing has seen a rise in popularity, and a few manufacturers have jumped on this bandwagon even for microcontrollers. Right now I can think of two: Parallax Propeller and Xmos XS1.
The Propeller is really a marvel of engineering, built on misguided preconceptions. Parallax’s best-known product is without a doubt their Basic Stamp controller boards. They took the ease of programming from 80s era microcomputers (that is, built-in Basic interpreters) and applied it to a popular microcontroller, the Microchip PIC family. PICs are very dominant in the MCU market, but they’re not the best design. Instructions take multiples of 4 cycles, so the higher frequencies aren’t really impressive. That latter feature was kept in the Propeller, as was the interpreted language, while “hard to use” features like interrupts were discarded. Basically, the Propeller gives you eight microcontrollers (cogs) in one, each having two timers, but everything has to be bit-banged and polled or waited for, and you can’t run at very high speeds: when you try to access shared memory, each cog has to wait for its dedicated 1/8th timeslot. And that includes the ROM, richly sprinkled with useful items like a font and sine tables. Per-cog memory is 512 words, half of what my AVRs, chosen for being cheap, had. At least you get the important feature of 8(!) NTSC video generators. Meanwhile, the development tools are thrown together, and you get a “high level” language no one else uses, with trivial optimizations left out because the compiler itself is too difficult to maintain.
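The cost of that 1/8th timeslot is easy to sketch. The following Python is a simplified model, assuming a strictly rotating window over eight cogs and counting in abstract slots rather than real clock cycles; `hub_wait` and `average_wait` are names invented for this example, not anything from Parallax.

```python
COGS = 8  # the Propeller has eight cogs sharing one hub


def hub_wait(cog_id, request_slot, cogs=COGS):
    """Slots a cog waits before the rotating hub window reaches it.

    request_slot is the slot currently being served when the cog
    asks for shared memory.
    """
    return (cog_id - request_slot) % cogs


def average_wait(cogs=COGS):
    """Mean wait for cog 0 if its request lands on a random slot."""
    return sum(hub_wait(0, slot, cogs) for slot in range(cogs)) / cogs


print(hub_wait(3, 3))   # asked exactly on its own slot
print(hub_wait(3, 4))   # just missed it, waits a full rotation minus one
print(average_wait())
```

The point of the model is that the wait does not depend on whether the other cogs want the hub at all: a lone busy cog still only gets its one slot per rotation.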
Xmos XS1 looks much more promising to me. These engineers have learned from the past. Like Sun’s Niagara architecture, the processor itself is multithreaded. Like MIPS, it avoids interlocking pipeline stages, giving consistent instruction timing. From the Transputer, it inherits a multiprocessing model that’s easy to analyze and extend, with links for interconnecting multiple cores. We’re at the first generation of chips, with up to four cores, and there’s a sample board showing 512 hardware threads (using 16 chips, each with 4 cores, each with 8 threads). It still takes four clock cycles to complete one instruction, but by having each pipeline stage work for a different thread, four threads can run at full speed on a single core. The clock speed is 400MHz here, so each thread’s instruction rate is 100MIPS, compared to the Propeller’s 20. Admittedly these are top ratings, as memory contention is an issue; memory access heavy code will need extra cycles for instruction loading, as may branching. Programming in familiar languages is available by leveraging free software development tools like LLVM. Occam’s parallel programming features have been translated into a very C-like language, giving a more familiar layout. But the important part comes in the I/O blocks, where we have precise timing, hardware assisted shifting and strobing, and events which can trigger interrupts or wake threads. The mere concept of a sleeping thread gives us automatic power saving (which the Propeller also has, to a degree), as opposed to the sleep management which is always a challenge on traditional MCUs.
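The arithmetic behind those figures fits in a few lines of Python. This is a back-of-the-envelope sketch, not a timing model: it ignores the memory contention mentioned above, and the even split once more than four threads are active is my assumption about how a round-robin pipeline would behave, not something taken from Xmos documentation. The function names are made up.

```python
def per_thread_mips(clock_mhz, active_threads, pipeline_stages=4):
    """Peak instruction rate of one thread, in MIPS.

    A thread can issue at most once per trip through the pipeline, so up to
    pipeline_stages threads each get clock / pipeline_stages. Beyond that,
    the issue slots are assumed to be shared evenly (an assumption).
    """
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    return clock_mhz / max(pipeline_stages, active_threads)


def total_threads(chips, cores_per_chip, threads_per_core):
    """Hardware thread count of a multi-chip board."""
    return chips * cores_per_chip * threads_per_core


print(per_thread_mips(400, 1))   # a lone thread cannot go faster than 1/4 clock
print(per_thread_mips(400, 4))   # four threads, each still at full speed
print(per_thread_mips(400, 8))   # all eight threads active (assumed even split)
print(total_threads(16, 4, 8))   # the sample board
```

So a single thread never benefits from the other threads being idle, but you can add three more without slowing it down.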
But eventually it must come down to complete designs to be interesting. 3.3V builds are now commonplace, so that’s no longer an obstacle, but the Spartan 3 family of chips requires two to three different supply voltages, all of which must be routed to the chip in at least four places. External clock sources are necessary for many tasks. And the Xmos XS1 comes either in a BGA package, or in a QFP where the only ground connection is a pad on the bottom, which means you must solder under the component. It’s doable, but not easy. That’s where Microchip and Parallax stand out: they make breadboard-friendly PDIP components.
In all, this ranting summation wasn’t planned out. What I’d really like are some suggestions: are there any other options out there?