I’ve previously written about parallel programming: once on APIs, and twice on smaller hardware implementations (1) (2). As was bound to happen, I missed a few things, made some mistakes, and the world moved on.
You’ll be glad to know that all the major PLD vendors now offer gratis synthesis tools, including Xilinx, Altera, Lattice and Actel (now Microsemi). The latter two don’t make their own tools, though, which complicates matters a bit: the third-party software vendors insist on tacking on sabotage systems like FlexLM, some options are time-restricted, and even the big two leave support for their largest chips out of the free versions. Then again, those chips require a hefty budget in the first place. That’s why I haven’t bought a Lattice ECP3 kit yet: the bundled software license only lasts a few months, after which it costs as much every year as the kit itself did. And that’s a low-cost kit.
OpenCL is alive and well, with company-backed implementations from Apple, AMD, nVidia, Intel, IBM, S3/VIA and Zii Labs, and properly free software in pocl (Portable OpenCL) and Clover (for Gallium3D). Admittedly, their quality varies, but it’s great to see OpenCL moving into budget devices (S3/VIA), non-GPU systems (Intel, AMD, pocl) and even low-power mobile devices (Zii Labs).
Speaking of Zii Labs, you may recall my negative comments about the blatant lies in their marketing. They seem to have moved on (I can’t even find those materials now): some technical details have been published (very few, but some), actual devices exist (although not many), and with OpenCL support (albeit in a restricted beta; they haven’t answered my inquiry about it) their processing arrays become usable from portable code. I really hope they launch a good device this year, because the old ZiiO tablet isn’t quite worth the asking price where I live.
I’m still very annoyed when companies lie to me instead of presenting their products. One of the devices recently brought to my attention, Venray Technology’s TOMI, suffers from this. At its core, it’s a low instruction set computer with tightly coupled DRAM. It’s not a parallel processor at all, but the design is aimed at systems with multiple chips. It features four memory access units (including the instruction fetcher), eight general-purpose registers, and one operation unit (with ARM-like preshifting in front of an ALU). It’s interesting in that it tries to address memory-bandwidth-limited processing by distributing the processors alongside the DRAM (calling it a CPU would be way off). But the front-and-center marketing is, simply put, bullshit. Stop lying to your prospective customers.
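For readers unfamiliar with ARM-style preshifting: the second operand passes through a shifter before it reaches the ALU, so something like "base + index × 4" is a single operation. The sketch below is a hypothetical software model of such a shifter-plus-ALU datapath, patterned on ARM’s operand-2 shifter; the names `preshift`, `alu` and `execute`, the shift kinds and the operation set are illustrative assumptions, not TOMI’s documented instruction set.

```python
# Hypothetical model of an ARM-like datapath: operand 2 is shifted
# before it reaches the ALU, all within one instruction. Not TOMI's real ISA.
MASK32 = 0xFFFFFFFF


def preshift(value: int, kind: str, amount: int) -> int:
    """Shift stage in front of the ALU (simplified: amount taken modulo 32)."""
    value &= MASK32
    amount &= 31
    if kind == "LSL":  # logical shift left
        return (value << amount) & MASK32
    if kind == "LSR":  # logical shift right
        return value >> amount
    if kind == "ASR":  # arithmetic shift right, sign bit replicated
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if kind == "ROR":  # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift kind {kind!r}")


def alu(op: str, a: int, b: int) -> int:
    """A few ALU operations; results wrap to 32 bits."""
    ops = {
        "ADD": lambda x, y: x + y,
        "SUB": lambda x, y: x - y,
        "AND": lambda x, y: x & y,
        "ORR": lambda x, y: x | y,
        "EOR": lambda x, y: x ^ y,
    }
    return ops[op](a, b) & MASK32


def execute(op: str, rn: int, rm: int, kind: str = "LSL", amount: int = 0) -> int:
    """One instruction: rn OP (rm shifted), one pass through shifter and ALU."""
    return alu(op, rn, preshift(rm, kind, amount))


if __name__ == "__main__":
    # Address of element 5 in an array of 32-bit words starting at 0x1000.
    print(hex(execute("ADD", 0x1000, 5, "LSL", 2)))
```

Running it prints `0x1014`: the multiply-by-four for the word index costs no extra instruction, which is the point of putting the shifter in front of the ALU.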
I had also forgotten Ubicom in my list of parallel chips. It appears to be a barrel processor much like the XMOS ones, but in a higher-end system-on-chip with ready-made designs for routers and an “internet radio” player. They’ve stayed away from video, however, so its raw performance is probably not that remarkable; more likely the architecture helps with responsiveness.
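Why a barrel design would help responsiveness rather than raw speed can be seen in a toy model. The sketch below assumes a strict round-robin issue policy over a hypothetical `NUM_THREADS` = 8 hardware threads; it is not a description of Ubicom’s or XMOS’s actual schedulers. Each thread gets only 1/N of the issue slots, but its wait for the next slot is bounded by N − 1 cycles no matter what the other threads are doing.

```python
# Toy model of a barrel processor (assumed strict round-robin issue):
# NUM_THREADS hardware threads share one pipeline, and each cycle the
# issue slot goes to the next thread whether or not the others are busy.
NUM_THREADS = 8  # hypothetical thread count


def issuing_thread(cycle: int) -> int:
    """Thread that owns the issue slot on a given cycle."""
    return cycle % NUM_THREADS


def response_latency(thread: int, event_cycle: int) -> int:
    """Cycles from an event arriving at event_cycle until `thread` next
    issues (0 if it owns the slot on that very cycle)."""
    cycle = event_cycle
    while issuing_thread(cycle) != thread:
        cycle += 1
    return cycle - event_cycle


def worst_case_latency(thread: int) -> int:
    """The schedule repeats every NUM_THREADS cycles, so one period
    covers every possible arrival phase."""
    return max(response_latency(thread, c) for c in range(NUM_THREADS))


if __name__ == "__main__":
    for t in range(NUM_THREADS):
        print(f"thread {t}: worst-case wait {worst_case_latency(t)} cycles, "
              f"throughput 1/{NUM_THREADS} of the issue slots")
```

The bound is fixed by the thread count alone, so a thread servicing an I/O event gets a predictable response time, while any single thread runs at only a fraction of the pipeline’s peak rate.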