Processor affinity using Cygwin

I’ve been working on a Python script that takes a long time to run (about 2.5h), and as it was entirely single threaded I figured I’d bind it to a specific core, to reduce cache thrashing, enable clock boosting and such. I wanted a method that worked for arbitrary commands, or I’d use the affinity package. The cmd start command (which has an /affinity option) always creates a new window, and I wanted the output in my existing shell session. My workaround involves using PowerShell to set affinity once the process is running.

winpid () {
    # Find the Windows PIDs of specified Cygwin PIDs, using ps
    local pid=${1:-$$}
    ps -lp "$pid" | sed -ne "s/^. *$pid [ 0-9]\{16\} *\([0-9]\+\).*\$/\1/p"
    while [ $# -gt 1 ] ; do
        shift
        ps -lp "$1" | sed -ne "s/^. *$1 [ 0-9]\{16\} *\([0-9]\+\).*\$/\1/p"
    done
}

setaffinity () {
    # Usage: setaffinity CORE CYGPID... (cores count from 0)
    local bitmask=$((1<<$1))
    shift
    for wp in `winpid $*` ; do
        powershell -Command "[System.Diagnostics.Process]::GetProcessById($wp).ProcessorAffinity=$bitmask;"
    done
}

withaffinity () {
    # Usage: withaffinity CORE COMMAND [ARGS...]
    local affinity=$1
    shift
    "$@" &
    setaffinity $affinity $!
}

With these bash functions, I can run “withaffinity 3 somecommand” and have it moved to core 3 specifically.
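ProcessorAffinity is a bitmask with one bit per core, so allowing several cores only takes a different mask. A minimal sketch of computing one follows; coremask is a hypothetical helper name, not part of the functions above, and nothing here talks to Windows.

```shell
# coremask is a hypothetical helper, not part of the original functions.
# It prints the affinity bitmask with one bit set per listed core number.
# Cores count from 0, so core 3 is bit 3, i.e. the value 8.
coremask () {
    local mask core
    mask=0
    for core in "$@" ; do
        mask=$(( mask | (1 << core) ))
    done
    echo "$mask"
}

coremask 3      # prints 8, the same mask setaffinity computes for core 3
coremask 0 2    # prints 5 (binary 101: cores 0 and 2)
```

To use such a mask, setaffinity would have to accept it directly instead of shifting 1 by a single core number.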

TI MSP430 software developers need a little help

I’ve been working for a while on using MSP430 microcontrollers. We selected them for a bunch of reasons, including price, availability, physical size, USB support, and preloaded bootloader. As it turns out, a few of those weren’t quite in the shape we expected.

Programming difficulties

First, the bootloader is really a bare minimum. It does not cover such features as programming the flash memory on its own, so tools like python-msp430-tools download a secondary bootloader into RAM in order to function. That bootloader was presented as a binary blob, although with much searching it is possible to find the Custom Bootstrap Loader sources via a link within application note SLAA450 “Creating a Custom Flash-Based Bootstrap Loader (BSL)”. It’s also explicitly mentioned in the Software Tools section, with a link titled Open Source, but that goes to a wiki which does not provide this link. In the end, however, I gave up on that because not only is it obviously not free software or even open source, it completely failed to communicate once loaded. I ended up writing a workaround based on the user guide and BSL protocol guide (buried in an mbox file here, if you need it).

USB framework

The MSP430 USB Developers Package provides example code for USB. In fact, it contains no fewer than 52 copies of the same USB library – in turn divided into 4 USB device class sections and one common, all with lots of code duplication. It makes it all too clear that no forethought has gone into what belongs in the common section, as there’s no common code for talking to endpoints other than number 0; the rest is not only duplicated among classes, but present multiple times for each.

Once I got my code at least partially working with the USB library, I found some odd bugs – for instance, the MCU could hang when sent a bunch of data in quick succession. I tracked this down to an issue that’s not limited to the USB code, but in fact present already in SLAA294 “MSP430 Software Coding Techniques” – the starting point for pretty much all MSP430 code.

The above flowchart is based on one in SLAA294, and illustrates the combination of interrupts to react to events with a main thread that uses power saving sleep modes. The USB code didn’t even manage to follow this much, by the way; it was more sensitive to interrupt timing because the “Clear flag_1” portion was moved to after the “Execute flag_1 handler” section, meaning it could miss an event if the flag was set again while the handler ran. However, this is only part of the problem.

There are two fundamental errors in the flowchart. First, there is no exit shown from the “Enter sleep” step, although it does continue to the first flag test once woken up. Secondly, the interrupts do not cause control to flow into that same test; they will return to wherever the main thread was. This could be anywhere within the main loop, including just before the “enter sleep” step – in which case the MCU will dutifully sleep until the next interrupt. For a simple example of this, consider what happens if first event 2 occurs, then event 1 while the main thread is at the flag_2 test step the second time around.

I propose a fairly simple solution. We add one more variable, let’s call it SR_sleep, containing the SR value used to enter sleep mode. When the ISRs decide to wake the main thread, they set not only the SR but also SR_sleep to active mode. Then the Enter sleep step is modified to contain precisely two instructions: One copies SR_sleep into SR, which will cause the main thread to sleep if SR_sleep has not been modified by an interrupt. The second sets SR_sleep to the sleep value. This acts similarly to the flags used to identify specific events, except there is no test; SR_sleep is set back to the sleep state immediately on wakeup, as we already know an interrupt must have occurred. This completely removes the window of time in which an interrupt may set a flag but fail to stop the main thread going to sleep. The trick revolves around the MSP430 not interrupting within an instruction, and being capable of loading SR from a memory variable in one instruction.

It gets somewhat more complicated if the main thread uses multiple sleep levels. In that case, the instruction resetting SR_sleep should read from a variable where the desired sleep mode is stored.

Adapteva Epiphany parallel chip

I’ve written previously on the subject of parallel processing – mostly with a focus on microcontrollers. I’ve also noted that there’s a hole in the current offerings, with FPGAs being extremely fine grained and GPUs being specialized for massively parallel computations with the same essential program. The Zii Labs processors made me curious, the Green Arrays chips lacked the switching layer that is present in the XMOS and Transputer systems... but we finally see a real contender.

I had the good fortune to talk to one of the people responsible for the development tools for Adapteva Epiphany, currently in a Kickstarter campaign for a computer named Parallella. This is the real deal – low power, high performance, and properly available documentation and tools. It’s not like Zii, where you can request an OpenCL implementation and never get a reply, nor like Green Arrays where there’s only one possible programming language. This time there’s floating point and integer support, a unified memory system (although local memory is obviously the fastest), and somebody has prepared a board to get started! So what are we waiting for? Only enough backers. Currently we’re short, and I for one have already signed up. Update: funding succeeded!

At a technical conference the first question was what the chip is for. In short, new applications; this level of performance in this efficient a package has not been available (to the public) before. I think a graphics card will still be the more efficient option for Bitcoin mining, but imagine a synthesizer musician no longer constrained by a local computer. Or a fully programmable camera capable of doing the trivial stuff – like lens correction and HDR imagery – on the fly. This is just the beginning.

Oh, and incidentally, it has one of the coolest FPGAs I’ve seen on a budget board, at the lowest price I know of (1/3rd of the next one). I may go into more detail on the architecture later on. 🙂

Some thoughts on dancing games (and exercise)

I have always liked music. When I was younger I played and sang more, and I recall even liking to dance. Nowadays, I mostly stay away from all of it; partly because I don’t know the songs, but mostly out of irrational embarrassment. Coupled with fairly stationary studies and work, and rarely even taking a walk, let alone climbing or hiking (even though I quite like to), I’ve grown weaker – and flabbier – than I like.

My main issues with dance are that I don’t know what to do (and yes, rationally I figure it is not that important as long as I keep up the flow… but that does not stop me feeling lost, which breaks my stride), and even when I do have an idea, I may be too uncoordinated to pull it off. Rhythm isn’t a problem; my problem with DDR-style games, of which I mostly played Stepmania (I even instigated a bulk purchase of mats once), is only reading the abstract symbols far enough ahead. And, occasionally, staying in place over the mat.

Technology caught up.

ST1080 – how bad does it get?

I’ve been asked to record some video of how the ST1080 really performs. In most cases, it’s not a bad experience, but knowing its limitations I decided to set about reproducing the actual problems. I also tried my hand at video editing to see if I could present this in a helpful manner, but in the end that turned into just a bad excuse for further procrastination so I’ve decided to publish raw files showing the performance of the ST1080. The files in there are a jumble and very easily misinterpreted, so here’s a bit of a guide. Beware: A lot of the video clips are high speed footage, so expect extreme amounts of flickering. This can be dangerous to some individuals, such as people with photosensitive epilepsy.

CIMG 2880 through 2892 show the stain in the left lens that troubled me earlier. It is now gone, and might have been as simple as some condensation on the rear lens.

CIMG 2893 shows a peek through the front of the ST1080; it’s very reflective at the partially translucent mirror, and the covering it’s shipped with is not very good for reapplying after stripping off (it stretches). The extras included appear to be a different type, however.

CIMG 2894 through 2897 are high speed video clips of static images, showing the sequential color field display technique used by the ST1080. Setting a lower brightness simply reduces the length of each strobe, likely making the so-called rainbow effect worse, not better.

CIMG 2898 through 2904 are high speed footage of a moving test pattern (a simple vertical colored bar, moving side to side and changing colors, synchronised to vblank). They did not really show anything particular, though one effect of the sequential fields is that objects suddenly switching color may stagger since they’re not shown at an even frame rate.

CIMG 2905 is a 1000fps video showing the HMZ-T1 (on the left) and ST1080 (on the right). Both are fed the same video signal through an HDMI splitter, and that signal is live from a video camera into which we blink an LED light. It thus shows very sharp transitions between a fully lit and dark picture. Here I expected the HMZ-T1 to show pictures before the ST1080, because unlike the ST1080 it not only shows all colors at once but scans similarly to a traditional CRT (notable by a dark band sweeping across the display). What I found surprised me – you can count about 80ms delay in the HMZ-T1, remarkable as we know the ST1080 is already delayed about 17ms.

CIMG 2906 and 2907 are pictures I’ve posted before, approximating the difference in field of view for those two devices. The color error in the ST1080 is simply because the exposure is not well aligned with the color sequence.

CIMG 2908 is a high speed recording of live motion (a swinging foot) displayed by the ST1080. While it’s not clearly visible, the ST1080 accepts interlaced material and displays full frames with half the lines updated – a method that can cause a comb effect. I believe the camera in this instance is sending progressive frames as an interlaced signal. Nothing to really remark on.

CIMG 2920 and 2921 are high resolution video showing fumbling around and a bit of Big Buck Bunny. Nothing to see here. Colors go nuts as usual. The bright spots aren’t live pixels but dust in the display module, an issue I understand SMD have done work to avoid in later units.

CIMG 2922 through 2926 show the beginning of Elephants Dream. This is of note because it has just the sort of scene the ST1080 does badly at: high motion, dark, yet high contrast. The dark scene makes the side light bleeding much more noticeable, and the moving high contrast detail triggers the “rainbow effect” quite handily. The frame rate of the source material and display differ, however, so don’t blame uneven motion on the ST1080.

CIMG 2927 and 2928 show a cheap camera shutter bouncing. Nothing to do with the ST1080, but at this point I’ll just leave it there and see if anyone remarks on it.

CIMG 2929 through 2932 are stills of the display; 2930 in particular manages to show that the color reproduction isn’t all bad.

CIMG 2933 through 2935, 2941 and 2942 again show Elephants Dream (the file being played is also in the folder as ED_HD.avi). In particular 2935 has about 9 minutes of it, thus showing a lot of different scenes.

CIMG 2936 through 2940 are test stills attempting to show the display resolution. This is xclock with antialiasing turned off, so we should get some pixelated diagonal lines to examine. The lower resolution shots are the rescaled 720p mode, in which I have not yet noticed any problems. That is more than I can say of any of the 3D capable games I tested it with, because they all render at even lower resolutions and have horrible scaling artefacts on top of sad frame rates.

CIMG 2943 shows something remarkable. It’s a fairly static display, but is supposed to be 24fps output from an nvidia board. Now, I don’t have another 24fps display to compare with, but to my eyes this looked horrible, and the 240fps footage shows precisely what’s going on. We seem to be getting almost exactly 240 fields per second – which raises the question, with three colors to show, what is the tenth field? – and therein lies the problem. The tenth field is black, but the LED is still strobing through a sequence of colors. For some reason, this lights up the border around the picture brightly, causing a 24fps strobe in different colors. Now, even at the cinema they know 24fps is never enough – that’s why they triple it, just like here. But with the color sequence I’m seeing frames going gBRGBRGBRG rGBRGBRGBR bRGBRGBRGB. So aside from tripled frames with a different field order each frame, we get strobes of light not involved in the picture. The proper fix is to simply not strobe the LEDs for the blank time slot, but this may be impossible to fix in the updatable firmware if it’s in the display module rather than the control box. As is, I must sadly report that 24fps is unusable on the ST1080 (but they could possibly fix it in another firmware update).
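The field count in that last clip works out as follows. This is only a sketch in shell arithmetic using the figures quoted above; the variable names are made up for illustration.

```shell
# Figures from the CIMG 2943 description; names are illustrative only.
fields_per_second=240   # roughly what the high speed footage shows
frames_per_second=24    # source frame rate
colors=3                # one field each for R, G and B

fields_per_frame=$(( fields_per_second / frames_per_second ))
repeats=$(( fields_per_frame / colors ))
leftover=$(( fields_per_frame - repeats * colors ))

echo "$fields_per_frame fields per frame: $repeats x RGB plus $leftover extra"
# prints "10 fields per frame: 3 x RGB plus 1 extra"
```

The one extra field per frame is the black slot during which the LED still strobes.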

Finally, the MOV files are some useless video footage of the way I shot CIMG 2929 through 2942. And yes, the camera this is shot with can do 1080p24, but it cannot photograph the pictures within HMDs well because the lenses are too large.

I basically agree with a statement I saw on the MTBS3D forum: The ST1080 is the best consumer HMD yet, and not yet very good. I believe this market is in its infancy and the two devices mentioned here are just at the threshold; the ST1080 is good enough for me to purchase, but not to use most days as I had envisioned.

Expanding the computer’s interface

I’ve recently ordered yet another gadget – an oversized Android tablet. The intent is to fill a role somewhere between my laptop, e-reader and phone, but also to provide more work area (the main reason I always want higher resolution displays). There just isn’t enough room on two monitors, and I’ll never quite understand why Dell saw fit to put seven video outputs but only two display controllers in the laptop.

My initial simple idea for how to do this involves using a larger framebuffer and VNC to display an off-screen section over the network. Or perhaps distributed multi-head X. I might have to tweak my window manager’s idea of what screens are about a bit, but it should fit neatly into the existing Xinerama support. That should cover getting a picture up as a third screen.
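A rough sketch of the first idea, assuming xrandr and x11vnc are installed and that the X driver accepts a framebuffer larger than the attached monitors (not all do). The helper name vnc_offscreen_plan, the display :0 and the example resolutions are assumptions for illustration; the function only prints the commands rather than running them.

```shell
# vnc_offscreen_plan is a hypothetical helper: it prints (does not run) the
# commands for exporting an off-screen strip to the right of the real monitors.
# Arguments: width and height of the existing desktop, then of the tablet.
vnc_offscreen_plan () {
    local desk_w desk_h tab_w tab_h fb_w fb_h
    desk_w=$1 ; desk_h=$2 ; tab_w=$3 ; tab_h=$4
    fb_w=$(( desk_w + tab_w ))
    # The framebuffer must be as tall as the taller of the two areas
    if [ "$tab_h" -gt "$desk_h" ] ; then fb_h=$tab_h ; else fb_h=$desk_h ; fi
    # Enlarge the X framebuffer beyond the physical outputs
    echo "xrandr --fb ${fb_w}x${fb_h}"
    # Serve only the new strip, starting where the real desktop ends
    echo "x11vnc -display :0 -clip ${tab_w}x${tab_h}+${desk_w}+0"
}

# Example: two 1920x1080 monitors side by side plus a 1280x800 tablet
vnc_offscreen_plan 3840 1080 1280 800
```

The window manager would still need to be told that the extra strip is a separate screen, which is the Xinerama tweak mentioned above.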

It doesn’t quite cover another issue I’ve been feeling a bit; the controls of my windowing system aren’t aging well. With the Alphagrip I’m already feeling that the super-shift-digit binding for moving windows is impossible, and the tablet won’t have any keys at all when I’m using it away from the work terminal itself (unless it’s in a dock). So it’s time to look at other schemes, like tagging windows and using gestures.

A few programs have their own gesture support, such as Xmonad, Blender, gschem, Epiphany and Firefox (some of those only through extensions). But we can do better, and I believe I shall try with easystroke – a gesture recognition tool that can send custom commands to other programs. It’s not proper TUIO control (which would support multitouch), but it’s a start.

Silicon Micro Display ST1080 – Early Impressions

For once, one of the products I have preordered has actually been delivered. This time it’s the Silicon Micro Display ST1080, a head mounted display in goggle form featuring independent 1920×1080 color displays, as well as partially translucent mirrors permitting you to view the outside world through them. This post will detail some early impressions.


Slight update on parallel processing

I’ve written previously on parallel programming; once on APIs, and twice on smaller hardware implementations (1) (2). As is bound to happen, I missed some, made some mistakes, and the world moved on.

You’ll be glad to know all the major PLD developers offer gratis synthesis tools now, including Xilinx, Altera, Lattice and Actel (now Microsemi). The latter two don’t have their own tools, though, which complicates matters a bit; the software vendors insist on tacking on sabotage systems like FlexLM, some options are time restricted, and even from the big two support for the largest chips isn’t included – but then, those require a hefty budget in the first place. That’s why I haven’t bought a Lattice ECP3 kit already; the software is only covered for a few months, after which it costs as much every year as the kit did in the first place. And that’s a low cost one.

OpenCL is alive and well, with company backed implementations from Apple, AMD, nVidia, Intel, IBM, S3/VIA and Zii Labs, and properly free software in pocl (Portable OpenCL) and Clover (for Gallium 3D). Admittedly the quality of these may vary, but it’s great to see it moving into budget devices (S3/VIA), non-GPU systems (Intel, AMD, pocl) and even low-power mobile devices (Zii Labs).

Speaking of Zii Labs, you may recall my negative comments regarding their blatant lies in marketing. They seem to have moved on (I don’t even find those materials now), as there are now some details (extremely little, but some), devices exist (although very few), and with OpenCL support (albeit in a restricted beta they haven’t replied to my inquiry about) their processing arrays become usable with portable code. I really hope they launch a good device this year, because the old ZiiO tablet isn’t quite worth the asking price where I live.

I’m still very annoyed when companies lie to me instead of presenting their products. One of the devices recently brought to my attention, Venray Technology’s TOMI, suffers from this. At its core, it’s a low instruction set computer with tightly coupled DRAM. It’s not a parallel processor at all, but the design is aimed at systems with multiple chips. It features four memory access units (including the instruction fetcher), eight general purpose registers, and one operation unit (with ARM-like preshifting in front of an ALU). It’s interesting in that it tries to deal with the memory bandwidth limited processing by distributing the processors (calling it a CPU would be way off). But the front and center marketing is, simply put, bullshit. Stop lying to your prospective customers.

I’d also failed to remember Ubicom in my list of parallel chips. It appears to be a barrel processor much like the XMOS ones, but in a higher end system on chip with ready designs for routers and an “internet radio” player. They’ve stayed away from video, however, so it’s perhaps not that remarkable in actual performance; more likely the architecture helps with responsiveness.

More unsorted Android addons

I’m sure everybody’s collection grows as time goes by, but I thought I’d note down a few more tools I’ve come across – mostly because for some, there are so many options it’s actual work to try them out, and it’s a pain to have to customize one just to find another did the job better.

A fundamental annoyance with phones is how they track a lot of information about their environs only to not use it to assist me. For instance, they are constantly connecting to cell stations, which could give them an approximate location, but mapping a cell to a position takes an extra database lookup. And while most wifi networks we use stay put, we normally can’t associate them with locations. Enter tools like Llama. It lets me define things like times to keep the phone quiet, to turn wifi off when I’m out of range to save battery, or even turn alarms off based on calendar events. There are other similar tools, like Tasker and Profiles, but Llama was free and works nicely. If I were to request an improvement, it would be an idea I’ve had for a long time – associate the areas with travel time between them, and use that to adjust reminders.

On a related note, telephony is fundamentally timing sensitive, yet for some reason the current time and date are not set automatically on phones – even when they have multiple sources. My carrier doesn’t seem to send calendar time on GSM, but what gives with not having the option to set the time from GPS? Anyhow, ClockSync let me set time from NTP, although doing it automatically requires root.

A less security oriented machine I was supposed to log in to remotely uses VNC. The simple pick for a viewer was android-vnc-viewer, which I would have found a lot faster had I not first searched for “remote desktop”.

When I looked up PwdHash again, a tool for having unique passwords for every website without needing to either memorize or store them all, I found that there are multiple implementations for Android. The one I settled on is Password Hash, because it has published source, properly avoids any Android privileges (so you know it’s not sending your passwords elsewhere), and the resulting hash can be read off the screen. That last thing is not so good if someone else can see, of course, but it means you can use it together with other machines you don’t trust with your master password(s). Within the Android browser, the page sharing feature is used to call up the hasher with the site filled in, and you can paste the hash.

I went looking for a different launcher mostly because I find it annoying that the layout has the same grid size no matter how the screen is oriented in ADW Launcher. I did not find one that fixed that. I did however find a tool to reduce the pointless wait in task swapping. Normally, you have to long-press home, and the most recent apps list fades in slowly. With SwipePad, you can swipe from the edge of the screen to launch pretty much anything – although making it aware of what programs are running takes a modestly priced add-on. If you find yourself wanting several of those add-ons, perhaps Power Strip or Wave Launcher is a better option.

I’ve also added a few network tools, like AndFTP (which does file transfers with a bunch of protocols) and Fing (which is a more general network toolbox starting with a scanner like nmap), but I haven’t actually had much use or need for them yet. They were suggested by AppBrain, a service for cataloguing Android software slightly better than the Market – albeit only slightly. For instance, I can filter on free apps, but not ad-free ones.