Category Archives: Programming

Original provided by WikiMedia:

ARM Assembly: Binary Heaps

I got a Raspberry Pi for Christmas and I’ve been teaching myself ARM assembly. It’s my first time working with assembly language, as I didn’t take an systems architecture or OS fundamentals class in college.

I’m slowly working on a Huffman Encoder, trying to use only native Linux system calls without making calls to other external libraries.

This will be the first in a series of posts about this topic.

The posts probably won’t be very long, the stuff I’m doing isn’t revolutionary, so there’s not a lot to discuss.

Today I’m here to talk about binary heaps. When I learned about binary heaps in my data structures class I didn’t really understand the usefulness of them. I knew that you could use them to implement heapsort, and to make priority queues, but because they’re simpler than red-black trees and other binary search trees, they can’t be used for easy ordered traversal. Working in higher level languages I never really had a need to implement a heap for anything.

However, I am now in need of a priority queue, and I am trying to work in a static memory footprint because I’m trying not to allocate any additional memory beyond the initial memory footprint of the application. Suddenly binary heaps make a whole lot of sense. (Before you ask, yes, I did realize afterwards that I could solve the same problem using the same space and time complexity with a binary search tree, which is what I will implement next.)

The code I wrote to work with heaps is designed to be used for a priority queue. As such it stores a 2-word (64 bit) value in the heap. The first word (32 bits) of the value is the “key” or “priority”, used to sort the entry within the heap. The second word (32 bits) is the “value”. The value is actually optional, so one could use this same code for sorting an array of integers, for example, although with a few tweaks it could be made more more efficient for that purpose. Also, when removing an element I ought to swap it with the last element rather than just copying the last element into the first spot. That would make the algorithm effectively do a heapsort out-of-the-box. Maybe in v2. The code also stores the current size of the heap and the maximum size of the heap in the first two words of the heap’s memory space so that those values don’t need to be passed around by hand.

I am still learning assembly, but hopefully someone will find this useful.

The code is located here:
The test code is here:

A MUSH Written in Racket (Scheme)

I’ve been working on a project for the past couple days to learn the Racket programming language. Racket is based on Scheme, which is in turn based on Lisp. Racket includes a compiler that will produce a native binary on Mac OSX, Linux, or Windows as well.

The project I’m working on is a simple MUSH (Multi-User Shared Hallucination). The code is on my GitHub page if you want to take a look at it. It doesn’t do much yet, but I’ll keep adding to it as I have time to do so. The initial code base is around 260 lines of code and it allows players to chat with each other, look at the room they are in, and see who is online. It also contains the necessary data structures to allow users to move from one room to another and to have objects in their inventories, but the code to make use of those structures hasn’t been written yet.

Part of my goal with this project is to have a simple in-game environment for creating rooms, objects, and commands on-the-fly. The codebase still needs a long term storage mechanism before it will be useful though, as everything is currently stored in memory.

Regular Expression Magic and ViM

Every now and then I have an editing task for which Eclipse is just not up to the job. Usually it involves making a lot of changes all at once. (I realize that Eclipse has regex find/replace, but I feel much better working in ViM in these cases.)

Here is an example: I have a block of Java enum code that I want to split out into an enum and two maps.

Here is the regular expression I use to parse the enum declaration:


Here is an example of an enum declaration that matches this regular expression:

FACEBOOK(Constants.FACEBOOK, "fbconnect://success", "fbconnect://success?error_reason"),

I then run three s// commands on the input:

%s//callbackMap.put(Provider.\1, \3);/g
%s//cancelMap.put(Provider.\1, \4);/g

After each command I copy the buffer and paste it back into Eclipse, then hit ‘u’ to undo my changes so that I can run the next command on the original buffer contents.

Here’s what I get (these aren’t meant to be used in isolation like this, of course):

callbackMap.put(Provider.FACEBOOK, "fbconnect://success");
cancelMap.put(Provider.FACEBOOK, "fbconnect://success?error_reason");

For complete examples, check behind the cut.

Continue reading Regular Expression Magic and ViM

Fixing Column Span Issues in IE7 With Dojo

I’m currently working on a Dojo/Dijit based web application.  The HTML for this app is very minimal: just a div tag that is used as a connection point.  The rest of the HTML is generated on the fly by the Dijit libraries.

Here is an example:

var table = new dojox.layout.TableContainer({
    cols: 2,
    spacing: "5px"
    new dijit.form.TextBox({
        id: "textbox", 
        name: "textbox", 
        label: "Text Box:", 
        title: "Enter your text here!",
        colspan: 2

In most browsers this works just fine, but I’ve had several problems related to Internet Explorer 7.  One of these is that column spanning doesn’t work the way it should in tables.  This is happening because IE7 uses the attribute “colSpan” to set the column span on a TD element.  However, all the other browsers I test with use “colspan”.  (Notice the capitalization!)  The Dijit code is building the HTML by modifying the dom nodes directly instead of using the innerHtml attribute.  This means that it is calling code that looks like this:

element.colspan = x;

When it needs to be calling code like this:

element.colSpan = x;

To get around this I used the dojo.query method to get a list of all elements with a colspan attribute and then set the colSpan attribute to match here.

Here is my fix:

dojo.addOnLoad(function() {
    if(dojo.isIE > 0) {"Fixing IE colSpan issue.");
        // Fix colSpan attributes
        dojo.forEach(dojo.query("[colspan]"), function(tag) {
            tag["colSpan"] = tag.colspan;  

This fix isn’t needed for IE 8 or IE 9, but it doesn’t hurt to leave the code in for those as well. You could change the if condition to check for dojo.isIE > 0 && dojo.isIE < 8 if you don’t want to run the code on IE 8 or newer.

Editing Percent Values Using Dijit’s NumberTextBox

Dijit is a web UI toolkit built on top of the Dojo framework. One of its widgets is called NumberTextBox. This widget allows you to show and edit formatted numbers easily.

For example, I can create an instance of CurrencyTextBox (a subclass of NumberTextBox) and call set(“value”,  2589632).  This will display the value as follows (assuming that my locale is set to en_US):

If I click in the box to edit the value, it changes back to just numbers and looks like this:

Exiting the field will reformat the displayed value to match my locale’s number formatting guidelines.  However, when I call the get(“value”) method on the widget, I will get back the number 2589632 instead of the formatted string.  This makes it easily to implement locale sensitive number formatting for web applications.

The problem I’ve been having is that percents aren’t handled quite right. Setting the “type” constraint to “percent” will properly display percent values so that setting the value of the widget to “0.02” will display the string “2.00%”.

However, When the user clicks on the box to edit the value, the “2.00%” is replaced by it’s real value: “0.02”.

This causes two problems:

  1. The user is expecting to enter a percentage, not a decimal value.  Users are often confused when the value they were expecting to see suddenly changes.  Also, a user might enter “2” in the field, expecting it to show “2.00%”.  Instead it will show “200.00%”.
  2. Truncation can occur when the number being edited has too many decimal places. Setting the “places” constraint on the widget to “2” will display the value “0.0256” as “2.56%”. However, when you edit the value it truncates back to 2 decimal places, showing you only “0.02”. Those extra two decimal places are now lost forever.

There is an open defect for this issue, but I couldn’t wait for them to fix it in the main dojo code.  To get around these issues I started digging into the NumberTextBox code and found that I can replace the parse() method on the NumberTextBox object to get around the problem. This solution also appends a percent symbol to the end of the string if one was left off, making it easier for a user to simply enter “10” and have that converted to “10.00%” when they move to the next field.

Here is my solution:

function createPercentTextBox(name, title, extraParams) {
    if(!dojo.isObject(extraParams)) {
        extraParams = {};

    var parseFunction = function(expression, options) {
        if(dojo.isString(expression)) {
            expression = dojo.trim(expression);
            var i = expression.lastIndexOf("%");
            if(i == -1) {
                expression = expression + "%";
        var value = dojo.number.parse(expression, { pattern: '0%' });
        return value;

    var params = {
        id: name,
        name: name,
        label: title ? title + ":" : "",
        title: title ? title : "",
        constraints: {
            type: 'percent',
            places: 2
        editOptions: {
            pattern: "##0.00%"
        parse: parseFunction

    dojo.mixin(params, extraParams);

    return new dijit.form.NumberTextBox(params);

Note: I’m using Dojo 1.6 for this example. Future versions of Dojo may resolve this issue more elegantly.

A Simple Esperanto Keyboard for iPhone

Klavaro de Esperanto

Today I wrote a simple Esperanto keyboard for the iPhone. It’s really just a little HTML page with some JavaScript and CSS to make it look like an iPhone app when you view it from an iPhone. It looks pretty bad if you view it outside of an iPhone though, because the buttons don’t line up right. Maybe I’ll make it more fully featured on non-iPhone browsers, but there are better tools out there for that already.

XML Generation in RPG

The company I work for makes heavy use of their IBM Power i midrange servers (previously known as AS/400 or iSeries servers). A lot of their software is written in the RPG programming language, which IBM originally developed back in the 1960s. The language was originally written to generate reports and lacked many “modern” programming features, such as IF statements and subroutines, which were added in RPG III.

Since starting at my current company, I’ve been trying to learn the current version of RPG, which is RPG IV (aka RPGLE or ILE/RPG). Most of the running code that I see in RPG is actually written using RPG III syntax despite the fact that RPG IV has been out since 1994. This is mostly due to the fact that much of it was either generated programmatically or was written before 1994. My goal in learning RPG isn’t to become proficient enough to program RPG for a living, but instead to become proficient enough to help our organization transition their existing systems to more modern technologies as needed. However, my “outsider” view of RPG (coming from a Java/Perl/Ruby/etc background) has helped me do some things with it that long time RPG programmers might not think of trying to do. This is an example of that.
Continue reading XML Generation in RPG

Huffman Coding, Unicode, and CJKV Data

Today I wrote a little utility in Java that compresses a file using Huffman coding. Normally Huffman coding works on 8-bit bytes. However, because of my experience dealing with Chinese, Japanese, Korean, and other non-English text I wondered how well the coding method would work on double byte character sets. Specifically, I was curious about compressing UTF-8 text.
Continue reading Huffman Coding, Unicode, and CJKV Data

First Release of Japanese Dependency Vectors

At the end of last semester I finished the first version of Japanese Dependency Vectors (jpdv).  I had to give up on using Clojure at the last minute because it was taking me too long to make progress and I needed to have some sort of a working system to turn in for my NLP final project.

To accomplish this I rewrote jpdv in Java.  It took me about 18 hours of solid coding, minus time for food of course.

The software can now generate both context-based and dependency-based vector spaces for Japanese text that has been pre-parsed with CaboCha.  It can also generate a similarity matrix for a given vector space using the cosine similarity measurement.  I still need to add a path selection function to throw out paths that are too long and a basis element selection function that determines which N basis elements to keep out of all those discovered, but I will add those to the next release.  I’m thinking of writing the path selection and basis element selection functions as Groovy scripts so that they can be supplied at run time.  This would allow for better customization of the system at run time for a given task.

More information can be found here and on the GitHub page.

Here is an example similarity matrix generated by the current version of jpdv:

WORD コンピュータ 兄弟 赤い 電話 青い 黒い
コンピュータ 1.00000 0.06506 0.07563 0.00000 0.07760 0.00000 0.00000
兄弟 0.06506 1.00000 0.19929 0.00000 0.14947 0.00000 0.00000
0.07563 0.19929 0.99999 0.00000 0.19833 0.00000 0.00000
赤い 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.01352
電話 0.07760 0.14947 0.19833 0.00000 1.00000 0.00000 0.00000
青い 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000
黒い 0.00000 0.00000 0.00000 0.01352 0.00000 0.00000 1.00000

Japanese Dependency Vectors

I’ve been working on a new project I call “Japanese Dependency Vectors” or “jpdv” for short.  It’s a program that generates  dependency based semantic vector spaces for Japanese text.  (There’s already an excellent tool for doing this with English, which was written by Sebastian Pado.)

However, jpdv still has a way to go before it works as promised.  So far the tool can parse CaboCha formatted XML and produce both a word co-occurrence based vector space and a slightly modified XML representation that better demonstrates the dependency relationships of the words in the text.  The next step is to use the dependency information to produce the vector space that I need.  Unfortunately, I only have until the end of next week to finish it, because I’m working on this as the final project in my NLP class this semester.  I also plan to use the vector spaces created by the tool to do word sense disambiguation for the SEMEVAL-2 shared task on Japanese WSD.

(The image included here was generated by jpdv as a LaTeX file from one of the sentences I’m using for testing.)

Replacing Emoji...
Replacing Emoji...
Replacing Emoji...
Replacing Emoji...