After taking a hiatus for two years, I’ve started working with ARM assembly language again. I realized that the code I had been working on before had become a kind of utility library, so I rearranged the git repository to reflect that.
While doing so, I noticed that my sorting libraries were incomplete, so I decided to finish them. The result is that I now have four working sort functions, all of which operate in place on an existing array of 32-bit signed integers.
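The post doesn't say which four algorithms the library implements, and the originals are written in ARM assembly. As a rough Java sketch of the same contract (sort a caller-supplied array of 32-bit signed integers in place, without allocating a second array), here is insertion sort. The class and method names are illustrative only and are not taken from the library.

```java
import java.util.Arrays;

public class InPlaceSort {
    // Sorts the array in place using insertion sort.
    // Java's int is a 32-bit signed integer, the element type described above.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift larger elements one slot to the right to open a gap for key.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        // Includes the extreme 32-bit values to show signed comparison works.
        int[] data = {5, -3, 2147483647, 0, -2147483648, 42};
        insertionSort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Insertion sort is quadratic in the worst case but needs only a single temporary, which makes it an easy first in-place sort to port to assembly.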
I got a Raspberry Pi for Christmas and I’ve been teaching myself ARM assembly.
It’s my first time working with assembly language, as I didn’t take a systems architecture or OS fundamentals class in college.
I’m slowly working on a Huffman encoder, trying to use only native Linux system calls without calling any external libraries.
This will be the first in a series of posts about this topic.
I’ve been working on a project for the past couple of days to learn the Racket programming language. Racket is based on Scheme, which is in turn based on Lisp. Racket also includes a compiler that can produce native binaries on Mac OS X, Linux, and Windows.
The project I’m working on is a simple MUSH (Multi-User Shared Hallucination). The code is on my GitHub page if you want to take a look at it.
Every now and then I have an editing task for which Eclipse is just not up to the job. Usually it involves making a lot of changes all at once. (I realize that Eclipse has regex find/replace, but I feel much better working in ViM in these cases.)
Here is an example: I have a block of Java enum code that I want to split out into an enum and two maps.
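The original enum isn’t reproduced here, so this is a hypothetical illustration of the target shape. Suppose the enum started out as `Color`, with constants like `RED("R", 0xFF0000)` carrying a short code and an RGB value in constructor fields. After the split, the enum keeps only the constants and each former field lives in its own `EnumMap`. The names `Color`, `CODES`, and `RGB` are invented for this sketch.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

public class ColorTables {
    // Hypothetical plain enum: just the constants, with their data moved out.
    enum Color { RED, GREEN, BLUE }

    // First map: the short code that used to be a constructor argument.
    static final Map<Color, String> CODES;
    // Second map: the RGB value that used to be a constructor argument.
    static final Map<Color, Integer> RGB;

    static {
        Map<Color, String> codes = new EnumMap<>(Color.class);
        codes.put(Color.RED, "R");
        codes.put(Color.GREEN, "G");
        codes.put(Color.BLUE, "B");
        CODES = Collections.unmodifiableMap(codes);

        Map<Color, Integer> rgb = new EnumMap<>(Color.class);
        rgb.put(Color.RED, 0xFF0000);
        rgb.put(Color.GREEN, 0x00FF00);
        rgb.put(Color.BLUE, 0x0000FF);
        RGB = Collections.unmodifiableMap(rgb);
    }

    public static void main(String[] args) {
        // Print each constant with its code and RGB value as six hex digits.
        for (Color c : Color.values()) {
            System.out.printf("%s %s %06X%n", c, CODES.get(c), RGB.get(c));
        }
    }
}
```

The `put` lines are the kind of repetitive text that a Vim macro or `:s` substitution generates quickly from the original constructor arguments.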
Dijit is a web UI toolkit built on top of the Dojo framework. One of its widgets is called NumberTextBox. This widget allows you to show and edit formatted numbers easily.
For example, I can create an instance of CurrencyTextBox (a subclass of NumberTextBox) and call set("value", 2589632). This will display the value as follows (assuming that my locale is set to en_US):
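The widget’s behavior, a formatted display string over a plain numeric value, is easy to reproduce with locale-aware formatting in other environments. The sketch below uses Java’s `java.text.NumberFormat`, not Dijit, so it only illustrates the same idea; the class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class CurrencyDisplay {
    // Display form: currency symbol, grouping separators, two decimal places (en_US rules).
    static String display(long value) {
        return NumberFormat.getCurrencyInstance(Locale.US).format(value);
    }

    // Edit form: recover the bare number from the display string.
    static long parseDisplay(String shown) throws ParseException {
        return NumberFormat.getCurrencyInstance(Locale.US).parse(shown).longValue();
    }

    public static void main(String[] args) throws ParseException {
        String shown = display(2589632);
        System.out.println(shown);
        System.out.println(parseDisplay(shown));
    }
}
```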
If I click in the box to edit the value, it changes back to just numbers and looks like this:
The company I work for makes heavy use of its IBM Power i midrange servers (previously known as AS/400 or iSeries servers). A lot of its software is written in the RPG programming language, which IBM originally developed back in the 1960s. The language was designed to generate reports and lacked many «modern» programming features, such as IF statements and subroutines, which were added in RPG III.
Since starting at my current company, I’ve been trying to learn the current version of RPG, which is RPG IV (aka RPGLE or ILE/RPG).
Today I wrote a little utility in Java that compresses a file using Huffman coding. Huffman coding is normally applied to 8-bit bytes. However, because of my experience dealing with Chinese, Japanese, Korean, and other non-English text, I wondered how well the method would work on multibyte character sets. Specifically, I was curious about compressing UTF-8 text.
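One way to frame the experiment is to build Huffman codes over two different symbol alphabets for the same text: individual UTF-8 bytes versus whole Unicode code points. The sketch below is not the utility from this post. It counts symbol frequencies, repeatedly merges the two lightest subtrees, takes each leaf’s depth as its code length, and reports the payload size in bits. It deliberately ignores the cost of storing the code table, which grows much larger for a code-point alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanCodePoints {
    // Huffman tree node: leaves hold a symbol; internal nodes hold two children.
    record Node(int symbol, long weight, Node left, Node right) {
        boolean isLeaf() { return left == null; }
    }

    // Builds a Huffman tree over the symbols and returns each symbol's code length in bits.
    static Map<Integer, Integer> codeLengths(int[] symbols) {
        Map<Integer, Long> freq = new HashMap<>();
        for (int s : symbols) freq.merge(s, 1L, Long::sum);

        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) -> Long.compare(a.weight(), b.weight()));
        freq.forEach((s, w) -> pq.add(new Node(s, w, null, null)));

        Map<Integer, Integer> lengths = new HashMap<>();
        if (pq.isEmpty()) return lengths;
        if (pq.size() == 1) { // only one distinct symbol: give it a 1-bit code
            lengths.put(pq.peek().symbol(), 1);
            return lengths;
        }
        // Repeatedly merge the two lightest subtrees until one tree remains.
        while (pq.size() > 1) {
            Node a = pq.poll();
            Node b = pq.poll();
            pq.add(new Node(-1, a.weight() + b.weight(), a, b));
        }
        assignLengths(pq.poll(), 0, lengths);
        return lengths;
    }

    // A leaf's code length is its depth in the tree.
    static void assignLengths(Node n, int depth, Map<Integer, Integer> out) {
        if (n.isLeaf()) {
            out.put(n.symbol(), depth);
            return;
        }
        assignLengths(n.left(), depth + 1, out);
        assignLengths(n.right(), depth + 1, out);
    }

    // Size of the encoded payload in bits, not counting the code table.
    static long encodedBits(int[] symbols) {
        Map<Integer, Integer> len = codeLengths(symbols);
        long bits = 0;
        for (int s : symbols) bits += len.get(s);
        return bits;
    }

    // Alphabet 1: Unicode code points.
    static int[] codePoints(String text) {
        return text.codePoints().toArray();
    }

    // Alphabet 2: individual UTF-8 bytes, as values 0..255.
    static int[] utf8Bytes(String text) {
        byte[] b = text.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[b.length];
        for (int i = 0; i < b.length; i++) out[i] = b[i] & 0xFF;
        return out;
    }

    public static void main(String[] args) {
        String text = "日本語のテキストと日本語の文章";
        System.out.println("code points: " + encodedBits(codePoints(text)) + " bits");
        System.out.println("UTF-8 bytes: " + encodedBits(utf8Bytes(text)) + " bits");
    }
}
```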
UTF-8 is a variable-length encoding for Unicode data that stores characters using between one and four bytes per character.
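The byte count depends only on the code point: U+0000 to U+007F take one byte, U+0080 to U+07FF take two, U+0800 to U+FFFF take three, and U+10000 to U+10FFFF take four. A small Java check of one sample from each range (the helper name is illustrative):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes UTF-8 uses to encode a single code point.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Samples: 'A', 'é' (U+00E9), '日' (U+65E5), and an emoji outside the BMP (U+1F600).
        int[] samples = {'A', 0x00E9, 0x65E5, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

Most CJK characters fall in the three-byte range, which is why byte-oriented Huffman coding sees each of them as three separate symbols.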
At the end of last semester I finished the first version of Japanese Dependency Vectors (jpdv). I had to give up on using Clojure at the last minute because it was taking me too long to make progress, and I needed some sort of working system to turn in for my NLP final project. To accomplish this I rewrote jpdv in Java. It took me about 18
I’ve been working on a new project I call «Japanese Dependency Vectors», or «jpdv» for short. It’s a program that generates dependency-based semantic vector spaces for Japanese text. (There’s already an excellent tool for doing this with English, written by Sebastian Pado.)
However, jpdv still has a way to go before it works as promised. So far the tool can parse CaboCha-formatted XML and produce both a word co-occurrence-based vector space and a slightly modified XML representation that shows the dependency relationships between the words in the text more clearly.
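As a rough illustration of what a dependency-based co-occurrence space is (not jpdv’s actual implementation, and without any CaboCha parsing), the sketch below assumes the parser output has already been reduced to (dependent, head) word pairs. Each edge increments a count in both words’ vectors, and cosine similarity then compares words by the company they keep. The `Edge` record and the romanized toy words are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DependencyVectors {
    // Hypothetical edge type: one dependency relation between two words.
    record Edge(String dependent, String head) {}

    // For every word, count how often each other word appears at the far end of a dependency edge.
    static Map<String, Map<String, Integer>> build(List<Edge> edges) {
        Map<String, Map<String, Integer>> space = new HashMap<>();
        for (Edge e : edges) {
            space.computeIfAbsent(e.dependent(), k -> new HashMap<>()).merge(e.head(), 1, Integer::sum);
            space.computeIfAbsent(e.head(), k -> new HashMap<>()).merge(e.dependent(), 1, Integer::sum);
        }
        return space;
    }

    // Cosine similarity between two sparse count vectors (0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double v = e.getValue();
            dot += v * b.getOrDefault(e.getKey(), 0).doubleValue();
            na += v * v;
        }
        for (int v : b.values()) nb += (double) v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Toy romanized edges standing in for parser output.
        List<Edge> edges = List.of(
            new Edge("neko", "nomu"), new Edge("mizu", "nomu"),
            new Edge("inu", "nomu"), new Edge("inu", "hashiru"));
        Map<String, Map<String, Integer>> space = build(edges);
        System.out.println(space.get("nomu"));
        System.out.println(cosine(space.get("neko"), space.get("inu")));
    }
}
```

A real system would also record the dependency relation type (for example, which case particle links the words), which is what distinguishes a dependency-based space from a plain word-window co-occurrence space.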