After a two-year hiatus, I’ve started working with ARM assembly language again. I realized that the code I had been working on before had effectively become a utility library, so I reorganized the git repository to reflect that.
While doing so, I noticed that my sorting libraries were incomplete, so I decided to finish them. I now have three working sort functions, each of which sorts an existing array of 32-bit signed integers in place.
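The post doesn’t say which three algorithms the ARM library implements, so the following is only an illustration of what "in place on an array of 32-bit signed integers" means. It is a minimal Java sketch of one in-place sort, insertion sort, over an int[] (Java’s int is a 32-bit signed integer). The choice of insertion sort and the class name InPlaceSortSketch are assumptions for illustration. This is not a port of the assembly code.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only: an in-place insertion sort over 32-bit signed ints.
 * The algorithm choice is an assumption; it is not the ARM library's code.
 */
class InPlaceSortSketch {

    /** Sorts {@code a} in ascending order by modifying the array itself. */
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];   // element being inserted into the sorted prefix a[0..i-1]
            int j = i - 1;
            // Shift larger elements one slot right to open a gap for key.
            // Java's > on int is a signed comparison.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, -3, Integer.MAX_VALUE, 0, Integer.MIN_VALUE, 5};
        insertionSort(data);
        System.out.println(Arrays.toString(data));
        // prints [-2147483648, -3, 0, 5, 5, 2147483647]
    }
}
```

Because the elements are signed, the comparison has to be a signed one. In ARM assembly that means branching on the GT/LT/GE/LE condition codes rather than the unsigned HI/LO/HS/LS family. An unsigned compare would treat -1 (0xFFFFFFFF) as the largest value and sort it after every positive number.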
Today I wrote a small Java utility that compresses a file using Huffman coding. Huffman coding is normally applied to 8-bit bytes. However, because of my experience with Chinese, Japanese, Korean, and other non-English text, I wondered how well the method would work on multibyte character sets. Specifically, I was curious about compressing UTF-8 text.
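One way to explore the question is to count symbols in two ways: as UTF-8 bytes, which is the usual 256-symbol alphabet, or as whole Unicode code points. The Java sketch below is not the utility described here, and its class and method names are hypothetical. It computes only the size of the Huffman-coded payload. It relies on the fact that, when Huffman’s algorithm repeatedly merges the two lightest subtrees, the sum of all merged weights equals the total encoded length in bits. It ignores the cost of storing the code table.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: Huffman-coded payload size of a string when the symbols
 * are UTF-8 bytes versus whole code points. Code-table overhead is ignored.
 */
class HuffmanAlphabetSketch {

    /** Total bits of the Huffman-coded payload for the given symbol frequencies. */
    static long huffmanBits(Map<Integer, Long> freq) {
        if (freq.isEmpty()) return 0;
        if (freq.size() == 1) {
            // A single distinct symbol still needs 1 bit per occurrence.
            return freq.values().iterator().next();
        }
        PriorityQueue<Long> pq = new PriorityQueue<>(freq.values());
        long bits = 0;
        // Each merge pushes every symbol in the two lightest subtrees one level
        // deeper, adding their combined weight to the total encoded length.
        while (pq.size() > 1) {
            long merged = pq.poll() + pq.poll();
            bits += merged;
            pq.add(merged);
        }
        return bits;
    }

    /** Frequencies of each byte value (0..255) in the UTF-8 encoding of s. */
    static Map<Integer, Long> byteFrequencies(String s) {
        Map<Integer, Long> freq = new HashMap<>();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            freq.merge(b & 0xFF, 1L, Long::sum);
        }
        return freq;
    }

    /** Frequencies of each Unicode code point in s. */
    static Map<Integer, Long> codePointFrequencies(String s) {
        Map<Integer, Long> freq = new HashMap<>();
        s.codePoints().forEach(cp -> freq.merge(cp, 1L, Long::sum));
        return freq;
    }

    public static void main(String[] args) {
        // U+65E5 three times, then U+672C (two CJK ideographs)
        String text = "\u65E5\u65E5\u65E5\u672C";
        System.out.println("Raw UTF-8:        " + 8L * text.getBytes(StandardCharsets.UTF_8).length + " bits");
        System.out.println("Huffman on bytes: " + huffmanBits(byteFrequencies(text)) + " bits");
        System.out.println("Huffman on chars: " + huffmanBits(codePointFrequencies(text)) + " bits");
    }
}
```

For the four-character string 日日日本 (12 bytes, or 96 bits, of UTF-8), the byte alphabet needs a 26-bit payload and the code-point alphabet needs 4 bits. The trade-off is the table. Real CJK text can contain thousands of distinct code points, so the header describing a code-point Huffman code grows accordingly. This sketch leaves that cost out.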
UTF-8 is a variable-length encoding for Unicode that stores each character (code point) in one to four bytes.
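The one-to-four-byte rule follows from fixed code point ranges. Here is a minimal Java sketch of those ranges. The helper name utf8Length is hypothetical. The sketch cross-checks each result against the JDK’s own encoder via String.getBytes(StandardCharsets.UTF_8).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: UTF-8 byte length of a single code point. */
class Utf8LengthSketch {

    /**
     * Bytes UTF-8 uses for one code point. Surrogates (U+D800..U+DFFF) and
     * values above U+10FFFF are not valid and are not checked here.
     */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;     // U+0000..U+007F: ASCII
        if (codePoint < 0x800) return 2;    // U+0080..U+07FF: e.g. Latin accents, Greek, Cyrillic
        if (codePoint < 0x10000) return 3;  // U+0800..U+FFFF: includes most CJK
        return 4;                           // U+10000..U+10FFFF: e.g. emoji
    }

    public static void main(String[] args) {
        // 'A', e-acute, a CJK ideograph, and an emoji outside the BMP (a surrogate pair in Java)
        String sample = "A\u00E9\u65E5\uD83D\uDE00";
        sample.codePoints().forEach(cp -> {
            int fromEncoder = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
            System.out.printf("U+%04X -> %d byte(s) (encoder says %d)%n", cp, utf8Length(cp), fromEncoder);
        });
    }
}
```

Most Chinese, Japanese, and Korean characters fall in the U+0800..U+FFFF range, so each one takes three bytes in UTF-8. Neighbouring characters often share the same lead byte.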