XML Generation in RPG
The company I work for makes heavy use of its IBM Power i midrange servers (previously known as AS/400 or iSeries servers). Much of its software is written in the RPG programming language, which IBM first developed back in the 1960s. The language was originally designed to generate reports and lacked many “modern” programming features, such as IF statements and subroutines, which were added in RPG III.
Since starting at my current company, I’ve been trying to learn the current version of RPG, which is RPG IV (aka RPGLE or ILE/RPG). Most of the running RPG code that I see is actually written using RPG III syntax, even though RPG IV has been out since 1994. This is mostly because much of it was either generated programmatically or written before 1994. My goal in learning RPG isn’t to become proficient enough to program RPG for a living, but to become proficient enough to help our organization transition its existing systems to more modern technologies as needed. However, my “outsider” view of RPG (coming from a Java/Perl/Ruby/etc. background) has helped me do some things with it that longtime RPG programmers might not think of trying. This is an example of that.
Read more…
Huffman Coding, Unicode, and CJKV Data
Today I wrote a little utility in Java that compresses a file using Huffman coding. Normally Huffman coding works on 8-bit bytes. However, because of my experience dealing with Chinese, Japanese, Korean, and other non-English text, I wondered how well the coding method would work on multi-byte character encodings. Specifically, I was curious about compressing UTF-8 text.
Read more…
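To make the question concrete, here is a minimal sketch (not the utility described above) that compares the size of a Huffman-coded payload when the symbols are UTF-8 bytes versus whole Unicode code points. The class name, method names, and sample sentence are all assumptions made for illustration, and the sketch counts only the coded payload, not the code table that a real compressor would also have to store.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper class for illustration only.
public class HuffmanSymbolSizes {

    /**
     * Returns the number of bits in the Huffman-coded payload for the given
     * symbol frequencies. The code table itself is not counted.
     */
    static long encodedBits(Map<Integer, Long> freq) {
        if (freq.isEmpty()) {
            return 0;
        }
        if (freq.size() == 1) {
            // A single distinct symbol still needs one bit per occurrence.
            return freq.values().iterator().next();
        }
        // Repeatedly merge the two lightest subtrees. The sum of all merged
        // weights equals the sum of (frequency * code length) over all symbols.
        PriorityQueue<Long> heap = new PriorityQueue<>(freq.values());
        long total = 0;
        while (heap.size() > 1) {
            long merged = heap.poll() + heap.poll();
            total += merged;
            heap.add(merged);
        }
        return total;
    }

    /** Counts symbols where each symbol is one byte of the UTF-8 encoding. */
    static Map<Integer, Long> byteFrequencies(String text) {
        Map<Integer, Long> freq = new HashMap<>();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            freq.merge(b & 0xFF, 1L, Long::sum);
        }
        return freq;
    }

    /** Counts symbols where each symbol is one Unicode code point. */
    static Map<Integer, Long> codePointFrequencies(String text) {
        Map<Integer, Long> freq = new HashMap<>();
        text.codePoints().forEach(cp -> freq.merge(cp, 1L, Long::sum));
        return freq;
    }

    public static void main(String[] args) {
        // Sample text chosen for illustration; any UTF-8 text would do.
        String sample = "青い空と青い海、赤い花と赤い車";
        long rawBits = 8L * sample.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("Raw UTF-8 bits:         " + rawBits);
        System.out.println("Huffman over bytes:     " + encodedBits(byteFrequencies(sample)));
        System.out.println("Huffman over codepoints: " + encodedBits(codePointFrequencies(sample)));
    }
}
```

Coding whole code points usually shrinks the payload for CJK text, but the alphabet is much larger, so the code table grows as well. A fair comparison would need to include that overhead.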
First Release of Japanese Dependency Vectors
At the end of last semester I finished the first version of Japanese Dependency Vectors (jpdv). I had to give up on using Clojure at the last minute because it was taking me too long to make progress and I needed to have some sort of a working system to turn in for my NLP final project.
To accomplish this I rewrote jpdv in Java. It took me about 18 hours of solid coding, minus time for food of course.
The software can now generate both context-based and dependency-based vector spaces for Japanese text that has been pre-parsed with CaboCha. It can also generate a similarity matrix for a given vector space using the cosine similarity measurement. I still need to add a path selection function to throw out paths that are too long and a basis element selection function that determines which N basis elements to keep out of all those discovered, but I will add those to the next release. I’m thinking of writing the path selection and basis element selection functions as Groovy scripts so that they can be supplied at run time. This would allow for better customization of the system at run time for a given task.
More information can be found here and on the GitHub page.
Here is an example similarity matrix generated by the current version of jpdv:
| WORD | コンピュータ | 兄弟 | 緑 | 赤い | 電話 | 青い | 黒い |
|---|---|---|---|---|---|---|---|
| コンピュータ | 1.00000 | 0.06506 | 0.07563 | 0.00000 | 0.07760 | 0.00000 | 0.00000 |
| 兄弟 | 0.06506 | 1.00000 | 0.19929 | 0.00000 | 0.14947 | 0.00000 | 0.00000 |
| 緑 | 0.07563 | 0.19929 | 0.99999 | 0.00000 | 0.19833 | 0.00000 | 0.00000 |
| 赤い | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 0.00000 | 0.00000 | 0.01352 |
| 電話 | 0.07760 | 0.14947 | 0.19833 | 0.00000 | 1.00000 | 0.00000 | 0.00000 |
| 青い | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 0.00000 |
| 黒い | 0.00000 | 0.00000 | 0.00000 | 0.01352 | 0.00000 | 0.00000 | 1.00000 |
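The cosine similarity measurement behind a matrix like the one above can be sketched as follows. This is not jpdv's actual code: the class name, the sparse `Map<String, Double>` representation, and the basis element names in `main` are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical illustration of cosine similarity over sparse vectors.
public class CosineSimilarity {

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /**
     * Cosine of the angle between two sparse vectors, where each key is a
     * basis element and each value is its weight. Returns 0 if either
     * vector has zero length.
     */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        // Made-up weights; real vectors would come from the parsed corpus.
        Map<String, Double> akai = Map.of("花", 2.0, "車", 1.0);
        Map<String, Double> kuroi = Map.of("車", 3.0, "猫", 1.0);
        System.out.printf("%.5f%n", cosine(akai, kuroi));
        System.out.printf("%.5f%n", cosine(akai, akai));
    }
}
```

Words that share no basis elements get a similarity of zero, which is consistent with the many 0.00000 entries in the matrix. The measure is symmetric, so only half of the matrix needs to be computed.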
Japanese Dependency Vectors
I’ve been working on a new project I call “Japanese Dependency Vectors” or “jpdv” for short. It’s a program that generates dependency-based semantic vector spaces for Japanese text. (There’s already an excellent tool for doing this with English, which was written by Sebastian Pado.)
However, jpdv still has a way to go before it works as promised. So far the tool can parse CaboCha-formatted XML and produce both a word co-occurrence-based vector space and a slightly modified XML representation that better demonstrates the dependency relationships of the words in the text. The next step is to use the dependency information to produce the vector space that I need. Unfortunately, I only have until the end of next week to finish it, because I’m working on this as the final project in my NLP class this semester. I also plan to use the vector spaces created by the tool to do word sense disambiguation for the SEMEVAL-2 shared task on Japanese WSD.
(The image included here was generated by jpdv as a LaTeX file from one of the sentences I’m using for testing.)
Emacs, Clojure, and Japanese
Installing CaboCha in Mac OS X with MacPorts
CaboCha is a dependency parser for Japanese used by (among other things) the Japanese FrameNet project. Getting it installed and working on my Mac turned out to be more work than I had anticipated, so I thought I would post instructions for anyone who might also want to install CaboCha.
Read more…
New Cubicle
I got my new cubicle assignment yesterday. The room is only around 5′×5′ and I share it with another graduate student, but it has a door that locks and a bookshelf where I can keep some of my books.
The Effect of Selectional Preferences on Semantic Role Labeling
My undergraduate honors thesis has been approved by my advisor and is now available online:
- Andrew Young. The Effect of Selectional Preferences on Semantic Role Labeling. Undergraduate Honors Thesis, The University of Texas at Austin.
It ended up being almost 60 pages and around 6000 words (according to a LaTeX word count tool I found).
The 3 Year Plan
Next month I graduate from The University of Texas at Austin with a bachelor’s degree in linguistics with departmental honors. In September I start my graduate studies in the same department at UT, where I’ll be working on my master’s degree with a specialization in computational linguistics. The original plan had been to apply for the PhD program at UT after completing my master’s degree, but now my plans have changed. The new plan: Keio University.


