I've been working on a new project I call “Japanese Dependency Vectors”, or “jpdv” for short. It's a program that generates dependency-based semantic vector spaces for Japanese text. (There's already an excellent tool for doing this with English, written by Sebastian Pado.)
However, jpdv still has a way to go before it works as promised. So far the tool can parse CaboCha-formatted XML and produce two things: a vector space based on word co-occurrence, and a slightly modified XML representation that better shows the dependency relationships between the words in the text.
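To make the co-occurrence idea concrete, here is a rough sketch in Python. It is not jpdv's actual code. It assumes CaboCha's XML output nests `<tok>` elements inside `<chunk>` elements, with each chunk's `link` attribute pointing at the chunk it depends on (`-1` for the root). The sample sentence, the function names and the window size are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical CaboCha-style XML for "太郎は本を読む" (assumed layout:
# <sentence> holds <chunk> elements, each holding <tok> elements).
SAMPLE = """<sentence>
 <chunk id="0" link="2" rel="D">
  <tok id="0">太郎</tok>
  <tok id="1">は</tok>
 </chunk>
 <chunk id="1" link="2" rel="D">
  <tok id="2">本</tok>
  <tok id="3">を</tok>
 </chunk>
 <chunk id="2" link="-1" rel="D">
  <tok id="4">読む</tok>
 </chunk>
</sentence>"""


def cooccurrence_vectors(xml_text, window=2):
    """Count, for every word, the words seen within `window` tokens of it."""
    root = ET.fromstring(xml_text)
    vectors = defaultdict(Counter)
    # iter() also yields the root itself, so a bare <sentence> works too.
    for sentence in root.iter("sentence"):
        words = [tok.text for tok in sentence.iter("tok")]
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def chunk_dependencies(xml_text):
    """Return (dependent chunk, head chunk) pairs read from the link attribute."""
    root = ET.fromstring(xml_text)
    chunks = list(root.iter("chunk"))
    # Join each chunk's tokens into its surface string, keyed by chunk id.
    surface = {c.get("id"): "".join(t.text for t in c.iter("tok")) for c in chunks}
    return [
        (surface[c.get("id")], surface[c.get("link")])
        for c in chunks
        if c.get("link") != "-1"  # the root chunk depends on nothing
    ]


if __name__ == "__main__":
    for word, counts in cooccurrence_vectors(SAMPLE).items():
        print(word, dict(counts))
    print(chunk_dependencies(SAMPLE))
```

Each word's `Counter` is one row of the co-occurrence vector space. A dependency-based space would presumably count words linked by the pairs from `chunk_dependencies` rather than words that merely sit near each other, which is the part jpdv doesn't do yet.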
This might be proof that I'm crazy.
I'm working on a project for my NLP class that involves generating a semantic vector space for Japanese text, and I decided that this might be a good time to learn one of the Lisp dialects. I've been looking at Clojure for a while now, but I hadn't taken the time to learn it until now. I must say, I'm quite impressed so far. The fact that reading a Japanese XML document into a data structure “just works” without any tweaking is pretty nice.