I’ve been working on a new project I call “Japanese Dependency Vectors”, or “jpdv” for short. It’s a program that generates dependency-based semantic vector spaces for Japanese text. (There’s already an excellent tool for doing this with English, written by Sebastian Pado.)
However, jpdv still has a way to go before it works as promised. So far the tool can parse CaboCha-formatted XML and produce two things: a vector space based on word co-occurrence, and a slightly modified XML representation that makes the dependency relationships between the words in the text easier to see. The next step is to use that dependency information to produce the dependency-based vector space I actually need. Unfortunately, I only have until the end of next week to finish, because this is my final project for the NLP class I’m taking this semester. I also plan to use the vector spaces the tool creates to do word sense disambiguation for the SEMEVAL-2 shared task on Japanese WSD.
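To make the difference between the two kinds of vector space concrete, here is a rough Python sketch. It is not jpdv’s actual code. The sample XML is a hand-written, simplified imitation of CaboCha’s XML output (chunks with `id` and `link` attributes), the function names are made up for illustration, and treating the first token of each chunk as its head word is a shortcut I’m assuming only to keep the example short. It also follows direct chunk links only, not longer dependency paths.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical, simplified sample in the style of CaboCha's XML output.
# link="-1" marks the root chunk, which depends on nothing.
SAMPLE = """<sentence>
<chunk id="0" link="2"><tok id="0">太郎</tok><tok id="1">は</tok></chunk>
<chunk id="1" link="2"><tok id="2">本</tok><tok id="3">を</tok></chunk>
<chunk id="2" link="-1"><tok id="4">読ん</tok><tok id="5">だ</tok></chunk>
</sentence>"""


def cooccurrence_vectors(sentence, window=2):
    """Count neighbours within `window` tokens on either side of each word."""
    tokens = [tok.text for tok in sentence.iter("tok")]
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def dependency_vectors(sentence):
    """Count words joined by a direct chunk-to-chunk dependency link."""
    chunks = {chunk.get("id"): chunk for chunk in sentence.iter("chunk")}
    vectors = defaultdict(Counter)
    for chunk in chunks.values():
        target = chunks.get(chunk.get("link"))
        if target is None:  # root chunk: no outgoing link
            continue
        # Assumption: the first token stands in for the chunk's head word.
        dependent = chunk.find("tok").text
        head = target.find("tok").text
        vectors[dependent][head] += 1
        vectors[head][dependent] += 1
    return vectors


sentence = ET.fromstring(SAMPLE)
window_space = cooccurrence_vectors(sentence)
dep_space = dependency_vectors(sentence)

print(dict(window_space["太郎"]))
print(dict(dep_space["太郎"]))
```

In this toy sentence, 太郎 and 読ん are too far apart to share a two-token window, but they are connected by a dependency link, so only the dependency-based space relates them. That is the kind of relationship the window-based approach misses.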
(The image included here was generated by jpdv as a LaTeX file from one of the sentences I’m using for testing.)