Java NLP

Google has 20% time.  Facebook has Hackathons.  LinkedIn has Hackdays.  It seems that more and more companies are trying to entice engineers to work for them by showcasing the freedom they’re willing to give employees.  Typically this comes in the form of free time to work on projects that might not be company priorities.  In the case of Google, the projects their employees work on have ended up as serious products (Gmail, Google Suggest, Orkut).  At Facebook, they’ve added real features to the site (Chat, Friend Suggester).  It’s clear that the time and trust the leaders of these companies invest in their employees isn’t just getting spent playing Xbox and drinking.

Recently at my work they rolled out something like this.  We’re essentially being given 10% free time.  That’s 4 hours a week (OK, most people don’t really work 40 hours, but that’s the “officially budgeted time”) to spend on any project we want.  It’s only been a short time, but some of the people there have already started to turn out great things.  It’s really nice to work at a mortgage company, where numbers are important, and be given the opportunity to do something where you can’t really measure success with numbers.

I’ve been using this time for something quite un-mortgage related, but something I’ve been meaning to research for a long time:  Natural Language Processing.  I had previously (about 3 years ago) spent a small amount of time looking at Proxem’s Antelope platform.  While I was tempted to return to it, Antelope is a .NET framework and I’ve been doing a lot of Java lately (OK, I haven’t done any .NET in a while).  Between working on Android apps at home and at work, experimenting with JBoss, and the desire to learn more open source systems, I wanted to look into Java-based NLP.  Fortunately, I found the GATE platform.

For those of you that didn’t bother to read the link above, Natural Language Processing is all about taking normal human speech and making it so a computer can understand it.  If you think about it, most of what you say (or think) doesn’t follow easy-to-understand rules.  It’s not like math.  Math is something that computers are exceptionally suited to understanding.  Math is rational.  Human speech is incredibly far from that.  Once you think about that, you begin to understand how difficult it is to make a computer understand the things you say or write.  A pretty cool example of something that has been done recently, and somewhat famously, in the field of Natural Language Processing is the Watson system that IBM built to compete on Jeopardy.  Watson used NLP to understand the questions being read (written, in its case) and then figure out what was being asked so it would know how to answer.  I don’t think I’ll get to that level anytime soon, but so far it’s been great to learn about it.

Chris Risner
