Recently I’ve been doing a lot of experimenting with Natural Language Processing (NLP). One of the goals, or maybe necessities, of the program I’m working on is the ability to look up the definition of words it doesn’t understand. Most of the time, if I’m not sure how to spell a word or am looking for a definition, I just rely on Google. However, for a running computer program that will have an undetermined (and probably excessive) amount of input, it wouldn’t be very efficient to go to Google multiple times a second (plus Google would frown on that). Enter WordNet. WordNet is a lexical database of words. Nouns, verbs, adjectives... OH MY! (Yes, WordNet is much more than JUST words and definitions, but for now we’ll stick with the words.) On its own, WordNet is just a group of files that holds all of this information; however, there are many different libraries available for reading it.
As I’ve been experimenting with GATE in Java, I wanted a Java library for WordNet. The first one I looked at was JWNL (the Java WordNet Library). Unfortunately, I ran into a lot of problems getting JWNL working. After getting frustrated enough to look for something else, I found extJWNL, which, according to the WordNet site, is an updated version of JWNL. As I mentioned before, WordNet comes as just flat files with the data in them. extJWNL comes with the code for accessing these files, as well as for accessing the data if it’s in a database. In addition (and perhaps critically, given the code for accessing the data via a database), it includes scripts and code to insert all of the WordNet data into a database. Unfortunately, the extJWNL scripts seem to be written ONLY for Windows. Until recently, this wouldn’t have fazed me. However, at work I’m primarily using an OS X computer, which means I needed to find a way to make this work. If you look at the dict2db.bat file, you can find what it’s really doing:
%JAVACMD% %JAVA_OPTS% %EXTRA_JVM_ARGUMENTS% -classpath %CLASSPATH_PREFIX%;%CLASSPATH% -Dapp.name="dict2db" -Dapp.repo="%REPO%" -Dbasedir="%BASEDIR%" net.sf.extjwnl.utilities.DictionaryToDatabase %CMD_LINE_ARGS%
If you care to look and figure out what gets sent into the script file, you can work out what ends up in each of those parameters. Or you can look at this, which is what I actually used to run the code:
java -classpath lib/extjwnl-1.6.2.jar:lib/commons-logging-1.1.1.jar:lib/mysql-connector-java-5.1.17.jar:lib/extjwnl-utilities-1.6.2.jar -Dapp.name="dict2db" -Dapp.repo="lib" -Dbasedir="/" net.sf.extjwnl.utilities.DictionaryToDatabase src/extjwnl/main/resources/net/sf/extjwnl/file_properties.xml src/utilities/main/sql/create.sql com.mysql.jdbc.Driver jdbc:mysql://localhost/newwordnet
The important things here are:
1. The list of jar files (colon separated)
2. The repo and base directory locations (since this line was run from the root extjwnl directory, I used “lib” and “/” respectively)
3. The path to the file_properties.xml file. This file is part of the extjwnl zip, and you MUST alter it to point to wherever you have installed WordNet before running this.
4. The location of the create.sql file. Again, this file is part of the extjwnl zip.
5. The MySQL JDBC driver name and database path. In this example, I created a database named newwordnet and gave the anonymous user complete access to the database for the purposes of running this.
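If you’d rather not retype that long line, the pieces listed above can be wrapped in a small shell script. This is only a rough sketch of a Unix stand-in for dict2db.bat, not something that ships with extJWNL: the jar versions and paths are copied from my command above, the join_classpath helper is my own, and the DICT2DB_RUN switch is a made-up safety catch so the script only prints the command unless you explicitly ask it to run.

```shell
#!/bin/sh
# Sketch of a Unix stand-in for dict2db.bat. Paths and versions are copied
# from the command in this post; adjust them to match your own download.

# Join all arguments with ':' (the Unix classpath separator; Windows uses ';').
join_classpath() {
    joined=""
    for jar in "$@"; do
        if [ -z "$joined" ]; then
            joined="$jar"
        else
            joined="$joined:$jar"
        fi
    done
    printf '%s\n' "$joined"
}

CP=$(join_classpath \
    lib/extjwnl-1.6.2.jar \
    lib/commons-logging-1.1.1.jar \
    lib/mysql-connector-java-5.1.17.jar \
    lib/extjwnl-utilities-1.6.2.jar)

PROPERTIES="src/extjwnl/main/resources/net/sf/extjwnl/file_properties.xml"
CREATE_SQL="src/utilities/main/sql/create.sql"
JDBC_DRIVER="com.mysql.jdbc.Driver"
JDBC_URL="jdbc:mysql://localhost/newwordnet"

# Hypothetical safety switch (not part of extJWNL): the command is only
# printed unless DICT2DB_RUN=1 is set in the environment.
if [ "${DICT2DB_RUN:-0}" = "1" ]; then
    java -classpath "$CP" \
        -Dapp.name="dict2db" -Dapp.repo="lib" -Dbasedir="/" \
        net.sf.extjwnl.utilities.DictionaryToDatabase \
        "$PROPERTIES" "$CREATE_SQL" "$JDBC_DRIVER" "$JDBC_URL"
else
    echo "java -classpath $CP -Dapp.name=dict2db -Dapp.repo=lib -Dbasedir=/" \
        "net.sf.extjwnl.utilities.DictionaryToDatabase" \
        "$PROPERTIES $CREATE_SQL $JDBC_DRIVER $JDBC_URL"
fi
```

Saved as something like dict2db.sh (my name for it) in the root extjwnl directory, running it plainly shows the command it would execute, and running it with DICT2DB_RUN=1 set actually kicks off the import.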
Once you’ve done that, sit back and watch as it enlightens you with multiple table creations and data insertions. A few minutes later, you’ll be database-driven and ready to go.
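If you want a quick sanity check afterwards, one option is to ask MySQL which tables now exist. This is a minimal sketch that assumes the mysql command-line client is on your PATH and that the anonymous access described above lets you connect to newwordnet without credentials; if your setup needs a user name or password, you’ll have to add those yourself.

```shell
#!/bin/sh
# Database name used in this post; change it if you picked a different one.
DB_NAME="newwordnet"
QUERY="SHOW TABLES;"

if command -v mysql >/dev/null 2>&1; then
    # List the tables created by the import. Report a failure instead of
    # aborting, since the connection settings here are assumptions.
    mysql "$DB_NAME" -e "$QUERY" || echo "could not query $DB_NAME" >&2
else
    echo "mysql client not found on PATH" >&2
fi
```

An empty result (or an error) would suggest the import didn’t run against the database you think it did.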