Semantic text annotation tools using Wordnet and DBPedia
Marvin is a semantic text annotation tool that uses external knowledge sources to annotate input text. It can also be used as a Java library. Marvin currently supports tagging with Wordnet and DBPedia (the linked data version of Wikipedia). Our small annotator therefore already carries a lot of knowledge, which would probably make anyone depressed, so we named him after the depressed robot from The Hitchhiker's Guide to the Galaxy.
The Marvin semantic text annotator is a Java program that can also be used as a library by other applications. It is distributed as a single .jar file that contains all of its resources. To use Wordnet, however, Wordnet has to be installed separately from https://wordnet.princeton.edu/wordnet/download/current-version/
After installation, configure the Wordnet path in the file_properties.xml file. The tag that currently reads:
<param name="dictionary_path" value="C:\Program Files (x86)\WordNet\2.1\dict"/>
has to be changed to point to the directory where Wordnet is installed on your machine. There are no other installation requirements.
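Before running Marvin, it can help to confirm that the configured dictionary path points to a real directory. The following is a minimal sketch using only the standard Java XML parser; it is not part of Marvin. The class name `DictionaryPathCheck`, the `<properties>` root element, and the sample path are assumptions made for illustration. Only the `param` tag with `name="dictionary_path"` comes from the configuration shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Marvin: reads the dictionary_path parameter.
class DictionaryPathCheck {

    // Returns the value of <param name="dictionary_path" .../>, or null if absent.
    static String dictionaryPath(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(xml);
            NodeList params = doc.getElementsByTagName("param");
            for (int i = 0; i < params.getLength(); i++) {
                Element param = (Element) params.item(i);
                if ("dictionary_path".equals(param.getAttribute("name"))) {
                    return param.getAttribute("value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the properties XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed layout: the root element name and the path are made up for this sketch.
        String sample = "<properties>"
                + "<param name=\"dictionary_path\" value=\"/usr/local/WordNet/dict\"/>"
                + "</properties>";
        String path = dictionaryPath(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println("Configured path: " + path);
        System.out.println("Directory exists: " + Files.isDirectory(Paths.get(path)));
    }
}
```

To check a real installation, pass a `FileInputStream` opened on your own file_properties.xml instead of the sample string.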
To run the Marvin semantic annotator, type the following on the command line:
java -jar Marvin.jar "Sentence to be semantically annotated."
To use the Marvin semantic text annotator as a library, you also need to install WordNet on the machine where it will run, and then include the Marvin.jar file in your Java project.
The library contains methods that query DBPedia and Wordnet. Both methods return a LinkedList of objects called WordMeaningOutputElement:
public class WordMeaningOutputElement {
    public String appearingWord; // what is the word in text
    public String Description;   // definition in Wordnet or abstract from DBPedia
    public String Source;        // String Wordnet or DBPedia
    public int startAt;          // position where the labelled word starts in your string
    public int endAt;            // position where the labelled word ends in your string
    public String id;            // id in Wordnet or URI from DBPedia
    public String URL;           // link to Wordnet definition of term or DBPedia URI
}
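As an illustration of how the returned elements might be consumed, here is a minimal sketch that marks each annotated word in the original text with its source. It does not call Marvin: the class is redeclared locally so the example compiles on its own, and `AnnotationDemo`, `element` and `markUp` are hypothetical names. The sketch also assumes that `startAt` is inclusive and `endAt` is exclusive, as in `String.substring`; check this against the real output before relying on it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;

// Local copy of the result class shown above, so the sketch compiles without Marvin.jar.
class WordMeaningOutputElement {
    public String appearingWord;
    public String Description;
    public String Source;
    public int startAt;
    public int endAt;
    public String id;
    public String URL;
}

class AnnotationDemo {

    // Hypothetical helper that builds a result element by hand for the demo.
    static WordMeaningOutputElement element(String word, int start, int end, String source) {
        WordMeaningOutputElement e = new WordMeaningOutputElement();
        e.appearingWord = word;
        e.startAt = start;
        e.endAt = end;
        e.Source = source;
        return e;
    }

    // Appends "[Source]" after every annotated word.
    // Assumes endAt is an exclusive offset into the text.
    static String markUp(String text, List<WordMeaningOutputElement> annotations) {
        List<WordMeaningOutputElement> sorted = new ArrayList<>(annotations);
        // Insert from the end of the text backwards so earlier offsets stay valid.
        sorted.sort(Comparator.comparingInt((WordMeaningOutputElement e) -> e.endAt).reversed());
        StringBuilder out = new StringBuilder(text);
        for (WordMeaningOutputElement e : sorted) {
            out.insert(e.endAt, "[" + e.Source + "]");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Manchester is a city.";
        // Hand-made annotations standing in for what the library would return.
        List<WordMeaningOutputElement> annotations = new LinkedList<>();
        annotations.add(element("Manchester", 0, 10, "DBPedia"));
        annotations.add(element("city", 16, 20, "Wordnet"));
        // prints: Manchester[DBPedia] is a city[Wordnet].
        System.out.println(markUp(text, annotations));
    }
}
```

Processing the annotations from the highest offset down avoids recomputing positions after each insertion, which keeps the helper short.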
The project roadmap includes adding new external sources for semantic tagging. Currently the idea is to include the following:
If you used Marvin, please cite:
This project was created as part of Nikola Milosevic's (@nikolamilosevic86) PhD project at the University of Manchester, supervised by Dr Goran Nenadic (University of Manchester), Cassie Gregson (AstraZeneca) and Robert Hernandez (AstraZeneca). You can find more information about Nikola on his web page, or follow him on Twitter.
Having trouble with this project? Contact nikola.milosevic@manchester.ac.uk and we’ll help you sort it out.