In the research paper "Distant supervision for relation extraction without labeled data",
the authors Mike Mintz, Steven Bills, Rion Snow and Dan Jurafsky investigate an alternative paradigm
(called distant supervision) for relation extraction. This approach combines the advantages of
Supervised Information Extraction and Unsupervised Information Extraction to achieve greater precision.
Apart from this, they also analyze feature performance to better understand the roles of lexical
and syntactic features. Some of the key observations from this research are:
1) A combination of syntactic and lexical features offers a substantial improvement in relation
extraction precision over ...
If we instead tag Steven Spielberg as
person/director rather than just person, this confusion can be avoided.
For constructing the classifier, negative training data is needed. For this, the authors of the paper
build feature vectors for an "unrelated" relation during the training phase by randomly selecting
entity pairs that do not appear in any Freebase relation and extracting features for them. Care
must be taken while randomly selecting the unrelated pairs, as a skewed distribution might result
in decreased precision.
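The negative sampling step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_negative_pairs`, the toy entity set, and the toy `related_pairs` set standing in for Freebase are all assumptions made for the example.

```python
import random
from itertools import combinations


def sample_negative_pairs(entities, related_pairs, k, seed=0):
    """Draw up to k entity pairs that appear in no known relation.

    `related_pairs` is a hypothetical stand-in for the set of entity
    pairs found in a knowledge base such as Freebase.
    """
    rng = random.Random(seed)
    # Treat pairs as unordered so (a, b) and (b, a) count as the same pair.
    known = {frozenset(pair) for pair in related_pairs}
    candidates = [
        pair
        for pair in combinations(sorted(entities), 2)
        if frozenset(pair) not in known
    ]
    # Uniform sampling without replacement; a skewed sampler here would
    # skew the "unrelated" class seen by the classifier.
    return rng.sample(candidates, min(k, len(candidates)))


# Toy data (assumed for illustration only).
entities = {"Steven Spielberg", "Saving Private Ryan", "Edwin Hubble", "Marshfield"}
related_pairs = {
    ("Steven Spielberg", "Saving Private Ryan"),
    ("Edwin Hubble", "Marshfield"),
}

negatives = sample_negative_pairs(entities, related_pairs, k=3)
for pair in negatives:
    print(pair)
```

Features would then be extracted for each sampled pair in the same way as for related pairs, with the label set to "unrelated".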
Consider the statements "Astronomer Edwin Hubble was born in Marshfield, Missouri" and
"Astronomer Edwin Hubble took birth in Marshfield, Missouri". Both these sentences convey exactly
the same thing. Similarly, consider the statements "The critic wrote a scathing review" and "A scathing
review was written by the critic". One statement is in the active voice and the other in the passive voice.
Even though the sentences in each pair convey the same thing, a different set of conjoined features
must be built for each in order to extract relations from them. This is a computationally expensive process.
If we could instead identify the correspondence between such sentences beforehand, we could
reduce the number of computations by almost half.
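To make the point concrete, the sketch below extracts the words between the two entity mentions, a simplified stand-in for one of the lexical features used in relation extraction. The helper `words_between` is hypothetical and assumes each entity occurs exactly once, in order, as a plain substring.

```python
def words_between(sentence, entity1, entity2):
    """Return the tokens strictly between two entity mentions."""
    start = sentence.index(entity1) + len(entity1)
    end = sentence.index(entity2)
    return tuple(sentence[start:end].split())


s1 = "Astronomer Edwin Hubble was born in Marshfield, Missouri"
s2 = "Astronomer Edwin Hubble took birth in Marshfield, Missouri"

f1 = words_between(s1, "Edwin Hubble", "Marshfield")
f2 = words_between(s2, "Edwin Hubble", "Marshfield")

print(f1)        # ('was', 'born', 'in')
print(f2)        # ('took', 'birth', 'in')
print(f1 == f2)  # False
```

Although both sentences express the same fact, the two feature values differ, so a classifier treats them as unrelated evidence unless the paraphrase is recognized beforehand.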
In the research paper "Answer Extraction as Sequence Tagging with Tree Edit Distance",
the authors Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch and Peter Clark try to extract
answers from pre-retrieved sentences for question answering. By constructing a linear-chain
Conditional Random Field based on pairs of questions and the corresponding possible answers, they cast
the problem of answer extraction as a sequence tagging problem. Additionally, they use features
from Tree Edit Distance (TED) for aligning the answer sentence tree with the question tree. Some of the
key observations from this research are:
1) The inclusion of the features from Tree Edit Distance (TED) boosted the overall performance
of answer extraction when compared with the use of standard features. This may be because the
features from TED help in understanding the connection between the question and answer sentences
before answer extraction rather than during answer validation.
2) NER and EDIT are the most significant...