An Efficient Hierarchy/Graph Texts Organization And Representation Schema

3.2.1 Semantic Text Refining and Annotating Stage
This stage describes the structure and meaning of the data, and aims to:
1. Accurately parse each sentence and identify Part Of Speech (POS) tags, semantic roles (Semantic Role Labeling), subject-action-object relations and named entities (Named-Entity Recognition).
2. Discover the MFs of each sentence in the textual document.
3. Exploit the semantic information in each sentence by detecting the attributes of its MFs.
4. Reduce dimensionality as much as possible.
5. Automatically generate an effective Descriptive Sentence Object (DSO) composed of hierarchical sentence objects.
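To make the target of this stage more concrete, the sketch below shows one possible shape for a hierarchical DSO, assuming each node stores the sentence text, its (word, POS) pairs, an optional subject-action-object triple and its named entities, with child nodes for nested objects. The class name `DSO`, its fields, the `walk` helper and the example sentence and entity categories are all hypothetical illustrations, not the structure defined by the authors.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class DSO:
    """Hypothetical node of a Descriptive Sentence Object tree."""
    text: str
    tokens: List[Tuple[str, str]] = field(default_factory=list)    # (word, POS tag)
    sao: Optional[Tuple[str, str, str]] = None                      # (subject, action, object)
    entities: List[Tuple[str, str]] = field(default_factory=list)  # (mention, category)
    children: List["DSO"] = field(default_factory=list)            # nested sentence objects

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "DSO"]]:
        """Depth-first traversal yielding (depth, node) pairs."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Usage: a document-level root holding one annotated sentence object.
doc = DSO(text="doc-1")
doc.children.append(
    DSO(
        text="IBM opened a lab in Cairo.",
        tokens=[("IBM", "NNP"), ("opened", "VBD"), ("a", "DT"), ("lab", "NN"),
                ("in", "IN"), ("Cairo", "NNP"), (".", ".")],
        sao=("IBM", "opened", "lab"),
        entities=[("IBM", "Organization"), ("Cairo", "City")],
    )
)

for depth, node in doc.walk():
    print("  " * depth + node.text)
```

A plain tree of dataclasses keeps every annotation of a sentence in one place while leaving room for deeper nesting (for example, clause-level objects) without changing the traversal code.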
Text analysis studies Natural Language (NL) text at different linguistic levels, i.e. words, sentences and meaning. OpenNLP [13], an open-source toolkit that provides lexical and syntactic parsers, was chosen as the parser for linguistic analysis. Furthermore, AlchemyAPI [14] is used to extract semantic metadata from the content, such as subject-action-object relations, people, places, companies, topics, facts, relationships, authors and languages.
Semantic Analysis and Annotation
Word-tagging analysis shows how a word is used in a sentence. The same word can play different roles from one sentence to another depending on context (e.g. 'while' can be used as a preposition, conjunction, verb or noun, and 'light' as a noun, verb, adjective or adverb). Tagging techniques assign each word in a sentence its word class, i.e. its Part Of Speech (POS) tag [15].
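The context dependence described above can be illustrated with a toy rule-based tagger that resolves the ambiguity of 'light' from its neighbours. This is only a sketch under stated assumptions: the lexicon and the handful of contextual rules are hypothetical, the tags follow Penn Treebank conventions, and it is not how OpenNLP's trained statistical tagger works.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each word maps to the Penn Treebank tags it may take.
LEXICON: Dict[str, List[str]] = {
    "the": ["DT"], "a": ["DT"], "to": ["TO"], "is": ["VBZ"],
    "very": ["RB"], "bright": ["JJ"], "lamp": ["NN"], "bag": ["NN"],
    "light": ["NN", "VB", "JJ", "RB"],  # the ambiguous word
}


def tag(words: List[str]) -> List[Tuple[str, str]]:
    """Assign one tag per word using a few hand-written contextual rules."""
    tags: List[str] = []
    for i, word in enumerate(words):
        candidates = LEXICON.get(word.lower(), ["NN"])  # unknown words default to noun
        prev = tags[-1] if tags else None
        nxt = LEXICON.get(words[i + 1].lower(), ["NN"]) if i + 1 < len(words) else []
        if len(candidates) == 1:
            chosen = candidates[0]
        elif prev == "TO" and "VB" in candidates:
            chosen = "VB"          # "to light" -> verb
        elif prev == "DT" and nxt == ["NN"] and "JJ" in candidates:
            chosen = "JJ"          # "a light bag" -> adjective before a noun
        elif prev in ("DT", "JJ") and "NN" in candidates:
            chosen = "NN"          # "the light is ..." -> noun after a determiner
        elif prev == "RB" and "JJ" in candidates:
            chosen = "JJ"          # "very light" -> adjective after an adverb
        else:
            chosen = candidates[0]
        tags.append(chosen)
    return list(zip(words, tags))


for sentence in ["the light is bright", "to light the lamp", "a light bag"]:
    print(tag(sentence.split()))
```

Real taggers learn such contextual preferences from annotated corpora instead of hand-coding them, but the input and output are the same: a word sequence in, one POS tag per word out.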
Syntactic analysis applies phrase-marker (labeled bracketing) techniques to segment NL text into phrases, clauses and sentences, so that the text is annotated with syntactic/grammatical labels. The grammatical constituents of a sentence are subject, verb, object and complement, and the parser output looks like (TOP (S (NP ----) (VP ----))). The parser represents a sentence (S) as a tree node with three children: a noun phrase (NP), a verb phrase (VP) and the full stop (.); S is the root of the sentence tree, wrapped in the parser's TOP node. Further background on core grammar theories for syntactic analysis can be found in [17].
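Because the bracketed output is a serialized tree, a small recursive-descent reader is enough to recover the S node and its NP, VP and full-stop children. The sketch below assumes Penn-style bracketing in which each word sits under a POS label; the example sentence is hypothetical, and the `subject_verb_object` heuristic (first NP under S, first VB* tag and first NP under VP) is a deliberately naive illustration, not the relation extraction performed by AlchemyAPI or by the authors.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on POS (pre-terminal) nodes


def parse_bracketed(text: str) -> Node:
    """Parse a string such as '(TOP (S (NP ...) (VP ...) (. .)))' into a tree."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        node = Node(tokens[pos + 1])
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())
            else:
                node.word = tokens[pos]  # leaf word under its POS label
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    return parse()


def words(node: Node) -> List[str]:
    """Collect the leaf words under a node, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in words(child)]


def subject_verb_object(s: Node) -> Tuple[str, str, str]:
    """Naive heuristic reading subject, verb and object off an S node."""
    np = next(c for c in s.children if c.label == "NP")
    vp = next(c for c in s.children if c.label == "VP")
    verb = next(c.word for c in vp.children if c.label.startswith("VB"))
    obj = next((c for c in vp.children if c.label == "NP"), None)
    obj_text = " ".join(words(obj)) if obj is not None else ""
    return " ".join(words(np)), verb, obj_text


tree = parse_bracketed("(TOP (S (NP (NNP IBM)) (VP (VBD opened) (NP (DT a) (NN lab))) (. .)))")
sentence = tree.children[0]
print(tree.label, sentence.label, [c.label for c in sentence.children])
print(subject_verb_object(sentence))
```

The heuristic breaks on passives, coordinated clauses and prepositional objects, which is one reason dedicated semantic tools are layered on top of the syntactic parse.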
Semantic analysis studies meaning on the basis of the POS tags and syntactic elements described above, which can be linked within the NL text to create relationships [15-17]. In addition, Named-Entity Recognition (NER) is applied to locate and classify atomic elements of the text into predefined categories (the names of persons, organizations, locations, times, quantities,...

