Classifying The Arabic Language Texts Part 1

887 words - 4 pages

Several studies and procedures for classifying Arabic-language texts exist, but most were carried out in different environments and did not rely on a unified standard or a unified data set. This has made it hard to determine precisely which classification technique is the most accurate. Arabic language processing is also less thoroughly explored than that of other languages. Finding roots and stemming Arabic words are important phases in research on the most effective Arabic NLP applications, so we are interested in applying algorithms to these phases. Arabic has a complex structure, which makes NLP research on it difficult.
This thesis studies and analyzes classification algorithms in a unified environment and on a single dataset, including the challenges these algorithms face, in order to demonstrate their effectiveness and accuracy. A large data set is used because of the expansion of data and its continuous increase on the internet.
Several algorithms are used to classify texts into groups, which helps retrieval run faster and makes searches more accurate. For Arabic texts these include K-NN, decision trees, Naive Bayes, random forest, and others.
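To make one of the listed algorithms concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the implementation used in this study; the function names and the toy training documents are hypothetical and serve only to illustrate the idea.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Train on a list of (tokens, label) pairs and return a model dict."""
    class_doc_counts = Counter()        # number of documents per class
    word_counts = defaultdict(Counter)  # per-class token frequencies
    vocab = set()
    for tokens, label in docs:
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_doc_counts": class_doc_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(docs),
    }


def predict(model, tokens):
    """Return the class with the highest log-posterior for the tokens."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_class_docs in model["class_doc_counts"].items():
        # Log prior: share of training documents in this class.
        score = math.log(n_class_docs / model["n_docs"])
        total = sum(model["word_counts"][label].values())
        for token in tokens:
            if token not in model["vocab"]:
                continue  # ignore tokens never seen in training
            count = model["word_counts"][label][token]
            # Add-one (Laplace) smoothing avoids log(0).
            score += math.log((count + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, two documents only.
toy_docs = [
    (["مباراة", "فريق", "هدف"], "Sport"),
    (["سوق", "بنك", "أسعار"], "Economy"),
]
toy_model = train_naive_bayes(toy_docs)
```

Calling `predict(toy_model, ["فريق", "هدف"])` selects the class whose training tokens overlap the query. A real experiment would train on the full collections instead of two toy documents.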

We used the Diab dataset, which is structured as follows:
The dataset has nine categories, each of which contains 300 documents. Each category has its own directory that includes all files belonging to that category.
We also built two further collections. The second collection has nine categories of 600 documents each, and the third has nine categories of 1200 documents each. Both follow the same layout, with one directory per category.
We therefore have three collections: collection one consists of 2700 files divided into nine categories of 300 files each, collection two consists of 5400 files divided into nine categories of 600 files each, and collection three consists of 10800 files divided into nine categories of 1200 files each. The categories include Art, Economy, Health, Law, Literature, Politics, Religion, and Sport.
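The directory-per-category layout can be read with a few lines of Python. The sketch below is an illustration only: the function names are hypothetical, and it assumes the files are UTF-8 text, which may not match the encoding of the actual corpus.

```python
from pathlib import Path

# Documents per category in each collection, as described in the text.
DOCS_PER_CATEGORY = {1: 300, 2: 600, 3: 1200}
NUM_CATEGORIES = 9


def expected_size(collection):
    """Total number of files expected in the given collection."""
    return NUM_CATEGORIES * DOCS_PER_CATEGORY[collection]


def load_collection(root):
    """Return (text, label) pairs, using each subdirectory name as the label."""
    samples = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue
        for file_path in sorted(category_dir.iterdir()):
            if file_path.is_file():
                # Assumption: files are UTF-8; undecodable bytes are replaced.
                text = file_path.read_text(encoding="utf-8", errors="replace")
                samples.append((text, category_dir.name))
    return samples
```

Comparing `len(load_collection(root))` with `expected_size(1)`, `expected_size(2)`, or `expected_size(3)` gives a quick check that a collection on disk has the 2700, 5400, or 10800 files described above.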
We applied four preprocessing methods to the original data, which gave five versions of each collection: (1) the original data; (2) the data with stop words, punctuation, and diacritics removed; (3) the data after applying the light10 stemmer; (4) the data after applying the Chen stemmer; and (5) the data after applying the Khoja algorithm for extracting roots.
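The second version of the data (stop words, punctuation, and diacritics removed) can be sketched as follows. This is a hedged illustration, not the procedure used in the study: the stop-word list is a tiny hypothetical sample, the diacritic range U+064B to U+0652 covers only the common Arabic short-vowel marks, and the stemming steps (light10, Chen, Khoja) are not reproduced here.

```python
import re
import string

# Common Arabic diacritics (tashkeel), U+064B through U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")

# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
ARABIC_PUNCTUATION = "\u060C\u061B\u061F"
PUNCT_TABLE = str.maketrans(
    {ch: " " for ch in string.punctuation + ARABIC_PUNCTUATION}
)

# Hypothetical sample only; a real run would use a full stop-word list.
STOP_WORDS = {"في", "من", "على", "عن"}


def preprocess(text, stop_words=STOP_WORDS):
    """Strip diacritics and punctuation, then drop stop words."""
    text = DIACRITICS.sub("", text)
    text = text.translate(PUNCT_TABLE)  # punctuation becomes whitespace
    return [token for token in text.split() if token not in stop_words]
```

Replacing punctuation with a space instead of deleting it keeps words apart when a mark sits between them without surrounding whitespace.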
Then we used seven...
