
Methodology Of The Naïve Bayes Algorithm.


In this chapter we are going to provide more insight into the Naïve Bayes algorithm. The aim is to show how the method works. We will also take a look at how our model will be developed, the various data sets that will be used in the process and how they were chosen. Then we are going to look at feature selection and how it will be applied.


Bayes' rule:

$$P(H \mid E) = \frac{P(E \mid H) \times P(H)}{P(E)}$$

The fundamental concept of Bayes' rule is that the probability of a hypothesis or an event (H) can be calculated from some observed evidence (E). From Bayes' rule, we have:
1. A prior probability of H or P(H): This is the probability of an event before observing the evidence.
2. A posterior probability of H or P(H | E): This is the probability of an event after observing the evidence.
For example, to estimate the probability that a mail belongs to the Human Resources (HR) class, we can use evidence such as how often words like “Employment” are used in it.

Using the equation above, let ‘HR’ be the event of a mail belonging to HR and ‘Employment’ be the evidence of the word Employment in the mail, then we have

$$P(\text{HR} \mid \text{Employment}) = \frac{P(\text{Employment} \mid \text{HR}) \times P(\text{HR})}{P(\text{Employment})}$$

P (Employment | HR) is the probability that the word “Employment” occurs in a mail to HR. Of course, “Employment” could occur in many other mail classes such as Joint Venture or Procurement and Contracting, but here we only consider “Employment” in the context of class “HR”. This probability can be obtained from historical mail collections.
P (HR) is the prior probability of the HR class. This probability can be estimated from records, for example, the number of HR mails received throughout a year.
P (Employment) is the probability of the word “Employment” occurring. Again, this can be estimated from the records, but the evidence is not usually well recorded compared to the main event. Therefore, sometimes the full evidence, i.e., P (Employment), is hard to obtain.
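As a minimal sketch of the calculation above, the following Python snippet estimates the three quantities from mail counts and combines them with Bayes' rule. The counts and the function name `posterior` are hypothetical and chosen only for illustration; they do not come from a real mail collection. When P (Employment) is not recorded directly, it can be recovered with the law of total probability from the per-class word frequencies, which is what the snippet assumes.

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    if evidence <= 0:
        raise ValueError("P(E) must be positive")
    return likelihood * prior / evidence


# Hypothetical counts from a historical mail collection (illustration only).
total_mails = 1000
hr_mails = 200                # mails that belong to the HR class
hr_mails_with_word = 120      # HR mails containing "Employment"
other_mails_with_word = 40    # non-HR mails containing "Employment"

# Prior P(HR) and the likelihoods of the word in each group.
p_hr = hr_mails / total_mails
p_word_given_hr = hr_mails_with_word / hr_mails
p_word_given_other = other_mails_with_word / (total_mails - hr_mails)

# P(Employment) via the law of total probability over HR / not HR.
p_word = p_word_given_hr * p_hr + p_word_given_other * (1 - p_hr)

# Posterior P(HR | Employment).
p_hr_given_word = posterior(p_word_given_hr, p_hr, p_word)
print(f"P(HR | Employment) = {p_hr_given_word:.2f}")
```

With these assumed counts, 120 of the 160 mails containing “Employment” belong to HR, so the posterior agrees with a direct count.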

As the example above shows, we can predict the outcome of an event by observing collected data. Generally, it is “better” to have more evidence to support the prediction of an event: the more evidence we gather, the better the classification accuracy we can typically obtain. However, the evidence must relate to the event (it must make sense). For example, if we add “Purchase Order” as evidence to the example above, the model might perform worse. This is because the “HR” class is not related to “Purchase Order”: if “Purchase Order” appears in a mail, it does not mean that the mail is meant for HR.
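One way to see why unrelated evidence does not help is to compare the posterior for an informative word with the posterior for a phrase that is about equally likely inside and outside the class. The sketch below uses hypothetical probabilities and a hypothetical helper, `posterior_two_class`; it only illustrates that such evidence leaves the posterior at the prior. In practice, noisy estimates for unrelated evidence can also push the prediction in the wrong direction, which is one plausible reading of the worse performance mentioned above.

```python
def posterior_two_class(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) for a two-class problem (H versus not H)."""
    numerator = p_e_given_h * prior
    # P(E) expanded over the two classes.
    evidence = numerator + p_e_given_not_h * (1 - prior)
    return numerator / evidence


p_hr = 0.2  # assumed prior P(HR)

# "Employment": assumed to be far more frequent in HR mail than elsewhere.
informative = posterior_two_class(p_hr, p_e_given_h=0.6, p_e_given_not_h=0.05)

# "Purchase Order": assumed to be equally frequent in HR and non-HR mail.
uninformative = posterior_two_class(p_hr, p_e_given_h=0.1, p_e_given_not_h=0.1)

print(f"P(HR | Employment)     = {informative:.2f}")
print(f"P(HR | Purchase Order) = {uninformative:.2f}")
```

Under these assumptions the first posterior rises well above the prior, while the second stays equal to it.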

If we gather more evidence for developing our Naïve Bayes classifier, we may run into a problem of dependencies, that is to say, some pieces of evidence may depend on one or more of the others. For instance, the presence of the word...
