
Information Gain Analysis



ID3 uses information gain as its attribute selection measure. This measure is based on pioneering work by Claude Shannon on information theory, which studied the value or “information content” of messages. Let node N represent or hold the tuples of partition D. The attribute with the highest information gain is chosen as the splitting attribute for node N. This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or “impurity” in these partitions. Such an approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple (but not necessarily the ...

However, it is quite likely that the partitions will be impure (e.g., a partition may contain tuples from different classes rather than from a single class). How much more information would we still need (after the partitioning) in order to arrive at an exact classification? This amount is measured by
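The formula does not appear in the text at this point. Judging from the weight term described in the next paragraph, it is presumably the standard weighted sum below. The symbols v (the number of partitions D_1, ..., D_v produced by splitting on A) and Info(D_j) (the expected information for partition D_j on its own) are assumed notation:

```latex
\mathit{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathit{Info}(D_j)
```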

The term |D_j|/|D| acts as the weight of the jth partition. Info_A(D) is the expected information required to classify a tuple from D based on the partitioning by A. The smaller the expected information (still) required, the greater the purity of the partitions. Information gain is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning on A). That is,
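Written out from the definition just given, and using Info(D) as an assumed symbol for the original information requirement, the relation is presumably:

```latex
\mathit{Gain}(A) = \mathit{Info}(D) - \mathit{Info}_A(D)
```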

In other words, Gain(A) tells us how much would be gained by branching on A. It is the expected reduction in the information requirement caused by knowing the value of A. The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N. This is equivalent to saying that we want to partition on the attribute A that would do the “best classification,” so that the amount of information still required to finish classifying the tuples is minimal (i.e., minimum Info_A(D)).
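The selection rule described above can be sketched in Python. This is a minimal illustration, not ID3 itself: the function names (`info`, `info_after_split`, `gain`, `best_attribute`) and the toy rows are hypothetical, and tuples are assumed to be dictionaries with discrete-valued attributes.

```python
from collections import Counter
from math import log2


def info(labels):
    """Expected information (entropy, in bits) needed to classify a tuple."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_after_split(rows, attribute, target):
    """Weighted information still required after partitioning rows on attribute."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Each partition is weighted by its share of the tuples, |D_j| / |D|.
    return sum(len(p) / total * info(p) for p in partitions.values())


def gain(rows, attribute, target):
    """Reduction in required information obtained by splitting on attribute."""
    labels = [row[target] for row in rows]
    return info(labels) - info_after_split(rows, attribute, target)


def best_attribute(rows, attributes, target):
    """Return the candidate attribute with the highest information gain."""
    return max(attributes, key=lambda a: gain(rows, a, target))


# Hypothetical toy data (not from the source): "outlook" separates the
# classes perfectly, while "windy" tells us nothing about the class.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

print(best_attribute(rows, ["outlook", "windy"], "play"))
```

On the toy rows, splitting on `outlook` leaves pure partitions, so it removes all of the uncertainty about the class, whereas `windy` leaves each partition as mixed as the original set.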

Example

Induction of a decision tree using information gain. Table 6. presents a training set, D, of class-labeled tuples randomly selected from the AllElectronics customer database. (The data are adapted from [Qui86]. In this example, each attribute is discrete-valued. Continuous-valued attributes have been generalized.) The class label attribute, buys computer, has two distinct values (namely, {yes, no}); therefore, there are two distinct classes (that is, m = 2). Let class C1 correspond to yes and class C2 correspond to no. There are nine tuples of class yes and five tuples of class no. A (root) node N is created for the tuples in D. To find the splitting criterion for these tuples, we must compute the information gain of each attribute. We first compute the expected information needed to classify a tuple in D:
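With nine yes tuples and five no tuples out of 14, the computation works out as shown below. It assumes the standard entropy form, Info(D) = −Σ p_i log2(p_i), which is not shown in this excerpt:

```latex
\mathit{Info}(D) = -\frac{9}{14}\log_2\!\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\!\left(\frac{5}{14}\right) = 0.940 \text{ bits}
```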

Next, we need to compute the expected information requirement for each attribute. Let’s start with the attribute age. We need to look at the distribution of yes and no tuples for each category of age. For the age category youth, there are two yes tuples and three no tuples. For the category middle aged, there are four yes tuples and zero no tuples. For the category senior, there are three yes tuples and two no tuples. Using Equation (6.2),

the expected information needed to classify a tuple in D if the tuples are partitioned according to age is
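Plugging the class counts quoted above into the weighted sum gives the following. The subscript notation is assumed, and the term 0 log2(0) is taken as 0 by the usual convention:

```latex
\begin{aligned}
\mathit{Info}_{age}(D) &= \frac{5}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) \\
&\quad + \frac{4}{14}\left(-\frac{4}{4}\log_2\frac{4}{4} - \frac{0}{4}\log_2\frac{0}{4}\right) \\
&\quad + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) \\
&= 0.694 \text{ bits}
\end{aligned}
```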

Hence, the gain in information from such a partitioning would be
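Gain(age) = Info(D) − Info_age(D) = 0.940 − 0.694 = 0.246 bits, taking Info(D) ≈ 0.940 bits for the 9 yes / 5 no class split and Info_age(D) ≈ 0.694 bits for the age partitions. The arithmetic can be checked with a short Python sketch that recomputes these figures from the class counts quoted above. The helper name `entropy` and the variable names are illustrative only.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped, following the convention 0 * log2(0) = 0.
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


# Class counts [yes, no] for the whole training set D.
info_d = entropy([9, 5])

# Class counts [yes, no] within each age category, as quoted in the text.
age_partitions = {
    "youth": [2, 3],
    "middle_aged": [4, 0],
    "senior": [3, 2],
}
n = 14  # total number of tuples in D

# Weighted sum of partition entropies, each weighted by |D_j| / |D|.
info_age = sum(sum(c) / n * entropy(c) for c in age_partitions.values())
gain_age = info_d - info_age

print(f"Info(D)     = {info_d:.3f} bits")
print(f"Info_age(D) = {info_age:.3f} bits")
print(f"Gain(age)   = {gain_age:.3f} bits")
```

Computed without intermediate rounding, the gain comes out at about 0.247 bits. The 0.246 figure results from subtracting the two values after each has been rounded to three decimals.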

Similarly, we can compute Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and Gain(credit...
