Data Mining And Business Intelligence (2170715)
BE | Semester 7 | Winter-2018 | 03/12/2018
Q3 (b) | 4 Marks
Explain the following as attribute selection measure: (i) Information Gain (ii) Gain Ratio
Information gain:
ID3 uses information gain as its attribute selection measure.
The measure is based on pioneering work by Claude Shannon in information theory, which studied the value, or "information content," of messages.
Let node N hold the tuples of partition D.
The attribute with the highest information gain is chosen as the splitting attribute for node N.
This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness, or "impurity," in those partitions.
Such an approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple (but not necessarily the simplest) tree is found.
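The computation described above can be sketched in a few lines of Python. This is an illustrative implementation, not part of the answer itself; the attribute values and class labels in the toy example are invented for demonstration.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): expected number of bits needed to classify a tuple in D."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D) when splitting D on attribute A."""
    total = len(labels)
    # Group class labels by attribute value to form the partitions D_j.
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Info_A(D): entropy of each partition, weighted by its relative size.
    info_a = sum(len(part) / total * entropy(part)
                 for part in partitions.values())
    return entropy(labels) - info_a

# Hypothetical toy data: an 'outlook' attribute and play/no-play labels.
outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "no"]
print(round(information_gain(outlook, play), 3))
```

ID3 would evaluate this quantity for every candidate attribute and split node N on the one with the highest value.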
Gain ratio:
The information gain measure is biased toward tests with many outcomes.
That is, it prefers to select attributes having a large number of values.
For example, consider an attribute that acts as a unique identifier, such as product ID.
A split on product ID would result in a large number of partitions (as many as there are values), each one containing just one tuple.
Because each partition is pure, the information required to classify data set D based on this partitioning is Info_product_ID(D) = 0.
Therefore, the information gained by partitioning on this attribute is maximal.
Clearly, such a partitioning is useless for classification.
C4.5, a successor of ID3, uses an extension to information gain known as gain ratio, which attempts to overcome this bias.
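The correction C4.5 applies can be sketched as follows: gain ratio divides the gain by the "split information," which grows with the number of partitions and so penalizes identifier-like attributes. This is an illustrative sketch; the toy ID column and labels are invented for demonstration.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): expected number of bits needed to classify a tuple in D."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(values, labels):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D).

    SplitInfo_A(D) = -sum(|D_j|/|D| * log2(|D_j|/|D|)) is large when A
    shatters D into many small partitions, offsetting the inflated gain.
    """
    total = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    info_a = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - info_a
    split_info = -sum(len(p) / total * math.log2(len(p) / total)
                      for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# A unique identifier yields maximal gain (every partition is pure),
# but its large SplitInfo keeps the gain ratio modest.
ids  = ["p1", "p2", "p3", "p4", "p5"]
play = ["no", "no", "yes", "yes", "no"]
print(round(gain_ratio(ids, play), 3))
```

On this toy data the unique-ID attribute gets the maximum possible gain, yet dividing by SplitInfo (here log2 of the number of tuples) reduces its score, which is exactly the bias correction the text describes.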