Data Mining And Business Intelligence (2170715)

BE | Semester-7   Winter-2018 | 03/12/2018

Q3) (b)

Explain the following attribute selection measures: (i) Information Gain (ii) Gain Ratio

Information gain:

  • ID3 uses information gain as its attribute selection measure.
  • This measure is based on pioneering work by Claude Shannon on information theory, which studied the value or “information content” of messages.
  • Let node N represent or hold the tuples of partition D.
  • The expected information needed to classify a tuple in D is Info(D) = −Σ pᵢ log₂(pᵢ), where pᵢ is the probability that a tuple in D belongs to class Cᵢ; the information gain of an attribute A is Gain(A) = Info(D) − Info_A(D), where Info_A(D) is the expected information still required after partitioning D on A.
  • The attribute with the highest information gain is chosen as the splitting attribute for node N.
  • This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or “impurity” in these partitions.
  • Such an approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple (but not necessarily the simplest) tree is found.
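As a rough illustration (not part of the textbook answer), the computation described above can be sketched in Python. The function names and the toy data set are assumptions made for the example:

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): expected bits needed to classify a tuple, -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    """Gain(A) = Info(D) - Info_A(D): reduction in impurity from splitting on A."""
    total = len(labels)
    partitions = {}
    # Group class labels by the value of attribute A in each tuple.
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    # Info_A(D): weighted average entropy of the resulting partitions.
    info_a = sum(len(part) / total * entropy(part)
                 for part in partitions.values())
    return entropy(labels) - info_a

# Toy data: one attribute ("outlook") that perfectly separates the classes.
rows = [("sunny",), ("sunny",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
print(info_gain(rows, labels, 0))  # 1.0: the split removes all impurity
```

ID3 would evaluate `info_gain` for every candidate attribute at node N and split on the one with the highest value.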

Gain ratio:

  • The information gain measure is biased toward tests with many outcomes.
  • That is, it prefers to select attributes having a large number of values.
  • For example, consider an attribute that acts as a unique identifier, such as product_ID.
  • A split on product_ID would result in a large number of partitions (as many as there are values), each one containing just one tuple.
  • Because each partition is pure, the information required to classify data set D based on this partitioning would be Info_product_ID(D) = 0.
  • Therefore, the information gained by partitioning on this attribute is maximal.
  • Clearly, such a partitioning is useless for classification.
  • C4.5, a successor of ID3, uses an extension to information gain known as gain ratio, which attempts to overcome this bias.
  • It normalizes the gain by the split information, SplitInfo_A(D) = −Σ (|Dⱼ|/|D|) log₂(|Dⱼ|/|D|), defining GainRatio(A) = Gain(A) / SplitInfo_A(D); the attribute with the maximum gain ratio is selected as the splitting attribute.
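A self-contained sketch of this idea in Python follows; the helper names and toy data are assumptions for illustration. It shows how gain ratio penalizes the unique-identifier split that plain information gain would favor:

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class distribution."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(rows, labels, attr_index):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D), as used by C4.5."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    # Gain(A) = Info(D) - Info_A(D)
    gain = entropy(labels) - sum(len(p) / total * entropy(p)
                                 for p in partitions.values())
    # SplitInfo_A(D): entropy of the split itself; large for many-valued splits.
    split_info = -sum((len(p) / total) * math.log2(len(p) / total)
                      for p in partitions.values())
    return gain / split_info if split_info else 0.0

# Attribute 0 is a unique product_ID; attribute 1 genuinely predicts the class.
rows = [(1, "sunny"), (2, "sunny"), (3, "rain"), (4, "rain")]
labels = ["no", "no", "yes", "yes"]
print(gain_ratio(rows, labels, 0))  # 0.5: gain of 1.0 divided by SplitInfo of 2.0
print(gain_ratio(rows, labels, 1))  # 1.0: the meaningful attribute wins
```

Both attributes achieve the maximal information gain of 1.0, but the split information of the product_ID split (log₂ 4 = 2 bits) halves its gain ratio, so C4.5 prefers the meaningful attribute.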