Learning about Random Forest #
Last Edited | 20/05/2024 |
Decision Trees: #
Source: https://www.youtube.com/watch?v=_L39rN6gz7Y
Lets suppose the above is our data on which we are going to train a model:
- How will we choose our root node ?
- With every column, try to classify the target. The column which gives the least gini index will be our root node
- Entropy can also be used as a replace for Gini index
- Note: The tree nodes which does not have a clear boundry of classification, i.e does not classify 100% of the data into 2 distinct groups, they those nodes of the decision trees are known as impure nodes. The goal is that, by leaf node, the entire population should be clearly classifiable.
- Going ahead, the left and right node of the tree will be selected using similar step as above
- Select all the column which is not already indexed within a branch. For those columns, again compute the gini-index or entropy within its defined scope/strength. The next column will be column with minimum impurity.
Note: If the gini score or entorpy score i.e the impurity score of the node is less than its parent, then we will skip using that particular leaf node.