Hi bobo老师, I've noticed that the DecisionTreeClassifiers generated by bagging all seem to be identical: apart from random_state, every hyperparameter of every tree is the same.
Here is my code:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# X, y are defined earlier in my notebook
dt_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                           max_samples=100, bootstrap=True, oob_score=True)
dt_clf.fit(X, y)
print(dt_clf.estimators_)
print(dt_clf.oob_score_)
Here is the output (truncated):
[DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False,
            random_state=1522406438, splitter='best'),
 DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False,
            random_state=1447758011, splitter='best'),
 ...]
(all remaining estimators look the same except for random_state)
Isn't bagging supposed to make the sub-models as different from each other as possible? Shouldn't the trees' max_depth, max_features, criterion, and so on all be different, then?
How can I change this so that every hyperparameter differs from tree to tree? Er, is that even a thing? :)
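To make the question concrete, here is a sketch of one thing I tried: randomizing the *features* each tree sees via BaggingClassifier's max_features and bootstrap_features arguments (the make_moons data and every parameter value are just placeholders I made up):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset, just for illustration
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Random subspaces: each tree is trained on a random subset of the
# features, so the trees differ in the data they see, not only in
# random_state.
rs_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=100,          # bootstrap sample size per tree
    bootstrap=True,           # sample rows with replacement
    max_features=1,           # each tree sees only 1 of the 2 features
    bootstrap_features=True,  # sample features with replacement
    oob_score=True,
    random_state=666,
)
rs_clf.fit(X, y)
print(rs_clf.oob_score_)
```

With this, rs_clf.estimators_features_ shows that different trees really do train on different feature subsets, but the printed hyperparameters of each DecisionTreeClassifier still look identical. Is that expected?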
One more question: does bagging actually prevent overfitting, given that each tree sees a different training set? The accuracies in your videos all look quite high. Are those numbers trustworthy, or could they be inflated by overfitting? How would I combine bagging with leave-one-out (or some other validation scheme) to tell whether the bagging model is overfitting?
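Here is how I imagine one could check this; again, the dataset and parameter values are placeholders I made up, and I'm not sure this is the right approach:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset, just for illustration
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=100,
    bootstrap=True,
    random_state=666,
)
bag.fit(X_train, y_train)

# A large gap between training and test accuracy would suggest overfitting.
print("train accuracy:", bag.score(X_train, y_train))
print("test accuracy:", bag.score(X_test, y_test))

# 5-fold cross-validation; cv=LeaveOneOut() would be the leave-one-out
# version, but that refits the 100-tree ensemble once per sample.
scores = cross_val_score(bag, X, y, cv=5)
print("cv mean:", scores.mean())
```

Is comparing train/test accuracy (or the cross-validation mean) the right way to judge this, or should I be looking at oob_score_ instead?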