Random Forest
Randomly sample both the training examples and the features, then train multiple decision trees on these samples; this helps prevent overfitting and improves generalization.
Typical characteristics of a random forest:
1 Sampling with replacement (so the dataset used to grow each tree will contain duplicates),
2 Each node is split on the best split found among the candidate features
Given a standard training set D of size n, bagging generates m new training sets D_i, each of size n, by sampling from D uniformly and with replacement. This kind of sample is known as a bootstrap sample. The m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
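The bootstrap-and-average procedure described above can be sketched in a few lines. This is a minimal illustration with a stand-in base learner (a model that just predicts the mean target, in place of a decision tree); all names here are made up for the example:

```python
import random

def bootstrap_sample(data, rng):
    # n draws with replacement from a dataset of size n -- duplicates are expected
    return [rng.choice(data) for _ in data]

def bagging_fit(data, m, fit_one, seed=0):
    # fit m models, each on its own bootstrap sample D_i of D
    rng = random.Random(seed)
    return [fit_one(bootstrap_sample(data, rng)) for _ in range(m)]

def bagging_predict(models, x):
    # regression: combine the m outputs by averaging
    return sum(f(x) for f in models) / len(models)

# toy base learner standing in for a decision tree: predicts the mean target
def fit_mean(sample):
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y

data = [(float(x), 2.0 * x) for x in range(10)]   # y = 2x, so mean(y) = 9.0
models = bagging_fit(data, m=25, fit_one=fit_mean)
pred = bagging_predict(models, 5.0)
print(round(pred, 1))
```

Because each bootstrap sample differs, the 25 base predictions vary slightly, and averaging them smooths out that variance, which is the point of bagging.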
The ExtraTrees algorithm adds one more layer of randomness: when choosing the split value for a continuous feature, it does not evaluate every candidate split value to find the best one.
Instead, for each feature it draws a single random split value within that feature's value range, and then compares the features at these random thresholds to decide which feature to split on.
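A minimal sketch of this split rule, assuming a regression setting with variance (sum of squared errors) reduction as the impurity score; the function names and the toy dataset are my own for illustration:

```python
import random

def score_split(xs, ys, j, t):
    # variance reduction when splitting feature j at threshold t
    left  = [y for x, y in zip(xs, ys) if x[j] <  t]
    right = [y for x, y in zip(xs, ys) if x[j] >= t]
    def sse(v):
        if not v:
            return 0.0
        mu = sum(v) / len(v)
        return sum((y - mu) ** 2 for y in v)
    return sse(ys) - sse(left) - sse(right)

def extra_trees_split(xs, ys, rng):
    # ExtraTrees-style split: ONE random threshold per feature, drawn
    # uniformly from that feature's value range, then the features are
    # compared by score -- no exhaustive search over all split values
    best = None
    for j in range(len(xs[0])):
        vals = [x[j] for x in xs]
        t = rng.uniform(min(vals), max(vals))
        s = score_split(xs, ys, j, t)
        if best is None or s > best[0]:
            best = (s, j, t)
    return best[1], best[2]   # chosen feature index and its random threshold

# feature 0 perfectly separates the targets; feature 1 is constant noise
xs = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
ys = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
j, t = extra_trees_split(xs, ys, random.Random(0))
print(j)  # feature 0 wins, since any threshold on it separates the targets
```

Drawing a single random threshold per feature is much cheaper than scanning all candidate split values, and the extra randomness further decorrelates the trees.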
- Empirical good default values are max_features=n_features for regression problems, and max_features=sqrt(n_features) for classification tasks (where n_features is the number of features in the data).
- In addition, note that in random forests, bootstrap samples are used by default (bootstrap=True) while the default strategy for extra-trees is to use the whole dataset (bootstrap=False).
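The defaults quoted above can be checked directly in scikit-learn; the dataset and hyperparameter values below are arbitrary for the demo:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# toy classification dataset (parameters are illustrative, not tuned)
X, y = make_classification(n_samples=200, n_features=16, random_state=0)

# random forest: bootstrap=True by default (sampling with replacement)
rf = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0)
# extra-trees: bootstrap=False by default (each tree sees the whole dataset)
et = ExtraTreesClassifier(n_estimators=50, max_features="sqrt", random_state=0)

rf.fit(X, y)
et.fit(X, y)
print(rf.bootstrap, et.bootstrap)  # True False
```

Here `max_features="sqrt"` makes the classification default explicit; each split then considers only sqrt(16) = 4 randomly chosen features.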
Source: http://www.bubuko.com/infodetail-2546379.html