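1. Comparing the performance of a decision tree and a random forest by partially modifying the POP3 brute-force (guess_passwd) detection code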
(1) Start from Section 6.3; the decision-tree part reuses the Section 6.3 code
# decision tree, evaluated with 10-fold cross-validation
clf = tree.DecisionTreeClassifier()
score = model_selection.cross_val_score(clf, x, y, n_jobs=1, cv=10)
print(np.mean(score))
(2) Add the random forest logic
# random forest with 10 trees, same 10-fold cross-validation
clf2 = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
score = model_selection.cross_val_score(clf2, x, y, n_jobs=1, cv=10)
print(np.mean(score))
(3) Complete source code (note: it runs under Python 3)
# -*- coding:utf-8 -*-
from sklearn import model_selection
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
import numpy as np


def load_kdd99(filename):
    """Read the KDD99 file: one comma-separated record per line."""
    x = []
    with open(filename) as f:
        for line in f:
            line = line.strip('\n')
            line = line.split(',')
            x.append(line)
    return x


def get_guess_passwdandNormal(x):
    """Keep POP3 records labelled guess_passwd or normal and extract 13 numeric features."""
    v = []
    w = []
    y = []
    for x1 in x:
        # column 2 is the service, column 41 the label
        if (x1[41] in ['guess_passwd.', 'normal.']) and (x1[2] == 'pop_3'):
            if x1[41] == 'guess_passwd.':
                y.append(1)  # password brute-force attempt
            else:
                y.append(0)  # normal traffic
            # duration (0), src_bytes..wrong_fragment (4-7), count..diff_srv_rate (22-29)
            x1 = [x1[0]] + x1[4:8] + x1[22:30]
            v.append(x1)
    # convert the string fields to floats
    for x1 in v:
        v1 = []
        for x2 in x1:
            v1.append(float(x2))
        w.append(v1)
    return w, y


if __name__ == '__main__':
    v = load_kdd99("../data/kddcup99/corrected")
    x, y = get_guess_passwdandNormal(v)
    # decision tree, 10-fold cross-validation
    clf = tree.DecisionTreeClassifier()
    score = model_selection.cross_val_score(clf, x, y, n_jobs=1, cv=10)
    print(score)
    print(np.mean(score))
    # random forest with 10 trees, same cross-validation
    clf2 = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
    score = model_selection.cross_val_score(clf2, x, y, n_jobs=1, cv=10)
    print(score)
    print(np.mean(score))
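Step (2) only swaps in RandomForestClassifier. As a rough illustration of what a random forest adds on top of a single tree, the sketch below builds a tiny forest by hand: each tree is fit on a bootstrap sample of the rows, considers a random subset of features at each split (max_features="sqrt"), and the forest averages the trees' class probabilities. This is a simplified teaching sketch, not scikit-learn's implementation; fit_forest and predict_forest are hypothetical helper names, and make_classification data stands in for the KDD99 features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(x, y, n_estimators=10, random_state=0):
    """Hypothetical helper: fit n_estimators trees, each on a bootstrap sample."""
    rng = np.random.RandomState(random_state)
    n = len(x)
    trees = []
    for _ in range(n_estimators):
        idx = rng.randint(0, n, size=n)  # bootstrap: draw n rows with replacement
        # max_features="sqrt": each split looks at a random subset of the features
        t = DecisionTreeClassifier(max_features="sqrt",
                                   random_state=rng.randint(2**31 - 1))
        t.fit(x[idx], y[idx])
        trees.append(t)
    return trees


def predict_forest(trees, x):
    """Hypothetical helper: average per-tree class probabilities, pick the most likely class."""
    proba = np.mean([t.predict_proba(x) for t in trees], axis=0)
    return trees[0].classes_[np.argmax(proba, axis=1)]


# Synthetic stand-in for the 13 KDD99 features (assumption, not the real data)
x, y = make_classification(n_samples=300, n_features=13, random_state=0)
trees = fit_forest(x[:200], y[:200])
pred = predict_forest(trees, x[200:])
acc = np.mean(pred == y[200:])
print("hand-rolled forest accuracy:", acc)
```

To try it on the real data, the lists returned by get_guess_passwdandNormal would first need converting with np.array, since the helpers index rows with integer arrays.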
2. Comparison of results (10-fold cross-validation)
Decision tree:
[0.98637602 1. 1. 1. 1. 1.
 1. 1. 1. 1. ]
0.9986376021798365
Random forest:
[0.91008174 1. 1. 1. 1. 1.
 1. 1. 1. 1. ]
0.991008174386921
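The fold scores show that both models are perfect on folds 2 to 10, so the whole gap between the two means comes from the first fold. A quick check using the numbers copied from the output above (the printed fold scores are rounded to 8 digits, so the recomputed means agree to about that precision):

```python
import numpy as np

# Fold accuracies copied from the 10-fold output above
tree_scores = np.array([0.98637602] + [1.0] * 9)
forest_scores = np.array([0.91008174] + [1.0] * 9)

for name, s in [("decision tree", tree_scores), ("random forest", forest_scores)]:
    imperfect = np.flatnonzero(s < 1.0)  # indices of folds below 100% accuracy
    print(f"{name}: mean={s.mean():.4f} std={s.std():.4f} imperfect folds={imperfect.tolist()}")
# decision tree: mean=0.9986 std=0.0041 imperfect folds=[0]
# random forest: mean=0.9910 std=0.0270 imperfect folds=[0]
```

Since cross_val_score with an integer cv does not shuffle, fold 1 holds the first tenth of each class in file order; one plausible reading, not verified here, is that the harder records sit early in the corrected file.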
3. Modified following the original book (p. 95)
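clf = tree.DecisionTreeClassifier()
# the book passes no cv: the scikit-learn version used here defaulted to 3 folds
# (5 folds since 0.22), so cv=3 is given explicitly to reproduce the results below
score = model_selection.cross_val_score(clf, x, y, cv=3)
print(score)
print(np.mean(score))
clf2 = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
score = model_selection.cross_val_score(clf2, x, y, cv=3)
print(score)
print(np.mean(score))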
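For a classifier and an integer cv, cross_val_score splits the data with StratifiedKFold (no shuffling) and scores each held-out fold with the estimator's own score method, which is accuracy for classifiers. The sketch below reproduces a 3-fold call by hand to make that explicit; the synthetic make_classification data and the fixed random_state are assumptions chosen so the two runs are deterministic and directly comparable.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the KDD99 features (assumption)
x, y = make_classification(n_samples=300, n_features=13, random_state=0)
clf = DecisionTreeClassifier(random_state=0)  # fixed seed so both runs build the same trees

manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(x, y):
    model = clone(clf)                                  # fresh, unfitted copy per fold
    model.fit(x[train_idx], y[train_idx])               # train on the other two folds
    manual_scores.append(model.score(x[test_idx], y[test_idx]))  # accuracy on held-out fold
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(clf, x, y, cv=3)
print(manual_scores, manual_scores.mean())
print(auto_scores, auto_scores.mean())
```

The two rows should match fold for fold, which is why passing cv explicitly is enough to pin down the evaluation across scikit-learn versions.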
Results (decision tree first, then random forest; 3 folds):
[0.97128794 1. 1. ]
0.9904293136450643
[0.97292863 0.99917966 1. ]
0.9907027618266339
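The original author also notes that this comparison is not very rigorous, but it still shows something. On this small POP3 subset the two models are close: with 3 folds the random forest is marginally ahead (0.9907 vs 0.9904), while with 10 folds the decision tree is (0.9986 vs 0.9910). Still, in the problems I have run into, a random forest usually classifies better than a single decision tree, which is why it is often called an enhanced decision tree: it trains many trees on bootstrap samples of the data, each split considering a random subset of features, and combines their predictions, which lowers the variance of any single tree.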
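One way to make the comparison less ad hoc is to score both models on the same shuffled folds, repeat the split a few times, and look at the paired per-fold differences instead of a single mean. A minimal sketch, assuming RepeatedStratifiedKFold with 10 splits and 3 repeats and synthetic make_classification data in place of the KDD99 POP3 subset; because random_state is an integer, every call to split yields the same folds, so both models are scored on identical train/test splits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data replaces the KDD99 POP3 subset so the sketch runs standalone
x, y = make_classification(n_samples=500, n_features=13, random_state=0)

# Shuffled, repeated stratified folds shared by both models
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)

tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), x, y, cv=cv)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=10, random_state=0), x, y, cv=cv)

diff = forest_scores - tree_scores  # paired, fold-by-fold difference
print(f"tree   {tree_scores.mean():.4f} +/- {tree_scores.std():.4f}")
print(f"forest {forest_scores.mean():.4f} +/- {forest_scores.std():.4f}")
print(f"forest better on {np.sum(diff > 0)} of {len(diff)} folds, mean gain {diff.mean():.4f}")
```

With the real data, x and y from get_guess_passwdandNormal can be passed in directly; the forest keeps the article's 10 trees.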
Source: https://blog.csdn.net/mooyuan/article/details/103459746