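Notes on 《Web安全之机器学习入门》 (Introduction to Machine Learning for Web Security): Chapter 6, Section 6.5, Detecting POP3 Brute-Force Attacks with Random Forest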

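1. To compare the performance of a decision tree and a random forest, take the POP3 brute-force detection source code as a starting point and modify parts of it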

(1) Start from the Section 6.3 code; the decision tree part is reused from Section 6.3 unchanged

clf = tree.DecisionTreeClassifier()
score = model_selection.cross_val_score(clf, x, y, n_jobs=1, cv=10)
print(np.mean(score))

(2) Add the random forest logic

clf2 = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
score = model_selection.cross_val_score(clf2, x, y, n_jobs=1, cv=10)
print(np.mean(score))

(3) Full source code. Note that it runs in a Python 3 environment.

# -*- coding:utf-8 -*-

from sklearn import model_selection
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
import numpy as np


def load_kdd99(filename):
    # Read the KDD99 file and return each record as a list of string fields.
    x=[]
    with open(filename) as f:
        for line in f:
            line=line.strip('\n')
            line=line.split(',')
            x.append(line)
    return x

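def get_guess_passwdandNormal(x):
    v=[]
    w=[]
    y=[]
    for x1 in x:
        # Keep only POP3 records labelled as password guessing or normal traffic.
        if ( x1[41] in ['guess_passwd.','normal.'] ) and ( x1[2] == 'pop_3' ):
            # Label: 1 = guess_passwd (brute force), 0 = normal.
            if x1[41] == 'guess_passwd.':
                y.append(1)
            else:
                y.append(0)

            # Features: column 0, columns 4-7 and columns 22-29 (13 values).
            x1 = [x1[0]] + x1[4:8]+x1[22:30]
            v.append(x1)

    # Convert every selected field from string to float.
    for x1 in v :
        v1=[]
        for x2 in x1:
            v1.append(float(x2))
        w.append(v1)
    return w,y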

if __name__ == '__main__':
    v=load_kdd99("../data/kddcup99/corrected")
    x,y=get_guess_passwdandNormal(v)
    # Decision tree: 10-fold cross-validation, print per-fold scores and their mean.
    clf = tree.DecisionTreeClassifier()
    score = model_selection.cross_val_score(clf, x, y, n_jobs=1, cv=10)
    print(score)
    print(np.mean(score))

    # Random forest with 10 trees, evaluated the same way.
    clf2 = RandomForestClassifier(n_estimators=10, max_depth=None,min_samples_split=2, random_state=0)
    score = model_selection.cross_val_score(clf2, x, y, n_jobs=1, cv=10)
    print(score)
    print(np.mean(score))
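
To make the column selection in get_guess_passwdandNormal concrete, here is a minimal sketch that applies the same filter and slicing to a single record. The record is made up for illustration and is not taken from the data set; only its layout (41 feature fields followed by a label) follows the code above. The helper name extract_features is hypothetical.

```python
# A made-up KDD99-style record: 41 feature fields followed by the label.
# The values are illustrative only, not real data.
record = ["0", "tcp", "pop_3", "SF", "32", "93"] + ["0"] * 35 + ["guess_passwd."]


def extract_features(fields):
    """Return (features, label) for a POP3 record, or None if it is filtered out."""
    if fields[41] not in ("guess_passwd.", "normal.") or fields[2] != "pop_3":
        return None
    label = 1 if fields[41] == "guess_passwd." else 0
    # Same slicing as the full source: column 0, columns 4-7, columns 22-29.
    selected = [fields[0]] + fields[4:8] + fields[22:30]
    return [float(value) for value in selected], label


features, label = extract_features(record)
print(len(features), label)
print(features[:3])
```

Each kept record therefore becomes a vector of 13 floats plus a 0/1 label, which is the shape that cross_val_score receives as x and y.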

2. Comparison of results (decision tree first, then random forest)

[0.98637602 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]
0.9986376021798365
[0.91008174 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]
0.991008174386921
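
Each printed mean is just the average of the ten fold scores above it. As a quick check, the sketch below recomputes both means from the scores as printed; the variable names are chosen here for illustration. Because the printed scores are rounded to eight decimals, the recomputed means match the printed ones only to about that precision.

```python
from statistics import mean

# Fold scores copied from the output above (rounded to 8 decimals there).
tree_scores = [0.98637602] + [1.0] * 9
forest_scores = [0.91008174] + [1.0] * 9

tree_mean = mean(tree_scores)
forest_mean = mean(forest_scores)
print(tree_mean, forest_mean)

# Indices of the folds on which the two models score differently.
differing_folds = [
    i for i, (a, b) in enumerate(zip(tree_scores, forest_scores)) if a != b
]
print(differing_folds)
```

The whole gap between the two means comes from the first fold alone; both models score 1.0 on the other nine.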

3. Modification following page 95 of the original book: call cross_val_score without the cv argument

clf = tree.DecisionTreeClassifier()
score = model_selection.cross_val_score(clf, x, y)
print(score)
print(np.mean(score))

clf2 = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
score = model_selection.cross_val_score(clf2, x, y)
print(score)
print(np.mean(score))

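Output (three scores per model: this run used the 3-fold default that cross_val_score had in older scikit-learn versions; since scikit-learn 0.22 the default is 5-fold, so pass cv=3 to get three folds):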

[0.97128794 1.         1.        ]
0.9904293136450643

[0.97292863 0.99917966 1.        ]
0.9907027618266339

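The book's author also points out that this is not a rigorous comparison, but it still illustrates the point. In the runs above the two models are nearly tied: the decision tree scored slightly higher with 10-fold cross-validation (0.9986 vs. 0.9910), and the random forest slightly higher in the three-fold run (0.9907 vs. 0.9904). In practice, a random forest usually classifies better than a single decision tree, which is why it is often described as an enhanced version of the decision tree.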
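
One common intuition for why an ensemble can beat a single tree is majority voting. The sketch below is an idealized illustration, not how scikit-learn's RandomForestClassifier is implemented: it assumes n classifiers that err independently, each correct with probability p, and computes the probability that a strict majority of them is correct. Real trees in a forest are correlated, so the actual gain is smaller. The function name majority_vote_accuracy and the values of n and p are assumptions made for this example.

```python
from math import comb


def majority_vote_accuracy(n, p):
    """Probability that more than half of n independent voters are correct."""
    need = n // 2 + 1  # smallest number of correct votes that forms a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


single = majority_vote_accuracy(1, 0.7)     # one classifier on its own
ensemble = majority_vote_accuracy(11, 0.7)  # 11 independent classifiers voting
print(single, ensemble)
```

Under these assumptions the vote of 11 classifiers is correct noticeably more often than any one of them, which matches the usual description of a random forest as a strengthened decision tree.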

Source: https://blog.csdn.net/mooyuan/article/details/103459746