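Notes on 《Web安全之机器学习入门》 (Web Security: Introduction to Machine Learning): Chapter 7, 7.3 Detecting Anomalous Operations with Naive Bayes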


1. Source code changes

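The original GitHub code targets Python 2, while my environment runs Python 3. Apart from adding parentheses to print calls and deleting some unused imports, the main change is the following: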

fdist = list(FreqDist(dist).keys())

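Note: see Section 5.3 for this part; the source changes in the two sections are essentially the same. Reference: Notes on 《Web安全之机器学习入门》, Chapter 5, 5.3 Detecting Anomalous Operations with K-Nearest Neighbors (Part 1), mooyuan's blog on CSDN.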
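To illustrate why the list(...) wrapper is needed, here is a minimal sketch. It uses collections.Counter as a stand-in so that it runs without NLTK; in NLTK 3, FreqDist is a subclass of Counter, so keys() should behave the same way. The command list is made up for illustration.

```python
from collections import Counter

# Made-up command stream; Counter stands in for nltk.probability.FreqDist
commands = ["cd", "ls", "ls", "cat", "ls", "cd"]
freq = Counter(commands)

# In Python 3, keys() returns a view object that cannot be sliced
try:
    freq.keys()[0:2]
    sliceable = True
except TypeError:
    sliceable = False

# Wrapping the view in list() gives a sliceable list in insertion order
vocab = list(freq.keys())

# If frequency order matters, most_common() provides it explicitly
by_frequency = [cmd for cmd, _ in freq.most_common()]

print(sliceable, vocab, by_frequency)
```

Note that the list from keys() is in insertion order rather than frequency order, which is fine when it only serves as a vocabulary.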

2. Dataset and featurization

Note: see Section 5.4 for this part; the processing principle and code are essentially the same in both sections.
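As a rough sketch of the featurization, modeled on load_user_cmd_new and get_user_cmd_feature_new in the full listing: the command stream is cut into fixed-size blocks, and each block becomes a 0/1 vector over the vocabulary of all commands seen. The helper names split_into_blocks and blocks_to_vectors are hypothetical, and the block size of 3 is only chosen to keep the example small (the article's code uses 100).

```python
def split_into_blocks(commands, block_size=100):
    """Cut the command stream into full blocks; a trailing partial block is dropped."""
    last_start = len(commands) - block_size + 1
    return [commands[i:i + block_size] for i in range(0, last_start, block_size)]

def blocks_to_vectors(blocks, vocab):
    """Mark each vocabulary command with 1 if it occurs in the block, else 0."""
    vectors = []
    for block in blocks:
        seen = set(block)
        vectors.append([1 if cmd in seen else 0 for cmd in vocab])
    return vectors

# Made-up command stream for illustration
commands = ["ls", "cd", "ls", "vi", "cat", "ls"]
blocks = split_into_blocks(commands, block_size=3)
vocab = list(dict.fromkeys(commands))  # unique commands, first-seen order
vectors = blocks_to_vectors(blocks, vocab)

print(vocab)
print(vectors)
```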

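Reference: Notes on 《Web安全之机器学习入门》, Chapter 5, 5.4 Detecting Anomalous Operations with K-Nearest Neighbors (Part 2), mooyuan's blog on CSDN.

3. Naive Bayes processing logic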

clf = GaussianNB().fit(x_train, y_train)
y_predict_nb = clf.predict(x_test)
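For intuition about what a Gaussian naive Bayes classifier computes, the following is a simplified pure-Python sketch. It is not scikit-learn's implementation: the function names and the fixed eps constant are assumptions made here (scikit-learn exposes a var_smoothing parameter instead), and the toy data is made up.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the prior, per-feature mean and per-feature variance of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # eps keeps the variance positive for constant features (assumed constant)
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed Gaussian log-likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data: two well-separated classes
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.1]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
pred_a = predict_gaussian_nb(model, [1.1, 2.0])
pred_b = predict_gaussian_nb(model, [5.1, 6.0])
print(pred_a, pred_b)
```

The "naive" part is the inner loop: each feature contributes its log-likelihood independently of the others.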

4. Comparison with K-nearest neighbors

neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(x_train, y_train)
y_predict_knn = neigh.predict(x_test)
print(y_train)
clf = GaussianNB().fit(x_train, y_train)
y_predict_nb = clf.predict(x_test)

score = np.mean(y_test == y_predict_knn) * 100
print("KNN %d" % score)

score = np.mean(y_test == y_predict_nb) * 100
print("NB %d" % score)
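The score above is plain accuracy: the fraction of test blocks whose predicted label matches the true label, printed with %d, which truncates the decimal part. A minimal sketch without NumPy follows; accuracy_percent is a hypothetical helper, and the labels are made up rather than taken from the dataset.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * matches / len(y_true)

# Made-up labels: 60 test blocks, 10 of them anomalous
y_true = [0] * 50 + [1] * 10
# A made-up predictor that labels every block as normal
y_pred = [0] * 60

score = accuracy_percent(y_true, y_pred)
printed = "%d" % score  # %d drops the fractional part of 83.33...
print("score", printed)
```

With 60 test blocks, a printed value of 83 corresponds to 50 correct predictions. On made-up labels like these, a predictor that always outputs 0 also prints 83, so it may be worth inspecting per-class results as well; this is an observation about the metric, not a claim about the article's data.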

5. Complete source code

# -*- coding:utf-8 -*-

import numpy as np
from nltk.probability import FreqDist
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Number of training samples; the remaining blocks (N to 150) are used for testing
N=90

def load_user_cmd_new(filename):
    cmd_list=[]
    dist=[]
    with open(filename) as f:
        i=0
        x=[]
        for line in f:
            line=line.strip('\n')
            x.append(line)
            dist.append(line)
            i+=1
            if i == 100:
                cmd_list.append(x)
                x=[]
                i=0

    fdist = list(FreqDist(dist).keys())
    return cmd_list,fdist

def load_user_cmd(filename):
    cmd_list=[]
    dist_max=[]
    dist_min=[]
    dist=[]
    with open(filename) as f:
        i=0
        x=[]
        for line in f:
            line=line.strip('\n')
            x.append(line)
            dist.append(line)
            i+=1
            if i == 100:
                cmd_list.append(x)
                x=[]
                i=0

    # NLTK 3 no longer sorts FreqDist keys by frequency, so order them explicitly
    fdist = [cmd for cmd, _ in FreqDist(dist).most_common()]
    dist_max=set(fdist[0:50])
    dist_min = set(fdist[-50:])
    return cmd_list,dist_max,dist_min

def get_user_cmd_feature(user_cmd_list,dist_max,dist_min):
    user_cmd_feature=[]
    for cmd_block in user_cmd_list:
        f1=len(set(cmd_block))
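        # Python 3: keys() returns a view that cannot be sliced, and NLTK 3 does
        # not sort it by frequency, so build a frequency-ordered list explicitly
        fdist = [cmd for cmd, _ in FreqDist(cmd_block).most_common()]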
        f2=fdist[0:10]
        f3=fdist[-10:]
        f2 = len(set(f2) & set(dist_max))
        f3=len(set(f3)&set(dist_min))
        x=[f1,f2,f3]
        user_cmd_feature.append(x)
    return user_cmd_feature

def get_user_cmd_feature_new(user_cmd_list,dist):
    user_cmd_feature=[]

    for cmd_list in user_cmd_list:
        v=[0]*len(dist)
        for i in range(0,len(dist)):
            if dist[i] in cmd_list:
                v[i]+=1
        user_cmd_feature.append(v)

    return user_cmd_feature

def get_label(filename,index=0):
    x=[]
    with open(filename) as f:
        for line in f:
            line=line.strip('\n')
            x.append( int(line.split()[index]))
    return x

if __name__ == '__main__':
    user_cmd_list,dist=load_user_cmd_new("../data/MasqueradeDat/User3")
    user_cmd_feature=get_user_cmd_feature_new(user_cmd_list,dist)
    labels=get_label("../data/MasqueradeDat/label.txt",2)
    # The first 50 blocks are labeled 0 (normal); label.txt supplies the rest
    y=[0]*50+labels

    x_train=user_cmd_feature[0:N]
    y_train=y[0:N]

    x_test=user_cmd_feature[N:150]
    y_test=y[N:150]

    neigh = KNeighborsClassifier(n_neighbors=3)
    neigh.fit(x_train, y_train)
    y_predict_knn=neigh.predict(x_test)
    print(y_train)
    clf = GaussianNB().fit(x_train, y_train)
    y_predict_nb=clf.predict(x_test)


    score=np.mean(y_test==y_predict_knn)*100
    print("KNN %d" % score)

    score=np.mean(y_test==y_predict_nb)*100
    print("NB %d" % score)

6. Results

KNN 83
NB 83

KNN and NB both reach 83% accuracy when detecting anomalous operations on this dataset.

Source: https://blog.csdn.net/mooyuan/article/details/122756302