A Data Mining Beginner's Journey: Task 3

** section

(The first part mainly handles outliers with the quartile method, then uses a box plot to show the data after processing.)
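Written out, the quartile rule applied by `box_plot_outliers` in the code below is the following, where $Q_1$ and $Q_3$ are the 25% and 75% quantiles of the column and $s$ is the `scale` argument (the symbols are introduced here only for illustration):

$$
\text{val\_low} = Q_1 - s\,(Q_3 - Q_1), \qquad \text{val\_up} = Q_3 + s\,(Q_3 - Q_1)
$$

A value $x$ is treated as an outlier when $x < \text{val\_low}$ or $x > \text{val\_up}$.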

```python
import numpy as np
import pandas as pd


def outliers_proc(data, col_name, scale=3):
    """
    Remove outliers from one column, using the box-plot rule (scale=3) by default.
    :param data: a pandas DataFrame
    :param col_name: name of the column to clean
    :param scale: multiplier applied to the interquartile range
    :return: a copy of data with the outlier rows removed
    """

    def box_plot_outliers(data_ser, box_scale):
        """
        Flag outliers with the box-plot rule.
        :param data_ser: a pandas Series
        :param box_scale: multiplier applied to the interquartile range
        :return: (rule_low, rule_up), (val_low, val_up)
        """
        # Scaled interquartile range: box_scale * (Q3 - Q1)
        iqr = box_scale * (data_ser.quantile(0.75) - data_ser.quantile(0.25))
        val_low = data_ser.quantile(0.25) - iqr
        val_up = data_ser.quantile(0.75) + iqr
        rule_low = (data_ser < val_low)
        rule_up = (data_ser > val_up)
        return (rule_low, rule_up), (val_low, val_up)

    data_n = data.copy()
    data_series = data_n[col_name]
    rule, value = box_plot_outliers(data_series, box_scale=scale)

    # Positional indices of all outliers. drop() below works on labels,
    # so this assumes data has the default 0..n-1 index.
    index = np.arange(data_series.shape[0])[rule[0] | rule[1]]
    print("delete number is: {}".format(len(index)))
    data_n = data_n.drop(index)
    data_n.reset_index(drop=True, inplace=True)
    print("now row number is: {}".format(data_n.shape[0]))

    # Describe the values below the lower bound
    index_low = np.arange(data_series.shape[0])[rule[0]]
    outliers = data_series.iloc[index_low]
    print("description of data less than the lower bound is:")
    print(pd.Series(outliers).describe())

    # Describe the values above the upper bound
    index_up = np.arange(data_series.shape[0])[rule[1]]
    outliers = data_series.iloc[index_up]
    print("description of data larger than the upper bound is:")
    print(pd.Series(outliers).describe())

    return data_n
```
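As a quick check of the rule, here is a minimal sketch on a made-up column. The helper name `iqr_bounds`, the column name `price`, and the values are assumptions for illustration, not part of the original post. It filters with a boolean mask instead of positional indices, so it also works when the index is not the default 0..n-1.

```python
import pandas as pd


def iqr_bounds(ser, scale=3):
    """Hypothetical helper: return (val_low, val_up) for the box-plot rule."""
    q1, q3 = ser.quantile(0.25), ser.quantile(0.75)
    iqr = scale * (q3 - q1)
    return q1 - iqr, q3 + iqr


# Made-up data: nine ordinary values and one extreme value
df = pd.DataFrame({"price": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})

val_low, val_up = iqr_bounds(df["price"], scale=3)
mask = (df["price"] < val_low) | (df["price"] > val_up)

# Keep the rows that are not outliers and renumber the index
cleaned = df[~mask].reset_index(drop=True)
print("bounds:", val_low, val_up)
print("rows removed:", int(mask.sum()))
```

With `scale=3` the bounds are wide, so only the extreme value is removed; a smaller `scale` such as 1.5 would flag more points.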

Feature construction

Feature normalization/standardization:
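The post does not show how it normalizes or standardizes features, so the following is only a sketch of the two usual formulas in pandas. The column name `power` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy column; the values are invented for this example
df = pd.DataFrame({"power": [50.0, 100.0, 150.0, 200.0]})
col = df["power"]

# Min-max normalization: rescale the column to the range [0, 1]
df["power_minmax"] = (col - col.min()) / (col.max() - col.min())

# Standardization (z-score): subtract the mean and divide by the
# population standard deviation (ddof=0)
df["power_zscore"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```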

The purpose of data binning:
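The post leaves this heading without an example. One possible way to bin a numeric column into equal-width buckets is sketched below; the column name `power`, the bucket width of 10, and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy column; the values are invented for this example
df = pd.DataFrame({"power": [5, 12, 27, 33, 48]})

# Equal-width bins of width 10: integer division gives the bin number
df["power_bin"] = df["power"] // 10

# The same idea with explicit edges; labels=False returns integer bin codes.
# pd.cut uses right-closed intervals by default, e.g. (0, 10], (10, 20], ...
df["power_bin_cut"] = pd.cut(
    df["power"], bins=[0, 10, 20, 30, 40, 50], labels=False
)

print(df)
```

The two columns agree here because no value sits exactly on a bin edge; on an edge such as 10, integer division and right-closed intervals assign different bins.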

Benefits: I did not understand the feature selection part that comes next, and I am still working through it.

