A Logistic Regression Example

Case background and goal

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Load the credit card transactions and preview the first rows
data = pd.read_csv("creditcard.csv")
print(data.head())

Handling class imbalance

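# Count the samples per class (0 = normal, 1 = fraud) and plot the counts.
# Column names are assumed to be capitalized ('Class', 'Amount', 'Time'),
# as in the Kaggle creditcard.csv file.
count_classes = data['Class'].value_counts().sort_index()
count_classes.plot(kind='bar')
plt.title("Fraud class histogram")
plt.xlabel("Class")
plt.ylabel("Frequency")
plt.show()
from sklearn.preprocessing import StandardScaler
# A Series has no reshape method.
# Workaround: use .values to get the underlying NumPy ndarray, then call reshape.
data['normAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1))
# Drop the raw columns that are no longer needed
data = data.drop(['Time', 'Amount'], axis=1)
print(data.head())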
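
To make the scaling step concrete, here is a minimal sketch on a small made-up `amount` Series (not the real dataset). It shows the shape change produced by `reshape(-1, 1)` and the arithmetic behind standardization: subtract the column mean and divide by the population standard deviation, which is what `StandardScaler` computes with its default settings. The helper name `standardize_column` is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd


def standardize_column(series):
    """Return the column as an (n, 1) array with zero mean and unit variance.

    Hypothetical helper for illustration; assumes the column is not constant.
    """
    # .values gives a 1-D ndarray; reshape(-1, 1) turns it into a single column
    column = series.values.reshape(-1, 1).astype(float)
    # Population standard deviation (NumPy's default, ddof=0)
    return (column - column.mean(axis=0)) / column.std(axis=0)


# Made-up amounts, for illustration only
amount = pd.Series([10.0, 20.0, 30.0, 40.0])
scaled = standardize_column(amount)

print(amount.values.shape)  # shape of the 1-D array
print(scaled.shape)         # shape after reshape(-1, 1)
print(scaled.ravel())       # standardized values
```

The 2-D shape matters because scikit-learn transformers expect input of shape (n_samples, n_features), so a single column has to be passed as n rows by 1 column.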

Undersampling strategy

# When the two classes differ greatly in size, one option is undersampling:
# shrink the majority class (0) so that classes 0 and 1 are equally small.
x = data.loc[:, data.columns != 'Class']
y = data.loc[:, data.columns == 'Class']
# Number of data points in the minority class (samples with Class == 1)
number_records_fraud = len(data[data.Class == 1])
fraud_indices = np.array(data[data.Class == 1].index)  # index labels of the fraud rows
# Index labels of the normal class, from which we sample at random
normal_indices = data[data.Class == 0].index
# Out of those indices, randomly select number_records_fraud of them without replacement
random_normal_indices = np.random.choice(normal_indices, number_records_fraud, replace=False)
random_normal_indices = np.array(random_normal_indices)
# Combine the two sets of indices
under_sample_indices = np.concatenate([fraud_indices, random_normal_indices])
# Build the undersampled dataset; .loc is used because these are index labels
under_sample_data = data.loc[under_sample_indices, :]
x_undersample = under_sample_data.loc[:, under_sample_data.columns != 'Class']
y_undersample = under_sample_data.loc[:, under_sample_data.columns == 'Class']
# Show the class proportions (values between 0 and 1)
print("Proportion of normal transactions:", len(under_sample_data[under_sample_data.Class == 0]) / len(under_sample_data))
print("Proportion of fraud transactions:", len(under_sample_data[under_sample_data.Class == 1]) / len(under_sample_data))
print("Total number of transactions in resampled data:", len(under_sample_data))
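
The undersampling steps can be checked on a small synthetic frame. The sketch below wraps them in a hypothetical helper, `undersample`, and runs it on made-up data with a `Class` label column. The helper name, the seed, and the toy data are assumptions for illustration. It uses NumPy's `default_rng` for reproducible sampling instead of calling `np.random.choice` directly.

```python
import numpy as np
import pandas as pd


def undersample(df, label_col="Class", seed=0):
    """Keep every label-1 row plus an equal-sized random sample of label-0 rows.

    Hypothetical helper for illustration; assumes label 1 is the minority class.
    """
    rng = np.random.default_rng(seed)
    minority_idx = df[df[label_col] == 1].index.to_numpy()
    majority_idx = df[df[label_col] == 0].index.to_numpy()
    # Sample without replacement so no majority row is picked twice
    sampled_majority_idx = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    keep_idx = np.concatenate([minority_idx, sampled_majority_idx])
    # .loc selects by index label, which is what the arrays above contain
    return df.loc[keep_idx]


# Made-up data: 95 normal rows and 5 fraud rows
toy = pd.DataFrame({
    "feature": np.arange(100, dtype=float),
    "Class": [0] * 95 + [1] * 5,
})
balanced = undersample(toy)

print(len(balanced))
print(balanced["Class"].value_counts().sort_index())
```

Fixing the seed makes the sampled rows repeatable between runs, which helps when comparing models trained on the resampled data.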
