Machine Learning Basics in 100 Days, Day 01: Data Preprocessing

Dataset

   country  age   salary   purchased
0  france   44.0  72000.0  no
1  spain    27.0  48000.0  yes
2  germany  30.0  54000.0  no
3  spain    38.0  61000.0  no
4  germany  40.0  nan      yes
5  france   35.0  58000.0  yes
6  spain    nan   52000.0  no
7  france   48.0  79000.0  yes
8  germany  50.0  83000.0  no
9  france   37.0  67000.0  yes
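To follow along without the original data file, the table above can be written out as CSV text. This is only a sketch: the column names are taken from the table header, and writing the two missing cells as empty fields is an assumption about how the real file stores them.

```python
import csv
import io

# The dataset from the table above. Missing cells are written as empty
# fields, which is an assumption about the layout of the real data.csv.
CSV_TEXT = """country,age,salary,purchased
france,44.0,72000.0,no
spain,27.0,48000.0,yes
germany,30.0,54000.0,no
spain,38.0,61000.0,no
germany,40.0,,yes
france,35.0,58000.0,yes
spain,,52000.0,no
france,48.0,79000.0,yes
germany,50.0,83000.0,no
france,37.0,67000.0,yes
"""

# Parse the text and list the (row index, column name) of every missing cell.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
missing = [
    (i, name)
    for i, row in enumerate(rows)
    for name, value in row.items()
    if value == ""
]
print(len(rows), missing)
```

The same text can also be handed to pandas through a file-like object, for example `pd.read_csv(io.StringIO(CSV_TEXT))`, which reads the empty fields as NaN.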

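# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

dataset = pd.read_csv('../data/data.csv')
# iloc selects rows and columns by integer position
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Handle missing data. SimpleImputer and ColumnTransformer stand in for the Imputer class
# and the categorical_features argument, which newer scikit-learn releases removed.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
# With k = x[:, 1:3]: fit the imputer on k, then replace each missing value in k with its column mean
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])

# Encode categorical data: variables that hold labels such as yes/no instead of numbers
# cannot be used in a model's numeric computation, so they are converted to numbers
label_x = LabelEncoder()  # assigns each label an integer from 0 to n_classes - 1, here in alphabetical order
x[:, 0] = label_x.fit_transform(x[:, 0])
# Create dummy variables: one-hot encode column 0 and pass the other columns through
onehotencoder = ColumnTransformer([("country", OneHotEncoder(), [0])], remainder="passthrough", sparse_threshold=0)
x = onehotencoder.fit_transform(x).astype(float)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

# Split the dataset into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# Feature scaling: fit on the training set only, then reuse its statistics for the test set
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)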
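As a rough illustration of what the imputation and scaling steps compute, here is a standard-library sketch on the age column alone. The helper names (`impute_mean`, `fit_scaler`, `apply_scaler`) and the fixed 8/2 split are hypothetical and chosen for readability; the script itself uses scikit-learn and a random split.

```python
from statistics import mean, pstdev

# Age column from the dataset; None marks the missing entry in row 6.
ages = [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, None, 48.0, 50.0, 37.0]


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def fit_scaler(train):
    """Learn the mean and population standard deviation from training data."""
    return mean(train), pstdev(train)


def apply_scaler(values, mu, sigma):
    """Standardize values with previously learned statistics."""
    return [(v - mu) / sigma for v in values]


filled = impute_mean(ages)

# Hypothetical fixed split: first 8 rows for training, last 2 for testing.
train, test = filled[:8], filled[8:]

# The statistics come from the training rows only and are reused for the test rows.
mu, sigma = fit_scaler(train)
train_scaled = apply_scaler(train, mu, sigma)
test_scaled = apply_scaler(test, mu, sigma)
```

After scaling, the training values have mean 0 and standard deviation 1, while the test values generally do not, because they are measured against the training statistics. Learning the statistics from the training rows and reusing them on the test rows is the usual practice, since it keeps information about the test set out of the preprocessing.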
