pandas Data Processing

1.1 Reading CSV files

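import numpy as np
import pandas as pd
# A TSV file is like a CSV file, but the delimiter is '\t'
# Use round() to set how many decimal places numeric columns keep
df = pd.read_csv('data.tsv', sep='\t').round(decimals=4)
# df = pd.read_csv(file_path, sep="\t", encoding='utf-8', on_bad_lines='skip', keep_default_na=False)
# (on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0)
# Specify column data types when reading; 'id' is a hypothetical column name
df = pd.read_csv('test.csv', dtype={'id': str})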
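
The calls above can be tried without any file on disk. The following is a minimal sketch, assuming a small in-memory table; the column names `id` and `score` and the sample values are made up for illustration and are not part of the original data.

```python
import io

import pandas as pd

# Hypothetical tab-separated text standing in for 'data.tsv'.
tsv_text = "id\tscore\n001\t0.123456\n002\t2.5\n"

# dtype={'id': str} keeps 'id' as text, so leading zeros are preserved.
# round(decimals=4) only changes numeric columns; text columns are left alone.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", dtype={"id": str}).round(decimals=4)

print(df)
print(df.dtypes)
```

Without the `dtype` argument, pandas would infer `id` as an integer column and the values would become 1 and 2.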

1.2 Reading Excel files

# excel_sheet_name is the sheet name (str) or zero-based position (int) to load
df = pd.read_excel('test.xlsx', sheet_name=excel_sheet_name)

1.3 Inspecting metadata

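# View the first n rows (n is an integer; head() defaults to 5)
print(df.head(n))
# View summary statistics
print(df.describe())
# View column data types
print(df.dtypes)
# View the column headers
print(df.columns)
# Convert the column headers to a list
df.columns.tolist()
print('*' * 10, 'Total rows:', df.shape[0], ' Number of features:', df.shape[1])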
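
As a self-contained illustration of the inspection calls above, here is a small sketch on a hand-built DataFrame. The columns `name`, `age`, and `score` and the value of `n` are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical data used only to demonstrate the inspection calls.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [20, 30, 40],
    "score": [1.5, 2.5, 3.5],
})

n = 2
print(df.head(n))      # first n rows
print(df.describe())   # summary statistics, numeric columns only by default
print(df.dtypes)       # data type of each column

columns = df.columns.tolist()   # plain Python list of column names
n_rows, n_features = df.shape   # shape is a (rows, columns) tuple
print('*' * 10, 'Total rows:', n_rows, ' Number of features:', n_features)
```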

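# Drop rows in which every value is missing
data = df.dropna(how='all')
# data = df.dropna(how='any')  # drop rows with any missing value
# Drop columns in which every value is missing
data = data.dropna(how='all', axis=1)
# Fill remaining missing values with 0
data = data.fillna(0)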
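
To make the effect of each cleaning step visible, the sketch below runs them on a tiny frame with known gaps. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely missing, column 'b' is entirely missing.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, np.nan, np.nan],
})

rows_kept = df.dropna(how="all")                  # removes row 1 only
cols_kept = rows_kept.dropna(how="all", axis=1)   # removes column 'b' only
filled = cols_kept.fillna(0)                      # remaining NaN in 'c' becomes 0

print(filled)
```

Note that `how='all'` removes a row or column only when every value in it is missing, whereas `how='any'` would remove it as soon as a single value is missing.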

# Shuffle all rows
data = data.sample(frac=1, random_state=1024)
# Sample 80% of the rows, or a fixed number of rows without replacement
data = data.sample(frac=0.8, random_state=1024)
data = data.sample(n=100, replace=False, random_state=None, axis=0)
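
The three uses of `sample()` above can be compared side by side on a small frame. This is a sketch under the assumption of a ten-row table with a single made-up column `x`; the `reset_index(drop=True)` call is an optional addition, not something the original code does.

```python
import pandas as pd

# Hypothetical ten-row frame.
data = pd.DataFrame({"x": range(10)})

# frac=1 returns every row in random order; resetting the index is optional
# and simply discards the old row labels.
shuffled = data.sample(frac=1, random_state=1024).reset_index(drop=True)

# frac=0.8 keeps 80% of the rows (8 of 10 here).
subset = data.sample(frac=0.8, random_state=1024)

# n=5 draws exactly five rows; replace=False means no row is drawn twice.
fixed = data.sample(n=5, replace=False, random_state=1024)

print(len(shuffled), len(subset), len(fixed))
```

With `replace=False`, `n` must not exceed the number of rows, otherwise pandas raises a `ValueError`.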
