Reading CSV in Python: how to read and write CSV files

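# Import the required packages
import os
import re
import csv

# 1. Read a CSV file
def read_csv(filename, header=False):
    res = []
    with open(filename, newline="") as f:
        f_csv = csv.reader(f)
        if header:  # consume the header row when header=True
            headers = next(f_csv)
        for row in f_csv:
            res.append(row)
    return res

# 2. Write a CSV file
def write_csv(data, filename):
    # Python 3: open in text mode with newline="" ("wb" only worked in Python 2)
    with open(filename, "w", newline="") as f:
        f_csv = csv.writer(f)
        # write the data one row at a time
        for item in data:
            f_csv.writerow(item)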
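A minimal, self-contained sketch of the same write-then-read round trip in Python 3, assuming a temporary file stands in for a real path and that the first row is a header. In Python 3 the csv module expects files opened in text mode with newline=""; the "wb" mode from older Python 2 examples fails with a TypeError.

```python
import csv
import os
import tempfile

rows = [["id", "name"], ["1", "qwe"], ["2", "aad"]]

# Create a temporary file to stand in for a real CSV path (an assumption).
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

# newline="" stops the csv module from writing blank lines on Windows.
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Read it back, taking the first row as the header.
with open(path, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    data = [row for row in reader]

os.remove(path)
print(header)  # ['id', 'name']
print(data)    # [['1', 'qwe'], ['2', 'aad']]
```

Every field comes back as a string; convert types yourself or use NumPy/pandas as described below.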

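2. Sometimes the file is a .txt file or data exported from a database such as Hive. In that case the following function can be used to read the data for analysis.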

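# 3. Read a plain-text format
def read_text(filename, columns, delimiter):
    # columns: expected number of columns per line
    # delimiter: field separator
    res = []
    with open(filename, "r") as f:
        while True:
            line = f.readline()
            if line:
                line = re.sub(r"[\r\n]", "", line)  # strip newline characters
                fields = line.split(delimiter)
                if len(fields) == columns:  # keep only complete rows
                    res.append(fields)
            else:
                break  # end of file
    return res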
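Exports from Hive and similar tools are usually delimited by a single character. Tab-separated text is assumed below; tables stored with Hive's defaults use "\x01" instead, which can be passed as the delimiter. As an alternative to the regex-based reader above, csv.reader accepts a delimiter argument. The helper read_delimited is hypothetical and, like the columns parameter above, keeps only lines with the expected number of fields.

```python
import csv
import io

def read_delimited(lines, columns, delimiter="\t"):
    """Hypothetical helper: parse delimited text, keeping rows with `columns` fields."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [row for row in reader if len(row) == columns]

# io.StringIO stands in for open(filename, newline="")
sample = io.StringIO("40920\t8.326976\tqwe\n14488\t7.153469\n26052\t1.441871\tzc\n")
rows = read_delimited(sample, columns=3)
print(rows)  # [['40920', '8.326976', 'qwe'], ['26052', '1.441871', 'zc']]
```

The incomplete second line is dropped. Unlike a plain split, csv.reader also handles quoted fields that contain the delimiter.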

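3. NumPy can also read the file directly with loadtxt(fname, dtype=float, delimiter=None, skiprows=0, usecols=None, unpack=False). fname: the file name; dtype: the data type (str also works); delimiter: the field separator; skiprows: the number of leading lines to skip; usecols: the column or columns to read, e.g. (0, 3) reads the first and fourth columns; unpack: if True, the result is transposed so that each column can be unpacked into its own variable.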

Example: npload.txt

40920 8.326976 0.953952 3 qwe
14488 7.153469 1.673904 2 aad
26052 1.441871 0.805124 1 zc
75136 13.147394 0.428964 1 wed

import numpy as np

filename = "e:/pythonproject/commonfunction/input/npload.txt"
res = np.loadtxt(filename, dtype=str, delimiter=" ", skiprows=1, usecols=(0, 3, 4), unpack=False)
print(type(res))
print(res)

x, y, z = np.loadtxt(filename, dtype=str, delimiter=" ", skiprows=1, usecols=(0, 3, 4), unpack=True)
print(x)  # first column
print(y)  # fourth column
print(z)  # fifth column

Result:

[['14488' '2' 'aad']
 ['26052' '1' 'zc']
 ['75136' '1' 'wed']]
['14488' '26052' '75136']
['2' '1' '1']
['aad' 'zc' 'wed']
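The path above is specific to the author's machine. A self-contained variant, assuming io.StringIO stands in for npload.txt, reads only the two floating-point columns with dtype=float, and unpack=True returns one array per column.

```python
import io

import numpy as np

# The sample rows from npload.txt, inlined so the sketch runs without the file.
text = """40920 8.326976 0.953952 3 qwe
14488 7.153469 1.673904 2 aad
26052 1.441871 0.805124 1 zc
75136 13.147394 0.428964 1 wed
"""

# Only columns 1 and 2 are converted, so dtype=float works even though
# column 4 holds strings.
a, b = np.loadtxt(io.StringIO(text), dtype=float, delimiter=" ",
                  usecols=(1, 2), unpack=True)
print(a)
print(b)
```

Reading numeric columns as float directly avoids converting strings afterwards. Mixing strings and numbers in one call forces dtype=str, as in the example above.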

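4. pandas is another powerful data-analysis tool. It can read CSV and Excel files directly, and it can save a DataFrame straight to CSV or Excel. For example, the data above can be saved as CSV with the write_csv() function and then read directly with pandas.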

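pd.read_csv(filename, header=None, index_col=0, usecols=(1, 2, 3), skiprows=0): the parameters mean essentially the same as in np.loadtxt(). In addition, header=None says the file has no header row, and index_col=0 uses the first selected column as the row index. read_excel(io, sheet_name=0, header=0, index_col=None, usecols=None, dtype=None, skiprows=None) takes the same common parameters. The corresponding save methods are to_csv() and to_excel().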

import pandas as pd

filename = "e:/pythonproject/commonfunction/input/npload.csv"
df = pd.read_csv(filename, header=None, index_col=0, usecols=(1, 2, 3), skiprows=0)
print(df.head())

Result:

                  2  3
8.326976   0.953952  3
7.153469   1.673904  2
1.441871   0.805124  1
13.147394  0.428964  1
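A self-contained sketch, assuming the four sample rows were saved comma-separated (as write_csv() would produce) and using io.StringIO in place of npload.csv. It also shows to_csv(), which returns the CSV text as a string when no path is given.

```python
import io

import pandas as pd

# Assumed contents of npload.csv: the sample rows, comma-separated, no header.
text = """40920,8.326976,0.953952,3
14488,7.153469,1.673904,2
26052,1.441871,0.805124,1
75136,13.147394,0.428964,1
"""

# header=None: no header row, so columns are labelled 0, 1, 2, ...
# usecols=(1, 2, 3) keeps columns 1-3; index_col=0 makes the first of them the index.
df = pd.read_csv(io.StringIO(text), header=None, index_col=0, usecols=(1, 2, 3))
print(df)

# With no path argument, to_csv() returns the CSV text instead of writing a file.
out = df.to_csv()
print(out.splitlines()[0])  # header line: index name, then column labels
```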

5. Handy tricks: pandas' pivot method and NumPy's permutation

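df = pd.DataFrame({'bar': ['a', 'b', 'c', 'a', 'b', 'c'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
print(df)

# foo becomes the index, the values of bar become the columns, and baz supplies the cell values.
# Note: each (foo, bar) pair must be unique.
print(df.pivot(index='foo', columns='bar', values='baz'))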

Result:

  bar  baz  foo zoo
0   a    1  one   x
1   b    2  one   y
2   c    3  one   z
3   a    4  two   q
4   b    5  two   w
5   c    6  two   t

bar  a  b  c
foo
one  1  2  3
two  4  5  6
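To illustrate the uniqueness note, here is a small sketch with made-up data where the pair ("one", "a") occurs twice. pivot raises a ValueError, while pivot_table aggregates the duplicates (mean by default; "sum" is chosen here as an example).

```python
import pandas as pd

# Toy data (hypothetical): ("one", "a") appears twice.
df = pd.DataFrame({
    "foo": ["one", "one", "two"],
    "bar": ["a", "a", "b"],
    "baz": [1, 2, 3],
})

failed = False
try:
    df.pivot(index="foo", columns="bar", values="baz")
except ValueError as err:
    failed = True
    print("pivot failed:", err)

# pivot_table combines the duplicate entries instead of failing.
table = df.pivot_table(index="foo", columns="bar", values="baz", aggfunc="sum")
print(table)
```

Cells with no matching rows (such as one/b) come out as NaN, so the result uses a float dtype.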

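np.random.seed(0)
print(np.random.permutation(10))  # randomly shuffle 0-9; useful for randomly sampling a dataset

x = list(range(10))
np.random.shuffle(x)  # shuffles in place; the argument must be a mutable sequence such as a list or array
print(x)

[2 8 4 9 1 6 7 3 0 5]
[3, 5, 1, 2, 9, 8, 0, 6, 7, 4]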
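A permutation of the row indices can be used to split a dataset randomly while keeping features and labels aligned. The sketch below uses toy arrays and an assumed 20% test fraction.

```python
import numpy as np

data = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features each
labels = np.arange(10) % 2           # toy labels, one per row

np.random.seed(0)
idx = np.random.permutation(len(data))  # shuffled row indices 0..9
n_test = int(len(data) * 0.2)           # hold out 20% (an assumption)

test_idx, train_idx = idx[:n_test], idx[n_test:]
x_train, x_test = data[train_idx], data[test_idx]
y_train, y_test = labels[train_idx], labels[test_idx]
print(len(x_train), len(x_test))  # 8 2
```

Indexing both arrays with the same permuted indices keeps each sample paired with its label. Shuffling the arrays separately would break that pairing.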

Note: a dataset can of course also be split with scikit-learn's train_test_split:

from sklearn.model_selection import train_test_split

train_set, test_set = train_test_split(sampledata, test_size=0.2, random_state=42)

If the dataset has labels, the features and labels can be split together:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
