First time processing data with pandas in Python

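1. This is the Titanic problem on Kaggle. A forum post there walks through the data exploration, so I followed along. It feels much the same as R, though that is probably because I have not gone deep yet.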

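2. First, learn pyplot, the plotting subpackage of matplotlib. plt.figure() creates a figure; the figure's add_subplot() method then adds subplots and sets where each one sits in the grid. For example, add_subplot(1,2,1) means a grid of 1 row and 2 columns, with the new subplot in the first position.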

import pandas as pd
import matplotlib.pyplot as plt

# Load the Kaggle Titanic training set
dt=pd.read_csv('c:\\users\\administrator\\desktop\\tatannic\\train.csv')

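# First plot: age vs. number of people
# (column names in the Kaggle file are capitalized: Age, Fare, Pclass, Survived)
age=dt.Age
# Fill missing ages with the mean age
mean=age.mean()
age=age.fillna(mean)
fig=plt.figure()
# 1 row, 2 columns, first position
ax=fig.add_subplot(1,2,1)
ax.hist(age,bins=10)
ax.set_xlabel('age')
ax.set_ylabel('count of people')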

# Second plot: fare vs. number of people
fare=dt.Fare
# 1 row, 2 columns, second position
ax=fig.add_subplot(1,2,2)
ax.hist(fare,bins=10)
ax.set_xlabel('fare')
plt.show()

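# Box plot of the fare distribution
fig2=plt.figure()
ax=fig2.add_subplot(1,1,1)
ax.boxplot(fare)
ax.set_xlabel('fare')
# The vertical axis of a box plot shows the fare values, not a head count
ax.set_ylabel('fare value')
plt.show()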
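The same two-panel layout can also be tried without the Kaggle file. The following is only a minimal sketch: the table `toy` and its values are made up for illustration (they are not from train.csv), and it uses `plt.subplots`, which creates the figure and both axes in one call. Labels are set on each axes object so they cannot land on the wrong panel.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for train.csv; one age is missing.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
})

# Fill the missing age with the mean of the known ages.
age = toy["Age"].fillna(toy["Age"].mean())

# One figure with a 1x2 grid of axes.
fig, (ax_age, ax_fare) = plt.subplots(1, 2, figsize=(8, 3))
ax_age.hist(age, bins=5)
ax_age.set_xlabel("age")
ax_age.set_ylabel("count of people")
ax_fare.hist(toy["Fare"], bins=5)
ax_fare.set_xlabel("fare")
fig.tight_layout()
# fig.savefig("toy_hist.png")  # hypothetical file name; uncomment to write the image
```

Filling with the mean keeps every row in the histogram, but it also piles the filled rows into a single bin, so the age histogram can show an artificial spike near the mean.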

3. groupby on the data

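I have been learning data processing by comparing SQL, R, and Python side by side. The underlying idea is basically the same in all three: operate on a single table.

In pandas, groupby is a DataFrame method. Its argument is a column name, or a list of column names, to group on.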

>dt.groupby(['Pclass','Survived']).Pclass.count()
Out[52]:
Pclass  Survived
1       0            80
        1           136
2       0            97
        1            87
3       0           372
        1           119
Name: Pclass, dtype: int64

This groups the rows by Pclass and Survived together and counts each combination separately.
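To try this without the file, the six counts printed above are enough to rebuild the two columns. This is a minimal sketch, assuming the column names Pclass and Survived; `dt_small` is a hypothetical stand-in that contains only those two columns. `size()` counts the rows in each group, and `unstack()` moves the inner index level into columns, which reads more like a cross table in R.

```python
import pandas as pd

# (Pclass, Survived) -> number of passengers, copied from the output above.
counts = {(1, 0): 80, (1, 1): 136, (2, 0): 97, (2, 1): 87, (3, 0): 372, (3, 1): 119}

# Rebuild one row per passenger; dt_small is a stand-in for the real table.
rows = [key for key, n in counts.items() for _ in range(n)]
dt_small = pd.DataFrame(rows, columns=["Pclass", "Survived"])

# Group by both columns and count the rows in each group.
by_class_and_outcome = dt_small.groupby(["Pclass", "Survived"]).size()
print(by_class_and_outcome)

# Move the Survived level into columns: one row per class, one column per outcome.
table = by_class_and_outcome.unstack("Survived")
print(table)
```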

Count the number of Survived values, that is, the number of passengers, in each Pclass:

>dt.groupby('Pclass').Survived.count()
Out[66]:
Pclass
1    216
2    184
3    491
Name: Survived, dtype: int64

Count the passengers in each Pclass whose Survived value is 1. Since Survived is either 0 or 1, summing the column gives that count:

>dt.groupby(['Pclass']).Survived.sum()
Out[71]:
Pclass
1    136
2     87
3    119
Name: Survived, dtype: int64
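Because Survived only takes the values 0 and 1, dividing the sum by the count gives the survival rate per class, which is the same as the group mean. The following is a minimal sketch built from the per-class totals printed above; the small DataFrame `dt_small` is rebuilt from those totals and is only a stand-in for the real file.

```python
import pandas as pd

# Per-class totals copied from the two outputs above: (passengers, survivors).
totals = {1: (216, 136), 2: (184, 87), 3: (491, 119)}

# Rebuild a Survived column of 1s and 0s for each class.
rows = []
for pclass, (n_total, n_survived) in totals.items():
    rows += [(pclass, 1)] * n_survived + [(pclass, 0)] * (n_total - n_survived)
dt_small = pd.DataFrame(rows, columns=["Pclass", "Survived"])

# count, sum, and mean of Survived for every class in one call.
summary = dt_small.groupby("Pclass")["Survived"].agg(["count", "sum", "mean"])
print(summary.round(3))
```

The `mean` column is the share of survivors in each class, so the three classes can be compared directly even though their sizes differ.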
