Pandas Basics

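This post is a basic introduction to pandas.

If we compare them with Python lists and dictionaries, NumPy is like a list, while pandas is like a dictionary.

pandas is built on top of NumPy and makes NumPy-centric applications simpler to write.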
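
To make the list/dictionary analogy concrete, here is a minimal sketch with made-up values: a NumPy array is addressed by position, a pandas Series can also be addressed by label, and the Series keeps its values in a NumPy array underneath.

```python
import numpy as np
import pandas as pd

# A NumPy array: elements are addressed by integer position, like a list
arr = np.array([10, 20, 30])
print(arr[1])  # 20

# A Series over the same values, with labels, so it can be used like a dict
s = pd.Series(arr, index=["x", "y", "z"])
print(s["y"])  # 20

# The Series stores its values in a NumPy array
print(isinstance(s.to_numpy(), np.ndarray))  # True
```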

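To use pandas, you first need to understand its two main data structures: Series and DataFrame.

The string representation of a Series shows the index on the left and the values on the right.

If no index is specified for the data, an integer index from 0 to n-1 (where n is the length of the data) is created automatically.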

import pandas as pd
import numpy as np

s = pd.Series([1, 3, 6, np.nan, 44, 1])
print(s)
"""
0     1.0
1     3.0
2     6.0
3     NaN
4    44.0
5     1.0
dtype: float64
"""

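A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, etc.).

A DataFrame has both a row index and a column index, and can be thought of as a large dictionary of Series.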
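
The "dictionary of Series" view can be shown directly. In this sketch the row and column names are made up: each dict key becomes a column, and selecting a column hands back a Series.

```python
import pandas as pd

# Two Series that share the same index labels
height = pd.Series([170, 165, 180], index=["amy", "bob", "cat"])
weight = pd.Series([60, 55, 75], index=["amy", "bob", "cat"])

# Each dict key becomes a column name; each Series becomes a column
df = pd.DataFrame({"height": height, "weight": weight})
print(df)

# Selecting one column returns a Series, like looking up a dict key
col = df["weight"]
print(isinstance(col, pd.Series))  # True
print(col["bob"])                  # 55
```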

dates = pd.date_range('20160101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=['a', 'b', 'c', 'd'])
print(df)
"""
                   a         b         c         d
2016-01-01 -0.253065 -2.071051 -0.640515  0.613663
2016-01-02 -1.147178  1.532470  0.989255 -0.499761
2016-01-03  1.221656 -2.390171  1.862914  0.778070
2016-01-04  1.473877 -0.046419  0.610046  0.204672
2016-01-05 -1.584752 -0.700592  1.487264 -1.778293
2016-01-06  0.633675 -1.414157 -0.277066 -0.442545
"""

print(df['b'])
"""
2016-01-01   -2.071051
2016-01-02    1.532470
2016-01-03   -2.390171
2016-01-04   -0.046419
2016-01-05   -0.700592
2016-01-06   -1.414157
Freq: D, Name: b, dtype: float64
"""

Create a DataFrame df1 without specifying row or column labels.

It uses the default integer labels starting from 0.

df1 = pd.DataFrame(np.arange(12).reshape((3, 4)))
print(df1)
"""
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
"""

There is another way to create a DataFrame,

which lets you specify the data for each column individually:

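df2 = pd.DataFrame({'a': 1., 'b': pd.Timestamp('20130102'),
                    'c': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'd': np.array([3] * 4, dtype='int32'),
                    'e': pd.Categorical(['test', 'train', 'test', 'train']), 'f': 'foo'})
print(df2)
"""
     a          b    c  d      e    f
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
"""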

The dtypes attribute shows the data type of each column:

print(df2.dtypes)
"""
a          float64
b    datetime64[ns]
c          float32
d            int32
e         category
f           object
dtype: object
"""

index shows the row labels:

print(df2.index)
# Int64Index([0, 1, 2, 3], dtype='int64')
# (pandas 2.0 and later print Index([0, 1, 2, 3], dtype='int64'))

columns shows the column names:

print(df2.columns)
# Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')
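
Both `index` and `columns` are pandas Index objects, which behave like immutable arrays of labels. A sketch on a made-up frame, also using `set_index` to turn a column into the row labels:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Index objects support list conversion, membership tests and len()
print(df.columns.tolist())  # ['a', 'b']
print("b" in df.columns)    # True
print(len(df.index))        # 3

# set_index turns a column into the row labels
labelled = df.set_index("a")
print(labelled.index.tolist())  # [1, 2, 3]
print(labelled.loc[2, "b"])     # 5
```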

To view only the values of df2:

print(df2.values)
"""
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']], dtype=object)
"""

Use describe() to see a statistical summary of the data:

print(df2.describe())
"""
         a    c    d
count  4.0  4.0  4.0
mean   1.0  1.0  3.0
std    0.0  0.0  0.0
min    1.0  1.0  3.0
25%    1.0  1.0  3.0
50%    1.0  1.0  3.0
75%    1.0  1.0  3.0
max    1.0  1.0  3.0
"""

T transposes the table, swapping its rows and columns:

print(df2.T)
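
`.T` is shorthand for `.transpose()`. A small numeric sketch (made-up data) shows how the shape and the label positions swap:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape((2, 3)), columns=["a", "b", "c"])

# Rows become columns and columns become rows
t = df.T
print(t)
print(df.shape, t.shape)  # (2, 3) (3, 2)

# The value at row 1, column "b" moves to row "b", column 1
print(t.loc["b", 1])      # 4
```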

Sort by the index labels (axis=1 sorts the column labels; ascending=False gives descending order):

print(df2.sort_index(axis=1, ascending=False))
"""
     f      e  d    c          b    a
0  foo   test  3  1.0 2013-01-02  1.0
1  foo  train  3  1.0 2013-01-02  1.0
2  foo   test  3  1.0 2013-01-02  1.0
3  foo  train  3  1.0 2013-01-02  1.0
"""

Sort the rows by the values in a column:

print(df2.sort_values(by='b'))
"""
     a          b    c  d      e    f
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
"""

Reference: 莫煩Python (Morvan Python), which is simple and easy to follow!
