Using stack in pandas

2021-08-21 05:33:08

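Sometimes feature names (column labels) need to be turned into the values of a variable, that is, the dataset has to be reshaped from wide (horizontal) to long (vertical) form, which is loosely a kind of transpose. A typical use case looks like this: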

# The dataset

In [5]: test
Out[5]:
             tweet_id  doggo  floofer  pupper  puppo
0  675003128568291329   none     none    none   none
1  786233965241827333   none     none    none   none
2  683481228088049664   none     none  pupper   none
3  675497103322386432   none     none    none   none
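To follow along, the sample can be rebuilt by hand. The sketch below is an assumption about how `test` was created: the post only shows its printed form, so storing `tweet_id` as an integer and the placeholder as the string 'none' are guesses that are consistent with the later comparison `s2.stage != 'none'`.

```python
import pandas as pd

# Assumed reconstruction of the four-row sample. The placeholder 'none' is a
# plain string, not Python's None, so it is an ordinary value to pandas.
test = pd.DataFrame({
    "tweet_id": [675003128568291329, 786233965241827333,
                 683481228088049664, 675497103322386432],
    "doggo": ["none"] * 4,
    "floofer": ["none"] * 4,
    "pupper": ["none", "none", "pupper", "none"],
    "puppo": ["none"] * 4,
})

print(test)
```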

# Set the index first, then call .stack() to go from wide to long, and give the resulting values a name

In [6]: s1 = test.set_index('tweet_id').stack().rename('stage')

In [7]: s1
Out[7]:
tweet_id
675003128568291329  doggo        none
                    floofer      none
                    pupper       none
                    puppo        none
786233965241827333  doggo        none
                    floofer      none
                    pupper       none
                    puppo        none
683481228088049664  doggo        none
                    floofer      none
                    pupper     pupper
                    puppo        none
675497103322386432  doggo        none
                    floofer      none
                    pupper       none
                    puppo        none
Name: stage, dtype: object
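What `.stack()` does here can be checked directly: the four column labels move into a second, inner index level, so 4 rows by 4 columns become a Series of 16 values keyed by (tweet_id, column label). The sketch below assumes the same hand-built `test` frame as the sample output; the construction is a guess at the original data, not the author's code.

```python
import pandas as pd

# Assumed reconstruction of the sample data.
test = pd.DataFrame({
    "tweet_id": [675003128568291329, 786233965241827333,
                 683481228088049664, 675497103322386432],
    "doggo": ["none"] * 4,
    "floofer": ["none"] * 4,
    "pupper": ["none", "none", "pupper", "none"],
    "puppo": ["none"] * 4,
})

s1 = test.set_index("tweet_id").stack().rename("stage")

# The result is a Series with a two-level MultiIndex.
print(s1.index.nlevels)
# The inner level has no name because the columns axis had none.
print(list(s1.index.names))
# One value per (row, column) pair of the wide frame.
print(len(s1))
# A single cell is addressed by the pair (tweet_id, original column label).
print(s1.loc[(683481228088049664, "pupper")])
```

The unnamed inner level is the reason the next step produces a column called `level_1`.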

# Reset the MultiIndex

In [8]: s2 = s1.reset_index()

In [9]: s2
Out[9]:
              tweet_id  level_1   stage
0   675003128568291329    doggo    none
1   675003128568291329  floofer    none
2   675003128568291329   pupper    none
3   675003128568291329    puppo    none
4   786233965241827333    doggo    none
5   786233965241827333  floofer    none
6   786233965241827333   pupper    none
7   786233965241827333    puppo    none
8   683481228088049664    doggo    none
9   683481228088049664  floofer    none
10  683481228088049664   pupper  pupper
11  683481228088049664    puppo    none
12  675497103322386432    doggo    none
13  675497103322386432  floofer    none
14  675497103322386432   pupper    none
15  675497103322386432    puppo    none
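The `level_1` column is simply the unnamed inner index level turned into a column. If that column is not needed, one possible shortcut is to discard the level while resetting. This is a sketch under the same assumed sample data, not the approach the post itself takes.

```python
import pandas as pd

# Assumed reconstruction of the sample data.
test = pd.DataFrame({
    "tweet_id": [675003128568291329, 786233965241827333,
                 683481228088049664, 675497103322386432],
    "doggo": ["none"] * 4,
    "floofer": ["none"] * 4,
    "pupper": ["none", "none", "pupper", "none"],
    "puppo": ["none"] * 4,
})
s1 = test.set_index("tweet_id").stack().rename("stage")

# Plain reset_index(): every index level becomes a column; the unnamed
# inner level gets the automatic name 'level_1'.
s2 = s1.reset_index()
print(list(s2.columns))

# Variant: drop the inner level first, then turn tweet_id into a column.
s2_short = s1.reset_index(level=1, drop=True).reset_index()
print(list(s2_short.columns))
```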

# Drop the level_1 column, and keep only the rows whose stage is not 'none'

In [10]: s2.drop(['level_1'], axis=1, inplace=True)

In [11]: s3 = s2[s2.stage != 'none']

In [12]: s3
Out[12]:
              tweet_id   stage
10  683481228088049664  pupper

# Merge back into the original dataset

In [14]: result = pd.merge(test, s3, how='left', on='tweet_id')

In [15]: result
Out[15]:
             tweet_id  doggo  floofer  pupper  puppo   stage
0  675003128568291329   none     none    none   none     NaN
1  786233965241827333   none     none    none   none     NaN
2  683481228088049664   none     none  pupper   none  pupper
3  675497103322386432   none     none    none   none     NaN
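The left merge keeps one row per tweet only because each tweet in the sample has at most one stage. The sketch below uses hypothetical data (made-up ids, a tweet tagged with two stages) to show that the same steps then yield an extra row, and that the `validate` argument of `pd.merge` can be used to detect this.

```python
import pandas as pd

# Hypothetical data: tweet 2 carries two stages at once.
test = pd.DataFrame({
    "tweet_id": [1, 2],
    "doggo": ["none", "doggo"],
    "pupper": ["none", "pupper"],
})

s3 = (
    test.set_index("tweet_id")
    .stack()
    .rename("stage")
    .reset_index(level=1, drop=True)
    .reset_index()
)
s3 = s3[s3["stage"] != "none"]

# Tweet 2 matches two rows of s3, so the merged frame has 3 rows, not 2.
result = pd.merge(test, s3, how="left", on="tweet_id")
print(len(test), len(result))

# validate="one_to_one" makes pandas raise instead of silently duplicating.
rejected = False
try:
    pd.merge(test, s3, how="left", on="tweet_id", validate="one_to_one")
except pd.errors.MergeError as exc:
    rejected = True
    print("merge rejected:", exc)
```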

# Drop the intermediate feature columns to get the final result

In [16]: result.drop(['doggo','floofer','pupper','puppo'], axis=1)
Out[16]:
             tweet_id   stage
0  675003128568291329     NaN
1  786233965241827333     NaN
2  683481228088049664  pupper
3  675497103322386432     NaN

In [17]: test
Out[17]:
             tweet_id  doggo  floofer  pupper  puppo
0  675003128568291329   none     none    none   none
1  786233965241827333   none     none    none   none
2  683481228088049664   none     none  pupper   none
3  675497103322386432   none     none    none   none

There should be a simpler, more convenient way to do this. To be added later.
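One candidate for a shorter route is `pandas.melt`, which reshapes wide to long without going through the index. The sketch below is only a suggestion under the assumed sample data, not the author's follow-up; like the steps above, it assumes each tweet has at most one stage other than 'none'.

```python
import pandas as pd

# Assumed reconstruction of the sample data.
test = pd.DataFrame({
    "tweet_id": [675003128568291329, 786233965241827333,
                 683481228088049664, 675497103322386432],
    "doggo": ["none"] * 4,
    "floofer": ["none"] * 4,
    "pupper": ["none", "none", "pupper", "none"],
    "puppo": ["none"] * 4,
})

# melt keeps tweet_id as an identifier and turns the other columns into
# (variable, stage) pairs; 'variable' is melt's default name for the labels.
long_form = test.melt(id_vars="tweet_id", value_name="stage")

# Keep only real stages and the two columns of interest.
stages = long_form.loc[long_form["stage"] != "none", ["tweet_id", "stage"]]

# Left merge onto the ids so tweets without a stage get NaN.
result = test[["tweet_id"]].merge(stages, how="left", on="tweet_id")
print(result)
```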
