1.獲取字串的去重後列表
2.構造全為0的陣列(dataframe), columns為字串的列表
3.給全為0的陣列賦值
第一步
importpandas as pd
import
numpy as np
df = pd.dataframe()
#print(df)
print('
=' * 40)
print(df['c'
])"""
0 one,two,three
1 one,two
2 two,four
3 two,five,four,six
4 seven,eight,one
5 nine,ten,six,four
6 ten,six,two,seven
name: c, dtype: object
"""a = df['
c'].str.split(','
(a)"""
0 [one, two, three]
1 [one, two]
2 [two, four]
3 [two, five, four, six]
4 [seven, eight, one]
5 [nine, ten, six, four]
6 [ten, six, two, seven]
name: c, dtype: object
"""print('
=' * 50)
a_lst = df['
c'].str.split(','
).tolist()
(a_lst)
#[['one', 'two', 'three'], ['one', 'two'], ['two', 'four'],
#['two', 'five', 'four', 'six'], ['seven', 'eight', 'one'],
#['nine', 'ten', 'six', 'four'], ['ten', 'six', 'two', 'seven']]
print('
*' * 60)
new_lst =
for i in
a_lst:
for j in
i:
if j not
innew_lst:
(new_lst)
#['one', 'two', 'three', 'four', 'five',
#'six', 'seven', 'eight', 'nine', 'ten']
第二步
df_zeros = pd.dataframe(data=np.zeros((df.shape[0], len(new_lst))), columns=new_lst)(df_zeros)
"""one two three four five six seven eight nine ten
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
"""
方法二(資料量大的情況下使用)
for i innew_lst:
df_zeros[i][df['c
'].str.contains(i)] = 1
print(df_zeros)
第三步
for i inrange(df_zeros.shape[0]):
df_zeros.loc[i, a_lst[i]] = 1
(df_zeros)
"""one two three four five six seven eight nine ten
0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0
4 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0
5 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0
6 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0
"""
資料分析 series字串離散化
問題 1 假設dataframe中有一列名為type,其字段中內容為a,b,c 等用,隔開的值,如 type a,b,c a,f,x b,c,e 統計type中每個型別出現的次數 並繪圖 import pandas as pd import numpy as np from matplotlib i...
06 統計方法和字串離散化
假設現在我們有一組從2006年1000部最流行的電影資料,我們想知道這些電影資料中的評分的平均分,導演的人數等資訊,我們應該怎麼獲取?import pandas as pd from matplotlib import pyplot as plt file path imdb movie data....
python統計電影分類(字串離散化案例)
以下兩句是顯示中文的方法 from pylab import mpl.rcparams font.sans serif simhei 有效的方法 file path c users ming desktop dataanalysis master day05 code imdb movie data...