Pandas data operations: computing the mean and deduplicating data

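We have a dataset of the 1000 most popular movies from 2006 to 2016. How can we find the mean rating, the number of directors, the number of actors, and similar information for these movies?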

The dataset summary is as follows:

rank 1000 non-null int64

title 1000 non-null object

genre 1000 non-null object

description 1000 non-null object

director 1000 non-null object

actors 1000 non-null object

year 1000 non-null int64

runtime (minutes) 1000 non-null int64

rating 1000 non-null float64

votes 1000 non-null int64

revenue (millions) 872 non-null float64

metascore 936 non-null float64

dtypes: float64(3), int64(4), object(5)

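import pandas as pd

# Load the dataset and take a first look at it
df = pd.read_csv("imdb-movie-data.csv")
print(df.head(1))
df.info()  # info() prints the summary itself and returns None

# Mean rating
print(df["rating"].mean())

# Number of distinct directors
print(df["director"].nunique())

# Split each comma-separated actor string into a list of names
temp_actors_list = df["actors"].str.split(", ").tolist()
# Flatten the nested lists, then deduplicate with a set
actors_list = [actor for actors in temp_actors_list for actor in actors]
actors_num = len(set(actors_list))
print(actors_num)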
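The same three steps can be checked on a tiny in-memory table without the CSV file. This is only a sketch: the three rows and the single-letter names are made-up sample data, and `Series.explode` is used as an alternative way to flatten the actor lists, not as the method the code above uses.

```python
import pandas as pd

# Hypothetical three-row sample that mimics the columns used above.
sample = pd.DataFrame({
    "director": ["A", "B", "A"],
    "actors": ["X, Y", "Y, Z", "X, W"],
    "rating": [8.0, 7.0, 6.0],
})

# Mean of a numeric column
mean_rating = sample["rating"].mean()

# Number of distinct values in a column
director_count = sample["director"].nunique()

# Split each string into a list, give every name its own row, then count distinct names
actor_count = sample["actors"].str.split(", ").explode().nunique()

print(mean_rating, director_count, actor_count)
```

Because the actors are stored as one comma-separated string per movie, deduplicating the column directly would count distinct cast strings rather than distinct people, which is why the values are split first.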

Computing a rolling mean with pandas

Use df.rolling(3, center=True).mean(). To compute the minimum, maximum, sum, and so on, change the final method, for example df.rolling(3, center=True).min(), df.rolling(3, center=True).max(), or df.rolling(3, center=True).sum(). Other parameters...
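A short sketch of what a centered window of size 3 does, using a made-up five-element Series. The `min_periods` argument is a real `rolling` parameter, but it is shown here as an illustration only; the original text does not say which other parameters it had in mind.

```python
import pandas as pd

# Made-up sample values
s = pd.Series([1, 2, 3, 4, 5])

# Each result is the mean of the element and its two neighbours.
# The first and last positions lack a full window, so they are NaN.
centered_mean = s.rolling(3, center=True).mean()

# With min_periods=1 the incomplete windows at the edges are averaged as well.
padded_mean = s.rolling(3, center=True, min_periods=1).mean()

print(centered_mean.tolist())
print(padded_mean.tolist())
```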

Computing an average with a Hive UDAF

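I have recently been working on a data migration project, moving the aggregation logic from Kettle to a Hadoop cluster, which means writing many aggregation scripts. On a forum I found a cube-like UDAF written by a colleague at Alipay, but running it raised errors. There were several parts I did not understand and the code had no comments, so I had to start from the basics and write my own. I had written UDFs before, so getting started was fairly quick. Preparation: 1. To implement your own UDAF you need to extend u...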

Computing an average with Spark

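val rdd = sc.makeRDD(List(("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("b", 3), ("b", 4), ("a", 4)), 2)
rdd.combineByKey((x: Int) => (x, 1), (x: (Int, Int), y: Int) => (x._1 + y, x._2 + 1), (x: (Int, Int), y: (Int, Int)) => (x._1 + y._1, x._2 + y._2)).mapVa...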
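The three functions passed to combineByKey above build a (sum, count) pair per key. The sketch below imitates that flow in plain Python so it can be run without Spark: `combine_by_key` is a hypothetical helper, not a Spark API, the split into two partitions is an assumption based on the trailing 2 in the snippet, and the final division is a guess at where the truncated last step was heading.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Pure-Python imitation of the combineByKey flow (hypothetical helper, not Spark)."""
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                # Key already seen in this partition: fold the value into its combiner
                acc[key] = merge_value(acc[key], value)
            else:
                # First value for this key in this partition: start a new combiner
                acc[key] = create_combiner(value)
        partials.append(acc)

    merged = {}
    for acc in partials:
        for key, comb in acc.items():
            # Combine the per-partition results for the same key
            merged[key] = merge_combiners(merged[key], comb) if key in merged else comb
    return merged


data = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("b", 3), ("b", 4), ("a", 4)]
partitions = [data[:4], data[4:]]  # assumed split into two partitions

sums = combine_by_key(
    partitions,
    lambda v: (v, 1),                          # value -> (sum, count)
    lambda c, v: (c[0] + v, c[1] + 1),         # add one value to (sum, count)
    lambda c1, c2: (c1[0] + c2[0], c1[1] + c2[1]),  # merge two (sum, count) pairs
)

# Divide each sum by its count to get the per-key average
averages = {key: total / count for key, (total, count) in sums.items()}
print(averages)
```

Carrying the count alongside the sum is what makes the average mergeable across partitions; averaging per partition and then averaging those results would be wrong whenever the partitions hold different numbers of values for a key.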
