A Design Approach for a User Retention Model

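User retention is one of the most commonly used metrics in user analytics.

We often receive requests like these:

We want to see retention for day 1, day 2, day 3, day 4 … day 7

We want to see retention for day 1, day 2, day 3, day 4 … day 28

And some that don't follow the usual pattern:

We want to see day-33 retention

We want to see day-56 retention

Writing seven consecutive days of retention in code is already enough to give you hand cramps, and you start wishing for a code generator to write the code for you.

Then comes a request for 28 consecutive days … You assume that is the end of it, only for day 33, day 49, and day 56 to follow.

The table can be designed like this: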

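-- liucun is pinyin for "retention"; key lcN means the user was active again N days later.
create external table if not exists gdm_user_left_info_day (
  uuid string comment 'user id',
  liucun_map map<string,string> comment '90-day retention flags per user: map(lc1:1,lc2:1...lc90:1)'
) comment 'Retention info for daily active users, days 1 to 90'
partitioned by (
  day string
) stored as orc;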

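Of course, the table is not limited to these two columns; other attributes can be added as actual needs dictate.

Only the most recent 90 days of retention are computed because, on investigation, 99% of requests ask for retention within 90 days.

The pseudocode is as follows: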

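-- Dynamic partitioning must be enabled to write "partition (day)" without a fixed value.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Assumption: $day is the earliest cohort date in the window and $dayagoN is
-- the date N days after it; with the opposite reading both ranges would be empty.
insert overwrite table gdm_user_left_info_day partition (day)
select
  t1.uuid,
  -- Build 'lc<N>@@@1' per return day, join the pieces with ',', then parse them into a map.
  str_to_map(
    concat_ws(',', collect_set(concat('lc', cast(datediff(t2.day, t1.day) as string), '@@@', '1'))),
    ',', '@@@'
  ) as liucun_map,
  t1.day
from (
  select day, uuid
  from active
  where day >= '$day' and day <= '$dayago90'
) t1
-- The where clause below discards unmatched rows, so this behaves as an inner join.
left join (
  select day, uuid
  from active
  where day >= '$dayago1' and day <= '$dayago91'
) t2 on t1.uuid = t2.uuid
-- Keep only later activity within the 90-day horizon.
where datediff(t2.day, t1.day) between 1 and 90
-- One output row per user and cohort day; required because collect_set is an aggregate.
group by t1.uuid, t1.day
;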

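Here, active is the daily active users table. With this in place, the last 90 days of user retention are refreshed every day, which takes care of running the numbers, and the table already holds each user's retention for days 1 to 90.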
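To make the nested map expression easier to follow, the sketch below applies the same str_to_map step to a hand-written literal instead of real data. The sample string is made up for illustration: it stands for a user who came back 1 day and 3 days after the cohort day. It assumes a Hive version that allows select without a from clause (0.13 or later).

```sql
-- Illustrative literal only: a user who returned on day 1 and day 3.
select
  str_to_map('lc1@@@1,lc3@@@1', ',', '@@@')        as liucun_map,
  str_to_map('lc1@@@1,lc3@@@1', ',', '@@@')['lc3'] as day3_flag,  -- '1'
  str_to_map('lc1@@@1,lc3@@@1', ',', '@@@')['lc2'] as day2_flag;  -- NULL, the key is absent
```

Because a missing key yields NULL and sum skips NULLs, days on which a user did not return simply do not contribute to a retention count.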

-- Count the day-7 and day-20 retained users among the users active on 2019-08-01:
select
  sum(cast(liucun_map['lc7'] as int)) as retained_d7,
  sum(cast(liucun_map['lc20'] as int)) as retained_d20
from gdm_user_left_info_day
where day = '2019-08-01'
;
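The irregular requests from the introduction (day 33, day 56) need no new job under this design, only a different map key. The sketch below is one possible way to turn the counts into rates and to list every stored day for a cohort. It assumes that active has one or more rows per (day, uuid). It takes the cohort size from active rather than from gdm_user_left_info_day, since the load query filters on datediff and therefore keeps only users who returned at least once. The column aliases are made up for illustration.

```sql
-- Day-33 and day-56 retention for the 2019-08-01 cohort, as counts and rates.
select
  count(*)                                         as cohort_size,
  count(r.liucun_map['lc33'])                      as retained_d33,  -- counts non-NULL flags only
  count(r.liucun_map['lc56'])                      as retained_d56,
  round(count(r.liucun_map['lc33']) / count(*), 4) as rate_d33,
  round(count(r.liucun_map['lc56']) / count(*), 4) as rate_d56
from (
  -- distinct guards against duplicate activity rows for the same user.
  select distinct uuid
  from active
  where day = '2019-08-01'
) a
left join (
  select uuid, liucun_map
  from gdm_user_left_info_day
  where day = '2019-08-01'
) r on a.uuid = r.uuid;

-- Retained users for every stored day of the same cohort, one row per day.
select
  cast(substr(m.k, 3) as int) as day_n,   -- 'lc7' -> 7
  count(*)                    as retained_users
from gdm_user_left_info_day
lateral view explode(liucun_map) m as k, v
where day = '2019-08-01'
group by cast(substr(m.k, 3) as int)
order by day_n;
```

The second query returns the whole retention curve in one pass, which is handy for checking that the daily refresh has filled in the days you expect before answering a specific request.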
