Time and Recommendation: A Python Verification

2021-06-21 06:47:24

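In every recommender-system architecture I have come across (all of them the watered-down textbook and classroom versions), purchase order is not considered: the basic recommenders all treat purchase behavior as a bag of words when filtering and recommending.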

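But this raises a question: is the order of purchases really unrelated to the next purchase? Take headphones as an example. Someone buys a 200-yuan AKG and then a 200-yuan 阿斯翠 (Astrotec). The buyer may well find that domestic brands offer good value for money, so the third pair is again an Astrotec. If the order is Astrotec first and AKG second, the buyer may find that dynamic drivers sound more comfortable than balanced armatures, and the next pair may be a dynamic-driver headphone from yet another brand.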
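To make the contrast concrete, here is a minimal sketch. The two purchase histories are made up for illustration and are not taken from any dataset. Viewed as bags of items the two histories are indistinguishable, while as ordered sequences they differ.

```python
from collections import Counter

# Two hypothetical purchase histories: same items, different order.
history_a = ["AKG", "Astrotec"]
history_b = ["Astrotec", "AKG"]

# Bag-of-words view: only item counts survive, so the order is lost.
same_as_bags = Counter(history_a) == Counter(history_b)

# Ordered view: the sequences are compared position by position.
same_as_sequences = tuple(history_a) == tuple(history_b)

print(same_as_bags)       # True
print(same_as_sequences)  # False
```

Any model that only sees the first representation cannot, by construction, give these two buyers different recommendations.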

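To test this conjecture, I ran an ordered variant of the Apriori algorithm on the data from the Alibaba recommendation competition. The implementation is in Python, which is slow, so I could only use part of the dataset to get a rough result.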

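The algorithm combines Apriori with skip-k-n-grams: the collaborative filtering follows the Apriori idea, while the keys being filtered are generated as skip-k-n-grams, so that abc yields the combinations ab, ac, and bc (skip-1-2-grams).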
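The abc example can be reproduced with a few lines. This is a minimal sketch of skip-k-2-grams under the usual definition (at most k items skipped between the two members of a pair); the function name `skip_bigrams` and its parameters are my own and do not come from the script in this post.

```python
def skip_bigrams(seq, k):
    """Return ordered pairs (seq[i], seq[j]), i < j, with at most k items skipped."""
    pairs = []
    for i in range(len(seq)):
        # j covers the next k + 1 positions after i, clipped to the sequence end
        for j in range(i + 1, min(i + k + 2, len(seq))):
            pairs.append((seq[i], seq[j]))
    return pairs


pairs = skip_bigrams("abc", 1)
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Unlike an unordered item set, each pair keeps the original order of its members, which is what lets the later counting distinguish ab from ba.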

Preprocessing the data:

#!/usr/bin/env python3
# dld.py: load the log into a dict of per-user action lists
import csv
import re


def load(filename):
    print('loading')
    pattern = re.compile(r'\d+')
    maps = {}
    with open(filename, newline='') as f:
        for line in csv.reader(f):
            date = pattern.findall(line[-1])
            # month * 100 + day; parsed here but not used below
            time = int(date[0]) * 100 + int(date[1])
            uid = int(line[0])
            op = line[2] + '-' + line[1]
            if uid in maps:
                maps[uid].append(op)
            else:
                maps[uid] = [op]
    return maps

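The returned data has the form {uid: [op, op, ...]}, where each op is the string line[2] + '-' + line[1] and the ops of a user are in time order. The code keeps the row order of the file, so this relies on the file already being sorted by time.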
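As an illustration of that structure, the sketch below applies the same per-row grouping to a few made-up rows held in memory. The column layout (user id, item id, action type, date) is an assumption read off the indices used in load, the values are invented, and the helper name `group_ops` is hypothetical. The date column is ignored here.

```python
import csv
import io

# Made-up rows; the column layout is an assumption, not the real dataset schema.
sample = io.StringIO(
    "1,100,0,6/4\n"
    "1,200,1,6/5\n"
    "2,100,0,7/1\n"
)


def group_ops(rows):
    """Group 'type-item' strings by user id, keeping the row order."""
    maps = {}
    for line in rows:
        uid = int(line[0])
        op = line[2] + "-" + line[1]
        maps.setdefault(uid, []).append(op)
    return maps


ops_by_user = group_ops(csv.reader(sample))
print(ops_by_user)  # {1: ['0-100', '1-200'], 2: ['0-100']}
```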

Verifying my idea with a variant of Apriori:

#!/usr/bin/env python3
import dld

data = dld.load('data.csv')
threshold = 10


def patterns(old, uid, i, k):
    """skip-k-n-gram algorithm for user actions"""
    # first call: wrap the single action in a list
    if not isinstance(old, list):
        old = [old]
    # the whole action list of this user
    line = data[uid]
    result = []  # [pattern, uid, endpoint in action list]
    j = 0
    while j < k and (i + j + 1) < len(line):
        # extend the pattern with one of the next k actions
        result.append([old + [line[i + j + 1]], uid, i + j + 1])
        j = j + 1
    return result


def passes(pattern, t):
    """Regroup the output of patterns() and filter it
    with threshold t (the pruning step of Apriori)."""
    pattern.sort(key=lambda record: record[0])
    print('sorted %d' % len(pattern))
    passed = []  # [pattern, [[uid, endpoint], [uid, endpoint], ...]]
    last = ['', 0, 0]  # used to merge identical patterns
    for item in pattern:
        if item[0] == last[0]:  # same pattern as the previous record
            passed[-1][1].append([item[1], item[2]])
        else:  # new pattern
            passed.append([item[0], [[item[1], item[2]]]])
            last = item
    passed = [i for i in passed if len(i[1]) > t]  # filter by support
    return passed


# main
print('first round')
pattern = []
for uid in data:
    line = data[uid]
    for i, item in enumerate(line):
        pattern.extend(patterns(item, uid, i, 3))
# print(pattern)
passed = passes(pattern, threshold)
print('passed %d' % len(passed))
# print(passed)

# each further round grows the surviving patterns by one action
for j in range(2):
    print('round %d' % j)
    pattern = []
    for record in passed:
        old = record[0]
        for item in record[1]:
            pattern.extend(patterns(old, item[0], item[1], 3))
    # print(pattern)
    passed = passes(pattern, threshold)
    print('passed %d' % len(passed))
# print(passed)

print('statics')
result = [[p[0], len(p[1])] for p in passed]  # [pattern, count]
print('result %d' % len(result))
# print(result)

statics = {}
print('analyse')
for item in result:
    # group patterns whose prefixes contain the same actions in any order
    key = sorted(item[0][:-1])
    hashkey = '+'.join(key)
    if hashkey in statics:
        statics[hashkey].append([item[0], item[1]])
    else:
        statics[hashkey] = [[item[0], item[1]]]

sop = 0  # same action (same ordered prefix)
src = 0  # same recommendation
for static in statics:
    out = statics[static]
    before = len(out)
    statics[static] = []
    last = [[0], 0]
    maps = {}
    for item in out:
        # skip patterns whose first three actions are all identical
        if item[0][0] != item[0][1] or item[0][0] != item[0][2]:
            if last[0][:-1] == item[0][:-1]:
                sop = sop + 1
                # same ordered prefix: keep the more frequent continuation
                if last[1] < item[1]:
                    statics[static][-1] = item
                    last = item
            else:
                if item[0][-1] not in maps:
                    maps[item[0][-1]] = 0
                    statics[static].append(item)
                    last = item
                else:
                    src = src + 1
    # print('ya filter %d/%d' % (len(statics[static]), before))

for static in statics:
    out = statics[static]
    if len(out) > 2:
        # print(out)
        pass

print('final\n positive:%d\n same action:%d\n same reco:%d' % (len(statics), sop, src))
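The grouping idea behind this analysis can be sketched in isolation. The following is a simplified illustration, not the counting used in the script above: the records are invented, and the names `best_next_by_order` and `order_sensitive` are hypothetical. It groups ordered patterns by their unordered prefix, keeps the most frequent next action for each ordering, and reports the prefixes where different orderings lead to different next actions.

```python
from collections import defaultdict

# Hypothetical (pattern, count) records; the last element is the next action.
records = [
    (("a", "b", "x"), 12),
    (("a", "b", "y"), 5),
    (("b", "a", "y"), 11),
    (("c", "d", "z"), 20),
    (("d", "c", "z"), 15),
]


def best_next_by_order(records):
    """Map unordered prefix -> {ordered prefix: most frequent next action}."""
    best = defaultdict(dict)  # unordered prefix -> ordered prefix -> (next, count)
    for pattern, count in records:
        prefix, nxt = pattern[:-1], pattern[-1]
        key = tuple(sorted(prefix))
        if prefix not in best[key] or best[key][prefix][1] < count:
            best[key][prefix] = (nxt, count)
    return {k: {p: v[0] for p, v in orders.items()} for k, orders in best.items()}


def order_sensitive(groups):
    """Return the unordered prefixes whose orderings disagree on the next action."""
    return [k for k, orders in groups.items() if len(set(orders.values())) > 1]


groups = best_next_by_order(records)
print(order_sensitive(groups))  # [('a', 'b')]
```

In this toy data the prefix {a, b} is order-sensitive (a then b leads to x, b then a leads to y), while {c, d} leads to z either way.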

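As for the result, it is fairly clear that a comparable share of users chose a different item on their next purchase because of the order of their earlier purchases. As for why nobody uses this algorithm, or this class of algorithms, see the next post.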
