Heap sort and the top-K problem

Find the k largest values in an array of one billion numbers.

Step 1: deduplicate with a hash
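A minimal sketch of the deduplication step, assuming the distinct values fit in memory. The function name `dedupe` is illustrative and not from the original post. Python's built-in `set` is hash-based, so each membership test and insertion takes constant time on average.

```python
def dedupe(numbers):
    """Yield each distinct number once, in first-seen order."""
    seen = set()  # hash-based set of the values already emitted
    for x in numbers:
        if x not in seen:
            seen.add(x)
            yield x


if __name__ == "__main__":
    print(list(dedupe([3, 2, 3, 7, 2])))
```

If the data does not fit in memory, one possible variation is to split the numbers into smaller buckets by hash value first and deduplicate each bucket separately.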

Solution 1: partitioning

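def partition(l, left, right):
    # Partition l[left..right] in descending order around the pivot l[left]
    # and return the pivot's final index.
    low = left
    if left < right:
        key = l[left]
        high = right
        while low < high:
            # Scan from the right for a value larger than the pivot.
            while low < high and l[high] <= key:
                high -= 1
            l[low] = l[high]
            # Scan from the left for a value smaller than the pivot.
            while low < high and l[low] >= key:
                low += 1
            l[high] = l[low]
        l[low] = key
    return low


def topk(l, k):
    # Reorder l in place so that l[0:k] holds the k largest values.
    if k >= len(l):
        return  # the whole list is already the answer
    low = 0
    high = len(l) - 1
    j = partition(l, low, high)
    while j != k and low < high:
        if k > j:
            low = j + 1   # the pivot and everything left of it are in the top k
        else:
            high = j - 1  # the top k lie strictly left of the pivot
        j = partition(l, low, high)


if __name__ == "__main__":
    l = [3,2,7,4,6,5,1,8,0, 19, 23, 4, 5, 23, 3, 4, 0,1,2,3,45,6,5,34,212,3234,234,3,4,4,3,43,43,343,34,34,343,43,2]
    n = 2  # number of largest values to find
    topk(l, n)
    print('result:', l[0:n])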

result: [3234, 343]

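Idea: apply the partitioning step of quicksort. Each round takes the value at index left as the pivot and finds a position low such that every value to the left of l[low] is no smaller than the pivot and every value to the right is no larger. The search then continues in whichever side contains index k, until the pivot lands at index k; at that point l[0:k] holds the k largest values, in no particular order. Each partition pass traverses the current range once, and the range shrinks by a constant factor on average, so the average time complexity is O(n). In the worst case, when the pivots split the range badly every time, it is O(n^2).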

Advantage: the lowest average time complexity of the two solutions here.
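The quadratic worst case comes from unlucky pivots. The sketch below is a variation, not the code from this post: it picks the pivot at random, uses a simpler single-scan partition, and works on a copy instead of reordering the input. The name `top_k_quickselect` is hypothetical, and the sketch assumes the values have already been deduplicated as in step 1, since many equal values slow this partition down.

```python
import random


def top_k_quickselect(values, k):
    """Return the k largest values, in no particular order."""
    items = list(values)  # work on a copy, leave the input untouched
    if k <= 0:
        return []
    if k >= len(items):
        return items
    lo, hi = 0, len(items) - 1
    while lo < hi:
        # Move a randomly chosen pivot to the front of the current range.
        p = random.randint(lo, hi)
        items[lo], items[p] = items[p], items[lo]
        pivot = items[lo]
        # Descending partition: values larger than the pivot go to the left.
        store = lo + 1
        for i in range(lo + 1, hi + 1):
            if items[i] > pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        j = store - 1
        items[lo], items[j] = items[j], items[lo]  # pivot reaches index j
        if j == k or j == k - 1:
            break  # items[0:k] now holds the k largest values
        if j < k:
            lo = j + 1
        else:
            hi = j - 1
    return items[:k]


if __name__ == "__main__":
    data = [3, 2, 7, 4, 6, 5, 1, 8, 0, 19, 23]
    print(sorted(top_k_quickselect(data, 3), reverse=True))
```

A random pivot does not remove the O(n^2) worst case, but it makes it unlikely for any fixed input.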

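Solution 2: heap method

Idea: build a heap from the first k values, then scan the remaining values one by one. To keep the k largest values the heap must be a min-heap (not a max-heap), so that its top is the smallest of the current k candidates. If the next value is larger than the top, replace the top with it and re-adjust the heap; otherwise the value cannot be among the k largest and is discarded. Each adjustment costs O(log(k)), so the time complexity is O(n * log(k)), and only k values need to be kept in memory.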
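A minimal sketch of a heap-based top-k using Python's standard `heapq` module, which implements a min-heap on a plain list. To keep the k largest values, the heap here holds at most k elements and its top (`heap[0]`) is the smallest candidate seen so far. The function name `top_k_heap` is illustrative and not from the original post.

```python
import heapq


def top_k_heap(numbers, k):
    """Return the k largest values from an iterable, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest values seen so far
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest candidate: pop it and push x in one step.
            heapq.heapreplace(heap, x)
        # Otherwise x cannot be among the k largest and is skipped.
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    print(top_k_heap([3, 2, 7, 4, 6, 5, 1, 8, 0, 19], 3))
```

Because the input is consumed one value at a time, this version also works when the numbers are streamed from a file rather than held in a list. The standard library offers `heapq.nlargest(k, numbers)` for the same task.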

Basic heap sort implementation:

#!/usr/bin/python
# coding: utf-8


# Build a max-heap from the first `size` elements of lists.
def build_heap(lists, size):
    # Sift down every non-leaf node, from the last one up to the root.
    for i in range(size // 2 - 1, -1, -1):
        adjust_heap(lists, i, size)


# Sift lists[i] down until the subtree rooted at i is a max-heap again.
def adjust_heap(lists, i, size):
    lchild = 2 * i + 1
    rchild = 2 * i + 2
    largest = i
    if lchild < size and lists[lchild] > lists[largest]:
        largest = lchild
    if rchild < size and lists[rchild] > lists[largest]:
        largest = rchild
    if largest != i:
        lists[largest], lists[i] = lists[i], lists[largest]
        adjust_heap(lists, largest, size)


# Sort lists in place in ascending order and return it.
def heap_sort(lists):
    size = len(lists)
    build_heap(lists, size)
    for i in range(size - 1, -1, -1):
        # Move the current maximum to the end, then restore the heap on lists[0:i].
        lists[0], lists[i] = lists[i], lists[0]
        adjust_heap(lists, 0, i)
    return lists


a = [2,3,4,5,6,7,8,9,1,2,34,5,4,54,5,45,4,5,45,4,5,646,456,45,6,45,645,6,45,6,456,45,6,323,412,3,25,5,7,68,6,78,678]
print("began sort:%s" % a)
b = heap_sort(a)
print("end sort:%s" % b)

began sort:[2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 34, 5, 4, 54, 5, 45, 4, 5, 45, 4, 5, 646, 456, 45, 6, 45, 645, 6, 45, 6, 456, 45, 6, 323, 412, 3, 25, 5, 7, 68, 6, 78, 678]

end sort:[1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 8, 9, 25, 34, 45, 45, 45, 45, 45, 45, 54, 68, 78, 323, 412, 456, 456, 645, 646, 678]
