The efficiency of randomly selecting rows from a large table

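A question on Zhihu asks: in MySQL, how can you efficiently pick 3,000 random rows from 100,000 rows whose primary keys are not contiguous? The asker found that

select id from t order by rand() limit 3000

takes thirty to forty seconds, and wants to know what to do about it.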

Here is my answer.

The query is slow for two reasons:

1) rand() is evaluated 100,000 times (not 3,000 times)

2) all 100,000 rows are sorted
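The two costs can be illustrated outside the database. The following is a minimal Python sketch, not MySQL itself: the list `ids` is a hypothetical stand-in for the table, and the function names are made up for illustration. The first function generates one random key per row and sorts everything, as order by rand() does; the second draws only the positions it needs.

```python
import random

def sample_by_sorting(ids, k, rng):
    """Mimic ORDER BY rand() LIMIT k: one random key per row, then a full sort."""
    keyed = [(rng.random(), row_id) for row_id in ids]  # one random call per row
    keyed.sort()                                        # sorts all rows, not just k
    return [row_id for _, row_id in keyed[:k]]

def sample_by_position(ids, k, rng):
    """Pick k distinct positions directly: no per-row random key and no sort."""
    positions = rng.sample(range(len(ids)), k)
    return [ids[p] for p in positions]

# Hypothetical data: 100,000 ids with gaps (1, 4, 7, ...), so they are not contiguous.
ids = list(range(1, 300001, 3))
rng = random.Random(0)
slow = sample_by_sorting(ids, 3000, rng)
fast = sample_by_position(ids, 3000, rng)
print(len(slow), len(fast))
```

Both functions return 3,000 distinct ids; the difference is only in how much work is done to get them.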

My approach is as follows.

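I will use the information_schema.columns table as an example. In my database this table has 7482 rows, and I want to pick 30 column_name values from it. (The row count of information_schema.columns differs from one MySQL installation to another; the advantage of using this table is that anyone can try the example.)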

First, a look at the table:

select count(*) from information_schema.columns;

mysql> select count(*) from information_schema.columns;
+----------+
| count(*) |
+----------+
|     7482 |
+----------+
1 row in set (0.09 sec)

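The first step is a full-table query that adds a row-number column n, running from 1 to 7482. (MySQL versions before 8.0 have no row_number function, hence this method.)

select @x:=@x+1 n, column_name from information_schema.columns, (select @x:=0) x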

The result looks something like this:

| n | column_name          |
| 1 | character_set_name   |
| 2 | default_collate_name |
| 3 | description          |
| 4 | maxlen               |
| 5 | collation_name       |

The second step is a query that produces 30 random numbers between 1 and 7482:

select distinct ceil(r*c) n from (select rand() r from information_schema.columns limit 35) t1, (select count(*) c from information_schema.columns) t2 limit 30

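The result is roughly like this:

| n    |
+------+
| 4452 |
| 3838 |
| 5835 |
| 2694 |
| 3449 |
| 1678 |
| ……

Note that the subquery t1 deliberately generates 35 random numbers and the outer query applies distinct. Duplicates among 35 draws from 7482 values are rare, so this all but guarantees 30 distinct integers.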
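The oversample-then-deduplicate idea can be sketched in Python as follows. This is only an illustration under assumptions: the function name and parameters are hypothetical, and it uses 1 - random() so that the ceiling always falls in 1..total_rows, a small deviation from ceil(rand()*c) in the SQL.

```python
import math
import random

def distinct_row_numbers(total_rows, wanted, oversample, rng):
    """Draw `oversample` random row numbers, drop duplicates, keep the first `wanted`."""
    seen = []
    for _ in range(oversample):
        # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1]
        # and the ceiling lands in 1..total_rows.
        n = math.ceil((1.0 - rng.random()) * total_rows)
        if n not in seen:          # plays the role of DISTINCT
            seen.append(n)
    return seen[:wanted]           # plays the role of LIMIT

numbers = distinct_row_numbers(7482, 30, 35, random.Random(1))
print(numbers)
```

If more than five of the 35 draws were duplicates, fewer than 30 numbers would come back; with 7482 possible values that is extremely unlikely, and a larger oversample margin would reduce the risk further.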

Finally, assemble the two pieces with a left join:

select tblrand.n, tbldata.column_name from
(select distinct ceil(r*c) n from (select rand() r from information_schema.columns limit 35) t1,
(select count(*) c from information_schema.columns) t2 limit 30) tblrand
left join
(select @x:=@x+1 n, column_name from information_schema.columns, (select @x:=0) x) tbldata on tblrand.n=tbldata.n;
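For readers who want to follow the logic without a database, here is a rough Python sketch of the same three steps: number the rows, draw a few random row numbers, then look them up. It is an analogy under assumptions, not what MySQL executes; `pick_random_rows` and the sample `rows` list are hypothetical.

```python
import math
import random

def pick_random_rows(rows, wanted, oversample, rng):
    """Number the rows, draw distinct random row numbers, then join the two."""
    # Step 1: attach row numbers 1..len(rows), like the @x:=@x+1 counter.
    numbered = {n: row for n, row in enumerate(rows, start=1)}
    # Step 2: draw a few extra random row numbers and drop duplicates.
    chosen = []
    for _ in range(oversample):
        n = math.ceil((1.0 - rng.random()) * len(rows))  # always in 1..len(rows)
        if n not in chosen:
            chosen.append(n)
    chosen = chosen[:wanted]
    # Step 3: left join, keeping every chosen number (None if no row matches).
    return [(n, numbered.get(n)) for n in chosen]

# Hypothetical data: 7482 rows with no id column at all.
rows = [f"column_{i}" for i in range(7482)]
result = pick_random_rows(rows, 30, 35, random.Random(2))
for n, name in result[:5]:
    print(n, name)
```

Only the 35 random draws depend on randomness; the rows themselves are never shuffled or sorted.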

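Summary:

1) rand() is called only thirty-odd times, instead of more than 7,400 times.

2) Nothing is sorted.

3) The table's ids do not need to be contiguous. The table in this example has no id column at all.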

You can test the example above by logging in to MySQL as the root user.
