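1. The index holds more than 10 million records, so the aggregation exceeds the maximum bucket limit. After applying the following setting in Kibana, the aggregation query in step 3 can run.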
PUT _cluster/settings
{ ... }
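The request body for this step is not shown. As a rough sketch only, the snippet below builds a body of the kind that raises the bucket limit, assuming the setting involved is the dynamic cluster setting `search.max_buckets`. The value `20_000_000` and the helper name `build_max_buckets_settings` are placeholders for illustration, not the author's.

```python
import json

# Assumption: the limit being raised is the dynamic cluster setting
# `search.max_buckets`. The value is a placeholder, not the author's.
ASSUMED_MAX_BUCKETS = 20_000_000


def build_max_buckets_settings(max_buckets: int) -> dict:
    """Return a body for PUT _cluster/settings that raises the bucket limit."""
    if max_buckets <= 0:
        raise ValueError("max_buckets must be positive")
    # "persistent" settings survive a full cluster restart; "transient" ones do not.
    return {"persistent": {"search.max_buckets": max_buckets}}


settings_body = build_max_buckets_settings(ASSUMED_MAX_BUCKETS)
print(json.dumps(settings_body, indent=2))
```

The printed JSON can be pasted under `PUT _cluster/settings` in the Kibana console. A very high limit lets a single query allocate a lot of memory, so it is worth lowering again once the export is done.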
2. Run the following query in Kibana to check how many distinct domain names the aggregation yields.
GET hidden/_search
{ ... }
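The body of this query is not shown either. One common way to count distinct values is a `cardinality` aggregation, sketched below. The field name `domain`, the aggregation name `domain_count`, and both helper functions are assumptions made for illustration. The count is approximate (HyperLogLog++ based), which matters at this data volume.

```python
def build_distinct_count_query(field: str, precision_threshold: int = 40000) -> dict:
    """Build a search body that counts distinct values of `field`.

    precision_threshold trades memory for accuracy; 40000 is the maximum
    Elasticsearch honours, and counts above it are approximate.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "domain_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def read_distinct_count(response: dict) -> int:
    """Pull the distinct count out of a search response dict."""
    return response["aggregations"]["domain_count"]["value"]


# Hypothetical field name; replace with the real keyword field.
count_query = build_distinct_count_query("domain")

# Hand-written response fragment, used only to show the expected shape.
sample_response = {"aggregations": {"domain_count": {"value": 3}}}
distinct = read_distinct_count(sample_response)
```

If the reported count is close to or above the configured bucket limit, the terms aggregation in step 3 will need that limit raised first.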
3. Use an aggregation query to fetch all the domain names (deduplicated by the aggregation) and write them to a file.
# -*- coding: utf-8 -*-
import time

from elasticsearch import Elasticsearch

query_es = Elasticsearch(hosts='172.16.20.178', port=9200, timeout=30, max_retries=10)


def domain_query_write():
    # The aggregation request body is not shown in the source. It must define
    # an aggregation named "domain", which the response handling below reads.
    search_query = {}
    response = query_es.search(index='hidden', body=search_query)
    cnt = 0
    start_time = time.time()
    res_list = response["aggregations"]["domain"]["buckets"]
    # Open the output file once instead of reopening it for every record.
    with open('new_domain.txt', 'a+', encoding='utf-8') as f:
        for record in res_list:
            cnt += 1
            end_time = time.time()
            # Progress: running count, domain name, seconds elapsed.
            print(cnt, record["key"], end_time - start_time)
            f.write(record["key"] + '\n')
    return None


if __name__ == '__main__':
    domain_query_write()
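The script reads `response["aggregations"]["domain"]["buckets"]`, which is the shape a `terms` aggregation named `domain` returns. Since the `search_query` body is not shown, the following is only a guess at its general form. The field name `domain`, the `size` value, and the helper names are assumptions.

```python
def build_domain_terms_query(field: str, size: int) -> dict:
    """Build a search body with a terms aggregation named "domain".

    `size` is the number of buckets requested; it has to be at least the
    number of distinct values and no larger than the cluster's bucket limit.
    """
    return {
        "size": 0,  # skip the hits, only the buckets are needed
        "aggs": {"domain": {"terms": {"field": field, "size": size}}},
    }


def extract_keys(response: dict) -> list:
    """Return the bucket keys (the deduplicated values) from a response dict."""
    return [bucket["key"] for bucket in response["aggregations"]["domain"]["buckets"]]


# Hypothetical field name and size, for illustration only.
terms_query = build_domain_terms_query("domain", 10_000_000)

# Hand-written response fragment showing the structure the script relies on.
sample_response = {
    "aggregations": {
        "domain": {
            "buckets": [
                {"key": "a.example", "doc_count": 5},
                {"key": "b.example", "doc_count": 2},
            ]
        }
    }
}
keys = extract_keys(sample_response)
```

Each bucket's `key` is one distinct domain name, so writing the keys line by line gives the deduplicated list.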