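(1) Characteristics of order by
order by performs a global sort, so only one reducer is started to process all the rows; with large data volumes this seriously hurts performance. Even if you configure several reducers, the output is still a single file.
(2) Example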
set mapreduce.job.reduces=2;
insert overwrite local directory '/opt/datas/emp_order' row format delimited fields terminated by '\t' select * from emp order by sal;
Result:
Although two reducers were configured, the directory /opt/datas/emp_order contains only one file, 000000_0. Because order by is a global sort, set mapreduce.job.reduces=2; has no effect.
7369 smith clerk 7902 1980-12-17 800.0 \n 20
7900 james clerk 7698 1981-12-3 950.0 \n 30
7876 adams clerk 7788 1987-5-23 1100.0 \n 20
7521 ward salesman 7698 1981-2-22 1250.0 500.0 30
7654 martin salesman 7698 1981-9-28 1250.0 1400.0 30
7934 miller clerk 7782 1982-1-23 1300.0 \n 10
7844 turner salesman 7698 1981-9-8 1500.0 0.0 30
7499 allen salesman 7698 1981-2-20 1600.0 300.0 30
7782 clark manager 7839 1981-6-9 2450.0 \n 10
7698 blake manager 7839 1981-5-1 2850.0 \n 30
7566 jones manager 7839 1981-4-2 2975.0 \n 20
7788 scott analyst 7566 1987-4-19 3000.0 \n 20
7902 ford analyst 7566 1981-12-3 3000.0 \n 20
7839 king president \n 1981-11-17 5000.0 \n 10
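The single-file result can be modeled in a few lines of Python. This is only a sketch of the semantics, not how Hive implements it: the function name `order_by` and the shortened (empno, ename, sal) tuples are made up for illustration.

```python
# Toy model of order by: every row passes through one global sort,
# so there is exactly one output "file" regardless of the reducer setting.
# Rows are (empno, ename, sal), a hypothetical subset of the emp table.
rows = [
    (7839, "king", 5000.0),
    (7369, "smith", 800.0),
    (7499, "allen", 1600.0),
    (7900, "james", 950.0),
    (7782, "clark", 2450.0),
]


def order_by(rows, key, num_reducers):
    """Return the list of output files produced by a global sort.

    num_reducers is accepted but deliberately ignored, mirroring how
    the reducer setting had no effect on the order by query.
    """
    return [sorted(rows, key=key)]


files = order_by(rows, key=lambda r: r[2], num_reducers=2)
print(len(files))                     # number of output files
print([r[1] for r in files[0]])       # names in ascending sal order
```

Because one sort has to see every row, the work cannot be spread across reducers, which is where the performance cost on large inputs comes from.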
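(1) Characteristics of sort by
sort by sorts the rows inside each reducer, that is, after the rows have been split across reducers. Each output file is ordered, but the result as a whole is not.
(2) Example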
set mapreduce.job.reduces=2;
insert overwrite local directory '/opt/datas/emp_sort' row format delimited fields terminated by '\t' select * from emp sort by sal;
Result: the directory /opt/datas/emp_sort contains two files
000000_0 000001_0
[root@bigdata emp_sort]# cat 000000_0
7369 smith clerk 7902 1980-12-17 800.0 \n 20
7900 james clerk 7698 1981-12-3 950.0 \n 30
7876 adams clerk 7788 1987-5-23 1100.0 \n 20
7654 martin salesman 7698 1981-9-28 1250.0 1400.0 30
7521 ward salesman 7698 1981-2-22 1250.0 500.0 30
7844 turner salesman 7698 1981-9-8 1500.0 0.0 30
7566 jones manager 7839 1981-4-2 2975.0 \n 20
7902 ford analyst 7566 1981-12-3 3000.0 \n 20
7788 scott analyst 7566 1987-4-19 3000.0 \n 20
[root@bigdata emp_sort]# cat 000001_0
7934 miller clerk 7782 1982-1-23 1300.0 \n 10
7499 allen salesman 7698 1981-2-20 1600.0 300.0 30
7782 clark manager 7839 1981-6-9 2450.0 \n 10
7698 blake manager 7839 1981-5-1 2850.0 \n 30
7839 king president \n 1981-11-17 5000.0 \n 10
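A quick check in Python makes the "sorted per file, not globally" property concrete. The two lists are the sal columns of the two files shown above; the helper `is_sorted` is a name introduced here for illustration.

```python
# sal values copied from the two sort by output files shown above
file_0 = [800.0, 950.0, 1100.0, 1250.0, 1250.0, 1500.0, 2975.0, 3000.0, 3000.0]
file_1 = [1300.0, 1600.0, 2450.0, 2850.0, 5000.0]


def is_sorted(values):
    """True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


print(is_sorted(file_0))            # each file on its own is ordered
print(is_sorted(file_1))
print(is_sorted(file_0 + file_1))   # concatenating the files is not
```

The concatenation fails the check because 3000.0 at the end of the first file is followed by 1300.0 at the start of the second.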
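(1) Characteristics of distribute by
distribute by partitions the rows by the specified column, so rows with the same value go to the same reducer; other operations, such as sort by, are then applied within each partition.
(2) Example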
set mapreduce.job.reduces=3;
insert overwrite local directory '/opt/datas/emp_dist' row format delimited fields terminated by '\t' select * from emp distribute by deptno sort by sal;
Result: /opt/datas/emp_dist contains three files, because 3 reducers were configured
000000_0 000001_0 000002_0
[root@bigdata emp_dist]# cat 000000_0
7900 james clerk 7698 1981-12-3 950.0 \n 30
7521 ward salesman 7698 1981-2-22 1250.0 500.0 30
7654 martin salesman 7698 1981-9-28 1250.0 1400.0 30
7844 turner salesman 7698 1981-9-8 1500.0 0.0 30
7499 allen salesman 7698 1981-2-20 1600.0 300.0 30
7698 blake manager 7839 1981-5-1 2850.0 \n 30
[root@bigdata emp_dist]# cat 000001_0
7934 miller clerk 7782 1982-1-23 1300.0 \n 10
7782 clark manager 7839 1981-6-9 2450.0 \n 10
7839 king president \n 1981-11-17 5000.0 \n 10
[root@bigdata emp_dist]# cat 000002_0
7369 smith clerk 7902 1980-12-17 800.0 \n 20
7876 adams clerk 7788 1987-5-23 1100.0 \n 20
7566 jones manager 7839 1981-4-2 2975.0 \n 20
7788 scott analyst 7566 1987-4-19 3000.0 \n 20
7902 ford analyst 7566 1981-12-3 3000.0 \n 20
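The partition-then-sort behavior can be sketched in Python. The sketch assumes that an integer key is mapped to a reducer by taking it modulo the number of reducers; that assumption is consistent with the files above (30 % 3 = 0, 10 % 3 = 1, 20 % 3 = 2) but is not taken from Hive's source. The function name `distribute_sort` and the shortened rows are illustrative.

```python
from collections import defaultdict

# (ename, sal, deptno) for a few rows of the emp table
rows = [
    ("smith", 800.0, 20),
    ("allen", 1600.0, 30),
    ("ward", 1250.0, 30),
    ("clark", 2450.0, 10),
    ("king", 5000.0, 10),
    ("adams", 1100.0, 20),
    ("james", 950.0, 30),
]


def distribute_sort(rows, num_reducers):
    """Model of: distribute by deptno sort by sal."""
    partitions = defaultdict(list)
    for row in rows:
        # Assumption: integer key -> reducer index by modulo.
        partitions[row[2] % num_reducers].append(row)
    # Each reducer sorts only the rows it received.
    return {r: sorted(p, key=lambda row: row[1]) for r, p in partitions.items()}


files = distribute_sort(rows, num_reducers=3)
for reducer in sorted(files):
    print(reducer, [name for name, _, _ in files[reducer]])
```

Every department ends up in exactly one file, and within that file the rows are in ascending sal order.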
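(1) Characteristics of cluster by
cluster by specifies the partition column and also sorts by that same column. cluster by sal is equivalent to distribute by sal sort by sal, where both clauses use the same column. In practice it is rarely used.
(2) Example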
insert overwrite local directory '/opt/datas/emp_cls' row format delimited fields terminated by '\t' select * from emp cluster by sal;
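The equivalence with distribute by plus sort by on one column can be written out as a small Python model. Everything here is a sketch under assumptions: Python's built-in `hash` stands in for Hive's partitioning, the reducer count of 3 is chosen for the example, and the function names are hypothetical.

```python
def distribute_by_sort_by(rows, dist_key, sort_key, num_reducers):
    """Partition rows on dist_key, then sort each partition on sort_key."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        # Hypothetical stand-in for Hive's hash partitioning.
        partitions[hash(dist_key(row)) % num_reducers].append(row)
    return [sorted(p, key=sort_key) for p in partitions]


def cluster_by(rows, key, num_reducers):
    """cluster by k, modeled as distribute by k sort by k."""
    return distribute_by_sort_by(rows, key, key, num_reducers)


def sal(row):
    """Extract the salary column from an (ename, sal) pair."""
    return row[1]


# (ename, sal) pairs taken from the emp table
rows = [
    ("ward", 1250.0), ("scott", 3000.0), ("allen", 1600.0),
    ("martin", 1250.0), ("blake", 2850.0), ("miller", 1300.0),
]

files = cluster_by(rows, sal, num_reducers=3)
for i, f in enumerate(files):
    print(i, [s for _, s in f])
```

Rows with equal sal (ward and martin at 1250.0) always land in the same file, since the same column decides both the partition and the order.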