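HDFS Basic Usage

Technologies in the big data ecosystem:

Hadoop: the HDFS distributed file system + the MapReduce distributed computing framework + the YARN resource scheduling and management framework

Hive: a data warehouse with a SQL-like query language

ZooKeeper: distributed coordination and consistency

Sqoop: data import and collection (between relational databases and HDFS)

Flume: a data collection framework (log collection)

Storm: a real-time stream computing framework

Spark: an in-memory computing framework (Spark Core, Spark SQL, Spark Streaming) for one-stop processing

Machine learning:

Mahout: a machine learning algorithm library built on MapReduce

MLlib: a machine learning algorithm library built on Spark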

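How to learn each technology:

1) Understand its use cases and basic features

2) Use it (installation and deployment, programming conventions, API)

3) Understand how it runs

4) Understand its architecture and principles

5) Read the source code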

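Common Hadoop commands:

1) List the root directory of the HDFS file system: hadoop fs -ls /

2) Copy a local file to HDFS: hadoop fs -copyFromLocal <local file> <HDFS path>, or hadoop fs -put <local file> <HDFS path>

3) Copy an HDFS file to the local file system: hadoop fs -copyToLocal <HDFS file> <local path>, or hadoop fs -get <HDFS file> <local path>

4) Create a directory in HDFS: hadoop fs -mkdir /cq

5) View the contents of a file in HDFS: hadoop fs -cat /cq/txt

6) Copy a file from one HDFS location to another: hadoop fs -cp <source path> <destination path>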

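Roles of the HDFS components (HDFS suits write-once, read-many workloads):

1) NameNode: first mapping: maintains the directory tree, the metadata of directories and files, and the list of blocks that make up each file

Second mapping: maintains the mapping between blocks and the DataNodes that hold them

2) DataNode: stores the actual file blocks in its local working directory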
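The two NameNode mappings described above can be pictured with a small in-memory model. This is only a toy sketch using plain JDK collections, not Hadoop's actual implementation. The class name NameNodeSketch, its methods, and the block and node names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the two NameNode mappings (hypothetical, not Hadoop code).
class NameNodeSketch {
    // Mapping 1: file path -> ordered list of block IDs.
    private final Map<String, List<String>> fileToBlocks = new LinkedHashMap<>();
    // Mapping 2: block ID -> DataNodes that hold a replica of the block.
    private final Map<String, List<String>> blockToDataNodes = new LinkedHashMap<>();

    // Record that a block of the given file is stored on the given DataNodes.
    void addBlock(String path, String blockId, List<String> dataNodes) {
        fileToBlocks.computeIfAbsent(path, k -> new ArrayList<>()).add(blockId);
        blockToDataNodes.put(blockId, new ArrayList<>(dataNodes));
    }

    // Resolve a path to the DataNodes of each of its blocks, in block order.
    List<List<String>> locate(String path) {
        List<List<String>> result = new ArrayList<>();
        for (String blockId : fileToBlocks.getOrDefault(path, Collections.emptyList())) {
            result.add(blockToDataNodes.get(blockId));
        }
        return result;
    }

    public static void main(String[] args) {
        NameNodeSketch nn = new NameNodeSketch();
        nn.addBlock("/cq/a.txt", "blk_1", Arrays.asList("dn1", "dn2"));
        nn.addBlock("/cq/a.txt", "blk_2", Arrays.asList("dn2", "dn3"));
        System.out.println(nn.locate("/cq/a.txt"));
    }
}
```

A client read follows the same two steps: look up the file's blocks, then look up where each block lives, and only then contact the DataNodes for the data itself.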

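import java.net.URI;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.*;

Configuration conf = new Configuration();

FileSystem fs = FileSystem.get(new URI("hdfs://namenodeip:port"), conf, "root"); // fs is actually a DistributedFileSystem

fs.copyFromLocalFile(new Path("local file path"), new Path("HDFS path")); // upload a file to HDFS

fs.mkdirs(new Path("/cq")); // create a directory in HDFS

fs.delete(new Path("/cq"), true); // recursively delete the cq directory

RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true); // all files under / (recursive)

while (listFiles.hasNext()) {

    LocatedFileStatus file = listFiles.next();

    System.out.println("File name: " + file.getPath().getName());

}

FileStatus[] listStatus = fs.listStatus(new Path("/")); // files and directories directly under /

for (FileStatus f : listStatus) {

    System.out.println("File name: " + f.getPath().getName()); // list every file/directory

}

FileStatus jdkStatus = fs.getFileStatus(new Path("/jdk"));

BlockLocation[] fileBlockLocations = fs.getFileBlockLocations(jdkStatus, 0, jdkStatus.getLen()); // block information for the file

for (BlockLocation location : fileBlockLocations) {

    System.out.println(location.getOffset());

    System.out.println(location.getNames()[0]); // node that holds this block

}

fs.setReplication(new Path("/jdk"), (short) 2); // set the file's replica count

fs.exists(new Path("/cq")); // check whether the directory exists

fs.close();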
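The offsets printed by getOffset() in the block-location loop are the starting byte positions of each block within the file. The following sketch shows the arithmetic with plain JDK code. It assumes a 128 MB block size (commonly the default in Hadoop 2 and later, but configurable per cluster) and a hypothetical 300 MB file. BlockOffsetSketch and blockOffsets are illustrative names, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative arithmetic only; a real cluster reports offsets via BlockLocation.
class BlockOffsetSketch {
    // Start offset of each block for a file of fileLength bytes
    // that is split into blocks of blockSize bytes.
    static List<Long> blockOffsets(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Long> offsets = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            offsets.add(offset);
        }
        return offsets;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;   // assumed block size
        long fileLength = 300L * 1024 * 1024;  // hypothetical 300 MB file
        System.out.println(blockOffsets(fileLength, blockSize));
    }
}
```

Under these assumptions the file occupies three blocks, the last of which is only partly filled.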

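HDFS I/O operations

FSDataInputStream in = fs.open(new Path("HDFS file path")); // download: open the HDFS file (this line is missing from the original snippet; in is assumed to come from fs.open)

in.seek(6); // set the position to start reading from

FileOutputStream out = new FileOutputStream("c:/cq"); // write to the local file cq

IOUtils.copyBytes(in, out, new Configuration()); // copy the stream

IOUtils.closeStream(in);

IOUtils.closeStream(out);

FileInputStream localIn = new FileInputStream("c:/cq"); // upload: read the local file

FSDataOutputStream hdfsOut = fs.create(new Path("/jdk.tar.gz"));

IOUtils.copyBytes(localIn, hdfsOut, new Configuration()); // copy the stream

IOUtils.closeStream(localIn);

IOUtils.closeStream(hdfsOut);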
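The seek-then-copy pattern used for the download can be tried without a cluster. The sketch below is a local stand-in, not the Hadoop API: it uses the JDK's RandomAccessFile on a temporary file, where seek plays the same role as in.seek(6) and the read loop plays the role of the stream copy. SeekCopySketch, copyFrom, and the sample text are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-file analogue of "seek, then copy the rest of the stream".
class SeekCopySketch {
    // Copy everything from byte position offset to the end of source into out.
    static void copyFrom(Path source, long offset, OutputStream out) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(source.toFile(), "r")) {
            in.seek(offset); // skip the first offset bytes
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seek-demo", ".txt");
        Files.write(tmp, "hello hdfs world".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyFrom(tmp, 6, out); // start at byte 6, as in the snippet above
        System.out.println(out.toString("UTF-8"));
        Files.delete(tmp);
    }
}
```

Seeking before copying is what makes partial reads possible: a reader that knows a block's offset can fetch just that range instead of the whole file.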

