The WebHDFS HTTP REST API supports most file system operations on the Hadoop file system, such as reading, writing, opening, modifying, and deleting files, using the HTTP GET, POST, PUT, and DELETE methods.
HTTP Operations:
1. HTTP GET
OPEN
GETFILESTATUS
LISTSTATUS
GETCONTENTSUMMARY
GETFILECHECKSUM
GETHOMEDIRECTORY
GETDELEGATIONTOKEN
2. HTTP PUT
CREATE
MKDIRS
RENAME
SETREPLICATION
SETOWNER
SETPERMISSION
SETTIMES
RENEWDELEGATIONTOKEN
CANCELDELEGATIONTOKEN
3. HTTP POST
APPEND
4. HTTP DELETE
DELETE
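The operation list above can be captured as a small lookup table. The following is a minimal Python sketch, not part of any Hadoop client library; the names WEBHDFS_METHODS and http_method_for are hypothetical and chosen only for illustration.

```python
# Operation names grouped by HTTP method, copied from the list above.
# WEBHDFS_METHODS and http_method_for are illustrative names, not a Hadoop API.
WEBHDFS_METHODS = {
    "GET": [
        "OPEN", "GETFILESTATUS", "LISTSTATUS", "GETCONTENTSUMMARY",
        "GETFILECHECKSUM", "GETHOMEDIRECTORY", "GETDELEGATIONTOKEN",
    ],
    "PUT": [
        "CREATE", "MKDIRS", "RENAME", "SETREPLICATION", "SETOWNER",
        "SETPERMISSION", "SETTIMES", "RENEWDELEGATIONTOKEN",
        "CANCELDELEGATIONTOKEN",
    ],
    "POST": ["APPEND"],
    "DELETE": ["DELETE"],
}


def http_method_for(op: str) -> str:
    """Return the HTTP method for a WebHDFS operation name (case-insensitive)."""
    wanted = op.upper()
    for method, operations in WEBHDFS_METHODS.items():
        if wanted in operations:
            return method
    raise ValueError(f"unknown WebHDFS operation: {op}")
```

For example, http_method_for("mkdirs") returns "PUT", so a client can pick the right verb before it builds the request.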
WebHDFS FileSystem URIs:
A WebHDFS FileSystem URI has the following format:
webhdfs://<HOST>:<HTTP_PORT>/<PATH>
In the WebHDFS REST API, the prefix /webhdfs/v1 is inserted in the path and a query is appended at the end:
http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=<OPERATION_NAME>
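The URL pattern above can be assembled programmatically. This is a small Python sketch using only the standard library; the function name webhdfs_url and the host name namenode.example.com are assumptions made for the example.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=<OPERATION_NAME>.

    Extra keyword arguments are appended as query parameters. Pass values
    as strings (for example overwrite="true"), because urlencode would
    render a Python bool as "True".
    """
    query = urlencode({"op": op, **params})
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# namenode.example.com is a placeholder host, not a real cluster.
print(webhdfs_url("namenode.example.com", 50070, "/user/ubantu/input", "LISTSTATUS"))
# prints http://namenode.example.com:50070/webhdfs/v1/user/ubantu/input?op=LISTSTATUS
```

Keeping the operation name and its parameters in one query string mirrors the REST format, so the same helper works for any operation in the list.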
Configurations:
To enable WebHDFS on your Hadoop cluster, add the following properties to the hdfs-site.xml configuration file so that HDFS is accessible through the WebHDFS REST API.
1. dfs.webhdfs.enabled
This is the basic, mandatory property. Add it to hdfs-site.xml to enable HDFS access over WebHDFS.
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
2. dfs.web.authentication.kerberos.principal
The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. This property is optional; set it only if you are using Kerberos authentication.
3. dfs.web.authentication.kerberos.keytab
The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. This property is optional; set it only if you are using Kerberos authentication.
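To check whether a given hdfs-site.xml actually carries these properties, a short script can parse the file. The sketch below is illustrative only: the function names are hypothetical, the <configuration> root element is the usual Hadoop layout and is assumed here, and a missing dfs.webhdfs.enabled property is treated as "not enabled", which may not match the default of every Hadoop version.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return a {name: value} dict for every <property> in a Hadoop config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def webhdfs_enabled(props):
    """True only if dfs.webhdfs.enabled is explicitly set to true (assumption)."""
    return props.get("dfs.webhdfs.enabled", "").lower() == "true"


# Sample content; a real file would be read from disk before parsing.
SAMPLE = """<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

print(webhdfs_enabled(read_hadoop_properties(SAMPLE)))  # prints True
```

The same dictionary can be used to look up dfs.web.authentication.kerberos.principal and dfs.web.authentication.kerberos.keytab when Kerberos is in use.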
File System Operations:
1. Create and Write into file:
File creation is a two-step operation, which prevents clients from sending out data before the redirect.
Step 1: Submit an HTTP PUT request without sending the file data.
curl -i -X PUT "http://<MyHost>:50070/webhdfs/v1/user/ubantu/input?op=CREATE&overwrite=true&blocksize=1234&replication=1&permission=777&buffersize=123"
The request is redirected to a datanode where the file data is to be written. The console shows a response like this:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
Content-Length: 0
Step 2:
Submit another HTTP PUT request using the URL in the Location header with the file data to be written.
curl -i -X PUT -T /home/ubantu/hadoop/hadoop-1.0.4/input/data0.txt "http://<DATANODE>:50075/webhdfs/v1/user/ubantu/input?op=CREATE&user.name=ubantu&overwrite=true&blocksize=1234567&Token=amol"
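The two steps can also be scripted. The following Python sketch uses http.client, which never follows redirects on its own, so the Location header can be read before any data is sent. The function name webhdfs_create and the connect parameter are hypothetical; the 201 status mentioned in the comment is what the WebHDFS documentation describes for a successful create, and the sketch does not verify it.

```python
import http.client
from urllib.parse import urlsplit


def webhdfs_create(namenode_url, data, connect=http.client.HTTPConnection):
    """Create a file in two steps and return the final HTTP status code.

    namenode_url is the full step 1 URL, including ?op=CREATE and any
    other parameters. connect is a hook (an assumption of this sketch)
    that lets a different connection class be supplied.
    """
    # Step 1: PUT to the namenode with no body and read the redirect.
    nn = urlsplit(namenode_url)
    conn = connect(nn.hostname, nn.port)
    conn.request("PUT", f"{nn.path}?{nn.query}")
    resp = conn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    conn.close()
    if resp.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {resp.status}")

    # Step 2: PUT the file data to the datanode URL from the Location header.
    dn = urlsplit(location)
    conn = connect(dn.hostname, dn.port)
    conn.request("PUT", f"{dn.path}?{dn.query}", body=data)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status  # the WebHDFS documentation describes 201 Created here
```

Using the Location header as returned, rather than rebuilding the datanode URL by hand, keeps the path and parameters of step 2 consistent with step 1.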
2. Open and Read a File:
To open a file in HDFS you need to know its full path, including the file name. Use the HTTP GET request below to open and read a file from HDFS.
curl -i -L "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=OPEN&user.name=ubantu&offset=12345&length=12345678&buffersize=123123"
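The same read can be sketched in Python. urllib.request.urlopen follows the redirect for GET requests, much as curl -L does. The names open_url and read_file, and the fetch parameter, are hypothetical helpers introduced only for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def open_url(host, port, path, user, offset=None, length=None):
    """Build an op=OPEN URL; path must start with "/".

    offset and length are optional and select a byte range of the file.
    """
    params = {"op": "OPEN", "user.name": user}
    if offset is not None:
        params["offset"] = offset
    if length is not None:
        params["length"] = length
    return f"http://{host}:{port}/webhdfs/v1{path}?{urlencode(params)}"


def read_file(url, fetch=urlopen):
    """Fetch the URL and return the response body as bytes.

    fetch defaults to urlopen; it is a parameter (an assumption of this
    sketch) so that another opener can be substituted.
    """
    with fetch(url) as resp:
        return resp.read()
```

A call such as read_file(open_url("<MyHOST>", 50070, "/user/ubantu/input/data0.txt", "ubantu")) would return the whole file, while passing offset and length returns only that byte range.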
3. Delete a File:
To delete a file from HDFS, submit an HTTP DELETE request as below:
curl -i -X DELETE "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=DELETE&recursive=true"
4. Set the Owner of a File:
To set the owner and group of a file, submit an HTTP PUT request as below:
curl -i -X PUT "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=SETOWNER&owner=<USER>&group=<GROUP>"
Error Responses:
When an operation fails, the server may return a specific HTTP error code for the exception that occurred, as below:
IllegalArgumentException 400 Bad Request
UnsupportedOperationException 400 Bad Request
SecurityException 401 Unauthorized
IOException 403 Forbidden
FileNotFoundException 404 Not Found
RuntimeException 500 Internal Server Error
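A client can use this mapping to turn a failed response into a readable message. The sketch below assumes the error body is a JSON object with a RemoteException member carrying exception and message fields, which is the shape described in the WebHDFS documentation and is not shown in this post. The names STATUS_FOR_EXCEPTION and describe_error are hypothetical.

```python
import json

# Exception-to-status table copied from the list above.
STATUS_FOR_EXCEPTION = {
    "IllegalArgumentException": 400,
    "UnsupportedOperationException": 400,
    "SecurityException": 401,
    "IOException": 403,
    "FileNotFoundException": 404,
    "RuntimeException": 500,
}


def describe_error(status, body):
    """Summarize a failed response; body is the raw response text.

    Assumes a body like {"RemoteException": {"exception": ..., "message": ...}}
    and falls back to a generic message when the body has another shape.
    """
    try:
        remote = json.loads(body)["RemoteException"]
        return f"HTTP {status}: {remote['exception']}: {remote['message']}"
    except (ValueError, KeyError, TypeError):
        return f"HTTP {status}: unrecognized error body"
```

Falling back to a generic message keeps the helper safe to call on responses that come from a proxy or load balancer rather than from Hadoop itself.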
For more details on the WebHDFS REST API, visit: http://hadoop.apache.org/docs/stable/webhdfs.html
 
 
 