Thursday, May 30, 2013

WebHDFS REST API : Complete FileSystem interface for HDFS

The WebHDFS REST API exposes the complete FileSystem interface for HDFS, so you can perform most file system operations, such as opening, reading, writing, modifying and deleting files, using HTTP GET, POST, PUT and DELETE requests.

HTTP Operations:
1. HTTP GET
   OPEN
   GETFILESTATUS
   LISTSTATUS
   GETCONTENTSUMMARY
   GETFILECHECKSUM
   GETHOMEDIRECTORY
   GETDELEGATIONTOKEN
2. HTTP PUT
   CREATE
   MKDIRS
   RENAME
   SETREPLICATION
   SETOWNER
   SETPERMISSION
   SETTIMES
   RENEWDELEGATIONTOKEN
   CANCELDELEGATIONTOKEN
3. HTTP POST
   APPEND
4. HTTP DELETE
   DELETE
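Because each operation is tied to exactly one HTTP method, a client script has to pick the matching curl -X flag. The following is a minimal sketch for a POSIX shell; the function name webhdfs_method is hypothetical (not part of Hadoop), and it only knows the operations listed above.

```shell
#!/bin/sh
# Hypothetical helper: map a WebHDFS operation name to the HTTP method it uses,
# following the grouping in the list above.
webhdfs_method() {
  case "$1" in
    OPEN|GETFILESTATUS|LISTSTATUS|GETCONTENTSUMMARY|GETFILECHECKSUM|GETHOMEDIRECTORY|GETDELEGATIONTOKEN)
      echo GET ;;
    CREATE|MKDIRS|RENAME|SETREPLICATION|SETOWNER|SETPERMISSION|SETTIMES|RENEWDELEGATIONTOKEN|CANCELDELEGATIONTOKEN)
      echo PUT ;;
    APPEND)
      echo POST ;;
    DELETE)
      echo DELETE ;;
    *)
      # Unknown names are reported on stderr and signalled with a non-zero status.
      echo "unknown WebHDFS operation: $1" >&2
      return 1 ;;
  esac
}

# Example: webhdfs_method APPEND   prints POST
```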

WebHDFS FileSystem URIs:

The WebHDFS FileSystem URI has the following format:
    webhdfs://<MyHOST>:<HTTP_PORT>/<PATH>

The corresponding WebHDFS REST API URL inserts the prefix /webhdfs/v1 before the path and appends the operation as a query string:
    http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=<OPERATION_NAME>
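Converting between the two URL forms above is plain string manipulation. A minimal POSIX shell sketch, assuming the URI always uses the webhdfs:// scheme with an explicit host:port; webhdfs_rest_url is a hypothetical helper, and it does not percent-encode paths that contain spaces or other special characters.

```shell
#!/bin/sh
# Hypothetical helper: turn webhdfs://HOST:PORT/PATH plus an operation name into
#   http://HOST:PORT/webhdfs/v1/PATH?op=OPERATION
# (POSIX sh has no `local`, so the variables below are global.)
webhdfs_rest_url() {
  uri=$1
  op=$2
  case "$uri" in
    webhdfs://*) ;;
    *) echo "not a webhdfs:// URI: $uri" >&2; return 1 ;;
  esac
  rest=${uri#webhdfs://}        # HOST:PORT/PATH
  authority=${rest%%/*}         # HOST:PORT
  path=${rest#"$authority"}     # /PATH, or empty when no path was given
  [ -n "$path" ] || path=/
  printf 'http://%s/webhdfs/v1%s?op=%s\n' "$authority" "$path" "$op"
}

# Example:
#   webhdfs_rest_url webhdfs://namenode:50070/user/ubantu/input LISTSTATUS
#   prints http://namenode:50070/webhdfs/v1/user/ubantu/input?op=LISTSTATUS
```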

Configurations:
To enable WebHDFS on your Hadoop cluster, add the following properties to the hdfs-site.xml configuration file so that HDFS can be accessed through the WebHDFS REST API.

1. dfs.webhdfs.enabled
This mandatory property enables WebHDFS access to HDFS:
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

2. dfs.web.authentication.kerberos.principal
The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. This property is only needed when Kerberos authentication is enabled; per the Kerberos HTTP SPNEGO specification, the principal must start with "HTTP/".

3. dfs.web.authentication.kerberos.keytab
The Kerberos keytab file containing the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. Like the principal, this property is only needed when Kerberos authentication is enabled.
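If you generate configuration from scripts, a small helper can emit the property elements shown above. This is a hypothetical sketch (hdfs_property is not a Hadoop tool); it does not XML-escape values, and the Kerberos principal and keytab path in the comments are placeholders, not defaults.

```shell
#!/bin/sh
# Hypothetical helper: print one hdfs-site.xml <property> element.
#   $1 = property name, $2 = property value (not XML-escaped)
hdfs_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Example (placeholder values for the Kerberos properties):
#   hdfs_property dfs.webhdfs.enabled true
#   hdfs_property dfs.web.authentication.kerberos.principal 'HTTP/namenode.example.com@EXAMPLE.COM'
#   hdfs_property dfs.web.authentication.kerberos.keytab /etc/security/keytabs/spnego.service.keytab
```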

File System Operations:
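1. Create and Write to a File:
Creating a file takes two steps so that the client does not send any file data before it has been redirected to the datanode that will store it. (HTTP/1.1 addresses this with the "Expect: 100-continue" header, but some HTTP libraries do not implement it correctly, so WebHDFS uses an explicit redirect instead.)
Step 1: Submit an HTTP PUT request without any file data: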

curl -i -X PUT "http://<MyHost>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=CREATE&overwrite=true&blocksize=67108864&replication=1&permission=777&buffersize=123"

The namenode does not accept the data itself; it redirects the request to a datanode where the file data is to be written, and curl prints a response like this:

HTTP/1.1 307 TEMPORARY_REDIRECT 
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
Content-Length: 0

Step 2:
Submit a second HTTP PUT request with the file data, sending it to the URL from the Location header exactly as returned (it already carries the parameters from step 1):

curl -i -X PUT -T /home/ubantu/hadoop/hadoop-1.0.4/input/data0.txt "http://<DATANODE>:50075/webhdfs/v1/user/ubantu/input/data0.txt?op=CREATE..."
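The two steps can be scripted by capturing the Location header from step 1 and reusing it in step 2. A minimal sketch, assuming curl is installed and the cluster uses simple (non-Kerberos) authentication, so the user is passed as user.name; extract_location and webhdfs_create are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; assume curl and simple (user.name) authentication.

# Print the URL from the Location header of a raw `curl -i` response read on stdin.
# Header names are case-insensitive, and curl keeps the trailing CR, so strip it.
extract_location() {
  awk 'tolower($1) == "location:" { print $2 }' | tr -d '\r'
}

# Two-step CREATE.
#   $1 = namenode host:port (e.g. <MyHost>:50070)
#   $2 = absolute HDFS path (e.g. /user/ubantu/input/data0.txt)
#   $3 = local file to upload
#   $4 = HDFS user name
webhdfs_create() {
  # Step 1: no body; the namenode answers 307 with a datanode URL.
  location=$(curl -s -i -X PUT \
    "http://$1/webhdfs/v1$2?op=CREATE&overwrite=true&user.name=$4" | extract_location)
  if [ -z "$location" ]; then
    echo "no redirect received from $1" >&2
    return 1
  fi
  # Step 2: send the file data to the datanode URL from the Location header.
  curl -s -i -X PUT -T "$3" "$location"
}

# Example: webhdfs_create '<MyHost>:50070' /user/ubantu/input/data0.txt ./data0.txt ubantu
```

Keeping the header parsing in its own function means it can be checked against a saved response without a running cluster.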

2. Open and Read a File:
To open a file you need its full HDFS path. The following HTTP GET request opens and reads a file from HDFS; the namenode redirects the request to a datanode holding the data, and the -L option makes curl follow that redirect. The optional offset and length parameters select the byte range to read.

curl -i -L "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=OPEN&user.name=ubantu&offset=12345&length=12345678&buffersize=123123"
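Because OPEN accepts offset and length, a large file can be fetched in pieces. A sketch, assuming you already know the file size in bytes (for example from the length field that GETFILESTATUS reports); webhdfs_chunk_urls is a hypothetical helper that only prints the URLs, one per chunk.

```shell
#!/bin/sh
# Hypothetical helper: print one OPEN URL per chunk of a file.
#   $1 = REST URL of the file, without query (http://HOST:PORT/webhdfs/v1/<PATH>)
#   $2 = file size in bytes
#   $3 = chunk size in bytes (must be > 0)
#   $4 = HDFS user name
webhdfs_chunk_urls() {
  offset=0
  while [ "$offset" -lt "$2" ]; do
    len=$3
    remaining=$(( $2 - offset ))
    if [ "$remaining" -lt "$len" ]; then
      len=$remaining   # the last chunk may be shorter
    fi
    printf '%s?op=OPEN&user.name=%s&offset=%s&length=%s\n' "$1" "$4" "$offset" "$len"
    offset=$(( offset + $3 ))
  done
}

# Example: download the chunks in order and append them to a local copy.
#   for url in $(webhdfs_chunk_urls "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt" 250 100 ubantu); do
#     curl -s -L "$url" >> data0.txt
#   done
```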

3. Delete a File:
To delete a file from HDFS, submit an HTTP DELETE request (recursive=true only matters when deleting a non-empty directory):

curl -i -X DELETE "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=DELETE&recursive=true"
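On success, DELETE returns a small JSON body, {"boolean": true}; a false value means HDFS reported that nothing was deleted. A minimal sketch of checking it from a script; webhdfs_boolean_ok is a hypothetical helper, and the match is a simple grep rather than a real JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the JSON on stdin contains "boolean": true.
webhdfs_boolean_ok() {
  grep -q '"boolean"[[:space:]]*:[[:space:]]*true'
}

# Example (needs a live cluster):
#   curl -s -X DELETE "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=DELETE" \
#     | webhdfs_boolean_ok && echo deleted
```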

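4. Set the Owner and Group of a File:
Submit an HTTP PUT request with op=SETOWNER (to change the permission bits instead, use op=SETPERMISSION with a permission=<OCTAL> parameter):

curl -i -X PUT "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=SETOWNER&owner=<USER>&group=<GROUP>"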
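The owner and group parameters are each optional, so a request can change just one of them. A hypothetical helper (webhdfs_setowner_url) that builds the URL from whichever values are given; as its own design choice it rejects the case where both are empty. Note that HDFS only lets a superuser change a file's owner.

```shell
#!/bin/sh
# Hypothetical helper: build a SETOWNER URL.
#   $1 = REST URL of the path, without query
#   $2 = new owner (may be empty)
#   $3 = new group (may be empty)
webhdfs_setowner_url() {
  if [ -z "$2" ] && [ -z "$3" ]; then
    echo "need an owner, a group, or both" >&2
    return 1
  fi
  url="$1?op=SETOWNER"
  if [ -n "$2" ]; then
    url="$url&owner=$2"
  fi
  if [ -n "$3" ]; then
    url="$url&group=$3"
  fi
  printf '%s\n' "$url"
}

# Example: curl -i -X PUT "$(webhdfs_setowner_url "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt" '' staff)"
```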

Error Responses:
When an operation fails, the server maps the Java exception that was thrown to one of the following HTTP status codes:

Exception                        HTTP Response
IllegalArgumentException         400 Bad Request
UnsupportedOperationException    400 Bad Request
SecurityException                401 Unauthorized
IOException                      403 Forbidden
FileNotFoundException            404 Not Found
RuntimeException                 500 Internal Server Error
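In a script, the numeric status code is often all you check (the response body also carries a JSON RemoteException object naming the exception). A minimal sketch mapping codes back to the table above; webhdfs_status_to_exception is a hypothetical helper, and 400 stays ambiguous because two exceptions share it.

```shell
#!/bin/sh
# Hypothetical helper: map a WebHDFS HTTP status code to the exception(s) in the table.
webhdfs_status_to_exception() {
  case "$1" in
    400) echo "IllegalArgumentException or UnsupportedOperationException" ;;
    401) echo "SecurityException" ;;
    403) echo "IOException" ;;
    404) echo "FileNotFoundException" ;;
    500) echo "RuntimeException" ;;
    2??|3??) echo "none" ;;   # success or redirect, not an error
    *) echo "unknown" ;;
  esac
}

# Example: capture only the status code with curl's -w option.
#   status=$(curl -s -o /dev/null -w '%{http_code}' "http://<MyHOST>:50070/webhdfs/v1/user/ubantu/input/data0.txt?op=GETFILESTATUS")
#   webhdfs_status_to_exception "$status"
```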

For more details on the WebHDFS REST API, see: http://hadoop.apache.org/docs/stable/webhdfs.html
