HttpFS : Introduction
Apache Hadoop HttpFS is a service that provides HTTP access to HDFS.
HttpFS provides a REST HTTP gateway that supports HDFS operations such as read and write. It can be used to transfer data between clusters running different versions of Hadoop, and to access data in HDFS using standard HTTP utilities.
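For example, a cross-version copy can go through HttpFS by addressing the gateway with the webhdfs:// scheme, which speaks the same REST API. This is only a sketch; the hostnames and paths are placeholders:

$ hadoop distcp hdfs://namenode-host:8020/user/ubantu/data webhdfs://httpfs-host:14000/user/ubantu/data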
HttpFS was inspired by Hadoop HDFS proxy and can be seen as a full rewrite of it. Hadoop HDFS proxy provides only a subset of file system operations (read only), whereas HttpFS supports all file system operations.
HttpFS uses a clean HTTP REST API making its use with HTTP tools more intuitive.
Regarding security, HttpFS supports Hadoop pseudo-authentication, HTTP SPNEGO Kerberos, and additional authentication mechanisms via a plugin API. HttpFS also supports Hadoop proxy user functionality.
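As an illustration, pseudo-authentication is driven by the user.name query parameter, and proxy user functionality by the doas parameter. A minimal sketch, assuming a gateway at httpfs-host:14000 and a proxyuser setup like the one described in the configuration section below:

# Pseudo-authentication: the caller identifies itself via user.name
$ curl "http://httpfs-host:14000/webhdfs/v1/tmp?op=LISTSTATUS&user.name=myhttpfsuser"

# Proxy user: myhttpfsuser performs the operation on behalf of ubantu via doas
$ curl "http://httpfs-host:14000/webhdfs/v1/tmp?op=LISTSTATUS&user.name=myhttpfsuser&doas=ubantu"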
For more information about HttpFS, see http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-hdfs-httpfs/index.html.
HttpFS : Installation
Prerequisites for installing HttpFS are:
- Java 6+
- Maven 3+
Installing HttpFS
HttpFS is distributed in the hadoop-httpfs package. To install
it, use your preferred package manager application. Install the package
on the system that will run the HttpFS server.
$ sudo yum install hadoop-httpfs       # on a Red Hat-compatible system
$ sudo zypper install hadoop-httpfs    # on a SLES system
$ sudo apt-get install hadoop-httpfs   # on an Ubuntu or Debian system
Alternatively, if you have an HttpFS tarball, you can simply untar it:
$ tar xzf httpfs-2.0.3-alpha.tar.gz
Now you are ready to configure HttpFS.
Configure HttpFS
HttpFS reads the HDFS configuration from the core-site.xml and hdfs-site.xml files in /etc/hadoop/conf/. If necessary, edit those files to configure the HDFS instance HttpFS will use. By default, HttpFS assumes that the Hadoop configuration files (core-site.xml and hdfs-site.xml) are in the HttpFS configuration directory.
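If your Hadoop configuration lives elsewhere, the httpfs.hadoop.config.dir property in httpfs-site.xml can point HttpFS at it. A minimal sketch (the path is a placeholder; check httpfs-default.xml in your distribution for the exact property and default):

<property>
  <name>httpfs.hadoop.config.dir</name>
  <value>/etc/hadoop/conf</value>
</property>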
Configure Hadoop
Edit Hadoop core-site.xml and define the Unix user that will run the HttpFS server as a proxyuser. For example:
<property>
  <name>hadoop.proxyuser.myhttpfsuser.hosts</name>
  <value>httpfs-host.foo.com</value>
</property>
<property>
  <name>hadoop.proxyuser.myhttpfsuser.groups</name>
  <value>*</value>
</property>
Note: Replace "myhttpfsuser" with the Unix user that will run the HttpFS server, and "httpfs-host.foo.com" with the host where the HttpFS server will run.
IMPORTANT: You need to restart Hadoop for the proxyuser configuration to become active.
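The exact restart command depends on how Hadoop was installed; with the packaged services assumed above, a NameNode restart might look like this (the service name is an assumption and varies by distribution):

$ sudo service hadoop-hdfs-namenode restart   # assumed packaged service name; adjust for your install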
Starting/Stopping the HttpFS Server
To start/stop HttpFS, use HttpFS's bin/httpfs.sh script. For example:
httpfs-2.0.3-alpha $ bin/httpfs.sh start   # start the server
httpfs-2.0.3-alpha $ bin/httpfs.sh stop    # stop the server
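To check that the server actually came up, you can look for its embedded web server listening on HttpFS's default port, 14000. A quick sketch (Linux-style netstat flags assumed):

$ netstat -tln | grep 14000   # the HttpFS web server should be listening here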
Test that HttpFS is working
Use a tool such as curl to access HDFS via HttpFS. For example, to obtain the home directory of the user ubantu, use a command such as this:
$ curl -i "http://<MyHttpFSHostName>:14000/webhdfs/v1?op=homedir&user.name=ubantu"
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{"homeDir":"http:\/\/<MyHttpFSHostName>:14000\/user\/ubantu"}
$ curl "http://<MyHttpFSHostName>:14000/webhdfs/v1?op=homedir&user.name=ubantu"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=ubantu&p=ubantu&t=simple&e=4558977754545&s=wtFFgaGHHJFGffasWXK68rc/0xI=";Version=1; Path=/
Content-Type: application/json
Transfer-Encoding: chunked
Date: Wed, 28 Mar 2012 13:35:55 GMT

{"Path":"\/user\/ubantu"}
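Other WebHDFS operations can be exercised the same way. A few examples as a sketch (the paths are placeholders; the operation names come from the WebHDFS REST API):

# List the contents of a directory
$ curl "http://<MyHttpFSHostName>:14000/webhdfs/v1/user/ubantu?op=LISTSTATUS&user.name=ubantu"

# Create a new directory (MKDIRS is an HTTP PUT)
$ curl -X PUT "http://<MyHttpFSHostName>:14000/webhdfs/v1/user/ubantu/test?op=MKDIRS&user.name=ubantu"

# Read a file (OPEN returns the file contents)
$ curl "http://<MyHttpFSHostName>:14000/webhdfs/v1/user/ubantu/file.txt?op=OPEN&user.name=ubantu"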