Configure Proxy for HiveServer2 and Impala



This walkthrough uses HAProxy, an open-source high-availability load balancer and proxy server for TCP- and HTTP-based applications. Nginx is not used here because the traffic being balanced is not web server traffic. First, install HAProxy on the proxy server.

yum -y install haproxy
systemctl enable haproxy

NOTE: Do not start HAProxy yet; it will be started once the configuration is complete.

HAPROXY FOR HS2

Make sure HiveServer2 is running on more than one host. For example, if it runs on cloudera-node2 and cloudera-node3, and cloudera-node1 is a gateway server, you can connect to either instance directly with beeline on the default port 10000:
[hasnain@cloudera-node1 ~]$ beeline -u jdbc:hive2://cloudera-node2:10000
Beeline version 2.1.1-cdh6.1.1 by Apache Hive
0: jdbc:hive2://cloudera-node2:10000>

[hasnain@cloudera-node1 ~]$ beeline -u jdbc:hive2://cloudera-node3:10000
0: jdbc:hive2://cloudera-node3:10000>

To configure HAProxy, open the file below and add the HiveServer2 section that follows:
vi /etc/haproxy/haproxy.cfg

# This is the setup for HS2. Beeline clients connect to load_balancer_host:10001.
# HAProxy balances connections among the servers listed below.
# mode tcp: HAProxy passes raw TCP connections through to the HiveServer2 hosts.
listen hiveserver2
    bind :10001
    mode tcp
    option tcplog
    balance source
    server cloudera-node2-hive2 cloudera-node2:10000
    server cloudera-node3-hive2 cloudera-node3:10000
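
Before starting the service, it can be worth checking that the edited file parses. The sketch below wraps HAProxy's configuration check mode (the -c and -f options) in a small helper. The function name validate_haproxy_cfg and its return codes are illustrative assumptions, not part of HAProxy.

```bash
#!/usr/bin/env bash
# Validate an HAProxy configuration file without starting the service.
# Usage: validate_haproxy_cfg [path]
# The path defaults to /etc/haproxy/haproxy.cfg.
# validate_haproxy_cfg is a hypothetical helper name used for illustration.
validate_haproxy_cfg() {
    local cfg="${1:-/etc/haproxy/haproxy.cfg}"

    # Fail early with a distinct code if the file cannot be read.
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 2
    fi

    # Fail with 127 (the usual "command not found" code) if haproxy is missing.
    if ! command -v haproxy >/dev/null 2>&1; then
        echo "haproxy is not installed or not on PATH" >&2
        return 127
    fi

    # -c checks the configuration and exits; -f names the file to check.
    haproxy -c -f "$cfg"
}
```

A possible use is to chain it with the start command, for example: validate_haproxy_cfg && systemctl start haproxy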

Now start HAProxy:

systemctl start haproxy
ps -ef | grep haproxy      # check the process status

Beeline should now be able to connect through the proxy on port 10001:
[root@cloudera-node1 hasnain]# beeline -u jdbc:hive2://cloudera-node1:10001
0: jdbc:hive2://cloudera-node1:10001>

Next, in Cloudera Manager (CM), go to Hive > Configuration, search for ‘load balancer’, and enter the HAProxy host and the port (10001) on which HAProxy listens for HiveServer2.

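VERIFY:
With HAProxy stopped, a connection to port 10001 should be refused, which confirms that this port is served by HAProxy and not by HiveServer2 itself. Start HAProxy again afterwards.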
[root@cloudera-node1 hasnain]# systemctl stop haproxy
[root@cloudera-node1 hasnain]# beeline -u jdbc:hive2://cloudera-node1:10001
Error: Could not open client transport with JDBC Uri: jdbc:hive2://cloudera-node1:10001: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
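
A refused connection only shows that nothing is listening on the proxy port. To tell a stopped proxy apart from stopped HiveServer2 instances, a small TCP probe can help. The sketch below assumes bash (for its /dev/tcp redirection) and the coreutils timeout command. port_open and check_endpoints are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's built-in /dev/tcp redirection, so no extra tools are needed.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Print "host:port open" or "host:port closed" for every endpoint given.
check_endpoints() {
    local endpoint
    for endpoint in "$@"; do
        # Split "host:port" at the last colon.
        if port_open "${endpoint%:*}" "${endpoint##*:}"; then
            echo "${endpoint} open"
        else
            echo "${endpoint} closed"
        fi
    done
}
```

With the hosts used in this walkthrough, the call would be: check_endpoints cloudera-node1:10001 cloudera-node2:10000 cloudera-node3:10000. If the proxy port is closed while both backends are open, the problem is on the HAProxy side.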

NOTE: The balancing algorithm decides which backend server each connection is sent to. Some of the available algorithms:

Roundrobin: Each server is used in turns according to their weights. This is the smoothest and fairest algorithm when the servers’ processing time remains equally distributed. This algorithm is dynamic, which allows server weights to be adjusted on the fly. Not recommended for Impala.

Leastconn: The server with the lowest number of connections is chosen. Round-robin is performed between servers with the same load. This algorithm is recommended for long sessions, such as LDAP, SQL, TSE, etc., but it is not well suited to short sessions such as HTTP. Recommended for Impala.

Source: The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request. This way the same client IP address will always reach the same server, as long as no server goes down or comes up.
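
The difference between leastconn and source can be seen in a toy model. The sketch below is not HAProxy's implementation: it ignores server weights, uses cksum as a stand-in for HAProxy's own hash function, and on a tie simply takes the first server instead of round-robining. pick_leastconn and pick_by_source are hypothetical names.

```bash
#!/usr/bin/env bash
# Toy models of two balancing strategies. Illustration only, not HAProxy code.

# leastconn: print the server with the fewest active connections.
# Arguments are name=count pairs, e.g. cloudera-node2=3 cloudera-node3=1.
# On a tie the first server listed wins.
pick_leastconn() {
    local pair name count best_name="" best_count=""
    for pair in "$@"; do
        name="${pair%%=*}"
        count="${pair##*=}"
        if [ -z "$best_count" ] || [ "$count" -lt "$best_count" ]; then
            best_name="$name"
            best_count="$count"
        fi
    done
    echo "$best_name"
}

# source: hash the client IP and map the hash onto the server list.
# First argument is the client IP, the remaining arguments are server names.
# cksum (a CRC) stands in for HAProxy's real hash function.
pick_by_source() {
    local client_ip="$1"
    shift
    local servers=("$@")
    local hash
    hash=$(printf '%s' "$client_ip" | cksum | cut -d' ' -f1)
    echo "${servers[$(( hash % ${#servers[@]} ))]}"
}
```

For example, pick_leastconn cloudera-node2=3 cloudera-node3=1 cloudera-node4=2 selects cloudera-node3, while pick_by_source returns the same server every time it is called with the same client IP and the same server list. That stickiness is why source suits clients that need all their connections to land on one server, and why the mapping can change when the server list changes.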

HAPROXY FOR IMPALA

If Impala daemons are running on cloudera-node2, cloudera-node3 and cloudera-node4, you can connect to any of them directly, either from impala-shell or from beeline:

impala-shell -i cloudera-node2
[cloudera-node2:21000] default>

[root@cloudera-node1 ~]# beeline -u 'jdbc:hive2://cloudera-node2:21050/default;auth=noSasl'
0: jdbc:hive2://cloudera-node2:21050/default>

Note: impala-shell, JDBC applications, and ODBC applications should connect to the listener port on the proxy host rather than to port 21000 or 21050 on a host that actually runs impalad. The sample configuration below makes HAProxy listen on port 25003 for impala-shell traffic and on port 21051 for JDBC traffic, so requests go to haproxy_host:25003 or haproxy_host:21051 respectively.


To configure HAProxy, open the file below and add the Impala sections that follow:
vi /etc/haproxy/haproxy.cfg

# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
# HAProxy balances connections among the servers listed below.
# Each impalad listens on port 21000 for Beeswax (impala-shell) or the original ODBC driver.
# For JDBC or the ODBC version 2.x driver, use port 21050 instead of 21000.
listen impala
    bind :25003
    mode tcp
    option tcplog
    balance leastconn

    server cloudera-node2-impalas cloudera-node2:21000 check
    server cloudera-node3-impalas cloudera-node3:21000 check
    server cloudera-node4-impalas cloudera-node4:21000 check

# Setup for Hue or other JDBC-enabled applications.
# In particular, Hue requires sticky sessions.
# The application connects to load_balancer_host:21051, and HAProxy balances
# connections to the associated hosts, where Impala listens for JDBC
# requests on port 21050.
listen impalajdbc
    bind :21051
    mode tcp
    option tcplog
    balance source
    server cloudera-node2-impalaj cloudera-node2:21050 check
    server cloudera-node3-impalaj cloudera-node3:21050 check
    server cloudera-node4-impalaj cloudera-node4:21050 check

Now start HAProxy, or restart it if it is already running so that it picks up the new sections. Then, in Cloudera Manager, go to Impala > Configuration, search for ‘load balancer’, and enter the HAProxy host and the port (25003) on which HAProxy listens for impalad.

Testing:

[root@cloudera-node1 ~]# impala-shell -i cloudera-node1:25003
Starting Impala Shell without Kerberos authentication
Opened TCP connection to cloudera-node1:25003
Connected to cloudera-node1:25003
Server version: impalad version 3.1.0-cdh6.1.1 RELEASE (build 97215ce79febfa42364dbff8e4c4d3c5bfc583ba)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v3.1.0-cdh6.1.1 (97215ce) built on Thu Feb  7 23:24:56 PST 2019)

Press TAB twice to see a list of available commands.
***********************************************************************************
[cloudera-node1:25003] default>

[root@cloudera-node1 ~]# beeline -u 'jdbc:hive2://cloudera-node1:21051/default;auth=noSasl'
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://cloudera-node1:21051/default;auth=noSasl
Connected to: Impala (version 3.1.0-cdh6.1.1)
Driver: Hive JDBC (version 2.1.1-cdh6.1.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1-cdh6.1.1 by Apache Hive
0: jdbc:hive2://cloudera-node1:21051/default> !tables
+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
|            | default      | temp        | TABLE       |          |
|            | default      | temp1       | TABLE       |          |
|            | testdb       | mytesble    | TABLE       |          |
+------------+--------------+-------------+-------------+----------+
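
A single successful connection does not show that every new connection through the proxy is served. One rough way to check is to repeat a trivial query several times and count the successes. The sketch below assumes impala-shell is on the PATH of the machine running it. repeat_check is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Run a command a given number of times and report how many attempts succeeded.
# Usage: repeat_check <attempts> <command> [args...]
# Output of the command itself is discarded; only its exit status is counted.
repeat_check() {
    local attempts="$1"
    shift
    local ok=0 i
    for (( i = 0; i < attempts; i++ )); do
        if "$@" >/dev/null 2>&1; then
            ok=$(( ok + 1 ))
        fi
    done
    echo "${ok}/${attempts} succeeded"
}
```

With the proxy from this walkthrough, a call could look like: repeat_check 5 impala-shell -i cloudera-node1:25003 -q 'select 1'. Anything less than 5/5 suggests that at least one backend behind the proxy is not accepting connections.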
