We will use HAProxy here, an open-source high-availability load
balancer and proxy server for TCP- and HTTP-based applications. Nginx is not recommended
because we have no web server traffic to load balance. Let's first install HAProxy
on the proxy server.
yum -y install haproxy
systemctl enable haproxy
NOTE: Do not start HAProxy yet; we will start it once the configuration is complete.
HAPROXY FOR HS2
Ensure that HiveServer2 is running on more than one host.
For example, if it is running on cloudera-node2 and cloudera-node3, and
cloudera-node1 is a gateway server, you can by default connect with beeline on
port 10000 as shown below:
[hasnain@cloudera-node1 ~]$ beeline -u jdbc:hive2://cloudera-node2:10000
Beeline version 2.1.1-cdh6.1.1 by Apache Hive
0: jdbc:hive2://cloudera-node2:10000>
[hasnain@cloudera-node1 ~]$ beeline -u jdbc:hive2://cloudera-node3:10000
0: jdbc:hive2://cloudera-node3:10000>
To configure HAProxy, open the file below:
vi /etc/haproxy/haproxy.cfg
# This is the setup for HS2. Beeline clients connect to load_balancer_host:10001.
# HAProxy will balance connections among the servers listed below.
# tcp: connection mode between HAProxy and the Hive servers
listen hiveserver2 :10001
    mode tcp
    option tcplog
    balance source
    server cloudera-node2-hive2 cloudera-node2:10000
    server cloudera-node3-hive2 cloudera-node3:10000
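Before starting the service, it can help to let HAProxy parse the file and report syntax errors. The sketch below wraps HAProxy's check mode (`haproxy -c -f <file>`) in a small helper; the function name `validate_cfg` is made up for this post, and the default path is the one edited above.

```bash
#!/usr/bin/env bash
# validate_cfg [FILE]: ask HAProxy to parse the config without starting it.
# Returns non-zero if haproxy is missing or the file does not parse.
# (validate_cfg is a hypothetical helper name, not part of HAProxy.)
validate_cfg() {
    local cfg=${1:-/etc/haproxy/haproxy.cfg}
    if ! command -v haproxy >/dev/null 2>&1; then
        echo "haproxy is not installed on this host" >&2
        return 127
    fi
    # -c: check mode (parse and exit), -f: configuration file to read
    haproxy -c -f "$cfg"
}
```

A typical use would be `validate_cfg && systemctl start haproxy`, so the service is only started when the file parses cleanly.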
Now we can start HAProxy:
systemctl start haproxy
ps -ef | grep haproxy    # check the process status
[root@cloudera-node1 hasnain]# beeline -u jdbc:hive2://cloudera-node1:10001
0: jdbc:hive2://cloudera-node1:10001>
Now go to CM > Hive > Configuration, search for 'load
balancer', and enter the HAProxy server details and the port (10001) on which
HAProxy is listening for HiveServer2.
VERIFY:
[root@cloudera-node1 hasnain]# systemctl stop haproxy
[root@cloudera-node1 hasnain]# beeline -u jdbc:hive2://cloudera-node1:10001
Error: Could not open client transport with JDBC Uri: jdbc:hive2://cloudera-node1:10001: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
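A "Connection refused" like the one above only says that nothing is listening on the proxy port. To tell a stopped HAProxy apart from a stopped HiveServer2, a quick TCP check of the listener and of each backend can be sketched as follows. The helper name `port_open` is made up for this post; it relies on bash's `/dev/tcp` redirection and on `timeout` from coreutils.

```bash
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 s.
# (port_open is a hypothetical helper name used only in this example.)
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hosts and ports are the ones used in this post: the HAProxy listener
# first, then the two HiveServer2 backends.
for target in cloudera-node1:10001 cloudera-node2:10000 cloudera-node3:10000; do
    if port_open "${target%:*}" "${target#*:}"; then
        echo "${target} is accepting connections"
    else
        echo "${target} is NOT reachable"
    fi
done
```

If the backends answer but the listener does not, the problem is on the HAProxy side.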
NOTE: Balancing algorithms decide which backend server each
connection is sent to. Some of the options:
Roundrobin: Each server is used in turn, according to its weight. This is the smoothest and
fairest algorithm when the servers' processing time remains equally
distributed. This algorithm is dynamic, which allows server weights to be
adjusted on the fly. Not recommended for Impala.
Leastconn: The server with the lowest number of connections is chosen. Round-robin is performed
between servers with the same load. This algorithm is recommended for
long sessions, such as LDAP, SQL, TSE, etc., but it is not well suited to
short sessions such as HTTP. Recommended for Impala.
Source: The source IP address is hashed and divided by the total weight of the running
servers to designate which server will receive the request. This way the same client IP address
always reaches the same server as long as the set of running servers stays the same.
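To make the idea behind "source" concrete, here is a minimal sketch of hash-based selection. It is an illustration only: it uses `cksum` as a stand-in hash and ignores server weights, so it is not how HAProxy computes the mapping internally. The function name `pick_by_source` and the client IPs are made up for this example.

```bash
#!/usr/bin/env bash
# Illustration of source-IP hashing; HAProxy's real hash function and
# its weight handling are different.
servers=(cloudera-node2:10000 cloudera-node3:10000)

# pick_by_source CLIENT_IP: map a client IP to one backend.
pick_by_source() {
    local client_ip=$1
    local hash
    # cksum prints "<crc> <byte-count>"; keep only the CRC as the hash value.
    hash=$(printf '%s' "$client_ip" | cksum | cut -d' ' -f1)
    # Modulo the number of servers selects the backend.
    echo "${servers[hash % ${#servers[@]}]}"
}

pick_by_source 10.0.0.15
pick_by_source 10.0.0.15   # same client IP, same backend
pick_by_source 10.0.0.16   # a different client may land on another backend
```

Because the result depends only on the client IP and the server list, a client keeps reaching the same backend until the list changes, which is the stickiness the source algorithm provides.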
HAPROXY FOR IMPALA
If Impala daemons are running on cloudera-node2, cloudera-node3 and cloudera-node4, we
can connect from either impala-shell or beeline as shown below:
impala-shell -i cloudera-node2
[cloudera-node2:21000] default>
[root@cloudera-node1 ~]# beeline -u 'jdbc:hive2://cloudera-node1:21050/default;auth=noSasl'
0: jdbc:hive2://cloudera-node1:21050/default>
Note: In impala-shell, JDBC applications, or ODBC
applications, connect to the listener port of the proxy host rather than to port
21000 or 21050 on a host actually running impalad. The sample configuration file sets HAProxy to listen on port 25003, so you would send all
requests to haproxy_host:25003.
To configure HAProxy, open the file below:
vi /etc/haproxy/haproxy.cfg
# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
# HAProxy will balance connections among the servers listed below.
# The impalad instances listen on port 21000 for beeswax (impala-shell) or the original ODBC driver.
# For JDBC or the ODBC version 2.x driver, use port 21050 instead of 21000.
listen impala :25003
    mode tcp
    option tcplog
    balance leastconn
    server cloudera-node2-impalas cloudera-node2:21000 check
    server cloudera-node3-impalas cloudera-node3:21000 check
    server cloudera-node4-impalas cloudera-node4:21000 check
# Setup for Hue or other JDBC-enabled applications.
# In particular, Hue requires sticky sessions.
# The application connects to load_balancer_host:21051, and HAProxy balances
# connections to the associated hosts, where Impala listens for JDBC
# requests on port 21050.
listen impalajdbc :21051
    mode tcp
    option tcplog
    balance source
    server cloudera-node2-impalaj cloudera-node2:21050 check
    server cloudera-node3-impalaj cloudera-node3:21050 check
    server cloudera-node4-impalaj cloudera-node4:21050 check
Now start HAProxy, then go to CM > Impala >
Configuration, search for 'load balancer', and enter the HAProxy server details
and the port (25003) on which HAProxy is listening for impalad.
Testing:
[root@cloudera-node1 ~]# impala-shell -i cloudera-node1:25003
Starting Impala Shell without Kerberos authentication
Opened TCP connection to cloudera-node1:25003
Connected to cloudera-node1:25003
Server version: impalad version 3.1.0-cdh6.1.1 RELEASE (build 97215ce79febfa42364dbff8e4c4d3c5bfc583ba)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v3.1.0-cdh6.1.1 (97215ce) built on Thu Feb 7 23:24:56 PST 2019)
Press TAB twice to see a list of available commands.
***********************************************************************************
[cloudera-node1:25003] default>
[root@cloudera-node1 ~]# beeline -u 'jdbc:hive2://cloudera-node1:21051/default;auth=noSasl'
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://cloudera-node1:21051/default;auth=noSasl
Connected to: Impala (version 3.1.0-cdh6.1.1)
Driver: Hive JDBC (version 2.1.1-cdh6.1.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1-cdh6.1.1 by Apache Hive
0: jdbc:hive2://cloudera-node1:21051/default> !tables
+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
|            | default      | temp        | TABLE       |          |
|            | default      | temp1       | TABLE       |          |
|            | testdb       | mytesble    | TABLE       |          |
+------------+--------------+-------------+-------------+----------+