Showing posts with the label HDFS

Create encrypted zones in HDFS

To create an HDFS encryption zone, you first need to set up the HDFS Data at Rest encryption service. For the Cloudera distribution, follow the steps below: Select Service.


Lambda vs Delta Architecture - Realtime Analytics on Delta Lake

Before going into the details of the Delta architecture, let's recap the Lambda architecture first; then you will be able to appreciate the beauty of the Delta architecture. Lambda architecture is a popular technique where records are processed by a batch system and a streaming system…
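As a rough illustration of the Lambda idea mentioned above, here is a minimal Python sketch. The excerpt is truncated, so this assumes the commonly described pattern in which a batch view and a streaming (speed) view are merged at query time. All function names and sample records are hypothetical, not taken from the post.

```python
from collections import Counter

def batch_view(records):
    """Batch layer: recompute counts from the full historical dataset."""
    return Counter(r["key"] for r in records)

def speed_view(recent_records):
    """Speed layer: incrementally count records the batch run has not seen yet."""
    view = Counter()
    for r in recent_records:
        view[r["key"]] += 1
    return view

def query(key, batch, speed):
    """Serving step (assumed): merge both views when a query arrives."""
    return batch[key] + speed[key]

# Hypothetical sample data
historical = [{"key": "page_a"}, {"key": "page_b"}, {"key": "page_a"}]
recent = [{"key": "page_a"}]

batch = batch_view(historical)
speed = speed_view(recent)
print(query("page_a", batch, speed))  # 3
```

The same counting logic appears twice, once per layer, which hints at the maintenance cost of running two processing paths side by side.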


Setting up SELinux mode

Security-Enhanced Linux (SELinux) is a Linux kernel security module that provides a mechanism for supporting access control security policies, including mandatory access controls (MAC). Without SELinux enabled, only traditional discretionary access control (DAC) …


Benchmark Hadoop

When we install Hadoop, we get a few jars for testing the installation and for benchmarking. In the Cloudera distribution: /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.…


HDFS ACL

Before you start using ACLs, make sure they are enabled. If you are using the Cloudera distribution, use the property below in the HDFS configuration: Alternatively, you can find it in hdfs-site.xml


Kerberos and HDFS Encryption

User security has three parts: authentication, authorization, and audit. Authentication simply means verifying who the user claims to be. There are three factors of authentication: who you are, what you know, and what you have. You may have heard the term two-facto…


HDFS Erasure Coding and RAID basics

HDFS Replication is expensive – the default 3x replication scheme in HDFS has 200% overhead in storage space and other resources (e.g., network bandwidth). However, for warm and cold datasets with relatively low I/O activities, additional block replicas are rarely acc…
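The 200% figure above follows from simple arithmetic, sketched below in Python. The erasure-coding comparison assumes a Reed-Solomon layout with 6 data units and 3 parity units, chosen here only for illustration; the function names are hypothetical.

```python
def replication_overhead(replicas):
    """Extra storage, as a fraction of the original data, for n-way replication."""
    return float(replicas - 1)

def erasure_coding_overhead(data_units, parity_units):
    """Extra storage, as a fraction of the original data, for a striped EC layout."""
    return parity_units / data_units

# 3x replication: two extra copies for every block
print(f"{replication_overhead(3):.0%}")        # 200%

# Assumed RS(6,3) layout: 3 parity units for every 6 data units
print(f"{erasure_coding_overhead(6, 3):.0%}")  # 50%
```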


Hadoop Architecture, HA, Failover

MRv1 daemons: Namenode, Secondary namenode, Jobtracker, Datanode, Tasktracker. The jobtracker daemon had these two parts tightly coupled within itself and was responsible for managing the tasks and all its related operations by interacting with th…


General Hadoop FileSystem Shell Commands

The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS): syntax: bin/hadoop fs <args>
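To make the syntax above concrete, here is a small Python sketch that assembles a `bin/hadoop fs <args>` invocation as an argument list. The helper name `hadoop_fs_command` and the path `/user/alice` are hypothetical; `-ls` is used as one example of an FS shell argument.

```python
import shlex

def hadoop_fs_command(*args, hadoop_bin="bin/hadoop"):
    """Build the argv list for `bin/hadoop fs <args>` (hypothetical helper)."""
    if not args:
        raise ValueError("at least one FS shell argument is required")
    return [hadoop_bin, "fs", *args]

# List a (hypothetical) HDFS directory
cmd = hadoop_fs_command("-ls", "/user/alice")
print(shlex.join(cmd))  # bin/hadoop fs -ls /user/alice
```

On a machine with Hadoop installed, the list could be passed to `subprocess.run(cmd, check=True)`; the sketch only builds and prints the command, so it runs without a cluster.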
