Course Overview

Big Data Hadoop Administration

Overview

The growth of Hadoop has created demand for professionals skilled in Hadoop administration, and these skills can open up better career, salary, and job opportunities. This course is designed to provide the knowledge and skills you need to become a successful Hadoop architect, big data engineer, or Hadoop administrator.

It begins with tutorials on the fundamental concepts of Apache Hadoop and the Hadoop cluster, then teaches you to deploy, configure, manage, monitor, and secure a Hadoop cluster. The course also provides a brief introduction to Hive and HBase administration and includes many challenging, practical, and focused hands-on exercises. By the end of the course, you will be able to understand and solve real, industry-relevant problems that you will encounter while working on a Hadoop cluster.

Course Content

Introduction to Hadoop

  • The amount of data processed in everyday life today
  • What Hadoop is and why it is important
  • Hadoop comparison with traditional systems
  • Hadoop history
  • Hadoop main components and architecture

Hadoop Distributed File System (HDFS)

  • HDFS overview and design
  • HDFS architecture
  • HDFS file storage
  • Component failures and recoveries
  • Block placement
  • Balancing the Hadoop cluster

Planning your Hadoop cluster

  • Planning a Hadoop cluster and its capacity
  • Hadoop software and hardware configuration
  • HDFS block replication and rack awareness
  • Network topology for a Hadoop cluster
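
As a rough illustration of the capacity-planning topic above, the sketch below estimates how much raw disk a cluster needs once HDFS block replication is taken into account. The replication factor of 3 is the HDFS default; the 25% headroom for temporary and non-HDFS data and the function names are assumptions made for this example, not figures from the course.

```python
import math


def raw_storage_tb(data_tb, replication=3, headroom=0.25):
    """Estimate raw disk (TB) needed to store data_tb of user data.

    replication: copies HDFS keeps of every block (HDFS default is 3).
    headroom: fraction of disk reserved for temporary and non-HDFS
              data (an assumed planning figure, adjust for your site).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    return data_tb * replication / (1 - headroom)


def datanodes_needed(data_tb, disk_per_node_tb, replication=3, headroom=0.25):
    """Round the raw storage estimate up to a whole number of DataNodes."""
    total = raw_storage_tb(data_tb, replication, headroom)
    return math.ceil(total / disk_per_node_tb)


if __name__ == "__main__":
    # 100 TB of user data on nodes with 48 TB of disk each.
    print(raw_storage_tb(100))        # 400.0
    print(datanodes_needed(100, 48))  # 9
```

A real plan would also account for data growth, compression, and the CPU and memory needs of the workload, which this estimate ignores.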

Hadoop Deployment

  • Different Hadoop deployment types
  • Hadoop distribution options
  • Hadoop competitors
  • Hadoop installation procedure
  • Distributed cluster architecture

Working with HDFS

  • Ways of accessing data in HDFS
  • Common HDFS operations and commands
  • Different HDFS commands
  • Internals of a file read in HDFS
  • Data copying with ‘distcp’

MapReduce Abstraction

  • What MapReduce is and why it is popular
  • The big picture of MapReduce
  • MapReduce process and terminology
  • MapReduce component failures and recoveries
  • Working with MapReduce
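
To give a feel for the MapReduce process listed above, here is a minimal single-machine sketch of its three stages (map, shuffle, reduce) applied to a word count. It is plain Python and does not use Hadoop itself; the function names are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; the sketch only shows how data flows between the stages.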

Hadoop Cluster Configuration

  • Hadoop configuration overview and important configuration files
  • Configuration parameters and values
  • HDFS parameters and MapReduce parameters
  • Hadoop environment setup
  • ‘Include’ and ‘Exclude’ configuration files

Hadoop Administration and Maintenance

  • NameNode/DataNode directory structures and files
  • File system image and edit log
  • The checkpoint procedure
  • NameNode failure and recovery procedure
  • Safe Mode
  • Metadata and Data backup
  • Potential problems and solutions / what to look for
  • Adding and removing nodes

Hadoop Monitoring and Troubleshooting

  • Best practices of monitoring a Hadoop cluster
  • Using logs and stack traces for monitoring and troubleshooting
  • Using open-source tools to monitor a Hadoop cluster

Job Scheduling

  • How to schedule Hadoop jobs on the same cluster
  • Default Hadoop FIFO scheduler
  • Fair Scheduler and its configuration

Hadoop Multi-Node Cluster Setup and Running MapReduce Jobs on Amazon EC2

  • Hadoop multi-node cluster setup using Amazon EC2: creating a 4-node cluster
  • Running MapReduce jobs on the cluster

Prerequisites

  • Hands-on experience in Core Java and good analytical skills
  • Experience with a Linux environment will help

Required Exam

Big Data Hadoop Administration

Duration

  • Regular Track: 3 weeks
  • Fast Track: 1 week

Success Stories

Trained 1000+ Students From 10+ Countries
