Monday 4 June 2012

Hadoop course

1. An Introduction To Hadoop And HDFS
  • Why Hadoop?
  • HDFS
  • MapReduce
  • Hive, Pig, HBase and other ecosystem projects
  • Hands-On Exercise: Installing a pseudo-distributed cluster
2. Planning Your Hadoop Cluster
  • General Planning Considerations
  • Choosing The Right Hardware
  • Node Topologies
  • Choosing The Right Software
3. Deploying Your Cluster
  • Installing Hadoop
  • Using SCM Express for easy installation
  • Typical Configuration Parameters
  • Configuring Rack Awareness (sketched after this list)
  • Using Configuration Management Tools
  • Hands-On Exercise: Installing a Hadoop Cluster
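
A minimal sketch of the rack-awareness configuration mentioned above, assuming the Hadoop 1.x DNSToSwitchMapping interface. Rack awareness is usually supplied by pointing topology.script.file.name at a script, but it can also be done in Java by setting topology.node.switch.mapping.impl to a class like the one below. The host-naming rule and rack paths here are invented for illustration.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.net.DNSToSwitchMapping;

    // Hypothetical topology mapper; enabled by setting
    // topology.node.switch.mapping.impl to this class in core-site.xml.
    public class StaticRackMapping implements DNSToSwitchMapping {
        public List<String> resolve(List<String> names) {
            List<String> racks = new ArrayList<String>();
            for (String name : names) {
                // Invented rule for illustration: hosts named rack1-* live in rack 1.
                if (name.startsWith("rack1-")) {
                    racks.add("/datacenter1/rack1");
                } else {
                    racks.add("/datacenter1/rack2");
                }
            }
            return racks;
        }
    }
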
4. Cluster Maintenance
  • Checking HDFS with fsck
  • Hands-On Exercise: Breaking the Cluster
  • Copying data with distcp
  • Rebalancing cluster nodes
  • Adding and removing cluster nodes
  • Hands-On Exercise: Verifying the Cluster's Self-Healing Features
  • Backup And Restore
  • Upgrading and Migrating
  • Hands-On Exercise: Backing Up and Restoring the NameNode Metadata
5. Cloudera Certified Administrator Exam
  • Following the training, attendees will have an opportunity to take the Cloudera Certified Hadoop Administrator exam
6. Managing and Scheduling Jobs
  • Starting and stopping MapReduce jobs
  • Hands-On Exercise: Managing jobs
  • The FIFO Scheduler
  • The Fair Scheduler
  • Hands-On Exercise: Using the Fair Scheduler
7. Installing And Managing Other Hadoop Projects
  • Hive
  • Pig
  • HBase
  • Hands-On Exercise: Configuring the Hive Shared Metastore
8. Populating HDFS From External Sources
  • Using Sqoop
  • Using Flume
  • Best Practices for Data Ingestion
9. Cluster Monitoring, Troubleshooting and Optimizing
  • Hadoop Log Files
  • Using the NameNode and JobTracker Web UIs
  • Interpreting Job Logs
  • Monitoring with Ganglia
  • Other monitoring tools
  • General Optimization Tips
  • Benchmarking Your Cluster

As developers, we should know the following:


1. The Motivation For Hadoop

    Problems with traditional large-scale systems
    Requirements for a new approach

2. Hadoop: Basic Concepts

    An Overview of Hadoop
    The Hadoop Distributed File System
    Hands-On Exercise
    How MapReduce Works
    Hands-On Exercise
    Anatomy of a Hadoop Cluster
    Other Hadoop Ecosystem Components

3. Writing a MapReduce Program

    The MapReduce Flow
    Examining a Sample MapReduce Program (a word-count sketch follows this list)
    Basic MapReduce API Concepts
    The Driver Code
    The Mapper
    The Reducer
    Hadoop’s Streaming API
    Using Eclipse for Rapid Development
    Hands-on exercise
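
To make the driver/mapper/reducer flow above concrete, here is a minimal word-count sketch against the classic org.apache.hadoop.mapred API (the JobConf-based API of this course's era); input and output paths come from the command line.

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in the input line.
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    output.collect(word, ONE);
                }
            }
        }

        // Reducer: sums the counts for each word.
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        // Driver: configures and submits the job.
        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }

Packaged into a jar, it would be submitted with something like: hadoop jar wordcount.jar WordCount input output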

4. Integrating Hadoop Into The Workflow

    Relational Database Management Systems
    Storage Systems
    Importing Data from RDBMSs With Sqoop
    Hands-On Exercise
    Importing Real-Time Data with Flume
    Accessing HDFS Using FuseDFS and Hoop

5. More Advanced MapReduce Programming

    Custom Writables and WritableComparables (sketched after this list)
    Saving Binary Data using SequenceFiles and Avro Files
    Creating InputFormats and OutputFormats
    Hands-on exercise
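
A sketch of the custom-Writable idea above: a hypothetical (symbol, timestamp) pair that can serve as a MapReduce key because it implements WritableComparable. The field names are invented for illustration.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // Hypothetical composite key usable anywhere Hadoop expects a key type.
    public class SymbolTimePair implements WritableComparable<SymbolTimePair> {
        private String symbol = "";
        private long timestamp;

        public void write(DataOutput out) throws IOException {
            out.writeUTF(symbol);       // fields are serialized in a fixed order...
            out.writeLong(timestamp);
        }

        public void readFields(DataInput in) throws IOException {
            symbol = in.readUTF();      // ...and deserialized in the same order
            timestamp = in.readLong();
        }

        public int compareTo(SymbolTimePair other) {
            int cmp = symbol.compareTo(other.symbol);
            if (cmp != 0) {
                return cmp;
            }
            return timestamp < other.timestamp ? -1
                 : (timestamp == other.timestamp ? 0 : 1);
        }

        @Override
        public int hashCode() {
            // HashPartitioner uses hashCode() to assign keys to reducers.
            return symbol.hashCode() * 31 + (int) (timestamp ^ (timestamp >>> 32));
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SymbolTimePair)) {
                return false;
            }
            SymbolTimePair p = (SymbolTimePair) o;
            return symbol.equals(p.symbol) && timestamp == p.timestamp;
        }
    }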

6. Graph Manipulation in Hadoop

    Introduction to graph techniques
    Representing graphs in Hadoop
    Implementing a sample algorithm: Single Source Shortest Path (sketched below)
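
A minimal sketch of one iteration of the Single Source Shortest Path algorithm named above, assuming unit edge weights and an input line format of nodeId<TAB>distance<TAB>comma-separated-neighbors, with Integer.MAX_VALUE marking unreached nodes. The record format and class names are assumptions, not course material; a driver (not shown) reruns the job until no distance improves.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class ShortestPathIteration {

        // Mapper: passes the node record through, and proposes dist+1 to each neighbor.
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {
            public void map(LongWritable key, Text value,
                            OutputCollector<Text, Text> out, Reporter reporter)
                    throws IOException {
                String[] parts = value.toString().split("\t");  // id, distance, neighbors
                String id = parts[0];
                int dist = Integer.parseInt(parts[1]);
                String neighbors = parts.length > 2 ? parts[2] : "";
                out.collect(new Text(id), new Text("NODE\t" + dist + "\t" + neighbors));
                if (dist < Integer.MAX_VALUE) {
                    for (String nbr : neighbors.split(",")) {
                        if (nbr.length() > 0) {
                            out.collect(new Text(nbr), new Text("DIST\t" + (dist + 1)));
                        }
                    }
                }
            }
        }

        // Reducer: keeps the minimum distance seen and reattaches the adjacency list.
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, Text, Text, Text> {
            public void reduce(Text key, Iterator<Text> values,
                               OutputCollector<Text, Text> out, Reporter reporter)
                    throws IOException {
                int best = Integer.MAX_VALUE;
                String neighbors = "";
                while (values.hasNext()) {
                    String[] parts = values.next().toString().split("\t");
                    best = Math.min(best, Integer.parseInt(parts[1]));
                    if (parts[0].equals("NODE") && parts.length > 2) {
                        neighbors = parts[2];
                    }
                }
                // Output matches the input format, so the job can be chained.
                out.collect(key, new Text(best + "\t" + neighbors));
            }
        }
    }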

7. Cloudera Certified Hadoop Developer Exam

    Following the training, attendees will have an opportunity to take the Cloudera Certified Hadoop Developer exam

8. Using Hive and Pig

    Hive Basics
    Pig Basics
    Hands-on exercise

9. Delving Deeper Into The Hadoop API

    Using LocalJobRunner Mode for Faster Development
    Reducing Intermediate Data With Combiners
    The configure and close methods for Map/Reduce Setup and Teardown
    Writing Partitioners for Better Load Balancing (see the sketch below)
    Directly Accessing HDFS
    Using the Distributed Cache
    Hands-On Exercise
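
A sketch of the partitioner item above, against the old-API Partitioner interface. The key format (a prefix before a colon) is a made-up example; keys sharing a prefix land on the same reducer, which can balance load better than hashing the whole key. The default is HashPartitioner.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Hypothetical partitioner: routes keys by the prefix before ':'.
    public class PrefixPartitioner implements Partitioner<Text, IntWritable> {
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String prefix = key.toString().split(":")[0];
            // Mask the sign bit so the result is always a valid partition index.
            return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }

        public void configure(JobConf conf) {
            // No configuration needed for this sketch.
        }
    }

In the driver it would be registered with conf.setPartitionerClass(PrefixPartitioner.class).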

10. Practical Development Tips and Techniques

    Testing with MRUnit (see the sketch below)
    Debugging MapReduce Code
    Using LocalJobRunner Mode For Easier Debugging
    Retrieving Job Information with Counters
    Logging
    Splittable File Formats
    Determining the Optimal Number of Reducers
    Map-Only MapReduce Jobs
    Implementing Multiple Mappers using ChainMapper
    Hands-On Exercise
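
A rough sketch of an MRUnit test for the word-count mapper sketched earlier, assuming the old-API org.apache.hadoop.mrunit.MapDriver and JUnit 4; factory methods and exception signatures vary slightly across MRUnit releases.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.MapDriver;
    import org.junit.Test;

    // Exercises the mapper in isolation: no cluster, no HDFS.
    public class WordCountMapperTest {
        @Test
        public void emitsOneCountPerToken() throws IOException {
            MapDriver<LongWritable, Text, Text, IntWritable> driver =
                    new MapDriver<LongWritable, Text, Text, IntWritable>()
                            .withMapper(new WordCount.Map());
            driver.withInput(new LongWritable(0), new Text("hello hello"))
                  .withOutput(new Text("hello"), new IntWritable(1))
                  .withOutput(new Text("hello"), new IntWritable(1))
                  .runTest();   // fails the test if the actual output differs
        }
    }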

11. Common MapReduce Algorithms

    Sorting and Searching
    Indexing
    Machine Learning With Mahout
    Term Frequency – Inverse Document Frequency
    Word Co-Occurrence
    Hands-On Exercise

12. Joining Data Sets in MapReduce Jobs

    Map-Side Joins
    The Secondary Sort
    Reduce-Side Joins (see the sketch below)
    Hands-On Exercise
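
A sketch of the reduce-side join named above: each mapper tags its records with the source it came from, the join key routes matching records to one reduce call, and the reducer pairs them up. The tags and record layout here are invented for illustration.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // Reducer side of a reduce-side join. Values arrive tagged "CUST\t..." or
    // "ORDER\t..."; two mappers (not shown) emit (customerId, taggedRecord).
    public class JoinReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            List<String> customers = new ArrayList<String>();
            List<String> orders = new ArrayList<String>();
            while (values.hasNext()) {                 // buffer each side by its tag
                String v = values.next().toString();
                if (v.startsWith("CUST\t")) {
                    customers.add(v.substring(5));
                } else if (v.startsWith("ORDER\t")) {
                    orders.add(v.substring(6));
                }
            }
            for (String c : customers) {               // emit the cross product per key
                for (String o : orders) {
                    out.collect(key, new Text(c + "\t" + o));
                }
            }
        }
    }

Buffering one side in memory is the usual caveat here; the secondary-sort technique in this same section exists precisely so that one side can be streamed instead.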

13. Creating Workflows with Oozie

    The Motivation for Oozie
    Oozie's Workflow Definition Format
    Hands-On Exercise


Syllabus guidelines for the Developer exam
    Core Hadoop Concepts
    Recognize and identify Apache Hadoop daemons and how they function both in data storage and processing. Understand how Apache Hadoop exploits data locality. Given a big data scenario, determine the challenges posed to large-scale computational models and how distributed systems attempt to overcome them.
    Storing Files in Hadoop
    Analyze the benefits and challenges of the HDFS architecture, including how HDFS implements file sizes, block sizes, and block abstraction. Understand default replication values and storage requirements for replication. Determine how HDFS stores, reads, and writes files. Given a sample architecture, determine how HDFS handles hardware failure.
    Job Configuration and Submission
    Construct proper job configuration parameters, including using JobConf and appropriate properties. Identify the correct procedures for MapReduce job submission and how to use commands such as “hadoop jar” when submitting jobs.
    Job Execution Environment
    Given a MapReduce job, determine the lifecycle of a Mapper and the lifecycle of a Reducer. Understand the key fault tolerance principles at work in a MapReduce job. Identify the role of Apache Hadoop Classes, Interfaces, and Methods. Understand how speculative execution exploits differences in machine configurations and capabilities in a parallel environment and how and when it runs.
    Input and Output
    Given a sample job, analyze and determine the correct InputFormat and OutputFormat to select based on job requirements. Understand the role of the RecordReader, and of sequence files and compression.
    Job Lifecycle
    Analyze the order of operations in a MapReduce job, how data moves from place to place, how partitioners and combiners function, and the sort and shuffle process.
    Data processing
    Analyze and determine the relationship of input keys to output keys in terms of both type and number, the sorting of keys, and the sorting of values. Given sample input data, identify the number, type, and value of emitted keys and values from the Mappers as well as the emitted data from each Reducer and the number and contents of the output file(s).
    Key and Value Types
    Given a scenario, analyze and determine which of Hadoop’s data types for keys and values are appropriate for the job. Understand common key and value types in the MapReduce framework and the interfaces they implement.
    Common Algorithms and Design Patterns
    Evaluate whether an algorithm is well-suited for expression in MapReduce. Understand implementation and limitations and strategies for joining datasets in MapReduce. Analyze the role of DistributedCache and Counters (a short sketch follows this list).
    The Hadoop Ecosystem
    Analyze a workflow scenario and determine how and when to leverage ecosystem projects, including Apache Hive, Apache Pig, Sqoop and Oozie. Understand how Hadoop Streaming might apply to a job workflow.
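
For the DistributedCache and Counters point above, a hedged old-API sketch; the file path, counter names, and stop-word scenario are illustrative only.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class StopWordFilterMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        private final Set<String> stopWords = new HashSet<String>();

        // In the driver: DistributedCache.addCacheFile(new URI("/user/x/stopwords.txt"), conf);
        public void configure(JobConf conf) {
            try {
                Path[] cached = DistributedCache.getLocalCacheFiles(conf);
                if (cached != null && cached.length > 0) {
                    BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
                    String line;
                    while ((line = in.readLine()) != null) {
                        stopWords.add(line.trim());
                    }
                    in.close();
                }
            } catch (IOException e) {
                throw new RuntimeException("Could not read cached stop-word file", e);
            }
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, LongWritable> out, Reporter reporter)
                throws IOException {
            for (String token : value.toString().split("\\s+")) {
                if (stopWords.contains(token)) {
                    reporter.incrCounter("Filter", "StopWordsSkipped", 1);  // job-wide counter
                } else {
                    out.collect(new Text(token), new LongWritable(1L));
                }
            }
        }
    }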

Syllabus guidelines for the Admin exam
    Apache Hadoop Cluster Core Technologies
    Daemons and normal operation of an Apache Hadoop cluster, both in data storage and in data processing. The characteristics of current computing systems that motivate a system like Apache Hadoop.
    Apache Hadoop Cluster Planning
    Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.
    Apache Hadoop Cluster Management
    Cluster handling of disk and machine failures. Standard tools for monitoring and managing the Apache Hadoop file system.
    Job Scheduling
    How the default FIFO scheduler and the FairScheduler handle the tasks in a mix of jobs running on a cluster.
    Monitoring and Logging
    Functions and features of Apache Hadoop’s logging and monitoring systems.

References:

http://www.philippeadjiman.com/blog/2009/12/07/hadoop-tutorial-part-1-setting-up-your-mapreduce-learning-playground/
http://www.cs.bgu.ac.il/~dsp112/The_Map-Reduce_Pattern
http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/

