Home
Cover
Part I. Hadoop Fundamentals
Chapter 1. Meet Hadoop
Data!
Data Storage and Analysis
Querying All Your Data
Beyond Batch
Comparison with Other Systems
Relational Database Management Systems
Grid Computing
Volunteer Computing
A Brief History of Apache Hadoop
What's in This Book
Chapter 2. MapReduce
A Weather Dataset
Data Format
Analyzing the Data with Unix Tools
Analyzing the Data with Hadoop
Map and Reduce
Java MapReduce
Scaling Out
Data Flow
Combiner Functions
Running a Distributed MapReduce Job
Hadoop Streaming
Ruby
Python
Chapter 3. The Hadoop Distributed Filesystem
The Design of HDFS
HDFS Concepts
Blocks
Namenodes and Datanodes
Block Caching
HDFS Federation
HDFS High Availability
The Command-Line Interface
Basic Filesystem Operations
Hadoop Filesystems
Interfaces
The Java Interface
Reading Data from a Hadoop URL
Reading Data Using the FileSystem API
Writing Data
Directories
Querying the Filesystem
Deleting Data
Data Flow
Anatomy of a File Read
Anatomy of a File Write
Coherency Model
Parallel Copying with distcp
Keeping an HDFS Cluster Balanced
Chapter 4. YARN
Anatomy of a YARN Application Run
Resource Requests
Application Lifespan
Building YARN Applications
Scheduling in YARN
Scheduler Options
Capacity Scheduler Configuration
Fair Scheduler Configuration
Delay Scheduling
Dominant Resource Fairness
Further Reading
Part II. MapReduce
Chapter 6. Developing a MapReduce Application
The Configuration API
Combining Resources
Variable Expansion
Setting Up the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test with MRUnit
Mapper
Reducer
Running Locally on Test Data
Running a Job in a Local Job Runner
Testing the Driver
Running on a Cluster
Packaging a Job
Launching a Job
The MapReduce Web UI
Retrieving the Results
Debugging a Job
Hadoop Logs
Remote Debugging
Tuning a Job
Profiling Tasks
MapReduce Workflows
Decomposing a Problem into MapReduce Jobs
JobControl
Apache Oozie
Part IV. Related Projects
Chapter 19. Spark
Installing Spark
An Example
Spark Applications, Jobs, Stages, and Tasks
A Scala Standalone Application
A Java Example
A Python Example
Resilient Distributed Datasets
Creation
Transformations and Actions
Persistence
Serialization
Shared Variables
Broadcast Variables
Accumulators
Anatomy of a Spark Job Run
Job Submission
DAG Construction
Task Scheduling
Task Execution
Executors and Cluster Managers
Spark on YARN
Further Reading
Analyzing the Data with Unix Tools