Intro to Hadoop - Big Data

  1. What does IaaS provide?

    • Computing Environment

    • Hardware Only

    • Software On-Demand

  2. What does PaaS provide?

    • Software On-Demand

    • Computing Environment

    • Hardware Only

  3. What does SaaS provide?

    • Software On-Demand

    • Computing Environment

    • Hardware Only

  4. What are the two key components of HDFS and what are they used for? (A short client sketch follows the options.)

    • NameNode for metadata and DataNode for block storage.

    • NameNode for block storage and DataNode for metadata.

    • FASTA for genome sequences and rasters for geospatial data.
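
    For reference, a minimal sketch in Java of how a client touches both components through the standard org.apache.hadoop.fs.FileSystem API: the NameNode answers the metadata side of fs.create(), while the DataNodes receive the actual block bytes. The NameNode address and file path below are assumptions for illustration.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class HdfsWriteExample {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Assumed NameNode address; all metadata requests go here.
              conf.set("fs.defaultFS", "hdfs://namenode:9000");
              FileSystem fs = FileSystem.get(conf);

              // The NameNode records the new file and chooses DataNodes;
              // the client then streams the blocks to those DataNodes.
              Path path = new Path("/user/quiz/notes.txt");  // assumed path
              try (FSDataOutputStream out = fs.create(path)) {
                  out.writeUTF("NameNode = metadata, DataNodes = blocks");
              }
          }
      }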

  5. What is the job of the NameNode?

    • Coordinates operations and assigns tasks to DataNodes.

    • Listens to DataNodes for block creation, deletion, and replication.

    • For gene sequencing calculations.

  6. What is the order of the three MapReduce steps? (A word count sketch follows the options.)

    • Map -> Reduce -> Shuffle and Sort

    • Shuffle and Sort -> Map -> Reduce

    • Shuffle and Sort -> Reduce -> Map

    • Map -> Shuffle and Sort -> Reduce
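
    The classic word count illustrates that order: the Mapper emits (word, 1) pairs, the framework's shuffle-and-sort phase groups all pairs by key between the two user-written steps, and the Reducer sums each group. A minimal Java sketch (class names are illustrative):

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;

      public class WordCount {
          // Step 1: Map -- emit (word, 1) for every token in the split.
          public static class TokenMapper
                  extends Mapper<LongWritable, Text, Text, IntWritable> {
              private static final IntWritable ONE = new IntWritable(1);
              private final Text word = new Text();

              @Override
              protected void map(LongWritable key, Text value, Context ctx)
                      throws IOException, InterruptedException {
                  for (String token : value.toString().split("\\s+")) {
                      if (!token.isEmpty()) {
                          word.set(token);
                          ctx.write(word, ONE);
                      }
                  }
              }
          }

          // Step 2: Shuffle and Sort -- performed by the framework between
          // map and reduce; every (word, 1) pair for the same word lands at
          // one reducer, with keys in sorted order.

          // Step 3: Reduce -- sum the grouped counts for each word.
          public static class SumReducer
                  extends Reducer<Text, IntWritable, Text, IntWritable> {
              @Override
              protected void reduce(Text key, Iterable<IntWritable> values,
                      Context ctx) throws IOException, InterruptedException {
                  int sum = 0;
                  for (IntWritable v : values) {
                      sum += v.get();
                  }
                  ctx.write(key, new IntWritable(sum));
              }
          }
      }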

  7. What is a benefit of using pre-built Hadoop images?

    • Quick prototyping, deploying, and validating of projects.

    • Guaranteed hardware support.

    • Quick prototyping, deploying, and guaranteed bug-free software.

    • Fewer software choices.

  8. What are some examples of open-source tools built for Hadoop, and what do they do? (A ZooKeeper client sketch follows the options.)

    • Pig, for real-time and in-memory processing of big data.

    • ZooKeeper, for analyzing social graphs.

    • Giraph, for SQL-like queries.

    • ZooKeeper, a management system for Hadoop's animal-named components.
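
    To make one of these concrete, here is a minimal ZooKeeper client sketch in Java: it connects to an ensemble and creates a znode, the kind of small coordination record Hadoop components rely on for configuration and leader election. The zk1:2181 address and znode path are assumptions.

      import java.util.concurrent.CountDownLatch;
      import org.apache.zookeeper.CreateMode;
      import org.apache.zookeeper.Watcher;
      import org.apache.zookeeper.ZooDefs;
      import org.apache.zookeeper.ZooKeeper;

      public class ZkCreateExample {
          public static void main(String[] args) throws Exception {
              CountDownLatch connected = new CountDownLatch(1);

              // Assumed ensemble address; wait until the session is live.
              ZooKeeper zk = new ZooKeeper("zk1:2181", 15000, event -> {
                  if (event.getState()
                          == Watcher.Event.KeeperState.SyncConnected) {
                      connected.countDown();
                  }
              });
              connected.await();

              // Create a persistent znode holding a bit of shared state.
              zk.create("/quiz-demo", "hello".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
              zk.close();
          }
      }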

  9. What is the difference between low-level interfaces and high-level interfaces?

    • Low-level interfaces deal with storage and scheduling, while high-level interfaces deal with interactivity.

    • Low-level interfaces deal with interactivity, while high-level interfaces deal with storage and scheduling.

  10. Which of the following are problems to look out for when integrating your project with Hadoop?

    • Advanced Algorithms

    • Infrastructure Replacement

    • Task Level Parallelism

    • Random Data Access

    • Data Level Parallelism

  11. As covered in the slides, which of the following are the major goals of Hadoop?

    • Enable Scalability

    • Latency Sensitive Tasks

    • Facilitate a Shared Environment

    • Provide Value for Data

    • Optimized for a Variety of Data Types

    • Handle Fault Tolerance

  12. What is the purpose of YARN?

    • Allows various applications to run on the same Hadoop cluster.

    • Enables large-scale data storage across clusters.

    • An implementation of MapReduce.

  13. What are the two main components of a data computation framework, as described in the slides? (A YARN job-submission sketch follows the options.)

    • Resource Manager and Container

    • Resource Manager and Node Manager

    • Node Manager and Applications Master

    • Node Manager and Container

    • Applications Master and Container
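
    Tying questions 12 and 13 together, here is a hedged driver sketch that submits the word count classes from question 6 to a YARN cluster: the ResourceManager accepts the application, and the NodeManagers host the containers (including the ApplicationMaster) that actually run it. The framework setting and HDFS paths below are assumptions for illustration.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCountDriver {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Run on YARN: the ResourceManager schedules the job and the
              // NodeManagers launch its containers.
              conf.set("mapreduce.framework.name", "yarn");

              Job job = Job.getInstance(conf, "word count");
              job.setJarByClass(WordCountDriver.class);
              job.setMapperClass(WordCount.TokenMapper.class);
              job.setReducerClass(WordCount.SumReducer.class);
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);

              // Assumed HDFS input and output paths.
              FileInputFormat.addInputPath(job, new Path("/user/quiz/input"));
              FileOutputFormat.setOutputPath(job, new Path("/user/quiz/output"));

              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }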