Graded Assignment: Understand by Doing: MapReduce - Big Data


A passionate full-stack developer from @ePlus.DEV

Instructions

MapReduce is the core programming model for the Hadoop Ecosystem. We’ve found it’s really helpful to walk through the steps of MapReduce for yourself in order to internalize how it really works. In the video lecture, we walked through the steps of MapReduce to count words, so our keys were words. In this exercise, you will count shapes, so the keys will be shapes.

Note: This assignment can be done in PPT and printed to PDF or on paper and submitted as a picture. Template in PPT, template in JPG.

Download: PeerReviewforUpload.pptx

Your job is to perform the steps of MapReduce to calculate a count of the number of squares, stars, circles, hearts and triangles in the dataset shown in the picture above. You should follow the steps of MapReduce as they were explained in this video.

Step 0: Store the dataset across 4 partitions in HDFS. Note: we have already done one partition for you. Hint: Balance the load, but there is more than one possible “correct” partitioning.

Step 1: Map the data. Hint: Mapping involves clustering like keys together. Show this in the visual placement of keys within a partition.

Step 2: Sort and Shuffle. Note: as mentioned in lecture, you don’t have to use the same number of nodes in this step as you did before. Let’s use three instead. Hint: Balance the load.

Step 3: Reduce to calculate the final counts. Hint: Fill in the blank lines to finalize the key-value pairs.
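Steps 0 to 3 can also be sketched in code. The following is a minimal single-process Python sketch, not Hadoop itself. The `SHAPES` list is a made-up stand-in for the dataset in the assignment picture, and every function name is hypothetical. Real Hadoop typically routes keys to reducers by hashing; here keys are assigned to reducers in sorted order so the result is deterministic.

```python
from collections import defaultdict

# Hypothetical dataset: the real one is the picture in the assignment.
SHAPES = ["square", "star", "circle", "heart", "triangle",
          "star", "circle", "square", "star", "heart",
          "circle", "star"]

def partition(records, n):
    """Step 0: deal records round-robin into n partitions to balance the load."""
    return [records[i::n] for i in range(n)]

def map_partition(part):
    """Step 1: emit one (key, 1) pair per record, with like keys placed together."""
    return sorted((shape, 1) for shape in part)

def shuffle(mapped_parts, n_reducers):
    """Step 2: send every pair with the same key to the same reducer node."""
    keys = sorted({k for part in mapped_parts for k, _ in part})
    # Assumption: simple round-robin key assignment instead of hashing.
    owner = {k: i % n_reducers for i, k in enumerate(keys)}
    nodes = [defaultdict(list) for _ in range(n_reducers)]
    for part in mapped_parts:
        for k, v in part:
            nodes[owner[k]][k].append(v)
    return nodes

def reduce_node(node):
    """Step 3: sum the values for each key held by one reducer node."""
    return {k: sum(vs) for k, vs in node.items()}

def count_shapes(records, n_partitions=4, n_reducers=3):
    """Run all four steps and merge the per-node results."""
    parts = partition(records, n_partitions)
    mapped = [map_partition(p) for p in parts]
    nodes = shuffle(mapped, n_reducers)
    counts = {}
    for node in nodes:
        counts.update(reduce_node(node))
    return counts

print(count_shapes(SHAPES))
```

Because each key lives on exactly one reducer node after the shuffle, merging the per-node dictionaries never overwrites a count.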

Modification: Simplify drawing the key-value pair

The “Map” stage of MapReduce generates key-value pairs. For example, in the video we saw:

my, my -> (my, 1), (my, 1)

This shows that two instances of the word “my” get mapped to two key-value pairs. You might have noticed that until the Reduce step, the value in every key-value pair is 1. To make this activity less cluttered visually, we will have you leave out the “,1” part of each key-value pair, and just represent a key-value pair with the appropriate image.
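To see why leaving out the “,1” loses no information, here is a small Python sketch. The function names `map_words` and `restore_pairs` are hypothetical and used only for illustration.

```python
def map_words(words):
    """Map stage: emit one (key, 1) pair per input word."""
    return [(word, 1) for word in words]

def restore_pairs(keys):
    """Rebuild full key-value pairs from keys alone, since every value is 1."""
    return [(key, 1) for key in keys]

pairs = map_words(["my", "my"])
print(pairs)  # [('my', 1), ('my', 1)]

# Drawing only the key (the shape image) is equivalent to drawing the pair.
keys_only = [key for key, _ in pairs]
print(restore_pairs(keys_only) == pairs)  # True
```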


Solution of course

Download file: Map Reduce.pdf

