Johns Hopkins University
Big Data Processing Using Hadoop Specialization

Master Big Data Processing with Hadoop. Gain hands-on experience with Hadoop tools and techniques to efficiently process, analyze, and manage big data in real-world applications.

Instructor: Karthik Shyamsunder

Included with Coursera Plus

Get in-depth knowledge of a subject
Intermediate level

3 months at 5 hours a week
Flexible schedule
Earn a career credential
Share your expertise with employers

What you'll learn

  • Gain expertise in Hadoop ecosystem components like HDFS, YARN, and MapReduce for big data processing and management across various tasks.

  • Learn to set up, configure, and utilize tools like Hive, Pig, HBase, and Spark for efficient data analysis, processing, and real-time management.

  • Develop advanced programming techniques for MapReduce, optimization methods, and parallelism strategies to handle large-scale data sets effectively.

  • Understand the architecture and functionality of Hadoop and its components, applying them to solve complex data challenges in real-world scenarios.

Overview

What’s included

Shareable certificate

Add to your LinkedIn profile

Taught in English
45 practice exercises

Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Johns Hopkins University

Specialization - 4 course series

What you'll learn

  • Define Big Data, explore its relevance in analytics and data science, and understand trends shaping modern data processing technologies.

  • Examine Hadoop architecture, its ecosystem, and subprojects, distinguishing distributions and their roles in Big Data solutions.

  • Acquire practical skills to install, configure, and run Hadoop on a Linux virtual machine, enabling effective Big Data processing.

Skills you'll gain

Apache Hadoop, Distributed Computing, Big Data, Linux, System Configuration, Analytics, Software Installation, Scalability, Data Infrastructure, Data Processing, and Data Science

What you'll learn

  • Understand HDFS architecture, components, and how it ensures scalability and availability for big data processing.

  • Learn to configure Hadoop for Java programming and perform file CRUD operations using HDFS APIs.

  • Master advanced HDFS programming concepts like compression, serialization, and working with specialized file structures like Sequence and Map files.

Skills you'll gain

File Systems, Data Storage, Apache Hadoop, Distributed Computing, Scalability, Java, File Management, Infrastructure Architecture, Development Environment, Big Data, Data Processing, Data Structures, and Systems Architecture

What you'll learn

  • Learn the fundamentals of YARN and MapReduce architectures, including how they work together to process large-scale data efficiently.

  • Understand and implement Mapper and Reducer parallelism in MapReduce jobs to improve data processing efficiency and scalability.

  • Apply optimization techniques such as combiners, partitioners, and compression to enhance the performance and I/O operations of MapReduce jobs.

  • Explore advanced concepts like multithreading, speculative execution, input/output formats, and how to avoid common MapReduce anti-patterns.
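
As a rough illustration of how the mapper, combiner, partitioner, and reducer named above fit together, here is a minimal in-memory sketch in plain Python. It is not Hadoop code and is not taken from the course: the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `run_job`) and the toy byte-sum hash are assumptions made for illustration only, and each inner list in `splits` merely stands in for one mapper's input split.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: pre-sum counts on the mapper side to shrink shuffle traffic."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: route a key to a reducer (toy byte-sum hash, not Hadoop's)."""
    return sum(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum all partial counts that arrived for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # One bucket per reducer; grouping values by key plays the role of the shuffle.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split stands in for one mapper's input
        mapped = [pair for line in split for pair in map_phase(line)]
        for key, value in combine(mapped):
            buckets[partition(key, num_reducers)][key].append(value)

    result = {}
    for bucket in buckets:  # each bucket stands in for one reducer
        for key in sorted(bucket):
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result


if __name__ == "__main__":
    splits = [["big data big"], ["data hadoop big"]]
    print(run_job(splits))
```

In this sketch the combiner collapses the two `big` pairs from the first split into a single `("big", 2)` pair before the shuffle, which is the I/O saving the bullet on combiners refers to. A real job would express the same roles through Hadoop's Java `Mapper`, `Reducer`, and `Partitioner` classes.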

Skills you'll gain

Distributed Computing, Data Processing, Apache Hadoop, Performance Tuning, System Configuration, Software Architecture, Big Data, Java, and Scalability

What you'll learn

  • Learn to set up and configure Hive, Pig, HBase, and Spark for efficient big data analysis and processing within the Hadoop ecosystem.

  • Master Hive’s SQL-like queries for data retrieval, management, and optimization using partitions and joins to enhance query performance.

  • Understand Pig Latin for scripting data transformations, including the use of operators like join and debug to process large datasets effectively.

  • Gain expertise in NoSQL databases with HBase for real-time read/write operations, and use Spark’s core programming model for fast data processing.

Skills you'll gain

Query Languages, Apache Spark, NoSQL, Data Transformation, Data Processing, Apache Hadoop, Apache Hive, Big Data, Data Management, SQL, Scripting Languages, and Data Manipulation

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Karthik Shyamsunder
Johns Hopkins University
4 courses, 1,010 learners
