Hadoop Platform and Application Framework


Coursera (CC)
Rating: 7.2. Courses offered by Coursera (CC) have an average rating of 7.2 (from 6 reviews).


Description


When you enroll in a course through Coursera, you can choose between a paid plan and a free plan:

  • Free plan: No certification; audit only. You will have access to all course materials except graded items.
  • Paid plan: Commit to earning a Certificate, a trusted, shareable way to showcase your new skills.

About this course: This course is for novice programmers or business people who would like to understand the core tools used to wrangle and analyze big data. With no prior experience, you will have the opportunity to walk through hands-on examples with Hadoop and Spark frameworks, two of the most common in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments you will be guided in how data scientists apply the important concepts and techniques such as Map-Reduce that are used to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.

Created by: University of California, San Diego
  • Taught by: Natasha Balac, Director, Predictive Analytics Center of Excellence (PACE), San Diego Supercomputer Center
  • Taught by: Paul Rodriguez, Research Programmer, San Diego Supercomputer Center (SDSC)
  • Taught by: Andrea Zonca, HPC Applications Specialist, San Diego Supercomputer Center (SDSC)

Commitment: 5 weeks of study, 1-2 hours/week
Language: English
How To Pass: Pass all graded assignments to complete the course.

Coursework

Each course is like an interactive textbook, featuring pre-recorded videos, quizzes and projects.

Help from your peers

Connect with thousands of other learners and debate ideas, discuss course material, and get help mastering concepts.

Certificates

Earn official recognition for your work, and share your success with friends, colleagues, and employers.

University of California, San Diego

UC San Diego is an academic powerhouse and economic engine, recognized as one of the top 10 public universities by U.S. News and World Report. Innovation is central to who we are and what we do. Here, students learn that knowledge isn't just acquired in the classroom; life is their laboratory.

Syllabus


WEEK 1


Hadoop Basics



Welcome to the first module of the Big Data Platform course. This module provides insight into the Big Data hype and into the technologies, opportunities, and challenges behind it. We will take a deeper look at the Hadoop stack and at the tools and technologies associated with Big Data solutions.


7 videos, 4 readings


  1. Video: Hadoop Stack Basics
  2. Video: The Apache Framework: Basic Modules
  3. Video: Hadoop Distributed File System (HDFS)
  4. Video: The Hadoop "Zoo"
  5. Video: Hadoop Ecosystem Major Components
  6. Reading: Apache Hadoop Ecosystem
  7. Reading: Lesson 1 Slides (PDF)
  8. Reading: Hardware & Software Requirements
  9. Video: Exploring the Cloudera VM: Hands-On Part 1
  10. Video: Exploring the Cloudera VM: Hands-On Part 2
  11. Reading: Lesson 2 Slides - Cloudera VM Tour

Graded: Basic Hadoop Stack

WEEK 2


Introduction to the Hadoop Stack
In this module we will take a detailed look at the Hadoop stack, ranging from the basic HDFS components to application execution frameworks, languages, and services.


10 videos, 6 readings


  1. Video: Overview of the Hadoop Stack
  2. Video: The Hadoop Distributed File System (HDFS) and HDFS2
  3. Video: MapReduce Framework and YARN
  4. Reading: Hadoop Basics - Lesson 1 Slides
  5. Video: The Hadoop Execution Environment
  6. Video: YARN, Tez, and Spark
  7. Video: Hadoop Resource Scheduling
  8. Reading: Lesson 2: Hadoop Execution Environment - Slides
  9. Video: Hadoop-Based Applications
  10. Video: Introduction to Apache Pig
  11. Video: Introduction to Apache HIVE
  12. Video: Introduction to Apache HBASE
  13. Reading: Lesson 3: Hadoop-Based Applications Overview - All Slides
  14. Reading: Command list for Applications Slides
  15. Reading: Tips to handle service connection errors
  16. Reading: References for Applications

Graded: Overview of Hadoop Stack
Graded: Hadoop Execution Environment
Graded: Hadoop Applications

WEEK 3


Introduction to Hadoop Distributed File System (HDFS)



In this module we will take a detailed look at the Hadoop Distributed File System (HDFS). We will cover the main design goals of HDFS, walk through how data is read from and written to HDFS, review the main configuration parameters that can be tuned to control HDFS performance and robustness, and get an overview of the different ways you can access data on HDFS.


9 videos, 5 readings


  1. Video: Overview of HDFS Architecture
  2. Video: The HDFS Performance Envelope
  3. Video: Read/Write Processes in HDFS
  4. Reading: Lesson 1: Introduction to HDFS - Slides
  5. Reading: HDFS references
  6. Video: HDFS Tuning Parameters
  7. Video: HDFS Performance and Robustness
  8. Reading: Lesson 2: HDFS Performance and Tuning - Slides
  9. Video: Overview of HDFS Access, APIs, and Applications
  10. Video: HDFS Commands
  11. Video: Native Java API for HDFS
  12. Video: REST API for HDFS
  13. Reading: HDFS Access, APIs
  14. Reading: Lesson 3: HDFS Access, APIs, Applications - Slides

Graded: HDFS Architecture
Graded: HDFS performance, tuning, and robustness
Graded: Accessing HDFS
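
To make the tuning parameters mentioned in this module more concrete, here is a minimal sketch of the arithmetic behind two of them: block size and replication factor. It is not taken from the course materials. The function name `hdfs_footprint` is hypothetical, and the defaults (128 MB blocks, replication 3) are assumed from the usual Hadoop 2 settings for `dfs.blocksize` and `dfs.replication`; a given cluster may be configured differently.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster).

    Hypothetical helper for illustration only; defaults are assumptions.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Integer ceiling division: a partially filled last block still counts.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # The last block only occupies as much disk space as the data it holds,
    # so raw usage is the file size times the replication factor.
    raw_bytes = file_size_bytes * replication
    return num_blocks, raw_bytes


blocks, raw = hdfs_footprint(1000 * MB)
print(blocks, raw // MB)  # prints "8 3000"
```

Fewer, larger blocks mean less metadata to track, while a higher replication factor trades disk space for robustness. That trade-off is what the tuning lessons in this module are about.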

WEEK 4


Introduction to Map/Reduce



This module introduces Map/Reduce concepts and practice. You will learn the big idea behind Map/Reduce and how to design, implement, and execute tasks in the Map/Reduce framework. You will also learn about the trade-offs in Map/Reduce and how they motivate other tools.


9 videos, 3 readings


  1. Video: Introduction to Map/Reduce
  2. Video: The Map/Reduce Framework
  3. Video: A MapReduce Example: Wordcount in detail
  4. Reading: Lesson 1: Introduction to MapReduce - Slides
  5. Reading: A note on debugging map/reduce programs.
  6. Video: MapReduce: Intro to Examples and Principles
  7. Video: MapReduce Example: Trending Wordcount
  8. Video: MapReduce Example: Joining Data
  9. Video: MapReduce Example: Vector Multiplication
  10. Video: Computational Costs of Vector Multiplication
  11. Video: MapReduce Summary
  12. Reading: Lesson 2: MapReduce Examples and Principles - Slides

Graded: Running Wordcount with Hadoop streaming, using Python code
Graded: Lesson 1 Review
Graded: Joining Data
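
As a rough illustration of the word count example named in this module, the sketch below imitates the three Map/Reduce phases (map, shuffle/sort, reduce) in plain Python on a single machine. It is not the course's assignment code and does not use Hadoop. The names `mapper`, `reducer`, and `run_wordcount` are hypothetical, and the in-memory sort merely stands in for the shuffle that the framework would perform across nodes.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts that share the same key."""
    return word, sum(counts)


def run_wordcount(lines):
    """Run map, a local stand-in for shuffle/sort, then reduce."""
    mapped = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring identical keys next to each other.
    mapped.sort(key=itemgetter(0))
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    )


print(run_wordcount(["the quick brown fox", "the lazy dog", "the fox"]))
# prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The mapper only ever sees one line and the reducer only ever sees one key, which is what allows a real framework to spread both phases over many machines.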

WEEK 5


Spark



Welcome to module 5, Introduction to Spark. This week we will focus on the Apache Spark cluster computing framework, an important competitor to Hadoop MapReduce in the Big Data arena. Spark provides great performance advantages over Hadoop MapReduce, especially for iterative algorithms, thanks to in-memory caching. It also gives data scientists an easier way to write their analysis pipelines in Python and Scala, and even provides interactive shells for exploring data live.


10 videos, 4 readings


  1. Video: Introduction to Apache Spark
  2. Video: Architecture of Spark
  3. Reading: Setup PySpark on the Cloudera VM
  4. Reading: Lesson 1: Intro to Apache Spark - Slides
  5. Video: Resilient Distributed Datasets
  6. Video: Spark Transformations
  7. Video: Wide Transformations
  8. Reading: Lesson 2: RDD and Transformations - Slides
  9. Video: Directed Acyclic Graph (DAG) Scheduler
  10. Video: Actions in Spark
  11. Video: Memory Caching in Spark
  12. Video: Broadcast Variables
  13. Video: Accumulators
  14. Reading: Lesson 3: Scheduling, Actions, Caching - Slides

Graded: Spark Lesson 1
Graded: Spark Lesson 2
Graded: Simple Join in Spark
Graded: Spark Lesson 3
Graded: Advanced Join in Spark
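
The lessons above distinguish lazy transformations from actions and introduce memory caching. The following toy model, written in plain Python, sketches why caching helps when the same intermediate result is used by more than one action. It is not Spark and not course code. `ToyDataset` and `evaluations_for_two_actions` are hypothetical names, and the method names only mirror the spirit of `map`, `filter`, `cache`, `collect`, and `count`.

```python
class ToyDataset:
    """Toy stand-in for a lazily evaluated dataset (an assumption, not Spark)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function that builds a list
        self._use_cache = False
        self._cached = None

    @classmethod
    def from_list(cls, items):
        return cls(lambda: list(items))

    def map(self, func):
        # Transformation: records the work, runs nothing yet.
        return ToyDataset(lambda: [func(x) for x in self._materialize()])

    def filter(self, pred):
        # Transformation: also lazy.
        return ToyDataset(lambda: [x for x in self._materialize() if pred(x)])

    def cache(self):
        # Ask for the result to be kept in memory after first computation.
        self._use_cache = True
        return self

    def _materialize(self):
        if not self._use_cache:
            return self._compute()
        if self._cached is None:
            self._cached = self._compute()
        return self._cached

    def collect(self):
        # Action: triggers the computation.
        return list(self._materialize())

    def count(self):
        # Action: triggers the computation.
        return len(self._materialize())


def evaluations_for_two_actions(use_cache):
    """Count how often the expensive function runs across two actions."""
    calls = []

    def expensive_square(x):
        calls.append(x)
        return x * x

    squares = ToyDataset.from_list(range(5)).map(expensive_square)
    if use_cache:
        squares.cache()
    total = sum(squares.collect())                         # first action
    evens = squares.filter(lambda v: v % 2 == 0).count()   # second action
    return len(calls), total, evens


print(evaluations_for_two_actions(use_cache=False))  # prints (10, 30, 3)
print(evaluations_for_two_actions(use_cache=True))   # prints (5, 30, 3)
```

Without caching, the second action recomputes the squares from scratch; with caching, they are computed once and reused. Iterative algorithms repeat this pattern many times, which is where the benefit adds up.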
