Big Data Integration and Processing
Description
When you enroll in a course through Coursera, you can choose between a paid plan and a free plan.
- Free plan: Audit only, with no certificate. You will have access to all course materials except graded items.
- Paid plan: Commit to earning a Certificate—it's a trusted, shareable way to showcase your new skills.
About this course: At the end of the course, you will be able to:
- Retrieve data from example database and big data management systems
- Describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
- Identify when a big data problem needs data integration
- Execute simple big data integration and processing on Hadoop and Spark platforms
This course is for those new to data science. Completion of Intro to Big Data is recommended. No prior programming experience is needed, although the ability to install applications and use a virtual machine is necessary to complete the hands-on assignments. Refer to the specialization technical requirements for complete hardware and software specifications.
Hardware requirements: (A) quad-core processor (VT-x or AMD-V support recommended), 64-bit; (B) 8 GB RAM; (C) 20 GB free disk space. To find your hardware information on Windows, open System by clicking the Start button, right-clicking Computer, and then clicking Properties; on Mac, open Overview by clicking the Apple menu and then "About This Mac." Most computers with 8 GB RAM purchased in the last 3 years will meet the minimum requirements. You will also need a high-speed internet connection, because you will be downloading files up to 4 GB in size.
Software requirements: This course relies on several open-source software tools, including Apache Hadoop. All required software can be downloaded and installed free of charge (except for data charges from your internet provider). Software requirements include: Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+, or CentOS 6+; and VirtualBox 5+.
Frequently asked questions
No FAQs have been posted yet. If you have any questions or need support, please contact our customer service team. We are happy to help!
Created by: University of California, San Diego
Taught by: Ilkay Altintas, Chief Data Science Officer, San Diego Supercomputer Center (SDSC)
Taught by: Amarnath Gupta, Director, Advanced Query Processing Lab, San Diego Supercomputer Center (SDSC)
Each course is like an interactive textbook, featuring pre-recorded videos, quizzes and projects.
Help from your peers: Connect with thousands of other learners and debate ideas, discuss course material, and get help mastering concepts.
Certificates: Earn official recognition for your work, and share your success with friends, colleagues, and employers.
University of California, San Diego: UC San Diego is an academic powerhouse and economic engine, recognized as one of the top 10 public universities by U.S. News and World Report. Innovation is central to who we are and what we do. Here, students learn that knowledge isn't just acquired in the classroom—life is their laboratory.
Syllabus
WEEK 1
Welcome to Big Data Integration and Processing
Welcome to the third course in the Big Data Specialization. This week you will be introduced to basic concepts in big data integration and processing. You will be guided through installing the Cloudera VM, downloading the data sets to be used for this course, and learning how to run the Jupyter server.
3 videos, 6 readings
- Video: What is in this Course?
- Video: Summary of Big Data Modeling and Management
- Video: Why is Big Data Processing Different?
- Discussion Prompt: Getting to know you: Tell us about yourself and why you are taking this course.
- Reading: Slides: Summary & Why Is Big Data Processing Different
- Reading: Downloading and Installing the Cloudera VM Instructions (Windows)
- Reading: Downloading and Installing the Cloudera VM Instructions (Mac)
- Reading: Software Installation Frequently Asked Questions (FAQ)
- Reading: Instructions for Downloading Hands On Datasets
- Reading: Instructions for Starting Jupyter
Retrieving Big Data (Part 1)
This module covers the various aspects of data retrieval and relational querying. You will also be introduced to the Postgres database.
5 videos, 2 readings
- Video: What is Data Retrieval? Part 1
- Video: What is Data Retrieval? Part 2
- Video: Querying Two Relations
- Video: Subqueries
- Reading: Slides: What is Data Retrieval?
- Reading: Querying Relational Data with Postgres
- Video: Querying Relational Data with Postgres
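As a rough sketch of the relational querying ideas this module covers (querying two relations with a join, and subqueries), the snippet below uses Python's built-in sqlite3 module as a lightweight stand-in for Postgres. The tables and data are invented for illustration, not from the course datasets.

```python
# Sketch of joins and subqueries, using sqlite3 as a stand-in for Postgres.
# The users/orders schema and all rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 30.0), (2, 1, 20.0), (3, 2, 5.0)])

# Querying two relations: join users to their orders and aggregate.
rows = cur.execute("""
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()
print(rows)  # [('Ada', 50.0), ('Grace', 5.0)]

# A subquery: users whose spending exceeds the average order total.
big = cur.execute("""
    SELECT name FROM users
    WHERE id IN (SELECT user_id FROM orders
                 GROUP BY user_id
                 HAVING SUM(total) > (SELECT AVG(total) FROM orders))
""").fetchall()
print(big)  # [('Ada',)]
```

The same SQL runs essentially unchanged against Postgres once connected with a client such as psql, which is what the hands-on reading walks through.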
WEEK 2
Retrieving Big Data (Part 2)
This module covers the various aspects of data retrieval for NoSQL data, as well as data aggregation and working with data frames. You will be introduced to MongoDB and Aerospike, and you will learn how to use Pandas to retrieve data from them.
5 videos, 3 readings
- Video: Querying JSON Data with MongoDB
- Video: Aggregation Functions
- Discussion Prompt: Let's Discuss: MongoDB
- Video: Querying Aerospike
- Reading: Slides: Querying Data Part 2
- Reading: Querying Documents in MongoDB
- Video: Querying Documents in MongoDB
- Reading: Exploring Pandas DataFrames
- Video: Exploring Pandas DataFrames
Graded: Retrieving Big Data Quiz
Graded: Postgres, MongoDB, and Pandas
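The kind of DataFrame exploration covered in "Exploring Pandas DataFrames" can be sketched as follows; the columns and readings here are made up, since the course supplies its own hands-on datasets.

```python
# A small, hypothetical sketch of exploring data with a Pandas DataFrame:
# inspect its shape, filter rows, and aggregate by group.
import pandas as pd

df = pd.DataFrame({
    "city":    ["San Diego", "Austin", "San Diego", "Boston"],
    "sensor":  ["temp", "temp", "humidity", "temp"],
    "reading": [21.5, 30.1, 0.65, 12.3],
})

print(df.shape)  # (4, 3): four rows, three columns

# Filter rows, then aggregate -- analogous to SQL's WHERE + GROUP BY.
temps = df[df["sensor"] == "temp"]
mean_by_city = temps.groupby("city")["reading"].mean()
print(mean_by_city["San Diego"])  # 21.5
```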
WEEK 3
Big Data Integration
In this module you will be introduced to data integration tools including Splunk and Datameer, and you will gain some practical insight into how information integration processes are carried out.
11 videos, 4 readings
- Video: Overview of Information Integration
- Video: A Data Integration Scenario
- Video: Integration for Multichannel Customer Analytics
- Discussion Prompt: Let's Discuss: Big Data Integration
- Reading: Slides: Information Integration
- Video: Big Data Management and Processing Using Splunk and Datameer
- Video: Why Splunk?
- Video: Connected Cars with Ford's OpenXC and Splunk
- Video: Big Data Management and Processing using Datameer
- Reading: Downloading Splunk Enterprise
- Video: Installing Splunk Enterprise on Windows
- Video: Installing Splunk Enterprise on Linux
- Reading: Exploring Splunk Queries
- Video: Exploring Splunk Queries
- Reading: Optional: Instructions for Splunk Pivot Tutorial
- Video: Optional: Creating Pivot Reports in Splunk
Graded: Information Integration - Quiz
Graded: Hands-On With Splunk
WEEK 4
Processing Big Data
This module introduces learners to big data pipelines and workflows, as well as processing and analysis of big data using Apache Spark.
9 videos, 4 readings
- Video: Big Data Processing Pipelines
- Video: Some High-Level Processing Operations in Big Data Pipelines
- Video: Aggregation Operations in Big Data Pipelines
- Video: Typical Analytical Operations in Big Data Pipelines
- Discussion Prompt: Let's Discuss: Big Data Pipelines in Your World
- Reading: Big Data Processing Pipelines Slides
- Video: Overview of Big Data Processing Systems
- Reading: Big Data Workflow Management
- Video: The Integration and Processing Layer
- Video: Introduction to Apache Spark
- Video: Getting Started with Spark
- Discussion Prompt: Let's Discuss: Big Data Processing Systems
- Reading: Slides for Big Data Processing Tools and Systems
- Reading: WordCount in Spark
- Video: WordCount in Spark
- Discussion Prompt: Let's Discuss: Word Count
Graded: Pipeline and Tools
Graded: WordCount in Spark
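The WordCount exercise above chains Spark's flatMap, map, and reduceByKey operations. The plain-Python sketch below mirrors each step so the logic can be followed without a running Spark cluster; the input lines are invented for illustration.

```python
# Plain-Python mirror of Spark's WordCount pipeline (hypothetical input).
from collections import Counter

lines = ["to be or not to be", "to see or not to see"]

# flatMap: split every line into words, flattened into one list.
words = [w for line in lines for w in line.split()]

# map + reduceByKey: pair each word with 1, then sum counts per word.
counts = Counter(words)

print(counts["to"])  # 4
print(counts["be"])  # 2
```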
WEEK 5
Big Data Analytics using Spark
In this module, you will go deeper into big data processing by learning the inner workings of the Spark Core. You will be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.
9 videos, 5 readings
- Video: Spark Core: Programming In Spark using RDDs in Pipelines
- Video: Spark Core: Transformations
- Video: Spark Core: Actions
- Reading: Slides for Module 5 Lesson 1
- Video: Spark SQL
- Video: Spark Streaming
- Video: Spark MLlib
- Video: Spark GraphX
- Discussion Prompt: Let's Discuss: The Spark Ecosystem
- Reading: Slides for Module 5 Lesson 2
- Reading: Exploring SparkSQL and Spark DataFrames
- Video: Exploring SparkSQL and Spark DataFrames
- Reading: Instructions for Configuring VirtualBox for Spark Streaming
- Reading: Analyzing Sensor Data with Spark Streaming
- Video: Analyzing Sensor Data with Spark Streaming
Graded: Quiz on Spark
Graded: SparkSQL and Spark Streaming
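A key Spark Core idea from this module is that transformations (map, filter) are lazy, while actions (collect, count) force computation. Python generators offer a rough analogy, sketched below with invented data: nothing is evaluated until a result is demanded.

```python
# Rough analogy for lazy RDD transformations vs. eager actions,
# using Python generators (data and pipeline are hypothetical).
data = range(1, 6)                          # plays the role of an RDD of 1..5

squared = (x * x for x in data)             # "transformation": lazy, nothing runs yet
evens = (x for x in squared if x % 2 == 0)  # another lazy transformation

result = list(evens)                        # "action": forces the whole pipeline
print(result)  # [4, 16]
```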
WEEK 6
Learn By Doing: Putting MongoDB and Spark to Work
In this module you will get some practical hands-on experience applying what you learned about Spark and MongoDB to analyze Twitter data.
4 readings
- Reading: Let's Analyze Soccer Tweets!
- Reading: Expressing Analytical Questions as MongoDB Queries
- Reading: Exporting Data from MongoDB to a CSV File
- Reading: Analyzing Tweets About Countries
Graded: Check Your Query Results
Graded: Check Your Analysis Results
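One step in this module is exporting MongoDB query results to a CSV file for further analysis. The sketch below shows that step with Python's csv module on invented documents standing in for the output of a MongoDB query (the query itself is omitted, since it needs a running server; all field names here are hypothetical).

```python
# Hypothetical sketch: writing MongoDB-style documents out as CSV.
import csv
import io

docs = [
    {"country": "Brazil",  "tweet_count": 132},
    {"country": "Germany", "tweet_count": 98},
    {"country": "France",  "tweet_count": 75},
]

buf = io.StringIO()  # swap in open("tweets.csv", "w", newline="") for a real file
writer = csv.DictWriter(buf, fieldnames=["country", "tweet_count"])
writer.writeheader()
writer.writerows(docs)

csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # country,tweet_count
```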
Get notified about new reviews
Write a review
Do you have experience with this course? Write a review now and help others choose the right training. As a thank-you, we will donate €1.00 to the Edukans Foundation.