Apache Spark Basics
Köln: 16 Apr 2026 to 17 Apr 2026
Online (Zoom): 16 Apr 2026 to 17 Apr 2026
Köln: 23 Jul 2026 to 24 Jul 2026
Online (Zoom): 23 Jul 2026 to 24 Jul 2026
Köln: 10 Dec 2026 to 11 Dec 2026
Online (Zoom): 10 Dec 2026 to 11 Dec 2026
First-class training ✔ Guaranteed to run ✔ Trainers with hands-on industry experience ✔ Free cancellation ✔ 3=2: free attendance for the third participant ✔ Personal learning environment ✔ Small learning groups
Course Objective
The goal of the Apache Spark Basics course is to give participants a solid understanding of Apache Spark and its fundamental concepts. By the end of the course, participants will understand the challenges of big data processing and the advantages Spark offers, and they will know Spark's architecture and its components: the driver, the executors, and the cluster manager. They will learn to work with Resilient Distributed Datasets (RDDs) and to apply transformations and actions to them. In addition, they will be introduced to Spark Streaming for real-time data processing and to integrating Spark with other technologies such as Flume, Kafka, and Cassandra. Through hands-on exercises in PySpark, participants build practical skills and the confidence to use Apache Spark effectively for big data processing and analytics tasks.

Contents
- Introduction to Apache Spark with Python (PySpark)
  - Overview of big data processing challenges
  - Introduction to distributed computing and parallel processing
  - Introduction to Spark's architecture and components (driver, executor, cluster manager)
  - Comparison with traditional batch processing frameworks (Hadoop MapReduce)
  - Setting up Spark with the Python shell (a minimal local-mode sketch follows this outline)
- Spark Fundamentals with PySpark
  - Understanding Resilient Distributed Datasets (RDDs)
  - RDD characteristics (immutable, partitioned, resilient)
  - RDD operations: transformations (map, filter, flatMap, etc.) and actions (count, collect, reduce, etc.)
  - Lazy evaluation and lineage in Spark (see the RDD sketch after this outline)
  - Hands-on exercises using PySpark
- Spark Streaming
  - Introduction to Spark Streaming
  - Streaming data processing concepts
  - DStream (Discretized Stream) operations in Spark Streaming
  - Windowed operations
  - Stateful processing using updateStateByKey() (see the streaming sketch after this outline)
  - Handling data sources (Flume, Kafka) and sinks (HDFS, Cassandra) in Spark Streaming
  - Hands-on exercises with Spark Streaming
- Integration with Flume, Kafka, and Cassandra
  - Introduction to Apache Flume and its integration with Spark
  - Overview of Flume's event-based data ingestion
  - Setting up Flume agents and Spark integration
  - Integration of Apache Kafka with Spark Streaming
  - Overview of Kafka's distributed publish-subscribe messaging system
  - Configuring Kafka and Spark integration for real-time data processing (see the Kafka sketch after this outline)
  - Introduction to Apache Cassandra and its integration with Spark
  - Overview of Cassandra's distributed NoSQL database
  - Connecting Spark to Cassandra for data storage and retrieval (see the Cassandra sketch after this outline)
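The setup item in the outline can be made concrete with a short local-mode session. This is a minimal sketch, assuming Spark and the pyspark package are installed locally; the application name and master URL are illustrative choices, not part of the course material.

```python
from pyspark.sql import SparkSession

# Start a local Spark session; local[*] uses all available CPU cores.
# (The interactive `pyspark` shell creates an equivalent `spark` object for you.)
spark = (SparkSession.builder
         .appName("spark-basics-setup")   # hypothetical application name
         .master("local[*]")
         .getOrCreate())

# The SparkContext (the driver-side entry point for RDDs) is available via the session.
sc = spark.sparkContext
print(spark.version)

spark.stop()
```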
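The RDD items above (transformations, actions, lazy evaluation, lineage) can be illustrated with a classic word count. A minimal sketch, assuming a local SparkContext; the sample sentences are made up. The transformations only build up a lineage, and nothing is computed until an action such as collect() or count() runs.

```python
from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("rdd-basics").setMaster("local[*]"))

# Create an RDD from an in-memory collection (sample data, purely illustrative).
lines = sc.parallelize([
    "spark distributes data across partitions",
    "rdd transformations are lazy",
])

# Transformations: they only record lineage, nothing runs yet.
words = lines.flatMap(lambda line: line.split())     # one word per element
pairs = words.map(lambda word: (word, 1))            # (word, 1) pairs
counts = pairs.reduceByKey(lambda a, b: a + b)       # sum counts per word

# Lineage of the final RDD (PySpark returns the description as bytes).
print(counts.toDebugString())

# Actions trigger the actual computation.
print(counts.collect())   # full result as a Python list
print(counts.count())     # number of distinct words

sc.stop()
```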
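For the windowed and stateful DStream operations listed above, the following sketch assumes text lines arriving on a local TCP socket (host and port are arbitrary examples) and a writable checkpoint directory, which updateStateByKey() requires. It counts words over a sliding window and keeps a running total across batches.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "streaming-basics")   # at least 2 threads: receiver + processing
ssc = StreamingContext(sc, 5)                        # 5-second micro-batches
ssc.checkpoint("/tmp/spark-streaming-checkpoint")    # required for stateful operations (example path)

# Source: text lines from a TCP socket (e.g. started with `nc -lk 9999`).
lines = ssc.socketTextStream("localhost", 9999)
pairs = lines.flatMap(lambda line: line.split()).map(lambda word: (word, 1))

# Windowed operation: word counts over a 30-second window, sliding every 10 seconds.
windowed_counts = pairs.reduceByKeyAndWindow(
    lambda a, b: a + b,   # add counts entering the window
    lambda a, b: a - b,   # subtract counts leaving the window
    30, 10)

# Stateful processing: running total per word across all batches.
def update_total(new_values, running_total):
    return sum(new_values) + (running_total or 0)

running_totals = pairs.updateStateByKey(update_total)

windowed_counts.pprint()
running_totals.pprint()

ssc.start()
ssc.awaitTermination()
```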
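For the Kafka integration topic: the DStream-based Kafka receivers from older Spark releases are no longer available from Python in current Spark versions, so the supported Python path today is the Structured Streaming Kafka source. A minimal sketch, assuming a broker at localhost:9092 and a topic named "events" (both made up), and that the spark-sql-kafka connector package matching your Spark version is on the classpath (e.g. via --packages).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-integration").getOrCreate()

# Subscribe to a Kafka topic; broker address and topic name are illustrative.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers key and value as binary columns; cast them to strings.
messages = raw.select(col("key").cast("string"), col("value").cast("string"))

# Write each micro-batch to the console for demonstration purposes.
query = (messages.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```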
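Connecting Spark to Cassandra is typically done through the DataStax spark-cassandra-connector. A minimal sketch, assuming the connector package is on the classpath and that a keyspace "demo" with tables "users" and "adult_users" (and an "age" column) already exist in a Cassandra cluster reachable at localhost; all of these names are made up for illustration.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cassandra-integration")
         .config("spark.cassandra.connection.host", "localhost")  # example contact point
         .getOrCreate())

# Read an existing Cassandra table into a DataFrame (keyspace/table names are illustrative).
users = (spark.read
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="demo", table="users")
         .load())
users.show()

# Write a derived DataFrame back to another Cassandra table (assumed to exist).
adults = users.filter(users.age >= 18)
(adults.write
 .format("org.apache.spark.sql.cassandra")
 .options(keyspace="demo", table="adult_users")
 .mode("append")
 .save())

spark.stop()
```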
No FAQs have been added yet. If you have questions or need assistance, please contact our customer service team. We are happy to help!
