ProHuddle, a Podcast about Enterprise IT, Software, Oracle, Databases and The Cloud
Apache Spark, the Next Generation Cluster Computing with Ivan Lozic

Update: 2017-07-19
Description

In this episode, I talk to Ivan Lozic about Apache Spark. Ivan holds a master’s degree in information technology and works at Farmeron, the maker of cloud-based dairy farm management software. As software architect, he has been in charge of the Big Data architecture and technology stack that lets the company process ever larger data sets.

Apache Spark is a general-purpose computing engine designed for large-scale data processing. It is becoming ever more popular thanks to the support of the Apache community. Well-known companies use it to process petabytes of data on clusters of 8000+ nodes, with long-running jobs measured in weeks.

In this session, Ivan talks about:
Apache Spark and how it relates to (traditional) Hadoop MapReduce technology.
What makes Spark so fast.
How to use its rich APIs to design and run your ETL jobs.
Apache Spark's streaming capabilities for near real-time updates and its role in Big Data processing scenarios.
Structured Streaming, a scalable and fault-tolerant stream processing engine that makes near real-time processing scenarios easier.

If you'd like to watch the recording of this webinar, or be notified of upcoming webinars, please register at http://www.prohuddle.com.

Now let's hear from Ivan.
