Free
Airflow Summit 2022 (Sydney, In Person)

Event Information

Location

Mantel Group

Level 21, 580 George Street

Sydney, NSW 2000

Australia

Event description
Apache Airflow users in Sydney will not want to miss the first IN PERSON Airflow Summit for Australia!

About this event

(This registration is for attending in Sydney. If you are in Melbourne, see the other event.)

Airflow Summit is the annual conference for the global community of Apache Airflow users and developers. To attend in person, you must REGISTER for a FREE TICKET, as spaces are limited.

Key Details

  • The event will be live-streamed to a global audience.
  • Our speakers will join us live from various countries and will answer questions.
  • Light refreshments will be served.
  • Most attendees will receive a free T-Shirt!

Speakers

We will be hosting a great list of international speakers during the Australian time slot.

Agenda

  • 12:30 pm: Arrive at Venue + Introductions
  • 1:00 pm: TALK 1 - Daniel Imberman (Strategy Engineer, Astronomer)
  • 2:00 pm: TALK 2 - Lior Gavish (CTO, Monte Carlo)
  • 3:00 pm: TALK 3 - Jeff Zhang (Staff Engineer, Alibaba)
  • 3:30 pm: TALK 4 - Jed Cunningham (Staff Engineer, Astronomer)
  • 3:40 pm: TALK 5 - Howie Wang (Software Engineer, Apple)
  • 4:00 pm: Local Networking + Show & Tell

About The Talks

Talk 1

Introducing Astro Flow: The next generation of DAG authoring

About the speaker:

Daniel Imberman (Strategy Engineer, Astronomer) is a member of the Apache Airflow PMC, core contributor of the KubernetesExecutor, and Strategy Engineer at Astronomer.io. He received a BS/MS from UC Santa Barbara with a focus on Distributed Systems and Machine Learning and is highly passionate about building the next generation of ML tooling.

About the talk:

Imagine if you could chain together SQL models using nothing but Python, and write functions that treat Snowflake tables like dataframes and dataframes like SQL tables. Imagine if you could write a SQL Airflow DAG using only Python, or without using any Python at all.

With Astro SDK, we at Astronomer have gone back to the drawing board around fundamental questions of what DAG writing could look like. Our goal is to empower Data Engineers, Data Scientists, and even Business Analysts to write Airflow DAGs with code that reflects the data movement instead of the system configuration. Astro will allow each group to focus on producing value in their respective fields with minimal knowledge of Airflow and a high degree of flexibility between SQL and Python-based systems.

This is way beyond just a new way of writing DAGs. This is a universal, agnostic data transfer system. Users can run the exact same code against different databases (Snowflake, BigQuery, etc.) and datastores (GCS, S3, etc.) with no changes except to the connection IDs. Users will be able to promote a SQL flow from their dev Postgres to their prod Snowflake with a single variable change.

We are ecstatic to reveal over eight months of work around building a new open-source project that will significantly improve your DAG authoring experience!

Talk 2

Keep Calm & Query On: Debugging Broken Data Pipelines with Airflow

About the speaker:

Lior Gavish (CTO, Monte Carlo) is a technology leader who loves making customers happy with software that "just works".

About the talk:

“Why is my data missing?” “Why didn’t my Airflow job run?” “What happened to this report?” If you’ve been on the receiving end of any of these questions, you’re not alone. As data pipelines become increasingly complex and companies ingest more and more data, data engineers are on the hook for troubleshooting where, why, and how data quality issues occur, and most importantly, fixing them so systems can get up and running again.

In this talk, Lior Gavish, CTO and co-founder of Monte Carlo, discusses the three primary factors that contribute to data quality issues and how data teams can leverage Airflow, dbt, and other solutions in their arsenal to conduct root cause analysis on their data pipelines.

Talk 3

Airflow & Zeppelin: Better together

About the speaker:

Jianfeng Zhang (Staff Engineer, Alibaba) has 12 years of experience in the big data industry and is an open-source veteran known for Tez, Livy, and Zeppelin. He sits on the PMC for several Apache projects, and currently works as a Staff Engineer at Alibaba, with previous roles at Hortonworks where he developed many of the popular big data tools that he has become known for.

About the talk:

Airflow is almost the de facto standard job orchestration tool for the production stage. But moving a job from the development stage in other tools to the production stage in Airflow is usually a big pain for many users. A major reason is the inconsistency between the development environment and the production environment. Apache Zeppelin is a web-based notebook that integrates seamlessly with many popular big data engines, such as Spark, Flink, Hive, and Presto, making it very suitable for the development stage.

In this talk, Jeff will talk about the seamless integration between Airflow & Zeppelin, so that you can develop your big data job in Zeppelin efficiently and move it to Airflow easily without worrying too much about issues caused by environment inconsistency.

Talk 4

Airflow / Kubernetes: Running on and using K8S

About the speaker:

Jedidiah Cunningham (Staff Engineer, Astronomer) is a member of the Apache Airflow PMC, and a core contributor to the KubernetesExecutor and Official Airflow Helm Chart.

About the talk:

Apache Airflow and Kubernetes work well together. Not only does Airflow have native support for running tasks on Kubernetes, but there is also an official Helm chart that makes it easy to run Airflow itself on Kubernetes!

Confused about the differences between KubernetesExecutor and KubernetesPodOperator? What about CeleryKubernetesExecutor? Or the new LocalKubernetesExecutor? After this talk, you will understand how they all fit in the ecosystem.

We will talk about the ways you can run Airflow on Kubernetes, run tasks on Kubernetes, or do both. We will also cover things you may want to consider doing to have a reliable Airflow instance.

Talk 5

Skip tasks to make your debugging easy

About the speaker:

Howie Wang (Software Engineer, Apple) is a Software Engineer on the Data Pipelines team. Before joining Apple, Howie worked at WePay, where he helped build infrastructure software, and contributed to open source projects like Airflow and Waltz.

About the talk:

At Apple, we are building a self-serve data platform based on Airflow. Self-serve means users can create, deploy, and run their DAGs freely. With the provided logs and metrics, users are able to test or troubleshoot DAGs on their own. Today, a common use case is that users want to test one or a few tasks in their DAG. However, when they trigger the DAG, all tasks run instead of just the ones they are interested in. To save time and resources, many users manually mark each task they want to skip as complete. Can we do better than that? Is there an easy-peasy way to skip tasks?

In this lightning talk, we would like to share the challenges we had, the solution we came up with, and the lessons we learned.
