
Airflow 3.0 DAG Authoring Certificate
Exam study guide and question examples for the Apache Airflow 3.0 DAG Authoring Certificate
This year I quit my 9-to-5 job in pursuit of building a career as a freelance analytics engineer… which turned out to be easier said than done.
Yes, there are more work opportunities now than ever, but 4 years ago you only had to compete with other developers when applying for gigs.
But today, in the age of 💩GPT, you also have to stand out from hundreds of applications coming in from AI platforms that promise to 10x your number of clients in just a month.
Why I did the DAG Authoring Certification Exam
Needing to differentiate myself from the mass of chatbots and 10x vibe coders, I discovered one thing they have in common: both vibe coders and LLMs suck at certification exams.
A good certification exam puts you in front of a real problem, testing not only your knowledge of the tool, but your critical thinking as well.
And to the question of which certification exams to take on first: the best certificates are the ones you get for free.
It just so happened that Astronomer was giving out free shots at the new Airflow 3.0 DAG authoring certificate as a reward for participating in their annual user survey…
Post Summary
Here are the topics I’ll talk about in this post:
What you’ll learn in the DAG Authoring Certificate. Even though I’ve been working with the tool intensively for 4+ years, there were some things I had no idea about. In this section, I’ll list the few I found most interesting.
Sample exam questions. Just like any good college buddy, I made sure to write down all the questions that were confusing or hard for me, so you know what to prepare for.
Study guide. Where to start? What topics to check out? In this section I’m going to give you a 3-step study plan that leads to success.
What can you learn in the Airflow DAG Authoring Certificate
Some of these are new to Airflow 3.0, some of these I just never had a chance to use. But here are some cool things you’ll learn while doing the Airflow DAG Authoring Certificate.
DAG Versioning
If you’ve ever worked with Airflow in production environments, you may know how big of a pain DAG versioning used to be.
Historically, Airflow assumed that the most recent DAG code applied to all of its runs.
Because of this, if you decided to remove a task from, or add a task to, a DAG that already had runs, your UI would get messed up.
Code changes during execution also created unpredictable behaviour.
Airflow always ran the latest version of your task code, even if you made a change in the middle of a pipeline’s execution.
To solve this, Airflow 3.0 introduces the concept of DAG bundles.
DAG bundles are collections of one or multiple DAGs and their associated files that can be sourced from various locations.
At the moment Airflow supports two types of DAG bundles:
- Local DAG bundle -> the old un-versioned system
- Git DAG bundle -> uses git for detecting version changes in the DAG code
You can even have multiple DAG bundles, allowing you to organize your DAGs into more than one Git repository.
To configure a versioned DAG bundle, add the following to the airflow.cfg file:
    [dag_processor]
    dag_bundle_config_list = [
        {
          "name": "my_git_repo",
          "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
          "kwargs": {"tracking_ref": "main", "git_conn_id": "my_git_conn"}
        },
        {
          "name": "dags-folder",
          "classpath": "airflow.dag_processing.bundles.local.LocalDagBundle",
          "kwargs": {}
        }
      ]
XCom Map and Zip functions
I knew about the native map and zip functions in Python, but I had no idea XCom had also implemented them.
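As a quick refresher, here is a minimal plain-Python sketch of the built-in map and zip that the XCom versions are named after. It does not touch Airflow at all; the example values are borrowed from this post, and the only assumption is that the XCom variants apply the same idea to task outputs at runtime.

```python
# Plain Python only: the built-ins that XCom's map and zip are named after.
folders = ["usr/folder_a", "usr/folder_b", "usr/folder_c"]

# map applies a function to every element (lazily, hence the list() call).
data_paths = list(map(lambda path: path + "/data/", folders))

paths = ["/usr/local/", "/bin/test/", "/home/me/"]
filenames = ["file_a", "file_b", "file_c"]
extensions = [".txt", ".zip", ".parquet"]

# zip groups the i-th element of each list into one tuple.
combined = list(zip(paths, filenames, extensions))
full_paths = [path + name + ext for path, name, ext in combined]

print(data_paths)
print(full_paths)
```

The difference to keep in mind for the exam is where this happens: the built-ins run immediately on real lists, while the XCom versions describe a transformation on values that only exist once the upstream task has finished.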
Map: takes a Python function as an argument and applies it to each value of an XComArg (without creating a separate Airflow task).
    from datetime import datetime

    from airflow.providers.standard.operators.python import PythonOperator
    from airflow.sdk import DAG

    # Define the DAG
    with DAG(
        dag_id="example_dag",
        start_date=datetime(2025, 1, 1),
        schedule=None,  # This DAG does not run on a schedule
    ) as dag:

        # Return a list of folder paths
        def list_folders():
            return ["usr/folder_a", "usr/folder_b", "usr/folder_c"]

        list_files = PythonOperator(
            task_id="list_files",
            python_callable=list_folders
        )

        # Append '/data/' to each folder path
        # Using .output.map() to process each path individually
        files_processed = list_files.output.map(lambda path: path + "/data/")
Zip: useful for merging the output of different tasks to pass as a result to another task.
    from datetime import datetime

    from airflow.sdk import dag, task

    @dag(start_date=datetime(2025, 1, 1), schedule=None, dag_id="download_dag")
    def download_dag():

        @task
        def get_paths():
            return ['/usr/local/', '/bin/test/', '/home/me/']

        @task
        def get_filenames():
            return ['file_a', 'file_b', 'file_c']

        @task
        def get_extensions():
            return ['.txt', '.zip', '.parquet']

        @task
        def download_files(paths, filenames, extensions):
            # Combine path, filename, and extension
            # (this uses Python's built-in zip inside the task; the XCom-level
            # version is the .zip() method on a task's output)
            for path, filename, ext in zip(paths, filenames, extensions):
                full_path = f"{path}{filename}{ext}"
                print(f"Downloading {full_path}...")
            # Expected output:
            # Downloading /usr/local/file_a.txt...
            # Downloading /bin/test/file_b.zip...
            # Downloading /home/me/file_c.parquet...

        # Wire the three upstream tasks into the download task
        download_files(get_paths(), get_filenames(), get_extensions())

    download_dag()
Parametrized task groups
Not only can TaskGroups be parametrized in Airflow, but you can also:
- Call the parameterized task group function with different arguments to create multiple instances of the same task group, which greatly improves the reusability of DAG code.
- Import task groups from external modules, which means less DAG code in a single file.
- Dynamically map a task group to create multiple instances based on runtime data.
    from airflow.sdk import task, task_group

    @task_group(group_id="file_processor")
    def process_file(file_path: str):

        @task
        def extract_data(path: str):
            return f"data_from_{path}"

        @task
        def transform_data(data: str):
            return f"transformed_{data}"

        return transform_data(extract_data(file_path))

    # Create multiple instances with expand() (call this inside a DAG definition)
    file_paths = ["/data/file1.csv", "/data/file2.csv", "/data/file3.csv"]
    mapped_groups = process_file.expand(file_path=file_paths)

    # Access specific instances
    # (pull_xcom stands for a downstream task that is not shown here)
    downstream_task = pull_xcom(mapped_groups)
Triggering DAGs using Assets
Assets are a neat trick when you want to make your workflows more data driven. They allow your DAGs to trigger based on updates, rather than just time-based schedules.
Assets create a visible relationship between the DAGs and the data they process.
Each time an asset is updated by a DAG, an AssetEvent is dispatched and sent to other DAGs listening for changes.
I did know about assets before I started the exam, but what I didn’t know is that an asset can carry metadata between DAGs.
If we want to attach some metadata to this AssetEvent, we can do so by yielding it from a task.
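    from airflow.sdk import Asset, Metadata, task

    my_asset = Asset("my_asset")

    @task(outlets=[my_asset])
    def attach_metadata():
        num = 23
        # Attach extra information to the AssetEvent for this asset
        yield Metadata(my_asset, {"my_num": num})

        return "hello :)"

    attach_metadata()
Certificate question examples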
Now that you have an idea of what you can learn during the certification, I’ll go through some of the harder questions on the exam, so you know what to prepare for.
What is an Asset? (Select all that apply)
Correct answers:
- (T) An Airflow asset represents a logical grouping of data
- (T) An Airflow asset is technically a DAG
When you define an asset using the @asset taskflow decorator, Airflow actually creates a DAG with the same id in the background.
Is the task below idempotent?
    execute_query = SQLExecuteQueryOperator(
        task_id="execute_query",
        sql="SELECT * FROM my_table WHERE date = '{{ ds }}' LIMIT 1;",
        split_statements=True,
        return_last=False,
    )
Correct answer: Yes
I initially wanted to answer “no” because of the LIMIT 1 part of the query. Without an ORDER BY, a query like this can return different rows when run multiple times. But the exam ignores this; what’s important is that {{ ds }} is used in the date filter, so every re-run of the same DAG run targets the same date.
This makes the task non-deterministic, but it is idempotent.
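To make the distinction concrete, here is a small sketch using Python's built-in sqlite3 instead of Airflow. The table contents and the run_query and snapshot helpers are made up for illustration, and the ds argument stands in for the {{ ds }} template value.

```python
import sqlite3


def run_query(conn, ds):
    """Stand-in for the templated query; ds plays the role of {{ ds }}."""
    sql = "SELECT * FROM my_table WHERE date = ? LIMIT 1"
    return conn.execute(sql, (ds,)).fetchall()


def snapshot(conn):
    """Full table contents in a fixed order, used to compare state."""
    return conn.execute("SELECT id, date FROM my_table ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "2025-01-01"), (2, "2025-01-01"), (3, "2025-01-02")],
)

before = snapshot(conn)
first = run_query(conn, "2025-01-01")
second = run_query(conn, "2025-01-01")
after = snapshot(conn)

# Idempotent: re-running for the same ds leaves the table unchanged.
print(before == after)
# Every returned row belongs to the requested date; which of the matching
# rows comes back is not guaranteed without an ORDER BY.
print(all(row[1] == "2025-01-01" for row in first + second))
```

Two rows match the date filter here, so the database is free to return either one. That is the non-deterministic part; the state of the system after each run is the same, which is the idempotent part.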
What does print_user output to the standard output?
    @dag
    def my_dag():

        @task
        def get_user() -> Dict[str, Any]:
            return {
                'id': 1,
                'name': 'John',
                'city': 'New York'
            }

        @task_group
        def get_user_info(my_user: Dict[str, Any]):

            @task
            def get_city() -> str:
                return my_user['city']

            @task
            def get_name() -> str:
                return my_user['name']

            @task
            def print_user(name: str, city: str):
                print(f"Name: {name}, City: {city}")

            print_user(get_name(), get_city())

        user = get_user()
        get_user_info(user)

    my_dag()
Correct answer: Nothing, the DAG has an error
The two tasks, get_city and get_name, cannot directly access my_user from the enclosing task group’s scope. It has to be passed to them explicitly, for example by giving each task a my_user parameter and calling get_city(my_user).
How many XComs does this task produce?
    @dag
    def my_dag():

        @task(multiple_outputs=True)
        def my_task():
            return {
                'id': 1,
                'name': 'John'
            }

        my_task()

    my_dag()
Correct answer: 3
It is a little bit confusing, but the 3 XComs the task produces are the following:
- id -> 1
- name -> "John"
- return_value -> {"id": 1, "name": "John"}
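The counting rule can be modeled in a few lines of plain Python. This is only a sketch of the behaviour described above, not Airflow's actual implementation, and xcoms_pushed is a hypothetical helper name.

```python
def xcoms_pushed(return_value, multiple_outputs=False):
    """Model which XCom keys a task's return value ends up under."""
    # The full return value is always stored under "return_value".
    xcoms = {"return_value": return_value}
    if multiple_outputs and isinstance(return_value, dict):
        # Each top-level key is stored as its own XCom as well.
        xcoms.update(return_value)
    return xcoms


result = xcoms_pushed({"id": 1, "name": "John"}, multiple_outputs=True)
for key, value in result.items():
    print(key, "->", value)
print(len(result))  # 3
```

With multiple_outputs left at False, the same model yields a single XCom, which is the case most people have in mind when they first read the question.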
Which XCom method should you use to retrieve results from a specific mapped task instance?
    @task
    def get_result_from_mapped_task(ti):
        # How to get result from the 2nd mapped task instance?
        pass
Correct answer: ti.xcom_pull(key='return_value', map_indexes=1)
I have no comments for this one. I had no idea about the map_indexes parameter. Note that map indexes start at 0, so the 2nd mapped task instance has index 1.
Does this DAG cause a parsing error?
    @dag
    def my_dag():

        @task
        def task_a():
            print("Hello from task A!")

        @task
        def task_b():
            print("Hello from task B!")

        @task
        def task_c():
            print("Hello from task C!")

        task_a >> task_b >> task_c

    my_dag()
Correct answer: Yes
Tasks defined using the TaskFlow decorator have to be invoked before they can be chained. The correct chaining would look like this: task_a() >> task_b() >> task_c()
What task ID will choose_branch return?
    @task.branch
    def choose_branch(result):
        if result > 0.5 and result < 1:
            return ['task_a', 'task_b']
        elif > 1:
            return ['task_c']

    choose_branch(0)
Correct answer: This code raises an error
This question is very misleading. elif > 1 is not valid Python syntax. It should be written: elif result > 1
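For comparison, here is the same logic with the syntax fixed, written as a plain Python function without the @task.branch decorator so it can be run anywhere. It is a sketch of what the question probably intended, not the exam's official solution.

```python
def choose_branch(result):
    """Corrected branching logic, without the @task.branch decorator."""
    if result > 0.5 and result < 1:
        return ["task_a", "task_b"]
    elif result > 1:
        return ["task_c"]
    # No condition matched: the function implicitly returns None.


print(choose_branch(0))    # None
print(choose_branch(0.7))  # ['task_a', 'task_b']
print(choose_branch(2))    # ['task_c']
```

Even in the corrected version, the input 0 from the question matches neither condition, so the function returns None. As far as I know, Airflow treats a None return from a branch task as "skip everything downstream", but treat that as something to verify in the docs before the exam.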
Study guide
Before you start the certification, here are a couple of tips:
The exam is not hard per se, but it does tackle more advanced topics like dynamic tasks, data-aware mapping and XCom management, and there are some confusing questions.
You’ll have enough time for everything. 60 minutes for 75 multiple choice questions is more than enough. The passing score is quite generous. You need to answer only 53 of the 75 questions correctly to get certified.
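The numbers above translate into a pass mark and a time budget per question. The arithmetic is simple enough to check in a few lines of Python, using only the figures quoted in this post:

```python
total_questions = 75
questions_to_pass = 53
exam_minutes = 60

pass_mark = questions_to_pass / total_questions
seconds_per_question = exam_minutes * 60 / total_questions
allowed_mistakes = total_questions - questions_to_pass

print(f"Pass mark: {pass_mark:.1%}")                       # Pass mark: 70.7%
print(f"Time per question: {seconds_per_question:.0f}s")   # Time per question: 48s
print(f"Wrong answers allowed: {allowed_mistakes}")        # Wrong answers allowed: 22
```

So you have a little under a minute per question and can afford to get 22 of them wrong.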
Step #1 - finish the Fundamentals certificate first
Before you jump into the DAG Authoring certificate, do the Apache Airflow 3 Fundamentals first.
This certification handles the basic concepts of Airflow, and while it doesn’t cover a ton of topics, it presents a lot of challenging questions that repeat in the DAG Authoring certificate.
This certificate should be passable with just 1-2 years of hands-on Airflow experience… or if you just hit the books hard enough.
Step #2 - go through the Astronomer learning paths
The Astronomer Academy is a free source of educational materials on the topic of Apache Airflow and some more general data engineering concepts.
It has a ton of video materials, cheat sheets and quizzes that will prepare you for the certification.
Marc Lamberti is great at breaking down complex Airflow concepts and creating video tutorials that don’t take up much of your time.
Step #3 - apply the things you learn
Getting hands-on is always the best way to gain a strong grasp of the topics you learn through the documentation and video materials. If you’re struggling to grasp some concepts like data assets or XCom, set up a local Airflow instance and start coding.