You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: examples/airflow/airflow.md
+24-26Lines changed: 24 additions & 26 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,14 +1,14 @@
1
-
<!-- SPDX-License-Identifier: Apache-2.0 -->
1
+
# Getting Started with Airflow and OpenLineage+Marquez
2
2
3
-
# [Airflow](https://airflow.apache.org) Example
3
+
> **Note:** For a modified version of this guide that uses [Astro](https://www.astronomer.io/try-astro/?referral=docs-what-astro-banner) instead of vanilla Airflow, visit see the OpenLineage [docs](https://openlineage.io/docs/guides/airflow-quickstart).
4
4
5
-
In this example, we'll walk you through how to enable Airflow DAGs to send lineage metadata to Marquez using [OpenLineage](https://openlineage.io/). The example will help demonstrate some of the features of Marquez.
5
+
In this example, we'll walk you through how to enable Airflow DAGs to send lineage metadata to [Marquez](https://marquezproject.ai/) using OpenLineage.
6
6
7
-
### What you’ll learn:
7
+
### You’ll learn how to:
8
8
9
-
*Enable OpenLineage in Airflow
10
-
*Write your very first OpenLineageenabled DAG
11
-
*Troubleshoot a failing DAG using Marquez
9
+
*enable OpenLineage in Airflow
10
+
*write your very first OpenLineage-enabled DAG
11
+
*troubleshoot a failing DAG using Marquez
12
12
13
13
# Prerequisites
14
14
@@ -24,7 +24,7 @@ Before you begin, make sure you have installed:
24
24
First, if you haven't already, clone the Marquez repository and change into the [`examples/airflow`](https://github.com/MarquezProject/marquez/tree/main/examples/airflow) directory:
To make sure the latest [`openlineage-airflow`](https://pypi.org/project/openlineage-airflow) library is downloaded and installed when starting Airflow, you'll need to create a `requirements.txt` file with the following content:
@@ -58,17 +58,17 @@ Your `examples/airflow/` directory should now contain the following:
58
58
59
59
# Step 2: Write Airflow DAGs using OpenLineage
60
60
61
-
In this step, we'll create two new Airflow DAGs that perform simple tasks. The `counter` DAG will generate a random number every minute, while the `sum` DAG calculates a sum every five minutes. This will result in a simple pipeline containing two jobs and two datasets.
61
+
In this step, we'll create two new Airflow DAGs that perform simple tasks. The `counter` DAG generates a random number every minute, while the `sum` DAG calculates a sum every five minutes. This will result in a simple pipeline containing two jobs and two datasets.
62
62
63
63
First, let's create the `dags/` folder where our example DAGs will be located:
64
64
65
65
```bash
66
66
$ mkdir dags
67
67
```
68
68
69
-
When writing our DAGs, we'll use [`openlineage-airflow`](https://pypi.org/project/openlineage-airflow), enabling OpenLineage to observe the DAG and automatically collect task-level metadata. If you're using Airflow 2.3+ no further changes to your dag code, or configuration are needed. If you're using older version of Airflow, please look [here](https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/README.md#setup) to understand how to configure Airflow integration.
69
+
When writing our DAGs, we'll use [`openlineage-airflow`](https://pypi.org/project/openlineage-airflow), enabling OpenLineage to observe the DAG and automatically collect task-level metadata. If you're using Airflow 2.3+ no further changes to your DAG code or configuration are needed. If you're using an older version of Airflow, please read [this](https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/README.md#setup) to understand how to configure the Airflow integration.
70
70
71
-
# Step 2.1: Create DAG `counter`
71
+
##Step 2.1: Create `counter` DAG
72
72
73
73
Under `dags/`, create a file named `counter.py` and add the following code:
74
74
@@ -124,9 +124,9 @@ t2 = PostgresOperator(
124
124
t1 >> t2
125
125
```
126
126
127
-
# Step 2.2: Create DAG `sum`
127
+
##Step 2.2: Create `sum` DAG
128
128
129
-
Under`dags/`, create a file named `sum.py` and add the following code:
129
+
In`dags/`, create a file named `sum.py` and add the following code:
130
130
131
131
```python
132
132
from airflow importDAG
@@ -175,7 +175,7 @@ t2 = PostgresOperator(
175
175
t1 >> t2
176
176
```
177
177
178
-
At this point, you should have the following under your `examples/airflow/` directory:
178
+
At this point, your `examples/airflow/` directory should look like this:
179
179
180
180
```
181
181
.
@@ -202,11 +202,11 @@ $ docker-compose up
202
202
203
203
**The above command will:**
204
204
205
-
*Start Airflow and install `openlineage-airflow`
206
-
*Start Marquez
207
-
*Start Postgres
205
+
*start Airflow and install `openlineage-airflow`
206
+
*start Marquez
207
+
*start Postgres
208
208
209
-
To view the Airflow UI and verify it's running, open [http://localhost:8080](http://localhost:8080). Then, login using the username and password:`airflow` / `airflow`. You can also browse to [http://localhost:3000](http://localhost:3000) to view the Marquez UI.
209
+
To view the Airflow UI and verify it's running, open [http://localhost:8080](http://localhost:8080). Then, log in using the username and password `airflow` / `airflow`. You can also browse to [http://localhost:3000](http://localhost:3000) to view the Marquez UI.
210
210
211
211
# Step 4: View Collected Metadata
212
212
@@ -218,11 +218,13 @@ To view DAG metadata collected by Marquez from Airflow, browse to the Marquez UI
218
218
219
219
> **Note:** If the `counter.inc` job is not in the drop-down list, check to see if Airflow has successfully executed the DAG.
If you take a quick look at the lineage graph for `counter.inc`, you should see `public.counts` as an output dataset and `sum.total` as a downstream job!
225
+
If you take a quick look at the lineage graph for `counter.if_not_exists`, you should see `example.public.counts` as an output dataset and `sum.total` as a downstream job!
224
226
225
-

227
+

226
228
227
229
# Step 5: Troubleshoot a Failing DAG with Marquez
228
230
@@ -307,8 +309,4 @@ _Congrats_! You successfully step through a troubleshooting scenario of a failin
307
309
308
310
# Feedback
309
311
310
-
What did you think of this example? You can reach out to us on [slack](http://bit.ly/MarquezSlack) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions!
311
-
312
-
----
313
-
SPDX-License-Identifier: Apache-2.0
314
-
Copyright 2018-2023 contributors to the Marquez project.
312
+
What did you think of this example? You can reach out to us on [slack](http://bit.ly/MarquezSlack) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions!
0 commit comments