Commit 23de0de

Annotate Airflow example concerning alternate version in docs (#2526)
* update airflow example concerning new astro version and update ui images

Signed-off-by: Michael Robinson <merobi@gmail.com>
1 parent 2128485 commit 23de0de

4 files changed

Lines changed: 24 additions & 26 deletions

@@ -1,14 +1,14 @@
1-
<!-- SPDX-License-Identifier: Apache-2.0 -->
1+
# Getting Started with Airflow and OpenLineage+Marquez
22

3-
# [Airflow](https://airflow.apache.org) Example
3+
> **Note:** For a modified version of this guide that uses [Astro](https://www.astronomer.io/try-astro/?referral=docs-what-astro-banner) instead of vanilla Airflow, see the OpenLineage [docs](https://openlineage.io/docs/guides/airflow-quickstart).
44
5-
In this example, we'll walk you through how to enable Airflow DAGs to send lineage metadata to Marquez using [OpenLineage](https://openlineage.io/). The example will help demonstrate some of the features of Marquez.
5+
In this example, we'll walk you through how to enable Airflow DAGs to send lineage metadata to [Marquez](https://marquezproject.ai/) using OpenLineage.
66

7-
### What you’ll learn:
7+
### You’ll learn how to:
88

9-
* Enable OpenLineage in Airflow
10-
* Write your very first OpenLineage enabled DAG
11-
* Troubleshoot a failing DAG using Marquez
9+
* enable OpenLineage in Airflow
10+
* write your very first OpenLineage-enabled DAG
11+
* troubleshoot a failing DAG using Marquez
1212

1313
# Prerequisites
1414

@@ -24,7 +24,7 @@ Before you begin, make sure you have installed:
2424
First, if you haven't already, clone the Marquez repository and change into the [`examples/airflow`](https://github.com/MarquezProject/marquez/tree/main/examples/airflow) directory:
2525

2626
```bash
27-
git clone https://github.com/MarquezProject/marquez && cd examples/airflow
27+
git clone https://github.com/MarquezProject/marquez && cd marquez/examples/airflow
2828
```
2929

3030
To make sure the latest [`openlineage-airflow`](https://pypi.org/project/openlineage-airflow) library is downloaded and installed when starting Airflow, you'll need to create a `requirements.txt` file with the following content:
@@ -58,17 +58,17 @@ Your `examples/airflow/` directory should now contain the following:
5858

5959
# Step 2: Write Airflow DAGs using OpenLineage
6060

61-
In this step, we'll create two new Airflow DAGs that perform simple tasks. The `counter` DAG will generate a random number every minute, while the `sum` DAG calculates a sum every five minutes. This will result in a simple pipeline containing two jobs and two datasets.
61+
In this step, we'll create two new Airflow DAGs that perform simple tasks. The `counter` DAG generates a random number every minute, while the `sum` DAG calculates a sum every five minutes. This will result in a simple pipeline containing two jobs and two datasets.
6262

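The two-job, two-dataset shape described above can be tried without Airflow at all. The following is only a rough sketch, not the DAG code from this example: it uses an in-memory SQLite database as a stand-in for Postgres, a plain loop as a stand-in for the scheduler, and assumes the second dataset is a table named `sums` (a hypothetical name here; `counts` appears later in this guide).

```python
import random
import sqlite3


def run_counter(conn, rng):
    """Stand-in for the `counter` DAG: ensure `counts` exists, then add one random value."""
    conn.execute("CREATE TABLE IF NOT EXISTS counts (value INTEGER)")
    conn.execute("INSERT INTO counts (value) VALUES (?)", (rng.randint(1, 10),))


def run_sum(conn):
    """Stand-in for the `sum` DAG: ensure `sums` exists, then record the running total."""
    conn.execute("CREATE TABLE IF NOT EXISTS sums (value INTEGER)")
    conn.execute("INSERT INTO sums (value) SELECT SUM(value) FROM counts")


conn = sqlite3.connect(":memory:")
rng = random.Random(0)  # seeded so repeated runs give the same values

# Simulate ten "minutes": counter runs every minute, sum every fifth minute.
for minute in range(1, 11):
    run_counter(conn, rng)
    if minute % 5 == 0:
        run_sum(conn)

counts = [row[0] for row in conn.execute("SELECT value FROM counts ORDER BY rowid")]
sums = [row[0] for row in conn.execute("SELECT value FROM sums ORDER BY rowid")]
print("counts:", counts)
print("sums:", sums)
```

Each function reads or writes one table, which is why the pipeline has two jobs (`counter`, `sum`) and two datasets (`counts`, `sums`), with `counts` linking the jobs.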
6363
First, let's create the `dags/` folder where our example DAGs will be located:
6464

6565
```bash
6666
$ mkdir dags
6767
```
6868

69-
When writing our DAGs, we'll use [`openlineage-airflow`](https://pypi.org/project/openlineage-airflow), enabling OpenLineage to observe the DAG and automatically collect task-level metadata. If you're using Airflow 2.3+ no further changes to your dag code, or configuration are needed. If you're using older version of Airflow, please look [here](https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/README.md#setup) to understand how to configure Airflow integration.
69+
When writing our DAGs, we'll use [`openlineage-airflow`](https://pypi.org/project/openlineage-airflow), enabling OpenLineage to observe the DAG and automatically collect task-level metadata. If you're using Airflow 2.3+, no further changes to your DAG code or configuration are needed. If you're using an older version of Airflow, see the [setup instructions](https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/README.md#setup) to learn how to configure the Airflow integration.
7070

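Although the DAG code itself needs no changes, the integration still has to know where to send events. A minimal pre-flight check might look like the sketch below. It assumes the integration is configured through the `OPENLINEAGE_URL` and `OPENLINEAGE_NAMESPACE` environment variables; the helper name `check_openlineage_env` is hypothetical, and the values in the usage note are illustrative rather than taken from this example's Docker setup.

```python
import os
from urllib.parse import urlparse


def check_openlineage_env(env):
    """Return a list of human-readable problems with the OpenLineage settings in `env`."""
    problems = []

    url = env.get("OPENLINEAGE_URL", "")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"OPENLINEAGE_URL should be an http(s) URL, got {url!r}")

    if not env.get("OPENLINEAGE_NAMESPACE"):
        # Not fatal: without it, events are typically filed under a default namespace.
        problems.append("OPENLINEAGE_NAMESPACE is not set")

    return problems


if __name__ == "__main__":
    for problem in check_openlineage_env(os.environ):
        print("warning:", problem)
```

For instance, `check_openlineage_env({"OPENLINEAGE_URL": "http://localhost:5000", "OPENLINEAGE_NAMESPACE": "example"})` returns an empty list, while an empty environment yields two warnings.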
71-
# Step 2.1: Create DAG `counter`
71+
## Step 2.1: Create `counter` DAG
7272

7373
Under `dags/`, create a file named `counter.py` and add the following code:
7474

@@ -124,9 +124,9 @@ t2 = PostgresOperator(
124124
t1 >> t2
125125
```
126126

127-
# Step 2.2: Create DAG `sum`
127+
## Step 2.2: Create `sum` DAG
128128

129-
Under `dags/`, create a file named `sum.py` and add the following code:
129+
In `dags/`, create a file named `sum.py` and add the following code:
130130

131131
```python
132132
from airflow import DAG
@@ -175,7 +175,7 @@ t2 = PostgresOperator(
175175
t1 >> t2
176176
```
177177

178-
At this point, you should have the following under your `examples/airflow/` directory:
178+
At this point, your `examples/airflow/` directory should look like this:
179179

180180
```
181181
.
@@ -202,11 +202,11 @@ $ docker-compose up
202202
203203
**The above command will:**
204204

205-
* Start Airflow and install `openlineage-airflow`
206-
* Start Marquez
207-
* Start Postgres
205+
* start Airflow and install `openlineage-airflow`
206+
* start Marquez
207+
* start Postgres
208208

209-
To view the Airflow UI and verify it's running, open [http://localhost:8080](http://localhost:8080). Then, login using the username and password: `airflow` / `airflow`. You can also browse to [http://localhost:3000](http://localhost:3000) to view the Marquez UI.
209+
To view the Airflow UI and verify it's running, open [http://localhost:8080](http://localhost:8080). Then, log in using the username and password `airflow` / `airflow`. You can also browse to [http://localhost:3000](http://localhost:3000) to view the Marquez UI.
210210

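If you'd rather confirm from a script that both UIs are reachable, a small check along these lines may help. It is a sketch using only the Python standard library; the two URLs come from the paragraph above, while the helper name `is_up` and the timeout value are arbitrary choices.

```python
import urllib.error
import urllib.request

# URLs as given in this guide.
SERVICES = {
    "Airflow UI": "http://localhost:8080",
    "Marquez UI": "http://localhost:3000",
}


def is_up(url, timeout=3.0):
    """Return True if `url` answers with any non-5xx HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server responded; treat client errors (e.g. 401/404) as "up".
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name} ({url}): {'up' if is_up(url) else 'not reachable'}")
```

Containers can take a little while to become ready after `docker-compose up`, so a "not reachable" result shortly after startup is not necessarily an error.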
211211
# Step 4: View Collected Metadata
212212

@@ -218,11 +218,13 @@ To view DAG metadata collected by Marquez from Airflow, browse to the Marquez UI
218218

219219
> **Note:** If the `counter.inc` job is not in the drop-down list, check to see if Airflow has successfully executed the DAG.
220220
221-
![](./docs/search.png)
221+
<p align="center">
222+
<img src={require("./docs/current-search-count.png").default} />
223+
</p>
222224

223-
If you take a quick look at the lineage graph for `counter.inc`, you should see `public.counts` as an output dataset and `sum.total` as a downstream job!
225+
If you take a quick look at the lineage graph for `counter.if_not_exists`, you should see `example.public.counts` as an output dataset and `sum.total` as a downstream job!
224226

225-
![](./docs/lineage-view-job.png)
227+
![](./docs/current-lineage-view-job.png)
226228

227229
# Step 5: Troubleshoot a Failing DAG with Marquez
228230

@@ -307,8 +309,4 @@ _Congrats_! You successfully step through a troubleshooting scenario of a failin
307309

308310
# Feedback
309311

310-
What did you think of this example? You can reach out to us on [slack](http://bit.ly/MarquezSlack) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions!
311-
312-
----
313-
SPDX-License-Identifier: Apache-2.0
314-
Copyright 2018-2023 contributors to the Marquez project.
312+
What did you think of this example? You can reach out to us on [slack](http://bit.ly/MarquezSlack) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions!