Commit 6c15266

Merge branch 'MarquezProject:main' into feature/lineage_event_created_at_indexed
2 parents 0216fad + 68fcb96 commit 6c15266

131 files changed

Lines changed: 3481 additions & 1083 deletions

.circleci/get-jdk17.sh

Lines changed: 1 addition & 3 deletions
@@ -14,11 +14,9 @@
 #
 # Usage: $ ./get-jdk17.sh
 
-set -e
-
 wget -qO - https://adoptium.jfrog.io/adoptium/api/gpg/key/public | sudo apt-key add -
 sudo add-apt-repository --yes https://adoptium.jfrog.io/adoptium/deb
-sudo apt-get update && sudo apt-get install temurin-17-jdk
+sudo apt-get update --allow-releaseinfo-change && sudo apt-get install --yes temurin-17-jdk
 sudo update-alternatives --set java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java
 sudo update-alternatives --set javac /usr/lib/jvm/temurin-17-jdk-amd64/bin/javac
 java -version

.env.example

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 API_PORT=5000
 API_ADMIN_PORT=5001
 WEB_PORT=3000
-TAG=0.27.0
+TAG=0.28.0

.github/workflows/headerchecker.yml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ jobs:
 - name: Check for headers
 run: |
 ok=1
-readarray -t files <<<"$(jq -r '.[]' <<<'${{ steps.files.outputs.all }}')"
+readarray -t files <<<"$(jq -r '.[]' <<<'${{ steps.files.outputs.added_modified }}')"
 for file in ${files[@]}; do
 if [[ ($file == *".java") ]]; then
 if ! grep -q Copyright "$file"; then

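The `Check for headers` step in the hunk above can be approximated outside of CI. What follows is a minimal Python sketch, not the workflow's actual implementation: the function name `files_missing_header` and its `marker` parameter are hypothetical, and it assumes, as the visible part of the shell snippet suggests, that only `.java` files are checked and that a file passes when the word `Copyright` appears anywhere in it.

```python
from pathlib import Path
from typing import Iterable, List


def files_missing_header(paths: Iterable[str], marker: str = "Copyright") -> List[str]:
    """Return the .java paths whose contents do not contain `marker`.

    Hypothetical helper that mirrors `grep -q Copyright "$file"` from the workflow.
    """
    missing = []
    for path in paths:
        if not path.endswith(".java"):
            continue  # the visible part of the hunk only inspects .java files
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        if marker not in text:
            missing.append(path)
    return missing
```

Passing only added or modified paths, as the updated workflow does, means that deleted files are never opened.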
.github/workflows/test-chart.yaml

Lines changed: 3 additions & 3 deletions
@@ -16,15 +16,15 @@ jobs:
 fetch-depth: 0
 
 - name: Setup Helm
-uses: azure/setup-helm@v2.2
+uses: azure/setup-helm@v3.4
 
 - name: Setup Python
-uses: actions/setup-python@v3
+uses: actions/setup-python@v4
 with:
 python-version: 3.7
 
 - name: Setup chart-testing
-uses: helm/chart-testing-action@v2.3.0
+uses: helm/chart-testing-action@v2.3.1
 
 - name: Run chart-testing (list-changed)
 id: list-changed

CHANGELOG.md

Lines changed: 42 additions & 4 deletions
@@ -1,6 +1,44 @@
 # Changelog
 
-## [Unreleased](https://github.com/MarquezProject/marquez/compare/0.27.0...HEAD)
+## [Unreleased](https://github.com/MarquezProject/marquez/compare/0.28.0...HEAD)
+
+### Added
+
+* Column-lineage endpoints supports point-in-time requests [`#2265`](https://github.com/MarquezProject/marquez/pull/2265) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+*Enable requesting `column-lineage` endpoint by a dataset version, job version or dataset field of a specific dataset version.*
+
+### Fixed
+
+* Allow null column type in column-lineage [`#2272`](https://github.com/MarquezProject/marquez/pull/2272) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+* Include error message for JSON processing exception [`#2271`](https://github.com/MarquezProject/marquez/pull/2271) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+*In case of JSON processing exceptions Marquez API should return exception message to a client.*
+* Fix column lineage when multiple jobs write to same dataset [`#2289`](https://github.com/MarquezProject/marquez/pull/2289) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+*The fix deprecates the way fields `transformationDescription` and `transformationType` are returned. The depracated way of returning those fields will be removed in 0.30.0.*
+
+## [0.28.0](https://github.com/MarquezProject/marquez/compare/0.27.0...0.28.0) - 2022-11-21
+
+### Added
+
+* Optimize current runs query for lineage API [`#2211`](https://github.com/MarquezProject/marquez/pull/2211) [@prachim-collab](https://github.com/prachim-collab)
+*Add a simpler, alternate `getCurrentRuns` query that gets only simple runs from the database without the additional data from tables such as `run_args`, `job_context`, `facets`, etc., which required extra table joins.*
+* Add Code Quality, DCO and Governance docs to project [`#2237`](https://github.com/MarquezProject/marquez/pull/2237) [`#2241`](https://github.com/MarquezProject/marquez/pull/2241) [@merobi-hub](https://github.com/MarquezProject/marquez/commits?author=merobi-hub)
+*Adds a number of standard governance and procedure docs to the project.*
+* Add possibility to soft-delete namespaces [`#2244`](https://github.com/MarquezProject/marquez/pull/2244) [@mobuchowski](https://github.com/mobuchowski)
+*Adds the ability to "hide" inactive namespaces. The namespaces are undeleted when a relevant OL event is received.*
+* Add search service proposal [`#2203`](https://github.com/MarquezProject/marquez/pull/2203) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+*Proposes using ElasticSearch as a pluggable search service to enhance the search feature in Marquez and adding the ability to turn it off, as well. Includes ideas about what should be indexed and the requirements for the interface.*
+
+### Fixed
+
+* Show facets even when dataset has no fields [`#2214`](https://github.com/MarquezProject/marquez/pull/2214) [@JDarDagran](https://github.com/JDarDagran)
+*Changes the logic in the `DatasetInfo` component to always show facets so that dataset facets are visible in the UI even if no dataset fields have been set.*
+* Appreciate column prefix when given for `ended_at` [`#2231`](https://github.com/MarquezProject/marquez/pull/2231) [@fm100](https://github.com/fm100)
+*The `ended_at` column was always null when querying if `columnPrefix` was given for the mapper. Now, `columnPrefix` is included when checking for column existence.*
+* Fix bug keeping jobs from being properly deleted [`#2244`](https://github.com/MarquezProject/marquez/pull/2244) [@mobuchowski](https://github.com/mobuchowski)
+*It wasn't possible to delete jobs created from events that had a `ParentRunFacet`. Now it's possible.*
+* Fix symlink table column length ['#2217'](https://github.com/MarquezProject/marquez/pull/2217) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+*The dataset's name column in the `dataset_symlinks` table was shorter than the column in the datasets table. Changes the existing V48 migration script to allow proper migration for users who did not upgrade yet, and adds an extra migration script to extend the column length for users who did upgrade but did not experience the issues.*
+
 
 ## [0.27.0](https://github.com/MarquezProject/marquez/compare/0.26.0...0.27.0) - 2022-10-24
 
@@ -42,11 +80,11 @@
 
 * Add support for `parentRun` facet as reported by older Airflow OpenLineage versions [`#2130`](https://github.com/MarquezProject/marquez/pull/2130) [@collado-mike](https://github.com/collado-mike)
 *Adds a `parentRun` alias to the `LineageEvent` `RunFacet`.*
-* Add fix and tests for handling Airflow DAGs with dots and task groups [`2126`](https://github.com/MarquezProject/marquez/pull/2126) [@collado-mike](https://github.com/collado-mike) [@wslulciuc](https://github.com/wslulciuc)
+* Add fix and tests for handling Airflow DAGs with dots and task groups [`#2126`](https://github.com/MarquezProject/marquez/pull/2126) [@collado-mike](https://github.com/collado-mike) [@wslulciuc](https://github.com/wslulciuc)
 *Fixes a recent change that broke how Marquez handles DAGs with dots and tasks within task groups and adds test cases to validate.*
-* Fix version bump in `docker/up.sh` [`2129`](https://github.com/MarquezProject/marquez/pull/2129) [@wslulciuc](https://github.com/wslulciuc)
+* Fix version bump in `docker/up.sh` [`#2129`](https://github.com/MarquezProject/marquez/pull/2129) [@wslulciuc](https://github.com/wslulciuc)
 *Defines a `VERSION` variable to bump on a release.*
-* Use `clean` when running `shadowJar` in Dockerfile [`2145`](https://github.com/MarquezProject/marquez/pull/2145) [@wslulciuc](https://github.com/wslulciuc)
+* Use `clean` when running `shadowJar` in Dockerfile [`#2145`](https://github.com/MarquezProject/marquez/pull/2145) [@wslulciuc](https://github.com/wslulciuc)
 *Ensures the directory `api/build/libs/` is cleaned before building the JAR again and updates `.dockerignore` to ignore `api/build/*`.*
 * Fix bug that caused a single run event to create multiple jobs [`#2162`](https://github.com/MarquezProject/marquez/pull/2162) [@collado-mike](https://github.com/collado-mike)
 *Checks to see if a run with the given ID already exists and uses the pre-associated job if so.*

CODE_QUALITY_AND_SECURITY.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Code Quality and Security Assurance Statement
+
+The authors of Marquez are committed to providing secure software of the highest quality possible. To this end, we employ a number of tools and methodologies to ensure that our design, build, maintenance and testing practices maximize efficiency and minimize risk.
+
+The specific security and analysis methodologies that we employ include but are not limited to:
+
+## Security
+
+- Participation in the [OpenSSF Best Practices Badge Program](https://bestpractices.coreinfrastructure.org/en/projects/5106) for Free/Libre and FLOSS projects to ensure that we follow current best practices for quality and security
+- Use of [HTTPS](https://en.wikipedia.org/wiki/HTTPS) for network communication
+- Support for multiple cryptographic algorithms (through the use of HTTPS)
+- Separate storage of authentication credentials according to best practices
+- Use of secure protocols for network communication (through the use of HTTPS)
+- Up-to-date support for TLS/SSL (through the use of [OpenSSL](https://www.openssl.org/))
+- Performance of TLS certificate verification by default before sending HTTP headers with private information (through the use of OpenSSL and HTTPS)
+- Distribution of the software via cryptographically signed releases (on the [PyPI](https://pypi.org/) and [Maven](https://mvnrepository.com/) package repositories)
+- Use of [GitHub](https://github.com/) Issues for vulnerability reporting and tracking
+
+## Analysis
+
+- Use of [PMD](https://pmd.github.io/) and [Spotless](https://github.com/diffplug/spotless) for Java code linting on pull requests and builds
+- Use of [Flake8](https://flake8.pycqa.org/en/latest/) and [Pytest](https://docs.pytest.org/en/7.2.x/) for Python code linting on pull requests and builds
+- Use of GitHub Issues for bug reporting and tracking
+
+## Contact
+
+For more information about our approach to quality and security, feel free to reach out to the Marquez development team:
+
+- Slack: [Marquezproject.slack.com](http://bit.ly/MarquezSlack)
+- Twitter: [@MarquezProject](https://twitter.com/MarquezProject)

COMMITTERS.md

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
 # Marquez Committers
-The Marquez Committers are the group of people who can accept Pull Request to Marquez.
+The Marquez Committers are the group of people who can accept Pull Requests to Marquez.
 They take responsibility for guiding new pull requests into the main branch.
 
 
@@ -26,13 +26,13 @@ They take responsibility for guiding new pull requests into the main branch.
 ## Emeritus
 
 The following people are no longer working on the Marquez project.
-However they have been a committer in the past and through their
+However, they have been committers in the past and, through their
 contributions, we have a strong foundation to build on.
 
 | Name | Handle |
 | ---------------- | ----------------------------|
 
 # Becoming a Committer
 
-A Contributor may become a Committer by a majority approval of the
-existing Committers. (per the project [charter](https://wiki.lfaidata.foundation/download/attachments/18481434/Marquez%20Project%20Technical%20Charter%20Final_Adopted%2005.21.20.pdf?version=1&modificationDate=1591718661000&api=v2))
+A Contributor may become a Committer by the approval of a majority of the
+existing Committers (as per the project [charter](https://wiki.lfaidata.foundation/download/attachments/18481434/Marquez%20Project%20Technical%20Charter%20Final_Adopted%2005.21.20.pdf?version=1&modificationDate=1591718661000&api=v2)).

CONTRIBUTING.md

Lines changed: 43 additions & 43 deletions
@@ -46,7 +46,7 @@ We use [spotless](https://github.com/diffplug/spotless) to format our code. This
 $ ./gradlew spotlessApply
 ```
 
-> **Note:** To make formatting code simple, we recommend installing a [plugin](https://github.com/google/google-java-format#intellij-android-studio-and-other-jetbrains-ides) for your favorite IDE. We also us [Lombok](https://projectlombok.org). Though not required, you might want to install the [plugin](https://projectlombok.org/setup/overview) as well.
+> **Note:** To make formatting code simple, we recommend installing a [plugin](https://github.com/google/google-java-format#intellij-android-studio-and-other-jetbrains-ides) for your favorite IDE. We also use [Lombok](https://projectlombok.org). Though not required, you might want to install the [plugin](https://projectlombok.org/setup/overview), as well.
 
 # `.git/hooks`
 
@@ -94,7 +94,7 @@ act pull_request --reuse --verbose
 
 # Troubleshooting
 
-There is an issue within the _act_ tool that prevents the _kind_ cluster from being deleted after execution the action.
+There is an issue within the _act_ tool that prevents the _kind_ cluster from being deleted after execution of the action.
 When this condition exists, you will experience the error below.
 
 ```bash
@@ -122,12 +122,12 @@ $ ./gradlew publishToMavenLocal
 1. [Fork](https://github.com/MarquezProject/marquez/fork) and clone the repository
 2. Make sure all tests pass locally: `./gradlew :api:test`
 3. Create a new [branch](#branching): `git checkout -b feature/my-cool-new-feature`
-4. Make change on your cool new branch
+4. Make a change on your cool new branch
 5. Write a test for your change
-6. Make sure `.java` files are formatted: `./gradlew spotlessJavaCheck`
+6. Make sure `.java` files are formatted: `./gradlew spotlessJavaCheck`
 7. Make sure `.java` files contain a [copyright and license header](#copyright--license)
 8. Make sure to [sign you work](#sign-your-work)
-9. Push change to your fork and [submit a pull request](https://github.com/MarquezProject/marquez/compare)
+9. Push the change to your fork and [submit a pull request](https://github.com/MarquezProject/marquez/compare)
 10. Work with project maintainers to get your change reviewed and merged into the `main` branch
 11. Delete your branch
 
@@ -137,17 +137,17 @@ To ensure your pull request is accepted, follow these guidelines:
 * Do your best to have a [well-formed commit message](https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html) for your change
 * [Keep diffs small](https://kurtisnusbaum.medium.com/stacked-diffs-keeping-phabricator-diffs-small-d9964f4dcfa6) and self-contained
 * If your change fixes a bug, please [link the issue](https://help.github.com/articles/closing-issues-using-keywords) in your pull request description
-* Any changes to the API reference requires [regenerating](#api-docs) the static `openapi.html` file.
+* Any changes to the API reference require [regenerating](#api-docs) the static `openapi.html` file.
 
 > **Note:** A pull request should generally contain only one commit (use `git commit --amend` and `git push --force` or [squash](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html) existing commits into one).
 
 # Branching
 
-* Use a _group_ at the beginning of your branch names
+* Use a _group_ at the beginning of your branch names:
 
 ```
 feature Add or expand a feature
-bug Fix a bug
+bug Fix a bug
 proposal Propose a change
 ```
 
@@ -156,7 +156,7 @@ To ensure your pull request is accepted, follow these guidelines:
 ```
 feature/my-cool-new-feature
 bug/my-bug-fix
-bug/my-other-bug-fix
+bug/my-other-bug-fix
 proposal/my-proposal
 ```

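The branch-naming convention shown in the two hunks above (a _group_ prefix, a slash, then a short name) could be checked mechanically. This is a minimal Python sketch under stated assumptions: `BRANCH_GROUPS` and `is_valid_branch_name` are hypothetical names that are not part of the Marquez repository, and the only rule enforced is that one of the three listed groups is followed by a slash and a non-empty name without whitespace. The guide does not specify any stricter format for the part after the slash.

```python
import re

# Groups listed in the Branching section of CONTRIBUTING.md.
BRANCH_GROUPS = ("feature", "bug", "proposal")

# Assumption: anything non-empty and free of whitespace may follow the slash;
# the guide only shows examples and does not define a stricter pattern.
_BRANCH_RE = re.compile(r"(?:%s)/\S+" % "|".join(BRANCH_GROUPS))


def is_valid_branch_name(name: str) -> bool:
    """Return True when `name` starts with a known group followed by '/<name>'."""
    return _BRANCH_RE.fullmatch(name) is not None
```

A check like this accepts the examples from the guide as well as the branch named in this merge commit, `feature/lineage_event_created_at_indexed`.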
@@ -167,18 +167,18 @@ To ensure your pull request is accepted, follow these guidelines:
 # Dependencies
 
 We use [renovate](https://github.com/renovatebot/renovate) to manage dependencies for most of our project modules,
-with a couple of exceptions. Renovate automatically detects new dependency versions, and opens pull
-requests to upgrade dependencies in accordance to the [configured rules](https://github.com/MarquezProject/marquez/blob/main/renovate.json).
+with a couple of exceptions. Renovate automatically detects new dependency versions and opens pull
+requests to upgrade dependencies in accordance with the [configured rules](https://github.com/MarquezProject/marquez/blob/main/renovate.json).
 
-The following dependencies are managed manually
+The following dependencies are managed manually:
 
 * _Web code_ - it is challenging to programmatically validate web content
 * _Spark versions_ - the internal query plans parsed by the Spark OpenLineage integration are not stable across Spark versions
 * _Gradle_ - this tool orchestrates the entire build pipeline and was excluded to ensure stability
 
 # Sign Your Work
 
-The _sign-off_ is a simple line at the end of the message for a commit. All commits needs to be signed. Your signature certifies that you wrote the patch or otherwise have the right to contribute the material (see [Developer Certificate of Origin](https://developercertificate.org)):
+The _sign-off_ is a simple line at the end of the message for a commit. All commits need to be signed. Your signature certifies that you wrote the patch or otherwise have the right to contribute the material (see [Developer Certificate of Origin](https://developercertificate.org)):
 
 ```
 This is my commit message
@@ -208,42 +208,42 @@ $ redoc-cli serve spec/openapi.yml
 
 Then browse to: http://localhost:8080
 
-> **Note:** To bundle or serve the API docs, please install [redoc-cli](https://www.npmjs.com/package/redoc-cli).
-
-# `COPYRIGHT` / `LICENSE`
-
-We use [SPDX](https://spdx.dev) for copyright and license information. The following license header **must** be included in all `java,` `bash`, and `py` source files:
-
-`java`
-
-```
-/*
-* Copyright 2018-2022 contributors to the Marquez project
-* SPDX-License-Identifier: Apache-2.0
-*/
-```
-
-`bash`
-
-```
-#!/bin/bash
-#
-# Copyright 2018-2022 contributors to the Marquez project
-# SPDX-License-Identifier: Apache-2.0
-```
-
-`py`
-
-```
-# Copyright 2018-2022 contributors to the Marquez project
-# SPDX-License-Identifier: Apache-2.0
+> **Note:** To bundle or serve the API docs, please install [redoc-cli](https://www.npmjs.com/package/redoc-cli).
+
+# `COPYRIGHT` / `LICENSE`
+
+We use [SPDX](https://spdx.dev) for copyright and license information. The following license header **must** be included in all `java,` `bash`, and `py` source files:
+
+`java`
+
+```
+/*
+* Copyright 2018-2022 contributors to the Marquez project
+* SPDX-License-Identifier: Apache-2.0
+*/
+```
+
+`bash`
+
+```
+#!/bin/bash
+#
+# Copyright 2018-2022 contributors to the Marquez project
+# SPDX-License-Identifier: Apache-2.0
+```
+
+`py`
+
+```
+# Copyright 2018-2022 contributors to the Marquez project
+# SPDX-License-Identifier: Apache-2.0
 ```
 
 # Resources
 
 * [How to Contribute to Open Source](https://opensource.guide/how-to-contribute)
 * [Using the Fork-and-Branch Git Workflow](https://blog.scottlowe.org/2015/01/27/using-fork-branch-git-workflow)
 * [Understanding the GitHub flow](https://guides.github.com/introduction/flow/)
-* [Keep a Changelog](https://keepachangelog.com)
+* [Keeping a Changelog](https://keepachangelog.com)
 * [Code Review Developer Guide](https://google.github.io/eng-practices/review)
 * [Signing Commits](https://docs.github.com/en/github/authenticating-to-github/signing-commits)

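The `COPYRIGHT` / `LICENSE` section of CONTRIBUTING.md in the diff above lists one SPDX header per file type. The following Python sketch shows how those three headers might be generated from the same two lines of text. The names `HEADER_LINES` and `license_header` are hypothetical, and the leading space before each `*` in the Java comment is an assumption based on the usual block-comment layout, since the indentation is not visible in the diff as shown.

```python
# The two header lines quoted in CONTRIBUTING.md.
HEADER_LINES = (
    "Copyright 2018-2022 contributors to the Marquez project",
    "SPDX-License-Identifier: Apache-2.0",
)


def license_header(kind: str) -> str:
    """Return the license header text for `kind`: 'java', 'bash', or 'py'."""
    if kind == "java":
        # Assumption: conventional ' * ' indentation inside the block comment.
        body = "\n".join(f" * {line}" for line in HEADER_LINES)
        return f"/*\n{body}\n */\n"
    if kind == "bash":
        # The bash variant starts with the shebang and an empty comment line.
        body = "\n".join(f"# {line}" for line in HEADER_LINES)
        return f"#!/bin/bash\n#\n{body}\n"
    if kind == "py":
        return "\n".join(f"# {line}" for line in HEADER_LINES) + "\n"
    raise ValueError(f"unsupported file type: {kind!r}")
```

Keeping the header text in one tuple means that a change to the copyright years needs to be made in only one place.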