chore: removing APIs and deprecation warnings: 0.30.x here we come#3962
chore: removing APIs and deprecation warnings: 0.30.x here we come#3962rtyler merged 14 commits intodelta-io:mainfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3962 +/- ##
===========================================
+ Coverage 26.22% 74.50% +48.28%
===========================================
Files 124 152 +28
Lines 19885 39952 +20067
Branches 19885 39952 +20067
===========================================
+ Hits 5214 29765 +24551
+ Misses 14301 8854 -5447
- Partials 370 1333 +963 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
8c9e6e6 to
b5e4cd0
Compare
roeap
left a comment
There was a problem hiding this comment.
first pass over pushed changes. Just one question.
we need some more clippy, but awesome to get rid of some of this.
059834e to
659b756
Compare
|
Cool! If you need my help I can rerun somthing like this PR to remove the other clippy warnings. |
659b756 to
6740f2f
Compare
07e5913 to
30dc809
Compare
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
When using LogStoreConfig it is possible to pass a Url which does allow for safe `join()` operations inside of delta-kernel-rs. This change ensures that we always have a trailing slash before passing things off into kernel Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
This is almost entirely formatting changes, which is a _wee_ bit annoying. There were no practical code changes required however. Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Every trailing slashes cause all sorts of subtle equivalency confusion when we do things with a Url. It's better for us to normalize everything to always have a trailing slash which makes it easier to join with, and do other things. Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
30dc809 to
df3739a
Compare
ee617c8 to
da5c1cc
Compare
In this commit I am intentionally introducing the convention in our Rust
APIs:
* anything named `uri` is expected to take a `AsRef<str>` type which
is expected to turn into a `Url`
* anything named `url` is expected to take a `Url` type.
As such I have renamed a number of our APIs which previously had been
converted to use `Url` but were referring to them as `uri`
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
da5c1cc to
d107d57
Compare
| match operation_id { | ||
| Some(op) => self.get_transaction_url(op, self.config.location().to_string()), | ||
| None => Err(DeltaTableError::InvalidData { | ||
| violations: vec!["LakeFS must use operation_ids for operations".into()], |
There was a problem hiding this comment.
Small nit, but the nested error doesn't surface now which gives a hint in which area of the code it happens
| // NOTE: the lack of trailing slashes is a load-bearing implementation | ||
| // detail between the Delta/Spark and delta-rs S3DynamoDbLogStore |
| Url::parse("s3://bucket/prefix with space/").unwrap(), | ||
| "/prefix%20with%20space/", | ||
| ), | ||
| //(Url::parse("s3://bucket/special&chars/你好/😊").unwrap(), "/special&chars/你好/😊/"), |
There was a problem hiding this comment.
should this work, or should we remove it?
This is a hilariously invasive change insofar that we have URL-like strings leaking out all over the place in our codebase. This channge attempts to wrangle as much of that together as possible and has test API changes which support the preceeding commit where _uri functions become _url Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
The URLs used when writing entries into the table should always be normalized to ensure that the value of table_url() is identical to the one that is being written into the DynamoDB table. There may be some more testing required with the Delta/Spark implementation of S3DynamoDbLogStore to ensure that they are always generating URLs to the table with trailing slashes. Delta/Spark stores paths as [org.apache.hadoop.fs.Path](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/Path.html) which should have URL-like semantics. Without treating trailing slashes as required, this could lead to inconsistencies between the two implementations. Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Any places where we see hard-coded string formatting (e.g. format!()) around Url is likely to be **wrong** and need to be removed. This commit fixes the failing integration tests with the lakefs connector Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
…ad-bearing For better or worse, the trailing slashes on the `tablePath` DynamoDB items are semantically important and a trailing slash coming from deltalake-core causes lookup failures with multiple writers between Spark and Rust. This change ensures that the S3DynamoDbLogStore is always removing the trailing slash should it exist. Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
e64c8e1 to
80ab73b
Compare
| } | ||
|
|
||
| impl LogStoreConfig { | ||
| pub fn new(mut location: Url, options: StorageConfig) -> Self { | ||
| if !location.path().ends_with('/') { |
There was a problem hiding this comment.
Why don't we use normalize_table_url here?
There was a problem hiding this comment.
this is probably one of the many ones that was already sprinkled through the code. but would be good to update as well.
hntd187
left a comment
There was a problem hiding this comment.
I made it. LGTM, I left some nits, but approved whether you fix them or not.
| @@ -261,7 +265,7 @@ async fn test_abort_commit_entry() -> TestResult<()> { | |||
| .await?; | |||
|
|
|||
| // The entry should have been aborted - the latest entry should be one version lower | |||
| if let Some(new_entry) = client.get_latest_entry(&table.table_uri()).await? { | |||
| if let Some(new_entry) = client.get_latest_entry(&table.table_url().as_str()).await? { | |||
There was a problem hiding this comment.
Just the pattern is gonna be to use .as_str() and such?
| @@ -336,7 +344,7 @@ async fn test_concurrent_writers() -> TestResult<()> { | |||
| println!(">>> preparing table"); | |||
| let table = prepare_table(&context, "concurrent_writes").await?; | |||
| println!(">>> table prepared"); | |||
| let table_uri = table.table_uri(); | |||
| let table_uri = table.table_url(); | |||
| @@ -72,7 +72,7 @@ async fn test_object_store_onelake_abfs() -> TestResult { | |||
| #[allow(dead_code)] | |||
| async fn read_write_test_onelake(context: &IntegrationContext, path: &Path) -> TestResult { | |||
| let table_uri = Url::parse(&context.root_uri()).unwrap(); | |||
| let delta_scan = DeltaScan::new( | ||
| &wire.table_url, | ||
| wire.config, | ||
| (*inputs)[0].clone(), |
There was a problem hiding this comment.
This line totally makes sense, but I know we can't do much here.
| } | ||
| } | ||
|
|
||
| #[non_exhaustive] |
There was a problem hiding this comment.
What does this mean here?
| @@ -421,7 +416,7 @@ impl std::future::IntoFuture for WriteBuilder { | |||
|
|
|||
| fn into_future(self) -> Self::IntoFuture { | |||
| let this = self; | |||
| let table_uri = this.log_store.root_uri(); | |||
| let table_uri = this.log_store.root_url().clone(); | |||
| DeltaTable::new(log_store, Default::default()), | ||
| ) | ||
| } else { | ||
| let storage_url = | ||
| ensure_table_uri(self.location.clone().ok_or(CreateError::MissingLocation)?)?; | ||
| ( | ||
| storage_url.as_str().to_string(), | ||
| DeltaTableBuilder::from_uri(storage_url)? | ||
| storage_url.clone(), |
There was a problem hiding this comment.
never thought I'd rather see a clone than what it used to be
| let trimmed_path = url.path().trim_end_matches('/').to_owned(); | ||
| url.set_path(&trimmed_path); | ||
| Ok(url) | ||
| // We should always be normalizing the table URL because trailing or redundant slashes can be |
| @@ -140,45 +137,6 @@ impl DeltaTableState { | |||
| self.snapshot.snapshot().tombstones(log_store) | |||
| } | |||
|
|
|||
| /// Full list of add actions representing all parquet files that are part of the current | |||
…e references This is a follow up to delta-io#3962 where some leftover comments were not addressed Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
…e references This is a follow up to #3962 where some leftover comments were not addressed Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
…e references This is a follow up to delta-io#3962 where some leftover comments were not addressed Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
💣 💥
This is a pretty massive change, regrettably, but moves a tremendous amount of code from the 🦀 from using conventional/goofy
&strimplementations to usingurl::Url. This ended up surfacing a number of Url oddities that needed to be accounted for, most notably the semantic importance of a trailing/on our use ofUrlsince we heavily rely onUrl::join.There are additional changes here as a result of a Rust edition upgrade to
2024and the corresponding lints and clippy checks.To boot, a lot of deprecated code is hereby removed in this change, thus the big negative number 🔨