doc-gen: migrate scalar functions (encoding & regex) documentation#13919
alamb
left a comment
Thank you @Chen-Yuan-Lai 🎉
I noticed the CI is failing on this PR: https://github.com/apache/datafusion/actions/runs/12517861740/job/34921789845?pr=13919
It looks like some of the content is getting lost - when I ran ./dev/update_function_docs.sh locally, the results seem to show the regexp functions having lost their documentation 🤔
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git diff
diff --git a/docs/source/user-guide/sql/scalar_functions.md b/docs/source/user-guide/sql/scalar_functions.md
index be4f5e56b..f9585ce12 100644
--- a/docs/source/user-guide/sql/scalar_functions.md
+++ b/docs/source/user-guide/sql/scalar_functions.md
@@ -1729,12 +1729,12 @@ uuid()
Decode binary data from textual representation in string.
-decode(expression, format)
+decode(e xpression, format)
#### Arguments
-- **expression**: Expression containing encoded string data
+- **expression**: Expression containing string or binary data
- **format**: Same arguments as [encode](#encode)
**Related functions**:
@@ -1758,167 +1758,6 @@ encode(expression, format)
- [decode](#decode)
-## Regular Expression Functions
-
-Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
-regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
-(minus support for several features including look-around and backreferences).
-The following regular expression functions are supported:
-
-- [regexp_count](#regexp_count)
-- [regexp_like](#regexp_like)
-- [regexp_match](#regexp_match)
-- [regexp_replace](#regexp_replace)
....
#[user_doc(
    doc_section(label = "Binary String Functions"),
    description = "Decode binary data from textual representation in string.",
    syntax_example = "decode(e xpression, format)",
-    syntax_example = "decode(e xpression, format)",
+    syntax_example = "decode(expression, format)",
Thank you @alamb for the correction; I have fixed the typo.
I think the PR is just a part of #13671. Modifying the

Sure! I'll modify all the other PRs. Thanks @goldmedal
- [regexp_like](#regexp_like)
- [regexp_match](#regexp_match)
- [regexp_replace](#regexp_replace)
This shouldn't be the case: the entire documentation for 4 functions has vanished. Checking.
I still have no idea why the documentation of regular expression functions was lost. Thanks @comphead
The reason is that the document printer first checks the doc section: if it doesn't match the predefined enum, the printer skips the function. To stop it from being skipped, we need to add a description to doc_section, like this:
doc_section(
label = "Regular Expression Functions",
description = r#"Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
(minus support for several features including look-around and backreferences).
The following regular expression functions are supported:"#,
),
The description doesn't match because the description is not set in the attribute.
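To make the mismatch concrete, here is a minimal sketch of how a section built from a label alone can fail to match the predefined one. It assumes a simplified `DocSection` with the three fields quoted in this thread and assumes the printer compares whole sections for equality; `section_from_label_only` and `printed_functions` are hypothetical names, not DataFusion APIs.

```rust
// Hypothetical, simplified stand-in for DataFusion's DocSection. The field
// names follow the const quoted in this thread; everything else is assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct DocSection {
    pub include: bool,
    pub label: &'static str,
    pub description: Option<&'static str>,
}

// Stand-in for the predefined section (description shortened for the sketch).
pub const DOC_SECTION_REGEX: DocSection = DocSection {
    include: true,
    label: "Regular Expression Functions",
    description: Some("Apache DataFusion uses a PCRE-like regular expression syntax."),
};

// What an attribute that sets only `label` would build: no description.
pub fn section_from_label_only(label: &'static str) -> DocSection {
    DocSection {
        include: true,
        label,
        description: None,
    }
}

// Assumed printer behaviour: keep only the functions whose section is equal
// to the predefined one, and drop the rest without reporting anything.
pub fn printed_functions<'a>(
    predefined: &DocSection,
    functions: &[(&'a str, DocSection)],
) -> Vec<&'a str> {
    functions
        .iter()
        .filter(|(_, section)| section == predefined)
        .map(|(name, _)| *name)
        .collect()
}
```

Under these assumptions, a function registered with `section_from_label_only("Regular Expression Functions")` is dropped by `printed_functions`, even though its label is identical, because `None` is not equal to `Some(..)` in the description field.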
I'll file a ticket to make user_doc work with predefined consts.
Currently the doc_section attribute must fully match the predefined DocSection consts, for example:
pub const DOC_SECTION_REGEX: DocSection = DocSection {
include: true,
label: "Regular Expression Functions",
description: Some(
r#"Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
(minus support for several features including look-around and backreferences).
The following regular expression functions are supported:"#,
),
};
If the doc_section attribute contains any mismatch, the function is silently ignored. I believe we can make the doc macros smarter, for example:
- the doc_section will be just a string
- using the string, find the corresponding const from scalar_doc_sections.doc_sections()
- in datafusion/macros/src/user_doc.rs, when constructing the builder, use the const instead of building DocSection manually
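The proposal above could look roughly like the sketch below. It is only an illustration under stated assumptions: `DocSection` is a simplified stand-in, the free function `doc_sections()` is a hypothetical registry standing in for `scalar_doc_sections.doc_sections()`, and `section_for_label` is a made-up helper name. The binary string section's `None` description is also assumed for the example.

```rust
// Hypothetical, simplified stand-in for DataFusion's DocSection.
#[derive(Debug, Clone, PartialEq)]
pub struct DocSection {
    pub include: bool,
    pub label: &'static str,
    pub description: Option<&'static str>,
}

// Stand-ins for predefined sections (contents assumed for the sketch).
pub const DOC_SECTION_BINARY_STRING: DocSection = DocSection {
    include: true,
    label: "Binary String Functions",
    description: None,
};

pub const DOC_SECTION_REGEX: DocSection = DocSection {
    include: true,
    label: "Regular Expression Functions",
    description: Some("Apache DataFusion uses a PCRE-like regular expression syntax."),
};

// Hypothetical registry of all predefined sections.
pub fn doc_sections() -> Vec<DocSection> {
    vec![DOC_SECTION_BINARY_STRING, DOC_SECTION_REGEX]
}

// Resolve a plain label string to the predefined section, so the attribute
// never has to repeat the description. An unknown label becomes an error
// instead of a silently skipped function.
pub fn section_for_label(label: &str) -> Result<DocSection, String> {
    doc_sections()
        .into_iter()
        .find(|section| section.label == label)
        .ok_or_else(|| format!("unknown doc_section label: {}", label))
}
```

With a lookup like this, the attribute would only need to carry the label, and a typo in the label would surface as an error rather than as missing documentation.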
Is this PR ready to go? Or are we waiting for something else to finish it up?

It's not ready yet. It can be fixed by #14001 (preferable), or alternatively I can do a manual correction. I'm sending this to draft for now.
072d756 to c832494
comphead
left a comment
lgtm thanks @Chen-Yuan-Lai
Which issue does this PR close?
Part of #13671.
Rationale for this change
What changes are included in this PR?
As discussed in #13671, this PR migrates the documentation of the built-in binary string and regular expression functions that currently support migration.
Are these changes tested?
Are there any user-facing changes?