Skip to content

docs: DOC-1234: Core docs Python-Groovy difference checker tool#7859

Open
elijahpetty wants to merge 3 commits intodeephaven:mainfrom
elijahpetty:DOC-1234
Open

docs: DOC-1234: Core docs Python-Groovy difference checker tool#7859
elijahpetty wants to merge 3 commits intodeephaven:mainfrom
elijahpetty:DOC-1234

Conversation

@elijahpetty
Copy link
Copy Markdown
Contributor

compare_docs_v5.py compares the docs, and create_annotated_comparison.py makes a .md file with easy-to-read output.

@elijahpetty elijahpetty self-assigned this Apr 1, 2026
@elijahpetty elijahpetty added documentation Improvements or additions to documentation NoDocumentationNeeded NoReleaseNotesNeeded No release notes are needed. labels Apr 1, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 1, 2026

No docs changes detected for d91d2b5

margaretkennedy
margaretkennedy previously approved these changes Apr 6, 2026
Comment thread docs/excluded_files.txt Outdated
Comment thread docs/create_annotated_comparison.py Outdated
Comment thread docs/docs_utils.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this file and the output file name give me pause. Why all of the 5 stuff? Why have versions? Should this stuff be in docs/tools or docs/bin? I would argue that it doesn't belong in docs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the name does not tell the users what it does. Something like compare_docs_py_java.py would be far more obvious about the purpose of the script.

Comment thread docs/compare_docs_v5.py Outdated
Comment on lines +135 to +145
_EQUIVALENT_LINK_SPECS: list[tuple[list[str], str, list[str] | None]] = [
(["empty_table", "emptyTable"], "emptyTable", None),
(["new_table", "newTable"], "newTable", None),
(["multi_join", "multiJoin", "MultiJoin.of"], "MultiJoin", ["multi-join", "multiJoin", "multijoin"]),
(["range_join", "rangeJoin"], "rangeJoin", ["range-join", "rangeJoin"]),
(["keyed_transpose", "keyedTranspose"], "keyedTranspose", ["keyed-transpose", "keyedTranspose"]),
(["partitioned_transform", "partitionedTransform"], "partitionedTransform", ["partitioned-transform", "partitionedTransform"]),
(["one_click", "oneClick"], "oneClick", ["one-click", "oneClick"]),
(["has_data_index", "hasDataIndex"], "hasDataIndex", ["has-data-index", "hasDataIndex"]),
(["meta_table", "metaTable", "meta"], "meta", ["meta_table", "metaTable", "meta"]),
]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the 3rd entries are just snake -> kebab case transformations. This could just be a function instead of more data to manage. Having said that, the first two do not have this mapping. I'm not sure why, but it jumps out at me.

Comment thread docs/compare_docs_v5.py Outdated
Comment thread docs/compare_docs_v5.py Outdated
Comment on lines +235 to +236
def should_exclude_reference_path(rel_path_str: str) -> bool:
"""Return True if this relative path should be excluded due to being in /reference.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good example of a simple place where function naming and pydocs make it very confusing for me as a reviewer. Questions that come to mind:

  1. Why do references need to be excluded?
  2. Why should references be excluded?
  3. Are only some references being excluded?
  4. etc.

To learn how to improve this, lean on AI. In windsurf try:

  1. "Improve all of the pydocs in this file. Use the Google standard."
  2. "as a senior developer, review all of the function names in this file. Teach me how to improve them. Do not change the code."
  3. Switch to planning mode. "As a senior developer, carefully review this file."

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I excluded reference docs for two reasons. The first is that the method names are so different in so many cases that I would need to create a map of which docs to compare. The second is that the actual methods work differently in so many cases that the text is different as a result. Margaret and I thought it was not worth comparing all of those just to make minor phrasing changes when much of the text will be different either way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this file, but my review is a high-level review.

  1. Nobody loses a billion dollars if there is a bug in this.
  2. regular expression code is hard to review.
  3. The way this code is structured, named, and documented makes it extremely difficult to follow.

Having said that, my comments are mostly about the CS aspects of the code that will help you become a better developer. We should iterate a few times to help you learn. Between these iterations, you can lean on AI to help teach you. I have included detailed notes in one of the comments. Ask windsurf things like:

  • "As a senior developer, give me a short 1 paragraph analysis of how this code could be better."
    Then once you have feedback, you can more aggressively have Windsurf review and improve the code. Do this in planning mode so that you can get feedback on what it wants to change, and you can change the plan if you don't like it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation NoDocumentationNeeded NoReleaseNotesNeeded No release notes are needed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants