feat: google drive error resolution#9842

Open
evan-onyx wants to merge 3 commits into main from feat/resolve-errors-efficiency2
Conversation

@evan-onyx
Contributor

@evan-onyx evan-onyx commented Apr 1, 2026

Description

Add a new interface for connectors to implement, intended to be used for individual indexing error resolution. Implemented the interface for google drive.

How Has This Been Tested?

Added connector tests.

Additional Options

  • [Optional] Please cherry-pick this PR to the latest release version.
  • [Optional] Override Linear Check

Summary by cubic

Adds a new Resolver interface and Google Drive error resolution to re-fetch failed documents by webViewLink using batched Drive API calls. Emits ancestor folders and can sync permissions to reduce API calls and speed up recovery.

  • New Features

    • Added Resolver.resolve_errors(errors, include_permissions=False) to re-process failures and emit Document and HierarchyNode.
    • Implemented GoogleDriveConnector.resolve_errors with batched files().get (100-item chunks), yields per-link failures when fetches fail, emits ancestors before documents, and optionally syncs permissions.
    • Added get_files_by_web_view_links_batch using Drive BatchHttpRequest; skips invalid links and continues; daily tests cover single/multiple files, invalid links, empty inputs, entity-failure skips, and hierarchy validation.
  • Bug Fixes

    • Fixed Drive field selection for files().get by deriving single-file fields from list fields, preventing missing or incorrect metadata.
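A minimal sketch of what the `Resolver` interface summarized above might look like. The `Document`, `ConnectorFailure`, and `HierarchyNode` dataclasses below are simplified stand-ins for the real models in the connector framework, not the actual definitions:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generator, Union

# Stand-in types; the real Document, ConnectorFailure, and HierarchyNode
# live in the connector framework's model modules.
@dataclass
class Document:
    id: str

@dataclass
class ConnectorFailure:
    failed_document_id: str
    failure_message: str

@dataclass
class HierarchyNode:
    id: str

class Resolver(ABC):
    """Connectors implement this to re-process previously failed documents."""

    @abstractmethod
    def resolve_errors(
        self,
        errors: list[ConnectorFailure],
        include_permissions: bool = False,
    ) -> Generator[Union[Document, ConnectorFailure, HierarchyNode], None, None]:
        """Yield ancestor HierarchyNodes first, then re-fetched Documents,
        or replacement ConnectorFailures for items that still fail."""
        raise NotImplementedError
```

The generator return type matches the behavior described in the PR: ancestors are emitted before documents, and unresolvable items come back as new failures.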

Written for commit 043df22. Summary will update on new commits.

@evan-onyx evan-onyx requested a review from a team as a code owner April 1, 2026 23:20
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Preview Deployment

| Status | Preview | Commit | Updated |
| --- | --- | --- | --- |
|  | https://onyx-preview-8qlptlx5r-danswer.vercel.app | dc25903 | 2026-04-01 23:21:56 UTC |

@greptile-apps
Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR introduces a new Resolver interface in interfaces.py and implements GoogleDriveConnector.resolve_errors() to quickly re-fetch and re-index documents that previously failed, using the Drive batch API. A helper function get_files_by_web_view_links_batch (with correct single-file field extraction via _extract_single_file_fields) is added in file_retrieval.py, and a new daily integration-test suite covers the main scenarios.

Key changes:

  • New Resolver ABC with resolve_errors(errors, include_permissions=False) returning a generator of Document | ConnectorFailure | HierarchyNode.
  • GoogleDriveConnector now implements Resolver; the implementation batches files via the Drive BatchHttpRequest API, walks ancestors for hierarchy, and converts files in parallel.
  • get_files_by_web_view_links_batch splits requests into ≤100-item chunks and propagates per-item errors as BatchRetrievalResult.errors.
  • Integration tests cover single/multi-file, invalid links, empty input, entity-failure skipping, and hierarchy-node validation.

Issue found:

  • resolve_errors fetches files with DriveFileFieldType.WITH_PERMISSIONS when exclude_domain_link_only is True, correctly acquiring the data needed for filtering — but the actual has_link_only_permission guard that exists in both _convert_retrieved_files_to_documents and _extract_slim_docs_from_google_drive is absent here. Files with domain-link-only access would be re-indexed through this path despite being excluded in all other code paths.

Confidence Score: 4/5

  • Mostly safe to merge but one P1 correctness issue should be addressed first: files with domain-link-only access are not filtered in resolve_errors, causing them to be re-indexed when exclude_domain_link_only=True.
  • There is one P1 logic bug: the exclude_domain_link_only filter is applied in all other indexing paths but is missing from resolve_errors, leading to incorrect document re-indexing for connectors configured with that option. All other aspects of the implementation are well-structured, the batch error propagation is sound, and the test suite is thorough.
  • backend/onyx/connectors/google_drive/connector.py — missing exclude_domain_link_only guard in resolve_errors.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/onyx/connectors/interfaces.py | Adds new Resolver abstract base class with resolve_errors(); clean interface definition, no issues found. |
| backend/onyx/connectors/google_drive/file_retrieval.py | Adds BatchRetrievalResult, get_files_by_web_view_links_batch, and field-extraction helpers; individual batch-item errors are correctly propagated, but the public wrapper has a redundant early-exit branch (flagged previously). |
| backend/onyx/connectors/google_drive/connector.py | Implements Resolver.resolve_errors() for GoogleDriveConnector; missing exclude_domain_link_only guard means link-only-access files are incorrectly re-indexed when that option is enabled. |
| backend/tests/daily/connectors/google_drive/test_resolver.py | New integration-test suite covering single/multi-file, invalid links, empty input, entity-failure skips, and hierarchy validation; previously flagged dead-code list comprehension has been removed. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant Connector as GoogleDriveConnector
    participant Batch as get_files_by_web_view_links_batch
    participant Drive as Drive BatchHttpRequest
    participant Ancestors as _get_new_ancestors_for_files
    participant Convert as _convert_retrieved_file_to_document

    Caller->>Connector: resolve_errors(errors, include_permissions)
    Connector->>Connector: extract doc_ids from errors[].failed_document
    Connector->>Batch: (service, doc_ids, field_type)
    loop chunks of 100
        Batch->>Drive: batch.add(files().get) per link
        Drive-->>Batch: callback(request_id, response, exception)
    end
    Batch-->>Connector: BatchRetrievalResult{files, errors}
    Connector-->>Caller: yield ConnectorFailure for each batch error
    Connector->>Ancestors: retrieved_files, permission_sync_context
    Ancestors-->>Connector: ancestor HierarchyNodes
    Connector-->>Caller: yield HierarchyNode (ancestors)
    Connector->>Convert: parallel (max_workers=8)
    Convert-->>Connector: Document | ConnectorFailure | None
    Connector-->>Caller: yield Document or ConnectorFailure
```
Path: backend/onyx/connectors/google_drive/connector.py
Line: 1719-1726

Comment:
**Missing `exclude_domain_link_only` filter before building `retrieved_files`**

Both other code paths in this connector that build a document list apply an explicit filter for `exclude_domain_link_only` before handing files off for conversion:

- `_convert_retrieved_files_to_documents` (line 1517):
  ```python
  if self.exclude_domain_link_only and has_link_only_permission(retrieved_file.drive_file):
      continue
  ```
- `_extract_slim_docs_from_google_drive` (line 1806): identical guard.

`resolve_errors` fetches files with `DriveFileFieldType.WITH_PERMISSIONS` when `self.exclude_domain_link_only` is `True` (line 1695), which means the permission data needed for the check *is* available in the response — but the check itself never runs.  As a result, when `exclude_domain_link_only=True`, files whose only access is a domain link will be re-indexed through the error-resolution path even though they are supposed to be excluded by configuration.

```python
retrieved_files = [
    RetrievedDriveFile(
        drive_file=file,
        user_email=self.primary_admin_email,
        completion_stage=DriveRetrievalStage.DONE,
    )
    for file in batch_result.files.values()
    if not (
        self.exclude_domain_link_only
        and has_link_only_permission(file)
    )
]
```


Reviews (3): Last reviewed commit: "pr comments" | Re-trigger Greptile

Comment on lines +579 to +582 of backend/onyx/connectors/google_drive/file_retrieval.py:

```python
if exception:
    logger.warning(f"Error retrieving file {request_id}: {exception}")
else:
    results[request_id] = response
```

P1 Silent batch failure swallows transient errors

When an individual batch request fails (e.g., due to a transient network/auth error, not just a permanent 404), the failure is silently dropped — only a logger.warning is emitted and no ConnectorFailure is produced. This makes transient retrieval errors indistinguishable from permanent ones (file deleted, permission revoked).

Per the interface contract, "Caller's responsibility is to delete the old ConnectorFailures and replace with the new ones." A caller who deletes all input failures matching the submitted errors list and only keeps the yielded outputs will permanently lose failure records for documents that failed due to transient batch errors. The document would disappear from both the search index and the failure tracker, with no way to retry.

Consider yielding a new ConnectorFailure (with the exception message) for any batch item that fails, rather than silently dropping it:

```python
def callback(
    request_id: str,
    response: GoogleDriveFileType,
    exception: Exception | None,
) -> None:
    if exception:
        logger.warning(f"Error retrieving file {request_id}: {exception}")
        errors[request_id] = exception  # collect errors for the caller
    else:
        results[request_id] = response
```

Then the public resolve_errors can yield ConnectorFailure objects for entries present in errors but absent in files.


Context (backend/tests/daily/connectors/google_drive/test_resolver.py, line 183):

```python
    google_drive_service_acct_connector_factory: Callable[..., GoogleDriveConnector],
) -> None:
    """Resolving an empty error list should yield nothing."""
    connector = google_drive_service_acct_connector_factory(
```

P2 Unused list comprehension result — dead code

The result of this list comprehension is discarded. The variable is never assigned, so this line has no effect on the test. If the intent was to assert something about the number of ConnectorFailures (e.g., that there are none), the assertion is missing.

Suggested change:

```python
new_failures = [r for r in results if isinstance(r, ConnectorFailure)]
```

Then add an assertion, e.g. `assert len(new_failures) == 0`.


Comment on lines +554 to +561 of backend/onyx/connectors/google_drive/file_retrieval.py:

```python
fields = _get_fields_for_file_type(field_type)
if len(web_view_links) <= MAX_BATCH_SIZE:
    return _get_files_by_web_view_links_batch(service, web_view_links, fields)

result: dict[str, GoogleDriveFileType] = {}
for i in range(0, len(web_view_links), MAX_BATCH_SIZE):
    chunk = web_view_links[i : i + MAX_BATCH_SIZE]
    result.update(_get_files_by_web_view_links_batch(service, chunk, fields))
```

P2 Redundant early-exit branch

The if len(web_view_links) <= MAX_BATCH_SIZE guard is unnecessary. The for loop below handles that case in a single iteration, so the early return only adds code without changing behavior. Removing it simplifies the function:

Suggested change:

```python
result: dict[str, GoogleDriveFileType] = {}
for i in range(0, len(web_view_links), MAX_BATCH_SIZE):
    chunk = web_view_links[i : i + MAX_BATCH_SIZE]
    result.update(_get_files_by_web_view_links_batch(service, chunk, fields))
return result
```


@cubic-dev-ai cubic-dev-ai bot left a comment


5 issues found across 4 files

Confidence score: 2/5

  • There is a high-confidence data-loss/regression risk in backend/onyx/connectors/google_drive/connector.py: resolve_errors can silently drop unresolved error IDs instead of surfacing ConnectorFailure, which can cause failed items to disappear from handling.
  • backend/onyx/connectors/google_drive/file_retrieval.py has two related high-severity concerns (field-mask/projection mismatch and dropped items on batch exceptions) that can break batched error resolution and violate the resolver contract in user-facing failure paths.
  • The test gaps in backend/tests/daily/connectors/google_drive/test_resolver.py reduce safety: current assertions can miss missing hierarchy nodes and do not enforce the invalid-link ConnectorFailure behavior, making regressions easier to ship.
  • Pay close attention to backend/onyx/connectors/google_drive/connector.py, backend/onyx/connectors/google_drive/file_retrieval.py, backend/tests/daily/connectors/google_drive/test_resolver.py - silent failure/drop behavior and insufficient assertions around resolver error handling.
Prompt for AI agents (unresolved issues)

<file name="backend/tests/daily/connectors/google_drive/test_resolver.py">

<violation number="1" location="backend/tests/daily/connectors/google_drive/test_resolver.py:134">
P2: This test can pass even when `resolve_errors()` returns no expected hierarchy nodes, because it only validates nodes opportunistically inside the loop and never asserts that the expected IDs were present.</violation>

<violation number="2" location="backend/tests/daily/connectors/google_drive/test_resolver.py:170">
P2: This test does not verify the invalid-link behavior it describes, because `ConnectorFailure` results are ignored instead of asserted away.</violation>
</file>

<file name="backend/onyx/connectors/google_drive/connector.py">

<violation number="1" location="backend/onyx/connectors/google_drive/connector.py:1697">
P1: Unresolved error IDs are silently dropped in `resolve_errors`; emit a `ConnectorFailure` (or raise) for IDs missing from the batch response so failed items are not lost.

(Based on your team's feedback about logging or warning instead of failing silently.)</violation>
</file>

<file name="backend/onyx/connectors/google_drive/file_retrieval.py">

<violation number="1" location="backend/onyx/connectors/google_drive/file_retrieval.py:554">
P1: Use `files.get` field masks here; the current `files.list` projection (`nextPageToken, files(...)`) can break batched error resolution.</violation>

<violation number="2" location="backend/onyx/connectors/google_drive/file_retrieval.py:580">
P1: When a batch request fails (e.g., transient network/auth error), the exception is logged but the failed item is silently dropped from results. Per the `Resolver` interface contract, the caller deletes old `ConnectorFailure`s and replaces them with yielded outputs. This means documents that fail transiently will vanish from both the index and the failure tracker with no way to retry. Consider collecting batch errors and propagating them so `resolve_errors` can yield replacement `ConnectorFailure` objects for items that couldn't be retrieved.</violation>
</file>


@github-actions
Contributor

github-actions bot commented Apr 1, 2026

🖼️ Visual Regression Report

| Project | Changed | Added | Removed | Unchanged | Report |
| --- | --- | --- | --- | --- | --- |
| admin | 8 | 0 | 0 | 158 | View Report |
| exclusive | 0 | 0 | 0 | 8 | ✅ No changes |

```python
) -> Generator[Document | ConnectorFailure | HierarchyNode, None, None]:
    """Attempts to yield back ALL the documents described by the errors, no checkpointing.

    Caller's responsibility is to delete the old ConnectorFailures and replace with the new ones.
```
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment doesn't make 100% sense to me. it seems to imply you are meant to return ConnectorFailure, but the typing indicates you can return Document and HierarchyNode. i think it just might need some more detail.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can return connector failures too!
