Fix AttributeError crashes when LLM returns malformed JSON #200
Closed
martindecz wants to merge 1 commit into VectifyAI:main from
Fix two crashes caused by malformed LLM JSON output, via three changes:
1. extract_json() in utils.py now returns [] instead of {} on parse
failure, matching the list-of-dicts type expected by all callers.
2. process_no_toc() in page_index.py now validates that
generate_toc_init() and generate_toc_continue() return lists
before calling .extend().
3. meta_processor() list comprehension now filters out non-dict
items to prevent 'str' object has no attribute 'get' errors.
Fixes VectifyAI#199
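The guards described in items 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: as_list and keep_dicts are hypothetical helper names, and the sample data is invented.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Return value unchanged if it is a list, otherwise reset to []."""
    return value if isinstance(value, list) else []


def keep_dicts(items: Any) -> list:
    """Drop anything that is not a dict so later .get() calls cannot fail."""
    if not isinstance(items, list):
        return []
    return [item for item in items if isinstance(item, dict)]


# A failed parse hands back a dict; calling .extend() on it would raise
# AttributeError, so the initial result is reset to an empty list first.
toc = as_list({})

# Later results are only merged when they are lists. Extending a list with
# a non-empty dict would silently add its string keys instead.
for result in ([{"title": "Intro"}], {"error": "bad json"}, ["stray text"]):
    if isinstance(result, list):
        toc.extend(result)

# Stray strings are filtered out before .get() is called on each item.
titles = [item.get("title") for item in keep_dicts(toc)]
```

Here the dict result is skipped entirely, and the stray string is merged into toc but removed by keep_dicts before any .get() call.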
martindecz added a commit to martindecz/PageIndex that referenced this pull request on Mar 29, 2026
Author:
Closing this PR. Our type guards were based on incorrect assumptions about the logger interface (JsonLogger doesn't have .warning()), and changing the fallback return type from {} to [] broke more call sites than it fixed (6 out of 13 callers expect a dict). The root cause is better addressed in extract_json() itself with robust JSON repair, not in individual callers. Apologies for the noise.
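As a rough idea of what handling this inside the parser could look like, here is a standard-library-only sketch. It is an assumption for illustration, not the author's actual fix: parse_first_json is a hypothetical name, and it tolerates only surrounding text and trailing data, not truncated structures.

```python
import json
from typing import Any, Optional


def parse_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array found in text, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one value starting at `start` and ignores
            # whatever follows it, which tolerates trailing extra data.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        return value
    return None


wrapped = parse_first_json('Here is the TOC: [{"title": "Intro"}] Hope this helps!')
empty = parse_first_json("")
truncated = parse_first_json('{"title": "Intro", "nodes": [1, 2')
```

Returning None on failure, rather than {} or [], leaves each caller free to substitute the default type it actually expects.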
martindecz pushed a commit to martindecz/PageIndex that referenced this pull request on Mar 29, 2026
Reverts the type guards added in f8e5a92, 3034bd4, and 9a5da0c. These guards were based on incorrect assumptions:
- JsonLogger has no .warning() method, so the guards themselves crashed
- Changing the extract_json fallback from {} to [] broke 6 of 13 callers
- The guards covered only 3 of 13 vulnerable call sites
The correct fix is in extract_json/json_repair (kept in 516e791). See issue VectifyAI#199 for context.
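The fallback-type problem can be illustrated with two stand-in callers. This is a hedged sketch: read_title, append_section, and crashes are hypothetical names invented here, not functions from the repository.

```python
def read_title(parsed):
    """Stand-in for a caller that expects a dict."""
    return parsed.get("title", "")


def append_section(parsed):
    """Stand-in for a caller that expects a list."""
    parsed.append({"title": "Appendix"})
    return parsed


def crashes(caller, fallback):
    """Report whether the caller raises AttributeError for this fallback."""
    try:
        caller(fallback)
    except AttributeError:
        return True
    return False


# Each fallback value is safe for one kind of caller and fatal for the other.
report = {
    ("dict caller", "{}"): crashes(read_title, {}),
    ("dict caller", "[]"): crashes(read_title, []),
    ("list caller", "{}"): crashes(append_section, {}),
    ("list caller", "[]"): crashes(append_section, []),
}
```

Whichever single fallback is chosen, one group of callers still fails, which is consistent with the observation that swapping {} for [] only moved the crashes.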
Summary
Fixes two AttributeError crashes that occur when the LLM returns malformed JSON output, causing extract_json() to fail and downstream functions to receive wrong types.
Closes #199
Changes
- pageindex/utils.py - extract_json() now returns [] (empty list) instead of {} (empty dict) on parse failure. All callers expect a list of dicts, so the fallback value should match that contract.
- pageindex/page_index.py - process_no_toc() now validates that generate_toc_init() and generate_toc_continue() return lists before calling .extend(). If they return a non-list type (e.g. dict from failed JSON parsing), the code logs a warning and either resets to an empty list or skips the result.
- pageindex/page_index.py - meta_processor() list comprehension now filters out non-dict items with isinstance(item, dict) before calling .get(), preventing 'str' object has no attribute 'get' errors.
Reproduction context
These crashes are reliably triggered when using a local vLLM endpoint with smaller models that frequently produce malformed JSON (extra data, empty responses, malformed structure). All 7 test documents (Office docs converted to PDF) failed with these errors.
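The three failure modes named above can be reproduced against Python's standard json module. The sample strings are invented for illustration and are not actual model output from the reported runs.

```python
import json

# Hypothetical examples of each kind of malformed output.
samples = {
    "empty response": "",
    "extra data": '{"title": "Intro"} Let me know if you need more.',
    "malformed structure": '{"title": "Intro", "nodes": [',
}


def try_parse(text):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


# Every sample fails strict parsing, so a parser that swallows the error
# has to hand some fallback value to its callers.
outcomes = {label: try_parse(text) for label, text in samples.items()}
```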
Test plan