fix: add type checks to prevent AttributeError when LLM returns malformed JSON by octo-patch · Pull Request #210 · VectifyAI/PageIndex

octo-patch · 2026-04-02T07:08:08Z

Fixes #199

Problem

When the LLM returns malformed JSON, extract_json() in utils.py returns {} (empty dict) as a fallback. This causes two crashes downstream:

Crash 1: AttributeError: 'dict' object has no attribute 'extend' in process_no_toc

generate_toc_init() can return {} on parse failure
Calling .extend() on a dict raises AttributeError
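The first crash can be reproduced in isolation with a short sketch. `extract_json_stub` is a hypothetical stand-in that only mimics the fallback described above (return `{}` when parsing fails); it is not the real `extract_json()` from utils.py, and the input string is made up.

```python
import json


def extract_json_stub(text):
    """Hypothetical stand-in: parse JSON, fall back to {} on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}


def reproduce_extend_crash():
    """Call .extend() on the {} fallback and return the error message."""
    toc = extract_json_stub("this is not JSON")  # parse fails, so toc == {}
    try:
        toc.extend([{"title": "Introduction"}])  # dicts have no .extend()
    except AttributeError as exc:
        return str(exc)
    return None


print(reproduce_extend_crash())
```

The caller expects a list, so the empty-dict fallback only fails later, at the first list-specific method call.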

Crash 2: AttributeError: 'str' object has no attribute 'get' in meta_processor

When the LLM returns a list of strings instead of a list of dicts, iterating over the list and calling .get() on each item fails
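The second crash follows the same pattern. The sketch below assumes a list comprehension that reads a key from each item; the key name `"title"` and the helper `reproduce_get_crash` are illustrative only and not taken from meta_processor.

```python
def reproduce_get_crash(items):
    """Read a key from every item; return the error message if an item is not a dict."""
    try:
        return [item.get("title") for item in items]
    except AttributeError as exc:
        return str(exc)


# Well-formed output: a list of dicts.
print(reproduce_get_crash([{"title": "Introduction"}]))

# Malformed output: a list of plain strings, so .get() does not exist.
print(reproduce_get_crash(["Introduction", "Methods"]))
```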

Solution

Added minimal type guards at the two crash sites:

process_no_toc: Check that generate_toc_init() returns a list before using it. Skip non-list results from generate_toc_continue() rather than crashing.
meta_processor: Added isinstance(item, dict) check in the list comprehension filter so non-dict items (strings, etc.) are safely discarded instead of raising AttributeError.
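The two guards can be sketched roughly as follows. This is not the diff from this PR: `merge_toc_chunks` and `keep_dict_items` are hypothetical helpers, and the `physical_index` condition in the filter is an assumption about what the existing comprehension checks.

```python
def merge_toc_chunks(initial, continuations):
    """Combine TOC chunks, tolerating non-list results such as {}."""
    # Guard 1: start from an empty list if the initial result is not a list.
    toc = list(initial) if isinstance(initial, list) else []
    for chunk in continuations:
        # Guard 2: skip any continuation result that is not a list.
        if not isinstance(chunk, list):
            continue
        toc.extend(chunk)
    return toc


def keep_dict_items(items):
    """Discard non-dict items before calling .get() on them."""
    return [
        item
        for item in items
        # isinstance() runs first, so .get() is only reached for dicts.
        if isinstance(item, dict) and item.get("physical_index") is not None
    ]


print(merge_toc_chunks({}, [[{"title": "A"}], {}, "garbage"]))
print(keep_dict_items(["stray string", {"title": "A", "physical_index": 3}]))
```

Because `and` short-circuits, the `isinstance` check has to come before any `.get()` call in the filter condition.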

These are the smallest possible fixes: they don't change the overall flow, they only guard against unexpected return types from extract_json().

Testing

Reproducible with any local vLLM endpoint or small model (e.g. Qwen 7B) that frequently returns malformed JSON. With this fix, the pipeline gracefully handles parse failures instead of crashing with AttributeError.


claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Combines fixes from PRs VectifyAI#217, VectifyAI#210, and additional guards for all remaining crash sites in toc_transformer and related functions.

Fixes:
- TypeError: int + NoneType when calculate_page_offset returns None (VectifyAI#153)
- KeyError on dict access when extract_json returns {} (VectifyAI#163)
- AttributeError: NoneType has no startswith (VectifyAI#199)
- KeyError: 'table_of_contents' when LLM output is malformed
- AttributeError: 'dict' has no extend / 'str' has no get

Changes:
- add_page_offset_to_toc_json: guard None offset, return data unchanged
- process_none_page_numbers: fix prev/next defaults (start_index/end_index)
- toc_detector_single_page: .get() with 'no' default
- check_if_toc_extraction_is_complete: .get() with 'no' default
- check_if_toc_transformation_is_complete: .get() with 'no' default
- detect_page_index: .get() with 'no' default
- toc_transformer: isinstance + .get() for table_of_contents access
- toc_transformer: None/isinstance guard on new_complete.startswith
- single_toc_item_index_fixer: .get() for physical_index
- meta_processor: isinstance(item, dict) filter
- process_no_toc: isinstance guard for generate_toc_init/continue results
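Two of the guard patterns named in this list, the `.get()` with a 'no' default and the None-offset guard, might look like the sketch below. The helper names, the `"answer"` key, and the `"page"` field are hypothetical and chosen for illustration; the real functions and keys may differ.

```python
def answer_or_no(parsed, key):
    """Read a yes/no style answer, defaulting to 'no' on malformed output."""
    if not isinstance(parsed, dict):
        return "no"
    return parsed.get(key, "no")


def add_offset(items, offset):
    """Shift integer page numbers by offset; return data unchanged if offset is None."""
    if offset is None:
        return items  # avoids the int + None TypeError
    for item in items:
        if isinstance(item, dict) and isinstance(item.get("page"), int):
            item["page"] += offset
    return items


print(answer_or_no({}, "answer"))
print(add_offset([{"page": 1}], None))
print(add_offset([{"page": 1}, {"page": None}], 4))
```

Defaulting to 'no' treats an unparseable response the same as a negative answer, which keeps the pipeline moving at the cost of possibly repeating a step.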

9138b99 · fix: add type checks to prevent AttributeError when LLM returns malformed JSON (fixes VectifyAI#199)

claude bot reviewed Apr 2, 2026


sicko7947 mentioned this pull request Apr 6, 2026

fix: comprehensive crash guards for malformed LLM output #218

Open

octo-patch wants to merge 1 commit into VectifyAI:main from octo-patch:fix/issue-199-type-check-malformed-json
