
Commit c70c22e

JihaoXin and claude committed
Smart figure placement + compact layout with rich icons
- Agent decides figure placement (full_width vs single_column) per figure, passed as aspect_ratio to PaperBanana/NanoBanana
- Full-width (figure*): 21:9 for dual-column, 16:10 for single-column
- Single-column (figure): 4:3 compact ratio
- Planner prompt: text SHORT (labels ≤5 words) but icons RICH (detailed visual metaphors), layout COMPACT (minimize whitespace)
- PaperBanana visual_intent: strict no-title, no-bullets, no-numbers rules
- Critic: penalize text overload (sentences inside components → C ≤ 2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 760ad4a commit c70c22e


2 files changed (+45 / -15 lines)


ark/compiler.py

Lines changed: 26 additions & 6 deletions
````diff
@@ -520,11 +520,18 @@ def _generate_nano_banana_figures(self):
     "name": "fig_overview",
     "caption": "System architecture overview",
     "section_context": "Detailed description of what the figure should show, including all components, connections, data flows, and key metrics mentioned in the paper. Be as detailed as possible — the more context, the better the generated figure.",
-    "latex_label": "fig:overview"
+    "latex_label": "fig:overview",
+    "placement": "full_width"
   }}
 ]
 ```
 
+For each figure, you MUST decide the "placement" field:
+- "full_width": for complex figures — multi-stage pipelines, system architectures with many components, diagrams that need horizontal space. Uses `\\begin{{figure*}}` spanning all columns.
+- "single_column": for simpler figures — single concept with few components, small diagrams. Uses `\\begin{{figure}}` in one column.
+
+Decision criteria: if the figure has 4+ components, multiple stages, or branching paths → "full_width". If it's a simple 2-3 component relationship → "single_column".
+
 Only include figures that:
 1. Are referenced in LaTeX but have no existing file, OR
 2. Are concept/architecture/mechanism diagrams that could be improved with AI generation
````
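In this commit the placement choice is made by the LLM agent reading the prompt above; nothing in the diff computes it. Purely as an illustration, the decision criteria and the `figure*`/`figure` mapping can be written out as a small Python sketch. `choose_placement`, `latex_figure`, their parameters, the `[t]` float specifier, and the `figures/` path are all hypothetical and are not part of ark.

```python
def choose_placement(num_components: int, num_stages: int = 1, has_branching: bool = False) -> str:
    """Apply the prompt's decision criteria (hypothetical helper, not ark's code)."""
    # 4+ components, multiple stages, or branching paths -> span all columns
    if num_components >= 4 or num_stages > 1 or has_branching:
        return "full_width"
    # A simple 2-3 component relationship fits in one column
    return "single_column"


def latex_figure(name: str, caption: str, label: str, placement: str) -> str:
    """Emit a LaTeX float for the chosen placement (illustrative only)."""
    if placement == "full_width":
        # figure* spans both columns in a two-column template
        env, width = "figure*", r"\textwidth"
    else:
        # plain figure stays inside one column
        env, width = "figure", r"\columnwidth"
    return "\n".join([
        rf"\begin{{{env}}}[t]",
        r"\centering",
        rf"\includegraphics[width={width}]{{figures/{name}.png}}",  # path is an assumption
        rf"\caption{{{caption}}}",
        rf"\label{{{label}}}",
        rf"\end{{{env}}}",
    ])
```

For example, `latex_figure("fig_overview", "System architecture overview", "fig:overview", "full_width")` yields a `figure*` block whose graphic is sized to `\textwidth`.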
```diff
@@ -563,23 +570,36 @@ def _generate_nano_banana_figures(self):
         venue_format = self.config.get("venue_format", venue)
         geo = get_geometry(venue_format) if venue_format else {"columnwidth_in": 3.333}
 
-        # Determine aspect ratio based on venue
         columns = geo.get("columns", 1)
-        aspect_ratio = "16:9" if columns == 1 else "16:10"
+        col_w = geo.get("columnwidth_in", 3.333)
+        text_w = geo.get("textwidth_in", 7.0)
 
         generated = 0
         for fig in figures:
             name = fig.get("name", "concept_fig")
             caption = fig.get("caption", "")
             section_ctx = fig.get("section_context", "")
+            placement = fig.get("placement", "full_width")
             output_path = self.figures_dir / f"{name}.png"
 
             # Skip if file already exists and is non-empty
             if output_path.exists() and output_path.stat().st_size > 0:
                 self.log(f" Skipping {name}: already exists", "INFO")
                 continue
 
-            self.log(f" Generating: {name}...", "INFO")
+            # Determine aspect ratio and width based on agent's placement decision
+            if columns == 1:
+                # Single-column templates (NeurIPS): always use textwidth
+                fig_width = text_w
+                aspect_ratio = "16:10"
+            elif placement == "full_width":
+                fig_width = text_w
+                aspect_ratio = "21:9"  # wide for spanning both columns
+            else:
+                fig_width = col_w
+                aspect_ratio = "4:3"  # compact for single column
+
+            self.log(f" Generating: {name} (placement={placement}, {fig_width:.1f}in, ratio={aspect_ratio})...", "INFO")
 
             # Try PaperBanana pipeline first (best quality)
             ok = self._try_paperbanana(
```
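The new branch in `_generate_nano_banana_figures` is a pure mapping from `(columns, placement)` to a figure width and an aspect ratio. A standalone sketch of that mapping follows; `resolve_geometry` is a hypothetical name, and the 3.333 in / 7.0 in defaults are the fallbacks the diff uses when the venue geometry lacks those keys.

```python
from typing import Tuple


def resolve_geometry(columns: int, placement: str,
                     col_w: float = 3.333, text_w: float = 7.0) -> Tuple[float, str]:
    """Return (width_in, aspect_ratio), mirroring the if/elif/else in the diff."""
    if columns == 1:
        # Single-column templates: placement is ignored, use the full text width
        return text_w, "16:10"
    if placement == "full_width":
        # figure* spans both columns, so a wide ratio is requested
        return text_w, "21:9"
    # Every other value (normally "single_column") gets one column and a compact ratio
    return col_w, "4:3"
```

Two consequences of the branch order show up here: a figure with no `placement` key is treated as "full_width" (the `fig.get` default in the loop), while an unrecognized value such as a misspelled "fullwidth" silently falls through to the compact single-column case.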
```diff
@@ -603,7 +623,7 @@ def _generate_nano_banana_figures(self):
                     api_key=api_key,
                     model=self.config.get("nano_banana_model", "pro"),
                     venue=venue,
-                    column_width_in=geo.get("columnwidth_in", 3.333),
+                    column_width_in=fig_width,
                     max_critic_rounds=self.config.get("nano_banana_critic_rounds", 3),
                     log_fn=self.log,
                 )
```
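Passing `fig_width` instead of the column width changes the size the NanoBanana fallback is asked to fill. Assuming the generator treats `column_width_in` as the target print width and reads ratios as width:height (the diff does not show how `column_width_in` is consumed, so this is an assumption), the implied height is width × H / W; a full-width 21:9 figure at 7.0 in, for instance, is 3.0 in tall. The helpers below are hypothetical, and 300 dpi is only an example resolution.

```python
from typing import Tuple


def implied_height_in(aspect_ratio: str, width_in: float) -> float:
    """Height in inches of a "W:H" figure printed at width_in inches."""
    w, h = (int(part) for part in aspect_ratio.split(":"))
    return width_in * h / w


def implied_pixels(aspect_ratio: str, width_in: float, dpi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions at the given dpi (300 is an example value, not from ark)."""
    height_in = implied_height_in(aspect_ratio, width_in)
    return round(width_in * dpi), round(height_in * dpi)
```

Under the same assumption, a single-column 4:3 figure at 3.333 in comes out just under 2.5 in tall, and a 21:9 full-width figure at 300 dpi is 2100 × 900 px.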
```diff
@@ -684,7 +704,7 @@ def _try_paperbanana(self, name: str, caption: str, paper_context: str,
         data = {
             "candidate_id": name,
             "content": paper_context,
-            "visual_intent": f"{caption} IMPORTANT: Keep text minimal — component labels MAX 3-5 words, connection labels MAX 1-3 words. Use icons and visual layout to convey meaning, not text. Total visible text under 50 words.",
+            "visual_intent": f"{caption} STYLE: Labels MAX 3-5 words, NO sentences inside components. BUT make icons detailed and elaborate (not simple flat shapes). Layout should be COMPACT — minimize whitespace, pack components closely. The figure should feel dense and information-rich through its visual elements, not through text.",
             "additional_info": {"rounded_ratio": aspect_ratio},
         }
 
```
ark/nano_banana.py

Lines changed: 19 additions & 9 deletions
```diff
@@ -322,15 +322,25 @@ def _run_planner(client, text_model_id: str, figure_name: str, caption: str,
 - NO drop shadows, NO gradients, NO 3D effects. Flat design with semantic richness.
 - Every visual element should encode meaning — if a color/shape/line doesn't convey information, remove it
 
-### 6. TEXT BREVITY (CRITICAL — the #1 mistake is too much text)
-- A diagram is a VISUAL ABSTRACTION, not a text document. Communicate through shapes, colors, icons, and layout — NOT paragraphs.
-- **Component labels**: MAX 3-5 words (e.g., "LLM Semantic Detection", NOT "Uses a large language model to analyze skill instructions and classify them as benign or malicious")
-- **Connection labels**: MAX 1-3 words (e.g., "Benign", "Malicious", "Repaired")
-- **Annotations**: MAX 1 short phrase if essential (e.g., "F1=0.95", "N=18K"). Skip if not critical.
-- **NO sentences or descriptions inside components**. The caption and paper body handle explanations.
-- **NO bullet lists, paragraphs, or multi-line text blocks** inside any component.
-- If a component needs explanation beyond its label, use a small icon to convey meaning visually instead.
-- Total visible text in the entire figure should be under ~50 words.
+### 6. TEXT vs VISUAL BALANCE (CRITICAL)
+**Text must be SHORT, but visuals must be RICH.** These are different things.
+
+TEXT rules (keep it minimal):
+- Component labels: MAX 3-5 words. NO sentences inside components.
+- Connection labels: MAX 1-3 words (e.g., "Benign", "Repaired")
+- NO paragraphs, bullet lists, or multi-line text inside components
+- Total visible text in the figure: under ~60 words
+
+VISUAL rules (make it rich and detailed):
+- Each component should have a DETAILED, recognizable icon (not a simple flat shape — e.g., a magnifying glass hovering over a document for detection, gears with a wrench for repair, a shield with embedded lock for security, a brick wall with a scanning beam for firewall)
+- Show internal sub-structure visually: nested mini-elements, small thumbnails, overlapping shapes
+- Use visual metaphors: a funnel for filtering, a pipeline for flow, stacked layers for hierarchy
+
+LAYOUT rules (keep it compact):
+- MINIMIZE whitespace between components. Pack elements closely.
+- Components should feel tightly arranged with clear, short connections
+- Avoid large empty areas — if there's space, add a visual annotation or detail
+- The diagram should feel DENSE and INFORMATIVE even without reading the text
 
 IMPORTANT: Do NOT include font sizes (e.g., "12pt"), hex color codes (e.g., "#E6F3FF"), or CSS-like properties in component descriptions. Those are for the style guide only. Just describe WHAT to draw — shapes, labels, connections, zones, icons — in plain language. The image generator will interpret font specs as literal text to render.
 
```
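The TEXT rules and the IMPORTANT note are instructions to the planner model; the diff shows no code that enforces them. If one wanted to check a planner's output mechanically, a lint pass might look like the sketch below. The limits (5, 3, and ~60 words) come from the new prompt; `lint_labels`, `find_style_leaks`, and both regexes are hypothetical, and counting words by splitting on whitespace is a simplification.

```python
import re
from typing import Iterable, List

# Word limits copied from the new planner prompt (section 6, TEXT rules)
MAX_COMPONENT_WORDS = 5
MAX_CONNECTION_WORDS = 3
MAX_TOTAL_WORDS = 60

# Style-guide details the prompt says must not appear in component descriptions
FONT_SIZE_RE = re.compile(r"\b\d+(?:\.\d+)?\s?pt\b")                 # e.g. "12pt", "10.5 pt"
HEX_COLOR_RE = re.compile(r"#(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{3})\b")  # e.g. "#E6F3FF"


def word_count(text: str) -> int:
    # Whitespace split: a rough stand-in for "visible words"
    return len(text.split())


def lint_labels(component_labels: Iterable[str], connection_labels: Iterable[str]) -> List[str]:
    """Return readable violations of the TEXT rules (an empty list means none found)."""
    comps, conns = list(component_labels), list(connection_labels)
    problems: List[str] = []
    for label in comps:
        if word_count(label) > MAX_COMPONENT_WORDS:
            problems.append(f"component label too long: {label!r}")
    for label in conns:
        if word_count(label) > MAX_CONNECTION_WORDS:
            problems.append(f"connection label too long: {label!r}")
    total = sum(word_count(t) for t in comps + conns)
    if total > MAX_TOTAL_WORDS:
        problems.append(f"total visible text is {total} words (limit ~{MAX_TOTAL_WORDS})")
    return problems


def find_style_leaks(description: str) -> List[str]:
    """Font sizes and hex colors that leaked from the style guide into a description."""
    return FONT_SIZE_RE.findall(description) + HEX_COLOR_RE.findall(description)
```

A label like "LLM Semantic Detection" passes, while the long-form counter-example from the removed prompt text would be flagged as a component label over the five-word limit.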