Smart figure placement + compact layout with rich icons

JihaoXin · claude · JihaoXin · commit c70c22e8bcb1 · 2026-04-04T12:27:26.000Z
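- Agent decides figure placement (full_width vs single_column) per figure;
  the placement selects the aspect ratio and figure width passed to PaperBanana/NanoBanana
- Full-width (figure*): 21:9 on dual-column templates; single-column templates
  (e.g. NeurIPS) always use textwidth at 16:10 regardless of placement
- Single-column placement (figure): compact 4:3 at columnwidth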
- Planner prompt: text SHORT (labels ≤5 words) but icons RICH (detailed
  visual metaphors), layout COMPACT (minimize whitespace)
- PaperBanana visual_intent: strict no-title, no-bullets, no-numbers rules
- Critic: penalize text overload (sentences inside components → C ≤ 2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
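The placement rule given to the agent in the planner prompt (4+ components, multiple stages, or branching paths means "full_width"; a simple 2-3 component relationship means "single_column") can be sketched as a plain function. This is a hypothetical helper for illustration only: in this commit the decision is made by the LLM agent, and `FigureSummary` and `decide_placement` are assumed names, not part of `ark`.

```python
from dataclasses import dataclass


@dataclass
class FigureSummary:
    """Hypothetical summary of a planned figure (not part of ark)."""
    num_components: int
    num_stages: int = 1
    has_branching: bool = False


def decide_placement(fig: FigureSummary) -> str:
    """Mirror the prompt's decision criteria for figure placement."""
    # 4+ components, multiple stages, or branching paths need horizontal space.
    if fig.num_components >= 4 or fig.num_stages > 1 or fig.has_branching:
        return "full_width"
    # A simple 2-3 component relationship fits in one column.
    return "single_column"


print(decide_placement(FigureSummary(num_components=6, num_stages=3)))  # full_width
print(decide_placement(FigureSummary(num_components=2)))                # single_column
```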
diff --git a/ark/compiler.py b/ark/compiler.py
@@ -520,11 +520,18 @@ def _generate_nano_banana_figures(self):
     "name": "fig_overview",
     "caption": "System architecture overview",
     "section_context": "Detailed description of what the figure should show, including all components, connections, data flows, and key metrics mentioned in the paper. Be as detailed as possible — the more context, the better the generated figure.",
-    "latex_label": "fig:overview"
+    "latex_label": "fig:overview",
+    "placement": "full_width"
   }}
 ]
 ```
 
+For each figure, you MUST decide the "placement" field:
+- "full_width": for complex figures — multi-stage pipelines, system architectures with many components, diagrams that need horizontal space. Uses `\\begin{{figure*}}` spanning all columns.
+- "single_column": for simpler figures — single concept with few components, small diagrams. Uses `\\begin{{figure}}` in one column.
+
+Decision criteria: if the figure has 4+ components, multiple stages, or branching paths → "full_width". If it's a simple 2-3 component relationship → "single_column".
+
 Only include figures that:
 1. Are referenced in LaTeX but have no existing file, OR
 2. Are concept/architecture/mechanism diagrams that could be improved with AI generation
@@ -563,23 +570,36 @@ def _generate_nano_banana_figures(self):
         venue_format = self.config.get("venue_format", venue)
         geo = get_geometry(venue_format) if venue_format else {"columnwidth_in": 3.333}
 
-        # Determine aspect ratio based on venue
         columns = geo.get("columns", 1)
-        aspect_ratio = "16:9" if columns == 1 else "16:10"
+        col_w = geo.get("columnwidth_in", 3.333)
+        text_w = geo.get("textwidth_in", 7.0)
 
         generated = 0
         for fig in figures:
             name = fig.get("name", "concept_fig")
             caption = fig.get("caption", "")
             section_ctx = fig.get("section_context", "")
+            placement = fig.get("placement", "full_width")
             output_path = self.figures_dir / f"{name}.png"
 
             # Skip if file already exists and is non-empty
             if output_path.exists() and output_path.stat().st_size > 0:
                 self.log(f"  Skipping {name}: already exists", "INFO")
                 continue
 
-            self.log(f"  Generating: {name}...", "INFO")
+            # Determine aspect ratio and width based on agent's placement decision
+            if columns == 1:
+                # Single-column templates (NeurIPS): always use textwidth
+                fig_width = text_w
+                aspect_ratio = "16:10"
+            elif placement == "full_width":
+                fig_width = text_w
+                aspect_ratio = "21:9"  # wide for spanning both columns
+            else:
+                fig_width = col_w
+                aspect_ratio = "4:3"  # compact for single column
+
+            self.log(f"  Generating: {name} (placement={placement}, {fig_width:.1f}in, ratio={aspect_ratio})...", "INFO")
 
             # Try PaperBanana pipeline first (best quality)
             ok = self._try_paperbanana(
@@ -603,7 +623,7 @@ def _generate_nano_banana_figures(self):
                     api_key=api_key,
                     model=self.config.get("nano_banana_model", "pro"),
                     venue=venue,
-                    column_width_in=geo.get("columnwidth_in", 3.333),
+                    column_width_in=fig_width,
                     max_critic_rounds=self.config.get("nano_banana_critic_rounds", 3),
                     log_fn=self.log,
                 )
@@ -684,7 +704,7 @@ def _try_paperbanana(self, name: str, caption: str, paper_context: str,
             data = {
                 "candidate_id": name,
                 "content": paper_context,
-                "visual_intent": f"{caption} IMPORTANT: Keep text minimal — component labels MAX 3-5 words, connection labels MAX 1-3 words. Use icons and visual layout to convey meaning, not text. Total visible text under 50 words.",
+                "visual_intent": f"{caption} STYLE: Labels MAX 3-5 words, NO sentences inside components. BUT make icons detailed and elaborate (not simple flat shapes). Layout should be COMPACT — minimize whitespace, pack components closely. The figure should feel dense and information-rich through its visual elements, not through text.",
                 "additional_info": {"rounded_ratio": aspect_ratio},
             }
 
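The width/ratio branch added to `_generate_nano_banana_figures` can be isolated as a pure function, which also makes the resulting figure heights explicit. This is a sketch that assumes the default geometry from the diff (3.333in column, 7.0in text width); `figure_geometry` is a hypothetical name, and the height computation is illustrative arithmetic, since the diff only passes the width and the ratio string.

```python
def figure_geometry(columns: int, placement: str,
                    col_w: float = 3.333, text_w: float = 7.0) -> tuple[float, str, float]:
    """Return (width_in, aspect_ratio, height_in) following the branch in the diff."""
    if columns == 1:
        # Single-column templates (e.g. NeurIPS): always use textwidth.
        width, ratio = text_w, "16:10"
    elif placement == "full_width":
        # figure* spanning both columns: wide ratio.
        width, ratio = text_w, "21:9"
    else:
        # Any other placement value falls back to one compact column.
        width, ratio = col_w, "4:3"
    w, h = (int(x) for x in ratio.split(":"))
    height = width * h / w  # height implied by the aspect ratio
    return width, ratio, height


print(figure_geometry(2, "full_width"))     # (7.0, '21:9', 3.0)
print(figure_geometry(1, "single_column"))  # (7.0, '16:10', 4.375)
```

With these defaults a full-width 21:9 figure is 3.0in tall and a single-column 4:3 figure is about 2.5in tall. Note that, as in the diff, the `else` branch catches every value other than `"full_width"`, so a misspelled placement from the agent silently becomes a single-column figure.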
diff --git a/ark/nano_banana.py b/ark/nano_banana.py
@@ -322,15 +322,25 @@ def _run_planner(client, text_model_id: str, figure_name: str, caption: str,
 - NO drop shadows, NO gradients, NO 3D effects. Flat design with semantic richness.
 - Every visual element should encode meaning — if a color/shape/line doesn't convey information, remove it
 
-### 6. TEXT BREVITY (CRITICAL — the #1 mistake is too much text)
-- A diagram is a VISUAL ABSTRACTION, not a text document. Communicate through shapes, colors, icons, and layout — NOT paragraphs.
-- **Component labels**: MAX 3-5 words (e.g., "LLM Semantic Detection", NOT "Uses a large language model to analyze skill instructions and classify them as benign or malicious")
-- **Connection labels**: MAX 1-3 words (e.g., "Benign", "Malicious", "Repaired")
-- **Annotations**: MAX 1 short phrase if essential (e.g., "F1=0.95", "N=18K"). Skip if not critical.
-- **NO sentences or descriptions inside components**. The caption and paper body handle explanations.
-- **NO bullet lists, paragraphs, or multi-line text blocks** inside any component.
-- If a component needs explanation beyond its label, use a small icon to convey meaning visually instead.
-- Total visible text in the entire figure should be under ~50 words.
+### 6. TEXT vs VISUAL BALANCE (CRITICAL)
+**Text must be SHORT, but visuals must be RICH.** These are different things.
+
+TEXT rules (keep it minimal):
+- Component labels: MAX 3-5 words. NO sentences inside components.
+- Connection labels: MAX 1-3 words (e.g., "Benign", "Repaired")
+- NO paragraphs, bullet lists, or multi-line text inside components
+- Total visible text in the figure: under ~60 words
+
+VISUAL rules (make it rich and detailed):
+- Each component should have a DETAILED, recognizable icon (not a simple flat shape — e.g., a magnifying glass hovering over a document for detection, gears with a wrench for repair, a shield with embedded lock for security, a brick wall with a scanning beam for firewall)
+- Show internal sub-structure visually: nested mini-elements, small thumbnails, overlapping shapes
+- Use visual metaphors: a funnel for filtering, a pipeline for flow, stacked layers for hierarchy
+
+LAYOUT rules (keep it compact):
+- MINIMIZE whitespace between components. Pack elements closely.
+- Components should feel tightly arranged with clear, short connections
+- Avoid large empty areas — if there's space, add a visual annotation or detail
+- The diagram should feel DENSE and INFORMATIVE even without reading the text
 
 IMPORTANT: Do NOT include font sizes (e.g., "12pt"), hex color codes (e.g., "#E6F3FF"), or CSS-like properties in component descriptions. Those are for the style guide only. Just describe WHAT to draw — shapes, labels, connections, zones, icons — in plain language. The image generator will interpret font specs as literal text to render.
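The TEXT rules in the planner prompt are stated as hard limits (component labels at most 5 words, connection labels at most 3, under ~60 visible words), and the commit message says the critic caps its score at 2 when components contain sentences. In the commit both are enforced by LLM prompts; the following is a hypothetical deterministic check for illustration, assuming labels are available as plain strings and that "C" is an integer critic score. The sentence heuristic and all names here are assumptions, not `ark` code.

```python
import re
from dataclasses import dataclass, field

# Limits taken from the planner prompt; the sentence heuristic is an assumption.
MAX_COMPONENT_WORDS = 5
MAX_CONNECTION_WORDS = 3
MAX_TOTAL_WORDS = 60  # soft limit ("under ~60 words")


def word_count(text: str) -> int:
    return len(text.split())


def looks_like_sentence(text: str) -> bool:
    # Over the label budget, or ending in sentence punctuation.
    return word_count(text) > MAX_COMPONENT_WORDS or bool(re.search(r"[.!?]$", text.strip()))


@dataclass
class TextAudit:
    violations: list[str] = field(default_factory=list)
    has_sentences: bool = False
    total_words: int = 0


def audit_text(component_labels: list[str], connection_labels: list[str]) -> TextAudit:
    audit = TextAudit()
    for label in component_labels:
        if word_count(label) > MAX_COMPONENT_WORDS:
            audit.violations.append(f"component label too long: {label!r}")
        if looks_like_sentence(label):
            audit.has_sentences = True
    for label in connection_labels:
        if word_count(label) > MAX_CONNECTION_WORDS:
            audit.violations.append(f"connection label too long: {label!r}")
    audit.total_words = sum(map(word_count, component_labels + connection_labels))
    if audit.total_words > MAX_TOTAL_WORDS:
        audit.violations.append(f"total visible text: {audit.total_words} words")
    return audit


def cap_score(score: int, audit: TextAudit) -> int:
    # Mirrors the commit message rule: sentences inside components -> C <= 2.
    return min(score, 2) if audit.has_sentences else score
```

For example, `audit_text(["LLM Semantic Detection"], ["Benign"])` reports no violations, while the long label from the removed prompt example ("Uses a large language model to analyze ...") is flagged as both too long and a sentence, so `cap_score(4, ...)` returns 2.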