---
name: comic-transcriber
description: Transcribe a comic book page image into a structured play-script-style markdown transcript. Use this skill whenever the user uploads a comic page image (JPG, PNG, or GIF) and wants it transcribed, converted to text, turned into a script, or described panel-by-panel. Also trigger when the user mentions "comic transcript", "comic script", "panel description", "comic to text", "transcribe this comic", "comic page to markdown", or wants dialogue and action extracted from a comic page image. This skill handles single-page transcription (one image in, one markdown file out). It does NOT handle PSD files, multi-page batch processing, or non-comic images.
---

Comic Page Transcriber

Convert a single comic book page image (JPG, PNG, or GIF) into a play-script-style markdown transcript that captures dialogue, action, scene descriptions, and sound effects.

Before you begin: Gather context

Before analysing the image, ask the user for a character key — a list mapping character names to brief visual descriptions so you can label dialogue and action correctly. Frame it like this:

Before I transcribe this page, could you give me a character key? Just the names and a short visual description so I can identify who's who — for example:

  • NOVA: Woman with silver hair and a blue bodysuit
  • GRIM: Tall figure in a dark cloak, skull mask

If the user says they don't know the characters or wants you to improvise, fall back to descriptive labels (e.g., SILVER-HAIRED WOMAN, CLOAKED FIGURE) and stay consistent across the transcript. But always ask first.

Also ask the user what page number this is (or whether they'd like you to default to Page 1).
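The skill handles the character key conversationally, so no code is required. Still, it can help to see exactly how a pasted key maps onto the transcript's character list. The Python sketch below shows one way to read it. It assumes the user replies with one `NAME: description` entry per line, optionally bulleted. `parse_character_key` and `KEY_LINE` are hypothetical names and are not part of the skill.

```python
import re

# Hypothetical pattern: an optional bullet ("•", "-", or "*"), a name, a colon, a description.
KEY_LINE = re.compile(r"^\s*(?:[•\-*]\s*)?([^:]+?)\s*:\s*(.+?)\s*$")


def parse_character_key(text: str) -> dict[str, str]:
    """Map each upper-cased character name to its visual description."""
    key: dict[str, str] = {}
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:  # lines without "NAME: description" are ignored
            name, description = match.groups()
            key[name.upper()] = description
    return key


sample = """• NOVA: Woman with silver hair and a blue bodysuit
• GRIM: Tall figure in a dark cloak, skull mask"""
print(parse_character_key(sample))
```

Upper-casing the names keeps them in the same form the transcript uses for speaker labels.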

Reading the page

Analyse the comic page image visually. Read panels in standard Western comic order: left to right across each row (tier), then down to the next row. If the layout is ambiguous (overlapping panels, splash pages, inset panels), follow the most natural reading flow and note any ambiguity in a comment.
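Panel order is decided by looking at the page rather than by running code, but the row-by-row rule can be stated precisely. The Python sketch below is a minimal illustration. It assumes each panel has a hypothetical pixel bounding box (`Box`) with y growing downward. It also treats panels whose top edges lie within `row_tolerance` pixels of each other as one row. Both the type and the tolerance are illustrative assumptions.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical panel bounding box in pixels; y grows downward."""
    label: str
    x: int
    y: int
    w: int
    h: int


def western_reading_order(panels: list[Box], row_tolerance: int = 40) -> list[str]:
    """Return panel labels row by row from the top, left to right within each row."""
    rows: list[list[Box]] = []
    for box in sorted(panels, key=lambda b: b.y):
        # Assumed heuristic: tops within row_tolerance of the row's first panel share a row.
        if rows and abs(box.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [box.label for row in rows for box in sorted(row, key=lambda b: b.x)]


page = [
    Box("C", 0, 410, 800, 380),   # full-width bottom tier
    Box("B", 410, 12, 390, 380),  # top right, drawn slightly lower
    Box("A", 0, 0, 400, 390),     # top left
]
print(western_reading_order(page))  # ['A', 'B', 'C']
```

Some layouts defeat this simple heuristic, such as a tall panel beside two stacked ones. Those are exactly the ambiguous cases worth flagging in the transcript.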

For each panel, identify:

  1. Setting / Scene — Where are we? What does the environment look like? Note any significant changes from the previous panel.
  2. Characters present — Who is visible? What are they doing physically?
  3. Dialogue — Speech bubbles, in reading order within the panel (generally top-to-bottom, left-to-right). Distinguish between regular speech, whispers, shouts, thought bubbles, narration boxes, and off-panel dialogue.
  4. Sound effects — Onomatopoeia rendered as part of the art (e.g., "KRAKOOM", "THWIP").
  5. Action / Movement — What's happening physically? Motion lines, impacts, gestures.

Output format

Create a markdown file with this structure:

Metadata header

```markdown
# [Title or "Untitled Comic Page"]

**Page:** [number]
**Characters:**
- [NAME]: [visual description]
- [NAME]: [visual description]

---
```
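To show how the header fields fit together, the hypothetical Python function below fills in the template from three inputs: a page number, a name-to-description mapping, and an optional title. It falls back to "Untitled Comic Page" as the template specifies. The function name and signature are illustrative only.

```python
from typing import Optional


def metadata_header(page: int, characters: dict[str, str], title: Optional[str] = None) -> str:
    """Render the title, page number, character list, and closing rule."""
    lines = [f"# {title or 'Untitled Comic Page'}", "", f"**Page:** {page}", "**Characters:**"]
    # One "- NAME: description" bullet per character, in the order given.
    lines += [f"- {name}: {description}" for name, description in characters.items()]
    lines += ["", "---"]
    return "\n".join(lines)


print(metadata_header(4, {"NOVA": "Woman with silver hair in a blue bodysuit"}, "The Silver Vanguard, Issue 12"))
```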

Panel transcription

Each panel follows this pattern:

```markdown
## PAGE [N], PANEL [M]

_[Scene/setting description in italics. Only include if the setting changes or this is the first panel.]_

_[Action/staging description in italics: what characters are doing physically.]_

**[CHARACTER NAME]:** Dialogue goes here.

**[CHARACTER NAME]** _(whispering)_**:** Dialogue in a modified delivery.

> _[NARRATION]: Narration box text in a blockquote, italicised._

**SFX:** KRAKOOM

_[Action description for any movement/impact that follows the dialogue.]_
```
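The setting line should appear only in the first panel or when the setting changes. The Python sketch below shows that rule. `Panel`, its `location` field, and `render_panels` are all hypothetical. The sketch assumes each panel carries a short location label, used only to detect a change of scene, plus dialogue, SFX, and action lines that are already formatted.

```python
from dataclasses import dataclass, field


@dataclass
class Panel:
    """Hypothetical per-panel record assembled while reading the page."""
    number: int
    location: str  # short key used only to detect a setting change
    setting: str   # italic scene description for that location
    staging: str   # italic action/staging description
    lines: list[str] = field(default_factory=list)  # formatted dialogue, SFX, action


def render_panels(page: int, panels: list[Panel]) -> str:
    """Join panels into PAGE/PANEL sections, describing the setting only when needed."""
    sections: list[str] = []
    previous_location = None
    for panel in panels:
        parts = [f"## PAGE {page}, PANEL {panel.number}"]
        # True for the first panel (previous_location is None) or on a change of location.
        if panel.location != previous_location:
            parts.append(f"_{panel.setting}_")
        previous_location = panel.location
        parts.append(f"_{panel.staging}_")
        parts.extend(panel.lines)
        sections.append("\n\n".join(parts))
    return "\n\n".join(sections)
```

Comparing a short location key, rather than the full description, keeps a reworded description of the same place from triggering a repeated setting line.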

Formatting conventions

These conventions keep the transcript readable and consistent:

  • Plain dialogue is written **CHARACTER NAME:** Text, with the name and its colon bolded together.
  • Delivery modifiers go in parenthetical italics after the name: **NOVA** _(shouting)_**:**
  • Thought bubbles use the modifier _(thinking)_.
  • Off-panel dialogue uses the modifier _(off-panel)_.
  • Narration boxes use blockquote format: > _[NARRATION]: Text_ — or if attributed to a character, > _[NOVA - NARRATION]: Text_.
  • Sound effects get their own line: **SFX:** KRAKOOM — keep the original stylisation from the art.
  • Scene descriptions and action lines are in italics: _Nova leaps across the rooftop, cape trailing behind her._
  • Panel headers are H2: ## PAGE 1, PANEL 3
  • If the whole page is a single image (a splash page), label it: ## PAGE 1, SPLASH
  • If a panel is an inset within a larger panel, label it: ## PAGE 1, PANEL 3 (INSET)
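A compact way to pin these conventions down is a set of small string formatters. The Python sketch below assumes nothing beyond the rules listed above. The function names are hypothetical, and passing no panel number to `panel_header` to mean a splash page is an illustrative choice.

```python
from typing import Optional


def dialogue(name: str, text: str, modifier: Optional[str] = None) -> str:
    """Speech line; a modifier such as whispering, thinking, or off-panel goes in italics."""
    if modifier:
        return f"**{name}** _({modifier})_**:** {text}"
    return f"**{name}:** {text}"


def narration(text: str, speaker: Optional[str] = None) -> str:
    """Caption box as an italic blockquote, optionally attributed to a character."""
    label = f"{speaker} - NARRATION" if speaker else "NARRATION"
    return f"> _[{label}]: {text}_"


def sfx(sound: str) -> str:
    """Sound effect on its own line, stylisation passed through unchanged."""
    return f"**SFX:** {sound}"


def action(text: str) -> str:
    """Scene or action description in italics."""
    return f"_{text}_"


def panel_header(page: int, panel: Optional[int] = None, inset: bool = False) -> str:
    """H2 panel header; panel=None marks a splash page (an assumed convention)."""
    if panel is None:
        return f"## PAGE {page}, SPLASH"
    return f"## PAGE {page}, PANEL {panel}" + (" (INSET)" if inset else "")


print(panel_header(4, 2))
print(dialogue("NOVA", "Grim, are you in position?", "whispering"))
print(dialogue("GRIM", "Always.", "off-panel"))
```

These three calls reproduce the header and dialogue lines of Panel 2 in the example output.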

Example output

```markdown
# The Silver Vanguard, Issue 12

**Page:** 4
**Characters:**
- NOVA: Woman with silver hair in a blue bodysuit
- GRIM: Tall cloaked figure with a skull mask
- DISPATCH: Voice over comms, not visually present

---

## PAGE 4, PANEL 1

_A rain-soaked rooftop at night. The city skyline glows in the background. A water tower looms to the left._

_Nova crouches at the edge of the rooftop, peering down at the street below. Her hair is plastered to her face from the rain._

> _[DISPATCH - NARRATION]: All units, we have a Code Theta in the warehouse district._

**NOVA:** Copy that. I see movement on the third floor.

## PAGE 4, PANEL 2

_Close-up on Nova's face. Her eyes narrow._

**NOVA** _(whispering)_**:** Grim, are you in position?

**GRIM** _(off-panel)_**:** Always.

## PAGE 4, PANEL 3

_Wide shot. Grim drops from above, cloak billowing, landing on a fire escape across the alley._

**SFX:** KRANNNG

_The fire escape shudders under the impact._

**GRIM:** Subtlety was never my strong suit.

**NOVA** _(off-panel)_**:** Clearly.
```

Final steps

  1. Write the transcript to a .md file named after the comic or page (e.g., silver-vanguard-p4-transcript.md). If you don't know the comic title, use comic-page-[N]-transcript.md.
  2. Save to /mnt/user-data/outputs/ and present the file to the user.
  3. After presenting, ask: "Does this look accurate? I can adjust character names, fix any dialogue I misread, or change the level of scene description."
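The naming rule in step 1 can be sketched as a hypothetical Python `transcript_filename` helper. The example name `silver-vanguard-p4-transcript.md` drops both the leading "The" and the issue number. The sketch therefore assumes it receives only the series name and strips a leading "the". Both behaviours are inferred from that single example and are not stated rules.

```python
import re
from typing import Optional


def transcript_filename(page: int, series: Optional[str] = None) -> str:
    """Build the transcript file name, falling back when the title is unknown."""
    if not series:
        return f"comic-page-{page}-transcript.md"
    # Lowercase and collapse any run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", series.lower()).strip("-")
    # Assumption inferred from the example name: drop a leading "the".
    slug = re.sub(r"^the-", "", slug)
    return f"{slug}-p{page}-transcript.md"


print(transcript_filename(4, "The Silver Vanguard"))  # silver-vanguard-p4-transcript.md
print(transcript_filename(7))                         # comic-page-7-transcript.md
```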