thepulimaangani is a modern web application for Tamil prosody analysis, built on a hybrid architecture that pairs a React frontend with a Rust WebAssembly backend for high-performance text processing.
- Framework: TanStack Start (React-based full-stack framework)
- Routing: TanStack Router (file-based routing)
- Styling: Tailwind CSS with custom design system
- Build Tool: Vite
- Testing: Vitest
- Language: TypeScript
- Language: Rust
- Compilation Target: WebAssembly (wasm32-unknown-unknown)
- Build Tool: wasm-pack
- Serialization: serde_json for data interchange
- Frontend: React 19, TanStack Router, Tailwind CSS
- Backend: wasm-bindgen, serde, regex, unicode-segmentation
The frontend is organized as follows:
src/
├── components/ # Reusable UI components
│ ├── Header.tsx # Site header with navigation
│ ├── Footer.tsx # Site footer
│ └── LookToggle.tsx # Real / Redpill look
├── routes/ # File-based routing
│ ├── __root.tsx # Root layout component
│ ├── index.tsx # Main application page
│ └── about.tsx # About page
├── styles.css # Global styles and Tailwind imports
├── router.tsx # Router configuration
└── wasm/ # WebAssembly module bindings
    ├── thepulimaangani_parser.js
    └── thepulimaangani_parser_bg.wasm
The Rust parser handles all Tamil prosody analysis:
tamil-seiyul-alagi/
├── src/
│ ├── lib.rs # parse_poem, parse_poem_wasm, ParseResult
│ ├── word_scope.rs # Linguistic words → syllables
│ ├── syllable_builder.rs # Ner/Nirai syllables
│ ├── foot.rs / foot_pattern.rs # One foot per word; Ner-Nirai pattern string
│ ├── linkage.rs # Consecutive-foot edges; table-driven Talai (தளை; issue #36)
│ ├── metre/ # MetreType + heuristic hypotheses (`prediction.rs`, …)
│ ├── poem_tree.rs # Structured poem tree
│ ├── presentation.rs # Human labels; embedded in `ParseResult.presentation` (WASM JSON)
├── pkg/ # Generated WebAssembly bindings
├── Cargo.toml # Rust dependencies
└── target/ # Build artifacts
Accuracy note: User-facing copy sometimes describes classical feet and full Talai (தளை) sets; the shipped WASM JSON uses Ner/Nirai foot patterns, table-driven linkage from the previous foot’s last acai and the next foot’s first acai (issue #36), and heuristic metre ranking. See tamil-seiyul-alagi/MACHINE_FIRST_SPEC.md and QUALITY_CRAP_BASELINE.md.
- Vite Config (`vite.config.ts`): Frontend build configuration with TanStack Start integration
- TypeScript Config (`tsconfig.json`): Type checking and compilation
- Package Config (`package.json`): Node.js dependencies and scripts
The WebAssembly parser is built separately and its artifacts are copied to src/wasm/ for bundling. These generated files are not tracked in git to keep the repository clean and ensure builds are reproducible from source.
Build Process:
- Rust Compilation: `wasm-pack build --target web --out-dir pkg` generates JavaScript bindings and the WASM binary in `tamil-seiyul-alagi/pkg/`
- File Copy: Generated files are automatically copied from `tamil-seiyul-alagi/pkg/` to `src/wasm/` via `pnpm run build:wasm`
- Vite Bundling: Vite processes the WASM files as static assets, serving them with proper MIME types
- Dynamic Import: Frontend uses `import('../wasm/thepulimaangani_parser.js')` for lazy loading
- Runtime Connection: JavaScript bindings initialize the WASM module and expose the `parse_poem_wasm()` function
Why Not Track WASM Files: Binary files are excluded from version control to avoid repository bloat and ensure that all builds are generated from the source Rust code, maintaining build reproducibility.
- User Input: Tamil text entered in React component
- WASM Call: Frontend calls `parse_poem_wasm(text)` (Rust `parse_poem` with default options: `uyir_u` normalization on)
- Parsing Pipeline (current implementation):
  - Text preprocessing and normalization
  - Syllable detection (நேர் / நிரை) per linguistic word
  - Feet: one foot per word; `foot_type` is a hyphenated Ner/Nirai pattern (not classical தேமா names in JSON)
  - Metre: ranked hypotheses; simple heuristics, not full classical rule engines yet
  - Parse features (optional): when metre detection runs, `parse_features` carries a versioned 51-float snapshot for training / UI diagnostics (see `tamil-seiyul-alagi/PARSE_FEATURES.md`)
  - Linkage: consecutive feet with line/word positions; `linkage_type` = coarse family (`VenTalai`, `AciriyaTalai`, `KaliTalai`, `VanjiTalai`) and `linkage_special_type` = issue #36 row (e.g. `VencirVenTalai`); `VenTalai` / `Unknown` only for malformed/empty feet
- Result Serialization: `ParseResult` to JSON in the browser
- Display: React reads JSON via `adaptWasmJsonToParsedPoem`; `presentation` from WASM carries Tamil metre / foot / தளை labels (canonical). The app may still map enums locally when `presentation` is absent (older builds).
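The call, serialization, and display steps above can be sketched end to end. This is a minimal sketch under stated assumptions, not the project's adapter: the `WasmParseJson` shape (a top-level `feet` array and `presentation.feet` as an array of strings), the `DisplayFoot` type, and the `analyze` function are hypothetical, and `parse` stands in for the `parse_poem_wasm` binding so the example runs without the WASM module.

```typescript
// Hypothetical, trimmed-down view of the JSON produced by parse_poem_wasm.
// The real ParseResult has more fields; these names are assumptions.
interface WasmParseJson {
  feet: { foot_type: string }[];
  presentation?: { feet: string[] };
}

interface DisplayFoot {
  pattern: string; // machine-readable Ner/Nirai pattern
  label: string; // label shown in the UI
}

// `parse` stands in for the parse_poem_wasm binding (text in, JSON string out).
function analyze(text: string, parse: (text: string) => string): DisplayFoot[] {
  const json = JSON.parse(parse(text)) as WasmParseJson;
  return json.feet.map((foot, i) => ({
    pattern: foot.foot_type,
    // Prefer the canonical label from WASM; fall back to the raw pattern
    // when `presentation` is absent (older builds).
    label: json.presentation?.feet[i] ?? foot.foot_type,
  }));
}
```

Passing the parser in as a parameter keeps the display logic testable in Vitest without loading the generated bindings.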
The parser implements traditional Tamil prosody rules:
- நேர் (Ner): Simple syllables with consonant-vowel patterns
- நிரை (Nirai): Complex syllables with consonant-vowel-consonant patterns
The engine groups syllables into one foot per linguistic word and sets foot_type to a machine-readable Ner/Nirai sequence (for example Ner-Ner). ParseResult.presentation.feet carries the classical Tamil foot label for each foot in poem order; the web app prefers that field when present.
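Because `foot_type` is a plain hyphenated string, consumers can split it back into units. The helper below is an illustrative sketch only; `Acai`, `isAcai`, and `parseFootPattern` are hypothetical names, and it assumes the only units that appear are `Ner` and `Nirai`.

```typescript
type Acai = "Ner" | "Nirai";

function isAcai(unit: string): unit is Acai {
  return unit === "Ner" || unit === "Nirai";
}

// Split a hyphenated foot_type such as "Ner-Nirai" into its units.
// Throws on anything that is not a Ner/Nirai sequence, including an empty string.
function parseFootPattern(footType: string): Acai[] {
  return footType.split("-").map((unit) => {
    if (!isAcai(unit)) {
      throw new Error(`Unexpected unit "${unit}" in foot_type "${footType}"`);
    }
    return unit;
  });
}
```

For example, `parseFootPattern("Ner-Nirai")` yields a two-element array, while a malformed value raises an error instead of being displayed silently.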
The metre/ module produces hypotheses with scores; it does not yet encode full classical constraints for வெண்பா, வெண்கலிப்பா, ஆசிரியப்பா, கலிப்பா, etc. Treat catalogue metres as targets for MACHINE_FIRST_SPEC.md, not guarantees from the current build.
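Since the output is a scored list rather than a verdict, a consumer might surface the top hypothesis only when it clearly leads. The sketch below assumes a hypothetical `{ metre, score }` shape in which a higher score is better; the field names, the `minMargin` threshold, and both functions are illustrative and not taken from the parser.

```typescript
// Hypothetical shape for one metre hypothesis; field names are assumptions.
interface MetreHypothesis {
  metre: string;
  score: number; // assumed: higher means more likely
}

// Return hypotheses sorted best-first without mutating the input.
function rankHypotheses(hypotheses: readonly MetreHypothesis[]): MetreHypothesis[] {
  return [...hypotheses].sort((a, b) => b.score - a.score);
}

// Report the top hypothesis only when it leads the runner-up by `minMargin`.
function bestGuess(
  hypotheses: readonly MetreHypothesis[],
  minMargin = 0.1
): MetreHypothesis | undefined {
  const [first, second] = rankHypotheses(hypotheses);
  if (!first) return undefined;
  if (second && first.score - second.score < minMargin) return undefined;
  return first;
}
```

Returning `undefined` for a near tie lets the UI show the full ranked list instead of overstating a heuristic result.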
Consecutive feet get a linkage record with positions plus linkage_type / linkage_special_type. ParseResult.presentation.talai carries the full Tamil தளை string per bond (same indices as linkage). The Structure tab prefers presentation.talai when present. In English prose use Talai (capital T at sentence start), not Thalai (that suggests தலை “head”).
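The preference order described above can be written as a small fallback chain. This is a sketch under assumptions: `LinkageRecord` models only the two fields named in the text, `presentationTalai` is assumed to be an array of strings aligned by index with the linkage records, and `talaiLabels` is a hypothetical helper rather than the app's code.

```typescript
// Hypothetical shape; only linkage_type / linkage_special_type come from the text.
interface LinkageRecord {
  linkage_type: string;
  linkage_special_type?: string;
}

// One label per bond: presentation.talai if present, else the specific
// linkage_special_type, else the coarse linkage_type family.
function talaiLabels(
  linkage: readonly LinkageRecord[],
  presentationTalai?: readonly string[]
): string[] {
  return linkage.map(
    (bond, i) => presentationTalai?.[i] ?? bond.linkage_special_type ?? bond.linkage_type
  );
}
```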
- Performance: Native-speed text processing in the browser
- Bundle Size: Efficient compression of parsing logic
- Memory Safety: Rust's memory safety guarantees
- Unicode Support: Robust handling of Tamil script (U+0B80-U+0BFF range)
- Test Coverage: 90%+ code coverage ensuring reliability of complex linguistic algorithms
- Dynamic Imports: WebAssembly module loaded on-demand
- Lazy Loading: Parser initialization deferred until needed
- Error Handling: Graceful degradation for parsing failures
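On-demand loading and graceful degradation can be combined in a few lines. The sketch below is one possible approach, not the app's implementation: `lazyOnce`, `ParseOutcome`, and `safeParse` are hypothetical names, and the parser is modelled as a function from text to a JSON string.

```typescript
// Memoize an async initializer: the first call starts it and later calls reuse
// the same promise. A failed attempt is forgotten so the next call can retry.
function lazyOnce<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = init().catch((err) => {
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

type ParseOutcome = { ok: true; json: string } | { ok: false; message: string };

// Turn loading or parsing failures into a value the UI can render.
async function safeParse(
  getParser: () => Promise<(text: string) => string>,
  text: string
): Promise<ParseOutcome> {
  try {
    const parse = await getParser();
    return { ok: true, json: parse(text) };
  } catch (err) {
    return { ok: false, message: err instanceof Error ? err.message : String(err) };
  }
}
```

Memoizing the promise rather than the resolved module means two components that request the parser at the same time share a single initialization.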
The WebAssembly parser is built using the automated build:wasm script:
pnpm run build:wasm
This script:
- Builds the Rust parser with `wasm-pack build --target web --out-dir pkg`
- Copies the generated WebAssembly files to `src/wasm/` where Vite can access them
Why Manual Copy Was Needed: Vite requires WebAssembly files to be in the source directory for proper bundling and serving with the correct application/wasm MIME type. The build script automates this copy process.
pnpm run dev # Development server (http://localhost:3000)
pnpm run build # Production build (includes WASM build)
pnpm run build:only # Frontend build only (assumes WASM is already built)
pnpm run test # Run complete test suite (Rust + Frontend)
pnpm run test:rust # Run Rust tests
pnpm run test:frontend # Run frontend tests (Vitest)
Development Workflow: When modifying the Rust parser, run `pnpm run build:wasm` to rebuild and copy the WebAssembly files. For frontend-only changes, `pnpm run dev` will hot-reload automatically.
Note: The src/wasm/ directory is gitignored since it contains generated files. Always run pnpm run build:wasm after cloning the repository or modifying the Rust parser.
The frontend connects to the WebAssembly parser through:
- Dynamic Import: `import('../wasm/thepulimaangani_parser.js')` loads the WASM bindings
- Initialization: `wasm.default()` initializes the WebAssembly module
- Function Call: `wasm.parse_poem_wasm(text)` executes the Rust parsing logic
- Result Processing: JSON results are parsed and displayed in the UI
Vite automatically handles serving the .wasm files with the correct application/wasm MIME type required for WebAssembly instantiation.
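The connection sequence described in this section can be sketched as a single function. The `ParserModule` interface is an assumption about the generated bindings (it models only the two members named above and assumes `parse_poem_wasm` returns a JSON string), and `load` is injected so the example runs without the generated files; in the app it would be `() => import('../wasm/thepulimaangani_parser.js')`.

```typescript
// Assumed shape of the generated bindings; only the members used here are modelled.
interface ParserModule {
  default: () => Promise<unknown>; // initializes the WebAssembly module
  parse_poem_wasm: (text: string) => string; // assumed to return JSON text
}

// `load` is a hypothetical injection point for the dynamic import.
async function parsePoem(
  text: string,
  load: () => Promise<ParserModule>
): Promise<unknown> {
  const wasm = await load(); // 1. dynamic import
  await wasm.default(); // 2. initialization
  const json = wasm.parse_poem_wasm(text); // 3. function call
  return JSON.parse(json); // 4. result processing
}
```

Initialization must complete before `parse_poem_wasm` is called, which is why the second step is awaited rather than fired in the background.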
- Rust: Strong typing with serde serialization
- TypeScript: Typed adapters and tests for WASM JSON shapes (`adaptWasmJsonToParsedPoem`, Vitest)
- Runtime validation: not via a separate JSON Schema pipeline; invalid shapes surface as TypeScript/test failures. Add explicit schema validation only if the project adopts it.
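If the project did adopt runtime checks, a hand-written type guard is one lightweight option short of a full JSON Schema pipeline. The sketch below is purely illustrative: `ParseJsonLike` is a hypothetical minimal shape (a top-level `feet` array whose items carry a string `foot_type`), not the actual `ParseResult` contract.

```typescript
// Hypothetical minimal shape; the real ParseResult has more fields.
interface ParseJsonLike {
  feet: { foot_type: string }[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrow an unknown JSON value to the minimal shape the UI relies on.
function isParseJsonLike(value: unknown): value is ParseJsonLike {
  if (!isRecord(value) || !Array.isArray(value.feet)) return false;
  return value.feet.every(
    (foot: unknown) => isRecord(foot) && typeof foot.foot_type === "string"
  );
}
```

A guard like this would sit between `JSON.parse` and the adapter, so malformed output from an older or mismatched WASM build is caught at the boundary.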
- Batch Processing: Analyze multiple poems simultaneously
- Export Formats: JSON, CSV, and PDF output options
- Additional Metres: Support for advanced classical metres (விருத்தம், வஞ்சிப்பா variants)
- Comparative Analysis: Side-by-side comparison of different metres
- Educational Mode: Interactive learning tools for Tamil prosody
- Service Worker: Offline parsing capabilities
- Web Workers: Background processing for large texts
- IndexedDB: Client-side result caching
- Progressive Web App: Installable application features
The application is designed for static deployment:
- Build Output: Self-contained static files
- WebAssembly: Embedded in the bundle
- CDN Ready: No server-side dependencies required
- HTTPS Recommended: WebAssembly itself runs without a secure context, but the planned service worker and installable PWA features require one
This architecture balances modern web development practices against the performance requirements of complex linguistic analysis.