
feat: add reddit-fetcher plugin#391

Draft
nsheaps wants to merge 4 commits into main from feat/reddit-fetcher-plugin

Conversation


@nsheaps commented on Apr 7, 2026

Summary

  • Adds reddit-fetcher plugin for fetching public Reddit content via the .json API (no authentication required)
  • Bash + curl + jq script with 4 subcommands: subreddit, post, search, user
  • Markdown-formatted output optimized for LLM consumption
  • Includes rate limiting (6s between requests), NSFW filtering, comment depth cap (3 levels), and content truncation
  • Skill documentation, usage rules (content quality skepticism), and README
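The 6-second gap mentioned above is straightforward to enforce in bash with a timestamp file; a minimal sketch (the `rate_limit_wait` name and the timestamp-file location are assumptions for illustration, not the plugin's actual implementation):

```shell
# Hypothetical sketch of a 6-second minimum gap between requests.
# RATE_FILE and rate_limit_wait are illustrative names, not from the plugin.
RATE_FILE="${TMPDIR:-/tmp}/reddit-fetch-demo.$$.last"
MIN_GAP=6

rate_limit_wait() {
  local now last elapsed
  now=$(date +%s)
  if [[ -f "$RATE_FILE" ]]; then
    last=$(<"$RATE_FILE")
    elapsed=$(( now - last ))
    # Sleep only for the remainder of the gap since the previous request.
    if (( elapsed < MIN_GAP )); then
      sleep $(( MIN_GAP - elapsed ))
    fi
  fi
  date +%s > "$RATE_FILE"
}

rate_limit_wait   # first call records a timestamp and returns immediately
```

A second call within 6 seconds would sleep for the remainder of the window before proceeding.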

Files

  • plugins/reddit-fetcher/.claude-plugin/plugin.json — plugin metadata (v0.1.0)
  • plugins/reddit-fetcher/scripts/reddit-fetch.sh — core fetcher script
  • plugins/reddit-fetcher/skills/fetch-reddit/SKILL.md — skill usage instructions
  • plugins/reddit-fetcher/rules/reddit-usage.md — guidelines for Reddit content quality
  • plugins/reddit-fetcher/README.md — overview and examples

Design Decisions

  • Bash + curl + jq over Node.js/Python: zero new dependencies, matches existing ai-mktpl plugin conventions
  • Skill-based over MCP server: simpler, no daemon management, rate limiting handled naturally
  • No authentication: public JSON API covers all read-only access to public content
  • Inspired by ClawHub reddit-readonly approach (direct JSON API, no auth)

Test Plan

  • --help flag outputs usage correctly
  • subreddit ClaudeCode --sort hot --limit 3 returns formatted markdown with posts
  • search "MCP server" --subreddit ClaudeCode --limit 2 returns search results
  • post <url> with a real Reddit post URL
  • user <username> with a real username
  • Rate limiting behavior on rapid successive calls
  • 404 handling for non-existent subreddits
  • NSFW filtering (default exclude, --include-nsfw to include)

Co-Authored-By: Jack Oat <jack-nsheaps[bot]@users.noreply.github.com>

Skill-based plugin that fetches Reddit content via the public .json API
(no auth required). Includes bash script with subreddit, post, search,
and user subcommands. Output is markdown-formatted for LLM consumption.
Enforces rate limiting (6s gap), comment depth cap (3 levels), and
content truncation for manageable output size.
@nsheaps self-assigned this on Apr 7, 2026
@nsheaps added the request-review label (requests a one-time review from the Claude review bot; the label is removed after review starts) on Apr 7, 2026
@henry-nsheaps (bot) removed the request-review label on Apr 7, 2026

github-actions Bot commented Apr 7, 2026

Plugin Version Status

Versions are auto-bumped in PRs. Manual bumps to higher versions are preserved.

Plugin         | Base  | Current | Action
reddit-fetcher | 0.0.0 | 0.1.0   | Already bumped


@henry-nsheaps (bot) left a comment


👍 Solid new plugin — a few improvements needed before merge

⚠️ Performance: 9× jq subprocess spawns per post in formatters (scales to 900+ at max limit)
⚠️ Input validation: subreddit/username interpolated into URLs without sanitization
⚠️ --limit not validated as numeric or bounded (README says max 100, script doesn't enforce)
⚠️ Temp files leak on SIGINT/SIGTERM (no trap cleanup)
✅ Well-structured script with clear separation of concerns
✅ Rate limiting, retry logic, and HTTP error handling are solid
✅ Good cross-platform date formatting (GNU + BSD fallback)
✅ Documentation (README, SKILL.md, rules) is thorough and accurate
✅ PR description matches code changes
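The cross-platform date handling called out above typically follows a try-GNU-then-BSD pattern; a sketch (the function name and output format are assumptions, not the script's actual code):

```shell
# Hypothetical sketch of a GNU/BSD date fallback like the one the review praises.
format_date() {
  local epoch="${1%.*}"   # Reddit's created_utc is a float; keep the integer part
  # GNU date takes -d @EPOCH; BSD/macOS date takes -r EPOCH.
  date -u -d "@${epoch}" '+%Y-%m-%d %H:%M UTC' 2>/dev/null \
    || date -u -r "${epoch}" '+%Y-%m-%d %H:%M UTC'
}

format_date 0   # prints 1970-01-01 00:00 UTC
```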


Code Quality

The script is well-organized with good separation between helpers, formatters, and subcommands. Constants are properly readonly, set -euo pipefail is set, and error messages are descriptive. The HTTP error handling covers the important status codes (429, 403, 404, 503) with appropriate behavior for each.
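That per-status dispatch can be isolated in a small helper; a hypothetical sketch (names, messages, and retry policy are illustrative, not the script's actual code):

```shell
# Hypothetical sketch of per-status dispatch for the codes the review lists.
# In the real script the status would come from curl, e.g.:
#   status=$(curl -sS -o "$body_file" -w '%{http_code}' "$url")
classify_status() {
  case "$1" in
    200)     echo "ok" ;;          # success: emit the fetched body
    429|503) echo "retry" ;;       # rate limited / Reddit overloaded: back off, retry
    403)     echo "fatal" ;;       # private or quarantined subreddit
    404)     echo "fatal" ;;       # subreddit/user/post does not exist
    *)       echo "unexpected" ;;  # anything else: surface the raw code
  esac
}
```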

Quality score (75%) is primarily impacted by the per-field jq invocation pattern. Every formatter function (format_post_listing, format_search_results, format_user_posts, format_post_with_comments, format_comments) extracts each JSON field with a separate echo | jq call. At --limit 100, format_post_listing alone spawns ~900 jq processes. This is the single biggest improvement opportunity. See inline comment for a suggested @tsv approach.

Security

No auth tokens or secrets are involved — the plugin uses Reddit's public JSON API, which is appropriate for read-only access.

Security score (80%) is reduced because subreddit names, usernames, and the --subreddit search option are interpolated directly into URL path segments without validation. Reddit names are [a-zA-Z0-9_]+ only — a simple regex check prevents path traversal or query injection. The search query is properly URL-encoded via jq -sRr @uri (line 518), which is good. The --limit parameter also goes into URLs without numeric validation.

Simplicity

Simplicity score (90%) — the bash + curl + jq approach is appropriate for the repo's conventions and avoids unnecessary dependencies. The subcommand pattern is clean and extensible. The recursive comment formatting (lines 277-329) is the most complex part but is inherently necessary for Reddit's nested comment structure.

Documentation

README, SKILL.md, and rules/reddit-usage.md are all well-written and accurate. The content quality skepticism guidelines in reddit-usage.md are a thoughtful addition. PR description matches the code changes precisely.

Adherence to Repo Conventions

The plugin structure follows the expected pattern (.claude-plugin/plugin.json, skills/, rules/, README.md). Note that most plugins in this repo include a lib/ directory with symlinks to shared/lib/ (e.g., log.sh), though this plugin doesn't use shared libs — which is acceptable since it has no hooks or session-start logic.

Recommended follow-ups (non-blocking):

  • Consider adding a lib/ directory with log.sh symlink if hooks are added later
  • The format_comments recursive-in-pipe pattern (line 323) works for output but would break if state accumulation is ever needed — worth a comment in code
  • Test plan items marked incomplete (post, user, rate limiting, 404 handling, NSFW filtering) should be verified before marking ready for review

Notes: see footnotes 1 and 2.

Footnotes

  1. Workflow run: https://github.com/nsheaps/ai-mktpl/actions/runs/24097722065/attempts/1

  2. PR: #391

Comment on lines +189 to +223
while IFS= read -r post; do
local title author score num_comments created_utc permalink is_self selftext url_link
title=$(echo "$post" | jq -r '.data.title')
author=$(echo "$post" | jq -r '.data.author')
score=$(echo "$post" | jq -r '.data.score')
num_comments=$(echo "$post" | jq -r '.data.num_comments')
created_utc=$(echo "$post" | jq -r '.data.created_utc')
permalink=$(echo "$post" | jq -r '.data.permalink')
is_self=$(echo "$post" | jq -r '.data.is_self')
selftext=$(echo "$post" | jq -r '.data.selftext // ""')
url_link=$(echo "$post" | jq -r '.data.url // ""')

local date_str
date_str=$(format_date "$created_utc")
local post_type="link"
[[ "$is_self" == "true" ]] && post_type="self"

echo "## ${i}. ${title} (Score: ${score}, Comments: ${num_comments})"
echo "**Author**: u/${author} | **Posted**: ${date_str} | **Type**: ${post_type}"
echo "**Link**: ${BASE_URL}${permalink}"

if [[ "$post_type" == "link" && -n "$url_link" ]]; then
echo "**URL**: ${url_link}"
fi

if [[ -n "$selftext" && "$selftext" != "null" ]]; then
echo ""
truncate_text "$selftext" "$MAX_POST_BODY"
fi

echo ""
echo "---"
echo ""
i=$(( i + 1 ))
done < <(echo "$json" | jq -c "[${filter}][]")


⚠️ Performance: N+1 jq invocations per post

Each field is extracted with a separate echo "$post" | jq -r '...' call — that's 9 subprocess spawns per post. With --limit 100, that's 900 jq processes just for this one formatter (and similar patterns exist in format_search_results, format_user_posts, format_post_with_comments, and format_comments).

Extract all fields in a single jq call using @tsv or @base64 output, then split in bash. For example:

Suggested change
local i=1
while IFS=$'\t' read -r title author score num_comments created_utc permalink is_self selftext url_link; do
local date_str
date_str=$(format_date "$created_utc")
local post_type="link"
[[ "$is_self" == "true" ]] && post_type="self"
echo "## ${i}. ${title} (Score: ${score}, Comments: ${num_comments})"
echo "**Author**: u/${author} | **Posted**: ${date_str} | **Type**: ${post_type}"
echo "**Link**: ${BASE_URL}${permalink}"
if [[ "$post_type" == "link" && -n "$url_link" ]]; then
echo "**URL**: ${url_link}"
fi
if [[ -n "$selftext" && "$selftext" != "null" ]]; then
echo ""
truncate_text "$selftext" "$MAX_POST_BODY"
fi
echo ""
echo "---"
echo ""
i=$(( i + 1 ))
done < <(echo "$json" | jq -r "[${filter}][] | [.data.title, .data.author, .data.score, .data.num_comments, .data.created_utc, .data.permalink, .data.is_self, (.data.selftext // \"\"), (.data.url // \"\")] | @tsv")

Note: @tsv breaks if field values contain tabs or newlines. For Reddit content that's possible in selftext. A more robust approach is @base64 per-field or processing the full JSON array with jq and emitting one JSON object per line, then reading individual fields from that. But @tsv is a significant improvement over 9× subprocess spawns and works for the common case — selftext can be handled separately if needed.

This same pattern applies to format_search_results (lines 249-256), format_user_posts (lines 405-411), format_post_with_comments (lines 342-350), and format_comments (lines 293-300).
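The `@base64` variant mentioned above could look like the following sketch (the sample JSON and field choice are illustrative, not the plugin's code). Each field is emitted base64-encoded on its own line, so embedded tabs or newlines in selftext cannot break the record layout:

```shell
# Sketch: one jq pass emits each field base64-encoded (newline/tab-safe);
# bash reads them in fixed-size groups and decodes. Sample data is hypothetical.
json='{"data":{"children":[{"data":{"title":"Hi","author":"alice","selftext":"line1\nline2"}}]}}'

while read -r b_title && read -r b_author && read -r b_selftext; do
  title=$(base64 --decode <<<"$b_title")
  author=$(base64 --decode <<<"$b_author")
  selftext=$(base64 --decode <<<"$b_selftext")
  printf '## %s by u/%s\n%s\n' "$title" "$author" "$selftext"
done < <(jq -r '.data.children[].data
  | (.title|@base64), (.author|@base64), ((.selftext // "")|@base64)' <<<"$json")
```

This still runs one jq process total, at the cost of a `base64 --decode` call per field; it is only worth reaching for when field content can contain tabs or newlines.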

Comment on lines +447 to +449
[[ -z "$name" ]] && die "Subreddit name is required. Usage: reddit-fetch.sh subreddit <name>"

local url="${BASE_URL}/r/${name}/${sort}.json?limit=${limit}&raw_json=1"


⚠️ Input validation: subreddit name and username are interpolated directly into URL paths

name is user-supplied and goes directly into the URL without sanitization. A value like ../../api/v1/something or foo?bar=baz#frag could alter the URL path/query in unintended ways. Same issue applies to cmd_user (line 552) and cmd_search (line 522) with subreddit.

Reddit only allows alphanumeric characters and underscores in subreddit/user names. Add a simple validation:

Suggested change
[[ -z "$name" ]] && die "Subreddit name is required. Usage: reddit-fetch.sh subreddit <name>"
# Validate subreddit name (alphanumeric + underscores only)
[[ "$name" =~ ^[a-zA-Z0-9_]+$ ]] || die "Invalid subreddit name: $name (must be alphanumeric/underscores)"
local url="${BASE_URL}/r/${name}/${sort}.json?limit=${limit}&raw_json=1"

Apply the same pattern for username in cmd_user and subreddit in cmd_search.

case "$1" in
--sort) sort="$2"; shift 2 ;;
--time) time="$2"; shift 2 ;;
--limit) limit="$2"; shift 2 ;;


⚠️ Missing --limit validation

limit is passed directly into the URL without checking it's a positive integer within bounds. Non-numeric values (e.g., --limit foo) will be passed to the API and silently produce unexpected results. The README documents max 100 but the script doesn't enforce it.


Consider adding after the while loop:

[[ "$limit" =~ ^[0-9]+$ ]] || die "Invalid limit: $limit (must be a positive integer)"
(( limit >= 1 && limit <= 100 )) || die "Limit must be between 1 and 100, got: $limit"

This applies to every subcommand that accepts --limit (cmd_subreddit, cmd_search, cmd_user).

Comment on lines +98 to +99
local tmp_file
tmp_file=$(mktemp)


⚠️ Temp files leak on unexpected exit (SIGINT/SIGTERM)

mktemp is called inside the retry loop but there's no trap to clean up on script termination. If the user Ctrl+C's during a fetch, the temp file is orphaned.

Consider adding a trap near the top of the script (after the config section):

_TMPFILES=()
cleanup() { rm -f "${_TMPFILES[@]}"; }
trap cleanup EXIT

Then register each temp file: _TMPFILES+=("$tmp_file") after tmp_file=$(mktemp).

Alternatively, move tmp_file creation before the loop and reuse it across retries, which also avoids creating a new temp file per retry attempt.
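Put together, the trap-based cleanup might look like this self-contained sketch (`fetch_with_tmp` is a hypothetical stand-in for the script's fetch helper; the INT/TERM re-raises are added so the EXIT trap reliably fires on Ctrl+C):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Reviewer-suggested names: a global registry plus an EXIT trap.
_TMPFILES=()
cleanup() {
  if (( ${#_TMPFILES[@]} > 0 )); then
    rm -f "${_TMPFILES[@]}"
  fi
}
trap cleanup EXIT
# Re-raise INT/TERM as exits so the EXIT trap runs on Ctrl+C / kill.
trap 'exit 130' INT
trap 'exit 143' TERM

fetch_with_tmp() {
  local tmp_file
  tmp_file=$(mktemp)
  _TMPFILES+=("$tmp_file")           # register immediately after creation
  echo "fetched body" > "$tmp_file"  # stand-in for the curl call
  cat "$tmp_file"
}

fetch_with_tmp   # prints "fetched body"; the temp file is removed on exit
```

The length guard in `cleanup` avoids expanding an empty array, which errors under `set -u` in bash versions before 4.4.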

Comment on lines +323 to +327
echo "$replies" | jq -c '.data.children[]?' | while IFS= read -r child; do
# Wrap in array for consistent processing
format_comments "[$child]" $(( depth + 1 )) "$prefix"
done
fi


Recursive format_comments inside a pipe loses output

On line 323, format_comments is called inside a while read ... done that's fed by a pipe (jq ... | while ...). In bash, the right side of a pipe runs in a subshell. This means any recursive calls to format_comments from within that while loop execute in subshells.

For stdout output this works fine (output flows through the pipe), but it's worth noting that if you ever need to accumulate state (counters, arrays) across recursive calls, this pattern will silently drop changes. The current code is correct for its purpose — just flagging for awareness since recursive subshells are a common bash pitfall.
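The pitfall is easy to demonstrate, along with the process-substitution form (`< <(...)`, which the script already uses elsewhere) that avoids it:

```shell
# Demonstration of the subshell pitfall (not the plugin's code).
count=0

# Pipe form: the while loop runs in a subshell, so the increment is lost.
printf 'a\nb\nc\n' | while IFS= read -r _; do count=$(( count + 1 )); done
echo "after pipe: $count"      # after pipe: 0

# Process substitution keeps the loop in the current shell; state survives.
while IFS= read -r _; do count=$(( count + 1 )); done < <(printf 'a\nb\nc\n')
echo "after < <(): $count"     # after < <(): 3
```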
