
feat: add reddit-fetcher plugin#391

Draft
nsheaps wants to merge 4 commits into main from feat/reddit-fetcher-plugin

Conversation


@nsheaps commented on Apr 7, 2026

Summary

  • Adds reddit-fetcher plugin for fetching public Reddit content via the .json API (no authentication required)
  • Bash + curl + jq script with 4 subcommands: subreddit, post, search, user
  • Markdown-formatted output optimized for LLM consumption
  • Includes rate limiting (6s between requests), NSFW filtering, comment depth cap (3 levels), and content truncation
  • Skill documentation, usage rules (content quality skepticism), and README
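The 6-second gap mentioned above is straightforward to enforce in bash with a timestamp file; a minimal sketch (the `rate_limit_wait` name and the timestamp-file location are assumptions for illustration, not the plugin's actual implementation):

```shell
# Hypothetical sketch of a 6-second minimum gap between requests.
# RATE_FILE and rate_limit_wait are illustrative names, not from the plugin.
RATE_FILE="${TMPDIR:-/tmp}/reddit-fetch-demo.$$.last"
MIN_GAP=6

rate_limit_wait() {
  local now last elapsed
  now=$(date +%s)
  if [[ -f "$RATE_FILE" ]]; then
    last=$(<"$RATE_FILE")
    elapsed=$(( now - last ))
    # Sleep only for the remainder of the gap since the previous request.
    if (( elapsed < MIN_GAP )); then
      sleep $(( MIN_GAP - elapsed ))
    fi
  fi
  date +%s > "$RATE_FILE"
}

rate_limit_wait   # first call records a timestamp and returns immediately
```

A second call within 6 seconds would sleep for the remainder of the window before proceeding.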

Files

  • plugins/reddit-fetcher/.claude-plugin/plugin.json — plugin metadata (v0.1.0)
  • plugins/reddit-fetcher/scripts/reddit-fetch.sh — core fetcher script
  • plugins/reddit-fetcher/skills/fetch-reddit/SKILL.md — skill usage instructions
  • plugins/reddit-fetcher/rules/reddit-usage.md — guidelines for Reddit content quality
  • plugins/reddit-fetcher/README.md — overview and examples

Design Decisions

  • Bash + curl + jq over Node.js/Python: zero new dependencies, matches existing ai-mktpl plugin conventions
  • Skill-based over MCP server: simpler, no daemon management, rate limiting handled naturally
  • No authentication: public JSON API covers all read-only access to public content
  • Inspired by ClawHub reddit-readonly approach (direct JSON API, no auth)

Test Plan

  • --help flag outputs usage correctly
  • subreddit ClaudeCode --sort hot --limit 3 returns formatted markdown with posts
  • search "MCP server" --subreddit ClaudeCode --limit 2 returns search results
  • post <url> with a real Reddit post URL
  • user <username> with a real username
  • Rate limiting behavior on rapid successive calls
  • 404 handling for non-existent subreddits
  • NSFW filtering (default exclude, --include-nsfw to include)

Co-Authored-By: Jack Oat <jack-nsheaps[bot]@users.noreply.github.com>

Skill-based plugin that fetches Reddit content via the public .json API
(no auth required). Includes bash script with subreddit, post, search,
and user subcommands. Output is markdown-formatted for LLM consumption.
Enforces rate limiting (6s gap), comment depth cap (3 levels), and
content truncation for manageable output size.
@nsheaps self-assigned this on Apr 7, 2026
@nsheaps added the request-review label (requests a one-time review from the Claude review bot; the label is removed after review starts) on Apr 7, 2026
@henry-nsheaps (bot) removed the request-review label on Apr 7, 2026

github-actions Bot commented Apr 7, 2026

Plugin Version Status

Versions are auto-bumped in PRs. Manual bumps to higher versions are preserved.

Plugin         | Base  | Current | Action
reddit-fetcher | 0.0.0 | 0.1.0   | Already bumped


@henry-nsheaps (bot) left a comment


👍 Solid new plugin — a few improvements needed before merge

⚠️ Performance: 9× jq subprocess spawns per post in formatters (scales to 900+ at max limit)
⚠️ Input validation: subreddit/username interpolated into URLs without sanitization
⚠️ --limit not validated as numeric or bounded (README says max 100, script doesn't enforce)
⚠️ Temp files leak on SIGINT/SIGTERM (no trap cleanup)
✅ Well-structured script with clear separation of concerns
✅ Rate limiting, retry logic, and HTTP error handling are solid
✅ Good cross-platform date formatting (GNU + BSD fallback)
✅ Documentation (README, SKILL.md, rules) is thorough and accurate
✅ PR description matches code changes
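The cross-platform date handling called out above typically follows a try-GNU-then-BSD pattern; a sketch (the function name and output format are assumptions, not the script's actual code):

```shell
# Hypothetical sketch of a GNU/BSD date fallback like the one the review praises.
format_date() {
  local epoch="${1%.*}"   # Reddit's created_utc is a float; keep the integer part
  # GNU date takes -d @EPOCH; BSD/macOS date takes -r EPOCH.
  date -u -d "@${epoch}" '+%Y-%m-%d %H:%M UTC' 2>/dev/null \
    || date -u -r "${epoch}" '+%Y-%m-%d %H:%M UTC'
}

format_date 0   # prints 1970-01-01 00:00 UTC
```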


Code Quality

The script is well-organized with good separation between helpers, formatters, and subcommands. Constants are properly readonly, set -euo pipefail is set, and error messages are descriptive. The HTTP error handling covers the important status codes (429, 403, 404, 503) with appropriate behavior for each.
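That per-status dispatch can be isolated in a small helper; a hypothetical sketch (names, messages, and retry policy are illustrative, not the script's actual code):

```shell
# Hypothetical sketch of per-status dispatch for the codes the review lists.
# In the real script the status would come from curl, e.g.:
#   status=$(curl -sS -o "$body_file" -w '%{http_code}' "$url")
classify_status() {
  case "$1" in
    200)     echo "ok" ;;          # success: emit the fetched body
    429|503) echo "retry" ;;       # rate limited / Reddit overloaded: back off, retry
    403)     echo "fatal" ;;       # private or quarantined subreddit
    404)     echo "fatal" ;;       # subreddit/user/post does not exist
    *)       echo "unexpected" ;;  # anything else: surface the raw code
  esac
}
```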

Quality score (75%) is primarily impacted by the per-field jq invocation pattern. Every formatter function (format_post_listing, format_search_results, format_user_posts, format_post_with_comments, format_comments) extracts each JSON field with a separate echo | jq call. At --limit 100, format_post_listing alone spawns ~900 jq processes. This is the single biggest improvement opportunity. See inline comment for a suggested @tsv approach.

Security

No auth tokens or secrets are involved — the plugin uses Reddit's public JSON API, which is appropriate for read-only access.

Security score (80%) is reduced because subreddit names, usernames, and the --subreddit search option are interpolated directly into URL path segments without validation. Reddit names are [a-zA-Z0-9_]+ only — a simple regex check prevents path traversal or query injection. The search query is properly URL-encoded via jq -sRr @uri (line 518), which is good. The --limit parameter also goes into URLs without numeric validation.

Simplicity

Simplicity score (90%) — the bash + curl + jq approach is appropriate for the repo's conventions and avoids unnecessary dependencies. The subcommand pattern is clean and extensible. The recursive comment formatting (lines 277-329) is the most complex part but is inherently necessary for Reddit's nested comment structure.

Documentation

README, SKILL.md, and rules/reddit-usage.md are all well-written and accurate. The content quality skepticism guidelines in reddit-usage.md are a thoughtful addition. PR description matches the code changes precisely.

Adherence to Repo Conventions

The plugin structure follows the expected pattern (.claude-plugin/plugin.json, skills/, rules/, README.md). Note that most plugins in this repo include a lib/ directory with symlinks to shared/lib/ (e.g., log.sh), though this plugin doesn't use shared libs — which is acceptable since it has no hooks or session-start logic.

Recommended follow-ups (non-blocking):

  • Consider adding a lib/ directory with log.sh symlink if hooks are added later
  • The format_comments recursive-in-pipe pattern (line 323) works for output but would break if state accumulation is ever needed — worth a comment in code
  • Test plan items marked incomplete (post, user, rate limiting, 404 handling, NSFW filtering) should be verified before marking ready for review

Notes: see footnotes 1 and 2.

Footnotes

  1. Workflow run: https://github.com/nsheaps/ai-mktpl/actions/runs/24097722065/attempts/1

  2. PR: #391

Comment on lines +189 to +223
while IFS= read -r post; do
local title author score num_comments created_utc permalink is_self selftext url_link
title=$(echo "$post" | jq -r '.data.title')
author=$(echo "$post" | jq -r '.data.author')
score=$(echo "$post" | jq -r '.data.score')
num_comments=$(echo "$post" | jq -r '.data.num_comments')
created_utc=$(echo "$post" | jq -r '.data.created_utc')
permalink=$(echo "$post" | jq -r '.data.permalink')
is_self=$(echo "$post" | jq -r '.data.is_self')
selftext=$(echo "$post" | jq -r '.data.selftext // ""')
url_link=$(echo "$post" | jq -r '.data.url // ""')

local date_str
date_str=$(format_date "$created_utc")
local post_type="link"
[[ "$is_self" == "true" ]] && post_type="self"

echo "## ${i}. ${title} (Score: ${score}, Comments: ${num_comments})"
echo "**Author**: u/${author} | **Posted**: ${date_str} | **Type**: ${post_type}"
echo "**Link**: ${BASE_URL}${permalink}"

if [[ "$post_type" == "link" && -n "$url_link" ]]; then
echo "**URL**: ${url_link}"
fi

if [[ -n "$selftext" && "$selftext" != "null" ]]; then
echo ""
truncate_text "$selftext" "$MAX_POST_BODY"
fi

echo ""
echo "---"
echo ""
i=$(( i + 1 ))
done < <(echo "$json" | jq -c "[${filter}][]")


⚠️ Performance: N+1 jq invocations per post

Each field is extracted with a separate echo "$post" | jq -r '...' call — that's 9 subprocess spawns per post. With --limit 100, that's 900 jq processes just for this one formatter (and similar patterns exist in format_search_results, format_user_posts, format_post_with_comments, and format_comments).

Extract all fields in a single jq call using @tsv or @base64 output, then split in bash. For example:

Suggested change
local i=1
while IFS=$'\t' read -r title author score num_comments created_utc permalink is_self selftext url_link; do
local date_str
date_str=$(format_date "$created_utc")
local post_type="link"
[[ "$is_self" == "true" ]] && post_type="self"
echo "## ${i}. ${title} (Score: ${score}, Comments: ${num_comments})"
echo "**Author**: u/${author} | **Posted**: ${date_str} | **Type**: ${post_type}"
echo "**Link**: ${BASE_URL}${permalink}"
if [[ "$post_type" == "link" && -n "$url_link" ]]; then
echo "**URL**: ${url_link}"
fi
if [[ -n "$selftext" && "$selftext" != "null" ]]; then
echo ""
truncate_text "$selftext" "$MAX_POST_BODY"
fi
echo ""
echo "---"
echo ""
i=$(( i + 1 ))
done < <(echo "$json" | jq -r "[${filter}][] | [.data.title, .data.author, .data.score, .data.num_comments, .data.created_utc, .data.permalink, .data.is_self, (.data.selftext // \"\"), (.data.url // \"\")] | @tsv")

Note: @tsv breaks if field values contain tabs or newlines. For Reddit content that's possible in selftext. A more robust approach is @base64 per-field or processing the full JSON array with jq and emitting one JSON object per line, then reading individual fields from that. But @tsv is a significant improvement over 9× subprocess spawns and works for the common case — selftext can be handled separately if needed.

This same pattern applies to format_search_results (lines 249-256), format_user_posts (lines 405-411), format_post_with_comments (lines 342-350), and format_comments (lines 293-300).
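The `@base64` variant mentioned above could look like the following sketch (the sample JSON and field choice are illustrative, not the plugin's code). Each field is emitted base64-encoded on its own line, so embedded tabs or newlines in selftext cannot break the record layout:

```shell
# Sketch: one jq pass emits each field base64-encoded (newline/tab-safe);
# bash reads them in fixed-size groups and decodes. Sample data is hypothetical.
json='{"data":{"children":[{"data":{"title":"Hi","author":"alice","selftext":"line1\nline2"}}]}}'

while read -r b_title && read -r b_author && read -r b_selftext; do
  title=$(base64 --decode <<<"$b_title")
  author=$(base64 --decode <<<"$b_author")
  selftext=$(base64 --decode <<<"$b_selftext")
  printf '## %s by u/%s\n%s\n' "$title" "$author" "$selftext"
done < <(jq -r '.data.children[].data
  | (.title|@base64), (.author|@base64), ((.selftext // "")|@base64)' <<<"$json")
```

This still runs one jq process total, at the cost of a `base64 --decode` call per field; it is only worth reaching for when field content can contain tabs or newlines.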

Comment on lines +447 to +449
[[ -z "$name" ]] && die "Subreddit name is required. Usage: reddit-fetch.sh subreddit <name>"

local url="${BASE_URL}/r/${name}/${sort}.json?limit=${limit}&raw_json=1"


⚠️ Input validation: subreddit name and username are interpolated directly into URL paths

name is user-supplied and goes directly into the URL without sanitization. A value like ../../api/v1/something or foo?bar=baz#frag could alter the URL path/query in unintended ways. Same issue applies to cmd_user (line 552) and cmd_search (line 522) with subreddit.

Reddit only allows alphanumeric characters and underscores in subreddit/user names. Add a simple validation:

Suggested change
[[ -z "$name" ]] && die "Subreddit name is required. Usage: reddit-fetch.sh subreddit <name>"
# Validate subreddit name (alphanumeric + underscores only)
[[ "$name" =~ ^[a-zA-Z0-9_]+$ ]] || die "Invalid subreddit name: $name (must be alphanumeric/underscores)"
local url="${BASE_URL}/r/${name}/${sort}.json?limit=${limit}&raw_json=1"

Apply the same pattern for username in cmd_user and subreddit in cmd_search.

case "$1" in
--sort) sort="$2"; shift 2 ;;
--time) time="$2"; shift 2 ;;
--limit) limit="$2"; shift 2 ;;


⚠️ Missing --limit validation

limit is passed directly into the URL without checking it's a positive integer within bounds. Non-numeric values (e.g., --limit foo) will be passed to the API and silently produce unexpected results. The README documents max 100 but the script doesn't enforce it.


Consider adding after the while loop:

[[ "$limit" =~ ^[0-9]+$ ]] || die "Invalid limit: $limit (must be a positive integer)"
(( limit >= 1 && limit <= 100 )) || die "Limit must be between 1 and 100, got: $limit"

This applies to every subcommand that accepts --limit (cmd_subreddit, cmd_search, cmd_user).

Comment on lines +98 to +99
local tmp_file
tmp_file=$(mktemp)


⚠️ Temp files leak on unexpected exit (SIGINT/SIGTERM)

mktemp is called inside the retry loop but there's no trap to clean up on script termination. If the user Ctrl+C's during a fetch, the temp file is orphaned.

Consider adding a trap near the top of the script (after the config section):

_TMPFILES=()
cleanup() { rm -f "${_TMPFILES[@]}"; }
trap cleanup EXIT

Then register each temp file: _TMPFILES+=("$tmp_file") after tmp_file=$(mktemp).

Alternatively, move tmp_file creation before the loop and reuse it across retries, which also avoids creating a new temp file per retry attempt.
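Put together, the trap-based cleanup might look like this self-contained sketch (`fetch_with_tmp` is a hypothetical stand-in for the script's fetch helper; the INT/TERM re-raises are added so the EXIT trap reliably fires on Ctrl+C):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Reviewer-suggested names: a global registry plus an EXIT trap.
_TMPFILES=()
cleanup() {
  if (( ${#_TMPFILES[@]} > 0 )); then
    rm -f "${_TMPFILES[@]}"
  fi
}
trap cleanup EXIT
# Re-raise INT/TERM as exits so the EXIT trap runs on Ctrl+C / kill.
trap 'exit 130' INT
trap 'exit 143' TERM

fetch_with_tmp() {
  local tmp_file
  tmp_file=$(mktemp)
  _TMPFILES+=("$tmp_file")           # register immediately after creation
  echo "fetched body" > "$tmp_file"  # stand-in for the curl call
  cat "$tmp_file"
}

fetch_with_tmp   # prints "fetched body"; the temp file is removed on exit
```

The length guard in `cleanup` avoids expanding an empty array, which errors under `set -u` in bash versions before 4.4.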

Comment on lines +323 to +327
echo "$replies" | jq -c '.data.children[]?' | while IFS= read -r child; do
# Wrap in array for consistent processing
format_comments "[$child]" $(( depth + 1 )) "$prefix"
done
fi


Recursive format_comments inside a pipe loses output

On line 323, format_comments is called inside a while read ... done that's fed by a pipe (jq ... | while ...). In bash, the right side of a pipe runs in a subshell. This means any recursive calls to format_comments from within that while loop execute in subshells.

For stdout output this works fine (output flows through the pipe), but it's worth noting that if you ever need to accumulate state (counters, arrays) across recursive calls, this pattern will silently drop changes. The current code is correct for its purpose — just flagging for awareness since recursive subshells are a common bash pitfall.
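The pitfall is easy to demonstrate, along with the process-substitution form (`< <(...)`, which the script already uses elsewhere) that avoids it:

```shell
# Demonstration of the subshell pitfall (not the plugin's code).
count=0

# Pipe form: the while loop runs in a subshell, so the increment is lost.
printf 'a\nb\nc\n' | while IFS= read -r _; do count=$(( count + 1 )); done
echo "after pipe: $count"      # after pipe: 0

# Process substitution keeps the loop in the current shell; state survives.
while IFS= read -r _; do count=$(( count + 1 )); done < <(printf 'a\nb\nc\n')
echo "after < <(): $count"     # after < <(): 3
```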
