Ingests Hacker News, Stack Exchange, and dev.to on the default ideas ingest, filters out healthcare-related content, scores each post for “business tool opportunity” (embedding similarity to a fixed anchor describing B2B / ops problems solvable with software), then clusters and ranks only posts above IDEAS_MIN_BUSINESS_TOOL_FIT. RSS (default includes Lobsters), GitLab issues, Discourse topics, Mastodon hashtags, and Lemmy posts use public APIs (no key by default). Reddit and optional tokens for GitHub / dev.to / GitLab improve limits or private access (see below).
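The "business tool opportunity" gate described above can be pictured as a cosine-similarity check against the anchor embedding. The sketch below is illustrative only: the function names and the toy two-dimensional vectors are assumptions, not the package's actual code, and real embeddings would come from whatever model `ideas embed` uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes_business_tool_gate(post_vec, anchor_vec, min_fit=0.48):
    """Hypothetical helper: keep a post only if it is close enough to the anchor.

    min_fit mirrors IDEAS_MIN_BUSINESS_TOOL_FIT; 0 disables the gate.
    """
    if min_fit <= 0:
        return True
    return cosine(post_vec, anchor_vec) >= min_fit


# Toy vectors standing in for real embeddings (assumption for illustration).
anchor = [1.0, 0.0]
ops_post = [0.9, 0.1]
news_post = [0.1, 0.9]
kept = [v for v in (ops_post, news_post) if passes_business_tool_gate(v, anchor)]
```

With these toy vectors only `ops_post` survives the gate; passing `min_fit=0` lets both through, matching the documented "disable" behavior.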
cd ideas_generator
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env

No API keys are required for the default flow (HN + Stack Exchange + dev.to). An optional IDEAS_DEVTO_TOKEN improves dev.to rate limits (create under dev.to → Settings → Extensions).
ideas init-db
ideas ingest # default: HN + Stack Exchange + dev.to
ideas embed
ideas cluster
ideas score
ideas report --top 25

Optional devtools / PM–design–engineering lens (last 90 days; writes devtools_report.md, gitignored like report.md):
python3 -m ideas_generator.devtools_report

Other sources:
ideas ingest -s hn # Hacker News only
ideas ingest -s stackexchange # Stack Exchange only
ideas ingest -s all # Everything below when configured (+ RSS Lobsters by default)
ideas ingest -s reddit # Reddit only (requires credentials)
ideas ingest -s github # GitHub issues only (requires IDEAS_GITHUB_REPOS)
ideas ingest -s devto # dev.to recent articles only (no API key)
ideas ingest -s rss # RSS feeds only (uses IDEAS_RSS_FEED_URLS; default is Lobsters)
ideas ingest -s gitlab # GitLab issues (set IDEAS_GITLAB_PROJECTS)
ideas ingest -s discourse # Discourse /latest.json (set IDEAS_DISCOURSE_BASE_URLS)
ideas ingest -s mastodon # Mastodon hashtag timelines (host + hashtags)
ideas ingest -s lemmy # Lemmy post list (set IDEAS_LEMMY_HOST)

Default ideas ingest includes HN, SE, and dev.to. ideas ingest -s all adds optional Reddit, GitHub, RSS (defaults to https://lobste.rs/h.rss unless you override IDEAS_RSS_FEED_URLS), GitLab, Discourse, Mastodon, and Lemmy when the corresponding env vars are set.
Or one shot:
ideas run --top 25 # default ingest: all (HN+SE + optional sources)
ideas run -s hn-stackexchange # HN + SE only
ideas run -s all --top 50 -o report.md # full run, write Markdown to file (good for cron)

Data lives in ./data/ideas.sqlite3 by default (gitignored).
- IDEAS_MIN_BUSINESS_TOOL_FIT (default ~0.48): cosine vs anchor; lower lets more through (including newsy HN), higher keeps only stronger “niche B2B / operational tool” matches. Set to 0 to disable this gate (legacy behavior).
- IDEAS_BUSINESS_TOOL_ANCHOR: optional env override for the anchor paragraph (otherwise uses the built-in default in config.py).
- Lower IDEAS_CLUSTER_SIMILARITY_THRESHOLD (e.g. 0.72) to merge more posts into fewer, broader themes; raise it (default ~0.80) for stricter, smaller clusters.
- IDEAS_INGEST_LOOKBACK_SECONDS (default 604800 = 7 days): HN Algolia search window and post-filter for every source; only items with created_at in the last N seconds (UTC) are kept. Set to 0 to disable this time filter. IDEAS_HN_LOOKBACK_SECONDS is deprecated but still honored if set (it overrides the ingest value for the whole window).
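The lookback rules above (a default of 604800 seconds, 0 disabling the filter, and the deprecated variable taking precedence when set) might be expressed roughly as follows. This is a sketch under assumptions: `resolve_lookback` and `filter_recent` are hypothetical names, and items are assumed to carry a timezone-aware `created_at`.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_LOOKBACK_SECONDS = 604800  # 7 days, the documented default


def resolve_lookback(env):
    """Pick the window; the deprecated HN variable wins if it is set."""
    for name in ("IDEAS_HN_LOOKBACK_SECONDS", "IDEAS_INGEST_LOOKBACK_SECONDS"):
        raw = env.get(name)
        if raw not in (None, ""):
            return int(raw)
    return DEFAULT_LOOKBACK_SECONDS


def filter_recent(items, lookback_seconds, now=None):
    """Keep items whose created_at (UTC) falls inside the window; 0 disables."""
    if lookback_seconds <= 0:
        return list(items)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=lookback_seconds)
    return [item for item in items if item["created_at"] >= cutoff]


# Example data (made up for illustration).
now = datetime(2024, 1, 8, tzinfo=timezone.utc)
items = [
    {"title": "fresh", "created_at": datetime(2024, 1, 7, tzinfo=timezone.utc)},
    {"title": "stale", "created_at": datetime(2023, 12, 25, tzinfo=timezone.utc)},
]
recent = filter_recent(items, resolve_lookback({}), now=now)
```

Passing `now` explicitly keeps the filter deterministic, which is convenient for tests.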
The defaults match a once-a-week job that pulls roughly the last 7 days from each configured source.
- Ensure .env lists every API you want (ideas ingest -s all skips missing keys with a dim message).
- Run ideas run -s all --top 50 -o report.md (or use scripts/weekly-run.sh, which does the same from the repo root).
- Cron (example: Mondays 06:00 local time):
  0 6 * * 1 cd /path/to/ideas_generator && ./scripts/weekly-run.sh >> weekly.log 2>&1
- GitHub Actions: optional scheduled workflow .github/workflows/weekly-ideas.yml (uploads report.md as an artifact; add repository secrets for keys you need).
Generic idea lists come from broad inputs. For problems that feel more specific than a generic ChatGPT answer, bias ingestion toward communities where buyers and operators post:
- Stack Exchange: Set IDEAS_STACKEXCHANGE_SITES to vertical or ops sites (for example workplace, serverfault, dba, salesforce, sharepoint, webmasters) instead of only general programming Q&A.
- Reddit: Expand IDEAS_REDDIT_SUBREDDITS with the industries you care about (for example msp, sysadmin, accounting, ecommerce, smallbusiness); requires Reddit API credentials.
- Discourse: Point IDEAS_DISCOURSE_BASE_URLS at vendor or professional forums that expose public /latest.json (product communities, open-core tools, trade associations).
- RSS: Add IDEAS_RSS_FEED_URLS entries for industry newsletters, niche blogs, and association feeds, not only large tech aggregators.
Front-page Hacker News alone skews newsy; combining it with Discourse, targeted Stack Exchange sites, and RSS surfaces more recurring operational pain.
Retrieval and clustering beat a one-shot chat answer because outputs are grounded in real posts. To decide what to build:
- Shortlist a few high-composite clusters (and check recurrence columns for repeated pain in the window).
- Read originals — open 10–20 source URLs from those clusters; use the report’s verbatim lines as quotes, not paraphrases.
- Triangulate — skim job ads or review sites (e.g. G2/Capterra themes) for the same category; alignment strengthens the signal.
- Interview — run 5–10 short calls with people in that role before committing engineering time.
After upgrading, run ideas embed once so business_tool_fit is backfilled for rows that already have embeddings.
Uses OpenAI by default or Google Gemini if you set IDEAS_LLM_PROVIDER=gemini.
API keys can be the usual bare names in .env: OPENAI_API_KEY and GEMINI_API_KEY (also supported: IDEAS_OPENAI_API_KEY, IDEAS_GEMINI_API_KEY).
The CLI searches for .env next to the package (editable install), in the current working directory, and in parent folders—so ideas llm-screen still finds keys if you run from a subfolder. Override with IDEAS_PROJECT_ROOT=/path/to/repo if needed.
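The search order described above could look something like the sketch below. It is not the package's implementation: `candidate_env_paths` and `find_env_file` are hypothetical names, "next to the package" is interpreted as a directory passed in by the caller, and treating IDEAS_PROJECT_ROOT as replacing the search entirely is an assumption.

```python
import os
from pathlib import Path


def candidate_env_paths(package_root, cwd, env=None):
    """List .env locations to try, in order (assumed order, see lead-in)."""
    env = os.environ if env is None else env
    override = env.get("IDEAS_PROJECT_ROOT")
    if override:
        # Assumption: an explicit override short-circuits the search.
        return [Path(override) / ".env"]
    cwd = Path(cwd)
    dirs = [Path(package_root), cwd, *cwd.parents]
    seen, ordered = set(), []
    for directory in dirs:
        path = directory / ".env"
        if path not in seen:  # drop duplicates, keep first occurrence
            seen.add(path)
            ordered.append(path)
    return ordered


def find_env_file(package_root, cwd, env=None):
    """Return the first existing .env from the candidate list, or None."""
    for path in candidate_env_paths(package_root, cwd, env):
        if path.is_file():
            return path
    return None


# Example: search starting from the current directory.
env_file = find_env_file(Path.cwd(), Path.cwd())
```

Walking `cwd.parents` is what lets a command run from a subfolder still pick up the repository's `.env`.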
Use pip install -e . so the package path stays inside your repo; a plain pip install . can leave the code in site-packages with no project .env beside it (parent-directory search still usually fixes this).
OpenAI (default): set OPENAI_API_KEY, optionally IDEAS_OPENAI_MODEL (gpt-4o-mini). Then:
- ideas llm-screen: classifies each embedded post as business tool opportunity vs news / markets / consumer / other; stores tool_opportunity_score (0–1).
- ideas run runs llm-screen automatically after embed when the chosen provider’s key is set.
- Clustering and reports only keep items with llm_tool_score >= IDEAS_LLM_MIN_TOOL_SCORE (default 0.60) when LLM screening is enabled.
Gemini: set IDEAS_LLM_PROVIDER=gemini, GEMINI_API_KEY, and optionally IDEAS_GEMINI_MODEL (default gemini-2.0-flash).
Embedding scores below IDEAS_LLM_SCREEN_MIN_EMBED_FIT skip the API call and are stamped as skipped (saves cost).
Use ideas llm-screen --force to re-classify all eligible rows.
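The two thresholds involved in LLM screening can be sketched as a small decision flow: a cheap embedding pre-check that avoids the API call, then a score cutoff for clustering. The names below are hypothetical, `classify` is a stand-in for the real provider call, and the 0.40 pre-check value is made up because the default for IDEAS_LLM_SCREEN_MIN_EMBED_FIT is not stated here.

```python
def screen_post(embed_fit, classify, min_embed_fit):
    """Return (status, score); skip the paid call for weak embedding matches."""
    if embed_fit < min_embed_fit:
        return ("skipped", None)
    return ("screened", classify())


def keep_for_clustering(llm_tool_score, min_tool_score=0.60):
    """Mirror of the IDEAS_LLM_MIN_TOOL_SCORE cutoff (default 0.60)."""
    return llm_tool_score is not None and llm_tool_score >= min_tool_score


calls = []


def fake_classify():
    """Stub standing in for an OpenAI or Gemini request (assumption)."""
    calls.append(1)
    return 0.82


MIN_EMBED_FIT = 0.40  # hypothetical value for illustration
weak = screen_post(0.25, fake_classify, MIN_EMBED_FIT)
strong = screen_post(0.55, fake_classify, MIN_EMBED_FIT)
```

Here only the second post triggers a classification call, which is where the cost saving comes from.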
- dev.to: Recent articles via the public API. Optional IDEAS_DEVTO_TOKEN (or DEVTO_API_KEY) sends the api-key header for higher rate limits. Tune IDEAS_DEVTO_ARTICLES_LIMIT (default 30). Included in ideas ingest (default hn-stackexchange) and ideas ingest -s all.
- RSS / Atom: Default IDEAS_RSS_FEED_URLS is https://lobste.rs/h.rss. Add more comma-separated URLs or set empty to skip RSS in all. Tune IDEAS_RSS_MAX_ENTRIES_PER_FEED.
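For the dev.to source, the optional token only changes one request header. The sketch below builds (but does not send) such a request with the standard library; the exact endpoint and query parameters the tool uses are not documented here, so `https://dev.to/api/articles` with `per_page` is an assumption, as is the helper name.

```python
import urllib.parse
import urllib.request

DEVTO_ARTICLES_URL = "https://dev.to/api/articles"  # assumed endpoint


def build_devto_request(limit=30, token=None):
    """Build a GET request; attach the api-key header only when a token exists."""
    query = urllib.parse.urlencode({"per_page": limit})
    headers = {"Accept": "application/json"}
    if token:
        headers["api-key"] = token
    return urllib.request.Request(f"{DEVTO_ARTICLES_URL}?{query}", headers=headers)


anonymous = build_devto_request()
authenticated = build_devto_request(limit=10, token="example-token")
# To actually fetch: urllib.request.urlopen(anonymous, timeout=10)
```

Keeping request construction separate from sending makes the header logic easy to check without network access.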
IDEAS_GITLAB_PROJECTS — comma-separated namespace/project paths on IDEAS_GITLAB_HOST (default https://gitlab.com). IDEAS_GITLAB_TOKEN / GITLAB_TOKEN optional. ideas ingest -s gitlab or -s all.
IDEAS_DISCOURSE_BASE_URLS — comma-separated forum origins (e.g. https://meta.discourse.org). Uses public /latest.json. IDEAS_DISCOURSE_TOPICS_PER_SITE limits topics per forum.
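As a rough illustration of what reading /latest.json involves: Discourse returns topics under `topic_list.topics`, and a topic page lives at `/t/<slug>/<id>`. The helper name and the sample payload below are made up for the example, and the fields the tool actually stores are not specified here.

```python
def topics_from_latest(base_url, payload, max_topics):
    """Extract up to max_topics entries from a parsed /latest.json payload."""
    topics = payload.get("topic_list", {}).get("topics", [])
    base = base_url.rstrip("/")
    return [
        {
            "title": topic["title"],
            "url": f"{base}/t/{topic['slug']}/{topic['id']}",
            "created_at": topic.get("created_at"),
        }
        for topic in topics[:max_topics]
    ]


# Hand-written sample in the shape of a /latest.json response.
sample = {
    "topic_list": {
        "topics": [
            {"id": 101, "slug": "backup-job-fails", "title": "Backup job fails",
             "created_at": "2024-01-05T10:00:00.000Z"},
            {"id": 102, "slug": "sso-setup", "title": "SSO setup",
             "created_at": "2024-01-06T09:30:00.000Z"},
        ]
    }
}
rows = topics_from_latest("https://meta.discourse.org/", sample, max_topics=1)
```

The `max_topics` argument plays the role of IDEAS_DISCOURSE_TOPICS_PER_SITE.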
IDEAS_MASTODON_HOST (e.g. mastodon.social) and IDEAS_MASTODON_HASHTAGS (comma-separated, no #). Public tag timelines: /api/v1/timelines/tag/.... IDEAS_MASTODON_LIMIT.
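The host and hashtag settings combine into one public timeline URL per tag. A minimal sketch, assuming a hypothetical helper name and a limit of 20 (the default for IDEAS_MASTODON_LIMIT is not stated here):

```python
from urllib.parse import quote, urlencode


def mastodon_tag_urls(host, hashtags_csv, limit=20):
    """Build /api/v1/timelines/tag/<tag> URLs from a comma-separated list."""
    tags = [tag.strip().lstrip("#") for tag in hashtags_csv.split(",")]
    query = urlencode({"limit": limit})
    return [
        f"https://{host}/api/v1/timelines/tag/{quote(tag, safe='')}?{query}"
        for tag in tags
        if tag  # ignore empty entries such as a trailing comma
    ]


urls = mastodon_tag_urls("mastodon.social", "devops, #sysadmin,")
```

Stripping a leading `#` is a convenience added in this sketch; the setting itself expects tags without it.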
IDEAS_LEMMY_HOST (e.g. lemmy.ml). Optional IDEAS_LEMMY_COMMUNITY (e.g. asklemmy). IDEAS_LEMMY_LIMIT.
Set IDEAS_GITHUB_REPOS to comma-separated owner/repo pairs. Fetches recent open issues (not pull requests). IDEAS_GITHUB_TOKEN (or GITHUB_TOKEN) is optional for public repos but improves rate limits. Tune IDEAS_GITHUB_ISSUES_PER_REPO (default 25).
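One detail behind "issues (not pull requests)": GitHub's REST issues endpoint also returns pull requests, which carry a `pull_request` key. The sketch below shows one way to build the request URLs and drop those entries; the helper names and sample data are illustrative, and whether the tool filters this way is an assumption.

```python
from urllib.parse import urlencode


def issue_urls(repos_csv, per_repo=25):
    """Turn 'owner/repo,owner/repo' into open-issue API URLs."""
    repos = [repo.strip() for repo in repos_csv.split(",") if repo.strip()]
    query = urlencode({"state": "open", "per_page": per_repo})
    return [f"https://api.github.com/repos/{repo}/issues?{query}" for repo in repos]


def issues_only(items):
    """Drop pull requests, which the issues endpoint returns alongside issues."""
    return [item for item in items if "pull_request" not in item]


# Hand-written sample in the shape of the API response.
sample = [
    {"number": 1, "title": "Export to CSV is slow"},
    {"number": 2, "title": "Fix typo", "pull_request": {"url": "..."}},
]
urls = issue_urls("octocat/hello-world, example/repo")
issues = issues_only(sample)
```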
Create an app at reddit.com/prefs/apps, then set IDEAS_REDDIT_CLIENT_ID, IDEAS_REDDIT_CLIENT_SECRET, and IDEAS_REDDIT_USER_AGENT in .env. Use ideas ingest -s all or -s reddit.
pytest

Use official APIs; respect rate limits and each platform’s terms. Reddit OAuth and GitHub API access are subject to each provider’s policies.