[toongod] add support #8963
Open
nthduy wants to merge 7771 commits into mikf:master from
add 'post' & 'user' extractors
* [pornpics] add category and listing extractors
  Add support for:
  - Category pages like /ass/, /milf/, /blonde/ etc.
  - Listing pages like /popular/, /recent/, /rating/, /likes/, /views/, /comments/
  Category pages use JSON pagination like tags/search. Listing pages don't support JSON pagination and use different HTML structure.
* [pornpics] simplify category pattern via class ordering
  - Move PornpicsCategoryExtractor after PornpicsListingExtractor so it acts as catch-all, eliminating need for negative lookahead
  - Use list comprehension in PornpicsListingExtractor.galleries()
* update docs/supportedsites
* [fitnakedgirls] add extractor
  Add support for fitnakedgirls.com:
  - Photo galleries (/photos/gallery/)
  - Category pages (/photos/gallery/category/)
  - Tag pages (/photos/tag/)
  - Video posts (/videos/)
  - Blog posts (/fitblog/)
  Handles both newer (wp-block-image) and older (size-large) templates.
* simplify & fix
  - use '_extract_title' method
  - move '_pagination' into base class
  - update 'FitnakedgirlsTagExtractor' pattern
* update docs/supportedsites
---------
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
Add support for nudostar.com forum (XenForo-based forum site). This is separate from the existing nudostar.py which handles nudostar.tv.
Supports:
- Thread extraction with pagination
- Individual post extraction
- Authentication via xf_user cookie or username/password
- Internal attachments (both linked and embedded images)
- External image host URLs (queued for recursive processing)
- fix website_token extraction
- send website_token as 'X-Website-Token' header
- intercept ytdl logging messages and signal error when it emits an error message
- remove "ERROR:" etc from ytdl logging messages
fixes regression introduced in c8fc790
fixes regression introduced in 402f536
for example '?order=asc&group=j0fsj3oem3&tlang=en'
fix '400 Bad Request' errors when retrieving more than the first batch of posts.
* Make sure that `img_id`, `audio_id` and `cover_id` fields are always available.
The values are set to '' where they are not applicable.
Having `img_id` is necessary for the default `archive_fmt`, the other fields are handled for consistency.
* Allow downloading more than one cover.
The previous behavior is kept as-is, but setting the "covers" option to "all" now grabs all available covers.
* Add support for downloading subtitles
Allows filtering subtitles by source type (ASR, MT) and language.
* Ensure archive uniqueness for covers and subtitles.
* Update the URL test pattern to include the `image` extension.
Although Tiktok may serve the covers with jpeg content, the file ending can be `.image`.
The test before 0c14b16 failed because the asserted URL did not match all cover types, but the pattern used now requires the file ending mentioned above.
* Add support for "creator_caption" subtitles in "LC" format.
These subtitles have the keys "Format" set to "creator_caption" and "Source" to "LC".
* Add "LC" (Local Captions) as a subtitle source type in the documentation
* Code deduplication and renaming subtitle metadata
Changed the item type from singular `subtitle` to `subtitles`.
Removed the wrong descriptor `cover` from the subtitles fallback title.
* Refactor subtitle filtering
The filter is now prepared in `_init` to prevent parsing the same config parameter for every item.
The `_extract_subtitles` function will still extract if either filter (source or language) matches.
* Generate a `file_id` for subtitles
Subtitles have multiple fields that determine the unique file, so these are simply concatenated.
This is similar to the cover types, only with more variations.
* Added tests for subtitles
* fix docs entries
* fix '"covers": "all"'
* simplify some code
* Fix fallback title for subtitles
Added the missing "f" to the f-string and added "subtitle" to the title.
The resulting title will look like "TikTok video subtitle #1234567"
Add extractor for toongod.org webtoon site with Cloudflare bypass support using FlareSolverr proxy.
- Fix line length issues (max 79 chars)
- Fix continuation line indentation
6186e4a to bf23ef8
bf23ef8 to f654cbc
Fix folder naming issue where series names included junk suffixes like "Manhwa Afahbb" by extracting titles from breadcrumb navigation instead of URL slugs or H1 tags.
- Extract series name from breadcrumb links (always clean)
- Fallback to H1 tag with cleaning if breadcrumb fails
- Remove "Manhwa", "Webtoon", "Manhua" suffixes
- Remove encoded ID patterns (e.g., "Afahbb", "Aeaabb")
Before: "Perfect Half Manhwa Afahbb"
After: "Perfect Half"
Pushed a new commit to handle an edge case I discovered.
Problem: Some manhwa on ToonGod have strange slugs that break the original H1/slug-based extraction. For example:
The issue is that ToonGod's chapter URLs contain these suffixes (/webtoon/perfect-half-manhwa-afahbb/chapter-1/), and the chapter extractor was converting the slug to title case as a fallback.
Solution: I changed the approach to extract the title from the breadcrumb navigation, since I noticed it's always clean and consistent. It falls back to cleaning the H1 tag if the breadcrumb fails.
Tested on multiple series:
All tests pass, flake8 clean. Thanks for your review!
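The breadcrumb-first title extraction described in this comment could look roughly like the sketch below. It is not the code from the PR: the function names, the `<ol class="breadcrumb">` markup, the choice of the last breadcrumb link as the series name, and the `A` + three letters + `bb` shape of the encoded ID are all assumptions made for illustration, based only on the two examples ("Afahbb", "Aeaabb") given above.

```python
import re

# Suffix words named in the commit message.
SUFFIX_WORDS = {"manhwa", "webtoon", "manhua"}
# ASSUMPTION: encoded IDs look like "Afahbb" / "Aeaabb"
# (capital A, three lowercase letters, "bb").
ENCODED_ID = re.compile(r"A[a-z]{3}bb")


def clean_title(title):
    """Strip trailing suffix words and encoded IDs from a title."""
    words = title.split()
    while words and (words[-1].lower() in SUFFIX_WORDS
                     or ENCODED_ID.fullmatch(words[-1])):
        words.pop()
    # Keep the original text if stripping would leave nothing.
    return " ".join(words) or title.strip()


def series_title(page):
    """Return the series title from page HTML, or None if not found."""
    # ASSUMPTION: the breadcrumb is an <ol class="breadcrumb"> whose
    # last link is the series page.
    crumb = re.search(r'<ol class="breadcrumb">(.*?)</ol>', page, re.S)
    if crumb:
        links = re.findall(r"<a [^>]*>\s*([^<]+?)\s*</a>", crumb.group(1))
        if links:
            return links[-1]
    # Fallback: take the H1 text and clean it.
    h1 = re.search(r"<h1[^>]*>\s*([^<]+?)\s*</h1>", page)
    return clean_title(h1.group(1)) if h1 else None
```

Trying the breadcrumb first means the cleaning heuristics only run when the clean source is missing, so a series whose real name ends in "Webtoon" is only at risk on the fallback path.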
d9c75d2 to 1407564
Toongod also uses wpmadara underneath. The base class in #9246 would cover this site too.
Add support for https://www.toongod.org/ webtoon site.
Implements chapter and webtoon extractors with Cloudflare bypass support:
Features:
Cloudflare Protection:
Site uses Cloudflare protection. Two bypass methods supported:
FlareSolverr (recommended): Automatic challenge solving with session reuse
{"extractor": {"toongod": {"flaresolverr-url": "http://localhost:8191/v1"}}}
Browser cookies: Manual cookie export
gallery-dl --cookies cookies.txt <url>
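For context on the FlareSolverr method, here is a minimal sketch of how a client might talk to the endpoint configured above. It does not show the PR's implementation. The helper names (`build_request`, `parse_solution`, `solve`) are hypothetical, and the payload and reply fields follow FlareSolverr's documented `request.get` command as I understand it.

```python
import json
import urllib.request


def build_request(url, session=None, timeout_ms=60000):
    """Build the JSON payload for a FlareSolverr 'request.get' call."""
    payload = {"cmd": "request.get", "url": url, "maxTimeout": timeout_ms}
    if session:
        # Reusing a session lets FlareSolverr keep one browser instance
        # (and its solved challenge) across requests.
        payload["session"] = session
    return payload


def parse_solution(reply):
    """Return (html, cookies, user_agent) from a FlareSolverr reply."""
    if reply.get("status") != "ok":
        raise RuntimeError(reply.get("message") or "FlareSolverr error")
    solution = reply["solution"]
    cookies = {c["name"]: c["value"] for c in solution.get("cookies", [])}
    return solution["response"], cookies, solution.get("userAgent")


def solve(endpoint, url, session=None):
    """POST the request to the endpoint, e.g. http://localhost:8191/v1."""
    data = json.dumps(build_request(url, session)).encode()
    request = urllib.request.Request(
        endpoint, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return parse_solution(json.load(response))
```

The returned cookies and user agent would typically be reused for later plain HTTP requests, since Cloudflare clearance cookies are tied to the user agent that solved the challenge.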