[LinkedIn] add extractor #8745

Open
shakeelmohamed wants to merge 7590 commits into mikf:master from shakeelmohamed:linkedin

Conversation

@shakeelmohamed

Add support for LinkedIn.

The implementation is fairly straightforward; let me know if you need a more complete writeup.

Test cases pass locally:

$ python test/test_results.py linkedin

https://www.linkedin.com/feed/update/urn:li:activity:7381400030911500288/
# feed URL with video content - Found 1 media item
.
https://www.linkedin.com/posts/invalid-url-format
# invalid URL format - should trigger StopExtraction when post ID extraction fails
.
https://www.linkedin.com/posts/groveben_in-a-time-of-agentic-creation-taste-and-ugcPost-7403592769451335681-Jt4n
# posts URL with video content - Found 1 media item
.
https://www.linkedin.com/posts/the-brand-identity-group-ltd_since-2003-nuits-sonores-has-transformed-activity-7404006895822372867-2rvj?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAkyN_sBOslkonzC5zCQEyrDt7AVDULHCNA
# posts URL with multiple photos and querystring - Found 4 media items
.
https://www.linkedin.com/posts/groveben_in-a-time-of-agentic-creation-taste-and-ugcPost-7403592769451335681-Jt4n?utm_source=test&utm_medium=test
# querystring removal verification - should work same as without querystring
.
----------------------------------------------------------------------
Ran 5 tests in 5.983s

OK
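
For reviewers who want a quick feel for the URL shapes exercised above, here is a minimal sketch of how a post ID might be pulled out of them. This is not the code in this PR: `extract_post_id` and its regular expression are hypothetical, inferred only from the five test URLs, and a real extractor would presumably raise `StopExtraction` where this sketch returns `None`.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern (an assumption based on the test URLs above):
# a run of digits that follows "activity" or "ugcPost", separated by ":" or "-".
_POST_ID = re.compile(r"(?:activity|ugcPost)[:-](\d+)")


def extract_post_id(url):
    """Return the numeric post ID as a string, or None if none is found."""
    # urlsplit() separates the query string from the path, so tracking
    # parameters such as utm_source never reach the regular expression.
    path = urlsplit(url).path
    match = _POST_ID.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    base = ("https://www.linkedin.com/posts/groveben_in-a-time-of-agentic-"
            "creation-taste-and-ugcPost-7403592769451335681-Jt4n")
    print(extract_post_id(base))
    print(extract_post_id(base + "?utm_source=test&utm_medium=test"))
    print(extract_post_id("https://www.linkedin.com/posts/invalid-url-format"))
```

Matching on the path alone is what makes the querystring case behave the same as the plain URL; the invalid URL simply has no `activity` or `ugcPost` segment to match.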

mikf and others added 30 commits October 12, 2025 09:37
Co-authored-by: ClosedPort22 <44864697+ClosedPort22@users.noreply.github.com>
match <source> elements with attributes before 'src="..."'
provide 'width_original' & 'height_original' metadata
micro optimizations...
test for 'dt:…' to catch NullDatetime instances,
which inherit from 'datetime.datetime' but are not exactly this class
mikf and others added 24 commits December 18, 2025 08:36
Co-authored-by: d-koc <141529764+d-koc@users.noreply.github.com>
append directory segments for each item of a list (or general non-string
iterable), which can be returned with the 'I' specifier
'filename' & 'directory' set in 'path' tests cause
'ClassifyTest' to fail
Python's 'ast' module and its 'NodeVisitor' class
were incredibly helpful in identifying these
mikf added 3 commits December 27, 2025 17:52
lexicographical order
run 'make' and let 'scripts/supportedsites.py' do its thing