Compare commits

342 Commits

Author SHA1 Message Date
Philipp Hagemeister
eb6cb9fbe9 release 2015.05.29 2015-05-29 07:52:17 +02:00
Yen Chi Hsuan
84e1e036c2 [senate] Extend _VALID_URL (fixes #5836) 2015-05-29 12:44:31 +08:00
Sergey M․
9e0b579128 [nowtv] Add test for rtlnitro 2015-05-28 01:26:14 +06:00
Sergey M․
ff4a1279f2 [nowtv] Do not request unnecessary metadata 2015-05-28 01:15:04 +06:00
Sergey M․
9b254aa177 [nowtv] Add non-free video check 2015-05-27 23:41:43 +06:00
Sergey M․
d9446c7319 Merge branch 'akirk-nowtv' 2015-05-27 23:22:19 +06:00
Sergey M․
b25b645d51 [nowtv] Improve and simplify 2015-05-27 23:20:32 +06:00
Sergey M․
bf24c3d017 [facebook] Improve title regex (Closes #5816) 2015-05-27 21:25:07 +06:00
Yen Chi Hsuan
f0bfaa2d7d [nrk] Update subtitles test
Subtitle conversion routine is removed, so the subtitles are TTML now. See
1c7e2e64f6
2015-05-27 15:23:34 +08:00
Yen Chi Hsuan
f9f3e3df9a [teamcoco] Use determine_ext to determine the video type
Some videos do not contain a 'type' field (#5798)
2015-05-27 14:51:18 +08:00
Yen Chi Hsuan
f8d5e1cfb5 [naver] Fix video url (fixes #5809)
RTMP URLs in test:naver do not work. Needs more investigation.
2015-05-27 14:44:08 +08:00
Yen Chi Hsuan
c23848b3c5 [naver] Enhanced error detection 2015-05-27 14:20:29 +08:00
Yen Chi Hsuan
6d00a2dcd1 [bilibili] Catch API call failures
JSON is returned in a failed API call
2015-05-27 04:23:21 +08:00
Yen Chi Hsuan
b535170b21 [bilibili] Skip assertion if HQ videos not available 2015-05-27 04:14:24 +08:00
Sergey M․
1434184c57 [spankwire] Do not modify aes key string 2015-05-27 01:42:53 +06:00
Sergey M․
7a372b64df [pornhub] Do not modify aes key string (Closes #5824) 2015-05-27 01:41:00 +06:00
Sergey M․
5406af92bc [dailymotion:user] Fix _VALID_URL 2015-05-26 22:16:47 +06:00
Sergey M․
7d65242dc3 [dailymotion:user] Process user home as user (Closes #5823) 2015-05-26 22:12:26 +06:00
Naglis Jonaitis
544a8693b7 Remove Firedrive and Sockshare imports
Oops
2015-05-26 13:53:14 +03:00
Naglis Jonaitis
35a4f24a37 [firedrive] Remove extractor (Closes #3870)
Haywire since last October.
2015-05-26 13:44:46 +03:00
Naglis Jonaitis
ff305edd64 [sockshare] Remove extractor
Haywire since last October.
2015-05-26 13:43:00 +03:00
Yen Chi Hsuan
efec4358b9 [cinemassacre] Support an alternative form of screenwavemedia URL
fixes #5821
2015-05-26 13:54:41 +08:00
Yen Chi Hsuan
db3ca36403 [facebook] Move the title extraction warning below (fixes #5820) 2015-05-26 13:41:38 +08:00
Yen Chi Hsuan
42833b44b5 [tf1] Extend _VALID_URL (fixes #5819) 2015-05-26 13:32:43 +08:00
Alexander Kirk
5d0a33eebc rtlnow is now hosted at nowtv.de 2015-05-25 20:36:25 +02:00
Sergey M․
ba2df04b41 [odnoklassniki] Make URL explicit 2015-05-25 21:27:43 +06:00
Sergey M․
c6bbdadd79 [odnoklassniki] Support extraction from metadata URL (Closes #5813) 2015-05-25 21:22:13 +06:00
Sergey M․
b885bae634 Credit @misterhat for karrierevideos (#5729) 2015-05-25 04:53:53 +06:00
Sergey M․
d41ebe146b [tenplay] Fix formats and modernize (Closes #5806) 2015-05-24 23:58:09 +06:00
Jaime Marquínez Ferrándiz
4b4e1af059 [arte] Remove unused import 2015-05-24 18:46:29 +02:00
Sergey M.
80240b347e Merge pull request #5780 from jaimeMF/remove-nondash
[youtube] Remove the nondash formats (fixes #5774)
2015-05-24 21:42:15 +05:00
Jaime Marquínez Ferrándiz
04b3b3df05 [youtube] Remove the nondash formats (fixes #5774)
Since we use fixed values for some fields, like width and height, they can be wrong and would get picked by some format filters.
For example, for https://www.youtube.com/watch?v=EQCrhbBxsjA the biggest height is 720, but for nondash formats it's set to 1440, so -f 'bestvideo[height>=1200]+bestaudio' would incorrectly pick the nondash format; instead it should report that the requested format is not available.
2015-05-24 18:26:20 +02:00
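The problem described in the commit above can be illustrated with a minimal sketch. It is not youtube-dl's actual format selector; `filter_min_height` and the format ids are hypothetical names, and only the heights 720 and 1440 come from the commit message.

```python
def filter_min_height(formats, min_height):
    """Keep only formats whose declared height is at least min_height."""
    return [f for f in formats if (f.get('height') or 0) >= min_height]


# Hypothetical format lists: the DASH entries carry real heights, while the
# nondash entry carries a fixed (and here wrong) height of 1440.
dash_formats = [
    {'format_id': 'dash-360', 'height': 360},
    {'format_id': 'dash-720', 'height': 720},
]
nondash_format = {'format_id': 'nondash-fixed', 'height': 1440}

# With the nondash entry present, a height>=1200 filter wrongly finds a match.
with_nondash = filter_min_height(dash_formats + [nondash_format], 1200)

# Without it, the filter finds nothing, so "format not available" can be reported.
without_nondash = filter_min_height(dash_formats, 1200)
```

Dropping the entries with fixed metadata means a filter can only match on values that were actually reported for the video.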
Sergey M․
2ad5708c43 [arte:future] Switch to search_regex for now (Closes #5801) 2015-05-24 21:25:00 +06:00
Sergey M․
63f3cab4ae [rtbf] Fix extraction (Closes #5803) 2015-05-24 21:09:08 +06:00
Sergey M․
8cdf03a7a2 Merge branch 'misterhat-karrierevideos' 2015-05-24 20:14:54 +06:00
Sergey M․
d78c834ead [karrierevideos] Improve and simplify 2015-05-24 20:04:13 +06:00
Sergey M․
05a976cd99 Merge branch 'karrierevideos' of https://github.com/misterhat/youtube-dl into misterhat-karrierevideos 2015-05-24 19:19:48 +06:00
Sergey M․
34fb7e46ad [empflix] Relax _VALID_URL 2015-05-24 19:11:40 +06:00
Sergey M․
abac15f3c6 [tnaflix] Do not capture cat_id 2015-05-24 19:11:31 +06:00
Sergey M.
b700055ba4 Merge pull request #5772 from frenchy1983/fix_tnaflix_regex
[TNAFlix] Allow dot (and more) in cat_id and display_id
2015-05-24 17:54:25 +05:00
Sergey M.
23905927e1 [README.md] Keep more idiomatic rwx order 2015-05-24 18:32:04 +06:00
Sergey M.
56be5f1567 Merge pull request #5800 from WassimAttar/patch-1
[README.md] chmod error
2015-05-24 17:29:26 +05:00
WassimAttar
1807ae22dd chmod error
After installing youtube-dl with this method
    sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
    sudo chmod a+x /usr/local/bin/youtube-dl
When I try to use it, I get this error
    python: can't open file '/usr/local/bin/youtube-dl': [Errno 13] Permission denied

The correct chmod is a+xr
2015-05-24 10:37:05 +02:00
Sergey M․
71646e4653 [YoutubeDL] Initialize files_to_delete (Closes #5797) 2015-05-24 04:14:01 +06:00
Sergey M․
1335c3aca8 [drtv] Improve extraction (Closes #5792) 2015-05-24 01:22:11 +06:00
Yen Chi Hsuan
30455ce255 [nextmedia] Extend and reorder _VALID_URL 2015-05-24 02:42:01 +08:00
Yen Chi Hsuan
9bf87ae3aa [nextmedia] Merge AppleDailyRealtimeNewsIE and AppleDailyAnimationNewsIE 2015-05-24 02:36:47 +08:00
Yen Chi Hsuan
abca34cbc0 [cnn] Relax _VALID_URL again (fixes #5737)
The problem is the same as test:CNN_1, so I didn't add the test case
2015-05-24 02:04:02 +08:00
Sergey M․
d386878af9 [prosiebensat1] Add support for .at domain names (Closes #5786) 2015-05-23 21:25:53 +06:00
Sergey M․
685c74d315 [rutv] Extend embed URL (Closes #5782) 2015-05-23 01:01:47 +06:00
Sergey M․
69e0f1b445 Credit @ping for viki:channel, qqmusic:toplist 2015-05-23 00:08:10 +06:00
Jaime Marquínez Ferrándiz
79979c6897 Clarify that --dump-pages encodes the pages using base64 (#5781) 2015-05-22 16:15:50 +02:00
Jaime Marquínez Ferrándiz
ba64547616 [sportbox] Remove unused import 2015-05-22 11:35:09 +02:00
frenchy1983
ed5a637d62 [TNAFlix] Restore test
See dstftw's comment in #5772
2015-05-22 09:29:35 +02:00
Yen Chi Hsuan
8a278a1d7e [nba] Fix duration extraction (fixes #5777) 2015-05-22 13:30:39 +08:00
Sergey M․
77d9cb2f04 [sportbox] Fix extraction 2015-05-22 00:45:33 +06:00
Sergey M․
0459432d96 [shared] Fix for python 3.2 2015-05-22 00:10:53 +06:00
Sergey M․
43150d7ac3 [shared] Fix for python 3.2 2015-05-22 00:10:05 +06:00
Sergey M․
afe8b594be [rtve.es:alacarta] Fix for python 3.2 2015-05-22 00:09:15 +06:00
Sergey M․
878563c847 [aes] Fix for python 3.2 2015-05-22 00:06:10 +06:00
Sergey M․
06947add03 [chilloutzone] Fix for python 3.2 2015-05-22 00:03:47 +06:00
Sergey M․
5cd47a5e4f [videott] Fix for python 3.2 2015-05-21 23:58:46 +06:00
Sergey M․
53de95da5e [viki] Extend _VALID_URLs 2015-05-21 22:27:22 +06:00
Sergey M․
663004ac2b [options] Clarify --metadata-from-title additional templates 2015-05-21 22:06:25 +06:00
Jaime Marquínez Ferrándiz
6ad9cb224a [mitele] It now uses m3u8 (#5764)
It should also be possible to use Adobe HDS, but it would require more work.
2015-05-21 12:02:53 +02:00
frenchy1983
e7752cd578 [TNAFlix] Allow dot (and more) in cat_id and display_id
URLs with dots were raising a "UnsupportedError: Unsupported URL" error.
2015-05-21 11:47:16 +02:00
Jaime Marquínez Ferrándiz
4d2f42361e [viki] remove unused import 2015-05-21 11:42:20 +02:00
Sergey M․
4d8ee01389 [viki] Fix typo 2015-05-21 02:38:43 +06:00
Sergey M․
d01924f488 [viki:channel] Extend matching URLs and extract movies 2015-05-21 02:30:04 +06:00
Sergey M․
bc56355ec6 [viki:channel] Switch to API 2015-05-21 02:08:13 +06:00
Sergey M․
ac20d95f97 [viki] Add support for youtube externals 2015-05-21 01:56:02 +06:00
Sergey M․
1a83c731bd [viki] Switch extraction to API 2015-05-21 01:44:05 +06:00
Sergey M․
ca57a59883 Merge branch 'ping-viki-shows' 2015-05-20 22:10:06 +06:00
Sergey M․
b0d619fde2 [viki:channel] Extract title from JSON 2015-05-20 21:28:04 +06:00
Sergey M․
cc7051efd7 Merge branch 'viki-shows' of https://github.com/ping/youtube-dl into ping-viki-shows 2015-05-20 20:17:47 +06:00
Philipp Hagemeister
0b9f7cd074 release 2015.05.20 2015-05-20 10:01:48 +02:00
Yen Chi Hsuan
051df9ad99 [letv/sohu] Skip tests relying on external proxies
The proxy is currently broken. See #5655 and zhuzhuor/Unblock-Youku#427
2015-05-20 14:08:23 +08:00
Sergey M․
d9d747a06a [ultimedia] Fix extraction 2015-05-19 21:28:41 +06:00
Yen Chi Hsuan
b813d8caf1 [qqmusic] Unescape '\\n' in description (#5705) 2015-05-19 01:01:42 +08:00
Yen Chi Hsuan
ecee572411 [yahoo] Add support for closed captions (closes #5714) 2015-05-19 00:50:24 +08:00
Yen Chi Hsuan
1b0427e6c4 [utils] Support TTML without default namespace
In a strict sense such TTML is invalid, but Yahoo uses it.
2015-05-19 00:45:01 +08:00
Jaime Marquínez Ferrándiz
2aa64b89b3 tox: Pass HOME environment variable
Since version 2.0 it only passes a limited set of variables and we need HOME for the tests
2015-05-18 17:58:53 +02:00
Sergey M․
484c9d2d5b [vier] Fix extraction 2015-05-18 21:43:54 +06:00
Sergey M․
5d8dcb5342 [vuclip] Fix extraction 2015-05-18 21:39:15 +06:00
Sergey M․
2328f2fe68 [vulture] Fix extraction 2015-05-18 21:34:20 +06:00
Sergey M․
4f514c7e88 [wimp] Fix youtube extraction (Closes #5690) 2015-05-18 21:29:41 +06:00
Sergey M․
5bdc520cf1 [xminus] Fix extraction 2015-05-18 21:23:05 +06:00
Jaime Marquínez Ferrándiz
fc6e75dd57 [instagram] Only recognize https urls (fixes #5739)
http urls redirect to them.
2015-05-18 11:21:09 +02:00
Sergey M․
4a5a898a8f [YoutubeDL] Clarify incompatible formats merge message
When `-f` is not specified, it's misleading to see `You have requested ...` as the user did not actually request any formats.
2015-05-17 20:56:03 +06:00
Mister Hat
ba9d16291b manually specify namespace 2015-05-17 03:35:08 -05:00
Mister Hat
725652e924 [karrierevideos] add support for www.karrierevideos.at (closes #5354) 2015-05-16 19:50:58 -05:00
ping
8da0e0e946 [viki] Change IE name to channel, better message output 2015-05-17 06:19:38 +08:00
Sergey M․
588b82bbf8 [tv2:article] Add extractor (Closes #5724) 2015-05-17 03:32:53 +06:00
Sergey M․
bc0f937b55 [tv2] Add extractor (#5724) 2015-05-17 03:01:52 +06:00
Sergey M․
baa43cbaf0 [extractor/common] Relax valid url check verbosity 2015-05-17 02:59:35 +06:00
Sergey M․
adb6b1b316 Merge branch 'viki-shows' of https://github.com/ping/youtube-dl into ping-viki-shows 2015-05-17 00:38:58 +06:00
ping
1c18de0019 [viki] Add proper paging and include clips 2015-05-17 01:38:50 +08:00
Jaime Marquínez Ferrándiz
4d52f2eb7f [sbs] Remove unused import 2015-05-16 18:38:28 +02:00
Sergey M․
363cf58645 Merge branch 'viki-shows' of https://github.com/ping/youtube-dl into ping-viki-shows 2015-05-16 21:28:36 +06:00
Sergey M․
7e760fc188 [espn] Add extractor (#4396)
Unfinished
2015-05-16 21:14:19 +06:00
Sergey M․
ef2dcbe4ad [sbs] Fix extraction (Closes #5725) 2015-05-16 21:07:29 +06:00
Sergey M․
9354a5fad4 [ooyala] Fix unresolved reference 2015-05-16 20:15:31 +06:00
Sergey M․
1c97b0a777 [ooyala:external] Add extractor 2015-05-16 20:00:40 +06:00
ping
2f3bdab2b9 [viki] Fix code format 2015-05-16 15:56:37 +08:00
ping
0d7f036429 [viki] Add support for shows 2015-05-16 15:43:13 +08:00
Sergey M.
2cda13213d Merge pull request #5717 from blissland/master
[CBSNewsIE] Relax thumbnail regex so test passes
2015-05-15 22:36:07 +05:00
Sergey M․
70d0d43b5e [rts] Check formats (Closes #5711) 2015-05-15 23:32:25 +06:00
Sergey M․
25c3a7348f [generic] Fix typo 2015-05-15 23:23:51 +06:00
Sergey M․
9123d64592 Merge branch 'maddoger-sportbox-fix' 2015-05-15 23:19:21 +06:00
Sergey M․
b827a6015c [generic] Add test for sportbox embeds 2015-05-15 23:18:21 +06:00
Sergey M․
d40a3b5b55 [generic] Add support for sportbox embeds 2015-05-15 23:09:34 +06:00
Sergey M․
ef28a6cb26 [sportbox:embed] Relax thumbnail 2015-05-15 23:09:10 +06:00
Sergey M․
1436a6835e [sportbox:embed] Add _extract_urls 2015-05-15 23:08:44 +06:00
blissland
e8cfacae37 [CBSNewsIE] Relax thumbnail regex so test passes 2015-05-15 17:57:32 +01:00
Sergey M․
3a7382950b [sportbox:embed] Add extractor 2015-05-15 22:50:44 +06:00
Jaime Marquínez Ferrándiz
eeb23eb7ea [gamespot] The protocol is not optional 2015-05-15 18:44:08 +02:00
Jaime Marquínez Ferrándiz
34fe5a94ba [gamespot] Add support for videos that don't use 'f4m_stream' (fixes #5707) 2015-05-15 18:42:59 +02:00
Sergey M․
6181864290 Merge branch 'sportbox-fix' of https://github.com/maddoger/youtube-dl into maddoger-sportbox-fix 2015-05-15 22:09:18 +06:00
Vitaliy Syrchikov
e9ca615a98 New test 2015-05-15 19:57:54 +04:00
Sergey M․
62c95fd5fc [youtube:feed] Check each 'load more' portion for unique video ids 2015-05-15 21:42:34 +06:00
Sergey M․
25f14e9f93 [youtube] Separate feed extractor 2015-05-15 21:06:59 +06:00
Vitaliy Syrchikov
ae670a6ed8 Sportbox source fix. HD videos support. 2015-05-15 17:53:05 +04:00
Vitaliy Syrchikov
a7b8467ac0 Sportbox extractor fix. 2015-05-15 16:52:11 +04:00
blissland
15da7ce7fb Fix file format extraction regex and update test file checksum 2015-05-15 14:12:52 +02:00
Jaime Marquínez Ferrándiz
e9eaf3fbcf [test/YoutubeDL] Add tests for 'playliststart', 'playlistend' and 'playlist_items' 2015-05-15 14:08:26 +02:00
Jaime Marquínez Ferrándiz
3884dcf313 YoutubeDL: ignore indexes from 'playlist_items' that are not in the list (fixes #5706)
We ignore them instead of failing to match the behaviour of the 'playliststart' parameter.
2015-05-15 14:08:26 +02:00
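The behaviour described in the commit above (ignoring out-of-range indexes rather than failing) can be sketched as follows. This is an illustration only; `select_playlist_items` is a hypothetical helper, not the code in YoutubeDL.py, and it assumes 1-based indexes.

```python
def select_playlist_items(entries, indexes):
    """Return the entries at the given 1-based indexes.

    Indexes that fall outside the list are skipped silently instead of
    raising IndexError.
    """
    return [entries[i - 1] for i in indexes if 1 <= i <= len(entries)]


entries = ['first', 'second', 'third']
selected = select_playlist_items(entries, [1, 3, 5])
```

Here index 5 is simply dropped, so a request that partly overshoots the playlist still yields the items that do exist.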
Philipp Hagemeister
c4fc559f45 release 2015.05.15 2015-05-15 10:13:43 +02:00
Jaime Marquínez Ferrándiz
2bc4330303 [youtube:history] Fix extraction (fixes #5702)
It uses the same method as YoutubeSubscriptionsIE; if another feed starts using it, we should consider using a base class.
2015-05-14 23:41:27 +02:00
Yen Chi Hsuan
12675275a1 [teamcoco] Detect expired videos (#5626) 2015-05-15 02:28:41 +08:00
Yen Chi Hsuan
3a105f7b20 [teamcoco] Rewrite preload data extraction
Idea: "puncture" some consecutive fragments and check whether the
b64decode result of a punctured string is a valid JSON or not.

It's an O(N^3) algorithm, but should be fast for a small N (less than 30
fragments in all test cases)
2015-05-15 02:28:40 +08:00
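One possible reading of the "puncture" idea above, as a sketch rather than the extractor's actual code: try removing every consecutive run of fragments and keep the first result that base64-decodes to valid JSON. The function name and the sample data are assumptions made for illustration.

```python
import base64
import binascii
import json


def find_punctured_json(fragments):
    """Drop one consecutive run of fragments so that the rest decodes to JSON.

    There are O(N^2) (start, end) pairs, and each candidate costs O(N) to
    join and decode, which gives the O(N^3) bound from the commit message.
    """
    n = len(fragments)
    for start in range(n + 1):
        for end in range(start, n + 1):
            candidate = ''.join(fragments[:start] + fragments[end:])
            try:
                return json.loads(base64.b64decode(candidate).decode('utf-8'))
            except (binascii.Error, ValueError):
                # Bad padding, undecodable bytes or invalid JSON: try the next run.
                continue
    return None


# Hypothetical data: a base64 payload cut into 4-character fragments with one
# junk fragment ("Zm9v" decodes to "foo") spliced into the middle.
payload = {'a': 1, 'b': [1, 2, 3]}
encoded = base64.b64encode(json.dumps(payload).encode('utf-8')).decode('ascii')
fragments = [encoded[i:i + 4] for i in range(0, len(encoded), 4)]
fragments.insert(3, 'Zm9v')
recovered = find_punctured_json(fragments)
```

A brute-force search like this tolerates changes in where the junk is inserted, at the cost of the cubic running time the commit message accepts for small N.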
Sergey M․
1ae72fb23d [soundcloud:user] Defer download link resolve (Closes #5248)
It looks like final download links can expire before the download process reaches them, so download links are now resolved right before the actual download.
2015-05-14 22:28:42 +06:00
Yen Chi Hsuan
7ec676bb3d [qqmusic] Add IE_NAME for all extractors 2015-05-14 23:32:36 +08:00
Yen Chi Hsuan
29ea57283e [qqmusic] Refactoring QQMusicToplistIE 2015-05-14 23:28:42 +08:00
Yen Chi Hsuan
5488973961 [qqmusic] flake8 2015-05-14 23:25:43 +08:00
Yen Chi Hsuan
96d45a5489 Merge pull request #5680 from ping/qqmusic-toplist-ie
[qqmusic] Add support for charts / top lists
2015-05-14 23:23:32 +08:00
Sergey M․
7a012d5a16 [screenwavemedia] Add support for player2 URLs (Closes #5696) 2015-05-14 16:39:35 +06:00
Yen Chi Hsuan
fa6a16996e [worldstarhiphop] Support Android URLs (fixes #5629) 2015-05-14 18:00:57 +08:00
Sergey M․
82245a6de7 [YoutubeDL] Restore filename for thumbnails 2015-05-14 15:21:27 +06:00
Sergey M․
ff28ede2d1 Merge branch 'dstftw-best-fallback-on-outdated-avconv' 2015-05-14 15:19:14 +06:00
Sergey M․
98b8ec8616 Merge branch 'best-fallback-on-outdated-avconv' of https://github.com/dstftw/youtube-dl into dstftw-best-fallback-on-outdated-avconv
Conflicts:
	youtube_dl/YoutubeDL.py
2015-05-14 15:18:58 +06:00
Yen Chi Hsuan
88f9d8748c Merge remote-tracking branch 'upstream/master' 2015-05-14 17:07:02 +08:00
Sergey M․
7d57d2e18b [canalplus] Restore checksums in tests 2015-05-14 14:59:27 +06:00
Sergey M.
38caa00d18 Merge pull request #5695 from blissland/master
[CanalplusIE] Update tests that were no longer working
2015-05-14 13:57:56 +05:00
Yen Chi Hsuan
c827d4cfdb [xattr] Enhanced error messages on Windows 2015-05-14 16:53:10 +08:00
blissland
509c630db8 [CanalplusIE] Update tests that were no longer working 2015-05-14 08:09:56 +01:00
Yen Chi Hsuan
fbff30d2db [xattr] Catch 'Argument list too long' 2015-05-14 14:51:00 +08:00
Yen Chi Hsuan
86c7fdb17c [xattr] Enhance error handling to catch ENOSPC
Fixes #5589
2015-05-14 14:28:41 +08:00
Yen Chi Hsuan
62bd6589c7 Merge pull request #5692 from yan12125/fix-embedthumbnailpp
Use thumbnails downloaded by YoutubeDL in EmbedThumbnailPP
2015-05-14 12:35:58 +08:00
Yen Chi Hsuan
2cc6d13547 [postprocessor/embedthumbnail] Encode arguments in calling AtomicParsley 2015-05-14 04:41:30 +08:00
Yen Chi Hsuan
bb8ca1d112 [postprocessor/embedthumbnail] Use run_ffmpeg_multiple_files 2015-05-14 02:35:28 +08:00
Yen Chi Hsuan
8e59539752 [postprocessor/embedthumbnail] Use thumbnails downloaded by YoutubeDL 2015-05-14 02:32:00 +08:00
Sergey M․
372744c544 [odnoklassniki] Fix extraction (Closes #5671) 2015-05-13 22:26:30 +06:00
Sergey M.
83880949a1 Merge pull request #5682 from blissland/master
[BYUtvIE] Relax thumbnail regex so test does not fail
2015-05-13 19:36:22 +05:00
Yen Chi Hsuan
3749e36e9f [YoutubeDL] Fix PEP8 W503 2015-05-13 21:16:45 +08:00
blissland
0b4253fa37 [BYUtvIE] Change thumbnail regex so test does not fail 2015-05-12 18:57:06 +01:00
ping
86ec1e487c [qqmusic] Code fixes 2015-05-13 01:37:56 +08:00
ping
fd4eefed39 [qqmusic] Fix extraction for global list 2015-05-13 01:14:02 +08:00
ping
b480e7874b [qqmusic] Fix code formatting 2015-05-12 22:41:37 +08:00
ping
41333b97b9 [qqmusic] Add support for charts / top lists 2015-05-12 22:35:16 +08:00
Yen Chi Hsuan
c1c924abfe [utils,common] Merge format_srt_time and _subtitles_timecode
format_srt_time uses a comma as the delimiter between seconds and
milliseconds while _subtitles_timecode uses a dot. All .srt examples I
found on the Internet use a comma, so I use a comma in the merged
version. See http://matroska.org/technical/specs/subtitles/srt.html and
http://devel.aegisub.org/wiki/SubtitleFormats/SRT
2015-05-12 13:04:54 +08:00
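The comma-delimited timecode format discussed above can be sketched as below. This is a minimal illustration with a hypothetical function name, not the merged helper from utils.py.

```python
def srt_timecode(seconds):
    """Format a time in seconds as an SRT timecode, HH:MM:SS,mmm.

    SRT separates seconds from milliseconds with a comma, not a dot.
    """
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3600000)
    minutes, rem = divmod(rem, 60000)
    secs, ms = divmod(rem, 1000)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, secs, ms)


example = srt_timecode(3661.5)
```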
Yen Chi Hsuan
1c7e2e64f6 [nrk] Remove TTML to srt conversion codes
A common routine is implemented in utils.py and can be used via
--convert-subtitles.
2015-05-12 12:55:14 +08:00
Yen Chi Hsuan
7dff03636a [utils] Support 'dur' field in TTML 2015-05-12 12:47:37 +08:00
Yen Chi Hsuan
5332fd91bf [nytimes] Correct _VALID_URL of NYTimesArticleIE 2015-05-12 12:42:13 +08:00
Sergey M․
d4b963d0a6 [vine] Relax alt_title (Closes #5677) 2015-05-12 01:54:56 +06:00
Sergey M․
6d3f5935e5 [southpark] Fix IE_NAME 2015-05-11 23:47:50 +06:00
rrooij
968ee17677 [southparkdk] Add extractor 2015-05-11 23:45:38 +06:00
rrooij
81ed3bb9c0 [southpark] Sort alphabetically 2015-05-11 23:45:29 +06:00
Sergey M․
5115652828 [zingmp3] Capture error message 2015-05-11 21:31:36 +06:00
Sergey M․
1f92865494 [dumpert] Add cpc cookie (Closes #5672) 2015-05-11 21:05:39 +06:00
Yen Chi Hsuan
e41f450f28 [tmz] Add support for articles (fixes #5477) 2015-05-11 20:06:10 +08:00
Sergey M․
97fcf1bbd0 [YoutubeDL] Check if merger can actually merge 2015-05-11 02:01:16 +06:00
Sergey M․
13763ce599 [postprocessor/ffmpeg] Add can_merge method 2015-05-11 02:00:31 +06:00
Sergey M․
7fcb605b82 [YoutubeDL] Fallback to -f best when merger is outdated 2015-05-11 00:27:29 +06:00
Sergey M․
70484b9f8a [postprocessor/ffmpeg] Extract check_outdated method 2015-05-11 00:26:39 +06:00
Jaime Marquínez Ferrándiz
69b46b3d95 ExecAfterDownloadPP: fix __init__ method 2015-05-10 17:47:49 +02:00
Jaime Marquínez Ferrándiz
95c5534f8e ExecAfterDownloadPP, YoutubeDL: remove unused parameters 2015-05-10 17:41:11 +02:00
Sergey M․
370b39e8ec [voicerepublic] Fix fallback branch formats extraction 2015-05-10 18:37:52 +06:00
Sergey M․
3da8038918 Merge branch 'duncankl-voicerepublic' 2015-05-10 18:29:36 +06:00
Sergey M․
a6762c4a22 [voicerepublic] Make more robust and extract more metadata 2015-05-10 18:29:15 +06:00
Sergey M․
98c2c0febc Merge branch 'voicerepublic' of https://github.com/duncankl/youtube-dl into duncankl-voicerepublic 2015-05-10 17:31:55 +06:00
Yen Chi Hsuan
63cbd19f50 [ndr] Replace the 404 test case 2015-05-10 18:30:26 +08:00
Yen Chi Hsuan
1934f3a0ea [ndr] Extended to support n-joy.de as well (closes #4527)
According to http://en.wikipedia.org/wiki/N-Joy, n-joy.de is a service
hosted by NDR, so I put them together.
2015-05-10 18:22:07 +08:00
ping
a909e6ad43 [dailymotion] Patch upload_date detection.
(closes #5665)
2015-05-10 11:13:14 +02:00
Duncan
1dcb52188d [voicerepublic] Remove hardcoded paths to media files 2015-05-10 17:06:34 +12:00
Duncan
28ebef0b1b [voicerepublic] Detect list of available formats from the web page 2015-05-10 16:03:09 +12:00
Duncan
f03a8a3c4e [voicerepublic] Raise ExtractorError if audio is still being processed 2015-05-10 15:50:06 +12:00
Duncan
03f760b1c0 [voicerepublic] Remove creator field 2015-05-10 15:41:27 +12:00
Duncan
f900dc3fb9 [voicerepublic] Extract author using _html_search_meta 2015-05-10 15:01:58 +12:00
Sergey M․
95eb1adda8 [life:embed] Sort formats 2015-05-10 08:54:50 +06:00
Duncan
c6ddbdb66c [voicerepublic] Add new extractor 2015-05-10 12:39:24 +12:00
Sergey M․
3800b908b1 [mlb] Fix #5663 2015-05-10 06:14:34 +06:00
Philipp Hagemeister
69fe3a5f09 release 2015.05.10 2015-05-10 01:05:24 +02:00
Sergey M․
754270313a [life:embed] Move to separated extractor and extract m3u8 formats 2015-05-10 01:03:26 +06:00
Sergey M․
057ebeaca3 [lifenews] Add test for #5660 2015-05-10 00:27:49 +06:00
Sergey M․
480065172d [lifenews] Add support for video URLs (Closes #5660) 2015-05-10 00:26:42 +06:00
Sergey M․
f2e0056579 [vgtv] Avoid duplicate format_id 2015-05-09 21:23:09 +06:00
Sergey M․
32fffff2cc [eroprofile] Fix video URL extraction (Closes #5657) 2015-05-09 21:19:09 +06:00
Sergey M.
3c47824d6b Merge pull request #5658 from blissland/master
[BRIE] Updated two test cases
2015-05-09 20:07:21 +05:00
blissland
0892090a56 Added audio test for BRIE 2015-05-09 16:02:07 +01:00
blissland
d592b42f5c Updated two tests for BRIE 2015-05-09 15:26:00 +01:00
Jaime Marquínez Ferrándiz
3b5f65a64c [mlb] Fix extraction of articles
And move test from generic, since it's directly handled by MLBIE
2015-05-09 12:41:56 +02:00
Jaime Marquínez Ferrándiz
5c0b2c16a8 [vgtv] Escape '#' in _VALID_URL and remove empty newlines at the end
In verbose mode, '#' is interpreted as the start of a comment.
2015-05-09 12:34:45 +02:00
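The effect of an unescaped '#' in a verbose-mode pattern can be reproduced with the standard `re` module. The URL and patterns below are hypothetical and only illustrate the behaviour; they are not the vgtv extractor's _VALID_URL.

```python
import re

# In verbose mode ((?x)), an unescaped '#' starts a comment that runs to the
# end of the line, so everything after it is dropped from the pattern.
unescaped = r'(?x)https?://example\.com/#!/video/(?P<id>\d+)'
escaped = r'(?x)https?://example\.com/\#!/video/(?P<id>\d+)'

url = 'https://example.com/#!/video/42'

# The unescaped pattern still matches, but only the part before '#', and the
# 'id' group no longer exists in the compiled pattern.
has_id_group = 'id' in re.compile(unescaped).groupindex
truncated_match = re.match(unescaped, url).group(0)

# Escaping '#' keeps the rest of the pattern, including the 'id' group.
video_id = re.match(escaped, url).group('id')
```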
Yen Chi Hsuan
d39e0f05db [utils] Remove sanitize_url_path_consecutive_slashes()
This function is used only in SohuIE, which is updated to use a new
extraction logic.
2015-05-09 17:37:39 +08:00
Yen Chi Hsuan
6d14d08e06 [yam] Fix title and uploader id 2015-05-09 17:36:07 +08:00
Yen Chi Hsuan
32060c6d6b [sohu] Update extractor
The original extraction logic always fails for all test videos
2015-05-09 14:02:11 +08:00
Yen Chi Hsuan
3dbec410a0 [sohu] Enhance error handling 2015-05-09 14:02:11 +08:00
Sergey M․
de765f6c31 [foxsports] Support some more URLs (#5611) 2015-05-09 02:15:51 +06:00
Sergey M․
dc455a5f88 [extractor/generic] Add test for svt embed 2015-05-09 00:27:37 +06:00
Sergey M․
bab19a8e91 [extractor/generic] Add support for svt embeds (Closes #5622) 2015-05-09 00:23:35 +06:00
Sergey M․
322915014f [svtplay] Rename to svt 2015-05-09 00:13:40 +06:00
Sergey M․
79998cd5af [svtplay] Generalize svt extractors and add svt.se extractor 2015-05-09 00:12:42 +06:00
Sergey M.
50b9013064 [README.md] Fix typo 2015-05-08 23:21:23 +06:00
Sergey M.
bb03fdae0d [README.md] Clarify format selection when streaming to stdout 2015-05-08 23:19:57 +06:00
Sergey M․
4384cf9e7d [extractor/__init__] Fix alphabetic order 2015-05-08 23:04:27 +06:00
Sergey M.
d47e980d0d Merge pull request #5641 from dstftw/preserve-best-for-stdout-outtmpl
[YoutubeDL] Do not force bestvideo+bestaudio when outtmpl is stdout
2015-05-08 22:01:50 +05:00
Sergey M․
fe373287eb [vgtv] Add support for bt vestlendingen (Closes #5620) 2015-05-08 22:59:50 +06:00
Sergey M․
cbe443362f [aftenposten] Implement in terms of xstream extractor 2015-05-08 22:52:20 +06:00
Sergey M․
2c0c9dc46c [xstream] Move xstream to separate extractor 2015-05-08 22:50:01 +06:00
Sergey M․
0ceab84749 [vgtv] Add support for bt.no articles (#5620) 2015-05-08 22:18:43 +06:00
Sergey M․
34e7dc81a9 [vgtv] Add support for generic bt.no URLs (#5620) 2015-05-08 22:03:03 +06:00
Sergey M․
4e6e9d21bd [mlb] Improve _VALID_URL 2015-05-08 21:48:47 +06:00
Sergey M․
d1feb30811 [mlb] Fallback to extracting video id from webpage for all URLs that do not contain it explicitly (Closes #5630) 2015-05-08 20:07:53 +06:00
blissland
43837189c1 Fix URL template extraction for netzkino. Fixes #5614 2015-05-08 12:20:34 +02:00
blissland
249962ffa2 [bet] Use unique part of xml url as the video id and fix tests (closes #5642)
The guid changes often.
2015-05-08 11:31:05 +02:00
Jaime Marquínez Ferrándiz
541168039d [utils] get_exe_version: encode executable name (fixes #5647)
It failed in python 2.x when $PATH contains a directory with non-ascii characters.
2015-05-08 11:01:24 +02:00
Yen Chi Hsuan
7ef00afe9d [nhl] Support RTMP videos (fixes #4481) 2015-05-08 03:11:25 +08:00
Yen Chi Hsuan
156fc83a55 [downloader/rtmp] Fix a typo 2015-05-08 03:11:24 +08:00
Naglis Jonaitis
46be82b811 [vessel] Use main_video_asset when searching for video_asset (Fixes #5623) 2015-05-07 22:00:07 +03:00
Yen Chi Hsuan
09b412dafa [nhl] Partial support for hlg id (fixes #4285) 2015-05-08 02:14:28 +08:00
Jaime Marquínez Ferrándiz
5268a05e47 [ooyala] Style fix 2015-05-07 17:04:15 +02:00
Sergey M․
406224be52 [extractor/generic] Fix following incomplete redirects (#5640) 2015-05-07 21:02:59 +06:00
Sergey M․
3799834dcf [YoutubeDL] Do not force bestvideo+bestaudio when outtmpl is stdout (#5627) 2015-05-07 20:46:11 +06:00
Yen Chi Hsuan
553e412bda Merge branch 'master' of github.com:rg3/youtube-dl 2015-05-07 22:24:49 +08:00
Sergey M․
f22834a372 [bild] Relax thumbnail test check 2015-05-07 20:20:43 +06:00
Sergey M.
bd349a8704 Merge pull request #5638 from blissland/master
[BildIE] Fix ampersands in xml attributes & update test thumbnails
2015-05-07 19:18:35 +05:00
blissland
bc08873cff Fix indents 2015-05-07 15:09:27 +01:00
Yen Chi Hsuan
aafe273990 [ooyala] Use SAS API to extract info (fixes #4336) 2015-05-07 22:07:32 +08:00
blissland
c09593c04e [BildIE] Escape ampersands in xml and update test thumbnail 2015-05-07 15:07:11 +01:00
Yen Chi Hsuan
84bf31aaf8 [ooyala] Extract m3u8 information (#2292) 2015-05-07 18:12:01 +08:00
Yen Chi Hsuan
05d5392cda [common] Ignore subtitles in m3u8 2015-05-07 18:06:22 +08:00
Yen Chi Hsuan
d9a743d917 [vice] Remove a redundant print 2015-05-07 18:05:37 +08:00
Yen Chi Hsuan
ac6c358c2a [teamcoco] Fix extracting preload data again 2015-05-07 12:58:00 +08:00
Sergey M․
ad0c0ad3b4 [historicfilms] Fix tape id extraction 2015-05-06 21:52:26 +06:00
Sergey M․
1ed34f3dd6 [gorillavid] Switch 404 test to only matching 2015-05-06 21:43:36 +06:00
Sergey M․
6a8f9cd22e [giga] Fix view count extraction 2015-05-06 21:39:53 +06:00
Sergey M․
e8b9ab8957 [pbs] Add format_id for direct links 2015-05-06 21:31:25 +06:00
Sergey M․
74f728249f [extractor/common] Fallback to empty string for (yet) missing format_id in _sort_formats (Closes #5624) 2015-05-06 21:24:24 +06:00
blissland
d6a1738892 [archive.org] Fix incorrect url condition (closes #5628)
The condition for assigning to json_url is the wrong way round:

currently for url: aaa.com/xxx

we get:

aaa.com/xxx&output=json

instead of the correct value:

aaa.com/xxx?output=json
2015-05-06 15:06:10 +02:00
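The corrected condition described above amounts to choosing the separator based on whether the URL already has a query string. A minimal sketch, with a hypothetical function name rather than the extractor's own code:

```python
def add_json_output(url):
    """Append output=json, using '?' unless the URL already has a query string."""
    separator = '&' if '?' in url else '?'
    return url + separator + 'output=json'


plain = add_json_output('aaa.com/xxx')
with_query = add_json_output('aaa.com/xxx?start=1')
```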
Sergey M․
b326b07adc [lifenews] Use _proto_relative_url 2015-05-05 21:49:36 +06:00
Yen Chi Hsuan
07d2921c6d [lifenews] Correctly determine iframe links (fixes #5618) 2015-05-05 23:39:54 +08:00
Sergey M.
22e462c97a Merge pull request #5612 from rrooij/southparknl
Southparknl
2015-05-05 19:32:27 +05:00
rrooij
dcf8077906 [southparknl] Fix test to match playlist tests 2015-05-05 09:17:21 +02:00
rrooij
3408f6e64a [southparkde] Fix naming inconsistency
The class was first called 'SouthparkDe'. It is now changed to
'SouthParkDe' to match the name of the other extractors.
2015-05-05 09:01:07 +02:00
rrooij
e10dc0e1f0 [southparknl] Add extractor for southpark.nl 2015-05-05 08:59:09 +02:00
Sergey M․
ce5c1ae517 [noco] Remove unused import 2015-05-05 02:52:21 +06:00
Sergey M․
bbe718c97f Merge branch 'Tassatux-noco' 2015-05-05 02:50:58 +06:00
Sergey M․
01e4b1ee14 [noco] Update tests 2015-05-05 02:50:39 +06:00
Sergey M․
815ac0293e [noco] Modernize 2015-05-05 02:38:13 +06:00
Sergey M․
6568382d6f [noco] Extract all variations of audio/subtitles media 2015-05-05 02:27:24 +06:00
Sergey M․
f943b7ddce Merge branch 'noco' of https://github.com/Tassatux/youtube-dl into Tassatux-noco 2015-05-05 00:39:24 +06:00
Aurélien Dunand
ff9d68e7be [noco] Add test for multi languages video 2015-05-04 19:55:29 +02:00
Aurélien Dunand
7212560f4d [noco] Retrieve video language according to user options 2015-05-04 18:06:12 +02:00
Sergey M․
1aa43d77c0 [rutv] Remove superfluous check 2015-05-04 21:29:56 +06:00
Sergey M․
e038d5c4e3 [rutv] Fix preference 2015-05-04 21:29:32 +06:00
Sergey M․
dfad3aac98 [rutv] Fix live stream test URL 2015-05-04 21:23:26 +06:00
Yen Chi Hsuan
df8418ffcf [nytimes] Extend _VALID_URL (#2754) 2015-05-04 23:03:47 +08:00
Yen Chi Hsuan
50aa43b3ae [nytimes] Implement extracting videos from articles (closes #5436) 2015-05-04 23:03:47 +08:00
Jaime Marquínez Ferrándiz
a90552663e [livestream:original] Update url format (fixes #5598) 2015-05-04 16:54:01 +02:00
Jaime Marquínez Ferrándiz
883340c107 [livestream:original] Fix extraction (fixes #4702) 2015-05-04 16:52:17 +02:00
Yen Chi Hsuan
0fe2ff78e6 [NBC] Enhance embedURL extraction (closes #2549) 2015-05-04 21:55:04 +08:00
Philipp Hagemeister
dc1eed93be release 2015.05.04 2015-05-04 15:12:48 +02:00
Sergey M․
b2f82360d7 [escapist] Add uploader to tests 2015-05-04 19:06:07 +06:00
Sergey M․
782e0568ef [escapist] Modernize 2015-05-04 19:04:49 +06:00
Sergey M․
90b4b0eabe [escapist] Improve _VALID_URL 2015-05-04 19:01:08 +06:00
Sergey M․
cec04ef3a6 [escapist] Update tests' checksums 2015-05-04 19:00:34 +06:00
Sergey M․
71fa56b887 [escapist] Fix formats extraction 2015-05-04 18:59:22 +06:00
Yen Chi Hsuan
b9b3ab45ea [NBC] Enhance extraction of ThePlatform URL (fixes #5470) 2015-05-04 19:09:18 +08:00
Philipp Hagemeister
957b794c26 release 2015.05.03 2015-05-03 22:31:39 +02:00
Yen Chi Hsuan
8001607e90 [generic] Detect more MLB videos (fixes #5443) 2015-05-04 02:20:07 +08:00
Yen Chi Hsuan
3e7202c1bc [MLB] Extend _VALID_URL (#5443) 2015-05-04 01:59:26 +08:00
Yen Chi Hsuan
848edeab89 [lifenews] Detect <iframe> (fixes #5346) 2015-05-04 01:24:19 +08:00
Yen Chi Hsuan
1748d67aea [lifenews] Fix view count and comment count 2015-05-04 01:11:23 +08:00
Jaime Marquínez Ferrándiz
5477ca8239 [dailymotion] Use https urls
The video url still redirects to an http url, but it doesn't explicitly contain the video id.
2015-05-03 16:59:14 +02:00
Sergey M․
d0fd305023 [rutv] Add test for #5584 2015-05-03 10:00:34 +06:00
Sergey M․
8dab1e9072 [rutv] Recognize live streams (#5584) 2015-05-03 09:56:03 +06:00
Sergey M․
963aea5279 [baiduvideo] Improve _VALID_URL 2015-05-03 07:45:15 +06:00
Sergey M․
0a64aa7355 [vgtv] Fix _VALID_URL (Closes #5578) 2015-05-03 00:58:42 +06:00
Sergey M․
0669c89c55 [options] Clarify --write-annotations help 2015-05-02 23:38:30 +06:00
Sergey M․
2699da8041 [YoutubeDL] Improve description file naming 2015-05-02 23:36:55 +06:00
Sergey M․
98727e123f [YoutubeDL] Improve annotations file naming 2015-05-02 23:35:18 +06:00
Sergey M․
b29e0000e6 [YoutubeDL] Improve JSON info file naming 2015-05-02 23:23:44 +06:00
Sergey M․
b3ed15b760 [utils] Add replace_extension 2015-05-02 23:23:06 +06:00
Sergey M․
666a9a2b95 [YoutubeDL] Improve audio/video-only file naming 2015-05-02 23:11:34 +06:00
Sergey M․
a4bcaad773 [test_utils] Add tests for prepend_extension 2015-05-02 23:10:48 +06:00
Sergey M․
e65e4c8874 [utils] Improve prepend_extension
Now `ext` is appended to filename if real extension != expected extension.
2015-05-02 23:06:01 +06:00
Yen Chi Hsuan
21f6330274 [baiduvideo] Add new extractor (closes #4563) 2015-05-03 00:53:24 +08:00
Sergey M․
38c6902b90 [YoutubeDL] Ensure correct extension is always present for a merged file (Closes #5535) 2015-05-02 22:52:21 +06:00
Jaime Marquínez Ferrándiz
2ddcd88129 Remove code that was only used by the Grooveshark extractor 2015-05-02 17:29:56 +02:00
Yen Chi Hsuan
dd8920653c [Grooveshark] Remove the extractor
grooveshark.com was shut down on 2015/04/30
2015-05-02 21:46:33 +08:00
Sergey M․
c938c35f95 [iconosquare] Fix extraction 2015-05-02 07:18:22 +06:00
Yen Chi Hsuan
2eb0192155 [viki] Remove clean_html call 2015-05-02 01:35:46 +08:00
Yen Chi Hsuan
d948e09b61 [viki] Extract m3u8 videos (#4855) 2015-05-02 01:20:16 +08:00
Yen Chi Hsuan
89966a5aea [viki] Enhance error message handling (#3774) 2015-05-02 01:20:15 +08:00
Yen Chi Hsuan
8e3df9dfee [viki] Fix extractor and add a globally available test case 2015-05-02 01:20:15 +08:00
Sergey M․
5890eef6b0 [pbs] Add support for HD (Closes #3564, closes #5390) 2015-05-01 17:43:06 +06:00
Nikoli
083c1bb960 Add ability to embed subtitles in mkv files (closes #5434) 2015-05-01 11:54:40 +02:00
Yen Chi Hsuan
861e65eb05 [yahoo] Extend _VALID_URL 2015-05-01 12:32:24 +08:00
Sergey M․
650cfd0cb0 [bbccouk] Mute thumbnail 2015-05-01 04:07:30 +06:00
Sergey M․
e68ae99a41 [bbccouk] Add test for #5530 2015-05-01 04:02:56 +06:00
Sergey M․
8683b4d8d9 [bbccouk] Improve extraction (Closes #5530) 2015-05-01 03:59:13 +06:00
Sergey M․
1dbd717eb4 [theplaform] Fix FutureWarning 2015-05-01 02:51:55 +06:00
Sergey M․
6a8422b942 [foxsports] Add extractor (Closes #5517) 2015-05-01 02:49:06 +06:00
Sergey M․
cb202fd286 [YoutubeDL] Filter requested info fields on --load-info as well
In order to properly handle JSON info files generated by youtube-dl versions prior to 4070b458ec
2015-05-01 00:44:34 +06:00
Naglis Jonaitis
67fc8ecd53 [dreisat] Extend _VALID_URL (Closes #5548) 2015-04-30 21:28:08 +03:00
Jaime Marquínez Ferrándiz
df8301fef5 [YoutubeDL] pep8: use 'k not in' instead of 'not k in' 2015-04-30 20:18:42 +02:00
Sergey M․
4070b458ec [YoutubeDL] Do not write requested info in info JSON file (Closes #5562, closes #5564) 2015-04-30 23:55:05 +06:00
Yen Chi Hsuan
ffbc3901d2 Merge remote-tracking branch 'upstream/master' 2015-04-30 23:33:49 +08:00
Sergey M․
7a03280df4 [vporn] More metadata extraction fixes and tests update (#5560) 2015-04-30 21:31:38 +06:00
Yen Chi Hsuan
482a1258de [VeeHD] Replace the third test case due to copyright issues 2015-04-30 23:27:07 +08:00
Sergey M․
cd298882cd [vporn] Fix metadata extraction (#5560) 2015-04-30 21:25:17 +06:00
Sergey M․
e01c56f9e1 [YoutubeDL] Generalize best/worst format match behavior 2015-04-30 21:06:51 +06:00
Sergey M.
4d72df4031 Merge pull request #5556 from jaimeMF/best-format-nodash
Make 'best' format only match non-DASH formats (closes #5554)
2015-04-30 19:57:02 +05:00
Yen Chi Hsuan
f7f1df1d82 [VeeHD] Enhance extraction and fix tests (fixes #4965) 2015-04-30 22:37:41 +08:00
Yen Chi Hsuan
c4a21bc9db [bilibili] Extract multipart videos (closes #3250) 2015-04-30 18:26:08 +08:00
Yen Chi Hsuan
621ffe7bf4 [niconico] Fix so* video extraction (fixes #4874) (#2087) 2015-04-30 17:05:02 +08:00
Jaime Marquínez Ferrándiz
8dd5418803 Make 'best' format only match non-DASH formats (closes #5554)
Otherwise it's impossible to only download non-DASH formats, for example `best[height=?480]/best` would download a DASH video if it's the only one with height=480, instead of falling back to the second format specifier.
For audio only urls (soundcloud, bandcamp ...), the best audio will be downloaded as before.
2015-04-29 22:53:18 +02:00
Jaime Marquínez Ferrándiz
965cb8d530 [escapist] pep8 fixes 2015-04-29 22:46:19 +02:00
Yen Chi Hsuan
b2e8e7dab5 [niconico] Try to extract all optional fields from various sources 2015-04-30 02:24:05 +08:00
Yen Chi Hsuan
59d814f793 [niconico] Remove credentials from tests and enhance title extraction
All test videos can be downloaded without username and password now.
2015-04-30 00:50:48 +08:00
Yen Chi Hsuan
bb865f3a5e [niconico] Fix extraction and update tests (closes #5511) 2015-04-30 00:50:48 +08:00
Yen Chi Hsuan
9ee53a49f0 [YouPorn] Fix extractor 2015-04-30 00:50:48 +08:00
Sergey M.
79adb09baa Merge pull request #5553 from zouhair/master
Typo: twice "the the" to "the"
2015-04-29 20:05:48 +05:00
zouhair
cf0649f8b7 Typo: twice "the the" to "the" 2015-04-29 11:03:10 -04:00
Sergey M.
f8690631e2 Merge pull request #5552 from zouhair/master
Typo "incompatible" instead of "uncompatible"
2015-04-29 19:09:47 +05:00
zouhair
5456d78f0c Typo "incompatible" instead of "uncompatible" 2015-04-29 10:07:49 -04:00
Yen Chi Hsuan
cbbece96a2 [yourupload] Simplify 2015-04-29 04:05:14 +08:00
Yen Chi Hsuan
9d8ba307ef [yourupload] Fix extraction 2015-04-29 04:03:07 +08:00
Yen Chi Hsuan
ec7c1e85e0 [testtube] Fix test case 1
Seems the site now provides webm with higher bitrates
2015-04-29 00:24:58 +08:00
Yen Chi Hsuan
e70c7568c0 [testtube] Detect Youtube iframes (fixes #4867) 2015-04-29 00:22:17 +08:00
Yen Chi Hsuan
39b62db116 [youtube] Catch more alert messages (closes #5074) 2015-04-28 23:07:56 +08:00
Jaime Marquínez Ferrándiz
2edce52584 [vimeo] Fix password protected videos again (#5082)
Since they have changed again to the previous format, I've modified the regex to match both formats.
2015-04-28 15:06:08 +02:00
pulpe
10831b5ec9 [vimeo] Fix redirection 2015-04-28 14:56:48 +02:00
132 changed files with 3476 additions and 1682 deletions

View File

@@ -124,3 +124,5 @@ Mohammad Teimori Pabandi
Roman Le Négrate
Matthias Küch
Julian Richen
Ping O.
Mister Hat

View File

@@ -17,12 +17,12 @@ youtube-dl - download videos from youtube.com or other video platforms
To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget:
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
Windows users can [download a .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in their home directory or any other location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29).
@@ -133,7 +133,7 @@ which means you can modify it, redistribute it or use it however you like.
--no-mtime Do not use the Last-modified header to set the file modification time
--write-description Write video description to a .description file
--write-info-json Write video metadata to a .info.json file
--write-annotations Write video annotations to a .annotation file
--write-annotations Write video annotations to a .annotations.xml file
--load-info FILE JSON file containing the video information (created with the "--write-info-json" option)
--cookies FILE File to read cookies from and dump cookie jar in
--cache-dir DIR Location in the filesystem where youtube-dl can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl
@@ -168,7 +168,7 @@ which means you can modify it, redistribute it or use it however you like.
--no-progress Do not print progress bar
--console-title Display progress in console titlebar
-v, --verbose Print various debugging information
--dump-pages Print downloaded pages to debug problems (very verbose)
--dump-pages Print downloaded pages encoded using base64 to debug problems (very verbose)
--write-pages Write downloaded intermediary pages to files in the current directory to debug problems
--print-traffic Display sent and read HTTP traffic
-C, --call-home Contact the youtube-dl server for debugging
@@ -216,11 +216,11 @@ which means you can modify it, redistribute it or use it however you like.
--recode-video FORMAT Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv)
-k, --keep-video Keep the video file on disk after the post-processing; the video is erased by default
--no-post-overwrites Do not overwrite post-processed files; the post-processed files are overwritten by default
--embed-subs Embed subtitles in the video (only for mp4 videos)
--embed-subs Embed subtitles in the video (only for mkv and mp4 videos)
--embed-thumbnail Embed thumbnail in the audio as cover art
--add-metadata Write metadata to the video file
--metadata-from-title FORMAT Parse additional metadata like song title / artist from the video title. The format syntax is the same as --output, the parsed
parameters replace existing values. Additional templates: %(album), %(artist). Example: --metadata-from-title "%(artist)s -
parameters replace existing values. Additional templates: %(album)s, %(artist)s. Example: --metadata-from-title "%(artist)s -
%(title)s" matches a title like "Coldplay - Paradise"
--xattrs Write metadata to the video file's xattrs (using dublin core and xdg standards)
--fixup POLICY Automatically correct known faults of the file. One of never (do nothing), warn (only emit a warning), detect_or_warn(the default;
@@ -269,7 +269,7 @@ The simplest case is requesting a specific format, for example `-f 22`. You can
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes, as in `-f 22/17/18`. You can also filter the video results by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). This works for filesize, height, width, tbr, abr, vbr, asr, and fps and the comparisons <, <=, >, >=, =, != and for ext, acodec, vcodec, container, and protocol and the comparisons =, != . Formats for which the value is not known are excluded unless you put a question mark (?) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Use commas to download multiple formats, such as `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv), for example `-f bestvideo+bestaudio`.
Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some dash formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file.
Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some dash formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
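The default selection described above can be sketched in Python. This is only an illustration modelled on the `req_format` change further down in this comparison, not the actual implementation: `default_format_spec` and its parameters are hypothetical names, and `merger_available` stands in for the check that ffmpeg or avconv is installed and able to merge. The extractor restriction (`youtube`, `ted`) is taken from that hunk.

```python
def default_format_spec(extractor, outtmpl, merger_available):
    """Return the format spec used when no -f option is given (sketch)."""
    specs = []
    # When streaming to stdout ("-o -") the merge step is skipped so that
    # content can be delivered to the player immediately.
    if outtmpl != '-' and extractor in ('youtube', 'ted') and merger_available:
        specs.append('bestvideo+bestaudio')
    # 'best' (a single file) is always the final fallback.
    specs.append('best')
    return '/'.join(specs)


print(default_format_spec('youtube', '%(title)s-%(id)s.%(ext)s', True))
print(default_format_spec('youtube', '-', True))
```

Under these assumptions the first call yields `bestvideo+bestaudio/best` and the second yields `best`.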

View File

@@ -26,8 +26,7 @@
- **anitube.se**
- **AnySex**
- **Aparat**
- **AppleDailyAnimationNews**
- **AppleDailyRealtimeNews**
- **AppleDaily**
- **AppleTrailers**
- **archive.org**: archive.org videos
- **ARD**
@@ -44,6 +43,7 @@
- **audiomack**
- **audiomack:album**
- **Azubu**
- **BaiduVideo**
- **bambuser**
- **bambuser:channel**
- **Bandcamp**
@@ -63,6 +63,8 @@
- **BR**: Bayerischer Rundfunk Mediathek
- **Break**
- **Brightcove**
- **bt:article**: Bergens Tidende Articles
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed**
- **BYUtv**
- **Camdemy**
@@ -139,6 +141,7 @@
- **Eporner**
- **EroProfile**
- **Escapist**
- **ESPN** (Currently broken)
- **EveryonesMixtape**
- **exfm**: ex.fm
- **ExpoTV**
@@ -148,13 +151,13 @@
- **fc2**
- **fernsehkritik.tv**
- **fernsehkritik.tv:postecke**
- **Firedrive**
- **Firstpost**
- **Flickr**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Foxgay**
- **FoxNews**
- **FoxSports**
- **france2.fr:generation-quoi**
- **FranceCulture**
- **FranceInter**
@@ -184,7 +187,6 @@
- **Golem**
- **GorillaVid**: GorillaVid.in, daclips.in, movpod.in, fastvideo.in and realvid.net
- **Goshgay**
- **Grooveshark**
- **Groupon**
- **Hark**
- **HearThisAt**
@@ -226,6 +228,7 @@
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan**
- **Karaoketv**
- **KarriereVideos**
- **keek**
- **KeezMovies**
- **KhanAcademy**
@@ -239,6 +242,7 @@
- **LetvPlaylist**
- **LetvTv**
- **Libsyn**
- **life:embed**
- **lifenews**: LIFE | NEWS
- **LiveLeak**
- **livestream**
@@ -287,6 +291,7 @@
- **MySpass**
- **myvideo**
- **MyVidster**
- **N-JOY**
- **n-tv.de**
- **NationalGeographic**
- **Naver**
@@ -316,6 +321,7 @@
- **NosVideo**
- **novamov**: NovaMov
- **Nowness**
- **NowTV**
- **nowvideo**: NowVideo
- **npo.nl**
- **npo.nl:live**
@@ -327,11 +333,13 @@
- **ntv.ru**
- **Nuvid**
- **NYTimes**
- **NYTimesArticle**
- **ocw.mit.edu**
- **Odnoklassniki**
- **OktoberfestTV**
- **on.aol.com**
- **Ooyala**
- **OoyalaExternal**
- **OpenFilm**
- **orf:fm4**: radio FM4
- **orf:iptv**: iptv.ORF.at
@@ -363,9 +371,10 @@
- **prosiebensat1**: ProSiebenSat.1 Digital
- **Puls4**
- **Pyvideo**
- **QQMusic**
- **QQMusicAlbum**
- **QQMusicSinger**
- **qqmusic**
- **qqmusic:album**
- **qqmusic:singer**
- **qqmusic:toplist**
- **QuickVid**
- **R7**
- **radio.de**
@@ -384,7 +393,6 @@
- **Rte**
- **rtl.nl**: rtl.nl and rtlxl.nl
- **RTL2**
- **RTLnow**
- **RTP**
- **RTS**: RTS.ch
- **rtve.es:alacarta**: RTVE a la carta
@@ -422,7 +430,6 @@
- **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos
- **Snotr**
- **Sockshare**
- **Sohu**
- **soundcloud**
- **soundcloud:playlist**
@@ -433,6 +440,8 @@
- **southpark.cc.com**
- **southpark.cc.com:español**
- **southpark.de**
- **southpark.nl**
- **southparkstudios.dk**
- **Space**
- **SpankBang**
- **Spankwire**
@@ -442,6 +451,7 @@
- **Spike**
- **Sport5**
- **SportBox**
- **SportBoxEmbed**
- **SportDeutschland**
- **Srf**
- **SRMediathek**: Saarländischer Rundfunk
@@ -452,6 +462,7 @@
- **StreamCZ**
- **StreetVoice**
- **SunPorno**
- **SVT**
- **SVTPlay**: SVT Play and Öppet arkiv
- **SWRMediathek**
- **Syfy**
@@ -485,6 +496,7 @@
- **tlc.com**
- **tlc.de**
- **TMZ**
- **TMZArticle**
- **TNAFlix**
- **tou.tv**
- **Toypics**: Toypics user profile
@@ -499,6 +511,8 @@
- **Turbo**
- **Tutv**
- **tv.dfb.de**
- **TV2**
- **TV2Article**
- **TV4**: tv4.se and tv4play.se
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvp.pl**
@@ -528,7 +542,7 @@
- **Vessel**
- **Vesti**: Вести.Ru
- **Vevo**
- **VGTV**
- **VGTV**: VGTV and BTTV
- **vh1.com**
- **Vice**
- **Viddler**
@@ -548,6 +562,7 @@
- **vier:videos**
- **Viewster**
- **viki**
- **viki:channel**
- **vimeo**
- **vimeo:album**
- **vimeo:channel**
@@ -562,6 +577,7 @@
- **vk.com**
- **vk.com:user-videos**: vk.com:All of a user's videos
- **Vodlocker**
- **VoiceRepublic**
- **Vporn**
- **VRT**
- **vube**: Vube.com
@@ -586,6 +602,7 @@
- **XHamster**
- **XMinus**
- **XNXX**
- **Xstream**
- **XTube**
- **XTubeUser**: XTube user profile
- **Xuite**

View File

@@ -12,6 +12,7 @@ import copy
from test.helper import FakeYDL, assertRegexpMatches
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_str
from youtube_dl.extractor import YoutubeIE
from youtube_dl.postprocessor.common import PostProcessor
from youtube_dl.utils import match_filter_func
@@ -237,7 +238,7 @@ class TestFormatSelection(unittest.TestCase):
f2['url'] = 'url:' + f2id
info_dict = _make_result([f1, f2], extractor='youtube')
ydl = YDL()
ydl = YDL({'format': 'best/bestvideo'})
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
@@ -245,7 +246,7 @@ class TestFormatSelection(unittest.TestCase):
self.assertEqual(downloaded['format_id'], f1id)
info_dict = _make_result([f2, f1], extractor='youtube')
ydl = YDL()
ydl = YDL({'format': 'best/bestvideo'})
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
@@ -507,6 +508,51 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f)
self.assertEqual(res, ['1'])
def test_playlist_items_selection(self):
entries = [{
'id': compat_str(i),
'title': compat_str(i),
'url': TEST_URL,
} for i in range(1, 5)]
playlist = {
'_type': 'playlist',
'id': 'test',
'entries': entries,
'extractor': 'test:playlist',
'extractor_key': 'test:playlist',
'webpage_url': 'http://example.com',
}
def get_ids(params):
ydl = YDL(params)
# make a copy because the dictionary can be modified
ydl.process_ie_result(playlist.copy())
return [int(v['id']) for v in ydl.downloaded_info_dicts]
result = get_ids({})
self.assertEqual(result, [1, 2, 3, 4])
result = get_ids({'playlistend': 10})
self.assertEqual(result, [1, 2, 3, 4])
result = get_ids({'playlistend': 2})
self.assertEqual(result, [1, 2])
result = get_ids({'playliststart': 10})
self.assertEqual(result, [])
result = get_ids({'playliststart': 2})
self.assertEqual(result, [2, 3, 4])
result = get_ids({'playlist_items': '2-4'})
self.assertEqual(result, [2, 3, 4])
result = get_ids({'playlist_items': '2,4'})
self.assertEqual(result, [2, 4])
result = get_ids({'playlist_items': '10'})
self.assertEqual(result, [])
if __name__ == '__main__':
unittest.main()
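The out-of-range case checked by `test_playlist_items_selection` (`'playlist_items': '10'` giving an empty result) relies on the bounds check added to `YoutubeDL.py` in this comparison. A minimal sketch of that check, assuming the item specification has already been parsed into a list of 1-based integers; `select_playlist_entries` is a hypothetical name used only here:

```python
def select_playlist_entries(entries, playlist_items):
    """Pick 1-based playlist positions, skipping positions that do not exist."""
    n_all_entries = len(entries)
    # Same condition as in the diff: the lower bound is -n_all_entries,
    # so position 0 maps to index -1 and wraps around to the last entry.
    return [
        entries[i - 1] for i in playlist_items
        if -n_all_entries <= i - 1 < n_all_entries]


entries = ['a', 'b', 'c', 'd']
print(select_playlist_entries(entries, [2, 4]))  # ['b', 'd']
print(select_playlist_entries(entries, [10]))    # []
```

Without the condition, the second call would raise `IndexError` instead of returning an empty list.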

View File

@@ -266,7 +266,7 @@ class TestNRKSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['no']))
self.assertEqual(md5(subtitles['no']), '1d221e6458c95c5494dcd38e6a1f129a')
self.assertEqual(md5(subtitles['no']), '544fa917d3197fcbee64634559221cc2')
class TestRaiSubtitles(BaseTestSubtitles):

View File

@@ -40,7 +40,8 @@ from youtube_dl.utils import (
read_batch_urls,
sanitize_filename,
sanitize_path,
sanitize_url_path_consecutive_slashes,
prepend_extension,
replace_extension,
shell_quote,
smuggle_url,
str_to_int,
@@ -51,6 +52,7 @@ from youtube_dl.utils import (
unified_strdate,
unsmuggle_url,
uppercase_escape,
lowercase_escape,
url_basename,
urlencode_postdata,
version_tuple,
@@ -173,25 +175,21 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_path('./abc'), 'abc')
self.assertEqual(sanitize_path('./../abc'), '..\\abc')
def test_sanitize_url_path_consecutive_slashes(self):
self.assertEqual(
sanitize_url_path_consecutive_slashes('http://hostname/foo//bar/filename.html'),
'http://hostname/foo/bar/filename.html')
self.assertEqual(
sanitize_url_path_consecutive_slashes('http://hostname//foo/bar/filename.html'),
'http://hostname/foo/bar/filename.html')
self.assertEqual(
sanitize_url_path_consecutive_slashes('http://hostname//'),
'http://hostname/')
self.assertEqual(
sanitize_url_path_consecutive_slashes('http://hostname/foo/bar/filename.html'),
'http://hostname/foo/bar/filename.html')
self.assertEqual(
sanitize_url_path_consecutive_slashes('http://hostname/'),
'http://hostname/')
self.assertEqual(
sanitize_url_path_consecutive_slashes('http://hostname/abc//'),
'http://hostname/abc/')
def test_prepend_extension(self):
self.assertEqual(prepend_extension('abc.ext', 'temp'), 'abc.temp.ext')
self.assertEqual(prepend_extension('abc.ext', 'temp', 'ext'), 'abc.temp.ext')
self.assertEqual(prepend_extension('abc.unexpected_ext', 'temp', 'ext'), 'abc.unexpected_ext.temp')
self.assertEqual(prepend_extension('abc', 'temp'), 'abc.temp')
self.assertEqual(prepend_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(prepend_extension('.abc.ext', 'temp'), '.abc.temp.ext')
def test_replace_extension(self):
self.assertEqual(replace_extension('abc.ext', 'temp'), 'abc.temp')
self.assertEqual(replace_extension('abc.ext', 'temp', 'ext'), 'abc.temp')
self.assertEqual(replace_extension('abc.unexpected_ext', 'temp', 'ext'), 'abc.unexpected_ext.temp')
self.assertEqual(replace_extension('abc', 'temp'), 'abc.temp')
self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
def test_ordered_set(self):
self.assertEqual(orderedSet([1, 1, 2, 3, 4, 4, 5, 6, 7, 3, 5]), [1, 2, 3, 4, 5, 6, 7])
@@ -400,6 +398,10 @@ class TestUtil(unittest.TestCase):
self.assertEqual(uppercase_escape(''), '')
self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐')
def test_lowercase_escape(self):
self.assertEqual(lowercase_escape(''), '')
self.assertEqual(lowercase_escape('\\u0026'), '&')
def test_limit_length(self):
self.assertEqual(limit_length(None, 12), None)
self.assertEqual(limit_length('foo', 12), 'foo')
@@ -598,7 +600,7 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
<div xml:lang="en">
<p begin="0" end="1">The following line contains Chinese characters and special symbols</p>
<p begin="1" end="2">第二行<br/>♪♪</p>
<p begin="2" end="3"><span>Third<br/>Line</span></p>
<p begin="2" dur="1"><span>Third<br/>Line</span></p>
</div>
</body>
</tt>'''
@@ -619,6 +621,21 @@ Line
'''
self.assertEqual(dfxp2srt(dfxp_data), srt_data)
dfxp_data_no_default_namespace = '''<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="en" xmlns:tts="http://www.w3.org/ns/ttml#parameter">
<body>
<div xml:lang="en">
<p begin="0" end="1">The first line</p>
</div>
</body>
</tt>'''
srt_data = '''1
00:00:00,000 --> 00:00:01,000
The first line
'''
self.assertEqual(dfxp2srt(dfxp_data_no_default_namespace), srt_data)
if __name__ == '__main__':
unittest.main()
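The behaviour exercised by `test_prepend_extension` and `test_replace_extension` in this file can be sketched as follows. This is one implementation consistent with those assertions, built on `os.path.splitext`; the code committed to `youtube_dl.utils` may differ in detail, and the parameter name `expected_real_ext` is an assumption.

```python
import os.path


def prepend_extension(filename, ext, expected_real_ext=None):
    """Insert ext before the real extension, or append it if the
    real extension is not the expected one."""
    name, real_ext = os.path.splitext(filename)
    if not expected_real_ext or real_ext[1:] == expected_real_ext:
        return '%s.%s%s' % (name, ext, real_ext)
    return '%s.%s' % (filename, ext)


def replace_extension(filename, ext, expected_real_ext=None):
    """Swap the real extension for ext, or append ext if the
    real extension is not the expected one."""
    name, real_ext = os.path.splitext(filename)
    if not expected_real_ext or real_ext[1:] == expected_real_ext:
        return '%s.%s' % (name, ext)
    return '%s.%s' % (filename, ext)


print(prepend_extension('abc.ext', 'temp'))                   # abc.temp.ext
print(replace_extension('abc.unexpected_ext', 'temp', 'ext'))  # abc.unexpected_ext.temp
```

Note that `os.path.splitext` treats a leading dot as part of the name, which is why `.abc` gains an extension rather than losing one.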

View File

@@ -4,6 +4,8 @@ envlist = py26,py27,py33,py34
deps =
nose
coverage
# We need a valid $HOME for test_compat_expanduser
passenv = HOME
defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
--exclude test_subtitles.py --exclude test_write_annotations.py
--exclude test_youtube_lists.py

View File

@@ -71,6 +71,7 @@ from .utils import (
write_string,
YoutubeDLHandler,
prepend_extension,
replace_extension,
args_to_str,
age_restricted,
)
@@ -259,7 +260,6 @@ class YoutubeDL(object):
The following options are used by the post processors:
prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available,
otherwise prefer avconv.
exec_cmd: Arbitrary command to run after downloading
"""
params = None
@@ -759,7 +759,9 @@ class YoutubeDL(object):
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = [ie_entries[i - 1] for i in playlistitems]
entries = [
ie_entries[i - 1] for i in playlistitems
if -n_all_entries <= i - 1 < n_all_entries]
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
@@ -914,15 +916,16 @@ class YoutubeDL(object):
if not available_formats:
return None
if format_spec == 'best' or format_spec is None:
return available_formats[-1]
elif format_spec == 'worst':
if format_spec in ['best', 'worst', None]:
format_idx = 0 if format_spec == 'worst' else -1
audiovideo_formats = [
f for f in available_formats
if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
if audiovideo_formats:
return audiovideo_formats[0]
return available_formats[0]
return audiovideo_formats[format_idx]
# for audio only urls, select the best/worst audio format
elif all(f.get('acodec') != 'none' for f in available_formats):
return available_formats[format_idx]
elif format_spec == 'bestaudio':
audio_formats = [
f for f in available_formats
@@ -1084,8 +1087,11 @@ class YoutubeDL(object):
req_format = self.params.get('format')
if req_format is None:
req_format_list = []
if info_dict['extractor'] in ['youtube', 'ted'] and FFmpegMergerPP(self).available:
req_format_list.append('bestvideo+bestaudio')
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
info_dict['extractor'] in ['youtube', 'ted']):
merger = FFmpegMergerPP(self)
if merger.available and merger.can_merge():
req_format_list.append('bestvideo+bestaudio')
req_format_list.append('best')
req_format = '/'.join(req_format_list)
formats_to_download = []
@@ -1269,7 +1275,7 @@ class YoutubeDL(object):
return
if self.params.get('writedescription', False):
descfn = filename + '.description'
descfn = replace_extension(filename, 'description', info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(descfn)):
self.to_screen('[info] Video description is already present')
elif info_dict.get('description') is None:
@@ -1284,7 +1290,7 @@ class YoutubeDL(object):
return
if self.params.get('writeannotations', False):
annofn = filename + '.annotations.xml'
annofn = replace_extension(filename, 'annotations.xml', info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
self.to_screen('[info] Video annotations are already present')
else:
@@ -1331,13 +1337,13 @@ class YoutubeDL(object):
return
if self.params.get('writeinfojson', False):
infofn = os.path.splitext(filename)[0] + '.info.json'
infofn = replace_extension(filename, 'info.json', info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Video description metadata is already present')
else:
self.to_screen('[info] Writing video description metadata as JSON to: ' + infofn)
try:
write_json_file(info_dict, infofn)
write_json_file(self.filter_requested_info(info_dict), infofn)
except (OSError, IOError):
self.report_error('Cannot write metadata to JSON file ' + infofn)
return
@@ -1362,7 +1368,7 @@ class YoutubeDL(object):
postprocessors = []
self.report_warning('You have requested multiple '
'formats but ffmpeg or avconv are not installed.'
' The formats won\'t be merged')
' The formats won\'t be merged.')
else:
postprocessors = [merger]
@@ -1381,11 +1387,18 @@ class YoutubeDL(object):
# TODO: Check acodec/vcodec
return False
filename_real_ext = os.path.splitext(filename)[1][1:]
filename_wo_ext = (
os.path.splitext(filename)[0]
if filename_real_ext == info_dict['ext']
else filename)
requested_formats = info_dict['requested_formats']
if self.params.get('merge_output_format') is None and not compatible_formats(requested_formats):
filename = os.path.splitext(filename)[0] + '.mkv'
self.report_warning('You have requested formats uncompatible for merge. '
'The formats will be merged into mkv')
info_dict['ext'] = 'mkv'
self.report_warning(
'Requested formats are incompatible for merge and will be merged into mkv.')
# Ensure filename always has a correct extension for successful merge
filename = '%s.%s' % (filename_wo_ext, info_dict['ext'])
if os.path.exists(encodeFilename(filename)):
self.to_screen(
'[download] %s has already been downloaded and '
@@ -1395,7 +1408,7 @@ class YoutubeDL(object):
new_info = dict(info_dict)
new_info.update(f)
fname = self.prepare_filename(new_info)
fname = prepend_extension(fname, 'f%s' % f['format_id'])
fname = prepend_extension(fname, 'f%s' % f['format_id'], new_info['ext'])
downloaded.append(fname)
partial_success = dl(fname, new_info)
success = success and partial_success
@@ -1487,7 +1500,7 @@ class YoutubeDL(object):
[info_filename], mode='r',
openhook=fileinput.hook_encoded('utf-8'))) as f:
# FileInput doesn't have a read method, we can't call json.load
info = json.loads('\n'.join(f))
info = self.filter_requested_info(json.loads('\n'.join(f)))
try:
self.process_ie_result(info, download=True)
except DownloadError:
@@ -1499,6 +1512,12 @@ class YoutubeDL(object):
raise
return self._download_retcode
@staticmethod
def filter_requested_info(info_dict):
return dict(
(k, v) for k, v in info_dict.items()
if k not in ['requested_formats', 'requested_subtitles'])
def post_process(self, filename, ie_info):
"""Run all the postprocessors on the given file."""
info = dict(ie_info)
@@ -1508,6 +1527,7 @@ class YoutubeDL(object):
pps_chain.extend(ie_info['__postprocessors'])
pps_chain.extend(self._pps)
for pp in pps_chain:
files_to_delete = []
try:
files_to_delete, info = pp.run(info)
except PostProcessingError as e:
@@ -1832,7 +1852,7 @@ class YoutubeDL(object):
thumb_ext = determine_ext(t['url'], 'jpg')
suffix = '_%s' % t['id'] if len(thumbnails) > 1 else ''
thumb_display_id = '%s ' % t['id'] if len(thumbnails) > 1 else ''
thumb_filename = os.path.splitext(filename)[0] + suffix + '.' + thumb_ext
t['filename'] = thumb_filename = os.path.splitext(filename)[0] + suffix + '.' + thumb_ext
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)):
self.to_screen('[%s] %s: Thumbnail %sis already present' %
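The generalized `best`/`worst` matching introduced in the format selection hunk above can be illustrated in isolation. The sketch below assumes, as the indexing in the diff suggests, that `available_formats` is sorted from worst to best; `pick_best_or_worst` is a hypothetical standalone name, and returning `None` when nothing matches is an assumption about how the surrounding method lets the next alternative after a `/` be tried.

```python
def pick_best_or_worst(available_formats, format_spec=None):
    """Select a format for 'best', 'worst' or no spec (sketch)."""
    if not available_formats:
        return None
    format_idx = 0 if format_spec == 'worst' else -1
    # Prefer formats that carry both audio and video (non-DASH).
    audiovideo_formats = [
        f for f in available_formats
        if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
    if audiovideo_formats:
        return audiovideo_formats[format_idx]
    # For audio-only URLs, select the best/worst audio format.
    if all(f.get('acodec') != 'none' for f in available_formats):
        return available_formats[format_idx]
    return None


formats = [
    {'format_id': '18', 'vcodec': 'h264', 'acodec': 'aac'},
    {'format_id': '137', 'vcodec': 'h264', 'acodec': 'none'},
]
print(pick_best_or_worst(formats, 'best')['format_id'])  # 18
```

In the example the video-only format `137` sorts last, yet `best` still resolves to `18` because only formats with both audio and video are considered first.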

View File

@@ -240,13 +240,18 @@ def _real_main(argv=None):
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
if opts.embedthumbnail:
postprocessors.append({'key': 'EmbedThumbnail'})
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({
'key': 'EmbedThumbnail',
'already_have_thumbnail': already_have_thumbnail
})
if not already_have_thumbnail:
opts.writethumbnail = True
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd:
postprocessors.append({
'key': 'ExecAfterDownload',
'verboseOutput': opts.verbose,
'exec_cmd': opts.exec_cmd,
})
if opts.xattr_set_filesize:
@@ -345,7 +350,6 @@ def _real_main(argv=None):
'default_search': opts.default_search,
'youtube_include_dash_manifest': opts.youtube_include_dash_manifest,
'encoding': opts.encoding,
'exec_cmd': opts.exec_cmd,
'extract_flat': opts.extract_flat,
'merge_output_format': opts.merge_output_format,
'postprocessors': postprocessors,
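The `--embed-thumbnail` handling in the hunk above can be read as the small function below. It is a sketch for illustration only: `thumbnail_postprocessors` is a hypothetical name, and the three boolean parameters stand in for the corresponding `opts` attributes.

```python
def thumbnail_postprocessors(embedthumbnail, writethumbnail, write_all_thumbnails):
    """Return (postprocessors, writethumbnail) as set up for thumbnail embedding."""
    postprocessors = []
    if embedthumbnail:
        # True when the user asked for thumbnail files on disk anyway.
        already_have_thumbnail = writethumbnail or write_all_thumbnails
        postprocessors.append({
            'key': 'EmbedThumbnail',
            'already_have_thumbnail': already_have_thumbnail,
        })
        # Embedding needs a thumbnail file, so request one if none was asked for.
        if not already_have_thumbnail:
            writethumbnail = True
    return postprocessors, writethumbnail


print(thumbnail_postprocessors(True, False, False))
```

The `already_have_thumbnail` flag is passed along so the postprocessor can tell a thumbnail the user requested from one that was written only for embedding.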

View File

@@ -152,7 +152,7 @@ def aes_decrypt_text(data, password, key_size_bytes):
"""
NONCE_LENGTH_BYTES = 8
data = bytes_to_intlist(base64.b64decode(data))
data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
password = bytes_to_intlist(password.encode('utf-8'))
key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password))
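The input preparation shown in this hunk can be tried on its own. The sketch below covers only the three lines above (base64 decoding of the text input and zero-padding or truncating the password to the key size), not the decryption itself; `prepare_key_material` is a hypothetical name, and this `bytes_to_intlist` is a stand-in for the helper of the same name in `youtube_dl.utils`.

```python
import base64


def bytes_to_intlist(bs):
    """Stand-in helper: turn a byte string into a list of integers."""
    return list(bytearray(bs))


def prepare_key_material(data, password, key_size_bytes):
    """Decode base64 text and derive a fixed-length key list (sketch)."""
    # Encoding first hands b64decode a byte string rather than text.
    data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
    password = bytes_to_intlist(password.encode('utf-8'))
    # Truncate long passwords; pad short ones with zero bytes.
    key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password))
    return data, key


print(prepare_key_material('AQID', 'ab', 4))  # ([1, 2, 3], [97, 98, 0, 0])
```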

View File

@@ -46,11 +46,6 @@ try:
except ImportError: # Python 2
import htmlentitydefs as compat_html_entities
try:
import html.parser as compat_html_parser
except ImportError: # Python 2
import HTMLParser as compat_html_parser
try:
import http.client as compat_http_client
except ImportError: # Python 2
@@ -404,7 +399,6 @@ __all__ = [
'compat_getenv',
'compat_getpass',
'compat_html_entities',
'compat_html_parser',
'compat_http_client',
'compat_http_server',
'compat_kwargs',

View File

@@ -28,13 +28,8 @@ class HttpFD(FileDownloader):
add_headers = info_dict.get('http_headers')
if add_headers:
headers.update(add_headers)
data = info_dict.get('http_post_data')
http_method = info_dict.get('http_method')
basic_request = compat_urllib_request.Request(url, data, headers)
request = compat_urllib_request.Request(url, data, headers)
if http_method is not None:
basic_request.get_method = lambda: http_method
request.get_method = lambda: http_method
basic_request = compat_urllib_request.Request(url, None, headers)
request = compat_urllib_request.Request(url, None, headers)
is_test = self.params.get('test', False)

View File

@@ -131,7 +131,7 @@ class RtmpFD(FileDownloader):
if play_path is not None:
basic_args += ['--playpath', play_path]
if tc_url is not None:
basic_args += ['--tcUrl', url]
basic_args += ['--tcUrl', tc_url]
if test:
basic_args += ['--stop', '1']
if flash_version is not None:

View File

@@ -32,6 +32,7 @@ from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .azubu import AzubuIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbccouk import BBCCoUkIE
@@ -140,6 +141,7 @@ from .engadget import EngadgetIE
from .eporner import EpornerIE
from .eroprofile import EroProfileIE
from .escapist import EscapistIE
from .espn import ESPNIE
from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE
@@ -147,7 +149,6 @@ from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .fc2 import FC2IE
from .firedrive import FiredriveIE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
@@ -161,6 +162,7 @@ from .footyroom import FootyRoomIE
from .fourtube import FourTubeIE
from .foxgay import FoxgayIE
from .foxnews import FoxNewsIE
from .foxsports import FoxSportsIE
from .franceculture import FranceCultureIE
from .franceinter import FranceInterIE
from .francetv import (
@@ -198,7 +200,6 @@ from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .gorillavid import GorillaVidIE
from .goshgay import GoshgayIE
from .grooveshark import GroovesharkIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hearthisat import HearThisAtIE
@@ -242,6 +243,7 @@ from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
from .karrierevideos import KarriereVideosIE
from .keezmovies import KeezMoviesIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
@@ -257,7 +259,10 @@ from .letv import (
LetvPlaylistIE
)
from .libsyn import LibsynIE
from .lifenews import LifeNewsIE
from .lifenews import (
LifeNewsIE,
LifeEmbedIE,
)
from .liveleak import LiveLeakIE
from .livestream import (
LivestreamIE,
@@ -320,7 +325,10 @@ from .nbc import (
NBCSportsIE,
NBCSportsVPlayerIE,
)
from .ndr import NDRIE
from .ndr import (
NDRIE,
NJoyIE,
)
from .ndtv import NDTVIE
from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE
@@ -330,8 +338,7 @@ from .newstube import NewstubeIE
from .nextmedia import (
NextMediaIE,
NextMediaActionNewsIE,
AppleDailyRealtimeNewsIE,
AppleDailyAnimationNewsIE
AppleDailyIE,
)
from .nfb import NFBIE
from .nfl import NFLIE
@@ -347,6 +354,7 @@ from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
from .novamov import NovaMovIE
from .nowness import NownessIE
from .nowtv import NowTVIE
from .nowvideo import NowVideoIE
from .npo import (
NPOIE,
@@ -362,11 +370,17 @@ from .nrk import (
)
from .ntvde import NTVDeIE
from .ntvru import NTVRuIE
from .nytimes import NYTimesIE
from .nytimes import (
NYTimesIE,
NYTimesArticleIE,
)
from .nuvid import NuvidIE
from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE
from .ooyala import OoyalaIE
from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .openfilm import OpenFilmIE
from .orf import (
ORFTVthekIE,
@@ -404,6 +418,7 @@ from .qqmusic import (
QQMusicIE,
QQMusicSingerIE,
QQMusicAlbumIE,
QQMusicToplistIE,
)
from .quickvid import QuickVidIE
from .r7 import R7IE
@@ -423,7 +438,6 @@ from .roxwel import RoxwelIE
from .rtbf import RTBFIE
from .rte import RteIE
from .rtlnl import RtlNlIE
from .rtlnow import RTLnowIE
from .rtl2 import RTL2IE
from .rtp import RTPIE
from .rts import RTSIE
@@ -465,7 +479,6 @@ from .smotri import (
SmotriBroadcastIE,
)
from .snotr import SnotrIE
from .sockshare import SockshareIE
from .sohu import SohuIE
from .soundcloud import (
SoundcloudIE,
@@ -479,8 +492,10 @@ from .soundgasm import (
)
from .southpark import (
SouthParkIE,
SouthParkDeIE,
SouthParkDkIE,
SouthParkEsIE,
SouthparkDeIE,
SouthParkNlIE
)
from .space import SpaceIE
from .spankbang import SpankBangIE
@@ -489,7 +504,10 @@ from .spiegel import SpiegelIE, SpiegelArticleIE
from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE
from .sport5 import Sport5IE
from .sportbox import SportBoxIE
from .sportbox import (
SportBoxIE,
SportBoxEmbedIE,
)
from .sportdeutschland import SportDeutschlandIE
from .srf import SrfIE
from .srmediathek import SRMediathekIE
@@ -500,7 +518,10 @@ from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE
from .sunporno import SunPornoIE
from .svtplay import SVTPlayIE
from .svt import (
SVTIE,
SVTPlayIE,
)
from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE
from .sztvhu import SztvHuIE
@@ -529,7 +550,10 @@ from .thesixtyone import TheSixtyOneIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .tlc import TlcIE, TlcDeIE
from .tmz import TMZIE
from .tmz import (
TMZIE,
TMZArticleIE,
)
from .tnaflix import TNAFlixIE
from .thvideo import (
THVideoIE,
@@ -546,6 +570,10 @@ from .tumblr import TumblrIE
from .tunein import TuneInIE
from .turbo import TurboIE
from .tutv import TutvIE
from .tv2 import (
TV2IE,
TV2ArticleIE,
)
from .tv4 import TV4IE
from .tvigle import TvigleIE
from .tvp import TvpIE, TvpSeriesIE
@@ -582,7 +610,11 @@ from .veoh import VeohIE
from .vessel import VesselIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vgtv import VGTVIE
from .vgtv import (
BTArticleIE,
BTVestlendingenIE,
VGTVIE,
)
from .vh1 import VH1IE
from .vice import ViceIE
from .viddler import ViddlerIE
@@ -613,12 +645,16 @@ from .vine import (
VineIE,
VineUserIE,
)
from .viki import VikiIE
from .viki import (
VikiIE,
VikiChannelIE,
)
from .vk import (
VKIE,
VKUserVideosIE,
)
from .vodlocker import VodlockerIE
from .voicerepublic import VoiceRepublicIE
from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
@@ -645,9 +681,10 @@ from .xboxclips import XboxClipsIE
from .xhamster import XHamsterIE
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xvideos import XVideosIE
from .xstream import XstreamIE
from .xtube import XTubeUserIE, XTubeIE
from .xuite import XuiteIE
from .xvideos import XVideosIE
from .xxxymovies import XXXYMoviesIE
from .yahoo import (
YahooIE,

View File

@@ -1,21 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
xpath_with_ns,
xpath_text,
find_xpath_attr,
)
class AftenpostenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aftenposten\.no/webtv/(?:#!/)?video/(?P<id>\d+)'
_TEST = {
'url': 'http://www.aftenposten.no/webtv/#!/video/21039/trailer-sweatshop-i-can-t-take-any-more',
'md5': 'fd828cd29774a729bf4d4425fe192972',
@@ -30,69 +20,4 @@ class AftenpostenIE(InfoExtractor):
}
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_xml(
'http://frontend.xstream.dk/ap/feed/video/?platform=web&id=%s' % video_id, video_id)
NS_MAP = {
'atom': 'http://www.w3.org/2005/Atom',
'xt': 'http://xstream.dk/',
'media': 'http://search.yahoo.com/mrss/',
}
entry = data.find(xpath_with_ns('./atom:entry', NS_MAP))
title = xpath_text(
entry, xpath_with_ns('./atom:title', NS_MAP), 'title')
description = xpath_text(
entry, xpath_with_ns('./atom:summary', NS_MAP), 'description')
timestamp = parse_iso8601(xpath_text(
entry, xpath_with_ns('./atom:published', NS_MAP), 'upload date'))
formats = []
media_group = entry.find(xpath_with_ns('./media:group', NS_MAP))
for media_content in media_group.findall(xpath_with_ns('./media:content', NS_MAP)):
media_url = media_content.get('url')
if not media_url:
continue
tbr = int_or_none(media_content.get('bitrate'))
mobj = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', media_url)
if mobj:
formats.append({
'url': mobj.group('url'),
'play_path': 'mp4:%s' % mobj.group('playpath'),
'app': mobj.group('app'),
'ext': 'flv',
'tbr': tbr,
'format_id': 'rtmp-%d' % tbr,
})
else:
formats.append({
'url': media_url,
'tbr': tbr,
})
self._sort_formats(formats)
link = find_xpath_attr(
entry, xpath_with_ns('./atom:link', NS_MAP), 'rel', 'original')
if link is not None:
formats.append({
'url': link.get('href'),
'format_id': link.get('rel'),
})
thumbnails = [{
'url': splash.get('url'),
'width': int_or_none(splash.get('width')),
'height': int_or_none(splash.get('height')),
} for splash in media_group.findall(xpath_with_ns('./xt:splash', NS_MAP))]
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'formats': formats,
'thumbnails': thumbnails,
}
return self.url_result('xstream:ap:%s' % self._match_id(url), 'Xstream')

View File

@@ -33,7 +33,7 @@ class ArchiveOrgIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
json_url = url + ('?' if '?' in url else '&') + 'output=json'
json_url = url + ('&' if '?' in url else '?') + 'output=json'
data = self._download_json(json_url, video_id)
def get_optional(data_dict, field):
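Review note on the hunk above: the old expression had the two separators swapped, so a URL without a query string got `&output=json` appended. A minimal sketch of the corrected logic, assuming a hypothetical helper name `add_output_json` and made-up example URLs:

```python
def add_output_json(url):
    """Append output=json using '&' if the URL already has a query, else '?'.

    Hypothetical helper for illustration; the diff inlines this expression.
    """
    separator = '&' if '?' in url else '?'
    return url + separator + 'output=json'
```

For example, `add_output_json('https://archive.org/details/example')` ends in `?output=json`, while a URL that already carries `?foo=1` gets `&output=json`.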

View File

@@ -7,7 +7,6 @@ from .common import InfoExtractor
from ..utils import (
find_xpath_attr,
unified_strdate,
get_element_by_id,
get_element_by_attribute,
int_or_none,
qualities,
@@ -195,7 +194,9 @@ class ArteTVFutureIE(ArteTVPlus7IE):
def _real_extract(self, url):
anchor_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, anchor_id)
row = get_element_by_id(anchor_id, webpage)
row = self._search_regex(
r'(?s)id="%s"[^>]*>.+?(<div[^>]*arte_vp_url[^>]*>)' % anchor_id,
webpage, 'row')
return self._extract_from_webpage(row, anchor_id, lang)

View File

@@ -0,0 +1,68 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
class BaiduVideoIE(InfoExtractor):
_VALID_URL = r'http://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
_TESTS = [{
'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6',
'info_dict': {
'id': '1069',
'title': '中华小当家 TV版 (全52集)',
'description': 'md5:395a419e41215e531c857bb037bbaf80',
},
'playlist_count': 52,
}, {
'url': 'http://v.baidu.com/show/11595.htm?frp=bdbrand',
'info_dict': {
'id': '11595',
'title': 're:^奔跑吧兄弟',
'description': 'md5:1bf88bad6d850930f542d51547c089b8',
},
'playlist_mincount': 3,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
category = category2 = mobj.group('type')
if category == 'show':
category2 = 'tvshow'
webpage = self._download_webpage(url, playlist_id)
playlist_title = self._html_search_regex(
r'title\s*:\s*(["\'])(?P<title>[^\']+)\1', webpage,
'playlist title', group='title')
playlist_description = self._html_search_regex(
r'<input[^>]+class="j-data-intro"[^>]+value="([^"]+)"/>', webpage,
playlist_id, 'playlist description')
site = self._html_search_regex(
r'filterSite\s*:\s*["\']([^"]*)["\']', webpage,
'primary provider site')
api_result = self._download_json(
'http://v.baidu.com/%s_intro/?dtype=%sPlayUrl&id=%s&site=%s' % (
category, category2, playlist_id, site),
playlist_id, 'Get playlist links')
entries = []
for episode in api_result[0]['episodes']:
episode_id = '%s_%s' % (playlist_id, episode['episode'])
redirect_page = self._download_webpage(
compat_urlparse.urljoin(url, episode['url']), episode_id,
note='Download Baidu redirect page')
real_url = self._html_search_regex(
r'location\.replace\("([^"]+)"\)', redirect_page, 'real URL')
entries.append(self.url_result(
real_url, video_title=episode['single_title']))
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)

View File

@@ -3,7 +3,10 @@ from __future__ import unicode_literals
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
int_or_none,
)
from ..compat import compat_HTTPError
@@ -112,6 +115,20 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b054fn09/ad/natural-world-20152016-2-super-powered-owls',
'info_dict': {
'id': 'p02n76xf',
'ext': 'flv',
'title': 'Natural World, 2015-2016: 2. Super Powered Owls',
'description': 'md5:e4db5c937d0e95a7c6b5e654d429183d',
'duration': 3540,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
@@ -326,16 +343,27 @@ class BBCCoUkIE(InfoExtractor):
webpage = self._download_webpage(url, group_id, 'Downloading video page')
programme_id = self._search_regex(
r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
programme_id = None
tviplayer = self._search_regex(
r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
webpage, 'player', default=None)
if tviplayer:
player = self._parse_json(tviplayer, group_id).get('player', {})
duration = int_or_none(player.get('duration'))
programme_id = player.get('vpid')
if not programme_id:
programme_id = self._search_regex(
r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
if programme_id:
player = self._download_json(
'http://www.bbc.co.uk/iplayer/episode/%s.json' % group_id,
group_id)['jsConf']['player']
title = player['title']
description = player['subtitle']
duration = player['duration']
formats, subtitles = self._download_media_selector(programme_id)
title = self._og_search_title(webpage)
description = self._search_regex(
r'<p class="medium-description">([^<]+)</p>',
webpage, 'description', fatal=False)
else:
programme_id, title, description, duration, formats, subtitles = self._download_playlist(group_id)
@@ -345,6 +373,7 @@ class BBCCoUkIE(InfoExtractor):
'id': programme_id,
'title': title,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'formats': formats,
'subtitles': subtitles,
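Review note on the BBC hunks above: the new code first looks for the JSON object passed to `mediator.bind(...)` and only falls back to the plain `"vpid"` regex when that is missing. A rough standalone sketch of the first step, using only the standard library. The function name `extract_player_info` and the sample markup in the usage note are assumptions for illustration; the regex is copied from the diff.

```python
import json
import re


def extract_player_info(webpage):
    """Return (vpid, duration) from the mediator.bind() JSON, or (None, None).

    Hypothetical stand-in for the _search_regex/_parse_json calls in the diff.
    """
    mobj = re.search(
        r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById', webpage)
    if not mobj:
        return None, None
    player = json.loads(mobj.group(1)).get('player', {})
    duration = player.get('duration')
    # Rough equivalent of int_or_none: tolerate missing or non-numeric values.
    try:
        duration = int(duration)
    except (TypeError, ValueError):
        duration = None
    return player.get('vpid'), duration
```

Called on a page containing `mediator.bind({"player": {"vpid": "p02n76xf", "duration": "3540"}}, document.getElementById("x"));` this returns the vpid string and the duration as an integer.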

View File

@@ -16,11 +16,11 @@ class BetIE(InfoExtractor):
{
'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html',
'info_dict': {
'id': '740ab250-bb94-4a8a-8787-fe0de7c74471',
'id': 'news/national/2014/a-conversation-with-president-obama',
'display_id': 'in-bet-exclusive-obama-talks-race-and-racism',
'ext': 'flv',
'title': 'BET News Presents: A Conversation With President Obama',
'description': 'md5:5a88d8ae912c1b33e090290af7ec33c6',
'title': 'A Conversation With President Obama',
'description': 'md5:699d0652a350cf3e491cd15cc745b5da',
'duration': 1534,
'timestamp': 1418075340,
'upload_date': '20141208',
@@ -35,7 +35,7 @@ class BetIE(InfoExtractor):
{
'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html',
'info_dict': {
'id': 'bcd1b1df-673a-42cf-8d01-b282db608f2d',
'id': 'news/national/2014/justice-for-ferguson-a-community-reacts',
'display_id': 'justice-for-ferguson-a-community-reacts',
'ext': 'flv',
'title': 'Justice for Ferguson: A Community Reacts',
@@ -61,6 +61,9 @@ class BetIE(InfoExtractor):
[r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
webpage, 'media URL'))
video_id = self._search_regex(
r'/video/(.*)/_jcr_content/', media_url, 'video id')
mrss = self._download_xml(media_url, display_id)
item = mrss.find('./channel/item')
@@ -75,8 +78,6 @@ class BetIE(InfoExtractor):
description = xpath_text(
item, './description', 'description', fatal=False)
video_id = xpath_text(item, './guid', 'video id', fatal=False)
timestamp = parse_iso8601(xpath_text(
item, xpath_with_ns('./dc:date', NS_MAP),
'upload date', fatal=False))

View File

@@ -2,7 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
int_or_none,
fix_xml_ampersands,
)
class BildIE(InfoExtractor):
@@ -15,7 +18,7 @@ class BildIE(InfoExtractor):
'id': '38184146',
'ext': 'mp4',
'title': 'BILD hat sie getestet',
'thumbnail': 'http://bilder.bild.de/fotos/stand-das-koennen-die-neuen-ipads-38184138/Bild/1.bild.jpg',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 196,
'description': 'Mit dem iPad Air 2 und dem iPad Mini 3 hat Apple zwei neue Tablet-Modelle präsentiert. BILD-Reporter Sven Stein durfte die Geräte bereits testen. ',
}
@@ -25,7 +28,7 @@ class BildIE(InfoExtractor):
video_id = self._match_id(url)
xml_url = url.split(".bild.html")[0] + ",view=xml.bild.xml"
doc = self._download_xml(xml_url, video_id)
doc = self._download_xml(xml_url, video_id, transform_source=fix_xml_ampersands)
duration = int_or_none(doc.attrib.get('duration'), scale=1000)

View File

@@ -2,6 +2,9 @@
from __future__ import unicode_literals
import re
import itertools
import json
import xml.etree.ElementTree as ET
from .common import InfoExtractor
from ..utils import (
@@ -14,18 +17,25 @@ from ..utils import (
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>[0-9]+)/'
_TEST = {
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '2c301e4dab317596e837c3e7633e7d86',
'info_dict': {
'id': '1074402',
'id': '1074402_part1',
'ext': 'flv',
'title': '【金坷垃】金泡沫',
'duration': 308,
'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg',
},
}
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'title': '【BD1080P】刀语【诸神&异域】',
},
'playlist_count': 9,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -57,19 +67,22 @@ class BiliBiliIE(InfoExtractor):
cid = self._search_regex(r'cid=(\d+)', webpage, 'cid')
lq_doc = self._download_xml(
entries = []
lq_page = self._download_webpage(
'http://interface.bilibili.com/v_cdn_play?appkey=1&cid=%s' % cid,
video_id,
note='Downloading LQ video info'
)
lq_durl = lq_doc.find('./durl')
formats = [{
'format_id': 'lq',
'quality': 1,
'url': lq_durl.find('./url').text,
'filesize': int_or_none(
lq_durl.find('./size'), get_attr='text'),
}]
try:
err_info = json.loads(lq_page)
raise ExtractorError(
'BiliBili said: ' + err_info['error_text'], expected=True)
except ValueError:
pass
lq_doc = ET.fromstring(lq_page)
lq_durls = lq_doc.findall('./durl')
hq_doc = self._download_xml(
'http://interface.bilibili.com/playurl?appkey=1&cid=%s' % cid,
@@ -78,22 +91,45 @@ class BiliBiliIE(InfoExtractor):
fatal=False,
)
if hq_doc is not False:
hq_durl = hq_doc.find('./durl')
formats.append({
'format_id': 'hq',
'quality': 2,
'ext': 'flv',
'url': hq_durl.find('./url').text,
hq_durls = hq_doc.findall('./durl')
assert len(lq_durls) == len(hq_durls)
else:
hq_durls = itertools.repeat(None)
i = 1
for lq_durl, hq_durl in zip(lq_durls, hq_durls):
formats = [{
'format_id': 'lq',
'quality': 1,
'url': lq_durl.find('./url').text,
'filesize': int_or_none(
hq_durl.find('./size'), get_attr='text'),
lq_durl.find('./size'), get_attr='text'),
}]
if hq_durl:
formats.append({
'format_id': 'hq',
'quality': 2,
'ext': 'flv',
'url': hq_durl.find('./url').text,
'filesize': int_or_none(
hq_durl.find('./size'), get_attr='text'),
})
self._sort_formats(formats)
entries.append({
'id': '%s_part%d' % (video_id, i),
'title': title,
'formats': formats,
'duration': duration,
'upload_date': upload_date,
'thumbnail': thumbnail,
})
self._sort_formats(formats)
i += 1
return {
'_type': 'multi_video',
'entries': entries,
'id': video_id,
'title': title,
'formats': formats,
'duration': duration,
'upload_date': upload_date,
'thumbnail': thumbnail,
'title': title
}
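Review note on the BiliBili hunks above: the LQ response is now fetched as text, probed as JSON to detect an API error, and only then parsed as XML; the LQ and HQ `durl` lists are walked in parallel, with `itertools.repeat(None)` standing in when no HQ document is available. A minimal sketch under those assumptions. `ApiError`, `parse_durls`, and `pair_parts` are hypothetical names, and the sample payloads in the usage note are made up.

```python
import itertools
import json
import xml.etree.ElementTree as ET


class ApiError(Exception):
    """Hypothetical stand-in for the project's ExtractorError."""


def parse_durls(page):
    """Raise ApiError if the page is a JSON error, else return its <durl> nodes."""
    try:
        err_info = json.loads(page)
    except ValueError:
        err_info = None  # not JSON, so treat it as the expected XML
    if isinstance(err_info, dict):
        raise ApiError('BiliBili said: %s' % err_info.get('error_text'))
    return ET.fromstring(page).findall('./durl')


def pair_parts(lq_durls, hq_durls=None):
    """Pair each LQ part with its HQ part, or with None when HQ is missing."""
    if hq_durls is None:
        hq_durls = itertools.repeat(None)
    # zip stops at the shorter input, so the infinite repeat() is safe here.
    return list(zip(lq_durls, hq_durls))
```

Raising outside the `try` block keeps the error from being swallowed if the exception type ever derives from `ValueError`; the diff raises inside the `try`, which works as long as `ExtractorError` does not.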

View File

@@ -16,27 +16,38 @@ class BRIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.br.de/mediathek/video/sendungen/heimatsound/heimatsound-festival-2014-trailer-100.html',
'md5': '93556dd2bcb2948d9259f8670c516d59',
'url': 'http://www.br.de/mediathek/video/sendungen/abendschau/betriebliche-altersvorsorge-104.html',
'md5': '83a0477cf0b8451027eb566d88b51106',
'info_dict': {
'id': '25e279aa-1ffd-40fd-9955-5325bd48a53a',
'id': '48f656ef-287e-486f-be86-459122db22cc',
'ext': 'mp4',
'title': 'Wenn das Traditions-Theater wackelt',
'description': 'Heimatsound-Festival 2014: Wenn das Traditions-Theater wackelt',
'duration': 34,
'uploader': 'BR',
'upload_date': '20140802',
'title': 'Die böse Überraschung',
'description': 'Betriebliche Altersvorsorge: Die böse Überraschung',
'duration': 180,
'uploader': 'Reinhard Weber',
'upload_date': '20150422',
}
},
{
'url': 'http://www.br.de/nachrichten/schaeuble-haushaltsentwurf-bundestag-100.html',
'md5': '3db0df1a9a9cd9fa0c70e6ea8aa8e820',
'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
'md5': 'a44396d73ab6a68a69a568fae10705bb',
'info_dict': {
'id': 'c6aae3de-2cf9-43f2-957f-f17fef9afaab',
'id': 'a4b83e34-123d-4b81-9f4e-c0d3121a4e05',
'ext': 'mp4',
'title': 'Manfred Schreiber ist tot',
'description': 'Abendschau kompakt: Manfred Schreiber ist tot',
'duration': 26,
}
},
{
'url': 'http://www.br.de/radio/br-klassik/sendungen/allegro/premiere-urauffuehrung-the-land-2015-dance-festival-muenchen-100.html',
'md5': '8b5b27c0b090f3b35eac4ab3f7a73d3d',
'info_dict': {
'id': '74c603c9-26d3-48bb-b85b-079aeed66e0b',
'ext': 'aac',
'title': '"Keine neuen Schulden im nächsten Jahr"',
'description': 'Haushaltsentwurf: "Keine neuen Schulden im nächsten Jahr"',
'duration': 64,
'title': 'Kurzweilig und sehr bewegend',
'description': '"The Land" von Peeping Tom: Kurzweilig und sehr bewegend',
'duration': 296,
}
},
{

View File

@@ -16,7 +16,7 @@ class BYUtvIE(InfoExtractor):
'ext': 'mp4',
'description': 'md5:5438d33774b6bdc662f9485a340401cc',
'title': 'Season 5 Episode 5',
'thumbnail': 're:^https?://.*promo.*'
'thumbnail': 're:^https?://.*\.jpg$'
},
'params': {
'skip_download': True,

View File

@@ -25,14 +25,14 @@ class CanalplusIE(InfoExtractor):
}
_TESTS = [{
'url': 'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
'md5': '3db39fb48b9685438ecf33a1078023e4',
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1263092',
'md5': 'b3481d7ca972f61e37420798d0a9d934',
'info_dict': {
'id': '922470',
'id': '1263092',
'ext': 'flv',
'title': 'Zapping - 26/08/13',
'description': 'Le meilleur de toutes les chaînes, tous les jours.\nEmission du 26 août 2013',
'upload_date': '20130826',
'title': 'Le Zapping - 13/05/15',
'description': 'md5:09738c0d06be4b5d06a0940edb0da73f',
'upload_date': '20150513',
},
}, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
@@ -56,7 +56,7 @@ class CanalplusIE(InfoExtractor):
'skip': 'videos get deleted after a while',
}, {
'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
'md5': '65aa83ad62fe107ce29e564bb8712580',
'md5': 'f3a46edcdf28006598ffaf5b30e6a2d4',
'info_dict': {
'id': '1213714',
'ext': 'flv',

View File

@@ -32,7 +32,7 @@ class CBSNewsIE(InfoExtractor):
'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
'ext': 'flv',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'thumbnail': 'http://cbsnews2.cbsistatic.com/hub/i/r/2014/04/04/0c9fbc66-576b-41ca-8069-02d122060dd2/thumbnail/140x90/6dad7a502f88875ceac38202984b6d58/en-0404-werner-replace-640x360.jpg',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205,
},
'params': {

View File

@@ -16,7 +16,7 @@ class CCCIE(InfoExtractor):
_TEST = {
'url': 'http://media.ccc.de/browse/congress/2013/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor.html#video',
'md5': '205a365d0d57c0b1e43a12c9ffe8f9be',
'md5': '3a1eda8f3a29515d27f5adb967d7e740',
'info_dict': {
'id': '20131228183',
'ext': 'mp4',
@@ -51,7 +51,7 @@ class CCCIE(InfoExtractor):
matches = re.finditer(r'''(?xs)
<(?:span|div)\s+class='label\s+filetype'>(?P<format>.*?)</(?:span|div)>\s*
<a\s+href='(?P<http_url>[^']+)'>\s*
<a\s+download\s+href='(?P<http_url>[^']+)'>\s*
(?:
.*?
<a\s+href='(?P<torrent_url>[^']+\.torrent)'

View File

@@ -57,7 +57,7 @@ class ChilloutzoneIE(InfoExtractor):
base64_video_info = self._html_search_regex(
r'var cozVidData = "(.+?)";', webpage, 'video data')
decoded_video_info = base64.b64decode(base64_video_info).decode("utf-8")
decoded_video_info = base64.b64decode(base64_video_info.encode('utf-8')).decode('utf-8')
video_info_dict = json.loads(decoded_video_info)
# get video information from dict

View File

@@ -60,6 +60,17 @@ class CinemassacreIE(InfoExtractor):
'uploader_id': 'Cinemassacre',
'title': 'AVGN: McKids',
}
},
{
'url': 'http://cinemassacre.com/2015/05/25/mario-kart-64-nintendo-64-james-mike-mondays/',
'md5': '1376908e49572389e7b06251a53cdd08',
'info_dict': {
'id': 'Cinemassacre-555779690c440',
'ext': 'mp4',
'description': 'Lets Play Mario Kart 64 !! Mario Kart 64 is a classic go-kart racing game released for the Nintendo 64 (N64). Today James & Mike do 4 player Battle Mode with Kyle and Bootsy!',
'title': 'Mario Kart 64 (Nintendo 64) James & Mike Mondays',
'upload_date': '20150525',
}
}
]
@@ -72,7 +83,7 @@ class CinemassacreIE(InfoExtractor):
playerdata_url = self._search_regex(
[
r'src="(http://player\.screenwavemedia\.com/play/[a-zA-Z]+\.php\?[^"]*\bid=.+?)"',
r'src="(http://(?:player2\.screenwavemedia\.com|player\.screenwavemedia\.com/play)/[a-zA-Z]+\.php\?[^"]*\bid=.+?)"',
r'<iframe[^>]+src="((?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
],
webpage, 'player data URL', default=None)

View File

@@ -12,7 +12,7 @@ from ..utils import (
class CNNIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:(?:edition|www)\.)?cnn\.com/video/(?:data/.+?|\?)/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z]{3,5})(?:-ap)?|(?=&)))'''
(?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z\-]+)|(?=&)))'''
_TESTS = [{
'url': 'http://edition.cnn.com/video/?/video/sports/2013/06/09/nadal-1-on-1.cnn',

View File

@@ -47,7 +47,7 @@ class InfoExtractor(object):
information possibly downloading the video to the file system, among
other possible outcomes.
The type field determines the the type of the result.
The type field determines the type of the result.
By far the most common value (and the default if _type is missing) is
"video", which indicates a single video.
@@ -111,11 +111,8 @@ class InfoExtractor(object):
(quality takes higher priority)
-1 for default (order by other properties),
-2 or smaller for less than default.
* http_method HTTP method to use for the download.
* http_headers A dictionary of additional HTTP headers
to add to the request.
* http_post_data Additional data to send with a POST
request.
* stretched_ratio If given and not 1, indicates that the
video's pixels are not square.
width : height ratio as float.
@@ -572,7 +569,7 @@ class InfoExtractor(object):
def _get_login_info(self):
"""
Get the the login info as (username, password)
Get the login info as (username, password)
It will look in the netrc file using the _NETRC_MACHINE value
If there's no info available, return (None, None)
"""
@@ -767,7 +764,7 @@ class InfoExtractor(object):
f.get('fps') if f.get('fps') is not None else -1,
f.get('filesize_approx') if f.get('filesize_approx') is not None else -1,
f.get('source_preference') if f.get('source_preference') is not None else -1,
f.get('format_id'),
f.get('format_id') if f.get('format_id') is not None else '',
)
formats.sort(key=_formats_key)
@@ -789,8 +786,8 @@ class InfoExtractor(object):
return True
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
self.report_warning(
'%s URL is invalid, skipping' % item, video_id)
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
return False
raise
@@ -899,7 +896,7 @@ class InfoExtractor(object):
format_id = []
if m3u8_id:
format_id.append(m3u8_id)
last_media_name = last_media.get('NAME') if last_media else None
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') != 'SUBTITLES' else None
format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats)))
f = {
'format_id': '-'.join(format_id),
@@ -1075,9 +1072,6 @@ class InfoExtractor(object):
def _get_automatic_captions(self, *args, **kwargs):
raise NotImplementedError("This method must be implemented by subclasses")
def _subtitles_timecode(self, seconds):
return '%02d:%02d:%02d.%03d' % (seconds / 3600, (seconds % 3600) / 60, seconds % 60, (seconds % 1) * 1000)
class SearchInfoExtractor(InfoExtractor):
"""

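Review note on the `_formats_key` hunk above: falling back to `''` when `format_id` is missing keeps every element of the sort key comparable. On Python 3, comparing `None` with a string raises `TypeError`, which can surface when two formats tie on all earlier fields. A reduced sketch with a two-field key; the real key has many more fields, and `sort_formats` here is a hypothetical free function, not the `_sort_formats` method.

```python
def format_sort_key(f):
    """Reduced sort key: bitrate first, then format_id with a '' fallback."""
    return (
        f.get('tbr') if f.get('tbr') is not None else -1,
        f.get('format_id') if f.get('format_id') is not None else '',
    )


def sort_formats(formats):
    """Return a new list ordered from worst to best by the reduced key."""
    return sorted(formats, key=format_sort_key)
```

With two formats that share the same `tbr` and only one of them carrying a `format_id`, the sort completes and places the one without an id first.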
View File

@@ -52,6 +52,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'ext': 'mp4',
'uploader': 'IGN',
'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News',
'upload_date': '20150306',
}
},
# Vevo video
@@ -85,7 +86,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'http://www.dailymotion.com/video/%s' % video_id
url = 'https://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information
request = self._build_request(url)
@@ -106,11 +107,11 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
age_limit = self._rta_search(webpage)
video_upload_date = None
mobj = re.search(r'<div class="[^"]*uploaded_cont[^"]*" title="[^"]*">([0-9]{2})-([0-9]{2})-([0-9]{4})</div>', webpage)
mobj = re.search(r'<meta property="video:release_date" content="([0-9]{4})-([0-9]{2})-([0-9]{2}).+?"/>', webpage)
if mobj is not None:
video_upload_date = mobj.group(3) + mobj.group(2) + mobj.group(1)
video_upload_date = mobj.group(1) + mobj.group(2) + mobj.group(3)
embed_url = 'http://www.dailymotion.com/embed/video/%s' % video_id
embed_url = 'https://www.dailymotion.com/embed/video/%s' % video_id
embed_request = self._build_request(embed_url)
embed_page = self._download_webpage(
embed_request, video_id, 'Downloading embed page')
@@ -224,7 +225,7 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?:old/)?user/(?P<user>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?:(?:old/)?user/)?(?P<user>[^/]+)$'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
@@ -238,7 +239,8 @@ class DailymotionUserIE(DailymotionPlaylistIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(url, user)
webpage = self._download_webpage(
'https://www.dailymotion.com/user/%s' % user, user)
full_user = unescapeHTML(self._html_search_regex(
r'<a class="nav-image" title="([^"]+)" href="/%s">' % re.escape(user),
webpage, 'user'))
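Review note on the Dailymotion upload-date hunk above: the old markup listed the date as day-month-year, so the groups were concatenated in reverse; the new `video:release_date` meta tag is already year-month-day, so the groups are joined in order. A minimal sketch using the regex from the diff. The function name `extract_upload_date` and the sample tag in the usage note are assumptions for illustration.

```python
import re


def extract_upload_date(webpage):
    """Return the release date as YYYYMMDD, or None if the meta tag is absent."""
    mobj = re.search(
        r'<meta property="video:release_date" '
        r'content="([0-9]{4})-([0-9]{2})-([0-9]{2}).+?"/>',
        webpage)
    if mobj is None:
        return None
    return mobj.group(1) + mobj.group(2) + mobj.group(3)
```

Note that the pattern expects at least one character after the day (for instance a time component), so a bare `content="2015-03-06"` would not match.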

View File

@@ -11,19 +11,25 @@ from ..utils import (
class DreiSatIE(InfoExtractor):
IE_NAME = '3sat'
_VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TEST = {
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
'md5': 'be37228896d30a88f315b638900a026e',
'info_dict': {
'id': '45918',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'uploader': '3sat',
'upload_date': '20140913'
}
}
_VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TESTS = [
{
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
'md5': 'be37228896d30a88f315b638900a026e',
'info_dict': {
'id': '45918',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'uploader': '3sat',
'upload_date': '20140913'
}
},
{
'url': 'http://www.3sat.de/mediathek/mediathek.php?mode=play&obj=51066',
'only_matching': True,
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@@ -1,8 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor, ExtractorError
from ..utils import parse_iso8601
from .common import InfoExtractor
from ..utils import (
ExtractorError,
parse_iso8601,
)
class DRTVIE(InfoExtractor):
@@ -60,19 +63,31 @@ class DRTVIE(InfoExtractor):
restricted_to_denmark = asset['RestrictedToDenmark']
spoken_subtitles = asset['Target'] == 'SpokenSubtitles'
for link in asset['Links']:
target = link['Target']
uri = link['Uri']
target = link['Target']
format_id = target
preference = -1 if target == 'HDS' else -2
preference = None
if spoken_subtitles:
preference -= 2
preference = -1
format_id += '-spoken-subtitles'
formats.append({
'url': uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43' if target == 'HDS' else uri,
'format_id': format_id,
'ext': link['FileFormat'],
'preference': preference,
})
if target == 'HDS':
formats.extend(self._extract_f4m_formats(
uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43',
video_id, preference, f4m_id=format_id))
elif target == 'HLS':
formats.extend(self._extract_m3u8_formats(
uri, video_id, 'mp4', preference=preference,
m3u8_id=format_id))
else:
bitrate = link.get('Bitrate')
if bitrate:
format_id += '-%s' % bitrate
formats.append({
'url': uri,
'format_id': format_id,
'tbr': bitrate,
'ext': link.get('FileFormat'),
})
subtitles_list = asset.get('SubtitlesList')
if isinstance(subtitles_list, list):
LANGS = {

View File

@@ -26,7 +26,7 @@ class DumpertIE(InfoExtractor):
video_id = self._match_id(url)
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'nsfw=1')
req.add_header('Cookie', 'nsfw=1; cpc=10')
webpage = self._download_webpage(req, video_id)
files_base64 = self._search_regex(

View File

@@ -4,22 +4,28 @@ from .tnaflix import TNAFlixIE
class EMPFlixIE(TNAFlixIE):
_VALID_URL = r'^https?://www\.empflix\.com/videos/(?P<display_id>[0-9a-zA-Z-]+)-(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://(?:www\.)?empflix\.com/videos/(?P<display_id>.+?)-(?P<id>[0-9]+)\.html'
_TITLE_REGEX = r'name="title" value="(?P<title>[^"]*)"'
_DESCRIPTION_REGEX = r'name="description" value="([^"]*)"'
_CONFIG_REGEX = r'flashvars\.config\s*=\s*escape\("([^"]+)"'
_TEST = {
'url': 'http://www.empflix.com/videos/Amateur-Finger-Fuck-33051.html',
'md5': 'b1bc15b6412d33902d6e5952035fcabc',
'info_dict': {
'id': '33051',
'display_id': 'Amateur-Finger-Fuck',
'ext': 'mp4',
'title': 'Amateur Finger Fuck',
'description': 'Amateur solo finger fucking.',
'thumbnail': 're:https?://.*\.jpg$',
'age_limit': 18,
_TESTS = [
{
'url': 'http://www.empflix.com/videos/Amateur-Finger-Fuck-33051.html',
'md5': 'b1bc15b6412d33902d6e5952035fcabc',
'info_dict': {
'id': '33051',
'display_id': 'Amateur-Finger-Fuck',
'ext': 'mp4',
'title': 'Amateur Finger Fuck',
'description': 'Amateur solo finger fucking.',
'thumbnail': 're:https?://.*\.jpg$',
'age_limit': 18,
}
},
{
'url': 'http://www.empflix.com/videos/[AROMA][ARMD-718]-Aoi-Yoshino-Sawa-25826.html',
'only_matching': True,
}
}
]

View File

@@ -4,7 +4,10 @@ import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
unescapeHTML
)
class EroProfileIE(InfoExtractor):
@@ -75,8 +78,8 @@ class EroProfileIE(InfoExtractor):
[r"glbUpdViews\s*\('\d*','(\d+)'", r'p/report/video/(\d+)'],
webpage, 'video id', default=None)
video_url = self._search_regex(
r'<source src="([^"]+)', webpage, 'video url')
video_url = unescapeHTML(self._search_regex(
r'<source src="([^"]+)', webpage, 'video url'))
title = self._html_search_regex(
r'Title:</th><td>([^<]+)</td>', webpage, 'title')
thumbnail = self._search_regex(
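Wrapping the `<source src=...>` match in `unescapeHTML` matters when the attribute value carries HTML entities such as `&amp;`. The sketch below illustrates the effect with the standard library's `html.unescape` as a stand-in for youtube-dl's helper (an assumption); the page fragment and URL are made up.

```python
import html
import re

# Hypothetical page fragment; the URL is invented for illustration.
webpage = '<source src="http://cdn.example.com/v.mp4?a=1&amp;b=2" type="video/mp4">'

# Same pattern as in the hunk above.
raw_url = re.search(r'<source src="([^"]+)', webpage).group(1)

# html.unescape stands in for youtube-dl's unescapeHTML here.
video_url = html.unescape(raw_url)

print(raw_url)    # still contains the literal entity
print(video_url)  # entity decoded to a plain ampersand
```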

View File

@@ -8,7 +8,8 @@ from ..compat import compat_urllib_request
from ..utils import (
determine_ext,
clean_html,
qualities,
int_or_none,
float_or_none,
)
@@ -36,10 +37,10 @@ def _decrypt_config(key, string):
class EscapistIE(InfoExtractor):
_VALID_URL = r'https?://?(www\.)?escapistmagazine\.com/videos/view/[^/?#]+/(?P<id>[0-9]+)-[^/?#]*(?:$|[?#])'
_VALID_URL = r'https?://?(?:www\.)?escapistmagazine\.com/videos/view/[^/?#]+/(?P<id>[0-9]+)-[^/?#]*(?:$|[?#])'
_TESTS = [{
'url': 'http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate',
'md5': 'c6793dbda81388f4264c1ba18684a74d',
'md5': 'ab3a706c681efca53f0a35f1415cf0d1',
'info_dict': {
'id': '6618',
'ext': 'mp4',
@@ -47,10 +48,11 @@ class EscapistIE(InfoExtractor):
'title': "Breaking Down Baldur's Gate",
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 264,
'uploader': 'The Escapist',
}
}, {
'url': 'http://www.escapistmagazine.com/videos/view/zero-punctuation/10044-Evolve-One-vs-Multiplayer',
'md5': 'cf8842a8a46444d241f9a9980d7874f2',
'md5': '9e8c437b0dbb0387d3bd3255ca77f6bf',
'info_dict': {
'id': '10044',
'ext': 'mp4',
@@ -58,6 +60,7 @@ class EscapistIE(InfoExtractor):
'title': 'Evolve - One vs Multiplayer',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 304,
'uploader': 'The Escapist',
}
}]
@@ -65,35 +68,33 @@ class EscapistIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
imsVideo = self._parse_json(
ims_video = self._parse_json(
self._search_regex(
r'imsVideo\.play\(({.+?})\);', webpage, 'imsVideo'),
video_id)
video_id = imsVideo['videoID']
key = imsVideo['hash']
video_id = ims_video['videoID']
key = ims_video['hash']
quality = qualities(['lq', 'hq', 'hd'])
config_req = compat_urllib_request.Request(
'http://www.escapistmagazine.com/videos/'
'vidconfig.php?videoID=%s&hash=%s' % (video_id, key))
config_req.add_header('Referer', url)
config = self._download_webpage(config_req, video_id, 'Downloading video config')
formats = []
for q in ['lq', 'hq', 'hd']:
config_req = compat_urllib_request.Request('http://www.escapistmagazine.com/videos/'
'vidconfig.php?videoID=%s&hash=%s&quality=%s' % (video_id, key, 'mp4_' + q))
config_req.add_header('Referer', url)
config = self._download_webpage(config_req, video_id, 'Downloading video config ' + q.upper())
data = json.loads(_decrypt_config(key, config))
data = json.loads(_decrypt_config(key, config))
video_data = data['videoData']
title = clean_html(data['videoData']['title'])
duration = data['videoData']['duration'] / 1000
for i, v in enumerate(data['files']['videos']):
formats.append({
'url': v,
'format_id': determine_ext(v) + '_' + q + str(i),
'quality': quality(q),
})
title = clean_html(video_data['title'])
duration = float_or_none(video_data.get('duration'), 1000)
uploader = video_data.get('publisher')
formats = [{
'url': video['src'],
'format_id': '%s-%sp' % (determine_ext(video['src']), video['res']),
'height': int_or_none(video.get('res')),
} for video in data['files']['videos']]
self._sort_formats(formats)
return {
'id': video_id,
@@ -102,4 +103,5 @@ class EscapistIE(InfoExtractor):
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
'duration': duration,
'uploader': uploader,
}
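The switch from `data['videoData']['duration'] / 1000` to `float_or_none(video_data.get('duration'), 1000)` makes a missing duration yield `None` instead of raising. A rough sketch of that behaviour follows; the function is a simplified, hypothetical stand-in and not the actual `youtube_dl.utils` implementation.

```python
def float_or_none(value, scale=1):
    """Hypothetical stand-in: return float(value) / scale, or None if unusable."""
    if value is None:
        return None
    try:
        return float(value) / scale
    except (TypeError, ValueError):
        return None


# Made-up config payload without a 'duration' key.
video_data = {'title': 'Example'}

# Missing key: None instead of a KeyError.
missing = float_or_none(video_data.get('duration'), 1000)
# Milliseconds converted to seconds.
present = float_or_none(264000, 1000)

print(missing, present)
```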

View File

@@ -0,0 +1,55 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class ESPNIE(InfoExtractor):
_VALID_URL = r'https?://espn\.go\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_WORKING = False
_TESTS = [{
'url': 'http://espn.go.com/video/clip?id=10365079',
'info_dict': {
'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
'ext': 'mp4',
'title': 'dm_140128_30for30Shorts___JudgingJewellv2',
'description': '',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://espn.go.com/video/iframe/twitter/?cms=espn&id=10365079',
'only_matching': True,
}, {
'url': 'http://espn.go.com/nba/recap?gameId=400793786',
'only_matching': True,
}, {
'url': 'http://espn.go.com/blog/golden-state-warriors/post/_/id/593/how-warriors-rapidly-regained-a-winning-edge',
'only_matching': True,
}, {
'url': 'http://espn.go.com/sports/endurance/story/_/id/12893522/dzhokhar-tsarnaev-sentenced-role-boston-marathon-bombings',
'only_matching': True,
}, {
'url': 'http://espn.go.com/nba/playoffs/2015/story/_/id/12887571/john-wall-washington-wizards-no-swelling-left-hand-wrist-game-5-return',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'class="video-play-button"[^>]+data-id="(\d+)',
webpage, 'video id')
player = self._download_webpage(
'https://espn.go.com/video/iframe/twitter/?id=%s' % video_id, video_id)
pcode = self._search_regex(
r'["\']pcode=([^"\']+)["\']', player, 'pcode')
return self.url_result(
'ooyalaexternal:espn:%s:%s' % (video_id, pcode),
'OoyalaExternal')

View File

@@ -50,7 +50,10 @@ class FacebookIE(InfoExtractor):
'id': '274175099429670',
'ext': 'mp4',
'title': 'Facebook video #274175099429670',
}
},
'expected_warnings': [
'title'
]
}, {
'url': 'https://www.facebook.com/video.php?v=10204634152394104',
'only_matching': True,
@@ -149,12 +152,12 @@ class FacebookIE(InfoExtractor):
raise ExtractorError('Cannot find video formats')
video_title = self._html_search_regex(
r'<h2 class="uiHeaderTitle">([^<]*)</h2>', webpage, 'title',
fatal=False)
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage, 'title',
default=None)
if not video_title:
video_title = self._html_search_regex(
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
webpage, 'alternative title', default=None)
webpage, 'alternative title', fatal=False)
video_title = limit_length(video_title, 80)
if not video_title:
video_title = 'Facebook video #%s' % video_id

View File

@@ -1,80 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
)
class FiredriveIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?firedrive\.com/' + \
'(?:file|embed)/(?P<id>[0-9a-zA-Z]+)'
_FILE_DELETED_REGEX = r'<div class="removed_file_image">'
_TESTS = [{
'url': 'https://www.firedrive.com/file/FEB892FA160EBD01',
'md5': 'd5d4252f80ebeab4dc2d5ceaed1b7970',
'info_dict': {
'id': 'FEB892FA160EBD01',
'ext': 'flv',
'title': 'bbb_theora_486kbit.flv',
'thumbnail': 're:^http://.*\.jpg$',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'http://firedrive.com/file/%s' % video_id
webpage = self._download_webpage(url, video_id)
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
value="([^"]*)"
''', webpage))
post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
# Apparently, this header is required for confirmation to work.
req.add_header('Host', 'www.firedrive.com')
webpage = self._download_webpage(req, video_id,
'Downloading video page')
title = self._search_regex(r'class="external_title_left">(.+)</div>',
webpage, 'title')
thumbnail = self._search_regex(r'image:\s?"(//[^\"]+)', webpage,
'thumbnail', fatal=False)
if thumbnail is not None:
thumbnail = 'http:' + thumbnail
ext = self._search_regex(r'type:\s?\'([^\']+)\',',
webpage, 'extension', fatal=False)
video_url = self._search_regex(
r'file:\s?loadURL\(\'(http[^\']+)\'\),', webpage, 'file url')
formats = [{
'format_id': 'sd',
'url': video_url,
'ext': ext,
}]
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -0,0 +1,32 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class FoxSportsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxsports\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_TEST = {
'url': 'http://www.foxsports.com/video?vid=432609859715',
'info_dict': {
'id': 'gA0bHB3Ladz3',
'ext': 'flv',
'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
'description': 'Courtney Lee talks about Memphis being focused.',
},
'add_ie': ['ThePlatform'],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
config = self._parse_json(
self._search_regex(
r"data-player-config='([^']+)'", webpage, 'data player config'),
video_id)
return self.url_result(smuggle_url(
config['releaseURL'] + '&manifest=f4m', {'force_smil_url': True}))

View File

@@ -14,8 +14,8 @@ from ..utils import (
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
_TEST = {
_VALID_URL = r'http://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
_TESTS = [{
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
'md5': 'b2a30deaa8654fcccd43713a6b6a4825',
'info_dict': {
@@ -23,8 +23,16 @@ class GameSpotIE(InfoExtractor):
'ext': 'mp4',
'title': 'Arma 3 - Community Guide: SITREP I',
'description': 'Check out this video where some of the basics of Arma 3 is explained.',
}
}
},
}, {
'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/',
'info_dict': {
'id': 'gs-2300-6424837',
'ext': 'flv',
'title': 'The Witcher 3: Wild Hunt [Xbox ONE] - Now Playing',
'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.',
},
}]
def _real_extract(self, url):
page_id = self._match_id(url)
@@ -32,25 +40,37 @@ class GameSpotIE(InfoExtractor):
data_video_json = self._search_regex(
r'data-video=["\'](.*?)["\']', webpage, 'data video')
data_video = json.loads(unescapeHTML(data_video_json))
streams = data_video['videoStreams']
# Transform the manifest url to a link to the mp4 files
# they are used in mobile devices.
f4m_url = data_video['videoStreams']['f4m_stream']
f4m_path = compat_urlparse.urlparse(f4m_url).path
QUALITIES_RE = r'((,\d+)+,?)'
qualities = self._search_regex(QUALITIES_RE, f4m_path, 'qualities').strip(',').split(',')
http_path = f4m_path[1:].split('/', 1)[1]
http_template = re.sub(QUALITIES_RE, r'%s', http_path)
http_template = http_template.replace('.csmil/manifest.f4m', '')
http_template = compat_urlparse.urljoin(
'http://video.gamespotcdn.com/', http_template)
formats = []
for q in qualities:
formats.append({
'url': http_template % q,
'ext': 'mp4',
'format_id': q,
})
f4m_url = streams.get('f4m_stream')
if f4m_url is not None:
# Transform the manifest url to a link to the mp4 files
# they are used in mobile devices.
f4m_path = compat_urlparse.urlparse(f4m_url).path
QUALITIES_RE = r'((,\d+)+,?)'
qualities = self._search_regex(QUALITIES_RE, f4m_path, 'qualities').strip(',').split(',')
http_path = f4m_path[1:].split('/', 1)[1]
http_template = re.sub(QUALITIES_RE, r'%s', http_path)
http_template = http_template.replace('.csmil/manifest.f4m', '')
http_template = compat_urlparse.urljoin(
'http://video.gamespotcdn.com/', http_template)
for q in qualities:
formats.append({
'url': http_template % q,
'ext': 'mp4',
'format_id': q,
})
else:
for quality in ['sd', 'hd']:
# It's actually a link to a flv file
flv_url = streams.get('f4m_{0}'.format(quality))
if flv_url is not None:
formats.append({
'url': flv_url,
'ext': 'flv',
'format_id': quality,
})
return {
'id': data_video['guid'],

View File

@@ -32,11 +32,13 @@ from .brightcove import BrightcoveIE
from .nbc import NBCSportsVPlayerIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
from .sportbox import SportBoxEmbedIE
from .smotri import SmotriIE
from .condenast import CondeNastIE
from .udn import UDNEmbedIE
from .senateisvp import SenateISVPIE
from .bliptv import BlipTVIE
from .svt import SVTIE
class GenericIE(InfoExtractor):
@@ -223,6 +225,37 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
# SportBox embed
{
'url': 'http://www.vestifinance.ru/articles/25753',
'info_dict': {
'id': '25753',
'title': 'Вести Экономика ― Прямые трансляции с Форума-выставки "Госзаказ-2013"',
},
'playlist': [{
'info_dict': {
'id': '370908',
'title': 'Госзаказ. День 3',
'ext': 'mp4',
}
}, {
'info_dict': {
'id': '370905',
'title': 'Госзаказ. День 2',
'ext': 'mp4',
}
}, {
'info_dict': {
'id': '370902',
'title': 'Госзаказ. День 1',
'ext': 'mp4',
}
}],
'params': {
# m3u8 download
'skip_download': True,
},
},
# Embedded TED video
{
'url': 'http://en.support.wordpress.com/videos/ted-talks/',
@@ -645,6 +678,17 @@ class GenericIE(InfoExtractor):
'title': 'Facebook Creates "On This Day" | Crunch Report',
},
},
# SVT embed
{
'url': 'http://www.svt.se/sport/ishockey/jagr-tacklar-giroux-under-intervjun',
'info_dict': {
'id': '2900353',
'ext': 'flv',
'title': 'Här trycker Jagr till Giroux (under SVT-intervjun)',
'duration': 27,
'age_limit': 0,
},
},
# RSS feed with enclosure
{
'url': 'http://podcastfeeds.nbcnews.com/audio/podcast/MSNBC-MADDOW-NETCAST-M4V.xml',
@@ -1078,6 +1122,11 @@ class GenericIE(InfoExtractor):
if bliptv_url:
return self.url_result(bliptv_url, 'BlipTV')
# Look for SVT player
svt_url = SVTIE._extract_url(webpage)
if svt_url:
return self.url_result(svt_url, 'SVT')
# Look for embedded condenast player
matches = re.findall(
r'<iframe\s+(?:[a-zA-Z-]+="[^"]+"\s+)*?src="(https?://player\.cnevids\.com/embed/[^"]+")',
@@ -1212,6 +1261,11 @@ class GenericIE(InfoExtractor):
if rutv_url:
return self.url_result(rutv_url, 'RUTV')
# Look for embedded SportBox player
sportbox_urls = SportBoxEmbedIE._extract_urls(webpage)
if sportbox_urls:
return _playlist_from_matches(sportbox_urls, ie='SportBoxEmbed')
# Look for embedded TED player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://embed(?:-ssl)?\.ted\.com/.+?)\1', webpage)
@@ -1289,6 +1343,10 @@ class GenericIE(InfoExtractor):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://m(?:lb)?\.mlb\.com/shared/video/embed/embed\.html\?.+?)\1',
webpage)
if not mobj:
mobj = re.search(
r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'MLB')
@@ -1367,7 +1425,7 @@ class GenericIE(InfoExtractor):
# Look for Senate ISVP iframe
senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
if senate_isvp_url:
return self.url_result(surl, 'SenateISVP')
return self.url_result(senate_isvp_url, 'SenateISVP')
def check_video(vurl):
if YoutubeIE.suitable(vurl):
@@ -1436,7 +1494,7 @@ class GenericIE(InfoExtractor):
if refresh_header:
found = re.search(REDIRECT_REGEX, refresh_header)
if found:
new_url = found.group(1)
new_url = compat_urlparse.urljoin(url, found.group(1))
self.report_following_redirect(new_url)
return {
'_type': 'url',
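Passing the refresh target through `compat_urlparse.urljoin` lets relative redirect targets resolve against the page URL, while absolute targets pass through unchanged. A small sketch with the Python 3 standard library (`urllib.parse.urljoin`); the helper name `resolve_redirect` and the URLs are assumptions for illustration.

```python
from urllib.parse import urljoin


def resolve_redirect(page_url, target):
    """Resolve a refresh target against the URL of the page that sent it."""
    return urljoin(page_url, target)


page_url = 'http://example.com/videos/watch.html'  # hypothetical page

print(resolve_redirect(page_url, '/player/123'))                 # root-relative
print(resolve_redirect(page_url, 'embed.html'))                  # document-relative
print(resolve_redirect(page_url, 'http://other.example.org/v'))  # already absolute
```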

View File

@@ -85,7 +85,8 @@ class GigaIE(InfoExtractor):
r'class="author">([^<]+)</a>', webpage, 'uploader', fatal=False)
view_count = str_to_int(self._search_regex(
r'<span class="views"><strong>([\d.]+)</strong>', webpage, 'view count', fatal=False))
r'<span class="views"><strong>([\d.,]+)</strong>',
webpage, 'view count', fatal=False))
return {
'id': video_id,
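Adding the comma to the character class lets the view count match when the site uses `,` as a thousands separator; with the old class `[\d.]+` such a value would not match at all. The sketch below shows the idea; `str_to_int` here is a simplified, hypothetical stand-in for the youtube-dl helper, and the HTML fragments are made up.

```python
import re

# Pattern copied from the updated hunk above.
VIEWS_RE = r'<span class="views"><strong>([\d.,]+)</strong>'


def str_to_int(int_str):
    """Hypothetical stand-in: drop thousands separators and convert to int."""
    if int_str is None:
        return None
    return int(re.sub(r'[,.]', '', int_str))


def view_count(webpage):
    """Return the parsed view count, or None when the markup is absent."""
    mobj = re.search(VIEWS_RE, webpage)
    return str_to_int(mobj.group(1)) if mobj else None


print(view_count('<span class="views"><strong>1,234</strong>'))
print(view_count('<span class="views"><strong>12.345</strong>'))
```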

View File

@@ -35,13 +35,7 @@ class GorillaVidIE(InfoExtractor):
},
}, {
'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
'md5': 'c9e293ca74d46cad638e199c3f3fe604',
'info_dict': {
'id': 'z08zf8le23c6',
'ext': 'mp4',
'title': 'Say something nice',
'thumbnail': 're:http://.*\.jpg',
},
'only_matching': True,
}, {
'url': 'http://daclips.in/3rso4kdn6f9m',
'md5': '1ad8fd39bb976eeb66004d3a4895f106',

View File

@@ -1,191 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import time
import math
import os.path
import re
from .common import InfoExtractor
from ..compat import (
compat_html_parser,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
)
from ..utils import ExtractorError
class GroovesharkHtmlParser(compat_html_parser.HTMLParser):
def __init__(self):
self._current_object = None
self.objects = []
compat_html_parser.HTMLParser.__init__(self)
def handle_starttag(self, tag, attrs):
attrs = dict((k, v) for k, v in attrs)
if tag == 'object':
self._current_object = {'attrs': attrs, 'params': []}
elif tag == 'param':
self._current_object['params'].append(attrs)
def handle_endtag(self, tag):
if tag == 'object':
self.objects.append(self._current_object)
self._current_object = None
@classmethod
def extract_object_tags(cls, html):
p = cls()
p.feed(html)
p.close()
return p.objects
class GroovesharkIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?grooveshark\.com/#!/s/([^/]+)/([^/]+)'
_TEST = {
'url': 'http://grooveshark.com/#!/s/Jolene+Tenth+Key+Remix+Ft+Will+Sessions/6SS1DW?src=5',
'md5': '7ecf8aefa59d6b2098517e1baa530023',
'info_dict': {
'id': '6SS1DW',
'title': 'Jolene (Tenth Key Remix ft. Will Sessions)',
'ext': 'mp3',
'duration': 227,
}
}
do_playerpage_request = True
do_bootstrap_request = True
def _parse_target(self, target):
uri = compat_urlparse.urlparse(target)
hash = uri.fragment[1:].split('?')[0]
token = os.path.basename(hash.rstrip('/'))
return (uri, hash, token)
def _build_bootstrap_url(self, target):
(uri, hash, token) = self._parse_target(target)
query = 'getCommunicationToken=1&hash=%s&%d' % (compat_urllib_parse.quote(hash, safe=''), self.ts)
return (compat_urlparse.urlunparse((uri.scheme, uri.netloc, '/preload.php', None, query, None)), token)
def _build_meta_url(self, target):
(uri, hash, token) = self._parse_target(target)
query = 'hash=%s&%d' % (compat_urllib_parse.quote(hash, safe=''), self.ts)
return (compat_urlparse.urlunparse((uri.scheme, uri.netloc, '/preload.php', None, query, None)), token)
def _build_stream_url(self, meta):
return compat_urlparse.urlunparse(('http', meta['streamKey']['ip'], '/stream.php', None, None, None))
def _build_swf_referer(self, target, obj):
(uri, _, _) = self._parse_target(target)
return compat_urlparse.urlunparse((uri.scheme, uri.netloc, obj['attrs']['data'], None, None, None))
def _transform_bootstrap(self, js):
return re.split('(?m)^\s*try\s*\{', js)[0] \
.split(' = ', 1)[1].strip().rstrip(';')
def _transform_meta(self, js):
return js.split('\n')[0].split('=')[1].rstrip(';')
def _get_meta(self, target):
(meta_url, token) = self._build_meta_url(target)
self.to_screen('Metadata URL: %s' % meta_url)
headers = {'Referer': compat_urlparse.urldefrag(target)[0]}
req = compat_urllib_request.Request(meta_url, headers=headers)
res = self._download_json(req, token,
transform_source=self._transform_meta)
if 'getStreamKeyWithSong' not in res:
raise ExtractorError(
'Metadata not found. URL may be malformed, or Grooveshark API may have changed.')
if res['getStreamKeyWithSong'] is None:
raise ExtractorError(
'Metadata download failed, probably due to Grooveshark anti-abuse throttling. Wait at least an hour before retrying from this IP.',
expected=True)
return res['getStreamKeyWithSong']
def _get_bootstrap(self, target):
(bootstrap_url, token) = self._build_bootstrap_url(target)
headers = {'Referer': compat_urlparse.urldefrag(target)[0]}
req = compat_urllib_request.Request(bootstrap_url, headers=headers)
res = self._download_json(req, token, fatal=False,
note='Downloading player bootstrap data',
errnote='Unable to download player bootstrap data',
transform_source=self._transform_bootstrap)
return res
def _get_playerpage(self, target):
(_, _, token) = self._parse_target(target)
webpage = self._download_webpage(
target, token,
note='Downloading player page',
errnote='Unable to download player page',
fatal=False)
if webpage is not None:
# Search (for example German) error message
error_msg = self._html_search_regex(
r'<div id="content">\s*<h2>(.*?)</h2>', webpage,
'error message', default=None)
if error_msg is not None:
error_msg = error_msg.replace('\n', ' ')
raise ExtractorError('Grooveshark said: %s' % error_msg)
if webpage is not None:
o = GroovesharkHtmlParser.extract_object_tags(webpage)
return webpage, [x for x in o if x['attrs']['id'] == 'jsPlayerEmbed']
return webpage, None
def _real_initialize(self):
self.ts = int(time.time() * 1000) # timestamp in millis
def _real_extract(self, url):
(target_uri, _, token) = self._parse_target(url)
# 1. Fill cookiejar by making a request to the player page
swf_referer = None
if self.do_playerpage_request:
(_, player_objs) = self._get_playerpage(url)
if player_objs:
swf_referer = self._build_swf_referer(url, player_objs[0])
self.to_screen('SWF Referer: %s' % swf_referer)
# 2. Ask preload.php for swf bootstrap data to better mimic webapp
if self.do_bootstrap_request:
bootstrap = self._get_bootstrap(url)
self.to_screen('CommunicationToken: %s' % bootstrap['getCommunicationToken'])
# 3. Ask preload.php for track metadata.
meta = self._get_meta(url)
# 4. Construct stream request for track.
stream_url = self._build_stream_url(meta)
duration = int(math.ceil(float(meta['streamKey']['uSecs']) / 1000000))
post_dict = {'streamKey': meta['streamKey']['streamKey']}
post_data = compat_urllib_parse.urlencode(post_dict).encode('utf-8')
headers = {
'Content-Length': len(post_data),
'Content-Type': 'application/x-www-form-urlencoded'
}
if swf_referer is not None:
headers['Referer'] = swf_referer
return {
'id': token,
'title': meta['song']['Name'],
'http_method': 'POST',
'url': stream_url,
'ext': 'mp3',
'format': 'mp3 audio',
'duration': duration,
'http_post_data': post_data,
'http_headers': headers,
}

View File

@@ -25,7 +25,8 @@ class HistoricFilmsIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
tape_id = self._search_regex(
r'class="tapeId">([^<]+)<', webpage, 'tape id')
[r'class="tapeId"[^>]*>([^<]+)<', r'tapeId\s*:\s*"([^"]+)"'],
webpage, 'tape id')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)

View File

@@ -1,36 +1,75 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class IconosquareIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?(?:iconosquare\.com|statigr\.am)/p/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?(?:iconosquare\.com|statigr\.am)/p/(?P<id>[^/]+)'
_TEST = {
'url': 'http://statigr.am/p/522207370455279102_24101272',
'md5': '6eb93b882a3ded7c378ee1d6884b1814',
'info_dict': {
'id': '522207370455279102_24101272',
'ext': 'mp4',
'uploader_id': 'aguynamedpatrick',
'title': 'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
'title': 'Instagram media by @aguynamedpatrick (Patrick Janelle)',
'description': 'md5:644406a9ec27457ed7aa7a9ebcd4ce3d',
'timestamp': 1376471991,
'upload_date': '20130814',
'uploader': 'aguynamedpatrick',
'uploader_id': '24101272',
'comment_count': int,
'like_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
media = self._parse_json(
self._search_regex(
r'window\.media\s*=\s*({.+?});\n', webpage, 'media'),
video_id)
formats = [{
'url': f['url'],
'format_id': format_id,
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height'))
} for format_id, f in media['videos'].items()]
self._sort_formats(formats)
title = self._html_search_regex(
r'<title>(.+?)(?: *\(Videos?\))? \| (?:Iconosquare|Statigram)</title>',
webpage, 'title')
uploader_id = self._html_search_regex(
r'@([^ ]+)', title, 'uploader name', fatal=False)
timestamp = int_or_none(media.get('created_time') or media.get('caption', {}).get('created_time'))
description = media.get('caption', {}).get('text')
uploader = media.get('user', {}).get('username')
uploader_id = media.get('user', {}).get('id')
comment_count = int_or_none(media.get('comments', {}).get('count'))
like_count = int_or_none(media.get('likes', {}).get('count'))
thumbnails = [{
'url': t['url'],
'id': thumbnail_id,
'width': int_or_none(t.get('width')),
'height': int_or_none(t.get('height'))
} for thumbnail_id, t in media.get('images', {}).items()]
return {
'id': video_id,
'url': self._og_search_video_url(webpage),
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id': uploader_id
'description': description,
'thumbnails': thumbnails,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'comment_count': comment_count,
'like_count': like_count,
'formats': formats,
}

View File

@@ -7,9 +7,9 @@ from ..utils import int_or_none
class InstagramIE(InfoExtractor):
_VALID_URL = r'https?://instagram\.com/p/(?P<id>[\da-zA-Z]+)'
_VALID_URL = r'https://instagram\.com/p/(?P<id>[\da-zA-Z]+)'
_TEST = {
'url': 'http://instagram.com/p/aye83DjauH/?foo=bar#abc',
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
'info_dict': {
'id': 'aye83DjauH',
@@ -41,11 +41,11 @@ class InstagramIE(InfoExtractor):
class InstagramUserIE(InfoExtractor):
_VALID_URL = r'http://instagram\.com/(?P<username>[^/]{2,})/?(?:$|[?#])'
_VALID_URL = r'https://instagram\.com/(?P<username>[^/]{2,})/?(?:$|[?#])'
IE_DESC = 'Instagram user profile'
IE_NAME = 'instagram:user'
_TEST = {
'url': 'http://instagram.com/porsche',
'url': 'https://instagram.com/porsche',
'info_dict': {
'id': 'porsche',
'title': 'porsche',

View File

@@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
srt_subtitles_timecode,
)
@@ -39,8 +40,8 @@ class KanalPlayIE(InfoExtractor):
'%s\r\n%s --> %s\r\n%s'
% (
num,
self._subtitles_timecode(item['startMillis'] / 1000.0),
self._subtitles_timecode(item['endMillis'] / 1000.0),
srt_subtitles_timecode(item['startMillis'] / 1000.0),
srt_subtitles_timecode(item['endMillis'] / 1000.0),
item['text'],
) for num, item in enumerate(subs, 1))

View File

@@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
fix_xml_ampersands,
float_or_none,
xpath_with_ns,
xpath_text,
)
class KarriereVideosIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?karrierevideos\.at(?:/[^/]+)+/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.karrierevideos.at/berufsvideos/mittlere-hoehere-schulen/altenpflegerin',
'info_dict': {
'id': '32c91',
'ext': 'flv',
'title': 'AltenpflegerIn',
'description': 'md5:dbadd1259fde2159a9b28667cb664ae2',
'thumbnail': 're:^http://.*\.png',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# broken ampersands
'url': 'http://www.karrierevideos.at/orientierung/vaeterkarenz-und-neue-chancen-fuer-muetter-baby-was-nun',
'info_dict': {
'id': '5sniu',
'ext': 'flv',
'title': 'Väterkarenz und neue Chancen für Mütter - "Baby - was nun?"',
'description': 'md5:97092c6ad1fd7d38e9d6a5fdeb2bcc33',
'thumbnail': 're:^http://.*\.png',
},
'params': {
# rtmp download
'skip_download': True,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = (self._html_search_meta('title', webpage, default=None) or
self._search_regex(r'<h1 class="title">([^<]+)</h1>', webpage, 'title'))
video_id = self._search_regex(
r'/config/video/(.+?)\.xml', webpage, 'video id')
playlist = self._download_xml(
'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id,
video_id, transform_source=fix_xml_ampersands)
NS_MAP = {
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'
}
def ns(path):
return xpath_with_ns(path, NS_MAP)
item = playlist.find('./tracklist/item')
video_file = xpath_text(
item, ns('./jwplayer:file'), 'video url', fatal=True)
streamer = xpath_text(
item, ns('./jwplayer:streamer'), 'streamer', fatal=True)
uploader = xpath_text(
item, ns('./jwplayer:author'), 'uploader')
duration = float_or_none(
xpath_text(item, ns('./jwplayer:duration'), 'duration'))
description = self._html_search_regex(
r'(?s)<div class="leadtext">(.+?)</div>',
webpage, 'description')
thumbnail = self._html_search_meta(
'thumbnail', webpage, 'thumbnail')
if thumbnail:
thumbnail = compat_urlparse.urljoin(url, thumbnail)
return {
'id': video_id,
'url': streamer.replace('rtmpt', 'rtmp'),
'play_path': 'mp4:%s' % video_file,
'ext': 'flv',
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'duration': duration,
}

View File

@@ -50,9 +50,7 @@ class LetvIE(InfoExtractor):
'title': '与龙共舞 完整版',
'description': 'md5:7506a5eeb1722bb9d4068f85024e3986',
},
'params': {
'cn_verification_proxy': 'http://proxy.uku.im:8888'
},
'skip': 'Only available in China',
}]
@staticmethod

@@ -4,7 +4,9 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
int_or_none,
unified_strdate,
ExtractorError,
@@ -14,9 +16,9 @@ from ..utils import (
class LifeNewsIE(InfoExtractor):
IE_NAME = 'lifenews'
IE_DESC = 'LIFE | NEWS'
_VALID_URL = r'http://lifenews\.ru/(?:mobile/)?news/(?P<id>\d+)'
_VALID_URL = r'http://lifenews\.ru/(?:mobile/)?(?P<section>news|video)/(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://lifenews.ru/news/126342',
'md5': 'e1b50a5c5fb98a6a544250f2e0db570a',
'info_dict': {
@@ -27,16 +29,47 @@ class LifeNewsIE(InfoExtractor):
'thumbnail': 're:http://.*\.jpg',
'upload_date': '20140130',
}
}
}, {
# video in <iframe>
'url': 'http://lifenews.ru/news/152125',
'md5': '77d19a6f0886cd76bdbf44b4d971a273',
'info_dict': {
'id': '152125',
'ext': 'mp4',
'title': 'В Сети появилось видео захвата «Правым сектором» колхозных полей ',
'description': 'Жители двух поселков Днепропетровской области не простили радикалам угрозу лишения плодородных земель и пошли в лобовую. ',
'upload_date': '20150402',
'uploader': 'embed.life.ru',
}
}, {
'url': 'http://lifenews.ru/news/153461',
'md5': '9b6ef8bc0ffa25aebc8bdb40d89ab795',
'info_dict': {
'id': '153461',
'ext': 'mp4',
'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве',
'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
'upload_date': '20150505',
'uploader': 'embed.life.ru',
}
}, {
'url': 'http://lifenews.ru/video/13035',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
section = mobj.group('section')
webpage = self._download_webpage('http://lifenews.ru/news/%s' % video_id, video_id, 'Downloading page')
webpage = self._download_webpage(
'http://lifenews.ru/%s/%s' % (section, video_id),
video_id, 'Downloading page')
videos = re.findall(r'<video.*?poster="(?P<poster>[^"]+)".*?src="(?P<video>[^"]+)".*?></video>', webpage)
if not videos:
iframe_link = self._html_search_regex(
'<iframe[^>]+src=["\']([^"\']+)["\']', webpage, 'iframe link', default=None)
if not videos and not iframe_link:
raise ExtractorError('No media links available for %s' % video_id)
title = self._og_search_title(webpage)
@@ -47,28 +80,90 @@ class LifeNewsIE(InfoExtractor):
description = self._og_search_description(webpage)
view_count = self._html_search_regex(
r'<div class=\'views\'>(\d+)</div>', webpage, 'view count', fatal=False)
r'<div class=\'views\'>\s*(\d+)\s*</div>', webpage, 'view count', fatal=False)
comment_count = self._html_search_regex(
r'<div class=\'comments\'>\s*<span class=\'counter\'>(\d+)</span>', webpage, 'comment count', fatal=False)
r'<div class=\'comments\'>\s*<span class=\'counter\'>\s*(\d+)\s*</span>', webpage, 'comment count', fatal=False)
upload_date = self._html_search_regex(
r'<time datetime=\'([^\']+)\'>', webpage, 'upload date', fatal=False)
if upload_date is not None:
upload_date = unified_strdate(upload_date)
common_info = {
'description': description,
'view_count': int_or_none(view_count),
'comment_count': int_or_none(comment_count),
'upload_date': upload_date,
}
def make_entry(video_id, media, video_number=None):
return {
cur_info = dict(common_info)
cur_info.update({
'id': video_id,
'url': media[1],
'thumbnail': media[0],
'title': title if video_number is None else '%s-video%s' % (title, video_number),
'description': description,
'view_count': int_or_none(view_count),
'comment_count': int_or_none(comment_count),
'upload_date': upload_date,
}
})
return cur_info
if iframe_link:
iframe_link = self._proto_relative_url(iframe_link, 'http:')
cur_info = dict(common_info)
cur_info.update({
'_type': 'url_transparent',
'id': video_id,
'title': title,
'url': iframe_link,
})
return cur_info
if len(videos) == 1:
return make_entry(video_id, videos[0])
else:
return [make_entry(video_id, media, video_number + 1) for video_number, media in enumerate(videos)]
class LifeEmbedIE(InfoExtractor):
IE_NAME = 'life:embed'
_VALID_URL = r'http://embed\.life\.ru/embed/(?P<id>[\da-f]{32})'
_TEST = {
'url': 'http://embed.life.ru/embed/e50c2dec2867350528e2574c899b8291',
'md5': 'b889715c9e49cb1981281d0e5458fbbe',
'info_dict': {
'id': 'e50c2dec2867350528e2574c899b8291',
'ext': 'mp4',
'title': 'e50c2dec2867350528e2574c899b8291',
'thumbnail': 're:http://.*\.jpg',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
formats = []
for video_url in re.findall(r'"file"\s*:\s*"([^"]+)', webpage):
video_url = compat_urlparse.urljoin(url, video_url)
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id='m3u8'))
else:
formats.append({
'url': video_url,
'format_id': ext,
'preference': 1,
})
self._sort_formats(formats)
thumbnail = self._search_regex(
r'"image"\s*:\s*"([^"]+)', webpage, 'thumbnail', default=None)
return {
'id': video_id,
'title': video_id,
'thumbnail': thumbnail,
'formats': formats,
}

@@ -194,23 +194,19 @@ class LivestreamIE(InfoExtractor):
# The original version of Livestream uses a different system
class LivestreamOriginalIE(InfoExtractor):
IE_NAME = 'livestream:original'
_VALID_URL = r'''(?x)https?://www\.livestream\.com/
_VALID_URL = r'''(?x)https?://original\.livestream\.com/
(?P<user>[^/]+)/(?P<type>video|folder)
(?:\?.*?Id=|/)(?P<id>.*?)(&|$)
'''
_TESTS = [{
'url': 'http://www.livestream.com/dealbook/video?clipId=pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
'url': 'http://original.livestream.com/dealbook/video?clipId=pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
'info_dict': {
'id': 'pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
'ext': 'flv',
'ext': 'mp4',
'title': 'Spark 1 (BitCoin) with Cameron Winklevoss & Tyler Winklevoss of Winklevoss Capital',
},
'params': {
# rtmp
'skip_download': True,
},
}, {
'url': 'https://www.livestream.com/newplay/folder?dirId=a07bf706-d0e4-4e75-a747-b021d84f2fd3',
'url': 'https://original.livestream.com/newplay/folder?dirId=a07bf706-d0e4-4e75-a747-b021d84f2fd3',
'info_dict': {
'id': 'a07bf706-d0e4-4e75-a747-b021d84f2fd3',
},
@@ -221,19 +217,17 @@ class LivestreamOriginalIE(InfoExtractor):
api_url = 'http://x{0}x.api.channel.livestream.com/2.0/clipdetails?extendedInfo=true&id={1}'.format(user, video_id)
info = self._download_xml(api_url, video_id)
# this url is used on mobile devices
stream_url = 'http://x{0}x.api.channel.livestream.com/3.0/getstream.json?id={1}'.format(user, video_id)
stream_info = self._download_json(stream_url, video_id)
item = info.find('channel').find('item')
ns = {'media': 'http://search.yahoo.com/mrss'}
thumbnail_url = item.find(xpath_with_ns('media:thumbnail', ns)).attrib['url']
# Remove the extension and number from the path (like 1.jpg)
path = self._search_regex(r'(user-files/.+)_.*?\.jpg$', thumbnail_url, 'path')
return {
'id': video_id,
'title': item.find('title').text,
'url': 'rtmp://extondemand.livestream.com/ondemand',
'play_path': 'trans/dv15/mogulus-{0}'.format(path),
'player_url': 'http://static.livestream.com/chromelessPlayer/v21/playerapi.swf?hash=5uetk&v=0803&classid=D27CDB6E-AE6D-11cf-96B8-444553540000&jsEnabled=false&wmode=opaque',
'ext': 'flv',
'url': stream_info['progressiveUrl'],
'thumbnail': thumbnail_url,
}

@@ -20,7 +20,6 @@ class MiTeleIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
'md5': '6a75fe9d0d3275bead0cb683c616fddb',
'info_dict': {
'id': '0fce117d',
'ext': 'mp4',
@@ -29,6 +28,10 @@ class MiTeleIE(InfoExtractor):
'display_id': 'programa-144',
'duration': 2913,
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
@@ -56,12 +59,14 @@ class MiTeleIE(InfoExtractor):
episode,
transform_source=strip_jsonp
)
formats = self._extract_m3u8_formats(
token_info['tokenizedUrl'], episode, ext='mp4')
return {
'id': embed_data['videoId'],
'display_id': episode,
'title': info_el.find('title').text,
'url': token_info['tokenizedUrl'],
'formats': formats,
'description': get_element_by_attribute('class', 'text', webpage),
'thumbnail': info_el.find('thumb').text,
'duration': parse_duration(info_el.find('duration').text),

@@ -10,7 +10,21 @@ from ..utils import (
class MLBIE(InfoExtractor):
_VALID_URL = r'https?://m(?:lb)?\.(?:[\da-z_-]+\.)?mlb\.com/(?:(?:.*?/)?video/(?:topic/[\da-z_-]+/)?v|(?:shared/video/embed/embed\.html|[^/]+/video/play\.jsp)\?.*?\bcontent_id=)(?P<id>n?\d+)'
_VALID_URL = r'''(?x)
https?://
(?:[\da-z_-]+\.)*mlb\.com/
(?:
(?:
(?:.*?/)?video/(?:topic/[\da-z_-]+/)?v|
(?:
shared/video/embed/(?:embed|m-internal-embed)\.html|
(?:[^/]+/)+(?:play|index)\.jsp|
)\?.*?\bcontent_id=
)
(?P<id>n?\d+)|
(?:[^/]+/)*(?P<path>[^/]+)
)
'''
_TESTS = [
{
'url': 'http://m.mlb.com/sea/video/topic/51231442/v34698933/nymsea-ackley-robs-a-home-run-with-an-amazing-catch/?c_id=sea',
@@ -68,6 +82,18 @@ class MLBIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
},
},
{
'url': 'http://m.mlb.com/news/article/118550098/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer',
'md5': 'b190e70141fb9a1552a85426b4da1b5d',
'info_dict': {
'id': '75609783',
'ext': 'mp4',
'title': 'Must C: Pillar climbs for catch',
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
'timestamp': 1429124820,
'upload_date': '20150415',
}
},
{
'url': 'http://m.mlb.com/shared/video/embed/embed.html?content_id=35692085&topic_id=6479266&width=400&height=224&property=mlb',
'only_matching': True,
@@ -83,6 +109,15 @@ class MLBIE(InfoExtractor):
{
'url': 'http://m.cardinals.mlb.com/stl/video/v51175783/atlstl-piscotty-makes-great-sliding-catch-on-line/?partnerId=as_mlb_20150321_42500876&adbid=579409712979910656&adbpl=tw&adbpr=52847728',
'only_matching': True,
},
{
# From http://m.mlb.com/news/article/118550098/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer
'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb',
'only_matching': True,
},
{
'url': 'http://washington.nationals.mlb.com/mlb/gameday/index.jsp?c_id=was&gid=2015_05_09_atlmlb_wasmlb_1&lang=en&content_id=108309983&mode=video#',
'only_matching': True,
}
]
@@ -90,6 +125,12 @@ class MLBIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if not video_id:
video_path = mobj.group('path')
webpage = self._download_webpage(url, video_path)
video_id = self._search_regex(
[r'data-video-?id="(\d+)"', r'content_id=(\d+)'], webpage, 'video id')
detail = self._download_xml(
'http://m.mlb.com/gen/multimedia/detail/%s/%s/%s/%s.xml'
% (video_id[-3], video_id[-2], video_id[-1], video_id), video_id)
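
The widened `_VALID_URL` above has two outcomes: either it captures `id` directly, or it captures only `path`, and the extractor then has to download the page to find the id. A minimal sketch of that split using only the standard `re` module; the `classify` helper is hypothetical and not part of the extractor, and the pattern is copied from the hunk above:

```python
import re

# Pattern copied from the new _VALID_URL in the hunk above.
VALID_URL = r'''(?x)
    https?://
        (?:[\da-z_-]+\.)*mlb\.com/
        (?:
            (?:
                (?:.*?/)?video/(?:topic/[\da-z_-]+/)?v|
                (?:
                    shared/video/embed/(?:embed|m-internal-embed)\.html|
                    (?:[^/]+/)+(?:play|index)\.jsp|
                )\?.*?\bcontent_id=
            )
            (?P<id>n?\d+)|
            (?:[^/]+/)*(?P<path>[^/]+)
        )
    '''


def classify(url):
    """Hypothetical helper: report which named group the URL matched."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    if mobj.group('id'):
        # Direct video URL: the id is available without a page download.
        return ('id', mobj.group('id'))
    # Article-style URL: only the last path segment is known so far.
    return ('path', mobj.group('path'))


embed = classify(
    'http://m.mlb.com/shared/video/embed/embed.html'
    '?content_id=35692085&topic_id=6479266')
article = classify(
    'http://m.mlb.com/news/article/118550098/'
    'blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer')
```

Both URLs come from the test list in this diff. The first takes the `id` branch, the second falls through to `path`.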

@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
@@ -16,7 +17,7 @@ from ..utils import (
class NaverIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tvcast\.naver\.com/v/(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://tvcast.naver.com/v/81652',
'info_dict': {
'id': '81652',
@@ -25,7 +26,18 @@ class NaverIE(InfoExtractor):
'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
'upload_date': '20130903',
},
}
}, {
'url': 'http://tvcast.naver.com/v/395837',
'md5': '638ed4c12012c458fefcddfd01f173cd',
'info_dict': {
'id': '395837',
'ext': 'mp4',
'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
'description': 'md5:5bf200dcbf4b66eb1b350d1eb9c753f7',
'upload_date': '20150519',
},
'skip': 'Georestricted',
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -35,7 +47,7 @@ class NaverIE(InfoExtractor):
webpage)
if m_id is None:
m_error = re.search(
r'(?s)<div class="nation_error">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
r'(?s)<div class="(?:nation_error|nation_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
webpage)
if m_error:
raise ExtractorError(clean_html(m_error.group('msg')), expected=True)
@@ -58,14 +70,18 @@ class NaverIE(InfoExtractor):
formats = []
for format_el in urls.findall('EncodingOptions/EncodingOption'):
domain = format_el.find('Domain').text
uri = format_el.find('uri').text
f = {
'url': domain + format_el.find('uri').text,
'url': compat_urlparse.urljoin(domain, uri),
'ext': 'mp4',
'width': int(format_el.find('width').text),
'height': int(format_el.find('height').text),
}
if domain.startswith('rtmp'):
# urlparse does not support custom schemes
# https://bugs.python.org/issue18828
f.update({
'url': domain + uri,
'ext': 'flv',
'rtmp_protocol': '1', # rtmpt
})
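
The RTMP special case above exists because `urljoin` does not resolve relative references for schemes it does not know. A minimal sketch using Python 3's standard `urllib.parse` (the diff itself goes through `compat_urlparse`); the `build_format_url` helper and the example hosts are assumptions made for illustration:

```python
from urllib.parse import urljoin


def build_format_url(domain, uri):
    """Join a base and a URI, concatenating for RTMP bases."""
    if domain.startswith('rtmp'):
        # 'rtmp' is not among the schemes urljoin resolves relative
        # references for, so urljoin would hand back `uri` unchanged.
        return domain + uri
    return urljoin(domain, uri)


# Hypothetical values, not taken from the Naver API.
http_url = build_format_url('http://cdn.example.com/videos/', 'clip.mp4')
rtmp_url = build_format_url('rtmp://cdn.example.com/app/', 'clip.mp4')

# What the plain urljoin call would give for the RTMP base.
naive_rtmp = urljoin('rtmp://cdn.example.com/app/', 'clip.mp4')
```

Here `naive_rtmp` is just the relative part with the base dropped, which is why the RTMP branch overrides `'url'` with a plain concatenation.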

@@ -22,6 +22,18 @@ class NBAIE(InfoExtractor):
}, {
'url': 'http://www.nba.com/video/games/hornets/2014/12/05/0021400276-nyk-cha-play5.nba/',
'only_matching': True,
}, {
'url': 'http://watch.nba.com/nba/video/channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba',
'info_dict': {
'id': '0041400301-cle-atl-recap.nba',
'ext': 'mp4',
'title': 'NBA GAME TIME | Video: Hawks vs. Cavaliers Game 1',
'description': 'md5:8094c3498d35a9bd6b1a8c396a071b4d',
'duration': 228,
},
'params': {
'skip_download': True,
}
}]
def _real_extract(self, url):
@@ -35,8 +47,12 @@ class NBAIE(InfoExtractor):
self._og_search_title(webpage, default=shortened_video_id), ' : NBA.com')
description = self._og_search_description(webpage)
duration = parse_duration(
self._html_search_meta('duration', webpage, 'duration'))
duration_str = self._html_search_meta(
'duration', webpage, 'duration', default=None)
if not duration_str:
duration_str = self._html_search_regex(
r'Duration:</b>\s*(\d+:\d+)', webpage, 'duration', fatal=False)
duration = parse_duration(duration_str)
return {
'id': shortened_video_id,

@@ -10,6 +10,8 @@ from ..compat import (
from ..utils import (
ExtractorError,
find_xpath_attr,
lowercase_escape,
unescapeHTML,
)
@@ -37,14 +39,32 @@ class NBCIE(InfoExtractor):
},
'skip': 'Only works from US',
},
{
'url': 'http://www.nbc.com/saturday-night-live/video/star-wars-teaser/2832821',
'info_dict': {
'id': '8iUuyzWDdYUZ',
'ext': 'flv',
'title': 'Star Wars Teaser',
'description': 'md5:0b40f9cbde5b671a7ff62fceccc4f442',
},
'skip': 'Only works from US',
},
{
# This video has expired, but its page contains an escaped embedURL
'url': 'http://www.nbc.com/parenthood/episode-guide/season-5/just-like-at-home/515',
'skip': 'Expired'
}
]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
theplatform_url = self._search_regex(
'(?:class="video-player video-player-full" data-mpx-url|class="player" src)="(.*?)"',
webpage, 'theplatform url').replace('_no_endcard', '')
theplatform_url = unescapeHTML(lowercase_escape(self._html_search_regex(
[
r'(?:class="video-player video-player-full" data-mpx-url|class="player" src)="(.*?)"',
r'"embedURL"\s*:\s*"([^"]+)"'
],
webpage, 'theplatform url').replace('_no_endcard', '').replace('\\/', '/')))
if theplatform_url.startswith('//'):
theplatform_url = 'http:' + theplatform_url
return self.url_result(theplatform_url)
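
The new extraction path has to undo several layers of escaping before the embed URL is usable. A rough standard-library sketch of the same idea; `decode_embed_url` is a hypothetical stand-in for the `lowercase_escape` and `unescapeHTML` calls in the diff (it approximates them, it does not reproduce them), and the input string is invented:

```python
import html
import re


def decode_embed_url(raw):
    """Undo JSON-style and HTML escaping on an embedded player URL."""
    # Unescape "\/" as it appears inside JSON-encoded strings.
    url = raw.replace('\\/', '/')
    # Decode \uXXXX escapes such as \u0026 ('&').
    url = re.sub(
        r'\\u([0-9a-fA-F]{4})',
        lambda m: chr(int(m.group(1), 16)),
        url)
    # Resolve HTML entities such as &amp;.
    url = html.unescape(url)
    # Give protocol-relative URLs an explicit scheme.
    if url.startswith('//'):
        url = 'http:' + url
    return url


# Hypothetical input; not a real player URL.
decoded = decode_embed_url(
    '\\/\\/player.example.com\\/p\\/abc?x=1\\u0026y=2&amp;z=3')
```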

@@ -8,41 +8,11 @@ from ..utils import (
ExtractorError,
int_or_none,
qualities,
parse_duration,
)
class NDRIE(InfoExtractor):
IE_NAME = 'ndr'
IE_DESC = 'NDR.de - Mediathek'
_VALID_URL = r'https?://www\.ndr\.de/.+?(?P<id>\d+)\.html'
_TESTS = [
{
'url': 'http://www.ndr.de/fernsehen/sendungen/nordmagazin/Kartoffeltage-in-der-Lewitz,nordmagazin25866.html',
'md5': '5bc5f5b92c82c0f8b26cddca34f8bb2c',
'note': 'Video file',
'info_dict': {
'id': '25866',
'ext': 'mp4',
'title': 'Kartoffeltage in der Lewitz',
'description': 'md5:48c4c04dde604c8a9971b3d4e3b9eaa8',
'duration': 166,
}
},
{
'url': 'http://www.ndr.de/info/audio51535.html',
'md5': 'bb3cd38e24fbcc866d13b50ca59307b8',
'note': 'Audio file',
'info_dict': {
'id': '51535',
'ext': 'mp3',
'title': 'La Valette entgeht der Hinrichtung',
'description': 'md5:22f9541913a40fe50091d5cdd7c9f536',
'duration': 884,
}
}
]
class NDRBaseIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
@@ -54,7 +24,11 @@ class NDRIE(InfoExtractor):
if description:
description = description.strip()
duration = int_or_none(self._html_search_regex(r'duration: (\d+),\n', page, 'duration', fatal=False))
duration = int_or_none(self._html_search_regex(r'duration: (\d+),\n', page, 'duration', default=None))
if not duration:
duration = parse_duration(self._html_search_regex(
r'(<span class="min">\d+</span>:<span class="sec">\d+</span>)',
page, 'duration', default=None))
formats = []
@@ -92,3 +66,65 @@ class NDRIE(InfoExtractor):
'duration': duration,
'formats': formats,
}
class NDRIE(NDRBaseIE):
IE_NAME = 'ndr'
IE_DESC = 'NDR.de - Mediathek'
_VALID_URL = r'https?://www\.ndr\.de/.+?(?P<id>\d+)\.html'
_TESTS = [
{
'url': 'http://www.ndr.de/fernsehen/sendungen/nordmagazin/Kartoffeltage-in-der-Lewitz,nordmagazin25866.html',
'md5': '5bc5f5b92c82c0f8b26cddca34f8bb2c',
'note': 'Video file',
'info_dict': {
'id': '25866',
'ext': 'mp4',
'title': 'Kartoffeltage in der Lewitz',
'description': 'md5:48c4c04dde604c8a9971b3d4e3b9eaa8',
'duration': 166,
},
'skip': '404 Not found',
},
{
'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
'md5': 'dadc003c55ae12a5d2f6bd436cd73f59',
'info_dict': {
'id': '988',
'ext': 'mp4',
'title': 'Party, Pötte und Parade',
'description': 'Hunderttausende feiern zwischen Speicherstadt und St. Pauli den 826. Hafengeburtstag. Die NDR Sondersendung zeigt die schönsten und spektakulärsten Bilder vom Auftakt.',
'duration': 3498,
},
},
{
'url': 'http://www.ndr.de/info/audio51535.html',
'md5': 'bb3cd38e24fbcc866d13b50ca59307b8',
'note': 'Audio file',
'info_dict': {
'id': '51535',
'ext': 'mp3',
'title': 'La Valette entgeht der Hinrichtung',
'description': 'md5:22f9541913a40fe50091d5cdd7c9f536',
'duration': 884,
}
}
]
class NJoyIE(NDRBaseIE):
IE_NAME = 'N-JOY'
_VALID_URL = r'https?://www\.n-joy\.de/.+?(?P<id>\d+)\.html'
_TEST = {
'url': 'http://www.n-joy.de/entertainment/comedy/comedy_contest/Benaissa-beim-NDR-Comedy-Contest,comedycontest2480.html',
'md5': 'cb63be60cd6f9dd75218803146d8dc67',
'info_dict': {
'id': '2480',
'ext': 'mp4',
'title': 'Benaissa beim NDR Comedy Contest',
'description': 'Von seinem sehr "behaarten" Leben lässt sich Benaissa trotz aller Schwierigkeiten nicht unterkriegen.',
'duration': 654,
}
}

@@ -49,7 +49,7 @@ class NetzkinoIE(InfoExtractor):
'http://www.netzkino.de/beta/dist/production.min.js', video_id,
note='Downloading player code')
avo_js = self._search_regex(
r'window\.avoCore\s*=.*?urlTemplate:\s*(\{.*?"\})',
r'var urlTemplate=(\{.*?"\})',
production_js, 'URL templates')
templates = self._parse_json(
avo_js, video_id, transform_source=js_to_json)

@@ -89,8 +89,8 @@ class NextMediaActionNewsIE(NextMediaIE):
return self._extract_from_nextmedia_page(news_id, url, article_page)
class AppleDailyRealtimeNewsIE(NextMediaIE):
_VALID_URL = r'http://(www|ent).appledaily.com.tw/(realtimenews|enews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
class AppleDailyIE(NextMediaIE):
_VALID_URL = r'http://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_TESTS = [{
'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -99,7 +99,7 @@ class AppleDailyRealtimeNewsIE(NextMediaIE):
'ext': 'mp4',
'title': '周亭羽走過摩鐵陰霾2男陪吃 九把刀孤寒看醫生',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:b23787119933404ce515c6356a8c355c',
'description': 'md5:2acd430e59956dc47cd7f67cb3c003f4',
'upload_date': '20150128',
}
}, {
@@ -110,26 +110,10 @@ class AppleDailyRealtimeNewsIE(NextMediaIE):
'ext': 'mp4',
'title': '不滿被踩腳 山東兩大媽一路打下車',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:2648aaf6fc4f401f6de35a91d111aa1d',
'description': 'md5:175b4260c1d7c085993474217e4ab1b4',
'upload_date': '20150128',
}
}]
_URL_PATTERN = r'\{url: \'(.+)\'\}'
def _fetch_title(self, page):
return self._html_search_regex(r'<h1 id="h1">([^<>]+)</h1>', page, 'news title')
def _fetch_thumbnail(self, page):
return self._html_search_regex(r"setInitialImage\(\'([^']+)'\)", page, 'video thumbnail', fatal=False)
def _fetch_timestamp(self, page):
return None
class AppleDailyAnimationNewsIE(AppleDailyRealtimeNewsIE):
_VALID_URL = 'http://www.appledaily.com.tw/animation/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_TESTS = [{
}, {
'url': 'http://www.appledaily.com.tw/animation/realtimenews/new/20150128/5003671',
'md5': '03df296d95dedc2d5886debbb80cb43f',
'info_dict': {
@@ -154,10 +138,22 @@ class AppleDailyAnimationNewsIE(AppleDailyRealtimeNewsIE):
'expected_warnings': [
'video thumbnail',
]
}, {
'url': 'http://www.appledaily.com.tw/appledaily/article/supplement/20140417/35770334/',
'only_matching': True,
}]
_URL_PATTERN = r'\{url: \'(.+)\'\}'
def _fetch_title(self, page):
return self._html_search_meta('description', page, 'news title')
return (self._html_search_regex(r'<h1 id="h1">([^<>]+)</h1>', page, 'news title', default=None) or
self._html_search_meta('description', page, 'news title'))
def _fetch_thumbnail(self, page):
return self._html_search_regex(r"setInitialImage\(\'([^']+)'\)", page, 'video thumbnail', fatal=False)
def _fetch_timestamp(self, page):
return None
def _fetch_description(self, page):
return self._html_search_meta('description', page, 'news description')

@@ -21,6 +21,9 @@ class NHLBaseInfoExtractor(InfoExtractor):
return json_string.replace('\\\'', '\'')
def _real_extract_video(self, video_id):
vid_parts = video_id.split(',')
if len(vid_parts) == 3:
video_id = '%s0%s%s-X-h' % (vid_parts[0][:4], vid_parts[1], vid_parts[2].rjust(4, '0'))
json_url = 'http://video.nhl.com/videocenter/servlets/playlist?ids=%s&format=json' % video_id
data = self._download_json(
json_url, video_id, transform_source=self._fix_json)
@@ -47,7 +50,7 @@ class NHLBaseInfoExtractor(InfoExtractor):
video_url = initial_video_url
join = compat_urlparse.urljoin
return {
ret = {
'id': video_id,
'title': info['name'],
'url': video_url,
@@ -56,11 +59,20 @@ class NHLBaseInfoExtractor(InfoExtractor):
'thumbnail': join(join(video_url, '/u/'), info['bigImage']),
'upload_date': unified_strdate(info['releaseDate'].split('.')[0]),
}
if video_url.startswith('rtmp:'):
mobj = re.match(r'(?P<tc_url>rtmp://[^/]+/(?P<app>[a-z0-9/]+))/(?P<play_path>mp4:.*)', video_url)
ret.update({
'tc_url': mobj.group('tc_url'),
'play_path': mobj.group('play_path'),
'app': mobj.group('app'),
'no_resume': True,
})
return ret
class NHLIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console)?(?:\?(?:.*?[?&])?)id=(?P<id>[-0-9a-zA-Z]+)'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console)?(?:\?(?:.*?[?&])?)(?:id|hlg)=(?P<id>[-0-9a-zA-Z,]+)'
_TESTS = [{
'url': 'http://video.canucks.nhl.com/videocenter/console?catid=6?id=453614',
@@ -101,6 +113,29 @@ class NHLIE(NHLBaseInfoExtractor):
}, {
'url': 'http://video.nhl.com/videocenter/?id=736722',
'only_matching': True,
}, {
'url': 'http://video.nhl.com/videocenter/console?hlg=20142015,2,299&lang=en',
'md5': '076fcb88c255154aacbf0a7accc3f340',
'info_dict': {
'id': '2014020299-X-h',
'ext': 'mp4',
'title': 'Penguins at Islanders / Game Highlights',
'description': 'Home broadcast - Pittsburgh Penguins at New York Islanders - November 22, 2014',
'duration': 268,
'upload_date': '20141122',
}
}, {
'url': 'http://video.oilers.nhl.com/videocenter/console?id=691469&catid=4',
'info_dict': {
'id': '691469',
'ext': 'mp4',
'title': 'RAW | Craig MacTavish Full Press Conference',
'description': 'Oilers GM Craig MacTavish addresses the media at Rexall Place on Friday.',
'upload_date': '20141205',
},
'params': {
'skip_download': True, # Requires rtmpdump
}
}]
def _real_extract(self, url):
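
Two string transformations in this file are easy to check in isolation: turning an `hlg=season,type,game` value into a playlist id, and splitting an RTMP URL into the fields the downloader needs. A minimal sketch; the function names are hypothetical, the format string and the regex are copied from the hunks above, and the RTMP URL is invented:

```python
import re


def hlg_to_video_id(hlg):
    """Turn 'season,game_type,game_number' into a playlist id."""
    parts = hlg.split(',')
    if len(parts) != 3:
        # Not an hlg triple; treat the value as a plain video id.
        return hlg
    season, game_type, game_number = parts
    return '%s0%s%s-X-h' % (season[:4], game_type, game_number.rjust(4, '0'))


def split_rtmp_url(video_url):
    """Split an RTMP URL into tc_url, app and play_path."""
    mobj = re.match(
        r'(?P<tc_url>rtmp://[^/]+/(?P<app>[a-z0-9/]+))/(?P<play_path>mp4:.*)',
        video_url)
    if mobj is None:
        return None
    return {
        'tc_url': mobj.group('tc_url'),
        'app': mobj.group('app'),
        'play_path': mobj.group('play_path'),
    }


video_id = hlg_to_video_id('20142015,2,299')
# Hypothetical URL, shaped like the pattern expects.
rtmp = split_rtmp_url('rtmp://cdn.example.com/ondemand/vod/mp4:nhl/2014/clip.mp4')
```

The `hlg` value is the one from the new test case, and the result matches the `id` that test expects.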

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re
import json
import datetime
from .common import InfoExtractor
from ..compat import (
@@ -14,7 +15,9 @@ from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
unified_strdate,
parse_iso8601,
xpath_text,
determine_ext,
)
@@ -32,30 +35,50 @@ class NiconicoIE(InfoExtractor):
'uploader': 'takuya0301',
'uploader_id': '2698420',
'upload_date': '20131123',
'timestamp': 1385182762,
'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
'duration': 33,
},
'params': {
'username': 'ydl.niconico@gmail.com',
'password': 'youtube-dl',
},
}, {
# Files downloaded with and without credentials differ, so omit
# the md5 field
'url': 'http://www.nicovideo.jp/watch/nm14296458',
'md5': '8db08e0158457cf852a31519fceea5bc',
'info_dict': {
'id': 'nm14296458',
'ext': 'swf',
'title': '【鏡音リン】Dance on media【オリジナル】take2!',
'description': 'md5:',
'description': 'md5:689f066d74610b3b22e0f1739add0f58',
'uploader': 'りょうた',
'uploader_id': '18822557',
'upload_date': '20110429',
'timestamp': 1304065916,
'duration': 209,
},
'params': {
'username': 'ydl.niconico@gmail.com',
'password': 'youtube-dl',
}, {
# video exists but is marked as "deleted"
# md5 is unstable
'url': 'http://www.nicovideo.jp/watch/sm10000',
'info_dict': {
'id': 'sm10000',
'ext': 'unknown_video',
'description': 'deleted',
'title': 'ドラえもんエターナル第3話「決戦第3新東京市」前編',
'upload_date': '20071224',
'timestamp': 1198527840, # timestamp field has different value if logged in
'duration': 304,
},
}, {
'url': 'http://www.nicovideo.jp/watch/so22543406',
'info_dict': {
'id': '1388129933',
'ext': 'mp4',
'title': '【第1回】RADIOアニメロミックス ラブライブのぞえりRadio Garden',
'description': 'md5:b27d224bb0ff53d3c8269e9f8b561cf1',
'timestamp': 1388851200,
'upload_date': '20140104',
'uploader': 'アニメロチャンネル',
'uploader_id': '312',
}
}]
_VALID_URL = r'https?://(?:www\.|secure\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
@@ -95,9 +118,13 @@ class NiconicoIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
# Get video webpage. We are not actually interested in it, but need
# the cookies in order to be able to download the info webpage
self._download_webpage('http://www.nicovideo.jp/watch/' + video_id, video_id)
# Get video webpage. We are not actually interested in it for normal
# cases, but need the cookies in order to be able to download the
# info webpage
webpage, handle = self._download_webpage_handle(
'http://www.nicovideo.jp/watch/' + video_id, video_id)
if video_id.startswith('so'):
video_id = self._match_id(handle.geturl())
video_info = self._download_xml(
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id, video_id,
@@ -127,22 +154,78 @@ class NiconicoIE(InfoExtractor):
flv_info_request, video_id,
note='Downloading flv info', errnote='Unable to download flv info')
if 'deleted=' in flv_info_webpage:
raise ExtractorError('The video has been deleted.',
expected=True)
video_real_url = compat_urlparse.parse_qs(flv_info_webpage)['url'][0]
flv_info = compat_urlparse.parse_qs(flv_info_webpage)
if 'url' not in flv_info:
if 'deleted' in flv_info:
raise ExtractorError('The video has been deleted.',
expected=True)
else:
raise ExtractorError('Unable to find video URL')
video_real_url = flv_info['url'][0]
# Start extracting information
title = video_info.find('.//title').text
extension = video_info.find('.//movie_type').text
title = xpath_text(video_info, './/title')
if not title:
title = self._og_search_title(webpage, default=None)
if not title:
title = self._html_search_regex(
r'<span[^>]+class="videoHeaderTitle"[^>]*>([^<]+)</span>',
webpage, 'video title')
watch_api_data_string = self._html_search_regex(
r'<div[^>]+id="watchAPIDataContainer"[^>]+>([^<]+)</div>',
webpage, 'watch api data', default=None)
watch_api_data = self._parse_json(watch_api_data_string, video_id) if watch_api_data_string else {}
video_detail = watch_api_data.get('videoDetail', {})
extension = xpath_text(video_info, './/movie_type')
if not extension:
extension = determine_ext(video_real_url)
video_format = extension.upper()
thumbnail = video_info.find('.//thumbnail_url').text
description = video_info.find('.//description').text
upload_date = unified_strdate(video_info.find('.//first_retrieve').text.split('+')[0])
view_count = int_or_none(video_info.find('.//view_counter').text)
comment_count = int_or_none(video_info.find('.//comment_num').text)
duration = parse_duration(video_info.find('.//length').text)
webpage_url = video_info.find('.//watch_url').text
thumbnail = (
xpath_text(video_info, './/thumbnail_url') or
self._html_search_meta('image', webpage, 'thumbnail', default=None) or
video_detail.get('thumbnail'))
description = xpath_text(video_info, './/description')
timestamp = parse_iso8601(xpath_text(video_info, './/first_retrieve'))
if not timestamp:
match = self._html_search_meta('datePublished', webpage, 'date published', default=None)
if match:
timestamp = parse_iso8601(match.replace('+', ':00+'))
if not timestamp and video_detail.get('postedAt'):
timestamp = parse_iso8601(
video_detail['postedAt'].replace('/', '-'),
delimiter=' ', timezone=datetime.timedelta(hours=9))
view_count = int_or_none(xpath_text(video_info, './/view_counter'))
if not view_count:
match = self._html_search_regex(
r'>Views: <strong[^>]*>([^<]+)</strong>',
webpage, 'view count', default=None)
if match:
view_count = int_or_none(match.replace(',', ''))
view_count = view_count or video_detail.get('viewCount')
comment_count = int_or_none(xpath_text(video_info, './/comment_num'))
if not comment_count:
match = self._html_search_regex(
r'>Comments: <strong[^>]*>([^<]+)</strong>',
webpage, 'comment count', default=None)
if match:
comment_count = int_or_none(match.replace(',', ''))
comment_count = comment_count or video_detail.get('commentCount')
duration = (parse_duration(
xpath_text(video_info, './/length') or
self._html_search_meta(
'video:duration', webpage, 'video duration', default=None)) or
video_detail.get('length'))
webpage_url = xpath_text(video_info, './/watch_url') or url
if video_info.find('.//ch_id') is not None:
uploader_id = video_info.find('.//ch_id').text
@@ -162,7 +245,7 @@ class NiconicoIE(InfoExtractor):
'thumbnail': thumbnail,
'description': description,
'uploader': uploader,
'upload_date': upload_date,
'timestamp': timestamp,
'uploader_id': uploader_id,
'view_count': view_count,
'comment_count': comment_count,
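
The last timestamp fallback above reads `postedAt` as local Japanese time (UTC+9). A small standard-library sketch of that conversion; the `posted_at_to_timestamp` helper, the assumed `YYYY/MM/DD HH:MM:SS` input format and the sample value are illustrative assumptions, whereas the diff itself relies on `parse_iso8601` with a `timezone` argument:

```python
import calendar
import datetime


def posted_at_to_timestamp(posted_at, utc_offset_hours=9):
    """Convert a 'YYYY/MM/DD HH:MM:SS' local time to a Unix timestamp."""
    local = datetime.datetime.strptime(posted_at, '%Y/%m/%d %H:%M:%S')
    # Shift from local time (JST by default) back to UTC.
    utc = local - datetime.timedelta(hours=utc_offset_hours)
    # timegm interprets the time tuple as UTC.
    return calendar.timegm(utc.timetuple())


# Hypothetical value: 09:00 JST is midnight UTC on the same day.
ts = posted_at_to_timestamp('2014/01/04 09:00:00')
```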

@@ -14,7 +14,9 @@ from ..compat import (
from ..utils import (
clean_html,
ExtractorError,
unified_strdate,
int_or_none,
float_or_none,
parse_iso8601,
)
@@ -25,21 +27,38 @@ class NocoIE(InfoExtractor):
_SUB_LANG_TEMPLATE = '&sub_lang=%s'
_NETRC_MACHINE = 'noco'
_TEST = {
'url': 'http://noco.tv/emission/11538/nolife/ami-ami-idol-hello-france/',
'md5': '0a993f0058ddbcd902630b2047ef710e',
'info_dict': {
'id': '11538',
'ext': 'mp4',
'title': 'Ami Ami Idol - Hello! France',
'description': 'md5:4eaab46ab68fa4197a317a88a53d3b86',
'upload_date': '20140412',
'uploader': 'Nolife',
'uploader_id': 'NOL',
'duration': 2851.2,
_TESTS = [
{
'url': 'http://noco.tv/emission/11538/nolife/ami-ami-idol-hello-france/',
'md5': '0a993f0058ddbcd902630b2047ef710e',
'info_dict': {
'id': '11538',
'ext': 'mp4',
'title': 'Ami Ami Idol - Hello! France',
'description': 'md5:4eaab46ab68fa4197a317a88a53d3b86',
'upload_date': '20140412',
'uploader': 'Nolife',
'uploader_id': 'NOL',
'duration': 2851.2,
},
'skip': 'Requires noco account',
},
'skip': 'Requires noco account',
}
{
'url': 'http://noco.tv/emission/12610/lbl42/the-guild/s01e01-wake-up-call',
'md5': 'c190f1f48e313c55838f1f412225934d',
'info_dict': {
'id': '12610',
'ext': 'mp4',
'title': 'The Guild #1 - Wake-Up Call',
'timestamp': 1403863200,
'upload_date': '20140627',
'uploader': 'LBL42',
'uploader_id': 'LBL',
'duration': 233.023,
},
'skip': 'Requires noco account',
}
]
def _real_initialize(self):
self._login()
@@ -90,51 +109,66 @@ class NocoIE(InfoExtractor):
'shows/%s/medias' % video_id,
video_id, 'Downloading video JSON')
show = self._call_api(
'shows/by_id/%s' % video_id,
video_id, 'Downloading show JSON')[0]
options = self._call_api(
'users/init', video_id,
'Downloading user options JSON')['options']
audio_lang_pref = options.get('audio_language') or options.get('language', 'fr')
if audio_lang_pref == 'original':
audio_lang_pref = show['original_lang']
if len(medias) == 1:
audio_lang_pref = list(medias.keys())[0]
elif audio_lang_pref not in medias:
audio_lang_pref = 'fr'
qualities = self._call_api(
'qualities',
video_id, 'Downloading qualities JSON')
formats = []
for lang, lang_dict in medias['fr']['video_list'].items():
for format_id, fmt in lang_dict['quality_list'].items():
format_id_extended = '%s-%s' % (lang, format_id) if lang != 'none' else format_id
for audio_lang, audio_lang_dict in medias.items():
preference = 1 if audio_lang == audio_lang_pref else 0
for sub_lang, lang_dict in audio_lang_dict['video_list'].items():
for format_id, fmt in lang_dict['quality_list'].items():
format_id_extended = 'audio-%s_sub-%s_%s' % (audio_lang, sub_lang, format_id)
video = self._call_api(
'shows/%s/video/%s/fr' % (video_id, format_id.lower()),
video_id, 'Downloading %s video JSON' % format_id_extended,
lang if lang != 'none' else None)
video = self._call_api(
'shows/%s/video/%s/%s' % (video_id, format_id.lower(), audio_lang),
video_id, 'Downloading %s video JSON' % format_id_extended,
sub_lang if sub_lang != 'none' else None)
file_url = video['file']
if not file_url:
continue
file_url = video['file']
if not file_url:
continue
if file_url in ['forbidden', 'not found']:
popmessage = video['popmessage']
self._raise_error(popmessage['title'], popmessage['message'])
if file_url in ['forbidden', 'not found']:
popmessage = video['popmessage']
self._raise_error(popmessage['title'], popmessage['message'])
formats.append({
'url': file_url,
'format_id': format_id_extended,
'width': fmt['res_width'],
'height': fmt['res_lines'],
'abr': fmt['audiobitrate'],
'vbr': fmt['videobitrate'],
'filesize': fmt['filesize'],
'format_note': qualities[format_id]['quality_name'],
'preference': qualities[format_id]['priority'],
})
formats.append({
'url': file_url,
'format_id': format_id_extended,
'width': int_or_none(fmt.get('res_width')),
'height': int_or_none(fmt.get('res_lines')),
'abr': int_or_none(fmt.get('audiobitrate')),
'vbr': int_or_none(fmt.get('videobitrate')),
'filesize': int_or_none(fmt.get('filesize')),
'format_note': qualities[format_id].get('quality_name'),
'quality': qualities[format_id].get('priority'),
'preference': preference,
})
self._sort_formats(formats)
show = self._call_api(
'shows/by_id/%s' % video_id,
video_id, 'Downloading show JSON')[0]
upload_date = unified_strdate(show['online_date_start_utc'])
uploader = show['partner_name']
uploader_id = show['partner_key']
duration = show['duration_ms'] / 1000.0
timestamp = parse_iso8601(show.get('online_date_start_utc'), ' ')
uploader = show.get('partner_name')
uploader_id = show.get('partner_key')
duration = float_or_none(show.get('duration_ms'), 1000)
thumbnails = []
for thumbnail_key, thumbnail_url in show.items():
@@ -166,7 +200,7 @@ class NocoIE(InfoExtractor):
'title': title,
'description': description,
'thumbnails': thumbnails,
'upload_date': upload_date,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
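The audio-language handling in the NocoIE hunk above can be sketched in isolation as follows. This is a minimal reading of the diff, not the extractor itself: `pick_audio_lang` and `format_preference` are hypothetical helper names, and the dictionaries passed in stand in for the API responses (`users/init` options, the show record, and the `medias` mapping).

```python
def pick_audio_lang(options, show, medias):
    """Return the audio language whose formats should be preferred."""
    # User setting first, then the generic language setting, then French.
    pref = options.get('audio_language') or options.get('language', 'fr')
    if pref == 'original':
        pref = show['original_lang']
    if len(medias) == 1:
        # Only one audio track is offered, so it is the preference.
        pref = list(medias.keys())[0]
    elif pref not in medias:
        pref = 'fr'
    return pref


def format_preference(audio_lang, pref):
    """Formats in the preferred audio language sort above the others."""
    return 1 if audio_lang == pref else 0
```

Each format then carries `preference` for the audio language and `quality` for the site's own priority, so sorting considers language before quality.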


@@ -0,0 +1,192 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
parse_duration,
remove_start,
)
class NowTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nowtv\.de/(?P<station>rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<id>.+?)/player'
_TESTS = [{
# rtl
'url': 'http://www.nowtv.de/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit/player',
'info_dict': {
'id': '203519',
'display_id': 'bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
'ext': 'mp4',
'title': 'Die neuen Bauern und eine Hochzeit',
'description': 'md5:e234e1ed6d63cf06be5c070442612e7e',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1432580700,
'upload_date': '20150525',
'duration': 2786,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# rtl2
'url': 'http://www.nowtv.de/rtl2/berlin-tag-nacht/berlin-tag-nacht-folge-934/player',
'info_dict': {
'id': '203481',
'display_id': 'berlin-tag-nacht/berlin-tag-nacht-folge-934',
'ext': 'mp4',
'title': 'Berlin - Tag & Nacht (Folge 934)',
'description': 'md5:c85e88c2e36c552dfe63433bc9506dd0',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1432666800,
'upload_date': '20150526',
'duration': 2641,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# rtlnitro
'url': 'http://www.nowtv.de/rtlnitro/alarm-fuer-cobra-11-die-autobahnpolizei/hals-und-beinbruch-2014-08-23-21-10-00/player',
'info_dict': {
'id': '165780',
'display_id': 'alarm-fuer-cobra-11-die-autobahnpolizei/hals-und-beinbruch-2014-08-23-21-10-00',
'ext': 'mp4',
'title': 'Hals- und Beinbruch',
'description': 'md5:b50d248efffe244e6f56737f0911ca57',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1432415400,
'upload_date': '20150523',
'duration': 2742,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# superrtl
'url': 'http://www.nowtv.de/superrtl/medicopter-117/angst/player',
'info_dict': {
'id': '99205',
'display_id': 'medicopter-117/angst',
'ext': 'mp4',
'title': 'Angst!',
'description': 'md5:30cbc4c0b73ec98bcd73c9f2a8c17c4e',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1222632900,
'upload_date': '20080928',
'duration': 3025,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# ntv
'url': 'http://www.nowtv.de/ntv/ratgeber-geld/thema-ua-der-erste-blick-die-apple-watch/player',
'info_dict': {
'id': '203521',
'display_id': 'ratgeber-geld/thema-ua-der-erste-blick-die-apple-watch',
'ext': 'mp4',
'title': 'Thema u.a.: Der erste Blick: Die Apple Watch',
'description': 'md5:4312b6c9d839ffe7d8caf03865a531af',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1432751700,
'upload_date': '20150527',
'duration': 1083,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# vox
'url': 'http://www.nowtv.de/vox/der-hundeprofi/buero-fall-chihuahua-joel/player',
'info_dict': {
'id': '128953',
'display_id': 'der-hundeprofi/buero-fall-chihuahua-joel',
'ext': 'mp4',
'title': "Büro-Fall / Chihuahua 'Joel'",
'description': 'md5:e62cb6bf7c3cc669179d4f1eb279ad8d',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1432408200,
'upload_date': '20150523',
'duration': 3092,
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
station = mobj.group('station')
info = self._download_json(
'https://api.nowtv.de/v3/movies/%s?fields=*,format,files' % display_id,
display_id)
video_id = compat_str(info['id'])
files = info['files']
if not files:
if info.get('geoblocked', False):
raise ExtractorError(
'Video %s is not available from your location due to geo restriction' % video_id,
expected=True)
if not info.get('free', True):
raise ExtractorError(
'Video %s is not available for free' % video_id, expected=True)
f = info.get('format', {})
station = f.get('station') or station
STATIONS = {
'rtl': 'rtlnow',
'rtl2': 'rtl2now',
'vox': 'voxnow',
'nitro': 'rtlnitronow',
'ntv': 'n-tvnow',
'superrtl': 'superrtlnow'
}
formats = []
for item in files['items']:
item_path = remove_start(item['path'], '/')
tbr = int_or_none(item['bitrate'])
m3u8_url = 'http://hls.fra.%s.de/hls-vod-enc/%s.m3u8' % (STATIONS[station], item_path)
m3u8_url = m3u8_url.replace('now/', 'now/videos/')
formats.append({
'url': m3u8_url,
'format_id': '%s-%sk' % (item['id'], tbr),
'ext': 'mp4',
'tbr': tbr,
})
self._sort_formats(formats)
title = info['title']
description = info.get('articleLong') or info.get('articleShort')
timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
duration = parse_duration(info.get('duration'))
thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
}
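The HLS URL construction in `NowTVIE._real_extract` above can be tried without the extractor. The sketch below copies the `STATIONS` table and the string handling from the diff; `build_m3u8_url` is a hypothetical name, `remove_start` is re-implemented locally as an assumption about what the `..utils` helper does, and the sample path in the usage note is invented for illustration.

```python
STATIONS = {
    'rtl': 'rtlnow',
    'rtl2': 'rtl2now',
    'vox': 'voxnow',
    'nitro': 'rtlnitronow',
    'ntv': 'n-tvnow',
    'superrtl': 'superrtlnow',
}


def remove_start(s, start):
    # Local stand-in for the utils helper: drop a leading prefix if present.
    return s[len(start):] if s.startswith(start) else s


def build_m3u8_url(station, path):
    item_path = remove_start(path, '/')
    url = 'http://hls.fra.%s.de/hls-vod-enc/%s.m3u8' % (
        STATIONS[station], item_path)
    # As in the diff, every 'now/' occurrence gains a 'videos/' segment.
    return url.replace('now/', 'now/videos/')
```

For a made-up path such as `/rtlnow/123/file.mp4`, the host part is untouched (it contains `now.de/`, not `now/`) and only the path segment is rewritten. Note that the table is keyed by `nitro`, so a station value of `rtlnitro` taken from the URL alone would raise `KeyError`; the diff relies on `f.get('station')` supplying the key.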


@@ -4,7 +4,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
float_or_none,
@@ -200,20 +199,10 @@ class NRKTVIE(InfoExtractor):
url = "%s%s" % (baseurl, subtitlesurl)
self._debug_print('%s: Subtitle url: %s' % (video_id, url))
captions = self._download_xml(
url, video_id, 'Downloading subtitles',
transform_source=lambda s: s.replace(r'<br />', '\r\n'))
url, video_id, 'Downloading subtitles')
lang = captions.get('lang', 'no')
ps = captions.findall('./{0}body/{0}div/{0}p'.format('{http://www.w3.org/ns/ttml}'))
srt = ''
for pos, p in enumerate(ps):
begin = parse_duration(p.get('begin'))
duration = parse_duration(p.get('dur'))
starttime = self._subtitles_timecode(begin)
endtime = self._subtitles_timecode(begin + duration)
srt += '%s\r\n%s --> %s\r\n%s\r\n\r\n' % (compat_str(pos), starttime, endtime, p.text)
return {lang: [
{'ext': 'ttml', 'url': url},
{'ext': 'srt', 'data': srt},
]}
def _extract_f4m(self, manifest_url, video_id):


@@ -8,30 +8,8 @@ from ..utils import (
)
class NYTimesIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?nytimes\.com/video/(?:[^/]+/)+?|graphics8\.nytimes\.com/bcvideo/\d+(?:\.\d+)?/iframe/embed\.html\?videoId=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.nytimes.com/video/opinion/100000002847155/verbatim-what-is-a-photocopier.html?playlistId=100000001150263',
'md5': '18a525a510f942ada2720db5f31644c0',
'info_dict': {
'id': '100000002847155',
'ext': 'mov',
'title': 'Verbatim: What Is a Photocopier?',
'description': 'md5:93603dada88ddbda9395632fdc5da260',
'timestamp': 1398631707,
'upload_date': '20140427',
'uploader': 'Brett Weiner',
'duration': 419,
}
}, {
'url': 'http://www.nytimes.com/video/travel/100000003550828/36-hours-in-dubai.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
class NYTimesBaseIE(InfoExtractor):
def _extract_video_from_id(self, video_id):
video_data = self._download_json(
'http://www.nytimes.com/svc/video/api/v2/video/%s' % video_id,
video_id, 'Downloading video JSON')
@@ -81,3 +59,59 @@ class NYTimesIE(InfoExtractor):
'formats': formats,
'thumbnails': thumbnails,
}
class NYTimesIE(NYTimesBaseIE):
_VALID_URL = r'https?://(?:(?:www\.)?nytimes\.com/video/(?:[^/]+/)+?|graphics8\.nytimes\.com/bcvideo/\d+(?:\.\d+)?/iframe/embed\.html\?videoId=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.nytimes.com/video/opinion/100000002847155/verbatim-what-is-a-photocopier.html?playlistId=100000001150263',
'md5': '18a525a510f942ada2720db5f31644c0',
'info_dict': {
'id': '100000002847155',
'ext': 'mov',
'title': 'Verbatim: What Is a Photocopier?',
'description': 'md5:93603dada88ddbda9395632fdc5da260',
'timestamp': 1398631707,
'upload_date': '20140427',
'uploader': 'Brett Weiner',
'duration': 419,
}
}, {
'url': 'http://www.nytimes.com/video/travel/100000003550828/36-hours-in-dubai.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_video_from_id(video_id)
class NYTimesArticleIE(NYTimesBaseIE):
_VALID_URL = r'https?://(?:www\.)?nytimes\.com/(.(?<!video))*?/(?:[^/]+/)*(?P<id>[^.]+)(?:\.html)?'
_TESTS = [{
'url': 'http://www.nytimes.com/2015/04/14/business/owner-of-gravity-payments-a-credit-card-processor-is-setting-a-new-minimum-wage-70000-a-year.html?_r=0',
'md5': 'e2076d58b4da18e6a001d53fd56db3c9',
'info_dict': {
'id': '100000003628438',
'ext': 'mov',
'title': 'New Minimum Wage: $70,000 a Year',
'description': 'Dan Price, C.E.O. of Gravity Payments, surprised his 120-person staff by announcing that he planned over the next three years to raise the salary of every employee to $70,000 a year.',
'timestamp': 1429033037,
'upload_date': '20150414',
'uploader': 'Matthew Williams',
}
}, {
'url': 'http://www.nytimes.com/news/minute/2014/03/17/times-minute-whats-next-in-crimea/?_php=true&_type=blogs&_php=true&_type=blogs&_r=1',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._html_search_regex(r'data-videoid="(\d+)"', webpage, 'video id')
return self._extract_video_from_id(video_id)


@@ -2,16 +2,19 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import (
unified_strdate,
int_or_none,
qualities,
unescapeHTML,
)
class OdnoklassnikiIE(InfoExtractor):
_VALID_URL = r'https?://(?:odnoklassniki|ok)\.ru/(?:video|web-api/video/moviePlayer)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:odnoklassniki|ok)\.ru/(?:video|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
_TESTS = [{
# metadata in JSON
'url': 'http://ok.ru/video/20079905452',
'md5': '8e24ad2da6f387948e7a7d44eb8668fe',
'info_dict': {
@@ -19,11 +22,22 @@ class OdnoklassnikiIE(InfoExtractor):
'ext': 'mp4',
'title': 'Культура меняет нас (прекрасный ролик!))',
'duration': 100,
'upload_date': '20141207',
'uploader_id': '330537914540',
'uploader': 'Виталий Добровольский',
'like_count': int,
'age_limit': 0,
},
}, {
# metadataUrl
'url': 'http://ok.ru/video/63567059965189-0',
'md5': '9676cf86eff5391d35dea675d224e131',
'info_dict': {
'id': '63567059965189-0',
'ext': 'mp4',
'title': 'Девушка без комплексов ...',
'duration': 191,
'uploader_id': '534380003155',
'uploader': 'Андрей Мещанинов',
'like_count': int,
},
}, {
'url': 'http://ok.ru/web-api/video/moviePlayer/20079905452',
@@ -33,14 +47,23 @@ class OdnoklassnikiIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'http://ok.ru/video/%s' % video_id, video_id)
player = self._parse_json(
self._search_regex(
r"OKVideo\.start\(({.+?})\s*,\s*'VideoAutoplay_player'", webpage, 'player'),
unescapeHTML(self._search_regex(
r'data-attributes="([^"]+)"', webpage, 'player')),
video_id)
metadata = self._parse_json(player['flashvars']['metadata'], video_id)
flashvars = player['flashvars']
metadata = flashvars.get('metadata')
if metadata:
metadata = self._parse_json(metadata, video_id)
else:
metadata = self._download_json(
compat_urllib_parse.unquote(flashvars['metadataUrl']),
video_id, 'Downloading metadata JSON')
movie = metadata['movie']
title = movie['title']
@@ -52,11 +75,11 @@ class OdnoklassnikiIE(InfoExtractor):
uploader = author.get('name')
upload_date = unified_strdate(self._html_search_meta(
'ya:ovs:upload_date', webpage, 'upload date'))
'ya:ovs:upload_date', webpage, 'upload date', default=None))
age_limit = None
adult = self._html_search_meta(
'ya:ovs:adult', webpage, 'age limit')
'ya:ovs:adult', webpage, 'age limit', default=None)
if adult:
age_limit = 18 if adult == 'true' else 0


@@ -1,15 +1,111 @@
from __future__ import unicode_literals
import re
import json
import base64
from .common import InfoExtractor
from ..utils import (
unescapeHTML,
ExtractorError,
determine_ext,
int_or_none,
)
class OoyalaIE(InfoExtractor):
class OoyalaBaseIE(InfoExtractor):
def _extract_result(self, info, more_info):
embedCode = info['embedCode']
video_url = info.get('ipad_url') or info['url']
if determine_ext(video_url) == 'm3u8':
formats = self._extract_m3u8_formats(video_url, embedCode, ext='mp4')
else:
formats = [{
'url': video_url,
'ext': 'mp4',
}]
return {
'id': embedCode,
'title': unescapeHTML(info['title']),
'formats': formats,
'description': unescapeHTML(more_info['description']),
'thumbnail': more_info['promo'],
}
def _extract(self, player_url, video_id):
player = self._download_webpage(player_url, video_id)
mobile_url = self._search_regex(r'mobile_player_url="(.+?)&device="',
player, 'mobile player url')
# Looks like some videos are only available for particular devices
# (e.g. http://player.ooyala.com/player.js?embedCode=x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0
# is only available for ipad)
# Working around with fetching URLs for all the devices found starting with 'unknown'
# until we succeed or eventually fail for each device.
devices = re.findall(r'device\s*=\s*"([^"]+)";', player)
devices.remove('unknown')
devices.insert(0, 'unknown')
for device in devices:
mobile_player = self._download_webpage(
'%s&device=%s' % (mobile_url, device), video_id,
'Downloading mobile player JS for %s device' % device)
videos_info = self._search_regex(
r'var streams=window.oo_testEnv\?\[\]:eval\("\((\[{.*?}\])\)"\);',
mobile_player, 'info', fatal=False, default=None)
if videos_info:
break
if not videos_info:
formats = []
auth_data = self._download_json(
'http://player.ooyala.com/sas/player_api/v1/authorization/embed_code/%s/%s?domain=www.example.org&supportedFormats=mp4,webm' % (video_id, video_id),
video_id)
cur_auth_data = auth_data['authorization_data'][video_id]
for stream in cur_auth_data['streams']:
formats.append({
'url': base64.b64decode(stream['url']['data'].encode('ascii')).decode('utf-8'),
'ext': stream.get('delivery_type'),
'format': stream.get('video_codec'),
'format_id': stream.get('profile'),
'width': int_or_none(stream.get('width')),
'height': int_or_none(stream.get('height')),
'abr': int_or_none(stream.get('audio_bitrate')),
'vbr': int_or_none(stream.get('video_bitrate')),
})
if formats:
return {
'id': video_id,
'formats': formats,
'title': 'Ooyala video',
}
if not cur_auth_data['authorized']:
raise ExtractorError(cur_auth_data['message'], expected=True)
if not videos_info:
raise ExtractorError('Unable to extract info')
videos_info = videos_info.replace('\\"', '"')
videos_more_info = self._search_regex(
r'eval\("\(({.*?\\"promo\\".*?})\)"', mobile_player, 'more info').replace('\\"', '"')
videos_info = json.loads(videos_info)
videos_more_info = json.loads(videos_more_info)
if videos_more_info.get('lineup'):
videos = [self._extract_result(info, more_info) for (info, more_info) in zip(videos_info, videos_more_info['lineup'])]
return {
'_type': 'playlist',
'id': video_id,
'title': unescapeHTML(videos_more_info['title']),
'entries': videos,
}
else:
return self._extract_result(videos_info[0], videos_more_info)
class OoyalaIE(OoyalaBaseIE):
_VALID_URL = r'(?:ooyala:|https?://.+?\.ooyala\.com/.*?(?:embedCode|ec)=)(?P<id>.+?)(&|$)'
_TESTS = [
@@ -32,6 +128,17 @@ class OoyalaIE(InfoExtractor):
'description': '',
},
},
{
# Information available only through SAS api
# From http://community.plm.automation.siemens.com/t5/News-NX-Manufacturing/Tool-Path-Divide/ba-p/4187
'url': 'http://player.ooyala.com/player.js?embedCode=FiOG81ZTrvckcchQxmalf4aQj590qTEx',
'md5': 'a84001441b35ea492bc03736e59e7935',
'info_dict': {
'id': 'FiOG81ZTrvckcchQxmalf4aQj590qTEx',
'ext': 'mp4',
'title': 'Ooyala video',
}
}
]
@staticmethod
@@ -43,55 +150,47 @@ class OoyalaIE(InfoExtractor):
return cls.url_result(cls._url_for_embed_code(embed_code),
ie=cls.ie_key())
def _extract_result(self, info, more_info):
return {
'id': info['embedCode'],
def _real_extract(self, url):
embed_code = self._match_id(url)
player_url = 'http://player.ooyala.com/player.js?embedCode=%s' % embed_code
return self._extract(player_url, embed_code)
class OoyalaExternalIE(OoyalaBaseIE):
_VALID_URL = r'''(?x)
(?:
ooyalaexternal:|
https?://.+?\.ooyala\.com/.*?\bexternalId=
)
(?P<partner_id>[^:]+)
:
(?P<id>.+)
(?:
:|
.*?&pcode=
)
(?P<pcode>.+?)
(&|$)
'''
_TEST = {
'url': 'https://player.ooyala.com/player.js?externalId=espn:10365079&pcode=1kNG061cgaoolOncv54OAO1ceO-I&adSetCode=91cDU6NuXTGKz3OdjOxFdAgJVtQcKJnI&callback=handleEvents&hasModuleParams=1&height=968&playerBrandingId=7af3bd04449c444c964f347f11873075&targetReplaceId=videoPlayer&width=1656&wmode=opaque&allowScriptAccess=always',
'info_dict': {
'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
'ext': 'mp4',
'title': unescapeHTML(info['title']),
'url': info.get('ipad_url') or info['url'],
'description': unescapeHTML(more_info['description']),
'thumbnail': more_info['promo'],
}
'title': 'dm_140128_30for30Shorts___JudgingJewellv2',
'description': '',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
embedCode = mobj.group('id')
player_url = 'http://player.ooyala.com/player.js?embedCode=%s' % embedCode
player = self._download_webpage(player_url, embedCode)
mobile_url = self._search_regex(r'mobile_player_url="(.+?)&device="',
player, 'mobile player url')
# Looks like some videos are only available for particular devices
# (e.g. http://player.ooyala.com/player.js?embedCode=x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0
# is only available for ipad)
# Working around with fetching URLs for all the devices found starting with 'unknown'
# until we succeed or eventually fail for each device.
devices = re.findall(r'device\s*=\s*"([^"]+)";', player)
devices.remove('unknown')
devices.insert(0, 'unknown')
for device in devices:
mobile_player = self._download_webpage(
'%s&device=%s' % (mobile_url, device), embedCode,
'Downloading mobile player JS for %s device' % device)
videos_info = self._search_regex(
r'var streams=window.oo_testEnv\?\[\]:eval\("\((\[{.*?}\])\)"\);',
mobile_player, 'info', fatal=False, default=None)
if videos_info:
break
if not videos_info:
raise ExtractorError('Unable to extract info')
videos_info = videos_info.replace('\\"', '"')
videos_more_info = self._search_regex(
r'eval\("\(({.*?\\"promo\\".*?})\)"', mobile_player, 'more info').replace('\\"', '"')
videos_info = json.loads(videos_info)
videos_more_info = json.loads(videos_more_info)
if videos_more_info.get('lineup'):
videos = [self._extract_result(info, more_info) for (info, more_info) in zip(videos_info, videos_more_info['lineup'])]
return {
'_type': 'playlist',
'id': embedCode,
'title': unescapeHTML(videos_more_info['title']),
'entries': videos,
}
else:
return self._extract_result(videos_info[0], videos_more_info)
partner_id = mobj.group('partner_id')
video_id = mobj.group('id')
pcode = mobj.group('pcode')
player_url = 'http://player.ooyala.com/player.js?externalId=%s:%s&pcode=%s' % (partner_id, video_id, pcode)
return self._extract(player_url, video_id)
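Two small pieces of `OoyalaBaseIE._extract` above lend themselves to a standalone sketch: the device ordering used for the mobile-player fallback, and the decoding of stream URLs returned by the SAS authorization endpoint. The helper names `ordered_devices` and `decode_stream_url` are hypothetical. One deliberate difference from the diff: the sketch guards the `remove` call, since `list.remove` raises `ValueError` when `'unknown'` is not among the devices found.

```python
import base64
import re


def ordered_devices(player_js):
    """List device names found in the player JS, with 'unknown' first."""
    devices = re.findall(r'device\s*=\s*"([^"]+)";', player_js)
    if 'unknown' in devices:  # guard added here; the diff calls remove() directly
        devices.remove('unknown')
    devices.insert(0, 'unknown')
    return devices


def decode_stream_url(data):
    """Decode the base64 payload of stream['url']['data'] into a URL string."""
    return base64.b64decode(data.encode('ascii')).decode('utf-8')
```

The extractor tries each device in this order and stops at the first one whose mobile player yields stream info; only if none does is the SAS endpoint consulted.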

View File

@@ -5,6 +5,8 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
unified_strdate,
US_RATINGS,
)
@@ -149,21 +151,45 @@ class PBSIE(InfoExtractor):
for vid_id in video_id]
return self.playlist_result(entries, display_id)
info_url = 'http://video.pbs.org/videoInfo/%s?format=json' % video_id
info = self._download_json(info_url, display_id)
info = self._download_json(
'http://video.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id)
redirect_url = info['alternate_encoding']['url']
redirect_info = self._download_json(
redirect_url + '?format=json', display_id,
'Downloading video url info')
if redirect_info['status'] == 'error':
if redirect_info['http_code'] == 403:
message = (
'The video is not available in your region due to '
'right restrictions')
formats = []
for encoding_name in ('recommended_encoding', 'alternate_encoding'):
redirect = info.get(encoding_name)
if not redirect:
continue
redirect_url = redirect.get('url')
if not redirect_url:
continue
redirect_info = self._download_json(
redirect_url + '?format=json', display_id,
'Downloading %s video url info' % encoding_name)
if redirect_info['status'] == 'error':
if redirect_info['http_code'] == 403:
message = (
'The video is not available in your region due to '
'right restrictions')
else:
message = redirect_info['message']
raise ExtractorError(message, expected=True)
format_url = redirect_info.get('url')
if not format_url:
continue
if determine_ext(format_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4', preference=1, m3u8_id='hls'))
else:
message = redirect_info['message']
raise ExtractorError(message, expected=True)
formats.append({
'url': format_url,
'format_id': redirect.get('eeid'),
})
self._sort_formats(formats)
rating_str = info.get('rating')
if rating_str is not None:
@@ -174,11 +200,10 @@ class PBSIE(InfoExtractor):
'id': video_id,
'display_id': display_id,
'title': info['title'],
'url': redirect_info['url'],
'ext': 'mp4',
'description': info['program'].get('description'),
'thumbnail': info.get('image_url'),
'duration': info.get('duration'),
'duration': int_or_none(info.get('duration')),
'age_limit': age_limit,
'upload_date': upload_date,
'formats': formats,
}

View File

@@ -71,7 +71,8 @@ class PornHubIE(InfoExtractor):
video_urls = list(map(compat_urllib_parse.unquote, re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage)))
if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse.unquote_plus(self._html_search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
password = compat_urllib_parse.unquote_plus(
self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []

View File

@@ -17,7 +17,7 @@ from ..utils import (
class ProSiebenSat1IE(InfoExtractor):
IE_NAME = 'prosiebensat1'
IE_DESC = 'ProSiebenSat.1 Digital'
_VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|ran|the-voice-of-germany)\.de|fem\.com)/(?P<id>.+)'
_VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany)\.(?:de|at)|ran\.de|fem\.com)/(?P<id>.+)'
_TESTS = [
{

View File

@@ -9,11 +9,13 @@ from .common import InfoExtractor
from ..utils import (
strip_jsonp,
unescapeHTML,
js_to_json,
)
from ..compat import compat_urllib_request
class QQMusicIE(InfoExtractor):
IE_NAME = 'qqmusic'
_VALID_URL = r'http://y.qq.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
_TESTS = [{
'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD',
@@ -24,7 +26,7 @@ class QQMusicIE(InfoExtractor):
'title': '可惜没如果',
'upload_date': '20141227',
'creator': '林俊杰',
'description': 'md5:4348ff1dd24036906baa7b6f973f8d30',
'description': 'md5:d327722d0361576fde558f1ac68a7065',
}
}]
@@ -58,6 +60,8 @@ class QQMusicIE(InfoExtractor):
lrc_content = self._html_search_regex(
r'<div class="content" id="lrc_content"[^<>]*>([^<>]+)</div>',
detail_info_page, 'LRC lyrics', default=None)
if lrc_content:
lrc_content = lrc_content.replace('\\n', '\n')
guid = self.m_r_get_ruin()
@@ -96,6 +100,7 @@ class QQPlaylistBaseIE(InfoExtractor):
class QQMusicSingerIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:singer'
_VALID_URL = r'http://y.qq.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
_TEST = {
'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2',
@@ -139,6 +144,7 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
class QQMusicAlbumIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:album'
_VALID_URL = r'http://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
_TEST = {
@@ -168,3 +174,67 @@ class QQMusicAlbumIE(QQPlaylistBaseIE):
album_page, 'album details', default=None)
return self.playlist_result(entries, mid, album_name, album_detail)
class QQMusicToplistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:toplist'
_VALID_URL = r'http://y\.qq\.com/#type=toplist&p=(?P<id>(top|global)_[0-9]+)'
_TESTS = [{
'url': 'http://y.qq.com/#type=toplist&p=global_12',
'info_dict': {
'id': 'global_12',
'title': 'itunes榜',
},
'playlist_count': 10,
}, {
'url': 'http://y.qq.com/#type=toplist&p=top_6',
'info_dict': {
'id': 'top_6',
'title': 'QQ音乐巅峰榜·欧美',
},
'playlist_count': 100,
}, {
'url': 'http://y.qq.com/#type=toplist&p=global_5',
'info_dict': {
'id': 'global_5',
'title': '韩国mnet排行榜',
},
'playlist_count': 50,
}]
@staticmethod
def strip_qq_jsonp(code):
return js_to_json(re.sub(r'^MusicJsonCallback\((.*?)\)/\*.+?\*/$', r'\1', code))
def _real_extract(self, url):
list_id = self._match_id(url)
list_type, num_id = list_id.split("_")
list_page = self._download_webpage(
"http://y.qq.com/y/static/toplist/index/%s.html" % list_id,
list_id, 'Download toplist page')
entries = []
if list_type == 'top':
jsonp_url = "http://y.qq.com/y/static/toplist/json/top/%s/1.js" % num_id
else:
jsonp_url = "http://y.qq.com/y/static/toplist/json/global/%s/1_1.js" % num_id
toplist_json = self._download_json(
jsonp_url, list_id, note='Retrieve toplist json',
errnote='Unable to get toplist json', transform_source=self.strip_qq_jsonp)
for song in toplist_json['l']:
s = song['s']
song_mid = s.split("|")[20]
entries.append(self.url_result(
'http://y.qq.com/#type=song&mid=' + song_mid, 'QQMusic',
song_mid))
list_name = self._html_search_regex(
r'<h2 id="top_name">([^\']+)</h2>', list_page, 'top list name',
default=None)
return self.playlist_result(entries, list_id, list_name)
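The JSONP unwrapping done by `strip_qq_jsonp` above can be checked on its own. The sketch below reuses the regular expression from the diff but, as a simplifying assumption, parses the result with `json.loads` instead of passing it through `js_to_json`; that only works when the payload is already strict JSON. `parse_toplist` and the sample response are invented for illustration.

```python
import json
import re


def strip_qq_jsonp(code):
    # Drop the MusicJsonCallback(...) wrapper and the trailing /*...*/ comment.
    return re.sub(r'^MusicJsonCallback\((.*?)\)/\*.+?\*/$', r'\1', code)


def parse_toplist(code):
    # Assumes the unwrapped payload is strict JSON (the extractor itself
    # runs js_to_json first to tolerate JavaScript-style object literals).
    return json.loads(strip_qq_jsonp(code))
```

The pattern has no `re.DOTALL` flag, so it expects the whole response on a single line; a multi-line payload would pass through unchanged.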


@@ -1,10 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
int_or_none,
unescapeHTML,
)
class RTBFIE(InfoExtractor):
@@ -16,25 +17,24 @@ class RTBFIE(InfoExtractor):
'id': '1921274',
'ext': 'mp4',
'title': 'Les Diables au coeur (épisode 2)',
'description': 'Football - Diables Rouges',
'duration': 3099,
'timestamp': 1398456336,
'upload_date': '20140425',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
page = self._download_webpage('https://www.rtbf.be/video/embed?id=%s' % video_id, video_id)
webpage = self._download_webpage(
'http://www.rtbf.be/video/embed?id=%s' % video_id, video_id)
data = json.loads(self._html_search_regex(
r'<div class="js-player-embed(?: player-embed)?" data-video="([^"]+)"', page, 'data video'))['data']
data = self._parse_json(
unescapeHTML(self._search_regex(
r'data-video="([^"]+)"', webpage, 'data video')),
video_id)
video_url = data.get('downloadUrl') or data.get('url')
if data['provider'].lower() == 'youtube':
if data.get('provider').lower() == 'youtube':
return self.url_result(video_url, 'Youtube')
return {
@@ -42,8 +42,8 @@ class RTBFIE(InfoExtractor):
'url': video_url,
'title': data['title'],
'description': data.get('description') or data.get('subtitle'),
'thumbnail': data['thumbnail']['large'],
'thumbnail': data.get('thumbnail'),
'duration': data.get('duration') or data.get('realDuration'),
'timestamp': data['created'],
'view_count': data['viewCount'],
'timestamp': int_or_none(data.get('created')),
'view_count': int_or_none(data.get('viewCount')),
}


@@ -1,174 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
unified_strdate,
int_or_none,
)
class RTLnowIE(InfoExtractor):
"""Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
_VALID_URL = r'''(?x)
(?:https?://)?
(?P<url>
(?P<domain>
rtl-now\.rtl\.de|
rtl2now\.rtl2\.de|
(?:www\.)?voxnow\.de|
(?:www\.)?rtlnitronow\.de|
(?:www\.)?superrtlnow\.de|
(?:www\.)?n-tvnow\.de)
/+[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?
(?:container_id|film_id)=(?P<video_id>[0-9]+)&
player=1(?:&season=[0-9]+)?(?:&.*)?
)'''
_TESTS = [
{
'url': 'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
'info_dict': {
'id': '90419',
'ext': 'flv',
'title': 'Ahornallee - Folge 1 - Der Einzug',
'description': 'md5:ce843b6b5901d9a7f7d04d1bbcdb12de',
'upload_date': '20070416',
'duration': 1685,
},
'params': {
'skip_download': True,
},
'skip': 'Only works from Germany',
},
{
'url': 'http://rtl2now.rtl2.de/aerger-im-revier/episode-15-teil-1.php?film_id=69756&player=1&season=2&index=5',
'info_dict': {
'id': '69756',
'ext': 'flv',
'title': 'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit u.a.',
'description': 'md5:3fb247005ed21a935ffc82b7dfa70cf0',
'thumbnail': 'http://autoimg.static-fra.de/rtl2now/219850/1500x1500/image2.jpg',
'upload_date': '20120519',
'duration': 1245,
},
'params': {
'skip_download': True,
},
'skip': 'Only works from Germany',
},
{
'url': 'http://www.voxnow.de/voxtours/suedafrika-reporter-ii.php?film_id=13883&player=1&season=17',
'info_dict': {
'id': '13883',
'ext': 'flv',
'title': 'Voxtours - Südafrika-Reporter II',
'description': 'md5:de7f8d56be6fd4fed10f10f57786db00',
'upload_date': '20090627',
'duration': 1800,
},
'params': {
'skip_download': True,
},
},
{
'url': 'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1',
'info_dict': {
'id': '99205',
'ext': 'flv',
'title': 'Medicopter 117 - Angst!',
'description': 're:^Im Therapiezentrum \'Sonnalm\' kommen durch eine Unachtsamkeit die für die B.handlung mit Phobikern gehaltenen Voglespinnen frei\. Eine Ausreißerin',
'thumbnail': 'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg',
'upload_date': '20080928',
'duration': 2691,
},
'params': {
'skip_download': True,
},
},
{
'url': 'http://rtl-now.rtl.de/der-bachelor/folge-4.php?film_id=188729&player=1&season=5',
'info_dict': {
'id': '188729',
'ext': 'flv',
'upload_date': '20150204',
'description': 'md5:5e1ce23095e61a79c166d134b683cecc',
'title': 'Der Bachelor - Folge 4',
}
}, {
'url': 'http://www.n-tvnow.de/deluxe-alles-was-spass-macht/thema-ua-luxushotel-fuer-vierbeiner.php?container_id=153819&player=1&season=0',
'only_matching': True,
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_page_url = 'http://%s/' % mobj.group('domain')
video_id = mobj.group('video_id')
webpage = self._download_webpage('http://' + mobj.group('url'), video_id)
mobj = re.search(r'(?s)<div style="margin-left: 20px; font-size: 13px;">(.*?)<div id="playerteaser">', webpage)
if mobj:
raise ExtractorError(clean_html(mobj.group(1)), expected=True)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage, default=None)
upload_date = unified_strdate(self._html_search_meta('uploadDate', webpage, 'upload date'))
mobj = re.search(r'<meta itemprop="duration" content="PT(?P<seconds>\d+)S" />', webpage)
duration = int(mobj.group('seconds')) if mobj else None
playerdata_url = self._html_search_regex(
r"'playerdata': '(?P<playerdata_url>[^']+)'", webpage, 'playerdata_url')
playerdata = self._download_xml(playerdata_url, video_id, 'Downloading player data XML')
videoinfo = playerdata.find('./playlist/videoinfo')
formats = []
for filename in videoinfo.findall('filename'):
mobj = re.search(r'(?P<url>rtmpe://(?:[^/]+/){2})(?P<play_path>.+)', filename.text)
if mobj:
fmt = {
'url': mobj.group('url'),
'play_path': 'mp4:' + mobj.group('play_path'),
'page_url': video_page_url,
'player_url': video_page_url + 'includes/vodplayer.swf',
}
else:
mobj = re.search(r'.*/(?P<hoster>[^/]+)/videos/(?P<play_path>.+)\.f4m', filename.text)
if mobj:
fmt = {
'url': 'rtmpe://fms.rtl.de/' + mobj.group('hoster'),
'play_path': 'mp4:' + mobj.group('play_path'),
'page_url': url,
'player_url': video_page_url + 'includes/vodplayer.swf',
}
else:
fmt = {
'url': filename.text,
}
fmt.update({
'width': int_or_none(filename.get('width')),
'height': int_or_none(filename.get('height')),
'vbr': int_or_none(filename.get('bitrate')),
'ext': 'flv',
})
formats.append(fmt)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
'formats': formats,
}


@@ -190,6 +190,7 @@ class RTSIE(InfoExtractor):
'tbr': media['rate'] or extract_bitrate(media['url']),
} for media in info['media'] if media.get('rate')])
self._check_formats(formats, video_id)
self._sort_formats(formats)
return {


@@ -17,7 +17,7 @@ from ..utils import (
def _decrypt_url(png):
encrypted_data = base64.b64decode(png)
encrypted_data = base64.b64decode(png.encode('utf-8'))
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index - 4:]
length = struct_unpack('!I', text_chunk[:4])[0]
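The hunk above encodes the text before handing it to `base64.b64decode`. According to the standard library documentation, ASCII-only `str` input is accepted by the decoding functions only from Python 3.3 onward, so encoding first is presumably meant to keep older interpreters working. A minimal sketch of that pattern follows; the helper name `b64decode_text` is an assumption for illustration and does not appear in the diff.

```python
import base64


def b64decode_text(data):
    """Decode base64 given as text or bytes (hypothetical helper)."""
    # Encode text explicitly so the call does not rely on b64decode
    # accepting str, which older Python 3 releases did not.
    if isinstance(data, str):
        data = data.encode('ascii')
    return base64.b64decode(data)


decoded = b64decode_text('dEVYdA==')
```

Bytes input passes through unchanged, so the helper can wrap values whose type differs between interpreter versions.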


@@ -84,18 +84,27 @@ class RUTVIE(InfoExtractor):
'title': 'Сочи-2014. Биатлон. Индивидуальная гонка. Мужчины ',
'description': 'md5:9e0ed5c9d2fa1efbfdfed90c9a6d179c',
},
'skip': 'Translation has finished',
},
{
'url': 'http://player.rutv.ru/iframe/live/id/21/showZoomBtn/false/isPlay/true/',
'info_dict': {
'id': '21',
'ext': 'mp4',
'title': 're:^Россия 24. Прямой эфир [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True,
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
'skip': 'Translation has finished',
},
]
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.rutv\.ru/(?:iframe/(?:swf|video|live)/id|index/iframe/cast_id)/.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.(?:rutv\.ru|vgtrk\.com)/(?:iframe/(?:swf|video|live)/id|index/iframe/cast_id)/.+?)\1', webpage)
if mobj:
return mobj.group('url')
@@ -119,8 +128,10 @@ class RUTVIE(InfoExtractor):
elif video_path.startswith('index/iframe/cast_id'):
video_type = 'live'
is_live = video_type == 'live'
json_data = self._download_json(
'http://player.rutv.ru/iframe/%splay/id/%s' % ('live-' if video_type == 'live' else '', video_id),
'http://player.rutv.ru/iframe/%splay/id/%s' % ('live-' if is_live else '', video_id),
video_id, 'Downloading JSON')
if json_data['errors']:
@@ -147,6 +158,7 @@ class RUTVIE(InfoExtractor):
for transport, links in media['sources'].items():
for quality, url in links.items():
preference = -1 if priority_transport == transport else -2
if transport == 'rtmp':
mobj = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>.+))/(?P<playpath>.+)$', url)
if not mobj:
@@ -160,9 +172,11 @@ class RUTVIE(InfoExtractor):
'rtmp_live': True,
'ext': 'flv',
'vbr': int(quality),
'preference': preference,
}
elif transport == 'm3u8':
formats.extend(self._extract_m3u8_formats(url, video_id, 'mp4'))
formats.extend(self._extract_m3u8_formats(
url, video_id, 'mp4', preference=preference, m3u8_id='hls'))
continue
else:
fmt = {
@@ -172,21 +186,18 @@ class RUTVIE(InfoExtractor):
'width': width,
'height': height,
'format_id': '%s-%s' % (transport, quality),
'preference': -1 if priority_transport == transport else -2,
})
formats.append(fmt)
if not formats:
raise ExtractorError('No media links available for %s' % video_id)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'title': self._live_title(title) if is_live else title,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'duration': duration,
'formats': formats,
'is_live': is_live,
}


@@ -1,7 +1,6 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
@@ -33,16 +32,18 @@ class SBSIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
release_urls_json = js_to_json(self._search_regex(
player = self._search_regex(
r'(?s)playerParams\.releaseUrls\s*=\s*(\{.*?\n\});\n',
webpage, ''))
release_urls = json.loads(release_urls_json)
theplatform_url = (
release_urls.get('progressive') or release_urls.get('standard'))
webpage, 'player')
player = re.sub(r"'\s*\+\s*[\da-zA-Z_]+\s*\+\s*'", '', player)
release_urls = self._parse_json(js_to_json(player), video_id)
theplatform_url = release_urls.get('progressive') or release_urls['standard']
title = remove_end(self._og_search_title(webpage), ' (The Feed)')
description = self._html_search_meta('description', webpage)
@@ -52,7 +53,6 @@ class SBSIE(InfoExtractor):
'_type': 'url_transparent',
'id': video_id,
'url': theplatform_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
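The SBS hunk above removes JavaScript string concatenations of the form `' + name + '` before the object literal is converted to JSON. A minimal sketch of that substitution on its own, reusing the regular expression from the diff; the function name `strip_js_concat` and the sample input are assumptions made up for illustration.

```python
import re


def strip_js_concat(js_text):
    """Drop `' + identifier + '` joins so each value is one plain string literal."""
    return re.sub(r"'\s*\+\s*[\da-zA-Z_]+\s*\+\s*'", '', js_text)


# Hypothetical player snippet; the real page content is not shown in the diff.
snippet = "{'standard': 'http://example.com/v?x=' + token + '&y=1'}"
cleaned = strip_js_concat(snippet)
```

The value of the concatenated variable is discarded, so this only suits cases where the variable part of the string is not needed.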


@@ -11,7 +11,7 @@ from ..utils import (
class ScreenwaveMediaIE(InfoExtractor):
_VALID_URL = r'http://player\.screenwavemedia\.com/play/[a-zA-Z]+\.php\?[^"]*\bid=(?P<id>.+)'
_VALID_URL = r'http://player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?[^"]*\bid=(?P<id>.+)'
_TESTS = [{
'url': 'http://player.screenwavemedia.com/play/play.php?playerdiv=videoarea&companiondiv=squareAd&id=Cinemassacre-19911',
@@ -20,7 +20,10 @@ class ScreenwaveMediaIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
playerdata = self._download_webpage(url, video_id, 'Downloading player webpage')
playerdata = self._download_webpage(
'http://player.screenwavemedia.com/play/player.php?id=%s' % video_id,
video_id, 'Downloading player webpage')
vidtitle = self._search_regex(
r'\'vidtitle\'\s*:\s*"([^"]+)"', playerdata, 'vidtitle').replace('\\/', '/')
@@ -99,7 +102,7 @@ class TeamFourIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
playerdata_url = self._search_regex(
r'src="(http://player\.screenwavemedia\.com/play/[a-zA-Z]+\.php\?[^"]*\bid=.+?)"',
r'src="(http://player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?[^"]*\bid=.+?)"',
webpage, 'player data URL')
video_title = self._html_search_regex(


@@ -48,7 +48,7 @@ class SenateISVPIE(InfoExtractor):
["arch", "", "http://ussenate-f.akamaihd.net/"]
]
_IE_NAME = 'senate.gov'
_VALID_URL = r'http://www\.senate\.gov/isvp/\?(?P<qs>.+)'
_VALID_URL = r'http://www\.senate\.gov/isvp/?\?(?P<qs>.+)'
_TESTS = [{
'url': 'http://www.senate.gov/isvp/?comm=judiciary&type=live&stt=&filename=judiciary031715&auto_play=false&wmode=transparent&poster=http%3A%2F%2Fwww.judiciary.senate.gov%2Fthemes%2Fjudiciary%2Fimages%2Fvideo-poster-flash-fit.png',
'info_dict': {
@@ -72,12 +72,16 @@ class SenateISVPIE(InfoExtractor):
'ext': 'mp4',
'title': 'Integrated Senate Video Player'
}
}, {
# From http://www.c-span.org/video/?96791-1
'url': 'http://www.senate.gov/isvp?type=live&comm=banking&filename=banking012715',
'only_matching': True,
}]
@staticmethod
def _search_iframe_url(webpage):
mobj = re.search(
r"<iframe[^>]+src=['\"](?P<url>http://www\.senate\.gov/isvp/\?[^'\"]+)['\"]",
r"<iframe[^>]+src=['\"](?P<url>http://www\.senate\.gov/isvp/?\?[^'\"]+)['\"]",
webpage)
if mobj:
return mobj.group('url')
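The `_VALID_URL` change above makes the slash before the query string optional (`/isvp/?\?`). A short sketch showing that both URL shapes from the tests now match; the pattern is copied from the diff, while `isvp_query` and the use of `urllib.parse.parse_qs` (Python 3) are assumptions for illustration only.

```python
import re
from urllib.parse import parse_qs

# Pattern as it appears in the updated extractor.
ISVP_URL_RE = r'http://www\.senate\.gov/isvp/?\?(?P<qs>.+)'


def isvp_query(url):
    """Return the parsed query string, or None if the URL does not match."""
    mobj = re.match(ISVP_URL_RE, url)
    if mobj is None:
        return None
    return parse_qs(mobj.group('qs'))


no_slash = isvp_query(
    'http://www.senate.gov/isvp?type=live&comm=banking&filename=banking012715')
with_slash = isvp_query('http://www.senate.gov/isvp/?comm=judiciary&type=live')
```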


@@ -47,7 +47,7 @@ class SharedIE(InfoExtractor):
video_url = self._html_search_regex(
r'data-url="([^"]+)"', video_page, 'video URL')
title = base64.b64decode(self._html_search_meta(
'full:title', webpage, 'title')).decode('utf-8')
'full:title', webpage, 'title').encode('utf-8')).decode('utf-8')
filesize = int_or_none(self._html_search_meta(
'full:size', webpage, 'file size', fatal=False))
thumbnail = self._html_search_regex(


@@ -1,83 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
determine_ext,
ExtractorError,
)
from .common import InfoExtractor
class SockshareIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?sockshare\.com/file/(?P<id>[0-9A-Za-z]+)'
_FILE_DELETED_REGEX = r'This file doesn\'t exist, or has been removed\.</div>'
_TEST = {
'url': 'http://www.sockshare.com/file/437BE28B89D799D7',
'md5': '9d0bf1cfb6dbeaa8d562f6c97506c5bd',
'info_dict': {
'id': '437BE28B89D799D7',
'title': 'big_buck_bunny_720p_surround.avi',
'ext': 'avi',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'http://sockshare.com/file/%s' % video_id
webpage = self._download_webpage(url, video_id)
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
confirm_hash = self._html_search_regex(r'''(?x)<input\s+
type="hidden"\s+
value="([^"]*)"\s+
name="hash"
''', webpage, 'hash')
fields = {
"hash": confirm_hash.encode('utf-8'),
"confirm": "Continue as Free User"
}
post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post)
# Apparently, this header is required for confirmation to work.
req.add_header('Host', 'www.sockshare.com')
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
video_url = self._html_search_regex(
r'<a href="([^"]*)".+class="download_file_link"',
webpage, 'file url')
video_url = "http://www.sockshare.com" + video_url
title = self._html_search_regex((
r'<h1>(.+)<strong>',
r'var name = "([^"]+)";'),
webpage, 'title', default=None)
thumbnail = self._html_search_regex(
r'<img\s+src="([^"]*)".+?name="bg"',
webpage, 'thumbnail', default=None)
formats = [{
'format_id': 'sd',
'url': video_url,
'ext': determine_ext(title),
}]
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}


@@ -8,7 +8,7 @@ from ..compat import (
compat_str,
compat_urllib_request
)
from ..utils import sanitize_url_path_consecutive_slashes
from ..utils import ExtractorError
class SohuIE(InfoExtractor):
@@ -23,12 +23,10 @@ class SohuIE(InfoExtractor):
'ext': 'mp4',
'title': 'MVFar East Movement《The Illest》',
},
'params': {
'cn_verification_proxy': 'proxy.uku.im:8888'
}
'skip': 'Only available in China',
}, {
'url': 'http://tv.sohu.com/20150305/n409385080.shtml',
'md5': '699060e75cf58858dd47fb9c03c42cfb',
'md5': 'ac9a5d322b4bf9ae184d53e4711e4f1a',
'info_dict': {
'id': '409385080',
'ext': 'mp4',
@@ -36,7 +34,7 @@ class SohuIE(InfoExtractor):
}
}, {
'url': 'http://my.tv.sohu.com/us/232799889/78693464.shtml',
'md5': '9bf34be48f2f4dadcb226c74127e203c',
'md5': '49308ff6dafde5ece51137d04aec311e',
'info_dict': {
'id': '78693464',
'ext': 'mp4',
@@ -50,7 +48,7 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
},
'playlist': [{
'md5': 'bdbfb8f39924725e6589c146bc1883ad',
'md5': '492923eac023ba2f13ff69617c32754a',
'info_dict': {
'id': '78910339_part1',
'ext': 'mp4',
@@ -58,7 +56,7 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
}
}, {
'md5': '3e1f46aaeb95354fd10e7fca9fc1804e',
'md5': 'de604848c0e8e9c4a4dde7e1347c0637',
'info_dict': {
'id': '78910339_part2',
'ext': 'mp4',
@@ -66,7 +64,7 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
}
}, {
'md5': '8407e634175fdac706766481b9443450',
'md5': '93584716ee0657c0b205b8aa3d27aa13',
'info_dict': {
'id': '78910339_part3',
'ext': 'mp4',
@@ -117,6 +115,15 @@ class SohuIE(InfoExtractor):
r'var vid ?= ?["\'](\d+)["\']',
webpage, 'video path')
vid_data = _fetch_data(vid, mytv)
if vid_data['play'] != 1:
if vid_data.get('status') == 12:
raise ExtractorError(
'Sohu said: There\'s something wrong in the video.',
expected=True)
else:
raise ExtractorError(
'Sohu said: The video is only licensed to users in Mainland China.',
expected=True)
formats_json = {}
for format_id in ('nor', 'high', 'super', 'ori', 'h2644k', 'h2654k'):
@@ -132,24 +139,21 @@ class SohuIE(InfoExtractor):
for i in range(part_count):
formats = []
for format_id, format_data in formats_json.items():
allot = format_data['allot']
prot = format_data['prot']
data = format_data['data']
clips_url = data['clipsURL']
su = data['su']
part_str = self._download_webpage(
'http://%s/?prot=%s&file=%s&new=%s' %
(allot, prot, clips_url[i], su[i]),
video_id,
'Downloading %s video URL part %d of %d'
% (format_id, i + 1, part_count))
part_info = part_str.split('|')
video_url = sanitize_url_path_consecutive_slashes(
'%s%s?key=%s' % (part_info[0], su[i], part_info[3]))
# URLs starting with http://newflv.sohu.ccgslb.net/ are not usable,
# so retry until a working URL is returned
video_url = 'newflv.sohu.ccgslb.net'
retries = 0
while 'newflv.sohu.ccgslb.net' in video_url and retries < 5:
download_note = 'Download information from CDN gateway for format ' + format_id
if retries > 0:
download_note += ' (retry #%d)' % retries
retries += 1
cdn_info = self._download_json(
'http://data.vod.itc.cn/cdnList?new=' + data['su'][i],
video_id, download_note)
video_url = cdn_info['url']
formats.append({
'url': video_url,
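The Sohu hunk above keeps asking the CDN gateway for a URL until the answer no longer points at the unusable host, giving up after five attempts. A minimal sketch of that bounded retry, separated from the extractor; `pick_usable_url` and the `fetch_url` callable are hypothetical stand-ins for the `_download_json` call in the diff.

```python
def pick_usable_url(fetch_url, bad_host, max_tries=5):
    """Call fetch_url() until the result avoids bad_host or tries run out.

    Like the loop in the diff, the last answer is returned even if it
    still contains bad_host.
    """
    url = None
    for _ in range(max_tries):
        url = fetch_url()
        if bad_host not in url:
            break
    return url


# Simulated gateway: the first answer is unusable, the second is fine.
answers = iter(['http://newflv.sohu.ccgslb.net/a', 'http://cdn.example.com/a'])
chosen = pick_usable_url(lambda: next(answers), 'newflv.sohu.ccgslb.net')
```

A `for` loop with `break` keeps the attempt limit in one place instead of a separate counter variable.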


@@ -336,7 +336,7 @@ class SoundcloudUserIE(SoundcloudIE):
if len(new_entries) == 0:
self.to_screen('%s: End page received' % uploader)
break
entries.extend(self._extract_info_dict(e, quiet=True) for e in new_entries)
entries.extend(self.url_result(e['permalink_url'], 'Soundcloud') for e in new_entries)
return {
'_type': 'playlist',


@@ -32,7 +32,7 @@ class SouthParkEsIE(SouthParkIE):
}]
class SouthparkDeIE(SouthParkIE):
class SouthParkDeIE(SouthParkIE):
IE_NAME = 'southpark.de'
_VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.de/(?:clips|alle-episoden)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southpark.de/feeds/video-player/mrss/'
@@ -46,3 +46,25 @@ class SouthparkDeIE(SouthParkIE):
'description': 'Cartman explains the benefits of "Shitter" to Stan, Kyle and Craig.',
},
}]
class SouthParkNlIE(SouthParkIE):
IE_NAME = 'southpark.nl'
_VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southpark.nl/feeds/video-player/mrss/'
_TESTS = [{
'url': 'http://www.southpark.nl/full-episodes/s18e06-freemium-isnt-free',
'playlist_count': 4,
}]
class SouthParkDkIE(SouthParkIE):
IE_NAME = 'southparkstudios.dk'
_VALID_URL = r'https?://(?:www\.)?(?P<url>southparkstudios\.dk/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southparkstudios.dk/feeds/video-player/mrss/'
_TESTS = [{
'url': 'http://www.southparkstudios.dk/full-episodes/s18e07-grounded-vindaloop',
'playlist_count': 4,
}]


@@ -71,7 +71,7 @@ class SpankwireIE(InfoExtractor):
compat_urllib_parse.unquote,
re.findall(r'playerData\.cdnPath[0-9]{3,}\s*=\s*["\']([^"\']+)["\']', webpage)))
if webpage.find('flashvars\.encrypted = "true"') != -1:
password = self._html_search_regex(
password = self._search_regex(
r'flashvars\.video_title = "([^"]+)',
webpage, 'password').replace('+', ' ')
video_urls = list(map(

View File

@@ -4,37 +4,36 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
parse_duration,
parse_iso8601,
unified_strdate,
)
class SportBoxIE(InfoExtractor):
_VALID_URL = r'https?://news\.sportbox\.ru/Vidy_sporta/(?:[^/]+/)+spbvideo_NI\d+_(?P<display_id>.+)'
_TESTS = [
{
'url': 'http://news.sportbox.ru/Vidy_sporta/Avtosport/Rossijskij/spbvideo_NI483529_Gonka-2-zaezd-Obyedinenniy-2000-klassi-Turing-i-S',
'md5': 'ff56a598c2cf411a9a38a69709e97079',
'info_dict': {
'id': '80822',
'ext': 'mp4',
'title': 'Гонка 2 заезд ««Объединенный 2000»: классы Туринг и Супер-продакшн',
'description': 'md5:81715fa9c4ea3d9e7915dc8180c778ed',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1411896237,
'upload_date': '20140928',
'duration': 4846,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://news.sportbox.ru/Vidy_sporta/billiard/spbvideo_NI486287_CHempionat-mira-po-dinamichnoy-piramide-4',
'only_matching': True,
}
]
_VALID_URL = r'https?://news\.sportbox\.ru/(?:[^/]+/)+spbvideo_NI\d+_(?P<display_id>.+)'
_TESTS = [{
'url': 'http://news.sportbox.ru/Vidy_sporta/Avtosport/Rossijskij/spbvideo_NI483529_Gonka-2-zaezd-Obyedinenniy-2000-klassi-Turing-i-S',
'md5': 'ff56a598c2cf411a9a38a69709e97079',
'info_dict': {
'id': '80822',
'ext': 'mp4',
'title': 'Гонка 2 заезд ««Объединенный 2000»: классы Туринг и Супер-продакшн',
'description': 'md5:3d72dc4a006ab6805d82f037fdc637ad',
'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20140928',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://news.sportbox.ru/Vidy_sporta/billiard/spbvideo_NI486287_CHempionat-mira-po-dinamichnoy-piramide-4',
'only_matching': True,
}, {
'url': 'http://news.sportbox.ru/video/no_ads/spbvideo_NI536574_V_Novorossijske_proshel_detskij_turnir_Pole_slavy_bojevoj?ci=211355',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -42,35 +41,75 @@ class SportBoxIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'src="/vdl/player/media/(\d+)"', webpage, 'video id')
player = self._download_webpage(
'http://news.sportbox.ru/vdl/player/media/%s' % video_id,
display_id, 'Downloading player webpage')
hls = self._search_regex(
r"var\s+original_hls_file\s*=\s*'([^']+)'", player, 'hls file')
formats = self._extract_m3u8_formats(hls, display_id, 'mp4')
player = self._search_regex(
r'src="/?(vdl/player/[^"]+)"', webpage, 'player')
title = self._html_search_regex(
r'<h1 itemprop="name">([^<]+)</h1>', webpage, 'title')
description = self._html_search_regex(
r'(?s)<div itemprop="description">(.+?)</div>', webpage, 'description', fatal=False)
[r'"nodetitle"\s*:\s*"([^"]+)"', r'class="node-header_{1,2}title">([^<]+)'],
webpage, 'title')
description = self._og_search_description(webpage) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
timestamp = parse_iso8601(self._search_regex(
r'<span itemprop="uploadDate">([^<]+)</span>', webpage, 'timestamp', fatal=False))
duration = parse_duration(self._html_search_regex(
r'<meta itemprop="duration" content="PT([^"]+)">', webpage, 'duration', fatal=False))
upload_date = unified_strdate(self._html_search_meta(
'dateCreated', webpage, 'upload date'))
return {
'id': video_id,
'_type': 'url_transparent',
'url': compat_urlparse.urljoin(url, '/%s' % player),
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'upload_date': upload_date,
}
class SportBoxEmbedIE(InfoExtractor):
_VALID_URL = r'https?://news\.sportbox\.ru/vdl/player(?:/[^/]+/|\?.*?\bn?id=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://news.sportbox.ru/vdl/player/ci/211355',
'info_dict': {
'id': '211355',
'ext': 'mp4',
'title': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://news.sportbox.ru/vdl/player?nid=370908&only_player=1&autostart=false&playeri=2&height=340&width=580',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src="(https?://news\.sportbox\.ru/vdl/player[^"]+)"',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
hls = self._search_regex(
r"sportboxPlayer\.jwplayer_common_params\.file\s*=\s*['\"]([^'\"]+)['\"]",
webpage, 'hls file')
formats = self._extract_m3u8_formats(hls, video_id, 'mp4')
title = self._search_regex(
r'sportboxPlayer\.node_title\s*=\s*"([^"]+)"', webpage, 'title')
thumbnail = self._search_regex(
r'sportboxPlayer\.jwplayer_common_params\.image\s*=\s*"([^"]+)"',
webpage, 'thumbnail', default=None)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}


@@ -9,41 +9,9 @@ from ..utils import (
)
class SVTPlayIE(InfoExtractor):
IE_DESC = 'SVT Play and Öppet arkiv'
_VALID_URL = r'https?://(?:www\.)?(?P<host>svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.svtplay.se/video/2609989/sm-veckan/sm-veckan-rally-final-sasong-1-sm-veckan-rally-final',
'md5': 'ade3def0643fa1c40587a422f98edfd9',
'info_dict': {
'id': '2609989',
'ext': 'flv',
'title': 'SM veckan vinter, Örebro - Rally, final',
'duration': 4500,
'thumbnail': 're:^https?://.*[\.-]jpg$',
'age_limit': 0,
},
}, {
'url': 'http://www.oppetarkiv.se/video/1058509/rederiet-sasong-1-avsnitt-1-av-318',
'md5': 'c3101a17ce9634f4c1f9800f0746c187',
'info_dict': {
'id': '1058509',
'ext': 'flv',
'title': 'Farlig kryssning',
'duration': 2566,
'thumbnail': 're:^https?://.*[\.-]jpg$',
'age_limit': 0,
},
'skip': 'Only works from Sweden',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
host = mobj.group('host')
info = self._download_json(
'http://www.%s.se/video/%s?output=json' % (host, video_id), video_id)
class SVTBaseIE(InfoExtractor):
def _extract_video(self, url, video_id):
info = self._download_json(url, video_id)
title = info['context']['title']
thumbnail = info['context'].get('thumbnailImage')
@@ -80,3 +48,70 @@ class SVTPlayIE(InfoExtractor):
'duration': duration,
'age_limit': age_limit,
}
class SVTIE(SVTBaseIE):
_VALID_URL = r'https?://(?:www\.)?svt\.se/wd\?(?:.*?&)?widgetId=(?P<widget_id>\d+)&.*?\barticleId=(?P<id>\d+)'
_TEST = {
'url': 'http://www.svt.se/wd?widgetId=23991&sectionId=541&articleId=2900353&type=embed&contextSectionId=123&autostart=false',
'md5': '9648197555fc1b49e3dc22db4af51d46',
'info_dict': {
'id': '2900353',
'ext': 'flv',
'title': 'Här trycker Jagr till Giroux (under SVT-intervjun)',
'duration': 27,
'age_limit': 0,
},
}
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'(?:<iframe src|href)="(?P<url>%s[^"]*)"' % SVTIE._VALID_URL, webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
widget_id = mobj.group('widget_id')
article_id = mobj.group('id')
return self._extract_video(
'http://www.svt.se/wd?widgetId=%s&articleId=%s&format=json&type=embed&output=json' % (widget_id, article_id),
article_id)
class SVTPlayIE(SVTBaseIE):
IE_DESC = 'SVT Play and Öppet arkiv'
_VALID_URL = r'https?://(?:www\.)?(?P<host>svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.svtplay.se/video/2609989/sm-veckan/sm-veckan-rally-final-sasong-1-sm-veckan-rally-final',
'md5': 'ade3def0643fa1c40587a422f98edfd9',
'info_dict': {
'id': '2609989',
'ext': 'flv',
'title': 'SM veckan vinter, Örebro - Rally, final',
'duration': 4500,
'thumbnail': 're:^https?://.*[\.-]jpg$',
'age_limit': 0,
},
}, {
'url': 'http://www.oppetarkiv.se/video/1058509/rederiet-sasong-1-avsnitt-1-av-318',
'md5': 'c3101a17ce9634f4c1f9800f0746c187',
'info_dict': {
'id': '1058509',
'ext': 'flv',
'title': 'Farlig kryssning',
'duration': 2566,
'thumbnail': 're:^https?://.*[\.-]jpg$',
'age_limit': 0,
},
'skip': 'Only works from Sweden',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
host = mobj.group('host')
return self._extract_video(
'http://www.%s.se/video/%s?output=json' % (host, video_id),
video_id)


@@ -2,13 +2,17 @@
from __future__ import unicode_literals
import base64
import binascii
import re
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
qualities,
determine_ext,
)
from ..compat import compat_ord
class TeamcocoIE(InfoExtractor):
@@ -59,37 +63,53 @@ class TeamcocoIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
webpage, urlh = self._download_webpage_handle(url, display_id)
if 'src=expired' in urlh.geturl():
raise ExtractorError('This video is expired.', expected=True)
video_id = mobj.group('video_id')
if not video_id:
video_id = self._html_search_regex(
self._VIDEO_ID_REGEXES, webpage, 'video id')
preload = None
preloads = re.findall(r'"preload":\s*"([^"]+)"', webpage)
if preloads:
preload = max([(len(p), p) for p in preloads])[1]
data = None
if not preload:
preload = ''.join(re.findall(r'this\.push\("([^"]+)"\);', webpage))
preload_codes = self._html_search_regex(
r'(function.+)setTimeout\(function\(\)\{playlist',
webpage, 'preload codes')
base64_fragments = re.findall(r'"([a-zA-Z0-9+/=]+)"', preload_codes)
base64_fragments.remove('init')
if not preload:
preload = self._html_search_regex([
r'player,\[?"([^"]+)"\]?', r'player.init\(\[?"([^"]+)"\]?\)'
], webpage.replace('","', ''), 'preload data', default=None)
def _check_sequence(cur_fragments):
if not cur_fragments:
return
for i in range(len(cur_fragments)):
cur_sequence = (''.join(cur_fragments[i:] + cur_fragments[:i])).encode('ascii')
try:
raw_data = base64.b64decode(cur_sequence)
if compat_ord(raw_data[0]) == compat_ord('{'):
return json.loads(raw_data.decode('utf-8'))
except (TypeError, binascii.Error, UnicodeDecodeError, ValueError):
continue
if not preload:
def _check_data():
for i in range(len(base64_fragments) + 1):
for j in range(i, len(base64_fragments) + 1):
data = _check_sequence(base64_fragments[:i] + base64_fragments[j:])
if data:
return data
self.to_screen('Try to compute possible data sequence. This may take some time.')
data = _check_data()
if not data:
raise ExtractorError(
'Preload information could not be extracted', expected=True)
data = self._parse_json(
base64.b64decode(preload.encode('ascii')).decode('utf-8'), video_id)
formats = []
get_quality = qualities(['500k', '480p', '1000k', '720p', '1080p'])
for filed in data['files']:
if filed['type'] == 'hls':
if determine_ext(filed['url']) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
filed['url'], video_id, ext='mp4'))
else:
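The Teamcoco hunk above reassembles shuffled base64 fragments by trying each rotation of the list and keeping the first one that decodes to JSON. A minimal sketch of that rotation check, assuming Python 3 bytes semantics; `decode_rotations` is a hypothetical name that mirrors the loop in `_check_sequence` and leaves out the outer search over dropped fragments.

```python
import base64
import binascii
import json


def decode_rotations(fragments):
    """Return the JSON object from the first rotation that decodes, else None."""
    for i in range(len(fragments)):
        candidate = ''.join(fragments[i:] + fragments[:i]).encode('ascii')
        try:
            raw = base64.b64decode(candidate)
            # A JSON object must start with an opening brace.
            if raw[:1] == b'{':
                return json.loads(raw.decode('utf-8'))
        except (TypeError, binascii.Error, UnicodeDecodeError, ValueError):
            continue
    return None


# Build a rotated fragment list from a known payload (illustrative data only).
payload = base64.b64encode(b'{"files":[]}').decode('ascii')
recovered = decode_rotations([payload[8:], payload[:8]])
```

Checking the first byte before parsing rejects most wrong rotations cheaply, since they rarely begin with `{`.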


@@ -16,6 +16,10 @@ class TelecincoIE(MiTeleIE):
'title': 'Con Martín Berasategui, hacer un bacalao al ...',
'duration': 662,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.telecinco.es/informativos/nacional/Pablo_Iglesias-Informativos_Telecinco-entrevista-Pedro_Piqueras_2_1945155182.html',
'only_matching': True,


@@ -2,6 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
float_or_none,
)
class TenPlayIE(InfoExtractor):
@@ -49,18 +53,23 @@ class TenPlayIE(InfoExtractor):
if protocol == 'rtmp':
url = url.replace('&mp4:', '')
tbr = int_or_none(rendition.get('encodingRate'), 1000)
formats.append({
'format_id': '_'.join(['rtmp', rendition['videoContainer'].lower(), rendition['videoCodec'].lower()]),
'width': rendition['frameWidth'],
'height': rendition['frameHeight'],
'tbr': rendition['encodingRate'] / 1024,
'filesize': rendition['size'],
'format_id': '_'.join(
['rtmp', rendition['videoContainer'].lower(),
rendition['videoCodec'].lower(), '%sk' % tbr]),
'width': int_or_none(rendition['frameWidth']),
'height': int_or_none(rendition['frameHeight']),
'tbr': tbr,
'filesize': int_or_none(rendition['size']),
'protocol': protocol,
'ext': ext,
'vcodec': rendition['videoCodec'].lower(),
'container': rendition['videoContainer'].lower(),
'url': url,
})
self._sort_formats(formats)
return {
'id': video_id,
@@ -74,8 +83,8 @@ class TenPlayIE(InfoExtractor):
'url': json['thumbnailURL']
}],
'thumbnail': json['videoStillURL'],
'duration': json['length'] / 1000,
'timestamp': float(json['creationDate']) / 1000,
'uploader': json['customFields']['production_company_distributor'] if 'production_company_distributor' in json['customFields'] else 'TENplay',
'view_count': json['playsTotal']
'duration': float_or_none(json.get('length'), 1000),
'timestamp': float_or_none(json.get('creationDate'), 1000),
'uploader': json.get('customFields', {}).get('production_company_distributor') or 'TENplay',
'view_count': int_or_none(json.get('playsTotal')),
}
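The TenPlay hunk above swaps direct arithmetic on JSON fields for `int_or_none` and `float_or_none`, so a missing field yields `None` instead of raising. The sketch below shows the idea with simplified stand-ins; `to_int_or_none` and `to_float_or_none` are hypothetical names and are not the implementations from `..utils`.

```python
def to_int_or_none(value, scale=1):
    """Return int(value) // scale, or None if value is missing or malformed."""
    if value is None:
        return None
    try:
        return int(value) // scale
    except (TypeError, ValueError):
        return None


def to_float_or_none(value, scale=1):
    """Return float(value) / scale, or None if value is missing or malformed."""
    if value is None:
        return None
    try:
        return float(value) / scale
    except (TypeError, ValueError):
        return None


# Illustrative metadata with one field absent.
meta = {'length': 275000, 'encodingRate': '1500000'}
duration = to_float_or_none(meta.get('length'), 1000)
tbr = to_int_or_none(meta.get('encodingRate'), 1000)
views = to_int_or_none(meta.get('playsTotal'))
```

Combined with `dict.get`, this lets the extractor return partial metadata when the API omits a field.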


@@ -15,19 +15,37 @@ class TestTubeIE(InfoExtractor):
'id': '60163',
'display_id': '5-weird-ways-plants-can-eat-animals',
'duration': 275,
'ext': 'mp4',
'ext': 'webm',
'title': '5 Weird Ways Plants Can Eat Animals',
'description': 'Why have some plants evolved to eat meat?',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'DNews',
'uploader_id': 'dnews',
},
}, {
'url': 'https://testtube.com/iflscience/insane-jet-ski-flipping',
'info_dict': {
'id': 'fAGfJ4YjVus',
'ext': 'mp4',
'title': 'Flipping Jet-Ski Skills | Outrageous Acts of Science',
'uploader': 'Science Channel',
'uploader_id': 'ScienceChannel',
'upload_date': '20150203',
'description': 'md5:e61374030015bae1d2e22f096d4769d6',
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
youtube_url = self._html_search_regex(
r'<iframe[^>]+src="((?:https?:)?//www.youtube.com/embed/[^"]+)"',
webpage, 'youtube iframe', default=None)
if youtube_url:
return self.url_result(youtube_url, 'Youtube', video_id=display_id)
video_id = self._search_regex(
r"player\.loadRevision3Item\('video_id',\s*([0-9]+)\);",
webpage, 'video ID')


@@ -6,8 +6,8 @@ from .common import InfoExtractor
class TF1IE(InfoExtractor):
"""TF1 uses the wat.tv player."""
_VALID_URL = r'http://(?:videos\.tf1|www\.tfou)\.fr/.*?-(?P<id>\d+)(?:-\d+)?\.html'
_TESTS = {
_VALID_URL = r'http://(?:videos\.tf1|www\.tfou|www\.tf1)\.fr/.*?-(?P<id>\d+)(?:-\d+)?\.html'
_TESTS = [{
'url': 'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
'info_dict': {
'id': '10635995',
@@ -32,7 +32,10 @@ class TF1IE(InfoExtractor):
# Sometimes wat serves the whole file with the --test option
'skip_download': True,
},
}
}, {
'url': 'http://www.tf1.fr/tf1/koh-lanta/videos/replay-koh-lanta-22-mai-2015.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
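
The effect of adding `www\.tf1` to `_VALID_URL` can be checked directly with `re.match`. The sketch below uses the two patterns and the two test URLs from the diff. The names `OLD_VALID_URL`, `NEW_VALID_URL`, `replay_url` and `legacy_url` are introduced here for illustration only.

```python
import re

OLD_VALID_URL = r'http://(?:videos\.tf1|www\.tfou)\.fr/.*?-(?P<id>\d+)(?:-\d+)?\.html'
NEW_VALID_URL = r'http://(?:videos\.tf1|www\.tfou|www\.tf1)\.fr/.*?-(?P<id>\d+)(?:-\d+)?\.html'

replay_url = 'http://www.tf1.fr/tf1/koh-lanta/videos/replay-koh-lanta-22-mai-2015.html'
legacy_url = ('http://videos.tf1.fr/auto-moto/'
              'citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html')

# The old pattern rejects the www.tf1.fr host; the new one accepts it.
old_match = re.match(OLD_VALID_URL, replay_url)
new_match = re.match(NEW_VALID_URL, replay_url)

# URLs that matched before still match with the widened pattern.
legacy_match = re.match(NEW_VALID_URL, legacy_url)
```

For the replay URL, the `id` group captures the trailing digits before `.html`, which here is `2015`. That is enough for an `only_matching` test, which checks URL matching only.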

View File

@@ -129,7 +129,9 @@ class ThePlatformIE(InfoExtractor):
head = meta.find(_x('smil:head'))
body = meta.find(_x('smil:body'))
f4m_node = body.find(_x('smil:seq//smil:video')) or body.find(_x('smil:seq/smil:video'))
f4m_node = body.find(_x('smil:seq//smil:video'))
if f4m_node is None:
f4m_node = body.find(_x('smil:seq/smil:video'))
if f4m_node is not None and '.f4m' in f4m_node.attrib['src']:
f4m_url = f4m_node.attrib['src']
if 'manifest.f4m?' not in f4m_url:
@@ -142,7 +144,9 @@ class ThePlatformIE(InfoExtractor):
formats = []
switch = body.find(_x('smil:switch'))
if switch is None:
switch = body.find(_x('smil:par//smil:switch')) or body.find(_x('smil:par/smil:switch'))
switch = body.find(_x('smil:par//smil:switch'))
if switch is None:
switch = body.find(_x('smil:par/smil:switch'))
if switch is None:
switch = body.find(_x('smil:par'))
if switch is not None:
@@ -163,7 +167,9 @@ class ThePlatformIE(InfoExtractor):
'vbr': vbr,
})
else:
switch = body.find(_x('smil:seq//smil:switch')) or body.find(_x('smil:seq/smil:switch'))
switch = body.find(_x('smil:seq//smil:switch'))
if switch is None:
switch = body.find(_x('smil:seq/smil:switch'))
for f in switch.findall(_x('smil:video')):
attr = f.attrib
vbr = int_or_none(attr.get('system-bitrate'), 1000)
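
All three hunks above replace `body.find(a) or body.find(b)` with an explicit `is None` check. The likely reason is that an ElementTree element without children has historically evaluated as false, so the `or` form can discard a node that was found and run the fallback lookup instead. A minimal sketch of the explicit pattern follows. The `find_first` helper and the namespace-free sample document are assumptions for illustration, not part of the extractor.

```python
import xml.etree.ElementTree as ET


def find_first(node, paths):
    """Hypothetical helper: first element matched by any path, else None."""
    for path in paths:
        found = node.find(path)
        if found is not None:  # explicit check; never rely on element truthiness
            return found
    return None


# Invented SMIL-like body with a single childless <video> element.
body = ET.fromstring('<body><seq><video src="manifest.f4m"/></seq></body>')

f4m_node = find_first(body, ['seq//video', 'seq/video'])
missing = find_first(body, ['seq//switch', 'seq/switch'])
```

Here `f4m_node` has no children (`len(f4m_node) == 0`), which is exactly the case where a truthiness test would be unreliable.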

View File

@@ -30,3 +30,31 @@ class TMZIE(InfoExtractor):
'description': self._og_search_description(webpage),
'thumbnail': self._html_search_meta('ThumbURL', webpage),
}
class TMZArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tmz\.com/\d{4}/\d{2}/\d{2}/(?P<id>[^/]+)/?'
_TEST = {
'url': 'http://www.tmz.com/2015/04/19/bobby-brown-bobbi-kristina-awake-video-concert',
'md5': 'e482a414a38db73087450e3a6ce69d00',
'info_dict': {
'id': '0_6snoelag',
'ext': 'mp4',
'title': 'Bobby Brown Tells Crowd ... Bobbi Kristina is Awake',
'description': 'Bobby Brown stunned his audience during a concert Saturday night, when he told the crowd, "Bobbi is awake. She\'s watching me."',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
embedded_video_info_str = self._html_search_regex(
r'tmzVideoEmbedV2\("([^)]+)"\);', webpage, 'embedded video info')
embedded_video_info = self._parse_json(
embedded_video_info_str, video_id,
transform_source=lambda s: s.replace('\\', ''))
return self.url_result(
'http://www.tmz.com/videos/%s/' % embedded_video_info['id'])
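
`TMZArticleIE` pulls an escaped JSON string out of a `tmzVideoEmbedV2(...)` call, strips the backslashes, parses it, and redirects to the video page. The same steps can be sketched with the standard library as below. The regex and URL template come from the diff. The page fragment is invented, and the real markup may differ.

```python
import json
import re

# Hypothetical page fragment with backslash-escaped quotes inside the argument.
webpage = r'<script>tmzVideoEmbedV2("{\"id\":\"0_6snoelag\",\"autoplay\":false}");</script>'

# Step 1: capture the escaped JSON argument.
raw = re.search(r'tmzVideoEmbedV2\("([^)]+)"\);', webpage).group(1)

# Step 2: drop the escaping backslashes, then parse (mirrors transform_source).
info = json.loads(raw.replace('\\', ''))

# Step 3: build the URL that is handed to the regular TMZ extractor.
video_url = 'http://www.tmz.com/videos/%s/' % info['id']
```

Removing every backslash is a blunt transform: it would also damage values that contain legitimate JSON escapes, so it is only safe if the embedded data never uses them.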

View File

@@ -10,26 +10,32 @@ from ..utils import (
class TNAFlixIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tnaflix\.com/(?P<cat_id>[\w-]+)/(?P<display_id>[\w-]+)/video(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
_TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
_DESCRIPTION_REGEX = r'<h3 itemprop="description">([^<]+)</h3>'
_CONFIG_REGEX = r'flashvars\.config\s*=\s*escape\("([^"]+)"'
_TEST = {
'url': 'http://www.tnaflix.com/porn-stars/Carmella-Decesare-striptease/video553878',
'md5': 'ecf3498417d09216374fc5907f9c6ec0',
'info_dict': {
'id': '553878',
'display_id': 'Carmella-Decesare-striptease',
'ext': 'mp4',
'title': 'Carmella Decesare - striptease',
'description': '',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 91,
'age_limit': 18,
_TESTS = [
{
'url': 'http://www.tnaflix.com/porn-stars/Carmella-Decesare-striptease/video553878',
'md5': 'ecf3498417d09216374fc5907f9c6ec0',
'info_dict': {
'id': '553878',
'display_id': 'Carmella-Decesare-striptease',
'ext': 'mp4',
'title': 'Carmella Decesare - striptease',
'description': '',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 91,
'age_limit': 18,
}
},
{
'url': 'https://www.tnaflix.com/amateur-porn/bunzHD-Ms.Donk/video358632',
'only_matching': True,
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
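
The relaxed `_VALID_URL` swaps `[\w-]+` for `[^/]+`, which matters for path segments that contain characters such as a dot. A quick check using the two patterns and the second test URL from the diff (the names `OLD`, `NEW` and `url` are introduced here for illustration):

```python
import re

OLD = (r'https?://(?:www\.)?tnaflix\.com/(?P<cat_id>[\w-]+)/'
       r'(?P<display_id>[\w-]+)/video(?P<id>\d+)')
NEW = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'

url = 'https://www.tnaflix.com/amateur-porn/bunzHD-Ms.Donk/video358632'

# The dot in the display ID is outside [\w-], so the old pattern fails.
old_match = re.match(OLD, url)
new_match = re.match(NEW, url)
```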

View File

@@ -26,7 +26,7 @@ class TutvIE(InfoExtractor):
data_content = self._download_webpage(
'http://tu.tv/flvurl.php?codVideo=%s' % internal_id, video_id, 'Downloading video info')
video_url = base64.b64decode(compat_parse_qs(data_content)['kpt'][0]).decode('utf-8')
video_url = base64.b64decode(compat_parse_qs(data_content)['kpt'][0].encode('utf-8')).decode('utf-8')
return {
'id': internal_id,
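
The added `.encode('utf-8')` passes bytes rather than text to `base64.b64decode`. As far as I know, Python 3 releases before 3.3 rejected `str` input there, so encoding first keeps the call portable. A minimal sketch follows, using `urllib.parse.parse_qs` as a stand-in for `compat_parse_qs`. The response body and the example.com URL are invented.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Build a hypothetical flvurl.php response carrying a base64-encoded URL.
encoded = base64.b64encode('http://example.com/video.flv'.encode('utf-8')).decode('ascii')
data_content = urlencode({'kpt': encoded})

# parse_qs returns text, so convert to bytes before decoding.
kpt = parse_qs(data_content)['kpt'][0]
video_url = base64.b64decode(kpt.encode('utf-8')).decode('utf-8')
```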

Some files were not shown because too many files have changed in this diff