Compare commits

330 Commits

Author SHA1 Message Date
Philipp Hagemeister
92da3cd848 release 2016.02.22 2016-02-22 11:57:31 +01:00
remitamine
6212bcb191 [tf1] fix info extraction(fixes #8599) 2016-02-22 09:57:40 +01:00
Sergey M․
d69abbd3f0 [googledrive] Make thumbnail optional (Closes #8629) 2016-02-22 03:13:18 +06:00
Sergey M․
1d00a8823e [arte] PEP 8 2016-02-22 01:32:23 +06:00
Sergey M․
5d6e1011df [pbs] Extract all formats (Closes #8538) 2016-02-22 01:23:27 +06:00
Sergey M․
f5bdb44443 [extractor/common] Add _remove_duplicate_formats 2016-02-22 01:19:39 +06:00
Yen Chi Hsuan
7efc1c2b49 [twitter] Fix metadata extraction and test_Twitter_1 2016-02-21 17:29:28 +08:00
Yen Chi Hsuan
132e3b74bd [twitter] Fix a typo 2016-02-21 17:21:37 +08:00
Yen Chi Hsuan
bdbf4ba40e [twitter:amplify] Extract more metadata 2016-02-21 17:16:35 +08:00
Yen Chi Hsuan
acb6e97e6a [twitter] Fix several failed tests 2016-02-21 16:57:56 +08:00
Yen Chi Hsuan
445d72b8b5 [twitter:amplify] Add TwitterAmplifyIE for handling Twitter smart URLs
Closes #8075
2016-02-21 16:41:24 +08:00
Sergey M․
92c5e11b40 [arte:future] Fix test 2016-02-21 14:23:58 +06:00
Sergey M․
0dd046c16c [arte:magazine] Fix test 2016-02-21 13:57:30 +06:00
Sergey M․
305168ca3e [arte:+7] Detect more embeds (Closes #8613) 2016-02-21 13:55:25 +06:00
Sergey M․
b72f6163dc [arte:+7] Improve _VALID_URL 2016-02-21 13:37:31 +06:00
Sergey M․
33d4fdabfa [extractor/generic] Add support for ok embeds (#8619) 2016-02-21 09:51:54 +06:00
remitamine
cafcf657a4 add more subtitles mime types to mimetype2ext and fix the platform subtitle extraction 2016-02-20 22:02:03 +01:00
Yen Chi Hsuan
7360db05b4 [postprocessor/embedthumbnail] Allow mkv to embed thumbnails
Fixes #6046
2016-02-21 03:32:03 +08:00
Jaime Marquínez Ferrándiz
765ac263db [utils] mimetype2ext: return 'm4a' for 'audio/mp4' (fixes #8620)
The youtube extractor was using 'mp4' for them, therefore filters like 'bestaudio[ext=m4a]' stopped working (94278f7202 broke it).
2016-02-20 19:55:10 +01:00
Yen Chi Hsuan
a4e4d7dfcd [test_iqiyi_sdk_interpreter] Add test for iQiyi login 2016-02-20 23:10:39 +08:00
Yen Chi Hsuan
73f9c2867d [iqiyi] Support playlists (closes #8019) 2016-02-20 22:44:04 +08:00
Philipp Hagemeister
9c86d50916 [faz] Future-proof XML element check 2016-02-20 14:11:44 +01:00
Yen Chi Hsuan
1d14c75f55 [Makefile] iQiyi login test requires network 2016-02-20 20:49:30 +08:00
Yen Chi Hsuan
99709cc3f1 [iqiyi] Implement _login()
Currently only email login supported
2016-02-20 19:54:58 +08:00
Yen Chi Hsuan
5bc880b988 [utils] Add OHDave's RSA encryption function 2016-02-20 19:54:58 +08:00
Yen Chi Hsuan
958759f44b [appletrailers] Extend _VALID_URL (#8524) 2016-02-20 15:54:00 +08:00
Sergey M․
86bf29050e [test_YoutubeDL] Make test pass until more intelligent sort formats (Closes #8462) 2016-02-20 03:36:03 +06:00
remitamine
04cbc4980d [mtv] improve duration extraction 2016-02-19 20:56:45 +01:00
RiCON
8765151c8a [mtv] Extract duration from each playlist item
RSS used instead of manifest files because it's exact to the millisecond
with the video I tested while in manifest it's only exact to the second.
2016-02-19 19:38:28 +00:00
Sergey M
8ec64ac683 [README.md] Clarify verbose log 2016-02-19 22:18:21 +06:00
Sergey M․
ed8648a322 [pornhub] Fix thumbnail and duration extraction (Closes #8604) 2016-02-19 21:42:46 +06:00
Sergey M․
88641243ab [pornhub:playlistbase] Improve extract entries 2016-02-18 22:30:19 +06:00
Sergey M․
40e146aa1e [pornhub:user:videos] Add extractor (Closes #8548) 2016-02-18 22:29:17 +06:00
Sergey M․
f3f9cd9234 [francetv] Improve video id regex (Closes #8563) 2016-02-18 22:09:21 +06:00
Sergey M․
ebf1b291d0 [youtube:watchlater] Respect --no-playlist 2016-02-18 22:03:46 +06:00
Sergey M․
bc7a9cd8fb [youtube:watchlater] Improve _VALID_URL (Closes #8594) 2016-02-18 21:50:21 +06:00
Sergey M․
d48502b82a [arte] Improve _VALID_URLs 2016-02-18 21:29:52 +06:00
Sergey M․
479ec54a8d [arte:magazine] Improve (Closes #8473) 2016-02-18 21:29:07 +06:00
Thomas Jost
49625662a9 [arte:magazine] Add extractor 2016-02-18 21:28:18 +06:00
remitamine
8b809a079a [cbsnews] use find_xpath_attr 2016-02-18 16:10:09 +01:00
remitamine
778433cb90 [cbsnews] extract subtitle url from theplatform SMIL manifest(fixes #8568) 2016-02-18 15:43:28 +01:00
cazulu
411cb8f476 [dailymotion] Fix view count extraction
Fix view count parsing when the thousands separator is a whitespace, e.g. '101 101'
2016-02-18 20:31:43 +06:00
Sergey M․
63bf4f0dc0 [vrt] Detect geo restriction 2016-02-17 23:28:41 +06:00
Sergey M․
80e59a0d5d [vrt] Make formats extraction non fatal (Closes #8587) 2016-02-17 23:18:23 +06:00
Sergey M․
8bbd3d1476 [arte] Fix upload date extraction (Closes #8581) 2016-02-17 22:51:08 +06:00
Sergey M․
e725e4bced [arte] PEP 8 2016-02-17 22:37:55 +06:00
Sergey M․
08d65046f0 [arte] Make sorting aware of en/es formats 2016-02-17 22:37:05 +06:00
Sergey M․
44b9745000 [arte] Extend more _VALID_URLs for en and es support 2016-02-17 21:53:53 +06:00
Sergey M․
9654fc875b [arte:+7] Fix extraction for react-based layout 2016-02-17 21:49:15 +06:00
Sergey M․
0f425e65ec [arte:+7] Add support for en and es URLs 2016-02-17 21:47:18 +06:00
Sergey M․
e277f2a63b [orf:tvthek] Check formats (Closes #8580) 2016-02-16 22:23:38 +06:00
Sergey M․
f4db09178a [xtube:user] Remove duplicated video ids 2016-02-16 22:06:26 +06:00
Sergey M․
86be3cdc2a [xtube] Fix extraction (Closes #8565) 2016-02-16 22:05:23 +06:00
Yen Chi Hsuan
cb64ccc715 [facebook] Improve error handling (#8572) 2016-02-16 09:07:38 +08:00
Sergey M․
f66a3c7bc2 [screenjunkies] Fix spelling 2016-02-16 01:30:00 +06:00
Sergey M․
fe80df3080 Credit @TingPing for screenjunkies (#8505) 2016-02-16 01:24:57 +06:00
Yen Chi Hsuan
1932476c13 [iqiyi] Omit MD5 sums for the VIP-only video 2016-02-16 02:45:21 +08:00
Sergey M․
d2c1f79f20 [youtube:searchurl] Extend _VALID_URL 2016-02-16 00:29:51 +06:00
Sergey M․
8eacae8cf9 Credit @RobinHoutevelts for canvas subtitles (#8537) 2016-02-15 22:33:32 +06:00
Sergey M․
c8a80fd818 [screenjunkies] Improve, extract more metadata and workaround subscription (Closes #8505) 2016-02-15 22:29:28 +06:00
Patrick Griffis
b9e8d7140a [screenjunkies] Add new extractor
This doesn't handle the plus only videos yet

Closes #8492
2016-02-15 22:28:36 +06:00
Sergey M․
6eff2605d6 [canvas] Add subtitles test (#8537) 2016-02-15 20:59:16 +06:00
Sergey M․
fd7a3ea4a4 [canvas] Improve subtitles (Closes #8537) 2016-02-15 20:54:01 +06:00
Robin Houtevelts
8d3eeb36d7 [Canvas] Add subtitles 2016-02-15 20:50:03 +06:00
Yen Chi Hsuan
8e0548e180 [iqiyi] Partial support for VIP-only videos
See #8569 and #8019. Currently only 6-min previews are supported
2016-02-15 19:58:24 +08:00
Philipp Hagemeister
a517bb4b1e [noz] Add new extractor 2016-02-15 00:07:16 +01:00
Sergey M․
9dcefb23a1 [laola1tv] Improve (Closes #8478) 2016-02-14 23:40:26 +06:00
Sergey M․
d9da74bc06 Credit @blackwinter for laola1tv (#8478) 2016-02-14 23:39:49 +06:00
Jens Wille
5e19323ed9 [laola1tv] Fixes for changed site layout.
* Fixed valid URLs (w/ tests).
* Fixed iframe URL extraction.
* Fixed token URL extraction.
* Fixed variable extraction.
* Fixed uploader spelling.
* Added upload_date to result dictionary.
2016-02-14 23:01:49 +06:00
Sergey M․
611c1dd96e [refactor] Single quotes consistency 2016-02-14 15:37:17 +06:00
Sergey M․
d800609c62 [refactor] Do not specify redundant None as second argument in dict.get() 2016-02-14 14:25:04 +06:00
Sergey M․
c78c9cd10d [downloader/dash] PEP 8 2016-02-14 14:13:09 +06:00
Sergey M․
e76394f36c [globo] Switch to new-style classes 2016-02-14 14:02:12 +06:00
Sergey M․
080e09557d [aes] Switch to new-style classes 2016-02-14 14:01:43 +06:00
Sergey M․
fca2e6d5a6 [dailymotion:cloud] Use idiomatic name for classmethod's first argument 2016-02-14 13:44:23 +06:00
Sergey M․
b45f2b1d6e [myvideo] Mark broken 2016-02-14 11:24:57 +06:00
remitamine
fc2e70ee90 Merge pull request #8479 from remitamine/dash_downloader
[downloader/dash] Implement dashsegments fd in terms of fragment fd
2016-02-13 21:12:33 +01:00
Sergey M․
b4561e857f [animeondemand] Add .netrc 2016-02-13 22:41:58 +06:00
Jaime Marquínez Ferrándiz
7023251239 [comedycentral] Support /shows URLs (fixes #8405) 2016-02-13 12:26:27 +01:00
Sergey M․
e2bd68c901 [animeondemand][wip] Add extractor (#8518) 2016-02-13 13:30:31 +06:00
Philipp Hagemeister
35ced3985a release 2016.02.13 2016-02-13 08:25:05 +01:00
Sergey M․
3e18700d45 [nbc] Correct test 2016-02-13 07:45:32 +06:00
Sergey M․
f9f49d87c2 [youtube] Add test for #8536 2016-02-13 05:18:58 +06:00
Sergey M․
6863631c26 [youtube] Improve multifeed videos extraction (Closes #8536) 2016-02-13 05:01:20 +06:00
Sergey M․
9d939cec48 [extractor/generic] Add direct mpd url test 2016-02-13 00:36:47 +06:00
Sergey M․
4c77d3f52a [YoutubeDL] Allow bestvideo+bestaudio for any extractor 2016-02-13 00:23:14 +06:00
Sergey M․
7be747b921 [extractor/generic] Pass mpd base url to _parse_mpd_formats 2016-02-13 00:15:59 +06:00
Sergey M․
bb20526b64 [extractor/common] Improve base url construction 2016-02-13 00:13:56 +06:00
remitamine
bcbb1b08b2 Revert "[aenetworks] extract http formats"
This reverts commit 3d98f97c64.
2016-02-12 17:56:06 +01:00
remitamine
3d98f97c64 [aenetworks] extract http formats 2016-02-12 17:39:32 +01:00
remitamine
c349456ef6 [extractor/common] strip http urls in smil manifest 2016-02-12 17:38:48 +01:00
Sergey M․
5a4905924d [extractor/generic] Improve dailymotion embed detection (Closes #8521, closes #8325) 2016-02-12 22:03:10 +06:00
Sergey M․
b826035dd5 [vimeo] Fix authentication (Closes #8520) 2016-02-12 03:16:26 +06:00
remitamine
a7cab4d039 [theplatform] remove unused import and change smil url for ThePlatformFeedIE 2016-02-11 18:50:14 +01:00
remitamine
fc3810f6d1 Merge branch 'master' of github.com:rg3/youtube-dl 2016-02-11 18:13:56 +01:00
remitamine
3dc71d82ce [theplatform] fix pid extraction in the platform feed 2016-02-11 18:13:03 +01:00
Sergey M․
9c7b38981c [utils] Bump Firefox version in User-Agent
Old version number causes Youtube not to serve some formats in ytplayer.config
2016-02-11 23:12:30 +06:00
remitamine
8b85ac3fd9 [cbc] Add new extractor(closes #3803)(closes #4731)(closes #5309) 2016-02-11 18:10:32 +01:00
remitamine
81e1c4e2fc [extractor/common] remove duplicate rtmp formats in smil manifest 2016-02-11 17:58:48 +01:00
Sergey M․
388ae76b52 [YoutubeDL] Fix format resolution when height is missing 2016-02-11 22:46:13 +06:00
Sergey M․
b67d63149d [youtube] Fix typos 2016-02-11 22:33:08 +06:00
Sergey M․
28280e8ded [plays] PEP 8 2016-02-11 22:02:57 +06:00
Sergey M․
6b3fbd3425 [pbs] Fix multi part videos extraction 2016-02-11 22:02:37 +06:00
Sergey M․
a7ab46375b [pbs] Update some tests 2016-02-11 21:43:01 +06:00
Sergey M․
b14d5e26f6 [pbs] Improve description extraction 2016-02-11 21:28:09 +06:00
Sergey M․
9a61dfba0c [pbs] Revert prefer portalplayer 2016-02-11 21:22:57 +06:00
remitamine
154c209e2d [extractor/common] improve dash format ids 2016-02-11 10:33:26 +01:00
remitamine
d1ea5e171f [plays] Add new extractor(#8458) 2016-02-11 10:30:31 +01:00
remitamine
a1188d0ed0 [crackle] add prefix to format ids 2016-02-10 22:39:33 +01:00
remitamine
47d205a646 [crackle] improve format sorting 2016-02-10 22:23:56 +01:00
remitamine
80f772c28a [crackle] Add new extractor 2016-02-10 22:16:21 +01:00
Philipp Hagemeister
f817d9bec1 release 2016.02.10 2016-02-10 16:17:38 +01:00
Sergey M․
e2effb08a4 [YoutubeDL] Sanitize format_id (Closes #8494) 2016-02-10 21:16:58 +06:00
Sergey M․
7fcea295c5 [pbs] Switch to portal player by default (Closes #8491) 2016-02-10 20:46:38 +06:00
Sergey M․
cc799437ea [youku] Report private videos (Closes #8498) 2016-02-10 20:05:17 +06:00
Sergey M․
89d23f37f2 [hotstar] Relax _VALID_URL (Closes #8487) 2016-02-10 04:43:00 +06:00
Philipp Hagemeister
b92071ef00 release 2016.02.09.1 2016-02-09 20:12:36 +01:00
Sergey M․
47246ae26c [viddler] Update tests 2016-02-10 01:12:47 +06:00
Sergey M․
9c15869c28 [viddler] Add support for secret videos (Closes #8481) 2016-02-10 01:09:07 +06:00
remitamine
51e9094f4a [extractor/common] extract youtube dash formats filesize(fixes #8480) 2016-02-09 20:05:39 +01:00
remitamine
5e3a6fec33 [fox] update test 2016-02-09 17:30:42 +01:00
remitamine
c43fe0268c [downloader/dash] Implement dashsegments fd in terms of fragment fd 2016-02-09 17:25:44 +01:00
remitamine
d413095f7e [extractor/common] remove duplicated formats and subtitles in smil manifests 2016-02-09 17:15:41 +01:00
remitamine
1bedf4de06 [fox] extract http formats 2016-02-09 17:12:34 +01:00
Sergey M․
3967a761f4 [mailru] Fix tests 2016-02-09 21:31:51 +06:00
Sergey M․
b081350bd9 [mailru] Improve and modernize 2016-02-09 21:30:48 +06:00
Sergey M․
16f1430ba6 [mailru] Prefer metaUrl API (Closes #8474) 2016-02-09 21:14:02 +06:00
Philipp Hagemeister
085ad71157 release 2016.02.09 2016-02-09 12:57:51 +01:00
Sergey M․
35972ba172 [vk] Improve rutube embeds detection (Closes #8461) 2016-02-08 21:30:23 +06:00
Sergey M․
3834d3e35c [youtube] Clarify itag 36 height and abr (Closes #8457) 2016-02-08 01:30:57 +06:00
Sergey M
8d0a2a2a4e [README.md] Fix typo 2016-02-07 21:23:29 +06:00
Sergey M
11c0339bec [README.md] Clarify quotes in output template 2016-02-07 21:22:33 +06:00
Sergey M
915dd77783 [README.md] Add output template example for streaming to stdout 2016-02-07 21:21:14 +06:00
Sergey M․
b6bfa6fb79 [konserthusetplay] Reorder code pieces 2016-02-07 21:18:32 +06:00
Sergey M․
f070197bd7 [konserthusetplay] Improve _VALID_URL 2016-02-07 21:16:31 +06:00
Sergey M․
5a7699bb2e [konserthusetplay] Improve and extract all formats (Closes #8381) 2016-02-07 21:11:59 +06:00
ovitei
8628d26f38 [KonserthusetPlay] Add new extractor (partial support) 2016-02-07 19:47:29 +06:00
Sergey M․
8411229bd5 [utils] Allow dot in strip_jsonp 2016-02-07 19:47:09 +06:00
Sergey M
72b9ebc65d [README.md] Document extractor sequences in output template 2016-02-07 19:08:54 +06:00
Sergey M
3b799ca14c [README.md] Clarify percent literal and output to stdout 2016-02-07 19:06:42 +06:00
Sergey M
0474512e30 [README.md] Document even more sequences in output template 2016-02-07 19:00:59 +06:00
Sergey M
f0905c6ec3 [README.md] Document more sequences in output template 2016-02-07 18:45:44 +06:00
Sergey M․
86296ad2cd [utils] Add ability to control skipping false values in dict_get 2016-02-07 08:13:04 +06:00
Sergey M․
52f5889f77 [vlive] Improve and extract more metadata (Closes #8446) 2016-02-07 06:17:40 +06:00
Sergey M․
81e0b4f2d1 Credit @EraYaN for vlive update (#8446) 2016-02-07 06:14:26 +06:00
Sergey M․
cbecc9b903 [utils] Add dict_get convenience method 2016-02-07 06:12:53 +06:00
Erwin de Haan
b8b465af3e [vlive] Updated to new V App/VLive api.
More robust with getting keys and ids from website.
2016-02-07 05:27:17 +06:00
pulpe
59b35c6745 [IPrima] Remove test video_id 2016-02-06 21:42:24 +01:00
Jaime Marquínez Ferrándiz
7032833011 [iprima] Follow pep8 2016-02-06 21:37:28 +01:00
pulpe
f406c78785 [IPrima] Fix extractor (fixes #7617) 2016-02-06 21:23:41 +01:00
Sergey M
f326b5837a Merge pull request #8445 from bpfoley/rte-newurl
[rte:radio] Add support for RTMP downloads, alternate URL style
2016-02-07 00:28:39 +05:00
Brian Foley
5dd4b3468f [rte:radio] Add support for RTMP downloads, alternate URL style
This is useful as
a) RTMP downloads are a good deal faster to download
b) Older items are available only as RTMP streams
2016-02-06 18:42:57 +00:00
Jaime Marquínez Ferrándiz
d4f8e83404 [FFmpegSubtitlesConvertorPP] remove unused variable 2016-02-06 19:04:53 +01:00
Jaime Marquínez Ferrándiz
7b8b007cd9 [FFmpegSubtitlesConvertorPP] remove intermediate srt files 2016-02-06 19:04:18 +01:00
Jaime Marquínez Ferrándiz
3547d26587 [FFmpegSubtitlesConvertorPP] correctly update the extension (fixes #8444) 2016-02-06 18:58:18 +01:00
Jaime Marquínez Ferrándiz
7e62c2eb6d [FFmpegSubtitlesConvertorPP] fix not working when srt is used as the intermediate format between ttml/dfxp and other format
It was trying to use the ttml/dfxp file with ffmpeg, which doesn't have support for them.
I broke it in e04398e397.
2016-02-06 18:51:05 +01:00
Sergey M․
56401e1e5f [downloader/hls] Do not send 'q' to ffmpeg on Windows (Closes #8300) 2016-02-06 23:24:22 +06:00
Sergey M
860db2d508 [README.md] Fix typo 2016-02-06 22:40:20 +06:00
Sergey M
4b8874975c [README.md] Remove non-relevant info 2016-02-06 22:39:50 +06:00
Sergey M
bd6b6f6622 [README.md] Fix typo 2016-02-06 22:36:30 +06:00
Sergey M․
4340727e6c [videomore] Fix typo 2016-02-06 22:36:30 +06:00
Sergey M
3ceccade87 [README.md] Improve output template documentation and add more examples 2016-02-06 22:33:49 +06:00
remitamine
28ad7df65d [generic] detect MPD manifest only from the content 2016-02-06 14:51:45 +01:00
Sergey M․
79a3508579 [extractor/generic] Detect DASH manifests in found URLs and extract mpd formats 2016-02-06 19:42:03 +06:00
Sergey M․
1b840245bd [extractor/generic] Detect DASH manifests and extract mpd formats 2016-02-06 19:35:32 +06:00
remitamine
6a3828fddd [common] use float conversion instead of using division from __future__ 2016-02-06 14:27:04 +01:00
remitamine
91cb6b5065 rename _parse_mpd to _parse_mpd_formats and add default value for mpd namespace 2016-02-06 14:03:48 +01:00
remitamine
0826a0b555 [common] sort dash formats 2016-02-06 06:52:48 +01:00
remitamine
bcbbb98bfe [generic] extract dash formats detected using content type 2016-02-06 06:47:38 +01:00
remitamine
66159b38aa Merge pull request #8408 from remitamine/dash
Add generic support for mpd manifests(dash formats)
2016-02-06 06:26:02 +01:00
Sergey M․
23d17e4beb [youtube] Fix automatic captions 2016-02-06 06:44:38 +06:00
Sergey M․
d97b0e3241 [vidme] Clarify IE_NAMEs 2016-02-05 23:58:26 +06:00
Sergey M․
eb2533ec4c [vidme:user:likes] Add extractor 2016-02-05 23:57:35 +06:00
Sergey M․
b7b365067f [vidme:user] Add extractor (Closes #8416) 2016-02-05 23:32:22 +06:00
remitamine
86e284e028 Merge branch 'master' of github.com:rg3/youtube-dl 2016-02-05 17:20:00 +01:00
Sergey M․
d9e543b680 [spankbang] Add test with single format (#8398) 2016-02-05 22:16:56 +06:00
Sergey M․
c773c232d8 [spankbang] Check formats (#8398) 2016-02-05 22:16:40 +06:00
Sergey M․
58ae24336a [spankbang] Extend format id regex (Closes #8398) 2016-02-05 22:16:12 +06:00
remitamine
7d3a035ee0 [ffmpeg] check for m3u8 protocol in FFmpegMetadataPP 2016-02-05 17:12:49 +01:00
Philipp Hagemeister
e06e75c7e7 release 2016.02.05.1 2016-02-05 15:17:31 +01:00
remitamine
593e0f43b4 [ffmpeg] fix condition(fixes #8440) 2016-02-05 15:12:06 +01:00
Philipp Hagemeister
008ab0f814 release 2016.02.05 2016-02-05 11:04:00 +01:00
Jaime Marquínez Ferrándiz
3f7e8750d4 [arte.tv:+7] Fix extraction (fixes #8427) 2016-02-04 20:16:47 +01:00
Philipp Hagemeister
f1ed3acae5 release 2016.02.04 2016-02-04 13:39:26 +01:00
remitamine
920d21b9d3 [test_subtitles] update youtube subtitles tests 2016-02-04 08:50:55 +01:00
remitamine
2fb35d1c28 [youtube] fix subtitle order 2016-02-04 08:39:01 +01:00
remitamine
09be85b8dd [youtube] fix subtitle extraction(fixes #8415) 2016-02-04 08:28:37 +01:00
remitamine
eadc3ccd50 [generic] extract m3u8 formats when mpegurl content type detected 2016-02-04 01:25:36 +01:00
remitamine
255732f0d3 [common] fix segment duration calculation 2016-02-03 23:57:08 +01:00
remitamine
53c269c6fd [common] fix media_template string formatting 2016-02-03 23:54:34 +01:00
remitamine
675d001633 [common] skip drm protected dash formats 2016-02-03 18:44:43 +01:00
Yen Chi Hsuan
58be922079 [kuwo] Check for georestriction 2016-02-04 01:26:25 +08:00
Sergey M
c84d3a557d [README.md] Clarify unavailable sequences in output format 2016-02-03 19:18:25 +05:00
remitamine
d577c79632 [common] ignore ISO 639-2 generic codes 2016-02-03 13:24:07 +01:00
remitamine
6ad2b01e14 [srgssr] use flv as ext for rtmp formats 2016-02-02 23:09:50 +01:00
remitamine
fd3a1f3d60 [cbsnews] add support for live videos(fixes #7010) 2016-02-02 23:02:18 +01:00
Jaime Marquínez Ferrándiz
87de7069b9 [utils] dfxp2srt: make TTMLPElementParser inherit from object
For consistency between python 2 and 3.
2016-02-02 22:30:13 +01:00
remitamine
6fba62c87a [ffmpeg] fix adding metadata when using --hls-prefer-native(#8350) 2016-02-02 22:14:23 +01:00
remitamine
f14be22816 [common] remove duplicate reference to namespace 2016-02-02 22:02:08 +01:00
Yen Chi Hsuan
1df4141196 [test_YoutubeDL] Fix test_youtube_format_selection
Broken since a6c2c24479. Thanks to
@jaimeMF and @anisse for pointing that out
2016-02-03 03:42:37 +08:00
remitamine
fae45ede08 Merge pull request #8354 from remitamine/m3u8_metadata
[ffmpeg] fix adding metadata when using m3u8_native(fixes #8350)
2016-02-02 19:13:58 +01:00
remitamine
4e0cff2a50 Merge pull request #8348 from remitamine/dfxp2srt-text
[utils] fix dfxp2srt text extraction(fixes #8055)
2016-02-02 18:36:26 +01:00
remitamine
9c74423510 [common] fix media template regex 2016-02-02 18:30:31 +01:00
remitamine
5976e7ab57 [vevo] add support for dash formats 2016-02-02 18:13:01 +01:00
remitamine
a1a22572fb [downloader/dash] make initialization_url optional 2016-02-02 18:12:32 +01:00
remitamine
c11875b328 [facebook] use _parse_mpd 2016-02-02 18:11:16 +01:00
remitamine
8ff648e4f9 [youtube] use _extract_mpd_formats 2016-02-02 18:10:23 +01:00
remitamine
1bac34556f [common] add a generic support for mpd manifests 2016-02-02 18:09:25 +01:00
Sergey M․
0436157b95 [vk:uservideos] Improve _VALID_URL (Closes #8389) 2016-02-02 00:52:37 +06:00
Philipp Hagemeister
ae0db349c1 release 2016.02.01 2016-02-01 12:00:32 +01:00
Yen Chi Hsuan
08411970d5 Merge pull request #8374 from yan12125/facebook-dash
Facebook DASH formats
2016-02-01 18:31:49 +08:00
Yen Chi Hsuan
dc724e0c8b [daum.net:user] Match more URLs (#1952) 2016-02-01 18:26:23 +08:00
Yen Chi Hsuan
0a5d1ec706 Merge branch 'ping-daum-playlist-user' 2016-02-01 18:19:32 +08:00
Yen Chi Hsuan
58250eff2b [daum] Update test_daum_1 2016-02-01 18:19:02 +08:00
Yen Chi Hsuan
11a4efc505 [daum] Do not match a single URL with multiple info extractors 2016-02-01 18:15:53 +08:00
Yen Chi Hsuan
7537b35fb8 [daum] PEP8 2016-02-01 17:40:35 +08:00
Yen Chi Hsuan
33cc74eeeb Merge branch 'daum-playlist-user' of https://github.com/ping/youtube-dl into ping-daum-playlist-user 2016-02-01 17:29:12 +08:00
Yen Chi Hsuan
f021acee49 [kickstarter] Fix title and test_kickstarter
It's the description page that contains a video. The original URL is now
the calendar.
2016-02-01 17:21:40 +08:00
Yen Chi Hsuan
abe694ca95 [kickstarter] Eliminate the warning message and add_ie 2016-02-01 17:10:11 +08:00
Yen Chi Hsuan
b286f201a8 [YoutubeDL] Do not override ie_key in url_transparent 2016-02-01 17:05:48 +08:00
Yen Chi Hsuan
bd93a12e85 [vidzi] Fix _TESTS 2016-02-01 17:03:31 +08:00
Yen Chi Hsuan
92769650fa [vidzi] Fix extraction
Closes #8386.

Vidzi.tv now uses jwplayer, which can be handled by GenericIE
2016-02-01 15:40:42 +08:00
Yen Chi Hsuan
dc4fe5c6d7 [allocine] Use xpath_element 2016-02-01 05:32:28 +08:00
Yen Chi Hsuan
566bda51f2 [bpb] Fix extraction and update tests 2016-02-01 05:00:09 +08:00
Yen Chi Hsuan
f63757ec35 [allocine] Fix for Python 2.6
Python 2.6 does not support .// syntax in find(). Fortunately, the
interested node is at the top level
2016-02-01 03:34:02 +08:00
Yen Chi Hsuan
7a0ed06909 [allocine] Fix extraction of test_allocine_1 and update tests 2016-02-01 03:31:58 +08:00
Yen Chi Hsuan
9934fe76be [acast] Remove ACastBaseIE
No longer necessary as _API_BASE_URL is used by ACastChannelIE only
2016-02-01 03:08:46 +08:00
Yen Chi Hsuan
a8aad21001 [acast] Fix extraction 2016-02-01 03:07:45 +08:00
Yen Chi Hsuan
d055bf91cc Merge branch 'rrooij-gamekings_fix' 2016-02-01 02:21:02 +08:00
Yen Chi Hsuan
0e1b1a011d [gamekings] Stricter checks 2016-02-01 02:19:03 +08:00
Yen Chi Hsuan
eab3c2895c [gamekings] add_ie 2016-02-01 02:15:25 +08:00
Yen Chi Hsuan
163da6a484 [gamekings] Add MD5 back
The test is now a YouTube video, whose MD5 should be stable
2016-02-01 02:13:11 +08:00
Yen Chi Hsuan
324916d11a Merge branch 'gamekings_fix' of https://github.com/rrooij/youtube-dl into rrooij-gamekings_fix 2016-02-01 02:11:25 +08:00
Jaime Marquínez Ferrándiz
3ccb0655c1 [youtube] Use 'orderedSet' instead of 'set' to preserve the order 2016-01-31 15:11:00 +01:00
Jaime Marquínez Ferrándiz
e04398e397 [FFmpegSubtitlesConvertorPP] delete old subtitle files (fixes #8382) 2016-01-31 14:22:36 +01:00
Yen Chi Hsuan
231ea2a3bb [xuite] Replace the test case with my uploaded one 2016-01-31 20:21:57 +08:00
Yen Chi Hsuan
b99d88c6a1 [youporn] Fix uploader and description 2016-01-31 20:12:43 +08:00
Yen Chi Hsuan
189d72d5fd [test_subtitles] Fix TestRaiSubtitles
RaiIE is renamed to RaiTVIE in 06d5556dfa
2016-01-31 20:12:43 +08:00
Yen Chi Hsuan
a7aab0c23e [test_youtube_lists] Fix TestYoutubeLists.test_youtube_course
Youtube entries are now generators
2016-01-31 20:12:43 +08:00
Philipp Hagemeister
a69bee4762 release 2016.01.31 2016-01-31 12:57:18 +01:00
Sergey M․
9acd33094d [youtube] Filter duplicates in playlists base extractor 2016-01-31 17:52:02 +06:00
Sergey M․
8e7aad2075 [youtube] Use authentication for entry list base extractor (Closes #8380) 2016-01-31 17:49:59 +06:00
rrooij
ce5879fa14 [Gamekings] Fix viewing of old videos
Some old videos that aren't on Vimeo are being uploaded to YouTube under the
'Gamekings Vault' channel. They use YouTube now for some videos as video
hosting instead of Vimeo or their own hosting. The first test failed to
succeed under the existing code, but works now by using the YouTube
extractor.

The Regex is changed to find the new gogoVideo JavaScript line with the
YouTube embed. Checking if there is a YouTube embed is done by a String
find, which is probably not the best method of checking this.
2016-01-31 00:20:46 +01:00
Yen Chi Hsuan
7b7507d6e1 [letv] Fix LetvCloud extraction 2016-01-31 07:15:43 +08:00
rrooij
14823decf3 [Gamekings] Fix url from .tv to .nl
Gamekings doesn't use the .tv top level domain anymore, but the regular
domain for Dutch sites.
2016-01-31 00:03:23 +01:00
Sergey M․
673fb82e65 [schooltv] Improve video id regex 2016-01-31 04:41:18 +06:00
Sergey M
181cf24bc0 Merge pull request #8376 from rrooij/schooltv
[schooltv] Add extractor for SchoolTV playlists
2016-01-31 04:36:33 +06:00
rrooij
89f2602880 [schooltv] Add extractor for SchoolTV playlists
This closes #8163
2016-01-30 23:21:42 +01:00
Yen Chi Hsuan
db9b1dbcd9 [nba] Add ext for hls formats and fix test_NBA 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
e881c4bcab [nbc] Use NBC's id and fix _TESTS
ThePlatform URL gives the same ID for all _TESTS
2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
670ad51ade [nrktv] Fix _TESTS 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
eb6fc7d32a [senateisvp] Fix test_SenateISVP and test_SenateISVP_1 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
ed1a390583 [tv2] Fix test_TV2 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
809e1857c5 [screenwavemedia] Fix HLS extension and test_TeamFour 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
7c38af48b9 [vgtv] Fix test_VGTV_2 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
60ad3eb970 [viidea] Skip download for the test case requiring ffmpeg 2016-01-31 04:58:10 +08:00
Sergey M․
a7685b3a6b [npo] Add extension for m3u8 2016-01-31 02:38:28 +06:00
remitamine
8f1fddc816 [limelight] fix format sorting and make m3u8 and f4m extraction non fatal 2016-01-30 20:51:47 +01:00
remitamine
1bf996fa5c [generic] Add support for Limelight API 2016-01-30 20:45:56 +01:00
Yen Chi Hsuan
248ae880b6 [facebook] Add md5 for the test case with DASH 2016-01-30 23:01:19 +08:00
Yen Chi Hsuan
2d2fa82d17 [common] Add _extract_dash_manifest_formats 2016-01-30 22:52:23 +08:00
Yen Chi Hsuan
c94678957f [common] Remove unused arguments 2016-01-30 22:45:16 +08:00
Yen Chi Hsuan
16f38a699f [common] Rename to namespace
For consistency with _parse_smil_*
2016-01-30 22:40:56 +08:00
Yen Chi Hsuan
a6c2c24479 [youtube] Remove '(v|a)codec': 'none' entries
Not used anymore
2016-01-30 22:28:53 +08:00
Sergey M․
b8c9926c0a [downloader/f4m] Do not update fragment list while test 2016-01-30 19:43:25 +06:00
Yen Chi Hsuan
df374b5222 [common] Prefer the manifest than formats_dict in determining codecs 2016-01-30 21:42:27 +08:00
Yen Chi Hsuan
5ea1eb78f5 [common] Fix for youtube 2016-01-30 21:36:01 +08:00
Yen Chi Hsuan
5d2c0fd9ba [youtube] Pass self._formats to _parse_dash_manifest 2016-01-30 21:32:15 +08:00
Yen Chi Hsuan
0803753fea [facebook] Add support for DASH manifests 2016-01-30 21:31:53 +08:00
Sergey M․
2c2f1efdcd [downloader/fragment] Remove superfluous whitespace 2016-01-30 19:30:31 +06:00
Yen Chi Hsuan
b323e1707d [common] Modify _parse_dash_manifest for use in Facebook 2016-01-30 21:27:43 +08:00
Sergey M․
09104e9930 [downloader/f4m] Add live stream flag to context
Now download progress for f4m livestreams is reported correctly
2016-01-30 19:22:15 +06:00
Sergey M․
5fa1702ca6 [downloader/fragment] Do not report total bytes estimation and eta for live streams 2016-01-30 19:20:52 +06:00
Yen Chi Hsuan
17b598d30c [common] _parse_dash_manifest() from youtube.py 2016-01-30 21:05:55 +08:00
Sergey M․
53be8894e4 [options] Add missing closing parenthesis 2016-01-30 18:44:22 +06:00
Sergey M․
c3deacd562 [matchtv] Add extractor (Closes #8313) 2016-01-30 18:30:27 +06:00
Sergey M․
8ab3fe81d8 [downloader/f4m] Prefer bootstrap url attribute over inline bootstrap info 2016-01-30 18:28:38 +06:00
ping
2f0a33d8a3 [daum.net] Support for playlists, user channels 2016-01-30 20:10:36 +08:00
Yen Chi Hsuan
05d0d131a7 [youtube] Move decrypt_sig out of _parse_dash_manifest 2016-01-30 20:05:56 +08:00
Yen Chi Hsuan
c140629995 [facebook] Support alternative webpage form
Fixes #8371
2016-01-30 19:33:22 +08:00
Jaime Marquínez Ferrándiz
7d106a65ca Add --hls-use-mpegts option
When using the mpegts container hls videos can be played while being downloaded (useful if you are recording a live stream).
VLC and mpv play them fine, but QuickTime doesn't.
2016-01-30 12:26:40 +01:00
Yen Chi Hsuan
0179f6a830 [daum] Add 'thumbnail' to all _TESTS 2016-01-30 16:54:14 +08:00
Yen Chi Hsuan
830afe85dc [daum.net] Support VodPlayer.swf URLs (closes #8173) 2016-01-30 16:50:13 +08:00
Yen Chi Hsuan
8bf39420b4 Merge remote-tracking branch 'upstream/master' 2016-01-30 16:25:55 +08:00
Yen Chi Hsuan
71d08b3e29 Merge branch 'ping-daum-fix-clip' 2016-01-30 16:25:06 +08:00
Yen Chi Hsuan
06ffa33485 [daum.net] Move the request to ClipInfoXml.do
To reduce the number of wasted requests
2016-01-30 16:23:37 +08:00
Yen Chi Hsuan
874e05975b Merge branch 'daum-fix-clip' of https://github.com/ping/youtube-dl into ping-daum-fix-clip 2016-01-30 16:22:37 +08:00
ping
f5d30d521c [daum] Fix add view_count, comment_count to test 2016-01-30 11:09:30 +08:00
ping
e047922be0 [daum] Fix copy-paste mistake 2016-01-30 11:04:11 +08:00
Sergey M․
83ab8a79cc [espn] Improve video id extraction (Closes #8368) 2016-01-30 01:48:54 +06:00
Sergey M․
350cf045d8 [extractor/common] Restrict checks when auto calculating tbr 2016-01-30 01:47:46 +06:00
Sergey M․
68a0ea15b4 [cspan] Unescape path (Closes #8365) 2016-01-30 00:26:33 +06:00
Jaime Marquínez Ferrándiz
2b4f5e68d1 [azubu] Add extractor for live streams (closes #8343) 2016-01-29 15:36:33 +01:00
Philipp Hagemeister
055f417278 release 2016.01.29 2016-01-29 12:20:08 +01:00
Jaime Marquínez Ferrándiz
70029bc348 [youtube:user] Require 'https?://' in the url (fixes #8356)
It was matching www.youtube.com/embed/WpfukLMe1TM.
The generic extractor automatically adds http:// if it's missing.
2016-01-29 11:27:11 +01:00
remitamine
cf57433bbd [ffmpeg] fix adding metadata when using m3u8_native(fixes #8350) 2016-01-28 18:57:32 +01:00
Sergey M․
1ac6e794cb [bbc] Add test for #8147 2016-01-28 23:27:48 +06:00
Sergey M․
a853427427 [bbc] Add another description regex 2016-01-28 23:23:13 +06:00
Sergey M․
50e989e263 [bbc] Add another title regex (Closes #8340) 2016-01-28 23:19:53 +06:00
Sergey M․
10e6ed9341 [ok] Add support for mobile URLs (Closes #8345) 2016-01-28 22:56:49 +06:00
Sergey M․
38c84acae5 [ndr:embed:base] Add missing ext for m3u8 2016-01-28 22:50:18 +06:00
Yen Chi Hsuan
29f46c2bee Credit @dyn888 for improving format selection
[ci skip]
2016-01-28 22:56:59 +08:00
Yen Chi Hsuan
39c10a2b6e Merge pull request #8346 from dyn888/dyn888-regex-1
Regex pattern update to match more codecs (fixes #6858)
2016-01-28 22:22:43 +08:00
dyn888
b913348d5f Test codec with a dot '.' in name selection. 2016-01-28 15:07:33 +01:00
remitamine
2b14cb566f [utils] fix dfxp2srt text extraction(fixes #8055) 2016-01-28 12:38:34 +01:00
dyn888
b0df5223be Update YoutubeDL.py 2016-01-28 12:07:15 +01:00
Sergey M․
ed7cd1e859 [cbsnews] Remove unused import 2016-01-28 00:42:04 +06:00
remitamine
f125d9115b [cbsnews] extract all formats 2016-01-27 19:11:21 +01:00
remitamine
a9d5f12fec Merge pull request #8328 from remitamine/hls-master-detect
[extractor/common] detect media playlist in _extract_m3u8_formats
2016-01-27 18:07:30 +01:00
remitamine
7f32e5dc35 [extractor/common] detect media playlist in _extract_m3u8_formats 2016-01-27 17:53:42 +01:00
Sergey M․
c3111ab34f [spankbang] Fix title extraction (Closes #8329) 2016-01-27 21:49:56 +06:00
Sergey M․
9339774af2 [spankbang] Fix formats extraction 2016-01-27 21:49:39 +06:00
Sergey M․
b0d21deda9 [extractor/common] Auto calculate tbr when missing 2016-01-27 21:11:17 +06:00
Philipp Hagemeister
fab6f0e65b release 2016.01.27 2016-01-27 08:32:03 +01:00
ping
b6c33fd544 [daum.net] Fixes #8331 2016-01-27 12:48:00 +08:00
Sergey M․
fb4b345800 [instagram] Make description optional (Closes #8326) 2016-01-26 21:46:51 +06:00
Sergey M․
af9c2a07ae [cspan] Extract from path when no qualities (Closes #8317) 2016-01-26 21:29:42 +06:00
remitamine
ab180fc648 Merge branch 'master' of github.com:rg3/youtube-dl 2016-01-26 15:55:38 +01:00
remitamine
682f8c43b5 [vevo] fallback to youtube video only if vevo video is geo restricted(fixes 8263)(fixes 2874) 2016-01-26 15:54:32 +01:00
Sergey M․
f693213567 [cspan] Fix clip/prog id extraction (#8317) 2016-01-26 20:42:20 +06:00
remitamine
9165d6bab9 [vevo] extract metadata and formats from api if videoinfo is empty
This was fixed by @yan12125 in ff51983e15.
I only added some code to extract video metadata and more formats from the API.
2016-01-26 13:46:58 +01:00
remitamine
2975fe1a7b [vevo] extract all formats and bypass geo restriction 2016-01-25 22:35:06 +01:00
Sergey M․
de691a498d [facebook:post] Add extractor (Closes #8321) 2016-01-25 22:18:34 +06:00
Sergey M․
2e6e742c3c [facebook] Add shortcut and reformat _VALID_URL 2016-01-25 22:15:21 +06:00
Yen Chi Hsuan
e9bd0f772b Merge pull request #8130 from dyn888/master
[youtube] added vcodec/acodec/abr for multiple itags
2016-01-25 01:15:11 +08:00
Yen Chi Hsuan
77f785076f [common] Keep full codec name from m3u8 manifests
See #8293. This is for consistency between YouTube and HLS formats.
2016-01-25 01:03:46 +08:00
Yen Chi Hsuan
94278f7202 [youtube] Prefer info from YouTube than _formats (#8293) 2016-01-25 01:02:19 +08:00
Yen Chi Hsuan
a0d8d704df [utils] Reorder items in mimetype2ext alphabetically 2016-01-25 01:01:15 +08:00
Yen Chi Hsuan
f6861ec96f [utils] Add more items to mimetype2ext (#8293)
These are used in Youtube formats
2016-01-25 00:58:53 +08:00
dyn888
e1a0bfdffe [youtube] added vcodec/acodec/abr for multiple itags
Should make downloading with filters more precise and easier, ie. bestvideo[vcodec=h264]. By default a lot of codecs are specified as avc1.xxxxxx and unique for each format, which makes them unusable for bestvideo selection.
2016-01-03 04:11:19 +01:00
153 changed files with 4132 additions and 1409 deletions

View File

@@ -155,3 +155,8 @@ Vignesh Venkat
Tom Gijselinck
Founder Fang
Andrew Alexeyew
Saso Bezlaj
Erwin de Haan
Jens Wille
Robin Houtevelts
Patrick Griffis

View File

@@ -1,6 +1,6 @@
**Please include the full output of youtube-dl when run with `-v`**, i.e. add `-v` flag to your command line, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ youtube-dl -v http://www.youtube.com/watch?v=BaW_jenozKcj
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']

View File

@@ -44,7 +44,7 @@ test:
ot: offlinetest
offlinetest: codetest
nosetests --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py
nosetests --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
tar: youtube-dl.tar.gz

105
README.md
View File

@@ -173,6 +173,10 @@ which means you can modify it, redistribute it or use it however you like.
expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of
ffmpeg (experimental)
--hls-use-mpegts Use the mpegts container for HLS videos,
allowing to play the video while
downloading (some players may not be able
to play it)
--external-downloader COMMAND Use the specified external downloader.
Currently supports
aria2c,axel,curl,httpie,wget
@@ -438,28 +442,97 @@ On Windows you may also need to setup the `%HOME%` environment variable manually
The `-o` option allows users to indicate a template for the output file names. The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences have the format `%(NAME)s`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a lowercase S. Allowed names are:
- `id`: The sequence will be replaced by the video identifier.
- `url`: The sequence will be replaced by the video URL.
- `uploader`: The sequence will be replaced by the nickname of the person who uploaded the video.
- `upload_date`: The sequence will be replaced by the upload date in YYYYMMDD format.
- `title`: The sequence will be replaced by the video title.
- `ext`: The sequence will be replaced by the appropriate extension (like flv or mp4).
- `epoch`: The sequence will be replaced by the Unix epoch when creating the file.
- `autonumber`: The sequence will be replaced by a five-digit number that will be increased with each download, starting at zero.
- `playlist`: The sequence will be replaced by the name or the id of the playlist that contains the video.
- `playlist_index`: The sequence will be replaced by the index of the video in the playlist padded with leading zeros according to the total length of the playlist.
- `format_id`: The sequence will be replaced by the format code specified by `--format`.
- `duration`: The sequence will be replaced by the length of the video in seconds.
- `id`: Video identifier
- `title`: Video title
- `url`: Video URL
- `ext`: Video filename extension
- `alt_title`: A secondary title of the video
- `display_id`: An alternative identifier for the video
- `uploader`: Full name of the video uploader
- `creator`: The main artist who created the video
- `release_date`: The date (YYYYMMDD) when the video was released
- `timestamp`: UNIX timestamp of the moment the video became available
- `upload_date`: Video upload date (YYYYMMDD)
- `uploader_id`: Nickname or id of the video uploader
- `location`: Physical location where the video was filmed
- `duration`: Length of the video in seconds
- `view_count`: How many users have watched the video on the platform
- `like_count`: Number of positive ratings of the video
- `dislike_count`: Number of negative ratings of the video
- `repost_count`: Number of reposts of the video
- `average_rating`: Average rating given by users; the scale used depends on the webpage
- `comment_count`: Number of comments on the video
- `age_limit`: Age restriction for the video (years)
- `format`: A human-readable description of the format
- `format_id`: Format code specified by `--format`
- `format_note`: Additional info about the format
- `width`: Width of the video
- `height`: Height of the video
- `resolution`: Textual description of width and height
- `tbr`: Average bitrate of audio and video in KBit/s
- `abr`: Average audio bitrate in KBit/s
- `acodec`: Name of the audio codec in use
- `asr`: Audio sampling rate in Hertz
- `vbr`: Average video bitrate in KBit/s
- `fps`: Frame rate
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `filesize`: The number of bytes, if known in advance
- `filesize_approx`: An estimate for the number of bytes
- `protocol`: The protocol that will be used for the actual download
- `extractor`: Name of the extractor
- `extractor_key`: Key name of the extractor
- `epoch`: Unix epoch when creating the file
- `autonumber`: Five-digit number that will be increased with each download, starting at zero
- `playlist`: Name or id of the playlist that contains the video
- `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
Available for the video that belongs to some logical chapter or section:
- `chapter`: Name or title of the chapter the video belongs to
- `chapter_number`: Number of the chapter the video belongs to
- `chapter_id`: Id of the chapter the video belongs to
Available for the video that is an episode of some series or programme:
- `series`: Title of the series or programme the video episode belongs to
- `season`: Title of the season the video episode belongs to
- `season_number`: Number of the season the video episode belongs to
- `season_id`: Id of the season the video episode belongs to
- `episode`: Title of the video episode
- `episode_number`: Number of the video episode within a season
- `episode_id`: Id of the video episode
Each of the aforementioned sequences, when referenced in an output template, will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present, since they depend on the metadata obtained by a particular extractor; such sequences will be replaced with `NA`.
For example, for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
The output template can also contain an arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'`, which will result in downloading each video into a directory corresponding to this path template. Any missing directory will be automatically created for you.
To use a literal percent sign in an output template, use `%%`. To output to stdout, use `-o -`.
The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
Examples (note on Windows you may need to use double quotes instead of single):
```bash
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc
$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc
youtube-dl test video ''_ä↭𝕐.mp4 # All kinds of weird characters
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc --restrict-filenames
$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc --restrict-filenames
youtube-dl_test_video_.mp4 # A simple file name
# Download YouTube playlist videos in separate directory indexed by video order in a playlist
$ youtube-dl -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re
# Download Udemy course keeping each chapter in separate directory under MyVideos directory in your home
$ youtube-dl -u user -p password -o '~/MyVideos/%(playlist)s/%(chapter_number)s - %(chapter)s/%(title)s.%(ext)s' https://www.udemy.com/java-tutorial/
# Download entire series season keeping each series and each season in separate directory under C:/MyVideos
$ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" http://videomore.ru/kino_v_detalayah/5_sezon/367617
# Stream the video being downloaded to stdout
$ youtube-dl -o - BaW_jenozKc
```
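The substitution rules described in this README hunk behave much like Python's dict-based `%` formatting. The following is a minimal sketch under that assumption, not youtube-dl's actual code: `expand_template` is a hypothetical helper, and the metadata dict is made up from the README's own example values. It shows a normal expansion, a missing field falling back to `NA`, and `%%` producing a literal percent sign.

```python
import collections


def expand_template(template, info):
    """Expand %(NAME)s sequences; unknown names become 'NA' (illustrative only)."""
    fields = collections.defaultdict(lambda: 'NA')
    # Skip None values so that absent metadata also falls back to 'NA'.
    fields.update((k, v) for k, v in info.items() if v is not None)
    return template % fields


# Hypothetical metadata, using the title and id from the README example.
info = {'title': 'youtube-dl test video', 'id': 'BaW_jenozKcj', 'ext': 'mp4'}

print(expand_template('%(title)s-%(id)s.%(ext)s', info))
print(expand_template('%(uploader)s/%(title)s.%(ext)s', info))  # no uploader known
print(expand_template('100%% %(title)s.%(ext)s', info))         # literal percent
```

A `defaultdict` is used here so that a template can name any field without raising `KeyError`; whether the real tool handles missing fields the same way is not shown in this diff.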
# FORMAT SELECTION
@@ -862,9 +935,9 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**, i.e. add `-v` flag to your command line, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ youtube-dl -v http://www.youtube.com/watch?v=BaW_jenozKcj
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']

View File

@@ -30,6 +30,7 @@
- **AlJazeera**
- **Allocine**
- **AlphaPorno**
- **AnimeOnDemand**
- **anitube.se**
- **AnySex**
- **Aparat**
@@ -49,12 +50,14 @@
- **arte.tv:ddc**
- **arte.tv:embed**
- **arte.tv:future**
- **arte.tv:magazine**
- **AtresPlayer**
- **ATTTechChannel**
- **AudiMedia**
- **audiomack**
- **audiomack:album**
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
@@ -88,8 +91,11 @@
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CBC**
- **CBCPlayer**
- **CBS**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
- **CeskaTelevize**
- **channel9**: Channel 9
@@ -118,6 +124,7 @@
- **ComedyCentralShows**: The Daily Show / The Colbert Report
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Cracked**
- **Crackle**
- **Criterion**
- **CrooksAndLiars**
- **Crunchyroll**
@@ -133,6 +140,8 @@
- **DailymotionCloud**
- **daum.net**
- **daum.net:clip**
- **daum.net:playlist**
- **daum.net:user**
- **DBTV**
- **DCN**
- **dcn:live**
@@ -180,6 +189,7 @@
- **ExpoTV**
- **ExtremeTube**
- **facebook**
- **facebook:post**
- **faz.net**
- **fc2**
- **Fczenit**
@@ -257,7 +267,7 @@
- **Instagram**
- **instagram:user**: Instagram user profile
- **InternetVideoArchive**
- **IPrima** (Currently broken)
- **IPrima**
- **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ivi**: ivi.ru
@@ -278,6 +288,7 @@
- **KeezMovies**
- **KhanAcademy**
- **KickStarter**
- **KonserthusetPlay**
- **kontrtube**: KontrTube.ru - Труба зовёт
- **KrasView**: Красвью
- **Ku6**
@@ -314,6 +325,7 @@
- **mailru**: Видео@Mail.Ru
- **MakerTV**
- **Malemotion**
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
- **metacafe**
@@ -350,7 +362,7 @@
- **MySpace:album**
- **MySpass**
- **Myvi**
- **myvideo**
- **myvideo** (Currently broken)
- **MyVidster**
- **n-tv.de**
- **NationalGeographic**
@@ -400,6 +412,7 @@
- **NowTV** (Currently broken)
- **NowTVList**
- **nowvideo**: NowVideo
- **Noz**
- **npo**: npo.nl and ntr.nl
- **npo.nl:live**
- **npo.nl:radio**
@@ -438,6 +451,7 @@
- **PlanetaPlay**
- **play.fm**
- **played.to**
- **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid**
- **Playwire**
@@ -449,6 +463,7 @@
- **PornHd**
- **PornHub**
- **PornHubPlaylist**
- **PornHubUserVideos**
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
@@ -506,10 +521,12 @@
- **Sapo**: SAPO Vídeos
- **savefrom.net**
- **SBS**: sbs.com.au
- **schooltv**
- **SciVee**
- **screen.yahoo:search**: Yahoo screen search
- **Screencast**
- **ScreencastOMatic**
- **ScreenJunkies**
- **ScreenwaveMedia**
- **SenateISVP**
- **ServingSys**
@@ -643,6 +660,7 @@
- **twitch:video**
- **twitch:vod**
- **twitter**
- **twitter:amplify**
- **twitter:card**
- **Ubu**
- **udemy**
@@ -674,7 +692,9 @@
- **VideoPremium**
- **VideoTt**: video.tt - Your True Tube (Currently broken)
- **videoweed**: VideoWeed
- **Vidme**
- **vidme**
- **vidme:user**
- **vidme:user:likes**
- **Vidzi**
- **vier**
- **vier:videos**

View File

@@ -14,6 +14,7 @@ from test.helper import FakeYDL, assertRegexpMatches
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_str, compat_urllib_error
from youtube_dl.extractor import YoutubeIE
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.postprocessor.common import PostProcessor
from youtube_dl.utils import ExtractorError, match_filter_func
@@ -221,9 +222,19 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-low')
formats = [
{'format_id': 'vid-vcodec-dot', 'ext': 'mp4', 'preference': 1, 'vcodec': 'avc1.123456', 'acodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
ydl = YDL({'format': 'bestvideo[vcodec=avc1.123456]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot')
def test_youtube_format_selection(self):
order = [
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '36', '17', '13',
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13',
# Apple HTTP Live Streaming
'96', '95', '94', '93', '92', '132', '151',
# 3D
@@ -237,6 +248,17 @@ class TestFormatSelection(unittest.TestCase):
def format_info(f_id):
info = YoutubeIE._formats[f_id].copy()
# XXX: In real cases InfoExtractor._parse_mpd_formats() fills up 'acodec'
# and 'vcodec', while in tests such information is incomplete since
# commit a6c2c24479e5f4827ceb06f64d855329c0a6f593
# test_YoutubeDL.test_youtube_format_selection is broken without
# this fix
if 'acodec' in info and 'vcodec' not in info:
info['vcodec'] = 'none'
elif 'vcodec' in info and 'acodec' not in info:
info['acodec'] = 'none'
info['format_id'] = f_id
info['url'] = 'url:' + f_id
return info
@@ -636,6 +658,42 @@ class TestYoutubeDL(unittest.TestCase):
ydl = YDL()
self.assertRaises(compat_urllib_error.URLError, ydl.urlopen, 'file:///etc/passwd')
def test_do_not_override_ie_key_in_url_transparent(self):
ydl = YDL()
class Foo1IE(InfoExtractor):
_VALID_URL = r'foo1:'
def _real_extract(self, url):
return {
'_type': 'url_transparent',
'url': 'foo2:',
'ie_key': 'Foo2',
}
class Foo2IE(InfoExtractor):
_VALID_URL = r'foo2:'
def _real_extract(self, url):
return {
'_type': 'url',
'url': 'foo3:',
'ie_key': 'Foo3',
}
class Foo3IE(InfoExtractor):
_VALID_URL = r'foo3:'
def _real_extract(self, url):
return _make_result([{'url': TEST_URL}])
ydl.add_info_extractor(Foo1IE(ydl))
ydl.add_info_extractor(Foo2IE(ydl))
ydl.add_info_extractor(Foo3IE(ydl))
ydl.extract_info('foo1:')
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['url'], TEST_URL)
if __name__ == '__main__':
unittest.main()

View File

@@ -56,7 +56,7 @@ class TestAllURLsMatching(unittest.TestCase):
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
def test_youtube_user_matching(self):
self.assertMatch('www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
def test_youtube_feeds(self):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watchlater'])

View File

@@ -0,0 +1,47 @@
#!/usr/bin/env python
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor import IqiyiIE
class IqiyiIEWithCredentials(IqiyiIE):
def _get_login_info(self):
return 'foo', 'bar'
class WarningLogger(object):
def __init__(self):
self.messages = []
def warning(self, msg):
self.messages.append(msg)
def debug(self, msg):
pass
def error(self, msg):
pass
class TestIqiyiSDKInterpreter(unittest.TestCase):
def test_iqiyi_sdk_interpreter(self):
'''
Test the functionality of IqiyiSDKInterpreter by trying to log in
If `sign` is incorrect, /validate call throws an HTTP 556 error
'''
logger = WarningLogger()
ie = IqiyiIEWithCredentials(FakeYDL({'logger': logger}))
ie._login()
self.assertTrue('unable to log in:' in logger.messages[0])
if __name__ == '__main__':
unittest.main()

View File

@@ -21,7 +21,7 @@ from youtube_dl.extractor import (
NPOIE,
ComedyCentralIE,
NRKTVIE,
RaiIE,
RaiTVIE,
VikiIE,
ThePlatformIE,
ThePlatformFeedIE,
@@ -65,16 +65,16 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
for lang in ['it', 'fr', 'de']:
self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
self.assertEqual(md5(subtitles['it']), '6d752b98c31f1cf8d597050c7a2cb4b5')
for lang in ['fr', 'de']:
self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
def test_youtube_subtitles_sbv_format(self):
def test_youtube_subtitles_ttml_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
self.DL.params['subtitlesformat'] = 'ttml'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
self.assertEqual(md5(subtitles['en']), 'e306f8c42842f723447d9f63ad65df54')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
@@ -260,7 +260,7 @@ class TestNRKSubtitles(BaseTestSubtitles):
class TestRaiSubtitles(BaseTestSubtitles):
url = 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-cb27157f-9dd0-4aee-b788-b1f67643a391.html'
IE = RaiIE
IE = RaiTVIE
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True

View File

@@ -22,6 +22,7 @@ from youtube_dl.utils import (
DateRange,
detect_exe_version,
determine_ext,
dict_get,
encode_compat_str,
encodeFilename,
escape_rfc3986,
@@ -34,6 +35,7 @@ from youtube_dl.utils import (
is_html,
js_to_json,
limit_length,
ohdave_rsa_encrypt,
OnDemandPagedList,
orderedSet,
parse_duration,
@@ -450,6 +452,28 @@ class TestUtil(unittest.TestCase):
data = urlencode_postdata({'username': 'foo@bar.com', 'password': '1234'})
self.assertTrue(isinstance(data, bytes))
def test_dict_get(self):
FALSE_VALUES = {
'none': None,
'false': False,
'zero': 0,
'empty_string': '',
'empty_list': [],
}
d = FALSE_VALUES.copy()
d['a'] = 42
self.assertEqual(dict_get(d, 'a'), 42)
self.assertEqual(dict_get(d, 'b'), None)
self.assertEqual(dict_get(d, 'b', 42), 42)
self.assertEqual(dict_get(d, ('a', )), 42)
self.assertEqual(dict_get(d, ('b', 'a', )), 42)
self.assertEqual(dict_get(d, ('b', 'c', 'a', 'd', )), 42)
self.assertEqual(dict_get(d, ('b', 'c', )), None)
self.assertEqual(dict_get(d, ('b', 'c', ), 42), 42)
for key, false_value in FALSE_VALUES.items():
self.assertEqual(dict_get(d, ('b', 'c', key, )), None)
self.assertEqual(dict_get(d, ('b', 'c', key, ), skip_false_values=False), false_value)
def test_encode_compat_str(self):
self.assertEqual(encode_compat_str(b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', 'utf-8'), 'тест')
self.assertEqual(encode_compat_str('тест', 'utf-8'), 'тест')
@@ -471,6 +495,10 @@ class TestUtil(unittest.TestCase):
d = json.loads(stripped)
self.assertEqual(d, {'STATUS': 'OK'})
stripped = strip_jsonp('ps.embedHandler({"status": "success"});')
d = json.loads(stripped)
self.assertEqual(d, {'status': 'success'})
def test_uppercase_escape(self):
self.assertEqual(uppercase_escape(''), '')
self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐')
@@ -765,6 +793,13 @@ The first line
{'nocheckcertificate': False}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
['--check-certificate=true'])
def test_ohdave_rsa_encrypt(self):
N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd
e = 65537
self.assertEqual(
ohdave_rsa_encrypt(b'aa111222', e, N),
'726664bd9a23fd0c70f9f1b84aab5e3905ce1e45a584e9cbcf9bcc7510338fc1986d6c599ff990d923aa43c51c0d9013cd572e13bc58f4ae48f2ed8c0b0ba881')
if __name__ == '__main__':
unittest.main()
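The `test_dict_get` cases in this file pin down the behaviour expected of `dict_get`, but the implementation itself is not part of this excerpt. Below is one possible implementation that satisfies those cases, written as a sketch: the name `dict_get_sketch` is an assumption chosen to avoid implying it is the code in `youtube_dl/utils.py`.

```python
def dict_get_sketch(d, key_or_keys, default=None, skip_false_values=True):
    """Return the first usable value among one or several keys.

    With skip_false_values=True, falsy values (None, False, 0, '', [])
    are treated as missing; None is always treated as missing.
    """
    if isinstance(key_or_keys, (list, tuple)):
        for key in key_or_keys:
            if key not in d or d[key] is None:
                continue
            if skip_false_values and not d[key]:
                continue
            return d[key]
        return default
    # A single key behaves like a plain dict lookup.
    return d.get(key_or_keys, default)


d = {'a': 42, 'zero': 0, 'empty_string': '', 'none': None}
print(dict_get_sketch(d, ('b', 'a')))                             # first present key wins
print(dict_get_sketch(d, ('b', 'zero')))                          # falsy value skipped
print(dict_get_sketch(d, ('b', 'zero'), skip_false_values=False)) # falsy value kept
```

The tuple form is convenient in extractors where the same piece of metadata may appear under different key names depending on the API response.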

View File

@@ -34,7 +34,7 @@ class TestYoutubeLists(unittest.TestCase):
ie = YoutubePlaylistIE(dl)
# TODO find a > 100 (paginating?) videos course
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
entries = result['entries']
entries = list(result['entries'])
self.assertEqual(YoutubeIE().extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(len(entries), 25)
self.assertEqual(YoutubeIE().extract_id(entries[-1]['url']), 'rYefUsYuEp0')

View File

@@ -263,7 +263,7 @@ class YoutubeDL(object):
the downloader (see youtube_dl/downloader/common.py):
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
noresizebuffer, retries, continuedl, noprogress, consoletitle,
xattr_set_filesize, external_downloader_args.
xattr_set_filesize, external_downloader_args, hls_use_mpegts.
The following options are used by the post processors:
prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available,
@@ -605,12 +605,12 @@ class YoutubeDL(object):
if rejecttitle:
if re.search(rejecttitle, title, re.IGNORECASE):
return '"' + title + '" title matched reject pattern "' + rejecttitle + '"'
date = info_dict.get('upload_date', None)
date = info_dict.get('upload_date')
if date is not None:
dateRange = self.params.get('daterange', DateRange())
if date not in dateRange:
return '%s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
view_count = info_dict.get('view_count', None)
view_count = info_dict.get('view_count')
if view_count is not None:
min_views = self.params.get('min_views')
if min_views is not None and view_count < min_views:
@@ -707,7 +707,6 @@ class YoutubeDL(object):
It will also download the videos if 'download'.
Returns the resolved ie_result.
"""
result_type = ie_result.get('_type', 'video')
if result_type in ('url', 'url_transparent'):
@@ -736,7 +735,7 @@ class YoutubeDL(object):
force_properties = dict(
(k, v) for k, v in ie_result.items() if v is not None)
for f in ('_type', 'url'):
for f in ('_type', 'url', 'ie_key'):
if f in force_properties:
del force_properties[f]
new_result = info.copy()
@@ -748,18 +747,18 @@ class YoutubeDL(object):
new_result, download=download, extra_info=extra_info)
elif result_type == 'playlist' or result_type == 'multi_video':
# We process each entry in the playlist
playlist = ie_result.get('title', None) or ie_result.get('id', None)
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
playlist_results = []
playliststart = self.params.get('playliststart', 1) - 1
playlistend = self.params.get('playlistend', None)
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items', None)
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
@@ -783,7 +782,7 @@ class YoutubeDL(object):
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
self.to_screen(
"[%s] playlist %s: Collected %d video ids (downloading %d of them)" %
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
@@ -797,7 +796,7 @@ class YoutubeDL(object):
playliststart, playlistend)
n_entries = len(entries)
self.to_screen(
"[%s] playlist %s: Downloading %d videos" %
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, n_entries))
else: # iterable
if playlistitems:
@@ -808,7 +807,7 @@ class YoutubeDL(object):
ie_entries, playliststart, playlistend))
n_entries = len(entries)
self.to_screen(
"[%s] playlist %s: Downloading %d videos" %
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, n_entries))
if self.params.get('playlistreverse', False):
@@ -906,7 +905,7 @@ class YoutubeDL(object):
str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9_-]+)
\s*(?P<value>[a-zA-Z0-9._-]+)
\s*$
''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
m = str_operator_rex.search(filter_spec)
@@ -1289,6 +1288,9 @@ class YoutubeDL(object):
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
else:
# Sanitize format_id from characters used in format selector expression
format['format_id'] = re.sub(r'[\s,/+\[\]()]', '_', format['format_id'])
format_id = format['format_id']
if format_id not in formats_dict:
formats_dict[format_id] = []
@@ -1339,7 +1341,6 @@ class YoutubeDL(object):
if req_format is None:
req_format_list = []
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
info_dict['extractor'] in ['youtube', 'ted'] and
not info_dict.get('is_live')):
merger = FFmpegMergerPP(self)
if merger.available and merger.can_merge():
@@ -1796,7 +1797,7 @@ class YoutubeDL(object):
else:
res = '%sp' % format['height']
elif format.get('width') is not None:
res = '?x%d' % format['width']
res = '%dx?' % format['width']
else:
res = default
return res
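Two of the hunks in this file are regex changes: the filter value pattern gains a `.`, and `format_id` values are sanitized. The sketch below illustrates the effect of each. It assumes a reduced operator set (`=` and `!=` only) rather than the real `STR_OPERATORS`, and `build_rex`, `sanitize_format_id` and the sample format id are hypothetical names introduced for illustration.

```python
import re

# Assumption: a simplified operator set; the real STR_OPERATORS is not shown here.
STR_OPERATORS = ('=', '!=')


def build_rex(value_chars):
    """Build the string-filter regex with a given value character class."""
    return re.compile(r'''(?x)
        \s*(?P<key>ext|acodec|vcodec|container|protocol)
        \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
        \s*(?P<value>[%s]+)
        \s*$
        ''' % ('|'.join(map(re.escape, STR_OPERATORS)), value_chars))


old_rex = build_rex('a-zA-Z0-9_-')    # before the change: no dot allowed
new_rex = build_rex('a-zA-Z0-9._-')   # after the change: dot allowed

spec = 'vcodec=avc1.123456'
print(old_rex.search(spec))                  # None: the dot stops the match
print(new_rex.search(spec).group('value'))   # avc1.123456


def sanitize_format_id(format_id):
    """Replace characters that have meaning in format selector expressions."""
    return re.sub(r'[\s,/+\[\]()]', '_', format_id)


print(sanitize_format_id('hls 720p/audio+video'))  # hls_720p_audio_video
```

Without the dot in the character class, a selector such as `bestvideo[vcodec=avc1.123456]` cannot be parsed as a string filter, which is what the new `vid-vcodec-dot` test case exercises.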

View File

@@ -369,6 +369,7 @@ def _real_main(argv=None):
'no_color': opts.no_color,
'ffmpeg_location': opts.ffmpeg_location,
'hls_prefer_native': opts.hls_prefer_native,
'hls_use_mpegts': opts.hls_use_mpegts,
'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy,

View File

@@ -7,7 +7,7 @@ from __future__ import unicode_literals
import sys
if __package__ is None and not hasattr(sys, "frozen"):
if __package__ is None and not hasattr(sys, 'frozen'):
# direct call of __main__.py
import os.path
path = os.path.realpath(os.path.abspath(__file__))

View File

@@ -161,7 +161,7 @@ def aes_decrypt_text(data, password, key_size_bytes):
nonce = data[:NONCE_LENGTH_BYTES]
cipher = data[NONCE_LENGTH_BYTES:]
class Counter:
class Counter(object):
__value = nonce + [0] * (BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES)
def next_value(self):

View File

@@ -181,20 +181,20 @@ except ImportError: # Python < 3.4
# parameter := attribute "=" value
url = req.get_full_url()
scheme, data = url.split(":", 1)
mediatype, data = data.split(",", 1)
scheme, data = url.split(':', 1)
mediatype, data = data.split(',', 1)
# even base64 encoded data URLs might be quoted so unquote in any case:
data = compat_urllib_parse_unquote_to_bytes(data)
if mediatype.endswith(";base64"):
if mediatype.endswith(';base64'):
data = binascii.a2b_base64(data)
mediatype = mediatype[:-7]
if not mediatype:
mediatype = "text/plain;charset=US-ASCII"
mediatype = 'text/plain;charset=US-ASCII'
headers = email.message_from_string(
"Content-type: %s\nContent-length: %d\n" % (mediatype, len(data)))
'Content-type: %s\nContent-length: %d\n' % (mediatype, len(data)))
return compat_urllib_response.addinfourl(io.BytesIO(data), headers, url)
@@ -268,7 +268,7 @@ except ImportError: # Python 2
nv = name_value.split('=', 1)
if len(nv) != 2:
if strict_parsing:
raise ValueError("bad query field: %r" % (name_value,))
raise ValueError('bad query field: %r' % (name_value,))
# Handle case of a control-name with no equal sign
if keep_blank_values:
nv.append('')
@@ -466,7 +466,7 @@ if sys.version_info < (2, 7):
if err is not None:
raise err
else:
raise socket.error("getaddrinfo returns an empty list")
raise socket.error('getaddrinfo returns an empty list')
else:
compat_socket_create_connection = socket.create_connection

@@ -45,6 +45,7 @@ class FileDownloader(object):
(experimental)
external_downloader_args: A list of additional command-line arguments for the
external downloader.
hls_use_mpegts: Use the mpegts container for HLS videos.
Subclasses of this one must re-define the real_download method.
"""
@@ -156,7 +157,7 @@ class FileDownloader(object):
def slow_down(self, start_time, now, byte_counter):
"""Sleep if the download speed is over the rate limit."""
rate_limit = self.params.get('ratelimit', None)
rate_limit = self.params.get('ratelimit')
if rate_limit is None or byte_counter == 0:
return
if now is None:

@@ -1,66 +1,59 @@
from __future__ import unicode_literals
import os
import re
from .common import FileDownloader
from ..utils import sanitized_Request
from .fragment import FragmentFD
from ..utils import (
sanitize_open,
encodeFilename,
)
class DashSegmentsFD(FileDownloader):
class DashSegmentsFD(FragmentFD):
"""
Download segments in a DASH manifest
"""
FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
base_url = info_dict['url']
segment_urls = info_dict['segment_urls']
segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
initialization_url = info_dict.get('initialization_url')
is_test = self.params.get('test', False)
remaining_bytes = self._TEST_FILE_SIZE if is_test else None
byte_counter = 0
ctx = {
'filename': filename,
'total_frags': len(segment_urls) + (1 if initialization_url else 0),
}
def append_url_to_file(outf, target_url, target_name, remaining_bytes=None):
self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name))
req = sanitized_Request(target_url)
if remaining_bytes is not None:
req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))
data = self.ydl.urlopen(req).read()
if remaining_bytes is not None:
data = data[:remaining_bytes]
outf.write(data)
return len(data)
self._prepare_and_start_frag_download(ctx)
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
with open(tmpfilename, 'wb') as outf:
append_url_to_file(
outf, combine_url(base_url, info_dict['initialization_url']),
'initialization segment')
for i, segment_url in enumerate(segment_urls):
segment_len = append_url_to_file(
outf, combine_url(base_url, segment_url),
'segment %d / %d' % (i + 1, len(segment_urls)),
remaining_bytes)
byte_counter += segment_len
if remaining_bytes is not None:
remaining_bytes -= segment_len
if remaining_bytes <= 0:
break
segments_filenames = []
self.try_rename(tmpfilename, filename)
def append_url_to_file(target_url, target_filename):
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
ctx['dest_stream'].write(down.read())
down.close()
segments_filenames.append(target_sanitized)
self._hook_progress({
'downloaded_bytes': byte_counter,
'total_bytes': byte_counter,
'filename': filename,
'status': 'finished',
})
if initialization_url:
append_url_to_file(initialization_url, ctx['tmpfilename'] + '-Init')
for i, segment_url in enumerate(segment_urls):
segment_filename = '%s-Seg%d' % (ctx['tmpfilename'], i)
append_url_to_file(segment_url, segment_filename)
self._finish_frag_download(ctx)
for segment_file in segments_filenames:
os.remove(encodeFilename(segment_file))
return True
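
Reviewer note: the rewritten downloader keeps `combine_url` and now counts the optional initialization segment in `total_frags`. A standalone sketch of both pieces is below; `count_fragments` is a hypothetical helper that only mirrors the expression used in `ctx`.

```python
import re


def combine_url(base_url, target_url):
    """Resolve a segment URL against the manifest base URL."""
    if re.match(r'^https?://', target_url):
        # Absolute segment URLs are used as-is.
        return target_url
    return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)


def count_fragments(segment_urls, initialization_url=None):
    """Number of fragments to download, including the init segment if any."""
    return len(segment_urls) + (1 if initialization_url else 0)
```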

@@ -273,15 +273,21 @@ class F4mFD(FragmentFD):
return fragments_list
def _parse_bootstrap_node(self, node, base_url):
if node.text is None:
# Sometimes non empty inline bootstrap info can be specified along
# with bootstrap url attribute (e.g. dummy inline bootstrap info
# contains whitespace characters in [1]). We will prefer bootstrap
# url over inline bootstrap info when present.
# 1. http://live-1-1.rutube.ru/stream/1024/HDS/SD/C2NKsS85HQNckgn5HdEmOQ/1454167650/S-s604419906/move/four/dirs/upper/1024-576p.f4m
bootstrap_url = node.get('url')
if bootstrap_url:
bootstrap_url = compat_urlparse.urljoin(
base_url, node.attrib['url'])
base_url, bootstrap_url)
boot_info = self._get_bootstrap_from_url(bootstrap_url)
else:
bootstrap_url = None
bootstrap = base64.b64decode(node.text.encode('ascii'))
boot_info = read_bootstrap_info(bootstrap)
return (boot_info, bootstrap_url)
return boot_info, bootstrap_url
def real_download(self, filename, info_dict):
man_url = info_dict['url']
@@ -316,7 +322,8 @@ class F4mFD(FragmentFD):
metadata = None
fragments_list = build_fragments_list(boot_info)
if self.params.get('test', False):
test = self.params.get('test', False)
if test:
# We only download the first fragment
fragments_list = fragments_list[:1]
total_frags = len(fragments_list)
@@ -326,6 +333,7 @@ class F4mFD(FragmentFD):
ctx = {
'filename': filename,
'total_frags': total_frags,
'live': live,
}
self._prepare_frag_download(ctx)
@@ -380,7 +388,7 @@ class F4mFD(FragmentFD):
else:
raise
if not fragments_list and live and bootstrap_url:
if not fragments_list and not test and live and bootstrap_url:
fragments_list = self._update_live_fragments(bootstrap_url, frag_i)
total_frags += len(fragments_list)
if fragments_list and (fragments_list[0][1] > frag_i + 1):

@@ -26,7 +26,11 @@ class FragmentFD(FileDownloader):
self._start_frag_download(ctx)
def _prepare_frag_download(self, ctx):
self.to_screen('[%s] Total fragments: %d' % (self.FD_NAME, ctx['total_frags']))
if 'live' not in ctx:
ctx['live'] = False
self.to_screen(
'[%s] Total fragments: %s'
% (self.FD_NAME, ctx['total_frags'] if not ctx['live'] else 'unknown (live)'))
self.report_destination(ctx['filename'])
dl = HttpQuietDownloader(
self.ydl,
@@ -34,7 +38,7 @@ class FragmentFD(FileDownloader):
'continuedl': True,
'quiet': True,
'noprogress': True,
'ratelimit': self.params.get('ratelimit', None),
'ratelimit': self.params.get('ratelimit'),
'retries': self.params.get('retries', 0),
'test': self.params.get('test', False),
}
@@ -74,14 +78,14 @@ class FragmentFD(FileDownloader):
if s['status'] not in ('downloading', 'finished'):
return
frag_total_bytes = s.get('total_bytes') or 0
estimated_size = (
(ctx['complete_frags_downloaded_bytes'] + frag_total_bytes) /
(state['frag_index'] + 1) * total_frags)
time_now = time.time()
state['total_bytes_estimate'] = estimated_size
state['elapsed'] = time_now - start
frag_total_bytes = s.get('total_bytes') or 0
if not ctx['live']:
estimated_size = (
(ctx['complete_frags_downloaded_bytes'] + frag_total_bytes) /
(state['frag_index'] + 1) * total_frags)
state['total_bytes_estimate'] = estimated_size
if s['status'] == 'finished':
state['frag_index'] += 1
@@ -91,9 +95,10 @@ class FragmentFD(FileDownloader):
else:
frag_downloaded_bytes = s['downloaded_bytes']
state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
state['eta'] = self.calc_eta(
start, time_now, estimated_size,
state['downloaded_bytes'])
if not ctx['live']:
state['eta'] = self.calc_eta(
start, time_now, estimated_size,
state['downloaded_bytes'])
state['speed'] = s.get('speed')
ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
self._hook_progress(state)
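
Reviewer note: for live streams the fragment count is open-ended, so the size estimate and ETA are skipped. The arithmetic for the non-live case can be sketched as follows; the function name and the `None` return for live streams are assumptions for illustration (the diff simply leaves the state keys unset).

```python
def estimate_total_bytes(complete_bytes, frag_total_bytes, frag_index,
                         total_frags, live=False):
    """Extrapolate the final size from the fragments seen so far."""
    if live:
        # The total number of fragments is unknown for a live stream.
        return None
    average_fragment_size = (complete_bytes + frag_total_bytes) / (frag_index + 1)
    return average_fragment_size * total_frags
```

For example, with three finished fragments totalling 3000 bytes, a current fragment of 1000 bytes, and ten fragments overall, the estimate is 4000 / 4 * 10 bytes.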

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import os
import re
import subprocess
import sys
from .common import FileDownloader
from .fragment import FragmentFD
@@ -39,7 +40,11 @@ class HlsFD(FileDownloader):
'-headers',
''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
args += ['-i', url, '-f', 'mp4', '-c', 'copy', '-bsf:a', 'aac_adtstoasc']
args += ['-i', url, '-c', 'copy']
if self.params.get('hls_use_mpegts', False):
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
args = [encodeArgument(opt) for opt in args]
args.append(encodeFilename(ffpp._ffmpeg_filename_argument(tmpfilename), True))
@@ -53,8 +58,10 @@ class HlsFD(FileDownloader):
# subprocess.run would send the SIGKILL signal to ffmpeg and the
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams)
proc.communicate(b'q')
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/rg3/youtube-dl/issues/8300).
if sys.platform != 'win32':
proc.communicate(b'q')
raise
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))
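
Reviewer note: the container selection introduced by `hls_use_mpegts` can be sketched in isolation. The helper name `build_ffmpeg_args` is hypothetical, and the leading executable name and header options from the real command line are omitted here.

```python
def build_ffmpeg_args(url, use_mpegts=False):
    """Build the stream-copy arguments for an HLS download (sketch)."""
    args = ['-i', url, '-c', 'copy']
    if use_mpegts:
        # Keep the transport stream container as delivered.
        args += ['-f', 'mpegts']
    else:
        # MP4 needs ADTS AAC converted to the MPEG-4 audio config format.
        args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
    return args
```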

@@ -140,8 +140,8 @@ class HttpFD(FileDownloader):
if data_len is not None:
data_len = int(data_len) + resume_len
min_data_len = self.params.get("min_filesize", None)
max_data_len = self.params.get("max_filesize", None)
min_data_len = self.params.get('min_filesize')
max_data_len = self.params.get('max_filesize')
if min_data_len is not None and data_len < min_data_len:
self.to_screen('\r[download] File is smaller than min-filesize (%s bytes < %s bytes). Aborting.' % (data_len, min_data_len))
return False

@@ -94,15 +94,15 @@ class RtmpFD(FileDownloader):
return proc.returncode
url = info_dict['url']
player_url = info_dict.get('player_url', None)
page_url = info_dict.get('page_url', None)
app = info_dict.get('app', None)
play_path = info_dict.get('play_path', None)
tc_url = info_dict.get('tc_url', None)
flash_version = info_dict.get('flash_version', None)
player_url = info_dict.get('player_url')
page_url = info_dict.get('page_url')
app = info_dict.get('app')
play_path = info_dict.get('play_path')
tc_url = info_dict.get('tc_url')
flash_version = info_dict.get('flash_version')
live = info_dict.get('rtmp_live', False)
conn = info_dict.get('rtmp_conn', None)
protocol = info_dict.get('rtmp_protocol', None)
conn = info_dict.get('rtmp_conn')
protocol = info_dict.get('rtmp_protocol')
real_time = info_dict.get('rtmp_real_time', False)
no_resume = info_dict.get('no_resume', False)
continue_dl = self.params.get('continuedl', True)

@@ -20,6 +20,7 @@ from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE
from .anysex import AnySexIE
from .aol import AolIE
@@ -44,13 +45,14 @@ from .arte import (
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
from .audimedia import AudiMediaIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .azubu import AzubuIE
from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
@@ -89,8 +91,15 @@ from .camdemy import (
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
from .cbc import (
CBCIE,
CBCPlayerIE,
)
from .cbs import CBSIE
from .cbsnews import CBSNewsIE
from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .ceskatelevize import CeskaTelevizeIE
@@ -123,6 +132,7 @@ from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .criterion import CriterionIE
from .crooksandliars import CrooksAndLiarsIE
from .crunchyroll import (
@@ -142,6 +152,8 @@ from .dailymotion import (
from .daum import (
DaumIE,
DaumClipIE,
DaumPlaylistIE,
DaumUserIE,
)
from .dbtv import DBTVIE
from .dcn import (
@@ -196,7 +208,10 @@ from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .facebook import (
FacebookIE,
FacebookPostIE,
)
from .faz import FazIE
from .fc2 import FC2IE
from .fczenit import FczenitIE
@@ -320,6 +335,7 @@ from .keezmovies import KeezMoviesIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .konserthusetplay import KonserthusetPlayIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
from .ku6 import Ku6IE
@@ -369,6 +385,7 @@ from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .makertv import MakerTVIE
from .malemotion import MalemotionIE
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
@@ -474,11 +491,13 @@ from .nowtv import (
NowTVIE,
NowTVListIE,
)
from .noz import NozIE
from .npo import (
NPOIE,
NPOLiveIE,
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
VPROIE,
WNLIE
)
@@ -522,6 +541,7 @@ from .planetaplay import PlanetaPlayIE
from .pladform import PladformIE
from .played import PlayedIE
from .playfm import PlayFMIE
from .plays import PlaysTVIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
from .playwire import PlaywireIE
@@ -535,6 +555,7 @@ from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
PornHubPlaylistIE,
PornHubUserVideosIE,
)
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
@@ -602,6 +623,7 @@ from .sbs import SBSIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .screenjunkies import ScreenJunkiesIE
from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
from .senateisvp import SenateISVPIE
from .servingsys import ServingSysIE
@@ -776,7 +798,11 @@ from .twitch import (
TwitchBookmarksIE,
TwitchStreamIE,
)
from .twitter import TwitterCardIE, TwitterIE
from .twitter import (
TwitterCardIE,
TwitterIE,
TwitterAmplifyIE,
)
from .ubu import UbuIE
from .udemy import (
UdemyIE,
@@ -812,7 +838,11 @@ from .videomore import (
)
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
from .vidme import VidmeIE
from .vidme import (
VidmeIE,
VidmeUserIE,
VidmeUserLikesIE,
)
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE
from .viewster import ViewsterIE

@@ -8,11 +8,7 @@ from ..compat import compat_str
from ..utils import int_or_none
class ACastBaseIE(InfoExtractor):
_API_BASE_URL = 'https://www.acast.com/api/'
class ACastIE(ACastBaseIE):
class ACastIE(InfoExtractor):
IE_NAME = 'acast'
_VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<channel>[^/]+)/(?P<id>[^/#?]+)'
_TEST = {
@@ -23,14 +19,19 @@ class ACastIE(ACastBaseIE):
'ext': 'mp3',
'title': '"Where Are You?": Taipei 101, Taiwan',
'timestamp': 1196172000000,
'description': 'md5:0c5d8201dfea2b93218ea986c91eee6e',
'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
'duration': 211,
}
}
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
cast_data = self._download_json(self._API_BASE_URL + 'channels/%s/acasts/%s/playback' % (channel, display_id), display_id)
embed_page = self._download_webpage(
re.sub(r'(?:www\.)?acast\.com', 'embedcdn.acast.com', url), display_id)
cast_data = self._parse_json(self._search_regex(
r'window\[\'acast/queries\'\]\s*=\s*([^;]+);', embed_page, 'acast data'),
display_id)['GetAcast/%s/%s' % (channel, display_id)]
return {
'id': compat_str(cast_data['id']),
@@ -44,7 +45,7 @@ class ACastIE(ACastBaseIE):
}
class ACastChannelIE(ACastBaseIE):
class ACastChannelIE(InfoExtractor):
IE_NAME = 'acast:channel'
_VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<id>[^/#?]+)'
_TEST = {
@@ -56,6 +57,7 @@ class ACastChannelIE(ACastBaseIE):
},
'playlist_mincount': 20,
}
_API_BASE_URL = 'https://www.acast.com/api/'
@classmethod
def suitable(cls, url):

@@ -28,7 +28,7 @@ class AENetworksIE(InfoExtractor):
'info_dict': {
'id': 'eg47EERs_JsZ',
'ext': 'mp4',
'title': "Winter Is Coming",
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
},
'params': {

@@ -8,6 +8,8 @@ from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
qualities,
unescapeHTML,
xpath_element,
)
@@ -31,7 +33,7 @@ class AllocineIE(InfoExtractor):
'id': '19540403',
'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF',
'description': 'md5:eeaffe7c2d634525e21159b93acf3b1e',
'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
'thumbnail': 're:http://.*\.jpg',
},
}, {
@@ -41,7 +43,7 @@ class AllocineIE(InfoExtractor):
'id': '19544709',
'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:71742e3a74b0d692c7fce0dd2017a4ac',
'description': 'md5:601d15393ac40f249648ef000720e7e3',
'thumbnail': 're:http://.*\.jpg',
},
}, {
@@ -59,14 +61,18 @@ class AllocineIE(InfoExtractor):
if typ == 'film':
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player')
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
if player:
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
else:
model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
model_data = self._parse_json(unescapeHTML(model), display_id)
video_id = compat_str(model_data['id'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xml.find('.//AcVisionVideo').attrib
video = xpath_element(xml, './/AcVisionVideo').attrib
quality = qualities(['ld', 'md', 'hd'])
formats = []

@@ -0,0 +1,160 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
encode_dict,
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
class AnimeOnDemandIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?anime-on-demand\.de/anime/(?P<id>\d+)'
_LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in'
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand'
_TEST = {
'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': {
'id': '161',
'title': 'Grimgar, Ashes and Illusions (OmU)',
'description': 'md5:6681ce3c07c7189d255ac6ab23812d31',
},
'playlist_mincount': 4,
}
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
login_form = self._form_hidden_inputs('new_user', login_page)
login_form.update({
'user[login]': username,
'user[password]': password,
})
post_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page,
'post url', default=self._LOGIN_URL, group='url')
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(encode_dict(login_form)))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
if all(p not in response for p in ('>Logout<', 'href="/users/sign_out"')):
error = self._search_regex(
r'<p class="alert alert-danger">(.+?)</p>',
response, 'error', default=None)
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
def _real_initialize(self):
self._login()
def _real_extract(self, url):
anime_id = self._match_id(url)
webpage = self._download_webpage(url, anime_id)
if 'data-playlist=' not in webpage:
self._download_webpage(
self._APPLY_HTML5_URL, anime_id,
'Activating HTML5 beta', 'Unable to apply HTML5 beta')
webpage = self._download_webpage(url, anime_id)
csrf_token = self._html_search_meta(
'csrf-token', webpage, 'csrf token', fatal=True)
anime_title = self._html_search_regex(
r'(?s)<h1[^>]+itemprop="name"[^>]*>(.+?)</h1>',
webpage, 'anime name')
anime_description = self._html_search_regex(
r'(?s)<div[^>]+itemprop="description"[^>]*>(.+?)</div>',
webpage, 'anime description', default=None)
entries = []
for episode_html in re.findall(r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage):
m = re.search(
r'class="episodebox-title"[^>]+title="Episode (?P<number>\d+) - (?P<title>.+?)"', episode_html)
if not m:
continue
episode_number = int(m.group('number'))
episode_title = m.group('title')
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
formats = []
playlist_url = self._search_regex(
r'data-playlist=(["\'])(?P<url>.+?)\1',
episode_html, 'data playlist', default=None, group='url')
if playlist_url:
request = sanitized_Request(
compat_urlparse.urljoin(url, playlist_url),
headers={
'X-Requested-With': 'XMLHttpRequest',
'X-CSRF-Token': csrf_token,
'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01',
})
playlist = self._download_json(
request, video_id, 'Downloading playlist JSON', fatal=False)
if playlist:
playlist = playlist['playlist'][0]
title = playlist['title']
description = playlist.get('description')
for source in playlist.get('sources', []):
file_ = source.get('file')
if file_ and determine_ext(file_) == 'm3u8':
formats = self._extract_m3u8_formats(
file_, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
if formats:
f = common_info.copy()
f.update({
'title': title,
'description': description,
'formats': formats,
})
entries.append(f)
m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html)
if m:
f = common_info.copy()
f.update({
'id': '%s-teaser' % f['id'],
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
})
entries.append(f)
return self.playlist_result(entries, anime_id, anime_title, anime_description)

@@ -12,7 +12,7 @@ from ..utils import (
class AppleTrailersIE(InfoExtractor):
IE_NAME = 'appletrailers'
_VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/(?:trailers|ca)/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_VALID_URL = r'https?://(?:www\.|movie)?trailers\.apple\.com/(?:trailers|ca)/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_TESTS = [{
'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
'info_dict': {
@@ -73,6 +73,9 @@ class AppleTrailersIE(InfoExtractor):
}, {
'url': 'http://trailers.apple.com/ca/metropole/autrui/',
'only_matching': True,
}, {
'url': 'http://movietrailers.apple.com/trailers/focus_features/kuboandthetwostrings/',
'only_matching': True,
}]
_JSON_RE = r'iTunes.playURL\((.*?)\);'
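
Reviewer note: a quick check that the widened `_VALID_URL` accepts the new `movietrailers` host while still matching the existing test URLs. The pattern is copied from the hunk above; `match_trailer_url` is a hypothetical helper.

```python
import re

_VALID_URL = r'https?://(?:www\.|movie)?trailers\.apple\.com/(?:trailers|ca)/(?P<company>[^/]+)/(?P<movie>[^/]+)'


def match_trailer_url(url):
    """Return (company, movie) for a supported URL, or None."""
    mobj = re.match(_VALID_URL, url)
    if mobj is None:
        return None
    return mobj.group('company'), mobj.group('movie')
```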

@@ -13,6 +13,7 @@ from ..utils import (
unified_strdate,
get_element_by_attribute,
int_or_none,
NO_DEFAULT,
qualities,
)
@@ -22,7 +23,7 @@ from ..utils import (
class ArteTvIE(InfoExtractor):
_VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de)/.*-(?P<id>.*?)\.html'
_VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv'
def _real_extract(self, url):
@@ -62,7 +63,7 @@ class ArteTvIE(InfoExtractor):
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
@classmethod
def _extract_url_info(cls, url):
@@ -93,12 +94,40 @@ class ArteTVPlus7IE(InfoExtractor):
json_url = self._html_search_regex(
patterns, webpage, 'json vp url', default=None)
if not json_url:
iframe_url = self._html_search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
webpage, 'iframe url', group='url')
json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
return self._extract_from_json_url(json_url, video_id, lang)
def find_iframe_url(webpage, default=NO_DEFAULT):
return self._html_search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
webpage, 'iframe url', group='url', default=default)
iframe_url = find_iframe_url(webpage, None)
if not iframe_url:
embed_url = self._html_search_regex(
r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
if embed_url:
player = self._download_json(
embed_url, video_id, 'Downloading player page')
iframe_url = find_iframe_url(player['html'])
# en and es URLs produce react-based pages with different layout (e.g.
# http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
if not iframe_url:
program = self._search_regex(
r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
webpage, 'program', default=None)
if program:
embed_html = self._parse_json(program, video_id)
if embed_html:
iframe_url = find_iframe_url(embed_html['embed_html'])
if iframe_url:
json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
if json_url:
return self._extract_from_json_url(json_url, video_id, lang)
# Different kind of embed URL (e.g.
# http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
embed_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'embed url', group='url')
return self.url_result(embed_url)
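
Reviewer note: whichever lookup finds the iframe, the final step is the same: read the `json_url` query parameter from the iframe URL. A sketch using the Python 3 standard library in place of the compat wrappers is below; the helper name and the example URL in the usage line are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def json_url_from_iframe(iframe_url):
    """Return the decoded json_url query parameter, or None if absent."""
    values = parse_qs(urlparse(iframe_url).query).get('json_url')
    return values[0] if values else None


# Usage with a made-up player URL:
print(json_url_from_iframe(
    'http://www.arte.tv/arte_vp/index.php?json_url=http%3A%2F%2Fexample.com%2Fplayer.json&lang=fr'))
```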
def _extract_from_json_url(self, json_url, video_id, lang):
info = self._download_json(json_url, video_id)
@@ -106,7 +135,7 @@ class ArteTVPlus7IE(InfoExtractor):
upload_date_str = player_info.get('shootingDate')
if not upload_date_str:
upload_date_str = player_info.get('VDA', '').split(' ')[0]
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
title = player_info['VTI'].strip()
subtitle = player_info.get('VSU', '').strip()
@@ -122,27 +151,30 @@ class ArteTVPlus7IE(InfoExtractor):
}
qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
LANGS = {
'fr': 'F',
'de': 'A',
'en': 'E[ANG]',
'es': 'E[ESP]',
}
formats = []
for format_id, format_dict in player_info['VSR'].items():
f = dict(format_dict)
versionCode = f.get('versionCode')
langcode = {
'fr': 'F',
'de': 'A',
}.get(lang, lang)
lang_rexs = [r'VO?%s' % langcode, r'VO?.-ST%s' % langcode]
lang_pref = (
None if versionCode is None else (
10 if any(re.match(r, versionCode) for r in lang_rexs)
else -10))
langcode = LANGS.get(lang, lang)
lang_rexs = [r'VO?%s-' % re.escape(langcode), r'VO?.-ST%s$' % re.escape(langcode)]
lang_pref = None
if versionCode:
matched_lang_rexs = [r for r in lang_rexs if re.match(r, versionCode)]
lang_pref = -10 if not matched_lang_rexs else 10 * len(matched_lang_rexs)
source_pref = 0
if versionCode is not None:
# The original version with subtitles has lower relevance
if re.match(r'VO-ST(F|A)', versionCode):
if re.match(r'VO-ST(F|A|E)', versionCode):
source_pref -= 10
# The version with subtitles for the deaf and hard of hearing (sourds/malentendants) also has lower relevance
elif re.match(r'VO?(F|A)-STM\1', versionCode):
elif re.match(r'VO?(F|A|E)-STM\1', versionCode):
source_pref -= 9
format = {
'format_id': format_id,
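
Reviewer note: the new language preference counts how many of the two patterns match the version code, so a version whose audio and subtitles both match the requested language ranks above one where only one of them does. A standalone sketch follows; the function name `lang_preference` is hypothetical and the sample version codes in the comments are illustrative.

```python
import re

LANGS = {
    'fr': 'F',
    'de': 'A',
    'en': 'E[ANG]',
    'es': 'E[ESP]',
}


def lang_preference(lang, version_code):
    """Score a version code against the requested language (sketch)."""
    if not version_code:
        return None
    langcode = LANGS.get(lang, lang)
    # re.escape is needed because the en/es codes contain square brackets.
    lang_rexs = [
        r'VO?%s-' % re.escape(langcode),       # audio in the requested language
        r'VO?.-ST%s$' % re.escape(langcode),   # subtitles in the requested language
    ]
    matched = [r for r in lang_rexs if re.match(r, version_code)]
    return -10 if not matched else 10 * len(matched)
```

Note that the first pattern requires a trailing `-`, so under this sketch a bare code without a subtitle suffix does not match it.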
@@ -175,7 +207,7 @@ class ArteTVPlus7IE(InfoExtractor):
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/(?:magazine?/)?(?P<id>[^?#]+)'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
@@ -199,7 +231,7 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(?P<id>.+)'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
@@ -207,6 +239,7 @@ class ArteTVFutureIE(ArteTVPlus7IE):
'id': '050940-028-A',
'ext': 'mp4',
'title': 'Les écrevisses aussi peuvent être anxieuses',
'upload_date': '20140902',
},
}, {
'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
@@ -216,7 +249,7 @@ class ArteTVFutureIE(ArteTVPlus7IE):
class ArteTVDDCIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:ddc'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
@@ -234,7 +267,7 @@ class ArteTVDDCIE(ArteTVPlus7IE):
class ArteTVConcertIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:concert'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
@@ -251,7 +284,7 @@ class ArteTVConcertIE(ArteTVPlus7IE):
class ArteTVCinemaIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:cinema'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
_TEST = {
'url': 'http://cinema.arte.tv/de/node/38291',
@@ -266,6 +299,37 @@ class ArteTVCinemaIE(ArteTVPlus7IE):
}
class ArteTVMagazineIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:magazine'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/magazine/[^/]+/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
# Embedded via <iframe src="http://www.arte.tv/arte_vp/index.php?json_url=..."
'url': 'http://www.arte.tv/magazine/trepalium/fr/entretien-avec-le-realisateur-vincent-lannoo-trepalium',
'md5': '2a9369bcccf847d1c741e51416299f25',
'info_dict': {
'id': '065965-000-A',
'ext': 'mp4',
'title': 'Trepalium - Extrait Ep.01',
'upload_date': '20160121',
},
}, {
# Embedded via <iframe src="http://www.arte.tv/guide/fr/embed/054813-004-A/medium"
'url': 'http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium',
'md5': 'fedc64fc7a946110fe311634e79782ca',
'info_dict': {
'id': '054813-004_PLUS7-F',
'ext': 'mp4',
'title': 'Trepalium (4/6)',
'description': 'md5:10057003c34d54e95350be4f9b05cb40',
'upload_date': '20160218',
},
}, {
'url': 'http://www.arte.tv/magazine/metropolis/de/frank-woeste-german-paris-metropolis',
'only_matching': True,
}]
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)

@@ -3,7 +3,11 @@ from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import float_or_none
from ..utils import (
ExtractorError,
float_or_none,
sanitized_Request,
)
class AzubuIE(InfoExtractor):
@@ -91,3 +95,37 @@ class AzubuIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
}
class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'http://www\.azubu\.tv/(?P<id>[^/]+)$'
_TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen',
'only_matching': True,
}
def _real_extract(self, url):
user = self._match_id(url)
info = self._download_json(
'http://api.azubu.tv/public/modules/last-video/{0}/info'.format(user),
user)['data']
if info['type'] != 'STREAM':
raise ExtractorError('{0} is not streaming live'.format(user), expected=True)
req = sanitized_Request(
'https://edge-elb.api.brightcove.com/playback/v1/accounts/3361910549001/videos/ref:' + info['reference_id'])
req.add_header('Accept', 'application/json;pk=BCpkADawqM1gvI0oGWg8dxQHlgT8HkdE2LnAlWAZkOlznO39bSZX726u4JqnDsK3MDXcO01JxXK2tZtJbgQChxgaFzEVdHRjaDoxaOu8hHOO8NYhwdxw9BzvgkvLUlpbDNUuDoc4E4wxDToV')
bc_info = self._download_json(req, user)
m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS')
formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4')
return {
'id': info['id'],
'title': self._live_title(info['title']),
'uploader_id': user,
'formats': formats,
'is_live': True,
'thumbnail': bc_info['poster'],
}
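Review note: the `next(...)` call in `AzubuLiveIE._real_extract` raises `StopIteration` when no source has the `M2TS` container. The sketch below shows a more defensive lookup, assuming the same `sources` shape as in the diff. `pick_source` and the sample payload are hypothetical and not part of youtube-dl.

```python
def pick_source(sources, container='M2TS'):
    """Return the 'src' of the first source with the given container, or None."""
    return next(
        (source['src'] for source in sources
         if source.get('container') == container),
        None)  # default avoids StopIteration when nothing matches


# Hypothetical payload shaped like bc_info['sources'] in the diff above.
sample_sources = [
    {'container': 'MP4', 'src': 'http://example.com/video.mp4'},
    {'container': 'M2TS', 'src': 'http://example.com/master.m3u8'},
]
m3u8_url = pick_source(sample_sources)
```

A caller could then raise an `ExtractorError` with a readable message when the result is `None`, rather than surfacing a bare `StopIteration`.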

View File

@@ -86,7 +86,7 @@ class BBCCoUkIE(InfoExtractor):
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Voice UK: Series 3: Blind Auditions 5',
'description': "Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.",
'description': 'Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.',
'duration': 5100,
},
'params': {
@@ -193,6 +193,19 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
}, {
# compact player (https://github.com/rg3/youtube-dl/issues/8147)
'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
'info_dict': {
'id': 'p028bfkj',
'ext': 'flv',
'title': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
'description': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
@@ -482,9 +495,11 @@ class BBCCoUkIE(InfoExtractor):
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
title = self._og_search_title(webpage, default=None) or self._html_search_regex(
r'<h2[^>]+id="parent-title"[^>]*>(.+?)</h2>', webpage, 'title')
(r'<h2[^>]+id="parent-title"[^>]*>(.+?)</h2>',
r'<div[^>]+class="info"[^>]*>\s*<h1>(.+?)</h1>'), webpage, 'title')
description = self._search_regex(
r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
(r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
r'<div[^>]+class="info_+synopsis"[^>]*>([^<]+)</div>'),
webpage, 'description', default=None)
if not description:
description = self._html_search_meta('description', webpage)

View File

@@ -1,7 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
determine_ext,
)
class BpbIE(InfoExtractor):
@@ -10,7 +16,8 @@ class BpbIE(InfoExtractor):
_TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',
'md5': '0792086e8e2bfbac9cdf27835d5f2093',
# md5 fails in Python 2.6 due to buggy server response and wrong handling of urllib2
'md5': 'c4f84c8a8044ca9ff68bb8441d300b3f',
'info_dict': {
'id': '297',
'ext': 'mp4',
@@ -25,13 +32,26 @@ class BpbIE(InfoExtractor):
title = self._html_search_regex(
r'<h2 class="white">(.*?)</h2>', webpage, 'title')
video_url = self._html_search_regex(
r'(http://film\.bpb\.de/player/dokument_[0-9]+\.mp4)',
webpage, 'video URL')
video_info_dicts = re.findall(
r"({\s*src:\s*'http://film\.bpb\.de/[^}]+})", webpage)
formats = []
for video_info in video_info_dicts:
video_info = self._parse_json(video_info, video_id, transform_source=js_to_json)
quality = video_info['quality']
video_url = video_info['src']
formats.append({
'url': video_url,
'preference': 10 if quality == 'high' else 0,
'format_note': quality,
'format_id': '%s-%s' % (quality, determine_ext(video_url)),
})
self._sort_formats(formats)
return {
'id': video_id,
'url': video_url,
'formats': formats,
'title': title,
'description': self._og_search_description(webpage),
}
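Review note: the new `BpbIE` loop turns each `{src, quality}` object into a format entry and prefers the `high` quality. The sketch below shows that mapping in isolation. It assumes already-parsed dictionaries, uses `os.path.splitext` as a stand-in for youtube-dl's `determine_ext`, and replaces `_sort_formats` with a plain sort on `preference`. `build_formats` and the sample URLs are hypothetical.

```python
import os.path


def build_formats(video_infos):
    """Map {'src', 'quality'} dicts to format dicts, best format last."""
    formats = []
    for info in video_infos:
        quality = info['quality']
        video_url = info['src']
        # Stand-in for determine_ext(): take the file extension of the URL.
        ext = os.path.splitext(video_url)[1].lstrip('.')
        formats.append({
            'url': video_url,
            'preference': 10 if quality == 'high' else 0,
            'format_note': quality,
            'format_id': '%s-%s' % (quality, ext),
        })
    # Simplified ordering: lowest preference first, so the best entry is last.
    formats.sort(key=lambda f: f['preference'])
    return formats


sample = [
    {'src': 'http://example.com/dokument_1.mp4', 'quality': 'high'},
    {'src': 'http://example.com/dokument_1_low.mp4', 'quality': 'low'},
]
formats = build_formats(sample)
```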

View File

@@ -6,7 +6,7 @@ from ..utils import float_or_none
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c',
'info_dict': {
@@ -18,7 +18,27 @@ class CanvasIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 49.02,
}
}
}, {
# with subtitles
'url': 'http://www.canvas.be/video/panorama/2016/pieter-0167',
'info_dict': {
'id': 'mz-ast-5240ff21-2d30-4101-bba6-92b5ec67c625',
'display_id': 'pieter-0167',
'ext': 'mp4',
'title': 'Pieter 0167',
'description': 'md5:943cd30f48a5d29ba02c3a104dc4ec4e',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 2553.08,
'subtitles': {
'nl': [{
'ext': 'vtt',
}],
},
},
'params': {
'skip_download': True,
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
@@ -54,6 +74,14 @@ class CanvasIE(InfoExtractor):
})
self._sort_formats(formats)
subtitles = {}
subtitle_urls = data.get('subtitleUrls')
if isinstance(subtitle_urls, list):
for subtitle in subtitle_urls:
subtitle_url = subtitle.get('url')
if subtitle_url and subtitle.get('type') == 'CLOSED':
subtitles.setdefault('nl', []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
@@ -62,4 +90,5 @@ class CanvasIE(InfoExtractor):
'formats': formats,
'duration': float_or_none(data.get('duration'), 1000),
'thumbnail': data.get('posterImageUrl'),
'subtitles': subtitles,
}
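Review note: the subtitle hunk keeps only `CLOSED` entries that carry a URL and groups them under the `nl` language key with `setdefault`. The sketch below isolates that logic. `collect_subtitles` and the sample list are hypothetical, and the input shape is assumed from the diff.

```python
def collect_subtitles(subtitle_urls, lang='nl'):
    """Group closed-caption URLs under one language key."""
    subtitles = {}
    if not isinstance(subtitle_urls, list):
        # Missing or malformed field: report no subtitles instead of failing.
        return subtitles
    for subtitle in subtitle_urls:
        subtitle_url = subtitle.get('url')
        if subtitle_url and subtitle.get('type') == 'CLOSED':
            subtitles.setdefault(lang, []).append({'url': subtitle_url})
    return subtitles


sample = [
    {'type': 'CLOSED', 'url': 'http://example.com/subs.vtt'},
    {'type': 'OPEN', 'url': 'http://example.com/burned-in.vtt'},
    {'type': 'CLOSED'},
]
subtitles = collect_subtitles(sample)
```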

113
youtube_dl/extractor/cbc.py Normal file
View File

@@ -0,0 +1,113 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
class CBCIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
# with mediaId
'url': 'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs',
'info_dict': {
'id': '2682904050',
'ext': 'flv',
'title': 'Don Cherry All-Stars',
'description': 'Don Cherry has a bee in his bonnet about AHL player John Scott because that guys got heart.',
'timestamp': 1454475540,
'upload_date': '20160203',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# with clipId
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'info_dict': {
'id': '2487345465',
'ext': 'flv',
'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# multiple iframes
'url': 'http://www.cbc.ca/natureofthings/blog/birds-eye-view-from-vancouvers-burrard-street-bridge-how-we-got-the-shot',
'playlist': [{
'info_dict': {
'id': '2680832926',
'ext': 'flv',
'title': 'An Eagle\'s-Eye View Off Burrard Bridge',
'description': 'Hercules the eagle flies from Vancouver\'s Burrard Bridge down to a nearby park with a mini-camera strapped to his back.',
'upload_date': '19700101',
},
}, {
'info_dict': {
'id': '2658915080',
'ext': 'flv',
'title': 'Fly like an eagle!',
'description': 'Eagle equipped with a mini camera flies from the world\'s tallest tower',
'upload_date': '19700101',
},
}],
'params': {
# rtmp download
'skip_download': True,
},
}]
@classmethod
def suitable(cls, url):
return False if CBCPlayerIE.suitable(url) else super(CBCIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
player_init = self._search_regex(
r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage, 'player init',
default=None)
if player_init:
player_info = self._parse_json(player_init, display_id, js_to_json)
media_id = player_info.get('mediaId')
if not media_id:
clip_id = player_info['clipId']
media_id = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id)['entries'][0]['id'].split('/')[-1]
return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
else:
entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
return self.playlist_result(entries)
class CBCPlayerIE(InfoExtractor):
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_TEST = {
'url': 'http://www.cbc.ca/player/play/2683190193',
'info_dict': {
'id': '2683190193',
'ext': 'flv',
'title': 'Gerry Runs a Sweat Shop',
'description': 'md5:b457e1c01e8ff408d9d801c1c2cd29b0',
'timestamp': 1455067800,
'upload_date': '20160210',
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://feed.theplatform.com/f/ExhSPC/vms_5akSXx4Ng_Zn?byGuid=%s' % video_id,
'ThePlatformFeed', video_id)
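Review note: `CBCIE.suitable` defers to `CBCPlayerIE` because the generic `cbc.ca` pattern would otherwise also claim player URLs. The sketch below exercises the `CBCPlayerIE` pattern from this diff on the three URL shapes it accepts. `match_player_id` is a hypothetical helper written for illustration only.

```python
import re

# Pattern copied from CBCPlayerIE._VALID_URL in the diff above.
CBC_PLAYER_VALID_URL = (
    r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/'
    r'(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)')


def match_player_id(url):
    """Return the numeric media id if the URL is a player URL, else None."""
    m = re.match(CBC_PLAYER_VALID_URL, url)
    return m.group('id') if m else None


internal = match_player_id('cbcplayer:2682904050')
player = match_player_id('http://www.cbc.ca/player/play/2683190193')
embed = match_player_id(
    'http://www.cbc.ca/i/caffeine/syndicate/?mediaId=2680832926')
article = match_player_id(
    'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs')
```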

View File

@@ -1,16 +1,17 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import remove_start
from .theplatform import ThePlatformIE
from ..utils import (
parse_duration,
find_xpath_attr,
)
class CBSNewsIE(InfoExtractor):
class CBSNewsIE(ThePlatformIE):
IE_DESC = 'CBS News'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:[^/]+/)+(?P<id>[\da-z_-]+)'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_TESTS = [
{
@@ -31,7 +32,7 @@ class CBSNewsIE(InfoExtractor):
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': {
'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
'ext': 'flv',
'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205,
@@ -42,60 +43,46 @@ class CBSNewsIE(InfoExtractor):
},
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
},
]
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else {}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = json.loads(self._html_search_regex(
video_info = self._parse_json(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info'))
webpage, 'video JSON info'), video_id)
item = video_info['item'] if 'item' in video_info else video_info
title = item.get('articleTitle') or item.get('hed')
duration = item.get('duration')
thumbnail = item.get('mediaImage') or item.get('thumbnail')
subtitles = {}
formats = []
for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
uri = item.get('media' + format_id + 'URI')
if not uri:
pid = item.get('media' + format_id)
if not pid:
continue
uri = remove_start(uri, '{manifest:none}')
fmt = {
'url': uri,
'format_id': format_id,
}
if uri.startswith('rtmp'):
play_path = re.sub(
r'{slistFilePath}', '',
uri.split('<break>')[-1].split('{break}')[-1])
play_path = re.sub(
r'{manifest:.+}.*$', '', play_path)
fmt.update({
'app': 'ondemand?auth=cbs',
'play_path': 'mp4:' + play_path,
'player_url': 'http://www.cbsnews.com/[[IMPORT]]/vidtech.cbsinteractive.com/player/3_3_0/CBSI_PLAYER_HD.swf',
'page_url': 'http://www.cbsnews.com',
'ext': 'flv',
})
elif uri.endswith('.m3u8'):
fmt['ext'] = 'mp4'
formats.append(fmt)
subtitles = {}
if 'mpxRefId' in video_info:
subtitles['en'] = [{
'ext': 'ttml',
'url': 'http://www.cbsnews.com/videos/captions/%s.adb_xml' % video_info['mpxRefId'],
}]
release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?format=SMIL&mbr=true' % pid
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
'id': video_id,
@@ -105,3 +92,41 @@ class CBSNewsIE(InfoExtractor):
'formats': formats,
'subtitles': subtitles,
}
class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv',
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._parse_json(self._html_search_regex(
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce a 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
return {
'id': video_id,
'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats,
}

View File

@@ -45,7 +45,7 @@ class CCCIE(InfoExtractor):
title = self._html_search_regex(
r'(?s)<h1>(.*?)</h1>', webpage, 'title')
description = self._html_search_regex(
r"(?s)<h3>About</h3>(.+?)<h3>",
r'(?s)<h3>About</h3>(.+?)<h3>',
webpage, 'description', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r"(?s)<span[^>]+class='[^']*fa-calendar-o'[^>]*>(.+?)</span>",

View File

@@ -177,16 +177,16 @@ class CeskaTelevizeIE(InfoExtractor):
for divider in [1000, 60, 60, 100]:
components.append(msec % divider)
msec //= divider
return "{3:02}:{2:02}:{1:02},{0:03}".format(*components)
return '{3:02}:{2:02}:{1:02},{0:03}'.format(*components)
def _fix_subtitle(subtitle):
for line in subtitle.splitlines():
m = re.match(r"^\s*([0-9]+);\s*([0-9]+)\s+([0-9]+)\s*$", line)
m = re.match(r'^\s*([0-9]+);\s*([0-9]+)\s+([0-9]+)\s*$', line)
if m:
yield m.group(1)
start, stop = (_msectotimecode(int(t)) for t in m.groups()[1:])
yield "{0} --> {1}".format(start, stop)
yield '{0} --> {1}'.format(start, stop)
else:
yield line
return "\r\n".join(_fix_subtitle(subtitles))
return '\r\n'.join(_fix_subtitle(subtitles))
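Review note: the quoting changes above do not alter behaviour. For readers unfamiliar with the helper, the sketch below reproduces the timecode arithmetic as standalone functions. The names `msec_to_timecode` and `fix_subtitle_line` are hypothetical and the input line is made up.

```python
import re


def msec_to_timecode(msec):
    """Convert milliseconds to an SRT-style HH:MM:SS,mmm timecode."""
    components = []
    # Peel off milliseconds, seconds, minutes and hours in turn.
    for divider in [1000, 60, 60, 100]:
        components.append(msec % divider)
        msec //= divider
    return '{3:02}:{2:02}:{1:02},{0:03}'.format(*components)


def fix_subtitle_line(line):
    """Turn 'index; start stop' (milliseconds) into SRT index and timing lines."""
    m = re.match(r'^\s*([0-9]+);\s*([0-9]+)\s+([0-9]+)\s*$', line)
    if not m:
        return [line]
    start, stop = (msec_to_timecode(int(t)) for t in m.groups()[1:])
    return [m.group(1), '{0} --> {1}'.format(start, stop)]


timecode = msec_to_timecode(3723004)  # '01:02:03,004'
converted = fix_subtitle_line('1; 1000 2500')
```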

View File

@@ -26,14 +26,14 @@ class CNNIE(InfoExtractor):
'upload_date': '20130609',
},
}, {
"url": "http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29",
"md5": "b5cc60c60a3477d185af8f19a2a26f4e",
"info_dict": {
'url': 'http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29',
'md5': 'b5cc60c60a3477d185af8f19a2a26f4e',
'info_dict': {
'id': 'us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology',
'ext': 'mp4',
"title": "Student's epic speech stuns new freshmen",
"description": "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
"upload_date": "20130821",
'title': "Student's epic speech stuns new freshmen",
'description': "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
'upload_date': '20130821',
}
}, {
'url': 'http://www.cnn.com/video/data/2.0/video/living/2014/12/22/growing-america-nashville-salemtown-board-episode-1.hln.html',

View File

@@ -46,9 +46,9 @@ class CollegeRamaIE(InfoExtractor):
video_id = self._match_id(url)
player_options_request = {
"getPlayerOptionsRequest": {
"ResourceId": video_id,
"QueryString": "",
'getPlayerOptionsRequest': {
'ResourceId': video_id,
'QueryString': '',
}
}

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_duration,
@@ -14,14 +15,13 @@ class ComCarCoffIE(InfoExtractor):
_TESTS = [{
'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
'info_dict': {
'id': 'miranda-sings-happy-thanksgiving-miranda',
'id': '2494164',
'ext': 'mp4',
'upload_date': '20141127',
'timestamp': 1417107600,
'duration': 1232,
'title': 'Happy Thanksgiving Miranda',
'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
'thumbnail': 'http://ccc.crackle.com/images/s5e4_thumb.jpg',
},
'params': {
'skip_download': 'requires ffmpeg',
@@ -39,15 +39,14 @@ class ComCarCoffIE(InfoExtractor):
r'window\.app\s*=\s*({.+?});\n', webpage, 'full data json'),
display_id)['videoData']
video_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]
display_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
video_id = compat_str(video_data['mediaId'])
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
'url': video_data['images']['poster'],
}]
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, ext='mp4')
timestamp = int_or_none(video_data.get('pubDateTime')) or parse_iso8601(
video_data.get('pubDate'))
@@ -55,6 +54,8 @@ class ComCarCoffIE(InfoExtractor):
video_data.get('duration'))
return {
'_type': 'url_transparent',
'url': 'crackle:%s' % video_id,
'id': video_id,
'display_id': display_id,
'title': video_data['title'],
@@ -62,6 +63,7 @@ class ComCarCoffIE(InfoExtractor):
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
}

View File

@@ -16,11 +16,11 @@ from ..utils import (
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(video-clips|episodes|cc-studios|video-collections|full-episodes)
(video-clips|episodes|cc-studios|video-collections|full-episodes|shows)
/(?P<title>.*)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TEST = {
_TESTS = [{
'url': 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
'info_dict': {
@@ -29,7 +29,10 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
'title': 'CC:Stand-Up|Greg Fitzsimmons: Life on Stage|Uncensored - Too Good of a Mother',
'description': 'After a certain point, breastfeeding becomes c**kblocking.',
},
}
}, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
'only_matching': True,
}]
class ComedyCentralShowsIE(MTVServicesInfoExtractor):
@@ -192,7 +195,7 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
if len(altMovieParams) == 0:
raise ExtractorError('unable to find Flash URL in webpage ' + url)
else:
mMovieParams = [("http://media.mtvnservices.com/" + altMovieParams[0], altMovieParams[0])]
mMovieParams = [('http://media.mtvnservices.com/' + altMovieParams[0], altMovieParams[0])]
uri = mMovieParams[0][1]
# Correct cc.com in uri

View File

@@ -10,6 +10,7 @@ import re
import socket
import sys
import time
import math
from ..compat import (
compat_cookiejar,
@@ -44,6 +45,8 @@ from ..utils import (
xpath_text,
xpath_with_ns,
determine_protocol,
parse_duration,
mimetype2ext,
)
@@ -634,7 +637,7 @@ class InfoExtractor(object):
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username', None) is not None:
if downloader_params.get('username') is not None:
username = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
@@ -661,7 +664,7 @@ class InfoExtractor(object):
return None
downloader_params = self._downloader.params
if downloader_params.get('twofactor', None) is not None:
if downloader_params.get('twofactor') is not None:
return downloader_params['twofactor']
return compat_getpass('Type %s and press [Return]: ' % note)
@@ -742,7 +745,7 @@ class InfoExtractor(object):
'mature': 17,
'restricted': 19,
}
return RATING_TABLE.get(rating.lower(), None)
return RATING_TABLE.get(rating.lower())
def _family_friendly_search(self, html):
# See http://schema.org/VideoObject
@@ -757,7 +760,7 @@ class InfoExtractor(object):
'0': 18,
'false': 18,
}
return RATING_TABLE.get(family_friendly.lower(), None)
return RATING_TABLE.get(family_friendly.lower())
def _twitter_search_player(self, html):
return self._html_search_meta('twitter:player', html,
@@ -825,6 +828,12 @@ class InfoExtractor(object):
if not formats:
raise ExtractorError('No video formats found')
for f in formats:
# Automatically determine tbr when missing based on abr and vbr (improves
# formats sorting in some cases)
if 'tbr' not in f and f.get('abr') is not None and f.get('vbr') is not None:
f['tbr'] = f['abr'] + f['vbr']
def _formats_key(f):
# TODO remove the following workaround
from ..utils import determine_ext
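Review note: the hunk above derives a total bitrate when only audio and video bitrates are known. The sketch below shows the same rule as a standalone function. `fill_missing_tbr` and the sample formats are hypothetical.

```python
def fill_missing_tbr(formats):
    """Set 'tbr' to abr + vbr wherever it is absent and both parts are known."""
    for f in formats:
        if ('tbr' not in f
                and f.get('abr') is not None
                and f.get('vbr') is not None):
            f['tbr'] = f['abr'] + f['vbr']
    return formats


sample = fill_missing_tbr([
    {'format_id': 'a', 'abr': 128, 'vbr': 1000},
    {'format_id': 'b', 'abr': 128},               # vbr unknown: left alone
    {'format_id': 'c', 'abr': 64, 'vbr': 500, 'tbr': 600},  # kept as given
])
```

Note that an explicit `tbr` is never overwritten, so extractor-provided values still take precedence.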
@@ -891,6 +900,16 @@ class InfoExtractor(object):
item='%s video format' % f.get('format_id') if f.get('format_id') else 'video'),
formats)
@staticmethod
def _remove_duplicate_formats(formats):
format_urls = set()
unique_formats = []
for f in formats:
if f['url'] not in format_urls:
format_urls.add(f['url'])
unique_formats.append(f)
formats[:] = unique_formats
def _is_valid_url(self, url, video_id, item='video'):
url = self._proto_relative_url(url, scheme='http:')
# For now assume non HTTP(S) URLs always valid
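Review note: `_remove_duplicate_formats` works in place through slice assignment, so callers see the change without using a return value. The sketch below mirrors the logic as a free function to make that visible. `remove_duplicate_formats` and the sample data are hypothetical.

```python
def remove_duplicate_formats(formats):
    """Drop formats whose URL was already seen, keeping the first occurrence."""
    format_urls = set()
    unique_formats = []
    for f in formats:
        if f['url'] not in format_urls:
            format_urls.add(f['url'])
            unique_formats.append(f)
    # Slice assignment mutates the caller's list object.
    formats[:] = unique_formats


formats = [
    {'format_id': 'hls-1', 'url': 'http://example.com/a.m3u8'},
    {'format_id': 'hls-2', 'url': 'http://example.com/b.m3u8'},
    {'format_id': 'hls-1-dup', 'url': 'http://example.com/a.m3u8'},
]
alias = formats
remove_duplicate_formats(formats)
```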
@@ -1014,6 +1033,18 @@ class InfoExtractor(object):
return []
m3u8_doc, urlh = res
m3u8_url = urlh.geturl()
# A Media Playlist Tag MUST NOT appear in a Master Playlist
# https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3
# The EXT-X-TARGETDURATION tag is REQUIRED for every M3U8 Media Playlist
# https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.1
if '#EXT-X-TARGETDURATION' in m3u8_doc:
return [{
'url': m3u8_url,
'format_id': m3u8_id,
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}]
last_info = None
last_media = None
kv_rex = re.compile(
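Review note: the early return above treats any document containing `#EXT-X-TARGETDURATION` as a media playlist and emits a single format for it instead of parsing variants. The sketch below shows that check on two made-up playlists. `is_media_playlist` is a hypothetical helper, and the substring test is the same heuristic as in the diff, not a full HLS parser.

```python
def is_media_playlist(m3u8_doc):
    """Heuristic from the diff: media playlists must carry a target duration."""
    return '#EXT-X-TARGETDURATION' in m3u8_doc


# Made-up master playlist: lists variant streams only.
master_playlist = '\n'.join([
    '#EXTM3U',
    '#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=640x360',
    'low/index.m3u8',
    '#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1280x720',
    'high/index.m3u8',
])

# Made-up media playlist: lists the segments themselves.
media_playlist = '\n'.join([
    '#EXTM3U',
    '#EXT-X-TARGETDURATION:10',
    '#EXTINF:9.0,',
    'segment0.ts',
    '#EXT-X-ENDLIST',
])

master_is_media = is_media_playlist(master_playlist)
media_is_media = is_media_playlist(media_playlist)
```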
@@ -1058,9 +1089,9 @@ class InfoExtractor(object):
# TODO: it looks like the video codec does not necessarily always come first
va_codecs = codecs.split(',')
if va_codecs[0]:
f['vcodec'] = va_codecs[0].partition('.')[0]
f['vcodec'] = va_codecs[0]
if len(va_codecs) > 1 and va_codecs[1]:
f['acodec'] = va_codecs[1].partition('.')[0]
f['acodec'] = va_codecs[1]
resolution = last_info.get('RESOLUTION')
if resolution:
width_str, height_str = resolution.split('x')
@@ -1164,12 +1195,15 @@ class InfoExtractor(object):
formats = []
rtmp_count = 0
http_count = 0
m3u8_count = 0
srcs = []
videos = smil.findall(self._xpath_ns('.//video', namespace))
for video in videos:
src = video.get('src')
if not src:
if not src or src in srcs:
continue
srcs.append(src)
bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
filesize = int_or_none(video.get('size') or video.get('fileSize'))
@@ -1201,10 +1235,20 @@ class InfoExtractor(object):
continue
src_url = src if src.startswith('http') else compat_urlparse.urljoin(base, src)
src_url = src_url.strip()
if proto == 'm3u8' or src_ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src_url, video_id, ext or 'mp4', m3u8_id='hls', fatal=False))
m3u8_formats = self._extract_m3u8_formats(
src_url, video_id, ext or 'mp4', m3u8_id='hls', fatal=False)
if len(m3u8_formats) == 1:
m3u8_count += 1
m3u8_formats[0].update({
'format_id': 'hls-%d' % (m3u8_count if bitrate is None else bitrate),
'tbr': bitrate,
'width': width,
'height': height,
})
formats.extend(m3u8_formats)
continue
if src_ext == 'f4m':
@@ -1237,21 +1281,14 @@ class InfoExtractor(object):
return formats
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
urls = []
subtitles = {}
for num, textstream in enumerate(smil.findall(self._xpath_ns('.//textstream', namespace))):
src = textstream.get('src')
if not src:
if not src or src in urls:
continue
ext = textstream.get('ext') or determine_ext(src)
if not ext:
type_ = textstream.get('type')
SUBTITLES_TYPES = {
'text/vtt': 'vtt',
'text/srt': 'srt',
'application/smptett+xml': 'tt',
}
if type_ in SUBTITLES_TYPES:
ext = SUBTITLES_TYPES[type_]
urls.append(src)
ext = textstream.get('ext') or determine_ext(src) or mimetype2ext(textstream.get('type'))
lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName') or textstream.get('lang') or subtitles_lang
subtitles.setdefault(lang, []).append({
'url': src,
@@ -1302,10 +1339,167 @@ class InfoExtractor(object):
})
return entries
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}):
res = self._download_webpage_handle(
mpd_url, video_id,
note=note or 'Downloading MPD manifest',
errnote=errnote or 'Failed to download MPD manifest',
fatal=fatal)
if res is False:
return []
mpd, urlh = res
mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
return self._parse_mpd_formats(
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
if mpd_doc.get('type') == 'dynamic':
return []
namespace = self._search_regex(r'(?i)^{([^}]+)?}MPD$', mpd_doc.tag, 'namespace', default=None)
def _add_ns(path):
return self._xpath_ns(path, namespace)
def is_drm_protected(element):
return element.find(_add_ns('ContentProtection')) is not None
def extract_multisegment_info(element, ms_parent_info):
ms_info = ms_parent_info.copy()
segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None:
segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
if segment_urls_e:
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
initialization = segment_list.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
else:
segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None:
start_number = segment_template.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
for s in s_e:
ms_info['total_number'] += 1 + int(s.get('r', '0'))
else:
timescale = segment_template.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = segment_template.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
media_template = segment_template.get('media')
if media_template:
ms_info['media_template'] = media_template
initialization = segment_template.get('initialization')
if initialization:
ms_info['initialization_url'] = initialization
else:
initialization = segment_template.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
return ms_info
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = []
for period in mpd_doc.findall(_add_ns('Period')):
period_duration = parse_duration(period.get('duration')) or mpd_duration
period_ms_info = extract_multisegment_info(period, {
'start_number': 1,
'timescale': 1,
})
for adaptation_set in period.findall(_add_ns('AdaptationSet')):
if is_drm_protected(adaptation_set):
continue
adaption_set_ms_info = extract_multisegment_info(adaptation_set, period_ms_info)
for representation in adaptation_set.findall(_add_ns('Representation')):
if is_drm_protected(representation):
continue
representation_attrib = adaptation_set.attrib.copy()
representation_attrib.update(representation.attrib)
mime_type = representation_attrib.get('mimeType')
content_type = mime_type.split('/')[0] if mime_type else representation_attrib.get('contentType')
if content_type == 'text':
# TODO implement WebVTT downloading
pass
elif content_type == 'video' or content_type == 'audio':
base_url = ''
for element in (representation, adaptation_set, period, mpd_doc):
base_url_e = element.find(_add_ns('BaseURL'))
if base_url_e is not None:
base_url = base_url_e.text + base_url
if re.match(r'^https?://', base_url):
break
if mpd_base_url and not re.match(r'^https?://', base_url):
if not mpd_base_url.endswith('/') and not base_url.startswith('/'):
mpd_base_url += '/'
base_url = mpd_base_url + base_url
representation_id = representation_attrib.get('id')
lang = representation_attrib.get('lang')
url_el = representation.find(_add_ns('BaseURL'))
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url,
'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')),
'tbr': int_or_none(representation_attrib.get('bandwidth'), 1000),
'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
'fps': int_or_none(representation_attrib.get('frameRate')),
'vcodec': 'none' if content_type == 'audio' else representation_attrib.get('codecs'),
'acodec': 'none' if content_type == 'video' else representation_attrib.get('codecs'),
'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
'format_note': 'DASH %s' % content_type,
'filesize': filesize,
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)(?:%(0\d+)d)?\$', r'%(\1)\2d', media_template)
media_template = media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [media_template % {'Number': segment_number, 'Bandwidth': representation_attrib.get('bandwidth')} for segment_number in range(representation_ms_info['start_number'], representation_ms_info['total_number'] + representation_ms_info['start_number'])]
if 'segment_urls' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
'protocol': 'http_dash_segments',
})
if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
f.update({
'initialization_url': initialization_url,
})
if not f.get('url'):
f['url'] = initialization_url
try:
existing_format = next(
fo for fo in formats
if fo['format_id'] == representation_id)
except StopIteration:
full_info = formats_dict.get(representation_id, {}).copy()
full_info.update(f)
formats.append(full_info)
else:
existing_format.update(f)
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
self._sort_formats(formats)
return formats
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
now_str = now.strftime("%Y-%m-%d %H:%M")
now_str = now.strftime('%Y-%m-%d %H:%M')
return name + ' ' + now_str
def _int(self, v, name, fatal=False, **kwargs):
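Review note: the `SegmentTemplate` handling in `_parse_mpd_formats` is dense, so the sketch below restates its two core steps as standalone functions: counting segments from durations, and expanding a `$Number$`/`$Bandwidth$` template into URLs. The function names and sample values are hypothetical. The sketch assumes `bandwidth` is passed as an integer and that the template contains no literal `%` characters.

```python
import math
import re


def count_segments(period_duration, segment_duration, timescale=1):
    """Number of segments needed to cover the period (durations in seconds)."""
    seconds_per_segment = float(segment_duration) / float(timescale)
    return int(math.ceil(float(period_duration) / seconds_per_segment))


def count_timeline_segments(repeat_counts):
    """SegmentTimeline: each <S> contributes 1 + its 'r' repeat count."""
    return sum(1 + int(r) for r in repeat_counts)


def expand_segment_template(media_template, representation_id, bandwidth,
                            start_number, total_number):
    """Expand a DASH media template into a list of segment URLs."""
    template = media_template.replace('$RepresentationID$', representation_id)

    def to_percent_format(m):
        # '$Number%05d$' -> '%(Number)05d', '$Number$' -> '%(Number)d'
        return '%({0}){1}d'.format(m.group(1), m.group(2) or '')

    template = re.sub(
        r'\$(Number|Bandwidth)(?:%(0\d+)d)?\$', to_percent_format, template)
    # '$$' is the escape for a literal dollar sign; keep the result.
    template = template.replace('$$', '$')
    return [
        template % {'Number': number, 'Bandwidth': bandwidth}
        for number in range(start_number, start_number + total_number)]


total = count_segments(period_duration=10, segment_duration=4)
urls = expand_segment_template(
    'seg-$RepresentationID$-$Number%05d$.m4s', 'video1', 500000, 1, total)
```

Using a replacement function instead of a `\2` backreference avoids relying on how `re.sub` treats an unmatched optional group, which differs between Python versions.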
@@ -1378,7 +1572,7 @@ class InfoExtractor(object):
return {}
def _get_subtitles(self, *args, **kwargs):
raise NotImplementedError("This method must be implemented by subclasses")
raise NotImplementedError('This method must be implemented by subclasses')
@staticmethod
def _merge_subtitle_items(subtitle_list1, subtitle_list2):
@@ -1404,7 +1598,7 @@ class InfoExtractor(object):
return {}
def _get_automatic_captions(self, *args, **kwargs):
raise NotImplementedError("This method must be implemented by subclasses")
raise NotImplementedError('This method must be implemented by subclasses')
class SearchInfoExtractor(InfoExtractor):
@@ -1444,7 +1638,7 @@ class SearchInfoExtractor(InfoExtractor):
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
raise NotImplementedError("This method must be implemented by subclasses")
raise NotImplementedError('This method must be implemented by subclasses')
@property
def SEARCH_KEY(self):


@@ -0,0 +1,95 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class CrackleIE(InfoExtractor):
_VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = {
'url': 'http://www.crackle.com/the-art-of-more/2496419',
'info_dict': {
'id': '2496419',
'ext': 'mp4',
'title': 'Heavy Lies the Head',
'description': 'md5:bb56aa0708fe7b9a4861535f15c3abca',
},
'params': {
# m3u8 download
'skip_download': True,
}
}
# extracted from http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx
_SUBTITLE_SERVER = 'http://web-us-az.crackle.com'
_UPLYNK_OWNER_ID = 'e8773f7770a44dbd886eee4fca16a66b'
_THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
# extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
_MEDIA_FILE_SLOTS = {
'c544.flv': {
'width': 544,
'height': 306,
},
'360p.mp4': {
'width': 640,
'height': 360,
},
'480p.mp4': {
'width': 852,
'height': 478,
},
'480p_1mbps.mp4': {
'width': 852,
'height': 478,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
item = self._download_xml(
'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
video_id).find('i')
title = item.attrib['t']
thumbnail = None
subtitles = {}
formats = self._extract_m3u8_formats(
'http://content.uplynk.com/ext/%s/%s.m3u8' % (self._UPLYNK_OWNER_ID, video_id),
video_id, 'mp4', m3u8_id='hls', fatal=False)
path = item.attrib.get('p')
if path:
thumbnail = self._THUMBNAIL_TEMPLATE % path
http_base_url = 'http://ahttp.crackle.com/' + path
for mfs_path, mfs_info in self._MEDIA_FILE_SLOTS.items():
formats.append({
'url': http_base_url + mfs_path,
'format_id': 'http-' + mfs_path.split('.')[0],
'width': mfs_info['width'],
'height': mfs_info['height'],
})
for cc in item.findall('cc'):
locale = cc.attrib.get('l')
v = cc.attrib.get('v')
if locale and v:
subtitles.setdefault(locale, []).append({
'url': '%s/%s%s_%s.xml' % (self._SUBTITLE_SERVER, path, locale, v),
'ext': 'ttml',
})
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
return {
'id': video_id,
'title': title,
'description': item.attrib.get('d'),
'duration': int(item.attrib.get('r'), 16) if item.attrib.get('r') else None,
'series': item.attrib.get('sn'),
'season_number': int_or_none(item.attrib.get('se')),
'episode_number': int_or_none(item.attrib.get('ep')),
'thumbnail': thumbnail,
'subtitles': subtitles,
'formats': formats,
}
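Two details of the Crackle extractor above lend themselves to a small standalone sketch: collecting subtitle tracks per locale, and reading the `r` attribute as a hexadecimal duration. The helper names, the `(locale, version)` tuple input and the example server are assumptions made for illustration; only the URL pattern and the base-16 parsing come from the diff.

```python
def collect_subtitles(cc_items, subtitle_server, path):
    """Group subtitle entries by locale.

    cc_items is an iterable of (locale, version) pairs, standing in for the
    'l' and 'v' attributes of the <cc> elements.
    """
    subtitles = {}
    for locale, version in cc_items:
        if not (locale and version):
            continue  # skip incomplete entries
        # setdefault creates the list once and appends afterwards, so a
        # second track for the same locale does not replace the first.
        subtitles.setdefault(locale, []).append({
            'url': '%s/%s%s_%s.xml' % (subtitle_server, path, locale, version),
            'ext': 'ttml',
        })
    return subtitles


def parse_hex_duration(value):
    """Return the duration as an int, treating the attribute as base 16."""
    return int(value, 16) if value else None
```

Appending rather than assigning keeps every track when a locale appears more than once in the feed.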


@@ -180,40 +180,40 @@ class CrunchyrollIE(CrunchyrollBaseIE):
return assvalue
output = '[Script Info]\n'
output += 'Title: %s\n' % sub_root.attrib["title"]
output += 'Title: %s\n' % sub_root.attrib['title']
output += 'ScriptType: v4.00+\n'
output += 'WrapStyle: %s\n' % sub_root.attrib["wrap_style"]
output += 'PlayResX: %s\n' % sub_root.attrib["play_res_x"]
output += 'PlayResY: %s\n' % sub_root.attrib["play_res_y"]
output += 'WrapStyle: %s\n' % sub_root.attrib['wrap_style']
output += 'PlayResX: %s\n' % sub_root.attrib['play_res_x']
output += 'PlayResY: %s\n' % sub_root.attrib['play_res_y']
output += """ScaledBorderAndShadow: yes
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
"""
for style in sub_root.findall('./styles/style'):
output += 'Style: ' + style.attrib["name"]
output += ',' + style.attrib["font_name"]
output += ',' + style.attrib["font_size"]
output += ',' + style.attrib["primary_colour"]
output += ',' + style.attrib["secondary_colour"]
output += ',' + style.attrib["outline_colour"]
output += ',' + style.attrib["back_colour"]
output += ',' + ass_bool(style.attrib["bold"])
output += ',' + ass_bool(style.attrib["italic"])
output += ',' + ass_bool(style.attrib["underline"])
output += ',' + ass_bool(style.attrib["strikeout"])
output += ',' + style.attrib["scale_x"]
output += ',' + style.attrib["scale_y"]
output += ',' + style.attrib["spacing"]
output += ',' + style.attrib["angle"]
output += ',' + style.attrib["border_style"]
output += ',' + style.attrib["outline"]
output += ',' + style.attrib["shadow"]
output += ',' + style.attrib["alignment"]
output += ',' + style.attrib["margin_l"]
output += ',' + style.attrib["margin_r"]
output += ',' + style.attrib["margin_v"]
output += ',' + style.attrib["encoding"]
output += 'Style: ' + style.attrib['name']
output += ',' + style.attrib['font_name']
output += ',' + style.attrib['font_size']
output += ',' + style.attrib['primary_colour']
output += ',' + style.attrib['secondary_colour']
output += ',' + style.attrib['outline_colour']
output += ',' + style.attrib['back_colour']
output += ',' + ass_bool(style.attrib['bold'])
output += ',' + ass_bool(style.attrib['italic'])
output += ',' + ass_bool(style.attrib['underline'])
output += ',' + ass_bool(style.attrib['strikeout'])
output += ',' + style.attrib['scale_x']
output += ',' + style.attrib['scale_y']
output += ',' + style.attrib['spacing']
output += ',' + style.attrib['angle']
output += ',' + style.attrib['border_style']
output += ',' + style.attrib['outline']
output += ',' + style.attrib['shadow']
output += ',' + style.attrib['alignment']
output += ',' + style.attrib['margin_l']
output += ',' + style.attrib['margin_r']
output += ',' + style.attrib['margin_v']
output += ',' + style.attrib['encoding']
output += '\n'
output += """
@@ -222,15 +222,15 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""
for event in sub_root.findall('./events/event'):
output += 'Dialogue: 0'
output += ',' + event.attrib["start"]
output += ',' + event.attrib["end"]
output += ',' + event.attrib["style"]
output += ',' + event.attrib["name"]
output += ',' + event.attrib["margin_l"]
output += ',' + event.attrib["margin_r"]
output += ',' + event.attrib["margin_v"]
output += ',' + event.attrib["effect"]
output += ',' + event.attrib["text"]
output += ',' + event.attrib['start']
output += ',' + event.attrib['end']
output += ',' + event.attrib['style']
output += ',' + event.attrib['name']
output += ',' + event.attrib['margin_l']
output += ',' + event.attrib['margin_r']
output += ',' + event.attrib['margin_v']
output += ',' + event.attrib['effect']
output += ',' + event.attrib['text']
output += '\n'
return output
@@ -376,7 +376,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
IE_NAME = "crunchyroll:playlist"
IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login))(?P<id>[\w\-]+))/?(?:\?|$)'
_TESTS = [{


@@ -68,11 +68,16 @@ class CSpanIE(InfoExtractor):
video_type, video_id = matches.groups()
video_type = 'clip' if video_type == 'id' else 'program'
else:
senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
if senate_isvp_url:
title = self._og_search_title(webpage)
surl = smuggle_url(senate_isvp_url, {'force_title': title})
return self.url_result(surl, 'SenateISVP', video_id, title)
m = re.search(r'data-(?P<type>clip|prog)id=["\'](?P<id>\d+)', webpage)
if m:
video_id = m.group('id')
video_type = 'program' if m.group('type') == 'prog' else 'clip'
else:
senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
if senate_isvp_url:
title = self._og_search_title(webpage)
surl = smuggle_url(senate_isvp_url, {'force_title': title})
return self.url_result(surl, 'SenateISVP', video_id, title)
if video_type is None or video_id is None:
raise ExtractorError('unable to find video id and type')
@@ -107,6 +112,13 @@ class CSpanIE(InfoExtractor):
'height': int_or_none(get_text_attr(quality, 'height')),
'tbr': int_or_none(get_text_attr(quality, 'bitrate')),
})
if not formats:
path = unescapeHTML(get_text_attr(f, 'path'))
if not path:
continue
formats = self._extract_m3u8_formats(
path, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls') if determine_ext(path) == 'm3u8' else [{'url': path, }]
self._sort_formats(formats)
entries.append({
'id': '%s_%d' % (video_id, partnum + 1),
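The new `data-clipid` / `data-progid` lookup in the first C-SPAN hunk can be tried in isolation. A minimal sketch, assuming invented sample markup; `find_cspan_id` is a hypothetical helper, and only the regular expression is taken from the diff.

```python
import re

# Same pattern as in the hunk: the attribute name encodes the video type.
ID_RE = r'data-(?P<type>clip|prog)id=["\'](?P<id>\d+)'


def find_cspan_id(webpage):
    """Return (video_type, video_id), or (None, None) when nothing matches."""
    m = re.search(ID_RE, webpage)
    if not m:
        return None, None
    video_type = 'program' if m.group('type') == 'prog' else 'clip'
    return video_type, m.group('id')
```

Returning `(None, None)` mirrors the later check in the extractor, which raises when either value is still missing.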


@@ -122,10 +122,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
description = self._og_search_description(webpage) or self._html_search_meta(
'description', webpage, 'description')
view_count = str_to_int(self._search_regex(
[r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:(\d+)"',
r'video_views_count[^>]+>\s+([\d\.,]+)'],
webpage, 'view count', fatal=False))
view_count_str = self._search_regex(
(r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
webpage, 'view count', fatal=False)
if view_count_str:
view_count_str = re.sub(r'\s', '', view_count_str)
view_count = str_to_int(view_count_str)
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', fatal=False))
@@ -396,13 +399,13 @@ class DailymotionCloudIE(DailymotionBaseInfoExtractor):
}]
@classmethod
def _extract_dmcloud_url(self, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % self._VALID_EMBED_URL, webpage)
def _extract_dmcloud_url(cls, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL, webpage)
if mobj:
return mobj.group(1)
mobj = re.search(
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % self._VALID_EMBED_URL,
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL,
webpage)
if mobj:
return mobj.group(1)
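The view count change in the first Dailymotion hunk strips whitespace before converting, because counts may be grouped with spaces as well as commas or dots. A rough standalone sketch follows; `parse_view_count` is a hypothetical stand-in for the combination of `re.sub` and the project's `str_to_int`, whose exact behaviour is not reproduced here.

```python
import re


def parse_view_count(text):
    """Convert a grouped count such as '1 234 567' or '12,345' to an int."""
    if text is None:
        return None
    # Drop whitespace and the usual thousands separators in one pass.
    digits = re.sub(r'[\s,.]', '', text)
    return int(digits) if digits.isdigit() else None
```

Returning `None` for empty or non-numeric input keeps the field optional, which matches the `fatal=False` lookup in the diff.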


@@ -2,17 +2,26 @@
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import (
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
int_or_none,
str_to_int,
xpath_text,
unescapeHTML,
)
class DaumIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/v/(?P<id>[^?#&]+)'
_VALID_URL = r'https?://(?:(?:m\.)?tvpot\.daum\.net/v/|videofarm\.daum\.net/controller/player/VodPlayer\.swf\?vid=)(?P<id>[^?#&]+)'
IE_NAME = 'daum.net'
_TESTS = [{
@@ -23,25 +32,57 @@ class DaumIE(InfoExtractor):
'title': '마크 헌트 vs 안토니오 실바',
'description': 'Mark Hunt vs Antonio Silva',
'upload_date': '20131217',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 2117,
'view_count': int,
'comment_count': int,
},
}, {
'url': 'http://m.tvpot.daum.net/v/65139429',
'info_dict': {
'id': '65139429',
'ext': 'mp4',
'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118',
'description': 'md5:79794514261164ff27e36a21ad229fc5',
'upload_date': '20150604',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 154,
'view_count': int,
'comment_count': int,
},
}, {
'url': 'http://tvpot.daum.net/v/07dXWRka62Y%24',
'only_matching': True,
}, {
'url': 'http://videofarm.daum.net/controller/player/VodPlayer.swf?vid=vwIpVpCQsT8%24&ref=',
'info_dict': {
'id': 'vwIpVpCQsT8$',
'ext': 'flv',
'title': '01-Korean War ( Trouble on the horizon )',
'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름',
'upload_date': '20080223',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 249,
'view_count': int,
'comment_count': int,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_id = compat_urllib_parse_unquote(self._match_id(url))
query = compat_urllib_parse.urlencode({'vid': video_id})
info = self._download_xml(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
'Downloading video info')
movie_data = self._download_json(
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
video_id, 'Downloading video formats info')
# For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
info = self._download_xml(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
'Downloading video info')
formats = []
for format_el in movie_data['output_list']['output_list']:
profile = format_el['profile']
@@ -76,8 +117,9 @@ class DaumIE(InfoExtractor):
class DaumClipIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:clip/ClipView.do|mypot/View.do)\?.*?clipid=(?P<id>\d+)'
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:clip/ClipView\.(?:do|tv)|mypot/View\.do)\?.*?clipid=(?P<id>\d+)'
IE_NAME = 'daum.net:clip'
_URL_TEMPLATE = 'http://tvpot.daum.net/clip/ClipView.do?clipid=%s'
_TESTS = [{
'url': 'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
@@ -87,11 +129,19 @@ class DaumClipIE(InfoExtractor):
'title': 'DOTA 2GETHER 시즌2 6회 - 2부',
'description': 'DOTA 2GETHER 시즌2 6회 - 2부',
'upload_date': '20130831',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 3868,
'view_count': int,
},
}, {
'url': 'http://m.tvpot.daum.net/clip/ClipView.tv?clipid=54999425',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if DaumPlaylistIE.suitable(url) or DaumUserIE.suitable(url) else super(DaumClipIE, cls).suitable(url)
def _real_extract(self, url):
video_id = self._match_id(url)
clip_info = self._download_json(
@@ -102,7 +152,7 @@ class DaumClipIE(InfoExtractor):
'_type': 'url_transparent',
'id': video_id,
'url': 'http://tvpot.daum.net/v/%s' % clip_info['vid'],
'title': clip_info['title'],
'title': unescapeHTML(clip_info['title']),
'thumbnail': clip_info.get('thumb_url'),
'description': clip_info.get('contents'),
'duration': int_or_none(clip_info.get('duration')),
@@ -110,3 +160,139 @@ class DaumClipIE(InfoExtractor):
'view_count': int_or_none(clip_info.get('play_count')),
'ie_key': 'Daum',
}
class DaumListIE(InfoExtractor):
def _get_entries(self, list_id, list_id_type):
name = None
entries = []
for pagenum in itertools.count(1):
list_info = self._download_json(
'http://tvpot.daum.net/mypot/json/GetClipInfo.do?size=48&init=true&order=date&page=%d&%s=%s' % (
pagenum, list_id_type, list_id), list_id, 'Downloading list info - %s' % pagenum)
entries.extend([
self.url_result(
'http://tvpot.daum.net/v/%s' % clip['vid'])
for clip in list_info['clip_list']
])
if not name:
name = list_info.get('playlist_bean', {}).get('name') or \
list_info.get('potInfo', {}).get('name')
if not list_info.get('has_more'):
break
return name, entries
def _check_clip(self, url, list_id):
query_dict = compat_parse_qs(compat_urlparse.urlparse(url).query)
if 'clipid' in query_dict:
clip_id = query_dict['clipid'][0]
if self._downloader.params.get('noplaylist'):
self.to_screen('Downloading just video %s because of --no-playlist' % clip_id)
return self.url_result(DaumClipIE._URL_TEMPLATE % clip_id, 'DaumClip')
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % list_id)
class DaumPlaylistIE(DaumListIE):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/mypot/(?:View\.do|Top\.tv)\?.*?playlistid=(?P<id>[0-9]+)'
IE_NAME = 'daum.net:playlist'
_URL_TEMPLATE = 'http://tvpot.daum.net/mypot/View.do?playlistid=%s'
_TESTS = [{
'note': 'Playlist url with clipid',
'url': 'http://tvpot.daum.net/mypot/View.do?playlistid=6213966&clipid=73806844',
'info_dict': {
'id': '6213966',
'title': 'Woorissica Official',
},
'playlist_mincount': 181
}, {
'note': 'Playlist url with clipid - noplaylist',
'url': 'http://tvpot.daum.net/mypot/View.do?playlistid=6213966&clipid=73806844',
'info_dict': {
'id': '73806844',
'ext': 'mp4',
'title': '151017 Airport',
'upload_date': '20160117',
},
'params': {
'noplaylist': True,
'skip_download': True,
}
}]
@classmethod
def suitable(cls, url):
return False if DaumUserIE.suitable(url) else super(DaumPlaylistIE, cls).suitable(url)
def _real_extract(self, url):
list_id = self._match_id(url)
clip_result = self._check_clip(url, list_id)
if clip_result:
return clip_result
name, entries = self._get_entries(list_id, 'playlistid')
return self.playlist_result(entries, list_id, name)
class DaumUserIE(DaumListIE):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/mypot/(?:View|Top)\.(?:do|tv)\?.*?ownerid=(?P<id>[0-9a-zA-Z]+)'
IE_NAME = 'daum.net:user'
_TESTS = [{
'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0',
'info_dict': {
'id': 'o2scDLIVbHc0',
'title': '마이 리틀 텔레비전',
},
'playlist_mincount': 213
}, {
'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0&clipid=73801156',
'info_dict': {
'id': '73801156',
'ext': 'mp4',
'title': '[미공개] 김구라, 오만석이 부릅니다 \'오케피\' - 마이 리틀 텔레비전 20160116',
'upload_date': '20160117',
'description': 'md5:5e91d2d6747f53575badd24bd62b9f36'
},
'params': {
'noplaylist': True,
'skip_download': True,
}
}, {
'note': 'Playlist url has ownerid and playlistid, playlistid takes precedence',
'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0&playlistid=6196631',
'info_dict': {
'id': '6196631',
'title': '마이 리틀 텔레비전 - 20160109',
},
'playlist_count': 11
}, {
'url': 'http://tvpot.daum.net/mypot/Top.do?ownerid=o2scDLIVbHc0',
'only_matching': True,
}, {
'url': 'http://m.tvpot.daum.net/mypot/Top.tv?ownerid=45x1okb1If50&playlistid=3569733',
'only_matching': True,
}]
def _real_extract(self, url):
list_id = self._match_id(url)
clip_result = self._check_clip(url, list_id)
if clip_result:
return clip_result
query_dict = compat_parse_qs(compat_urlparse.urlparse(url).query)
if 'playlistid' in query_dict:
playlist_id = query_dict['playlistid'][0]
return self.url_result(DaumPlaylistIE._URL_TEMPLATE % playlist_id, 'DaumPlaylist')
name, entries = self._get_entries(list_id, 'ownerid')
return self.playlist_result(entries, list_id, name)
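The three Daum list and clip extractors above decide between clip, playlist and user handling from the query string, through their `suitable()` overrides and `_check_clip`. The precedence can be sketched with the standard library alone. `classify_daum_url` is a hypothetical function written only to illustrate the ordering; it is not part of the extractors.

```python
from urllib.parse import parse_qs, urlparse


def classify_daum_url(url, noplaylist=False):
    """Return (kind, id) following the precedence used in the diff."""
    query = parse_qs(urlparse(url).query)
    # With --no-playlist, an explicit clipid wins over any list id.
    if noplaylist and 'clipid' in query:
        return 'clip', query['clipid'][0]
    # playlistid takes precedence over ownerid when both are present.
    if 'playlistid' in query:
        return 'playlist', query['playlistid'][0]
    if 'ownerid' in query:
        return 'user', query['ownerid'][0]
    if 'clipid' in query:
        return 'clip', query['clipid'][0]
    return None, None
```

The test URLs in the diff exercise exactly these cases: a playlist URL with a `clipid`, a user URL, and a URL carrying both `ownerid` and `playlistid`.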


@@ -87,7 +87,7 @@ class DRBonanzaIE(InfoExtractor):
formats = []
for file in info['Files']:
if info['Type'] == "Video":
if info['Type'] == 'Video':
if file['Type'] in video_types:
format = parse_filename_info(file['Location'])
format.update({
@@ -101,10 +101,10 @@ class DRBonanzaIE(InfoExtractor):
if '/bonanza/' in rtmp_url:
format['play_path'] = rtmp_url.split('/bonanza/')[1]
formats.append(format)
elif file['Type'] == "Thumb":
elif file['Type'] == 'Thumb':
thumbnail = file['Location']
elif info['Type'] == "Audio":
if file['Type'] == "Audio":
elif info['Type'] == 'Audio':
if file['Type'] == 'Audio':
format = parse_filename_info(file['Location'])
format.update({
'url': file['Location'],
@@ -112,7 +112,7 @@ class DRBonanzaIE(InfoExtractor):
'vcodec': 'none',
})
formats.append(format)
elif file['Type'] == "Thumb":
elif file['Type'] == 'Thumb':
thumbnail = file['Location']
description = '%s\n%s\n%s\n' % (


@@ -17,85 +17,85 @@ class EightTracksIE(InfoExtractor):
IE_NAME = '8tracks'
_VALID_URL = r'https?://8tracks\.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
_TEST = {
"name": "EightTracks",
"url": "http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
"info_dict": {
'name': 'EightTracks',
'url': 'http://8tracks.com/ytdl/youtube-dl-test-tracks-a',
'info_dict': {
'id': '1336550',
'display_id': 'youtube-dl-test-tracks-a',
"description": "test chars: \"'/\\ä↭",
"title": "youtube-dl test tracks \"'/\\ä↭<>",
'description': "test chars: \"'/\\ä↭",
'title': "youtube-dl test tracks \"'/\\ä↭<>",
},
"playlist": [
'playlist': [
{
"md5": "96ce57f24389fc8734ce47f4c1abcc55",
"info_dict": {
"id": "11885610",
"ext": "m4a",
"title": "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '96ce57f24389fc8734ce47f4c1abcc55',
'info_dict': {
'id': '11885610',
'ext': 'm4a',
'title': "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "4ab26f05c1f7291ea460a3920be8021f",
"info_dict": {
"id": "11885608",
"ext": "m4a",
"title": "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '4ab26f05c1f7291ea460a3920be8021f',
'info_dict': {
'id': '11885608',
'ext': 'm4a',
'title': "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "d30b5b5f74217410f4689605c35d1fd7",
"info_dict": {
"id": "11885679",
"ext": "m4a",
"title": "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': 'd30b5b5f74217410f4689605c35d1fd7',
'info_dict': {
'id': '11885679',
'ext': 'm4a',
'title': "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "4eb0a669317cd725f6bbd336a29f923a",
"info_dict": {
"id": "11885680",
"ext": "m4a",
"title": "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '4eb0a669317cd725f6bbd336a29f923a',
'info_dict': {
'id': '11885680',
'ext': 'm4a',
'title': "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "1893e872e263a2705558d1d319ad19e8",
"info_dict": {
"id": "11885682",
"ext": "m4a",
"title": "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '1893e872e263a2705558d1d319ad19e8',
'info_dict': {
'id': '11885682',
'ext': 'm4a',
'title': "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "b673c46f47a216ab1741ae8836af5899",
"info_dict": {
"id": "11885683",
"ext": "m4a",
"title": "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': 'b673c46f47a216ab1741ae8836af5899',
'info_dict': {
'id': '11885683',
'ext': 'm4a',
'title': "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "1d74534e95df54986da7f5abf7d842b7",
"info_dict": {
"id": "11885684",
"ext": "m4a",
"title": "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '1d74534e95df54986da7f5abf7d842b7',
'info_dict': {
'id': '11885684',
'ext': 'm4a',
'title': "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "f081f47af8f6ae782ed131d38b9cd1c0",
"info_dict": {
"id": "11885685",
"ext": "m4a",
"title": "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': 'f081f47af8f6ae782ed131d38b9cd1c0',
'info_dict': {
'id': '11885685',
'ext': 'm4a',
'title': "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
}
]


@@ -72,7 +72,7 @@ class EllenTVClipsIE(InfoExtractor):
def _extract_playlist(self, webpage):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
try:
return json.loads("[{" + json_string + "}]")
return json.loads('[{' + json_string + '}]')
except ValueError as ve:
raise ExtractorError('Failed to download JSON', cause=ve)


@@ -53,8 +53,8 @@ class ESPNIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'class="video-play-button"[^>]+data-id="(\d+)',
webpage, 'video id')
r'class=(["\']).*?video-play-button.*?\1[^>]+data-id=["\'](?P<id>\d+)',
webpage, 'video id', group='id')
cms = 'espn'
if 'data-source="intl"' in webpage:
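The revised ESPN pattern above accepts either quote style and extra class names by capturing the opening quote and reusing it as a backreference. A small sketch with invented markup; `find_video_id` is a hypothetical wrapper around the regular expression from the diff.

```python
import re

# \1 refers back to whichever quote character opened the class attribute.
VIDEO_ID_RE = (
    r'class=(["\']).*?video-play-button.*?\1[^>]+data-id=["\'](?P<id>\d+)')


def find_video_id(html):
    """Return the numeric data-id of the play button, or None."""
    m = re.search(VIDEO_ID_RE, html)
    return m.group('id') if m else None
```

The named group is needed because the quote capture now occupies group 1, which is why the call in the diff passes `group='id'`.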


@@ -14,14 +14,14 @@ class EveryonesMixtapeIE(InfoExtractor):
_TESTS = [{
'url': 'http://everyonesmixtape.com/#/mix/m7m0jJAbMQi/5',
"info_dict": {
'info_dict': {
'id': '5bfseWNmlds',
'ext': 'mp4',
"title": "Passion Pit - \"Sleepyhead\" (Official Music Video)",
"uploader": "FKR.TV",
"uploader_id": "frenchkissrecords",
"description": "Music video for \"Sleepyhead\" from Passion Pit's debut EP Chunk Of Change.\nBuy on iTunes: https://itunes.apple.com/us/album/chunk-of-change-ep/id300087641\n\nDirected by The Wilderness.\n\nhttp://www.passionpitmusic.com\nhttp://www.frenchkissrecords.com",
"upload_date": "20081015"
'title': "Passion Pit - \"Sleepyhead\" (Official Music Video)",
'uploader': 'FKR.TV',
'uploader_id': 'frenchkissrecords',
'description': "Music video for \"Sleepyhead\" from Passion Pit's debut EP Chunk Of Change.\nBuy on iTunes: https://itunes.apple.com/us/album/chunk-of-change-ep/id300087641\n\nDirected by The Wilderness.\n\nhttp://www.passionpitmusic.com\nhttp://www.frenchkissrecords.com",
'upload_date': '20081015'
},
'params': {
'skip_download': True, # This is simply YouTube


@@ -41,7 +41,7 @@ class ExfmIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group('id')
info_url = "http://ex.fm/api/v3/song/%s" % song_id
info_url = 'http://ex.fm/api/v3/song/%s' % song_id
info = self._download_json(info_url, song_id)['song']
song_url = info['url']
if re.match(self._SOUNDCLOUD_URL, song_url) is not None:


@@ -6,9 +6,11 @@ import socket
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
compat_http_client,
compat_urllib_error,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
)
from ..utils import (
error_to_compat_str,
@@ -23,19 +25,30 @@ from ..utils import (
class FacebookIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?:\w+\.)?facebook\.com/
(?:[^#]*?\#!/)?
(?:
(?:video/video\.php|photo\.php|video\.php|video/embed)\?(?:.*?)
(?:v|video_id)=|
[^/]+/videos/(?:[^/]+/)?
)
(?P<id>[0-9]+)
(?:.*)'''
(?:
https?://
(?:\w+\.)?facebook\.com/
(?:[^#]*?\#!/)?
(?:
(?:
video/video\.php|
photo\.php|
video\.php|
video/embed
)\?(?:.*?)(?:v|video_id)=|
[^/]+/videos/(?:[^/]+/)?
)|
facebook:
)
(?P<id>[0-9]+)
'''
_LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
_CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
_NETRC_MACHINE = 'facebook'
IE_NAME = 'facebook'
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_TESTS = [{
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
'md5': '6a40d33c0eccbb1af76cf0485a052659',
@@ -57,6 +70,16 @@ class FacebookIE(InfoExtractor):
'expected_warnings': [
'title'
]
}, {
'note': 'Video with DASH manifest',
'url': 'https://www.facebook.com/video.php?v=957955867617029',
'md5': '54706e4db4f5ad58fbad82dde1f1213f',
'info_dict': {
'id': '957955867617029',
'ext': 'mp4',
'title': 'When you post epic content on instagram.com/433 8 million followers, this is ...',
'uploader': 'Demy de Zeeuw',
},
}, {
'url': 'https://www.facebook.com/video.php?v=10204634152394104',
'only_matching': True,
@@ -66,6 +89,9 @@ class FacebookIE(InfoExtractor):
}, {
'url': 'https://www.facebook.com/ChristyClarkForBC/videos/vb.22819070941/10153870694020942/?type=2&theater',
'only_matching': True,
}, {
'url': 'facebook:544765982287235',
'only_matching': True,
}]
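The restructured `_VALID_URL` above now also accepts the internal `facebook:<id>` form and no longer has a trailing `(?:.*)`. A quick way to check it against the test URLs from this diff is sketched below; `match_video_id` is a hypothetical helper, and the pattern is copied from the new version of the hunk.

```python
import re

VALID_URL = r'''(?x)
    (?:
        https?://
            (?:\w+\.)?facebook\.com/
            (?:[^#]*?\#!/)?
            (?:
                (?:
                    video/video\.php|
                    photo\.php|
                    video\.php|
                    video/embed
                )\?(?:.*?)(?:v|video_id)=|
                [^/]+/videos/(?:[^/]+/)?
            )|
        facebook:
    )
    (?P<id>[0-9]+)
'''


def match_video_id(url):
    """Return the video id if the URL is handled by this pattern, else None."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None
```

Post URLs such as `/maxlayn/posts/...` are deliberately not matched here; the diff routes them through the separate `facebook:post` extractor.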
def _login(self):
@@ -136,13 +162,36 @@ class FacebookIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'https://www.facebook.com/video/video.php?v=%s' % video_id
webpage = self._download_webpage(url, video_id)
req = sanitized_Request('https://www.facebook.com/video/video.php?v=%s' % video_id)
req.add_header('User-Agent', self._CHROME_USER_AGENT)
webpage = self._download_webpage(req, video_id)
video_data = None
BEFORE = '{swf.addParam(param[0], param[1]);});\n'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage)
if not m:
if m:
data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse_unquote(data['params'])
video_data = json.loads(params_raw)['video_data']
def video_data_list2dict(video_data):
ret = {}
for item in video_data:
format_id = item['stream_type']
ret.setdefault(format_id, []).append(item)
return ret
if not video_data:
server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})\);', webpage, 'server js data'), video_id)
for item in server_js_data.get('instances', []):
if item[1][0] == 'VideoConfig':
video_data = video_data_list2dict(item[2][0]['videoData'])
break
if not video_data:
m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None:
raise ExtractorError(
@@ -150,12 +199,9 @@ class FacebookIE(InfoExtractor):
expected=True)
else:
raise ExtractorError('Cannot parse data')
data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse_unquote(data['params'])
params = json.loads(params_raw)
formats = []
for format_id, f in params['video_data'].items():
for format_id, f in video_data.items():
if not f or not isinstance(f, list):
continue
for quality in ('sd', 'hd'):
@@ -167,9 +213,15 @@ class FacebookIE(InfoExtractor):
'url': src,
'preference': -10 if format_id == 'progressive' else 0,
})
dash_manifest = f[0].get('dash_manifest')
if dash_manifest:
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
if not formats:
raise ExtractorError('Cannot find video formats')
self._sort_formats(formats)
video_title = self._html_search_regex(
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage, 'title',
default=None)
@@ -188,3 +240,33 @@ class FacebookIE(InfoExtractor):
'formats': formats,
'uploader': uploader,
}
class FacebookPostIE(InfoExtractor):
IE_NAME = 'facebook:post'
_VALID_URL = r'https?://(?:\w+\.)?facebook\.com/[^/]+/posts/(?P<id>\d+)'
_TEST = {
'url': 'https://www.facebook.com/maxlayn/posts/10153807558977570',
'md5': '037b1fa7f3c2d02b7a0d7bc16031ecc6',
'info_dict': {
'id': '544765982287235',
'ext': 'mp4',
'title': '"What are you doing running in the snow?"',
'uploader': 'FailArmy',
}
}
def _real_extract(self, url):
post_id = self._match_id(url)
webpage = self._download_webpage(url, post_id)
entries = [
self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
for video_id in self._parse_json(
self._search_regex(
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
webpage, 'video ids', group='ids'),
post_id)]
return self.playlist_result(entries, post_id)
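Two steps from the FacebookIE changes above are easy to isolate: grouping the `videoData` list by `stream_type`, and decoding the URL-encoded `dash_manifest` before it is parsed as XML. The sketch below uses only the standard library in place of the project's compat helpers. The function names and the sample manifest are assumptions for illustration; the field names come from the diff.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus, unquote_plus


def group_by_stream_type(video_data):
    """Turn a flat list of items into {stream_type: [items...]}."""
    grouped = {}
    for item in video_data:
        grouped.setdefault(item['stream_type'], []).append(item)
    return grouped


def parse_embedded_manifest(encoded_manifest):
    """Decode a form-encoded manifest string and parse it as XML."""
    return ET.fromstring(unquote_plus(encoded_manifest))


# Invented sample: a tiny manifest encoded the way the page is assumed to
# embed it.
SAMPLE_MANIFEST = quote_plus('<MPD><Period id="p 1"/></MPD>')
```

Grouping first gives both code paths (the SWF parameters and the `handleServerJS` data) the same dictionary shape, so the format loop that follows does not need to know which source was used.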


@@ -52,7 +52,7 @@ class FazIE(InfoExtractor):
formats = []
for pref, code in enumerate(['LOW', 'HIGH', 'HQ']):
encoding = xpath_element(encodings, code)
if encoding:
if encoding is not None:
encoding_url = xpath_text(encoding, 'FILENAME')
if encoding_url:
formats.append({
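The switch from `if encoding:` to `if encoding is not None:` in the FAZ hunk above matters because an ElementTree element's truth value depends on whether it has children, not on whether it was found. A minimal sketch with invented XML; `encoding_filename` is a hypothetical helper, and the standard `find`/`findtext` calls stand in for the project's `xpath_element`/`xpath_text`.

```python
import xml.etree.ElementTree as ET

SAMPLE = ET.fromstring(
    '<ENCODINGS>'
    '<LOW><FILENAME>low.mp4</FILENAME></LOW>'
    '<HIGH/>'
    '</ENCODINGS>')


def encoding_filename(encodings, code):
    """Return the FILENAME text for an encoding, or None if unavailable."""
    encoding = encodings.find(code)
    # Compare with None explicitly: a found element without children has
    # len() == 0 and must not be confused with a missing one.
    if encoding is None:
        return None
    return encoding.findtext('FILENAME')
```

In the sample, `HIGH` exists but is empty, while `HQ` is absent; the explicit comparison keeps those two cases distinct.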


@@ -87,7 +87,7 @@ class FC2IE(InfoExtractor):
mimi = hashlib.md5((video_id + '_gGddgPfeaf_gzyr').encode('utf-8')).hexdigest()
info_url = (
"http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&".
'http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&'.
format(video_id, mimi, compat_urllib_request.quote(refer, safe=b'').replace('.', '%2E')))
info_webpage = self._download_webpage(


@@ -9,6 +9,7 @@ class FOXIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.fox.com/watch/255180355939/7684182528',
'md5': 'ebd296fcc41dd4b19f8115d8461a3165',
'info_dict': {
'id': '255180355939',
'ext': 'mp4',
@@ -17,10 +18,6 @@ class FOXIE(InfoExtractor):
'duration': 129,
},
'add_ie': ['ThePlatform'],
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
@@ -29,7 +26,7 @@ class FOXIE(InfoExtractor):
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url'] + '&manifest=m3u'
video_id)['release_url'] + '&switch=http'
return {
'_type': 'url_transparent',


@@ -10,7 +10,7 @@ class FranceInterIE(InfoExtractor):
_TEST = {
'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
'md5': '4764932e466e6f6c79c317d2e74f6884',
"info_dict": {
'info_dict': {
'id': '793962',
'ext': 'mp3',
'title': 'L’Histoire dans les jeux vidéo',


@@ -289,7 +289,7 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id, catalogue = self._html_search_regex(
r'href="http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:href=|player\.setVideo\(\s*)"http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
webpage, 'video ID').split('@')
return self._extract_video(video_id, catalogue)


@@ -12,8 +12,8 @@ class FreeVideoIE(InfoExtractor):
'info_dict': {
'id': 'vysukany-zadecek-22033',
'ext': 'mp4',
"title": "vysukany-zadecek-22033",
"age_limit": 18,
'title': 'vysukany-zadecek-22033',
'age_limit': 18,
},
'skip': 'Blocked outside .cz',
}


@@ -6,24 +6,29 @@ from ..utils import (
xpath_text,
xpath_with_ns,
)
from .youtube import YoutubeIE
class GamekingsIE(InfoExtractor):
_VALID_URL = r'http://www\.gamekings\.tv/(?:videos|nieuws)/(?P<id>[^/]+)'
_VALID_URL = r'http://www\.gamekings\.nl/(?:videos|nieuws)/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
# MD5 is flaky, seems to change regularly
# 'md5': '2f32b1f7b80fdc5cb616efb4f387f8a3',
# YouTube embed video
'url': 'http://www.gamekings.nl/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
'md5': '5208d3a17adeaef829a7861887cb9029',
'info_dict': {
'id': 'phoenix-wright-ace-attorney-dual-destinies-review',
'id': 'HkSQKetlGOU',
'ext': 'mp4',
'title': 'Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review',
'description': 'md5:36fd701e57e8c15ac8682a2374c99731',
'title': 'Phoenix Wright: Ace Attorney - Dual Destinies Review',
'description': 'md5:db88c0e7f47e9ea50df3271b9dc72e1d',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader_id': 'UCJugRGo4STYMeFr5RoOShtQ',
'uploader': 'Gamekings Vault',
'upload_date': '20151123',
},
'add_ie': ['Youtube'],
}, {
# vimeo video
'url': 'http://www.gamekings.tv/videos/the-legend-of-zelda-majoras-mask/',
'url': 'http://www.gamekings.nl/videos/the-legend-of-zelda-majoras-mask/',
'md5': '12bf04dfd238e70058046937657ea68d',
'info_dict': {
'id': 'the-legend-of-zelda-majoras-mask',
@@ -33,7 +38,7 @@ class GamekingsIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'http://www.gamekings.tv/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
'url': 'http://www.gamekings.nl/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
'only_matching': True,
}]
@@ -43,7 +48,11 @@ class GamekingsIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
playlist_id = self._search_regex(
r'gogoVideo\(\s*\d+\s*,\s*"([^"]+)', webpage, 'playlist id')
r'gogoVideo\([^,]+,\s*"([^"]+)', webpage, 'playlist id')
# Check if a YouTube embed is used
if YoutubeIE.suitable(playlist_id):
return self.url_result(playlist_id, ie='Youtube')
playlist = self._download_xml(
'http://www.gamekings.tv/wp-content/themes/gk2010/rss_playlist.php?id=%s' % playlist_id,


@@ -224,6 +224,20 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
# MPD from http://dash-mse-test.appspot.com/media.html
{
'url': 'http://yt-dash-mse-test.commondatastorage.googleapis.com/media/car-20120827-manifest.mpd',
'md5': '4b57baab2e30d6eb3a6a09f0ba57ef53',
'info_dict': {
'id': 'car-20120827-manifest',
'ext': 'mp4',
'title': 'car-20120827-manifest',
'formats': 'mincount:9',
},
'params': {
'format': 'bestvideo',
},
},
# google redirect
{
'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE',
@@ -1229,19 +1243,24 @@ class GenericIE(InfoExtractor):
# Check for direct link to a video
content_type = head_response.headers.get('Content-Type', '')
m = re.match(r'^(?P<type>audio|video|application(?=/ogg$))/(?P<format_id>.+)$', content_type)
m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>.+)$', content_type)
if m:
upload_date = unified_strdate(
head_response.headers.get('Last-Modified'))
formats = []
if m.group('format_id').endswith('mpegurl'):
formats = self._extract_m3u8_formats(url, video_id, 'mp4')
else:
formats = [{
'format_id': m.group('format_id'),
'url': url,
'vcodec': 'none' if m.group('type') == 'audio' else None
}]
return {
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'direct': True,
'formats': [{
'format_id': m.group('format_id'),
'url': url,
'vcodec': 'none' if m.group('type') == 'audio' else None
}],
'formats': formats,
'upload_date': upload_date,
}
@@ -1284,7 +1303,7 @@ class GenericIE(InfoExtractor):
self.report_extraction(video_id)
# Is it an RSS feed, a SMIL file or a XSPF playlist?
# Is it an RSS feed, a SMIL file, an XSPF playlist or an MPD manifest?
try:
doc = compat_etree_fromstring(webpage.encode('utf-8'))
if doc.tag == 'rss':
@@ -1293,6 +1312,13 @@ class GenericIE(InfoExtractor):
return self._parse_smil(doc, url, video_id)
elif doc.tag == '{http://xspf.org/ns/0/}playlist':
return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
return {
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'formats': self._parse_mpd_formats(
doc, video_id, mpd_base_url=url.rpartition('/')[0]),
}
except compat_xml_parse_error:
pass
@@ -1402,7 +1428,7 @@ class GenericIE(InfoExtractor):
# Look for embedded Dailymotion player
matches = re.findall(
r'<(?:embed|iframe)[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
if matches:
return _playlist_from_matches(
matches, lambda m: unescapeHTML(m[1]))
@@ -1547,6 +1573,11 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group('url'), 'VK')
# Look for embedded Odnoklassniki player
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Odnoklassniki')
# Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
if mobj is not None:
@@ -1819,6 +1850,17 @@ class GenericIE(InfoExtractor):
if digiteka_url:
return self.url_result(self._proto_relative_url(digiteka_url), DigitekaIE.ie_key())
# Look for Limelight embeds
mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage)
if mobj:
lm = {
'Media': 'media',
'Channel': 'channel',
'ChannelList': 'channel_list',
}
return self.url_result('limelight:%s:%s' % (
lm[mobj.group(1)], mobj.group(2)), 'Limelight%s' % mobj.group(1), mobj.group(2))
# Look for AdobeTVVideo embeds
mobj = re.search(
r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',
@@ -1935,6 +1977,8 @@ class GenericIE(InfoExtractor):
return self.playlist_result(self._extract_xspf_playlist(video_url, video_id), video_id)
elif ext == 'm3u8':
entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4')
elif ext == 'mpd':
entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id)
else:
entry_info_dict['url'] = video_url
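
Note on the direct-link hunk above: the widened `Content-Type` regex lets HLS playlist MIME types through the "direct link" check, and the `endswith('mpegurl')` test then routes them to m3u8 extraction. The sketch below exercises the same regex in isolation; the `classify` helper and its return labels are hypothetical, introduced only for illustration.

```python
import re

# Same pattern as in the hunk, split over two raw strings for readability.
DIRECT_LINK_RE = (
    r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))'
    r'/(?P<format_id>.+)$')


def classify(content_type):
    """Return 'm3u8', 'direct' or None for a Content-Type header value."""
    m = re.match(DIRECT_LINK_RE, content_type)
    if not m:
        return None
    # Playlist MIME types end in "mpegurl"; everything else is a media file.
    return 'm3u8' if m.group('format_id').endswith('mpegurl') else 'direct'


for value in ('application/vnd.apple.mpegurl', 'application/x-mpegurl',
              'application/ogg', 'video/mp4', 'application/json'):
    print(value, '->', classify(value))
```

The lookahead keeps `application/*` types out unless they are Ogg or one of the mpegurl variants, so unrelated types such as `application/json` still fall through to the later checks.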


@@ -65,7 +65,7 @@ class GloboIE(InfoExtractor):
'only_matching': True,
}]
class MD5:
class MD5(object):
HEX_FORMAT_LOWERCASE = 0
HEX_FORMAT_UPPERCASE = 1
BASE64_PAD_CHARACTER_DEFAULT_COMPLIANCE = ''


@@ -82,7 +82,7 @@ class GoogleDriveIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'formats': formats,
}


@@ -11,8 +11,8 @@ class HentaiStigmaIE(InfoExtractor):
'info_dict': {
'id': 'inyouchuu-etsu-bonus',
'ext': 'mp4',
"title": "Inyouchuu Etsu Bonus",
"age_limit": 18,
'title': 'Inyouchuu Etsu Bonus',
'age_limit': 18,
}
}


@@ -10,8 +10,8 @@ from ..utils import (
class HotStarIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/.*?[/-](?P<id>\d{10})'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{
'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273',
'info_dict': {
'id': '1000076273',
@@ -26,7 +26,13 @@ class HotStarIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}
}, {
'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
'only_matching': True,
}, {
'url': 'http://www.hotstar.com/1000000515',
'only_matching': True,
}]
_GET_CONTENT_TEMPLATE = 'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails&channel=PCTV&contentId=%s'
_GET_CDN_TEMPLATE = 'http://getcdn.hotstar.com/AVS/besc?action=GetCDN&asJson=Y&channel=%s&id=%s&type=%s'
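
Note on the `_VALID_URL` change above: making the leading path segment optional is what allows a bare ten-digit ID directly after the host. The sketch below compares the two patterns from the hunk against the three test URLs; the `match_id` helper is a hypothetical stand-in for the extractor's own matching.

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?hotstar\.com/.*?[/-](?P<id>\d{10})'
NEW_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'

URLS = [
    'http://www.hotstar.com/on-air-with-aib--english-1000076273',
    'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
    'http://www.hotstar.com/1000000515',
]


def match_id(pattern, url):
    """Return the captured id, or None when the pattern does not match."""
    m = re.match(pattern, url)
    return m.group('id') if m else None


old_ids = [match_id(OLD_VALID_URL, url) for url in URLS]
new_ids = [match_id(NEW_VALID_URL, url) for url in URLS]
print(old_ids)
print(new_ids)
```

Only the last URL behaves differently: the old pattern requires a `/` or `-` before the ID, which a bare ID after the host does not have.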


@@ -21,6 +21,18 @@ class InstagramIE(InfoExtractor):
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
}
}, {
# missing description
'url': 'https://www.instagram.com/p/BA-pQFBG8HZ/?taken-by=britneyspears',
'info_dict': {
'id': 'BA-pQFBG8HZ',
'ext': 'mp4',
'uploader_id': 'britneyspears',
'title': 'Video by britneyspears',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
@@ -32,8 +44,8 @@ class InstagramIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
webpage, 'uploader id', fatal=False)
desc = self._search_regex(r'"caption":"(.*?)"', webpage, 'description',
fatal=False)
desc = self._search_regex(
r'"caption":"(.+?)"', webpage, 'description', default=None)
return {
'id': video_id,


@@ -2,46 +2,30 @@
from __future__ import unicode_literals
import re
from random import random
from math import floor
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
remove_end,
sanitized_Request,
)
class IPrimaIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'https?://play\.iprima\.cz/(?:[^/]+/)*(?P<id>[^?#]+)'
_VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_TESTS = [{
'url': 'http://play.iprima.cz/gondici-s-r-o-33',
'info_dict': {
'id': 'p136534',
'ext': 'mp4',
'title': 'Gondíci s. r. o. (34)',
'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
},
'params': {
'skip_download': True, # m3u8 download
},
}, {
'url': 'http://play.iprima.cz/particka/particka-92',
'info_dict': {
'id': '39152',
'ext': 'flv',
'title': 'Partička (92)',
'description': 'md5:74e9617e51bca67c3ecfb2c6f9766f45',
'thumbnail': 'http://play.iprima.cz/sites/default/files/image_crops/image_620x349/3/491483_particka-92_image_620x349.jpg',
},
'params': {
'skip_download': True, # requires rtmpdump
},
}, {
'url': 'http://play.iprima.cz/particka/tchibo-particka-jarni-moda',
'info_dict': {
'id': '9718337',
'ext': 'flv',
'title': 'Tchibo Partička - Jarní móda',
'thumbnail': 're:^http:.*\.jpg$',
},
'params': {
'skip_download': True, # requires rtmpdump
},
}, {
'url': 'http://play.iprima.cz/zpravy-ftv-prima-2752015',
'only_matching': True,
}]
@@ -51,62 +35,24 @@ class IPrimaIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
if re.search(r'Nemáte oprávnění přistupovat na tuto stránku\.\s*</div>', webpage):
raise ExtractorError(
'%s said: You do not have permission to access this page' % self.IE_NAME, expected=True)
video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id')
player_url = (
'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' %
(floor(random() * 1073741824), floor(random() * 1073741824))
)
req = sanitized_Request(player_url)
req = sanitized_Request(
'http://play.iprima.cz/prehravac/init?_infuse=1'
'&_ts=%s&productId=%s' % (round(time.time()), video_id))
req.add_header('Referer', url)
playerpage = self._download_webpage(req, video_id)
playerpage = self._download_webpage(req, video_id, note='Downloading player')
base_url = ''.join(re.findall(r"embed\['stream'\] = '(.+?)'.+'(\?auth=)'.+'(.+?)';", playerpage)[1])
m3u8_url = self._search_regex(r"'src': '([^']+\.m3u8)'", playerpage, 'm3u8 url')
zoneGEO = self._html_search_regex(r'"zoneGEO":(.+?),', webpage, 'zoneGEO')
if zoneGEO != '0':
base_url = base_url.replace('token', 'token_' + zoneGEO)
formats = []
for format_id in ['lq', 'hq', 'hd']:
filename = self._html_search_regex(
r'"%s_id":(.+?),' % format_id, webpage, 'filename')
if filename == 'null':
continue
real_id = self._search_regex(
r'Prima-(?:[0-9]{10}|WEB)-([0-9]+)[-_]',
filename, 'real video id')
if format_id == 'lq':
quality = 0
elif format_id == 'hq':
quality = 1
elif format_id == 'hd':
quality = 2
filename = 'hq/' + filename
formats.append({
'format_id': format_id,
'url': base_url,
'quality': quality,
'play_path': 'mp4:' + filename.replace('"', '')[:-4],
'rtmp_live': True,
'ext': 'flv',
})
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats)
return {
'id': real_id,
'title': remove_end(self._og_search_title(webpage), ' | Prima PLAY'),
'id': video_id,
'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'description': self._search_regex(
r'<p[^>]+itemprop="description"[^>]*>([^<]+)',
webpage, 'description', default=None),
'description': self._og_search_description(webpage),
}


@@ -2,14 +2,194 @@
from __future__ import unicode_literals
import hashlib
import itertools
import math
import os
import random
import re
import time
import uuid
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import ExtractorError
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
ohdave_rsa_encrypt,
remove_start,
sanitized_Request,
urlencode_postdata,
url_basename,
)
def md5_text(text):
return hashlib.md5(text.encode('utf-8')).hexdigest()
class IqiyiSDK(object):
def __init__(self, target, ip, timestamp):
self.target = target
self.ip = ip
self.timestamp = timestamp
@staticmethod
def split_sum(data):
return compat_str(sum(map(lambda p: int(p, 16), list(data))))
@staticmethod
def digit_sum(num):
if isinstance(num, int):
num = compat_str(num)
return compat_str(sum(map(int, num)))
def even_odd(self):
even = self.digit_sum(compat_str(self.timestamp)[::2])
odd = self.digit_sum(compat_str(self.timestamp)[1::2])
return even, odd
def preprocess(self, chunksize):
self.target = md5_text(self.target)
chunks = []
for i in range(32 // chunksize):
chunks.append(self.target[chunksize * i:chunksize * (i + 1)])
if 32 % chunksize:
chunks.append(self.target[32 - 32 % chunksize:])
return chunks, list(map(int, self.ip.split('.')))
def mod(self, modulus):
chunks, ip = self.preprocess(32)
self.target = chunks[0] + ''.join(map(lambda p: compat_str(p % modulus), ip))
def split(self, chunksize):
modulus_map = {
4: 256,
5: 10,
8: 100,
}
chunks, ip = self.preprocess(chunksize)
ret = ''
for i in range(len(chunks)):
ip_part = compat_str(ip[i] % modulus_map[chunksize]) if i < 4 else ''
if chunksize == 8:
ret += ip_part + chunks[i]
else:
ret += chunks[i] + ip_part
self.target = ret
def handle_input16(self):
self.target = md5_text(self.target)
self.target = self.split_sum(self.target[:16]) + self.target + self.split_sum(self.target[16:])
def handle_input8(self):
self.target = md5_text(self.target)
ret = ''
for i in range(4):
part = self.target[8 * i:8 * (i + 1)]
ret += self.split_sum(part) + part
self.target = ret
def handleSum(self):
self.target = md5_text(self.target)
self.target = self.split_sum(self.target) + self.target
def date(self, scheme):
self.target = md5_text(self.target)
d = time.localtime(self.timestamp)
strings = {
'y': compat_str(d.tm_year),
'm': '%02d' % d.tm_mon,
'd': '%02d' % d.tm_mday,
}
self.target += ''.join(map(lambda c: strings[c], list(scheme)))
def split_time_even_odd(self):
even, odd = self.even_odd()
self.target = odd + md5_text(self.target) + even
def split_time_odd_even(self):
even, odd = self.even_odd()
self.target = even + md5_text(self.target) + odd
def split_ip_time_sum(self):
chunks, ip = self.preprocess(32)
self.target = compat_str(sum(ip)) + chunks[0] + self.digit_sum(self.timestamp)
def split_time_ip_sum(self):
chunks, ip = self.preprocess(32)
self.target = self.digit_sum(self.timestamp) + chunks[0] + compat_str(sum(ip))
class IqiyiSDKInterpreter(object):
BASE62_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
def __init__(self, sdk_code):
self.sdk_code = sdk_code
@classmethod
def base62(cls, num):
if num == 0:
return '0'
ret = ''
while num:
ret = cls.BASE62_TABLE[num % 62] + ret
num = num // 62
return ret
def decode_eval_codes(self):
self.sdk_code = self.sdk_code[5:-3]
mobj = re.search(
r"'([^']+)',62,(\d+),'([^']+)'\.split\('\|'\),[^,]+,{}",
self.sdk_code)
obfuscated_code, count, symbols = mobj.groups()
count = int(count)
symbols = symbols.split('|')
symbol_table = {}
while count:
count -= 1
b62count = self.base62(count)
symbol_table[b62count] = symbols[count] or b62count
self.sdk_code = re.sub(
r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
obfuscated_code)
def run(self, target, ip, timestamp):
self.decode_eval_codes()
functions = re.findall(r'input=([a-zA-Z0-9]+)\(input', self.sdk_code)
sdk = IqiyiSDK(target, ip, timestamp)
other_functions = {
'handleSum': sdk.handleSum,
'handleInput8': sdk.handle_input8,
'handleInput16': sdk.handle_input16,
'splitTimeEvenOdd': sdk.split_time_even_odd,
'splitTimeOddEven': sdk.split_time_odd_even,
'splitIpTimeSum': sdk.split_ip_time_sum,
'splitTimeIpSum': sdk.split_time_ip_sum,
}
for function in functions:
if re.match(r'mod\d+', function):
sdk.mod(int(function[3:]))
elif re.match(r'date[ymd]{3}', function):
sdk.date(function[4:])
elif re.match(r'split\d+', function):
sdk.split(int(function[5:]))
elif function in other_functions:
other_functions[function]()
else:
raise ExtractorError('Unknown function %s' % function)
return sdk.target
class IqiyiIE(InfoExtractor):
@@ -18,6 +198,8 @@ class IqiyiIE(InfoExtractor):
_VALID_URL = r'http://(?:[^.]+\.)?iqiyi\.com/.+\.html'
_NETRC_MACHINE = 'iqiyi'
_TESTS = [{
'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
'md5': '2cb594dc2781e6c941a110d8f358118b',
@@ -93,6 +275,35 @@ class IqiyiIE(InfoExtractor):
}, {
'url': 'http://yule.iqiyi.com/pcb.html',
'only_matching': True,
}, {
# VIP-only video. The first 2 parts (6 minutes) are available without login
# MD5 sums omitted as values are different on Travis CI and my machine
'url': 'http://www.iqiyi.com/v_19rrny4w8w.html',
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1',
'title': '泰坦尼克号',
},
'playlist': [{
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1_part1',
'ext': 'f4v',
'title': '泰坦尼克号',
},
}, {
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1_part2',
'ext': 'f4v',
'title': '泰坦尼克号',
},
}],
'expected_warnings': ['Needs a VIP account for full video'],
}, {
'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html',
'info_dict': {
'id': '202918101',
'title': '灌篮高手 国语版',
},
'playlist_count': 101,
}]
_FORMATS_MAP = [
@@ -104,11 +315,98 @@ class IqiyiIE(InfoExtractor):
('10', 'h1'),
]
@staticmethod
def md5_text(text):
return hashlib.md5(text.encode('utf-8')).hexdigest()
def _real_initialize(self):
self._login()
def construct_video_urls(self, data, video_id, _uuid):
@staticmethod
def _rsa_fun(data):
# public key extracted from http://static.iqiyi.com/js/qiyiV2/20160129180840/jobs/i18n/i18nIndex.js
N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd
e = 65537
return ohdave_rsa_encrypt(data, e, N)
def _login(self):
(username, password) = self._get_login_info()
# No authentication to be performed
if not username:
return True
data = self._download_json(
'http://kylin.iqiyi.com/get_token', None,
note='Get token for logging in', errnote='Unable to get token for logging in')
sdk = data['sdk']
timestamp = int(time.time())
target = '/apis/reglogin/login.action?lang=zh_TW&area_code=null&email=%s&passwd=%s&agenttype=1&from=undefined&keeplogin=0&piccode=&fromurl=&_pos=1' % (
username, self._rsa_fun(password.encode('utf-8')))
interp = IqiyiSDKInterpreter(sdk)
sign = interp.run(target, data['ip'], timestamp)
validation_params = {
'target': target,
'server': 'BEA3AA1908656AABCCFF76582C4C6660',
'token': data['token'],
'bird_src': 'f8d91d57af224da7893dd397d52d811a',
'sign': sign,
'bird_t': timestamp,
}
validation_result = self._download_json(
'http://kylin.iqiyi.com/validate?' + compat_urllib_parse.urlencode(validation_params), None,
note='Validate credentials', errnote='Unable to validate credentials')
MSG_MAP = {
'P00107': 'please login via the web interface and enter the CAPTCHA code',
'P00117': 'bad username or password',
}
code = validation_result['code']
if code != 'A00000':
msg = MSG_MAP.get(code)
if not msg:
msg = 'error %s' % code
if validation_result.get('msg'):
msg += ': ' + validation_result['msg']
self._downloader.report_warning('unable to log in: ' + msg)
return False
return True
def _authenticate_vip_video(self, api_video_url, video_id, tvid, _uuid, do_report_warning):
auth_params = {
# version and platform hard-coded in com/qiyi/player/core/model/remote/AuthenticationRemote.as
'version': '2.0',
'platform': 'b6c13e26323c537d',
'aid': tvid,
'tvid': tvid,
'uid': '',
'deviceId': _uuid,
'playType': 'main', # XXX: always main?
'filename': os.path.splitext(url_basename(api_video_url))[0],
}
qd_items = compat_parse_qs(compat_urllib_parse_urlparse(api_video_url).query)
for key, val in qd_items.items():
auth_params[key] = val[0]
auth_req = sanitized_Request(
'http://api.vip.iqiyi.com/services/ckn.action',
urlencode_postdata(auth_params))
# iQiyi server throws HTTP 405 error without the following header
auth_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
auth_result = self._download_json(
auth_req, video_id,
note='Downloading video authentication JSON',
errnote='Unable to download video authentication JSON')
if auth_result['code'] == 'Q00506': # requires a VIP account
if do_report_warning:
self.report_warning('Needs a VIP account for full video')
return False
return auth_result
def construct_video_urls(self, data, video_id, _uuid, tvid):
def do_xor(x, y):
a = y % 3
if a == 1:
@@ -134,9 +432,10 @@ class IqiyiIE(InfoExtractor):
note='Download path key of segment %d for format %s' % (segment_index + 1, format_id)
)['t']
t = str(int(math.floor(int(tm) / (600.0))))
return self.md5_text(t + mg + x)
return md5_text(t + mg + x)
video_urls_dict = {}
need_vip_warning_report = True
for format_item in data['vp']['tkl'][0]['vs']:
if 0 < int(format_item['bid']) <= 10:
format_id = self.get_format(format_item['bid'])
@@ -155,11 +454,13 @@ class IqiyiIE(InfoExtractor):
vl = segment['l']
if not vl.startswith('/'):
vl = get_encode_code(vl)
key = get_path_key(
vl.split('/')[-1].split('.')[0], format_id, segment_index)
is_vip_video = '/vip/' in vl
filesize = segment['b']
base_url = data['vp']['du'].split('/')
base_url.insert(-1, key)
if not is_vip_video:
key = get_path_key(
vl.split('/')[-1].split('.')[0], format_id, segment_index)
base_url.insert(-1, key)
base_url = '/'.join(base_url)
param = {
'su': _uuid,
@@ -170,8 +471,23 @@ class IqiyiIE(InfoExtractor):
'ct': '',
'tn': str(int(time.time()))
}
api_video_url = base_url + vl + '?' + \
compat_urllib_parse.urlencode(param)
api_video_url = base_url + vl
if is_vip_video:
api_video_url = api_video_url.replace('.f4v', '.hml')
auth_result = self._authenticate_vip_video(
api_video_url, video_id, tvid, _uuid, need_vip_warning_report)
if auth_result is False:
need_vip_warning_report = False
break
param.update({
't': auth_result['data']['t'],
# cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
'cid': 'afbe8fd3d73448c9',
'vid': video_id,
'QY00001': auth_result['data']['u'],
})
api_video_url += '?' if '?' not in api_video_url else '&'
api_video_url += compat_urllib_parse.urlencode(param)
js = self._download_json(
api_video_url, video_id,
note='Download video info of segment %d for format %s' % (segment_index + 1, format_id))
@@ -195,16 +511,17 @@ class IqiyiIE(InfoExtractor):
tail = tm + tvid
param = {
'key': 'fvip',
'src': self.md5_text('youtube-dl'),
'src': md5_text('youtube-dl'),
'tvId': tvid,
'vid': video_id,
'vinfo': 1,
'tm': tm,
'enc': self.md5_text(enc_key + tail),
'enc': md5_text(enc_key + tail),
'qyid': _uuid,
'tn': random.random(),
'um': 0,
'authkey': self.md5_text(self.md5_text('') + tail),
'authkey': md5_text(md5_text('') + tail),
'k_tag': 1,
}
api_url = 'http://cache.video.qiyi.com/vms' + '?' + \
@@ -218,9 +535,49 @@ class IqiyiIE(InfoExtractor):
enc_key = '6ab6d0280511493ba85594779759d4ed'
return enc_key
def _extract_playlist(self, webpage):
PAGE_SIZE = 50
links = re.findall(
r'<a[^>]+class="site-piclist_pic_link"[^>]+href="(http://www\.iqiyi\.com/.+\.html)"',
webpage)
if not links:
return
album_id = self._search_regex(
r'albumId\s*:\s*(\d+),', webpage, 'album ID')
album_title = self._search_regex(
r'data-share-title="([^"]+)"', webpage, 'album title', fatal=False)
entries = list(map(self.url_result, links))
# Start from 2 because links in the first page are already on webpage
for page_num in itertools.count(2):
pagelist_page = self._download_webpage(
'http://cache.video.qiyi.com/jp/avlist/%s/%d/%d/' % (album_id, page_num, PAGE_SIZE),
album_id,
note='Download playlist page %d' % page_num,
errnote='Failed to download playlist page %d' % page_num)
pagelist = self._parse_json(
remove_start(pagelist_page, 'var tvInfoJs='), album_id)
vlist = pagelist['data']['vlist']
for item in vlist:
entries.append(self.url_result(item['vurl']))
if len(vlist) < PAGE_SIZE:
break
return self.playlist_result(entries, album_id, album_title)
def _real_extract(self, url):
webpage = self._download_webpage(
url, 'temp_id', note='download video page')
# There's no simple way to tell from the URL whether it is a playlist,
# so try playlist extraction on the downloaded page first
playlist_result = self._extract_playlist(webpage)
if playlist_result:
return playlist_result
tvid = self._search_regex(
r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid')
video_id = self._search_regex(
@@ -236,16 +593,13 @@ class IqiyiIE(InfoExtractor):
if raw_data['code'] != 'A000000':
raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
if not raw_data['data']['vp']['tkl']:
raise ExtractorError('No support iQiqy VIP video')
data = raw_data['data']
title = data['vi']['vn']
# generate video_urls_dict
video_urls_dict = self.construct_video_urls(
data, video_id, _uuid)
data, video_id, _uuid, tvid)
# construct info
entries = []


@@ -28,7 +28,7 @@ class KankanIE(InfoExtractor):
title = self._search_regex(r'(?:G_TITLE=|G_MOVIE_TITLE = )[\'"](.+?)[\'"]', webpage, 'video title')
surls = re.search(r'surls:\[\'.+?\'\]|lurl:\'.+?\.flv\'', webpage).group(0)
gcids = re.findall(r"http://.+?/.+?/(.+?)/", surls)
gcids = re.findall(r'http://.+?/.+?/(.+?)/', surls)
gcid = gcids[-1]
info_url = 'http://p2s.cl.kankan.com/getCdnresource_flv?gcid=%s' % gcid


@@ -2,12 +2,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class KickStarterIE(InfoExtractor):
_VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>[^/]*)/.*'
_TESTS = [{
'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant?ref=home_location',
'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant/description',
'md5': 'c81addca81327ffa66c642b5d8b08cab',
'info_dict': {
'id': '1404461844',
@@ -27,7 +28,8 @@ class KickStarterIE(InfoExtractor):
'uploader_id': 'pebble',
'uploader': 'Pebble Technology',
'title': 'Pebble iOS Notifications',
}
},
'add_ie': ['Vimeo'],
}, {
'url': 'https://www.kickstarter.com/projects/1420158244/power-drive-2000/widget/video.html',
'info_dict': {
@@ -43,7 +45,7 @@ class KickStarterIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<title>\s*(.*?)(?:\s*&mdash; Kickstarter)?\s*</title>',
r'<title>\s*(.*?)(?:\s*&mdash;\s*Kickstarter)?\s*</title>',
webpage, 'title')
video_url = self._search_regex(
r'data-video-url="(.*?)"',
@@ -52,7 +54,7 @@ class KickStarterIE(InfoExtractor):
return {
'_type': 'url_transparent',
'ie_key': 'Generic',
'url': url,
'url': smuggle_url(url, {'to_generic': True}),
'title': title,
}


@@ -0,0 +1,107 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
)
class KonserthusetPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?konserthusetplay\.se/\?.*\bm=(?P<id>[^&]+)'
_TEST = {
'url': 'http://www.konserthusetplay.se/?m=CKDDnlCY-dhWAAqiMERd-A',
'info_dict': {
'id': 'CKDDnlCY-dhWAAqiMERd-A',
'ext': 'flv',
'title': 'Orkesterns instrument: Valthornen',
'description': 'md5:f10e1f0030202020396a4d712d2fa827',
'thumbnail': 're:^https?://.*$',
'duration': 398.8,
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
e = self._search_regex(
r'https?://csp\.picsearch\.com/rest\?.*\be=(.+?)[&"\']', webpage, 'e')
rest = self._download_json(
'http://csp.picsearch.com/rest?e=%s&containerId=mediaplayer&i=object' % e,
video_id, transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1])
media = rest['media']
player_config = media['playerconfig']
playlist = player_config['playlist']
source = next(f for f in playlist if f.get('bitrates'))
FORMAT_ID_REGEX = r'_([^_]+)_h264m\.mp4'
formats = []
fallback_url = source.get('fallbackUrl')
fallback_format_id = None
if fallback_url:
fallback_format_id = self._search_regex(
FORMAT_ID_REGEX, fallback_url, 'format id', default=None)
connection_url = (player_config.get('rtmp', {}).get(
'netConnectionUrl') or player_config.get(
'plugins', {}).get('bwcheck', {}).get('netConnectionUrl'))
if connection_url:
for f in source['bitrates']:
video_url = f.get('url')
if not video_url:
continue
format_id = self._search_regex(
FORMAT_ID_REGEX, video_url, 'format id', default=None)
f_common = {
'vbr': int_or_none(f.get('bitrate')),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
}
f = f_common.copy()
f.update({
'url': connection_url,
'play_path': video_url,
'format_id': 'rtmp-%s' % format_id if format_id else 'rtmp',
'ext': 'flv',
})
formats.append(f)
if format_id and format_id == fallback_format_id:
f = f_common.copy()
f.update({
'url': fallback_url,
'format_id': 'http-%s' % format_id if format_id else 'http',
})
formats.append(f)
if not formats and fallback_url:
formats.append({
'url': fallback_url,
})
self._sort_formats(formats)
title = player_config.get('title') or media['title']
description = player_config.get('mediaInfo', {}).get('description')
thumbnail = media.get('image')
duration = float_or_none(media.get('duration'), 1000)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}
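
The `transform_source` lambda in the `_download_json` call above keeps only the text between the first `{` and the last `}`, so a response wrapped in a callback can still be parsed as JSON. A minimal standalone sketch of that slicing, assuming a made-up wrapper (`cb(...)`) and payload, since the real response body is not shown in this diff:

```python
import json


def strip_wrapper(s):
    # Keep everything from the first '{' to the last '}', inclusive.
    return s[s.index('{'):s.rindex('}') + 1]


# Hypothetical response body, for illustration only.
sample = 'cb({"media": {"title": "demo", "duration": 1500}});'
data = json.loads(strip_wrapper(sample))
```

Note that `str.index` and `str.rindex` raise `ValueError` when no brace is present, so a response without a JSON object fails loudly rather than silently.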

View File

@@ -31,6 +31,10 @@ class KuwoBaseIE(InfoExtractor):
(file_format['ext'], file_format.get('br', ''), song_id),
song_id, note='Download %s url info' % file_format['format'],
)
if song_url == 'IPDeny':
raise ExtractorError('This song is blocked in this region', expected=True)
if song_url.startswith('http://') or song_url.startswith('https://'):
formats.append({
'url': song_url,

View File

@@ -1,86 +1,125 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import random
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_element,
xpath_text,
)
class Laola1TvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/.*?/(?P<id>[0-9]+)\.html'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/[^/]+/(?P<slug>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
'info_dict': {
'id': '227883',
'ext': 'mp4',
'display_id': 'straubing-tigers-koelner-haie',
'ext': 'flv',
'title': 'Straubing Tigers - Kölner Haie',
'categories': ['Eishockey'],
'upload_date': '20140912',
'is_live': False,
'categories': ['Eishockey'],
},
'params': {
'skip_download': True,
}
}
}, {
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie',
'info_dict': {
'id': '464602',
'display_id': 'straubing-tigers-koelner-haie',
'ext': 'flv',
'title': 'Straubing Tigers - Kölner Haie',
'upload_date': '20160129',
'is_live': False,
'categories': ['Eishockey'],
},
'params': {
'skip_download': True,
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('slug')
lang = mobj.group('lang')
portal = mobj.group('portal')
webpage = self._download_webpage(url, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]*?class="main_tv_player"[^>]*?src="([^"]+)"',
webpage, 'iframe URL')
webpage = self._download_webpage(url, display_id)
iframe = self._download_webpage(
iframe_url, video_id, note='Downloading iframe')
flashvars_m = re.findall(
r'flashvars\.([_a-zA-Z0-9]+)\s*=\s*"([^"]*)";', iframe)
flashvars = dict((m[0], m[1]) for m in flashvars_m)
iframe_url = self._search_regex(
r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
webpage, 'iframe url')
video_id = self._search_regex(
r'videoid=(\d+)', iframe_url, 'video id')
iframe = self._download_webpage(compat_urlparse.urljoin(
url, iframe_url), display_id, 'Downloading iframe')
partner_id = self._search_regex(
r'partnerid\s*:\s*"([^"]+)"', iframe, 'partner id')
r'partnerid\s*:\s*(["\'])(?P<partner_id>.+?)\1',
iframe, 'partner id', group='partner_id')
xml_url = ('http://www.laola1.tv/server/hd_video.php?' +
'play=%s&partner=%s&portal=%s&v5ident=&lang=%s' % (
video_id, partner_id, portal, lang))
hd_doc = self._download_xml(xml_url, video_id)
hd_doc = self._download_xml(
'http://www.laola1.tv/server/hd_video.php?%s'
% compat_urllib_parse.urlencode({
'play': video_id,
'partner': partner_id,
'portal': portal,
'lang': lang,
'v5ident': '',
}), display_id)
title = xpath_text(hd_doc, './/video/title', fatal=True)
flash_url = xpath_text(hd_doc, './/video/url', fatal=True)
uploader = xpath_text(hd_doc, './/video/meta_organistation')
is_live = xpath_text(hd_doc, './/video/islive') == 'true'
_v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
title = _v('title', fatal=True)
categories = xpath_text(hd_doc, './/video/meta_sports')
if categories:
categories = categories.split(',')
req = sanitized_Request(
'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access?%s' %
compat_urllib_parse.urlencode({
'videoId': video_id,
'target': '2',
'label': 'laola1tv',
'area': _v('area'),
}),
urlencode_postdata(
dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(',')))))
ident = random.randint(10000000, 99999999)
token_url = '%s&ident=%s&klub=0&unikey=0&timestamp=%s&auth=%s' % (
flash_url, ident, flashvars['timestamp'], flashvars['auth'])
token_url = self._download_json(req, display_id)['data']['stream-access'][0]
token_doc = self._download_xml(token_url, display_id, 'Downloading token')
token_doc = self._download_xml(
token_url, video_id, note='Downloading token')
token_attrib = token_doc.find('.//token').attrib
if token_attrib.get('auth') in ('blocked', 'restricted'):
token_attrib = xpath_element(token_doc, './/token').attrib
token_auth = token_attrib['auth']
if token_auth in ('blocked', 'restricted', 'error'):
raise ExtractorError(
'Token error: %s' % token_attrib.get('comment'), expected=True)
'Token error: %s' % token_attrib['comment'], expected=True)
video_url = '%s?hdnea=%s&hdcore=3.2.0' % (
token_attrib['url'], token_attrib['auth'])
formats = self._extract_f4m_formats(
'%s?hdnea=%s&hdcore=3.2.0' % (token_attrib['url'], token_auth),
video_id, f4m_id='hds')
categories_str = _v('meta_sports')
categories = categories_str.split(',') if categories_str else []
return {
'id': video_id,
'is_live': is_live,
'display_id': display_id,
'title': title,
'url': video_url,
'uploader': uploader,
'upload_date': unified_strdate(_v('time_date')),
'uploader': _v('meta_organisation'),
'categories': categories,
'ext': 'mp4',
'is_live': _v('islive') == 'true',
'formats': formats,
}
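
The new partner id pattern above accepts either quote style by capturing the opening quote and requiring the same character to close the value (`\1`). A small sketch of how that behaves, using invented page snippets; `find_partner_id` is a hypothetical helper and not part of the extractor:

```python
import re

# Same pattern as in the diff: group 1 is the quote, \1 must match it again.
PARTNER_ID_RE = r'partnerid\s*:\s*(["\'])(?P<partner_id>.+?)\1'


def find_partner_id(page):
    m = re.search(PARTNER_ID_RE, page)
    return m.group('partner_id') if m else None
```

The older pattern, `"([^"]+)"`, would only have matched the double-quoted form.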

View File

@@ -5,11 +5,13 @@ import datetime
import re
import time
import base64
import hashlib
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_ord,
compat_str,
)
from ..utils import (
determine_ext,
@@ -258,6 +260,7 @@ class LetvCloudIE(InfoExtractor):
},
}, {
'url': 'http://yuntv.letv.com/bcloud.html?uu=p7jnfw5hw9&vu=ec93197892&pu=2c7cd40209&auto_play=1&gpcflag=1&width=640&height=360',
'md5': 'e03d9cc8d9c13191e1caf277e42dbd31',
'info_dict': {
'id': 'p7jnfw5hw9_ec93197892',
'ext': 'mp4',
@@ -265,6 +268,7 @@ class LetvCloudIE(InfoExtractor):
},
}, {
'url': 'http://yuntv.letv.com/bcloud.html?uu=p7jnfw5hw9&vu=187060b6fd',
'md5': 'cb988699a776b22d4a41b9d43acfb3ac',
'info_dict': {
'id': 'p7jnfw5hw9_187060b6fd',
'ext': 'mp4',
@@ -272,21 +276,37 @@ class LetvCloudIE(InfoExtractor):
},
}]
def _real_extract(self, url):
uu_mobj = re.search(r'uu=([\w]+)', url)
vu_mobj = re.search(r'vu=([\w]+)', url)
@staticmethod
def sign_data(obj):
if obj['cf'] == 'flash':
salt = '2f9d6924b33a165a6d8b5d3d42f4f987'
items = ['cf', 'format', 'ran', 'uu', 'ver', 'vu']
elif obj['cf'] == 'html5':
salt = 'fbeh5player12c43eccf2bec3300344'
items = ['cf', 'ran', 'uu', 'bver', 'vu']
input_data = ''.join([item + obj[item] for item in items]) + salt
obj['sign'] = hashlib.md5(input_data.encode('utf-8')).hexdigest()
if not uu_mobj or not vu_mobj:
raise ExtractorError('Invalid URL: %s' % url, expected=True)
def _get_formats(self, cf, uu, vu, media_id):
def get_play_json(cf, timestamp):
data = {
'cf': cf,
'ver': '2.2',
'bver': 'firefox44.0',
'format': 'json',
'uu': uu,
'vu': vu,
'ran': compat_str(timestamp),
}
self.sign_data(data)
return self._download_json(
'http://api.letvcloud.com/gpc.php?' + compat_urllib_parse.urlencode(data),
media_id, 'Downloading playJson data for type %s' % cf)
uu = uu_mobj.group(1)
vu = vu_mobj.group(1)
media_id = uu + '_' + vu
play_json_req = sanitized_Request(
'http://api.letvcloud.com/gpc.php?cf=html5&sign=signxxxxx&ver=2.2&format=json&' +
'uu=' + uu + '&vu=' + vu)
play_json = self._download_json(play_json_req, media_id, 'Downloading playJson data')
play_json = get_play_json(cf, time.time())
# The server time may be different from local time
if play_json.get('code') == 10071:
play_json = get_play_json(cf, play_json['timestamp'])
if not play_json.get('data'):
if play_json.get('message'):
@@ -312,6 +332,21 @@ class LetvCloudIE(InfoExtractor):
'width': int_or_none(play_url.get('vwidth')),
'height': int_or_none(play_url.get('vheight')),
})
return formats
def _real_extract(self, url):
uu_mobj = re.search(r'uu=([\w]+)', url)
vu_mobj = re.search(r'vu=([\w]+)', url)
if not uu_mobj or not vu_mobj:
raise ExtractorError('Invalid URL: %s' % url, expected=True)
uu = uu_mobj.group(1)
vu = vu_mobj.group(1)
media_id = uu + '_' + vu
formats = self._get_formats('flash', uu, vu, media_id) + self._get_formats('html5', uu, vu, media_id)
self._sort_formats(formats)
return {
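
The `sign_data` helper in this diff builds the signature by concatenating selected keys with their values in a fixed order, appending a salt, and taking the MD5 hex digest. A minimal sketch of the same scheme as a pure function; the salt `'example-salt'` and the `ran` value are placeholders chosen for illustration, not the values the service expects:

```python
import hashlib


def sign(params, keys, salt):
    # Concatenate key + value for each key in the given order, then the salt.
    payload = ''.join(key + params[key] for key in keys) + salt
    return hashlib.md5(payload.encode('utf-8')).hexdigest()


params = {
    'cf': 'html5',
    'ran': '1456000000',  # placeholder timestamp string
    'uu': 'p7jnfw5hw9',
    'bver': 'firefox44.0',
    'vu': 'ec93197892',
}
signature = sign(params, ['cf', 'ran', 'uu', 'bver', 'vu'], 'example-salt')
```

Because the key order is part of the signed payload, the `flash` and `html5` variants in the diff each carry their own key list.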

View File

@@ -40,7 +40,8 @@ class LimelightBaseIE(InfoExtractor):
if not stream_url:
continue
if '.f4m' in stream_url:
formats.extend(self._extract_f4m_formats(stream_url, video_id))
formats.extend(self._extract_f4m_formats(
stream_url, video_id, fatal=False))
else:
fmt = {
'url': stream_url,
@@ -72,8 +73,8 @@ class LimelightBaseIE(InfoExtractor):
format_id = mobile_url.get('targetMediaPlatform')
if determine_ext(media_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', entry_protocol='m3u8_native',
preference=-1, m3u8_id=format_id))
media_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
else:
formats.append({
'url': media_url,

View File

@@ -47,7 +47,7 @@ class LiveLeakIE(InfoExtractor):
'info_dict': {
'id': '801_1409392012',
'ext': 'mp4',
'description': "Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.",
'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
'uploader': 'bony333',
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia'
}

View File

@@ -4,6 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
remove_end,
)
class MailRuIE(InfoExtractor):
@@ -34,14 +38,30 @@ class MailRuIE(InfoExtractor):
'id': '46843144_1263',
'ext': 'mp4',
'title': 'Samsung Galaxy S5 Hammer Smash Fail Battery Explosion',
'timestamp': 1397217632,
'upload_date': '20140411',
'uploader': 'hitech',
'timestamp': 1397039888,
'upload_date': '20140409',
'uploader': 'hitech@corp.mail.ru',
'uploader_id': 'hitech@corp.mail.ru',
'duration': 245,
},
'skip': 'Not accessible from Travis CI server',
},
{
# only available via metaUrl API
'url': 'http://my.mail.ru/mail/720pizle/video/_myvideo/502.html',
'md5': '3b26d2491c6949d031a32b96bd97c096',
'info_dict': {
'id': '56664382_502',
'ext': 'mp4',
'title': ':8336',
'timestamp': 1449094163,
'upload_date': '20151202',
'uploader': '720pizle@mail.ru',
'uploader_id': '720pizle@mail.ru',
'duration': 6001,
},
'skip': 'Not accessible from Travis CI server',
}
]
def _real_extract(self, url):
@@ -51,32 +71,55 @@ class MailRuIE(InfoExtractor):
if not video_id:
video_id = mobj.group('idv2prefix') + mobj.group('idv2suffix')
video_data = self._download_json(
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id, video_id, 'Downloading video JSON')
webpage = self._download_webpage(url, video_id)
author = video_data['author']
uploader = author['name']
uploader_id = author.get('id') or author.get('email')
view_count = video_data.get('views_count')
video_data = None
page_config = self._parse_json(self._search_regex(
r'(?s)<script[^>]+class="sp-video__page-config"[^>]*>(.+?)</script>',
webpage, 'page config', default='{}'), video_id, fatal=False)
if page_config:
meta_url = page_config.get('metaUrl') or page_config.get('video', {}).get('metaUrl')
if meta_url:
video_data = self._download_json(
meta_url, video_id, 'Downloading video meta JSON', fatal=False)
# Fall back to the old API approach
if not video_data:
video_data = self._download_json(
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
video_id, 'Downloading video JSON')
formats = []
for f in video_data['videos']:
video_url = f.get('url')
if not video_url:
continue
format_id = f.get('key')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None)) if format_id else None
formats.append({
'url': video_url,
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
meta_data = video_data['meta']
content_id = '%s_%s' % (
meta_data.get('accId', ''), meta_data['itemId'])
title = meta_data['title']
if title.endswith('.mp4'):
title = title[:-4]
thumbnail = meta_data['poster']
duration = meta_data['duration']
timestamp = meta_data['timestamp']
title = remove_end(meta_data['title'], '.mp4')
formats = [
{
'url': video['url'],
'format_id': video['key'],
'height': int(video['key'].rstrip('p'))
} for video in video_data['videos']
]
self._sort_formats(formats)
author = video_data.get('author')
uploader = author.get('name')
uploader_id = author.get('id') or author.get('email')
view_count = int_or_none(video_data.get('viewsCount') or video_data.get('views_count'))
acc_id = meta_data.get('accId')
item_id = meta_data.get('itemId')
content_id = '%s_%s' % (acc_id, item_id) if acc_id and item_id else video_id
thumbnail = meta_data.get('poster')
duration = int_or_none(meta_data.get('duration'))
timestamp = int_or_none(meta_data.get('timestamp'))
return {
'id': content_id,
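
The rewritten format loop above derives the height from keys such as `720p` with an anchored regex instead of `int(key.rstrip('p'))`, so a key that is missing or not of that shape yields `None` rather than raising. A standalone sketch of that parsing; `height_from_key` is a hypothetical name and the sample keys are assumptions:

```python
import re


def height_from_key(format_id):
    # Accept only keys of the form '<digits>p' or '<digits>P'.
    if not format_id:
        return None
    m = re.match(r'^(\d+)[pP]$', format_id)
    return int(m.group(1)) if m else None
```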

View File

@@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
import random
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import (
sanitized_Request,
xpath_text,
)
class MatchTVIE(InfoExtractor):
_VALID_URL = r'https?://matchtv\.ru/?#live-player'
_TEST = {
'url': 'http://matchtv.ru/#live-player',
'info_dict': {
'id': 'matchtv-live',
'ext': 'flv',
'title': 're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'is_live': True,
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = 'matchtv-live'
request = sanitized_Request(
'http://player.matchtv.ntvplus.tv/player/smil?%s' % compat_urllib_parse.urlencode({
'ts': '',
'quality': 'SD',
'contentId': '561d2c0df7159b37178b4567',
'sign': '',
'includeHighlights': '0',
'userId': '',
'sessionId': random.randint(1, 1000000000),
'contentType': 'channel',
'timeShift': '0',
'platform': 'portal',
}),
headers={
'Referer': 'http://player.matchtv.ntvplus.tv/embed-player/NTVEmbedPlayer.swf',
})
video_url = self._download_json(request, video_id)['data']['videoUrl']
f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
formats = self._extract_f4m_formats(f4m_url, video_id)
return {
'id': video_id,
'title': self._live_title('Матч ТВ - Прямой эфир'),
'is_live': True,
'formats': formats,
}

View File

@@ -38,7 +38,7 @@ class MofosexIE(InfoExtractor):
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
format = '-'.join(format)
age_limit = self._rta_search(webpage)

View File

@@ -11,6 +11,7 @@ from ..utils import (
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
float_or_none,
HEADRequest,
sanitized_Request,
unescapeHTML,
@@ -110,7 +111,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
uri = itemdoc.find('guid').text
video_id = self._id_from_uri(uri)
self.report_extraction(video_id)
mediagen_url = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content'))).attrib['url']
content_el = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content')))
mediagen_url = content_el.attrib['url']
# Remove the templates, like &device={device}
mediagen_url = re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', mediagen_url)
if 'acceptMethods' not in mediagen_url:
@@ -165,6 +167,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
'id': video_id,
'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
'duration': float_or_none(content_el.attrib.get('duration')),
}
def _get_feed_query(self, uri):
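
The "Remove the templates" step in the hunk above strips unfilled placeholders such as `&device={device}` from the mediagen URL before it is requested. A short sketch of that substitution on invented URLs (the `example.com` host and parameter values are assumptions):

```python
import re


def strip_templates(url):
    # Drop '&name={placeholder}' pairs that are followed by '&' or end of string.
    return re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', url)


cleaned = strip_templates(
    'http://example.com/mediagen?uri=abc&device={device}&acceptMethods=hls')
```

Parameters with real values are left alone because the pattern requires a literal `{` directly after the `=`.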

View File

@@ -18,8 +18,8 @@ class MySpassIE(InfoExtractor):
'info_dict': {
'id': '11741',
'ext': 'mp4',
"description": "Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?",
"title": "Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2",
'description': 'Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?',
'title': 'Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2',
},
}

View File

@@ -19,6 +19,7 @@ from ..utils import (
class MyVideoIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'http://(?:www\.)?myvideo\.de/(?:[^/]+/)?watch/(?P<id>[0-9]+)/[^?/]+.*'
IE_NAME = 'myvideo'
_TEST = {

View File

@@ -18,13 +18,17 @@ class NBAIE(InfoExtractor):
'md5': '9e7729d3010a9c71506fd1248f74e4f4',
'info_dict': {
'id': '0021200253-okc-bkn-recap',
'ext': 'flv',
'ext': 'mp4',
'title': 'Thunder vs. Nets',
'description': 'Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.',
'duration': 181,
'timestamp': 1354638466,
'upload_date': '20121204',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.nba.com/video/games/hornets/2014/12/05/0021400276-nyk-cha-play5.nba/',
'only_matching': True,
@@ -68,7 +72,7 @@ class NBAIE(InfoExtractor):
if video_url.startswith('/'):
continue
if video_url.endswith('.m3u8'):
formats.extend(self._extract_m3u8_formats(video_url, video_id, m3u8_id='hls', fatal=False))
formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
elif video_url.endswith('.f4m'):
formats.extend(self._extract_f4m_formats(video_url + '?hdcore=3.4.1.1', video_id, f4m_id='hds', fatal=False))
else:

View File

@@ -19,38 +19,45 @@ class NBCIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.nbc.com/the-tonight-show/segments/112966',
# md5 checksum is not stable
'info_dict': {
'id': 'c9xnCo0YPOPH',
'ext': 'flv',
'id': '112966',
'ext': 'mp4',
'title': 'Jimmy Fallon Surprises Fans at Ben & Jerry\'s',
'description': 'Jimmy gives out free scoops of his new "Tonight Dough" ice cream flavor by surprising customers at the Ben & Jerry\'s scoop shop.',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://www.nbc.com/the-tonight-show/episodes/176',
'info_dict': {
'id': 'XwU9KZkp98TH',
'id': '176',
'ext': 'flv',
'title': 'Ricky Gervais, Steven Van Zandt, ILoveMakonnen',
'description': 'A brand new episode of The Tonight Show welcomes Ricky Gervais, Steven Van Zandt and ILoveMakonnen.',
},
'skip': 'Only works from US',
'skip': '404 Not Found',
},
{
'url': 'http://www.nbc.com/saturday-night-live/video/star-wars-teaser/2832821',
'info_dict': {
'id': '8iUuyzWDdYUZ',
'ext': 'flv',
'id': '2832821',
'ext': 'mp4',
'title': 'Star Wars Teaser',
'description': 'md5:0b40f9cbde5b671a7ff62fceccc4f442',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Only works from US',
},
{
# This video has expired but with an escaped embedURL
'url': 'http://www.nbc.com/parenthood/episode-guide/season-5/just-like-at-home/515',
'skip': 'Expired'
'only_matching': True,
}
]
@@ -66,7 +73,11 @@ class NBCIE(InfoExtractor):
webpage, 'theplatform url').replace('_no_endcard', '').replace('\\/', '/')))
if theplatform_url.startswith('//'):
theplatform_url = 'http:' + theplatform_url
return self.url_result(smuggle_url(theplatform_url, {'source_url': url}))
return {
'_type': 'url_transparent',
'url': smuggle_url(theplatform_url, {'source_url': url}),
'id': video_id,
}
class NBCSportsVPlayerIE(InfoExtractor):

View File

@@ -193,7 +193,7 @@ class NDREmbedBaseIE(InfoExtractor):
src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, f4m_id='hds'))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, m3u8_id='hls', entry_protocol='m3u8_native'))
src, video_id, 'mp4', m3u8_id='hls', entry_protocol='m3u8_native'))
else:
quality = f.get('quality')
ff = {

View File

@@ -18,14 +18,14 @@ class NerdCubedFeedIE(InfoExtractor):
}
def _real_extract(self, url):
feed = self._download_json(url, url, "Downloading NerdCubed JSON feed")
feed = self._download_json(url, url, 'Downloading NerdCubed JSON feed')
entries = [{
'_type': 'url',
'title': feed_entry['title'],
'uploader': feed_entry['source']['name'] if feed_entry['source'] else None,
'upload_date': datetime.datetime.strptime(feed_entry['date'], '%Y-%m-%d').strftime('%Y%m%d'),
'url': "http://www.youtube.com/watch?v=" + feed_entry['youtube_id'],
'url': 'http://www.youtube.com/watch?v=' + feed_entry['youtube_id'],
} for feed_entry in feed]
return {

View File

@@ -0,0 +1,69 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
xpath_text,
)
class NozIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?noz\.de/video/(?P<id>[0-9]+)/'
_TESTS = [{
'url': 'http://www.noz.de/video/25151/32-Deutschland-gewinnt-Badminton-Lnderspiel-in-Melle',
'info_dict': {
'id': '25151',
'ext': 'mp4',
'duration': 215,
'title': '3:2 - Deutschland gewinnt Badminton-Länderspiel in Melle',
'description': 'Vor rund 370 Zuschauern gewinnt die deutsche Badminton-Nationalmannschaft am Donnerstag ein EM-Vorbereitungsspiel gegen Frankreich in Melle. Video Moritz Frankenberg.',
'thumbnail': 're:^http://.*\.jpg',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
description = self._og_search_description(webpage)
edge_url = self._html_search_regex(
r'<script\s+(?:type="text/javascript"\s+)?src="(.*?/videojs_.*?)"',
webpage, 'edge URL')
edge_content = self._download_webpage(edge_url, 'meta configuration')
config_url_encoded = self._search_regex(
r'so\.addVariable\("config_url","[^,]*,(.*?)"',
edge_content, 'config URL'
)
config_url = compat_urllib_parse_unquote(config_url_encoded)
doc = self._download_xml(config_url, 'video configuration')
title = xpath_text(doc, './/title')
thumbnail = xpath_text(doc, './/article/thumbnail/url')
duration = int_or_none(xpath_text(
doc, './/article/movie/file/duration'))
formats = []
for qnode in doc.findall('.//article/movie/file/qualities/qual'):
video_node = qnode.find('./html_urls/video_url[@format="video/mp4"]')
if video_node is None:
continue # auto
formats.append({
'url': video_node.text,
'format_name': xpath_text(qnode, './name'),
'format_id': xpath_text(qnode, './id'),
'height': int_or_none(xpath_text(qnode, './height')),
'width': int_or_none(xpath_text(qnode, './width')),
'tbr': int_or_none(xpath_text(qnode, './bitrate'), scale=1000),
})
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': title,
'duration': duration,
'description': description,
'thumbnail': thumbnail,
}
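
The format loop in the new extractor above walks each `qual` node, skips entries without an MP4 URL (the `auto` quality), and scales the bitrate by 1000. A rough sketch of the same traversal with the standard library only; the XML sample is invented to match the paths used in the diff, and `to_int` is a simplified stand-in for `int_or_none`:

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document shaped after the XPath expressions above.
SAMPLE = '''<config><article><movie><file><qualities>
<qual><id>auto</id><name>Auto</name></qual>
<qual><id>720</id><name>HD</name><height>720</height><width>1280</width>
<bitrate>2000000</bitrate>
<html_urls><video_url format="video/mp4">http://example.com/v_720.mp4</video_url></html_urls>
</qual>
</qualities></file></movie></article></config>'''


def to_int(text, scale=1):
    # Simplified stand-in for int_or_none: None on missing or invalid input.
    try:
        return int(text) // scale
    except (TypeError, ValueError):
        return None


doc = ET.fromstring(SAMPLE)
formats = []
for qnode in doc.findall('.//article/movie/file/qualities/qual'):
    video_node = qnode.find('./html_urls/video_url[@format="video/mp4"]')
    if video_node is None:
        continue  # no direct MP4 URL, e.g. the 'auto' quality
    formats.append({
        'url': video_node.text,
        'format_id': qnode.findtext('./id'),
        'height': to_int(qnode.findtext('./height')),
        'width': to_int(qnode.findtext('./width')),
        'tbr': to_int(qnode.findtext('./bitrate'), scale=1000),
    })
```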

View File

@@ -189,7 +189,7 @@ class NPOIE(NPOBaseIE):
if not video_url:
continue
if format_id == 'adaptive':
formats.extend(self._extract_m3u8_formats(video_url, video_id))
formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4'))
else:
formats.append({
'url': video_url,
@@ -406,6 +406,38 @@ class NPORadioFragmentIE(InfoExtractor):
}
class SchoolTVIE(InfoExtractor):
IE_NAME = 'schooltv'
_VALID_URL = r'https?://(?:www\.)?schooltv\.nl/video/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.schooltv.nl/video/ademhaling-de-hele-dag-haal-je-adem-maar-wat-gebeurt-er-dan-eigenlijk-in-je-lichaam/',
'info_dict': {
'id': 'WO_NTR_429477',
'display_id': 'ademhaling-de-hele-dag-haal-je-adem-maar-wat-gebeurt-er-dan-eigenlijk-in-je-lichaam',
'title': 'Ademhaling: De hele dag haal je adem. Maar wat gebeurt er dan eigenlijk in je lichaam?',
'ext': 'mp4',
'description': 'md5:abfa0ff690adb73fd0297fd033aaa631'
},
'params': {
# Skip because of m3u8 download
'skip_download': True
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-mid=(["\'])(?P<id>.+?)\1', webpage, 'video_id', group='id')
return {
'_type': 'url_transparent',
'ie_key': 'NPO',
'url': 'npo:%s' % video_id,
'display_id': display_id
}
class VPROIE(NPOIE):
IE_NAME = 'vpro'
_VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'

View File

@@ -133,26 +133,32 @@ class NRKTVIE(InfoExtractor):
_TESTS = [
{
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': 'adf2c5454fa2bf032f47a9f8fb351342',
'info_dict': {
'id': 'MUHH48000314',
'ext': 'flv',
'ext': 'mp4',
'title': '20 spørsmål',
'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'upload_date': '20140523',
'duration': 1741.52,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'https://tv.nrk.no/program/mdfp15000514',
'md5': '383650ece2b25ecec996ad7b5bb2a384',
'info_dict': {
'id': 'mdfp15000514',
'ext': 'flv',
'title': 'Kunnskapskanalen: Grunnlovsjubiléet - Stor ståhei for ingenting',
'ext': 'mp4',
'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting',
'description': 'md5:654c12511f035aed1e42bdf5db3b206a',
'upload_date': '20140524',
'duration': 4605.0,
'duration': 4605.08,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{

View File

@@ -13,7 +13,7 @@ from ..utils import (
class OdnoklassnikiIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
_VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
_TESTS = [{
# metadata in JSON
'url': 'http://ok.ru/video/20079905452',
@@ -69,6 +69,12 @@ class OdnoklassnikiIE(InfoExtractor):
}, {
'url': 'http://www.ok.ru/videoembed/20648036891',
'only_matching': True,
}, {
'url': 'http://m.ok.ru/video/20079905452',
'only_matching': True,
}, {
'url': 'http://mobile.ok.ru/video/20079905452',
'only_matching': True,
}]
def _real_extract(self, url):
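
The updated `_VALID_URL` above adds the `m.` and `mobile.` subdomains alongside `www.`. A quick sketch that checks the pattern against the test URLs from this diff; `match_id` is a hypothetical helper, not the extractor's own method:

```python
import re

VALID_URL = (
    r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/'
    r'(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)')


def match_id(url):
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None
```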

View File

@@ -112,6 +112,7 @@ class ORFTVthekIE(InfoExtractor):
% geo_str),
fatal=False)
self._check_formats(formats, video_id)
self._sort_formats(formats)
upload_date = unified_strdate(sd['created_date'])

View File

@@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
js_to_json,
strip_jsonp,
unified_strdate,
US_RATINGS,
@@ -199,7 +201,7 @@ class PBSIE(InfoExtractor):
'id': '2365006249',
'ext': 'mp4',
'title': 'Constitution USA with Peter Sagal - A More Perfect Union',
'description': 'md5:ba0c207295339c8d6eced00b7c363c6a',
'description': 'md5:36f341ae62e251b8f5bd2b754b95a071',
'duration': 3190,
},
'params': {
@@ -213,7 +215,7 @@ class PBSIE(InfoExtractor):
'id': '2365297690',
'ext': 'mp4',
'title': 'FRONTLINE - Losing Iraq',
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'description': 'md5:4d3eaa01f94e61b3e73704735f1196d9',
'duration': 5050,
},
'params': {
@@ -227,7 +229,7 @@ class PBSIE(InfoExtractor):
'id': '2201174722',
'ext': 'mp4',
'title': 'PBS NewsHour - Cyber Schools Gain Popularity, but Quality Questions Persist',
'description': 'md5:5871c15cba347c1b3d28ac47a73c7c28',
'description': 'md5:95a19f568689d09a166dff9edada3301',
'duration': 801,
},
},
@@ -237,8 +239,8 @@ class PBSIE(InfoExtractor):
'info_dict': {
'id': '2365297708',
'ext': 'mp4',
'description': 'md5:68d87ef760660eb564455eb30ca464fe',
'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
'description': 'md5:657897370e09e2bc6bf0f8d2cd313c6b',
'duration': 6559,
'thumbnail': 're:^https?://.*\.jpg$',
},
@@ -278,7 +280,7 @@ class PBSIE(InfoExtractor):
'display_id': 'player',
'ext': 'mp4',
'title': 'American Experience - Death and the Civil War, Chapter 1',
'description': 'American Experience, TVs most-watched history series, brings to life the compelling stories from our past that inform our understanding of the world today.',
'description': 'md5:1b80a74e0380ed2a4fb335026de1600d',
'duration': 682,
'thumbnail': 're:^https?://.*\.jpg$',
},
@@ -287,20 +289,19 @@ class PBSIE(InfoExtractor):
},
},
{
'url': 'http://video.pbs.org/video/2365367186/',
'url': 'http://www.pbs.org/video/2365245528/',
'info_dict': {
'id': '2365367186',
'display_id': '2365367186',
'id': '2365245528',
'display_id': '2365245528',
'ext': 'mp4',
'title': 'To Catch A Comet - Full Episode',
'description': 'On November 12, 2014, billions of kilometers from Earth, spacecraft orbiter Rosetta and lander Philae did what no other had dared to attempt \u2014 land on the volatile surface of a comet as it zooms around the sun at 67,000 km/hr. The European Space Agency hopes this mission can help peer into our past and unlock secrets of our origins.',
'duration': 3342,
'title': 'FRONTLINE - United States of Secrets (Part One)',
'description': 'md5:55756bd5c551519cc4b7703e373e217e',
'duration': 6851,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # requires ffmpeg
},
'skip': 'Expired',
},
{
# Video embedded in iframe containing angle brackets as attribute's value (e.g.
@@ -312,7 +313,7 @@ class PBSIE(InfoExtractor):
'display_id': 'a-chefs-life-season-3-episode-5-prickly-business',
'ext': 'mp4',
'title': "A Chef's Life - Season 3, Ep. 5: Prickly Business",
'description': 'md5:61db2ddf27c9912f09c241014b118ed1',
'description': 'md5:54033c6baa1f9623607c6e2ed245888b',
'duration': 1480,
'thumbnail': 're:^https?://.*\.jpg$',
},
@@ -328,7 +329,7 @@ class PBSIE(InfoExtractor):
'display_id': 'the-atomic-artists',
'ext': 'mp4',
'title': 'FRONTLINE - The Atomic Artists',
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'description': 'md5:1a2481e86b32b2e12ec1905dd473e2c1',
'duration': 723,
'thumbnail': 're:^https?://.*\.jpg$',
},
@@ -336,6 +337,21 @@ class PBSIE(InfoExtractor):
'skip_download': True, # requires ffmpeg
},
},
{
# Serves hd only via widget/partnerplayer page
'url': 'http://www.pbs.org/video/2365641075/',
'info_dict': {
'id': '2365641075',
'ext': 'mp4',
'title': 'FRONTLINE - Netanyahu at War',
'duration': 6852,
'thumbnail': 're:^https?://.*\.jpg$',
'formats': 'mincount:8',
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
'only_matching': True,
@@ -365,10 +381,14 @@ class PBSIE(InfoExtractor):
webpage, 'upload date', default=None))
# tabbed frontline videos
tabbed_videos = re.findall(
r'<div[^>]+class="videotab[^"]*"[^>]+vid="(\d+)"', webpage)
if tabbed_videos:
return tabbed_videos, presumptive_id, upload_date
MULTI_PART_REGEXES = (
r'<div[^>]+class="videotab[^"]*"[^>]+vid="(\d+)"',
r'<a[^>]+href=["\']#video-\d+["\'][^>]+data-coveid=["\'](\d+)',
)
for p in MULTI_PART_REGEXES:
tabbed_videos = re.findall(p, webpage)
if tabbed_videos:
return tabbed_videos, presumptive_id, upload_date
MEDIA_ID_REGEXES = [
r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'", # frontline video embed
@@ -432,22 +452,54 @@ class PBSIE(InfoExtractor):
for vid_id in video_id]
return self.playlist_result(entries, display_id)
info = self._download_json(
'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id)
info = None
redirects = []
redirect_urls = set()
def extract_redirect_urls(info):
for encoding_name in ('recommended_encoding', 'alternate_encoding'):
redirect = info.get(encoding_name)
if not redirect:
continue
redirect_url = redirect.get('url')
if redirect_url and redirect_url not in redirect_urls:
redirects.append(redirect)
redirect_urls.add(redirect_url)
try:
video_info = self._download_json(
'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id, 'Downloading video info JSON')
extract_redirect_urls(video_info)
info = video_info
except ExtractorError as e:
# videoInfo API may not work for some videos
if not isinstance(e.cause, compat_HTTPError) or e.cause.code != 404:
raise
# Player pages may also serve different qualities
for page in ('widget/partnerplayer', 'portalplayer'):
player = self._download_webpage(
'http://player.pbs.org/%s/%s' % (page, video_id),
display_id, 'Downloading %s page' % page, fatal=False)
if player:
video_info = self._parse_json(
self._search_regex(
r'(?s)PBS\.videoData\s*=\s*({.+?});\n',
player, '%s video data' % page, default='{}'),
display_id, transform_source=js_to_json, fatal=False)
if video_info:
extract_redirect_urls(video_info)
if not info:
info = video_info
formats = []
for encoding_name in ('recommended_encoding', 'alternate_encoding'):
redirect = info.get(encoding_name)
if not redirect:
continue
redirect_url = redirect.get('url')
if not redirect_url:
continue
for num, redirect in enumerate(redirects):
redirect_id = redirect.get('eeid')
redirect_info = self._download_json(
redirect_url + '?format=json', display_id,
'Downloading %s video url info' % encoding_name)
'%s?format=json' % redirect['url'], display_id,
'Downloading %s video url info' % (redirect_id or num))
if redirect_info['status'] == 'error':
raise ExtractorError(
@@ -466,8 +518,9 @@ class PBSIE(InfoExtractor):
else:
formats.append({
'url': format_url,
'format_id': redirect.get('eeid'),
'format_id': redirect_id,
})
self._remove_duplicate_formats(formats)
self._sort_formats(formats)
rating_str = info.get('rating')
@@ -493,7 +546,7 @@ class PBSIE(InfoExtractor):
'id': video_id,
'display_id': display_id,
'title': info['title'],
'description': info['program'].get('description'),
'description': info.get('description') or info.get('program', {}).get('description'),
'thumbnail': info.get('image_url'),
'duration': int_or_none(info.get('duration')),
'age_limit': age_limit,
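
Reviewer note: the PBS hunk above gathers `recommended_encoding` / `alternate_encoding` entries from several sources (the videoInfo JSON plus the player pages) and keeps only the first entry seen for each URL. A minimal standalone sketch of that order-preserving de-duplication follows. The function name `collect_redirects` and the sample data are hypothetical and are not part of the extractor.

```python
def collect_redirects(infos):
    """Return redirect dicts from several info dicts, first occurrence per URL.

    Hypothetical helper mirroring the nested extract_redirect_urls() closure
    in the hunk above; `infos` stands in for the videoInfo JSON and the
    PBS.videoData objects parsed from the player pages.
    """
    redirects = []      # keeps discovery order
    seen_urls = set()   # O(1) membership test for de-duplication
    for info in infos:
        for encoding_name in ('recommended_encoding', 'alternate_encoding'):
            redirect = info.get(encoding_name)
            if not redirect:
                continue
            redirect_url = redirect.get('url')
            if redirect_url and redirect_url not in seen_urls:
                redirects.append(redirect)
                seen_urls.add(redirect_url)
    return redirects


# Made-up sample data: the second dict repeats a URL, the third has no URL.
sample_infos = [
    {'recommended_encoding': {'url': 'http://example.com/a', 'eeid': 'a'},
     'alternate_encoding': {'url': 'http://example.com/b', 'eeid': 'b'}},
    {'recommended_encoding': {'url': 'http://example.com/a', 'eeid': 'a-dup'},
     'alternate_encoding': None},
    {'recommended_encoding': {'eeid': 'no-url'}},
]
unique_redirects = collect_redirects(sample_infos)
```

Keeping a list alongside the set preserves the order in which encodings were discovered, so the numeric fallback in the progress message (`redirect_id or num`) stays stable between runs.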


@@ -0,0 +1,51 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class PlaysTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?plays\.tv/video/(?P<id>[0-9a-f]{18})'
_TEST = {
'url': 'http://plays.tv/video/56af17f56c95335490/when-you-outplay-the-azir-wall',
'md5': 'dfeac1198506652b5257a62762cec7bc',
'info_dict': {
'id': '56af17f56c95335490',
'ext': 'mp4',
'title': 'When you outplay the Azir wall',
'description': 'Posted by Bjergsen',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
content = self._parse_json(
self._search_regex(
r'R\.bindContent\(({.+?})\);', webpage,
'content'), video_id)['content']
mpd_url, sources = re.search(
r'(?s)<video[^>]+data-mpd="([^"]+)"[^>]*>(.+?)</video>',
content).groups()
formats = self._extract_mpd_formats(
self._proto_relative_url(mpd_url), video_id, mpd_id='DASH')
for format_id, height, format_url in re.findall(r'<source\s+res="((\d+)h?)"\s+src="([^"]+)"', sources):
formats.append({
'url': self._proto_relative_url(format_url),
'format_id': 'http-' + format_id,
'height': int_or_none(height),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
}
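
Reviewer note: the new PlaysTV extractor builds its HTTP formats from `<source>` tags with a single regex whose outer group captures the format id (for example `720h`) and whose inner group captures the numeric height. The sketch below exercises the same pattern outside youtube-dl. The `parse_sources` name and the sample markup are assumptions, and the `//` handling is a simplified stand-in for `_proto_relative_url`.

```python
import re

# Same pattern as in the hunk above: group 1 is the full "res" value,
# group 2 is its numeric part, group 3 is the source URL.
SOURCE_RE = r'<source\s+res="((\d+)h?)"\s+src="([^"]+)"'


def parse_sources(sources_html, scheme='http:'):
    """Turn <source> tags into format dicts (hypothetical helper)."""
    formats = []
    for format_id, height, format_url in re.findall(SOURCE_RE, sources_html):
        # Simplified stand-in for InfoExtractor._proto_relative_url
        if format_url.startswith('//'):
            format_url = scheme + format_url
        formats.append({
            'url': format_url,
            'format_id': 'http-' + format_id,
            'height': int(height),
        })
    return formats


# Made-up markup for illustration only.
sample = ('<source res="720h" src="//example.com/v_720.mp4">'
          '<source res="480" src="//example.com/v_480.mp4">')
parsed = parse_sources(sample)
```

Because the pattern requires `res` to come directly before `src`, tags that order their attributes differently would be skipped silently; the DASH manifest remains the primary source of formats in that case.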


@@ -11,6 +11,7 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
int_or_none,
sanitized_Request,
str_to_int,
)
@@ -23,13 +24,18 @@ class PornHubIE(InfoExtractor):
_VALID_URL = r'https?://(?:[a-z]+\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
'md5': '882f488fa1f0026f023f33576004a2ed',
'md5': '1e19b41231a02eba417839222ac9d58e',
'info_dict': {
'id': '648719015',
'ext': 'mp4',
"uploader": "Babes",
"title": "Seductive Indian beauty strips down and fingers her pink pussy",
"age_limit": 18
'title': 'Seductive Indian beauty strips down and fingers her pink pussy',
'uploader': 'Babes',
'duration': 361,
'view_count': int,
'like_count': int,
'dislike_count': int,
'comment_count': int,
'age_limit': 18,
}
}, {
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
@@ -67,13 +73,23 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg,
expected=True, video_id=video_id)
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
flashvars = self._parse_json(
self._search_regex(
r'var\s+flashvars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'),
video_id)
if flashvars:
video_title = flashvars.get('video_title')
thumbnail = flashvars.get('image_url')
duration = int_or_none(flashvars.get('video_duration'))
else:
video_title, thumbnail, duration = [None] * 3
if not video_title:
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<',
webpage, 'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = compat_urllib_parse_unquote(thumbnail)
view_count = self._extract_count(
r'<span class="count">([\d,\.]+)</span> views', webpage, 'view')
@@ -95,7 +111,7 @@ class PornHubIE(InfoExtractor):
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
format = '-'.join(format)
m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
if m is None:
@@ -120,6 +136,7 @@ class PornHubIE(InfoExtractor):
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'duration': duration,
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
@@ -129,7 +146,31 @@ class PornHubIE(InfoExtractor):
}
class PornHubPlaylistIE(InfoExtractor):
class PornHubPlaylistBaseIE(InfoExtractor):
def _extract_entries(self, webpage):
return [
self.url_result('http://www.pornhub.com/%s' % video_url, PornHubIE.ie_key())
for video_url in set(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"', webpage))
]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = self._extract_entries(webpage)
playlist = self._parse_json(
self._search_regex(
r'playlistObject\s*=\s*({.+?});', webpage, 'playlist'),
playlist_id)
return self.playlist_result(
entries, playlist_id, playlist.get('title'), playlist.get('description'))
class PornHubPlaylistIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/playlist/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.pornhub.com/playlist/6201671',
@@ -140,21 +181,20 @@ class PornHubPlaylistIE(InfoExtractor):
'playlist_mincount': 35,
}]
class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/users/(?P<id>[^/]+)/videos'
_TESTS = [{
'url': 'http://www.pornhub.com/users/rushandlia/videos',
'info_dict': {
'id': 'rushandlia',
},
'playlist_mincount': 13,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
user_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
webpage = self._download_webpage(url, user_id)
entries = [
self.url_result('http://www.pornhub.com/%s' % video_url, 'PornHub')
for video_url in set(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"', webpage))
]
playlist = self._parse_json(
self._search_regex(
r'playlistObject\s*=\s*({.+?});', webpage, 'playlist'),
playlist_id)
return self.playlist_result(
entries, playlist_id, playlist.get('title'), playlist.get('description'))
return self.playlist_result(self._extract_entries(webpage), user_id)
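
Reviewer note: `_extract_entries` de-duplicates the matched links with `set(...)`, which discards the order in which videos appear on the page. If page order matters for the resulting playlist, an order-preserving variant could look like the sketch below. This is an alternative under that assumption, not what the commit does, and `extract_video_paths` is a hypothetical name.

```python
import re

# Same link pattern as in _extract_entries above.
VIDEO_LINK_RE = r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"'


def extract_video_paths(webpage):
    """Return unique view_video.php paths in the order they first appear."""
    seen = set()
    paths = []
    for path in re.findall(VIDEO_LINK_RE, webpage):
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


# Made-up page fragment: the first link is repeated on the third line.
sample_page = '\n'.join([
    '<a href="/view_video.php?viewkey=ph557bbb6676d2d">first</a>',
    '<a href="/view_video.php?viewkey=648719015">second</a>',
    '<a href="/view_video.php?viewkey=ph557bbb6676d2d">first again</a>',
])
video_paths = extract_video_paths(sample_page)
```

Note that `.*` in the pattern is greedy and does not cross newlines without `re.DOTALL`, so a line containing several links may yield one over-long match; the sample keeps one link per line for that reason.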


@@ -56,7 +56,7 @@ class PornoVoisinesIE(InfoExtractor):
r'<h1>(.+?)</h1>', webpage, 'title', flags=re.DOTALL)
description = self._html_search_regex(
r'<article id="descriptif">(.+?)</article>',
webpage, "description", fatal=False, flags=re.DOTALL)
webpage, 'description', fatal=False, flags=re.DOTALL)
thumbnail = self._search_regex(
r'<div id="mediaspace%s">\s*<img src="/?([^"]+)"' % video_id,


@@ -28,16 +28,16 @@ class RadioBremenIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
meta_url = "http://www.radiobremen.de/apps/php/mediathek/metadaten.php?id=%s" % video_id
meta_url = 'http://www.radiobremen.de/apps/php/mediathek/metadaten.php?id=%s' % video_id
meta_doc = self._download_webpage(
meta_url, video_id, 'Downloading metadata')
title = self._html_search_regex(
r"<h1.*>(?P<title>.+)</h1>", meta_doc, "title")
r'<h1.*>(?P<title>.+)</h1>', meta_doc, 'title')
description = self._html_search_regex(
r"<p>(?P<description>.*)</p>", meta_doc, "description", fatal=False)
r'<p>(?P<description>.*)</p>', meta_doc, 'description', fatal=False)
duration = parse_duration(self._html_search_regex(
r"L&auml;nge:</td>\s+<td>(?P<duration>[0-9]+:[0-9]+)</td>",
meta_doc, "duration", fatal=False))
r'L&auml;nge:</td>\s+<td>(?P<duration>[0-9]+:[0-9]+)</td>',
meta_doc, 'duration', fatal=False))
page_doc = self._download_webpage(
url, video_id, 'Downloading video information')
@@ -51,7 +51,7 @@ class RadioBremenIE(InfoExtractor):
formats = [{
'url': video_url,
'ext': 'mp4',
'width': int(mobj.group("width")),
'width': int(mobj.group('width')),
}]
return {
'id': video_id,


@@ -16,9 +16,9 @@ class RadioFranceIE(InfoExtractor):
'info_dict': {
'id': 'one-one',
'ext': 'ogg',
"title": "One to one",
"description": "Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
"uploader": "Thomas Hercouët",
'title': 'One to one',
'description': "Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
'uploader': 'Thomas Hercouët',
},
}


@@ -18,11 +18,11 @@ class RBMARadioIE(InfoExtractor):
'info_dict': {
'id': 'ford-lopatin-live-at-primavera-sound-2011',
'ext': 'mp3',
"uploader_id": "ford-lopatin",
"location": "Spain",
"description": "Joel Ford and Daniel Oneohtrix Point Never Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.",
"uploader": "Ford & Lopatin",
"title": "Live at Primavera Sound 2011",
'uploader_id': 'ford-lopatin',
'location': 'Spain',
'description': 'Joel Ford and Daniel Oneohtrix Point Never Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.',
'uploader': 'Ford & Lopatin',
'title': 'Live at Primavera Sound 2011',
},
}
