Compare commits

...

341 Commits

Author SHA1 Message Date
Sergey M․
b0081562d2 release 2016.08.12 2016-08-12 00:22:22 +07:00
Sergey M․
fff37cfd4f [ChangeLog] Actualize 2016-08-12 00:18:28 +07:00
Sergey M․
a3be69b7f0 [viu] Remove from extractors 2016-08-12 00:14:51 +07:00
Sergey M․
0fd1b1624c [goldenmoustache] Remove extractor (Closes #10298)
Now uses dailymotion
2016-08-11 23:52:17 +07:00
Sergey M․
367976d49f [drtuber] Improve title extraction 2016-08-11 23:47:52 +07:00
Sergey M․
0aef0771f8 [drtuber] Make dislike count optional (Closes #10297) 2016-08-11 23:47:27 +07:00
Sergey M․
0c070681c5 [chirbit] Fix extraction (Closes #10296) 2016-08-11 23:37:56 +07:00
Sergey M․
30b25d382d [francetvinfo] Relax _VALID_URL 2016-08-11 21:42:55 +07:00
Yen Chi Hsuan
e5f878c205 [ChangeLog] Add change log for #10269
[skip ci]
2016-08-11 19:13:41 +08:00
Yen Chi Hsuan
e2e84aed7e Merge branch 'lkho-pr/#10268' 2016-08-11 19:09:18 +08:00
Yen Chi Hsuan
b1927f4e8a [YoutubeDL] Disable newline conversion when writing subtitles
By default io.open() convert all '\n' occurrences to '\r\n' when writing
files. If the content already contains '\r\n', it will be converted to
'\r\r\n', breaking some video players.
2016-08-11 19:04:23 +08:00
Yen Chi Hsuan
3b9323d96e Merge branch 'pr/#10268' of https://github.com/lkho/youtube-dl into lkho-pr/#10268 2016-08-11 19:03:08 +08:00
lkho
7f832413d6 Preserve line endings for downloaded subtitle files 2016-08-10 23:40:50 +08:00
Sergey M․
7f2ed47595 [rtlnl] Relax _VALID_URL (Closes #10282) 2016-08-10 21:07:43 +07:00
Sergey M․
c3fa77bdef [formula1] Relax _VALID_URL (Closes #10283) 2016-08-10 21:00:40 +07:00
Remita Amine
57ce8a6d08 [wat] improve extraction(#10281)
add alternative method to extract http formats
works even if the video is geo-restricted or removed
from public access(most of the cases)
2016-08-10 14:20:28 +01:00
Yen Chi Hsuan
69d8eeeec5 [ctsnews] Fix extraction 2016-08-10 11:38:38 +08:00
Yen Chi Hsuan
81c13222c6 [utils] Recognize more formats in unified_timestamp
Used in CtsNews
2016-08-10 11:37:23 +08:00
Sergey M․
b1ce2ba197 release 2016.08.10 2016-08-10 00:20:44 +07:00
Sergey M․
5c8411e968 [ChangeLog] Actualize 2016-08-10 00:18:28 +07:00
Sergey M․
cc9c8ce5df [devscripts/prepare_manpage] Fix description strings starting with dash (Closes #10273) 2016-08-09 22:24:58 +07:00
Remita Amine
20ef4123b9 [uol] remove unused import 2016-08-09 15:13:15 +01:00
Remita Amine
4e62d26aa2 [uol] Add new extractor(#4263) 2016-08-09 15:09:08 +01:00
Sergey M․
b657816684 Credit @singh-pratyush96 for #10223 2016-08-09 04:04:45 +07:00
Sergey M․
9778b3e7ee Credit @zvonicek for #10242 and #10253 2016-08-09 04:03:52 +07:00
Sergey M․
25dd58ca6a [metadatafromtitle] Remove unused exception class 2016-08-09 04:01:05 +07:00
nyorain
5e42f8a0ad Make --metadata-from-title non fatal
Output a warning if the metadata can't be parsed from the title (and don't write any metadata) instead of raising a critical error.
2016-08-09 03:56:22 +07:00
Sergey M․
1ad6b891b2 Add more checks for --min/max-sleep-interval arguments and use more idiomatic naming 2016-08-09 03:47:56 +07:00
Sergey M․
7aa589a5e1 Fix --min/max-sleep-interval wording 2016-08-09 03:46:52 +07:00
singh-pratyush96
065bc35489 Add --max-sleep-interval (Closes #9930) 2016-08-09 03:32:42 +07:00
Sergey M․
3a380766d1 [rbmaradio] Improve, simplify and extract all formats (Closes #10242) 2016-08-09 02:46:29 +07:00
Petr Zvoníček
affaea0688 [rbmaradio] Fixed extractor 2016-08-09 02:18:33 +07:00
Sergey M․
77426a087b [sonyliv] Improve (Closes #10258) 2016-08-09 02:16:28 +07:00
Sukhbir Singh
8991844ea2 [sonyliv] Add new extractor 2016-08-09 02:09:13 +07:00
Sergey M․
082395d0a0 [extractor/generic] Add proper default to _search_json_ld call 2016-08-08 22:48:33 +07:00
Sergey M․
e8ed7354e6 [flipagram] Add proper default to _search_json_ld call 2016-08-08 22:46:19 +07:00
Sergey M․
1e7f602e2a [condenast] Make _search_json_ld call non fatal 2016-08-08 22:45:49 +07:00
Sergey M․
522f6c066d [bbc] Add proper default to _search_json_ld call 2016-08-08 22:44:36 +07:00
Sergey M․
321b5e082a [extractor/common] Respect default in _search_json_ld 2016-08-08 22:36:18 +07:00
Sergey M․
3711fa1eb2 Revert "[flipagram] Make _search_json_ld non fatal"
This reverts commit d34995a9e3.
2016-08-08 21:49:45 +07:00
Sergey M․
395c74615c Revert "[extractor/generic] Make _search_json_ld non fatal"
This reverts commit 958849275f.
2016-08-08 21:49:27 +07:00
Yen Chi Hsuan
3dc240e8c6 [sohu] Update _TESTS (closes #10260) 2016-08-08 18:48:21 +08:00
Yen Chi Hsuan
a41a6c5094 [chaturbate] Skip the invalid test 2016-08-08 13:06:02 +08:00
Yen Chi Hsuan
d71207121d [biqle] Skip an invalid test 2016-08-08 12:59:55 +08:00
Yen Chi Hsuan
b1c6f21c74 [aparat] Fix extraction 2016-08-08 12:59:07 +08:00
Yen Chi Hsuan
412abb8760 [bilibili] Update _TESTS 2016-08-08 12:57:17 +08:00
Yen Chi Hsuan
f17d5f6d14 [features.aol.com] Fix _TESTS 2016-08-08 12:52:36 +08:00
Remita Amine
6bb801cfaf [cwtv] extract http formats 2016-08-07 22:58:12 +01:00
Sergey M․
de02d1f4e9 [rozhlas] Fix regexes and improve extraction (Closes #10253) 2016-08-08 04:58:02 +07:00
Petr Zvoníček
e1f93a0a76 [rozhlas] Add new extractor 2016-08-08 04:41:45 +07:00
Charlie Le
d21a661bb4 [README.md] Update Options Link
The link references a bad anchor. The updated link now references the correct anchor.
2016-08-08 03:46:42 +07:00
Yen Chi Hsuan
b2bd968f4b [kuwo:singer] Fix extraction 2016-08-07 22:59:34 +08:00
Sergey M․
4a01befb34 release 2016.08.07 2016-08-07 21:12:41 +07:00
Sergey M․
845dfcdc40 [ChangeLog] Actualize 2016-08-07 21:10:48 +07:00
Sergey M․
d92cb46305 [discoverygo] Add extractor (Closes #10245) 2016-08-07 20:57:05 +07:00
Sergey M․
a8795327ca [utils] Add support TV Parental Guidelines ratings in parse_age_limit 2016-08-07 20:45:18 +07:00
Sergey M․
d34995a9e3 [flipagram] Make _search_json_ld non fatal 2016-08-07 19:06:55 +07:00
Sergey M․
958849275f [extractor/generic] Make _search_json_ld non fatal 2016-08-07 19:04:22 +07:00
Sergey M․
998f094452 [bbc] Remove proxy from test 2016-08-07 18:13:05 +07:00
Sergey M․
aaa42cf0cf [bbc] PEP 8 2016-08-07 18:05:13 +07:00
Sergey M․
9fb64c04cd [bbc] Add support for morph embeds (Closes #10239) 2016-08-07 18:01:50 +07:00
Remita Amine
f9622868e7 [bbc] preserve format_id backward compatibility 2016-08-07 11:14:15 +01:00
Remita Amine
37768f9242 [common] correctly lower the preference of m3u8 master manifest format 2016-08-07 10:59:09 +01:00
Sergey M․
a1aadd09a4 [tnaflixnetworkbase] Improve title extraction 2016-08-07 16:00:09 +07:00
Sergey M․
b47a75017b [tnaflix] Fix metadata extraction (Closes #10249) 2016-08-07 16:00:03 +07:00
Remita Amine
e37b54b140 [fox] fix theplatform release url query 2016-08-06 20:53:39 +01:00
Yen Chi Hsuan
c1decda58c [openload] Fix extraction (closes #9706) 2016-08-07 02:44:15 +08:00
Yen Chi Hsuan
d3f8e038fe [utils] Add decode_png for openload (#9706) 2016-08-07 02:42:58 +08:00
Remita Amine
ad152e2d95 [bbc] fix test 2016-08-06 19:36:12 +01:00
Remita Amine
b0af12154e [bbc] reduce requests and improve format_id 2016-08-06 19:24:59 +01:00
Remita Amine
d16b3c6677 [common] extract partOfTVSeries info in json-ld 2016-08-06 18:58:38 +01:00
Remita Amine
c57244cdb1 [common] lower the preference of m3u8 master manifest format 2016-08-06 18:55:05 +01:00
Remita Amine
a7e5f27412 [bbc] improve extraction
- extract f4m and dash formats
- improve format sorting and listing
- improve extraction of articles with `otherSettings.playlist`
2016-08-06 18:48:09 +01:00
Remita Amine
089a40955c [pokemon] improve _VALID_URL 2016-08-06 12:08:14 +01:00
Remita Amine
d73ebac100 [pokemon] Add new extractor(closes #10093) 2016-08-06 11:18:14 +01:00
Remita Amine
e563c0d73b [condenast] fallback to loader.js if video.js fail 2016-08-05 21:01:16 +01:00
Sergey M․
491c42e690 release 2016.08.06 2016-08-06 01:23:48 +07:00
Sergey M․
7f2339c617 [ChangeLog] Actualize 2016-08-06 01:19:47 +07:00
Sergey M․
8122e79fef [gamekings] Remove remnants 2016-08-06 00:12:37 +07:00
Sergey M․
fe3ad1d456 [adultswim] Remove superfluous md5 from test 2016-08-06 00:02:05 +07:00
Sergey M․
038a5e1a65 [adultswim] Add support for trailers (Closes #10235) 2016-08-06 00:00:05 +07:00
Sergey M․
84bc23b41b [archiveorg] PEP 8 2016-08-05 23:16:19 +07:00
Sergey M․
46933a15d6 [extractor/common] Support root JSON-LD lists (Closes #10203) 2016-08-05 23:14:32 +07:00
Sergey M․
3859ebeee6 [tvplay] Capture and output native error message 2016-08-05 22:50:42 +07:00
Remita Amine
d50aca41f8 [archiveorg] improve format extraction(closes #10219) 2016-08-05 16:42:15 +01:00
Remita Amine
0ca057b965 [jwplatform] add support for playlist extraction and relative urls and improve audio detection 2016-08-05 16:42:15 +01:00
Sergey M․
5ca968d0a6 [tvplay] Extract series metadata 2016-08-05 22:37:38 +07:00
Sergey M․
f0d31c624e [tvplay] Add support for subtitles (Closes #10194) 2016-08-05 22:17:32 +07:00
Remita Amine
08c655906c [5min] fix _VALID_URL(closes #10228) 2016-08-05 10:22:33 +01:00
Remita Amine
5a993e1692 [natgeo] fix tests(closes #10229) 2016-08-05 10:13:26 +01:00
Remita Amine
a7d2953073 [extractors] add tvp:embed import 2016-08-05 10:11:59 +01:00
Remita Amine
fdd0b8f8e0 [tvp] extract video id from the webpage(fixes #7799) 2016-08-05 09:44:15 +01:00
Remita Amine
f65dc41b72 [naver] extract upload date 2016-08-05 08:12:25 +01:00
Yen Chi Hsuan
962250f7ea [cbslocal] Fix timestamp parsing (closes #10213) 2016-08-05 11:44:50 +08:00
Yen Chi Hsuan
7dc2a74e0a [utils] Fix unified_timestamp for formats parsed by parsedate_tz() 2016-08-05 11:41:55 +08:00
Remita Amine
b02b960c6b [naver] improve extraction(closes #8096) 2016-08-04 21:42:22 +01:00
Remita Amine
4f427c4be8 [condenast] improve extraction 2016-08-04 18:30:56 +01:00
Sergey M․
8a00ea567b [natgeo:episodeguide] Do not shadow url from outer scope 2016-08-04 23:21:04 +07:00
Remita Amine
8895be01fc [5min] fix _VALID_URL 2016-08-04 16:55:12 +01:00
Remita Amine
52e7fcfeb7 [engadget] Relax _VALID_URL 2016-08-04 16:34:47 +01:00
Remita Amine
2396062c74 [5min] delegate extraction to AolIE
recently the 5min SenseHandler request return
HTTP Error 503: Service Unavailable error
2016-08-04 16:21:27 +01:00
Remita Amine
14704aeff6 [kaltura] remove debugging line 2016-08-04 14:54:34 +01:00
Remita Amine
3c2c3af059 [extractors] change imports for national geographic extractors 2016-08-04 12:20:56 +01:00
Remita Amine
1891ea2d76 [nationalgeographic] Add support for National Geographic Episode Guide 2016-08-04 12:18:10 +01:00
Remita Amine
1094074c04 [kaltura] extract subtitles and reduce requests 2016-08-04 09:39:06 +01:00
Remita Amine
217d5ae013 [vodplatform] Add new extractor 2016-08-04 09:39:06 +01:00
Remita Amine
8b40854529 [common] lower proto_preference of rtsp formats
Most of the time the RtspFD fail to download videos but it report
success of the download with this output:
[mpv] 0 bytes
[download] 100% of 0.00B
2016-08-04 09:39:06 +01:00
Yen Chi Hsuan
6bb0fbf9fb Revert "[README.md] Use full paths for all configuration files (#8863)"
This reverts commit 899d2bea63.
2016-08-04 09:54:28 +08:00
Sergey M․
8d3b226b83 [gamekings] Remove extractor
Now covered by generic jwplayer
2016-08-03 22:06:10 +07:00
Remita Amine
42b7a5afe0 [limelight] extract http formats 2016-08-03 13:12:51 +01:00
Yen Chi Hsuan
899d2bea63 [README.md] Use full paths for all configuration files (#8863) 2016-08-03 11:15:27 +08:00
Sergey M․
9cb0e65d7e [ntvru] Fix extraction 2016-08-02 22:56:48 +07:00
Sergey M․
b070564efb [extractor/common] Support multiple properties in _og_search_property 2016-08-02 22:55:14 +07:00
Philipp Hagemeister
ce28252c48 [options] Add test that checks that --password=secret is hidden in verbose output 2016-08-02 17:03:46 +02:00
Philipp Hagemeister
3aa9a73554 [options] Hide --password=secret in verbose output 2016-08-02 17:03:26 +02:00
Philipp Hagemeister
6a9b3b61ea [comedycentral] Re-add shortnames
In cc99d4f826, the shortname feature got deleted by accident. Re-add it as a separate IE.
2016-08-02 14:02:31 +02:00
Sergey M․
45408eb075 release 2016.08.01 2016-08-01 22:59:23 +07:00
Sergey M․
eafc66855d [ChangeLog] Add recent changes 2016-08-01 22:56:01 +07:00
Sergey M․
e03d3e6453 [cwtv] Add support for cwtvpr.com (Closes #10196) 2016-08-01 22:51:01 +07:00
Remita Amine
a70e45f80a [limelight] keep videos marked as previewStream
e382b953f0 (commitcomment-18472915)
2016-08-01 16:25:41 +01:00
Sergey M․
697655a7c0 [safari] Relax url regexes (Closes #10202) 2016-08-01 21:48:48 +07:00
Remita Amine
e382b953f0 [limelight] skip preview and drm protected videos 2016-08-01 00:33:30 +01:00
Yen Chi Hsuan
116e7e0d04 [bloomberg] Support BPlayer() players (closes #10187) 2016-07-31 14:47:19 +08:00
Sergey M․
cf03e34ad3 [yandexmusic:track] Fix extraction (Closes #10193) 2016-07-31 07:56:18 +07:00
Sergey M․
2903137292 release 2016.07.30 2016-07-30 14:45:07 +07:00
Sergey M․
9361f2169c [ChangeLog] Make extractor improvements' descriptions more concrete 2016-07-30 14:43:28 +07:00
Yen Chi Hsuan
35aa6c538f Add ChangeLog 2016-07-30 12:33:09 +08:00
Sergey M․
fa9f1d16b8 [dailymotion:playlist] Carry long line 2016-07-29 22:47:34 +07:00
Dave
485fedf6fd [dailymotion:playlist] Optimize download archive processing 2016-07-29 22:45:41 +07:00
Jaime Marquínez Ferrándiz
da0baba5c8 [rtve] Fix extraction for some videos
For example http://www.rtve.es/alacarta/videos/documentos-tv/documentos-tv-descredito/3574098/.
2016-07-29 17:20:27 +02:00
Jaime Marquínez Ferrándiz
bb9f3bfedf Revert "[rtve] Fix extraction (#10076)"
This reverts commit c39b2ed990.

Apparently outside of Spain using 'auth/resources' is required (#10097).
2016-07-29 17:14:04 +02:00
Sergey M․
dbc0b39b91 [tv2] Improve extraction 2016-07-29 22:01:34 +07:00
Sergey M․
481c5c5137 [tv2:article] Fix extraction (Closes #10188) 2016-07-29 21:43:17 +07:00
Sergey M․
0cacae2807 [twitch:clips] Sort formats 2016-07-29 09:01:53 +07:00
Sergey M․
d9d56deadf release 2016.07.28 2016-07-28 02:42:57 +07:00
Sergey M․
74ba450a81 [twitch:clips] Fix extraction (Closes #9767) 2016-07-28 22:30:09 +07:00
Sergey M․
db19df6ca0 [extractor/generic] Add test for #10179 2016-07-28 22:20:08 +07:00
Sergey M․
fbdf8d15d1 [soundcloud] Add _extract_urls (#10179) 2016-07-28 22:16:05 +07:00
Sergey M․
94aae01548 [extractor/generic] Extract all soundcloud embeds (Closes #10179) 2016-07-28 22:15:15 +07:00
Sergey M․
39eef54cf0 [ard:mediathek] Skip unavailable test 2016-07-28 21:38:23 +07:00
Sergey M․
05c8268c81 [shared] Modernize and make more robust 2016-07-27 23:39:02 +07:00
Sergey M․
289a16b4f3 [shared] Respect redirect URL (Closes #10170) 2016-07-27 23:28:01 +07:00
Sergey M․
7935926baa [devscripts/show-downloads-statistics] Add support for paging 2016-07-27 00:14:40 +07:00
Sergey M․
dcbb07c35a release 2016.07.26.2 2016-07-26 23:56:53 +07:00
Sergey M․
40090e8d51 [extractor/common] Improve is_suitable
In order to fix breakage introduced by a3aa814b77
2016-07-26 23:54:06 +07:00
Sergey M․
3e050d51d4 [orf:oe1] Relax _VALID_URL 2016-07-26 23:14:04 +07:00
Sergey M․
ced70c8640 [cbc] PEP 8 2016-07-26 23:08:08 +07:00
Sergey M․
9a700deea4 [instagram] Remove duplicate field in test 2016-07-26 23:07:16 +07:00
Sergey M․
dc35ba0eba [mgtv] Fix typo 2016-07-26 23:06:21 +07:00
Sergey M․
88bd486b9a [cbc] Improve extraction for videos embedded with clipId 2016-07-26 22:58:50 +07:00
Sergey M․
7f8b92e3cf [bigflix] Update tests 2016-07-26 21:44:53 +07:00
Yen Chi Hsuan
35f6e0ff36 [mtv.de] Skip 2 geo-restricted tests 2016-07-26 13:19:47 +08:00
Yen Chi Hsuan
326fa4e6e5 [generic] Skip an invalid test 2016-07-26 13:16:04 +08:00
Yen Chi Hsuan
c74299a72c [cmt] Detect unavailable videos and update _TESTS 2016-07-26 13:13:14 +08:00
Yen Chi Hsuan
10a1bb3a78 [mtv] Fix for videos with missing bitrates 2016-07-26 13:12:24 +08:00
Yen Chi Hsuan
4d3e543c73 Update extractors.py 2016-07-26 11:17:28 +08:00
Yen Chi Hsuan
05d1e7aaa9 [generic] Fix an MTV test and another test that breaks nosetests 2016-07-26 11:11:36 +08:00
Yen Chi Hsuan
a3aa814b77 Update _TESTS for MTV sites 2016-07-26 11:10:41 +08:00
Yen Chi Hsuan
5c32a77cad [nextmovie] Remove extractor
This domain name now redirects to mtv.com
2016-07-26 11:08:55 +08:00
Yen Chi Hsuan
14a28e705b [test/test_all_urls] Remove *.cc.com tests 2016-07-26 11:08:09 +08:00
Yen Chi Hsuan
cc99d4f826 [comedycentral] Remove IEs for *.cc.com except tosh.cc.com
All other subdomains now redirects to cc.com/* URLs
2016-07-26 11:06:50 +08:00
Yen Chi Hsuan
712c7530ff [mtv] Extract more metadata and more
1. Remove MTVIggyIE. All www.mtviggy.com URLs now redirects to
   www.mtv.com
2. Fix MTVDEIE
3. Return multiple URLs from _transform_rtmp_url. This is for
   tosh.cc.com
2016-07-26 11:03:43 +08:00
Sergey M․
0a147785e8 [camdemy] Extract duration properly 2016-07-25 23:03:58 +07:00
Sergey M․
59eaf69e33 [camdemy] Fix camdemy 2016-07-25 23:03:43 +07:00
Sergey M․
e8be2943a7 [smotri] Modernize, make more robust and fix tests 2016-07-24 18:38:18 +07:00
Sergey M․
8fdc538b46 release 2016.07.24 2016-07-24 11:39:50 +07:00
Sergey M․
9513c1eb17 [tvp] Update dash format comment 2016-07-24 11:03:39 +07:00
Sergey M․
ae6fff4e64 [onet] Enable dash formats 2016-07-24 10:43:05 +07:00
Sergey M․
5a65668e25 [dcn] Enable dash formats 2016-07-24 10:35:55 +07:00
Sergey M․
f75e6890db [telegraaf] Make hls non fatal 2016-07-24 10:29:26 +07:00
Sergey M․
d9cb92c840 [telegraaf] Enable dash formats 2016-07-24 10:29:09 +07:00
Sergey M․
94c04a3c79 [arkena] Enable dash formats 2016-07-24 10:28:11 +07:00
Sergey M․
f094834857 [extractor/common] Add support for $ in SegmentTemplate in MPD manifests 2016-07-24 10:27:16 +07:00
Déstin Reed
111de00289 [DailyMail] Improve title and description extraction 2016-07-24 05:37:13 +07:00
Sergey M․
b4a131e1a5 [facebook] Relax _VALID_URL (Closes #10151) 2016-07-24 04:36:49 +07:00
Sergey M․
f1991ce928 [arkena] Skip dash formats 2016-07-23 18:07:55 +07:00
Sergey M․
6548030a17 Credit @rvanbekkum for arkena (#8682) 2016-07-23 18:00:19 +07:00
Sergey M․
3a8947650b [arkenaplay] Remove extractor 2016-07-23 17:57:55 +07:00
Sergey M․
1979969f91 [extractor/generic] Add support for arkena embeds 2016-07-23 17:56:48 +07:00
Sergey M․
0673741af3 [extractors] Add imports for arkena and lcp 2016-07-23 17:56:29 +07:00
Sergey M․
c8e170b209 [lcp] Improve extraction 2016-07-23 17:56:11 +07:00
Sergey M․
bbe1f3634a [arkena] Improve extraction (Closes #8682) 2016-07-23 17:55:54 +07:00
Rob van Bekkum
4671dd41b2 [arkena:lcp] Add extractors 2016-07-23 17:01:09 +07:00
Sergey M․
f164b97123 [utils] Add another f4m mimetype to mimetype2ext 2016-07-23 16:48:59 +07:00
Sergey M․
5275efe30d release 2016.07.22 2016-07-22 23:11:28 +07:00
Sergey M․
b13647cf3c [eporner] Fix extraction (Closes #10139) 2016-07-22 23:04:13 +07:00
Sergey M․
add7d2a0e2 [pornhub] Make error regex less ambiguous (Closes #10138) 2016-07-22 21:24:09 +07:00
Sergey M․
e298d3a08c [youtube] Fix authentication (Closes #10140) 2016-07-22 21:05:39 +07:00
Sergey M․
fd8c8c7dcd [youtube:shared] Relax _VALID_URL 2016-07-21 22:58:34 +07:00
Sergey M․
9158af16cc [bbc.co.uk:iplayer:playlist] Add support for group URLs 2016-07-21 22:37:36 +07:00
Sergey M․
c6668e4ad1 [bbc.co.uk:iplayer:playlist] Skip unavailable test 2016-07-21 22:34:55 +07:00
Sergey M․
84e8cca48b [youjizz] Relax _VALID_URL (Closes #10131) 2016-07-20 22:41:13 +07:00
Sergey M․
790b06b7d4 [odatv] Improve (Closes #9285) 2016-07-20 21:43:22 +07:00
skacurt
740d7c49c2 [odatv] Add extractor 2016-07-20 21:42:05 +07:00
Sergey M․
4e51ec5f57 [extractors] Add import for comedycentral.tv 2016-07-19 22:50:37 +07:00
Sergey M․
05087d1b4c [bbc] Improve extraction from sxml playlists 2016-07-19 22:49:38 +07:00
Sergey M․
a66a73ee90 [ard] Add test for rbb-online 2016-07-18 02:25:31 +07:00
Sergey M․
8188b923db release 2016.07.17 2016-07-17 19:04:29 +07:00
Sergey M․
d993a1354d [README.md] Make download URLs consistent 2016-07-17 18:58:47 +07:00
Sergey M․
e8882e7043 [spike] Relax _VALID_URL and improve extraction (Closes #10106) 2016-07-17 18:34:25 +07:00
Sergey M․
1056821799 [viki] Fix tests (Closes #10098) 2016-07-17 18:13:54 +07:00
Sergey M․
890e6d3309 [viki] Lower m3u8 preference
http URLs are always provde the same or better quality
2016-07-17 18:12:03 +07:00
Sergey M․
246080d378 [viki] Override m3u8 formats acodec 2016-07-17 18:10:16 +07:00
Sergey M․
b1ea680270 Revert "[bbc] extract more and better qulities from Unified Streaming Platform m3u8 manifests"
This reverts commit 0385aa6199.
2016-07-17 17:29:36 +07:00
Sergey M․
45550d1039 [comedycentraltv] Add extractor (Closes #10101) 2016-07-17 16:58:58 +07:00
Sergey M․
7cdfc4c90f [mtvservices] Strip description 2016-07-17 16:56:39 +07:00
Sergey M․
af21f56f98 [ard] Add support for rbb-online (Closes #10095) 2016-07-17 03:40:58 +07:00
Sergey M․
1a8f0773b6 [streamable] Fix title extraction and improve (Closes #9122) 2016-07-17 02:01:00 +07:00
Zach Bruggeman
59cc5bd8bf [streamable] Add extractor 2016-07-17 01:35:09 +07:00
Sergey M․
49bc16b95e [nintendo] Improve playlist extraction (Closes #9986) 2016-07-17 00:01:25 +07:00
TRox1972
a2f9ca1e67 [nintendo] Add extractor 2016-07-16 23:58:53 +07:00
Sergey M․
371ddb14fe [extractor/generic] Change twitter:player embeds priority to lowest (Closes #10090) 2016-07-16 15:59:43 +07:00
Yen Chi Hsuan
998895dffa [cloudy] Drop videoraj.to
videoraj.ch is now a shoe-selling website, and videoraj.to domain name
is gone.
2016-07-16 15:37:54 +08:00
Yen Chi Hsuan
aadd3ce21f [cliphunter] Update _TESTS 2016-07-16 15:37:54 +08:00
Yen Chi Hsuan
ae7b846203 [cbsnews] Update _TESTS of CBSNewsLiveVideoIE 2016-07-16 15:37:54 +08:00
Yen Chi Hsuan
21ba7d0981 [cbc] Skip geo-restricted test case 2016-07-16 15:37:54 +08:00
Sergey M․
691fbe7f98 release 2016.07.16 2016-07-16 02:20:00 +07:00
Sergey M․
2e221ca3a8 [YoutubeDL] Fix incomplete formats check 2016-07-16 01:18:05 +07:00
Sergey M․
317f7ab634 [YoutubeDL] Fix format selection with filters (Closes #10083) 2016-07-16 00:55:43 +07:00
Yen Chi Hsuan
23495d6a39 Revert "[ffmpeg] Fix embedding subtitles (#9063)"
This reverts commit ccff2c404d.

Fixes #10081.

The new approach breaks embedding subtitles into video-only or
audio-only files. FFMpeg provides a trick: add '?' after the argument of
'-map' so that a missing stream is ignored. For example:

opts = [
    '-map', '0:v?',
    '-c:v', 'copy',
    '-map', '0:a?',
    '-c:a', 'copy',
    # other options...
]

Unfortunately, such a format is not implemented in avconv, either.
I guess adding '-ignore_unknown' if self.basename == 'ffmpeg' is the
best solution. However, the example mentioned in #9063 no longer serves
problematic files, so I can't test it. I'll reopen #9063 and wait for
another example so that I can test '-ignore_unknown'.
2016-07-15 20:02:36 +08:00
Remita Amine
224db034ab [syfy] fix extraction(closes #9087)(closes #3820)(closes #2388) 2016-07-14 23:59:47 +01:00
Sergey M․
ad27649be3 [3qsdn] Restrict src JS regex 2016-07-15 03:36:50 +07:00
Sergey M․
84571be645 [orf:tvthek] Remove test md5 2016-07-15 03:17:29 +07:00
Nehal Patel
7b0d333a7e Fix unit tests for m3u8 and RTSP extractors that require ffmpeg or mplayer 2016-07-15 03:06:23 +07:00
Remita Amine
342f0c3682 [ninenow] correct test url 2016-07-14 14:19:18 +01:00
Remita Amine
38e0f16a94 [ninenow] Add new extractor(closes #5181) 2016-07-14 14:16:11 +01:00
Remita Amine
e910fe2fe4 [brightcove] skip ism manifests 2016-07-14 14:13:57 +01:00
Jaime Marquínez Ferrándiz
233b58dec7 Add extractor for rtve.es/television (fixes #10076) 2016-07-13 21:02:34 +02:00
Jaime Marquínez Ferrándiz
c39b2ed990 [rtve] Fix extraction (#10076)
For http://www.rtve.es/alacarta/videos/documentos-tv/documentos-tv-revolucion-del-movil/3069778/ using 'auth/resources' fails, and other URLs seem to work fine.
2016-07-13 20:23:27 +02:00
Remita Amine
35ec86689c [bbc] extract only the original Unified Streaming Platform m3u8 manifests
0385aa6199 (commitcomment-18233275)
manifests with higher birate require more time to check formats
2016-07-13 18:01:14 +01:00
Sergey M․
c485959034 release 2016.07.13 2016-07-13 23:58:01 +07:00
Sergey M․
a0560d8ab8 [ellentv] Improve extraction (Closes #10067) 2016-07-13 22:42:53 +07:00
Remita Amine
0385aa6199 [bbc] extract more and better qulities from Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
Remita Amine
00f4764cb7 [common] extract vbr, abr and fps for Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
Sergey M․
51c2cd0b83 [extractors] Add vk:wallpost extractor import 2016-07-13 21:53:23 +07:00
Sergey M․
5f5a9d6158 [vk] Improve login 2016-07-13 21:52:52 +07:00
Sergey M․
2d19fb5072 [vk:wallpost] Add extractor 2016-07-13 21:51:44 +07:00
Yen Chi Hsuan
9d865a1af6 [travis] Skip downloading srelay
SOCKS tests never run on Travis CI due to unknown reasons, and
downloading them broke some tests (e.g.
https://travis-ci.org/rg3/youtube-dl/builds/144306425)
2016-07-13 14:27:14 +08:00
Remita Amine
41aa44259d [shahid] try to bypass geo restriction and extract more metadata(closes #10062) 2016-07-12 23:15:38 +01:00
Philipp Hagemeister
381ff44756 [devscripts/generate-download] Remove MD5 and SHA1 2016-07-12 09:09:54 +02:00
Sergey M․
7f29cf545a [youtube] Add YouTube Red paid video reference test (#10059) 2016-07-12 02:10:35 +07:00
Remita Amine
7d1219f3e0 [tmz] delegate extraction to KalturaIE 2016-07-11 19:08:22 +01:00
Remita Amine
f1b4af7d79 [beightcove:new] remove html tags from description 2016-07-11 19:06:50 +01:00
Remita Amine
8a8590a617 [dbtv] delegate extraction to BrightcoveNewIE 2016-07-11 16:30:24 +01:00
Remita Amine
4a7a5e41f7 [tvplay] improve extraction 2016-07-11 14:51:44 +01:00
Yen Chi Hsuan
2a49d01600 [playvid] Update _TESTS
Blocks https://travis-ci.org/rg3/youtube-dl/jobs/143809100
2016-07-11 15:15:28 +08:00
Yen Chi Hsuan
b99af8a51c [biobiochiletv] Fix extraction and update _TESTS 2016-07-11 13:23:57 +08:00
Yen Chi Hsuan
8e7020daef [rudo] Add new extractor
Used in biobiochile.tv
2016-07-11 13:19:25 +08:00
Sergey M․
a26bcc61c1 release 2016.07.11 2016-07-11 03:17:12 +07:00
Sergey M․
5c4dcf8172 [vidzi] Add support for embed URLs (Closes #10058) 2016-07-11 03:14:39 +07:00
Sergey M․
e9fb6a4bbe [youtube] Relax TFA regexes 2016-07-11 03:08:38 +07:00
Yen Chi Hsuan
e2dbcaa1bf [vuclip] Fix extraction 2016-07-11 00:52:25 +08:00
Yen Chi Hsuan
ae01850165 [miomio] Fix _TESTS 2016-07-11 00:03:24 +08:00
Yen Chi Hsuan
c3baaedfc8 [miomio] Support new 'h5' player (closes #9605)
Depends on #8876
2016-07-10 23:46:48 +08:00
Yen Chi Hsuan
0b68de3cc1 Merge pull request #8876 from remitamine/html5_media
[extractor/common] add helper method to extract html5 media entries
2016-07-10 23:40:45 +08:00
Sergey M․
39e9d524e5 Credit @nehalvpatel for roosterteeth (#9864) 2016-07-10 01:30:12 +07:00
Sergey M․
865b087224 [roosterteeth] Improve (Closes #9864) 2016-07-10 01:30:12 +07:00
Nehal Patel
3121b25639 [roosterteeth] Add extractor 2016-07-10 01:30:12 +07:00
Sergey M․
0286b85c79 release 2016.07.09.2 2016-07-09 22:22:24 +07:00
Sergey M․
ab52bb5137 [animeondemand] Fix typo 2016-07-09 22:20:34 +07:00
Sergey M․
61a98b8623 [lynda] Remove md5 from test (Closes #10047) 2016-07-09 21:29:11 +07:00
Sergey M․
6daf34a045 [facebook] Fix typo and break when found video_data (Closes #10048) 2016-07-09 21:25:07 +07:00
Yen Chi Hsuan
c03adf90bd [generic] Add the test. Closes #1638 2016-07-09 14:39:01 +08:00
Yen Chi Hsuan
0ece114b7b [vimeo] Recognize non-standard embeds (#1638) 2016-07-09 14:38:27 +08:00
Yen Chi Hsuan
5b6a74856b Merge pull request #9288 from reyyed/issue#9063fix
[ffmpeg] Fix embedding subtitles (#9063)
2016-07-09 14:29:53 +08:00
Sergey M․
ce43100a01 release 2016.07.09.1 2016-07-09 10:06:40 +07:00
Remita Amine
8cc9b4016d [srmediathek] extend _VALID_URL(closes #9373) 2016-07-09 03:22:09 +01:00
Remita Amine
31eeab9f41 [ard] fix f4m extraction and skip tests with 404 errors 2016-07-09 03:22:09 +01:00
Sergey M․
9558dcec9c [youtube:user] Preserve user/c path segment 2016-07-09 08:37:19 +07:00
Sergey M․
6e6b70d65f [extractor/generic] Properly comment out a test 2016-07-09 08:37:19 +07:00
Sergey M․
d417fd88d0 release 2016.07.09 2016-07-09 07:16:47 +07:00
Sergey M․
9e4f5dc1e9 [animeondemand] Pass num for episode based videos 2016-07-09 07:13:32 +07:00
Sergey M․
1251565ee0 [options] Rollback old behavior for configuratio files' encoding
Until agreed with some solution
2016-07-09 07:12:52 +07:00
Sergey M․
1f7258a367 [animeondemand] Add support for full length films (Closes #10031) 2016-07-09 06:57:04 +07:00
Sergey M․
0af985069b [flipagram] Improve extraction (Closes #9898) 2016-07-09 03:31:17 +07:00
Sergey M․
0de168f7ed [extractor/generic] Detect schema.org/VideoObject embeds 2016-07-09 03:29:07 +07:00
Sergey M․
95b31e266b [extractor/common] Add expected_type in json ld routines 2016-07-09 03:28:04 +07:00
Sergey M․
6b3a3098b5 [extractor/common] Extract more metadata for VideoObject in _json_ld 2016-07-09 03:27:11 +07:00
Sergey M․
2de624fdd5 [extractor/common] Introduce filesize metafield for thumbnails 2016-07-09 03:24:36 +07:00
Déstin Reed
3fee7f636c [flipagram] Add extractor 2016-07-09 03:23:32 +07:00
Remita Amine
89e2fff2b7 [mgtv] pass geo verification headers for api request 2016-07-08 20:18:25 +01:00
Sergey M․
cedc70b292 [facebook] Fix invalid video being extracted (Closes #9851) 2016-07-09 00:28:07 +07:00
Remita Amine
07d7689f2e [le] extract http formats 2016-07-08 15:35:20 +01:00
Yen Chi Hsuan
ae8cb5328d Merge branch 'JakubAdamWieczorek-polskie-radio' 2016-07-08 19:35:21 +08:00
Yen Chi Hsuan
2e32ac0b9a [polskieradio] Fix regex in _TESTS 2016-07-08 19:34:53 +08:00
Yen Chi Hsuan
672f01c370 Merge branch 'polskie-radio' of https://github.com/JakubAdamWieczorek/youtube-dl into JakubAdamWieczorek-polskie-radio 2016-07-08 19:33:28 +08:00
Jakub Adam Wieczorek
e2d616dd30 [polskieradio] Add thumbnails. 2016-07-08 13:23:00 +02:00
Yen Chi Hsuan
0ab7f4fe2b [nick] support nickjr.com (closes #7542) 2016-07-08 15:11:28 +08:00
Sergey M․
29c4a07776 [lynda] Fix test 2016-07-08 03:33:53 +07:00
Philipp Hagemeister
826e911e41 Merge branch 'master' of github.com:rg3/youtube-dl 2016-07-07 19:42:22 +02:00
Philipp Hagemeister
30d22dae8e [options] Do not decode Unicode on Python 2.x
The configuration file contents are being returned as unicode now, so decoding them is no longer necessary.
(Run python2 with -3 to see the warning before this commit)
2016-07-07 19:41:00 +02:00
Yen Chi Hsuan
ec3518725b [compat] Fix test_cmdline_umlauts on Python 2.6
The original statement raises uncaught UnicodeWarning on Python 2.6
2016-07-07 22:30:58 +08:00
Remita Amine
5f87d845eb [tweakers] fix info extraction(closes #9516) 2016-07-07 12:51:42 +01:00
Philipp Hagemeister
571808a7aa document comments in configuration file (fixes #10024) 2016-07-07 12:12:21 +02:00
Yen Chi Hsuan
dfe5fa49ae [compat] Fix compat_shlex_split for non-ASCII input
Closes #9871
2016-07-07 17:37:29 +08:00
Remita Amine
01a0c511eb [radiocanada] extract more formats 2016-07-07 03:46:12 +01:00
remitamine
b3d30315ce Merge pull request #9597 from remitamine/toutv
[toutv] fix info extraction(closes #1792)(closes #2082)
2016-07-07 01:51:01 +01:00
remitamine
882af14d7d [toutv] fix info extraction(closes #1792)(closes #2082) 2016-07-07 01:47:28 +01:00
Remita Amine
47335a0efa [telecinco] fix info extraction 2016-07-06 23:09:13 +01:00
Sergey M․
34bc2d9dfd release 2016.07.07 2016-07-07 01:54:29 +07:00
Sergey M․
08c7af4afa [kamcord] Add extractor (Closes #10001) 2016-07-07 01:50:39 +07:00
Yen Chi Hsuan
f7291a0b7c [daum.net] Fix extraction for specific examples
Closes #9972
2016-07-07 01:26:14 +08:00
Yen Chi Hsuan
c65aa4e9e1 [brightcove:legacy] Support 'playlistTabs' and skip a dead test
Closes #9965
2016-07-07 01:13:37 +08:00
Yen Chi Hsuan
ad213a1d74 [francetv] Recognize more Dailymotion embedded videos
Closes #9955
2016-07-06 23:37:54 +08:00
Yen Chi Hsuan
43f1e4e41e [onet] Add MD5 checksum 2016-07-06 20:32:03 +08:00
Yen Chi Hsuan
54b0e909d5 [amp] Fix a typo 2016-07-06 20:10:47 +08:00
Yen Chi Hsuan
f8752b86ac [Onet,ClipRs] Add new extractor for onet.tv and use it for clip.rs
Closes #9950
2016-07-06 20:09:05 +08:00
Yen Chi Hsuan
84c237fb8a [utils] Add get_element_by_class
For #9950
2016-07-06 20:02:52 +08:00
Remita Amine
ab49d7a9fa use mimetype2ext to determine manifest ext in multiple extractors 2016-07-06 09:11:46 +01:00
Remita Amine
b4173f1551 [utils] add mimetypes to determine manifest ext(m3u8, f4m, mpd) 2016-07-06 09:06:28 +01:00
Remita Amine
2817b99cf2 [metacafe] fix info extraction(closes #8539)(closes #3253) 2016-07-06 02:19:55 +01:00
Remita Amine
001fffd004 [spiegel:article] update test(closes #10018) 2016-07-06 00:16:41 +01:00
Sergey M․
0e94b4713d release 2016.07.06 2016-07-06 00:54:23 +07:00
Sergey M․
a6d3b89feb [prosiebensat1] Make downloading urls JSON non fatal 2016-07-06 00:52:48 +07:00
Remita Amine
6c26815d63 [onionstudios] fix info extraction 2016-07-05 18:05:07 +01:00
Sergey M․
73c4ac2c95 [youtube:channel] Improve channel id extraction and detect unavailable channels (Closes #10009) 2016-07-05 23:30:44 +07:00
Remita Amine
84f214d840 [prosiebensat1] extract all formats 2016-07-05 17:11:45 +01:00
Remita Amine
e3f88be7a9 [rtvnh] extract all formats 2016-07-05 14:45:39 +01:00
Remita Amine
31af3e35e0 [sandia] remove unused imports 2016-07-05 13:39:24 +01:00
Remita Amine
94a5cff91d [sendia] fix info extraction 2016-07-05 13:37:46 +01:00
Remita Amine
77082c7b9e [slideshare] fix description extraction 2016-07-05 12:01:04 +01:00
Remita Amine
252a1f75d2 [spiegel] improve info extraction 2016-07-05 11:46:25 +01:00
Remita Amine
5abf513cf8 [stitcher] fix episode config extraction 2016-07-05 10:44:16 +01:00
Yen Chi Hsuan
c6054e3201 [xuite] Support videos with already encoded media id 2016-07-05 14:26:42 +08:00
Yen Chi Hsuan
4080530624 [youtube:shared] Recognize the new 'shared' URLs
Closes #10007
2016-07-05 13:15:05 +08:00
Sergey M․
c25f1a9b63 release 2016.07.05 2016-07-05 06:32:46 +07:00
Remita Amine
dfaa86b75e [test_utils] add test for smuggling a smuggled url 2016-07-04 21:36:32 +01:00
Remita Amine
d9163ae3b6 [kaltura] fix extraction error for videos from multiple kaltura servers 2016-07-04 21:34:27 +01:00
Remita Amine
dafafe7cf1 [la7] extract more info from a kaltura custom server 2016-07-04 17:59:58 +01:00
Remita Amine
81953d1ae5 [kaltura] add support videos stored on custom kaltura servers(closes #5557) 2016-07-04 17:59:58 +01:00
Yen Chi Hsuan
3a212ed62e [iqiyi] Skip an unstable MD5 checksum 2016-07-04 11:25:46 +08:00
Sergey M․
195f084542 [pornhub] Detect private videos (Closes #9987) 2016-07-04 03:27:00 +07:00
Sergey M․
aa7a455b2e [README.md] Clarify configuration file may not exist by default 2016-07-04 01:24:33 +07:00
Sergey M․
6a4e659c93 [yahoo] Recognize brightcove embed (Closes #9995) 2016-07-03 23:00:36 +07:00
Yen Chi Hsuan
40f3666f6b [test/test_http] Update tests for 38cce791c7 2016-07-03 23:50:55 +08:00
Remita Amine
dd801bbe18 [brightcove] improve error detection 2016-07-03 16:37:22 +01:00
Yen Chi Hsuan
38cce791c7 Rename --cn-verfication-proxy to --geo-verification-proxy
And deprecate the former one

Since commit f138873900, this option is
not limited to China websites, so rename it.
2016-07-03 23:29:56 +08:00
Sergey M․
bf3ae6a543 [devscripts/show-downloads-statictics] Add script for displaying downloads statistics 2016-07-03 22:20:14 +07:00
remitamine
59bbe4911a [extractor/common] add helper method to extract html5 media entries 2016-06-26 14:04:08 +01:00
remitamine
4f3c5e0627 [utils] add helper function for parsing codecs 2016-06-26 14:03:58 +01:00
Wang Jun Tham
ccff2c404d [ffmpeg] Fix embedding subtitles (#9063)
Changed command line parameters for ffmpeg when embedding subtitles.
Changed to ‘-map 0:v -c:v copy -map 0:a -c:a copy’
2016-04-24 00:08:02 +08:00
166 changed files with 5958 additions and 2769 deletions

View File

@@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.03.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.12*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.03.1** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.12**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.07.03.1 [debug] youtube-dl version 2016.08.12
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@@ -7,9 +7,6 @@ python:
- "3.4" - "3.4"
- "3.5" - "3.5"
sudo: false sudo: false
install:
- bash ./devscripts/install_srelay.sh
- export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
script: nosetests test --verbose script: nosetests test --verbose
notifications: notifications:
email: email:

View File

@@ -177,3 +177,7 @@ Roman Tsiupa
Artur Krysiak Artur Krysiak
Jakub Adam Wieczorek Jakub Adam Wieczorek
Aleksandar Topuzović Aleksandar Topuzović
Nehal Patel
Rob van Bekkum
Petr Zvoníček
Pratyush Singh

View File

@@ -46,7 +46,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough? ### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem. Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report? ### Is there enough context in your bug report?

369
ChangeLog Normal file
View File

@@ -0,0 +1,369 @@
version 2016.08.12
Core
* Subtitles are now written as is. Newline conversions are disabled. (#10268)
+ Recognize more formats in unified_timestamp
Extractors
- [goldenmoustache] Remove extractor (#10298)
* [drtuber] Improve title extraction
* [drtuber] Make dislike count optional (#10297)
* [chirbit] Fix extraction (#10296)
* [francetvinfo] Relax URL regular expression
* [rtlnl] Relax URL regular expression (#10282)
* [formula1] Relax URL regular expression (#10283)
* [wat] Improve extraction (#10281)
* [ctsnews] Fix extraction
version 2016.08.10
Core
* Make --metadata-from-title non fatal when title does not match the pattern
* Introduce options for randomized sleep before each download
--min-sleep-interval and --max-sleep-interval (#9930)
* Respect default in _search_json_ld
Extractors
+ [uol] Add extractor for uol.com.br (#4263)
* [rbmaradio] Fix extraction and extract all formats (#10242)
+ [sonyliv] Add extractor for sonyliv.com (#10258)
* [aparat] Fix extraction
* [cwtv] Extract HTTP formats
+ [rozhlas] Add extractor for prehravac.rozhlas.cz (#10253)
* [kuwo:singer] Fix extraction
version 2016.08.07
Core
+ Add support for TV Parental Guidelines ratings in parse_age_limit
+ Add decode_png (#9706)
+ Add support for partOfTVSeries in JSON-LD
* Lower master M3U8 manifest preference for better format sorting
Extractors
+ [discoverygo] Add extractor (#10245)
* [flipagram] Make JSON-LD extraction non fatal
* [generic] Make JSON-LD extraction non fatal
+ [bbc] Add support for morph embeds (#10239)
* [tnaflixnetworkbase] Improve title extraction
* [tnaflix] Fix metadata extraction (#10249)
* [fox] Fix theplatform release URL query
* [openload] Fix extraction (#9706)
* [bbc] Skip duplicate manifest URLs
* [bbc] Improve format code
+ [bbc] Add support for DASH and F4M
* [bbc] Improve format sorting and listing
* [bbc] Improve playlist extraction
+ [pokemon] Add extractor (#10093)
+ [condenast] Add fallback scenario for video info extraction
version 2016.08.06
Core
* Add support for JSON-LD root list entries (#10203)
* Improve unified_timestamp
* Lower preference of RTSP formats in generic sorting
+ Add support for multiple properties in _og_search_property
* Improve password hiding from verbose output
Extractors
+ [adultswim] Add support for trailers (#10235)
* [archiveorg] Improve extraction (#10219)
+ [jwplatform] Add support for playlists
+ [jwplatform] Add support for relative URLs
* [jwplatform] Improve audio detection
+ [tvplay] Capture and output native error message
+ [tvplay] Extract series metadata
+ [tvplay] Add support for subtitles (#10194)
* [tvp] Improve extraction (#7799)
* [cbslocal] Fix timestamp parsing (#10213)
+ [naver] Add support for subtitles (#8096)
* [naver] Improve extraction
* [condenast] Improve extraction
* [engadget] Relax URL regular expression
* [5min] Fix extraction
+ [nationalgeographic] Add support for Episode Guide
+ [kaltura] Add support for subtitles
* [kaltura] Optimize network requests
+ [vodplatform] Add extractor for vod-platform.net
- [gamekings] Remove extractor
* [limelight] Extract HTTP formats
* [ntvru] Fix extraction
+ [comedycentral] Re-add :tds and :thedailyshow shortnames
version 2016.08.01
Fixed/improved extractors
- [yandexmusic:track] Adapt to changes in track location JSON (#10193)
- [bloomberg] Support another form of player (#10187)
- [limelight] Skip DRM protected videos
- [safari] Relax regular expressions for URL matching (#10202)
- [cwtv] Add support for cwtvpr.com (#10196)
version 2016.07.30
Fixed/improved extractors
- [twitch:clips] Sort formats
- [tv2] Use m3u8_native
- [tv2:article] Fix video detection (#10188)
- rtve (#10076)
- [dailymotion:playlist] Optimize download archive processing (#10180)
version 2016.07.28
Fixed/improved extractors
- shared (#10170)
- soundcloud (#10179)
- twitch (#9767)
version 2016.07.26.2
Fixed/improved extractors
- smotri
- camdemy
- mtv
- comedycentral
- cmt
- cbc
- mgtv
- orf
version 2016.07.24
New extractors
- arkena (#8682)
- lcp (#8682)
Fixed/improved extractors
- facebook (#10151)
- dailymail
- telegraaf
- dcn
- onet
- tvp
Miscellaneous
- Support $Time$ in DASH manifests
version 2016.07.22
New extractors
- odatv (#9285)
Fixed/improved extractors
- bbc
- youjizz (#10131)
- youtube (#10140)
- pornhub (#10138)
- eporner (#10139)
version 2016.07.17
New extractors
- nintendo (#9986)
- streamable (#9122)
Fixed/improved extractors
- ard (#10095)
- mtv
- comedycentral (#10101)
- viki (#10098)
- spike (#10106)
Miscellaneous
- Improved twitter player detection (#10090)
version 2016.07.16
New extractors
- ninenow (#5181)
Fixed/improved extractors
- rtve (#10076)
- brightcove
- 3qsdn
- syfy (#9087, #3820, #2388)
- youtube (#10083)
Miscellaneous
- Fix subtitle embedding for video-only and audio-only files (#10081)
version 2016.07.13
New extractors
- rudo
Fixed/improved extractors
- biobiochiletv
- tvplay
- dbtv
- brightcove
- tmz
- youtube (#10059)
- shahid (#10062)
- vk
- ellentv (#10067)
version 2016.07.11
New Extractors
- roosterteeth (#9864)
Fixed/improved extractors
- miomio (#9605)
- vuclip
- youtube
- vidzi (#10058)
version 2016.07.09.2
Fixed/improved extractors
- vimeo (#1638)
- facebook (#10048)
- lynda (#10047)
- animeondemand
Fixed/improved features
- Embedding subtitles no longer throws an error with problematic inputs (#9063)
version 2016.07.09.1
Fixed/improved extractors
- youtube
- ard
- srmediatek (#9373)
version 2016.07.09
New extractors
- Flipagram (#9898)
Fixed/improved extractors
- telecinco
- toutv
- radiocanada
- tweakers (#9516)
- lynda
- nick (#7542)
- polskieradio (#10028)
- le
- facebook (#9851)
- mgtv
- animeondemand (#10031)
Fixed/improved features
- `--postprocessor-args` and `--downloader-args` now accepts non-ASCII inputs
on non-Windows systems
version 2016.07.07
New extractors
- kamcord (#10001)
Fixed/improved extractors
- spiegel (#10018)
- metacafe (#8539, #3253)
- onet (#9950)
- francetv (#9955)
- brightcove (#9965)
- daum (#9972)
version 2016.07.06
Fixed/improved extractors
- youtube (#10007, #10009)
- xuite
- stitcher
- spiegel
- slideshare
- sandia
- rtvnh
- prosiebensat1
- onionstudios
version 2016.07.05
Fixed/improved extractors
- brightcove
- yahoo (#9995)
- pornhub (#9997)
- iqiyi
- kaltura (#5557)
- la7
- Changed features
- Rename --cn-verfication-proxy to --geo-verification-proxy
Miscellaneous
- Add script for displaying downloads statistics
version 2016.07.03.1
Fixed/improved extractors
- theplatform
- aenetworks
- nationalgeographic
- hrti (#9482)
- facebook (#5701)
- buzzfeed (#5701)
- rai (#8617, #9157, #9232, #8552, #8551)
- nationalgeographic (#9991)
- iqiyi
version 2016.07.03
New extractors
- hrti (#9482)
Fixed/improved extractors
- vk (#9981)
- facebook (#9938)
- xtube (#9953, #9961)
version 2016.07.02
New extractors
- fusion (#9958)
Fixed/improved extractors
- twitch (#9975)
- vine (#9970)
- periscope (#9967)
- pornhub (#8696)
version 2016.07.01
New extractors
- 9c9media
- ctvnews (#2156)
- ctv (#4077)
Fixed/Improved extractors
- rds
- meta (#8789)
- pornhub (#9964)
- sixplay (#2183)
New features
- Accept quoted strings across multiple lines (#9940)

View File

@@ -94,7 +94,7 @@ _EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'la
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES) youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@ $(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \ @tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \ --exclude '*.DS_Store' \
--exclude '*.kate-swp' \ --exclude '*.kate-swp' \
@@ -107,7 +107,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude 'docs/_build' \ --exclude 'docs/_build' \
-- \ -- \
bin devscripts test youtube_dl docs \ bin devscripts test youtube_dl docs \
LICENSE README.md README.txt \ ChangeLog LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \ Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
youtube-dl.zsh youtube-dl.fish setup.py \ youtube-dl.zsh youtube-dl.fish setup.py \
youtube-dl youtube-dl

View File

@@ -17,7 +17,7 @@ youtube-dl - download videos from youtube.com or other video platforms
To install it right away for all UNIX users (Linux, OS X, etc.), type: To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl -L https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget: If you do not have curl, you can alternatively use a recent wget:
@@ -103,9 +103,9 @@ which means you can modify it, redistribute it or use it however you like.
(experimental) (experimental)
-6, --force-ipv6 Make all connections via IPv6 -6, --force-ipv6 Make all connections via IPv6
(experimental) (experimental)
--cn-verification-proxy URL Use this proxy to verify the IP address for --geo-verification-proxy URL Use this proxy to verify the IP address for
some Chinese sites. The default proxy some geo-restricted sites. The default
specified by --proxy (or none, if the proxy specified by --proxy (or none, if the
options is not present) is used for the options is not present) is used for the
actual downloading. (experimental) actual downloading. (experimental)
@@ -330,7 +330,15 @@ which means you can modify it, redistribute it or use it however you like.
bidirectional text support. Requires bidiv bidirectional text support. Requires bidiv
or fribidi executable in PATH or fribidi executable in PATH
--sleep-interval SECONDS Number of seconds to sleep before each --sleep-interval SECONDS Number of seconds to sleep before each
download. download when used alone or a lower bound
of a range for randomized sleep before each
download (minimum possible number of
seconds to sleep) when used along with
--max-sleep-interval.
--max-sleep-interval SECONDS Upper bound of a range for randomized sleep
before each download (maximum possible
number of seconds to sleep). Must only be
used along with --min-sleep-interval.
## Video Format Options: ## Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT -f, --format FORMAT Video format code, see the "FORMAT
@@ -424,7 +432,7 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION # CONFIGURATION
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself.
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory: For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
``` ```
@@ -432,6 +440,7 @@ For example, with the following configuration file youtube-dl will always extrac
--no-mtime --no-mtime
--proxy 127.0.0.1:3128 --proxy 127.0.0.1:3128
-o ~/Movies/%(title)s.%(ext)s -o ~/Movies/%(title)s.%(ext)s
# Lines starting with # are comments
``` ```
Note that options in configuration file are just the same options aka switches used in regular command line calls thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`. Note that options in configuration file are just the same options aka switches used in regular command line calls thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
@@ -1195,7 +1204,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough? ### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem. Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report? ### Is there enough context in your bug report?

View File

@@ -15,13 +15,9 @@ data = urllib.request.urlopen(URL).read()
with open('download.html.in', 'r', encoding='utf-8') as tmplf: with open('download.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read() template = tmplf.read()
md5sum = hashlib.md5(data).hexdigest()
sha1sum = hashlib.sha1(data).hexdigest()
sha256sum = hashlib.sha256(data).hexdigest() sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version) template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_URL@', URL) template = template.replace('@PROGRAM_URL@', URL)
template = template.replace('@PROGRAM_MD5SUM@', md5sum)
template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum) template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0]) template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1]) template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])

View File

@@ -1,8 +0,0 @@
#!/bin/bash
mkdir -p tmp && cd tmp
wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
tar zxvf srelay-0.4.8b6.tar.gz
cd srelay-0.4.8b6
./configure
make

View File

@@ -54,17 +54,21 @@ def filter_options(readme):
if in_options: if in_options:
if line.lstrip().startswith('-'): if line.lstrip().startswith('-'):
option, description = re.split(r'\s{2,}', line.lstrip()) split = re.split(r'\s{2,}', line.lstrip())
split_option = option.split(' ') # Description string may start with `-` as well. If there is
# only one piece then it's a description bit not an option.
if len(split) > 1:
option, description = split
split_option = option.split(' ')
if not split_option[-1].startswith('-'): # metavar if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]]) option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
# Pandoc's definition_lists. See http://pandoc.org/README.html # Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information. # for more information.
ret += '\n%s\n: %s\n' % (option, description) ret += '\n%s\n: %s\n' % (option, description)
else: continue
ret += line.lstrip() + '\n' ret += line.lstrip() + '\n'
else: else:
ret += line + '\n' ret += line + '\n'

View File

@@ -71,9 +71,12 @@ fi
/bin/echo -e "\n### Changing version in version.py..." /bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..." /bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py ChangeLog
git commit $gpg_sign_commits -m "release $version" git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..." /bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@@ -0,0 +1,47 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import itertools
import json
import os
import re
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_print,
compat_urllib_request,
)
from youtube_dl.utils import format_bytes
def format_size(bytes):
return '%s (%d bytes)' % (format_bytes(bytes), bytes)
total_bytes = 0
for page in itertools.count(1):
releases = json.loads(compat_urllib_request.urlopen(
'https://api.github.com/repos/rg3/youtube-dl/releases?page=%s' % page
).read().decode('utf-8'))
if not releases:
break
for release in releases:
compat_print(release['name'])
for asset in release['assets']:
asset_name = asset['name']
total_bytes += asset['download_count'] * asset['size']
if all(not re.match(p, asset_name) for p in (
r'^youtube-dl$',
r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
r'^youtube-dl\.exe$')):
continue
compat_print(
' %s size: %s downloads: %d'
% (asset_name, format_size(asset['size']), asset['download_count']))
compat_print('total downloads traffic: %s' % format_size(total_bytes))

View File

@@ -14,6 +14,7 @@
- **8tracks** - **8tracks**
- **91porn** - **91porn**
- **9gag** - **9gag**
- **9now.com.au**
- **abc.net.au** - **abc.net.au**
- **Abc7News** - **Abc7News**
- **abcnews** - **abcnews**
@@ -45,6 +46,7 @@
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ARD** - **ARD**
- **ARD:mediathek** - **ARD:mediathek**
- **Arkena**
- **arte.tv** - **arte.tv**
- **arte.tv:+7** - **arte.tv:+7**
- **arte.tv:cinema** - **arte.tv:cinema**
@@ -140,7 +142,8 @@
- **CollegeRama** - **CollegeRama**
- **ComCarCoff** - **ComCarCoff**
- **ComedyCentral** - **ComedyCentral**
- **ComedyCentralShows**: The Daily Show / The Colbert Report - **ComedyCentralShortname**
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Coub** - **Coub**
- **Cracked** - **Cracked**
@@ -179,6 +182,7 @@
- **DigitallySpeaking** - **DigitallySpeaking**
- **Digiteka** - **Digiteka**
- **Discovery** - **Discovery**
- **DiscoveryGo**
- **Dotsub** - **Dotsub**
- **DouyuTV**: 斗鱼 - **DouyuTV**: 斗鱼
- **DPlay** - **DPlay**
@@ -224,6 +228,7 @@
- **Firstpost** - **Firstpost**
- **FiveTV** - **FiveTV**
- **Flickr** - **Flickr**
- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament) - **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom** - **FootyRoom**
- **Formula1** - **Formula1**
@@ -244,7 +249,6 @@
- **FunnyOrDie** - **FunnyOrDie**
- **Fusion** - **Fusion**
- **GameInformer** - **GameInformer**
- **Gamekings**
- **GameOne** - **GameOne**
- **gameone:playlist** - **gameone:playlist**
- **Gamersyde** - **Gamersyde**
@@ -261,7 +265,6 @@
- **GloboArticle** - **GloboArticle**
- **GodTube** - **GodTube**
- **GodTV** - **GodTV**
- **GoldenMoustache**
- **Golem** - **Golem**
- **GoogleDrive** - **GoogleDrive**
- **Goshgay** - **Goshgay**
@@ -312,6 +315,7 @@
- **jpopsuki.tv** - **jpopsuki.tv**
- **JWPlatform** - **JWPlatform**
- **Kaltura** - **Kaltura**
- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play - **KanalPlay**: Kanal 5/9/11 Play
- **Kankan** - **Kankan**
- **Karaoketv** - **Karaoketv**
@@ -333,6 +337,8 @@
- **kuwo:song**: 酷我音乐 - **kuwo:song**: 酷我音乐
- **la7.it** - **la7.it**
- **Laola1Tv** - **Laola1Tv**
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网 - **Le**: 乐视网
- **Learnr** - **Learnr**
- **Lecture2Go** - **Lecture2Go**
@@ -394,7 +400,6 @@
- **MSN** - **MSN**
- **MTV** - **MTV**
- **mtv.de** - **mtv.de**
- **mtviggy.com**
- **mtvservices:embedded** - **mtvservices:embedded**
- **MuenchenTV**: münchen.tv - **MuenchenTV**: münchen.tv
- **MusicPlayOn** - **MusicPlayOn**
@@ -410,7 +415,8 @@
- **MyVidster** - **MyVidster**
- **n-tv.de** - **n-tv.de**
- **natgeo** - **natgeo**
- **natgeo:channel** - **natgeo:episodeguide**
- **natgeo:video**
- **Naver** - **Naver**
- **NBA** - **NBA**
- **NBC** - **NBC**
@@ -434,7 +440,6 @@
- **Newstube** - **Newstube**
- **NextMedia**: 蘋果日報 - **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞 - **NextMediaActionNews**: 蘋果日報 - 動新聞
- **nextmovie.com**
- **nfb**: National Film Board of Canada - **nfb**: National Film Board of Canada
- **nfl.com** - **nfl.com**
- **nhl.com** - **nhl.com**
@@ -446,6 +451,7 @@
- **niconico**: ニコニコ動画 - **niconico**: ニコニコ動画
- **NiconicoPlaylist** - **NiconicoPlaylist**
- **NineCNineMedia** - **NineCNineMedia**
- **Nintendo**
- **njoy**: N-JOY - **njoy**: N-JOY
- **njoy:embed** - **njoy:embed**
- **Noco** - **Noco**
@@ -473,9 +479,12 @@
- **NYTimes** - **NYTimes**
- **NYTimesArticle** - **NYTimesArticle**
- **ocw.mit.edu** - **ocw.mit.edu**
- **OdaTV**
- **Odnoklassniki** - **Odnoklassniki**
- **OktoberfestTV** - **OktoberfestTV**
- **on.aol.com** - **on.aol.com**
- **onet.tv**
- **onet.tv:channel**
- **OnionStudios** - **OnionStudios**
- **Ooyala** - **Ooyala**
- **OoyalaExternal** - **OoyalaExternal**
@@ -509,6 +518,7 @@
- **plus.google**: Google Plus - **plus.google**: Google Plus
- **pluzz.francetv.fr** - **pluzz.francetv.fr**
- **podomatic** - **podomatic**
- **Pokemon**
- **PolskieRadio** - **PolskieRadio**
- **PornHd** - **PornHd**
- **PornHub**: PornHub and Thumbzilla - **PornHub**: PornHub and Thumbzilla
@@ -550,8 +560,10 @@
- **RICE** - **RICE**
- **RingTV** - **RingTV**
- **RockstarGames** - **RockstarGames**
- **RoosterTeeth**
- **RottenTomatoes** - **RottenTomatoes**
- **Roxwel** - **Roxwel**
- **Rozhlas**
- **RTBF** - **RTBF**
- **rte**: Raidió Teilifís Éireann TV - **rte**: Raidió Teilifís Éireann TV
- **rte:radio**: Raidió Teilifís Éireann radio - **rte:radio**: Raidió Teilifís Éireann radio
@@ -562,7 +574,9 @@
- **rtve.es:alacarta**: RTVE a la carta - **rtve.es:alacarta**: RTVE a la carta
- **rtve.es:infantil**: RTVE infantil - **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams - **rtve.es:live**: RTVE.es live streams
- **rtve.es:television**
- **RTVNH** - **RTVNH**
- **Rudo**
- **RUHD** - **RUHD**
- **RulePorn** - **RulePorn**
- **rutube**: Rutube videos - **rutube**: Rutube videos
@@ -607,6 +621,7 @@
- **smotri:user**: Smotri.com user videos - **smotri:user**: Smotri.com user videos
- **Snotr** - **Snotr**
- **Sohu** - **Sohu**
- **SonyLIV**
- **soundcloud** - **soundcloud**
- **soundcloud:playlist** - **soundcloud:playlist**
- **soundcloud:search**: Soundcloud search - **soundcloud:search**: Soundcloud search
@@ -637,6 +652,7 @@
- **stanfordoc**: Stanford Open ClassRoom - **stanfordoc**: Stanford Open ClassRoom
- **Steam** - **Steam**
- **Stitcher** - **Stitcher**
- **Streamable**
- **streamcloud.eu** - **streamcloud.eu**
- **StreamCZ** - **StreamCZ**
- **StreetVoice** - **StreetVoice**
@@ -684,6 +700,7 @@
- **TNAFlix** - **TNAFlix**
- **TNAFlixNetworkEmbed** - **TNAFlixNetworkEmbed**
- **toggle** - **toggle**
- **Tosh**: Tosh.0
- **tou.tv** - **tou.tv**
- **Toypics**: Toypics user profile - **Toypics**: Toypics user profile
- **ToypicsUser**: Toypics user profile - **ToypicsUser**: Toypics user profile
@@ -713,6 +730,7 @@
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com** - **tvland.com**
- **tvp**: Telewizja Polska - **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series** - **tvp:series**
- **TVPlay**: TV3Play and related services - **TVPlay**: TV3Play and related services
- **Tweakers** - **Tweakers**
@@ -730,6 +748,7 @@
- **udemy:course** - **udemy:course**
- **UDNEmbed**: 聯合影音 - **UDNEmbed**: 聯合影音
- **Unistra** - **Unistra**
- **uol.com.br**
- **Urort**: NRK P3 Urørt - **Urort**: NRK P3 Urørt
- **URPlay** - **URPlay**
- **USAToday** - **USAToday**
@@ -789,8 +808,10 @@
- **vine:user** - **vine:user**
- **vk**: VK - **vk**: VK
- **vk:uservideos**: VK - User's Videos - **vk:uservideos**: VK - User's Videos
- **vk:wallpost**
- **vlive** - **vlive**
- **Vodlocker** - **Vodlocker**
- **VODPlatform**
- **VoiceRepublic** - **VoiceRepublic**
- **VoxMedia** - **VoxMedia**
- **Vporn** - **Vporn**
@@ -857,6 +878,7 @@
- **youtube:search**: YouTube.com searches - **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first - **youtube:search:date**: YouTube.com searches, newest videos first
- **youtube:search_url**: YouTube.com search URLs - **youtube:search_url**: YouTube.com search URLs
- **youtube:shared**
- **youtube:show**: YouTube.com (multi-season) shows - **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication) - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword) - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)

View File

@@ -48,6 +48,9 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._og_search_property('foobar', html), 'Foo') self.assertEqual(ie._og_search_property('foobar', html), 'Foo')
self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar') self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar')
self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar') self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar')
self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
def test_html_search_meta(self): def test_html_search_meta(self):
ie = self.ie ie = self.ie

View File

@@ -335,6 +335,40 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], f1['format_id']) self.assertEqual(downloaded['format_id'], f1['format_id'])
def test_audio_only_extractor_format_selection(self):
# For extractors with incomplete formats (all formats are audio-only or
# video-only) best and worst should fallback to corresponding best/worst
# video-only or audio-only formats (as per
# https://github.com/rg3/youtube-dl/pull/5556)
formats = [
{'format_id': 'low', 'ext': 'mp3', 'preference': 1, 'vcodec': 'none', 'url': TEST_URL},
{'format_id': 'high', 'ext': 'mp3', 'preference': 2, 'vcodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
ydl = YDL({'format': 'best'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'high')
ydl = YDL({'format': 'worst'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'low')
def test_format_not_available(self):
formats = [
{'format_id': 'regular', 'ext': 'mp4', 'height': 360, 'url': TEST_URL},
{'format_id': 'video', 'ext': 'mp4', 'height': 720, 'acodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
# This must fail since complete video-audio format does not match filter
# and extractor does not provide incomplete only formats (i.e. only
# video-only or audio-only).
ydl = YDL({'format': 'best[height>360]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
def test_invalid_format_specs(self): def test_invalid_format_specs(self):
def assert_syntax_error(format_spec): def assert_syntax_error(format_spec):
ydl = YDL({'format': format_spec}) ydl = YDL({'format': format_spec})

View File

@@ -101,8 +101,6 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch(':ytsubs', ['youtube:subscriptions']) self.assertMatch(':ytsubs', ['youtube:subscriptions'])
self.assertMatch(':ytsubscriptions', ['youtube:subscriptions']) self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
self.assertMatch(':ythistory', ['youtube:history']) self.assertMatch(':ythistory', ['youtube:history'])
self.assertMatch(':thedailyshow', ['ComedyCentralShows'])
self.assertMatch(':tds', ['ComedyCentralShows'])
def test_vimeo_matching(self): def test_vimeo_matching(self):
self.assertMatch('https://vimeo.com/channels/tributes', ['vimeo:channel']) self.assertMatch('https://vimeo.com/channels/tributes', ['vimeo:channel'])

View File

@@ -88,6 +88,7 @@ class TestCompat(unittest.TestCase):
def test_compat_shlex_split(self): def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two']) self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag']) self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag'])
self.assertEqual(compat_shlex_split('-val 中文'), ['-val', '中文'])
def test_compat_etree_fromstring(self): def test_compat_etree_fromstring(self):
xml = ''' xml = '''

View File

@@ -138,27 +138,27 @@ class TestProxy(unittest.TestCase):
self.proxy_thread.daemon = True self.proxy_thread.daemon = True
self.proxy_thread.start() self.proxy_thread.start()
self.cn_proxy = compat_http_server.HTTPServer( self.geo_proxy = compat_http_server.HTTPServer(
('localhost', 0), _build_proxy_handler('cn')) ('localhost', 0), _build_proxy_handler('geo'))
self.cn_port = http_server_port(self.cn_proxy) self.geo_port = http_server_port(self.geo_proxy)
self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever) self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
self.cn_proxy_thread.daemon = True self.geo_proxy_thread.daemon = True
self.cn_proxy_thread.start() self.geo_proxy_thread.start()
def test_proxy(self): def test_proxy(self):
cn_proxy = 'localhost:{0}'.format(self.cn_port) geo_proxy = 'localhost:{0}'.format(self.geo_port)
ydl = YoutubeDL({ ydl = YoutubeDL({
'proxy': 'localhost:{0}'.format(self.port), 'proxy': 'localhost:{0}'.format(self.port),
'cn_verification_proxy': cn_proxy, 'geo_verification_proxy': geo_proxy,
}) })
url = 'http://foo.com/bar' url = 'http://foo.com/bar'
response = ydl.urlopen(url).read().decode('utf-8') response = ydl.urlopen(url).read().decode('utf-8')
self.assertEqual(response, 'normal: {0}'.format(url)) self.assertEqual(response, 'normal: {0}'.format(url))
req = compat_urllib_request.Request(url) req = compat_urllib_request.Request(url)
req.add_header('Ytdl-request-proxy', cn_proxy) req.add_header('Ytdl-request-proxy', geo_proxy)
response = ydl.urlopen(req).read().decode('utf-8') response = ydl.urlopen(req).read().decode('utf-8')
self.assertEqual(response, 'cn: {0}'.format(url)) self.assertEqual(response, 'geo: {0}'.format(url))
def test_proxy_with_idn(self): def test_proxy_with_idn(self):
ydl = YoutubeDL({ ydl = YoutubeDL({

View File

@@ -33,6 +33,7 @@ from youtube_dl.utils import (
ExtractorError, ExtractorError,
find_xpath_attr, find_xpath_attr,
fix_xml_ampersands, fix_xml_ampersands,
get_element_by_class,
InAdvancePagedList, InAdvancePagedList,
intlist_to_bytes, intlist_to_bytes,
is_html, is_html,
@@ -41,6 +42,7 @@ from youtube_dl.utils import (
ohdave_rsa_encrypt, ohdave_rsa_encrypt,
OnDemandPagedList, OnDemandPagedList,
orderedSet, orderedSet,
parse_age_limit,
parse_duration, parse_duration,
parse_filesize, parse_filesize,
parse_count, parse_count,
@@ -80,6 +82,7 @@ from youtube_dl.utils import (
cli_option, cli_option,
cli_valueless_option, cli_valueless_option,
cli_bool_option, cli_bool_option,
parse_codecs,
) )
from youtube_dl.compat import ( from youtube_dl.compat import (
compat_chr, compat_chr,
@@ -306,6 +309,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('25-09-2014'), 1411603200) self.assertEqual(unified_timestamp('25-09-2014'), 1411603200)
self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200) self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None) self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
def test_determine_ext(self): def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4') self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@@ -405,6 +409,12 @@ class TestUtil(unittest.TestCase):
self.assertEqual(res_url, url) self.assertEqual(res_url, url)
self.assertEqual(res_data, None) self.assertEqual(res_data, None)
smug_url = smuggle_url(url, {'a': 'b'})
smug_smug_url = smuggle_url(smug_url, {'c': 'd'})
res_url, res_data = unsmuggle_url(smug_smug_url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, {'a': 'b', 'c': 'd'})
def test_shell_quote(self): def test_shell_quote(self):
args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')] args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""") self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
@@ -423,6 +433,20 @@ class TestUtil(unittest.TestCase):
url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'), url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
'trailer.mp4') 'trailer.mp4')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None)
self.assertEqual(parse_age_limit('invalid'), None)
self.assertEqual(parse_age_limit(0), 0)
self.assertEqual(parse_age_limit(18), 18)
self.assertEqual(parse_age_limit(21), 21)
self.assertEqual(parse_age_limit(22), None)
self.assertEqual(parse_age_limit('18'), 18)
self.assertEqual(parse_age_limit('18+'), 18)
self.assertEqual(parse_age_limit('PG-13'), 13)
self.assertEqual(parse_age_limit('TV-14'), 14)
self.assertEqual(parse_age_limit('TV-MA'), 17)
def test_parse_duration(self): def test_parse_duration(self):
self.assertEqual(parse_duration(None), None) self.assertEqual(parse_duration(None), None)
self.assertEqual(parse_duration(False), None) self.assertEqual(parse_duration(False), None)
@@ -601,6 +625,29 @@ class TestUtil(unittest.TestCase):
limit_length('foo bar baz asd', 12).startswith('foo bar')) limit_length('foo bar baz asd', 12).startswith('foo bar'))
self.assertTrue('...' in limit_length('foo bar baz asd', 12)) self.assertTrue('...' in limit_length('foo bar baz asd', 12))
def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
'vcodec': 'avc1.77.30',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.2'), {
'vcodec': 'none',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.5,avc1.42001e'), {
'vcodec': 'avc1.42001e',
'acodec': 'mp4a.40.5',
})
self.assertEqual(parse_codecs('avc3.640028'), {
'vcodec': 'avc3.640028',
'acodec': 'none',
})
self.assertEqual(parse_codecs(', h264,,newcodec,aac'), {
'vcodec': 'h264',
'acodec': 'aac',
})
def test_escape_rfc3986(self): def test_escape_rfc3986(self):
reserved = "!*'();:@&=+$,/?#[]" reserved = "!*'();:@&=+$,/?#[]"
unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~' unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
@@ -985,5 +1032,13 @@ The first line
self.assertEqual(urshift(3, 1), 1) self.assertEqual(urshift(3, 1), 1)
self.assertEqual(urshift(-3, 1), 2147483646) self.assertEqual(urshift(-3, 1), 2147483646)
def test_get_element_by_class(self):
html = '''
<span class="foo bar">nice</span>
'''
self.assertEqual(get_element_by_class('foo', html), 'nice')
self.assertEqual(get_element_by_class('no-such-class', html), None)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@@ -0,0 +1,70 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
import unittest
import sys
import os
import subprocess
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
class TestVerboseOutput(unittest.TestCase):
def test_private_info_arg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username', 'johnsmith@gmail.com',
'--password', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue('--username' in serr)
self.assertTrue('johnsmith' not in serr)
self.assertTrue('--password' in serr)
self.assertTrue('secret' not in serr)
def test_private_info_shortarg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u', 'johnsmith@gmail.com',
'-p', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue('-u' in serr)
self.assertTrue('johnsmith' not in serr)
self.assertTrue('-p' in serr)
self.assertTrue('secret' not in serr)
def test_private_info_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username=johnsmith@gmail.com',
'--password=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue('--username' in serr)
self.assertTrue('johnsmith' not in serr)
self.assertTrue('--password' in serr)
self.assertTrue('secret' not in serr)
def test_private_info_shortarg_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u=johnsmith@gmail.com',
'-p=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue('-u' in serr)
self.assertTrue('johnsmith' not in serr)
self.assertTrue('-p' in serr)
self.assertTrue('secret' not in serr)
if __name__ == '__main__':
unittest.main()

View File

@@ -5,6 +5,7 @@ from __future__ import absolute_import, unicode_literals
import collections import collections
import contextlib import contextlib
import copy
import datetime import datetime
import errno import errno
import fileinput import fileinput
@@ -196,8 +197,8 @@ class YoutubeDL(object):
prefer_insecure: Use HTTP instead of HTTPS to retrieve information. prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
At the moment, this is only supported by YouTube. At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use proxy: URL of the proxy server to use
cn_verification_proxy: URL of the proxy to use for IP address verification geo_verification_proxy: URL of the proxy to use for IP address verification
on Chinese sites. (Experimental) on geo-restricted sites. (Experimental)
socket_timeout: Time to wait for unresponsive hosts, in seconds socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text bidi_workaround: Work around buggy terminals without bidirectional text
support, using fridibi support, using fridibi
@@ -248,7 +249,16 @@ class YoutubeDL(object):
source_address: (Experimental) Client-side IP address to bind to. source_address: (Experimental) Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the call_home: Boolean, true iff we are allowed to contact the
youtube-dl servers for debugging. youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download. sleep_interval: Number of seconds to sleep before each download when
used alone or a lower bound of a range for randomized
sleep before each download (minimum possible number
of seconds to sleep) when used along with
max_sleep_interval.
max_sleep_interval:Upper bound of a range for randomized sleep before each
download (maximum possible number of seconds to sleep).
Must only be used along with sleep_interval.
Actual sleep time will be a random float from range
[sleep_interval; max_sleep_interval].
listformats: Print an overview of available video formats and exit. listformats: Print an overview of available video formats and exit.
list_thumbnails: Print a table of all thumbnails and exit. list_thumbnails: Print a table of all thumbnails and exit.
match_filter: A function that gets called with the info_dict of match_filter: A function that gets called with the info_dict of
@@ -304,6 +314,11 @@ class YoutubeDL(object):
self.params.update(params) self.params.update(params)
self.cache = Cache(self) self.cache = Cache(self)
if self.params.get('cn_verification_proxy') is not None:
self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
if self.params.get('geo_verification_proxy') is None:
self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
if params.get('bidi_workaround', False): if params.get('bidi_workaround', False):
try: try:
import pty import pty
@@ -1046,9 +1061,9 @@ class YoutubeDL(object):
if isinstance(selector, list): if isinstance(selector, list):
fs = [_build_selector_function(s) for s in selector] fs = [_build_selector_function(s) for s in selector]
def selector_function(formats): def selector_function(ctx):
for f in fs: for f in fs:
for format in f(formats): for format in f(ctx):
yield format yield format
return selector_function return selector_function
elif selector.type == GROUP: elif selector.type == GROUP:
@@ -1056,17 +1071,17 @@ class YoutubeDL(object):
elif selector.type == PICKFIRST: elif selector.type == PICKFIRST:
fs = [_build_selector_function(s) for s in selector.selector] fs = [_build_selector_function(s) for s in selector.selector]
def selector_function(formats): def selector_function(ctx):
for f in fs: for f in fs:
picked_formats = list(f(formats)) picked_formats = list(f(ctx))
if picked_formats: if picked_formats:
return picked_formats return picked_formats
return [] return []
elif selector.type == SINGLE: elif selector.type == SINGLE:
format_spec = selector.selector format_spec = selector.selector
def selector_function(formats): def selector_function(ctx):
formats = list(formats) formats = list(ctx['formats'])
if not formats: if not formats:
return return
if format_spec == 'all': if format_spec == 'all':
@@ -1079,9 +1094,10 @@ class YoutubeDL(object):
if f.get('vcodec') != 'none' and f.get('acodec') != 'none'] if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
if audiovideo_formats: if audiovideo_formats:
yield audiovideo_formats[format_idx] yield audiovideo_formats[format_idx]
# for audio only (soundcloud) or video only (imgur) urls, select the best/worst audio format # for extractors with incomplete formats (audio only (soundcloud)
elif (all(f.get('acodec') != 'none' for f in formats) or # or video only (imgur)) we will fallback to best/worst
all(f.get('vcodec') != 'none' for f in formats)): # {video,audio}-only format
elif ctx['incomplete_formats']:
yield formats[format_idx] yield formats[format_idx]
elif format_spec == 'bestaudio': elif format_spec == 'bestaudio':
audio_formats = [ audio_formats = [
@@ -1155,17 +1171,18 @@ class YoutubeDL(object):
} }
video_selector, audio_selector = map(_build_selector_function, selector.selector) video_selector, audio_selector = map(_build_selector_function, selector.selector)
def selector_function(formats): def selector_function(ctx):
formats = list(formats) for pair in itertools.product(
for pair in itertools.product(video_selector(formats), audio_selector(formats)): video_selector(copy.deepcopy(ctx)), audio_selector(copy.deepcopy(ctx))):
yield _merge(pair) yield _merge(pair)
filters = [self._build_format_filter(f) for f in selector.filters] filters = [self._build_format_filter(f) for f in selector.filters]
def final_selector(formats): def final_selector(ctx):
ctx_copy = copy.deepcopy(ctx)
for _filter in filters: for _filter in filters:
formats = list(filter(_filter, formats)) ctx_copy['formats'] = list(filter(_filter, ctx_copy['formats']))
return selector_function(formats) return selector_function(ctx_copy)
return final_selector return final_selector
stream = io.BytesIO(format_spec.encode('utf-8')) stream = io.BytesIO(format_spec.encode('utf-8'))
@@ -1372,7 +1389,34 @@ class YoutubeDL(object):
req_format_list.append('best') req_format_list.append('best')
req_format = '/'.join(req_format_list) req_format = '/'.join(req_format_list)
format_selector = self.build_format_selector(req_format) format_selector = self.build_format_selector(req_format)
formats_to_download = list(format_selector(formats))
# While in format selection we may need to have an access to the original
# format set in order to calculate some metrics or do some processing.
# For now we need to be able to guess whether original formats provided
# by extractor are incomplete or not (i.e. whether extractor provides only
# video-only or audio-only formats) for proper formats selection for
# extractors with such incomplete formats (see
# https://github.com/rg3/youtube-dl/pull/5556).
# Since formats may be filtered during format selection and may not match
# the original formats the results may be incorrect. Thus original formats
# or pre-calculated metrics should be passed to format selection routines
# as well.
# We will pass a context object containing all necessary additional data
# instead of just formats.
# This fixes incorrect format selection issue (see
# https://github.com/rg3/youtube-dl/issues/10083).
incomplete_formats = (
# All formats are video-only or
all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats) or
# all formats are audio-only
all(f.get('vcodec') == 'none' and f.get('acodec') != 'none' for f in formats))
ctx = {
'formats': formats,
'incomplete_formats': incomplete_formats,
}
formats_to_download = list(format_selector(ctx))
if not formats_to_download: if not formats_to_download:
raise ExtractorError('requested format not available', raise ExtractorError('requested format not available',
expected=True) expected=True)
@@ -1559,7 +1603,9 @@ class YoutubeDL(object):
self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format)) self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
else: else:
self.to_screen('[info] Writing video subtitles to: ' + sub_filename) self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile: # Use newline='' to prevent conversion of newline characters
# See https://github.com/rg3/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_data) subfile.write(sub_data)
except (OSError, IOError): except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + sub_filename) self.report_error('Cannot write subtitles file ' + sub_filename)

View File

@@ -145,6 +145,16 @@ def _real_main(argv=None):
if numeric_limit is None: if numeric_limit is None:
parser.error('invalid max_filesize specified') parser.error('invalid max_filesize specified')
opts.max_filesize = numeric_limit opts.max_filesize = numeric_limit
if opts.sleep_interval is not None:
if opts.sleep_interval < 0:
parser.error('sleep interval must be positive or 0')
if opts.max_sleep_interval is not None:
if opts.max_sleep_interval < 0:
parser.error('max sleep interval must be positive or 0')
if opts.max_sleep_interval < opts.sleep_interval:
parser.error('max sleep interval must be greater than or equal to min sleep interval')
else:
opts.max_sleep_interval = opts.sleep_interval
def parse_retries(retries): def parse_retries(retries):
if retries in ('inf', 'infinite'): if retries in ('inf', 'infinite'):
@@ -370,6 +380,7 @@ def _real_main(argv=None):
'source_address': opts.source_address, 'source_address': opts.source_address,
'call_home': opts.call_home, 'call_home': opts.call_home,
'sleep_interval': opts.sleep_interval, 'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'external_downloader': opts.external_downloader, 'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails, 'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items, 'playlist_items': opts.playlist_items,
@@ -382,6 +393,8 @@ def _real_main(argv=None):
'external_downloader_args': external_downloader_args, 'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args, 'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy, 'cn_verification_proxy': opts.cn_verification_proxy,
'geo_verification_proxy': opts.geo_verification_proxy,
} }
with YoutubeDL(ydl_opts) as ydl: with YoutubeDL(ydl_opts) as ydl:

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import binascii import binascii
@@ -2594,15 +2595,19 @@ except ImportError: # Python < 3.3
return "'" + s.replace("'", "'\"'\"'") + "'" return "'" + s.replace("'", "'\"'\"'") + "'"
if sys.version_info >= (2, 7, 3): try:
args = shlex.split('中文')
assert (isinstance(args, list) and
isinstance(args[0], compat_str) and
args[0] == '中文')
compat_shlex_split = shlex.split compat_shlex_split = shlex.split
else: except (AssertionError, UnicodeEncodeError):
# Working around shlex issue with unicode strings on some python 2 # Working around shlex issue with unicode strings on some python 2
# versions (see http://bugs.python.org/issue1548891) # versions (see http://bugs.python.org/issue1548891)
def compat_shlex_split(s, comments=False, posix=True): def compat_shlex_split(s, comments=False, posix=True):
if isinstance(s, compat_str): if isinstance(s, compat_str):
s = s.encode('utf-8') s = s.encode('utf-8')
return shlex.split(s, comments, posix) return list(map(lambda s: s.decode('utf-8'), shlex.split(s, comments, posix)))
def compat_ord(c): def compat_ord(c):

View File

@@ -4,6 +4,7 @@ import os
import re import re
import sys import sys
import time import time
import random
from ..compat import compat_os_name from ..compat import compat_os_name
from ..utils import ( from ..utils import (
@@ -342,8 +343,11 @@ class FileDownloader(object):
}) })
return True return True
sleep_interval = self.params.get('sleep_interval') min_sleep_interval = self.params.get('sleep_interval')
if sleep_interval: if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
print(min_sleep_interval, max_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval) self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval) time.sleep(sleep_interval)

View File

@@ -83,6 +83,20 @@ class AdultSwimIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
# heroMetadata.trailer
'url': 'http://www.adultswim.com/videos/decker/inside-decker-a-new-hero/',
'info_dict': {
'id': 'I0LQFQkaSUaFp8PnAWHhoQ',
'ext': 'mp4',
'title': 'Decker - Inside Decker: A New Hero',
'description': 'md5:c916df071d425d62d70c86d4399d3ee0',
'duration': 249.008,
},
'params': {
# m3u8 download
'skip_download': True,
}
}] }]
@staticmethod @staticmethod
@@ -133,20 +147,26 @@ class AdultSwimIE(InfoExtractor):
if video_info is None: if video_info is None:
if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path: if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path:
video_info = bootstrapped_data['slugged_video'] video_info = bootstrapped_data['slugged_video']
else: if not video_info:
raise ExtractorError('Unable to find video info') video_info = bootstrapped_data.get('heroMetadata', {}).get('trailer').get('video')
if not video_info:
raise ExtractorError('Unable to find video info')
show = bootstrapped_data['show'] show = bootstrapped_data['show']
show_title = show['title'] show_title = show['title']
stream = video_info.get('stream') stream = video_info.get('stream')
clips = [stream] if stream else video_info.get('clips') if stream and stream.get('videoPlaybackID'):
if not clips: segment_ids = [stream['videoPlaybackID']]
elif video_info.get('clips'):
segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']]
elif video_info.get('videoPlaybackID'):
segment_ids = [video_info['videoPlaybackID']]
else:
raise ExtractorError( raise ExtractorError(
'This video is only available via cable service provider subscription that' 'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.' ' is not currently supported. You may want to use --cookies.'
if video_info.get('auth') is True else 'Unable to find stream or clips', if video_info.get('auth') is True else 'Unable to find stream or clips',
expected=True) expected=True)
segment_ids = [clip['videoPlaybackID'] for clip in clips]
episode_id = video_info['id'] episode_id = video_info['id']
episode_title = video_info['title'] episode_title = video_info['title']

View File

@@ -5,6 +5,8 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
mimetype2ext,
determine_ext,
) )
@@ -50,21 +52,25 @@ class AMPIE(InfoExtractor):
if isinstance(media_content, dict): if isinstance(media_content, dict):
media_content = [media_content] media_content = [media_content]
for media_data in media_content: for media_data in media_content:
media = media_data['@attributes'] media = media_data.get('@attributes', {})
media_type = media['type'] media_url = media.get('url')
if media_type in ('video/f4m', 'application/f4m+xml'): if not media_url:
continue
ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124', media_url + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
video_id, f4m_id='hds', fatal=False)) video_id, f4m_id='hds', fatal=False))
elif media_type == 'application/x-mpegURL': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False)) media_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else: else:
formats.append({ formats.append({
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'), 'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
'url': media['url'], 'url': media['url'],
'tbr': int_or_none(media.get('bitrate')), 'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')), 'filesize': int_or_none(media.get('fileSize')),
'ext': ext,
}) })
self._sort_formats(formats) self._sort_formats(formats)

View File

@@ -22,6 +22,7 @@ class AnimeOnDemandIE(InfoExtractor):
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply' _APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand' _NETRC_MACHINE = 'animeondemand'
_TESTS = [{ _TESTS = [{
# jap, OmU
'url': 'https://www.anime-on-demand.de/anime/161', 'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': { 'info_dict': {
'id': '161', 'id': '161',
@@ -30,17 +31,21 @@ class AnimeOnDemandIE(InfoExtractor):
}, },
'playlist_mincount': 4, 'playlist_mincount': 4,
}, { }, {
# Film wording is used instead of Episode # Film wording is used instead of Episode, ger/jap, Dub/OmU
'url': 'https://www.anime-on-demand.de/anime/39', 'url': 'https://www.anime-on-demand.de/anime/39',
'only_matching': True, 'only_matching': True,
}, { }, {
# Episodes without titles # Episodes without titles, jap, OmU
'url': 'https://www.anime-on-demand.de/anime/162', 'url': 'https://www.anime-on-demand.de/anime/162',
'only_matching': True, 'only_matching': True,
}, { }, {
# ger/jap, Dub/OmU, account required # ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/169', 'url': 'https://www.anime-on-demand.de/anime/169',
'only_matching': True, 'only_matching': True,
}, {
# Full length film, non-series, ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/185',
'only_matching': True,
}] }]
def _login(self): def _login(self):
@@ -110,35 +115,12 @@ class AnimeOnDemandIE(InfoExtractor):
entries = [] entries = []
for num, episode_html in enumerate(re.findall( def extract_info(html, video_id, num=None):
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1): title, description = [None] * 2
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
formats = [] formats = []
for input_ in re.findall( for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html): r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html):
attributes = extract_attributes(input_) attributes = extract_attributes(input_)
playlist_urls = [] playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'): for playlist_key in ('data-playlist', 'data-otherplaylist'):
@@ -161,7 +143,7 @@ class AnimeOnDemandIE(InfoExtractor):
format_id_list.append(lang) format_id_list.append(lang)
if kind: if kind:
format_id_list.append(kind) format_id_list.append(kind)
if not format_id_list: if not format_id_list and num is not None:
format_id_list.append(compat_str(num)) format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list) format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note))) format_note = ', '.join(filter(None, (kind, lang_note)))
@@ -215,28 +197,74 @@ class AnimeOnDemandIE(InfoExtractor):
}) })
formats.extend(file_formats) formats.extend(file_formats)
if formats: return {
self._sort_formats(formats) 'title': title,
'description': description,
'formats': formats,
}
def extract_entries(html, video_id, common_info, num=None):
info = extract_info(html, video_id, num)
if info['formats']:
self._sort_formats(info['formats'])
f = common_info.copy() f = common_info.copy()
f.update({ f.update(info)
'title': title,
'description': description,
'formats': formats,
})
entries.append(f) entries.append(f)
# Extract teaser only when full episode is not available # Extract teaser/trailer only when full episode is not available
if not formats: if not info['formats']:
m = re.search( m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<', r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<',
episode_html) html)
if m: if m:
f = common_info.copy() f = common_info.copy()
f.update({ f.update({
'id': '%s-teaser' % f['id'], 'id': '%s-%s' % (f['id'], m.group('kind').lower()),
'title': m.group('title'), 'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')), 'url': compat_urlparse.urljoin(url, m.group('href')),
}) })
entries.append(f) entries.append(f)
def extract_episodes(html):
for num, episode_html in enumerate(re.findall(
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', html), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
extract_entries(episode_html, video_id, common_info)
def extract_film(html, video_id):
common_info = {
'id': anime_id,
'title': anime_title,
'description': anime_description,
}
extract_entries(html, video_id, common_info)
extract_episodes(webpage)
if not entries:
extract_film(webpage, anime_id)
return self.playlist_result(entries, anime_id, anime_title, anime_description) return self.playlist_result(entries, anime_id, anime_title, anime_description)

View File

@@ -123,6 +123,10 @@ class AolFeaturesIE(InfoExtractor):
'title': 'What To Watch - February 17, 2016', 'title': 'What To Watch - February 17, 2016',
}, },
'add_ie': ['FiveMin'], 'add_ie': ['FiveMin'],
'params': {
# encrypted m3u8 download
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -1,8 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@@ -15,7 +13,7 @@ class AparatIE(InfoExtractor):
_TEST = { _TEST = {
'url': 'http://www.aparat.com/v/wP8On', 'url': 'http://www.aparat.com/v/wP8On',
'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1', 'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': { 'info_dict': {
'id': 'wP8On', 'id': 'wP8On',
'ext': 'mp4', 'ext': 'mp4',
@@ -31,13 +29,13 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at # Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id # http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work # but the URL in there does not work
embed_url = ('http://www.aparat.com/video/video/embed/videohash/' + embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
video_id + '/vt/frame')
webpage = self._download_webpage(embed_url, video_id) webpage = self._download_webpage(embed_url, video_id)
video_urls = [video_url.replace('\\/', '/') for video_url in re.findall( file_list = self._parse_json(self._search_regex(
r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)] r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
for i, video_url in enumerate(video_urls): for i, item in enumerate(file_list[0]):
video_url = item['file']
req = HEADRequest(video_url) req = HEADRequest(video_url)
res = self._request_webpage( res = self._request_webpage(
req, video_id, note='Testing video URL %d' % i, errnote=False) req, video_id, note='Testing video URL %d' % i, errnote=False)

View File

@@ -1,67 +1,65 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .jwplatform import JWPlatformBaseIE
from ..utils import unified_strdate from ..utils import (
unified_strdate,
clean_html,
)
class ArchiveOrgIE(InfoExtractor): class ArchiveOrgIE(JWPlatformBaseIE):
IE_NAME = 'archive.org' IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos' IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/details/(?P<id>[^?/]+)(?:[?].*)?$' _VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
_TESTS = [{ _TESTS = [{
'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect', 'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'md5': '8af1d4cf447933ed3c7f4871162602db', 'md5': '8af1d4cf447933ed3c7f4871162602db',
'info_dict': { 'info_dict': {
'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect', 'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'ext': 'ogv', 'ext': 'ogg',
'title': '1968 Demo - FJCC Conference Presentation Reel #1', 'title': '1968 Demo - FJCC Conference Presentation Reel #1',
'description': 'md5:1780b464abaca9991d8968c877bb53ed', 'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
'upload_date': '19681210', 'upload_date': '19681210',
'uploader': 'SRI International' 'uploader': 'SRI International'
} }
}, { }, {
'url': 'https://archive.org/details/Cops1922', 'url': 'https://archive.org/details/Cops1922',
'md5': '18f2a19e6d89af8425671da1cf3d4e04', 'md5': 'bc73c8ab3838b5a8fc6c6651fa7b58ba',
'info_dict': { 'info_dict': {
'id': 'Cops1922', 'id': 'Cops1922',
'ext': 'ogv', 'ext': 'mp4',
'title': 'Buster Keaton\'s "Cops" (1922)', 'title': 'Buster Keaton\'s "Cops" (1922)',
'description': 'md5:70f72ee70882f713d4578725461ffcc3', 'description': 'md5:b4544662605877edd99df22f9620d858',
} }
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\);",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
json_url = url + ('&' if '?' in url else '?') + 'output=json' def get_optional(metadata, field):
data = self._download_json(json_url, video_id) return metadata.get(field, [None])[0]
def get_optional(data_dict, field): metadata = self._download_json(
return data_dict['metadata'].get(field, [None])[0] 'http://archive.org/details/' + video_id, video_id, query={
'output': 'json',
title = get_optional(data, 'title') })['metadata']
description = get_optional(data, 'description') info.update({
uploader = get_optional(data, 'creator') 'title': get_optional(metadata, 'title') or info.get('title'),
upload_date = unified_strdate(get_optional(data, 'date')) 'description': clean_html(get_optional(metadata, 'description')),
})
formats = [ if info.get('_type') != 'playlist':
{ info.update({
'format': fdata['format'], 'uploader': get_optional(metadata, 'creator'),
'url': 'http://' + data['server'] + data['dir'] + fn, 'upload_date': unified_strdate(get_optional(metadata, 'date')),
'file_size': int(fdata['size']), })
} return info
for fn, fdata in data['files'].items()
if 'Video' in fdata['format']]
self._sort_formats(formats)
return {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'uploader': uploader,
'upload_date': upload_date,
'thumbnail': data.get('misc', {}).get('image'),
}

View File

@@ -13,13 +13,14 @@ from ..utils import (
parse_duration, parse_duration,
unified_strdate, unified_strdate,
xpath_text, xpath_text,
update_url_query,
) )
from ..compat import compat_etree_fromstring from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor): class ARDMediathekIE(InfoExtractor):
IE_NAME = 'ARD:mediathek' IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?' _VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114', 'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
@@ -34,6 +35,7 @@ class ARDMediathekIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'HTTP Error 404: Not Found',
}, { }, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916', 'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e', 'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
@@ -44,6 +46,7 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1', 'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252, 'duration': 5252,
}, },
'skip': 'HTTP Error 404: Not Found',
}, { }, {
# audio # audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086', 'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@@ -55,9 +58,22 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef', 'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240, 'duration': 3240,
}, },
'skip': 'HTTP Error 404: Not Found',
}, { }, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht', 'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True, 'only_matching': True,
}, {
# audio
'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
'md5': '4e8f00631aac0395fee17368ac0e9867',
'info_dict': {
'id': '30796318',
'ext': 'mp3',
'title': 'Vor dem Fest',
'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
'duration': 3287,
},
'skip': 'Video is no longer available',
}] }]
def _extract_media_info(self, media_info_url, webpage, video_id): def _extract_media_info(self, media_info_url, webpage, video_id):
@@ -113,11 +129,14 @@ class ARDMediathekIE(InfoExtractor):
continue continue
if ext == 'f4m': if ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', update_url_query(stream_url, {
video_id, preference=-1, f4m_id='hds', fatal=False)) 'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}),
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False)) stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else: else:
if server and server.startswith('rtmp'): if server and server.startswith('rtmp'):
f = { f = {
@@ -231,7 +250,8 @@ class ARDIE(InfoExtractor):
'title': 'Die Story im Ersten: Mission unter falscher Flagge', 'title': 'Die Story im Ersten: Mission unter falscher Flagge',
'upload_date': '20140804', 'upload_date': '20140804',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
} },
'skip': 'HTTP Error 404: Not Found',
} }
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
mimetype2ext,
parse_iso8601,
strip_jsonp,
)
class ArkenaIE(InfoExtractor):
_VALID_URL = r'https?://play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)'
_TESTS = [{
'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
'info_dict': {
'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe',
'ext': 'mp4',
'title': 'Big Buck Bunny',
'description': 'Royalty free test video',
'timestamp': 1432816365,
'upload_date': '20150528',
'is_live': False,
},
}, {
'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893',
'only_matching': True,
}, {
'url': 'http://play.arkena.com/config/avp/v1/player/media/327336/darkmatter/131064/?callbackMethod=jQuery1111002221189684892677_1469227595972',
'only_matching': True,
}, {
'url': 'http://play.arkena.com/embed/avp/v1/player/media/327336/darkmatter/131064/',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
# See https://support.arkena.com/display/PLAY/Ways+to+embed+your+video
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//play\.arkena\.com/embed/avp/.+?)\1',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
account_id = mobj.group('account_id')
playlist = self._download_json(
'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_'
% (video_id, account_id),
video_id, transform_source=strip_jsonp)['Playlist'][0]
media_info = playlist['MediaInfo']
title = media_info['Title']
media_files = playlist['MediaFiles']
is_live = False
formats = []
for kind_case, kind_formats in media_files.items():
kind = kind_case.lower()
for f in kind_formats:
f_url = f.get('Url')
if not f_url:
continue
is_live = f.get('Live') == 'true'
exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None))
if kind == 'm3u8' or 'm3u8' in exts:
formats.extend(self._extract_m3u8_formats(
f_url, video_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id=kind, fatal=False, live=is_live))
elif kind == 'flash' or 'f4m' in exts:
formats.extend(self._extract_f4m_formats(
f_url, video_id, f4m_id=kind, fatal=False))
elif kind == 'dash' or 'mpd' in exts:
formats.extend(self._extract_mpd_formats(
f_url, video_id, mpd_id=kind, fatal=False))
elif kind == 'silverlight':
# TODO: process when ism is supported (see
# https://github.com/rg3/youtube-dl/issues/8118)
continue
else:
tbr = float_or_none(f.get('Bitrate'), 1000)
formats.append({
'url': f_url,
'format_id': '%s-%d' % (kind, tbr) if tbr else kind,
'tbr': tbr,
})
self._sort_formats(formats)
description = media_info.get('Description')
video_id = media_info.get('VideoId') or video_id
timestamp = parse_iso8601(media_info.get('PublishDate'))
thumbnails = [{
'url': thumbnail['Url'],
'width': int_or_none(thumbnail.get('Size')),
} for thumbnail in (media_info.get('Poster') or []) if thumbnail.get('Url')]
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'is_live': is_live,
'thumbnails': thumbnails,
'formats': formats,
}

View File

@@ -5,11 +5,13 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
dict_get,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
try_get,
unescapeHTML, unescapeHTML,
) )
from ..compat import ( from ..compat import (
@@ -229,51 +231,6 @@ class BBCCoUkIE(InfoExtractor):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist') asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')] return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
transfer_format = connection.get('transferFormat')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Skip DASH until supported
elif transfer_format == 'dash':
pass
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=supplier, fatal=False))
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier or kind or protocol,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist): def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS) return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
@@ -294,46 +251,6 @@ class BBCCoUkIE(InfoExtractor):
def _extract_connections(self, media): def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection') return self._findall_ns(media, './{%s}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int_or_none(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
if service:
format['format_id'] = '%s_%s' % (service, format['format_id'])
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int_or_none(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id): def _get_subtitles(self, media, programme_id):
subtitles = {} subtitles = {}
for connection in self._extract_connections(media): for connection in self._extract_connections(media):
@@ -379,13 +296,87 @@ class BBCCoUkIE(InfoExtractor):
def _process_media_selector(self, media_selection, programme_id): def _process_media_selector(self, media_selection, programme_id):
formats = [] formats = []
subtitles = None subtitles = None
urls = []
for media in self._extract_medias(media_selection): for media in self._extract_medias(media_selection):
kind = media.get('kind') kind = media.get('kind')
if kind == 'audio': if kind in ('video', 'audio'):
formats.extend(self._extract_audio(media, programme_id)) bitrate = int_or_none(media.get('bitrate'))
elif kind == 'video': encoding = media.get('encoding')
formats.extend(self._extract_video(media, programme_id)) service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
href = connection.get('href')
if href in urls:
continue
if href:
urls.append(href)
conn_kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, format_id),
})
elif transfer_format == 'dash':
formats.extend(self._extract_mpd_formats(
href, programme_id, mpd_id=format_id, fatal=False))
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
else:
if not service and not supplier and bitrate:
format_id += '-%d' % bitrate
fmt = {
'format_id': format_id,
'filesize': file_size,
}
if kind == 'video':
fmt.update({
'width': width,
'height': height,
'vbr': bitrate,
'vcodec': encoding,
})
else:
fmt.update({
'abr': bitrate,
'acodec': encoding,
'vcodec': 'none',
})
if protocol == 'http':
# Direct link
fmt.update({
'url': href,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
fmt.update({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
})
formats.append(fmt)
elif kind == 'captions': elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id) subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles return formats, subtitles
@@ -590,6 +581,7 @@ class BBCIE(BBCCoUkIE):
'id': '150615_telabyad_kentin_cogu', 'id': '150615_telabyad_kentin_cogu',
'ext': 'mp4', 'ext': 'mp4',
'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde", 'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
'timestamp': 1434397334, 'timestamp': 1434397334,
'upload_date': '20150615', 'upload_date': '20150615',
}, },
@@ -603,6 +595,7 @@ class BBCIE(BBCCoUkIE):
'id': '150619_video_honduras_militares_hospitales_corrupcion_aw', 'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción', 'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
'description': 'md5:1525f17448c4ee262b64b8f0c9ce66c8',
'timestamp': 1434713142, 'timestamp': 1434713142,
'upload_date': '20150619', 'upload_date': '20150619',
}, },
@@ -652,6 +645,23 @@ class BBCIE(BBCCoUkIE):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
} }
}, {
# single video embedded with Morph
'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
'info_dict': {
'id': 'p041vhd0',
'ext': 'mp4',
'title': "Nigeria v Japan - Men's First Round",
'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
'duration': 7980,
'uploader': 'BBC Sport',
'uploader_id': 'bbc_sport',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to UK',
}, { }, {
# single video with playlist.sxml URL in playlist param # single video with playlist.sxml URL in playlist param
'url': 'http://www.bbc.com/sport/0/football/33653409', 'url': 'http://www.bbc.com/sport/0/football/33653409',
@@ -749,7 +759,7 @@ class BBCIE(BBCCoUkIE):
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
json_ld_info = self._search_json_ld(webpage, playlist_id, default=None) json_ld_info = self._search_json_ld(webpage, playlist_id, default={})
timestamp = json_ld_info.get('timestamp') timestamp = json_ld_info.get('timestamp')
playlist_title = json_ld_info.get('title') playlist_title = json_ld_info.get('title')
@@ -818,8 +828,29 @@ class BBCIE(BBCCoUkIE):
# http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani) # http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
playlist = data_playable.get('otherSettings', {}).get('playlist', {}) playlist = data_playable.get('otherSettings', {}).get('playlist', {})
if playlist: if playlist:
entries.append(self._extract_from_playlist_sxml( entry = None
playlist.get('progressiveDownloadUrl'), playlist_id, timestamp)) for key in ('streaming', 'progressiveDownload'):
playlist_url = playlist.get('%sUrl' % key)
if not playlist_url:
continue
try:
info = self._extract_from_playlist_sxml(
playlist_url, playlist_id, timestamp)
if not entry:
entry = info
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
continue
raise
if entry:
self._sort_formats(entry['formats'])
entries.append(entry)
if entries: if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@@ -852,6 +883,50 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles, 'subtitles': subtitles,
} }
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# There are several setPayload calls may be present but the video
# seems to be always related to the first one
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
webpage, 'morph payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
for component in components:
if not isinstance(component, dict):
continue
lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
if not lead_media:
continue
identifiers = lead_media.get('identifiers')
if not identifiers or not isinstance(identifiers, dict):
continue
programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
if not programme_id:
continue
title = lead_media.get('title') or self._og_search_title(webpage)
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
description = lead_media.get('summary')
uploader = lead_media.get('masterBrand')
uploader_id = lead_media.get('mid')
duration = None
duration_d = lead_media.get('duration')
if isinstance(duration_d, dict):
duration = parse_duration(dict_get(
duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
'subtitles': subtitles,
}
def extract_all(pattern): def extract_all(pattern):
return list(filter(None, map( return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False), lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -869,7 +944,7 @@ class BBCIE(BBCCoUkIE):
r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage)) r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
if entries: if entries:
return self.playlist_result( return self.playlist_result(
[self.url_result(entry, 'BBCCoUk') for entry in entries], [self.url_result(entry_, 'BBCCoUk') for entry_ in entries],
playlist_id, playlist_title, playlist_description) playlist_id, playlist_title, playlist_description)
# Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511) # Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
@@ -998,10 +1073,10 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE): class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist' IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/episodes/(?P<id>%s)' % BBCCoUkIE._ID_REGEX _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s' _URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)' _VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
_TEST = { _TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v', 'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': { 'info_dict': {
'id': 'b05rcz9v', 'id': 'b05rcz9v',
@@ -1009,7 +1084,17 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'French thriller serial about a missing teenager.', 'description': 'French thriller serial about a missing teenager.',
}, },
'playlist_mincount': 6, 'playlist_mincount': 6,
} 'skip': 'This programme is not currently available on BBC iPlayer',
}, {
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
'id': 'p02tcc32',
'title': 'Bohemian Icons',
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}]
def _extract_title_and_description(self, webpage): def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False) title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)

View File

@@ -12,7 +12,7 @@ class BigflixIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537', 'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537',
'md5': 'ec76aa9b1129e2e5b301a474e54fab74', 'md5': 'dc1b4aebb46e3a7077ecc0d9f43f61e3',
'info_dict': { 'info_dict': {
'id': '16537', 'id': '16537',
'ext': 'mp4', 'ext': 'mp4',
@@ -26,7 +26,7 @@ class BigflixIE(InfoExtractor):
'id': '16070', 'id': '16070',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Madarasapatinam', 'title': 'Madarasapatinam',
'description': 'md5:63b9b8ed79189c6f0418c26d9a3452ca', 'description': 'md5:9f0470b26a4ba8e824c823b5d95c2f6b',
'formats': 'mincount:2', 'formats': 'mincount:2',
}, },
'params': { 'params': {

View File

@@ -25,13 +25,13 @@ class BiliBiliIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/', 'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '5f7d29e1a2872f3df0cf76b1f87d3788', 'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': { 'info_dict': {
'id': '1554319', 'id': '1554319',
'ext': 'flv', 'ext': 'mp4',
'title': '【金坷垃】金泡沫', 'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923', 'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.067, 'duration': 308.315,
'timestamp': 1398012660, 'timestamp': 1398012660,
'upload_date': '20140420', 'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg', 'thumbnail': 're:^https?://.+\.jpg',
@@ -41,73 +41,33 @@ class BiliBiliIE(InfoExtractor):
}, { }, {
'url': 'http://www.bilibili.com/video/av1041170/', 'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': { 'info_dict': {
'id': '1041170', 'id': '1507019',
'ext': 'mp4',
'title': '【BD1080P】刀语【诸神&异域】', 'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~', 'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'timestamp': 1396530060,
'upload_date': '20140403',
'uploader': '枫叶逝去',
'uploader_id': '520116',
}, },
'playlist_count': 9,
}, { }, {
'url': 'http://www.bilibili.com/video/av4808130/', 'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': { 'info_dict': {
'id': '4808130', 'id': '7802182',
'ext': 'mp4',
'title': '【长篇】哆啦A梦443【钉铛】', 'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929', 'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
}, },
'playlist': [{
'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
'info_dict': {
'id': '4808130_part1',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '926f9f67d0c482091872fbd8eca7ea3d',
'info_dict': {
'id': '4808130_part2',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '4b7b225b968402d7c32348c646f1fd83',
'info_dict': {
'id': '4808130_part3',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '7b795e214166501e9141139eea236e91',
'info_dict': {
'id': '4808130_part4',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}],
}, { }, {
# Missing upload time # Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/', 'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': { 'info_dict': {
'id': '2880301', 'id': '2880301',
'ext': 'flv', 'ext': 'mp4',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】', 'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】', 'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'uploader': '黑夜为猫', 'uploader': '黑夜为猫',

View File

@@ -2,11 +2,15 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import remove_end from ..utils import (
ExtractorError,
remove_end,
)
from .rudo import RudoIE
class BioBioChileTVIE(InfoExtractor): class BioBioChileTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml' _VALID_URL = r'https?://(?:tv|www)\.biobiochile\.cl/(?:notas|noticias)/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_TESTS = [{ _TESTS = [{
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml', 'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
@@ -18,6 +22,7 @@ class BioBioChileTVIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Fernando Atria', 'uploader': 'Fernando Atria',
}, },
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, { }, {
# different uploader layout # different uploader layout
'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml', 'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
@@ -32,6 +37,16 @@ class BioBioChileTVIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, {
'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
'info_dict': {
'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
'ext': 'mp4',
'uploader': '(none)',
'upload_date': '20160708',
'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
},
}, { }, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml', 'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
'only_matching': True, 'only_matching': True,
@@ -45,42 +60,22 @@ class BioBioChileTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
rudo_url = RudoIE._extract_url(webpage)
if not rudo_url:
raise ExtractorError('No videos found')
title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV') title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
file_url = self._search_regex(
r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
webpage, 'file url', group='url')
base_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
group='url')
formats = self._extract_m3u8_formats(
'%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
f = {
'url': '%s%s' % (base_url, file_url),
'format_id': 'http',
'protocol': 'http',
'preference': 1,
}
if formats:
f_copy = formats[-1].copy()
f_copy.update(f)
f = f_copy
formats.append(f)
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>', r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
return { return {
'_type': 'url_transparent',
'url': rudo_url,
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': uploader, 'uploader': uploader,
'formats': formats,
} }

View File

@@ -24,7 +24,8 @@ class BIQLEIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ребенок в шоке от автоматической мойки', 'title': 'Ребенок в шоке от автоматической мойки',
'uploader': 'Dmitry Kotov', 'uploader': 'Dmitry Kotov',
} },
'skip': ' This video was marked as adult. Embedding adult videos on external sites is prohibited.',
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@@ -20,6 +21,18 @@ class BloombergIE(InfoExtractor):
'params': { 'params': {
'format': 'best[format_id^=hds]', 'format': 'best[format_id^=hds]',
}, },
}, {
# video ID in BPlayer(...)
'url': 'http://www.bloomberg.com/features/2016-hello-world-new-zealand/',
'info_dict': {
'id': '938c7e72-3f25-4ddb-8b85-a9be731baa74',
'ext': 'flv',
'title': 'Meet the Real-Life Tech Wizards of Middle Earth',
'description': 'Hello World, Episode 1: New Zealands freaky AI babies, robot exoskeletons, and a virtual you.',
},
'params': {
'format': 'best[format_id^=hds]',
},
}, { }, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets', 'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True, 'only_matching': True,
@@ -33,7 +46,11 @@ class BloombergIE(InfoExtractor):
webpage = self._download_webpage(url, name) webpage = self._download_webpage(url, name)
video_id = self._search_regex( video_id = self._search_regex(
r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1', r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
webpage, 'id', group='url') webpage, 'id', group='url', default=None)
if not video_id:
bplayer_data = self._parse_json(self._search_regex(
r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name)
video_id = bplayer_data['id']
title = re.sub(': Video$', '', self._og_search_title(webpage)) title = re.sub(': Video$', '', self._og_search_title(webpage))
embed_info = self._download_json( embed_info = self._download_json(

View File

@@ -26,6 +26,8 @@ from ..utils import (
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
update_url_query, update_url_query,
clean_html,
mimetype2ext,
) )
@@ -90,6 +92,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'description': 'md5:363109c02998fee92ec02211bd8000df', 'description': 'md5:363109c02998fee92ec02211bd8000df',
'uploader': 'National Ballet of Canada', 'uploader': 'National Ballet of Canada',
}, },
'skip': 'Video gone',
}, },
{ {
# test flv videos served by akamaihd.net # test flv videos served by akamaihd.net
@@ -108,7 +111,7 @@ class BrightcoveLegacyIE(InfoExtractor):
}, },
}, },
{ {
# playlist test # playlist with 'videoList'
# from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players # from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL', 'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL',
'info_dict': { 'info_dict': {
@@ -117,6 +120,15 @@ class BrightcoveLegacyIE(InfoExtractor):
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
}, },
{
# playlist with 'playlistTab' (https://github.com/rg3/youtube-dl/issues/9965)
'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
'info_dict': {
'id': '1522758701001',
'title': 'Lesson 08',
},
'playlist_mincount': 10,
},
] ]
FLV_VCODECS = { FLV_VCODECS = {
1: 'SORENSON', 1: 'SORENSON',
@@ -298,13 +310,19 @@ class BrightcoveLegacyIE(InfoExtractor):
info_url, player_key, 'Downloading playlist information') info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info) json_data = json.loads(playlist_info)
if 'videoList' not in json_data: if 'videoList' in json_data:
playlist_info = json_data['videoList']
playlist_dto = playlist_info['mediaCollectionDTO']
elif 'playlistTabs' in json_data:
playlist_info = json_data['playlistTabs']
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
else:
raise ExtractorError('Empty playlist') raise ExtractorError('Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']] videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'], return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
playlist_title=playlist_info['mediaCollectionDTO']['displayName']) playlist_title=playlist_dto['displayName'])
def _extract_video_info(self, video_info): def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id']) video_id = compat_str(video_info['id'])
@@ -528,14 +546,16 @@ class BrightcoveNewIE(InfoExtractor):
formats = [] formats = []
for source in json_data.get('sources', []): for source in json_data.get('sources', []):
container = source.get('container') container = source.get('container')
source_type = source.get('type') ext = mimetype2ext(source.get('type'))
src = source.get('src') src = source.get('src')
if source_type == 'application/x-mpegURL' or container == 'M2TS': if ext == 'ism':
continue
elif ext == 'm3u8' or container == 'M2TS':
if not src: if not src:
continue continue
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)) src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
elif source_type == 'application/dash+xml': elif ext == 'mpd':
if not src: if not src:
continue continue
formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False)) formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False))
@@ -551,7 +571,7 @@ class BrightcoveNewIE(InfoExtractor):
'tbr': tbr, 'tbr': tbr,
'filesize': int_or_none(source.get('size')), 'filesize': int_or_none(source.get('size')),
'container': container, 'container': container,
'ext': container.lower(), 'ext': ext or container.lower(),
} }
if width == 0 and height == 0: if width == 0 and height == 0:
f.update({ f.update({
@@ -585,6 +605,13 @@ class BrightcoveNewIE(InfoExtractor):
'format_id': build_format_id('rtmp'), 'format_id': build_format_id('rtmp'),
}) })
formats.append(f) formats.append(f)
errors = json_data.get('errors')
if not formats and errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}
@@ -597,7 +624,7 @@ class BrightcoveNewIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': json_data.get('description'), 'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'), 'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000), 'duration': float_or_none(json_data.get('duration'), 1000),
'timestamp': parse_iso8601(json_data.get('published_at')), 'timestamp': parse_iso8601(json_data.get('published_at')),

View File

@@ -1,7 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import datetime
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@@ -10,8 +9,10 @@ from ..compat import (
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
parse_iso8601, clean_html,
parse_duration,
str_to_int, str_to_int,
unified_strdate,
) )
@@ -26,14 +27,14 @@ class CamdemyIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ch1-1 Introduction, Signals (02-23-2012)', 'title': 'Ch1-1 Introduction, Signals (02-23-2012)',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'description': '',
'creator': 'ss11spring', 'creator': 'ss11spring',
'duration': 1591,
'upload_date': '20130114', 'upload_date': '20130114',
'timestamp': 1358154556,
'view_count': int, 'view_count': int,
} }
}, { }, {
# With non-empty description # With non-empty description
# webpage returns "No permission or not login"
'url': 'http://www.camdemy.com/media/13885', 'url': 'http://www.camdemy.com/media/13885',
'md5': '4576a3bb2581f86c61044822adbd1249', 'md5': '4576a3bb2581f86c61044822adbd1249',
'info_dict': { 'info_dict': {
@@ -41,64 +42,71 @@ class CamdemyIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'EverCam + Camdemy QuickStart', 'title': 'EverCam + Camdemy QuickStart',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:050b62f71ed62928f8a35f1a41e186c9', 'description': 'md5:2a9f989c2b153a2342acee579c6e7db6',
'creator': 'evercam', 'creator': 'evercam',
'upload_date': '20140620', 'duration': 318,
'timestamp': 1403271569,
} }
}, { }, {
# External source # External source (YouTube)
'url': 'http://www.camdemy.com/media/14842', 'url': 'http://www.camdemy.com/media/14842',
'md5': '50e1c3c3aa233d3d7b7daa2fa10b1cf7',
'info_dict': { 'info_dict': {
'id': '2vsYQzNIsJo', 'id': '2vsYQzNIsJo',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Excel 2013 Tutorial - How to add Password Protection',
'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
'upload_date': '20130211', 'upload_date': '20130211',
'uploader': 'Hun Kim', 'uploader': 'Hun Kim',
'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
'uploader_id': 'hunkimtutorials', 'uploader_id': 'hunkimtutorials',
'title': 'Excel 2013 Tutorial - How to add Password Protection', },
} 'params': {
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
page = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, video_id)
src_from = self._html_search_regex( src_from = self._html_search_regex(
r"<div class='srcFrom'>Source: <a title='([^']+)'", page, r"class=['\"]srcFrom['\"][^>]*>Sources?(?:\s+from)?\s*:\s*<a[^>]+(?:href|title)=(['\"])(?P<url>(?:(?!\1).)+)\1",
'external source', default=None) webpage, 'external source', default=None, group='url')
if src_from: if src_from:
return self.url_result(src_from) return self.url_result(src_from)
oembed_obj = self._download_json( oembed_obj = self._download_json(
'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id) 'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id)
title = oembed_obj['title']
thumb_url = oembed_obj['thumbnail_url'] thumb_url = oembed_obj['thumbnail_url']
video_folder = compat_urlparse.urljoin(thumb_url, 'video/') video_folder = compat_urlparse.urljoin(thumb_url, 'video/')
file_list_doc = self._download_xml( file_list_doc = self._download_xml(
compat_urlparse.urljoin(video_folder, 'fileList.xml'), compat_urlparse.urljoin(video_folder, 'fileList.xml'),
video_id, 'Filelist XML') video_id, 'Downloading filelist XML')
file_name = file_list_doc.find('./video/item/fileName').text file_name = file_list_doc.find('./video/item/fileName').text
video_url = compat_urlparse.urljoin(video_folder, file_name) video_url = compat_urlparse.urljoin(video_folder, file_name)
timestamp = parse_iso8601(self._html_search_regex( # Some URLs return "No permission or not login" in a webpage despite being
r"<div class='title'>Posted\s*:</div>\s*<div class='value'>([^<>]+)<", # freely available via oembed JSON URL (e.g. http://www.camdemy.com/media/13885)
page, 'creation time', fatal=False), upload_date = unified_strdate(self._search_regex(
delimiter=' ', timezone=datetime.timedelta(hours=8)) r'>published on ([^<]+)<', webpage,
view_count = str_to_int(self._html_search_regex( 'upload date', default=None))
r"<div class='title'>Views\s*:</div>\s*<div class='value'>([^<>]+)<", view_count = str_to_int(self._search_regex(
page, 'view count', fatal=False)) r'role=["\']viewCnt["\'][^>]*>([\d,.]+) views',
webpage, 'view count', default=None))
description = self._html_search_meta(
'description', webpage, default=None) or clean_html(
oembed_obj.get('description'))
return { return {
'id': video_id, 'id': video_id,
'url': video_url, 'url': video_url,
'title': oembed_obj['title'], 'title': title,
'thumbnail': thumb_url, 'thumbnail': thumb_url,
'description': self._html_search_meta('description', page), 'description': description,
'creator': oembed_obj['author_name'], 'creator': oembed_obj.get('author_name'),
'duration': oembed_obj['duration'], 'duration': parse_duration(oembed_obj.get('duration')),
'timestamp': timestamp, 'upload_date': upload_date,
'view_count': view_count, 'view_count': view_count,
} }

View File

@@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
js_to_json, js_to_json,
smuggle_url, smuggle_url,
try_get,
) )
@@ -25,8 +27,22 @@ class CBCIE(InfoExtractor):
'upload_date': '20160203', 'upload_date': '20160203',
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
'skip': 'Geo-restricted to Canada',
}, { }, {
# with clipId # with clipId, feed available via tpfeed.cbc.ca and feed.theplatform.com
'url': 'http://www.cbc.ca/22minutes/videos/22-minutes-update/22-minutes-update-episode-4',
'md5': '162adfa070274b144f4fdc3c3b8207db',
'info_dict': {
'id': '2414435309',
'ext': 'mp4',
'title': '22 Minutes Update: What Not To Wear Quebec',
'description': "This week's latest Canadian top political story is What Not To Wear Quebec.",
'upload_date': '20131025',
'uploader': 'CBCC-NEW',
'timestamp': 1382717907,
},
}, {
# with clipId, feed only available via tpfeed.cbc.ca
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live', 'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'md5': '0274a90b51a9b4971fe005c63f592f12', 'md5': '0274a90b51a9b4971fe005c63f592f12',
'info_dict': { 'info_dict': {
@@ -64,6 +80,7 @@ class CBCIE(InfoExtractor):
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
}], }],
'skip': 'Geo-restricted to Canada',
}] }]
@classmethod @classmethod
@@ -81,9 +98,15 @@ class CBCIE(InfoExtractor):
media_id = player_info.get('mediaId') media_id = player_info.get('mediaId')
if not media_id: if not media_id:
clip_id = player_info['clipId'] clip_id = player_info['clipId']
media_id = self._download_json( feed = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id, 'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
clip_id)['entries'][0]['id'].split('/')[-1] clip_id, fatal=False)
if feed:
media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
if not media_id:
media_id = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id)['entries'][0]['id'].split('/')[-1]
return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
else: else:
entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)] entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
@@ -104,6 +127,7 @@ class CBCPlayerIE(InfoExtractor):
'upload_date': '20160210', 'upload_date': '20160210',
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
'skip': 'Geo-restricted to Canada',
}, { }, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/ # Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'http://www.cbc.ca/player/play/2657631896', 'url': 'http://www.cbc.ca/player/play/2657631896',

View File

@@ -1,12 +1,10 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import calendar
import datetime
from .anvato import AnvatoIE from .anvato import AnvatoIE
from .sendtonews import SendtoNewsIE from .sendtonews import SendtoNewsIE
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import unified_timestamp
class CBSLocalIE(AnvatoIE): class CBSLocalIE(AnvatoIE):
@@ -71,10 +69,7 @@ class CBSLocalIE(AnvatoIE):
time_str = self._html_search_regex( time_str = self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False) r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
timestamp = None timestamp = unified_timestamp(time_str)
if time_str:
timestamp = calendar.timegm(datetime.datetime.strptime(
time_str, '%b %d, %Y %I:%M %p').timetuple())
info_dict.update({ info_dict.update({
'display_id': display_id, 'display_id': display_id,

View File

@@ -26,6 +26,7 @@ class CBSNewsIE(CBSBaseIE):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Subscribers only',
}, },
{ {
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/', 'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
@@ -69,7 +70,7 @@ class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos' IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TEST = { _TESTS = [{
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/', 'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': { 'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh', 'id': 'clinton-sanders-prepare-to-face-off-in-nh',
@@ -77,7 +78,15 @@ class CBSNewsLiveVideoIE(InfoExtractor):
'title': 'Clinton, Sanders Prepare To Face Off In NH', 'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334, 'duration': 334,
}, },
} 'skip': 'Video gone, redirected to http://www.cbsnews.com/live/',
}, {
'url': 'http://www.cbsnews.com/live/video/video-shows-intense-paragliding-accident/',
'info_dict': {
'id': 'video-shows-intense-paragliding-accident',
'ext': 'flv',
'title': 'Video Shows Intense Paragliding Accident',
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)

View File

@@ -17,7 +17,8 @@ class ChaturbateIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
'skip': 'Room is offline',
}, { }, {
'url': 'https://en.chaturbate.com/siswet19/', 'url': 'https://en.chaturbate.com/siswet19/',
'only_matching': True, 'only_matching': True,

View File

@@ -1,30 +1,33 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import parse_duration
parse_duration,
int_or_none,
)
class ChirbitIE(InfoExtractor): class ChirbitIE(InfoExtractor):
IE_NAME = 'chirbit' IE_NAME = 'chirbit'
_VALID_URL = r'https?://(?:www\.)?chirb\.it/(?:(?:wp|pl)/|fb_chirbit_player\.swf\?key=)?(?P<id>[\da-zA-Z]+)' _VALID_URL = r'https?://(?:www\.)?chirb\.it/(?:(?:wp|pl)/|fb_chirbit_player\.swf\?key=)?(?P<id>[\da-zA-Z]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://chirb.it/PrIPv5', 'url': 'http://chirb.it/be2abG',
'md5': '9847b0dad6ac3e074568bf2cfb197de8',
'info_dict': { 'info_dict': {
'id': 'PrIPv5', 'id': 'be2abG',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Фасадстрой', 'title': 'md5:f542ea253f5255240be4da375c6a5d7e',
'duration': 52, 'description': 'md5:f24a4e22a71763e32da5fed59e47c770',
'view_count': int, 'duration': 306,
'comment_count': int, },
'params': {
'skip_download': True,
} }
}, { }, {
'url': 'https://chirb.it/fb_chirbit_player.swf?key=PrIPv5', 'url': 'https://chirb.it/fb_chirbit_player.swf?key=PrIPv5',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://chirb.it/wp/MN58c2',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -33,27 +36,30 @@ class ChirbitIE(InfoExtractor):
webpage = self._download_webpage( webpage = self._download_webpage(
'http://chirb.it/%s' % audio_id, audio_id) 'http://chirb.it/%s' % audio_id, audio_id)
audio_url = self._search_regex( data_fd = self._search_regex(
r'"setFile"\s*,\s*"([^"]+)"', webpage, 'audio url') r'data-fd=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'data fd', group='url')
# Reverse engineered from https://chirb.it/js/chirbit.player.js (look
# for soundURL)
audio_url = base64.b64decode(
data_fd[::-1].encode('ascii')).decode('utf-8')
title = self._search_regex( title = self._search_regex(
r'itemprop="name">([^<]+)', webpage, 'title') r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')
duration = parse_duration(self._html_search_meta( description = self._search_regex(
'duration', webpage, 'duration', fatal=False)) r'<h3>Description</h3>\s*<pre[^>]*>([^<]+)</pre>',
view_count = int_or_none(self._search_regex( webpage, 'description', default=None)
r'itemprop="playCount"\s*>(\d+)', webpage, duration = parse_duration(self._search_regex(
'listen count', fatal=False)) r'class=["\']c-length["\'][^>]*>([^<]+)',
comment_count = int_or_none(self._search_regex( webpage, 'duration', fatal=False))
r'>(\d+) Comments?:', webpage,
'comment count', fatal=False))
return { return {
'id': audio_id, 'id': audio_id,
'url': audio_url, 'url': audio_url,
'title': title, 'title': title,
'description': description,
'duration': duration, 'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
} }

View File

@@ -23,7 +23,7 @@ class CliphunterIE(InfoExtractor):
(?P<id>[0-9]+)/ (?P<id>[0-9]+)/
(?P<seo>.+?)(?:$|[#\?]) (?P<seo>.+?)(?:$|[#\?])
''' '''
_TEST = { _TESTS = [{
'url': 'http://www.cliphunter.com/w/1012420/Fun_Jynx_Maze_solo', 'url': 'http://www.cliphunter.com/w/1012420/Fun_Jynx_Maze_solo',
'md5': 'b7c9bbd4eb3a226ab91093714dcaa480', 'md5': 'b7c9bbd4eb3a226ab91093714dcaa480',
'info_dict': { 'info_dict': {
@@ -32,8 +32,19 @@ class CliphunterIE(InfoExtractor):
'title': 'Fun Jynx Maze solo', 'title': 'Fun Jynx Maze solo',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 18, 'age_limit': 18,
} },
} 'skip': 'Video gone',
}, {
'url': 'http://www.cliphunter.com/w/2019449/ShesNew__My_booty_girlfriend_Victoria_Paradices_pussy_filled_with_jizz',
'md5': '55a723c67bfc6da6b0cfa00d55da8a27',
'info_dict': {
'id': '2019449',
'ext': 'mp4',
'title': 'ShesNew - My booty girlfriend, Victoria Paradice\'s pussy filled with jizz',
'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 18,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)

View File

@@ -1,16 +1,10 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .onet import OnetBaseIE
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
)
class ClipRsIE(InfoExtractor): class ClipRsIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+' _VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = { _TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732', 'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
@@ -27,64 +21,13 @@ class ClipRsIE(InfoExtractor):
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex( mvp_id = self._search_mvp_id(webpage)
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
response = self._download_json( info_dict = self._extract_from_id(mvp_id, webpage)
'http://qi.ckm.onetapi.pl/', video_id, info_dict['display_id'] = display_id
query={
'body[id]': video_id,
'body[jsonrpc]': '2.0',
'body[method]': 'get_asset_detail',
'body[params][ID_Publikacji]': video_id,
'body[params][Service]': 'www.onet.pl',
'content-type': 'application/jsonp',
'x-onet-app': 'player.front.onetapi.pl',
})
error = response.get('error') return info_dict
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
video = response['result'].get('0')
formats = []
for _, formats_dict in video['formats'].items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_list in formats_dict.items():
if not isinstance(format_list, list):
continue
for f in format_list:
if not f.get('url'):
continue
formats.append({
'url': f['url'],
'format_id': format_id,
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'abr': float_or_none(f.get('audio_bitrate')),
'vbr': float_or_none(f.get('video_bitrate')),
})
self._sort_formats(formats)
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}

View File

@@ -6,7 +6,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_urllib_parse_urlencode,
compat_HTTPError, compat_HTTPError,
) )
from ..utils import ( from ..utils import (
@@ -17,37 +16,26 @@ from ..utils import (
class CloudyIE(InfoExtractor): class CloudyIE(InfoExtractor):
_IE_DESC = 'cloudy.ec and videoraj.ch' _IE_DESC = 'cloudy.ec'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.(?:ch|to))/ https?://(?:www\.)?cloudy\.ec/
(?:v/|embed\.php\?id=) (?:v/|embed\.php\?id=)
(?P<id>[A-Za-z0-9]+) (?P<id>[A-Za-z0-9]+)
''' '''
_EMBED_URL = 'http://www.%s/embed.php?id=%s' _EMBED_URL = 'http://www.cloudy.ec/embed.php?id=%s'
_API_URL = 'http://www.%s/api/player.api.php?%s' _API_URL = 'http://www.cloudy.ec/api/player.api.php'
_MAX_TRIES = 2 _MAX_TRIES = 2
_TESTS = [ _TEST = {
{ 'url': 'https://www.cloudy.ec/v/af511e2527aac',
'url': 'https://www.cloudy.ec/v/af511e2527aac', 'md5': '5cb253ace826a42f35b4740539bedf07',
'md5': '5cb253ace826a42f35b4740539bedf07', 'info_dict': {
'info_dict': { 'id': 'af511e2527aac',
'id': 'af511e2527aac', 'ext': 'flv',
'ext': 'flv', 'title': 'Funny Cats and Animals Compilation june 2013',
'title': 'Funny Cats and Animals Compilation june 2013',
}
},
{
'url': 'http://www.videoraj.to/v/47f399fd8bb60',
'md5': '7d0f8799d91efd4eda26587421c3c3b0',
'info_dict': {
'id': '47f399fd8bb60',
'ext': 'flv',
'title': 'Burning a New iPhone 5 with Gasoline - Will it Survive?',
}
} }
] }
def _extract_video(self, video_host, video_id, file_key, error_url=None, try_num=0): def _extract_video(self, video_id, file_key, error_url=None, try_num=0):
if try_num > self._MAX_TRIES - 1: if try_num > self._MAX_TRIES - 1:
raise ExtractorError('Unable to extract video URL', expected=True) raise ExtractorError('Unable to extract video URL', expected=True)
@@ -64,9 +52,8 @@ class CloudyIE(InfoExtractor):
'errorUrl': error_url, 'errorUrl': error_url,
}) })
data_url = self._API_URL % (video_host, compat_urllib_parse_urlencode(form))
player_data = self._download_webpage( player_data = self._download_webpage(
data_url, video_id, 'Downloading player data') self._API_URL, video_id, 'Downloading player data', query=form)
data = compat_parse_qs(player_data) data = compat_parse_qs(player_data)
try_num += 1 try_num += 1
@@ -88,7 +75,7 @@ class CloudyIE(InfoExtractor):
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in [404, 410]: if isinstance(e.cause, compat_HTTPError) and e.cause.code in [404, 410]:
self.report_warning('Invalid video URL, requesting another', video_id) self.report_warning('Invalid video URL, requesting another', video_id)
return self._extract_video(video_host, video_id, file_key, video_url, try_num) return self._extract_video(video_id, file_key, video_url, try_num)
return { return {
'id': video_id, 'id': video_id,
@@ -98,14 +85,13 @@ class CloudyIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_host = mobj.group('host')
video_id = mobj.group('id') video_id = mobj.group('id')
url = self._EMBED_URL % (video_host, video_id) url = self._EMBED_URL % video_id
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
file_key = self._search_regex( file_key = self._search_regex(
[r'key\s*:\s*"([^"]+)"', r'filekey\s*=\s*"([^"]+)"'], [r'key\s*:\s*"([^"]+)"', r'filekey\s*=\s*"([^"]+)"'],
webpage, 'file_key') webpage, 'file_key')
return self._extract_video(video_host, video_id, file_key) return self._extract_video(video_id, file_key)

View File

@@ -1,5 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .mtv import MTVIE from .mtv import MTVIE
from ..utils import ExtractorError
class CMTIE(MTVIE): class CMTIE(MTVIE):
@@ -16,7 +18,27 @@ class CMTIE(MTVIE):
'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"', 'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
'description': 'Blame It All On My Roots', 'description': 'Blame It All On My Roots',
}, },
'skip': 'Video not available',
}, {
'url': 'http://www.cmt.com/videos/misc/1504699/still-the-king-ep-109-in-3-minutes.jhtml#id=1739908',
'md5': 'e61a801ca4a183a466c08bd98dccbb1c',
'info_dict': {
'id': '1504699',
'ext': 'mp4',
'title': 'Still The King Ep. 109 in 3 Minutes',
'description': 'Relive or catch up with Still The King by watching this recap of season 1, episode 9. New episodes Sundays 9/8c.',
'timestamp': 1469421000.0,
'upload_date': '20160725',
},
}, { }, {
'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172', 'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
'only_matching': True, 'only_matching': True,
}] }]
@classmethod
def _transform_rtmp_url(cls, rtmp_video_url):
if 'error_not_available.swf' in rtmp_video_url:
raise ExtractorError(
'%s said: video is not available' % cls.IE_NAME, expected=True)
return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)

View File

@@ -1,17 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .mtv import MTVServicesInfoExtractor from .mtv import MTVServicesInfoExtractor
from ..compat import ( from .common import InfoExtractor
compat_str,
compat_urllib_parse_urlencode,
)
from ..utils import (
ExtractorError,
float_or_none,
unified_strdate,
)
class ComedyCentralIE(MTVServicesInfoExtractor): class ComedyCentralIE(MTVServicesInfoExtractor):
@@ -26,8 +16,10 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
'info_dict': { 'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354', 'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
'ext': 'mp4', 'ext': 'mp4',
'title': 'CC:Stand-Up|Greg Fitzsimmons: Life on Stage|Uncensored - Too Good of a Mother', 'title': 'CC:Stand-Up|August 18, 2013|1|0101|Uncensored - Too Good of a Mother',
'description': 'After a certain point, breastfeeding becomes c**kblocking.', 'description': 'After a certain point, breastfeeding becomes c**kblocking.',
'timestamp': 1376798400,
'upload_date': '20130818',
}, },
}, { }, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview', 'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
@@ -35,241 +27,92 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
}] }]
class ComedyCentralShowsIE(MTVServicesInfoExtractor): class ToshIE(MTVServicesInfoExtractor):
IE_DESC = 'The Daily Show / The Colbert Report' IE_DESC = 'Tosh.0'
# urls can be abbreviations like :thedailyshow _VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
# urls for episodes like: _FEED_URL = 'http://tosh.cc.com/feeds/mrss'
# or urls for clips like: http://www.thedailyshow.com/watch/mon-december-10-2012/any-given-gun-day
# or: http://www.colbertnation.com/the-colbert-report-videos/421667/november-29-2012/moon-shattering-news
# or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow)
|https?://(:www\.)?
(?P<showname>thedailyshow|thecolbertreport|tosh)\.(?:cc\.)?com/
((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
(?P<clip>
(?:(?:guests/[^/]+|videos|video-(?:clips|playlists)|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
|(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
)|
(?P<interview>
extended-interviews/(?P<interID>[0-9a-z]+)/
(?:playlist_tds_extended_)?(?P<interview_title>[^/?#]*?)
(?:/[^/?#]?|[?#]|$))))
'''
_TESTS = [{ _TESTS = [{
'url': 'http://thedailyshow.cc.com/watch/thu-december-13-2012/kristen-stewart',
'md5': '4e2f5cb088a83cd8cdb7756132f9739d',
'info_dict': {
'id': 'ab9ab3e7-5a98-4dbe-8b21-551dc0523d55',
'ext': 'mp4',
'upload_date': '20121213',
'description': 'Kristen Stewart learns to let loose in "On the Road."',
'uploader': 'thedailyshow',
'title': 'thedailyshow kristen-stewart part 1',
}
}, {
'url': 'http://thedailyshow.cc.com/extended-interviews/b6364d/sarah-chayes-extended-interview',
'info_dict': {
'id': 'sarah-chayes-extended-interview',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'title': 'thedailyshow Sarah Chayes Extended Interview',
},
'playlist': [
{
'info_dict': {
'id': '0baad492-cbec-4ec1-9e50-ad91c291127f',
'ext': 'mp4',
'upload_date': '20150129',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'uploader': 'thedailyshow',
'title': 'thedailyshow sarah-chayes-extended-interview part 1',
},
},
{
'info_dict': {
'id': '1e4fb91b-8ce7-4277-bd7c-98c9f1bbd283',
'ext': 'mp4',
'upload_date': '20150129',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'uploader': 'thedailyshow',
'title': 'thedailyshow sarah-chayes-extended-interview part 2',
},
},
],
'params': {
'skip_download': True,
},
}, {
'url': 'http://thedailyshow.cc.com/extended-interviews/xm3fnq/andrew-napolitano-extended-interview',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/videos/29w6fx/-realhumanpraise-for-fox-news',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/videos/gh6urb/neil-degrasse-tyson-pt--1?xrs=eml_col_031114',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/guests/michael-lewis/3efna8/exclusive---michael-lewis-extended-interview-pt--3',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/episodes/sy7yv0/april-8--2014---denis-leary',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/episodes/8ase07/april-8--2014---jane-goodall',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/video-playlists/npde3s/the-daily-show-19088-highlights',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/video-playlists/t6d9sg/the-daily-show-20038-highlights/be3cwo',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/special-editions/2l8fdb/special-edition---a-look-back-at-food',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/news-team/michael-che/7wnfel/we-need-to-talk-about-israel',
'only_matching': True,
}, {
'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans', 'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
'info_dict': {
'description': 'Tosh asked fans to share their summer plans.',
'title': 'Twitter Users Share Summer Plans',
},
'playlist': [{
'md5': 'f269e88114c1805bb6d7653fecea9e06',
'info_dict': {
'id': '90498ec2-ed00-11e0-aca6-0026b9414f30',
'ext': 'mp4',
'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
'description': 'Tosh asked fans to share their summer plans.',
'thumbnail': 're:^https?://.*\.jpg',
# It's really reported to be published on year 2077
'upload_date': '20770610',
'timestamp': 3390510600,
'subtitles': {
'en': 'mincount:3',
},
},
}]
}, {
'url': 'http://tosh.cc.com/video-collections/x2iz7k/just-plain-foul/m5q4fp',
'only_matching': True, 'only_matching': True,
}] }]
_available_formats = ['3500', '2200', '1700', '1200', '750', '400'] @classmethod
def _transform_rtmp_url(cls, rtmp_video_url):
new_urls = super(ToshIE, cls)._transform_rtmp_url(rtmp_video_url)
new_urls['rtmp'] = rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm')
return new_urls
_video_extensions = {
'3500': 'mp4', class ComedyCentralTVIE(MTVServicesInfoExtractor):
'2200': 'mp4', _VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'
'1700': 'mp4', _TESTS = [{
'1200': 'mp4', 'url': 'http://www.comedycentral.tv/staffeln/7436-the-mindy-project-staffel-4',
'750': 'mp4', 'info_dict': {
'400': 'mp4', 'id': 'local_playlist-f99b626bdfe13568579a',
} 'ext': 'flv',
_video_dimensions = { 'title': 'Episode_the-mindy-project_shows_season-4_episode-3_full-episode_part1',
'3500': (1280, 720), },
'2200': (960, 540), 'params': {
'1700': (768, 432), # rtmp download
'1200': (640, 360), 'skip_download': True,
'750': (512, 288), },
'400': (384, 216), }, {
} 'url': 'http://www.comedycentral.tv/shows/1074-workaholics',
'only_matching': True,
}, {
'url': 'http://www.comedycentral.tv/shows/1727-the-mindy-project/bonus',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
if mobj.group('shortname'): webpage = self._download_webpage(url, video_id)
return self.url_result('http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes')
if mobj.group('clip'): mrss_url = self._search_regex(
if mobj.group('videotitle'): r'data-mrss=(["\'])(?P<url>(?:(?!\1).)+)\1',
epTitle = mobj.group('videotitle') webpage, 'mrss url', group='url')
elif mobj.group('showname') == 'thedailyshow':
epTitle = mobj.group('tdstitle')
else:
epTitle = mobj.group('cntitle')
dlNewest = False
elif mobj.group('interview'):
epTitle = mobj.group('interview_title')
dlNewest = False
else:
dlNewest = not mobj.group('episode')
if dlNewest:
epTitle = mobj.group('showname')
else:
epTitle = mobj.group('episode')
show_name = mobj.group('showname')
webpage, htmlHandle = self._download_webpage_handle(url, epTitle) return self._get_videos_info_from_url(mrss_url, video_id)
if dlNewest:
url = htmlHandle.geturl()
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
if mobj is None:
raise ExtractorError('Invalid redirected URL: ' + url)
if mobj.group('episode') == '':
raise ExtractorError('Redirected URL is still not specific: ' + url)
epTitle = (mobj.group('episode') or mobj.group('videotitle')).rpartition('/')[-1]
mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*(?:episode|video).*?:.*?))"', webpage)
if len(mMovieParams) == 0:
# The Colbert Report embeds the information in a without
# a URL prefix; so extract the alternate reference
# and then add the URL prefix manually.
altMovieParams = re.findall('data-mgid="([^"]*(?:episode|video|playlist).*?:.*?)"', webpage) class ComedyCentralShortnameIE(InfoExtractor):
if len(altMovieParams) == 0: _VALID_URL = r'^:(?P<id>tds|thedailyshow)$'
raise ExtractorError('unable to find Flash URL in webpage ' + url) _TESTS = [{
else: 'url': ':tds',
mMovieParams = [('http://media.mtvnservices.com/' + altMovieParams[0], altMovieParams[0])] 'only_matching': True,
}, {
'url': ':thedailyshow',
'only_matching': True,
}]
uri = mMovieParams[0][1] def _real_extract(self, url):
# Correct cc.com in uri video_id = self._match_id(url)
uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.com', uri) shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse_urlencode({'uri': uri})) 'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
idoc = self._download_xml(
index_url, epTitle,
'Downloading show index', 'Unable to download episode index')
title = idoc.find('./channel/title').text
description = idoc.find('./channel/description').text
entries = []
item_els = idoc.findall('.//item')
for part_num, itemEl in enumerate(item_els):
upload_date = unified_strdate(itemEl.findall('./pubDate')[0].text)
thumbnail = itemEl.find('.//{http://search.yahoo.com/mrss/}thumbnail').attrib.get('url')
content = itemEl.find('.//{http://search.yahoo.com/mrss/}content')
duration = float_or_none(content.attrib.get('duration'))
mediagen_url = content.attrib['url']
guid = itemEl.find('./guid').text.rpartition(':')[-1]
cdoc = self._download_xml(
mediagen_url, epTitle,
'Downloading configuration for segment %d / %d' % (part_num + 1, len(item_els)))
turls = []
for rendition in cdoc.findall('.//rendition'):
finfo = (rendition.attrib['bitrate'], rendition.findall('./src')[0].text)
turls.append(finfo)
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'format_id': 'vhttp-%s' % format,
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,
})
formats.append({
'format_id': 'rtmp-%s' % format,
'url': rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm'),
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,
})
self._sort_formats(formats)
subtitles = self._extract_subtitles(cdoc, guid)
virtual_id = show_name + ' ' + epTitle + ' part ' + compat_str(part_num + 1)
entries.append({
'id': guid,
'title': virtual_id,
'formats': formats,
'uploader': show_name,
'upload_date': upload_date,
'duration': duration,
'thumbnail': thumbnail,
'description': description,
'subtitles': subtitles,
})
return {
'_type': 'playlist',
'id': epTitle,
'entries': entries,
'title': show_name + ' ' + title,
'description': description,
} }
return self.url_result(shortcut_map[video_id])

View File

@@ -44,6 +44,7 @@ from ..utils import (
sanitized_Request, sanitized_Request,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
unified_timestamp,
url_basename, url_basename,
xpath_element, xpath_element,
xpath_text, xpath_text,
@@ -54,6 +55,8 @@ from ..utils import (
update_Request, update_Request,
update_url_query, update_url_query,
parse_m3u8_attributes, parse_m3u8_attributes,
extract_attributes,
parse_codecs,
) )
@@ -161,6 +164,7 @@ class InfoExtractor(object):
* "height" (optional, int) * "height" (optional, int)
* "resolution" (optional, string "{width}x{height"}, * "resolution" (optional, string "{width}x{height"},
deprecated) deprecated)
* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image. thumbnail: Full URL to a video thumbnail image.
description: Full video description. description: Full video description.
uploader: Full name of the video uploader. uploader: Full name of the video uploader.
@@ -723,9 +727,14 @@ class InfoExtractor(object):
[^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop) [^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)
def _og_search_property(self, prop, html, name=None, **kargs): def _og_search_property(self, prop, html, name=None, **kargs):
if not isinstance(prop, (list, tuple)):
prop = [prop]
if name is None: if name is None:
name = 'OpenGraph %s' % prop name = 'OpenGraph %s' % prop[0]
escaped = self._search_regex(self._og_regexes(prop), html, name, flags=re.DOTALL, **kargs) og_regexes = []
for p in prop:
og_regexes.extend(self._og_regexes(p))
escaped = self._search_regex(og_regexes, html, name, flags=re.DOTALL, **kargs)
if escaped is None: if escaped is None:
return None return None
return unescapeHTML(escaped) return unescapeHTML(escaped)
@@ -803,40 +812,66 @@ class InfoExtractor(object):
return self._html_search_meta('twitter:player', html, return self._html_search_meta('twitter:player', html,
'twitter card player') 'twitter card player')
def _search_json_ld(self, html, video_id, **kwargs): def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex( json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>', r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
html, 'JSON-LD', group='json_ld', **kwargs) html, 'JSON-LD', group='json_ld', **kwargs)
default = kwargs.get('default', NO_DEFAULT)
if not json_ld: if not json_ld:
return {} return default if default is not NO_DEFAULT else {}
return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True)) # JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
def _json_ld(self, json_ld, video_id, fatal=True): def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str): if isinstance(json_ld, compat_str):
json_ld = self._parse_json(json_ld, video_id, fatal=fatal) json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
if not json_ld: if not json_ld:
return {} return {}
info = {} info = {}
if json_ld.get('@context') == 'http://schema.org': if not isinstance(json_ld, (list, tuple, dict)):
item_type = json_ld.get('@type') return info
if item_type == 'TVEpisode': if isinstance(json_ld, dict):
info.update({ json_ld = [json_ld]
'episode': unescapeHTML(json_ld.get('name')), for e in json_ld:
'episode_number': int_or_none(json_ld.get('episodeNumber')), if e.get('@context') == 'http://schema.org':
'description': unescapeHTML(json_ld.get('description')), item_type = e.get('@type')
}) if expected_type is not None and expected_type != item_type:
part_of_season = json_ld.get('partOfSeason') return info
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason': if item_type == 'TVEpisode':
info['season_number'] = int_or_none(part_of_season.get('seasonNumber')) info.update({
part_of_series = json_ld.get('partOfSeries') 'episode': unescapeHTML(e.get('name')),
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries': 'episode_number': int_or_none(e.get('episodeNumber')),
info['series'] = unescapeHTML(part_of_series.get('name')) 'description': unescapeHTML(e.get('description')),
elif item_type == 'Article': })
info.update({ part_of_season = e.get('partOfSeason')
'timestamp': parse_iso8601(json_ld.get('datePublished')), if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
'title': unescapeHTML(json_ld.get('headline')), info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
'description': unescapeHTML(json_ld.get('articleBody')), part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
}) if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),
'title': unescapeHTML(e.get('headline')),
'description': unescapeHTML(e.get('articleBody')),
})
elif item_type == 'VideoObject':
info.update({
'url': e.get('contentUrl'),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': e.get('thumbnailUrl'),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
'height': int_or_none(e.get('height')),
})
break
return dict((k, v) for k, v in info.items() if v is not None) return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod @staticmethod
@@ -890,7 +925,8 @@ class InfoExtractor(object):
if f.get('ext') in ['f4f', 'f4m']: # Not yet supported if f.get('ext') in ['f4f', 'f4m']: # Not yet supported
preference -= 0.5 preference -= 0.5
proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1 protocol = f.get('protocol') or determine_protocol(f)
proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
if f.get('vcodec') == 'none': # audio only if f.get('vcodec') == 'none': # audio only
preference -= 50 preference -= 50
@@ -1107,7 +1143,7 @@ class InfoExtractor(object):
'url': m3u8_url, 'url': m3u8_url,
'ext': ext, 'ext': ext,
'protocol': 'm3u8', 'protocol': 'm3u8',
'preference': preference - 1 if preference else -1, 'preference': preference - 100 if preference else -100,
'resolution': 'multiple', 'resolution': 'multiple',
'format_note': 'Quality selection URL', 'format_note': 'Quality selection URL',
} }
@@ -1186,6 +1222,7 @@ class InfoExtractor(object):
'url': format_url(line.strip()), 'url': format_url(line.strip()),
'tbr': tbr, 'tbr': tbr,
'ext': ext, 'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')),
'protocol': entry_protocol, 'protocol': entry_protocol,
'preference': preference, 'preference': preference,
} }
@@ -1194,24 +1231,17 @@ class InfoExtractor(object):
width_str, height_str = resolution.split('x') width_str, height_str = resolution.split('x')
f['width'] = int(width_str) f['width'] = int(width_str)
f['height'] = int(height_str) f['height'] = int(height_str)
codecs = last_info.get('CODECS') # Unified Streaming Platform
if codecs: mobj = re.search(
vcodec, acodec = [None] * 2 r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url'])
va_codecs = codecs.split(',') if mobj:
if len(va_codecs) == 1: abr, vbr = mobj.groups()
# Audio only entries usually come with single codec and abr, vbr = float_or_none(abr, 1000), float_or_none(vbr, 1000)
# no resolution. For more robustness we also check it to
# be mp4 audio.
if not resolution and va_codecs[0].startswith('mp4a'):
vcodec, acodec = 'none', va_codecs[0]
else:
vcodec = va_codecs[0]
else:
vcodec, acodec = va_codecs[:2]
f.update({ f.update({
'acodec': acodec, 'vbr': vbr,
'vcodec': vcodec, 'abr': abr,
}) })
f.update(parse_codecs(last_info.get('CODECS')))
if last_media is not None: if last_media is not None:
f['m3u8_media'] = last_media f['m3u8_media'] = last_media
last_media = None last_media = None
@@ -1466,6 +1496,13 @@ class InfoExtractor(object):
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict) compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}): def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
"""
Parse formats from MPD manifest.
References:
1. MPEG-DASH Standard, ISO/IEC 23009-1:2014(E),
http://standards.iso.org/ittf/PubliclyAvailableStandards/c065274_ISO_IEC_23009-1_2014.zip
2. https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
"""
if mpd_doc.get('type') == 'dynamic': if mpd_doc.get('type') == 'dynamic':
return [] return []
@@ -1498,8 +1535,16 @@ class InfoExtractor(object):
s_e = segment_timeline.findall(_add_ns('S')) s_e = segment_timeline.findall(_add_ns('S'))
if s_e: if s_e:
ms_info['total_number'] = 0 ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e: for s in s_e:
ms_info['total_number'] += 1 + int(s.get('r', '0')) r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
else: else:
timescale = segment_template.get('timescale') timescale = segment_template.get('timescale')
if timescale: if timescale:
@@ -1536,7 +1581,7 @@ class InfoExtractor(object):
continue continue
representation_attrib = adaptation_set.attrib.copy() representation_attrib = adaptation_set.attrib.copy()
representation_attrib.update(representation.attrib) representation_attrib.update(representation.attrib)
# According to page 41 of ISO/IEC 29001-1:2014, @mimeType is mandatory # According to [1, 5.3.7.2, Table 9, page 41], @mimeType is mandatory
mime_type = representation_attrib['mimeType'] mime_type = representation_attrib['mimeType']
content_type = mime_type.split('/')[0] content_type = mime_type.split('/')[0]
if content_type == 'text': if content_type == 'text':
@@ -1580,16 +1625,40 @@ class InfoExtractor(object):
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration)) representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template'] media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id) media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template) media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', media_template) media_template = re.sub(r'\$(Number|Bandwidth|Time)%([^$]+)\$', r'%(\1)\2', media_template)
media_template.replace('$$', '$') media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [
media_template % { # As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
'Number': segment_number, # can't be used at the same time
'Bandwidth': representation_attrib.get('bandwidth')} if '%(Number' in media_template:
for segment_number in range( representation_ms_info['segment_urls'] = [
representation_ms_info['start_number'], media_template % {
representation_ms_info['total_number'] + representation_ms_info['start_number'])] 'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth'),
}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else:
representation_ms_info['segment_urls'] = []
segment_time = 0
def add_segment_url():
representation_ms_info['segment_urls'].append(
media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
}
)
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
add_segment_url()
for r in range(s.get('r', 0)):
segment_time += s['d']
add_segment_url()
segment_time += s['d']
if 'segment_urls' in representation_ms_info: if 'segment_urls' in representation_ms_info:
f.update({ f.update({
'segment_urls': representation_ms_info['segment_urls'], 'segment_urls': representation_ms_info['segment_urls'],
@@ -1616,6 +1685,62 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type) self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats return formats
def _parse_html5_media_entries(self, base_url, webpage):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
def parse_content_type(content_type):
if not content_type:
return {}
ctr = re.search(r'(?P<mimetype>[^/]+/[^;]+)(?:;\s*codecs="?(?P<codecs>[^"]+))?', content_type)
if ctr:
mimetype, codecs = ctr.groups()
f = parse_codecs(codecs)
f['ext'] = mimetype2ext(mimetype)
return f
return {}
entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage):
media_info = {
'formats': [],
'subtitles': {},
}
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
if src:
media_info['formats'].append({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content):
source_attributes = extract_attributes(source_tag)
src = source_attributes.get('src')
if not src:
continue
f = parse_content_type(source_attributes.get('type'))
f.update({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['formats'].append(f)
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind == 'subtitles':
src = track_attributes.get('src')
if not src:
continue
lang = track_attributes.get('srclang') or track_attributes.get('lang') or track_attributes.get('label')
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats']:
entries.append(media_info)
return entries
def _live_title(self, name): def _live_title(self, name):
""" Generate the title for a live video """ """ Generate the title for a live video """
now = datetime.datetime.now() now = datetime.datetime.now()
@@ -1676,7 +1801,7 @@ class InfoExtractor(object):
any_restricted = False any_restricted = False
for tc in self.get_testcases(include_onlymatching=False): for tc in self.get_testcases(include_onlymatching=False):
if 'playlist' in tc: if tc.get('playlist', []):
tc = tc['playlist'][0] tc = tc['playlist'][0]
is_restricted = age_restricted( is_restricted = age_restricted(
tc.get('info_dict', {}).get('age_limit'), age_limit) tc.get('info_dict', {}).get('age_limit'), age_limit)
@@ -1729,6 +1854,13 @@ class InfoExtractor(object):
def _mark_watched(self, *args, **kwargs): def _mark_watched(self, *args, **kwargs):
raise NotImplementedError('This method must be implemented by subclasses') raise NotImplementedError('This method must be implemented by subclasses')
def geo_verification_headers(self):
headers = {}
geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
if geo_verification_proxy:
headers['Ytdl-request-proxy'] = geo_verification_proxy
return headers
class SearchInfoExtractor(InfoExtractor): class SearchInfoExtractor(InfoExtractor):
""" """

View File

@@ -5,13 +5,17 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
orderedSet, orderedSet,
remove_end, remove_end,
extract_attributes,
mimetype2ext,
determine_ext,
int_or_none,
parse_iso8601,
) )
@@ -58,6 +62,9 @@ class CondeNastIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': '3D Printed Speakers Lit With LED', 'title': '3D Printed Speakers Lit With LED',
'description': 'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.', 'description': 'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
'uploader': 'wired',
'upload_date': '20130314',
'timestamp': 1363219200,
} }
}, { }, {
# JS embed # JS embed
@@ -67,70 +74,93 @@ class CondeNastIE(InfoExtractor):
'id': '55f9cf8b61646d1acf00000c', 'id': '55f9cf8b61646d1acf00000c',
'ext': 'mp4', 'ext': 'mp4',
'title': '3D printed TSA Travel Sentry keys really do open TSA locks', 'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
'uploader': 'arstechnica',
'upload_date': '20150916',
'timestamp': 1442434955,
} }
}] }]
def _extract_series(self, url, webpage): def _extract_series(self, url, webpage):
title = self._html_search_regex(r'<div class="cne-series-info">.*?<h1>(.+?)</h1>', title = self._html_search_regex(
webpage, 'series title', flags=re.DOTALL) r'(?s)<div class="cne-series-info">.*?<h1>(.+?)</h1>',
webpage, 'series title')
url_object = compat_urllib_parse_urlparse(url) url_object = compat_urllib_parse_urlparse(url)
base_url = '%s://%s' % (url_object.scheme, url_object.netloc) base_url = '%s://%s' % (url_object.scheme, url_object.netloc)
m_paths = re.finditer(r'<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]', m_paths = re.finditer(
webpage, flags=re.DOTALL) r'(?s)<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]', webpage)
paths = orderedSet(m.group(1) for m in m_paths) paths = orderedSet(m.group(1) for m in m_paths)
build_url = lambda path: compat_urlparse.urljoin(base_url, path) build_url = lambda path: compat_urlparse.urljoin(base_url, path)
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths] entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title) return self.playlist_result(entries, playlist_title=title)
def _extract_video(self, webpage, url_type): def _extract_video(self, webpage, url_type):
if url_type != 'embed': query = {}
description = self._html_search_regex( params = self._search_regex(
[ r'(?s)var params = {(.+?)}[;,]', webpage, 'player params', default=None)
r'<div class="cne-video-description">(.+?)</div>', if params:
r'<div class="video-post-content">(.+?)</div>', query.update({
], 'videoId': self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id'),
webpage, 'description', fatal=False, flags=re.DOTALL) 'playerId': self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id'),
'target': self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target'),
})
else: else:
description = None params = extract_attributes(self._search_regex(
params = self._search_regex(r'var params = {(.+?)}[;,]', webpage, r'(<[^>]+data-js="video-player"[^>]+>)',
'player params', flags=re.DOTALL) webpage, 'player params element'))
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id') query.update({
player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id') 'videoId': params['data-video'],
target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target') 'playerId': params['data-player'],
data = compat_urllib_parse_urlencode({'videoId': video_id, 'target': params['id'],
'playerId': player_id, })
'target': target, video_id = query['videoId']
}) video_info = None
base_info_url = self._search_regex(r'url = [\'"](.+?)[\'"][,;]', info_page = self._download_webpage(
webpage, 'base info url', 'http://player.cnevids.com/player/video.js',
default='http://player.cnevids.com/player/loader.js?') video_id, 'Downloading video info', query=query, fatal=False)
info_url = base_info_url + data if info_page:
info_page = self._download_webpage(info_url, video_id, video_info = self._parse_json(self._search_regex(
'Downloading video info') r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
video_info = self._search_regex(r'var\s+video\s*=\s*({.+?});', info_page, 'video info') else:
video_info = self._parse_json(video_info, video_id) info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=query)
video_info = self._parse_json(self._search_regex(
r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
title = video_info['title']
formats = [{ formats = []
'format_id': '%s-%s' % (fdata['type'].split('/')[-1], fdata['quality']), for fdata in video_info.get('sources', [{}])[0]:
'url': fdata['src'], src = fdata.get('src')
'ext': fdata['type'].split('/')[-1], if not src:
'quality': 1 if fdata['quality'] == 'high' else 0, continue
} for fdata in video_info['sources'][0]] ext = mimetype2ext(fdata.get('type')) or determine_ext(src)
quality = fdata.get('quality')
formats.append({
'format_id': ext + ('-%s' % quality if quality else ''),
'url': src,
'ext': ext,
'quality': 1 if quality == 'high' else 0,
})
self._sort_formats(formats) self._sort_formats(formats)
return { info = self._search_json_ld(
webpage, video_id, fatal=False) if url_type != 'embed' else {}
info.update({
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'title': video_info['title'], 'title': title,
'thumbnail': video_info['poster_frame'], 'thumbnail': video_info.get('poster_frame'),
'description': description, 'uploader': video_info.get('brand'),
} 'duration': int_or_none(video_info.get('duration')),
'tags': video_info.get('tags'),
'series': video_info.get('series_title'),
'season': video_info.get('season_title'),
'timestamp': parse_iso8601(video_info.get('premiere_date')),
})
return info
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) site, url_type, item_id = re.match(self._VALID_URL, url).groups()
site = mobj.group('site')
url_type = mobj.group('type')
item_id = mobj.group('id')
# Convert JS embed to regular embed # Convert JS embed to regular embed
if url_type == 'embedjs': if url_type == 'embedjs':

View File

@@ -51,8 +51,11 @@ class CSpanIE(InfoExtractor):
'url': 'http://www.c-span.org/video/?104517-1/immigration-reforms-needed-protect-skilled-american-workers', 'url': 'http://www.c-span.org/video/?104517-1/immigration-reforms-needed-protect-skilled-american-workers',
'info_dict': { 'info_dict': {
'id': 'judiciary031715', 'id': 'judiciary031715',
'ext': 'flv', 'ext': 'mp4',
'title': 'Immigration Reforms Needed to Protect Skilled American Workers', 'title': 'Immigration Reforms Needed to Protect Skilled American Workers',
},
'params': {
'skip_download': True, # m3u8 downloads
} }
}] }]

View File

@@ -1,13 +1,12 @@
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import parse_iso8601, ExtractorError from ..utils import unified_timestamp
class CtsNewsIE(InfoExtractor): class CtsNewsIE(InfoExtractor):
IE_DESC = '華視新聞' IE_DESC = '華視新聞'
# https connection failed (Connection reset)
_VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html' _VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
_TESTS = [{ _TESTS = [{
'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html', 'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html',
@@ -16,7 +15,7 @@ class CtsNewsIE(InfoExtractor):
'id': '201501291578109', 'id': '201501291578109',
'ext': 'mp4', 'ext': 'mp4',
'title': '以色列.真主黨交火 3人死亡', 'title': '以色列.真主黨交火 3人死亡',
'description': 'md5:95e9b295c898b7ff294f09d450178d7d', 'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人...',
'timestamp': 1422528540, 'timestamp': 1422528540,
'upload_date': '20150129', 'upload_date': '20150129',
} }
@@ -28,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
'id': '201309031304098', 'id': '201309031304098',
'ext': 'mp4', 'ext': 'mp4',
'title': '韓國31歲童顏男 貌如十多歲小孩', 'title': '韓國31歲童顏男 貌如十多歲小孩',
'description': 'md5:f183feeba3752b683827aab71adad584', 'description': '越有年紀的人越希望看起來年輕一點而南韓卻有一位31歲的男子看起來像是11、12歲的小孩身...',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1378205880, 'timestamp': 1378205880,
'upload_date': '20130903', 'upload_date': '20130903',
@@ -36,8 +35,7 @@ class CtsNewsIE(InfoExtractor):
}, { }, {
# With Youtube embedded video # With Youtube embedded video
'url': 'http://news.cts.com.tw/cts/money/201501/201501291578003.html', 'url': 'http://news.cts.com.tw/cts/money/201501/201501291578003.html',
'md5': '1d842c771dc94c8c3bca5af2cc1db9c5', 'md5': 'e4726b2ccd70ba2c319865e28f0a91d1',
'add_ie': ['Youtube'],
'info_dict': { 'info_dict': {
'id': 'OVbfO7d0_hQ', 'id': 'OVbfO7d0_hQ',
'ext': 'mp4', 'ext': 'mp4',
@@ -47,42 +45,37 @@ class CtsNewsIE(InfoExtractor):
'upload_date': '20150128', 'upload_date': '20150128',
'uploader_id': 'TBSCTS', 'uploader_id': 'TBSCTS',
'uploader': '中華電視公司', 'uploader': '中華電視公司',
} },
'add_ie': ['Youtube'],
}] }]
def _real_extract(self, url): def _real_extract(self, url):
news_id = self._match_id(url) news_id = self._match_id(url)
page = self._download_webpage(url, news_id) page = self._download_webpage(url, news_id)
if self._search_regex(r'(CTSPlayer2)', page, 'CTSPlayer2 identifier', default=None): news_id = self._hidden_inputs(page).get('get_id')
feed_url = self._html_search_regex(
r'(http://news\.cts\.com\.tw/action/mp4feed\.php\?news_id=\d+)', if news_id:
page, 'feed url') mp4_feed = self._download_json(
video_url = self._download_webpage( 'http://news.cts.com.tw/action/test_mp4feed.php',
feed_url, news_id, note='Fetching feed') news_id, note='Fetching feed', query={'news_id': news_id})
video_url = mp4_feed['source_url']
else: else:
self.to_screen('Not CTSPlayer video, trying Youtube...') self.to_screen('Not CTSPlayer video, trying Youtube...')
youtube_url = self._search_regex( youtube_url = self._search_regex(
r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url', r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
default=None)
if not youtube_url:
raise ExtractorError('The news includes no videos!', expected=True)
return { return self.url_result(youtube_url, ie='Youtube')
'_type': 'url',
'url': youtube_url,
'ie_key': 'Youtube',
}
description = self._html_search_meta('description', page) description = self._html_search_meta('description', page)
title = self._html_search_meta('title', page) title = self._html_search_meta('title', page, fatal=True)
thumbnail = self._html_search_meta('image', page) thumbnail = self._html_search_meta('image', page)
datetime_str = self._html_search_regex( datetime_str = self._html_search_regex(
r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2})', page, 'date and time') r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2})', page, 'date and time', fatal=False)
# Transform into ISO 8601 format with timezone info timestamp = None
datetime_str = datetime_str.replace('/', '-') + ':00+0800' if datetime_str:
timestamp = parse_iso8601(datetime_str, delimiter=' ') timestamp = unified_timestamp(datetime_str) - 8 * 3600
return { return {
'id': news_id, 'id': news_id,

View File

@@ -9,7 +9,7 @@ from ..utils import (
class CWTVIE(InfoExtractor): class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/(?:shows/)?(?:[^/]+/){2}\?.*\bplay=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})' _VALID_URL = r'https?://(?:www\.)?cw(?:tv(?:pr)?|seed)\.com/(?:shows/)?(?:[^/]+/)+[^?]*\?.*\b(?:play|watch)=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{ _TESTS = [{
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63', 'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
'info_dict': { 'info_dict': {
@@ -28,7 +28,8 @@ class CWTVIE(InfoExtractor):
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} },
'skip': 'redirect to http://cwtv.com/shows/arrow/',
}, { }, {
'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088', 'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
'info_dict': { 'info_dict': {
@@ -44,22 +45,43 @@ class CWTVIE(InfoExtractor):
'upload_date': '20151006', 'upload_date': '20151006',
'timestamp': 1444107300, 'timestamp': 1444107300,
}, },
'params': {
# m3u8 download
'skip_download': True,
}
}, { }, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6', 'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://cwtvpr.com/the-cw/video?watch=9eee3f60-ef4e-440b-b3b2-49428ac9c54e',
'only_matching': True,
}, {
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?watch=6b15e985-9345-4f60-baf8-56e96be57c63',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = None
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id) formats = []
for partner in (154, 213):
formats = self._extract_m3u8_formats( vdata = self._download_json(
video_data['videos']['variantplaylist']['uri'], video_id, 'mp4') 'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/%d?format=json' % (video_id, partner), video_id, fatal=False)
if not vdata:
continue
video_data = vdata
for quality, quality_data in vdata.get('videos', {}).items():
quality_url = quality_data.get('uri')
if not quality_url:
continue
if quality == 'variantplaylist':
formats.extend(self._extract_m3u8_formats(
quality_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
tbr = int_or_none(quality_data.get('bitrate'))
format_id = 'http' + ('-%d' % tbr if tbr else '')
if self._is_valid_url(quality_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': quality_url,
'tbr': tbr,
})
self._sort_formats(formats) self._sort_formats(formats)
thumbnails = [{ thumbnails = [{

View File

@@ -5,19 +5,20 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
determine_protocol, determine_protocol,
unescapeHTML,
) )
class DailyMailIE(InfoExtractor): class DailyMailIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.dailymail.co.uk/video/sciencetech/video-1288527/Turn-video-impressionist-masterpiece.html', 'url': 'http://www.dailymail.co.uk/video/tvshowbiz/video-1295863/The-Mountain-appears-sparkling-water-ad-Heavy-Bubbles.html',
'md5': '2f639d446394f53f3a33658b518b6615', 'md5': 'f6129624562251f628296c3a9ffde124',
'info_dict': { 'info_dict': {
'id': '1288527', 'id': '1295863',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Turn any video into an impressionist masterpiece', 'title': 'The Mountain appears in sparkling water ad for \'Heavy Bubbles\'',
'description': 'md5:88ddbcb504367987b2708bb38677c9d2', 'description': 'md5:a93d74b6da172dd5dc4d973e0b766a84',
} }
} }
@@ -26,7 +27,7 @@ class DailyMailIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(self._search_regex( video_data = self._parse_json(self._search_regex(
r"data-opts='({.+?})'", webpage, 'video data'), video_id) r"data-opts='({.+?})'", webpage, 'video data'), video_id)
title = video_data['title'] title = unescapeHTML(video_data['title'])
video_sources = self._download_json(video_data.get( video_sources = self._download_json(video_data.get(
'sources', {}).get('url') or 'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id, video_id) 'sources', {}).get('url') or 'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id, video_id)
@@ -55,7 +56,7 @@ class DailyMailIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': video_data.get('descr'), 'description': unescapeHTML(video_data.get('descr')),
'thumbnail': video_data.get('poster') or video_data.get('thumbnail'), 'thumbnail': video_data.get('poster') or video_data.get('thumbnail'),
'formats': formats, 'formats': formats,
} }

View File

@@ -16,6 +16,7 @@ from ..utils import (
sanitized_Request, sanitized_Request,
str_to_int, str_to_int,
unescapeHTML, unescapeHTML,
mimetype2ext,
) )
@@ -111,6 +112,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
} }
] ]
@staticmethod
def _extract_urls(webpage):
# Look for embedded Dailymotion player
matches = re.findall(
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
return list(map(lambda m: unescapeHTML(m[1]), matches))
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@@ -153,18 +161,19 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
type_ = media.get('type') type_ = media.get('type')
if type_ == 'application/vnd.lumberjack.manifest': if type_ == 'application/vnd.lumberjack.manifest':
continue continue
ext = determine_ext(media_url) ext = mimetype2ext(type_) or determine_ext(media_url)
if type_ == 'application/x-mpegURL' or ext == 'm3u8': if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1, media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False)) m3u8_id='hls', fatal=False))
elif type_ == 'application/f4m' or ext == 'f4m': elif ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False)) media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
else: else:
f = { f = {
'url': media_url, 'url': media_url,
'format_id': 'http-%s' % quality, 'format_id': 'http-%s' % quality,
'ext': ext,
} }
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url) m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m: if m:
@@ -322,7 +331,9 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
for video_id in re.findall(r'data-xid="(.+?)"', webpage): for video_id in re.findall(r'data-xid="(.+?)"', webpage):
if video_id not in video_ids: if video_id not in video_ids:
yield self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion') yield self.url_result(
'http://www.dailymotion.com/video/%s' % video_id,
DailymotionIE.ie_key(), video_id)
video_ids.add(video_id) video_ids.add(video_id)
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None: if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:

View File

@@ -66,22 +66,32 @@ class DaumIE(InfoExtractor):
'view_count': int, 'view_count': int,
'comment_count': int, 'comment_count': int,
}, },
}, {
# Requires dte_type=WEB (#9972)
'url': 'http://tvpot.daum.net/v/s3794Uf1NZeZ1qMpGpeqeRU',
'md5': 'a8917742069a4dd442516b86e7d66529',
'info_dict': {
'id': 's3794Uf1NZeZ1qMpGpeqeRU',
'ext': 'mp4',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny) [쇼! 음악중심] 508회 20160611',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\n\n[쇼! 음악중심] 20160611, 507회',
'upload_date': '20160611',
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(self._match_id(url)) video_id = compat_urllib_parse_unquote(self._match_id(url))
query = compat_urllib_parse_urlencode({'vid': video_id})
movie_data = self._download_json( movie_data = self._download_json(
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query, 'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json',
video_id, 'Downloading video formats info') video_id, 'Downloading video formats info', query={'vid': video_id, 'dte_type': 'WEB'})
# For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid # For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id): if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id) return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
info = self._download_xml( info = self._download_xml(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id, 'http://tvpot.daum.net/clip/ClipInfoXml.do', video_id,
'Downloading video info') 'Downloading video info', query={'vid': video_id})
formats = [] formats = []
for format_el in movie_data['output_list']['output_list']: for format_el in movie_data['output_list']['output_list']:

View File

@@ -4,78 +4,47 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
int_or_none,
clean_html,
)
class DBTVIE(InfoExtractor): class DBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:(?:lazyplayer|player)/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?' _VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_TESTS = [{ _TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen', 'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'md5': 'b89953ed25dacb6edb3ef6c6f430f8bc', 'md5': '2e24f67936517b143a234b4cadf792ec',
'info_dict': { 'info_dict': {
'id': '33100', 'id': '3649835190001',
'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen', 'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen', 'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0', 'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
'thumbnail': 're:https?://.*\.jpg$', 'thumbnail': 're:https?://.*\.jpg',
'timestamp': 1404039863.438, 'timestamp': 1404039863,
'upload_date': '20140629', 'upload_date': '20140629',
'duration': 69.544, 'duration': 69.544,
'view_count': int, 'uploader_id': '1027729757001',
'categories': list, },
} 'add_ie': ['BrightcoveNew']
}, { }, {
'url': 'http://dbtv.no/3649835190001', 'url': 'http://dbtv.no/3649835190001',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.dbtv.no/lazyplayer/4631135248001', 'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://dbtv.no/vice/5000634109001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/filmtrailer/3359293614001',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id, display_id = re.match(self._VALID_URL, url).groups()
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
data = self._download_json(
'http://api.dbtv.no/discovery/%s' % video_id, display_id)
video = data['playlist'][0]
formats = [{
'url': f['URL'],
'vcodec': f.get('container'),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'vbr': float_or_none(f.get('rate'), 1000),
'filesize': int_or_none(f.get('size')),
} for f in video['renditions'] if 'URL' in f]
if not formats:
for url_key, format_id in [('URL', 'mp4'), ('HLSURL', 'hls')]:
if url_key in video:
formats.append({
'url': video[url_key],
'format_id': format_id,
})
self._sort_formats(formats)
return { return {
'id': compat_str(video['id']), '_type': 'url_transparent',
'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': video['title'], 'ie_key': 'BrightcoveNew',
'description': clean_html(video['desc']),
'thumbnail': video.get('splash') or video.get('thumb'),
'timestamp': float_or_none(video.get('publishedAt'), 1000),
'duration': float_or_none(video.get('length'), 1000),
'view_count': int_or_none(video.get('views')),
'categories': video.get('tags'),
'formats': formats,
} }

View File

@@ -62,11 +62,9 @@ class DCNBaseIE(InfoExtractor):
r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8', r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
r'<a[^>]+href="rtsp(://[^"]+)"' r'<a[^>]+href="rtsp(://[^"]+)"'
], webpage, 'format url') ], webpage, 'format url')
# TODO: Current DASH formats are broken - $Time$ pattern in formats.extend(self._extract_mpd_formats(
# <SegmentTemplate> not implemented yet format_url_base + '/manifest.mpd',
# formats.extend(self._extract_mpd_formats( video_id, mpd_id='dash', fatal=False))
# format_url_base + '/manifest.mpd',
# video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
format_url_base + '/playlist.m3u8', video_id, 'mp4', format_url_base + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False)) m3u8_entry_protocol, m3u8_id='hls', fatal=False))

View File

@@ -0,0 +1,98 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
extract_attributes,
int_or_none,
parse_age_limit,
unescapeHTML,
)
class DiscoveryGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
'info_dict': {
'id': '57a33c536b66d1cd0345eeb1',
'ext': 'mp4',
'title': 'Kiss First, Ask Questions Later!',
'description': 'md5:fe923ba34050eae468bffae10831cb22',
'duration': 2579,
'series': 'Love at First Kiss',
'season_number': 1,
'episode_number': 1,
'age_limit': 14,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
container = extract_attributes(
self._search_regex(
r'(<div[^>]+class=["\']video-player-container[^>]+>)',
webpage, 'video container'))
video = self._parse_json(
unescapeHTML(container.get('data-video') or container.get('data-json')),
display_id)
title = video['name']
stream = video['stream']
STREAM_URL_SUFFIX = 'streamUrl'
formats = []
for stream_kind in ('', 'hds'):
suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX
stream_url = stream.get('%s%s' % (stream_kind, suffix))
if not stream_url:
continue
if stream_kind == '':
formats.extend(self._extract_m3u8_formats(
stream_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif stream_kind == 'hds':
formats.extend(self._extract_f4m_formats(
stream_url, display_id, f4m_id=stream_kind, fatal=False))
self._sort_formats(formats)
video_id = video.get('id') or display_id
description = video.get('description', {}).get('detailed')
duration = int_or_none(video.get('duration'))
series = video.get('show', {}).get('name')
season_number = int_or_none(video.get('season', {}).get('number'))
episode_number = int_or_none(video.get('episodeNumber'))
tags = video.get('tags')
age_limit = parse_age_limit(video.get('parental', {}).get('rating'))
subtitles = {}
captions = stream.get('captions')
if isinstance(captions, list):
for caption in captions:
subtitle_url = caption.get('fileUrl')
if (not subtitle_url or not isinstance(subtitle_url, compat_str) or
not subtitle_url.startswith('http')):
continue
lang = caption.get('fileLang', 'en')
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'duration': duration,
'series': series,
'season_number': season_number,
'episode_number': episode_number,
'tags': tags,
'age_limit': age_limit,
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -17,8 +17,12 @@ class DreiSatIE(ZDFIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Waidmannsheil', 'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817', 'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'uploader': '3sat', 'uploader': 'SCHWEIZWEIT',
'uploader_id': '100000210',
'upload_date': '20140913' 'upload_date': '20140913'
},
'params': {
'skip_download': True, # m3u8 downloads
} }
}, },
{ {

View File

@@ -3,7 +3,10 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import str_to_int from ..utils import (
NO_DEFAULT,
str_to_int,
)
class DrTuberIE(InfoExtractor): class DrTuberIE(InfoExtractor):
@@ -17,7 +20,6 @@ class DrTuberIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'hot perky blonde naked golf', 'title': 'hot perky blonde naked golf',
'like_count': int, 'like_count': int,
'dislike_count': int,
'comment_count': int, 'comment_count': int,
'categories': ['Babe', 'Blonde', 'Erotic', 'Outdoor', 'Softcore', 'Solo'], 'categories': ['Babe', 'Blonde', 'Erotic', 'Outdoor', 'Softcore', 'Solo'],
'thumbnail': 're:https?://.*\.jpg$', 'thumbnail': 're:https?://.*\.jpg$',
@@ -36,25 +38,29 @@ class DrTuberIE(InfoExtractor):
r'<source src="([^"]+)"', webpage, 'video URL') r'<source src="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex( title = self._html_search_regex(
[r'<p[^>]+class="title_substrate">([^<]+)</p>', r'<title>([^<]+) - \d+'], (r'class="title_watch"[^>]*><p>([^<]+)<',
r'<p[^>]+class="title_substrate">([^<]+)</p>',
r'<title>([^<]+) - \d+'),
webpage, 'title') webpage, 'title')
thumbnail = self._html_search_regex( thumbnail = self._html_search_regex(
r'poster="([^"]+)"', r'poster="([^"]+)"',
webpage, 'thumbnail', fatal=False) webpage, 'thumbnail', fatal=False)
def extract_count(id_, name): def extract_count(id_, name, default=NO_DEFAULT):
return str_to_int(self._html_search_regex( return str_to_int(self._html_search_regex(
r'<span[^>]+(?:class|id)="%s"[^>]*>([\d,\.]+)</span>' % id_, r'<span[^>]+(?:class|id)="%s"[^>]*>([\d,\.]+)</span>' % id_,
webpage, '%s count' % name, fatal=False)) webpage, '%s count' % name, default=default, fatal=False))
like_count = extract_count('rate_likes', 'like') like_count = extract_count('rate_likes', 'like')
dislike_count = extract_count('rate_dislikes', 'dislike') dislike_count = extract_count('rate_dislikes', 'dislike', default=None)
comment_count = extract_count('comments_count', 'comment') comment_count = extract_count('comments_count', 'comment')
cats_str = self._search_regex( cats_str = self._search_regex(
r'<div[^>]+class="categories_list">(.+?)</div>', webpage, 'categories', fatal=False) r'<div[^>]+class="categories_list">(.+?)</div>',
categories = [] if not cats_str else re.findall(r'<a title="([^"]+)"', cats_str) webpage, 'categories', fatal=False)
categories = [] if not cats_str else re.findall(
r'<a title="([^"]+)"', cats_str)
return { return {
'id': video_id, 'id': video_id,

View File

@@ -6,12 +6,13 @@ import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
NO_DEFAULT,
) )
class EllenTVIE(InfoExtractor): class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)' _VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
_TEST = { _TESTS = [{
'url': 'http://www.ellentv.com/videos/0-ipq1gsai/', 'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
'md5': '4294cf98bc165f218aaa0b89e0fd8042', 'md5': '4294cf98bc165f218aaa0b89e0fd8042',
'info_dict': { 'info_dict': {
@@ -22,24 +23,47 @@ class EllenTVIE(InfoExtractor):
'timestamp': 1428035648, 'timestamp': 1428035648,
'upload_date': '20150403', 'upload_date': '20150403',
'uploader_id': 'batchUser', 'uploader_id': 'batchUser',
} },
} }, {
# not available via http://widgets.ellentube.com/
'url': 'http://www.ellentv.com/videos/1-szkgu2m2/',
'info_dict': {
'id': '1_szkgu2m2',
'ext': 'flv',
'title': "Ellen's Amazingly Talented Audience",
'description': 'md5:86ff1e376ff0d717d7171590e273f0a5',
'timestamp': 1255140900,
'upload_date': '20091010',
'uploader_id': 'ellenkaltura@gmail.com',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage( URLS = ('http://widgets.ellentube.com/videos/%s' % video_id, url)
'http://widgets.ellentube.com/videos/%s' % video_id,
video_id)
partner_id = self._search_regex( for num, url_ in enumerate(URLS, 1):
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id') webpage = self._download_webpage(
url_, video_id, fatal=num == len(URLS))
kaltura_id = self._search_regex( default = NO_DEFAULT if num == len(URLS) else None
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)", partner_id = self._search_regex(
r'data-kaltura-entry-id="([^"]+)'], r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id',
webpage, 'kaltura id') default=default)
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id', default=default)
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura') return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura')

View File

@@ -4,9 +4,10 @@ from .common import InfoExtractor
class EngadgetIE(InfoExtractor): class EngadgetIE(InfoExtractor):
_VALID_URL = r'https?://www.engadget.com/video/(?P<id>\d+)' _VALID_URL = r'https?://www.engadget.com/video/(?P<id>[^/?#]+)'
_TEST = { _TESTS = [{
# video with 5min ID
'url': 'http://www.engadget.com/video/518153925/', 'url': 'http://www.engadget.com/video/518153925/',
'md5': 'c6820d4828a5064447a4d9fc73f312c9', 'md5': 'c6820d4828a5064447a4d9fc73f312c9',
'info_dict': { 'info_dict': {
@@ -15,8 +16,12 @@ class EngadgetIE(InfoExtractor):
'title': 'Samsung Galaxy Tab Pro 8.4 Review', 'title': 'Samsung Galaxy Tab Pro 8.4 Review',
}, },
'add_ie': ['FiveMin'], 'add_ie': ['FiveMin'],
} }, {
# video with vidible ID
'url': 'https://www.engadget.com/video/57a28462134aa15a39f0421a/',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
return self.url_result('5min:%s' % video_id) return self.url_result('aol-video:%s' % video_id)

View File

@@ -4,19 +4,23 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
encode_base_n,
ExtractorError,
int_or_none,
parse_duration, parse_duration,
str_to_int, str_to_int,
) )
class EpornerIE(InfoExtractor): class EpornerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)/(?P<display_id>[\w-]+)' _VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/', 'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/',
'md5': '39d486f046212d8e1b911c52ab4691f8', 'md5': '39d486f046212d8e1b911c52ab4691f8',
'info_dict': { 'info_dict': {
'id': '95008', 'id': 'qlDUmNsj6VS',
'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video', 'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Infamous Tiffany Teen Strip Tease Video', 'title': 'Infamous Tiffany Teen Strip Tease Video',
@@ -28,34 +32,72 @@ class EpornerIE(InfoExtractor):
# New (May 2016) URL layout # New (May 2016) URL layout
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/', 'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
display_id = mobj.group('display_id') display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id) webpage, urlh = self._download_webpage_handle(url, display_id)
title = self._html_search_regex(
r'<title>(.*?) - EPORNER', webpage, 'title')
redirect_url = 'http://www.eporner.com/config5/%s' % video_id video_id = self._match_id(compat_str(urlh.geturl()))
player_code = self._download_webpage(
redirect_url, display_id, note='Downloading player config')
sources = self._search_regex( hash = self._search_regex(
r'(?s)sources\s*:\s*\[\s*({.+?})\s*\]', player_code, 'sources') r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')
title = self._og_search_title(webpage, default=None) or self._html_search_regex(
r'<title>(.+?) - EPORNER', webpage, 'title')
# Reverse engineered from vjs.js
def calc_hash(s):
return ''.join((encode_base_n(int(s[lb:lb + 8], 16), 36) for lb in range(0, 32, 8)))
video = self._download_json(
'http://www.eporner.com/xhr/video/%s' % video_id,
display_id, note='Downloading video JSON',
query={
'hash': calc_hash(hash),
'device': 'generic',
'domain': 'www.eporner.com',
'fallback': 'false',
})
if video.get('available') is False:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, video['message']), expected=True)
sources = video['sources']
formats = [] formats = []
for video_url, format_id in re.findall(r'file\s*:\s*"([^"]+)",\s*label\s*:\s*"([^"]+)"', sources): for kind, formats_dict in sources.items():
fmt = { if not isinstance(formats_dict, dict):
'url': video_url, continue
'format_id': format_id, for format_id, format_dict in formats_dict.items():
} if not isinstance(format_dict, dict):
m = re.search(r'^(\d+)', format_id) continue
if m: src = format_dict.get('src')
fmt['height'] = int(m.group(1)) if not isinstance(src, compat_str) or not src.startswith('http'):
formats.append(fmt) continue
if kind == 'hls':
formats.extend(self._extract_m3u8_formats(
src, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=kind, fatal=False))
else:
height = int_or_none(self._search_regex(
r'(\d+)[pP]', format_id, 'height', default=None))
fps = int_or_none(self._search_regex(
r'(\d+)fps', format_id, 'fps', default=None))
formats.append({
'url': src,
'format_id': format_id,
'height': height,
'fps': fps,
})
self._sort_formats(formats) self._sort_formats(formats)
duration = parse_duration(self._html_search_meta('duration', webpage)) duration = parse_duration(self._html_search_meta('duration', webpage))

View File

@@ -44,6 +44,7 @@ from .appletrailers import (
AppleTrailersSectionIE, AppleTrailersSectionIE,
) )
from .archiveorg import ArchiveOrgIE from .archiveorg import ArchiveOrgIE
from .arkena import ArkenaIE
from .ard import ( from .ard import (
ARDIE, ARDIE,
ARDMediathekIE, ARDMediathekIE,
@@ -139,9 +140,9 @@ from .chirbit import (
ChirbitProfileIE, ChirbitProfileIE,
) )
from .cinchcast import CinchcastIE from .cinchcast import CinchcastIE
from .cliprs import ClipRsIE
from .clipfish import ClipfishIE from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE from .cliphunter import CliphunterIE
from .cliprs import ClipRsIE
from .clipsyndicate import ClipsyndicateIE from .clipsyndicate import ClipsyndicateIE
from .closertotruth import CloserToTruthIE from .closertotruth import CloserToTruthIE
from .cloudy import CloudyIE from .cloudy import CloudyIE
@@ -156,7 +157,12 @@ from .cnn import (
) )
from .coub import CoubIE from .coub import CoubIE
from .collegerama import CollegeRamaIE from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE from .comedycentral import (
ComedyCentralIE,
ComedyCentralShortnameIE,
ComedyCentralTVIE,
ToshIE,
)
from .comcarcoff import ComCarCoffIE from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import RtmpIE from .commonprotocols import RtmpIE
@@ -215,6 +221,7 @@ from .dvtv import DVTVIE
from .dumpert import DumpertIE from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE from .discovery import DiscoveryIE
from .discoverygo import DiscoveryGoIE
from .dispeak import DigitallySpeakingIE from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE from .dropbox import DropboxIE
from .dw import ( from .dw import (
@@ -256,6 +263,7 @@ from .fivemin import FiveMinIE
from .fivetv import FiveTVIE from .fivetv import FiveTVIE
from .fktv import FKTVIE from .fktv import FKTVIE
from .flickr import FlickrIE from .flickr import FlickrIE
from .flipagram import FlipagramIE
from .folketinget import FolketingetIE from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE from .footyroom import FootyRoomIE
from .formula1 import Formula1IE from .formula1 import Formula1IE
@@ -283,7 +291,6 @@ from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE from .fusion import FusionIE
from .gameinformer import GameInformerIE from .gameinformer import GameInformerIE
from .gamekings import GamekingsIE
from .gameone import ( from .gameone import (
GameOneIE, GameOneIE,
GameOnePlaylistIE, GameOnePlaylistIE,
@@ -304,7 +311,6 @@ from .globo import (
) )
from .godtube import GodTubeIE from .godtube import GodTubeIE
from .godtv import GodTVIE from .godtv import GodTVIE
from .goldenmoustache import GoldenMoustacheIE
from .golem import GolemIE from .golem import GolemIE
from .googledrive import GoogleDriveIE from .googledrive import GoogleDriveIE
from .googleplus import GooglePlusIE from .googleplus import GooglePlusIE
@@ -368,6 +374,7 @@ from .jove import JoveIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE from .jpopsukitv import JpopsukiIE
from .kaltura import KalturaIE from .kaltura import KalturaIE
from .kamcord import KamcordIE
from .kanalplay import KanalPlayIE from .kanalplay import KanalPlayIE
from .kankan import KankanIE from .kankan import KankanIE
from .karaoketv import KaraoketvIE from .karaoketv import KaraoketvIE
@@ -391,6 +398,10 @@ from .kuwo import (
) )
from .la7 import LA7IE from .la7 import LA7IE
from .laola1tv import Laola1TvIE from .laola1tv import Laola1TvIE
from .lcp import (
LcpPlayIE,
LcpIE,
)
from .learnr import LearnrIE from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE from .lecture2go import Lecture2GoIE
from .lemonde import LemondeIE from .lemonde import LemondeIE
@@ -469,7 +480,6 @@ from .msn import MSNIE
from .mtv import ( from .mtv import (
MTVIE, MTVIE,
MTVServicesEmbeddedIE, MTVServicesEmbeddedIE,
MTVIggyIE,
MTVDEIE, MTVDEIE,
) )
from .muenchentv import MuenchenTVIE from .muenchentv import MuenchenTVIE
@@ -481,8 +491,9 @@ from .myvi import MyviIE
from .myvideo import MyVideoIE from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE from .myvidster import MyVidsterIE
from .nationalgeographic import ( from .nationalgeographic import (
NationalGeographicVideoIE,
NationalGeographicIE, NationalGeographicIE,
NationalGeographicChannelIE, NationalGeographicEpisodeGuideIE,
) )
from .naver import NaverIE from .naver import NaverIE
from .nba import NBAIE from .nba import NBAIE
@@ -519,7 +530,6 @@ from .nextmedia import (
NextMediaActionNewsIE, NextMediaActionNewsIE,
AppleDailyIE, AppleDailyIE,
) )
from .nextmovie import NextMovieIE
from .nfb import NFBIE from .nfb import NFBIE
from .nfl import NFLIE from .nfl import NFLIE
from .nhl import ( from .nhl import (
@@ -535,6 +545,8 @@ from .nick import (
from .niconico import NiconicoIE, NiconicoPlaylistIE from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import NineCNineMediaIE from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE from .ninegag import NineGagIE
from .ninenow import NineNowIE
from .nintendo import NintendoIE
from .noco import NocoIE from .noco import NocoIE
from .normalboots import NormalbootsIE from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE from .nosvideo import NosVideoIE
@@ -579,8 +591,13 @@ from .nytimes import (
NYTimesArticleIE, NYTimesArticleIE,
) )
from .nuvid import NuvidIE from .nuvid import NuvidIE
from .odatv import OdaTVIE
from .odnoklassniki import OdnoklassnikiIE from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE from .oktoberfesttv import OktoberfestTVIE
from .onet import (
OnetIE,
OnetChannelIE,
)
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .ooyala import ( from .ooyala import (
OoyalaIE, OoyalaIE,
@@ -619,6 +636,7 @@ from .pluralsight import (
PluralsightCourseIE, PluralsightCourseIE,
) )
from .podomatic import PodomaticIE from .podomatic import PodomaticIE
from .pokemon import PokemonIE
from .polskieradio import PolskieRadioIE from .polskieradio import PolskieRadioIE
from .porn91 import Porn91IE from .porn91 import Porn91IE
from .pornhd import PornHdIE from .pornhd import PornHdIE
@@ -674,16 +692,19 @@ from .rice import RICEIE
from .ringtv import RingTVIE from .ringtv import RingTVIE
from .ro220 import Ro220IE from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE from .rockstargames import RockstarGamesIE
from .roosterteeth import RoosterTeethIE
from .rottentomatoes import RottenTomatoesIE from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE from .roxwel import RoxwelIE
from .rozhlas import RozhlasIE
from .rtbf import RTBFIE from .rtbf import RTBFIE
from .rte import RteIE, RteRadioIE from .rte import RteIE, RteRadioIE
from .rtlnl import RtlNlIE from .rtlnl import RtlNlIE
from .rtl2 import RTL2IE from .rtl2 import RTL2IE
from .rtp import RTPIE from .rtp import RTPIE
from .rts import RTSIE from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
from .rtvnh import RTVNHIE from .rtvnh import RTVNHIE
from .rudo import RudoIE
from .ruhd import RUHDIE from .ruhd import RUHDIE
from .ruleporn import RulePornIE from .ruleporn import RulePornIE
from .rutube import ( from .rutube import (
@@ -734,6 +755,7 @@ from .smotri import (
) )
from .snotr import SnotrIE from .snotr import SnotrIE
from .sohu import SohuIE from .sohu import SohuIE
from .sonyliv import SonyLIVIE
from .soundcloud import ( from .soundcloud import (
SoundcloudIE, SoundcloudIE,
SoundcloudSetIE, SoundcloudSetIE,
@@ -773,6 +795,7 @@ from .srmediathek import SRMediathekIE
from .ssa import SSAIE from .ssa import SSAIE
from .stanfordoc import StanfordOpenClassroomIE from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE from .steam import SteamIE
from .streamable import StreamableIE
from .streamcloud import StreamcloudIE from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE from .streetvoice import StreetVoiceIE
@@ -872,6 +895,7 @@ from .tvc import (
from .tvigle import TvigleIE from .tvigle import TvigleIE
from .tvland import TVLandIE from .tvland import TVLandIE
from .tvp import ( from .tvp import (
TVPEmbedIE,
TVPIE, TVPIE,
TVPSeriesIE, TVPSeriesIE,
) )
@@ -904,6 +928,7 @@ from .udemy import (
from .udn import UDNEmbedIE from .udn import UDNEmbedIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .unistra import UnistraIE from .unistra import UnistraIE
from .uol import UOLIE
from .urort import UrortIE from .urort import UrortIE
from .urplay import URPlayIE from .urplay import URPlayIE
from .usatoday import USATodayIE from .usatoday import USATodayIE
@@ -981,9 +1006,11 @@ from .viki import (
from .vk import ( from .vk import (
VKIE, VKIE,
VKUserVideosIE, VKUserVideosIE,
VKWallPostIE,
) )
from .vlive import VLiveIE from .vlive import VLiveIE
from .vodlocker import VodlockerIE from .vodlocker import VodlockerIE
from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE from .voxmedia import VoxMediaIE
from .vporn import VpornIE from .vporn import VpornIE
@@ -1066,6 +1093,7 @@ from .youtube import (
YoutubeSearchDateIE, YoutubeSearchDateIE,
YoutubeSearchIE, YoutubeSearchIE,
YoutubeSearchURLIE, YoutubeSearchURLIE,
YoutubeSharedVideoIE,
YoutubeShowIE, YoutubeShowIE,
YoutubeSubscriptionsIE, YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE, YoutubeTruncatedIDIE,

View File

@@ -27,7 +27,7 @@ class FacebookIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?: (?:
https?:// https?://
(?:\w+\.)?facebook\.com/ (?:[\w-]+\.)?facebook\.com/
(?:[^#]*?\#!/)? (?:[^#]*?\#!/)?
(?: (?:
(?: (?:
@@ -127,6 +127,9 @@ class FacebookIE(InfoExtractor):
}, { }, {
'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/', 'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@@ -219,12 +222,25 @@ class FacebookIE(InfoExtractor):
BEFORE = '{swf.addParam(param[0], param[1]);});' BEFORE = '{swf.addParam(param[0], param[1]);});'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});' AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER), webpage) PATTERN = re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER)
if m:
swf_params = m.group(1).replace('\\\\', '\\').replace('\\"', '"') for m in re.findall(PATTERN, webpage):
swf_params = m.replace('\\\\', '\\').replace('\\"', '"')
data = dict(json.loads(swf_params)) data = dict(json.loads(swf_params))
params_raw = compat_urllib_parse_unquote(data['params']) params_raw = compat_urllib_parse_unquote(data['params'])
video_data = json.loads(params_raw)['video_data'] video_data_candidate = json.loads(params_raw)['video_data']
for _, f in video_data_candidate.items():
if not f:
continue
if isinstance(f, dict):
f = [f]
if not isinstance(f, list):
continue
if f[0].get('video_id') == video_id:
video_data = video_data_candidate
break
if video_data:
break
def video_data_list2dict(video_data): def video_data_list2dict(video_data):
ret = {} ret = {}

View File

@@ -1,24 +1,11 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
parse_duration,
replace_extension,
)
class FiveMinIE(InfoExtractor): class FiveMinIE(InfoExtractor):
IE_NAME = '5min' IE_NAME = '5min'
_VALID_URL = r'(?:5min:(?P<id>\d+)(?::(?P<sid>\d+))?|https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?P<query>.*))' _VALID_URL = r'(?:5min:|https?://(?:[^/]*?5min\.com/|delivery\.vidible\.tv/aol)(?:(?:Scripts/PlayerSeed\.js|playerseed/?)?\?.*?playList=)?)(?P<id>\d+)'
_TESTS = [ _TESTS = [
{ {
@@ -29,8 +16,16 @@ class FiveMinIE(InfoExtractor):
'id': '518013791', 'id': '518013791',
'ext': 'mp4', 'ext': 'mp4',
'title': 'iPad Mini with Retina Display Review', 'title': 'iPad Mini with Retina Display Review',
'description': 'iPad mini with Retina Display review',
'duration': 177, 'duration': 177,
'uploader': 'engadget',
'upload_date': '20131115',
'timestamp': 1384515288,
}, },
'params': {
# m3u8 download
'skip_download': True,
}
}, },
{ {
# From http://on.aol.com/video/how-to-make-a-next-level-fruit-salad-518086247 # From http://on.aol.com/video/how-to-make-a-next-level-fruit-salad-518086247
@@ -44,108 +39,16 @@ class FiveMinIE(InfoExtractor):
}, },
'skip': 'no longer available', 'skip': 'no longer available',
}, },
{
'url': 'http://embed.5min.com/518726732/',
'only_matching': True,
},
{
'url': 'http://delivery.vidible.tv/aol?playList=518013791',
'only_matching': True,
}
] ]
_ERRORS = {
'ErrorVideoNotExist': 'We\'re sorry, but the video you are trying to watch does not exist.',
'ErrorVideoNoLongerAvailable': 'We\'re sorry, but the video you are trying to watch is no longer available.',
'ErrorVideoRejected': 'We\'re sorry, but the video you are trying to watch has been removed.',
'ErrorVideoUserNotGeo': 'We\'re sorry, but the video you are trying to watch cannot be viewed from your current location.',
'ErrorVideoLibraryRestriction': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
'ErrorExposurePermission': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
}
_QUALITIES = {
1: {
'width': 640,
'height': 360,
},
2: {
'width': 854,
'height': 480,
},
4: {
'width': 1280,
'height': 720,
},
8: {
'width': 1920,
'height': 1080,
},
16: {
'width': 640,
'height': 360,
},
32: {
'width': 854,
'height': 480,
},
64: {
'width': 1280,
'height': 720,
},
128: {
'width': 640,
'height': 360,
},
}
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id') return self.url_result('aol-video:%s' % video_id)
sid = mobj.group('sid')
if mobj.group('query'):
qs = compat_parse_qs(mobj.group('query'))
if not qs.get('playList'):
raise ExtractorError('Invalid URL', expected=True)
video_id = qs['playList'][0]
if qs.get('sid'):
sid = qs['sid'][0]
embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id
if not sid:
embed_page = self._download_webpage(embed_url, video_id,
'Downloading embed page')
sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid')
response = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?' +
compat_urllib_parse_urlencode({
'func': 'GetResults',
'playlist': video_id,
'sid': sid,
'isPlayerSeed': 'true',
'url': embed_url,
}),
video_id)
if not response['success']:
raise ExtractorError(
'%s said: %s' % (
self.IE_NAME,
self._ERRORS.get(response['errorMessage'], response['errorMessage'])),
expected=True)
info = response['binding'][0]
formats = []
parsed_video_url = compat_urllib_parse_urlparse(compat_parse_qs(
compat_urllib_parse_urlparse(info['EmbededURL']).query)['videoUrl'][0])
for rendition in info['Renditions']:
if rendition['RenditionType'] == 'aac' or rendition['RenditionType'] == 'm3u8':
continue
else:
rendition_url = compat_urlparse.urlunparse(parsed_video_url._replace(path=replace_extension(parsed_video_url.path.replace('//', '/%s/' % rendition['ID']), rendition['RenditionType'])))
quality = self._QUALITIES.get(rendition['ID'], {})
formats.append({
'format_id': '%s-%d' % (rendition['RenditionType'], rendition['ID']),
'url': rendition_url,
'width': quality.get('width'),
'height': quality.get('height'),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info['Title'],
'thumbnail': info.get('ThumbURL'),
'duration': parse_duration(info.get('Duration')),
'formats': formats,
}

View File

@@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
try_get,
unified_timestamp,
)
class FlipagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://flipagram.com/f/nyvTSJMKId',
'md5': '888dcf08b7ea671381f00fab74692755',
'info_dict': {
'id': 'nyvTSJMKId',
'ext': 'mp4',
'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
'duration': 35.571,
'timestamp': 1461244995,
'upload_date': '20160421',
'uploader': 'kitty juria',
'uploader_id': 'sjuria101',
'creator': 'kitty juria',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
'comments': list,
'formats': 'mincount:2',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(
self._search_regex(
r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
video_id)
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, default={})
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')
formats = [{
'url': video['url'],
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': int_or_none(video_data.get('size')),
}]
preview_url = try_get(
flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
if preview_url:
formats.append({
'url': preview_url,
'ext': 'm4a',
'vcodec': 'none',
})
self._sort_formats(formats)
counts = flipagram.get('counts', {})
user = flipagram.get('user', {})
video_data = flipagram.get('video', {})
thumbnails = [{
'url': self._proto_relative_url(cover['url']),
'width': int_or_none(cover.get('width')),
'height': int_or_none(cover.get('height')),
'filesize': int_or_none(cover.get('size')),
} for cover in flipagram.get('covers', []) if cover.get('url')]
# Note that this only retrieves comments that are initally loaded.
# For videos with large amounts of comments, most won't be retrieved.
comments = []
for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
text = comment.get('comment')
if not text or not isinstance(text, list):
continue
comments.append({
'author': comment.get('user', {}).get('name'),
'author_id': comment.get('user', {}).get('username'),
'id': comment.get('id'),
'text': text[0],
'timestamp': unified_timestamp(comment.get('created')),
})
return {
'id': video_id,
'title': title,
'description': description,
'duration': float_or_none(flipagram.get('duration'), 1000),
'thumbnails': thumbnails,
'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
'uploader': user.get('name'),
'uploader_id': user.get('username'),
'creator': user.get('name'),
'view_count': int_or_none(counts.get('plays')),
'like_count': int_or_none(counts.get('likes')),
'repost_count': int_or_none(counts.get('reflips')),
'comment_count': int_or_none(counts.get('comments')),
'comments': comments,
'formats': formats,
}

View File

@@ -5,8 +5,8 @@ from .common import InfoExtractor
class Formula1IE(InfoExtractor): class Formula1IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?formula1\.com/content/fom-website/en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html' _VALID_URL = r'https?://(?:www\.)?formula1\.com/(?:content/fom-website/)?en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
_TEST = { _TESTS = [{
'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html', 'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'md5': '8c79e54be72078b26b89e0e111c0502b', 'md5': '8c79e54be72078b26b89e0e111c0502b',
'info_dict': { 'info_dict': {
@@ -15,7 +15,10 @@ class Formula1IE(InfoExtractor):
'title': 'Race highlights - Spain 2016', 'title': 'Race highlights - Spain 2016',
}, },
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
} }, {
'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)

View File

@@ -2,7 +2,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import smuggle_url from ..utils import (
smuggle_url,
update_url_query,
)
class FOXIE(InfoExtractor): class FOXIE(InfoExtractor):
@@ -29,11 +32,12 @@ class FOXIE(InfoExtractor):
release_url = self._parse_json(self._search_regex( release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'), r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url'] + '&switch=http' video_id)['release_url']
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'ie_key': 'ThePlatform', 'ie_key': 'ThePlatform',
'url': smuggle_url(release_url, {'force_smil_url': True}), 'url': smuggle_url(update_url_query(
release_url, {'switch': 'http'}), {'force_smil_url': True}),
'id': video_id, 'id': video_id,
} }

View File

@@ -14,7 +14,10 @@ from ..utils import (
parse_duration, parse_duration,
determine_ext, determine_ext,
) )
from .dailymotion import DailymotionCloudIE from .dailymotion import (
DailymotionIE,
DailymotionCloudIE,
)
class FranceTVBaseInfoExtractor(InfoExtractor): class FranceTVBaseInfoExtractor(InfoExtractor):
@@ -128,7 +131,7 @@ class PluzzIE(FranceTVBaseInfoExtractor):
class FranceTvInfoIE(FranceTVBaseInfoExtractor): class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetvinfo.fr' IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/.*/(?P<title>.+)\.html' _VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/(?:[^/]+/)*(?P<title>[^/?#&.]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html', 'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
@@ -188,6 +191,24 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# Dailymotion embed
'url': 'http://www.francetvinfo.fr/politique/notre-dame-des-landes/video-sur-france-inter-cecile-duflot-denonce-le-regard-meprisant-de-patrick-cohen_1520091.html',
'md5': 'ee7f1828f25a648addc90cb2687b1f12',
'info_dict': {
'id': 'x4iiko0',
'ext': 'mp4',
'title': 'NDDL, référendum, Brexit : Cécile Duflot répond à Patrick Cohen',
'description': 'Au lendemain de la victoire du "oui" au référendum sur l\'aéroport de Notre-Dame-des-Landes, l\'ancienne ministre écologiste est l\'invitée de Patrick Cohen. Plus d\'info : https://www.franceinter.fr/emissions/le-7-9/le-7-9-27-juin-2016',
'timestamp': 1467011958,
'upload_date': '20160627',
'uploader': 'France Inter',
'uploader_id': 'x2q2ez',
},
'add_ie': ['Dailymotion'],
}, {
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -197,7 +218,13 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage) dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url: if dmcloud_url:
return self.url_result(dmcloud_url, 'DailymotionCloud') return self.url_result(dmcloud_url, DailymotionCloudIE.ie_key())
dailymotion_urls = DailymotionIE._extract_urls(webpage)
if dailymotion_urls:
return self.playlist_result([
self.url_result(dailymotion_url, DailymotionIE.ie_key())
for dailymotion_url in dailymotion_urls])
video_id, catalogue = self._search_regex( video_id, catalogue = self._search_regex(
(r'id-video=([^@]+@[^"]+)', (r'id-video=([^@]+@[^"]+)',

View File

@@ -1,76 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
xpath_text,
xpath_with_ns,
)
from .youtube import YoutubeIE
class GamekingsIE(InfoExtractor):
_VALID_URL = r'https?://www\.gamekings\.nl/(?:videos|nieuws)/(?P<id>[^/]+)'
_TESTS = [{
# YouTube embed video
'url': 'http://www.gamekings.nl/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
'md5': '5208d3a17adeaef829a7861887cb9029',
'info_dict': {
'id': 'HkSQKetlGOU',
'ext': 'mp4',
'title': 'Phoenix Wright: Ace Attorney - Dual Destinies Review',
'description': 'md5:db88c0e7f47e9ea50df3271b9dc72e1d',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader_id': 'UCJugRGo4STYMeFr5RoOShtQ',
'uploader': 'Gamekings Vault',
'upload_date': '20151123',
},
'add_ie': ['Youtube'],
}, {
# vimeo video
'url': 'http://www.gamekings.nl/videos/the-legend-of-zelda-majoras-mask/',
'md5': '12bf04dfd238e70058046937657ea68d',
'info_dict': {
'id': 'the-legend-of-zelda-majoras-mask',
'ext': 'mp4',
'title': 'The Legend of Zelda: Majoras Mask',
'description': 'md5:9917825fe0e9f4057601fe1e38860de3',
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'http://www.gamekings.nl/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
playlist_id = self._search_regex(
r'gogoVideo\([^,]+,\s*"([^"]+)', webpage, 'playlist id')
# Check if a YouTube embed is used
if YoutubeIE.suitable(playlist_id):
return self.url_result(playlist_id, ie='Youtube')
playlist = self._download_xml(
'http://www.gamekings.tv/wp-content/themes/gk2010/rss_playlist.php?id=%s' % playlist_id,
video_id)
NS_MAP = {
'jwplayer': 'http://rss.jwpcdn.com/'
}
item = playlist.find('./channel/item')
thumbnail = xpath_text(item, xpath_with_ns('./jwplayer:image', NS_MAP), 'thumbnail')
video_url = item.find(xpath_with_ns('./jwplayer:source', NS_MAP)).get('file')
return {
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': thumbnail,
}

View File

@@ -28,10 +28,13 @@ class GameSpotIE(OnceIE):
'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/', 'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/',
'info_dict': { 'info_dict': {
'id': 'gs-2300-6424837', 'id': 'gs-2300-6424837',
'ext': 'flv', 'ext': 'mp4',
'title': 'The Witcher 3: Wild Hunt [Xbox ONE] - Now Playing', 'title': 'Now Playing - The Witcher 3: Wild Hunt',
'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.', 'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.',
}, },
'params': {
'skip_download': True, # m3u8 downloads
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -49,7 +49,10 @@ from .pornhub import PornHubIE
from .xhamster import XHamsterEmbedIE from .xhamster import XHamsterEmbedIE
from .tnaflix import TNAFlixNetworkEmbedIE from .tnaflix import TNAFlixNetworkEmbedIE
from .vimeo import VimeoIE from .vimeo import VimeoIE
from .dailymotion import DailymotionCloudIE from .dailymotion import (
DailymotionIE,
DailymotionCloudIE,
)
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .viewlift import ViewLiftEmbedIE from .viewlift import ViewLiftEmbedIE
from .screenwavemedia import ScreenwaveMediaIE from .screenwavemedia import ScreenwaveMediaIE
@@ -59,6 +62,7 @@ from .videomore import VideomoreIE
from .googledrive import GoogleDriveIE from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .arkena import ArkenaIE
from .instagram import InstagramIE from .instagram import InstagramIE
from .liveleak import LiveLeakIE from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE from .threeqsdn import ThreeQSDNIE
@@ -67,6 +71,7 @@ from .vessel import VesselIE
from .kaltura import KalturaIE from .kaltura import KalturaIE
from .eagleplatform import EaglePlatformIE from .eagleplatform import EaglePlatformIE
from .facebook import FacebookIE from .facebook import FacebookIE
from .soundcloud import SoundcloudIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@@ -470,7 +475,7 @@ class GenericIE(InfoExtractor):
'url': 'http://www.vestifinance.ru/articles/25753', 'url': 'http://www.vestifinance.ru/articles/25753',
'info_dict': { 'info_dict': {
'id': '25753', 'id': '25753',
'title': 'Вести Экономика ― Прямые трансляции с Форума-выставки "Госзаказ-2013"', 'title': 'Прямые трансляции с Форума-выставки "Госзаказ-2013"',
}, },
'playlist': [{ 'playlist': [{
'info_dict': { 'info_dict': {
@@ -637,6 +642,8 @@ class GenericIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Key and Peele|October 10, 2012|2|203|Liam Neesons - Uncensored', 'title': 'Key and Peele|October 10, 2012|2|203|Liam Neesons - Uncensored',
'description': 'Two valets share their love for movie star Liam Neesons.', 'description': 'Two valets share their love for movie star Liam Neesons.',
'timestamp': 1349922600,
'upload_date': '20121011',
}, },
}, },
# YouTube embed via <data-embed-url=""> # YouTube embed via <data-embed-url="">
@@ -778,6 +785,15 @@ class GenericIE(InfoExtractor):
'upload_date': '20141029', 'upload_date': '20141029',
} }
}, },
# Soundcloud multiple embeds
{
'url': 'http://www.guitarplayer.com/lessons/1014/legato-workout-one-hour-to-more-fluid-performance---tab/52809',
'info_dict': {
'id': '52809',
'title': 'Guitar Essentials: Legato Workout—One-Hour to Fluid Performance | TAB + AUDIO',
},
'playlist_mincount': 7,
},
# Livestream embed # Livestream embed
{ {
'url': 'http://www.esa.int/Our_Activities/Space_Science/Rosetta/Philae_comet_touch-down_webcast', 'url': 'http://www.esa.int/Our_Activities/Space_Science/Rosetta/Philae_comet_touch-down_webcast',
@@ -853,6 +869,7 @@ class GenericIE(InfoExtractor):
'description': 'md5:601cb790edd05908957dae8aaa866465', 'description': 'md5:601cb790edd05908957dae8aaa866465',
'upload_date': '20150220', 'upload_date': '20150220',
}, },
'skip': 'All The Daily Show URLs now redirect to http://www.cc.com/shows/',
}, },
# jwplayer YouTube # jwplayer YouTube
{ {
@@ -1246,6 +1263,20 @@ class GenericIE(InfoExtractor):
'uploader': 'www.hudl.com', 'uploader': 'www.hudl.com',
}, },
}, },
# twitter:player:stream embed
{
'url': 'http://www.rtl.be/info/video/589263.aspx?CategoryID=288',
'info_dict': {
'id': 'master',
'ext': 'mp4',
'title': 'Une nouvelle espèce de dinosaure découverte en Argentine',
'uploader': 'www.rtl.be',
},
'params': {
# m3u8 downloads
'skip_download': True,
},
},
# twitter:player embed # twitter:player embed
{ {
'url': 'http://www.theatlantic.com/video/index/484130/what-do-black-holes-sound-like/', 'url': 'http://www.theatlantic.com/video/index/484130/what-do-black-holes-sound-like/',
@@ -1295,6 +1326,70 @@ class GenericIE(InfoExtractor):
'uploader': 'cylus cyrus', 'uploader': 'cylus cyrus',
}, },
}, },
{
# video stored on custom kaltura server
'url': 'http://www.expansion.com/multimedia/videos.html?media=EQcM30NHIPv',
'md5': '537617d06e64dfed891fa1593c4b30cc',
'info_dict': {
'id': '0_1iotm5bh',
'ext': 'mp4',
'title': 'Elecciones británicas: 5 lecciones para Rajoy',
'description': 'md5:435a89d68b9760b92ce67ed227055f16',
'uploader_id': 'videos.expansion@el-mundo.net',
'upload_date': '20150429',
'timestamp': 1430303472,
},
'add_ie': ['Kaltura'],
},
{
# Non-standard Vimeo embed
'url': 'https://openclassrooms.com/courses/understanding-the-web',
'md5': '64d86f1c7d369afd9a78b38cbb88d80a',
'info_dict': {
'id': '148867247',
'ext': 'mp4',
'title': 'Understanding the web - Teaser',
'description': 'This is "Understanding the web - Teaser" by openclassrooms on Vimeo, the home for high quality videos and the people who love them.',
'upload_date': '20151214',
'uploader': 'OpenClassrooms',
'uploader_id': 'openclassrooms',
},
'add_ie': ['Vimeo'],
},
{
'url': 'https://support.arkena.com/display/PLAY/Ways+to+embed+your+video',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
'info_dict': {
'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe',
'ext': 'mp4',
'title': 'Big Buck Bunny',
'description': 'Royalty free test video',
'timestamp': 1432816365,
'upload_date': '20150528',
'is_live': False,
},
'params': {
'skip_download': True,
},
'add_ie': [ArkenaIE.ie_key()],
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
# 'url': 'https://flipagram.com/f/nyvTSJMKId',
# 'md5': '888dcf08b7ea671381f00fab74692755',
# 'info_dict': {
# 'id': 'nyvTSJMKId',
# 'ext': 'mp4',
# 'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
# 'description': '#love for cats.',
# 'timestamp': 1461244995,
# 'upload_date': '20160421',
# },
# 'params': {
# 'force_generic_extractor': True,
# },
# }
] ]
def report_following_redirect(self, new_url): def report_following_redirect(self, new_url):
@@ -1658,12 +1753,9 @@ class GenericIE(InfoExtractor):
if matches: if matches:
return _playlist_from_matches(matches, lambda m: m[-1]) return _playlist_from_matches(matches, lambda m: m[-1])
# Look for embedded Dailymotion player matches = DailymotionIE._extract_urls(webpage)
matches = re.findall(
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
if matches: if matches:
return _playlist_from_matches( return _playlist_from_matches(matches)
matches, lambda m: unescapeHTML(m[1]))
# Look for embedded Dailymotion playlist player (#3822) # Look for embedded Dailymotion playlist player (#3822)
m = re.search( m = re.search(
@@ -1917,12 +2009,9 @@ class GenericIE(InfoExtractor):
return self.url_result(myvi_url) return self.url_result(myvi_url)
# Look for embedded soundcloud player # Look for embedded soundcloud player
mobj = re.search( soundcloud_urls = SoundcloudIE._extract_urls(webpage)
r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"', if soundcloud_urls:
webpage) return _playlist_from_matches(soundcloud_urls, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url)
# Look for embedded mtvservices player # Look for embedded mtvservices player
mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage) mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
@@ -2085,6 +2174,11 @@ class GenericIE(InfoExtractor):
if digiteka_url: if digiteka_url:
return self.url_result(self._proto_relative_url(digiteka_url), DigitekaIE.ie_key()) return self.url_result(self._proto_relative_url(digiteka_url), DigitekaIE.ie_key())
# Look for Arkena embeds
arkena_url = ArkenaIE._extract_url(webpage)
if arkena_url:
return self.url_result(arkena_url, ArkenaIE.ie_key())
# Look for Limelight embeds # Look for Limelight embeds
mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage) mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage)
if mobj: if mobj:
@@ -2113,6 +2207,14 @@ class GenericIE(InfoExtractor):
return self.url_result( return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group(1))), 'Vine') self._proto_relative_url(unescapeHTML(mobj.group(1))), 'Vine')
# Look for VODPlatform embeds
mobj = re.search(
r'<iframe[^>]+src=[\'"]((?:https?:)?//(?:www\.)?vod-platform\.net/embed/[^/?#]+)',
webpage)
if mobj is not None:
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group(1))), 'VODPlatform')
# Look for Instagram embeds # Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage) instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None: if instagram_embed_url is not None:
@@ -2137,10 +2239,18 @@ class GenericIE(InfoExtractor):
'uploader': video_uploader, 'uploader': video_uploader,
} }
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser # Looking for http://schema.org/VideoObject
embed_url = self._html_search_meta('twitter:player', webpage, default=None) json_ld = self._search_json_ld(
if embed_url: webpage, video_id, default={}, expected_type='VideoObject')
return self.url_result(embed_url) if json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit
})
info_dict.update(json_ld)
return info_dict
def check_video(vurl): def check_video(vurl):
if YoutubeIE.suitable(vurl): if YoutubeIE.suitable(vurl):
@@ -2185,6 +2295,9 @@ class GenericIE(InfoExtractor):
r"cinerama\.embedPlayer\(\s*\'[^']+\',\s*'([^']+)'", webpage) r"cinerama\.embedPlayer\(\s*\'[^']+\',\s*'([^']+)'", webpage)
if not found: if not found:
# Try to find twitter cards info # Try to find twitter cards info
# twitter:player:stream should be checked before twitter:player since
# it is expected to contain a raw stream (see
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
found = filter_video(re.findall( found = filter_video(re.findall(
r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)) r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage))
if not found: if not found:
@@ -2218,6 +2331,15 @@ class GenericIE(InfoExtractor):
'_type': 'url', '_type': 'url',
'url': new_url, 'url': new_url,
} }
if not found:
# twitter:player is a https URL to iframe player that may or may not
# be supported by youtube-dl thus this is checked the very last (see
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
embed_url = self._html_search_meta('twitter:player', webpage, default=None)
if embed_url:
return self.url_result(embed_url)
if not found: if not found:
raise UnsupportedError(url) raise UnsupportedError(url)

View File

@@ -1,48 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class GoldenMoustacheIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?goldenmoustache\.com/(?P<display_id>[\w-]+)-(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.goldenmoustache.com/suricate-le-poker-3700/',
'md5': '0f904432fa07da5054d6c8beb5efb51a',
'info_dict': {
'id': '3700',
'ext': 'mp4',
'title': 'Suricate - Le Poker',
'description': 'md5:3d1f242f44f8c8cb0a106f1fd08e5dc9',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'url': 'http://www.goldenmoustache.com/le-lab-tout-effacer-mc-fly-et-carlito-55249/',
'md5': '27f0c50fb4dd5f01dc9082fc67cd5700',
'info_dict': {
'id': '55249',
'ext': 'mp4',
'title': 'Le LAB - Tout Effacer (Mc Fly et Carlito)',
'description': 'md5:9b7fbf11023fb2250bd4b185e3de3b2a',
'thumbnail': 're:^https?://.*\.(?:png|jpg)$',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'data-src-type="mp4" data-src="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex(
r'<title>(.*?)(?: - Golden Moustache)?</title>', webpage, 'title')
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage)
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -36,7 +36,6 @@ class InstagramIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'BA-pQFBG8HZ', 'id': 'BA-pQFBG8HZ',
'ext': 'mp4', 'ext': 'mp4',
'uploader_id': 'britneyspears',
'title': 'Video by britneyspears', 'title': 'Video by britneyspears',
'thumbnail': 're:^https?://.*\.jpg', 'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1453760977, 'timestamp': 1453760977,

View File

@@ -165,7 +165,7 @@ class IqiyiIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.iqiyi.com/v_19rrojlavg.html', 'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
'md5': '5b0591f55961117155430b5d544fdb01', # MD5 checksum differs on my machine and Travis CI
'info_dict': { 'info_dict': {
'id': '9c1fb1b99d192b21c559e5a1a2cb3c73', 'id': '9c1fb1b99d192b21c559e5a1a2cb3c73',
'ext': 'mp4', 'ext': 'mp4',
@@ -293,14 +293,10 @@ class IqiyiIE(InfoExtractor):
't': tm, 't': tm,
} }
headers = {}
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
return self._download_json( return self._download_json(
'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id), 'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id),
video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='), video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='),
query=params, headers=headers) query=params, headers=self.geo_verification_headers())
def _extract_playlist(self, webpage): def _extract_playlist(self, webpage):
PAGE_SIZE = 50 PAGE_SIZE = 50

View File

@@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
mimetype2ext,
) )
@@ -28,74 +30,84 @@ class JWPlatformBaseIE(InfoExtractor):
return self._parse_jwplayer_data( return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs) jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None): def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists # JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96 # https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data: if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]} jwplayer_data = {'playlist': [jwplayer_data]}
video_data = jwplayer_data['playlist'][0] entries = []
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
# JWPlayer backward compatibility: flattened sources formats = []
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35 for source in video_data['sources']:
if 'sources' not in video_data: source_url = self._proto_relative_url(source['file'])
video_data['sources'] = [video_data] if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
formats = [] source_type = source.get('type') or ''
for source in video_data['sources']: ext = mimetype2ext(source_type) or determine_ext(source_url)
source_url = self._proto_relative_url(source['file']) if source_type == 'hls' or ext == 'm3u8':
source_type = source.get('type') or '' formats.extend(self._extract_m3u8_formats(
if source_type in ('application/vnd.apple.mpegurl', 'hls') or determine_ext(source_url) == 'm3u8': source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
formats.extend(self._extract_m3u8_formats( # https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False)) elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
elif source_type.startswith('audio'): formats.append({
formats.append({ 'url': source_url,
'url': source_url, 'vcodec': 'none',
'vcodec': 'none', 'ext': ext,
})
else:
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv',
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
}) })
else:
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv',
return { # See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
'id': video_id, # of jwplayer.flash.swf
'title': video_data['title'] if require_title else video_data.get('title'), rtmp_url_parts = re.split(
'description': video_data.get('description'), r'((?:mp4|mp3|flv):)', source_url, 1)
'thumbnail': self._proto_relative_url(video_data.get('image')), if len(rtmp_url_parts) == 3:
'timestamp': int_or_none(video_data.get('pubdate')), rtmp_url, prefix, play_path = rtmp_url_parts
'duration': float_or_none(jwplayer_data.get('duration')), a_format.update({
'subtitles': subtitles, 'url': rtmp_url,
'formats': formats, 'play_path': prefix + play_path,
} })
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
})
entries.append({
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
class JWPlatformIE(JWPlatformBaseIE): class JWPlatformIE(JWPlatformBaseIE):

View File

@@ -6,7 +6,6 @@ import base64
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse_urlencode,
compat_urlparse, compat_urlparse,
compat_parse_qs, compat_parse_qs,
) )
@@ -15,6 +14,7 @@ from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
unsmuggle_url, unsmuggle_url,
smuggle_url,
) )
@@ -34,7 +34,8 @@ class KalturaIE(InfoExtractor):
)(?:/(?P<path>[^?]+))?(?:\?(?P<query>.*))? )(?:/(?P<path>[^?]+))?(?:\?(?P<query>.*))?
) )
''' '''
_API_BASE = 'http://cdnapi.kaltura.com/api_v3/index.php?' _SERVICE_URL = 'http://cdnapi.kaltura.com'
_SERVICE_BASE = '/api_v3/index.php'
_TESTS = [ _TESTS = [
{ {
'url': 'kaltura:269692:1_1jc2y3e4', 'url': 'kaltura:269692:1_1jc2y3e4',
@@ -61,6 +62,11 @@ class KalturaIE(InfoExtractor):
{ {
'url': 'https://cdnapisec.kaltura.com/html5/html5lib/v2.30.2/mwEmbedFrame.php/p/1337/uiconf_id/20540612/entry_id/1_sf5ovm7u?wid=_243342', 'url': 'https://cdnapisec.kaltura.com/html5/html5lib/v2.30.2/mwEmbedFrame.php/p/1337/uiconf_id/20540612/entry_id/1_sf5ovm7u?wid=_243342',
'only_matching': True, 'only_matching': True,
},
{
# video with subtitles
'url': 'kaltura:111032:1_cw786r8q',
'only_matching': True,
} }
] ]
@@ -88,18 +94,26 @@ class KalturaIE(InfoExtractor):
(?P<q3>["\'])(?P<id>.+?)(?P=q3) (?P<q3>["\'])(?P<id>.+?)(?P=q3)
''', webpage)) ''', webpage))
if mobj: if mobj:
return 'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict() embed_info = mobj.groupdict()
url = 'kaltura:%(partner_id)s:%(id)s' % embed_info
escaped_pid = re.escape(embed_info['partner_id'])
service_url = re.search(
r'<script[^>]+src=["\']((?:https?:)?//.+?)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
webpage)
if service_url:
url = smuggle_url(url, {'service_url': service_url.group(1)})
return url
def _kaltura_api_call(self, video_id, actions, *args, **kwargs): def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
params = actions[0] params = actions[0]
if len(actions) > 1: if len(actions) > 1:
for i, a in enumerate(actions[1:], start=1): for i, a in enumerate(actions[1:], start=1):
for k, v in a.items(): for k, v in a.items():
params['%d:%s' % (i, k)] = v params['%d:%s' % (i, k)] = v
query = compat_urllib_parse_urlencode(params) data = self._download_json(
url = self._API_BASE + query (service_url or self._SERVICE_URL) + self._SERVICE_BASE,
data = self._download_json(url, video_id, *args, **kwargs) video_id, query=params, *args, **kwargs)
status = data if len(actions) == 1 else data[0] status = data if len(actions) == 1 else data[0]
if status.get('objectType') == 'KalturaAPIException': if status.get('objectType') == 'KalturaAPIException':
@@ -108,7 +122,7 @@ class KalturaIE(InfoExtractor):
return data return data
def _get_kaltura_signature(self, video_id, partner_id): def _get_kaltura_signature(self, video_id, partner_id, service_url=None):
actions = [{ actions = [{
'apiVersion': '3.1', 'apiVersion': '3.1',
'expiry': 86400, 'expiry': 86400,
@@ -118,10 +132,9 @@ class KalturaIE(InfoExtractor):
'widgetId': '_%s' % partner_id, 'widgetId': '_%s' % partner_id,
}] }]
return self._kaltura_api_call( return self._kaltura_api_call(
video_id, actions, note='Downloading Kaltura signature')['ks'] video_id, actions, service_url, note='Downloading Kaltura signature')['ks']
def _get_video_info(self, video_id, partner_id): def _get_video_info(self, video_id, partner_id, service_url=None):
signature = self._get_kaltura_signature(video_id, partner_id)
actions = [ actions = [
{ {
'action': 'null', 'action': 'null',
@@ -129,22 +142,34 @@ class KalturaIE(InfoExtractor):
'clientTag': 'kdp:v3.8.5', 'clientTag': 'kdp:v3.8.5',
'format': 1, # JSON, 2 = XML, 3 = PHP 'format': 1, # JSON, 2 = XML, 3 = PHP
'service': 'multirequest', 'service': 'multirequest',
'ks': signature, },
{
'expiry': 86400,
'service': 'session',
'action': 'startWidgetSession',
'widgetId': '_%s' % partner_id,
}, },
{ {
'action': 'get', 'action': 'get',
'entryId': video_id, 'entryId': video_id,
'service': 'baseentry', 'service': 'baseentry',
'version': '-1', 'ks': '{1:result:ks}',
}, },
{ {
'action': 'getbyentryid', 'action': 'getbyentryid',
'entryId': video_id, 'entryId': video_id,
'service': 'flavorAsset', 'service': 'flavorAsset',
'ks': '{1:result:ks}',
},
{
'action': 'list',
'filter:entryIdEqual': video_id,
'service': 'caption_captionasset',
'ks': '{1:result:ks}',
}, },
] ]
return self._kaltura_api_call( return self._kaltura_api_call(
video_id, actions, note='Downloading video info JSON') video_id, actions, service_url, note='Downloading video info JSON')
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
@@ -152,8 +177,9 @@ class KalturaIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
partner_id, entry_id = mobj.group('partner_id', 'id') partner_id, entry_id = mobj.group('partner_id', 'id')
ks = None ks = None
captions = None
if partner_id and entry_id: if partner_id and entry_id:
info, flavor_assets = self._get_video_info(entry_id, partner_id) _, info, flavor_assets, captions = self._get_video_info(entry_id, partner_id, smuggled_data.get('service_url'))
else: else:
path, query = mobj.group('path', 'query') path, query = mobj.group('path', 'query')
if not path and not query: if not path and not query:
@@ -172,7 +198,7 @@ class KalturaIE(InfoExtractor):
raise ExtractorError('Invalid URL', expected=True) raise ExtractorError('Invalid URL', expected=True)
if 'entry_id' in params: if 'entry_id' in params:
entry_id = params['entry_id'][0] entry_id = params['entry_id'][0]
info, flavor_assets = self._get_video_info(entry_id, partner_id) _, info, flavor_assets, captions = self._get_video_info(entry_id, partner_id)
elif 'uiconf_id' in params and 'flashvars[referenceId]' in params: elif 'uiconf_id' in params and 'flashvars[referenceId]' in params:
reference_id = params['flashvars[referenceId]'][0] reference_id = params['flashvars[referenceId]'][0]
webpage = self._download_webpage(url, reference_id) webpage = self._download_webpage(url, reference_id)
@@ -201,12 +227,17 @@ class KalturaIE(InfoExtractor):
unsigned_url += '?referrer=%s' % referrer unsigned_url += '?referrer=%s' % referrer
return unsigned_url return unsigned_url
data_url = info['dataUrl']
if '/flvclipper/' in data_url:
data_url = re.sub(r'/flvclipper/.*', '/serveFlavor', data_url)
formats = [] formats = []
for f in flavor_assets: for f in flavor_assets:
# Continue if asset is not ready # Continue if asset is not ready
if f['status'] != 2: if f.get('status') != 2:
continue continue
video_url = sign_url('%s/flavorId/%s' % (info['dataUrl'], f['id'])) video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
formats.append({ formats.append({
'format_id': '%(fileExt)s-%(bitrate)s' % f, 'format_id': '%(fileExt)s-%(bitrate)s' % f,
'ext': f.get('fileExt'), 'ext': f.get('fileExt'),
@@ -219,17 +250,31 @@ class KalturaIE(InfoExtractor):
'width': int_or_none(f.get('width')), 'width': int_or_none(f.get('width')),
'url': video_url, 'url': video_url,
}) })
m3u8_url = sign_url(info['dataUrl'].replace('format/url', 'format/applehttp')) if '/playManifest/' in data_url:
formats.extend(self._extract_m3u8_formats( m3u8_url = sign_url(data_url.replace(
m3u8_url, entry_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)) 'format/url', 'format/applehttp'))
formats.extend(self._extract_m3u8_formats(
m3u8_url, entry_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._check_formats(formats, entry_id)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
if captions:
for caption in captions.get('objects', []):
# Continue if caption is not ready
if f.get('status') != 2:
continue
subtitles.setdefault(caption.get('languageCode') or caption.get('language'), []).append({
'url': '%s/api_v3/service/caption_captionasset/action/serve/captionAssetId/%s' % (self._SERVICE_URL, caption['id']),
'ext': caption.get('fileExt'),
})
return { return {
'id': entry_id, 'id': entry_id,
'title': info['name'], 'title': info['name'],
'formats': formats, 'formats': formats,
'subtitles': subtitles,
'description': clean_html(info.get('description')), 'description': clean_html(info.get('description')),
'thumbnail': info.get('thumbnailUrl'), 'thumbnail': info.get('thumbnailUrl'),
'duration': info.get('duration'), 'duration': info.get('duration'),

View File

@@ -0,0 +1,71 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
qualities,
)
class KamcordIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
'info_dict': {
'id': 'hNYRduDgWb4',
'ext': 'mp4',
'title': 'Drinking Madness',
'uploader': 'jacksfilms',
'uploader_id': '3044562',
'view_count': int,
'like_count': int,
'comment_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video = self._parse_json(
self._search_regex(
r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
webpage, 'video'),
video_id)['video']
title = video['title']
formats = self._extract_m3u8_formats(
video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
self._sort_formats(formats)
uploader = video.get('user', {}).get('username')
uploader_id = video.get('user', {}).get('id')
view_count = int_or_none(video.get('viewCount'))
like_count = int_or_none(video.get('heartCount'))
comment_count = int_or_none(video.get('messageCount'))
preference_key = qualities(('small', 'medium', 'large'))
thumbnails = [{
'url': thumbnail_url,
'id': thumbnail_id,
'preference': preference_key(thumbnail_id),
} for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
return {
'id': video_id,
'title': title,
'uploader': uploader,
'uploader_id': uploader_id,
'view_count': view_count,
'like_count': like_count,
'comment_count': comment_count,
'thumbnails': thumbnails,
'formats': formats,
}

View File

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
get_element_by_id, get_element_by_id,
clean_html, clean_html,
@@ -26,11 +27,6 @@ class KuwoBaseIE(InfoExtractor):
def _get_formats(self, song_id, tolerate_ip_deny=False): def _get_formats(self, song_id, tolerate_ip_deny=False):
formats = [] formats = []
for file_format in self._FORMATS: for file_format in self._FORMATS:
headers = {}
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
query = { query = {
'format': file_format['ext'], 'format': file_format['ext'],
'br': file_format.get('br', ''), 'br': file_format.get('br', ''),
@@ -42,7 +38,7 @@ class KuwoBaseIE(InfoExtractor):
song_url = self._download_webpage( song_url = self._download_webpage(
'http://antiserver.kuwo.cn/anti.s', 'http://antiserver.kuwo.cn/anti.s',
song_id, note='Download %s url info' % file_format['format'], song_id, note='Download %s url info' % file_format['format'],
query=query, headers=headers, query=query, headers=self.geo_verification_headers(),
) )
if song_url == 'IPDeny' and not tolerate_ip_deny: if song_url == 'IPDeny' and not tolerate_ip_deny:
@@ -247,8 +243,9 @@ class KuwoSingerIE(InfoExtractor):
query={'artistId': artist_id, 'pn': page_num, 'rn': self.PAGE_SIZE}) query={'artistId': artist_id, 'pn': page_num, 'rn': self.PAGE_SIZE})
return [ return [
self.url_result(song_url, 'Kuwo') for song_url in re.findall( self.url_result(compat_urlparse.urljoin(url, song_url), 'Kuwo')
r'<div[^>]+class="name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)', for song_url in re.findall(
r'<div[^>]+class="name"><a[^>]+href="(/yinyue/\d+)',
webpage) webpage)
] ]

View File

@@ -3,8 +3,8 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext,
js_to_json, js_to_json,
smuggle_url,
) )
@@ -18,13 +18,16 @@ class LA7IE(InfoExtractor):
_TESTS = [{ _TESTS = [{
# 'src' is a plain URL # 'src' is a plain URL
'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722', 'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722',
'md5': '6054674766e7988d3e02f2148ff92180', 'md5': '8b613ffc0c4bf9b9e377169fc19c214c',
'info_dict': { 'info_dict': {
'id': 'inccool8-02-10-2015-163722', 'id': 'inccool8-02-10-2015-163722',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Inc.Cool8', 'title': 'Inc.Cool8',
'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto atletico', 'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto atletico',
'thumbnail': 're:^https?://.*', 'thumbnail': 're:^https?://.*',
'uploader_id': 'kdla7pillole@iltrovatore.it',
'timestamp': 1443814869,
'upload_date': '20151002',
}, },
}, { }, {
# 'src' is a dictionary # 'src' is a dictionary
@@ -49,26 +52,14 @@ class LA7IE(InfoExtractor):
self._search_regex(r'videoLa7\(({[^;]+})\);', webpage, 'player data'), self._search_regex(r'videoLa7\(({[^;]+})\);', webpage, 'player data'),
video_id, transform_source=js_to_json) video_id, transform_source=js_to_json)
source = player_data['src']
source_urls = source.values() if isinstance(source, dict) else [source]
formats = []
for source_url in source_urls:
ext = determine_ext(source_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls'))
else:
formats.append({
'url': source_url,
})
self._sort_formats(formats)
return { return {
'_type': 'url_transparent',
'url': smuggle_url('kaltura:103:%s' % player_data['vid'], {
'service_url': 'http://kdam.iltrovatore.it',
}),
'id': video_id, 'id': video_id,
'title': player_data['title'], 'title': player_data['title'],
'description': self._og_search_description(webpage, default=None), 'description': self._og_search_description(webpage, default=None),
'thumbnail': player_data.get('poster'), 'thumbnail': player_data.get('poster'),
'formats': formats, 'ie_key': 'Kaltura',
} }

View File

@@ -0,0 +1,90 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .arkena import ArkenaIE
class LcpPlayIE(ArkenaIE):
_VALID_URL = r'https?://play\.lcp\.fr/embed/(?P<id>[^/]+)/(?P<account_id>[^/]+)/[^/]+/[^/]+'
_TESTS = [{
'url': 'http://play.lcp.fr/embed/327336/131064/darkmatter/0',
'md5': 'b8bd9298542929c06c1c15788b1f277a',
'info_dict': {
'id': '327336',
'ext': 'mp4',
'title': '327336',
'timestamp': 1456391602,
'upload_date': '20160225',
},
'params': {
'skip_download': True,
},
}]
class LcpIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lcp\.fr/(?:[^/]+/)*(?P<id>[^/]+)'
_TESTS = [{
# arkena embed
'url': 'http://www.lcp.fr/la-politique-en-video/schwartzenberg-prg-preconise-francois-hollande-de-participer-une-primaire',
'md5': 'b8bd9298542929c06c1c15788b1f277a',
'info_dict': {
'id': 'd56d03e9',
'ext': 'mp4',
'title': 'Schwartzenberg (PRG) préconise à François Hollande de participer à une primaire à gauche',
'description': 'md5:96ad55009548da9dea19f4120c6c16a8',
'timestamp': 1456488895,
'upload_date': '20160226',
},
'params': {
'skip_download': True,
},
}, {
# dailymotion live stream
'url': 'http://www.lcp.fr/le-direct',
'info_dict': {
'id': 'xji3qy',
'ext': 'mp4',
'title': 'La Chaine Parlementaire (LCP), Live TNT',
'description': 'md5:5c69593f2de0f38bd9a949f2c95e870b',
'uploader': 'LCP',
'uploader_id': 'xbz33d',
'timestamp': 1308923058,
'upload_date': '20110624',
},
'params': {
# m3u8 live stream
'skip_download': True,
},
}, {
'url': 'http://www.lcp.fr/emissions/277792-les-volontaires',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
play_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>%s?(?:(?!\1).)*)\1' % LcpPlayIE._VALID_URL,
webpage, 'play iframe', default=None, group='url')
if not play_url:
return self.url_result(url, 'Generic')
title = self._og_search_title(webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, fatal=True)
description = self._html_search_meta(
('description', 'twitter:description'), webpage)
return {
'_type': 'url_transparent',
'ie_key': LcpPlayIE.ie_key(),
'url': play_url,
'display_id': display_id,
'title': title,
'description': description,
}

View File

@@ -20,10 +20,10 @@ from ..utils import (
int_or_none, int_or_none,
orderedSet, orderedSet,
parse_iso8601, parse_iso8601,
sanitized_Request,
str_or_none, str_or_none,
url_basename, url_basename,
urshift, urshift,
update_url_query,
) )
@@ -90,6 +90,10 @@ class LeIE(InfoExtractor):
_loc3_ = self.ror(_loc3_, _loc2_ % 17) _loc3_ = self.ror(_loc3_, _loc2_ % 17)
return _loc3_ return _loc3_
# reversed from http://jstatic.letvcdn.com/sdk/player.js
def get_mms_key(self, time):
return self.ror(time, 8) ^ 185025305
# see M3U8Encryption class in KLetvPlayer.swf # see M3U8Encryption class in KLetvPlayer.swf
@staticmethod @staticmethod
def decrypt_m3u8(encrypted_data): def decrypt_m3u8(encrypted_data):
@@ -110,28 +114,7 @@ class LeIE(InfoExtractor):
return bytes(_loc7_) return bytes(_loc7_)
def _real_extract(self, url): def _check_errors(self, play_json):
media_id = self._match_id(url)
page = self._download_webpage(url, media_id)
params = {
'id': media_id,
'platid': 1,
'splatid': 101,
'format': 1,
'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.le.com'
}
play_json_req = sanitized_Request(
'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse_urlencode(params)
)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
play_json = self._download_json(
play_json_req,
media_id, 'Downloading playJson data')
# Check for errors # Check for errors
playstatus = play_json['playstatus'] playstatus = play_json['playstatus']
if playstatus['status'] == 0: if playstatus['status'] == 0:
@@ -142,43 +125,99 @@ class LeIE(InfoExtractor):
msg = 'Generic error. flag = %d' % flag msg = 'Generic error. flag = %d' % flag
raise ExtractorError(msg, expected=True) raise ExtractorError(msg, expected=True)
playurl = play_json['playurl'] def _real_extract(self, url):
media_id = self._match_id(url)
page = self._download_webpage(url, media_id)
formats = ['350', '1000', '1300', '720p', '1080p'] play_json_h5 = self._download_json(
dispatch = playurl['dispatch'] 'http://api.le.com/mms/out/video/playJsonH5',
media_id, 'Downloading html5 playJson data', query={
'id': media_id,
'platid': 3,
'splatid': 304,
'format': 1,
'tkey': self.get_mms_key(int(time.time())),
'domain': 'www.le.com',
'tss': 'no',
},
headers=self.geo_verification_headers())
self._check_errors(play_json_h5)
urls = [] play_json_flash = self._download_json(
for format_id in formats: 'http://api.le.com/mms/out/video/playJson',
if format_id in dispatch: media_id, 'Downloading flash playJson data', query={
media_url = playurl['domain'][0] + dispatch[format_id][0] 'id': media_id,
media_url += '&' + compat_urllib_parse_urlencode({ 'platid': 1,
'm3v': 1, 'splatid': 101,
'format': 1,
'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.le.com',
},
headers=self.geo_verification_headers())
self._check_errors(play_json_flash)
def get_h5_urls(media_url, format_id):
location = self._download_json(
media_url, media_id,
'Download JSON metadata for format %s' % format_id, query={
'format': 1, 'format': 1,
'expect': 3, 'expect': 3,
'rateid': format_id, 'tss': 'no',
}) })['location']
nodes_data = self._download_json( return {
media_url, media_id, 'http': update_url_query(location, {'tss': 'no'}),
'Download JSON metadata for format %s' % format_id) 'hls': update_url_query(location, {'tss': 'ios'}),
}
req = self._request_webpage( def get_flash_urls(media_url, format_id):
nodes_data['nodelist'][0]['location'], media_id, media_url += '&' + compat_urllib_parse_urlencode({
note='Downloading m3u8 information for format %s' % format_id) 'm3v': 1,
'format': 1,
'expect': 3,
'rateid': format_id,
})
m3u8_data = self.decrypt_m3u8(req.read()) nodes_data = self._download_json(
media_url, media_id,
'Download JSON metadata for format %s' % format_id)
url_info_dict = { req = self._request_webpage(
'url': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'), nodes_data['nodelist'][0]['location'], media_id,
'ext': determine_ext(dispatch[format_id][1]), note='Downloading m3u8 information for format %s' % format_id)
'format_id': format_id,
'protocol': 'm3u8',
}
if format_id[-1:] == 'p': m3u8_data = self.decrypt_m3u8(req.read())
url_info_dict['height'] = int_or_none(format_id[:-1])
urls.append(url_info_dict) return {
'hls': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
}
extracted_formats = []
formats = []
for play_json, get_urls in ((play_json_h5, get_h5_urls), (play_json_flash, get_flash_urls)):
playurl = play_json['playurl']
play_domain = playurl['domain'][0]
for format_id, format_data in playurl.get('dispatch', []).items():
if format_id in extracted_formats:
continue
extracted_formats.append(format_id)
media_url = play_domain + format_data[0]
for protocol, format_url in get_urls(media_url, format_id).items():
f = {
'url': format_url,
'ext': determine_ext(format_data[1]),
'format_id': '%s-%s' % (protocol, format_id),
'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
'quality': int_or_none(format_id),
}
if format_id[-1:] == 'p':
f['height'] = int_or_none(format_id[:-1])
formats.append(f)
self._sort_formats(formats, ('height', 'quality', 'format_id'))
publish_time = parse_iso8601(self._html_search_regex( publish_time = parse_iso8601(self._html_search_regex(
r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None), r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None),
@@ -187,7 +226,7 @@ class LeIE(InfoExtractor):
return { return {
'id': media_id, 'id': media_id,
'formats': urls, 'formats': formats,
'title': playurl['title'], 'title': playurl['title'],
'thumbnail': playurl['pic'], 'thumbnail': playurl['pic'],
'description': description, 'description': description,

View File

@@ -37,11 +37,12 @@ class LimelightBaseIE(InfoExtractor):
for stream in streams: for stream in streams:
stream_url = stream.get('url') stream_url = stream.get('url')
if not stream_url: if not stream_url or stream.get('drmProtected'):
continue continue
if '.f4m' in stream_url: ext = determine_ext(stream_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
stream_url, video_id, fatal=False)) stream_url, video_id, f4m_id='hds', fatal=False))
else: else:
fmt = { fmt = {
'url': stream_url, 'url': stream_url,
@@ -50,13 +51,19 @@ class LimelightBaseIE(InfoExtractor):
'fps': float_or_none(stream.get('videoFrameRate')), 'fps': float_or_none(stream.get('videoFrameRate')),
'width': int_or_none(stream.get('videoWidthInPixels')), 'width': int_or_none(stream.get('videoWidthInPixels')),
'height': int_or_none(stream.get('videoHeightInPixels')), 'height': int_or_none(stream.get('videoHeightInPixels')),
'ext': determine_ext(stream_url) 'ext': ext,
} }
rtmp = re.search(r'^(?P<url>rtmpe?://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url) rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
if rtmp: if rtmp:
format_id = 'rtmp' format_id = 'rtmp'
if stream.get('videoBitRate'): if stream.get('videoBitRate'):
format_id += '-%d' % int_or_none(stream['videoBitRate']) format_id += '-%d' % int_or_none(stream['videoBitRate'])
http_fmt = fmt.copy()
http_fmt.update({
'url': 'http://%s/%s' % (rtmp.group('host').replace('csl.', 'cpl.'), rtmp.group('playpath')[4:]),
'format_id': format_id.replace('rtmp', 'http'),
})
formats.append(http_fmt)
fmt.update({ fmt.update({
'url': rtmp.group('url'), 'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'), 'play_path': rtmp.group('playpath'),
@@ -68,18 +75,23 @@ class LimelightBaseIE(InfoExtractor):
for mobile_url in mobile_urls: for mobile_url in mobile_urls:
media_url = mobile_url.get('mobileUrl') media_url = mobile_url.get('mobileUrl')
if not media_url:
continue
format_id = mobile_url.get('targetMediaPlatform') format_id = mobile_url.get('targetMediaPlatform')
if determine_ext(media_url) == 'm3u8': if not media_url or format_id == 'Widevine':
continue
ext = determine_ext(media_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', 'm3u8_native', media_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False)) m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url, video_id, f4m_id=format_id, fatal=False))
else: else:
formats.append({ formats.append({
'url': media_url, 'url': media_url,
'format_id': format_id, 'format_id': format_id,
'preference': -1, 'preference': -1,
'ext': ext,
}) })
self._sort_formats(formats) self._sort_formats(formats)
@@ -145,7 +157,7 @@ class LimelightMediaIE(LimelightBaseIE):
'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86', 'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
'info_dict': { 'info_dict': {
'id': '3ffd040b522b4485b6d84effc750cd86', 'id': '3ffd040b522b4485b6d84effc750cd86',
'ext': 'flv', 'ext': 'mp4',
'title': 'HaP and the HB Prince Trailer', 'title': 'HaP and the HB Prince Trailer',
'description': 'md5:8005b944181778e313d95c1237ddb640', 'description': 'md5:8005b944181778e313d95c1237ddb640',
'thumbnail': 're:^https?://.*\.jpeg$', 'thumbnail': 're:^https?://.*\.jpeg$',
@@ -154,27 +166,23 @@ class LimelightMediaIE(LimelightBaseIE):
'upload_date': '20090604', 'upload_date': '20090604',
}, },
'params': { 'params': {
# rtmp download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
# video with subtitles # video with subtitles
'url': 'limelight:media:a3e00274d4564ec4a9b29b9466432335', 'url': 'limelight:media:a3e00274d4564ec4a9b29b9466432335',
'md5': '2fa3bad9ac321e23860ca23bc2c69e3d',
'info_dict': { 'info_dict': {
'id': 'a3e00274d4564ec4a9b29b9466432335', 'id': 'a3e00274d4564ec4a9b29b9466432335',
'ext': 'flv', 'ext': 'mp4',
'title': '3Play Media Overview Video', 'title': '3Play Media Overview Video',
'description': '',
'thumbnail': 're:^https?://.*\.jpeg$', 'thumbnail': 're:^https?://.*\.jpeg$',
'duration': 78.101, 'duration': 78.101,
'timestamp': 1338929955, 'timestamp': 1338929955,
'upload_date': '20120605', 'upload_date': '20120605',
'subtitles': 'mincount:9', 'subtitles': 'mincount:9',
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, { }, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452', 'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True, 'only_matching': True,

View File

@@ -100,7 +100,7 @@ class LyndaIE(LyndaBaseIE):
_TESTS = [{ _TESTS = [{
'url': 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html', 'url': 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
'md5': 'ecfc6862da89489161fb9cd5f5a6fac1', # md5 is unstable
'info_dict': { 'info_dict': {
'id': '114408', 'id': '114408',
'ext': 'mp4', 'ext': 'mp4',

View File

@@ -11,13 +11,14 @@ from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
sanitized_Request,
urlencode_postdata, urlencode_postdata,
get_element_by_attribute,
mimetype2ext,
) )
class MetacafeIE(InfoExtractor): class MetacafeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*' _VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/(?P<video_id>[^/]+)/(?P<display_id>[^/?#]+)'
_DISCLAIMER = 'http://www.metacafe.com/family_filter/' _DISCLAIMER = 'http://www.metacafe.com/family_filter/'
_FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user' _FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
IE_NAME = 'metacafe' IE_NAME = 'metacafe'
@@ -47,6 +48,7 @@ class MetacafeIE(InfoExtractor):
'uploader': 'ign', 'uploader': 'ign',
'description': 'Sony released a massive FAQ on the PlayStation Blog detailing the PS4\'s capabilities and limitations.', 'description': 'Sony released a massive FAQ on the PlayStation Blog detailing the PS4\'s capabilities and limitations.',
}, },
'skip': 'Page is temporarily unavailable.',
}, },
# AnyClip video # AnyClip video
{ {
@@ -55,8 +57,8 @@ class MetacafeIE(InfoExtractor):
'id': 'an-dVVXnuY7Jh77J', 'id': 'an-dVVXnuY7Jh77J',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Andromeda Strain (1971): Stop the Bomb Part 3', 'title': 'The Andromeda Strain (1971): Stop the Bomb Part 3',
'uploader': 'anyclip', 'uploader': 'AnyClip',
'description': 'md5:38c711dd98f5bb87acf973d573442e67', 'description': 'md5:cbef0460d31e3807f6feb4e7a5952e5b',
}, },
}, },
# age-restricted video # age-restricted video
@@ -110,28 +112,25 @@ class MetacafeIE(InfoExtractor):
def report_disclaimer(self): def report_disclaimer(self):
self.to_screen('Retrieving disclaimer') self.to_screen('Retrieving disclaimer')
def _real_initialize(self): def _confirm_age(self):
# Retrieve disclaimer # Retrieve disclaimer
self.report_disclaimer() self.report_disclaimer()
self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer') self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer')
# Confirm age # Confirm age
disclaimer_form = {
'filters': '0',
'submit': "Continue - I'm over 18",
}
request = sanitized_Request(self._FILTER_POST, urlencode_postdata(disclaimer_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self.report_age_confirmation() self.report_age_confirmation()
self._download_webpage(request, None, False, 'Unable to confirm age') self._download_webpage(
self._FILTER_POST, None, False, 'Unable to confirm age',
data=urlencode_postdata({
'filters': '0',
'submit': "Continue - I'm over 18",
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
def _real_extract(self, url): def _real_extract(self, url):
# Extract id and simplified title from URL # Extract id and simplified title from URL
mobj = re.match(self._VALID_URL, url) video_id, display_id = re.match(self._VALID_URL, url).groups()
if mobj is None:
raise ExtractorError('Invalid URL: %s' % url)
video_id = mobj.group(1)
# the video may come from an external site # the video may come from an external site
m_external = re.match('^(\w{2})-(.*)$', video_id) m_external = re.match('^(\w{2})-(.*)$', video_id)
@@ -144,15 +143,24 @@ class MetacafeIE(InfoExtractor):
if prefix == 'cb': if prefix == 'cb':
return self.url_result('theplatform:%s' % ext_id, 'ThePlatform') return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
# Retrieve video webpage to extract further information # self._confirm_age()
req = sanitized_Request('http://www.metacafe.com/watch/%s/' % video_id)
# AnyClip videos require the flashversion cookie so that we get the link # AnyClip videos require the flashversion cookie so that we get the link
# to the mp4 file # to the mp4 file
mobj_an = re.match(r'^an-(.*?)$', video_id) headers = {}
if mobj_an: if video_id.startswith('an-'):
req.headers['Cookie'] = 'flashVersion=0;' headers['Cookie'] = 'flashVersion=0;'
webpage = self._download_webpage(req, video_id)
# Retrieve video webpage to extract further information
webpage = self._download_webpage(url, video_id, headers=headers)
error = get_element_by_attribute(
'class', 'notfound-page-title', webpage)
if error:
raise ExtractorError(error, expected=True)
video_title = self._html_search_meta(
['og:title', 'twitter:title'], webpage, 'title', default=None) or self._search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
# Extract URL, uploader and title from webpage # Extract URL, uploader and title from webpage
self.report_extraction(video_id) self.report_extraction(video_id)
@@ -216,20 +224,40 @@ class MetacafeIE(InfoExtractor):
'player_url': player_url, 'player_url': player_url,
'ext': play_path.partition(':')[0], 'ext': play_path.partition(':')[0],
}) })
if video_url is None:
flashvars = self._parse_json(self._search_regex(
r'flashvars\s*=\s*({.*});', webpage, 'flashvars',
default=None), video_id, fatal=False)
if flashvars:
video_url = []
for source in flashvars.get('sources'):
source_url = source.get('src')
if not source_url:
continue
ext = mimetype2ext(source.get('type')) or determine_ext(source_url)
if ext == 'm3u8':
video_url.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
else:
video_url.append({
'url': source_url,
'ext': ext,
})
if video_url is None: if video_url is None:
raise ExtractorError('Unsupported video type') raise ExtractorError('Unsupported video type')
video_title = self._html_search_regex( description = self._html_search_meta(
r'(?im)<title>(.*) - Video</title>', webpage, 'title') ['og:description', 'twitter:description', 'description'],
description = self._og_search_description(webpage) webpage, 'title', fatal=False)
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._html_search_meta(
['og:image', 'twitter:image'], webpage, 'title', fatal=False)
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);', r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
webpage, 'uploader nickname', fatal=False) webpage, 'uploader nickname', fatal=False)
duration = int_or_none( duration = int_or_none(
self._html_search_meta('video:duration', webpage)) self._html_search_meta('video:duration', webpage, default=None))
age_limit = ( age_limit = (
18 18
if re.search(r'(?:"contentRating":|"rating",)"restricted"', webpage) if re.search(r'(?:"contentRating":|"rating",)"restricted"', webpage)
@@ -242,10 +270,11 @@ class MetacafeIE(InfoExtractor):
'url': video_url, 'url': video_url,
'ext': video_ext, 'ext': video_ext,
}] }]
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id,
'description': description, 'description': description,
'uploader': video_uploader, 'uploader': video_uploader,
'title': video_title, 'title': video_title,

View File

@@ -9,7 +9,7 @@ class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html' _VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV' IE_DESC = '芒果TV'
_TEST = { _TESTS = [{
'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html', 'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
'md5': '1bdadcf760a0b90946ca68ee9a2db41a', 'md5': '1bdadcf760a0b90946ca68ee9a2db41a',
'info_dict': { 'info_dict': {
@@ -20,13 +20,18 @@ class MGTVIE(InfoExtractor):
'duration': 7461, 'duration': 7461,
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
}, },
} }, {
# no tbr extracted from stream_url
'url': 'http://www.mgtv.com/v/1/1/f/3324755.html',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
api_data = self._download_json( api_data = self._download_json(
'http://v.api.mgtv.com/player/video', video_id, 'http://v.api.mgtv.com/player/video', video_id,
query={'video_id': video_id})['data'] query={'video_id': video_id},
headers=self.geo_verification_headers())['data']
info = api_data['info'] info = api_data['info']
formats = [] formats = []
@@ -40,7 +45,8 @@ class MGTVIE(InfoExtractor):
def extract_format(stream_url, format_id, idx, query={}): def extract_format(stream_url, format_id, idx, query={}):
format_info = self._download_json( format_info = self._download_json(
stream_url, video_id, stream_url, video_id,
note='Download video info for format %s' % format_id or '#%d' % idx, query=query) note='Download video info for format %s' % (format_id or '#%d' % idx),
query=query)
return { return {
'format_id': format_id, 'format_id': format_id,
'url': format_info['info'], 'url': format_info['info'],

View File

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import random import random
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
xpath_text, xpath_text,
int_or_none, int_or_none,
@@ -18,13 +19,16 @@ class MioMioIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
# "type=video" in flashvars # "type=video" in flashvars
'url': 'http://www.miomio.tv/watch/cc88912/', 'url': 'http://www.miomio.tv/watch/cc88912/',
'md5': '317a5f7f6b544ce8419b784ca8edae65',
'info_dict': { 'info_dict': {
'id': '88912', 'id': '88912',
'ext': 'flv', 'ext': 'flv',
'title': '【SKY】字幕 铠武昭和VS平成 假面骑士大战FEAT战队 魔星字幕组 字幕', 'title': '【SKY】字幕 铠武昭和VS平成 假面骑士大战FEAT战队 魔星字幕组 字幕',
'duration': 5923, 'duration': 5923,
}, },
'params': {
# The server provides broken file
'skip_download': True,
}
}, { }, {
'url': 'http://www.miomio.tv/watch/cc184024/', 'url': 'http://www.miomio.tv/watch/cc184024/',
'info_dict': { 'info_dict': {
@@ -32,7 +36,7 @@ class MioMioIE(InfoExtractor):
'title': '《动漫同人插画绘制》', 'title': '《动漫同人插画绘制》',
}, },
'playlist_mincount': 86, 'playlist_mincount': 86,
'skip': 'This video takes time too long for retrieving the URL', 'skip': 'Unable to load videos',
}, { }, {
'url': 'http://www.miomio.tv/watch/cc173113/', 'url': 'http://www.miomio.tv/watch/cc173113/',
'info_dict': { 'info_dict': {
@@ -40,20 +44,23 @@ class MioMioIE(InfoExtractor):
'title': 'The New Macbook 2015 上手试玩与简评' 'title': 'The New Macbook 2015 上手试玩与简评'
}, },
'playlist_mincount': 2, 'playlist_mincount': 2,
'skip': 'Unable to load videos',
}, {
# new 'h5' player
'url': 'http://www.miomio.tv/watch/cc273295/',
'md5': '',
'info_dict': {
'id': '273295',
'ext': 'mp4',
'title': 'アウト×デラックス 20160526',
},
'params': {
# intermittent HTTP 500
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _extract_mioplayer(self, webpage, video_id, title, http_headers):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'description', webpage, 'title', fatal=True)
mioplayer_path = self._search_regex(
r'src="(/mioplayer/[^"]+)"', webpage, 'ref_path')
http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
xml_config = self._search_regex( xml_config = self._search_regex(
r'flashvars="type=(?:sina|video)&amp;(.+?)&amp;', r'flashvars="type=(?:sina|video)&amp;(.+?)&amp;',
webpage, 'xml config') webpage, 'xml config')
@@ -92,10 +99,34 @@ class MioMioIE(InfoExtractor):
'http_headers': http_headers, 'http_headers': http_headers,
}) })
return entries
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'description', webpage, 'title', fatal=True)
mioplayer_path = self._search_regex(
r'src="(/mioplayer(?:_h5)?/[^"]+)"', webpage, 'ref_path')
if '_h5' in mioplayer_path:
player_url = compat_urlparse.urljoin(url, mioplayer_path)
player_webpage = self._download_webpage(
player_url, video_id,
note='Downloading player webpage', headers={'Referer': url})
entries = self._parse_html5_media_entries(player_url, player_webpage)
http_headers = {'Referer': player_url}
else:
http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
entries = self._extract_mioplayer(webpage, video_id, title, http_headers)
if len(entries) == 1: if len(entries) == 1:
segment = entries[0] segment = entries[0]
segment['id'] = video_id segment['id'] = video_id
segment['title'] = title segment['title'] = title
segment['http_headers'] = http_headers
return segment return segment
return { return {

View File

@@ -12,12 +12,69 @@ from ..utils import (
get_element_by_attribute, get_element_by_attribute,
int_or_none, int_or_none,
remove_start, remove_start,
extract_attributes,
determine_ext,
) )
class MiTeleIE(InfoExtractor): class MiTeleBaseIE(InfoExtractor):
def _get_player_info(self, url, webpage):
player_data = extract_attributes(self._search_regex(
r'(?s)(<ms-video-player.+?</ms-video-player>)',
webpage, 'ms video player'))
video_id = player_data['data-media-id']
config_url = compat_urlparse.urljoin(url, player_data['data-config'])
config = self._download_json(
config_url, video_id, 'Downloading config JSON')
mmc_url = config['services']['mmc']
duration = None
formats = []
for m_url in (mmc_url, mmc_url.replace('/flash.json', '/html5.json')):
mmc = self._download_json(
m_url, video_id, 'Downloading mmc JSON')
if not duration:
duration = int_or_none(mmc.get('duration'))
for location in mmc['locations']:
gat = self._proto_relative_url(location.get('gat'), 'http:')
bas = location.get('bas')
loc = location.get('loc')
ogn = location.get('ogn')
if None in (gat, bas, loc, ogn):
continue
token_data = {
'bas': bas,
'icd': loc,
'ogn': ogn,
'sta': '0',
}
media = self._download_json(
'%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
video_id, 'Downloading %s JSON' % location['loc'])
file_ = media.get('file')
if not file_:
continue
ext = determine_ext(file_)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
file_, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'thumbnail': player_data.get('data-poster') or config.get('poster', {}).get('imageUrl'),
'duration': duration,
}
class MiTeleIE(MiTeleBaseIE):
IE_DESC = 'mitele.es' IE_DESC = 'mitele.es'
_VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/' _VALID_URL = r'https?://www\.mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
_TESTS = [{ _TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/', 'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
@@ -25,7 +82,7 @@ class MiTeleIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '0NF1jJnxS1Wu3pHrmvFyw2', 'id': '0NF1jJnxS1Wu3pHrmvFyw2',
'display_id': 'programa-144', 'display_id': 'programa-144',
'ext': 'flv', 'ext': 'mp4',
'title': 'Tor, la web invisible', 'title': 'Tor, la web invisible',
'description': 'md5:3b6fce7eaa41b2d97358726378d9369f', 'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
'series': 'Diario de', 'series': 'Diario de',
@@ -40,7 +97,7 @@ class MiTeleIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'eLZSwoEd1S3pVyUm8lc6F', 'id': 'eLZSwoEd1S3pVyUm8lc6F',
'display_id': 'programa-226', 'display_id': 'programa-226',
'ext': 'flv', 'ext': 'mp4',
'title': 'Cuarto Milenio - Temporada 6 - Programa 226', 'title': 'Cuarto Milenio - Temporada 6 - Programa 226',
'description': 'md5:50daf9fadefa4e62d9fc866d0c015701', 'description': 'md5:50daf9fadefa4e62d9fc866d0c015701',
'series': 'Cuarto Milenio', 'series': 'Cuarto Milenio',
@@ -59,40 +116,7 @@ class MiTeleIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
config_url = self._search_regex( info = self._get_player_info(url, webpage)
r'data-config\s*=\s*"([^"]+)"', webpage, 'data config url')
config_url = compat_urlparse.urljoin(url, config_url)
config = self._download_json(
config_url, display_id, 'Downloading config JSON')
mmc = self._download_json(
config['services']['mmc'], display_id, 'Downloading mmc JSON')
formats = []
for location in mmc['locations']:
gat = self._proto_relative_url(location.get('gat'), 'http:')
bas = location.get('bas')
loc = location.get('loc')
ogn = location.get('ogn')
if None in (gat, bas, loc, ogn):
continue
token_data = {
'bas': bas,
'icd': loc,
'ogn': ogn,
'sta': '0',
}
media = self._download_json(
'%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
display_id, 'Downloading %s JSON' % location['loc'])
file_ = media.get('file')
if not file_:
continue
formats.extend(self._extract_f4m_formats(
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
display_id, f4m_id=loc))
self._sort_formats(formats)
title = self._search_regex( title = self._search_regex(
r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>', r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>',
@@ -112,21 +136,12 @@ class MiTeleIE(InfoExtractor):
title = remove_start(self._search_regex( title = remove_start(self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), 'Ver online ') r'<title>([^<]+)</title>', webpage, 'title'), 'Ver online ')
video_id = self._search_regex( info.update({
r'data-media-id\s*=\s*"([^"]+)"', webpage,
'data media id', default=None) or display_id
thumbnail = config.get('poster', {}).get('imageUrl')
duration = int_or_none(mmc.get('duration'))
return {
'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'description': get_element_by_attribute('class', 'text', webpage), 'description': get_element_by_attribute('class', 'text', webpage),
'series': series, 'series': series,
'season': season, 'season': season,
'episode': episode, 'episode': episode,
'thumbnail': thumbnail, })
'duration': duration, return info
'formats': formats,
}

View File

@@ -15,6 +15,8 @@ from ..utils import (
float_or_none, float_or_none,
HEADRequest, HEADRequest,
sanitized_Request, sanitized_Request,
strip_or_none,
timeconvert,
unescapeHTML, unescapeHTML,
url_basename, url_basename,
RegexNotFoundError, RegexNotFoundError,
@@ -35,13 +37,13 @@ class MTVServicesInfoExtractor(InfoExtractor):
return uri.split(':')[-1] return uri.split(':')[-1]
# This was originally implemented for ComedyCentral, but it also works here # This was originally implemented for ComedyCentral, but it also works here
@staticmethod @classmethod
def _transform_rtmp_url(rtmp_video_url): def _transform_rtmp_url(cls, rtmp_video_url):
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp\..+?/.*)$', rtmp_video_url) m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp\..+?/.*)$', rtmp_video_url)
if not m: if not m:
return rtmp_video_url return {'rtmp': rtmp_video_url}
base = 'http://viacommtvstrmfs.fplive.net/' base = 'http://viacommtvstrmfs.fplive.net/'
return base + m.group('finalid') return {'http': base + m.group('finalid')}
def _get_feed_url(self, uri): def _get_feed_url(self, uri):
return self._FEED_URL return self._FEED_URL
@@ -85,14 +87,14 @@ class MTVServicesInfoExtractor(InfoExtractor):
rtmp_video_url = rendition.find('./src').text rtmp_video_url = rendition.find('./src').text
if rtmp_video_url.endswith('siteunavail.png'): if rtmp_video_url.endswith('siteunavail.png'):
continue continue
new_url = self._transform_rtmp_url(rtmp_video_url) new_urls = self._transform_rtmp_url(rtmp_video_url)
formats.append({ formats.extend([{
'ext': 'flv' if new_url.startswith('rtmp') else ext, 'ext': 'flv' if new_url.startswith('rtmp') else ext,
'url': new_url, 'url': new_url,
'format_id': rendition.get('bitrate'), 'format_id': '-'.join(filter(None, [kind, rendition.get('bitrate')])),
'width': int(rendition.get('width')), 'width': int(rendition.get('width')),
'height': int(rendition.get('height')), 'height': int(rendition.get('height')),
}) } for kind, new_url in new_urls.items()])
except (KeyError, TypeError): except (KeyError, TypeError):
raise ExtractorError('Invalid rendition field.') raise ExtractorError('Invalid rendition field.')
self._sort_formats(formats) self._sort_formats(formats)
@@ -133,7 +135,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
message += item.text message += item.text
raise ExtractorError(message, expected=True) raise ExtractorError(message, expected=True)
description = xpath_text(itemdoc, 'description') description = strip_or_none(xpath_text(itemdoc, 'description'))
timestamp = timeconvert(xpath_text(itemdoc, 'pubDate'))
title_el = None title_el = None
if title_el is None: if title_el is None:
@@ -167,6 +171,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
'thumbnail': self._get_thumbnail_url(uri, itemdoc), 'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description, 'description': description,
'duration': float_or_none(content_el.attrib.get('duration')), 'duration': float_or_none(content_el.attrib.get('duration')),
'timestamp': timestamp,
} }
def _get_feed_query(self, uri): def _get_feed_query(self, uri):
@@ -185,8 +190,13 @@ class MTVServicesInfoExtractor(InfoExtractor):
idoc = self._download_xml( idoc = self._download_xml(
url, video_id, url, video_id,
'Downloading info', transform_source=fix_xml_ampersands) 'Downloading info', transform_source=fix_xml_ampersands)
title = xpath_text(idoc, './channel/title')
description = xpath_text(idoc, './channel/description')
return self.playlist_result( return self.playlist_result(
[self._get_video_info(item) for item in idoc.findall('.//item')]) [self._get_video_info(item) for item in idoc.findall('.//item')],
playlist_title=title, playlist_description=description)
def _extract_mgid(self, webpage): def _extract_mgid(self, webpage):
try: try:
@@ -232,6 +242,8 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Peter Dinklage Sums Up \'Game Of Thrones\' In 45 Seconds', 'title': 'Peter Dinklage Sums Up \'Game Of Thrones\' In 45 Seconds',
'description': '"Sexy sexy sexy, stabby stabby stabby, beautiful language," says Peter Dinklage as he tries summarizing "Game of Thrones" in under a minute.', 'description': '"Sexy sexy sexy, stabby stabby stabby, beautiful language," says Peter Dinklage as he tries summarizing "Game of Thrones" in under a minute.',
'timestamp': 1400126400,
'upload_date': '20140515',
}, },
} }
@@ -274,6 +286,8 @@ class MTVIE(MTVServicesInfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Taylor Swift - "Ours (VH1 Storytellers)"', 'title': 'Taylor Swift - "Ours (VH1 Storytellers)"',
'description': 'Album: Taylor Swift performs "Ours" for VH1 Storytellers at Harvey Mudd College.', 'description': 'Album: Taylor Swift performs "Ours" for VH1 Storytellers at Harvey Mudd College.',
'timestamp': 1352610000,
'upload_date': '20121111',
}, },
}, },
] ]
@@ -300,20 +314,6 @@ class MTVIE(MTVServicesInfoExtractor):
return self._get_videos_info(uri) return self._get_videos_info(uri)
class MTVIggyIE(MTVServicesInfoExtractor):
IE_NAME = 'mtviggy.com'
_VALID_URL = r'https?://www\.mtviggy\.com/videos/.+'
_TEST = {
'url': 'http://www.mtviggy.com/videos/arcade-fire-behind-the-scenes-at-the-biggest-music-experiment-yet/',
'info_dict': {
'id': '984696',
'ext': 'mp4',
'title': 'Arcade Fire: Behind the Scenes at the Biggest Music Experiment Yet',
}
}
_FEED_URL = 'http://all.mtvworldverticals.com/feed-xml/'
class MTVDEIE(MTVServicesInfoExtractor): class MTVDEIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv.de' IE_NAME = 'mtv.de'
_VALID_URL = r'https?://(?:www\.)?mtv\.de/(?:artists|shows|news)/(?:[^/]+/)*(?P<id>\d+)-[^/#?]+/*(?:[#?].*)?$' _VALID_URL = r'https?://(?:www\.)?mtv\.de/(?:artists|shows|news)/(?:[^/]+/)*(?P<id>\d+)-[^/#?]+/*(?:[#?].*)?$'
@@ -321,7 +321,7 @@ class MTVDEIE(MTVServicesInfoExtractor):
'url': 'http://www.mtv.de/artists/10571-cro/videos/61131-traum', 'url': 'http://www.mtv.de/artists/10571-cro/videos/61131-traum',
'info_dict': { 'info_dict': {
'id': 'music_video-a50bc5f0b3aa4b3190aa', 'id': 'music_video-a50bc5f0b3aa4b3190aa',
'ext': 'mp4', 'ext': 'flv',
'title': 'MusicVideo_cro-traum', 'title': 'MusicVideo_cro-traum',
'description': 'Cro - Traum', 'description': 'Cro - Traum',
}, },
@@ -329,20 +329,21 @@ class MTVDEIE(MTVServicesInfoExtractor):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Blocked at Travis CI',
}, { }, {
# mediagen URL without query (e.g. http://videos.mtvnn.com/mediagen/e865da714c166d18d6f80893195fcb97) # mediagen URL without query (e.g. http://videos.mtvnn.com/mediagen/e865da714c166d18d6f80893195fcb97)
'url': 'http://www.mtv.de/shows/933-teen-mom-2/staffeln/5353/folgen/63565-enthullungen', 'url': 'http://www.mtv.de/shows/933-teen-mom-2/staffeln/5353/folgen/63565-enthullungen',
'info_dict': { 'info_dict': {
'id': 'local_playlist-f5ae778b9832cc837189', 'id': 'local_playlist-f5ae778b9832cc837189',
'ext': 'mp4', 'ext': 'flv',
'title': 'Episode_teen-mom-2_shows_season-5_episode-1_full-episode_part1', 'title': 'Episode_teen-mom-2_shows_season-5_episode-1_full-episode_part1',
}, },
'params': { 'params': {
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Blocked at Travis CI',
}, { }, {
# single video in pagePlaylist with different id
'url': 'http://www.mtv.de/news/77491-mtv-movies-spotlight-pixels-teil-3', 'url': 'http://www.mtv.de/news/77491-mtv-movies-spotlight-pixels-teil-3',
'info_dict': { 'info_dict': {
'id': 'local_playlist-4e760566473c4c8c5344', 'id': 'local_playlist-4e760566473c4c8c5344',
@@ -354,6 +355,7 @@ class MTVDEIE(MTVServicesInfoExtractor):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Das Video kann zur Zeit nicht abgespielt werden.',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -366,11 +368,14 @@ class MTVDEIE(MTVServicesInfoExtractor):
r'window\.pagePlaylist\s*=\s*(\[.+?\]);\n', webpage, 'page playlist'), r'window\.pagePlaylist\s*=\s*(\[.+?\]);\n', webpage, 'page playlist'),
video_id) video_id)
def _mrss_url(item):
return item['mrss'] + item.get('mrssvars', '')
# news pages contain single video in playlist with different id # news pages contain single video in playlist with different id
if len(playlist) == 1: if len(playlist) == 1:
return self._get_videos_info_from_url(playlist[0]['mrss'], video_id) return self._get_videos_info_from_url(_mrss_url(playlist[0]), video_id)
for item in playlist: for item in playlist:
item_id = item.get('id') item_id = item.get('id')
if item_id and compat_str(item_id) == video_id: if item_id and compat_str(item_id) == video_id:
return self._get_videos_info_from_url(item['mrss'], video_id) return self._get_videos_info_from_url(_mrss_url(item), video_id)

View File

@@ -1,16 +1,19 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
smuggle_url, smuggle_url,
url_basename, url_basename,
update_url_query, update_url_query,
get_element_by_class,
) )
class NationalGeographicIE(InfoExtractor): class NationalGeographicVideoIE(InfoExtractor):
IE_NAME = 'natgeo' IE_NAME = 'natgeo:video'
_VALID_URL = r'https?://video\.nationalgeographic\.com/.*?' _VALID_URL = r'https?://video\.nationalgeographic\.com/.*?'
_TESTS = [ _TESTS = [
@@ -62,16 +65,16 @@ class NationalGeographicIE(InfoExtractor):
} }
class NationalGeographicChannelIE(ThePlatformIE): class NationalGeographicIE(ThePlatformIE):
IE_NAME = 'natgeo:channel' IE_NAME = 'natgeo'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/videos/(?P<id>[^/?]+)' _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/(?:videos|episodes)/(?P<id>[^/?]+)'
_TESTS = [ _TESTS = [
{ {
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/', 'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
'md5': '518c9aa655686cf81493af5cc21e2a04', 'md5': '518c9aa655686cf81493af5cc21e2a04',
'info_dict': { 'info_dict': {
'id': 'nB5vIAfmyllm', 'id': 'vKInpacll2pC',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Uncovering a Universal Knowledge', 'title': 'Uncovering a Universal Knowledge',
'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a', 'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
@@ -85,7 +88,7 @@ class NationalGeographicChannelIE(ThePlatformIE):
'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/', 'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
'md5': 'c4912f656b4cbe58f3e000c489360989', 'md5': 'c4912f656b4cbe58f3e000c489360989',
'info_dict': { 'info_dict': {
'id': '3TmMv9OvGwIR', 'id': 'Pok5lWCkiEFA',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Stunning Red Bird of Paradise', 'title': 'The Stunning Red Bird of Paradise',
'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c', 'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
@@ -95,6 +98,10 @@ class NationalGeographicChannelIE(ThePlatformIE):
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
}, },
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/',
'only_matching': True,
}
] ]
def _real_extract(self, url): def _real_extract(self, url):
@@ -122,3 +129,40 @@ class NationalGeographicChannelIE(ThePlatformIE):
{'force_smil_url': True}), {'force_smil_url': True}),
'display_id': display_id, 'display_id': display_id,
} }
class NationalGeographicEpisodeGuideIE(ThePlatformIE):
IE_NAME = 'natgeo:episodeguide'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episode-guide/',
'info_dict': {
'id': 'the-story-of-god-with-morgan-freeman-season-1',
'title': 'The Story of God with Morgan Freeman - Season 1',
},
'playlist_mincount': 6,
},
{
'url': 'http://channel.nationalgeographic.com/underworld-inc/episode-guide/?s=2',
'info_dict': {
'id': 'underworld-inc-season-2',
'title': 'Underworld, Inc. - Season 2',
},
'playlist_mincount': 7,
},
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
show = get_element_by_class('show', webpage)
selected_season = self._search_regex(
r'<div[^>]+class="select-seasons[^"]*".*?<a[^>]*>(.*?)</a>',
webpage, 'selected season')
entries = [
self.url_result(self._proto_relative_url(entry_url), 'NationalGeographic')
for entry_url in re.findall('(?s)<div[^>]+class="col-inner"[^>]*?>.*?<a[^>]+href="([^"]+)"', webpage)]
return self.playlist_result(
entries, '%s-%s' % (display_id, selected_season.lower().replace(' ', '-')),
'%s - %s' % (show, selected_season))

View File

@@ -4,12 +4,10 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none,
update_url_query,
) )
@@ -51,48 +49,74 @@ class NaverIE(InfoExtractor):
if error: if error:
raise ExtractorError(error, expected=True) raise ExtractorError(error, expected=True)
raise ExtractorError('couldn\'t extract vid and key') raise ExtractorError('couldn\'t extract vid and key')
vid = m_id.group(1) video_data = self._download_json(
key = m_id.group(2) 'http://play.rmcnmv.naver.com/vod/play/v2.0/' + m_id.group(1),
query = compat_urllib_parse_urlencode({'vid': vid, 'inKey': key, }) video_id, query={
query_urls = compat_urllib_parse_urlencode({ 'key': m_id.group(2),
'masterVid': vid, })
'protocol': 'p2p', meta = video_data['meta']
'inKey': key, title = meta['subject']
})
info = self._download_xml(
'http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?' + query,
video_id, 'Downloading video info')
urls = self._download_xml(
'http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?' + query_urls,
video_id, 'Downloading video formats info')
formats = [] formats = []
for format_el in urls.findall('EncodingOptions/EncodingOption'):
domain = format_el.find('Domain').text def extract_formats(streams, stream_type, query={}):
uri = format_el.find('uri').text for stream in streams:
f = { stream_url = stream.get('source')
'url': compat_urlparse.urljoin(domain, uri), if not stream_url:
'ext': 'mp4', continue
'width': int(format_el.find('width').text), stream_url = update_url_query(stream_url, query)
'height': int(format_el.find('height').text), encoding_option = stream.get('encodingOption', {})
} bitrate = stream.get('bitrate', {})
if domain.startswith('rtmp'): formats.append({
# urlparse does not support custom schemes 'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
# https://bugs.python.org/issue18828 'url': stream_url,
f.update({ 'width': int_or_none(encoding_option.get('width')),
'url': domain + uri, 'height': int_or_none(encoding_option.get('height')),
'ext': 'flv', 'vbr': int_or_none(bitrate.get('video')),
'rtmp_protocol': '1', # rtmpt 'abr': int_or_none(bitrate.get('audio')),
'filesize': int_or_none(stream.get('size')),
'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
}) })
formats.append(f)
extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
for stream_set in video_data.get('streams', []):
query = {}
for param in stream_set.get('keys', []):
query[param['name']] = param['value']
stream_type = stream_set.get('type')
videos = stream_set.get('videos')
if videos:
extract_formats(videos, stream_type, query)
elif stream_type == 'HLS':
stream_url = stream_set.get('source')
if not stream_url:
continue
formats.extend(self._extract_m3u8_formats(
update_url_query(stream_url, query), video_id,
'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
for caption in video_data.get('captions', {}).get('list', []):
caption_url = caption.get('source')
if not caption_url:
continue
subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
'url': caption_url,
})
upload_date = self._search_regex(
r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
webpage, 'upload date', fatal=False)
if upload_date:
upload_date = upload_date.replace('.', '')
return { return {
'id': video_id, 'id': video_id,
'title': info.find('Subject').text, 'title': title,
'formats': formats, 'formats': formats,
'subtitles': subtitles,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
'upload_date': info.find('WriteDate').text.replace('.', ''), 'view_count': int_or_none(meta.get('count')),
'view_count': int(info.find('PlayCount').text), 'upload_date': upload_date,
} }

View File

@@ -1,30 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor
from ..compat import compat_urllib_parse_urlencode
class NextMovieIE(MTVServicesInfoExtractor):
IE_NAME = 'nextmovie.com'
_VALID_URL = r'https?://(?:www\.)?nextmovie\.com/shows/[^/]+/\d{4}-\d{2}-\d{2}/(?P<id>[^/?#]+)'
_FEED_URL = 'http://lite.dextr.mtvi.com/service1/dispatch.htm'
_TESTS = [{
'url': 'http://www.nextmovie.com/shows/exclusives/2013-03-10/mgid:uma:videolist:nextmovie.com:1715019/',
'md5': '09a9199f2f11f10107d04fcb153218aa',
'info_dict': {
'id': '961726',
'ext': 'mp4',
'title': 'The Muppets\' Gravity',
},
}]
def _get_feed_query(self, uri):
return compat_urllib_parse_urlencode({
'feed': '1505',
'mgid': uri,
})
def _real_extract(self, url):
mgid = self._match_id(url)
return self._get_videos_info(mgid)

View File

@@ -7,8 +7,9 @@ from ..utils import update_url_query
class NickIE(MTVServicesInfoExtractor): class NickIE(MTVServicesInfoExtractor):
# None of videos on the website are still alive?
IE_NAME = 'nick.com' IE_NAME = 'nick.com'
_VALID_URL = r'https?://(?:www\.)?nick\.com/videos/clip/(?P<id>[^/?#.]+)' _VALID_URL = r'https?://(?:www\.)?nick(?:jr)?\.com/(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm' _FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
_TESTS = [{ _TESTS = [{
'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html', 'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
@@ -52,6 +53,9 @@ class NickIE(MTVServicesInfoExtractor):
} }
}, },
], ],
}, {
'url': 'http://www.nickjr.com/paw-patrol/videos/pups-save-a-goldrush-s3-ep302-full-episode/',
'only_matching': True,
}] }]
def _get_feed_query(self, uri): def _get_feed_query(self, uri):

View File

@@ -0,0 +1,72 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
ExtractorError,
)
class NineNowIE(InfoExtractor):
IE_NAME = '9now.com.au'
_VALID_URL = r'https?://(?:www\.)?9now\.com\.au/(?:[^/]+/){2}(?P<id>[^/?#]+)'
_TESTS = [{
# clip
'url': 'https://www.9now.com.au/afl-footy-show/2016/clip-ciql02091000g0hp5oktrnytc',
'md5': '17cf47d63ec9323e562c9957a968b565',
'info_dict': {
'id': '16801',
'ext': 'mp4',
'title': 'St. Kilda\'s Joey Montagna on the potential for a player\'s strike',
'description': 'Is a boycott of the NAB Cup "on the table"?',
'uploader_id': '4460760524001',
'upload_date': '20160713',
'timestamp': 1468421266,
},
'skip': 'Only available in Australia',
}, {
# episode
'url': 'https://www.9now.com.au/afl-footy-show/2016/episode-19',
'only_matching': True,
}, {
# DRM protected
'url': 'https://www.9now.com.au/andrew-marrs-history-of-the-world/season-1/episode-1',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
page_data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.*?});', webpage,
'page data'), display_id)
common_data = page_data.get('episode', {}).get('episode') or page_data.get('clip', {}).get('clip')
video_data = common_data['video']
if video_data.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
brightcove_id = video_data.get('brightcoveId') or 'ref:' + video_data['referenceId']
video_id = compat_str(video_data.get('id') or brightcove_id)
title = common_data['name']
thumbnails = [{
'id': thumbnail_id,
'url': thumbnail_url,
'width': int_or_none(thumbnail_id[1:])
} for thumbnail_id, thumbnail_url in common_data.get('image', {}).get('sizes', {}).items()]
return {
'_type': 'url_transparent',
'url': self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
'id': video_id,
'title': title,
'description': common_data.get('description'),
'duration': float_or_none(video_data.get('duration'), 1000),
'thumbnails': thumbnails,
'ie_key': 'BrightcoveNew',
}

View File

@@ -0,0 +1,46 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import unescapeHTML
class NintendoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nintendo\.com/games/detail/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.nintendo.com/games/detail/yEiAzhU2eQI1KZ7wOHhngFoAHc1FpHwj',
'info_dict': {
'id': 'MzMmticjp0VPzO3CCj4rmFOuohEuEWoW',
'ext': 'flv',
'title': 'Duck Hunt Wii U VC NES - Trailer',
'duration': 60.326,
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.nintendo.com/games/detail/tokyo-mirage-sessions-fe-wii-u',
'info_dict': {
'id': 'tokyo-mirage-sessions-fe-wii-u',
'title': 'Tokyo Mirage Sessions ♯FE',
},
'playlist_count': 3,
}]
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
entries = [
OoyalaIE._build_url_result(m.group('code'))
for m in re.finditer(
r'class=(["\'])embed-video\1[^>]+data-video-code=(["\'])(?P<code>(?:(?!\2).)+)\2',
webpage)]
return self.playlist_result(
entries, page_id, unescapeHTML(self._og_search_title(webpage, fatal=False)))

View File

@@ -11,70 +11,64 @@ from ..utils import (
class NTVRuIE(InfoExtractor): class NTVRuIE(InfoExtractor):
IE_NAME = 'ntv.ru' IE_NAME = 'ntv.ru'
_VALID_URL = r'https?://(?:www\.)?ntv\.ru/(?P<id>.+)' _VALID_URL = r'https?://(?:www\.)?ntv\.ru/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [ _TESTS = [{
{ 'url': 'http://www.ntv.ru/novosti/863142/',
'url': 'http://www.ntv.ru/novosti/863142/', 'md5': 'ba7ea172a91cb83eb734cad18c10e723',
'md5': 'ba7ea172a91cb83eb734cad18c10e723', 'info_dict': {
'info_dict': { 'id': '746000',
'id': '746000', 'ext': 'mp4',
'ext': 'mp4', 'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины', 'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины', 'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^http://.*\.jpg', 'duration': 136,
'duration': 136,
},
}, },
{ }, {
'url': 'http://www.ntv.ru/video/novosti/750370/', 'url': 'http://www.ntv.ru/video/novosti/750370/',
'md5': 'adecff79691b4d71e25220a191477124', 'md5': 'adecff79691b4d71e25220a191477124',
'info_dict': { 'info_dict': {
'id': '750370', 'id': '750370',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход', 'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход', 'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'thumbnail': 're:^http://.*\.jpg', 'thumbnail': 're:^http://.*\.jpg',
'duration': 172, 'duration': 172,
},
}, },
{ }, {
'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416', 'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416',
'md5': '82dbd49b38e3af1d00df16acbeab260c', 'md5': '82dbd49b38e3af1d00df16acbeab260c',
'info_dict': { 'info_dict': {
'id': '747480', 'id': '747480',
'ext': 'mp4', 'ext': 'mp4',
'title': '«Сегодня». 21 марта 2014 года. 16:00', 'title': '«Сегодня». 21 марта 2014 года. 16:00',
'description': '«Сегодня». 21 марта 2014 года. 16:00', 'description': '«Сегодня». 21 марта 2014 года. 16:00',
'thumbnail': 're:^http://.*\.jpg', 'thumbnail': 're:^http://.*\.jpg',
'duration': 1496, 'duration': 1496,
},
}, },
{ }, {
'url': 'http://www.ntv.ru/kino/Koma_film', 'url': 'http://www.ntv.ru/kino/Koma_film',
'md5': 'f825770930937aa7e5aca0dc0d29319a', 'md5': 'f825770930937aa7e5aca0dc0d29319a',
'info_dict': { 'info_dict': {
'id': '1007609', 'id': '1007609',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Остросюжетный фильм «Кома»', 'title': 'Остросюжетный фильм «Кома»',
'description': 'Остросюжетный фильм «Кома»', 'description': 'Остросюжетный фильм «Кома»',
'thumbnail': 're:^http://.*\.jpg', 'thumbnail': 're:^http://.*\.jpg',
'duration': 5592, 'duration': 5592,
},
}, },
{ }, {
'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/', 'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/',
'md5': '9320cd0e23f3ea59c330dc744e06ff3b', 'md5': '9320cd0e23f3ea59c330dc744e06ff3b',
'info_dict': { 'info_dict': {
'id': '751482', 'id': '751482',
'ext': 'mp4', 'ext': 'mp4',
'title': '«Дело врачей»: «Деревце жизни»', 'title': '«Дело врачей»: «Деревце жизни»',
'description': '«Дело врачей»: «Деревце жизни»', 'description': '«Дело врачей»: «Деревце жизни»',
'thumbnail': 're:^http://.*\.jpg', 'thumbnail': 're:^http://.*\.jpg',
'duration': 2590, 'duration': 2590,
},
}, },
] }]
_VIDEO_ID_REGEXES = [ _VIDEO_ID_REGEXES = [
r'<meta property="og:url" content="http://www\.ntv\.ru/video/(\d+)', r'<meta property="og:url" content="http://www\.ntv\.ru/video/(\d+)',
@@ -87,11 +81,21 @@ class NTVRuIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_id = self._html_search_regex(self._VIDEO_ID_REGEXES, webpage, 'video id') video_url = self._og_search_property(
('video', 'video:iframe'), webpage, default=None)
if video_url:
video_id = self._search_regex(
r'https?://(?:www\.)?ntv\.ru/video/(?:embed/)?(\d+)',
video_url, 'video id', default=None)
if not video_id:
video_id = self._html_search_regex(
self._VIDEO_ID_REGEXES, webpage, 'video id')
player = self._download_xml( player = self._download_xml(
'http://www.ntv.ru/vi%s/' % video_id, 'http://www.ntv.ru/vi%s/' % video_id,
video_id, 'Downloading video XML') video_id, 'Downloading video XML')
title = clean_html(xpath_text(player, './data/title', 'title', fatal=True)) title = clean_html(xpath_text(player, './data/title', 'title', fatal=True))
description = clean_html(xpath_text(player, './data/description', 'description')) description = clean_html(xpath_text(player, './data/description', 'description'))

Some files were not shown because too many files have changed in this diff Show More