Compare commits

...

387 Commits

Author SHA1 Message Date
Sergey M․
73a85620ee release 2016.08.13 2016-08-13 23:17:11 +07:00
Sergey M․
a560f28c98 [ChangeLog] Actualize 2016-08-13 23:01:35 +07:00
Sergey M․
5ec5461e1a [pbs] Clarify comment on http formats 2016-08-13 22:50:18 +07:00
Sergey M․
542130a5d9 [pbs] Fix description extraction and update tests 2016-08-13 21:59:29 +07:00
Sergey M․
82997dad57 [franceculture] Fix extraction (Closes #10324) 2016-08-13 21:00:34 +07:00
Sergey M․
647a7bf5e8 [pornotube] Fix extraction (Closes #10322) 2016-08-13 20:49:16 +07:00
Sergey M․
77afa008dd [4tube] Fix metadata extraction (Closes #10321) 2016-08-13 19:55:09 +07:00
Yen Chi Hsuan
db535435b3 [bigflix] Remove an invalid test
There's no video anymore
2016-08-13 18:02:11 +08:00
Sergey M․
c2a453b461 [imgur] Fix width and height extraction (Closes #10325) 2016-08-13 16:46:07 +07:00
Sergey M․
cd29eaab95 [vbox7] Remove unused imports 2016-08-13 16:45:34 +07:00
Yen Chi Hsuan
52aa7e7476 [test_verbose_output] Fix tests under Python 3 2016-08-13 17:36:14 +08:00
Sergey M․
e97c55ee6a [expotv] Improve extraction and update test 2016-08-13 16:29:05 +07:00
Remita Amine
acfccacad5 [downloader/external:curl] Clarify why CurlFD should not capture stderr 2016-08-13 10:26:02 +01:00
Remita Amine
5f2c2b7936 [test_utils] add test for option with not str value 2016-08-13 09:54:12 +01:00
Sergey M․
cb55908e51 [vbox7] Fix extraction (Closes #10309) 2016-08-13 15:47:20 +07:00
Yen Chi Hsuan
e581224843 [tapely] Remove extractor. It's shut down
Closes #10323
2016-08-13 16:32:07 +08:00
Remita Amine
f50365e91c [pbs] add test for videos with undocumented http formats and remove unused import 2016-08-13 09:10:09 +01:00
Sergey M․
c366f8d30a [24video] Add support for me and xxx TLDs 2016-08-13 14:47:51 +07:00
Sergey M․
6a26c5f9d5 [muenchentv] Fix extraction (Closes #10313) 2016-08-13 14:28:44 +07:00
Sergey M․
bd6fb007de [24video] Fix comment count extraction 2016-08-13 14:22:47 +07:00
Sergey M․
b69b2ff736 [sunporno] Add support for embed URLs 2016-08-13 14:13:49 +07:00
Sergey M․
794e5dcd7e [sunporno] Fix metadata extraction (Closes #10316) 2016-08-13 14:09:35 +07:00
Remita Amine
f0d3669437 [hgtv] Add new extractor(closes #3999) 2016-08-12 18:05:49 +01:00
Remita Amine
98e698f1ff [external/curl] respect more downloader options and display progress 2016-08-12 12:30:02 +01:00
Remita Amine
3cddb8d6a7 [pbs] check all http formats and remove unnecessary request
- some of the quality that not reported in the documentation
are available(4500k, 6500k)
- the videoInfo request doesn't work for a long time
2016-08-12 08:38:06 +01:00
Sergey M․
990d533ee4 [crunchyroll] Add support for HLS (Closes #10301) 2016-08-12 00:56:16 +07:00
Sergey M․
b0081562d2 release 2016.08.12 2016-08-12 00:22:22 +07:00
Sergey M․
fff37cfd4f [ChangeLog] Actualize 2016-08-12 00:18:28 +07:00
Sergey M․
a3be69b7f0 [viu] Remove from extractors 2016-08-12 00:14:51 +07:00
Sergey M․
0fd1b1624c [goldenmoustache] Remove extractor (Closes #10298)
Now uses dailymotion
2016-08-11 23:52:17 +07:00
Sergey M․
367976d49f [drtuber] Improve title extraction 2016-08-11 23:47:52 +07:00
Sergey M․
0aef0771f8 [drtuber] Make dislike count optional (Closes #10297) 2016-08-11 23:47:27 +07:00
Sergey M․
0c070681c5 [chirbit] Fix extraction (Closes #10296) 2016-08-11 23:37:56 +07:00
Sergey M․
30b25d382d [francetvinfo] Relax _VALID_URL 2016-08-11 21:42:55 +07:00
Yen Chi Hsuan
e5f878c205 [ChangeLog] Add change log for #10269
[skip ci]
2016-08-11 19:13:41 +08:00
Yen Chi Hsuan
e2e84aed7e Merge branch 'lkho-pr/#10268' 2016-08-11 19:09:18 +08:00
Yen Chi Hsuan
b1927f4e8a [YoutubeDL] Disable newline conversion when writing subtitles
By default io.open() convert all '\n' occurrences to '\r\n' when writing
files. If the content already contains '\r\n', it will be converted to
'\r\r\n', breaking some video players.
2016-08-11 19:04:23 +08:00
Yen Chi Hsuan
3b9323d96e Merge branch 'pr/#10268' of https://github.com/lkho/youtube-dl into lkho-pr/#10268 2016-08-11 19:03:08 +08:00
lkho
7f832413d6 Preserve line endings for downloaded subtitle files 2016-08-10 23:40:50 +08:00
Sergey M․
7f2ed47595 [rtlnl] Relax _VALID_URL (Closes #10282) 2016-08-10 21:07:43 +07:00
Sergey M․
c3fa77bdef [formula1] Relax _VALID_URL (Closes #10283) 2016-08-10 21:00:40 +07:00
Remita Amine
57ce8a6d08 [wat] improve extraction(#10281)
add alternative method to extract http formats
works even if the video is geo-restricted or removed
from public access(most of the cases)
2016-08-10 14:20:28 +01:00
Yen Chi Hsuan
69d8eeeec5 [ctsnews] Fix extraction 2016-08-10 11:38:38 +08:00
Yen Chi Hsuan
81c13222c6 [utils] Recognize more formats in unified_timestamp
Used in CtsNews
2016-08-10 11:37:23 +08:00
Sergey M․
b1ce2ba197 release 2016.08.10 2016-08-10 00:20:44 +07:00
Sergey M․
5c8411e968 [ChangeLog] Actualize 2016-08-10 00:18:28 +07:00
Sergey M․
cc9c8ce5df [devscripts/prepare_manpage] Fix description strings starting with dash (Closes #10273) 2016-08-09 22:24:58 +07:00
Remita Amine
20ef4123b9 [uol] remove unused import 2016-08-09 15:13:15 +01:00
Remita Amine
4e62d26aa2 [uol] Add new extractor(#4263) 2016-08-09 15:09:08 +01:00
Sergey M․
b657816684 Credit @singh-pratyush96 for #10223 2016-08-09 04:04:45 +07:00
Sergey M․
9778b3e7ee Credit @zvonicek for #10242 and #10253 2016-08-09 04:03:52 +07:00
Sergey M․
25dd58ca6a [metadatafromtitle] Remove unused exception class 2016-08-09 04:01:05 +07:00
nyorain
5e42f8a0ad Make --metadata-from-title non fatal
Output a warning if the metadata can't be parsed from the title (and don't write any metadata) instead of raising a critical error.
2016-08-09 03:56:22 +07:00
Sergey M․
1ad6b891b2 Add more checks for --min/max-sleep-interval arguments and use more idiomatic naming 2016-08-09 03:47:56 +07:00
Sergey M․
7aa589a5e1 Fix --min/max-sleep-interval wording 2016-08-09 03:46:52 +07:00
singh-pratyush96
065bc35489 Add --max-sleep-interval (Closes #9930) 2016-08-09 03:32:42 +07:00
Sergey M․
3a380766d1 [rbmaradio] Improve, simplify and extract all formats (Closes #10242) 2016-08-09 02:46:29 +07:00
Petr Zvoníček
affaea0688 [rbmaradio] Fixed extractor 2016-08-09 02:18:33 +07:00
Sergey M․
77426a087b [sonyliv] Improve (Closes #10258) 2016-08-09 02:16:28 +07:00
Sukhbir Singh
8991844ea2 [sonyliv] Add new extractor 2016-08-09 02:09:13 +07:00
Sergey M․
082395d0a0 [extractor/generic] Add proper default to _search_json_ld call 2016-08-08 22:48:33 +07:00
Sergey M․
e8ed7354e6 [flipagram] Add proper default to _search_json_ld call 2016-08-08 22:46:19 +07:00
Sergey M․
1e7f602e2a [condenast] Make _search_json_ld call non fatal 2016-08-08 22:45:49 +07:00
Sergey M․
522f6c066d [bbc] Add proper default to _search_json_ld call 2016-08-08 22:44:36 +07:00
Sergey M․
321b5e082a [extractor/common] Respect default in _search_json_ld 2016-08-08 22:36:18 +07:00
Sergey M․
3711fa1eb2 Revert "[flipagram] Make _search_json_ld non fatal"
This reverts commit d34995a9e3.
2016-08-08 21:49:45 +07:00
Sergey M․
395c74615c Revert "[extractor/generic] Make _search_json_ld non fatal"
This reverts commit 958849275f.
2016-08-08 21:49:27 +07:00
Yen Chi Hsuan
3dc240e8c6 [sohu] Update _TESTS (closes #10260) 2016-08-08 18:48:21 +08:00
Yen Chi Hsuan
a41a6c5094 [chaturbate] Skip the invalid test 2016-08-08 13:06:02 +08:00
Yen Chi Hsuan
d71207121d [biqle] Skip an invalid test 2016-08-08 12:59:55 +08:00
Yen Chi Hsuan
b1c6f21c74 [aparat] Fix extraction 2016-08-08 12:59:07 +08:00
Yen Chi Hsuan
412abb8760 [bilibili] Update _TESTS 2016-08-08 12:57:17 +08:00
Yen Chi Hsuan
f17d5f6d14 [features.aol.com] Fix _TESTS 2016-08-08 12:52:36 +08:00
Remita Amine
6bb801cfaf [cwtv] extract http formats 2016-08-07 22:58:12 +01:00
Sergey M․
de02d1f4e9 [rozhlas] Fix regexes and improve extraction (Closes #10253) 2016-08-08 04:58:02 +07:00
Petr Zvoníček
e1f93a0a76 [rozhlas] Add new extractor 2016-08-08 04:41:45 +07:00
Charlie Le
d21a661bb4 [README.md] Update Options Link
The link references a bad anchor. The updated link now references the correct anchor.
2016-08-08 03:46:42 +07:00
Yen Chi Hsuan
b2bd968f4b [kuwo:singer] Fix extraction 2016-08-07 22:59:34 +08:00
Sergey M․
4a01befb34 release 2016.08.07 2016-08-07 21:12:41 +07:00
Sergey M․
845dfcdc40 [ChangeLog] Actualize 2016-08-07 21:10:48 +07:00
Sergey M․
d92cb46305 [discoverygo] Add extractor (Closes #10245) 2016-08-07 20:57:05 +07:00
Sergey M․
a8795327ca [utils] Add support TV Parental Guidelines ratings in parse_age_limit 2016-08-07 20:45:18 +07:00
Sergey M․
d34995a9e3 [flipagram] Make _search_json_ld non fatal 2016-08-07 19:06:55 +07:00
Sergey M․
958849275f [extractor/generic] Make _search_json_ld non fatal 2016-08-07 19:04:22 +07:00
Sergey M․
998f094452 [bbc] Remove proxy from test 2016-08-07 18:13:05 +07:00
Sergey M․
aaa42cf0cf [bbc] PEP 8 2016-08-07 18:05:13 +07:00
Sergey M․
9fb64c04cd [bbc] Add support for morph embeds (Closes #10239) 2016-08-07 18:01:50 +07:00
Remita Amine
f9622868e7 [bbc] preserve format_id backward compatibility 2016-08-07 11:14:15 +01:00
Remita Amine
37768f9242 [common] correctly lower the preference of m3u8 master manifest format 2016-08-07 10:59:09 +01:00
Sergey M․
a1aadd09a4 [tnaflixnetworkbase] Improve title extraction 2016-08-07 16:00:09 +07:00
Sergey M․
b47a75017b [tnaflix] Fix metadata extraction (Closes #10249) 2016-08-07 16:00:03 +07:00
Remita Amine
e37b54b140 [fox] fix theplatform release url query 2016-08-06 20:53:39 +01:00
Yen Chi Hsuan
c1decda58c [openload] Fix extraction (closes #9706) 2016-08-07 02:44:15 +08:00
Yen Chi Hsuan
d3f8e038fe [utils] Add decode_png for openload (#9706) 2016-08-07 02:42:58 +08:00
Remita Amine
ad152e2d95 [bbc] fix test 2016-08-06 19:36:12 +01:00
Remita Amine
b0af12154e [bbc] reduce requests and improve format_id 2016-08-06 19:24:59 +01:00
Remita Amine
d16b3c6677 [common] extract partOfTVSeries info in json-ld 2016-08-06 18:58:38 +01:00
Remita Amine
c57244cdb1 [common] lower the preference of m3u8 master manifest format 2016-08-06 18:55:05 +01:00
Remita Amine
a7e5f27412 [bbc] improve extraction
- extract f4m and dash formats
- improve format sorting and listing
- improve extraction of articles with `otherSettings.playlist`
2016-08-06 18:48:09 +01:00
Remita Amine
089a40955c [pokemon] improve _VALID_URL 2016-08-06 12:08:14 +01:00
Remita Amine
d73ebac100 [pokemon] Add new extractor(closes #10093) 2016-08-06 11:18:14 +01:00
Remita Amine
e563c0d73b [condenast] fallback to loader.js if video.js fail 2016-08-05 21:01:16 +01:00
Sergey M․
491c42e690 release 2016.08.06 2016-08-06 01:23:48 +07:00
Sergey M․
7f2339c617 [ChangeLog] Actualize 2016-08-06 01:19:47 +07:00
Sergey M․
8122e79fef [gamekings] Remove remnants 2016-08-06 00:12:37 +07:00
Sergey M․
fe3ad1d456 [adultswim] Remove superfluous md5 from test 2016-08-06 00:02:05 +07:00
Sergey M․
038a5e1a65 [adultswim] Add support for trailers (Closes #10235) 2016-08-06 00:00:05 +07:00
Sergey M․
84bc23b41b [archiveorg] PEP 8 2016-08-05 23:16:19 +07:00
Sergey M․
46933a15d6 [extractor/common] Support root JSON-LD lists (Closes #10203) 2016-08-05 23:14:32 +07:00
Sergey M․
3859ebeee6 [tvplay] Capture and output native error message 2016-08-05 22:50:42 +07:00
Remita Amine
d50aca41f8 [archiveorg] improve format extraction(closes #10219) 2016-08-05 16:42:15 +01:00
Remita Amine
0ca057b965 [jwplatform] add support for playlist extraction and relative urls and improve audio detection 2016-08-05 16:42:15 +01:00
Sergey M․
5ca968d0a6 [tvplay] Extract series metadata 2016-08-05 22:37:38 +07:00
Sergey M․
f0d31c624e [tvplay] Add support for subtitles (Closes #10194) 2016-08-05 22:17:32 +07:00
Remita Amine
08c655906c [5min] fix _VALID_URL(closes #10228) 2016-08-05 10:22:33 +01:00
Remita Amine
5a993e1692 [natgeo] fix tests(closes #10229) 2016-08-05 10:13:26 +01:00
Remita Amine
a7d2953073 [extractors] add tvp:embed import 2016-08-05 10:11:59 +01:00
Remita Amine
fdd0b8f8e0 [tvp] extract video id from the webpage(fixes #7799) 2016-08-05 09:44:15 +01:00
Remita Amine
f65dc41b72 [naver] extract upload date 2016-08-05 08:12:25 +01:00
Yen Chi Hsuan
962250f7ea [cbslocal] Fix timestamp parsing (closes #10213) 2016-08-05 11:44:50 +08:00
Yen Chi Hsuan
7dc2a74e0a [utils] Fix unified_timestamp for formats parsed by parsedate_tz() 2016-08-05 11:41:55 +08:00
Remita Amine
b02b960c6b [naver] improve extraction(closes #8096) 2016-08-04 21:42:22 +01:00
Remita Amine
4f427c4be8 [condenast] improve extraction 2016-08-04 18:30:56 +01:00
Sergey M․
8a00ea567b [natgeo:episodeguide] Do not shadow url from outer scope 2016-08-04 23:21:04 +07:00
Remita Amine
8895be01fc [5min] fix _VALID_URL 2016-08-04 16:55:12 +01:00
Remita Amine
52e7fcfeb7 [engadget] Relax _VALID_URL 2016-08-04 16:34:47 +01:00
Remita Amine
2396062c74 [5min] delegate extraction to AolIE
recently the 5min SenseHandler request return
HTTP Error 503: Service Unavailable error
2016-08-04 16:21:27 +01:00
Remita Amine
14704aeff6 [kaltura] remove debugging line 2016-08-04 14:54:34 +01:00
Remita Amine
3c2c3af059 [extractors] change imports for national geographic extractors 2016-08-04 12:20:56 +01:00
Remita Amine
1891ea2d76 [nationalgeographic] Add support for National Geographic Episode Guide 2016-08-04 12:18:10 +01:00
Remita Amine
1094074c04 [kaltura] extract subtitles and reduce requests 2016-08-04 09:39:06 +01:00
Remita Amine
217d5ae013 [vodplatform] Add new extractor 2016-08-04 09:39:06 +01:00
Remita Amine
8b40854529 [common] lower proto_preference of rtsp formats
Most of the time the RtspFD fail to download videos but it report
success of the download with this output:
[mpv] 0 bytes
[download] 100% of 0.00B
2016-08-04 09:39:06 +01:00
Yen Chi Hsuan
6bb0fbf9fb Revert "[README.md] Use full paths for all configuration files (#8863)"
This reverts commit 899d2bea63.
2016-08-04 09:54:28 +08:00
Sergey M․
8d3b226b83 [gamekings] Remove extractor
Now covered by generic jwplayer
2016-08-03 22:06:10 +07:00
Remita Amine
42b7a5afe0 [limelight] extract http formats 2016-08-03 13:12:51 +01:00
Yen Chi Hsuan
899d2bea63 [README.md] Use full paths for all configuration files (#8863) 2016-08-03 11:15:27 +08:00
Sergey M․
9cb0e65d7e [ntvru] Fix extraction 2016-08-02 22:56:48 +07:00
Sergey M․
b070564efb [extractor/common] Support multiple properties in _og_search_property 2016-08-02 22:55:14 +07:00
Philipp Hagemeister
ce28252c48 [options] Add test that checks that --password=secret is hidden in verbose output 2016-08-02 17:03:46 +02:00
Philipp Hagemeister
3aa9a73554 [options] Hide --password=secret in verbose output 2016-08-02 17:03:26 +02:00
Philipp Hagemeister
6a9b3b61ea [comedycentral] Re-add shortnames
In cc99d4f826, the shortname feature got deleted by accident. Re-add it as a separate IE.
2016-08-02 14:02:31 +02:00
Sergey M․
45408eb075 release 2016.08.01 2016-08-01 22:59:23 +07:00
Sergey M․
eafc66855d [ChangeLog] Add recent changes 2016-08-01 22:56:01 +07:00
Sergey M․
e03d3e6453 [cwtv] Add support for cwtvpr.com (Closes #10196) 2016-08-01 22:51:01 +07:00
Remita Amine
a70e45f80a [limelight] keep videos marked as previewStream
e382b953f0 (commitcomment-18472915)
2016-08-01 16:25:41 +01:00
Sergey M․
697655a7c0 [safari] Relax url regexes (Closes #10202) 2016-08-01 21:48:48 +07:00
Remita Amine
e382b953f0 [limelight] skip preview and drm protected videos 2016-08-01 00:33:30 +01:00
Yen Chi Hsuan
116e7e0d04 [bloomberg] Support BPlayer() players (closes #10187) 2016-07-31 14:47:19 +08:00
Sergey M․
cf03e34ad3 [yandexmusic:track] Fix extraction (Closes #10193) 2016-07-31 07:56:18 +07:00
Sergey M․
2903137292 release 2016.07.30 2016-07-30 14:45:07 +07:00
Sergey M․
9361f2169c [ChangeLog] Make extractor improvements' descriptions more concrete 2016-07-30 14:43:28 +07:00
Yen Chi Hsuan
35aa6c538f Add ChangeLog 2016-07-30 12:33:09 +08:00
Sergey M․
fa9f1d16b8 [dailymotion:playlist] Carry long line 2016-07-29 22:47:34 +07:00
Dave
485fedf6fd [dailymotion:playlist] Optimize download archive processing 2016-07-29 22:45:41 +07:00
Jaime Marquínez Ferrándiz
da0baba5c8 [rtve] Fix extraction for some videos
For example http://www.rtve.es/alacarta/videos/documentos-tv/documentos-tv-descredito/3574098/.
2016-07-29 17:20:27 +02:00
Jaime Marquínez Ferrándiz
bb9f3bfedf Revert "[rtve] Fix extraction (#10076)"
This reverts commit c39b2ed990.

Apparently outside of Spain using 'auth/resources' is required (#10097).
2016-07-29 17:14:04 +02:00
Sergey M․
dbc0b39b91 [tv2] Improve extraction 2016-07-29 22:01:34 +07:00
Sergey M․
481c5c5137 [tv2:article] Fix extraction (Closes #10188) 2016-07-29 21:43:17 +07:00
Sergey M․
0cacae2807 [twitch:clips] Sort formats 2016-07-29 09:01:53 +07:00
Sergey M․
d9d56deadf release 2016.07.28 2016-07-28 02:42:57 +07:00
Sergey M․
74ba450a81 [twitch:clips] Fix extraction (Closes #9767) 2016-07-28 22:30:09 +07:00
Sergey M․
db19df6ca0 [extractor/generic] Add test for #10179 2016-07-28 22:20:08 +07:00
Sergey M․
fbdf8d15d1 [soundcloud] Add _extract_urls (#10179) 2016-07-28 22:16:05 +07:00
Sergey M․
94aae01548 [extractor/generic] Extract all soundcloud embeds (Closes #10179) 2016-07-28 22:15:15 +07:00
Sergey M․
39eef54cf0 [ard:mediathek] Skip unavailable test 2016-07-28 21:38:23 +07:00
Sergey M․
05c8268c81 [shared] Modernize and make more robust 2016-07-27 23:39:02 +07:00
Sergey M․
289a16b4f3 [shared] Respect redirect URL (Closes #10170) 2016-07-27 23:28:01 +07:00
Sergey M․
7935926baa [devscripts/show-downloads-statistics] Add support for paging 2016-07-27 00:14:40 +07:00
Sergey M․
dcbb07c35a release 2016.07.26.2 2016-07-26 23:56:53 +07:00
Sergey M․
40090e8d51 [extractor/common] Improve is_suitable
In order to fix breakage introduced by a3aa814b77
2016-07-26 23:54:06 +07:00
Sergey M․
3e050d51d4 [orf:oe1] Relax _VALID_URL 2016-07-26 23:14:04 +07:00
Sergey M․
ced70c8640 [cbc] PEP 8 2016-07-26 23:08:08 +07:00
Sergey M․
9a700deea4 [instagram] Remove duplicate field in test 2016-07-26 23:07:16 +07:00
Sergey M․
dc35ba0eba [mgtv] Fix typo 2016-07-26 23:06:21 +07:00
Sergey M․
88bd486b9a [cbc] Improve extraction for videos embedded with clipId 2016-07-26 22:58:50 +07:00
Sergey M․
7f8b92e3cf [bigflix] Update tests 2016-07-26 21:44:53 +07:00
Yen Chi Hsuan
35f6e0ff36 [mtv.de] Skip 2 geo-restricted tests 2016-07-26 13:19:47 +08:00
Yen Chi Hsuan
326fa4e6e5 [generic] Skip an invalid test 2016-07-26 13:16:04 +08:00
Yen Chi Hsuan
c74299a72c [cmt] Detect unavailable videos and update _TESTS 2016-07-26 13:13:14 +08:00
Yen Chi Hsuan
10a1bb3a78 [mtv] Fix for videos with missing bitrates 2016-07-26 13:12:24 +08:00
Yen Chi Hsuan
4d3e543c73 Update extractors.py 2016-07-26 11:17:28 +08:00
Yen Chi Hsuan
05d1e7aaa9 [generic] Fix an MTV test and another test that breaks nosetests 2016-07-26 11:11:36 +08:00
Yen Chi Hsuan
a3aa814b77 Update _TESTS for MTV sites 2016-07-26 11:10:41 +08:00
Yen Chi Hsuan
5c32a77cad [nextmovie] Remove extractor
This domain name now redirects to mtv.com
2016-07-26 11:08:55 +08:00
Yen Chi Hsuan
14a28e705b [test/test_all_urls] Remove *.cc.com tests 2016-07-26 11:08:09 +08:00
Yen Chi Hsuan
cc99d4f826 [comedycentral] Remove IEs for *.cc.com except tosh.cc.com
All other subdomains now redirects to cc.com/* URLs
2016-07-26 11:06:50 +08:00
Yen Chi Hsuan
712c7530ff [mtv] Extract more metadata and more
1. Remove MTVIggyIE. All www.mtviggy.com URLs now redirects to
   www.mtv.com
2. Fix MTVDEIE
3. Return multiple URLs from _transform_rtmp_url. This is for
   tosh.cc.com
2016-07-26 11:03:43 +08:00
Sergey M․
0a147785e8 [camdemy] Extract duration properly 2016-07-25 23:03:58 +07:00
Sergey M․
59eaf69e33 [camdemy] Fix camdemy 2016-07-25 23:03:43 +07:00
Sergey M․
e8be2943a7 [smotri] Modernize, make more robust and fix tests 2016-07-24 18:38:18 +07:00
Sergey M․
8fdc538b46 release 2016.07.24 2016-07-24 11:39:50 +07:00
Sergey M․
9513c1eb17 [tvp] Update dash format comment 2016-07-24 11:03:39 +07:00
Sergey M․
ae6fff4e64 [onet] Enable dash formats 2016-07-24 10:43:05 +07:00
Sergey M․
5a65668e25 [dcn] Enable dash formats 2016-07-24 10:35:55 +07:00
Sergey M․
f75e6890db [telegraaf] Make hls non fatal 2016-07-24 10:29:26 +07:00
Sergey M․
d9cb92c840 [telegraaf] Enable dash formats 2016-07-24 10:29:09 +07:00
Sergey M․
94c04a3c79 [arkena] Enable dash formats 2016-07-24 10:28:11 +07:00
Sergey M․
f094834857 [extractor/common] Add support for $ in SegmentTemplate in MPD manifests 2016-07-24 10:27:16 +07:00
Déstin Reed
111de00289 [DailyMail] Improve title and description extraction 2016-07-24 05:37:13 +07:00
Sergey M․
b4a131e1a5 [facebook] Relax _VALID_URL (Closes #10151) 2016-07-24 04:36:49 +07:00
Sergey M․
f1991ce928 [arkena] Skip dash formats 2016-07-23 18:07:55 +07:00
Sergey M․
6548030a17 Credit @rvanbekkum for arkena (#8682) 2016-07-23 18:00:19 +07:00
Sergey M․
3a8947650b [arkenaplay] Remove extractor 2016-07-23 17:57:55 +07:00
Sergey M․
1979969f91 [extractor/generic] Add support for arkena embeds 2016-07-23 17:56:48 +07:00
Sergey M․
0673741af3 [extractors] Add imports for arkena and lcp 2016-07-23 17:56:29 +07:00
Sergey M․
c8e170b209 [lcp] Improve extraction 2016-07-23 17:56:11 +07:00
Sergey M․
bbe1f3634a [arkena] Improve extraction (Closes #8682) 2016-07-23 17:55:54 +07:00
Rob van Bekkum
4671dd41b2 [arkena:lcp] Add extractors 2016-07-23 17:01:09 +07:00
Sergey M․
f164b97123 [utils] Add another f4m mimetype to mimetype2ext 2016-07-23 16:48:59 +07:00
Sergey M․
5275efe30d release 2016.07.22 2016-07-22 23:11:28 +07:00
Sergey M․
b13647cf3c [eporner] Fix extraction (Closes #10139) 2016-07-22 23:04:13 +07:00
Sergey M․
add7d2a0e2 [pornhub] Make error regex less ambiguous (Closes #10138) 2016-07-22 21:24:09 +07:00
Sergey M․
e298d3a08c [youtube] Fix authentication (Closes #10140) 2016-07-22 21:05:39 +07:00
Sergey M․
fd8c8c7dcd [youtube:shared] Relax _VALID_URL 2016-07-21 22:58:34 +07:00
Sergey M․
9158af16cc [bbc.co.uk:iplayer:playlist] Add support for group URLs 2016-07-21 22:37:36 +07:00
Sergey M․
c6668e4ad1 [bbc.co.uk:iplayer:playlist] Skip unavailable test 2016-07-21 22:34:55 +07:00
Sergey M․
84e8cca48b [youjizz] Relax _VALID_URL (Closes #10131) 2016-07-20 22:41:13 +07:00
Sergey M․
790b06b7d4 [odatv] Improve (Closes #9285) 2016-07-20 21:43:22 +07:00
skacurt
740d7c49c2 [odatv] Add extractor 2016-07-20 21:42:05 +07:00
Sergey M․
4e51ec5f57 [extractors] Add import for comedycentral.tv 2016-07-19 22:50:37 +07:00
Sergey M․
05087d1b4c [bbc] Improve extraction from sxml playlists 2016-07-19 22:49:38 +07:00
Sergey M․
a66a73ee90 [ard] Add test for rbb-online 2016-07-18 02:25:31 +07:00
Sergey M․
8188b923db release 2016.07.17 2016-07-17 19:04:29 +07:00
Sergey M․
d993a1354d [README.md] Make download URLs consistent 2016-07-17 18:58:47 +07:00
Sergey M․
e8882e7043 [spike] Relax _VALID_URL and improve extraction (Closes #10106) 2016-07-17 18:34:25 +07:00
Sergey M․
1056821799 [viki] Fix tests (Closes #10098) 2016-07-17 18:13:54 +07:00
Sergey M․
890e6d3309 [viki] Lower m3u8 preference
http URLs are always provde the same or better quality
2016-07-17 18:12:03 +07:00
Sergey M․
246080d378 [viki] Override m3u8 formats acodec 2016-07-17 18:10:16 +07:00
Sergey M․
b1ea680270 Revert "[bbc] extract more and better qulities from Unified Streaming Platform m3u8 manifests"
This reverts commit 0385aa6199.
2016-07-17 17:29:36 +07:00
Sergey M․
45550d1039 [comedycentraltv] Add extractor (Closes #10101) 2016-07-17 16:58:58 +07:00
Sergey M․
7cdfc4c90f [mtvservices] Strip description 2016-07-17 16:56:39 +07:00
Sergey M․
af21f56f98 [ard] Add support for rbb-online (Closes #10095) 2016-07-17 03:40:58 +07:00
Sergey M․
1a8f0773b6 [streamable] Fix title extraction and improve (Closes #9122) 2016-07-17 02:01:00 +07:00
Zach Bruggeman
59cc5bd8bf [streamable] Add extractor 2016-07-17 01:35:09 +07:00
Sergey M․
49bc16b95e [nintendo] Improve playlist extraction (Closes #9986) 2016-07-17 00:01:25 +07:00
TRox1972
a2f9ca1e67 [nintendo] Add extractor 2016-07-16 23:58:53 +07:00
Sergey M․
371ddb14fe [extractor/generic] Change twitter:player embeds priority to lowest (Closes #10090) 2016-07-16 15:59:43 +07:00
Yen Chi Hsuan
998895dffa [cloudy] Drop videoraj.to
videoraj.ch is now a shoe-selling website, and videoraj.to domain name
is gone.
2016-07-16 15:37:54 +08:00
Yen Chi Hsuan
aadd3ce21f [cliphunter] Update _TESTS 2016-07-16 15:37:54 +08:00
Yen Chi Hsuan
ae7b846203 [cbsnews] Update _TESTS of CBSNewsLiveVideoIE 2016-07-16 15:37:54 +08:00
Yen Chi Hsuan
21ba7d0981 [cbc] Skip geo-restricted test case 2016-07-16 15:37:54 +08:00
Sergey M․
691fbe7f98 release 2016.07.16 2016-07-16 02:20:00 +07:00
Sergey M․
2e221ca3a8 [YoutubeDL] Fix incomplete formats check 2016-07-16 01:18:05 +07:00
Sergey M․
317f7ab634 [YoutubeDL] Fix format selection with filters (Closes #10083) 2016-07-16 00:55:43 +07:00
Yen Chi Hsuan
23495d6a39 Revert "[ffmpeg] Fix embedding subtitles (#9063)"
This reverts commit ccff2c404d.

Fixes #10081.

The new approach breaks embedding subtitles into video-only or
audio-only files. FFMpeg provides a trick: add '?' after the argument of
'-map' so that a missing stream is ignored. For example:

opts = [
    '-map', '0:v?',
    '-c:v', 'copy',
    '-map', '0:a?',
    '-c:a', 'copy',
    # other options...
]

Unfortunately, such a format is not implemented in avconv, either.
I guess adding '-ignore_unknown' if self.basename == 'ffmpeg' is the
best solution. However, the example mentioned in #9063 no longer serves
problematic files, so I can't test it. I'll reopen #9063 and wait for
another example so that I can test '-ignore_unknown'.
2016-07-15 20:02:36 +08:00
Remita Amine
224db034ab [syfy] fix extraction(closes #9087)(closes #3820)(closes #2388) 2016-07-14 23:59:47 +01:00
Sergey M․
ad27649be3 [3qsdn] Restrict src JS regex 2016-07-15 03:36:50 +07:00
Sergey M․
84571be645 [orf:tvthek] Remove test md5 2016-07-15 03:17:29 +07:00
Nehal Patel
7b0d333a7e Fix unit tests for m3u8 and RTSP extractors that require ffmpeg or mplayer 2016-07-15 03:06:23 +07:00
Remita Amine
342f0c3682 [ninenow] correct test url 2016-07-14 14:19:18 +01:00
Remita Amine
38e0f16a94 [ninenow] Add new extractor(closes #5181) 2016-07-14 14:16:11 +01:00
Remita Amine
e910fe2fe4 [brightcove] skip ism manifests 2016-07-14 14:13:57 +01:00
Jaime Marquínez Ferrándiz
233b58dec7 Add extractor for rtve.es/television (fixes #10076) 2016-07-13 21:02:34 +02:00
Jaime Marquínez Ferrándiz
c39b2ed990 [rtve] Fix extraction (#10076)
For http://www.rtve.es/alacarta/videos/documentos-tv/documentos-tv-revolucion-del-movil/3069778/ using 'auth/resources' fails, and other URLs seem to work fine.
2016-07-13 20:23:27 +02:00
Remita Amine
35ec86689c [bbc] extract only the original Unified Streaming Platform m3u8 manifests
0385aa6199 (commitcomment-18233275)
manifests with higher birate require more time to check formats
2016-07-13 18:01:14 +01:00
Sergey M․
c485959034 release 2016.07.13 2016-07-13 23:58:01 +07:00
Sergey M․
a0560d8ab8 [ellentv] Improve extraction (Closes #10067) 2016-07-13 22:42:53 +07:00
Remita Amine
0385aa6199 [bbc] extract more and better qulities from Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
Remita Amine
00f4764cb7 [common] extract vbr, abr and fps for Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
Sergey M․
51c2cd0b83 [extractors] Add vk:wallpost extractor import 2016-07-13 21:53:23 +07:00
Sergey M․
5f5a9d6158 [vk] Improve login 2016-07-13 21:52:52 +07:00
Sergey M․
2d19fb5072 [vk:wallpost] Add extractor 2016-07-13 21:51:44 +07:00
Yen Chi Hsuan
9d865a1af6 [travis] Skip downloading srelay
SOCKS tests never run on Travis CI due to unknown reasons, and
downloading them broke some tests (e.g.
https://travis-ci.org/rg3/youtube-dl/builds/144306425)
2016-07-13 14:27:14 +08:00
Remita Amine
41aa44259d [shahid] try to bypass geo restriction and extract more metadata(closes #10062) 2016-07-12 23:15:38 +01:00
Philipp Hagemeister
381ff44756 [devscripts/generate-download] Remove MD5 and SHA1 2016-07-12 09:09:54 +02:00
Sergey M․
7f29cf545a [youtube] Add YouTube Red paid video reference test (#10059) 2016-07-12 02:10:35 +07:00
Remita Amine
7d1219f3e0 [tmz] delegate extraction to KalturaIE 2016-07-11 19:08:22 +01:00
Remita Amine
f1b4af7d79 [beightcove:new] remove html tags from description 2016-07-11 19:06:50 +01:00
Remita Amine
8a8590a617 [dbtv] delegate extraction to BrightcoveNewIE 2016-07-11 16:30:24 +01:00
Remita Amine
4a7a5e41f7 [tvplay] improve extraction 2016-07-11 14:51:44 +01:00
Yen Chi Hsuan
2a49d01600 [playvid] Update _TESTS
Blocks https://travis-ci.org/rg3/youtube-dl/jobs/143809100
2016-07-11 15:15:28 +08:00
Yen Chi Hsuan
b99af8a51c [biobiochiletv] Fix extraction and update _TESTS 2016-07-11 13:23:57 +08:00
Yen Chi Hsuan
8e7020daef [rudo] Add new extractor
Used in biobiochile.tv
2016-07-11 13:19:25 +08:00
Sergey M․
a26bcc61c1 release 2016.07.11 2016-07-11 03:17:12 +07:00
Sergey M․
5c4dcf8172 [vidzi] Add support for embed URLs (Closes #10058) 2016-07-11 03:14:39 +07:00
Sergey M․
e9fb6a4bbe [youtube] Relax TFA regexes 2016-07-11 03:08:38 +07:00
Yen Chi Hsuan
e2dbcaa1bf [vuclip] Fix extraction 2016-07-11 00:52:25 +08:00
Yen Chi Hsuan
ae01850165 [miomio] Fix _TESTS 2016-07-11 00:03:24 +08:00
Yen Chi Hsuan
c3baaedfc8 [miomio] Support new 'h5' player (closes #9605)
Depends on #8876
2016-07-10 23:46:48 +08:00
Yen Chi Hsuan
0b68de3cc1 Merge pull request #8876 from remitamine/html5_media
[extractor/common] add helper method to extract html5 media entries
2016-07-10 23:40:45 +08:00
Sergey M․
39e9d524e5 Credit @nehalvpatel for roosterteeth (#9864) 2016-07-10 01:30:12 +07:00
Sergey M․
865b087224 [roosterteeth] Improve (Closes #9864) 2016-07-10 01:30:12 +07:00
Nehal Patel
3121b25639 [roosterteeth] Add extractor 2016-07-10 01:30:12 +07:00
Sergey M․
0286b85c79 release 2016.07.09.2 2016-07-09 22:22:24 +07:00
Sergey M․
ab52bb5137 [animeondemand] Fix typo 2016-07-09 22:20:34 +07:00
Sergey M․
61a98b8623 [lynda] Remove md5 from test (Closes #10047) 2016-07-09 21:29:11 +07:00
Sergey M․
6daf34a045 [facebook] Fix typo and break when found video_data (Closes #10048) 2016-07-09 21:25:07 +07:00
Yen Chi Hsuan
c03adf90bd [generic] Add the test. Closes #1638 2016-07-09 14:39:01 +08:00
Yen Chi Hsuan
0ece114b7b [vimeo] Recognize non-standard embeds (#1638) 2016-07-09 14:38:27 +08:00
Yen Chi Hsuan
5b6a74856b Merge pull request #9288 from reyyed/issue#9063fix
[ffmpeg] Fix embedding subtitles (#9063)
2016-07-09 14:29:53 +08:00
Sergey M․
ce43100a01 release 2016.07.09.1 2016-07-09 10:06:40 +07:00
Remita Amine
8cc9b4016d [srmediathek] extend _VALID_URL(closes #9373) 2016-07-09 03:22:09 +01:00
Remita Amine
31eeab9f41 [ard] fix f4m extraction and skip tests with 404 errors 2016-07-09 03:22:09 +01:00
Sergey M․
9558dcec9c [youtube:user] Preserve user/c path segment 2016-07-09 08:37:19 +07:00
Sergey M․
6e6b70d65f [extractor/generic] Properly comment out a test 2016-07-09 08:37:19 +07:00
Sergey M․
d417fd88d0 release 2016.07.09 2016-07-09 07:16:47 +07:00
Sergey M․
9e4f5dc1e9 [animeondemand] Pass num for episode based videos 2016-07-09 07:13:32 +07:00
Sergey M․
1251565ee0 [options] Rollback old behavior for configuratio files' encoding
Until agreed with some solution
2016-07-09 07:12:52 +07:00
Sergey M․
1f7258a367 [animeondemand] Add support for full length films (Closes #10031) 2016-07-09 06:57:04 +07:00
Sergey M․
0af985069b [flipagram] Improve extraction (Closes #9898) 2016-07-09 03:31:17 +07:00
Sergey M․
0de168f7ed [extractor/generic] Detect schema.org/VideoObject embeds 2016-07-09 03:29:07 +07:00
Sergey M․
95b31e266b [extractor/common] Add expected_type in json ld routines 2016-07-09 03:28:04 +07:00
Sergey M․
6b3a3098b5 [extractor/common] Extract more metadata for VideoObject in _json_ld 2016-07-09 03:27:11 +07:00
Sergey M․
2de624fdd5 [extractor/common] Introduce filesize metafield for thumbnails 2016-07-09 03:24:36 +07:00
Déstin Reed
3fee7f636c [flipagram] Add extractor 2016-07-09 03:23:32 +07:00
Remita Amine
89e2fff2b7 [mgtv] pass geo verification headers for api request 2016-07-08 20:18:25 +01:00
Sergey M․
cedc70b292 [facebook] Fix invalid video being extracted (Closes #9851) 2016-07-09 00:28:07 +07:00
Remita Amine
07d7689f2e [le] extract http formats 2016-07-08 15:35:20 +01:00
Yen Chi Hsuan
ae8cb5328d Merge branch 'JakubAdamWieczorek-polskie-radio' 2016-07-08 19:35:21 +08:00
Yen Chi Hsuan
2e32ac0b9a [polskieradio] Fix regex in _TESTS 2016-07-08 19:34:53 +08:00
Yen Chi Hsuan
672f01c370 Merge branch 'polskie-radio' of https://github.com/JakubAdamWieczorek/youtube-dl into JakubAdamWieczorek-polskie-radio 2016-07-08 19:33:28 +08:00
Jakub Adam Wieczorek
e2d616dd30 [polskieradio] Add thumbnails. 2016-07-08 13:23:00 +02:00
Yen Chi Hsuan
0ab7f4fe2b [nick] support nickjr.com (closes #7542) 2016-07-08 15:11:28 +08:00
Sergey M․
29c4a07776 [lynda] Fix test 2016-07-08 03:33:53 +07:00
Philipp Hagemeister
826e911e41 Merge branch 'master' of github.com:rg3/youtube-dl 2016-07-07 19:42:22 +02:00
Philipp Hagemeister
30d22dae8e [options] Do not decode Unicode on Python 2.x
The configuration file contents are being returned as unicode now, so decoding them is no longer necessary.
(Run python2 with -3 to see the warning before this commit)
2016-07-07 19:41:00 +02:00
Yen Chi Hsuan
ec3518725b [compat] Fix test_cmdline_umlauts on Python 2.6
The original statement raises uncaught UnicodeWarning on Python 2.6
2016-07-07 22:30:58 +08:00
Remita Amine
5f87d845eb [tweakers] fix info extraction(closes #9516) 2016-07-07 12:51:42 +01:00
Philipp Hagemeister
571808a7aa document comments in configuration file (fixes #10024) 2016-07-07 12:12:21 +02:00
Yen Chi Hsuan
dfe5fa49ae [compat] Fix compat_shlex_split for non-ASCII input
Closes #9871
2016-07-07 17:37:29 +08:00
Remita Amine
01a0c511eb [radiocanada] extract more formats 2016-07-07 03:46:12 +01:00
remitamine
b3d30315ce Merge pull request #9597 from remitamine/toutv
[toutv] fix info extraction(closes #1792)(closes #2082)
2016-07-07 01:51:01 +01:00
remitamine
882af14d7d [toutv] fix info extraction(closes #1792)(closes #2082) 2016-07-07 01:47:28 +01:00
Remita Amine
47335a0efa [telecinco] fix info extraction 2016-07-06 23:09:13 +01:00
Sergey M․
34bc2d9dfd release 2016.07.07 2016-07-07 01:54:29 +07:00
Sergey M․
08c7af4afa [kamcord] Add extractor (Closes #10001) 2016-07-07 01:50:39 +07:00
Yen Chi Hsuan
f7291a0b7c [daum.net] Fix extraction for specific examples
Closes #9972
2016-07-07 01:26:14 +08:00
Yen Chi Hsuan
c65aa4e9e1 [brightcove:legacy] Support 'playlistTabs' and skip a dead test
Closes #9965
2016-07-07 01:13:37 +08:00
Yen Chi Hsuan
ad213a1d74 [francetv] Recognize more Dailymotion embedded videos
Closes #9955
2016-07-06 23:37:54 +08:00
Yen Chi Hsuan
43f1e4e41e [onet] Add MD5 checksum 2016-07-06 20:32:03 +08:00
Yen Chi Hsuan
54b0e909d5 [amp] Fix a typo 2016-07-06 20:10:47 +08:00
Yen Chi Hsuan
f8752b86ac [Onet,ClipRs] Add new extractor for onet.tv and use it for clip.rs
Closes #9950
2016-07-06 20:09:05 +08:00
Yen Chi Hsuan
84c237fb8a [utils] Add get_element_by_class
For #9950
2016-07-06 20:02:52 +08:00
Remita Amine
ab49d7a9fa use mimetype2ext to determine manifest ext in multiple extractors 2016-07-06 09:11:46 +01:00
Remita Amine
b4173f1551 [utils] add mimetypes to determine manifest ext(m3u8, f4m, mpd) 2016-07-06 09:06:28 +01:00
Remita Amine
2817b99cf2 [metacafe] fix info extraction(closes #8539)(closes #3253) 2016-07-06 02:19:55 +01:00
Remita Amine
001fffd004 [spiegel:article] update test(closes #10018) 2016-07-06 00:16:41 +01:00
Sergey M․
0e94b4713d release 2016.07.06 2016-07-06 00:54:23 +07:00
Sergey M․
a6d3b89feb [prosiebensat1] Make downloading urls JSON non fatal 2016-07-06 00:52:48 +07:00
Remita Amine
6c26815d63 [onionstudios] fix info extraction 2016-07-05 18:05:07 +01:00
Sergey M․
73c4ac2c95 [youtube:channel] Improve channel id extraction and detect unavailable channels (Closes #10009) 2016-07-05 23:30:44 +07:00
Remita Amine
84f214d840 [prosiebensat1] extract all formats 2016-07-05 17:11:45 +01:00
Remita Amine
e3f88be7a9 [rtvnh] extract all formats 2016-07-05 14:45:39 +01:00
Remita Amine
31af3e35e0 [sandia] remove unused imports 2016-07-05 13:39:24 +01:00
Remita Amine
94a5cff91d [sendia] fix info extraction 2016-07-05 13:37:46 +01:00
Remita Amine
77082c7b9e [slideshare] fix description extraction 2016-07-05 12:01:04 +01:00
Remita Amine
252a1f75d2 [spiegel] improve info extraction 2016-07-05 11:46:25 +01:00
Remita Amine
5abf513cf8 [stitcher] fix episode config extraction 2016-07-05 10:44:16 +01:00
Yen Chi Hsuan
c6054e3201 [xuite] Support videos with already encoded media id 2016-07-05 14:26:42 +08:00
Yen Chi Hsuan
4080530624 [youtube:shared] Recognize the new 'shared' URLs
Closes #10007
2016-07-05 13:15:05 +08:00
Sergey M․
c25f1a9b63 release 2016.07.05 2016-07-05 06:32:46 +07:00
Remita Amine
dfaa86b75e [test_utils] add test for smuggling a smuggled url 2016-07-04 21:36:32 +01:00
Remita Amine
d9163ae3b6 [kaltura] fix extraction error for videos from multiple kaltura servers 2016-07-04 21:34:27 +01:00
Remita Amine
dafafe7cf1 [la7] extract more info from a kaltura custom server 2016-07-04 17:59:58 +01:00
Remita Amine
81953d1ae5 [kaltura] add support videos stored on custom kaltura servers(closes #5557) 2016-07-04 17:59:58 +01:00
Yen Chi Hsuan
3a212ed62e [iqiyi] Skip an unstable MD5 checksum 2016-07-04 11:25:46 +08:00
Sergey M․
195f084542 [pornhub] Detect private videos (Closes #9987) 2016-07-04 03:27:00 +07:00
Sergey M․
aa7a455b2e [README.md] Clarify configuration file may not exist by default 2016-07-04 01:24:33 +07:00
Sergey M․
6a4e659c93 [yahoo] Recognize brightcove embed (Closes #9995) 2016-07-03 23:00:36 +07:00
Yen Chi Hsuan
40f3666f6b [test/test_http] Update tests for 38cce791c7 2016-07-03 23:50:55 +08:00
Remita Amine
dd801bbe18 [brightcove] improve error detection 2016-07-03 16:37:22 +01:00
Yen Chi Hsuan
38cce791c7 Rename --cn-verfication-proxy to --geo-verification-proxy
And deprecate the former one

Since commit f138873900, this option is
not limited to China websites, so rename it.
2016-07-03 23:29:56 +08:00
Sergey M․
bf3ae6a543 [devscripts/show-downloads-statictics] Add script for displaying downloads statistics 2016-07-03 22:20:14 +07:00
Sergey M․
bff98341d5 release 2016.07.03.1 2016-07-03 21:28:55 +07:00
Yen Chi Hsuan
2644e911be [iqiyi] Fix extraction
See https://github.com/soimort/you-get/issues/1211#issuecomment-229011559
2016-07-03 22:19:56 +08:00
Remita Amine
a5f67895d3 [nationalgeographic] restore http formats
there was a misunderstanding about the reason of 403 response
the problem happen only when the user use aria2c as a downloader
a1f6f5c768 (commitcomment-18107559)
2016-07-03 14:10:25 +01:00
Yen Chi Hsuan
15e4b6b758 [rai] Support an alternative form of embedded relinker URL
Closes #8551
2016-07-03 19:52:11 +08:00
Yen Chi Hsuan
2b28b892d8 [rai] Support videos with embedded content item ID (#8551) 2016-07-03 19:52:11 +08:00
Sergey M․
7507fc98cb [README.md] Fix somes typo in coding conventions section 2016-07-03 18:35:28 +07:00
Yen Chi Hsuan
477b7a8474 [downloader/f4m] Fix for Rai live streams 2016-07-03 19:26:39 +08:00
Yen Chi Hsuan
034a884957 [rai] Support direct relinker URLs (closes #8552) 2016-07-03 19:26:39 +08:00
Remita Amine
64436cb1a4 [nationalgeographic] skip download for national geographic channel tests(closes #9991) 2016-07-03 10:43:36 +01:00
Yen Chi Hsuan
f138873900 [rai] Fix extraction and update _TESTS
Closes #8617
Closes #9157
Closes #9232
2016-07-03 15:49:35 +08:00
Yen Chi Hsuan
e793338c88 [buzzfeed] Detect Facebook embed and update _TESTS
Closes #5701
2016-07-03 14:12:02 +08:00
Yen Chi Hsuan
369bb06206 [facebook] Improve embed detection (#5701) 2016-07-03 14:11:29 +08:00
Sergey M․
2cb31d288e [history:topic] Relax _VALID_URL 2016-07-03 13:01:04 +07:00
Sergey M․
c723d1cd8d [README.md] Update some codebase links 2016-07-03 11:35:13 +07:00
Sergey M․
1f55234057 Add PULL_REQUEST_TEMPLATE.md 2016-07-03 11:31:49 +07:00
Sergey M․
04006fae8d [README.md] Start writing youtube-dl coding conventions 2016-07-03 11:31:07 +07:00
Jaime Marquínez Ferrándiz
4cb13d0d6a [hrti] Don't redefine variable in list comprehension 2016-07-02 23:02:14 +02:00
Remita Amine
a1f6f5c768 [nationalgeographic] add support Adobe Pass auth 2016-07-02 21:24:22 +01:00
Remita Amine
05c7feec77 [aenetworks] add support Adobe Pass auth 2016-07-02 21:24:22 +01:00
Remita Amine
bf83024826 [theplatform] add basic support for Adobe Pass 2016-07-02 21:24:22 +01:00
remitamine
59bbe4911a [extractor/common] add helper method to extract html5 media entries 2016-06-26 14:04:08 +01:00
remitamine
4f3c5e0627 [utils] add helper function for parsing codecs 2016-06-26 14:03:58 +01:00
Wang Jun Tham
ccff2c404d [ffmpeg] Fix embedding subtitles (#9063)
Changed command line parameters for ffmpeg when embedding subtitles.
Changed to ‘-map 0:v -c:v copy -map 0:a -c:a copy’
2016-04-24 00:08:02 +08:00
187 changed files with 7006 additions and 3413 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.03*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.03**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.13*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.13**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.07.03
[debug] youtube-dl version 2016.08.13
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

22
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,22 @@
## Please follow the guide below
- You will be asked some questions, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *pull request* (like that [x])
- Use *Preview* tab to see how your *pull request* will actually look like
---
### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### What is the purpose of your *pull request*?
- [ ] Bug fix
- [ ] New extractor
- [ ] New feature
---
### Description of your *pull request* and other information
Explanation of your *pull request* in arbitrary form goes here. Please make sure the description explains the purpose and effect of your *pull request* and is worded well enough to be understood. Provide as much context and examples as possible.

View File

@@ -7,9 +7,6 @@ python:
- "3.4"
- "3.5"
sudo: false
install:
- bash ./devscripts/install_srelay.sh
- export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
script: nosetests test --verbose
notifications:
email:

View File

@@ -177,3 +177,7 @@ Roman Tsiupa
Artur Krysiak
Jakub Adam Wieczorek
Aleksandar Topuzović
Nehal Patel
Rob van Bekkum
Petr Zvoníček
Pratyush Singh

View File

@@ -46,7 +46,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report?
@@ -97,9 +97,17 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
2. Check out the source code with:
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
3. Start a new git branch with
cd youtube-dl
git checkout -b yourextractor
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python
# coding: utf-8
from __future__ import unicode_literals
@@ -143,16 +151,148 @@ After you have ensured this site is distributing it's content legally, you can f
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
...
"summary": "some fancy summary text",
...
}
```
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', fatal=False)
```
With `fatal` set to `False` if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present.
### Provide fallbacks
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable.
#### Example
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible
When using regular expressions try to write them fuzzy and flexible.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
webpage, 'title', group='title')
```
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like:
```python
title = self._search_regex(
r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
webpage, 'title', group='title')
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

394
ChangeLog Normal file
View File

@@ -0,0 +1,394 @@
version 2016.08.13
Core
* Show progress for curl external downloader
* Forward more options to curl external downloader
Extractors
* [pbs] Fix description extraction
* [franceculture] Fix extraction (#10324)
* [pornotube] Fix extraction (#10322)
* [4tube] Fix metadata extraction (#10321)
* [imgur] Fix width and height extraction (#10325)
* [expotv] Improve extraction
+ [vbox7] Fix extraction (#10309)
- [tapely] Remove extractor (#10323)
* [muenchentv] Fix extraction (#10313)
+ [24video] Add support for .me and .xxx TLDs
* [24video] Fix comment count extraction
* [sunporno] Add support for embed URLs
* [sunporno] Fix metadata extraction (#10316)
+ [hgtv] Add extractor for hgtv.ca (#3999)
- [pbs] Remove request to unavailable API
+ [pbs] Add support for high quality HTTP formats
+ [crunchyroll] Add support for HLS formats (#10301)
version 2016.08.12
Core
* Subtitles are now written as is. Newline conversions are disabled. (#10268)
+ Recognize more formats in unified_timestamp
Extractors
- [goldenmoustache] Remove extractor (#10298)
* [drtuber] Improve title extraction
* [drtuber] Make dislike count optional (#10297)
* [chirbit] Fix extraction (#10296)
* [francetvinfo] Relax URL regular expression
* [rtlnl] Relax URL regular expression (#10282)
* [formula1] Relax URL regular expression (#10283)
* [wat] Improve extraction (#10281)
* [ctsnews] Fix extraction
version 2016.08.10
Core
* Make --metadata-from-title non fatal when title does not match the pattern
* Introduce options for randomized sleep before each download
--min-sleep-interval and --max-sleep-interval (#9930)
* Respect default in _search_json_ld
Extractors
+ [uol] Add extractor for uol.com.br (#4263)
* [rbmaradio] Fix extraction and extract all formats (#10242)
+ [sonyliv] Add extractor for sonyliv.com (#10258)
* [aparat] Fix extraction
* [cwtv] Extract HTTP formats
+ [rozhlas] Add extractor for prehravac.rozhlas.cz (#10253)
* [kuwo:singer] Fix extraction
version 2016.08.07
Core
+ Add support for TV Parental Guidelines ratings in parse_age_limit
+ Add decode_png (#9706)
+ Add support for partOfTVSeries in JSON-LD
* Lower master M3U8 manifest preference for better format sorting
Extractors
+ [discoverygo] Add extractor (#10245)
* [flipagram] Make JSON-LD extraction non fatal
* [generic] Make JSON-LD extraction non fatal
+ [bbc] Add support for morph embeds (#10239)
* [tnaflixnetworkbase] Improve title extraction
* [tnaflix] Fix metadata extraction (#10249)
* [fox] Fix theplatform release URL query
* [openload] Fix extraction (#9706)
* [bbc] Skip duplicate manifest URLs
* [bbc] Improve format code
+ [bbc] Add support for DASH and F4M
* [bbc] Improve format sorting and listing
* [bbc] Improve playlist extraction
+ [pokemon] Add extractor (#10093)
+ [condenast] Add fallback scenario for video info extraction
version 2016.08.06
Core
* Add support for JSON-LD root list entries (#10203)
* Improve unified_timestamp
* Lower preference of RTSP formats in generic sorting
+ Add support for multiple properties in _og_search_property
* Improve password hiding from verbose output
Extractors
+ [adultswim] Add support for trailers (#10235)
* [archiveorg] Improve extraction (#10219)
+ [jwplatform] Add support for playlists
+ [jwplatform] Add support for relative URLs
* [jwplatform] Improve audio detection
+ [tvplay] Capture and output native error message
+ [tvplay] Extract series metadata
+ [tvplay] Add support for subtitles (#10194)
* [tvp] Improve extraction (#7799)
* [cbslocal] Fix timestamp parsing (#10213)
+ [naver] Add support for subtitles (#8096)
* [naver] Improve extraction
* [condenast] Improve extraction
* [engadget] Relax URL regular expression
* [5min] Fix extraction
+ [nationalgeographic] Add support for Episode Guide
+ [kaltura] Add support for subtitles
* [kaltura] Optimize network requests
+ [vodplatform] Add extractor for vod-platform.net
- [gamekings] Remove extractor
* [limelight] Extract HTTP formats
* [ntvru] Fix extraction
+ [comedycentral] Re-add :tds and :thedailyshow shortnames
version 2016.08.01
Fixed/improved extractors
- [yandexmusic:track] Adapt to changes in track location JSON (#10193)
- [bloomberg] Support another form of player (#10187)
- [limelight] Skip DRM protected videos
- [safari] Relax regular expressions for URL matching (#10202)
- [cwtv] Add support for cwtvpr.com (#10196)
version 2016.07.30
Fixed/improved extractors
- [twitch:clips] Sort formats
- [tv2] Use m3u8_native
- [tv2:article] Fix video detection (#10188)
- rtve (#10076)
- [dailymotion:playlist] Optimize download archive processing (#10180)
version 2016.07.28
Fixed/improved extractors
- shared (#10170)
- soundcloud (#10179)
- twitch (#9767)
version 2016.07.26.2
Fixed/improved extractors
- smotri
- camdemy
- mtv
- comedycentral
- cmt
- cbc
- mgtv
- orf
version 2016.07.24
New extractors
- arkena (#8682)
- lcp (#8682)
Fixed/improved extractors
- facebook (#10151)
- dailymail
- telegraaf
- dcn
- onet
- tvp
Miscellaneous
- Support $Time$ in DASH manifests
version 2016.07.22
New extractors
- odatv (#9285)
Fixed/improved extractors
- bbc
- youjizz (#10131)
- youtube (#10140)
- pornhub (#10138)
- eporner (#10139)
version 2016.07.17
New extractors
- nintendo (#9986)
- streamable (#9122)
Fixed/improved extractors
- ard (#10095)
- mtv
- comedycentral (#10101)
- viki (#10098)
- spike (#10106)
Miscellaneous
- Improved twitter player detection (#10090)
version 2016.07.16
New extractors
- ninenow (#5181)
Fixed/improved extractors
- rtve (#10076)
- brightcove
- 3qsdn
- syfy (#9087, #3820, #2388)
- youtube (#10083)
Miscellaneous
- Fix subtitle embedding for video-only and audio-only files (#10081)
version 2016.07.13
New extractors
- rudo
Fixed/improved extractors
- biobiochiletv
- tvplay
- dbtv
- brightcove
- tmz
- youtube (#10059)
- shahid (#10062)
- vk
- ellentv (#10067)
version 2016.07.11
New Extractors
- roosterteeth (#9864)
Fixed/improved extractors
- miomio (#9605)
- vuclip
- youtube
- vidzi (#10058)
version 2016.07.09.2
Fixed/improved extractors
- vimeo (#1638)
- facebook (#10048)
- lynda (#10047)
- animeondemand
Fixed/improved features
- Embedding subtitles no longer throws an error with problematic inputs (#9063)
version 2016.07.09.1
Fixed/improved extractors
- youtube
- ard
- srmediatek (#9373)
version 2016.07.09
New extractors
- Flipagram (#9898)
Fixed/improved extractors
- telecinco
- toutv
- radiocanada
- tweakers (#9516)
- lynda
- nick (#7542)
- polskieradio (#10028)
- le
- facebook (#9851)
- mgtv
- animeondemand (#10031)
Fixed/improved features
- `--postprocessor-args` and `--downloader-args` now accepts non-ASCII inputs
on non-Windows systems
version 2016.07.07
New extractors
- kamcord (#10001)
Fixed/improved extractors
- spiegel (#10018)
- metacafe (#8539, #3253)
- onet (#9950)
- francetv (#9955)
- brightcove (#9965)
- daum (#9972)
version 2016.07.06
Fixed/improved extractors
- youtube (#10007, #10009)
- xuite
- stitcher
- spiegel
- slideshare
- sandia
- rtvnh
- prosiebensat1
- onionstudios
version 2016.07.05
Fixed/improved extractors
- brightcove
- yahoo (#9995)
- pornhub (#9997)
- iqiyi
- kaltura (#5557)
- la7
- Changed features
- Rename --cn-verfication-proxy to --geo-verification-proxy
Miscellaneous
- Add script for displaying downloads statistics
version 2016.07.03.1
Fixed/improved extractors
- theplatform
- aenetworks
- nationalgeographic
- hrti (#9482)
- facebook (#5701)
- buzzfeed (#5701)
- rai (#8617, #9157, #9232, #8552, #8551)
- nationalgeographic (#9991)
- iqiyi
version 2016.07.03
New extractors
- hrti (#9482)
Fixed/improved extractors
- vk (#9981)
- facebook (#9938)
- xtube (#9953, #9961)
version 2016.07.02
New extractors
- fusion (#9958)
Fixed/improved extractors
- twitch (#9975)
- vine (#9970)
- periscope (#9967)
- pornhub (#8696)
version 2016.07.01
New extractors
- 9c9media
- ctvnews (#2156)
- ctv (#4077)
Fixed/Improved extractors
- rds
- meta (#8789)
- pornhub (#9964)
- sixplay (#2183)
New features
- Accept quoted strings across multiple lines (#9940)

View File

@@ -94,7 +94,7 @@ _EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'la
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \
--exclude '*.kate-swp' \
@@ -107,7 +107,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude 'docs/_build' \
-- \
bin devscripts test youtube_dl docs \
LICENSE README.md README.txt \
ChangeLog LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
youtube-dl.zsh youtube-dl.fish setup.py \
youtube-dl

175
README.md
View File

@@ -17,7 +17,7 @@ youtube-dl - download videos from youtube.com or other video platforms
To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl -L https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget:
@@ -103,9 +103,9 @@ which means you can modify it, redistribute it or use it however you like.
(experimental)
-6, --force-ipv6 Make all connections via IPv6
(experimental)
--cn-verification-proxy URL Use this proxy to verify the IP address for
some Chinese sites. The default proxy
specified by --proxy (or none, if the
--geo-verification-proxy URL Use this proxy to verify the IP address for
some geo-restricted sites. The default
proxy specified by --proxy (or none, if the
options is not present) is used for the
actual downloading. (experimental)
@@ -330,7 +330,15 @@ which means you can modify it, redistribute it or use it however you like.
bidirectional text support. Requires bidiv
or fribidi executable in PATH
--sleep-interval SECONDS Number of seconds to sleep before each
download.
download when used alone or a lower bound
of a range for randomized sleep before each
download (minimum possible number of
seconds to sleep) when used along with
--max-sleep-interval.
--max-sleep-interval SECONDS Upper bound of a range for randomized sleep
before each download (maximum possible
number of seconds to sleep). Must only be
used along with --min-sleep-interval.
## Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT
@@ -424,7 +432,7 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself.
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
```
@@ -432,6 +440,7 @@ For example, with the following configuration file youtube-dl will always extrac
--no-mtime
--proxy 127.0.0.1:3128
-o ~/Movies/%(title)s.%(ext)s
# Lines starting with # are comments
```
Note that options in configuration file are just the same options aka switches used in regular command line calls thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
@@ -890,9 +899,17 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
2. Check out the source code with:
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
3. Start a new git branch with
cd youtube-dl
git checkout -b yourextractor
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python
# coding: utf-8
from __future__ import unicode_literals
@@ -936,19 +953,151 @@ After you have ensured this site is distributing it's content legally, you can f
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
...
"summary": "some fancy summary text",
...
}
```
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', fatal=False)
```
With `fatal` set to `False` if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present.
### Provide fallbacks
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable.
#### Example
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible
When using regular expressions try to write them fuzzy and flexible.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
webpage, 'title', group='title')
```
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like:
```python
title = self._search_regex(
r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
webpage, 'title', group='title')
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
# EMBEDDING YOUTUBE-DL
youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new).
@@ -1055,7 +1204,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report?

View File

@@ -15,13 +15,9 @@ data = urllib.request.urlopen(URL).read()
with open('download.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
md5sum = hashlib.md5(data).hexdigest()
sha1sum = hashlib.sha1(data).hexdigest()
sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_URL@', URL)
template = template.replace('@PROGRAM_MD5SUM@', md5sum)
template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])

View File

@@ -1,8 +0,0 @@
#!/bin/bash
mkdir -p tmp && cd tmp
wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
tar zxvf srelay-0.4.8b6.tar.gz
cd srelay-0.4.8b6
./configure
make

View File

@@ -54,17 +54,21 @@ def filter_options(readme):
if in_options:
if line.lstrip().startswith('-'):
option, description = re.split(r'\s{2,}', line.lstrip())
split_option = option.split(' ')
split = re.split(r'\s{2,}', line.lstrip())
# Description string may start with `-` as well. If there is
# only one piece then it's a description bit not an option.
if len(split) > 1:
option, description = split
split_option = option.split(' ')
if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
# Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information.
ret += '\n%s\n: %s\n' % (option, description)
else:
ret += line.lstrip() + '\n'
# Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information.
ret += '\n%s\n: %s\n' % (option, description)
continue
ret += line.lstrip() + '\n'
else:
ret += line + '\n'

View File

@@ -71,9 +71,12 @@ fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py ChangeLog
git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@@ -0,0 +1,47 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import itertools
import json
import os
import re
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_print,
compat_urllib_request,
)
from youtube_dl.utils import format_bytes
def format_size(bytes):
return '%s (%d bytes)' % (format_bytes(bytes), bytes)
total_bytes = 0
for page in itertools.count(1):
releases = json.loads(compat_urllib_request.urlopen(
'https://api.github.com/repos/rg3/youtube-dl/releases?page=%s' % page
).read().decode('utf-8'))
if not releases:
break
for release in releases:
compat_print(release['name'])
for asset in release['assets']:
asset_name = asset['name']
total_bytes += asset['download_count'] * asset['size']
if all(not re.match(p, asset_name) for p in (
r'^youtube-dl$',
r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
r'^youtube-dl\.exe$')):
continue
compat_print(
' %s size: %s downloads: %d'
% (asset_name, format_size(asset['size']), asset['download_count']))
compat_print('total downloads traffic: %s' % format_size(total_bytes))

View File

@@ -14,6 +14,7 @@
- **8tracks**
- **91porn**
- **9gag**
- **9now.com.au**
- **abc.net.au**
- **Abc7News**
- **abcnews**
@@ -45,6 +46,7 @@
- **archive.org**: archive.org videos
- **ARD**
- **ARD:mediathek**
- **Arkena**
- **arte.tv**
- **arte.tv:+7**
- **arte.tv:cinema**
@@ -140,7 +142,8 @@
- **CollegeRama**
- **ComCarCoff**
- **ComedyCentral**
- **ComedyCentralShows**: The Daily Show / The Colbert Report
- **ComedyCentralShortname**
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Coub**
- **Cracked**
@@ -179,6 +182,7 @@
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **DiscoveryGo**
- **Dotsub**
- **DouyuTV**: 斗鱼
- **DPlay**
@@ -224,6 +228,7 @@
- **Firstpost**
- **FiveTV**
- **Flickr**
- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Formula1**
@@ -233,7 +238,6 @@
- **FoxSports**
- **france2.fr:generation-quoi**
- **FranceCulture**
- **FranceCultureEmission**
- **FranceInter**
- **francetv**: France 2, 3, 4, 5 and Ô
- **francetvinfo.fr**
@@ -244,7 +248,6 @@
- **FunnyOrDie**
- **Fusion**
- **GameInformer**
- **Gamekings**
- **GameOne**
- **gameone:playlist**
- **Gamersyde**
@@ -261,7 +264,6 @@
- **GloboArticle**
- **GodTube**
- **GodTV**
- **GoldenMoustache**
- **Golem**
- **GoogleDrive**
- **Goshgay**
@@ -274,6 +276,7 @@
- **HellPorno**
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **HGTV**
- **HistoricFilms**
- **history:topic**: History.com Topic
- **hitbox**
@@ -312,6 +315,7 @@
- **jpopsuki.tv**
- **JWPlatform**
- **Kaltura**
- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan**
- **Karaoketv**
@@ -333,6 +337,8 @@
- **kuwo:song**: 酷我音乐
- **la7.it**
- **Laola1Tv**
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网
- **Learnr**
- **Lecture2Go**
@@ -394,7 +400,6 @@
- **MSN**
- **MTV**
- **mtv.de**
- **mtviggy.com**
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- **MusicPlayOn**
@@ -410,7 +415,8 @@
- **MyVidster**
- **n-tv.de**
- **natgeo**
- **natgeo:channel**
- **natgeo:episodeguide**
- **natgeo:video**
- **Naver**
- **NBA**
- **NBC**
@@ -434,7 +440,6 @@
- **Newstube**
- **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞
- **nextmovie.com**
- **nfb**: National Film Board of Canada
- **nfl.com**
- **nhl.com**
@@ -446,6 +451,7 @@
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
- **NineCNineMedia**
- **Nintendo**
- **njoy**: N-JOY
- **njoy:embed**
- **Noco**
@@ -473,9 +479,12 @@
- **NYTimes**
- **NYTimesArticle**
- **ocw.mit.edu**
- **OdaTV**
- **Odnoklassniki**
- **OktoberfestTV**
- **on.aol.com**
- **onet.tv**
- **onet.tv:channel**
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
@@ -509,6 +518,7 @@
- **plus.google**: Google Plus
- **pluzz.francetv.fr**
- **podomatic**
- **Pokemon**
- **PolskieRadio**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
@@ -550,8 +560,10 @@
- **RICE**
- **RingTV**
- **RockstarGames**
- **RoosterTeeth**
- **RottenTomatoes**
- **Roxwel**
- **Rozhlas**
- **RTBF**
- **rte**: Raidió Teilifís Éireann TV
- **rte:radio**: Raidió Teilifís Éireann radio
@@ -562,7 +574,9 @@
- **rtve.es:alacarta**: RTVE a la carta
- **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams
- **rtve.es:television**
- **RTVNH**
- **Rudo**
- **RUHD**
- **RulePorn**
- **rutube**: Rutube videos
@@ -607,6 +621,7 @@
- **smotri:user**: Smotri.com user videos
- **Snotr**
- **Sohu**
- **SonyLIV**
- **soundcloud**
- **soundcloud:playlist**
- **soundcloud:search**: Soundcloud search
@@ -637,6 +652,7 @@
- **stanfordoc**: Stanford Open ClassRoom
- **Steam**
- **Stitcher**
- **Streamable**
- **streamcloud.eu**
- **StreamCZ**
- **StreetVoice**
@@ -648,7 +664,6 @@
- **SztvHu**
- **Tagesschau**
- **tagesschau:player**
- **Tapely**
- **Tass**
- **TDSLifeway**
- **teachertube**: teachertube.com videos
@@ -684,6 +699,7 @@
- **TNAFlix**
- **TNAFlixNetworkEmbed**
- **toggle**
- **Tosh**: Tosh.0
- **tou.tv**
- **Toypics**: Toypics user profile
- **ToypicsUser**: Toypics user profile
@@ -713,6 +729,7 @@
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**
- **TVPlay**: TV3Play and related services
- **Tweakers**
@@ -730,6 +747,7 @@
- **udemy:course**
- **UDNEmbed**: 聯合影音
- **Unistra**
- **uol.com.br**
- **Urort**: NRK P3 Urørt
- **URPlay**
- **USAToday**
@@ -789,8 +807,10 @@
- **vine:user**
- **vk**: VK
- **vk:uservideos**: VK - User's Videos
- **vk:wallpost**
- **vlive**
- **Vodlocker**
- **VODPlatform**
- **VoiceRepublic**
- **VoxMedia**
- **Vporn**
@@ -857,6 +877,7 @@
- **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first
- **youtube:search_url**: YouTube.com search URLs
- **youtube:shared**
- **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)

View File

@@ -48,6 +48,9 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._og_search_property('foobar', html), 'Foo')
self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar')
self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar')
self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
def test_html_search_meta(self):
ie = self.ie

View File

@@ -335,6 +335,40 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], f1['format_id'])
def test_audio_only_extractor_format_selection(self):
# For extractors with incomplete formats (all formats are audio-only or
# video-only) best and worst should fallback to corresponding best/worst
# video-only or audio-only formats (as per
# https://github.com/rg3/youtube-dl/pull/5556)
formats = [
{'format_id': 'low', 'ext': 'mp3', 'preference': 1, 'vcodec': 'none', 'url': TEST_URL},
{'format_id': 'high', 'ext': 'mp3', 'preference': 2, 'vcodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
ydl = YDL({'format': 'best'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'high')
ydl = YDL({'format': 'worst'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'low')
def test_format_not_available(self):
formats = [
{'format_id': 'regular', 'ext': 'mp4', 'height': 360, 'url': TEST_URL},
{'format_id': 'video', 'ext': 'mp4', 'height': 720, 'acodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
# This must fail since complete video-audio format does not match filter
# and extractor does not provide incomplete only formats (i.e. only
# video-only or audio-only).
ydl = YDL({'format': 'best[height>360]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
def test_invalid_format_specs(self):
def assert_syntax_error(format_spec):
ydl = YDL({'format': format_spec})

View File

@@ -101,8 +101,6 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch(':ytsubs', ['youtube:subscriptions'])
self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
self.assertMatch(':ythistory', ['youtube:history'])
self.assertMatch(':thedailyshow', ['ComedyCentralShows'])
self.assertMatch(':tds', ['ComedyCentralShows'])
def test_vimeo_matching(self):
self.assertMatch('https://vimeo.com/channels/tributes', ['vimeo:channel'])

View File

@@ -88,6 +88,7 @@ class TestCompat(unittest.TestCase):
def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag'])
self.assertEqual(compat_shlex_split('-val 中文'), ['-val', '中文'])
def test_compat_etree_fromstring(self):
xml = '''

View File

@@ -138,27 +138,27 @@ class TestProxy(unittest.TestCase):
self.proxy_thread.daemon = True
self.proxy_thread.start()
self.cn_proxy = compat_http_server.HTTPServer(
('localhost', 0), _build_proxy_handler('cn'))
self.cn_port = http_server_port(self.cn_proxy)
self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever)
self.cn_proxy_thread.daemon = True
self.cn_proxy_thread.start()
self.geo_proxy = compat_http_server.HTTPServer(
('localhost', 0), _build_proxy_handler('geo'))
self.geo_port = http_server_port(self.geo_proxy)
self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
self.geo_proxy_thread.daemon = True
self.geo_proxy_thread.start()
def test_proxy(self):
cn_proxy = 'localhost:{0}'.format(self.cn_port)
geo_proxy = 'localhost:{0}'.format(self.geo_port)
ydl = YoutubeDL({
'proxy': 'localhost:{0}'.format(self.port),
'cn_verification_proxy': cn_proxy,
'geo_verification_proxy': geo_proxy,
})
url = 'http://foo.com/bar'
response = ydl.urlopen(url).read().decode('utf-8')
self.assertEqual(response, 'normal: {0}'.format(url))
req = compat_urllib_request.Request(url)
req.add_header('Ytdl-request-proxy', cn_proxy)
req.add_header('Ytdl-request-proxy', geo_proxy)
response = ydl.urlopen(req).read().decode('utf-8')
self.assertEqual(response, 'cn: {0}'.format(url))
self.assertEqual(response, 'geo: {0}'.format(url))
def test_proxy_with_idn(self):
ydl = YoutubeDL({

View File

@@ -33,6 +33,7 @@ from youtube_dl.utils import (
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
get_element_by_class,
InAdvancePagedList,
intlist_to_bytes,
is_html,
@@ -41,6 +42,7 @@ from youtube_dl.utils import (
ohdave_rsa_encrypt,
OnDemandPagedList,
orderedSet,
parse_age_limit,
parse_duration,
parse_filesize,
parse_count,
@@ -80,6 +82,7 @@ from youtube_dl.utils import (
cli_option,
cli_valueless_option,
cli_bool_option,
parse_codecs,
)
from youtube_dl.compat import (
compat_chr,
@@ -306,6 +309,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('25-09-2014'), 1411603200)
self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@@ -405,6 +409,12 @@ class TestUtil(unittest.TestCase):
self.assertEqual(res_url, url)
self.assertEqual(res_data, None)
smug_url = smuggle_url(url, {'a': 'b'})
smug_smug_url = smuggle_url(smug_url, {'c': 'd'})
res_url, res_data = unsmuggle_url(smug_smug_url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, {'a': 'b', 'c': 'd'})
def test_shell_quote(self):
args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
@@ -423,6 +433,20 @@ class TestUtil(unittest.TestCase):
url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
'trailer.mp4')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None)
self.assertEqual(parse_age_limit('invalid'), None)
self.assertEqual(parse_age_limit(0), 0)
self.assertEqual(parse_age_limit(18), 18)
self.assertEqual(parse_age_limit(21), 21)
self.assertEqual(parse_age_limit(22), None)
self.assertEqual(parse_age_limit('18'), 18)
self.assertEqual(parse_age_limit('18+'), 18)
self.assertEqual(parse_age_limit('PG-13'), 13)
self.assertEqual(parse_age_limit('TV-14'), 14)
self.assertEqual(parse_age_limit('TV-MA'), 17)
def test_parse_duration(self):
self.assertEqual(parse_duration(None), None)
self.assertEqual(parse_duration(False), None)
@@ -601,6 +625,29 @@ class TestUtil(unittest.TestCase):
limit_length('foo bar baz asd', 12).startswith('foo bar'))
self.assertTrue('...' in limit_length('foo bar baz asd', 12))
def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
'vcodec': 'avc1.77.30',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.2'), {
'vcodec': 'none',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.5,avc1.42001e'), {
'vcodec': 'avc1.42001e',
'acodec': 'mp4a.40.5',
})
self.assertEqual(parse_codecs('avc3.640028'), {
'vcodec': 'avc3.640028',
'acodec': 'none',
})
self.assertEqual(parse_codecs(', h264,,newcodec,aac'), {
'vcodec': 'h264',
'acodec': 'aac',
})
def test_escape_rfc3986(self):
reserved = "!*'();:@&=+$,/?#[]"
unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
@@ -921,6 +968,7 @@ The first line
self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128'])
self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), [])
self.assertEqual(cli_option({}, '--proxy', 'proxy'), [])
self.assertEqual(cli_option({'retries': 10}, '--retries', 'retries'), ['--retries', '10'])
def test_cli_valueless_option(self):
self.assertEqual(cli_valueless_option(
@@ -985,5 +1033,13 @@ The first line
self.assertEqual(urshift(3, 1), 1)
self.assertEqual(urshift(-3, 1), 2147483646)
def test_get_element_by_class(self):
html = '''
<span class="foo bar">nice</span>
'''
self.assertEqual(get_element_by_class('foo', html), 'nice')
self.assertEqual(get_element_by_class('no-such-class', html), None)
if __name__ == '__main__':
unittest.main()

View File

@@ -0,0 +1,70 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
import unittest
import sys
import os
import subprocess
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
class TestVerboseOutput(unittest.TestCase):
def test_private_info_arg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username', 'johnsmith@gmail.com',
'--password', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'--username' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'--password' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_shortarg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u', 'johnsmith@gmail.com',
'-p', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'-u' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'-p' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username=johnsmith@gmail.com',
'--password=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'--username' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'--password' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_shortarg_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u=johnsmith@gmail.com',
'-p=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'-u' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'-p' in serr)
self.assertTrue(b'secret' not in serr)
if __name__ == '__main__':
unittest.main()

View File

@@ -5,6 +5,7 @@ from __future__ import absolute_import, unicode_literals
import collections
import contextlib
import copy
import datetime
import errno
import fileinput
@@ -196,8 +197,8 @@ class YoutubeDL(object):
prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use
cn_verification_proxy: URL of the proxy to use for IP address verification
on Chinese sites. (Experimental)
geo_verification_proxy: URL of the proxy to use for IP address verification
on geo-restricted sites. (Experimental)
socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text
support, using fridibi
@@ -248,7 +249,16 @@ class YoutubeDL(object):
source_address: (Experimental) Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the
youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download.
sleep_interval: Number of seconds to sleep before each download when
used alone or a lower bound of a range for randomized
sleep before each download (minimum possible number
of seconds to sleep) when used along with
max_sleep_interval.
max_sleep_interval:Upper bound of a range for randomized sleep before each
download (maximum possible number of seconds to sleep).
Must only be used along with sleep_interval.
Actual sleep time will be a random float from range
[sleep_interval; max_sleep_interval].
listformats: Print an overview of available video formats and exit.
list_thumbnails: Print a table of all thumbnails and exit.
match_filter: A function that gets called with the info_dict of
@@ -304,6 +314,11 @@ class YoutubeDL(object):
self.params.update(params)
self.cache = Cache(self)
if self.params.get('cn_verification_proxy') is not None:
self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
if self.params.get('geo_verification_proxy') is None:
self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
if params.get('bidi_workaround', False):
try:
import pty
@@ -1046,9 +1061,9 @@ class YoutubeDL(object):
if isinstance(selector, list):
fs = [_build_selector_function(s) for s in selector]
def selector_function(formats):
def selector_function(ctx):
for f in fs:
for format in f(formats):
for format in f(ctx):
yield format
return selector_function
elif selector.type == GROUP:
@@ -1056,17 +1071,17 @@ class YoutubeDL(object):
elif selector.type == PICKFIRST:
fs = [_build_selector_function(s) for s in selector.selector]
def selector_function(formats):
def selector_function(ctx):
for f in fs:
picked_formats = list(f(formats))
picked_formats = list(f(ctx))
if picked_formats:
return picked_formats
return []
elif selector.type == SINGLE:
format_spec = selector.selector
def selector_function(formats):
formats = list(formats)
def selector_function(ctx):
formats = list(ctx['formats'])
if not formats:
return
if format_spec == 'all':
@@ -1079,9 +1094,10 @@ class YoutubeDL(object):
if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
if audiovideo_formats:
yield audiovideo_formats[format_idx]
# for audio only (soundcloud) or video only (imgur) urls, select the best/worst audio format
elif (all(f.get('acodec') != 'none' for f in formats) or
all(f.get('vcodec') != 'none' for f in formats)):
# for extractors with incomplete formats (audio only (soundcloud)
# or video only (imgur)) we will fallback to best/worst
# {video,audio}-only format
elif ctx['incomplete_formats']:
yield formats[format_idx]
elif format_spec == 'bestaudio':
audio_formats = [
@@ -1155,17 +1171,18 @@ class YoutubeDL(object):
}
video_selector, audio_selector = map(_build_selector_function, selector.selector)
def selector_function(formats):
formats = list(formats)
for pair in itertools.product(video_selector(formats), audio_selector(formats)):
def selector_function(ctx):
for pair in itertools.product(
video_selector(copy.deepcopy(ctx)), audio_selector(copy.deepcopy(ctx))):
yield _merge(pair)
filters = [self._build_format_filter(f) for f in selector.filters]
def final_selector(formats):
def final_selector(ctx):
ctx_copy = copy.deepcopy(ctx)
for _filter in filters:
formats = list(filter(_filter, formats))
return selector_function(formats)
ctx_copy['formats'] = list(filter(_filter, ctx_copy['formats']))
return selector_function(ctx_copy)
return final_selector
stream = io.BytesIO(format_spec.encode('utf-8'))
@@ -1372,7 +1389,34 @@ class YoutubeDL(object):
req_format_list.append('best')
req_format = '/'.join(req_format_list)
format_selector = self.build_format_selector(req_format)
formats_to_download = list(format_selector(formats))
# While in format selection we may need to have an access to the original
# format set in order to calculate some metrics or do some processing.
# For now we need to be able to guess whether original formats provided
# by extractor are incomplete or not (i.e. whether extractor provides only
# video-only or audio-only formats) for proper formats selection for
# extractors with such incomplete formats (see
# https://github.com/rg3/youtube-dl/pull/5556).
# Since formats may be filtered during format selection and may not match
# the original formats the results may be incorrect. Thus original formats
# or pre-calculated metrics should be passed to format selection routines
# as well.
# We will pass a context object containing all necessary additional data
# instead of just formats.
# This fixes incorrect format selection issue (see
# https://github.com/rg3/youtube-dl/issues/10083).
incomplete_formats = (
# All formats are video-only or
all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats) or
# all formats are audio-only
all(f.get('vcodec') == 'none' and f.get('acodec') != 'none' for f in formats))
ctx = {
'formats': formats,
'incomplete_formats': incomplete_formats,
}
formats_to_download = list(format_selector(ctx))
if not formats_to_download:
raise ExtractorError('requested format not available',
expected=True)
@@ -1559,7 +1603,9 @@ class YoutubeDL(object):
self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
else:
self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
# Use newline='' to prevent conversion of newline characters
# See https://github.com/rg3/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_data)
except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + sub_filename)

View File

@@ -145,6 +145,16 @@ def _real_main(argv=None):
if numeric_limit is None:
parser.error('invalid max_filesize specified')
opts.max_filesize = numeric_limit
if opts.sleep_interval is not None:
if opts.sleep_interval < 0:
parser.error('sleep interval must be positive or 0')
if opts.max_sleep_interval is not None:
if opts.max_sleep_interval < 0:
parser.error('max sleep interval must be positive or 0')
if opts.max_sleep_interval < opts.sleep_interval:
parser.error('max sleep interval must be greater than or equal to min sleep interval')
else:
opts.max_sleep_interval = opts.sleep_interval
def parse_retries(retries):
if retries in ('inf', 'infinite'):
@@ -370,6 +380,7 @@ def _real_main(argv=None):
'source_address': opts.source_address,
'call_home': opts.call_home,
'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items,
@@ -382,6 +393,8 @@ def _real_main(argv=None):
'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy,
'geo_verification_proxy': opts.geo_verification_proxy,
}
with YoutubeDL(ydl_opts) as ydl:

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import binascii
@@ -2594,15 +2595,19 @@ except ImportError: # Python < 3.3
return "'" + s.replace("'", "'\"'\"'") + "'"
if sys.version_info >= (2, 7, 3):
try:
args = shlex.split('中文')
assert (isinstance(args, list) and
isinstance(args[0], compat_str) and
args[0] == '中文')
compat_shlex_split = shlex.split
else:
except (AssertionError, UnicodeEncodeError):
# Working around shlex issue with unicode strings on some python 2
# versions (see http://bugs.python.org/issue1548891)
def compat_shlex_split(s, comments=False, posix=True):
if isinstance(s, compat_str):
s = s.encode('utf-8')
return shlex.split(s, comments, posix)
return list(map(lambda s: s.decode('utf-8'), shlex.split(s, comments, posix)))
def compat_ord(c):

View File

@@ -4,6 +4,7 @@ import os
import re
import sys
import time
import random
from ..compat import compat_os_name
from ..utils import (
@@ -342,8 +343,11 @@ class FileDownloader(object):
})
return True
sleep_interval = self.params.get('sleep_interval')
if sleep_interval:
min_sleep_interval = self.params.get('sleep_interval')
if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
print(min_sleep_interval, max_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval)

View File

@@ -96,6 +96,12 @@ class CurlFD(ExternalFD):
cmd = [self.exe, '--location', '-o', tmpfilename]
for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._bool_option('--continue-at', 'continuedl', '-', '0')
cmd += self._valueless_option('--silent', 'noprogress')
cmd += self._valueless_option('--verbose', 'verbose')
cmd += self._option('--limit-rate', 'ratelimit')
cmd += self._option('--retry', 'retries')
cmd += self._option('--max-filesize', 'max_filesize')
cmd += self._option('--interface', 'source_address')
cmd += self._option('--proxy', 'proxy')
cmd += self._valueless_option('--insecure', 'nocheckcertificate')
@@ -103,6 +109,16 @@ class CurlFD(ExternalFD):
cmd += ['--', info_dict['url']]
return cmd
def _call_downloader(self, tmpfilename, info_dict):
cmd = [encodeArgument(a) for a in self._make_cmd(tmpfilename, info_dict)]
self._debug_cmd(cmd)
# curl writes the progress to stderr so don't capture it.
p = subprocess.Popen(cmd)
p.communicate()
return p.returncode
class AxelFD(ExternalFD):
AVAILABLE_OPT = '-V'

View File

@@ -196,6 +196,11 @@ def build_fragments_list(boot_info):
first_frag_number = fragment_run_entry_table[0]['first']
fragments_counter = itertools.count(first_frag_number)
for segment, fragments_count in segment_run_table['segment_run']:
# In some live HDS streams (for example Rai), `fragments_count` is
# abnormal and causing out-of-memory errors. It's OK to change the
# number of fragments for live streams as they are updated periodically
if fragments_count == 4294967295 and boot_info['live']:
fragments_count = 2
for _ in range(fragments_count):
res.append((segment, next(fragments_counter)))
@@ -329,7 +334,11 @@ class F4mFD(FragmentFD):
base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
bootstrap_node = doc.find(_add_ns('bootstrapInfo'))
boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, base_url)
# From Adobe F4M 3.0 spec:
# The <baseURL> element SHALL be the base URL for all relative
# (HTTP-based) URLs in the manifest. If <baseURL> is not present, said
# URLs should be relative to the location of the containing document.
boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, man_url)
live = boot_info['live']
metadata_node = media.find(_add_ns('metadata'))
if metadata_node is not None:

View File

@@ -83,6 +83,20 @@ class AdultSwimIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
# heroMetadata.trailer
'url': 'http://www.adultswim.com/videos/decker/inside-decker-a-new-hero/',
'info_dict': {
'id': 'I0LQFQkaSUaFp8PnAWHhoQ',
'ext': 'mp4',
'title': 'Decker - Inside Decker: A New Hero',
'description': 'md5:c916df071d425d62d70c86d4399d3ee0',
'duration': 249.008,
},
'params': {
# m3u8 download
'skip_download': True,
}
}]
@staticmethod
@@ -133,20 +147,26 @@ class AdultSwimIE(InfoExtractor):
if video_info is None:
if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path:
video_info = bootstrapped_data['slugged_video']
else:
raise ExtractorError('Unable to find video info')
if not video_info:
video_info = bootstrapped_data.get('heroMetadata', {}).get('trailer').get('video')
if not video_info:
raise ExtractorError('Unable to find video info')
show = bootstrapped_data['show']
show_title = show['title']
stream = video_info.get('stream')
clips = [stream] if stream else video_info.get('clips')
if not clips:
if stream and stream.get('videoPlaybackID'):
segment_ids = [stream['videoPlaybackID']]
elif video_info.get('clips'):
segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']]
elif video_info.get('videoPlaybackID'):
segment_ids = [video_info['videoPlaybackID']]
else:
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.'
if video_info.get('auth') is True else 'Unable to find stream or clips',
expected=True)
segment_ids = [clip['videoPlaybackID'] for clip in clips]
episode_id = video_info['id']
episode_title = video_info['title']

View File

@@ -2,7 +2,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from ..utils import (
smuggle_url,
update_url_query,
@@ -15,28 +15,15 @@ from ..compat import (
)
class AENetworksBaseIE(InfoExtractor):
def theplatform_url_result(self, theplatform_url, video_id, query):
return {
'_type': 'url_transparent',
'id': video_id,
'url': smuggle_url(
update_url_query(theplatform_url, query),
{
'sig': {
'key': 'crazyjava',
'secret': 's3cr3t'
},
'force_smil_url': True
}),
'ie_key': 'ThePlatform',
}
class AENetworksBaseIE(ThePlatformIE):
_THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t'
class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
_TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
@@ -76,9 +63,15 @@ class AENetworksIE(AENetworksBaseIE):
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True
}]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME',
'fyi.tv': 'FYI',
}
def _real_extract(self, url):
show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
domain, show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id
webpage = self._download_webpage(url, display_id)
if show_path:
@@ -103,23 +96,39 @@ class AENetworksIE(AENetworksBaseIE):
episode_attributes['data-videoid']))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeasonId', webpage))
query = {
'mbr': 'true',
'assetTypes': 'medium_video_s3'
}
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex(
r"media_url\s*=\s*'([^']+)'", webpage, 'video url')
info = self._search_json_ld(webpage, video_id, fatal=False)
info.update(self.theplatform_url_result(
media_url, video_id, {
'mbr': 'true',
'assetTypes': 'medium_video_s3'
}))
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
if theplatform_metadata.get('AETN$isBehindWall'):
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
resource = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>%s</title><item><title>%s</title><guid>%s</guid><media:rating scheme="urn:v-chip">%s</media:rating></item></channel></rss>' % (requestor_id, theplatform_metadata['title'], theplatform_metadata['AETN$PPL_pplProgramId'], theplatform_metadata['ratings'][0]['rating'])
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._search_json_ld(webpage, video_id, fatal=False))
media_url = update_url_query(media_url, query)
media_url = self._sign_url(media_url, self._THEPLATFORM_KEY, self._THEPLATFORM_SECRET)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
'formats': formats,
'subtitles': subtitles,
})
return info
class HistoryTopicIE(AENetworksBaseIE):
IE_NAME = 'history:topic'
IE_DESC = 'History.com Topic'
_VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)/videos(?:/(?P<video_display_id>[^/?#]+))?'
_VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)(?:/[^/]+(?:/(?P<video_display_id>[^/?#]+))?)?'
_TESTS = [{
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
'info_dict': {
@@ -147,8 +156,30 @@ class HistoryTopicIE(AENetworksBaseIE):
}, {
'url': 'http://www.history.com/topics/world-war-i-history/videos',
'only_matching': True,
}, {
'url': 'http://www.history.com/topics/world-war-i/world-war-i-history',
'only_matching': True,
}, {
'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/speeches',
'only_matching': True,
}]
def theplatform_url_result(self, theplatform_url, video_id, query):
return {
'_type': 'url_transparent',
'id': video_id,
'url': smuggle_url(
update_url_query(theplatform_url, query),
{
'sig': {
'key': self._THEPLATFORM_KEY,
'secret': self._THEPLATFORM_SECRET,
},
'force_smil_url': True
}),
'ie_key': 'ThePlatform',
}
def _real_extract(self, url):
topic_id, video_display_id = re.match(self._VALID_URL, url).groups()
if video_display_id:

View File

@@ -5,6 +5,8 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
mimetype2ext,
determine_ext,
)
@@ -50,21 +52,25 @@ class AMPIE(InfoExtractor):
if isinstance(media_content, dict):
media_content = [media_content]
for media_data in media_content:
media = media_data['@attributes']
media_type = media['type']
if media_type in ('video/f4m', 'application/f4m+xml'):
media = media_data.get('@attributes', {})
media_url = media.get('url')
if not media_url:
continue
ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
media_url + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
video_id, f4m_id='hds', fatal=False))
elif media_type == 'application/x-mpegURL':
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
media_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
'url': media['url'],
'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')),
'ext': ext,
})
self._sort_formats(formats)

View File

@@ -22,6 +22,7 @@ class AnimeOnDemandIE(InfoExtractor):
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand'
_TESTS = [{
# jap, OmU
'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': {
'id': '161',
@@ -30,17 +31,21 @@ class AnimeOnDemandIE(InfoExtractor):
},
'playlist_mincount': 4,
}, {
# Film wording is used instead of Episode
# Film wording is used instead of Episode, ger/jap, Dub/OmU
'url': 'https://www.anime-on-demand.de/anime/39',
'only_matching': True,
}, {
# Episodes without titles
# Episodes without titles, jap, OmU
'url': 'https://www.anime-on-demand.de/anime/162',
'only_matching': True,
}, {
# ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/169',
'only_matching': True,
}, {
# Full length film, non-series, ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/185',
'only_matching': True,
}]
def _login(self):
@@ -110,35 +115,12 @@ class AnimeOnDemandIE(InfoExtractor):
entries = []
for num, episode_html in enumerate(re.findall(
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
def extract_info(html, video_id, num=None):
title, description = [None] * 2
formats = []
for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html):
attributes = extract_attributes(input_)
playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'):
@@ -161,7 +143,7 @@ class AnimeOnDemandIE(InfoExtractor):
format_id_list.append(lang)
if kind:
format_id_list.append(kind)
if not format_id_list:
if not format_id_list and num is not None:
format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note)))
@@ -215,28 +197,74 @@ class AnimeOnDemandIE(InfoExtractor):
})
formats.extend(file_formats)
if formats:
self._sort_formats(formats)
return {
'title': title,
'description': description,
'formats': formats,
}
def extract_entries(html, video_id, common_info, num=None):
info = extract_info(html, video_id, num)
if info['formats']:
self._sort_formats(info['formats'])
f = common_info.copy()
f.update({
'title': title,
'description': description,
'formats': formats,
})
f.update(info)
entries.append(f)
# Extract teaser only when full episode is not available
if not formats:
# Extract teaser/trailer only when full episode is not available
if not info['formats']:
m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html)
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<',
html)
if m:
f = common_info.copy()
f.update({
'id': '%s-teaser' % f['id'],
'id': '%s-%s' % (f['id'], m.group('kind').lower()),
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
})
entries.append(f)
def extract_episodes(html):
for num, episode_html in enumerate(re.findall(
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', html), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
extract_entries(episode_html, video_id, common_info)
def extract_film(html, video_id):
common_info = {
'id': anime_id,
'title': anime_title,
'description': anime_description,
}
extract_entries(html, video_id, common_info)
extract_episodes(webpage)
if not entries:
extract_film(webpage, anime_id)
return self.playlist_result(entries, anime_id, anime_title, anime_description)

View File

@@ -123,6 +123,10 @@ class AolFeaturesIE(InfoExtractor):
'title': 'What To Watch - February 17, 2016',
},
'add_ie': ['FiveMin'],
'params': {
# encrypted m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):

View File

@@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -15,7 +13,7 @@ class AparatIE(InfoExtractor):
_TEST = {
'url': 'http://www.aparat.com/v/wP8On',
'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1',
'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': {
'id': 'wP8On',
'ext': 'mp4',
@@ -31,13 +29,13 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
embed_url = ('http://www.aparat.com/video/video/embed/videohash/' +
video_id + '/vt/frame')
embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
webpage = self._download_webpage(embed_url, video_id)
video_urls = [video_url.replace('\\/', '/') for video_url in re.findall(
r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)]
for i, video_url in enumerate(video_urls):
file_list = self._parse_json(self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
for i, item in enumerate(file_list[0]):
video_url = item['file']
req = HEADRequest(video_url)
res = self._request_webpage(
req, video_id, note='Testing video URL %d' % i, errnote=False)

View File

@@ -1,67 +1,65 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_strdate
from .jwplatform import JWPlatformBaseIE
from ..utils import (
unified_strdate,
clean_html,
)
class ArchiveOrgIE(InfoExtractor):
class ArchiveOrgIE(JWPlatformBaseIE):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
_TESTS = [{
'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'md5': '8af1d4cf447933ed3c7f4871162602db',
'info_dict': {
'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'ext': 'ogv',
'ext': 'ogg',
'title': '1968 Demo - FJCC Conference Presentation Reel #1',
'description': 'md5:1780b464abaca9991d8968c877bb53ed',
'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
'upload_date': '19681210',
'uploader': 'SRI International'
}
}, {
'url': 'https://archive.org/details/Cops1922',
'md5': '18f2a19e6d89af8425671da1cf3d4e04',
'md5': 'bc73c8ab3838b5a8fc6c6651fa7b58ba',
'info_dict': {
'id': 'Cops1922',
'ext': 'ogv',
'ext': 'mp4',
'title': 'Buster Keaton\'s "Cops" (1922)',
'description': 'md5:70f72ee70882f713d4578725461ffcc3',
'description': 'md5:b4544662605877edd99df22f9620d858',
}
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\);",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
json_url = url + ('&' if '?' in url else '?') + 'output=json'
data = self._download_json(json_url, video_id)
def get_optional(metadata, field):
return metadata.get(field, [None])[0]
def get_optional(data_dict, field):
return data_dict['metadata'].get(field, [None])[0]
title = get_optional(data, 'title')
description = get_optional(data, 'description')
uploader = get_optional(data, 'creator')
upload_date = unified_strdate(get_optional(data, 'date'))
formats = [
{
'format': fdata['format'],
'url': 'http://' + data['server'] + data['dir'] + fn,
'file_size': int(fdata['size']),
}
for fn, fdata in data['files'].items()
if 'Video' in fdata['format']]
self._sort_formats(formats)
return {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'uploader': uploader,
'upload_date': upload_date,
'thumbnail': data.get('misc', {}).get('image'),
}
metadata = self._download_json(
'http://archive.org/details/' + video_id, video_id, query={
'output': 'json',
})['metadata']
info.update({
'title': get_optional(metadata, 'title') or info.get('title'),
'description': clean_html(get_optional(metadata, 'description')),
})
if info.get('_type') != 'playlist':
info.update({
'uploader': get_optional(metadata, 'creator'),
'upload_date': unified_strdate(get_optional(metadata, 'date')),
})
return info

View File

@@ -13,13 +13,14 @@ from ..utils import (
parse_duration,
unified_strdate,
xpath_text,
update_url_query,
)
from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor):
IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{
'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
@@ -34,6 +35,7 @@ class ARDMediathekIE(InfoExtractor):
# m3u8 download
'skip_download': True,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
@@ -44,6 +46,7 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252,
},
'skip': 'HTTP Error 404: Not Found',
}, {
# audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@@ -55,9 +58,22 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True,
}, {
# audio
'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
'md5': '4e8f00631aac0395fee17368ac0e9867',
'info_dict': {
'id': '30796318',
'ext': 'mp3',
'title': 'Vor dem Fest',
'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
'duration': 3287,
},
'skip': 'Video is no longer available',
}]
def _extract_media_info(self, media_info_url, webpage, video_id):
@@ -113,11 +129,14 @@ class ARDMediathekIE(InfoExtractor):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
video_id, preference=-1, f4m_id='hds', fatal=False))
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}),
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False))
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
@@ -231,7 +250,8 @@ class ARDIE(InfoExtractor):
'title': 'Die Story im Ersten: Mission unter falscher Flagge',
'upload_date': '20140804',
'thumbnail': 're:^https?://.*\.jpg$',
}
},
'skip': 'HTTP Error 404: Not Found',
}
def _real_extract(self, url):

View File

@@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
mimetype2ext,
parse_iso8601,
strip_jsonp,
)
class ArkenaIE(InfoExtractor):
_VALID_URL = r'https?://play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)'
_TESTS = [{
'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
'info_dict': {
'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe',
'ext': 'mp4',
'title': 'Big Buck Bunny',
'description': 'Royalty free test video',
'timestamp': 1432816365,
'upload_date': '20150528',
'is_live': False,
},
}, {
'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893',
'only_matching': True,
}, {
'url': 'http://play.arkena.com/config/avp/v1/player/media/327336/darkmatter/131064/?callbackMethod=jQuery1111002221189684892677_1469227595972',
'only_matching': True,
}, {
'url': 'http://play.arkena.com/embed/avp/v1/player/media/327336/darkmatter/131064/',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
# See https://support.arkena.com/display/PLAY/Ways+to+embed+your+video
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//play\.arkena\.com/embed/avp/.+?)\1',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
account_id = mobj.group('account_id')
playlist = self._download_json(
'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_'
% (video_id, account_id),
video_id, transform_source=strip_jsonp)['Playlist'][0]
media_info = playlist['MediaInfo']
title = media_info['Title']
media_files = playlist['MediaFiles']
is_live = False
formats = []
for kind_case, kind_formats in media_files.items():
kind = kind_case.lower()
for f in kind_formats:
f_url = f.get('Url')
if not f_url:
continue
is_live = f.get('Live') == 'true'
exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None))
if kind == 'm3u8' or 'm3u8' in exts:
formats.extend(self._extract_m3u8_formats(
f_url, video_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id=kind, fatal=False, live=is_live))
elif kind == 'flash' or 'f4m' in exts:
formats.extend(self._extract_f4m_formats(
f_url, video_id, f4m_id=kind, fatal=False))
elif kind == 'dash' or 'mpd' in exts:
formats.extend(self._extract_mpd_formats(
f_url, video_id, mpd_id=kind, fatal=False))
elif kind == 'silverlight':
# TODO: process when ism is supported (see
# https://github.com/rg3/youtube-dl/issues/8118)
continue
else:
tbr = float_or_none(f.get('Bitrate'), 1000)
formats.append({
'url': f_url,
'format_id': '%s-%d' % (kind, tbr) if tbr else kind,
'tbr': tbr,
})
self._sort_formats(formats)
description = media_info.get('Description')
video_id = media_info.get('VideoId') or video_id
timestamp = parse_iso8601(media_info.get('PublishDate'))
thumbnails = [{
'url': thumbnail['Url'],
'width': int_or_none(thumbnail.get('Size')),
} for thumbnail in (media_info.get('Poster') or []) if thumbnail.get('Url')]
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'is_live': is_live,
'thumbnails': thumbnails,
'formats': formats,
}

View File

@@ -5,11 +5,13 @@ import re
from .common import InfoExtractor
from ..utils import (
dict_get,
ExtractorError,
float_or_none,
int_or_none,
parse_duration,
parse_iso8601,
try_get,
unescapeHTML,
)
from ..compat import (
@@ -229,51 +231,6 @@ class BBCCoUkIE(InfoExtractor):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
transfer_format = connection.get('transferFormat')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Skip DASH until supported
elif transfer_format == 'dash':
pass
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=supplier, fatal=False))
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier or kind or protocol,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
@@ -294,46 +251,6 @@ class BBCCoUkIE(InfoExtractor):
def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int_or_none(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
if service:
format['format_id'] = '%s_%s' % (service, format['format_id'])
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int_or_none(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id):
subtitles = {}
for connection in self._extract_connections(media):
@@ -379,13 +296,87 @@ class BBCCoUkIE(InfoExtractor):
def _process_media_selector(self, media_selection, programme_id):
formats = []
subtitles = None
urls = []
for media in self._extract_medias(media_selection):
kind = media.get('kind')
if kind == 'audio':
formats.extend(self._extract_audio(media, programme_id))
elif kind == 'video':
formats.extend(self._extract_video(media, programme_id))
if kind in ('video', 'audio'):
bitrate = int_or_none(media.get('bitrate'))
encoding = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
href = connection.get('href')
if href in urls:
continue
if href:
urls.append(href)
conn_kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, format_id),
})
elif transfer_format == 'dash':
formats.extend(self._extract_mpd_formats(
href, programme_id, mpd_id=format_id, fatal=False))
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
else:
if not service and not supplier and bitrate:
format_id += '-%d' % bitrate
fmt = {
'format_id': format_id,
'filesize': file_size,
}
if kind == 'video':
fmt.update({
'width': width,
'height': height,
'vbr': bitrate,
'vcodec': encoding,
})
else:
fmt.update({
'abr': bitrate,
'acodec': encoding,
'vcodec': 'none',
})
if protocol == 'http':
# Direct link
fmt.update({
'url': href,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
fmt.update({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
})
formats.append(fmt)
elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles
@@ -590,6 +581,7 @@ class BBCIE(BBCCoUkIE):
'id': '150615_telabyad_kentin_cogu',
'ext': 'mp4',
'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
'timestamp': 1434397334,
'upload_date': '20150615',
},
@@ -603,6 +595,7 @@ class BBCIE(BBCCoUkIE):
'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
'ext': 'mp4',
'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
'description': 'md5:1525f17448c4ee262b64b8f0c9ce66c8',
'timestamp': 1434713142,
'upload_date': '20150619',
},
@@ -652,6 +645,23 @@ class BBCIE(BBCCoUkIE):
# rtmp download
'skip_download': True,
}
}, {
# single video embedded with Morph
'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
'info_dict': {
'id': 'p041vhd0',
'ext': 'mp4',
'title': "Nigeria v Japan - Men's First Round",
'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
'duration': 7980,
'uploader': 'BBC Sport',
'uploader_id': 'bbc_sport',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to UK',
}, {
# single video with playlist.sxml URL in playlist param
'url': 'http://www.bbc.com/sport/0/football/33653409',
@@ -749,7 +759,7 @@ class BBCIE(BBCCoUkIE):
webpage = self._download_webpage(url, playlist_id)
json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
json_ld_info = self._search_json_ld(webpage, playlist_id, default={})
timestamp = json_ld_info.get('timestamp')
playlist_title = json_ld_info.get('title')
@@ -818,8 +828,29 @@ class BBCIE(BBCCoUkIE):
# http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
playlist = data_playable.get('otherSettings', {}).get('playlist', {})
if playlist:
entries.append(self._extract_from_playlist_sxml(
playlist.get('progressiveDownloadUrl'), playlist_id, timestamp))
entry = None
for key in ('streaming', 'progressiveDownload'):
playlist_url = playlist.get('%sUrl' % key)
if not playlist_url:
continue
try:
info = self._extract_from_playlist_sxml(
playlist_url, playlist_id, timestamp)
if not entry:
entry = info
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
continue
raise
if entry:
self._sort_formats(entry['formats'])
entries.append(entry)
if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@@ -852,6 +883,50 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# There are several setPayload calls may be present but the video
# seems to be always related to the first one
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
webpage, 'morph payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
for component in components:
if not isinstance(component, dict):
continue
lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
if not lead_media:
continue
identifiers = lead_media.get('identifiers')
if not identifiers or not isinstance(identifiers, dict):
continue
programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
if not programme_id:
continue
title = lead_media.get('title') or self._og_search_title(webpage)
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
description = lead_media.get('summary')
uploader = lead_media.get('masterBrand')
uploader_id = lead_media.get('mid')
duration = None
duration_d = lead_media.get('duration')
if isinstance(duration_d, dict):
duration = parse_duration(dict_get(
duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
'subtitles': subtitles,
}
def extract_all(pattern):
return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -869,7 +944,7 @@ class BBCIE(BBCCoUkIE):
r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
if entries:
return self.playlist_result(
[self.url_result(entry, 'BBCCoUk') for entry in entries],
[self.url_result(entry_, 'BBCCoUk') for entry_ in entries],
playlist_id, playlist_title, playlist_description)
# Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
@@ -998,10 +1073,10 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/episodes/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
_TEST = {
_TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
@@ -1009,7 +1084,17 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 6,
}
'skip': 'This programme is not currently available on BBC iPlayer',
}, {
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
'id': 'p02tcc32',
'title': 'Bohemian Icons',
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}]
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)

View File

@@ -11,22 +11,13 @@ from ..compat import compat_urllib_parse_unquote
class BigflixIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537',
'md5': 'ec76aa9b1129e2e5b301a474e54fab74',
'info_dict': {
'id': '16537',
'ext': 'mp4',
'title': 'Singham Returns',
'description': 'md5:3d2ba5815f14911d5cc6a501ae0cf65d',
}
}, {
# 2 formats
'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070',
'info_dict': {
'id': '16070',
'ext': 'mp4',
'title': 'Madarasapatinam',
'description': 'md5:63b9b8ed79189c6f0418c26d9a3452ca',
'description': 'md5:9f0470b26a4ba8e824c823b5d95c2f6b',
'formats': 'mincount:2',
},
'params': {

View File

@@ -25,13 +25,13 @@ class BiliBiliIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': {
'id': '1554319',
'ext': 'flv',
'ext': 'mp4',
'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.067,
'duration': 308.315,
'timestamp': 1398012660,
'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg',
@@ -41,73 +41,33 @@ class BiliBiliIE(InfoExtractor):
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'id': '1507019',
'ext': 'mp4',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'timestamp': 1396530060,
'upload_date': '20140403',
'uploader': '枫叶逝去',
'uploader_id': '520116',
},
'playlist_count': 9,
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'id': '7802182',
'ext': 'mp4',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
'playlist': [{
'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
'info_dict': {
'id': '4808130_part1',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '926f9f67d0c482091872fbd8eca7ea3d',
'info_dict': {
'id': '4808130_part2',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '4b7b225b968402d7c32348c646f1fd83',
'info_dict': {
'id': '4808130_part3',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '7b795e214166501e9141139eea236e91',
'info_dict': {
'id': '4808130_part4',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}],
}, {
# Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': {
'id': '2880301',
'ext': 'flv',
'ext': 'mp4',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'uploader': '黑夜为猫',

View File

@@ -2,11 +2,15 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
from ..utils import (
ExtractorError,
remove_end,
)
from .rudo import RudoIE
class BioBioChileTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_VALID_URL = r'https?://(?:tv|www)\.biobiochile\.cl/(?:notas|noticias)/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_TESTS = [{
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
@@ -18,6 +22,7 @@ class BioBioChileTVIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Fernando Atria',
},
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, {
# different uploader layout
'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
@@ -32,6 +37,16 @@ class BioBioChileTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, {
'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
'info_dict': {
'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
'ext': 'mp4',
'uploader': '(none)',
'upload_date': '20160708',
'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
},
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
'only_matching': True,
@@ -45,42 +60,22 @@ class BioBioChileTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
rudo_url = RudoIE._extract_url(webpage)
if not rudo_url:
raise ExtractorError('No videos found')
title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
file_url = self._search_regex(
r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
webpage, 'file url', group='url')
base_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
group='url')
formats = self._extract_m3u8_formats(
'%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
f = {
'url': '%s%s' % (base_url, file_url),
'format_id': 'http',
'protocol': 'http',
'preference': 1,
}
if formats:
f_copy = formats[-1].copy()
f_copy.update(f)
f = f_copy
formats.append(f)
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
return {
'_type': 'url_transparent',
'url': rudo_url,
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'formats': formats,
}

View File

@@ -24,7 +24,8 @@ class BIQLEIE(InfoExtractor):
'ext': 'mp4',
'title': 'Ребенок в шоке от автоматической мойки',
'uploader': 'Dmitry Kotov',
}
},
'skip': ' This video was marked as adult. Embedding adult videos on external sites is prohibited.',
}]
def _real_extract(self, url):

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@@ -20,6 +21,18 @@ class BloombergIE(InfoExtractor):
'params': {
'format': 'best[format_id^=hds]',
},
}, {
# video ID in BPlayer(...)
'url': 'http://www.bloomberg.com/features/2016-hello-world-new-zealand/',
'info_dict': {
'id': '938c7e72-3f25-4ddb-8b85-a9be731baa74',
'ext': 'flv',
'title': 'Meet the Real-Life Tech Wizards of Middle Earth',
'description': 'Hello World, Episode 1: New Zealands freaky AI babies, robot exoskeletons, and a virtual you.',
},
'params': {
'format': 'best[format_id^=hds]',
},
}, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True,
@@ -33,7 +46,11 @@ class BloombergIE(InfoExtractor):
webpage = self._download_webpage(url, name)
video_id = self._search_regex(
r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
webpage, 'id', group='url')
webpage, 'id', group='url', default=None)
if not video_id:
bplayer_data = self._parse_json(self._search_regex(
r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name)
video_id = bplayer_data['id']
title = re.sub(': Video$', '', self._og_search_title(webpage))
embed_info = self._download_json(

View File

@@ -26,6 +26,8 @@ from ..utils import (
unescapeHTML,
unsmuggle_url,
update_url_query,
clean_html,
mimetype2ext,
)
@@ -90,6 +92,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'description': 'md5:363109c02998fee92ec02211bd8000df',
'uploader': 'National Ballet of Canada',
},
'skip': 'Video gone',
},
{
# test flv videos served by akamaihd.net
@@ -108,7 +111,7 @@ class BrightcoveLegacyIE(InfoExtractor):
},
},
{
# playlist test
# playlist with 'videoList'
# from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL',
'info_dict': {
@@ -117,6 +120,15 @@ class BrightcoveLegacyIE(InfoExtractor):
},
'playlist_mincount': 7,
},
{
# playlist with 'playlistTab' (https://github.com/rg3/youtube-dl/issues/9965)
'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
'info_dict': {
'id': '1522758701001',
'title': 'Lesson 08',
},
'playlist_mincount': 10,
},
]
FLV_VCODECS = {
1: 'SORENSON',
@@ -298,13 +310,19 @@ class BrightcoveLegacyIE(InfoExtractor):
info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' not in json_data:
if 'videoList' in json_data:
playlist_info = json_data['videoList']
playlist_dto = playlist_info['mediaCollectionDTO']
elif 'playlistTabs' in json_data:
playlist_info = json_data['playlistTabs']
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
else:
raise ExtractorError('Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
playlist_title=playlist_dto['displayName'])
def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id'])
@@ -528,14 +546,16 @@ class BrightcoveNewIE(InfoExtractor):
formats = []
for source in json_data.get('sources', []):
container = source.get('container')
source_type = source.get('type')
ext = mimetype2ext(source.get('type'))
src = source.get('src')
if source_type == 'application/x-mpegURL' or container == 'M2TS':
if ext == 'ism':
continue
elif ext == 'm3u8' or container == 'M2TS':
if not src:
continue
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
elif source_type == 'application/dash+xml':
elif ext == 'mpd':
if not src:
continue
formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False))
@@ -551,7 +571,7 @@ class BrightcoveNewIE(InfoExtractor):
'tbr': tbr,
'filesize': int_or_none(source.get('size')),
'container': container,
'ext': container.lower(),
'ext': ext or container.lower(),
}
if width == 0 and height == 0:
f.update({
@@ -585,6 +605,13 @@ class BrightcoveNewIE(InfoExtractor):
'format_id': build_format_id('rtmp'),
})
formats.append(f)
errors = json_data.get('errors')
if not formats and errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
self._sort_formats(formats)
subtitles = {}
@@ -597,7 +624,7 @@ class BrightcoveNewIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'description': json_data.get('description'),
'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'timestamp': parse_iso8601(json_data.get('published_at')),

View File

@@ -5,6 +5,7 @@ import json
import re
from .common import InfoExtractor
from .facebook import FacebookIE
class BuzzFeedIE(InfoExtractor):
@@ -20,11 +21,11 @@ class BuzzFeedIE(InfoExtractor):
'info_dict': {
'id': 'aVCR29aE_OQ',
'ext': 'mp4',
'title': 'Angry Ram destroys a punching bag..',
'description': 'md5:c59533190ef23fd4458a5e8c8c872345',
'upload_date': '20141024',
'uploader_id': 'Buddhanz1',
'description': 'He likes to stay in shape with his heavy bag, he wont stop until its on the ground\n\nFollow Angry Ram on Facebook for regular updates -\nhttps://www.facebook.com/pages/Angry-Ram/1436897249899558?ref=hl',
'uploader': 'Buddhanz',
'title': 'Angry Ram destroys a punching bag',
'uploader': 'Angry Ram',
}
}]
}, {
@@ -41,13 +42,30 @@ class BuzzFeedIE(InfoExtractor):
'info_dict': {
'id': 'mVmBL8B-In0',
'ext': 'mp4',
'title': 're:Munchkin the Teddy Bear gets her exercise',
'description': 'md5:28faab95cda6e361bcff06ec12fc21d8',
'upload_date': '20141124',
'uploader_id': 'CindysMunchkin',
'description': 're:© 2014 Munchkin the',
'uploader': 're:^Munchkin the',
'title': 're:Munchkin the Teddy Bear gets her exercise',
},
}]
}, {
'url': 'http://www.buzzfeed.com/craigsilverman/the-most-adorable-crash-landing-ever#.eq7pX0BAmK',
'info_dict': {
'id': 'the-most-adorable-crash-landing-ever',
'title': 'Watch This Baby Goose Make The Most Adorable Crash Landing',
'description': 'This gosling knows how to stick a landing.',
},
'playlist': [{
'md5': '763ca415512f91ca62e4621086900a23',
'info_dict': {
'id': '971793786185728',
'ext': 'mp4',
'title': 'We set up crash pads so that the goslings on our roof would have a safe landi...',
'uploader': 'Calgary Outdoor Centre-University of Calgary',
},
}],
'add_ie': ['Facebook'],
}]
def _real_extract(self, url):
@@ -66,6 +84,10 @@ class BuzzFeedIE(InfoExtractor):
continue
entries.append(self.url_result(video['url']))
facebook_url = FacebookIE._extract_url(webpage)
if facebook_url:
entries.append(self.url_result(facebook_url))
return {
'_type': 'playlist',
'id': playlist_id,

View File

@@ -1,7 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import datetime
import re
from .common import InfoExtractor
@@ -10,8 +9,10 @@ from ..compat import (
compat_urlparse,
)
from ..utils import (
parse_iso8601,
clean_html,
parse_duration,
str_to_int,
unified_strdate,
)
@@ -26,14 +27,14 @@ class CamdemyIE(InfoExtractor):
'ext': 'mp4',
'title': 'Ch1-1 Introduction, Signals (02-23-2012)',
'thumbnail': 're:^https?://.*\.jpg$',
'description': '',
'creator': 'ss11spring',
'duration': 1591,
'upload_date': '20130114',
'timestamp': 1358154556,
'view_count': int,
}
}, {
# With non-empty description
# webpage returns "No permission or not login"
'url': 'http://www.camdemy.com/media/13885',
'md5': '4576a3bb2581f86c61044822adbd1249',
'info_dict': {
@@ -41,64 +42,71 @@ class CamdemyIE(InfoExtractor):
'ext': 'mp4',
'title': 'EverCam + Camdemy QuickStart',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:050b62f71ed62928f8a35f1a41e186c9',
'description': 'md5:2a9f989c2b153a2342acee579c6e7db6',
'creator': 'evercam',
'upload_date': '20140620',
'timestamp': 1403271569,
'duration': 318,
}
}, {
# External source
# External source (YouTube)
'url': 'http://www.camdemy.com/media/14842',
'md5': '50e1c3c3aa233d3d7b7daa2fa10b1cf7',
'info_dict': {
'id': '2vsYQzNIsJo',
'ext': 'mp4',
'title': 'Excel 2013 Tutorial - How to add Password Protection',
'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
'upload_date': '20130211',
'uploader': 'Hun Kim',
'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
'uploader_id': 'hunkimtutorials',
'title': 'Excel 2013 Tutorial - How to add Password Protection',
}
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, video_id)
src_from = self._html_search_regex(
r"<div class='srcFrom'>Source: <a title='([^']+)'", page,
'external source', default=None)
r"class=['\"]srcFrom['\"][^>]*>Sources?(?:\s+from)?\s*:\s*<a[^>]+(?:href|title)=(['\"])(?P<url>(?:(?!\1).)+)\1",
webpage, 'external source', default=None, group='url')
if src_from:
return self.url_result(src_from)
oembed_obj = self._download_json(
'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id)
title = oembed_obj['title']
thumb_url = oembed_obj['thumbnail_url']
video_folder = compat_urlparse.urljoin(thumb_url, 'video/')
file_list_doc = self._download_xml(
compat_urlparse.urljoin(video_folder, 'fileList.xml'),
video_id, 'Filelist XML')
video_id, 'Downloading filelist XML')
file_name = file_list_doc.find('./video/item/fileName').text
video_url = compat_urlparse.urljoin(video_folder, file_name)
timestamp = parse_iso8601(self._html_search_regex(
r"<div class='title'>Posted\s*:</div>\s*<div class='value'>([^<>]+)<",
page, 'creation time', fatal=False),
delimiter=' ', timezone=datetime.timedelta(hours=8))
view_count = str_to_int(self._html_search_regex(
r"<div class='title'>Views\s*:</div>\s*<div class='value'>([^<>]+)<",
page, 'view count', fatal=False))
# Some URLs return "No permission or not login" in a webpage despite being
# freely available via oembed JSON URL (e.g. http://www.camdemy.com/media/13885)
upload_date = unified_strdate(self._search_regex(
r'>published on ([^<]+)<', webpage,
'upload date', default=None))
view_count = str_to_int(self._search_regex(
r'role=["\']viewCnt["\'][^>]*>([\d,.]+) views',
webpage, 'view count', default=None))
description = self._html_search_meta(
'description', webpage, default=None) or clean_html(
oembed_obj.get('description'))
return {
'id': video_id,
'url': video_url,
'title': oembed_obj['title'],
'title': title,
'thumbnail': thumb_url,
'description': self._html_search_meta('description', page),
'creator': oembed_obj['author_name'],
'duration': oembed_obj['duration'],
'timestamp': timestamp,
'description': description,
'creator': oembed_obj.get('author_name'),
'duration': parse_duration(oembed_obj.get('duration')),
'upload_date': upload_date,
'view_count': view_count,
}

View File

@@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
js_to_json,
smuggle_url,
try_get,
)
@@ -25,8 +27,22 @@ class CBCIE(InfoExtractor):
'upload_date': '20160203',
'uploader': 'CBCC-NEW',
},
'skip': 'Geo-restricted to Canada',
}, {
# with clipId
# with clipId, feed available via tpfeed.cbc.ca and feed.theplatform.com
'url': 'http://www.cbc.ca/22minutes/videos/22-minutes-update/22-minutes-update-episode-4',
'md5': '162adfa070274b144f4fdc3c3b8207db',
'info_dict': {
'id': '2414435309',
'ext': 'mp4',
'title': '22 Minutes Update: What Not To Wear Quebec',
'description': "This week's latest Canadian top political story is What Not To Wear Quebec.",
'upload_date': '20131025',
'uploader': 'CBCC-NEW',
'timestamp': 1382717907,
},
}, {
# with clipId, feed only available via tpfeed.cbc.ca
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'md5': '0274a90b51a9b4971fe005c63f592f12',
'info_dict': {
@@ -64,6 +80,7 @@ class CBCIE(InfoExtractor):
'uploader': 'CBCC-NEW',
},
}],
'skip': 'Geo-restricted to Canada',
}]
@classmethod
@@ -81,9 +98,15 @@ class CBCIE(InfoExtractor):
media_id = player_info.get('mediaId')
if not media_id:
clip_id = player_info['clipId']
media_id = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id)['entries'][0]['id'].split('/')[-1]
feed = self._download_json(
'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
clip_id, fatal=False)
if feed:
media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
if not media_id:
media_id = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id)['entries'][0]['id'].split('/')[-1]
return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
else:
entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
@@ -104,6 +127,7 @@ class CBCPlayerIE(InfoExtractor):
'upload_date': '20160210',
'uploader': 'CBCC-NEW',
},
'skip': 'Geo-restricted to Canada',
}, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'http://www.cbc.ca/player/play/2657631896',

View File

@@ -80,9 +80,6 @@ class CBSInteractiveIE(ThePlatformIE):
media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
formats, subtitles = [], {}
if site == 'cnet':
formats, subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue
@@ -94,7 +91,7 @@ class CBSInteractiveIE(ThePlatformIE):
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
info = self._extract_theplatform_metadata('kYEXFC/%s' % media_guid_path, video_id)
info.update({
'id': video_id,
'display_id': display_id,

View File

@@ -1,12 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
from .anvato import AnvatoIE
from .sendtonews import SendtoNewsIE
from ..compat import compat_urlparse
from ..utils import unified_timestamp
class CBSLocalIE(AnvatoIE):
@@ -71,10 +69,7 @@ class CBSLocalIE(AnvatoIE):
time_str = self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
timestamp = None
if time_str:
timestamp = calendar.timegm(datetime.datetime.strptime(
time_str, '%b %d, %Y %I:%M %p').timetuple())
timestamp = unified_timestamp(time_str)
info_dict.update({
'display_id': display_id,

View File

@@ -26,6 +26,7 @@ class CBSNewsIE(CBSBaseIE):
# rtmp download
'skip_download': True,
},
'skip': 'Subscribers only',
},
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
@@ -69,7 +70,7 @@ class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
@@ -77,7 +78,15 @@ class CBSNewsLiveVideoIE(InfoExtractor):
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
}
'skip': 'Video gone, redirected to http://www.cbsnews.com/live/',
}, {
'url': 'http://www.cbsnews.com/live/video/video-shows-intense-paragliding-accident/',
'info_dict': {
'id': 'video-shows-intense-paragliding-accident',
'ext': 'flv',
'title': 'Video Shows Intense Paragliding Accident',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -17,7 +17,8 @@ class ChaturbateIE(InfoExtractor):
},
'params': {
'skip_download': True,
}
},
'skip': 'Room is offline',
}, {
'url': 'https://en.chaturbate.com/siswet19/',
'only_matching': True,

View File

@@ -1,30 +1,33 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..utils import (
parse_duration,
int_or_none,
)
from ..utils import parse_duration
class ChirbitIE(InfoExtractor):
IE_NAME = 'chirbit'
_VALID_URL = r'https?://(?:www\.)?chirb\.it/(?:(?:wp|pl)/|fb_chirbit_player\.swf\?key=)?(?P<id>[\da-zA-Z]+)'
_TESTS = [{
'url': 'http://chirb.it/PrIPv5',
'md5': '9847b0dad6ac3e074568bf2cfb197de8',
'url': 'http://chirb.it/be2abG',
'info_dict': {
'id': 'PrIPv5',
'id': 'be2abG',
'ext': 'mp3',
'title': 'Фасадстрой',
'duration': 52,
'view_count': int,
'comment_count': int,
'title': 'md5:f542ea253f5255240be4da375c6a5d7e',
'description': 'md5:f24a4e22a71763e32da5fed59e47c770',
'duration': 306,
},
'params': {
'skip_download': True,
}
}, {
'url': 'https://chirb.it/fb_chirbit_player.swf?key=PrIPv5',
'only_matching': True,
}, {
'url': 'https://chirb.it/wp/MN58c2',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -33,27 +36,30 @@ class ChirbitIE(InfoExtractor):
webpage = self._download_webpage(
'http://chirb.it/%s' % audio_id, audio_id)
audio_url = self._search_regex(
r'"setFile"\s*,\s*"([^"]+)"', webpage, 'audio url')
data_fd = self._search_regex(
r'data-fd=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'data fd', group='url')
# Reverse engineered from https://chirb.it/js/chirbit.player.js (look
# for soundURL)
audio_url = base64.b64decode(
data_fd[::-1].encode('ascii')).decode('utf-8')
title = self._search_regex(
r'itemprop="name">([^<]+)', webpage, 'title')
duration = parse_duration(self._html_search_meta(
'duration', webpage, 'duration', fatal=False))
view_count = int_or_none(self._search_regex(
r'itemprop="playCount"\s*>(\d+)', webpage,
'listen count', fatal=False))
comment_count = int_or_none(self._search_regex(
r'>(\d+) Comments?:', webpage,
'comment count', fatal=False))
r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')
description = self._search_regex(
r'<h3>Description</h3>\s*<pre[^>]*>([^<]+)</pre>',
webpage, 'description', default=None)
duration = parse_duration(self._search_regex(
r'class=["\']c-length["\'][^>]*>([^<]+)',
webpage, 'duration', fatal=False))
return {
'id': audio_id,
'url': audio_url,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
}

View File

@@ -23,7 +23,7 @@ class CliphunterIE(InfoExtractor):
(?P<id>[0-9]+)/
(?P<seo>.+?)(?:$|[#\?])
'''
_TEST = {
_TESTS = [{
'url': 'http://www.cliphunter.com/w/1012420/Fun_Jynx_Maze_solo',
'md5': 'b7c9bbd4eb3a226ab91093714dcaa480',
'info_dict': {
@@ -32,8 +32,19 @@ class CliphunterIE(InfoExtractor):
'title': 'Fun Jynx Maze solo',
'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 18,
}
}
},
'skip': 'Video gone',
}, {
'url': 'http://www.cliphunter.com/w/2019449/ShesNew__My_booty_girlfriend_Victoria_Paradices_pussy_filled_with_jizz',
'md5': '55a723c67bfc6da6b0cfa00d55da8a27',
'info_dict': {
'id': '2019449',
'ext': 'mp4',
'title': 'ShesNew - My booty girlfriend, Victoria Paradice\'s pussy filled with jizz',
'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 18,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -1,16 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
)
from .onet import OnetBaseIE
class ClipRsIE(InfoExtractor):
class ClipRsIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
@@ -27,64 +21,13 @@ class ClipRsIE(InfoExtractor):
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
mvp_id = self._search_mvp_id(webpage)
response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id,
query={
'body[id]': video_id,
'body[jsonrpc]': '2.0',
'body[method]': 'get_asset_detail',
'body[params][ID_Publikacji]': video_id,
'body[params][Service]': 'www.onet.pl',
'content-type': 'application/jsonp',
'x-onet-app': 'player.front.onetapi.pl',
})
info_dict = self._extract_from_id(mvp_id, webpage)
info_dict['display_id'] = display_id
error = response.get('error')
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
video = response['result'].get('0')
formats = []
for _, formats_dict in video['formats'].items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_list in formats_dict.items():
if not isinstance(format_list, list):
continue
for f in format_list:
if not f.get('url'):
continue
formats.append({
'url': f['url'],
'format_id': format_id,
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'abr': float_or_none(f.get('audio_bitrate')),
'vbr': float_or_none(f.get('video_bitrate')),
})
self._sort_formats(formats)
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}
return info_dict

View File

@@ -6,7 +6,6 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlencode,
compat_HTTPError,
)
from ..utils import (
@@ -17,37 +16,26 @@ from ..utils import (
class CloudyIE(InfoExtractor):
_IE_DESC = 'cloudy.ec and videoraj.ch'
_IE_DESC = 'cloudy.ec'
_VALID_URL = r'''(?x)
https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.(?:ch|to))/
https?://(?:www\.)?cloudy\.ec/
(?:v/|embed\.php\?id=)
(?P<id>[A-Za-z0-9]+)
'''
_EMBED_URL = 'http://www.%s/embed.php?id=%s'
_API_URL = 'http://www.%s/api/player.api.php?%s'
_EMBED_URL = 'http://www.cloudy.ec/embed.php?id=%s'
_API_URL = 'http://www.cloudy.ec/api/player.api.php'
_MAX_TRIES = 2
_TESTS = [
{
'url': 'https://www.cloudy.ec/v/af511e2527aac',
'md5': '5cb253ace826a42f35b4740539bedf07',
'info_dict': {
'id': 'af511e2527aac',
'ext': 'flv',
'title': 'Funny Cats and Animals Compilation june 2013',
}
},
{
'url': 'http://www.videoraj.to/v/47f399fd8bb60',
'md5': '7d0f8799d91efd4eda26587421c3c3b0',
'info_dict': {
'id': '47f399fd8bb60',
'ext': 'flv',
'title': 'Burning a New iPhone 5 with Gasoline - Will it Survive?',
}
_TEST = {
'url': 'https://www.cloudy.ec/v/af511e2527aac',
'md5': '5cb253ace826a42f35b4740539bedf07',
'info_dict': {
'id': 'af511e2527aac',
'ext': 'flv',
'title': 'Funny Cats and Animals Compilation june 2013',
}
]
}
def _extract_video(self, video_host, video_id, file_key, error_url=None, try_num=0):
def _extract_video(self, video_id, file_key, error_url=None, try_num=0):
if try_num > self._MAX_TRIES - 1:
raise ExtractorError('Unable to extract video URL', expected=True)
@@ -64,9 +52,8 @@ class CloudyIE(InfoExtractor):
'errorUrl': error_url,
})
data_url = self._API_URL % (video_host, compat_urllib_parse_urlencode(form))
player_data = self._download_webpage(
data_url, video_id, 'Downloading player data')
self._API_URL, video_id, 'Downloading player data', query=form)
data = compat_parse_qs(player_data)
try_num += 1
@@ -88,7 +75,7 @@ class CloudyIE(InfoExtractor):
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in [404, 410]:
self.report_warning('Invalid video URL, requesting another', video_id)
return self._extract_video(video_host, video_id, file_key, video_url, try_num)
return self._extract_video(video_id, file_key, video_url, try_num)
return {
'id': video_id,
@@ -98,14 +85,13 @@ class CloudyIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_host = mobj.group('host')
video_id = mobj.group('id')
url = self._EMBED_URL % (video_host, video_id)
url = self._EMBED_URL % video_id
webpage = self._download_webpage(url, video_id)
file_key = self._search_regex(
[r'key\s*:\s*"([^"]+)"', r'filekey\s*=\s*"([^"]+)"'],
webpage, 'file_key')
return self._extract_video(video_host, video_id, file_key)
return self._extract_video(video_id, file_key)

View File

@@ -1,5 +1,7 @@
from __future__ import unicode_literals
from .mtv import MTVIE
from ..utils import ExtractorError
class CMTIE(MTVIE):
@@ -16,7 +18,27 @@ class CMTIE(MTVIE):
'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
'description': 'Blame It All On My Roots',
},
'skip': 'Video not available',
}, {
'url': 'http://www.cmt.com/videos/misc/1504699/still-the-king-ep-109-in-3-minutes.jhtml#id=1739908',
'md5': 'e61a801ca4a183a466c08bd98dccbb1c',
'info_dict': {
'id': '1504699',
'ext': 'mp4',
'title': 'Still The King Ep. 109 in 3 Minutes',
'description': 'Relive or catch up with Still The King by watching this recap of season 1, episode 9. New episodes Sundays 9/8c.',
'timestamp': 1469421000.0,
'upload_date': '20160725',
},
}, {
'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
'only_matching': True,
}]
@classmethod
def _transform_rtmp_url(cls, rtmp_video_url):
if 'error_not_available.swf' in rtmp_video_url:
raise ExtractorError(
'%s said: video is not available' % cls.IE_NAME, expected=True)
return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)

View File

@@ -1,17 +1,7 @@
from __future__ import unicode_literals
import re
from .mtv import MTVServicesInfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse_urlencode,
)
from ..utils import (
ExtractorError,
float_or_none,
unified_strdate,
)
from .common import InfoExtractor
class ComedyCentralIE(MTVServicesInfoExtractor):
@@ -26,8 +16,10 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
'ext': 'mp4',
'title': 'CC:Stand-Up|Greg Fitzsimmons: Life on Stage|Uncensored - Too Good of a Mother',
'title': 'CC:Stand-Up|August 18, 2013|1|0101|Uncensored - Too Good of a Mother',
'description': 'After a certain point, breastfeeding becomes c**kblocking.',
'timestamp': 1376798400,
'upload_date': '20130818',
},
}, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
@@ -35,241 +27,92 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
}]
class ComedyCentralShowsIE(MTVServicesInfoExtractor):
IE_DESC = 'The Daily Show / The Colbert Report'
# urls can be abbreviations like :thedailyshow
# urls for episodes like:
# or urls for clips like: http://www.thedailyshow.com/watch/mon-december-10-2012/any-given-gun-day
# or: http://www.colbertnation.com/the-colbert-report-videos/421667/november-29-2012/moon-shattering-news
# or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow)
|https?://(:www\.)?
(?P<showname>thedailyshow|thecolbertreport|tosh)\.(?:cc\.)?com/
((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
(?P<clip>
(?:(?:guests/[^/]+|videos|video-(?:clips|playlists)|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
|(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
)|
(?P<interview>
extended-interviews/(?P<interID>[0-9a-z]+)/
(?:playlist_tds_extended_)?(?P<interview_title>[^/?#]*?)
(?:/[^/?#]?|[?#]|$))))
'''
class ToshIE(MTVServicesInfoExtractor):
IE_DESC = 'Tosh.0'
_VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
_FEED_URL = 'http://tosh.cc.com/feeds/mrss'
_TESTS = [{
'url': 'http://thedailyshow.cc.com/watch/thu-december-13-2012/kristen-stewart',
'md5': '4e2f5cb088a83cd8cdb7756132f9739d',
'info_dict': {
'id': 'ab9ab3e7-5a98-4dbe-8b21-551dc0523d55',
'ext': 'mp4',
'upload_date': '20121213',
'description': 'Kristen Stewart learns to let loose in "On the Road."',
'uploader': 'thedailyshow',
'title': 'thedailyshow kristen-stewart part 1',
}
}, {
'url': 'http://thedailyshow.cc.com/extended-interviews/b6364d/sarah-chayes-extended-interview',
'info_dict': {
'id': 'sarah-chayes-extended-interview',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'title': 'thedailyshow Sarah Chayes Extended Interview',
},
'playlist': [
{
'info_dict': {
'id': '0baad492-cbec-4ec1-9e50-ad91c291127f',
'ext': 'mp4',
'upload_date': '20150129',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'uploader': 'thedailyshow',
'title': 'thedailyshow sarah-chayes-extended-interview part 1',
},
},
{
'info_dict': {
'id': '1e4fb91b-8ce7-4277-bd7c-98c9f1bbd283',
'ext': 'mp4',
'upload_date': '20150129',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'uploader': 'thedailyshow',
'title': 'thedailyshow sarah-chayes-extended-interview part 2',
},
},
],
'params': {
'skip_download': True,
},
}, {
'url': 'http://thedailyshow.cc.com/extended-interviews/xm3fnq/andrew-napolitano-extended-interview',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/videos/29w6fx/-realhumanpraise-for-fox-news',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/videos/gh6urb/neil-degrasse-tyson-pt--1?xrs=eml_col_031114',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/guests/michael-lewis/3efna8/exclusive---michael-lewis-extended-interview-pt--3',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/episodes/sy7yv0/april-8--2014---denis-leary',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/episodes/8ase07/april-8--2014---jane-goodall',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/video-playlists/npde3s/the-daily-show-19088-highlights',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/video-playlists/t6d9sg/the-daily-show-20038-highlights/be3cwo',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/special-editions/2l8fdb/special-edition---a-look-back-at-food',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/news-team/michael-che/7wnfel/we-need-to-talk-about-israel',
'only_matching': True,
}, {
'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
'info_dict': {
'description': 'Tosh asked fans to share their summer plans.',
'title': 'Twitter Users Share Summer Plans',
},
'playlist': [{
'md5': 'f269e88114c1805bb6d7653fecea9e06',
'info_dict': {
'id': '90498ec2-ed00-11e0-aca6-0026b9414f30',
'ext': 'mp4',
'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
'description': 'Tosh asked fans to share their summer plans.',
'thumbnail': 're:^https?://.*\.jpg',
# It's really reported to be published on year 2077
'upload_date': '20770610',
'timestamp': 3390510600,
'subtitles': {
'en': 'mincount:3',
},
},
}]
}, {
'url': 'http://tosh.cc.com/video-collections/x2iz7k/just-plain-foul/m5q4fp',
'only_matching': True,
}]
_available_formats = ['3500', '2200', '1700', '1200', '750', '400']
@classmethod
def _transform_rtmp_url(cls, rtmp_video_url):
new_urls = super(ToshIE, cls)._transform_rtmp_url(rtmp_video_url)
new_urls['rtmp'] = rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm')
return new_urls
_video_extensions = {
'3500': 'mp4',
'2200': 'mp4',
'1700': 'mp4',
'1200': 'mp4',
'750': 'mp4',
'400': 'mp4',
}
_video_dimensions = {
'3500': (1280, 720),
'2200': (960, 540),
'1700': (768, 432),
'1200': (640, 360),
'750': (512, 288),
'400': (384, 216),
}
class ComedyCentralTVIE(MTVServicesInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.comedycentral.tv/staffeln/7436-the-mindy-project-staffel-4',
'info_dict': {
'id': 'local_playlist-f99b626bdfe13568579a',
'ext': 'flv',
'title': 'Episode_the-mindy-project_shows_season-4_episode-3_full-episode_part1',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'http://www.comedycentral.tv/shows/1074-workaholics',
'only_matching': True,
}, {
'url': 'http://www.comedycentral.tv/shows/1727-the-mindy-project/bonus',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = self._match_id(url)
if mobj.group('shortname'):
return self.url_result('http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes')
webpage = self._download_webpage(url, video_id)
if mobj.group('clip'):
if mobj.group('videotitle'):
epTitle = mobj.group('videotitle')
elif mobj.group('showname') == 'thedailyshow':
epTitle = mobj.group('tdstitle')
else:
epTitle = mobj.group('cntitle')
dlNewest = False
elif mobj.group('interview'):
epTitle = mobj.group('interview_title')
dlNewest = False
else:
dlNewest = not mobj.group('episode')
if dlNewest:
epTitle = mobj.group('showname')
else:
epTitle = mobj.group('episode')
show_name = mobj.group('showname')
mrss_url = self._search_regex(
r'data-mrss=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'mrss url', group='url')
webpage, htmlHandle = self._download_webpage_handle(url, epTitle)
if dlNewest:
url = htmlHandle.geturl()
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
if mobj is None:
raise ExtractorError('Invalid redirected URL: ' + url)
if mobj.group('episode') == '':
raise ExtractorError('Redirected URL is still not specific: ' + url)
epTitle = (mobj.group('episode') or mobj.group('videotitle')).rpartition('/')[-1]
return self._get_videos_info_from_url(mrss_url, video_id)
mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*(?:episode|video).*?:.*?))"', webpage)
if len(mMovieParams) == 0:
# The Colbert Report embeds the information in a without
# a URL prefix; so extract the alternate reference
# and then add the URL prefix manually.
altMovieParams = re.findall('data-mgid="([^"]*(?:episode|video|playlist).*?:.*?)"', webpage)
if len(altMovieParams) == 0:
raise ExtractorError('unable to find Flash URL in webpage ' + url)
else:
mMovieParams = [('http://media.mtvnservices.com/' + altMovieParams[0], altMovieParams[0])]
class ComedyCentralShortnameIE(InfoExtractor):
_VALID_URL = r'^:(?P<id>tds|thedailyshow)$'
_TESTS = [{
'url': ':tds',
'only_matching': True,
}, {
'url': ':thedailyshow',
'only_matching': True,
}]
uri = mMovieParams[0][1]
# Correct cc.com in uri
uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.com', uri)
index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse_urlencode({'uri': uri}))
idoc = self._download_xml(
index_url, epTitle,
'Downloading show index', 'Unable to download episode index')
title = idoc.find('./channel/title').text
description = idoc.find('./channel/description').text
entries = []
item_els = idoc.findall('.//item')
for part_num, itemEl in enumerate(item_els):
upload_date = unified_strdate(itemEl.findall('./pubDate')[0].text)
thumbnail = itemEl.find('.//{http://search.yahoo.com/mrss/}thumbnail').attrib.get('url')
content = itemEl.find('.//{http://search.yahoo.com/mrss/}content')
duration = float_or_none(content.attrib.get('duration'))
mediagen_url = content.attrib['url']
guid = itemEl.find('./guid').text.rpartition(':')[-1]
cdoc = self._download_xml(
mediagen_url, epTitle,
'Downloading configuration for segment %d / %d' % (part_num + 1, len(item_els)))
turls = []
for rendition in cdoc.findall('.//rendition'):
finfo = (rendition.attrib['bitrate'], rendition.findall('./src')[0].text)
turls.append(finfo)
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'format_id': 'vhttp-%s' % format,
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,
})
formats.append({
'format_id': 'rtmp-%s' % format,
'url': rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm'),
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,
})
self._sort_formats(formats)
subtitles = self._extract_subtitles(cdoc, guid)
virtual_id = show_name + ' ' + epTitle + ' part ' + compat_str(part_num + 1)
entries.append({
'id': guid,
'title': virtual_id,
'formats': formats,
'uploader': show_name,
'upload_date': upload_date,
'duration': duration,
'thumbnail': thumbnail,
'description': description,
'subtitles': subtitles,
})
return {
'_type': 'playlist',
'id': epTitle,
'entries': entries,
'title': show_name + ' ' + title,
'description': description,
def _real_extract(self, url):
video_id = self._match_id(url)
shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
}
return self.url_result(shortcut_map[video_id])

View File

@@ -44,6 +44,7 @@ from ..utils import (
sanitized_Request,
unescapeHTML,
unified_strdate,
unified_timestamp,
url_basename,
xpath_element,
xpath_text,
@@ -54,6 +55,8 @@ from ..utils import (
update_Request,
update_url_query,
parse_m3u8_attributes,
extract_attributes,
parse_codecs,
)
@@ -161,6 +164,7 @@ class InfoExtractor(object):
* "height" (optional, int)
* "resolution" (optional, string "{width}x{height"},
deprecated)
* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image.
description: Full video description.
uploader: Full name of the video uploader.
@@ -723,9 +727,14 @@ class InfoExtractor(object):
[^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)
def _og_search_property(self, prop, html, name=None, **kargs):
if not isinstance(prop, (list, tuple)):
prop = [prop]
if name is None:
name = 'OpenGraph %s' % prop
escaped = self._search_regex(self._og_regexes(prop), html, name, flags=re.DOTALL, **kargs)
name = 'OpenGraph %s' % prop[0]
og_regexes = []
for p in prop:
og_regexes.extend(self._og_regexes(p))
escaped = self._search_regex(og_regexes, html, name, flags=re.DOTALL, **kargs)
if escaped is None:
return None
return unescapeHTML(escaped)
@@ -803,40 +812,66 @@ class InfoExtractor(object):
return self._html_search_meta('twitter:player', html,
'twitter card player')
def _search_json_ld(self, html, video_id, **kwargs):
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
html, 'JSON-LD', group='json_ld', **kwargs)
default = kwargs.get('default', NO_DEFAULT)
if not json_ld:
return {}
return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True))
return default if default is not NO_DEFAULT else {}
# JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
def _json_ld(self, json_ld, video_id, fatal=True):
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str):
json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
if not json_ld:
return {}
info = {}
if json_ld.get('@context') == 'http://schema.org':
item_type = json_ld.get('@type')
if item_type == 'TVEpisode':
info.update({
'episode': unescapeHTML(json_ld.get('name')),
'episode_number': int_or_none(json_ld.get('episodeNumber')),
'description': unescapeHTML(json_ld.get('description')),
})
part_of_season = json_ld.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = json_ld.get('partOfSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
'timestamp': parse_iso8601(json_ld.get('datePublished')),
'title': unescapeHTML(json_ld.get('headline')),
'description': unescapeHTML(json_ld.get('articleBody')),
})
if not isinstance(json_ld, (list, tuple, dict)):
return info
if isinstance(json_ld, dict):
json_ld = [json_ld]
for e in json_ld:
if e.get('@context') == 'http://schema.org':
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
return info
if item_type == 'TVEpisode':
info.update({
'episode': unescapeHTML(e.get('name')),
'episode_number': int_or_none(e.get('episodeNumber')),
'description': unescapeHTML(e.get('description')),
})
part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),
'title': unescapeHTML(e.get('headline')),
'description': unescapeHTML(e.get('articleBody')),
})
elif item_type == 'VideoObject':
info.update({
'url': e.get('contentUrl'),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': e.get('thumbnailUrl'),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
'height': int_or_none(e.get('height')),
})
break
return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod
@@ -890,7 +925,8 @@ class InfoExtractor(object):
if f.get('ext') in ['f4f', 'f4m']: # Not yet supported
preference -= 0.5
proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1
protocol = f.get('protocol') or determine_protocol(f)
proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
if f.get('vcodec') == 'none': # audio only
preference -= 50
@@ -1107,7 +1143,7 @@ class InfoExtractor(object):
'url': m3u8_url,
'ext': ext,
'protocol': 'm3u8',
'preference': preference - 1 if preference else -1,
'preference': preference - 100 if preference else -100,
'resolution': 'multiple',
'format_note': 'Quality selection URL',
}
@@ -1186,6 +1222,7 @@ class InfoExtractor(object):
'url': format_url(line.strip()),
'tbr': tbr,
'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')),
'protocol': entry_protocol,
'preference': preference,
}
@@ -1194,24 +1231,17 @@ class InfoExtractor(object):
width_str, height_str = resolution.split('x')
f['width'] = int(width_str)
f['height'] = int(height_str)
codecs = last_info.get('CODECS')
if codecs:
vcodec, acodec = [None] * 2
va_codecs = codecs.split(',')
if len(va_codecs) == 1:
# Audio only entries usually come with single codec and
# no resolution. For more robustness we also check it to
# be mp4 audio.
if not resolution and va_codecs[0].startswith('mp4a'):
vcodec, acodec = 'none', va_codecs[0]
else:
vcodec = va_codecs[0]
else:
vcodec, acodec = va_codecs[:2]
# Unified Streaming Platform
mobj = re.search(
r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url'])
if mobj:
abr, vbr = mobj.groups()
abr, vbr = float_or_none(abr, 1000), float_or_none(vbr, 1000)
f.update({
'acodec': acodec,
'vcodec': vcodec,
'vbr': vbr,
'abr': abr,
})
f.update(parse_codecs(last_info.get('CODECS')))
if last_media is not None:
f['m3u8_media'] = last_media
last_media = None
@@ -1466,6 +1496,13 @@ class InfoExtractor(object):
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
"""
Parse formats from MPD manifest.
References:
1. MPEG-DASH Standard, ISO/IEC 23009-1:2014(E),
http://standards.iso.org/ittf/PubliclyAvailableStandards/c065274_ISO_IEC_23009-1_2014.zip
2. https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
"""
if mpd_doc.get('type') == 'dynamic':
return []
@@ -1498,8 +1535,16 @@ class InfoExtractor(object):
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e:
ms_info['total_number'] += 1 + int(s.get('r', '0'))
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
else:
timescale = segment_template.get('timescale')
if timescale:
@@ -1536,7 +1581,7 @@ class InfoExtractor(object):
continue
representation_attrib = adaptation_set.attrib.copy()
representation_attrib.update(representation.attrib)
# According to page 41 of ISO/IEC 29001-1:2014, @mimeType is mandatory
# According to [1, 5.3.7.2, Table 9, page 41], @mimeType is mandatory
mime_type = representation_attrib['mimeType']
content_type = mime_type.split('/')[0]
if content_type == 'text':
@@ -1580,16 +1625,40 @@ class InfoExtractor(object):
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', media_template)
media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth|Time)%([^$]+)\$', r'%(\1)\2', media_template)
media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [
media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth')}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template:
representation_ms_info['segment_urls'] = [
media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth'),
}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else:
representation_ms_info['segment_urls'] = []
segment_time = 0
def add_segment_url():
representation_ms_info['segment_urls'].append(
media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
}
)
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
add_segment_url()
for r in range(s.get('r', 0)):
segment_time += s['d']
add_segment_url()
segment_time += s['d']
if 'segment_urls' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
@@ -1616,6 +1685,62 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
def _parse_html5_media_entries(self, base_url, webpage):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
def parse_content_type(content_type):
if not content_type:
return {}
ctr = re.search(r'(?P<mimetype>[^/]+/[^;]+)(?:;\s*codecs="?(?P<codecs>[^"]+))?', content_type)
if ctr:
mimetype, codecs = ctr.groups()
f = parse_codecs(codecs)
f['ext'] = mimetype2ext(mimetype)
return f
return {}
entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage):
media_info = {
'formats': [],
'subtitles': {},
}
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
if src:
media_info['formats'].append({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content):
source_attributes = extract_attributes(source_tag)
src = source_attributes.get('src')
if not src:
continue
f = parse_content_type(source_attributes.get('type'))
f.update({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['formats'].append(f)
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind == 'subtitles':
src = track_attributes.get('src')
if not src:
continue
lang = track_attributes.get('srclang') or track_attributes.get('lang') or track_attributes.get('label')
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats']:
entries.append(media_info)
return entries
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
@@ -1676,7 +1801,7 @@ class InfoExtractor(object):
any_restricted = False
for tc in self.get_testcases(include_onlymatching=False):
if 'playlist' in tc:
if tc.get('playlist', []):
tc = tc['playlist'][0]
is_restricted = age_restricted(
tc.get('info_dict', {}).get('age_limit'), age_limit)
@@ -1729,6 +1854,13 @@ class InfoExtractor(object):
def _mark_watched(self, *args, **kwargs):
raise NotImplementedError('This method must be implemented by subclasses')
def geo_verification_headers(self):
headers = {}
geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
if geo_verification_proxy:
headers['Ytdl-request-proxy'] = geo_verification_proxy
return headers
class SearchInfoExtractor(InfoExtractor):
"""

View File

@@ -5,13 +5,17 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
orderedSet,
remove_end,
extract_attributes,
mimetype2ext,
determine_ext,
int_or_none,
parse_iso8601,
)
@@ -58,6 +62,9 @@ class CondeNastIE(InfoExtractor):
'ext': 'mp4',
'title': '3D Printed Speakers Lit With LED',
'description': 'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
'uploader': 'wired',
'upload_date': '20130314',
'timestamp': 1363219200,
}
}, {
# JS embed
@@ -67,70 +74,93 @@ class CondeNastIE(InfoExtractor):
'id': '55f9cf8b61646d1acf00000c',
'ext': 'mp4',
'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
'uploader': 'arstechnica',
'upload_date': '20150916',
'timestamp': 1442434955,
}
}]
def _extract_series(self, url, webpage):
title = self._html_search_regex(r'<div class="cne-series-info">.*?<h1>(.+?)</h1>',
webpage, 'series title', flags=re.DOTALL)
title = self._html_search_regex(
r'(?s)<div class="cne-series-info">.*?<h1>(.+?)</h1>',
webpage, 'series title')
url_object = compat_urllib_parse_urlparse(url)
base_url = '%s://%s' % (url_object.scheme, url_object.netloc)
m_paths = re.finditer(r'<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]',
webpage, flags=re.DOTALL)
m_paths = re.finditer(
r'(?s)<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]', webpage)
paths = orderedSet(m.group(1) for m in m_paths)
build_url = lambda path: compat_urlparse.urljoin(base_url, path)
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title)
def _extract_video(self, webpage, url_type):
if url_type != 'embed':
description = self._html_search_regex(
[
r'<div class="cne-video-description">(.+?)</div>',
r'<div class="video-post-content">(.+?)</div>',
],
webpage, 'description', fatal=False, flags=re.DOTALL)
query = {}
params = self._search_regex(
r'(?s)var params = {(.+?)}[;,]', webpage, 'player params', default=None)
if params:
query.update({
'videoId': self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id'),
'playerId': self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id'),
'target': self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target'),
})
else:
description = None
params = self._search_regex(r'var params = {(.+?)}[;,]', webpage,
'player params', flags=re.DOTALL)
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id')
player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id')
target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target')
data = compat_urllib_parse_urlencode({'videoId': video_id,
'playerId': player_id,
'target': target,
})
base_info_url = self._search_regex(r'url = [\'"](.+?)[\'"][,;]',
webpage, 'base info url',
default='http://player.cnevids.com/player/loader.js?')
info_url = base_info_url + data
info_page = self._download_webpage(info_url, video_id,
'Downloading video info')
video_info = self._search_regex(r'var\s+video\s*=\s*({.+?});', info_page, 'video info')
video_info = self._parse_json(video_info, video_id)
params = extract_attributes(self._search_regex(
r'(<[^>]+data-js="video-player"[^>]+>)',
webpage, 'player params element'))
query.update({
'videoId': params['data-video'],
'playerId': params['data-player'],
'target': params['id'],
})
video_id = query['videoId']
video_info = None
info_page = self._download_webpage(
'http://player.cnevids.com/player/video.js',
video_id, 'Downloading video info', query=query, fatal=False)
if info_page:
video_info = self._parse_json(self._search_regex(
r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
else:
info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=query)
video_info = self._parse_json(self._search_regex(
r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
title = video_info['title']
formats = [{
'format_id': '%s-%s' % (fdata['type'].split('/')[-1], fdata['quality']),
'url': fdata['src'],
'ext': fdata['type'].split('/')[-1],
'quality': 1 if fdata['quality'] == 'high' else 0,
} for fdata in video_info['sources'][0]]
formats = []
for fdata in video_info.get('sources', [{}])[0]:
src = fdata.get('src')
if not src:
continue
ext = mimetype2ext(fdata.get('type')) or determine_ext(src)
quality = fdata.get('quality')
formats.append({
'format_id': ext + ('-%s' % quality if quality else ''),
'url': src,
'ext': ext,
'quality': 1 if quality == 'high' else 0,
})
self._sort_formats(formats)
return {
info = self._search_json_ld(
webpage, video_id, fatal=False) if url_type != 'embed' else {}
info.update({
'id': video_id,
'formats': formats,
'title': video_info['title'],
'thumbnail': video_info['poster_frame'],
'description': description,
}
'title': title,
'thumbnail': video_info.get('poster_frame'),
'uploader': video_info.get('brand'),
'duration': int_or_none(video_info.get('duration')),
'tags': video_info.get('tags'),
'series': video_info.get('series_title'),
'season': video_info.get('season_title'),
'timestamp': parse_iso8601(video_info.get('premiere_date')),
})
return info
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site')
url_type = mobj.group('type')
item_id = mobj.group('id')
site, url_type, item_id = re.match(self._VALID_URL, url).groups()
# Convert JS embed to regular embed
if url_type == 'embedjs':

View File

@@ -114,6 +114,21 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# rtmp
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.com/rezero-starting-life-in-another-world-/episode-5-the-morning-of-our-promise-is-still-distant-702409',
'info_dict': {
'id': '702409',
'ext': 'mp4',
'title': 'Re:ZERO -Starting Life in Another World- Episode 5 The Morning of Our Promise Is Still Distant',
'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'TV TOKYO',
'upload_date': '20160508',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
'only_matching': True,
@@ -336,9 +351,18 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
video_file = xpath_text(stream_info, './file')
if not video_file:
continue
if video_file.startswith('http'):
formats.extend(self._extract_m3u8_formats(
video_file, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
video_url = xpath_text(stream_info, './host')
video_play_path = xpath_text(stream_info, './file')
if not video_url or not video_play_path:
if not video_url:
continue
metadata = stream_info.find('./metadata')
format_info = {
@@ -353,7 +377,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
parsed_video_url = compat_urlparse.urlparse(video_url)
direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
netloc='v.lvlt.crcdn.net',
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_play_path.split(':')[-1])))
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
if self._is_valid_url(direct_video_url, video_id, video_format):
format_info.update({
'url': direct_video_url,
@@ -363,7 +387,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
format_info.update({
'url': video_url,
'play_path': video_play_path,
'play_path': video_file,
'ext': 'flv',
})
formats.append(format_info)

View File

@@ -51,8 +51,11 @@ class CSpanIE(InfoExtractor):
'url': 'http://www.c-span.org/video/?104517-1/immigration-reforms-needed-protect-skilled-american-workers',
'info_dict': {
'id': 'judiciary031715',
'ext': 'flv',
'ext': 'mp4',
'title': 'Immigration Reforms Needed to Protect Skilled American Workers',
},
'params': {
'skip_download': True, # m3u8 downloads
}
}]

View File

@@ -1,13 +1,12 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import parse_iso8601, ExtractorError
from ..utils import unified_timestamp
class CtsNewsIE(InfoExtractor):
IE_DESC = '華視新聞'
# https connection failed (Connection reset)
_VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
_TESTS = [{
'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html',
@@ -16,7 +15,7 @@ class CtsNewsIE(InfoExtractor):
'id': '201501291578109',
'ext': 'mp4',
'title': '以色列.真主黨交火 3人死亡',
'description': 'md5:95e9b295c898b7ff294f09d450178d7d',
'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人...',
'timestamp': 1422528540,
'upload_date': '20150129',
}
@@ -28,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
'id': '201309031304098',
'ext': 'mp4',
'title': '韓國31歲童顏男 貌如十多歲小孩',
'description': 'md5:f183feeba3752b683827aab71adad584',
'description': '越有年紀的人越希望看起來年輕一點而南韓卻有一位31歲的男子看起來像是11、12歲的小孩身...',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1378205880,
'upload_date': '20130903',
@@ -36,8 +35,7 @@ class CtsNewsIE(InfoExtractor):
}, {
# With Youtube embedded video
'url': 'http://news.cts.com.tw/cts/money/201501/201501291578003.html',
'md5': '1d842c771dc94c8c3bca5af2cc1db9c5',
'add_ie': ['Youtube'],
'md5': 'e4726b2ccd70ba2c319865e28f0a91d1',
'info_dict': {
'id': 'OVbfO7d0_hQ',
'ext': 'mp4',
@@ -47,42 +45,37 @@ class CtsNewsIE(InfoExtractor):
'upload_date': '20150128',
'uploader_id': 'TBSCTS',
'uploader': '中華電視公司',
}
},
'add_ie': ['Youtube'],
}]
def _real_extract(self, url):
news_id = self._match_id(url)
page = self._download_webpage(url, news_id)
if self._search_regex(r'(CTSPlayer2)', page, 'CTSPlayer2 identifier', default=None):
feed_url = self._html_search_regex(
r'(http://news\.cts\.com\.tw/action/mp4feed\.php\?news_id=\d+)',
page, 'feed url')
video_url = self._download_webpage(
feed_url, news_id, note='Fetching feed')
news_id = self._hidden_inputs(page).get('get_id')
if news_id:
mp4_feed = self._download_json(
'http://news.cts.com.tw/action/test_mp4feed.php',
news_id, note='Fetching feed', query={'news_id': news_id})
video_url = mp4_feed['source_url']
else:
self.to_screen('Not CTSPlayer video, trying Youtube...')
youtube_url = self._search_regex(
r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url',
default=None)
if not youtube_url:
raise ExtractorError('The news includes no videos!', expected=True)
r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
return {
'_type': 'url',
'url': youtube_url,
'ie_key': 'Youtube',
}
return self.url_result(youtube_url, ie='Youtube')
description = self._html_search_meta('description', page)
title = self._html_search_meta('title', page)
title = self._html_search_meta('title', page, fatal=True)
thumbnail = self._html_search_meta('image', page)
datetime_str = self._html_search_regex(
r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2})', page, 'date and time')
# Transform into ISO 8601 format with timezone info
datetime_str = datetime_str.replace('/', '-') + ':00+0800'
timestamp = parse_iso8601(datetime_str, delimiter=' ')
r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2})', page, 'date and time', fatal=False)
timestamp = None
if datetime_str:
timestamp = unified_timestamp(datetime_str) - 8 * 3600
return {
'id': news_id,

View File

@@ -9,7 +9,7 @@ from ..utils import (
class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/(?:shows/)?(?:[^/]+/){2}\?.*\bplay=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_VALID_URL = r'https?://(?:www\.)?cw(?:tv(?:pr)?|seed)\.com/(?:shows/)?(?:[^/]+/)+[^?]*\?.*\b(?:play|watch)=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
'info_dict': {
@@ -28,7 +28,8 @@ class CWTVIE(InfoExtractor):
'params': {
# m3u8 download
'skip_download': True,
}
},
'skip': 'redirect to http://cwtv.com/shows/arrow/',
}, {
'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
'info_dict': {
@@ -44,22 +45,43 @@ class CWTVIE(InfoExtractor):
'upload_date': '20151006',
'timestamp': 1444107300,
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
'only_matching': True,
}, {
'url': 'http://cwtvpr.com/the-cw/video?watch=9eee3f60-ef4e-440b-b3b2-49428ac9c54e',
'only_matching': True,
}, {
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?watch=6b15e985-9345-4f60-baf8-56e96be57c63',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id)
formats = self._extract_m3u8_formats(
video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
video_data = None
formats = []
for partner in (154, 213):
vdata = self._download_json(
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/%d?format=json' % (video_id, partner), video_id, fatal=False)
if not vdata:
continue
video_data = vdata
for quality, quality_data in vdata.get('videos', {}).items():
quality_url = quality_data.get('uri')
if not quality_url:
continue
if quality == 'variantplaylist':
formats.extend(self._extract_m3u8_formats(
quality_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
tbr = int_or_none(quality_data.get('bitrate'))
format_id = 'http' + ('-%d' % tbr if tbr else '')
if self._is_valid_url(quality_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': quality_url,
'tbr': tbr,
})
self._sort_formats(formats)
thumbnails = [{

View File

@@ -5,19 +5,20 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
determine_protocol,
unescapeHTML,
)
class DailyMailIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.dailymail.co.uk/video/sciencetech/video-1288527/Turn-video-impressionist-masterpiece.html',
'md5': '2f639d446394f53f3a33658b518b6615',
'url': 'http://www.dailymail.co.uk/video/tvshowbiz/video-1295863/The-Mountain-appears-sparkling-water-ad-Heavy-Bubbles.html',
'md5': 'f6129624562251f628296c3a9ffde124',
'info_dict': {
'id': '1288527',
'id': '1295863',
'ext': 'mp4',
'title': 'Turn any video into an impressionist masterpiece',
'description': 'md5:88ddbcb504367987b2708bb38677c9d2',
'title': 'The Mountain appears in sparkling water ad for \'Heavy Bubbles\'',
'description': 'md5:a93d74b6da172dd5dc4d973e0b766a84',
}
}
@@ -26,7 +27,7 @@ class DailyMailIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(self._search_regex(
r"data-opts='({.+?})'", webpage, 'video data'), video_id)
title = video_data['title']
title = unescapeHTML(video_data['title'])
video_sources = self._download_json(video_data.get(
'sources', {}).get('url') or 'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id, video_id)
@@ -55,7 +56,7 @@ class DailyMailIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'description': video_data.get('descr'),
'description': unescapeHTML(video_data.get('descr')),
'thumbnail': video_data.get('poster') or video_data.get('thumbnail'),
'formats': formats,
}

View File

@@ -16,6 +16,7 @@ from ..utils import (
sanitized_Request,
str_to_int,
unescapeHTML,
mimetype2ext,
)
@@ -111,6 +112,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}
]
@staticmethod
def _extract_urls(webpage):
# Look for embedded Dailymotion player
matches = re.findall(
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
return list(map(lambda m: unescapeHTML(m[1]), matches))
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -153,18 +161,19 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
type_ = media.get('type')
if type_ == 'application/vnd.lumberjack.manifest':
continue
ext = determine_ext(media_url)
if type_ == 'application/x-mpegURL' or ext == 'm3u8':
ext = mimetype2ext(type_) or determine_ext(media_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False))
elif type_ == 'application/f4m' or ext == 'f4m':
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
else:
f = {
'url': media_url,
'format_id': 'http-%s' % quality,
'ext': ext,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
@@ -322,7 +331,9 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
for video_id in re.findall(r'data-xid="(.+?)"', webpage):
if video_id not in video_ids:
yield self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
yield self.url_result(
'http://www.dailymotion.com/video/%s' % video_id,
DailymotionIE.ie_key(), video_id)
video_ids.add(video_id)
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:

View File

@@ -66,22 +66,32 @@ class DaumIE(InfoExtractor):
'view_count': int,
'comment_count': int,
},
}, {
# Requires dte_type=WEB (#9972)
'url': 'http://tvpot.daum.net/v/s3794Uf1NZeZ1qMpGpeqeRU',
'md5': 'a8917742069a4dd442516b86e7d66529',
'info_dict': {
'id': 's3794Uf1NZeZ1qMpGpeqeRU',
'ext': 'mp4',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny) [쇼! 음악중심] 508회 20160611',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\n\n[쇼! 음악중심] 20160611, 507회',
'upload_date': '20160611',
},
}]
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(self._match_id(url))
query = compat_urllib_parse_urlencode({'vid': video_id})
movie_data = self._download_json(
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
video_id, 'Downloading video formats info')
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json',
video_id, 'Downloading video formats info', query={'vid': video_id, 'dte_type': 'WEB'})
# For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
info = self._download_xml(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
'Downloading video info')
'http://tvpot.daum.net/clip/ClipInfoXml.do', video_id,
'Downloading video info', query={'vid': video_id})
formats = []
for format_el in movie_data['output_list']['output_list']:

View File

@@ -4,78 +4,47 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
int_or_none,
clean_html,
)
class DBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:(?:lazyplayer|player)/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'md5': 'b89953ed25dacb6edb3ef6c6f430f8bc',
'md5': '2e24f67936517b143a234b4cadf792ec',
'info_dict': {
'id': '33100',
'id': '3649835190001',
'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'ext': 'mp4',
'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
'thumbnail': 're:https?://.*\.jpg$',
'timestamp': 1404039863.438,
'thumbnail': 're:https?://.*\.jpg',
'timestamp': 1404039863,
'upload_date': '20140629',
'duration': 69.544,
'view_count': int,
'categories': list,
}
'uploader_id': '1027729757001',
},
'add_ie': ['BrightcoveNew']
}, {
'url': 'http://dbtv.no/3649835190001',
'only_matching': True,
}, {
'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/vice/5000634109001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/filmtrailer/3359293614001',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
data = self._download_json(
'http://api.dbtv.no/discovery/%s' % video_id, display_id)
video = data['playlist'][0]
formats = [{
'url': f['URL'],
'vcodec': f.get('container'),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'vbr': float_or_none(f.get('rate'), 1000),
'filesize': int_or_none(f.get('size')),
} for f in video['renditions'] if 'URL' in f]
if not formats:
for url_key, format_id in [('URL', 'mp4'), ('HLSURL', 'hls')]:
if url_key in video:
formats.append({
'url': video[url_key],
'format_id': format_id,
})
self._sort_formats(formats)
video_id, display_id = re.match(self._VALID_URL, url).groups()
return {
'id': compat_str(video['id']),
'_type': 'url_transparent',
'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
'id': video_id,
'display_id': display_id,
'title': video['title'],
'description': clean_html(video['desc']),
'thumbnail': video.get('splash') or video.get('thumb'),
'timestamp': float_or_none(video.get('publishedAt'), 1000),
'duration': float_or_none(video.get('length'), 1000),
'view_count': int_or_none(video.get('views')),
'categories': video.get('tags'),
'formats': formats,
'ie_key': 'BrightcoveNew',
}

View File

@@ -62,11 +62,9 @@ class DCNBaseIE(InfoExtractor):
r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
r'<a[^>]+href="rtsp(://[^"]+)"'
], webpage, 'format url')
# TODO: Current DASH formats are broken - $Time$ pattern in
# <SegmentTemplate> not implemented yet
# formats.extend(self._extract_mpd_formats(
# format_url_base + '/manifest.mpd',
# video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_mpd_formats(
format_url_base + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_m3u8_formats(
format_url_base + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))

View File

@@ -0,0 +1,98 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
extract_attributes,
int_or_none,
parse_age_limit,
unescapeHTML,
)
class DiscoveryGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
'info_dict': {
'id': '57a33c536b66d1cd0345eeb1',
'ext': 'mp4',
'title': 'Kiss First, Ask Questions Later!',
'description': 'md5:fe923ba34050eae468bffae10831cb22',
'duration': 2579,
'series': 'Love at First Kiss',
'season_number': 1,
'episode_number': 1,
'age_limit': 14,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
container = extract_attributes(
self._search_regex(
r'(<div[^>]+class=["\']video-player-container[^>]+>)',
webpage, 'video container'))
video = self._parse_json(
unescapeHTML(container.get('data-video') or container.get('data-json')),
display_id)
title = video['name']
stream = video['stream']
STREAM_URL_SUFFIX = 'streamUrl'
formats = []
for stream_kind in ('', 'hds'):
suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX
stream_url = stream.get('%s%s' % (stream_kind, suffix))
if not stream_url:
continue
if stream_kind == '':
formats.extend(self._extract_m3u8_formats(
stream_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif stream_kind == 'hds':
formats.extend(self._extract_f4m_formats(
stream_url, display_id, f4m_id=stream_kind, fatal=False))
self._sort_formats(formats)
video_id = video.get('id') or display_id
description = video.get('description', {}).get('detailed')
duration = int_or_none(video.get('duration'))
series = video.get('show', {}).get('name')
season_number = int_or_none(video.get('season', {}).get('number'))
episode_number = int_or_none(video.get('episodeNumber'))
tags = video.get('tags')
age_limit = parse_age_limit(video.get('parental', {}).get('rating'))
subtitles = {}
captions = stream.get('captions')
if isinstance(captions, list):
for caption in captions:
subtitle_url = caption.get('fileUrl')
if (not subtitle_url or not isinstance(subtitle_url, compat_str) or
not subtitle_url.startswith('http')):
continue
lang = caption.get('fileLang', 'en')
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'duration': duration,
'series': series,
'season_number': season_number,
'episode_number': episode_number,
'tags': tags,
'age_limit': age_limit,
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -17,8 +17,12 @@ class DreiSatIE(ZDFIE):
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'uploader': '3sat',
'uploader': 'SCHWEIZWEIT',
'uploader_id': '100000210',
'upload_date': '20140913'
},
'params': {
'skip_download': True, # m3u8 downloads
}
},
{

View File

@@ -3,7 +3,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import str_to_int
from ..utils import (
NO_DEFAULT,
str_to_int,
)
class DrTuberIE(InfoExtractor):
@@ -17,7 +20,6 @@ class DrTuberIE(InfoExtractor):
'ext': 'mp4',
'title': 'hot perky blonde naked golf',
'like_count': int,
'dislike_count': int,
'comment_count': int,
'categories': ['Babe', 'Blonde', 'Erotic', 'Outdoor', 'Softcore', 'Solo'],
'thumbnail': 're:https?://.*\.jpg$',
@@ -36,25 +38,29 @@ class DrTuberIE(InfoExtractor):
r'<source src="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex(
[r'<p[^>]+class="title_substrate">([^<]+)</p>', r'<title>([^<]+) - \d+'],
(r'class="title_watch"[^>]*><p>([^<]+)<',
r'<p[^>]+class="title_substrate">([^<]+)</p>',
r'<title>([^<]+) - \d+'),
webpage, 'title')
thumbnail = self._html_search_regex(
r'poster="([^"]+)"',
webpage, 'thumbnail', fatal=False)
def extract_count(id_, name):
def extract_count(id_, name, default=NO_DEFAULT):
return str_to_int(self._html_search_regex(
r'<span[^>]+(?:class|id)="%s"[^>]*>([\d,\.]+)</span>' % id_,
webpage, '%s count' % name, fatal=False))
webpage, '%s count' % name, default=default, fatal=False))
like_count = extract_count('rate_likes', 'like')
dislike_count = extract_count('rate_dislikes', 'dislike')
dislike_count = extract_count('rate_dislikes', 'dislike', default=None)
comment_count = extract_count('comments_count', 'comment')
cats_str = self._search_regex(
r'<div[^>]+class="categories_list">(.+?)</div>', webpage, 'categories', fatal=False)
categories = [] if not cats_str else re.findall(r'<a title="([^"]+)"', cats_str)
r'<div[^>]+class="categories_list">(.+?)</div>',
webpage, 'categories', fatal=False)
categories = [] if not cats_str else re.findall(
r'<a title="([^"]+)"', cats_str)
return {
'id': video_id,

View File

@@ -6,12 +6,13 @@ import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
NO_DEFAULT,
)
class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
'md5': '4294cf98bc165f218aaa0b89e0fd8042',
'info_dict': {
@@ -22,24 +23,47 @@ class EllenTVIE(InfoExtractor):
'timestamp': 1428035648,
'upload_date': '20150403',
'uploader_id': 'batchUser',
}
}
},
}, {
# not available via http://widgets.ellentube.com/
'url': 'http://www.ellentv.com/videos/1-szkgu2m2/',
'info_dict': {
'id': '1_szkgu2m2',
'ext': 'flv',
'title': "Ellen's Amazingly Talented Audience",
'description': 'md5:86ff1e376ff0d717d7171590e273f0a5',
'timestamp': 1255140900,
'upload_date': '20091010',
'uploader_id': 'ellenkaltura@gmail.com',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://widgets.ellentube.com/videos/%s' % video_id,
video_id)
URLS = ('http://widgets.ellentube.com/videos/%s' % video_id, url)
partner_id = self._search_regex(
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id')
for num, url_ in enumerate(URLS, 1):
webpage = self._download_webpage(
url_, video_id, fatal=num == len(URLS))
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id')
default = NO_DEFAULT if num == len(URLS) else None
partner_id = self._search_regex(
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id',
default=default)
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id', default=default)
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura')

View File

@@ -4,9 +4,10 @@ from .common import InfoExtractor
class EngadgetIE(InfoExtractor):
_VALID_URL = r'https?://www.engadget.com/video/(?P<id>\d+)'
_VALID_URL = r'https?://www.engadget.com/video/(?P<id>[^/?#]+)'
_TEST = {
_TESTS = [{
# video with 5min ID
'url': 'http://www.engadget.com/video/518153925/',
'md5': 'c6820d4828a5064447a4d9fc73f312c9',
'info_dict': {
@@ -15,8 +16,12 @@ class EngadgetIE(InfoExtractor):
'title': 'Samsung Galaxy Tab Pro 8.4 Review',
},
'add_ie': ['FiveMin'],
}
}, {
# video with vidible ID
'url': 'https://www.engadget.com/video/57a28462134aa15a39f0421a/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result('5min:%s' % video_id)
return self.url_result('aol-video:%s' % video_id)

View File

@@ -4,19 +4,23 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
encode_base_n,
ExtractorError,
int_or_none,
parse_duration,
str_to_int,
)
class EpornerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)/(?P<display_id>[\w-]+)'
_VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?'
_TESTS = [{
'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/',
'md5': '39d486f046212d8e1b911c52ab4691f8',
'info_dict': {
'id': '95008',
'id': 'qlDUmNsj6VS',
'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video',
'ext': 'mp4',
'title': 'Infamous Tiffany Teen Strip Tease Video',
@@ -28,34 +32,72 @@ class EpornerIE(InfoExtractor):
# New (May 2016) URL layout
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
'only_matching': True,
}, {
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<title>(.*?) - EPORNER', webpage, 'title')
webpage, urlh = self._download_webpage_handle(url, display_id)
redirect_url = 'http://www.eporner.com/config5/%s' % video_id
player_code = self._download_webpage(
redirect_url, display_id, note='Downloading player config')
video_id = self._match_id(compat_str(urlh.geturl()))
sources = self._search_regex(
r'(?s)sources\s*:\s*\[\s*({.+?})\s*\]', player_code, 'sources')
hash = self._search_regex(
r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')
title = self._og_search_title(webpage, default=None) or self._html_search_regex(
r'<title>(.+?) - EPORNER', webpage, 'title')
# Reverse engineered from vjs.js
def calc_hash(s):
return ''.join((encode_base_n(int(s[lb:lb + 8], 16), 36) for lb in range(0, 32, 8)))
video = self._download_json(
'http://www.eporner.com/xhr/video/%s' % video_id,
display_id, note='Downloading video JSON',
query={
'hash': calc_hash(hash),
'device': 'generic',
'domain': 'www.eporner.com',
'fallback': 'false',
})
if video.get('available') is False:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, video['message']), expected=True)
sources = video['sources']
formats = []
for video_url, format_id in re.findall(r'file\s*:\s*"([^"]+)",\s*label\s*:\s*"([^"]+)"', sources):
fmt = {
'url': video_url,
'format_id': format_id,
}
m = re.search(r'^(\d+)', format_id)
if m:
fmt['height'] = int(m.group(1))
formats.append(fmt)
for kind, formats_dict in sources.items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_dict in formats_dict.items():
if not isinstance(format_dict, dict):
continue
src = format_dict.get('src')
if not isinstance(src, compat_str) or not src.startswith('http'):
continue
if kind == 'hls':
formats.extend(self._extract_m3u8_formats(
src, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=kind, fatal=False))
else:
height = int_or_none(self._search_regex(
r'(\d+)[pP]', format_id, 'height', default=None))
fps = int_or_none(self._search_regex(
r'(\d+)fps', format_id, 'fps', default=None))
formats.append({
'url': src,
'format_id': format_id,
'height': height,
'fps': fps,
})
self._sort_formats(formats)
duration = parse_duration(self._html_search_meta('duration', webpage))

View File

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
@@ -12,23 +10,22 @@ from ..utils import (
class ExpoTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
_TEST = {
'url': 'http://www.expotv.com/videos/reviews/1/24/LinneCardscom/17561',
'md5': '2985e6d7a392b2f7a05e0ca350fe41d0',
'url': 'http://www.expotv.com/videos/reviews/3/40/NYX-Butter-lipstick/667916',
'md5': 'fe1d728c3a813ff78f595bc8b7a707a8',
'info_dict': {
'id': '17561',
'id': '667916',
'ext': 'mp4',
'upload_date': '20060212',
'title': 'My Favorite Online Scrapbook Store',
'view_count': int,
'description': 'You\'ll find most everything you need at this virtual store front.',
'uploader': 'Anna T.',
'title': 'NYX Butter Lipstick Little Susie',
'description': 'Goes on like butter, but looks better!',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Stephanie S.',
'upload_date': '20150520',
'view_count': int,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_key = self._search_regex(
@@ -66,7 +63,7 @@ class ExpoTVIE(InfoExtractor):
fatal=False)
upload_date = unified_strdate(self._search_regex(
r'<h5>Reviewed on ([0-9/.]+)</h5>', webpage, 'upload date',
fatal=False))
fatal=False), day_first=False)
return {
'id': video_id,

View File

@@ -44,6 +44,7 @@ from .appletrailers import (
AppleTrailersSectionIE,
)
from .archiveorg import ArchiveOrgIE
from .arkena import ArkenaIE
from .ard import (
ARDIE,
ARDMediathekIE,
@@ -139,9 +140,9 @@ from .chirbit import (
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .cliprs import ClipRsIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .cliprs import ClipRsIE
from .clipsyndicate import ClipsyndicateIE
from .closertotruth import CloserToTruthIE
from .cloudy import CloudyIE
@@ -156,7 +157,12 @@ from .cnn import (
)
from .coub import CoubIE
from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comedycentral import (
ComedyCentralIE,
ComedyCentralShortnameIE,
ComedyCentralTVIE,
ToshIE,
)
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import RtmpIE
@@ -215,6 +221,7 @@ from .dvtv import DVTVIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .discoverygo import DiscoveryGoIE
from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE
from .dw import (
@@ -256,6 +263,7 @@ from .fivemin import FiveMinIE
from .fivetv import FiveTVIE
from .fktv import FKTVIE
from .flickr import FlickrIE
from .flipagram import FlipagramIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
@@ -264,10 +272,7 @@ from .fox import FOXIE
from .foxgay import FoxgayIE
from .foxnews import FoxNewsIE
from .foxsports import FoxSportsIE
from .franceculture import (
FranceCultureIE,
FranceCultureEmissionIE,
)
from .franceculture import FranceCultureIE
from .franceinter import FranceInterIE
from .francetv import (
PluzzIE,
@@ -283,7 +288,6 @@ from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
from .gameinformer import GameInformerIE
from .gamekings import GamekingsIE
from .gameone import (
GameOneIE,
GameOnePlaylistIE,
@@ -304,7 +308,6 @@ from .globo import (
)
from .godtube import GodTubeIE
from .godtv import GodTVIE
from .goldenmoustache import GoldenMoustacheIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
from .googleplus import GooglePlusIE
@@ -319,6 +322,7 @@ from .heise import HeiseIE
from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .hgtv import HGTVIE
from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hornbunny import HornBunnyIE
@@ -368,6 +372,7 @@ from .jove import JoveIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kaltura import KalturaIE
from .kamcord import KamcordIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
@@ -391,6 +396,10 @@ from .kuwo import (
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lcp import (
LcpPlayIE,
LcpIE,
)
from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE
from .lemonde import LemondeIE
@@ -469,7 +478,6 @@ from .msn import MSNIE
from .mtv import (
MTVIE,
MTVServicesEmbeddedIE,
MTVIggyIE,
MTVDEIE,
)
from .muenchentv import MuenchenTVIE
@@ -481,8 +489,9 @@ from .myvi import MyviIE
from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE
from .nationalgeographic import (
NationalGeographicVideoIE,
NationalGeographicIE,
NationalGeographicChannelIE,
NationalGeographicEpisodeGuideIE,
)
from .naver import NaverIE
from .nba import NBAIE
@@ -519,7 +528,6 @@ from .nextmedia import (
NextMediaActionNewsIE,
AppleDailyIE,
)
from .nextmovie import NextMovieIE
from .nfb import NFBIE
from .nfl import NFLIE
from .nhl import (
@@ -535,6 +543,8 @@ from .nick import (
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE
from .ninenow import NineNowIE
from .nintendo import NintendoIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
@@ -579,8 +589,13 @@ from .nytimes import (
NYTimesArticleIE,
)
from .nuvid import NuvidIE
from .odatv import OdaTVIE
from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE
from .onet import (
OnetIE,
OnetChannelIE,
)
from .onionstudios import OnionStudiosIE
from .ooyala import (
OoyalaIE,
@@ -619,6 +634,7 @@ from .pluralsight import (
PluralsightCourseIE,
)
from .podomatic import PodomaticIE
from .pokemon import PokemonIE
from .polskieradio import PolskieRadioIE
from .porn91 import Porn91IE
from .pornhd import PornHdIE
@@ -674,16 +690,19 @@ from .rice import RICEIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE
from .roosterteeth import RoosterTeethIE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rozhlas import RozhlasIE
from .rtbf import RTBFIE
from .rte import RteIE, RteRadioIE
from .rtlnl import RtlNlIE
from .rtl2 import RTL2IE
from .rtp import RTPIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
from .rtvnh import RTVNHIE
from .rudo import RudoIE
from .ruhd import RUHDIE
from .ruleporn import RulePornIE
from .rutube import (
@@ -734,6 +753,7 @@ from .smotri import (
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .sonyliv import SonyLIVIE
from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
@@ -773,6 +793,7 @@ from .srmediathek import SRMediathekIE
from .ssa import SSAIE
from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .streamable import StreamableIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE
@@ -788,7 +809,6 @@ from .tagesschau import (
TagesschauPlayerIE,
TagesschauIE,
)
from .tapely import TapelyIE
from .tass import TassIE
from .tdslifeway import TDSLifewayIE
from .teachertube import (
@@ -872,6 +892,7 @@ from .tvc import (
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvp import (
TVPEmbedIE,
TVPIE,
TVPSeriesIE,
)
@@ -904,6 +925,7 @@ from .udemy import (
from .udn import UDNEmbedIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .uol import UOLIE
from .urort import UrortIE
from .urplay import URPlayIE
from .usatoday import USATodayIE
@@ -981,9 +1003,11 @@ from .viki import (
from .vk import (
VKIE,
VKUserVideosIE,
VKWallPostIE,
)
from .vlive import VLiveIE
from .vodlocker import VodlockerIE
from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE
from .vporn import VpornIE
@@ -1066,6 +1090,7 @@ from .youtube import (
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
YoutubeSharedVideoIE,
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE,

View File

@@ -27,7 +27,7 @@ class FacebookIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
https?://
(?:\w+\.)?facebook\.com/
(?:[\w-]+\.)?facebook\.com/
(?:[^#]*?\#!/)?
(?:
(?:
@@ -127,6 +127,9 @@ class FacebookIE(InfoExtractor):
}, {
'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/',
'only_matching': True,
}, {
'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/',
'only_matching': True,
}]
@staticmethod
@@ -139,7 +142,7 @@ class FacebookIE(InfoExtractor):
# Facebook API embed
# see https://developers.facebook.com/docs/plugins/embedded-video-player
mobj = re.search(r'''(?x)<div[^>]+
class=(?P<q1>[\'"])[^\'"]*\bfb-video\b[^\'"]*(?P=q1)[^>]+
class=(?P<q1>[\'"])[^\'"]*\bfb-(?:video|post)\b[^\'"]*(?P=q1)[^>]+
data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage)
if mobj is not None:
return mobj.group('url')
@@ -219,12 +222,25 @@ class FacebookIE(InfoExtractor):
BEFORE = '{swf.addParam(param[0], param[1]);});'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER), webpage)
if m:
swf_params = m.group(1).replace('\\\\', '\\').replace('\\"', '"')
PATTERN = re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER)
for m in re.findall(PATTERN, webpage):
swf_params = m.replace('\\\\', '\\').replace('\\"', '"')
data = dict(json.loads(swf_params))
params_raw = compat_urllib_parse_unquote(data['params'])
video_data = json.loads(params_raw)['video_data']
video_data_candidate = json.loads(params_raw)['video_data']
for _, f in video_data_candidate.items():
if not f:
continue
if isinstance(f, dict):
f = [f]
if not isinstance(f, list):
continue
if f[0].get('video_id') == video_id:
video_data = video_data_candidate
break
if video_data:
break
def video_data_list2dict(video_data):
ret = {}

View File

@@ -1,24 +1,11 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
parse_duration,
replace_extension,
)
class FiveMinIE(InfoExtractor):
IE_NAME = '5min'
_VALID_URL = r'(?:5min:(?P<id>\d+)(?::(?P<sid>\d+))?|https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?P<query>.*))'
_VALID_URL = r'(?:5min:|https?://(?:[^/]*?5min\.com/|delivery\.vidible\.tv/aol)(?:(?:Scripts/PlayerSeed\.js|playerseed/?)?\?.*?playList=)?)(?P<id>\d+)'
_TESTS = [
{
@@ -29,8 +16,16 @@ class FiveMinIE(InfoExtractor):
'id': '518013791',
'ext': 'mp4',
'title': 'iPad Mini with Retina Display Review',
'description': 'iPad mini with Retina Display review',
'duration': 177,
'uploader': 'engadget',
'upload_date': '20131115',
'timestamp': 1384515288,
},
'params': {
# m3u8 download
'skip_download': True,
}
},
{
# From http://on.aol.com/video/how-to-make-a-next-level-fruit-salad-518086247
@@ -44,108 +39,16 @@ class FiveMinIE(InfoExtractor):
},
'skip': 'no longer available',
},
{
'url': 'http://embed.5min.com/518726732/',
'only_matching': True,
},
{
'url': 'http://delivery.vidible.tv/aol?playList=518013791',
'only_matching': True,
}
]
_ERRORS = {
'ErrorVideoNotExist': 'We\'re sorry, but the video you are trying to watch does not exist.',
'ErrorVideoNoLongerAvailable': 'We\'re sorry, but the video you are trying to watch is no longer available.',
'ErrorVideoRejected': 'We\'re sorry, but the video you are trying to watch has been removed.',
'ErrorVideoUserNotGeo': 'We\'re sorry, but the video you are trying to watch cannot be viewed from your current location.',
'ErrorVideoLibraryRestriction': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
'ErrorExposurePermission': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
}
_QUALITIES = {
1: {
'width': 640,
'height': 360,
},
2: {
'width': 854,
'height': 480,
},
4: {
'width': 1280,
'height': 720,
},
8: {
'width': 1920,
'height': 1080,
},
16: {
'width': 640,
'height': 360,
},
32: {
'width': 854,
'height': 480,
},
64: {
'width': 1280,
'height': 720,
},
128: {
'width': 640,
'height': 360,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
sid = mobj.group('sid')
if mobj.group('query'):
qs = compat_parse_qs(mobj.group('query'))
if not qs.get('playList'):
raise ExtractorError('Invalid URL', expected=True)
video_id = qs['playList'][0]
if qs.get('sid'):
sid = qs['sid'][0]
embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id
if not sid:
embed_page = self._download_webpage(embed_url, video_id,
'Downloading embed page')
sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid')
response = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?' +
compat_urllib_parse_urlencode({
'func': 'GetResults',
'playlist': video_id,
'sid': sid,
'isPlayerSeed': 'true',
'url': embed_url,
}),
video_id)
if not response['success']:
raise ExtractorError(
'%s said: %s' % (
self.IE_NAME,
self._ERRORS.get(response['errorMessage'], response['errorMessage'])),
expected=True)
info = response['binding'][0]
formats = []
parsed_video_url = compat_urllib_parse_urlparse(compat_parse_qs(
compat_urllib_parse_urlparse(info['EmbededURL']).query)['videoUrl'][0])
for rendition in info['Renditions']:
if rendition['RenditionType'] == 'aac' or rendition['RenditionType'] == 'm3u8':
continue
else:
rendition_url = compat_urlparse.urlunparse(parsed_video_url._replace(path=replace_extension(parsed_video_url.path.replace('//', '/%s/' % rendition['ID']), rendition['RenditionType'])))
quality = self._QUALITIES.get(rendition['ID'], {})
formats.append({
'format_id': '%s-%d' % (rendition['RenditionType'], rendition['ID']),
'url': rendition_url,
'width': quality.get('width'),
'height': quality.get('height'),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info['Title'],
'thumbnail': info.get('ThumbURL'),
'duration': parse_duration(info.get('Duration')),
'formats': formats,
}
video_id = self._match_id(url)
return self.url_result('aol-video:%s' % video_id)

View File

@@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
try_get,
unified_timestamp,
)
class FlipagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://flipagram.com/f/nyvTSJMKId',
'md5': '888dcf08b7ea671381f00fab74692755',
'info_dict': {
'id': 'nyvTSJMKId',
'ext': 'mp4',
'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
'duration': 35.571,
'timestamp': 1461244995,
'upload_date': '20160421',
'uploader': 'kitty juria',
'uploader_id': 'sjuria101',
'creator': 'kitty juria',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
'comments': list,
'formats': 'mincount:2',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(
self._search_regex(
r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
video_id)
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, default={})
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')
formats = [{
'url': video['url'],
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': int_or_none(video_data.get('size')),
}]
preview_url = try_get(
flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
if preview_url:
formats.append({
'url': preview_url,
'ext': 'm4a',
'vcodec': 'none',
})
self._sort_formats(formats)
counts = flipagram.get('counts', {})
user = flipagram.get('user', {})
video_data = flipagram.get('video', {})
thumbnails = [{
'url': self._proto_relative_url(cover['url']),
'width': int_or_none(cover.get('width')),
'height': int_or_none(cover.get('height')),
'filesize': int_or_none(cover.get('size')),
} for cover in flipagram.get('covers', []) if cover.get('url')]
# Note that this only retrieves comments that are initally loaded.
# For videos with large amounts of comments, most won't be retrieved.
comments = []
for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
text = comment.get('comment')
if not text or not isinstance(text, list):
continue
comments.append({
'author': comment.get('user', {}).get('name'),
'author_id': comment.get('user', {}).get('username'),
'id': comment.get('id'),
'text': text[0],
'timestamp': unified_timestamp(comment.get('created')),
})
return {
'id': video_id,
'title': title,
'description': description,
'duration': float_or_none(flipagram.get('duration'), 1000),
'thumbnails': thumbnails,
'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
'uploader': user.get('name'),
'uploader_id': user.get('username'),
'creator': user.get('name'),
'view_count': int_or_none(counts.get('plays')),
'like_count': int_or_none(counts.get('likes')),
'repost_count': int_or_none(counts.get('reflips')),
'comment_count': int_or_none(counts.get('comments')),
'comments': comments,
'formats': formats,
}

View File

@@ -5,8 +5,8 @@ from .common import InfoExtractor
class Formula1IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?formula1\.com/content/fom-website/en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?formula1\.com/(?:content/fom-website/)?en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
_TESTS = [{
'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'md5': '8c79e54be72078b26b89e0e111c0502b',
'info_dict': {
@@ -15,7 +15,10 @@ class Formula1IE(InfoExtractor):
'title': 'Race highlights - Spain 2016',
},
'add_ie': ['Ooyala'],
}
}, {
'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)

View File

@@ -43,14 +43,14 @@ class FourTubeIE(InfoExtractor):
'uploadDate', webpage))
thumbnail = self._html_search_meta('thumbnailUrl', webpage)
uploader_id = self._html_search_regex(
r'<a class="img-avatar" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">',
r'<a class="item-to-subscribe" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">',
webpage, 'uploader id', fatal=False)
uploader = self._html_search_regex(
r'<a class="img-avatar" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">',
r'<a class="item-to-subscribe" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">',
webpage, 'uploader', fatal=False)
categories_html = self._search_regex(
r'(?s)><i class="icon icon-tag"></i>\s*Categories / Tags\s*.*?<ul class="list">(.*?)</ul>',
r'(?s)><i class="icon icon-tag"></i>\s*Categories / Tags\s*.*?<ul class="[^"]*?list[^"]*?">(.*?)</ul>',
webpage, 'categories', fatal=False)
categories = None
if categories_html:
@@ -59,10 +59,10 @@ class FourTubeIE(InfoExtractor):
r'(?s)<li><a.*?>(.*?)</a>', categories_html)]
view_count = str_to_int(self._search_regex(
r'<meta itemprop="interactionCount" content="UserPlays:([0-9,]+)">',
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([0-9,]+)">',
webpage, 'view count', fatal=False))
like_count = str_to_int(self._search_regex(
r'<meta itemprop="interactionCount" content="UserLikes:([0-9,]+)">',
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserLikes:([0-9,]+)">',
webpage, 'like count', fatal=False))
duration = parse_duration(self._html_search_meta('duration', webpage))

View File

@@ -2,7 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
from ..utils import (
smuggle_url,
update_url_query,
)
class FOXIE(InfoExtractor):
@@ -29,11 +32,12 @@ class FOXIE(InfoExtractor):
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url'] + '&switch=http'
video_id)['release_url']
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(release_url, {'force_smil_url': True}),
'url': smuggle_url(update_url_query(
release_url, {'switch': 'http'}), {'force_smil_url': True}),
'id': video_id,
}

View File

@@ -2,104 +2,56 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
)
from ..utils import (
determine_ext,
int_or_none,
ExtractorError,
unified_strdate,
)
class FranceCultureIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?franceculture\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?franceculture\.fr/emissions/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.franceculture.fr/player/reecouter?play=4795174',
'url': 'http://www.franceculture.fr/emissions/carnet-nomade/rendez-vous-au-pays-des-geeks',
'info_dict': {
'id': '4795174',
'id': 'rendez-vous-au-pays-des-geeks',
'display_id': 'rendez-vous-au-pays-des-geeks',
'ext': 'mp3',
'title': 'Rendez-vous au pays des geeks',
'alt_title': 'Carnet nomade | 13-14',
'vcodec': 'none',
'thumbnail': 're:^https?://.*\\.jpg$',
'upload_date': '20140301',
'thumbnail': r're:^http://static\.franceculture\.fr/.*/images/player/Carnet-nomade\.jpg$',
'description': 'startswith:Avec :Jean-Baptiste Péretié pour son documentaire sur Arte "La revanche',
'timestamp': 1393700400,
'vcodec': 'none',
}
}
def _extract_from_player(self, url, video_id):
webpage = self._download_webpage(url, video_id)
def _real_extract(self, url):
display_id = self._match_id(url)
video_path = self._search_regex(
r'<a id="player".*?href="([^"]+)"', webpage, 'video path')
video_url = compat_urlparse.urljoin(url, video_path)
timestamp = int_or_none(self._search_regex(
r'<a id="player".*?data-date="([0-9]+)"',
webpage = self._download_webpage(url, display_id)
video_url = self._search_regex(
r'(?s)<div[^>]+class="[^"]*?title-zone-diffusion[^"]*?"[^>]*>.*?<a[^>]+href="([^"]+)"',
webpage, 'video path')
title = self._og_search_title(webpage)
upload_date = unified_strdate(self._search_regex(
'(?s)<div[^>]+class="date"[^>]*>.*?<span[^>]+class="inner"[^>]*>([^<]+)<',
webpage, 'upload date', fatal=False))
thumbnail = self._search_regex(
r'<a id="player".*?>\s+<img src="([^"]+)"',
r'(?s)<figure[^>]+itemtype="https://schema.org/ImageObject"[^>]*>.*?<img[^>]+data-pagespeed-(?:lazy|high-res)-src="([^"]+)"',
webpage, 'thumbnail', fatal=False)
display_id = self._search_regex(
r'<span class="path-diffusion">emission-(.*?)</span>', webpage, 'display_id')
title = self._html_search_regex(
r'<span class="title-diffusion">(.*?)</span>', webpage, 'title')
alt_title = self._html_search_regex(
r'<span class="title">(.*?)</span>',
webpage, 'alt_title', fatal=False)
description = self._html_search_regex(
r'<span class="description">(.*?)</span>',
webpage, 'description', fatal=False)
uploader = self._html_search_regex(
r'(?s)<div id="emission".*?<span class="author">(.*?)</span>',
webpage, 'uploader', default=None)
vcodec = 'none' if determine_ext(video_url.lower()) == 'mp3' else None
return {
'id': video_id,
'id': display_id,
'display_id': display_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'vcodec': vcodec,
'uploader': uploader,
'timestamp': timestamp,
'title': title,
'alt_title': alt_title,
'thumbnail': thumbnail,
'description': description,
'display_id': display_id,
'upload_date': upload_date,
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_from_player(url, video_id)
class FranceCultureEmissionIE(FranceCultureIE):
_VALID_URL = r'https?://(?:www\.)?franceculture\.fr/emission-(?P<id>[^?#]+)'
_TEST = {
'url': 'http://www.franceculture.fr/emission-les-carnets-de-la-creation-jean-gabriel-periot-cineaste-2015-10-13',
'info_dict': {
'title': 'Jean-Gabriel Périot, cinéaste',
'alt_title': 'Les Carnets de la création',
'id': '5093239',
'display_id': 'les-carnets-de-la-creation-jean-gabriel-periot-cineaste-2015-10-13',
'ext': 'mp3',
'timestamp': 1444762500,
'upload_date': '20151013',
'description': 'startswith:Aujourd\'hui dans "Les carnets de la création", le cinéaste',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_path = self._html_search_regex(
r'<a class="rf-player-open".*?href="([^"]+)"', webpage, 'video path', 'no_path_player')
if video_path == 'no_path_player':
raise ExtractorError('no player : no sound in this page.', expected=True)
new_id = self._search_regex('play=(?P<id>[0-9]+)', video_path, 'new_id', group='id')
video_url = compat_urlparse.urljoin(url, video_path)
return self._extract_from_player(video_url, new_id)

View File

@@ -14,7 +14,10 @@ from ..utils import (
parse_duration,
determine_ext,
)
from .dailymotion import DailymotionCloudIE
from .dailymotion import (
DailymotionIE,
DailymotionCloudIE,
)
class FranceTVBaseInfoExtractor(InfoExtractor):
@@ -128,7 +131,7 @@ class PluzzIE(FranceTVBaseInfoExtractor):
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
_VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/(?:[^/]+/)*(?P<title>[^/?#&.]+)'
_TESTS = [{
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
@@ -188,6 +191,24 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
'params': {
'skip_download': True,
},
}, {
# Dailymotion embed
'url': 'http://www.francetvinfo.fr/politique/notre-dame-des-landes/video-sur-france-inter-cecile-duflot-denonce-le-regard-meprisant-de-patrick-cohen_1520091.html',
'md5': 'ee7f1828f25a648addc90cb2687b1f12',
'info_dict': {
'id': 'x4iiko0',
'ext': 'mp4',
'title': 'NDDL, référendum, Brexit : Cécile Duflot répond à Patrick Cohen',
'description': 'Au lendemain de la victoire du "oui" au référendum sur l\'aéroport de Notre-Dame-des-Landes, l\'ancienne ministre écologiste est l\'invitée de Patrick Cohen. Plus d\'info : https://www.franceinter.fr/emissions/le-7-9/le-7-9-27-juin-2016',
'timestamp': 1467011958,
'upload_date': '20160627',
'uploader': 'France Inter',
'uploader_id': 'x2q2ez',
},
'add_ie': ['Dailymotion'],
}, {
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -197,7 +218,13 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, 'DailymotionCloud')
return self.url_result(dmcloud_url, DailymotionCloudIE.ie_key())
dailymotion_urls = DailymotionIE._extract_urls(webpage)
if dailymotion_urls:
return self.playlist_result([
self.url_result(dailymotion_url, DailymotionIE.ie_key())
for dailymotion_url in dailymotion_urls])
video_id, catalogue = self._search_regex(
(r'id-video=([^@]+@[^"]+)',

View File

@@ -1,76 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
xpath_text,
xpath_with_ns,
)
from .youtube import YoutubeIE
class GamekingsIE(InfoExtractor):
_VALID_URL = r'https?://www\.gamekings\.nl/(?:videos|nieuws)/(?P<id>[^/]+)'
_TESTS = [{
# YouTube embed video
'url': 'http://www.gamekings.nl/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
'md5': '5208d3a17adeaef829a7861887cb9029',
'info_dict': {
'id': 'HkSQKetlGOU',
'ext': 'mp4',
'title': 'Phoenix Wright: Ace Attorney - Dual Destinies Review',
'description': 'md5:db88c0e7f47e9ea50df3271b9dc72e1d',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader_id': 'UCJugRGo4STYMeFr5RoOShtQ',
'uploader': 'Gamekings Vault',
'upload_date': '20151123',
},
'add_ie': ['Youtube'],
}, {
# vimeo video
'url': 'http://www.gamekings.nl/videos/the-legend-of-zelda-majoras-mask/',
'md5': '12bf04dfd238e70058046937657ea68d',
'info_dict': {
'id': 'the-legend-of-zelda-majoras-mask',
'ext': 'mp4',
'title': 'The Legend of Zelda: Majoras Mask',
'description': 'md5:9917825fe0e9f4057601fe1e38860de3',
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'http://www.gamekings.nl/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
playlist_id = self._search_regex(
r'gogoVideo\([^,]+,\s*"([^"]+)', webpage, 'playlist id')
# Check if a YouTube embed is used
if YoutubeIE.suitable(playlist_id):
return self.url_result(playlist_id, ie='Youtube')
playlist = self._download_xml(
'http://www.gamekings.tv/wp-content/themes/gk2010/rss_playlist.php?id=%s' % playlist_id,
video_id)
NS_MAP = {
'jwplayer': 'http://rss.jwpcdn.com/'
}
item = playlist.find('./channel/item')
thumbnail = xpath_text(item, xpath_with_ns('./jwplayer:image', NS_MAP), 'thumbnail')
video_url = item.find(xpath_with_ns('./jwplayer:source', NS_MAP)).get('file')
return {
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': thumbnail,
}

View File

@@ -28,10 +28,13 @@ class GameSpotIE(OnceIE):
'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/',
'info_dict': {
'id': 'gs-2300-6424837',
'ext': 'flv',
'title': 'The Witcher 3: Wild Hunt [Xbox ONE] - Now Playing',
'ext': 'mp4',
'title': 'Now Playing - The Witcher 3: Wild Hunt',
'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.',
},
'params': {
'skip_download': True, # m3u8 downloads
},
}]
def _real_extract(self, url):

View File

@@ -49,7 +49,10 @@ from .pornhub import PornHubIE
from .xhamster import XHamsterEmbedIE
from .tnaflix import TNAFlixNetworkEmbedIE
from .vimeo import VimeoIE
from .dailymotion import DailymotionCloudIE
from .dailymotion import (
DailymotionIE,
DailymotionCloudIE,
)
from .onionstudios import OnionStudiosIE
from .viewlift import ViewLiftEmbedIE
from .screenwavemedia import ScreenwaveMediaIE
@@ -59,6 +62,7 @@ from .videomore import VideomoreIE
from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .arkena import ArkenaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
@@ -67,6 +71,7 @@ from .vessel import VesselIE
from .kaltura import KalturaIE
from .eagleplatform import EaglePlatformIE
from .facebook import FacebookIE
from .soundcloud import SoundcloudIE
class GenericIE(InfoExtractor):
@@ -470,7 +475,7 @@ class GenericIE(InfoExtractor):
'url': 'http://www.vestifinance.ru/articles/25753',
'info_dict': {
'id': '25753',
'title': 'Вести Экономика ― Прямые трансляции с Форума-выставки "Госзаказ-2013"',
'title': 'Прямые трансляции с Форума-выставки "Госзаказ-2013"',
},
'playlist': [{
'info_dict': {
@@ -637,6 +642,8 @@ class GenericIE(InfoExtractor):
'ext': 'mp4',
'title': 'Key and Peele|October 10, 2012|2|203|Liam Neesons - Uncensored',
'description': 'Two valets share their love for movie star Liam Neesons.',
'timestamp': 1349922600,
'upload_date': '20121011',
},
},
# YouTube embed via <data-embed-url="">
@@ -778,6 +785,15 @@ class GenericIE(InfoExtractor):
'upload_date': '20141029',
}
},
# Soundcloud multiple embeds
{
'url': 'http://www.guitarplayer.com/lessons/1014/legato-workout-one-hour-to-more-fluid-performance---tab/52809',
'info_dict': {
'id': '52809',
'title': 'Guitar Essentials: Legato Workout—One-Hour to Fluid Performance | TAB + AUDIO',
},
'playlist_mincount': 7,
},
# Livestream embed
{
'url': 'http://www.esa.int/Our_Activities/Space_Science/Rosetta/Philae_comet_touch-down_webcast',
@@ -853,6 +869,7 @@ class GenericIE(InfoExtractor):
'description': 'md5:601cb790edd05908957dae8aaa866465',
'upload_date': '20150220',
},
'skip': 'All The Daily Show URLs now redirect to http://www.cc.com/shows/',
},
# jwplayer YouTube
{
@@ -1246,6 +1263,20 @@ class GenericIE(InfoExtractor):
'uploader': 'www.hudl.com',
},
},
# twitter:player:stream embed
{
'url': 'http://www.rtl.be/info/video/589263.aspx?CategoryID=288',
'info_dict': {
'id': 'master',
'ext': 'mp4',
'title': 'Une nouvelle espèce de dinosaure découverte en Argentine',
'uploader': 'www.rtl.be',
},
'params': {
# m3u8 downloads
'skip_download': True,
},
},
# twitter:player embed
{
'url': 'http://www.theatlantic.com/video/index/484130/what-do-black-holes-sound-like/',
@@ -1295,6 +1326,70 @@ class GenericIE(InfoExtractor):
'uploader': 'cylus cyrus',
},
},
{
# video stored on custom kaltura server
'url': 'http://www.expansion.com/multimedia/videos.html?media=EQcM30NHIPv',
'md5': '537617d06e64dfed891fa1593c4b30cc',
'info_dict': {
'id': '0_1iotm5bh',
'ext': 'mp4',
'title': 'Elecciones británicas: 5 lecciones para Rajoy',
'description': 'md5:435a89d68b9760b92ce67ed227055f16',
'uploader_id': 'videos.expansion@el-mundo.net',
'upload_date': '20150429',
'timestamp': 1430303472,
},
'add_ie': ['Kaltura'],
},
{
# Non-standard Vimeo embed
'url': 'https://openclassrooms.com/courses/understanding-the-web',
'md5': '64d86f1c7d369afd9a78b38cbb88d80a',
'info_dict': {
'id': '148867247',
'ext': 'mp4',
'title': 'Understanding the web - Teaser',
'description': 'This is "Understanding the web - Teaser" by openclassrooms on Vimeo, the home for high quality videos and the people who love them.',
'upload_date': '20151214',
'uploader': 'OpenClassrooms',
'uploader_id': 'openclassrooms',
},
'add_ie': ['Vimeo'],
},
{
'url': 'https://support.arkena.com/display/PLAY/Ways+to+embed+your+video',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
'info_dict': {
'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe',
'ext': 'mp4',
'title': 'Big Buck Bunny',
'description': 'Royalty free test video',
'timestamp': 1432816365,
'upload_date': '20150528',
'is_live': False,
},
'params': {
'skip_download': True,
},
'add_ie': [ArkenaIE.ie_key()],
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
# 'url': 'https://flipagram.com/f/nyvTSJMKId',
# 'md5': '888dcf08b7ea671381f00fab74692755',
# 'info_dict': {
# 'id': 'nyvTSJMKId',
# 'ext': 'mp4',
# 'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
# 'description': '#love for cats.',
# 'timestamp': 1461244995,
# 'upload_date': '20160421',
# },
# 'params': {
# 'force_generic_extractor': True,
# },
# }
]
def report_following_redirect(self, new_url):
@@ -1658,12 +1753,9 @@ class GenericIE(InfoExtractor):
if matches:
return _playlist_from_matches(matches, lambda m: m[-1])
# Look for embedded Dailymotion player
matches = re.findall(
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
matches = DailymotionIE._extract_urls(webpage)
if matches:
return _playlist_from_matches(
matches, lambda m: unescapeHTML(m[1]))
return _playlist_from_matches(matches)
# Look for embedded Dailymotion playlist player (#3822)
m = re.search(
@@ -1917,12 +2009,9 @@ class GenericIE(InfoExtractor):
return self.url_result(myvi_url)
# Look for embedded soundcloud player
mobj = re.search(
r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url)
soundcloud_urls = SoundcloudIE._extract_urls(webpage)
if soundcloud_urls:
return _playlist_from_matches(soundcloud_urls, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
# Look for embedded mtvservices player
mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
@@ -2085,6 +2174,11 @@ class GenericIE(InfoExtractor):
if digiteka_url:
return self.url_result(self._proto_relative_url(digiteka_url), DigitekaIE.ie_key())
# Look for Arkena embeds
arkena_url = ArkenaIE._extract_url(webpage)
if arkena_url:
return self.url_result(arkena_url, ArkenaIE.ie_key())
# Look for Limelight embeds
mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage)
if mobj:
@@ -2113,6 +2207,14 @@ class GenericIE(InfoExtractor):
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group(1))), 'Vine')
# Look for VODPlatform embeds
mobj = re.search(
r'<iframe[^>]+src=[\'"]((?:https?:)?//(?:www\.)?vod-platform\.net/embed/[^/?#]+)',
webpage)
if mobj is not None:
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group(1))), 'VODPlatform')
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
@@ -2137,10 +2239,18 @@ class GenericIE(InfoExtractor):
'uploader': video_uploader,
}
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser
embed_url = self._html_search_meta('twitter:player', webpage, default=None)
if embed_url:
return self.url_result(embed_url)
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit
})
info_dict.update(json_ld)
return info_dict
def check_video(vurl):
if YoutubeIE.suitable(vurl):
@@ -2185,6 +2295,9 @@ class GenericIE(InfoExtractor):
r"cinerama\.embedPlayer\(\s*\'[^']+\',\s*'([^']+)'", webpage)
if not found:
# Try to find twitter cards info
# twitter:player:stream should be checked before twitter:player since
# it is expected to contain a raw stream (see
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
found = filter_video(re.findall(
r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage))
if not found:
@@ -2218,6 +2331,15 @@ class GenericIE(InfoExtractor):
'_type': 'url',
'url': new_url,
}
if not found:
# twitter:player is a https URL to iframe player that may or may not
# be supported by youtube-dl thus this is checked the very last (see
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
embed_url = self._html_search_meta('twitter:player', webpage, default=None)
if embed_url:
return self.url_result(embed_url)
if not found:
raise UnsupportedError(url)

View File

@@ -1,48 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class GoldenMoustacheIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?goldenmoustache\.com/(?P<display_id>[\w-]+)-(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.goldenmoustache.com/suricate-le-poker-3700/',
'md5': '0f904432fa07da5054d6c8beb5efb51a',
'info_dict': {
'id': '3700',
'ext': 'mp4',
'title': 'Suricate - Le Poker',
'description': 'md5:3d1f242f44f8c8cb0a106f1fd08e5dc9',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'url': 'http://www.goldenmoustache.com/le-lab-tout-effacer-mc-fly-et-carlito-55249/',
'md5': '27f0c50fb4dd5f01dc9082fc67cd5700',
'info_dict': {
'id': '55249',
'ext': 'mp4',
'title': 'Le LAB - Tout Effacer (Mc Fly et Carlito)',
'description': 'md5:9b7fbf11023fb2250bd4b185e3de3b2a',
'thumbnail': 're:^https?://.*\.(?:png|jpg)$',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'data-src-type="mp4" data-src="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex(
r'<title>(.*?)(?: - Golden Moustache)?</title>', webpage, 'title')
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage)
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -0,0 +1,48 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
smuggle_url,
)
class HGTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hgtv\.ca/[^/]+/video/(?P<id>[^/]+)/video.html'
_TEST = {
'url': 'http://www.hgtv.ca/homefree/video/overnight-success/video.html?v=738081859718&p=1&s=da#video',
'md5': '',
'info_dict': {
'id': 'aFH__I_5FBOX',
'ext': 'mp4',
'title': 'Overnight Success',
'description': 'After weeks of hard work, high stakes, breakdowns and pep talks, the final 2 contestants compete to win the ultimate dream.',
'uploader': 'SHWM-NEW',
'timestamp': 1470320034,
'upload_date': '20160804',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_vars = self._parse_json(self._search_regex(
r'(?s)embed_vars\s*=\s*({.*?});',
webpage, 'embed vars'), display_id, js_to_json)
return {
'_type': 'url_transparent',
'url': smuggle_url(
'http://link.theplatform.com/s/dtjsEC/%s?mbr=true&manifest=m3u' % embed_vars['pid'], {
'force_smil_url': True
}),
'series': embed_vars.get('show'),
'season_number': int_or_none(embed_vars.get('season')),
'episode_number': int_or_none(embed_vars.get('episode')),
'ie_key': 'ThePlatform',
}

View File

@@ -197,6 +197,6 @@ class HRTiPlaylistIE(HRTiBaseIE):
response, lambda x: x['video_listings'][0]['alternatives'][0]['list'],
list) or [video['id'] for video in response.get('videos', []) if video.get('id')]
entries = [self.url_result('hrti:%s' % category_id) for category_id in video_ids]
entries = [self.url_result('hrti:%s' % video_id) for video_id in video_ids]
return self.playlist_result(entries, category_id, display_id)

View File

@@ -50,12 +50,10 @@ class ImgurIE(InfoExtractor):
webpage = self._download_webpage(
compat_urlparse.urljoin(url, video_id), video_id)
width = int_or_none(self._search_regex(
r'<param name="width" value="([0-9]+)"',
webpage, 'width', fatal=False))
height = int_or_none(self._search_regex(
r'<param name="height" value="([0-9]+)"',
webpage, 'height', fatal=False))
width = int_or_none(self._og_search_property(
'video:width', webpage, default=None))
height = int_or_none(self._og_search_property(
'video:height', webpage, default=None))
video_elements = self._search_regex(
r'(?s)<div class="video-elements">(.*?)</div>',

View File

@@ -36,7 +36,6 @@ class InstagramIE(InfoExtractor):
'info_dict': {
'id': 'BA-pQFBG8HZ',
'ext': 'mp4',
'uploader_id': 'britneyspears',
'title': 'Video by britneyspears',
'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1453760977,

View File

@@ -1,10 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import binascii
import hashlib
import itertools
import math
import re
import time
@@ -14,12 +12,13 @@ from ..compat import (
compat_urllib_parse_urlencode,
)
from ..utils import (
clean_html,
decode_packed_codes,
get_element_by_id,
get_element_by_attribute,
ExtractorError,
intlist_to_bytes,
ohdave_rsa_encrypt,
remove_start,
urshift,
)
@@ -166,7 +165,7 @@ class IqiyiIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
'md5': '470a6c160618577166db1a7aac5a3606',
# MD5 checksum differs on my machine and Travis CI
'info_dict': {
'id': '9c1fb1b99d192b21c559e5a1a2cb3c73',
'ext': 'mp4',
@@ -174,11 +173,11 @@ class IqiyiIE(InfoExtractor):
}
}, {
'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
'md5': 'f09f0a6a59b2da66a26bf4eda669a4cc',
'md5': '667171934041350c5de3f5015f7f1152',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb',
'ext': 'mp4',
'title': '名侦探柯南 国语版',
'title': '名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇',
},
'skip': 'Geo-restricted to China',
}, {
@@ -196,22 +195,10 @@ class IqiyiIE(InfoExtractor):
'url': 'http://www.iqiyi.com/v_19rrny4w8w.html',
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1',
'ext': 'mp4',
'title': '泰坦尼克号',
},
'playlist': [{
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1_part1',
'ext': 'f4v',
'title': '泰坦尼克号',
},
}, {
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1_part2',
'ext': 'f4v',
'title': '泰坦尼克号',
},
}],
'expected_warnings': ['Needs a VIP account for full video'],
'skip': 'Geo-restricted to China',
}, {
'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html',
'info_dict': {
@@ -224,14 +211,16 @@ class IqiyiIE(InfoExtractor):
'only_matching': True,
}]
_FORMATS_MAP = [
('1', 'h6'),
('2', 'h5'),
('3', 'h4'),
('4', 'h3'),
('5', 'h2'),
('10', 'h1'),
]
_FORMATS_MAP = {
'96': 1, # 216p, 240p
'1': 2, # 336p, 360p
'2': 3, # 480p, 504p
'21': 4, # 504p
'4': 5, # 720p
'17': 5, # 720p
'5': 6, # 1072p, 1080p
'18': 7, # 1080p
}
def _real_initialize(self):
self._login()
@@ -291,101 +280,23 @@ class IqiyiIE(InfoExtractor):
return True
@staticmethod
def _gen_sc(tvid, timestamp):
M = [1732584193, -271733879]
M.extend([~M[0], ~M[1]])
I_table = [7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21]
C_base = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8388608, 432]
def L(n, t):
if t is None:
t = 0
return trunc(((n >> 1) + (t >> 1) << 1) + (n & 1) + (t & 1))
def trunc(n):
n = n % 0x100000000
if n > 0x7fffffff:
n -= 0x100000000
return n
def transform(string, mod):
num = int(string, 16)
return (num >> 8 * (i % 4) & 255 ^ i % mod) << ((a & 3) << 3)
C = list(C_base)
o = list(M)
k = str(timestamp - 7)
for i in range(13):
a = i
C[a >> 2] |= ord(k[a]) << 8 * (a % 4)
for i in range(16):
a = i + 13
start = (i >> 2) * 8
r = '03967743b643f66763d623d637e30733'
C[a >> 2] |= transform(''.join(reversed(r[start:start + 8])), 7)
for i in range(16):
a = i + 29
start = (i >> 2) * 8
r = '7038766939776a32776a32706b337139'
C[a >> 2] |= transform(r[start:start + 8], 1)
for i in range(9):
a = i + 45
if i < len(tvid):
C[a >> 2] |= ord(tvid[i]) << 8 * (a % 4)
for a in range(64):
i = a
I = i >> 4
C_index = [i, 5 * i + 1, 3 * i + 5, 7 * i][I] % 16 + urshift(a, 6)
m = L(L(o[0], [
trunc(o[1] & o[2]) | trunc(~o[1] & o[3]),
trunc(o[3] & o[1]) | trunc(~o[3] & o[2]),
o[1] ^ o[2] ^ o[3],
o[2] ^ trunc(o[1] | ~o[3])
][I]), L(
trunc(int(abs(math.sin(i + 1)) * 4294967296)),
C[C_index] if C_index < len(C) else None))
I = I_table[4 * I + i % 4]
o = [o[3],
L(o[1], trunc(trunc(m << I) | urshift(m, 32 - I))),
o[1],
o[2]]
new_M = [L(o[0], M[0]), L(o[1], M[1]), L(o[2], M[2]), L(o[3], M[3])]
s = [new_M[a >> 3] >> (1 ^ a & 7) * 4 & 15 for a in range(32)]
return binascii.hexlify(intlist_to_bytes(s))[1::2].decode('ascii')
def get_raw_data(self, tvid, video_id):
tm = int(time.time() * 1000)
sc = self._gen_sc(tvid, tm)
key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
sc = md5_text(compat_str(tm) + key + tvid)
params = {
'platForm': 'h5',
'rate': 1,
'tvid': tvid,
'vid': video_id,
'cupid': 'qc_100001_100186',
'type': 'mp4',
'nolimit': 0,
'agenttype': 13,
'src': 'd846d0c32d664d32b6b54ea48997a589',
'src': '76f90cbd92f94a2e925d83e8ccd22cb7',
'sc': sc,
't': tm - 7,
'__jsT': None,
't': tm,
}
headers = {}
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
return self._download_json(
'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id),
video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='),
query=params, headers=headers)
query=params, headers=self.geo_verification_headers())
def _extract_playlist(self, webpage):
PAGE_SIZE = 50
@@ -435,6 +346,7 @@ class IqiyiIE(InfoExtractor):
video_id = self._search_regex(
r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
formats = []
for _ in range(5):
raw_data = self.get_raw_data(tvid, video_id)
@@ -445,16 +357,29 @@ class IqiyiIE(InfoExtractor):
data = raw_data['data']
# iQiYi sometimes returns Ads
if not isinstance(data['playInfo'], dict):
self._sleep(5, video_id)
continue
for stream in data['vidl']:
if 'm3utx' not in stream:
continue
vd = compat_str(stream['vd'])
formats.append({
'url': stream['m3utx'],
'format_id': vd,
'ext': 'mp4',
'preference': self._FORMATS_MAP.get(vd, -1),
'protocol': 'm3u8_native',
})
title = data['playInfo']['an']
break
if formats:
break
self._sleep(5, video_id)
self._sort_formats(formats)
title = (get_element_by_id('widget-videotitle', webpage) or
clean_html(get_element_by_attribute('class', 'mod-play-tit', webpage)))
return {
'id': video_id,
'title': title,
'url': data['m3u'],
'formats': formats,
}

View File

@@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
mimetype2ext,
)
@@ -28,74 +30,84 @@ class JWPlatformBaseIE(InfoExtractor):
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
video_data = jwplayer_data['playlist'][0]
entries = []
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
source_type = source.get('type') or ''
if source_type in ('application/vnd.apple.mpegurl', 'hls') or determine_ext(source_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif source_type.startswith('audio'):
formats.append({
'url': source_url,
'vcodec': 'none',
})
else:
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv',
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv',
return {
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
}
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
})
entries.append({
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
class JWPlatformIE(JWPlatformBaseIE):

View File

@@ -6,7 +6,6 @@ import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_urlparse,
compat_parse_qs,
)
@@ -15,6 +14,7 @@ from ..utils import (
ExtractorError,
int_or_none,
unsmuggle_url,
smuggle_url,
)
@@ -34,7 +34,8 @@ class KalturaIE(InfoExtractor):
)(?:/(?P<path>[^?]+))?(?:\?(?P<query>.*))?
)
'''
_API_BASE = 'http://cdnapi.kaltura.com/api_v3/index.php?'
_SERVICE_URL = 'http://cdnapi.kaltura.com'
_SERVICE_BASE = '/api_v3/index.php'
_TESTS = [
{
'url': 'kaltura:269692:1_1jc2y3e4',
@@ -61,6 +62,11 @@ class KalturaIE(InfoExtractor):
{
'url': 'https://cdnapisec.kaltura.com/html5/html5lib/v2.30.2/mwEmbedFrame.php/p/1337/uiconf_id/20540612/entry_id/1_sf5ovm7u?wid=_243342',
'only_matching': True,
},
{
# video with subtitles
'url': 'kaltura:111032:1_cw786r8q',
'only_matching': True,
}
]
@@ -88,18 +94,26 @@ class KalturaIE(InfoExtractor):
(?P<q3>["\'])(?P<id>.+?)(?P=q3)
''', webpage))
if mobj:
return 'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict()
embed_info = mobj.groupdict()
url = 'kaltura:%(partner_id)s:%(id)s' % embed_info
escaped_pid = re.escape(embed_info['partner_id'])
service_url = re.search(
r'<script[^>]+src=["\']((?:https?:)?//.+?)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
webpage)
if service_url:
url = smuggle_url(url, {'service_url': service_url.group(1)})
return url
def _kaltura_api_call(self, video_id, actions, *args, **kwargs):
def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
params = actions[0]
if len(actions) > 1:
for i, a in enumerate(actions[1:], start=1):
for k, v in a.items():
params['%d:%s' % (i, k)] = v
query = compat_urllib_parse_urlencode(params)
url = self._API_BASE + query
data = self._download_json(url, video_id, *args, **kwargs)
data = self._download_json(
(service_url or self._SERVICE_URL) + self._SERVICE_BASE,
video_id, query=params, *args, **kwargs)
status = data if len(actions) == 1 else data[0]
if status.get('objectType') == 'KalturaAPIException':
@@ -108,7 +122,7 @@ class KalturaIE(InfoExtractor):
return data
def _get_kaltura_signature(self, video_id, partner_id):
def _get_kaltura_signature(self, video_id, partner_id, service_url=None):
actions = [{
'apiVersion': '3.1',
'expiry': 86400,
@@ -118,10 +132,9 @@ class KalturaIE(InfoExtractor):
'widgetId': '_%s' % partner_id,
}]
return self._kaltura_api_call(
video_id, actions, note='Downloading Kaltura signature')['ks']
video_id, actions, service_url, note='Downloading Kaltura signature')['ks']
def _get_video_info(self, video_id, partner_id):
signature = self._get_kaltura_signature(video_id, partner_id)
def _get_video_info(self, video_id, partner_id, service_url=None):
actions = [
{
'action': 'null',
@@ -129,22 +142,34 @@ class KalturaIE(InfoExtractor):
'clientTag': 'kdp:v3.8.5',
'format': 1, # JSON, 2 = XML, 3 = PHP
'service': 'multirequest',
'ks': signature,
},
{
'expiry': 86400,
'service': 'session',
'action': 'startWidgetSession',
'widgetId': '_%s' % partner_id,
},
{
'action': 'get',
'entryId': video_id,
'service': 'baseentry',
'version': '-1',
'ks': '{1:result:ks}',
},
{
'action': 'getbyentryid',
'entryId': video_id,
'service': 'flavorAsset',
'ks': '{1:result:ks}',
},
{
'action': 'list',
'filter:entryIdEqual': video_id,
'service': 'caption_captionasset',
'ks': '{1:result:ks}',
},
]
return self._kaltura_api_call(
video_id, actions, note='Downloading video info JSON')
video_id, actions, service_url, note='Downloading video info JSON')
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
@@ -152,8 +177,9 @@ class KalturaIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
partner_id, entry_id = mobj.group('partner_id', 'id')
ks = None
captions = None
if partner_id and entry_id:
info, flavor_assets = self._get_video_info(entry_id, partner_id)
_, info, flavor_assets, captions = self._get_video_info(entry_id, partner_id, smuggled_data.get('service_url'))
else:
path, query = mobj.group('path', 'query')
if not path and not query:
@@ -172,7 +198,7 @@ class KalturaIE(InfoExtractor):
raise ExtractorError('Invalid URL', expected=True)
if 'entry_id' in params:
entry_id = params['entry_id'][0]
info, flavor_assets = self._get_video_info(entry_id, partner_id)
_, info, flavor_assets, captions = self._get_video_info(entry_id, partner_id)
elif 'uiconf_id' in params and 'flashvars[referenceId]' in params:
reference_id = params['flashvars[referenceId]'][0]
webpage = self._download_webpage(url, reference_id)
@@ -201,12 +227,17 @@ class KalturaIE(InfoExtractor):
unsigned_url += '?referrer=%s' % referrer
return unsigned_url
data_url = info['dataUrl']
if '/flvclipper/' in data_url:
data_url = re.sub(r'/flvclipper/.*', '/serveFlavor', data_url)
formats = []
for f in flavor_assets:
# Continue if asset is not ready
if f['status'] != 2:
if f.get('status') != 2:
continue
video_url = sign_url('%s/flavorId/%s' % (info['dataUrl'], f['id']))
video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
formats.append({
'format_id': '%(fileExt)s-%(bitrate)s' % f,
'ext': f.get('fileExt'),
@@ -219,17 +250,31 @@ class KalturaIE(InfoExtractor):
'width': int_or_none(f.get('width')),
'url': video_url,
})
m3u8_url = sign_url(info['dataUrl'].replace('format/url', 'format/applehttp'))
formats.extend(self._extract_m3u8_formats(
m3u8_url, entry_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
if '/playManifest/' in data_url:
m3u8_url = sign_url(data_url.replace(
'format/url', 'format/applehttp'))
formats.extend(self._extract_m3u8_formats(
m3u8_url, entry_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._check_formats(formats, entry_id)
self._sort_formats(formats)
subtitles = {}
if captions:
for caption in captions.get('objects', []):
# Continue if caption is not ready
if f.get('status') != 2:
continue
subtitles.setdefault(caption.get('languageCode') or caption.get('language'), []).append({
'url': '%s/api_v3/service/caption_captionasset/action/serve/captionAssetId/%s' % (self._SERVICE_URL, caption['id']),
'ext': caption.get('fileExt'),
})
return {
'id': entry_id,
'title': info['name'],
'formats': formats,
'subtitles': subtitles,
'description': clean_html(info.get('description')),
'thumbnail': info.get('thumbnailUrl'),
'duration': info.get('duration'),

View File

@@ -0,0 +1,71 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
qualities,
)
class KamcordIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
'info_dict': {
'id': 'hNYRduDgWb4',
'ext': 'mp4',
'title': 'Drinking Madness',
'uploader': 'jacksfilms',
'uploader_id': '3044562',
'view_count': int,
'like_count': int,
'comment_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video = self._parse_json(
self._search_regex(
r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
webpage, 'video'),
video_id)['video']
title = video['title']
formats = self._extract_m3u8_formats(
video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
self._sort_formats(formats)
uploader = video.get('user', {}).get('username')
uploader_id = video.get('user', {}).get('id')
view_count = int_or_none(video.get('viewCount'))
like_count = int_or_none(video.get('heartCount'))
comment_count = int_or_none(video.get('messageCount'))
preference_key = qualities(('small', 'medium', 'large'))
thumbnails = [{
'url': thumbnail_url,
'id': thumbnail_id,
'preference': preference_key(thumbnail_id),
} for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
return {
'id': video_id,
'title': title,
'uploader': uploader,
'uploader_id': uploader_id,
'view_count': view_count,
'like_count': like_count,
'comment_count': comment_count,
'thumbnails': thumbnails,
'formats': formats,
}

View File

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
get_element_by_id,
clean_html,
@@ -26,11 +27,6 @@ class KuwoBaseIE(InfoExtractor):
def _get_formats(self, song_id, tolerate_ip_deny=False):
formats = []
for file_format in self._FORMATS:
headers = {}
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
query = {
'format': file_format['ext'],
'br': file_format.get('br', ''),
@@ -42,7 +38,7 @@ class KuwoBaseIE(InfoExtractor):
song_url = self._download_webpage(
'http://antiserver.kuwo.cn/anti.s',
song_id, note='Download %s url info' % file_format['format'],
query=query, headers=headers,
query=query, headers=self.geo_verification_headers(),
)
if song_url == 'IPDeny' and not tolerate_ip_deny:
@@ -247,8 +243,9 @@ class KuwoSingerIE(InfoExtractor):
query={'artistId': artist_id, 'pn': page_num, 'rn': self.PAGE_SIZE})
return [
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<div[^>]+class="name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)',
self.url_result(compat_urlparse.urljoin(url, song_url), 'Kuwo')
for song_url in re.findall(
r'<div[^>]+class="name"><a[^>]+href="(/yinyue/\d+)',
webpage)
]

View File

@@ -3,8 +3,8 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
smuggle_url,
)
@@ -18,13 +18,16 @@ class LA7IE(InfoExtractor):
_TESTS = [{
# 'src' is a plain URL
'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722',
'md5': '6054674766e7988d3e02f2148ff92180',
'md5': '8b613ffc0c4bf9b9e377169fc19c214c',
'info_dict': {
'id': 'inccool8-02-10-2015-163722',
'ext': 'mp4',
'title': 'Inc.Cool8',
'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto atletico',
'thumbnail': 're:^https?://.*',
'uploader_id': 'kdla7pillole@iltrovatore.it',
'timestamp': 1443814869,
'upload_date': '20151002',
},
}, {
# 'src' is a dictionary
@@ -49,26 +52,14 @@ class LA7IE(InfoExtractor):
self._search_regex(r'videoLa7\(({[^;]+})\);', webpage, 'player data'),
video_id, transform_source=js_to_json)
source = player_data['src']
source_urls = source.values() if isinstance(source, dict) else [source]
formats = []
for source_url in source_urls:
ext = determine_ext(source_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls'))
else:
formats.append({
'url': source_url,
})
self._sort_formats(formats)
return {
'_type': 'url_transparent',
'url': smuggle_url('kaltura:103:%s' % player_data['vid'], {
'service_url': 'http://kdam.iltrovatore.it',
}),
'id': video_id,
'title': player_data['title'],
'description': self._og_search_description(webpage, default=None),
'thumbnail': player_data.get('poster'),
'formats': formats,
'ie_key': 'Kaltura',
}

View File

@@ -0,0 +1,90 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .arkena import ArkenaIE
class LcpPlayIE(ArkenaIE):
_VALID_URL = r'https?://play\.lcp\.fr/embed/(?P<id>[^/]+)/(?P<account_id>[^/]+)/[^/]+/[^/]+'
_TESTS = [{
'url': 'http://play.lcp.fr/embed/327336/131064/darkmatter/0',
'md5': 'b8bd9298542929c06c1c15788b1f277a',
'info_dict': {
'id': '327336',
'ext': 'mp4',
'title': '327336',
'timestamp': 1456391602,
'upload_date': '20160225',
},
'params': {
'skip_download': True,
},
}]
class LcpIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lcp\.fr/(?:[^/]+/)*(?P<id>[^/]+)'
_TESTS = [{
# arkena embed
'url': 'http://www.lcp.fr/la-politique-en-video/schwartzenberg-prg-preconise-francois-hollande-de-participer-une-primaire',
'md5': 'b8bd9298542929c06c1c15788b1f277a',
'info_dict': {
'id': 'd56d03e9',
'ext': 'mp4',
'title': 'Schwartzenberg (PRG) préconise à François Hollande de participer à une primaire à gauche',
'description': 'md5:96ad55009548da9dea19f4120c6c16a8',
'timestamp': 1456488895,
'upload_date': '20160226',
},
'params': {
'skip_download': True,
},
}, {
# dailymotion live stream
'url': 'http://www.lcp.fr/le-direct',
'info_dict': {
'id': 'xji3qy',
'ext': 'mp4',
'title': 'La Chaine Parlementaire (LCP), Live TNT',
'description': 'md5:5c69593f2de0f38bd9a949f2c95e870b',
'uploader': 'LCP',
'uploader_id': 'xbz33d',
'timestamp': 1308923058,
'upload_date': '20110624',
},
'params': {
# m3u8 live stream
'skip_download': True,
},
}, {
'url': 'http://www.lcp.fr/emissions/277792-les-volontaires',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
play_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>%s?(?:(?!\1).)*)\1' % LcpPlayIE._VALID_URL,
webpage, 'play iframe', default=None, group='url')
if not play_url:
return self.url_result(url, 'Generic')
title = self._og_search_title(webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, fatal=True)
description = self._html_search_meta(
('description', 'twitter:description'), webpage)
return {
'_type': 'url_transparent',
'ie_key': LcpPlayIE.ie_key(),
'url': play_url,
'display_id': display_id,
'title': title,
'description': description,
}

View File

@@ -20,10 +20,10 @@ from ..utils import (
int_or_none,
orderedSet,
parse_iso8601,
sanitized_Request,
str_or_none,
url_basename,
urshift,
update_url_query,
)
@@ -90,6 +90,10 @@ class LeIE(InfoExtractor):
_loc3_ = self.ror(_loc3_, _loc2_ % 17)
return _loc3_
# reversed from http://jstatic.letvcdn.com/sdk/player.js
def get_mms_key(self, time):
return self.ror(time, 8) ^ 185025305
# see M3U8Encryption class in KLetvPlayer.swf
@staticmethod
def decrypt_m3u8(encrypted_data):
@@ -110,28 +114,7 @@ class LeIE(InfoExtractor):
return bytes(_loc7_)
def _real_extract(self, url):
media_id = self._match_id(url)
page = self._download_webpage(url, media_id)
params = {
'id': media_id,
'platid': 1,
'splatid': 101,
'format': 1,
'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.le.com'
}
play_json_req = sanitized_Request(
'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse_urlencode(params)
)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
play_json = self._download_json(
play_json_req,
media_id, 'Downloading playJson data')
def _check_errors(self, play_json):
# Check for errors
playstatus = play_json['playstatus']
if playstatus['status'] == 0:
@@ -142,43 +125,99 @@ class LeIE(InfoExtractor):
msg = 'Generic error. flag = %d' % flag
raise ExtractorError(msg, expected=True)
playurl = play_json['playurl']
def _real_extract(self, url):
media_id = self._match_id(url)
page = self._download_webpage(url, media_id)
formats = ['350', '1000', '1300', '720p', '1080p']
dispatch = playurl['dispatch']
play_json_h5 = self._download_json(
'http://api.le.com/mms/out/video/playJsonH5',
media_id, 'Downloading html5 playJson data', query={
'id': media_id,
'platid': 3,
'splatid': 304,
'format': 1,
'tkey': self.get_mms_key(int(time.time())),
'domain': 'www.le.com',
'tss': 'no',
},
headers=self.geo_verification_headers())
self._check_errors(play_json_h5)
urls = []
for format_id in formats:
if format_id in dispatch:
media_url = playurl['domain'][0] + dispatch[format_id][0]
media_url += '&' + compat_urllib_parse_urlencode({
'm3v': 1,
play_json_flash = self._download_json(
'http://api.le.com/mms/out/video/playJson',
media_id, 'Downloading flash playJson data', query={
'id': media_id,
'platid': 1,
'splatid': 101,
'format': 1,
'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.le.com',
},
headers=self.geo_verification_headers())
self._check_errors(play_json_flash)
def get_h5_urls(media_url, format_id):
location = self._download_json(
media_url, media_id,
'Download JSON metadata for format %s' % format_id, query={
'format': 1,
'expect': 3,
'rateid': format_id,
})
'tss': 'no',
})['location']
nodes_data = self._download_json(
media_url, media_id,
'Download JSON metadata for format %s' % format_id)
return {
'http': update_url_query(location, {'tss': 'no'}),
'hls': update_url_query(location, {'tss': 'ios'}),
}
req = self._request_webpage(
nodes_data['nodelist'][0]['location'], media_id,
note='Downloading m3u8 information for format %s' % format_id)
def get_flash_urls(media_url, format_id):
media_url += '&' + compat_urllib_parse_urlencode({
'm3v': 1,
'format': 1,
'expect': 3,
'rateid': format_id,
})
m3u8_data = self.decrypt_m3u8(req.read())
nodes_data = self._download_json(
media_url, media_id,
'Download JSON metadata for format %s' % format_id)
url_info_dict = {
'url': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
'ext': determine_ext(dispatch[format_id][1]),
'format_id': format_id,
'protocol': 'm3u8',
}
req = self._request_webpage(
nodes_data['nodelist'][0]['location'], media_id,
note='Downloading m3u8 information for format %s' % format_id)
if format_id[-1:] == 'p':
url_info_dict['height'] = int_or_none(format_id[:-1])
m3u8_data = self.decrypt_m3u8(req.read())
urls.append(url_info_dict)
return {
'hls': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
}
extracted_formats = []
formats = []
for play_json, get_urls in ((play_json_h5, get_h5_urls), (play_json_flash, get_flash_urls)):
playurl = play_json['playurl']
play_domain = playurl['domain'][0]
for format_id, format_data in playurl.get('dispatch', []).items():
if format_id in extracted_formats:
continue
extracted_formats.append(format_id)
media_url = play_domain + format_data[0]
for protocol, format_url in get_urls(media_url, format_id).items():
f = {
'url': format_url,
'ext': determine_ext(format_data[1]),
'format_id': '%s-%s' % (protocol, format_id),
'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
'quality': int_or_none(format_id),
}
if format_id[-1:] == 'p':
f['height'] = int_or_none(format_id[:-1])
formats.append(f)
self._sort_formats(formats, ('height', 'quality', 'format_id'))
publish_time = parse_iso8601(self._html_search_regex(
r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None),
@@ -187,7 +226,7 @@ class LeIE(InfoExtractor):
return {
'id': media_id,
'formats': urls,
'formats': formats,
'title': playurl['title'],
'thumbnail': playurl['pic'],
'description': description,

View File

@@ -37,11 +37,12 @@ class LimelightBaseIE(InfoExtractor):
for stream in streams:
stream_url = stream.get('url')
if not stream_url:
if not stream_url or stream.get('drmProtected'):
continue
if '.f4m' in stream_url:
ext = determine_ext(stream_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url, video_id, fatal=False))
stream_url, video_id, f4m_id='hds', fatal=False))
else:
fmt = {
'url': stream_url,
@@ -50,13 +51,19 @@ class LimelightBaseIE(InfoExtractor):
'fps': float_or_none(stream.get('videoFrameRate')),
'width': int_or_none(stream.get('videoWidthInPixels')),
'height': int_or_none(stream.get('videoHeightInPixels')),
'ext': determine_ext(stream_url)
'ext': ext,
}
rtmp = re.search(r'^(?P<url>rtmpe?://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
if rtmp:
format_id = 'rtmp'
if stream.get('videoBitRate'):
format_id += '-%d' % int_or_none(stream['videoBitRate'])
http_fmt = fmt.copy()
http_fmt.update({
'url': 'http://%s/%s' % (rtmp.group('host').replace('csl.', 'cpl.'), rtmp.group('playpath')[4:]),
'format_id': format_id.replace('rtmp', 'http'),
})
formats.append(http_fmt)
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
@@ -68,18 +75,23 @@ class LimelightBaseIE(InfoExtractor):
for mobile_url in mobile_urls:
media_url = mobile_url.get('mobileUrl')
if not media_url:
continue
format_id = mobile_url.get('targetMediaPlatform')
if determine_ext(media_url) == 'm3u8':
if not media_url or format_id == 'Widevine':
continue
ext = determine_ext(media_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url, video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
'url': media_url,
'format_id': format_id,
'preference': -1,
'ext': ext,
})
self._sort_formats(formats)
@@ -145,7 +157,7 @@ class LimelightMediaIE(LimelightBaseIE):
'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
'info_dict': {
'id': '3ffd040b522b4485b6d84effc750cd86',
'ext': 'flv',
'ext': 'mp4',
'title': 'HaP and the HB Prince Trailer',
'description': 'md5:8005b944181778e313d95c1237ddb640',
'thumbnail': 're:^https?://.*\.jpeg$',
@@ -154,27 +166,23 @@ class LimelightMediaIE(LimelightBaseIE):
'upload_date': '20090604',
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
}, {
# video with subtitles
'url': 'limelight:media:a3e00274d4564ec4a9b29b9466432335',
'md5': '2fa3bad9ac321e23860ca23bc2c69e3d',
'info_dict': {
'id': 'a3e00274d4564ec4a9b29b9466432335',
'ext': 'flv',
'ext': 'mp4',
'title': '3Play Media Overview Video',
'description': '',
'thumbnail': 're:^https?://.*\.jpeg$',
'duration': 78.101,
'timestamp': 1338929955,
'upload_date': '20120605',
'subtitles': 'mincount:9',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True,

Some files were not shown because too many files have changed in this diff Show More