Compare commits

..

125 Commits

Author SHA1 Message Date
Sergey M․
b1ce2ba197 release 2016.08.10 2016-08-10 00:20:44 +07:00
Sergey M․
5c8411e968 [ChangeLog] Actualize 2016-08-10 00:18:28 +07:00
Sergey M․
cc9c8ce5df [devscripts/prepare_manpage] Fix description strings starting with dash (Closes #10273) 2016-08-09 22:24:58 +07:00
Remita Amine
20ef4123b9 [uol] remove unused import 2016-08-09 15:13:15 +01:00
Remita Amine
4e62d26aa2 [uol] Add new extractor(#4263) 2016-08-09 15:09:08 +01:00
Sergey M․
b657816684 Credit @singh-pratyush96 for #10223 2016-08-09 04:04:45 +07:00
Sergey M․
9778b3e7ee Credit @zvonicek for #10242 and #10253 2016-08-09 04:03:52 +07:00
Sergey M․
25dd58ca6a [metadatafromtitle] Remove unused exception class 2016-08-09 04:01:05 +07:00
nyorain
5e42f8a0ad Make --metadata-from-title non fatal
Output a warning if the metadata can't be parsed from the title (and don't write any metadata) instead of raising a critical error.
2016-08-09 03:56:22 +07:00
Sergey M․
1ad6b891b2 Add more checks for --min/max-sleep-interval arguments and use more idiomatic naming 2016-08-09 03:47:56 +07:00
Sergey M․
7aa589a5e1 Fix --min/max-sleep-interval wording 2016-08-09 03:46:52 +07:00
singh-pratyush96
065bc35489 Add --max-sleep-interval (Closes #9930) 2016-08-09 03:32:42 +07:00
Sergey M․
3a380766d1 [rbmaradio] Improve, simplify and extract all formats (Closes #10242) 2016-08-09 02:46:29 +07:00
Petr Zvoníček
affaea0688 [rbmaradio] Fixed extractor 2016-08-09 02:18:33 +07:00
Sergey M․
77426a087b [sonyliv] Improve (Closes #10258) 2016-08-09 02:16:28 +07:00
Sukhbir Singh
8991844ea2 [sonyliv] Add new extractor 2016-08-09 02:09:13 +07:00
Sergey M․
082395d0a0 [extractor/generic] Add proper default to _search_json_ld call 2016-08-08 22:48:33 +07:00
Sergey M․
e8ed7354e6 [flipagram] Add proper default to _search_json_ld call 2016-08-08 22:46:19 +07:00
Sergey M․
1e7f602e2a [condenast] Make _search_json_ld call non fatal 2016-08-08 22:45:49 +07:00
Sergey M․
522f6c066d [bbc] Add proper default to _search_json_ld call 2016-08-08 22:44:36 +07:00
Sergey M․
321b5e082a [extractor/common] Respect default in _search_json_ld 2016-08-08 22:36:18 +07:00
Sergey M․
3711fa1eb2 Revert "[flipagram] Make _search_json_ld non fatal"
This reverts commit d34995a9e3.
2016-08-08 21:49:45 +07:00
Sergey M․
395c74615c Revert "[extractor/generic] Make _search_json_ld non fatal"
This reverts commit 958849275f.
2016-08-08 21:49:27 +07:00
Yen Chi Hsuan
3dc240e8c6 [sohu] Update _TESTS (closes #10260) 2016-08-08 18:48:21 +08:00
Yen Chi Hsuan
a41a6c5094 [chaturbate] Skip the invalid test 2016-08-08 13:06:02 +08:00
Yen Chi Hsuan
d71207121d [biqle] Skip an invalid test 2016-08-08 12:59:55 +08:00
Yen Chi Hsuan
b1c6f21c74 [aparat] Fix extraction 2016-08-08 12:59:07 +08:00
Yen Chi Hsuan
412abb8760 [bilibili] Update _TESTS 2016-08-08 12:57:17 +08:00
Yen Chi Hsuan
f17d5f6d14 [features.aol.com] Fix _TESTS 2016-08-08 12:52:36 +08:00
Remita Amine
6bb801cfaf [cwtv] extract http formats 2016-08-07 22:58:12 +01:00
Sergey M․
de02d1f4e9 [rozhlas] Fix regexes and improve extraction (Closes #10253) 2016-08-08 04:58:02 +07:00
Petr Zvoníček
e1f93a0a76 [rozhlas] Add new extractor 2016-08-08 04:41:45 +07:00
Charlie Le
d21a661bb4 [README.md] Update Options Link
The link references a bad anchor. The updated link now references the correct anchor.
2016-08-08 03:46:42 +07:00
Yen Chi Hsuan
b2bd968f4b [kuwo:singer] Fix extraction 2016-08-07 22:59:34 +08:00
Sergey M․
4a01befb34 release 2016.08.07 2016-08-07 21:12:41 +07:00
Sergey M․
845dfcdc40 [ChangeLog] Actualize 2016-08-07 21:10:48 +07:00
Sergey M․
d92cb46305 [discoverygo] Add extractor (Closes #10245) 2016-08-07 20:57:05 +07:00
Sergey M․
a8795327ca [utils] Add support TV Parental Guidelines ratings in parse_age_limit 2016-08-07 20:45:18 +07:00
Sergey M․
d34995a9e3 [flipagram] Make _search_json_ld non fatal 2016-08-07 19:06:55 +07:00
Sergey M․
958849275f [extractor/generic] Make _search_json_ld non fatal 2016-08-07 19:04:22 +07:00
Sergey M․
998f094452 [bbc] Remove proxy from test 2016-08-07 18:13:05 +07:00
Sergey M․
aaa42cf0cf [bbc] PEP 8 2016-08-07 18:05:13 +07:00
Sergey M․
9fb64c04cd [bbc] Add support for morph embeds (Closes #10239) 2016-08-07 18:01:50 +07:00
Remita Amine
f9622868e7 [bbc] preserve format_id backward compatibility 2016-08-07 11:14:15 +01:00
Remita Amine
37768f9242 [common] correctly lower the preference of m3u8 master manifest format 2016-08-07 10:59:09 +01:00
Sergey M․
a1aadd09a4 [tnaflixnetworkbase] Improve title extraction 2016-08-07 16:00:09 +07:00
Sergey M․
b47a75017b [tnaflix] Fix metadata extraction (Closes #10249) 2016-08-07 16:00:03 +07:00
Remita Amine
e37b54b140 [fox] fix theplatform release url query 2016-08-06 20:53:39 +01:00
Yen Chi Hsuan
c1decda58c [openload] Fix extraction (closes #9706) 2016-08-07 02:44:15 +08:00
Yen Chi Hsuan
d3f8e038fe [utils] Add decode_png for openload (#9706) 2016-08-07 02:42:58 +08:00
Remita Amine
ad152e2d95 [bbc] fix test 2016-08-06 19:36:12 +01:00
Remita Amine
b0af12154e [bbc] reduce requests and improve format_id 2016-08-06 19:24:59 +01:00
Remita Amine
d16b3c6677 [common] extract partOfTVSeries info in json-ld 2016-08-06 18:58:38 +01:00
Remita Amine
c57244cdb1 [common] lower the preference of m3u8 master manifest format 2016-08-06 18:55:05 +01:00
Remita Amine
a7e5f27412 [bbc] improve extraction
- extract f4m and dash formats
- improve format sorting and listing
- improve extraction of articles with `otherSettings.playlist`
2016-08-06 18:48:09 +01:00
Remita Amine
089a40955c [pokemon] improve _VALID_URL 2016-08-06 12:08:14 +01:00
Remita Amine
d73ebac100 [pokemon] Add new extractor(closes #10093) 2016-08-06 11:18:14 +01:00
Remita Amine
e563c0d73b [condenast] fallback to loader.js if video.js fail 2016-08-05 21:01:16 +01:00
Sergey M․
491c42e690 release 2016.08.06 2016-08-06 01:23:48 +07:00
Sergey M․
7f2339c617 [ChangeLog] Actualize 2016-08-06 01:19:47 +07:00
Sergey M․
8122e79fef [gamekings] Remove remnants 2016-08-06 00:12:37 +07:00
Sergey M․
fe3ad1d456 [adultswim] Remove superfluous md5 from test 2016-08-06 00:02:05 +07:00
Sergey M․
038a5e1a65 [adultswim] Add support for trailers (Closes #10235) 2016-08-06 00:00:05 +07:00
Sergey M․
84bc23b41b [archiveorg] PEP 8 2016-08-05 23:16:19 +07:00
Sergey M․
46933a15d6 [extractor/common] Support root JSON-LD lists (Closes #10203) 2016-08-05 23:14:32 +07:00
Sergey M․
3859ebeee6 [tvplay] Capture and output native error message 2016-08-05 22:50:42 +07:00
Remita Amine
d50aca41f8 [archiveorg] improve format extraction(closes #10219) 2016-08-05 16:42:15 +01:00
Remita Amine
0ca057b965 [jwplatform] add support for playlist extraction and relative urls and improve audio detection 2016-08-05 16:42:15 +01:00
Sergey M․
5ca968d0a6 [tvplay] Extract series metadata 2016-08-05 22:37:38 +07:00
Sergey M․
f0d31c624e [tvplay] Add support for subtitles (Closes #10194) 2016-08-05 22:17:32 +07:00
Remita Amine
08c655906c [5min] fix _VALID_URL(closes #10228) 2016-08-05 10:22:33 +01:00
Remita Amine
5a993e1692 [natgeo] fix tests(closes #10229) 2016-08-05 10:13:26 +01:00
Remita Amine
a7d2953073 [extractors] add tvp:embed import 2016-08-05 10:11:59 +01:00
Remita Amine
fdd0b8f8e0 [tvp] extract video id from the webpage(fixes #7799) 2016-08-05 09:44:15 +01:00
Remita Amine
f65dc41b72 [naver] extract upload date 2016-08-05 08:12:25 +01:00
Yen Chi Hsuan
962250f7ea [cbslocal] Fix timestamp parsing (closes #10213) 2016-08-05 11:44:50 +08:00
Yen Chi Hsuan
7dc2a74e0a [utils] Fix unified_timestamp for formats parsed by parsedate_tz() 2016-08-05 11:41:55 +08:00
Remita Amine
b02b960c6b [naver] improve extraction(closes #8096) 2016-08-04 21:42:22 +01:00
Remita Amine
4f427c4be8 [condenast] improve extraction 2016-08-04 18:30:56 +01:00
Sergey M․
8a00ea567b [natgeo:episodeguide] Do not shadow url from outer scope 2016-08-04 23:21:04 +07:00
Remita Amine
8895be01fc [5min] fix _VALID_URL 2016-08-04 16:55:12 +01:00
Remita Amine
52e7fcfeb7 [engadget] Relax _VALID_URL 2016-08-04 16:34:47 +01:00
Remita Amine
2396062c74 [5min] delegate extraction to AolIE
recently the 5min SenseHandler request return
HTTP Error 503: Service Unavailable error
2016-08-04 16:21:27 +01:00
Remita Amine
14704aeff6 [kaltura] remove debugging line 2016-08-04 14:54:34 +01:00
Remita Amine
3c2c3af059 [extractors] change imports for national geographic extractors 2016-08-04 12:20:56 +01:00
Remita Amine
1891ea2d76 [nationalgeographic] Add support for National Geographic Episode Guide 2016-08-04 12:18:10 +01:00
Remita Amine
1094074c04 [kaltura] extract subtitles and reduce requests 2016-08-04 09:39:06 +01:00
Remita Amine
217d5ae013 [vodplatform] Add new extractor 2016-08-04 09:39:06 +01:00
Remita Amine
8b40854529 [common] lower proto_preference of rtsp formats
Most of the time the RtspFD fail to download videos but it report
success of the download with this output:
[mpv] 0 bytes
[download] 100% of 0.00B
2016-08-04 09:39:06 +01:00
Yen Chi Hsuan
6bb0fbf9fb Revert "[README.md] Use full paths for all configuration files (#8863)"
This reverts commit 899d2bea63.
2016-08-04 09:54:28 +08:00
Sergey M․
8d3b226b83 [gamekings] Remove extractor
Now covered by generic jwplayer
2016-08-03 22:06:10 +07:00
Remita Amine
42b7a5afe0 [limelight] extract http formats 2016-08-03 13:12:51 +01:00
Yen Chi Hsuan
899d2bea63 [README.md] Use full paths for all configuration files (#8863) 2016-08-03 11:15:27 +08:00
Sergey M․
9cb0e65d7e [ntvru] Fix extraction 2016-08-02 22:56:48 +07:00
Sergey M․
b070564efb [extractor/common] Support multiple properties in _og_search_property 2016-08-02 22:55:14 +07:00
Philipp Hagemeister
ce28252c48 [options] Add test that checks that --password=secret is hidden in verbose output 2016-08-02 17:03:46 +02:00
Philipp Hagemeister
3aa9a73554 [options] Hide --password=secret in verbose output 2016-08-02 17:03:26 +02:00
Philipp Hagemeister
6a9b3b61ea [comedycentral] Re-add shortnames
In cc99d4f826, the shortname feature got deleted by accident. Re-add it as a separate IE.
2016-08-02 14:02:31 +02:00
Sergey M․
45408eb075 release 2016.08.01 2016-08-01 22:59:23 +07:00
Sergey M․
eafc66855d [ChangeLog] Add recent changes 2016-08-01 22:56:01 +07:00
Sergey M․
e03d3e6453 [cwtv] Add support for cwtvpr.com (Closes #10196) 2016-08-01 22:51:01 +07:00
Remita Amine
a70e45f80a [limelight] keep videos marked as previewStream
e382b953f0 (commitcomment-18472915)
2016-08-01 16:25:41 +01:00
Sergey M․
697655a7c0 [safari] Relax url regexes (Closes #10202) 2016-08-01 21:48:48 +07:00
Remita Amine
e382b953f0 [limelight] skip preview and drm protected videos 2016-08-01 00:33:30 +01:00
Yen Chi Hsuan
116e7e0d04 [bloomberg] Support BPlayer() players (closes #10187) 2016-07-31 14:47:19 +08:00
Sergey M․
cf03e34ad3 [yandexmusic:track] Fix extraction (Closes #10193) 2016-07-31 07:56:18 +07:00
Sergey M․
2903137292 release 2016.07.30 2016-07-30 14:45:07 +07:00
Sergey M․
9361f2169c [ChangeLog] Make extractor improvements' descriptions more concrete 2016-07-30 14:43:28 +07:00
Yen Chi Hsuan
35aa6c538f Add ChangeLog 2016-07-30 12:33:09 +08:00
Sergey M․
fa9f1d16b8 [dailymotion:playlist] Carry long line 2016-07-29 22:47:34 +07:00
Dave
485fedf6fd [dailymotion:playlist] Optimize download archive processing 2016-07-29 22:45:41 +07:00
Jaime Marquínez Ferrándiz
da0baba5c8 [rtve] Fix extraction for some videos
For example http://www.rtve.es/alacarta/videos/documentos-tv/documentos-tv-descredito/3574098/.
2016-07-29 17:20:27 +02:00
Jaime Marquínez Ferrándiz
bb9f3bfedf Revert "[rtve] Fix extraction (#10076)"
This reverts commit c39b2ed990.

Apparently outside of Spain using 'auth/resources' is required (#10097).
2016-07-29 17:14:04 +02:00
Sergey M․
dbc0b39b91 [tv2] Improve extraction 2016-07-29 22:01:34 +07:00
Sergey M․
481c5c5137 [tv2:article] Fix extraction (Closes #10188) 2016-07-29 21:43:17 +07:00
Sergey M․
0cacae2807 [twitch:clips] Sort formats 2016-07-29 09:01:53 +07:00
Sergey M․
d9d56deadf release 2016.07.28 2016-07-28 02:42:57 +07:00
Sergey M․
74ba450a81 [twitch:clips] Fix extraction (Closes #9767) 2016-07-28 22:30:09 +07:00
Sergey M․
db19df6ca0 [extractor/generic] Add test for #10179 2016-07-28 22:20:08 +07:00
Sergey M․
fbdf8d15d1 [soundcloud] Add _extract_urls (#10179) 2016-07-28 22:16:05 +07:00
Sergey M․
94aae01548 [extractor/generic] Extract all soundcloud embeds (Closes #10179) 2016-07-28 22:15:15 +07:00
Sergey M․
39eef54cf0 [ard:mediathek] Skip unavailable test 2016-07-28 21:38:23 +07:00
Sergey M․
05c8268c81 [shared] Modernize and make more robust 2016-07-27 23:39:02 +07:00
Sergey M․
289a16b4f3 [shared] Respect redirect URL (Closes #10170) 2016-07-27 23:28:01 +07:00
Sergey M․
7935926baa [devscripts/show-downloads-statistics] Add support for paging 2016-07-27 00:14:40 +07:00
69 changed files with 2250 additions and 911 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.26.2*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.26.2**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.10*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.10**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.07.26.2
[debug] youtube-dl version 2016.08.10
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -179,3 +179,5 @@ Jakub Adam Wieczorek
Aleksandar Topuzović
Nehal Patel
Rob van Bekkum
Petr Zvoníček
Pratyush Singh

View File

@@ -46,7 +46,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report?

351
ChangeLog Normal file
View File

@@ -0,0 +1,351 @@
version 2016.08.10
Core
* Make --metadata-from-title non fatal when title does not match the pattern
* Introduce options for randomized sleep before each download
--min-sleep-interval and --max-sleep-interval (#9930)
* Respect default in _search_json_ld
Extractors
+ [uol] Add extractor for uol.com.br (#4263)
* [rbmaradio] Fix extraction and extract all formats (#10242)
+ [sonyliv] Add extractor for sonyliv.com (#10258)
* [aparat] Fix extraction
* [cwtv] Extract HTTP formats
+ [rozhlas] Add extractor for prehravac.rozhlas.cz (#10253)
* [kuwo:singer] Fix extraction
version 2016.08.07
Core
+ Add support for TV Parental Guidelines ratings in parse_age_limit
+ Add decode_png (#9706)
+ Add support for partOfTVSeries in JSON-LD
* Lower master M3U8 manifest preference for better format sorting
Extractors
+ [discoverygo] Add extractor (#10245)
* [flipagram] Make JSON-LD extraction non fatal
* [generic] Make JSON-LD extraction non fatal
+ [bbc] Add support for morph embeds (#10239)
* [tnaflixnetworkbase] Improve title extraction
* [tnaflix] Fix metadata extraction (#10249)
* [fox] Fix theplatform release URL query
* [openload] Fix extraction (#9706)
* [bbc] Skip duplicate manifest URLs
* [bbc] Improve format code
+ [bbc] Add support for DASH and F4M
* [bbc] Improve format sorting and listing
* [bbc] Improve playlist extraction
+ [pokemon] Add extractor (#10093)
+ [condenast] Add fallback scenario for video info extraction
version 2016.08.06
Core
* Add support for JSON-LD root list entries (#10203)
* Improve unified_timestamp
* Lower preference of RTSP formats in generic sorting
+ Add support for multiple properties in _og_search_property
* Improve password hiding from verbose output
Extractors
+ [adultswim] Add support for trailers (#10235)
* [archiveorg] Improve extraction (#10219)
+ [jwplatform] Add support for playlists
+ [jwplatform] Add support for relative URLs
* [jwplatform] Improve audio detection
+ [tvplay] Capture and output native error message
+ [tvplay] Extract series metadata
+ [tvplay] Add support for subtitles (#10194)
* [tvp] Improve extraction (#7799)
* [cbslocal] Fix timestamp parsing (#10213)
+ [naver] Add support for subtitles (#8096)
* [naver] Improve extraction
* [condenast] Improve extraction
* [engadget] Relax URL regular expression
* [5min] Fix extraction
+ [nationalgeographic] Add support for Episode Guide
+ [kaltura] Add support for subtitles
* [kaltura] Optimize network requests
+ [vodplatform] Add extractor for vod-platform.net
- [gamekings] Remove extractor
* [limelight] Extract HTTP formats
* [ntvru] Fix extraction
+ [comedycentral] Re-add :tds and :thedailyshow shortnames
version 2016.08.01
Fixed/improved extractors
- [yandexmusic:track] Adapt to changes in track location JSON (#10193)
- [bloomberg] Support another form of player (#10187)
- [limelight] Skip DRM protected videos
- [safari] Relax regular expressions for URL matching (#10202)
- [cwtv] Add support for cwtvpr.com (#10196)
version 2016.07.30
Fixed/improved extractors
- [twitch:clips] Sort formats
- [tv2] Use m3u8_native
- [tv2:article] Fix video detection (#10188)
- rtve (#10076)
- [dailymotion:playlist] Optimize download archive processing (#10180)
version 2016.07.28
Fixed/improved extractors
- shared (#10170)
- soundcloud (#10179)
- twitch (#9767)
version 2016.07.26.2
Fixed/improved extractors
- smotri
- camdemy
- mtv
- comedycentral
- cmt
- cbc
- mgtv
- orf
version 2016.07.24
New extractors
- arkena (#8682)
- lcp (#8682)
Fixed/improved extractors
- facebook (#10151)
- dailymail
- telegraaf
- dcn
- onet
- tvp
Miscellaneous
- Support $Time$ in DASH manifests
version 2016.07.22
New extractors
- odatv (#9285)
Fixed/improved extractors
- bbc
- youjizz (#10131)
- youtube (#10140)
- pornhub (#10138)
- eporner (#10139)
version 2016.07.17
New extractors
- nintendo (#9986)
- streamable (#9122)
Fixed/improved extractors
- ard (#10095)
- mtv
- comedycentral (#10101)
- viki (#10098)
- spike (#10106)
Miscellaneous
- Improved twitter player detection (#10090)
version 2016.07.16
New extractors
- ninenow (#5181)
Fixed/improved extractors
- rtve (#10076)
- brightcove
- 3qsdn
- syfy (#9087, #3820, #2388)
- youtube (#10083)
Miscellaneous
- Fix subtitle embedding for video-only and audio-only files (#10081)
version 2016.07.13
New extractors
- rudo
Fixed/improved extractors
- biobiochiletv
- tvplay
- dbtv
- brightcove
- tmz
- youtube (#10059)
- shahid (#10062)
- vk
- ellentv (#10067)
version 2016.07.11
New Extractors
- roosterteeth (#9864)
Fixed/improved extractors
- miomio (#9605)
- vuclip
- youtube
- vidzi (#10058)
version 2016.07.09.2
Fixed/improved extractors
- vimeo (#1638)
- facebook (#10048)
- lynda (#10047)
- animeondemand
Fixed/improved features
- Embedding subtitles no longer throws an error with problematic inputs (#9063)
version 2016.07.09.1
Fixed/improved extractors
- youtube
- ard
- srmediatek (#9373)
version 2016.07.09
New extractors
- Flipagram (#9898)
Fixed/improved extractors
- telecinco
- toutv
- radiocanada
- tweakers (#9516)
- lynda
- nick (#7542)
- polskieradio (#10028)
- le
- facebook (#9851)
- mgtv
- animeondemand (#10031)
Fixed/improved features
- `--postprocessor-args` and `--downloader-args` now accepts non-ASCII inputs
on non-Windows systems
version 2016.07.07
New extractors
- kamcord (#10001)
Fixed/improved extractors
- spiegel (#10018)
- metacafe (#8539, #3253)
- onet (#9950)
- francetv (#9955)
- brightcove (#9965)
- daum (#9972)
version 2016.07.06
Fixed/improved extractors
- youtube (#10007, #10009)
- xuite
- stitcher
- spiegel
- slideshare
- sandia
- rtvnh
- prosiebensat1
- onionstudios
version 2016.07.05
Fixed/improved extractors
- brightcove
- yahoo (#9995)
- pornhub (#9997)
- iqiyi
- kaltura (#5557)
- la7
- Changed features
- Rename --cn-verfication-proxy to --geo-verification-proxy
Miscellaneous
- Add script for displaying downloads statistics
version 2016.07.03.1
Fixed/improved extractors
- theplatform
- aenetworks
- nationalgeographic
- hrti (#9482)
- facebook (#5701)
- buzzfeed (#5701)
- rai (#8617, #9157, #9232, #8552, #8551)
- nationalgeographic (#9991)
- iqiyi
version 2016.07.03
New extractors
- hrti (#9482)
Fixed/improved extractors
- vk (#9981)
- facebook (#9938)
- xtube (#9953, #9961)
version 2016.07.02
New extractors
- fusion (#9958)
Fixed/improved extractors
- twitch (#9975)
- vine (#9970)
- periscope (#9967)
- pornhub (#8696)
version 2016.07.01
New extractors
- 9c9media
- ctvnews (#2156)
- ctv (#4077)
Fixed/Improved extractors
- rds
- meta (#8789)
- pornhub (#9964)
- sixplay (#2183)
New features
- Accept quoted strings across multiple lines (#9940)

View File

@@ -94,7 +94,7 @@ _EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'la
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \
--exclude '*.kate-swp' \
@@ -107,7 +107,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude 'docs/_build' \
-- \
bin devscripts test youtube_dl docs \
LICENSE README.md README.txt \
ChangeLog LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
youtube-dl.zsh youtube-dl.fish setup.py \
youtube-dl

View File

@@ -330,7 +330,15 @@ which means you can modify it, redistribute it or use it however you like.
bidirectional text support. Requires bidiv
or fribidi executable in PATH
--sleep-interval SECONDS Number of seconds to sleep before each
download.
download when used alone or a lower bound
of a range for randomized sleep before each
download (minimum possible number of
seconds to sleep) when used along with
--max-sleep-interval.
--max-sleep-interval SECONDS Upper bound of a range for randomized sleep
before each download (maximum possible
number of seconds to sleep). Must only be
used along with --min-sleep-interval.
## Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT
@@ -1196,7 +1204,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report?

View File

@@ -54,17 +54,21 @@ def filter_options(readme):
if in_options:
if line.lstrip().startswith('-'):
option, description = re.split(r'\s{2,}', line.lstrip())
split_option = option.split(' ')
split = re.split(r'\s{2,}', line.lstrip())
# Description string may start with `-` as well. If there is
# only one piece then it's a description bit not an option.
if len(split) > 1:
option, description = split
split_option = option.split(' ')
if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
# Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information.
ret += '\n%s\n: %s\n' % (option, description)
else:
ret += line.lstrip() + '\n'
# Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information.
ret += '\n%s\n: %s\n' % (option, description)
continue
ret += line.lstrip() + '\n'
else:
ret += line + '\n'

View File

@@ -71,9 +71,12 @@ fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py ChangeLog
git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@@ -1,6 +1,7 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import itertools
import json
import os
import re
@@ -21,21 +22,26 @@ def format_size(bytes):
total_bytes = 0
releases = json.loads(compat_urllib_request.urlopen(
'https://api.github.com/repos/rg3/youtube-dl/releases').read().decode('utf-8'))
for page in itertools.count(1):
releases = json.loads(compat_urllib_request.urlopen(
'https://api.github.com/repos/rg3/youtube-dl/releases?page=%s' % page
).read().decode('utf-8'))
for release in releases:
compat_print(release['name'])
for asset in release['assets']:
asset_name = asset['name']
total_bytes += asset['download_count'] * asset['size']
if all(not re.match(p, asset_name) for p in (
r'^youtube-dl$',
r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
r'^youtube-dl\.exe$')):
continue
compat_print(
' %s size: %s downloads: %d'
% (asset_name, format_size(asset['size']), asset['download_count']))
if not releases:
break
for release in releases:
compat_print(release['name'])
for asset in release['assets']:
asset_name = asset['name']
total_bytes += asset['download_count'] * asset['size']
if all(not re.match(p, asset_name) for p in (
r'^youtube-dl$',
r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
r'^youtube-dl\.exe$')):
continue
compat_print(
' %s size: %s downloads: %d'
% (asset_name, format_size(asset['size']), asset['download_count']))
compat_print('total downloads traffic: %s' % format_size(total_bytes))

View File

@@ -142,6 +142,7 @@
- **CollegeRama**
- **ComCarCoff**
- **ComedyCentral**
- **ComedyCentralShortname**
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Coub**
@@ -181,6 +182,7 @@
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **DiscoveryGo**
- **Dotsub**
- **DouyuTV**: 斗鱼
- **DPlay**
@@ -247,7 +249,6 @@
- **FunnyOrDie**
- **Fusion**
- **GameInformer**
- **Gamekings**
- **GameOne**
- **gameone:playlist**
- **Gamersyde**
@@ -415,7 +416,8 @@
- **MyVidster**
- **n-tv.de**
- **natgeo**
- **natgeo:channel**
- **natgeo:episodeguide**
- **natgeo:video**
- **Naver**
- **NBA**
- **NBC**
@@ -517,6 +519,7 @@
- **plus.google**: Google Plus
- **pluzz.francetv.fr**
- **podomatic**
- **Pokemon**
- **PolskieRadio**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
@@ -561,6 +564,7 @@
- **RoosterTeeth**
- **RottenTomatoes**
- **Roxwel**
- **Rozhlas**
- **RTBF**
- **rte**: Raidió Teilifís Éireann TV
- **rte:radio**: Raidió Teilifís Éireann radio
@@ -618,6 +622,7 @@
- **smotri:user**: Smotri.com user videos
- **Snotr**
- **Sohu**
- **SonyLIV**
- **soundcloud**
- **soundcloud:playlist**
- **soundcloud:search**: Soundcloud search
@@ -726,6 +731,7 @@
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**
- **TVPlay**: TV3Play and related services
- **Tweakers**
@@ -743,6 +749,7 @@
- **udemy:course**
- **UDNEmbed**: 聯合影音
- **Unistra**
- **uol.com.br**
- **Urort**: NRK P3 Urørt
- **URPlay**
- **USAToday**
@@ -805,6 +812,7 @@
- **vk:wallpost**
- **vlive**
- **Vodlocker**
- **VODPlatform**
- **VoiceRepublic**
- **VoxMedia**
- **Vporn**

View File

@@ -48,6 +48,9 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._og_search_property('foobar', html), 'Foo')
self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar')
self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar')
self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
def test_html_search_meta(self):
ie = self.ie

View File

@@ -42,6 +42,7 @@ from youtube_dl.utils import (
ohdave_rsa_encrypt,
OnDemandPagedList,
orderedSet,
parse_age_limit,
parse_duration,
parse_filesize,
parse_count,
@@ -308,6 +309,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('25-09-2014'), 1411603200)
self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@@ -431,6 +433,20 @@ class TestUtil(unittest.TestCase):
url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
'trailer.mp4')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None)
self.assertEqual(parse_age_limit('invalid'), None)
self.assertEqual(parse_age_limit(0), 0)
self.assertEqual(parse_age_limit(18), 18)
self.assertEqual(parse_age_limit(21), 21)
self.assertEqual(parse_age_limit(22), None)
self.assertEqual(parse_age_limit('18'), 18)
self.assertEqual(parse_age_limit('18+'), 18)
self.assertEqual(parse_age_limit('PG-13'), 13)
self.assertEqual(parse_age_limit('TV-14'), 14)
self.assertEqual(parse_age_limit('TV-MA'), 17)
def test_parse_duration(self):
self.assertEqual(parse_duration(None), None)
self.assertEqual(parse_duration(False), None)

View File

@@ -0,0 +1,70 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
import unittest
import sys
import os
import subprocess
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
class TestVerboseOutput(unittest.TestCase):
def test_private_info_arg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username', 'johnsmith@gmail.com',
'--password', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue('--username' in serr)
self.assertTrue('johnsmith' not in serr)
self.assertTrue('--password' in serr)
self.assertTrue('secret' not in serr)
def test_private_info_shortarg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u', 'johnsmith@gmail.com',
'-p', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue('-u' in serr)
self.assertTrue('johnsmith' not in serr)
self.assertTrue('-p' in serr)
self.assertTrue('secret' not in serr)
def test_private_info_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username=johnsmith@gmail.com',
'--password=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue('--username' in serr)
self.assertTrue('johnsmith' not in serr)
self.assertTrue('--password' in serr)
self.assertTrue('secret' not in serr)
def test_private_info_shortarg_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u=johnsmith@gmail.com',
'-p=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue('-u' in serr)
self.assertTrue('johnsmith' not in serr)
self.assertTrue('-p' in serr)
self.assertTrue('secret' not in serr)
if __name__ == '__main__':
unittest.main()

View File

@@ -249,7 +249,16 @@ class YoutubeDL(object):
source_address: (Experimental) Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the
youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download.
sleep_interval: Number of seconds to sleep before each download when
used alone or a lower bound of a range for randomized
sleep before each download (minimum possible number
of seconds to sleep) when used along with
max_sleep_interval.
max_sleep_interval:Upper bound of a range for randomized sleep before each
download (maximum possible number of seconds to sleep).
Must only be used along with sleep_interval.
Actual sleep time will be a random float from range
[sleep_interval; max_sleep_interval].
listformats: Print an overview of available video formats and exit.
list_thumbnails: Print a table of all thumbnails and exit.
match_filter: A function that gets called with the info_dict of

View File

@@ -145,6 +145,16 @@ def _real_main(argv=None):
if numeric_limit is None:
parser.error('invalid max_filesize specified')
opts.max_filesize = numeric_limit
if opts.sleep_interval is not None:
if opts.sleep_interval < 0:
parser.error('sleep interval must be positive or 0')
if opts.max_sleep_interval is not None:
if opts.max_sleep_interval < 0:
parser.error('max sleep interval must be positive or 0')
if opts.max_sleep_interval < opts.sleep_interval:
parser.error('max sleep interval must be greater than or equal to min sleep interval')
else:
opts.max_sleep_interval = opts.sleep_interval
def parse_retries(retries):
if retries in ('inf', 'infinite'):
@@ -370,6 +380,7 @@ def _real_main(argv=None):
'source_address': opts.source_address,
'call_home': opts.call_home,
'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items,

View File

@@ -4,6 +4,7 @@ import os
import re
import sys
import time
import random
from ..compat import compat_os_name
from ..utils import (
@@ -342,8 +343,11 @@ class FileDownloader(object):
})
return True
sleep_interval = self.params.get('sleep_interval')
if sleep_interval:
min_sleep_interval = self.params.get('sleep_interval')
if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
print(min_sleep_interval, max_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval)

View File

@@ -83,6 +83,20 @@ class AdultSwimIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
# heroMetadata.trailer
'url': 'http://www.adultswim.com/videos/decker/inside-decker-a-new-hero/',
'info_dict': {
'id': 'I0LQFQkaSUaFp8PnAWHhoQ',
'ext': 'mp4',
'title': 'Decker - Inside Decker: A New Hero',
'description': 'md5:c916df071d425d62d70c86d4399d3ee0',
'duration': 249.008,
},
'params': {
# m3u8 download
'skip_download': True,
}
}]
@staticmethod
@@ -133,20 +147,26 @@ class AdultSwimIE(InfoExtractor):
if video_info is None:
if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path:
video_info = bootstrapped_data['slugged_video']
else:
raise ExtractorError('Unable to find video info')
if not video_info:
video_info = bootstrapped_data.get('heroMetadata', {}).get('trailer').get('video')
if not video_info:
raise ExtractorError('Unable to find video info')
show = bootstrapped_data['show']
show_title = show['title']
stream = video_info.get('stream')
clips = [stream] if stream else video_info.get('clips')
if not clips:
if stream and stream.get('videoPlaybackID'):
segment_ids = [stream['videoPlaybackID']]
elif video_info.get('clips'):
segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']]
elif video_info.get('videoPlaybackID'):
segment_ids = [video_info['videoPlaybackID']]
else:
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.'
if video_info.get('auth') is True else 'Unable to find stream or clips',
expected=True)
segment_ids = [clip['videoPlaybackID'] for clip in clips]
episode_id = video_info['id']
episode_title = video_info['title']

View File

@@ -123,6 +123,10 @@ class AolFeaturesIE(InfoExtractor):
'title': 'What To Watch - February 17, 2016',
},
'add_ie': ['FiveMin'],
'params': {
# encrypted m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):

View File

@@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -15,7 +13,7 @@ class AparatIE(InfoExtractor):
_TEST = {
'url': 'http://www.aparat.com/v/wP8On',
'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1',
'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': {
'id': 'wP8On',
'ext': 'mp4',
@@ -31,13 +29,13 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
embed_url = ('http://www.aparat.com/video/video/embed/videohash/' +
video_id + '/vt/frame')
embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
webpage = self._download_webpage(embed_url, video_id)
video_urls = [video_url.replace('\\/', '/') for video_url in re.findall(
r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)]
for i, video_url in enumerate(video_urls):
file_list = self._parse_json(self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
for i, item in enumerate(file_list[0]):
video_url = item['file']
req = HEADRequest(video_url)
res = self._request_webpage(
req, video_id, note='Testing video URL %d' % i, errnote=False)

View File

@@ -1,67 +1,65 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_strdate
from .jwplatform import JWPlatformBaseIE
from ..utils import (
unified_strdate,
clean_html,
)
class ArchiveOrgIE(InfoExtractor):
class ArchiveOrgIE(JWPlatformBaseIE):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
_TESTS = [{
'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'md5': '8af1d4cf447933ed3c7f4871162602db',
'info_dict': {
'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'ext': 'ogv',
'ext': 'ogg',
'title': '1968 Demo - FJCC Conference Presentation Reel #1',
'description': 'md5:1780b464abaca9991d8968c877bb53ed',
'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
'upload_date': '19681210',
'uploader': 'SRI International'
}
}, {
'url': 'https://archive.org/details/Cops1922',
'md5': '18f2a19e6d89af8425671da1cf3d4e04',
'md5': 'bc73c8ab3838b5a8fc6c6651fa7b58ba',
'info_dict': {
'id': 'Cops1922',
'ext': 'ogv',
'ext': 'mp4',
'title': 'Buster Keaton\'s "Cops" (1922)',
'description': 'md5:70f72ee70882f713d4578725461ffcc3',
'description': 'md5:b4544662605877edd99df22f9620d858',
}
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\);",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
json_url = url + ('&' if '?' in url else '?') + 'output=json'
data = self._download_json(json_url, video_id)
def get_optional(metadata, field):
return metadata.get(field, [None])[0]
def get_optional(data_dict, field):
return data_dict['metadata'].get(field, [None])[0]
title = get_optional(data, 'title')
description = get_optional(data, 'description')
uploader = get_optional(data, 'creator')
upload_date = unified_strdate(get_optional(data, 'date'))
formats = [
{
'format': fdata['format'],
'url': 'http://' + data['server'] + data['dir'] + fn,
'file_size': int(fdata['size']),
}
for fn, fdata in data['files'].items()
if 'Video' in fdata['format']]
self._sort_formats(formats)
return {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'uploader': uploader,
'upload_date': upload_date,
'thumbnail': data.get('misc', {}).get('image'),
}
metadata = self._download_json(
'http://archive.org/details/' + video_id, video_id, query={
'output': 'json',
})['metadata']
info.update({
'title': get_optional(metadata, 'title') or info.get('title'),
'description': clean_html(get_optional(metadata, 'description')),
})
if info.get('_type') != 'playlist':
info.update({
'uploader': get_optional(metadata, 'creator'),
'upload_date': unified_strdate(get_optional(metadata, 'date')),
})
return info

View File

@@ -73,6 +73,7 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
'duration': 3287,
},
'skip': 'Video is no longer available',
}]
def _extract_media_info(self, media_info_url, webpage, video_id):

View File

@@ -5,11 +5,13 @@ import re
from .common import InfoExtractor
from ..utils import (
dict_get,
ExtractorError,
float_or_none,
int_or_none,
parse_duration,
parse_iso8601,
try_get,
unescapeHTML,
)
from ..compat import (
@@ -229,51 +231,6 @@ class BBCCoUkIE(InfoExtractor):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
transfer_format = connection.get('transferFormat')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Skip DASH until supported
elif transfer_format == 'dash':
pass
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=supplier, fatal=False))
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier or kind or protocol,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
@@ -294,46 +251,6 @@ class BBCCoUkIE(InfoExtractor):
def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int_or_none(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
if service:
format['format_id'] = '%s_%s' % (service, format['format_id'])
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int_or_none(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id):
subtitles = {}
for connection in self._extract_connections(media):
@@ -379,13 +296,87 @@ class BBCCoUkIE(InfoExtractor):
def _process_media_selector(self, media_selection, programme_id):
formats = []
subtitles = None
urls = []
for media in self._extract_medias(media_selection):
kind = media.get('kind')
if kind == 'audio':
formats.extend(self._extract_audio(media, programme_id))
elif kind == 'video':
formats.extend(self._extract_video(media, programme_id))
if kind in ('video', 'audio'):
bitrate = int_or_none(media.get('bitrate'))
encoding = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
href = connection.get('href')
if href in urls:
continue
if href:
urls.append(href)
conn_kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, format_id),
})
elif transfer_format == 'dash':
formats.extend(self._extract_mpd_formats(
href, programme_id, mpd_id=format_id, fatal=False))
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
else:
if not service and not supplier and bitrate:
format_id += '-%d' % bitrate
fmt = {
'format_id': format_id,
'filesize': file_size,
}
if kind == 'video':
fmt.update({
'width': width,
'height': height,
'vbr': bitrate,
'vcodec': encoding,
})
else:
fmt.update({
'abr': bitrate,
'acodec': encoding,
'vcodec': 'none',
})
if protocol == 'http':
# Direct link
fmt.update({
'url': href,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
fmt.update({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
})
formats.append(fmt)
elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles
@@ -589,7 +580,7 @@ class BBCIE(BBCCoUkIE):
'info_dict': {
'id': '150615_telabyad_kentin_cogu',
'ext': 'mp4',
'title': "Tel Abyad'da IŞİD bayrağı indirildi YPG bayrağı çekildi",
'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
'timestamp': 1434397334,
'upload_date': '20150615',
@@ -654,6 +645,23 @@ class BBCIE(BBCCoUkIE):
# rtmp download
'skip_download': True,
}
}, {
# single video embedded with Morph
'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
'info_dict': {
'id': 'p041vhd0',
'ext': 'mp4',
'title': "Nigeria v Japan - Men's First Round",
'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
'duration': 7980,
'uploader': 'BBC Sport',
'uploader_id': 'bbc_sport',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to UK',
}, {
# single video with playlist.sxml URL in playlist param
'url': 'http://www.bbc.com/sport/0/football/33653409',
@@ -751,7 +759,7 @@ class BBCIE(BBCCoUkIE):
webpage = self._download_webpage(url, playlist_id)
json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
json_ld_info = self._search_json_ld(webpage, playlist_id, default={})
timestamp = json_ld_info.get('timestamp')
playlist_title = json_ld_info.get('title')
@@ -820,13 +828,19 @@ class BBCIE(BBCCoUkIE):
# http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
playlist = data_playable.get('otherSettings', {}).get('playlist', {})
if playlist:
for key in ('progressiveDownload', 'streaming'):
entry = None
for key in ('streaming', 'progressiveDownload'):
playlist_url = playlist.get('%sUrl' % key)
if not playlist_url:
continue
try:
entries.append(self._extract_from_playlist_sxml(
playlist_url, playlist_id, timestamp))
info = self._extract_from_playlist_sxml(
playlist_url, playlist_id, timestamp)
if not entry:
entry = info
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
@@ -834,6 +848,9 @@ class BBCIE(BBCCoUkIE):
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
continue
raise
if entry:
self._sort_formats(entry['formats'])
entries.append(entry)
if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@@ -866,6 +883,50 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# There are several setPayload calls may be present but the video
# seems to be always related to the first one
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
webpage, 'morph payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
for component in components:
if not isinstance(component, dict):
continue
lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
if not lead_media:
continue
identifiers = lead_media.get('identifiers')
if not identifiers or not isinstance(identifiers, dict):
continue
programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
if not programme_id:
continue
title = lead_media.get('title') or self._og_search_title(webpage)
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
description = lead_media.get('summary')
uploader = lead_media.get('masterBrand')
uploader_id = lead_media.get('mid')
duration = None
duration_d = lead_media.get('duration')
if isinstance(duration_d, dict):
duration = parse_duration(dict_get(
duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
'subtitles': subtitles,
}
def extract_all(pattern):
return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -883,7 +944,7 @@ class BBCIE(BBCCoUkIE):
r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
if entries:
return self.playlist_result(
[self.url_result(entry, 'BBCCoUk') for entry in entries],
[self.url_result(entry_, 'BBCCoUk') for entry_ in entries],
playlist_id, playlist_title, playlist_description)
# Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)

View File

@@ -25,13 +25,13 @@ class BiliBiliIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': {
'id': '1554319',
'ext': 'flv',
'ext': 'mp4',
'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.067,
'duration': 308.315,
'timestamp': 1398012660,
'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg',
@@ -41,73 +41,33 @@ class BiliBiliIE(InfoExtractor):
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'id': '1507019',
'ext': 'mp4',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'timestamp': 1396530060,
'upload_date': '20140403',
'uploader': '枫叶逝去',
'uploader_id': '520116',
},
'playlist_count': 9,
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'id': '7802182',
'ext': 'mp4',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
'playlist': [{
'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
'info_dict': {
'id': '4808130_part1',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '926f9f67d0c482091872fbd8eca7ea3d',
'info_dict': {
'id': '4808130_part2',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '4b7b225b968402d7c32348c646f1fd83',
'info_dict': {
'id': '4808130_part3',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '7b795e214166501e9141139eea236e91',
'info_dict': {
'id': '4808130_part4',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}],
}, {
# Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': {
'id': '2880301',
'ext': 'flv',
'ext': 'mp4',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'uploader': '黑夜为猫',

View File

@@ -24,7 +24,8 @@ class BIQLEIE(InfoExtractor):
'ext': 'mp4',
'title': 'Ребенок в шоке от автоматической мойки',
'uploader': 'Dmitry Kotov',
}
},
'skip': ' This video was marked as adult. Embedding adult videos on external sites is prohibited.',
}]
def _real_extract(self, url):

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@@ -20,6 +21,18 @@ class BloombergIE(InfoExtractor):
'params': {
'format': 'best[format_id^=hds]',
},
}, {
# video ID in BPlayer(...)
'url': 'http://www.bloomberg.com/features/2016-hello-world-new-zealand/',
'info_dict': {
'id': '938c7e72-3f25-4ddb-8b85-a9be731baa74',
'ext': 'flv',
'title': 'Meet the Real-Life Tech Wizards of Middle Earth',
'description': 'Hello World, Episode 1: New Zealands freaky AI babies, robot exoskeletons, and a virtual you.',
},
'params': {
'format': 'best[format_id^=hds]',
},
}, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True,
@@ -33,7 +46,11 @@ class BloombergIE(InfoExtractor):
webpage = self._download_webpage(url, name)
video_id = self._search_regex(
r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
webpage, 'id', group='url')
webpage, 'id', group='url', default=None)
if not video_id:
bplayer_data = self._parse_json(self._search_regex(
r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name)
video_id = bplayer_data['id']
title = re.sub(': Video$', '', self._og_search_title(webpage))
embed_info = self._download_json(

View File

@@ -1,12 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
from .anvato import AnvatoIE
from .sendtonews import SendtoNewsIE
from ..compat import compat_urlparse
from ..utils import unified_timestamp
class CBSLocalIE(AnvatoIE):
@@ -71,10 +69,7 @@ class CBSLocalIE(AnvatoIE):
time_str = self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
timestamp = None
if time_str:
timestamp = calendar.timegm(datetime.datetime.strptime(
time_str, '%b %d, %Y %I:%M %p').timetuple())
timestamp = unified_timestamp(time_str)
info_dict.update({
'display_id': display_id,

View File

@@ -17,7 +17,8 @@ class ChaturbateIE(InfoExtractor):
},
'params': {
'skip_download': True,
}
},
'skip': 'Room is offline',
}, {
'url': 'https://en.chaturbate.com/siswet19/',
'only_matching': True,

View File

@@ -1,6 +1,7 @@
from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor
from .common import InfoExtractor
class ComedyCentralIE(MTVServicesInfoExtractor):
@@ -96,3 +97,22 @@ class ComedyCentralTVIE(MTVServicesInfoExtractor):
webpage, 'mrss url', group='url')
return self._get_videos_info_from_url(mrss_url, video_id)
class ComedyCentralShortnameIE(InfoExtractor):
_VALID_URL = r'^:(?P<id>tds|thedailyshow)$'
_TESTS = [{
'url': ':tds',
'only_matching': True,
}, {
'url': ':thedailyshow',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
}
return self.url_result(shortcut_map[video_id])

View File

@@ -727,9 +727,14 @@ class InfoExtractor(object):
[^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)
def _og_search_property(self, prop, html, name=None, **kargs):
if not isinstance(prop, (list, tuple)):
prop = [prop]
if name is None:
name = 'OpenGraph %s' % prop
escaped = self._search_regex(self._og_regexes(prop), html, name, flags=re.DOTALL, **kargs)
name = 'OpenGraph %s' % prop[0]
og_regexes = []
for p in prop:
og_regexes.extend(self._og_regexes(p))
escaped = self._search_regex(og_regexes, html, name, flags=re.DOTALL, **kargs)
if escaped is None:
return None
return unescapeHTML(escaped)
@@ -811,11 +816,14 @@ class InfoExtractor(object):
json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
html, 'JSON-LD', group='json_ld', **kwargs)
default = kwargs.get('default', NO_DEFAULT)
if not json_ld:
return {}
return self._json_ld(
json_ld, video_id, fatal=kwargs.get('fatal', True),
expected_type=expected_type)
return default if default is not NO_DEFAULT else {}
# JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str):
@@ -823,41 +831,47 @@ class InfoExtractor(object):
if not json_ld:
return {}
info = {}
if json_ld.get('@context') == 'http://schema.org':
item_type = json_ld.get('@type')
if expected_type is not None and expected_type != item_type:
return info
if item_type == 'TVEpisode':
info.update({
'episode': unescapeHTML(json_ld.get('name')),
'episode_number': int_or_none(json_ld.get('episodeNumber')),
'description': unescapeHTML(json_ld.get('description')),
})
part_of_season = json_ld.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = json_ld.get('partOfSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
'timestamp': parse_iso8601(json_ld.get('datePublished')),
'title': unescapeHTML(json_ld.get('headline')),
'description': unescapeHTML(json_ld.get('articleBody')),
})
elif item_type == 'VideoObject':
info.update({
'url': json_ld.get('contentUrl'),
'title': unescapeHTML(json_ld.get('name')),
'description': unescapeHTML(json_ld.get('description')),
'thumbnail': json_ld.get('thumbnailUrl'),
'duration': parse_duration(json_ld.get('duration')),
'timestamp': unified_timestamp(json_ld.get('uploadDate')),
'filesize': float_or_none(json_ld.get('contentSize')),
'tbr': int_or_none(json_ld.get('bitrate')),
'width': int_or_none(json_ld.get('width')),
'height': int_or_none(json_ld.get('height')),
})
if not isinstance(json_ld, (list, tuple, dict)):
return info
if isinstance(json_ld, dict):
json_ld = [json_ld]
for e in json_ld:
if e.get('@context') == 'http://schema.org':
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
return info
if item_type == 'TVEpisode':
info.update({
'episode': unescapeHTML(e.get('name')),
'episode_number': int_or_none(e.get('episodeNumber')),
'description': unescapeHTML(e.get('description')),
})
part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),
'title': unescapeHTML(e.get('headline')),
'description': unescapeHTML(e.get('articleBody')),
})
elif item_type == 'VideoObject':
info.update({
'url': e.get('contentUrl'),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': e.get('thumbnailUrl'),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
'height': int_or_none(e.get('height')),
})
break
return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod
@@ -911,7 +925,8 @@ class InfoExtractor(object):
if f.get('ext') in ['f4f', 'f4m']: # Not yet supported
preference -= 0.5
proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1
protocol = f.get('protocol') or determine_protocol(f)
proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
if f.get('vcodec') == 'none': # audio only
preference -= 50
@@ -1128,7 +1143,7 @@ class InfoExtractor(object):
'url': m3u8_url,
'ext': ext,
'protocol': 'm3u8',
'preference': preference - 1 if preference else -1,
'preference': preference - 100 if preference else -100,
'resolution': 'multiple',
'format_note': 'Quality selection URL',
}

View File

@@ -5,13 +5,17 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
orderedSet,
remove_end,
extract_attributes,
mimetype2ext,
determine_ext,
int_or_none,
parse_iso8601,
)
@@ -58,6 +62,9 @@ class CondeNastIE(InfoExtractor):
'ext': 'mp4',
'title': '3D Printed Speakers Lit With LED',
'description': 'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
'uploader': 'wired',
'upload_date': '20130314',
'timestamp': 1363219200,
}
}, {
# JS embed
@@ -67,70 +74,93 @@ class CondeNastIE(InfoExtractor):
'id': '55f9cf8b61646d1acf00000c',
'ext': 'mp4',
'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
'uploader': 'arstechnica',
'upload_date': '20150916',
'timestamp': 1442434955,
}
}]
def _extract_series(self, url, webpage):
title = self._html_search_regex(r'<div class="cne-series-info">.*?<h1>(.+?)</h1>',
webpage, 'series title', flags=re.DOTALL)
title = self._html_search_regex(
r'(?s)<div class="cne-series-info">.*?<h1>(.+?)</h1>',
webpage, 'series title')
url_object = compat_urllib_parse_urlparse(url)
base_url = '%s://%s' % (url_object.scheme, url_object.netloc)
m_paths = re.finditer(r'<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]',
webpage, flags=re.DOTALL)
m_paths = re.finditer(
r'(?s)<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]', webpage)
paths = orderedSet(m.group(1) for m in m_paths)
build_url = lambda path: compat_urlparse.urljoin(base_url, path)
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title)
def _extract_video(self, webpage, url_type):
if url_type != 'embed':
description = self._html_search_regex(
[
r'<div class="cne-video-description">(.+?)</div>',
r'<div class="video-post-content">(.+?)</div>',
],
webpage, 'description', fatal=False, flags=re.DOTALL)
query = {}
params = self._search_regex(
r'(?s)var params = {(.+?)}[;,]', webpage, 'player params', default=None)
if params:
query.update({
'videoId': self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id'),
'playerId': self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id'),
'target': self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target'),
})
else:
description = None
params = self._search_regex(r'var params = {(.+?)}[;,]', webpage,
'player params', flags=re.DOTALL)
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id')
player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id')
target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target')
data = compat_urllib_parse_urlencode({'videoId': video_id,
'playerId': player_id,
'target': target,
})
base_info_url = self._search_regex(r'url = [\'"](.+?)[\'"][,;]',
webpage, 'base info url',
default='http://player.cnevids.com/player/loader.js?')
info_url = base_info_url + data
info_page = self._download_webpage(info_url, video_id,
'Downloading video info')
video_info = self._search_regex(r'var\s+video\s*=\s*({.+?});', info_page, 'video info')
video_info = self._parse_json(video_info, video_id)
params = extract_attributes(self._search_regex(
r'(<[^>]+data-js="video-player"[^>]+>)',
webpage, 'player params element'))
query.update({
'videoId': params['data-video'],
'playerId': params['data-player'],
'target': params['id'],
})
video_id = query['videoId']
video_info = None
info_page = self._download_webpage(
'http://player.cnevids.com/player/video.js',
video_id, 'Downloading video info', query=query, fatal=False)
if info_page:
video_info = self._parse_json(self._search_regex(
r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
else:
info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=query)
video_info = self._parse_json(self._search_regex(
r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
title = video_info['title']
formats = [{
'format_id': '%s-%s' % (fdata['type'].split('/')[-1], fdata['quality']),
'url': fdata['src'],
'ext': fdata['type'].split('/')[-1],
'quality': 1 if fdata['quality'] == 'high' else 0,
} for fdata in video_info['sources'][0]]
formats = []
for fdata in video_info.get('sources', [{}])[0]:
src = fdata.get('src')
if not src:
continue
ext = mimetype2ext(fdata.get('type')) or determine_ext(src)
quality = fdata.get('quality')
formats.append({
'format_id': ext + ('-%s' % quality if quality else ''),
'url': src,
'ext': ext,
'quality': 1 if quality == 'high' else 0,
})
self._sort_formats(formats)
return {
info = self._search_json_ld(
webpage, video_id, fatal=False) if url_type != 'embed' else {}
info.update({
'id': video_id,
'formats': formats,
'title': video_info['title'],
'thumbnail': video_info['poster_frame'],
'description': description,
}
'title': title,
'thumbnail': video_info.get('poster_frame'),
'uploader': video_info.get('brand'),
'duration': int_or_none(video_info.get('duration')),
'tags': video_info.get('tags'),
'series': video_info.get('series_title'),
'season': video_info.get('season_title'),
'timestamp': parse_iso8601(video_info.get('premiere_date')),
})
return info
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site')
url_type = mobj.group('type')
item_id = mobj.group('id')
site, url_type, item_id = re.match(self._VALID_URL, url).groups()
# Convert JS embed to regular embed
if url_type == 'embedjs':

View File

@@ -9,7 +9,7 @@ from ..utils import (
class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/(?:shows/)?(?:[^/]+/){2}\?.*\bplay=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_VALID_URL = r'https?://(?:www\.)?cw(?:tv(?:pr)?|seed)\.com/(?:shows/)?(?:[^/]+/)+[^?]*\?.*\b(?:play|watch)=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
'info_dict': {
@@ -28,7 +28,8 @@ class CWTVIE(InfoExtractor):
'params': {
# m3u8 download
'skip_download': True,
}
},
'skip': 'redirect to http://cwtv.com/shows/arrow/',
}, {
'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
'info_dict': {
@@ -44,22 +45,43 @@ class CWTVIE(InfoExtractor):
'upload_date': '20151006',
'timestamp': 1444107300,
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
'only_matching': True,
}, {
'url': 'http://cwtvpr.com/the-cw/video?watch=9eee3f60-ef4e-440b-b3b2-49428ac9c54e',
'only_matching': True,
}, {
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?watch=6b15e985-9345-4f60-baf8-56e96be57c63',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id)
formats = self._extract_m3u8_formats(
video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
video_data = None
formats = []
for partner in (154, 213):
vdata = self._download_json(
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/%d?format=json' % (video_id, partner), video_id, fatal=False)
if not vdata:
continue
video_data = vdata
for quality, quality_data in vdata.get('videos', {}).items():
quality_url = quality_data.get('uri')
if not quality_url:
continue
if quality == 'variantplaylist':
formats.extend(self._extract_m3u8_formats(
quality_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
tbr = int_or_none(quality_data.get('bitrate'))
format_id = 'http' + ('-%d' % tbr if tbr else '')
if self._is_valid_url(quality_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': quality_url,
'tbr': tbr,
})
self._sort_formats(formats)
thumbnails = [{

View File

@@ -331,7 +331,9 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
for video_id in re.findall(r'data-xid="(.+?)"', webpage):
if video_id not in video_ids:
yield self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
yield self.url_result(
'http://www.dailymotion.com/video/%s' % video_id,
DailymotionIE.ie_key(), video_id)
video_ids.add(video_id)
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:

View File

@@ -0,0 +1,98 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
extract_attributes,
int_or_none,
parse_age_limit,
unescapeHTML,
)
class DiscoveryGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
'info_dict': {
'id': '57a33c536b66d1cd0345eeb1',
'ext': 'mp4',
'title': 'Kiss First, Ask Questions Later!',
'description': 'md5:fe923ba34050eae468bffae10831cb22',
'duration': 2579,
'series': 'Love at First Kiss',
'season_number': 1,
'episode_number': 1,
'age_limit': 14,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
container = extract_attributes(
self._search_regex(
r'(<div[^>]+class=["\']video-player-container[^>]+>)',
webpage, 'video container'))
video = self._parse_json(
unescapeHTML(container.get('data-video') or container.get('data-json')),
display_id)
title = video['name']
stream = video['stream']
STREAM_URL_SUFFIX = 'streamUrl'
formats = []
for stream_kind in ('', 'hds'):
suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX
stream_url = stream.get('%s%s' % (stream_kind, suffix))
if not stream_url:
continue
if stream_kind == '':
formats.extend(self._extract_m3u8_formats(
stream_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif stream_kind == 'hds':
formats.extend(self._extract_f4m_formats(
stream_url, display_id, f4m_id=stream_kind, fatal=False))
self._sort_formats(formats)
video_id = video.get('id') or display_id
description = video.get('description', {}).get('detailed')
duration = int_or_none(video.get('duration'))
series = video.get('show', {}).get('name')
season_number = int_or_none(video.get('season', {}).get('number'))
episode_number = int_or_none(video.get('episodeNumber'))
tags = video.get('tags')
age_limit = parse_age_limit(video.get('parental', {}).get('rating'))
subtitles = {}
captions = stream.get('captions')
if isinstance(captions, list):
for caption in captions:
subtitle_url = caption.get('fileUrl')
if (not subtitle_url or not isinstance(subtitle_url, compat_str) or
not subtitle_url.startswith('http')):
continue
lang = caption.get('fileLang', 'en')
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'duration': duration,
'series': series,
'season_number': season_number,
'episode_number': episode_number,
'tags': tags,
'age_limit': age_limit,
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -4,9 +4,10 @@ from .common import InfoExtractor
class EngadgetIE(InfoExtractor):
_VALID_URL = r'https?://www.engadget.com/video/(?P<id>\d+)'
_VALID_URL = r'https?://www.engadget.com/video/(?P<id>[^/?#]+)'
_TEST = {
_TESTS = [{
# video with 5min ID
'url': 'http://www.engadget.com/video/518153925/',
'md5': 'c6820d4828a5064447a4d9fc73f312c9',
'info_dict': {
@@ -15,8 +16,12 @@ class EngadgetIE(InfoExtractor):
'title': 'Samsung Galaxy Tab Pro 8.4 Review',
},
'add_ie': ['FiveMin'],
}
}, {
# video with vidible ID
'url': 'https://www.engadget.com/video/57a28462134aa15a39f0421a/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result('5min:%s' % video_id)
return self.url_result('aol-video:%s' % video_id)

View File

@@ -159,6 +159,7 @@ from .coub import CoubIE
from .collegerama import CollegeRamaIE
from .comedycentral import (
ComedyCentralIE,
ComedyCentralShortnameIE,
ComedyCentralTVIE,
ToshIE,
)
@@ -220,6 +221,7 @@ from .dvtv import DVTVIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .discoverygo import DiscoveryGoIE
from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE
from .dw import (
@@ -289,7 +291,6 @@ from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
from .gameinformer import GameInformerIE
from .gamekings import GamekingsIE
from .gameone import (
GameOneIE,
GameOnePlaylistIE,
@@ -491,8 +492,9 @@ from .myvi import MyviIE
from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE
from .nationalgeographic import (
NationalGeographicVideoIE,
NationalGeographicIE,
NationalGeographicChannelIE,
NationalGeographicEpisodeGuideIE,
)
from .naver import NaverIE
from .nba import NBAIE
@@ -635,6 +637,7 @@ from .pluralsight import (
PluralsightCourseIE,
)
from .podomatic import PodomaticIE
from .pokemon import PokemonIE
from .polskieradio import PolskieRadioIE
from .porn91 import Porn91IE
from .pornhd import PornHdIE
@@ -693,6 +696,7 @@ from .rockstargames import RockstarGamesIE
from .roosterteeth import RoosterTeethIE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rozhlas import RozhlasIE
from .rtbf import RTBFIE
from .rte import RteIE, RteRadioIE
from .rtlnl import RtlNlIE
@@ -752,6 +756,7 @@ from .smotri import (
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .sonyliv import SonyLIVIE
from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
@@ -891,6 +896,7 @@ from .tvc import (
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvp import (
TVPEmbedIE,
TVPIE,
TVPSeriesIE,
)
@@ -923,6 +929,7 @@ from .udemy import (
from .udn import UDNEmbedIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .uol import UOLIE
from .urort import UrortIE
from .urplay import URPlayIE
from .usatoday import USATodayIE
@@ -1004,6 +1011,7 @@ from .vk import (
)
from .vlive import VLiveIE
from .vodlocker import VodlockerIE
from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE
from .vporn import VpornIE

View File

@@ -1,24 +1,11 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
parse_duration,
replace_extension,
)
class FiveMinIE(InfoExtractor):
IE_NAME = '5min'
_VALID_URL = r'(?:5min:(?P<id>\d+)(?::(?P<sid>\d+))?|https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?P<query>.*))'
_VALID_URL = r'(?:5min:|https?://(?:[^/]*?5min\.com/|delivery\.vidible\.tv/aol)(?:(?:Scripts/PlayerSeed\.js|playerseed/?)?\?.*?playList=)?)(?P<id>\d+)'
_TESTS = [
{
@@ -29,8 +16,16 @@ class FiveMinIE(InfoExtractor):
'id': '518013791',
'ext': 'mp4',
'title': 'iPad Mini with Retina Display Review',
'description': 'iPad mini with Retina Display review',
'duration': 177,
'uploader': 'engadget',
'upload_date': '20131115',
'timestamp': 1384515288,
},
'params': {
# m3u8 download
'skip_download': True,
}
},
{
# From http://on.aol.com/video/how-to-make-a-next-level-fruit-salad-518086247
@@ -44,108 +39,16 @@ class FiveMinIE(InfoExtractor):
},
'skip': 'no longer available',
},
{
'url': 'http://embed.5min.com/518726732/',
'only_matching': True,
},
{
'url': 'http://delivery.vidible.tv/aol?playList=518013791',
'only_matching': True,
}
]
_ERRORS = {
'ErrorVideoNotExist': 'We\'re sorry, but the video you are trying to watch does not exist.',
'ErrorVideoNoLongerAvailable': 'We\'re sorry, but the video you are trying to watch is no longer available.',
'ErrorVideoRejected': 'We\'re sorry, but the video you are trying to watch has been removed.',
'ErrorVideoUserNotGeo': 'We\'re sorry, but the video you are trying to watch cannot be viewed from your current location.',
'ErrorVideoLibraryRestriction': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
'ErrorExposurePermission': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
}
_QUALITIES = {
1: {
'width': 640,
'height': 360,
},
2: {
'width': 854,
'height': 480,
},
4: {
'width': 1280,
'height': 720,
},
8: {
'width': 1920,
'height': 1080,
},
16: {
'width': 640,
'height': 360,
},
32: {
'width': 854,
'height': 480,
},
64: {
'width': 1280,
'height': 720,
},
128: {
'width': 640,
'height': 360,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
sid = mobj.group('sid')
if mobj.group('query'):
qs = compat_parse_qs(mobj.group('query'))
if not qs.get('playList'):
raise ExtractorError('Invalid URL', expected=True)
video_id = qs['playList'][0]
if qs.get('sid'):
sid = qs['sid'][0]
embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id
if not sid:
embed_page = self._download_webpage(embed_url, video_id,
'Downloading embed page')
sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid')
response = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?' +
compat_urllib_parse_urlencode({
'func': 'GetResults',
'playlist': video_id,
'sid': sid,
'isPlayerSeed': 'true',
'url': embed_url,
}),
video_id)
if not response['success']:
raise ExtractorError(
'%s said: %s' % (
self.IE_NAME,
self._ERRORS.get(response['errorMessage'], response['errorMessage'])),
expected=True)
info = response['binding'][0]
formats = []
parsed_video_url = compat_urllib_parse_urlparse(compat_parse_qs(
compat_urllib_parse_urlparse(info['EmbededURL']).query)['videoUrl'][0])
for rendition in info['Renditions']:
if rendition['RenditionType'] == 'aac' or rendition['RenditionType'] == 'm3u8':
continue
else:
rendition_url = compat_urlparse.urlunparse(parsed_video_url._replace(path=replace_extension(parsed_video_url.path.replace('//', '/%s/' % rendition['ID']), rendition['RenditionType'])))
quality = self._QUALITIES.get(rendition['ID'], {})
formats.append({
'format_id': '%s-%d' % (rendition['RenditionType'], rendition['ID']),
'url': rendition_url,
'width': quality.get('width'),
'height': quality.get('height'),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info['Title'],
'thumbnail': info.get('ThumbURL'),
'duration': parse_duration(info.get('Duration')),
'formats': formats,
}
video_id = self._match_id(url)
return self.url_result('aol-video:%s' % video_id)

View File

@@ -48,7 +48,7 @@ class FlipagramIE(InfoExtractor):
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, default=False)
json_ld = self._search_json_ld(webpage, video_id, default={})
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')

View File

@@ -2,7 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
from ..utils import (
smuggle_url,
update_url_query,
)
class FOXIE(InfoExtractor):
@@ -29,11 +32,12 @@ class FOXIE(InfoExtractor):
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url'] + '&switch=http'
video_id)['release_url']
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(release_url, {'force_smil_url': True}),
'url': smuggle_url(update_url_query(
release_url, {'switch': 'http'}), {'force_smil_url': True}),
'id': video_id,
}

View File

@@ -1,76 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
xpath_text,
xpath_with_ns,
)
from .youtube import YoutubeIE
class GamekingsIE(InfoExtractor):
_VALID_URL = r'https?://www\.gamekings\.nl/(?:videos|nieuws)/(?P<id>[^/]+)'
_TESTS = [{
# YouTube embed video
'url': 'http://www.gamekings.nl/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
'md5': '5208d3a17adeaef829a7861887cb9029',
'info_dict': {
'id': 'HkSQKetlGOU',
'ext': 'mp4',
'title': 'Phoenix Wright: Ace Attorney - Dual Destinies Review',
'description': 'md5:db88c0e7f47e9ea50df3271b9dc72e1d',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader_id': 'UCJugRGo4STYMeFr5RoOShtQ',
'uploader': 'Gamekings Vault',
'upload_date': '20151123',
},
'add_ie': ['Youtube'],
}, {
# vimeo video
'url': 'http://www.gamekings.nl/videos/the-legend-of-zelda-majoras-mask/',
'md5': '12bf04dfd238e70058046937657ea68d',
'info_dict': {
'id': 'the-legend-of-zelda-majoras-mask',
'ext': 'mp4',
'title': 'The Legend of Zelda: Majoras Mask',
'description': 'md5:9917825fe0e9f4057601fe1e38860de3',
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'http://www.gamekings.nl/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
playlist_id = self._search_regex(
r'gogoVideo\([^,]+,\s*"([^"]+)', webpage, 'playlist id')
# Check if a YouTube embed is used
if YoutubeIE.suitable(playlist_id):
return self.url_result(playlist_id, ie='Youtube')
playlist = self._download_xml(
'http://www.gamekings.tv/wp-content/themes/gk2010/rss_playlist.php?id=%s' % playlist_id,
video_id)
NS_MAP = {
'jwplayer': 'http://rss.jwpcdn.com/'
}
item = playlist.find('./channel/item')
thumbnail = xpath_text(item, xpath_with_ns('./jwplayer:image', NS_MAP), 'thumbnail')
video_url = item.find(xpath_with_ns('./jwplayer:source', NS_MAP)).get('file')
return {
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': thumbnail,
}

View File

@@ -71,6 +71,7 @@ from .vessel import VesselIE
from .kaltura import KalturaIE
from .eagleplatform import EaglePlatformIE
from .facebook import FacebookIE
from .soundcloud import SoundcloudIE
class GenericIE(InfoExtractor):
@@ -784,6 +785,15 @@ class GenericIE(InfoExtractor):
'upload_date': '20141029',
}
},
# Soundcloud multiple embeds
{
'url': 'http://www.guitarplayer.com/lessons/1014/legato-workout-one-hour-to-more-fluid-performance---tab/52809',
'info_dict': {
'id': '52809',
'title': 'Guitar Essentials: Legato Workout—One-Hour to Fluid Performance | TAB + AUDIO',
},
'playlist_mincount': 7,
},
# Livestream embed
{
'url': 'http://www.esa.int/Our_Activities/Space_Science/Rosetta/Philae_comet_touch-down_webcast',
@@ -1999,12 +2009,9 @@ class GenericIE(InfoExtractor):
return self.url_result(myvi_url)
# Look for embedded soundcloud player
mobj = re.search(
r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url)
soundcloud_urls = SoundcloudIE._extract_urls(webpage)
if soundcloud_urls:
return _playlist_from_matches(soundcloud_urls, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
# Look for embedded mtvservices player
mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
@@ -2200,6 +2207,14 @@ class GenericIE(InfoExtractor):
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group(1))), 'Vine')
# Look for VODPlatform embeds
mobj = re.search(
r'<iframe[^>]+src=[\'"]((?:https?:)?//(?:www\.)?vod-platform\.net/embed/[^/?#]+)',
webpage)
if mobj is not None:
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group(1))), 'VODPlatform')
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
@@ -2226,8 +2241,8 @@ class GenericIE(InfoExtractor):
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default=None, expected_type='VideoObject')
if json_ld and json_ld.get('url'):
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],
'description': video_description,

View File

@@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
mimetype2ext,
)
@@ -28,74 +30,84 @@ class JWPlatformBaseIE(InfoExtractor):
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
video_data = jwplayer_data['playlist'][0]
entries = []
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
source_type = source.get('type') or ''
if source_type in ('application/vnd.apple.mpegurl', 'hls') or determine_ext(source_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif source_type.startswith('audio'):
formats.append({
'url': source_url,
'vcodec': 'none',
})
else:
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv',
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv',
return {
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
}
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
})
entries.append({
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
class JWPlatformIE(JWPlatformBaseIE):

View File

@@ -62,6 +62,11 @@ class KalturaIE(InfoExtractor):
{
'url': 'https://cdnapisec.kaltura.com/html5/html5lib/v2.30.2/mwEmbedFrame.php/p/1337/uiconf_id/20540612/entry_id/1_sf5ovm7u?wid=_243342',
'only_matching': True,
},
{
# video with subtitles
'url': 'kaltura:111032:1_cw786r8q',
'only_matching': True,
}
]
@@ -130,7 +135,6 @@ class KalturaIE(InfoExtractor):
video_id, actions, service_url, note='Downloading Kaltura signature')['ks']
def _get_video_info(self, video_id, partner_id, service_url=None):
signature = self._get_kaltura_signature(video_id, partner_id, service_url)
actions = [
{
'action': 'null',
@@ -138,18 +142,30 @@ class KalturaIE(InfoExtractor):
'clientTag': 'kdp:v3.8.5',
'format': 1, # JSON, 2 = XML, 3 = PHP
'service': 'multirequest',
'ks': signature,
},
{
'expiry': 86400,
'service': 'session',
'action': 'startWidgetSession',
'widgetId': '_%s' % partner_id,
},
{
'action': 'get',
'entryId': video_id,
'service': 'baseentry',
'version': '-1',
'ks': '{1:result:ks}',
},
{
'action': 'getbyentryid',
'entryId': video_id,
'service': 'flavorAsset',
'ks': '{1:result:ks}',
},
{
'action': 'list',
'filter:entryIdEqual': video_id,
'service': 'caption_captionasset',
'ks': '{1:result:ks}',
},
]
return self._kaltura_api_call(
@@ -161,8 +177,9 @@ class KalturaIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
partner_id, entry_id = mobj.group('partner_id', 'id')
ks = None
captions = None
if partner_id and entry_id:
info, flavor_assets = self._get_video_info(entry_id, partner_id, smuggled_data.get('service_url'))
_, info, flavor_assets, captions = self._get_video_info(entry_id, partner_id, smuggled_data.get('service_url'))
else:
path, query = mobj.group('path', 'query')
if not path and not query:
@@ -181,7 +198,7 @@ class KalturaIE(InfoExtractor):
raise ExtractorError('Invalid URL', expected=True)
if 'entry_id' in params:
entry_id = params['entry_id'][0]
info, flavor_assets = self._get_video_info(entry_id, partner_id)
_, info, flavor_assets, captions = self._get_video_info(entry_id, partner_id)
elif 'uiconf_id' in params and 'flashvars[referenceId]' in params:
reference_id = params['flashvars[referenceId]'][0]
webpage = self._download_webpage(url, reference_id)
@@ -217,7 +234,7 @@ class KalturaIE(InfoExtractor):
formats = []
for f in flavor_assets:
# Continue if asset is not ready
if f['status'] != 2:
if f.get('status') != 2:
continue
video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
@@ -240,13 +257,24 @@ class KalturaIE(InfoExtractor):
m3u8_url, entry_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._check_formats(formats, entry_id)
self._sort_formats(formats)
subtitles = {}
if captions:
for caption in captions.get('objects', []):
# Continue if caption is not ready
if f.get('status') != 2:
continue
subtitles.setdefault(caption.get('languageCode') or caption.get('language'), []).append({
'url': '%s/api_v3/service/caption_captionasset/action/serve/captionAssetId/%s' % (self._SERVICE_URL, caption['id']),
'ext': caption.get('fileExt'),
})
return {
'id': entry_id,
'title': info['name'],
'formats': formats,
'subtitles': subtitles,
'description': clean_html(info.get('description')),
'thumbnail': info.get('thumbnailUrl'),
'duration': info.get('duration'),

View File

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
get_element_by_id,
clean_html,
@@ -242,8 +243,9 @@ class KuwoSingerIE(InfoExtractor):
query={'artistId': artist_id, 'pn': page_num, 'rn': self.PAGE_SIZE})
return [
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<div[^>]+class="name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)',
self.url_result(compat_urlparse.urljoin(url, song_url), 'Kuwo')
for song_url in re.findall(
r'<div[^>]+class="name"><a[^>]+href="(/yinyue/\d+)',
webpage)
]

View File

@@ -37,11 +37,12 @@ class LimelightBaseIE(InfoExtractor):
for stream in streams:
stream_url = stream.get('url')
if not stream_url:
if not stream_url or stream.get('drmProtected'):
continue
if '.f4m' in stream_url:
ext = determine_ext(stream_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url, video_id, fatal=False))
stream_url, video_id, f4m_id='hds', fatal=False))
else:
fmt = {
'url': stream_url,
@@ -50,13 +51,19 @@ class LimelightBaseIE(InfoExtractor):
'fps': float_or_none(stream.get('videoFrameRate')),
'width': int_or_none(stream.get('videoWidthInPixels')),
'height': int_or_none(stream.get('videoHeightInPixels')),
'ext': determine_ext(stream_url)
'ext': ext,
}
rtmp = re.search(r'^(?P<url>rtmpe?://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
if rtmp:
format_id = 'rtmp'
if stream.get('videoBitRate'):
format_id += '-%d' % int_or_none(stream['videoBitRate'])
http_fmt = fmt.copy()
http_fmt.update({
'url': 'http://%s/%s' % (rtmp.group('host').replace('csl.', 'cpl.'), rtmp.group('playpath')[4:]),
'format_id': format_id.replace('rtmp', 'http'),
})
formats.append(http_fmt)
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
@@ -68,18 +75,23 @@ class LimelightBaseIE(InfoExtractor):
for mobile_url in mobile_urls:
media_url = mobile_url.get('mobileUrl')
if not media_url:
continue
format_id = mobile_url.get('targetMediaPlatform')
if determine_ext(media_url) == 'm3u8':
if not media_url or format_id == 'Widevine':
continue
ext = determine_ext(media_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url, video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
'url': media_url,
'format_id': format_id,
'preference': -1,
'ext': ext,
})
self._sort_formats(formats)
@@ -145,7 +157,7 @@ class LimelightMediaIE(LimelightBaseIE):
'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
'info_dict': {
'id': '3ffd040b522b4485b6d84effc750cd86',
'ext': 'flv',
'ext': 'mp4',
'title': 'HaP and the HB Prince Trailer',
'description': 'md5:8005b944181778e313d95c1237ddb640',
'thumbnail': 're:^https?://.*\.jpeg$',
@@ -154,27 +166,23 @@ class LimelightMediaIE(LimelightBaseIE):
'upload_date': '20090604',
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
}, {
# video with subtitles
'url': 'limelight:media:a3e00274d4564ec4a9b29b9466432335',
'md5': '2fa3bad9ac321e23860ca23bc2c69e3d',
'info_dict': {
'id': 'a3e00274d4564ec4a9b29b9466432335',
'ext': 'flv',
'ext': 'mp4',
'title': '3Play Media Overview Video',
'description': '',
'thumbnail': 're:^https?://.*\.jpeg$',
'duration': 78.101,
'timestamp': 1338929955,
'upload_date': '20120605',
'subtitles': 'mincount:9',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True,

View File

@@ -1,16 +1,19 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from ..utils import (
smuggle_url,
url_basename,
update_url_query,
get_element_by_class,
)
class NationalGeographicIE(InfoExtractor):
IE_NAME = 'natgeo'
class NationalGeographicVideoIE(InfoExtractor):
IE_NAME = 'natgeo:video'
_VALID_URL = r'https?://video\.nationalgeographic\.com/.*?'
_TESTS = [
@@ -62,16 +65,16 @@ class NationalGeographicIE(InfoExtractor):
}
class NationalGeographicChannelIE(ThePlatformIE):
IE_NAME = 'natgeo:channel'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/videos/(?P<id>[^/?]+)'
class NationalGeographicIE(ThePlatformIE):
IE_NAME = 'natgeo'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/(?:videos|episodes)/(?P<id>[^/?]+)'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
'md5': '518c9aa655686cf81493af5cc21e2a04',
'info_dict': {
'id': 'nB5vIAfmyllm',
'id': 'vKInpacll2pC',
'ext': 'mp4',
'title': 'Uncovering a Universal Knowledge',
'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
@@ -85,7 +88,7 @@ class NationalGeographicChannelIE(ThePlatformIE):
'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
'md5': 'c4912f656b4cbe58f3e000c489360989',
'info_dict': {
'id': '3TmMv9OvGwIR',
'id': 'Pok5lWCkiEFA',
'ext': 'mp4',
'title': 'The Stunning Red Bird of Paradise',
'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
@@ -95,6 +98,10 @@ class NationalGeographicChannelIE(ThePlatformIE):
},
'add_ie': ['ThePlatform'],
},
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/',
'only_matching': True,
}
]
def _real_extract(self, url):
@@ -122,3 +129,40 @@ class NationalGeographicChannelIE(ThePlatformIE):
{'force_smil_url': True}),
'display_id': display_id,
}
class NationalGeographicEpisodeGuideIE(ThePlatformIE):
IE_NAME = 'natgeo:episodeguide'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episode-guide/',
'info_dict': {
'id': 'the-story-of-god-with-morgan-freeman-season-1',
'title': 'The Story of God with Morgan Freeman - Season 1',
},
'playlist_mincount': 6,
},
{
'url': 'http://channel.nationalgeographic.com/underworld-inc/episode-guide/?s=2',
'info_dict': {
'id': 'underworld-inc-season-2',
'title': 'Underworld, Inc. - Season 2',
},
'playlist_mincount': 7,
},
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
show = get_element_by_class('show', webpage)
selected_season = self._search_regex(
r'<div[^>]+class="select-seasons[^"]*".*?<a[^>]*>(.*?)</a>',
webpage, 'selected season')
entries = [
self.url_result(self._proto_relative_url(entry_url), 'NationalGeographic')
for entry_url in re.findall('(?s)<div[^>]+class="col-inner"[^>]*?>.*?<a[^>]+href="([^"]+)"', webpage)]
return self.playlist_result(
entries, '%s-%s' % (display_id, selected_season.lower().replace(' ', '-')),
'%s - %s' % (show, selected_season))

View File

@@ -4,12 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
update_url_query,
)
@@ -51,48 +49,74 @@ class NaverIE(InfoExtractor):
if error:
raise ExtractorError(error, expected=True)
raise ExtractorError('couldn\'t extract vid and key')
vid = m_id.group(1)
key = m_id.group(2)
query = compat_urllib_parse_urlencode({'vid': vid, 'inKey': key, })
query_urls = compat_urllib_parse_urlencode({
'masterVid': vid,
'protocol': 'p2p',
'inKey': key,
})
info = self._download_xml(
'http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?' + query,
video_id, 'Downloading video info')
urls = self._download_xml(
'http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?' + query_urls,
video_id, 'Downloading video formats info')
video_data = self._download_json(
'http://play.rmcnmv.naver.com/vod/play/v2.0/' + m_id.group(1),
video_id, query={
'key': m_id.group(2),
})
meta = video_data['meta']
title = meta['subject']
formats = []
for format_el in urls.findall('EncodingOptions/EncodingOption'):
domain = format_el.find('Domain').text
uri = format_el.find('uri').text
f = {
'url': compat_urlparse.urljoin(domain, uri),
'ext': 'mp4',
'width': int(format_el.find('width').text),
'height': int(format_el.find('height').text),
}
if domain.startswith('rtmp'):
# urlparse does not support custom schemes
# https://bugs.python.org/issue18828
f.update({
'url': domain + uri,
'ext': 'flv',
'rtmp_protocol': '1', # rtmpt
def extract_formats(streams, stream_type, query={}):
for stream in streams:
stream_url = stream.get('source')
if not stream_url:
continue
stream_url = update_url_query(stream_url, query)
encoding_option = stream.get('encodingOption', {})
bitrate = stream.get('bitrate', {})
formats.append({
'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
'url': stream_url,
'width': int_or_none(encoding_option.get('width')),
'height': int_or_none(encoding_option.get('height')),
'vbr': int_or_none(bitrate.get('video')),
'abr': int_or_none(bitrate.get('audio')),
'filesize': int_or_none(stream.get('size')),
'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
})
formats.append(f)
extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
for stream_set in video_data.get('streams', []):
query = {}
for param in stream_set.get('keys', []):
query[param['name']] = param['value']
stream_type = stream_set.get('type')
videos = stream_set.get('videos')
if videos:
extract_formats(videos, stream_type, query)
elif stream_type == 'HLS':
stream_url = stream_set.get('source')
if not stream_url:
continue
formats.extend(self._extract_m3u8_formats(
update_url_query(stream_url, query), video_id,
'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
self._sort_formats(formats)
subtitles = {}
for caption in video_data.get('captions', {}).get('list', []):
caption_url = caption.get('source')
if not caption_url:
continue
subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
'url': caption_url,
})
upload_date = self._search_regex(
r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
webpage, 'upload date', fatal=False)
if upload_date:
upload_date = upload_date.replace('.', '')
return {
'id': video_id,
'title': info.find('Subject').text,
'title': title,
'formats': formats,
'subtitles': subtitles,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'upload_date': info.find('WriteDate').text.replace('.', ''),
'view_count': int(info.find('PlayCount').text),
'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
'view_count': int_or_none(meta.get('count')),
'upload_date': upload_date,
}

View File

@@ -11,70 +11,64 @@ from ..utils import (
class NTVRuIE(InfoExtractor):
IE_NAME = 'ntv.ru'
_VALID_URL = r'https?://(?:www\.)?ntv\.ru/(?P<id>.+)'
_VALID_URL = r'https?://(?:www\.)?ntv\.ru/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [
{
'url': 'http://www.ntv.ru/novosti/863142/',
'md5': 'ba7ea172a91cb83eb734cad18c10e723',
'info_dict': {
'id': '746000',
'ext': 'mp4',
'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'thumbnail': 're:^http://.*\.jpg',
'duration': 136,
},
_TESTS = [{
'url': 'http://www.ntv.ru/novosti/863142/',
'md5': 'ba7ea172a91cb83eb734cad18c10e723',
'info_dict': {
'id': '746000',
'ext': 'mp4',
'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'thumbnail': 're:^http://.*\.jpg',
'duration': 136,
},
{
'url': 'http://www.ntv.ru/video/novosti/750370/',
'md5': 'adecff79691b4d71e25220a191477124',
'info_dict': {
'id': '750370',
'ext': 'mp4',
'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'thumbnail': 're:^http://.*\.jpg',
'duration': 172,
},
}, {
'url': 'http://www.ntv.ru/video/novosti/750370/',
'md5': 'adecff79691b4d71e25220a191477124',
'info_dict': {
'id': '750370',
'ext': 'mp4',
'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'thumbnail': 're:^http://.*\.jpg',
'duration': 172,
},
{
'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416',
'md5': '82dbd49b38e3af1d00df16acbeab260c',
'info_dict': {
'id': '747480',
'ext': 'mp4',
'title': '«Сегодня». 21 марта 2014 года. 16:00',
'description': '«Сегодня». 21 марта 2014 года. 16:00',
'thumbnail': 're:^http://.*\.jpg',
'duration': 1496,
},
}, {
'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416',
'md5': '82dbd49b38e3af1d00df16acbeab260c',
'info_dict': {
'id': '747480',
'ext': 'mp4',
'title': '«Сегодня». 21 марта 2014 года. 16:00',
'description': '«Сегодня». 21 марта 2014 года. 16:00',
'thumbnail': 're:^http://.*\.jpg',
'duration': 1496,
},
{
'url': 'http://www.ntv.ru/kino/Koma_film',
'md5': 'f825770930937aa7e5aca0dc0d29319a',
'info_dict': {
'id': '1007609',
'ext': 'mp4',
'title': 'Остросюжетный фильм «Кома»',
'description': 'Остросюжетный фильм «Кома»',
'thumbnail': 're:^http://.*\.jpg',
'duration': 5592,
},
}, {
'url': 'http://www.ntv.ru/kino/Koma_film',
'md5': 'f825770930937aa7e5aca0dc0d29319a',
'info_dict': {
'id': '1007609',
'ext': 'mp4',
'title': 'Остросюжетный фильм «Кома»',
'description': 'Остросюжетный фильм «Кома»',
'thumbnail': 're:^http://.*\.jpg',
'duration': 5592,
},
{
'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/',
'md5': '9320cd0e23f3ea59c330dc744e06ff3b',
'info_dict': {
'id': '751482',
'ext': 'mp4',
'title': '«Дело врачей»: «Деревце жизни»',
'description': '«Дело врачей»: «Деревце жизни»',
'thumbnail': 're:^http://.*\.jpg',
'duration': 2590,
},
}, {
'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/',
'md5': '9320cd0e23f3ea59c330dc744e06ff3b',
'info_dict': {
'id': '751482',
'ext': 'mp4',
'title': '«Дело врачей»: «Деревце жизни»',
'description': '«Дело врачей»: «Деревце жизни»',
'thumbnail': 're:^http://.*\.jpg',
'duration': 2590,
},
]
}]
_VIDEO_ID_REGEXES = [
r'<meta property="og:url" content="http://www\.ntv\.ru/video/(\d+)',
@@ -87,11 +81,21 @@ class NTVRuIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_id = self._html_search_regex(self._VIDEO_ID_REGEXES, webpage, 'video id')
video_url = self._og_search_property(
('video', 'video:iframe'), webpage, default=None)
if video_url:
video_id = self._search_regex(
r'https?://(?:www\.)?ntv\.ru/video/(?:embed/)?(\d+)',
video_url, 'video id', default=None)
if not video_id:
video_id = self._html_search_regex(
self._VIDEO_ID_REGEXES, webpage, 'video id')
player = self._download_xml(
'http://www.ntv.ru/vi%s/' % video_id,
video_id, 'Downloading video XML')
title = clean_html(xpath_text(player, './data/title', 'title', fatal=True))
description = clean_html(xpath_text(player, './data/description', 'description'))

View File

@@ -1,15 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
from __future__ import unicode_literals, division
import re
import math
from .common import InfoExtractor
from ..compat import compat_chr
from ..utils import (
decode_png,
determine_ext,
encode_base_n,
ExtractorError,
mimetype2ext,
)
@@ -41,60 +40,6 @@ class OpenloadIE(InfoExtractor):
'only_matching': True,
}]
@staticmethod
def openload_level2_debase(m):
radix, num = int(m.group(1)) + 27, int(m.group(2))
return '"' + encode_base_n(num, radix) + '"'
@classmethod
def openload_level2(cls, txt):
# The function name is ǃ \u01c3
# Using escaped unicode literals does not work in Python 3.2
return re.sub(r'ǃ\((\d+),(\d+)\)', cls.openload_level2_debase, txt, re.UNICODE).replace('"+"', '')
# Openload uses a variant of aadecode
# openload_decode and related functions are originally written by
# vitas@matfyz.cz and released with public domain
# See https://github.com/rg3/youtube-dl/issues/8489
@classmethod
def openload_decode(cls, txt):
symbol_table = [
('_', '(゚Д゚) [゚Θ゚]'),
('a', '(゚Д゚) [゚ω゚ノ]'),
('b', '(゚Д゚) [゚Θ゚ノ]'),
('c', '(゚Д゚) [\'c\']'),
('d', '(゚Д゚) [゚ー゚ノ]'),
('e', '(゚Д゚) [゚Д゚ノ]'),
('f', '(゚Д゚) [1]'),
('o', '(゚Д゚) [\'o\']'),
('u', '(o゚ー゚o)'),
('c', '(゚Д゚) [\'c\']'),
('7', '((゚ー゚) + (o^_^o))'),
('6', '((o^_^o) +(o^_^o) +(c^_^o))'),
('5', '((゚ー゚) + (゚Θ゚))'),
('4', '(-~3)'),
('3', '(-~-~1)'),
('2', '(-~1)'),
('1', '(-~0)'),
('0', '((c^_^o)-(c^_^o))'),
]
delim = '(゚Д゚)[゚ε゚]+'
ret = ''
for aachar in txt.split(delim):
for val, pat in symbol_table:
aachar = aachar.replace(pat, val)
aachar = aachar.replace('+ ', '')
m = re.match(r'^\d+', aachar)
if m:
ret += compat_chr(int(m.group(0), 8))
else:
m = re.match(r'^u([\da-f]+)', aachar)
if m:
ret += compat_chr(int(m.group(1), 16))
return cls.openload_level2(ret)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@@ -102,29 +47,77 @@ class OpenloadIE(InfoExtractor):
if 'File not found' in webpage:
raise ExtractorError('File not found', expected=True)
code = self._search_regex(
r'</video>\s*</div>\s*<script[^>]+>[^>]+</script>\s*<script[^>]+>([^<]+)</script>',
webpage, 'JS code')
# The following extraction logic is proposed by @Belderak and @gdkchan
# and declared to be used freely in youtube-dl
# See https://github.com/rg3/youtube-dl/issues/9706
decoded = self.openload_decode(code)
numbers_js = self._download_webpage(
'https://openload.co/assets/js/obfuscator/n.js', video_id,
note='Downloading signature numbers')
signums = self._search_regex(
r'window\.signatureNumbers\s*=\s*[\'"](?P<data>[a-z]+)[\'"]',
numbers_js, 'signature numbers', group='data')
video_url = self._search_regex(
r'return\s+"(https?://[^"]+)"', decoded, 'video URL')
linkimg_uri = self._search_regex(
r'<img[^>]+id="linkimg"[^>]+src="([^"]+)"', webpage, 'link image')
linkimg = self._request_webpage(
linkimg_uri, video_id, note=False).read()
width, height, pixels = decode_png(linkimg)
output = ''
for y in range(height):
for x in range(width):
r, g, b = pixels[y][3 * x:3 * x + 3]
if r == 0 and g == 0 and b == 0:
break
else:
output += compat_chr(r)
output += compat_chr(g)
output += compat_chr(b)
img_str_length = len(output) // 200
img_str = [[0 for x in range(img_str_length)] for y in range(10)]
sig_str_length = len(signums) // 260
sig_str = [[0 for x in range(sig_str_length)] for y in range(10)]
for i in range(10):
for j in range(img_str_length):
begin = i * img_str_length * 20 + j * 20
img_str[i][j] = output[begin:begin + 20]
for j in range(sig_str_length):
begin = i * sig_str_length * 26 + j * 26
sig_str[i][j] = signums[begin:begin + 26]
parts = []
# TODO: find better names for str_, chr_ and sum_
str_ = ''
for i in [2, 3, 5, 7]:
str_ = ''
sum_ = float(99)
for j in range(len(sig_str[i])):
for chr_idx in range(len(img_str[i][j])):
if sum_ > float(122):
sum_ = float(98)
chr_ = compat_chr(int(math.floor(sum_)))
if sig_str[i][j][chr_idx] == chr_ and j >= len(str_):
sum_ += float(2.5)
str_ += img_str[i][j][chr_idx]
parts.append(str_.replace(',', ''))
video_url = 'https://openload.co/stream/%s~%s~%s~%s' % (parts[3], parts[1], parts[2], parts[0])
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True)
ext = mimetype2ext(self._search_regex(
r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded,
'mimetype', default=None, group='mimetype')) or determine_ext(
video_url, 'mp4')
return {
'id': video_id,
'title': title,
'ext': ext,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'url': video_url,
# Seems all videos have extensions in their titles
'ext': determine_ext(title),
}

View File

@@ -0,0 +1,58 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
)
class PokemonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'
_TESTS = [{
'url': 'http://www.pokemon.com/us/pokemon-episodes/19_01-from-a-to-z/?play=true',
'md5': '9fb209ae3a569aac25de0f5afc4ee08f',
'info_dict': {
'id': 'd0436c00c3ce4071ac6cee8130ac54a1',
'ext': 'mp4',
'title': 'From A to Z!',
'description': 'Bonnie makes a new friend, Ash runs into an old friend, and a terrifying premonition begins to unfold!',
'timestamp': 1460478136,
'upload_date': '20160412',
},
'add_id': ['LimelightMedia']
}, {
'url': 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2',
'only_matching': True,
}, {
'url': 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/',
'only_matching': True,
}, {
'url': 'http://www.pokemon.com/de/pokemon-folgen/01_20-bye-bye-smettbo/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id or display_id)
video_data = extract_attributes(self._search_regex(
r'(<[^>]+data-video-id="%s"[^>]*>)' % (video_id if video_id else '[a-z0-9]{32}'),
webpage, 'video data element'))
video_id = video_data['data-video-id']
title = video_data['data-video-title']
return {
'_type': 'url_transparent',
'id': video_id,
'url': 'limelight:media:%s' % video_id,
'title': title,
'description': video_data.get('data-video-summary'),
'thumbnail': video_data.get('data-video-poster'),
'series': 'Pokémon',
'season_number': int_or_none(video_data.get('data-video-season')),
'episode': title,
'episode_number': int_or_none(video_data.get('data-video-episode')),
'ie_key': 'LimelightMedia',
}

View File

@@ -1,55 +1,71 @@
# encoding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
clean_html,
int_or_none,
unified_timestamp,
update_url_query,
)
class RBMARadioIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<videoID>[^/]+)$'
_VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<show_id>[^/]+)/episodes/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.rbmaradio.com/shows/ford-lopatin-live-at-primavera-sound-2011',
'url': 'https://www.rbmaradio.com/shows/main-stage/episodes/ford-lopatin-live-at-primavera-sound-2011',
'md5': '6bc6f9bcb18994b4c983bc3bf4384d95',
'info_dict': {
'id': 'ford-lopatin-live-at-primavera-sound-2011',
'ext': 'mp3',
'uploader_id': 'ford-lopatin',
'location': 'Spain',
'description': 'Joel Ford and Daniel Oneohtrix Point Never Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.',
'uploader': 'Ford & Lopatin',
'title': 'Live at Primavera Sound 2011',
'title': 'Main Stage - Ford & Lopatin',
'description': 'md5:4f340fb48426423530af5a9d87bd7b91',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 2452,
'timestamp': 1307103164,
'upload_date': '20110603',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('videoID')
mobj = re.match(self._VALID_URL, url)
show_id = mobj.group('show_id')
episode_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, episode_id)
json_data = self._search_regex(r'window\.gon.*?gon\.show=(.+?);$',
webpage, 'json data', flags=re.MULTILINE)
episode = self._parse_json(
self._search_regex(
r'__INITIAL_STATE__\s*=\s*({.+?})\s*</script>',
webpage, 'json data'),
episode_id)['episodes'][show_id][episode_id]
try:
data = json.loads(json_data)
except ValueError as e:
raise ExtractorError('Invalid JSON: ' + str(e))
title = episode['title']
video_url = data['akamai_url'] + '&cbr=256'
show_title = episode.get('showTitle')
if show_title:
title = '%s - %s' % (show_title, title)
formats = [{
'url': update_url_query(episode['audioURL'], query={'cbr': abr}),
'format_id': compat_str(abr),
'abr': abr,
'vcodec': 'none',
} for abr in (96, 128, 256)]
description = clean_html(episode.get('longTeaser'))
thumbnail = self._proto_relative_url(episode.get('imageURL', {}).get('landscape'))
duration = int_or_none(episode.get('duration'))
timestamp = unified_timestamp(episode.get('publishedAt'))
return {
'id': video_id,
'url': video_url,
'title': data['title'],
'description': data.get('teaser_text'),
'location': data.get('country_of_origin'),
'uploader': data.get('host', {}).get('name'),
'uploader_id': data.get('host', {}).get('slug'),
'thumbnail': data.get('image', {}).get('large_url_2x'),
'duration': data.get('duration'),
'id': episode_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}

View File

@@ -0,0 +1,50 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
remove_start,
)
class RozhlasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?prehravac\.rozhlas\.cz/audio/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://prehravac.rozhlas.cz/audio/3421320',
'md5': '504c902dbc9e9a1fd50326eccf02a7e2',
'info_dict': {
'id': '3421320',
'ext': 'mp3',
'title': 'Echo Pavla Klusáka (30.06.2015 21:00)',
'description': 'Osmdesátiny Terryho Rileyho jsou skvělou příležitostí proletět se elektronickými i akustickými díly zakladatatele minimalismu, který je aktivní už přes padesát let'
}
}, {
'url': 'http://prehravac.rozhlas.cz/audio/3421320/embed',
'skip_download': True,
}]
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(
'http://prehravac.rozhlas.cz/audio/%s' % audio_id, audio_id)
title = self._html_search_regex(
r'<h3>(.+?)</h3>\s*<p[^>]*>.*?</p>\s*<div[^>]+id=["\']player-track',
webpage, 'title', default=None) or remove_start(
self._og_search_title(webpage), 'Radio Wave - ')
description = self._html_search_regex(
r'<p[^>]+title=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>.*?</p>\s*<div[^>]+id=["\']player-track',
webpage, 'description', fatal=False, group='url')
duration = int_or_none(self._search_regex(
r'data-duration=["\'](\d+)', webpage, 'duration', default=None))
return {
'id': audio_id,
'url': 'http://media.rozhlas.cz/_audio/%s.mp3' % audio_id,
'title': title,
'description': description,
'duration': duration,
'vcodec': 'none',
}

View File

@@ -113,6 +113,8 @@ class RTVEALaCartaIE(InfoExtractor):
png = self._download_webpage(png_request, video_id, 'Downloading url information')
video_url = _decrypt_url(png)
if not video_url.endswith('.f4m'):
if '?' not in video_url:
video_url = video_url.replace('resources/', 'auth/resources/')
video_url = video_url.replace('.net.rtve', '.multimedia.cdn.rtve')
subtitles = None

View File

@@ -75,7 +75,7 @@ class SafariBaseIE(InfoExtractor):
class SafariIE(SafariBaseIE):
IE_NAME = 'safari'
IE_DESC = 'safaribooksonline.com online video'
_VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>part\d+)\.html'
_VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>[^/?#&]+)\.html'
_TESTS = [{
'url': 'https://www.safaribooksonline.com/library/view/hadoop-fundamentals-livelessons/9780133392838/part00.html',
@@ -92,6 +92,9 @@ class SafariIE(SafariBaseIE):
# non-digits in course id
'url': 'https://www.safaribooksonline.com/library/view/create-a-nodejs/100000006A0210/part00.html',
'only_matching': True,
}, {
'url': 'https://www.safaribooksonline.com/library/view/learning-path-red/9780134664057/RHCE_Introduction.html',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -132,12 +135,15 @@ class SafariIE(SafariBaseIE):
class SafariApiIE(SafariBaseIE):
IE_NAME = 'safari:api'
_VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>part\d+)\.html'
_VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>[^/?#&]+)\.html'
_TEST = {
_TESTS = [{
'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
'only_matching': True,
}
}, {
'url': 'https://www.safaribooksonline.com/api/v1/book/9780134664057/chapter/RHCE_Introduction.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@@ -6,7 +6,6 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
sanitized_Request,
urlencode_postdata,
)
@@ -37,28 +36,33 @@ class SharedIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage, urlh = self._download_webpage_handle(url, video_id)
if '>File does not exist<' in webpage:
raise ExtractorError(
'Video %s does not exist' % video_id, expected=True)
download_form = self._hidden_inputs(webpage)
request = sanitized_Request(
url, urlencode_postdata(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
video_page = self._download_webpage(
request, video_id, 'Downloading video page')
urlh.geturl(), video_id, 'Downloading video page',
data=urlencode_postdata(download_form),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': urlh.geturl(),
})
video_url = self._html_search_regex(
r'data-url="([^"]+)"', video_page, 'video URL')
r'data-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
video_page, 'video URL', group='url')
title = base64.b64decode(self._html_search_meta(
'full:title', webpage, 'title').encode('utf-8')).decode('utf-8')
filesize = int_or_none(self._html_search_meta(
'full:size', webpage, 'file size', fatal=False))
thumbnail = self._html_search_regex(
r'data-poster="([^"]+)"', video_page, 'thumbnail', default=None)
r'data-poster=(["\'])(?P<url>(?:(?!\1).)+)\1',
video_page, 'thumbnail', default=None, group='url')
return {
'id': video_id,

View File

@@ -14,10 +14,10 @@ from ..utils import ExtractorError
class SohuIE(InfoExtractor):
_VALID_URL = r'https?://(?P<mytv>my\.)?tv\.sohu\.com/.+?/(?(mytv)|n)(?P<id>\d+)\.shtml.*?'
# Sohu videos give different MD5 sums on Travis CI and my machine
_TESTS = [{
'note': 'This video is available only in Mainland China',
'url': 'http://tv.sohu.com/20130724/n382479172.shtml#super',
'md5': '29175c8cadd8b5cc4055001e85d6b372',
'info_dict': {
'id': '382479172',
'ext': 'mp4',
@@ -26,7 +26,6 @@ class SohuIE(InfoExtractor):
'skip': 'On available in China',
}, {
'url': 'http://tv.sohu.com/20150305/n409385080.shtml',
'md5': '699060e75cf58858dd47fb9c03c42cfb',
'info_dict': {
'id': '409385080',
'ext': 'mp4',
@@ -34,7 +33,6 @@ class SohuIE(InfoExtractor):
}
}, {
'url': 'http://my.tv.sohu.com/us/232799889/78693464.shtml',
'md5': '9bf34be48f2f4dadcb226c74127e203c',
'info_dict': {
'id': '78693464',
'ext': 'mp4',
@@ -48,7 +46,6 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
},
'playlist': [{
'md5': 'bdbfb8f39924725e6589c146bc1883ad',
'info_dict': {
'id': '78910339_part1',
'ext': 'mp4',
@@ -56,7 +53,6 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
}
}, {
'md5': '3e1f46aaeb95354fd10e7fca9fc1804e',
'info_dict': {
'id': '78910339_part2',
'ext': 'mp4',
@@ -64,7 +60,6 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
}
}, {
'md5': '8407e634175fdac706766481b9443450',
'info_dict': {
'id': '78910339_part3',
'ext': 'mp4',

View File

@@ -0,0 +1,34 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class SonyLIVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?sonyliv\.com/details/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight",
'info_dict': {
'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight",
'id': '5024612095001',
'ext': 'mp4',
'upload_date': '20160707',
'description': 'md5:7f28509a148d5be9d0782b4d5106410d',
'uploader_id': '4338955589001',
'timestamp': 1467870968,
},
'params': {
'skip_download': True,
},
'add_ie': ['BrightcoveNew'],
}, {
'url': 'http://www.sonyliv.com/details/full%20movie/4951168986001/Sei-Raat-(Bangla)',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
brightcove_id = self._match_id(url)
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)

View File

@@ -119,6 +119,12 @@ class SoundcloudIE(InfoExtractor):
_CLIENT_ID = '02gUJC0hH2ct1EGOcYXQIzRFU91c72Ea'
_IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
@staticmethod
def _extract_urls(webpage):
return [m.group('url') for m in re.finditer(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:w\.)?soundcloud\.com/player.+?)\1',
webpage)]
def report_resolve(self, video_id):
"""Report information extraction."""
self.to_screen('%s: Resolving id' % video_id)

View File

@@ -118,8 +118,12 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
xpath_text(cfg_xml, './startThumb', 'thumbnail'), 'http:')
thumbnails = self._extract_thumbnails(cfg_xml)
title = self._html_search_regex(
self._TITLE_REGEX, webpage, 'title') if self._TITLE_REGEX else self._og_search_title(webpage)
title = None
if self._TITLE_REGEX:
title = self._html_search_regex(
self._TITLE_REGEX, webpage, 'title', default=None)
if not title:
title = self._og_search_title(webpage)
age_limit = self._rta_search(webpage) or 18
@@ -189,9 +193,9 @@ class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
class TNAFlixIE(TNAFlixNetworkBaseIE):
_VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
_TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
_DESCRIPTION_REGEX = r'<meta[^>]+name="description"[^>]+content="([^"]+)"'
_UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h1>(.+?)</h1>'
_TITLE_REGEX = r'<title>(.+?) - (?:TNAFlix Porn Videos|TNAFlix\.com)</title>'
_DESCRIPTION_REGEX = r'(?s)>Description:</[^>]+>(.+?)<'
_UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h\d+>(.+?)<'
_CATEGORIES_REGEX = r'(?s)<span[^>]*>Categories:</span>(.+?)</div>'
_TESTS = [{

View File

@@ -8,6 +8,7 @@ from ..utils import (
determine_ext,
int_or_none,
float_or_none,
js_to_json,
parse_iso8601,
remove_end,
)
@@ -54,10 +55,11 @@ class TV2IE(InfoExtractor):
ext = determine_ext(video_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id=format_id))
video_url, video_id, f4m_id=format_id, fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id=format_id))
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'ism' or video_url.endswith('.ism/Manifest'):
pass
else:
@@ -105,7 +107,7 @@ class TV2ArticleIE(InfoExtractor):
'url': 'http://www.tv2.no/2015/05/16/nyheter/alesund/krim/pingvin/6930542',
'info_dict': {
'id': '6930542',
'title': 'Russen hetses etter pingvintyveri innrømmer å ha åpnet luken på buret',
'title': 'Russen hetses etter pingvintyveri - innrømmer å ha åpnet luken på buret',
'description': 'md5:339573779d3eea3542ffe12006190954',
},
'playlist_count': 2,
@@ -119,9 +121,23 @@ class TV2ArticleIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id)
# Old embed pattern (looks unused nowadays)
assets = re.findall(r'data-assetid=["\'](\d+)', webpage)
if not assets:
# New embed pattern
for v in re.findall('TV2ContentboxVideo\(({.+?})\)', webpage):
video = self._parse_json(
v, playlist_id, transform_source=js_to_json, fatal=False)
if not video:
continue
asset = video.get('assetId')
if asset:
assets.append(asset)
entries = [
self.url_result('http://www.tv2.no/v/%s' % video_id, 'TV2')
for video_id in re.findall(r'data-assetid="(\d+)"', webpage)]
self.url_result('http://www.tv2.no/v/%s' % asset_id, 'TV2')
for asset_id in assets]
title = remove_end(self._og_search_title(webpage), ' - TV2.no')
description = remove_end(self._og_search_description(webpage), ' - TV2.no')

View File

@@ -24,6 +24,7 @@ class TVPIE(InfoExtractor):
'id': '194536',
'ext': 'mp4',
'title': 'Czas honoru, I seria odc. 13',
'description': 'md5:76649d2014f65c99477be17f23a4dead',
},
}, {
'url': 'http://www.tvp.pl/there-can-be-anything-so-i-shortened-it/17916176',
@@ -32,6 +33,16 @@ class TVPIE(InfoExtractor):
'id': '17916176',
'ext': 'mp4',
'title': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
'description': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
},
}, {
# page id is not the same as video id(#7799)
'url': 'http://vod.tvp.pl/22704887/08122015-1500',
'md5': 'cf6a4705dfd1489aef8deb168d6ba742',
'info_dict': {
'id': '22680786',
'ext': 'mp4',
'title': 'Wiadomości, 08.12.2015, 15:00',
},
}, {
'url': 'http://vod.tvp.pl/seriale/obyczajowe/na-sygnale/sezon-2-27-/odc-39/17834272',
@@ -53,6 +64,39 @@ class TVPIE(InfoExtractor):
'only_matching': True,
}]
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
video_id = self._search_regex([
r'<iframe[^>]+src="[^"]*?object_id=(\d+)',
"object_id\s*:\s*'(\d+)'"], webpage, 'video id')
return {
'_type': 'url_transparent',
'url': 'tvp:' + video_id,
'description': self._og_search_description(webpage, default=None),
'thumbnail': self._og_search_thumbnail(webpage),
'ie_key': 'TVPEmbed',
}
class TVPEmbedIE(InfoExtractor):
IE_NAME = 'tvp:embed'
IE_DESC = 'Telewizja Polska'
_VALID_URL = r'(?:tvp:|https?://[^/]+\.tvp\.(?:pl|info)/sess/tvplayer\.php\?.*?object_id=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.tvp.pl/sess/tvplayer.php?object_id=22670268',
'md5': '8c9cd59d16edabf39331f93bf8a766c7',
'info_dict': {
'id': '22670268',
'ext': 'mp4',
'title': 'Panorama, 07.12.2015, 15:40',
},
}, {
'url': 'tvp:22670268',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -4,13 +4,18 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_HTTPError,
compat_str,
compat_urlparse,
)
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
parse_iso8601,
qualities,
determine_ext,
update_url_query,
int_or_none,
)
@@ -34,6 +39,9 @@ class TVPlayIE(InfoExtractor):
'ext': 'mp4',
'title': 'Kādi ir īri? - Viņas melo labāk',
'description': 'Baiba apsmej īrus, kādi tie ir un ko viņi dara.',
'series': 'Viņas melo labāk',
'season': '2.sezona',
'season_number': 2,
'duration': 25,
'timestamp': 1406097056,
'upload_date': '20140723',
@@ -46,6 +54,10 @@ class TVPlayIE(InfoExtractor):
'ext': 'flv',
'title': 'Moterys meluoja geriau',
'description': 'md5:9aec0fc68e2cbc992d2a140bd41fa89e',
'series': 'Moterys meluoja geriau',
'episode_number': 47,
'season': '1 sezonas',
'season_number': 1,
'duration': 1330,
'timestamp': 1403769181,
'upload_date': '20140626',
@@ -196,12 +208,15 @@ class TVPlayIE(InfoExtractor):
title = video['title']
if video.get('is_geo_blocked'):
self.report_warning(
'This content might not be available in your country due to copyright reasons')
streams = self._download_json(
'http://playapi.mtgx.tv/v1/videos/stream/%s' % video_id, video_id, 'Downloading streams JSON')
try:
streams = self._download_json(
'http://playapi.mtgx.tv/v1/videos/stream/%s' % video_id,
video_id, 'Downloading streams JSON')
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
msg = self._parse_json(e.cause.read().decode('utf-8'), video_id)
raise ExtractorError(msg['msg'], expected=True)
raise
quality = qualities(['hls', 'medium', 'high'])
formats = []
@@ -226,7 +241,8 @@ class TVPlayIE(InfoExtractor):
'ext': ext,
}
if video_url.startswith('rtmp'):
m = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
m = re.search(
r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
if not m:
continue
fmt.update({
@@ -240,15 +256,41 @@ class TVPlayIE(InfoExtractor):
'url': video_url,
})
formats.append(fmt)
if not formats and video.get('is_geo_blocked'):
self.raise_geo_restricted(
'This content might not be available in your country due to copyright reasons')
self._sort_formats(formats)
# TODO: webvtt in m3u8
subtitles = {}
sami_path = video.get('sami_path')
if sami_path:
lang = self._search_regex(
r'_([a-z]{2})\.xml', sami_path, 'lang',
default=compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1])
subtitles[lang] = [{
'url': sami_path,
}]
series = video.get('format_title')
episode_number = int_or_none(video.get('format_position', {}).get('episode'))
season = video.get('_embedded', {}).get('season', {}).get('title')
season_number = int_or_none(video.get('format_position', {}).get('season'))
return {
'id': video_id,
'title': title,
'description': video.get('description'),
'series': series,
'episode_number': episode_number,
'season': season,
'season_number': season_number,
'duration': int_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('created_at')),
'view_count': int_or_none(video.get('views', {}).get('total')),
'age_limit': int_or_none(video.get('age_limit', 0)),
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -461,7 +461,7 @@ class TwitchClipsIE(InfoExtractor):
IE_NAME = 'twitch:clips'
_VALID_URL = r'https?://clips\.twitch\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
_TESTS = [{
'url': 'https://clips.twitch.tv/ea/AggressiveCobraPoooound',
'md5': '761769e1eafce0ffebfb4089cb3847cd',
'info_dict': {
@@ -473,7 +473,11 @@ class TwitchClipsIE(InfoExtractor):
'uploader': 'stereotype_',
'uploader_id': 'stereotype_',
},
}
}, {
# multiple formats
'url': 'https://clips.twitch.tv/rflegendary/UninterestedBeeDAESuppy',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -485,15 +489,27 @@ class TwitchClipsIE(InfoExtractor):
r'(?s)clipInfo\s*=\s*({.+?});', webpage, 'clip info'),
video_id, transform_source=js_to_json)
video_url = clip['clip_video_url']
title = clip['channel_title']
title = clip.get('channel_title') or self._og_search_title(webpage)
formats = [{
'url': option['source'],
'format_id': option.get('quality'),
'height': int_or_none(option.get('quality')),
} for option in clip.get('quality_options', []) if option.get('source')]
if not formats:
formats = [{
'url': clip['clip_video_url'],
}]
self._sort_formats(formats)
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'creator': clip.get('broadcaster_display_name') or clip.get('broadcaster_login'),
'uploader': clip.get('curator_login'),
'uploader_id': clip.get('curator_display_name'),
'formats': formats,
}

128
youtube_dl/extractor/uol.py Normal file
View File

@@ -0,0 +1,128 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
parse_duration,
update_url_query,
str_or_none,
)
class UOLIE(InfoExtractor):
IE_NAME = 'uol.com.br'
_VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
_TESTS = [{
'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
'md5': '25291da27dc45e0afb5718a8603d3816',
'info_dict': {
'id': '15951931',
'ext': 'mp4',
'title': 'Miss simpatia é encontrada morta',
'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
}
}, {
'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9',
'info_dict': {
'id': '15954259',
'ext': 'mp4',
'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
}
}, {
'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
'only_matching': True,
}, {
'url': 'http://mais.uol.com.br/view/15954259',
'only_matching': True,
}, {
'url': 'http://noticias.band.uol.com.br/brasilurgente/video/2016/08/05/15951931/miss-simpatia-e-encontrada-morta.html',
'only_matching': True,
}, {
'url': 'http://videos.band.uol.com.br/programa.asp?e=noticias&pr=brasil-urgente&v=15951931&t=Policia-desmonte-base-do-PCC-na-Cracolandia',
'only_matching': True,
}, {
'url': 'http://mais.uol.com.br/view/cphaa0gl2x8r/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
'only_matching': True,
}, {
'url': 'http://noticias.uol.com.br//videos/assistir.htm?video=rafaela-silva-inspira-criancas-no-judo-04024D983968D4C95326',
'only_matching': True,
}, {
'url': 'http://mais.uol.com.br/view/e0qbgxid79uv/15275470',
'only_matching': True,
}]
_FORMATS = {
'2': {
'width': 640,
'height': 360,
},
'5': {
'width': 1080,
'height': 720,
},
'6': {
'width': 426,
'height': 240,
},
'7': {
'width': 1920,
'height': 1080,
},
'8': {
'width': 192,
'height': 144,
},
'9': {
'width': 568,
'height': 320,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
if not video_id.isdigit():
embed_page = self._download_webpage('https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id, video_id)
video_id = self._search_regex(r'mediaId=(\d+)', embed_page, 'media id')
video_data = self._download_json(
'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % video_id,
video_id)['item']
title = video_data['title']
query = {
'ver': video_data.get('numRevision', 2),
'r': 'http://mais.uol.com.br',
}
formats = []
for f in video_data.get('formats', []):
f_url = f.get('url') or f.get('secureUrl')
if not f_url:
continue
format_id = str_or_none(f.get('id'))
fmt = {
'format_id': format_id,
'url': update_url_query(f_url, query),
}
fmt.update(self._FORMATS.get(format_id, {}))
formats.append(fmt)
self._sort_formats(formats)
tags = []
for tag in video_data.get('tags', []):
tag_description = tag.get('description')
if not tag_description:
continue
tags.append(tag_description)
return {
'id': video_id,
'title': title,
'description': clean_html(video_data.get('desMedia')),
'thumbnail': video_data.get('thumbnail'),
'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')),
'tags': tags,
'formats': formats,
}

View File

@@ -0,0 +1,58 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unescapeHTML
class VODPlatformIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vod-platform\.net/embed/(?P<id>[^/?#]+)'
_TEST = {
# from http://www.lbcgroup.tv/watch/chapter/29143/52844/%D8%A7%D9%84%D9%86%D8%B5%D8%B1%D8%A9-%D9%81%D9%8A-%D8%B6%D9%8A%D8%A7%D9%81%D8%A9-%D8%A7%D9%84%D9%80-cnn/ar
'url': 'http://vod-platform.net/embed/RufMcytHDolTH1MuKHY9Fw',
'md5': '1db2b7249ce383d6be96499006e951fc',
'info_dict': {
'id': 'RufMcytHDolTH1MuKHY9Fw',
'ext': 'mp4',
'title': 'LBCi News_ النصرة في ضيافة الـ "سي.أن.أن"',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = unescapeHTML(self._og_search_title(webpage))
hidden_inputs = self._hidden_inputs(webpage)
base_url = self._search_regex(
'(.*/)(?:playlist.m3u8|manifest.mpd)',
hidden_inputs.get('HiddenmyhHlsLink') or hidden_inputs['HiddenmyDashLink'],
'base url')
formats = self._extract_m3u8_formats(
base_url + 'playlist.m3u8', video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
formats.extend(self._extract_mpd_formats(
base_url + 'manifest.mpd', video_id,
mpd_id='dash', fatal=False))
rtmp_formats = self._extract_smil_formats(
base_url + 'jwplayer.smil', video_id, fatal=False)
for rtmp_format in rtmp_formats:
rtsp_format = rtmp_format.copy()
rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
del rtsp_format['play_path']
del rtsp_format['ext']
rtsp_format.update({
'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
'protocol': 'rtsp',
})
formats.extend([rtmp_format, rtsp_format])
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': hidden_inputs.get('HiddenThumbnail') or self._og_search_thumbnail(webpage),
'formats': formats,
}

View File

@@ -75,6 +75,12 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
% storage_dir,
track_id, 'Downloading track location JSON')
# Each string is now wrapped in a list, this is probably only temporarily thus
# supporting both scenarios (see https://github.com/rg3/youtube-dl/issues/10193)
for k, v in data.items():
if v and isinstance(v, list):
data[k] = v[0]
key = hashlib.md5(('XGRlBW9FXlekgbPrRHuSiA' + data['path'][1:] + data['s']).encode('utf-8')).hexdigest()
storage = storage_dir.split('.')

View File

@@ -2,6 +2,7 @@ from __future__ import unicode_literals
import os.path
import optparse
import re
import sys
from .downloader.external import list_external_downloaders
@@ -93,8 +94,18 @@ def parseOpts(overrideArguments=None):
setattr(parser.values, option.dest, value.split(','))
def _hide_login_info(opts):
opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
PRIVATE_OPTS = ['-p', '--password', '-u', '--username', '--video-password']
eqre = re.compile('^(?P<key>' + ('|'.join(re.escape(po) for po in PRIVATE_OPTS)) + ')=.+$')
def _scrub_eq(o):
m = eqre.match(o)
if m:
return m.group('key') + '=PRIVATE'
else:
return o
opts = list(map(_scrub_eq, opts))
for private_opt in PRIVATE_OPTS:
try:
i = opts.index(private_opt)
opts[i + 1] = 'PRIVATE'
@@ -488,9 +499,20 @@ def parseOpts(overrideArguments=None):
dest='bidi_workaround', action='store_true',
help='Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH')
workarounds.add_option(
'--sleep-interval', metavar='SECONDS',
'--sleep-interval', '--min-sleep-interval', metavar='SECONDS',
dest='sleep_interval', type=float,
help='Number of seconds to sleep before each download.')
help=(
'Number of seconds to sleep before each download when used alone '
'or a lower bound of a range for randomized sleep before each download '
'(minimum possible number of seconds to sleep) when used along with '
'--max-sleep-interval.'))
workarounds.add_option(
'--max-sleep-interval', metavar='SECONDS',
dest='max_sleep_interval', type=float,
help=(
'Upper bound of a range for randomized sleep before each download '
'(maximum possible number of seconds to sleep). Must only be used '
'along with --min-sleep-interval.'))
verbosity = optparse.OptionGroup(parser, 'Verbosity / Simulation Options')
verbosity.add_option(

View File

@@ -3,11 +3,6 @@ from __future__ import unicode_literals
import re
from .common import PostProcessor
from ..utils import PostProcessingError
class MetadataFromTitlePPError(PostProcessingError):
pass
class MetadataFromTitlePP(PostProcessor):
@@ -38,7 +33,8 @@ class MetadataFromTitlePP(PostProcessor):
title = info['title']
match = re.match(self._titleregex, title)
if match is None:
raise MetadataFromTitlePPError('Could not interpret title of video as "%s"' % self._titleformat)
self._downloader.to_screen('[fromtitle] Could not interpret title of video as "%s"' % self._titleformat)
return [], info
for attribute, value in match.groupdict().items():
value = match.group(attribute)
info[attribute] = value

View File

@@ -47,6 +47,7 @@ from .compat import (
compat_socket_create_connection,
compat_str,
compat_struct_pack,
compat_struct_unpack,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlencode,
@@ -1101,7 +1102,7 @@ def unified_timestamp(date_str, day_first=True):
date_str = date_str.replace(',', ' ')
pm_delta = datetime.timedelta(hours=12 if re.search(r'(?i)PM', date_str) else 0)
pm_delta = 12 if re.search(r'(?i)PM', date_str) else 0
timezone, date_str = extract_timezone(date_str)
# Remove AM/PM + timezone
@@ -1109,13 +1110,13 @@ def unified_timestamp(date_str, day_first=True):
for expression in date_formats(day_first):
try:
dt = datetime.datetime.strptime(date_str, expression) - timezone + pm_delta
dt = datetime.datetime.strptime(date_str, expression) - timezone + datetime.timedelta(hours=pm_delta)
return calendar.timegm(dt.timetuple())
except ValueError:
pass
timetuple = email.utils.parsedate_tz(date_str)
if timetuple:
return calendar.timegm(timetuple.timetuple())
return calendar.timegm(timetuple) + pm_delta * 3600
def determine_ext(url, default_ext='unknown_video'):
@@ -1983,11 +1984,27 @@ US_RATINGS = {
}
TV_PARENTAL_GUIDELINES = {
'TV-Y': 0,
'TV-Y7': 7,
'TV-G': 0,
'TV-PG': 0,
'TV-14': 14,
'TV-MA': 17,
}
def parse_age_limit(s):
if s is None:
if type(s) == int:
return s if 0 <= s <= 21 else None
if not isinstance(s, compat_basestring):
return None
m = re.match(r'^(?P<age>\d{1,2})\+?$', s)
return int(m.group('age')) if m else US_RATINGS.get(s)
if m:
return int(m.group('age'))
if s in US_RATINGS:
return US_RATINGS[s]
return TV_PARENTAL_GUIDELINES.get(s)
def strip_jsonp(code):
@@ -2969,3 +2986,110 @@ def parse_m3u8_attributes(attrib):
def urshift(val, n):
return val >> n if val >= 0 else (val + 0x100000000) >> n
# Based on png2str() written by @gdkchan and improved by @yokrysty
# Originally posted at https://github.com/rg3/youtube-dl/issues/9706
def decode_png(png_data):
# Reference: https://www.w3.org/TR/PNG/
header = png_data[8:]
if png_data[:8] != b'\x89PNG\x0d\x0a\x1a\x0a' or header[4:8] != b'IHDR':
raise IOError('Not a valid PNG file.')
int_map = {1: '>B', 2: '>H', 4: '>I'}
unpack_integer = lambda x: compat_struct_unpack(int_map[len(x)], x)[0]
chunks = []
while header:
length = unpack_integer(header[:4])
header = header[4:]
chunk_type = header[:4]
header = header[4:]
chunk_data = header[:length]
header = header[length:]
header = header[4:] # Skip CRC
chunks.append({
'type': chunk_type,
'length': length,
'data': chunk_data
})
ihdr = chunks[0]['data']
width = unpack_integer(ihdr[:4])
height = unpack_integer(ihdr[4:8])
idat = b''
for chunk in chunks:
if chunk['type'] == b'IDAT':
idat += chunk['data']
if not idat:
raise IOError('Unable to read PNG data.')
decompressed_data = bytearray(zlib.decompress(idat))
stride = width * 3
pixels = []
def _get_pixel(idx):
x = idx % stride
y = idx // stride
return pixels[y][x]
for y in range(height):
basePos = y * (1 + stride)
filter_type = decompressed_data[basePos]
current_row = []
pixels.append(current_row)
for x in range(stride):
color = decompressed_data[1 + basePos + x]
basex = y * stride + x
left = 0
up = 0
if x > 2:
left = _get_pixel(basex - 3)
if y > 0:
up = _get_pixel(basex - stride)
if filter_type == 1: # Sub
color = (color + left) & 0xff
elif filter_type == 2: # Up
color = (color + up) & 0xff
elif filter_type == 3: # Average
color = (color + ((left + up) >> 1)) & 0xff
elif filter_type == 4: # Paeth
a = left
b = up
c = 0
if x > 2 and y > 0:
c = _get_pixel(basex - stride - 3)
p = a + b - c
pa = abs(p - a)
pb = abs(p - b)
pc = abs(p - c)
if pa <= pb and pa <= pc:
color = (color + a) & 0xff
elif pb <= pc:
color = (color + b) & 0xff
else:
color = (color + c) & 0xff
current_row.append(color)
return width, height, pixels

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.07.26.2'
__version__ = '2016.08.10'