Compare commits

...

111 Commits

Author SHA1 Message Date
Sergey M․
65c416dda8 release 2017.07.09 2017-07-09 20:16:38 +07:00
Sergey M․
207acd8465 [ChangeLog] Actualize 2017-07-09 20:15:15 +07:00
Sergey M․
71a1db8919 [dailymail] Add support for embeds 2017-07-09 20:06:24 +07:00
Sergey M․
6e925598d6 [csjw] Add coding cookie 2017-07-09 19:18:12 +07:00
Sergey M․
73cf76a93f [joj] Rewrite and add support for generic embeds (closes #13268) 2017-07-09 19:17:54 +07:00
luboss
256a746d21 [joj] Add extractor 2017-07-09 19:17:38 +07:00
Sergey M․
58179eb7d9 [abc.net.au:iview] Extract more formats (closes #13492, closes #13489) 2017-07-09 17:55:40 +07:00
Sergey M․
485cb37576 [egghead:course] Improve (closes #13370) 2017-07-09 17:30:49 +07:00
Santiago Calcagno
ed84454d35 [egghead:course] Fix extraction 2017-07-09 17:30:25 +07:00
Sergey M․
a02682fd13 Keep in sync with ffmpeg's current malformed AAC bitstream wording (closes #13587) 2017-07-09 17:09:44 +07:00
Sergey M․
0d2f0b0357 [csjw] Make description optional 2017-07-09 17:05:11 +07:00
Sergey M․
c319d1c483 [csjw] Fix issues and improve extraction (closes #13525) 2017-07-09 17:01:05 +07:00
Christopher Smith
d2b9f362fa [cjsw] Add extractor 2017-07-09 17:01:00 +07:00
Sergey M․
4328ddf82b [extractor/common] Add support for AMP tags in _parse_html5_media_entries 2017-07-09 16:29:52 +07:00
Sergey M․
250b042c7e [generic] Add tests for #13557 2017-07-09 16:02:38 +07:00
Sergey M․
665e945246 [eagleplatform] Add support for referrer protected videos (closes #13557) 2017-07-09 15:57:58 +07:00
Sergey M․
5af2fd7fa0 [eagleplatform] Add support for another embed pattern (#13557) 2017-07-09 15:55:04 +07:00
mlindner
15237fcd51 [veoh] Extend _VALID_URL 2017-07-09 14:54:52 +07:00
rrooij
7a57730907 [npo:live] Fix live stream id extraction (closes #13568) 2017-07-09 14:21:40 +07:00
Sergey M․
8b347a389e [googledrive] Fix height extraction (closes #13603) 2017-07-09 00:26:13 +07:00
Sergey M․
a49804816c [dailymotion] Add support for new layout (close #13580) 2017-07-08 18:12:15 +07:00
Yen Chi Hsuan
eadd313321 [yam] Remove extractor
mymedia.yam.com is dead. An wikipedia user also pointed out that Yam's
blog service is no longer available. [1]

[1] https://zh.wikipedia.org/zh-tw/%E5%A4%A9%E7%A9%BA%E9%83%A8%E8%90%BD
2017-07-08 15:48:05 +08:00
Sergey M․
d852c6bc59 [xhamster] Extract all formats and fix duration extraction (#13593) 2017-07-07 22:49:11 +07:00
Sergey M․
00e5c36315 [xhamster] Add support for new URL schema (closes #13593) 2017-07-07 22:27:34 +07:00
Sergey M․
8a04ade86b Credit @parmjitv for #13322, #13503, #13541, #13549 2017-07-06 23:15:23 +07:00
Sergey M․
ab328411d5 Credit @orng for ruv (#13396) 2017-07-06 23:15:16 +07:00
Sergey M․
ddeff4be3f Credit @gfabiano for #13382, #13385, #13415 2017-07-06 23:15:09 +07:00
Parmjit Virk
60d4401c5e [espn] Extend _VALID_URL (fixes #13244) 2017-07-06 22:55:59 +07:00
Sergey M․
dee2ff1d81 [test_utils] Fix tests under Windows 2017-07-06 00:25:37 +07:00
Sergey M․
6554708252 [kaltura] Fix typo in subtitles extraction (closes #13569) 2017-07-05 23:20:50 +07:00
Sergey M․
0a2e1b2e30 [vier] Adapt extraction to redesign (#13575) 2017-07-05 22:52:47 +07:00
Yen Chi Hsuan
babbc04d45 [xuite] Move to the new HTML5 API and reduce # of requests 2017-07-05 23:27:12 +08:00
Yen Chi Hsuan
609ff8ca19 [utils] Support attributes with no values in get_elements_by_attribute() 2017-07-05 23:27:12 +08:00
Sergey M․
b6c9fe4162 release 2017.07.02 2017-07-02 20:17:10 +07:00
Sergey M․
4d9ba27bba [ChangeLog] Actualize 2017-07-02 20:12:40 +07:00
Sergey M․
50ae3f646e [thisoldhouse] Add more fallbacks for video id (closes #13541) 2017-07-02 20:06:15 +07:00
Parmjit Virk
99a7e76240 [thisoldhouse] Update test 2017-07-02 20:05:11 +07:00
Parmjit Virk
a3a6d01a96 [thisoldhouse] Fix video id extraction (closes #13540) 2017-07-02 20:04:51 +07:00
Sergey M․
02d61a65e2 [xfileshare] Extend format regex (closes #13536) 2017-07-02 08:00:22 +07:00
Sergey M․
9b35297be1 [extractors] Add import for tastytrade 2017-07-01 18:39:29 +07:00
Sergey M․
4917478803 [ted] Fix extraction (closes #13535)) 2017-07-01 18:39:01 +07:00
Sergey M․
54faac2235 [tastytrade] Add extractor (closes #13521) 2017-06-30 22:20:30 +07:00
Sergey M․
c69701c6ab [extractor/common] Improve _json_ld 2017-06-30 22:19:06 +07:00
Sergey M․
d4f8ce6e91 [dplayit] Relax video id regex (closes #13524) 2017-06-30 21:55:45 +07:00
Sergey M․
b311b0ead2 [generic] Extract more generic metadata (closes #13527) 2017-06-30 21:42:04 +07:00
Sergey M․
72d256c434 [bbccouk] Extend _VALID_URL 2017-06-29 22:29:28 +07:00
Sergey M․
b2ed954fc6 [bbccouk] Capture and output error message (closes #13518) 2017-06-29 22:27:53 +07:00
Sergey M․
a919ca0ad6 [cbsnews] Actualize test 2017-06-28 22:30:12 +07:00
Parmjit Virk
88d6b7c2bd [cbsnews] Relax video info regex (fixes #13284) 2017-06-28 22:21:35 +07:00
Sergey M․
fd1c5fba6b [facebook] Add test for plugin video embed (#13493) 2017-06-27 22:38:59 +07:00
Sergey M․
0646e34c7d [facebook] Add support for plugin video embeds and multiple embeds (closes #13493) 2017-06-27 22:38:54 +07:00
Sergey M․
bf2dc9cc6e [soundcloud] Fix tests 2017-06-27 21:26:46 +07:00
Viktor Szakats
f1c051009b [soundcloud] Switch to https for API requests 2017-06-27 21:20:18 +07:00
Sergey M․
33ffb645a6 [pandatv] Switch to https for API and download URLs 2017-06-26 22:11:09 +07:00
Xuan Hu (Sean)
35544690e4 [pandatv] Add support for https URLs 2017-06-26 22:00:31 +07:00
Yen Chi Hsuan
136503e302 [ChangeLog] Update after #13494 2017-06-26 19:56:07 +08:00
Luca Steeb
4a87de72df [niconico] fix sp subdomain links 2017-06-25 21:30:05 +02:00
Sergey M․
a7ce8f16c4 release 2017.06.25 2017-06-25 05:16:06 +07:00
Sergey M․
a5aea53fc8 [ChangeLog] Actualize 2017-06-25 05:13:12 +07:00
Sergey M․
0c7a631b61 [adobepass] Add support for ATTOTT MSO (DIRECTV NOW) (closes #13472) 2017-06-25 05:03:17 +07:00
Sergey M․
fd9ee4de8c [wsj] Add support for barrons.com (closes #13470) 2017-06-25 02:15:35 +07:00
Argn0
5744cf6c03 [ign] Add another video id pattern (closes #13328) 2017-06-25 01:59:15 +07:00
Sergey M․
9c48b5a193 [raiplay:live] Improve and add test (closes #13414) 2017-06-25 01:49:27 +07:00
james
449c665776 [raiplay:live] Add extractor 2017-06-25 01:48:54 +07:00
Sergey M․
23aec3d623 [redbulltv] Restore hls format prefix 2017-06-25 01:10:31 +07:00
Sergey M․
27449ad894 [redbulltv] Add support for lives and segments (closes #13486)) 2017-06-25 01:09:12 +07:00
Sergey M․
bd65f18153 [onetpl] Add support for videos embedded via pulsembed (closes #13482) 2017-06-24 18:33:31 +07:00
Sergey M․
73af5cc817 [YoutubeDL] Skip malformed formats for better extraction robustness 2017-06-23 21:18:33 +07:00
Sergey M․
b5f523ed62 [ooyala] Add test for missing stream['url']['data'] 2017-06-23 20:56:48 +07:00
Sergey M․
4f4dd8d797 [ooyala] Make more robust 2017-06-23 20:56:21 +07:00
Sergey M․
4cb18ab1b9 [ooyala] Skip empty format URLs (closes #13471, closes #13476) 2017-06-23 20:50:48 +07:00
Sergey M․
ac7409eec5 [hgtv.com:show] Fix typo 2017-06-23 02:54:12 +07:00
Sergey M․
170719414d release 2017.06.23 2017-06-23 02:13:21 +07:00
Sergey M․
38dad4737f [ChangeLog] Actualize 2017-06-23 02:10:54 +07:00
Sergey M․
ddbb4c5c3e [youtube] Adapt to new automatic captions rendition (closes #13467) 2017-06-23 02:00:19 +07:00
Sergey M․
fa3ea7223a [hgtv.com:show] Relax video config regex and update test (closes #13279, closes #13461) 2017-06-23 00:42:42 +07:00
Parmjit Virk
0f4a5a73e7 [drtuber] Fix formats extraction (fixes 12058) 2017-06-23 00:08:36 +07:00
Sergey M․
18166bb8e8 [youporn] Fix upload date extraction 2017-06-22 00:47:02 +07:00
Sergey M․
d4893e764b [youporn] Improve formats extraction 2017-06-22 00:40:15 +07:00
Sergey M․
97b6e30113 [youporn] Fix title extraction (closes #13456) 2017-06-22 00:20:45 +07:00
Sergey M․
9be9ec5980 [googledrive] Fix formats' sorting (closes #13443) 2017-06-20 22:58:33 +07:00
Giuseppe Fabiano
048b55804d [watchindianporn] Fix extraction (closes #13411) 2017-06-20 04:30:45 +07:00
Giuseppe Fabiano
6ce79d7ac0 [abcotvs] Fix test md5 2017-06-20 04:07:00 +07:00
Sergey M․
1641ca402d [vimeo] Add fallback mp4 extension for original format 2017-06-20 01:27:59 +07:00
Sergey M․
85cbcede5b [ruv] Improve, extract all formats and metadata (closes #13396) 2017-06-19 23:46:03 +07:00
Orn
a1de83e5f0 [ruv] Add extractor 2017-06-19 23:45:45 +07:00
Sergey M․
fee00b3884 [viu] Fix extraction on older python 2.6 2017-06-19 22:57:37 +07:00
Sergey M․
2d2132ac6e [adobepass] Fix extraction on older python 2.6 2017-06-19 22:54:53 +07:00
Yen Chi Hsuan
cc2ffe5afe [pandora.tv] Fix upload_date extraction (closes #12846) 2017-06-19 16:20:36 +08:00
Sergey M․
560050669b [asiancrush] Add extractor (closes #13420) 2017-06-18 20:18:51 +07:00
Sergey M․
eaa006d1bd release 2017.06.18 2017-06-18 00:16:49 +07:00
Sergey M․
a6f29820c6 [ChangeLog] Actualize 2017-06-18 00:15:43 +07:00
Sergey M․
1433734c35 [downloader/common] Use utils.shell_quote for debug command line 2017-06-17 23:50:21 +07:00
Sergey M․
aefce8e6dc [utils] Use compat_shlex_quote in shell_quote 2017-06-17 23:48:58 +07:00
Sergey M․
8b6ac49ecc [postprocessor/execafterdownload] Encode command line (closes #13407) 2017-06-17 23:16:53 +07:00
Sergey M․
b08e235f09 [compat] Fix compat_shlex_quote on Windows (closes #5889, closes #10254) 2017-06-17 23:14:24 +07:00
Sergey M․
be80986ed9 [postprocessor/metadatafromtitle] Fix missing optional meta fields (closes #13408) 2017-06-17 19:05:10 +07:00
Yen Chi Hsuan
473e87064b [devscripts/prepare_manpage] Fix deprecated escape sequence on py36 2017-06-17 17:37:25 +08:00
Yen Chi Hsuan
4f90d2aeac [Makefile] Excluding __pycache__ correctly (#13400) 2017-06-17 17:09:24 +08:00
Jakub Adam Wieczorek
b230fefc3c [polskieradio] Fix extraction 2017-06-16 04:57:56 +07:00
Sergey M․
96a2daa1ee [extractor/common] Improve jwplayer subtitles extraction 2017-06-15 23:40:39 +07:00
gfabiano
0ea6efbb7a [xfileshare] Add support for fastvideo.me 2017-06-15 21:41:49 +07:00
Yen Chi Hsuan
6a9cb29509 [extractor/common] Fix json dumping with --geo-bypass
The line "[debug] Using fake IP %s (%s) as X-Forwarded-For." was printed
to stdout even with -j/-J, which breaks the resultant JSON.
2017-06-15 13:04:36 +08:00
Yen Chi Hsuan
ca27037171 [bilibili] Fix extraction of videos with double quotes in titles
Closes #13387
2017-06-15 11:19:03 +08:00
gfabiano
0bf4b71b75 [4tube] Fix extraction (closes #13381) 2017-06-15 04:16:50 +07:00
Marcin Cieślak
5215f45327 [disney] Add support for disneychannel.de 2017-06-15 04:13:04 +07:00
Sergey M․
0a268c6e11 [extractor/common] Improve jwplayer formats extraction (closes #13379) 2017-06-14 22:02:15 +07:00
Sergey M․
7dd5415cd0 [npo] Improve _VALID_URL (closes #13376) 2017-06-14 21:33:40 +07:00
Sergey M․
b5dc33daa9 [corus] Add support for showcase.ca 2017-06-13 23:27:27 +07:00
Sergey M․
97fa1f8dc4 [corus] Add support for history.ca (closes #13359) 2017-06-13 23:16:21 +07:00
Sergey M․
b081f53b08 [compat] Add compat_HTMLParseError to __all__ 2017-06-12 02:36:43 +07:00
69 changed files with 1556 additions and 569 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.06.12*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.06.12**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.07.09*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.07.09**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.06.12
[debug] youtube-dl version 2017.07.09
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -220,3 +220,6 @@ gritstub
Adam Voss
Mike Fährmann
Jan Kundrát
Giuseppe Fabiano
Örn Guðjónsson
Parmjit Virk

110
ChangeLog
View File

@@ -1,3 +1,113 @@
version 2017.07.09
Core
+ [extractor/common] Add support for AMP tags in _parse_html5_media_entries
+ [utils] Support attributes with no values in get_elements_by_attribute
Extractors
+ [dailymail] Add support for embeds
+ [joj] Add support for joj.sk (#13268)
* [abc.net.au:iview] Extract more formats (#13492, #13489)
* [egghead:course] Fix extraction (#6635, #13370)
+ [cjsw] Add support for cjsw.com (#13525)
+ [eagleplatform] Add support for referrer protected videos (#13557)
+ [eagleplatform] Add support for another embed pattern (#13557)
* [veoh] Extend URL regular expression (#13601)
* [npo:live] Fix live stream id extraction (#13568, #13605)
* [googledrive] Fix height extraction (#13603)
+ [dailymotion] Add support for new layout (#13580)
- [yam] Remove extractor
* [xhamster] Extract all formats and fix duration extraction (#13593)
+ [xhamster] Add support for new URL schema (#13593)
* [espn] Extend URL regular expression (#13244, #13549)
* [kaltura] Fix typo in subtitles extraction (#13569)
* [vier] Adapt extraction to redesign (#13575)
version 2017.07.02
Core
* [extractor/common] Improve _json_ld
Extractors
+ [thisoldhouse] Add more fallbacks for video id
* [thisoldhouse] Fix video id extraction (#13540, #13541)
* [xfileshare] Extend format regular expression (#13536)
* [ted] Fix extraction (#13535)
+ [tastytrade] Add support for tastytrade.com (#13521)
* [dplayit] Relax video id regular expression (#13524)
+ [generic] Extract more generic metadata (#13527)
+ [bbccouk] Capture and output error message (#13501, #13518)
* [cbsnews] Relax video info regular expression (#13284, #13503)
+ [facebook] Add support for plugin video embeds and multiple embeds (#13493)
* [soundcloud] Switch to https for API requests (#13502)
* [pandatv] Switch to https for API and download URLs
+ [pandatv] Add support for https URLs (#13491)
+ [niconico] Support sp subdomain (#13494)
version 2017.06.25
Core
+ [adobepass] Add support for DIRECTV NOW (mso ATTOTT) (#13472)
* [YoutubeDL] Skip malformed formats for better extraction robustness
Extractors
+ [wsj] Add support for barrons.com (#13470)
+ [ign] Add another video id pattern (#13328)
+ [raiplay:live] Add support for live streams (#13414)
+ [redbulltv] Add support for live videos and segments (#13486)
+ [onetpl] Add support for videos embedded via pulsembed (#13482)
* [ooyala] Make more robust
* [ooyala] Skip empty format URLs (#13471, #13476)
* [hgtv.com:show] Fix typo
version 2017.06.23
Core
* [adobepass] Fix extraction on older python 2.6
Extractors
* [youtube] Adapt to new automatic captions rendition (#13467)
* [hgtv.com:show] Relax video config regular expression (#13279, #13461)
* [drtuber] Fix formats extraction (#12058)
* [youporn] Fix upload date extraction
* [youporn] Improve formats extraction
* [youporn] Fix title extraction (#13456)
* [googledrive] Fix formats sorting (#13443)
* [watchindianporn] Fix extraction (#13411, #13415)
+ [vimeo] Add fallback mp4 extension for original format
+ [ruv] Add support for ruv.is (#13396)
* [viu] Fix extraction on older python 2.6
* [pandora.tv] Fix upload_date extraction (#12846)
+ [asiancrush] Add support for asiancrush.com (#13420)
version 2017.06.18
Core
* [downloader/common] Use utils.shell_quote for debug command line
* [utils] Use compat_shlex_quote in shell_quote
* [postprocessor/execafterdownload] Encode command line (#13407)
* [compat] Fix compat_shlex_quote on Windows (#5889, #10254)
* [postprocessor/metadatafromtitle] Fix missing optional meta fields processing
in --metadata-from-title (#13408)
* [extractor/common] Fix json dumping with --geo-bypass
+ [extractor/common] Improve jwplayer subtitles extraction
+ [extractor/common] Improve jwplayer formats extraction (#13379)
Extractors
* [polskieradio] Fix extraction (#13392)
+ [xfileshare] Add support for fastvideo.me (#13385)
* [bilibili] Fix extraction of videos with double quotes in titles (#13387)
* [4tube] Fix extraction (#13381, #13382)
+ [disney] Add support for disneychannel.de (#13383)
* [npo] Improve URL regular expression (#13376)
+ [corus] Add support for showcase.ca
+ [corus] Add support for history.ca (#13359)
version 2017.06.12
Core

View File

@@ -101,7 +101,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '*.pyc' \
--exclude '*.pyo' \
--exclude '*~' \
--exclude '__pycache' \
--exclude '__pycache__' \
--exclude '.git' \
--exclude 'testdata' \
--exclude 'docs/_build' \

View File

@@ -8,7 +8,7 @@ import re
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
README_FILE = os.path.join(ROOT_DIR, 'README.md')
PREFIX = '''%YOUTUBE-DL(1)
PREFIX = r'''%YOUTUBE-DL(1)
# NAME

View File

@@ -67,6 +67,8 @@
- **arte.tv:info**
- **arte.tv:magazine**
- **arte.tv:playlist**
- **AsianCrush**
- **AsianCrushPlaylist**
- **AtresPlayer**
- **ATTTechChannel**
- **ATVAt**
@@ -152,6 +154,7 @@
- **chirbit**
- **chirbit:profile**
- **Cinchcast**
- **CJSW**
- **Clipfish**
- **cliphunter**
- **ClipRs**
@@ -367,6 +370,7 @@
- **Jamendo**
- **JamendoAlbum**
- **JeuxVideo**
- **Joj**
- **Jove**
- **jpopsuki.tv**
- **JWPlatform**
@@ -642,6 +646,7 @@
- **RadioJavan**
- **Rai**
- **RaiPlay**
- **RaiPlayLive**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBullTV**
@@ -686,6 +691,7 @@
- **rutube:person**: Rutube person videos
- **RUTV**: RUTV.RU
- **Ruutu**
- **Ruv**
- **safari**: safaribooksonline.com online video
- **safari:api**
- **safari:course**: safaribooksonline.com online courses
@@ -764,6 +770,7 @@
- **Tagesschau**
- **tagesschau:player**
- **Tass**
- **TastyTrade**
- **TBS**
- **TDSLifeway**
- **teachertube**: teachertube.com videos
@@ -975,7 +982,7 @@
- **WSJArticle**
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
- **XHamster**
- **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑
@@ -991,7 +998,6 @@
- **XVideos**
- **XXXYMovies**
- **Yahoo**: Yahoo screen and movies
- **Yam**: 蕃薯藤yam天空部落
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек

View File

@@ -98,6 +98,7 @@ from youtube_dl.compat import (
compat_chr,
compat_etree_fromstring,
compat_getenv,
compat_os_name,
compat_setenv,
compat_urlparse,
compat_parse_qs,
@@ -448,7 +449,9 @@ class TestUtil(unittest.TestCase):
def test_shell_quote(self):
args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
self.assertEqual(
shell_quote(args),
"""ffmpeg -i 'ñ€ß'"'"'.mp4'""" if compat_os_name != 'nt' else '''ffmpeg -i "ñ€ß'.mp4"''')
def test_str_to_int(self):
self.assertEqual(str_to_int('123,456'), 123456)
@@ -932,7 +935,7 @@ class TestUtil(unittest.TestCase):
def test_args_to_str(self):
self.assertEqual(
args_to_str(['foo', 'ba/r', '-baz', '2 be', '']),
'foo ba/r -baz \'2 be\' \'\''
'foo ba/r -baz \'2 be\' \'\'' if compat_os_name != 'nt' else 'foo ba/r -baz "2 be" ""'
)
def test_parse_filesize(self):
@@ -1228,6 +1231,12 @@ part 3</font></u>
self.assertEqual(get_element_by_attribute('class', 'foo', html), None)
self.assertEqual(get_element_by_attribute('class', 'no-such-foo', html), None)
html = '''
<div itemprop="author" itemscope>foo</div>
'''
self.assertEqual(get_element_by_attribute('itemprop', 'author', html), 'foo')
def test_get_elements_by_class(self):
html = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>

View File

@@ -1448,17 +1448,25 @@ class YoutubeDL(object):
if not formats:
raise ExtractorError('No video formats found!')
def is_wellformed(f):
url = f.get('url')
valid_url = url and isinstance(url, compat_str)
if not valid_url:
self.report_warning(
'"url" field is missing or empty - skipping format, '
'there is an error in extractor')
return valid_url
# Filter out malformed formats for better extraction robustness
formats = list(filter(is_wellformed, formats))
formats_dict = {}
# We check that all the formats have the format and format_id fields
for i, format in enumerate(formats):
if 'url' not in format:
raise ExtractorError('Missing "url" key in result (index %d)' % i)
sanitize_string_field(format, 'format_id')
sanitize_numeric_fields(format)
format['url'] = sanitize_url(format['url'])
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
else:
@@ -1882,7 +1890,7 @@ class YoutubeDL(object):
info_dict.get('protocol') == 'm3u8' and
self.params.get('hls_prefer_native')):
if fixup_policy == 'warn':
self.report_warning('%s: malformated aac bitstream.' % (
self.report_warning('%s: malformed AAC bitstream detected.' % (
info_dict['id']))
elif fixup_policy == 'detect_or_warn':
fixup_pp = FFmpegFixupM3u8PP(self)
@@ -1891,7 +1899,7 @@ class YoutubeDL(object):
info_dict['__postprocessors'].append(fixup_pp)
else:
self.report_warning(
'%s: malformated aac bitstream. %s'
'%s: malformed AAC bitstream detected. %s'
% (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
else:
assert fixup_policy in ('ignore', 'never')

View File

@@ -2617,14 +2617,22 @@ except ImportError: # Python 2
parsed_result[name] = [value]
return parsed_result
try:
from shlex import quote as compat_shlex_quote
except ImportError: # Python < 3.3
compat_os_name = os._name if os.name == 'java' else os.name
if compat_os_name == 'nt':
def compat_shlex_quote(s):
if re.match(r'^[-_\w./]+$', s):
return s
else:
return "'" + s.replace("'", "'\"'\"'") + "'"
return s if re.match(r'^[-_\w./]+$', s) else '"%s"' % s.replace('"', '\\"')
else:
try:
from shlex import quote as compat_shlex_quote
except ImportError: # Python < 3.3
def compat_shlex_quote(s):
if re.match(r'^[-_\w./]+$', s):
return s
else:
return "'" + s.replace("'", "'\"'\"'") + "'"
try:
@@ -2649,9 +2657,6 @@ def compat_ord(c):
return ord(c)
compat_os_name = os._name if os.name == 'java' else os.name
if sys.version_info >= (3, 0):
compat_getenv = os.getenv
compat_expanduser = os.path.expanduser
@@ -2895,6 +2900,7 @@ else:
__all__ = [
'compat_HTMLParseError',
'compat_HTMLParser',
'compat_HTTPError',
'compat_basestring',

View File

@@ -8,10 +8,11 @@ import random
from ..compat import compat_os_name
from ..utils import (
decodeArgument,
encodeFilename,
error_to_compat_str,
decodeArgument,
format_bytes,
shell_quote,
timeconvert,
)
@@ -381,10 +382,5 @@ class FileDownloader(object):
if exe is None:
exe = os.path.basename(str_args[0])
try:
import pipes
shell_quote = lambda args: ' '.join(map(pipes.quote, str_args))
except ImportError:
shell_quote = repr
self.to_screen('[debug] %s command line: %s' % (
exe, shell_quote(str_args)))

View File

@@ -3,11 +3,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
js_to_json,
int_or_none,
parse_iso8601,
try_get,
)
@@ -124,7 +126,20 @@ class ABCIViewIE(InfoExtractor):
title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
format_urls = [
try_get(stream, lambda x: x['hds-unmetered'], compat_str)]
# May have higher quality video
sd_url = try_get(
stream, lambda x: x['streams']['hds']['sd'], compat_str)
if sd_url:
format_urls.append(sd_url.replace('metered', 'um'))
formats = []
for format_url in format_urls:
if format_url:
formats.extend(
self._extract_akamai_formats(format_url, video_id))
self._sort_formats(formats)
subtitles = {}

View File

@@ -22,7 +22,7 @@ class ABCOTVSIE(InfoExtractor):
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4',
'title': 'East Bay museum celebrates vintage synthesizers',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
'description': 'md5:24ed2bd527096ec2a5c67b9d5a9005f3',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1421123075,
'upload_date': '20150113',

View File

@@ -6,12 +6,16 @@ import time
import xml.etree.ElementTree as etree
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_kwargs,
compat_urlparse,
)
from ..utils import (
unescapeHTML,
urlencode_postdata,
unified_timestamp,
ExtractorError,
NO_DEFAULT,
)
@@ -21,6 +25,11 @@ MSO_INFO = {
'username_field': 'username',
'password_field': 'password',
},
'ATTOTT': {
'name': 'DIRECTV NOW',
'username_field': 'email',
'password_field': 'loginpassword',
},
'Rogers': {
'name': 'Rogers',
'username_field': 'UserName',
@@ -1313,11 +1322,14 @@ class AdobePassIE(InfoExtractor):
_USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'
_MVPD_CACHE = 'ap-mvpd'
_DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
def _download_webpage_handle(self, *args, **kwargs):
headers = kwargs.get('headers', {})
headers.update(self.geo_verification_headers())
kwargs['headers'] = headers
return super(AdobePassIE, self)._download_webpage_handle(*args, **kwargs)
return super(AdobePassIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
@staticmethod
def _get_mvpd_resource(provider_id, title, guid, rating):
@@ -1361,6 +1373,21 @@ class AdobePassIE(InfoExtractor):
'Use --ap-mso to specify Adobe Pass Multiple-system operator Identifier '
'and --ap-username and --ap-password or --netrc to provide account credentials.', expected=True)
def extract_redirect_url(html, url=None, fatal=False):
# TODO: eliminate code duplication with generic extractor and move
# redirection code into _download_webpage_handle
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
redirect_url = self._search_regex(
r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'
r'(?:[a-z-]+="[^"]+"\s+)*?content="%s' % REDIRECT_REGEX,
html, 'meta refresh redirect',
default=NO_DEFAULT if fatal else None, fatal=fatal)
if not redirect_url:
return None
if url:
redirect_url = compat_urlparse.urljoin(url, unescapeHTML(redirect_url))
return redirect_url
mvpd_headers = {
'ap_42': 'anonymous',
'ap_11': 'Linux i686',
@@ -1410,16 +1437,15 @@ class AdobePassIE(InfoExtractor):
if '<form name="signin"' in provider_redirect_page:
provider_login_page_res = provider_redirect_page_res
elif 'http-equiv="refresh"' in provider_redirect_page:
oauth_redirect_url = self._html_search_regex(
r'content="0;\s*url=([^\'"]+)',
provider_redirect_page, 'meta refresh redirect')
oauth_redirect_url = extract_redirect_url(
provider_redirect_page, fatal=True)
provider_login_page_res = self._download_webpage_handle(
oauth_redirect_url, video_id,
'Downloading Provider Login Page')
self._DOWNLOADING_LOGIN_PAGE)
else:
provider_login_page_res = post_form(
provider_redirect_page_res,
'Downloading Provider Login Page')
self._DOWNLOADING_LOGIN_PAGE)
mvpd_confirm_page_res = post_form(
provider_login_page_res, 'Logging in', {
@@ -1466,8 +1492,17 @@ class AdobePassIE(InfoExtractor):
'Content-Type': 'application/x-www-form-urlencoded'
})
else:
# Some providers (e.g. DIRECTV NOW) have another meta refresh
# based redirect that should be followed.
provider_redirect_page, urlh = provider_redirect_page_res
provider_refresh_redirect_url = extract_redirect_url(
provider_redirect_page, url=urlh.geturl())
if provider_refresh_redirect_url:
provider_redirect_page_res = self._download_webpage_handle(
provider_refresh_redirect_url, video_id,
'Downloading Provider Redirect Page (meta refresh)')
provider_login_page_res = post_form(
provider_redirect_page_res, 'Downloading Provider Login Page')
provider_redirect_page_res, self._DOWNLOADING_LOGIN_PAGE)
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
mso_info.get('username_field', 'username'): username,
mso_info.get('password_field', 'password'): password,

View File

@@ -0,0 +1,93 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
extract_attributes,
remove_end,
urlencode_postdata,
)
class AsianCrushIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
'info_dict': {
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
'description': 'md5:3db14e9186197857e7063522cb89a805',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
},
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://www.asiancrush.com/wp-admin/admin-ajax.php', video_id,
data=urlencode_postdata({
'postid': video_id,
'action': 'get_channel_kaltura_vars',
}))
entry_id = data['entry_id']
return self.url_result(
'kaltura:%s:%s' % (data['partner_id'], entry_id),
ie=KalturaIE.ie_key(), video_id=entry_id,
video_title=data.get('vid_label'))
class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b'
_TEST = {
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'info_dict': {
'id': '12481',
'title': 'Scholar Who Walks the Night',
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
},
'playlist_count': 20,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = []
for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage):
attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix':
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = remove_end(
self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
' | AsianCrush')
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False)
return self.playlist_result(entries, playlist_id, title, description)

View File

@@ -36,7 +36,7 @@ class BBCCoUkIE(InfoExtractor):
(?:
programmes/(?!articles/)|
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/clips[/#]|
music/(?:clips|audiovideo/popular)[/#]|
radio/player/
)
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
@@ -229,8 +229,10 @@ class BBCCoUkIE(InfoExtractor):
}, {
'url': 'http://www.bbc.co.uk/radio/player/p03cchwf',
'only_matching': True,
}
]
}, {
'url': 'https://www.bbc.co.uk/music/audiovideo/popular#p055bc55',
'only_matching': True,
}]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
@@ -523,6 +525,12 @@ class BBCCoUkIE(InfoExtractor):
webpage = self._download_webpage(url, group_id, 'Downloading video page')
error = self._search_regex(
r'<div\b[^>]+\bclass=["\']smp__message delta["\'][^>]*>([^<]+)<',
webpage, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
programme_id = None
duration = None

View File

@@ -54,6 +54,22 @@ class BiliBiliIE(InfoExtractor):
'description': '如果你是神明并且能够让妄想成为现实。那你会进行怎么样的妄想是淫靡的世界独裁社会毁灭性的制裁还是……2015年涩谷。从6年前发生的大灾害“涩谷地震”之后复兴了的这个街区里新设立的私立高中...',
},
'skip': 'Geo-restricted to China',
}, {
# Title with double quotes
'url': 'http://www.bilibili.com/video/av8903802/',
'info_dict': {
'id': '8903802',
'ext': 'mp4',
'title': '阿滴英文|英文歌分享#6 "Closer',
'description': '滴妹今天唱Closer給你聽! 有史以来,被推最多次也是最久的歌曲,其实歌词跟我原本想像差蛮多的,不过还是好听! 微博@阿滴英文',
'uploader': '阿滴英文',
'uploader_id': '65880958',
'timestamp': 1488382620,
'upload_date': '20170301',
},
'params': {
'skip_download': True, # Test metadata only
},
}]
_APP_KEY = '84956560bc028eb7'
@@ -135,7 +151,7 @@ class BiliBiliIE(InfoExtractor):
'formats': formats,
})
title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
title = self._html_search_regex('<h1[^>]*>([^<]+)</h1>', webpage, 'title')
description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', default=None))

View File

@@ -84,9 +84,10 @@ class BuzzFeedIE(InfoExtractor):
continue
entries.append(self.url_result(video['url']))
facebook_url = FacebookIE._extract_url(webpage)
if facebook_url:
entries.append(self.url_result(facebook_url))
facebook_urls = FacebookIE._extract_urls(webpage)
entries.extend([
self.url_result(facebook_url)
for facebook_url in facebook_urls])
return {
'_type': 'playlist',

View File

@@ -15,19 +15,23 @@ class CBSNewsIE(CBSIE):
_TESTS = [
{
'url': 'http://www.cbsnews.com/news/tesla-and-spacex-elon-musks-industrial-empire/',
# 60 minutes
'url': 'http://www.cbsnews.com/news/artificial-intelligence-positioned-to-be-a-game-changer/',
'info_dict': {
'id': 'tesla-and-spacex-elon-musks-industrial-empire',
'ext': 'flv',
'title': 'Tesla and SpaceX: Elon Musk\'s industrial empire',
'thumbnail': 'http://beta.img.cbsnews.com/i/2014/03/30/60147937-2f53-4565-ad64-1bdd6eb64679/60-0330-pelley-640x360.jpg',
'duration': 791,
'id': '_B6Ga3VJrI4iQNKsir_cdFo9Re_YJHE_',
'ext': 'mp4',
'title': 'Artificial Intelligence',
'description': 'md5:8818145f9974431e0fb58a1b8d69613c',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1606,
'uploader': 'CBSI-NEW',
'timestamp': 1498431900,
'upload_date': '20170625',
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
'skip': 'Subscribers only',
},
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
@@ -52,6 +56,22 @@ class CBSNewsIE(CBSIE):
'skip_download': True,
},
},
{
# 48 hours
'url': 'http://www.cbsnews.com/news/maria-ridulph-murder-will-the-nations-oldest-cold-case-to-go-to-trial-ever-get-solved/',
'info_dict': {
'id': 'QpM5BJjBVEAUFi7ydR9LusS69DPLqPJ1',
'ext': 'mp4',
'title': 'Cold as Ice',
'description': 'Can a childhood memory of a friend\'s murder solve a 1957 cold case? "48 Hours" correspondent Erin Moriarty has the latest.',
'upload_date': '20170604',
'timestamp': 1496538000,
'uploader': 'CBSI-NEW',
},
'params': {
'skip_download': True,
},
},
]
def _real_extract(self, url):
@@ -60,7 +80,7 @@ class CBSNewsIE(CBSIE):
webpage = self._download_webpage(url, video_id)
video_info = self._parse_json(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
r'(?:<ul class="media-list items" id="media-related-items"[^>]*><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info', default='{}'), video_id, fatal=False)
if video_info:

View File

@@ -0,0 +1,72 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
unescapeHTML,
)
class CJSWIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cjsw\.com/program/(?P<program>[^/]+)/episode/(?P<id>\d+)'
_TESTS = [{
'url': 'http://cjsw.com/program/freshly-squeezed/episode/20170620',
'md5': 'cee14d40f1e9433632c56e3d14977120',
'info_dict': {
'id': '91d9f016-a2e7-46c5-8dcb-7cbcd7437c41',
'ext': 'mp3',
'title': 'Freshly Squeezed Episode June 20, 2017',
'description': 'md5:c967d63366c3898a80d0c7b0ff337202',
'series': 'Freshly Squeezed',
'episode_id': '20170620',
},
}, {
# no description
'url': 'http://cjsw.com/program/road-pops/episode/20170707/',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
program, episode_id = mobj.group('program', 'id')
audio_id = '%s/%s' % (program, episode_id)
webpage = self._download_webpage(url, episode_id)
title = unescapeHTML(self._search_regex(
(r'<h1[^>]+class=["\']episode-header__title["\'][^>]*>(?P<title>[^<]+)',
r'data-audio-title=(["\'])(?P<title>(?:(?!\1).)+)\1'),
webpage, 'title', group='title'))
audio_url = self._search_regex(
r'<button[^>]+data-audio-src=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'audio url', group='url')
audio_id = self._search_regex(
r'/([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})\.mp3',
audio_url, 'audio id', default=audio_id)
formats = [{
'url': audio_url,
'ext': determine_ext(audio_url, 'mp3'),
'vcodec': 'none',
}]
description = self._html_search_regex(
r'<p>(?P<description>.+?)</p>', webpage, 'description',
default=None)
series = self._search_regex(
r'data-showname=(["\'])(?P<name>(?:(?!\1).)+)\1', webpage,
'series', default=program, group='name')
return {
'id': audio_id,
'title': title,
'description': description,
'formats': formats,
'series': series,
'episode_id': episode_id,
}

View File

@@ -420,7 +420,7 @@ class InfoExtractor(object):
if country_code:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
if self._downloader.params.get('verbose', False):
self._downloader.to_stdout(
self._downloader.to_screen(
'[debug] Using fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
@@ -1002,17 +1002,17 @@ class InfoExtractor(object):
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
return info
if item_type == 'TVEpisode':
if item_type in ('TVEpisode', 'Episode'):
info.update({
'episode': unescapeHTML(e.get('name')),
'episode_number': int_or_none(e.get('episodeNumber')),
'description': unescapeHTML(e.get('description')),
})
part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
@@ -1022,10 +1022,10 @@ class InfoExtractor(object):
})
elif item_type == 'VideoObject':
extract_video_object(e)
elif item_type == 'WebPage':
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video)
continue
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video)
break
return dict((k, v) for k, v in info.items() if v is not None)
@@ -2132,15 +2132,18 @@ class InfoExtractor(object):
return is_plain_url, formats
entries = []
# amp-video and amp-audio are very similar to their HTML5 counterparts
# so we wll include them right here (see
# https://www.ampproject.org/docs/reference/components/amp-video)
media_tags = [(media_tag, media_type, '')
for media_tag, media_type
in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
in re.findall(r'(?s)(<(?:amp-)?(video|audio)[^>]*/>)', webpage)]
media_tags.extend(re.findall(
# We only allow video|audio followed by a whitespace or '>'.
# Allowing more characters may end up in significant slow down (see
# https://github.com/rg3/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>video|audio)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
r'(?s)(<(?P<tag>(?:amp-)?(?:video|audio))(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
media_info = {
'formats': [],
@@ -2299,6 +2302,8 @@ class InfoExtractor(object):
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if not isinstance(track, dict):
continue
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
@@ -2328,6 +2333,8 @@ class InfoExtractor(object):
urls = []
formats = []
for source in jwplayer_sources_data:
if not isinstance(source, dict):
continue
source_url = self._proto_relative_url(source.get('file'))
if not source_url:
continue

View File

@@ -8,7 +8,16 @@ from ..utils import int_or_none
class CorusIE(ThePlatformFeedIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:globaltv|etcanada)\.com|(?:hgtv|foodnetwork|slice)\.ca)/(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?P<domain>
(?:globaltv|etcanada)\.com|
(?:hgtv|foodnetwork|slice|history|showcase)\.ca
)
/(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
@@ -27,6 +36,12 @@ class CorusIE(ThePlatformFeedIE):
}, {
'url': 'http://etcanada.com/video/873675331955/meet-the-survivor-game-changers-castaways-part-2/',
'only_matching': True,
}, {
'url': 'http://www.history.ca/the-world-without-canada/video/full-episodes/natural-resources/video.html?v=955054659646#video',
'only_matching': True,
}, {
'url': 'http://www.showcase.ca/eyewitness/video/eyewitness++106/video.html?v=955070531919&p=1&s=da#video',
'only_matching': True,
}]
_TP_FEEDS = {
@@ -50,6 +65,14 @@ class CorusIE(ThePlatformFeedIE):
'feed_id': '5tUJLgV2YNJ5',
'account_id': 2414427935,
},
'history': {
'feed_id': 'tQFx_TyyEq4J',
'account_id': 2369613659,
},
'showcase': {
'feed_id': '9H6qyshBZU3E',
'account_id': 2414426607,
},
}
def _real_extract(self, url):

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
@@ -12,8 +14,8 @@ from ..utils import (
class DailyMailIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/(?:video/[^/]+/video-|embed/video/)(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.dailymail.co.uk/video/tvshowbiz/video-1295863/The-Mountain-appears-sparkling-water-ad-Heavy-Bubbles.html',
'md5': 'f6129624562251f628296c3a9ffde124',
'info_dict': {
@@ -22,7 +24,16 @@ class DailyMailIE(InfoExtractor):
'title': 'The Mountain appears in sparkling water ad for \'Heavy Bubbles\'',
'description': 'md5:a93d74b6da172dd5dc4d973e0b766a84',
}
}
}, {
'url': 'http://www.dailymail.co.uk/embed/video/1295863.html',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//(?:www\.)?dailymail\.co\.uk/embed/video/\d+\.html)',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -147,7 +147,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
view_count_str = self._search_regex(
(r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
webpage, 'view count', fatal=False)
webpage, 'view count', default=None)
if view_count_str:
view_count_str = re.sub(r'\s', '', view_count_str)
view_count = str_to_int(view_count_str)
@@ -159,7 +159,9 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);',
r'var\s+config\s*=\s*({.+?});'],
r'var\s+config\s*=\s*({.+?});',
# New layout regex (see https://github.com/rg3/youtube-dl/issues/13580)
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)

View File

@@ -15,7 +15,7 @@ from ..utils import (
class DisneyIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|(?:starwars|marvelkids)\.com))/(?:(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})|(?:[^/]+/)?(?P<display_id>[^/?#]+))'''
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr|channel\.de)|(?:starwars|marvelkids)\.com))/(?:(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})|(?:[^/]+/)?(?P<display_id>[^/?#]+))'''
_TESTS = [{
# Disney.EmbedVideo
'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
@@ -68,6 +68,9 @@ class DisneyIE(InfoExtractor):
}, {
'url': 'http://disneyjunior.en.disneyme.com/dj/watch-my-friends-tigger-and-pooh-promo',
'only_matching': True,
}, {
'url': 'http://disneychannel.de/sehen/soy-luna-folge-118-5518518987ba27f3cc729268',
'only_matching': True,
}, {
'url': 'http://disneyjunior.disney.com/galactech-the-galactech-grab-galactech-an-admiral-rescue',
'only_matching': True,

View File

@@ -184,7 +184,7 @@ class DPlayItIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
info_url = self._search_regex(
r'url\s*:\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
r'url\s*[:=]\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
webpage, 'video id')
title = remove_end(self._og_search_title(webpage), ' | Dplay')

View File

@@ -44,8 +44,23 @@ class DrTuberIE(InfoExtractor):
webpage = self._download_webpage(
'http://www.drtuber.com/video/%s' % video_id, display_id)
video_url = self._html_search_regex(
r'<source src="([^"]+)"', webpage, 'video URL')
video_data = self._download_json(
'http://www.drtuber.com/player_config_json/', video_id, query={
'vid': video_id,
'embed': 0,
'aid': 0,
'domain_id': 0,
})
formats = []
for format_id, video_url in video_data['files'].items():
if video_url:
formats.append({
'format_id': format_id,
'quality': 2 if format_id == 'hq' else 1,
'url': video_url
})
self._sort_formats(formats)
title = self._html_search_regex(
(r'class="title_watch"[^>]*><(?:p|h\d+)[^>]*>([^<]+)<',
@@ -75,7 +90,7 @@ class DrTuberIE(InfoExtractor):
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'formats': formats,
'title': title,
'thumbnail': thumbnail,
'like_count': like_count,

View File

@@ -11,6 +11,7 @@ from ..compat import (
from ..utils import (
ExtractorError,
int_or_none,
unsmuggle_url,
)
@@ -50,6 +51,10 @@ class EaglePlatformIE(InfoExtractor):
'view_count': int,
},
'skip': 'Georestricted',
}, {
# referrer protected video (https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/)
'url': 'tvrainru.media.eagleplatform.com:582306',
'only_matching': True,
}]
@staticmethod
@@ -60,16 +65,40 @@ class EaglePlatformIE(InfoExtractor):
webpage)
if mobj is not None:
return mobj.group('url')
# Basic usage embedding (see http://dultonmedia.github.io/eplayer/)
PLAYER_JS_RE = r'''
<script[^>]+
src=(?P<qjs>["\'])(?:https?:)?//(?P<host>(?:(?!(?P=qjs)).)+\.media\.eagleplatform\.com)/player/player\.js(?P=qjs)
.+?
'''
# "Basic usage" embedding (see http://dultonmedia.github.io/eplayer/)
mobj = re.search(
r'''(?xs)
<script[^>]+
src=(?P<q1>["\'])(?:https?:)?//(?P<host>.+?\.media\.eagleplatform\.com)/player/player\.js(?P=q1)
.+?
%s
<div[^>]+
class=(?P<q2>["\'])eagleplayer(?P=q2)[^>]+
class=(?P<qclass>["\'])eagleplayer(?P=qclass)[^>]+
data-id=["\'](?P<id>\d+)
''', webpage)
''' % PLAYER_JS_RE, webpage)
if mobj is not None:
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
# Generalization of "Javascript code usage", "Combined usage" and
# "Usage without attaching to DOM" embeddings (see
# http://dultonmedia.github.io/eplayer/)
mobj = re.search(
r'''(?xs)
%s
<script>
.+?
new\s+EaglePlayer\(
(?:[^,]+\s*,\s*)?
{
.+?
\bid\s*:\s*["\']?(?P<id>\d+)
.+?
}
\s*\)
.+?
</script>
''' % PLAYER_JS_RE, webpage)
if mobj is not None:
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
@@ -79,9 +108,10 @@ class EaglePlatformIE(InfoExtractor):
if status != 200:
raise ExtractorError(' '.join(response['errors']), expected=True)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', *args, **kwargs):
def _download_json(self, url_or_request, video_id, *args, **kwargs):
try:
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
response = super(EaglePlatformIE, self)._download_json(
url_or_request, video_id, *args, **kwargs)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError):
response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
@@ -93,11 +123,24 @@ class EaglePlatformIE(InfoExtractor):
return self._download_json(url_or_request, video_id, note)['data'][0]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
host, video_id = mobj.group('custom_host') or mobj.group('host'), mobj.group('id')
headers = {}
query = {
'id': video_id,
}
referrer = smuggled_data.get('referrer')
if referrer:
headers['Referer'] = referrer
query['referrer'] = referrer
player_data = self._download_json(
'http://%s/api/player_data?id=%s' % (host, video_id), video_id)
'http://%s/api/player_data' % host, video_id,
headers=headers, query=query)
media = player_data['data']['playlist']['viewports'][0]['medialist'][0]

View File

@@ -1,15 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class EggheadCourseIE(InfoExtractor):
IE_DESC = 'egghead.io course'
IE_NAME = 'egghead:course'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[a-zA-Z_0-9-]+)'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
'playlist_count': 29,
@@ -22,18 +20,16 @@ class EggheadCourseIE(InfoExtractor):
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
title = self._html_search_regex(r'<h1 class="title">([^<]+)</h1>', webpage, 'title')
ul = self._search_regex(r'(?s)<ul class="series-lessons-list">(.*?)</ul>', webpage, 'session list')
course = self._download_json(
'https://egghead.io/api/v1/series/%s' % playlist_id, playlist_id)
found = re.findall(r'(?s)<a class="[^"]*"\s*href="([^"]+)">\s*<li class="item', ul)
entries = [self.url_result(m) for m in found]
entries = [
self.url_result(
'wistia:%s' % lesson['wistia_id'], ie='Wistia',
video_id=lesson['wistia_id'], video_title=lesson.get('title'))
for lesson in course['lessons'] if lesson.get('wistia_id')]
return {
'_type': 'playlist',
'id': playlist_id,
'title': title,
'description': self._og_search_description(webpage),
'entries': entries,
}
return self.playlist_result(
entries, playlist_id, course.get('title'),
course.get('description'))

View File

@@ -10,7 +10,25 @@ from ..utils import (
class ESPNIE(InfoExtractor):
_VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/video/clip(?:\?.*?\bid=|/_/id/)(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
(?:
(?:(?:\w+\.)+)?espn\.go|
(?:www\.)?espn
)\.com/
(?:
(?:
video/clip|
watch/player
)
(?:
\?.*?\bid=|
/_/id/
)
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://espn.go.com/video/clip?id=10365079',
'info_dict': {
@@ -25,20 +43,34 @@ class ESPNIE(InfoExtractor):
'skip_download': True,
},
}, {
# intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
'url': 'http://espn.go.com/video/clip?id=2743663',
'url': 'https://broadband.espn.go.com/video/clip?id=18910086',
'info_dict': {
'id': '2743663',
'id': '18910086',
'ext': 'mp4',
'title': 'Must-See Moments: Best of the MLS season',
'description': 'md5:4c2d7232beaea572632bec41004f0aeb',
'timestamp': 1449446454,
'upload_date': '20151207',
'title': 'Kyrie spins around defender for two',
'description': 'md5:2b0f5bae9616d26fba8808350f0d2b9b',
'timestamp': 1489539155,
'upload_date': '20170315',
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://nonredline.sports.espn.go.com/video/clip?id=19744672',
'only_matching': True,
}, {
'url': 'https://cdn.espn.go.com/video/clip/_/id/19771774',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player?id=19141491',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player?bucketId=257&id=19505875',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player/_/id/19141491',
'only_matching': True,
}, {
'url': 'http://www.espn.com/video/clip?id=10365079',
'only_matching': True,

View File

@@ -71,6 +71,10 @@ from .arte import (
TheOperaPlatformIE,
ArteTVPlaylistIE,
)
from .asiancrush import (
AsianCrushIE,
AsianCrushPlaylistIE,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
from .atvat import ATVAtIE
@@ -181,6 +185,7 @@ from .chirbit import (
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .cjsw import CJSWIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .cliprs import ClipRsIE
@@ -465,6 +470,7 @@ from .jamendo import (
)
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .joj import JojIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kaltura import KalturaIE
@@ -820,6 +826,7 @@ from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE
from .rai import (
RaiPlayIE,
RaiPlayLiveIE,
RaiIE,
)
from .rbmaradio import RBMARadioIE
@@ -871,6 +878,7 @@ from .rutube import (
)
from .rutv import RUTVIE
from .ruutu import RuutuIE
from .ruv import RuvIE
from .sandia import SandiaIE
from .safari import (
SafariIE,
@@ -967,6 +975,7 @@ from .tagesschau import (
TagesschauIE,
)
from .tass import TassIE
from .tastytrade import TastyTradeIE
from .tbs import TBSIE
from .tdslifeway import TDSLifewayIE
from .teachertube import (
@@ -1273,7 +1282,6 @@ from .yahoo import (
YahooIE,
YahooSearchIE,
)
from .yam import YamIE
from .yandexmusic import (
YandexMusicTrackIE,
YandexMusicAlbumIE,

View File

@@ -203,19 +203,19 @@ class FacebookIE(InfoExtractor):
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
if mobj is not None:
return mobj.group('url')
def _extract_urls(webpage):
urls = []
for mobj in re.finditer(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://www\.facebook\.com/(?:video/embed|plugins/video\.php).+?)\1',
webpage):
urls.append(mobj.group('url'))
# Facebook API embed
# see https://developers.facebook.com/docs/plugins/embedded-video-player
mobj = re.search(r'''(?x)<div[^>]+
for mobj in re.finditer(r'''(?x)<div[^>]+
class=(?P<q1>[\'"])[^\'"]*\bfb-(?:video|post)\b[^\'"]*(?P=q1)[^>]+
data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage)
if mobj is not None:
return mobj.group('url')
data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage):
urls.append(mobj.group('url'))
return urls
def _login(self):
(useremail, password) = self._get_login_info()

View File

@@ -85,11 +85,11 @@ class FourTubeIE(InfoExtractor):
media_id = params[0]
sources = ['%s' % p for p in params[2]]
token_url = 'http://tkn.4tube.com/{0}/desktop/{1}'.format(
token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format(
media_id, '+'.join(sources))
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
b'Origin': b'http://www.4tube.com',
b'Origin': b'https://www.4tube.com',
}
token_req = sanitized_Request(token_url, b'{}', headers)
tokens = self._download_json(token_req, video_id)

View File

@@ -57,6 +57,7 @@ from .dailymotion import (
DailymotionIE,
DailymotionCloudIE,
)
from .dailymail import DailyMailIE
from .onionstudios import OnionStudiosIE
from .viewlift import ViewLiftEmbedIE
from .mtv import MTVServicesEmbeddedIE
@@ -91,6 +92,7 @@ from .anvato import AnvatoIE
from .washingtonpost import WashingtonPostIE
from .wistia import WistiaIE
from .mediaset import MediasetIE
from .joj import JojIE
class GenericIE(InfoExtractor):
@@ -759,6 +761,20 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Dailymotion'],
},
# DailyMail embed
{
'url': 'http://www.bumm.sk/krimi/2017/07/05/biztonsagi-kamera-buktatta-le-az-agg-ferfit-utlegelo-apolot',
'info_dict': {
'id': '1495629',
'ext': 'mp4',
'title': 'Care worker punches elderly dementia patient in head 11 times',
'description': 'md5:3a743dee84e57e48ec68bf67113199a5',
},
'add_ie': ['DailyMail'],
'params': {
'skip_download': True,
},
},
# YouTube embed
{
'url': 'http://www.badzine.de/ansicht/datum/2014/06/09/so-funktioniert-die-neue-englische-badminton-liga.html',
@@ -1185,7 +1201,7 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Kaltura'],
},
# Eagle.Platform embed (generic URL)
# EaglePlatform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
@@ -1199,8 +1215,26 @@ class GenericIE(InfoExtractor):
'view_count': int,
'age_limit': 0,
},
'params': {
'skip_download': True,
},
},
# ClipYou (Eagle.Platform) embed (custom URL)
# referrer protected EaglePlatform embed
{
'url': 'https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/',
'info_dict': {
'id': '582306',
'ext': 'mp4',
'title': 'Стас Намин: «Мы нарушили девственность Кремля»',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3382,
'view_count': int,
},
'params': {
'skip_download': True,
},
},
# ClipYou (EaglePlatform) embed (custom URL)
{
'url': 'http://muz-tv.ru/play/7129/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
@@ -1212,6 +1246,9 @@ class GenericIE(InfoExtractor):
'duration': 216,
'view_count': int,
},
'params': {
'skip_download': True,
},
},
# Pladform embed
{
@@ -1522,6 +1559,21 @@ class GenericIE(InfoExtractor):
'title': 'Facebook video #599637780109885',
},
},
# Facebook <iframe> embed, plugin video
{
'url': 'http://5pillarsuk.com/2017/06/07/tariq-ramadan-disagrees-with-pr-exercise-by-imams-refusing-funeral-prayers-for-london-attackers/',
'info_dict': {
'id': '1754168231264132',
'ext': 'mp4',
'title': 'About the Imams and Religious leaders refusing to perform funeral prayers for...',
'uploader': 'Tariq Ramadan (official)',
'timestamp': 1496758379,
'upload_date': '20170606',
},
'params': {
'skip_download': True,
},
},
# Facebook API embed
{
'url': 'http://www.lothype.com/blue-stars-2016-preview-standstill-full-show/',
@@ -1734,6 +1786,26 @@ class GenericIE(InfoExtractor):
},
'add_ie': [MediasetIE.ie_key()],
},
{
# JOJ.sk embeds
'url': 'https://www.noviny.sk/slovensko/238543-slovenskom-sa-prehnala-vlna-silnych-burok',
'info_dict': {
'id': '238543-slovenskom-sa-prehnala-vlna-silnych-burok',
'title': 'Slovenskom sa prehnala vlna silných búrok',
},
'playlist_mincount': 5,
'add_ie': [JojIE.ie_key()],
},
{
# AMP embed (see https://www.ampproject.org/docs/reference/components/amp-video)
'url': 'https://tvrain.ru/amp/418921/',
'md5': 'cc00413936695987e8de148b67d14f1d',
'info_dict': {
'id': '418921',
'ext': 'mp4',
'title': 'Стас Намин: «Мы нарушили девственность Кремля»',
},
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@@ -2033,6 +2105,13 @@ class GenericIE(InfoExtractor):
video_description = self._og_search_description(webpage, default=None)
video_thumbnail = self._og_search_thumbnail(webpage, default=None)
info_dict.update({
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit,
})
# Look for Brightcove Legacy Studio embeds
bc_urls = BrightcoveLegacyIE._extract_brightcove_urls(webpage)
if bc_urls:
@@ -2126,6 +2205,12 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
playlists, video_id, video_title, lambda p: '//dailymotion.com/playlist/%s' % p)
# Look for DailyMail embeds
dailymail_urls = DailyMailIE._extract_urls(webpage)
if dailymail_urls:
return self.playlist_from_matches(
dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
# Look for embedded Wistia player
wistia_url = WistiaIE._extract_url(webpage)
if wistia_url:
@@ -2222,9 +2307,9 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'))
# Look for embedded Facebook player
facebook_url = FacebookIE._extract_url(webpage)
if facebook_url is not None:
return self.url_result(facebook_url, 'Facebook')
facebook_urls = FacebookIE._extract_urls(webpage)
if facebook_urls:
return self.playlist_from_matches(facebook_urls, video_id, video_title)
# Look for embedded VK player
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1', webpage)
@@ -2421,12 +2506,12 @@ class GenericIE(InfoExtractor):
if kaltura_url:
return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
# Look for Eagle.Platform embeds
# Look for EaglePlatform embeds
eagleplatform_url = EaglePlatformIE._extract_url(webpage)
if eagleplatform_url:
return self.url_result(eagleplatform_url, EaglePlatformIE.ie_key())
return self.url_result(smuggle_url(eagleplatform_url, {'referrer': url}), EaglePlatformIE.ie_key())
# Look for ClipYou (uses Eagle.Platform) embeds
# Look for ClipYou (uses EaglePlatform) embeds
mobj = re.search(
r'<iframe[^>]+src="https?://(?P<host>media\.clipyou\.ru)/index/player\?.*\brecord_id=(?P<id>\d+).*"', webpage)
if mobj is not None:
@@ -2669,18 +2754,32 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
mediaset_urls, video_id, video_title, ie=MediasetIE.ie_key())
# Look for JOJ.sk embeds
joj_urls = JojIE._extract_urls(webpage)
if joj_urls:
return self.playlist_from_matches(
joj_urls, video_id, video_title, ie=JojIE.ie_key())
def merge_dicts(dict1, dict2):
merged = {}
for k, v in dict1.items():
if v is not None:
merged[k] = v
for k, v in dict2.items():
if v is None:
continue
if (k not in merged or
(isinstance(v, compat_str) and v and
isinstance(merged[k], compat_str) and
not merged[k])):
merged[k] = v
return merged
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit
})
info_dict.update(json_ld)
return info_dict
return merge_dicts(json_ld, info_dict)
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
@@ -2698,9 +2797,7 @@ class GenericIE(InfoExtractor):
if jwplayer_data:
info = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, base_url=url)
if not info.get('title'):
info['title'] = video_title
return info
return merge_dicts(info, info_dict)
def check_video(vurl):
if YoutubeIE.suitable(vurl):

View File

@@ -69,19 +69,32 @@ class GoogleDriveIE(InfoExtractor):
r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage, 'fmt stream map').split(',')
fmt_list = self._search_regex(r'"fmt_list"\s*,\s*"([^"]+)', webpage, 'fmt_list').split(',')
resolutions = {}
for fmt in fmt_list:
mobj = re.search(
r'^(?P<format_id>\d+)/(?P<width>\d+)[xX](?P<height>\d+)', fmt)
if mobj:
resolutions[mobj.group('format_id')] = (
int(mobj.group('width')), int(mobj.group('height')))
formats = []
for fmt, fmt_stream in zip(fmt_list, fmt_stream_map):
fmt_id, fmt_url = fmt_stream.split('|')
resolution = fmt.split('/')[1]
width, height = resolution.split('x')
formats.append({
'url': lowercase_escape(fmt_url),
'format_id': fmt_id,
'resolution': resolution,
'width': int_or_none(width),
'height': int_or_none(height),
'ext': self._FORMATS_EXT[fmt_id],
})
for fmt_stream in fmt_stream_map:
fmt_stream_split = fmt_stream.split('|')
if len(fmt_stream_split) < 2:
continue
format_id, format_url = fmt_stream_split[:2]
f = {
'url': lowercase_escape(format_url),
'format_id': format_id,
'ext': self._FORMATS_EXT[format_id],
}
resolution = resolutions.get(format_id)
if resolution:
f.update({
'width': resolution[0],
'height': resolution[1],
})
formats.append(f)
self._sort_formats(formats)
return {

View File

@@ -7,14 +7,19 @@ from .common import InfoExtractor
class HGTVComShowIE(InfoExtractor):
IE_NAME = 'hgtv.com:show'
_VALID_URL = r'https?://(?:www\.)?hgtv\.com/shows/[^/]+/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-videos',
_TESTS = [{
# data-module="video"
'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-season-4-videos',
'info_dict': {
'id': 'flip-or-flop-full-episodes-videos',
'id': 'flip-or-flop-full-episodes-season-4-videos',
'title': 'Flip or Flop Full Episodes',
},
'playlist_mincount': 15,
}
}, {
# data-deferred-module="video"
'url': 'http://www.hgtv.com/shows/good-bones/episodes/an-old-victorian-house-gets-a-new-facelift',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
@@ -23,7 +28,7 @@ class HGTVComShowIE(InfoExtractor):
config = self._parse_json(
self._search_regex(
r'(?s)data-module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
r'(?s)data-(?:deferred-)?module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
webpage, 'video config'),
display_id)['channels'][0]

View File

@@ -89,6 +89,11 @@ class IGNIE(InfoExtractor):
'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds',
'only_matching': True,
},
{
# videoId pattern
'url': 'http://www.ign.com/articles/2017/06/08/new-ducktales-short-donalds-birthday-doesnt-go-as-planned',
'only_matching': True,
},
]
def _find_video_id(self, webpage):
@@ -98,6 +103,8 @@ class IGNIE(InfoExtractor):
r'data-video-id="(.+?)"',
r'<object id="vid_(.+?)"',
r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
r'videoId&quot;\s*:\s*&quot;(.+?)&quot;',
r'videoId["\']\s*:\s*["\']([^"\']+?)["\']',
]
return self._search_regex(res_id, webpage, 'video id', default=None)

100
youtube_dl/extractor/joj.py Executable file
View File

@@ -0,0 +1,100 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
js_to_json,
try_get,
)
class JojIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
joj:|
https?://media\.joj\.sk/embed/
)
(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})
'''
_TESTS = [{
'url': 'https://media.joj.sk/embed/a388ec4c-6019-4a4a-9312-b1bee194e932',
'info_dict': {
'id': 'a388ec4c-6019-4a4a-9312-b1bee194e932',
'ext': 'mp4',
'title': 'NOVÉ BÝVANIE',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3118,
}
}, {
'url': 'joj:a388ec4c-6019-4a4a-9312-b1bee194e932',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//media\.joj\.sk/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://media.joj.sk/embed/%s' % video_id, video_id)
title = self._search_regex(
(r'videoTitle\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<title>(?P<title>[^<]+)'), webpage, 'title',
default=None, group='title') or self._og_search_title(webpage)
bitrates = self._parse_json(
self._search_regex(
r'(?s)bitrates\s*=\s*({.+?});', webpage, 'bitrates',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)
formats = []
for format_url in try_get(bitrates, lambda x: x['mp4'], list) or []:
if isinstance(format_url, compat_str):
height = self._search_regex(
r'(\d+)[pP]\.', format_url, 'height', default=None)
formats.append({
'url': format_url,
'format_id': '%sp' % height if height else None,
'height': int(height),
})
if not formats:
playlist = self._download_xml(
'https://media.joj.sk/services/Video.php?clip=%s' % video_id,
video_id)
for file_el in playlist.findall('./files/file'):
path = file_el.get('path')
if not path:
continue
format_id = file_el.get('id') or file_el.get('label')
formats.append({
'url': 'http://n16.joj.sk/storage/%s' % path.replace(
'dat/', '', 1),
'format_id': format_id,
'height': int_or_none(self._search_regex(
r'(\d+)[pP]', format_id or path, 'height',
default=None)),
})
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
duration = int_or_none(self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@@ -324,7 +324,7 @@ class KalturaIE(InfoExtractor):
if captions:
for caption in captions.get('objects', []):
# Continue if caption is not ready
if f.get('status') != 2:
if caption.get('status') != 2:
continue
if not caption.get('id'):
continue

View File

@@ -83,9 +83,12 @@ class NiconicoIE(InfoExtractor):
'uploader_id': '312',
},
'skip': 'The viewing period of the video you were searching for has expired.',
}, {
'url': 'http://sp.nicovideo.jp/watch/sm28964488?ss_pos=1&cp_in=wt_tg',
'only_matching': True,
}]
_VALID_URL = r'https?://(?:www\.|secure\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
_VALID_URL = r'https?://(?:www\.|secure\.|sp\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
_NETRC_MACHINE = 'niconico'
def _real_initialize(self):

View File

@@ -35,7 +35,7 @@ class NPOIE(NPOBaseIE):
https?://
(?:www\.)?
(?:
npo\.nl/(?!live|radio)(?:[^/]+/){2}|
npo\.nl/(?!(?:live|radio)/)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__|
zapp\.nl/[^/]+/[^/]+/
@@ -150,6 +150,9 @@ class NPOIE(NPOBaseIE):
# live stream
'url': 'npo:LI_NL1_4188102',
'only_matching': True,
}, {
'url': 'http://www.npo.nl/radio-gaga/13-06-2017/BNN_101383373',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -338,7 +341,7 @@ class NPOLiveIE(NPOBaseIE):
webpage = self._download_webpage(url, display_id)
live_id = self._search_regex(
r'data-prid="([^"]+)"', webpage, 'live id')
[r'media-id="([^"]+)"', r'data-prid="([^"]+)"'], webpage, 'live id')
return {
'_type': 'url_transparent',

View File

@@ -11,6 +11,7 @@ from ..utils import (
get_element_by_class,
int_or_none,
js_to_json,
NO_DEFAULT,
parse_iso8601,
remove_start,
strip_or_none,
@@ -198,6 +199,19 @@ class OnetPlIE(InfoExtractor):
'upload_date': '20170214',
'timestamp': 1487078046,
},
}, {
# embedded via pulsembed
'url': 'http://film.onet.pl/pensjonat-nad-rozlewiskiem-relacja-z-planu-serialu/y428n0',
'info_dict': {
'id': '501235.965429946',
'ext': 'mp4',
'title': '"Pensjonat nad rozlewiskiem": relacja z planu serialu',
'upload_date': '20170622',
'timestamp': 1498159955,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
'only_matching': True,
@@ -212,13 +226,25 @@ class OnetPlIE(InfoExtractor):
'only_matching': True,
}]
def _search_mvp_id(self, webpage, default=NO_DEFAULT):
return self._search_regex(
r'data-(?:params-)?mvp=["\'](\d+\.\d+)', webpage, 'mvp id',
default=default)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mvp_id = self._search_regex(
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
mvp_id = self._search_mvp_id(webpage, default=None)
if not mvp_id:
pulsembed_url = self._search_regex(
r'data-src=(["\'])(?P<url>(?:https?:)?//pulsembed\.eu/.+?)\1',
webpage, 'pulsembed url', group='url')
webpage = self._download_webpage(
pulsembed_url, video_id, 'Downloading pulsembed webpage')
mvp_id = self._search_mvp_id(webpage)
return self.url_result(
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)

View File

@@ -3,12 +3,14 @@ import re
import base64
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
ExtractorError,
unsmuggle_url,
determine_ext,
ExtractorError,
float_or_none,
int_or_none,
try_get,
unsmuggle_url,
)
from ..compat import compat_urllib_parse_urlencode
@@ -39,13 +41,15 @@ class OoyalaBaseIE(InfoExtractor):
formats = []
if cur_auth_data['authorized']:
for stream in cur_auth_data['streams']:
s_url = base64.b64decode(
stream['url']['data'].encode('ascii')).decode('utf-8')
if s_url in urls:
url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
if not url_data:
continue
s_url = base64.b64decode(url_data.encode('ascii')).decode('utf-8')
if not s_url or s_url in urls:
continue
urls.append(s_url)
ext = determine_ext(s_url, None)
delivery_type = stream['delivery_type']
delivery_type = stream.get('delivery_type')
if delivery_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
@@ -65,7 +69,7 @@ class OoyalaBaseIE(InfoExtractor):
else:
formats.append({
'url': s_url,
'ext': ext or stream.get('delivery_type'),
'ext': ext or delivery_type,
'vcodec': stream.get('video_codec'),
'format_id': delivery_type,
'width': int_or_none(stream.get('width')),
@@ -136,6 +140,11 @@ class OoyalaIE(OoyalaBaseIE):
'title': 'Divide Tool Path.mp4',
'duration': 204.405,
}
},
{
# empty stream['url']['data']
'url': 'http://player.ooyala.com/player.js?embedCode=w2bnZtYjE6axZ_dw1Cd0hQtXd_ige2Is',
'only_matching': True,
}
]

View File

@@ -10,13 +10,13 @@ from ..utils import (
class PandaTVIE(InfoExtractor):
IE_DESC = '熊猫TV'
_VALID_URL = r'http://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.panda.tv/10091',
_VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.panda.tv/66666',
'info_dict': {
'id': '10091',
'id': '66666',
'title': 're:.+',
'uploader': '囚徒',
'uploader': '刘杀鸡',
'ext': 'flv',
'is_live': True,
},
@@ -24,13 +24,16 @@ class PandaTVIE(InfoExtractor):
'skip_download': True,
},
'skip': 'Live stream is offline',
}
}, {
'url': 'https://www.panda.tv/66666',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
config = self._download_json(
'http://www.panda.tv/api_room?roomid=%s' % video_id, video_id)
'https://www.panda.tv/api_room?roomid=%s' % video_id, video_id)
error_code = config.get('errno', 0)
if error_code is not 0:
@@ -74,7 +77,7 @@ class PandaTVIE(InfoExtractor):
continue
for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
formats.append({
'url': 'http://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s'
'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s'
% (pl, plflag1, room_key, live_panda, suffix[quality], ext),
'format_id': '%s-%s' % (k, ext),
'quality': quality,

View File

@@ -19,7 +19,7 @@ class PandoraTVIE(InfoExtractor):
IE_NAME = 'pandora.tv'
IE_DESC = '판도라TV'
_VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?'
_TEST = {
_TESTS = [{
'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2',
'info_dict': {
'id': '53294230',
@@ -34,7 +34,26 @@ class PandoraTVIE(InfoExtractor):
'view_count': int,
'like_count': int,
}
}
}, {
'url': 'http://channel.pandora.tv/channel/video.ptv?ch_userid=gogoucc&prgid=54721744',
'info_dict': {
'id': '54721744',
'ext': 'flv',
'title': '[HD] JAPAN COUNTDOWN 170423',
'description': '[HD] JAPAN COUNTDOWN 170423',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1704.9,
'upload_date': '20170423',
'uploader': 'GOGO_UCC',
'uploader_id': 'gogoucc',
'view_count': int,
'like_count': int,
},
'params': {
# Test metadata only
'skip_download': True,
},
}]
def _real_extract(self, url):
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
@@ -86,7 +105,7 @@ class PandoraTVIE(InfoExtractor):
'description': info.get('body'),
'thumbnail': info.get('thumbnail') or info.get('poster'),
'duration': float_or_none(info.get('runtime'), 1000) or parse_duration(info.get('time')),
'upload_date': info['fid'][:8] if isinstance(info.get('fid'), compat_str) else None,
'upload_date': info['fid'].split('/')[-1][:8] if isinstance(info.get('fid'), compat_str) else None,
'uploader': info.get('nickname'),
'uploader_id': info.get('upload_userid'),
'view_count': str_to_int(info.get('hit')),

View File

@@ -65,7 +65,7 @@ class PolskieRadioIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id)
content = self._search_regex(
r'(?s)<div[^>]+class="audio atarticle"[^>]*>(.+?)<script>',
r'(?s)<div[^>]+class="\s*this-article\s*"[^>]*>(.+?)<div[^>]+class="tags"[^>]*>',
webpage, 'content')
timestamp = unified_timestamp(self._html_search_regex(

View File

@@ -191,11 +191,12 @@ class RaiPlayIE(RaiBaseIE):
info = {
'id': video_id,
'title': title,
'title': self._live_title(title) if relinker_info.get(
'is_live') else title,
'alt_title': media.get('subtitle'),
'description': media.get('description'),
'uploader': media.get('channel'),
'creator': media.get('editor'),
'uploader': strip_or_none(media.get('channel')),
'creator': strip_or_none(media.get('editor')),
'duration': parse_duration(video.get('duration')),
'timestamp': timestamp,
'thumbnails': thumbnails,
@@ -208,10 +209,46 @@ class RaiPlayIE(RaiBaseIE):
}
info.update(relinker_info)
return info
class RaiPlayLiveIE(RaiBaseIE):
_VALID_URL = r'https?://(?:www\.)?raiplay\.it/dirette/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.raiplay.it/dirette/rainews24',
'info_dict': {
'id': 'd784ad40-e0ae-4a69-aa76-37519d238a9c',
'display_id': 'rainews24',
'ext': 'mp4',
'title': 're:^Diretta di Rai News 24 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:6eca31500550f9376819f174e5644754',
'uploader': 'Rai News 24',
'creator': 'Rai News 24',
'is_live': True,
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-uniquename=["\']ContentItem-(%s)' % RaiBaseIE._UUID_RE,
webpage, 'content id')
return {
'_type': 'url_transparent',
'ie_key': RaiPlayIE.ie_key(),
'url': 'http://www.raiplay.it/dirette/ContentItem-%s.html' % video_id,
'id': video_id,
'display_id': display_id,
}
class RaiIE(RaiBaseIE):
_VALID_URL = r'https?://[^/]+\.(?:rai\.(?:it|tv)|rainews\.it)/dl/.+?-(?P<id>%s)(?:-.+?)?\.html' % RaiBaseIE._UUID_RE
_TESTS = [{

View File

@@ -13,7 +13,7 @@ from ..utils import (
class RedBullTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull\.tv/(?:video|film)/(?P<id>AP-\w+)'
_VALID_URL = r'https?://(?:www\.)?redbull\.tv/(?:video|film|live)/(?:AP-\w+/segment/)?(?P<id>AP-\w+)'
_TESTS = [{
# film
'url': 'https://www.redbull.tv/video/AP-1Q756YYX51W11/abc-of-wrc',
@@ -42,6 +42,22 @@ class RedBullTVIE(InfoExtractor):
'season_number': 2,
'episode_number': 4,
},
'params': {
'skip_download': True,
},
}, {
# segment
'url': 'https://www.redbull.tv/live/AP-1R5DX49XS1W11/segment/AP-1QSAQJ6V52111/semi-finals',
'info_dict': {
'id': 'AP-1QSAQJ6V52111',
'ext': 'mp4',
'title': 'Semi Finals - Vans Park Series Pro Tour',
'description': 'md5:306a2783cdafa9e65e39aa62f514fd97',
'duration': 11791.991,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.redbull.tv/film/AP-1MSKKF5T92111/in-motion',
'only_matching': True,
@@ -82,7 +98,8 @@ class RedBullTVIE(InfoExtractor):
title = info['title'].strip()
formats = self._extract_m3u8_formats(
video['url'], video_id, 'mp4', 'm3u8_native')
video['url'], video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
subtitles = {}

101
youtube_dl/extractor/ruv.py Normal file
View File

@@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
unified_timestamp,
)
class RuvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ruv\.is/(?:sarpurinn/[^/]+|node)/(?P<id>[^/]+(?:/\d+)?)'
_TESTS = [{
# m3u8
'url': 'http://ruv.is/sarpurinn/ruv-aukaras/fh-valur/20170516',
'md5': '66347652f4e13e71936817102acc1724',
'info_dict': {
'id': '1144499',
'display_id': 'fh-valur/20170516',
'ext': 'mp4',
'title': 'FH - Valur',
'description': 'Bein útsending frá 3. leik FH og Vals í úrslitum Olísdeildar karla í handbolta.',
'timestamp': 1494963600,
'upload_date': '20170516',
},
}, {
# mp3
'url': 'http://ruv.is/sarpurinn/ras-2/morgunutvarpid/20170619',
'md5': '395ea250c8a13e5fdb39d4670ef85378',
'info_dict': {
'id': '1153630',
'display_id': 'morgunutvarpid/20170619',
'ext': 'mp3',
'title': 'Morgunútvarpið',
'description': 'md5:a4cf1202c0a1645ca096b06525915418',
'timestamp': 1497855000,
'upload_date': '20170619',
},
}, {
'url': 'http://ruv.is/sarpurinn/ruv/frettir/20170614',
'only_matching': True,
}, {
'url': 'http://www.ruv.is/node/1151854',
'only_matching': True,
}, {
'url': 'http://ruv.is/sarpurinn/klippa/secret-soltice-hefst-a-morgun',
'only_matching': True,
}, {
'url': 'http://ruv.is/sarpurinn/ras-1/morgunvaktin/20170619',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._og_search_title(webpage)
FIELD_RE = r'video\.%s\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'
media_url = self._html_search_regex(
FIELD_RE % 'src', webpage, 'video URL', group='url')
video_id = self._search_regex(
r'<link\b[^>]+\bhref=["\']https?://www\.ruv\.is/node/(\d+)',
webpage, 'video id', default=display_id)
ext = determine_ext(media_url)
if ext == 'm3u8':
formats = self._extract_m3u8_formats(
media_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
elif ext == 'mp3':
formats = [{
'format_id': 'mp3',
'url': media_url,
'vcodec': 'none',
}]
else:
formats = [{
'url': media_url,
}]
description = self._og_search_description(webpage, default=None)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._search_regex(
FIELD_RE % 'poster', webpage, 'thumbnail', fatal=False)
timestamp = unified_timestamp(self._html_search_meta(
'article:published_time', webpage, 'timestamp', fatal=False))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'formats': formats,
}

View File

@@ -136,7 +136,7 @@ class SoundcloudIE(InfoExtractor):
@classmethod
def _resolv_url(cls, url):
return 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
return 'https://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
def _extract_info_dict(self, info, full_title=None, quiet=False, secret_token=None):
track_id = compat_str(info['id'])
@@ -174,7 +174,7 @@ class SoundcloudIE(InfoExtractor):
# We have to retrieve the url
format_dict = self._download_json(
'http://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
'https://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
track_id, 'Downloading track url', query={
'client_id': self._CLIENT_ID,
'secret_token': secret_token,
@@ -236,7 +236,7 @@ class SoundcloudIE(InfoExtractor):
track_id = mobj.group('track_id')
if track_id is not None:
info_json_url = 'http://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
info_json_url = 'https://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
full_title = track_id
token = mobj.group('secret_token')
if token:
@@ -261,7 +261,7 @@ class SoundcloudIE(InfoExtractor):
self.report_resolve(full_title)
url = 'http://soundcloud.com/%s' % resolve_title
url = 'https://soundcloud.com/%s' % resolve_title
info_json_url = self._resolv_url(url)
info = self._download_json(info_json_url, full_title, 'Downloading info JSON')
@@ -290,7 +290,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
'id': '2284613',
'title': 'The Royal Concept EP',
},
'playlist_mincount': 6,
'playlist_mincount': 5,
}, {
'url': 'https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep/token',
'only_matching': True,
@@ -304,7 +304,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
# extract simple title (uploader + slug of song title)
slug_title = mobj.group('slug_title')
full_title = '%s/sets/%s' % (uploader, slug_title)
url = 'http://soundcloud.com/%s/sets/%s' % (uploader, slug_title)
url = 'https://soundcloud.com/%s/sets/%s' % (uploader, slug_title)
token = mobj.group('token')
if token:
@@ -380,7 +380,7 @@ class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
'url': 'https://soundcloud.com/grynpyret/spotlight',
'info_dict': {
'id': '7098329',
'title': 'GRYNPYRET (Spotlight)',
'title': 'Grynpyret (Spotlight)',
},
'playlist_mincount': 1,
}]
@@ -410,7 +410,7 @@ class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
mobj = re.match(self._VALID_URL, url)
uploader = mobj.group('user')
url = 'http://soundcloud.com/%s/' % uploader
url = 'https://soundcloud.com/%s/' % uploader
resolv_url = self._resolv_url(url)
user = self._download_json(
resolv_url, uploader, 'Downloading user info')
@@ -473,7 +473,7 @@ class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
_VALID_URL = r'https?://api\.soundcloud\.com/playlists/(?P<id>[0-9]+)(?:/?\?secret_token=(?P<token>[^&]+?))?$'
IE_NAME = 'soundcloud:playlist'
_TESTS = [{
'url': 'http://api.soundcloud.com/playlists/4110309',
'url': 'https://api.soundcloud.com/playlists/4110309',
'info_dict': {
'id': '4110309',
'title': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]',

View File

@@ -0,0 +1,43 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .ooyala import OoyalaIE
class TastyTradeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tastytrade\.com/tt/shows/[^/]+/episodes/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.tastytrade.com/tt/shows/market-measures/episodes/correlation-in-short-volatility-06-28-2017',
'info_dict': {
'id': 'F3bnlzbToeI6pLEfRyrlfooIILUjz4nM',
'ext': 'mp4',
'title': 'A History of Teaming',
'description': 'md5:2a9033db8da81f2edffa4c99888140b3',
'duration': 422.255,
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'https://www.tastytrade.com/tt/shows/daily-dose/episodes/daily-dose-06-30-2017',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
ooyala_code = self._search_regex(
r'data-media-id=(["\'])(?P<code>(?:(?!\1).)+)\1',
webpage, 'ooyala code', group='code')
info = self._search_json_ld(webpage, display_id, fatal=False)
info.update({
'_type': 'url_transparent',
'ie_key': OoyalaIE.ie_key(),
'url': 'ooyala:%s' % ooyala_code,
'display_id': display_id,
})
return info

View File

@@ -6,7 +6,10 @@ import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
from ..utils import (
int_or_none,
try_get,
)
class TEDIE(InfoExtractor):
@@ -113,8 +116,9 @@ class TEDIE(InfoExtractor):
}
def _extract_info(self, webpage):
info_json = self._search_regex(r'q\("\w+.init",({.+})\)</script>',
webpage, 'info json')
info_json = self._search_regex(
r'(?s)q\(\s*"\w+.init"\s*,\s*({.+})\)\s*</script>',
webpage, 'info json')
return json.loads(info_json)
def _real_extract(self, url):
@@ -136,11 +140,16 @@ class TEDIE(InfoExtractor):
webpage = self._download_webpage(url, name,
'Downloading playlist webpage')
info = self._extract_info(webpage)
playlist_info = info['playlist']
playlist_info = try_get(
info, lambda x: x['__INITIAL_DATA__']['playlist'],
dict) or info['playlist']
playlist_entries = [
self.url_result('http://www.ted.com/talks/' + talk['slug'], self.ie_key())
for talk in info['talks']
for talk in try_get(
info, lambda x: x['__INITIAL_DATA__']['talks'],
dict) or info['talks']
]
return self.playlist_result(
playlist_entries,
@@ -149,9 +158,14 @@ class TEDIE(InfoExtractor):
def _talk_info(self, url, video_name):
webpage = self._download_webpage(url, video_name)
self.report_extraction(video_name)
talk_info = self._extract_info(webpage)['talks'][0]
info = self._extract_info(webpage)
talk_info = try_get(
info, lambda x: x['__INITIAL_DATA__']['talks'][0],
dict) or info['talks'][0]
title = talk_info['title'].strip()
external = talk_info.get('external')
if external:
@@ -165,19 +179,27 @@ class TEDIE(InfoExtractor):
'url': ext_url or external['uri'],
}
native_downloads = try_get(
talk_info, lambda x: x['downloads']['nativeDownloads'],
dict) or talk_info['nativeDownloads']
formats = [{
'url': format_url,
'format_id': format_id,
'format': format_id,
} for (format_id, format_url) in talk_info['nativeDownloads'].items() if format_url is not None]
} for (format_id, format_url) in native_downloads.items() if format_url is not None]
if formats:
for f in formats:
finfo = self._NATIVE_FORMATS.get(f['format_id'])
if finfo:
f.update(finfo)
player_talk = talk_info['player_talks'][0]
resources_ = player_talk.get('resources') or talk_info.get('resources')
http_url = None
for format_id, resources in talk_info['resources'].items():
for format_id, resources in resources_.items():
if format_id == 'h264':
for resource in resources:
h264_url = resource.get('file')
@@ -237,14 +259,11 @@ class TEDIE(InfoExtractor):
video_id = compat_str(talk_info['id'])
thumbnail = talk_info['thumb']
if not thumbnail.startswith('http'):
thumbnail = 'http://' + thumbnail
return {
'id': video_id,
'title': talk_info['title'].strip(),
'uploader': talk_info['speaker'],
'thumbnail': thumbnail,
'title': title,
'uploader': player_talk.get('speaker') or talk_info.get('speaker'),
'thumbnail': player_talk.get('thumb') or talk_info.get('thumb'),
'description': self._og_search_description(webpage),
'subtitles': self._get_subtitles(video_id, talk_info),
'formats': formats,

View File

@@ -2,13 +2,15 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import try_get
class ThisOldHouseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
'md5': '946f05bbaa12a33f9ae35580d2dfcfe3',
'md5': '568acf9ca25a639f0c4ff905826b662f',
'info_dict': {
'id': '2REGtUDQ',
'ext': 'mp4',
@@ -28,8 +30,15 @@ class ThisOldHouseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
video_id = drupal_settings['jwplatform']['video_id']
video_id = self._search_regex(
(r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1',
r'id=(["\'])inline-video-player-(?P<id>(?:(?!\1).)+)\1'),
webpage, 'video id', default=None, group='id')
if not video_id:
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
video_id = try_get(
drupal_settings, lambda x: x['jwplatform']['video_id'],
compat_str) or list(drupal_settings['comScore'])[0]
return self.url_result('jwplatform:' + video_id, 'JWPlatform', video_id)

View File

@@ -12,47 +12,46 @@ from ..utils import (
class VeohIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/(?P<id>(?:v|yapi-)[\da-zA-Z]+)'
_VALID_URL = r'https?://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/(?P<id>(?:v|e|yapi-)[\da-zA-Z]+)'
_TESTS = [
{
'url': 'http://www.veoh.com/watch/v56314296nk7Zdmz3',
'md5': '620e68e6a3cff80086df3348426c9ca3',
'info_dict': {
'id': '56314296',
'ext': 'mp4',
'title': 'Straight Backs Are Stronger',
'uploader': 'LUMOback',
'description': 'At LUMOback, we believe straight backs are stronger. The LUMOback Posture & Movement Sensor: It gently vibrates when you slouch, inspiring improved posture and mobility. Use the app to track your data and improve your posture over time. ',
},
_TESTS = [{
'url': 'http://www.veoh.com/watch/v56314296nk7Zdmz3',
'md5': '620e68e6a3cff80086df3348426c9ca3',
'info_dict': {
'id': '56314296',
'ext': 'mp4',
'title': 'Straight Backs Are Stronger',
'uploader': 'LUMOback',
'description': 'At LUMOback, we believe straight backs are stronger. The LUMOback Posture & Movement Sensor: It gently vibrates when you slouch, inspiring improved posture and mobility. Use the app to track your data and improve your posture over time. ',
},
{
'url': 'http://www.veoh.com/watch/v27701988pbTc4wzN?h1=Chile+workers+cover+up+to+avoid+skin+damage',
'md5': '4a6ff84b87d536a6a71e6aa6c0ad07fa',
'info_dict': {
'id': '27701988',
'ext': 'mp4',
'title': 'Chile workers cover up to avoid skin damage',
'description': 'md5:2bd151625a60a32822873efc246ba20d',
'uploader': 'afp-news',
'duration': 123,
},
'skip': 'This video has been deleted.',
}, {
'url': 'http://www.veoh.com/watch/v27701988pbTc4wzN?h1=Chile+workers+cover+up+to+avoid+skin+damage',
'md5': '4a6ff84b87d536a6a71e6aa6c0ad07fa',
'info_dict': {
'id': '27701988',
'ext': 'mp4',
'title': 'Chile workers cover up to avoid skin damage',
'description': 'md5:2bd151625a60a32822873efc246ba20d',
'uploader': 'afp-news',
'duration': 123,
},
{
'url': 'http://www.veoh.com/watch/v69525809F6Nc4frX',
'md5': '4fde7b9e33577bab2f2f8f260e30e979',
'note': 'Embedded ooyala video',
'info_dict': {
'id': '69525809',
'ext': 'mp4',
'title': 'Doctors Alter Plan For Preteen\'s Weight Loss Surgery',
'description': 'md5:f5a11c51f8fb51d2315bca0937526891',
'uploader': 'newsy-videos',
},
'skip': 'This video has been deleted.',
'skip': 'This video has been deleted.',
}, {
'url': 'http://www.veoh.com/watch/v69525809F6Nc4frX',
'md5': '4fde7b9e33577bab2f2f8f260e30e979',
'note': 'Embedded ooyala video',
'info_dict': {
'id': '69525809',
'ext': 'mp4',
'title': 'Doctors Alter Plan For Preteen\'s Weight Loss Surgery',
'description': 'md5:f5a11c51f8fb51d2315bca0937526891',
'uploader': 'newsy-videos',
},
]
'skip': 'This video has been deleted.',
}, {
'url': 'http://www.veoh.com/watch/e152215AJxZktGS',
'only_matching': True,
}]
def _extract_formats(self, source):
formats = []

View File

@@ -15,7 +15,21 @@ from ..utils import (
class VierIE(InfoExtractor):
IE_NAME = 'vier'
IE_DESC = 'vier.be and vijf.be'
_VALID_URL = r'https?://(?:www\.)?(?P<site>vier|vijf)\.be/(?:[^/]+/videos/(?P<display_id>[^/]+)(?:/(?P<id>\d+))?|video/v3/embed/(?P<embed_id>\d+))'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?(?P<site>vier|vijf)\.be/
(?:
(?:
[^/]+/videos|
video(?:/[^/]+)*
)/
(?P<display_id>[^/]+)(?:/(?P<id>\d+))?|
(?:
video/v3/embed|
embed/video/public
)/(?P<embed_id>\d+)
)
'''
_NETRC_MACHINE = 'vier'
_TESTS = [{
'url': 'http://www.vier.be/planb/videos/het-wordt-warm-de-moestuin/16129',
@@ -83,6 +97,15 @@ class VierIE(InfoExtractor):
}, {
'url': 'http://www.vier.be/video/v3/embed/16129',
'only_matching': True,
}, {
'url': 'https://www.vijf.be/embed/video/public/4093',
'only_matching': True,
}, {
'url': 'https://www.vier.be/video/blockbusters/in-juli-en-augustus-summer-classics',
'only_matching': True,
}, {
'url': 'https://www.vier.be/video/achter-de-rug/2017/achter-de-rug-seizoen-1-aflevering-6',
'only_matching': True,
}]
def _real_initialize(self):
@@ -133,14 +156,20 @@ class VierIE(InfoExtractor):
video_id = self._search_regex(
[r'data-nid="(\d+)"', r'"nid"\s*:\s*"(\d+)"'],
webpage, 'video id', default=video_id or display_id)
application = self._search_regex(
[r'data-application="([^"]+)"', r'"application"\s*:\s*"([^"]+)"'],
webpage, 'application', default=site + '_vod')
filename = self._search_regex(
[r'data-filename="([^"]+)"', r'"filename"\s*:\s*"([^"]+)"'],
webpage, 'filename')
playlist_url = 'http://vod.streamcloud.be/%s/_definst_/mp4:%s.mp4/playlist.m3u8' % (application, filename)
playlist_url = self._search_regex(
r'data-file=(["\'])(?P<url>(?:https?:)?//[^/]+/.+?\.m3u8.*?)\1',
webpage, 'm3u8 url', default=None, group='url')
if not playlist_url:
application = self._search_regex(
[r'data-application="([^"]+)"', r'"application"\s*:\s*"([^"]+)"'],
webpage, 'application', default=site + '_vod')
filename = self._search_regex(
[r'data-filename="([^"]+)"', r'"filename"\s*:\s*"([^"]+)"'],
webpage, 'filename')
playlist_url = 'http://vod.streamcloud.be/%s/_definst_/mp4:%s.mp4/playlist.m3u8' % (application, filename)
formats = self._extract_wowza_formats(
playlist_url, display_id, skip_protocols=['dash'])
self._sort_formats(formats)

View File

@@ -615,7 +615,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
if download_url and not source_file.get('is_cold') and not source_file.get('is_defrosting'):
source_name = source_file.get('public_name', 'Original')
if self._is_valid_url(download_url, video_id, '%s video' % source_name):
ext = source_file.get('extension', determine_ext(download_url)).lower()
ext = (try_get(
source_file, lambda x: x['extension'],
compat_str) or determine_ext(
download_url, None) or 'mp4').lower()
formats.append({
'url': download_url,
'ext': ext,

View File

@@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_kwargs,
compat_str,
)
from ..utils import (
ExtractorError,
int_or_none,
@@ -36,7 +39,8 @@ class ViuBaseIE(InfoExtractor):
headers.update(kwargs.get('headers', {}))
kwargs['headers'] = headers
response = self._download_json(
'https://www.viu.com/api/' + path, *args, **kwargs)['response']
'https://www.viu.com/api/' + path, *args,
**compat_kwargs(kwargs))['response']
if response.get('status') != 'success':
raise ExtractorError('%s said: %s' % (
self.IE_NAME, response['message']), expected=True)

View File

@@ -4,11 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
unified_strdate,
parse_duration,
int_or_none,
)
from ..utils import parse_duration
class WatchIndianPornIE(InfoExtractor):
@@ -23,11 +19,8 @@ class WatchIndianPornIE(InfoExtractor):
'ext': 'mp4',
'title': 'Hot milf from kerala shows off her gorgeous large breasts on camera',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'LoveJay',
'upload_date': '20160428',
'duration': 226,
'view_count': int,
'comment_count': int,
'categories': list,
'age_limit': 18,
}
@@ -40,51 +33,36 @@ class WatchIndianPornIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
video_url = self._html_search_regex(
r"url: escape\('([^']+)'\)", webpage, 'url')
info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
title = self._html_search_regex(
r'<h2 class="he2"><span>(.*?)</span>',
webpage, 'title')
thumbnail = self._html_search_regex(
r'<span id="container"><img\s+src="([^"]+)"',
webpage, 'thumbnail', fatal=False)
uploader = self._html_search_regex(
r'class="aupa">\s*(.*?)</a>',
webpage, 'uploader')
upload_date = unified_strdate(self._html_search_regex(
r'Added: <strong>(.+?)</strong>', webpage, 'upload date', fatal=False))
title = self._html_search_regex((
r'<title>(.+?)\s*-\s*Indian\s+Porn</title>',
r'<h4>(.+?)</h4>'
), webpage, 'title')
duration = parse_duration(self._search_regex(
r'<td>Time:\s*</td>\s*<td align="right"><span>\s*(.+?)\s*</span>',
r'Time:\s*<strong>\s*(.+?)\s*</strong>',
webpage, 'duration', fatal=False))
view_count = int_or_none(self._search_regex(
r'<td>Views:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
view_count = int(self._search_regex(
r'(?s)Time:\s*<strong>.*?</strong>.*?<strong>\s*(\d+)\s*</strong>',
webpage, 'view count', fatal=False))
comment_count = int_or_none(self._search_regex(
r'<td>Comments:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
webpage, 'comment count', fatal=False))
categories = re.findall(
r'<a href="[^"]+/search/video/desi"><span>([^<]+)</span></a>',
r'<a[^>]+class=[\'"]categories[\'"][^>]*>\s*([^<]+)\s*</a>',
webpage)
return {
info_dict.update({
'id': video_id,
'display_id': display_id,
'url': video_url,
'http_headers': {
'Referer': url,
},
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'categories': categories,
'age_limit': 18,
}
})
return info_dict

View File

@@ -13,7 +13,7 @@ class WSJIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
https?://video-api\.wsj\.com/api-video/player/iframe\.html\?.*?\bguid=|
https?://(?:www\.)?wsj\.com/video/[^/]+/|
https?://(?:www\.)?(?:wsj|barrons)\.com/video/[^/]+/|
wsj:
)
(?P<id>[a-fA-F0-9-]{36})
@@ -35,6 +35,9 @@ class WSJIE(InfoExtractor):
}, {
'url': 'http://www.wsj.com/video/can-alphabet-build-a-smarter-city/359DDAA8-9AC1-489C-82E6-0429C1E430E0.html',
'only_matching': True,
}, {
'url': 'http://www.barrons.com/video/capitalism-deserves-more-respect-from-millennials/F301217E-6F46-43AE-B8D2-B7180D642EE9.html',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -30,6 +30,7 @@ class XFileShareIE(InfoExtractor):
(r'vidbom\.com', 'VidBom'),
(r'vidlo\.us', 'vidlo'),
(r'rapidvideo\.(?:cool|org)', 'RapidVideo.TV'),
(r'fastvideo\.me', 'FastVideo.me'),
)
IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
@@ -112,6 +113,9 @@ class XFileShareIE(InfoExtractor):
}, {
'url': 'http://www.rapidvideo.cool/b667kprndr8w',
'only_matching': True,
}, {
'url': 'http://www.fastvideo.me/k8604r8nk8sn/FAST_FURIOUS_8_-_Trailer_italiano_ufficiale.mp4.html',
'only_matching': True
}]
def _real_extract(self, url):
@@ -153,7 +157,7 @@ class XFileShareIE(InfoExtractor):
def extract_formats(default=NO_DEFAULT):
urls = []
for regex in (
r'file\s*:\s*(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1',
r'(?:file|src)\s*:\s*(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1',
r'file_link\s*=\s*(["\'])(?P<url>http(?:(?!\1).)+)\1',
r'addVariable\((\\?["\'])file\1\s*,\s*(\\?["\'])(?P<url>http(?:(?!\2).)+)\2\)',
r'<embed[^>]+src=(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1'):

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
dict_get,
@@ -14,12 +15,21 @@ from ..utils import (
class XHamsterIE(InfoExtractor):
_VALID_URL = r'(?P<proto>https?)://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.*?)\.html(?:\?.*)?'
_VALID_URL = r'''(?x)
https?://
(?:.+?\.)?xhamster\.com/
(?:
movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
)
'''
_TESTS = [{
'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
'md5': '8281348b8d3c53d39fffb377d24eac4e',
'info_dict': {
'id': '1509445',
'display_id': 'femaleagent_shy_beauty_takes_the_bait',
'ext': 'mp4',
'title': 'FemaleAgent Shy beauty takes the bait',
'upload_date': '20121014',
@@ -32,6 +42,7 @@ class XHamsterIE(InfoExtractor):
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
'info_dict': {
'id': '2221348',
'display_id': 'britney_spears_sexy_booty',
'ext': 'mp4',
'title': 'Britney Spears Sexy Booty',
'upload_date': '20130914',
@@ -66,26 +77,18 @@ class XHamsterIE(InfoExtractor):
# This video is visible for marcoalfa123456's friends only
'url': 'https://it.xhamster.com/movies/7263980/la_mia_vicina.html',
'only_matching': True,
}, {
# new URL schema
'url': 'https://pt.xhamster.com/videos/euro-pedal-pumping-7937821',
'only_matching': True,
}]
def _real_extract(self, url):
def extract_video_url(webpage, name):
return self._search_regex(
[r'''file\s*:\s*(?P<q>["'])(?P<mp4>.+?)(?P=q)''',
r'''<a\s+href=(?P<q>["'])(?P<mp4>.+?)(?P=q)\s+class=["']mp4Thumb''',
r'''<video[^>]+file=(?P<q>["'])(?P<mp4>.+?)(?P=q)[^>]*>'''],
webpage, name, group='mp4')
def is_hd(webpage):
return '<div class=\'icon iconHD\'' in webpage
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') or mobj.group('id_2')
display_id = mobj.group('display_id') or mobj.group('display_id_2')
video_id = mobj.group('id')
seo = mobj.group('seo')
proto = mobj.group('proto')
mrss_url = '%s://xhamster.com/movies/%s/%s.html' % (proto, video_id, seo)
webpage = self._download_webpage(mrss_url, video_id)
webpage = self._download_webpage(url, video_id)
error = self._html_search_regex(
r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
@@ -99,6 +102,39 @@ class XHamsterIE(InfoExtractor):
r'<title[^>]*>(.+?)(?:,\s*[^,]*?\s*Porn\s*[^,]*?:\s*xHamster[^<]*| - xHamster\.com)</title>'],
webpage, 'title')
formats = []
format_urls = set()
sources = self._parse_json(
self._search_regex(
r'sources\s*:\s*({.+?})\s*,?\s*\n', webpage, 'sources',
default='{}'),
video_id, fatal=False)
for format_id, format_url in sources.items():
if not isinstance(format_url, compat_str):
continue
if format_url in format_urls:
continue
format_urls.add(format_url)
formats.append({
'format_id': format_id,
'url': format_url,
'height': int_or_none(self._search_regex(
r'^(\d+)[pP]', format_id, 'height', default=None))
})
video_url = self._search_regex(
[r'''file\s*:\s*(?P<q>["'])(?P<mp4>.+?)(?P=q)''',
r'''<a\s+href=(?P<q>["'])(?P<mp4>.+?)(?P=q)\s+class=["']mp4Thumb''',
r'''<video[^>]+file=(?P<q>["'])(?P<mp4>.+?)(?P=q)[^>]*>'''],
webpage, 'video url', group='mp4', default=None)
if video_url and video_url not in format_urls:
formats.append({
'url': video_url,
})
self._sort_formats(formats)
# Only a few videos have an description
mobj = re.search(r'<span>Description: </span>([^<]+)', webpage)
description = mobj.group(1) if mobj else None
@@ -117,7 +153,8 @@ class XHamsterIE(InfoExtractor):
webpage, 'thumbnail', fatal=False, group='thumbnail')
duration = parse_duration(self._search_regex(
r'Runtime:\s*</span>\s*([\d:]+)', webpage,
[r'<[^<]+\bitemprop=["\']duration["\'][^<]+\bcontent=["\'](.+?)["\']',
r'Runtime:\s*</span>\s*([\d:]+)'], webpage,
'duration', fatal=False))
view_count = int_or_none(self._search_regex(
@@ -132,30 +169,6 @@ class XHamsterIE(InfoExtractor):
age_limit = self._rta_search(webpage)
hd = is_hd(webpage)
format_id = 'hd' if hd else 'sd'
video_url = extract_video_url(webpage, format_id)
formats = [{
'url': video_url,
'format_id': 'hd' if hd else 'sd',
'preference': 1,
}]
if not hd:
mrss_url = self._search_regex(r'<link rel="canonical" href="([^"]+)', webpage, 'mrss_url')
webpage = self._download_webpage(mrss_url + '?hd', video_id, note='Downloading HD webpage')
if is_hd(webpage):
video_url = extract_video_url(webpage, 'hd')
formats.append({
'url': video_url,
'format_id': 'hd',
'preference': 2,
})
self._sort_formats(formats)
categories_html = self._search_regex(
r'(?s)<table.+?(<span>Categories:.+?)</table>', webpage,
'categories', default=None)
@@ -164,6 +177,7 @@ class XHamsterIE(InfoExtractor):
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'upload_date': upload_date,

View File

@@ -1,14 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
ExtractorError,
float_or_none,
get_element_by_attribute,
parse_iso8601,
parse_duration,
remove_end,
)
@@ -24,6 +23,7 @@ class XuiteIE(InfoExtractor):
'id': '3860914',
'ext': 'mp3',
'title': '孤單南半球-歐德陽',
'description': '孤單南半球-歐德陽',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 247.246,
'timestamp': 1314932940,
@@ -44,7 +44,7 @@ class XuiteIE(InfoExtractor):
'duration': 596.458,
'timestamp': 1454242500,
'upload_date': '20160131',
'uploader': 'yan12125',
'uploader': '屁姥',
'uploader_id': '12158353',
'categories': ['個人短片'],
'description': 'http://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4',
@@ -72,10 +72,10 @@ class XuiteIE(InfoExtractor):
# from http://forgetfulbc.blogspot.com/2016/06/date.html
'url': 'http://vlog.xuite.net/embed/cE1xbENoLTI3NDQ3MzM2LmZsdg==?ar=0&as=0',
'info_dict': {
'id': 'cE1xbENoLTI3NDQ3MzM2LmZsdg==',
'id': '27447336',
'ext': 'mp4',
'title': '男女平權只是口號?專家解釋約會時男生是否該幫女生付錢 (中字)',
'description': 'md5:f0abdcb69df300f522a5442ef3146f2a',
'description': 'md5:1223810fa123b179083a3aed53574706',
'timestamp': 1466160960,
'upload_date': '20160617',
'uploader': 'B.C. & Lowy',
@@ -86,29 +86,9 @@ class XuiteIE(InfoExtractor):
'only_matching': True,
}]
@staticmethod
def base64_decode_utf8(data):
return base64.b64decode(data.encode('utf-8')).decode('utf-8')
@staticmethod
def base64_encode_utf8(data):
return base64.b64encode(data.encode('utf-8')).decode('utf-8')
def _extract_flv_config(self, encoded_media_id):
flv_config = self._download_xml(
'http://vlog.xuite.net/flash/player?media=%s' % encoded_media_id,
'flv config')
prop_dict = {}
for prop in flv_config.findall('./property'):
prop_id = self.base64_decode_utf8(prop.attrib['id'])
# CDATA may be empty in flv config
if not prop.text:
continue
encoded_content = self.base64_decode_utf8(prop.text)
prop_dict[prop_id] = compat_urllib_parse_unquote(encoded_content)
return prop_dict
def _real_extract(self, url):
# /play/ URLs provide embedded video URL and more metadata
url = url.replace('/embed/', '/play/')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@@ -121,51 +101,53 @@ class XuiteIE(InfoExtractor):
'%s returned error: %s' % (self.IE_NAME, error_msg),
expected=True)
encoded_media_id = self._search_regex(
r'attributes\.name\s*=\s*"([^"]+)"', webpage,
'encoded media id', default=None)
if encoded_media_id is None:
video_id = self._html_search_regex(
r'data-mediaid="(\d+)"', webpage, 'media id')
encoded_media_id = self.base64_encode_utf8(video_id)
flv_config = self._extract_flv_config(encoded_media_id)
media_info = self._parse_json(self._search_regex(
r'var\s+mediaInfo\s*=\s*({.*});', webpage, 'media info'), video_id)
FORMATS = {
'audio': 'mp3',
'video': 'mp4',
}
video_id = media_info['MEDIA_ID']
formats = []
for format_tag in ('src', 'hq_src'):
video_url = flv_config.get(format_tag)
for key in ('html5Url', 'html5HQUrl'):
video_url = media_info.get(key)
if not video_url:
continue
format_id = self._search_regex(
r'\bq=(.+?)\b', video_url, 'format id', default=format_tag)
r'\bq=(.+?)\b', video_url, 'format id', default=None)
formats.append({
'url': video_url,
'ext': FORMATS.get(flv_config['type'], 'mp4'),
'ext': 'mp4' if format_id.isnumeric() else format_id,
'format_id': format_id,
'height': int(format_id) if format_id.isnumeric() else None,
})
self._sort_formats(formats)
timestamp = flv_config.get('publish_datetime')
timestamp = media_info.get('PUBLISH_DATETIME')
if timestamp:
timestamp = parse_iso8601(timestamp + ' +0800', ' ')
category = flv_config.get('category')
category = media_info.get('catName')
categories = [category] if category else []
uploader = media_info.get('NICKNAME')
uploader_url = None
author_div = get_element_by_attribute('itemprop', 'author', webpage)
if author_div:
uploader = uploader or self._html_search_meta('name', author_div)
uploader_url = self._html_search_regex(
r'<link[^>]+itemprop="url"[^>]+href="([^"]+)"', author_div,
'uploader URL', fatal=False)
return {
'id': video_id,
'title': flv_config['title'],
'description': flv_config.get('description'),
'thumbnail': flv_config.get('thumb'),
'title': media_info['TITLE'],
'description': remove_end(media_info.get('metaDesc'), ' (Xuite 影音)'),
'thumbnail': media_info.get('ogImageUrl'),
'timestamp': timestamp,
'uploader': flv_config.get('author_name'),
'uploader_id': flv_config.get('author_id'),
'duration': parse_duration(flv_config.get('duration')),
'uploader': uploader,
'uploader_id': media_info.get('MEMBER_ID'),
'uploader_url': uploader_url,
'duration': float_or_none(media_info.get('MEDIA_DURATION'), 1000000),
'categories': categories,
'formats': formats,
}

View File

@@ -1,123 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
float_or_none,
month_by_abbreviation,
ExtractorError,
get_element_by_attribute,
)
class YamIE(InfoExtractor):
IE_DESC = '蕃薯藤yam天空部落'
_VALID_URL = r'https?://mymedia\.yam\.com/m/(?P<id>\d+)'
_TESTS = [{
# An audio hosted on Yam
'url': 'http://mymedia.yam.com/m/2283921',
'md5': 'c011b8e262a52d5473d9c2e3c9963b9c',
'info_dict': {
'id': '2283921',
'ext': 'mp3',
'title': '發現 - 趙薇 京華煙雲主題曲',
'description': '發現 - 趙薇 京華煙雲主題曲',
'uploader_id': 'princekt',
'upload_date': '20080807',
'duration': 313.0,
}
}, {
# An external video hosted on YouTube
'url': 'http://mymedia.yam.com/m/3599430',
'md5': '03127cf10d8f35d120a9e8e52e3b17c6',
'info_dict': {
'id': 'CNpEoQlrIgA',
'ext': 'mp4',
'upload_date': '20150306',
'uploader': '新莊社大瑜伽社',
'description': 'md5:11e2e405311633ace874f2e6226c8b17',
'uploader_id': '2323agoy',
'title': '20090412陽明山二子坪-1',
},
'skip': 'Video does not exist',
}, {
'url': 'http://mymedia.yam.com/m/3598173',
'info_dict': {
'id': '3598173',
'ext': 'mp4',
},
'skip': 'cause Yam system error',
}, {
'url': 'http://mymedia.yam.com/m/3599437',
'info_dict': {
'id': '3599437',
'ext': 'mp4',
},
'skip': 'invalid YouTube URL',
}, {
'url': 'http://mymedia.yam.com/m/2373534',
'md5': '7ff74b91b7a817269d83796f8c5890b1',
'info_dict': {
'id': '2373534',
'ext': 'mp3',
'title': '林俊傑&蔡卓妍-小酒窩',
'description': 'md5:904003395a0fcce6cfb25028ff468420',
'upload_date': '20080928',
'uploader_id': 'onliner2',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id)
# Check for errors
system_msg = self._html_search_regex(
r'系統訊息(?:<br>|\n|\r)*([^<>]+)<br>', page, 'system message',
default=None)
if system_msg:
raise ExtractorError(system_msg, expected=True)
# Is it hosted externally on YouTube?
youtube_url = self._html_search_regex(
r'<embed src="(http://www.youtube.com/[^"]+)"',
page, 'YouTube url', default=None)
if youtube_url:
return self.url_result(youtube_url, 'Youtube')
title = self._html_search_regex(
r'<h1[^>]+class="heading"[^>]*>\s*(.+)\s*</h1>', page, 'title')
api_page = self._download_webpage(
'http://mymedia.yam.com/api/a/?pID=' + video_id, video_id,
note='Downloading API page')
api_result_obj = compat_urlparse.parse_qs(api_page)
info_table = get_element_by_attribute('class', 'info', page)
uploader_id = self._html_search_regex(
r'<!-- 發表作者 -->[\n ]+<a href="/([a-z0-9]+)"',
info_table, 'uploader id', fatal=False)
mobj = re.search(r'<!-- 發表於 -->(?P<mon>[A-Z][a-z]{2})\s+' +
r'(?P<day>\d{1,2}), (?P<year>\d{4})', page)
if mobj:
upload_date = '%s%02d%02d' % (
mobj.group('year'),
month_by_abbreviation(mobj.group('mon')),
int(mobj.group('day')))
else:
upload_date = None
duration = float_or_none(api_result_obj['totaltime'][0], scale=1000)
return {
'id': video_id,
'url': api_result_obj['mp3file'][0],
'title': title,
'description': self._html_search_meta('description', page),
'duration': duration,
'uploader_id': uploader_id,
'upload_date': upload_date,
}

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
sanitized_Request,
@@ -26,7 +27,7 @@ class YouPornIE(InfoExtractor):
'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Ask Dan And Jennifer',
'upload_date': '20101221',
'upload_date': '20101217',
'average_rating': int,
'view_count': int,
'comment_count': int,
@@ -45,7 +46,7 @@ class YouPornIE(InfoExtractor):
'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Unknown',
'upload_date': '20111125',
'upload_date': '20110418',
'average_rating': int,
'view_count': int,
'comment_count': int,
@@ -68,28 +69,46 @@ class YouPornIE(InfoExtractor):
webpage = self._download_webpage(request, display_id)
title = self._search_regex(
[r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>.+?)\1',
r'<h1[^>]+class=["\']heading\d?["\'][^>]*>([^<])<'],
webpage, 'title', group='title')
[r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<'],
webpage, 'title', group='title',
default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'title', webpage, fatal=True)
links = []
# Main source
definitions = self._parse_json(
self._search_regex(
r'mediaDefinition\s*=\s*(\[.+?\]);', webpage,
'media definitions', default='[]'),
video_id, fatal=False)
if definitions:
for definition in definitions:
if not isinstance(definition, dict):
continue
video_url = definition.get('videoUrl')
if isinstance(video_url, compat_str) and video_url:
links.append(video_url)
# Fallback #1, this also contains extra low quality 180p format
for _, link in re.findall(r'<a[^>]+href=(["\'])(http.+?)\1[^>]+title=["\']Download [Vv]ideo', webpage):
links.append(link)
# Fallback #2 (unavailable as at 22.06.2017)
sources = self._search_regex(
r'(?s)sources\s*:\s*({.+?})', webpage, 'sources', default=None)
if sources:
for _, link in re.findall(r'[^:]+\s*:\s*(["\'])(http.+?)\1', sources):
links.append(link)
# Fallback #1
# Fallback #3 (unavailable as at 22.06.2017)
for _, link in re.findall(
r'(?:videoUrl|videoSrc|videoIpadUrl|html5PlayerSrc)\s*[:=]\s*(["\'])(http.+?)\1', webpage):
r'(?:videoSrc|videoIpadUrl|html5PlayerSrc)\s*[:=]\s*(["\'])(http.+?)\1', webpage):
links.append(link)
# Fallback #2, this also contains extra low quality 180p format
for _, link in re.findall(r'<a[^>]+href=(["\'])(http.+?)\1[^>]+title=["\']Download [Vv]ideo', webpage):
links.append(link)
# Fallback #3, encrypted links
# Fallback #4, encrypted links (unavailable as at 22.06.2017)
for _, encrypted_link in re.findall(
r'encryptedQuality\d{3,4}URL\s*=\s*(["\'])([\da-zA-Z+/=]+)\1', webpage):
links.append(aes_decrypt_text(encrypted_link, title, 32).decode('utf-8'))
@@ -124,7 +143,8 @@ class YouPornIE(InfoExtractor):
r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>',
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r'(?s)<div[^>]+class=["\']videoInfo(?:Date|Time)["\'][^>]*>(.+?)</div>',
[r'Date\s+[Aa]dded:\s*<span>([^<]+)',
r'(?s)<div[^>]+class=["\']videoInfo(?:Date|Time)["\'][^>]*>(.+?)</div>'],
webpage, 'upload date', fatal=False))
age_limit = self._rta_search(webpage)

View File

@@ -1269,37 +1269,57 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
sub_lang_list[sub_lang] = sub_formats
return sub_lang_list
def make_captions(sub_url, sub_langs):
parsed_sub_url = compat_urllib_parse_urlparse(sub_url)
caption_qs = compat_parse_qs(parsed_sub_url.query)
captions = {}
for sub_lang in sub_langs:
sub_formats = []
for ext in self._SUBTITLE_FORMATS:
caption_qs.update({
'tlang': [sub_lang],
'fmt': [ext],
})
sub_url = compat_urlparse.urlunparse(parsed_sub_url._replace(
query=compat_urllib_parse_urlencode(caption_qs, True)))
sub_formats.append({
'url': sub_url,
'ext': ext,
})
captions[sub_lang] = sub_formats
return captions
# New captions format as of 22.06.2017
player_response = args.get('player_response')
if player_response and isinstance(player_response, compat_str):
player_response = self._parse_json(
player_response, video_id, fatal=False)
if player_response:
renderer = player_response['captions']['playerCaptionsTracklistRenderer']
base_url = renderer['captionTracks'][0]['baseUrl']
sub_lang_list = []
for lang in renderer['translationLanguages']:
lang_code = lang.get('languageCode')
if lang_code:
sub_lang_list.append(lang_code)
return make_captions(base_url, sub_lang_list)
# Some videos don't provide ttsurl but rather caption_tracks and
# caption_translation_languages (e.g. 20LmZk1hakA)
# Does not used anymore as of 22.06.2017
caption_tracks = args['caption_tracks']
caption_translation_languages = args['caption_translation_languages']
caption_url = compat_parse_qs(caption_tracks.split(',')[0])['u'][0]
parsed_caption_url = compat_urllib_parse_urlparse(caption_url)
caption_qs = compat_parse_qs(parsed_caption_url.query)
sub_lang_list = {}
sub_lang_list = []
for lang in caption_translation_languages.split(','):
lang_qs = compat_parse_qs(compat_urllib_parse_unquote_plus(lang))
sub_lang = lang_qs.get('lc', [None])[0]
if not sub_lang:
continue
sub_formats = []
for ext in self._SUBTITLE_FORMATS:
caption_qs.update({
'tlang': [sub_lang],
'fmt': [ext],
})
sub_url = compat_urlparse.urlunparse(parsed_caption_url._replace(
query=compat_urllib_parse_urlencode(caption_qs, True)))
sub_formats.append({
'url': sub_url,
'ext': ext,
})
sub_lang_list[sub_lang] = sub_formats
return sub_lang_list
if sub_lang:
sub_lang_list.append(sub_lang)
return make_captions(caption_url, sub_lang_list)
# An extractor error can be raise by the download process if there are
# no automatic captions but there are subtitles
except (KeyError, ExtractorError):
except (KeyError, IndexError, ExtractorError):
self._downloader.report_warning(err_msg)
return {}

View File

@@ -4,7 +4,10 @@ import subprocess
from .common import PostProcessor
from ..compat import compat_shlex_quote
from ..utils import PostProcessingError
from ..utils import (
encodeArgument,
PostProcessingError,
)
class ExecAfterDownloadPP(PostProcessor):
@@ -20,7 +23,7 @@ class ExecAfterDownloadPP(PostProcessor):
cmd = cmd.replace('{}', compat_shlex_quote(information['filepath']))
self._downloader.to_screen('[exec] Executing command: %s' % cmd)
retCode = subprocess.call(cmd, shell=True)
retCode = subprocess.call(encodeArgument(cmd), shell=True)
if retCode != 0:
raise PostProcessingError(
'Command returned error code %d' % retCode)

View File

@@ -542,7 +542,7 @@ class FFmpegFixupM3u8PP(FFmpegPostProcessor):
temp_filename = prepend_extension(filename, 'temp')
options = ['-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
self._downloader.to_screen('[ffmpeg] Fixing malformated aac bitstream in "%s"' % filename)
self._downloader.to_screen('[ffmpeg] Fixing malformed AAC bitstream in "%s"' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))

View File

@@ -35,11 +35,14 @@ class MetadataFromTitlePP(PostProcessor):
title = info['title']
match = re.match(self._titleregex, title)
if match is None:
self._downloader.to_screen('[fromtitle] Could not interpret title of video as "%s"' % self._titleformat)
self._downloader.to_screen(
'[fromtitle] Could not interpret title of video as "%s"'
% self._titleformat)
return [], info
for attribute, value in match.groupdict().items():
value = match.group(attribute)
info[attribute] = value
self._downloader.to_screen('[fromtitle] parsed ' + attribute + ': ' + value)
self._downloader.to_screen(
'[fromtitle] parsed %s: %s'
% (attribute, value if value is not None else 'NA'))
return [], info

View File

@@ -22,7 +22,6 @@ import locale
import math
import operator
import os
import pipes
import platform
import random
import re
@@ -366,9 +365,9 @@ def get_elements_by_attribute(attribute, value, html, escape_value=True):
retlist = []
for m in re.finditer(r'''(?xs)
<([a-zA-Z0-9:._-]+)
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'))*?
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
\s+%s=['"]?%s['"]?
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'))*?
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
\s*>
(?P<content>.*?)
</\1>
@@ -1535,7 +1534,7 @@ def shell_quote(args):
if isinstance(a, bytes):
# We may get a filename encoded with 'encodeFilename'
a = a.decode(encoding)
quoted_args.append(pipes.quote(a))
quoted_args.append(compat_shlex_quote(a))
return ' '.join(quoted_args)

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.06.12'
__version__ = '2017.07.09'