Compare commits

...

115 Commits

Author SHA1 Message Date
Sergey M․
d38b27dd9b release 2016.08.24.1 2016-08-24 10:11:04 +07:00
Sergey M․
6d94cbd2f4 [ChangeLog] Actualize 2016-08-24 10:07:06 +07:00
Sergey M․
30317f4887 [pluralsight] Modernize and make more robust 2016-08-24 08:52:12 +07:00
Sergey M․
8c3e35dd44 [pluralsight] Add support for subtitles (Closes #9681) 2016-08-24 08:41:52 +07:00
Sergey M․
c86f51ee38 release 2016.08.24 2016-08-24 01:38:46 +07:00
Sergey M․
6e52bbb413 [ChangeLog] Actualize 2016-08-24 01:36:27 +07:00
Sergey M․
05bddcc512 [youtube] Fix authentication (2) (Closes #10392) 2016-08-24 01:29:50 +07:00
Sergey M․
1212e9972f [youtube] Fix authentication (#10392) 2016-08-24 00:25:21 +07:00
Remita Amine
ccb6570e9e [syfy,bravotv] restrict drupal settings regex 2016-08-23 17:31:35 +01:00
Yen Chi Hsuan
18b6216150 [openload] Fix extraction (closes #10408)
Thanks @yokrysty for the algorithm
2016-08-23 21:55:58 +08:00
Remita Amine
fb009b7f53 [bravotv] correct clip info extraction and add support for adobe pass auth(closes #10407) 2016-08-23 10:29:52 +01:00
Sergey M․
3083e4dc07 [eagleplatform] Improve detection of embedded videos (Closes #10409) 2016-08-23 07:22:14 +07:00
Remita Amine
7367bdef23 [awaan] fix extraction, modernize, rename the extractors and add test for live stream 2016-08-22 23:10:06 +01:00
Remita Amine
ad31642584 [nrk,abc:iview] use _extract_akamai_formats 2016-08-22 07:54:08 +01:00
Remita Amine
c7c43a93ba [common] add helper method to extract akamai m3u8 and f4m formats 2016-08-22 07:49:34 +01:00
Yen Chi Hsuan
96229e5f95 [mtvservices:embedded] Update config URL
All starts from #10363. The test case in mtvservices:embedded uses
config.xml, while the video from #10363 and the test case in generic.py
is broken. Both uses index.html for fetching the feed URL.
2016-08-22 13:56:09 +08:00
Remita Amine
55d119e2a1 [abc:iview] Add new extractor(closes #6148) 2016-08-22 00:07:17 +01:00
Sergey M․
6d2679ee26 release 2016.08.22 2016-08-22 04:17:34 +07:00
Sergey M․
afbab5688e [ChangeLog] Actualize 2016-08-22 04:15:46 +07:00
Sergey M․
3d897cc791 [ivi] Fix episode number extraction 2016-08-22 03:34:27 +07:00
Sergey M․
cf143c4d97 [ivi] Add support for 720p and 1080p 2016-08-22 03:31:33 +07:00
Yen Chi Hsuan
ad120ae1c5 [extractor/common] Change the default m3u8 protocol in HTML5
Helper functions should have consistent default values
2016-08-22 02:26:07 +08:00
Remita Amine
d0fa172e5f [firsttv] keep a test videos with multiple formats 2016-08-21 19:13:43 +01:00
Yen Chi Hsuan
f97f9f71e5 Merge branch 'TRox1972-charlierose' 2016-08-22 02:11:43 +08:00
Yen Chi Hsuan
526656726b [charlierose] Simplify and improve 2016-08-22 02:06:47 +08:00
Remita Amine
9b8c554ea7 [firsttv] fix extraction(closes #9249) 2016-08-21 17:56:25 +01:00
Yen Chi Hsuan
d13bfc07b7 Merge branch 'charlierose' of https://github.com/TRox1972/youtube-dl into TRox1972-charlierose 2016-08-22 00:48:35 +08:00
Sergey M․
efe470e261 [twitch] Renew authentication 2016-08-21 22:45:50 +07:00
Sergey M․
e3f6b56909 [twitch] Refactor API calls 2016-08-21 22:09:29 +07:00
Sergey M․
b1e676fde8 [twitch] Modernize 2016-08-21 21:28:02 +07:00
Sergey M․
92d4cfa358 [kaltura] Fallback ext calculation on caption's format 2016-08-21 21:01:01 +07:00
Remita Amine
3d47ee0a9e [zingmp3] fix extraction and add support for video clips(closes #10041) 2016-08-21 14:09:48 +01:00
Yen Chi Hsuan
d164a0d41b [README.md] Add a format selection example using comma
Ref: #10399
2016-08-21 20:00:48 +08:00
Déstin Reed
db29af6d36 [charlierose] Add new extractor 2016-08-21 11:29:48 +02:00
Sergey M․
2c6acdfd2d [kaltura] Add test for #10279 2016-08-21 08:37:01 +07:00
Sergey M․
fddaa76a59 [kaltura] Assume ttml to be default subtitles' extension 2016-08-21 08:28:36 +07:00
Sergey M․
a809446750 [kaltura] Add subtitles support when entry_id is unknown beforehand (Closes #10279) 2016-08-21 08:28:36 +07:00
Sergey M․
d8f30a7e66 [kaltura] Remove unused code 2016-08-21 08:28:36 +07:00
Sergey M․
5b1d85754e [YoutubeDL] Autocalculate ext when ext is None 2016-08-21 08:28:36 +07:00
Remita Amine
e25586e471 [cultureunplugged] fix extraction(closes #10330) 2016-08-20 20:02:49 +01:00
Remita Amine
292a2301bf [cnn] add support for money.cnn.com videos(closes #2797) 2016-08-20 19:00:25 +01:00
Remita Amine
dabe15701b [cbs, cbsnews] fix extraction(fixes #10393) 2016-08-20 13:25:32 +01:00
Sergey M․
4245f55880 [dotsub] Replace test (Closes #10386) 2016-08-20 06:18:20 +07:00
Déstin Reed
5b9d187cc6 [imdb] Improve title extraction and make thumbnail non-fatal 2016-08-20 04:50:39 +07:00
Yen Chi Hsuan
39e1c4f08c [litv] Support 'promo' URLs (closes #10385) 2016-08-20 00:52:37 +08:00
Yen Chi Hsuan
19f35402c5 [snotr] Fix extraction (closes #10338) 2016-08-20 00:18:22 +08:00
Yen Chi Hsuan
70852b47ca [utils] Recognize units with full names in parse_filename
Reference: https://en.wikipedia.org/wiki/Template:Quantities_of_bytes
2016-08-20 00:17:26 +08:00
Yen Chi Hsuan
a9a3b4a081 [miomio] Adapt to the new API and update _TESTS
The test case is from #9680
2016-08-20 00:08:23 +08:00
Yen Chi Hsuan
ecc90093f9 [vuclip] Adapt to the new API and update _TEST 2016-08-19 23:56:09 +08:00
Yen Chi Hsuan
520251c093 [extractor/common] Recognize m3u8 manifests in HTML5 multimedia tags 2016-08-19 23:53:47 +08:00
Yen Chi Hsuan
55af45fcab [radiobremen] Update _TEST (closes #10337) 2016-08-19 23:12:30 +08:00
Yen Chi Hsuan
b82232036a [n-tv.de] Fix extraction (closes #10331) 2016-08-19 20:39:28 +08:00
Yen Chi Hsuan
e4659b4547 [utils] Correct octal/hexadecimal number detection in js_to_json 2016-08-19 20:37:17 +08:00
Sergey M․
9e5751b9fe [globo:article] Relax _VALID_URL and video id regex (Closes #10379) 2016-08-19 01:13:45 +07:00
Sergey M․
bd1bcd3ea0 release 2016.08.19 2016-08-19 00:15:12 +07:00
Sergey M․
93a63b36f1 [ChangeLog] Actualize 2016-08-19 00:13:24 +07:00
Sergey M․
8b2dc4c328 [options] Remove output template description from --help
Same reasons as for --format
2016-08-18 23:59:13 +07:00
Sergey M․
850837b67a [porncom] Add extractor (Closes #2251, closes #10251) 2016-08-18 23:52:41 +07:00
Sergey M․
13585d7682 [utils] Recognize lowercase units in parse_filesize 2016-08-18 23:32:00 +07:00
Sergey M․
fd3ec986a4 [generic] Fix dbtv test (Closes #10364) 2016-08-18 21:35:41 +07:00
Sergey M․
b0d578ff7b [dbtv] Relax embed regex 2016-08-18 21:30:55 +07:00
Déstin Reed
b0c8f2e9c8 [DBTV:generic] Add support for embeds 2016-08-18 21:29:27 +07:00
Sergey M․
51815886a9 [vk:wallpost] Fix audio extraction 2016-08-18 06:14:05 +07:00
Sergey M․
08a42f9c74 [vk] Fix authentication on python3 2016-08-18 05:22:23 +07:00
Sergey M․
e15ad9ef09 [keezmovies] PEP 8 2016-08-18 04:39:31 +07:00
Sergey M․
4e9fee1015 [hgtvcom:show] Add extractor (Closes #10365) 2016-08-18 04:37:14 +07:00
Remita Amine
7273e5849b [discoverygo] extend _VALID_URL to support other networks 2016-08-17 11:03:09 +01:00
Sergey M․
b505e98784 [extremetube] Revert display_id 2016-08-17 07:02:13 +07:00
Sergey M․
92cd9fd565 [keezmovies] Make display_id optional 2016-08-17 07:01:32 +07:00
Sergey M․
b3d7dce429 release 2016.08.17 2016-08-17 06:21:21 +07:00
Sergey M․
a44694ab4e [ChangeLog] Actualize 2016-08-17 06:19:22 +07:00
Sergey M․
ab19b46b88 [extremetube] Modernize 2016-08-17 06:02:12 +07:00
Sergey M․
8804f10e6b [tube8] Modernize 2016-08-17 05:46:45 +07:00
Sergey M․
6be17c0870 [mofosex] Extract all formats and modernize (Closes #10335) 2016-08-17 05:45:49 +07:00
Sergey M․
8652770bd2 [keezmovies] Improve and modernize 2016-08-17 05:44:46 +07:00
Sergey M․
2a1321a272 [vbox7:generic] Add support for vbox7 embeds 2016-08-17 01:02:59 +07:00
Sergey M․
9c0fa60bf3 [vbox7] Add support for embed URLs 2016-08-17 00:42:02 +07:00
Sergey M․
502d87c546 [mtg] Improve view count extraction 2016-08-17 00:32:28 +07:00
Sergey M․
b35b0d73d8 [viafree] Add extractor (Closes #10358) 2016-08-17 00:21:30 +07:00
Sergey M․
6e7e4a6edf [mtg] Add support for viafree URLs (#10358) 2016-08-17 00:19:43 +07:00
Remita Amine
53fef319f1 [fxnetworks] extend _VALID_URL to support simpsonsworld.com 2016-08-16 16:22:34 +01:00
Remita Amine
2cabee2a7d [amcnetworks] fix typo 2016-08-16 16:22:34 +01:00
Remita Amine
11f502fac1 [theplatform] extract subtitles with multiple formats from the metadata 2016-08-16 16:22:34 +01:00
Sergey M․
98affc1a48 [xvideos] Fix test 2016-08-16 21:20:15 +07:00
Sergey M․
70a2829fee [xvideos] Fix HLS extraction (Closes #10356) 2016-08-16 21:17:52 +07:00
Remita Amine
837e56c8ee [amcnetworks] extract episode metadata 2016-08-16 14:49:32 +01:00
Remita Amine
b5ddee8c77 [amcnetworks] Add new extractor 2016-08-16 13:44:01 +01:00
Sergey M․
fb64adcbd3 [adobepass] PEP 8 2016-08-16 04:45:21 +07:00
Sergey M․
4f640f2890 [bbc:playlist] Fix tests 2016-08-16 04:43:10 +07:00
Sergey M․
254e64a20a [bbc:playlist] Add support for pagination (Closes #10349) 2016-08-16 04:36:23 +07:00
Remita Amine
818ac213eb [adobepass] add IE suffix to the extractor and remove duplicate constant 2016-08-15 21:36:34 +01:00
Remita Amine
cbef4d5c9f [fxnetworks] add test and check geo restriction 2016-08-15 17:10:45 +01:00
Remita Amine
bf90c46790 [fxnetworks] Add new extractor(closes #9462) 2016-08-15 16:34:32 +01:00
Yen Chi Hsuan
69eb4d699f [cbsnews] Remove invalid tests. CBS Live videos gets deleted soon. 2016-08-15 20:29:22 +08:00
Yen Chi Hsuan
6d8ec8c3b7 [ChangeLog] Update for CBSLocal and related changes 2016-08-15 13:39:43 +08:00
Yen Chi Hsuan
760845ce99 [cbslocal] Adapt to SendtoNewsIE 2016-08-15 13:37:37 +08:00
Yen Chi Hsuan
5c2d087221 [sendtonews] Fix extraction 2016-08-15 13:31:08 +08:00
Yen Chi Hsuan
b6c4e36728 [jwplatform] Parse video_id from JWPlayer data
And remove a mysterious comma from 115c65793a
2016-08-15 13:29:01 +08:00
Sergey M․
1a57b8c18c [zippcast] Remove extractor (Closes #10332)
ZippCast is shut down
2016-08-15 08:25:24 +07:00
Remita Amine
24eb13b1c6 [uplynk,viceland] update tests and change uplynk extractors names 2016-08-14 22:45:43 +01:00
Remita Amine
525e0316c0 [adobepass] fix check for pendingLogout errors 2016-08-14 21:25:43 +01:00
Remita Amine
7e60ce9cf7 [adobepass] clear cache in case of pendingLogout errors 2016-08-14 21:24:33 +01:00
Remita Amine
e811bcf8f8 [viceland] raise ExtractorError for errors other than HTTP 400 2016-08-14 20:13:35 +01:00
Remita Amine
6103f59095 [viceland] remove outdated comment 2016-08-14 19:08:35 +01:00
Remita Amine
9fa5789279 [viceland] fix info extraction(closes #8799) 2016-08-14 19:04:23 +01:00
Remita Amine
d2ac04674d [viceland] Add new extractor(#8799) 2016-08-14 18:04:50 +01:00
Remita Amine
1fd6e30988 [adobepass] create separate class for adobe pass authentication 2016-08-14 18:04:50 +01:00
Sergey M․
884cdb6cd9 [life:embed] Improve extraction 2016-08-14 20:49:11 +07:00
Remita Amine
9771b1f901 [theplatform] use _get_netrc_login_info and fix session expiration check(#10345) 2016-08-14 11:55:28 +01:00
Remita Amine
2118fdd1a9 [common] add separate method for getting netrc ligin info 2016-08-14 11:55:28 +01:00
Sergey M․
320d597c21 [vgtv] Detect geo restricted videos (#10348) 2016-08-14 16:25:14 +07:00
Remita Amine
aaf44a2f47 [uplynk] Add new extractor 2016-08-13 22:53:41 +01:00
Yen Chi Hsuan
fafabc0712 Update ChangeLog for #10342
[skip ci]
2016-08-14 02:33:15 +08:00
Yen Chi Hsuan
409760a932 Merge pull request #10342 from muphil/patch-1
[xiami] bug fix for extractor xiami.py
2016-08-14 02:30:50 +08:00
phi
097eba019d bug fix for extractor xiami.py
Before applying this patch, when downloading resources from xiami.com, it crashes with these:
Traceback (most recent call last):
  File "/home/phi/.local/bin/youtube-dl", line 11, in <module>
    sys.exit(main())
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/__init__.py", line 433, in main
    _real_main(argv)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/__init__.py", line 423, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 1786, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 691, in extract_info
    ie_result = ie.extract(url)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 347, in extract
    return self._real_extract(url)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/xiami.py", line 116, in _real_extract
    return self._extract_tracks(self._match_id(url))[0]
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/xiami.py", line 43, in _extract_tracks
    '%s/%s%s' % (self._API_BASE_URL, item_id, '/type/%s' % typ if typ else ''), item_id)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 562, in _download_json
    json_string, video_id, transform_source=transform_source, fatal=fatal)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 568, in _parse_json
    return json.loads(json_string)
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'NoneType'

This patch solves exactly this problem.
2016-08-14 02:18:59 +08:00
71 changed files with 2127 additions and 1113 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.13*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.13**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.24.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.24.1**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.08.13
[debug] youtube-dl version 2016.08.24.1
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -1,3 +1,92 @@
version 2016.08.24.1
Extractors
+ [pluralsight] Add support for subtitles (#9681)
version 2016.08.24
Extractors
* [youtube] Fix authentication (#10392)
* [openload] Fix extraction (#10408)
+ [bravotv] Add support for Adobe Pass (#10407)
* [bravotv] Fix clip info extraction (#10407)
* [eagleplatform] Improve embedded videos detection (#10409)
* [awaan] Fix extraction
* [mtvservices:embedded] Update config URL
+ [abc:iview] Add extractor (#6148)
version 2016.08.22
Core
* Improve formats and subtitles extension auto calculation
+ Recognize full unit names in parse_filesize
+ Add support for m3u8 manifests in HTML5 multimedia tags
* Fix octal/hexadecimal number detection in js_to_json
Extractors
+ [ivi] Add support for 720p and 1080p
+ [charlierose] Add new extractor (#10382)
* [1tv] Fix extraction (#9249)
* [twitch] Renew authentication
* [kaltura] Improve subtitles extension calculation
+ [zingmp3] Add support for video clips
* [zingmp3] Fix extraction (#10041)
* [kaltura] Improve subtitles extraction (#10279)
* [cultureunplugged] Fix extraction (#10330)
+ [cnn] Add support for money.cnn.com (#2797)
* [cbsnews] Fix extraction (#10362)
* [cbs] Fix extraction (#10393)
+ [litv] Support 'promo' URLs (#10385)
* [snotr] Fix extraction (#10338)
* [n-tv.de] Fix extraction (#10331)
* [globo:article] Relax URL and video id regular expressions (#10379)
version 2016.08.19
Core
- Remove output template description from --help
* Recognize lowercase units in parse_filesize
Extractors
+ [porncom] Add extractor for porn.com (#2251, #10251)
+ [generic] Add support for DBTV embeds
* [vk:wallpost] Fix audio extraction for new site layout
* [vk] Fix authentication
+ [hgtvcom:show] Add extractor for hgtv.com shows (#10365)
+ [discoverygo] Add support for another GO network sites
version 2016.08.17
Core
+ Add _get_netrc_login_info
Extractors
* [mofosex] Extract all formats (#10335)
+ [generic] Add support for vbox7 embeds
+ [vbox7] Add support for embed URLs
+ [viafree] Add extractor (#10358)
+ [mtg] Add support for viafree URLs (#10358)
* [theplatform] Extract all subtitles per language
+ [xvideos] Fix HLS extraction (#10356)
+ [amcnetworks] Add extractor
+ [bbc:playlist] Add support for pagination (#10349)
+ [fxnetworks] Add extractor (#9462)
* [cbslocal] Fix extraction for SendtoNews-based videos
* [sendtonews] Fix extraction
* [jwplatform] Extract video id from JWPlayer data
- [zippcast] Remove extractor (#10332)
+ [viceland] Add extractor (#8799)
+ [adobepass] Add base extractor for Adobe Pass Authentication
* [life:embed] Improve extraction
* [vgtv] Detect geo restricted videos (#10348)
+ [uplynk] Add extractor
* [xiami] Fix extraction (#10342)
version 2016.08.13
Core
@@ -23,6 +112,7 @@ Extractors
+ [pbs] Add support for high quality HTTP formats
+ [crunchyroll] Add support for HLS formats (#10301)
version 2016.08.12
Core

View File

@@ -201,32 +201,8 @@ which means you can modify it, redistribute it or use it however you like.
-a, --batch-file FILE File containing URLs to download ('-' for
stdin)
--id Use only video ID in file name
-o, --output TEMPLATE Output filename template. Use %(title)s to
get the title, %(uploader)s for the
uploader name, %(uploader_id)s for the
uploader nickname if different,
%(autonumber)s to get an automatically
incremented number, %(ext)s for the
filename extension, %(format)s for the
format description (like "22 - 1280x720" or
"HD"), %(format_id)s for the unique id of
the format (like YouTube's itags: "137"),
%(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the
video id, %(playlist_title)s,
%(playlist_id)s, or %(playlist)s (=title if
present, ID otherwise) for the playlist the
video is in, %(playlist_index)s for the
position in the playlist. %(height)s and
%(width)s for the width and height of the
video format. %(resolution)s for a textual
description of the resolution of the video
format. %% for a literal percent. Use - to
output to stdout. Can also be used to
download to a different directory, for
example with -o '/my/downloads/%(uploader)s
/%(title)s-%(id)s.%(ext)s' .
-o, --output TEMPLATE Output filename template, see the "OUTPUT
TEMPLATE" for all the info
--autonumber-size NUMBER Specify the number of digits in
%(autonumber)s when it is present in output
filename template or --auto-number option
@@ -669,7 +645,11 @@ $ youtube-dl -f 'best[filesize<50M]'
# Download best format available via direct link over HTTP/HTTPS protocol
$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
# Download the best video format and the best audio format without merging them
$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
```
Note that in the last example, an output template is recommended as bestvideo and bestaudio may have the same file name.
# VIDEO SELECTION

View File

@@ -16,6 +16,7 @@
- **9gag**
- **9now.com.au**
- **abc.net.au**
- **abc.net.au:iview**
- **Abc7News**
- **abcnews**
- **abcnews:video**
@@ -35,6 +36,7 @@
- **AlJazeera**
- **Allocine**
- **AlphaPorno**
- **AMCNetworks**
- **AnimeOnDemand**
- **anitube.se**
- **AnySex**
@@ -65,6 +67,10 @@
- **audiomack**
- **audiomack:album**
- **auroravid**: AuroraVid
- **AWAAN**
- **awaan:live**
- **awaan:season**
- **awaan:video**
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频
@@ -120,6 +126,7 @@
- **CDA**
- **CeskaTelevize**
- **channel9**: Channel 9
- **CharlieRose**
- **Chaturbate**
- **Chilloutzone**
- **chirbit**
@@ -170,10 +177,6 @@
- **daum.net:playlist**
- **daum.net:user**
- **DBTV**
- **DCN**
- **dcn:live**
- **dcn:season**
- **dcn:video**
- **DctpTv**
- **DeezerPlaylist**
- **defense.gouv.fr**
@@ -247,6 +250,7 @@
- **Funimation**
- **FunnyOrDie**
- **Fusion**
- **FXNetworks**
- **GameInformer**
- **GameOne**
- **gameone:playlist**
@@ -277,6 +281,7 @@
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **HGTV**
- **hgtv.com:show**
- **HistoricFilms**
- **history:topic**: History.com Topic
- **hitbox**
@@ -398,6 +403,7 @@
- **Moviezine**
- **MPORA**
- **MSN**
- **mtg**: MTG services
- **MTV**
- **mtv.de**
- **mtvservices:embedded**
@@ -520,6 +526,7 @@
- **podomatic**
- **Pokemon**
- **PolskieRadio**
- **PornCom**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
- **PornHubPlaylist**
@@ -731,7 +738,6 @@
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**
- **TVPlay**: TV3Play and related services
- **Tweakers**
- **twitch:chapter**
- **twitch:clips**
@@ -748,6 +754,8 @@
- **UDNEmbed**: 聯合影音
- **Unistra**
- **uol.com.br**
- **uplynk**
- **uplynk:preplay**
- **Urort**: NRK P3 Urørt
- **URPlay**
- **USAToday**
@@ -765,7 +773,9 @@
- **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com**
- **Viafree**
- **Vice**
- **Viceland**
- **ViceShow**
- **Vidbit**
- **Viddler**
@@ -885,6 +895,4 @@
- **Zapiks**
- **ZDF**
- **ZDFChannel**
- **zingmp3:album**: mp3.zing.vn albums
- **zingmp3:song**: mp3.zing.vn songs
- **ZippCast**
- **zingmp3**: mp3.zing.vn

View File

@@ -712,6 +712,9 @@ class TestUtil(unittest.TestCase):
inp = '''{"foo":101}'''
self.assertEqual(js_to_json(inp), '''{"foo":101}''')
inp = '''{"duration": "00:01:07"}'''
self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
def test_js_to_json_edgecases(self):
on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@@ -817,7 +820,10 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_filesize('2 MiB'), 2097152)
self.assertEqual(parse_filesize('5 GB'), 5000000000)
self.assertEqual(parse_filesize('1.2Tb'), 1200000000000)
self.assertEqual(parse_filesize('1.2tb'), 1200000000000)
self.assertEqual(parse_filesize('1,24 KB'), 1240)
self.assertEqual(parse_filesize('1,24 kb'), 1240)
self.assertEqual(parse_filesize('8.5 megabytes'), 8500000)
def test_parse_count(self):
self.assertEqual(parse_count(None), None)

View File

@@ -1299,7 +1299,7 @@ class YoutubeDL(object):
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if 'ext' not in subtitle_format:
if subtitle_format.get('ext') is None:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
if self.params.get('listsubtitles', False):
@@ -1354,7 +1354,7 @@ class YoutubeDL(object):
note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
)
# Automatically determine file extension if missing
if 'ext' not in format:
if format.get('ext') is None:
format['ext'] = determine_ext(format['url']).lower()
# Automatically determine protocol if missing (useful for format
# selection purposes)

View File

@@ -20,6 +20,7 @@ from ..utils import (
encodeFilename,
sanitize_open,
parse_m3u8_attributes,
update_url_query,
)
@@ -82,6 +83,7 @@ class HlsFD(FragmentFD):
self._prepare_and_start_frag_download(ctx)
extra_param_to_segment_url = info_dict.get('extra_param_to_segment_url')
i = 0
media_sequence = 0
decrypt_info = {'METHOD': 'NONE'}
@@ -95,6 +97,8 @@ class HlsFD(FragmentFD):
if re.match(r'^https?://', line)
else compat_urlparse.urljoin(man_url, line))
frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
if extra_param_to_segment_url:
frag_url = update_url_query(frag_url, extra_param_to_segment_url)
success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not success:
return False
@@ -120,6 +124,8 @@ class HlsFD(FragmentFD):
if not re.match(r'^https?://', decrypt_info['URI']):
decrypt_info['URI'] = compat_urlparse.urljoin(
man_url, decrypt_info['URI'])
if extra_param_to_segment_url:
decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_param_to_segment_url)
decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
media_sequence = int(line[22:])

View File

@@ -7,6 +7,7 @@ from ..utils import (
ExtractorError,
js_to_json,
int_or_none,
parse_iso8601,
)
@@ -93,3 +94,57 @@ class ABCIE(InfoExtractor):
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}
class ABCIViewIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview'
_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://iview.abc.net.au/programs/gardening-australia/FA1505V024S00',
'md5': '979d10b2939101f0d27a06b79edad536',
'info_dict': {
'id': 'FA1505V024S00',
'ext': 'mp4',
'title': 'Series 27 Ep 24',
'description': 'md5:b28baeae7504d1148e1d2f0e3ed3c15d',
'upload_date': '20160820',
'uploader_id': 'abc1',
'timestamp': 1471719600,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_params = self._parse_json(self._search_regex(
r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
title = video_params['title']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
self._sort_formats(formats)
subtitles = {}
src_vtt = stream.get('captions', {}).get('src-vtt')
if src_vtt:
subtitles['en'] = [{
'url': src_vtt,
'ext': 'vtt',
}]
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image:src'], webpage),
'duration': int_or_none(video_params.get('eventDuration')),
'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
'series': video_params.get('seriesTitle'),
'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage)),
'episode': self._html_search_meta('episode_title', webpage),
'uploader_id': video_params.get('channel'),
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -0,0 +1,134 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
import time
import xml.etree.ElementTree as etree
from .common import InfoExtractor
from ..utils import (
unescapeHTML,
urlencode_postdata,
unified_timestamp,
)
class AdobePassIE(InfoExtractor):
_SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
_USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'
@staticmethod
def _get_mvpd_resource(provider_id, title, guid, rating):
channel = etree.Element('channel')
channel_title = etree.SubElement(channel, 'title')
channel_title.text = provider_id
item = etree.SubElement(channel, 'item')
resource_title = etree.SubElement(item, 'title')
resource_title.text = title
resource_guid = etree.SubElement(item, 'guid')
resource_guid.text = guid
resource_rating = etree.SubElement(item, 'media:rating')
resource_rating.attrib = {'scheme': 'urn:v-chip'}
resource_rating.text = rating
return '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">' + etree.tostring(channel).decode() + '</rss>'
def _extract_mvpd_auth(self, url, video_id, requestor_id, resource):
def xml_text(xml_str, tag):
return self._search_regex(
'<%s>(.+?)</%s>' % (tag, tag), xml_str, tag)
mvpd_headers = {
'ap_42': 'anonymous',
'ap_11': 'Linux i686',
'ap_z': self._USER_AGENT,
'User-Agent': self._USER_AGENT,
}
guid = xml_text(resource, 'guid')
requestor_info = self._downloader.cache.load('mvpd', requestor_id) or {}
authn_token = requestor_info.get('authn_token')
if authn_token:
token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(authn_token, 'simpleTokenExpires')))
if token_expires and token_expires <= int(time.time()):
authn_token = None
requestor_info = {}
if not authn_token:
# TODO add support for other TV Providers
mso_id = 'DTV'
username, password = self._get_netrc_login_info(mso_id)
if not username or not password:
return ''
def post_form(form_page, note, data={}):
post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
return self._download_webpage(
post_url, video_id, note, data=urlencode_postdata(data or self._hidden_inputs(form_page)), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
provider_redirect_page = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
'Downloading Provider Redirect Page', query={
'noflash': 'true',
'mso_id': mso_id,
'requestor_id': requestor_id,
'no_iframe': 'false',
'domain_name': 'adobe.com',
'redirect_url': url,
})
provider_login_page = post_form(
provider_redirect_page, 'Downloading Provider Login Page')
mvpd_confirm_page = post_form(provider_login_page, 'Logging in', {
'username': username,
'password': password,
})
post_form(mvpd_confirm_page, 'Confirming Login')
session = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,
'Retrieving Session', data=urlencode_postdata({
'_method': 'GET',
'requestor_id': requestor_id,
}), headers=mvpd_headers)
if '<pendingLogout' in session:
self._downloader.cache.store('mvpd', requestor_id, {})
return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
authn_token = unescapeHTML(xml_text(session, 'authnToken'))
requestor_info['authn_token'] = authn_token
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
authz_token = requestor_info.get(guid)
if not authz_token:
authorize = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,
'Retrieving Authorization Token', data=urlencode_postdata({
'resource_id': resource,
'requestor_id': requestor_id,
'authentication_token': authn_token,
'mso_id': xml_text(authn_token, 'simpleTokenMsoID'),
'userMeta': '1',
}), headers=mvpd_headers)
if '<pendingLogout' in authorize:
self._downloader.cache.store('mvpd', requestor_id, {})
return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
requestor_info[guid] = authz_token
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
mvpd_headers.update({
'ap_19': xml_text(authn_token, 'simpleSamlNameID'),
'ap_23': xml_text(authn_token, 'simpleSamlSessionIndex'),
})
short_authorize = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'shortAuthorize',
video_id, 'Retrieving Media Token', data=urlencode_postdata({
'authz_token': authz_token,
'requestor_id': requestor_id,
'session_guid': xml_text(authn_token, 'simpleTokenAuthenticationGuid'),
'hashed_guid': 'false',
}), headers=mvpd_headers)
if '<pendingLogout' in short_authorize:
self._downloader.cache.store('mvpd', requestor_id, {})
return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
return short_authorize

View File

@@ -109,7 +109,10 @@ class AENetworksIE(AENetworksBaseIE):
info = self._parse_theplatform_metadata(theplatform_metadata)
if theplatform_metadata.get('AETN$isBehindWall'):
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
resource = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>%s</title><item><title>%s</title><guid>%s</guid><media:rating scheme="urn:v-chip">%s</media:rating></item></channel></rss>' % (requestor_id, theplatform_metadata['title'], theplatform_metadata['AETN$PPL_pplProgramId'], theplatform_metadata['ratings'][0]['rating'])
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._search_json_ld(webpage, video_id, fatal=False))

View File

@@ -0,0 +1,91 @@
# coding: utf-8
from __future__ import unicode_literals
from .theplatform import ThePlatformIE
from ..utils import (
update_url_query,
parse_age_limit,
int_or_none,
)
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?season-\d+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
'info_dict': {
'id': 's3MX01Nl4vPH',
'ext': 'mp4',
'title': 'Maron - Season 4 - Step 1',
'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.',
'age_limit': 17,
'upload_date': '20160505',
'timestamp': 1462468831,
'uploader': 'AMCN',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True,
}, {
'url': 'http://www.amc.com/shows/preacher/full-episodes/season-01/episode-00/pilot',
'only_matching': True,
}, {
'url': 'http://www.wetv.com/shows/million-dollar-matchmaker/season-01/episode-06-the-dumped-dj-and-shallow-hal',
'only_matching': True,
}, {
'url': 'http://www.ifc.com/movies/chaos',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
query = {
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
'subtitles': subtitles,
'formats': formats,
'age_limit': parse_age_limit(parse_age_limit(rating)),
})
ns_keys = theplatform_metadata.get('$xmlns', {}).keys()
if ns_keys:
ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
if season_number:
title = 'Season %d - %s' % (season_number, title)
if series:
title = '%s - %s' % (series, title)
info.update({
'title': title,
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
})
return info

View File

@@ -12,46 +12,41 @@ from ..compat import (
from ..utils import (
int_or_none,
parse_iso8601,
sanitized_Request,
smuggle_url,
unsmuggle_url,
urlencode_postdata,
)
class DCNIE(InfoExtractor):
class AWAANIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
def _real_extract(self, url):
show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
if video_id and int(video_id) > 0:
return self.url_result(
'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo')
'http://awaan.ae/media/%s' % video_id, 'AWAANVideo')
elif season_id and int(season_id) > 0:
return self.url_result(smuggle_url(
'http://www.dcndigital.ae/program/season/%s' % season_id,
{'show_id': show_id}), 'DCNSeason')
'http://awaan.ae/program/season/%s' % season_id,
{'show_id': show_id}), 'AWAANSeason')
else:
return self.url_result(
'http://www.dcndigital.ae/program/%s' % show_id, 'DCNSeason')
'http://awaan.ae/program/%s' % show_id, 'AWAANSeason')
class DCNBaseIE(InfoExtractor):
def _extract_video_info(self, video_data, video_id, is_live):
class AWAANBaseIE(InfoExtractor):
def _parse_video_data(self, video_data, video_id, is_live):
title = video_data.get('title_en') or video_data['title_ar']
img = video_data.get('img')
thumbnail = 'http://admin.mangomolo.com/analytics/%s' % img if img else None
duration = int_or_none(video_data.get('duration'))
description = video_data.get('description_en') or video_data.get('description_ar')
timestamp = parse_iso8601(video_data.get('create_time'), ' ')
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'description': video_data.get('description_en') or video_data.get('description_ar'),
'thumbnail': 'http://admin.mangomolo.com/analytics/%s' % img if img else None,
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live,
}
@@ -75,11 +70,12 @@ class DCNBaseIE(InfoExtractor):
return formats
class DCNVideoIE(DCNBaseIE):
IE_NAME = 'dcn:video'
class AWAANVideoIE(AWAANBaseIE):
IE_NAME = 'awaan:video'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
'md5': '5f61c33bfc7794315c671a62d43116aa',
'info_dict':
{
'id': '17375',
@@ -90,10 +86,6 @@ class DCNVideoIE(DCNBaseIE):
'timestamp': 1227504126,
'upload_date': '20081124',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
'only_matching': True,
@@ -102,11 +94,10 @@ class DCNVideoIE(DCNBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
request = sanitized_Request(
video_data = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
headers={'Origin': 'http://www.dcndigital.ae'})
video_data = self._download_json(request, video_id)
info = self._extract_video_info(video_data, video_id, False)
video_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(video_data, video_id, False)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
@@ -121,19 +112,31 @@ class DCNVideoIE(DCNBaseIE):
return info
class DCNLiveIE(DCNBaseIE):
IE_NAME = 'dcn:live'
class AWAANLiveIE(AWAANBaseIE):
IE_NAME = 'awaan:live'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)'
_TEST = {
'url': 'http://awaan.ae/live/6/dubai-tv',
'info_dict': {
'id': '6',
'ext': 'mp4',
'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20150107',
'timestamp': 1420588800,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
channel_id = self._match_id(url)
request = sanitized_Request(
channel_data = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
headers={'Origin': 'http://www.dcndigital.ae'})
channel_data = self._download_json(request, channel_id)
info = self._extract_video_info(channel_data, channel_id, True)
channel_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(channel_data, channel_id, True)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
@@ -148,8 +151,8 @@ class DCNLiveIE(DCNBaseIE):
return info
class DCNSeasonIE(InfoExtractor):
IE_NAME = 'dcn:season'
class AWAANSeasonIE(InfoExtractor):
IE_NAME = 'awaan:season'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
_TEST = {
'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
@@ -170,21 +173,17 @@ class DCNSeasonIE(InfoExtractor):
data['season'] = season_id
show_id = smuggled_data.get('show_id')
if show_id is None:
request = sanitized_Request(
season = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
headers={'Origin': 'http://www.dcndigital.ae'})
season = self._download_json(request, season_id)
season_id, headers={'Origin': 'http://awaan.ae'})
show_id = season['id']
data['show_id'] = show_id
request = sanitized_Request(
show = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/show',
urlencode_postdata(data),
{
'Origin': 'http://www.dcndigital.ae',
show_id, data=urlencode_postdata(data), headers={
'Origin': 'http://awaan.ae',
'Content-Type': 'application/x-www-form-urlencoded'
})
show = self._download_json(request, show_id)
if not season_id:
season_id = show['default_season']
for season in show['seasons']:
@@ -195,6 +194,6 @@ class DCNSeasonIE(InfoExtractor):
for video in show['videos']:
video_id = compat_str(video['id'])
entries.append(self.url_result(
'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo', video_id))
'http://awaan.ae/media/%s' % video_id, 'AWAANVideo', video_id))
return self.playlist_result(entries, season_id, title)

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..utils import (
@@ -17,6 +18,7 @@ from ..utils import (
from ..compat import (
compat_etree_fromstring,
compat_HTTPError,
compat_urlparse,
)
@@ -1056,19 +1058,35 @@ class BBCCoUkArticleIE(InfoExtractor):
class BBCCoUkPlaylistBaseIE(InfoExtractor):
def _entries(self, webpage, url, playlist_id):
single_page = 'page' in compat_urlparse.parse_qs(
compat_urlparse.urlparse(url).query)
for page_num in itertools.count(2):
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage):
yield self.url_result(
self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
if single_page:
return
next_page = self._search_regex(
r'<li[^>]+class=(["\'])pagination_+next\1[^>]*><a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2',
webpage, 'next page url', default=None, group='url')
if not next_page:
break
webpage = self._download_webpage(
compat_urlparse.urljoin(url, next_page), playlist_id,
'Downloading page %d' % page_num, page_num)
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage)]
title, description = self._extract_title_and_description(webpage)
return self.playlist_result(entries, playlist_id, title, description)
return self.playlist_result(
self._entries(webpage, url, playlist_id),
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
@@ -1117,6 +1135,24 @@ class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 7,
}, {
# multipage playlist, explicit page
'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips?page=1',
'info_dict': {
'id': 'b00mfl7n',
'title': 'Frozen Planet - Clips - BBC One',
'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
},
'playlist_mincount': 24,
}, {
# multipage playlist, all pages
'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips',
'info_dict': {
'id': 'b00mfl7n',
'title': 'Frozen Planet - Clips - BBC One',
'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
},
'playlist_mincount': 142,
}, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
'only_matching': True,

View File

@@ -1,31 +1,74 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
update_url_query,
int_or_none,
)
class BravoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)'
_TEST = {
class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
'md5': 'd60cdf68904e854fac669bd26cccf801',
'md5': '9086d0b7ef0ea2aabc4781d75f4e5863',
'info_dict': {
'id': 'LitrBdX64qLn',
'id': 'zHyk1_HU_mPy',
'ext': 'mp4',
'title': 'Last Chance Kitchen Returns',
'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
'timestamp': 1448926740,
'upload_date': '20151130',
'title': 'LCK Ep 12: Fishy Finale',
'description': 'S13/E12: Two eliminated chefs have just 12 minutes to cook up a delicious fish dish.',
'uploader': 'NBCU-BRAV',
'upload_date': '20160302',
'timestamp': 1456945320,
}
}
}, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid')
release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid')
return self.url_result(smuggle_url(
'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid),
{'force_smil_url': True}), 'ThePlatform', release_pid)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
display_id)
info = {}
query = {
'mbr': 'true',
}
account_pid, release_pid = [None] * 2
tve = settings.get('sharedTVE')
if tve:
query['manifest'] = 'm3u'
account_pid = 'HNK2IC'
release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('adobePass', {})
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'),
tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
else:
shared_playlist = settings['shared_playlist']
account_pid = shared_playlist['account_pid']
metadata = shared_playlist['video_metadata'][shared_playlist['default_clip']]
release_pid = metadata['release_pid']
info.update({
'title': metadata['title'],
'description': metadata.get('description'),
'season_number': int_or_none(metadata.get('season_num')),
'episode_number': int_or_none(metadata.get('episode_num')),
})
query['switch'] = 'progressive'
info.update({
'_type': 'url_transparent',
'id': release_pid,
'url': smuggle_url(update_url_query(
'http://link.theplatform.com/s/%s/%s' % (account_pid, release_pid),
query), {'force_smil_url': True}),
'ie_key': 'ThePlatform',
})
return info

View File

@@ -4,6 +4,7 @@ from .theplatform import ThePlatformFeedIE
from ..utils import (
int_or_none,
find_xpath_attr,
ExtractorError,
)
@@ -17,19 +18,6 @@ class CBSBaseIE(ThePlatformFeedIE):
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info(
'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
'series': entry.get('cbs$SeriesTitle'),
'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
'episode': entry.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
}, {
'StreamPack': {
'manifest': 'm3u',
}
})
class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
@@ -38,7 +26,6 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': {
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'display_id': 'connect-chat-feat-garth-brooks',
'ext': 'mp4',
'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
@@ -47,7 +34,10 @@ class CBSIE(CBSBaseIE):
'upload_date': '20131127',
'uploader': 'CBSI-NEW',
},
'expected_warnings': ['Failed to download m3u8 information'],
'params': {
# m3u8 download
'skip_download': True,
},
'_skip': 'Blocked outside the US',
}, {
'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@@ -56,8 +46,31 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _extract_video_info(self, guid):
path = 'dJ5BDC/media/guid/2198311517/' + guid
smil_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
formats, subtitles = self._extract_theplatform_smil(smil_url + '&manifest=m3u', guid)
for r in ('HLS&formats=M3U', 'RTMP', 'WIFI', '3G'):
try:
tp_formats, _ = self._extract_theplatform_smil(smil_url + '&assetTypes=' + r, guid, 'Downloading %s SMIL data' % r.split('&')[0])
formats.extend(tp_formats)
except ExtractorError:
continue
self._sort_formats(formats)
metadata = self._download_theplatform_metadata(path, guid)
info = self._parse_theplatform_metadata(metadata)
info.update({
'id': guid,
'formats': formats,
'subtitles': subtitles,
'series': metadata.get('cbs$SeriesTitle'),
'season_number': int_or_none(metadata.get('cbs$SeasonNumber')),
'episode': metadata.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(metadata.get('cbs$EpisodeNumber')),
})
return info
def _real_extract(self, url):
content_id = self._match_id(url)
return self._extract_video_info('byGuid=%s' % content_id, content_id)
return self._extract_video_info(content_id)

View File

@@ -41,13 +41,8 @@ class CBSLocalIE(AnvatoIE):
'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/',
'info_dict': {
'id': 'GxfCe0Zo7D-175909-5588',
'ext': 'mp4',
'title': 'Recap: CLE 15, CIN 6',
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
'upload_date': '20160516',
'timestamp': 1463433840,
'duration': 49,
},
'playlist_count': 9,
'params': {
# m3u8 download
'skip_download': True,
@@ -60,12 +55,11 @@ class CBSLocalIE(AnvatoIE):
sendtonews_url = SendtoNewsIE._extract_url(webpage)
if sendtonews_url:
info_dict = {
'_type': 'url_transparent',
'url': compat_urlparse.urljoin(url, sendtonews_url),
}
else:
info_dict = self._extract_anvato_videos(webpage, display_id)
return self.url_result(
compat_urlparse.urljoin(url, sendtonews_url),
ie=SendtoNewsIE.ie_key())
info_dict = self._extract_anvato_videos(webpage, display_id)
time_str = self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)

View File

@@ -2,13 +2,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .cbs import CBSBaseIE
from .cbs import CBSIE
from ..utils import (
parse_duration,
)
class CBSNewsIE(CBSBaseIE):
class CBSNewsIE(CBSIE):
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@@ -35,7 +35,8 @@ class CBSNewsIE(CBSBaseIE):
'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
'upload_date': '19700101',
'upload_date': '20140404',
'timestamp': 1396650660,
'uploader': 'CBSI-NEW',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205,
@@ -63,14 +64,15 @@ class CBSNewsIE(CBSBaseIE):
item = video_info['item'] if 'item' in video_info else video_info
guid = item['mpxRefId']
return self._extract_video_info('byGuid=%s' % guid, guid)
return self._extract_video_info(guid)
class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TESTS = [{
# Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
@@ -78,15 +80,8 @@ class CBSNewsLiveVideoIE(InfoExtractor):
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
'skip': 'Video gone, redirected to http://www.cbsnews.com/live/',
}, {
'url': 'http://www.cbsnews.com/live/video/video-shows-intense-paragliding-accident/',
'info_dict': {
'id': 'video-shows-intense-paragliding-accident',
'ext': 'flv',
'title': 'Video Shows Intense Paragliding Accident',
},
}]
'skip': 'Video gone',
}
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -23,6 +23,9 @@ class CBSSportsIE(CBSBaseIE):
}
}]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_video_info('byId=%s' % video_id, video_id)

View File

@@ -0,0 +1,51 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
class CharlieRoseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?charlierose\.com/video(?:s|/player)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://charlierose.com/videos/27996',
'md5': 'fda41d49e67d4ce7c2411fd2c4702e09',
'info_dict': {
'id': '27996',
'ext': 'mp4',
'title': 'Remembering Zaha Hadid',
'thumbnail': 're:^https?://.*\.jpg\?\d+',
'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.',
'subtitles': {
'en': [{
'ext': 'vtt',
}],
},
},
}, {
'url': 'https://charlierose.com/videos/27996',
'only_matching': True,
}]
_PLAYER_BASE = 'https://charlierose.com/video/player/%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(self._PLAYER_BASE % video_id, video_id)
title = remove_end(self._og_search_title(webpage), ' - Charlie Rose')
info_dict = self._parse_html5_media_entries(
self._PLAYER_BASE % video_id, webpage, video_id,
m3u8_entry_protocol='m3u8_native')[0]
self._sort_formats(info_dict['formats'])
self._remove_duplicate_formats(info_dict['formats'])
info_dict.update({
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
})
return info_dict

View File

@@ -11,7 +11,7 @@ from ..utils import (
class CNNIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:(?:edition|www)\.)?cnn\.com/video/(?:data/.+?|\?)/
_VALID_URL = r'''(?x)https?://(?:(?P<sub_domain>edition|www|money)\.)?cnn\.com/(?:video/(?:data/.+?|\?)/)?videos?/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z\-]+)|(?=&)))'''
_TESTS = [{
@@ -45,19 +45,46 @@ class CNNIE(InfoExtractor):
'description': 'md5:e7223a503315c9f150acac52e76de086',
'upload_date': '20141222',
}
}, {
'url': 'http://money.cnn.com/video/news/2016/08/19/netflix-stunning-stats.cnnmoney/index.html',
'md5': '52a515dc1b0f001cd82e4ceda32be9d1',
'info_dict': {
'id': '/video/news/2016/08/19/netflix-stunning-stats.cnnmoney',
'ext': 'mp4',
'title': '5 stunning stats about Netflix',
'description': 'Did you know that Netflix has more than 80 million members? Here are five facts about the online video distributor that you probably didn\'t know.',
'upload_date': '20160819',
}
}, {
'url': 'http://cnn.com/video/?/video/politics/2015/03/27/pkg-arizona-senator-church-attendance-mandatory.ktvk',
'only_matching': True,
}, {
'url': 'http://cnn.com/video/?/video/us/2015/04/06/dnt-baker-refuses-anti-gay-order.wkmg',
'only_matching': True,
}, {
'url': 'http://edition.cnn.com/videos/arts/2016/04/21/olympic-games-cultural-a-z-brazil.cnn',
'only_matching': True,
}]
_CONFIG = {
# http://edition.cnn.com/.element/apps/cvp/3.0/cfg/spider/cnn/expansion/config.xml
'edition': {
'data_src': 'http://edition.cnn.com/video/data/3.0/video/%s/index.xml',
'media_src': 'http://pmd.cdn.turner.com/cnn/big',
},
# http://money.cnn.com/.element/apps/cvp2/cfg/config.xml
'money': {
'data_src': 'http://money.cnn.com/video/data/4.0/video/%s.xml',
'media_src': 'http://ht3.cdn.turner.com/money/big',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
path = mobj.group('path')
page_title = mobj.group('title')
info_url = 'http://edition.cnn.com/video/data/3.0/%s/index.xml' % path
sub_domain, path, page_title = re.match(self._VALID_URL, url).groups()
if sub_domain not in ('money', 'edition'):
sub_domain = 'edition'
config = self._CONFIG[sub_domain]
info_url = config['data_src'] % path
info = self._download_xml(info_url, page_title)
formats = []
@@ -66,7 +93,7 @@ class CNNIE(InfoExtractor):
(?:_(?P<bitrate>[0-9]+)k)?
''')
for f in info.findall('files/file'):
video_url = 'http://ht.cdn.turner.com/cnn/big%s' % (f.text.strip())
video_url = config['media_src'] + f.text.strip()
fdct = {
'format_id': f.attrib['bitrate'],
'url': video_url,
@@ -146,7 +173,7 @@ class CNNBlogsIE(InfoExtractor):
class CNNArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!video/)'
_VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!videos?/)'
_TEST = {
'url': 'http://www.cnn.com/2014/12/21/politics/obama-north-koreas-hack-not-war-but-cyber-vandalism/',
'md5': '689034c2a3d9c6dc4aa72d65a81efd01',

View File

@@ -662,6 +662,24 @@ class InfoExtractor(object):
else:
return res
def _get_netrc_login_info(self, netrc_machine=None):
username = None
password = None
netrc_machine = netrc_machine or self._NETRC_MACHINE
if self._downloader.params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(netrc_machine)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % netrc_machine)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
return (username, password)
def _get_login_info(self):
"""
Get the login info as (username, password)
@@ -679,16 +697,8 @@ class InfoExtractor(object):
if downloader_params.get('username') is not None:
username = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
else:
username, password = self._get_netrc_login_info()
return (username, password)
@@ -1685,7 +1695,7 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
def _parse_html5_media_entries(self, base_url, webpage):
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
@@ -1700,6 +1710,21 @@ class InfoExtractor(object):
return f
return {}
def _media_formats(src, cur_media_type):
full_url = absolute_url(src)
if determine_ext(full_url) == 'm3u8':
is_plain_url = False
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
else:
is_plain_url = True
formats = [{
'url': full_url,
'vcodec': 'none' if cur_media_type == 'audio' else None,
}]
return is_plain_url, formats
entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage):
media_info = {
@@ -1709,10 +1734,8 @@ class InfoExtractor(object):
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
if src:
media_info['formats'].append({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
_, formats = _media_formats(src)
media_info['formats'].extend(formats)
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content):
@@ -1720,12 +1743,13 @@ class InfoExtractor(object):
src = source_attributes.get('src')
if not src:
continue
f = parse_content_type(source_attributes.get('type'))
f.update({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['formats'].append(f)
is_plain_url, formats = _media_formats(src, media_type)
if is_plain_url:
f = parse_content_type(source_attributes.get('type'))
f.update(formats[0])
media_info['formats'].append(f)
else:
media_info['formats'].extend(formats)
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
@@ -1741,6 +1765,18 @@ class InfoExtractor(object):
entries.append(media_info)
return entries
def _extract_akamai_formats(self, manifest_url, video_id):
formats = []
f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
formats.extend(self._extract_f4m_formats(
update_url_query(f4m_url, {'hdcore': '3.7.0'}),
video_id, f4m_id='hds', fatal=False))
m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
return formats
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()

View File

@@ -1,9 +1,13 @@
from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
int_or_none,
HEADRequest,
)
class CultureUnpluggedIE(InfoExtractor):
@@ -32,6 +36,9 @@ class CultureUnpluggedIE(InfoExtractor):
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
# request setClientTimezone.php to get PHPSESSID cookie which is need to get valid json data in the next request
self._request_webpage(HEADRequest(
'http://www.cultureunplugged.com/setClientTimezone.php?timeOffset=%d' % -(time.timezone / 3600)), display_id)
movie_data = self._download_json(
'http://www.cultureunplugged.com/movie-data/cu-%s.json' % video_id, display_id)

View File

@@ -38,6 +38,12 @@ class DBTVIE(InfoExtractor):
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1',
webpage)]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()

View File

@@ -11,7 +11,17 @@ from ..utils import (
class DiscoveryGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocitychannel
)go\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'''
_TEST = {
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
'info_dict': {

View File

@@ -10,18 +10,18 @@ from ..utils import (
class DotsubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)'
_TEST = {
'url': 'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27',
'md5': '0914d4d69605090f623b7ac329fea66e',
'url': 'https://dotsub.com/view/9c63db2a-fa95-4838-8e6e-13deafe47f09',
'md5': '21c7ff600f545358134fea762a6d42b6',
'info_dict': {
'id': 'aed3b8b2-1889-4df5-ae63-ad85f5572f27',
'id': '9c63db2a-fa95-4838-8e6e-13deafe47f09',
'ext': 'flv',
'title': 'Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary',
'description': 'md5:699a0f7f50aeec6042cb3b1db2d0d074',
'thumbnail': 're:^https?://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
'duration': 3169,
'uploader': '4v4l0n42',
'timestamp': 1292248482.625,
'upload_date': '20101213',
'title': 'MOTIVATION - "It\'s Possible" Best Inspirational Video Ever',
'description': 'md5:41af1e273edbbdfe4e216a78b9d34ac6',
'thumbnail': 're:^https?://dotsub.com/media/9c63db2a-fa95-4838-8e6e-13deafe47f09/p',
'duration': 198,
'uploader': 'liuxt',
'timestamp': 1385778501.104,
'upload_date': '20131130',
'view_count': int,
}
}

View File

@@ -52,11 +52,24 @@ class EaglePlatformIE(InfoExtractor):
@staticmethod
def _extract_url(webpage):
# Regular iframe embedding
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1',
webpage)
if mobj is not None:
return mobj.group('url')
# Basic usage embedding (see http://dultonmedia.github.io/eplayer/)
mobj = re.search(
r'''(?xs)
<script[^>]+
src=(?P<q1>["\'])(?:https?:)?//(?P<host>.+?\.media\.eagleplatform\.com)/player/player\.js(?P=q1)
.+?
<div[^>]+
class=(?P<q2>["\'])eagleplayer(?P=q2)[^>]+
data-id=["\'](?P<id>\d+)
''', webpage)
if mobj is not None:
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
@staticmethod
def _handle_error(response):

View File

@@ -1,7 +1,10 @@
# flake8: noqa
from __future__ import unicode_literals
from .abc import ABCIE
from .abc import (
ABCIE,
ABCIViewIE,
)
from .abc7news import Abc7NewsIE
from .abcnews import (
AbcNewsIE,
@@ -29,6 +32,7 @@ from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
from .amcnetworks import AMCNetworksIE
from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE
from .anysex import AnySexIE
@@ -67,6 +71,12 @@ from .atttechchannel import ATTTechChannelIE
from .audimedia import AudiMediaIE
from .audioboom import AudioBoomIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .awaan import (
AWAANIE,
AWAANVideoIE,
AWAANLiveIE,
AWAANSeasonIE,
)
from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
@@ -133,6 +143,7 @@ from .ccc import CCCIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .charlierose import CharlieRoseIE
from .chaturbate import ChaturbateIE
from .chilloutzone import ChilloutzoneIE
from .chirbit import (
@@ -195,12 +206,6 @@ from .daum import (
DaumUserIE,
)
from .dbtv import DBTVIE
from .dcn import (
DCNIE,
DCNVideoIE,
DCNLiveIE,
DCNSeasonIE,
)
from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE
@@ -287,6 +292,7 @@ from .freevideo import FreeVideoIE
from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gameinformer import GameInformerIE
from .gameone import (
GameOneIE,
@@ -322,7 +328,10 @@ from .heise import HeiseIE
from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .hgtv import HGTVIE
from .hgtv import (
HGTVIE,
HGTVComShowIE,
)
from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hornbunny import HornBunnyIE
@@ -637,6 +646,7 @@ from .podomatic import PodomaticIE
from .pokemon import PokemonIE
from .polskieradio import PolskieRadioIE
from .porn91 import Porn91IE
from .porncom import PornComIE
from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
@@ -896,7 +906,10 @@ from .tvp import (
TVPIE,
TVPSeriesIE,
)
from .tvplay import TVPlayIE
from .tvplay import (
TVPlayIE,
ViafreeIE,
)
from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE
@@ -926,6 +939,10 @@ from .udn import UDNEmbedIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .uol import UOLIE
from .uplynk import (
UplynkIE,
UplynkPreplayIE,
)
from .urort import UrortIE
from .urplay import URPlayIE
from .usatoday import USATodayIE
@@ -954,6 +971,7 @@ from .vice import (
ViceIE,
ViceShowIE,
)
from .viceland import VicelandIE
from .vidbit import VidbitIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
@@ -1100,8 +1118,4 @@ from .youtube import (
)
from .zapiks import ZapiksIE
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import (
ZingMp3SongIE,
ZingMp3AlbumIE,
)
from .zippcast import ZippCastIE
from .zingmp3 import ZingMp3IE

View File

@@ -1,20 +1,14 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
sanitized_Request,
str_to_int,
)
from ..utils import str_to_int
from .keezmovies import KeezMoviesIE
class ExtremeTubeIE(InfoExtractor):
class ExtremeTubeIE(KeezMoviesIE):
_VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)'
_TESTS = [{
'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'md5': '344d0c6d50e2f16b06e49ca011d8ac69',
'md5': '1fb9228f5e3332ec8c057d6ac36f33e0',
'info_dict': {
'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'ext': 'mp4',
@@ -35,58 +29,22 @@ class ExtremeTubeIE(InfoExtractor):
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage, info = self._extract_info(url)
req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
if not info['title']:
info['title'] = self._search_regex(
r'<h1[^>]+title="([^"]+)"[^>]*>', webpage, 'title')
video_title = self._html_search_regex(
r'<h1 [^>]*?title="([^"]+)"[^>]*>', webpage, 'title')
uploader = self._html_search_regex(
r'Uploaded by:\s*</strong>\s*(.+?)\s*</div>',
webpage, 'uploader', fatal=False)
view_count = str_to_int(self._html_search_regex(
view_count = str_to_int(self._search_regex(
r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>',
webpage, 'view count', fatal=False))
flash_vars = self._parse_json(
self._search_regex(
r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flash vars'),
video_id)
formats = []
for quality_key, video_url in flash_vars.items():
height = int_or_none(self._search_regex(
r'quality_(\d+)[pP]$', quality_key, 'height', default=None))
if not height:
continue
f = {
'url': video_url,
}
mobj = re.search(
r'/(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
if mobj:
height = int(mobj.group('height'))
bitrate = int(mobj.group('bitrate'))
f.update({
'format_id': '%dp-%dk' % (height, bitrate),
'height': height,
'tbr': bitrate,
})
else:
f.update({
'format_id': '%dp' % height,
'height': height,
})
formats.append(f)
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'formats': formats,
info.update({
'uploader': uploader,
'view_count': view_count,
'age_limit': 18,
}
})
return info

View File

@@ -2,44 +2,40 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_xpath
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
qualities,
unified_strdate,
xpath_attr,
xpath_element,
xpath_text,
xpath_with_ns,
)
class FirstTVIE(InfoExtractor):
IE_NAME = '1tv'
IE_DESC = 'Первый канал'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+p?(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
# single format via video_materials.json API
'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
'md5': '82a2777648acae812d58b3f5bd42882b',
# single format
'url': 'http://www.1tv.ru/shows/naedine-so-vsemi/vypuski/gost-lyudmila-senchina-naedine-so-vsemi-vypusk-ot-12-02-2015',
'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
'info_dict': {
'id': '35930',
'id': '40049',
'ext': 'mp4',
'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
'description': 'md5:357933adeede13b202c7c21f91b871b2',
'description': 'md5:36a39c1d19618fec57d12efe212a8370',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20150212',
'duration': 2694,
},
}, {
# multiple formats via video_materials.json API
'url': 'http://www.1tv.ru/video_archive/projects/dobroeutro/p113641',
# multiple formats
'url': 'http://www.1tv.ru/shows/dobroe-utro/pro-zdorove/vesennyaya-allergiya-dobroe-utro-fragment-vypuska-ot-07042016',
'info_dict': {
'id': '113641',
'id': '364746',
'ext': 'mp4',
'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
'description': 'md5:8dcebb3dded0ff20fade39087fd1fee2',
'description': 'md5:a242eea0031fd180a4497d52640a9572',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20160407',
'duration': 179,
@@ -48,84 +44,47 @@ class FirstTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
# single format only available via ONE_ONLINE_VIDEOS.archive_single_xml API
'url': 'http://www.1tv.ru/video_archive/series/f7552/p47038',
'md5': '519d306c5b5669761fd8906c39dbee23',
'info_dict': {
'id': '47038',
'ext': 'mp4',
'title': '"Побег". Второй сезон. 3 серия',
'description': 'md5:3abf8f6b9bce88201c33e9a3d794a00b',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20120516',
'duration': 3080,
},
}, {
'url': 'http://www.1tv.ru/videoarchive/9967',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
# Videos with multiple formats only available via this API
video = self._download_json(
'http://www.1tv.ru/video_materials.json?legacy_id=%s' % video_id,
video_id, fatal=False)
description, thumbnail, upload_date, duration = [None] * 4
if video:
item = video[0]
title = item['title']
quality = qualities(('ld', 'sd', 'hd', ))
formats = [{
'url': f['src'],
'format_id': f.get('name'),
'quality': quality(f.get('name')),
} for f in item['mbr'] if f.get('src')]
thumbnail = item.get('poster')
else:
# Some videos are not available via video_materials.json
video = self._download_xml(
'http://www.1tv.ru/owa/win/ONE_ONLINE_VIDEOS.archive_single_xml?pid=%s' % video_id,
video_id)
NS_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
item = xpath_element(video, './channel/item', fatal=True)
title = xpath_text(item, './title', fatal=True)
formats = [{
'url': content.attrib['url'],
} for content in item.findall(
compat_xpath(xpath_with_ns('./media:content', NS_MAP))) if content.attrib.get('url')]
thumbnail = xpath_attr(
item, xpath_with_ns('./media:thumbnail', NS_MAP), 'url')
webpage = self._download_webpage(url, display_id)
playlist_url = compat_urlparse.urljoin(url, self._search_regex(
r'data-playlist-url="([^"]+)', webpage, 'playlist url'))
item = self._download_json(playlist_url, display_id)[0]
video_id = item['id']
quality = qualities(('ld', 'sd', 'hd', ))
formats = []
for f in item.get('mbr', []):
src = f.get('src')
if not src:
continue
fname = f.get('name')
formats.append({
'url': src,
'format_id': fname,
'quality': quality(fname),
})
self._sort_formats(formats)
webpage = self._download_webpage(url, video_id, 'Downloading page', fatal=False)
if webpage:
title = self._html_search_regex(
(r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"),
webpage, 'title', default=None) or title
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = thumbnail or self._og_search_thumbnail(webpage)
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'video duration', fatal=False))
upload_date = unified_strdate(self._html_search_meta(
'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
title = self._html_search_regex(
(r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"),
webpage, 'title', default=None) or item['title']
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
'description', webpage, 'description')
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'video duration', fatal=False))
upload_date = unified_strdate(self._html_search_meta(
'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
return {
'id': video_id,
'thumbnail': thumbnail,
'thumbnail': item.get('poster') or self._og_search_thumbnail(webpage),
'title': title,
'description': description,
'upload_date': upload_date,

View File

@@ -0,0 +1,70 @@
# coding: utf-8
from __future__ import unicode_literals
from .adobepass import AdobePassIE
from ..utils import (
update_url_query,
extract_attributes,
parse_age_limit,
smuggle_url,
)
class FXNetworksIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?(?:fxnetworks|simpsonsworld)\.com/video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.fxnetworks.com/video/719841347694',
'md5': '1447d4722e42ebca19e5232ab93abb22',
'info_dict': {
'id': '719841347694',
'ext': 'mp4',
'title': 'Vanpage',
'description': 'F*ck settling down. You\'re the Worst returns for an all new season August 31st on FXX.',
'age_limit': 14,
'uploader': 'NEWA-FNG-FX',
'upload_date': '20160706',
'timestamp': 1467844741,
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.simpsonsworld.com/video/716094019682',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if 'The content you are trying to access is not available in your region.' in webpage:
self.raise_geo_restricted()
video_data = extract_attributes(self._search_regex(
r'(<a.+?rel="http://link\.theplatform\.com/s/.+?</a>)', webpage, 'video data'))
player_type = self._search_regex(r'playerType\s*=\s*[\'"]([^\'"]+)', webpage, 'player type', default=None)
release_url = video_data['rel']
title = video_data['data-title']
rating = video_data.get('data-rating')
query = {
'mbr': 'true',
}
if player_type == 'movies':
query.update({
'manifest': 'm3u',
})
else:
query.update({
'switch': 'http',
})
if video_data.get('data-req-auth') == '1':
resource = self._get_mvpd_resource(
video_data['data-channel'], title,
video_data.get('data-guid'), rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fx', resource)
return {
'_type': 'url_transparent',
'id': video_id,
'title': title,
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'thumbnail': video_data.get('data-large-thumb'),
'age_limit': parse_age_limit(rating),
'ie_key': 'ThePlatform',
}

View File

@@ -72,6 +72,8 @@ from .kaltura import KalturaIE
from .eagleplatform import EaglePlatformIE
from .facebook import FacebookIE
from .soundcloud import SoundcloudIE
from .vbox7 import Vbox7IE
from .dbtv import DBTVIE
class GenericIE(InfoExtractor):
@@ -1373,6 +1375,27 @@ class GenericIE(InfoExtractor):
},
'add_ie': [ArkenaIE.ie_key()],
},
{
'url': 'http://nova.bg/news/view/2016/08/16/156543/%D0%BD%D0%B0-%D0%BA%D0%BE%D1%81%D1%8A%D0%BC-%D0%BE%D1%82-%D0%B2%D0%B7%D1%80%D0%B8%D0%B2-%D0%BE%D1%82%D1%86%D0%B5%D0%BF%D0%B8%D1%85%D0%B0-%D1%86%D1%8F%D0%BB-%D0%BA%D0%B2%D0%B0%D1%80%D1%82%D0%B0%D0%BB-%D0%B7%D0%B0%D1%80%D0%B0%D0%B4%D0%B8-%D0%B8%D0%B7%D1%82%D0%B8%D1%87%D0%B0%D0%BD%D0%B5-%D0%BD%D0%B0-%D0%B3%D0%B0%D0%B7-%D0%B2-%D0%BF%D0%BB%D0%BE%D0%B2%D0%B4%D0%B8%D0%B2/',
'info_dict': {
'id': '1c7141f46c',
'ext': 'mp4',
'title': 'НА КОСЪМ ОТ ВЗРИВ: Изтичане на газ на бензиностанция в Пловдив',
},
'params': {
'skip_download': True,
},
'add_ie': [Vbox7IE.ie_key()],
},
{
# DBTV embeds
'url': 'http://www.dagbladet.no/2016/02/23/nyheter/nordlys/ski/troms/ver/43254897/',
'info_dict': {
'id': '43254897',
'title': 'Etter ett års planlegging, klaffet endelig alt: - Jeg måtte ta en liten dans',
},
'playlist_mincount': 3,
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@@ -2239,6 +2262,16 @@ class GenericIE(InfoExtractor):
'uploader': video_uploader,
}
# Look for VBOX7 embeds
vbox7_url = Vbox7IE._extract_url(webpage)
if vbox7_url:
return self.url_result(vbox7_url, Vbox7IE.ie_key())
# Look for DBTV embeds
dbtv_urls = DBTVIE._extract_urls(webpage)
if dbtv_urls:
return _playlist_from_matches(dbtv_urls, ie=DBTVIE.ie_key())
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')

View File

@@ -396,12 +396,12 @@ class GloboIE(InfoExtractor):
class GloboArticleIE(InfoExtractor):
_VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?'
_VIDEOID_REGEXES = [
r'\bdata-video-id=["\'](\d{7,})',
r'\bdata-player-videosids=["\'](\d{7,})',
r'\bvideosIDs\s*:\s*["\'](\d{7,})',
r'\bvideosIDs\s*:\s*["\']?(\d{7,})',
r'\bdata-id=["\'](\d{7,})',
r'<div[^>]+\bid=["\'](\d{7,})',
]
@@ -423,6 +423,9 @@ class GloboArticleIE(InfoExtractor):
}, {
'url': 'http://gshow.globo.com/programas/tv-xuxa/O-Programa/noticia/2014/01/xuxa-e-junno-namoram-muuuito-em-luau-de-zeze-di-camargo-e-luciano.html',
'only_matching': True,
}, {
'url': 'http://oglobo.globo.com/rio/a-amizade-entre-um-entregador-de-farmacia-um-piano-19946271',
'only_matching': True,
}]
@classmethod

View File

@@ -46,3 +46,34 @@ class HGTVIE(InfoExtractor):
'episode_number': int_or_none(embed_vars.get('episode')),
'ie_key': 'ThePlatform',
}
class HGTVComShowIE(InfoExtractor):
IE_NAME = 'hgtv.com:show'
_VALID_URL = r'https?://(?:www\.)?hgtv\.com/shows/[^/]+/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-videos',
'info_dict': {
'id': 'flip-or-flop-full-episodes-videos',
'title': 'Flip or Flop Full Episodes',
},
'playlist_mincount': 15,
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
config = self._parse_json(
self._search_regex(
r'(?s)data-module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
webpage, 'video config'),
display_id)['channels'][0]
entries = [
self.url_result(video['releaseUrl'])
for video in config['videos'] if video.get('releaseUrl')]
return self.playlist_result(
entries, display_id, config.get('title'), config.get('description'))

View File

@@ -6,6 +6,7 @@ from .common import InfoExtractor
from ..utils import (
mimetype2ext,
qualities,
remove_end,
)
@@ -19,7 +20,7 @@ class ImdbIE(InfoExtractor):
'info_dict': {
'id': '2524815897',
'ext': 'mp4',
'title': 'Ice Age: Continental Drift Trailer (No. 2) - IMDb',
'title': 'Ice Age: Continental Drift Trailer (No. 2)',
'description': 'md5:9061c2219254e5d14e03c25c98e96a81',
}
}, {
@@ -83,10 +84,10 @@ class ImdbIE(InfoExtractor):
return {
'id': video_id,
'title': self._og_search_title(webpage),
'title': remove_end(self._og_search_title(webpage), ' - IMDb'),
'formats': formats,
'description': descr,
'thumbnail': format_info['slate'],
'thumbnail': format_info.get('slate'),
}

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re
@@ -8,7 +8,7 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
sanitized_Request,
qualities,
)
@@ -49,11 +49,27 @@ class IviIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
},
'skip': 'Only works from Russia',
},
{
# with MP4-HD720 format
'url': 'http://www.ivi.ru/watch/146500',
'md5': 'd63d35cdbfa1ea61a5eafec7cc523e1e',
'info_dict': {
'id': '146500',
'ext': 'mp4',
'title': 'Кукла',
'description': 'md5:ffca9372399976a2d260a407cc74cce6',
'duration': 5599,
'thumbnail': 're:^https?://.*\.jpg$',
},
'skip': 'Only works from Russia',
}
]
# Sorted by quality
_KNOWN_FORMATS = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
_KNOWN_FORMATS = (
'MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi',
'MP4-SHQ', 'MP4-HD720', 'MP4-HD1080')
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -69,10 +85,9 @@ class IviIE(InfoExtractor):
]
}
request = sanitized_Request(
'http://api.digitalaccess.ru/api/json/', json.dumps(data))
video_json = self._download_json(
request, video_id, 'Downloading video JSON')
'http://api.digitalaccess.ru/api/json/', video_id,
'Downloading video JSON', data=json.dumps(data))
if 'error' in video_json:
error = video_json['error']
@@ -84,11 +99,13 @@ class IviIE(InfoExtractor):
result = video_json['result']
quality = qualities(self._KNOWN_FORMATS)
formats = [{
'url': x['url'],
'format_id': x['content_format'],
'preference': self._KNOWN_FORMATS.index(x['content_format']),
} for x in result['files'] if x['content_format'] in self._KNOWN_FORMATS]
'format_id': x.get('content_format'),
'quality': quality(x.get('content_format')),
} for x in result['files'] if x.get('url')]
self._sort_formats(formats)
@@ -115,7 +132,7 @@ class IviIE(InfoExtractor):
webpage, 'season number', default=None))
episode_number = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="episode"[^>]*>\s*<meta[^>]+itemprop="episodeNumber"[^>]+content="(\d+)',
r'[^>]+itemprop="episode"[^>]*>\s*<meta[^>]+itemprop="episodeNumber"[^>]+content="(\d+)',
webpage, 'episode number', default=None))
description = self._og_search_description(webpage, default=None) or self._html_search_meta(

View File

@@ -30,7 +30,7 @@ class JWPlatformBaseIE(InfoExtractor):
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
@@ -43,6 +43,8 @@ class JWPlatformBaseIE(InfoExtractor):
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
@@ -52,7 +54,7 @@ class JWPlatformBaseIE(InfoExtractor):
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
@@ -68,7 +70,7 @@ class JWPlatformBaseIE(InfoExtractor):
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv',
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
@@ -95,7 +97,7 @@ class JWPlatformBaseIE(InfoExtractor):
})
entries.append({
'id': video_id,
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),

View File

@@ -36,6 +36,12 @@ class KalturaIE(InfoExtractor):
'''
_SERVICE_URL = 'http://cdnapi.kaltura.com'
_SERVICE_BASE = '/api_v3/index.php'
# See https://github.com/kaltura/server/blob/master/plugins/content/caption/base/lib/model/enums/CaptionType.php
_CAPTION_TYPES = {
1: 'srt',
2: 'ttml',
3: 'vtt',
}
_TESTS = [
{
'url': 'kaltura:269692:1_1jc2y3e4',
@@ -67,6 +73,27 @@ class KalturaIE(InfoExtractor):
# video with subtitles
'url': 'kaltura:111032:1_cw786r8q',
'only_matching': True,
},
{
# video with ttml subtitles (no fileExt)
'url': 'kaltura:1926081:0_l5ye1133',
'info_dict': {
'id': '0_l5ye1133',
'ext': 'mp4',
'title': 'What Can You Do With Python?',
'upload_date': '20160221',
'uploader_id': 'stork',
'thumbnail': 're:^https?://.*/thumbnail/.*',
'timestamp': int,
'subtitles': {
'en': [{
'ext': 'ttml',
}],
},
},
'params': {
'skip_download': True,
},
}
]
@@ -122,18 +149,6 @@ class KalturaIE(InfoExtractor):
return data
def _get_kaltura_signature(self, video_id, partner_id, service_url=None):
actions = [{
'apiVersion': '3.1',
'expiry': 86400,
'format': 1,
'service': 'session',
'action': 'startWidgetSession',
'widgetId': '_%s' % partner_id,
}]
return self._kaltura_api_call(
video_id, actions, service_url, note='Downloading Kaltura signature')['ks']
def _get_video_info(self, video_id, partner_id, service_url=None):
actions = [
{
@@ -208,6 +223,17 @@ class KalturaIE(InfoExtractor):
reference_id)['entryResult']
info, flavor_assets = entry_data['meta'], entry_data['contextData']['flavorAssets']
entry_id = info['id']
# Unfortunately, data returned in kalturaIframePackageData lacks
# captions so we will try requesting the complete data using
# regular approach since we now know the entry_id
try:
_, info, flavor_assets, captions = self._get_video_info(
entry_id, partner_id)
except ExtractorError:
# Regular scenario failed but we already have everything
# extracted apart from captions and can process at least
# with this
pass
else:
raise ExtractorError('Invalid URL', expected=True)
ks = params.get('flashvars[ks]', [None])[0]
@@ -265,9 +291,12 @@ class KalturaIE(InfoExtractor):
# Continue if caption is not ready
if f.get('status') != 2:
continue
if not caption.get('id'):
continue
caption_format = int_or_none(caption.get('format'))
subtitles.setdefault(caption.get('languageCode') or caption.get('language'), []).append({
'url': '%s/api_v3/service/caption_captionasset/action/serve/captionAssetId/%s' % (self._SERVICE_URL, caption['id']),
'ext': caption.get('fileExt'),
'ext': caption.get('fileExt') or self._CAPTION_TYPES.get(caption_format) or 'ttml',
})
return {

View File

@@ -3,64 +3,126 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..aes import aes_decrypt_text
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
sanitized_Request,
url_basename,
determine_ext,
ExtractorError,
int_or_none,
str_to_int,
strip_or_none,
)
class KeezMoviesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/.+?(?P<id>[0-9]+)(?:[/?&]|$)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/(?:(?P<display_id>[^/]+)-)?(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
'md5': '1c1e75d22ffa53320f45eeb07bc4cdc0',
'info_dict': {
'id': '1214711',
'display_id': 'petite-asian-lady-mai-playing-in-bathtub',
'ext': 'mp4',
'title': 'Petite Asian Lady Mai Playing In Bathtub',
'age_limit': 18,
'thumbnail': 're:^https?://.*\.jpg$',
'view_count': int,
'age_limit': 18,
}
}
}, {
'url': 'http://www.keezmovies.com/video/1214711',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
def _extract_info(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = (mobj.group('display_id')
if 'display_id' in mobj.groupdict()
else None) or mobj.group('id')
req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
# embedded video
mobj = re.search(r'href="([^"]+)"></iframe>', webpage)
if mobj:
embedded_url = mobj.group(1)
return self.url_result(embedded_url)
video_title = self._html_search_regex(
r'<h1 [^>]*>([^<]+)', webpage, 'title')
flashvars = self._parse_json(self._search_regex(
r'var\s+flashvars\s*=\s*([^;]+);', webpage, 'flashvars'), video_id)
webpage = self._download_webpage(
url, display_id, headers={'Cookie': 'age_verified=1'})
formats = []
for height in (180, 240, 480):
if flashvars.get('quality_%dp' % height):
video_url = flashvars['quality_%dp' % height]
a_format = {
'url': video_url,
'height': height,
'format_id': '%dp' % height,
}
filename_parts = url_basename(video_url).split('_')
if len(filename_parts) >= 2 and re.match(r'\d+[Kk]', filename_parts[1]):
a_format['tbr'] = int(filename_parts[1][:-1])
formats.append(a_format)
format_urls = set()
age_limit = self._rta_search(webpage)
title = None
thumbnail = None
duration = None
encrypted = False
return {
def extract_format(format_url, height=None):
if not isinstance(format_url, compat_str) or not format_url.startswith('http'):
return
if format_url in format_urls:
return
format_urls.add(format_url)
tbr = int_or_none(self._search_regex(
r'[/_](\d+)[kK][/_]', format_url, 'tbr', default=None))
if not height:
height = int_or_none(self._search_regex(
r'[/_](\d+)[pP][/_]', format_url, 'height', default=None))
if encrypted:
format_url = aes_decrypt_text(
video_url, title, 32).decode('utf-8')
formats.append({
'url': format_url,
'format_id': '%dp' % height if height else None,
'height': height,
'tbr': tbr,
})
flashvars = self._parse_json(
self._search_regex(
r'flashvars\s*=\s*({.+?});', webpage,
'flashvars', default='{}'),
display_id, fatal=False)
if flashvars:
title = flashvars.get('video_title')
thumbnail = flashvars.get('image_url')
duration = int_or_none(flashvars.get('video_duration'))
encrypted = flashvars.get('encrypted') is True
for key, value in flashvars.items():
mobj = re.search(r'quality_(\d+)[pP]', key)
if mobj:
extract_format(value, int(mobj.group(1)))
video_url = flashvars.get('video_url')
if video_url and determine_ext(video_url, None):
extract_format(video_url)
video_url = self._html_search_regex(
r'flashvars\.video_url\s*=\s*(["\'])(?P<url>http.+?)\1',
webpage, 'video url', default=None, group='url')
if video_url:
extract_format(compat_urllib_parse_unquote(video_url))
if not formats:
if 'title="This video is no longer available"' in webpage:
raise ExtractorError(
'Video %s is no longer available' % video_id, expected=True)
self._sort_formats(formats)
if not title:
title = self._html_search_regex(
r'<h1[^>]*>([^<]+)', webpage, 'title')
return webpage, {
'id': video_id,
'title': video_title,
'display_id': display_id,
'title': strip_or_none(title),
'thumbnail': thumbnail,
'duration': duration,
'age_limit': 18,
'formats': formats,
'age_limit': age_limit,
'thumbnail': flashvars.get('image_url')
}
def _real_extract(self, url):
webpage, info = self._extract_info(url)
info['view_count'] = str_to_int(self._search_regex(
r'<b>([\d,.]+)</b> Views?', webpage, 'view count', fatal=False))
return info

View File

@@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
determine_ext,
ExtractorError,
@@ -96,7 +99,7 @@ class LifeNewsIE(InfoExtractor):
r'<video[^>]+><source[^>]+src=["\'](.+?)["\']', webpage)
iframe_links = re.findall(
r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/embed/.+?)["\']',
r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/(?:embed|video)/.+?)["\']',
webpage)
if not video_urls and not iframe_links:
@@ -164,9 +167,9 @@ class LifeNewsIE(InfoExtractor):
class LifeEmbedIE(InfoExtractor):
IE_NAME = 'life:embed'
_VALID_URL = r'https?://embed\.life\.ru/embed/(?P<id>[\da-f]{32})'
_VALID_URL = r'https?://embed\.life\.ru/(?:embed|video)/(?P<id>[\da-f]{32})'
_TEST = {
_TESTS = [{
'url': 'http://embed.life.ru/embed/e50c2dec2867350528e2574c899b8291',
'md5': 'b889715c9e49cb1981281d0e5458fbbe',
'info_dict': {
@@ -175,30 +178,57 @@ class LifeEmbedIE(InfoExtractor):
'title': 'e50c2dec2867350528e2574c899b8291',
'thumbnail': 're:http://.*\.jpg',
}
}
}, {
# with 1080p
'url': 'https://embed.life.ru/video/e50c2dec2867350528e2574c899b8291',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
thumbnail = None
formats = []
for video_url in re.findall(r'"file"\s*:\s*"([^"]+)', webpage):
video_url = compat_urlparse.urljoin(url, video_url)
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='m3u8'))
else:
formats.append({
'url': video_url,
'format_id': ext,
'preference': 1,
})
def extract_m3u8(manifest_url):
formats.extend(self._extract_m3u8_formats(
manifest_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='m3u8'))
def extract_original(original_url):
formats.append({
'url': original_url,
'format_id': determine_ext(original_url, None),
'preference': 1,
})
playlist = self._parse_json(
self._search_regex(
r'options\s*=\s*({.+?});', webpage, 'options', default='{}'),
video_id).get('playlist', {})
if playlist:
master = playlist.get('master')
if isinstance(master, compat_str) and determine_ext(master) == 'm3u8':
extract_m3u8(compat_urlparse.urljoin(url, master))
original = playlist.get('original')
if isinstance(original, compat_str):
extract_original(original)
thumbnail = playlist.get('image')
# Old rendition fallback
if not formats:
for video_url in re.findall(r'"file"\s*:\s*"([^"]+)', webpage):
video_url = compat_urlparse.urljoin(url, video_url)
if determine_ext(video_url) == 'm3u8':
extract_m3u8(video_url)
else:
extract_original(video_url)
self._sort_formats(formats)
thumbnail = self._search_regex(
thumbnail = thumbnail or self._search_regex(
r'"image"\s*:\s*"([^"]+)', webpage, 'thumbnail', default=None)
return {

View File

@@ -14,7 +14,7 @@ from ..utils import (
class LiTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.litv\.tv/vod/[^/]+/content\.do\?.*?\bid=(?P<id>[^&]+)'
_VALID_URL = r'https?://www\.litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
_URL_TEMPLATE = 'https://www.litv.tv/vod/%s/content.do?id=%s'
@@ -27,6 +27,7 @@ class LiTVIE(InfoExtractor):
'playlist_count': 50,
}, {
'url': 'https://www.litv.tv/vod/drama/content.do?brc_id=root&id=VOD00041610&isUHEnabled=true&autoPlay=1',
'md5': '969e343d9244778cb29acec608e53640',
'info_dict': {
'id': 'VOD00041610',
'ext': 'mp4',
@@ -37,7 +38,16 @@ class LiTVIE(InfoExtractor):
},
'params': {
'noplaylist': True,
'skip_download': True, # m3u8 download
},
'skip': 'Georestricted to Taiwan',
}, {
'url': 'https://www.litv.tv/promo/miyuezhuan/?content_id=VOD00044841&',
'md5': '88322ea132f848d6e3e18b32a832b918',
'info_dict': {
'id': 'VOD00044841',
'ext': 'mp4',
'title': '芈月傳第1集 霸星芈月降世楚國',
'description': '楚威王二年,太史令唐昧夜觀星象,發現霸星即將現世。王后得知霸星的預言後,想盡辦法不讓孩子順利出生,幸得莒姬相護化解危機。沒想到眾人期待下出生的霸星卻是位公主,楚威王對此失望至極。楚王后命人將女嬰丟棄河中,居然奇蹟似的被少司命像攔下,楚威王認為此女非同凡響,為她取名芈月。',
},
'skip': 'Georestricted to Taiwan',
}]
@@ -92,13 +102,18 @@ class LiTVIE(InfoExtractor):
# endpoint gives the same result as the data embedded in the webpage.
# If georestricted, there are no embedded data, so an extra request is
# necessary to get the error code
if 'assetId' not in view_data:
view_data = self._download_json(
'https://www.litv.tv/vod/ajax/getProgramInfo', video_id,
query={'contentId': video_id},
headers={'Accept': 'application/json'})
video_data = self._parse_json(self._search_regex(
r'uiHlsUrl\s*=\s*testBackendData\(([^;]+)\);',
webpage, 'video data', default='{}'), video_id)
if not video_data:
payload = {
'assetId': view_data['assetId'],
'watchDevices': vod_data['watchDevices'],
'watchDevices': view_data['watchDevices'],
'contentType': view_data['contentType'],
}
video_data = self._download_json(
@@ -115,7 +130,8 @@ class LiTVIE(InfoExtractor):
raise ExtractorError('Unexpected result from %s' % self.IE_NAME)
formats = self._extract_m3u8_formats(
video_data['fullpath'], video_id, ext='mp4', m3u8_id='hls')
video_data['fullpath'], video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
for a_format in formats:
# LiTV HLS segments doesn't like compressions
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = True

View File

@@ -25,10 +25,7 @@ class MioMioIE(InfoExtractor):
'title': '【SKY】字幕 铠武昭和VS平成 假面骑士大战FEAT战队 魔星字幕组 字幕',
'duration': 5923,
},
'params': {
# The server provides broken file
'skip_download': True,
}
'skip': 'Unable to load videos',
}, {
'url': 'http://www.miomio.tv/watch/cc184024/',
'info_dict': {
@@ -47,16 +44,12 @@ class MioMioIE(InfoExtractor):
'skip': 'Unable to load videos',
}, {
# new 'h5' player
'url': 'http://www.miomio.tv/watch/cc273295/',
'md5': '',
'url': 'http://www.miomio.tv/watch/cc273997/',
'md5': '0b27a4b4495055d826813f8c3a6b2070',
'info_dict': {
'id': '273295',
'id': '273997',
'ext': 'mp4',
'title': 'アウト×デラックス 20160526',
},
'params': {
# intermittent HTTP 500
'skip_download': True,
'title': 'マツコの知らない世界【劇的進化SPビニール傘冷凍食品2016】 1_2 - 16 05 31',
},
}]
@@ -116,7 +109,7 @@ class MioMioIE(InfoExtractor):
player_webpage = self._download_webpage(
player_url, video_id,
note='Downloading player webpage', headers={'Referer': url})
entries = self._parse_html5_media_entries(player_url, player_webpage)
entries = self._parse_html5_media_entries(player_url, player_webpage, video_id)
http_headers = {'Referer': player_url}
else:
http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}

View File

@@ -1,53 +1,56 @@
from __future__ import unicode_literals
import os
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
from ..utils import (
int_or_none,
str_to_int,
unified_strdate,
)
from ..utils import sanitized_Request
from .keezmovies import KeezMoviesIE
class MofosexIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<url>mofosex\.com/videos/(?P<id>[0-9]+)/.*?\.html)'
_TEST = {
'url': 'http://www.mofosex.com/videos/5018/japanese-teen-music-video.html',
'md5': '1b2eb47ac33cc75d4a80e3026b613c5a',
class MofosexIE(KeezMoviesIE):
_VALID_URL = r'https?://(?:www\.)?mofosex\.com/videos/(?P<id>\d+)/(?P<display_id>[^/?#&.]+)\.html'
_TESTS = [{
'url': 'http://www.mofosex.com/videos/318131/amateur-teen-playing-and-masturbating-318131.html',
'md5': '39a15853632b7b2e5679f92f69b78e91',
'info_dict': {
'id': '5018',
'id': '318131',
'display_id': 'amateur-teen-playing-and-masturbating-318131',
'ext': 'mp4',
'title': 'Japanese Teen Music Video',
'title': 'amateur teen playing and masturbating',
'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20121114',
'view_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 18,
}
}
}, {
# This video is no longer available
'url': 'http://www.mofosex.com/videos/5018/japanese-teen-music-video.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = 'http://www.' + mobj.group('url')
webpage, info = self._extract_info(url)
req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
view_count = str_to_int(self._search_regex(
r'VIEWS:</span>\s*([\d,.]+)', webpage, 'view count', fatal=False))
like_count = int_or_none(self._search_regex(
r'id=["\']amountLikes["\'][^>]*>(\d+)', webpage,
'like count', fatal=False))
dislike_count = int_or_none(self._search_regex(
r'id=["\']amountDislikes["\'][^>]*>(\d+)', webpage,
'like count', fatal=False))
upload_date = unified_strdate(self._html_search_regex(
r'Added:</span>([^<]+)', webpage, 'upload date', fatal=False))
video_title = self._html_search_regex(r'<h1>(.+?)<', webpage, 'title')
video_url = compat_urllib_parse_unquote(self._html_search_regex(r'flashvars.video_url = \'([^\']+)', webpage, 'video_url'))
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = '-'.join(format)
info.update({
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
'upload_date': upload_date,
'thumbnail': self._og_search_thumbnail(webpage),
})
age_limit = self._rta_search(webpage)
return {
'id': video_id,
'title': video_title,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': age_limit,
}
return info

View File

@@ -257,8 +257,8 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
def _get_feed_url(self, uri):
video_id = self._id_from_uri(uri)
site_id = uri.replace(video_id, '')
config_url = ('http://media.mtvnservices.com/pmt/e1/players/{0}/'
'context4/context5/config.xml'.format(site_id))
config_url = ('http://media.mtvnservices.com/pmt-arc/e1/players/{0}/'
'context52/config.xml'.format(site_id))
config_doc = self._download_xml(config_url, video_id)
feed_node = config_doc.find('.//feed')
feed_url = feed_node.text.strip().split('?')[0]

View File

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
url_basename,
@@ -65,7 +65,7 @@ class NationalGeographicVideoIE(InfoExtractor):
}
class NationalGeographicIE(ThePlatformIE):
class NationalGeographicIE(AdobePassIE):
IE_NAME = 'natgeo'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/(?:videos|episodes)/(?P<id>[^/?]+)'
@@ -119,7 +119,7 @@ class NationalGeographicIE(ThePlatformIE):
auth_resource_id = self._search_regex(
r"video_auth_resourceId\s*=\s*'([^']+)'",
webpage, 'auth resource id')
query['auth'] = self._extract_mvpd_auth(url, display_id, 'natgeo', auth_resource_id) or ''
query['auth'] = self._extract_mvpd_auth(url, display_id, 'natgeo', auth_resource_id)
return {
'_type': 'url_transparent',
@@ -131,7 +131,7 @@ class NationalGeographicIE(ThePlatformIE):
}
class NationalGeographicEpisodeGuideIE(ThePlatformIE):
class NationalGeographicEpisodeGuideIE(InfoExtractor):
IE_NAME = 'natgeo:episodeguide'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
_TESTS = [

View File

@@ -14,16 +14,6 @@ from ..utils import (
class NRKBaseIE(InfoExtractor):
def _extract_formats(self, manifest_url, video_id, fatal=True):
formats = []
formats.extend(self._extract_f4m_formats(
manifest_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81',
video_id, f4m_id='hds', fatal=fatal))
formats.extend(self._extract_m3u8_formats(manifest_url.replace(
'akamaihd.net/z/', 'akamaihd.net/i/').replace('/manifest.f4m', '/master.m3u8'),
video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=fatal))
return formats
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -45,7 +35,7 @@ class NRKBaseIE(InfoExtractor):
asset_url = asset.get('url')
if not asset_url:
continue
formats = self._extract_formats(asset_url, video_id, fatal=False)
formats = self._extract_akamai_formats(asset_url, video_id)
if not formats:
continue
self._sort_formats(formats)
@@ -69,7 +59,7 @@ class NRKBaseIE(InfoExtractor):
if not entries:
media_url = data.get('mediaUrl')
if media_url:
formats = self._extract_formats(media_url, video_id)
formats = self._extract_akamai_formats(media_url, video_id)
self._sort_formats(formats)
duration = parse_duration(data.get('duration'))
entries = [{

View File

@@ -1,6 +1,8 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
@@ -40,8 +42,8 @@ class NTVDeIE(InfoExtractor):
timestamp = int_or_none(info.get('publishedDateAsUnixTimeStamp'))
vdata = self._parse_json(self._search_regex(
r'(?s)\$\(\s*"\#player"\s*\)\s*\.data\(\s*"player",\s*(\{.*?\})\);',
webpage, 'player data'),
video_id, transform_source=js_to_json)
webpage, 'player data'), video_id,
transform_source=lambda s: js_to_json(re.sub(r'advertising:\s*{[^}]+},', '', s)))
duration = parse_duration(vdata.get('duration'))
formats = []

View File

@@ -1,12 +1,12 @@
# coding: utf-8
from __future__ import unicode_literals, division
import math
from .common import InfoExtractor
from ..compat import compat_chr
from ..compat import (
compat_chr,
compat_ord,
)
from ..utils import (
decode_png,
determine_ext,
ExtractorError,
)
@@ -42,71 +42,26 @@ class OpenloadIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage('https://openload.co/embed/%s/' % video_id, video_id)
if 'File not found' in webpage:
if 'File not found' in webpage or 'deleted by the owner' in webpage:
raise ExtractorError('File not found', expected=True)
# The following extraction logic is proposed by @Belderak and @gdkchan
# and declared to be used freely in youtube-dl
# See https://github.com/rg3/youtube-dl/issues/9706
# The following decryption algorithm is written by @yokrysty and
# declared to be freely used in youtube-dl
# See https://github.com/rg3/youtube-dl/issues/10408
enc_data = self._html_search_regex(
r'<span[^>]+id="hiddenurl"[^>]*>([^<]+)</span>', webpage, 'encrypted data')
numbers_js = self._download_webpage(
'https://openload.co/assets/js/obfuscator/n.js', video_id,
note='Downloading signature numbers')
signums = self._search_regex(
r'window\.signatureNumbers\s*=\s*[\'"](?P<data>[a-z]+)[\'"]',
numbers_js, 'signature numbers', group='data')
video_url_chars = []
linkimg_uri = self._search_regex(
r'<img[^>]+id="linkimg"[^>]+src="([^"]+)"', webpage, 'link image')
linkimg = self._request_webpage(
linkimg_uri, video_id, note=False).read()
for c in enc_data:
j = compat_ord(c)
if j >= 33 and j <= 126:
j = ((j + 14) % 94) + 33
video_url_chars += compat_chr(j)
width, height, pixels = decode_png(linkimg)
output = ''
for y in range(height):
for x in range(width):
r, g, b = pixels[y][3 * x:3 * x + 3]
if r == 0 and g == 0 and b == 0:
break
else:
output += compat_chr(r)
output += compat_chr(g)
output += compat_chr(b)
img_str_length = len(output) // 200
img_str = [[0 for x in range(img_str_length)] for y in range(10)]
sig_str_length = len(signums) // 260
sig_str = [[0 for x in range(sig_str_length)] for y in range(10)]
for i in range(10):
for j in range(img_str_length):
begin = i * img_str_length * 20 + j * 20
img_str[i][j] = output[begin:begin + 20]
for j in range(sig_str_length):
begin = i * sig_str_length * 26 + j * 26
sig_str[i][j] = signums[begin:begin + 26]
parts = []
# TODO: find better names for str_, chr_ and sum_
str_ = ''
for i in [2, 3, 5, 7]:
str_ = ''
sum_ = float(99)
for j in range(len(sig_str[i])):
for chr_idx in range(len(img_str[i][j])):
if sum_ > float(122):
sum_ = float(98)
chr_ = compat_chr(int(math.floor(sum_)))
if sig_str[i][j][chr_idx] == chr_ and j >= len(str_):
sum_ += float(2.5)
str_ += img_str[i][j][chr_idx]
parts.append(str_.replace(',', ''))
video_url = 'https://openload.co/stream/%s~%s~%s~%s' % (parts[3], parts[1], parts[2], parts[0])
video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,

View File

@@ -1,9 +1,10 @@
from __future__ import unicode_literals
import re
import json
import random
import collections
import json
import os
import random
import re
from .common import InfoExtractor
from ..compat import (
@@ -12,10 +13,11 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_duration,
qualities,
sanitized_Request,
srt_subtitles_timecode,
urlencode_postdata,
)
@@ -75,12 +77,10 @@ class PluralsightIE(PluralsightBaseIE):
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
post_url, None, 'Logging in as %s' % username,
data=urlencode_postdata(login_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
error = self._search_regex(
r'<span[^>]+class="field-validation-error"[^>]*>([^<]+)</span>',
@@ -91,6 +91,53 @@ class PluralsightIE(PluralsightBaseIE):
if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
raise ExtractorError('Unable to log in')
def _get_subtitles(self, author, clip_id, lang, name, duration, video_id):
captions_post = {
'a': author,
'cn': clip_id,
'lc': lang,
'm': name,
}
captions = self._download_json(
'%s/training/Player/Captions' % self._API_BASE, video_id,
'Downloading captions JSON', 'Unable to download captions JSON',
fatal=False, data=json.dumps(captions_post).encode('utf-8'),
headers={'Content-Type': 'application/json;charset=utf-8'})
if captions:
return {
lang: [{
'ext': 'json',
'data': json.dumps(captions),
}, {
'ext': 'srt',
'data': self._convert_subtitles(duration, captions),
}]
}
@staticmethod
def _convert_subtitles(duration, subs):
srt = ''
for num, current in enumerate(subs):
current = subs[num]
start, text = float_or_none(
current.get('DisplayTimeOffset')), current.get('Text')
if start is None or text is None:
continue
end = duration if num == len(subs) - 1 else float_or_none(
subs[num + 1].get('DisplayTimeOffset'))
if end is None:
continue
srt += os.linesep.join(
(
'%d' % num,
'%s --> %s' % (
srt_subtitles_timecode(start),
srt_subtitles_timecode(end)),
text,
os.linesep,
))
return srt
def _real_extract(self, url):
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
@@ -138,6 +185,8 @@ class PluralsightIE(PluralsightBaseIE):
if not clip:
raise ExtractorError('Unable to resolve clip')
title = '%s - %s' % (module['title'], clip['title'])
QUALITIES = {
'low': {'width': 640, 'height': 480},
'medium': {'width': 848, 'height': 640},
@@ -196,13 +245,12 @@ class PluralsightIE(PluralsightBaseIE):
'mt': ext,
'q': '%dx%d' % (f['width'], f['height']),
}
request = sanitized_Request(
'%s/training/Player/ViewClip' % self._API_BASE,
json.dumps(clip_post).encode('utf-8'))
request.add_header('Content-Type', 'application/json;charset=utf-8')
format_id = '%s-%s' % (ext, quality)
clip_url = self._download_webpage(
request, display_id, 'Downloading %s URL' % format_id, fatal=False)
'%s/training/Player/ViewClip' % self._API_BASE, display_id,
'Downloading %s URL' % format_id, fatal=False,
data=json.dumps(clip_post).encode('utf-8'),
headers={'Content-Type': 'application/json;charset=utf-8'})
# Pluralsight tracks multiple sequential calls to ViewClip API and start
# to return 429 HTTP errors after some time (see
@@ -225,18 +273,20 @@ class PluralsightIE(PluralsightBaseIE):
formats.append(f)
self._sort_formats(formats)
# TODO: captions
# http://www.pluralsight.com/training/Player/ViewClip + cap = true
# or
# http://www.pluralsight.com/training/Player/Captions
# { a = author, cn = clip_id, lc = end, m = name }
duration = int_or_none(
clip.get('duration')) or parse_duration(clip.get('formattedDuration'))
# TODO: other languages?
subtitles = self.extract_subtitles(
author, clip_id, 'en', name, duration, display_id)
return {
'id': clip.get('clipName') or clip['name'],
'title': '%s - %s' % (module['title'], clip['title']),
'duration': int_or_none(clip.get('duration')) or parse_duration(clip.get('formattedDuration')),
'title': title,
'duration': duration,
'creator': author,
'formats': formats
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -0,0 +1,89 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
parse_filesize,
str_to_int,
)
class PornComIE(InfoExtractor):
_VALID_URL = r'https?://(?:[a-zA-Z]+\.)?porn\.com/videos/(?:(?P<display_id>[^/]+)-)?(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.porn.com/videos/teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec-2603339',
'md5': '3f30ce76267533cd12ba999263156de7',
'info_dict': {
'id': '2603339',
'display_id': 'teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec',
'ext': 'mp4',
'title': 'Teen grabs a dildo and fucks her pussy live on 1hottie, I rec',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 551,
'view_count': int,
'age_limit': 18,
},
}, {
'url': 'http://se.porn.com/videos/marsha-may-rides-seth-on-top-of-his-thick-cock-2658067',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
config = self._parse_json(
self._search_regex(
r'=\s*({.+?})\s*,\s*[\da-zA-Z_]+\s*=',
webpage, 'config', default='{}'),
display_id, transform_source=js_to_json, fatal=False)
if config:
title = config['title']
formats = [{
'url': stream['url'],
'format_id': stream.get('id'),
'height': int_or_none(self._search_regex(
r'^(\d+)[pP]', stream.get('id') or '', 'height', default=None))
} for stream in config['streams'] if stream.get('url')]
thumbnail = (compat_urlparse.urljoin(
config['thumbCDN'], config['poster'])
if config.get('thumbCDN') and config.get('poster') else None)
duration = int_or_none(config.get('length'))
else:
title = self._search_regex(
(r'<title>([^<]+)</title>', r'<h1[^>]*>([^<]+)</h1>'),
webpage, 'title')
formats = [{
'url': compat_urlparse.urljoin(url, format_url),
'format_id': '%sp' % height,
'height': int(height),
'filesize_approx': parse_filesize(filesize),
} for format_url, height, filesize in re.findall(
r'<a[^>]+href="(/download/[^"]+)">MPEG4 (\d+)p<span[^>]*>(\d+\s+[a-zA-Z]+)<',
webpage)]
thumbnail = None
duration = None
self._sort_formats(formats)
view_count = str_to_int(self._search_regex(
r'class=["\']views["\'][^>]*><p>([\d,.]+)', webpage, 'view count'))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'view_count': view_count,
'formats': formats,
'age_limit': 18,
}

View File

@@ -13,15 +13,15 @@ class RadioBremenIE(InfoExtractor):
IE_NAME = 'radiobremen'
_TEST = {
'url': 'http://www.radiobremen.de/mediathek/index.html?id=114720',
'url': 'http://www.radiobremen.de/mediathek/?id=141876',
'info_dict': {
'id': '114720',
'id': '141876',
'ext': 'mp4',
'duration': 1685,
'duration': 178,
'width': 512,
'title': 'buten un binnen vom 22. Dezember',
'title': 'Druck auf Patrick Öztürk',
'thumbnail': 're:https?://.*\.jpg$',
'description': 'Unter anderem mit diesen Themen: 45 Flüchtlinge sind in Worpswede angekommen +++ Freies Internet für alle: Bremer arbeiten an einem flächendeckenden W-Lan-Netzwerk +++ Aktivisten kämpfen für das Unibad +++ So war das Wetter 2014 +++',
'description': 'Gegen den SPD-Bürgerschaftsabgeordneten Patrick Öztürk wird wegen Beihilfe zum gewerbsmäßigen Betrug ermittelt. Am Donnerstagabend sollte er dem Vorstand des SPD-Unterbezirks Bremerhaven dazu Rede und Antwort stehen.',
},
}

View File

@@ -4,33 +4,43 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from ..compat import compat_parse_qs
from ..utils import (
ExtractorError,
parse_duration,
float_or_none,
parse_iso8601,
update_url_query,
)
class SendtoNewsIE(JWPlatformBaseIE):
_VALID_URL = r'https?://embed\.sendtonews\.com/player/embed\.php\?(?P<query>[^#]+)'
_VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)'
_TEST = {
# From http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/
'url': 'http://embed.sendtonews.com/player/embed.php?SK=GxfCe0Zo7D&MK=175909&PK=5588&autoplay=on&sound=yes',
'url': 'http://embed.sendtonews.com/player2/embedplayer.php?SC=GxfCe0Zo7D-175909-5588&type=single&autoplay=on&sound=YES',
'info_dict': {
'id': 'GxfCe0Zo7D-175909-5588',
'ext': 'mp4',
'title': 'Recap: CLE 15, CIN 6',
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
'duration': 49,
'id': 'GxfCe0Zo7D-175909-5588'
},
'playlist_count': 9,
# test the first video only to prevent lengthy tests
'playlist': [{
'info_dict': {
'id': '198180',
'ext': 'mp4',
'title': 'Recap: CLE 5, LAA 4',
'description': '8/14/16: Naquin, Almonte lead Indians in 5-4 win',
'duration': 57.343,
'thumbnail': 're:https?://.*\.jpg$',
'upload_date': '20160815',
'timestamp': 1471221961,
},
}],
'params': {
# m3u8 download
'skip_download': True,
},
}
_URL_TEMPLATE = '//embed.sendtonews.com/player/embed.php?SK=%s&MK=%s&PK=%s'
_URL_TEMPLATE = '//embed.sendtonews.com/player2/embedplayer.php?SC=%s'
@classmethod
def _extract_url(cls, webpage):
@@ -39,48 +49,41 @@ class SendtoNewsIE(JWPlatformBaseIE):
.*\bSC=(?P<SC>[0-9a-zA-Z-]+).*
\1>''', webpage)
if mobj:
sk, mk, pk = mobj.group('SC').split('-')
return cls._URL_TEMPLATE % (sk, mk, pk)
sc = mobj.group('SC')
return cls._URL_TEMPLATE % sc
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
params = compat_parse_qs(mobj.group('query'))
playlist_id = self._match_id(url)
if 'SK' not in params or 'MK' not in params or 'PK' not in params:
raise ExtractorError('Invalid URL', expected=True)
data_url = update_url_query(
url.replace('embedplayer.php', 'data_read.php'),
{'cmd': 'loadInitial'})
playlist_data = self._download_json(data_url, playlist_id)
video_id = '-'.join([params['SK'][0], params['MK'][0], params['PK'][0]])
entries = []
for video in playlist_data['playlistData'][0]:
info_dict = self._parse_jwplayer_data(
video['jwconfiguration'],
require_title=False, rtmp_params={'no_resume': True})
webpage = self._download_webpage(url, video_id)
thumbnails = []
if video.get('thumbnailUrl'):
thumbnails.append({
'id': 'normal',
'url': video['thumbnailUrl'],
})
if video.get('smThumbnailUrl'):
thumbnails.append({
'id': 'small',
'url': video['smThumbnailUrl'],
})
info_dict.update({
'title': video['S_headLine'],
'description': video.get('S_fullStory'),
'thumbnails': thumbnails,
'duration': float_or_none(video.get('SM_length')),
'timestamp': parse_iso8601(video.get('S_sysDate'), delimiter=' '),
})
entries.append(info_dict)
jwplayer_data_str = self._search_regex(
r'jwplayer\("[^"]+"\)\.setup\((.+?)\);', webpage, 'JWPlayer data')
js_vars = {
'w': 1024,
'h': 768,
'modeVar': 'html5',
}
for name, val in js_vars.items():
js_val = '%d' % val if isinstance(val, int) else '"%s"' % val
jwplayer_data_str = jwplayer_data_str.replace(':%s,' % name, ':%s,' % js_val)
info_dict = self._parse_jwplayer_data(
self._parse_json(jwplayer_data_str, video_id),
video_id, require_title=False, rtmp_params={'no_resume': True})
title = self._html_search_regex(
r'<div[^>]+class="embedTitle">([^<]+)</div>', webpage, 'title')
description = self._html_search_regex(
r'<div[^>]+class="embedSubTitle">([^<]+)</div>', webpage,
'description', fatal=False)
duration = parse_duration(self._html_search_regex(
r'<div[^>]+class="embedDetails">([0-9:]+)', webpage,
'duration', fatal=False))
info_dict.update({
'title': title,
'description': description,
'duration': duration,
})
return info_dict
return self.playlist_result(entries, playlist_id)

View File

@@ -5,9 +5,9 @@ import re
from .common import InfoExtractor
from ..utils import (
float_or_none,
str_to_int,
parse_duration,
parse_filesize,
str_to_int,
)
@@ -17,21 +17,24 @@ class SnotrIE(InfoExtractor):
'url': 'http://www.snotr.com/video/13708/Drone_flying_through_fireworks',
'info_dict': {
'id': '13708',
'ext': 'flv',
'ext': 'mp4',
'title': 'Drone flying through fireworks!',
'duration': 247,
'filesize_approx': 98566144,
'duration': 248,
'filesize_approx': 40700000,
'description': 'A drone flying through Fourth of July Fireworks',
}
'thumbnail': 're:^https?://.*\.jpg$',
},
'expected_warnings': ['description'],
}, {
'url': 'http://www.snotr.com/video/530/David_Letteman_-_George_W_Bush_Top_10',
'info_dict': {
'id': '530',
'ext': 'flv',
'ext': 'mp4',
'title': 'David Letteman - George W. Bush Top 10',
'duration': 126,
'filesize_approx': 8912896,
'filesize_approx': 8500000,
'description': 'The top 10 George W. Bush moments, brought to you by David Letterman!',
'thumbnail': 're:^https?://.*\.jpg$',
}
}]
@@ -43,26 +46,28 @@ class SnotrIE(InfoExtractor):
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
video_url = 'http://cdn.videos.snotr.com/%s.flv' % video_id
info_dict = self._parse_html5_media_entries(
url, webpage, video_id, m3u8_entry_protocol='m3u8_native')[0]
view_count = str_to_int(self._html_search_regex(
r'<p>\n<strong>Views:</strong>\n([\d,\.]+)</p>',
r'<p[^>]*>\s*<strong[^>]*>Views:</strong>\s*<span[^>]*>([\d,\.]+)',
webpage, 'view count', fatal=False))
duration = parse_duration(self._html_search_regex(
r'<p>\n<strong>Length:</strong>\n\s*([0-9:]+).*?</p>',
r'<p[^>]*>\s*<strong[^>]*>Length:</strong>\s*<span[^>]*>([\d:]+)',
webpage, 'duration', fatal=False))
filesize_approx = float_or_none(self._html_search_regex(
r'<p>\n<strong>Filesize:</strong>\n\s*([0-9.]+)\s*megabyte</p>',
webpage, 'filesize', fatal=False), invscale=1024 * 1024)
filesize_approx = parse_filesize(self._html_search_regex(
r'<p[^>]*>\s*<strong[^>]*>Filesize:</strong>\s*<span[^>]*>([^<]+)',
webpage, 'filesize', fatal=False))
return {
info_dict.update({
'id': video_id,
'description': description,
'title': title,
'url': video_url,
'view_count': view_count,
'duration': duration,
'filesize_approx': filesize_approx,
}
})
return info_dict

View File

@@ -1,13 +1,13 @@
from __future__ import unicode_literals
from .theplatform import ThePlatformIE
from .adobepass import AdobePassIE
from ..utils import (
update_url_query,
smuggle_url,
)
class SyfyIE(ThePlatformIE):
class SyfyIE(AdobePassIE):
_VALID_URL = r'https?://www\.syfy\.com/(?:[^/]+/)?videos/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.syfy.com/theinternetruinedmylife/videos/the-internet-ruined-my-life-season-1-trailer',
@@ -31,7 +31,7 @@ class SyfyIE(ThePlatformIE):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
syfy_mpx = list(self._parse_json(self._search_regex(
r'jQuery\.extend\([^,]+,\s*({.+})\);', webpage, 'drupal settings'),
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
display_id)['syfy']['syfy_mpx'].values())[0]
video_id = syfy_mpx['mpxGUID']
title = syfy_mpx['episodeTitle']
@@ -40,7 +40,9 @@ class SyfyIE(ThePlatformIE):
'manifest': 'm3u',
}
if syfy_mpx.get('entitlement') == 'auth':
resource = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>syfy</title><item><title><![CDATA[%s]]></title><guid>%s</guid><media:rating scheme="urn:v-chip">%s</media:rating></item></channel></rss>' % (title, video_id, syfy_mpx.get('mpxRating', 'TV-14'))
resource = self._get_mvpd_resource(
'syfy', title, video_id,
syfy_mpx.get('mpxRating', 'TV-14'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, 'syfy', resource)

View File

@@ -6,10 +6,10 @@ import time
import hmac
import binascii
import hashlib
import netrc
from .once import OnceIE
from .adobepass import AdobePassIE
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
@@ -25,9 +25,6 @@ from ..utils import (
xpath_with_ns,
mimetype2ext,
find_xpath_attr,
unescapeHTML,
urlencode_postdata,
unified_timestamp,
)
default_ns = 'http://www.w3.org/2005/SMIL21/Language'
@@ -76,10 +73,10 @@ class ThePlatformBaseIE(OnceIE):
if isinstance(captions, list):
for caption in captions:
lang, src, mime = caption.get('lang', 'en'), caption.get('src'), caption.get('type')
subtitles[lang] = [{
subtitles.setdefault(lang, []).append({
'ext': mimetype2ext(mime),
'url': src,
}]
})
return {
'title': info['title'],
@@ -96,7 +93,7 @@ class ThePlatformBaseIE(OnceIE):
return self._parse_theplatform_metadata(info)
class ThePlatformIE(ThePlatformBaseIE):
class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
_VALID_URL = r'''(?x)
(?:https?://(?:link|player)\.theplatform\.com/[sp]/(?P<provider_id>[^/]+)/
(?:(?:(?:[^/]+/)+select/)?(?P<media>media/(?:guid/\d+/)?)|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
@@ -167,7 +164,6 @@ class ThePlatformIE(ThePlatformBaseIE):
'url': 'http://player.theplatform.com/p/NnzsPC/onsite_universal/select/media/guid/2410887629/2928790?fwsitesection=nbc_the_blacklist_video_library&autoPlay=true&carouselID=137781',
'only_matching': True,
}]
_SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
@classmethod
def _extract_urls(cls, webpage):
@@ -202,96 +198,6 @@ class ThePlatformIE(ThePlatformBaseIE):
sig = flags + expiration_date + checksum + str_to_hex(sig_secret)
return '%s&sig=%s' % (url, sig)
def _extract_mvpd_auth(self, url, video_id, requestor_id, resource):
def xml_text(xml_str, tag):
return self._search_regex(
'<%s>(.+?)</%s>' % (tag, tag), xml_str, tag)
mvpd_headers = {
'ap_42': 'anonymous',
'ap_11': 'Linux i686',
'ap_z': 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0',
'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0',
}
guid = xml_text(resource, 'guid')
requestor_info = self._downloader.cache.load('mvpd', requestor_id) or {}
authn_token = requestor_info.get('authn_token')
if authn_token:
token_expires = unified_timestamp(xml_text(authn_token, 'simpleTokenExpires').replace('_GMT', ''))
if token_expires and token_expires >= time.time():
authn_token = None
if not authn_token:
# TODO add support for other TV Providers
mso_id = 'DTV'
login_info = netrc.netrc().authenticators(mso_id)
if not login_info:
return None
def post_form(form_page, note, data={}):
post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
return self._download_webpage(
post_url, video_id, note, data=urlencode_postdata(data or self._hidden_inputs(form_page)), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
provider_redirect_page = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
'Downloading Provider Redirect Page', query={
'noflash': 'true',
'mso_id': mso_id,
'requestor_id': requestor_id,
'no_iframe': 'false',
'domain_name': 'adobe.com',
'redirect_url': url,
})
provider_login_page = post_form(
provider_redirect_page, 'Downloading Provider Login Page')
mvpd_confirm_page = post_form(provider_login_page, 'Logging in', {
'username': login_info[0],
'password': login_info[2],
})
post_form(mvpd_confirm_page, 'Confirming Login')
session = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,
'Retrieving Session', data=urlencode_postdata({
'_method': 'GET',
'requestor_id': requestor_id,
}), headers=mvpd_headers)
authn_token = unescapeHTML(xml_text(session, 'authnToken'))
requestor_info['authn_token'] = authn_token
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
authz_token = requestor_info.get(guid)
if not authz_token:
authorize = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,
'Retrieving Authorization Token', data=urlencode_postdata({
'resource_id': resource,
'requestor_id': requestor_id,
'authentication_token': authn_token,
'mso_id': xml_text(authn_token, 'simpleTokenMsoID'),
'userMeta': '1',
}), headers=mvpd_headers)
authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
requestor_info[guid] = authz_token
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
mvpd_headers.update({
'ap_19': xml_text(authn_token, 'simpleSamlNameID'),
'ap_23': xml_text(authn_token, 'simpleSamlSessionIndex'),
})
return self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'shortAuthorize',
video_id, 'Retrieving Media Token', data=urlencode_postdata({
'authz_token': authz_token,
'requestor_id': requestor_id,
'session_guid': xml_text(authn_token, 'simpleTokenAuthenticationGuid'),
'hashed_guid': 'false',
}), headers=mvpd_headers)
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})

View File

@@ -1,18 +1,13 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
sanitized_Request,
str_to_int,
)
from ..aes import aes_decrypt_text
from .keezmovies import KeezMoviesIE
class Tube8IE(InfoExtractor):
class Tube8IE(KeezMoviesIE):
_VALID_URL = r'https?://(?:www\.)?tube8\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
@@ -33,47 +28,17 @@ class Tube8IE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage, info = self._extract_info(url)
req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, display_id)
if not info['title']:
info['title'] = self._html_search_regex(
r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
flashvars = self._parse_json(
self._search_regex(
r'flashvars\s*=\s*({.+?});\r?\n', webpage, 'flashvars'),
video_id)
formats = []
for key, video_url in flashvars.items():
if not isinstance(video_url, compat_str) or not video_url.startswith('http'):
continue
height = self._search_regex(
r'quality_(\d+)[pP]', key, 'height', default=None)
if not height:
continue
if flashvars.get('encrypted') is True:
video_url = aes_decrypt_text(
video_url, flashvars['video_title'], 32).decode('utf-8')
formats.append({
'url': video_url,
'format_id': '%sp' % height,
'height': int(height),
})
self._sort_formats(formats)
thumbnail = flashvars.get('image_url')
title = self._html_search_regex(
r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
description = self._html_search_regex(
r'>Description:</strong>\s*(.+?)\s*<', webpage, 'description', fatal=False)
uploader = self._html_search_regex(
r'<span class="username">\s*(.+?)\s*<',
webpage, 'uploader', fatal=False)
duration = int_or_none(flashvars.get('video_duration'))
like_count = int_or_none(self._search_regex(
r'rupVar\s*=\s*"(\d+)"', webpage, 'like count', fatal=False))
@@ -86,18 +51,13 @@ class Tube8IE(InfoExtractor):
r'<span id="allCommentsCount">(\d+)</span>',
webpage, 'comment count', fatal=False))
return {
'id': video_id,
'display_id': display_id,
'title': title,
info.update({
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'duration': duration,
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
'comment_count': comment_count,
'age_limit': 18,
'formats': formats,
}
})
return info

View File

@@ -15,21 +15,31 @@ from ..utils import (
int_or_none,
parse_iso8601,
qualities,
try_get,
update_url_query,
)
class TVPlayIE(InfoExtractor):
IE_DESC = 'TV3Play and related services'
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?:tvplay(?:\.skaties)?\.lv/parraides|
(?:tv3play|play\.tv3)\.lt/programos|
tv3play(?:\.tv3)?\.ee/sisu|
tv(?:3|6|8|10)play\.se/program|
(?:(?:tv3play|viasat4play|tv6play)\.no|tv3play\.dk)/programmer|
play\.novatv\.bg/programi
)/[^/]+/(?P<id>\d+)
'''
IE_NAME = 'mtg'
IE_DESC = 'MTG services'
_VALID_URL = r'''(?x)
(?:
mtg:|
https?://
(?:www\.)?
(?:
tvplay(?:\.skaties)?\.lv/parraides|
(?:tv3play|play\.tv3)\.lt/programos|
tv3play(?:\.tv3)?\.ee/sisu|
(?:tv(?:3|6|8|10)play|viafree)\.se/program|
(?:(?:tv3play|viasat4play|tv6play|viafree)\.no|(?:tv3play|viafree)\.dk)/programmer|
play\.novatv\.bg/programi
)
/(?:[^/]+/)+
)
(?P<id>\d+)
'''
_TESTS = [
{
'url': 'http://www.tvplay.lv/parraides/vinas-melo-labak/418113?autostart=true',
@@ -194,9 +204,22 @@ class TVPlayIE(InfoExtractor):
'url': 'http://tvplay.skaties.lv/parraides/vinas-melo-labak/418113?autostart=true',
'only_matching': True,
},
{
# views is null
'url': 'http://tvplay.skaties.lv/parraides/tv3-zinas/760183',
'only_matching': True,
},
{
'url': 'http://tv3play.tv3.ee/sisu/kodu-keset-linna/238551?autostart=true',
'only_matching': True,
},
{
'url': 'http://www.viafree.se/program/underhallning/i-like-radio-live/sasong-1/676869',
'only_matching': True,
},
{
'url': 'mtg:418113',
'only_matching': True,
}
]
@@ -204,13 +227,13 @@ class TVPlayIE(InfoExtractor):
video_id = self._match_id(url)
video = self._download_json(
'http://playapi.mtgx.tv/v1/videos/%s' % video_id, video_id, 'Downloading video JSON')
'http://playapi.mtgx.tv/v3/videos/%s' % video_id, video_id, 'Downloading video JSON')
title = video['title']
try:
streams = self._download_json(
'http://playapi.mtgx.tv/v1/videos/stream/%s' % video_id,
'http://playapi.mtgx.tv/v3/videos/stream/%s' % video_id,
video_id, 'Downloading streams JSON')
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
@@ -289,8 +312,61 @@ class TVPlayIE(InfoExtractor):
'season_number': season_number,
'duration': int_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('created_at')),
'view_count': int_or_none(video.get('views', {}).get('total')),
'view_count': try_get(video, lambda x: x['views']['total'], int),
'age_limit': int_or_none(video.get('age_limit', 0)),
'formats': formats,
'subtitles': subtitles,
}
class ViafreeIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
viafree\.
(?:
(?:dk|no)/programmer|
se/program
)
/(?:[^/]+/)+(?P<id>[^/?#&]+)
'''
_TESTS = [{
'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
'info_dict': {
'id': '395375',
'ext': 'mp4',
'title': 'Husräddarna S02E02',
'description': 'md5:4db5c933e37db629b5a2f75dfb34829e',
'series': 'Husräddarna',
'season': 'Säsong 2',
'season_number': 2,
'duration': 2576,
'timestamp': 1400596321,
'upload_date': '20140520',
},
'params': {
'skip_download': True,
},
'add_ie': [TVPlayIE.ie_key()],
}, {
'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
'only_matching': True,
}, {
'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](?P<id>\d{6,})',
webpage, 'video id')
return self.url_result('mtg:%s' % video_id, TVPlayIE.ie_key())

View File

@@ -7,6 +7,7 @@ import random
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_parse_qs,
compat_str,
compat_urllib_parse_urlencode,
@@ -14,13 +15,13 @@ from ..compat import (
compat_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
js_to_json,
orderedSet,
parse_duration,
parse_iso8601,
sanitized_Request,
urlencode_postdata,
)
@@ -42,7 +43,7 @@ class TwitchBaseIE(InfoExtractor):
'%s returned error: %s - %s' % (self.IE_NAME, error, response.get('message')),
expected=True)
def _download_json(self, url, video_id, note='Downloading JSON metadata'):
def _call_api(self, path, item_id, note):
headers = {
'Referer': 'http://api.twitch.tv/crossdomain/receiver.html?v=2',
'X-Requested-With': 'XMLHttpRequest',
@@ -50,8 +51,8 @@ class TwitchBaseIE(InfoExtractor):
for cookie in self._downloader.cookiejar:
if cookie.name == 'api_token':
headers['Twitch-Api-Token'] = cookie.value
request = sanitized_Request(url, headers=headers)
response = super(TwitchBaseIE, self)._download_json(request, video_id, note)
response = self._download_json(
'%s/%s' % (self._API_BASE, path), item_id, note)
self._handle_error(response)
return response
@@ -63,9 +64,17 @@ class TwitchBaseIE(InfoExtractor):
if username is None:
return
def fail(message):
raise ExtractorError(
'Unable to login. Twitch said: %s' % message, expected=True)
login_page, handle = self._download_webpage_handle(
self._LOGIN_URL, None, 'Downloading login page')
# Some TOR nodes and public proxies are blocked completely
if 'blacklist_message' in login_page:
fail(clean_html(login_page))
login_form = self._hidden_inputs(login_page)
login_form.update({
@@ -82,21 +91,24 @@ class TwitchBaseIE(InfoExtractor):
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(redirect_url, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(login_form))
request.add_header('Referer', redirect_url)
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
headers = {'Referer': redirect_url}
error_message = self._search_regex(
r'<div[^>]+class="subwindow_notice"[^>]*>([^<]+)</div>',
response, 'error message', default=None)
if error_message:
raise ExtractorError(
'Unable to login. Twitch said: %s' % error_message, expected=True)
try:
response = self._download_json(
post_url, None, 'Logging in as %s' % username,
data=urlencode_postdata(login_form),
headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
response = self._parse_json(
e.cause.read().decode('utf-8'), None)
fail(response['message'])
raise
if '>Reset your password<' in response:
self.report_warning('Twitch asks you to reset your password, go to https://secure.twitch.tv/reset/submit')
if response.get('redirect'):
self._download_webpage(
response['redirect'], None, 'Downloading login redirect page',
headers=headers)
def _prefer_source(self, formats):
try:
@@ -109,14 +121,14 @@ class TwitchBaseIE(InfoExtractor):
class TwitchItemBaseIE(TwitchBaseIE):
def _download_info(self, item, item_id):
return self._extract_info(self._download_json(
'%s/kraken/videos/%s%s' % (self._API_BASE, item, item_id), item_id,
return self._extract_info(self._call_api(
'kraken/videos/%s%s' % (item, item_id), item_id,
'Downloading %s info JSON' % self._ITEM_TYPE))
def _extract_media(self, item_id):
info = self._download_info(self._ITEM_SHORTCUT, item_id)
response = self._download_json(
'%s/api/videos/%s%s' % (self._API_BASE, self._ITEM_SHORTCUT, item_id), item_id,
response = self._call_api(
'api/videos/%s%s' % (self._ITEM_SHORTCUT, item_id), item_id,
'Downloading %s playlist JSON' % self._ITEM_TYPE)
entries = []
chunks = response['chunks']
@@ -246,8 +258,8 @@ class TwitchVodIE(TwitchItemBaseIE):
item_id = self._match_id(url)
info = self._download_info(self._ITEM_SHORTCUT, item_id)
access_token = self._download_json(
'%s/api/vods/%s/access_token' % (self._API_BASE, item_id), item_id,
access_token = self._call_api(
'api/vods/%s/access_token' % item_id, item_id,
'Downloading %s access token' % self._ITEM_TYPE)
formats = self._extract_m3u8_formats(
@@ -275,12 +287,12 @@ class TwitchVodIE(TwitchItemBaseIE):
class TwitchPlaylistBaseIE(TwitchBaseIE):
_PLAYLIST_URL = '%s/kraken/channels/%%s/videos/?offset=%%d&limit=%%d' % TwitchBaseIE._API_BASE
_PLAYLIST_PATH = 'kraken/channels/%s/videos/?offset=%d&limit=%d'
_PAGE_LIMIT = 100
def _extract_playlist(self, channel_id):
info = self._download_json(
'%s/kraken/channels/%s' % (self._API_BASE, channel_id),
info = self._call_api(
'kraken/channels/%s' % channel_id,
channel_id, 'Downloading channel info JSON')
channel_name = info.get('display_name') or info.get('name')
entries = []
@@ -289,8 +301,8 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
broken_paging_detected = False
counter_override = None
for counter in itertools.count(1):
response = self._download_json(
self._PLAYLIST_URL % (channel_id, offset, limit),
response = self._call_api(
self._PLAYLIST_PATH % (channel_id, offset, limit),
channel_id,
'Downloading %s videos JSON page %s'
% (self._PLAYLIST_TYPE, counter_override or counter))
@@ -345,7 +357,7 @@ class TwitchProfileIE(TwitchPlaylistBaseIE):
class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
IE_NAME = 'twitch:past_broadcasts'
_VALID_URL = r'%s/(?P<id>[^/]+)/profile/past_broadcasts/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_URL = TwitchPlaylistBaseIE._PLAYLIST_URL + '&broadcasts=true'
_PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcasts=true'
_PLAYLIST_TYPE = 'past broadcasts'
_TEST = {
@@ -389,8 +401,8 @@ class TwitchStreamIE(TwitchBaseIE):
def _real_extract(self, url):
channel_id = self._match_id(url)
stream = self._download_json(
'%s/kraken/streams/%s' % (self._API_BASE, channel_id), channel_id,
stream = self._call_api(
'kraken/streams/%s' % channel_id, channel_id,
'Downloading stream JSON').get('stream')
# Fallback on profile extraction if stream is offline
@@ -405,8 +417,8 @@ class TwitchStreamIE(TwitchBaseIE):
# JSON and fallback to lowercase if it's not available.
channel_id = stream.get('channel', {}).get('name') or channel_id.lower()
access_token = self._download_json(
'%s/api/channels/%s/access_token' % (self._API_BASE, channel_id), channel_id,
access_token = self._call_api(
'api/channels/%s/access_token' % channel_id, channel_id,
'Downloading channel access token')
query = {

View File

@@ -0,0 +1,70 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
float_or_none,
ExtractorError,
)
class UplynkIE(InfoExtractor):
IE_NAME = 'uplynk'
_VALID_URL = r'https?://.*?\.uplynk\.com/(?P<path>ext/[0-9a-f]{32}/(?P<external_id>[^/?&]+)|(?P<id>[0-9a-f]{32}))\.(?:m3u8|json)(?:.*?\bpbs=(?P<session_id>[^&]+))?'
_TEST = {
'url': 'http://content.uplynk.com/e89eaf2ce9054aa89d92ddb2d817a52e.m3u8',
'info_dict': {
'id': 'e89eaf2ce9054aa89d92ddb2d817a52e',
'ext': 'mp4',
'title': '030816-kgo-530pm-solar-eclipse-vid_web.mp4',
'uploader_id': '4413701bf5a1488db55b767f8ae9d4fa',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _extract_uplynk_info(self, uplynk_content_url):
path, external_id, video_id, session_id = re.match(UplynkIE._VALID_URL, uplynk_content_url).groups()
display_id = video_id or external_id
formats = self._extract_m3u8_formats('http://content.uplynk.com/%s.m3u8' % path, display_id, 'mp4')
if session_id:
for f in formats:
f['extra_param_to_segment_url'] = {
'pbs': session_id,
}
self._sort_formats(formats)
asset = self._download_json('http://content.uplynk.com/player/assetinfo/%s.json' % path, display_id)
if asset.get('error') == 1:
raise ExtractorError('% said: %s' % (self.IE_NAME, asset['msg']), expected=True)
return {
'id': asset['asset'],
'title': asset['desc'],
'thumbnail': asset.get('default_poster_url'),
'duration': float_or_none(asset.get('duration')),
'uploader_id': asset.get('owner'),
'formats': formats,
}
def _real_extract(self, url):
return self._extract_uplynk_info(url)
class UplynkPreplayIE(UplynkIE):
IE_NAME = 'uplynk:preplay'
_VALID_URL = r'https?://.*?\.uplynk\.com/preplay2?/(?P<path>ext/[0-9a-f]{32}/(?P<external_id>[^/?&]+)|(?P<id>[0-9a-f]{32}))\.json'
_TEST = None
def _real_extract(self, url):
path, external_id, video_id = re.match(self._VALID_URL, url).groups()
display_id = video_id or external_id
preplay = self._download_json(url, display_id)
content_url = 'http://content.uplynk.com/%s.m3u8' % path
session_id = preplay.get('sid')
if session_id:
content_url += '?pbs=' + session_id
return self._extract_uplynk_info(content_url)

View File

@@ -1,12 +1,14 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import urlencode_postdata
class Vbox7IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vbox7\.com/play:(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?vbox7\.com/(?:play:|emb/external\.php\?.*?\bvid=)(?P<id>[\da-fA-F]+)'
_TESTS = [{
'url': 'http://vbox7.com/play:0946fff23c',
'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
@@ -24,15 +26,27 @@ class Vbox7IE(InfoExtractor):
'title': 'Смях! Чудо - чист за секунди - Скрита камера',
},
'skip': 'georestricted',
}, {
'url': 'http://vbox7.com/emb/external.php?vid=a240d20f9c&autoplay=1',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'http://vbox7.com/play:%s' % video_id, video_id)
title = self._html_search_regex(
r'<title>(.*)</title>', webpage, 'title').split('/')[0].strip()
r'<title>(.+?)</title>', webpage, 'title').split('/')[0].strip()
video_url = self._search_regex(
r'src\s*:\s*(["\'])(?P<url>.+?.mp4.*?)\1',

View File

@@ -8,6 +8,7 @@ from .xstream import XstreamIE
from ..utils import (
ExtractorError,
float_or_none,
try_get,
)
@@ -129,6 +130,11 @@ class VGTVIE(XstreamIE):
'url': 'http://ap.vgtv.no/webtv#!/video/111084/de-nye-bysyklene-lettere-bedre-gir-stoerre-hjul-og-feste-til-mobil',
'only_matching': True,
},
{
# geoblocked
'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
'only_matching': True,
},
]
def _real_extract(self, url):
@@ -196,6 +202,12 @@ class VGTVIE(XstreamIE):
info['formats'].extend(formats)
if not info['formats']:
properties = try_get(
data, lambda x: x['streamConfiguration']['properties'], list)
if properties and 'geoblocked' in properties:
raise self.raise_geo_restricted()
self._sort_formats(info['formats'])
info.update({

View File

@@ -0,0 +1,107 @@
# coding: utf-8
from __future__ import unicode_literals
import time
import hashlib
import json
from .adobepass import AdobePassIE
from ..compat import compat_HTTPError
from ..utils import (
int_or_none,
parse_age_limit,
str_or_none,
parse_duration,
ExtractorError,
extract_attributes,
)
class VicelandIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)'
_TEST = {
'url': 'https://www.viceland.com/en_us/video/cyberwar-trailer/57608447973ee7705f6fbd4e',
'info_dict': {
'id': '57608447973ee7705f6fbd4e',
'ext': 'mp4',
'title': 'CYBERWAR (Trailer)',
'description': 'Tapping into the geopolitics of hacking and surveillance, Ben Makuch travels the world to meet with hackers, government officials, and dissidents to investigate the ecosystem of cyberwarfare.',
'age_limit': 14,
'timestamp': 1466008539,
'upload_date': '20160615',
'uploader_id': '11',
'uploader': 'Viceland',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['UplynkPreplay'],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
watch_hub_data = extract_attributes(self._search_regex(
r'(?s)(<watch-hub\s*.+?</watch-hub>)', webpage, 'watch hub'))
video_id = watch_hub_data['vms-id']
title = watch_hub_data['video-title']
query = {}
if watch_hub_data.get('video-locked') == '1':
resource = self._get_mvpd_resource(
'VICELAND', title, video_id,
watch_hub_data.get('video-rating'))
query['tvetoken'] = self._extract_mvpd_auth(url, video_id, 'VICELAND', resource)
# signature generation algorithm is reverse engineered from signatureGenerator in
# webpack:///../shared/~/vice-player/dist/js/vice-player.js in
# https://www.viceland.com/assets/common/js/web.vendor.bundle.js
exp = int(time.time()) + 14400
query.update({
'exp': exp,
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
})
try:
preplay = self._download_json('https://www.viceland.com/en_us/preplay/%s' % video_id, video_id, query=query)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
error = json.loads(e.cause.read().decode())
raise ExtractorError('%s said: %s' % (self.IE_NAME, error['details']), expected=True)
raise
video_data = preplay['video']
base = video_data['base']
uplynk_preplay_url = preplay['preplayURL']
episode = video_data.get('episode', {})
channel = video_data.get('channel', {})
subtitles = {}
cc_url = preplay.get('ccURL')
if cc_url:
subtitles['en'] = [{
'url': cc_url,
}]
return {
'_type': 'url_transparent',
'url': uplynk_preplay_url,
'id': video_id,
'title': title,
'description': base.get('body'),
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at')),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
'season_number': int_or_none(watch_hub_data.get('season')),
'season_id': str_or_none(episode.get('season_id')),
'uploader': channel.get('base', {}).get('title') or watch_hub_data.get('channel-title'),
'uploader_id': str_or_none(channel.get('id')),
'subtitles': subtitles,
'ie_key': 'UplynkPreplay',
}

View File

@@ -1,6 +1,7 @@
# encoding: utf-8
from __future__ import unicode_literals
import collections
import re
import json
import sys
@@ -16,7 +17,6 @@ from ..utils import (
get_element_by_class,
int_or_none,
orderedSet,
parse_duration,
remove_start,
str_to_int,
unescapeHTML,
@@ -52,8 +52,9 @@ class VKBaseIE(InfoExtractor):
# what actually happens.
# We will workaround this VK issue by resetting the remixlhk cookie to
# the first one manually.
cookies = url_handle.headers.get('Set-Cookie')
if cookies:
for header, cookies in url_handle.headers.items():
if header.lower() != 'set-cookie':
continue
if sys.version_info[0] >= 3:
cookies = cookies.encode('iso-8859-1')
cookies = cookies.decode('utf-8')
@@ -61,6 +62,7 @@ class VKBaseIE(InfoExtractor):
if remixlhk:
value, domain = remixlhk.groups()
self._set_cookie(domain, 'remixlhk', value)
break
login_page = self._download_webpage(
'https://login.vk.com/?act=login', None,
@@ -445,6 +447,9 @@ class VKWallPostIE(VKBaseIE):
'skip_download': True,
},
}],
'params': {
'usenetrc': True,
},
'skip': 'Requires vk account credentials',
}, {
# single YouTube embed, no leading -
@@ -454,6 +459,9 @@ class VKWallPostIE(VKBaseIE):
'title': 'Sergey Gorbunov - Wall post 85155021_6319',
},
'playlist_count': 1,
'params': {
'usenetrc': True,
},
'skip': 'Requires vk account credentials',
}, {
# wall page URL
@@ -481,37 +489,41 @@ class VKWallPostIE(VKBaseIE):
raise ExtractorError('VK said: %s' % error, expected=True)
description = clean_html(get_element_by_class('wall_post_text', webpage))
uploader = clean_html(get_element_by_class(
'fw_post_author', webpage)) or self._og_search_description(webpage)
uploader = clean_html(get_element_by_class('author', webpage))
thumbnail = self._og_search_thumbnail(webpage)
entries = []
for audio in re.finditer(r'''(?sx)
<input[^>]+
id=(?P<q1>["\'])audio_info(?P<id>\d+_\d+).*?(?P=q1)[^>]+
value=(?P<q2>["\'])(?P<url>http.+?)(?P=q2)
.+?
</table>''', webpage):
audio_html = audio.group(0)
audio_id = audio.group('id')
duration = parse_duration(get_element_by_class('duration', audio_html))
track = self._html_search_regex(
r'<span[^>]+id=["\']title%s[^>]*>([^<]+)' % audio_id,
audio_html, 'title', default=None)
artist = self._html_search_regex(
r'>([^<]+)</a></b>\s*&ndash', audio_html,
'artist', default=None)
entries.append({
'id': audio_id,
'url': audio.group('url'),
'title': '%s - %s' % (artist, track) if artist and track else audio_id,
'thumbnail': thumbnail,
'duration': duration,
'uploader': uploader,
'artist': artist,
'track': track,
})
audio_ids = re.findall(r'data-full-id=["\'](\d+_\d+)', webpage)
if audio_ids:
al_audio = self._download_webpage(
'https://vk.com/al_audio.php', post_id,
note='Downloading audio info', fatal=False,
data=urlencode_postdata({
'act': 'reload_audio',
'al': '1',
'ids': ','.join(audio_ids)
}))
if al_audio:
Audio = collections.namedtuple(
'Audio', ['id', 'user_id', 'url', 'track', 'artist', 'duration'])
audios = self._parse_json(
self._search_regex(
r'<!json>(.+?)<!>', al_audio, 'audios', default='[]'),
post_id, fatal=False, transform_source=unescapeHTML)
if isinstance(audios, list):
for audio in audios:
a = Audio._make(audio[:6])
entries.append({
'id': '%s_%s' % (a.user_id, a.id),
'url': a.url,
'title': '%s - %s' % (a.artist, a.track) if a.artist and a.track else a.id,
'thumbnail': thumbnail,
'duration': a.duration,
'uploader': uploader,
'artist': a.artist,
'track': a.track,
})
for video in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>/video(?:-?[\d_]+).*?)\1', webpage):

View File

@@ -17,12 +17,12 @@ class VuClipIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?vuclip\.com/w\?.*?cid=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://m.vuclip.com/w?cid=922692425&fid=70295&z=1010&nvar&frm=index.html',
'url': 'http://m.vuclip.com/w?cid=1129900602&bu=8589892792&frm=w&z=34801&op=0&oc=843169247&section=recommend',
'info_dict': {
'id': '922692425',
'id': '1129900602',
'ext': '3gp',
'title': 'The Toy Soldiers - Hollywood Movie Trailer',
'duration': 177,
'title': 'Top 10 TV Convicts',
'duration': 733,
}
}
@@ -54,7 +54,7 @@ class VuClipIE(InfoExtractor):
'url': video_url,
}]
else:
formats = self._parse_html5_media_entries(url, webpage)[0]['formats']
formats = self._parse_html5_media_entries(url, webpage, video_id)[0]['formats']
title = remove_end(self._html_search_regex(
r'<title>(.*?)-\s*Vuclip</title>', webpage, 'title').strip(), ' - Video')

View File

@@ -13,6 +13,7 @@ class XiamiBaseIE(InfoExtractor):
webpage = super(XiamiBaseIE, self)._download_webpage(*args, **kwargs)
if '>Xiami is currently not available in your country.<' in webpage:
self.raise_geo_restricted('Xiami is currently not available in your country')
return webpage
def _extract_track(self, track, track_id=None):
title = track['title']

View File

@@ -15,10 +15,10 @@ class XVideosIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?xvideos\.com/video(?P<id>[0-9]+)(?:.*)'
_TEST = {
'url': 'http://www.xvideos.com/video4588838/biker_takes_his_girl',
'md5': '4b46ae6ea5e6e9086e714d883313c0c9',
'md5': '14cea69fcb84db54293b1e971466c2e1',
'info_dict': {
'id': '4588838',
'ext': 'flv',
'ext': 'mp4',
'title': 'Biker Takes his Girl',
'age_limit': 18,
}
@@ -42,24 +42,24 @@ class XVideosIE(InfoExtractor):
video_url = compat_urllib_parse_unquote(self._search_regex(
r'flv_url=(.+?)&', webpage, 'video URL', default=''))
if video_url:
formats.append({'url': video_url})
formats.append({
'url': video_url,
'format_id': 'flv',
})
player_args = self._search_regex(
r'(?s)new\s+HTML5Player\((.+?)\)', webpage, ' html5 player', default=None)
if player_args:
for arg in player_args.split(','):
format_url = self._search_regex(
r'(["\'])(?P<url>https?://.+?)\1', arg, 'url',
default=None, group='url')
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'mp4':
formats.append({'url': format_url})
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
for kind, _, format_url in re.findall(
r'setVideo([^(]+)\((["\'])(http.+?)\2\)', webpage):
format_id = kind.lower()
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
elif format_id in ('urllow', 'urlhigh'):
formats.append({
'url': format_url,
'format_id': '%s-%s' % (determine_ext(format_url, 'mp4'), format_id[3:]),
'quality': -2 if format_id.endswith('low') else None,
})
self._sort_formats(formats)

View File

@@ -91,36 +91,18 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
if login_page is False:
return
galx = self._search_regex(r'(?s)<input.+?name="GALX".+?value="(.+?)"',
login_page, 'Login GALX parameter')
login_form = self._hidden_inputs(login_page)
# Log in
login_form_strs = {
'continue': 'https://www.youtube.com/signin?action_handle_signin=true&feature=sign_in_button&hl=en_US&nomobiletemp=1',
login_form.update({
'checkConnection': 'youtube',
'Email': username,
'GALX': galx,
'Passwd': password,
'PersistentCookie': 'yes',
'_utf8': '',
'bgresponse': 'js_disabled',
'checkConnection': '',
'checkedDomains': 'youtube',
'dnConn': '',
'pstMsg': '0',
'rmShown': '1',
'secTok': '',
'signIn': 'Sign in',
'timeStmp': '',
'service': 'youtube',
'uilel': '3',
'hl': 'en_US',
}
})
login_results = self._download_webpage(
self._PASSWORD_CHALLENGE_URL, None,
note='Logging in', errnote='unable to log in', fatal=False,
data=urlencode_postdata(login_form_strs))
data=urlencode_postdata(login_form))
if login_results is False:
return False

View File

@@ -4,13 +4,17 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
int_or_none,
update_url_query,
)
class ZingMp3BaseInfoExtractor(InfoExtractor):
def _extract_item(self, item, fatal=True):
error_message = item.find('./errormessage').text
def _extract_item(self, item, page_type, fatal=True):
error_message = item.get('msg')
if error_message:
if not fatal:
return
@@ -18,25 +22,48 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
'%s returned error: %s' % (self.IE_NAME, error_message),
expected=True)
title = item.find('./title').text.strip()
source = item.find('./source').text
extension = item.attrib['type']
thumbnail = item.find('./backimage').text
formats = []
for quality, source_url in zip(item.get('qualities') or item.get('quality', []), item.get('source_list') or item.get('source', [])):
if not source_url or source_url == 'require vip':
continue
if not re.match(r'https?://', source_url):
source_url = '//' + source_url
source_url = self._proto_relative_url(source_url, 'http:')
quality_num = int_or_none(quality)
f = {
'format_id': quality,
'url': source_url,
}
if page_type == 'video':
f.update({
'height': quality_num,
'ext': 'mp4',
})
else:
f.update({
'abr': quality_num,
'ext': 'mp3',
})
formats.append(f)
cover = item.get('cover')
return {
'title': title,
'url': source,
'ext': extension,
'thumbnail': thumbnail,
'title': (item.get('name') or item.get('title')).strip(),
'formats': formats,
'thumbnail': 'http:/' + cover if cover else None,
'artist': item.get('artist'),
}
def _extract_player_xml(self, player_xml_url, id, playlist_title=None):
player_xml = self._download_xml(player_xml_url, id, 'Downloading Player XML')
items = player_xml.findall('./item')
def _extract_player_json(self, player_json_url, id, page_type, playlist_title=None):
player_json = self._download_json(player_json_url, id, 'Downloading Player JSON')
items = player_json['data']
if 'item' in items:
items = items['item']
if len(items) == 1:
# one single song
data = self._extract_item(items[0])
data = self._extract_item(items[0], page_type)
data['id'] = id
return data
@@ -45,7 +72,7 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
entries = []
for i, item in enumerate(items, 1):
entry = self._extract_item(item, fatal=False)
entry = self._extract_item(item, page_type, fatal=False)
if not entry:
continue
entry['id'] = '%s-%d' % (id, i)
@@ -59,8 +86,8 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
}
class ZingMp3SongIE(ZingMp3BaseInfoExtractor):
_VALID_URL = r'https?://mp3\.zing\.vn/bai-hat/(?P<slug>[^/]+)/(?P<song_id>\w+)\.html'
class ZingMp3IE(ZingMp3BaseInfoExtractor):
_VALID_URL = r'https?://mp3\.zing\.vn/(?:bai-hat|album|playlist|video-clip)/[^/]+/(?P<id>\w+)\.html'
_TESTS = [{
'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
'md5': 'ead7ae13693b3205cbc89536a077daed',
@@ -70,51 +97,47 @@ class ZingMp3SongIE(ZingMp3BaseInfoExtractor):
'ext': 'mp3',
'thumbnail': 're:^https?://.*\.jpg$',
},
}]
IE_NAME = 'zingmp3:song'
IE_DESC = 'mp3.zing.vn songs'
def _real_extract(self, url):
matched = re.match(self._VALID_URL, url)
slug = matched.group('slug')
song_id = matched.group('song_id')
webpage = self._download_webpage(
'http://mp3.zing.vn/bai-hat/%s/%s.html' % (slug, song_id), song_id)
player_xml_url = self._search_regex(
r'&amp;xmlURL=(?P<xml_url>[^&]+)&', webpage, 'player xml url')
return self._extract_player_xml(player_xml_url, song_id)
class ZingMp3AlbumIE(ZingMp3BaseInfoExtractor):
_VALID_URL = r'https?://mp3\.zing\.vn/(?:album|playlist)/(?P<slug>[^/]+)/(?P<album_id>\w+)\.html'
_TESTS = [{
}, {
'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
'md5': '870295a9cd8045c0e15663565902618d',
'info_dict': {
'id': 'ZW6BAEA0',
'title': 'Let It Go (Frozen OST)',
'ext': 'mp4',
},
}, {
'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'info_dict': {
'_type': 'playlist',
'id': 'ZWZBWDAF',
'title': 'Lâu Đài Tình Ái - Bằng Kiều ft. Minh Tuyết | Album 320 lossless',
'title': 'Lâu Đài Tình Ái - Bằng Kiều,Minh Tuyết | Album 320 lossless',
},
'playlist_count': 10,
'skip': 'removed at the request of the owner',
}, {
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
'only_matching': True,
}]
IE_NAME = 'zingmp3:album'
IE_DESC = 'mp3.zing.vn albums'
IE_NAME = 'zingmp3'
IE_DESC = 'mp3.zing.vn'
def _real_extract(self, url):
matched = re.match(self._VALID_URL, url)
slug = matched.group('slug')
album_id = matched.group('album_id')
page_id = self._match_id(url)
webpage = self._download_webpage(
'http://mp3.zing.vn/album/%s/%s.html' % (slug, album_id), album_id)
player_xml_url = self._search_regex(
r'&amp;xmlURL=(?P<xml_url>[^&]+)&', webpage, 'player xml url')
webpage = self._download_webpage(url, page_id)
return self._extract_player_xml(
player_xml_url, album_id,
playlist_title=self._og_search_title(webpage))
player_json_url = self._search_regex([
r'data-xml="([^"]+)',
r'&amp;xmlURL=([^&]+)&'
], webpage, 'player xml url')
playlist_title = None
page_type = self._search_regex(r'/(?:html5)?xml/([^/-]+)', player_json_url, 'page type')
if page_type == 'video':
player_json_url = update_url_query(player_json_url, {'format': 'json'})
else:
player_json_url = player_json_url.replace('/xml/', '/html5xml/')
if page_type == 'album':
playlist_title = self._og_search_title(webpage)
return self._extract_player_json(player_json_url, page_id, page_type, playlist_title)

View File

@@ -1,94 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
str_to_int,
)
class ZippCastIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?zippcast\.com/(?:video/|videoview\.php\?.*\bvplay=)(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
# m3u8, hq direct link
'url': 'http://www.zippcast.com/video/c9cfd5c7e44dbc29c81',
'md5': '5ea0263b5606866c4d6cda0fc5e8c6b6',
'info_dict': {
'id': 'c9cfd5c7e44dbc29c81',
'ext': 'mp4',
'title': '[Vinesauce] Vinny - Digital Space Traveler',
'description': 'Muted on youtube, but now uploaded in it\'s original form.',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'vinesauce',
'view_count': int,
'categories': ['Entertainment'],
'tags': list,
},
}, {
# f4m, lq ipod direct link
'url': 'http://www.zippcast.com/video/b79c0a233e9c6581775',
'only_matching': True,
}, {
'url': 'http://www.zippcast.com/videoview.php?vplay=c9cfd5c7e44dbc29c81&auto=no',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://www.zippcast.com/video/%s' % video_id, video_id)
formats = []
video_url = self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?)\1', webpage,
'video url', default=None, group='url')
if video_url:
formats.append({
'url': video_url,
'format_id': 'http',
'preference': 0, # direct link is almost always of worse quality
})
src_url = self._search_regex(
r'src\s*:\s*(?:escape\()?(["\'])(?P<url>http://.+?)\1',
webpage, 'src', default=None, group='url')
ext = determine_ext(src_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
src_url, video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage) or self._html_search_meta(
'description', webpage)
uploader = self._search_regex(
r'<a[^>]+href="https?://[^/]+/profile/[^>]+>([^<]+)</a>',
webpage, 'uploader', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
view_count = str_to_int(self._search_regex(
r'>([\d,.]+) views!', webpage, 'view count', fatal=False))
categories = re.findall(
r'<a[^>]+href="https?://[^/]+/categories/[^"]+">([^<]+),?<',
webpage)
tags = re.findall(
r'<a[^>]+href="https?://[^/]+/search/tags/[^"]+">([^<]+),?<',
webpage)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'view_count': view_count,
'categories': categories,
'tags': tags,
'formats': formats,
}

View File

@@ -628,22 +628,7 @@ def parseOpts(overrideArguments=None):
filesystem.add_option(
'-o', '--output',
dest='outtmpl', metavar='TEMPLATE',
help=('Output filename template. Use %(title)s to get the title, '
'%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
'%(autonumber)s to get an automatically incremented number, '
'%(ext)s for the filename extension, '
'%(format)s for the format description (like "22 - 1280x720" or "HD"), '
'%(format_id)s for the unique id of the format (like YouTube\'s itags: "137"), '
'%(upload_date)s for the upload date (YYYYMMDD), '
'%(extractor)s for the provider (youtube, metacafe, etc), '
'%(id)s for the video id, '
'%(playlist_title)s, %(playlist_id)s, or %(playlist)s (=title if present, ID otherwise) for the playlist the video is in, '
'%(playlist_index)s for the position in the playlist. '
'%(height)s and %(width)s for the width and height of the video format. '
'%(resolution)s for a textual description of the resolution of the video format. '
'%% for a literal percent. '
'Use - to output to stdout. Can also be used to download to a different directory, '
'for example with -o \'/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s\' .'))
help=('Output filename template, see the "OUTPUT TEMPLATE" for all the info'))
filesystem.add_option(
'--autonumber-size',
dest='autonumber_size', metavar='NUMBER',

View File

@@ -1504,38 +1504,63 @@ def parse_filesize(s):
_UNIT_TABLE = {
'B': 1,
'b': 1,
'bytes': 1,
'KiB': 1024,
'KB': 1000,
'kB': 1024,
'Kb': 1000,
'kb': 1000,
'kilobytes': 1000,
'kibibytes': 1024,
'MiB': 1024 ** 2,
'MB': 1000 ** 2,
'mB': 1024 ** 2,
'Mb': 1000 ** 2,
'mb': 1000 ** 2,
'megabytes': 1000 ** 2,
'mebibytes': 1024 ** 2,
'GiB': 1024 ** 3,
'GB': 1000 ** 3,
'gB': 1024 ** 3,
'Gb': 1000 ** 3,
'gb': 1000 ** 3,
'gigabytes': 1000 ** 3,
'gibibytes': 1024 ** 3,
'TiB': 1024 ** 4,
'TB': 1000 ** 4,
'tB': 1024 ** 4,
'Tb': 1000 ** 4,
'tb': 1000 ** 4,
'terabytes': 1000 ** 4,
'tebibytes': 1024 ** 4,
'PiB': 1024 ** 5,
'PB': 1000 ** 5,
'pB': 1024 ** 5,
'Pb': 1000 ** 5,
'pb': 1000 ** 5,
'petabytes': 1000 ** 5,
'pebibytes': 1024 ** 5,
'EiB': 1024 ** 6,
'EB': 1000 ** 6,
'eB': 1024 ** 6,
'Eb': 1000 ** 6,
'eb': 1000 ** 6,
'exabytes': 1000 ** 6,
'exbibytes': 1024 ** 6,
'ZiB': 1024 ** 7,
'ZB': 1000 ** 7,
'zB': 1024 ** 7,
'Zb': 1000 ** 7,
'zb': 1000 ** 7,
'zettabytes': 1000 ** 7,
'zebibytes': 1024 ** 7,
'YiB': 1024 ** 8,
'YB': 1000 ** 8,
'yB': 1024 ** 8,
'Yb': 1000 ** 8,
'yb': 1000 ** 8,
'yottabytes': 1000 ** 8,
'yobibytes': 1024 ** 8,
}
return lookup_unit_table(_UNIT_TABLE, s)
@@ -2030,14 +2055,14 @@ def js_to_json(code):
}.get(m.group(0), m.group(0)), v[1:-1])
INTEGER_TABLE = (
(r'^0[xX][0-9a-fA-F]+', 16),
(r'^0+[0-7]+', 8),
(r'^(0[xX][0-9a-fA-F]+)\s*:?$', 16),
(r'^(0+[0-7]+)\s*:?$', 8),
)
for regex, base in INTEGER_TABLE:
im = re.match(regex, v)
if im:
i = int(im.group(0), base)
i = int(im.group(1), base)
return '"%d":' % i if v.endswith(':') else '%d' % i
return '"%s"' % v

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.08.13'
__version__ = '2016.08.24.1'