Compare commits
115 Commits
2016.08.13 ... 2016.08.24

Commits in this comparison (SHA1):

d38b27dd9b, 6d94cbd2f4, 30317f4887, 8c3e35dd44, c86f51ee38, 6e52bbb413, 05bddcc512, 1212e9972f, ccb6570e9e, 18b6216150,
fb009b7f53, 3083e4dc07, 7367bdef23, ad31642584, c7c43a93ba, 96229e5f95, 55d119e2a1, 6d2679ee26, afbab5688e, 3d897cc791,
cf143c4d97, ad120ae1c5, d0fa172e5f, f97f9f71e5, 526656726b, 9b8c554ea7, d13bfc07b7, efe470e261, e3f6b56909, b1e676fde8,
92d4cfa358, 3d47ee0a9e, d164a0d41b, db29af6d36, 2c6acdfd2d, fddaa76a59, a809446750, d8f30a7e66, 5b1d85754e, e25586e471,
292a2301bf, dabe15701b, 4245f55880, 5b9d187cc6, 39e1c4f08c, 19f35402c5, 70852b47ca, a9a3b4a081, ecc90093f9, 520251c093,
55af45fcab, b82232036a, e4659b4547, 9e5751b9fe, bd1bcd3ea0, 93a63b36f1, 8b2dc4c328, 850837b67a, 13585d7682, fd3ec986a4,
b0d578ff7b, b0c8f2e9c8, 51815886a9, 08a42f9c74, e15ad9ef09, 4e9fee1015, 7273e5849b, b505e98784, 92cd9fd565, b3d7dce429,
a44694ab4e, ab19b46b88, 8804f10e6b, 6be17c0870, 8652770bd2, 2a1321a272, 9c0fa60bf3, 502d87c546, b35b0d73d8, 6e7e4a6edf,
53fef319f1, 2cabee2a7d, 11f502fac1, 98affc1a48, 70a2829fee, 837e56c8ee, b5ddee8c77, fb64adcbd3, 4f640f2890, 254e64a20a,
818ac213eb, cbef4d5c9f, bf90c46790, 69eb4d699f, 6d8ec8c3b7, 760845ce99, 5c2d087221, b6c4e36728, 1a57b8c18c, 24eb13b1c6,
525e0316c0, 7e60ce9cf7, e811bcf8f8, 6103f59095, 9fa5789279, d2ac04674d, 1fd6e30988, 884cdb6cd9, 9771b1f901, 2118fdd1a9,
320d597c21, aaf44a2f47, fafabc0712, 409760a932, 097eba019d
.github/ISSUE_TEMPLATE.md (vendored, 6 lines changed)

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.13*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.13**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.24.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.24.1**

### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections

@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.08.13
[debug] youtube-dl version 2016.08.24.1
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
ChangeLog (90 lines changed)

@@ -1,3 +1,92 @@
version 2016.08.24.1

Extractors
+ [pluralsight] Add support for subtitles (#9681)


version 2016.08.24

Extractors
* [youtube] Fix authentication (#10392)
* [openload] Fix extraction (#10408)
+ [bravotv] Add support for Adobe Pass (#10407)
* [bravotv] Fix clip info extraction (#10407)
* [eagleplatform] Improve embedded videos detection (#10409)
* [awaan] Fix extraction
* [mtvservices:embedded] Update config URL
+ [abc:iview] Add extractor (#6148)


version 2016.08.22

Core
* Improve formats and subtitles extension auto calculation
+ Recognize full unit names in parse_filesize
+ Add support for m3u8 manifests in HTML5 multimedia tags
* Fix octal/hexadecimal number detection in js_to_json

Extractors
+ [ivi] Add support for 720p and 1080p
+ [charlierose] Add new extractor (#10382)
* [1tv] Fix extraction (#9249)
* [twitch] Renew authentication
* [kaltura] Improve subtitles extension calculation
+ [zingmp3] Add support for video clips
* [zingmp3] Fix extraction (#10041)
* [kaltura] Improve subtitles extraction (#10279)
* [cultureunplugged] Fix extraction (#10330)
+ [cnn] Add support for money.cnn.com (#2797)
* [cbsnews] Fix extraction (#10362)
* [cbs] Fix extraction (#10393)
+ [litv] Support 'promo' URLs (#10385)
* [snotr] Fix extraction (#10338)
* [n-tv.de] Fix extraction (#10331)
* [globo:article] Relax URL and video id regular expressions (#10379)


version 2016.08.19

Core
- Remove output template description from --help
* Recognize lowercase units in parse_filesize

Extractors
+ [porncom] Add extractor for porn.com (#2251, #10251)
+ [generic] Add support for DBTV embeds
* [vk:wallpost] Fix audio extraction for new site layout
* [vk] Fix authentication
+ [hgtvcom:show] Add extractor for hgtv.com shows (#10365)
+ [discoverygo] Add support for another GO network sites


version 2016.08.17

Core
+ Add _get_netrc_login_info

Extractors
* [mofosex] Extract all formats (#10335)
+ [generic] Add support for vbox7 embeds
+ [vbox7] Add support for embed URLs
+ [viafree] Add extractor (#10358)
+ [mtg] Add support for viafree URLs (#10358)
* [theplatform] Extract all subtitles per language
+ [xvideos] Fix HLS extraction (#10356)
+ [amcnetworks] Add extractor
+ [bbc:playlist] Add support for pagination (#10349)
+ [fxnetworks] Add extractor (#9462)
* [cbslocal] Fix extraction for SendtoNews-based videos
* [sendtonews] Fix extraction
* [jwplatform] Extract video id from JWPlayer data
- [zippcast] Remove extractor (#10332)
+ [viceland] Add extractor (#8799)
+ [adobepass] Add base extractor for Adobe Pass Authentication
* [life:embed] Improve extraction
* [vgtv] Detect geo restricted videos (#10348)
+ [uplynk] Add extractor
* [xiami] Fix extraction (#10342)


version 2016.08.13

Core

@@ -23,6 +112,7 @@ Extractors
+ [pbs] Add support for high quality HTTP formats
+ [crunchyroll] Add support for HLS formats (#10301)


version 2016.08.12

Core
README.md (32 lines changed)

@@ -201,32 +201,8 @@ which means you can modify it, redistribute it or use it however you like.
    -a, --batch-file FILE            File containing URLs to download ('-' for
                                     stdin)
    --id                             Use only video ID in file name
    -o, --output TEMPLATE            Output filename template. Use %(title)s to
                                     get the title, %(uploader)s for the
                                     uploader name, %(uploader_id)s for the
                                     uploader nickname if different,
                                     %(autonumber)s to get an automatically
                                     incremented number, %(ext)s for the
                                     filename extension, %(format)s for the
                                     format description (like "22 - 1280x720" or
                                     "HD"), %(format_id)s for the unique id of
                                     the format (like YouTube's itags: "137"),
                                     %(upload_date)s for the upload date
                                     (YYYYMMDD), %(extractor)s for the provider
                                     (youtube, metacafe, etc), %(id)s for the
                                     video id, %(playlist_title)s,
                                     %(playlist_id)s, or %(playlist)s (=title if
                                     present, ID otherwise) for the playlist the
                                     video is in, %(playlist_index)s for the
                                     position in the playlist. %(height)s and
                                     %(width)s for the width and height of the
                                     video format. %(resolution)s for a textual
                                     description of the resolution of the video
                                     format. %% for a literal percent. Use - to
                                     output to stdout. Can also be used to
                                     download to a different directory, for
                                     example with -o '/my/downloads/%(uploader)s
                                     /%(title)s-%(id)s.%(ext)s' .
    -o, --output TEMPLATE            Output filename template, see the "OUTPUT
                                     TEMPLATE" for all the info
    --autonumber-size NUMBER         Specify the number of digits in
                                     %(autonumber)s when it is present in output
                                     filename template or --auto-number option

@@ -669,7 +645,11 @@ $ youtube-dl -f 'best[filesize<50M]'

# Download best format available via direct link over HTTP/HTTPS protocol
$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'

# Download the best video format and the best audio format without merging them
$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
```
Note that in the last example, an output template is recommended as bestvideo and bestaudio may have the same file name.


# VIDEO SELECTION
@@ -16,6 +16,7 @@
- **9gag**
- **9now.com.au**
- **abc.net.au**
- **abc.net.au:iview**
- **Abc7News**
- **abcnews**
- **abcnews:video**
@@ -35,6 +36,7 @@
- **AlJazeera**
- **Allocine**
- **AlphaPorno**
- **AMCNetworks**
- **AnimeOnDemand**
- **anitube.se**
- **AnySex**
@@ -65,6 +67,10 @@
- **audiomack**
- **audiomack:album**
- **auroravid**: AuroraVid
- **AWAAN**
- **awaan:live**
- **awaan:season**
- **awaan:video**
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频
@@ -120,6 +126,7 @@
- **CDA**
- **CeskaTelevize**
- **channel9**: Channel 9
- **CharlieRose**
- **Chaturbate**
- **Chilloutzone**
- **chirbit**
@@ -170,10 +177,6 @@
- **daum.net:playlist**
- **daum.net:user**
- **DBTV**
- **DCN**
- **dcn:live**
- **dcn:season**
- **dcn:video**
- **DctpTv**
- **DeezerPlaylist**
- **defense.gouv.fr**
@@ -247,6 +250,7 @@
- **Funimation**
- **FunnyOrDie**
- **Fusion**
- **FXNetworks**
- **GameInformer**
- **GameOne**
- **gameone:playlist**
@@ -277,6 +281,7 @@
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **HGTV**
- **hgtv.com:show**
- **HistoricFilms**
- **history:topic**: History.com Topic
- **hitbox**
@@ -398,6 +403,7 @@
- **Moviezine**
- **MPORA**
- **MSN**
- **mtg**: MTG services
- **MTV**
- **mtv.de**
- **mtvservices:embedded**
@@ -520,6 +526,7 @@
- **podomatic**
- **Pokemon**
- **PolskieRadio**
- **PornCom**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
- **PornHubPlaylist**
@@ -731,7 +738,6 @@
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**
- **TVPlay**: TV3Play and related services
- **Tweakers**
- **twitch:chapter**
- **twitch:clips**
@@ -748,6 +754,8 @@
- **UDNEmbed**: 聯合影音
- **Unistra**
- **uol.com.br**
- **uplynk**
- **uplynk:preplay**
- **Urort**: NRK P3 Urørt
- **URPlay**
- **USAToday**
@@ -765,7 +773,9 @@
- **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com**
- **Viafree**
- **Vice**
- **Viceland**
- **ViceShow**
- **Vidbit**
- **Viddler**
@@ -885,6 +895,4 @@
- **Zapiks**
- **ZDF**
- **ZDFChannel**
- **zingmp3:album**: mp3.zing.vn albums
- **zingmp3:song**: mp3.zing.vn songs
- **ZippCast**
- **zingmp3**: mp3.zing.vn
@@ -712,6 +712,9 @@ class TestUtil(unittest.TestCase):
        inp = '''{"foo":101}'''
        self.assertEqual(js_to_json(inp), '''{"foo":101}''')

        inp = '''{"duration": "00:01:07"}'''
        self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')

    def test_js_to_json_edgecases(self):
        on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
        self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})

@@ -817,7 +820,10 @@ class TestUtil(unittest.TestCase):
        self.assertEqual(parse_filesize('2 MiB'), 2097152)
        self.assertEqual(parse_filesize('5 GB'), 5000000000)
        self.assertEqual(parse_filesize('1.2Tb'), 1200000000000)
        self.assertEqual(parse_filesize('1.2tb'), 1200000000000)
        self.assertEqual(parse_filesize('1,24 KB'), 1240)
        self.assertEqual(parse_filesize('1,24 kb'), 1240)
        self.assertEqual(parse_filesize('8.5 megabytes'), 8500000)

    def test_parse_count(self):
        self.assertEqual(parse_count(None), None)
@@ -1299,7 +1299,7 @@ class YoutubeDL(object):
            for subtitle_format in subtitle:
                if subtitle_format.get('url'):
                    subtitle_format['url'] = sanitize_url(subtitle_format['url'])
                if 'ext' not in subtitle_format:
                if subtitle_format.get('ext') is None:
                    subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()

        if self.params.get('listsubtitles', False):

@@ -1354,7 +1354,7 @@ class YoutubeDL(object):
                    note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
                )
            # Automatically determine file extension if missing
            if 'ext' not in format:
            if format.get('ext') is None:
                format['ext'] = determine_ext(format['url']).lower()
            # Automatically determine protocol if missing (useful for format
            # selection purposes)
@@ -20,6 +20,7 @@ from ..utils import (
    encodeFilename,
    sanitize_open,
    parse_m3u8_attributes,
    update_url_query,
)


@@ -82,6 +83,7 @@ class HlsFD(FragmentFD):

        self._prepare_and_start_frag_download(ctx)

        extra_param_to_segment_url = info_dict.get('extra_param_to_segment_url')
        i = 0
        media_sequence = 0
        decrypt_info = {'METHOD': 'NONE'}

@@ -95,6 +97,8 @@ class HlsFD(FragmentFD):
                    if re.match(r'^https?://', line)
                    else compat_urlparse.urljoin(man_url, line))
                frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
                if extra_param_to_segment_url:
                    frag_url = update_url_query(frag_url, extra_param_to_segment_url)
                success = ctx['dl'].download(frag_filename, {'url': frag_url})
                if not success:
                    return False

@@ -120,6 +124,8 @@ class HlsFD(FragmentFD):
                if not re.match(r'^https?://', decrypt_info['URI']):
                    decrypt_info['URI'] = compat_urlparse.urljoin(
                        man_url, decrypt_info['URI'])
                if extra_param_to_segment_url:
                    decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_param_to_segment_url)
                decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
            elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
                media_sequence = int(line[22:])
@@ -7,6 +7,7 @@ from ..utils import (
    ExtractorError,
    js_to_json,
    int_or_none,
    parse_iso8601,
)


@@ -93,3 +94,57 @@ class ABCIE(InfoExtractor):
            'description': self._og_search_description(webpage),
            'thumbnail': self._og_search_thumbnail(webpage),
        }


class ABCIViewIE(InfoExtractor):
    IE_NAME = 'abc.net.au:iview'
    _VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'

    _TESTS = [{
        'url': 'http://iview.abc.net.au/programs/gardening-australia/FA1505V024S00',
        'md5': '979d10b2939101f0d27a06b79edad536',
        'info_dict': {
            'id': 'FA1505V024S00',
            'ext': 'mp4',
            'title': 'Series 27 Ep 24',
            'description': 'md5:b28baeae7504d1148e1d2f0e3ed3c15d',
            'upload_date': '20160820',
            'uploader_id': 'abc1',
            'timestamp': 1471719600,
        },
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        video_params = self._parse_json(self._search_regex(
            r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
        title = video_params['title']
        stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')

        formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
        self._sort_formats(formats)

        subtitles = {}
        src_vtt = stream.get('captions', {}).get('src-vtt')
        if src_vtt:
            subtitles['en'] = [{
                'url': src_vtt,
                'ext': 'vtt',
            }]

        return {
            'id': video_id,
            'title': title,
            'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
            'thumbnail': self._html_search_meta(['og:image', 'twitter:image:src'], webpage),
            'duration': int_or_none(video_params.get('eventDuration')),
            'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
            'series': video_params.get('seriesTitle'),
            'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
            'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage)),
            'episode': self._html_search_meta('episode_title', webpage),
            'uploader_id': video_params.get('channel'),
            'formats': formats,
            'subtitles': subtitles,
        }
youtube_dl/extractor/adobepass.py (new file, 134 lines)

@@ -0,0 +1,134 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import re
import time
import xml.etree.ElementTree as etree

from .common import InfoExtractor
from ..utils import (
    unescapeHTML,
    urlencode_postdata,
    unified_timestamp,
)


class AdobePassIE(InfoExtractor):
    _SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
    _USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'

    @staticmethod
    def _get_mvpd_resource(provider_id, title, guid, rating):
        channel = etree.Element('channel')
        channel_title = etree.SubElement(channel, 'title')
        channel_title.text = provider_id
        item = etree.SubElement(channel, 'item')
        resource_title = etree.SubElement(item, 'title')
        resource_title.text = title
        resource_guid = etree.SubElement(item, 'guid')
        resource_guid.text = guid
        resource_rating = etree.SubElement(item, 'media:rating')
        resource_rating.attrib = {'scheme': 'urn:v-chip'}
        resource_rating.text = rating
        return '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">' + etree.tostring(channel).decode() + '</rss>'

    def _extract_mvpd_auth(self, url, video_id, requestor_id, resource):
        def xml_text(xml_str, tag):
            return self._search_regex(
                '<%s>(.+?)</%s>' % (tag, tag), xml_str, tag)

        mvpd_headers = {
            'ap_42': 'anonymous',
            'ap_11': 'Linux i686',
            'ap_z': self._USER_AGENT,
            'User-Agent': self._USER_AGENT,
        }

        guid = xml_text(resource, 'guid')
        requestor_info = self._downloader.cache.load('mvpd', requestor_id) or {}
        authn_token = requestor_info.get('authn_token')
        if authn_token:
            token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(authn_token, 'simpleTokenExpires')))
            if token_expires and token_expires <= int(time.time()):
                authn_token = None
                requestor_info = {}
        if not authn_token:
            # TODO add support for other TV Providers
            mso_id = 'DTV'
            username, password = self._get_netrc_login_info(mso_id)
            if not username or not password:
                return ''

            def post_form(form_page, note, data={}):
                post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
                return self._download_webpage(
                    post_url, video_id, note, data=urlencode_postdata(data or self._hidden_inputs(form_page)), headers={
                        'Content-Type': 'application/x-www-form-urlencoded',
                    })

            provider_redirect_page = self._download_webpage(
                self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
                'Downloading Provider Redirect Page', query={
                    'noflash': 'true',
                    'mso_id': mso_id,
                    'requestor_id': requestor_id,
                    'no_iframe': 'false',
                    'domain_name': 'adobe.com',
                    'redirect_url': url,
                })
            provider_login_page = post_form(
                provider_redirect_page, 'Downloading Provider Login Page')
            mvpd_confirm_page = post_form(provider_login_page, 'Logging in', {
                'username': username,
                'password': password,
            })
            post_form(mvpd_confirm_page, 'Confirming Login')

            session = self._download_webpage(
                self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,
                'Retrieving Session', data=urlencode_postdata({
                    '_method': 'GET',
                    'requestor_id': requestor_id,
                }), headers=mvpd_headers)
            if '<pendingLogout' in session:
                self._downloader.cache.store('mvpd', requestor_id, {})
                return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
            authn_token = unescapeHTML(xml_text(session, 'authnToken'))
            requestor_info['authn_token'] = authn_token
            self._downloader.cache.store('mvpd', requestor_id, requestor_info)

        authz_token = requestor_info.get(guid)
        if not authz_token:
            authorize = self._download_webpage(
                self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,
                'Retrieving Authorization Token', data=urlencode_postdata({
                    'resource_id': resource,
                    'requestor_id': requestor_id,
                    'authentication_token': authn_token,
                    'mso_id': xml_text(authn_token, 'simpleTokenMsoID'),
                    'userMeta': '1',
                }), headers=mvpd_headers)
            if '<pendingLogout' in authorize:
                self._downloader.cache.store('mvpd', requestor_id, {})
                return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
            authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
            requestor_info[guid] = authz_token
            self._downloader.cache.store('mvpd', requestor_id, requestor_info)

        mvpd_headers.update({
            'ap_19': xml_text(authn_token, 'simpleSamlNameID'),
            'ap_23': xml_text(authn_token, 'simpleSamlSessionIndex'),
        })

        short_authorize = self._download_webpage(
            self._SERVICE_PROVIDER_TEMPLATE % 'shortAuthorize',
            video_id, 'Retrieving Media Token', data=urlencode_postdata({
                'authz_token': authz_token,
                'requestor_id': requestor_id,
                'session_guid': xml_text(authn_token, 'simpleTokenAuthenticationGuid'),
                'hashed_guid': 'false',
            }), headers=mvpd_headers)
        if '<pendingLogout' in short_authorize:
            self._downloader.cache.store('mvpd', requestor_id, {})
            return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
        return short_authorize
@@ -109,7 +109,10 @@ class AENetworksIE(AENetworksBaseIE):
        info = self._parse_theplatform_metadata(theplatform_metadata)
        if theplatform_metadata.get('AETN$isBehindWall'):
            requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
            resource = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>%s</title><item><title>%s</title><guid>%s</guid><media:rating scheme="urn:v-chip">%s</media:rating></item></channel></rss>' % (requestor_id, theplatform_metadata['title'], theplatform_metadata['AETN$PPL_pplProgramId'], theplatform_metadata['ratings'][0]['rating'])
            resource = self._get_mvpd_resource(
                requestor_id, theplatform_metadata['title'],
                theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
                theplatform_metadata['ratings'][0]['rating'])
            query['auth'] = self._extract_mvpd_auth(
                url, video_id, requestor_id, resource)
        info.update(self._search_json_ld(webpage, video_id, fatal=False))
youtube_dl/extractor/amcnetworks.py (new file, 91 lines)

@@ -0,0 +1,91 @@
# coding: utf-8
from __future__ import unicode_literals

from .theplatform import ThePlatformIE
from ..utils import (
    update_url_query,
    parse_age_limit,
    int_or_none,
)


class AMCNetworksIE(ThePlatformIE):
    _VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?season-\d+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
    _TESTS = [{
        'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
        'md5': '',
        'info_dict': {
            'id': 's3MX01Nl4vPH',
            'ext': 'mp4',
            'title': 'Maron - Season 4 - Step 1',
            'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.',
            'age_limit': 17,
            'upload_date': '20160505',
            'timestamp': 1462468831,
            'uploader': 'AMCN',
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
    }, {
        'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
        'only_matching': True,
    }, {
        'url': 'http://www.amc.com/shows/preacher/full-episodes/season-01/episode-00/pilot',
        'only_matching': True,
    }, {
        'url': 'http://www.wetv.com/shows/million-dollar-matchmaker/season-01/episode-06-the-dumped-dj-and-shallow-hal',
        'only_matching': True,
    }, {
        'url': 'http://www.ifc.com/movies/chaos',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        query = {
            'mbr': 'true',
            'manifest': 'm3u',
        }
        media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
        theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
            r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
        info = self._parse_theplatform_metadata(theplatform_metadata)
        video_id = theplatform_metadata['pid']
        title = theplatform_metadata['title']
        rating = theplatform_metadata['ratings'][0]['rating']
        auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
        if auth_required == 'true':
            requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
            resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
            query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
        media_url = update_url_query(media_url, query)
        formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
        self._sort_formats(formats)
        info.update({
            'id': video_id,
            'subtitles': subtitles,
            'formats': formats,
            'age_limit': parse_age_limit(parse_age_limit(rating)),
        })
        ns_keys = theplatform_metadata.get('$xmlns', {}).keys()
        if ns_keys:
            ns = list(ns_keys)[0]
            series = theplatform_metadata.get(ns + '$show')
            season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
            episode = theplatform_metadata.get(ns + '$episodeTitle')
            episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
            if season_number:
                title = 'Season %d - %s' % (season_number, title)
            if series:
                title = '%s - %s' % (series, title)
            info.update({
                'title': title,
                'series': series,
                'season_number': season_number,
                'episode': episode,
                'episode_number': episode_number,
            })
        return info
@@ -12,46 +12,41 @@ from ..compat import (
from ..utils import (
    int_or_none,
    parse_iso8601,
    sanitized_Request,
    smuggle_url,
    unsmuggle_url,
    urlencode_postdata,
)


class DCNIE(InfoExtractor):
class AWAANIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'

    def _real_extract(self, url):
        show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
        if video_id and int(video_id) > 0:
            return self.url_result(
                'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo')
                'http://awaan.ae/media/%s' % video_id, 'AWAANVideo')
        elif season_id and int(season_id) > 0:
            return self.url_result(smuggle_url(
                'http://www.dcndigital.ae/program/season/%s' % season_id,
                {'show_id': show_id}), 'DCNSeason')
                'http://awaan.ae/program/season/%s' % season_id,
                {'show_id': show_id}), 'AWAANSeason')
        else:
            return self.url_result(
                'http://www.dcndigital.ae/program/%s' % show_id, 'DCNSeason')
                'http://awaan.ae/program/%s' % show_id, 'AWAANSeason')


class DCNBaseIE(InfoExtractor):
    def _extract_video_info(self, video_data, video_id, is_live):
class AWAANBaseIE(InfoExtractor):
    def _parse_video_data(self, video_data, video_id, is_live):
        title = video_data.get('title_en') or video_data['title_ar']
        img = video_data.get('img')
        thumbnail = 'http://admin.mangomolo.com/analytics/%s' % img if img else None
        duration = int_or_none(video_data.get('duration'))
        description = video_data.get('description_en') or video_data.get('description_ar')
        timestamp = parse_iso8601(video_data.get('create_time'), ' ')

        return {
            'id': video_id,
            'title': self._live_title(title) if is_live else title,
            'description': description,
            'thumbnail': thumbnail,
            'duration': duration,
            'timestamp': timestamp,
            'description': video_data.get('description_en') or video_data.get('description_ar'),
            'thumbnail': 'http://admin.mangomolo.com/analytics/%s' % img if img else None,
            'duration': int_or_none(video_data.get('duration')),
            'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
            'is_live': is_live,
        }

@@ -75,11 +70,12 @@ class DCNBaseIE(InfoExtractor):
        return formats


class DCNVideoIE(DCNBaseIE):
    IE_NAME = 'dcn:video'
class AWAANVideoIE(AWAANBaseIE):
    IE_NAME = 'awaan:video'
    _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
    _TESTS = [{
        'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
        'md5': '5f61c33bfc7794315c671a62d43116aa',
        'info_dict':
        {
            'id': '17375',
@@ -90,10 +86,6 @@ class DCNVideoIE(DCNBaseIE):
            'timestamp': 1227504126,
            'upload_date': '20081124',
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
    }, {
        'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
        'only_matching': True,
@@ -102,11 +94,10 @@ class DCNVideoIE(DCNBaseIE):
    def _real_extract(self, url):
        video_id = self._match_id(url)

        request = sanitized_Request(
        video_data = self._download_json(
            'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
            headers={'Origin': 'http://www.dcndigital.ae'})
        video_data = self._download_json(request, video_id)
        info = self._extract_video_info(video_data, video_id, False)
            video_id, headers={'Origin': 'http://awaan.ae'})
        info = self._parse_video_data(video_data, video_id, False)

        webpage = self._download_webpage(
            'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
@@ -121,19 +112,31 @@ class DCNVideoIE(DCNBaseIE):
        return info


class DCNLiveIE(DCNBaseIE):
    IE_NAME = 'dcn:live'
class AWAANLiveIE(AWAANBaseIE):
    IE_NAME = 'awaan:live'
    _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)'
    _TEST = {
        'url': 'http://awaan.ae/live/6/dubai-tv',
        'info_dict': {
            'id': '6',
            'ext': 'mp4',
            'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
            'upload_date': '20150107',
            'timestamp': 1420588800,
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
    }

    def _real_extract(self, url):
        channel_id = self._match_id(url)

        request = sanitized_Request(
        channel_data = self._download_json(
            'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
            headers={'Origin': 'http://www.dcndigital.ae'})

        channel_data = self._download_json(request, channel_id)
        info = self._extract_video_info(channel_data, channel_id, True)
            channel_id, headers={'Origin': 'http://awaan.ae'})
        info = self._parse_video_data(channel_data, channel_id, True)

        webpage = self._download_webpage(
            'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
@@ -148,8 +151,8 @@ class DCNLiveIE(DCNBaseIE):
        return info


class DCNSeasonIE(InfoExtractor):
    IE_NAME = 'dcn:season'
class AWAANSeasonIE(InfoExtractor):
    IE_NAME = 'awaan:season'
    _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
    _TEST = {
        'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
@@ -170,21 +173,17 @@ class DCNSeasonIE(InfoExtractor):
            data['season'] = season_id
            show_id = smuggled_data.get('show_id')
            if show_id is None:
                request = sanitized_Request(
                season = self._download_json(
                    'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
                    headers={'Origin': 'http://www.dcndigital.ae'})
                season = self._download_json(request, season_id)
                    season_id, headers={'Origin': 'http://awaan.ae'})
                show_id = season['id']
        data['show_id'] = show_id
        request = sanitized_Request(
        show = self._download_json(
            'http://admin.mangomolo.com/analytics/index.php/plus/show',
            urlencode_postdata(data),
            {
                'Origin': 'http://www.dcndigital.ae',
            show_id, data=urlencode_postdata(data), headers={
                'Origin': 'http://awaan.ae',
                'Content-Type': 'application/x-www-form-urlencoded'
            })

        show = self._download_json(request, show_id)
        if not season_id:
            season_id = show['default_season']
        for season in show['seasons']:
@@ -195,6 +194,6 @@ class DCNSeasonIE(InfoExtractor):
        for video in show['videos']:
            video_id = compat_str(video['id'])
            entries.append(self.url_result(
                'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo', video_id))
                'http://awaan.ae/media/%s' % video_id, 'AWAANVideo', video_id))

        return self.playlist_result(entries, season_id, title)
@@ -2,6 +2,7 @@
from __future__ import unicode_literals

import re
import itertools

from .common import InfoExtractor
from ..utils import (
@@ -17,6 +18,7 @@ from ..utils import (
from ..compat import (
    compat_etree_fromstring,
    compat_HTTPError,
    compat_urlparse,
)


@@ -1056,19 +1058,35 @@ class BBCCoUkArticleIE(InfoExtractor):


class BBCCoUkPlaylistBaseIE(InfoExtractor):
    def _entries(self, webpage, url, playlist_id):
        single_page = 'page' in compat_urlparse.parse_qs(
            compat_urlparse.urlparse(url).query)
        for page_num in itertools.count(2):
            for video_id in re.findall(
                    self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage):
                yield self.url_result(
                    self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
            if single_page:
                return
            next_page = self._search_regex(
                r'<li[^>]+class=(["\'])pagination_+next\1[^>]*><a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2',
                webpage, 'next page url', default=None, group='url')
            if not next_page:
                break
            webpage = self._download_webpage(
                compat_urlparse.urljoin(url, next_page), playlist_id,
                'Downloading page %d' % page_num, page_num)

    def _real_extract(self, url):
        playlist_id = self._match_id(url)

        webpage = self._download_webpage(url, playlist_id)

        entries = [
            self.url_result(self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
            for video_id in re.findall(
                self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage)]

        title, description = self._extract_title_and_description(webpage)

        return self.playlist_result(entries, playlist_id, title, description)
        return self.playlist_result(
            self._entries(webpage, url, playlist_id),
            playlist_id, title, description)


class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
@@ -1117,6 +1135,24 @@ class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
            'description': 'French thriller serial about a missing teenager.',
        },
        'playlist_mincount': 7,
    }, {
        # multipage playlist, explicit page
        'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips?page=1',
        'info_dict': {
            'id': 'b00mfl7n',
            'title': 'Frozen Planet - Clips - BBC One',
            'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
        },
        'playlist_mincount': 24,
    }, {
        # multipage playlist, all pages
        'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips',
        'info_dict': {
            'id': 'b00mfl7n',
            'title': 'Frozen Planet - Clips - BBC One',
            'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
        },
        'playlist_mincount': 142,
    }, {
        'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
        'only_matching': True,
@@ -1,31 +1,74 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import smuggle_url
from .adobepass import AdobePassIE
from ..utils import (
    smuggle_url,
    update_url_query,
    int_or_none,
)


class BravoTVIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)'
    _TEST = {
class BravoTVIE(AdobePassIE):
    _VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
    _TESTS = [{
        'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
        'md5': 'd60cdf68904e854fac669bd26cccf801',
        'md5': '9086d0b7ef0ea2aabc4781d75f4e5863',
        'info_dict': {
            'id': 'LitrBdX64qLn',
            'id': 'zHyk1_HU_mPy',
            'ext': 'mp4',
            'title': 'Last Chance Kitchen Returns',
            'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
            'timestamp': 1448926740,
            'upload_date': '20151130',
            'title': 'LCK Ep 12: Fishy Finale',
            'description': 'S13/E12: Two eliminated chefs have just 12 minutes to cook up a delicious fish dish.',
            'uploader': 'NBCU-BRAV',
            'upload_date': '20160302',
            'timestamp': 1456945320,
        }
    }
    }, {
        'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid')
        release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid')
        return self.url_result(smuggle_url(
            'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid),
            {'force_smil_url': True}), 'ThePlatform', release_pid)
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        settings = self._parse_json(self._search_regex(
            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
            display_id)
        info = {}
        query = {
            'mbr': 'true',
        }
        account_pid, release_pid = [None] * 2
        tve = settings.get('sharedTVE')
        if tve:
            query['manifest'] = 'm3u'
            account_pid = 'HNK2IC'
            release_pid = tve['release_pid']
            if tve.get('entitlement') == 'auth':
                adobe_pass = settings.get('adobePass', {})
                resource = self._get_mvpd_resource(
                    adobe_pass.get('adobePassResourceId', 'bravo'),
                    tve['title'], release_pid, tve.get('rating'))
                query['auth'] = self._extract_mvpd_auth(
                    url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
        else:
            shared_playlist = settings['shared_playlist']
            account_pid = shared_playlist['account_pid']
            metadata = shared_playlist['video_metadata'][shared_playlist['default_clip']]
            release_pid = metadata['release_pid']
            info.update({
                'title': metadata['title'],
                'description': metadata.get('description'),
                'season_number': int_or_none(metadata.get('season_num')),
                'episode_number': int_or_none(metadata.get('episode_num')),
            })
            query['switch'] = 'progressive'
        info.update({
            '_type': 'url_transparent',
            'id': release_pid,
            'url': smuggle_url(update_url_query(
                'http://link.theplatform.com/s/%s/%s' % (account_pid, release_pid),
                query), {'force_smil_url': True}),
            'ie_key': 'ThePlatform',
        })
        return info
@@ -4,6 +4,7 @@ from .theplatform import ThePlatformFeedIE
from ..utils import (
    int_or_none,
    find_xpath_attr,
    ExtractorError,
)


@@ -17,19 +18,6 @@ class CBSBaseIE(ThePlatformFeedIE):
            }]
        } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []

    def _extract_video_info(self, filter_query, video_id):
        return self._extract_feed_info(
            'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
                'series': entry.get('cbs$SeriesTitle'),
                'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
                'episode': entry.get('cbs$EpisodeTitle'),
                'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
            }, {
                'StreamPack': {
                    'manifest': 'm3u',
                }
            })


class CBSIE(CBSBaseIE):
    _VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
@@ -38,7 +26,6 @@ class CBSIE(CBSBaseIE):
        'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
        'info_dict': {
            'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
            'display_id': 'connect-chat-feat-garth-brooks',
            'ext': 'mp4',
            'title': 'Connect Chat feat. Garth Brooks',
            'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
@@ -47,7 +34,10 @@ class CBSIE(CBSBaseIE):
            'upload_date': '20131127',
            'uploader': 'CBSI-NEW',
        },
        'expected_warnings': ['Failed to download m3u8 information'],
        'params': {
            # m3u8 download
            'skip_download': True,
        },
        '_skip': 'Blocked outside the US',
    }, {
        'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@@ -56,8 +46,31 @@ class CBSIE(CBSBaseIE):
        'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
        'only_matching': True,
    }]
    TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'

    def _extract_video_info(self, guid):
        path = 'dJ5BDC/media/guid/2198311517/' + guid
        smil_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
        formats, subtitles = self._extract_theplatform_smil(smil_url + '&manifest=m3u', guid)
        for r in ('HLS&formats=M3U', 'RTMP', 'WIFI', '3G'):
            try:
                tp_formats, _ = self._extract_theplatform_smil(smil_url + '&assetTypes=' + r, guid, 'Downloading %s SMIL data' % r.split('&')[0])
                formats.extend(tp_formats)
            except ExtractorError:
                continue
        self._sort_formats(formats)
        metadata = self._download_theplatform_metadata(path, guid)
        info = self._parse_theplatform_metadata(metadata)
        info.update({
            'id': guid,
            'formats': formats,
            'subtitles': subtitles,
            'series': metadata.get('cbs$SeriesTitle'),
            'season_number': int_or_none(metadata.get('cbs$SeasonNumber')),
            'episode': metadata.get('cbs$EpisodeTitle'),
            'episode_number': int_or_none(metadata.get('cbs$EpisodeNumber')),
        })
        return info

    def _real_extract(self, url):
        content_id = self._match_id(url)
        return self._extract_video_info('byGuid=%s' % content_id, content_id)
        return self._extract_video_info(content_id)
@@ -41,13 +41,8 @@ class CBSLocalIE(AnvatoIE):
        'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/',
        'info_dict': {
            'id': 'GxfCe0Zo7D-175909-5588',
            'ext': 'mp4',
            'title': 'Recap: CLE 15, CIN 6',
            'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
            'upload_date': '20160516',
            'timestamp': 1463433840,
            'duration': 49,
        },
        'playlist_count': 9,
        'params': {
            # m3u8 download
            'skip_download': True,
@@ -60,12 +55,11 @@ class CBSLocalIE(AnvatoIE):

        sendtonews_url = SendtoNewsIE._extract_url(webpage)
        if sendtonews_url:
            info_dict = {
                '_type': 'url_transparent',
                'url': compat_urlparse.urljoin(url, sendtonews_url),
            }
        else:
            info_dict = self._extract_anvato_videos(webpage, display_id)
            return self.url_result(
                compat_urlparse.urljoin(url, sendtonews_url),
                ie=SendtoNewsIE.ie_key())

        info_dict = self._extract_anvato_videos(webpage, display_id)

        time_str = self._html_search_regex(
            r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
@@ -2,13 +2,13 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from .cbs import CBSBaseIE
from .cbs import CBSIE
from ..utils import (
    parse_duration,
)


class CBSNewsIE(CBSBaseIE):
class CBSNewsIE(CBSIE):
    IE_DESC = 'CBS News'
    _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'

@@ -35,7 +35,8 @@ class CBSNewsIE(CBSBaseIE):
            'ext': 'mp4',
            'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
            'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
            'upload_date': '19700101',
            'upload_date': '20140404',
            'timestamp': 1396650660,
            'uploader': 'CBSI-NEW',
            'thumbnail': 're:^https?://.*\.jpg$',
            'duration': 205,
@@ -63,14 +64,15 @@ class CBSNewsIE(CBSBaseIE):

        item = video_info['item'] if 'item' in video_info else video_info
        guid = item['mpxRefId']
        return self._extract_video_info('byGuid=%s' % guid, guid)
        return self._extract_video_info(guid)


class CBSNewsLiveVideoIE(InfoExtractor):
    IE_DESC = 'CBS News Live Videos'
    _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'

    _TESTS = [{
    # Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
    _TEST = {
        'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
        'info_dict': {
            'id': 'clinton-sanders-prepare-to-face-off-in-nh',
@@ -78,15 +80,8 @@ class CBSNewsLiveVideoIE(InfoExtractor):
            'title': 'Clinton, Sanders Prepare To Face Off In NH',
            'duration': 334,
        },
        'skip': 'Video gone, redirected to http://www.cbsnews.com/live/',
    }, {
        'url': 'http://www.cbsnews.com/live/video/video-shows-intense-paragliding-accident/',
        'info_dict': {
            'id': 'video-shows-intense-paragliding-accident',
            'ext': 'flv',
            'title': 'Video Shows Intense Paragliding Accident',
        },
    }]
        'skip': 'Video gone',
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
@@ -23,6 +23,9 @@ class CBSSportsIE(CBSBaseIE):
        }
    }]

    def _extract_video_info(self, filter_query, video_id):
        return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)

    def _real_extract(self, url):
        video_id = self._match_id(url)
        return self._extract_video_info('byId=%s' % video_id, video_id)
youtube_dl/extractor/charlierose.py (new file, 51 lines)

@@ -0,0 +1,51 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import remove_end


class CharlieRoseIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?charlierose\.com/video(?:s|/player)/(?P<id>\d+)'
    _TESTS = [{
        'url': 'https://charlierose.com/videos/27996',
        'md5': 'fda41d49e67d4ce7c2411fd2c4702e09',
        'info_dict': {
            'id': '27996',
            'ext': 'mp4',
            'title': 'Remembering Zaha Hadid',
            'thumbnail': 're:^https?://.*\.jpg\?\d+',
            'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.',
            'subtitles': {
                'en': [{
                    'ext': 'vtt',
                }],
            },
        },
    }, {
        'url': 'https://charlierose.com/videos/27996',
        'only_matching': True,
    }]

    _PLAYER_BASE = 'https://charlierose.com/video/player/%s'

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(self._PLAYER_BASE % video_id, video_id)

        title = remove_end(self._og_search_title(webpage), ' - Charlie Rose')

        info_dict = self._parse_html5_media_entries(
            self._PLAYER_BASE % video_id, webpage, video_id,
            m3u8_entry_protocol='m3u8_native')[0]

        self._sort_formats(info_dict['formats'])
        self._remove_duplicate_formats(info_dict['formats'])

        info_dict.update({
            'id': video_id,
            'title': title,
            'thumbnail': self._og_search_thumbnail(webpage),
            'description': self._og_search_description(webpage),
        })

        return info_dict
@@ -11,7 +11,7 @@ from ..utils import (


class CNNIE(InfoExtractor):
    _VALID_URL = r'''(?x)https?://(?:(?:edition|www)\.)?cnn\.com/video/(?:data/.+?|\?)/
    _VALID_URL = r'''(?x)https?://(?:(?P<sub_domain>edition|www|money)\.)?cnn\.com/(?:video/(?:data/.+?|\?)/)?videos?/
        (?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z\-]+)|(?=&)))'''

    _TESTS = [{
@@ -45,19 +45,46 @@ class CNNIE(InfoExtractor):
            'description': 'md5:e7223a503315c9f150acac52e76de086',
            'upload_date': '20141222',
        }
    }, {
        'url': 'http://money.cnn.com/video/news/2016/08/19/netflix-stunning-stats.cnnmoney/index.html',
        'md5': '52a515dc1b0f001cd82e4ceda32be9d1',
        'info_dict': {
            'id': '/video/news/2016/08/19/netflix-stunning-stats.cnnmoney',
            'ext': 'mp4',
            'title': '5 stunning stats about Netflix',
            'description': 'Did you know that Netflix has more than 80 million members? Here are five facts about the online video distributor that you probably didn\'t know.',
            'upload_date': '20160819',
        }
    }, {
        'url': 'http://cnn.com/video/?/video/politics/2015/03/27/pkg-arizona-senator-church-attendance-mandatory.ktvk',
        'only_matching': True,
    }, {
        'url': 'http://cnn.com/video/?/video/us/2015/04/06/dnt-baker-refuses-anti-gay-order.wkmg',
        'only_matching': True,
    }, {
        'url': 'http://edition.cnn.com/videos/arts/2016/04/21/olympic-games-cultural-a-z-brazil.cnn',
        'only_matching': True,
    }]

    _CONFIG = {
        # http://edition.cnn.com/.element/apps/cvp/3.0/cfg/spider/cnn/expansion/config.xml
        'edition': {
            'data_src': 'http://edition.cnn.com/video/data/3.0/video/%s/index.xml',
            'media_src': 'http://pmd.cdn.turner.com/cnn/big',
        },
        # http://money.cnn.com/.element/apps/cvp2/cfg/config.xml
        'money': {
            'data_src': 'http://money.cnn.com/video/data/4.0/video/%s.xml',
            'media_src': 'http://ht3.cdn.turner.com/money/big',
        },
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        path = mobj.group('path')
        page_title = mobj.group('title')
        info_url = 'http://edition.cnn.com/video/data/3.0/%s/index.xml' % path
        sub_domain, path, page_title = re.match(self._VALID_URL, url).groups()
        if sub_domain not in ('money', 'edition'):
            sub_domain = 'edition'
        config = self._CONFIG[sub_domain]
        info_url = config['data_src'] % path
        info = self._download_xml(info_url, page_title)

        formats = []

@@ -66,7 +93,7 @@ class CNNIE(InfoExtractor):
            (?:_(?P<bitrate>[0-9]+)k)?
        ''')
        for f in info.findall('files/file'):
            video_url = 'http://ht.cdn.turner.com/cnn/big%s' % (f.text.strip())
            video_url = config['media_src'] + f.text.strip()
            fdct = {
                'format_id': f.attrib['bitrate'],
                'url': video_url,

@@ -146,7 +173,7 @@ class CNNBlogsIE(InfoExtractor):


class CNNArticleIE(InfoExtractor):
    _VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!video/)'
    _VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!videos?/)'
    _TEST = {
        'url': 'http://www.cnn.com/2014/12/21/politics/obama-north-koreas-hack-not-war-but-cyber-vandalism/',
        'md5': '689034c2a3d9c6dc4aa72d65a81efd01',

@@ -662,6 +662,24 @@ class InfoExtractor(object):
else:
return res

def _get_netrc_login_info(self, netrc_machine=None):
username = None
password = None
netrc_machine = netrc_machine or self._NETRC_MACHINE

if self._downloader.params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(netrc_machine)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % netrc_machine)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))

return (username, password)

def _get_login_info(self):
"""
Get the login info as (username, password)
@@ -679,16 +697,8 @@ class InfoExtractor(object):
if downloader_params.get('username') is not None:
username = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
else:
username, password = self._get_netrc_login_info()

return (username, password)

@@ -1685,7 +1695,7 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats

def _parse_html5_media_entries(self, base_url, webpage):
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)

@@ -1700,6 +1710,21 @@ class InfoExtractor(object):
return f
return {}

def _media_formats(src, cur_media_type):
full_url = absolute_url(src)
if determine_ext(full_url) == 'm3u8':
is_plain_url = False
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
else:
is_plain_url = True
formats = [{
'url': full_url,
'vcodec': 'none' if cur_media_type == 'audio' else None,
}]
return is_plain_url, formats

entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage):
media_info = {
@@ -1709,10 +1734,8 @@ class InfoExtractor(object):
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
if src:
media_info['formats'].append({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
_, formats = _media_formats(src)
media_info['formats'].extend(formats)
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content):
@@ -1720,12 +1743,13 @@ class InfoExtractor(object):
src = source_attributes.get('src')
if not src:
continue
f = parse_content_type(source_attributes.get('type'))
f.update({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['formats'].append(f)
is_plain_url, formats = _media_formats(src, media_type)
if is_plain_url:
f = parse_content_type(source_attributes.get('type'))
f.update(formats[0])
media_info['formats'].append(f)
else:
media_info['formats'].extend(formats)
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
@@ -1741,6 +1765,18 @@ class InfoExtractor(object):
entries.append(media_info)
return entries

def _extract_akamai_formats(self, manifest_url, video_id):
formats = []
f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
formats.extend(self._extract_f4m_formats(
update_url_query(f4m_url, {'hdcore': '3.7.0'}),
video_id, f4m_id='hds', fatal=False))
m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
return formats

def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()

@@ -1,9 +1,13 @@
from __future__ import unicode_literals

import re
import time

from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
int_or_none,
HEADRequest,
)

class CultureUnpluggedIE(InfoExtractor):
@@ -32,6 +36,9 @@ class CultureUnpluggedIE(InfoExtractor):
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id

# request setClientTimezone.php to get PHPSESSID cookie which is need to get valid json data in the next request
self._request_webpage(HEADRequest(
'http://www.cultureunplugged.com/setClientTimezone.php?timeOffset=%d' % -(time.timezone / 3600)), display_id)
movie_data = self._download_json(
'http://www.cultureunplugged.com/movie-data/cu-%s.json' % video_id, display_id)

@@ -38,6 +38,12 @@ class DBTVIE(InfoExtractor):
'only_matching': True,
}]

@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1',
webpage)]

def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()

@@ -11,7 +11,17 @@ from ..utils import (

class DiscoveryGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocitychannel
)go\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'''
_TEST = {
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
'info_dict': {

@@ -10,18 +10,18 @@ from ..utils import (
class DotsubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)'
_TEST = {
'url': 'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27',
'md5': '0914d4d69605090f623b7ac329fea66e',
'url': 'https://dotsub.com/view/9c63db2a-fa95-4838-8e6e-13deafe47f09',
'md5': '21c7ff600f545358134fea762a6d42b6',
'info_dict': {
'id': 'aed3b8b2-1889-4df5-ae63-ad85f5572f27',
'id': '9c63db2a-fa95-4838-8e6e-13deafe47f09',
'ext': 'flv',
'title': 'Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary',
'description': 'md5:699a0f7f50aeec6042cb3b1db2d0d074',
'thumbnail': 're:^https?://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
'duration': 3169,
'uploader': '4v4l0n42',
'timestamp': 1292248482.625,
'upload_date': '20101213',
'title': 'MOTIVATION - "It\'s Possible" Best Inspirational Video Ever',
'description': 'md5:41af1e273edbbdfe4e216a78b9d34ac6',
'thumbnail': 're:^https?://dotsub.com/media/9c63db2a-fa95-4838-8e6e-13deafe47f09/p',
'duration': 198,
'uploader': 'liuxt',
'timestamp': 1385778501.104,
'upload_date': '20131130',
'view_count': int,
}
}

@@ -52,11 +52,24 @@ class EaglePlatformIE(InfoExtractor):

@staticmethod
def _extract_url(webpage):
# Regular iframe embedding
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1',
webpage)
if mobj is not None:
return mobj.group('url')
# Basic usage embedding (see http://dultonmedia.github.io/eplayer/)
mobj = re.search(
r'''(?xs)
<script[^>]+
src=(?P<q1>["\'])(?:https?:)?//(?P<host>.+?\.media\.eagleplatform\.com)/player/player\.js(?P=q1)
.+?
<div[^>]+
class=(?P<q2>["\'])eagleplayer(?P=q2)[^>]+
data-id=["\'](?P<id>\d+)
''', webpage)
if mobj is not None:
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()

@staticmethod
def _handle_error(response):

@@ -1,7 +1,10 @@
# flake8: noqa
from __future__ import unicode_literals

from .abc import ABCIE
from .abc import (
ABCIE,
ABCIViewIE,
)
from .abc7news import Abc7NewsIE
from .abcnews import (
AbcNewsIE,
@@ -29,6 +32,7 @@ from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
from .amcnetworks import AMCNetworksIE
from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE
from .anysex import AnySexIE
@@ -67,6 +71,12 @@ from .atttechchannel import ATTTechChannelIE
from .audimedia import AudiMediaIE
from .audioboom import AudioBoomIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .awaan import (
AWAANIE,
AWAANVideoIE,
AWAANLiveIE,
AWAANSeasonIE,
)
from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
@@ -133,6 +143,7 @@ from .ccc import CCCIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .charlierose import CharlieRoseIE
from .chaturbate import ChaturbateIE
from .chilloutzone import ChilloutzoneIE
from .chirbit import (
@@ -195,12 +206,6 @@ from .daum import (
DaumUserIE,
)
from .dbtv import DBTVIE
from .dcn import (
DCNIE,
DCNVideoIE,
DCNLiveIE,
DCNSeasonIE,
)
from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE
@@ -287,6 +292,7 @@ from .freevideo import FreeVideoIE
from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gameinformer import GameInformerIE
from .gameone import (
GameOneIE,
@@ -322,7 +328,10 @@ from .heise import HeiseIE
from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .hgtv import HGTVIE
from .hgtv import (
HGTVIE,
HGTVComShowIE,
)
from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hornbunny import HornBunnyIE
@@ -637,6 +646,7 @@ from .podomatic import PodomaticIE
from .pokemon import PokemonIE
from .polskieradio import PolskieRadioIE
from .porn91 import Porn91IE
from .porncom import PornComIE
from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
@@ -896,7 +906,10 @@ from .tvp import (
TVPIE,
TVPSeriesIE,
)
from .tvplay import TVPlayIE
from .tvplay import (
TVPlayIE,
ViafreeIE,
)
from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE
@@ -926,6 +939,10 @@ from .udn import UDNEmbedIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .uol import UOLIE
from .uplynk import (
UplynkIE,
UplynkPreplayIE,
)
from .urort import UrortIE
from .urplay import URPlayIE
from .usatoday import USATodayIE
@@ -954,6 +971,7 @@ from .vice import (
ViceIE,
ViceShowIE,
)
from .viceland import VicelandIE
from .vidbit import VidbitIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
@@ -1100,8 +1118,4 @@ from .youtube import (
)
from .zapiks import ZapiksIE
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import (
ZingMp3SongIE,
ZingMp3AlbumIE,
)
from .zippcast import ZippCastIE
from .zingmp3 import ZingMp3IE
@@ -1,20 +1,14 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
sanitized_Request,
|
||||
str_to_int,
|
||||
)
|
||||
from ..utils import str_to_int
|
||||
from .keezmovies import KeezMoviesIE
|
||||
|
||||
|
||||
class ExtremeTubeIE(InfoExtractor):
|
||||
class ExtremeTubeIE(KeezMoviesIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
|
||||
'md5': '344d0c6d50e2f16b06e49ca011d8ac69',
|
||||
'md5': '1fb9228f5e3332ec8c057d6ac36f33e0',
|
||||
'info_dict': {
|
||||
'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431',
|
||||
'ext': 'mp4',
|
||||
@@ -35,58 +29,22 @@ class ExtremeTubeIE(InfoExtractor):
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
webpage, info = self._extract_info(url)
|
||||
|
||||
req = sanitized_Request(url)
|
||||
req.add_header('Cookie', 'age_verified=1')
|
||||
webpage = self._download_webpage(req, video_id)
|
||||
if not info['title']:
|
||||
info['title'] = self._search_regex(
|
||||
r'<h1[^>]+title="([^"]+)"[^>]*>', webpage, 'title')
|
||||
|
||||
video_title = self._html_search_regex(
|
||||
r'<h1 [^>]*?title="([^"]+)"[^>]*>', webpage, 'title')
|
||||
uploader = self._html_search_regex(
|
||||
r'Uploaded by:\s*</strong>\s*(.+?)\s*</div>',
|
||||
webpage, 'uploader', fatal=False)
|
||||
view_count = str_to_int(self._html_search_regex(
|
||||
view_count = str_to_int(self._search_regex(
|
||||
r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>',
|
||||
webpage, 'view count', fatal=False))
|
||||
|
||||
flash_vars = self._parse_json(
|
||||
self._search_regex(
|
||||
r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flash vars'),
|
||||
video_id)
|
||||
|
||||
formats = []
|
||||
for quality_key, video_url in flash_vars.items():
|
||||
height = int_or_none(self._search_regex(
|
||||
r'quality_(\d+)[pP]$', quality_key, 'height', default=None))
|
||||
if not height:
|
||||
continue
|
||||
f = {
|
||||
'url': video_url,
|
||||
}
|
||||
mobj = re.search(
|
||||
r'/(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
|
||||
if mobj:
|
||||
height = int(mobj.group('height'))
|
||||
bitrate = int(mobj.group('bitrate'))
|
||||
f.update({
|
||||
'format_id': '%dp-%dk' % (height, bitrate),
|
||||
'height': height,
|
||||
'tbr': bitrate,
|
||||
})
|
||||
else:
|
||||
f.update({
|
||||
'format_id': '%dp' % height,
|
||||
'height': height,
|
||||
})
|
||||
formats.append(f)
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': video_title,
|
||||
'formats': formats,
|
||||
info.update({
|
||||
'uploader': uploader,
|
||||
'view_count': view_count,
|
||||
'age_limit': 18,
|
||||
}
|
||||
})
|
||||
|
||||
return info
|
||||
|
@@ -2,44 +2,40 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_xpath
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
qualities,
|
||||
unified_strdate,
|
||||
xpath_attr,
|
||||
xpath_element,
|
||||
xpath_text,
|
||||
xpath_with_ns,
|
||||
)
|
||||
|
||||
|
||||
class FirstTVIE(InfoExtractor):
|
||||
IE_NAME = '1tv'
|
||||
IE_DESC = 'Первый канал'
|
||||
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+p?(?P<id>\d+)'
|
||||
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>[^/?#]+)'
|
||||
|
||||
_TESTS = [{
|
||||
# single format via video_materials.json API
|
||||
'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
|
||||
'md5': '82a2777648acae812d58b3f5bd42882b',
|
||||
# single format
|
||||
'url': 'http://www.1tv.ru/shows/naedine-so-vsemi/vypuski/gost-lyudmila-senchina-naedine-so-vsemi-vypusk-ot-12-02-2015',
|
||||
'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
|
||||
'info_dict': {
|
||||
'id': '35930',
|
||||
'id': '40049',
|
||||
'ext': 'mp4',
|
||||
'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
|
||||
'description': 'md5:357933adeede13b202c7c21f91b871b2',
|
||||
'description': 'md5:36a39c1d19618fec57d12efe212a8370',
|
||||
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
|
||||
'upload_date': '20150212',
|
||||
'duration': 2694,
|
||||
},
|
||||
}, {
|
||||
# multiple formats via video_materials.json API
|
||||
'url': 'http://www.1tv.ru/video_archive/projects/dobroeutro/p113641',
|
||||
# multiple formats
|
||||
'url': 'http://www.1tv.ru/shows/dobroe-utro/pro-zdorove/vesennyaya-allergiya-dobroe-utro-fragment-vypuska-ot-07042016',
|
||||
'info_dict': {
|
||||
'id': '113641',
|
||||
'id': '364746',
|
||||
'ext': 'mp4',
|
||||
'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
|
||||
'description': 'md5:8dcebb3dded0ff20fade39087fd1fee2',
|
||||
'description': 'md5:a242eea0031fd180a4497d52640a9572',
|
||||
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
|
||||
'upload_date': '20160407',
|
||||
'duration': 179,
|
||||
@@ -48,84 +44,47 @@ class FirstTVIE(InfoExtractor):
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
# single format only available via ONE_ONLINE_VIDEOS.archive_single_xml API
|
||||
'url': 'http://www.1tv.ru/video_archive/series/f7552/p47038',
|
||||
'md5': '519d306c5b5669761fd8906c39dbee23',
|
||||
'info_dict': {
|
||||
'id': '47038',
|
||||
'ext': 'mp4',
|
||||
'title': '"Побег". Второй сезон. 3 серия',
|
||||
'description': 'md5:3abf8f6b9bce88201c33e9a3d794a00b',
|
||||
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
|
||||
'upload_date': '20120516',
|
||||
'duration': 3080,
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.1tv.ru/videoarchive/9967',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
display_id = self._match_id(url)
|
||||
|
||||
# Videos with multiple formats only available via this API
|
||||
video = self._download_json(
|
||||
'http://www.1tv.ru/video_materials.json?legacy_id=%s' % video_id,
|
||||
video_id, fatal=False)
|
||||
|
||||
description, thumbnail, upload_date, duration = [None] * 4
|
||||
|
||||
if video:
|
||||
item = video[0]
|
||||
title = item['title']
|
||||
quality = qualities(('ld', 'sd', 'hd', ))
|
||||
formats = [{
|
||||
'url': f['src'],
|
||||
'format_id': f.get('name'),
|
||||
'quality': quality(f.get('name')),
|
||||
} for f in item['mbr'] if f.get('src')]
|
||||
thumbnail = item.get('poster')
|
||||
else:
|
||||
# Some videos are not available via video_materials.json
|
||||
video = self._download_xml(
|
||||
'http://www.1tv.ru/owa/win/ONE_ONLINE_VIDEOS.archive_single_xml?pid=%s' % video_id,
|
||||
video_id)
|
||||
|
||||
NS_MAP = {
|
||||
'media': 'http://search.yahoo.com/mrss/',
|
||||
}
|
||||
|
||||
item = xpath_element(video, './channel/item', fatal=True)
|
||||
title = xpath_text(item, './title', fatal=True)
|
||||
formats = [{
|
||||
'url': content.attrib['url'],
|
||||
} for content in item.findall(
|
||||
compat_xpath(xpath_with_ns('./media:content', NS_MAP))) if content.attrib.get('url')]
|
||||
thumbnail = xpath_attr(
|
||||
item, xpath_with_ns('./media:thumbnail', NS_MAP), 'url')
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
playlist_url = compat_urlparse.urljoin(url, self._search_regex(
|
||||
r'data-playlist-url="([^"]+)', webpage, 'playlist url'))
|
||||
|
||||
item = self._download_json(playlist_url, display_id)[0]
|
||||
video_id = item['id']
|
||||
quality = qualities(('ld', 'sd', 'hd', ))
|
||||
formats = []
|
||||
for f in item.get('mbr', []):
|
||||
src = f.get('src')
|
||||
if not src:
|
||||
continue
|
||||
fname = f.get('name')
|
||||
formats.append({
|
||||
'url': src,
|
||||
'format_id': fname,
|
||||
'quality': quality(fname),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
webpage = self._download_webpage(url, video_id, 'Downloading page', fatal=False)
|
||||
if webpage:
|
||||
title = self._html_search_regex(
|
||||
(r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
|
||||
r"'title'\s*:\s*'([^']+)'"),
|
||||
webpage, 'title', default=None) or title
|
||||
description = self._html_search_regex(
|
||||
r'<div class="descr">\s*<div> </div>\s*<p>([^<]*)</p></div>',
|
||||
webpage, 'description', default=None) or self._html_search_meta(
|
||||
'description', webpage, 'description')
|
||||
thumbnail = thumbnail or self._og_search_thumbnail(webpage)
|
||||
duration = int_or_none(self._html_search_meta(
|
||||
'video:duration', webpage, 'video duration', fatal=False))
|
||||
upload_date = unified_strdate(self._html_search_meta(
|
||||
'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
|
||||
title = self._html_search_regex(
|
||||
(r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
|
||||
r"'title'\s*:\s*'([^']+)'"),
|
||||
webpage, 'title', default=None) or item['title']
|
||||
description = self._html_search_regex(
|
||||
r'<div class="descr">\s*<div> </div>\s*<p>([^<]*)</p></div>',
|
||||
webpage, 'description', default=None) or self._html_search_meta(
|
||||
'description', webpage, 'description')
|
||||
duration = int_or_none(self._html_search_meta(
|
||||
'video:duration', webpage, 'video duration', fatal=False))
|
||||
upload_date = unified_strdate(self._html_search_meta(
|
||||
'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'thumbnail': thumbnail,
|
||||
'thumbnail': item.get('poster') or self._og_search_thumbnail(webpage),
|
||||
'title': title,
|
||||
'description': description,
|
||||
'upload_date': upload_date,
|
||||
|
70
youtube_dl/extractor/fxnetworks.py
Normal file
@@ -0,0 +1,70 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .adobepass import AdobePassIE
|
||||
from ..utils import (
|
||||
update_url_query,
|
||||
extract_attributes,
|
||||
parse_age_limit,
|
||||
smuggle_url,
|
||||
)
|
||||
|
||||
|
||||
class FXNetworksIE(AdobePassIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?(?:fxnetworks|simpsonsworld)\.com/video/(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.fxnetworks.com/video/719841347694',
|
||||
'md5': '1447d4722e42ebca19e5232ab93abb22',
|
||||
'info_dict': {
|
||||
'id': '719841347694',
|
||||
'ext': 'mp4',
|
||||
'title': 'Vanpage',
|
||||
'description': 'F*ck settling down. You\'re the Worst returns for an all new season August 31st on FXX.',
|
||||
'age_limit': 14,
|
||||
'uploader': 'NEWA-FNG-FX',
|
||||
'upload_date': '20160706',
|
||||
'timestamp': 1467844741,
|
||||
},
|
||||
'add_ie': ['ThePlatform'],
|
||||
}, {
|
||||
'url': 'http://www.simpsonsworld.com/video/716094019682',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
if 'The content you are trying to access is not available in your region.' in webpage:
|
||||
self.raise_geo_restricted()
|
||||
video_data = extract_attributes(self._search_regex(
|
||||
r'(<a.+?rel="http://link\.theplatform\.com/s/.+?</a>)', webpage, 'video data'))
|
||||
player_type = self._search_regex(r'playerType\s*=\s*[\'"]([^\'"]+)', webpage, 'player type', default=None)
|
||||
release_url = video_data['rel']
|
||||
title = video_data['data-title']
|
||||
rating = video_data.get('data-rating')
|
||||
query = {
|
||||
'mbr': 'true',
|
||||
}
|
||||
if player_type == 'movies':
|
||||
query.update({
|
||||
'manifest': 'm3u',
|
||||
})
|
||||
else:
|
||||
query.update({
|
||||
'switch': 'http',
|
||||
})
|
||||
if video_data.get('data-req-auth') == '1':
|
||||
resource = self._get_mvpd_resource(
|
||||
video_data['data-channel'], title,
|
||||
video_data.get('data-guid'), rating)
|
||||
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fx', resource)
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
|
||||
'thumbnail': video_data.get('data-large-thumb'),
|
||||
'age_limit': parse_age_limit(rating),
|
||||
'ie_key': 'ThePlatform',
|
||||
}
|
@@ -72,6 +72,8 @@ from .kaltura import KalturaIE
|
||||
from .eagleplatform import EaglePlatformIE
|
||||
from .facebook import FacebookIE
|
||||
from .soundcloud import SoundcloudIE
|
||||
from .vbox7 import Vbox7IE
|
||||
from .dbtv import DBTVIE
|
||||
|
||||
|
||||
class GenericIE(InfoExtractor):
|
||||
@@ -1373,6 +1375,27 @@ class GenericIE(InfoExtractor):
|
||||
},
|
||||
'add_ie': [ArkenaIE.ie_key()],
|
||||
},
|
||||
{
|
||||
'url': 'http://nova.bg/news/view/2016/08/16/156543/%D0%BD%D0%B0-%D0%BA%D0%BE%D1%81%D1%8A%D0%BC-%D0%BE%D1%82-%D0%B2%D0%B7%D1%80%D0%B8%D0%B2-%D0%BE%D1%82%D1%86%D0%B5%D0%BF%D0%B8%D1%85%D0%B0-%D1%86%D1%8F%D0%BB-%D0%BA%D0%B2%D0%B0%D1%80%D1%82%D0%B0%D0%BB-%D0%B7%D0%B0%D1%80%D0%B0%D0%B4%D0%B8-%D0%B8%D0%B7%D1%82%D0%B8%D1%87%D0%B0%D0%BD%D0%B5-%D0%BD%D0%B0-%D0%B3%D0%B0%D0%B7-%D0%B2-%D0%BF%D0%BB%D0%BE%D0%B2%D0%B4%D0%B8%D0%B2/',
|
||||
'info_dict': {
|
||||
'id': '1c7141f46c',
|
||||
'ext': 'mp4',
|
||||
'title': 'НА КОСЪМ ОТ ВЗРИВ: Изтичане на газ на бензиностанция в Пловдив',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'add_ie': [Vbox7IE.ie_key()],
|
||||
},
|
||||
{
|
||||
# DBTV embeds
|
||||
'url': 'http://www.dagbladet.no/2016/02/23/nyheter/nordlys/ski/troms/ver/43254897/',
|
||||
'info_dict': {
|
||||
'id': '43254897',
|
||||
'title': 'Etter ett års planlegging, klaffet endelig alt: - Jeg måtte ta en liten dans',
|
||||
},
|
||||
'playlist_mincount': 3,
|
||||
},
|
||||
# {
|
||||
# # TODO: find another test
|
||||
# # http://schema.org/VideoObject
|
||||
@@ -2239,6 +2262,16 @@ class GenericIE(InfoExtractor):
|
||||
'uploader': video_uploader,
|
||||
}
|
||||
|
||||
# Look for VBOX7 embeds
|
||||
vbox7_url = Vbox7IE._extract_url(webpage)
|
||||
if vbox7_url:
|
||||
return self.url_result(vbox7_url, Vbox7IE.ie_key())
|
||||
|
||||
# Look for DBTV embeds
|
||||
dbtv_urls = DBTVIE._extract_urls(webpage)
|
||||
if dbtv_urls:
|
||||
return _playlist_from_matches(dbtv_urls, ie=DBTVIE.ie_key())
|
||||
|
||||
# Looking for http://schema.org/VideoObject
|
||||
json_ld = self._search_json_ld(
|
||||
webpage, video_id, default={}, expected_type='VideoObject')
|
||||
|
@@ -396,12 +396,12 @@ class GloboIE(InfoExtractor):
|
||||
|
||||
|
||||
class GloboArticleIE(InfoExtractor):
|
||||
_VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)\.html'
|
||||
_VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?'
|
||||
|
||||
_VIDEOID_REGEXES = [
|
||||
r'\bdata-video-id=["\'](\d{7,})',
|
||||
r'\bdata-player-videosids=["\'](\d{7,})',
|
||||
r'\bvideosIDs\s*:\s*["\'](\d{7,})',
|
||||
r'\bvideosIDs\s*:\s*["\']?(\d{7,})',
|
||||
r'\bdata-id=["\'](\d{7,})',
|
||||
r'<div[^>]+\bid=["\'](\d{7,})',
|
||||
]
|
||||
@@ -423,6 +423,9 @@ class GloboArticleIE(InfoExtractor):
|
||||
}, {
|
||||
'url': 'http://gshow.globo.com/programas/tv-xuxa/O-Programa/noticia/2014/01/xuxa-e-junno-namoram-muuuito-em-luau-de-zeze-di-camargo-e-luciano.html',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://oglobo.globo.com/rio/a-amizade-entre-um-entregador-de-farmacia-um-piano-19946271',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
|
@@ -46,3 +46,34 @@ class HGTVIE(InfoExtractor):
|
||||
'episode_number': int_or_none(embed_vars.get('episode')),
|
||||
'ie_key': 'ThePlatform',
|
||||
}
|
||||
|
||||
|
||||
class HGTVComShowIE(InfoExtractor):
|
||||
IE_NAME = 'hgtv.com:show'
|
||||
_VALID_URL = r'https?://(?:www\.)?hgtv\.com/shows/[^/]+/(?P<id>[^/?#&]+)'
|
||||
_TEST = {
|
||||
'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-videos',
|
||||
'info_dict': {
|
||||
'id': 'flip-or-flop-full-episodes-videos',
|
||||
'title': 'Flip or Flop Full Episodes',
|
||||
},
|
||||
'playlist_mincount': 15,
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
config = self._parse_json(
|
||||
self._search_regex(
|
||||
r'(?s)data-module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
|
||||
webpage, 'video config'),
|
||||
display_id)['channels'][0]
|
||||
|
||||
entries = [
|
||||
self.url_result(video['releaseUrl'])
|
||||
for video in config['videos'] if video.get('releaseUrl')]
|
||||
|
||||
return self.playlist_result(
|
||||
entries, display_id, config.get('title'), config.get('description'))
|
||||
|
@@ -6,6 +6,7 @@ from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
mimetype2ext,
|
||||
qualities,
|
||||
remove_end,
|
||||
)
|
||||
|
||||
|
||||
@@ -19,7 +20,7 @@ class ImdbIE(InfoExtractor):
|
||||
'info_dict': {
|
||||
'id': '2524815897',
|
||||
'ext': 'mp4',
|
||||
'title': 'Ice Age: Continental Drift Trailer (No. 2) - IMDb',
|
||||
'title': 'Ice Age: Continental Drift Trailer (No. 2)',
|
||||
'description': 'md5:9061c2219254e5d14e03c25c98e96a81',
|
||||
}
|
||||
}, {
|
||||
@@ -83,10 +84,10 @@ class ImdbIE(InfoExtractor):
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': self._og_search_title(webpage),
|
||||
'title': remove_end(self._og_search_title(webpage), ' - IMDb'),
|
||||
'formats': formats,
|
||||
'description': descr,
|
||||
'thumbnail': format_info['slate'],
|
||||
'thumbnail': format_info.get('slate'),
|
||||
}
|
||||
|
||||
|
||||
|
@@ -1,4 +1,4 @@
|
||||
# encoding: utf-8
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
@@ -8,7 +8,7 @@ from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
sanitized_Request,
|
||||
qualities,
|
||||
)
|
||||
|
||||
|
||||
@@ -49,11 +49,27 @@ class IviIE(InfoExtractor):
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
},
|
||||
'skip': 'Only works from Russia',
|
||||
},
|
||||
{
|
||||
# with MP4-HD720 format
|
||||
'url': 'http://www.ivi.ru/watch/146500',
|
||||
'md5': 'd63d35cdbfa1ea61a5eafec7cc523e1e',
|
||||
'info_dict': {
|
||||
'id': '146500',
|
||||
'ext': 'mp4',
|
||||
'title': 'Кукла',
|
||||
'description': 'md5:ffca9372399976a2d260a407cc74cce6',
|
||||
'duration': 5599,
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
},
|
||||
'skip': 'Only works from Russia',
|
||||
}
|
||||
]
|
||||
|
||||
# Sorted by quality
|
||||
_KNOWN_FORMATS = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
|
||||
_KNOWN_FORMATS = (
|
||||
'MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi',
|
||||
'MP4-SHQ', 'MP4-HD720', 'MP4-HD1080')
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
@@ -69,10 +85,9 @@ class IviIE(InfoExtractor):
|
||||
]
|
||||
}
|
||||
|
||||
request = sanitized_Request(
|
||||
'http://api.digitalaccess.ru/api/json/', json.dumps(data))
|
||||
video_json = self._download_json(
|
||||
request, video_id, 'Downloading video JSON')
|
||||
'http://api.digitalaccess.ru/api/json/', video_id,
|
||||
'Downloading video JSON', data=json.dumps(data))
|
||||
|
||||
if 'error' in video_json:
|
||||
error = video_json['error']
|
||||
@@ -84,11 +99,13 @@ class IviIE(InfoExtractor):
|
||||
|
||||
result = video_json['result']
|
||||
|
||||
quality = qualities(self._KNOWN_FORMATS)
|
||||
|
||||
formats = [{
|
||||
'url': x['url'],
|
||||
'format_id': x['content_format'],
|
||||
'preference': self._KNOWN_FORMATS.index(x['content_format']),
|
||||
} for x in result['files'] if x['content_format'] in self._KNOWN_FORMATS]
|
||||
'format_id': x.get('content_format'),
|
||||
'quality': quality(x.get('content_format')),
|
||||
} for x in result['files'] if x.get('url')]
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
@@ -115,7 +132,7 @@ class IviIE(InfoExtractor):
|
||||
webpage, 'season number', default=None))
|
||||
|
||||
episode_number = int_or_none(self._search_regex(
|
||||
r'<meta[^>]+itemprop="episode"[^>]*>\s*<meta[^>]+itemprop="episodeNumber"[^>]+content="(\d+)',
|
||||
r'[^>]+itemprop="episode"[^>]*>\s*<meta[^>]+itemprop="episodeNumber"[^>]+content="(\d+)',
|
||||
webpage, 'episode number', default=None))
|
||||
|
||||
description = self._og_search_description(webpage, default=None) or self._html_search_meta(
|
||||
|
@@ -30,7 +30,7 @@ class JWPlatformBaseIE(InfoExtractor):
|
||||
return self._parse_jwplayer_data(
|
||||
jwplayer_data, video_id, *args, **kwargs)
|
||||
|
||||
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
|
||||
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
|
||||
# JWPlayer backward compatibility: flattened playlists
|
||||
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
|
||||
if 'playlist' not in jwplayer_data:
|
||||
@@ -43,6 +43,8 @@ class JWPlatformBaseIE(InfoExtractor):
|
||||
if 'sources' not in video_data:
|
||||
video_data['sources'] = [video_data]
|
||||
|
||||
this_video_id = video_id or video_data['mediaid']
|
||||
|
||||
formats = []
|
||||
for source in video_data['sources']:
|
||||
source_url = self._proto_relative_url(source['file'])
|
||||
@@ -52,7 +54,7 @@ class JWPlatformBaseIE(InfoExtractor):
|
||||
ext = mimetype2ext(source_type) or determine_ext(source_url)
|
||||
if source_type == 'hls' or ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
|
||||
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
|
||||
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
|
||||
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
|
||||
formats.append({
|
||||
@@ -68,7 +70,7 @@ class JWPlatformBaseIE(InfoExtractor):
|
||||
'ext': ext,
|
||||
}
|
||||
if source_url.startswith('rtmp'):
|
||||
a_format['ext'] = 'flv',
|
||||
a_format['ext'] = 'flv'
|
||||
|
||||
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
|
||||
# of jwplayer.flash.swf
|
||||
@@ -95,7 +97,7 @@ class JWPlatformBaseIE(InfoExtractor):
|
||||
})
|
||||
|
||||
entries.append({
|
||||
'id': video_id,
|
||||
'id': this_video_id,
|
||||
'title': video_data['title'] if require_title else video_data.get('title'),
|
||||
'description': video_data.get('description'),
|
||||
'thumbnail': self._proto_relative_url(video_data.get('image')),
|
||||
|
@@ -36,6 +36,12 @@ class KalturaIE(InfoExtractor):
|
||||
'''
|
||||
_SERVICE_URL = 'http://cdnapi.kaltura.com'
|
||||
_SERVICE_BASE = '/api_v3/index.php'
|
||||
# See https://github.com/kaltura/server/blob/master/plugins/content/caption/base/lib/model/enums/CaptionType.php
|
||||
_CAPTION_TYPES = {
|
||||
1: 'srt',
|
||||
2: 'ttml',
|
||||
3: 'vtt',
|
||||
}
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'kaltura:269692:1_1jc2y3e4',
|
||||
@@ -67,6 +73,27 @@ class KalturaIE(InfoExtractor):
|
||||
# video with subtitles
|
||||
'url': 'kaltura:111032:1_cw786r8q',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
# video with ttml subtitles (no fileExt)
|
||||
'url': 'kaltura:1926081:0_l5ye1133',
|
||||
'info_dict': {
|
||||
'id': '0_l5ye1133',
|
||||
'ext': 'mp4',
|
||||
'title': 'What Can You Do With Python?',
|
||||
'upload_date': '20160221',
|
||||
'uploader_id': 'stork',
|
||||
'thumbnail': 're:^https?://.*/thumbnail/.*',
|
||||
'timestamp': int,
|
||||
'subtitles': {
|
||||
'en': [{
|
||||
'ext': 'ttml',
|
||||
}],
|
||||
},
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
]
|
||||
|
||||
@@ -122,18 +149,6 @@ class KalturaIE(InfoExtractor):
|
||||
|
||||
return data
|
||||
|
||||
def _get_kaltura_signature(self, video_id, partner_id, service_url=None):
|
||||
actions = [{
|
||||
'apiVersion': '3.1',
|
||||
'expiry': 86400,
|
||||
'format': 1,
|
||||
'service': 'session',
|
||||
'action': 'startWidgetSession',
|
||||
'widgetId': '_%s' % partner_id,
|
||||
}]
|
||||
return self._kaltura_api_call(
|
||||
video_id, actions, service_url, note='Downloading Kaltura signature')['ks']
|
||||
|
||||
def _get_video_info(self, video_id, partner_id, service_url=None):
|
||||
actions = [
|
||||
{
|
||||
@@ -208,6 +223,17 @@ class KalturaIE(InfoExtractor):
|
||||
reference_id)['entryResult']
|
||||
info, flavor_assets = entry_data['meta'], entry_data['contextData']['flavorAssets']
|
||||
entry_id = info['id']
|
||||
# Unfortunately, data returned in kalturaIframePackageData lacks
|
||||
# captions so we will try requesting the complete data using
|
||||
# regular approach since we now know the entry_id
|
||||
try:
|
||||
_, info, flavor_assets, captions = self._get_video_info(
|
||||
entry_id, partner_id)
|
||||
except ExtractorError:
|
||||
# Regular scenario failed but we already have everything
|
||||
# extracted apart from captions and can process at least
|
||||
# with this
|
||||
pass
|
||||
else:
|
||||
raise ExtractorError('Invalid URL', expected=True)
|
||||
ks = params.get('flashvars[ks]', [None])[0]
|
||||
@@ -265,9 +291,12 @@ class KalturaIE(InfoExtractor):
|
||||
# Continue if caption is not ready
|
||||
if f.get('status') != 2:
|
||||
continue
|
||||
if not caption.get('id'):
|
||||
continue
|
||||
caption_format = int_or_none(caption.get('format'))
|
||||
subtitles.setdefault(caption.get('languageCode') or caption.get('language'), []).append({
|
||||
'url': '%s/api_v3/service/caption_captionasset/action/serve/captionAssetId/%s' % (self._SERVICE_URL, caption['id']),
|
||||
'ext': caption.get('fileExt'),
|
||||
'ext': caption.get('fileExt') or self._CAPTION_TYPES.get(caption_format) or 'ttml',
|
||||
})
|
||||
|
||||
return {
|
||||
|
@@ -3,64 +3,126 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..aes import aes_decrypt_text
|
||||
from ..compat import (
|
||||
compat_str,
|
||||
compat_urllib_parse_unquote,
|
||||
)
|
||||
from ..utils import (
|
||||
sanitized_Request,
|
||||
url_basename,
|
||||
determine_ext,
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
str_to_int,
|
||||
strip_or_none,
|
||||
)
|
||||
|
||||
|
||||
class KeezMoviesIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/.+?(?P<id>[0-9]+)(?:[/?&]|$)'
|
||||
_TEST = {
|
||||
_VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/(?:(?P<display_id>[^/]+)-)?(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
|
||||
'md5': '1c1e75d22ffa53320f45eeb07bc4cdc0',
|
||||
'info_dict': {
|
||||
'id': '1214711',
|
||||
'display_id': 'petite-asian-lady-mai-playing-in-bathtub',
|
||||
'ext': 'mp4',
|
||||
'title': 'Petite Asian Lady Mai Playing In Bathtub',
|
||||
'age_limit': 18,
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
'view_count': int,
|
||||
'age_limit': 18,
|
||||
}
|
||||
}
|
||||
}, {
|
||||
'url': 'http://www.keezmovies.com/video/1214711',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
def _extract_info(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
display_id = (mobj.group('display_id')
|
||||
if 'display_id' in mobj.groupdict()
|
||||
else None) or mobj.group('id')
|
||||
|
||||
req = sanitized_Request(url)
|
||||
req.add_header('Cookie', 'age_verified=1')
|
||||
webpage = self._download_webpage(req, video_id)
|
||||
|
||||
# embedded video
|
||||
mobj = re.search(r'href="([^"]+)"></iframe>', webpage)
|
||||
if mobj:
|
||||
embedded_url = mobj.group(1)
|
||||
return self.url_result(embedded_url)
|
||||
|
||||
video_title = self._html_search_regex(
|
||||
r'<h1 [^>]*>([^<]+)', webpage, 'title')
|
||||
flashvars = self._parse_json(self._search_regex(
|
||||
r'var\s+flashvars\s*=\s*([^;]+);', webpage, 'flashvars'), video_id)
|
||||
webpage = self._download_webpage(
|
||||
url, display_id, headers={'Cookie': 'age_verified=1'})
|
||||
|
||||
formats = []
|
||||
for height in (180, 240, 480):
|
||||
if flashvars.get('quality_%dp' % height):
|
||||
video_url = flashvars['quality_%dp' % height]
|
||||
a_format = {
|
||||
'url': video_url,
|
||||
'height': height,
|
||||
'format_id': '%dp' % height,
|
||||
}
|
||||
filename_parts = url_basename(video_url).split('_')
|
||||
if len(filename_parts) >= 2 and re.match(r'\d+[Kk]', filename_parts[1]):
|
||||
a_format['tbr'] = int(filename_parts[1][:-1])
|
||||
formats.append(a_format)
|
||||
format_urls = set()
|
||||
|
||||
age_limit = self._rta_search(webpage)
|
||||
title = None
|
||||
thumbnail = None
|
||||
duration = None
|
||||
encrypted = False
|
||||
|
||||
return {
|
||||
def extract_format(format_url, height=None):
|
||||
if not isinstance(format_url, compat_str) or not format_url.startswith('http'):
|
||||
return
|
||||
if format_url in format_urls:
|
||||
return
|
||||
format_urls.add(format_url)
|
||||
tbr = int_or_none(self._search_regex(
|
||||
r'[/_](\d+)[kK][/_]', format_url, 'tbr', default=None))
|
||||
if not height:
|
||||
height = int_or_none(self._search_regex(
|
||||
r'[/_](\d+)[pP][/_]', format_url, 'height', default=None))
|
||||
if encrypted:
|
||||
format_url = aes_decrypt_text(
|
||||
video_url, title, 32).decode('utf-8')
|
||||
formats.append({
|
||||
'url': format_url,
|
||||
'format_id': '%dp' % height if height else None,
|
||||
'height': height,
|
||||
'tbr': tbr,
|
||||
})
|
||||
|
||||
flashvars = self._parse_json(
|
||||
self._search_regex(
|
||||
r'flashvars\s*=\s*({.+?});', webpage,
|
||||
'flashvars', default='{}'),
|
||||
display_id, fatal=False)
|
||||
|
||||
if flashvars:
|
||||
title = flashvars.get('video_title')
|
||||
thumbnail = flashvars.get('image_url')
|
||||
duration = int_or_none(flashvars.get('video_duration'))
|
||||
encrypted = flashvars.get('encrypted') is True
|
||||
for key, value in flashvars.items():
|
||||
mobj = re.search(r'quality_(\d+)[pP]', key)
|
||||
if mobj:
|
||||
extract_format(value, int(mobj.group(1)))
|
||||
video_url = flashvars.get('video_url')
|
||||
if video_url and determine_ext(video_url, None):
|
||||
extract_format(video_url)
|
||||
|
||||
video_url = self._html_search_regex(
|
||||
r'flashvars\.video_url\s*=\s*(["\'])(?P<url>http.+?)\1',
|
||||
webpage, 'video url', default=None, group='url')
|
||||
if video_url:
|
||||
extract_format(compat_urllib_parse_unquote(video_url))
|
||||
|
||||
if not formats:
|
||||
if 'title="This video is no longer available"' in webpage:
|
||||
raise ExtractorError(
|
||||
'Video %s is no longer available' % video_id, expected=True)
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
if not title:
|
||||
title = self._html_search_regex(
|
||||
r'<h1[^>]*>([^<]+)', webpage, 'title')
|
||||
|
||||
return webpage, {
|
||||
'id': video_id,
|
||||
'title': video_title,
|
||||
'display_id': display_id,
|
||||
'title': strip_or_none(title),
|
||||
'thumbnail': thumbnail,
|
||||
'duration': duration,
|
||||
'age_limit': 18,
|
||||
'formats': formats,
|
||||
'age_limit': age_limit,
|
||||
'thumbnail': flashvars.get('image_url')
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
webpage, info = self._extract_info(url)
|
||||
info['view_count'] = str_to_int(self._search_regex(
|
||||
r'<b>([\d,.]+)</b> Views?', webpage, 'view count', fatal=False))
|
||||
return info
|
||||
|
@@ -4,7 +4,10 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urlparse
|
||||
from ..compat import (
|
||||
compat_str,
|
||||
compat_urlparse,
|
||||
)
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
ExtractorError,
|
||||
@@ -96,7 +99,7 @@ class LifeNewsIE(InfoExtractor):
|
||||
r'<video[^>]+><source[^>]+src=["\'](.+?)["\']', webpage)
|
||||
|
||||
iframe_links = re.findall(
|
||||
r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/embed/.+?)["\']',
|
||||
r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/(?:embed|video)/.+?)["\']',
|
||||
webpage)
|
||||
|
||||
if not video_urls and not iframe_links:
|
||||
@@ -164,9 +167,9 @@ class LifeNewsIE(InfoExtractor):
|
||||
|
||||
class LifeEmbedIE(InfoExtractor):
|
||||
IE_NAME = 'life:embed'
|
||||
_VALID_URL = r'https?://embed\.life\.ru/embed/(?P<id>[\da-f]{32})'
|
||||
_VALID_URL = r'https?://embed\.life\.ru/(?:embed|video)/(?P<id>[\da-f]{32})'
|
||||
|
||||
_TEST = {
|
||||
_TESTS = [{
|
||||
'url': 'http://embed.life.ru/embed/e50c2dec2867350528e2574c899b8291',
|
||||
'md5': 'b889715c9e49cb1981281d0e5458fbbe',
|
||||
'info_dict': {
|
||||
@@ -175,30 +178,57 @@ class LifeEmbedIE(InfoExtractor):
|
||||
'title': 'e50c2dec2867350528e2574c899b8291',
|
||||
'thumbnail': 're:http://.*\.jpg',
|
||||
}
|
||||
}
|
||||
}, {
|
||||
# with 1080p
|
||||
'url': 'https://embed.life.ru/video/e50c2dec2867350528e2574c899b8291',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
thumbnail = None
|
||||
formats = []
|
||||
for video_url in re.findall(r'"file"\s*:\s*"([^"]+)', webpage):
|
||||
video_url = compat_urlparse.urljoin(url, video_url)
|
||||
ext = determine_ext(video_url)
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
video_url, video_id, 'mp4',
|
||||
entry_protocol='m3u8_native', m3u8_id='m3u8'))
|
||||
else:
|
||||
formats.append({
|
||||
'url': video_url,
|
||||
'format_id': ext,
|
||||
'preference': 1,
|
||||
})
|
||||
|
||||
def extract_m3u8(manifest_url):
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
manifest_url, video_id, 'mp4',
|
||||
entry_protocol='m3u8_native', m3u8_id='m3u8'))
|
||||
|
||||
def extract_original(original_url):
|
||||
formats.append({
|
||||
'url': original_url,
|
||||
'format_id': determine_ext(original_url, None),
|
||||
'preference': 1,
|
||||
})
|
||||
|
||||
playlist = self._parse_json(
|
||||
self._search_regex(
|
||||
r'options\s*=\s*({.+?});', webpage, 'options', default='{}'),
|
||||
video_id).get('playlist', {})
|
||||
if playlist:
|
||||
master = playlist.get('master')
|
||||
if isinstance(master, compat_str) and determine_ext(master) == 'm3u8':
|
||||
extract_m3u8(compat_urlparse.urljoin(url, master))
|
||||
original = playlist.get('original')
|
||||
if isinstance(original, compat_str):
|
||||
extract_original(original)
|
||||
thumbnail = playlist.get('image')
|
||||
|
||||
# Old rendition fallback
|
||||
if not formats:
|
||||
for video_url in re.findall(r'"file"\s*:\s*"([^"]+)', webpage):
|
||||
video_url = compat_urlparse.urljoin(url, video_url)
|
||||
if determine_ext(video_url) == 'm3u8':
|
||||
extract_m3u8(video_url)
|
||||
else:
|
||||
extract_original(video_url)
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
thumbnail = self._search_regex(
|
||||
thumbnail = thumbnail or self._search_regex(
|
||||
r'"image"\s*:\s*"([^"]+)', webpage, 'thumbnail', default=None)
|
||||
|
||||
return {
|
||||
|
@@ -14,7 +14,7 @@ from ..utils import (
|
||||
|
||||
|
||||
class LiTVIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://www\.litv\.tv/vod/[^/]+/content\.do\?.*?\bid=(?P<id>[^&]+)'
|
||||
_VALID_URL = r'https?://www\.litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
|
||||
|
||||
_URL_TEMPLATE = 'https://www.litv.tv/vod/%s/content.do?id=%s'
|
||||
|
||||
@@ -27,6 +27,7 @@ class LiTVIE(InfoExtractor):
|
||||
'playlist_count': 50,
|
||||
}, {
|
||||
'url': 'https://www.litv.tv/vod/drama/content.do?brc_id=root&id=VOD00041610&isUHEnabled=true&autoPlay=1',
|
||||
'md5': '969e343d9244778cb29acec608e53640',
|
||||
'info_dict': {
|
||||
'id': 'VOD00041610',
|
||||
'ext': 'mp4',
|
||||
@@ -37,7 +38,16 @@ class LiTVIE(InfoExtractor):
|
||||
},
|
||||
'params': {
|
||||
'noplaylist': True,
|
||||
'skip_download': True, # m3u8 download
|
||||
},
|
||||
'skip': 'Georestricted to Taiwan',
|
||||
}, {
|
||||
'url': 'https://www.litv.tv/promo/miyuezhuan/?content_id=VOD00044841&',
|
||||
'md5': '88322ea132f848d6e3e18b32a832b918',
|
||||
'info_dict': {
|
||||
'id': 'VOD00044841',
|
||||
'ext': 'mp4',
|
||||
'title': '芈月傳第1集 霸星芈月降世楚國',
|
||||
'description': '楚威王二年,太史令唐昧夜觀星象,發現霸星即將現世。王后得知霸星的預言後,想盡辦法不讓孩子順利出生,幸得莒姬相護化解危機。沒想到眾人期待下出生的霸星卻是位公主,楚威王對此失望至極。楚王后命人將女嬰丟棄河中,居然奇蹟似的被少司命像攔下,楚威王認為此女非同凡響,為她取名芈月。',
|
||||
},
|
||||
'skip': 'Georestricted to Taiwan',
|
||||
}]
|
||||
@@ -92,13 +102,18 @@ class LiTVIE(InfoExtractor):
|
||||
# endpoint gives the same result as the data embedded in the webpage.
|
||||
# If georestricted, there are no embedded data, so an extra request is
|
||||
# necessary to get the error code
|
||||
if 'assetId' not in view_data:
|
||||
view_data = self._download_json(
|
||||
'https://www.litv.tv/vod/ajax/getProgramInfo', video_id,
|
||||
query={'contentId': video_id},
|
||||
headers={'Accept': 'application/json'})
|
||||
video_data = self._parse_json(self._search_regex(
|
||||
r'uiHlsUrl\s*=\s*testBackendData\(([^;]+)\);',
|
||||
webpage, 'video data', default='{}'), video_id)
|
||||
if not video_data:
|
||||
payload = {
|
||||
'assetId': view_data['assetId'],
|
||||
'watchDevices': vod_data['watchDevices'],
|
||||
'watchDevices': view_data['watchDevices'],
|
||||
'contentType': view_data['contentType'],
|
||||
}
|
||||
video_data = self._download_json(
|
||||
@@ -115,7 +130,8 @@ class LiTVIE(InfoExtractor):
|
||||
raise ExtractorError('Unexpected result from %s' % self.IE_NAME)
|
||||
|
||||
formats = self._extract_m3u8_formats(
|
||||
video_data['fullpath'], video_id, ext='mp4', m3u8_id='hls')
|
||||
video_data['fullpath'], video_id, ext='mp4',
|
||||
entry_protocol='m3u8_native', m3u8_id='hls')
|
||||
for a_format in formats:
|
||||
# LiTV HLS segments doesn't like compressions
|
||||
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = True
|
||||
|
@@ -25,10 +25,7 @@ class MioMioIE(InfoExtractor):
|
||||
'title': '【SKY】字幕 铠武昭和VS平成 假面骑士大战FEAT战队 魔星字幕组 字幕',
|
||||
'duration': 5923,
|
||||
},
|
||||
'params': {
|
||||
# The server provides broken file
|
||||
'skip_download': True,
|
||||
}
|
||||
'skip': 'Unable to load videos',
|
||||
}, {
|
||||
'url': 'http://www.miomio.tv/watch/cc184024/',
|
||||
'info_dict': {
|
||||
@@ -47,16 +44,12 @@ class MioMioIE(InfoExtractor):
|
||||
'skip': 'Unable to load videos',
|
||||
}, {
|
||||
# new 'h5' player
|
||||
'url': 'http://www.miomio.tv/watch/cc273295/',
|
||||
'md5': '',
|
||||
'url': 'http://www.miomio.tv/watch/cc273997/',
|
||||
'md5': '0b27a4b4495055d826813f8c3a6b2070',
|
||||
'info_dict': {
|
||||
'id': '273295',
|
||||
'id': '273997',
|
||||
'ext': 'mp4',
|
||||
'title': 'アウト×デラックス 20160526',
|
||||
},
|
||||
'params': {
|
||||
# intermittent HTTP 500
|
||||
'skip_download': True,
|
||||
'title': 'マツコの知らない世界【劇的進化SP!ビニール傘&冷凍食品2016】 1_2 - 16 05 31',
|
||||
},
|
||||
}]
|
||||
|
||||
@@ -116,7 +109,7 @@ class MioMioIE(InfoExtractor):
|
||||
player_webpage = self._download_webpage(
|
||||
player_url, video_id,
|
||||
note='Downloading player webpage', headers={'Referer': url})
|
||||
entries = self._parse_html5_media_entries(player_url, player_webpage)
|
||||
entries = self._parse_html5_media_entries(player_url, player_webpage, video_id)
|
||||
http_headers = {'Referer': player_url}
|
||||
else:
|
||||
http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
|
||||
|
@@ -1,53 +1,56 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import os
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_urllib_parse_unquote,
|
||||
compat_urllib_parse_urlparse,
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
str_to_int,
|
||||
unified_strdate,
|
||||
)
|
||||
from ..utils import sanitized_Request
|
||||
from .keezmovies import KeezMoviesIE
|
||||
|
||||
|
||||
class MofosexIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?(?P<url>mofosex\.com/videos/(?P<id>[0-9]+)/.*?\.html)'
|
||||
_TEST = {
|
||||
'url': 'http://www.mofosex.com/videos/5018/japanese-teen-music-video.html',
|
||||
'md5': '1b2eb47ac33cc75d4a80e3026b613c5a',
|
||||
class MofosexIE(KeezMoviesIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?mofosex\.com/videos/(?P<id>\d+)/(?P<display_id>[^/?#&.]+)\.html'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.mofosex.com/videos/318131/amateur-teen-playing-and-masturbating-318131.html',
|
||||
'md5': '39a15853632b7b2e5679f92f69b78e91',
|
||||
'info_dict': {
|
||||
'id': '5018',
|
||||
'id': '318131',
|
||||
'display_id': 'amateur-teen-playing-and-masturbating-318131',
|
||||
'ext': 'mp4',
|
||||
'title': 'Japanese Teen Music Video',
|
||||
'title': 'amateur teen playing and masturbating',
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
'upload_date': '20121114',
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
'dislike_count': int,
|
||||
'age_limit': 18,
|
||||
}
|
||||
}
|
||||
}, {
|
||||
# This video is no longer available
|
||||
'url': 'http://www.mofosex.com/videos/5018/japanese-teen-music-video.html',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
url = 'http://www.' + mobj.group('url')
|
||||
webpage, info = self._extract_info(url)
|
||||
|
||||
req = sanitized_Request(url)
|
||||
req.add_header('Cookie', 'age_verified=1')
|
||||
webpage = self._download_webpage(req, video_id)
|
||||
view_count = str_to_int(self._search_regex(
|
||||
r'VIEWS:</span>\s*([\d,.]+)', webpage, 'view count', fatal=False))
|
||||
like_count = int_or_none(self._search_regex(
|
||||
r'id=["\']amountLikes["\'][^>]*>(\d+)', webpage,
|
||||
'like count', fatal=False))
|
||||
dislike_count = int_or_none(self._search_regex(
|
||||
r'id=["\']amountDislikes["\'][^>]*>(\d+)', webpage,
|
||||
'like count', fatal=False))
|
||||
upload_date = unified_strdate(self._html_search_regex(
|
||||
r'Added:</span>([^<]+)', webpage, 'upload date', fatal=False))
|
||||
|
||||
video_title = self._html_search_regex(r'<h1>(.+?)<', webpage, 'title')
|
||||
video_url = compat_urllib_parse_unquote(self._html_search_regex(r'flashvars.video_url = \'([^\']+)', webpage, 'video_url'))
|
||||
path = compat_urllib_parse_urlparse(video_url).path
|
||||
extension = os.path.splitext(path)[1][1:]
|
||||
format = path.split('/')[5].split('_')[:2]
|
||||
format = '-'.join(format)
|
||||
info.update({
|
||||
'view_count': view_count,
|
||||
'like_count': like_count,
|
||||
'dislike_count': dislike_count,
|
||||
'upload_date': upload_date,
|
||||
'thumbnail': self._og_search_thumbnail(webpage),
|
||||
})
|
||||
|
||||
age_limit = self._rta_search(webpage)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': video_title,
|
||||
'url': video_url,
|
||||
'ext': extension,
|
||||
'format': format,
|
||||
'format_id': format,
|
||||
'age_limit': age_limit,
|
||||
}
|
||||
return info
|
||||
|
@@ -257,8 +257,8 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
|
||||
def _get_feed_url(self, uri):
|
||||
video_id = self._id_from_uri(uri)
|
||||
site_id = uri.replace(video_id, '')
|
||||
config_url = ('http://media.mtvnservices.com/pmt/e1/players/{0}/'
|
||||
'context4/context5/config.xml'.format(site_id))
|
||||
config_url = ('http://media.mtvnservices.com/pmt-arc/e1/players/{0}/'
|
||||
'context52/config.xml'.format(site_id))
|
||||
config_doc = self._download_xml(config_url, video_id)
|
||||
feed_node = config_doc.find('.//feed')
|
||||
feed_url = feed_node.text.strip().split('?')[0]
|
||||
|
@@ -3,7 +3,7 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .theplatform import ThePlatformIE
|
||||
from .adobepass import AdobePassIE
|
||||
from ..utils import (
|
||||
smuggle_url,
|
||||
url_basename,
|
||||
@@ -65,7 +65,7 @@ class NationalGeographicVideoIE(InfoExtractor):
|
||||
}
|
||||
|
||||
|
||||
class NationalGeographicIE(ThePlatformIE):
|
||||
class NationalGeographicIE(AdobePassIE):
|
||||
IE_NAME = 'natgeo'
|
||||
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/(?:videos|episodes)/(?P<id>[^/?]+)'
|
||||
|
||||
@@ -119,7 +119,7 @@ class NationalGeographicIE(ThePlatformIE):
|
||||
auth_resource_id = self._search_regex(
|
||||
r"video_auth_resourceId\s*=\s*'([^']+)'",
|
||||
webpage, 'auth resource id')
|
||||
query['auth'] = self._extract_mvpd_auth(url, display_id, 'natgeo', auth_resource_id) or ''
|
||||
query['auth'] = self._extract_mvpd_auth(url, display_id, 'natgeo', auth_resource_id)
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
@@ -131,7 +131,7 @@ class NationalGeographicIE(ThePlatformIE):
|
||||
}
|
||||
|
||||
|
||||
class NationalGeographicEpisodeGuideIE(ThePlatformIE):
|
||||
class NationalGeographicEpisodeGuideIE(InfoExtractor):
|
||||
IE_NAME = 'natgeo:episodeguide'
|
||||
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
|
||||
_TESTS = [
|
||||
|
@@ -14,16 +14,6 @@ from ..utils import (
|
||||
|
||||
|
||||
class NRKBaseIE(InfoExtractor):
|
||||
def _extract_formats(self, manifest_url, video_id, fatal=True):
|
||||
formats = []
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
manifest_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81',
|
||||
video_id, f4m_id='hds', fatal=fatal))
|
||||
formats.extend(self._extract_m3u8_formats(manifest_url.replace(
|
||||
'akamaihd.net/z/', 'akamaihd.net/i/').replace('/manifest.f4m', '/master.m3u8'),
|
||||
video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=fatal))
|
||||
return formats
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
@@ -45,7 +35,7 @@ class NRKBaseIE(InfoExtractor):
|
||||
asset_url = asset.get('url')
|
||||
if not asset_url:
|
||||
continue
|
||||
formats = self._extract_formats(asset_url, video_id, fatal=False)
|
||||
formats = self._extract_akamai_formats(asset_url, video_id)
|
||||
if not formats:
|
||||
continue
|
||||
self._sort_formats(formats)
|
||||
@@ -69,7 +59,7 @@ class NRKBaseIE(InfoExtractor):
|
||||
if not entries:
|
||||
media_url = data.get('mediaUrl')
|
||||
if media_url:
|
||||
formats = self._extract_formats(media_url, video_id)
|
||||
formats = self._extract_akamai_formats(media_url, video_id)
|
||||
self._sort_formats(formats)
|
||||
duration = parse_duration(data.get('duration'))
|
||||
entries = [{
|
||||
|
@@ -1,6 +1,8 @@
|
||||
# encoding: utf-8
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
@@ -40,8 +42,8 @@ class NTVDeIE(InfoExtractor):
|
||||
timestamp = int_or_none(info.get('publishedDateAsUnixTimeStamp'))
|
||||
vdata = self._parse_json(self._search_regex(
|
||||
r'(?s)\$\(\s*"\#player"\s*\)\s*\.data\(\s*"player",\s*(\{.*?\})\);',
|
||||
webpage, 'player data'),
|
||||
video_id, transform_source=js_to_json)
|
||||
webpage, 'player data'), video_id,
|
||||
transform_source=lambda s: js_to_json(re.sub(r'advertising:\s*{[^}]+},', '', s)))
|
||||
duration = parse_duration(vdata.get('duration'))
|
||||
|
||||
formats = []
|
||||
|
@@ -1,12 +1,12 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals, division
|
||||
|
||||
import math
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_chr
|
||||
from ..compat import (
|
||||
compat_chr,
|
||||
compat_ord,
|
||||
)
|
||||
from ..utils import (
|
||||
decode_png,
|
||||
determine_ext,
|
||||
ExtractorError,
|
||||
)
|
||||
@@ -42,71 +42,26 @@ class OpenloadIE(InfoExtractor):
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
webpage = self._download_webpage('https://openload.co/embed/%s/' % video_id, video_id)
|
||||
|
||||
if 'File not found' in webpage:
|
||||
if 'File not found' in webpage or 'deleted by the owner' in webpage:
|
||||
raise ExtractorError('File not found', expected=True)
|
||||
|
||||
# The following extraction logic is proposed by @Belderak and @gdkchan
|
||||
# and declared to be used freely in youtube-dl
|
||||
# See https://github.com/rg3/youtube-dl/issues/9706
|
||||
# The following decryption algorithm is written by @yokrysty and
|
||||
# declared to be freely used in youtube-dl
|
||||
# See https://github.com/rg3/youtube-dl/issues/10408
|
||||
enc_data = self._html_search_regex(
|
||||
r'<span[^>]+id="hiddenurl"[^>]*>([^<]+)</span>', webpage, 'encrypted data')
|
||||
|
||||
numbers_js = self._download_webpage(
|
||||
'https://openload.co/assets/js/obfuscator/n.js', video_id,
|
||||
note='Downloading signature numbers')
|
||||
signums = self._search_regex(
|
||||
r'window\.signatureNumbers\s*=\s*[\'"](?P<data>[a-z]+)[\'"]',
|
||||
numbers_js, 'signature numbers', group='data')
|
||||
video_url_chars = []
|
||||
|
||||
linkimg_uri = self._search_regex(
|
||||
r'<img[^>]+id="linkimg"[^>]+src="([^"]+)"', webpage, 'link image')
|
||||
linkimg = self._request_webpage(
|
||||
linkimg_uri, video_id, note=False).read()
|
||||
for c in enc_data:
|
||||
j = compat_ord(c)
|
||||
if j >= 33 and j <= 126:
|
||||
j = ((j + 14) % 94) + 33
|
||||
video_url_chars += compat_chr(j)
|
||||
|
||||
width, height, pixels = decode_png(linkimg)
|
||||
|
||||
output = ''
|
||||
for y in range(height):
|
||||
for x in range(width):
|
||||
r, g, b = pixels[y][3 * x:3 * x + 3]
|
||||
if r == 0 and g == 0 and b == 0:
|
||||
break
|
||||
else:
|
||||
output += compat_chr(r)
|
||||
output += compat_chr(g)
|
||||
output += compat_chr(b)
|
||||
|
||||
img_str_length = len(output) // 200
|
||||
img_str = [[0 for x in range(img_str_length)] for y in range(10)]
|
||||
|
||||
sig_str_length = len(signums) // 260
|
||||
sig_str = [[0 for x in range(sig_str_length)] for y in range(10)]
|
||||
|
||||
for i in range(10):
|
||||
for j in range(img_str_length):
|
||||
begin = i * img_str_length * 20 + j * 20
|
||||
img_str[i][j] = output[begin:begin + 20]
|
||||
for j in range(sig_str_length):
|
||||
begin = i * sig_str_length * 26 + j * 26
|
||||
sig_str[i][j] = signums[begin:begin + 26]
|
||||
|
||||
parts = []
|
||||
# TODO: find better names for str_, chr_ and sum_
|
||||
str_ = ''
|
||||
for i in [2, 3, 5, 7]:
|
||||
str_ = ''
|
||||
sum_ = float(99)
|
||||
for j in range(len(sig_str[i])):
|
||||
for chr_idx in range(len(img_str[i][j])):
|
||||
if sum_ > float(122):
|
||||
sum_ = float(98)
|
||||
chr_ = compat_chr(int(math.floor(sum_)))
|
||||
if sig_str[i][j][chr_idx] == chr_ and j >= len(str_):
|
||||
sum_ += float(2.5)
|
||||
str_ += img_str[i][j][chr_idx]
|
||||
parts.append(str_.replace(',', ''))
|
||||
|
||||
video_url = 'https://openload.co/stream/%s~%s~%s~%s' % (parts[3], parts[1], parts[2], parts[0])
|
||||
video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
|
||||
|
||||
title = self._og_search_title(webpage, default=None) or self._search_regex(
|
||||
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
|
||||
|
@@ -1,9 +1,10 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
import json
|
||||
import random
|
||||
import collections
|
||||
import json
|
||||
import os
|
||||
import random
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
@@ -12,10 +13,11 @@ from ..compat import (
|
||||
)
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
float_or_none,
|
||||
int_or_none,
|
||||
parse_duration,
|
||||
qualities,
|
||||
sanitized_Request,
|
||||
srt_subtitles_timecode,
|
||||
urlencode_postdata,
|
||||
)
|
||||
|
||||
@@ -75,12 +77,10 @@ class PluralsightIE(PluralsightBaseIE):
|
||||
if not post_url.startswith('http'):
|
||||
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
|
||||
|
||||
request = sanitized_Request(
|
||||
post_url, urlencode_postdata(login_form))
|
||||
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
|
||||
|
||||
response = self._download_webpage(
|
||||
request, None, 'Logging in as %s' % username)
|
||||
post_url, None, 'Logging in as %s' % username,
|
||||
data=urlencode_postdata(login_form),
|
||||
headers={'Content-Type': 'application/x-www-form-urlencoded'})
|
||||
|
||||
error = self._search_regex(
|
||||
r'<span[^>]+class="field-validation-error"[^>]*>([^<]+)</span>',
|
||||
@@ -91,6 +91,53 @@ class PluralsightIE(PluralsightBaseIE):
|
||||
if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
|
||||
raise ExtractorError('Unable to log in')
|
||||
|
||||
def _get_subtitles(self, author, clip_id, lang, name, duration, video_id):
|
||||
captions_post = {
|
||||
'a': author,
|
||||
'cn': clip_id,
|
||||
'lc': lang,
|
||||
'm': name,
|
||||
}
|
||||
captions = self._download_json(
|
||||
'%s/training/Player/Captions' % self._API_BASE, video_id,
|
||||
'Downloading captions JSON', 'Unable to download captions JSON',
|
||||
fatal=False, data=json.dumps(captions_post).encode('utf-8'),
|
||||
headers={'Content-Type': 'application/json;charset=utf-8'})
|
||||
if captions:
|
||||
return {
|
||||
lang: [{
|
||||
'ext': 'json',
|
||||
'data': json.dumps(captions),
|
||||
}, {
|
||||
'ext': 'srt',
|
||||
'data': self._convert_subtitles(duration, captions),
|
||||
}]
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def _convert_subtitles(duration, subs):
|
||||
srt = ''
|
||||
for num, current in enumerate(subs):
|
||||
current = subs[num]
|
||||
start, text = float_or_none(
|
||||
current.get('DisplayTimeOffset')), current.get('Text')
|
||||
if start is None or text is None:
|
||||
continue
|
||||
end = duration if num == len(subs) - 1 else float_or_none(
|
||||
subs[num + 1].get('DisplayTimeOffset'))
|
||||
if end is None:
|
||||
continue
|
||||
srt += os.linesep.join(
|
||||
(
|
||||
'%d' % num,
|
||||
'%s --> %s' % (
|
||||
srt_subtitles_timecode(start),
|
||||
srt_subtitles_timecode(end)),
|
||||
text,
|
||||
os.linesep,
|
||||
))
|
||||
return srt
|
||||
|
||||
def _real_extract(self, url):
|
||||
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
|
||||
|
||||
@@ -138,6 +185,8 @@ class PluralsightIE(PluralsightBaseIE):
|
||||
if not clip:
|
||||
raise ExtractorError('Unable to resolve clip')
|
||||
|
||||
title = '%s - %s' % (module['title'], clip['title'])
|
||||
|
||||
QUALITIES = {
|
||||
'low': {'width': 640, 'height': 480},
|
||||
'medium': {'width': 848, 'height': 640},
|
||||
@@ -196,13 +245,12 @@ class PluralsightIE(PluralsightBaseIE):
|
||||
'mt': ext,
|
||||
'q': '%dx%d' % (f['width'], f['height']),
|
||||
}
|
||||
request = sanitized_Request(
|
||||
'%s/training/Player/ViewClip' % self._API_BASE,
|
||||
json.dumps(clip_post).encode('utf-8'))
|
||||
request.add_header('Content-Type', 'application/json;charset=utf-8')
|
||||
format_id = '%s-%s' % (ext, quality)
|
||||
clip_url = self._download_webpage(
|
||||
request, display_id, 'Downloading %s URL' % format_id, fatal=False)
|
||||
'%s/training/Player/ViewClip' % self._API_BASE, display_id,
|
||||
'Downloading %s URL' % format_id, fatal=False,
|
||||
data=json.dumps(clip_post).encode('utf-8'),
|
||||
headers={'Content-Type': 'application/json;charset=utf-8'})
|
||||
|
||||
# Pluralsight tracks multiple sequential calls to ViewClip API and start
|
||||
# to return 429 HTTP errors after some time (see
|
||||
@@ -225,18 +273,20 @@ class PluralsightIE(PluralsightBaseIE):
|
||||
formats.append(f)
|
||||
self._sort_formats(formats)
|
||||
|
||||
# TODO: captions
|
||||
# http://www.pluralsight.com/training/Player/ViewClip + cap = true
|
||||
# or
|
||||
# http://www.pluralsight.com/training/Player/Captions
|
||||
# { a = author, cn = clip_id, lc = end, m = name }
|
||||
duration = int_or_none(
|
||||
clip.get('duration')) or parse_duration(clip.get('formattedDuration'))
|
||||
|
||||
# TODO: other languages?
|
||||
subtitles = self.extract_subtitles(
|
||||
author, clip_id, 'en', name, duration, display_id)
|
||||
|
||||
return {
|
||||
'id': clip.get('clipName') or clip['name'],
|
||||
'title': '%s - %s' % (module['title'], clip['title']),
|
||||
'duration': int_or_none(clip.get('duration')) or parse_duration(clip.get('formattedDuration')),
|
||||
'title': title,
|
||||
'duration': duration,
|
||||
'creator': author,
|
||||
'formats': formats
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
|
||||
|
||||
|
89
youtube_dl/extractor/porncom.py
Normal file
@@ -0,0 +1,89 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
js_to_json,
|
||||
parse_filesize,
|
||||
str_to_int,
|
||||
)
|
||||
|
||||
|
||||
class PornComIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:[a-zA-Z]+\.)?porn\.com/videos/(?:(?P<display_id>[^/]+)-)?(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.porn.com/videos/teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec-2603339',
|
||||
'md5': '3f30ce76267533cd12ba999263156de7',
|
||||
'info_dict': {
|
||||
'id': '2603339',
|
||||
'display_id': 'teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec',
|
||||
'ext': 'mp4',
|
||||
'title': 'Teen grabs a dildo and fucks her pussy live on 1hottie, I rec',
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
'duration': 551,
|
||||
'view_count': int,
|
||||
'age_limit': 18,
|
||||
},
|
||||
}, {
|
||||
'url': 'http://se.porn.com/videos/marsha-may-rides-seth-on-top-of-his-thick-cock-2658067',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
display_id = mobj.group('display_id') or video_id
|
||||
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
config = self._parse_json(
|
||||
self._search_regex(
|
||||
r'=\s*({.+?})\s*,\s*[\da-zA-Z_]+\s*=',
|
||||
webpage, 'config', default='{}'),
|
||||
display_id, transform_source=js_to_json, fatal=False)
|
||||
|
||||
if config:
|
||||
title = config['title']
|
||||
formats = [{
|
||||
'url': stream['url'],
|
||||
'format_id': stream.get('id'),
|
||||
'height': int_or_none(self._search_regex(
|
||||
r'^(\d+)[pP]', stream.get('id') or '', 'height', default=None))
|
||||
} for stream in config['streams'] if stream.get('url')]
|
||||
thumbnail = (compat_urlparse.urljoin(
|
||||
config['thumbCDN'], config['poster'])
|
||||
if config.get('thumbCDN') and config.get('poster') else None)
|
||||
duration = int_or_none(config.get('length'))
|
||||
else:
|
||||
title = self._search_regex(
|
||||
(r'<title>([^<]+)</title>', r'<h1[^>]*>([^<]+)</h1>'),
|
||||
webpage, 'title')
|
||||
formats = [{
|
||||
'url': compat_urlparse.urljoin(url, format_url),
|
||||
'format_id': '%sp' % height,
|
||||
'height': int(height),
|
||||
'filesize_approx': parse_filesize(filesize),
|
||||
} for format_url, height, filesize in re.findall(
|
||||
r'<a[^>]+href="(/download/[^"]+)">MPEG4 (\d+)p<span[^>]*>(\d+\s+[a-zA-Z]+)<',
|
||||
webpage)]
|
||||
thumbnail = None
|
||||
duration = None
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
view_count = str_to_int(self._search_regex(
|
||||
r'class=["\']views["\'][^>]*><p>([\d,.]+)', webpage, 'view count'))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'thumbnail': thumbnail,
|
||||
'duration': duration,
|
||||
'view_count': view_count,
|
||||
'formats': formats,
|
||||
'age_limit': 18,
|
||||
}
|
@@ -13,15 +13,15 @@ class RadioBremenIE(InfoExtractor):
|
||||
IE_NAME = 'radiobremen'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://www.radiobremen.de/mediathek/index.html?id=114720',
|
||||
'url': 'http://www.radiobremen.de/mediathek/?id=141876',
|
||||
'info_dict': {
|
||||
'id': '114720',
|
||||
'id': '141876',
|
||||
'ext': 'mp4',
|
||||
'duration': 1685,
|
||||
'duration': 178,
|
||||
'width': 512,
|
||||
'title': 'buten un binnen vom 22. Dezember',
|
||||
'title': 'Druck auf Patrick Öztürk',
|
||||
'thumbnail': 're:https?://.*\.jpg$',
|
||||
'description': 'Unter anderem mit diesen Themen: 45 Flüchtlinge sind in Worpswede angekommen +++ Freies Internet für alle: Bremer arbeiten an einem flächendeckenden W-Lan-Netzwerk +++ Aktivisten kämpfen für das Unibad +++ So war das Wetter 2014 +++',
|
||||
'description': 'Gegen den SPD-Bürgerschaftsabgeordneten Patrick Öztürk wird wegen Beihilfe zum gewerbsmäßigen Betrug ermittelt. Am Donnerstagabend sollte er dem Vorstand des SPD-Unterbezirks Bremerhaven dazu Rede und Antwort stehen.',
|
||||
},
|
||||
}
|
||||
|
||||
|
@@ -4,33 +4,43 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .jwplatform import JWPlatformBaseIE
|
||||
from ..compat import compat_parse_qs
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
parse_duration,
|
||||
float_or_none,
|
||||
parse_iso8601,
|
||||
update_url_query,
|
||||
)
|
||||
|
||||
|
||||
class SendtoNewsIE(JWPlatformBaseIE):
|
||||
_VALID_URL = r'https?://embed\.sendtonews\.com/player/embed\.php\?(?P<query>[^#]+)'
|
||||
_VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)'
|
||||
|
||||
_TEST = {
|
||||
# From http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/
|
||||
'url': 'http://embed.sendtonews.com/player/embed.php?SK=GxfCe0Zo7D&MK=175909&PK=5588&autoplay=on&sound=yes',
|
||||
'url': 'http://embed.sendtonews.com/player2/embedplayer.php?SC=GxfCe0Zo7D-175909-5588&type=single&autoplay=on&sound=YES',
|
||||
'info_dict': {
|
||||
'id': 'GxfCe0Zo7D-175909-5588',
|
||||
'ext': 'mp4',
|
||||
'title': 'Recap: CLE 15, CIN 6',
|
||||
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
|
||||
'duration': 49,
|
||||
'id': 'GxfCe0Zo7D-175909-5588'
|
||||
},
|
||||
'playlist_count': 9,
|
||||
# test the first video only to prevent lengthy tests
|
||||
'playlist': [{
|
||||
'info_dict': {
|
||||
'id': '198180',
|
||||
'ext': 'mp4',
|
||||
'title': 'Recap: CLE 5, LAA 4',
|
||||
'description': '8/14/16: Naquin, Almonte lead Indians in 5-4 win',
|
||||
'duration': 57.343,
|
||||
'thumbnail': 're:https?://.*\.jpg$',
|
||||
'upload_date': '20160815',
|
||||
'timestamp': 1471221961,
|
||||
},
|
||||
}],
|
||||
'params': {
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
|
||||
_URL_TEMPLATE = '//embed.sendtonews.com/player/embed.php?SK=%s&MK=%s&PK=%s'
|
||||
_URL_TEMPLATE = '//embed.sendtonews.com/player2/embedplayer.php?SC=%s'
|
||||
|
||||
@classmethod
|
||||
def _extract_url(cls, webpage):
|
||||
@@ -39,48 +49,41 @@ class SendtoNewsIE(JWPlatformBaseIE):
|
||||
.*\bSC=(?P<SC>[0-9a-zA-Z-]+).*
|
||||
\1>''', webpage)
|
||||
if mobj:
|
||||
sk, mk, pk = mobj.group('SC').split('-')
|
||||
return cls._URL_TEMPLATE % (sk, mk, pk)
|
||||
sc = mobj.group('SC')
|
||||
return cls._URL_TEMPLATE % sc
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
params = compat_parse_qs(mobj.group('query'))
|
||||
playlist_id = self._match_id(url)
|
||||
|
||||
if 'SK' not in params or 'MK' not in params or 'PK' not in params:
|
||||
raise ExtractorError('Invalid URL', expected=True)
|
||||
data_url = update_url_query(
|
||||
url.replace('embedplayer.php', 'data_read.php'),
|
||||
{'cmd': 'loadInitial'})
|
||||
playlist_data = self._download_json(data_url, playlist_id)
|
||||
|
||||
video_id = '-'.join([params['SK'][0], params['MK'][0], params['PK'][0]])
|
||||
entries = []
|
||||
for video in playlist_data['playlistData'][0]:
|
||||
info_dict = self._parse_jwplayer_data(
|
||||
video['jwconfiguration'],
|
||||
require_title=False, rtmp_params={'no_resume': True})
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
thumbnails = []
|
||||
if video.get('thumbnailUrl'):
|
||||
thumbnails.append({
|
||||
'id': 'normal',
|
||||
'url': video['thumbnailUrl'],
|
||||
})
|
||||
if video.get('smThumbnailUrl'):
|
||||
thumbnails.append({
|
||||
'id': 'small',
|
||||
'url': video['smThumbnailUrl'],
|
||||
})
|
||||
info_dict.update({
|
||||
'title': video['S_headLine'],
|
||||
'description': video.get('S_fullStory'),
|
||||
'thumbnails': thumbnails,
|
||||
'duration': float_or_none(video.get('SM_length')),
|
||||
'timestamp': parse_iso8601(video.get('S_sysDate'), delimiter=' '),
|
||||
})
|
||||
entries.append(info_dict)
|
||||
|
||||
jwplayer_data_str = self._search_regex(
|
||||
r'jwplayer\("[^"]+"\)\.setup\((.+?)\);', webpage, 'JWPlayer data')
|
||||
js_vars = {
|
||||
'w': 1024,
|
||||
'h': 768,
|
||||
'modeVar': 'html5',
|
||||
}
|
||||
for name, val in js_vars.items():
|
||||
js_val = '%d' % val if isinstance(val, int) else '"%s"' % val
|
||||
jwplayer_data_str = jwplayer_data_str.replace(':%s,' % name, ':%s,' % js_val)
|
||||
|
||||
info_dict = self._parse_jwplayer_data(
|
||||
self._parse_json(jwplayer_data_str, video_id),
|
||||
video_id, require_title=False, rtmp_params={'no_resume': True})
|
||||
|
||||
title = self._html_search_regex(
|
||||
r'<div[^>]+class="embedTitle">([^<]+)</div>', webpage, 'title')
|
||||
description = self._html_search_regex(
|
||||
r'<div[^>]+class="embedSubTitle">([^<]+)</div>', webpage,
|
||||
'description', fatal=False)
|
||||
duration = parse_duration(self._html_search_regex(
|
||||
r'<div[^>]+class="embedDetails">([0-9:]+)', webpage,
|
||||
'duration', fatal=False))
|
||||
|
||||
info_dict.update({
|
||||
'title': title,
|
||||
'description': description,
|
||||
'duration': duration,
|
||||
})
|
||||
|
||||
return info_dict
|
||||
return self.playlist_result(entries, playlist_id)
|
||||
|
@@ -5,9 +5,9 @@ import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
float_or_none,
|
||||
str_to_int,
|
||||
parse_duration,
|
||||
parse_filesize,
|
||||
str_to_int,
|
||||
)
|
||||
|
||||
|
||||
@@ -17,21 +17,24 @@ class SnotrIE(InfoExtractor):
|
||||
'url': 'http://www.snotr.com/video/13708/Drone_flying_through_fireworks',
|
||||
'info_dict': {
|
||||
'id': '13708',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Drone flying through fireworks!',
|
||||
'duration': 247,
|
||||
'filesize_approx': 98566144,
|
||||
'duration': 248,
|
||||
'filesize_approx': 40700000,
|
||||
'description': 'A drone flying through Fourth of July Fireworks',
|
||||
}
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
},
|
||||
'expected_warnings': ['description'],
|
||||
}, {
|
||||
'url': 'http://www.snotr.com/video/530/David_Letteman_-_George_W_Bush_Top_10',
|
||||
'info_dict': {
|
||||
'id': '530',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'David Letteman - George W. Bush Top 10',
|
||||
'duration': 126,
|
||||
'filesize_approx': 8912896,
|
||||
'filesize_approx': 8500000,
|
||||
'description': 'The top 10 George W. Bush moments, brought to you by David Letterman!',
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
}
|
||||
}]
|
||||
|
||||
@@ -43,26 +46,28 @@ class SnotrIE(InfoExtractor):
|
||||
title = self._og_search_title(webpage)
|
||||
|
||||
description = self._og_search_description(webpage)
|
||||
video_url = 'http://cdn.videos.snotr.com/%s.flv' % video_id
|
||||
info_dict = self._parse_html5_media_entries(
|
||||
url, webpage, video_id, m3u8_entry_protocol='m3u8_native')[0]
|
||||
|
||||
view_count = str_to_int(self._html_search_regex(
|
||||
r'<p>\n<strong>Views:</strong>\n([\d,\.]+)</p>',
|
||||
r'<p[^>]*>\s*<strong[^>]*>Views:</strong>\s*<span[^>]*>([\d,\.]+)',
|
||||
webpage, 'view count', fatal=False))
|
||||
|
||||
duration = parse_duration(self._html_search_regex(
|
||||
r'<p>\n<strong>Length:</strong>\n\s*([0-9:]+).*?</p>',
|
||||
r'<p[^>]*>\s*<strong[^>]*>Length:</strong>\s*<span[^>]*>([\d:]+)',
|
||||
webpage, 'duration', fatal=False))
|
||||
|
||||
filesize_approx = float_or_none(self._html_search_regex(
|
||||
r'<p>\n<strong>Filesize:</strong>\n\s*([0-9.]+)\s*megabyte</p>',
|
||||
webpage, 'filesize', fatal=False), invscale=1024 * 1024)
|
||||
filesize_approx = parse_filesize(self._html_search_regex(
|
||||
r'<p[^>]*>\s*<strong[^>]*>Filesize:</strong>\s*<span[^>]*>([^<]+)',
|
||||
webpage, 'filesize', fatal=False))
|
||||
|
||||
return {
|
||||
info_dict.update({
|
||||
'id': video_id,
|
||||
'description': description,
|
||||
'title': title,
|
||||
'url': video_url,
|
||||
'view_count': view_count,
|
||||
'duration': duration,
|
||||
'filesize_approx': filesize_approx,
|
||||
}
|
||||
})
|
||||
|
||||
return info_dict
|
||||
|
@@ -1,13 +1,13 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .theplatform import ThePlatformIE
|
||||
from .adobepass import AdobePassIE
|
||||
from ..utils import (
|
||||
update_url_query,
|
||||
smuggle_url,
|
||||
)
|
||||
|
||||
|
||||
class SyfyIE(ThePlatformIE):
|
||||
class SyfyIE(AdobePassIE):
|
||||
_VALID_URL = r'https?://www\.syfy\.com/(?:[^/]+/)?videos/(?P<id>[^/?#]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.syfy.com/theinternetruinedmylife/videos/the-internet-ruined-my-life-season-1-trailer',
|
||||
@@ -31,7 +31,7 @@ class SyfyIE(ThePlatformIE):
|
||||
display_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
syfy_mpx = list(self._parse_json(self._search_regex(
|
||||
r'jQuery\.extend\([^,]+,\s*({.+})\);', webpage, 'drupal settings'),
|
||||
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
|
||||
display_id)['syfy']['syfy_mpx'].values())[0]
|
||||
video_id = syfy_mpx['mpxGUID']
|
||||
title = syfy_mpx['episodeTitle']
|
||||
@@ -40,7 +40,9 @@ class SyfyIE(ThePlatformIE):
|
||||
'manifest': 'm3u',
|
||||
}
|
||||
if syfy_mpx.get('entitlement') == 'auth':
|
||||
resource = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>syfy</title><item><title><![CDATA[%s]]></title><guid>%s</guid><media:rating scheme="urn:v-chip">%s</media:rating></item></channel></rss>' % (title, video_id, syfy_mpx.get('mpxRating', 'TV-14'))
|
||||
resource = self._get_mvpd_resource(
|
||||
'syfy', title, video_id,
|
||||
syfy_mpx.get('mpxRating', 'TV-14'))
|
||||
query['auth'] = self._extract_mvpd_auth(
|
||||
url, video_id, 'syfy', resource)
|
||||
|
||||
|
@@ -6,10 +6,10 @@ import time
|
||||
import hmac
|
||||
import binascii
|
||||
import hashlib
|
||||
import netrc
|
||||
|
||||
|
||||
from .once import OnceIE
|
||||
from .adobepass import AdobePassIE
|
||||
from ..compat import (
|
||||
compat_parse_qs,
|
||||
compat_urllib_parse_urlparse,
|
||||
@@ -25,9 +25,6 @@ from ..utils import (
|
||||
xpath_with_ns,
|
||||
mimetype2ext,
|
||||
find_xpath_attr,
|
||||
unescapeHTML,
|
||||
urlencode_postdata,
|
||||
unified_timestamp,
|
||||
)
|
||||
|
||||
default_ns = 'http://www.w3.org/2005/SMIL21/Language'
|
||||
@@ -76,10 +73,10 @@ class ThePlatformBaseIE(OnceIE):
|
||||
if isinstance(captions, list):
|
||||
for caption in captions:
|
||||
lang, src, mime = caption.get('lang', 'en'), caption.get('src'), caption.get('type')
|
||||
subtitles[lang] = [{
|
||||
subtitles.setdefault(lang, []).append({
|
||||
'ext': mimetype2ext(mime),
|
||||
'url': src,
|
||||
}]
|
||||
})
|
||||
|
||||
return {
|
||||
'title': info['title'],
|
||||
@@ -96,7 +93,7 @@ class ThePlatformBaseIE(OnceIE):
|
||||
return self._parse_theplatform_metadata(info)
|
||||
|
||||
|
||||
class ThePlatformIE(ThePlatformBaseIE):
|
||||
class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
|
||||
_VALID_URL = r'''(?x)
|
||||
(?:https?://(?:link|player)\.theplatform\.com/[sp]/(?P<provider_id>[^/]+)/
|
||||
(?:(?:(?:[^/]+/)+select/)?(?P<media>media/(?:guid/\d+/)?)|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
|
||||
@@ -167,7 +164,6 @@ class ThePlatformIE(ThePlatformBaseIE):
|
||||
'url': 'http://player.theplatform.com/p/NnzsPC/onsite_universal/select/media/guid/2410887629/2928790?fwsitesection=nbc_the_blacklist_video_library&autoPlay=true&carouselID=137781',
|
||||
'only_matching': True,
|
||||
}]
|
||||
_SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
|
||||
|
||||
@classmethod
|
||||
def _extract_urls(cls, webpage):
|
||||
@@ -202,96 +198,6 @@ class ThePlatformIE(ThePlatformBaseIE):
|
||||
sig = flags + expiration_date + checksum + str_to_hex(sig_secret)
|
||||
return '%s&sig=%s' % (url, sig)
|
||||
|
||||
def _extract_mvpd_auth(self, url, video_id, requestor_id, resource):
|
||||
def xml_text(xml_str, tag):
|
||||
return self._search_regex(
|
||||
'<%s>(.+?)</%s>' % (tag, tag), xml_str, tag)
|
||||
|
||||
mvpd_headers = {
|
||||
'ap_42': 'anonymous',
|
||||
'ap_11': 'Linux i686',
|
||||
'ap_z': 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0',
|
||||
'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0',
|
||||
}
|
||||
|
||||
guid = xml_text(resource, 'guid')
|
||||
requestor_info = self._downloader.cache.load('mvpd', requestor_id) or {}
|
||||
authn_token = requestor_info.get('authn_token')
|
||||
if authn_token:
|
||||
token_expires = unified_timestamp(xml_text(authn_token, 'simpleTokenExpires').replace('_GMT', ''))
|
||||
if token_expires and token_expires >= time.time():
|
||||
authn_token = None
|
||||
if not authn_token:
|
||||
# TODO add support for other TV Providers
|
||||
mso_id = 'DTV'
|
||||
login_info = netrc.netrc().authenticators(mso_id)
|
||||
if not login_info:
|
||||
return None
|
||||
|
||||
def post_form(form_page, note, data={}):
|
||||
post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
|
||||
return self._download_webpage(
|
||||
post_url, video_id, note, data=urlencode_postdata(data or self._hidden_inputs(form_page)), headers={
|
||||
'Content-Type': 'application/x-www-form-urlencoded',
|
||||
})
|
||||
|
||||
provider_redirect_page = self._download_webpage(
|
||||
self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
|
||||
'Downloading Provider Redirect Page', query={
|
||||
'noflash': 'true',
|
||||
'mso_id': mso_id,
|
||||
'requestor_id': requestor_id,
|
||||
'no_iframe': 'false',
|
||||
'domain_name': 'adobe.com',
|
||||
'redirect_url': url,
|
||||
})
|
||||
provider_login_page = post_form(
|
||||
provider_redirect_page, 'Downloading Provider Login Page')
|
||||
mvpd_confirm_page = post_form(provider_login_page, 'Logging in', {
|
||||
'username': login_info[0],
|
||||
'password': login_info[2],
|
||||
})
|
||||
post_form(mvpd_confirm_page, 'Confirming Login')
|
||||
|
||||
session = self._download_webpage(
|
||||
self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,
|
||||
'Retrieving Session', data=urlencode_postdata({
|
||||
'_method': 'GET',
|
||||
'requestor_id': requestor_id,
|
||||
}), headers=mvpd_headers)
|
||||
authn_token = unescapeHTML(xml_text(session, 'authnToken'))
|
||||
requestor_info['authn_token'] = authn_token
|
||||
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
|
||||
|
||||
authz_token = requestor_info.get(guid)
|
||||
if not authz_token:
|
||||
authorize = self._download_webpage(
|
||||
self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,
|
||||
'Retrieving Authorization Token', data=urlencode_postdata({
|
||||
'resource_id': resource,
|
||||
'requestor_id': requestor_id,
|
||||
'authentication_token': authn_token,
|
||||
'mso_id': xml_text(authn_token, 'simpleTokenMsoID'),
|
||||
'userMeta': '1',
|
||||
}), headers=mvpd_headers)
|
||||
authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
|
||||
requestor_info[guid] = authz_token
|
||||
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
|
||||
|
||||
mvpd_headers.update({
|
||||
'ap_19': xml_text(authn_token, 'simpleSamlNameID'),
|
||||
'ap_23': xml_text(authn_token, 'simpleSamlSessionIndex'),
|
||||
})
|
||||
|
||||
return self._download_webpage(
|
||||
self._SERVICE_PROVIDER_TEMPLATE % 'shortAuthorize',
|
||||
video_id, 'Retrieving Media Token', data=urlencode_postdata({
|
||||
'authz_token': authz_token,
|
||||
'requestor_id': requestor_id,
|
||||
'session_guid': xml_text(authn_token, 'simpleTokenAuthenticationGuid'),
|
||||
'hashed_guid': 'false',
|
||||
}), headers=mvpd_headers)
|
||||
|
||||
def _real_extract(self, url):
|
||||
url, smuggled_data = unsmuggle_url(url, {})
|
||||
|
||||
|
@@ -1,18 +1,13 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
sanitized_Request,
|
||||
str_to_int,
|
||||
)
|
||||
from ..aes import aes_decrypt_text
|
||||
from .keezmovies import KeezMoviesIE
|
||||
|
||||
|
||||
class Tube8IE(InfoExtractor):
|
||||
class Tube8IE(KeezMoviesIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?tube8\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
|
||||
@@ -33,47 +28,17 @@ class Tube8IE(InfoExtractor):
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
display_id = mobj.group('display_id')
|
||||
webpage, info = self._extract_info(url)
|
||||
|
||||
req = sanitized_Request(url)
|
||||
req.add_header('Cookie', 'age_verified=1')
|
||||
webpage = self._download_webpage(req, display_id)
|
||||
if not info['title']:
|
||||
info['title'] = self._html_search_regex(
|
||||
r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
|
||||
|
||||
flashvars = self._parse_json(
|
||||
self._search_regex(
|
||||
r'flashvars\s*=\s*({.+?});\r?\n', webpage, 'flashvars'),
|
||||
video_id)
|
||||
|
||||
formats = []
|
||||
for key, video_url in flashvars.items():
|
||||
if not isinstance(video_url, compat_str) or not video_url.startswith('http'):
|
||||
continue
|
||||
height = self._search_regex(
|
||||
r'quality_(\d+)[pP]', key, 'height', default=None)
|
||||
if not height:
|
||||
continue
|
||||
if flashvars.get('encrypted') is True:
|
||||
video_url = aes_decrypt_text(
|
||||
video_url, flashvars['video_title'], 32).decode('utf-8')
|
||||
formats.append({
|
||||
'url': video_url,
|
||||
'format_id': '%sp' % height,
|
||||
'height': int(height),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
thumbnail = flashvars.get('image_url')
|
||||
|
||||
title = self._html_search_regex(
|
||||
r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
|
||||
description = self._html_search_regex(
|
||||
r'>Description:</strong>\s*(.+?)\s*<', webpage, 'description', fatal=False)
|
||||
uploader = self._html_search_regex(
|
||||
r'<span class="username">\s*(.+?)\s*<',
|
||||
webpage, 'uploader', fatal=False)
|
||||
duration = int_or_none(flashvars.get('video_duration'))
|
||||
|
||||
like_count = int_or_none(self._search_regex(
|
||||
r'rupVar\s*=\s*"(\d+)"', webpage, 'like count', fatal=False))
|
||||
@@ -86,18 +51,13 @@ class Tube8IE(InfoExtractor):
|
||||
r'<span id="allCommentsCount">(\d+)</span>',
|
||||
webpage, 'comment count', fatal=False))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
info.update({
|
||||
'description': description,
|
||||
'thumbnail': thumbnail,
|
||||
'uploader': uploader,
|
||||
'duration': duration,
|
||||
'view_count': view_count,
|
||||
'like_count': like_count,
|
||||
'dislike_count': dislike_count,
|
||||
'comment_count': comment_count,
|
||||
'age_limit': 18,
|
||||
'formats': formats,
|
||||
}
|
||||
})
|
||||
|
||||
return info
|
||||
|
@@ -15,21 +15,31 @@ from ..utils import (
|
||||
int_or_none,
|
||||
parse_iso8601,
|
||||
qualities,
|
||||
try_get,
|
||||
update_url_query,
|
||||
)
|
||||
|
||||
|
||||
class TVPlayIE(InfoExtractor):
|
||||
IE_DESC = 'TV3Play and related services'
|
||||
_VALID_URL = r'''(?x)https?://(?:www\.)?
|
||||
(?:tvplay(?:\.skaties)?\.lv/parraides|
|
||||
(?:tv3play|play\.tv3)\.lt/programos|
|
||||
tv3play(?:\.tv3)?\.ee/sisu|
|
||||
tv(?:3|6|8|10)play\.se/program|
|
||||
(?:(?:tv3play|viasat4play|tv6play)\.no|tv3play\.dk)/programmer|
|
||||
play\.novatv\.bg/programi
|
||||
)/[^/]+/(?P<id>\d+)
|
||||
'''
|
||||
IE_NAME = 'mtg'
|
||||
IE_DESC = 'MTG services'
|
||||
_VALID_URL = r'''(?x)
|
||||
(?:
|
||||
mtg:|
|
||||
https?://
|
||||
(?:www\.)?
|
||||
(?:
|
||||
tvplay(?:\.skaties)?\.lv/parraides|
|
||||
(?:tv3play|play\.tv3)\.lt/programos|
|
||||
tv3play(?:\.tv3)?\.ee/sisu|
|
||||
(?:tv(?:3|6|8|10)play|viafree)\.se/program|
|
||||
(?:(?:tv3play|viasat4play|tv6play|viafree)\.no|(?:tv3play|viafree)\.dk)/programmer|
|
||||
play\.novatv\.bg/programi
|
||||
)
|
||||
/(?:[^/]+/)+
|
||||
)
|
||||
(?P<id>\d+)
|
||||
'''
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'http://www.tvplay.lv/parraides/vinas-melo-labak/418113?autostart=true',
|
||||
@@ -194,9 +204,22 @@ class TVPlayIE(InfoExtractor):
|
||||
'url': 'http://tvplay.skaties.lv/parraides/vinas-melo-labak/418113?autostart=true',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
# views is null
|
||||
'url': 'http://tvplay.skaties.lv/parraides/tv3-zinas/760183',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
'url': 'http://tv3play.tv3.ee/sisu/kodu-keset-linna/238551?autostart=true',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
'url': 'http://www.viafree.se/program/underhallning/i-like-radio-live/sasong-1/676869',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
'url': 'mtg:418113',
|
||||
'only_matching': True,
|
||||
}
|
||||
]
|
||||
|
||||
@@ -204,13 +227,13 @@ class TVPlayIE(InfoExtractor):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
video = self._download_json(
|
||||
'http://playapi.mtgx.tv/v1/videos/%s' % video_id, video_id, 'Downloading video JSON')
|
||||
'http://playapi.mtgx.tv/v3/videos/%s' % video_id, video_id, 'Downloading video JSON')
|
||||
|
||||
title = video['title']
|
||||
|
||||
try:
|
||||
streams = self._download_json(
|
||||
'http://playapi.mtgx.tv/v1/videos/stream/%s' % video_id,
|
||||
'http://playapi.mtgx.tv/v3/videos/stream/%s' % video_id,
|
||||
video_id, 'Downloading streams JSON')
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
|
||||
@@ -289,8 +312,61 @@ class TVPlayIE(InfoExtractor):
|
||||
'season_number': season_number,
|
||||
'duration': int_or_none(video.get('duration')),
|
||||
'timestamp': parse_iso8601(video.get('created_at')),
|
||||
'view_count': int_or_none(video.get('views', {}).get('total')),
|
||||
'view_count': try_get(video, lambda x: x['views']['total'], int),
|
||||
'age_limit': int_or_none(video.get('age_limit', 0)),
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
|
||||
|
||||
class ViafreeIE(InfoExtractor):
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://
|
||||
(?:www\.)?
|
||||
viafree\.
|
||||
(?:
|
||||
(?:dk|no)/programmer|
|
||||
se/program
|
||||
)
|
||||
/(?:[^/]+/)+(?P<id>[^/?#&]+)
|
||||
'''
|
||||
_TESTS = [{
|
||||
'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
|
||||
'info_dict': {
|
||||
'id': '395375',
|
||||
'ext': 'mp4',
|
||||
'title': 'Husräddarna S02E02',
|
||||
'description': 'md5:4db5c933e37db629b5a2f75dfb34829e',
|
||||
'series': 'Husräddarna',
|
||||
'season': 'Säsong 2',
|
||||
'season_number': 2,
|
||||
'duration': 2576,
|
||||
'timestamp': 1400596321,
|
||||
'upload_date': '20140520',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'add_ie': [TVPlayIE.ie_key()],
|
||||
}, {
|
||||
'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
def suitable(cls, url):
|
||||
return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url)
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
video_id = self._search_regex(
|
||||
r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](?P<id>\d{6,})',
|
||||
webpage, 'video id')
|
||||
|
||||
return self.url_result('mtg:%s' % video_id, TVPlayIE.ie_key())
|
||||
|
@@ -7,6 +7,7 @@ import random
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_HTTPError,
|
||||
compat_parse_qs,
|
||||
compat_str,
|
||||
compat_urllib_parse_urlencode,
|
||||
@@ -14,13 +15,13 @@ from ..compat import (
|
||||
compat_urlparse,
|
||||
)
|
||||
from ..utils import (
|
||||
clean_html,
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
js_to_json,
|
||||
orderedSet,
|
||||
parse_duration,
|
||||
parse_iso8601,
|
||||
sanitized_Request,
|
||||
urlencode_postdata,
|
||||
)
|
||||
|
||||
@@ -42,7 +43,7 @@ class TwitchBaseIE(InfoExtractor):
|
||||
'%s returned error: %s - %s' % (self.IE_NAME, error, response.get('message')),
|
||||
expected=True)
|
||||
|
||||
def _download_json(self, url, video_id, note='Downloading JSON metadata'):
|
||||
def _call_api(self, path, item_id, note):
|
||||
headers = {
|
||||
'Referer': 'http://api.twitch.tv/crossdomain/receiver.html?v=2',
|
||||
'X-Requested-With': 'XMLHttpRequest',
|
||||
@@ -50,8 +51,8 @@ class TwitchBaseIE(InfoExtractor):
|
||||
for cookie in self._downloader.cookiejar:
|
||||
if cookie.name == 'api_token':
|
||||
headers['Twitch-Api-Token'] = cookie.value
|
||||
request = sanitized_Request(url, headers=headers)
|
||||
response = super(TwitchBaseIE, self)._download_json(request, video_id, note)
|
||||
response = self._download_json(
|
||||
'%s/%s' % (self._API_BASE, path), item_id, note)
|
||||
self._handle_error(response)
|
||||
return response
|
||||
|
||||
@@ -63,9 +64,17 @@ class TwitchBaseIE(InfoExtractor):
|
||||
if username is None:
|
||||
return
|
||||
|
||||
def fail(message):
|
||||
raise ExtractorError(
|
||||
'Unable to login. Twitch said: %s' % message, expected=True)
|
||||
|
||||
login_page, handle = self._download_webpage_handle(
|
||||
self._LOGIN_URL, None, 'Downloading login page')
|
||||
|
||||
# Some TOR nodes and public proxies are blocked completely
|
||||
if 'blacklist_message' in login_page:
|
||||
fail(clean_html(login_page))
|
||||
|
||||
login_form = self._hidden_inputs(login_page)
|
||||
|
||||
login_form.update({
|
||||
@@ -82,21 +91,24 @@ class TwitchBaseIE(InfoExtractor):
|
||||
if not post_url.startswith('http'):
|
||||
post_url = compat_urlparse.urljoin(redirect_url, post_url)
|
||||
|
||||
request = sanitized_Request(
|
||||
post_url, urlencode_postdata(login_form))
|
||||
request.add_header('Referer', redirect_url)
|
||||
response = self._download_webpage(
|
||||
request, None, 'Logging in as %s' % username)
|
||||
headers = {'Referer': redirect_url}
|
||||
|
||||
error_message = self._search_regex(
|
||||
r'<div[^>]+class="subwindow_notice"[^>]*>([^<]+)</div>',
|
||||
response, 'error message', default=None)
|
||||
if error_message:
|
||||
raise ExtractorError(
|
||||
'Unable to login. Twitch said: %s' % error_message, expected=True)
|
||||
try:
|
||||
response = self._download_json(
|
||||
post_url, None, 'Logging in as %s' % username,
|
||||
data=urlencode_postdata(login_form),
|
||||
headers=headers)
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
|
||||
response = self._parse_json(
|
||||
e.cause.read().decode('utf-8'), None)
|
||||
fail(response['message'])
|
||||
raise
|
||||
|
||||
if '>Reset your password<' in response:
|
||||
self.report_warning('Twitch asks you to reset your password, go to https://secure.twitch.tv/reset/submit')
|
||||
if response.get('redirect'):
|
||||
self._download_webpage(
|
||||
response['redirect'], None, 'Downloading login redirect page',
|
||||
headers=headers)
|
||||
|
||||
def _prefer_source(self, formats):
|
||||
try:
|
||||
@@ -109,14 +121,14 @@ class TwitchBaseIE(InfoExtractor):
|
||||
|
||||
class TwitchItemBaseIE(TwitchBaseIE):
|
||||
def _download_info(self, item, item_id):
|
||||
return self._extract_info(self._download_json(
|
||||
'%s/kraken/videos/%s%s' % (self._API_BASE, item, item_id), item_id,
|
||||
return self._extract_info(self._call_api(
|
||||
'kraken/videos/%s%s' % (item, item_id), item_id,
|
||||
'Downloading %s info JSON' % self._ITEM_TYPE))
|
||||
|
||||
def _extract_media(self, item_id):
|
||||
info = self._download_info(self._ITEM_SHORTCUT, item_id)
|
||||
response = self._download_json(
|
||||
'%s/api/videos/%s%s' % (self._API_BASE, self._ITEM_SHORTCUT, item_id), item_id,
|
||||
response = self._call_api(
|
||||
'api/videos/%s%s' % (self._ITEM_SHORTCUT, item_id), item_id,
|
||||
'Downloading %s playlist JSON' % self._ITEM_TYPE)
|
||||
entries = []
|
||||
chunks = response['chunks']
|
||||
@@ -246,8 +258,8 @@ class TwitchVodIE(TwitchItemBaseIE):
|
||||
item_id = self._match_id(url)
|
||||
|
||||
info = self._download_info(self._ITEM_SHORTCUT, item_id)
|
||||
access_token = self._download_json(
|
||||
'%s/api/vods/%s/access_token' % (self._API_BASE, item_id), item_id,
|
||||
access_token = self._call_api(
|
||||
'api/vods/%s/access_token' % item_id, item_id,
|
||||
'Downloading %s access token' % self._ITEM_TYPE)
|
||||
|
||||
formats = self._extract_m3u8_formats(
|
||||
@@ -275,12 +287,12 @@ class TwitchVodIE(TwitchItemBaseIE):
|
||||
|
||||
|
||||
class TwitchPlaylistBaseIE(TwitchBaseIE):
|
||||
_PLAYLIST_URL = '%s/kraken/channels/%%s/videos/?offset=%%d&limit=%%d' % TwitchBaseIE._API_BASE
|
||||
_PLAYLIST_PATH = 'kraken/channels/%s/videos/?offset=%d&limit=%d'
|
||||
_PAGE_LIMIT = 100
|
||||
|
||||
def _extract_playlist(self, channel_id):
|
||||
info = self._download_json(
|
||||
'%s/kraken/channels/%s' % (self._API_BASE, channel_id),
|
||||
info = self._call_api(
|
||||
'kraken/channels/%s' % channel_id,
|
||||
channel_id, 'Downloading channel info JSON')
|
||||
channel_name = info.get('display_name') or info.get('name')
|
||||
entries = []
|
||||
@@ -289,8 +301,8 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
|
||||
broken_paging_detected = False
|
||||
counter_override = None
|
||||
for counter in itertools.count(1):
|
||||
response = self._download_json(
|
||||
self._PLAYLIST_URL % (channel_id, offset, limit),
|
||||
response = self._call_api(
|
||||
self._PLAYLIST_PATH % (channel_id, offset, limit),
|
||||
channel_id,
|
||||
'Downloading %s videos JSON page %s'
|
||||
% (self._PLAYLIST_TYPE, counter_override or counter))
|
||||
@@ -345,7 +357,7 @@ class TwitchProfileIE(TwitchPlaylistBaseIE):
|
||||
class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
|
||||
IE_NAME = 'twitch:past_broadcasts'
|
||||
_VALID_URL = r'%s/(?P<id>[^/]+)/profile/past_broadcasts/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
|
||||
_PLAYLIST_URL = TwitchPlaylistBaseIE._PLAYLIST_URL + '&broadcasts=true'
|
||||
_PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcasts=true'
|
||||
_PLAYLIST_TYPE = 'past broadcasts'
|
||||
|
||||
_TEST = {
|
||||
@@ -389,8 +401,8 @@ class TwitchStreamIE(TwitchBaseIE):
|
||||
def _real_extract(self, url):
|
||||
channel_id = self._match_id(url)
|
||||
|
||||
stream = self._download_json(
|
||||
'%s/kraken/streams/%s' % (self._API_BASE, channel_id), channel_id,
|
||||
stream = self._call_api(
|
||||
'kraken/streams/%s' % channel_id, channel_id,
|
||||
'Downloading stream JSON').get('stream')
|
||||
|
||||
# Fallback on profile extraction if stream is offline
|
||||
@@ -405,8 +417,8 @@ class TwitchStreamIE(TwitchBaseIE):
|
||||
# JSON and fallback to lowercase if it's not available.
|
||||
channel_id = stream.get('channel', {}).get('name') or channel_id.lower()
|
||||
|
||||
access_token = self._download_json(
|
||||
'%s/api/channels/%s/access_token' % (self._API_BASE, channel_id), channel_id,
|
||||
access_token = self._call_api(
|
||||
'api/channels/%s/access_token' % channel_id, channel_id,
|
||||
'Downloading channel access token')
|
||||
|
||||
query = {
|
||||
|
70
youtube_dl/extractor/uplynk.py
Normal file
@@ -0,0 +1,70 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
float_or_none,
|
||||
ExtractorError,
|
||||
)
|
||||
|
||||
|
||||
class UplynkIE(InfoExtractor):
|
||||
IE_NAME = 'uplynk'
|
||||
_VALID_URL = r'https?://.*?\.uplynk\.com/(?P<path>ext/[0-9a-f]{32}/(?P<external_id>[^/?&]+)|(?P<id>[0-9a-f]{32}))\.(?:m3u8|json)(?:.*?\bpbs=(?P<session_id>[^&]+))?'
|
||||
_TEST = {
|
||||
'url': 'http://content.uplynk.com/e89eaf2ce9054aa89d92ddb2d817a52e.m3u8',
|
||||
'info_dict': {
|
||||
'id': 'e89eaf2ce9054aa89d92ddb2d817a52e',
|
||||
'ext': 'mp4',
|
||||
'title': '030816-kgo-530pm-solar-eclipse-vid_web.mp4',
|
||||
'uploader_id': '4413701bf5a1488db55b767f8ae9d4fa',
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
|
||||
def _extract_uplynk_info(self, uplynk_content_url):
|
||||
path, external_id, video_id, session_id = re.match(UplynkIE._VALID_URL, uplynk_content_url).groups()
|
||||
display_id = video_id or external_id
|
||||
formats = self._extract_m3u8_formats('http://content.uplynk.com/%s.m3u8' % path, display_id, 'mp4')
|
||||
if session_id:
|
||||
for f in formats:
|
||||
f['extra_param_to_segment_url'] = {
|
||||
'pbs': session_id,
|
||||
}
|
||||
self._sort_formats(formats)
|
||||
asset = self._download_json('http://content.uplynk.com/player/assetinfo/%s.json' % path, display_id)
|
||||
if asset.get('error') == 1:
|
||||
raise ExtractorError('%s said: %s' % (self.IE_NAME, asset['msg']), expected=True)
|
||||
|
||||
return {
|
||||
'id': asset['asset'],
|
||||
'title': asset['desc'],
|
||||
'thumbnail': asset.get('default_poster_url'),
|
||||
'duration': float_or_none(asset.get('duration')),
|
||||
'uploader_id': asset.get('owner'),
|
||||
'formats': formats,
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
return self._extract_uplynk_info(url)
|
||||
|
||||
|
||||
class UplynkPreplayIE(UplynkIE):
|
||||
IE_NAME = 'uplynk:preplay'
|
||||
_VALID_URL = r'https?://.*?\.uplynk\.com/preplay2?/(?P<path>ext/[0-9a-f]{32}/(?P<external_id>[^/?&]+)|(?P<id>[0-9a-f]{32}))\.json'
|
||||
_TEST = None
|
||||
|
||||
def _real_extract(self, url):
|
||||
path, external_id, video_id = re.match(self._VALID_URL, url).groups()
|
||||
display_id = video_id or external_id
|
||||
preplay = self._download_json(url, display_id)
|
||||
content_url = 'http://content.uplynk.com/%s.m3u8' % path
|
||||
session_id = preplay.get('sid')
|
||||
if session_id:
|
||||
content_url += '?pbs=' + session_id
|
||||
return self._extract_uplynk_info(content_url)
|
@@ -1,12 +1,14 @@
|
||||
# encoding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import urlencode_postdata
|
||||
|
||||
|
||||
class Vbox7IE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?vbox7\.com/play:(?P<id>[^/]+)'
|
||||
_VALID_URL = r'https?://(?:www\.)?vbox7\.com/(?:play:|emb/external\.php\?.*?\bvid=)(?P<id>[\da-fA-F]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://vbox7.com/play:0946fff23c',
|
||||
'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
|
||||
@@ -24,15 +26,27 @@ class Vbox7IE(InfoExtractor):
|
||||
'title': 'Смях! Чудо - чист за секунди - Скрита камера',
|
||||
},
|
||||
'skip': 'georestricted',
|
||||
}, {
|
||||
'url': 'http://vbox7.com/emb/external.php?vid=a240d20f9c&autoplay=1',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
def _extract_url(webpage):
|
||||
mobj = re.search(
|
||||
'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
|
||||
webpage)
|
||||
if mobj:
|
||||
return mobj.group('url')
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
webpage = self._download_webpage(
|
||||
'http://vbox7.com/play:%s' % video_id, video_id)
|
||||
|
||||
title = self._html_search_regex(
|
||||
r'<title>(.*)</title>', webpage, 'title').split('/')[0].strip()
|
||||
r'<title>(.+?)</title>', webpage, 'title').split('/')[0].strip()
|
||||
|
||||
video_url = self._search_regex(
|
||||
r'src\s*:\s*(["\'])(?P<url>.+?.mp4.*?)\1',
|
||||
|
@@ -8,6 +8,7 @@ from .xstream import XstreamIE
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
float_or_none,
|
||||
try_get,
|
||||
)
|
||||
|
||||
|
||||
@@ -129,6 +130,11 @@ class VGTVIE(XstreamIE):
|
||||
'url': 'http://ap.vgtv.no/webtv#!/video/111084/de-nye-bysyklene-lettere-bedre-gir-stoerre-hjul-og-feste-til-mobil',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
# geoblocked
|
||||
'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
|
||||
'only_matching': True,
|
||||
},
|
||||
]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@@ -196,6 +202,12 @@ class VGTVIE(XstreamIE):
|
||||
|
||||
info['formats'].extend(formats)
|
||||
|
||||
if not info['formats']:
|
||||
properties = try_get(
|
||||
data, lambda x: x['streamConfiguration']['properties'], list)
|
||||
if properties and 'geoblocked' in properties:
|
||||
raise self.raise_geo_restricted()
|
||||
|
||||
self._sort_formats(info['formats'])
|
||||
|
||||
info.update({
|
||||
|
107
youtube_dl/extractor/viceland.py
Normal file
@@ -0,0 +1,107 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import time
|
||||
import hashlib
|
||||
import json
|
||||
|
||||
from .adobepass import AdobePassIE
|
||||
from ..compat import compat_HTTPError
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
parse_age_limit,
|
||||
str_or_none,
|
||||
parse_duration,
|
||||
ExtractorError,
|
||||
extract_attributes,
|
||||
)
|
||||
|
||||
|
||||
class VicelandIE(AdobePassIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)'
|
||||
_TEST = {
|
||||
'url': 'https://www.viceland.com/en_us/video/cyberwar-trailer/57608447973ee7705f6fbd4e',
|
||||
'info_dict': {
|
||||
'id': '57608447973ee7705f6fbd4e',
|
||||
'ext': 'mp4',
|
||||
'title': 'CYBERWAR (Trailer)',
|
||||
'description': 'Tapping into the geopolitics of hacking and surveillance, Ben Makuch travels the world to meet with hackers, government officials, and dissidents to investigate the ecosystem of cyberwarfare.',
|
||||
'age_limit': 14,
|
||||
'timestamp': 1466008539,
|
||||
'upload_date': '20160615',
|
||||
'uploader_id': '11',
|
||||
'uploader': 'Viceland',
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
'add_ie': ['UplynkPreplay'],
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
watch_hub_data = extract_attributes(self._search_regex(
|
||||
r'(?s)(<watch-hub\s*.+?</watch-hub>)', webpage, 'watch hub'))
|
||||
video_id = watch_hub_data['vms-id']
|
||||
title = watch_hub_data['video-title']
|
||||
|
||||
query = {}
|
||||
if watch_hub_data.get('video-locked') == '1':
|
||||
resource = self._get_mvpd_resource(
|
||||
'VICELAND', title, video_id,
|
||||
watch_hub_data.get('video-rating'))
|
||||
query['tvetoken'] = self._extract_mvpd_auth(url, video_id, 'VICELAND', resource)
|
||||
|
||||
# signature generation algorithm is reverse engineered from signatureGenerator in
|
||||
# webpack:///../shared/~/vice-player/dist/js/vice-player.js in
|
||||
# https://www.viceland.com/assets/common/js/web.vendor.bundle.js
|
||||
exp = int(time.time()) + 14400
|
||||
query.update({
|
||||
'exp': exp,
|
||||
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
|
||||
})
|
||||
|
||||
try:
|
||||
preplay = self._download_json('https://www.viceland.com/en_us/preplay/%s' % video_id, video_id, query=query)
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
|
||||
error = json.loads(e.cause.read().decode())
|
||||
raise ExtractorError('%s said: %s' % (self.IE_NAME, error['details']), expected=True)
|
||||
raise
|
||||
|
||||
video_data = preplay['video']
|
||||
base = video_data['base']
|
||||
uplynk_preplay_url = preplay['preplayURL']
|
||||
episode = video_data.get('episode', {})
|
||||
channel = video_data.get('channel', {})
|
||||
|
||||
subtitles = {}
|
||||
cc_url = preplay.get('ccURL')
|
||||
if cc_url:
|
||||
subtitles['en'] = [{
|
||||
'url': cc_url,
|
||||
}]
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'url': uplynk_preplay_url,
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': base.get('body'),
|
||||
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
|
||||
'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
|
||||
'timestamp': int_or_none(video_data.get('created_at')),
|
||||
'age_limit': parse_age_limit(video_data.get('video_rating')),
|
||||
'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
|
||||
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
|
||||
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
|
||||
'season_number': int_or_none(watch_hub_data.get('season')),
|
||||
'season_id': str_or_none(episode.get('season_id')),
|
||||
'uploader': channel.get('base', {}).get('title') or watch_hub_data.get('channel-title'),
|
||||
'uploader_id': str_or_none(channel.get('id')),
|
||||
'subtitles': subtitles,
|
||||
'ie_key': 'UplynkPreplay',
|
||||
}
|
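The sign parameter assembled in the new extractor is simply a SHA-512 digest over 'video_id:GET:expiry', as the reverse-engineering comment in the code notes. Isolated, the computation looks like this (the id and expiry are example values):

    import time
    import hashlib

    video_id = '57608447973ee7705f6fbd4e'  # id from the test case above
    exp = int(time.time()) + 14400         # signature valid for four hours
    sign = hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest()
    query = {'exp': exp, 'sign': sign}     # sent along with the preplay request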
@@ -1,6 +1,7 @@
# encoding: utf-8
from __future__ import unicode_literals

import collections
import re
import json
import sys
@@ -16,7 +17,6 @@ from ..utils import (
    get_element_by_class,
    int_or_none,
    orderedSet,
    parse_duration,
    remove_start,
    str_to_int,
    unescapeHTML,
@@ -52,8 +52,9 @@ class VKBaseIE(InfoExtractor):
        # what actually happens.
        # We will workaround this VK issue by resetting the remixlhk cookie to
        # the first one manually.
        cookies = url_handle.headers.get('Set-Cookie')
        if cookies:
        for header, cookies in url_handle.headers.items():
            if header.lower() != 'set-cookie':
                continue
            if sys.version_info[0] >= 3:
                cookies = cookies.encode('iso-8859-1')
                cookies = cookies.decode('utf-8')
@@ -61,6 +62,7 @@ class VKBaseIE(InfoExtractor):
            if remixlhk:
                value, domain = remixlhk.groups()
                self._set_cookie(domain, 'remixlhk', value)
                break

        login_page = self._download_webpage(
            'https://login.vk.com/?act=login', None,
@@ -445,6 +447,9 @@ class VKWallPostIE(VKBaseIE):
                'skip_download': True,
            },
        }],
        'params': {
            'usenetrc': True,
        },
        'skip': 'Requires vk account credentials',
    }, {
        # single YouTube embed, no leading -
@@ -454,6 +459,9 @@ class VKWallPostIE(VKBaseIE):
            'title': 'Sergey Gorbunov - Wall post 85155021_6319',
        },
        'playlist_count': 1,
        'params': {
            'usenetrc': True,
        },
        'skip': 'Requires vk account credentials',
    }, {
        # wall page URL
@@ -481,37 +489,41 @@ class VKWallPostIE(VKBaseIE):
            raise ExtractorError('VK said: %s' % error, expected=True)

        description = clean_html(get_element_by_class('wall_post_text', webpage))
        uploader = clean_html(get_element_by_class(
            'fw_post_author', webpage)) or self._og_search_description(webpage)
        uploader = clean_html(get_element_by_class('author', webpage))
        thumbnail = self._og_search_thumbnail(webpage)

        entries = []

        for audio in re.finditer(r'''(?sx)
                            <input[^>]+
                                id=(?P<q1>["\'])audio_info(?P<id>\d+_\d+).*?(?P=q1)[^>]+
                                value=(?P<q2>["\'])(?P<url>http.+?)(?P=q2)
                            .+?
                        </table>''', webpage):
            audio_html = audio.group(0)
            audio_id = audio.group('id')
            duration = parse_duration(get_element_by_class('duration', audio_html))
            track = self._html_search_regex(
                r'<span[^>]+id=["\']title%s[^>]*>([^<]+)' % audio_id,
                audio_html, 'title', default=None)
            artist = self._html_search_regex(
                r'>([^<]+)</a></b>\s*&ndash', audio_html,
                'artist', default=None)
            entries.append({
                'id': audio_id,
                'url': audio.group('url'),
                'title': '%s - %s' % (artist, track) if artist and track else audio_id,
                'thumbnail': thumbnail,
                'duration': duration,
                'uploader': uploader,
                'artist': artist,
                'track': track,
            })
        audio_ids = re.findall(r'data-full-id=["\'](\d+_\d+)', webpage)
        if audio_ids:
            al_audio = self._download_webpage(
                'https://vk.com/al_audio.php', post_id,
                note='Downloading audio info', fatal=False,
                data=urlencode_postdata({
                    'act': 'reload_audio',
                    'al': '1',
                    'ids': ','.join(audio_ids)
                }))
            if al_audio:
                Audio = collections.namedtuple(
                    'Audio', ['id', 'user_id', 'url', 'track', 'artist', 'duration'])
                audios = self._parse_json(
                    self._search_regex(
                        r'<!json>(.+?)<!>', al_audio, 'audios', default='[]'),
                    post_id, fatal=False, transform_source=unescapeHTML)
                if isinstance(audios, list):
                    for audio in audios:
                        a = Audio._make(audio[:6])
                        entries.append({
                            'id': '%s_%s' % (a.user_id, a.id),
                            'url': a.url,
                            'title': '%s - %s' % (a.artist, a.track) if a.artist and a.track else a.id,
                            'thumbnail': thumbnail,
                            'duration': a.duration,
                            'uploader': uploader,
                            'artist': a.artist,
                            'track': a.track,
                        })

        for video in re.finditer(
                r'<a[^>]+href=(["\'])(?P<url>/video(?:-?[\d_]+).*?)\1', webpage):
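The al_audio.php response parsed above comes back as fixed-position arrays, which the new code maps onto a namedtuple before building entries. A small sketch with a made-up row (the field order matches the namedtuple definition in the diff):

    import collections

    Audio = collections.namedtuple(
        'Audio', ['id', 'user_id', 'url', 'track', 'artist', 'duration'])

    # hypothetical row: id, owner id, URL, track, artist, duration, plus extra fields
    row = [6319, 85155021, 'https://cs1.example.com/audio.mp3', 'Track', 'Artist', 215, 'extra']
    a = Audio._make(row[:6])  # only the first six positions are used
    entry = {
        'id': '%s_%s' % (a.user_id, a.id),
        'url': a.url,
        'title': '%s - %s' % (a.artist, a.track) if a.artist and a.track else a.id,
        'duration': a.duration,
    }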
@@ -17,12 +17,12 @@ class VuClipIE(InfoExtractor):
    _VALID_URL = r'https?://(?:m\.)?vuclip\.com/w\?.*?cid=(?P<id>[0-9]+)'

    _TEST = {
        'url': 'http://m.vuclip.com/w?cid=922692425&fid=70295&z=1010&nvar&frm=index.html',
        'url': 'http://m.vuclip.com/w?cid=1129900602&bu=8589892792&frm=w&z=34801&op=0&oc=843169247&section=recommend',
        'info_dict': {
            'id': '922692425',
            'id': '1129900602',
            'ext': '3gp',
            'title': 'The Toy Soldiers - Hollywood Movie Trailer',
            'duration': 177,
            'title': 'Top 10 TV Convicts',
            'duration': 733,
        }
    }

@@ -54,7 +54,7 @@ class VuClipIE(InfoExtractor):
                'url': video_url,
            }]
        else:
            formats = self._parse_html5_media_entries(url, webpage)[0]['formats']
            formats = self._parse_html5_media_entries(url, webpage, video_id)[0]['formats']

        title = remove_end(self._html_search_regex(
            r'<title>(.*?)-\s*Vuclip</title>', webpage, 'title').strip(), ' - Video')
@@ -13,6 +13,7 @@ class XiamiBaseIE(InfoExtractor):
        webpage = super(XiamiBaseIE, self)._download_webpage(*args, **kwargs)
        if '>Xiami is currently not available in your country.<' in webpage:
            self.raise_geo_restricted('Xiami is currently not available in your country')
        return webpage

    def _extract_track(self, track, track_id=None):
        title = track['title']
@@ -15,10 +15,10 @@ class XVideosIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?xvideos\.com/video(?P<id>[0-9]+)(?:.*)'
    _TEST = {
        'url': 'http://www.xvideos.com/video4588838/biker_takes_his_girl',
        'md5': '4b46ae6ea5e6e9086e714d883313c0c9',
        'md5': '14cea69fcb84db54293b1e971466c2e1',
        'info_dict': {
            'id': '4588838',
            'ext': 'flv',
            'ext': 'mp4',
            'title': 'Biker Takes his Girl',
            'age_limit': 18,
        }
@@ -42,24 +42,24 @@ class XVideosIE(InfoExtractor):
        video_url = compat_urllib_parse_unquote(self._search_regex(
            r'flv_url=(.+?)&', webpage, 'video URL', default=''))
        if video_url:
            formats.append({'url': video_url})
            formats.append({
                'url': video_url,
                'format_id': 'flv',
            })

        player_args = self._search_regex(
            r'(?s)new\s+HTML5Player\((.+?)\)', webpage, ' html5 player', default=None)
        if player_args:
            for arg in player_args.split(','):
                format_url = self._search_regex(
                    r'(["\'])(?P<url>https?://.+?)\1', arg, 'url',
                    default=None, group='url')
                if not format_url:
                    continue
                ext = determine_ext(format_url)
                if ext == 'mp4':
                    formats.append({'url': format_url})
                elif ext == 'm3u8':
                    formats.extend(self._extract_m3u8_formats(
                        format_url, video_id, 'mp4',
                        entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
        for kind, _, format_url in re.findall(
                r'setVideo([^(]+)\((["\'])(http.+?)\2\)', webpage):
            format_id = kind.lower()
            if format_id == 'hls':
                formats.extend(self._extract_m3u8_formats(
                    format_url, video_id, 'mp4',
                    entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
            elif format_id in ('urllow', 'urlhigh'):
                formats.append({
                    'url': format_url,
                    'format_id': '%s-%s' % (determine_ext(format_url, 'mp4'), format_id[3:]),
                    'quality': -2 if format_id.endswith('low') else None,
                })

        self._sort_formats(formats)
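The rewritten extraction keys off the player's setVideo*() calls instead of splitting HTML5Player arguments. Run against a made-up page fragment, the new regex yields the kind/URL pairs the loop consumes:

    import re

    # invented fragment mimicking the player setup calls on a watch page
    webpage = '''
        html5player.setVideoUrlLow('http://cdn.example.com/low.mp4');
        html5player.setVideoUrlHigh('http://cdn.example.com/high.mp4');
        html5player.setVideoHLS("http://cdn.example.com/hls.m3u8");
    '''
    for kind, _, format_url in re.findall(
            r'setVideo([^(]+)\((["\'])(http.+?)\2\)', webpage):
        print(kind.lower(), format_url)
    # urllow http://cdn.example.com/low.mp4
    # urlhigh http://cdn.example.com/high.mp4
    # hls http://cdn.example.com/hls.m3u8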
@@ -91,36 +91,18 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
        if login_page is False:
            return

        galx = self._search_regex(r'(?s)<input.+?name="GALX".+?value="(.+?)"',
                                  login_page, 'Login GALX parameter')
        login_form = self._hidden_inputs(login_page)

        # Log in
        login_form_strs = {
            'continue': 'https://www.youtube.com/signin?action_handle_signin=true&feature=sign_in_button&hl=en_US&nomobiletemp=1',
        login_form.update({
            'checkConnection': 'youtube',
            'Email': username,
            'GALX': galx,
            'Passwd': password,

            'PersistentCookie': 'yes',
            '_utf8': '霱',
            'bgresponse': 'js_disabled',
            'checkConnection': '',
            'checkedDomains': 'youtube',
            'dnConn': '',
            'pstMsg': '0',
            'rmShown': '1',
            'secTok': '',
            'signIn': 'Sign in',
            'timeStmp': '',
            'service': 'youtube',
            'uilel': '3',
            'hl': 'en_US',
        }
        })

        login_results = self._download_webpage(
            self._PASSWORD_CHALLENGE_URL, None,
            note='Logging in', errnote='unable to log in', fatal=False,
            data=urlencode_postdata(login_form_strs))
            data=urlencode_postdata(login_form))
        if login_results is False:
            return False
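_hidden_inputs() replaces the hand-maintained GALX form: roughly, it collects the hidden <input> fields on the login page into a dict, which the extractor then tops up with the credentials before posting. A rough standalone approximation of the idea (not youtube-dl's exact helper; the page fragment and credentials are invented):

    import re

    def hidden_inputs(html):
        # collect name/value pairs from <input type="hidden" ...> tags
        form = {}
        for input_tag in re.findall(r'<input[^>]+type=["\']hidden["\'][^>]*>', html):
            name = re.search(r'name=["\']([^"\']+)', input_tag)
            value = re.search(r'value=["\']([^"\']*)', input_tag)
            if name:
                form[name.group(1)] = value.group(1) if value else ''
        return form

    page = ('<input type="hidden" name="GALX" value="abc123">'
            '<input type="hidden" name="continue" value="https://www.youtube.com/signin">')
    login_form = hidden_inputs(page)
    login_form.update({'Email': 'user@example.com', 'Passwd': 'hunter2', 'checkConnection': 'youtube'})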
@@ -4,13 +4,17 @@ from __future__ import unicode_literals
import re

from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
    ExtractorError,
    int_or_none,
    update_url_query,
)


class ZingMp3BaseInfoExtractor(InfoExtractor):

    def _extract_item(self, item, fatal=True):
        error_message = item.find('./errormessage').text
    def _extract_item(self, item, page_type, fatal=True):
        error_message = item.get('msg')
        if error_message:
            if not fatal:
                return
@@ -18,25 +22,48 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
                '%s returned error: %s' % (self.IE_NAME, error_message),
                expected=True)

        title = item.find('./title').text.strip()
        source = item.find('./source').text
        extension = item.attrib['type']
        thumbnail = item.find('./backimage').text
        formats = []
        for quality, source_url in zip(item.get('qualities') or item.get('quality', []), item.get('source_list') or item.get('source', [])):
            if not source_url or source_url == 'require vip':
                continue
            if not re.match(r'https?://', source_url):
                source_url = '//' + source_url
            source_url = self._proto_relative_url(source_url, 'http:')
            quality_num = int_or_none(quality)
            f = {
                'format_id': quality,
                'url': source_url,
            }
            if page_type == 'video':
                f.update({
                    'height': quality_num,
                    'ext': 'mp4',
                })
            else:
                f.update({
                    'abr': quality_num,
                    'ext': 'mp3',
                })
            formats.append(f)

        cover = item.get('cover')

        return {
            'title': title,
            'url': source,
            'ext': extension,
            'thumbnail': thumbnail,
            'title': (item.get('name') or item.get('title')).strip(),
            'formats': formats,
            'thumbnail': 'http:/' + cover if cover else None,
            'artist': item.get('artist'),
        }

    def _extract_player_xml(self, player_xml_url, id, playlist_title=None):
        player_xml = self._download_xml(player_xml_url, id, 'Downloading Player XML')
        items = player_xml.findall('./item')
    def _extract_player_json(self, player_json_url, id, page_type, playlist_title=None):
        player_json = self._download_json(player_json_url, id, 'Downloading Player JSON')
        items = player_json['data']
        if 'item' in items:
            items = items['item']

        if len(items) == 1:
            # one single song
            data = self._extract_item(items[0])
            data = self._extract_item(items[0], page_type)
            data['id'] = id

            return data
@@ -45,7 +72,7 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
        entries = []

        for i, item in enumerate(items, 1):
            entry = self._extract_item(item, fatal=False)
            entry = self._extract_item(item, page_type, fatal=False)
            if not entry:
                continue
            entry['id'] = '%s-%d' % (id, i)
@@ -59,8 +86,8 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
        }


class ZingMp3SongIE(ZingMp3BaseInfoExtractor):
    _VALID_URL = r'https?://mp3\.zing\.vn/bai-hat/(?P<slug>[^/]+)/(?P<song_id>\w+)\.html'
class ZingMp3IE(ZingMp3BaseInfoExtractor):
    _VALID_URL = r'https?://mp3\.zing\.vn/(?:bai-hat|album|playlist|video-clip)/[^/]+/(?P<id>\w+)\.html'
    _TESTS = [{
        'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
        'md5': 'ead7ae13693b3205cbc89536a077daed',
@@ -70,51 +97,47 @@ class ZingMp3SongIE(ZingMp3BaseInfoExtractor):
            'ext': 'mp3',
            'thumbnail': 're:^https?://.*\.jpg$',
        },
    }]
    IE_NAME = 'zingmp3:song'
    IE_DESC = 'mp3.zing.vn songs'

    def _real_extract(self, url):
        matched = re.match(self._VALID_URL, url)
        slug = matched.group('slug')
        song_id = matched.group('song_id')

        webpage = self._download_webpage(
            'http://mp3.zing.vn/bai-hat/%s/%s.html' % (slug, song_id), song_id)

        player_xml_url = self._search_regex(
            r'&xmlURL=(?P<xml_url>[^&]+)&', webpage, 'player xml url')

        return self._extract_player_xml(player_xml_url, song_id)


class ZingMp3AlbumIE(ZingMp3BaseInfoExtractor):
    _VALID_URL = r'https?://mp3\.zing\.vn/(?:album|playlist)/(?P<slug>[^/]+)/(?P<album_id>\w+)\.html'
    _TESTS = [{
    }, {
        'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
        'md5': '870295a9cd8045c0e15663565902618d',
        'info_dict': {
            'id': 'ZW6BAEA0',
            'title': 'Let It Go (Frozen OST)',
            'ext': 'mp4',
        },
    }, {
        'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
        'info_dict': {
            '_type': 'playlist',
            'id': 'ZWZBWDAF',
            'title': 'Lâu Đài Tình Ái - Bằng Kiều ft. Minh Tuyết | Album 320 lossless',
            'title': 'Lâu Đài Tình Ái - Bằng Kiều,Minh Tuyết | Album 320 lossless',
        },
        'playlist_count': 10,
        'skip': 'removed at the request of the owner',
    }, {
        'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
        'only_matching': True,
    }]
    IE_NAME = 'zingmp3:album'
    IE_DESC = 'mp3.zing.vn albums'
    IE_NAME = 'zingmp3'
    IE_DESC = 'mp3.zing.vn'

    def _real_extract(self, url):
        matched = re.match(self._VALID_URL, url)
        slug = matched.group('slug')
        album_id = matched.group('album_id')
        page_id = self._match_id(url)

        webpage = self._download_webpage(
            'http://mp3.zing.vn/album/%s/%s.html' % (slug, album_id), album_id)
        player_xml_url = self._search_regex(
            r'&xmlURL=(?P<xml_url>[^&]+)&', webpage, 'player xml url')
        webpage = self._download_webpage(url, page_id)

        return self._extract_player_xml(
            player_xml_url, album_id,
            playlist_title=self._og_search_title(webpage))
        player_json_url = self._search_regex([
            r'data-xml="([^"]+)',
            r'&xmlURL=([^&]+)&'
        ], webpage, 'player xml url')

        playlist_title = None
        page_type = self._search_regex(r'/(?:html5)?xml/([^/-]+)', player_json_url, 'page type')
        if page_type == 'video':
            player_json_url = update_url_query(player_json_url, {'format': 'json'})
        else:
            player_json_url = player_json_url.replace('/xml/', '/html5xml/')
        if page_type == 'album':
            playlist_title = self._og_search_title(webpage)

        return self._extract_player_json(player_json_url, page_id, page_type, playlist_title)
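The switch from the XML player feed to the JSON endpoint pairs each quality with a source URL and then branches on the page type to decide whether the number is a video height or an audio bitrate. A condensed sketch of that loop with an invented item:

    # hypothetical JSON item shaped like the fields the extractor reads
    item = {
        'name': 'Xa Mai Xa',
        'qualities': ['128', '320'],
        'source_list': ['//mp3.example.vn/song-128.mp3', 'require vip'],
    }
    page_type = 'audio'  # anything other than 'video' is treated as audio

    formats = []
    for quality, source_url in zip(item['qualities'], item['source_list']):
        if not source_url or source_url == 'require vip':
            continue
        if source_url.startswith('//'):
            source_url = 'http:' + source_url  # same effect as _proto_relative_url here
        f = {'format_id': quality, 'url': source_url}
        if page_type == 'video':
            f.update({'height': int(quality), 'ext': 'mp4'})
        else:
            f.update({'abr': int(quality), 'ext': 'mp3'})
        formats.append(f)
    # formats == [{'format_id': '128', 'url': 'http://mp3.example.vn/song-128.mp3', 'abr': 128, 'ext': 'mp3'}]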
@@ -1,94 +0,0 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    determine_ext,
    str_to_int,
)


class ZippCastIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?zippcast\.com/(?:video/|videoview\.php\?.*\bvplay=)(?P<id>[0-9a-zA-Z]+)'
    _TESTS = [{
        # m3u8, hq direct link
        'url': 'http://www.zippcast.com/video/c9cfd5c7e44dbc29c81',
        'md5': '5ea0263b5606866c4d6cda0fc5e8c6b6',
        'info_dict': {
            'id': 'c9cfd5c7e44dbc29c81',
            'ext': 'mp4',
            'title': '[Vinesauce] Vinny - Digital Space Traveler',
            'description': 'Muted on youtube, but now uploaded in it\'s original form.',
            'thumbnail': 're:^https?://.*\.jpg$',
            'uploader': 'vinesauce',
            'view_count': int,
            'categories': ['Entertainment'],
            'tags': list,
        },
    }, {
        # f4m, lq ipod direct link
        'url': 'http://www.zippcast.com/video/b79c0a233e9c6581775',
        'only_matching': True,
    }, {
        'url': 'http://www.zippcast.com/videoview.php?vplay=c9cfd5c7e44dbc29c81&auto=no',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage(
            'http://www.zippcast.com/video/%s' % video_id, video_id)

        formats = []
        video_url = self._search_regex(
            r'<source[^>]+src=(["\'])(?P<url>.+?)\1', webpage,
            'video url', default=None, group='url')
        if video_url:
            formats.append({
                'url': video_url,
                'format_id': 'http',
                'preference': 0,  # direct link is almost always of worse quality
            })
        src_url = self._search_regex(
            r'src\s*:\s*(?:escape\()?(["\'])(?P<url>http://.+?)\1',
            webpage, 'src', default=None, group='url')
        ext = determine_ext(src_url)
        if ext == 'm3u8':
            formats.extend(self._extract_m3u8_formats(
                src_url, video_id, 'mp4', entry_protocol='m3u8_native',
                m3u8_id='hls', fatal=False))
        elif ext == 'f4m':
            formats.extend(self._extract_f4m_formats(
                src_url, video_id, f4m_id='hds', fatal=False))
        self._sort_formats(formats)

        title = self._og_search_title(webpage)
        description = self._og_search_description(webpage) or self._html_search_meta(
            'description', webpage)
        uploader = self._search_regex(
            r'<a[^>]+href="https?://[^/]+/profile/[^>]+>([^<]+)</a>',
            webpage, 'uploader', fatal=False)
        thumbnail = self._og_search_thumbnail(webpage)
        view_count = str_to_int(self._search_regex(
            r'>([\d,.]+) views!', webpage, 'view count', fatal=False))

        categories = re.findall(
            r'<a[^>]+href="https?://[^/]+/categories/[^"]+">([^<]+),?<',
            webpage)
        tags = re.findall(
            r'<a[^>]+href="https?://[^/]+/search/tags/[^"]+">([^<]+),?<',
            webpage)

        return {
            'id': video_id,
            'title': title,
            'description': description,
            'thumbnail': thumbnail,
            'uploader': uploader,
            'view_count': view_count,
            'categories': categories,
            'tags': tags,
            'formats': formats,
        }
@@ -628,22 +628,7 @@ def parseOpts(overrideArguments=None):
    filesystem.add_option(
        '-o', '--output',
        dest='outtmpl', metavar='TEMPLATE',
        help=('Output filename template. Use %(title)s to get the title, '
              '%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
              '%(autonumber)s to get an automatically incremented number, '
              '%(ext)s for the filename extension, '
              '%(format)s for the format description (like "22 - 1280x720" or "HD"), '
              '%(format_id)s for the unique id of the format (like YouTube\'s itags: "137"), '
              '%(upload_date)s for the upload date (YYYYMMDD), '
              '%(extractor)s for the provider (youtube, metacafe, etc), '
              '%(id)s for the video id, '
              '%(playlist_title)s, %(playlist_id)s, or %(playlist)s (=title if present, ID otherwise) for the playlist the video is in, '
              '%(playlist_index)s for the position in the playlist. '
              '%(height)s and %(width)s for the width and height of the video format. '
              '%(resolution)s for a textual description of the resolution of the video format. '
              '%% for a literal percent. '
              'Use - to output to stdout. Can also be used to download to a different directory, '
              'for example with -o \'/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s\' .'))
        help=('Output filename template, see the "OUTPUT TEMPLATE" for all the info'))
    filesystem.add_option(
        '--autonumber-size',
        dest='autonumber_size', metavar='NUMBER',
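The shortened help text defers to the README's OUTPUT TEMPLATE section; the template itself is essentially ordinary Python %-style string formatting over the info dict, so its effect can be checked directly (the metadata below is a made-up example):

    # hypothetical metadata, as an extractor might return it
    info = {
        'uploader': 'SomeChannel',
        'title': 'An Example Video',
        'id': 'abc123',
        'ext': 'mp4',
    }
    outtmpl = '/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s'
    print(outtmpl % info)
    # /my/downloads/SomeChannel/An Example Video-abc123.mp4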
@@ -1504,38 +1504,63 @@ def parse_filesize(s):
    _UNIT_TABLE = {
        'B': 1,
        'b': 1,
        'bytes': 1,
        'KiB': 1024,
        'KB': 1000,
        'kB': 1024,
        'Kb': 1000,
        'kb': 1000,
        'kilobytes': 1000,
        'kibibytes': 1024,
        'MiB': 1024 ** 2,
        'MB': 1000 ** 2,
        'mB': 1024 ** 2,
        'Mb': 1000 ** 2,
        'mb': 1000 ** 2,
        'megabytes': 1000 ** 2,
        'mebibytes': 1024 ** 2,
        'GiB': 1024 ** 3,
        'GB': 1000 ** 3,
        'gB': 1024 ** 3,
        'Gb': 1000 ** 3,
        'gb': 1000 ** 3,
        'gigabytes': 1000 ** 3,
        'gibibytes': 1024 ** 3,
        'TiB': 1024 ** 4,
        'TB': 1000 ** 4,
        'tB': 1024 ** 4,
        'Tb': 1000 ** 4,
        'tb': 1000 ** 4,
        'terabytes': 1000 ** 4,
        'tebibytes': 1024 ** 4,
        'PiB': 1024 ** 5,
        'PB': 1000 ** 5,
        'pB': 1024 ** 5,
        'Pb': 1000 ** 5,
        'pb': 1000 ** 5,
        'petabytes': 1000 ** 5,
        'pebibytes': 1024 ** 5,
        'EiB': 1024 ** 6,
        'EB': 1000 ** 6,
        'eB': 1024 ** 6,
        'Eb': 1000 ** 6,
        'eb': 1000 ** 6,
        'exabytes': 1000 ** 6,
        'exbibytes': 1024 ** 6,
        'ZiB': 1024 ** 7,
        'ZB': 1000 ** 7,
        'zB': 1024 ** 7,
        'Zb': 1000 ** 7,
        'zb': 1000 ** 7,
        'zettabytes': 1000 ** 7,
        'zebibytes': 1024 ** 7,
        'YiB': 1024 ** 8,
        'YB': 1000 ** 8,
        'yB': 1024 ** 8,
        'Yb': 1000 ** 8,
        'yb': 1000 ** 8,
        'yottabytes': 1000 ** 8,
        'yobibytes': 1024 ** 8,
    }

    return lookup_unit_table(_UNIT_TABLE, s)
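The enlarged table keeps the decimal/binary split (KB is 1000, KiB and the legacy lowercase kB are 1024, and so on up the scale). A minimal sketch of the lookup with a simplified matcher, not youtube-dl's actual lookup_unit_table:

    import re

    UNIT_TABLE = {'B': 1, 'KB': 1000, 'KiB': 1024, 'MB': 1000 ** 2, 'MiB': 1024 ** 2}

    def parse_filesize(s):
        # match "<number><unit>" and multiply; None if nothing sensible matches
        m = re.match(r'(?P<num>[0-9]+(?:\.[0-9]+)?)\s*(?P<unit>%s)\b' % '|'.join(UNIT_TABLE), s)
        if not m:
            return None
        return int(float(m.group('num')) * UNIT_TABLE[m.group('unit')])

    print(parse_filesize('1.5MB'))   # 1500000 (decimal megabytes)
    print(parse_filesize('1.5MiB'))  # 1572864 (binary mebibytes)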
@@ -2030,14 +2055,14 @@ def js_to_json(code):
            }.get(m.group(0), m.group(0)), v[1:-1])

        INTEGER_TABLE = (
            (r'^0[xX][0-9a-fA-F]+', 16),
            (r'^0+[0-7]+', 8),
            (r'^(0[xX][0-9a-fA-F]+)\s*:?$', 16),
            (r'^(0+[0-7]+)\s*:?$', 8),
        )

        for regex, base in INTEGER_TABLE:
            im = re.match(regex, v)
            if im:
                i = int(im.group(0), base)
                i = int(im.group(1), base)
                return '"%d":' % i if v.endswith(':') else '%d' % i

        return '"%s"' % v
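Anchoring the integer patterns and reading the captured group makes the conversion reject tokens that merely start with a numeric literal instead of silently truncating them, while object keys ending in a colon still convert. The difference is visible with plain re.match (inputs are illustrative only):

    import re

    OLD = r'^0[xX][0-9a-fA-F]+'
    NEW = r'^(0[xX][0-9a-fA-F]+)\s*:?$'

    re.match(OLD, '0x1A:')           # matches '0x1A'; the trailing colon is handled separately
    re.match(NEW, '0x1A:').group(1)  # '0x1A' -- the key case still converts
    re.match(OLD, '0x1A foo')        # matches '0x1A' and would have dropped ' foo'
    re.match(NEW, '0x1A foo')        # None -- the token is left to be quoted as a string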
@@ -1,3 +1,3 @@
from __future__ import unicode_literals

__version__ = '2016.08.13'
__version__ = '2016.08.24.1'