Compare commits

26 Commits

Author SHA1 Message Date
Sergey M․
7e721e35da release 2017.10.15.1 2017-10-15 06:16:41 +07:00
Sergey M․
bd7e1406b3 [ChangeLog] Actualize 2017-10-15 06:15:37 +07:00
Sergey M․
74c42d9ec3 [downloader/hls] Ignore anvato ad fragments (closes #14496) 2017-10-15 06:13:48 +07:00
Sergey M․
5efaf43c93 [downloader/fragment] Output ad fragment count 2017-10-15 06:13:07 +07:00
Sergey M․
4827270526 [scrippsnetworks:watch] Bypass geo restriction 2017-10-15 06:11:35 +07:00
Sergey M․
ee093a0ea0 [anvato] Add ability to bypass geo restriction 2017-10-15 06:11:02 +07:00
Sergey M․
9bb2c7673e [redditr] Fix extraction for URLs with query (closes #14495) 2017-10-15 03:38:34 +07:00
Sergey M․
715534083d release 2017.10.15 2017-10-15 02:26:58 +07:00
Sergey M․
ee88c1cbc6 [ChangeLog] Actualize 2017-10-15 02:26:10 +07:00
Sergey M․
57eb45b111 [scrippsnetworks:watch] Add support for geniuskitchen.com 2017-10-15 02:01:16 +07:00
Sergey M․
b21ab85088 [scrippsnetworks:watch] Fix extraction (closes #14389) 2017-10-15 01:57:43 +07:00
Sergey M․
210a2720bc [anvato] Process master m3u8 manifests
>>> Individual m3u8 manifests are not always present, e.g. anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:4173834
2017-10-15 01:44:57 +07:00
Sergey M․
685e87b61f [youtube] Fix relative URLs in description 2017-10-14 20:26:52 +07:00
Remita Amine
c9bd503e7d [spike] bypass geo restriction 2017-10-13 08:41:57 +00:00
Remita Amine
94a530c6cb [howstuffworks] add support for more domains 2017-10-12 19:03:47 +00:00
Remita Amine
e650659b94 [infoq] fix http format downloading 2017-10-12 17:39:51 +00:00
Remita Amine
2637fadc38 [generic] fix some of the tests 2017-10-12 16:14:43 +00:00
Remita Amine
50d808f5c9 [common] add support for jwplayer youtube embeds 2017-10-12 16:12:47 +00:00
Remita Amine
7a64c33aee [rtlnl] add support for another type of embeds 2017-10-12 16:09:06 +00:00
Remita Amine
b0def2c297 [onionstudios] add support for bulbs-video embeds 2017-10-12 16:05:25 +00:00
Remita Amine
81ce479f4d [udn] fix extraction 2017-10-12 16:04:41 +00:00
Remita Amine
414e709405 [shahid] fix extraction(fixes #14448) 2017-10-12 09:20:39 +00:00
Yen Chi Hsuan
645ed3e7c9 [ChangeLog] Update after #14471
[skip ci]
2017-10-12 12:12:37 +08:00
nyuszika7h
c0bddd6d65 [kaltura] Ignore Widevine encrypted video (.wvm)
There is currently no public method to decrypt this, and there may be
other streams available that can be downloaded.

Example URL, has `.wvm` and `.mp4` formats:
https://www.voot.com/shows/bigg-boss-s11/11/538936/bigg-boss-extra-dose-arshi-s-quirky-demand/541700
2017-10-12 12:09:58 +08:00
Yen Chi Hsuan
1baba7f4a8 [vh1] Adding coding cookie 2017-10-12 12:02:26 +08:00
Remita Amine
344d1a6794 [vh1] fix extraction(fixes #9613) 2017-10-11 20:52:14 +00:00
20 changed files with 413 additions and 280 deletions

View File

@@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.10.12*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.10.15.1*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.10.12** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.10.15.1**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.10.12 [debug] youtube-dl version 2017.10.15.1
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@@ -1,3 +1,35 @@
version 2017.10.15.1
Core
* [downloader/hls] Ignore anvato ad fragments (#14496)
* [downloader/fragment] Output ad fragment count
Extractors
* [scrippsnetworks:watch] Bypass geo restriction
+ [anvato] Add ability to bypass geo restriction
* [redditr] Fix extraction for URLs with query (#14495)
version 2017.10.15
Core
+ [common] Add support for jwplayer youtube embeds
Extractors
* [scrippsnetworks:watch] Fix extraction (#14389)
* [anvato] Process master m3u8 manifests
* [youtube] Fix relative URLs in description
* [spike] Bypass geo restriction
+ [howstuffworks] Add support for more domains
* [infoq] Fix http format downloading
+ [rtlnl] Add support for another type of embeds
+ [onionstudios] Add support for bulbs-video embeds
* [udn] Fix extraction
* [shahid] Fix extraction (#14448)
* [kaltura] Ignore Widevine encrypted video (.wvm) (#14471)
* [vh1] Fix extraction (#9613)
version 2017.10.12 version 2017.10.12
Core Core

View File

@@ -117,9 +117,15 @@ class FragmentFD(FileDownloader):
def _prepare_frag_download(self, ctx): def _prepare_frag_download(self, ctx):
if 'live' not in ctx: if 'live' not in ctx:
ctx['live'] = False ctx['live'] = False
if not ctx['live']:
total_frags_str = '%d' % ctx['total_frags']
ad_frags = ctx.get('ad_frags', 0)
if ad_frags:
total_frags_str += ' (not including %d ad)' % ad_frags
else:
total_frags_str = 'unknown (live)'
self.to_screen( self.to_screen(
'[%s] Total fragments: %s' '[%s] Total fragments: %s' % (self.FD_NAME, total_frags_str))
% (self.FD_NAME, ctx['total_frags'] if not ctx['live'] else 'unknown (live)'))
self.report_destination(ctx['filename']) self.report_destination(ctx['filename'])
dl = HttpQuietDownloader( dl = HttpQuietDownloader(
self.ydl, self.ydl,
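
Note on the hunk above: a minimal standalone sketch of how the new "Total fragments" text appears to be assembled. The function name `format_total_fragments` is hypothetical and exists only for illustration; the `ctx` keys are the ones visible in the diff.

```python
def format_total_fragments(ctx):
    """Return the text printed after 'Total fragments:' (illustrative only)."""
    # Live streams have no known fragment count.
    if ctx.get('live'):
        return 'unknown (live)'
    total_frags_str = '%d' % ctx['total_frags']
    # 'ad_frags' is optional; downloaders that do not set it report 0 ads.
    ad_frags = ctx.get('ad_frags', 0)
    if ad_frags:
        total_frags_str += ' (not including %d ad)' % ad_frags
    return total_frags_str


if __name__ == '__main__':
    print(format_total_fragments({'total_frags': 10, 'ad_frags': 2}))
```

Keeping `ad_frags` optional means fragment downloaders other than HLS need no change.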

View File

@@ -75,15 +75,29 @@ class HlsFD(FragmentFD):
fd.add_progress_hook(ph) fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict) return fd.real_download(filename, info_dict)
total_frags = 0 def anvato_ad(s):
return s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
media_frags = 0
ad_frags = 0
ad_frag_next = False
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
if line and not line.startswith('#'): if not line:
total_frags += 1 continue
if line.startswith('#'):
if anvato_ad(line):
ad_frags += 1
continue
if ad_frag_next:
ad_frag_next = False
continue
media_frags += 1
ctx = { ctx = {
'filename': filename, 'filename': filename,
'total_frags': total_frags, 'total_frags': media_frags,
'ad_frags': ad_frags,
} }
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
@@ -101,10 +115,14 @@ class HlsFD(FragmentFD):
decrypt_info = {'METHOD': 'NONE'} decrypt_info = {'METHOD': 'NONE'}
byte_range = {} byte_range = {}
frag_index = 0 frag_index = 0
ad_frag_next = False
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
if line: if line:
if not line.startswith('#'): if not line.startswith('#'):
if ad_frag_next:
ad_frag_next = False
continue
frag_index += 1 frag_index += 1
if frag_index <= ctx['fragment_index']: if frag_index <= ctx['fragment_index']:
continue continue
@@ -175,6 +193,8 @@ class HlsFD(FragmentFD):
'start': sub_range_start, 'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]), 'end': sub_range_start + int(splitted_byte_range[0]),
} }
elif anvato_ad(line):
ad_frag_next = True
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
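
Note on the hunks above: a self-contained sketch of the ad-skipping idea, assuming an ad tag marks the next media line as an ad fragment. The sample playlist is invented for illustration and may not match real Anvato manifests. In the counting loop as shown in the diff, `ad_frag_next` is never set to `True`; this sketch sets it when the ad tag is seen, mirroring the download loop, which is an assumption about the intended behavior.

```python
def anvato_ad(line):
    # Same predicate as in the diff.
    return line.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in line


def count_fragments(manifest):
    """Return (media_frags, ad_frags) for an m3u8 media playlist string."""
    media_frags = 0
    ad_frags = 0
    ad_frag_next = False
    for line in manifest.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            if anvato_ad(line):
                ad_frags += 1
                # Assumption: flag the next fragment URL as an ad.
                ad_frag_next = True
            continue
        if ad_frag_next:
            # This URL belongs to the ad announced by the preceding tag.
            ad_frag_next = False
            continue
        media_frags += 1
    return media_frags, ad_frags


# Hypothetical playlist, for illustration only.
SAMPLE = '''#EXTM3U
#EXTINF:6.0,
seg1.ts
#ANVATO-SEGMENT-INFO type=ad
#EXTINF:6.0,
ad1.ts
#ANVATO-SEGMENT-INFO type=master
#EXTINF:6.0,
seg2.ts
'''

if __name__ == '__main__':
    print(count_fragments(SAMPLE))
```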

View File

@@ -18,6 +18,7 @@ from ..utils import (
int_or_none, int_or_none,
strip_jsonp, strip_jsonp,
unescapeHTML, unescapeHTML,
unsmuggle_url,
) )
@@ -197,12 +198,16 @@ class AnvatoIE(InfoExtractor):
'tbr': tbr if tbr != 0 else None, 'tbr': tbr if tbr != 0 else None,
} }
if ext == 'm3u8' or media_format in ('m3u8', 'm3u8-variant'): if media_format == 'm3u8' and tbr is not None:
if tbr is not None:
a_format.update({ a_format.update({
'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])), 'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])),
'ext': 'mp4', 'ext': 'mp4',
}) })
elif media_format == 'm3u8-variant' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
elif ext == 'mp3' or media_format == 'mp3': elif ext == 'mp3' or media_format == 'mp3':
a_format['vcodec'] = 'none' a_format['vcodec'] = 'none'
else: else:
@@ -271,6 +276,9 @@ class AnvatoIE(InfoExtractor):
anvplayer_data['accessKey'], anvplayer_data['video']) anvplayer_data['accessKey'], anvplayer_data['video'])
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
access_key, video_id = mobj.group('access_key_or_mcp', 'id') access_key, video_id = mobj.group('access_key_or_mcp', 'id')
if access_key not in self._ANVACK_TABLE: if access_key not in self._ANVACK_TABLE:

View File

@@ -2322,7 +2322,6 @@ class InfoExtractor(object):
formats = self._parse_jwplayer_formats( formats = self._parse_jwplayer_formats(
video_data['sources'], video_id=this_video_id, m3u8_id=m3u8_id, video_data['sources'], video_id=this_video_id, m3u8_id=m3u8_id,
mpd_id=mpd_id, rtmp_params=rtmp_params, base_url=base_url) mpd_id=mpd_id, rtmp_params=rtmp_params, base_url=base_url)
self._sort_formats(formats)
subtitles = {} subtitles = {}
tracks = video_data.get('tracks') tracks = video_data.get('tracks')
@@ -2339,16 +2338,25 @@ class InfoExtractor(object):
'url': self._proto_relative_url(track_url) 'url': self._proto_relative_url(track_url)
}) })
entries.append({ entry = {
'id': this_video_id, 'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'), 'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
'description': video_data.get('description'), 'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')), 'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')), 'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')), 'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, }
# https://github.com/jwplayer/jwplayer/blob/master/src/js/utils/validator.js#L32
if len(formats) == 1 and re.search(r'^(?:http|//).*(?:youtube\.com|youtu\.be)/.+', formats[0]['url']):
entry.update({
'_type': 'url_transparent',
'url': formats[0]['url'],
}) })
else:
self._sort_formats(formats)
entry['formats'] = formats
entries.append(entry)
if len(entries) == 1: if len(entries) == 1:
return entries[0] return entries[0]
else: else:
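
Note on the hunk above: a reduced sketch of the new branch, assuming a jwplayer setup whose only source is a YouTube URL should be handed off to another extractor rather than treated as a direct format. `build_entry` is a hypothetical helper; the regular expression is copied from the diff.

```python
import re

# Pattern taken from the diff: a source that starts with http or // and
# points at youtube.com or youtu.be.
YOUTUBE_SOURCE_RE = r'^(?:http|//).*(?:youtube\.com|youtu\.be)/.+'


def build_entry(video_id, formats):
    """Return an info-dict-like entry (illustrative subset of fields)."""
    entry = {'id': video_id}
    if len(formats) == 1 and re.search(YOUTUBE_SOURCE_RE, formats[0]['url']):
        # Delegate: the URL is resolved by whichever extractor matches it.
        entry.update({
            '_type': 'url_transparent',
            'url': formats[0]['url'],
        })
    else:
        entry['formats'] = formats
    return entry


if __name__ == '__main__':
    print(build_entry('a', [{'url': '//youtu.be/BaW_jenozKc'}]))
```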

View File

@@ -1091,7 +1091,7 @@ class GenericIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'upload_date': '20150212', 'upload_date': '20150212',
'uploader': 'The National Archives UK', 'uploader': 'The National Archives UK',
'description': 'md5:a236581cd2449dd2df4f93412f3f01c6', 'description': 'md5:8078af856dca76edc42910b61273dbbf',
'uploader_id': 'NationalArchives08', 'uploader_id': 'NationalArchives08',
'title': 'Webinar: Using Discovery, The National Archives online catalogue', 'title': 'Webinar: Using Discovery, The National Archives online catalogue',
}, },
@@ -1107,7 +1107,8 @@ class GenericIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
'skip': 'does not contain a video anymore',
}, },
# Complex jwplayer # Complex jwplayer
{ {
@@ -1116,6 +1117,7 @@ class GenericIE(InfoExtractor):
'id': 'videos', 'id': 'videos',
'ext': 'mp4', 'ext': 'mp4',
'title': 'king machine trailer 1', 'title': 'king machine trailer 1',
'description': 'Browse King Machine videos & audio for sweet media. Your eyes will thank you.',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
@@ -1168,7 +1170,7 @@ class GenericIE(InfoExtractor):
'playlist_mincount': 5, 'playlist_mincount': 5,
'info_dict': { 'info_dict': {
'id': 'aanslagen-kopenhagen', 'id': 'aanslagen-kopenhagen',
'title': 'Aanslagen Kopenhagen | RTL Nieuws', 'title': 'Aanslagen Kopenhagen',
} }
}, },
# Zapiks embed # Zapiks embed
@@ -1300,6 +1302,7 @@ class GenericIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'This video is unavailable.',
}, },
# Pladform embed # Pladform embed
{ {
@@ -1313,6 +1316,7 @@ class GenericIE(InfoExtractor):
'duration': 694, 'duration': 694,
'age_limit': 0, 'age_limit': 0,
}, },
'skip': 'HTTP Error 404: Not Found',
}, },
# Playwire embed # Playwire embed
{ {
@@ -1333,6 +1337,14 @@ class GenericIE(InfoExtractor):
'id': '518726732', 'id': '518726732',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Facebook Creates "On This Day" | Crunch Report', 'title': 'Facebook Creates "On This Day" | Crunch Report',
'description': 'Amazon updates Fire TV line, Tesla\'s Model X spotted in the wild',
'timestamp': 1427237531,
'uploader': 'Crunch Report',
'upload_date': '20150324',
},
'params': {
# m3u8 download
'skip_download': True,
}, },
}, },
# SVT embed # SVT embed
@@ -1384,16 +1396,20 @@ class GenericIE(InfoExtractor):
'upload_date': '20140107', 'upload_date': '20140107',
'timestamp': 1389118457, 'timestamp': 1389118457,
}, },
'skip': 'Invalid Page URL',
}, },
# NBC News embed # NBC News embed
{ {
'url': 'http://www.vulture.com/2016/06/letterman-couldnt-care-less-about-late-night.html', 'url': 'http://www.vulture.com/2016/06/letterman-couldnt-care-less-about-late-night.html',
'md5': '1aa589c675898ae6d37a17913cf68d66', 'md5': '1aa589c675898ae6d37a17913cf68d66',
'info_dict': { 'info_dict': {
'id': '701714499682', 'id': 'x_dtl_oa_LettermanliftPR_160608',
'ext': 'mp4', 'ext': 'mp4',
'title': 'PREVIEW: On Assignment: David Letterman', 'title': 'David Letterman: A Preview',
'description': 'A preview of Tom Brokaw\'s interview with David Letterman as part of the On Assignment series powered by Dateline. Airs Sunday June 12 at 7/6c.', 'description': 'A preview of Tom Brokaw\'s interview with David Letterman as part of the On Assignment series powered by Dateline. Airs Sunday June 12 at 7/6c.',
'upload_date': '20160609',
'timestamp': 1465431544,
'uploader': 'NBCU-NEWS',
}, },
}, },
# UDN embed # UDN embed
@@ -1410,6 +1426,7 @@ class GenericIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Failed to parse JSON Expecting value'],
}, },
# Ooyala embed # Ooyala embed
{ {
@@ -1417,7 +1434,7 @@ class GenericIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '50YnY4czr4ms1vJ7yz3xzq0excz_pUMs', 'id': '50YnY4czr4ms1vJ7yz3xzq0excz_pUMs',
'ext': 'mp4', 'ext': 'mp4',
'description': 'VIDEO: INDEX/MATCH versus VLOOKUP.', 'description': 'Index/Match versus VLOOKUP.',
'title': 'This is what separates the Excel masters from the wannabes', 'title': 'This is what separates the Excel masters from the wannabes',
'duration': 191.933, 'duration': 191.933,
}, },
@@ -1455,7 +1472,8 @@ class GenericIE(InfoExtractor):
'upload_date': '20150622', 'upload_date': '20150622',
'uploader': 'Public Sénat', 'uploader': 'Public Sénat',
'uploader_id': 'xa9gza', 'uploader_id': 'xa9gza',
} },
'skip': 'File not found.',
}, },
# OnionStudios embed # OnionStudios embed
{ {
@@ -2253,7 +2271,7 @@ class GenericIE(InfoExtractor):
# Look for embedded rtl.nl player # Look for embedded rtl.nl player
matches = re.findall( matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"', r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
webpage) webpage)
if matches: if matches:
return self.playlist_from_matches(matches, video_id, video_title, ie='RtlNl') return self.playlist_from_matches(matches, video_id, video_title, ie='RtlNl')
@@ -2652,7 +2670,7 @@ class GenericIE(InfoExtractor):
# Look for UDN embeds # Look for UDN embeds
mobj = re.search( mobj = re.search(
r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage) r'<iframe[^>]+src="(?:https?:)?(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage)
if mobj is not None: if mobj is not None:
return self.url_result( return self.url_result(
compat_urlparse.urljoin(url, mobj.group('url')), 'UDNEmbed') compat_urlparse.urljoin(url, mobj.group('url')), 'UDNEmbed')

View File

@@ -11,45 +11,20 @@ from ..utils import (
class HowStuffWorksIE(InfoExtractor): class HowStuffWorksIE(InfoExtractor):
_VALID_URL = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm' _VALID_URL = r'https?://[\da-z-]+\.(?:howstuffworks|stuff(?:(?:youshould|theydontwantyouto)know|toblowyourmind|momnevertoldyou)|(?:brain|car)stuffshow|fwthinking|geniusstuff)\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm'
_TESTS = [ _TESTS = [
{ {
'url': 'http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm', 'url': 'http://www.stufftoblowyourmind.com/videos/optical-illusions-video.htm',
'md5': '76646a5acc0c92bf7cd66751ca5db94d',
'info_dict': { 'info_dict': {
'id': '450221', 'id': '855410',
'ext': 'flv',
'title': 'Cool Jobs - Iditarod Musher',
'description': 'Cold sleds, freezing temps and warm dog breath... an Iditarod musher\'s dream. Kasey-Dee Gardner jumps on a sled to find out what the big deal is.',
'display_id': 'cool-jobs-iditarod-musher',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 161,
},
'skip': 'Video broken',
},
{
'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',
'info_dict': {
'id': '453464',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Survival Zone: Food and Water In the Savanna', 'title': 'Your Trickster Brain: Optical Illusions -- Science on the Web',
'description': 'Learn how to find both food and water while trekking in the African savannah. In this video from the Discovery Channel.', 'description': 'md5:e374ff9561f6833ad076a8cc0a5ab2fb',
'display_id': 'survival-zone-food-and-water-in-the-savanna',
'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{ {
'url': 'http://entertainment.howstuffworks.com/arts/2706-sword-swallowing-1-by-dan-meyer-video.htm', 'url': 'http://shows.howstuffworks.com/more-shows/why-does-balloon-stick-to-hair-video.htm',
'info_dict': {
'id': '440011',
'ext': 'mp4',
'title': 'Sword Swallowing #1 by Dan Meyer',
'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International <www.swordswallow.org>',
'display_id': 'sword-swallowing-1-by-dan-meyer',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'http://shows.howstuffworks.com/stuff-to-blow-your-mind/optical-illusions-video.htm',
'only_matching': True, 'only_matching': True,
} }
] ]
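
Note on the hunk above: a quick check of which hosts the widened `_VALID_URL` accepts. The pattern is copied from the new side of the diff; `display_id` is a hypothetical helper name, and the `stuffyoushouldknow.com` and `example.com` URLs are made up for illustration.

```python
import re

VALID_URL = (
    r'https?://[\da-z-]+\.(?:howstuffworks|stuff(?:(?:youshould|theydontwantyouto)know'
    r'|toblowyourmind|momnevertoldyou)|(?:brain|car)stuffshow|fwthinking|geniusstuff)'
    r'\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm')


def display_id(url):
    """Return the slug captured by the 'id' group, or None if no match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


if __name__ == '__main__':
    print(display_id('http://www.stufftoblowyourmind.com/videos/optical-illusions-video.htm'))
```

The captured group is the URL slug, not the numeric video id that appears in the test's `info_dict`.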

View File

@@ -8,7 +8,10 @@ from ..compat import (
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urlparse, compat_urlparse,
) )
from ..utils import determine_ext from ..utils import (
determine_ext,
update_url_query,
)
from .bokecc import BokeCCBaseIE from .bokecc import BokeCCBaseIE
@@ -68,21 +71,22 @@ class InfoQIE(BokeCCBaseIE):
'play_path': playpath, 'play_path': playpath,
}] }]
def _extract_cookies(self, webpage): def _extract_cf_auth(self, webpage):
policy = self._search_regex(r'InfoQConstants\.scp\s*=\s*\'([^\']+)\'', webpage, 'policy') policy = self._search_regex(r'InfoQConstants\.scp\s*=\s*\'([^\']+)\'', webpage, 'policy')
signature = self._search_regex(r'InfoQConstants\.scs\s*=\s*\'([^\']+)\'', webpage, 'signature') signature = self._search_regex(r'InfoQConstants\.scs\s*=\s*\'([^\']+)\'', webpage, 'signature')
key_pair_id = self._search_regex(r'InfoQConstants\.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id') key_pair_id = self._search_regex(r'InfoQConstants\.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id')
return 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % ( return {
policy, signature, key_pair_id) 'Policy': policy,
'Signature': signature,
'Key-Pair-Id': key_pair_id,
}
def _extract_http_video(self, webpage): def _extract_http_video(self, webpage):
http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL') http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
http_video_url = update_url_query(http_video_url, self._extract_cf_auth(webpage))
return [{ return [{
'format_id': 'http_video', 'format_id': 'http_video',
'url': http_video_url, 'url': http_video_url,
'http_headers': {
'Cookie': self._extract_cookies(webpage)
},
}] }]
def _extract_http_audio(self, webpage, video_id): def _extract_http_audio(self, webpage, video_id):
@@ -91,22 +95,20 @@ class InfoQIE(BokeCCBaseIE):
if not http_audio_url: if not http_audio_url:
return [] return []
cookies_header = {'Cookie': self._extract_cookies(webpage)}
# base URL is found in the Location header in the response returned by # base URL is found in the Location header in the response returned by
# GET https://www.infoq.com/mp3download.action?filename=... when logged in. # GET https://www.infoq.com/mp3download.action?filename=... when logged in.
http_audio_url = compat_urlparse.urljoin('http://res.infoq.com/downloads/mp3downloads/', http_audio_url) http_audio_url = compat_urlparse.urljoin('http://res.infoq.com/downloads/mp3downloads/', http_audio_url)
http_audio_url = update_url_query(http_audio_url, self._extract_cf_auth(webpage))
# audio file seem to be missing some times even if there is a download link # audio file seem to be missing some times even if there is a download link
# so probe URL to make sure # so probe URL to make sure
if not self._is_valid_url(http_audio_url, video_id, headers=cookies_header): if not self._is_valid_url(http_audio_url, video_id):
return [] return []
return [{ return [{
'format_id': 'http_audio', 'format_id': 'http_audio',
'url': http_audio_url, 'url': http_audio_url,
'vcodec': 'none', 'vcodec': 'none',
'http_headers': cookies_header,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
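
Note on the hunks above: the change moves the CloudFront credentials from a `Cookie` header into the URL query. The sketch below shows that idea with the standard library only. `add_query` is a hypothetical stand-in for `update_url_query`, not its actual implementation, and the URL and parameter values are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_query(url, params):
    """Return url with params merged into its query string."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query.update(params)
    # doseq=True expands the lists produced by parse_qs back into pairs.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == '__main__':
    # Keys as returned by _extract_cf_auth in the diff; values are dummies.
    cf_auth = {'Policy': 'p', 'Signature': 's', 'Key-Pair-Id': 'k'}
    print(add_query('https://example.com/video.mp4', cf_auth))
```

With the credentials in the URL, the format dictionaries no longer need the `http_headers` entry, which matches the lines removed in the diff.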

View File

@@ -287,6 +287,9 @@ class KalturaIE(InfoExtractor):
# skip for now. # skip for now.
if f.get('fileExt') == 'chun': if f.get('fileExt') == 'chun':
continue continue
# DRM-protected video, cannot be decrypted
if f.get('fileExt') == 'wvm':
continue
if not f.get('fileExt'): if not f.get('fileExt'):
# QT indicates QuickTime; some videos have broken fileExt # QT indicates QuickTime; some videos have broken fileExt
if f.get('containerFormat') == 'qt': if f.get('containerFormat') == 'qt':
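
Note on the hunk above: the effect of the two `continue` checks can be pictured as a filter over the flavor list. This is a simplified sketch; `downloadable_flavors` and the sample flavor dictionaries are hypothetical.

```python
# Extensions the loop in the diff skips: 'chun' (already skipped before this
# change) and 'wvm' (Widevine-encrypted, added by this change).
SKIPPED_EXTS = ('chun', 'wvm')


def downloadable_flavors(flavors):
    """Drop flavors whose fileExt cannot be downloaded or decrypted."""
    return [f for f in flavors if f.get('fileExt') not in SKIPPED_EXTS]


if __name__ == '__main__':
    sample = [{'id': 1, 'fileExt': 'wvm'}, {'id': 2, 'fileExt': 'mp4'}]
    print(downloadable_flavors(sample))
```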

View File

@@ -13,11 +13,11 @@ from ..utils import (
class OnionStudiosIE(InfoExtractor): class OnionStudiosIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?onionstudios\.com/(?:videos/[^/]+-|embed\?.*\bid=)(?P<id>\d+)(?!-)' _VALID_URL = r'https?://(?:www\.)?onionstudios\.com/(?:video(?:s/[^/]+-|/)|embed\?.*\bid=)(?P<id>\d+)(?!-)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.onionstudios.com/videos/hannibal-charges-forward-stops-for-a-cocktail-2937', 'url': 'http://www.onionstudios.com/videos/hannibal-charges-forward-stops-for-a-cocktail-2937',
'md5': 'e49f947c105b8a78a675a0ee1bddedfe', 'md5': '719d1f8c32094b8c33902c17bcae5e34',
'info_dict': { 'info_dict': {
'id': '2937', 'id': '2937',
'ext': 'mp4', 'ext': 'mp4',
@@ -29,12 +29,15 @@ class OnionStudiosIE(InfoExtractor):
}, { }, {
'url': 'http://www.onionstudios.com/embed?id=2855&autoplay=true', 'url': 'http://www.onionstudios.com/embed?id=2855&autoplay=true',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.onionstudios.com/video/6139.json',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?onionstudios\.com/embed.+?)\1', webpage) r'(?s)<(?:iframe|bulbs-video)[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?onionstudios\.com/(?:embed.+?|video/\d+\.json))\1', webpage)
if mobj: if mobj:
return mobj.group('url') return mobj.group('url')
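
Note on the hunk above: a standalone check that the widened embed pattern picks up both `<iframe>` and `<bulbs-video>` tags. The pattern is copied from the new side of the diff; the HTML snippets are invented for illustration.

```python
import re

EMBED_RE = (
    r'(?s)<(?:iframe|bulbs-video)[^>]+?src=(["\'])'
    r'(?P<url>(?:https?:)?//(?:www\.)?onionstudios\.com/(?:embed.+?|video/\d+\.json))\1')


def extract_url(webpage):
    """Return the first Onion Studios embed URL in webpage, or None."""
    mobj = re.search(EMBED_RE, webpage)
    return mobj.group('url') if mobj else None


if __name__ == '__main__':
    html = '<bulbs-video src="http://www.onionstudios.com/video/6139.json"></bulbs-video>'
    print(extract_url(html))
```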

View File

@@ -1,5 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@@ -45,7 +47,7 @@ class RedditIE(InfoExtractor):
class RedditRIE(InfoExtractor): class RedditRIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/]+)' _VALID_URL = r'(?P<url>https?://(?:www\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'
_TESTS = [{ _TESTS = [{
'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/', 'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
'info_dict': { 'info_dict': {
@@ -83,10 +85,13 @@ class RedditRIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
url, video_id = mobj.group('url', 'id')
video_id = self._match_id(url) video_id = self._match_id(url)
data = self._download_json( data = self._download_json(
url + '.json', video_id)[0]['data']['children'][0]['data'] url + '/.json', video_id)[0]['data']['children'][0]['data']
video_url = data['url'] video_url = data['url']

View File

@@ -12,10 +12,10 @@ class RtlNlIE(InfoExtractor):
IE_NAME = 'rtl.nl' IE_NAME = 'rtl.nl'
IE_DESC = 'rtl.nl and rtlxl.nl' IE_DESC = 'rtl.nl and rtlxl.nl'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?:www\.)? https?://(?:(?:www|static)\.)?
(?: (?:
rtlxl\.nl/[^\#]*\#!/[^/]+/| rtlxl\.nl/[^\#]*\#!/[^/]+/|
rtl\.nl/(?:system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html\b.+?\buuid=|video/) rtl\.nl/(?:(?:system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html|embed)\b.+?\buuid=|video/)
) )
(?P<id>[0-9a-f-]+)''' (?P<id>[0-9a-f-]+)'''
@@ -73,6 +73,9 @@ class RtlNlIE(InfoExtractor):
}, { }, {
'url': 'https://www.rtl.nl/video/c603c9c2-601d-4b5e-8175-64f1e942dc7d/', 'url': 'https://www.rtl.nl/video/c603c9c2-601d-4b5e-8175-64f1e942dc7d/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://static.rtl.nl/embed/?uuid=1a2970fc-5c0b-43ff-9fdc-927e39e6d1bc&autoplay=false&publicatiepunt=rtlnieuwsnl',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
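
Note on the hunks above: a standalone check of the extended `_VALID_URL`, using the two test URLs that appear in the diff. The indentation inside the verbose pattern is reconstructed and `match_uuid` is a hypothetical helper.

```python
import re

# Verbose pattern from the new side of the diff; whitespace is ignored by (?x).
VALID_URL = r'''(?x)
    https?://(?:(?:www|static)\.)?
    (?:
        rtlxl\.nl/[^\#]*\#!/[^/]+/|
        rtl\.nl/(?:(?:system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html|embed)\b.+?\buuid=|video/)
    )
    (?P<id>[0-9a-f-]+)'''


def match_uuid(url):
    """Return the uuid captured from an rtl.nl URL, or None if no match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


if __name__ == '__main__':
    print(match_uuid(
        'https://static.rtl.nl/embed/?uuid=1a2970fc-5c0b-43ff-9fdc-927e39e6d1bc'
        '&autoplay=false&publicatiepunt=rtlnieuwsnl'))
```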

View File

@@ -1,60 +1,190 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .adobepass import AdobePassIE import datetime
import json
import hashlib
import hmac
import re
from .common import InfoExtractor
from .anvato import AnvatoIE
from ..utils import ( from ..utils import (
int_or_none,
smuggle_url, smuggle_url,
update_url_query, urlencode_postdata,
xpath_text,
) )
class ScrippsNetworksWatchIE(AdobePassIE): class ScrippsNetworksWatchIE(InfoExtractor):
IE_NAME = 'scrippsnetworks:watch' IE_NAME = 'scrippsnetworks:watch'
_VALID_URL = r'https?://watch\.(?:hgtv|foodnetwork|travelchannel|diynetwork|cookingchanneltv)\.com/player\.[A-Z0-9]+\.html#(?P<id>\d+)' _VALID_URL = r'''(?x)
_TEST = { https?://
'url': 'http://watch.hgtv.com/player.HNT.html#0256538', watch\.
(?P<site>hgtv|foodnetwork|travelchannel|diynetwork|cookingchanneltv|geniuskitchen)\.com/
(?:
player\.[A-Z0-9]+\.html\#|
show/(?:[^/]+/){2}|
player/
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://watch.hgtv.com/show/HGTVE/Best-Ever-Treehouses/2241515/Best-Ever-Treehouses/',
'md5': '26545fd676d939954c6808274bdb905a', 'md5': '26545fd676d939954c6808274bdb905a',
'info_dict': { 'info_dict': {
'id': '0256538', 'id': '4173834',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Seeking a Wow House', 'title': 'Best Ever Treehouses',
'description': 'Buyers retiring in Palm Springs, California, want a modern house with major wow factor. They\'re also looking for a pool and a large, open floorplan with tall windows looking out at the views.', 'description': "We're searching for the most over the top treehouses.",
'uploader': 'SCNI', 'uploader': 'ANV',
'upload_date': '20170207', 'upload_date': '20170922',
'timestamp': 1486450493, 'timestamp': 1506056400,
}, },
'skip': 'requires TV provider authentication', 'params': {
'skip_download': True,
},
'add_ie': [AnvatoIE.ie_key()],
}, {
'url': 'http://watch.diynetwork.com/show/DSAL/Salvage-Dawgs/2656646/Covington-Church/',
'only_matching': True,
}, {
'url': 'http://watch.diynetwork.com/player.HNT.html#2656646',
'only_matching': True,
}, {
'url': 'http://watch.geniuskitchen.com/player/3787617/Ample-Hills-Ice-Cream-Bike/',
'only_matching': True,
}]
_SNI_TABLE = {
'hgtv': 'hgtv',
'diynetwork': 'diy',
'foodnetwork': 'food',
'cookingchanneltv': 'cook',
'travelchannel': 'trav',
'geniuskitchen': 'genius',
} }
_SNI_HOST = 'web.api.video.snidigital.com'
_AWS_REGION = 'us-east-1'
_AWS_IDENTITY_ID_JSON = json.dumps({
'IdentityId': '%s:7655847c-0ae7-4d9b-80d6-56c062927eb3' % _AWS_REGION
})
_AWS_USER_AGENT = 'aws-sdk-js/2.80.0 callback'
_AWS_API_KEY = 'E7wSQmq0qK6xPrF13WmzKiHo4BQ7tip4pQcSXVl1'
_AWS_SERVICE = 'execute-api'
_AWS_REQUEST = 'aws4_request'
_AWS_SIGNED_HEADERS = ';'.join([
'host', 'x-amz-date', 'x-amz-security-token', 'x-api-key'])
_AWS_CANONICAL_REQUEST_TEMPLATE = '''GET
%(uri)s
host:%(host)s
x-amz-date:%(date)s
x-amz-security-token:%(token)s
x-api-key:%(key)s
%(signed_headers)s
%(payload_hash)s'''
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
webpage = self._download_webpage(url, video_id) site_id, video_id = mobj.group('site', 'id')
channel = self._parse_json(self._search_regex(
r'"channels"\s*:\s*(\[.+\])',
webpage, 'channels'), video_id)[0]
video_data = next(v for v in channel['videos'] if v.get('nlvid') == video_id)
title = video_data['title']
release_url = video_data['releaseUrl']
if video_data.get('restricted'):
requestor_id = self._search_regex(
r'requestorId\s*=\s*"([^"]+)";', webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id,
video_data.get('ratings', [{}])[0].get('rating'))
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
release_url = update_url_query(release_url, {'auth': auth})
return { def aws_hash(s):
'_type': 'url_transparent', return hashlib.sha256(s.encode('utf-8')).hexdigest()
'id': video_id,
'title': title, token = self._download_json(
'url': smuggle_url(release_url, {'force_smil_url': True}), 'https://cognito-identity.us-east-1.amazonaws.com/', video_id,
'description': video_data.get('description'), data=self._AWS_IDENTITY_ID_JSON.encode('utf-8'),
'thumbnail': video_data.get('thumbnailUrl'), headers={
'series': video_data.get('showTitle'), 'Accept': '*/*',
'season_number': int_or_none(video_data.get('season')), 'Content-Type': 'application/x-amz-json-1.1',
'episode_number': int_or_none(video_data.get('episodeNumber')), 'Referer': url,
'ie_key': 'ThePlatform', 'X-Amz-Content-Sha256': aws_hash(self._AWS_IDENTITY_ID_JSON),
'X-Amz-Target': 'AWSCognitoIdentityService.GetOpenIdToken',
'X-Amz-User-Agent': self._AWS_USER_AGENT,
})['Token']
sts = self._download_xml(
'https://sts.amazonaws.com/', video_id, data=urlencode_postdata({
'Action': 'AssumeRoleWithWebIdentity',
'RoleArn': 'arn:aws:iam::710330595350:role/Cognito_WebAPIUnauth_Role',
'RoleSessionName': 'web-identity',
'Version': '2011-06-15',
'WebIdentityToken': token,
}), headers={
'Referer': url,
'X-Amz-User-Agent': self._AWS_USER_AGENT,
'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
})
def get(key):
return xpath_text(
sts, './/{https://sts.amazonaws.com/doc/2011-06-15/}%s' % key,
fatal=True)
access_key_id = get('AccessKeyId')
secret_access_key = get('SecretAccessKey')
session_token = get('SessionToken')
# Task 1: http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html
uri = '/1/web/brands/%s/episodes/scrid/%s' % (self._SNI_TABLE[site_id], video_id)
datetime_now = datetime.datetime.utcnow().strftime('%Y%m%dT%H%M%SZ')
date = datetime_now[:8]
canonical_string = self._AWS_CANONICAL_REQUEST_TEMPLATE % {
'uri': uri,
'host': self._SNI_HOST,
'date': datetime_now,
'token': session_token,
'key': self._AWS_API_KEY,
'signed_headers': self._AWS_SIGNED_HEADERS,
'payload_hash': aws_hash(''),
} }
# Task 2: http://docs.aws.amazon.com/general/latest/gr/sigv4-create-string-to-sign.html
credential_string = '/'.join([date, self._AWS_REGION, self._AWS_SERVICE, self._AWS_REQUEST])
string_to_sign = '\n'.join([
'AWS4-HMAC-SHA256', datetime_now, credential_string,
aws_hash(canonical_string)])
# Task 3: http://docs.aws.amazon.com/general/latest/gr/sigv4-calculate-signature.html
def aws_hmac(key, msg):
return hmac.new(key, msg.encode('utf-8'), hashlib.sha256)
def aws_hmac_digest(key, msg):
return aws_hmac(key, msg).digest()
def aws_hmac_hexdigest(key, msg):
return aws_hmac(key, msg).hexdigest()
k_secret = 'AWS4' + secret_access_key
k_date = aws_hmac_digest(k_secret.encode('utf-8'), date)
k_region = aws_hmac_digest(k_date, self._AWS_REGION)
k_service = aws_hmac_digest(k_region, self._AWS_SERVICE)
k_signing = aws_hmac_digest(k_service, self._AWS_REQUEST)
signature = aws_hmac_hexdigest(k_signing, string_to_sign)
auth_header = ', '.join([
'AWS4-HMAC-SHA256 Credential=%s' % '/'.join(
[access_key_id, date, self._AWS_REGION, self._AWS_SERVICE, self._AWS_REQUEST]),
'SignedHeaders=%s' % self._AWS_SIGNED_HEADERS,
'Signature=%s' % signature,
])
mcp_id = self._download_json(
'https://%s%s' % (self._SNI_HOST, uri), video_id, headers={
'Accept': '*/*',
'Referer': url,
'Authorization': auth_header,
'X-Amz-Date': datetime_now,
'X-Amz-Security-Token': session_token,
'X-Api-Key': self._AWS_API_KEY,
})['results'][0]['mcpId']
return self.url_result(
smuggle_url(
'anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:%s' % mcp_id,
{'geo_countries': ['US']}),
AnvatoIE.ie_key(), video_id=mcp_id)
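
Note on the hunk above: the three "Task" comments follow the AWS Signature Version 4 steps (canonical request, string to sign, signature). The chain can be sketched on its own as below. This is a minimal illustration, not the extractor's code: the function names `sha256_hex`, `derive_signing_key` and `sign` are hypothetical, and the secret and canonical request in the usage note are placeholders.

```python
import hashlib
import hmac


def sha256_hex(text):
    # Hex SHA-256 of a text string, as used for the payload and canonical request hashes.
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def hmac_sha256(key, msg):
    # key is bytes, msg is text; returns the HMAC object so callers pick digest() or hexdigest().
    return hmac.new(key, msg.encode('utf-8'), hashlib.sha256)


def derive_signing_key(secret_access_key, date, region, service):
    # Each step keys the next HMAC with the previous raw digest.
    k_date = hmac_sha256(('AWS4' + secret_access_key).encode('utf-8'), date).digest()
    k_region = hmac_sha256(k_date, region).digest()
    k_service = hmac_sha256(k_region, service).digest()
    return hmac_sha256(k_service, 'aws4_request').digest()


def sign(secret_access_key, amz_date, region, service, canonical_request):
    # amz_date looks like 20171015T000000Z; the credential scope uses only the date part.
    date = amz_date[:8]
    scope = '/'.join([date, region, service, 'aws4_request'])
    string_to_sign = '\n'.join([
        'AWS4-HMAC-SHA256', amz_date, scope, sha256_hex(canonical_request)])
    signing_key = derive_signing_key(secret_access_key, date, region, service)
    return hmac_sha256(signing_key, string_to_sign).hexdigest()
```

Calling `sign('PLACEHOLDER-SECRET', '20171015T000000Z', 'us-east-1', 'execute-api', canonical_request)` yields the hex value that goes into the `Signature=` part of the `Authorization` header.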


@@ -18,46 +18,32 @@ from ..utils import (
class ShahidIE(InfoExtractor): class ShahidIE(InfoExtractor):
_NETRC_MACHINE = 'shahid' _NETRC_MACHINE = 'shahid'
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?P<type>episode|movie)/(?P<id>\d+)' _VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://shahid.mbc.net/ar/episode/90574/%D8%A7%D9%84%D9%85%D9%84%D9%83-%D8%B9%D8%A8%D8%AF%D8%A7%D9%84%D9%84%D9%87-%D8%A7%D9%84%D8%A5%D9%86%D8%B3%D8%A7%D9%86-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-3.html', 'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AC%D9%84%D8%B3-%D8%A7%D9%84%D8%B4%D8%A8%D8%A7%D8%A8-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-275286',
'info_dict': { 'info_dict': {
'id': '90574', 'id': '275286',
'ext': 'mp4', 'ext': 'mp4',
'title': 'الملك عبدالله الإنسان الموسم 1 كليب 3', 'title': 'مجلس الشباب الموسم 1 كليب 1',
'description': 'الفيلم الوثائقي - الملك عبد الله الإنسان', 'timestamp': 1506988800,
'duration': 2972, 'upload_date': '20171003',
'timestamp': 1422057420,
'upload_date': '20150123',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, { }, {
'url': 'https://shahid.mbc.net/ar/movie/151746/%D8%A7%D9%84%D9%82%D9%86%D8%A7%D8%B5%D8%A9.html', 'url': 'https://shahid.mbc.net/ar/movies/%D8%A7%D9%84%D9%82%D9%86%D8%A7%D8%B5%D8%A9/movie-151746',
'only_matching': True 'only_matching': True
}, { }, {
# shahid plus subscriber only # shahid plus subscriber only
'url': 'https://shahid.mbc.net/ar/episode/90511/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1.html', 'url': 'https://shahid.mbc.net/ar/series/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/episode-90511',
'only_matching': True 'only_matching': True
}] }]
def _real_initialize(self): def _api2_request(self, *args, **kwargs):
email, password = self._get_login_info()
if email is None:
return
try: try:
user_data = self._download_json( return self._download_json(*args, **kwargs)
'https://shahid.mbc.net/wd/service/users/login',
None, 'Logging in', data=json.dumps({
'email': email,
'password': password,
'basic': 'false',
}).encode('utf-8'), headers={
'Content-Type': 'application/json; charset=UTF-8',
})['user']
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError): if isinstance(e.cause, compat_HTTPError):
fail_data = self._parse_json( fail_data = self._parse_json(
@@ -69,6 +55,21 @@ class ShahidIE(InfoExtractor):
raise ExtractorError(faults_message, expected=True) raise ExtractorError(faults_message, expected=True)
raise raise
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
user_data = self._api2_request(
'https://shahid.mbc.net/wd/service/users/login',
None, 'Logging in', data=json.dumps({
'email': email,
'password': password,
'basic': 'false',
}).encode('utf-8'), headers={
'Content-Type': 'application/json; charset=UTF-8',
})['user']
self._download_webpage( self._download_webpage(
'https://shahid.mbc.net/populateContext', 'https://shahid.mbc.net/populateContext',
None, 'Populate Context', data=urlencode_postdata({ None, 'Populate Context', data=urlencode_postdata({
@@ -93,15 +94,17 @@ class ShahidIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
page_type, video_id = re.match(self._VALID_URL, url).groups() page_type, video_id = re.match(self._VALID_URL, url).groups()
if page_type == 'clip':
page_type = 'episode'
player = self._get_api_data(self._download_json( playout = self._api2_request(
'https://shahid.mbc.net/arContent/getPlayerContent-param-.id-%s.type-player.html' % video_id, 'https://api2.shahid.net/proxy/v2/playout/url/' + video_id,
video_id, 'Downloading player JSON')) video_id, 'Downloading player JSON')['playout']
if player.get('drm'): if playout.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True) raise ExtractorError('This video is DRM protected.', expected=True)
formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4') formats = self._extract_m3u8_formats(playout['url'], video_id, 'mp4')
self._sort_formats(formats) self._sort_formats(formats)
video = self._get_api_data(self._download_json( video = self._get_api_data(self._download_json(


@@ -44,6 +44,7 @@ class SpikeIE(MTVServicesInfoExtractor):
_FEED_URL = 'http://www.spike.com/feeds/mrss/' _FEED_URL = 'http://www.spike.com/feeds/mrss/'
_MOBILE_TEMPLATE = 'http://m.spike.com/videos/video.rbml?id=%s' _MOBILE_TEMPLATE = 'http://m.spike.com/videos/video.rbml?id=%s'
_CUSTOM_URL_REGEX = re.compile(r'spikenetworkapp://([^/]+/[-a-fA-F0-9]+)') _CUSTOM_URL_REGEX = re.compile(r'spikenetworkapp://([^/]+/[-a-fA-F0-9]+)')
_GEO_COUNTRIES = ['US']
def _extract_mgid(self, webpage): def _extract_mgid(self, webpage):
mgid = super(SpikeIE, self)._extract_mgid(webpage) mgid = super(SpikeIE, self)._extract_mgid(webpage)


@@ -1,7 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@@ -29,6 +28,7 @@ class UDNEmbedIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Failed to parse JSON Expecting value'],
}, { }, {
'url': 'https://video.udn.com/embed/news/300040', 'url': 'https://video.udn.com/embed/news/300040',
'only_matching': True, 'only_matching': True,
@@ -43,10 +43,21 @@ class UDNEmbedIE(InfoExtractor):
page = self._download_webpage(url, video_id) page = self._download_webpage(url, video_id)
options = json.loads(js_to_json(self._html_search_regex( options_str = self._html_search_regex(
r'var\s+options\s*=\s*([^;]+);', page, 'video urls dictionary'))) r'var\s+options\s*=\s*([^;]+);', page, 'options')
trans_options_str = js_to_json(options_str)
options = self._parse_json(trans_options_str, 'options', fatal=False) or {}
if options:
video_urls = options['video'] video_urls = options['video']
title = options['title']
poster = options.get('poster')
else:
video_urls = self._parse_json(self._html_search_regex(
r'"video"\s*:\s*({.+?})\s*,', trans_options_str, 'video urls'), 'video urls')
title = self._html_search_regex(
r"title\s*:\s*'(.+?)'\s*,", options_str, 'title')
poster = self._html_search_regex(
r"poster\s*:\s*'(.+?)'\s*,", options_str, 'poster', default=None)
if video_urls.get('youtube'): if video_urls.get('youtube'):
return self.url_result(video_urls.get('youtube'), 'Youtube') return self.url_result(video_urls.get('youtube'), 'Youtube')
@@ -68,7 +79,7 @@ class UDNEmbedIE(InfoExtractor):
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id='hds')) video_url, video_id, f4m_id='hds'))
else: else:
mobj = re.search(r'_(?P<height>\d+)p_(?P<tbr>\d+).mp4', video_url) mobj = re.search(r'_(?P<height>\d+)p_(?P<tbr>\d+)\.mp4', video_url)
a_format = { a_format = {
'url': video_url, 'url': video_url,
# video_type may be 'mp4', which confuses YoutubeDL # video_type may be 'mp4', which confuses YoutubeDL
@@ -83,14 +94,9 @@ class UDNEmbedIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
thumbnails = [{
'url': img_url,
'id': img_type,
} for img_type, img_url in options.get('gallery', [{}])[0].items() if img_url]
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'title': options['title'], 'title': title,
'thumbnails': thumbnails, 'thumbnail': poster,
} }
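
Note on the `\.mp4` change in the hunk above: an unescaped `.` matches any character, so the old pattern could accept URLs that do not actually end in `.mp4`. The small sketch below shows the difference. The names `LOOSE` and `STRICT` and the sample file names are made up for illustration.

```python
import re

# Old pattern: the dot before "mp4" matches any single character.
LOOSE = re.compile(r'_(?P<height>\d+)p_(?P<tbr>\d+).mp4')
# New pattern: the dot is escaped, so only a literal "." is accepted.
STRICT = re.compile(r'_(?P<height>\d+)p_(?P<tbr>\d+)\.mp4')

good = 'video_720p_1500.mp4'
bad = 'video_720p_1500xmp4'

# Both patterns extract the same fields from a well-formed name.
match = STRICT.search(good)
fields = (match.group('height'), match.group('tbr'))

# Only the loose pattern accepts the malformed name.
loose_accepts_bad = LOOSE.search(bad) is not None
strict_accepts_bad = STRICT.search(bad) is not None
```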


@@ -1,131 +1,41 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .mtv import MTVIE from .mtv import MTVServicesInfoExtractor
import re
from ..utils import fix_xml_ampersands
class VH1IE(MTVIE): class VH1IE(MTVServicesInfoExtractor):
IE_NAME = 'vh1.com' IE_NAME = 'vh1.com'
_FEED_URL = 'http://www.vh1.com/player/embed/AS3/fullepisode/rss/' _FEED_URL = 'http://www.vh1.com/feeds/mrss/'
_TESTS = [{ _TESTS = [{
'url': 'http://www.vh1.com/video/metal-evolution/full-episodes/progressive-metal/1678612/playlist.jhtml', 'url': 'http://www.vh1.com/episodes/0umwpq/hip-hop-squares-kent-jones-vs-nick-young-season-1-ep-120',
'playlist': [
{
'md5': '7827a7505f59633983165bbd2c119b52',
'info_dict': { 'info_dict': {
'id': '731565', 'title': 'Kent Jones vs. Nick Young',
'ext': 'mp4', 'description': 'Come to Play. Stay to Party. With Mike Epps, TIP, OShea Jackson Jr., T-Pain, Tisha Campbell-Martin and more.',
'title': 'Metal Evolution: Ep. 11 Act 1',
'description': 'Many rock academics have proclaimed that the truly progressive musicianship of the last 20 years has been found right here in the world of heavy metal, rather than obvious locales such as jazz, fusion or progressive rock. It stands to reason then, that much of this jaw-dropping virtuosity occurs within what\'s known as progressive metal, a genre that takes root with the likes of Rush in the \'70s, Queensryche and Fates Warning in the \'80s, and Dream Theater in the \'90s. Since then, the genre has exploded with creativity, spawning mind-bending, genre-defying acts such as Tool, Mastodon, Coheed And Cambria, Porcupine Tree, Meshuggah, A Perfect Circle and Opeth. Episode 12 looks at the extreme musicianship of these bands, as well as their often extreme literary prowess and conceptual strength, the end result being a rich level of respect and attention such challenging acts have brought upon the world of heavy metal, from a critical community usually dismissive of the form.'
}
}, },
{ 'playlist_mincount': 4,
'md5': '34fb4b7321c546b54deda2102a61821f',
'info_dict': {
'id': '731567',
'ext': 'mp4',
'title': 'Metal Evolution: Ep. 11 Act 2',
'description': 'Many rock academics have proclaimed that the truly progressive musicianship of the last 20 years has been found right here in the world of heavy metal, rather than obvious locales such as jazz, fusion or progressive rock. It stands to reason then, that much of this jaw-dropping virtuosity occurs within what\'s known as progressive metal, a genre that takes root with the likes of Rush in the \'70s, Queensryche and Fates Warning in the \'80s, and Dream Theater in the \'90s. Since then, the genre has exploded with creativity, spawning mind-bending, genre-defying acts such as Tool, Mastodon, Coheed And Cambria, Porcupine Tree, Meshuggah, A Perfect Circle and Opeth. Episode 11 looks at the extreme musicianship of these bands, as well as their often extreme literary prowess and conceptual strength, the end result being a rich level of respect and attention such challenging acts have brought upon the world of heavy metal, from a critical community usually dismissive of the form.'
}
},
{
'md5': '813f38dba4c1b8647196135ebbf7e048',
'info_dict': {
'id': '731568',
'ext': 'mp4',
'title': 'Metal Evolution: Ep. 11 Act 3',
'description': 'Many rock academics have proclaimed that the truly progressive musicianship of the last 20 years has been found right here in the world of heavy metal, rather than obvious locales such as jazz, fusion or progressive rock. It stands to reason then, that much of this jaw-dropping virtuosity occurs within what\'s known as progressive metal, a genre that takes root with the likes of Rush in the \'70s, Queensryche and Fates Warning in the \'80s, and Dream Theater in the \'90s. Since then, the genre has exploded with creativity, spawning mind-bending, genre-defying acts such as Tool, Mastodon, Coheed And Cambria, Porcupine Tree, Meshuggah, A Perfect Circle and Opeth. Episode 11 looks at the extreme musicianship of these bands, as well as their often extreme literary prowess and conceptual strength, the end result being a rich level of respect and attention such challenging acts have brought upon the world of heavy metal, from a critical community usually dismissive of the form.'
}
},
{
'md5': '51adb72439dfaed11c799115d76e497f',
'info_dict': {
'id': '731569',
'ext': 'mp4',
'title': 'Metal Evolution: Ep. 11 Act 4',
'description': 'Many rock academics have proclaimed that the truly progressive musicianship of the last 20 years has been found right here in the world of heavy metal, rather than obvious locales such as jazz, fusion or progressive rock. It stands to reason then, that much of this jaw-dropping virtuosity occurs within what\'s known as progressive metal, a genre that takes root with the likes of Rush in the \'70s, Queensryche and Fates Warning in the \'80s, and Dream Theater in the \'90s. Since then, the genre has exploded with creativity, spawning mind-bending, genre-defying acts such as Tool, Mastodon, Coheed And Cambria, Porcupine Tree, Meshuggah, A Perfect Circle and Opeth. Episode 11 looks at the extreme musicianship of these bands, as well as their often extreme literary prowess and conceptual strength, the end result being a rich level of respect and attention such challenging acts have brought upon the world of heavy metal, from a critical community usually dismissive of the form.'
}
},
{
'md5': '93d554aaf79320703b73a95288c76a6e',
'info_dict': {
'id': '731570',
'ext': 'mp4',
'title': 'Metal Evolution: Ep. 11 Act 5',
'description': 'Many rock academics have proclaimed that the truly progressive musicianship of the last 20 years has been found right here in the world of heavy metal, rather than obvious locales such as jazz, fusion or progressive rock. It stands to reason then, that much of this jaw-dropping virtuosity occurs within what\'s known as progressive metal, a genre that takes root with the likes of Rush in the \'70s, Queensryche and Fates Warning in the \'80s, and Dream Theater in the \'90s. Since then, the genre has exploded with creativity, spawning mind-bending, genre-defying acts such as Tool, Mastodon, Coheed And Cambria, Porcupine Tree, Meshuggah, A Perfect Circle and Opeth. Episode 11 looks at the extreme musicianship of these bands, as well as their often extreme literary prowess and conceptual strength, the end result being a rich level of respect and attention such challenging acts have brought upon the world of heavy metal, from a critical community usually dismissive of the form.'
}
}
],
'skip': 'Blocked outside the US',
}, { }, {
# Clip # Clip
'url': 'http://www.vh1.com/video/misc/706675/metal-evolution-episode-1-pre-metal-show-clip.jhtml#id=1674118', 'url': 'http://www.vh1.com/video-clips/t74mif/scared-famous-scared-famous-extended-preview',
'md5': '7d67cf6d9cdc6b4f3d3ac97a55403844',
'info_dict': { 'info_dict': {
'id': '706675', 'id': '0a50c2d2-a86b-4141-9565-911c7e2d0b92',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Metal Evolution: Episode 1 Pre-Metal Show Clip', 'title': 'Scared Famous|October 9, 2017|1|NO-EPISODE#|Scared Famous + Extended Preview',
'description': 'The greatest documentary ever made about Heavy Metal begins as our host Sam Dunn travels the globe to seek out the origins and influences that helped create Heavy Metal. Sam speaks to legends like Kirk Hammett, Alice Cooper, Slash, Bill Ward, Geezer Butler, Tom Morello, Ace Frehley, Lemmy Kilmister, Dave Davies, and many many more. This episode is the prologue for the 11 hour series, and Sam goes back to the very beginning to reveal how Heavy Metal was created.' 'description': 'md5:eff5551a274c473a29463de40f7b09da',
'upload_date': '20171009',
'timestamp': 1507574700,
}, },
'skip': 'Blocked outside the US', 'params': {
}, { # m3u8 download
# Short link 'skip_download': True,
'url': 'http://www.vh1.com/video/play.jhtml?id=1678353',
'md5': '853192b87ad978732b67dd8e549b266a',
'info_dict': {
'id': '730355',
'ext': 'mp4',
'title': 'Metal Evolution: Episode 11 Progressive Metal Sneak',
'description': 'In Metal Evolution\'s finale sneak, Sam sits with Michael Giles of King Crimson and gets feedback from Metallica guitarist Kirk Hammett on why the group was influential.'
}, },
'skip': 'Blocked outside the US',
}, {
'url': 'http://www.vh1.com/video/macklemore-ryan-lewis/900535/cant-hold-us-ft-ray-dalton.jhtml',
'md5': 'b1bcb5b4380c9d7f544065589432dee7',
'info_dict': {
'id': '900535',
'ext': 'mp4',
'title': 'Macklemore & Ryan Lewis - "Can\'t Hold Us ft. Ray Dalton"',
'description': 'The Heist'
},
'skip': 'Blocked outside the US',
}] }]
_VALID_URL = r'''(?x) _VALID_URL = r'https?://(?:www\.)?vh1\.com/(?:video-clips|episodes)/(?P<id>[^/?#.]+)'
https?://www\.vh1\.com/video/
(?:
.+?/full-episodes/.+?/(?P<playlist_id>[^/]+)/playlist\.jhtml
|
(?:
play.jhtml\?id=|
misc/.+?/.+?\.jhtml\#id=
)
(?P<video_id>[0-9]+)$
|
[^/]+/(?P<music_id>[0-9]+)/[^/]+?
)
'''
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) playlist_id = self._match_id(url)
if mobj.group('music_id'): webpage = self._download_webpage(url, playlist_id)
id_field = 'vid' mgid = self._extract_triforce_mgid(webpage)
video_id = mobj.group('music_id') videos_info = self._get_videos_info(mgid)
else: return videos_info
video_id = mobj.group('playlist_id') or mobj.group('video_id')
id_field = 'id'
doc_url = '%s?%s=%s' % (self._FEED_URL, id_field, video_id)
idoc = self._download_xml(
doc_url, video_id,
'Downloading info', transform_source=fix_xml_ampersands)
entries = []
for item in idoc.findall('.//item'):
info = self._get_video_info(item)
if info:
entries.append(info)
return self.playlist_result(entries, playlist_id=video_id)


@@ -1630,7 +1630,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
class="[^"]*"[^>]*> class="[^"]*"[^>]*>
[^<]+\.{3}\s* [^<]+\.{3}\s*
</a> </a>
''', r'\1', video_description) ''', lambda m: compat_urlparse.urljoin(url, m.group(1)), video_description)
video_description = clean_html(video_description) video_description = clean_html(video_description)
else: else:
fd_mobj = re.search(r'<meta name="description" content="([^"]+)"', video_webpage) fd_mobj = re.search(r'<meta name="description" content="([^"]+)"', video_webpage)
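
Note on the hunk above: replacing the `r'\1'` substitution with a callable lets each captured link be resolved against the page URL, so relative links in the description become absolute. The sketch below shows the same idea with the Python 3 standard library. It assumes a much simpler anchor pattern than the one in the extractor, and `absolutize_links` is a hypothetical helper name (youtube-dl itself goes through its `compat_urlparse` wrapper).

```python
import re
from urllib.parse import urljoin


def absolutize_links(html, base_url):
    # Replace each <a href="...">...</a> with its href resolved against base_url.
    # Simplified pattern for illustration only.
    return re.sub(
        r'<a\s+href="([^"]+)"[^>]*>[^<]*</a>',
        lambda m: urljoin(base_url, m.group(1)),
        html)
```

Links that are already absolute pass through `urljoin` unchanged, so the callable is safe to apply to every match.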


@@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.10.12' __version__ = '2017.10.15.1'