Compare commits


20 Commits

Author | SHA1 | Message | Date
Sergey M․ | b30e4c2754 | release 2016.11.04 | 2016-11-04 22:07:54 +07:00
Sergey M․ | 09ffe34b00 | [ChangeLog] Actualize | 2016-11-04 21:59:42 +07:00
Sergey M․ | 640aff1d0c | [anvato] Improve formats extraction | 2016-11-04 21:45:24 +07:00
Sergey M․ | c897af8aac | [cbslocal] Update test | 2016-11-04 21:33:08 +07:00
Sergey M․ | f3c705f8ec | [fox9] Add extractor (closes #11110) | 2016-11-04 21:32:30 +07:00
Sergey M․ | f93ac1d175 | [anvato] Extract more metadata | 2016-11-04 21:17:56 +07:00
Sergey M․ | c4c9b8440c | [extractor/common] Tolerate malformed RESOLUTION attribute in m3u8 manifests (closes #11113) | 2016-11-04 05:02:31 +07:00
Sergey M․ | 32f2627aed | [vodlocker] Add another removed file pattern (closes #11106) | 2016-11-03 22:22:40 +07:00
Sergey M․ | 9d64e1dcdc | [downloader/ism] Fix typo | 2016-11-03 22:15:09 +07:00
Remita Amine | 10380e55de | [downloader/ism] fix AVC Decoder Configuration Record creation in python 3 | 2016-11-03 16:08:57 +01:00
Remita Amine | 22979993e7 | [vice] add coding cookie | 2016-11-03 16:07:22 +01:00
Remita Amine | b47ecd0b74 | [vzaar] Add new extractor(closes #11093) | 2016-11-03 12:50:41 +01:00
Yen Chi Hsuan | 3a86b2c51e | Ignore and clean .wav files | 2016-11-03 18:55:55 +08:00
Remita Amine | b811b4c93b | [vice] add support for uplynk preplay videos(#11101) | 2016-11-03 10:37:07 +01:00
Remita Amine | f4dfa9a5ed | [tubitv] fix extraction(closes #11061) | 2016-11-03 09:04:20 +01:00
Remita Amine | 3b4b66b50c | [shahid] add support for authentication(closes #11091) | 2016-11-03 00:44:12 +01:00
Sergey M․ | 4119a96ce5 | [extractor/generic] Skip URLs we came from when delegating ISM extraction | 2016-11-02 23:43:41 +07:00
Sergey M․ | 26aae56690 | [extractor/generic] Improve ISM extraction | 2016-11-02 23:34:37 +07:00
Remita Amine | 4f9cd4d36f | [radiocanada] extract subtitle(closes #11096) | 2016-11-02 13:55:40 +01:00
Sergey M․ | cc99a77ac1 | [extractor/generic] Add support for ISM manifests | 2016-11-02 03:01:13 +07:00
20 changed files with 380 additions and 150 deletions

.github/ISSUE_TEMPLATE.md

@@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.11.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.11.04*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.11.02** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.11.04**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.11.02 [debug] youtube-dl version 2016.11.04
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

.gitignore

@@ -30,6 +30,7 @@ updates_key.pem
*.m4v *.m4v
*.mp3 *.mp3
*.3gp *.3gp
*.wav
*.part *.part
*.swp *.swp
test/testdata test/testdata

ChangeLog

@@ -1,3 +1,22 @@
version 2016.11.04
Core
* [extractor/common] Tolerate malformed RESOLUTION attribute in m3u8
manifests (#11113)
* [downloader/ism] Fix AVC Decoder Configuration Record
Extractors
+ [fox9] Add support for fox9.com (#11110)
+ [anvato] Extract more metadata and improve formats extraction
* [vodlocker] Improve removed videos detection (#11106)
+ [vzaar] Add support for vzaar.com (#11093)
+ [vice] Add support for uplynk preplay videos (#11101)
* [tubitv] Fix extraction (#11061)
+ [shahid] Add support for authentication (#11091)
+ [radiocanada] Add subtitles support (#11096)
+ [generic] Add support for ISM manifests
version 2016.11.02 version 2016.11.02
Core Core

Makefile

@@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean: clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete find . -name "*.pyc" -delete
find . -name "*.class" -delete find . -name "*.class" -delete

docs/supportedsites.md

@@ -247,6 +247,7 @@
- **FootyRoom** - **FootyRoom**
- **Formula1** - **Formula1**
- **FOX** - **FOX**
- **FOX9**
- **Foxgay** - **Foxgay**
- **foxnews**: Fox News and Fox Business Video - **foxnews**: Fox News and Fox Business Video
- **foxnews:article** - **foxnews:article**
@@ -870,6 +871,7 @@
- **vube**: Vube.com - **vube**: Vube.com
- **VuClip** - **VuClip**
- **VyboryMos** - **VyboryMos**
- **Vzaar**
- **Walla** - **Walla**
- **washingtonpost** - **washingtonpost**
- **washingtonpost:article** - **washingtonpost:article**

youtube_dl/downloader/ism.py

@@ -129,7 +129,7 @@ def write_piff_header(stream, params):
sample_entry_payload += u1616.pack(params['sampling_rate']) sample_entry_payload += u1616.pack(params['sampling_rate'])
if fourcc == 'AACL': if fourcc == 'AACL':
smaple_entry_box = box(b'mp4a', sample_entry_payload) sample_entry_box = box(b'mp4a', sample_entry_payload)
else: else:
sample_entry_payload = sample_entry_payload sample_entry_payload = sample_entry_payload
sample_entry_payload += u16.pack(0) # pre defined sample_entry_payload += u16.pack(0) # pre defined
@@ -149,9 +149,7 @@ def write_piff_header(stream, params):
if fourcc in ('H264', 'AVC1'): if fourcc in ('H264', 'AVC1'):
sps, pps = codec_private_data.split(u32.pack(1))[1:] sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1] # avc profile indication avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
avcc_payload += sps[2] # profile compatibility
avcc_payload += sps[3] # avc level indication
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001) avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps)) avcc_payload += u16.pack(len(sps))
@@ -160,8 +158,8 @@ def write_piff_header(stream, params):
avcc_payload += u16.pack(len(pps)) avcc_payload += u16.pack(len(pps))
avcc_payload += pps avcc_payload += pps
sample_entry_payload += box(b'avcC', avcc_payload) # AVC Decoder Configuration Record sample_entry_payload += box(b'avcC', avcc_payload) # AVC Decoder Configuration Record
smaple_entry_box = box(b'avc1', sample_entry_payload) # AVC Simple Entry sample_entry_box = box(b'avc1', sample_entry_payload) # AVC Simple Entry
stsd_payload += smaple_entry_box stsd_payload += sample_entry_box
stbl_payload = full_box(b'stsd', 0, 0, stsd_payload) # Sample Description Box stbl_payload = full_box(b'stsd', 0, 0, stsd_payload) # Sample Description Box
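
The Python 3 fix in this hunk comes down to one difference. Indexing a `bytes` object returns an `int`, so `avcc_payload += sps[1]` raises `TypeError`. Slicing (`sps[1:4]`) returns `bytes` on both Python 2 and 3. The sketch below shows the avcC payload assembly with that fix applied. It is a minimal standalone version and makes these assumptions:

- `codec_private_data` holds one SPS and one PPS, each prefixed with a 4-byte `00 00 00 01` start code, as the hunk's `split(u32.pack(1))` implies.
- `build_avcc_payload` is a hypothetical name.
- The SPS/PPS bytes are made up.

```python
import struct

u8 = struct.Struct('>B')
u16 = struct.Struct('>H')
u32 = struct.Struct('>I')


def build_avcc_payload(codec_private_data, nal_unit_length_field=4):
    """Assemble an AVC Decoder Configuration Record body (one SPS, one PPS)."""
    # Splitting on the 00 00 00 01 start code yields ['', sps, pps].
    sps, pps = codec_private_data.split(u32.pack(1))[1:]
    payload = u8.pack(1)  # configuration version
    # Slicing keeps bytes on Python 3; sps[1] alone would be an int there.
    payload += sps[1:4]  # profile indication, profile compatibility, level indication
    payload += u8.pack(0xfc | (nal_unit_length_field - 1))  # reserved bits + length size minus one
    payload += u8.pack(1)  # number of SPS, written as in the hunk
    payload += u16.pack(len(sps)) + sps
    payload += u8.pack(1)  # number of PPS
    payload += u16.pack(len(pps)) + pps
    return payload


if __name__ == '__main__':
    sps = b'\x67\x64\x00\x1f\xac'  # made-up SPS NAL unit
    pps = b'\x68\xeb'              # made-up PPS NAL unit
    private = u32.pack(1) + sps + u32.pack(1) + pps
    print(build_avcc_payload(private).hex())
```

With the default 4-byte NAL length field, the fifth byte of the payload is `0xff`.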

youtube_dl/extractor/anvato.py

@@ -157,22 +157,16 @@ class AnvatoIE(InfoExtractor):
video_data_url, video_id, transform_source=strip_jsonp, video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8')) data=json.dumps(payload).encode('utf-8'))
def _extract_anvato_videos(self, webpage, video_id): def _get_anvato_videos(self, access_key, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
video_id = anvplayer_data['video']
access_key = anvplayer_data['accessKey']
video_data = self._get_video_json(access_key, video_id) video_data = self._get_video_json(access_key, video_id)
formats = [] formats = []
for published_url in video_data['published_urls']: for published_url in video_data['published_urls']:
video_url = published_url['embed_url'] video_url = published_url['embed_url']
media_format = published_url.get('format')
ext = determine_ext(video_url) ext = determine_ext(video_url)
if ext == 'smil': if ext == 'smil' or media_format == 'smil':
formats.extend(self._extract_smil_formats(video_url, video_id)) formats.extend(self._extract_smil_formats(video_url, video_id))
continue continue
@@ -183,7 +177,7 @@ class AnvatoIE(InfoExtractor):
'tbr': tbr if tbr != 0 else None, 'tbr': tbr if tbr != 0 else None,
} }
if ext == 'm3u8': if ext == 'm3u8' or media_format in ('m3u8', 'm3u8-variant'):
# Not using _extract_m3u8_formats here as individual media # Not using _extract_m3u8_formats here as individual media
# playlists are also included in published_urls. # playlists are also included in published_urls.
if tbr is None: if tbr is None:
@@ -194,7 +188,7 @@ class AnvatoIE(InfoExtractor):
'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])), 'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])),
'ext': 'mp4', 'ext': 'mp4',
}) })
elif ext == 'mp3': elif ext == 'mp3' or media_format == 'mp3':
a_format['vcodec'] = 'none' a_format['vcodec'] = 'none'
else: else:
a_format.update({ a_format.update({
@@ -218,7 +212,19 @@ class AnvatoIE(InfoExtractor):
'formats': formats, 'formats': formats,
'title': video_data.get('def_title'), 'title': video_data.get('def_title'),
'description': video_data.get('def_description'), 'description': video_data.get('def_description'),
'tags': video_data.get('def_tags', '').split(','),
'categories': video_data.get('categories'), 'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'), 'thumbnail': video_data.get('thumbnail'),
'timestamp': int_or_none(video_data.get(
'ts_published') or video_data.get('ts_added')),
'uploader': video_data.get('mcp_id'),
'duration': int_or_none(video_data.get('duration')),
'subtitles': subtitles, 'subtitles': subtitles,
} }
def _extract_anvato_videos(self, webpage, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
return self._get_anvato_videos(
anvplayer_data['accessKey'], anvplayer_data['video'])
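
Format detection now falls back to each `published_urls` entry's `format` field when the URL itself carries no telling extension. The sketch below shows that dispatch in Python 3. It is a minimal version and makes these assumptions:

- Entries are shaped like `{'embed_url': ..., 'format': ...}`, as read in the hunk.
- `guess_ext` is a simplified, hypothetical stand-in for youtube-dl's `determine_ext`.
- The category labels and example URLs are made up.

```python
from urllib.parse import urlparse


def guess_ext(url):
    """Simplified extension guess: text after the last '.' of the URL path."""
    last_segment = urlparse(url).path.rsplit('/', 1)[-1]
    if '.' not in last_segment:
        return None
    return last_segment.rsplit('.', 1)[-1].lower()


def classify_published_url(published_url):
    """Mirror the hunk's if/elif chain: extension first, then 'format'."""
    video_url = published_url['embed_url']
    media_format = published_url.get('format')
    ext = guess_ext(video_url)
    if ext == 'smil' or media_format == 'smil':
        return 'smil'
    if ext == 'm3u8' or media_format in ('m3u8', 'm3u8-variant'):
        return 'hls'
    if ext == 'mp3' or media_format == 'mp3':
        return 'audio-only'  # the extractor sets vcodec to 'none' here
    return 'http'


if __name__ == '__main__':
    print(classify_published_url({
        'embed_url': 'https://cdn.example.com/v/123/master',
        'format': 'm3u8-variant',
    }))
```

The same hunk has a related edge case. `video_data.get('def_tags', '').split(',')` returns `['']` when `def_tags` is absent, because splitting an empty string yields one empty element.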

youtube_dl/extractor/cbslocal.py

@@ -22,6 +22,7 @@ class CBSLocalIE(AnvatoIE):
'thumbnail': 're:^https?://.*', 'thumbnail': 're:^https?://.*',
'timestamp': 1463440500, 'timestamp': 1463440500,
'upload_date': '20160516', 'upload_date': '20160516',
'uploader': 'CBS',
'subtitles': { 'subtitles': {
'en': 'mincount:5', 'en': 'mincount:5',
}, },
@@ -35,6 +36,7 @@ class CBSLocalIE(AnvatoIE):
'Syndication\\Curb.tv', 'Syndication\\Curb.tv',
'Content\\News' 'Content\\News'
], ],
'tags': ['CBS 2 News Evening'],
}, },
}, { }, {
# SendtoNews embed # SendtoNews embed

youtube_dl/extractor/common.py

@@ -1280,9 +1280,10 @@ class InfoExtractor(object):
} }
resolution = last_info.get('RESOLUTION') resolution = last_info.get('RESOLUTION')
if resolution: if resolution:
width_str, height_str = resolution.split('x') mobj = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', resolution)
f['width'] = int(width_str) if mobj:
f['height'] = int(height_str) f['width'] = int(mobj.group('width'))
f['height'] = int(mobj.group('height'))
# Unified Streaming Platform # Unified Streaming Platform
mobj = re.search( mobj = re.search(
r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url']) r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url'])
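
The HLS specification defines RESOLUTION as two decimal integers separated by a lowercase `x`. The old `split('x')` therefore crashes on anything else, such as an uppercase separator. The sketch below isolates the fix using only the standard library. `parse_resolution` and `parse_resolution_old` are hypothetical helpers that reproduce the new regex and the old approach from the hunk.

```python
import re

# Same pattern as the new code in the hunk: digits, 'x' or 'X', digits.
RESOLUTION_RE = re.compile(r'(?P<width>\d+)[xX](?P<height>\d+)')


def parse_resolution(resolution):
    """Return (width, height), or None if no WxH pair can be found."""
    mobj = RESOLUTION_RE.search(resolution)
    if not mobj:
        return None
    return int(mobj.group('width')), int(mobj.group('height'))


def parse_resolution_old(resolution):
    """Old behaviour: assumes exactly one lowercase 'x' separator."""
    width_str, height_str = resolution.split('x')
    return int(width_str), int(height_str)


if __name__ == '__main__':
    print(parse_resolution('1280x720'))
    print(parse_resolution('640X360'))
    print(parse_resolution('unknown'))
    try:
        parse_resolution_old('640X360')
    except ValueError as exc:
        print('old parser failed:', exc)
```

When nothing matches, the new code simply leaves `width` and `height` unset instead of aborting the whole manifest.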

youtube_dl/extractor/extractors.py

@@ -296,6 +296,7 @@ from .footyroom import FootyRoomIE
from .formula1 import Formula1IE from .formula1 import Formula1IE
from .fourtube import FourTubeIE from .fourtube import FourTubeIE
from .fox import FOXIE from .fox import FOXIE
from .fox9 import FOX9IE
from .foxgay import FoxgayIE from .foxgay import FoxgayIE
from .foxnews import ( from .foxnews import (
FoxNewsIE, FoxNewsIE,
@@ -1101,6 +1102,7 @@ from .vrt import VRTIE
from .vube import VubeIE from .vube import VubeIE
from .vuclip import VuClipIE from .vuclip import VuClipIE
from .vyborymos import VyboryMosIE from .vyborymos import VyboryMosIE
from .vzaar import VzaarIE
from .walla import WallaIE from .walla import WallaIE
from .washingtonpost import ( from .washingtonpost import (
WashingtonPostIE, WashingtonPostIE,

youtube_dl/extractor/fox9.py

@@ -0,0 +1,43 @@
# coding: utf-8
from __future__ import unicode_literals
from .anvato import AnvatoIE
from ..utils import js_to_json
class FOX9IE(AnvatoIE):
_VALID_URL = r'https?://(?:www\.)?fox9\.com/(?:[^/]+/)+(?P<id>\d+)-story'
_TESTS = [{
'url': 'http://www.fox9.com/news/215123287-story',
'md5': 'd6e1b2572c3bab8a849c9103615dd243',
'info_dict': {
'id': '314473',
'ext': 'mp4',
'title': 'Bear climbs tree in downtown Duluth',
'description': 'md5:6a36bfb5073a411758a752455408ac90',
'duration': 51,
'timestamp': 1478123580,
'upload_date': '20161102',
'uploader': 'EPFOX',
'categories': ['News', 'Sports'],
'tags': ['news', 'video'],
},
}, {
'url': 'http://www.fox9.com/news/investigators/214070684-story',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._parse_json(
self._search_regex(
r'AnvatoPlaylist\s*\(\s*(\[.+?\])\s*\)\s*;',
webpage, 'anvato playlist'),
video_id, transform_source=js_to_json)[0]['video']
return self._get_anvato_videos(
'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b',
video_id)
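
`FOX9IE` pulls the Anvato video ID out of an inline `AnvatoPlaylist([...]);` call, then reuses `_get_anvato_videos` with a fixed access key. The sketch below shows only the ID lookup. It assumes the playlist literal is already valid JSON; the real extractor runs it through `js_to_json` because it may not be. The page snippet is made up.

```python
import json
import re

PLAYLIST_RE = re.compile(r'AnvatoPlaylist\s*\(\s*(\[.+?\])\s*\)\s*;')


def extract_anvato_video_id(webpage):
    """Return the 'video' field of the first playlist entry, or None."""
    mobj = PLAYLIST_RE.search(webpage)
    if not mobj:
        return None
    playlist = json.loads(mobj.group(1))
    return playlist[0]['video']


if __name__ == '__main__':
    page = '<script>AnvatoPlaylist([{"video": "314473", "title": "Bear"}]);</script>'
    print(extract_anvato_video_id(page))
```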

youtube_dl/extractor/generic.py

@@ -1634,6 +1634,10 @@ class GenericIE(InfoExtractor):
doc = compat_etree_fromstring(webpage.encode('utf-8')) doc = compat_etree_fromstring(webpage.encode('utf-8'))
if doc.tag == 'rss': if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc) return self._extract_rss(url, video_id, doc)
elif doc.tag == 'SmoothStreamingMedia':
info_dict['formats'] = self._parse_ism_formats(doc, url)
self._sort_formats(info_dict['formats'])
return info_dict
elif re.match(r'^(?:{[^}]+})?smil$', doc.tag): elif re.match(r'^(?:{[^}]+})?smil$', doc.tag):
smil = self._parse_smil(doc, url, video_id) smil = self._parse_smil(doc, url, video_id)
self._sort_formats(smil['formats']) self._sort_formats(smil['formats'])
@@ -2449,6 +2453,21 @@ class GenericIE(InfoExtractor):
entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id) entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id)
elif ext == 'f4m': elif ext == 'f4m':
entry_info_dict['formats'] = self._extract_f4m_formats(video_url, video_id) entry_info_dict['formats'] = self._extract_f4m_formats(video_url, video_id)
elif re.search(r'(?i)\.(?:ism|smil)/manifest', video_url) and video_url != url:
# Just matching .ism/manifest is not enough to be reliably sure
# whether it's actually an ISM manifest or some other streaming
# manifest since there are various streaming URL formats
# possible (see [1]) as well as some other shenanigans like
# .smil/manifest URLs that actually serve an ISM (see [2]) and
# so on.
# Thus the most reasonable way to solve this is to delegate
# to generic extractor in order to look into the contents of
# the manifest itself.
# 1. https://azure.microsoft.com/en-us/documentation/articles/media-services-deliver-content-overview/#streaming-url-formats
# 2. https://svs.itworkscdn.net/lbcivod/smil:itwfcdn/lbci/170976.smil/Manifest
entry_info_dict = self.url_result(
smuggle_url(video_url, {'to_generic': True}),
GenericIE.ie_key())
else: else:
entry_info_dict['url'] = video_url entry_info_dict['url'] = video_url
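
These hunks add two checks:

- A downloaded document whose root element is `SmoothStreamingMedia` is parsed as an ISM manifest.
- A video URL matching `.ism/manifest` or `.smil/manifest` (case-insensitively) is handed back to the generic extractor rather than guessed at. The exception is when it equals the URL currently being processed, which keeps the extractor from delegating to itself.

The sketch below covers both checks using only the standard library. The function names are hypothetical. The test URL is the one cited in the code comment; the manifest snippet is made up.

```python
import re
import xml.etree.ElementTree as etree

ISM_URL_RE = re.compile(r'(?i)\.(?:ism|smil)/manifest')


def should_delegate_to_generic(video_url, current_url):
    """True if video_url looks like a manifest worth inspecting and is not
    the URL currently being processed (avoids delegating to ourselves)."""
    return bool(ISM_URL_RE.search(video_url)) and video_url != current_url


def is_ism_manifest(document):
    """True if the XML document's root element is SmoothStreamingMedia."""
    return etree.fromstring(document).tag == 'SmoothStreamingMedia'


if __name__ == '__main__':
    print(should_delegate_to_generic(
        'https://svs.itworkscdn.net/lbcivod/smil:itwfcdn/lbci/170976.smil/Manifest',
        'https://example.com/page'))
    print(is_ism_manifest('<SmoothStreamingMedia MajorVersion="2" MinorVersion="0"/>'))
```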

youtube_dl/extractor/radiocanada.py

@@ -125,6 +125,14 @@ class RadioCanadaIE(InfoExtractor):
f4m_id='hds', fatal=False)) f4m_id='hds', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
closed_caption_url = get_meta('closedCaption') or get_meta('closedCaptionHTML5')
if closed_caption_url:
subtitles['fr'] = [{
'url': closed_caption_url,
'ext': determine_ext(closed_caption_url, 'vtt'),
}]
return { return {
'id': video_id, 'id': video_id,
'title': get_meta('Title'), 'title': get_meta('Title'),
@@ -135,6 +143,7 @@ class RadioCanadaIE(InfoExtractor):
'season_number': int_or_none('SrcSaison'), 'season_number': int_or_none('SrcSaison'),
'episode_number': int_or_none('SrcEpisode'), 'episode_number': int_or_none('SrcEpisode'),
'upload_date': unified_strdate(get_meta('Date')), 'upload_date': unified_strdate(get_meta('Date')),
'subtitles': subtitles,
'formats': formats, 'formats': formats,
} }
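
The closed-caption URL is read from one of two metadata keys. Its extension falls back to `vtt` when the URL does not reveal one. The sketch below shows that fallback and makes these assumptions:

- `meta` is a plain dict standing in for the extractor's `get_meta` helper.
- `guess_ext` is a simplified, hypothetical stand-in for `determine_ext(url, default_ext)`.
- The URLs are made up.

```python
import re


def guess_ext(url, default_ext):
    """Simplified determine_ext: alphanumeric text after the last '.' of the
    URL (query string removed), otherwise default_ext."""
    guess = url.partition('?')[0].rpartition('.')[2]
    if re.match(r'^[A-Za-z0-9]+$', guess):
        return guess
    return default_ext


def build_subtitles(meta):
    """Mirror the hunk: prefer 'closedCaption', then 'closedCaptionHTML5'."""
    subtitles = {}
    cc_url = meta.get('closedCaption') or meta.get('closedCaptionHTML5')
    if cc_url:
        subtitles['fr'] = [{'url': cc_url, 'ext': guess_ext(cc_url, 'vtt')}]
    return subtitles


if __name__ == '__main__':
    print(build_subtitles({'closedCaptionHTML5': 'https://example.com/cc/123'}))
```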

youtube_dl/extractor/shahid.py

@@ -1,17 +1,24 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
str_or_none, str_or_none,
urlencode_postdata,
clean_html,
) )
class ShahidIE(InfoExtractor): class ShahidIE(InfoExtractor):
_VALID_URL = r'https?://shahid\.mbc\.net/ar/episode/(?P<id>\d+)/?' _NETRC_MACHINE = 'shahid'
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?P<type>episode|movie)/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://shahid.mbc.net/ar/episode/90574/%D8%A7%D9%84%D9%85%D9%84%D9%83-%D8%B9%D8%A8%D8%AF%D8%A7%D9%84%D9%84%D9%87-%D8%A7%D9%84%D8%A5%D9%86%D8%B3%D8%A7%D9%86-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-3.html', 'url': 'https://shahid.mbc.net/ar/episode/90574/%D8%A7%D9%84%D9%85%D9%84%D9%83-%D8%B9%D8%A8%D8%AF%D8%A7%D9%84%D9%84%D9%87-%D8%A7%D9%84%D8%A5%D9%86%D8%B3%D8%A7%D9%86-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-3.html',
'info_dict': { 'info_dict': {
@@ -27,18 +34,54 @@ class ShahidIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
'url': 'https://shahid.mbc.net/ar/movie/151746/%D8%A7%D9%84%D9%82%D9%86%D8%A7%D8%B5%D8%A9.html',
'only_matching': True
}, { }, {
# shahid plus subscriber only # shahid plus subscriber only
'url': 'https://shahid.mbc.net/ar/episode/90511/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1.html', 'url': 'https://shahid.mbc.net/ar/episode/90511/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1.html',
'only_matching': True 'only_matching': True
}] }]
def _call_api(self, path, video_id, note): def _real_initialize(self):
data = self._download_json( email, password = self._get_login_info()
'http://api.shahid.net/api/v1_1/' + path, video_id, note, query={ if email is None:
'apiKey': 'sh@hid0nlin3', return
'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
}).get('data', {}) try:
user_data = self._download_json(
'https://shahid.mbc.net/wd/service/users/login',
None, 'Logging in', data=json.dumps({
'email': email,
'password': password,
'basic': 'false',
}).encode('utf-8'), headers={
'Content-Type': 'application/json; charset=UTF-8',
})['user']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
fail_data = self._parse_json(
e.cause.read().decode('utf-8'), None, fatal=False)
if fail_data:
faults = fail_data.get('faults', [])
faults_message = ', '.join([clean_html(fault['userMessage']) for fault in faults if fault.get('userMessage')])
if faults_message:
raise ExtractorError(faults_message, expected=True)
raise
self._download_webpage(
'https://shahid.mbc.net/populateContext',
None, 'Populate Context', data=urlencode_postdata({
'firstName': user_data['firstName'],
'lastName': user_data['lastName'],
'userName': user_data['email'],
'csg_user_name': user_data['email'],
'subscriberId': user_data['id'],
'sessionId': user_data['sessionId'],
}))
def _get_api_data(self, response):
data = response.get('data', {})
error = data.get('error') error = data.get('error')
if error: if error:
@@ -49,11 +92,11 @@ class ShahidIE(InfoExtractor):
return data return data
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) page_type, video_id = re.match(self._VALID_URL, url).groups()
player = self._call_api( player = self._get_api_data(self._download_json(
'Content/Episode/%s' % video_id, 'https://shahid.mbc.net/arContent/getPlayerContent-param-.id-%s.type-player.html' % video_id,
video_id, 'Downloading player JSON') video_id, 'Downloading player JSON'))
if player.get('drm'): if player.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True) raise ExtractorError('This video is DRM protected.', expected=True)
@@ -61,9 +104,12 @@ class ShahidIE(InfoExtractor):
formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4') formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4')
self._sort_formats(formats) self._sort_formats(formats)
video = self._call_api( video = self._get_api_data(self._download_json(
'episode/%s' % video_id, video_id, 'http://api.shahid.net/api/v1_1/%s/%s' % (page_type, video_id),
'Downloading video JSON')['episode'] video_id, 'Downloading video JSON', query={
'apiKey': 'sh@hid0nlin3',
'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
}))[page_type]
title = video['title'] title = video['title']
categories = [ categories = [
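
`ShahidIE` now makes two changes:

- It distinguishes episode pages from movie pages via the `type` group in `_VALID_URL`.
- When login fails with an HTTP error, it surfaces the server's `faults[].userMessage` strings instead of the bare error.

The sketch below shows both pieces. It makes these assumptions:

- The error body is shaped like the one the hunk parses.
- `strip_tags` is a crude, hypothetical stand-in for `clean_html`.
- The error JSON is made up.

```python
import json
import re

VALID_URL = r'https?://shahid\.mbc\.net/ar/(?P<type>episode|movie)/(?P<id>\d+)'


def parse_shahid_url(url):
    """Return (page_type, video_id), as _real_extract does with .groups()."""
    return re.match(VALID_URL, url).groups()


def strip_tags(html):
    """Very rough tag stripper; youtube-dl's clean_html does more."""
    return re.sub(r'<[^>]+>', '', html).strip()


def login_fault_message(error_body):
    """Join user-facing fault messages from a failed login, or return None."""
    try:
        fail_data = json.loads(error_body)
    except ValueError:
        return None
    if not isinstance(fail_data, dict):
        return None
    faults = fail_data.get('faults', [])
    messages = [strip_tags(f['userMessage']) for f in faults if f.get('userMessage')]
    return ', '.join(messages) or None


if __name__ == '__main__':
    print(parse_shahid_url('https://shahid.mbc.net/ar/movie/151746/x.html'))
    print(login_fault_message('{"faults": [{"userMessage": "<b>Wrong</b> password"}]}'))
```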

youtube_dl/extractor/tubitv.py

@@ -9,7 +9,6 @@ from ..utils import (
int_or_none, int_or_none,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
parse_iso8601,
) )
@@ -19,17 +18,13 @@ class TubiTvIE(InfoExtractor):
_NETRC_MACHINE = 'tubitv' _NETRC_MACHINE = 'tubitv'
_TEST = { _TEST = {
'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday', 'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday',
'md5': '43ac06be9326f41912dc64ccf7a80320',
'info_dict': { 'info_dict': {
'id': '283829', 'id': '283829',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Comedian at The Friday', 'title': 'The Comedian at The Friday',
'description': 'A stand up comedian is forced to look at the decisions in his life while on a one week trip to the west coast.', 'description': 'A stand up comedian is forced to look at the decisions in his life while on a one week trip to the west coast.',
'uploader': 'Indie Rights Films', 'uploader_id': 'bc168bee0d18dd1cb3b86c68706ab434',
'upload_date': '20160111',
'timestamp': 1452555979,
},
'params': {
'skip_download': 'HLS download',
}, },
} }
@@ -58,19 +53,28 @@ class TubiTvIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = self._download_json(
'http://tubitv.com/oz/videos/%s/content' % video_id, video_id) 'http://tubitv.com/oz/videos/%s/content' % video_id, video_id)
title = video_data['n'] title = video_data['title']
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
video_data['mh'], video_id, 'mp4', 'm3u8_native') self._proto_relative_url(video_data['url']),
video_id, 'mp4', 'm3u8_native')
self._sort_formats(formats) self._sort_formats(formats)
thumbnails = []
for thumbnail_url in video_data.get('thumbnails', []):
if not thumbnail_url:
continue
thumbnails.append({
'url': self._proto_relative_url(thumbnail_url),
})
subtitles = {} subtitles = {}
for sub in video_data.get('sb', []): for sub in video_data.get('subtitles', []):
sub_url = sub.get('u') sub_url = sub.get('url')
if not sub_url: if not sub_url:
continue continue
subtitles.setdefault(sub.get('l', 'en'), []).append({ subtitles.setdefault(sub.get('lang', 'English'), []).append({
'url': sub_url, 'url': self._proto_relative_url(sub_url),
}) })
return { return {
@@ -78,9 +82,8 @@ class TubiTvIE(InfoExtractor):
'title': title, 'title': title,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'thumbnail': video_data.get('ph'), 'thumbnails': thumbnails,
'description': video_data.get('d'), 'description': video_data.get('description'),
'duration': int_or_none(video_data.get('s')), 'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('u')), 'uploader_id': video_data.get('publisher_id'),
'uploader': video_data.get('on'),
} }
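
The new TubiTV payload may contain protocol-relative URLs (`//host/path`) for the stream, thumbnails and subtitles, hence the `_proto_relative_url` calls. The sketch below shows that normalisation. It makes these assumptions:

- The payload uses the keys read in the hunk: `url`, `thumbnails`, `subtitles[].url` and `subtitles[].lang`.
- The default scheme is `https:`.
- `proto_relative_url` is a hypothetical standalone version of the helper.
- The payload is made up.

```python
def proto_relative_url(url, scheme='https:'):
    """Prefix scheme-less '//host/path' URLs; leave other URLs untouched."""
    if url is not None and url.startswith('//'):
        return scheme + url
    return url


def normalise_tubitv_payload(video_data):
    """Collect stream, thumbnail and subtitle URLs with schemes filled in."""
    thumbnails = [
        {'url': proto_relative_url(thumbnail_url)}
        for thumbnail_url in video_data.get('thumbnails', [])
        if thumbnail_url  # skip empty entries, as the hunk does
    ]
    subtitles = {}
    for sub in video_data.get('subtitles', []):
        sub_url = sub.get('url')
        if not sub_url:
            continue
        subtitles.setdefault(sub.get('lang', 'English'), []).append(
            {'url': proto_relative_url(sub_url)})
    return {
        'url': proto_relative_url(video_data['url']),
        'thumbnails': thumbnails,
        'subtitles': subtitles,
    }
```

Note that subtitle keys here are language names such as `English` rather than language codes, matching the hunk's default.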

youtube_dl/extractor/vice.py

@@ -1,12 +1,93 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import time
import hashlib
import json
from .adobepass import AdobePassIE
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError from ..compat import compat_HTTPError
from ..utils import (
int_or_none,
parse_age_limit,
str_or_none,
parse_duration,
ExtractorError,
extract_attributes,
)
class ViceIE(InfoExtractor): class ViceBaseIE(AdobePassIE):
def _extract_preplay_video(self, url, webpage):
watch_hub_data = extract_attributes(self._search_regex(
r'(?s)(<watch-hub\s*.+?</watch-hub>)', webpage, 'watch hub'))
video_id = watch_hub_data['vms-id']
title = watch_hub_data['video-title']
query = {}
is_locked = watch_hub_data.get('video-locked') == '1'
if is_locked:
resource = self._get_mvpd_resource(
'VICELAND', title, video_id,
watch_hub_data.get('video-rating'))
query['tvetoken'] = self._extract_mvpd_auth(url, video_id, 'VICELAND', resource)
# signature generation algorithm is reverse engineered from signatureGenerator in
# webpack:///../shared/~/vice-player/dist/js/vice-player.js in
# https://www.viceland.com/assets/common/js/web.vendor.bundle.js
exp = int(time.time()) + 14400
query.update({
'exp': exp,
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
})
try:
host = 'www.viceland' if is_locked else self._PREPLAY_HOST
preplay = self._download_json('https://%s.com/en_us/preplay/%s' % (host, video_id), video_id, query=query)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
error = json.loads(e.cause.read().decode())
raise ExtractorError('%s said: %s' % (self.IE_NAME, error['details']), expected=True)
raise
video_data = preplay['video']
base = video_data['base']
uplynk_preplay_url = preplay['preplayURL']
episode = video_data.get('episode', {})
channel = video_data.get('channel', {})
subtitles = {}
cc_url = preplay.get('ccURL')
if cc_url:
subtitles['en'] = [{
'url': cc_url,
}]
return {
'_type': 'url_transparent',
'url': uplynk_preplay_url,
'id': video_id,
'title': title,
'description': base.get('body'),
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at')),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
'season_number': int_or_none(watch_hub_data.get('season')),
'season_id': str_or_none(episode.get('season_id')),
'uploader': channel.get('base', {}).get('title') or watch_hub_data.get('channel-title'),
'uploader_id': str_or_none(channel.get('id')),
'subtitles': subtitles,
'ie_key': 'UplynkPreplay',
}
class ViceIE(ViceBaseIE):
_VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?videos?/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?videos?/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
@@ -21,7 +102,7 @@ class ViceIE(InfoExtractor):
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
}, { }, {
'url': 'http://www.vice.com/video/how-to-hack-a-car', 'url': 'http://www.vice.com/video/how-to-hack-a-car',
'md5': '6fb2989a3fed069fb8eab3401fc2d3c9', 'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
'info_dict': { 'info_dict': {
'id': '3jstaBeXgAs', 'id': '3jstaBeXgAs',
'ext': 'mp4', 'ext': 'mp4',
@@ -32,6 +113,22 @@ class ViceIE(InfoExtractor):
'upload_date': '20140529', 'upload_date': '20140529',
}, },
'add_ie': ['Youtube'], 'add_ie': ['Youtube'],
}, {
'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
'md5': '',
'info_dict': {
'id': '5816510690b70e6c5fd39a56',
'ext': 'mp4',
'uploader': 'Waypoint',
'title': 'The Signal From Tölva',
'uploader_id': '57f7d621e05ca860fa9ccaf9',
'timestamp': 1477941983938,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['UplynkPreplay'],
}, { }, {
'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab', 'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
'only_matching': True, 'only_matching': True,
@@ -42,21 +139,21 @@ class ViceIE(InfoExtractor):
'url': 'https://munchies.vice.com/en/videos/watch-the-trailer-for-our-new-series-the-pizza-show', 'url': 'https://munchies.vice.com/en/videos/watch-the-trailer-for-our-new-series-the-pizza-show',
'only_matching': True, 'only_matching': True,
}] }]
_PREPLAY_HOST = 'video.vice'
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage, urlh = self._download_webpage_handle(url, video_id)
try: embed_code = self._search_regex(
embed_code = self._search_regex( r'embedCode=([^&\'"]+)', webpage,
r'embedCode=([^&\'"]+)', webpage, 'ooyala embed code', default=None)
'ooyala embed code', default=None) if embed_code:
if embed_code: return self.url_result('ooyala:%s' % embed_code, 'Ooyala')
return self.url_result('ooyala:%s' % embed_code, 'Ooyala') youtube_id = self._search_regex(
youtube_id = self._search_regex( r'data-youtube-id="([^"]+)"', webpage, 'youtube id', default=None)
r'data-youtube-id="([^"]+)"', webpage, 'youtube id') if youtube_id:
return self.url_result(youtube_id, 'Youtube') return self.url_result(youtube_id, 'Youtube')
except ExtractorError: return self._extract_preplay_video(urlh.geturl(), webpage)
raise ExtractorError('The page doesn\'t contain a video', expected=True)
class ViceShowIE(InfoExtractor): class ViceShowIE(InfoExtractor):
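
`ViceBaseIE._extract_preplay_video` signs each preplay request with two values:

- an expiry `exp` set four hours (14400 s) ahead;
- a SHA-512 hex digest of `'<video_id>:GET:<exp>'`, following the comment's reverse-engineered description of the player's `signatureGenerator`.

The sketch below shows only that query construction. It takes `now` as a parameter so the output is reproducible. `build_preplay_query` is a hypothetical name, and the video ID comes from the new test case.

```python
import hashlib
import time


def build_preplay_query(video_id, now=None, lifetime=14400):
    """Return the 'exp' and 'sign' query parameters for a preplay request."""
    if now is None:
        now = time.time()
    exp = int(now) + lifetime
    message = '%s:GET:%d' % (video_id, exp)
    return {
        'exp': exp,
        'sign': hashlib.sha512(message.encode()).hexdigest(),
    }


if __name__ == '__main__':
    print(build_preplay_query('5816510690b70e6c5fd39a56', now=1478000000))
```

Locked videos also need the Adobe Pass `tvetoken` obtained via `_extract_mvpd_auth`, which this sketch does not cover.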

youtube_dl/extractor/viceland.py

@@ -1,23 +1,10 @@
 # coding: utf-8
 from __future__ import unicode_literals

-import time
-import hashlib
-import json
-
-from .adobepass import AdobePassIE
-from ..compat import compat_HTTPError
-from ..utils import (
-    int_or_none,
-    parse_age_limit,
-    str_or_none,
-    parse_duration,
-    ExtractorError,
-    extract_attributes,
-)
+from .vice import ViceBaseIE


-class VicelandIE(AdobePassIE):
+class VicelandIE(ViceBaseIE):
     _VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)'
     _TEST = {
         'url': 'https://www.viceland.com/en_us/video/cyberwar-trailer/57608447973ee7705f6fbd4e',
@@ -38,70 +25,9 @@ class VicelandIE(AdobePassIE):
         },
         'add_ie': ['UplynkPreplay'],
     }
+    _PREPLAY_HOST = 'www.viceland'

     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
-        watch_hub_data = extract_attributes(self._search_regex(
-            r'(?s)(<watch-hub\s*.+?</watch-hub>)', webpage, 'watch hub'))
-        video_id = watch_hub_data['vms-id']
-        title = watch_hub_data['video-title']
-
-        query = {}
-        if watch_hub_data.get('video-locked') == '1':
-            resource = self._get_mvpd_resource(
-                'VICELAND', title, video_id,
-                watch_hub_data.get('video-rating'))
-            query['tvetoken'] = self._extract_mvpd_auth(url, video_id, 'VICELAND', resource)
-
-        # signature generation algorithm is reverse engineered from signatureGenerator in
-        # webpack:///../shared/~/vice-player/dist/js/vice-player.js in
-        # https://www.viceland.com/assets/common/js/web.vendor.bundle.js
-        exp = int(time.time()) + 14400
-        query.update({
-            'exp': exp,
-            'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
-        })
-
-        try:
-            preplay = self._download_json('https://www.viceland.com/en_us/preplay/%s' % video_id, video_id, query=query)
-        except ExtractorError as e:
-            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
-                error = json.loads(e.cause.read().decode())
-                raise ExtractorError('%s said: %s' % (self.IE_NAME, error['details']), expected=True)
-            raise
-
-        video_data = preplay['video']
-        base = video_data['base']
-        uplynk_preplay_url = preplay['preplayURL']
-        episode = video_data.get('episode', {})
-        channel = video_data.get('channel', {})
-
-        subtitles = {}
-        cc_url = preplay.get('ccURL')
-        if cc_url:
-            subtitles['en'] = [{
-                'url': cc_url,
-            }]
-
-        return {
-            '_type': 'url_transparent',
-            'url': uplynk_preplay_url,
-            'id': video_id,
-            'title': title,
-            'description': base.get('body'),
-            'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
-            'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
-            'timestamp': int_or_none(video_data.get('created_at')),
-            'age_limit': parse_age_limit(video_data.get('video_rating')),
-            'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
-            'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
-            'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
-            'season_number': int_or_none(watch_hub_data.get('season')),
-            'season_id': str_or_none(episode.get('season_id')),
-            'uploader': channel.get('base', {}).get('title') or watch_hub_data.get('channel-title'),
-            'uploader_id': str_or_none(channel.get('id')),
-            'subtitles': subtitles,
-            'ie_key': 'UplynkPreplay',
-        }
+        return self._extract_preplay_video(url, webpage)
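The viceland.py diff drops the inline preplay request, including the AdobePass (MVPD) token and the request signature. It delegates to `_extract_preplay_video` from `ViceBaseIE`, which lives in the part of vice.py not shown here. The two `_PREPLAY_HOST` values ('video.vice' and 'www.viceland') suggest that the shared method builds a host-specific preplay URL, but that reading is an assumption. The removed signing step can be reproduced as a standalone sketch. `sign_preplay_query` and `preplay_url` are hypothetical names, and the URL template is extrapolated from the old hard-coded `https://www.viceland.com/en_us/preplay/%s`.

```python
import hashlib
import time


def sign_preplay_query(video_id, now=None, ttl=14400):
    # Same scheme as the removed viceland code:
    # exp  = Unix time `ttl` seconds (4 hours by default) in the future
    # sign = hex SHA-512 of "<video_id>:GET:<exp>"
    if now is None:
        now = int(time.time())
    exp = now + ttl
    sign = hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest()
    return {'exp': exp, 'sign': sign}


def preplay_url(preplay_host, video_id):
    # Assumption: the base class combines _PREPLAY_HOST with the
    # '/en_us/preplay/<id>' path that viceland.py used to hard-code.
    return 'https://%s.com/en_us/preplay/%s' % (preplay_host, video_id)
```

Passing `now` explicitly makes the signature deterministic, which is convenient for checking it against a captured request.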


@@ -31,7 +31,8 @@ class VodlockerIE(InfoExtractor):
         if any(p in webpage for p in (
                 '>THIS FILE WAS DELETED<',
                 '>File Not Found<',
-                'The file you were looking for could not be found, sorry for any inconvenience.<')):
+                'The file you were looking for could not be found, sorry for any inconvenience.<',
+                '>The file was removed')):
             raise ExtractorError('Video %s does not exist' % video_id, expected=True)

         fields = self._hidden_inputs(webpage)
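The vodlocker check is a plain substring test over the downloaded page. The newly added marker `'>The file was removed'` has no closing `<`, so unlike the other markers it matches however the sentence continues. Here is a minimal standalone version; `is_removed` is a hypothetical name.

```python
REMOVED_MARKERS = (
    '>THIS FILE WAS DELETED<',
    '>File Not Found<',
    'The file you were looking for could not be found, sorry for any inconvenience.<',
    '>The file was removed',
)


def is_removed(webpage):
    # Any single marker anywhere in the HTML means the file is gone.
    return any(marker in webpage for marker in REMOVED_MARKERS)
```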


@@ -0,0 +1,55 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    float_or_none,
+)
+
+
+class VzaarIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:(?:www|view)\.)?vzaar\.com/(?:videos/)?(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://vzaar.com/videos/1152805',
+        'md5': 'bde5ddfeb104a6c56a93a06b04901dbf',
+        'info_dict': {
+            'id': '1152805',
+            'ext': 'mp4',
+            'title': 'sample video (public)',
+        },
+    }, {
+        'url': 'https://view.vzaar.com/27272/player',
+        'md5': '3b50012ac9bbce7f445550d54e0508f2',
+        'info_dict': {
+            'id': '27272',
+            'ext': 'mp3',
+            'title': 'MP3',
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
+        source_url = video_data['sourceUrl']
+        info = {
+            'id': video_id,
+            'title': video_data['videoTitle'],
+            'url': source_url,
+            'thumbnail': self._proto_relative_url(video_data.get('poster')),
+            'duration': float_or_none(video_data.get('videoDuration')),
+        }
+        if 'audio' in source_url:
+            info.update({
+                'vcodec': 'none',
+                'ext': 'mp3',
+            })
+        else:
+            info.update({
+                'width': int_or_none(video_data.get('width')),
+                'height': int_or_none(video_data.get('height')),
+                'ext': 'mp4',
+            })
+        return info
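`VzaarIE` decides between audio and video from the source URL alone. If the string `audio` occurs anywhere in `sourceUrl`, the entry is marked `vcodec: 'none'` with extension `mp3`. Otherwise width and height are read from the JSON and the extension is `mp4`. The sketch below isolates that branch. `vzaar_format_fields` is a hypothetical helper, and `to_int` is a simplified stand-in for `youtube_dl.utils.int_or_none`, not the real function.

```python
def vzaar_format_fields(video_data):
    # Mirrors the branch in VzaarIE._real_extract: a substring test on the
    # source URL selects audio-only (mp3) or video (mp4) metadata.
    source_url = video_data['sourceUrl']
    if 'audio' in source_url:
        return {'vcodec': 'none', 'ext': 'mp3'}

    def to_int(value):
        # Simplified stand-in for int_or_none: None or unparsable -> None.
        try:
            return int(value)
        except (TypeError, ValueError):
            return None

    return {
        'width': to_int(video_data.get('width')),
        'height': to_int(video_data.get('height')),
        'ext': 'mp4',
    }
```

Because the check is a substring match, a video whose URL happens to contain `audio` would also be classified as audio. The diff does not show whether such URLs occur in practice.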


@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.11.02'
+__version__ = '2016.11.04'