Compare commits

...

34 Commits

Author SHA1 Message Date
Sergey M․
90352a8041 release 2016.12.20 2016-12-20 22:39:39 +07:00
Sergey M․
1f6a79b0af [ChangeLog] Actualize 2016-12-20 22:37:06 +07:00
Sergey M․
3d6761ba92 [vbox7] Fix extraction (closes #11494) 2016-12-20 21:53:51 +07:00
Remita Amine
f59d1146c0 [uktvplay] Add new extractor(closes #11027) 2016-12-20 12:52:46 +01:00
Remita Amine
b1c357975d [piksel] Add new extractor(closes #11246) 2016-12-20 12:35:03 +01:00
Remita Amine
d8c507c9e2 [vimeo] fix extraction for hls formats and add support for dash formats(closes #11490) 2016-12-20 12:35:03 +01:00
Remita Amine
7fe1592073 [common] fix dash codec information for mixed videos and fragment url construction(#11490) 2016-12-20 12:35:03 +01:00
Yen Chi Hsuan
8ab7e6c4cc [kaltura] Improve widget ID extraction (closes #11480) 2016-12-20 18:45:52 +08:00
Sergey M․
c80db5d398 [nrktv:direkte] Add support for live streams (#11488) 2016-12-19 23:47:45 +07:00
Remita Amine
5aaf012a4e [pbs] fix extraction for geo restricted videos(#7095) 2016-12-19 16:27:12 +01:00
Remita Amine
954529c10f [brightcove:new] skip widevine classic videos 2016-12-18 21:39:59 +01:00
Remita Amine
ed7b333fbf [viu] extract supported hls manifest 2016-12-18 18:24:01 +01:00
Remita Amine
723103151e [viu] improve extraction(closes #10607)(closes #11329) 2016-12-18 17:20:53 +01:00
ping
e7b6caef24 [viu] New extractor for viu.com 2016-12-18 17:20:53 +01:00
Sergey M․
ec79b1de1c Revert "Credit @pyx for meipai (#10718)"
This reverts commit d5e623aaa1.
2016-12-18 20:56:21 +07:00
Sergey M․
f73d7d5074 release 2016.12.18 2016-12-18 19:50:33 +07:00
Sergey M․
52a1d48d9f [ChangeLog] Actualize 2016-12-18 19:48:59 +07:00
Sergey M․
d5e623aaa1 Credit @pyx for meipai (#10718) 2016-12-18 19:46:57 +07:00
Remita Amine
199a47abba [ccma] Add new extractor(closes #11359) 2016-12-18 10:49:10 +01:00
Remita Amine
b42a0bf360 [laola1tv] add support embed urls and improve extraction(#11460) 2016-12-17 21:48:45 +01:00
Remita Amine
6e416b210c [nbc] fix extraction for msnbc videos(fixes #11466) 2016-12-17 18:11:13 +01:00
Sergey M․
04bf59ff64 [extractors] Add missing twitch imports 2016-12-17 23:03:50 +07:00
Sergey M․
87a449c1ed [extractor/common] Recognize DASH formats in html5 media entries 2016-12-17 23:03:13 +07:00
Sergey M․
93753aad20 [twitch] Adapt to new videos pages schema (closes #11469) 2016-12-17 20:20:23 +07:00
Sergey M․
2786818c33 [meipai] Fix regular videos extraction and improve (closes #10718) 2016-12-17 19:42:34 +07:00
Philip Xu
9b785768ac [meipai] Add extractor 2016-12-17 19:41:35 +07:00
Sergey M․
47c914f995 [ondemandkorea] Fix extraction (closes #10772) 2016-12-17 18:50:12 +07:00
Sergey M․
732d116aa7 [jwplatform] Improve duration extraction 2016-12-17 18:50:07 +07:00
Sergey M․
a495840d3b [jwplatform] Improve subtitles extraction 2016-12-17 18:50:00 +07:00
Sergey M․
b0c65c677f [utils] Improve urljoin 2016-12-17 18:49:55 +07:00
ping
594601f545 [ondemandkorea] Add extractor 2016-12-17 18:49:45 +07:00
Sergey M․
0ae9560eea [vporn] Use urljoin for thumbnail 2016-12-16 23:57:51 +07:00
Remita Amine
dc1f3a9f20 [vvvvid] do not cache the conn_id 2016-12-16 11:05:46 +01:00
Remita Amine
7b1e80792b [vvvvid] Add new extractor(closes #5915) 2016-12-16 09:05:34 +01:00
27 changed files with 1199 additions and 170 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.12.15*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.12.15**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.12.20*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.12.20**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.12.15
[debug] youtube-dl version 2016.12.20
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -1,3 +1,40 @@
version 2016.12.20
Core
* [extractor/common] Improve fragment URL construction for DASH media
* [extractor/common] Fix codec information extraction for mixed audio/video
DASH media (#11490)
Extractors
* [vbox7] Fix extraction (#11494)
+ [uktvplay] Add support for uktvplay.uktv.co.uk (#11027)
+ [piksel] Add support for player.piksel.com (#11246)
+ [vimeo] Add support for DASH formats
* [vimeo] Fix extraction for HLS formats (#11490)
* [kaltura] Fix wrong widget ID in some cases (#11480)
+ [nrktv:direkte] Add support for live streams (#11488)
* [pbs] Fix extraction for geo restricted videos (#7095)
* [brightcove:new] Skip widevine classic videos
+ [viu] Add support for viu.com (#10607, #11329)
version 2016.12.18
Core
+ [extractor/common] Recognize DASH formats in html5 media entries
Extractors
+ [ccma] Add support for ccma.cat (#11359)
* [laola1tv] Improve extraction
+ [laola1tv] Add support embed URLs (#11460)
* [nbc] Fix extraction for MSNBC videos (#11466)
* [twitch] Adapt to new videos pages URL schema (#11469)
+ [meipai] Add support for meipai.com (#10718)
* [jwplatform] Improve subtitles and duration extraction
+ [ondemandkorea] Add support for ondemandkorea.com (#10772)
+ [vvvvid] Add support for vvvvid.it (#5915)
version 2016.12.15
Core

View File

@@ -131,6 +131,7 @@
- **cbsnews**: CBS News
- **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports**
- **CCMA**
- **CCTV**
- **CDA**
- **CeskaTelevize**
@@ -365,6 +366,7 @@
- **kuwo:song**: 酷我音乐
- **la7.it**
- **Laola1Tv**
- **Laola1TvEmbed**
- **LCI**
- **Lcp**
- **LcpPlay**
@@ -402,6 +404,7 @@
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
- **Meipai**: 美拍
- **MelonVOD**
- **META**
- **metacafe**
@@ -514,6 +517,7 @@
- **NRKPlaylist**
- **NRKSkole**: NRK Skole
- **NRKTV**: NRK TV and NRK Radio
- **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte
- **ntv.ru**
- **Nuvid**
- **NYTimes**
@@ -524,6 +528,7 @@
- **Odnoklassniki**
- **OktoberfestTV**
- **on.aol.com**
- **OnDemandKorea**
- **onet.tv**
- **onet.tv:channel**
- **OnionStudios**
@@ -547,6 +552,7 @@
- **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de**
- **Photobucket**
- **Piksel**
- **Pinkbike**
- **Pladform**
- **play.fm**
@@ -785,10 +791,13 @@
- **Tweakers**
- **twitch:chapter**
- **twitch:clips**
- **twitch:past_broadcasts**
- **twitch:profile**
- **twitch:stream**
- **twitch:video**
- **twitch:videos:all**
- **twitch:videos:highlights**
- **twitch:videos:past-broadcasts**
- **twitch:videos:uploads**
- **twitch:vod**
- **twitter**
- **twitter:amplify**
@@ -796,6 +805,7 @@
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
- **UKTVPlay**
- **Unistra**
- **uol.com.br**
- **uplynk**
@@ -860,6 +870,9 @@
- **Vimple**: Vimple - one-click video hosting
- **Vine**
- **vine:user**
- **Viu**
- **viu:ott**
- **viu:playlist**
- **Vivo**: vivo.sx
- **vk**: VK
- **vk:uservideos**: VK - User's Videos
@@ -874,6 +887,7 @@
- **VRT**
- **vube**: Vube.com
- **VuClip**
- **VVVVID**
- **VyboryMos**
- **Vzaar**
- **Walla**

View File

@@ -448,16 +448,20 @@ class TestUtil(unittest.TestCase):
def test_urljoin(self):
self.assertEqual(urljoin('http://foo.de/', '/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
self.assertEqual(urljoin('//foo.de/', '/a/b/c.txt'), '//foo.de/a/b/c.txt')
self.assertEqual(urljoin('http://foo.de/', 'a/b/c.txt'), 'http://foo.de/a/b/c.txt')
self.assertEqual(urljoin('http://foo.de', '/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
self.assertEqual(urljoin('http://foo.de', 'a/b/c.txt'), 'http://foo.de/a/b/c.txt')
self.assertEqual(urljoin('http://foo.de/', 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
self.assertEqual(urljoin('http://foo.de/', '//foo.de/a/b/c.txt'), '//foo.de/a/b/c.txt')
self.assertEqual(urljoin(None, 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
self.assertEqual(urljoin(None, '//foo.de/a/b/c.txt'), '//foo.de/a/b/c.txt')
self.assertEqual(urljoin('', 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
self.assertEqual(urljoin(['foobar'], 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
self.assertEqual(urljoin('http://foo.de/', None), None)
self.assertEqual(urljoin('http://foo.de/', ''), None)
self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)

View File

@@ -548,7 +548,7 @@ class BrightcoveNewIE(InfoExtractor):
container = source.get('container')
ext = mimetype2ext(source.get('type'))
src = source.get('src')
if ext == 'ism':
if ext == 'ism' or container == 'WVM':
continue
elif ext == 'm3u8' or container == 'M2TS':
if not src:

View File

@@ -0,0 +1,99 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
clean_html,
)
class CCMAIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ccma\.cat/(?:[^/]+/)*?(?P<type>video|audio)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.ccma.cat/tv3/alacarta/lespot-de-la-marato-de-tv3/lespot-de-la-marato-de-tv3/video/5630208/',
'md5': '7296ca43977c8ea4469e719c609b0871',
'info_dict': {
'id': '5630208',
'ext': 'mp4',
'title': 'L\'espot de La Marató de TV3',
'description': 'md5:f12987f320e2f6e988e9908e4fe97765',
'timestamp': 1470918540,
'upload_date': '20160811',
}
}, {
'url': 'http://www.ccma.cat/catradio/alacarta/programa/el-consell-de-savis-analitza-el-derbi/audio/943685/',
'md5': 'fa3e38f269329a278271276330261425',
'info_dict': {
'id': '943685',
'ext': 'mp3',
'title': 'El Consell de Savis analitza el derbi',
'description': 'md5:e2a3648145f3241cb9c6b4b624033e53',
'upload_date': '20171205',
'timestamp': 1512507300,
}
}]
def _real_extract(self, url):
media_type, media_id = re.match(self._VALID_URL, url).groups()
media_data = {}
formats = []
profiles = ['pc'] if media_type == 'audio' else ['mobil', 'pc']
for i, profile in enumerate(profiles):
md = self._download_json('http://dinamics.ccma.cat/pvideo/media.jsp', media_id, query={
'media': media_type,
'idint': media_id,
'profile': profile,
}, fatal=False)
if md:
media_data = md
media_url = media_data.get('media', {}).get('url')
if media_url:
formats.append({
'format_id': profile,
'url': media_url,
'quality': i,
})
self._sort_formats(formats)
informacio = media_data['informacio']
title = informacio['titol']
durada = informacio.get('durada', {})
duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc'))
subtitles = {}
subtitols = media_data.get('subtitols', {})
if subtitols:
sub_url = subtitols.get('url')
if sub_url:
subtitles.setdefault(
subtitols.get('iso') or subtitols.get('text') or 'ca', []).append({
'url': sub_url,
})
thumbnails = []
imatges = media_data.get('imatges', {})
if imatges:
thumbnail_url = imatges.get('url')
if thumbnail_url:
thumbnails = [{
'url': thumbnail_url,
'width': int_or_none(imatges.get('amplada')),
'height': int_or_none(imatges.get('alcada')),
}]
return {
'id': media_id,
'title': title,
'description': clean_html(informacio.get('descripcio')),
'duration': duration,
'timestamp': timestamp,
'thumnails': thumbnails,
'subtitles': subtitles,
'formats': formats,
}

View File

@@ -59,6 +59,7 @@ from ..utils import (
parse_m3u8_attributes,
extract_attributes,
parse_codecs,
urljoin,
)
@@ -1631,11 +1632,6 @@ class InfoExtractor(object):
extract_Initialization(segment_template)
return ms_info
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = []
for period in mpd_doc.findall(_add_ns('Period')):
@@ -1685,12 +1681,11 @@ class InfoExtractor(object):
'tbr': int_or_none(representation_attrib.get('bandwidth'), 1000),
'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
'fps': int_or_none(representation_attrib.get('frameRate')),
'vcodec': 'none' if content_type == 'audio' else representation_attrib.get('codecs'),
'acodec': 'none' if content_type == 'video' else representation_attrib.get('codecs'),
'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
'format_note': 'DASH %s' % content_type,
'filesize': filesize,
}
f.update(parse_codecs(representation_attrib.get('codecs')))
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
@@ -1774,7 +1769,7 @@ class InfoExtractor(object):
f['fragments'].append({'url': initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
for fragment in f['fragments']:
fragment['url'] = combine_url(base_url, fragment['url'])
fragment['url'] = urljoin(base_url, fragment['url'])
try:
existing_format = next(
fo for fo in formats
@@ -1888,7 +1883,7 @@ class InfoExtractor(object):
})
return formats
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
@@ -1905,11 +1900,16 @@ class InfoExtractor(object):
def _media_formats(src, cur_media_type):
full_url = absolute_url(src)
if determine_ext(full_url) == 'm3u8':
ext = determine_ext(full_url)
if ext == 'm3u8':
is_plain_url = False
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
elif ext == 'mpd':
is_plain_url = False
formats = self._extract_mpd_formats(
full_url, video_id, mpd_id=mpd_id)
else:
is_plain_url = True
formats = [{

View File

@@ -150,6 +150,7 @@ from .cbsnews import (
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .ccma import CCMAIE
from .cctv import CCTVIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
@@ -446,7 +447,10 @@ from .kuwo import (
KuwoMvIE,
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .laola1tv import (
Laola1TvEmbedIE,
Laola1TvIE,
)
from .lci import LCIIE
from .lcp import (
LcpPlayIE,
@@ -498,6 +502,7 @@ from .mangomolo import (
)
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .meipai import MeipaiIE
from .melonvod import MelonVODIE
from .meta import METAIE
from .metacafe import MetacafeIE
@@ -650,6 +655,7 @@ from .nrk import (
NRKPlaylistIE,
NRKSkoleIE,
NRKTVIE,
NRKTVDirekteIE,
)
from .ntvde import NTVDeIE
from .ntvru import NTVRuIE
@@ -662,6 +668,7 @@ from .nzz import NZZIE
from .odatv import OdaTVIE
from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE
from .ondemandkorea import OnDemandKoreaIE
from .onet import (
OnetIE,
OnetChannelIE,
@@ -692,6 +699,7 @@ from .periscope import (
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
from .piksel import PikselIE
from .pinkbike import PinkbikeIE
from .pladform import PladformIE
from .playfm import PlayFMIE
@@ -999,7 +1007,10 @@ from .twitch import (
TwitchChapterIE,
TwitchVodIE,
TwitchProfileIE,
TwitchAllVideosIE,
TwitchUploadsIE,
TwitchPastBroadcastsIE,
TwitchHighlightsIE,
TwitchStreamIE,
TwitchClipsIE,
)
@@ -1013,6 +1024,7 @@ from .udemy import (
UdemyCourseIE
)
from .udn import UDNEmbedIE
from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .uol import UOLIE
@@ -1096,6 +1108,11 @@ from .viki import (
VikiIE,
VikiChannelIE,
)
from .viu import (
ViuIE,
ViuPlaylistIE,
ViuOTTIE,
)
from .vk import (
VKIE,
VKUserVideosIE,
@@ -1110,6 +1127,7 @@ from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vvvvid import VVVVIDIE
from .vyborymos import VyboryMosIE
from .vzaar import VzaarIE
from .walla import WallaIE

View File

@@ -75,6 +75,7 @@ from .facebook import FacebookIE
from .soundcloud import SoundcloudIE
from .vbox7 import Vbox7IE
from .dbtv import DBTVIE
from .piksel import PikselIE
class GenericIE(InfoExtractor):
@@ -972,6 +973,20 @@ class GenericIE(InfoExtractor):
'skip_download': True,
}
},
{
# Kaltura embedded, some fileExt broken (#11480)
'url': 'http://www.cornell.edu/video/nima-arkani-hamed-standard-models-of-particle-physics',
'info_dict': {
'id': '1_sgtvehim',
'ext': 'mp4',
'title': 'Our "Standard Models" of particle physics and cosmology',
'description': 'md5:67ea74807b8c4fea92a6f38d6d323861',
'timestamp': 1321158993,
'upload_date': '20111113',
'uploader_id': 'kps1',
},
'add_ie': ['Kaltura'],
},
# Eagle.Platform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -2211,6 +2226,11 @@ class GenericIE(InfoExtractor):
if arkena_url:
return self.url_result(arkena_url, ArkenaIE.ie_key())
# Look for Piksel embeds
piksel_url = PikselIE._extract_url(webpage)
if piksel_url:
return self.url_result(piksel_url, PikselIE.ie_key())
# Look for Limelight embeds
mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage)
if mobj:

View File

@@ -11,6 +11,7 @@ from ..utils import (
int_or_none,
js_to_json,
mimetype2ext,
urljoin,
)
@@ -110,10 +111,14 @@ class JWPlatformBaseIE(InfoExtractor):
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
})
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
@@ -121,7 +126,7 @@ class JWPlatformBaseIE(InfoExtractor):
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})

View File

@@ -107,7 +107,7 @@ class KalturaIE(InfoExtractor):
(?P<q1>['\"])wid(?P=q1)\s*:\s*
(?P<q2>['\"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4),
(?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
""", webpage) or
re.search(
r'''(?xs)
@@ -266,6 +266,9 @@ class KalturaIE(InfoExtractor):
# skip for now.
if f.get('fileExt') == 'chun':
continue
if not f.get('fileExt') and f.get('containerFormat') == 'qt':
# QT indicates QuickTime; some videos have broken fileExt
f['fileExt'] = 'mov'
video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
# audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g

View File

@@ -1,25 +1,81 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
ExtractorError,
sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_element,
xpath_text,
urljoin,
)
class Laola1TvEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/titanplayer\.php\?.*?\bvideoid=(?P<id>\d+)'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flash_vars = self._search_regex(
r'(?s)flashvars\s*=\s*({.+?});', webpage, 'flash vars')
get_flashvar = lambda x: self._search_regex(r'%s\s*:\s*"([^"]+)"' % x, flash_vars, x)
hd_doc = self._download_xml(
'http://www.laola1.tv/server/hd_video.php', video_id, query={
'play': get_flashvar('streamid'),
'partner': get_flashvar('partnerid'),
'portal': get_flashvar('portalid'),
'lang': get_flashvar('sprache'),
'v5ident': '',
})
_v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
title = _v('title', fatal=True)
data_abo = urlencode_postdata(
dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(','))))
token_url = self._download_json(
'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access',
video_id, query={
'videoId': _v('id'),
'target': self._search_regex(r'vs_target = (\d+);', webpage, 'vs target'),
'label': _v('label'),
'area': _v('area'),
}, data=data_abo)['data']['stream-access'][0]
token_doc = self._download_xml(
token_url, video_id, 'Downloading token',
headers=self.geo_verification_headers())
token_attrib = xpath_element(token_doc, './/token').attrib
if token_attrib['status'] != '0':
raise ExtractorError(
'Token error: %s' % token_attrib['comment'], expected=True)
formats = self._extract_akamai_formats(
'%s?hdnea=%s' % (token_attrib['url'], token_attrib['auth']),
video_id)
self._sort_formats(formats)
categories_str = _v('meta_sports')
categories = categories_str.split(',') if categories_str else []
is_live = _v('islive') == 'true'
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'upload_date': unified_strdate(_v('time_date')),
'uploader': _v('meta_organisation'),
'categories': categories,
'is_live': is_live,
'formats': formats,
}
class Laola1TvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/(?P<kind>[^/]+)/(?P<slug>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/[a-z]+-[a-z]+/[^/]+/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
'info_dict': {
@@ -67,85 +123,20 @@ class Laola1TvIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('slug')
kind = mobj.group('kind')
lang = mobj.group('lang')
portal = mobj.group('portal')
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
iframe_url = self._search_regex(
iframe_url = urljoin(url, self._search_regex(
r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
webpage, 'iframe url')
video_id = self._search_regex(
r'videoid=(\d+)', iframe_url, 'video id')
iframe = self._download_webpage(compat_urlparse.urljoin(
url, iframe_url), display_id, 'Downloading iframe')
partner_id = self._search_regex(
r'partnerid\s*:\s*(["\'])(?P<partner_id>.+?)\1',
iframe, 'partner id', group='partner_id')
hd_doc = self._download_xml(
'http://www.laola1.tv/server/hd_video.php?%s'
% compat_urllib_parse_urlencode({
'play': video_id,
'partner': partner_id,
'portal': portal,
'lang': lang,
'v5ident': '',
}), display_id)
_v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
title = _v('title', fatal=True)
VS_TARGETS = {
'video': '2',
'livestream': '17',
}
req = sanitized_Request(
'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access?%s' %
compat_urllib_parse_urlencode({
'videoId': video_id,
'target': VS_TARGETS.get(kind, '2'),
'label': _v('label'),
'area': _v('area'),
}),
urlencode_postdata(
dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(',')))))
token_url = self._download_json(req, display_id)['data']['stream-access'][0]
token_doc = self._download_xml(token_url, display_id, 'Downloading token')
token_attrib = xpath_element(token_doc, './/token').attrib
token_auth = token_attrib['auth']
if token_auth in ('blocked', 'restricted', 'error'):
raise ExtractorError(
'Token error: %s' % token_attrib['comment'], expected=True)
formats = self._extract_f4m_formats(
'%s?hdnea=%s&hdcore=3.2.0' % (token_attrib['url'], token_auth),
video_id, f4m_id='hds')
self._sort_formats(formats)
categories_str = _v('meta_sports')
categories = categories_str.split(',') if categories_str else []
webpage, 'iframe url'))
return {
'id': video_id,
'_type': 'url',
'display_id': display_id,
'title': title,
'upload_date': unified_strdate(_v('time_date')),
'uploader': _v('meta_organisation'),
'categories': categories,
'is_live': _v('islive') == 'true',
'formats': formats,
'url': iframe_url,
'ie_key': 'Laola1TvEmbed',
}

View File

@@ -0,0 +1,104 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
unified_timestamp,
)
class MeipaiIE(InfoExtractor):
IE_DESC = '美拍'
_VALID_URL = r'https?://(?:www\.)?meipai.com/media/(?P<id>[0-9]+)'
_TESTS = [{
# regular uploaded video
'url': 'http://www.meipai.com/media/531697625',
'md5': 'e3e9600f9e55a302daecc90825854b4f',
'info_dict': {
'id': '531697625',
'ext': 'mp4',
'title': '#葉子##阿桑##余姿昀##超級女聲#',
'description': '#葉子##阿桑##余姿昀##超級女聲#',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 152,
'timestamp': 1465492420,
'upload_date': '20160609',
'view_count': 35511,
'creator': '她她-TATA',
'tags': ['葉子', '阿桑', '余姿昀', '超級女聲'],
}
}, {
# record of live streaming
'url': 'http://www.meipai.com/media/585526361',
'md5': 'ff7d6afdbc6143342408223d4f5fb99a',
'info_dict': {
'id': '585526361',
'ext': 'mp4',
'title': '姿昀和善願 練歌練琴啦😁😁😁',
'description': '姿昀和善願 練歌練琴啦😁😁😁',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 5975,
'timestamp': 1474311799,
'upload_date': '20160919',
'view_count': 1215,
'creator': '她她-TATA',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(
webpage, default=None) or self._html_search_regex(
r'<title[^>]*>([^<]+)</title>', webpage, 'title')
formats = []
# recorded playback of live streaming
m3u8_url = self._html_search_regex(
r'file:\s*encodeURIComponent\((["\'])(?P<url>(?:(?!\1).)+)\1\)',
webpage, 'm3u8 url', group='url', default=None)
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
if not formats:
# regular uploaded video
video_url = self._search_regex(
r'data-video=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'video url',
group='url', default=None)
if video_url:
formats.append({
'url': video_url,
'format_id': 'http',
})
timestamp = unified_timestamp(self._og_search_property(
'video:release_date', webpage, 'release date', fatal=False))
tags = self._og_search_property(
'video:tag', webpage, 'tags', default='').split(',')
view_count = int_or_none(self._html_search_meta(
'interactionCount', webpage, 'view count'))
duration = parse_duration(self._html_search_meta(
'duration', webpage, 'duration'))
creator = self._og_search_property(
'video:director', webpage, 'creator', fatal=False)
return {
'id': video_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'duration': duration,
'timestamp': timestamp,
'view_count': view_count,
'creator': creator,
'tags': tags,
'formats': formats,
}

View File

@@ -9,6 +9,7 @@ from ..utils import (
lowercase_escape,
smuggle_url,
unescapeHTML,
update_url_query,
)
@@ -208,7 +209,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
'md5': 'af1adfa51312291a017720403826bb64',
'info_dict': {
'id': '269389891880',
'id': 'p_tweet_snow_140529',
'ext': 'mp4',
'title': 'How Twitter Reacted To The Snowden Interview',
'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
@@ -232,7 +233,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
'md5': '73135a2e0ef819107bbb55a5a9b2a802',
'info_dict': {
'id': '394064451844',
'id': 'nn_netcast_150204',
'ext': 'mp4',
'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
@@ -245,7 +246,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
'md5': 'a49e173825e5fcd15c13fc297fced39d',
'info_dict': {
'id': '529953347624',
'id': 'x_lon_vwhorn_150922',
'ext': 'mp4',
'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
'description': 'md5:c8be487b2d80ff0594c005add88d8351',
@@ -258,7 +259,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
'md5': '118d7ca3f0bea6534f119c68ef539f71',
'info_dict': {
'id': '669831235788',
'id': 'tdy_al_space_160420',
'ext': 'mp4',
'title': 'See the aurora borealis from space in stunning new NASA video',
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
@@ -271,7 +272,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
'info_dict': {
'id': '314487875924',
'id': 'n_hayes_Aimm_140801_272214',
'ext': 'mp4',
'title': 'The chaotic GOP immigration vote',
'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
@@ -279,7 +280,6 @@ class NBCNewsIE(ThePlatformIE):
'timestamp': 1406937606,
'upload_date': '20140802',
'uploader': 'NBCU-NEWS',
'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
},
},
{
@@ -311,28 +311,41 @@ class NBCNewsIE(ThePlatformIE):
else:
# "feature" and "nightly-news" pages use theplatform.com
video_id = mobj.group('mpx_id')
if not video_id.isdigit():
webpage = self._download_webpage(url, video_id)
info = None
bootstrap_json = self._search_regex(
[r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"'],
webpage, 'bootstrap json', default=None)
webpage = self._download_webpage(url, video_id)
filter_param = 'byId'
bootstrap_json = self._search_regex(
[r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"',
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);'],
webpage, 'bootstrap json', default=None)
if bootstrap_json:
bootstrap = self._parse_json(
bootstrap_json, video_id, transform_source=unescapeHTML)
info = None
if 'results' in bootstrap:
info = bootstrap['results'][0]['video']
elif 'video' in bootstrap:
info = bootstrap['video']
elif 'msnbcVideoInfo' in bootstrap:
info = bootstrap['msnbcVideoInfo']['meta']
elif 'msnbcThePlatform' in bootstrap:
info = bootstrap['msnbcThePlatform']['videoPlayer']['video']
else:
info = bootstrap
video_id = info['mpxId']
if 'guid' in info:
video_id = info['guid']
filter_param = 'byGuid'
elif 'mpxId' in info:
video_id = info['mpxId']
return {
'_type': 'url_transparent',
'id': video_id,
# http://feed.theplatform.com/f/2E2eJC/nbcnews also works
'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {filter_param: video_id}),
'ie_key': 'ThePlatformFeed',
}

View File

@@ -48,6 +48,13 @@ class NRKBaseIE(InfoExtractor):
entries = []
conviva = data.get('convivaStatistics') or {}
live = (data.get('mediaElementType') == 'Live' or
data.get('isLive') is True or conviva.get('isLive'))
def make_title(t):
return self._live_title(t) if live else t
media_assets = data.get('mediaAssets')
if media_assets and isinstance(media_assets, list):
def video_id_and_title(idx):
@@ -61,6 +68,13 @@ class NRKBaseIE(InfoExtractor):
if not formats:
continue
self._sort_formats(formats)
# Some f4m streams may not work with hdcore in fragments' URLs
for f in formats:
extra_param = f.get('extra_param_to_segment_url')
if extra_param and 'hdcore' in extra_param:
del f['extra_param_to_segment_url']
entry_id, entry_title = video_id_and_title(num)
duration = parse_duration(asset.get('duration'))
subtitles = {}
@@ -72,7 +86,7 @@ class NRKBaseIE(InfoExtractor):
})
entries.append({
'id': asset.get('carrierId') or entry_id,
'title': entry_title,
'title': make_title(entry_title),
'duration': duration,
'subtitles': subtitles,
'formats': formats,
@@ -87,7 +101,7 @@ class NRKBaseIE(InfoExtractor):
duration = parse_duration(data.get('duration'))
entries = [{
'id': video_id,
'title': title,
'title': make_title(title),
'duration': duration,
'formats': formats,
}]
@@ -111,7 +125,6 @@ class NRKBaseIE(InfoExtractor):
message_type, message_type)),
expected=True)
conviva = data.get('convivaStatistics') or {}
series = conviva.get('seriesName') or data.get('seriesTitle')
episode = conviva.get('episodeName') or data.get('episodeNumberOrDate')
@@ -260,6 +273,19 @@ class NRKTVIE(NRKBaseIE):
}]
class NRKTVDirekteIE(NRKTVIE):
IE_DESC = 'NRK TV Direkte and NRK Radio Direkte'
_VALID_URL = r'https?://(?:tv|radio)\.nrk\.no/direkte/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://tv.nrk.no/direkte/nrk1',
'only_matching': True,
}, {
'url': 'https://radio.nrk.no/direkte/p1_oslo_akershus',
'only_matching': True,
}]
class NRKPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P<id>[^/]+)'

View File

@@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from ..utils import (
ExtractorError,
js_to_json,
)
class OnDemandKoreaIE(JWPlatformBaseIE):
_VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
_TEST = {
'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',
'info_dict': {
'id': 'ask-us-anything-e43',
'ext': 'mp4',
'title': 'Ask Us Anything : E43',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': 'm3u8 download'
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, fatal=False)
if not webpage:
# Page sometimes returns captcha page with HTTP 403
raise ExtractorError(
'Unable to access page. You may have been blocked.',
expected=True)
if 'msg_block_01.png' in webpage:
self.raise_geo_restricted(
'This content is not available in your region')
if 'This video is only available to ODK PLUS members.' in webpage:
raise ExtractorError(
'This video is only available to ODK PLUS members.',
expected=True)
title = self._og_search_title(webpage)
jw_config = self._parse_json(
self._search_regex(
r'(?s)jwplayer\(([\'"])(?:(?!\1).)+\1\)\.setup\s*\((?P<options>.+?)\);',
webpage, 'jw config', group='options'),
video_id, transform_source=js_to_json)
info = self._parse_jwplayer_data(
jw_config, video_id, require_title=False, m3u8_id='hls',
base_url=url)
info.update({
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
})
return info

View File

@@ -350,6 +350,15 @@ class PBSIE(InfoExtractor):
410: 'This video has expired and is no longer available for online streaming.',
}
def _real_initialize(self):
cookie = (self._download_json(
'http://localization.services.pbs.org/localize/auto/cookie/',
None, headers=self.geo_verification_headers(), fatal=False) or {}).get('cookie')
if cookie:
station = self._search_regex(r'#?s=\["([^"]+)"', cookie, 'station')
if station:
self._set_cookie('.pbs.org', 'pbsol.station', station)
def _extract_webpage(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -476,7 +485,8 @@ class PBSIE(InfoExtractor):
redirect_info = self._download_json(
'%s?format=json' % redirect['url'], display_id,
'Downloading %s video url info' % (redirect_id or num))
'Downloading %s video url info' % (redirect_id or num),
headers=self.geo_verification_headers())
if redirect_info['status'] == 'error':
raise ExtractorError(

View File

@@ -0,0 +1,106 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
dict_get,
int_or_none,
unescapeHTML,
parse_iso8601,
)
class PikselIE(InfoExtractor):
_VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)'
_TEST = {
'url': 'http://player.piksel.com/v/nv60p12f',
'md5': 'd9c17bbe9c3386344f9cfd32fad8d235',
'info_dict': {
'id': 'nv60p12f',
'ext': 'mp4',
'title': 'فن الحياة - الحلقة 1',
'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور',
'timestamp': 1465231790,
'upload_date': '20160606',
}
}
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=["\'](?P<url>(?:https?:)?//player\.piksel\.com/v/[a-z0-9]+)',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
app_token = self._search_regex(
r'clientAPI\s*:\s*"([^"]+)"', webpage, 'app token')
response = self._download_json(
'http://player.piksel.com/ws/ws_program/api/%s/mode/json/apiv/5' % app_token,
video_id, query={
'v': video_id
})['response']
failure = response.get('failure')
if failure:
raise ExtractorError(response['failure']['reason'], expected=True)
video_data = response['WsProgramResponse']['program']['asset']
title = video_data['title']
formats = []
m3u8_url = dict_get(video_data, [
'm3u8iPadURL',
'ipadM3u8Url',
'm3u8AndroidURL',
'm3u8iPhoneURL',
'iphoneM3u8Url'])
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
asset_type = dict_get(video_data, ['assetType', 'asset_type'])
for asset_file in video_data.get('assetFiles', []):
# TODO: extract rtmp formats
http_url = asset_file.get('http_url')
if not http_url:
continue
tbr = None
vbr = int_or_none(asset_file.get('videoBitrate'), 1024)
abr = int_or_none(asset_file.get('audioBitrate'), 1024)
if asset_type == 'video':
tbr = vbr + abr
elif asset_type == 'audio':
tbr = abr
format_id = ['http']
if tbr:
format_id.append(compat_str(tbr))
formats.append({
'format_id': '-'.join(format_id),
'url': unescapeHTML(http_url),
'vbr': vbr,
'abr': abr,
'width': int_or_none(asset_file.get('videoWidth')),
'height': int_or_none(asset_file.get('videoHeight')),
'filesize': int_or_none(asset_file.get('filesize')),
'tbr': tbr,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnailUrl'),
'timestamp': parse_iso8601(video_data.get('dateadd')),
'formats': formats,
}

View File

@@ -300,7 +300,7 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
response = self._call_api(
self._PLAYLIST_PATH % (channel_id, offset, limit),
channel_id,
'Downloading %s videos JSON page %s'
'Downloading %s JSON page %s'
% (self._PLAYLIST_TYPE, counter_override or counter))
page_entries = self._extract_playlist_page(response)
if not page_entries:
@@ -350,19 +350,72 @@ class TwitchProfileIE(TwitchPlaylistBaseIE):
}
class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
IE_NAME = 'twitch:past_broadcasts'
_VALID_URL = r'%s/(?P<id>[^/]+)/profile/past_broadcasts/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcasts=true'
_PLAYLIST_TYPE = 'past broadcasts'
class TwitchVideosBaseIE(TwitchPlaylistBaseIE):
_VALID_URL_VIDEOS_BASE = r'%s/(?P<id>[^/]+)/videos' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcast_type='
class TwitchAllVideosIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:all'
_VALID_URL = r'%s/all' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive,upload,highlight'
_PLAYLIST_TYPE = 'all videos'
_TEST = {
'url': 'http://www.twitch.tv/spamfish/profile/past_broadcasts',
'url': 'https://www.twitch.tv/spamfish/videos/all',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 54,
'playlist_mincount': 869,
}
class TwitchUploadsIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:uploads'
_VALID_URL = r'%s/uploads' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'upload'
_PLAYLIST_TYPE = 'uploads'
_TEST = {
'url': 'https://www.twitch.tv/spamfish/videos/uploads',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 0,
}
class TwitchPastBroadcastsIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:past-broadcasts'
_VALID_URL = r'%s/past-broadcasts' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive'
_PLAYLIST_TYPE = 'past broadcasts'
_TEST = {
'url': 'https://www.twitch.tv/spamfish/videos/past-broadcasts',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 0,
}
class TwitchHighlightsIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:highlights'
_VALID_URL = r'%s/highlights' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'highlight'
_PLAYLIST_TYPE = 'highlights'
_TEST = {
'url': 'https://www.twitch.tv/spamfish/videos/highlights',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 805,
}

View File

@@ -0,0 +1,33 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class UKTVPlayIE(InfoExtractor):
_VALID_URL = r'https?://uktvplay\.uktv\.co\.uk/.+?\?.*?\bvideo=(?P<id>\d+)'
_TEST = {
'url': 'https://uktvplay.uktv.co.uk/shows/world-at-war/c/200/watch-online/?video=2117008346001',
'md5': '',
'info_dict': {
'id': '2117008346001',
'ext': 'mp4',
'title': 'Pincers',
'description': 'Pincers',
'uploader_id': '1242911124001',
'upload_date': '20130124',
'timestamp': 1359049267,
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Failed to download MPD manifest']
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911124001/H1xnMOqP_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % video_id,
'BrightcoveNew', video_id)

View File

@@ -4,11 +4,22 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import urlencode_postdata
from ..utils import ExtractorError
class Vbox7IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vbox7\.com/(?:play:|emb/external\.php\?.*?\bvid=)(?P<id>[\da-fA-F]+)'
_VALID_URL = r'''(?x)
https?://
(?:[^/]+\.)?vbox7\.com/
(?:
play:|
(?:
emb/external\.php|
player/ext\.swf
)\?.*?\bvid=
)
(?P<id>[\da-fA-F]+)
'''
_TESTS = [{
'url': 'http://vbox7.com/play:0946fff23c',
'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
@@ -16,6 +27,14 @@ class Vbox7IE(InfoExtractor):
'id': '0946fff23c',
'ext': 'mp4',
'title': 'Борисов: Притеснен съм за бъдещето на България',
'description': 'По думите му е опасно страната ни да бъде обявена за "сигурна"',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1470982814,
'upload_date': '20160812',
'uploader': 'zdraveibulgaria',
},
'params': {
'proxy': '127.0.0.1:8118',
},
}, {
'url': 'http://vbox7.com/play:249bb972c2',
@@ -29,6 +48,9 @@ class Vbox7IE(InfoExtractor):
}, {
'url': 'http://vbox7.com/emb/external.php?vid=a240d20f9c&autoplay=1',
'only_matching': True,
}, {
'url': 'http://i49.vbox7.com/player/ext.swf?vid=0946fff23c&autoplay=1',
'only_matching': True,
}]
@staticmethod
@@ -42,33 +64,40 @@ class Vbox7IE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://vbox7.com/play:%s' % video_id, video_id)
response = self._download_json(
'https://www.vbox7.com/ajax/video/nextvideo.php?vid=%s' % video_id,
video_id)
title = self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title').split('/')[0].strip()
if 'error' in response:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, response['error']), expected=True)
video_url = self._search_regex(
r'src\s*:\s*(["\'])(?P<url>.+?.mp4.*?)\1',
webpage, 'video url', default=None, group='url')
video = response['options']
thumbnail_url = self._og_search_thumbnail(webpage)
if not video_url:
info_response = self._download_webpage(
'http://vbox7.com/play/magare.do', video_id,
'Downloading info webpage',
data=urlencode_postdata({'as3': '1', 'vid': video_id}),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
final_url, thumbnail_url = map(
lambda x: x.split('=')[1], info_response.split('&'))
title = video['title']
video_url = video['src']
if '/na.mp4' in video_url:
self.raise_geo_restricted()
return {
uploader = video.get('uploader')
webpage = self._download_webpage(
'http://vbox7.com/play:%s' % video_id, video_id, fatal=None)
info = {}
if webpage:
info = self._search_json_ld(
webpage.replace('"/*@context"', '"@context"'), video_id)
info.update({
'id': video_id,
'url': self._proto_relative_url(video_url, 'http:'),
'title': title,
'thumbnail': thumbnail_url,
}
'url': video_url,
'uploader': uploader,
'thumbnail': self._proto_relative_url(
info.get('thumbnail') or self._og_search_thumbnail(webpage),
'http:'),
})
return info

View File

@@ -92,29 +92,30 @@ class VimeoBaseInfoExtractor(InfoExtractor):
def _vimeo_sort_formats(self, formats):
# Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
# at the same time without actual units specified. This lead to wrong sorting.
self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
def _parse_config(self, config, video_id):
video_data = config['video']
# Extract title
video_title = config['video']['title']
video_title = video_data['title']
# Extract uploader, uploader_url and uploader_id
video_uploader = config['video'].get('owner', {}).get('name')
video_uploader_url = config['video'].get('owner', {}).get('url')
video_uploader = video_data.get('owner', {}).get('name')
video_uploader_url = video_data.get('owner', {}).get('url')
video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
# Extract video thumbnail
video_thumbnail = config['video'].get('thumbnail')
video_thumbnail = video_data.get('thumbnail')
if video_thumbnail is None:
video_thumbs = config['video'].get('thumbs')
video_thumbs = video_data.get('thumbs')
if video_thumbs and isinstance(video_thumbs, dict):
_, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
# Extract video duration
video_duration = int_or_none(config['video'].get('duration'))
video_duration = int_or_none(video_data.get('duration'))
formats = []
config_files = config['video'].get('files') or config['request'].get('files', {})
config_files = video_data.get('files') or config['request'].get('files', {})
for f in config_files.get('progressive', []):
video_url = f.get('url')
if not video_url:
@@ -127,10 +128,24 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'fps': int_or_none(f.get('fps')),
'tbr': int_or_none(f.get('bitrate')),
})
m3u8_url = config_files.get('hls', {}).get('url')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
manifest_url = cdn_data.get('url')
if not manifest_url:
continue
format_id = '%s-%s' % (files_type, cdn_name)
if files_type == 'hls':
formats.extend(self._extract_m3u8_formats(
manifest_url, video_id, 'mp4',
'm3u8_native', m3u8_id=format_id,
note='Downloading %s m3u8 information' % cdn_name,
fatal=False))
elif files_type == 'dash':
formats.extend(self._extract_mpd_formats(
manifest_url.replace('/master.json', '/master.mpd'), video_id, format_id,
'Downloading %s MPD information' % cdn_name,
fatal=False))
subtitles = {}
text_tracks = config['request'].get('text_tracks')

249
youtube_dl/extractor/viu.py Normal file
View File

@@ -0,0 +1,249 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
)
class ViuBaseIE(InfoExtractor):
def _real_initialize(self):
viu_auth_res = self._request_webpage(
'https://www.viu.com/api/apps/v2/authenticate', None,
'Requesting Viu auth', query={
'acct': 'test',
'appid': 'viu_desktop',
'fmt': 'json',
'iid': 'guest',
'languageid': 'default',
'platform': 'desktop',
'userid': 'guest',
'useridtype': 'guest',
'ver': '1.0'
})
self._auth_token = viu_auth_res.info()['X-VIU-AUTH']
def _call_api(self, path, *args, **kwargs):
headers = self.geo_verification_headers()
headers.update({
'X-VIU-AUTH': self._auth_token
})
headers.update(kwargs.get('headers', {}))
kwargs['headers'] = headers
response = self._download_json(
'https://www.viu.com/api/' + path, *args, **kwargs)['response']
if response.get('status') != 'success':
raise ExtractorError('%s said: %s' % (
self.IE_NAME, response['message']), expected=True)
return response
class ViuIE(ViuBaseIE):
_VALID_URL = r'(?:viu:|https?://www\.viu\.com/[a-z]{2}/media/)(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.viu.com/en/media/1116705532?containerId=playlist-22168059',
'info_dict': {
'id': '1116705532',
'ext': 'mp4',
'title': 'Citizen Khan - Ep 1',
'description': 'md5:d7ea1604f49e5ba79c212c551ce2110e',
},
'params': {
'skip_download': 'm3u8 download',
},
'skip': 'Geo-restricted to India',
}, {
'url': 'https://www.viu.com/en/media/1130599965',
'info_dict': {
'id': '1130599965',
'ext': 'mp4',
'title': 'Jealousy Incarnate - Episode 1',
'description': 'md5:d3d82375cab969415d2720b6894361e9',
},
'params': {
'skip_download': 'm3u8 download',
},
'skip': 'Geo-restricted to Indonesia',
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._call_api(
'clip/load', video_id, 'Downloading video data', query={
'appid': 'viu_desktop',
'fmt': 'json',
'id': video_id
})['item'][0]
title = video_data['title']
m3u8_url = None
url_path = video_data.get('urlpathd') or video_data.get('urlpath')
tdirforwhole = video_data.get('tdirforwhole')
# #EXT-X-BYTERANGE is not supported by native hls downloader
# and ffmpeg (#10955)
# hls_file = video_data.get('hlsfile')
hls_file = video_data.get('jwhlsfile')
if url_path and tdirforwhole and hls_file:
m3u8_url = '%s/%s/%s' % (url_path, tdirforwhole, hls_file)
else:
# m3u8_url = re.sub(
# r'(/hlsc_)[a-z]+(\d+\.m3u8)',
# r'\1whe\2', video_data['href'])
m3u8_url = video_data['href']
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
self._sort_formats(formats)
subtitles = {}
for key, value in video_data.items():
mobj = re.match(r'^subtitle_(?P<lang>[^_]+)_(?P<ext>(vtt|srt))', key)
if not mobj:
continue
subtitles.setdefault(mobj.group('lang'), []).append({
'url': value,
'ext': mobj.group('ext')
})
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'series': video_data.get('moviealbumshowname'),
'episode': title,
'episode_number': int_or_none(video_data.get('episodeno')),
'duration': int_or_none(video_data.get('duration')),
'formats': formats,
'subtitles': subtitles,
}
class ViuPlaylistIE(ViuBaseIE):
IE_NAME = 'viu:playlist'
_VALID_URL = r'https?://www\.viu\.com/[^/]+/listing/playlist-(?P<id>\d+)'
_TEST = {
'url': 'https://www.viu.com/en/listing/playlist-22461380',
'info_dict': {
'id': '22461380',
'title': 'The Good Wife',
},
'playlist_count': 16,
'skip': 'Geo-restricted to Indonesia',
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
playlist_data = self._call_api(
'container/load', playlist_id,
'Downloading playlist info', query={
'appid': 'viu_desktop',
'fmt': 'json',
'id': 'playlist-' + playlist_id
})['container']
entries = []
for item in playlist_data.get('item', []):
item_id = item.get('id')
if not item_id:
continue
item_id = compat_str(item_id)
entries.append(self.url_result(
'viu:' + item_id, 'Viu', item_id))
return self.playlist_result(
entries, playlist_id, playlist_data.get('title'))
class ViuOTTIE(InfoExtractor):
IE_NAME = 'viu:ott'
_VALID_URL = r'https?://(?:www\.)?viu\.com/ott/(?P<country_code>[a-z]{2})/[a-z]{2}-[a-z]{2}/vod/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.viu.com/ott/sg/en-us/vod/3421/The%20Prime%20Minister%20and%20I',
'info_dict': {
'id': '3421',
'ext': 'mp4',
'title': 'A New Beginning',
'description': 'md5:1e7486a619b6399b25ba6a41c0fe5b2c',
},
'params': {
'skip_download': 'm3u8 download',
},
'skip': 'Geo-restricted to Singapore',
}, {
'url': 'http://www.viu.com/ott/hk/zh-hk/vod/7123/%E5%A4%A7%E4%BA%BA%E5%A5%B3%E5%AD%90',
'info_dict': {
'id': '7123',
'ext': 'mp4',
'title': '這就是我的生活之道',
'description': 'md5:4eb0d8b08cf04fcdc6bbbeb16043434f',
},
'params': {
'skip_download': 'm3u8 download',
},
'skip': 'Geo-restricted to Hong Kong',
}]
def _real_extract(self, url):
country_code, video_id = re.match(self._VALID_URL, url).groups()
product_data = self._download_json(
'http://www.viu.com/ott/%s/index.php' % country_code, video_id,
'Downloading video info', query={
'r': 'vod/ajax-detail',
'platform_flag_label': 'web',
'product_id': video_id,
})['data']
video_data = product_data.get('current_product')
if not video_data:
raise ExtractorError('This video is not available in your region.', expected=True)
stream_data = self._download_json(
'https://d1k2us671qcoau.cloudfront.net/distribute_web_%s.php' % country_code,
video_id, 'Downloading stream info', query={
'ccs_product_id': video_data['ccs_product_id'],
})['data']['stream']
stream_sizes = stream_data.get('size', {})
formats = []
for vid_format, stream_url in stream_data.get('url', {}).items():
height = int_or_none(self._search_regex(
r's(\d+)p', vid_format, 'height', default=None))
formats.append({
'format_id': vid_format,
'url': stream_url,
'height': height,
'ext': 'mp4',
'filesize': int_or_none(stream_sizes.get(vid_format))
})
self._sort_formats(formats)
subtitles = {}
for sub in video_data.get('subtitle', []):
sub_url = sub.get('url')
if not sub_url:
continue
subtitles.setdefault(sub.get('name'), []).append({
'url': sub_url,
'ext': 'srt',
})
title = video_data['synopsis'].strip()
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'series': product_data.get('series', {}).get('name'),
'episode': title,
'episode_number': int_or_none(video_data.get('number')),
'duration': int_or_none(stream_data.get('duration')),
'thumbnail': video_data.get('cover_image_url'),
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -7,6 +7,7 @@ from ..utils import (
ExtractorError,
parse_duration,
str_to_int,
urljoin,
)
@@ -66,10 +67,9 @@ class VpornIE(InfoExtractor):
description = self._html_search_regex(
r'class="(?:descr|description_txt)">(.*?)</div>',
webpage, 'description', fatal=False)
thumbnail = self._html_search_regex(
r'flashvars\.imageUrl\s*=\s*"([^"]+)"', webpage, 'description', fatal=False, default=None)
if thumbnail:
thumbnail = 'http://www.vporn.com' + thumbnail
thumbnail = urljoin('http://www.vporn.com', self._html_search_regex(
r'flashvars\.imageUrl\s*=\s*"([^"]+)"', webpage, 'description',
default=None))
uploader = self._html_search_regex(
r'(?s)Uploaded by:.*?<a href="/user/[^"]+"[^>]*>(.+?)</a>',

View File

@@ -0,0 +1,140 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
str_or_none,
)
class VVVVIDIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vvvvid\.it/#!(?:show|anime|film|series)/(?P<show_id>\d+)/[^/]+/(?P<season_id>\d+)/(?P<id>[0-9]+)'
_TESTS = [{
# video_type == 'video/vvvvid'
'url': 'https://www.vvvvid.it/#!show/434/perche-dovrei-guardarlo-di-dario-moccia/437/489048/ping-pong',
'md5': 'b8d3cecc2e981adc3835adf07f6df91b',
'info_dict': {
'id': '489048',
'ext': 'mp4',
'title': 'Ping Pong',
},
}, {
# video_type == 'video/rcs'
'url': 'https://www.vvvvid.it/#!show/376/death-note-live-action/377/482493/episodio-01',
'md5': '33e0edfba720ad73a8782157fdebc648',
'info_dict': {
'id': '482493',
'ext': 'mp4',
'title': 'Episodio 01',
},
}]
_conn_id = None
def _real_initialize(self):
self._conn_id = self._download_json(
'https://www.vvvvid.it/user/login',
None, headers=self.geo_verification_headers())['data']['conn_id']
def _real_extract(self, url):
show_id, season_id, video_id = re.match(self._VALID_URL, url).groups()
response = self._download_json(
'https://www.vvvvid.it/vvvvid/ondemand/%s/season/%s' % (show_id, season_id),
video_id, headers=self.geo_verification_headers(), query={
'conn_id': self._conn_id,
})
if response['result'] == 'error':
raise ExtractorError('%s said: %s' % (
self.IE_NAME, response['message']), expected=True)
vid = int(video_id)
video_data = list(filter(
lambda episode: episode.get('video_id') == vid, response['data']))[0]
formats = []
# vvvvid embed_info decryption algorithm is reverse engineered from function $ds(h) at vvvvid.js
def ds(h):
g = "MNOPIJKL89+/4567UVWXQRSTEFGHABCDcdefYZabstuvopqr0123wxyzklmnghij"
def f(m):
l = []
o = 0
b = False
m_len = len(m)
while ((not b) and o < m_len):
n = m[o] << 2
o += 1
k = -1
j = -1
if o < m_len:
n += m[o] >> 4
o += 1
if o < m_len:
k = (m[o - 1] << 4) & 255
k += m[o] >> 2
o += 1
if o < m_len:
j = (m[o - 1] << 6) & 255
j += m[o]
o += 1
else:
b = True
else:
b = True
else:
b = True
l.append(n)
if k != -1:
l.append(k)
if j != -1:
l.append(j)
return l
c = []
for e in h:
c.append(g.index(e))
c_len = len(c)
for e in range(c_len * 2 - 1, -1, -1):
a = c[e % c_len] ^ c[(e + 1) % c_len]
c[e % c_len] = a
c = f(c)
d = ''
for e in c:
d += chr(e)
return d
for quality in ('_sd', ''):
embed_code = video_data.get('embed_info' + quality)
if not embed_code:
continue
embed_code = ds(embed_code)
video_type = video_data.get('video_type')
if video_type in ('video/rcs', 'video/kenc'):
formats.extend(self._extract_akamai_formats(
embed_code, video_id))
else:
formats.extend(self._extract_wowza_formats(
'http://sb.top-ix.org/videomg/_definst_/mp4:%s/playlist.m3u8' % embed_code, video_id))
self._sort_formats(formats)
return {
'id': video_id,
'title': video_data['title'],
'formats': formats,
'thumbnail': video_data.get('thumbnail'),
'duration': int_or_none(video_data.get('length')),
'series': video_data.get('show_title'),
'season_id': season_id,
'season_number': video_data.get('season_number'),
'episode_id': str_or_none(video_data.get('id')),
'epidode_number': int_or_none(video_data.get('number')),
'episode_title': video_data['title'],
'view_count': int_or_none(video_data.get('views')),
'like_count': int_or_none(video_data.get('video_likes')),
}

View File

@@ -1703,9 +1703,9 @@ def base_url(url):
def urljoin(base, path):
if not isinstance(path, compat_str) or not path:
return None
if re.match(r'https?://', path):
if re.match(r'^(?:https?:)?//', path):
return path
if not isinstance(base, compat_str) or not re.match(r'https?://', base):
if not isinstance(base, compat_str) or not re.match(r'^(?:https?:)?//', base):
return None
return compat_urlparse.urljoin(base, path)

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.12.15'
__version__ = '2016.12.20'