Compare commits

...

27 Commits

Author SHA1 Message Date
Sergey M․
0db492c02a release 2017.07.23 2017-07-23 01:09:09 +07:00
Sergey M․
425f41319a [ChangeLog] Actualize 2017-07-23 01:06:08 +07:00
Sergey M․
71dde5eecf [itv] Fix production id extraction (closes #13671) 2017-07-23 00:59:07 +07:00
Sergey M․
935d6c20c0 [vidio] Make duration non fatal and fix typo 2017-07-23 00:44:50 +07:00
Sergey M․
e0f1fb0a27 [mtv] Skip missing video parts (closes #13690) 2017-07-23 00:25:23 +07:00
Sergey M․
0017d9ad6d [YoutubeDL] Improve default format specification (closes #13704) 2017-07-23 00:12:01 +07:00
Sergey M․
327c8364f1 [sportbox:embed] Fix extraction 2017-07-22 21:35:14 +07:00
dubber0
359aa2fdd1 [npo] Add support for npo3.nl URLs 2017-07-22 19:15:55 +07:00
Sergey M․
f76c02c87b [dramafever] Fix tests 2017-07-22 11:41:40 +07:00
Sergey M․
7d9a1db111 [dramafever] Remove video id from title (closes #13699) 2017-07-22 11:40:46 +07:00
Sergey M․
0396806f67 [YoutubeDL] Do not override id, extractor and extractor_key in url_transparent
All these meta fields must be borrowed from final extractor that actually performs extraction.
This commit fixes extractor id in download archives for url_transparent downloads. Previously, 'transparent' extractor was erroneously
used for extractor archive id, e.g. 'eggheadlesson 4n8ugwwj5t' instead of 'wistia 4n8ugwwj5t'.
2017-07-21 00:13:32 +07:00
Sergey M․
dc6520aa3d [egghead:lesson] Add extractor (#6635) 2017-07-20 23:22:36 +07:00
Sergey M․
c653326a14 [funnyordie] Extract more metadata (closes #13677) 2017-07-20 22:50:56 +07:00
Yen Chi Hsuan
3fcf346ac1 [youku:show] Refine playlist extraction
Handle playlists that the initial page is not the first page
2017-07-20 23:20:46 +08:00
Yen Chi Hsuan
fa63cf6c23 [youku:show] Fix playlist extraction (closes #13248) 2017-07-20 22:57:51 +08:00
Yen Chi Hsuan
85f5a74b6c [tbs] Mark as broken and skip invalid tests 2017-07-20 21:19:09 +08:00
Yen Chi Hsuan
d20b1c6725 [dispeak] Recognize sevt subdomain (closes #13276) 2017-07-20 18:14:14 +08:00
Sergey M․
bb176df3bb [spiegel:article] Move test 2017-07-17 22:19:40 +07:00
Sergey M․
83d00044c1 [adn] Improve error reporting (#13663) 2017-07-16 20:50:32 +07:00
Sergey M․
7abed4e06c [crunchyroll] Relax series and season regex (closes #13659) 2017-07-16 12:40:45 +07:00
Sergey M․
13eb526f11 [nexx:embed] PEP 8 2017-07-16 05:23:19 +07:00
Sergey M․
00d06e3cfc [spiegel:article] Add support for nexx iframe embeds (closes #13029) 2017-07-16 04:38:20 +07:00
Sergey M․
749ca5eced [extractor/common] Fix playlist_from_matches 2017-07-16 04:33:14 +07:00
Sergey M․
3f59b0154a [nexx:embed] Add extractor for iframe embeds 2017-07-16 04:32:37 +07:00
Sergey M․
089b97cfee [nexx] Improve JS embed extraction 2017-07-16 04:30:48 +07:00
Sergey M․
decf86044d [pearvideo] Improve (closes #13031) 2017-07-16 03:06:04 +07:00
troywith77
94b817edeb [pearvideo] Add extractor 2017-07-16 03:02:31 +07:00
26 changed files with 510 additions and 124 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.07.15*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.07.15**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.07.23*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.07.23**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.07.15
[debug] youtube-dl version 2017.07.23
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -1,3 +1,30 @@
version 2017.07.23
Core
* [YoutubeDL] Improve default format specification (#13704)
* [YoutubeDL] Do not override id, extractor and extractor_key for
url_transparent entities
* [extractor/common] Fix playlist_from_matches
Extractors
* [itv] Fix production id extraction (#13671, #13703)
* [vidio] Make duration non fatal and fix typo
* [mtv] Skip missing video parts (#13690)
* [sportbox:embed] Fix extraction
+ [npo] Add support for npo3.nl URLs (#13695)
* [dramafever] Remove video id from title (#13699)
+ [egghead:lesson] Add support for lessons (#6635)
* [funnyordie] Extract more metadata (#13677)
* [youku:show] Fix playlist extraction (#13248)
+ [dispeak] Recognize sevt subdomain (#13276)
* [adn] Improve error reporting (#13663)
* [crunchyroll] Relax series and season regex (#13659)
+ [spiegel:article] Add support for nexx iframe embeds (#13029)
+ [nexx:embed] Add support for iframe embeds
* [nexx] Improve JS embed extraction
+ [pearvideo] Add support for pearvideo.com (#13031)
version 2017.07.15
Core

View File

@@ -42,7 +42,7 @@
- **Allocine**
- **AlphaPorno**
- **AMCNetworks**
- **anderetijden**: npo.nl and ntr.nl
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimeOnDemand**
- **anitube.se**
- **Anvato**
@@ -238,6 +238,7 @@
- **EbaumsWorld**
- **EchoMsk**
- **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson
- **eHow**
- **Einthusan**
- **eitb.tv**
@@ -522,6 +523,7 @@
- **NextMediaActionNews**: 蘋果日報 - 動新聞
- **NextTV**: 壹電視
- **Nexx**
- **NexxEmbed**
- **nfb**: National Film Board of Canada
- **nfl.com**
- **NhkVod**
@@ -552,7 +554,7 @@
- **NowTVList**
- **nowvideo**: NowVideo
- **Noz**
- **npo**: npo.nl and ntr.nl
- **npo**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **npo.nl:live**
- **npo.nl:radio**
- **npo.nl:radio:fragment**
@@ -596,6 +598,7 @@
- **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **PearVideo**
- **People**
- **periscope**: Periscope
- **periscope:user**: Periscope user videos
@@ -772,7 +775,7 @@
- **tagesschau:player**
- **Tass**
- **TastyTrade**
- **TBS**
- **TBS** (Currently broken)
- **TDSLifeway**
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
@@ -950,7 +953,7 @@
- **VoiceRepublic**
- **VoxMedia**
- **Vporn**
- **vpro**: npo.nl and ntr.nl
- **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **Vrak**
- **VRT**: deredactie.be, sporza.be, cobra.be and cobra.canvas.be
- **vrv**
@@ -976,7 +979,7 @@
- **wholecloud**: WholeCloud
- **Wimp**
- **Wistia**
- **wnl**: npo.nl and ntr.nl
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **WorldStarHipHop**
- **wrzuta.pl**
- **wrzuta.pl:playlist**

View File

@@ -41,6 +41,7 @@ def _make_result(formats, **kwargs):
'id': 'testid',
'title': 'testttitle',
'extractor': 'testex',
'extractor_key': 'TestEx',
}
res.update(**kwargs)
return res
@@ -448,6 +449,17 @@ class TestFormatSelection(unittest.TestCase):
pass
self.assertEqual(ydl.downloaded_info_dicts, [])
def test_default_format_spec(self):
ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({}), 'bestvideo+bestaudio/best')
ydl = YDL({'outtmpl': '-'})
self.assertEqual(ydl._default_format_spec({}), 'best')
ydl = YDL({})
self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo+bestaudio/best')
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best')
class TestYoutubeDL(unittest.TestCase):
def test_subtitles(self):
@@ -761,7 +773,8 @@ class TestYoutubeDL(unittest.TestCase):
'_type': 'url_transparent',
'url': 'foo2:',
'ie_key': 'Foo2',
'title': 'foo1 title'
'title': 'foo1 title',
'id': 'foo1_id',
}
class Foo2IE(InfoExtractor):
@@ -787,6 +800,9 @@ class TestYoutubeDL(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['url'], TEST_URL)
self.assertEqual(downloaded['title'], 'foo1 title')
self.assertEqual(downloaded['id'], 'testid')
self.assertEqual(downloaded['extractor'], 'testex')
self.assertEqual(downloaded['extractor_key'], 'TestEx')
if __name__ == '__main__':

View File

@@ -860,7 +860,7 @@ class YoutubeDL(object):
force_properties = dict(
(k, v) for k, v in ie_result.items() if v is not None)
for f in ('_type', 'url', 'ie_key'):
for f in ('_type', 'url', 'id', 'extractor', 'extractor_key', 'ie_key'):
if f in force_properties:
del force_properties[f]
new_result = info.copy()
@@ -1064,6 +1064,25 @@ class YoutubeDL(object):
return op(actual_value, comparison_value)
return _filter
def _default_format_spec(self, info_dict, download=True):
req_format_list = []
def can_have_partial_formats():
if self.params.get('simulate', False):
return True
if not download:
return True
if self.params.get('outtmpl', DEFAULT_OUTTMPL) == '-':
return False
if info_dict.get('is_live'):
return False
merger = FFmpegMergerPP(self)
return merger.available and merger.can_merge()
if can_have_partial_formats():
req_format_list.append('bestvideo+bestaudio')
req_format_list.append('best')
return '/'.join(req_format_list)
def build_format_selector(self, format_spec):
def syntax_error(note, start):
message = (
@@ -1534,14 +1553,10 @@ class YoutubeDL(object):
req_format = self.params.get('format')
if req_format is None:
req_format_list = []
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
not info_dict.get('is_live')):
merger = FFmpegMergerPP(self)
if merger.available and merger.can_merge():
req_format_list.append('bestvideo+bestaudio')
req_format_list.append('best')
req_format = '/'.join(req_format_list)
req_format = self._default_format_spec(info_dict, download=download)
if self.params.get('verbose'):
self.to_stdout('[debug] Default format spec: %s' % req_format)
format_selector = self.build_format_selector(req_format)
# While in format selection we may need to have an access to the original

View File

@@ -107,11 +107,13 @@ class ADNIE(InfoExtractor):
metas = options.get('metas') or {}
title = metas.get('title') or video_info['title']
links = player_config.get('links') or {}
error = None
if not links:
links_url = player_config['linksurl']
links_data = self._download_json(urljoin(
self._BASE_URL, links_url), video_id)
links = links_data.get('links') or {}
error = links_data.get('error')
formats = []
for format_id, qualities in links.items():
@@ -130,7 +132,8 @@ class ADNIE(InfoExtractor):
for f in m3u8_formats:
f['language'] = 'fr'
formats.extend(m3u8_formats)
error = options.get('error')
if not error:
error = options.get('error')
if not formats and error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
self._sort_formats(formats)

View File

@@ -730,12 +730,12 @@ class InfoExtractor(object):
video_info['title'] = video_title
return video_info
def playlist_from_matches(self, matches, video_id, video_title, getter=None, ie=None):
urlrs = orderedSet(
def playlist_from_matches(self, matches, playlist_id=None, playlist_title=None, getter=None, ie=None):
urls = orderedSet(
self.url_result(self._proto_relative_url(getter(m) if getter else m), ie)
for m in matches)
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
urls, playlist_id=playlist_id, playlist_title=playlist_title)
@staticmethod
def playlist_result(entries, playlist_id=None, playlist_title=None, playlist_description=None):

View File

@@ -510,7 +510,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
# webpage provide more accurate data than series_title from XML
series = self._html_search_regex(
r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)',
r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
webpage, 'series', fatal=False)
season = xpath_text(metadata, 'series_title')
@@ -518,7 +518,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
season_number = int_or_none(self._search_regex(
r'(?s)<h4[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h4>\s*<h4>\s*Season (\d+)',
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
webpage, 'season number', default=None))
return {

View File

@@ -13,7 +13,7 @@ from ..utils import (
class DigitallySpeakingIE(InfoExtractor):
_VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
_VALID_URL = r'https?://(?:s?evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
_TESTS = [{
# From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
@@ -28,6 +28,10 @@ class DigitallySpeakingIE(InfoExtractor):
# From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
'only_matching': True,
}, {
# From http://www.gdcvault.com/play/1013700/Advanced-Material
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):

View File

@@ -12,6 +12,7 @@ from ..utils import (
ExtractorError,
clean_html,
int_or_none,
remove_end,
sanitized_Request,
urlencode_postdata
)
@@ -72,15 +73,15 @@ class DramaFeverIE(DramaFeverBaseIE):
'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
'info_dict': {
'id': '4512.1',
'ext': 'mp4',
'title': 'Cooking with Shin 4512.1',
'ext': 'flv',
'title': 'Cooking with Shin',
'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
'episode': 'Episode 1',
'episode_number': 1,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1404336058,
'upload_date': '20140702',
'duration': 343,
'duration': 344,
},
'params': {
# m3u8 download
@@ -90,15 +91,15 @@ class DramaFeverIE(DramaFeverBaseIE):
'url': 'http://www.dramafever.com/drama/4826/4/Mnet_Asian_Music_Awards_2015/?ap=1',
'info_dict': {
'id': '4826.4',
'ext': 'mp4',
'title': 'Mnet Asian Music Awards 2015 4826.4',
'ext': 'flv',
'title': 'Mnet Asian Music Awards 2015',
'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91',
'episode': 'Mnet Asian Music Awards 2015 - Part 3',
'episode_number': 4,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1450213200,
'upload_date': '20151215',
'duration': 5602,
'duration': 5359,
},
'params': {
# m3u8 download
@@ -122,6 +123,10 @@ class DramaFeverIE(DramaFeverBaseIE):
countries=self._GEO_COUNTRIES)
raise
# title is postfixed with video id for some reason, removing
if info.get('title'):
info['title'] = remove_end(info['title'], video_id).strip()
series_id, episode_number = video_id.split('.')
episode_info = self._download_json(
# We only need a single episode info, so restricting page size to one episode

View File

@@ -2,6 +2,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
unified_timestamp,
)
class EggheadCourseIE(InfoExtractor):
@@ -33,3 +38,47 @@ class EggheadCourseIE(InfoExtractor):
return self.playlist_result(
entries, playlist_id, course.get('title'),
course.get('description'))
class EggheadLessonIE(InfoExtractor):
IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/lessons/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'info_dict': {
'id': 'fv5yotjxcg',
'ext': 'mp4',
'title': 'Create linear data flow with container style types (Box)',
'description': 'md5:9aa2cdb6f9878ed4c39ec09e85a8150e',
'thumbnail': r're:^https?:.*\.jpg$',
'timestamp': 1481296768,
'upload_date': '20161209',
'duration': 304,
'view_count': 0,
'tags': ['javascript', 'free'],
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
lesson_id = self._match_id(url)
lesson = self._download_json(
'https://egghead.io/api/v1/lessons/%s' % lesson_id, lesson_id)
return {
'_type': 'url_transparent',
'ie_key': 'Wistia',
'url': 'wistia:%s' % lesson['wistia_id'],
'id': lesson['wistia_id'],
'title': lesson.get('title'),
'description': lesson.get('summary'),
'thumbnail': lesson.get('thumb_nail'),
'timestamp': unified_timestamp(lesson.get('published_at')),
'duration': int_or_none(lesson.get('duration')),
'view_count': int_or_none(lesson.get('plays_count')),
'tags': try_get(lesson, lambda x: x['tag_list'], list),
}

View File

@@ -298,7 +298,10 @@ from .dw import (
from .eagleplatform import EaglePlatformIE
from .ebaumsworld import EbaumsWorldIE
from .echomsk import EchoMskIE
from .egghead import EggheadCourseIE
from .egghead import (
EggheadCourseIE,
EggheadLessonIE,
)
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .einthusan import EinthusanIE
@@ -653,7 +656,10 @@ from .nextmedia import (
AppleDailyIE,
NextTVIE,
)
from .nexx import NexxIE
from .nexx import (
NexxIE,
NexxEmbedIE,
)
from .nfb import NFBIE
from .nfl import NFLIE
from .nhk import NhkVodIE
@@ -762,6 +768,7 @@ from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .pearvideo import PearVideoIE
from .people import PeopleIE
from .periscope import (
PeriscopeIE,

View File

@@ -1,10 +1,14 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
unified_timestamp,
)
class FunnyOrDieIE(InfoExtractor):
@@ -18,6 +22,10 @@ class FunnyOrDieIE(InfoExtractor):
'title': 'Heart-Shaped Box: Literal Video Version',
'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
'thumbnail': r're:^http:.*\.jpg$',
'uploader': 'DASjr',
'timestamp': 1317904928,
'upload_date': '20111006',
'duration': 318.3,
},
}, {
'url': 'http://www.funnyordie.com/embed/e402820827',
@@ -27,6 +35,8 @@ class FunnyOrDieIE(InfoExtractor):
'title': 'Please Use This Song (Jon Lajoie)',
'description': 'Please use this to sell something. www.jonlajoie.com',
'thumbnail': r're:^http:.*\.jpg$',
'timestamp': 1398988800,
'upload_date': '20140502',
},
'params': {
'skip_download': True,
@@ -100,15 +110,53 @@ class FunnyOrDieIE(InfoExtractor):
'url': 'http://www.funnyordie.com%s' % src,
}]
post_json = self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
post = json.loads(post_json)
timestamp = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp', default=None))
uploader = self._html_search_regex(
r'<h\d[^>]+\bclass=["\']channel-preview-name[^>]+>(.+?)</h',
webpage, 'uploader', default=None)
title, description, thumbnail, duration = [None] * 4
medium = self._parse_json(
self._search_regex(
r'jsonMedium\s*=\s*({.+?});', webpage, 'JSON medium',
default='{}'),
video_id, fatal=False)
if medium:
title = medium.get('title')
duration = float_or_none(medium.get('duration'))
if not timestamp:
timestamp = unified_timestamp(medium.get('publishDate'))
post = self._parse_json(
self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details',
default='{}'),
video_id, fatal=False)
if post:
if not title:
title = post.get('name')
description = post.get('description')
thumbnail = post.get('picture')
if not title:
title = self._og_search_title(webpage)
if not description:
description = self._og_search_description(webpage)
if not duration:
duration = int_or_none(self._html_search_meta(
('video:duration', 'duration'), webpage, 'duration', default=False))
return {
'id': video_id,
'title': post['name'],
'description': post.get('description'),
'thumbnail': post.get('picture'),
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -36,7 +36,10 @@ from .brightcove import (
BrightcoveLegacyIE,
BrightcoveNewIE,
)
from .nexx import NexxIE
from .nexx import (
NexxIE,
NexxEmbedIE,
)
from .nbc import NBCSportsVPlayerIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
@@ -2155,6 +2158,11 @@ class GenericIE(InfoExtractor):
if nexx_urls:
return self.playlist_from_matches(nexx_urls, video_id, video_title, ie=NexxIE.ie_key())
# Look for Nexx iFrame embeds
nexx_embed_urls = NexxEmbedIE._extract_urls(webpage)
if nexx_embed_urls:
return self.playlist_from_matches(nexx_embed_urls, video_id, video_title, ie=NexxEmbedIE.ie_key())
# Look for ThePlatform embeds
tp_urls = ThePlatformIE._extract_urls(webpage)
if tp_urls:

View File

@@ -59,12 +59,18 @@ class ITVIE(InfoExtractor):
def _add_sub_element(element, name):
return etree.SubElement(element, _add_ns(name))
production_id = (
params.get('data-video-autoplay-id') or
'%s#001' % (
params.get('data-video-episode-id') or
video_id.replace('a', '/')))
req_env = etree.Element(_add_ns('soapenv:Envelope'))
_add_sub_element(req_env, 'soapenv:Header')
body = _add_sub_element(req_env, 'soapenv:Body')
get_playlist = _add_sub_element(body, ('tem:GetPlaylist'))
request = _add_sub_element(get_playlist, 'tem:request')
_add_sub_element(request, 'itv:ProductionId').text = params['data-video-id']
_add_sub_element(request, 'itv:ProductionId').text = production_id
_add_sub_element(request, 'itv:RequestGuid').text = compat_str(uuid.uuid4()).upper()
vodcrid = _add_sub_element(request, 'itv:Vodcrid')
_add_sub_element(vodcrid, 'com:Id')

View File

@@ -83,7 +83,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
hls_url = rendition.find('./src').text
formats.extend(self._extract_m3u8_formats(
hls_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls'))
m3u8_id='hls', fatal=False))
else:
# fms
try:
@@ -106,7 +106,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
}])
except (KeyError, TypeError):
raise ExtractorError('Invalid rendition field.')
self._sort_formats(formats)
if formats:
self._sort_formats(formats)
return formats
def _extract_subtitles(self, mdoc, mtvn_id):
@@ -133,8 +134,11 @@ class MTVServicesInfoExtractor(InfoExtractor):
mediagen_url += 'acceptMethods='
mediagen_url += 'hls' if use_hls else 'fms'
mediagen_doc = self._download_xml(mediagen_url, video_id,
'Downloading video urls')
mediagen_doc = self._download_xml(
mediagen_url, video_id, 'Downloading video urls', fatal=False)
if mediagen_doc is False:
return None
item = mediagen_doc.find('./video/item')
if item is not None and item.get('type') == 'text':
@@ -174,6 +178,13 @@ class MTVServicesInfoExtractor(InfoExtractor):
formats = self._extract_video_formats(mediagen_doc, mtvn_id, video_id)
# Some parts of complete video may be missing (e.g. missing Act 3 in
# http://www.southpark.de/alle-episoden/s14e01-sexual-healing)
if not formats:
return None
self._sort_formats(formats)
return {
'title': title,
'formats': formats,
@@ -205,9 +216,14 @@ class MTVServicesInfoExtractor(InfoExtractor):
title = xpath_text(idoc, './channel/title')
description = xpath_text(idoc, './channel/description')
entries = []
for item in idoc.findall('.//item'):
info = self._get_video_info(item, use_hls)
if info:
entries.append(info)
return self.playlist_result(
[self._get_video_info(item, use_hls) for item in idoc.findall('.//item')],
playlist_title=title, playlist_description=description)
entries, playlist_title=title, playlist_description=description)
def _extract_triforce_mgid(self, webpage, data_zone=None, video_id=None):
triforce_feed = self._parse_json(self._search_regex(

View File

@@ -72,18 +72,26 @@ class NexxIE(InfoExtractor):
entries = []
# JavaScript Integration
for domain_id, video_id in re.findall(
r'''(?isx)
<script\b[^>]+\bsrc=["\']https?://require\.nexx(?:\.cloud|cdn\.com)/(\d+).+?
onPLAYReady.+?
_play\.init\s*\(.+?\s*,\s*(\d+)\s*,\s*.+?\)
''', webpage):
entries.append('https://api.nexx.cloud/v3/%s/videos/byid/%s' % (domain_id, video_id))
mobj = re.search(
r'<script\b[^>]+\bsrc=["\']https?://require\.nexx(?:\.cloud|cdn\.com)/(?P<id>\d+)',
webpage)
if mobj:
domain_id = mobj.group('id')
for video_id in re.findall(
r'(?is)onPLAYReady.+?_play\.init\s*\(.+?\s*,\s*["\']?(\d+)',
webpage):
entries.append(
'https://api.nexx.cloud/v3/%s/videos/byid/%s'
% (domain_id, video_id))
# TODO: support more embed formats
return entries
@staticmethod
def _extract_url(webpage):
return NexxIE._extract_urls(webpage)[0]
def _handle_error(self, response):
status = int_or_none(try_get(
response, lambda x: x['metadata']['status']) or 200)
@@ -219,3 +227,45 @@ class NexxIE(InfoExtractor):
video, lambda x: x['episodedata']['season'])),
'formats': formats,
}
class NexxEmbedIE(InfoExtractor):
_VALID_URL = r'https?://embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://embed.nexx.cloud/748/KC1614647Z27Y7T?autoplay=1',
'md5': '16746bfc28c42049492385c989b26c4a',
'info_dict': {
'id': '161464',
'ext': 'mp4',
'title': 'Nervenkitzel Achterbahn',
'alt_title': 'Karussellbauer in Deutschland',
'description': 'md5:ffe7b1cc59a01f585e0569949aef73cc',
'release_year': 2005,
'creator': 'SPIEGEL TV',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 2761,
'timestamp': 1394021479,
'upload_date': '20140305',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}
@staticmethod
def _extract_urls(webpage):
# Reference:
# 1. https://nx-s.akamaized.net/files/201510/44.pdf
# iFrame Embed Integration
return [mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?:(?!\1).)+)\1',
webpage)]
def _real_extract(self, url):
embed_id = self._match_id(url)
webpage = self._download_webpage(url, embed_id)
return self.url_result(NexxIE._extract_url(webpage), ie=NexxIE.ie_key())

View File

@@ -28,7 +28,7 @@ class NPOBaseIE(InfoExtractor):
class NPOIE(NPOBaseIE):
IE_NAME = 'npo'
IE_DESC = 'npo.nl and ntr.nl'
IE_DESC = 'npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl'
_VALID_URL = r'''(?x)
(?:
npo:|
@@ -38,7 +38,7 @@ class NPOIE(NPOBaseIE):
npo\.nl/(?!(?:live|radio)/)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__|
zapp\.nl/[^/]+/[^/]+/
(?:zapp|npo3)\.nl/(?:[^/]+/){2}
)
)
(?P<id>[^/?#]+)
@@ -146,6 +146,9 @@ class NPOIE(NPOBaseIE):
}, {
'url': 'http://www.zapp.nl/beste-vrienden-quiz/extra-video-s/WO_NTR_1067990',
'only_matching': True,
}, {
'url': 'https://www.npo3.nl/3onderzoekt/16-09-2015/VPWON_1239870',
'only_matching': True,
}, {
# live stream
'url': 'npo:LI_NL1_4188102',

View File

@@ -0,0 +1,63 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
qualities,
unified_timestamp,
)
class PearVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pearvideo\.com/video_(?P<id>\d+)'
_TEST = {
'url': 'http://www.pearvideo.com/video_1076290',
'info_dict': {
'id': '1076290',
'ext': 'mp4',
'title': '小浣熊在主人家玻璃上滚石头:没砸',
'description': 'md5:01d576b747de71be0ee85eb7cac25f9d',
'timestamp': 1494275280,
'upload_date': '20170508',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
quality = qualities(
('ldflv', 'ld', 'sdflv', 'sd', 'hdflv', 'hd', 'src'))
formats = [{
'url': mobj.group('url'),
'format_id': mobj.group('id'),
'quality': quality(mobj.group('id')),
} for mobj in re.finditer(
r'(?P<id>[a-zA-Z]+)Url\s*=\s*(["\'])(?P<url>(?:https?:)?//.+?)\2',
webpage)]
self._sort_formats(formats)
title = self._search_regex(
(r'<h1[^>]+\bclass=(["\'])video-tt\1[^>]*>(?P<value>[^<]+)',
r'<[^>]+\bdata-title=(["\'])(?P<value>(?:(?!\1).)+)\1'),
webpage, 'title', group='value')
description = self._search_regex(
(r'<div[^>]+\bclass=(["\'])summary\1[^>]*>(?P<value>[^<]+)',
r'<[^>]+\bdata-summary=(["\'])(?P<value>(?:(?!\1).)+)\1'),
webpage, 'description', default=None,
group='value') or self._html_search_meta('Description', webpage)
timestamp = unified_timestamp(self._search_regex(
r'<div[^>]+\bclass=["\']date["\'][^>]*>([^<]+)',
webpage, 'timestamp', fatal=False))
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'formats': formats,
}

View File

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .nexx import NexxEmbedIE
from .spiegeltv import SpiegeltvIE
from ..compat import compat_urlparse
from ..utils import (
@@ -121,6 +122,26 @@ class SpiegelArticleIE(InfoExtractor):
},
'playlist_count': 6,
}, {
# Nexx iFrame embed
'url': 'http://www.spiegel.de/sptv/spiegeltv/spiegel-tv-ueber-schnellste-katapult-achterbahn-der-welt-taron-a-1137884.html',
'info_dict': {
'id': '161464',
'ext': 'mp4',
'title': 'Nervenkitzel Achterbahn',
'alt_title': 'Karussellbauer in Deutschland',
'description': 'md5:ffe7b1cc59a01f585e0569949aef73cc',
'release_year': 2005,
'creator': 'SPIEGEL TV',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 2761,
'timestamp': 1394021479,
'upload_date': '20140305',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}]
def _real_extract(self, url):
@@ -143,6 +164,9 @@ class SpiegelArticleIE(InfoExtractor):
entries = [
self.url_result(compat_urlparse.urljoin(
self.http_scheme() + '//spiegel.de/', embed_path))
for embed_path in embeds
]
return self.playlist_result(entries)
for embed_path in embeds]
if embeds:
return self.playlist_result(entries)
return self.playlist_from_matches(
NexxEmbedIE._extract_urls(webpage), ie=NexxEmbedIE.ie_key())

View File

@@ -4,7 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
from ..utils import (
determine_ext,
int_or_none,
js_to_json,
)
class SportBoxEmbedIE(InfoExtractor):
@@ -14,8 +18,10 @@ class SportBoxEmbedIE(InfoExtractor):
'info_dict': {
'id': '211355',
'ext': 'mp4',
'title': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
'title': '211355',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 292,
'view_count': int,
},
'params': {
# m3u8 download
@@ -24,6 +30,9 @@ class SportBoxEmbedIE(InfoExtractor):
}, {
'url': 'http://news.sportbox.ru/vdl/player?nid=370908&only_player=1&autostart=false&playeri=2&height=340&width=580',
'only_matching': True,
}, {
'url': 'https://news.sportbox.ru/vdl/player/media/193095',
'only_matching': True,
}]
@staticmethod
@@ -37,36 +46,34 @@ class SportBoxEmbedIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
wjplayer_data = self._parse_json(
self._search_regex(
r'(?s)wjplayer\(({.+?})\);', webpage, 'wjplayer settings'),
video_id, transform_source=js_to_json)
formats = []
def cleanup_js(code):
# desktop_advert_config contains complex Javascripts and we don't need it
return js_to_json(re.sub(r'desktop_advert_config.*', '', code))
jwplayer_data = self._parse_json(self._search_regex(
r'(?s)player\.setup\(({.+?})\);', webpage, 'jwplayer settings'), video_id,
transform_source=cleanup_js)
hls_url = jwplayer_data.get('hls_url')
if hls_url:
formats.extend(self._extract_m3u8_formats(
hls_url, video_id, ext='mp4', m3u8_id='hls'))
rtsp_url = jwplayer_data.get('rtsp_url')
if rtsp_url:
formats.append({
'url': rtsp_url,
'format_id': 'rtsp',
})
for source in wjplayer_data['sources']:
src = source.get('src')
if not src:
continue
if determine_ext(src) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': src,
})
self._sort_formats(formats)
title = jwplayer_data['node_title']
thumbnail = jwplayer_data.get('image_url')
view_count = int_or_none(self._search_regex(
r'Просмотров\s*:\s*(\d+)', webpage, 'view count', default=None))
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'title': video_id,
'thumbnail': wjplayer_data.get('poster'),
'duration': int_or_none(wjplayer_data.get('duration')),
'view_count': view_count,
'formats': formats,
}

View File

@@ -8,6 +8,9 @@ from ..utils import extract_attributes
class TBSIE(TurnerBaseIE):
# https://github.com/rg3/youtube-dl/issues/13658
_WORKING = False
_VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com/videos/(?:[^/]+/)+(?P<id>[^/?#]+)\.html'
_TESTS = [{
'url': 'http://www.tbs.com/videos/people-of-earth/season-1/extras/2007318/theatrical-trailer.html',
@@ -17,7 +20,8 @@ class TBSIE(TurnerBaseIE):
'ext': 'mp4',
'title': 'Theatrical Trailer',
'description': 'Catch the latest comedy from TBS, People of Earth, premiering Halloween night--Monday, October 31, at 9/8c.',
}
},
'skip': 'TBS videos are deleted after a while',
}, {
'url': 'http://www.tntdrama.com/videos/good-behavior/season-1/extras/1538823/you-better-run.html',
'md5': 'ce53c6ead5e9f3280b4ad2031a6fab56',
@@ -26,7 +30,8 @@ class TBSIE(TurnerBaseIE):
'ext': 'mp4',
'title': 'You Better Run',
'description': 'Letty Raines must figure out what she\'s running toward while running away from her past. Good Behavior premieres November 15 at 9/8c.',
}
},
'skip': 'TBS videos are deleted after a while',
}]
def _real_extract(self, url):

View File

@@ -121,7 +121,11 @@ class VH1IE(MTVIE):
idoc = self._download_xml(
doc_url, video_id,
'Downloading info', transform_source=fix_xml_ampersands)
return self.playlist_result(
[self._get_video_info(item) for item in idoc.findall('.//item')],
playlist_id=video_id,
)
entries = []
for item in idoc.findall('.//item'):
info = self._get_video_info(item)
if info:
entries.append(info)
return self.playlist_result(entries, playlist_id=video_id)

View File

@@ -56,7 +56,8 @@ class VidioIE(InfoExtractor):
self._sort_formats(formats)
duration = int_or_none(duration or self._search_regex(
r'data-video-duration=(["\'])(?P<duartion>\d+)\1', webpage, 'duration'))
r'data-video-duration=(["\'])(?P<duration>\d+)\1', webpage,
'duration', fatal=False, group='duration'))
thumbnail = thumbnail or self._og_search_thumbnail(webpage)
like_count = int_or_none(self._search_regex(

View File

@@ -1,7 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import random
import re
import string
@@ -14,7 +13,6 @@ from ..utils import (
js_to_json,
str_or_none,
strip_jsonp,
urljoin,
)
@@ -222,17 +220,42 @@ class YoukuShowIE(InfoExtractor):
_VALID_URL = r'https?://list\.youku\.com/show/id_(?P<id>[0-9a-z]+)\.html'
IE_NAME = 'youku:show'
_TEST = {
_TESTS = [{
'url': 'http://list.youku.com/show/id_zc7c670be07ff11e48b3f.html',
'info_dict': {
'id': 'zc7c670be07ff11e48b3f',
'title': '花千骨 未删减',
'title': '花千骨 DVD',
'description': 'md5:a1ae6f5618571bbeb5c9821f9c81b558',
},
'playlist_count': 50,
}
}, {
# Episode number not starting from 1
'url': 'http://list.youku.com/show/id_zefbfbd70efbfbd780bef.html',
'info_dict': {
'id': 'zefbfbd70efbfbd780bef',
'title': '超级飞侠3',
'description': 'md5:275715156abebe5ccc2a1992e9d56b98',
},
'playlist_count': 24,
}, {
# Ongoing playlist. The initial page is the last one
'url': 'http://list.youku.com/show/id_za7c275ecd7b411e1a19e.html',
'only_matchine': True,
}]
_PAGE_SIZE = 40
def _extract_entries(self, playlist_data_url, show_id, note, query):
query['callback'] = 'cb'
playlist_data = self._download_json(
playlist_data_url, show_id, query=query, note=note,
transform_source=lambda s: js_to_json(strip_jsonp(s)))['html']
drama_list = (get_element_by_class('p-drama-grid', playlist_data) or
get_element_by_class('p-drama-half-row', playlist_data))
if drama_list is None:
raise ExtractorError('No episodes found')
video_urls = re.findall(r'<a[^>]+href="([^"]+)"', drama_list)
return playlist_data, [
self.url_result(self._proto_relative_url(video_url, 'http:'), YoukuIE.ie_key())
for video_url in video_urls]
def _real_extract(self, url):
show_id = self._match_id(url)
@@ -242,30 +265,29 @@ class YoukuShowIE(InfoExtractor):
page_config = self._parse_json(self._search_regex(
r'var\s+PageConfig\s*=\s*({.+});', webpage, 'page config'),
show_id, transform_source=js_to_json)
for idx in itertools.count(0):
if idx == 0:
playlist_data_url = 'http://list.youku.com/show/module'
query = {'id': page_config['showid'], 'tab': 'point'}
else:
playlist_data_url = 'http://list.youku.com/show/point'
query = {
'id': page_config['showid'],
'stage': 'reload_%d' % (self._PAGE_SIZE * idx + 1),
}
query['callback'] = 'cb'
playlist_data = self._download_json(
playlist_data_url, show_id, query=query,
first_page, initial_entries = self._extract_entries(
'http://list.youku.com/show/module', show_id,
note='Downloading initial playlist data page',
query={
'id': page_config['showid'],
'tab': 'showInfo',
})
first_page_reload_id = self._html_search_regex(
r'<div[^>]+id="(reload_\d+)', first_page, 'first page reload id')
# The first reload_id has the same items as first_page
reload_ids = re.findall('<li[^>]+data-id="([^"]+)">', first_page)
for idx, reload_id in enumerate(reload_ids):
if reload_id == first_page_reload_id:
entries.extend(initial_entries)
continue
_, new_entries = self._extract_entries(
'http://list.youku.com/show/episode', show_id,
note='Downloading playlist data page %d' % (idx + 1),
transform_source=lambda s: js_to_json(strip_jsonp(s)))['html']
video_urls = re.findall(
r'<div[^>]+class="p-thumb"[^<]+<a[^>]+href="([^"]+)"',
playlist_data)
new_entries = [
self.url_result(urljoin(url, video_url), YoukuIE.ie_key())
for video_url in video_urls]
query={
'id': page_config['showid'],
'stage': reload_id,
})
entries.extend(new_entries)
if len(new_entries) < self._PAGE_SIZE:
break
desc = self._html_search_meta('description', webpage, fatal=False)
playlist_title = desc.split(',')[0] if desc else None

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.07.15'
__version__ = '2017.07.23'