release 2016.08.07

[ChangeLog] Actualize
[discoverygo] Add extractor (Closes #10245)
2016-08-07 21:12:41 +07:00 · 2016-08-07 21:10:48 +07:00 · 2016-08-07 20:57:05 +07:00 · 2016-08-07 20:45:18 +07:00 · 2016-08-07 19:06:55 +07:00 · 2016-08-07 19:04:22 +07:00
17 changed files with 580 additions and 185 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.06*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.07*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.06**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.07**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.08.06
+[debug] youtube-dl version 2016.08.07
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,29 @@
 version 2016.08.07
 Core
 + Add support for TV Parental Guidelines ratings in parse_age_limit
 + Add decode_png (#9706)
 + Add support for partOfTVSeries in JSON-LD
 * Lower master M3U8 manifest preference for better format sorting
 Extractors
 + [discoverygo] Add extractor (#10245)
 * [flipagram] Make JSON-LD extraction non fatal
 * [generic] Make JSON-LD extraction non fatal
 + [bbc] Add support for morph embeds (#10239)
 * [tnaflixnetworkbase] Improve title extraction
 * [tnaflix] Fix metadata extraction (#10249)
 * [fox] Fix theplatform release URL query
 * [openload] Fix extraction (#9706)
 * [bbc] Skip duplicate manifest URLs
 * [bbc] Improve format code
 + [bbc] Add support for DASH and F4M
 * [bbc] Improve format sorting and listing
 * [bbc] Improve playlist extraction
 + [pokemon] Add extractor (#10093)
 + [condenast] Add fallback scenario for video info extraction
 version 2016.08.06
 Core
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -182,6 +182,7 @@
 - **DigitallySpeaking**
 - **Digiteka**
 - **Discovery**
 - **DiscoveryGo**
 - **Dotsub**
 - **DouyuTV**: 斗鱼
 - **DPlay**
@@ -518,6 +519,7 @@
 - **plus.google**: Google Plus
 - **pluzz.francetv.fr**
 - **podomatic**
 - **Pokemon**
 - **PolskieRadio**
 - **PornHd**
 - **PornHub**: PornHub and Thumbzilla
--- a/test/test_utils.py
+++ b/test/test_utils.py
@@ -42,6 +42,7 @@ from youtube_dl.utils import (
    ohdave_rsa_encrypt,
    OnDemandPagedList,
    orderedSet,
    parse_age_limit,
    parse_duration,
    parse_filesize,
    parse_count,
@@ -432,6 +433,20 @@ class TestUtil(unittest.TestCase):
            url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
            'trailer.mp4')
    def test_parse_age_limit(self):
        self.assertEqual(parse_age_limit(None), None)
        self.assertEqual(parse_age_limit(False), None)
        self.assertEqual(parse_age_limit('invalid'), None)
        self.assertEqual(parse_age_limit(0), 0)
        self.assertEqual(parse_age_limit(18), 18)
        self.assertEqual(parse_age_limit(21), 21)
        self.assertEqual(parse_age_limit(22), None)
        self.assertEqual(parse_age_limit('18'), 18)
        self.assertEqual(parse_age_limit('18+'), 18)
        self.assertEqual(parse_age_limit('PG-13'), 13)
        self.assertEqual(parse_age_limit('TV-14'), 14)
        self.assertEqual(parse_age_limit('TV-MA'), 17)
    def test_parse_duration(self):
        self.assertEqual(parse_duration(None), None)
        self.assertEqual(parse_duration(False), None)
--- a/youtube_dl/extractor/bbc.py
+++ b/youtube_dl/extractor/bbc.py
@@ -5,11 +5,13 @@ import re
 from .common import InfoExtractor
 from ..utils import (
    dict_get,
    ExtractorError,
    float_or_none,
    int_or_none,
    parse_duration,
    parse_iso8601,
    try_get,
    unescapeHTML,
 )
 from ..compat import (
@@ -229,51 +231,6 @@ class BBCCoUkIE(InfoExtractor):
        asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
        return [ref.get('href') for ref in asx.findall('./Entry/ref')]
    def _extract_connection(self, connection, programme_id):
        formats = []
        kind = connection.get('kind')
        protocol = connection.get('protocol')
        supplier = connection.get('supplier')
        if protocol == 'http':
            href = connection.get('href')
            transfer_format = connection.get('transferFormat')
            # ASX playlist
            if supplier == 'asx':
                for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
                    formats.append({
                        'url': ref,
                        'format_id': 'ref%s_%s' % (i, supplier),
                    })
            # Skip DASH until supported
            elif transfer_format == 'dash':
                pass
            elif transfer_format == 'hls':
                formats.extend(self._extract_m3u8_formats(
                    href, programme_id, ext='mp4', entry_protocol='m3u8_native',
                    m3u8_id=supplier, fatal=False))
            # Direct link
            else:
                formats.append({
                    'url': href,
                    'format_id': supplier or kind or protocol,
                })
        elif protocol == 'rtmp':
            application = connection.get('application', 'ondemand')
            auth_string = connection.get('authString')
            identifier = connection.get('identifier')
            server = connection.get('server')
            formats.append({
                'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
                'play_path': identifier,
                'app': '%s?%s' % (application, auth_string),
                'page_url': 'http://www.bbc.co.uk',
                'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
                'rtmp_live': False,
                'ext': 'flv',
                'format_id': supplier,
            })
        return formats
    def _extract_items(self, playlist):
        return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
@@ -294,46 +251,6 @@ class BBCCoUkIE(InfoExtractor):
    def _extract_connections(self, media):
        return self._findall_ns(media, './{%s}connection')
    def _extract_video(self, media, programme_id):
        formats = []
        vbr = int_or_none(media.get('bitrate'))
        vcodec = media.get('encoding')
        service = media.get('service')
        width = int_or_none(media.get('width'))
        height = int_or_none(media.get('height'))
        file_size = int_or_none(media.get('media_file_size'))
        for connection in self._extract_connections(media):
            conn_formats = self._extract_connection(connection, programme_id)
            for format in conn_formats:
                format.update({
                    'width': width,
                    'height': height,
                    'vbr': vbr,
                    'vcodec': vcodec,
                    'filesize': file_size,
                })
                if service:
                    format['format_id'] = '%s_%s' % (service, format['format_id'])
            formats.extend(conn_formats)
        return formats
    def _extract_audio(self, media, programme_id):
        formats = []
        abr = int_or_none(media.get('bitrate'))
        acodec = media.get('encoding')
        service = media.get('service')
        for connection in self._extract_connections(media):
            conn_formats = self._extract_connection(connection, programme_id)
            for format in conn_formats:
                format.update({
                    'format_id': '%s_%s' % (service, format['format_id']),
                    'abr': abr,
                    'acodec': acodec,
                    'vcodec': 'none',
                })
            formats.extend(conn_formats)
        return formats
    def _get_subtitles(self, media, programme_id):
        subtitles = {}
        for connection in self._extract_connections(media):
@@ -379,13 +296,87 @@ class BBCCoUkIE(InfoExtractor):
    def _process_media_selector(self, media_selection, programme_id):
        formats = []
        subtitles = None
        urls = []
        for media in self._extract_medias(media_selection):
            kind = media.get('kind')
-            if kind == 'audio':
-                formats.extend(self._extract_audio(media, programme_id))
-            elif kind == 'video':
-                formats.extend(self._extract_video(media, programme_id))
+            if kind in ('video', 'audio'):
+                bitrate = int_or_none(media.get('bitrate'))
+                encoding = media.get('encoding')
+                service = media.get('service')
                width = int_or_none(media.get('width'))
                height = int_or_none(media.get('height'))
                file_size = int_or_none(media.get('media_file_size'))
                for connection in self._extract_connections(media):
                    href = connection.get('href')
                    if href in urls:
                        continue
                    if href:
                        urls.append(href)
                    conn_kind = connection.get('kind')
                    protocol = connection.get('protocol')
                    supplier = connection.get('supplier')
                    transfer_format = connection.get('transferFormat')
                    format_id = supplier or conn_kind or protocol
                    if service:
                        format_id = '%s_%s' % (service, format_id)
                    # ASX playlist
                    if supplier == 'asx':
                        for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
                            formats.append({
                                'url': ref,
                                'format_id': 'ref%s_%s' % (i, format_id),
                            })
                    elif transfer_format == 'dash':
                        formats.extend(self._extract_mpd_formats(
                            href, programme_id, mpd_id=format_id, fatal=False))
                    elif transfer_format == 'hls':
                        formats.extend(self._extract_m3u8_formats(
                            href, programme_id, ext='mp4', entry_protocol='m3u8_native',
                            m3u8_id=format_id, fatal=False))
                    elif transfer_format == 'hds':
                        formats.extend(self._extract_f4m_formats(
                            href, programme_id, f4m_id=format_id, fatal=False))
                    else:
                        if not service and not supplier and bitrate:
                            format_id += '-%d' % bitrate
                        fmt = {
                            'format_id': format_id,
                            'filesize': file_size,
                        }
                        if kind == 'video':
                            fmt.update({
                                'width': width,
                                'height': height,
                                'vbr': bitrate,
                                'vcodec': encoding,
                            })
                        else:
                            fmt.update({
                                'abr': bitrate,
                                'acodec': encoding,
                                'vcodec': 'none',
                            })
                        if protocol == 'http':
                            # Direct link
                            fmt.update({
                                'url': href,
                            })
                        elif protocol == 'rtmp':
                            application = connection.get('application', 'ondemand')
                            auth_string = connection.get('authString')
                            identifier = connection.get('identifier')
                            server = connection.get('server')
                            fmt.update({
                                'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
                                'play_path': identifier,
                                'app': '%s?%s' % (application, auth_string),
                                'page_url': 'http://www.bbc.co.uk',
                                'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
                                'rtmp_live': False,
                                'ext': 'flv',
                            })
                        formats.append(fmt)
            elif kind == 'captions':
                subtitles = self.extract_subtitles(media, programme_id)
        return formats, subtitles
@@ -589,7 +580,7 @@ class BBCIE(BBCCoUkIE):
        'info_dict': {
            'id': '150615_telabyad_kentin_cogu',
            'ext': 'mp4',
-            'title': "Tel Abyad'da IŞİD bayrağı indirildi YPG bayrağı çekildi",
+            'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
            'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
            'timestamp': 1434397334,
            'upload_date': '20150615',
@@ -654,6 +645,23 @@ class BBCIE(BBCCoUkIE):
            # rtmp download
            'skip_download': True,
        }
    }, {
        # single video embedded with Morph
        'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
        'info_dict': {
            'id': 'p041vhd0',
            'ext': 'mp4',
            'title': "Nigeria v Japan - Men's First Round",
            'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
            'duration': 7980,
            'uploader': 'BBC Sport',
            'uploader_id': 'bbc_sport',
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
        'skip': 'Georestricted to UK',
    }, {
        # single video with playlist.sxml URL in playlist param
        'url': 'http://www.bbc.com/sport/0/football/33653409',
@@ -820,13 +828,19 @@ class BBCIE(BBCCoUkIE):
                        # http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
                        playlist = data_playable.get('otherSettings', {}).get('playlist', {})
                        if playlist:
-                            for key in ('progressiveDownload', 'streaming'):
+                            entry = None
+                            for key in ('streaming', 'progressiveDownload'):
                                playlist_url = playlist.get('%sUrl' % key)
                                if not playlist_url:
                                    continue
                                try:
-                                    entries.append(self._extract_from_playlist_sxml(
-                                        playlist_url, playlist_id, timestamp))
+                                    info = self._extract_from_playlist_sxml(
+                                        playlist_url, playlist_id, timestamp)
                                    if not entry:
                                        entry = info
                                    else:
                                        entry['title'] = info['title']
                                        entry['formats'].extend(info['formats'])
                                except Exception as e:
                                    # Some playlist URL may fail with 500, at the same time
                                    # the other one may work fine (e.g.
@@ -834,6 +848,9 @@ class BBCIE(BBCCoUkIE):
                                    if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
                                        continue
                                    raise
                            if entry:
                                self._sort_formats(entry['formats'])
                                entries.append(entry)
        if entries:
            return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@@ -866,6 +883,50 @@ class BBCIE(BBCCoUkIE):
                'subtitles': subtitles,
            }
        # Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
        # Several setPayload calls may be present, but the video
        # always seems to be related to the first one
        morph_payload = self._parse_json(
            self._search_regex(
                r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
                webpage, 'morph payload', default='{}'),
            playlist_id, fatal=False)
        if morph_payload:
            components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
            for component in components:
                if not isinstance(component, dict):
                    continue
                lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
                if not lead_media:
                    continue
                identifiers = lead_media.get('identifiers')
                if not identifiers or not isinstance(identifiers, dict):
                    continue
                programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
                if not programme_id:
                    continue
                title = lead_media.get('title') or self._og_search_title(webpage)
                formats, subtitles = self._download_media_selector(programme_id)
                self._sort_formats(formats)
                description = lead_media.get('summary')
                uploader = lead_media.get('masterBrand')
                uploader_id = lead_media.get('mid')
                duration = None
                duration_d = lead_media.get('duration')
                if isinstance(duration_d, dict):
                    duration = parse_duration(dict_get(
                        duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
                return {
                    'id': programme_id,
                    'title': title,
                    'description': description,
                    'duration': duration,
                    'uploader': uploader,
                    'uploader_id': uploader_id,
                    'formats': formats,
                    'subtitles': subtitles,
                }
        def extract_all(pattern):
            return list(filter(None, map(
                lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -883,7 +944,7 @@ class BBCIE(BBCCoUkIE):
            r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
        if entries:
            return self.playlist_result(
-                [self.url_result(entry, 'BBCCoUk') for entry in entries],
+                [self.url_result(entry_, 'BBCCoUk') for entry_ in entries],
                playlist_id, playlist_title, playlist_description)
        # Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -846,7 +846,7 @@ class InfoExtractor(object):
                    part_of_season = e.get('partOfSeason')
                    if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
                        info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
-                    part_of_series = e.get('partOfSeries')
+                    part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
                    if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
                        info['series'] = unescapeHTML(part_of_series.get('name'))
                elif item_type == 'Article':
@@ -1140,7 +1140,7 @@ class InfoExtractor(object):
            'url': m3u8_url,
            'ext': ext,
            'protocol': 'm3u8',
-            'preference': preference - 1 if preference else -1,
+            'preference': preference - 100 if preference else -100,
            'resolution': 'multiple',
            'format_note': 'Quality selection URL',
        }
--- a/youtube_dl/extractor/condenast.py
+++ b/youtube_dl/extractor/condenast.py
@@ -113,11 +113,19 @@ class CondeNastIE(InfoExtractor):
                'target': params['id'],
            })
        video_id = query['videoId']
        video_info = None
        info_page = self._download_webpage(
            'http://player.cnevids.com/player/video.js',
-            video_id, 'Downloading video info', query=query)
-        video_info = self._parse_json(self._search_regex(
-            r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
+            video_id, 'Downloading video info', query=query, fatal=False)
+        if info_page:
+            video_info = self._parse_json(self._search_regex(
+                r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
        else:
            info_page = self._download_webpage(
                'http://player.cnevids.com/player/loader.js',
                video_id, 'Downloading loader info', query=query)
            video_info = self._parse_json(self._search_regex(
                r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
        title = video_info['title']
        formats = []
--- a/youtube_dl/extractor/discoverygo.py
+++ b/youtube_dl/extractor/discoverygo.py
@@ -0,0 +1,98 @@
 from __future__ import unicode_literals
 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
    extract_attributes,
    int_or_none,
    parse_age_limit,
    unescapeHTML,
 )
 class DiscoveryGoIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
    _TEST = {
        'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
        'info_dict': {
            'id': '57a33c536b66d1cd0345eeb1',
            'ext': 'mp4',
            'title': 'Kiss First, Ask Questions Later!',
            'description': 'md5:fe923ba34050eae468bffae10831cb22',
            'duration': 2579,
            'series': 'Love at First Kiss',
            'season_number': 1,
            'episode_number': 1,
            'age_limit': 14,
        },
    }
    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        container = extract_attributes(
            self._search_regex(
                r'(<div[^>]+class=["\']video-player-container[^>]+>)',
                webpage, 'video container'))
        video = self._parse_json(
            unescapeHTML(container.get('data-video') or container.get('data-json')),
            display_id)
        title = video['name']
        stream = video['stream']
        STREAM_URL_SUFFIX = 'streamUrl'
        formats = []
        for stream_kind in ('', 'hds'):
            suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX
            stream_url = stream.get('%s%s' % (stream_kind, suffix))
            if not stream_url:
                continue
            if stream_kind == '':
                formats.extend(self._extract_m3u8_formats(
                    stream_url, display_id, 'mp4', entry_protocol='m3u8_native',
                    m3u8_id='hls', fatal=False))
            elif stream_kind == 'hds':
                formats.extend(self._extract_f4m_formats(
                    stream_url, display_id, f4m_id=stream_kind, fatal=False))
        self._sort_formats(formats)
        video_id = video.get('id') or display_id
        description = video.get('description', {}).get('detailed')
        duration = int_or_none(video.get('duration'))
        series = video.get('show', {}).get('name')
        season_number = int_or_none(video.get('season', {}).get('number'))
        episode_number = int_or_none(video.get('episodeNumber'))
        tags = video.get('tags')
        age_limit = parse_age_limit(video.get('parental', {}).get('rating'))
        subtitles = {}
        captions = stream.get('captions')
        if isinstance(captions, list):
            for caption in captions:
                subtitle_url = caption.get('fileUrl')
                if (not subtitle_url or not isinstance(subtitle_url, compat_str) or
                        not subtitle_url.startswith('http')):
                    continue
                lang = caption.get('fileLang', 'en')
                subtitles.setdefault(lang, []).append({'url': subtitle_url})
        return {
            'id': video_id,
            'display_id': display_id,
            'title': title,
            'description': description,
            'duration': duration,
            'series': series,
            'season_number': season_number,
            'episode_number': episode_number,
            'tags': tags,
            'age_limit': age_limit,
            'formats': formats,
            'subtitles': subtitles,
        }
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -221,6 +221,7 @@ from .dvtv import DVTVIE
 from .dumpert import DumpertIE
 from .defense import DefenseGouvFrIE
 from .discovery import DiscoveryIE
 from .discoverygo import DiscoveryGoIE
 from .dispeak import DigitallySpeakingIE
 from .dropbox import DropboxIE
 from .dw import (
@@ -636,6 +637,7 @@ from .pluralsight import (
    PluralsightCourseIE,
 )
 from .podomatic import PodomaticIE
 from .pokemon import PokemonIE
 from .polskieradio import PolskieRadioIE
 from .porn91 import Porn91IE
 from .pornhd import PornHdIE
--- a/youtube_dl/extractor/flipagram.py
+++ b/youtube_dl/extractor/flipagram.py
@@ -48,7 +48,7 @@ class FlipagramIE(InfoExtractor):
        flipagram = video_data['flipagram']
        video = flipagram['video']
-        json_ld = self._search_json_ld(webpage, video_id, default=False)
+        json_ld = self._search_json_ld(webpage, video_id, fatal=False)
        title = json_ld.get('title') or flipagram['captionText']
        description = json_ld.get('description') or flipagram.get('captionText')
--- a/youtube_dl/extractor/fox.py
+++ b/youtube_dl/extractor/fox.py
@@ -2,7 +2,10 @@
 from __future__ import unicode_literals
 from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+)
 class FOXIE(InfoExtractor):
@@ -29,11 +32,12 @@ class FOXIE(InfoExtractor):
        release_url = self._parse_json(self._search_regex(
            r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
-            video_id)['release_url'] + '&switch=http'
+            video_id)['release_url']
        return {
            '_type': 'url_transparent',
            'ie_key': 'ThePlatform',
-            'url': smuggle_url(release_url, {'force_smil_url': True}),
+            'url': smuggle_url(update_url_query(
+                release_url, {'switch': 'http'}), {'force_smil_url': True}),
            'id': video_id,
        }
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -2241,7 +2241,7 @@ class GenericIE(InfoExtractor):
        # Looking for http://schema.org/VideoObject
        json_ld = self._search_json_ld(
-            webpage, video_id, default=None, expected_type='VideoObject')
+            webpage, video_id, fatal=False, expected_type='VideoObject')
        if json_ld and json_ld.get('url'):
            info_dict.update({
                'title': video_title or info_dict['title'],
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -1,15 +1,14 @@
 # coding: utf-8
-from __future__ import unicode_literals
+from __future__ import unicode_literals, division
-import re
+import math
 from .common import InfoExtractor
 from ..compat import compat_chr
 from ..utils import (
    decode_png,
    determine_ext,
    encode_base_n,
    ExtractorError,
    mimetype2ext,
 )
@@ -41,60 +40,6 @@ class OpenloadIE(InfoExtractor):
        'only_matching': True,
    }]
    @staticmethod
    def openload_level2_debase(m):
        radix, num = int(m.group(1)) + 27, int(m.group(2))
        return '"' + encode_base_n(num, radix) + '"'
    @classmethod
    def openload_level2(cls, txt):
        # The function name is ǃ \u01c3
        # Using escaped unicode literals does not work in Python 3.2
        return re.sub(r'ǃ\((\d+),(\d+)\)', cls.openload_level2_debase, txt, re.UNICODE).replace('"+"', '')
    # Openload uses a variant of aadecode
    # openload_decode and related functions are originally written by
    # vitas@matfyz.cz and released with public domain
    # See https://github.com/rg3/youtube-dl/issues/8489
    @classmethod
    def openload_decode(cls, txt):
        symbol_table = [
            ('_', '(ﾟДﾟ) [ﾟΘﾟ]'),
            ('a', '(ﾟДﾟ) [ﾟωﾟﾉ]'),
            ('b', '(ﾟДﾟ) [ﾟΘﾟﾉ]'),
            ('c', '(ﾟДﾟ) [\'c\']'),
            ('d', '(ﾟДﾟ) [ﾟｰﾟﾉ]'),
            ('e', '(ﾟДﾟ) [ﾟДﾟﾉ]'),
            ('f', '(ﾟДﾟ) [1]'),
            ('o', '(ﾟДﾟ) [\'o\']'),
            ('u', '(oﾟｰﾟo)'),
            ('c', '(ﾟДﾟ) [\'c\']'),
            ('7', '((ﾟｰﾟ) + (o^_^o))'),
            ('6', '((o^_^o) +(o^_^o) +(c^_^o))'),
            ('5', '((ﾟｰﾟ) + (ﾟΘﾟ))'),
            ('4', '(-~3)'),
            ('3', '(-~-~1)'),
            ('2', '(-~1)'),
            ('1', '(-~0)'),
            ('0', '((c^_^o)-(c^_^o))'),
        ]
        delim = '(ﾟДﾟ)[ﾟεﾟ]+'
        ret = ''
        for aachar in txt.split(delim):
            for val, pat in symbol_table:
                aachar = aachar.replace(pat, val)
            aachar = aachar.replace('+ ', '')
            m = re.match(r'^\d+', aachar)
            if m:
                ret += compat_chr(int(m.group(0), 8))
            else:
                m = re.match(r'^u([\da-f]+)', aachar)
                if m:
                    ret += compat_chr(int(m.group(1), 16))
        return cls.openload_level2(ret)
    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
@@ -102,29 +47,77 @@ class OpenloadIE(InfoExtractor):
        if 'File not found' in webpage:
            raise ExtractorError('File not found', expected=True)
-        code = self._search_regex(
+        # The following extraction logic is proposed by @Belderak and @gdkchan
-            r'</video>\s*</div>\s*<script[^>]+>[^>]+</script>\s*<script[^>]+>([^<]+)</script>',
+        # and declared to be used freely in youtube-dl
-            webpage, 'JS code')
+        # See https://github.com/rg3/youtube-dl/issues/9706
-        decoded = self.openload_decode(code)
+        numbers_js = self._download_webpage(
            'https://openload.co/assets/js/obfuscator/n.js', video_id,
            note='Downloading signature numbers')
        signums = self._search_regex(
            r'window\.signatureNumbers\s*=\s*[\'"](?P<data>[a-z]+)[\'"]',
            numbers_js, 'signature numbers', group='data')
-        video_url = self._search_regex(
-            r'return\s+"(https?://[^"]+)"', decoded, 'video URL')
+        linkimg_uri = self._search_regex(
+            r'<img[^>]+id="linkimg"[^>]+src="([^"]+)"', webpage, 'link image')
        linkimg = self._request_webpage(
            linkimg_uri, video_id, note=False).read()
        width, height, pixels = decode_png(linkimg)
        output = ''
        for y in range(height):
            for x in range(width):
                r, g, b = pixels[y][3 * x:3 * x + 3]
                if r == 0 and g == 0 and b == 0:
                    break
                else:
                    output += compat_chr(r)
                    output += compat_chr(g)
                    output += compat_chr(b)
        img_str_length = len(output) // 200
        img_str = [[0 for x in range(img_str_length)] for y in range(10)]
        sig_str_length = len(signums) // 260
        sig_str = [[0 for x in range(sig_str_length)] for y in range(10)]
        for i in range(10):
            for j in range(img_str_length):
                begin = i * img_str_length * 20 + j * 20
                img_str[i][j] = output[begin:begin + 20]
            for j in range(sig_str_length):
                begin = i * sig_str_length * 26 + j * 26
                sig_str[i][j] = signums[begin:begin + 26]
        parts = []
        # TODO: find better names for str_, chr_ and sum_
        str_ = ''
        for i in [2, 3, 5, 7]:
            str_ = ''
            sum_ = float(99)
            for j in range(len(sig_str[i])):
                for chr_idx in range(len(img_str[i][j])):
                    if sum_ > float(122):
                        sum_ = float(98)
                    chr_ = compat_chr(int(math.floor(sum_)))
                    if sig_str[i][j][chr_idx] == chr_ and j >= len(str_):
                        sum_ += float(2.5)
                        str_ += img_str[i][j][chr_idx]
            parts.append(str_.replace(',', ''))
        video_url = 'https://openload.co/stream/%s~%s~%s~%s' % (parts[3], parts[1], parts[2], parts[0])
        title = self._og_search_title(webpage, default=None) or self._search_regex(
            r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
            'title', default=None) or self._html_search_meta(
            'description', webpage, 'title', fatal=True)
-        ext = mimetype2ext(self._search_regex(
-            r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded,
-            'mimetype', default=None, group='mimetype')) or determine_ext(
-            video_url, 'mp4')
        return {
            'id': video_id,
            'title': title,
-            'ext': ext,
            'thumbnail': self._og_search_thumbnail(webpage, default=None),
            'url': video_url,
            # Seems all videos have extensions in their titles
            'ext': determine_ext(title),
        }
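
The pixel loop in the openload hunk above can be tried in isolation. The following is a minimal sketch, assuming rows shaped the way decode_png returns them (a flat list of channel values per row); the helper name pixels_to_text and the sample rows are hypothetical, and chr stands in for compat_chr on Python 3.

```python
def pixels_to_text(width, height, pixels):
    """Concatenate RGB channel values as characters, row by row.

    `pixels` is a list of rows; each row is a flat list of channel
    values (r, g, b, r, g, b, ...).
    """
    output = ''
    for y in range(height):
        for x in range(width):
            r, g, b = pixels[y][3 * x:3 * x + 3]
            if r == 0 and g == 0 and b == 0:
                break  # a black pixel ends the current row only
            output += chr(r) + chr(g) + chr(b)
    return output


# Hypothetical 2x2 image: first row is "abc" then a black pixel,
# second row is "def" followed by "ghi".
rows = [
    [97, 98, 99, 0, 0, 0],
    [100, 101, 102, 103, 104, 105],
]
text = pixels_to_text(2, 2, rows)
print(text)
```

Note that, as in the hunk, the break leaves only the inner loop, so later rows are still scanned after a black pixel.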
--- a/youtube_dl/extractor/pokemon.py
+++ b/youtube_dl/extractor/pokemon.py
@@ -0,0 +1,58 @@
 # coding: utf-8
 from __future__ import unicode_literals
 import re
 from .common import InfoExtractor
 from ..utils import (
    extract_attributes,
    int_or_none,
 )
 class PokemonIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'
    _TESTS = [{
        'url': 'http://www.pokemon.com/us/pokemon-episodes/19_01-from-a-to-z/?play=true',
        'md5': '9fb209ae3a569aac25de0f5afc4ee08f',
        'info_dict': {
            'id': 'd0436c00c3ce4071ac6cee8130ac54a1',
            'ext': 'mp4',
            'title': 'From A to Z!',
            'description': 'Bonnie makes a new friend, Ash runs into an old friend, and a terrifying premonition begins to unfold!',
            'timestamp': 1460478136,
            'upload_date': '20160412',
        },
        'add_id': ['LimelightMedia']
    }, {
        'url': 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2',
        'only_matching': True,
    }, {
        'url': 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/',
        'only_matching': True,
    }, {
        'url': 'http://www.pokemon.com/de/pokemon-folgen/01_20-bye-bye-smettbo/',
        'only_matching': True,
    }]
    def _real_extract(self, url):
        video_id, display_id = re.match(self._VALID_URL, url).groups()
        webpage = self._download_webpage(url, video_id or display_id)
        video_data = extract_attributes(self._search_regex(
            r'(<[^>]+data-video-id="%s"[^>]*>)' % (video_id if video_id else '[a-z0-9]{32}'),
            webpage, 'video data element'))
        video_id = video_data['data-video-id']
        title = video_data['data-video-title']
        return {
            '_type': 'url_transparent',
            'id': video_id,
            'url': 'limelight:media:%s' % video_id,
            'title': title,
            'description': video_data.get('data-video-summary'),
            'thumbnail': video_data.get('data-video-poster'),
            'series': 'Pokémon',
            'season_number': int_or_none(video_data.get('data-video-season')),
            'episode': title,
            'episode_number': int_or_none(video_data.get('data-video-episode')),
            'ie_key': 'LimelightMedia',
        }
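
As a quick check of how the _VALID_URL above splits its two capture groups, here is a small sketch using only the standard re module. The pattern and URLs are copied from the extractor and its tests; the helper name split_ids is hypothetical.

```python
import re

VALID_URL = (
    r'https?://(?:www\.)?pokemon\.com/[a-z]{2}'
    r'(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'
)


def split_ids(url):
    """Return (video_id, display_id); exactly one of them is None."""
    return re.match(VALID_URL, url).groups()


# "play=true" is not a 32-character id, so the second alternative matches.
by_slug = split_ids(
    'http://www.pokemon.com/us/pokemon-episodes/19_01-from-a-to-z/?play=true')
# A 32-character play= value is captured as the video id directly.
by_id = split_ids(
    'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2')
print(by_slug, by_id)
```

This is why _real_extract passes `video_id or display_id` to _download_webpage and then reads the real id from the data-video-id attribute.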
--- a/youtube_dl/extractor/tnaflix.py
+++ b/youtube_dl/extractor/tnaflix.py
@@ -118,8 +118,12 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
            xpath_text(cfg_xml, './startThumb', 'thumbnail'), 'http:')
        thumbnails = self._extract_thumbnails(cfg_xml)
-        title = self._html_search_regex(
+        title = None
-            self._TITLE_REGEX, webpage, 'title') if self._TITLE_REGEX else self._og_search_title(webpage)
+        if self._TITLE_REGEX:
            title = self._html_search_regex(
                self._TITLE_REGEX, webpage, 'title', default=None)
        if not title:
            title = self._og_search_title(webpage)
        age_limit = self._rta_search(webpage) or 18
@@ -189,9 +193,9 @@ class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
 class TNAFlixIE(TNAFlixNetworkBaseIE):
    _VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
-    _TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
+    _TITLE_REGEX = r'<title>(.+?) - (?:TNAFlix Porn Videos|TNAFlix\.com)</title>'
-    _DESCRIPTION_REGEX = r'<meta[^>]+name="description"[^>]+content="([^"]+)"'
+    _DESCRIPTION_REGEX = r'(?s)>Description:</[^>]+>(.+?)<'
-    _UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h1>(.+?)</h1>'
+    _UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h\d+>(.+?)<'
    _CATEGORIES_REGEX = r'(?s)<span[^>]*>Categories:</span>(.+?)</div>'
    _TESTS = [{
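
The title change in this file combines a widened regex with a fallback. A minimal sketch of that behaviour, assuming plain re.search in place of _html_search_regex and a precomputed string in place of _og_search_title; extract_title and the sample pages are hypothetical.

```python
import re

TITLE_REGEX = r'<title>(.+?) - (?:TNAFlix Porn Videos|TNAFlix\.com)</title>'


def extract_title(webpage, og_title):
    """Try the site-specific regex first, then fall back to og_title."""
    title = None
    m = re.search(TITLE_REGEX, webpage)
    if m:
        title = m.group(1).strip()
    if not title:
        title = og_title
    return title


old_page = '<title>Sample Clip - TNAFlix Porn Videos</title>'
new_page = '<title>Sample Clip - TNAFlix.com</title>'
other_page = '<title>Sample Clip</title>'
print(extract_title(old_page, 'OG title'))
print(extract_title(new_page, 'OG title'))
print(extract_title(other_page, 'OG title'))
```

With default=None on the regex search, a page whose title suffix changes again no longer aborts extraction; it falls through to the Open Graph title.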
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -47,6 +47,7 @@ from .compat import (
    compat_socket_create_connection,
    compat_str,
    compat_struct_pack,
    compat_struct_unpack,
    compat_urllib_error,
    compat_urllib_parse,
    compat_urllib_parse_urlencode,
@@ -1983,11 +1984,27 @@ US_RATINGS = {
 }
 TV_PARENTAL_GUIDELINES = {
    'TV-Y': 0,
    'TV-Y7': 7,
    'TV-G': 0,
    'TV-PG': 0,
    'TV-14': 14,
    'TV-MA': 17,
 }
 def parse_age_limit(s):
-    if s is None:
+    if type(s) == int:
        return s if 0 <= s <= 21 else None
    if not isinstance(s, compat_basestring):
        return None
    m = re.match(r'^(?P<age>\d{1,2})\+?$', s)
-    return int(m.group('age')) if m else US_RATINGS.get(s)
+    if m:
        return int(m.group('age'))
    if s in US_RATINGS:
        return US_RATINGS[s]
    return TV_PARENTAL_GUIDELINES.get(s)
 def strip_jsonp(code):
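
The new parse_age_limit flow can be exercised standalone. This is a sketch under stated assumptions: str replaces compat_basestring (Python 3 only), and the US_RATINGS lookup is left out because its contents are not shown in this hunk; the name parse_age_limit_sketch is hypothetical.

```python
import re

TV_PARENTAL_GUIDELINES = {
    'TV-Y': 0,
    'TV-Y7': 7,
    'TV-G': 0,
    'TV-PG': 0,
    'TV-14': 14,
    'TV-MA': 17,
}


def parse_age_limit_sketch(s):
    # Exact type check: bool is a subclass of int but is rejected here.
    if type(s) == int:
        return s if 0 <= s <= 21 else None
    if not isinstance(s, str):
        return None
    m = re.match(r'^(?P<age>\d{1,2})\+?$', s)
    if m:
        return int(m.group('age'))
    # The real function consults US_RATINGS before this table.
    return TV_PARENTAL_GUIDELINES.get(s)


for value in (18, 30, '16+', 'TV-14', None, True):
    print(repr(value), '->', parse_age_limit_sketch(value))
```

Integers outside 0..21 and non-string inputs such as None now yield None instead of raising inside re.match.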
@@ -2969,3 +2986,110 @@ def parse_m3u8_attributes(attrib):
 def urshift(val, n):
    return val >> n if val >= 0 else (val + 0x100000000) >> n
 # Based on png2str() written by @gdkchan and improved by @yokrysty
 # Originally posted at https://github.com/rg3/youtube-dl/issues/9706
 def decode_png(png_data):
    # Reference: https://www.w3.org/TR/PNG/
    header = png_data[8:]
    if png_data[:8] != b'\x89PNG\x0d\x0a\x1a\x0a' or header[4:8] != b'IHDR':
        raise IOError('Not a valid PNG file.')
    int_map = {1: '>B', 2: '>H', 4: '>I'}
    unpack_integer = lambda x: compat_struct_unpack(int_map[len(x)], x)[0]
    chunks = []
    while header:
        length = unpack_integer(header[:4])
        header = header[4:]
        chunk_type = header[:4]
        header = header[4:]
        chunk_data = header[:length]
        header = header[length:]
        header = header[4:]  # Skip CRC
        chunks.append({
            'type': chunk_type,
            'length': length,
            'data': chunk_data
        })
    ihdr = chunks[0]['data']
    width = unpack_integer(ihdr[:4])
    height = unpack_integer(ihdr[4:8])
    idat = b''
    for chunk in chunks:
        if chunk['type'] == b'IDAT':
            idat += chunk['data']
    if not idat:
        raise IOError('Unable to read PNG data.')
    decompressed_data = bytearray(zlib.decompress(idat))
    stride = width * 3
    pixels = []
    def _get_pixel(idx):
        x = idx % stride
        y = idx // stride
        return pixels[y][x]
    for y in range(height):
        basePos = y * (1 + stride)
        filter_type = decompressed_data[basePos]
        current_row = []
        pixels.append(current_row)
        for x in range(stride):
            color = decompressed_data[1 + basePos + x]
            basex = y * stride + x
            left = 0
            up = 0
            if x > 2:
                left = _get_pixel(basex - 3)
            if y > 0:
                up = _get_pixel(basex - stride)
            if filter_type == 1:  # Sub
                color = (color + left) & 0xff
            elif filter_type == 2:  # Up
                color = (color + up) & 0xff
            elif filter_type == 3:  # Average
                color = (color + ((left + up) >> 1)) & 0xff
            elif filter_type == 4:  # Paeth
                a = left
                b = up
                c = 0
                if x > 2 and y > 0:
                    c = _get_pixel(basex - stride - 3)
                p = a + b - c
                pa = abs(p - a)
                pb = abs(p - b)
                pc = abs(p - c)
                if pa <= pb and pa <= pc:
                    color = (color + a) & 0xff
                elif pb <= pc:
                    color = (color + b) & 0xff
                else:
                    color = (color + c) & 0xff
            current_row.append(color)
    return width, height, pixels
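
To see the byte layout that decode_png walks, the sketch below builds a tiny PNG in memory and reads it back. It is not the function from the diff: it uses struct directly instead of compat_struct_unpack, handles only filter type 0, and the helpers make_chunk and iter_chunks are hypothetical. Like decode_png, it assumes 8-bit RGB without alpha (stride = width * 3).

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\x0d\x0a\x1a\x0a'


def make_chunk(chunk_type, data):
    """Serialize one chunk: length, type, data, CRC."""
    crc = zlib.crc32(chunk_type + data) & 0xffffffff
    return (struct.pack('>I', len(data)) + chunk_type + data
            + struct.pack('>I', crc))


def iter_chunks(png_data):
    """Yield (type, data) pairs, skipping each CRC as decode_png does."""
    if png_data[:8] != PNG_SIGNATURE:
        raise IOError('Not a valid PNG file.')
    rest = png_data[8:]
    while rest:
        (length,) = struct.unpack('>I', rest[:4])
        yield rest[4:8], rest[8:8 + length]
        rest = rest[8 + length + 4:]


# 2x1 image, bit depth 8, colour type 2 (RGB), no interlacing.
ihdr = struct.pack('>IIBBBBB', 2, 1, 8, 2, 0, 0, 0)
# One scanline: filter type byte 0, then two RGB pixels.
scanline = bytes([0, 97, 98, 99, 100, 101, 102])
png = (PNG_SIGNATURE
       + make_chunk(b'IHDR', ihdr)
       + make_chunk(b'IDAT', zlib.compress(scanline))
       + make_chunk(b'IEND', b''))

chunks = list(iter_chunks(png))
width, height = struct.unpack('>II', chunks[0][1][:8])
idat = b''.join(data for ctype, data in chunks if ctype == b'IDAT')
raw = bytearray(zlib.decompress(idat))
stride = width * 3
# Each row is one filter byte followed by `stride` channel values.
rows = [list(raw[y * (1 + stride) + 1:(y + 1) * (1 + stride)])
        for y in range(height)]
print(width, height, rows)
```

For filter types 1 to 4, decode_png additionally adds the left, upper, or Paeth-predicted neighbour to each channel value, which this sketch leaves out.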
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
-__version__ = '2016.08.06'
+__version__ = '2016.08.07'
Author	SHA1	Message	Date
Sergey M․	4a01befb34	release 2016.08.07	2016-08-07 21:12:41 +07:00
Sergey M․	845dfcdc40	[ChangeLog] Actualize	2016-08-07 21:10:48 +07:00
Sergey M․	d92cb46305	[discoverygo] Add extractor (Closes #10245 )	2016-08-07 20:57:05 +07:00
Sergey M․	a8795327ca	[utils] Add support TV Parental Guidelines ratings in parse_age_limit	2016-08-07 20:45:18 +07:00
Sergey M․	d34995a9e3	[flipagram] Make _search_json_ld non fatal	2016-08-07 19:06:55 +07:00
Sergey M․	958849275f	[extractor/generic] Make _search_json_ld non fatal	2016-08-07 19:04:22 +07:00
Sergey M․	998f094452	[bbc] Remove proxy from test	2016-08-07 18:13:05 +07:00
Sergey M․	aaa42cf0cf	[bbc] PEP 8	2016-08-07 18:05:13 +07:00
Sergey M․	9fb64c04cd	[bbc] Add support for morph embeds (Closes #10239 )	2016-08-07 18:01:50 +07:00
Remita Amine	f9622868e7	[bbc] preserve format_id backward compatibility	2016-08-07 11:14:15 +01:00
Remita Amine	37768f9242	[common] correctly lower the preference of m3u8 master manifest format	2016-08-07 10:59:09 +01:00
Sergey M․	a1aadd09a4	[tnaflixnetworkbase] Improve title extraction	2016-08-07 16:00:09 +07:00
Sergey M․	b47a75017b	[tnaflix] Fix metadata extraction (Closes #10249 )	2016-08-07 16:00:03 +07:00
Remita Amine	e37b54b140	[fox] fix theplatform release url query	2016-08-06 20:53:39 +01:00
Yen Chi Hsuan	c1decda58c	[openload] Fix extraction (closes #9706 )	2016-08-07 02:44:15 +08:00
Yen Chi Hsuan	d3f8e038fe	[utils] Add decode_png for openload (#9706 )	2016-08-07 02:42:58 +08:00
Remita Amine	ad152e2d95	[bbc] fix test	2016-08-06 19:36:12 +01:00
Remita Amine	b0af12154e	[bbc] reduce requests and improve format_id	2016-08-06 19:24:59 +01:00
Remita Amine	d16b3c6677	[common] extract partOfTVSeries info in json-ld	2016-08-06 18:58:38 +01:00
Remita Amine	c57244cdb1	[common] lower the preference of m3u8 master manifest format	2016-08-06 18:55:05 +01:00
Remita Amine	a7e5f27412	[bbc] improve extraction - extract f4m and dash formats - improve format sorting and listing - improve extraction of articles with `otherSettings.playlist`	2016-08-06 18:48:09 +01:00
Remita Amine	089a40955c	[pokemon] improve _VALID_URL	2016-08-06 12:08:14 +01:00
Remita Amine	d73ebac100	[pokemon] Add new extractor(closes #10093 )	2016-08-06 11:18:14 +01:00
Remita Amine	e563c0d73b	[condenast] fallback to loader.js if video.js fail	2016-08-05 21:01:16 +01:00