release 2014.08.22.3

[rtve] Add support for live stream
At the moment, only RTVE-1 seems to work flawlessly. RTVE-2 seems to be geoblocked right now, and RTVE-TDP doesn't seem to be available outside of Spain.
2014-08-22 18:41:43 +02:00 · 2014-08-22 18:40:28 +02:00 · 2014-08-22 18:19:56 +02:00 · 2014-08-22 17:46:57 +02:00 · 2014-08-22 17:40:36 +02:00 · 2014-08-22 17:38:11 +02:00
17 changed files with 579 additions and 146 deletions
--- a/README.md
+++ b/README.md
@@ -311,10 +311,12 @@ The current default template is `%(title)s-%(id)s.%(ext)s`.

 In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:

-    $ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc
-    youtube-dl test video ''_ä↭𝕐.mp4    # All kinds of weird characters
-    $ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc --restrict-filenames
-    youtube-dl_test_video_.mp4          # A simple file name
+```bash
+$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc
+youtube-dl test video ''_ä↭𝕐.mp4    # All kinds of weird characters
+$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc --restrict-filenames
+youtube-dl_test_video_.mp4          # A simple file name
+```

 # VIDEO SELECTION

@@ -325,14 +327,16 @@ Videos can be filtered by their upload date using the options `--date`, `--dateb
 
 Examples:

-    # Download only the videos uploaded in the last 6 months
-    $ youtube-dl --dateafter now-6months
+```bash
+# Download only the videos uploaded in the last 6 months
+$ youtube-dl --dateafter now-6months

-    # Download only the videos uploaded on January 1, 1970
-    $ youtube-dl --date 19700101
+# Download only the videos uploaded on January 1, 1970
+$ youtube-dl --date 19700101

-    $ # will only download the videos uploaded in the 200x decade
-    $ youtube-dl --dateafter 20000101 --datebefore 20091231
+$ # will only download the videos uploaded in the 200x decade
+$ youtube-dl --dateafter 20000101 --datebefore 20091231
+```

 # FAQ

@@ -407,49 +411,49 @@ If you want to add support for a new site, you can follow this quick list (assum
 2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
 3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
 4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
+    ```python
+    # coding: utf-8
+    from __future__ import unicode_literals

-        # coding: utf-8
-        from __future__ import unicode_literals
+    import re

-        import re
+    from .common import InfoExtractor

-        from .common import InfoExtractor
-        
-        
-        class YourExtractorIE(InfoExtractor):
-            _VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
-            _TEST = {
-                'url': 'http://yourextractor.com/watch/42',
-                'md5': 'TODO: md5 sum of the first 10KiB of the video file',
-                'info_dict': {
-                    'id': '42',
-                    'ext': 'mp4',
-                    'title': 'Video title goes here',
-                    # TODO more properties, either as:
-                    # * A value
-                    # * MD5 checksum; start the string with md5:
-                    # * A regular expression; start the string with re:
-                    # * Any Python type (for example int or float)
-                }
+
+    class YourExtractorIE(InfoExtractor):
+        _VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
+        _TEST = {
+            'url': 'http://yourextractor.com/watch/42',
+            'md5': 'TODO: md5 sum of the first 10KiB of the video file',
+            'info_dict': {
+                'id': '42',
+                'ext': 'mp4',
+                'title': 'Video title goes here',
+                'thumbnail': 're:^https?://.*\.jpg$',
+                # TODO more properties, either as:
+                # * A value
+                # * MD5 checksum; start the string with md5:
+                # * A regular expression; start the string with re:
+                # * Any Python type (for example int or float)
            }
+        }

-            def _real_extract(self, url):
-                mobj = re.match(self._VALID_URL, url)
-                video_id = mobj.group('id')
-
-                # TODO more code goes here, for example ...
-                webpage = self._download_webpage(url, video_id)
-                title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
-
-                return {
-                    'id': video_id,
-                    'title': title,
-                    # TODO more properties (see youtube_dl/extractor/common.py)
-                }
+        def _real_extract(self, url):
+            mobj = re.match(self._VALID_URL, url)
+            video_id = mobj.group('id')

+            # TODO more code goes here, for example ...
+            webpage = self._download_webpage(url, video_id)
+            title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, 'title')

+            return {
+                'id': video_id,
+                'title': title,
+                # TODO more properties (see youtube_dl/extractor/common.py)
+            }
+    ```
 5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
-6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done.
+6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
 7. Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Add tests and code for as many as you want.
 8. If you can, check the code with [pyflakes](https://pypi.python.org/pypi/pyflakes) (a good idea) and [pep8](https://pypi.python.org/pypi/pep8) (optional, ignore E501).
 9. When the tests pass, [add](https://www.kernel.org/pub/software/scm/git/docs/git-add.html) the new files and [commit](https://www.kernel.org/pub/software/scm/git/docs/git-commit.html) them and [push](https://www.kernel.org/pub/software/scm/git/docs/git-push.html) the result, like this:
--- a/test/test_YoutubeDL.py
+++ b/test/test_YoutubeDL.py
@@ -221,7 +221,7 @@ class TestFormatSelection(unittest.TestCase):
            '138', '137', '248', '136', '247', '135', '246',
            '245', '244', '134', '243', '133', '242', '160',
            # Dash audio
-            '141', '172', '140', '139', '171',
+            '141', '172', '140', '171', '139',
        ]

        for f1id, f2id in zip(order, order[1:]):
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -480,7 +480,10 @@ class YoutubeDL(object):
                return 'Skipping %s, because it has exceeded the maximum view count (%d/%d)' % (video_title, view_count, max_views)
        age_limit = self.params.get('age_limit')
        if age_limit is not None:
-            if age_limit < info_dict.get('age_limit', 0):
+            actual_age_limit = info_dict.get('age_limit')
+            if actual_age_limit is None:
+                actual_age_limit = 0
+            if age_limit < actual_age_limit:
                return 'Skipping "' + title + '" because it is age restricted'
        if self.in_download_archive(info_dict):
            return '%s has already been recorded in archive' % video_title
--- a/youtube_dl/__init__.py
+++ b/youtube_dl/__init__.py
@@ -70,6 +70,7 @@ __authors__  = (
    'David Fabijan',
    'Sebastian Haas',
    'Alexander Kirk',
+    'Erik Johnson',
 )

 __license__ = 'Public Domain'
--- a/youtube_dl/extractor/__init__.py
+++ b/youtube_dl/extractor/__init__.py
@@ -69,6 +69,7 @@ from .dfb import DFBIE
 from .dotsub import DotsubIE
 from .dreisat import DreiSatIE
 from .drtv import DRTVIE
+from .dump import DumpIE
 from .defense import DefenseGouvFrIE
 from .discovery import DiscoveryIE
 from .divxstage import DivxStageIE
@@ -239,8 +240,10 @@ from .orf import (
    ORFFM4IE,
 )
 from .parliamentliveuk import ParliamentLiveUKIE
+from .patreon import PatreonIE
 from .pbs import PBSIE
 from .photobucket import PhotobucketIE
+from .playfm import PlayFMIE
 from .playvid import PlayvidIE
 from .podomatic import PodomaticIE
 from .pornhd import PornHdIE
@@ -261,7 +264,7 @@ from .rtbf import RTBFIE
 from .rtlnl import RtlXlIE
 from .rtlnow import RTLnowIE
 from .rts import RTSIE
-from .rtve import RTVEALaCartaIE
+from .rtve import RTVEALaCartaIE, RTVELiveIE
 from .ruhd import RUHDIE
 from .rutube import (
    RutubeIE,
--- a/youtube_dl/extractor/aparat.py
+++ b/youtube_dl/extractor/aparat.py
@@ -1,5 +1,7 @@
 #coding: utf-8

+from __future__ import unicode_literals
+
 import re

 from .common import InfoExtractor
@@ -13,13 +15,14 @@ class AparatIE(InfoExtractor):
    _VALID_URL = r'^https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'

    _TEST = {
-        u'url': u'http://www.aparat.com/v/wP8On',
-        u'file': u'wP8On.mp4',
-        u'md5': u'6714e0af7e0d875c5a39c4dc4ab46ad1',
-        u'info_dict': {
-            u"title": u"تیم گلکسی 11 - زومیت",
+        'url': 'http://www.aparat.com/v/wP8On',
+        'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1',
+        'info_dict': {
+            'id': 'wP8On',
+            'ext': 'mp4',
+            'title': 'تیم گلکسی 11 - زومیت',
        },
-        #u'skip': u'Extremely unreliable',
+        # 'skip': 'Extremely unreliable',
    }

    def _real_extract(self, url):
@@ -29,8 +32,8 @@ class AparatIE(InfoExtractor):
        # Note: There is an easier-to-parse configuration at
        # http://www.aparat.com/video/video/config/videohash/%video_id
        # but the URL in there does not work
-        embed_url = (u'http://www.aparat.com/video/video/embed/videohash/' +
-                     video_id + u'/vt/frame')
+        embed_url = ('http://www.aparat.com/video/video/embed/videohash/' +
+                     video_id + '/vt/frame')
        webpage = self._download_webpage(embed_url, video_id)

        video_urls = re.findall(r'fileList\[[0-9]+\]\s*=\s*"([^"]+)"', webpage)
--- a/youtube_dl/extractor/dump.py
+++ b/youtube_dl/extractor/dump.py
@@ -0,0 +1,39 @@
+# encoding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class DumpIE(InfoExtractor):
+    _VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'
+
+    _TEST = {
+        'url': 'http://www.dump.com/oneus/',
+        'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
+        'info_dict': {
+            'id': 'oneus',
+            'ext': 'flv',
+            'title': "He's one of us.",
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+    }
+
+    def _real_extract(self, url):
+        m = re.match(self._VALID_URL, url)
+        video_id = m.group('id')
+
+        webpage = self._download_webpage(url, video_id)
+        video_url = self._search_regex(
+            r's1\.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')
+
+        thumb = self._og_search_thumbnail(webpage)
+        title = self._search_regex(r'<b>([^"]+)</b>', webpage, 'title')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'url': video_url,
+            'thumbnail': thumb,
+        }
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -16,6 +16,7 @@ from ..utils import (

    ExtractorError,
    HEADRequest,
+    orderedSet,
    parse_xml,
    smuggle_url,
    unescapeHTML,
@@ -289,6 +290,22 @@ class GenericIE(InfoExtractor):
                'description': 'Mario\'s life in the fast lane has never looked so good.',
            },
        },
+        # YouTube embed via a data-video-url attribute
+        {
+            'url': 'https://play.google.com/store/apps/details?id=com.gameloft.android.ANMP.GloftA8HM',
+            'info_dict': {
+                'id': 'jpSGZsgga_I',
+                'ext': 'mp4',
+                'title': 'Asphalt 8: Airborne - Launch Trailer',
+                'uploader': 'Gameloft',
+                'uploader_id': 'gameloft',
+                'upload_date': '20130821',
+                'description': 'md5:87bd95f13d8be3e7da87a5f2c443106a',
+            },
+            'params': {
+                'skip_download': True,
+            }
+        }
    ]

    def report_download_webpage(self, video_id):
@@ -479,6 +496,12 @@ class GenericIE(InfoExtractor):
        video_uploader = self._search_regex(
            r'^(?:https?://)?([^/]*)/.*', url, 'video uploader')

+        # Helper method
+        def _playlist_from_matches(matches, getter, ie=None):
+            urlrs = orderedSet(self.url_result(getter(m), ie) for m in matches)
+            return self.playlist_result(
+                urlrs, playlist_id=video_id, playlist_title=video_title)
+
        # Look for BrightCove:
        bc_urls = BrightcoveIE._extract_brightcove_urls(webpage)
        if bc_urls:
@@ -514,6 +537,7 @@ class GenericIE(InfoExtractor):
        matches = re.findall(r'''(?x)
            (?:
                <iframe[^>]+?src=|
+                data-video-url=|
                <embed[^>]+?src=|
                embedSWF\(?:\s*
            )
@@ -522,19 +546,15 @@ class GenericIE(InfoExtractor):
                (?:embed|v)/.+?)
            \1''', webpage)
        if matches:
-            urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Youtube')
-                     for tuppl in matches]
-            return self.playlist_result(
-                urlrs, playlist_id=video_id, playlist_title=video_title)
+            return _playlist_from_matches(
+                matches, lambda m: unescapeHTML(m[1]), ie='Youtube')

        # Look for embedded Dailymotion player
        matches = re.findall(
            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/embed/video/.+?)\1', webpage)
        if matches:
-            urlrs = [self.url_result(unescapeHTML(tuppl[1]))
-                     for tuppl in matches]
-            return self.playlist_result(
-                urlrs, playlist_id=video_id, playlist_title=video_title)
+            return _playlist_from_matches(
+                matches, lambda m: unescapeHTML(m[1]))

        # Look for embedded Wistia player
        match = re.search(
@@ -648,10 +668,8 @@ class GenericIE(InfoExtractor):
        # Look for funnyordie embed
        matches = re.findall(r'<iframe[^>]+?src="(https?://(?:www\.)?funnyordie\.com/embed/[^"]+)"', webpage)
        if matches:
-            urlrs = [self.url_result(unescapeHTML(eurl), 'FunnyOrDie')
-                     for eurl in matches]
-            return self.playlist_result(
-                urlrs, playlist_id=video_id, playlist_title=video_title)
+            return _playlist_from_matches(
+                matches, getter=unescapeHTML, ie='FunnyOrDie')

        # Look for embedded RUTV player
        rutv_url = RUTVIE._extract_url(webpage)
--- a/youtube_dl/extractor/metacafe.py
+++ b/youtube_dl/extractor/metacafe.py
@@ -9,6 +9,7 @@ from ..utils import (
    compat_urllib_request,
    determine_ext,
    ExtractorError,
+    int_or_none,
 )


@@ -83,6 +84,21 @@ class MetacafeIE(InfoExtractor):
                'skip_download': True,
            },
        },
+        # Movieclips.com video
+        {
+            'url': 'http://www.metacafe.com/watch/mv-Wy7ZU/my_week_with_marilyn_do_you_love_me/',
+            'info_dict': {
+                'id': 'mv-Wy7ZU',
+                'ext': 'mp4',
+                'title': 'My Week with Marilyn - Do You Love Me?',
+                'description': 'From the movie My Week with Marilyn - Colin (Eddie Redmayne) professes his love to Marilyn (Michelle Williams) and gets her to promise to return to set and finish the movie.',
+                'uploader': 'movie_trailers',
+                'duration': 176,
+            },
+            'params': {
+                'skip_download': 'requires rtmpdump',
+            }
+        }
    ]

    def report_disclaimer(self):
@@ -134,6 +150,7 @@ class MetacafeIE(InfoExtractor):

        # Extract URL, uploader and title from webpage
        self.report_extraction(video_id)
+        video_url = None
        mobj = re.search(r'(?m)&mediaURL=([^&]+)', webpage)
        if mobj is not None:
            mediaURL = compat_urllib_parse.unquote(mobj.group(1))
@@ -146,16 +163,17 @@ class MetacafeIE(InfoExtractor):
            else:
                gdaKey = mobj.group(1)
                video_url = '%s?__gda__=%s' % (mediaURL, gdaKey)
-        else:
+        if video_url is None:
            mobj = re.search(r'<video src="([^"]+)"', webpage)
            if mobj:
                video_url = mobj.group(1)
                video_ext = 'mp4'
-            else:
-                mobj = re.search(r' name="flashvars" value="(.*?)"', webpage)
-                if mobj is None:
-                    raise ExtractorError('Unable to extract media URL')
-                vardict = compat_parse_qs(mobj.group(1))
+        if video_url is None:
+            flashvars = self._search_regex(
+                r' name="flashvars" value="(.*?)"', webpage, 'flashvars',
+                default=None)
+            if flashvars:
+                vardict = compat_parse_qs(flashvars)
                if 'mediaData' not in vardict:
                    raise ExtractorError('Unable to extract media URL')
                mobj = re.search(
@@ -165,26 +183,68 @@ class MetacafeIE(InfoExtractor):
                mediaURL = mobj.group('mediaURL').replace('\\/', '/')
                video_url = '%s?__gda__=%s' % (mediaURL, mobj.group('key'))
                video_ext = determine_ext(video_url)
+        if video_url is None:
+            player_url = self._search_regex(
+                r"swfobject\.embedSWF\('([^']+)'",
+                webpage, 'player URL', default=None)
+            if player_url:
+                config_url = self._search_regex(
+                    r'config=(.+)$', player_url, 'config URL')
+                config_doc = self._download_xml(
+                    config_url, video_id,
+                    note='Downloading video config')
+                smil_url = config_doc.find('.//properties').attrib['smil_file']
+                smil_doc = self._download_xml(
+                    smil_url, video_id,
+                    note='Downloading SMIL document')
+                base_url = smil_doc.find('./head/meta').attrib['base']
+                video_url = []
+                for vn in smil_doc.findall('.//video'):
+                    br = int(vn.attrib['system-bitrate'])
+                    play_path = vn.attrib['src']
+                    video_url.append({
+                        'format_id': 'smil-%d' % br,
+                        'url': base_url,
+                        'play_path': play_path,
+                        'page_url': url,
+                        'player_url': player_url,
+                        'ext': play_path.partition(':')[0],
+                    })

-        video_title = self._html_search_regex(r'(?im)<title>(.*) - Video</title>', webpage, 'title')
+        if video_url is None:
+            raise ExtractorError('Unsupported video type')
+
+        video_title = self._html_search_regex(
+            r'(?im)<title>(.*) - Video</title>', webpage, 'title')
        description = self._og_search_description(webpage)
        thumbnail = self._og_search_thumbnail(webpage)
        video_uploader = self._html_search_regex(
                r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
                webpage, 'uploader nickname', fatal=False)
+        duration = int_or_none(
+            self._html_search_meta('video:duration', webpage))

-        if re.search(r'"contentRating":"restricted"', webpage) is not None:
-            age_limit = 18
+        age_limit = (
+            18
+            if re.search(r'"contentRating":"restricted"', webpage)
+            else 0)
+
+        if isinstance(video_url, list):
+            formats = video_url
        else:
-            age_limit = 0
+            formats = [{
+                'url': video_url,
+                'ext': video_ext,
+            }]

+        self._sort_formats(formats)
        return {
            'id': video_id,
-            'url': video_url,
            'description': description,
            'uploader': video_uploader,
            'title': video_title,
-            'thumbnail':thumbnail,
-            'ext': video_ext,
+            'thumbnail': thumbnail,
            'age_limit': age_limit,
+            'formats': formats,
+            'duration': duration,
        }
--- a/youtube_dl/extractor/nuvid.py
+++ b/youtube_dl/extractor/nuvid.py
@@ -38,7 +38,7 @@ class NuvidIE(InfoExtractor):
            webpage = self._download_webpage(
                request, video_id, 'Downloading %s page' % format_id)
            video_url = self._html_search_regex(
-                r'<a href="([^"]+)"\s*>Continue to watch video', webpage, '%s video URL' % format_id, fatal=False)
+                r'<a\s+href="([^"]+)"\s+class="b_link">', webpage, '%s video URL' % format_id, fatal=False)
            if not video_url:
                continue
            formats.append({
@@ -49,19 +49,24 @@ class NuvidIE(InfoExtractor):
        webpage = self._download_webpage(
            'http://m.nuvid.com/video/%s' % video_id, video_id, 'Downloading video page')
        title = self._html_search_regex(
-            r'<div class="title">\s+<h2[^>]*>([^<]+)</h2>', webpage, 'title').strip()
-        thumbnail = self._html_search_regex(
-            r'href="(/thumbs/[^"]+)"[^>]*data-link_type="thumbs"',
-            webpage, 'thumbnail URL', fatal=False)
+            [r'<span title="([^"]+)">',
+             r'<div class="thumb-holder video">\s*<h5[^>]*>([^<]+)</h5>'], webpage, 'title').strip()
+        thumbnails = [
+            {
+                'url': thumb_url,
+            } for thumb_url in re.findall(r'<img src="([^"]+)" alt="" />', webpage)
+        ]
+        thumbnail = thumbnails[0]['url'] if thumbnails else None
        duration = parse_duration(self._html_search_regex(
-            r'Length:\s*<span>(\d{2}:\d{2})</span>',webpage, 'duration', fatal=False))
+            r'<i class="fa fa-clock-o"></i>\s*(\d{2}:\d{2})', webpage, 'duration', fatal=False))
        upload_date = unified_strdate(self._html_search_regex(
-            r'Added:\s*<span>(\d{4}-\d{2}-\d{2})</span>', webpage, 'upload date', fatal=False))
+            r'<i class="fa fa-user"></i>\s*(\d{4}-\d{2}-\d{2})', webpage, 'upload date', fatal=False))

        return {
            'id': video_id,
            'title': title,
-            'thumbnail': 'http://m.nuvid.com%s' % thumbnail,
+            'thumbnails': thumbnails,
+            'thumbnail': thumbnail,
            'duration': duration,
            'upload_date': upload_date,
            'age_limit': 18,
--- a/youtube_dl/extractor/patreon.py
+++ b/youtube_dl/extractor/patreon.py
@@ -0,0 +1,101 @@
+# encoding: utf-8
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    compat_urlparse,
+    js_to_json,
+)
+
+
+class PatreonIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?patreon\.com/creation\?hid=(.+)'
+    _TESTS = [
+        {
+            'url': 'http://www.patreon.com/creation?hid=743933',
+            'md5': 'e25505eec1053a6e6813b8ed369875cc',
+            'info_dict': {
+                'id': '743933',
+                'ext': 'mp3',
+                'title': 'Episode 166: David Smalley of Dogma Debate',
+                'uploader': 'Cognitive Dissonance Podcast',
+                'thumbnail': 're:^https?://.*$',
+            },
+        },
+        {
+            'url': 'http://www.patreon.com/creation?hid=754133',
+            'md5': '3eb09345bf44bf60451b8b0b81759d0a',
+            'info_dict': {
+                'id': '754133',
+                'ext': 'mp3',
+                'title': 'CD 167 Extra',
+                'uploader': 'Cognitive Dissonance Podcast',
+                'thumbnail': 're:^https?://.*$',
+            },
+        },
+    ]
+
+    # Currently Patreon exposes download URL via hidden CSS, so login is not
+    # needed. Keeping this commented for when this inevitably changes.
+    '''
+    def _login(self):
+        (username, password) = self._get_login_info()
+        if username is None:
+            return
+
+        login_form = {
+            'redirectUrl': 'http://www.patreon.com/',
+            'email': username,
+            'password': password,
+        }
+
+        request = compat_urllib_request.Request(
+            'https://www.patreon.com/processLogin',
+            compat_urllib_parse.urlencode(login_form).encode('utf-8')
+        )
+        login_page = self._download_webpage(request, None, note='Logging in as %s' % username)
+
+        if re.search(r'onLoginFailed', login_page):
+            raise ExtractorError('Unable to login, incorrect username and/or password', expected=True)
+
+    def _real_initialize(self):
+        self._login()
+    '''
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group(1)
+
+        webpage = self._download_webpage(url, video_id)
+        title = self._og_search_title(webpage).strip()
+
+        attach_fn = self._html_search_regex(
+            r'<div class="attach"><a target="_blank" href="([^"]+)">',
+            webpage, 'attachment URL', default=None)
+        if attach_fn is not None:
+            video_url = 'http://www.patreon.com' + attach_fn
+            thumbnail = self._og_search_thumbnail(webpage)
+            uploader = self._html_search_regex(
+                r'<strong>(.*?)</strong> is creating', webpage, 'uploader')
+        else:
+            playlist_js = self._search_regex(
+                r'(?s)new\s+jPlayerPlaylist\(\s*\{\s*[^}]*},\s*(\[.*?,?\s*\])',
+                webpage, 'playlist JSON')
+            playlist_json = js_to_json(playlist_js)
+            playlist = json.loads(playlist_json)
+            data = playlist[0]
+            video_url = self._proto_relative_url(data['mp3'])
+            thumbnail = self._proto_relative_url(data.get('cover'))
+            uploader = data.get('artist')
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'ext': 'mp3',
+            'title': title,
+            'uploader': uploader,
+            'thumbnail': thumbnail,
+        }
--- a/youtube_dl/extractor/pbs.py
+++ b/youtube_dl/extractor/pbs.py
@@ -54,6 +54,18 @@ class PBSIE(InfoExtractor):
                'duration': 801,
            },
        },
+        {
+            'url': 'http://www.pbs.org/wnet/gperf/dudamel-conducts-verdi-requiem-hollywood-bowl-full-episode/3374/',
+            'md5': 'c62859342be2a0358d6c9eb306595978',
+            'info_dict': {
+                'id': '2365297708',
+                'ext': 'mp4',
+                'description': 'md5:68d87ef760660eb564455eb30ca464fe',
+                'title': 'Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
+                'duration': 6559,
+                'thumbnail': 're:^https?://.*\.jpg$',
+            }
+        }
    ]

    def _extract_ids(self, url):
@@ -75,7 +87,7 @@ class PBSIE(InfoExtractor):
                return media_id, presumptive_id

            url = self._search_regex(
-                r'<iframe\s+id=["\']partnerPlayer["\'].*?\s+src=["\'](.*?)["\']>',
+                r'<iframe\s+(?:class|id)=["\']partnerPlayer["\'].*?\s+src=["\'](.*?)["\']>',
                webpage, 'player URL')
            mobj = re.match(self._VALID_URL, url)

--- a/youtube_dl/extractor/playfm.py
+++ b/youtube_dl/extractor/playfm.py
@@ -0,0 +1,82 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    compat_urllib_parse,
+    compat_urllib_request,
+    ExtractorError,
+    float_or_none,
+    int_or_none,
+)
+
+
+class PlayFMIE(InfoExtractor):
+    IE_NAME = 'play.fm'
+    _VALID_URL = r'https?://(?:www\.)?play\.fm/[^?#]*(?P<upload_date>[0-9]{8})(?P<id>[0-9]{6})(?:$|[?#])'
+
+    _TEST = {
+        'url': 'http://www.play.fm/recording/leipzigelectronicmusicbatofarparis_fr20140712137220',
+        'md5': 'c505f8307825a245d0c7ad1850001f22',
+        'info_dict': {
+            'id': '137220',
+            'ext': 'mp3',
+            'title': 'LEIPZIG ELECTRONIC MUSIC @ Batofar (Paris,FR) - 2014-07-12',
+            'uploader': 'Sven Tasnadi',
+            'uploader_id': 'sventasnadi',
+            'duration': 5627.428,
+            'upload_date': '20140712',
+            'view_count': int,
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        upload_date = mobj.group('upload_date')
+
+        rec_data = compat_urllib_parse.urlencode({'rec_id': video_id})
+        req = compat_urllib_request.Request(
+            'http://www.play.fm/flexRead/recording', data=rec_data)
+        req.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        rec_doc = self._download_xml(req, video_id)
+
+        error_node = rec_doc.find('./error')
+        if error_node is not None:
+            raise ExtractorError('An error occurred: %s (code %s)' % (
+                error_node.text, rec_doc.find('./status').text))
+
+        recording = rec_doc.find('./recording')
+        title = recording.find('./title').text
+        view_count = int_or_none(recording.find('./stats/playcount').text)
+        duration = float_or_none(recording.find('./duration').text, scale=1000)
+        thumbnail = recording.find('./image').text
+
+        artist = recording.find('./artists/artist')
+        uploader = artist.find('./name').text
+        uploader_id = artist.find('./slug').text
+
+        video_url = '%s//%s/%s/%s/offset/0/sh/%s/rec/%s/jingle/%s/loc/%s' % (
+            'http:', recording.find('./url').text,
+            recording.find('./_class').text, recording.find('./file_id').text,
+            rec_doc.find('./uuid').text, video_id,
+            rec_doc.find('./jingle/file_id').text,
+            'http%3A%2F%2Fwww.play.fm%2Fplayer',
+        )
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'ext': 'mp3',
+            'filesize': int_or_none(recording.find('./size').text),
+            'title': title,
+            'upload_date': upload_date,
+            'view_count': view_count,
+            'duration': duration,
+            'thumbnail': thumbnail,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+        }
--- a/youtube_dl/extractor/rtve.py
+++ b/youtube_dl/extractor/rtve.py
@@ -1,21 +1,66 @@
 # encoding: utf-8
 from __future__ import unicode_literals

-import re
 import base64
+import re
+import time

 from .common import InfoExtractor
 from ..utils import (
    struct_unpack,
+    remove_end,
 )


+def _decrypt_url(png):
+    encrypted_data = base64.b64decode(png)
+    text_index = encrypted_data.find(b'tEXt')
+    text_chunk = encrypted_data[text_index - 4:]
+    length = struct_unpack('!I', text_chunk[:4])[0]
+    # Use bytearray to get integers when iterating in both python 2.x and 3.x
+    data = bytearray(text_chunk[8:8 + length])
+    data = [chr(b) for b in data if b != 0]
+    hash_index = data.index('#')
+    alphabet_data = data[:hash_index]
+    url_data = data[hash_index + 1:]
+
+    alphabet = []
+    e = 0
+    d = 0
+    for l in alphabet_data:
+        if d == 0:
+            alphabet.append(l)
+            d = e = (e + 1) % 4
+        else:
+            d -= 1
+    url = ''
+    f = 0
+    e = 3
+    b = 1
+    for letter in url_data:
+        if f == 0:
+            l = int(letter) * 10
+            f = 1
+        else:
+            if e == 0:
+                l += int(letter)
+                url += alphabet[l]
+                e = (b + 3) % 4
+                f = 0
+                b += 1
+            else:
+                e -= 1
+
+    return url
+
+
+
 class RTVEALaCartaIE(InfoExtractor):
    IE_NAME = 'rtve.es:alacarta'
    IE_DESC = 'RTVE a la carta'
    _VALID_URL = r'http://www\.rtve\.es/alacarta/videos/[^/]+/[^/]+/(?P<id>\d+)'

-    _TEST = {
+    _TESTS = [{
        'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
        'md5': '1d49b7e1ca7a7502c56a4bf1b60f1b43',
        'info_dict': {
@@ -23,48 +68,15 @@ class RTVEALaCartaIE(InfoExtractor):
            'ext': 'mp4',
            'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
        },
-    }
-
-    def _decrypt_url(self, png):
-        encrypted_data = base64.b64decode(png)
-        text_index = encrypted_data.find(b'tEXt')
-        text_chunk = encrypted_data[text_index-4:]
-        length = struct_unpack('!I', text_chunk[:4])[0]
-        # Use bytearray to get integers when iterating in both python 2.x and 3.x
-        data = bytearray(text_chunk[8:8+length])
-        data = [chr(b) for b in data if b != 0]
-        hash_index = data.index('#')
-        alphabet_data = data[:hash_index]
-        url_data = data[hash_index+1:]
-
-        alphabet = []
-        e = 0
-        d = 0
-        for l in alphabet_data:
-            if d == 0:
-                alphabet.append(l)
-                d = e = (e + 1) % 4
-            else:
-                d -= 1
-        url = ''
-        f = 0
-        e = 3
-        b = 1
-        for letter in url_data:
-            if f == 0:
-                l = int(letter)*10
-                f = 1
-            else:
-                if e == 0:
-                    l += int(letter)
-                    url += alphabet[l]
-                    e = (b + 3) % 4
-                    f = 0
-                    b += 1
-                else:
-                    e -= 1
-
-        return url
+    }, {
+        'note': 'Live stream',
+        'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/',
+        'info_dict': {
+            'id': '1694255',
+            'ext': 'flv',
+            'title': 'TODO',
+        }
+    }]

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
@@ -74,11 +86,59 @@ class RTVEALaCartaIE(InfoExtractor):
            video_id)['page']['items'][0]
        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % video_id
        png = self._download_webpage(png_url, video_id, 'Downloading url information')
-        video_url = self._decrypt_url(png)
+        video_url = _decrypt_url(png)

        return {
            'id': video_id,
            'title': info['title'],
            'url': video_url,
-            'thumbnail': info['image'],
+            'thumbnail': info.get('image'),
+            'page_url': url,
+        }
+
+
+class RTVELiveIE(InfoExtractor):
+    IE_NAME = 'rtve.es:live'
+    IE_DESC = 'RTVE.es live streams'
+    _VALID_URL = r'http://www\.rtve\.es/(?:deportes/directo|noticias(?=/directo-la-1)|television)/(?P<id>[a-zA-Z0-9-]+)'
+
+    _TESTS = [{
+        'url': 'http://www.rtve.es/noticias/directo-la-1/',
+        'info_dict': {
+            'id': 'directo-la-1',
+            'ext': 'flv',
+            'title': 're:^La 1 de TVE [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
+        },
+        'params': {
+            'skip_download': 'live stream',
+        }
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        start_time = time.gmtime()
+        video_id = mobj.group('id')
+
+        webpage = self._download_webpage(url, video_id)
+        player_url = self._search_regex(
+            r'<param name="movie" value="([^"]+)"/>', webpage, 'player URL')
+        title = remove_end(self._og_search_title(webpage), ' en directo')
+        title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)
+
+        vidplayer_id = self._search_regex(
+            r' id="vidplayer([0-9]+)"', webpage, 'internal video ID')
+        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % vidplayer_id
+        png = self._download_webpage(png_url, video_id, 'Downloading url information')
+        video_url = _decrypt_url(png)
+
+        print(video_url)
+
+        return {
+            'id': video_id,
+            'ext': 'flv',
+            'title': title,
+            'url': video_url,
+            'app': 'rtve-live-live?ovpfv=2.1.2',
+            'player_url': player_url,
+            'rtmp_live': True,
        }
--- a/youtube_dl/extractor/youtube.py
+++ b/youtube_dl/extractor/youtube.py
@@ -225,7 +225,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
        '272': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},

        # Dash webm audio
-        '171': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'abr': 48, 'preference': -50},
+        '171': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'abr': 128, 'preference': -50},
        '172': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'abr': 256, 'preference': -50},

        # RTMP (unnamed)
@@ -508,6 +508,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
        sub_lang_list = {}
        for l in lang_list:
            lang = l[1]
+            if lang in sub_lang_list:
+                continue
            params = compat_urllib_parse.urlencode({
                'lang': lang,
                'v': video_id,
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -233,18 +233,24 @@ else:
 def write_json_file(obj, fn):
    """ Encode obj as JSON and write it to fn, atomically """

+    args = {
+        'suffix': '.tmp',
+        'prefix': os.path.basename(fn) + '.',
+        'dir': os.path.dirname(fn),
+        'delete': False,
+    }
+
    # In Python 2.x, json.dump expects a bytestream.
    # In Python 3.x, it writes to a character stream
    if sys.version_info < (3, 0):
-        mode = 'wb'
-        encoding = None
+        args['mode'] = 'wb'
    else:
-        mode = 'w'
-        encoding = 'utf-8'
-    tf = tempfile.NamedTemporaryFile(
-        suffix='.tmp', prefix=os.path.basename(fn) + '.',
-        dir=os.path.dirname(fn),
-        delete=False)
+        args.update({
+            'mode': 'w',
+            'encoding': 'utf-8',
+        })
+
+    tf = tempfile.NamedTemporaryFile(**args)

    try:
        with tf:
@@ -1279,6 +1285,12 @@ def remove_start(s, start):
    return s


+def remove_end(s, end):
+    if s.endswith(end):
+        return s[:-len(end)]
+    return s
+
+
 def url_basename(url):
    path = compat_urlparse.urlparse(url).path
    return path.strip(u'/').split(u'/')[-1]
@@ -1468,6 +1480,34 @@ def strip_jsonp(code):
    return re.sub(r'(?s)^[a-zA-Z0-9_]+\s*\(\s*(.*)\);?\s*?\s*$', r'\1', code)


+def js_to_json(code):
+    def fix_kv(m):
+        key = m.group(2)
+        if key.startswith("'"):
+            assert key.endswith("'")
+            assert '"' not in key
+            key = '"%s"' % key[1:-1]
+        elif not key.startswith('"'):
+            key = '"%s"' % key
+
+        value = m.group(4)
+        if value.startswith("'"):
+            assert value.endswith("'")
+            assert '"' not in value
+            value = '"%s"' % value[1:-1]
+
+        return m.group(1) + key + m.group(3) + value
+
+    res = re.sub(r'''(?x)
+            ([{,]\s*)
+            ("[^"]*"|\'[^\']*\'|[a-z0-9A-Z]+)
+            (:\s*)
+            ([0-9.]+|true|false|"[^"]*"|\'[^\']*\'|\[|\{)
+        ''', fix_kv, code)
+    res = re.sub(r',(\s*\])', lambda m: m.group(1), res)
+    return res
+
+
 def qualities(quality_ids):
    """ Get a numeric quality value out of a list of possible values """
    def q(qid):
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,2 +1,2 @@

-__version__ = '2014.08.21.2'
+__version__ = '2014.08.22.3'
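
As a rough illustration of how the new `remove_end` helper feeds the live-stream title in `RTVELiveIE`, the sketch below copies the helper from the diff and wraps the title logic in a standalone function. `live_title` and the sample page title are hypothetical, introduced here only for the example; they are not part of youtube-dl.

```python
import time


def remove_end(s, end):
    # Copied from the utils.py hunk above: strip a trailing suffix if present.
    if s.endswith(end):
        return s[:-len(end)]
    return s


def live_title(og_title, start_time):
    # Hypothetical wrapper mirroring the two title lines in RTVELiveIE:
    # drop the " en directo" suffix, then append a UTC timestamp.
    title = remove_end(og_title, ' en directo')
    return title + ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)


if __name__ == '__main__':
    # time.gmtime(0) is the Unix epoch, used here for a reproducible result.
    print(live_title('La 1 de TVE en directo', time.gmtime(0)))
    # -> La 1 de TVE 1970-01-01Z000000
```

The resulting string has the shape expected by the `re:^La 1 de TVE ...$` title pattern in the test case. One caveat: as written, `remove_end(s, '')` returns an empty string, because `s[:-0]` is the same as `s[:0]`, so callers presumably always pass a non-empty suffix.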
Author	SHA1	Message	Date
Philipp Hagemeister	492641d10a	release 2014.08.22.3	2014-08-22 18:41:43 +02:00
Philipp Hagemeister	2b9faf5542	[rtve] Add support for live stream. At the moment, only RTVE-1 seems to work flawlessly. -2 seems geoblocked right now. -TDP doesn't seem to be available outside of Spain.	2014-08-22 18:40:28 +02:00
Philipp Hagemeister	ed2d6a1960	[generic] Simplify playlist support (#2948)	2014-08-22 18:19:56 +02:00
Philipp Hagemeister	be843678b1	[YouTubeDL] Correct handling of age_limit = None in result	2014-08-22 17:46:57 +02:00
Philipp Hagemeister	c71dfccc98	Merge remote-tracking branch 'anovicecodemonkey/generic-data-video-url'. Conflicts: youtube_dl/extractor/generic.py	2014-08-22 17:40:36 +02:00
Philipp Hagemeister	1a9ccac7c1	Merge remote-tracking branch 'origin/master'	2014-08-22 17:38:11 +02:00
Philipp Hagemeister	e330d59abb	[playfm] Add extractor (Fixes #3538)	2014-08-22 17:38:06 +02:00
Sergey M․	394df6d7d0	[nuvid] Adapt to latest layout changes	2014-08-22 21:41:51 +07:00
Philipp Hagemeister	218f754940	[README] Add thumbnail to _TEST example. While it's not mandatory, extractors are highly encouraged to provide a thumbnail field.	2014-08-22 11:30:49 +02:00
Philipp Hagemeister	a053c3493a	[test_YoutubeDL] Reorder formats (#3542)	2014-08-22 03:44:30 +02:00
Philipp Hagemeister	50b294aab8	release 2014.08.22.2	2014-08-22 03:16:16 +02:00
Philipp Hagemeister	756b046f3e	[pbs] recognize class=partnerPlayer as well (Fixes #3564)	2014-08-22 03:16:08 +02:00
Philipp Hagemeister	388ac0b18a	release 2014.08.22.1	2014-08-22 03:02:49 +02:00
Philipp Hagemeister	ad06434bd3	release 2014.08.22	2014-08-22 02:57:08 +02:00
Philipp Hagemeister	bd9820c937	Merge remote-tracking branch 'liudongmiao/patch-subtitle'	2014-08-22 02:45:21 +02:00
Philipp Hagemeister	deda8ac376	Credit @terminalmage for patreon (#3390)	2014-08-22 02:34:22 +02:00
Philipp Hagemeister	e05f693942	[patreon] Simplify (#3390)	2014-08-22 02:33:29 +02:00
Philipp Hagemeister	b27295d2ab	Merge remote-tracking branch 'terminalmage/add-patreon'	2014-08-22 01:52:56 +02:00
Philipp Hagemeister	ace52c5713	[README] format	2014-08-22 01:51:26 +02:00
Philipp Hagemeister	e62e150f64	[README] brevity is the soul of wit. These instructions are overly long as it is. Leave out the _TESTS example; most developers will not need it in their first IE.	2014-08-22 01:47:44 +02:00
Philipp Hagemeister	c44c0a775d	Merge remote-tracking branch 'terminalmage/readme'	2014-08-22 01:46:46 +02:00
Philipp Hagemeister	5fcf2dbed0	[aparat] modernize	2014-08-22 01:44:52 +02:00
Philipp Hagemeister	91dff03217	[dump] Modernize (#3565)	2014-08-22 01:43:19 +02:00
Philipp Hagemeister	a200f4cee2	Merge remote-tracking branch 'yasoob/master'	2014-08-22 01:38:59 +02:00
Philipp Hagemeister	ea6e8d5454	[metacafe] Add support for movieclips videos (Fixes #3555)	2014-08-22 01:36:07 +02:00
M.Yasoob Ullah Khalid ☺	83d35817f5	Added test for dump.com	2014-08-22 01:31:12 +05:00
M.Yasoob Ullah Khalid ☺	76beff70a8	Added an IE for Dump.com	2014-08-22 01:30:49 +05:00
Philipp Hagemeister	61882bf7c6	release 2014.08.21.3	2014-08-21 18:02:02 +02:00
Philipp Hagemeister	cab317a680	Merge remote-tracking branch 'origin/master'	2014-08-21 18:01:33 +02:00
Sergey M․	73159f99cc	[utils] Add missing mode and encoding arguments	2014-08-21 22:03:00 +07:00
Philipp Hagemeister	c15235cd07	[metacafe] Avoid excessive nesting	2014-08-21 13:37:19 +02:00
Philipp Hagemeister	12c3ec3382	[metacafe] Simplify	2014-08-21 13:25:17 +02:00
Philipp Hagemeister	55db73efdf	[youtube] tag 171 is 128KBits (Fixes #3542)	2014-08-21 13:13:26 +02:00
Erik Johnson	1ce464aba9	Add more information about running tests, add syntax highlighting. There was no information in the README about how to handle multiple tests for a given extractor. This commit adds an explanation of how this is handled. It also adds some syntax highlighting.	2014-08-05 01:54:58 -05:00
Erik Johnson	6994e70651	Fix CSS parsing for Patreon. Some of the CSS classes end in " double", so this commit refines the HTML parsing to account for both kinds of classes, and also adds an additional test case.	2014-08-05 00:26:23 -05:00
Erik Johnson	c3f0b12b0f	fix exception	2014-07-30 15:30:07 -05:00
Erik Johnson	27ace98f51	Add import for Patreon extractor	2014-07-28 13:41:28 -05:00
Erik Johnson	a00d73c8c8	Add Patreon extractor	2014-07-28 13:40:58 -05:00
Liu DongMiao	7e660ac113	if there is more than one subtitle for the language, use the first one	2014-07-23 10:56:09 +08:00
anovicecodemonkey	37e3cbe22e	Move duplicate check to generic.py	2014-06-01 01:16:35 +09:30
anovicecodemonkey	610134730a	Add a _TEST_	2014-05-21 19:25:37 +09:30
anovicecodemonkey	212a5e28ba	Add a duplicate check to /extractor/common.py playlist_result function	2014-05-21 19:04:55 +09:30
anovicecodemonkey	3442b30ab2	[generic] Support data-video-url for YouTube embeds (Fixes #2862)	2014-05-18 23:15:09 +09:30
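
The `js_to_json` helper added to `youtube_dl/utils.py` in this diff can be tried in isolation. The sketch below copies the function body from the hunk and feeds it a made-up JavaScript object literal; the sample input is an assumption chosen for illustration, not taken from any extractor.

```python
import json
import re


def js_to_json(code):
    # Copied from the utils.py hunk: quote bare or single-quoted keys and
    # convert single-quoted string values to double-quoted ones.
    def fix_kv(m):
        key = m.group(2)
        if key.startswith("'"):
            assert key.endswith("'")
            assert '"' not in key
            key = '"%s"' % key[1:-1]
        elif not key.startswith('"'):
            key = '"%s"' % key

        value = m.group(4)
        if value.startswith("'"):
            assert value.endswith("'")
            assert '"' not in value
            value = '"%s"' % value[1:-1]

        return m.group(1) + key + m.group(3) + value

    res = re.sub(r'''(?x)
            ([{,]\s*)
            ("[^"]*"|\'[^\']*\'|[a-z0-9A-Z]+)
            (:\s*)
            ([0-9.]+|true|false|"[^"]*"|\'[^\']*\'|\[|\{)
        ''', fix_kv, code)
    # Drop a trailing comma before a closing bracket, which JSON forbids.
    res = re.sub(r',(\s*\])', lambda m: m.group(1), res)
    return res


if __name__ == '__main__':
    # Hypothetical input: bare key, single-quoted key/value, trailing comma.
    sample = "{abc: 'def', 'ghi': true, list: [1, 2,]}"
    converted = js_to_json(sample)
    print(converted)              # -> {"abc": "def", "ghi": true, "list": [1, 2]}
    print(json.loads(converted))  # -> {'abc': 'def', 'ghi': True, 'list': [1, 2]}
```

The key pattern only accepts `[a-z0-9A-Z]+` for bare keys, so a key containing an underscore (for example `my_key`) is left unquoted by this version, and the result would not parse as JSON.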