Compare commits

...

65 Commits

Author SHA1 Message Date
Philipp Hagemeister
1f91bd15c3 release 2014.03.21.3 2014-03-21 02:10:35 +01:00
Philipp Hagemeister
11a15be4ce [cspan] Add support for newer videos (Fixes #2577) 2014-03-21 02:10:24 +01:00
Philipp Hagemeister
14e17e18cb release 2014.03.21.2 2014-03-21 01:42:45 +01:00
Philipp Hagemeister
1b124d1942 [parliamentliveuk] Add extractor 2014-03-21 01:42:28 +01:00
Philipp Hagemeister
747373d4ae release 2014.03.21.1 2014-03-21 01:00:27 +01:00
Philipp Hagemeister
18d367c0a5 Remove legacy InfoExtractors file 2014-03-21 01:00:06 +01:00
Philipp Hagemeister
a1a530b067 [pbs] Add support for video ratings 2014-03-21 00:59:51 +01:00
Philipp Hagemeister
cb9722cb3f [viki] Modernize 2014-03-21 00:53:18 +01:00
Philipp Hagemeister
773c0b4bb8 [pbs] Add support for widget URLs (Fixes #2594) 2014-03-21 00:46:32 +01:00
Philipp Hagemeister
23c322a531 release 2014.03.21 2014-03-21 00:37:23 +01:00
Philipp Hagemeister
7e8c0af004 Add --prefer-insecure option (Fixes #2364) 2014-03-21 00:37:10 +01:00
Philipp Hagemeister
d2983ccb25 [ninegag] Modernize and remove unused import 2014-03-21 00:37:10 +01:00
Philipp Hagemeister
f24e9833dc [youporn] Modernize 2014-03-21 00:37:10 +01:00
Sergey M․
bc2bdf5709 [kontrtube] Modernize 2014-03-20 23:05:57 +07:00
Philipp Hagemeister
627a209f74 release 2014.03.20 2014-03-20 16:35:54 +01:00
Philipp Hagemeister
1a4895453a [YoutubeDL] Improve error message 2014-03-20 16:33:46 +01:00
Philipp Hagemeister
aab74fa106 [ted] Simplify embed code (#2587) 2014-03-20 16:33:23 +01:00
Philipp Hagemeister
2bd9efd4c2 Merge remote-tracking branch 'anovicecodemonkey/TEDIEimprovements' 2014-03-20 16:24:34 +01:00
Jaime Marquínez Ferrándiz
39a743fb9b [arte] Modernize tests and fix _VALID_REGEX 2014-03-20 09:14:43 +01:00
Jaime Marquínez Ferrándiz
4966a0b22d [arte] Add extractor for concert.arte.tv (closes #2588) 2014-03-20 09:11:47 +01:00
anovicecodemonkey
fc26023120 [TEDIE] Add support for embeded TED video URLs 2014-03-20 01:04:21 +10:30
anovicecodemonkey
8d7c0cca13 [generic] Add support for embeded TED videos 2014-03-20 00:56:32 +10:30
Sergey M․
f66ede4328 [arte.tv:+7] Fix _VALID_URL 2014-03-19 21:23:55 +07:00
Philipp Hagemeister
cc88b90ec8 [desvscripts/release] Bump the number of password tries to accomodate stubby-fingered @phihag 2014-03-18 15:02:37 +01:00
Philipp Hagemeister
b6c5fa9a0b release 2014.03.18.1 2014-03-18 14:42:59 +01:00
Philipp Hagemeister
dff10eaa77 release 2014.03.18 2014-03-18 14:31:03 +01:00
Philipp Hagemeister
4e6f9aeca1 Fix typo 2014-03-18 14:28:53 +01:00
Philipp Hagemeister
e68301af21 Fix getpass on Windows (Fixes #2547) 2014-03-18 14:27:42 +01:00
Sergey M․
17286a96f2 [iprima] Fix permission check regex 2014-03-18 19:33:28 +07:00
Jaime Marquínez Ferrándiz
0892363e6d Merge pull request #2580 from ericpardee/patch-1
Update to comedycentral.py (cc.com)
2014-03-18 08:14:39 +01:00
ericpardee
f102372b5f Update to comedycentral.py (cc.com)
Added cc.com as it's same as comedycentral.com and used, i.e. http://www.cc.com/video-clips/fmyq0m/broad-city-a-beautiful-railroad-style-apartment
2014-03-17 18:01:26 -07:00
Jaime Marquínez Ferrándiz
ecbe1ad207 [generic] Fix access to removed function in python 3.4
The `Request.get_origin_req_host` method was deprecated in 3.3, use the
 `origin_req_host` property if it's not available, see http://docs.python.org/3.3/library/urllib.request.html#urllib.request.Request.get_origin_req_host.
2014-03-17 21:59:21 +01:00
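The compatibility fallback this commit describes can be sketched as a small helper (the function name and fallback order here are illustrative, not the commit's exact code):

```python
import urllib.request

def get_origin_req_host(req):
    # Prefer the 'origin_req_host' property; fall back to the accessor
    # method, which was deprecated in Python 3.3 and removed in 3.4.
    try:
        return req.origin_req_host
    except AttributeError:
        return req.get_origin_req_host()

req = urllib.request.Request('http://example.com/watch?v=abc')
print(get_origin_req_host(req))  # example.com
```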
Philipp Hagemeister
9d840c43b5 release 2014.03.17 2014-03-17 14:49:02 +01:00
Philipp Hagemeister
6f50f63382 Merge remote-tracking branch 'origin/wheels' 2014-03-17 14:31:22 +01:00
Philipp Hagemeister
ff14fc4964 [test] Rename get_testcases to gettestcases
Apparently, newer versions of nosetests are somewhat over-eager in their test discovery.
2014-03-17 14:30:13 +01:00
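The over-eager discovery comes from nose's default `testMatch` regular expression, which collects any callable whose name contains `test`/`Test` after a word separator. The pattern below approximates nose's default and shows why the rename helps:

```python
import re

# Approximation of nose's default testMatch pattern
test_match = re.compile(r'(?:^|[\b_\.\-])[Tt]est')

print(bool(test_match.search('get_testcases')))  # True  -> collected as a "test"
print(bool(test_match.search('gettestcases')))   # False -> left alone
```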
Sergey M․
e125c21531 [vesti] Restore vesti extractor 2014-03-17 02:01:01 +07:00
Sergey M․
93d020dd65 [generic] Add support for embedded rutv player 2014-03-17 02:00:31 +07:00
Sergey M․
a7515ec265 [rutv] Refactor vgtrk/rutv extractor 2014-03-17 01:59:40 +07:00
Jaime Marquínez Ferrándiz
b6c1ceccc2 [ted] Add 'http://' to the thumbnail url if it's missing 2014-03-16 11:24:11 +01:00
Jaime Marquínez Ferrándiz
4056ad8f36 Build and upload universal wheels to pypi 2014-03-16 10:22:41 +01:00
Philipp Hagemeister
6563837ee1 [udemy] Make sure test case is not inherited 2014-03-16 07:09:10 +01:00
Philipp Hagemeister
fd5e6f7ef2 [vevo] Mark all test timestamps as approximate 2014-03-16 07:05:48 +01:00
Sergey M․
15fd51b37c [generic] More generic support for embedded vimeo player (#1602) 2014-03-16 00:47:04 +07:00
Sergey M․
f1cef7a9ff [iprima] Skip test 2014-03-15 01:39:42 +07:00
Sergey M․
8264223511 [iprima] Add access permission check 2014-03-15 01:38:44 +07:00
Jaime Marquínez Ferrándiz
bc6d597828 Add bestvideo and worstvideo to special format names (#2163) 2014-03-14 17:01:47 +01:00
Philipp Hagemeister
aba77bbfc2 [vevo] Adapt test to constantly changing timestamp 2014-03-13 18:45:14 +01:00
Philipp Hagemeister
955c451456 Rename upload_timestamp to timestamp 2014-03-13 18:45:14 +01:00
Sergey M․
e5de3f6c89 [udemy] Initial support for free courses (#1617) 2014-03-14 00:36:39 +07:00
Philipp Hagemeister
2a1db721d4 [test_download] Move assertions before debugging output 2014-03-13 17:05:51 +01:00
Philipp Hagemeister
1e0eb60f1a [videobam] Fix empty title handling 2014-03-13 17:03:43 +01:00
Philipp Hagemeister
87a29e6f25 [wdr] Add description to tests 2014-03-13 17:01:58 +01:00
Philipp Hagemeister
c3d36f134f [googlesearch] Fix next page indicator check 2014-03-13 16:52:13 +01:00
Philipp Hagemeister
84769e708c [ninegag] Fix extraction 2014-03-13 16:40:53 +01:00
Philipp Hagemeister
9d2ecdbc71 [vevo] Centralize timestamp handling 2014-03-13 15:30:25 +01:00
Philipp Hagemeister
9b69af5342 Merge remote-tracking branch 'soult/br' 2014-03-13 14:35:34 +01:00
David Triendl
c21215b421 [br] Allow '/' in URL, allow empty author + broadcastDate fields
* Allow URLs that have a 'subdirectory' before the actual program name, e.g.
  'xyz/xyz-episode-1'.
* The author and broadcastDate fields in the XML file may be empty.
* Add test case for the two problems above.
2014-03-13 14:08:34 +01:00
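The relaxed `_VALID_URL` this commit describes (it appears verbatim in the br.py diff further down) can be exercised directly; the URL here is the hypothetical 'xyz/xyz-episode-1' example from the commit message:

```python
import re

# _VALID_URL after this change: an optional 'subdirectory' segment is
# allowed between /video/ (or /video/sendungen/) and the program name
VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?:[a-z0-9\-/]+/)?(?P<id>[a-z0-9\-]+)\.html$"

m = re.match(VALID_URL, "http://www.br.de/mediathek/video/sendungen/xyz/xyz-episode-1.html")
print(m.group('id'))  # xyz-episode-1
```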
Philipp Hagemeister
cddcfd90b4 [funnyordie] Correct JSON interpretation 2014-03-13 00:53:19 +01:00
Sergey M․
f36aacba0f [collegehumor] Fix one more test 2014-03-13 06:25:12 +07:00
Sergey M․
355271fb61 [collegehumor] Extract like count 2014-03-13 06:12:39 +07:00
Sergey M․
2a5b502364 [collegehumor] Fix test 2014-03-13 06:09:21 +07:00
Philipp Hagemeister
98ff9d82d4 release 2014.03.12 2014-03-12 14:50:14 +01:00
Jaime Marquínez Ferrándiz
b1ff87224c [vimeo] Now VimeoIE doesn't match urls of channels with a numeric id (fixes #2552) 2014-03-12 14:23:06 +01:00
Sergey M․
b461641fb9 [wdr] Add support for WDR sites (Closes #1367) 2014-03-12 04:20:47 +07:00
Sergey M․
b047de6f6e Add format to unified_strdate 2014-03-12 04:18:43 +07:00
39 changed files with 916 additions and 323 deletions

View File

@@ -36,6 +36,9 @@ which means you can modify it, redistribute it or use it however you like.
                                an empty string (--proxy "") for direct
                                connection
     --no-check-certificate     Suppress HTTPS certificate validation.
+    --prefer-insecure          Use an unencrypted connection to retrieve
+                               information about the video. (Currently
+                               supported only for YouTube)
     --cache-dir DIR            Location in the filesystem where youtube-dl
                                can store some downloaded information
                                permanently. By default $XDG_CACHE_HOME
@@ -191,9 +194,9 @@ which means you can modify it, redistribute it or use it however you like.
                                preference using slashes: "-f 22/17/18".
                                "-f mp4" and "-f flv" are also supported.
                                You can also use the special names "best",
-                               "bestaudio", "worst", and "worstaudio". By
-                               default, youtube-dl will pick the best
-                               quality.
+                               "bestvideo", "bestaudio", "worst",
+                               "worstvideo" and "worstaudio". By default,
+                               youtube-dl will pick the best quality.
     --all-formats              download all available video formats
     --prefer-free-formats      prefer free video formats unless a specific
                                one is requested

View File

@@ -70,7 +70,7 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
 git checkout HEAD -- youtube-dl youtube-dl.exe
 /bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
-for f in $RELEASE_FILES; do gpg --detach-sig "build/$version/$f"; done
+for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
 scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
 ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
 ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
@@ -97,7 +97,7 @@ rm -rf build
 make pypi-files
 echo "Uploading to PyPi ..."
-python setup.py sdist upload
+python setup.py sdist bdist_wheel upload
 make clean
 /bin/echo -e "\n### DONE!"

setup.cfg (new file)

View File

@@ -0,0 +1,2 @@
+[wheel]
+universal = True

View File

@@ -71,7 +71,7 @@ class FakeYDL(YoutubeDL):
             old_report_warning(message)
         self.report_warning = types.MethodType(report_warning, self)

-def get_testcases():
+def gettestcases():
     for ie in youtube_dl.extractor.gen_extractors():
         t = getattr(ie, '_TEST', None)
         if t:

View File

@@ -182,6 +182,24 @@ class TestFormatSelection(unittest.TestCase):
         downloaded = ydl.downloaded_info_dicts[0]
         self.assertEqual(downloaded['format_id'], 'vid-high')

+    def test_format_selection_video(self):
+        formats = [
+            {'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none'},
+            {'format_id': 'dash-video-high', 'ext': 'mp4', 'preference': 2, 'acodec': 'none'},
+            {'format_id': 'vid', 'ext': 'mp4', 'preference': 3},
+        ]
+        info_dict = {'formats': formats, 'extractor': 'test'}
+
+        ydl = YDL({'format': 'bestvideo'})
+        ydl.process_ie_result(info_dict.copy())
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'dash-video-high')
+
+        ydl = YDL({'format': 'worstvideo'})
+        ydl.process_ie_result(info_dict.copy())
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'dash-video-low')
+
     def test_youtube_format_selection(self):
         order = [
             '38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '36', '17', '13',
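The selection rule these tests pin down can be sketched as a standalone function. This is a simplification of YoutubeDL's actual selector; it assumes, as youtube-dl does, that the formats list is ordered worst-to-best:

```python
def select_video_format(formats, spec):
    # Video-only formats are those whose audio codec is 'none';
    # formats are assumed ordered worst-to-best.
    video_only = [f for f in formats if f.get('acodec') == 'none']
    if not video_only:
        return None
    return video_only[-1] if spec == 'bestvideo' else video_only[0]

formats = [
    {'format_id': 'dash-video-low', 'acodec': 'none'},
    {'format_id': 'dash-video-high', 'acodec': 'none'},
    {'format_id': 'vid'},  # has audio, so never chosen by *video specs
]
print(select_video_format(formats, 'bestvideo')['format_id'])   # dash-video-high
print(select_video_format(formats, 'worstvideo')['format_id'])  # dash-video-low
```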

View File

@@ -9,7 +9,7 @@ import unittest
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

-from test.helper import get_testcases
+from test.helper import gettestcases

 from youtube_dl.extractor import (
     FacebookIE,
@@ -105,7 +105,7 @@ class TestAllURLsMatching(unittest.TestCase):
     def test_no_duplicates(self):
         ies = gen_extractors()
-        for tc in get_testcases():
+        for tc in gettestcases():
             url = tc['url']
             for ie in ies:
                 if type(ie).__name__ in ('GenericIE', tc['name'] + 'IE'):
@@ -124,6 +124,8 @@ class TestAllURLsMatching(unittest.TestCase):
     def test_vimeo_matching(self):
         self.assertMatch('http://vimeo.com/channels/tributes', ['vimeo:channel'])
+        self.assertMatch('http://vimeo.com/channels/31259', ['vimeo:channel'])
+        self.assertMatch('http://vimeo.com/channels/31259/53576664', ['vimeo'])
         self.assertMatch('http://vimeo.com/user7108434', ['vimeo:user'])
         self.assertMatch('http://vimeo.com/user7108434/videos', ['vimeo:user'])
         self.assertMatch('https://vimeo.com/user21297594/review/75524534/3c257a1b5d', ['vimeo:review'])
@@ -139,6 +141,7 @@ class TestAllURLsMatching(unittest.TestCase):
     def test_pbs(self):
         # https://github.com/rg3/youtube-dl/issues/2350
         self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['PBS'])
+        self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['PBS'])

 if __name__ == '__main__':
     unittest.main()

View File

@@ -8,7 +8,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from test.helper import (
     get_params,
-    get_testcases,
+    gettestcases,
     try_rm,
     md5,
     report_warning
@@ -51,7 +51,7 @@ def _file_md5(fn):
     with open(fn, 'rb') as f:
         return hashlib.md5(f.read()).hexdigest()

-defs = get_testcases()
+defs = gettestcases()

 class TestDownload(unittest.TestCase):
@@ -144,6 +144,10 @@ def generator(test_case):
                     self.assertTrue(
                         isinstance(got, compat_str) and match_rex.match(got),
                         u'field %s (value: %r) should match %r' % (info_field, got, match_str))
+                elif isinstance(expected, type):
+                    got = info_dict.get(info_field)
+                    self.assertTrue(isinstance(got, expected),
+                        u'Expected type %r, but got value %r of type %r' % (expected, got, type(got)))
                 else:
                     if isinstance(expected, compat_str) and expected.startswith('md5:'):
                         got = 'md5:' + md5(info_dict.get(info_field))
@@ -152,19 +156,19 @@ def generator(test_case):
                     self.assertEqual(expected, got,
                         u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))

-            # If checkable fields are missing from the test case, print the info_dict
-            test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
-                for key, value in info_dict.items()
-                if value and key in ('title', 'description', 'uploader', 'upload_date', 'uploader_id', 'location'))
-            if not all(key in tc.get('info_dict', {}).keys() for key in test_info_dict.keys()):
-                sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=4) + u'\n')
-
             # Check for the presence of mandatory fields
             for key in ('id', 'url', 'title', 'ext'):
                 self.assertTrue(key in info_dict.keys() and info_dict[key])
             # Check for mandatory fields that are automatically set by YoutubeDL
             for key in ['webpage_url', 'extractor', 'extractor_key']:
                 self.assertTrue(info_dict.get(key), u'Missing field: %s' % key)
+
+            # If checkable fields are missing from the test case, print the info_dict
+            test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
+                for key, value in info_dict.items()
+                if value and key in ('title', 'description', 'uploader', 'upload_date', 'timestamp', 'uploader_id', 'location'))
+            if not all(key in tc.get('info_dict', {}).keys() for key in test_info_dict.keys()):
+                sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=4) + u'\n')
         finally:
             try_rm_tcs_files()
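The new `isinstance(expected, type)` branch lets a test case specify a type (e.g. `int`) instead of a literal value. In isolation the check behaves like this sketch (the helper name is hypothetical):

```python
def check_field(expected, got):
    # If the expected value is a type, only the type of the extracted
    # value is checked; otherwise the values are compared as before.
    if isinstance(expected, type):
        return isinstance(got, expected)
    return expected == got

print(check_field(int, 1395360000))         # True: any int passes
print(check_field('20140321', '20140321'))  # True: literal match
print(check_field(int, '1395360000'))       # False: wrong type
```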

View File

@@ -249,7 +249,7 @@ class TestPlaylists(unittest.TestCase):
         self.assertIsPlaylist(result)
         self.assertEqual(result['id'], 'python language')
         self.assertEqual(result['title'], 'python language')
-        self.assertTrue(len(result['entries']) == 15)
+        self.assertEqual(len(result['entries']), 15)

     def test_generic_rss_feed(self):
         dl = FakeYDL()
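The switch from `assertTrue(len(...) == 15)` to `assertEqual(len(...), 15)` changes only the failure message: `assertEqual` reports both operands, while `assertTrue` can only say `False is not true`. A quick demonstration:

```python
import unittest

t = unittest.TestCase()  # standalone instance, fine on Python 3

try:
    t.assertEqual(14, 15)
except AssertionError as e:
    print(e)  # 14 != 15  -- both operands visible

try:
    t.assertTrue(14 == 15)
except AssertionError as e:
    print(e)  # False is not true  -- the actual count is lost
```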

View File

@@ -1,4 +0,0 @@
-# Legacy file for backwards compatibility, use youtube_dl.extractor instead!
-
-from .extractor.common import InfoExtractor, SearchInfoExtractor
-from .extractor import gen_extractors, get_info_extractor

View File

@@ -4,6 +4,7 @@
 from __future__ import absolute_import, unicode_literals

 import collections
+import datetime
 import errno
 import io
 import json
@@ -147,6 +148,8 @@ class YoutubeDL(object):
                        again.
     cookiefile:        File name where cookies should be read from and dumped to.
     nocheckcertificate:Do not verify SSL certificates
+    prefer_insecure:   Use HTTP instead of HTTPS to retrieve information.
+                       At the moment, this is only supported by YouTube.
     proxy:             URL of the proxy server to use
     socket_timeout:    Time to wait for unresponsive hosts, in seconds
     bidi_workaround:   Work around buggy terminals without bidirectional text
@@ -532,7 +535,7 @@ class YoutubeDL(object):
                 else:
                     raise
         else:
-            self.report_error('no suitable InfoExtractor: %s' % url)
+            self.report_error('no suitable InfoExtractor for URL %s' % url)

     def process_ie_result(self, ie_result, download=True, extra_info={}):
         """
@@ -666,6 +669,18 @@ class YoutubeDL(object):
                 if f.get('vcodec') == 'none']
            if audio_formats:
                return audio_formats[0]
+        elif format_spec == 'bestvideo':
+            video_formats = [
+                f for f in available_formats
+                if f.get('acodec') == 'none']
+            if video_formats:
+                return video_formats[-1]
+        elif format_spec == 'worstvideo':
+            video_formats = [
+                f for f in available_formats
+                if f.get('acodec') == 'none']
+            if video_formats:
+                return video_formats[0]
         else:
             extensions = ['mp4', 'flv', 'webm', '3gp']
             if format_spec in extensions:
@@ -688,6 +703,11 @@ class YoutubeDL(object):
         if 'display_id' not in info_dict and 'id' in info_dict:
             info_dict['display_id'] = info_dict['id']

+        if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
+            upload_date = datetime.datetime.utcfromtimestamp(
+                info_dict['timestamp'])
+            info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
+
         # This extractors handle format selection themselves
         if info_dict['extractor'] in ['Youku']:
             if download:
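The new fallback above derives `upload_date` from a POSIX `timestamp` when an extractor supplies only the latter. In isolation the conversion looks like this (the sample timestamp is illustrative):

```python
import datetime

timestamp = 1394582400  # as an extractor might report
upload_date = datetime.datetime.utcfromtimestamp(timestamp).strftime('%Y%m%d')
print(upload_date)  # 20140312
```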

View File

@@ -56,7 +56,6 @@ __authors__ = (
 __license__ = 'Public Domain'

 import codecs
-import getpass
 import io
 import locale
 import optparse
@@ -68,6 +67,7 @@ import sys

 from .utils import (
+    compat_getpass,
     compat_print,
     DateRange,
     decodeOption,
@@ -237,6 +237,9 @@ def parseOpts(overrideArguments=None):
         '--proxy', dest='proxy', default=None, metavar='URL',
         help='Use the specified HTTP/HTTPS proxy. Pass in an empty string (--proxy "") for direct connection')
     general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
+    general.add_option(
+        '--prefer-insecure', action='store_true', dest='prefer_insecure',
+        help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
     general.add_option(
         '--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
         help='Location in the filesystem where youtube-dl can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl . At the moment, only YouTube player files (for videos with obfuscated signatures) are cached, but that may change.')
@@ -257,7 +260,6 @@ def parseOpts(overrideArguments=None):
         action='store_true',
         help='Do not read configuration files. When given in the global configuration file /etc/youtube-dl.conf: do not read the user configuration in ~/.config/youtube-dl.conf (%APPDATA%/youtube-dl/config.txt on Windows)')
-
     selection.add_option(
         '--playlist-start',
         dest='playliststart', metavar='NUMBER', default=1, type=int,
@@ -316,7 +318,7 @@ def parseOpts(overrideArguments=None):
     video_format.add_option('-f', '--format',
         action='store', dest='format', metavar='FORMAT', default=None,
-        help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestaudio", "worst", and "worstaudio". By default, youtube-dl will pick the best quality.')
+        help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestvideo", "bestaudio", "worst", "worstvideo" and "worstaudio". By default, youtube-dl will pick the best quality.')
     video_format.add_option('--all-formats',
         action='store_const', dest='format', help='download all available video formats', const='all')
     video_format.add_option('--prefer-free-formats',
@@ -611,7 +613,7 @@ def _real_main(argv=None):
     if opts.usetitle and opts.useid:
         parser.error(u'using title conflicts with using video ID')
     if opts.username is not None and opts.password is None:
-        opts.password = getpass.getpass(u'Type account password and press return:')
+        opts.password = compat_getpass(u'Type account password and press [Return]: ')
     if opts.ratelimit is not None:
         numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
         if numeric_limit is None:
@@ -756,6 +758,7 @@ def _real_main(argv=None):
         'download_archive': download_archive_fn,
         'cookiefile': opts.cookiefile,
         'nocheckcertificate': opts.no_check_certificate,
+        'prefer_insecure': opts.prefer_insecure,
         'proxy': opts.proxy,
         'socket_timeout': opts.socket_timeout,
         'bidi_workaround': opts.bidi_workaround,

View File

@@ -10,6 +10,7 @@ from .arte import (
     ArteTvIE,
     ArteTVPlus7IE,
     ArteTVCreativeIE,
+    ArteTVConcertIE,
     ArteTVFutureIE,
     ArteTVDDCIE,
 )
@@ -173,6 +174,7 @@ from .nowness import NownessIE
 from .nowvideo import NowVideoIE
 from .ooyala import OoyalaIE
 from .orf import ORFIE
+from .parliamentliveuk import ParliamentLiveUKIE
 from .pbs import PBSIE
 from .photobucket import PhotobucketIE
 from .playvid import PlayvidIE
@@ -196,6 +198,7 @@ from .rutube import (
     RutubeMovieIE,
     RutubePersonIE,
 )
+from .rutv import RUTVIE
 from .savefrom import SaveFromIE
 from .servingsys import ServingSysIE
 from .sina import SinaIE
@@ -242,13 +245,17 @@ from .tumblr import TumblrIE
 from .tutv import TutvIE
 from .tvigle import TvigleIE
 from .tvp import TvpIE
+from .udemy import (
+    UdemyIE,
+    UdemyCourseIE
+)
 from .unistra import UnistraIE
 from .ustream import UstreamIE, UstreamChannelIE
 from .vbox7 import Vbox7IE
 from .veehd import VeeHDIE
 from .veoh import VeohIE
+from .vesti import VestiIE
 from .vevo import VevoIE
+from .vgtrk import VGTRKIE
 from .vice import ViceIE
 from .viddler import ViddlerIE
 from .videobam import VideoBamIE
@@ -268,6 +275,7 @@ from .viki import VikiIE
 from .vk import VKIE
 from .vube import VubeIE
 from .wat import WatIE
+from .wdr import WDRIE
 from .weibo import WeiboIE
 from .wimp import WimpIE
 from .wistia import WistiaIE

View File

@@ -131,7 +131,7 @@ class ArteTvIE(InfoExtractor):

 class ArteTVPlus7IE(InfoExtractor):
     IE_NAME = 'arte.tv:+7'
-    _VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'

     @classmethod
     def _extract_url_info(cls, url):
@@ -202,6 +202,8 @@ class ArteTVPlus7IE(InfoExtractor):
                 re.match(r'VO-ST(F|A)', f.get('versionCode', '')) is None,
                 # The version with sourds/mal subtitles has also lower relevance
                 re.match(r'VO?(F|A)-STM\1', f.get('versionCode', '')) is None,
+                # Prefer http downloads over m3u8
+                0 if f['url'].endswith('m3u8') else 1,
             )
         formats = sorted(formats, key=sort_key)

         def _format(format_info):
@@ -242,8 +244,9 @@ class ArteTVCreativeIE(ArteTVPlus7IE):

     _TEST = {
         'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
-        'file': '050489-002.mp4',
         'info_dict': {
+            'id': '050489-002',
+            'ext': 'mp4',
             'title': 'Agentur Amateur / Agence Amateur #2 : Corporate Design',
         },
     }
@@ -255,8 +258,9 @@ class ArteTVFutureIE(ArteTVPlus7IE):

     _TEST = {
         'url': 'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
-        'file': '050940-003.mp4',
         'info_dict': {
+            'id': '050940-003',
+            'ext': 'mp4',
             'title': 'Les champignons au secours de la planète',
         },
     }
@@ -270,7 +274,7 @@ class ArteTVFutureIE(ArteTVPlus7IE):

 class ArteTVDDCIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:ddc'
-    _VALID_URL = r'http?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
+    _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'

     def _real_extract(self, url):
         video_id, lang = self._extract_url_info(url)
@@ -284,3 +288,19 @@ class ArteTVDDCIE(ArteTVPlus7IE):
         javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
         json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
         return self._extract_from_json_url(json_url, video_id, lang)
+
+
+class ArteTVConcertIE(ArteTVPlus7IE):
+    IE_NAME = 'arte.tv:concert'
+    _VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
+
+    _TEST = {
+        'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
+        'md5': '9ea035b7bd69696b67aa2ccaaa218161',
+        'info_dict': {
+            'id': '186',
+            'ext': 'mp4',
+            'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
+            'upload_date': '20140128',
+        },
+    }
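The new sort-key component above ranks m3u8 URLs below direct HTTP downloads: the key `0` sorts first, and with youtube-dl's worst-to-best ordering the last entry is the preferred one. A simplified sketch (the URLs are placeholders):

```python
formats = [
    {'url': 'http://arte.example/video.mp4'},
    {'url': 'http://arte.example/video.m3u8'},
]

# m3u8 entries get key 0 and sort first, i.e. are least preferred;
# the direct HTTP download ends up last and wins
formats.sort(key=lambda f: 0 if f['url'].endswith('m3u8') else 1)
print(formats[-1]['url'])  # http://arte.example/video.mp4
```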

View File

@@ -9,21 +9,35 @@ from ..utils import ExtractorError
 class BRIE(InfoExtractor):
     IE_DESC = "Bayerischer Rundfunk Mediathek"
-    _VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?P<id>[a-z0-9\-]+)\.html$"
+    _VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?:[a-z0-9\-/]+/)?(?P<id>[a-z0-9\-]+)\.html$"
     _BASE_URL = "http://www.br.de"

-    _TEST = {
-        "url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
-        "md5": "c4f83cf0f023ba5875aba0bf46860df2",
-        "info_dict": {
-            "id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
-            "ext": "mp4",
-            "title": "Feiern und Verzichten",
-            "description": "Anselm Grün: Feiern und Verzichten",
-            "uploader": "BR/Birgit Baier",
-            "upload_date": "20140301"
+    _TESTS = [
+        {
+            "url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
+            "md5": "c4f83cf0f023ba5875aba0bf46860df2",
+            "info_dict": {
+                "id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
+                "ext": "mp4",
+                "title": "Feiern und Verzichten",
+                "description": "Anselm Grün: Feiern und Verzichten",
+                "uploader": "BR/Birgit Baier",
+                "upload_date": "20140301"
+            }
+        },
+        {
+            "url": "http://www.br.de/mediathek/video/sendungen/unter-unserem-himmel/unter-unserem-himmel-alpen-ueber-den-pass-100.html",
+            "md5": "ab451b09d861dbed7d7cc9ab0be19ebe",
+            "info_dict": {
+                "id": "2c060e69-3a27-4e13-b0f0-668fac17d812",
+                "ext": "mp4",
+                "title": "Über den Pass",
+                "description": "Die Eroberung der Alpen: Über den Pass",
+                "uploader": None,
+                "upload_date": None
+            }
         }
-    }
+    ]

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -33,16 +47,21 @@ class BRIE(InfoExtractor):
             r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/mediathek/video/[a-z0-9/~_.-]+)'}\)\);", page, "XMLURL")
         xml = self._download_xml(self._BASE_URL + xml_url, None)

-        videos = [{
-            "id": xml_video.get("externalId"),
-            "title": xml_video.find("title").text,
-            "formats": self._extract_formats(xml_video.find("assets")),
-            "thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
-            "description": " ".join(xml_video.find("shareTitle").text.splitlines()),
-            "uploader": xml_video.find("author").text,
-            "upload_date": "".join(reversed(xml_video.find("broadcastDate").text.split("."))),
-            "webpage_url": xml_video.find("permalink").text,
-        } for xml_video in xml.findall("video")]
+        videos = []
+        for xml_video in xml.findall("video"):
+            video = {
+                "id": xml_video.get("externalId"),
+                "title": xml_video.find("title").text,
+                "formats": self._extract_formats(xml_video.find("assets")),
+                "thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
+                "description": " ".join(xml_video.find("shareTitle").text.splitlines()),
+                "webpage_url": xml_video.find("permalink").text
+            }
+            if xml_video.find("author").text:
+                video["uploader"] = xml_video.find("author").text
+            if xml_video.find("broadcastDate").text:
+                video["upload_date"] = "".join(reversed(xml_video.find("broadcastDate").text.split(".")))
+            videos.append(video)

         if len(videos) > 1:
             self._downloader.report_warning(
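The `upload_date` handling above turns the dotted German broadcast date into the `YYYYMMDD` form youtube-dl expects; the transformation can be sketched on its own:

```python
# How the upload_date above is derived: broadcastDate arrives as DD.MM.YYYY
# and is reversed into YYYYMMDD (the sample value matches the first test)
broadcast_date = "01.03.2014"
upload_date = "".join(reversed(broadcast_date.split(".")))
```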


@@ -17,8 +17,9 @@ class CollegeHumorIE(InfoExtractor):
             'id': '6902724',
             'ext': 'mp4',
             'title': 'Comic-Con Cosplay Catastrophe',
-            'description': 'Fans get creative this year',
+            'description': "Fans get creative this year at San Diego. Too creative. And yes, that's really Joss Whedon.",
             'age_limit': 13,
+            'duration': 187,
         },
     },
     {
@@ -28,7 +29,7 @@ class CollegeHumorIE(InfoExtractor):
             'id': '3505939',
             'ext': 'mp4',
             'title': 'Font Conference',
-            'description': 'This video wasn\'t long enough,',
+            'description': "This video wasn't long enough, so we made it double-spaced.",
             'age_limit': 10,
             'duration': 179,
         },
     },
@@ -87,6 +88,7 @@ class CollegeHumorIE(InfoExtractor):
         self._sort_formats(formats)

         duration = int_or_none(vdata.get('duration'), 1000)
+        like_count = int_or_none(vdata.get('likes'))

         return {
             'id': video_id,
@@ -96,4 +98,5 @@ class CollegeHumorIE(InfoExtractor):
             'formats': formats,
             'age_limit': age_limit,
             'duration': duration,
+            'like_count': like_count,
         }


@@ -14,7 +14,7 @@ from ..utils import (
 class ComedyCentralIE(MTVServicesInfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?comedycentral\.com/
+    _VALID_URL = r'''(?x)https?://(?:www\.)?(comedycentral|cc)\.com/
         (video-clips|episodes|cc-studios|video-collections)
         /(?P<title>.*)'''
     _FEED_URL = 'http://comedycentral.com/feeds/mrss/'


@@ -97,7 +97,9 @@ class InfoExtractor(object):
     thumbnail:      Full URL to a video thumbnail image.
     description:    One-line video description.
     uploader:       Full name of the video uploader.
+    timestamp:      UNIX timestamp of the moment the video became available.
     upload_date:    Video upload date (YYYYMMDD).
+                    If not explicitly set, calculated from timestamp.
     uploader_id:    Nickname or id of the video uploader.
     location:       Physical location of the video.
     subtitles:      The subtitle file contents as a dictionary in the format


@@ -10,9 +10,9 @@ from ..utils import (
 class CSpanIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>\d+)'
+    _VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
     IE_DESC = 'C-SPAN'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.c-span.org/video/?313572-1/HolderonV',
         'md5': '8e44ce11f0f725527daccc453f553eb0',
         'info_dict': {
@@ -22,13 +22,24 @@ class CSpanIE(InfoExtractor):
             'description': 'Attorney General Eric Holder spoke to reporters following the Supreme Court decision in Shelby County v. Holder in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced until Congress established new guidelines for review.',
         },
         'skip': 'Regularly fails on travis, for unknown reasons',
-    }
+    }, {
+        'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
+        # For whatever reason, the served video alternates between
+        # two different ones
+        #'md5': 'dbb0f047376d457f2ab8b3929cbb2d0c',
+        'info_dict': {
+            'id': '340723',
+            'ext': 'mp4',
+            'title': 'International Health Care Models',
+            'description': 'md5:7a985a2d595dba00af3d9c9f0783c967',
+        }
+    }]

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         page_id = mobj.group('id')
         webpage = self._download_webpage(url, page_id)
-        video_id = self._search_regex(r'data-progid=\'(\d+)\'>', webpage, 'video id')
+        video_id = self._search_regex(r'progid=\'?([0-9]+)\'?>', webpage, 'video id')

         description = self._html_search_regex(
             [
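The relaxed `progid` pattern above tolerates both the older quoted attribute and an unquoted variant; a standalone check against two hypothetical page fragments:

```python
import re

# The regex from the updated C-SPAN extractor, matched against made-up markup
pattern = r'progid=\'?([0-9]+)\'?>'

quoted = "<div data-progid='340723'>"
unquoted = "<div progid=340723>"
ids = [re.search(pattern, html).group(1) for html in (quoted, unquoted)]
```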


@@ -34,12 +34,14 @@ class FunnyOrDieIE(InfoExtractor):
         if mobj.group('type') == 'embed':
             post_json = self._search_regex(
                 r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
-            post = json.loads(post_json)['attachment']
+            post = json.loads(post_json)
             title = post['name']
             description = post.get('description')
+            thumbnail = post.get('picture')
         else:
             title = self._og_search_title(webpage)
             description = self._og_search_description(webpage)
+            thumbnail = None

         return {
             'id': video_id,
@@ -47,4 +49,5 @@ class FunnyOrDieIE(InfoExtractor):
             'ext': 'mp4',
             'title': title,
             'description': description,
+            'thumbnail': thumbnail,
         }


@@ -24,6 +24,7 @@ from ..utils import (
 )
 from .brightcove import BrightcoveIE
 from .ooyala import OoyalaIE
+from .rutv import RUTVIE


 class GenericIE(InfoExtractor):
@@ -143,8 +144,34 @@ class GenericIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'Between Two Ferns with Zach Galifianakis: President Barack Obama',
                 'description': 'Episode 18: President Barack Obama sits down with Zach Galifianakis for his most memorable interview yet.',
-            }
+            },
         },
+        # RUTV embed
+        {
+            'url': 'http://www.rg.ru/2014/03/15/reg-dfo/anklav-anons.html',
+            'info_dict': {
+                'id': '776940',
+                'ext': 'mp4',
+                'title': 'Охотское море стало целиком российским',
+                'description': 'md5:5ed62483b14663e2a95ebbe115eb8f43',
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
+        },
+        # Embedded TED video
+        {
+            'url': 'http://en.support.wordpress.com/videos/ted-talks/',
+            'md5': 'deeeabcc1085eb2ba205474e7235a3d5',
+            'info_dict': {
+                'id': '981',
+                'ext': 'mp4',
+                'title': 'My web playroom',
+                'uploader': 'Ze Frank',
+                'description': 'md5:ddb2a40ecd6b6a147e400e535874947b',
+            }
+        }
     ]

     def report_download_webpage(self, video_id):
@@ -170,9 +197,14 @@ class GenericIE(InfoExtractor):
                 newurl = newurl.replace(' ', '%20')
                 newheaders = dict((k,v) for k,v in req.headers.items()
                                   if k.lower() not in ("content-length", "content-type"))
+                try:
+                    # This function was deprecated in python 3.3 and removed in 3.4
+                    origin_req_host = req.get_origin_req_host()
+                except AttributeError:
+                    origin_req_host = req.origin_req_host
                 return HEADRequest(newurl,
                                    headers=newheaders,
-                                   origin_req_host=req.get_origin_req_host(),
+                                   origin_req_host=origin_req_host,
                                    unverifiable=True)
             else:
                 raise compat_urllib_error.HTTPError(req.get_full_url(), code, msg, headers, fp)
@@ -324,9 +356,9 @@ class GenericIE(InfoExtractor):
         # Look for embedded (iframe) Vimeo player
         mobj = re.search(
-            r'<iframe[^>]+?src="((?:https?:)?//player\.vimeo\.com/video/.+?)"', webpage)
+            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
         if mobj:
-            player_url = unescapeHTML(mobj.group(1))
+            player_url = unescapeHTML(mobj.group('url'))
             surl = smuggle_url(player_url, {'Referer': url})
             return self.url_result(surl, 'Vimeo')
@@ -451,6 +483,11 @@ class GenericIE(InfoExtractor):
             return self.playlist_result(
                 urlrs, playlist_id=video_id, playlist_title=video_title)

+        # Look for embedded RUTV player
+        rutv_url = RUTVIE._extract_url(webpage)
+        if rutv_url:
+            return self.url_result(rutv_url, 'RUTV')
+
         # Start with something easy: JW Player in SWFObject
         mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
         if mobj is None:
@@ -462,6 +499,13 @@ class GenericIE(InfoExtractor):
         if mobj is None:
             # Broaden the search a little bit: JWPlayer JS loader
             mobj = re.search(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage)
+
+        # Look for embedded TED player
+        mobj = re.search(
+            r'<iframe[^>]+?src=(["\'])(?P<url>http://embed\.ted\.com/.+?)\1', webpage)
+        if mobj is not None:
+            return self.url_result(mobj.group('url'), 'TED')
+
         if mobj is None:
             # Try to find twitter cards info
             mobj = re.search(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)
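The `origin_req_host` fallback in the redirect handler above can be illustrated on its own; this sketch uses Python 3's `urllib.request` directly (the extractor itself goes through youtube-dl's compat layer), where `get_origin_req_host()` has been removed, so the `AttributeError` branch fires:

```python
import urllib.request

# Prefer the old accessor, fall back to the attribute that replaced it
# (get_origin_req_host() was deprecated in 3.3 and removed in 3.4)
req = urllib.request.Request('http://example.com/some/page')
try:
    origin_req_host = req.get_origin_req_host()
except AttributeError:
    origin_req_host = req.origin_req_host
```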


@@ -46,6 +46,6 @@ class GoogleSearchIE(SearchInfoExtractor):
                     'url': mobj.group(1)
                 })

-            if (len(entries) >= n) or not re.search(r'class="pn" id="pnnext"', webpage):
+            if (len(entries) >= n) or not re.search(r'id="pnnext"', webpage):
                 res['entries'] = entries[:n]
                 return res


@@ -6,7 +6,10 @@ from random import random
 from math import floor

 from .common import InfoExtractor
-from ..utils import compat_urllib_request
+from ..utils import (
+    compat_urllib_request,
+    ExtractorError,
+)


 class IPrimaIE(InfoExtractor):
@@ -36,6 +39,7 @@ class IPrimaIE(InfoExtractor):
         'params': {
             'skip_download': True,  # requires rtmpdump
         },
+        'skip': 'Do not have permission to access this page',
     }]

     def _real_extract(self, url):
@@ -44,6 +48,10 @@ class IPrimaIE(InfoExtractor):

         webpage = self._download_webpage(url, video_id)

+        if re.search(r'Nemáte oprávnění přistupovat na tuto stránku\.\s*</div>', webpage):
+            raise ExtractorError(
+                '%s said: You do not have permission to access this page' % self.IE_NAME, expected=True)
+
         player_url = (
             'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' %
             (floor(random()*1073741824), floor(random()*1073741824))


@@ -4,6 +4,7 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
+from ..utils import int_or_none


 class KontrTubeIE(InfoExtractor):
@@ -32,27 +33,26 @@ class KontrTubeIE(InfoExtractor):
         video_url = self._html_search_regex(r"video_url: '(.+?)/?',", webpage, 'video URL')
         thumbnail = self._html_search_regex(r"preview_url: '(.+?)/?',", webpage, 'video thumbnail', fatal=False)
-        title = self._html_search_regex(r'<title>(.+?) - Труба зовёт - Интересный видеохостинг</title>', webpage,
-            'video title')
+        title = self._html_search_regex(
+            r'<title>(.+?) - Труба зовёт - Интересный видеохостинг</title>', webpage, 'video title')
         description = self._html_search_meta('description', webpage, 'video description')

-        mobj = re.search(r'<div class="col_2">Длительность: <span>(?P<minutes>\d+)м:(?P<seconds>\d+)с</span></div>',
-            webpage)
+        mobj = re.search(
+            r'<div class="col_2">Длительность: <span>(?P<minutes>\d+)м:(?P<seconds>\d+)с</span></div>', webpage)
         duration = int(mobj.group('minutes')) * 60 + int(mobj.group('seconds')) if mobj else None

-        view_count = self._html_search_regex(r'<div class="col_2">Просмотров: <span>(\d+)</span></div>', webpage,
-            'view count', fatal=False)
-        view_count = int(view_count) if view_count is not None else None
+        view_count = self._html_search_regex(
+            r'<div class="col_2">Просмотров: <span>(\d+)</span></div>', webpage, 'view count', fatal=False)

         comment_count = None
-        comment_str = self._html_search_regex(r'Комментарии: <span>([^<]+)</span>', webpage, 'comment count',
-            fatal=False)
+        comment_str = self._html_search_regex(
+            r'Комментарии: <span>([^<]+)</span>', webpage, 'comment count', fatal=False)
         if comment_str.startswith('комментариев нет'):
             comment_count = 0
         else:
             mobj = re.search(r'\d+ из (?P<total>\d+) комментариев', comment_str)
             if mobj:
-                comment_count = int(mobj.group('total'))
+                comment_count = mobj.group('total')

         return {
             'id': video_id,
@@ -61,6 +61,6 @@ class KontrTubeIE(InfoExtractor):
             'title': title,
             'description': description,
             'duration': duration,
-            'view_count': view_count,
-            'comment_count': comment_count,
+            'view_count': int_or_none(view_count),
+            'comment_count': int_or_none(comment_count),
         }
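The modernization above leans on `int_or_none` from youtube-dl's utils. As a simplified stand-in (the real helper also supports scaling), the point is that it tolerates the `None` that `fatal=False` regex lookups can return, instead of raising `TypeError`:

```python
def int_or_none(v, default=None):
    # Simplified sketch of youtube_dl.utils.int_or_none
    return default if v is None else int(v)

view_count = int_or_none('1841')
missing = int_or_none(None)
```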


@@ -1,6 +1,5 @@
 from __future__ import unicode_literals

-import json
 import re

 from .common import InfoExtractor
@@ -12,10 +11,13 @@ class NineGagIE(InfoExtractor):
     _TEST = {
         "url": "http://9gag.tv/v/1912",
-        "file": "1912.mp4",
         "info_dict": {
+            "id": "1912",
+            "ext": "mp4",
             "description": "This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
-            "title": "\"People Are Awesome 2013\" Is Absolutely Awesome"
+            "title": "\"People Are Awesome 2013\" Is Absolutely Awesome",
+            "view_count": int,
+            "thumbnail": "re:^https?://",
         },
         'add_ie': ['Youtube']
     }
@@ -25,21 +27,27 @@ class NineGagIE(InfoExtractor):
         video_id = mobj.group('id')

         webpage = self._download_webpage(url, video_id)
-        data_json = self._html_search_regex(r'''(?x)
-            <div\s*id="tv-video"\s*data-video-source="youtube"\s*
-            data-video-meta="([^"]+)"''', webpage, 'video metadata')
-        data = json.loads(data_json)
+        youtube_id = self._html_search_regex(
+            r'(?s)id="jsid-video-post-container".*?data-external-id="([^"]+)"',
+            webpage, 'video ID')
+        description = self._html_search_regex(
+            r'(?s)<div class="video-caption">.*?<p>(.*?)</p>', webpage,
+            'description', fatal=False)
+        view_count_str = self._html_search_regex(
+            r'<p><b>([0-9][0-9,]*)</b> views</p>', webpage, 'view count',
+            fatal=False)
+        view_count = (
+            None if view_count_str is None
+            else int(view_count_str.replace(',', '')))

         return {
             '_type': 'url_transparent',
-            'url': data['youtubeVideoId'],
+            'url': youtube_id,
             'ie_key': 'Youtube',
             'id': video_id,
-            'title': data['title'],
-            'description': data['description'],
-            'view_count': int(data['view_count']),
-            'like_count': int(data['statistic']['like']),
-            'dislike_count': int(data['statistic']['dislike']),
-            'thumbnail': data['thumbnail_url'],
+            'title': self._og_search_title(webpage),
+            'description': description,
+            'view_count': view_count,
+            'thumbnail': self._og_search_thumbnail(webpage),
         }
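The view-count parsing introduced above keeps the comma-grouped digits in the capture and strips them before converting; run against a hypothetical page fragment:

```python
import re

# Made-up markup shaped like what the new 9gag regex targets
webpage = '<p><b>1,234,567</b> views</p>'
view_count_str = re.search(r'<p><b>([0-9][0-9,]*)</b> views</p>', webpage).group(1)
view_count = int(view_count_str.replace(',', ''))
```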


@@ -0,0 +1,57 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    unified_strdate,
+)
+
+
+class ParliamentLiveUKIE(InfoExtractor):
+    IE_NAME = 'parliamentlive.tv'
+    IE_DESC = 'UK parliament videos'
+    _VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
+
+    _TEST = {
+        'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
+        'info_dict': {
+            'id': '15121',
+            'ext': 'asf',
+            'title': 'hoc home affairs committee, 18 mar 2014.pm',
+            'description': 'md5:033b3acdf83304cd43946b2d5e5798d1',
+        },
+        'params': {
+            'skip_download': True,  # Requires mplayer (mms)
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        webpage = self._download_webpage(url, video_id)
+
+        asx_url = self._html_search_regex(
+            r'embed.*?src="([^"]+)" name="MediaPlayer"', webpage,
+            'metadata URL')
+        asx = self._download_xml(asx_url, video_id, 'Downloading ASX metadata')
+        video_url = asx.find('.//REF').attrib['HREF']
+
+        title = self._search_regex(
+            r'''(?x)player\.setClipDetails\(
+                (?:(?:[0-9]+|"[^"]+"),\s*){2}
+                "([^"]+",\s*"[^"]+)"
+            ''',
+            webpage, 'title').replace('", "', ', ')
+        description = self._html_search_regex(
+            r'(?s)<span id="MainContentPlaceHolder_CaptionsBlock_WitnessInfo">(.*?)</span>',
+            webpage, 'description')
+
+        return {
+            'id': video_id,
+            'ext': 'asf',
+            'url': video_url,
+            'title': title,
+            'description': description,
+        }
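The `setClipDetails` title capture above skips the first two arguments and then glues the remaining two quoted strings together; exercised on a hypothetical JS call (the argument values are made up for illustration):

```python
import re

# Made-up snippet shaped like the page's setClipDetails call
js = 'player.setClipDetails(15121, "windowsmedia", "hoc home affairs committee", "18 mar 2014.pm")'
raw = re.search(
    r'''(?x)player\.setClipDetails\(
        (?:(?:[0-9]+|"[^"]+"),\s*){2}
        "([^"]+",\s*"[^"]+)"
    ''', js).group(1)
title = raw.replace('", "', ', ')
```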


@@ -3,6 +3,9 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
+from ..utils import (
+    US_RATINGS,
+)


 class PBSIE(InfoExtractor):
@@ -13,7 +16,7 @@ class PBSIE(InfoExtractor):
            # Article with embedded player
            (?:www\.)?pbs\.org/(?:[^/]+/){2,5}(?P<presumptive_id>[^/]+)/?(?:$|[?\#]) |
            # Player
-           video\.pbs\.org/partnerplayer/(?P<player_id>[^/]+)/
+           video\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
         )
     '''
@@ -57,6 +60,11 @@ class PBSIE(InfoExtractor):
         info_url = 'http://video.pbs.org/videoInfo/%s?format=json' % video_id
         info = self._download_json(info_url, display_id)

+        rating_str = info.get('rating')
+        if rating_str is not None:
+            rating_str = rating_str.rpartition('-')[2]
+        age_limit = US_RATINGS.get(rating_str)
+
         return {
             'id': video_id,
             'title': info['title'],
@@ -65,4 +73,5 @@ class PBSIE(InfoExtractor):
             'description': info['program'].get('description'),
             'thumbnail': info.get('image_url'),
             'duration': info.get('duration'),
+            'age_limit': age_limit,
         }
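The rating handling above can be sketched standalone. `US_RATINGS` here is an assumed stand-in for the mapping imported from `..utils`; the point is that "TV-PG"-style strings are reduced to their last dash-separated part before the lookup, and unknown ratings fall through to `None`:

```python
# Assumed stand-in for youtube_dl.utils.US_RATINGS
US_RATINGS = {'G': 0, 'PG': 10, 'PG-13': 13, 'R': 16, 'NC': 18}

rating_str = 'TV-PG'
rating_str = rating_str.rpartition('-')[2]   # 'TV-PG' -> 'PG'
age_limit = US_RATINGS.get(rating_str)       # None for unknown ratings
```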


@@ -10,141 +10,13 @@ from ..utils import (
 )


-class VGTRKIE(InfoExtractor):
-    IE_DESC = 'ВГТРК'
-    _VALID_URL = r'http://(?:.+?\.)?(?:vesti\.ru|russia2?\.tv|tvkultura\.ru|rutv\.ru)/(?P<id>.+)'
+class RUTVIE(InfoExtractor):
+    IE_DESC = 'RUTV.RU'
+    _VALID_URL = r'https?://player\.(?:rutv\.ru|vgtrk\.com)/(?:flash2v/container\.swf\?id=|iframe/(?P<type>swf|video|live)/id/)(?P<id>\d+)'

     _TESTS = [
         {
-            'url': 'http://www.vesti.ru/videos?vid=575582&cid=1',
-            'info_dict': {
-                'id': '765035',
-                'ext': 'mp4',
-                'title': 'Вести.net: биткоины в России не являются законными',
-                'description': 'md5:d4bb3859dc1177b28a94c5014c35a36b',
-                'duration': 302,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://www.vesti.ru/doc.html?id=1349233',
-            'info_dict': {
-                'id': '773865',
-                'ext': 'mp4',
-                'title': 'Участники митинга штурмуют Донецкую областную администрацию',
-                'description': 'md5:1a160e98b3195379b4c849f2f4958009',
-                'duration': 210,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://www.vesti.ru/only_video.html?vid=576180',
-            'info_dict': {
-                'id': '766048',
-                'ext': 'mp4',
-                'title': 'США заморозило, Британию затопило',
-                'description': 'md5:f0ed0695ec05aed27c56a70a58dc4cc1',
-                'duration': 87,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://hitech.vesti.ru/news/view/id/4000',
-            'info_dict': {
-                'id': '766888',
-                'ext': 'mp4',
-                'title': 'Вести.net: интернет-гиганты начали перетягивание программных "одеял"',
-                'description': 'md5:65ddd47f9830c4f42ed6475f8730c995',
-                'duration': 279,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://sochi2014.vesti.ru/video/index/video_id/766403',
-            'info_dict': {
-                'id': '766403',
-                'ext': 'mp4',
-                'title': 'XXII зимние Олимпийские игры. Российские хоккеисты стартовали на Олимпиаде с победы',
-                'description': 'md5:55805dfd35763a890ff50fa9e35e31b3',
-                'duration': 271,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-            'skip': 'Blocked outside Russia',
-        },
-        {
-            'url': 'http://sochi2014.vesti.ru/live/play/live_id/301',
-            'info_dict': {
-                'id': '51499',
-                'ext': 'flv',
-                'title': 'Сочи-2014. Биатлон. Индивидуальная гонка. Мужчины ',
-                'description': 'md5:9e0ed5c9d2fa1efbfdfed90c9a6d179c',
-            },
-            'params': {
-                # rtmp download
-                'skip_download': True,
-            },
-            'skip': 'Translation has finished'
-        },
-        {
-            'url': 'http://russia.tv/video/show/brand_id/5169/episode_id/970443/video_id/975648',
-            'info_dict': {
-                'id': '771852',
-                'ext': 'mp4',
-                'title': 'Прямой эфир. Жертвы загадочной болезни: смерть от старости в 17 лет',
-                'description': 'md5:b81c8c55247a4bd996b43ce17395b2d8',
-                'duration': 3096,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://russia.tv/brand/show/brand_id/57638',
-            'info_dict': {
-                'id': '774016',
-                'ext': 'mp4',
-                'title': 'Чужой в семье Сталина',
-                'description': '',
-                'duration': 2539,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://2.russia.tv/video/show/brand_id/48863/episode_id/972920/video_id/978667/viewtype/picture',
-            'info_dict': {
-                'id': '775081',
-                'ext': 'mp4',
-                'title': 'XXII зимние Олимпийские игры. Россияне заняли весь пьедестал в лыжных гонках',
-                'description': 'md5:15d3741dd8d04b203fbc031c6a47fb0f',
-                'duration': 101,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-            'skip': 'Blocked outside Russia',
-        },
-        {
-            'url': 'http://tvkultura.ru/video/show/brand_id/31724/episode_id/972347/video_id/978186',
+            'url': 'http://player.rutv.ru/flash2v/container.swf?id=774471&sid=kultura&fbv=true&isPlay=true&ssl=false&i=560&acc_video_id=episode_id/972347/video_id/978186/brand_id/31724',
             'info_dict': {
                 'id': '774471',
                 'ext': 'mp4',
@@ -158,64 +30,97 @@ class VGTRKIE(InfoExtractor):
             },
         },
         {
-            'url': 'http://rutv.ru/brand/show/id/6792/channel/75',
+            'url': 'https://player.vgtrk.com/flash2v/container.swf?id=774016&sid=russiatv&fbv=true&isPlay=true&ssl=false&i=560&acc_video_id=episode_id/972098/video_id/977760/brand_id/57638',
             'info_dict': {
-                'id': '125521',
+                'id': '774016',
                 'ext': 'mp4',
-                'title': 'Грустная дама червей. Х',
+                'title': 'Чужой в семье Сталина',
                 'description': '',
-                'duration': 4882,
+                'duration': 2539,
             },
             'params': {
                 # m3u8 download
                 'skip_download': True,
             },
         },
+        {
+            'url': 'http://player.rutv.ru/iframe/swf/id/766888/sid/hitech/?acc_video_id=4000',
+            'info_dict': {
+                'id': '766888',
+                'ext': 'mp4',
+                'title': 'Вести.net: интернет-гиганты начали перетягивание программных "одеял"',
+                'description': 'md5:65ddd47f9830c4f42ed6475f8730c995',
+                'duration': 279,
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
+        },
+        {
+            'url': 'http://player.rutv.ru/iframe/video/id/771852/start_zoom/true/showZoomBtn/false/sid/russiatv/?acc_video_id=episode_id/970443/video_id/975648/brand_id/5169',
+            'info_dict': {
+                'id': '771852',
+                'ext': 'mp4',
+                'title': 'Прямой эфир. Жертвы загадочной болезни: смерть от старости в 17 лет',
+                'description': 'md5:b81c8c55247a4bd996b43ce17395b2d8',
+                'duration': 3096,
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
+        },
+        {
+            'url': 'http://player.rutv.ru/iframe/live/id/51499/showZoomBtn/false/isPlay/true/sid/sochi2014',
+            'info_dict': {
+                'id': '51499',
+                'ext': 'flv',
+                'title': 'Сочи-2014. Биатлон. Индивидуальная гонка. Мужчины ',
+                'description': 'md5:9e0ed5c9d2fa1efbfdfed90c9a6d179c',
+            },
+            'params': {
+                # rtmp download
+                'skip_download': True,
+            },
+            'skip': 'Translation has finished',
+        },
     ]

+    @classmethod
+    def _extract_url(cls, webpage):
+        mobj = re.search(
+            r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.rutv\.ru/iframe/(?:swf|video|live)/id/.+?)\1', webpage)
+        if mobj:
+            return mobj.group('url')
+
+        mobj = re.search(
+            r'<meta[^>]+?property=(["\'])og:video\1[^>]+?content=(["\'])(?P<url>http://player\.(?:rutv\.ru|vgtrk\.com)/flash2v/container\.swf\?id=.+?\2)',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
+        video_type = mobj.group('type')

-        page = self._download_webpage(url, video_id, 'Downloading page')
-
-        mobj = re.search(
-            r'<meta property="og:video" content="http://www\.vesti\.ru/i/flvplayer_videoHost\.swf\?vid=(?P<id>\d+)',
-            page)
-        if mobj:
-            video_id = mobj.group('id')
-            page = self._download_webpage('http://www.vesti.ru/only_video.html?vid=%s' % video_id, video_id,
-                'Downloading video page')
-
-        mobj = re.search(
-            r'<meta property="og:video" content="http://player\.rutv\.ru/flash2v/container\.swf\?id=(?P<id>\d+)', page)
-        if mobj:
+        if not video_type or video_type == 'swf':
             video_type = 'video'
-            video_id = mobj.group('id')
-        else:
-            mobj = re.search(
-                r'<iframe.+?src="http://player\.rutv\.ru/iframe/(?P<type>[^/]+)/id/(?P<id>\d+)[^"]*".*?></iframe>',
-                page)
-            if not mobj:
-                raise ExtractorError('No media found', expected=True)
-            video_type = mobj.group('type')
-            video_id = mobj.group('id')

         json_data = self._download_json(
             'http://player.rutv.ru/iframe/%splay/id/%s' % ('live-' if video_type == 'live' else '', video_id),
             video_id, 'Downloading JSON')

         if json_data['errors']:
-            raise ExtractorError('vesti returned error: %s' % json_data['errors'], expected=True)
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, json_data['errors']), expected=True)

         playlist = json_data['data']['playlist']
         medialist = playlist['medialist']
         media = medialist[0]

         if media['errors']:
-            raise ExtractorError('vesti returned error: %s' % media['errors'], expected=True)
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, media['errors']), expected=True)

         view_count = playlist.get('count_views')
         priority_transport = playlist['priority_transport']
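The first detection pattern from the new `_extract_url` classmethod can be tried standalone against a hypothetical embed snippet (the iframe URL here is made up for illustration):

```python
import re

# Made-up page fragment shaped like a RUTV iframe embed
webpage = '<iframe width="560" src="http://player.rutv.ru/iframe/video/id/771852/sid/russiatv/"></iframe>'
mobj = re.search(
    r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.rutv\.ru/iframe/(?:swf|video|live)/id/.+?)\1',
    webpage)
embed_url = mobj.group('url') if mobj else None
```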


@@ -11,7 +11,9 @@ from ..utils import (

 class TEDIE(SubtitlesInfoExtractor):
-    _VALID_URL = r'''(?x)http://www\.ted\.com/
+    _VALID_URL = r'''(?x)
+        (?P<proto>https?://)
+        (?P<type>www|embed)(?P<urlmain>\.ted\.com/
         (
             (?P<type_playlist>playlists(?:/\d+)?)  # We have a playlist
             |
@@ -19,6 +21,7 @@ class TEDIE(SubtitlesInfoExtractor):
         )
         (/lang/(.*?))?  # The url may contain the language
         /(?P<name>\w+)  # Here goes the name and then ".html"
+        .*)$
         '''
     _TEST = {
         'url': 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html',
@@ -48,6 +51,9 @@ class TEDIE(SubtitlesInfoExtractor):
     def _real_extract(self, url):
         m = re.match(self._VALID_URL, url, re.VERBOSE)
+        if m.group('type') == 'embed':
+            desktop_url = m.group('proto') + 'www' + m.group('urlmain')
+            return self.url_result(desktop_url, 'TED')
         name = m.group('name')
         if m.group('type_talk'):
             return self._talk_info(url, name)
@@ -93,11 +99,14 @@ class TEDIE(SubtitlesInfoExtractor):
             self._list_available_subtitles(video_id, talk_info)
             return

+        thumbnail = talk_info['thumb']
+        if not thumbnail.startswith('http'):
+            thumbnail = 'http://' + thumbnail
         return {
             'id': video_id,
             'title': talk_info['title'],
             'uploader': talk_info['speaker'],
-            'thumbnail': talk_info['thumb'],
+            'thumbnail': thumbnail,
             'description': self._og_search_description(webpage),
             'subtitles': video_subtitles,
             'formats': formats,
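The embed-to-desktop redirect added in this TED diff can be sketched in isolation. The pattern below is a simplified, hypothetical cut-down of the real `_VALID_URL` (which additionally distinguishes playlists and talks); only the `proto`/`type`/`urlmain` groups used by the redirect are kept:

```python
import re

# Simplified sketch of the new embed handling: embed.ted.com URLs are
# rewritten to their www.ted.com equivalent, preserving the scheme.
TED_URL_RE = re.compile(r'''(?x)
    (?P<proto>https?://)
    (?P<type>www|embed)(?P<urlmain>\.ted\.com/.*)$
''')

def desktop_url(url):
    m = TED_URL_RE.match(url)
    if m and m.group('type') == 'embed':
        return m.group('proto') + 'www' + m.group('urlmain')
    return url

print(desktop_url('http://embed.ted.com/talks/dan_dennett_on_our_consciousness.html'))
# → http://www.ted.com/talks/dan_dennett_on_our_consciousness.html
```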


@@ -0,0 +1,164 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    compat_urllib_parse,
    compat_urllib_request,
    ExtractorError,
)


class UdemyIE(InfoExtractor):
    IE_NAME = 'udemy'
    _VALID_URL = r'https?://www\.udemy\.com/(?:[^#]+#/lecture/|lecture/view/?\?lectureId=)(?P<id>\d+)'
    _LOGIN_URL = 'https://www.udemy.com/join/login-submit/'
    _NETRC_MACHINE = 'udemy'

    _TESTS = [{
        'url': 'https://www.udemy.com/java-tutorial/#/lecture/172757',
        'md5': '98eda5b657e752cf945d8445e261b5c5',
        'info_dict': {
            'id': '160614',
            'ext': 'mp4',
            'title': 'Introduction and Installation',
            'description': 'md5:c0d51f6f21ef4ec65f091055a5eef876',
            'duration': 579.29,
        },
        'skip': 'Requires udemy account credentials',
    }]

    def _handle_error(self, response):
        if not isinstance(response, dict):
            return
        error = response.get('error')
        if error:
            error_str = 'Udemy returned error #%s: %s' % (error.get('code'), error.get('message'))
            error_data = error.get('data')
            if error_data:
                error_str += ' - %s' % error_data.get('formErrors')
            raise ExtractorError(error_str, expected=True)

    def _download_json(self, url, video_id, note='Downloading JSON metadata'):
        response = super(UdemyIE, self)._download_json(url, video_id, note)
        self._handle_error(response)
        return response

    def _real_initialize(self):
        self._login()

    def _login(self):
        (username, password) = self._get_login_info()
        if username is None:
            raise ExtractorError(
                'Udemy account is required, use --username and --password options to provide account credentials.',
                expected=True)

        login_popup = self._download_webpage(
            'https://www.udemy.com/join/login-popup?displayType=ajax&showSkipButton=1', None,
            'Downloading login popup')

        if login_popup == '<div class="run-command close-popup redirect" data-url="https://www.udemy.com/"></div>':
            return

        csrf = self._html_search_regex(r'<input type="hidden" name="csrf" value="(.+?)"', login_popup, 'csrf token')

        login_form = {
            'email': username,
            'password': password,
            'csrf': csrf,
            'displayType': 'json',
            'isSubmitted': '1',
        }
        request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
        response = self._download_json(request, None, 'Logging in as %s' % username)

        if 'returnUrl' not in response:
            raise ExtractorError('Unable to log in')

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        lecture_id = mobj.group('id')

        lecture = self._download_json(
            'https://www.udemy.com/api-1.1/lectures/%s' % lecture_id, lecture_id, 'Downloading lecture JSON')

        if lecture['assetType'] != 'Video':
            raise ExtractorError('Lecture %s is not a video' % lecture_id, expected=True)

        asset = lecture['asset']

        stream_url = asset['streamUrl']
        mobj = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', stream_url)
        if mobj:
            return self.url_result(mobj.group(1), 'Youtube')

        video_id = asset['id']
        thumbnail = asset['thumbnailUrl']
        duration = asset['data']['duration']

        download_url = asset['downloadUrl']

        formats = [
            {
                'url': download_url['Video480p'][0],
                'format_id': '360p',
            },
            {
                'url': download_url['Video'][0],
                'format_id': '720p',
            },
        ]

        title = lecture['title']
        description = lecture['description']

        return {
            'id': video_id,
            'title': title,
            'description': description,
            'thumbnail': thumbnail,
            'duration': duration,
            'formats': formats,
        }


class UdemyCourseIE(UdemyIE):
    IE_NAME = 'udemy:course'
    _VALID_URL = r'https?://www\.udemy\.com/(?P<coursepath>[\da-z-]+)'
    _SUCCESSFULLY_ENROLLED = '>You have enrolled in this course!<'
    _ALREADY_ENROLLED = '>You are already taking this course.<'
    _TESTS = []

    @classmethod
    def suitable(cls, url):
        return False if UdemyIE.suitable(url) else super(UdemyCourseIE, cls).suitable(url)

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        course_path = mobj.group('coursepath')

        response = self._download_json(
            'https://www.udemy.com/api-1.1/courses/%s' % course_path, course_path, 'Downloading course JSON')

        course_id = int(response['id'])
        course_title = response['title']

        webpage = self._download_webpage(
            'https://www.udemy.com/course/subscribe/?courseId=%s' % course_id, course_id, 'Enrolling in the course')

        if self._SUCCESSFULLY_ENROLLED in webpage:
            self.to_screen('%s: Successfully enrolled in' % course_id)
        elif self._ALREADY_ENROLLED in webpage:
            self.to_screen('%s: Already enrolled in' % course_id)

        response = self._download_json(
            'https://www.udemy.com/api-1.1/courses/%s/curriculum' % course_id,
            course_id, 'Downloading course curriculum')

        entries = [
            self.url_result('https://www.udemy.com/%s/#/lecture/%s' % (course_path, asset['id']), 'Udemy')
            for asset in response if asset.get('assetType') == 'Video'
        ]

        return self.playlist_result(entries, course_id, course_title)
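The `_handle_error` hook in the new Udemy extractor centralizes API error reporting for every JSON download. A standalone sketch, with a stub `ExtractorError` standing in for the youtube-dl class:

```python
class ExtractorError(Exception):
    # Minimal stand-in for youtube_dl.utils.ExtractorError
    def __init__(self, msg, expected=False):
        super(ExtractorError, self).__init__(msg)
        self.expected = expected

def handle_error(response):
    # Mirrors UdemyIE._handle_error: only dict responses can carry an API error
    if not isinstance(response, dict):
        return
    error = response.get('error')
    if error:
        error_str = 'Udemy returned error #%s: %s' % (error.get('code'), error.get('message'))
        error_data = error.get('data')
        if error_data:
            error_str += ' - %s' % error_data.get('formErrors')
        raise ExtractorError(error_str, expected=True)
```

List responses (like the curriculum endpoint's) pass through untouched, which is why the check on `isinstance(response, dict)` comes first.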


@@ -0,0 +1,121 @@
# encoding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import ExtractorError
from .rutv import RUTVIE


class VestiIE(InfoExtractor):
    IE_DESC = 'Вести.Ru'
    _VALID_URL = r'http://(?:.+?\.)?vesti\.ru/(?P<id>.+)'

    _TESTS = [
        {
            'url': 'http://www.vesti.ru/videos?vid=575582&cid=1',
            'info_dict': {
                'id': '765035',
                'ext': 'mp4',
                'title': 'Вести.net: биткоины в России не являются законными',
                'description': 'md5:d4bb3859dc1177b28a94c5014c35a36b',
                'duration': 302,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
        },
        {
            'url': 'http://www.vesti.ru/doc.html?id=1349233',
            'info_dict': {
                'id': '773865',
                'ext': 'mp4',
                'title': 'Участники митинга штурмуют Донецкую областную администрацию',
                'description': 'md5:1a160e98b3195379b4c849f2f4958009',
                'duration': 210,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
        },
        {
            'url': 'http://www.vesti.ru/only_video.html?vid=576180',
            'info_dict': {
                'id': '766048',
                'ext': 'mp4',
                'title': 'США заморозило, Британию затопило',
                'description': 'md5:f0ed0695ec05aed27c56a70a58dc4cc1',
                'duration': 87,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
        },
        {
            'url': 'http://hitech.vesti.ru/news/view/id/4000',
            'info_dict': {
                'id': '766888',
                'ext': 'mp4',
                'title': 'Вести.net: интернет-гиганты начали перетягивание программных "одеял"',
                'description': 'md5:65ddd47f9830c4f42ed6475f8730c995',
                'duration': 279,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
        },
        {
            'url': 'http://sochi2014.vesti.ru/video/index/video_id/766403',
            'info_dict': {
                'id': '766403',
                'ext': 'mp4',
                'title': 'XXII зимние Олимпийские игры. Российские хоккеисты стартовали на Олимпиаде с победы',
                'description': 'md5:55805dfd35763a890ff50fa9e35e31b3',
                'duration': 271,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
            'skip': 'Blocked outside Russia',
        },
        {
            'url': 'http://sochi2014.vesti.ru/live/play/live_id/301',
            'info_dict': {
                'id': '51499',
                'ext': 'flv',
                'title': 'Сочи-2014. Биатлон. Индивидуальная гонка. Мужчины ',
                'description': 'md5:9e0ed5c9d2fa1efbfdfed90c9a6d179c',
            },
            'params': {
                # rtmp download
                'skip_download': True,
            },
            'skip': 'Translation has finished',
        },
    ]

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id')

        page = self._download_webpage(url, video_id, 'Downloading page')

        mobj = re.search(
            r'<meta[^>]+?property="og:video"[^>]+?content="http://www\.vesti\.ru/i/flvplayer_videoHost\.swf\?vid=(?P<id>\d+)',
            page)
        if mobj:
            video_id = mobj.group('id')
            page = self._download_webpage(
                'http://www.vesti.ru/only_video.html?vid=%s' % video_id, video_id,
                'Downloading video page')

        rutv_url = RUTVIE._extract_url(page)
        if rutv_url:
            return self.url_result(rutv_url, 'RUTV')

        raise ExtractorError('No video found', expected=True)
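The og:video regex above is what maps a vesti.ru page onto its numeric player id. A quick check against a made-up snippet of page HTML (the `<meta>` tag below is illustrative, not taken from a real page):

```python
import re

# Same pattern VestiIE uses to pull the numeric id out of the og:video meta tag
OG_VIDEO_RE = r'<meta[^>]+?property="og:video"[^>]+?content="http://www\.vesti\.ru/i/flvplayer_videoHost\.swf\?vid=(?P<id>\d+)'

page = '<meta property="og:video" content="http://www.vesti.ru/i/flvplayer_videoHost.swf?vid=765035&autoStart=true" />'
mobj = re.search(OG_VIDEO_RE, page)
if mobj:
    print(mobj.group('id'))  # → 765035
```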


@@ -2,7 +2,6 @@ from __future__ import unicode_literals

 import re
 import xml.etree.ElementTree
-import datetime

 from .common import InfoExtractor
 from ..utils import (
@@ -22,6 +21,7 @@ class VevoIE(InfoExtractor):
         https?://videoplayer\.vevo\.com/embed/embedded\?videoId=|
         vevo:)
         (?P<id>[^&?#]+)'''
+
     _TESTS = [{
         'url': 'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
         "md5": "06bea460acb744eab74a9d7dcb4bfd61",
@@ -34,6 +34,8 @@ class VevoIE(InfoExtractor):
             "duration": 230.12,
             "width": 1920,
             "height": 1080,
+            # timestamp and upload_date are often incorrect; seem to change randomly
+            'timestamp': int,
         }
     }, {
         'note': 'v3 SMIL format',
@@ -47,6 +49,7 @@ class VevoIE(InfoExtractor):
             'title': 'I Wish I Could Break Your Heart',
             'duration': 226.101,
             'age_limit': 0,
+            'timestamp': int,
         }
     }, {
         'note': 'Age-limited video',
@@ -57,7 +60,8 @@ class VevoIE(InfoExtractor):
             'age_limit': 18,
             'title': 'Tunnel Vision (Explicit)',
             'uploader': 'Justin Timberlake',
-            'upload_date': '20130703',
+            'upload_date': 're:2013070[34]',
+            'timestamp': int,
         },
         'params': {
             'skip_download': 'true',
@@ -169,13 +173,13 @@ class VevoIE(InfoExtractor):
         timestamp_ms = int(self._search_regex(
             r'/Date\((\d+)\)/', video_info['launchDate'], 'launch date'))
-        upload_date = datetime.datetime.utcfromtimestamp(timestamp_ms // 1000)
         return {
             'id': video_id,
             'title': video_info['title'],
             'formats': formats,
             'thumbnail': video_info['imageUrl'],
-            'upload_date': upload_date.strftime('%Y%m%d'),
+            'timestamp': timestamp_ms // 1000,
             'uploader': video_info['mainArtists'][0]['artistName'],
             'duration': video_info['duration'],
             'age_limit': age_limit,
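The Vevo change above stops pre-formatting an `upload_date` and instead reports a raw Unix `timestamp`, derived from the .NET-style `/Date(<milliseconds>)/` value in `launchDate`. The conversion in isolation:

```python
import re

# Sketch of the launchDate handling: milliseconds inside /Date(...)/
# are reduced to Unix seconds and reported as 'timestamp'.
def parse_launch_date(launch_date):
    timestamp_ms = int(re.search(r'/Date\((\d+)\)/', launch_date).group(1))
    return timestamp_ms // 1000

print(parse_launch_date('/Date(1372845600000)/'))  # → 1372845600
```

Reporting the timestamp rather than a formatted date lets the downloader derive `upload_date` itself, which is why the `import datetime` could be dropped.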


@@ -29,6 +29,7 @@ class VideoBamIE(InfoExtractor):
             'info_dict': {
                 'id': 'pqLvq',
                 'ext': 'mp4',
+                'title': '_',
             }
         },
     ]
@@ -61,7 +62,7 @@ class VideoBamIE(InfoExtractor):
         self._sort_formats(formats)

-        title = self._og_search_title(page, default='VideoBam', fatal=False)
+        title = self._og_search_title(page, default='_', fatal=False)
         description = self._og_search_description(page, default=None)
         thumbnail = self._og_search_thumbnail(page)
         uploader = self._html_search_regex(r'Upload by ([^<]+)</a>', page, 'uploader', fatal=False, default=None)


@@ -1,29 +1,33 @@
+from __future__ import unicode_literals
 import re

 from ..utils import (
     ExtractorError,
     unescapeHTML,
     unified_strdate,
+    US_RATINGS,
 )
 from .subtitles import SubtitlesInfoExtractor


 class VikiIE(SubtitlesInfoExtractor):
-    IE_NAME = u'viki'
+    IE_NAME = 'viki'

     _VALID_URL = r'^https?://(?:www\.)?viki\.com/videos/(?P<id>[0-9]+v)'
     _TEST = {
-        u'url': u'http://www.viki.com/videos/1023585v-heirs-episode-14',
-        u'file': u'1023585v.mp4',
-        u'md5': u'a21454021c2646f5433514177e2caa5f',
-        u'info_dict': {
-            u'title': u'Heirs Episode 14',
-            u'uploader': u'SBS',
-            u'description': u'md5:c4b17b9626dd4b143dcc4d855ba3474e',
-            u'upload_date': u'20131121',
-            u'age_limit': 13,
+        'url': 'http://www.viki.com/videos/1023585v-heirs-episode-14',
+        'md5': 'a21454021c2646f5433514177e2caa5f',
+        'info_dict': {
+            'id': '1023585v',
+            'ext': 'mp4',
+            'title': 'Heirs Episode 14',
+            'uploader': 'SBS',
+            'description': 'md5:c4b17b9626dd4b143dcc4d855ba3474e',
+            'upload_date': '20131121',
+            'age_limit': 13,
         },
-        u'skip': u'Blocked in the US',
+        'skip': 'Blocked in the US',
     }

     def _real_extract(self, url):
@@ -44,28 +48,21 @@ class VikiIE(SubtitlesInfoExtractor):
         rating_str = self._html_search_regex(
             r'<strong>Rating: </strong>\s*([^<]*)<', webpage,
-            u'rating information', default='').strip()
-        RATINGS = {
-            'G': 0,
-            'PG': 10,
-            'PG-13': 13,
-            'R': 16,
-            'NC': 18,
-        }
-        age_limit = RATINGS.get(rating_str)
+            'rating information', default='').strip()
+        age_limit = US_RATINGS.get(rating_str)

         info_url = 'http://www.viki.com/player5_fragment/%s?action=show&controller=videos' % video_id
         info_webpage = self._download_webpage(
-            info_url, video_id, note=u'Downloading info page')
+            info_url, video_id, note='Downloading info page')
         if re.match(r'\s*<div\s+class="video-error', info_webpage):
             raise ExtractorError(
-                u'Video %s is blocked from your location.' % video_id,
+                'Video %s is blocked from your location.' % video_id,
                 expected=True)

         video_url = self._html_search_regex(
-            r'<source[^>]+src="([^"]+)"', info_webpage, u'video URL')
+            r'<source[^>]+src="([^"]+)"', info_webpage, 'video URL')

         upload_date_str = self._html_search_regex(
-            r'"created_at":"([^"]+)"', info_webpage, u'upload date')
+            r'"created_at":"([^"]+)"', info_webpage, 'upload date')
         upload_date = (
             unified_strdate(upload_date_str)
             if upload_date_str is not None


@@ -102,6 +102,15 @@ class VimeoIE(SubtitlesInfoExtractor):
         },
     ]

+    @classmethod
+    def suitable(cls, url):
+        if VimeoChannelIE.suitable(url):
+            # Otherwise channel urls like http://vimeo.com/channels/31259 would
+            # match
+            return False
+        else:
+            return super(VimeoIE, cls).suitable(url)
+
     def _login(self):
         (username, password) = self._get_login_info()
         if username is None:
@@ -332,7 +341,7 @@ class VimeoIE(SubtitlesInfoExtractor):

 class VimeoChannelIE(InfoExtractor):
     IE_NAME = 'vimeo:channel'
-    _VALID_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)'
+    _VALID_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)/?(\?.*)?$'
     _MORE_PAGES_INDICATOR = r'<a.+?rel="next"'
     _TITLE_RE = r'<link rel="alternate"[^>]+?title="(.*?)"'
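The anchored channel pattern works together with the new `suitable()` override: with the trailing `$`, the channel regex accepts a bare channel URL but no longer swallows individual video URLs beneath it, so `VimeoIE` only needs to defer when `VimeoChannelIE` genuinely matches. A quick check (the second URL is illustrative):

```python
import re

# The updated, anchored channel pattern from the diff above
CHANNEL_RE = re.compile(r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)/?(\?.*)?$')

print(bool(CHANNEL_RE.match('http://vimeo.com/channels/31259')))             # → True
print(bool(CHANNEL_RE.match('http://vimeo.com/channels/tributes/6213729')))  # → False
```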

youtube_dl/extractor/wdr.py (new file)

@@ -0,0 +1,114 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    unified_strdate,
    compat_urlparse,
    determine_ext,
)


class WDRIE(InfoExtractor):
    _PLAYER_REGEX = '-(?:video|audio)player(?:_size-[LMS])?'
    _VALID_URL = r'(?P<url>https?://www\d?\.(?:wdr\d?|funkhauseuropa)\.de/)(?P<id>.+?)(?P<player>%s)?\.html' % _PLAYER_REGEX

    _TESTS = [
        {
            'url': 'http://www1.wdr.de/mediathek/video/sendungen/servicezeit/videoservicezeit560-videoplayer_size-L.html',
            'info_dict': {
                'id': 'mdb-362427',
                'ext': 'flv',
                'title': 'Servicezeit',
                'description': 'md5:c8f43e5e815eeb54d0b96df2fba906cb',
                'upload_date': '20140310',
            },
            'params': {
                'skip_download': True,
            },
        },
        {
            'url': 'http://www1.wdr.de/themen/av/videomargaspiegelisttot101-videoplayer.html',
            'info_dict': {
                'id': 'mdb-363194',
                'ext': 'flv',
                'title': 'Marga Spiegel ist tot',
                'description': 'md5:2309992a6716c347891c045be50992e4',
                'upload_date': '20140311',
            },
            'params': {
                'skip_download': True,
            },
        },
        {
            'url': 'http://www1.wdr.de/themen/kultur/audioerlebtegeschichtenmargaspiegel100-audioplayer.html',
            'md5': '83e9e8fefad36f357278759870805898',
            'info_dict': {
                'id': 'mdb-194332',
                'ext': 'mp3',
                'title': 'Erlebte Geschichten: Marga Spiegel (29.11.2009)',
                'description': 'md5:2309992a6716c347891c045be50992e4',
                'upload_date': '20091129',
            },
        },
        {
            'url': 'http://www.funkhauseuropa.de/av/audiogrenzenlosleckerbaklava101-audioplayer.html',
            'md5': 'cfff440d4ee64114083ac44676df5d15',
            'info_dict': {
                'id': 'mdb-363068',
                'ext': 'mp3',
                'title': 'Grenzenlos lecker - Baklava',
                'description': 'md5:7b29e97e10dfb6e265238b32fa35b23a',
                'upload_date': '20140311',
            },
        },
    ]

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        page_url = mobj.group('url')
        page_id = mobj.group('id')

        webpage = self._download_webpage(url, page_id)

        if mobj.group('player') is None:
            entries = [
                self.url_result(page_url + href, 'WDR')
                for href in re.findall(r'<a href="/?(.+?%s\.html)" rel="nofollow"' % self._PLAYER_REGEX, webpage)
            ]
            return self.playlist_result(entries, page_id)

        flashvars = compat_urlparse.parse_qs(
            self._html_search_regex(r'<param name="flashvars" value="([^"]+)"', webpage, 'flashvars'))

        page_id = flashvars['trackerClipId'][0]
        video_url = flashvars['dslSrc'][0]
        title = flashvars['trackerClipTitle'][0]
        thumbnail = flashvars['startPicture'][0] if 'startPicture' in flashvars else None

        if 'trackerClipAirTime' in flashvars:
            upload_date = flashvars['trackerClipAirTime'][0]
        else:
            upload_date = self._html_search_meta('DC.Date', webpage, 'upload date')

        if upload_date:
            upload_date = unified_strdate(upload_date)

        if video_url.endswith('.f4m'):
            video_url += '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18'
            ext = 'flv'
        else:
            ext = determine_ext(video_url)

        description = self._html_search_meta('Description', webpage, 'description')

        return {
            'id': page_id,
            'url': video_url,
            'ext': ext,
            'title': title,
            'description': description,
            'thumbnail': thumbnail,
            'upload_date': upload_date,
        }
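The WDR extractor reads all of its metadata out of the player's flashvars query string. `parse_qs` returns each value as a list, which is why every lookup above indexes `[0]`. A quick demonstration with a made-up flashvars string:

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs      # Python 2, as compat_urlparse resolves to

# Illustrative flashvars value; real pages carry more keys
flashvars = parse_qs(
    'trackerClipId=mdb-362427&dslSrc=http%3A%2F%2Fexample.invalid%2Fvideo.flv&trackerClipTitle=Servicezeit')

print(flashvars['trackerClipId'][0])  # → mdb-362427
print(flashvars['dslSrc'][0])         # → http://example.invalid/video.flv
```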


@@ -1,3 +1,6 @@
+from __future__ import unicode_literals
+
 import json
 import re
 import sys
@@ -17,24 +20,25 @@ from ..aes import (

 class YouPornIE(InfoExtractor):
-    _VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>youporn\.com/watch/(?P<videoid>[0-9]+)/(?P<title>[^/]+))'
+    _VALID_URL = r'^(?P<proto>https?://)(?:www\.)?(?P<url>youporn\.com/watch/(?P<videoid>[0-9]+)/(?P<title>[^/]+))'
     _TEST = {
-        u'url': u'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
-        u'file': u'505835.mp4',
-        u'md5': u'71ec5fcfddacf80f495efa8b6a8d9a89',
-        u'info_dict': {
-            u"upload_date": u"20101221",
-            u"description": u"Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?",
-            u"uploader": u"Ask Dan And Jennifer",
-            u"title": u"Sex Ed: Is It Safe To Masturbate Daily?",
-            u"age_limit": 18,
+        'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
+        'md5': '71ec5fcfddacf80f495efa8b6a8d9a89',
+        'info_dict': {
+            'id': '505835',
+            'ext': 'mp4',
+            'upload_date': '20101221',
+            'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
+            'uploader': 'Ask Dan And Jennifer',
+            'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
+            'age_limit': 18,
         }
     }

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('videoid')
-        url = 'http://www.' + mobj.group('url')
+        url = mobj.group('proto') + 'www.' + mobj.group('url')

         req = compat_urllib_request.Request(url)
         req.add_header('Cookie', 'age_verified=1')
@@ -42,7 +46,7 @@ class YouPornIE(InfoExtractor):
         age_limit = self._rta_search(webpage)

         # Get JSON parameters
-        json_params = self._search_regex(r'var currentVideo = new Video\((.*)\);', webpage, u'JSON parameters')
+        json_params = self._search_regex(r'var currentVideo = new Video\((.*)\);', webpage, 'JSON parameters')
         try:
             params = json.loads(json_params)
         except:
@@ -61,7 +65,7 @@ class YouPornIE(InfoExtractor):
         # Get all of the links from the page
         DOWNLOAD_LIST_RE = r'(?s)<ul class="downloadList">(?P<download_list>.*?)</ul>'
         download_list_html = self._search_regex(DOWNLOAD_LIST_RE,
-            webpage, u'download list').strip()
+            webpage, 'download list').strip()
         LINK_RE = r'<a href="([^"]+)">'
         links = re.findall(LINK_RE, download_list_html)
@@ -86,7 +90,7 @@ class YouPornIE(InfoExtractor):
             resolution = format_parts[0]
             height = int(resolution[:-len('p')])
             bitrate = int(format_parts[1][:-len('k')])
-            format = u'-'.join(format_parts) + u'-' + dn
+            format = '-'.join(format_parts) + '-' + dn
             formats.append({
                 'url': video_url,
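The last YouPorn hunk parses format descriptors like `720p`/`1000k` into numeric fields. The sketch below isolates that slicing logic; `dn` is a per-link suffix taken from the download URL in the real extractor and is just an opaque string here:

```python
# Mirrors the resolution/bitrate parsing above: strip the trailing
# 'p'/'k' unit letters and join the parts into a format id.
def parse_format(format_parts, dn):
    resolution = format_parts[0]
    height = int(resolution[:-len('p')])
    bitrate = int(format_parts[1][:-len('k')])
    return {
        'height': height,
        'tbr': bitrate,
        'format': '-'.join(format_parts) + '-' + dn,
    }

print(parse_format(['720p', '1000k'], 'cdn1'))
# → {'height': 720, 'tbr': 1000, 'format': '720p-1000k-cdn1'}
```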


@@ -194,14 +194,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
     '151': {'ext': 'mp4', 'height': 72, 'resolution': '72p', 'format_note': 'HLS', 'preference': -10},

     # DASH mp4 video
-    '133': {'ext': 'mp4', 'height': 240, 'resolution': '240p', 'format_note': 'DASH video', 'preference': -40},
-    '134': {'ext': 'mp4', 'height': 360, 'resolution': '360p', 'format_note': 'DASH video', 'preference': -40},
-    '135': {'ext': 'mp4', 'height': 480, 'resolution': '480p', 'format_note': 'DASH video', 'preference': -40},
-    '136': {'ext': 'mp4', 'height': 720, 'resolution': '720p', 'format_note': 'DASH video', 'preference': -40},
-    '137': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'preference': -40},
-    '138': {'ext': 'mp4', 'height': 2160, 'resolution': '2160p', 'format_note': 'DASH video', 'preference': -40},
-    '160': {'ext': 'mp4', 'height': 192, 'resolution': '192p', 'format_note': 'DASH video', 'preference': -40},
-    '264': {'ext': 'mp4', 'height': 1440, 'resolution': '1440p', 'format_note': 'DASH video', 'preference': -40},
+    '133': {'ext': 'mp4', 'height': 240, 'resolution': '240p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
+    '134': {'ext': 'mp4', 'height': 360, 'resolution': '360p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
+    '135': {'ext': 'mp4', 'height': 480, 'resolution': '480p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
+    '136': {'ext': 'mp4', 'height': 720, 'resolution': '720p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
+    '137': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
+    '138': {'ext': 'mp4', 'height': 2160, 'resolution': '2160p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
+    '160': {'ext': 'mp4', 'height': 192, 'resolution': '192p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
+    '264': {'ext': 'mp4', 'height': 1440, 'resolution': '1440p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},

     # Dash mp4 audio
     '139': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 48, 'preference': -50},
@@ -209,12 +209,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
     '141': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 256, 'preference': -50},

     # Dash webm
-    '167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
-    '168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
-    '169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
-    '170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
-    '218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
-    '219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
+    '167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
+    '168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
+    '169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
+    '170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
+    '218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
+    '219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
     '242': {'ext': 'webm', 'height': 240, 'resolution': '240p', 'format_note': 'DASH webm', 'preference': -40},
     '243': {'ext': 'webm', 'height': 360, 'resolution': '360p', 'format_note': 'DASH webm', 'preference': -40},
     '244': {'ext': 'webm', 'height': 480, 'resolution': '480p', 'format_note': 'DASH webm', 'preference': -40},
@@ -1130,14 +1130,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
         return self._download_webpage(url, video_id, note=u'Searching for annotations.', errnote=u'Unable to download video annotations.')

     def _real_extract(self, url):
+        proto = (
+            u'http' if self._downloader.params.get('prefer_insecure', False)
+            else u'https')
+
         # Extract original video URL from URL with redirection, like age verification, using next_url parameter
         mobj = re.search(self._NEXT_URL_RE, url)
         if mobj:
-            url = 'https://www.youtube.com/' + compat_urllib_parse.unquote(mobj.group(1)).lstrip('/')
+            url = proto + '://www.youtube.com/' + compat_urllib_parse.unquote(mobj.group(1)).lstrip('/')
         video_id = self.extract_id(url)

         # Get video webpage
-        url = 'https://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1' % video_id
+        url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1' % video_id
         video_webpage = self._download_webpage(url, video_id)

         # Attempt to extract SWF player URL
@@ -1162,7 +1166,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
                 'asv': 3,
                 'sts': '1588',
             })
-            video_info_url = 'https://www.youtube.com/get_video_info?' + data
+            video_info_url = proto + '://www.youtube.com/get_video_info?' + data
             video_info_webpage = self._download_webpage(video_info_url, video_id,
                 note=False,
                 errnote='unable to download video info webpage')
@@ -1170,7 +1174,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
         else:
             age_gate = False
             for el_type in ['&el=embedded', '&el=detailpage', '&el=vevo', '']:
-                video_info_url = ('https://www.youtube.com/get_video_info?&video_id=%s%s&ps=default&eurl=&gl=US&hl=en'
+                video_info_url = (proto + '://www.youtube.com/get_video_info?&video_id=%s%s&ps=default&eurl=&gl=US&hl=en'
                     % (video_id, el_type))
                 video_info_webpage = self._download_webpage(video_info_url, video_id,
                     note=False,
@@ -1445,7 +1449,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
             'duration': video_duration,
             'age_limit': 18 if age_gate else 0,
             'annotations': video_annotations,
-            'webpage_url': 'https://www.youtube.com/watch?v=%s' % video_id,
+            'webpage_url': proto + '://www.youtube.com/watch?v=%s' % video_id,
             'view_count': view_count,
             'like_count': like_count,
             'dislike_count': dislike_count,
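The `--prefer-insecure` change above computes a single `proto` value up front and reuses it for every YouTube URL built afterwards. A minimal sketch of that plumbing, with a plain dict standing in for the downloader's params:

```python
# Sketch of the prefer_insecure handling: one scheme decision, reused everywhere
def build_watch_url(video_id, params):
    proto = 'http' if params.get('prefer_insecure', False) else 'https'
    return proto + '://www.youtube.com/watch?v=%s' % video_id

print(build_watch_url('BaW_jenozKc', {}))                         # → https://www.youtube.com/watch?v=BaW_jenozKc
print(build_watch_url('BaW_jenozKc', {'prefer_insecure': True}))  # → http://www.youtube.com/watch?v=BaW_jenozKc
```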


@@ -6,6 +6,7 @@ import ctypes
 import datetime
 import email.utils
 import errno
+import getpass
 import gzip
 import itertools
 import io
@@ -778,6 +779,7 @@ def unified_strdate(date_str):
         '%Y/%m/%d %H:%M:%S',
         '%Y-%m-%d %H:%M:%S',
         '%d.%m.%Y %H:%M',
+        '%d.%m.%Y %H.%M',
         '%Y-%m-%dT%H:%M:%SZ',
         '%Y-%m-%dT%H:%M:%S.%fZ',
         '%Y-%m-%dT%H:%M:%S.%f0Z',
@@ -1278,3 +1280,21 @@ def parse_xml(s):
     parser = xml.etree.ElementTree.XMLParser(target=TreeBuilder())
     kwargs = {'parser': parser} if sys.version_info >= (2, 7) else {}
     return xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
+
+
+if sys.version_info < (3, 0) and sys.platform == 'win32':
+    def compat_getpass(prompt, *args, **kwargs):
+        if isinstance(prompt, compat_str):
+            prompt = prompt.encode(preferredencoding())
+        return getpass.getpass(prompt, *args, **kwargs)
+else:
+    compat_getpass = getpass.getpass
+
+
+US_RATINGS = {
+    'G': 0,
+    'PG': 10,
+    'PG-13': 13,
+    'R': 16,
+    'NC': 18,
+}
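Two of the utils additions are easy to exercise directly: the shared `US_RATINGS` table (moved here so extractors like viki can reuse it) and the new `'%d.%m.%Y %H.%M'` date format accepted by `unified_strdate`:

```python
import datetime

# Copy of the table added above; unknown strings yield None (no age limit)
US_RATINGS = {'G': 0, 'PG': 10, 'PG-13': 13, 'R': 16, 'NC': 18}

print(US_RATINGS.get('PG-13'))    # → 13
print(US_RATINGS.get('unrated'))  # → None

# The new format handles dotted times such as '29.11.2009 20.15'
dt = datetime.datetime.strptime('29.11.2009 20.15', '%d.%m.%Y %H.%M')
print(dt.strftime('%Y%m%d'))      # → 20091129
```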


@@ -1,2 +1,2 @@
-__version__ = '2014.03.11'
+__version__ = '2014.03.21.3'