Compare commits

...

78 Commits

Author SHA1 Message Date
Philipp Hagemeister
8db69786c2 release 2014.01.29 2014-01-29 11:16:28 +01:00
Philipp Hagemeister
b11cec4162 [youtube:user] Fix id key (Fixes #1745) 2014-01-29 11:16:12 +01:00
Philipp Hagemeister
7eeb5bef24 [liveleak] Simplify 2014-01-28 21:57:38 +01:00
Philipp Hagemeister
9d2032932c Merge remote-tracking branch 'dstftw/ivi' 2014-01-28 21:47:05 +01:00
Philipp Hagemeister
6490306017 Merge remote-tracking branch 'dstftw/channel9' 2014-01-28 21:46:42 +01:00
dst
ceb2b7d257 [ivi] Fix test and use unicode literals 2014-01-29 02:20:48 +07:00
dst
459a53c2c2 [channel9] Remove unnecessary coding cookie 2014-01-29 02:07:29 +07:00
dst
adc267eebf [channel9] Use unicode literals 2014-01-29 02:00:56 +07:00
dst
ffe8f62d27 [smotri] Simplify login and use unicode literals 2014-01-29 01:52:57 +07:00
Jaime Marquínez Ferrándiz
ed85007039 [ninegag] Use unicode_literals 2014-01-28 18:55:06 +01:00
Jaime Marquínez Ferrándiz
5aaca50d60 [keek] Simplify and use unicode_literals 2014-01-28 18:47:31 +01:00
Jaime Marquínez Ferrándiz
869baf3565 [funnyordie] Simplify and use unicode_literals 2014-01-28 18:41:39 +01:00
Philipp Hagemeister
e299f6d27f [pornhd] Fix 2014-01-28 03:53:00 +01:00
Philipp Hagemeister
4a192f817e release 2014.01.28.1 2014-01-28 03:44:19 +01:00
Philipp Hagemeister
bc1d1a5a71 release 2014.01.28 2014-01-28 03:37:42 +01:00
Philipp Hagemeister
456895d9cf [tumblr] Test new URL format (#2255) 2014-01-28 03:37:38 +01:00
Philipp Hagemeister
218c15ab59 Merge remote-tracking branch 'mike/tumblr-url' 2014-01-28 03:35:52 +01:00
Philipp Hagemeister
17ab4d3b5e [brightcove] Move test to generic 2014-01-28 03:35:32 +01:00
Philipp Hagemeister
31ef0ff038 Merge remote-tracking branch 'dstftw/rutube-channel' 2014-01-28 03:32:22 +01:00
Philipp Hagemeister
37e3b90d59 [rutube] Simplify 2014-01-28 03:32:07 +01:00
dst
00ff8f92a5 [rutube] Update test 2014-01-28 09:31:14 +07:00
Philipp Hagemeister
4857beba3a Merge remote-tracking branch 'dstftw/rutube-channel' 2014-01-28 03:30:21 +01:00
Philipp Hagemeister
c1e60cc2bf Merge remote-tracking branch 'dstftw/master' 2014-01-28 03:29:10 +01:00
dst
98669ed79c [imdb] Fix playlist test 2014-01-28 09:13:08 +07:00
dst
a3978a6159 [imdb] Fix duplicated entries bug 2014-01-28 09:12:23 +07:00
dst
e3a9f32f52 [rutube] Add support for user videos 2014-01-28 08:47:17 +07:00
dst
87fac3238d [rutube] Add channel test 2014-01-28 08:25:56 +07:00
dst
a2fb2a2134 [rutube] Improve video extractor 2014-01-28 08:19:45 +07:00
MikeCol
9e8ee54553 VALID_URL changed to match different kinds of Tumblr-URLs 2014-01-28 01:41:18 +01:00
Philipp Hagemeister
117bec936c [brightcove] Parse URL from meta element if available (Fixes #2253) 2014-01-28 01:01:23 +01:00
dst
1547c8cc88 [rutube] Add support for channels and movies 2014-01-28 06:56:09 +07:00
Philipp Hagemeister
075911d48e [la7] Skip test on travis 2014-01-27 23:47:22 +01:00
Philipp Hagemeister
b21a918984 release 2014.01.27.2 2014-01-27 19:22:45 +01:00
Philipp Hagemeister
f9b8549609 [ard] Support multiple formats (Closes #2247) 2014-01-27 18:40:10 +01:00
Jaime Marquínez Ferrándiz
e2ba07024f Merge remote-tracking branch 'origin/master' 2014-01-27 12:45:59 +01:00
Jaime Marquínez Ferrándiz
9b05bd42e5 [discovery] Extract more info and simplify 2014-01-27 12:41:30 +01:00
Philipp Hagemeister
b6d3a99678 [cliphunter] Simplify (#2233) 2014-01-27 12:39:39 +01:00
Jaime Marquínez Ferrándiz
96d7b8873a Merge remote-tracking branch 'sahutd/master' 2014-01-27 12:21:00 +01:00
Philipp Hagemeister
efc867775e [cliphunter] Simplify 2014-01-27 07:55:30 +01:00
Philipp Hagemeister
5ab772f09c Merge branch 'cliphunter' of https://github.com/pornophage/youtube-dl 2014-01-27 07:48:51 +01:00
Philipp Hagemeister
2a89386232 Credit @MikeCol for malemotion IE 2014-01-27 07:43:41 +01:00
MikeCol
4d9be98dbc Malemotion extractor 2014-01-27 07:43:02 +01:00
Mike Col
6737907826 [tumblr] Fix thumbnail extraction
Signed-off-by: Philipp Hagemeister <phihag@phihag.de>
2014-01-27 07:38:55 +01:00
Philipp Hagemeister
c060b77446 [tumblr] Use unicode_literals 2014-01-27 07:36:18 +01:00
Philipp Hagemeister
7e8caf30c0 Throw an error if no video formats are found 2014-01-27 07:31:54 +01:00
Philipp Hagemeister
ca3e054750 release 2014.01.27.1 2014-01-27 07:09:55 +01:00
Philipp Hagemeister
1da1558f46 [la7] Support more URLs 2014-01-27 07:08:01 +01:00
Philipp Hagemeister
25c67d257c release 2014.01.27 2014-01-27 07:05:39 +01:00
Philipp Hagemeister
a17d16d59c [la7] Add support 2014-01-27 07:05:28 +01:00
Philipp Hagemeister
d16076ff3e [huffpost] Fix extractor 2014-01-27 06:55:35 +01:00
Philipp Hagemeister
6c57e8a063 [setup.py] Only print a warning if documentation files are missing (Fixes #780) 2014-01-27 06:22:15 +01:00
Philipp Hagemeister
db1f388878 [huffpost] Add support 2014-01-27 05:47:38 +01:00
Philipp Hagemeister
0f2999fe2b Merge pull request #2221 from Rudloff/master
Removed websurg extractor
2014-01-26 18:03:26 -08:00
sahutd
53bfd6b24c Added support for Discovery Issue #2227 2014-01-26 14:05:34 +05:30
Jaime Marquínez Ferrándiz
5700e7792a [youtube] Encode the data when submitting the form for confirming the age
Needed on python 3
2014-01-25 17:22:41 +01:00
Jaime Marquínez Ferrándiz
38c2e5b8d5 [youtube] Use https: in more urls 2014-01-25 17:11:55 +01:00
Jaime Marquínez Ferrándiz
48f9678a32 [test/youtube_lists] Change the list used for testing the Top Lists extractor
The ‘Top tracks’ list is not always present in the channel page
2014-01-25 17:02:32 +01:00
Jaime Marquínez Ferrándiz
beddbc2ad1 [youtube:toplist] Make the regex for finding the playlist link more flexible
`title={foo}` may not be at the end of the `href` string.
2014-01-25 15:47:03 +01:00
Jaime Marquínez Ferrándiz
f89197d73e Some pep8 style fixes 2014-01-25 15:33:23 +01:00
Jaime Marquínez Ferrándiz
944d65c762 [extractor/common] Encode the url when calculating the md5 with —write-pages option
This doesn’t cause any problem in python 2.*, but on python 3 the `md5` function only accepts bytes.
2014-01-25 15:32:56 +01:00
Philipp Hagemeister
f945612bd0 [rtlnow] Simplify 2014-01-25 14:18:54 +01:00
Jaime Marquínez Ferrándiz
59188de113 Properly escape ‘.’ in some _VALID_URL properties 2014-01-25 11:48:08 +01:00
Jaime Marquínez Ferrándiz
352d08e3e5 Add an extractor for freespeech.org (closes #2234) 2014-01-25 11:31:30 +01:00
Pornophage
bacb5e4f44 Minor fixes
Remove empty description
Set correct md5 test
2014-01-25 02:34:08 +01:00
Pornophage
008af8660b Add cliphunter extractor 2014-01-25 01:46:52 +01:00
Philipp Hagemeister
886fa72324 release 2014.01.23.4 2014-01-24 00:06:55 +01:00
Philipp Hagemeister
2c5bae429a [youtube] Fix new formats 2014-01-24 00:06:26 +01:00
Philipp Hagemeister
f265fc1238 release 2014.01.23.3 2014-01-23 23:55:53 +01:00
Philipp Hagemeister
1394ce65b4 [youtube] Add new formats (Fixes #2221) 2014-01-23 23:54:06 +01:00
Pierre Rudloff
67ccb77197 Removed websurg extractor 2014-01-23 23:42:34 +01:00
Philipp Hagemeister
63ef36e8d8 Add build instructions (Fixes #2218) 2014-01-23 23:28:29 +01:00
Philipp Hagemeister
0b65e5d40f [youtube] Do not break upon unknown formats 2014-01-23 23:21:42 +01:00
Philipp Hagemeister
629be17af4 release 2014.01.23.2 2014-01-23 19:05:05 +01:00
Philipp Hagemeister
fd28827864 Do not count unmatched videos for --max-downloads (Fixes #2211) 2014-01-23 19:04:22 +01:00
Philipp Hagemeister
8c61d9a9b1 Mention default for -f (Fixes #2215) 2014-01-23 18:50:04 +01:00
Philipp Hagemeister
975d35dbab [youtube:truncated_url] Also match mail subscription links (#2214) 2014-01-23 16:14:54 +01:00
Jaime Marquínez Ferrándiz
8b769664c4 [sina] Recognize http://video.sina.com.cn/v/b/{id}-*.html urls (fixes #2212) 2014-01-23 14:03:14 +01:00
Jaime Marquínez Ferrándiz
76f270a46a [sina] use unicode_literals 2014-01-23 14:00:29 +01:00
41 changed files with 1072 additions and 567 deletions

View File

@@ -181,7 +181,9 @@ which means you can modify it, redistribute it or use it however you like.
preference using slashes: "-f 22/17/18".
"-f mp4" and "-f flv" are also supported.
You can also use the special names "best",
"bestaudio", "worst", and "worstaudio"
"bestaudio", "worst", and "worstaudio". By
default, youtube-dl will pick the best
quality.
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific
one is requested
@@ -323,11 +325,27 @@ Since June 2012 (#342) youtube-dl is packed as an executable zipfile, simply unz
To run the exe you need to install first the [Microsoft Visual C++ 2008 Redistributable Package](http://www.microsoft.com/en-us/download/details.aspx?id=29).
# COPYRIGHT
# BUILD INSTRUCTIONS
youtube-dl is released into the public domain by the copyright holders.
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
This README file was originally written by Daniel Bolton (<https://github.com/dbbolton>) and is likewise released into the public domain.
To run youtube-dl as a developer, you don't need to build anything either. Simply execute
python -m youtube_dl
To run the test, simply invoke your favorite test runner, or execute a test file directly; any of the following work:
python -m unittest discover
python test/test_download.py
nosetests
If you want to create a build of youtube-dl yourself, you'll need
* python
* make
* pandoc
* zip
* nosetests
# BUGS
@@ -386,3 +404,9 @@ Only post features that you (or an incapicated friend you can personally talk to
### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# COPYRIGHT
youtube-dl is released into the public domain by the copyright holders.
This README file was originally written by Daniel Bolton (<https://github.com/dbbolton>) and is likewise released into the public domain.

View File

@@ -3,7 +3,9 @@
from __future__ import print_function
import os.path
import pkg_resources
import warnings
import sys
try:
@@ -44,12 +46,24 @@ py2exe_params = {
if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
params = py2exe_params
else:
files_spec = [
('etc/bash_completion.d', ['youtube-dl.bash-completion']),
('share/doc/youtube_dl', ['README.txt']),
('share/man/man1', ['youtube-dl.1'])
]
root = os.path.dirname(os.path.abspath(__file__))
data_files = []
for dirname, files in files_spec:
resfiles = []
for fn in files:
if not os.path.exists(fn):
warnings.warn('Skipping file %s since it is not present. Type make to build all automatically generated files.' % fn)
else:
resfiles.append(fn)
data_files.append((dirname, resfiles))
params = {
'data_files': [ # Installing system-wide would require sudo...
('etc/bash_completion.d', ['youtube-dl.bash-completion']),
('share/doc/youtube_dl', ['README.txt']),
('share/man/man1', ['youtube-dl.1'])
]
'data_files': data_files,
}
if setuptools_available:
params['entry_points'] = {'console_scripts': ['youtube-dl = youtube_dl:main']}

View File

@@ -120,5 +120,9 @@ class TestAllURLsMatching(unittest.TestCase):
def test_soundcloud_not_matching_sets(self):
self.assertMatch('http://soundcloud.com/floex/sets/gone-ep', ['soundcloud:set'])
def test_tumblr(self):
self.assertMatch('http://tatianamaslanydaily.tumblr.com/post/54196191430/orphan-black-dvd-extra-behind-the-scenes', ['Tumblr'])
self.assertMatch('http://tatianamaslanydaily.tumblr.com/post/54196191430', ['Tumblr'])
if __name__ == '__main__':
unittest.main()

View File

@@ -33,6 +33,7 @@ from youtube_dl.extractor import (
ImdbListIE,
KhanAcademyIE,
EveryonesMixtapeIE,
RutubeChannelIE,
)
@@ -195,11 +196,11 @@ class TestPlaylists(unittest.TestCase):
def test_imdb_list(self):
dl = FakeYDL()
ie = ImdbListIE(dl)
result = ie.extract('http://www.imdb.com/list/sMjedvGDd8U')
result = ie.extract('http://www.imdb.com/list/JFs9NWw6XI0')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'sMjedvGDd8U')
self.assertEqual(result['title'], 'Animated and Family Films')
self.assertTrue(len(result['entries']) >= 48)
self.assertEqual(result['id'], 'JFs9NWw6XI0')
self.assertEqual(result['title'], 'March 23, 2012 Releases')
self.assertEqual(len(result['entries']), 7)
def test_khanacademy_topic(self):
dl = FakeYDL()
@@ -219,6 +220,14 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], 'm7m0jJAbMQi')
self.assertEqual(result['title'], 'Driving')
self.assertEqual(len(result['entries']), 24)
def test_rutube_channel(self):
dl = FakeYDL()
ie = RutubeChannelIE(dl)
result = ie.extract('http://rutube.ru/tags/video/1409')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '1409')
self.assertTrue(len(result['entries']) >= 34)
if __name__ == '__main__':

View File

@@ -120,7 +120,7 @@ class TestYoutubeLists(unittest.TestCase):
def test_youtube_toplist(self):
dl = FakeYDL()
ie = YoutubeTopListIE(dl)
result = ie.extract('yttoplist:music:Top Tracks')
result = ie.extract('yttoplist:music:Trending')
entries = result['entries']
self.assertTrue(len(entries) >= 5)

View File

@@ -331,7 +331,7 @@ class YoutubeDL(object):
def __exit__(self, *args):
self.restore_console_title()
if self.params.get('cookiefile') is not None:
self.cookiejar.save()
@@ -396,10 +396,6 @@ class YoutubeDL(object):
except UnicodeEncodeError:
self.to_screen('[download] The file has already been downloaded')
def increment_downloads(self):
"""Increment the ordinal that assigns a number to each file."""
self._num_downloads += 1
def prepare_filename(self, info_dict):
"""Generate the output filename."""
try:
@@ -714,10 +710,10 @@ class YoutubeDL(object):
# TODO Central sorting goes here
if formats[0] is not info_dict:
if formats[0] is not info_dict:
# only set the 'formats' fields if the original info_dict list them
# otherwise we end up with a circular reference, the first (and unique)
# element in the 'formats' field in info_dict is info_dict itself,
# element in the 'formats' field in info_dict is info_dict itself,
# wich can't be exported to json
info_dict['formats'] = formats
if self.params.get('listformats', None):
@@ -773,8 +769,11 @@ class YoutubeDL(object):
"""Process a single resolved IE result."""
assert info_dict.get('_type', 'video') == 'video'
#We increment the download the download count here to match the previous behaviour.
self.increment_downloads()
max_downloads = self.params.get('max_downloads')
if max_downloads is not None:
if self._num_downloads >= int(max_downloads):
raise MaxDownloadsReached()
info_dict['fulltitle'] = info_dict['title']
if len(info_dict['title']) > 200:
@@ -791,10 +790,7 @@ class YoutubeDL(object):
self.to_screen('[download] ' + reason)
return
max_downloads = self.params.get('max_downloads')
if max_downloads is not None:
if self._num_downloads > int(max_downloads):
raise MaxDownloadsReached()
self._num_downloads += 1
filename = self.prepare_filename(info_dict)
@@ -1098,9 +1094,15 @@ class YoutubeDL(object):
res += fdict['format_note'] + ' '
if fdict.get('tbr') is not None:
res += '%4dk ' % fdict['tbr']
if fdict.get('container') is not None:
if res:
res += ', '
res += '%s container' % fdict['container']
if (fdict.get('vcodec') is not None and
fdict.get('vcodec') != 'none'):
res += '%-5s' % fdict['vcodec']
if res:
res += ', '
res += fdict['vcodec']
if fdict.get('vbr') is not None:
res += '@'
elif fdict.get('vbr') is not None and fdict.get('abr') is not None:
@@ -1110,7 +1112,10 @@ class YoutubeDL(object):
if fdict.get('acodec') is not None:
if res:
res += ', '
res += '%-5s' % fdict['acodec']
if fdict['acodec'] == 'none':
res += 'video only'
else:
res += '%-5s' % fdict['acodec']
elif fdict.get('abr') is not None:
if res:
res += ', '

View File

@@ -40,6 +40,7 @@ __authors__ = (
'Michael Orlitzky',
'Chris Gahan',
'Saimadhav Heblikar',
'Mike Col',
)
__license__ = 'Public Domain'
@@ -261,7 +262,7 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT', default=None,
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestaudio", "worst", and "worstaudio"')
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestaudio", "worst", and "worstaudio". By default, youtube-dl will pick the best quality.')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
video_format.add_option('--prefer-free-formats',

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
from .common import FileDownloader
from .hls import HlsFD
from .http import HttpFD
@@ -8,16 +10,17 @@ from ..utils import (
determine_ext,
)
def get_suitable_downloader(info_dict):
"""Get the downloader class that can handle the info dict."""
url = info_dict['url']
protocol = info_dict.get('protocol')
if url.startswith('rtmp'):
return RtmpFD
if determine_ext(url) == u'm3u8':
if (protocol == 'm3u8') or (protocol is None and determine_ext(url) == 'm3u8'):
return HlsFD
if url.startswith('mms') or url.startswith('rtsp'):
return MplayerFD
else:
return HttpFD

View File

@@ -314,4 +314,3 @@ class FileDownloader(object):
if the download is successful.
"""
self._progress_hooks.append(ph)

View File

@@ -27,7 +27,7 @@ class HttpFD(FileDownloader):
request = compat_urllib_request.Request(url, None, headers)
if self.params.get('test', False):
request.add_header('Range','bytes=0-10240')
request.add_header('Range', 'bytes=0-10240')
# Establish possible resume length
if os.path.isfile(encodeFilename(tmpfilename)):
@@ -39,7 +39,7 @@ class HttpFD(FileDownloader):
if resume_len != 0:
if self.params.get('continuedl', False):
self.report_resuming_byte(resume_len)
request.add_header('Range','bytes=%d-' % resume_len)
request.add_header('Range', 'bytes=%d-' % resume_len)
open_mode = 'ab'
else:
resume_len = 0
@@ -100,7 +100,7 @@ class HttpFD(FileDownloader):
if data_len is not None:
data_len = int(data_len) + resume_len
min_data_len = self.params.get("min_filesize", None)
max_data_len = self.params.get("max_filesize", None)
max_data_len = self.params.get("max_filesize", None)
if min_data_len is not None and data_len < min_data_len:
self.to_screen(u'\r[download] File is smaller than min-filesize (%s bytes < %s bytes). Aborting.' % (data_len, min_data_len))
return False

View File

@@ -18,10 +18,10 @@ class MplayerFD(FileDownloader):
try:
subprocess.call(['mplayer', '-h'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
except (OSError, IOError):
self.report_error(u'MMS or RTSP download detected but "%s" could not be run' % args[0] )
self.report_error(u'MMS or RTSP download detected but "%s" could not be run' % args[0])
return False
# Download using mplayer.
# Download using mplayer.
retval = subprocess.call(args)
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))

View File

@@ -27,6 +27,7 @@ from .cbs import CBSIE
from .channel9 import Channel9IE
from .cinemassacre import CinemassacreIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .clipsyndicate import ClipsyndicateIE
from .cmt import CMTIE
from .cnn import CNNIE
@@ -47,6 +48,7 @@ from .depositfiles import DepositFilesIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .dropbox import DropboxIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
@@ -72,6 +74,7 @@ from .francetv import (
CultureboxIE,
)
from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .funnyordie import FunnyOrDieIE
from .gamekings import GamekingsIE
from .gamespot import GameSpotIE
@@ -82,6 +85,7 @@ from .googlesearch import GoogleSearchIE
from .hark import HarkIE
from .hotnewhiphop import HotNewHipHopIE
from .howcast import HowcastIE
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .ign import IGNIE, OneUPIE
from .imdb import (
@@ -105,6 +109,7 @@ from .keezmovies import KeezMoviesIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .la7 import LA7IE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE, LivestreamOriginalIE
from .lynda import (
@@ -112,6 +117,7 @@ from .lynda import (
LyndaCourseIE
)
from .macgamestore import MacGameStoreIE
from .malemotion import MalemotionIE
from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
@@ -155,7 +161,12 @@ from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE
from .rutube import RutubeIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
RutubeMovieIE,
RutubePersonIE,
)
from .servingsys import ServingSysIE
from .sina import SinaIE
from .slashdot import SlashdotIE
@@ -218,7 +229,6 @@ from .vine import VineIE
from .viki import VikiIE
from .vk import VKIE
from .wat import WatIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE

View File

@@ -1,22 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
)
class ARDIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[^/\?]+)(?:\?.*)?'
_TITLE = r'<h1(?: class="boxTopHeadline")?>(?P<title>.*)</h1>'
_MEDIA_STREAM = r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), "(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[^/\?]+)(?:\?.*)?'
_TEST = {
u'url': u'http://www.ardmediathek.de/das-erste/tagesschau-in-100-sek?documentId=14077640',
u'file': u'14077640.mp4',
u'md5': u'6ca8824255460c787376353f9e20bbd8',
u'info_dict': {
u"title": u"11.04.2013 09:23 Uhr - Tagesschau in 100 Sekunden"
'url': 'http://www.ardmediathek.de/das-erste/guenther-jauch/edward-snowden-im-interview-held-oder-verraeter?documentId=19288786',
'file': '19288786.mp4',
'md5': '515bf47ce209fb3f5a61b7aad364634c',
'info_dict': {
'title': 'Edward Snowden im Interview - Held oder Verräter?',
'description': 'Edward Snowden hat alles aufs Spiel gesetzt, um die weltweite \xdcberwachung durch die Geheimdienste zu enttarnen. Nun stellt sich der ehemalige NSA-Mitarbeiter erstmals weltweit in einem TV-Interview den Fragen eines NDR-Journalisten. Die Sendung vom Sonntagabend.',
'thumbnail': 'http://www.ardmediathek.de/ard/servlet/contentblob/19/28/87/90/19288790/bild/2250037',
},
u'skip': u'Requires rtmpdump'
'skip': 'Blocked outside of Germany',
}
def _real_extract(self, url):
@@ -29,26 +35,49 @@ class ARDIE(InfoExtractor):
else:
video_id = m.group('video_id')
# determine title and media streams from webpage
html = self._download_webpage(url, video_id)
title = re.search(self._TITLE, html).group('title')
streams = [mo.groupdict() for mo in re.finditer(self._MEDIA_STREAM, html)]
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>', webpage, 'title')
description = self._html_search_meta(
'dcterms.abstract', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
streams = [
mo.groupdict()
for mo in re.finditer(
r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), "(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)', webpage)]
if not streams:
assert '"fsk"' in html
raise ExtractorError(u'This video is only available after 8:00 pm')
if '"fsk"' in webpage:
raise ExtractorError('This video is only available after 20:00')
# choose default media type and highest quality for now
stream = max([s for s in streams if int(s["media_type"]) == 0],
key=lambda s: int(s["quality"]))
formats = []
for s in streams:
format = {
'quality': int(s['quality']),
}
if s.get('rtmp_url'):
format['protocol'] = 'rtmp'
format['url'] = s['rtmp_url']
format['playpath'] = s['video_url']
else:
format['url'] = s['video_url']
# there's two possibilities: RTMP stream or HTTP download
info = {'id': video_id, 'title': title, 'ext': 'mp4'}
if stream['rtmp_url']:
self.to_screen(u'RTMP download detected')
assert stream['video_url'].startswith('mp4:')
info["url"] = stream["rtmp_url"]
info["play_path"] = stream['video_url']
else:
assert stream["video_url"].endswith('.mp4')
info["url"] = stream["video_url"]
return [info]
quality_name = self._search_regex(
r'[,.]([a-zA-Z0-9_-]+),?\.mp4', format['url'],
'quality name', default='NA')
format['format_id'] = '%s-%s-%s-%s' % (
determine_ext(format['url']), quality_name, s['media_type'],
s['quality'])
formats.append(format)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'formats': formats,
'thumbnail': thumbnail,
}

View File

@@ -23,7 +23,6 @@ from ..utils import (
class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s'
_TESTS = [
{
@@ -70,7 +69,7 @@ class BrightcoveIE(InfoExtractor):
'description': 'md5:363109c02998fee92ec02211bd8000df',
'uploader': 'National Ballet of Canada',
},
},
}
]
@classmethod
@@ -131,6 +130,11 @@ class BrightcoveIE(InfoExtractor):
"""Try to extract the brightcove url from the wepbage, returns None
if it can't be found
"""
url_m = re.search(r'<meta\s+property="og:video"\s+content="(http://c.brightcove.com/[^"]+)"', webpage)
if url_m:
return url_m.group(1)
m_brightcove = re.search(
r'''(?sx)<object
(?:
@@ -183,8 +187,9 @@ class BrightcoveIE(InfoExtractor):
return self._extract_video_info(video_info)
def _get_playlist_info(self, player_key):
playlist_info = self._download_webpage(self._PLAYLIST_URL_TEMPLATE % player_key,
player_key, 'Downloading playlist information')
info_url = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s' % player_key
playlist_info = self._download_webpage(
info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' not in json_data:

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
@@ -11,38 +11,38 @@ class Channel9IE(InfoExtractor):
The type of provided URL (video or playlist) is determined according to
meta Search.PageType from web page HTML rather than URL itself, as it is
not always possible to do.
not always possible to do.
'''
IE_DESC = u'Channel 9'
IE_NAME = u'channel9'
IE_DESC = 'Channel 9'
IE_NAME = 'channel9'
_VALID_URL = r'^https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
_TESTS = [
{
u'url': u'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
u'file': u'Events_TechEd_Australia_2013_KOS002.mp4',
u'md5': u'bbd75296ba47916b754e73c3a4bbdf10',
u'info_dict': {
u'title': u'Developer Kick-Off Session: Stuff We Love',
u'description': u'md5:c08d72240b7c87fcecafe2692f80e35f',
u'duration': 4576,
u'thumbnail': u'http://media.ch9.ms/ch9/9d51/03902f2d-fc97-4d3c-b195-0bfe15a19d51/KOS002_220.jpg',
u'session_code': u'KOS002',
u'session_day': u'Day 1',
u'session_room': u'Arena 1A',
u'session_speakers': [ u'Ed Blankenship', u'Andrew Coates', u'Brady Gaster', u'Patrick Klug', u'Mads Kristensen' ],
'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
'file': 'Events_TechEd_Australia_2013_KOS002.mp4',
'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
'info_dict': {
'title': 'Developer Kick-Off Session: Stuff We Love',
'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
'duration': 4576,
'thumbnail': 'http://media.ch9.ms/ch9/9d51/03902f2d-fc97-4d3c-b195-0bfe15a19d51/KOS002_220.jpg',
'session_code': 'KOS002',
'session_day': 'Day 1',
'session_room': 'Arena 1A',
'session_speakers': [ 'Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug', 'Mads Kristensen' ],
},
},
{
u'url': u'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
u'file': u'posts_Self-service-BI-with-Power-BI-nuclear-testing.mp4',
u'md5': u'b43ee4529d111bc37ba7ee4f34813e68',
u'info_dict': {
u'title': u'Self-service BI with Power BI - nuclear testing',
u'description': u'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
u'duration': 1540,
u'thumbnail': u'http://media.ch9.ms/ch9/87e1/0300391f-a455-4c72-bec3-4422f19287e1/selfservicenuk_512.jpg',
u'authors': [ u'Mike Wilmot' ],
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
'file': 'posts_Self-service-BI-with-Power-BI-nuclear-testing.mp4',
'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
'info_dict': {
'title': 'Self-service BI with Power BI - nuclear testing',
'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
'duration': 1540,
'thumbnail': 'http://media.ch9.ms/ch9/87e1/0300391f-a455-4c72-bec3-4422f19287e1/selfservicenuk_512.jpg',
'authors': [ 'Mike Wilmot' ],
},
}
]
@@ -60,7 +60,7 @@ class Channel9IE(InfoExtractor):
return 0
units = m.group('units')
try:
exponent = [u'B', u'KB', u'MB', u'GB', u'TB', u'PB', u'EB', u'ZB', u'YB'].index(units.upper())
exponent = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'].index(units.upper())
except ValueError:
return 0
size = float(m.group('size'))
@@ -80,7 +80,7 @@ class Channel9IE(InfoExtractor):
'url': x.group('url'),
'format_id': x.group('quality'),
'format_note': x.group('note'),
'format': u'%s (%s)' % (x.group('quality'), x.group('note')),
'format': '%s (%s)' % (x.group('quality'), x.group('note')),
'filesize': self._restore_bytes(x.group('filesize')), # File size is approximate
'preference': self._known_formats.index(x.group('quality')),
'vcodec': 'none' if x.group('note') == 'Audio only' else None,
@@ -91,10 +91,10 @@ class Channel9IE(InfoExtractor):
return formats
def _extract_title(self, html):
title = self._html_search_meta(u'title', html, u'title')
title = self._html_search_meta('title', html, 'title')
if title is None:
title = self._og_search_title(html)
TITLE_SUFFIX = u' (Channel 9)'
TITLE_SUFFIX = ' (Channel 9)'
if title is not None and title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
return title
@@ -110,7 +110,7 @@ class Channel9IE(InfoExtractor):
m = re.search(DESCRIPTION_REGEX, html)
if m is not None:
return m.group('description')
return self._html_search_meta(u'description', html, u'description')
return self._html_search_meta('description', html, 'description')
def _extract_duration(self, html):
m = re.search(r'data-video_duration="(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"', html)
@@ -172,7 +172,7 @@ class Channel9IE(InfoExtractor):
# Nothing to download
if len(formats) == 0 and slides is None and zip_ is None:
self._downloader.report_warning(u'None of recording, slides or zip are available for %s' % content_path)
self._downloader.report_warning('None of recording, slides or zip are available for %s' % content_path)
return
# Extract meta
@@ -244,7 +244,7 @@ class Channel9IE(InfoExtractor):
return contents
def _extract_list(self, content_path):
rss = self._download_xml(self._RSS_URL % content_path, content_path, u'Downloading RSS')
rss = self._download_xml(self._RSS_URL % content_path, content_path, 'Downloading RSS')
entries = [self.url_result(session_url.text, 'Channel9')
for session_url in rss.findall('./channel/item/link')]
title_text = rss.find('./channel/title').text
@@ -254,11 +254,11 @@ class Channel9IE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
content_path = mobj.group('contentpath')
webpage = self._download_webpage(url, content_path, u'Downloading web page')
webpage = self._download_webpage(url, content_path, 'Downloading web page')
page_type_m = re.search(r'<meta name="Search.PageType" content="(?P<pagetype>[^"]+)"/>', webpage)
if page_type_m is None:
raise ExtractorError(u'Search.PageType not found, don\'t know how to process this page', expected=True)
raise ExtractorError('Search.PageType not found, don\'t know how to process this page', expected=True)
page_type = page_type_m.group('pagetype')
if page_type == 'List': # List page, may contain list of 'item'-like objects
@@ -268,4 +268,4 @@ class Channel9IE(InfoExtractor):
elif page_type == 'Session': # Event session page, may contain downloadable content
return self._extract_session(webpage, content_path)
else:
raise ExtractorError(u'Unexpected Search.PageType %s' % page_type, expected=True)
raise ExtractorError('Unexpected Search.PageType %s' % page_type, expected=True)

View File

@@ -0,0 +1,59 @@
from __future__ import unicode_literals
import re
import string
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
translation_table = {
'a': 'h', 'd': 'e', 'e': 'v', 'f': 'o', 'g': 'f', 'i': 'd', 'l': 'n',
'm': 'a', 'n': 'm', 'p': 'u', 'q': 't', 'r': 's', 'v': 'p', 'x': 'r',
'y': 'l', 'z': 'i',
'$': ':', '&': '.', '(': '=', '^': '&', '=': '/',
}
class CliphunterIE(InfoExtractor):
IE_NAME = 'cliphunter'
_VALID_URL = r'''(?x)http://(?:www\.)?cliphunter\.com/w/
(?P<id>[0-9]+)/
(?P<seo>.+?)(?:$|[#\?])
'''
_TEST = {
'url': 'http://www.cliphunter.com/w/1012420/Fun_Jynx_Maze_solo',
'file': '1012420.flv',
'md5': '15e7740f30428abf70f4223478dc1225',
'info_dict': {
'title': 'Fun Jynx Maze solo',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
pl_fiji = self._search_regex(
r'pl_fiji = \'([^\']+)\'', webpage, 'video data')
pl_c_qual = self._search_regex(
r'pl_c_qual = "(.)"', webpage, 'video quality')
video_title = self._search_regex(
r'mediaTitle = "([^"]+)"', webpage, 'title')
video_url = ''.join(translation_table.get(c, c) for c in pl_fiji)
formats = [{
'url': video_url,
'format_id': pl_c_qual,
}]
return {
'id': video_id,
'title': video_title,
'formats': formats,
}

View File

@@ -66,11 +66,12 @@ class InfoExtractor(object):
* asr Audio sampling rate in Hertz
* vbr Average video bitrate in KBit/s
* vcodec Name of the video codec in use
* container Name of the container format
* filesize The number of bytes, if known in advance
* player_url SWF Player URL (used for rtmpdump).
* protocol The protocol that will be used for the actual
download, lower-case.
"http", "https", "rtsp", "rtmp" or so.
"http", "https", "rtsp", "rtmp", "m3u8" or so.
* preference Order number of this format. If this field is
present and not None, the formats get sorted
by this field.
@@ -239,7 +240,7 @@ class InfoExtractor(object):
except AttributeError:
url = url_or_request
if len(url) > 200:
h = u'___' + hashlib.md5(url).hexdigest()
h = u'___' + hashlib.md5(url.encode('utf-8')).hexdigest()
url = url[:200 - len(h)] + h
raw_filename = ('%s_%s.dump' % (video_id, url))
filename = sanitize_filename(raw_filename, restricted=True)
@@ -465,6 +466,9 @@ class InfoExtractor(object):
return RATING_TABLE.get(rating.lower(), None)
def _sort_formats(self, formats):
if not formats:
raise ExtractorError(u'No video formats found')
def _formats_key(f):
# TODO remove the following workaround
from ..utils import determine_ext

View File

@@ -30,7 +30,7 @@ class CondeNastIE(InfoExtractor):
'vanityfair': 'Vanity Fair',
}
_VALID_URL = r'http://(video|www).(?P<site>%s).com/(?P<type>watch|series|video)/(?P<id>.+)' % '|'.join(_SITES.keys())
_VALID_URL = r'http://(video|www)\.(?P<site>%s)\.com/(?P<type>watch|series|video)/(?P<id>.+)' % '|'.join(_SITES.keys())
IE_DESC = 'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))
_TEST = {

View File

@@ -0,0 +1,46 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class DiscoveryIE(InfoExtractor):
_VALID_URL = r'http://dsc\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_TEST = {
'url': 'http://dsc.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'file': '614784.mp4',
'md5': 'e12614f9ee303a6ccef415cb0793eba2',
'info_dict': {
'title': 'MythBusters: Mission Impossible Outtakes',
'description': ('Watch Jamie Hyneman and Adam Savage practice being'
' each other -- to the point of confusing Jamie\'s dog -- and '
'don\'t miss Adam moon-walking as Jamie ... behind Jamie\'s'
' back.'),
'duration': 156,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_list_json = self._search_regex(r'var videoListJSON = ({.*?});',
webpage, 'video list', flags=re.DOTALL)
video_list = json.loads(video_list_json)
info = video_list['clips'][0]
formats = []
for f in info['mp4']:
formats.append(
{'url': f['src'], r'ext': r'mp4', 'tbr': int(f['bitrate'][:-1])})
return {
'id': info['contentId'],
'title': video_list['name'],
'formats': formats,
'description': info['videoCaption'],
'thumbnail': info.get('videoStillURL') or info.get('thumbnailURL'),
'duration': info['duration'],
}

View File

@@ -0,0 +1,37 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class FreespeechIE(InfoExtractor):
IE_NAME = 'freespeech.org'
_VALID_URL = r'https://www\.freespeech\.org/video/(?P<title>.+)'
_TEST = {
'add_ie': ['Youtube'],
'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0',
'info_dict': {
'id': 'poKsVCZ64uU',
'ext': 'mp4',
'title': 'Obama, Romney Campaign in Colorado Ahead of Debate',
'description': 'Obama, Romney Campaign in Colorado Ahead of Debate',
'uploader': 'freespeechtv',
'uploader_id': 'freespeechtv',
'upload_date': '20121002',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
info_json = self._search_regex(r'jQuery.extend\(Drupal.settings, ({.*?})\);', webpage, 'info')
info = json.loads(info_json)
return {
'_type': 'url',
'url': info['jw_player']['basic_video_node_player']['file'],
'ie_key': 'Youtube',
}

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -6,13 +8,16 @@ from .common import InfoExtractor
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?funnyordie\.com/videos/(?P<id>[0-9a-f]+)/.*$'
_TEST = {
u'url': u'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
u'file': u'0732f586d7.mp4',
u'md5': u'f647e9e90064b53b6e046e75d0241fbd',
u'info_dict': {
u"description": u"Lyrics changed to match the video. Spoken cameo by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a concept by Dustin McLean (DustFilms.com). Performed, edited, and written by David A. Scott.",
u"title": u"Heart-Shaped Box: Literal Video Version"
}
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'file': '0732f586d7.mp4',
'md5': 'f647e9e90064b53b6e046e75d0241fbd',
'info_dict': {
'description': ('Lyrics changed to match the video. Spoken cameo '
'by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a '
'concept by Dustin McLean (DustFilms.com). Performed, edited, '
'and written by David A. Scott.'),
'title': 'Heart-Shaped Box: Literal Video Version',
},
}
def _real_extract(self, url):
@@ -23,13 +28,12 @@ class FunnyOrDieIE(InfoExtractor):
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, u'video URL', flags=re.DOTALL)
webpage, 'video URL', flags=re.DOTALL)
info = {
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}
return [info]

View File

@@ -78,6 +78,18 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
{
# https://github.com/rg3/youtube-dl/issues/2253
'url': 'http://bcove.me/i6nfkrc3',
'file': '3101154703001.mp4',
'md5': '0ba9446db037002366bab3b3eb30c88c',
'info_dict': {
'title': 'Still no power',
'uploader': 'thestar.com',
'description': 'Mississauga resident David Farmer is still out of power as a result of the ice storm a month ago. To keep the house warm, Farmer cuts wood from his property for a wood burning stove downstairs.',
},
'add_ie': ['Brightcove'],
},
# Direct link to a video
{
'url': 'http://media.w3.org/2010/05/sintel/trailer.mp4',
@@ -332,10 +344,16 @@ class GenericIE(InfoExtractor):
# Look for embedded Facebook player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https://www.facebook.com/video/embed.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Facebook')
# Look for embedded Huffington Post player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://embed\.live.huffingtonpost\.com/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'HuffPost')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:

View File

@@ -13,7 +13,7 @@ from ..utils import (
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'http://www\.hotnewhiphop.com/.*\.(?P<id>.*)\.html'
_VALID_URL = r'http://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
_TEST = {
'url': 'http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html',
'file': '1435540.mp3',

View File

@@ -0,0 +1,82 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
unified_strdate,
)
class HuffPostIE(InfoExtractor):
IE_DESC = 'Huffington Post'
_VALID_URL = r'''(?x)
https?://(embed\.)?live\.huffingtonpost\.com/
(?:
r/segment/[^/]+/|
HPLEmbedPlayer/\?segmentId=
)
(?P<id>[0-9a-f]+)'''
_TEST = {
'url': 'http://live.huffingtonpost.com/r/segment/legalese-it/52dd3e4b02a7602131000677',
'file': '52dd3e4b02a7602131000677.mp4',
'md5': '55f5e8981c1c80a64706a44b74833de8',
'info_dict': {
'title': 'Legalese It! with @MikeSacksHP',
'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ',
'duration': 1549,
'upload_date': '20140124',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
api_url = 'http://embed.live.huffingtonpost.com/api/segments/%s.json' % video_id
data = self._download_json(api_url, video_id)['data']
video_title = data['title']
duration = parse_duration(data['running_time'])
upload_date = unified_strdate(data['schedule']['starts_at'])
description = data.get('description')
thumbnails = []
for url in data['images'].values():
m = re.match('.*-([0-9]+x[0-9]+)\.', url)
if not m:
continue
thumbnails.append({
'url': url,
'resolution': m.group(1),
})
formats = [{
'format': key,
'format_id': key.replace('/', '.'),
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
} for key, url in data['sources']['live'].items()]
if data.get('fivemin_id'):
fid = data['fivemin_id']
fcat = str(int(fid) // 100 + 1)
furl = 'http://avideos.5min.com/2/' + fcat[-3:] + '/' + fcat + '/' + fid + '.mp4'
formats.append({
'format': 'fivemin',
'url': furl,
'preference': 1,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'description': description,
'formats': formats,
'duration': duration,
'upload_date': upload_date,
'thumbnails': thumbnails,
}

View File

@@ -69,12 +69,9 @@ class ImdbListIE(InfoExtractor):
list_id = mobj.group('id')
webpage = self._download_webpage(url, list_id)
list_code = self._search_regex(
r'(?s)<div\s+class="list\sdetail">(.*?)class="see-more"',
webpage, 'list code')
entries = [
self.url_result('http://www.imdb.com' + m, 'Imdb')
for m in re.findall(r'href="(/video/imdb/vi[^"]+)"', webpage)]
for m in re.findall(r'href="(/video/imdb/vi[^"]+)"\s+data-type="playlist"', webpage)]
list_title = self._html_search_regex(
r'<h1 class="header">(.*?)</h1>', webpage, 'list title')

View File

@@ -1,4 +1,5 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
@@ -11,38 +12,38 @@ from ..utils import (
class IviIE(InfoExtractor):
IE_DESC = u'ivi.ru'
IE_NAME = u'ivi'
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
_VALID_URL = r'^https?://(?:www\.)?ivi\.ru/watch(?:/(?P<compilationid>[^/]+))?/(?P<videoid>\d+)'
_TESTS = [
# Single movie
{
u'url': u'http://www.ivi.ru/watch/53141',
u'file': u'53141.mp4',
u'md5': u'6ff5be2254e796ed346251d117196cf4',
u'info_dict': {
u'title': u'Иван Васильевич меняет профессию',
u'description': u'md5:14d8eda24e9d93d29b5857012c6d6346',
u'duration': 5498,
u'thumbnail': u'http://thumbs.ivi.ru/f20.vcp.digitalaccess.ru/contents/d/1/c3c885163a082c29bceeb7b5a267a6.jpg',
'url': 'http://www.ivi.ru/watch/53141',
'file': '53141.mp4',
'md5': '6ff5be2254e796ed346251d117196cf4',
'info_dict': {
'title': 'Иван Васильевич меняет профессию',
'description': 'md5:b924063ea1677c8fe343d8a72ac2195f',
'duration': 5498,
'thumbnail': 'http://thumbs.ivi.ru/f20.vcp.digitalaccess.ru/contents/d/1/c3c885163a082c29bceeb7b5a267a6.jpg',
},
u'skip': u'Only works from Russia',
'skip': 'Only works from Russia',
},
# Serial's serie
{
u'url': u'http://www.ivi.ru/watch/dezhurnyi_angel/74791',
u'file': u'74791.mp4',
u'md5': u'3e6cc9a848c1d2ebcc6476444967baa9',
u'info_dict': {
u'title': u'Дежурный ангел - 1 серия',
u'duration': 2490,
u'thumbnail': u'http://thumbs.ivi.ru/f7.vcp.digitalaccess.ru/contents/8/e/bc2f6c2b6e5d291152fdd32c059141.jpg',
'url': 'http://www.ivi.ru/watch/dezhurnyi_angel/74791',
'file': '74791.mp4',
'md5': '3e6cc9a848c1d2ebcc6476444967baa9',
'info_dict': {
'title': 'Дежурный ангел - 1 серия',
'duration': 2490,
'thumbnail': 'http://thumbs.ivi.ru/f7.vcp.digitalaccess.ru/contents/8/e/bc2f6c2b6e5d291152fdd32c059141.jpg',
},
u'skip': u'Only works from Russia',
'skip': 'Only works from Russia',
}
]
# Sorted by quality
_known_formats = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
@@ -54,7 +55,7 @@ class IviIE(InfoExtractor):
return m.group('description') if m is not None else None
def _extract_comment_count(self, html):
m = re.search(u'(?s)<a href="#" id="view-comments" class="action-button dim gradient">\s*Комментарии:\s*(?P<commentcount>\d+)\s*</a>', html)
m = re.search('(?s)<a href="#" id="view-comments" class="action-button dim gradient">\s*Комментарии:\s*(?P<commentcount>\d+)\s*</a>', html)
return int(m.group('commentcount')) if m is not None else 0
def _real_extract(self, url):
@@ -63,49 +64,49 @@ class IviIE(InfoExtractor):
api_url = 'http://api.digitalaccess.ru/api/json/'
data = {u'method': u'da.content.get',
u'params': [video_id, {u'site': u's183',
u'referrer': u'http://www.ivi.ru/watch/%s' % video_id,
u'contentid': video_id
}
]
data = {'method': 'da.content.get',
'params': [video_id, {'site': 's183',
'referrer': 'http://www.ivi.ru/watch/%s' % video_id,
'contentid': video_id
}
]
}
request = compat_urllib_request.Request(api_url, json.dumps(data))
video_json_page = self._download_webpage(request, video_id, u'Downloading video JSON')
video_json_page = self._download_webpage(request, video_id, 'Downloading video JSON')
video_json = json.loads(video_json_page)
if u'error' in video_json:
error = video_json[u'error']
if error[u'origin'] == u'NoRedisValidData':
raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
raise ExtractorError(u'Unable to download video %s: %s' % (video_id, error[u'message']), expected=True)
if 'error' in video_json:
error = video_json['error']
if error['origin'] == 'NoRedisValidData':
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
raise ExtractorError('Unable to download video %s: %s' % (video_id, error['message']), expected=True)
result = video_json[u'result']
result = video_json['result']
formats = [{
'url': x[u'url'],
'format_id': x[u'content_format'],
'preference': self._known_formats.index(x[u'content_format']),
} for x in result[u'files'] if x[u'content_format'] in self._known_formats]
'url': x['url'],
'format_id': x['content_format'],
'preference': self._known_formats.index(x['content_format']),
} for x in result['files'] if x['content_format'] in self._known_formats]
self._sort_formats(formats)
if not formats:
raise ExtractorError(u'No media links available for %s' % video_id)
raise ExtractorError('No media links available for %s' % video_id)
duration = result[u'duration']
compilation = result[u'compilation']
title = result[u'title']
duration = result['duration']
compilation = result['compilation']
title = result['title']
title = '%s - %s' % (compilation, title) if compilation is not None else title
previews = result[u'preview']
previews = result['preview']
previews.sort(key=lambda fmt: self._known_thumbnails.index(fmt['content_format']))
thumbnail = previews[-1][u'url'] if len(previews) > 0 else None
thumbnail = previews[-1]['url'] if len(previews) > 0 else None
video_page = self._download_webpage(url, video_id, u'Downloading video page')
video_page = self._download_webpage(url, video_id, 'Downloading video page')
description = self._extract_description(video_page)
comment_count = self._extract_comment_count(video_page)
@@ -121,8 +122,8 @@ class IviIE(InfoExtractor):
class IviCompilationIE(InfoExtractor):
IE_DESC = u'ivi.ru compilations'
IE_NAME = u'ivi:compilation'
IE_DESC = 'ivi.ru compilations'
IE_NAME = 'ivi:compilation'
_VALID_URL = r'^https?://(?:www\.)?ivi\.ru/watch/(?!\d+)(?P<compilationid>[a-z\d_-]+)(?:/season(?P<seasonid>\d+))?$'
def _extract_entries(self, html, compilation_id):
@@ -135,22 +136,23 @@ class IviCompilationIE(InfoExtractor):
season_id = mobj.group('seasonid')
if season_id is not None: # Season link
season_page = self._download_webpage(url, compilation_id, u'Downloading season %s web page' % season_id)
season_page = self._download_webpage(url, compilation_id, 'Downloading season %s web page' % season_id)
playlist_id = '%s/season%s' % (compilation_id, season_id)
playlist_title = self._html_search_meta(u'title', season_page, u'title')
playlist_title = self._html_search_meta('title', season_page, 'title')
entries = self._extract_entries(season_page, compilation_id)
else: # Compilation link
compilation_page = self._download_webpage(url, compilation_id, u'Downloading compilation web page')
compilation_page = self._download_webpage(url, compilation_id, 'Downloading compilation web page')
playlist_id = compilation_id
playlist_title = self._html_search_meta(u'title', compilation_page, u'title')
playlist_title = self._html_search_meta('title', compilation_page, 'title')
seasons = re.findall(r'<a href="/watch/%s/season(\d+)">[^<]+</a>' % compilation_id, compilation_page)
if len(seasons) == 0: # No seasons in this compilation
entries = self._extract_entries(compilation_page, compilation_id)
else:
entries = []
for season_id in seasons:
season_page = self._download_webpage('http://www.ivi.ru/watch/%s/season%s' % (compilation_id, season_id),
compilation_id, u'Downloading season %s web page' % season_id)
season_page = self._download_webpage(
'http://www.ivi.ru/watch/%s/season%s' % (compilation_id, season_id),
compilation_id, 'Downloading season %s web page' % season_id)
entries.extend(self._extract_entries(season_page, compilation_id))
return self.playlist_result(entries, playlist_id, playlist_title)

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -5,36 +7,34 @@ from .common import InfoExtractor
class KeekIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?keek\.com/(?:!|\w+/keeks/)(?P<videoID>\w+)'
IE_NAME = u'keek'
IE_NAME = 'keek'
_TEST = {
u'url': u'https://www.keek.com/ytdl/keeks/NODfbab',
u'file': u'NODfbab.mp4',
u'md5': u'9b0636f8c0f7614afa4ea5e4c6e57e83',
u'info_dict': {
u"uploader": u"ytdl",
u"title": u"test chars: \"'/\\\u00e4<>This is a test video for youtube-dl.For more information, contact phihag@phihag.de ."
}
'url': 'https://www.keek.com/ytdl/keeks/NODfbab',
'file': 'NODfbab.mp4',
'md5': '9b0636f8c0f7614afa4ea5e4c6e57e83',
'info_dict': {
'uploader': 'ytdl',
'title': 'test chars: "\'/\\\u00e4<>This is a test video for youtube-dl.For more information, contact phihag@phihag.de .',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('videoID')
video_url = u'http://cdn.keek.com/keek/video/%s' % video_id
thumbnail = u'http://cdn.keek.com/keek/thumbnail/%s/w100/h75' % video_id
video_url = 'http://cdn.keek.com/keek/video/%s' % video_id
thumbnail = 'http://cdn.keek.com/keek/thumbnail/%s/w100/h75' % video_id
webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage)
uploader = self._html_search_regex(
r'<div class="user-name-and-bio">[\S\s]+?<h2>(?P<uploader>.+?)</h2>',
webpage, 'uploader', fatal=False)
uploader = self._html_search_regex(r'<div class="user-name-and-bio">[\S\s]+?<h2>(?P<uploader>.+?)</h2>',
webpage, u'uploader', fatal=False)
info = {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': video_title,
'thumbnail': thumbnail,
'uploader': uploader
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': self._og_search_title(webpage),
'thumbnail': thumbnail,
'uploader': uploader
}
return [info]

View File

@@ -0,0 +1,63 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
)
class LA7IE(InfoExtractor):
IE_NAME = 'la7.tv'
_VALID_URL = r'''(?x)
https?://(?:www\.)?la7\.tv/
(?:
richplayer/\?assetid=|
\?contentId=
)
(?P<id>[0-9]+)'''
_TEST = {
'url': 'http://www.la7.tv/richplayer/?assetid=50355319',
'file': '50355319.mp4',
'md5': 'ec7d1f0224d20ba293ab56cf2259651f',
'info_dict': {
'title': 'IL DIVO',
'description': 'Un film di Paolo Sorrentino con Toni Servillo, Anna Bonaiuto, Giulio Bosetti e Flavio Bucci',
'duration': 6254,
},
'skip': 'Blocked in the US',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
xml_url = 'http://www.la7.tv/repliche/content/index.php?contentId=%s' % video_id
doc = self._download_xml(xml_url, video_id)
video_title = doc.find('title').text
description = doc.find('description').text
duration = parse_duration(doc.find('duration').text)
thumbnail = doc.find('img').text
view_count = int(doc.find('views').text)
prefix = doc.find('.//fqdn').text.strip().replace('auto:', 'http:')
formats = [{
'format': vnode.find('quality').text,
'tbr': int(vnode.find('quality').text),
'url': vnode.find('fms').text.strip().replace('mp4:', prefix),
} for vnode in doc.findall('.//videos/video')]
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'view_count': view_count,
}

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -7,46 +9,36 @@ from ..utils import (
class LiveLeakIE(InfoExtractor):
_VALID_URL = r'^(?:http://)?(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'liveleak'
_TEST = {
u'url': u'http://www.liveleak.com/view?i=757_1364311680',
u'file': u'757_1364311680.mp4',
u'md5': u'0813c2430bea7a46bf13acf3406992f4',
u'info_dict': {
u"description": u"extremely bad day for this guy..!",
u"uploader": u"ljfriel2",
u"title": u"Most unlucky car accident"
'url': 'http://www.liveleak.com/view?i=757_1364311680',
'file': '757_1364311680.mp4',
'md5': '0813c2430bea7a46bf13acf3406992f4',
'info_dict': {
'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2',
'title': 'Most unlucky car accident'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'file: "(.*?)",',
webpage, u'video URL')
video_url = self._search_regex(
r'file: "(.*?)",', webpage, 'video URL')
video_title = self._og_search_title(webpage).replace('LiveLeak.com -', '').strip()
video_description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'By:.*?(\w+)</a>', webpage, 'uploader', fatal=False)
video_uploader = self._html_search_regex(r'By:.*?(\w+)</a>',
webpage, u'uploader', fatal=False)
info = {
'id': video_id,
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': video_title,
'description': video_description,
'uploader': video_uploader
}
return [info]

View File

@@ -0,0 +1,58 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
)
class MalemotionIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?malemotion\.com/video/(.+?)\.(?P<id>.+?)(#|$)'
_TEST = {
'url': 'http://malemotion.com/video/bien-dur.10ew',
'file': '10ew.mp4',
'md5': 'b3cc49f953b107e4a363cdff07d100ce',
'info_dict': {
"title": "Bien dur",
"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group("id")
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
# Extract video URL
video_url = compat_urllib_parse.unquote(
self._search_regex(r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
# Extract title
video_title = self._html_search_regex(
r'<title>(.*?)</title', webpage, 'title')
# Extract video thumbnail
video_thumbnail = self._search_regex(
r'<video .+?poster="(.+?)"', webpage, 'thumbnail', fatal=False)
formats = [{
'url': video_url,
'ext': 'mp4',
'format_id': 'mp4',
'preference': 1,
}]
return {
'id': video_id,
'formats': formats,
'uploader': None,
'upload_date': None,
'title': video_title,
'thumbnail': video_thumbnail,
'description': None,
'age_limit': 18,
}

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import json
import re
@@ -9,13 +11,13 @@ class NineGagIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?9gag\.tv/v/(?P<id>[0-9]+)'
_TEST = {
u"url": u"http://9gag.tv/v/1912",
u"file": u"1912.mp4",
u"info_dict": {
u"description": u"This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
u"title": u"\"People Are Awesome 2013\" Is Absolutely Awesome"
"url": "http://9gag.tv/v/1912",
"file": "1912.mp4",
"info_dict": {
"description": "This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
"title": "\"People Are Awesome 2013\" Is Absolutely Awesome"
},
u'add_ie': [u'Youtube']
'add_ie': ['Youtube']
}
def _real_extract(self, url):
@@ -25,7 +27,7 @@ class NineGagIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
data_json = self._html_search_regex(r'''(?x)
<div\s*id="tv-video"\s*data-video-source="youtube"\s*
data-video-meta="([^"]+)"''', webpage, u'video metadata')
data-video-meta="([^"]+)"''', webpage, 'video metadata')
data = json.loads(data_json)

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -7,12 +9,12 @@ from ..utils import compat_urllib_parse
class PornHdIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<video_id>[0-9]+)/(?P<video_title>.+)'
_TEST = {
u'url': u'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
u'file': u'1962.flv',
u'md5': u'35272469887dca97abd30abecc6cdf75',
u'info_dict': {
u"title": u"sierra-day-gets-his-cum-all-over-herself-hd-porn-video",
u"age_limit": 18,
'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'file': '1962.flv',
'md5': '35272469887dca97abd30abecc6cdf75',
'info_dict': {
"title": "sierra-day-gets-his-cum-all-over-herself-hd-porn-video",
"age_limit": 18,
}
}
@@ -24,9 +26,13 @@ class PornHdIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'&hd=(http.+?)&', webpage, u'video URL')
video_url = compat_urllib_parse.unquote(video_url)
next_url = self._html_search_regex(
r'&hd=(http.+?)&', webpage, 'video URL')
next_url = compat_urllib_parse.unquote(next_url)
video_url = self._download_webpage(
next_url, video_id, note='Retrieving video URL',
errnote='Could not retrieve video URL')
age_limit = 18
return {

View File

@@ -1,4 +1,7 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -12,78 +15,77 @@ class RTLnowIE(InfoExtractor):
"""Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<domain>rtl-now\.rtl\.de|rtl2now\.rtl2\.de|(?:www\.)?voxnow\.de|(?:www\.)?rtlnitronow\.de|(?:www\.)?superrtlnow\.de|(?:www\.)?n-tvnow\.de)/+[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
_TESTS = [{
u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
u'file': u'90419.flv',
u'info_dict': {
u'upload_date': u'20070416',
u'title': u'Ahornallee - Folge 1 - Der Einzug',
u'description': u'Folge 1 - Der Einzug',
'url': 'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
'file': '90419.flv',
'info_dict': {
'upload_date': '20070416',
'title': 'Ahornallee - Folge 1 - Der Einzug',
'description': 'Folge 1 - Der Einzug',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
u'skip': u'Only works from Germany',
'skip': 'Only works from Germany',
},
{
u'url': u'http://rtl2now.rtl2.de/aerger-im-revier/episode-15-teil-1.php?film_id=69756&player=1&season=2&index=5',
u'file': u'69756.flv',
u'info_dict': {
u'upload_date': u'20120519',
u'title': u'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit...',
u'description': u'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit u.a.',
u'thumbnail': u'http://autoimg.static-fra.de/rtl2now/219850/1500x1500/image2.jpg',
'url': 'http://rtl2now.rtl2.de/aerger-im-revier/episode-15-teil-1.php?film_id=69756&player=1&season=2&index=5',
'file': '69756.flv',
'info_dict': {
'upload_date': '20120519',
'title': 'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit...',
'description': 'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit u.a.',
'thumbnail': 'http://autoimg.static-fra.de/rtl2now/219850/1500x1500/image2.jpg',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
u'skip': u'Only works from Germany',
'skip': 'Only works from Germany',
},
{
u'url': u'http://www.voxnow.de/voxtours/suedafrika-reporter-ii.php?film_id=13883&player=1&season=17',
u'file': u'13883.flv',
u'info_dict': {
u'upload_date': u'20090627',
u'title': u'Voxtours - Südafrika-Reporter II',
u'description': u'Südafrika-Reporter II',
'url': 'http://www.voxnow.de/voxtours/suedafrika-reporter-ii.php?film_id=13883&player=1&season=17',
'file': '13883.flv',
'info_dict': {
'upload_date': '20090627',
'title': 'Voxtours - Südafrika-Reporter II',
'description': 'Südafrika-Reporter II',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
},
{
u'url': u'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1',
u'file': u'99205.flv',
u'info_dict': {
u'upload_date': u'20080928',
u'title': u'Medicopter 117 - Angst!',
u'description': u'Angst!',
u'thumbnail': u'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg'
'url': 'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1',
'file': '99205.flv',
'info_dict': {
'upload_date': '20080928',
'title': 'Medicopter 117 - Angst!',
'description': 'Angst!',
'thumbnail': 'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg'
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
},
{
u'url': u'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10',
u'file': u'124903.flv',
u'info_dict': {
u'upload_date': u'20130101',
u'title': u'Top Gear vom 01.01.2013',
u'description': u'Episode 1',
'url': 'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10',
'file': '124903.flv',
'info_dict': {
'upload_date': '20130101',
'title': 'Top Gear vom 01.01.2013',
'description': 'Episode 1',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
u'skip': u'Only works from Germany',
'skip': 'Only works from Germany',
}]
def _real_extract(self,url):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
video_page_url = u'http://' + mobj.group('domain') + u'/'
video_id = mobj.group(u'video_id')
webpage_url = 'http://' + mobj.group('url')
video_page_url = 'http://' + mobj.group('domain') + '/'
video_id = mobj.group('video_id')
webpage = self._download_webpage(webpage_url, video_id)
@@ -94,51 +96,53 @@ class RTLnowIE(InfoExtractor):
msg = clean_html(note_m.group(1))
raise ExtractorError(msg)
video_title = self._html_search_regex(r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, u'title')
playerdata_url = self._html_search_regex(r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'',
webpage, u'playerdata_url')
video_title = self._html_search_regex(
r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, 'title')
playerdata_url = self._html_search_regex(
r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'',
webpage, 'playerdata_url')
playerdata = self._download_webpage(playerdata_url, video_id)
mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)(?:\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr)?\]\]></title>', playerdata)
if mobj:
video_description = mobj.group(u'description')
video_description = mobj.group('description')
if mobj.group('upload_date_Y'):
video_upload_date = mobj.group('upload_date_Y')
elif mobj.group('upload_date_y'):
video_upload_date = u'20' + mobj.group('upload_date_y')
video_upload_date = '20' + mobj.group('upload_date_y')
else:
video_upload_date = None
if video_upload_date:
video_upload_date += mobj.group('upload_date_m')+mobj.group('upload_date_d')
video_upload_date += mobj.group('upload_date_m') + mobj.group('upload_date_d')
else:
video_description = None
video_upload_date = None
self._downloader.report_warning(u'Unable to extract description and upload date')
self._downloader.report_warning('Unable to extract description and upload date')
# Thumbnail: not every video has an thumbnail
mobj = re.search(r'<meta property="og:image" content="(?P<thumbnail>[^"]+)">', webpage)
if mobj:
video_thumbnail = mobj.group(u'thumbnail')
video_thumbnail = mobj.group('thumbnail')
else:
video_thumbnail = None
mobj = re.search(r'<filename [^>]+><!\[CDATA\[(?P<url>rtmpe://(?:[^/]+/){2})(?P<play_path>[^\]]+)\]\]></filename>', playerdata)
if mobj is None:
raise ExtractorError(u'Unable to extract media URL')
video_url = mobj.group(u'url')
video_play_path = u'mp4:' + mobj.group(u'play_path')
video_player_url = video_page_url + u'includes/vodplayer.swf'
raise ExtractorError('Unable to extract media URL')
video_url = mobj.group('url')
video_play_path = 'mp4:' + mobj.group('play_path')
video_player_url = video_page_url + 'includes/vodplayer.swf'
return [{
'id': video_id,
'url': video_url,
'play_path': video_play_path,
'page_url': video_page_url,
'player_url': video_player_url,
'ext': 'flv',
'title': video_title,
return {
'id': video_id,
'url': video_url,
'play_path': video_play_path,
'page_url': video_page_url,
'player_url': video_player_url,
'ext': 'flv',
'title': video_title,
'description': video_description,
'upload_date': video_upload_date,
'thumbnail': video_thumbnail,
}]
'thumbnail': video_thumbnail,
}

View File

@@ -1,58 +1,124 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
import itertools
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_str,
unified_strdate,
ExtractorError,
)
class RutubeIE(InfoExtractor):
_VALID_URL = r'https?://rutube\.ru/video/(?P<long_id>\w+)'
IE_NAME = 'rutube'
IE_DESC = 'Rutube videos'
_VALID_URL = r'https?://rutube\.ru/video/(?P<id>[\da-z]{32})'
_TEST = {
u'url': u'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
u'file': u'3eac3b4561676c17df9132a9a1e62e3e.mp4',
u'info_dict': {
u'title': u'Раненный кенгуру забежал в аптеку',
u'uploader': u'NTDRussian',
u'uploader_id': u'29790',
'url': 'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
'file': '3eac3b4561676c17df9132a9a1e62e3e.mp4',
'info_dict': {
'title': 'Раненный кенгуру забежал в аптеку',
'description': 'http://www.ntdtv.ru ',
'duration': 80,
'uploader': 'NTDRussian',
'uploader_id': '29790',
'upload_date': '20131016',
},
u'params': {
'params': {
# It requires ffmpeg (m3u8 download)
u'skip_download': True,
'skip_download': True,
},
}
def _get_api_response(self, short_id, subpath):
api_url = 'http://rutube.ru/api/play/%s/%s/?format=json' % (subpath, short_id)
response_json = self._download_webpage(api_url, short_id,
u'Downloading %s json' % subpath)
return json.loads(response_json)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
long_id = mobj.group('long_id')
webpage = self._download_webpage(url, long_id)
og_video = self._og_search_video_url(webpage)
short_id = compat_urlparse.urlparse(og_video).path[1:]
options = self._get_api_response(short_id, 'options')
trackinfo = self._get_api_response(short_id, 'trackinfo')
video_id = mobj.group('id')
api_response = self._download_webpage('http://rutube.ru/api/video/%s/?format=json' % video_id,
video_id, 'Downloading video JSON')
video = json.loads(api_response)
api_response = self._download_webpage('http://rutube.ru/api/play/trackinfo/%s/?format=json' % video_id,
video_id, 'Downloading trackinfo JSON')
trackinfo = json.loads(api_response)
# Some videos don't have the author field
author = trackinfo.get('author') or {}
m3u8_url = trackinfo['video_balancer'].get('m3u8')
if m3u8_url is None:
raise ExtractorError(u'Couldn\'t find m3u8 manifest url')
raise ExtractorError('Couldn\'t find m3u8 manifest url')
return {
'id': trackinfo['id'],
'title': trackinfo['title'],
'id': video['id'],
'title': video['title'],
'description': video['description'],
'duration': video['duration'],
'view_count': video['hits'],
'url': m3u8_url,
'ext': 'mp4',
'thumbnail': options['thumbnail_url'],
'thumbnail': video['thumbnail_url'],
'uploader': author.get('name'),
'uploader_id': compat_str(author['id']) if author else None,
'upload_date': unified_strdate(video['created_ts']),
'age_limit': 18 if video['is_adult'] else 0,
}
class RutubeChannelIE(InfoExtractor):
IE_NAME = 'rutube:channel'
IE_DESC = 'Rutube channels'
_VALID_URL = r'http://rutube\.ru/tags/video/(?P<id>\d+)'
_PAGE_TEMPLATE = 'http://rutube.ru/api/tags/video/%s/?page=%s&format=json'
def _extract_videos(self, channel_id, channel_title=None):
entries = []
for pagenum in itertools.count(1):
api_response = self._download_webpage(
self._PAGE_TEMPLATE % (channel_id, pagenum),
channel_id, 'Downloading page %s' % pagenum)
page = json.loads(api_response)
results = page['results']
if not results:
break
entries.extend(self.url_result(result['video_url'], 'Rutube') for result in results)
if not page['has_next']:
break
return self.playlist_result(entries, channel_id, channel_title)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
return self._extract_videos(channel_id)
class RutubeMovieIE(RutubeChannelIE):
IE_NAME = 'rutube:movie'
IE_DESC = 'Rutube movies'
_VALID_URL = r'http://rutube\.ru/metainfo/tv/(?P<id>\d+)'
_MOVIE_TEMPLATE = 'http://rutube.ru/api/metainfo/tv/%s/?format=json'
_PAGE_TEMPLATE = 'http://rutube.ru/api/metainfo/tv/%s/video?page=%s&format=json'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
movie_id = mobj.group('id')
api_response = self._download_webpage(
self._MOVIE_TEMPLATE % movie_id, movie_id,
'Downloading movie JSON')
movie = json.loads(api_response)
movie_name = movie['name']
return self._extract_videos(movie_id, movie_name)
class RutubePersonIE(RutubeChannelIE):
IE_NAME = 'rutube:person'
IE_DESC = 'Rutube person videos'
_VALID_URL = r'http://rutube\.ru/video/person/(?P<id>\d+)'
_PAGE_TEMPLATE = 'http://rutube.ru/api/video/person/%s/?page=%s&format=json'

View File

@@ -1,4 +1,5 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@@ -12,21 +13,31 @@ from ..utils import (
class SinaIE(InfoExtractor):
_VALID_URL = r'''https?://(.*?\.)?video\.sina\.com\.cn/
(
(.+?/(((?P<pseudo_id>\d+).html)|(.*?(\#|(vid=))(?P<id>\d+?)($|&))))
(.+?/(((?P<pseudo_id>\d+).html)|(.*?(\#|(vid=)|b/)(?P<id>\d+?)($|&|\-))))
|
# This is used by external sites like Weibo
(api/sinawebApi/outplay.php/(?P<token>.+?)\.swf)
)
'''
_TEST = {
u'url': u'http://video.sina.com.cn/news/vlist/zt/chczlj2013/?opsubject_id=top12#110028898',
u'file': u'110028898.flv',
u'md5': u'd65dd22ddcf44e38ce2bf58a10c3e71f',
u'info_dict': {
u'title': u'《中国新闻》 朝鲜要求巴拿马立即释放被扣船员',
}
}
_TESTS = [
{
'url': 'http://video.sina.com.cn/news/vlist/zt/chczlj2013/?opsubject_id=top12#110028898',
'file': '110028898.flv',
'md5': 'd65dd22ddcf44e38ce2bf58a10c3e71f',
'info_dict': {
'title': '《中国新闻》 朝鲜要求巴拿马立即释放被扣船员',
}
},
{
'url': 'http://video.sina.com.cn/v/b/101314253-1290078633.html',
'info_dict': {
'id': '101314253',
'ext': 'flv',
'title': '军方提高对朝情报监视级别',
},
},
]
@classmethod
def suitable(cls, url):
@@ -35,10 +46,10 @@ class SinaIE(InfoExtractor):
def _extract_video(self, video_id):
data = compat_urllib_parse.urlencode({'vid': video_id})
url_doc = self._download_xml('http://v.iask.com/v_play.php?%s' % data,
video_id, u'Downloading video url')
video_id, 'Downloading video url')
image_page = self._download_webpage(
'http://interface.video.sina.com.cn/interface/common/getVideoImage.php?%s' % data,
video_id, u'Downloading thumbnail info')
video_id, 'Downloading thumbnail info')
return {'id': video_id,
'url': url_doc.find('./durl/url').text,
@@ -52,7 +63,7 @@ class SinaIE(InfoExtractor):
video_id = mobj.group('id')
if mobj.group('token') is not None:
# The video id is in the redirected url
self.to_screen(u'Getting video id')
self.to_screen('Getting video id')
request = compat_urllib_request.Request(url)
request.get_method = lambda: 'HEAD'
(_, urlh) = self._download_webpage_handle(request, 'NA', False)
@@ -60,6 +71,6 @@ class SinaIE(InfoExtractor):
elif video_id is None:
pseudo_id = mobj.group('pseudo_id')
webpage = self._download_webpage(url, pseudo_id)
video_id = self._search_regex(r'vid:\'(\d+?)\'', webpage, u'video id')
video_id = self._search_regex(r'vid:\'(\d+?)\'', webpage, 'video id')
return self._extract_video(video_id)

View File

@@ -1,4 +1,5 @@
# encoding: utf-8
from __future__ import unicode_literals
import os.path
import re
@@ -16,76 +17,76 @@ from ..utils import (
class SmotriIE(InfoExtractor):
IE_DESC = u'Smotri.com'
IE_NAME = u'smotri'
IE_DESC = 'Smotri.com'
IE_NAME = 'smotri'
_VALID_URL = r'^https?://(?:www\.)?(?P<url>smotri\.com/video/view/\?id=(?P<videoid>v(?P<realvideoid>[0-9]+)[a-z0-9]{4}))'
_TESTS = [
# real video id 2610366
{
u'url': u'http://smotri.com/video/view/?id=v261036632ab',
u'file': u'v261036632ab.mp4',
u'md5': u'2a7b08249e6f5636557579c368040eb9',
u'info_dict': {
u'title': u'катастрофа с камер видеонаблюдения',
u'uploader': u'rbc2008',
u'uploader_id': u'rbc08',
u'upload_date': u'20131118',
u'description': u'катастрофа с камер видеонаблюдения, видео катастрофа с камер видеонаблюдения',
u'thumbnail': u'http://frame6.loadup.ru/8b/a9/2610366.3.3.jpg',
'url': 'http://smotri.com/video/view/?id=v261036632ab',
'file': 'v261036632ab.mp4',
'md5': '2a7b08249e6f5636557579c368040eb9',
'info_dict': {
'title': 'катастрофа с камер видеонаблюдения',
'uploader': 'rbc2008',
'uploader_id': 'rbc08',
'upload_date': '20131118',
'description': 'катастрофа с камер видеонаблюдения, видео катастрофа с камер видеонаблюдения',
'thumbnail': 'http://frame6.loadup.ru/8b/a9/2610366.3.3.jpg',
},
},
# real video id 57591
{
u'url': u'http://smotri.com/video/view/?id=v57591cb20',
u'file': u'v57591cb20.flv',
u'md5': u'830266dfc21f077eac5afd1883091bcd',
u'info_dict': {
u'title': u'test',
u'uploader': u'Support Photofile@photofile',
u'uploader_id': u'support-photofile',
u'upload_date': u'20070704',
u'description': u'test, видео test',
u'thumbnail': u'http://frame4.loadup.ru/03/ed/57591.2.3.jpg',
'url': 'http://smotri.com/video/view/?id=v57591cb20',
'file': 'v57591cb20.flv',
'md5': '830266dfc21f077eac5afd1883091bcd',
'info_dict': {
'title': 'test',
'uploader': 'Support Photofile@photofile',
'uploader_id': 'support-photofile',
'upload_date': '20070704',
'description': 'test, видео test',
'thumbnail': 'http://frame4.loadup.ru/03/ed/57591.2.3.jpg',
},
},
# video-password
{
u'url': u'http://smotri.com/video/view/?id=v1390466a13c',
u'file': u'v1390466a13c.mp4',
u'md5': u'f6331cef33cad65a0815ee482a54440b',
u'info_dict': {
u'title': u'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
u'uploader': u'timoxa40',
u'uploader_id': u'timoxa40',
u'upload_date': u'20100404',
u'thumbnail': u'http://frame7.loadup.ru/af/3f/1390466.3.3.jpg',
u'description': u'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1, видео TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
'url': 'http://smotri.com/video/view/?id=v1390466a13c',
'file': 'v1390466a13c.mp4',
'md5': 'f6331cef33cad65a0815ee482a54440b',
'info_dict': {
'title': 'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
'uploader': 'timoxa40',
'uploader_id': 'timoxa40',
'upload_date': '20100404',
'thumbnail': 'http://frame7.loadup.ru/af/3f/1390466.3.3.jpg',
'description': 'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1, видео TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
},
u'params': {
u'videopassword': u'qwerty',
'params': {
'videopassword': 'qwerty',
},
},
# age limit + video-password
{
u'url': u'http://smotri.com/video/view/?id=v15408898bcf',
u'file': u'v15408898bcf.flv',
u'md5': u'91e909c9f0521adf5ee86fbe073aad70',
u'info_dict': {
u'title': u'этот ролик не покажут по ТВ',
u'uploader': u'zzxxx',
u'uploader_id': u'ueggb',
u'upload_date': u'20101001',
u'thumbnail': u'http://frame3.loadup.ru/75/75/1540889.1.3.jpg',
u'age_limit': 18,
u'description': u'этот ролик не покажут по ТВ, видео этот ролик не покажут по ТВ',
'url': 'http://smotri.com/video/view/?id=v15408898bcf',
'file': 'v15408898bcf.flv',
'md5': '91e909c9f0521adf5ee86fbe073aad70',
'info_dict': {
'title': 'этот ролик не покажут по ТВ',
'uploader': 'zzxxx',
'uploader_id': 'ueggb',
'upload_date': '20101001',
'thumbnail': 'http://frame3.loadup.ru/75/75/1540889.1.3.jpg',
'age_limit': 18,
'description': 'этот ролик не покажут по ТВ, видео этот ролик не покажут по ТВ',
},
u'params': {
u'videopassword': u'333'
'params': {
'videopassword': '333'
}
}
]
_SUCCESS = 0
_PASSWORD_NOT_VERIFIED = 1
_PASSWORD_DETECTED = 2
@@ -106,71 +107,71 @@ class SmotriIE(InfoExtractor):
# Download video JSON data
video_json_url = 'http://smotri.com/vt.php?id=%s' % real_video_id
video_json_page = self._download_webpage(video_json_url, video_id, u'Downloading video JSON')
video_json_page = self._download_webpage(video_json_url, video_id, 'Downloading video JSON')
video_json = json.loads(video_json_page)
status = video_json['status']
if status == self._VIDEO_NOT_FOUND:
raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
elif status == self._PASSWORD_DETECTED: # The video is protected by a password, retry with
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
elif status == self._PASSWORD_DETECTED: # The video is protected by a password, retry with
# video-password set
video_password = self._downloader.params.get('videopassword', None)
if not video_password:
raise ExtractorError(u'This video is protected by a password, use the --video-password option', expected=True)
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
video_json_url += '&md5pass=%s' % hashlib.md5(video_password.encode('utf-8')).hexdigest()
video_json_page = self._download_webpage(video_json_url, video_id, u'Downloading video JSON (video-password set)')
video_json_page = self._download_webpage(video_json_url, video_id, 'Downloading video JSON (video-password set)')
video_json = json.loads(video_json_page)
status = video_json['status']
if status == self._PASSWORD_NOT_VERIFIED:
raise ExtractorError(u'Video password is invalid', expected=True)
raise ExtractorError('Video password is invalid', expected=True)
if status != self._SUCCESS:
raise ExtractorError(u'Unexpected status value %s' % status)
raise ExtractorError('Unexpected status value %s' % status)
# Extract the URL of the video
video_url = video_json['file_data']
# Video JSON does not provide enough meta data
# We will extract some from the video web page instead
video_page_url = 'http://' + mobj.group('url')
video_page = self._download_webpage(video_page_url, video_id, u'Downloading video page')
video_page = self._download_webpage(video_page_url, video_id, 'Downloading video page')
# Warning if video is unavailable
warning = self._html_search_regex(
r'<div class="videoUnModer">(.*?)</div>', video_page,
u'warning message', default=None)
'warning message', default=None)
if warning is not None:
self._downloader.report_warning(
u'Video %s may not be available; smotri said: %s ' %
'Video %s may not be available; smotri said: %s ' %
(video_id, warning))
# Adult content
if re.search(u'EroConfirmText">', video_page) is not None:
if re.search('EroConfirmText">', video_page) is not None:
self.report_age_confirmation()
confirm_string = self._html_search_regex(
r'<a href="/video/view/\?id=%s&confirm=([^"]+)" title="[^"]+">' % video_id,
video_page, u'confirm string')
video_page, 'confirm string')
confirm_url = video_page_url + '&confirm=%s' % confirm_string
video_page = self._download_webpage(confirm_url, video_id, u'Downloading video page (age confirmed)')
video_page = self._download_webpage(confirm_url, video_id, 'Downloading video page (age confirmed)')
adult_content = True
else:
adult_content = False
# Extract the rest of meta data
video_title = self._search_meta(u'name', video_page, u'title')
video_title = self._search_meta('name', video_page, 'title')
if not video_title:
video_title = os.path.splitext(url_basename(video_url))[0]
video_description = self._search_meta(u'description', video_page)
END_TEXT = u' на сайте Smotri.com'
video_description = self._search_meta('description', video_page)
END_TEXT = ' на сайте Smotri.com'
if video_description and video_description.endswith(END_TEXT):
video_description = video_description[:-len(END_TEXT)]
START_TEXT = u'Смотреть онлайн ролик '
START_TEXT = 'Смотреть онлайн ролик '
if video_description and video_description.startswith(START_TEXT):
video_description = video_description[len(START_TEXT):]
video_thumbnail = self._search_meta(u'thumbnail', video_page)
video_thumbnail = self._search_meta('thumbnail', video_page)
upload_date_str = self._search_meta(u'uploadDate', video_page, u'upload date')
upload_date_str = self._search_meta('uploadDate', video_page, 'upload date')
if upload_date_str:
upload_date_m = re.search(r'(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})T', upload_date_str)
video_upload_date = (
@@ -183,8 +184,8 @@ class SmotriIE(InfoExtractor):
)
else:
video_upload_date = None
duration_str = self._search_meta(u'duration', video_page)
duration_str = self._search_meta('duration', video_page)
if duration_str:
duration_m = re.search(r'T(?P<hours>[0-9]{2})H(?P<minutes>[0-9]{2})M(?P<seconds>[0-9]{2})S', duration_str)
video_duration = (
@@ -197,19 +198,19 @@ class SmotriIE(InfoExtractor):
)
else:
video_duration = None
video_uploader = self._html_search_regex(
u'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info[^"]+">(.*?)</a>',
video_page, u'uploader', fatal=False, flags=re.MULTILINE|re.DOTALL)
'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info[^"]+">(.*?)</a>',
video_page, 'uploader', fatal=False, flags=re.MULTILINE|re.DOTALL)
video_uploader_id = self._html_search_regex(
u'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info\\(.*?\'([^\']+)\'\\);">',
video_page, u'uploader id', fatal=False, flags=re.MULTILINE|re.DOTALL)
'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info\\(.*?\'([^\']+)\'\\);">',
video_page, 'uploader id', fatal=False, flags=re.MULTILINE|re.DOTALL)
video_view_count = self._html_search_regex(
u'Общее количество просмотров.*?<span class="Number">(\\d+)</span>',
video_page, u'view count', fatal=False, flags=re.MULTILINE|re.DOTALL)
'Общее количество просмотров.*?<span class="Number">(\\d+)</span>',
video_page, 'view count', fatal=False, flags=re.MULTILINE|re.DOTALL)
return {
'id': video_id,
'url': video_url,
@@ -227,8 +228,8 @@ class SmotriIE(InfoExtractor):
class SmotriCommunityIE(InfoExtractor):
IE_DESC = u'Smotri.com community videos'
IE_NAME = u'smotri:community'
IE_DESC = 'Smotri.com community videos'
IE_NAME = 'smotri:community'
_VALID_URL = r'^https?://(?:www\.)?smotri\.com/community/video/(?P<communityid>[0-9A-Za-z_\'-]+)'
def _real_extract(self, url):
@@ -236,21 +237,21 @@ class SmotriCommunityIE(InfoExtractor):
community_id = mobj.group('communityid')
url = 'http://smotri.com/export/rss/video/by/community/-/%s/video.xml' % community_id
rss = self._download_xml(url, community_id, u'Downloading community RSS')
rss = self._download_xml(url, community_id, 'Downloading community RSS')
entries = [self.url_result(video_url.text, 'Smotri')
for video_url in rss.findall('./channel/item/link')]
description_text = rss.find('./channel/description').text
community_title = self._html_search_regex(
u'^Видео сообщества "([^"]+)"$', description_text, u'community title')
'^Видео сообщества "([^"]+)"$', description_text, 'community title')
return self.playlist_result(entries, community_id, community_title)
class SmotriUserIE(InfoExtractor):
IE_DESC = u'Smotri.com user videos'
IE_NAME = u'smotri:user'
IE_DESC = 'Smotri.com user videos'
IE_NAME = 'smotri:user'
_VALID_URL = r'^https?://(?:www\.)?smotri\.com/user/(?P<userid>[0-9A-Za-z_\'-]+)'
def _real_extract(self, url):
@@ -258,22 +259,22 @@ class SmotriUserIE(InfoExtractor):
user_id = mobj.group('userid')
url = 'http://smotri.com/export/rss/user/video/-/%s/video.xml' % user_id
rss = self._download_xml(url, user_id, u'Downloading user RSS')
rss = self._download_xml(url, user_id, 'Downloading user RSS')
entries = [self.url_result(video_url.text, 'Smotri')
for video_url in rss.findall('./channel/item/link')]
description_text = rss.find('./channel/description').text
user_nickname = self._html_search_regex(
u'^Видео режиссера (.*)$', description_text,
u'user nickname')
'^Видео режиссера (.*)$', description_text,
'user nickname')
return self.playlist_result(entries, user_id, user_nickname)
class SmotriBroadcastIE(InfoExtractor):
IE_DESC = u'Smotri.com broadcasts'
IE_NAME = u'smotri:broadcast'
IE_DESC = 'Smotri.com broadcasts'
IE_NAME = 'smotri:broadcast'
_VALID_URL = r'^https?://(?:www\.)?(?P<url>smotri\.com/live/(?P<broadcastid>[^/]+))/?.*'
def _real_extract(self, url):
@@ -281,46 +282,40 @@ class SmotriBroadcastIE(InfoExtractor):
broadcast_id = mobj.group('broadcastid')
broadcast_url = 'http://' + mobj.group('url')
broadcast_page = self._download_webpage(broadcast_url, broadcast_id, u'Downloading broadcast page')
broadcast_page = self._download_webpage(broadcast_url, broadcast_id, 'Downloading broadcast page')
if re.search(u'>Режиссер с логином <br/>"%s"<br/> <span>не существует<' % broadcast_id, broadcast_page) is not None:
raise ExtractorError(u'Broadcast %s does not exist' % broadcast_id, expected=True)
if re.search('>Режиссер с логином <br/>"%s"<br/> <span>не существует<' % broadcast_id, broadcast_page) is not None:
raise ExtractorError('Broadcast %s does not exist' % broadcast_id, expected=True)
# Adult content
if re.search(u'EroConfirmText">', broadcast_page) is not None:
if re.search('EroConfirmText">', broadcast_page) is not None:
(username, password) = self._get_login_info()
if username is None:
raise ExtractorError(u'Erotic broadcasts allowed only for registered users, '
u'use --username and --password options to provide account credentials.', expected=True)
raise ExtractorError('Erotic broadcasts allowed only for registered users, '
'use --username and --password options to provide account credentials.', expected=True)
# Log in
login_form_strs = {
u'login-hint53': '1',
u'confirm_erotic': '1',
u'login': username,
u'password': password,
login_form = {
'login-hint53': '1',
'confirm_erotic': '1',
'login': username,
'password': password,
}
# Convert to UTF-8 *before* urlencode because Python 2.x's urlencode
# chokes on unicode
login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k,v in login_form_strs.items())
login_data = compat_urllib_parse.urlencode(login_form).encode('utf-8')
login_url = broadcast_url + '/?no_redirect=1'
request = compat_urllib_request.Request(login_url, login_data)
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
broadcast_page = self._download_webpage(
request, broadcast_id, note=u'Logging in and confirming age')
if re.search(u'>Неверный логин или пароль<', broadcast_page) is not None:
raise ExtractorError(u'Unable to log in: bad username or password', expected=True)
request = compat_urllib_request.Request(broadcast_url + '/?no_redirect=1', compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
broadcast_page = self._download_webpage(request, broadcast_id, 'Logging in and confirming age')
if re.search('>Неверный логин или пароль<', broadcast_page) is not None:
raise ExtractorError('Unable to log in: bad username or password', expected=True)
adult_content = True
else:
adult_content = False
ticket = self._html_search_regex(
u'window\.broadcast_control\.addFlashVar\\(\'file\', \'([^\']+)\'\\);',
broadcast_page, u'broadcast ticket')
'window\.broadcast_control\.addFlashVar\\(\'file\', \'([^\']+)\'\\);',
broadcast_page, 'broadcast ticket')
url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
@@ -328,22 +323,22 @@ class SmotriBroadcastIE(InfoExtractor):
if broadcast_password:
url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()
broadcast_json_page = self._download_webpage(url, broadcast_id, u'Downloading broadcast JSON')
broadcast_json_page = self._download_webpage(url, broadcast_id, 'Downloading broadcast JSON')
try:
broadcast_json = json.loads(broadcast_json_page)
protected_broadcast = broadcast_json['_pass_protected'] == 1
if protected_broadcast and not broadcast_password:
raise ExtractorError(u'This broadcast is protected by a password, use the --video-password option', expected=True)
raise ExtractorError('This broadcast is protected by a password, use the --video-password option', expected=True)
broadcast_offline = broadcast_json['is_play'] == 0
if broadcast_offline:
raise ExtractorError(u'Broadcast %s is offline' % broadcast_id, expected=True)
raise ExtractorError('Broadcast %s is offline' % broadcast_id, expected=True)
rtmp_url = broadcast_json['_server']
if not rtmp_url.startswith('rtmp://'):
raise ExtractorError(u'Unexpected broadcast rtmp URL')
raise ExtractorError('Unexpected broadcast rtmp URL')
broadcast_playpath = broadcast_json['_streamName']
broadcast_thumbnail = broadcast_json['_imgURL']
@@ -354,8 +349,8 @@ class SmotriBroadcastIE(InfoExtractor):
rtmp_conn = 'S:%s' % uuid.uuid4().hex
except KeyError:
if protected_broadcast:
raise ExtractorError(u'Bad broadcast password', expected=True)
raise ExtractorError(u'Unexpected broadcast JSON')
raise ExtractorError('Bad broadcast password', expected=True)
raise ExtractorError('Unexpected broadcast JSON')
return {
'id': broadcast_id,

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -7,13 +9,13 @@ from ..utils import (
class TumblrIE(InfoExtractor):
_VALID_URL = r'http://(?P<blog_name>.*?)\.tumblr\.com/((post)|(video))/(?P<id>\d*)/(.*?)'
_VALID_URL = r'http://(?P<blog_name>.*?)\.tumblr\.com/((post)|(video))/(?P<id>\d*)($|/)'
_TEST = {
u'url': u'http://tatianamaslanydaily.tumblr.com/post/54196191430/orphan-black-dvd-extra-behind-the-scenes',
u'file': u'54196191430.mp4',
u'md5': u'479bb068e5b16462f5176a6828829767',
u'info_dict': {
u"title": u"tatiana maslany news"
'url': 'http://tatianamaslanydaily.tumblr.com/post/54196191430/orphan-black-dvd-extra-behind-the-scenes',
'file': '54196191430.mp4',
'md5': '479bb068e5b16462f5176a6828829767',
'info_dict': {
"title": "tatiana maslany news"
}
}
@@ -28,18 +30,20 @@ class TumblrIE(InfoExtractor):
re_video = r'src=\\x22(?P<video_url>http://%s\.tumblr\.com/video_file/%s/(.*?))\\x22 type=\\x22video/(?P<ext>.*?)\\x22' % (blog, video_id)
video = re.search(re_video, webpage)
if video is None:
raise ExtractorError(u'Unable to extract video')
raise ExtractorError('Unable to extract video')
video_url = video.group('video_url')
ext = video.group('ext')
video_thumbnail = self._search_regex(r'posters(.*?)\[\\x22(?P<thumb>.*?)\\x22',
webpage, u'thumbnail', fatal=False) # We pick the first poster
if video_thumbnail: video_thumbnail = video_thumbnail.replace('\\', '')
video_thumbnail = self._search_regex(
r'posters.*?\[\\x22(.*?)\\x22',
webpage, 'thumbnail', fatal=False) # We pick the first poster
if video_thumbnail:
video_thumbnail = video_thumbnail.replace('\\\\/', '/')
# The only place where you can get a title, it's not complete,
# but searching in other places doesn't work for all videos
video_title = self._html_search_regex(r'<title>(?P<title>.*?)(?: \| Tumblr)?</title>',
webpage, u'title', flags=re.DOTALL)
webpage, 'title', flags=re.DOTALL)
return [{'id': video_id,
'url': video_url,

View File

@@ -291,7 +291,7 @@ class VimeoIE(InfoExtractor):
class VimeoChannelIE(InfoExtractor):
IE_NAME = 'vimeo:channel'
_VALID_URL = r'(?:https?://)?vimeo.\com/channels/(?P<id>[^/]+)'
_VALID_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)'
_MORE_PAGES_INDICATOR = r'<a.+?rel="next"'
_TITLE_RE = r'<link rel="alternate"[^>]+?title="(.*?)"'
@@ -327,7 +327,7 @@ class VimeoChannelIE(InfoExtractor):
class VimeoUserIE(VimeoChannelIE):
IE_NAME = 'vimeo:user'
_VALID_URL = r'(?:https?://)?vimeo.\com/(?P<name>[^/]+)(?:/videos|[#?]|$)'
_VALID_URL = r'(?:https?://)?vimeo\.com/(?P<name>[^/]+)(?:/videos|[#?]|$)'
_TITLE_RE = r'<a[^>]+?class="user">([^<>]+?)</a>'
@classmethod
@@ -344,7 +344,7 @@ class VimeoUserIE(VimeoChannelIE):
class VimeoAlbumIE(VimeoChannelIE):
IE_NAME = 'vimeo:album'
_VALID_URL = r'(?:https?://)?vimeo.\com/album/(?P<id>\d+)'
_VALID_URL = r'(?:https?://)?vimeo\.com/album/(?P<id>\d+)'
_TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
def _page_url(self, base_url, pagenum):
@@ -358,7 +358,7 @@ class VimeoAlbumIE(VimeoChannelIE):
class VimeoGroupsIE(VimeoAlbumIE):
IE_NAME = 'vimeo:group'
_VALID_URL = r'(?:https?://)?vimeo.\com/groups/(?P<name>[^/]+)'
_VALID_URL = r'(?:https?://)?vimeo\.com/groups/(?P<name>[^/]+)'
def _extract_list_title(self, webpage):
return self._og_search_title(webpage)
@@ -372,7 +372,7 @@ class VimeoGroupsIE(VimeoAlbumIE):
class VimeoReviewIE(InfoExtractor):
IE_NAME = 'vimeo:review'
IE_DESC = 'Review pages on vimeo'
_VALID_URL = r'(?:https?://)?vimeo.\com/[^/]+/review/(?P<id>[^/]+)'
_VALID_URL = r'(?:https?://)?vimeo\.com/[^/]+/review/(?P<id>[^/]+)'
_TEST = {
'url': 'https://vimeo.com/user21297594/review/75524534/3c257a1b5d',
'file': '75524534.mp4',

View File

@@ -1,59 +0,0 @@
# coding: utf-8
import re
from ..utils import (
compat_urllib_request,
compat_urllib_parse
)
from .common import InfoExtractor
class WeBSurgIE(InfoExtractor):
IE_NAME = u'websurg.com'
_VALID_URL = r'http://.*?\.websurg\.com/MEDIA/\?noheader=1&doi=(.*)'
_TEST = {
u'url': u'http://www.websurg.com/MEDIA/?noheader=1&doi=vd01en4012',
u'file': u'vd01en4012.mp4',
u'params': {
u'skip_download': True,
},
u'skip': u'Requires login information',
}
_LOGIN_URL = 'http://www.websurg.com/inc/login/login_div.ajax.php?login=1'
def _real_initialize(self):
login_form = {
'username': self._downloader.params['username'],
'password': self._downloader.params['password'],
'Submit': 1
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header(
'Content-Type', 'application/x-www-form-urlencoded;charset=utf-8')
compat_urllib_request.urlopen(request).info()
webpage = self._download_webpage(self._LOGIN_URL, '', 'Logging in')
if webpage != 'OK':
self._downloader.report_error(
u'Unable to log in: bad username/password')
def _real_extract(self, url):
video_id = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, video_id)
url_info = re.search(r'streamer="(.*?)" src="(.*?)"', webpage)
return {'id': video_id,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'ext' : 'mp4',
'url' : url_info.group(1) + '/' + url_info.group(2),
'thumbnail': self._og_search_thumbnail(webpage)
}

View File

@@ -40,7 +40,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
"""Provide base functions for Youtube extractors"""
_LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
_LANG_URL = r'https://www.youtube.com/?hl=en&persist_hl=1&gl=US&persist_gl=1&opt_out_ackd=1'
_AGE_URL = 'http://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
_AGE_URL = 'https://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
_NETRC_MACHINE = 'youtube'
# If True it will raise an error if no login info is provided
_LOGIN_REQUIRED = False
@@ -111,7 +111,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
'next_url': '/',
'action_confirm': 'Confirm',
}
req = compat_urllib_request.Request(self._AGE_URL, compat_urllib_parse.urlencode(age_form))
req = compat_urllib_request.Request(self._AGE_URL,
compat_urllib_parse.urlencode(age_form).encode('ascii'))
self._download_webpage(
req, None,
@@ -207,6 +208,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'141': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 256, 'preference': -50},
# Dash webm
'167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'242': {'ext': 'webm', 'height': 240, 'resolution': '240p', 'format_note': 'DASH webm', 'preference': -40},
'243': {'ext': 'webm', 'height': 360, 'resolution': '360p', 'format_note': 'DASH webm', 'preference': -40},
'244': {'ext': 'webm', 'height': 480, 'resolution': '480p', 'format_note': 'DASH webm', 'preference': -40},
@@ -1008,7 +1015,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
def _get_available_subtitles(self, video_id, webpage):
try:
sub_list = self._download_webpage(
'http://video.google.com/timedtext?hl=en&type=list&v=%s' % video_id,
'https://video.google.com/timedtext?hl=en&type=list&v=%s' % video_id,
video_id, note=False)
except ExtractorError as err:
self._downloader.report_warning(u'unable to download video subtitles: %s' % compat_str(err))
@@ -1024,7 +1031,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'fmt': self._downloader.params.get('subtitlesformat', 'srt'),
'name': unescapeHTML(l[0]).encode('utf-8'),
})
url = u'http://www.youtube.com/api/timedtext?' + params
url = u'https://www.youtube.com/api/timedtext?' + params
sub_lang_list[lang] = url
if not sub_lang_list:
self._downloader.report_warning(u'video doesn\'t have subtitles')
@@ -1290,7 +1297,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'url': video_real_url,
'player_url': player_url,
}
dct.update(self._formats[itag])
if itag in self._formats:
dct.update(self._formats[itag])
formats.append(dct)
return formats
@@ -1522,7 +1530,7 @@ class YoutubeTopListIE(YoutubePlaylistIE):
channel = mobj.group('chann')
title = mobj.group('title')
query = compat_urllib_parse.urlencode({'title': title})
playlist_re = 'href="([^"]+?%s[^"]+?)"' % re.escape(query)
playlist_re = 'href="([^"]+?%s.*?)"' % re.escape(query)
channel_page = self._download_webpage('https://www.youtube.com/%s' % channel, title)
link = self._html_search_regex(playlist_re, channel_page, u'list')
url = compat_urlparse.urljoin('https://www.youtube.com/', link)
@@ -1547,7 +1555,7 @@ class YoutubeChannelIE(InfoExtractor):
IE_DESC = u'YouTube.com channels'
_VALID_URL = r"^(?:https?://)?(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com)/channel/([0-9A-Za-z_-]+)"
_MORE_PAGES_INDICATOR = 'yt-uix-load-more'
_MORE_PAGES_URL = 'http://www.youtube.com/c4_browse_ajax?action_load_more_videos=1&flow=list&paging=%s&view=0&sort=da&channel_id=%s'
_MORE_PAGES_URL = 'https://www.youtube.com/c4_browse_ajax?action_load_more_videos=1&flow=list&paging=%s&view=0&sort=da&channel_id=%s'
IE_NAME = u'youtube:channel'
def extract_videos_from_page(self, page):
@@ -1603,9 +1611,9 @@ class YoutubeChannelIE(InfoExtractor):
class YoutubeUserIE(InfoExtractor):
IE_DESC = u'YouTube.com user videos (URL or "ytuser" keyword)'
_VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?youtube\.com/(?:user/)?(?!(?:attribution_link|watch)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)([A-Za-z0-9_-]+)'
_TEMPLATE_URL = 'http://gdata.youtube.com/feeds/api/users/%s'
_TEMPLATE_URL = 'https://gdata.youtube.com/feeds/api/users/%s'
_GDATA_PAGE_SIZE = 50
_GDATA_URL = 'http://gdata.youtube.com/feeds/api/users/%s/uploads?max-results=%d&start-index=%d&alt=json'
_GDATA_URL = 'https://gdata.youtube.com/feeds/api/users/%s/uploads?max-results=%d&start-index=%d&alt=json'
IE_NAME = u'youtube:user'
@classmethod
@@ -1654,7 +1662,7 @@ class YoutubeUserIE(InfoExtractor):
'_type': 'url',
'url': video_id,
'ie_key': 'Youtube',
'id': 'video_id',
'id': video_id,
'title': title,
}
url_results = PagedList(download_page, self._GDATA_PAGE_SIZE)
@@ -1736,7 +1744,7 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
action = 'action_load_system_feed'
if self._PERSONAL_FEED:
action = 'action_load_personal_feed'
return 'http://www.youtube.com/feed_ajax?%s=1&feed_name=%s&paging=%%s' % (action, self._FEED_NAME)
return 'https://www.youtube.com/feed_ajax?%s=1&feed_name=%s&paging=%%s' % (action, self._FEED_NAME)
@property
def IE_NAME(self):
@@ -1805,7 +1813,10 @@ class YoutubeFavouritesIE(YoutubeBaseInfoExtractor):
class YoutubeTruncatedURLIE(InfoExtractor):
IE_NAME = 'youtube:truncated_url'
IE_DESC = False # Do not list
_VALID_URL = r'(?:https?://)?[^/]+/watch\?feature=[a-z_]+$'
_VALID_URL = r'''(?x)
(?:https?://)?[^/]+/watch\?feature=[a-z_]+$|
(?:https?://)?(?:www\.)?youtube\.com/attribution_link\?a=[^&]+$
'''
def _real_extract(self, url):
raise ExtractorError(

View File

@@ -1,2 +1,2 @@
__version__ = '2014.01.23.1'
__version__ = '2014.01.29'