Compare commits

60 Commits

Author SHA1 Message Date
Philipp Hagemeister
dfe029a62c release 2014.07.23.2 2014-07-23 02:25:27 +02:00
Philipp Hagemeister
b0472057a3 [YoutubeDL] Make sure we really, really get out the encoding string
Fixes #3326
Apparently, on some platforms, even outputting this fails already.
2014-07-23 02:24:52 +02:00
Philipp Hagemeister
c081b35c27 [youtube] Support new player URLs (Fixes #3326) 2014-07-23 02:19:33 +02:00
Philipp Hagemeister
9f43890bcd [jsinterp] Allow digits in function names 2014-07-23 02:13:48 +02:00
Philipp Hagemeister
94a20aa5f8 [rtlnow] Simplify outdated test 2014-07-23 01:49:25 +02:00
Philipp Hagemeister
94e8df3a7e [wdr] Fix umlaut parsing on Python 2.x 2014-07-23 01:47:36 +02:00
Philipp Hagemeister
37e64addc8 [nbc] Add missing import 2014-07-23 01:47:18 +02:00
Philipp Hagemeister
d82ba23ba5 [soundcloud:playlist] Fix test description 2014-07-23 01:44:08 +02:00
Philipp Hagemeister
0fd7fd71b4 [test/helper] Do not use deprecated method 2014-07-23 01:43:46 +02:00
Philipp Hagemeister
eae12e3fe3 [soundcloud] Adapt test 2014-07-23 01:41:45 +02:00
Philipp Hagemeister
798a2cad4f [sockshare] Fix ext 2014-07-23 01:40:01 +02:00
Philipp Hagemeister
41c0849429 [savefrom] Make test description more flexible 2014-07-23 01:38:07 +02:00
Philipp Hagemeister
a4e5af1184 release 2014.07.23.1 2014-07-23 01:27:33 +02:00
Philipp Hagemeister
b090af5922 [vube] Fix comment count 2014-07-23 01:27:25 +02:00
Philipp Hagemeister
388841f819 release 2014.07.23 2014-07-23 01:18:42 +02:00
Philipp Hagemeister
1a2ecbfbc4 [vube] Add support for new data format (Fixes #3325) 2014-07-23 01:18:27 +02:00
Philipp Hagemeister
38e292b112 [mlb] Fix regex 2014-07-22 23:55:41 +02:00
Charles Chen
c4f731262d Merge remote-tracking branch 'upstream/master' into MLB
Conflicts:
	youtube_dl/extractor/mlb.py
2014-07-22 14:44:38 -07:00
Charles Chen
07cc63f386 [MLB] Enhanced _VALID_URL to cover more MLB videos 2014-07-22 14:10:27 -07:00
Philipp Hagemeister
e42a692f00 [cbs] Modernize
Also add threatening skip blocks in there - access is only possible from the US. We may want to find a better geolocation restriction method for tests.
2014-07-22 17:34:35 +02:00
Philipp Hagemeister
6ec7538bb4 Merge remote-tracking branch 'jterk/cbs-artists' 2014-07-22 17:29:09 +02:00
Jason Terk
2871d489a9 Support Alternative cbs.com URL Format
Adds support for cbs.com URLs containing "/artist" instead of
"/video". E.g.:
http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/
2014-07-22 08:00:08 -07:00
Philipp Hagemeister
1771ddd85d release 2014.07.22 2014-07-22 16:59:40 +02:00
Philipp Hagemeister
5198bf68fc Merge remote-tracking branch 'origin/master' 2014-07-22 16:59:31 +02:00
Philipp Hagemeister
e00fc35dbe [kickstarter] Support embedded videos (Fixes #3322) 2014-07-22 16:57:43 +02:00
Sergey M․
8904e979df [vodlocker] Fix _VALID_URL 2014-07-22 20:37:33 +07:00
Philipp Hagemeister
53eb217661 Add another great example for the --extractor-descriptions output 2014-07-22 04:53:14 +02:00
Jaime Marquínez Ferrándiz
9dcb8f3fc7 [br] Allow '_' in the url (fixes #3311) 2014-07-21 20:43:56 +02:00
Philipp Hagemeister
1e8ac8364b release 2014.07.21 2014-07-21 18:06:51 +02:00
Philipp Hagemeister
754d8a035e [nbcnews] Look in all playlists for video 2014-07-21 18:06:21 +02:00
Philipp Hagemeister
f1f725c6a0 [dropbox] Fix title encoding on Python 2 2014-07-21 13:55:47 +02:00
Philipp Hagemeister
06c155420f [sockshare] Simplify (#3268) 2014-07-21 13:25:59 +02:00
Philipp Hagemeister
7dabd2ac45 Merge remote-tracking branch 'naglis/sockshare'
Conflicts:
	youtube_dl/extractor/__init__.py
2014-07-21 13:24:15 +02:00
Philipp Hagemeister
df8ba0d2cf [tagesschau] Remove test case
See http://de.wikipedia.org/wiki/Depublizieren for the sad rationale.
2014-07-21 13:22:15 +02:00
Philipp Hagemeister
ff1956e07b [wdr] Replace test case 2014-07-21 13:19:41 +02:00
Philipp Hagemeister
caf5a8817b [chilloutzone] Fix test description 2014-07-21 13:16:48 +02:00
Philipp Hagemeister
a850fde1d8 [funnyordie] Fix test description 2014-07-21 13:14:41 +02:00
Philipp Hagemeister
0e6ebc13d1 [vimeo] Update test description 2014-07-21 13:11:24 +02:00
Philipp Hagemeister
6f5342a201 [cnet] Fix title extraction
URLs are still missing
2014-07-21 13:03:19 +02:00
Philipp Hagemeister
264a7044f5 [dropbox] Fix test and add support for spaces in filenames 2014-07-21 12:57:40 +02:00
Philipp Hagemeister
1a30deca50 [teachertube] Fix title and playlist recognition 2014-07-21 12:47:01 +02:00
Philipp Hagemeister
d8624e6a80 [test_playlist] Add and use assertGreaterEqual 2014-07-21 12:25:49 +02:00
Philipp Hagemeister
4f95d455ed [steam] Update test description 2014-07-21 12:17:44 +02:00
Philipp Hagemeister
468d19a9c1 [savefrom] Fix test description 2014-07-21 12:15:23 +02:00
Philipp Hagemeister
9aeaf730ad [rtve] Fix md5sum
Looks like these guys reencoded the video.
2014-07-21 12:14:07 +02:00
Philipp Hagemeister
db964a33a1 Remove unused imports 2014-07-21 12:12:50 +02:00
Philipp Hagemeister
da8fb85859 [snotr] Add description 2014-07-21 12:08:44 +02:00
Philipp Hagemeister
54330a1c3c [swfinterp] Fix imports 2014-07-21 12:07:26 +02:00
Philipp Hagemeister
9732d77ed2 [snotr] PEP8 and minor fixes (#3296) 2014-07-21 12:02:44 +02:00
Philipp Hagemeister
199ece7eb8 Merge remote-tracking branch 'hassaanaliw/snotr' 2014-07-21 11:43:46 +02:00
Philipp Hagemeister
1997eb0078 Merge pull request #3310 from bentley/master
Fix typo: “ytseach” → “ytsearch”
2014-07-21 09:22:58 +02:00
Anthony J. Bentley
eef4a7a304 Fix typo: “ytseach” → “ytsearch” 2014-07-20 18:37:44 -06:00
Philipp Hagemeister
246168bd72 Remove unused imports 2014-07-20 23:38:44 +02:00
Philipp Hagemeister
7fbf54dc62 [swfinterp] Remove (at the moment) dead code 2014-07-20 23:37:10 +02:00
Philipp Hagemeister
351f373865 [swfinterp] Fix _u32 name 2014-07-20 23:36:21 +02:00
Philipp Hagemeister
72e785f36a [livestream] PEP8 2014-07-20 23:34:20 +02:00
Philipp Hagemeister
727d2930f2 release 2014.07.20.2 2014-07-20 23:23:01 +02:00
Philipp Hagemeister
c13bf7c836 [swfinterp] Use helper function struct_unpack for old Python 2.x releases (#3270) 2014-07-20 23:20:15 +02:00
hassaanaliw
8adec2b9e0 [snotr] Add new extractor 2014-07-19 22:49:25 +05:00
Naglis Jonaitis
66aa382eae [sockshare] Add new extractor 2014-07-16 02:07:20 +03:00
39 changed files with 461 additions and 175 deletions

@@ -137,8 +137,8 @@ def expect_info_dict(self, expected_dict, got_dict):
def assertRegexpMatches(self, text, regexp, msg=None):
if hasattr(self, 'assertRegexpMatches'):
return self.assertRegexpMatches(text, regexp, msg)
if hasattr(self, 'assertRegexp'):
return self.assertRegexp(text, regexp, msg)
else:
m = re.match(regexp, text)
if not m:
@@ -148,3 +148,10 @@ def assertRegexpMatches(self, text, regexp, msg=None):
else:
msg = note + ', ' + msg
self.assertTrue(m, msg)
def assertGreaterEqual(self, got, expected, msg=None):
if not (got >= expected):
if msg is None:
msg = '%r not greater than or equal to %r' % (got, expected)
self.assertTrue(got >= expected, msg)
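
`TestCase.assertGreaterEqual` only entered `unittest` in Python 2.7 and 3.1, which is presumably why this hunk adds a module-level helper that takes the test case as its first argument. A minimal standalone sketch of how the helper behaves; `DemoTest` is a hypothetical test case, not part of the commit:

```python
import unittest


def assertGreaterEqual(self, got, expected, msg=None):
    # Same logic as the helper in the hunk above: build a default message,
    # then delegate to assertTrue, which every unittest version provides.
    if not (got >= expected):
        if msg is None:
            msg = '%r not greater than or equal to %r' % (got, expected)
        self.assertTrue(got >= expected, msg)


class DemoTest(unittest.TestCase):  # hypothetical, for illustration only
    def runTest(self):
        entries = ['a', 'b', 'c']
        # Called as a plain function, with the test case passed explicitly
        assertGreaterEqual(self, len(entries), 3)
```

Compared with `self.assertTrue(len(x) >= n)`, a failure here reports both numbers instead of a bare "False is not true".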

@@ -11,6 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
assertRegexpMatches,
assertGreaterEqual,
expect_info_dict,
FakeYDL,
)
@@ -71,8 +72,8 @@ class TestPlaylists(unittest.TestCase):
ie = DailymotionUserIE(dl)
result = ie.extract('https://www.dailymotion.com/user/nqtv')
self.assertIsPlaylist(result)
assertGreaterEqual(self, len(result['entries']), 100)
self.assertEqual(result['title'], 'Rémi Gaillard')
self.assertTrue(len(result['entries']) >= 100)
def test_vimeo_channel(self):
dl = FakeYDL()
@@ -111,7 +112,7 @@ class TestPlaylists(unittest.TestCase):
ie = VineUserIE(dl)
result = ie.extract('https://vine.co/Visa')
self.assertIsPlaylist(result)
self.assertTrue(len(result['entries']) >= 47)
assertGreaterEqual(self, len(result['entries']), 47)
def test_ustream_channel(self):
dl = FakeYDL()
@@ -119,7 +120,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://www.ustream.tv/channel/channeljapan')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '10874166')
self.assertTrue(len(result['entries']) >= 54)
assertGreaterEqual(self, len(result['entries']), 54)
def test_soundcloud_set(self):
dl = FakeYDL()
@@ -127,7 +128,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'The Royal Concept EP')
self.assertTrue(len(result['entries']) >= 6)
assertGreaterEqual(self, len(result['entries']), 6)
def test_soundcloud_user(self):
dl = FakeYDL()
@@ -135,7 +136,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('https://soundcloud.com/the-concept-band')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '9615865')
self.assertTrue(len(result['entries']) >= 12)
assertGreaterEqual(self, len(result['entries']), 12)
def test_soundcloud_likes(self):
dl = FakeYDL()
@@ -143,7 +144,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('https://soundcloud.com/the-concept-band/likes')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '9615865')
self.assertTrue(len(result['entries']) >= 1)
assertGreaterEqual(self, len(result['entries']), 1)
def test_soundcloud_playlist(self):
dl = FakeYDL()
@@ -153,7 +154,7 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], '4110309')
self.assertEqual(result['title'], 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]')
assertRegexpMatches(
self, result['description'], r'TILT Brass - Bowery Poetry Club')
self, result['description'], r'.*?TILT Brass - Bowery Poetry Club')
self.assertEqual(len(result['entries']), 6)
def test_livestream_event(self):
@@ -162,7 +163,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://new.livestream.com/tedx/cityenglish')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4)
assertGreaterEqual(self, len(result['entries']), 4)
def test_livestreamoriginal_folder(self):
dl = FakeYDL()
@@ -170,7 +171,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('https://www.livestream.com/newplay/folder?dirId=a07bf706-d0e4-4e75-a747-b021d84f2fd3')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'a07bf706-d0e4-4e75-a747-b021d84f2fd3')
self.assertTrue(len(result['entries']) >= 28)
assertGreaterEqual(self, len(result['entries']), 28)
def test_nhl_videocenter(self):
dl = FakeYDL()
@@ -187,7 +188,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://bambuser.com/channel/pixelversity')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'pixelversity')
self.assertTrue(len(result['entries']) >= 60)
assertGreaterEqual(self, len(result['entries']), 60)
def test_bandcamp_album(self):
dl = FakeYDL()
@@ -195,7 +196,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://mpallante.bandcamp.com/album/nightmare-night-ep')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'Nightmare Night EP')
self.assertTrue(len(result['entries']) >= 4)
assertGreaterEqual(self, len(result['entries']), 4)
def test_smotri_community(self):
dl = FakeYDL()
@@ -204,7 +205,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'kommuna')
self.assertEqual(result['title'], 'КПРФ')
self.assertTrue(len(result['entries']) >= 4)
assertGreaterEqual(self, len(result['entries']), 4)
def test_smotri_user(self):
dl = FakeYDL()
@@ -213,7 +214,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'inspector')
self.assertEqual(result['title'], 'Inspector')
self.assertTrue(len(result['entries']) >= 9)
assertGreaterEqual(self, len(result['entries']), 9)
def test_AcademicEarthCourse(self):
dl = FakeYDL()
@@ -232,7 +233,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'dvoe_iz_lartsa')
self.assertEqual(result['title'], 'Двое из ларца (2006 - 2008)')
self.assertTrue(len(result['entries']) >= 24)
assertGreaterEqual(self, len(result['entries']), 24)
def test_ivi_compilation_season(self):
dl = FakeYDL()
@@ -241,7 +242,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'dvoe_iz_lartsa/season1')
self.assertEqual(result['title'], 'Двое из ларца (2006 - 2008) 1 сезон')
self.assertTrue(len(result['entries']) >= 12)
assertGreaterEqual(self, len(result['entries']), 12)
def test_imdb_list(self):
dl = FakeYDL()
@@ -260,7 +261,7 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], 'cryptography')
self.assertEqual(result['title'], 'Journey into cryptography')
self.assertEqual(result['description'], 'How have humans protected their secret messages through history? What has changed today?')
self.assertTrue(len(result['entries']) >= 3)
assertGreaterEqual(self, len(result['entries']), 3)
def test_EveryonesMixtape(self):
dl = FakeYDL()
@@ -277,7 +278,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://rutube.ru/tags/video/1800/')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '1800')
self.assertTrue(len(result['entries']) >= 68)
assertGreaterEqual(self, len(result['entries']), 68)
def test_rutube_person(self):
dl = FakeYDL()
@@ -285,7 +286,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://rutube.ru/video/person/313878/')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '313878')
self.assertTrue(len(result['entries']) >= 37)
assertGreaterEqual(self, len(result['entries']), 37)
def test_multiple_brightcove_videos(self):
# https://github.com/rg3/youtube-dl/issues/2283
@@ -322,7 +323,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '10')
self.assertEqual(result['title'], 'Who are the hackers?')
self.assertTrue(len(result['entries']) >= 6)
assertGreaterEqual(self, len(result['entries']), 6)
def test_toypics_user(self):
dl = FakeYDL()
@@ -330,7 +331,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://videos.toypics.net/Mikey')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'Mikey')
self.assertTrue(len(result['entries']) >= 17)
assertGreaterEqual(self, len(result['entries']), 17)
def test_xtube_user(self):
dl = FakeYDL()
@@ -338,7 +339,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://www.xtube.com/community/profile.php?user=greenshowers')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'greenshowers')
self.assertTrue(len(result['entries']) >= 155)
assertGreaterEqual(self, len(result['entries']), 155)
def test_InstagramUser(self):
dl = FakeYDL()
@@ -346,7 +347,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://instagram.com/porsche')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'porsche')
self.assertTrue(len(result['entries']) >= 2)
assertGreaterEqual(self, len(result['entries']), 2)
test_video = next(
e for e in result['entries']
if e['id'] == '614605558512799803_462752227')
@@ -385,7 +386,7 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], '152147')
self.assertEqual(
result['title'], 'Brace Yourself - Today\'s Weirdest News')
self.assertTrue(len(result['entries']) >= 10)
assertGreaterEqual(self, len(result['entries']), 10)
def test_TeacherTubeUser(self):
dl = FakeYDL()
@@ -393,7 +394,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://www.teachertube.com/user/profile/rbhagwati2')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'rbhagwati2')
self.assertTrue(len(result['entries']) >= 179)
assertGreaterEqual(self, len(result['entries']), 179)
if __name__ == '__main__':
unittest.main()

@@ -7,6 +7,7 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import errno
import io
import json
import re

@@ -57,6 +57,12 @@ _TESTS = [
u'F375F75BF2AFDAAF2666E43868D46816F83F13E81C46.3725A8218E446A0DECD33F79DC282994D6AA92C92C9',
u'9C29AA6D499282CD97F33DCED0A644E8128A5273.64C18E31F38361864D86834E6662FAADFA2FB57F'
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflBb0OQx.js',
u'js',
84,
u'123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQ0STUVWXYZ!"#$%&\'()*+,@./:;<=>'
)
]

@@ -1197,6 +1197,10 @@ class YoutubeDL(object):
if res:
res += ', '
res += format_bytes(fdict['filesize'])
elif fdict.get('filesize_approx') is not None:
if res:
res += ', '
res += '~' + format_bytes(fdict['filesize_approx'])
return res
def list_formats(self, info_dict):
@@ -1230,14 +1234,21 @@ class YoutubeDL(object):
if not self.params.get('verbose'):
return
write_string(
encoding_str = (
'[debug] Encodings: locale %s, fs %s, out %s, pref %s\n' % (
locale.getpreferredencoding(),
sys.getfilesystemencoding(),
sys.stdout.encoding,
self.get_encoding()),
encoding=None
)
self.get_encoding()))
try:
write_string(encoding_str, encoding=None)
except:
errmsg = 'Failed to write encoding string %r' % encoding_str
try:
sys.stdout.write(errmsg)
except:
pass
raise IOError(errmsg)
self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
try:

@@ -72,11 +72,9 @@ __license__ = 'Public Domain'
import codecs
import io
import locale
import optparse
import os
import random
import re
import shlex
import sys
@@ -635,7 +633,7 @@ def _real_main(argv=None):
if desc is False:
continue
if hasattr(ie, 'SEARCH_KEY'):
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise')
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise', u'sleeping bunny')
_COUNTS = (u'', u'5', u'10', u'all')
desc += u' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
compat_print(desc)

@@ -267,6 +267,8 @@ from .smotri import (
SmotriUserIE,
SmotriBroadcastIE,
)
from .snotr import SnotrIE
from .sockshare import SockshareIE
from .sohu import SohuIE
from .soundcloud import (
SoundcloudIE,

@@ -12,7 +12,7 @@ from ..utils import (
class BRIE(InfoExtractor):
IE_DESC = 'Bayerischer Rundfunk Mediathek'
_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-]+/)+(?P<id>[a-z0-9\-]+)\.html'
_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html'
_BASE_URL = 'http://www.br.de'
_TESTS = [
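
The effect of adding `_` to both character classes can be checked directly with `re`. This is a small sketch; the URL below is made up for illustration and is not taken from issue #3311:

```python
import re

# The two patterns exactly as they appear in the hunk above
OLD_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-]+/)+(?P<id>[a-z0-9\-]+)\.html'
NEW_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html'

# Hypothetical URL with underscores in a path segment and in the id
url = 'http://www.br.de/mediathek/video/some_show/episode_01-100.html'

old = re.match(OLD_VALID_URL, url)  # no match: '_' is outside the old classes
new = re.match(NEW_VALID_URL, url)  # matches; the id group keeps the underscore
```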

@@ -1,24 +1,42 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class CBSIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cbs\.com/shows/[^/]+/video/(?P<id>[^/]+)/.*'
_VALID_URL = r'https?://(?:www\.)?cbs\.com/shows/[^/]+/(?:video|artist)/(?P<id>[^/]+)/.*'
_TEST = {
u'url': u'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
u'file': u'4JUVEwq3wUT7.flv',
u'info_dict': {
u'title': u'Connect Chat feat. Garth Brooks',
u'description': u'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
u'duration': 1495,
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': {
'id': '4JUVEwq3wUT7',
'ext': 'flv',
'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
'duration': 1495,
},
u'params': {
'params': {
# rtmp download
u'skip_download': True,
'skip_download': True,
},
}
'_skip': 'Blocked outside the US',
}, {
'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
'info_dict': {
'id': 'P9gjWjelt6iP',
'ext': 'flv',
'title': 'Live on Letterman - St. Vincent',
'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
'duration': 3221,
},
'params': {
# rtmp download
'skip_download': True,
},
'_skip': 'Blocked outside the US',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -26,5 +44,5 @@ class CBSIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
real_id = self._search_regex(
r"video\.settings\.pid\s*=\s*'([^']+)';",
webpage, u'real video ID')
webpage, 'real video ID')
return self.url_result(u'theplatform:%s' % real_id)

@@ -42,7 +42,7 @@ class ChilloutzoneIE(InfoExtractor):
'id': '85523671',
'ext': 'mp4',
'title': 'The Sunday Times - Icons',
'description': 'md5:3e1c0dc6047498d6728dcdaad0891762',
'description': 'md5:a5f7ff82e2f7a9ed77473fe666954e84',
'uploader': 'Us',
'uploader_id': 'usfilms',
'upload_date': '20140131'

@@ -43,7 +43,11 @@ class CNETIE(InfoExtractor):
raise ExtractorError('Cannot find video data')
video_id = vdata['id']
title = vdata['headline']
title = vdata.get('headline')
if title is None:
title = vdata.get('title')
if title is None:
raise ExtractorError('Cannot find title!')
description = vdata.get('dek')
thumbnail = vdata.get('image', {}).get('path')
author = vdata.get('author')

@@ -69,6 +69,7 @@ class InfoExtractor(object):
* vcodec Name of the video codec in use
* container Name of the container format
* filesize The number of bytes, if known in advance
* filesize_approx An estimate for the number of bytes
* player_url SWF Player URL (used for rtmpdump).
* protocol The protocol that will be used for the actual
download, lower-case.
@@ -300,8 +301,12 @@ class InfoExtractor(object):
def _download_json(self, url_or_request, video_id,
note=u'Downloading JSON metadata',
errnote=u'Unable to download JSON metadata',
transform_source=None):
json_string = self._download_webpage(url_or_request, video_id, note, errnote)
transform_source=None,
fatal=True):
json_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal)
if (not fatal) and json_string is False:
return None
if transform_source:
json_string = transform_source(json_string)
try:
@@ -468,7 +473,7 @@ class InfoExtractor(object):
display_name = name
return self._html_search_regex(
r'''(?ix)<meta
(?=[^>]+(?:itemprop|name|property)=["\']%s["\'])
(?=[^>]+(?:itemprop|name|property)=["\']?%s["\']?)
[^>]+content=["\']([^"\']+)["\']''' % re.escape(name),
html, display_name, fatal=fatal, **kwargs)
@@ -555,6 +560,7 @@ class InfoExtractor(object):
f.get('abr') if f.get('abr') is not None else -1,
audio_ext_preference,
f.get('filesize') if f.get('filesize') is not None else -1,
f.get('filesize_approx') if f.get('filesize_approx') is not None else -1,
f.get('format_id'),
)
formats.sort(key=_formats_key)

@@ -5,24 +5,26 @@ import os.path
import re
from .common import InfoExtractor
from ..utils import compat_urllib_parse_unquote
class DropboxIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dropbox[.]com/s/(?P<id>[a-zA-Z0-9]{15})/(?P<title>[^?#]*)'
_TEST = {
'url': 'https://www.dropbox.com/s/0qr9sai2veej4f8/THE_DOCTOR_GAMES.mp4',
'md5': '8ae17c51172fb7f93bdd6a214cc8c896',
'url': 'https://www.dropbox.com/s/nelirfsxnmcfbfh/youtube-dl%20test%20video%20%27%C3%A4%22BaW_jenozKc.mp4',
'md5': '8a3d905427a6951ccb9eb292f154530b',
'info_dict': {
'id': '0qr9sai2veej4f8',
'id': 'nelirfsxnmcfbfh',
'ext': 'mp4',
'title': 'THE_DOCTOR_GAMES'
'title': 'youtube-dl test video \'ä"BaW_jenozKc'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
title = os.path.splitext(mobj.group('title'))[0]
fn = compat_urllib_parse_unquote(mobj.group('title'))
title = os.path.splitext(fn)[0]
video_url = url + '?dl=1'
return {
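
The title fix amounts to percent-decoding the filename part of the URL before stripping the extension. A sketch under one assumption: `urllib.parse.unquote` from Python 3 stands in for the project's `compat_urllib_parse_unquote` wrapper, whose internals are not shown in this diff:

```python
import os.path
from urllib.parse import unquote  # assumed stand-in for compat_urllib_parse_unquote

# Filename part of the new test URL from the hunk above
raw = 'youtube-dl%20test%20video%20%27%C3%A4%22BaW_jenozKc.mp4'

fn = unquote(raw)                # %20 -> space, %27 -> ', %C3%A4 -> ä, %22 -> "
title = os.path.splitext(fn)[0]  # drop the '.mp4' extension
```

The resulting `title` is the string the updated test expects in `info_dict`.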

@@ -8,7 +8,6 @@ from ..utils import (
ExtractorError,
compat_urllib_parse,
compat_urllib_request,
determine_ext,
)

@@ -26,7 +26,7 @@ class FunnyOrDieIE(InfoExtractor):
'id': 'e402820827',
'ext': 'mp4',
'title': 'Please Use This Song (Jon Lajoie)',
'description': 'md5:2ed27d364f5a805a6dba199faaf6681d',
'description': 'Please use this to sell something. www.jonlajoie.com',
'thumbnail': 're:^http:.*\.jpg$',
},
}]

@@ -402,7 +402,7 @@ class GenericIE(InfoExtractor):
elif default_search == 'error':
raise ExtractorError(
('%r is not a valid URL. '
'Set --default-search "ytseach" (or run youtube-dl "ytsearch:%s" ) to search YouTube'
'Set --default-search "ytsearch" (or run youtube-dl "ytsearch:%s" ) to search YouTube'
) % (url, url), expected=True)
else:
assert ':' in default_search

@@ -8,7 +8,7 @@ from .common import InfoExtractor
class KickStarterIE(InfoExtractor):
_VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>[^/]*)/.*'
_TEST = {
_TESTS = [{
'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant?ref=home_location',
'md5': 'c81addca81327ffa66c642b5d8b08cab',
'info_dict': {
@@ -18,22 +18,45 @@ class KickStarterIE(InfoExtractor):
'description': 'A unique motocross documentary that examines the '
'life and mind of one of sports most elite athletes: Josh Grant.',
},
}
}, {
'note': 'Embedded video (not using the native kickstarter video service)',
'url': 'https://www.kickstarter.com/projects/597507018/pebble-e-paper-watch-for-iphone-and-android/posts/659178',
'playlist': [
{
'info_dict': {
'id': '78704821',
'ext': 'mp4',
'uploader_id': 'pebble',
'uploader': 'Pebble Technology',
'title': 'Pebble iOS Notifications',
}
}
],
}]
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'data-video-url="(.*?)"',
webpage, 'video URL')
video_title = self._html_search_regex(r'<title>(.*?)</title>',
webpage, 'title').rpartition('— Kickstarter')[0].strip()
title = self._html_search_regex(
r'<title>\s*(.*?)(?:\s*&mdash; Kickstarter)?\s*</title>',
webpage, 'title')
video_url = self._search_regex(
r'data-video-url="(.*?)"',
webpage, 'video URL', default=None)
if video_url is None: # No native kickstarter, look for embedded videos
return {
'_type': 'url_transparent',
'ie_key': 'Generic',
'url': url,
'title': title,
}
return {
'id': video_id,
'url': video_url,
'title': video_title,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}
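
The new title regex makes the "&mdash; Kickstarter" suffix optional, so pages without it (such as the embedded-video post in the second test) still yield a title. A rough sketch using plain `re.search`; the extractor itself goes through `_html_search_regex`, and the page fragments below are made up:

```python
import re

# Pattern copied from the hunk above
TITLE_RE = r'<title>\s*(.*?)(?:\s*&mdash; Kickstarter)?\s*</title>'


def extract_title(webpage):
    # Hypothetical helper: return the captured title, or None if absent
    m = re.search(TITLE_RE, webpage)
    return m.group(1) if m else None


# Made-up fragments, one with the site suffix and one without
with_suffix = extract_title('<title>\n  Some Project &mdash; Kickstarter\n</title>')
without_suffix = extract_title('<title>Project update #3</title>')
```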

@@ -28,11 +28,13 @@ class LivestreamIE(InfoExtractor):
}
def _extract_video_info(self, video_data):
video_url = video_data.get('progressive_url_hd') or video_data.get('progressive_url')
video_url = (
video_data.get('progressive_url_hd') or
video_data.get('progressive_url')
)
return {
'id': compat_str(video_data['id']),
'url': video_url,
'ext': 'mp4',
'title': video_data['caption'],
'thumbnail': video_data['thumbnail_url'],
'upload_date': video_data['updated_at'].replace('-', '')[:8],
@@ -50,7 +52,8 @@ class LivestreamIE(InfoExtractor):
r'window.config = ({.*?});', webpage, 'window config')
info = json.loads(config_json)['event']
videos = [self._extract_video_info(video_data['data'])
for video_data in info['feed']['data'] if video_data['type'] == 'video']
for video_data in info['feed']['data']
if video_data['type'] == 'video']
return self.playlist_result(videos, info['id'], info['full_name'])
else:
og_video = self._og_search_video_url(webpage, 'player url')

@@ -11,8 +11,22 @@ from ..utils import (
class MLBIE(InfoExtractor):
_VALID_URL = r'https?://m\.mlb\.com/video/(?:topic/[\da-z_-]+/)?v(?P<id>n?\d+)'
_VALID_URL = r'https?://m\.mlb\.com/(?:.*?/)?video/(?:topic/[\da-z_-]+/)?v(?P<id>n?\d+)'
_TESTS = [
{
'url': 'http://m.mlb.com/sea/video/topic/51231442/v34698933/nymsea-ackley-robs-a-home-run-with-an-amazing-catch/?c_id=sea',
'md5': 'ff56a598c2cf411a9a38a69709e97079',
'info_dict': {
'id': '34698933',
'ext': 'mp4',
'title': "Ackley's spectacular catch",
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
'duration': 66,
'timestamp': 1405980600,
'upload_date': '20140721',
'thumbnail': 're:^https?://.*\.jpg$',
},
},
{
'url': 'http://m.mlb.com/video/topic/81536970/v34496663/mianym-stanton-practices-for-the-home-run-derby',
'md5': 'd9c022c10d21f849f49c05ae12a8a7e9',

@@ -4,7 +4,11 @@ import re
import json
from .common import InfoExtractor
from ..utils import find_xpath_attr, compat_str
from ..utils import (
compat_str,
ExtractorError,
find_xpath_attr,
)
class NBCIE(InfoExtractor):
@@ -85,11 +89,25 @@ class NBCNewsIE(InfoExtractor):
flags=re.MULTILINE)
bootstrap = json.loads(bootstrap_json)
info = bootstrap['results'][0]['video']
playlist_url = info['fallbackPlaylistUrl'] + '?form=MPXNBCNewsAPI'
mpxid = info['mpxId']
all_videos = self._download_json(playlist_url, title)['videos']
# The response contains additional videos
info = next(v for v in all_videos if v['mpxId'] == mpxid)
base_urls = [
info['fallbackPlaylistUrl'],
info['associatedPlaylistUrl'],
]
for base_url in base_urls:
playlist_url = base_url + '?form=MPXNBCNewsAPI'
all_videos = self._download_json(playlist_url, title)['videos']
try:
info = next(v for v in all_videos if v['mpxId'] == mpxid)
break
except StopIteration:
continue
if info is None:
raise ExtractorError('Could not find video in playlists')
return {
'_type': 'url',
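
The playlist search in this hunk can also be written as a small function that either returns the matching video or raises. As far as the hunk can be read without its +/- markers, `info` already holds the bootstrap dict before the loop, so a "not found" check on `info` may never fire. The sketch below avoids that by not reusing the variable. `find_video` and `fetch_videos` are hypothetical names; `fetch_videos` stands in for the `_download_json(...)['videos']` call:

```python
def find_video(mpxid, base_urls, fetch_videos):
    """Return the first video whose 'mpxId' equals mpxid.

    fetch_videos is a hypothetical callable that takes a playlist URL
    and returns that playlist's list of video dicts.
    """
    for base_url in base_urls:
        playlist_url = base_url + '?form=MPXNBCNewsAPI'
        for video in fetch_videos(playlist_url):
            if video['mpxId'] == mpxid:
                return video
    # Reached only when no playlist contained the video
    raise LookupError('Could not find video in playlists')
```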

@@ -92,16 +92,7 @@ class RTLnowIE(InfoExtractor):
},
{
'url': 'http://www.n-tvnow.de/deluxe-alles-was-spass-macht/thema-ua-luxushotel-fuer-vierbeiner.php?container_id=153819&player=1&season=0',
'info_dict': {
'id': '153819',
'ext': 'flv',
'title': 'Deluxe - Alles was Spaß macht - Thema u.a.: Luxushotel für Vierbeiner',
'description': 'md5:c3705e1bb32e1a5b2bcd634fc065c631',
'thumbnail': 'http://autoimg.static-fra.de/ntvnow/383157/1500x1500/image2.jpg',
'upload_date': '20140221',
'duration': 2429,
},
'skip': 'Only works from Germany',
'only_matching': True,
},
]

@@ -17,7 +17,7 @@ class RTVEALaCartaIE(InfoExtractor):
_TEST = {
'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
'md5': '18fcd45965bdd076efdb12cd7f6d7b9e',
'md5': '1d49b7e1ca7a7502c56a4bf1b60f1b43',
'info_dict': {
'id': '2491869',
'ext': 'mp4',

@@ -20,7 +20,7 @@ class SaveFromIE(InfoExtractor):
'upload_date': '20120816',
'uploader': 'Howcast',
'uploader_id': 'Howcast',
'description': 'md5:4f0aac94361a12e1ce57d74f85265175',
'description': 're:(?s).* Hi, my name is Rene Dreifuss\. And I\'m here to show you some MMA.*',
},
'params': {
'skip_download': True

@@ -0,0 +1,68 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
float_or_none,
str_to_int,
parse_duration,
)
class SnotrIE(InfoExtractor):
_VALID_URL = r'http?://(?:www\.)?snotr\.com/video/(?P<id>\d+)/([\w]+)'
_TESTS = [{
'url': 'http://www.snotr.com/video/13708/Drone_flying_through_fireworks',
'info_dict': {
'id': '13708',
'ext': 'flv',
'title': 'Drone flying through fireworks!',
'duration': 247,
'filesize_approx': 98566144,
'description': 'A drone flying through Fourth of July Fireworks',
}
}, {
'url': 'http://www.snotr.com/video/530/David_Letteman_-_George_W_Bush_Top_10',
'info_dict': {
'id': '530',
'ext': 'flv',
'title': 'David Letteman - George W. Bush Top 10',
'duration': 126,
'filesize_approx': 8912896,
'description': 'The top 10 George W. Bush moments, brought to you by David Letterman!',
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
video_url = "http://cdn.videos.snotr.com/%s.flv" % video_id
view_count = str_to_int(self._html_search_regex(
r'<p>\n<strong>Views:</strong>\n([\d,\.]+)</p>',
webpage, 'view count', fatal=False))
duration = parse_duration(self._html_search_regex(
r'<p>\n<strong>Length:</strong>\n\s*([0-9:]+).*?</p>',
webpage, 'duration', fatal=False))
filesize_approx = float_or_none(self._html_search_regex(
r'<p>\n<strong>Filesize:</strong>\n\s*([0-9.]+)\s*megabyte</p>',
webpage, 'filesize', fatal=False), invscale=1024 * 1024)
return {
'id': video_id,
'description': description,
'title': title,
'url': video_url,
'view_count': view_count,
'duration': duration,
'filesize_approx': filesize_approx,
}
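A minimal sketch of how the view-count step in the SnotrIE extractor above behaves. The sample markup is hypothetical and only shaped to fit the pattern in the diff; `extract_view_count` is an illustrative helper, not part of youtube-dl, and `str_to_int` is re-implemented locally from the version shown in this diff.

```python
import re


def str_to_int(int_str):
    # Local copy of the helper from the diff: strip thousands separators, then convert.
    if int_str is None:
        return None
    return int(re.sub(r'[,\.]', '', int_str))


# Hypothetical page fragment, written to fit the view-count pattern used above.
SAMPLE_PAGE = '<p>\n<strong>Views:</strong>\n1,234,567</p>'


def extract_view_count(page):
    # Illustrative stand-in for _html_search_regex(..., fatal=False):
    # a missing counter yields None instead of raising.
    m = re.search(r'<p>\n<strong>Views:</strong>\n([\d,\.]+)</p>', page)
    return str_to_int(m.group(1) if m else None)


view_count = extract_view_count(SAMPLE_PAGE)
missing = extract_view_count('<p>no counters here</p>')
```

Because the lookup is non-fatal and `str_to_int` passes `None` through, a page without the counter leaves `view_count` unset rather than aborting the extraction.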

@@ -0,0 +1,80 @@
# coding: utf-8
from __future__ import unicode_literals
from ..utils import (
ExtractorError,
compat_urllib_parse,
compat_urllib_request,
determine_ext,
)
import re
from .common import InfoExtractor
class SockshareIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?sockshare\.com/file/(?P<id>[0-9A-Za-z]+)'
_FILE_DELETED_REGEX = r'This file doesn\'t exist, or has been removed\.</div>'
_TEST = {
'url': 'http://www.sockshare.com/file/437BE28B89D799D7',
'md5': '9d0bf1cfb6dbeaa8d562f6c97506c5bd',
'info_dict': {
'id': '437BE28B89D799D7',
'title': 'big_buck_bunny_720p_surround.avi',
'ext': 'avi',
'thumbnail': 're:^http://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = 'http://sockshare.com/file/%s' % video_id
webpage = self._download_webpage(url, video_id)
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
confirm_hash = self._html_search_regex(r'''(?x)<input\s+
type="hidden"\s+
value="([^"]*)"\s+
name="hash"
''', webpage, 'hash')
fields = {
"hash": confirm_hash,
"confirm": "Continue as Free User"
}
post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post)
# Apparently, this header is required for confirmation to work.
req.add_header('Host', 'www.sockshare.com')
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
video_url = self._html_search_regex(
r'<a href="([^"]*)".+class="download_file_link"',
webpage, 'file url')
video_url = "http://www.sockshare.com" + video_url
title = self._html_search_regex(r'<h1>(.+)<strong>', webpage, 'title')
thumbnail = self._html_search_regex(
r'<img\s+src="([^"]*)".+?name="bg"',
webpage, 'thumbnail')
formats = [{
'format_id': 'sd',
'url': video_url,
'ext': determine_ext(title),
}]
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

@@ -82,10 +82,10 @@ class SoundcloudIE(InfoExtractor):
# downloadable song
{
'url': 'https://soundcloud.com/oddsamples/bus-brakes',
'md5': 'fee7b8747b09bb755cefd4b853e7249a',
'md5': '7624f2351f8a3b2e7cd51522496e7631',
'info_dict': {
'id': '128590877',
'ext': 'wav',
'ext': 'mp3',
'title': 'Bus Brakes',
'description': 'md5:0170be75dd395c96025d210d261c784e',
'uploader': 'oddsamples',

@@ -53,7 +53,7 @@ class SteamIE(InfoExtractor):
'ext': 'mp4',
'upload_date': '20140329',
'title': 'FRONTIERS - Final Greenlight Trailer',
'description': 'md5:6df4fe8dd494ae811869672b0767e025',
'description': 'md5:dc96a773669d0ca1b36c13c1f30250d9',
'uploader': 'AAD Productions',
'uploader_id': 'AtomicAgeDogGames',
}

@@ -19,16 +19,6 @@ class TagesschauIE(InfoExtractor):
'description': 'md5:69da3c61275b426426d711bde96463ab',
'thumbnail': 're:^http:.*\.jpg$',
},
}, {
'url': 'http://www.tagesschau.de/multimedia/video/video-5964.html',
'md5': '66652566900963a3f962333579eeffcf',
'info_dict': {
'id': '5964',
'ext': 'mp4',
'title': 'Nahost-Konflikt: Israel bombadiert Ziele im Gazastreifen und Westjordanland',
'description': 'md5:07bfc78c48eec3145ed4805299a1900a',
'thumbnail': 're:http://.*\.jpg',
},
}]
_FORMATS = {

@@ -62,7 +62,7 @@ class TeacherTubeIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta('title', webpage, 'title')
title = self._html_search_meta('title', webpage, 'title', fatal=True)
TITLE_SUFFIX = ' - TeacherTube'
if title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)].strip()
@@ -101,7 +101,11 @@ class TeacherTubeUserIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?teachertube\.com/(user/profile|collection)/(?P<user>[0-9a-zA-Z]+)/?'
_MEDIA_RE = r'(?s)"sidebar_thumb_time">[0-9:]+</div>.+?<a href="(https?://(?:www\.)?teachertube\.com/(?:video|audio)/[^"]+)">'
_MEDIA_RE = r'''(?sx)
class="?sidebar_thumb_time"?>[0-9:]+</div>
\s*
<a\s+href="(https?://(?:www\.)?teachertube\.com/(?:video|audio)/[^"]+)"
'''
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -111,14 +115,12 @@ class TeacherTubeUserIE(InfoExtractor):
webpage = self._download_webpage(url, user_id)
urls.extend(re.findall(self._MEDIA_RE, webpage))
pages = re.findall(r'/ajax-user/user-videos/%s\?page=([0-9]+)' % user_id, webpage)[1:-1]
pages = re.findall(r'/ajax-user/user-videos/%s\?page=([0-9]+)' % user_id, webpage)[:-1]
for p in pages:
more = 'http://www.teachertube.com/ajax-user/user-videos/%s?page=%s' % (user_id, p)
webpage = self._download_webpage(more, user_id, 'Downloading page %s/%s' % (p, len(pages) + 1))
urls.extend(re.findall(self._MEDIA_RE, webpage))
entries = []
for url in urls:
entries.append(self.url_result(url, 'TeacherTube'))
webpage = self._download_webpage(more, user_id, 'Downloading page %s/%s' % (p, len(pages)))
video_urls = re.findall(self._MEDIA_RE, webpage)
urls.extend(video_urls)
entries = [self.url_result(vurl, 'TeacherTube') for vurl in urls]
return self.playlist_result(entries, user_id)
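The rewritten `_MEDIA_RE` relies on verbose mode (`x`), where whitespace in the pattern is ignored, and it tolerates quoted or unquoted `class` attributes. A small sketch of that behavior, assuming made-up sidebar markup (the real TeacherTube page may differ):

```python
import re

# Same pattern as the new _MEDIA_RE in the diff; the indentation is ignored in verbose mode.
MEDIA_RE = r'''(?sx)
    class="?sidebar_thumb_time"?>[0-9:]+</div>
    \s*
    <a\s+href="(https?://(?:www\.)?teachertube\.com/(?:video|audio)/[^"]+)"
'''

# Hypothetical markup: one quoted class attribute, one unquoted.
SAMPLE = (
    '<div class="sidebar_thumb_time">3:07</div>\n'
    '    <a href="http://www.teachertube.com/video/example-one-1">one</a>\n'
    '<div class=sidebar_thumb_time>12:40</div>'
    '<a  href="https://teachertube.com/audio/example-two-2">two</a>\n'
)

media_urls = re.findall(MEDIA_RE, SAMPLE)
```

Requiring only `\s*` between the closing `</div>` and the link, instead of the earlier `.+?`, keeps a match from spanning unrelated markup.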

@@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor

@@ -98,7 +98,7 @@ class VimeoIE(VimeoBaseInfoExtractor, SubtitlesInfoExtractor):
'info_dict': {
'id': '54469442',
'ext': 'mp4',
'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software',
'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
'uploader': 'The BLN & Business of Software',
'uploader_id': 'theblnbusinessofsoftware',
'duration': 3610,

View File

@@ -10,7 +10,7 @@ from ..utils import (
class VodlockerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vodlocker.com/(?P<id>[0-9a-zA-Z]+)(?:\..*?)?'
_VALID_URL = r'https?://(?:www\.)?vodlocker\.com/(?P<id>[0-9a-zA-Z]+)(?:\..*?)?'
_TESTS = [{
'url': 'http://vodlocker.com/e8wvyzz4sl42',

@@ -1,5 +1,6 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@@ -20,12 +21,14 @@ class VubeIE(InfoExtractor):
'ext': 'mp4',
'title': 'Chiara Grispo - Price Tag by Jessie J',
'description': 'md5:8ea652a1f36818352428cb5134933313',
'thumbnail': 'http://frame.thestaticvube.com/snap/228x128/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f.jpg',
'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f\.jpg$',
'uploader': 'Chiara.Grispo',
'uploader_id': '1u3hX0znhP',
'timestamp': 1388743358,
'upload_date': '20140103',
'duration': 170.56
'duration': 170.56,
'like_count': int,
'dislike_count': int,
'comment_count': int,
}
},
{
@@ -36,12 +39,30 @@ class VubeIE(InfoExtractor):
'ext': 'mp4',
'title': 'My 7 year old Sister and I singing "Alive" by Krewella',
'description': 'md5:40bcacb97796339f1690642c21d56f4a',
'thumbnail': 'http://frame.thestaticvube.com/snap/228x128/102265d5a9f-0f17-4f6b-5753-adf08484ee1e.jpg',
'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102265d5a9f-0f17-4f6b-5753-adf08484ee1e\.jpg$',
'uploader': 'Seraina',
'uploader_id': 'XU9VE2BQ2q',
'timestamp': 1396492438,
'upload_date': '20140403',
'duration': 240.107
'duration': 240.107,
'like_count': int,
'dislike_count': int,
'comment_count': int,
}
}, {
'url': 'http://vube.com/vote/Siren+Gene/0nmsMY5vEq?n=2&t=s',
'md5': '0584fc13b50f887127d9d1007589d27f',
'info_dict': {
'id': '0nmsMY5vEq',
'ext': 'mp4',
'title': 'Frozen - Let It Go Cover by Siren Gene',
'description': 'My rendition of "Let It Go" originally sung by Idina Menzel.',
'uploader': 'Siren Gene',
'uploader_id': 'Siren',
'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/10283ab622a-86c9-4681-51f2-30d1f65774af\.jpg$',
'duration': 221.788,
'like_count': int,
'dislike_count': int,
'comment_count': int,
}
}
]
@@ -50,8 +71,16 @@ class VubeIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video = self._download_json(
'http://vube.com/api/v2/video/%s' % video_id, video_id, 'Downloading video JSON')
webpage = self._download_webpage(url, video_id)
data_json = self._search_regex(
r'(?s)window\["(?:tapiVideoData|vubeOriginalVideoData)"\]\s*=\s*(\{.*?\n});\n',
webpage, 'video data'
)
data = json.loads(data_json)
video = (
data.get('video') or
data)
assert isinstance(video, dict)
public_id = video['public_id']
@@ -69,21 +98,31 @@ class VubeIE(InfoExtractor):
title = video['title']
description = video.get('description')
thumbnail = video['thumbnail_src']
if thumbnail.startswith('//'):
thumbnail = 'http:' + thumbnail
uploader = video['user_alias']
uploader_id = video['user_url_id']
timestamp = int(video['upload_time'])
thumbnail = self._proto_relative_url(
video.get('thumbnail') or video.get('thumbnail_src'),
scheme='http:')
uploader = data.get('user', {}).get('channel', {}).get('name') or video.get('user_alias')
uploader_id = data.get('user', {}).get('name')
timestamp = int_or_none(video.get('upload_time'))
duration = video['duration']
view_count = video.get('raw_view_count')
like_count = video.get('total_likes')
dislike_count= video.get('total_hates')
like_count = video.get('rlikes')
if like_count is None:
like_count = video.get('total_likes')
dislike_count = video.get('rhates')
if dislike_count is None:
dislike_count = video.get('total_hates')
comment = self._download_json(
'http://vube.com/api/video/%s/comment' % video_id, video_id, 'Downloading video comment JSON')
comment_count = int_or_none(comment.get('total'))
comments = video.get('comments')
comment_count = None
if comments is None:
comment_data = self._download_json(
'http://vube.com/api/video/%s/comment' % video_id,
video_id, 'Downloading video comment JSON', fatal=False)
if comment_data is not None:
comment_count = int_or_none(comment_data.get('total'))
else:
comment_count = len(comments)
return {
'id': video_id,
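The fallback logic in this hunk can be sketched in isolation. The helpers `first_not_none` and `comment_count_for` are hypothetical names introduced for illustration, and the `fetch_comment_data` callback stands in for the non-fatal `_download_json` call:

```python
def first_not_none(mapping, *keys):
    # Return the first value that is present and not None.
    # Unlike `a or b`, this keeps a legitimate count of 0.
    for key in keys:
        value = mapping.get(key)
        if value is not None:
            return value
    return None


def comment_count_for(video, fetch_comment_data):
    # Prefer comments embedded in the page data; otherwise ask the
    # (possibly failing) comment endpoint via the supplied callback.
    comments = video.get('comments')
    if comments is not None:
        return len(comments)
    comment_data = fetch_comment_data()
    if comment_data is None:
        return None
    total = comment_data.get('total')
    return None if total is None else int(total)


likes = first_not_none({'rlikes': 0, 'total_likes': 17}, 'rlikes', 'total_likes')
embedded = comment_count_for({'comments': ['a', 'b']}, lambda: {'total': '9'})
fetched = comment_count_for({}, lambda: {'total': '9'})
unavailable = comment_count_for({}, lambda: None)
```

The explicit `is None` checks are the point of the design: a video with zero likes reports 0 from the first key rather than falling through to the second.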

@@ -6,7 +6,7 @@ import re
from .common import InfoExtractor
from ..utils import (
compat_parse_qs,
compat_urlparse,
compat_parse_qs,
determine_ext,
unified_strdate,
)
@@ -55,14 +55,14 @@ class WDRIE(InfoExtractor):
},
},
{
'url': 'http://www.funkhauseuropa.de/av/audiosuepersongsoulbossanova100-audioplayer.html',
'md5': '24e83813e832badb0a8d7d1ef9ef0691',
'url': 'http://www.funkhauseuropa.de/av/audioflaviacoelhoamaramar100-audioplayer.html',
'md5': '99a1443ff29af19f6c52cf6f4dc1f4aa',
'info_dict': {
'id': 'mdb-463528',
'id': 'mdb-478135',
'ext': 'mp3',
'title': 'Süpersong: Soul Bossa Nova',
'title': 'Flavia Coelho: Amar é Amar',
'description': 'md5:7b29e97e10dfb6e265238b32fa35b23a',
'upload_date': '20140630',
'upload_date': '20140717',
},
},
]
@@ -81,7 +81,7 @@ class WDRIE(InfoExtractor):
]
return self.playlist_result(entries, page_id)
flashvars = compat_urlparse.parse_qs(
flashvars = compat_parse_qs(
self._html_search_regex(r'<param name="flashvars" value="([^"]+)"', webpage, 'flashvars'))
page_id = flashvars['trackerClipId'][0]
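A short sketch of what the flashvars parsing yields, using Python 3's `urllib.parse.parse_qs` (the function that `compat_parse_qs` is imported from on Python 3). The query string is invented for illustration, and `trackerClipTitle` is a hypothetical key:

```python
from urllib.parse import parse_qs

# Invented flashvars value; only trackerClipId appears in the diff.
flashvars_raw = 'trackerClipId=mdb-478135&trackerClipTitle=Amar+%C3%A9+Amar'

flashvars = parse_qs(flashvars_raw)

# Every value is a list, hence the [0] in the extractor.
page_id = flashvars['trackerClipId'][0]
title = flashvars['trackerClipTitle'][0]
```

Percent-encoded UTF-8 such as `%C3%A9` is decoded to text here, which is the behavior the commit wants on Python 2.x as well.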

@@ -1,15 +1,12 @@
# coding: utf-8
import collections
import errno
import io
import itertools
import json
import os.path
import re
import struct
import traceback
import zlib
from .common import InfoExtractor, SearchInfoExtractor
from .subtitles import SubtitlesInfoExtractor
@@ -349,8 +346,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
def _extract_signature_function(self, video_id, player_url, slen):
id_m = re.match(
r'.*-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3)?\.(?P<ext>[a-z]+)$',
r'.*-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?\.(?P<ext>[a-z]+)$',
player_url)
if not id_m:
raise ExtractorError('Cannot identify player %r' % player_url)
player_type = id_m.group('ext')
player_id = id_m.group('id')
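To see what the extra `/html5player` alternative changes, the old and new patterns can be compared on two player URLs. Both URLs below are made up for illustration (the host and player ID are assumptions), and `identify_player` is a hypothetical helper that raises `ValueError` where the extractor raises `ExtractorError`:

```python
import re

OLD_RE = r'.*-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3)?\.(?P<ext>[a-z]+)$'
NEW_RE = r'.*-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?\.(?P<ext>[a-z]+)$'

# Made-up URLs: one in the older flat layout, one with the player ID as a directory.
OLD_STYLE = 'http://example.invalid/jsbin/html5player-vflABC123.js'
NEW_STYLE = 'http://example.invalid/jsbin/html5player-en_US-vflABC123/html5player.js'


def identify_player(player_url, pattern=NEW_RE):
    m = re.match(pattern, player_url)
    if not m:
        # Fail loudly instead of hitting AttributeError on m.group() later.
        raise ValueError('Cannot identify player %r' % player_url)
    return m.group('id'), m.group('ext')


new_on_new = identify_player(NEW_STYLE)
new_on_old = identify_player(OLD_STYLE)
old_on_new = re.match(OLD_RE, NEW_STYLE)
```

With the old pattern the directory-style URL does not match at all, so the added `if not id_m` guard turns what would have been an `AttributeError` into a readable error.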

@@ -114,13 +114,13 @@ class JSInterpreter(object):
obj = {}
obj_m = re.search(
(r'(?:var\s+)?%s\s*=\s*\{' % re.escape(objname)) +
r'\s*(?P<fields>([a-zA-Z$]+\s*:\s*function\(.*?\)\s*\{.*?\})*)' +
r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\})*)' +
r'\}\s*;',
self.code)
fields = obj_m.group('fields')
# Currently, it only supports function definitions
fields_m = re.finditer(
r'(?P<key>[a-zA-Z$]+)\s*:\s*function'
r'(?P<key>[a-zA-Z$0-9]+)\s*:\s*function'
r'\((?P<args>[a-z,]+)\){(?P<code>[^}]+)}',
fields)
for f in fields_m:
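The effect of allowing digits in field names can be checked on a small object body. The JavaScript snippet is a hypothetical sample, not taken from a real player:

```python
import re

OLD_KEY = r'(?P<key>[a-zA-Z$]+)\s*:\s*function\((?P<args>[a-z,]+)\){(?P<code>[^}]+)}'
NEW_KEY = r'(?P<key>[a-zA-Z$0-9]+)\s*:\s*function\((?P<args>[a-z,]+)\){(?P<code>[^}]+)}'

# Hypothetical object fields whose names contain digits.
FIELDS = 'a1:function(a,b){a.splice(0,b)},b2:function(a){a.reverse()}'

old_keys = [m.group('key') for m in re.finditer(OLD_KEY, FIELDS)]
new_keys = [m.group('key') for m in re.finditer(NEW_KEY, FIELDS)]
new_args = [m.group('args') for m in re.finditer(NEW_KEY, FIELDS)]
```

With the old character class no field is found, because the character directly before each `:` is a digit; the widened class picks up both.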

@@ -2,12 +2,12 @@ from __future__ import unicode_literals
import collections
import io
import struct
import zlib
from .utils import (
compat_str,
ExtractorError,
struct_unpack,
)
@@ -23,17 +23,17 @@ def _extract_tags(file_contents):
file_contents[:1])
# Determine number of bits in framesize rectangle
framesize_nbits = struct.unpack('!B', content[:1])[0] >> 3
framesize_nbits = struct_unpack('!B', content[:1])[0] >> 3
framesize_len = (5 + 4 * framesize_nbits + 7) // 8
pos = framesize_len + 2 + 2
while pos < len(content):
header16 = struct.unpack('<H', content[pos:pos + 2])[0]
header16 = struct_unpack('<H', content[pos:pos + 2])[0]
pos += 2
tag_code = header16 >> 6
tag_len = header16 & 0x3f
if tag_len == 0x3f:
tag_len = struct.unpack('<I', content[pos:pos + 4])[0]
tag_len = struct_unpack('<I', content[pos:pos + 4])[0]
pos += 4
assert pos + tag_len <= len(content), \
('Tag %d ends at %d+%d - that\'s longer than the file (%d)'
@@ -99,7 +99,7 @@ def _read_int(reader):
for _ in range(5):
buf = reader.read(1)
assert len(buf) == 1
b = struct.unpack('<B', buf)[0]
b = struct_unpack('<B', buf)[0]
res = res | ((b & 0x7f) << shift)
if b & 0x80 == 0:
break
@@ -111,7 +111,7 @@ def _u30(reader):
res = _read_int(reader)
assert res & 0xf0000000 == 0
return res
u32 = _read_int
_u32 = _read_int
def _s32(reader):
@@ -125,7 +125,7 @@ def _s24(reader):
bs = reader.read(3)
assert len(bs) == 3
last_byte = b'\xff' if (ord(bs[2:3]) >= 0x80) else b'\x00'
return struct.unpack('<i', bs + last_byte)[0]
return struct_unpack('<i', bs + last_byte)[0]
def _read_string(reader):
@@ -144,7 +144,7 @@ def _read_bytes(count, reader):
def _read_byte(reader):
resb = _read_bytes(1, reader=reader)
res = struct.unpack('<B', resb)[0]
res = struct_unpack('<B', resb)[0]
return res
@@ -470,8 +470,7 @@ class SWFInterpreter(object):
mname = self.multinames[index]
assert isinstance(obj, _AVMClass)
construct_method = self.extract_function(
obj, mname)
# We do not actually call the constructor for now;
# we just pretend it does nothing
stack.append(obj.make_object())
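The tag-header arithmetic in `_extract_tags` can be tried on hand-built bytes. This sketch uses the standard `struct.unpack` directly rather than the project's `struct_unpack` wrapper, and `read_tag_header` is a hypothetical helper; the tag codes and lengths are arbitrary test values:

```python
import struct


def read_tag_header(content, pos):
    # 16-bit little-endian header: upper 10 bits are the tag code,
    # lower 6 bits are the length, with 0x3f meaning "read a 32-bit length next".
    header16 = struct.unpack('<H', content[pos:pos + 2])[0]
    pos += 2
    tag_code = header16 >> 6
    tag_len = header16 & 0x3f
    if tag_len == 0x3f:
        tag_len = struct.unpack('<I', content[pos:pos + 4])[0]
        pos += 4
    return tag_code, tag_len, pos


# Short form: code 9 with a 3-byte body.
short_tag = struct.pack('<H', (9 << 6) | 3) + b'\x00' * 3
# Long form: code 82 with a 300-byte body, length stored in the following 4 bytes.
long_tag = struct.pack('<H', (82 << 6) | 0x3f) + struct.pack('<I', 300) + b'\x00' * 300

short_header = read_tag_header(short_tag, 0)
long_header = read_tag_header(long_tag, 0)
```

The returned position points at the first byte of the tag body, which is where the `pos + tag_len <= len(content)` assertion in the diff applies.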

@@ -91,11 +91,9 @@ except ImportError:
compat_subprocess_get_DEVNULL = lambda: open(os.path.devnull, 'w')
try:
from urllib.parse import parse_qs as compat_parse_qs
except ImportError: # Python 2
# HACK: The following is the correct parse_qs implementation from cpython 3's stdlib.
# Python 2's version is apparently totally broken
def _unquote(string, encoding='utf-8', errors='replace'):
from urllib.parse import unquote as compat_urllib_parse_unquote
except ImportError:
def compat_urllib_parse_unquote(string, encoding='utf-8', errors='replace'):
if string == '':
return string
res = string.split('%')
@@ -130,6 +128,13 @@ except ImportError: # Python 2
string += pct_sequence.decode(encoding, errors)
return string
try:
from urllib.parse import parse_qs as compat_parse_qs
except ImportError: # Python 2
# HACK: The following is the correct parse_qs implementation from cpython 3's stdlib.
# Python 2's version is apparently totally broken
def _parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
encoding='utf-8', errors='replace'):
qs, _coerce_result = qs, unicode
@@ -149,10 +154,12 @@ except ImportError: # Python 2
continue
if len(nv[1]) or keep_blank_values:
name = nv[0].replace('+', ' ')
name = _unquote(name, encoding=encoding, errors=errors)
name = compat_urllib_parse_unquote(
name, encoding=encoding, errors=errors)
name = _coerce_result(name)
value = nv[1].replace('+', ' ')
value = _unquote(value, encoding=encoding, errors=errors)
value = compat_urllib_parse_unquote(
value, encoding=encoding, errors=errors)
value = _coerce_result(value)
r.append((name, value))
return r
@@ -1193,13 +1200,6 @@ def format_bytes(bytes):
return u'%.2f%s' % (converted, suffix)
def str_to_int(int_str):
if int_str is None:
return None
int_str = re.sub(r'[,\.]', u'', int_str)
return int(int_str)
def get_term_width():
columns = os.environ.get('COLUMNS', None)
if columns:
@@ -1267,15 +1267,22 @@ class HEADRequest(compat_urllib_request.Request):
return "HEAD"
def int_or_none(v, scale=1, default=None, get_attr=None):
def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1):
if get_attr:
if v is not None:
v = getattr(v, get_attr, None)
return default if v is None else (int(v) // scale)
return default if v is None else (int(v) * invscale // scale)
def float_or_none(v, scale=1, default=None):
return default if v is None else (float(v) / scale)
def str_to_int(int_str):
if int_str is None:
return None
int_str = re.sub(r'[,\.]', u'', int_str)
return int(int_str)
def float_or_none(v, scale=1, invscale=1, default=None):
return default if v is None else (float(v) * invscale / scale)
def parse_duration(s):
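The new `invscale` parameter multiplies before `scale` divides. A sketch using local copies of the two helpers as they appear in this hunk; the inputs mirror the megabyte figures implied by the Snotr test data (94 MiB and 8.5 MiB), while the remaining values are arbitrary:

```python
def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1):
    # Copied from the diff: optional attribute lookup, then integer scaling.
    if get_attr:
        if v is not None:
            v = getattr(v, get_attr, None)
    return default if v is None else (int(v) * invscale // scale)


def float_or_none(v, scale=1, invscale=1, default=None):
    # Copied from the diff: float scaling, None passes through as default.
    return default if v is None else (float(v) * invscale / scale)


MIB = 1024 * 1024

big = float_or_none('94', invscale=MIB)     # megabytes -> bytes
small = float_or_none('8.5', invscale=MIB)  # fractional input needs the float variant
absent = float_or_none(None, invscale=MIB)
seconds = int_or_none('3', invscale=60)     # minutes -> seconds
whole = int_or_none('1500', scale=1000)     # integer division truncates
```

`int_or_none('8.5', ...)` would raise `ValueError`, since `int()` rejects a decimal string; that is presumably why the Snotr extractor uses `float_or_none` for the file size.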

@@ -1,2 +1,2 @@
__version__ = '2014.07.20.1'
__version__ = '2014.07.23.2'