Compare commits

...

44 Commits

Author SHA1 Message Date
Philipp Hagemeister
c30ae9594c release 2014.11.02.1 2014-11-02 10:28:21 +01:00
Philipp Hagemeister
ffae28ae18 release 2014.11.02 2014-11-02 09:45:51 +01:00
Sergey M․
d9116714f2 [cinemassacre] Fix extraction (Closes #4083) 2014-11-02 08:01:14 +07:00
Philipp Hagemeister
08965906a8 [README] Update FAQ on Ubuntu (#4078) 2014-11-01 19:24:56 +01:00
Sergey M․
5263cdfcf9 [generic] Improve MLB iframe regex 2014-11-01 04:01:58 +07:00
Sergey M․
b2a68d14cf [mlb] Improve _VALID_URL (Closes #4063) 2014-11-01 04:01:18 +07:00
Sergey M․
6e1cff9c33 [canalplus] Improve and merge with d8 extractor 2014-10-31 21:54:30 +07:00
Sergey M․
72975729c8 [canalplus] Tweak extractor to support piwiplus (Closes #4046) 2014-10-31 20:19:30 +07:00
Sergey M․
d319948b6a [funnyordie] Add articles URL test 2014-10-31 19:26:56 +07:00
Sergey M.
9a4bf889f9 Merge pull request #4069 from anovicecodemonkey/support_funnyordie_articles_urls
[FunnyOrDie] Add support for "/articles/" URLs
2014-10-31 18:25:22 +05:00
anovicecodemonkey
2a834bdb21 [FunnyOrDie] Add support for "/articles/" URLs 2014-10-31 21:20:37 +10:30
Philipp Hagemeister
0d2c141865 [youtube] Detect formats 298 et al as mp4 (Fixes #4066) 2014-10-31 11:13:02 +01:00
Philipp Hagemeister
5ec39d8b96 release 2014.10.30 2014-10-30 09:53:48 +01:00
Philipp Hagemeister
7b6de3728a [youtube] Add format 266 (Fixes #4055) 2014-10-30 09:53:43 +01:00
Philipp Hagemeister
a51d3aa001 [youtube] Add support for formats 302 and 303 (Fixes #4060) 2014-10-30 09:43:11 +01:00
Philipp Hagemeister
2c8e03d937 Sort formats by fps as well 2014-10-30 09:40:52 +01:00
Philipp Hagemeister
fbb21cf528 [youtube] Add formats 298, 299 (Fixes #4056) 2014-10-30 09:34:13 +01:00
Naglis Jonaitis
b8a618f898 [ro220] Fix broken extractor and modernize (#4054) 2014-10-30 01:42:52 +02:00
Philipp Hagemeister
feb74960eb release 2014.10.29 2014-10-29 23:29:42 +01:00
Jaime Marquínez Ferrándiz
d65d628613 [crunchycroll] Fix building of ass subtitles (reported in #4019)
Parse the xml document instead of using regexes, otherwise unicode characters are left unescaped.
2014-10-29 21:19:20 +01:00
Philipp Hagemeister
ac645ac7d0 [generic] Allow soundcloud embeds with additional attributes 2014-10-29 20:27:58 +01:00
Philipp Hagemeister
7d11297f3f Merge branch 'master' of github.com:rg3/youtube-dl 2014-10-29 20:10:07 +01:00
Philipp Hagemeister
6ad4013d40 [drtv] Allow fractional timestamps (Fixes #4059) 2014-10-29 20:10:00 +01:00
Sergey M․
dbd1283d31 [naver] Capture and output error message (#4057) 2014-10-29 21:50:37 +07:00
Sergey M․
c451d4f553 [trutube] Fix extraction 2014-10-29 21:16:10 +07:00
Jaime Marquínez Ferrándiz
8abec2c8bb [test_utils] Fix compat_getenv and compat_expanduser tests on python 3.x 2014-10-29 11:13:34 +01:00
Jaime Marquínez Ferrándiz
a9bad429b3 [niconico] Add extractor for playlists (closes #4043) 2014-10-29 11:04:48 +01:00
Philipp Hagemeister
50c8266ef0 Merge branch 'master' of github.com:rg3/youtube-dl 2014-10-28 23:40:44 +01:00
Philipp Hagemeister
00edd4f9be [laola1tv] Mark as broken
When the f4m downloader gets live stream support, I expect this to work magically or with very minor changes.
2014-10-28 17:29:27 +01:00
Philipp Hagemeister
ee966928af [f4m] Support bootstrap URLs 2014-10-28 17:27:41 +01:00
Philipp Hagemeister
e5193599ec [laola1tv] Add new extractor
The extractor works fine, but the f4m downloader cannot handle the resulting bootstrap information.
2014-10-28 16:51:34 +01:00
Philipp Hagemeister
01d663bca3 [auengine] Simplify 2014-10-28 15:51:15 +01:00
Sergey M․
e0c51cdadc [vk] Generalize errors 2014-10-28 21:35:25 +07:00
Sergey M․
9334f8f17a [vk] Handle deleted videos 2014-10-28 21:06:07 +07:00
Sergey M․
632256d9ec [wimp] Update video URL regex 2014-10-28 20:35:02 +07:00
Philipp Hagemeister
03df7baa6a Start documentation on how to embed youtube-dl 2014-10-28 12:54:39 +01:00
Philipp Hagemeister
3511266bc3 [YoutubeDL] Simplify API of YoutubeDL
Calling add_default_extractors twice should be harmless since the first set of extractors will match.
2014-10-28 12:54:29 +01:00
Philipp Hagemeister
9fdece5d34 [srmediathek] Choose variable name more wisely 2014-10-28 10:44:47 +01:00
Philipp Hagemeister
bbf1092ad0 [fktv] Remove unused import 2014-10-28 10:44:17 +01:00
Philipp Hagemeister
9ef55c5bbc [quickvid] Add new extractor 2014-10-28 10:41:37 +01:00
Philipp Hagemeister
48a24ab746 [generic] Fix HTML5 video regexp 2014-10-28 10:41:24 +01:00
Philipp Hagemeister
68acdbda9d Merge remote-tracking branch 'origin/master' 2014-10-28 09:13:23 +01:00
Philipp Hagemeister
27c542c06f [iconosquare] Simplify 2014-10-28 09:12:28 +01:00
Naglis Jonaitis
aaa399d2f6 [sphinx] Fix version import 2014-10-27 18:49:48 +02:00
31 changed files with 381 additions and 146 deletions

View File

@@ -381,7 +381,7 @@ Again, from then on you'll be able to update with `sudo youtube-dl -U`.
YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos.
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to report bugs to the Ubuntu packaging guys - all they have to do is update the package to a somewhat recent version. See above for a way to update.
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
### Do I always have to pass in `--max-quality FORMAT`, or `-citw`?
@@ -511,6 +511,20 @@ If you want to add support for a new site, you can follow this quick list (assum
In any case, thank you very much for your contributions!
# EMBEDDING YOUTUBE-DL
youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new).
From a Python program, you can embed youtube-dl in a more powerful fashion, like this:
import youtube_dl
ydl_opts = {}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
Most likely, you'll want to use various options. For a list of what can be done, have a look at [youtube_dl/YoutubeDL.py](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L69). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
# BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues> . Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email.

View File

@@ -44,8 +44,8 @@ copyright = u'2014, Ricardo Garcia Gonzalez'
# built documents.
#
# The short X.Y version.
import youtube_dl
version = youtube_dl.__version__
from youtube_dl.version import __version__
version = __version__
# The full version, including alpha/beta/rc tags.
release = version

View File

@@ -289,6 +289,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_iso8601('2014-03-23T23:04:26+0100'), 1395612266)
self.assertEqual(parse_iso8601('2014-03-23T22:04:26+0000'), 1395612266)
self.assertEqual(parse_iso8601('2014-03-23T22:04:26Z'), 1395612266)
self.assertEqual(parse_iso8601('2014-03-23T22:04:26.1234Z'), 1395612266)
def test_strip_jsonp(self):
stripped = strip_jsonp('cb ([ {"id":"532cb",\n\n\n"x":\n3}\n]\n);')
@@ -360,12 +361,14 @@ class TestUtil(unittest.TestCase):
def test_compat_getenv(self):
test_str = 'тест'
os.environ['YOUTUBE-DL-TEST'] = test_str.encode(get_filesystem_encoding())
os.environ['YOUTUBE-DL-TEST'] = (test_str if sys.version_info >= (3, 0)
else test_str.encode(get_filesystem_encoding()))
self.assertEqual(compat_getenv('YOUTUBE-DL-TEST'), test_str)
def test_compat_expanduser(self):
test_str = 'C:\Documents and Settings\тест\Application Data'
os.environ['HOME'] = test_str.encode(get_filesystem_encoding())
os.environ['HOME'] = (test_str if sys.version_info >= (3, 0)
else test_str.encode(get_filesystem_encoding()))
self.assertEqual(compat_expanduser('~'), test_str)
if __name__ == '__main__':

View File

@@ -189,7 +189,7 @@ class YoutubeDL(object):
_num_downloads = None
_screen_file = None
def __init__(self, params=None):
def __init__(self, params=None, auto_init=True):
"""Create a FileDownloader object with the given options."""
if params is None:
params = {}
@@ -246,6 +246,10 @@ class YoutubeDL(object):
self._setup_opener()
if auto_init:
self.print_debug_header()
self.add_default_info_extractors()
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
@@ -1207,6 +1211,8 @@ class YoutubeDL(object):
res += 'video@'
if fdict.get('vbr') is not None:
res += '%4dk' % fdict['vbr']
if fdict.get('fps') is not None:
res += ', %sfps' % fdict['fps']
if fdict.get('acodec') is not None:
if res:
res += ', '

View File

@@ -293,9 +293,6 @@ def _real_main(argv=None):
}
with YoutubeDL(ydl_opts) as ydl:
ydl.print_debug_header()
ydl.add_default_info_extractors()
# PostProcessors
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:

View File

@@ -244,9 +244,16 @@ class F4mFD(FileDownloader):
lambda f: int(f[0]) == requested_bitrate, formats))[0]
base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
bootstrap = base64.b64decode(doc.find(_add_ns('bootstrapInfo')).text)
bootstrap_node = doc.find(_add_ns('bootstrapInfo'))
if bootstrap_node.text is None:
bootstrap_url = compat_urlparse.urljoin(
base_url, bootstrap_node.attrib['url'])
bootstrap = self.ydl.urlopen(bootstrap_url).read()
else:
bootstrap = base64.b64decode(bootstrap_node.text)
metadata = base64.b64decode(media.find(_add_ns('metadata')).text)
boot_info = read_bootstrap_info(bootstrap)
fragments_list = build_fragments_list(boot_info)
if self.params.get('test', False):
# We only download the first fragment

View File

@@ -67,7 +67,6 @@ from .crunchyroll import (
CrunchyrollShowPlaylistIE
)
from .cspan import CSpanIE
from .d8 import D8IE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
@@ -189,6 +188,7 @@ from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
from .ku6 import Ku6IE
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lifenews import LifeNewsIE
from .liveleak import LiveLeakIE
from .livestream import (
@@ -251,7 +251,7 @@ from .newstube import NewstubeIE
from .nfb import NFBIE
from .nfl import NFLIE
from .nhl import NHLIE, NHLVideocenterIE
from .niconico import NiconicoIE
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninegag import NineGagIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
@@ -294,6 +294,7 @@ from .pornoxo import PornoXOIE
from .promptfile import PromptFileIE
from .prosiebensat1 import ProSiebenSat1IE
from .pyvideo import PyvideoIE
from .quickvid import QuickVidIE
from .radiofrance import RadioFranceIE
from .rai import RaiIE
from .rbmaradio import RBMARadioIE

View File

@@ -24,8 +24,7 @@ class AUEngineIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<title>(?P<title>.+?)</title>', webpage, 'title')

View File

@@ -7,15 +7,21 @@ from .common import InfoExtractor
from ..utils import (
unified_strdate,
url_basename,
qualities,
)
class CanalplusIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/cplus/%s'
IE_NAME = 'canalplus.fr'
IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
_VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s'
_SITE_ID_MAP = {
'canalplus.fr': 'cplus',
'piwiplus.fr': 'teletoon',
'd8.tv': 'd8',
}
_TEST = {
_TESTS = [{
'url': 'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
'md5': '3db39fb48b9685438ecf33a1078023e4',
'info_dict': {
@@ -25,36 +31,73 @@ class CanalplusIE(InfoExtractor):
'description': 'Le meilleur de toutes les chaînes, tous les jours.\nEmission du 26 août 2013',
'upload_date': '20130826',
},
}
}, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
'info_dict': {
'id': '1108190',
'ext': 'flv',
'title': 'Le labyrinthe - Boing super ranger',
'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff',
'upload_date': '20140724',
},
'skip': 'Only works from France',
}, {
'url': 'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html',
'info_dict': {
'id': '966289',
'ext': 'flv',
'title': 'Campagne intime - Documentaire exceptionnel',
'description': 'md5:d2643b799fb190846ae09c61e59a859f',
'upload_date': '20131108',
},
'skip': 'videos get deleted after a while',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id')
site_id = self._SITE_ID_MAP[mobj.group('site') or 'canal']
# Beware, some subclasses do not define an id group
display_id = url_basename(mobj.group('path'))
if video_id is None:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(r'<canal:player videoId="(\d+)"', webpage, 'video id')
video_id = self._search_regex(
r'<canal:player[^>]+?videoId="(\d+)"', webpage, 'video id')
info_url = self._VIDEO_INFO_TEMPLATE % video_id
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
doc = self._download_xml(info_url, video_id, 'Downloading video XML')
video_info = [video for video in doc if video.find('ID').text == video_id][0]
media = video_info.find('MEDIA')
infos = video_info.find('INFOS')
preferences = ['MOBILE', 'BAS_DEBIT', 'HAUT_DEBIT', 'HD', 'HLS', 'HDS']
preference = qualities(['MOBILE', 'BAS_DEBIT', 'HAUT_DEBIT', 'HD', 'HLS', 'HDS'])
formats = [
{
'url': fmt.text + '?hdcore=2.11.3' if fmt.tag == 'HDS' else fmt.text,
'format_id': fmt.tag,
'ext': 'mp4' if fmt.tag == 'HLS' else 'flv',
'preference': preferences.index(fmt.tag) if fmt.tag in preferences else -1,
} for fmt in media.find('VIDEOS') if fmt.text
]
formats = []
for fmt in media.find('VIDEOS'):
format_url = fmt.text
if not format_url:
continue
format_id = fmt.tag
if format_id == 'HLS':
hls_formats = self._extract_m3u8_formats(format_url, video_id, 'flv')
for fmt in hls_formats:
fmt['preference'] = preference(format_id)
formats.extend(hls_formats)
elif format_id == 'HDS':
hds_formats = self._extract_f4m_formats(format_url + '?hdcore=2.11.3', video_id)
for fmt in hds_formats:
fmt['preference'] = preference(format_id)
formats.extend(hds_formats)
else:
formats.append({
'url': format_url,
'format_id': format_id,
'preference': preference(format_id),
})
self._sort_formats(formats)
return {

View File

@@ -59,12 +59,9 @@ class CinemassacreIE(InfoExtractor):
vidurl = self._search_regex(
r'\'vidurl\'\s*:\s*"([^\']+)"', playerdata, 'vidurl').replace('\\/', '/')
vidid = self._search_regex(
r'\'vidid\'\s*:\s*"([^\']+)"', playerdata, 'vidid')
videoserver = self._html_search_regex(
r"'videoserver'\s*:\s*'([^']+)'", playerdata, 'videoserver')
videolist_url = 'http://%s/vod/smil:%s.smil/jwplayer.smil' % (videoserver, vidid)
videolist_url = self._search_regex(
r"file\s*:\s*'(http.+?/jwplayer\.smil)'", playerdata, 'jwplayer.smil')
videolist = self._download_xml(videolist_url, video_id, 'Downloading videolist XML')
formats = []

View File

@@ -72,6 +72,7 @@ class InfoExtractor(object):
* acodec Name of the audio codec in use
* asr Audio sampling rate in Hertz
* vbr Average video bitrate in KBit/s
* fps Frame rate
* vcodec Name of the video codec in use
* container Name of the container format
* filesize The number of bytes, if known in advance
@@ -618,6 +619,7 @@ class InfoExtractor(object):
f.get('vbr') if f.get('vbr') is not None else -1,
f.get('abr') if f.get('abr') is not None else -1,
audio_ext_preference,
f.get('fps') if f.get('fps') is not None else -1,
f.get('filesize') if f.get('filesize') is not None else -1,
f.get('filesize_approx') if f.get('filesize_approx') is not None else -1,
f.get('source_preference') if f.get('source_preference') is not None else -1,

View File

@@ -109,19 +109,17 @@ class CrunchyrollIE(SubtitlesInfoExtractor):
decrypted_data = intlist_to_bytes(aes_cbc_decrypt(data, key, iv))
return zlib.decompress(decrypted_data)
def _convert_subtitles_to_srt(self, subtitles):
def _convert_subtitles_to_srt(self, sub_root):
output = ''
for i, (start, end, text) in enumerate(re.findall(r'<event [^>]*?start="([^"]+)" [^>]*?end="([^"]+)" [^>]*?text="([^"]+)"[^>]*?>', subtitles), 1):
start = start.replace('.', ',')
end = end.replace('.', ',')
text = clean_html(text)
text = text.replace('\\N', '\n')
if not text:
continue
for i, event in enumerate(sub_root.findall('./events/event'), 1):
start = event.attrib['start'].replace('.', ',')
end = event.attrib['end'].replace('.', ',')
text = event.attrib['text'].replace('\\N', '\n')
output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
return output
def _convert_subtitles_to_ass(self, subtitles):
def _convert_subtitles_to_ass(self, sub_root):
output = ''
def ass_bool(strvalue):
@@ -130,10 +128,6 @@ class CrunchyrollIE(SubtitlesInfoExtractor):
assvalue = '-1'
return assvalue
sub_root = xml.etree.ElementTree.fromstring(subtitles)
if not sub_root:
return output
output = '[Script Info]\n'
output += 'Title: %s\n' % sub_root.attrib["title"]
output += 'ScriptType: v4.00+\n'
@@ -270,10 +264,13 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
lang_code = self._search_regex(r'lang_code=["\']([^"\']+)', subtitle, 'subtitle_lang_code', fatal=False)
if not lang_code:
continue
sub_root = xml.etree.ElementTree.fromstring(subtitle)
if not sub_root:
subtitles[lang_code] = ''
if sub_format == 'ass':
subtitles[lang_code] = self._convert_subtitles_to_ass(subtitle)
subtitles[lang_code] = self._convert_subtitles_to_ass(sub_root)
else:
subtitles[lang_code] = self._convert_subtitles_to_srt(subtitle)
subtitles[lang_code] = self._convert_subtitles_to_srt(sub_root)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, subtitles)

View File

@@ -1,25 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
from .canalplus import CanalplusIE
class D8IE(CanalplusIE):
_VALID_URL = r'https?://www\.d8\.tv/.*?/(?P<path>.*)'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/d8/%s'
IE_NAME = 'd8.tv'
_TEST = {
'url': 'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html',
'file': '966289.flv',
'info_dict': {
'title': 'Campagne intime - Documentaire exceptionnel',
'description': 'md5:d2643b799fb190846ae09c61e59a859f',
'upload_date': '20131108',
},
'params': {
# rtmp
'skip_download': True,
},
'skip': 'videos get deleted after a while',
}

View File

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .subtitles import SubtitlesInfoExtractor
from .common import ExtractorError
from ..utils import parse_iso8601
@@ -25,8 +23,7 @@ class DRTVIE(SubtitlesInfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
programcard = self._download_json(
'http://www.dr.dk/mu/programcard/expanded/%s' % video_id, video_id, 'Downloading video JSON')
@@ -35,7 +32,7 @@ class DRTVIE(SubtitlesInfoExtractor):
title = data['Title']
description = data['Description']
timestamp = parse_iso8601(data['CreatedTime'][:-5])
timestamp = parse_iso8601(data['CreatedTime'])
thumbnail = None
duration = None

View File

@@ -6,7 +6,6 @@ import json
from .common import InfoExtractor
from ..utils import (
determine_ext,
get_element_by_id,
clean_html,
)

View File

@@ -8,7 +8,7 @@ from ..utils import ExtractorError
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
_VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|articles|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
_TESTS = [{
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'md5': 'bcd81e0c4f26189ee09be362ad6e6ba9',
@@ -29,6 +29,9 @@ class FunnyOrDieIE(InfoExtractor):
'description': 'Please use this to sell something. www.jonlajoie.com',
'thumbnail': 're:^http:.*\.jpg$',
},
}, {
'url': 'http://www.funnyordie.com/articles/ebf5e34fc8/10-hours-of-walking-in-nyc-as-a-man',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -405,6 +405,18 @@ class GenericIE(InfoExtractor):
'expected_warnings': [
r'501.*Not Implemented'
],
},
# Soundcloud embed
{
'url': 'http://nakedsecurity.sophos.com/2014/10/29/sscc-171-are-you-sure-that-1234-is-a-bad-password-podcast/',
'info_dict': {
'id': '174391317',
'ext': 'mp3',
'description': 'md5:ff867d6b555488ad3c52572bb33d432c',
'uploader': 'Sophos Security',
'title': 'Chet Chat 171 - Oct 29, 2014',
'upload_date': '20141029',
}
}
]
@@ -838,7 +850,7 @@ class GenericIE(InfoExtractor):
# Look for embeded soundcloud player
mobj = re.search(
r'<iframe src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
@@ -875,7 +887,7 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'SBS')
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://m\.mlb\.com/shared/video/embed/embed\.html\?.+?)\1',
r'<iframe[^>]+?src=(["\'])(?P<url>https?://m(?:lb)?\.mlb\.com/shared/video/embed/embed\.html\?.+?)\1',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'MLB')
@@ -933,7 +945,7 @@ class GenericIE(InfoExtractor):
found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage))
if not found:
# HTML5 video
found = re.findall(r'(?s)<video[^<]*(?:>.*?<source[^>]+)? src="([^"]+)"', webpage)
found = re.findall(r'(?s)<video[^<]*(?:>.*?<source[^>]*)?\s+src="([^"]+)"', webpage)
if not found:
found = re.search(
r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'

View File

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -20,13 +18,11 @@ class IconosquareIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
html_title = self._html_search_regex(
r'<title>(.+?)</title>',
title = self._html_search_regex(
r'<title>(.+?)(?: *\(Videos?\))? \| (?:Iconosquare|Statigram)</title>',
webpage, 'title')
title = re.sub(r'(?: *\(Videos?\))? \| (?:Iconosquare|Statigram)$', '', html_title)
uploader_id = self._html_search_regex(
r'@([^ ]+)', title, 'uploader name', fatal=False)

View File

@@ -0,0 +1,77 @@
from __future__ import unicode_literals
import random
import re
from .common import InfoExtractor
class Laola1TvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/.*?/(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.laola1.tv/de-de/live/bwf-bitburger-open-grand-prix-gold-court-1/250019.html',
'info_dict': {
'id': '250019',
'ext': 'mp4',
'title': 'Bitburger Open Grand Prix Gold - Court 1',
'categories': ['Badminton'],
'uploader': 'BWF - Badminton World Federation',
'is_live': True,
},
'params': {
'skip_download': True,
}
}
_BROKEN = True # Not really - extractor works fine, but f4m downloader does not support live streams yet.
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
lang = mobj.group('lang')
portal = mobj.group('portal')
webpage = self._download_webpage(url, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]*?class="main_tv_player"[^>]*?src="([^"]+)"',
webpage, 'iframe URL')
iframe = self._download_webpage(
iframe_url, video_id, note='Downloading iframe')
flashvars_m = re.findall(
r'flashvars\.([_a-zA-Z0-9]+)\s*=\s*"([^"]*)";', iframe)
flashvars = dict((m[0], m[1]) for m in flashvars_m)
xml_url = ('http://www.laola1.tv/server/hd_video.php?' +
'play=%s&partner=1&portal=%s&v5ident=&lang=%s' % (
video_id, portal, lang))
hd_doc = self._download_xml(xml_url, video_id)
title = hd_doc.find('.//video/title').text
flash_url = hd_doc.find('.//video/url').text
categories = hd_doc.find('.//video/meta_sports').text.split(',')
uploader = hd_doc.find('.//video/meta_organistation').text
ident = random.randint(10000000, 99999999)
token_url = '%s&ident=%s&klub=0&unikey=0&timestamp=%s&auth=%s' % (
flash_url, ident, flashvars['timestamp'], flashvars['auth'])
token_doc = self._download_xml(
token_url, video_id, note='Downloading token')
token_attrib = token_doc.find('.//token').attrib
if token_attrib.get('auth') == 'blocked':
raise ExtractorError('Token error: ' % token_attrib.get('comment'))
video_url = '%s?hdnea=%s&hdcore=3.2.0' % (
token_attrib['url'], token_attrib['auth'])
return {
'id': video_id,
'is_live': True,
'title': title,
'url': video_url,
'uploader': uploader,
'categories': categories,
'ext': 'mp4',
}

View File

@@ -10,7 +10,7 @@ from ..utils import (
class MLBIE(InfoExtractor):
_VALID_URL = r'https?://m\.mlb\.com/(?:(?:.*?/)?video/(?:topic/[\da-z_-]+/)?v|shared/video/embed/embed\.html\?.*?\bcontent_id=)(?P<id>n?\d+)'
_VALID_URL = r'https?://m(?:lb)?\.mlb\.com/(?:(?:.*?/)?video/(?:topic/[\da-z_-]+/)?v|(?:shared/video/embed/embed\.html|[^/]+/video/play\.jsp)\?.*?\bcontent_id=)(?P<id>n?\d+)'
_TESTS = [
{
'url': 'http://m.mlb.com/sea/video/topic/51231442/v34698933/nymsea-ackley-robs-a-home-run-with-an-amazing-catch/?c_id=sea',
@@ -72,6 +72,14 @@ class MLBIE(InfoExtractor):
'url': 'http://m.mlb.com/shared/video/embed/embed.html?content_id=35692085&topic_id=6479266&width=400&height=224&property=mlb',
'only_matching': True,
},
{
'url': 'http://mlb.mlb.com/shared/video/embed/embed.html?content_id=36599553',
'only_matching': True,
},
{
'url': 'http://mlb.mlb.com/es/video/play.jsp?content_id=36599553',
'only_matching': True,
},
]
def _real_extract(self, url):

View File

@@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
ExtractorError,
clean_html,
)
@@ -31,6 +32,11 @@ class NaverIE(InfoExtractor):
m_id = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',
webpage)
if m_id is None:
m_error = re.search(
r'(?s)<div class="nation_error">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
webpage)
if m_error:
raise ExtractorError(clean_html(m_error.group('msg')), expected=True)
raise ExtractorError('couldn\'t extract vid and key')
vid = m_id.group(1)
key = m_id.group(2)

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
@@ -146,3 +147,36 @@ class NiconicoIE(InfoExtractor):
'duration': duration,
'webpage_url': webpage_url,
}
class NiconicoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://www\.nicovideo\.jp/mylist/(?P<id>\d+)'
_TEST = {
'url': 'http://www.nicovideo.jp/mylist/27411728',
'info_dict': {
'id': '27411728',
'title': 'AKB48のオールナイトニッポン',
},
'playlist_mincount': 225,
}
def _real_extract(self, url):
list_id = self._match_id(url)
webpage = self._download_webpage(url, list_id)
entries_json = self._search_regex(r'Mylist\.preload\(\d+, (\[.*\])\);',
webpage, 'entries')
entries = json.loads(entries_json)
entries = [{
'_type': 'url',
'ie_key': NiconicoIE.ie_key(),
'url': 'http://www.nicovideo.jp/watch/%s' % entry['item_id'],
} for entry in entries]
return {
'_type': 'playlist',
'title': self._search_regex(r'\s+name: "(.*?)"', webpage, 'title'),
'id': list_id,
'entries': entries,
}

View File

@@ -0,0 +1,51 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
determine_ext,
int_or_none,
)
class QuickVidIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?quickvid\.org/watch\.php\?v=(?P<id>[a-zA-Z_0-9-]+)'
_TEST = {
'url': 'http://quickvid.org/watch.php?v=sUQT3RCG8dx',
'md5': 'c0c72dd473f260c06c808a05d19acdc5',
'info_dict': {
'id': 'sUQT3RCG8dx',
'ext': 'mp4',
'title': 'Nick Offerman\'s Summer Reading Recap',
'thumbnail': 're:^https?://.*\.(?:png|jpg|gif)$',
'view_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h2>(.*?)</h2>', webpage, 'title')
view_count = int_or_none(self._html_search_regex(
r'(?s)<div id="views">(.*?)</div>',
webpage, 'view count', fatal=False))
video_code = self._search_regex(
r'(?s)<video id="video"[^>]*>(.*?)</video>', webpage, 'video code')
formats = [
{
'url': compat_urlparse.urljoin(url, src),
'format_id': determine_ext(src, None),
} for src in re.findall('<source\s+src="([^"]+)"', video_code)
]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._og_search_thumbnail(webpage),
'view_count': view_count,
}

View File

@@ -1,43 +1,43 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
compat_parse_qs,
)
from ..utils import compat_urllib_parse_unquote
class Ro220IE(InfoExtractor):
IE_NAME = '220.ro'
_VALID_URL = r'(?x)(?:https?://)?(?:www\.)?220\.ro/(?P<category>[^/]+)/(?P<shorttitle>[^/]+)/(?P<video_id>[^/]+)'
_VALID_URL = r'(?x)(?:https?://)?(?:www\.)?220\.ro/(?P<category>[^/]+)/(?P<shorttitle>[^/]+)/(?P<id>[^/]+)'
_TEST = {
"url": "http://www.220.ro/sport/Luati-Le-Banii-Sez-4-Ep-1/LYV6doKo7f/",
'file': 'LYV6doKo7f.mp4',
'url': 'http://www.220.ro/sport/Luati-Le-Banii-Sez-4-Ep-1/LYV6doKo7f/',
'md5': '03af18b73a07b4088753930db7a34add',
'info_dict': {
"title": "Luati-le Banii sez 4 ep 1",
"description": "re:^Iata-ne reveniti dupa o binemeritata vacanta\. +Va astept si pe Facebook cu pareri si comentarii.$",
'id': 'LYV6doKo7f',
'ext': 'mp4',
'title': 'Luati-le Banii sez 4 ep 1',
'description': 're:^Iata-ne reveniti dupa o binemeritata vacanta\. +Va astept si pe Facebook cu pareri si comentarii.$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flashVars_str = self._search_regex(
r'<param name="flashVars" value="([^"]+)"',
webpage, 'flashVars')
flashVars = compat_parse_qs(flashVars_str)
url = compat_urllib_parse_unquote(self._search_regex(
r'(?s)clip\s*:\s*{.*?url\s*:\s*\'([^\']+)\'', webpage, 'url'))
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
formats = [{
'format_id': 'sd',
'url': url,
'ext': 'mp4',
}]
return {
'_type': 'video',
'id': video_id,
'ext': 'mp4',
'url': flashVars['videoURL'][0],
'title': flashVars['title'][0],
'description': clean_html(flashVars['desc'][0]),
'thumbnail': flashVars['preview'][0],
'formats': formats,
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -26,9 +26,9 @@ class SRMediathekIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
urls = json.loads(js_to_json(self._search_regex(
murls = json.loads(js_to_json(self._search_regex(
r'var mediaURLs\s*=\s*(.*?);\n', webpage, 'video URLs')))
formats = [{'url': url} for url in urls]
formats = [{'url': murl} for murl in murls]
self._sort_formats(formats)
title = json.loads(js_to_json(self._search_regex(

View File

@@ -1,13 +1,12 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import xpath_text
class TruTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?trutube\.tv/video/(?P<id>[0-9]+)/.*'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?trutube\.tv/(?:video/|nuevo/player/embed\.php\?v=)(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://trutube.tv/video/14880/Ramses-II-Proven-To-Be-A-Red-Headed-Caucasoid-',
'md5': 'c5b6e301b0a2040b074746cbeaa26ca1',
'info_dict': {
@@ -16,29 +15,26 @@ class TruTubeIE(InfoExtractor):
'title': 'Ramses II - Proven To Be A Red Headed Caucasoid',
'thumbnail': 're:^http:.*\.jpg$',
}
}
}, {
'url': 'https://trutube.tv/nuevo/player/embed.php?v=14880',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage).strip()
thumbnail = self._search_regex(
r"var splash_img = '([^']+)';", webpage, 'thumbnail', fatal=False)
config = self._download_xml(
'https://trutube.tv/nuevo/player/config.php?v=%s' % video_id,
video_id, transform_source=lambda s: s.strip())
all_formats = re.finditer(
r"var (?P<key>[a-z]+)_video_file\s*=\s*'(?P<url>[^']+)';", webpage)
formats = [{
'format_id': m.group('key'),
'quality': -i,
'url': m.group('url'),
} for i, m in enumerate(all_formats)]
self._sort_formats(formats)
# filehd is always 404
video_url = xpath_text(config, './file', 'video URL', fatal=True)
title = xpath_text(config, './title', 'title')
thumbnail = xpath_text(config, './image', ' thumbnail')
return {
'id': video_id,
'title': video_title,
'formats': formats,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
}

View File

@@ -138,9 +138,19 @@ class VKIE(InfoExtractor):
info_url = 'http://vk.com/al_video.php?act=show&al=1&video=%s' % video_id
info_page = self._download_webpage(info_url, video_id)
if re.search(r'<!>Please log in or <', info_page):
raise ExtractorError('This video is only available for registered users, '
'use --username and --password options to provide account credentials.', expected=True)
ERRORS = {
r'>Видеозапись .*? была изъята из публичного доступа в связи с обращением правообладателя.<':
'Video %s has been removed from public access due to rightholder complaint.',
r'<!>Please log in or <':
'Video %s is only available for registered users, '
'use --username and --password options to provide account credentials.',
'<!>Unknown error':
'Video %s does not exist.'
}
for error_re, error_msg in ERRORS.items():
if re.search(error_re, info_page):
raise ExtractorError(error_msg % video_id, expected=True)
m_yt = re.search(r'src="(http://www.youtube.com/.*?)"', info_page)
if m_yt is not None:

View File

@@ -37,7 +37,7 @@ class WimpIE(InfoExtractor):
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r's1\.addVariable\("file",\s*"([^"]+)"\);', webpage, 'video URL')
r"'file'\s*:\s*'([^']+)'", webpage, 'video URL')
if YoutubeIE.suitable(video_url):
self.to_screen('Found YouTube video')
return {

View File

@@ -274,6 +274,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'138': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'160': {'ext': 'mp4', 'height': 144, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'264': {'ext': 'mp4', 'height': 1440, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'298': {'ext': 'mp4', 'height': 720, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'h264'},
'299': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'h264'},
'266': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'vcodec': 'h264'},
# Dash mp4 audio
'139': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 48, 'preference': -50},
@@ -297,6 +300,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'248': {'ext': 'webm', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'271': {'ext': 'webm', 'height': 1440, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'272': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'302': {'ext': 'webm', 'height': 720, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'VP9'},
'303': {'ext': 'webm', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'VP9'},
# Dash webm audio
'171': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'abr': 128, 'preference': -50},

View File

@@ -925,7 +925,7 @@ def parse_iso8601(date_str, delimiter='T'):
return None
m = re.search(
r'Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$',
r'(\.[0-9]+)?(?:Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
date_str)
if not m:
timezone = datetime.timedelta()
@@ -938,7 +938,7 @@ def parse_iso8601(date_str, delimiter='T'):
timezone = datetime.timedelta(
hours=sign * int(m.group('hours')),
minutes=sign * int(m.group('minutes')))
date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
dt = datetime.datetime.strptime(date_str, date_format) - timezone
return calendar.timegm(dt.timetuple())

View File

@@ -1,2 +1,2 @@
__version__ = '2014.10.27'
__version__ = '2014.11.02.1'