Compare commits

..

25 Commits

Author SHA1 Message Date
Sergey M․
16393d6535 release 2017.08.13 2017-08-13 08:58:30 +07:00
Sergey M․
4f049e4aa8 [ChangeLog] Actualize 2017-08-13 08:00:15 +07:00
Sergey M․
475bcb225f [pornhub:playlistbase] Skip videos from drop-down menu for all playlists (closes #12819, closes #13902) 2017-08-13 07:53:02 +07:00
Sergey M․
b3c6515365 [fourtube] Add support for other sites (closes #6022, closes #7859, closes #13901) 2017-08-13 07:23:29 +07:00
Sergey M․
eb02940cc7 [generic] Add test for #13895 2017-08-13 01:11:27 +07:00
Sergey M․
4ef9152428 [limelight] Improve embeds detection (closes #13895) 2017-08-13 00:58:39 +07:00
Sergey M․
0c43a481b9 [reddit] Add extractors (closes #13847) 2017-08-12 23:43:51 +07:00
Sergey M․
868f79db41 [extractor/common] Fix _media_formats 2017-08-12 19:24:26 +07:00
Sergey M․
70851a95c3 [aparat] Extract all formats (closes #13887) 2017-08-12 17:18:23 +07:00
Sergey M․
e74e3b63e3 [YoutubeDL] Make sure format id is not empty 2017-08-12 17:14:11 +07:00
Sergey M․
ac8491fcca [extractor/common] Make _family_friendly_search optional 2017-08-12 17:11:35 +07:00
Sergey M․
82889d4ae5 [extractor/common] Respect source's type attribute for HTML5 media (closes #13892) 2017-08-12 16:48:11 +07:00
Sergey M․
92a5c41532 [mixcloud] Fix play info decryption (closes #13885) 2017-08-12 16:30:50 +07:00
Sergey M․
1663bd6e1c [generic] Replace vzaar embed test 2017-08-11 22:02:00 +07:00
tetra-eder
41918eaa5c [generic] Add support for vzaar embeds 2017-08-11 22:00:39 +07:00
Sergey M․
6ed99754bb release 2017.08.09 2017-08-09 23:52:22 +07:00
Sergey M․
0e7dfa7d16 [ChangeLog] Actualize 2017-08-09 23:49:53 +07:00
Sergey M․
baba5f4d1d [xxxymovies] Fix title extraction (closes #13868) 2017-08-09 23:46:49 +07:00
Sergey M․
dee04d24a4 [nick] Add support for nick.com.pl (closes #13860) 2017-08-09 23:12:02 +07:00
Sergey M․
5b3ddadcc3 [mixcloud] Fix play info decryption (closes #13867) 2017-08-09 22:55:13 +07:00
Sergey M․
5b232f46dc [utils] Skip missing params in cli_bool_option (closes #13865) 2017-08-09 22:28:19 +07:00
Alex Seiler
4bf22f7a10 [20min] Fix embeds extraction 2017-08-08 05:41:38 +07:00
Sergey M․
15d1e8a23d [dplayit] Fix extraction (closes #13851) 2017-08-07 22:43:42 +07:00
Yen Chi Hsuan
ee6a611665 [niconico] Support videos with multiple formats (closes #13522) 2017-08-07 00:19:46 +08:00
Yen Chi Hsuan
463e7216c8 [niconico] Support HTML5-only videos (closes #13806) 2017-08-06 23:07:28 +08:00
22 changed files with 703 additions and 170 deletions

View File

@@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.08.06*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.08.13*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.08.06** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.08.13**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.08.06 [debug] youtube-dl version 2017.08.13
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@@ -1,3 +1,37 @@
version 2017.08.13
Core
* [YoutubeDL] Make sure format id is not empty
* [extractor/common] Make _family_friendly_search optional
* [extractor/common] Respect source's type attribute for HTML5 media (#13892)
Extractors
* [pornhub:playlistbase] Skip videos from drop-down menu (#12819, #13902)
+ [fourtube] Add support pornerbros.com (#6022)
+ [fourtube] Add support porntube.com (#7859, #13901)
+ [fourtube] Add support fux.com
* [limelight] Improve embeds detection (#13895)
+ [reddit] Add support for v.redd.it and reddit.com (#13847)
* [aparat] Extract all formats (#13887)
* [mixcloud] Fix play info decryption (#13885)
+ [generic] Add support for vzaar embeds (#13876)
version 2017.08.09
Core
* [utils] Skip missing params in cli_bool_option (#13865)
Extractors
* [xxxymovies] Fix title extraction (#13868)
+ [nick] Add support for nick.com.pl (#13860)
* [mixcloud] Fix play info decryption (#13867)
* [20min] Fix embeds extraction (#13852)
* [dplayit] Fix extraction (#13851)
+ [niconico] Support videos with multiple formats (#13522)
+ [niconico] Support HTML5-only videos (#13806)
version 2017.08.06 version 2017.08.06
Core Core

View File

@@ -294,6 +294,7 @@
- **Funimation** - **Funimation**
- **FunnyOrDie** - **FunnyOrDie**
- **Fusion** - **Fusion**
- **Fux**
- **FXNetworks** - **FXNetworks**
- **GameInformer** - **GameInformer**
- **GameOne** - **GameOne**
@@ -621,6 +622,7 @@
- **PolskieRadio** - **PolskieRadio**
- **PolskieRadioCategory** - **PolskieRadioCategory**
- **PornCom** - **PornCom**
- **PornerBros**
- **PornFlip** - **PornFlip**
- **PornHd** - **PornHd**
- **PornHub**: PornHub and Thumbzilla - **PornHub**: PornHub and Thumbzilla
@@ -629,6 +631,7 @@
- **Pornotube** - **Pornotube**
- **PornoVoisines** - **PornoVoisines**
- **PornoXO** - **PornoXO**
- **PornTube**
- **PressTV** - **PressTV**
- **PrimeShareTV** - **PrimeShareTV**
- **PromptFile** - **PromptFile**
@@ -654,6 +657,8 @@
- **RBMARadio** - **RBMARadio**
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedBullTV** - **RedBullTV**
- **Reddit**
- **RedditR**
- **RedTube** - **RedTube**
- **RegioTV** - **RegioTV**
- **RENTV** - **RENTV**

View File

@@ -1182,6 +1182,10 @@ part 3</font></u>
cli_bool_option( cli_bool_option(
{'nocheckcertificate': False}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='), {'nocheckcertificate': False}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
['--check-certificate=true']) ['--check-certificate=true'])
self.assertEqual(
cli_bool_option(
{}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
[])
def test_ohdave_rsa_encrypt(self): def test_ohdave_rsa_encrypt(self):
N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd

View File

@@ -1500,7 +1500,7 @@ class YoutubeDL(object):
sanitize_string_field(format, 'format_id') sanitize_string_field(format, 'format_id')
sanitize_numeric_fields(format) sanitize_numeric_fields(format)
format['url'] = sanitize_url(format['url']) format['url'] = sanitize_url(format['url'])
if format.get('format_id') is None: if not format.get('format_id'):
format['format_id'] = compat_str(i) format['format_id'] = compat_str(i)
else: else:
# Sanitize format_id from characters used in format selector expression # Sanitize format_id from characters used in format selector expression

View File

@@ -3,13 +3,13 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, int_or_none,
HEADRequest, mimetype2ext,
) )
class AparatIE(InfoExtractor): class AparatIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.aparat.com/v/wP8On', 'url': 'http://www.aparat.com/v/wP8On',
@@ -29,30 +29,41 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at # Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id # http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work # but the URL in there does not work
embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id webpage = self._download_webpage(
webpage = self._download_webpage(embed_url, video_id) 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
file_list = self._parse_json(self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
for i, item in enumerate(file_list[0]):
video_url = item['file']
req = HEADRequest(video_url)
res = self._request_webpage(
req, video_id, note='Testing video URL %d' % i, errnote=False)
if res:
break
else:
raise ExtractorError('No working video URLs found')
title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title') title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title')
file_list = self._parse_json(
self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage,
'file list'),
video_id)
formats = []
for item in file_list[0]:
file_url = item.get('file')
if not file_url:
continue
ext = mimetype2ext(item.get('type'))
label = item.get('label')
formats.append({
'url': file_url,
'ext': ext,
'format_id': label or ext,
'height': int_or_none(self._search_regex(
r'(\d+)[pP]', label or '', 'height', default=None)),
})
self._sort_formats(formats)
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False) r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'url': video_url,
'ext': 'mp4',
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'age_limit': self._family_friendly_search(webpage), 'age_limit': self._family_friendly_search(webpage),
'formats': formats,
} }

View File

@@ -940,7 +940,8 @@ class InfoExtractor(object):
def _family_friendly_search(self, html): def _family_friendly_search(self, html):
# See http://schema.org/VideoObject # See http://schema.org/VideoObject
family_friendly = self._html_search_meta('isFamilyFriendly', html) family_friendly = self._html_search_meta(
'isFamilyFriendly', html, default=None)
if not family_friendly: if not family_friendly:
return None return None
@@ -2114,9 +2115,9 @@ class InfoExtractor(object):
return f return f
return {} return {}
def _media_formats(src, cur_media_type): def _media_formats(src, cur_media_type, type_info={}):
full_url = absolute_url(src) full_url = absolute_url(src)
ext = determine_ext(full_url) ext = type_info.get('ext') or determine_ext(full_url)
if ext == 'm3u8': if ext == 'm3u8':
is_plain_url = False is_plain_url = False
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
@@ -2165,9 +2166,9 @@ class InfoExtractor(object):
src = source_attributes.get('src') src = source_attributes.get('src')
if not src: if not src:
continue continue
is_plain_url, formats = _media_formats(src, media_type) f = parse_content_type(source_attributes.get('type'))
is_plain_url, formats = _media_formats(src, media_type, f)
if is_plain_url: if is_plain_url:
f = parse_content_type(source_attributes.get('type'))
f.update(formats[0]) f.update(formats[0])
media_info['formats'].append(f) media_info['formats'].append(f)
else: else:

View File

@@ -7,16 +7,18 @@ import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urlparse,
compat_HTTPError, compat_HTTPError,
compat_str,
compat_urlparse,
) )
from ..utils import ( from ..utils import (
USER_AGENTS,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
unified_strdate,
remove_end, remove_end,
try_get,
unified_strdate,
update_url_query, update_url_query,
USER_AGENTS,
) )
@@ -183,28 +185,44 @@ class DPlayItIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
info_url = self._search_regex(
r'url\s*[:=]\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
webpage, 'video id')
title = remove_end(self._og_search_title(webpage), ' | Dplay') title = remove_end(self._og_search_title(webpage), ' | Dplay')
try: video_id = None
info = self._download_json(
info_url, display_id, headers={ info = self._search_regex(
'Authorization': 'Bearer %s' % self._get_cookies(url).get( r'playback_json\s*:\s*JSON\.parse\s*\(\s*("(?:\\.|[^"\\])+?")',
'dplayit_token').value, webpage, 'playback JSON', default=None)
'Referer': url, if info:
}) for _ in range(2):
except ExtractorError as e: info = self._parse_json(info, display_id, fatal=False)
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403): if not info:
info = self._parse_json(e.cause.read().decode('utf-8'), display_id) break
error = info['errors'][0] else:
if error.get('code') == 'access.denied.geoblocked': video_id = try_get(info, lambda x: x['data']['id'])
self.raise_geo_restricted(
msg=error.get('detail'), countries=self._GEO_COUNTRIES) if not info:
raise ExtractorError(info['errors'][0]['detail'], expected=True) info_url = self._search_regex(
raise r'url\s*[:=]\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
webpage, 'info url')
video_id = info_url.rpartition('/')[-1]
try:
info = self._download_json(
info_url, display_id, headers={
'Authorization': 'Bearer %s' % self._get_cookies(url).get(
'dplayit_token').value,
'Referer': url,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
if error.get('code') == 'access.denied.geoblocked':
self.raise_geo_restricted(
msg=error.get('detail'), countries=self._GEO_COUNTRIES)
raise ExtractorError(info['errors'][0]['detail'], expected=True)
raise
hls_url = info['data']['attributes']['streaming']['hls']['url'] hls_url = info['data']['attributes']['streaming']['hls']['url']
@@ -230,7 +248,7 @@ class DPlayItIE(InfoExtractor):
season_number = episode_number = upload_date = None season_number = episode_number = upload_date = None
return { return {
'id': info_url.rpartition('/')[-1], 'id': compat_str(video_id or display_id),
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),

View File

@@ -350,7 +350,12 @@ from .flipagram import FlipagramIE
from .folketinget import FolketingetIE from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE from .footyroom import FootyRoomIE
from .formula1 import Formula1IE from .formula1 import Formula1IE
from .fourtube import FourTubeIE from .fourtube import (
FourTubeIE,
PornTubeIE,
PornerBrosIE,
FuxIE,
)
from .fox import FOXIE from .fox import FOXIE
from .fox9 import FOX9IE from .fox9 import FOX9IE
from .foxgay import FoxgayIE from .foxgay import FoxgayIE
@@ -840,6 +845,10 @@ from .rai import (
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
from .rds import RDSIE from .rds import RDSIE
from .redbulltv import RedBullTVIE from .redbulltv import RedBullTVIE
from .reddit import (
RedditIE,
RedditRIE,
)
from .redtube import RedTubeIE from .redtube import RedTubeIE
from .regiotv import RegioTVIE from .regiotv import RegioTVIE
from .rentv import ( from .rentv import (

View File

@@ -3,39 +3,22 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
sanitized_Request,
str_to_int, str_to_int,
) )
class FourTubeIE(InfoExtractor): class FourTubeBaseIE(InfoExtractor):
IE_NAME = '4tube'
_VALID_URL = r'https?://(?:www\.)?4tube\.com/videos/(?P<id>\d+)'
_TEST = {
'url': 'http://www.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
'md5': '6516c8ac63b03de06bc8eac14362db4f',
'info_dict': {
'id': '209733',
'ext': 'mp4',
'title': 'Hot Babe Holly Michaels gets her ass stuffed by black',
'uploader': 'WCP Club',
'uploader_id': 'wcp-club',
'upload_date': '20131031',
'timestamp': 1383263892,
'duration': 583,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
}
}
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
kind, video_id, display_id = mobj.group('kind', 'id', 'display_id')
if kind == 'm' or not display_id:
url = self._URL_TEMPLATE % video_id
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = self._html_search_meta('name', webpage) title = self._html_search_meta('name', webpage)
@@ -43,10 +26,10 @@ class FourTubeIE(InfoExtractor):
'uploadDate', webpage)) 'uploadDate', webpage))
thumbnail = self._html_search_meta('thumbnailUrl', webpage) thumbnail = self._html_search_meta('thumbnailUrl', webpage)
uploader_id = self._html_search_regex( uploader_id = self._html_search_regex(
r'<a class="item-to-subscribe" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">', r'<a class="item-to-subscribe" href="[^"]+/(?:channel|user)s?/([^/"]+)" title="Go to [^"]+ page">',
webpage, 'uploader id', fatal=False) webpage, 'uploader id', fatal=False)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'<a class="item-to-subscribe" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">', r'<a class="item-to-subscribe" href="[^"]+/(?:channel|user)s?/[^/"]+" title="Go to ([^"]+) page">',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
categories_html = self._search_regex( categories_html = self._search_regex(
@@ -60,10 +43,10 @@ class FourTubeIE(InfoExtractor):
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([0-9,]+)">', r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([0-9,]+)">',
webpage, 'view count', fatal=False)) webpage, 'view count', default=None))
like_count = str_to_int(self._search_regex( like_count = str_to_int(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserLikes:([0-9,]+)">', r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserLikes:([0-9,]+)">',
webpage, 'like count', fatal=False)) webpage, 'like count', default=None))
duration = parse_duration(self._html_search_meta('duration', webpage)) duration = parse_duration(self._html_search_meta('duration', webpage))
media_id = self._search_regex( media_id = self._search_regex(
@@ -87,12 +70,12 @@ class FourTubeIE(InfoExtractor):
token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format( token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format(
media_id, '+'.join(sources)) media_id, '+'.join(sources))
headers = {
b'Content-Type': b'application/x-www-form-urlencoded', parsed_url = compat_urlparse.urlparse(url)
b'Origin': b'https://www.4tube.com', tokens = self._download_json(token_url, video_id, data=b'', headers={
} 'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
token_req = sanitized_Request(token_url, b'{}', headers) 'Referer': url,
tokens = self._download_json(token_req, video_id) })
formats = [{ formats = [{
'url': tokens[format]['token'], 'url': tokens[format]['token'],
'format_id': format + 'p', 'format_id': format + 'p',
@@ -115,3 +98,126 @@ class FourTubeIE(InfoExtractor):
'duration': duration, 'duration': duration,
'age_limit': 18, 'age_limit': 18,
} }
class FourTubeIE(FourTubeBaseIE):
IE_NAME = '4tube'
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?4tube\.com/(?:videos|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
_URL_TEMPLATE = 'https://www.4tube.com/videos/%s/video'
_TESTS = [{
'url': 'http://www.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
'md5': '6516c8ac63b03de06bc8eac14362db4f',
'info_dict': {
'id': '209733',
'ext': 'mp4',
'title': 'Hot Babe Holly Michaels gets her ass stuffed by black',
'uploader': 'WCP Club',
'uploader_id': 'wcp-club',
'upload_date': '20131031',
'timestamp': 1383263892,
'duration': 583,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
},
}, {
'url': 'http://www.4tube.com/embed/209733',
'only_matching': True,
}, {
'url': 'http://m.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
'only_matching': True,
}]
class FuxIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?fux\.com/(?:video|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
_URL_TEMPLATE = 'https://www.fux.com/video/%s/video'
_TESTS = [{
'url': 'https://www.fux.com/video/195359/awesome-fucking-kitchen-ends-cum-swallow',
'info_dict': {
'id': '195359',
'ext': 'mp4',
'title': 'Awesome fucking in the kitchen ends with cum swallow',
'uploader': 'alenci2342',
'uploader_id': 'alenci2342',
'upload_date': '20131230',
'timestamp': 1388361660,
'duration': 289,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.fux.com/embed/195359',
'only_matching': True,
}, {
'url': 'https://www.fux.com/video/195359/awesome-fucking-kitchen-ends-cum-swallow',
'only_matching': True,
}]
class PornTubeIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?porntube\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
_URL_TEMPLATE = 'https://www.porntube.com/videos/video_%s'
_TESTS = [{
'url': 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759',
'info_dict': {
'id': '7089759',
'ext': 'mp4',
'title': 'Teen couple doing anal',
'uploader': 'Alexy',
'uploader_id': 'Alexy',
'upload_date': '20150606',
'timestamp': 1433595647,
'duration': 5052,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.porntube.com/embed/7089759',
'only_matching': True,
}, {
'url': 'https://m.porntube.com/videos/teen-couple-doing-anal_7089759',
'only_matching': True,
}]
class PornerBrosIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?pornerbros\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
_URL_TEMPLATE = 'https://www.pornerbros.com/videos/video_%s'
_TESTS = [{
'url': 'https://www.pornerbros.com/videos/skinny-brunette-takes-big-cock-down-her-anal-hole_181369',
'md5': '6516c8ac63b03de06bc8eac14362db4f',
'info_dict': {
'id': '181369',
'ext': 'mp4',
'title': 'Skinny brunette takes big cock down her anal hole',
'uploader': 'PornerBros HD',
'uploader_id': 'pornerbros-hd',
'upload_date': '20130130',
'timestamp': 1359527401,
'duration': 1224,
'view_count': int,
'categories': list,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.pornerbros.com/embed/181369',
'only_matching': True,
}, {
'url': 'https://m.pornerbros.com/videos/skinny-brunette-takes-big-cock-down-her-anal-hole_181369',
'only_matching': True,
}]

View File

@@ -98,6 +98,7 @@ from .wistia import WistiaIE
from .mediaset import MediasetIE from .mediaset import MediasetIE
from .joj import JojIE from .joj import JojIE
from .megaphone import MegaphoneIE from .megaphone import MegaphoneIE
from .vzaar import VzaarIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@@ -1784,6 +1785,21 @@ class GenericIE(InfoExtractor):
}, },
'playlist_mincount': 5, 'playlist_mincount': 5,
}, },
{
# Limelight embed (LimelightPlayerUtil.embed)
'url': 'https://tv5.ca/videos?v=xuu8qowr291ri',
'info_dict': {
'id': '95d035dc5c8a401588e9c0e6bd1e9c92',
'ext': 'mp4',
'title': '07448641',
'timestamp': 1499890639,
'upload_date': '20170712',
},
'params': {
'skip_download': True,
},
'add_ie': ['LimelightMedia'],
},
{ {
'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/', 'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/',
'info_dict': { 'info_dict': {
@@ -1840,6 +1856,16 @@ class GenericIE(InfoExtractor):
'title': 'Стас Намин: «Мы нарушили девственность Кремля»', 'title': 'Стас Намин: «Мы нарушили девственность Кремля»',
}, },
}, },
{
# vzaar embed
'url': 'http://help.vzaar.com/article/165-embedding-video',
'md5': '7e3919d9d2620b89e3e00bec7fe8c9d4',
'info_dict': {
'id': '8707641',
'ext': 'mp4',
'title': 'Building A Business Online: Principal Chairs Q & A',
},
},
# { # {
# # TODO: find another test # # TODO: find another test
# # http://schema.org/VideoObject # # http://schema.org/VideoObject
@@ -2811,6 +2837,12 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
mpfn_urls, video_id, video_title, ie=MegaphoneIE.ie_key()) mpfn_urls, video_id, video_title, ie=MegaphoneIE.ie_key())
# Look for vzaar embeds
vzaar_urls = VzaarIE._extract_urls(webpage)
if vzaar_urls:
return self.playlist_from_matches(
vzaar_urls, video_id, video_title, ie=VzaarIE.ie_key())
def merge_dicts(dict1, dict2): def merge_dicts(dict1, dict2):
merged = {} merged = {}
for k, v in dict1.items(): for k, v in dict1.items():

View File

@@ -26,14 +26,16 @@ class LimelightBaseIE(InfoExtractor):
'Channel': 'channel', 'Channel': 'channel',
'ChannelList': 'channel_list', 'ChannelList': 'channel_list',
} }
def smuggle(url):
return smuggle_url(url, {'source_url': source_url})
entries = [] entries = []
for kind, video_id in re.findall( for kind, video_id in re.findall(
r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})',
webpage): webpage):
entries.append(cls.url_result( entries.append(cls.url_result(
smuggle_url( smuggle('limelight:%s:%s' % (lm[kind], video_id)),
'limelight:%s:%s' % (lm[kind], video_id),
{'source_url': source_url}),
'Limelight%s' % kind, video_id)) 'Limelight%s' % kind, video_id))
for mobj in re.finditer( for mobj in re.finditer(
# As per [1] class attribute should be exactly equal to # As per [1] class attribute should be exactly equal to
@@ -49,10 +51,15 @@ class LimelightBaseIE(InfoExtractor):
''', webpage): ''', webpage):
kind, video_id = mobj.group('kind'), mobj.group('id') kind, video_id = mobj.group('kind'), mobj.group('id')
entries.append(cls.url_result( entries.append(cls.url_result(
smuggle_url( smuggle('limelight:%s:%s' % (kind, video_id)),
'limelight:%s:%s' % (kind, video_id),
{'source_url': source_url}),
'Limelight%s' % kind.capitalize(), video_id)) 'Limelight%s' % kind.capitalize(), video_id))
# http://support.3playmedia.com/hc/en-us/articles/115009517327-Limelight-Embedding-the-Audio-Description-Plugin-with-the-Limelight-Player-on-Your-Web-Page)
for video_id in re.findall(
r'(?s)LimelightPlayerUtil\.embed\s*\(\s*{.*?\bmediaId["\']\s*:\s*["\'](?P<id>[a-z0-9]{32})',
webpage):
entries.append(cls.url_result(
smuggle('limelight:media:%s' % video_id),
LimelightMediaIE.ie_key(), video_id))
return entries return entries
def _call_playlist_service(self, item_id, method, fatal=True, referer=None): def _call_playlist_service(self, item_id, method, fatal=True, referer=None):

View File

@@ -54,15 +54,23 @@ class MixcloudIE(InfoExtractor):
}] }]
# See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js # See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
@staticmethod def _decrypt_play_info(self, play_info, video_id):
def _decrypt_play_info(play_info): KEYS = (
KEY = 'pleasedontdownloadourmusictheartistswontgetpaid' 'pleasedontdownloadourmusictheartistswontgetpaid',
'window.addEventListener = window.addEventListener || function() {};',
'(function() { return new Date().toLocaleDateString(); })()',
)
play_info = base64.b64decode(play_info.encode('ascii')) play_info = base64.b64decode(play_info.encode('ascii'))
for num, key in enumerate(KEYS, start=1):
return ''.join([ try:
compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)])) return self._parse_json(
for idx, ch in enumerate(play_info)]) ''.join([
compat_chr(compat_ord(ch) ^ compat_ord(key[idx % len(key)]))
for idx, ch in enumerate(play_info)]),
video_id)
except ExtractorError:
if num == len(KEYS):
raise
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@@ -78,8 +86,8 @@ class MixcloudIE(InfoExtractor):
encrypted_play_info = self._search_regex( encrypted_play_info = self._search_regex(
r'm-play-info="([^"]+)"', webpage, 'play info') r'm-play-info="([^"]+)"', webpage, 'play info')
play_info = self._parse_json(
self._decrypt_play_info(encrypted_play_info), track_id) play_info = self._decrypt_play_info(encrypted_play_info, track_id)
if message and 'stream_url' not in play_info: if message and 'stream_url' not in play_info:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True) raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)

View File

@@ -75,7 +75,7 @@ class NickIE(MTVServicesInfoExtractor):
class NickDeIE(MTVServicesInfoExtractor): class NickDeIE(MTVServicesInfoExtractor):
IE_NAME = 'nick.de' IE_NAME = 'nick.de'
_VALID_URL = r'https?://(?:www\.)?(?P<host>nick\.de|nickelodeon\.(?:nl|at))/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?(?P<host>nick\.(?:de|com\.pl)|nickelodeon\.(?:nl|at))/[^/]+/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.nick.de/playlist/3773-top-videos/videos/episode/17306-zu-wasser-und-zu-land-rauchende-erdnusse', 'url': 'http://www.nick.de/playlist/3773-top-videos/videos/episode/17306-zu-wasser-und-zu-land-rauchende-erdnusse',
'only_matching': True, 'only_matching': True,
@@ -88,6 +88,9 @@ class NickDeIE(MTVServicesInfoExtractor):
}, { }, {
'url': 'http://www.nickelodeon.at/playlist/3773-top-videos/videos/episode/77993-das-letzte-gefecht', 'url': 'http://www.nickelodeon.at/playlist/3773-top-videos/videos/episode/77993-das-letzte-gefecht',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.nick.com.pl/seriale/474-spongebob-kanciastoporty/wideo/17412-teatr-to-jest-to-rodeo-oszolom',
'only_matching': True,
}] }]
def _extract_mrss_url(self, webpage, host): def _extract_mrss_url(self, webpage, host):

View File

@@ -11,10 +11,15 @@ from ..compat import (
) )
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
dict_get,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
float_or_none,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
remove_start,
try_get,
unified_timestamp,
urlencode_postdata, urlencode_postdata,
xpath_text, xpath_text,
) )
@@ -31,12 +36,15 @@ class NiconicoIE(InfoExtractor):
'id': 'sm22312215', 'id': 'sm22312215',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Big Buck Bunny', 'title': 'Big Buck Bunny',
'thumbnail': r're:https?://.*',
'uploader': 'takuya0301', 'uploader': 'takuya0301',
'uploader_id': '2698420', 'uploader_id': '2698420',
'upload_date': '20131123', 'upload_date': '20131123',
'timestamp': 1385182762, 'timestamp': 1385182762,
'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org', 'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
'duration': 33, 'duration': 33,
'view_count': int,
'comment_count': int,
}, },
'skip': 'Requires an account', 'skip': 'Requires an account',
}, { }, {
@@ -48,6 +56,7 @@ class NiconicoIE(InfoExtractor):
'ext': 'swf', 'ext': 'swf',
'title': '【鏡音リン】Dance on media【オリジナル】take2!', 'title': '【鏡音リン】Dance on media【オリジナル】take2!',
'description': 'md5:689f066d74610b3b22e0f1739add0f58', 'description': 'md5:689f066d74610b3b22e0f1739add0f58',
'thumbnail': r're:https?://.*',
'uploader': 'りょうた', 'uploader': 'りょうた',
'uploader_id': '18822557', 'uploader_id': '18822557',
'upload_date': '20110429', 'upload_date': '20110429',
@@ -64,9 +73,11 @@ class NiconicoIE(InfoExtractor):
'ext': 'unknown_video', 'ext': 'unknown_video',
'description': 'deleted', 'description': 'deleted',
'title': 'ドラえもんエターナル第3話「決戦第3新東京市」前編', 'title': 'ドラえもんエターナル第3話「決戦第3新東京市」前編',
'thumbnail': r're:https?://.*',
'upload_date': '20071224', 'upload_date': '20071224',
'timestamp': int, # timestamp field has different value if logged in 'timestamp': int, # timestamp field has different value if logged in
'duration': 304, 'duration': 304,
'view_count': int,
}, },
'skip': 'Requires an account', 'skip': 'Requires an account',
}, { }, {
@@ -76,12 +87,51 @@ class NiconicoIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': '【第1回】RADIOアニメロミックス ラブライブのぞえりRadio Garden', 'title': '【第1回】RADIOアニメロミックス ラブライブのぞえりRadio Garden',
'description': 'md5:b27d224bb0ff53d3c8269e9f8b561cf1', 'description': 'md5:b27d224bb0ff53d3c8269e9f8b561cf1',
'thumbnail': r're:https?://.*',
'timestamp': 1388851200, 'timestamp': 1388851200,
'upload_date': '20140104', 'upload_date': '20140104',
'uploader': 'アニメロチャンネル', 'uploader': 'アニメロチャンネル',
'uploader_id': '312', 'uploader_id': '312',
}, },
'skip': 'The viewing period of the video you were searching for has expired.', 'skip': 'The viewing period of the video you were searching for has expired.',
}, {
# video not available via `getflv`; "old" HTML5 video
'url': 'http://www.nicovideo.jp/watch/sm1151009',
'md5': '8fa81c364eb619d4085354eab075598a',
'info_dict': {
'id': 'sm1151009',
'ext': 'mp4',
'title': 'マスターシステム本体内蔵のスペハリのメインテーマ(PSG版)',
'description': 'md5:6ee077e0581ff5019773e2e714cdd0b7',
'thumbnail': r're:https?://.*',
'duration': 184,
'timestamp': 1190868283,
'upload_date': '20070927',
'uploader': 'denden2',
'uploader_id': '1392194',
'view_count': int,
'comment_count': int,
},
'skip': 'Requires an account',
}, {
# "New" HTML5 video
'url': 'http://www.nicovideo.jp/watch/sm31464864',
'md5': '351647b4917660986dc0fa8864085135',
'info_dict': {
'id': 'sm31464864',
'ext': 'mp4',
'title': '新作TVアニメ「戦姫絶唱シンフォギアAXZ」PV 最高画質',
'description': 'md5:e52974af9a96e739196b2c1ca72b5feb',
'timestamp': 1498514060,
'upload_date': '20170626',
'uploader': 'ゲス',
'uploader_id': '40826363',
'thumbnail': r're:https?://.*',
'duration': 198,
'view_count': int,
'comment_count': int,
},
'skip': 'Requires an account',
}, { }, {
'url': 'http://sp.nicovideo.jp/watch/sm28964488?ss_pos=1&cp_in=wt_tg', 'url': 'http://sp.nicovideo.jp/watch/sm28964488?ss_pos=1&cp_in=wt_tg',
'only_matching': True, 'only_matching': True,
@@ -119,6 +169,84 @@ class NiconicoIE(InfoExtractor):
self._downloader.report_warning('unable to log in: bad username or password') self._downloader.report_warning('unable to log in: bad username or password')
return login_ok return login_ok
def _extract_format_for_quality(self, api_data, video_id, audio_quality, video_quality):
def yesno(boolean):
return 'yes' if boolean else 'no'
session_api_data = api_data['video']['dmcInfo']['session_api']
session_api_endpoint = session_api_data['urls'][0]
format_id = '-'.join(map(lambda s: remove_start(s['id'], 'archive_'), [video_quality, audio_quality]))
session_response = self._download_json(
session_api_endpoint['url'], video_id,
query={'_format': 'json'},
headers={'Content-Type': 'application/json'},
note='Downloading JSON metadata for %s' % format_id,
data=json.dumps({
'session': {
'client_info': {
'player_id': session_api_data['player_id'],
},
'content_auth': {
'auth_type': session_api_data['auth_types'][session_api_data['protocols'][0]],
'content_key_timeout': session_api_data['content_key_timeout'],
'service_id': 'nicovideo',
'service_user_id': session_api_data['service_user_id']
},
'content_id': session_api_data['content_id'],
'content_src_id_sets': [{
'content_src_ids': [{
'src_id_to_mux': {
'audio_src_ids': [audio_quality['id']],
'video_src_ids': [video_quality['id']],
}
}]
}],
'content_type': 'movie',
'content_uri': '',
'keep_method': {
'heartbeat': {
'lifetime': session_api_data['heartbeat_lifetime']
}
},
'priority': session_api_data['priority'],
'protocol': {
'name': 'http',
'parameters': {
'http_parameters': {
'parameters': {
'http_output_download_parameters': {
'use_ssl': yesno(session_api_endpoint['is_ssl']),
'use_well_known_port': yesno(session_api_endpoint['is_well_known_port']),
}
}
}
}
},
'recipe_id': session_api_data['recipe_id'],
'session_operation_auth': {
'session_operation_auth_by_signature': {
'signature': session_api_data['signature'],
'token': session_api_data['token'],
}
},
'timing_constraint': 'unlimited'
}
}))
resolution = video_quality.get('resolution', {})
return {
'url': session_response['data']['session']['content_uri'],
'format_id': format_id,
'ext': 'mp4', # Session API are used in HTML5, which always serves mp4
'abr': float_or_none(audio_quality.get('bitrate'), 1000),
'vbr': float_or_none(video_quality.get('bitrate'), 1000),
'height': resolution.get('height'),
'width': resolution.get('width'),
}
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@@ -130,33 +258,84 @@ class NiconicoIE(InfoExtractor):
if video_id.startswith('so'): if video_id.startswith('so'):
video_id = self._match_id(handle.geturl()) video_id = self._match_id(handle.geturl())
video_info = self._download_xml( api_data = self._parse_json(self._html_search_regex(
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id, video_id, 'data-api-data="([^"]+)"', webpage,
note='Downloading video info page') 'API data', default='{}'), video_id)
# Get flv info def _format_id_from_url(video_url):
flv_info_webpage = self._download_webpage( return 'economy' if video_real_url.endswith('low') else 'normal'
'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
video_id, 'Downloading flv info')
flv_info = compat_urlparse.parse_qs(flv_info_webpage) try:
if 'url' not in flv_info: video_real_url = api_data['video']['smileInfo']['url']
if 'deleted' in flv_info: except KeyError: # Flash videos
raise ExtractorError('The video has been deleted.', # Get flv info
expected=True) flv_info_webpage = self._download_webpage(
elif 'closed' in flv_info: 'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
raise ExtractorError('Niconico videos now require logging in', video_id, 'Downloading flv info')
expected=True)
elif 'error' in flv_info:
raise ExtractorError('%s reports error: %s' % (
self.IE_NAME, flv_info['error'][0]), expected=True)
else:
raise ExtractorError('Unable to find video URL')
video_real_url = flv_info['url'][0] flv_info = compat_urlparse.parse_qs(flv_info_webpage)
if 'url' not in flv_info:
if 'deleted' in flv_info:
raise ExtractorError('The video has been deleted.',
expected=True)
elif 'closed' in flv_info:
raise ExtractorError('Niconico videos now require logging in',
expected=True)
elif 'error' in flv_info:
raise ExtractorError('%s reports error: %s' % (
self.IE_NAME, flv_info['error'][0]), expected=True)
else:
raise ExtractorError('Unable to find video URL')
video_info_xml = self._download_xml(
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id,
video_id, note='Downloading video info page')
def get_video_info(items):
if not isinstance(items, list):
items = [items]
for item in items:
ret = xpath_text(video_info_xml, './/' + item)
if ret:
return ret
video_real_url = flv_info['url'][0]
extension = get_video_info('movie_type')
if not extension:
extension = determine_ext(video_real_url)
formats = [{
'url': video_real_url,
'ext': extension,
'format_id': _format_id_from_url(video_real_url),
}]
else:
formats = []
dmc_info = api_data['video'].get('dmcInfo')
if dmc_info: # "New" HTML5 videos
quality_info = dmc_info['quality']
for audio_quality in quality_info['audios']:
for video_quality in quality_info['videos']:
if not audio_quality['available'] or not video_quality['available']:
continue
formats.append(self._extract_format_for_quality(
api_data, video_id, audio_quality, video_quality))
self._sort_formats(formats)
else: # "Old" HTML5 videos
formats = [{
'url': video_real_url,
'ext': 'mp4',
'format_id': _format_id_from_url(video_real_url),
}]
def get_video_info(items):
return dict_get(api_data['video'], items)
# Start extracting information # Start extracting information
title = xpath_text(video_info, './/title') title = get_video_info('title')
if not title: if not title:
title = self._og_search_title(webpage, default=None) title = self._og_search_title(webpage, default=None)
if not title: if not title:
@@ -170,18 +349,15 @@ class NiconicoIE(InfoExtractor):
watch_api_data = self._parse_json(watch_api_data_string, video_id) if watch_api_data_string else {} watch_api_data = self._parse_json(watch_api_data_string, video_id) if watch_api_data_string else {}
video_detail = watch_api_data.get('videoDetail', {}) video_detail = watch_api_data.get('videoDetail', {})
extension = xpath_text(video_info, './/movie_type')
if not extension:
extension = determine_ext(video_real_url)
thumbnail = ( thumbnail = (
xpath_text(video_info, './/thumbnail_url') or get_video_info(['thumbnail_url', 'thumbnailURL']) or
self._html_search_meta('image', webpage, 'thumbnail', default=None) or self._html_search_meta('image', webpage, 'thumbnail', default=None) or
video_detail.get('thumbnail')) video_detail.get('thumbnail'))
description = xpath_text(video_info, './/description') description = get_video_info('description')
timestamp = parse_iso8601(xpath_text(video_info, './/first_retrieve')) timestamp = (parse_iso8601(get_video_info('first_retrieve')) or
unified_timestamp(get_video_info('postedDateTime')))
if not timestamp: if not timestamp:
match = self._html_search_meta('datePublished', webpage, 'date published', default=None) match = self._html_search_meta('datePublished', webpage, 'date published', default=None)
if match: if match:
@@ -191,7 +367,7 @@ class NiconicoIE(InfoExtractor):
video_detail['postedAt'].replace('/', '-'), video_detail['postedAt'].replace('/', '-'),
delimiter=' ', timezone=datetime.timedelta(hours=9)) delimiter=' ', timezone=datetime.timedelta(hours=9))
view_count = int_or_none(xpath_text(video_info, './/view_counter')) view_count = int_or_none(get_video_info(['view_counter', 'viewCount']))
if not view_count: if not view_count:
match = self._html_search_regex( match = self._html_search_regex(
r'>Views: <strong[^>]*>([^<]+)</strong>', r'>Views: <strong[^>]*>([^<]+)</strong>',
@@ -200,38 +376,33 @@ class NiconicoIE(InfoExtractor):
view_count = int_or_none(match.replace(',', '')) view_count = int_or_none(match.replace(',', ''))
view_count = view_count or video_detail.get('viewCount') view_count = view_count or video_detail.get('viewCount')
comment_count = int_or_none(xpath_text(video_info, './/comment_num')) comment_count = (int_or_none(get_video_info('comment_num')) or
video_detail.get('commentCount') or
try_get(api_data, lambda x: x['thread']['commentCount']))
if not comment_count: if not comment_count:
match = self._html_search_regex( match = self._html_search_regex(
r'>Comments: <strong[^>]*>([^<]+)</strong>', r'>Comments: <strong[^>]*>([^<]+)</strong>',
webpage, 'comment count', default=None) webpage, 'comment count', default=None)
if match: if match:
comment_count = int_or_none(match.replace(',', '')) comment_count = int_or_none(match.replace(',', ''))
comment_count = comment_count or video_detail.get('commentCount')
duration = (parse_duration( duration = (parse_duration(
xpath_text(video_info, './/length') or get_video_info('length') or
self._html_search_meta( self._html_search_meta(
'video:duration', webpage, 'video duration', default=None)) or 'video:duration', webpage, 'video duration', default=None)) or
video_detail.get('length')) video_detail.get('length') or
get_video_info('duration'))
webpage_url = xpath_text(video_info, './/watch_url') or url webpage_url = get_video_info('watch_url') or url
if video_info.find('.//ch_id') is not None: owner = api_data.get('owner', {})
uploader_id = video_info.find('.//ch_id').text uploader_id = get_video_info(['ch_id', 'user_id']) or owner.get('id')
uploader = video_info.find('.//ch_name').text uploader = get_video_info(['ch_name', 'user_nickname']) or owner.get('nickname')
elif video_info.find('.//user_id') is not None:
uploader_id = video_info.find('.//user_id').text
uploader = video_info.find('.//user_nickname').text
else:
uploader_id = uploader = None
return { return {
'id': video_id, 'id': video_id,
'url': video_real_url,
'title': title, 'title': title,
'ext': extension, 'formats': formats,
'format_id': 'economy' if video_real_url.endswith('low') else 'normal',
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'description': description, 'description': description,
'uploader': uploader, 'uploader': uploader,

View File

@@ -227,20 +227,6 @@ class PornHubIE(InfoExtractor):
class PornHubPlaylistBaseIE(InfoExtractor): class PornHubPlaylistBaseIE(InfoExtractor):
def _extract_entries(self, webpage): def _extract_entries(self, webpage):
return [
self.url_result(
'http://www.pornhub.com/%s' % video_url,
PornHubIE.ie_key(), video_title=title)
for video_url, title in orderedSet(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"',
webpage))
]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
# Only process container div with main playlist content skipping # Only process container div with main playlist content skipping
# drop-down menu that uses similar pattern for videos (see # drop-down menu that uses similar pattern for videos (see
# https://github.com/rg3/youtube-dl/issues/11594). # https://github.com/rg3/youtube-dl/issues/11594).
@@ -248,7 +234,21 @@ class PornHubPlaylistBaseIE(InfoExtractor):
r'(?s)(<div[^>]+class=["\']container.+)', webpage, r'(?s)(<div[^>]+class=["\']container.+)', webpage,
'container', default=webpage) 'container', default=webpage)
entries = self._extract_entries(container) return [
self.url_result(
'http://www.pornhub.com/%s' % video_url,
PornHubIE.ie_key(), video_title=title)
for video_url, title in orderedSet(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"',
container))
]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = self._extract_entries(webpage)
playlist = self._parse_json( playlist = self._parse_json(
self._search_regex( self._search_regex(

View File

@@ -0,0 +1,114 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
)
class RedditIE(InfoExtractor):
_VALID_URL = r'https?://v\.redd\.it/(?P<id>[^/?#&]+)'
_TEST = {
# from https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/
'url': 'https://v.redd.it/zv89llsvexdz',
'md5': '655d06ace653ea3b87bccfb1b27ec99d',
'info_dict': {
'id': 'zv89llsvexdz',
'ext': 'mp4',
'title': 'zv89llsvexdz',
},
'params': {
'format': 'bestvideo',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
'https://v.redd.it/%s/HLSPlaylist.m3u8' % video_id, video_id,
'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
formats.extend(self._extract_mpd_formats(
'https://v.redd.it/%s/DASHPlaylist.mpd' % video_id, video_id,
mpd_id='dash', fatal=False))
return {
'id': video_id,
'title': video_id,
'formats': formats,
}
class RedditRIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
'info_dict': {
'id': 'zv89llsvexdz',
'ext': 'mp4',
'title': 'That small heart attack.',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1501941939,
'upload_date': '20170805',
'uploader': 'Antw87',
'like_count': int,
'dislike_count': int,
'comment_count': int,
'age_limit': 0,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
'url': 'https://www.reddit.com/r/videos/comments/6rrwyj',
'only_matching': True,
}, {
# imgur
'url': 'https://www.reddit.com/r/MadeMeSmile/comments/6t7wi5/wait_for_it/',
'only_matching': True,
}, {
# streamable
'url': 'https://www.reddit.com/r/videos/comments/6t7sg9/comedians_hilarious_joke_about_the_guam_flag/',
'only_matching': True,
}, {
# youtube
'url': 'https://www.reddit.com/r/videos/comments/6t75wq/southern_man_tries_to_speak_without_an_accent/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
url + '.json', video_id)[0]['data']['children'][0]['data']
video_url = data['url']
# Avoid recursing into the same reddit URL
if 'reddit.com/' in video_url and '/%s/' % video_id in video_url:
raise ExtractorError('No media found', expected=True)
over_18 = data.get('over_18')
if over_18 is True:
age_limit = 18
elif over_18 is False:
age_limit = 0
else:
age_limit = None
return {
'_type': 'url_transparent',
'url': video_url,
'title': data.get('title'),
'thumbnail': data.get('thumbnail'),
'timestamp': float_or_none(data.get('created_utc')),
'uploader': data.get('author'),
'like_count': int_or_none(data.get('ups')),
'dislike_count': int_or_none(data.get('downs')),
'comment_count': int_or_none(data.get('num_comments')),
'age_limit': age_limit,
}

View File

@@ -50,7 +50,7 @@ class TwentyMinutenIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
return [m.group('url') for m in re.finditer( return [m.group('url') for m in re.finditer(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:www\.)?20min\.ch/videoplayer/videoplayer.html\?.*?\bvideoId@\d+.*?)\1', r'<iframe[^>]+src=(["\'])(?P<url>(?:(?:https?:)?//)?(?:www\.)?20min\.ch/videoplayer/videoplayer.html\?.*?\bvideoId@\d+.*?)\1',
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -1,6 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
@@ -28,6 +30,12 @@ class VzaarIE(InfoExtractor):
}, },
}] }]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src=["\']((?:https?:)?//(?:view\.vzaar\.com)/[0-9]+)',
webpage)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = self._download_json(

View File

@@ -39,8 +39,8 @@ class XXXYMoviesIE(InfoExtractor):
r"video_url\s*:\s*'([^']+)'", webpage, 'video URL') r"video_url\s*:\s*'([^']+)'", webpage, 'video URL')
title = self._html_search_regex( title = self._html_search_regex(
[r'<div class="block_header">\s*<h1>([^<]+)</h1>', [r'<div[^>]+\bclass="block_header"[^>]*>\s*<h1>([^<]+)<',
r'<title>(.*?)\s*-\s*XXXYMovies\.com</title>'], r'<title>(.*?)\s*-\s*(?:XXXYMovies\.com|XXX\s+Movies)</title>'],
webpage, 'title') webpage, 'title')
thumbnail = self._search_regex( thumbnail = self._search_regex(

View File

@@ -2733,6 +2733,8 @@ def cli_option(params, command_option, param):
def cli_bool_option(params, command_option, param, true_value='true', false_value='false', separator=None): def cli_bool_option(params, command_option, param, true_value='true', false_value='false', separator=None):
param = params.get(param) param = params.get(param)
if param is None:
return []
assert isinstance(param, bool) assert isinstance(param, bool)
if separator: if separator:
return [command_option + separator + (true_value if param else false_value)] return [command_option + separator + (true_value if param else false_value)]

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.08.06' __version__ = '2017.08.13'