Compare commits

..

23 Commits

Author SHA1 Message Date
Sergey M․
1fa0dce2c0 release 2017.12.10 2017-12-10 23:18:53 +07:00
Sergey M․
fa1dd6d2cd [ChangeLog] Actualize 2017-12-10 23:15:24 +07:00
Sergey M․
c38970ca10 [culturebox] Improve video id extraction (closes #14947) 2017-12-10 22:46:21 +07:00
Remita Amine
51f2863357 [twitter] improve extraction(closes #14197) 2017-12-10 14:11:09 +01:00
Sergey M․
913b61eeee [udemy] Extract more HLS formats 2017-12-09 20:02:54 +07:00
Sergey M․
6f1ec339a0 [udemy] Improve course id extraction (closes #14938) 2017-12-09 20:02:49 +07:00
Sergey M․
a3de5e6c0e [stretchinternet] Fix issues and improve (closes #14576) 2017-12-09 17:59:08 +07:00
Andrew Bottom
f4cc03d60b [stretchinternet] Add extractor 2017-12-09 17:58:49 +07:00
Sergey M․
2a57b62b80 [ellentube] Fix issues, improve and simplify (closes #14570) 2017-12-09 02:16:54 +07:00
Alex Seiler
e2707a832c [ellentube] Fix extraction (closes #14407) 2017-12-09 02:16:48 +07:00
Sergey M․
1115271ac6 [raiplay:playlist] Fix issues and improve (closes #14563) 2017-12-09 00:48:04 +07:00
Timendum
d21d0ba6c1 [raiplay:playlist] Add extractor 2017-12-09 00:47:40 +07:00
Sergey M․
a670b1ba26 [README.md] Add is_live, start_time and end_time to output template section (closes #14926) 2017-12-07 22:16:41 +07:00
Remita Amine
1bd4fc96e6 [sonyliv] extract higher quality formats and bypass geo restriction(closes #14922) 2017-12-07 08:46:30 +01:00
Remita Amine
684ae10236 [fox] add support for adobe pass auth and extract subtitles(close #14489)(closes #14205) 2017-12-06 22:56:14 +01:00
Remita Amine
3c4fbfeca2 [dailymotion] remove dailymotion cloud extractor(closes #6794)
https://web.archive.org/web/20160312110217/https://www.dmcloud.net/
2017-12-06 10:56:48 +01:00
Windom
b271e33526 [xhamster] Add support for mobile URLs and fix thumbnail extraction 2017-12-06 00:08:31 +07:00
Sergey M․
d3f8b76b69 [extractor/generic] Fix typo (closes #14902)
Don't pass video_id as mpd_id
2017-12-05 23:11:15 +07:00
Sergey M․
91328f26b0 [ard] Skip invalid stream URLs (closes #14906) 2017-12-05 23:01:57 +07:00
Sergey M․
61d18c8a4b [porncom] Fix metadata extraction (closes #14911) 2017-12-05 22:42:02 +07:00
Sergey M․
c94427dd60 [pluralsight] Detect agreement request (#14913) 2017-12-05 22:34:56 +07:00
Remita Amine
d4f05d4731 [utils] add sami mimetype to mimetype2ext 2017-12-03 00:04:43 +01:00
Remita Amine
d7df308981 [toutv] fix login(closes 14614) 2017-12-02 20:22:40 +01:00
23 changed files with 419 additions and 302 deletions

View File

@@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.02*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.10*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.02** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.10**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.12.02 [debug] youtube-dl version 2017.12.10
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@@ -1,3 +1,30 @@
version 2017.12.10
Core
+ [utils] Add sami mimetype to mimetype2ext
Extractors
* [culturebox] Improve video id extraction (#14947)
* [twitter] Improve extraction (#14197)
+ [udemy] Extract more HLS formats
* [udemy] Improve course id extraction (#14938)
+ [stretchinternet] Add support for portal.stretchinternet.com (#14576)
* [ellentube] Fix extraction (#14407, #14570)
+ [raiplay:playlist] Add support for playlists (#14563)
* [sonyliv] Bypass geo restriction
* [sonyliv] Extract higher quality formats (#14922)
* [fox] Extract subtitles
+ [fox] Add support for Adobe Pass authentication (#14205, #14489)
- [dailymotion:cloud] Remove extractor (#6794)
* [xhamster] Fix thumbnail extraction (#14780)
+ [xhamster] Add support for mobile URLs (#14780)
* [generic] Don't pass video id as mpd id while extracting DASH (#14902)
* [ard] Skip invalid stream URLs (#14906)
* [porncom] Fix metadata extraction (#14911)
* [pluralsight] Detect agreement request (#14913)
* [toutv] Fix login (#14614)
version 2017.12.02 version 2017.12.02
Core Core

View File

@@ -511,6 +511,9 @@ The basic usage is not to set any template arguments when downloading a single f
- `average_rating` (numeric): Average rating give by users, the scale used depends on the webpage - `average_rating` (numeric): Average rating give by users, the scale used depends on the webpage
- `comment_count` (numeric): Number of comments on the video - `comment_count` (numeric): Number of comments on the video
- `age_limit` (numeric): Age restriction for the video (years) - `age_limit` (numeric): Age restriction for the video (years)
- `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- `start_time` (numeric): Time in seconds where the reproduction should start, as specified in the URL
- `end_time` (numeric): Time in seconds where the reproduction should end, as specified in the URL
- `format` (string): A human-readable description of the format - `format` (string): A human-readable description of the format
- `format_id` (string): Format code specified by `--format` - `format_id` (string): Format code specified by `--format`
- `format_note` (string): Additional info about the format - `format_note` (string): Additional info about the format

View File

@@ -198,7 +198,6 @@
- **dailymotion** - **dailymotion**
- **dailymotion:playlist** - **dailymotion:playlist**
- **dailymotion:user** - **dailymotion:user**
- **DailymotionCloud**
- **DaisukiMotto** - **DaisukiMotto**
- **DaisukiMottoPlaylist** - **DaisukiMottoPlaylist**
- **daum.net** - **daum.net**
@@ -243,8 +242,9 @@
- **eHow** - **eHow**
- **Einthusan** - **Einthusan**
- **eitb.tv** - **eitb.tv**
- **EllenTV** - **EllenTube**
- **EllenTV:clips** - **EllenTubePlaylist**
- **EllenTubeVideo**
- **ElPais**: El País - **ElPais**: El País
- **Embedly** - **Embedly**
- **EMPFlix** - **EMPFlix**
@@ -662,6 +662,7 @@
- **Rai** - **Rai**
- **RaiPlay** - **RaiPlay**
- **RaiPlayLive** - **RaiPlayLive**
- **RaiPlayPlaylist**
- **RBMARadio** - **RBMARadio**
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedBullTV** - **RedBullTV**
@@ -781,6 +782,7 @@
- **streamcloud.eu** - **streamcloud.eu**
- **StreamCZ** - **StreamCZ**
- **StreetVoice** - **StreetVoice**
- **StretchInternet**
- **SunPorno** - **SunPorno**
- **SVT** - **SVT**
- **SVTPlay**: SVT Play and Öppet arkiv - **SVTPlay**: SVT Play and Öppet arkiv

View File

@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .generic import GenericIE from .generic import GenericIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
@@ -126,6 +127,8 @@ class ARDMediathekIE(InfoExtractor):
quality = stream.get('_quality') quality = stream.get('_quality')
server = stream.get('_server') server = stream.get('_server')
for stream_url in stream_urls: for stream_url in stream_urls:
if not isinstance(stream_url, compat_str) or '//' not in stream_url:
continue
ext = determine_ext(stream_url) ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'): if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue continue
@@ -146,13 +149,11 @@ class ARDMediathekIE(InfoExtractor):
'play_path': stream_url, 'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality), 'format_id': 'a%s-rtmp-%s' % (num, quality),
} }
elif stream_url.startswith('http'): else:
f = { f = {
'url': stream_url, 'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality) 'format_id': 'a%s-%s-%s' % (num, ext, quality)
} }
else:
continue
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url) m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m: if m:
f.update({ f.update({

View File

@@ -413,52 +413,3 @@ class DailymotionUserIE(DailymotionPlaylistIE):
'title': full_user, 'title': full_user,
'entries': self._extract_entries(user), 'entries': self._extract_entries(user),
} }
class DailymotionCloudIE(DailymotionBaseInfoExtractor):
_VALID_URL_PREFIX = r'https?://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
_VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX
_TESTS = [{
# From http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html
# Tested at FranceTvInfo_2
'url': 'http://api.dmcloud.net/embed/4e7343f894a6f677b10006b4/556e03339473995ee145930c?auth=1464865870-0-jyhsm84b-ead4c701fb750cf9367bf4447167a3db&autoplay=1',
'only_matching': True,
}, {
# http://www.francetvinfo.fr/societe/larguez-les-amarres-le-cobaturage-se-developpe_980101.html
'url': 'http://api.dmcloud.net/player/embed/4e7343f894a6f677b10006b4/559545469473996d31429f06?auth=1467430263-0-90tglw2l-a3a4b64ed41efe48d7fccad85b8b8fda&autoplay=1',
'only_matching': True,
}]
@classmethod
def _extract_dmcloud_url(cls, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL, webpage)
if mobj:
return mobj.group(1)
mobj = re.search(
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL,
webpage)
if mobj:
return mobj.group(1)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage_no_ff(url, video_id)
title = self._html_search_regex(r'<title>([^>]+)</title>', webpage, 'title')
video_info = self._parse_json(self._search_regex(
r'var\s+info\s*=\s*([^;]+);', webpage, 'video info'), video_id)
# TODO: parse ios_url, which is in fact a manifest
video_url = video_info['mp4_url']
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': video_info.get('thumbnail_url'),
}

View File

@@ -0,0 +1,133 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_attributes,
float_or_none,
int_or_none,
try_get,
)
class EllenTubeBaseIE(InfoExtractor):
def _extract_data_config(self, webpage, video_id):
details = self._search_regex(
r'(<[^>]+\bdata-component=(["\'])[Dd]etails.+?></div>)', webpage,
'details')
return self._parse_json(
extract_attributes(details)['data-config'], video_id)
def _extract_video(self, data, video_id):
title = data['title']
formats = []
duration = None
for entry in data.get('media'):
if entry.get('id') == 'm3u8':
formats = self._extract_m3u8_formats(
entry['url'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
duration = int_or_none(entry.get('duration'))
break
self._sort_formats(formats)
def get_insight(kind):
return int_or_none(try_get(
data, lambda x: x['insight']['%ss' % kind]))
return {
'extractor_key': EllenTubeIE.ie_key(),
'id': video_id,
'title': title,
'description': data.get('description'),
'duration': duration,
'thumbnail': data.get('thumbnail'),
'timestamp': float_or_none(data.get('publishTime'), scale=1000),
'view_count': get_insight('view'),
'like_count': get_insight('like'),
'formats': formats,
}
class EllenTubeIE(EllenTubeBaseIE):
_VALID_URL = r'''(?x)
(?:
ellentube:|
https://api-prod\.ellentube\.com/ellenapi/api/item/
)
(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})
'''
_TESTS = [{
'url': 'https://api-prod.ellentube.com/ellenapi/api/item/0822171c-3829-43bf-b99f-d77358ae75e3',
'md5': '2fabc277131bddafdd120e0fc0f974c9',
'info_dict': {
'id': '0822171c-3829-43bf-b99f-d77358ae75e3',
'ext': 'mp4',
'title': 'Ellen Meets Las Vegas Survivors Jesus Campos and Stephen Schuck',
'description': 'md5:76e3355e2242a78ad9e3858e5616923f',
'thumbnail': r're:^https?://.+?',
'duration': 514,
'timestamp': 1508505120,
'upload_date': '20171020',
'view_count': int,
'like_count': int,
}
}, {
'url': 'ellentube:734a3353-f697-4e79-9ca9-bfc3002dc1e0',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://api-prod.ellentube.com/ellenapi/api/item/%s' % video_id,
video_id)
return self._extract_video(data, video_id)
class EllenTubeVideoIE(EllenTubeBaseIE):
_VALID_URL = r'https?://(?:www\.)?ellentube\.com/video/(?P<id>.+?)\.html'
_TEST = {
'url': 'https://www.ellentube.com/video/ellen-meets-las-vegas-survivors-jesus-campos-and-stephen-schuck.html',
'only_matching': True,
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._extract_data_config(webpage, display_id)['id']
return self.url_result(
'ellentube:%s' % video_id, ie=EllenTubeIE.ie_key(),
video_id=video_id)
class EllenTubePlaylistIE(EllenTubeBaseIE):
_VALID_URL = r'https?://(?:www\.)?ellentube\.com/(?:episode|studios)/(?P<id>.+?)\.html'
_TESTS = [{
'url': 'https://www.ellentube.com/episode/dax-shepard-jordan-fisher-haim.html',
'info_dict': {
'id': 'dax-shepard-jordan-fisher-haim',
'title': "Dax Shepard, 'DWTS' Team Jordan Fisher & Lindsay Arnold, HAIM",
'description': 'md5:bfc982194dabb3f4e325e43aa6b2e21c',
},
'playlist_count': 6,
}, {
'url': 'https://www.ellentube.com/studios/macey-goes-rving0.html',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._extract_data_config(webpage, display_id)['data']
feed = self._download_json(
'https://api-prod.ellentube.com/ellenapi/api/feed/?%s'
% data['filter'], display_id)
entries = [
self._extract_video(elem, elem['id'])
for elem in feed if elem.get('type') == 'VIDEO' and elem.get('id')]
return self.playlist_result(
entries, display_id, data.get('title'),
clean_html(data.get('description')))

View File

@@ -1,101 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import NO_DEFAULT
class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
_TESTS = [{
'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
'md5': '4294cf98bc165f218aaa0b89e0fd8042',
'info_dict': {
'id': '0_ipq1gsai',
'ext': 'mov',
'title': 'Fast Fingers of Fate',
'description': 'md5:3539013ddcbfa64b2a6d1b38d910868a',
'timestamp': 1428035648,
'upload_date': '20150403',
'uploader_id': 'batchUser',
},
}, {
# not available via http://widgets.ellentube.com/
'url': 'http://www.ellentv.com/videos/1-szkgu2m2/',
'info_dict': {
'id': '1_szkgu2m2',
'ext': 'flv',
'title': "Ellen's Amazingly Talented Audience",
'description': 'md5:86ff1e376ff0d717d7171590e273f0a5',
'timestamp': 1255140900,
'upload_date': '20091010',
'uploader_id': 'ellenkaltura@gmail.com',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
URLS = ('http://widgets.ellentube.com/videos/%s' % video_id, url)
for num, url_ in enumerate(URLS, 1):
webpage = self._download_webpage(
url_, video_id, fatal=num == len(URLS))
default = NO_DEFAULT if num == len(URLS) else None
partner_id = self._search_regex(
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id',
default=default)
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id', default=default)
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
class EllenTVClipsIE(InfoExtractor):
IE_NAME = 'EllenTV:clips'
_VALID_URL = r'https?://(?:www\.)?ellentv\.com/episodes/(?P<id>[a-z0-9_-]+)'
_TEST = {
'url': 'http://www.ellentv.com/episodes/meryl-streep-vanessa-hudgens/',
'info_dict': {
'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens',
},
'playlist_mincount': 5,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage, playlist_id)
return {
'_type': 'playlist',
'id': playlist_id,
'title': self._og_search_title(webpage),
'entries': self._extract_entries(playlist)
}
def _extract_playlist(self, webpage, playlist_id):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
return self._parse_json('[{' + json_string + '}]', playlist_id)
def _extract_entries(self, playlist):
return [
self.url_result(
'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']),
KalturaIE.ie_key(), video_id=item['kaltura_entry_id'])
for item in playlist]

View File

@@ -246,7 +246,6 @@ from .dailymotion import (
DailymotionIE, DailymotionIE,
DailymotionPlaylistIE, DailymotionPlaylistIE,
DailymotionUserIE, DailymotionUserIE,
DailymotionCloudIE,
) )
from .daisuki import ( from .daisuki import (
DaisukiMottoIE, DaisukiMottoIE,
@@ -312,9 +311,10 @@ from .ehow import EHowIE
from .eighttracks import EightTracksIE from .eighttracks import EightTracksIE
from .einthusan import EinthusanIE from .einthusan import EinthusanIE
from .eitb import EitbIE from .eitb import EitbIE
from .ellentv import ( from .ellentube import (
EllenTVIE, EllenTubeIE,
EllenTVClipsIE, EllenTubeVideoIE,
EllenTubePlaylistIE,
) )
from .elpais import ElPaisIE from .elpais import ElPaisIE
from .embedly import EmbedlyIE from .embedly import EmbedlyIE
@@ -857,6 +857,7 @@ from .radiofrance import RadioFranceIE
from .rai import ( from .rai import (
RaiPlayIE, RaiPlayIE,
RaiPlayLiveIE, RaiPlayLiveIE,
RaiPlayPlaylistIE,
RaiIE, RaiIE,
) )
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
@@ -1000,6 +1001,7 @@ from .streamango import StreamangoIE
from .streamcloud import StreamcloudIE from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE from .streetvoice import StreetVoiceIE
from .stretchinternet import StretchInternetIE
from .sunporno import SunPornoIE from .sunporno import SunPornoIE
from .svt import ( from .svt import (
SVTIE, SVTIE,

View File

@@ -11,6 +11,7 @@ from ..utils import (
parse_duration, parse_duration,
try_get, try_get,
unified_timestamp, unified_timestamp,
update_url_query,
) )
@@ -62,7 +63,8 @@ class FOXIE(AdobePassIE):
duration = int_or_none(video.get('durationInSeconds')) or int_or_none( duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration')) video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished')) timestamp = unified_timestamp(video.get('datePublished'))
age_limit = parse_age_limit(video.get('contentRating')) rating = video.get('contentRating')
age_limit = parse_age_limit(rating)
data = try_get( data = try_get(
video, lambda x: x['trackingData']['properties'], dict) or {} video, lambda x: x['trackingData']['properties'], dict) or {}
@@ -77,8 +79,24 @@ class FOXIE(AdobePassIE):
release_year = int_or_none(video.get('releaseYear')) release_year = int_or_none(video.get('releaseYear'))
if data.get('authRequired'): if data.get('authRequired'):
# TODO: AP resource = self._get_mvpd_resource(
pass 'fbc-fox', title, video.get('guid'), rating)
release_url = update_url_query(
release_url, {
'auth': self._extract_mvpd_auth(
url, video_id, 'fbc-fox', resource)
})
subtitles = {}
for doc_rel in video.get('documentReleases', []):
rel_url = doc_rel.get('url')
if not url or doc_rel.get('format') != 'SCC':
continue
subtitles['en'] = [{
'url': rel_url,
'ext': 'scc',
}]
break
info = { info = {
'id': video_id, 'id': video_id,
@@ -93,6 +111,7 @@ class FOXIE(AdobePassIE):
'episode': episode, 'episode': episode,
'episode_number': episode_number, 'episode_number': episode_number,
'release_year': release_year, 'release_year': release_year,
'subtitles': subtitles,
} }
urlh = self._request_webpage(HEADRequest(release_url), video_id) urlh = self._request_webpage(HEADRequest(release_url), video_id)

View File

@@ -13,10 +13,7 @@ from ..utils import (
parse_duration, parse_duration,
determine_ext, determine_ext,
) )
from .dailymotion import ( from .dailymotion import DailymotionIE
DailymotionIE,
DailymotionCloudIE,
)
class FranceTVBaseInfoExtractor(InfoExtractor): class FranceTVBaseInfoExtractor(InfoExtractor):
@@ -290,10 +287,6 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
page_title = mobj.group('title') page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title) webpage = self._download_webpage(url, page_title)
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, DailymotionCloudIE.ie_key())
dailymotion_urls = DailymotionIE._extract_urls(webpage) dailymotion_urls = DailymotionIE._extract_urls(webpage)
if dailymotion_urls: if dailymotion_urls:
return self.playlist_result([ return self.playlist_result([
@@ -363,6 +356,7 @@ class CultureboxIE(FranceTVBaseInfoExtractor):
raise ExtractorError('Video %s is not available' % name, expected=True) raise ExtractorError('Video %s is not available' % name, expected=True)
video_id, catalogue = self._search_regex( video_id, catalogue = self._search_regex(
r'"https?://videos\.francetv\.fr/video/([^@]+@[^"]+)"', webpage, 'video id').split('@') r'["\'>]https?://videos\.francetv\.fr/video/([^@]+@.+?)["\'<]',
webpage, 'video id').split('@')
return self._extract_video(video_id, catalogue) return self._extract_video(video_id, catalogue)

View File

@@ -59,10 +59,7 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .drtuber import DrTuberIE from .drtuber import DrTuberIE
from .redtube import RedTubeIE from .redtube import RedTubeIE
from .vimeo import VimeoIE from .vimeo import VimeoIE
from .dailymotion import ( from .dailymotion import DailymotionIE
DailymotionIE,
DailymotionCloudIE,
)
from .dailymail import DailyMailIE from .dailymail import DailyMailIE
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .viewlift import ViewLiftEmbedIE from .viewlift import ViewLiftEmbedIE
@@ -1472,23 +1469,6 @@ class GenericIE(InfoExtractor):
'timestamp': 1432570283, 'timestamp': 1432570283,
}, },
}, },
# Dailymotion Cloud video
{
'url': 'http://replay.publicsenat.fr/vod/le-debat/florent-kolandjian,dominique-cena,axel-decourtye,laurence-abeille,bruno-parmentier/175910',
'md5': 'dcaf23ad0c67a256f4278bce6e0bae38',
'info_dict': {
'id': 'x2uy8t3',
'ext': 'mp4',
'title': 'Sauvons les abeilles ! - Le débat',
'description': 'md5:d9082128b1c5277987825d684939ca26',
'thumbnail': r're:^https?://.*\.jpe?g$',
'timestamp': 1434970506,
'upload_date': '20150622',
'uploader': 'Public Sénat',
'uploader_id': 'xa9gza',
},
'skip': 'File not found.',
},
# OnionStudios embed # OnionStudios embed
{ {
'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537', 'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537',
@@ -2195,7 +2175,7 @@ class GenericIE(InfoExtractor):
return self.playlist_result(self._parse_xspf(doc, video_id), video_id) return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag): elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats( info_dict['formats'] = self._parse_mpd_formats(
doc, video_id, doc,
mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0], mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
mpd_url=url) mpd_url=url)
self._sort_formats(info_dict['formats']) self._sort_formats(info_dict['formats'])
@@ -2704,11 +2684,6 @@ class GenericIE(InfoExtractor):
if senate_isvp_url: if senate_isvp_url:
return self.url_result(senate_isvp_url, 'SenateISVP') return self.url_result(senate_isvp_url, 'SenateISVP')
# Look for Dailymotion Cloud videos
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, 'DailymotionCloud')
# Look for OnionStudios embeds # Look for OnionStudios embeds
onionstudios_url = OnionStudiosIE._extract_url(webpage) onionstudios_url = OnionStudiosIE._extract_url(webpage)
if onionstudios_url: if onionstudios_url:

View File

@@ -131,6 +131,13 @@ class PluralsightIE(PluralsightBaseIE):
if BLOCKED in response: if BLOCKED in response:
raise ExtractorError( raise ExtractorError(
'Unable to login: %s' % BLOCKED, expected=True) 'Unable to login: %s' % BLOCKED, expected=True)
MUST_AGREE = 'To continue using Pluralsight, you must agree to'
if any(p in response for p in (MUST_AGREE, '>Disagree<', '>Agree<')):
raise ExtractorError(
'Unable to login: %s some documents. Go to pluralsight.com, '
'log in and agree with what Pluralsight requires.'
% MUST_AGREE, expected=True)
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
def _get_subtitles(self, author, clip_id, lang, name, duration, video_id): def _get_subtitles(self, author, clip_id, lang, name, duration, video_id):

View File

@@ -77,12 +77,14 @@ class PornComIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'class=["\']views["\'][^>]*><p>([\d,.]+)', webpage, (r'Views:\s*</span>\s*<span>\s*([\d,.]+)',
r'class=["\']views["\'][^>]*><p>([\d,.]+)'), webpage,
'view count', fatal=False)) 'view count', fatal=False))
def extract_list(kind): def extract_list(kind):
s = self._search_regex( s = self._search_regex(
r'(?s)<p[^>]*>%s:(.+?)</p>' % kind.capitalize(), (r'(?s)%s:\s*</span>\s*<span>(.+?)</span>' % kind.capitalize(),
r'(?s)<p[^>]*>%s:(.+?)</p>' % kind.capitalize()),
webpage, kind, fatal=False) webpage, kind, fatal=False)
return re.findall(r'<a[^>]+>([^<]+)</a>', s or '') return re.findall(r'<a[^>]+>([^<]+)</a>', s or '')

View File

@@ -17,6 +17,7 @@ from ..utils import (
parse_duration, parse_duration,
strip_or_none, strip_or_none,
try_get, try_get,
unescapeHTML,
unified_strdate, unified_strdate,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
@@ -249,6 +250,41 @@ class RaiPlayLiveIE(RaiBaseIE):
} }
class RaiPlayPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?raiplay\.it/programmi/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.raiplay.it/programmi/nondirloalmiocapo/',
'info_dict': {
'id': 'nondirloalmiocapo',
'title': 'Non dirlo al mio capo',
'description': 'md5:9f3d603b2947c1c7abb098f3b14fac86',
},
'playlist_mincount': 12,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
title = self._html_search_meta(
('programma', 'nomeProgramma'), webpage, 'title')
description = unescapeHTML(self._html_search_meta(
('description', 'og:description'), webpage, 'description'))
print(description)
entries = []
for mobj in re.finditer(
r'<a\b[^>]+\bhref=(["\'])(?P<path>/raiplay/video/.+?)\1',
webpage):
video_url = urljoin(url, mobj.group('path'))
entries.append(self.url_result(
video_url, ie=RaiPlayIE.ie_key(),
video_id=RaiPlayIE._match_id(video_url)))
return self.playlist_result(entries, playlist_id, title, description)
class RaiIE(RaiBaseIE): class RaiIE(RaiBaseIE):
_VALID_URL = r'https?://[^/]+\.(?:rai\.(?:it|tv)|rainews\.it)/dl/.+?-(?P<id>%s)(?:-.+?)?\.html' % RaiBaseIE._UUID_RE _VALID_URL = r'https?://[^/]+\.(?:rai\.(?:it|tv)|rainews\.it)/dl/.+?-(?P<id>%s)(?:-.+?)?\.html' % RaiBaseIE._UUID_RE
_TESTS = [{ _TESTS = [{

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import smuggle_url
class SonyLIVIE(InfoExtractor): class SonyLIVIE(InfoExtractor):
@@ -10,12 +11,12 @@ class SonyLIVIE(InfoExtractor):
'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight", 'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight",
'info_dict': { 'info_dict': {
'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight", 'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight",
'id': '5024612095001', 'id': 'ref:5024612095001',
'ext': 'mp4', 'ext': 'mp4',
'upload_date': '20160707', 'upload_date': '20170923',
'description': 'md5:7f28509a148d5be9d0782b4d5106410d', 'description': 'md5:7f28509a148d5be9d0782b4d5106410d',
'uploader_id': '4338955589001', 'uploader_id': '5182475815001',
'timestamp': 1467870968, 'timestamp': 1506200547,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@@ -26,9 +27,11 @@ class SonyLIVIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s' # BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5182475815001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url): def _real_extract(self, url):
brightcove_id = self._match_id(url) brightcove_id = self._match_id(url)
return self.url_result( return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id) smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['IN']}),
'BrightcoveNew', brightcove_id)

View File

@@ -0,0 +1,48 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class StretchInternetIE(InfoExtractor):
_VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
_TEST = {
'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
'info_dict': {
'id': '313900',
'ext': 'mp4',
'title': 'Augustana (S.D.) Baseball vs University of Mary',
'description': 'md5:7578478614aae3bdd4a90f578f787438',
'timestamp': 1490468400,
'upload_date': '20170325',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
stream = self._download_json(
'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
% video_id, video_id)
video_url = 'https://%s' % stream['source']
event = self._download_json(
'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
video_id, query={
'clientID': 99997,
'eventID': video_id,
'token': 'asdf',
})['event']
title = event.get('title') or event['mobileTitle']
description = event.get('customText')
timestamp = int_or_none(event.get('longtime'))
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'url': video_url,
}

View File

@@ -1,6 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
@@ -41,7 +43,7 @@ class TouTvIE(InfoExtractor):
email, password = self._get_login_info() email, password = self._get_login_info()
if email is None: if email is None:
return return
state = 'http://ici.tou.tv//' state = 'http://ici.tou.tv/'
webpage = self._download_webpage(state, None, 'Downloading homepage') webpage = self._download_webpage(state, None, 'Downloading homepage')
toutvlogin = self._parse_json(self._search_regex( toutvlogin = self._parse_json(self._search_regex(
r'(?s)toutvlogin\s*=\s*({.+?});', webpage, 'toutvlogin'), None, js_to_json) r'(?s)toutvlogin\s*=\s*({.+?});', webpage, 'toutvlogin'), None, js_to_json)
@@ -54,16 +56,30 @@ class TouTvIE(InfoExtractor):
'scope': 'media-drmt openid profile email id.write media-validation.read.privileged', 'scope': 'media-drmt openid profile email id.write media-validation.read.privileged',
'state': state, 'state': state,
}) })
login_form = self._search_regex(
r'(?s)(<form[^>]+(?:id|name)="Form-login".+?</form>)', login_webpage, 'login form') def extract_form_url_and_data(wp, default_form_url, form_spec_re=''):
form_data = self._hidden_inputs(login_form) form, form_elem = re.search(
r'(?s)((<form[^>]+?%s[^>]*?>).+?</form>)' % form_spec_re, wp).groups()
form_data = self._hidden_inputs(form)
form_url = extract_attributes(form_elem).get('action') or default_form_url
return form_url, form_data
post_url, form_data = extract_form_url_and_data(
login_webpage,
'https://services.radio-canada.ca/auth/oauth/v2/authorize/login',
r'(?:id|name)="Form-login"')
form_data.update({ form_data.update({
'login-email': email, 'login-email': email,
'login-password': password, 'login-password': password,
}) })
post_url = extract_attributes(login_form).get('action') or authorize_url consent_webpage = self._download_webpage(
_, urlh = self._download_webpage_handle(
post_url, None, 'Logging in', data=urlencode_postdata(form_data)) post_url, None, 'Logging in', data=urlencode_postdata(form_data))
post_url, form_data = extract_form_url_and_data(
consent_webpage,
'https://services.radio-canada.ca/auth/oauth/v2/authorize/consent')
_, urlh = self._download_webpage_handle(
post_url, None, 'Following Redirection',
data=urlencode_postdata(form_data))
self._access_token = self._search_regex( self._access_token = self._search_regex(
r'access_token=([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})', r'access_token=([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
urlh.geturl(), 'access token') urlh.geturl(), 'access token')

View File

@@ -43,7 +43,7 @@ class TwitterBaseIE(InfoExtractor):
class TwitterCardIE(TwitterBaseIE): class TwitterCardIE(TwitterBaseIE):
IE_NAME = 'twitter:card' IE_NAME = 'twitter:card'
_VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?:cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?P<path>cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)'
_TESTS = [ _TESTS = [
{ {
'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889', 'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
@@ -51,11 +51,10 @@ class TwitterCardIE(TwitterBaseIE):
'info_dict': { 'info_dict': {
'id': '560070183650213889', 'id': '560070183650213889',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Twitter Card', 'title': 'Twitter web player',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 30.033, 'duration': 30.033,
}, },
'skip': 'Video gone',
}, },
{ {
'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768', 'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768',
@@ -63,11 +62,9 @@ class TwitterCardIE(TwitterBaseIE):
'info_dict': { 'info_dict': {
'id': '623160978427936768', 'id': '623160978427936768',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Twitter Card', 'title': 'Twitter web player',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*(?:\bformat=|\.)jpg',
'duration': 80.155,
}, },
'skip': 'Video gone',
}, },
{ {
'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977', 'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
@@ -120,15 +117,15 @@ class TwitterCardIE(TwitterBaseIE):
elif media_url.endswith('.mpd'): elif media_url.endswith('.mpd'):
formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash')) formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash'))
else: else:
vbr = int_or_none(dict_get(media_variant, ('bitRate', 'bitrate')), scale=1000) tbr = int_or_none(dict_get(media_variant, ('bitRate', 'bitrate')), scale=1000)
a_format = { a_format = {
'url': media_url, 'url': media_url,
'format_id': 'http-%d' % vbr if vbr else 'http', 'format_id': 'http-%d' % tbr if tbr else 'http',
'vbr': vbr, 'tbr': tbr,
} }
# Reported bitRate may be zero # Reported bitRate may be zero
if not a_format['vbr']: if not a_format['tbr']:
del a_format['vbr'] del a_format['tbr']
self._search_dimensions_in_video_url(a_format, media_url) self._search_dimensions_in_video_url(a_format, media_url)
@@ -150,79 +147,83 @@ class TwitterCardIE(TwitterBaseIE):
bearer_token = self._search_regex( bearer_token = self._search_regex(
r'BEARER_TOKEN\s*:\s*"([^"]+)"', r'BEARER_TOKEN\s*:\s*"([^"]+)"',
main_script, 'bearer token') main_script, 'bearer token')
guest_token = self._search_regex( # https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-show-id
r'document\.cookie\s*=\s*decodeURIComponent\("gt=(\d+)',
webpage, 'guest token')
api_data = self._download_json( api_data = self._download_json(
'https://api.twitter.com/2/timeline/conversation/%s.json' % video_id, 'https://api.twitter.com/1.1/statuses/show/%s.json' % video_id,
video_id, 'Downloading mobile API data', video_id, 'Downloading API data',
headers={ headers={
'Authorization': 'Bearer ' + bearer_token, 'Authorization': 'Bearer ' + bearer_token,
'x-guest-token': guest_token,
}) })
media_info = try_get(api_data, lambda o: o['globalObjects']['tweets'][video_id] media_info = try_get(api_data, lambda o: o['extended_entities']['media'][0]['video_info']) or {}
['extended_entities']['media'][0]['video_info']) or {}
return self._parse_media_info(media_info, video_id) return self._parse_media_info(media_info, video_id)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) path, video_id = re.search(self._VALID_URL, url).groups()
config = None config = None
formats = [] formats = []
duration = None duration = None
webpage = self._download_webpage(url, video_id) urls = [url]
if path.startswith('cards/'):
urls.append('https://twitter.com/i/videos/' + video_id)
iframe_url = self._html_search_regex( for u in urls:
r'<iframe[^>]+src="((?:https?:)?//(?:www\.youtube\.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"', webpage = self._download_webpage(u, video_id)
webpage, 'video iframe', default=None)
if iframe_url:
return self.url_result(iframe_url)
config = self._parse_json(self._html_search_regex( iframe_url = self._html_search_regex(
r'data-(?:player-)?config="([^"]+)"', webpage, r'<iframe[^>]+src="((?:https?:)?//(?:www\.youtube\.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',
'data player config', default='{}'), webpage, 'video iframe', default=None)
video_id) if iframe_url:
return self.url_result(iframe_url)
if config.get('source_type') == 'vine': config = self._parse_json(self._html_search_regex(
return self.url_result(config['player_url'], 'Vine') r'data-(?:player-)?config="([^"]+)"', webpage,
'data player config', default='{}'),
video_id)
periscope_url = PeriscopeIE._extract_url(webpage) if config.get('source_type') == 'vine':
if periscope_url: return self.url_result(config['player_url'], 'Vine')
return self.url_result(periscope_url, PeriscopeIE.ie_key())
video_url = config.get('video_url') or config.get('playlist', [{}])[0].get('source') periscope_url = PeriscopeIE._extract_url(webpage)
if periscope_url:
return self.url_result(periscope_url, PeriscopeIE.ie_key())
if video_url: video_url = config.get('video_url') or config.get('playlist', [{}])[0].get('source')
if determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls'))
else:
f = {
'url': video_url,
}
self._search_dimensions_in_video_url(f, video_url) if video_url:
if determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls'))
else:
f = {
'url': video_url,
}
formats.append(f) self._search_dimensions_in_video_url(f, video_url)
vmap_url = config.get('vmapUrl') or config.get('vmap_url') formats.append(f)
if vmap_url:
formats.extend(
self._extract_formats_from_vmap_url(vmap_url, video_id))
media_info = None vmap_url = config.get('vmapUrl') or config.get('vmap_url')
if vmap_url:
formats.extend(
self._extract_formats_from_vmap_url(vmap_url, video_id))
for entity in config.get('status', {}).get('entities', []): media_info = None
if 'mediaInfo' in entity:
media_info = entity['mediaInfo']
if media_info: for entity in config.get('status', {}).get('entities', []):
formats.extend(self._parse_media_info(media_info, video_id)) if 'mediaInfo' in entity:
duration = float_or_none(media_info.get('duration', {}).get('nanos'), scale=1e9) media_info = entity['mediaInfo']
username = config.get('user', {}).get('screen_name') if media_info:
if username: formats.extend(self._parse_media_info(media_info, video_id))
formats.extend(self._extract_mobile_formats(username, video_id)) duration = float_or_none(media_info.get('duration', {}).get('nanos'), scale=1e9)
username = config.get('user', {}).get('screen_name')
if username:
formats.extend(self._extract_mobile_formats(username, video_id))
if formats:
break
self._remove_duplicate_formats(formats) self._remove_duplicate_formats(formats)
self._sort_formats(formats) self._sort_formats(formats)
@@ -258,9 +259,6 @@ class TwitterIE(InfoExtractor):
'uploader_id': 'freethenipple', 'uploader_id': 'freethenipple',
'duration': 12.922, 'duration': 12.922,
}, },
'params': {
'skip_download': True, # requires ffmpeg
},
}, { }, {
'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1', 'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1',
'md5': 'f36dcd5fb92bf7057f155e7d927eeb42', 'md5': 'f36dcd5fb92bf7057f155e7d927eeb42',
@@ -277,7 +275,6 @@ class TwitterIE(InfoExtractor):
'skip': 'Account suspended', 'skip': 'Account suspended',
}, { }, {
'url': 'https://twitter.com/starwars/status/665052190608723968', 'url': 'https://twitter.com/starwars/status/665052190608723968',
'md5': '39b7199856dee6cd4432e72c74bc69d4',
'info_dict': { 'info_dict': {
'id': '665052190608723968', 'id': '665052190608723968',
'ext': 'mp4', 'ext': 'mp4',
@@ -303,20 +300,16 @@ class TwitterIE(InfoExtractor):
}, },
}, { }, {
'url': 'https://twitter.com/jaydingeer/status/700207533655363584', 'url': 'https://twitter.com/jaydingeer/status/700207533655363584',
'md5': '',
'info_dict': { 'info_dict': {
'id': '700207533655363584', 'id': '700207533655363584',
'ext': 'mp4', 'ext': 'mp4',
'title': 'あかさ - BEAT PROD: @suhmeduh #Damndaniel', 'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
'description': 'あかさ on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"', 'description': 'JG on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'uploader': 'あかさ', 'uploader': 'JG',
'uploader_id': 'jaydingeer', 'uploader_id': 'jaydingeer',
'duration': 30.0, 'duration': 30.0,
}, },
'params': {
'skip_download': True, # requires ffmpeg
},
}, { }, {
'url': 'https://twitter.com/Filmdrunk/status/713801302971588609', 'url': 'https://twitter.com/Filmdrunk/status/713801302971588609',
'md5': '89a15ed345d13b86e9a5a5e051fa308a', 'md5': '89a15ed345d13b86e9a5a5e051fa308a',
@@ -342,9 +335,6 @@ class TwitterIE(InfoExtractor):
'uploader': 'Captain America', 'uploader': 'Captain America',
'duration': 3.17, 'duration': 3.17,
}, },
'params': {
'skip_download': True, # requires ffmpeg
},
}, { }, {
'url': 'https://twitter.com/OPP_HSD/status/779210622571536384', 'url': 'https://twitter.com/OPP_HSD/status/779210622571536384',
'info_dict': { 'info_dict': {
@@ -370,9 +360,6 @@ class TwitterIE(InfoExtractor):
'uploader_id': 'news_al3alm', 'uploader_id': 'news_al3alm',
'duration': 277.4, 'duration': 277.4,
}, },
'params': {
'format': 'best[format_id^=http-]',
},
}, { }, {
'url': 'https://twitter.com/i/web/status/910031516746514432', 'url': 'https://twitter.com/i/web/status/910031516746514432',
'info_dict': { 'info_dict': {

View File

@@ -62,11 +62,11 @@ class UdemyIE(InfoExtractor):
def _extract_course_info(self, webpage, video_id): def _extract_course_info(self, webpage, video_id):
course = self._parse_json( course = self._parse_json(
unescapeHTML(self._search_regex( unescapeHTML(self._search_regex(
r'ng-init=["\'].*\bcourse=({.+?});', webpage, 'course', default='{}')), r'ng-init=["\'].*\bcourse=({.+?})[;"\']',
webpage, 'course', default='{}')),
video_id, fatal=False) or {} video_id, fatal=False) or {}
course_id = course.get('id') or self._search_regex( course_id = course.get('id') or self._search_regex(
(r'&quot;id&quot;\s*:\s*(\d+)', r'data-course-id=["\'](\d+)'), r'data-course-id=["\'](\d+)', webpage, 'course id')
webpage, 'course id')
return course_id, course.get('title') return course_id, course.get('title')
def _enroll_course(self, base_url, webpage, course_id): def _enroll_course(self, base_url, webpage, course_id):
@@ -257,6 +257,11 @@ class UdemyIE(InfoExtractor):
video_url = source.get('file') or source.get('src') video_url = source.get('file') or source.get('src')
if not video_url or not isinstance(video_url, compat_str): if not video_url or not isinstance(video_url, compat_str):
continue continue
if source.get('type') == 'application/x-mpegURL' or determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
format_id = source.get('label') format_id = source.get('label')
f = { f = {
'url': video_url, 'url': video_url,

View File

@@ -75,6 +75,10 @@ class XHamsterIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# mobile site
'url': 'https://m.xhamster.com/videos/cute-teen-jacqueline-solo-masturbation-8559111',
'only_matching': True,
}, { }, {
'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html', 'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
'only_matching': True, 'only_matching': True,
@@ -93,7 +97,8 @@ class XHamsterIE(InfoExtractor):
video_id = mobj.group('id') or mobj.group('id_2') video_id = mobj.group('id') or mobj.group('id_2')
display_id = mobj.group('display_id') or mobj.group('display_id_2') display_id = mobj.group('display_id') or mobj.group('display_id_2')
webpage = self._download_webpage(url, video_id) desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
webpage = self._download_webpage(desktop_url, video_id)
error = self._html_search_regex( error = self._html_search_regex(
r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>', r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
@@ -229,8 +234,8 @@ class XHamsterIE(InfoExtractor):
webpage, 'uploader', default='anonymous') webpage, 'uploader', default='anonymous')
thumbnail = self._search_regex( thumbnail = self._search_regex(
[r'''thumb\s*:\s*(?P<q>["'])(?P<thumbnail>.+?)(?P=q)''', [r'''["']thumbUrl["']\s*:\s*(?P<q>["'])(?P<thumbnail>.+?)(?P=q)''',
r'''<video[^>]+poster=(?P<q>["'])(?P<thumbnail>.+?)(?P=q)[^>]*>'''], r'''<video[^>]+"poster"=(?P<q>["'])(?P<thumbnail>.+?)(?P=q)[^>]*>'''],
webpage, 'thumbnail', fatal=False, group='thumbnail') webpage, 'thumbnail', fatal=False, group='thumbnail')
duration = parse_duration(self._search_regex( duration = parse_duration(self._search_regex(
@@ -274,15 +279,16 @@ class XHamsterIE(InfoExtractor):
class XHamsterEmbedIE(InfoExtractor): class XHamsterEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?xhamster\.com/xembed\.php\?video=(?P<id>\d+)' _VALID_URL = r'https?://(?:.+?\.)?xhamster\.com/xembed\.php\?video=(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://xhamster.com/xembed.php?video=3328539', 'url': 'http://xhamster.com/xembed.php?video=3328539',
'info_dict': { 'info_dict': {
'id': '3328539', 'id': '3328539',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Pen Masturbation', 'title': 'Pen Masturbation',
'timestamp': 1406581861,
'upload_date': '20140728', 'upload_date': '20140728',
'uploader_id': 'anonymous', 'uploader': 'ManyakisArt',
'duration': 5, 'duration': 5,
'age_limit': 18, 'age_limit': 18,
} }

View File

@@ -2350,6 +2350,7 @@ def mimetype2ext(mt):
'ttml+xml': 'ttml', 'ttml+xml': 'ttml',
'x-flv': 'flv', 'x-flv': 'flv',
'x-mp4-fragmented': 'mp4', 'x-mp4-fragmented': 'mp4',
'x-ms-sami': 'sami',
'x-ms-wmv': 'wmv', 'x-ms-wmv': 'wmv',
'mpegurl': 'm3u8', 'mpegurl': 'm3u8',
'x-mpegurl': 'm3u8', 'x-mpegurl': 'm3u8',

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.12.02' __version__ = '2017.12.10'