Compare commits

50 Commits

Author SHA1 Message Date
Sergey M․
5c4bfd4da5 release 2016.10.12 2016-10-12 21:30:05 +07:00
Sergey M․
7104ae799c [ChangeLog] Actualize 2016-10-12 21:25:04 +07:00
Sergey M․
bcd6276520 [downloader/common] Remove debug output 2016-10-12 21:22:33 +07:00
Sergey M․
591e384552 [streamable] Remove debug output 2016-10-12 21:22:12 +07:00
Yen Chi Hsuan
9feb1c9731 [dailymotion] Fix extraction and update _TESTS
Closes #10901

Seems all videos use player V5 syntax now
2016-10-12 21:45:49 +08:00
Yen Chi Hsuan
a093cfc78b [vimeo:review] Fix extraction (#10900)
Vimeo Review videos now use React. Thanks @davekaro for analyzing the
problem!
2016-10-12 01:48:06 +08:00
Yen Chi Hsuan
6f20b65e72 [test/test_http] Update tests
After switching to HTML5 extraction helpers in generic.py, the result
info_dict is always a playlist.
2016-10-12 01:41:41 +08:00
Yen Chi Hsuan
cea364f70c [extractor/common] Support HTML media elements without child nodes 2016-10-12 01:40:28 +08:00
Yen Chi Hsuan
55642487f0 [nhl] Skip invalid m3u8 formats (closes #10713) 2016-10-11 20:50:52 +08:00
Yen Chi Hsuan
3d643f4cec [hbo] Add HBOEpisodeIE (#10892) 2016-10-11 17:46:52 +08:00
Yen Chi Hsuan
c452e69d3d [footyroom] Fix extraction and update _TESTS (closes #10810) 2016-10-11 17:46:13 +08:00
Yen Chi Hsuan
555787d717 [streamable] Add helper for extracting embedded videos 2016-10-11 17:44:35 +08:00
Yen Chi Hsuan
f165ca70eb [abc.net.au:iview] Fix for non-series videos (closes #10895) 2016-10-11 12:53:27 +08:00
Yen Chi Hsuan
27b8d2ee95 [hbo] Add display_id and another test (#10892) 2016-10-11 12:41:44 +08:00
Yen Chi Hsuan
71cdcb2331 [hbo] Support episode pages (closes #10892) 2016-10-11 12:30:35 +08:00
Yen Chi Hsuan
176006a120 [allocine] Fix for /video/ videos (closes #10860) 2016-10-09 19:42:42 +08:00
Yen Chi Hsuan
65f4c1de3d [allocine] Fix extraction (closes #10860)
I changed the URL of the third test case because the original URL
no longer contains a video, and there's no easy way to get the real
URL from the /film/ one.
2016-10-09 18:58:15 +08:00
Yen Chi Hsuan
b0082629a9 [nextmedia] Support action news (動新聞) on Apple Daily 2016-10-09 18:42:15 +08:00
Yen Chi Hsuan
8204c73352 [Makefile] Fix for GNU make < 4 (closes #9387)
The shell assignment operator != from BSD make was ported to GNU make
in version 4.0, so it does not work in 3.x. I chose to drop BSD make
support, as installing GNU make on *BSD systems is easier than
installing a newer GNU make.
2016-10-09 18:24:45 +08:00
Déstin Reed
2b51dac1f9 [slutload] Fix test and simplify 2016-10-09 01:17:38 +07:00
Sergey M․
f68901e50a [reverbnation] Eliminate code duplication in thumbnails extraction 2016-10-09 01:02:35 +07:00
Déstin Reed
3adb9d119e [reverbnation] Modernize 2016-10-09 01:00:38 +07:00
Remita Amine
1dd58e14d8 [lego] improve info extraction and bypass geo restriction(closes #10872) 2016-10-08 08:33:18 +01:00
Sergey M․
dd4291f729 release 2016.10.07 2016-10-07 22:25:30 +07:00
Sergey M․
888f8d6ba4 [ChangeLog] Actualize 2016-10-07 22:23:16 +07:00
Sergey M․
f475e88121 [vimeo] PEP 8
[ci skip]
2016-10-07 22:15:26 +07:00
Remita Amine
3c6b3bf221 [iprima] detect geo restriction 2016-10-07 15:53:16 +01:00
Yen Chi Hsuan
38588ab977 [facebook] Fix for new handleServerJS syntax (closes #10846)
According to the dump file in #10846, handleServerJS() now accepts
an optional second argument. In the available dump files it is a string.
2016-10-07 20:04:49 +08:00
Yen Chi Hsuan
85bcdd081c [extractors] Add MmsIE 2016-10-07 19:31:26 +08:00
Yen Chi Hsuan
9dcd6fd3aa [generic,commonprotocols] Move mms support from GenericIE
And use _generic_* helpers in those extractors
2016-10-07 19:24:22 +08:00
Yen Chi Hsuan
98763ee354 [extractor/common] Add id and title helpers for generic IEs 2016-10-07 19:20:53 +08:00
Yen Chi Hsuan
3d83a1ae92 [generic] Support direct MMS links (closes #10838) 2016-10-07 17:50:45 +08:00
Yen Chi Hsuan
c0a7b9b348 Revert "[Makefilea] Fix for GNU make < 4"
This reverts commit 831a34caa2.

The reverted commit breaks lazy extractors.
2016-10-07 16:03:34 +08:00
Yen Chi Hsuan
831a34caa2 [Makefilea] Fix for GNU make < 4
Closes #9387

The shell assignment operator != was introduced in GNU make 4.0, or
specifically the commit in [1]. This fix removes such usages and
falls back to a more portable syntax. Tested with:

* GNU make 3.82 on CentOS 7.2
* bmake 20150910 on CentOS 7.2, source RPM from Fedora 24 [2]
* GNU make 4.2.1 on Arch Linux (Arch official package)
* bmake 20160926 on Arch Linux (Arch official package)
* GNU make 3.82 on Arch Linux (Compiled from source)
* Apple bsdmake-24 on macOS Sierra, binary package from Homebrew

Thanks @bdeyal for the feedback on the first tests

[1] http://git.savannah.gnu.org/cgit/make.git/commit/?id=b34438bee83ee906a23b881f257e684a0993b9b1
[2] http://koji.fedoraproject.org/koji/buildinfo?buildID=716769
2016-10-07 03:28:41 +08:00
Sergey M․
09b9c45e24 [generic] Add support for multiple vimeo embeds (Closes #10862) 2016-10-06 23:22:52 +07:00
Remita Amine
33898fb19c [nzz] Add new extractor(#4407) 2016-10-06 10:45:57 +01:00
Remita Amine
017eb82934 [npo] detect geo restriction 2016-10-05 18:27:02 +01:00
Sergey M․
b1d798887e [npo] Add support for 2doc.nl (Closes #10842) 2016-10-05 23:43:08 +07:00
Steffan Donal
0a33bb2cb2 Rename "Steffan 'Ruirize' James" to "Steffan Donal"
Legal name change!
2016-10-05 03:32:14 +07:00
Remita Amine
185744f92f [lego] Add new extractor(closes #10369) 2016-10-04 10:30:57 +01:00
Remita Amine
7232e54813 [tonline] Add new extractor(#10376) 2016-10-04 08:00:25 +01:00
Sergey M․
6eb5503b12 [techtalks] Relax _VALID_URL 2016-10-04 02:54:36 +07:00
Aleksander Nitecki
539c881bfc [techtalks] Allow URL-s with name part omitted. 2016-10-04 02:52:33 +07:00
Sergey M․
c1b2a0858c [youtube:live] Extend _VALID_URL (Closes #10839) 2016-10-04 02:10:23 +07:00
Remita Amine
215ff6e0f3 [theweatherchannel] Add new extractor(closes #7188) 2016-10-03 18:20:34 +01:00
Déstin Reed
dcdb292fdd Unify coding cookie 2016-10-03 23:44:29 +07:00
Remita Amine
c1084ddb0c [thisoldhouse] Add new extractor(closes #10837) 2016-10-03 15:27:09 +01:00
Sergey M․
ee5de4e38e [nhl] Add support for wch2016.com (Closes #10833) 2016-10-03 00:54:02 +07:00
Yen Chi Hsuan
25291b979a Merge pull request #10829 from TRox1972/pornoxo_improve
[pornoxo] Use JWPlatform to improve metadata extraction
2016-10-02 20:19:34 +08:00
Déstin Reed
567a5996ca [pornoxo] Use JWPlatform to improve metadata extraction 2016-10-02 13:07:02 +02:00
132 changed files with 844 additions and 274 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.02**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.12*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.12**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.10.02
[debug] youtube-dl version 2016.10.12
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -26,7 +26,7 @@ Albert Kim
Pierre Rudloff
Huarong Huo
Ismael Mejía
Steffan 'Ruirize' James
Steffan Donal
Andras Elso
Jelle van der Waa
Marcin Cieślak

View File

@@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need
* python
* make (both GNU make and BSD make are supported)
* make (only GNU make is supported)
* pandoc
* zip
* nosetests

View File

@@ -1,3 +1,41 @@
version 2016.10.12
Core
+ Support HTML media elements without child nodes
* [Makefile] Support for GNU make < 4 is fixed; BSD make dropped (#9387)
Extractors
* [dailymotion] Fix extraction (#10901)
* [vimeo:review] Fix extraction (#10900)
* [nhl] Correctly handle invalid formats (#10713)
* [footyroom] Fix extraction (#10810)
* [abc.net.au:iview] Fix for standalone (non series) videos (#10895)
+ [hbo] Add support for episode pages (#10892)
* [allocine] Fix extraction (#10860)
+ [nextmedia] Recognize action news on AppleDaily
* [lego] Improve info extraction and bypass geo restriction (#10872)
version 2016.10.07
Extractors
+ [iprima] Detect geo restriction
* [facebook] Fix video extraction (#10846)
+ [commonprotocols] Support direct MMS links (#10838)
+ [generic] Add support for multiple vimeo embeds (#10862)
+ [nzz] Add support for nzz.ch (#4407)
+ [npo] Detect geo restriction
+ [npo] Add support for 2doc.nl (#10842)
+ [lego] Add support for lego.com (#10369)
+ [tonline] Add support for t-online.de (#10376)
* [techtalks] Relax URL regular expression (#10840)
* [youtube:live] Extend URL regular expression (#10839)
+ [theweatherchannel] Add support for weather.com (#7188)
+ [thisoldhouse] Add support for thisoldhouse.com (#10837)
+ [nhl] Add support for wch2016.com (#10833)
* [pornoxo] Use JWPlatform to improve metadata extraction
version 2016.10.02
Core

View File

@@ -12,7 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
install -d $(DESTDIR)$(BINDIR)
@@ -90,7 +90,7 @@ fish-completion: youtube-dl.fish
lazy-extractors: youtube_dl/extractor/lazy_extractors.py
_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py'
_EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@

View File

@@ -923,7 +923,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need
* python
* make (both GNU make and BSD make are supported)
* make (only GNU make is supported)
* pandoc
* zip
* nosetests

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
#
# youtube-dl documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014.

View File

@@ -289,6 +289,7 @@
- **Groupon**
- **Hark**
- **HBO**
- **HBOEpisode**
- **HearThisAt**
- **Heise**
- **HellPorno**
@@ -364,6 +365,7 @@
- **Le**: 乐视网
- **Learnr**
- **Lecture2Go**
- **LEGO**
- **Lemonde**
- **LePlaylist**
- **LetvCloud**: 乐视云
@@ -507,6 +509,7 @@
- **Nuvid**
- **NYTimes**
- **NYTimesArticle**
- **NZZ**
- **ocw.mit.edu**
- **OdaTV**
- **Odnoklassniki**
@@ -692,6 +695,7 @@
- **SWRMediathek**
- **Syfy**
- **SztvHu**
- **t-online.de**
- **Tagesschau**
- **tagesschau:player**
- **Tass**
@@ -721,8 +725,10 @@
- **TheScene**
- **TheSixtyOne**
- **TheStar**
- **TheWeatherChannel**
- **ThisAmericanLife**
- **ThisAV**
- **ThisOldHouse**
- **tinypic**: tinypic.com videos
- **tlc.de**
- **TMZ**

View File

@@ -1,5 +1,5 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import print_function

View File

@@ -87,7 +87,7 @@ class TestHTTP(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger()})
r = ydl.extract_info('http://localhost:%d/302' % self.port)
self.assertEqual(r['url'], 'http://localhost:%d/vid.mp4' % self.port)
self.assertEqual(r['entries'][0]['url'], 'http://localhost:%d/vid.mp4' % self.port)
class TestHTTPS(unittest.TestCase):
@@ -111,7 +111,7 @@ class TestHTTPS(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
r = ydl.extract_info('https://localhost:%d/video.html' % self.port)
self.assertEqual(r['url'], 'https://localhost:%d/vid.mp4' % self.port)
self.assertEqual(r['entries'][0]['url'], 'https://localhost:%d/vid.mp4' % self.port)
def _build_proxy_handler(name):

View File

@@ -1,5 +1,5 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import absolute_import, unicode_literals

View File

@@ -1,5 +1,5 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals

View File

@@ -346,7 +346,6 @@ class FileDownloader(object):
min_sleep_interval = self.params.get('sleep_interval')
if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
print(min_sleep_interval, max_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval)

View File

@@ -102,16 +102,16 @@ class ABCIViewIE(InfoExtractor):
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'http://iview.abc.net.au/programs/gardening-australia/FA1505V024S00',
'md5': '979d10b2939101f0d27a06b79edad536',
'url': 'http://iview.abc.net.au/programs/diaries-of-a-broken-mind/ZX9735A001S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'info_dict': {
'id': 'FA1505V024S00',
'id': 'ZX9735A001S00',
'ext': 'mp4',
'title': 'Series 27 Ep 24',
'description': 'md5:b28baeae7504d1148e1d2f0e3ed3c15d',
'upload_date': '20160820',
'uploader_id': 'abc1',
'timestamp': 1471719600,
'title': 'Diaries Of A Broken Mind',
'description': 'md5:7de3903874b7a1be279fe6b68718fc9e',
'upload_date': '20161010',
'uploader_id': 'abc2',
'timestamp': 1476064920,
},
'skip': 'Video gone',
}]
@@ -121,7 +121,7 @@ class ABCIViewIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_params = self._parse_json(self._search_regex(
r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
title = video_params['title']
title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
@@ -144,8 +144,8 @@ class ABCIViewIE(InfoExtractor):
'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
'series': video_params.get('seriesTitle'),
'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage)),
'episode': self._html_search_meta('episode_title', webpage),
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage, default=None)),
'episode': self._html_search_meta('episode_title', webpage, default=None),
'uploader_id': video_params.get('channel'),
'formats': formats,
'subtitles': subtitles,

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,29 +1,26 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
remove_end,
qualities,
unescapeHTML,
xpath_element,
url_basename,
)
class AllocineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?:article|video|film)/(?:fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
_TESTS = [{
'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
'md5': '0c9fcf59a841f65635fa300ac43d8269',
'info_dict': {
'id': '19546517',
'display_id': '18635087',
'ext': 'mp4',
'title': 'Astérix - Le Domaine des Dieux Teaser VF',
'description': 'md5:abcd09ce503c6560512c14ebfdb720d2',
'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
'thumbnail': 're:http://.*\.jpg',
},
}, {
@@ -31,64 +28,82 @@ class AllocineIE(InfoExtractor):
'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0',
'info_dict': {
'id': '19540403',
'display_id': '19540403',
'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF',
'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/film/fichefilm_gen_cfilm=181290.html',
'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html',
'md5': '101250fb127ef9ca3d73186ff22a47ce',
'info_dict': {
'id': '19544709',
'display_id': '19544709',
'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:601d15393ac40f249648ef000720e7e3',
'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/video/video-19550147/',
'only_matching': True,
'md5': '3566c0668c0235e2d224fd8edb389f67',
'info_dict': {
'id': '19550147',
'ext': 'mp4',
'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger',
'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354',
'thumbnail': 're:http://.*\.jpg',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
typ = mobj.group('typ')
display_id = mobj.group('id')
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if typ == 'film':
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
if player:
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
else:
model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
model_data = self._parse_json(unescapeHTML(model), display_id)
video_id = compat_str(model_data['id'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xpath_element(xml, './/AcVisionVideo').attrib
formats = []
quality = qualities(['ld', 'md', 'hd'])
formats = []
for k, v in video.items():
if re.match(r'.+_path', k):
format_id = k.split('_')[0]
model = self._html_search_regex(
r'data-model="([^"]+)"', webpage, 'data model', default=None)
if model:
model_data = self._parse_json(model, display_id)
for video_url in model_data['sources'].values():
video_id, format_id = url_basename(video_url).split('_')[:2]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': v,
'url': video_url,
})
title = model_data['title']
else:
video_id = display_id
media_data = self._download_json(
'http://www.allocine.fr/ws/AcVisiondataV5.ashx?media=%s' % video_id, display_id)
for key, value in media_data['video'].items():
if not key.endswith('Path'):
continue
format_id = key[:-len('Path')]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': value,
})
title = remove_end(self._html_search_regex(
r'(?s)<title>(.+?)</title>', webpage, 'title'
).strip(), ' - AlloCiné')
self._sort_formats(formats)
return {
'id': video_id,
'title': video['videoTitle'],
'display_id': display_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'description': self._og_search_description(webpage),
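
Note: the fallback branch in this hunk derives each format ID by stripping the `Path` suffix from a key and ranks it with `qualities`. The following is a minimal standalone sketch of that idea, not the extractor itself. `make_quality_ranker` is a hypothetical stand-in for youtube-dl's `qualities` helper, and the `sample` dict and its URLs are invented for illustration.

```python
def make_quality_ranker(quality_ids):
    """Hypothetical stand-in for the qualities() helper: rank by list position."""
    def rank(quality_id):
        try:
            return quality_ids.index(quality_id)
        except ValueError:
            return -1  # unknown IDs sort before all known ones
    return rank


def formats_from_media(media):
    """Build format dicts from keys that end in 'Path', ordered worst to best."""
    quality = make_quality_ranker(['ld', 'md', 'hd'])
    formats = []
    for key, value in media.items():
        if not key.endswith('Path'):
            continue  # skip metadata such as titles
        format_id = key[:-len('Path')]
        formats.append({
            'format_id': format_id,
            'quality': quality(format_id),
            'url': value,
        })
    return sorted(formats, key=lambda f: f['quality'])


# Invented sample data; real responses may carry other keys.
sample = {
    'hdPath': 'http://example.com/v_hd.mp4',
    'ldPath': 'http://example.com/v_ld.mp4',
    'videoTitle': 'Example',
}
formats = formats_from_media(sample)
```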

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -21,6 +21,7 @@ from ..compat import (
compat_os_name,
compat_str,
compat_urllib_error,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
@@ -1801,7 +1802,11 @@ class InfoExtractor(object):
return is_plain_url, formats
entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage):
media_tags = [(media_tag, media_type, '')
for media_tag, media_type
in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
media_tags.extend(re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
media_info = {
'formats': [],
'subtitles': {},
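
Note: the change in this hunk collects self-closing `<video/>` and `<audio/>` tags first, then appends the paired tags with their inner content. The sketch below reuses the two regular expressions from the hunk in a standalone function. `find_media_tags` and the sample HTML are assumptions made for illustration.

```python
import re


def find_media_tags(webpage):
    """Return (tag, type, content) tuples for self-closing and paired media tags."""
    # Self-closing tags have no child nodes, so their content is empty.
    media_tags = [
        (media_tag, media_type, '')
        for media_tag, media_type
        in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)
    ]
    # Paired tags keep whatever sits between the opening and closing tag.
    media_tags.extend(re.findall(
        r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage))
    return media_tags


html = '<video src="a.mp4"/><audio><source src="b.mp3"></audio>'
tags = find_media_tags(html)
```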
@@ -2020,6 +2025,12 @@ class InfoExtractor(object):
headers['Ytdl-request-proxy'] = geo_verification_proxy
return headers
def _generic_id(self, url):
return compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
def _generic_title(self, url):
return compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
class SearchInfoExtractor(InfoExtractor):
"""

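Note: the two helpers added in this file take the last path component of a URL, drop its extension, and percent-decode the result. The sketch below approximates them with the Python 3 standard library only. `generic_id`, `generic_title`, and the local `url_basename` are stand-ins written for this note and are assumed, not guaranteed, to behave like the youtube-dl originals.

```python
import os.path
from urllib.parse import unquote, urlparse


def url_basename(url):
    """Stand-in: last component of the URL path, ignoring query and fragment."""
    return urlparse(url).path.strip('/').split('/')[-1]


def generic_id(url):
    """Last '/'-separated piece of the raw URL, without extension, decoded."""
    return unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])


def generic_title(url):
    """Basename of the URL path, without extension, decoded."""
    return unquote(os.path.splitext(url_basename(url))[0])


mms_url = 'mms://kentro.kaist.ac.kr/200907/MilesReid(0709).wmv'
video_id = generic_id(mms_url)
title = generic_title(mms_url)
```
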
View File

@@ -1,13 +1,9 @@
from __future__ import unicode_literals
import os
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import url_basename
class RtmpIE(InfoExtractor):
@@ -23,8 +19,8 @@ class RtmpIE(InfoExtractor):
}]
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
video_id = self._generic_id(url)
title = self._generic_title(url)
return {
'id': video_id,
'title': title,
@@ -34,3 +30,31 @@ class RtmpIE(InfoExtractor):
'format_id': compat_urlparse.urlparse(url).scheme,
}],
}
class MmsIE(InfoExtractor):
IE_DESC = False # Do not list
_VALID_URL = r'(?i)mms://.+'
_TEST = {
# Direct MMS link
'url': 'mms://kentro.kaist.ac.kr/200907/MilesReid(0709).wmv',
'info_dict': {
'id': 'MilesReid(0709)',
'ext': 'wmv',
'title': 'MilesReid(0709)',
},
'params': {
'skip_download': True, # rtsp downloads, requiring mplayer or mpv
},
}
def _real_extract(self, url):
video_id = self._generic_id(url)
title = self._generic_title(url)
return {
'id': video_id,
'title': title,
'url': url,
}

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -94,7 +94,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
'uploader': 'HotWaves1012',
'age_limit': 18,
}
},
'skip': 'video gone',
},
# geo-restricted, player v5
{
@@ -144,7 +145,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);'],
r'buildPlayer\(({.+?})\);',
r'var\s+config\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)
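
Note: this hunk adds one more pattern to a list that is tried in order until one matches. A minimal sketch of that "first matching pattern wins" lookup follows. `search_first` is a hypothetical helper, not youtube-dl's `_search_regex`, the braces are escaped here for clarity, and the sample page is invented.

```python
import json
import re

# Two of the patterns from the hunk, with the literal braces escaped.
PLAYER_PATTERNS = [
    r'buildPlayer\((\{.+?\})\);',
    r'var\s+config\s*=\s*(\{.+?\});',
]


def search_first(patterns, text):
    """Return group 1 of the first pattern that matches, or None."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return None


page = 'var config = {"metadata": {"id": "x1"}};'
player = json.loads(search_first(PLAYER_PATTERNS, page))
```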

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import itertools

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -186,7 +186,10 @@ from .comedycentral import (
)
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import RtmpIE
from .commonprotocols import (
MmsIE,
RtmpIE,
)
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .crackle import CrackleIE
@@ -345,7 +348,10 @@ from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hbo import HBOIE
from .hbo import (
HBOIE,
HBOEpisodeIE,
)
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
from .hellporno import HellPornoIE
@@ -437,6 +443,7 @@ from .lcp import (
)
from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE
from .lego import LEGOIE
from .lemonde import LemondeIE
from .leeco import (
LeIE,
@@ -637,6 +644,7 @@ from .nytimes import (
NYTimesArticleIE,
)
from .nuvid import NuvidIE
from .nzz import NZZIE
from .odatv import OdaTVIE
from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE
@@ -890,8 +898,10 @@ from .theplatform import (
from .thescene import TheSceneIE
from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .theweatherchannel import TheWeatherChannelIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE
from .threeqsdn import ThreeQSDNIE
from .tinypic import TinyPicIE
from .tlc import TlcDeIE
@@ -906,6 +916,7 @@ from .tnaflix import (
MovieFapIE,
)
from .toggle import ToggleIE
from .tonline import TOnlineIE
from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE

View File

@@ -258,7 +258,7 @@ class FacebookIE(InfoExtractor):
if not video_data:
server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})\);', webpage, 'server js data', default='{}'), video_id)
r'handleServerJS\(({.+})(?:\);|,")', webpage, 'server js data', default='{}'), video_id)
for item in server_js_data.get('instances', []):
if item[1][0] == 'VideoConfig':
video_data = video_data_list2dict(item[2][0]['videoData'])
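
Note: the updated expression accepts either `);` or `,"` after the JSON object, so a call with a second string argument still matches. The sketch below exercises both shapes. `extract_server_js` and the two sample strings are assumptions for illustration, not captured Facebook output.

```python
import json
import re

# Same expression as in the hunk, with the literal braces escaped.
SERVER_JS_RE = r'handleServerJS\((\{.+\})(?:\);|,")'


def extract_server_js(webpage):
    """Parse the first argument of handleServerJS(), or return {} if absent."""
    match = re.search(SERVER_JS_RE, webpage)
    return json.loads(match.group(1)) if match else {}


old_style = 'handleServerJS({"instances": []});'
new_style = 'handleServerJS({"instances": []},"extra");'
old_data = extract_server_js(old_style)
new_data = extract_server_js(new_style)
```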

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -2,25 +2,27 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .streamable import StreamableIE
class FootyRoomIE(InfoExtractor):
_VALID_URL = r'https?://footyroom\.com/(?P<id>[^/]+)'
_VALID_URL = r'https?://footyroom\.com/matches/(?P<id>\d+)'
_TESTS = [{
'url': 'http://footyroom.com/schalke-04-0-2-real-madrid-2015-02/',
'url': 'http://footyroom.com/matches/79922154/hull-city-vs-chelsea/review',
'info_dict': {
'id': 'schalke-04-0-2-real-madrid-2015-02',
'title': 'Schalke 04 0 2 Real Madrid',
'id': '79922154',
'title': 'VIDEO Hull City 0 - 2 Chelsea',
},
'playlist_count': 3,
'skip': 'Video for this match is not available',
'playlist_count': 2,
'add_ie': [StreamableIE.ie_key()],
}, {
'url': 'http://footyroom.com/georgia-0-2-germany-2015-03/',
'url': 'http://footyroom.com/matches/75817984/georgia-vs-germany/review',
'info_dict': {
'id': 'georgia-0-2-germany-2015-03',
'title': 'Georgia 0 2 Germany',
'id': '75817984',
'title': 'VIDEO Georgia 0 - 2 Germany',
},
'playlist_count': 1,
'add_ie': ['Playwire']
}]
def _real_extract(self, url):
@@ -28,9 +30,8 @@ class FootyRoomIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id)
playlist = self._parse_json(
self._search_regex(
r'VideoSelector\.load\((\[.+?\])\);', webpage, 'video selector'),
playlist = self._parse_json(self._search_regex(
r'DataStore\.media\s*=\s*([^;]+)', webpage, 'media data'),
playlist_id)
playlist_title = self._og_search_title(webpage)
@@ -40,11 +41,16 @@ class FootyRoomIE(InfoExtractor):
payload = video.get('payload')
if not payload:
continue
playwire_url = self._search_regex(
playwire_url = self._html_search_regex(
r'data-config="([^"]+)"', payload,
'playwire url', default=None)
if playwire_url:
entries.append(self.url_result(self._proto_relative_url(
playwire_url, 'http:'), 'Playwire'))
streamable_url = StreamableIE._extract_url(payload)
if streamable_url:
entries.append(self.url_result(
streamable_url, StreamableIE.ie_key()))
return self.playlist_result(entries, playlist_id, playlist_title)

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
@@ -27,7 +27,6 @@ from ..utils import (
unified_strdate,
unsmuggle_url,
UnsupportedError,
url_basename,
xpath_text,
)
from .brightcove import (
@@ -1549,7 +1548,7 @@ class GenericIE(InfoExtractor):
force_videoid = smuggled_data['force_videoid']
video_id = force_videoid
else:
video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
video_id = self._generic_id(url)
self.to_screen('%s: Requesting header' % video_id)
@@ -1578,7 +1577,7 @@ class GenericIE(InfoExtractor):
info_dict = {
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'title': self._generic_title(url),
'upload_date': unified_strdate(head_response.headers.get('Last-Modified'))
}
@@ -1754,9 +1753,9 @@ class GenericIE(InfoExtractor):
if matches:
return _playlist_from_matches(matches, ie='RtlNl')
vimeo_url = VimeoIE._extract_vimeo_url(url, webpage)
if vimeo_url is not None:
return self.url_result(vimeo_url)
vimeo_urls = VimeoIE._extract_urls(url, webpage)
if vimeo_urls:
return _playlist_from_matches(vimeo_urls, ie=VimeoIE.ie_key())
vid_me_embed_url = self._search_regex(
r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -12,17 +12,7 @@ from ..utils import (
)
class HBOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
'md5': '1c33253f0c7782142c993c0ba62a8753',
'info_dict': {
'id': '1437839',
'ext': 'mp4',
'title': 'Ep. 64 Clip: Encryption',
}
}
class HBOBaseIE(InfoExtractor):
_FORMATS_INFO = {
'1920': {
'width': 1280,
@@ -50,8 +40,7 @@ class HBOIE(InfoExtractor):
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
def _extract_from_id(self, video_id):
video_data = self._download_xml(
'http://render.lv3.hbo.com/data/content/global/videos/data/%s.xml' % video_id, video_id)
title = xpath_text(video_data, 'title', 'title', True)
@@ -116,7 +105,60 @@ class HBOIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'duration': parse_duration(xpath_element(video_data, 'duration/tv14')),
'duration': parse_duration(xpath_text(video_data, 'duration/tv14')),
'formats': formats,
'thumbnails': thumbnails,
}
class HBOIE(HBOBaseIE):
_VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
'md5': '1c33253f0c7782142c993c0ba62a8753',
'info_dict': {
'id': '1437839',
'ext': 'mp4',
'title': 'Ep. 64 Clip: Encryption',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 1072,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_from_id(video_id)
class HBOEpisodeIE(HBOBaseIE):
_VALID_URL = r'https?://(?:www\.)?hbo\.com/(?!video)([^/]+/)+video/(?P<id>[0-9a-z-]+)\.html'
_TESTS = [{
'url': 'http://www.hbo.com/girls/episodes/5/52-i-love-you-baby/video/ep-52-inside-the-episode.html?autoplay=true',
'md5': '689132b253cc0ab7434237fc3a293210',
'info_dict': {
'id': '1439518',
'display_id': 'ep-52-inside-the-episode',
'ext': 'mp4',
'title': 'Ep. 52: Inside the Episode',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 240,
},
}, {
'url': 'http://www.hbo.com/game-of-thrones/about/video/season-5-invitation-to-the-set.html?autoplay=true',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'(?P<q1>[\'"])videoId(?P=q1)\s*:\s*(?P<q2>[\'"])(?P<video_id>\d+)(?P=q2)',
webpage, 'video ID', group='video_id')
info_dict = self._extract_from_id(video_id)
info_dict['display_id'] = display_id
return info_dict
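Note on the `videoId` lookup in HBOEpisodeIE: the named groups `q1` and `q2` require each quoted token to be closed by the same quote character that opened it. The sketch below uses only the standard `re` module; the helper name `find_video_id` and the sample page fragments are made up for illustration and are not part of the extractor.

```python
import re

# Same pattern as in the diff above: q1 guards the quotes around the key,
# q2 guards the quotes around the numeric value.
VIDEO_ID_RE = (
    r'(?P<q1>[\'"])videoId(?P=q1)\s*:\s*'
    r'(?P<q2>[\'"])(?P<video_id>\d+)(?P=q2)')


def find_video_id(webpage):
    """Return the numeric video ID found in the page text, or None."""
    mobj = re.search(VIDEO_ID_RE, webpage)
    return mobj.group('video_id') if mobj else None


# Hypothetical page fragments (not taken from hbo.com).
sample_double = '{"videoId": "1439518", "autoplay": true}'
sample_single = "{'videoId' : '1439518'}"
sample_mismatch = '{"videoId\': "1439518"}'  # key opened with " but closed with '
```

A key whose quotes do not match, as in `sample_mismatch`, is rejected rather than half-matched.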

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -81,6 +81,9 @@ class IPrimaIE(InfoExtractor):
for _, src in re.findall(r'src["\']\s*:\s*(["\'])(.+?)\1', playerpage):
extract_formats(src)
if not formats and '>GEO_IP_NOT_ALLOWED<' in playerpage:
self.raise_geo_restricted()
self._sort_formats(formats)
return {

View File

@@ -1,4 +1,4 @@
# coding=utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import json

View File

@@ -0,0 +1,128 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
unescapeHTML,
parse_duration,
get_element_by_class,
)
class LEGOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
_TESTS = [{
'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
'md5': 'f34468f176cfd76488767fc162c405fa',
'info_dict': {
'id': '55492d823b1b4d5e985787fa8c2973b1',
'ext': 'mp4',
'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
},
}, {
# geo-restricted, but the contentUrl contains a valid URL
'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
'info_dict': {
'id': '13bdc2299ab24d9685701a915b3d71e7',
'ext': 'mp4',
'title': 'Aflevering 20 - Helden van het koninkrijk',
'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
},
}, {
# special characters in title
'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
'info_dict': {
'id': '9685ee9d12e84ff38e84b4e3d0db533d',
'ext': 'mp4',
'title': 'Force Surprise LEGO® Star Wars™ Microfighters',
'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
},
'params': {
'skip_download': True,
},
}]
_BITRATES = [256, 512, 1024, 1536, 2560]
def _real_extract(self, url):
locale, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
title = get_element_by_class('video-header', webpage).strip()
progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
streaming_base = 'http://legoprod-f.akamaihd.net/'
content_url = self._html_search_meta('contentUrl', webpage)
path = self._search_regex(
r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
content_url, 'video path', default=None)
if not path:
player_url = self._proto_relative_url(self._search_regex(
r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
webpage, 'player url', default=None))
if not player_url:
base_url = self._proto_relative_url(self._search_regex(
r'data-baseurl="([^"]+)"', webpage, 'base url',
default='http://www.lego.com/%s/mediaplayer/video/' % locale))
player_url = base_url + video_id
player_webpage = self._download_webpage(player_url, video_id)
video_data = self._parse_json(unescapeHTML(self._search_regex(
r"video='([^']+)'", player_webpage, 'video data')), video_id)
progressive_base = self._search_regex(
r'data-video-progressive-url="([^"]+)"',
player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
streaming_base = self._search_regex(
r'data-video-streaming-url="([^"]+)"',
player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
item_id = video_data['ItemId']
net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
path = '/'.join([net_storage_path, base_path])
streaming_path = ','.join(map(compat_str, self._BITRATES))
formats = self._extract_akamai_formats(
'%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id)
m3u8_formats = list(filter(
lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
formats))
if len(m3u8_formats) == len(self._BITRATES):
self._sort_formats(m3u8_formats)
for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats):
progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate)
mp4_f = m3u8_format.copy()
mp4_f.update({
'url': progressive_base_url + 'mp4',
'format_id': m3u8_format['format_id'].replace('hls', 'mp4'),
'protocol': 'http',
})
web_f = {
'url': progressive_base_url + 'webm',
'format_id': m3u8_format['format_id'].replace('hls', 'webm'),
'width': m3u8_format['width'],
'height': m3u8_format['height'],
'tbr': m3u8_format.get('tbr'),
'ext': 'webm',
}
formats.extend([web_f, mp4_f])
else:
for bitrate in self._BITRATES:
for ext in ('webm', 'mp4'):
formats.append({
'format_id': '%s-%s' % (ext, bitrate),
'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
'tbr': bitrate,
'ext': ext,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': self._html_search_meta('description', webpage),
'thumbnail': self._html_search_meta('thumbnail', webpage),
'duration': parse_duration(self._html_search_meta('duration', webpage)),
'formats': formats,
}
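Note on how LEGOIE assembles its URLs: the storage path is derived from the player JSON, then reused for both the Akamai HLS master playlist and the per-bitrate progressive files. The following is a minimal standalone sketch of that string assembly only. The function names and the sample `video_data` values are hypothetical, `str` stands in for `compat_str`, and the sketch assumes the progressive WebM files use the `.webm` extension.

```python
BITRATES = [256, 512, 1024, 1536, 2560]


def build_path(video_data):
    """Join the storage prefix and the file base name, as in the diff above."""
    item_id = video_data['ItemId']
    # Fall back to the first two character pairs of the item ID.
    net_storage_path = (video_data.get('NetStoragePath')
                        or '/'.join([item_id[:2], item_id[2:4]]))
    base_path = '_'.join([
        item_id, video_data['VideoId'], video_data['Locale'],
        str(video_data['VideoVersion'])])
    return '/'.join([net_storage_path, base_path])


def build_urls(streaming_base, progressive_base, path, bitrates=BITRATES):
    """Return (HLS master URL, list of progressive URLs)."""
    streaming_path = ','.join(str(b) for b in bitrates)
    master = '%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (
        streaming_base, path, streaming_path)
    progressive = [
        '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext)
        for bitrate in bitrates
        for ext in ('webm', 'mp4')]
    return master, progressive


# Hypothetical player data, for illustration only.
sample = {'ItemId': '55492d82', 'VideoId': 'abc',
          'Locale': 'en-us', 'VideoVersion': 2}
path = build_path(sample)
master, progressive = build_urls(
    'http://legoprod-f.akamaihd.net/',
    'https://lc-mediaplayerns-live-s.legocdn.com/', path)
```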

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -93,7 +93,7 @@ class NextMediaActionNewsIE(NextMediaIE):
class AppleDailyIE(NextMediaIE):
IE_DESC = '臺灣蘋果日報'
_VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews|actionnews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_TESTS = [{
'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -154,6 +154,9 @@ class AppleDailyIE(NextMediaIE):
'description': 'md5:7b859991a6a4fedbdf3dd3b66545c748',
'upload_date': '20140417',
},
}, {
'url': 'http://www.appledaily.com.tw/actionnews/appledaily/7/20161003/960588/',
'only_matching': True,
}]
_URL_PATTERN = r'\{url: \'(.+)\'\}'

View File

@@ -245,7 +245,11 @@ class NHLVideocenterCategoryIE(NHLBaseInfoExtractor):
class NHLIE(InfoExtractor):
IE_NAME = 'nhl.com'
_VALID_URL = r'https?://(?:www\.)?nhl\.com/([^/]+/)*c-(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?(?P<site>nhl|wch2016)\.com/(?:[^/]+/)*c-(?P<id>\d+)'
_SITES_MAP = {
'nhl': 'nhl',
'wch2016': 'wch',
}
_TESTS = [{
# type=video
'url': 'https://www.nhl.com/video/anisimov-cleans-up-mess/t-277752844/c-43663503',
@@ -270,13 +274,32 @@ class NHLIE(InfoExtractor):
'upload_date': '20160204',
'timestamp': 1454544904,
},
}, {
# Some m3u8 URLs are invalid (https://github.com/rg3/youtube-dl/issues/10713)
'url': 'https://www.nhl.com/predators/video/poile-laviolette-on-subban-trade/t-277437416/c-44315003',
'md5': '50b2bb47f405121484dda3ccbea25459',
'info_dict': {
'id': '44315003',
'ext': 'mp4',
'title': 'Poile, Laviolette on Subban trade',
'description': 'General manager David Poile and head coach Peter Laviolette share their thoughts on acquiring P.K. Subban from Montreal (06/29/16)',
'timestamp': 1467242866,
'upload_date': '20160629',
},
}, {
'url': 'https://www.wch2016.com/video/caneur-best-of-game-2-micd-up/t-281230378/c-44983703',
'only_matching': True,
}, {
'url': 'https://www.wch2016.com/news/3-stars-team-europe-vs-team-canada/c-282195068',
'only_matching': True,
}]
def _real_extract(self, url):
tmp_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
tmp_id, site = mobj.group('id'), mobj.group('site')
video_data = self._download_json(
'https://nhl.bamcontent.com/nhl/id/v1/%s/details/web-v1.json' % tmp_id,
tmp_id)
'https://nhl.bamcontent.com/%s/id/v1/%s/details/web-v1.json'
% (self._SITES_MAP[site], tmp_id), tmp_id)
if video_data.get('type') == 'article':
video_data = video_data['media']
@@ -290,9 +313,11 @@ class NHLIE(InfoExtractor):
continue
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
playback_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=playback.get('name', 'hls'), fatal=False))
m3u8_id=playback.get('name', 'hls'), fatal=False)
self._check_formats(m3u8_formats, video_id)
formats.extend(m3u8_formats)
else:
height = int_or_none(playback.get('height'))
formats.append({
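Note on the `_SITES_MAP` change in NHLIE: the `site` group captured from the URL selects the path segment of the details endpoint. A minimal sketch of that mapping with the standard `re` module follows; the helper name `details_url` is hypothetical, and no network request is made.

```python
import re

# Pattern and map copied from the diff above.
VALID_URL = r'https?://(?:www\.)?(?P<site>nhl|wch2016)\.com/(?:[^/]+/)*c-(?P<id>\d+)'
SITES_MAP = {
    'nhl': 'nhl',
    'wch2016': 'wch',
}


def details_url(url):
    """Build the JSON details URL for a supported page URL, or return None."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    return 'https://nhl.bamcontent.com/%s/id/v1/%s/details/web-v1.json' % (
        SITES_MAP[mobj.group('site')], mobj.group('id'))
```

Keeping the map separate from the regex means a further site only needs a new alternative in `site` and one dictionary entry.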

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .brightcove import (

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
fix_xml_ampersands,
orderedSet,
@@ -10,6 +11,7 @@ from ..utils import (
qualities,
strip_jsonp,
unified_strdate,
ExtractorError,
)
@@ -181,9 +183,16 @@ class NPOIE(NPOBaseIE):
continue
streams = format_info.get('streams')
if streams:
video_info = self._download_json(
streams[0] + '&type=json',
video_id, 'Downloading %s stream JSON' % format_id)
try:
video_info = self._download_json(
streams[0] + '&type=json',
video_id, 'Downloading %s stream JSON' % format_id)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error = (self._parse_json(ee.cause.read().decode(), video_id, fatal=False) or {}).get('errorstring')
if error:
raise ExtractorError(error, expected=True)
raise
else:
video_info = format_info
video_url = video_info.get('url')
@@ -459,8 +468,9 @@ class NPOPlaylistBaseIE(NPOIE):
class VPROIE(NPOPlaylistBaseIE):
IE_NAME = 'vpro'
_VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
_PLAYLIST_TITLE_RE = r'<h1[^>]+class=["\'].*?\bmedia-platform-title\b.*?["\'][^>]*>([^<]+)'
_VALID_URL = r'https?://(?:www\.)?(?:(?:tegenlicht\.)?vpro|2doc)\.nl/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_PLAYLIST_TITLE_RE = (r'<h1[^>]+class=["\'].*?\bmedia-platform-title\b.*?["\'][^>]*>([^<]+)',
r'<h5[^>]+class=["\'].*?\bmedia-platform-subtitle\b.*?["\'][^>]*>([^<]+)')
_PLAYLIST_ENTRY_RE = r'data-media-id="([^"]+)"'
_TESTS = [
@@ -492,6 +502,27 @@ class VPROIE(NPOPlaylistBaseIE):
'title': 'education education',
},
'playlist_count': 2,
},
{
'url': 'http://www.2doc.nl/documentaires/series/2doc/2015/oktober/de-tegenprestatie.html',
'info_dict': {
'id': 'de-tegenprestatie',
'title': 'De Tegenprestatie',
},
'playlist_count': 2,
}, {
'url': 'http://www.2doc.nl/speel~VARA_101375237~mh17-het-verdriet-van-nederland~.html',
'info_dict': {
'id': 'VARA_101375237',
'ext': 'm4v',
'title': 'MH17: Het verdriet van Nederland',
'description': 'md5:09e1a37c1fdb144621e22479691a9f18',
'upload_date': '20150716',
},
'params': {
# Skip because of m3u8 download
'skip_download': True
},
}
]
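Note on the 404 handling added to NPOIE: the response body of the failed request is parsed as JSON, and its `errorstring` field, when present, becomes the user-facing error. The sketch below isolates just the body parsing with the standard `json` module. The function name `error_from_body` and the sample payloads are hypothetical; the actual extractor uses `_parse_json` with `fatal=False`.

```python
import json


def error_from_body(body):
    """Return the 'errorstring' value from a JSON error body, or None.

    `body` is the raw bytes of the HTTP response. Anything that is not a
    JSON object is treated as "no message available".
    """
    try:
        data = json.loads(body.decode('utf-8'))
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(data, dict):
        return None
    return data.get('errorstring')


# Hypothetical response bodies, for illustration only.
json_error = b'{"errorstring": "Not available in your region"}'
html_error = b'<html><body>Not Found</body></html>'
```

When no message can be recovered, the caller would re-raise the original exception, which matches the bare `raise` in the diff.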

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -0,0 +1,36 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
)
class NZZIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nzz\.ch/(?:[^/]+/)*[^/?#]+-ld\.(?P<id>\d+)'
_TEST = {
'url': 'http://www.nzz.ch/zuerich/gymizyte/gymizyte-schreiben-schueler-heute-noch-diktate-ld.9153',
'info_dict': {
'id': '9153',
},
'playlist_mincount': 6,
}
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
entries = []
for player_element in re.findall(r'(<[^>]+class="kalturaPlayer"[^>]*>)', webpage):
player_params = extract_attributes(player_element)
if player_params.get('data-type') not in ('kaltura_singleArticle',):
self.report_warning('Unsupported player type')
continue
entry_id = player_params['data-id']
entries.append(self.url_result(
'kaltura:1750922:' + entry_id, 'Kaltura', entry_id))
return self.playlist_result(entries, page_id)
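Note on NZZIE: each `kalturaPlayer` element of a supported type is turned into a `kaltura:<partner id>:<entry id>` URL that is handed to the Kaltura extractor. The standalone sketch below approximates this with plain regular expressions instead of `extract_attributes`, so it only handles double-quoted attributes. The function name `kaltura_urls` and the sample markup are hypothetical.

```python
import re

PARTNER_ID = '1750922'  # partner ID hard-coded in the diff above


def kaltura_urls(webpage):
    """Collect kaltura: URLs for supported player elements in the page."""
    urls = []
    for element in re.findall(r'(<[^>]+class="kalturaPlayer"[^>]*>)', webpage):
        # Simplified attribute parsing: double-quoted values only.
        attrs = dict(re.findall(r'([\w-]+)="([^"]*)"', element))
        if attrs.get('data-type') != 'kaltura_singleArticle':
            continue  # unsupported player type
        entry_id = attrs.get('data-id')
        if entry_id:
            urls.append('kaltura:%s:%s' % (PARTNER_ID, entry_id))
    return urls


# Hypothetical markup, for illustration only.
sample_page = (
    '<div class="kalturaPlayer" data-type="kaltura_singleArticle" data-id="1_abc123"></div>'
    '<div class="kalturaPlayer" data-type="other" data-id="1_zzz"></div>')
```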

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from ..compat import (

View File

@@ -2,13 +2,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .jwplatform import JWPlatformBaseIE
from ..utils import (
str_to_int,
)
class PornoXOIE(InfoExtractor):
class PornoXOIE(JWPlatformBaseIE):
_VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html'
_TEST = {
'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html',
@@ -17,7 +17,8 @@ class PornoXOIE(InfoExtractor):
'id': '7564',
'ext': 'flv',
'title': 'Striptease From Sexy Secretary!',
'description': 'Striptease From Sexy Secretary!',
'display_id': 'striptease-from-sexy-secretary',
'description': 'md5:0ee35252b685b3883f4a1d38332f9980',
'categories': list, # NSFW
'thumbnail': 're:https?://.*\.jpg$',
'age_limit': 18,
@@ -26,23 +27,14 @@ class PornoXOIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id, display_id = mobj.groups()
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'\'file\'\s*:\s*"([^"]+)"', webpage, 'video_url')
video_data = self._extract_jwplayer_data(webpage, video_id, require_title=False)
title = self._html_search_regex(
r'<title>([^<]+)\s*-\s*PornoXO', webpage, 'title')
description = self._html_search_regex(
r'<meta name="description" content="([^"]+)\s*featuring',
webpage, 'description', fatal=False)
thumbnail = self._html_search_regex(
r'\'image\'\s*:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
view_count = str_to_int(self._html_search_regex(
r'[vV]iews:\s*([0-9,]+)', webpage, 'view count', fatal=False))
@@ -53,13 +45,14 @@ class PornoXOIE(InfoExtractor):
None if categories_str is None
else categories_str.split(','))
return {
video_data.update({
'id': video_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'display_id': display_id,
'description': self._html_search_meta('description', webpage),
'categories': categories,
'view_count': view_count,
'age_limit': 18,
}
})
return video_data

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
from .prosiebensat1 import ProSiebenSat1BaseIE

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals

View File

@@ -1,29 +1,29 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import str_or_none
from ..utils import (
qualities,
str_or_none,
)
class ReverbNationIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?reverbnation\.com/.*?/song/(?P<id>\d+).*?$'
_TESTS = [{
'url': 'http://www.reverbnation.com/alkilados/song/16965047-mona-lisa',
'md5': '3da12ebca28c67c111a7f8b262d3f7a7',
'md5': 'c0aaf339bcee189495fdf5a8c8ba8645',
'info_dict': {
'id': '16965047',
'ext': 'mp3',
'title': 'MONA LISA',
'uploader': 'ALKILADOS',
'uploader_id': '216429',
'thumbnail': 're:^https://gp1\.wac\.edgecastcdn\.net/.*?\.jpg$'
'thumbnail': 're:^https?://.*\.jpg',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group('id')
song_id = self._match_id(url)
api_res = self._download_json(
'https://api.reverbnation.com/song/%s' % song_id,
@@ -31,14 +31,23 @@ class ReverbNationIE(InfoExtractor):
note='Downloading information of song %s' % song_id
)
THUMBNAILS = ('thumbnail', 'image')
quality = qualities(THUMBNAILS)
thumbnails = []
for thumb_key in THUMBNAILS:
if api_res.get(thumb_key):
thumbnails.append({
'url': api_res[thumb_key],
'preference': quality(thumb_key)
})
return {
'id': song_id,
'title': api_res.get('name'),
'url': api_res.get('url'),
'title': api_res['name'],
'url': api_res['url'],
'uploader': api_res.get('artist', {}).get('name'),
'uploader_id': str_or_none(api_res.get('artist', {}).get('id')),
'thumbnail': self._proto_relative_url(
api_res.get('image', api_res.get('thumbnail'))),
'thumbnails': thumbnails,
'ext': 'mp3',
'vcodec': 'none',
}
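Note on the thumbnail change in ReverbNationIE: `qualities(THUMBNAILS)` returns a function that ranks a key by its position in the tuple, so `image` is preferred over `thumbnail`. The sketch below re-implements that helper locally so it runs without youtube-dl; the local `qualities` is an assumption about the utility's behavior (index in the list, `-1` for unknown keys), and the sample API response is made up.

```python
def qualities(quality_ids):
    """Return a function mapping an ID to its rank in quality_ids (-1 if unknown)."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


THUMBNAILS = ('thumbnail', 'image')  # ordered from least to most preferred


def build_thumbnails(api_res):
    """Build the thumbnails list from whichever keys the API response has."""
    quality = qualities(THUMBNAILS)
    return [
        {'url': api_res[key], 'preference': quality(key)}
        for key in THUMBNAILS
        if api_res.get(key)]


# Hypothetical API response, for illustration only.
sample_res = {
    'name': 'MONA LISA',
    'thumbnail': 'http://example.com/small.jpg',
    'image': 'http://example.com/large.jpg',
}
thumbs = build_thumbnails(sample_res)
```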

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import base64

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -9,7 +7,7 @@ class SlutloadIE(InfoExtractor):
_VALID_URL = r'^https?://(?:\w+\.)?slutload\.com/video/[^/]+/(?P<id>[^/]+)/?$'
_TEST = {
'url': 'http://www.slutload.com/video/virginie-baisee-en-cam/TD73btpBqSxc/',
'md5': '0cf531ae8006b530bd9df947a6a0df77',
'md5': '868309628ba00fd488cf516a113fd717',
'info_dict': {
'id': 'TD73btpBqSxc',
'ext': 'mp4',
@@ -20,9 +18,7 @@ class SlutloadIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex(r'<h1><strong>([^<]+)</strong>',

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
import re

View File

@@ -1,4 +1,4 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .ard import ARDMediathekIE

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -48,6 +50,14 @@ class StreamableIE(InfoExtractor):
}
]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(?P<q1>[\'"])(?P<src>(?:https?:)?//streamable\.com/(?:(?!\1).)+)(?P=q1)',
webpage)
if mobj:
return mobj.group('src')
def _real_extract(self, url):
video_id = self._match_id(url)
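Note on `_extract_url` in StreamableIE: the helper looks for an `<iframe>` whose `src` points at streamable.com and returns that URL. The sketch below shows the same idea with the standard `re` module, using a variant of the pattern in which the lookahead is checked before every character, so the match stops at the first closing quote. The function name `extract_streamable_url` and the sample markup are hypothetical.

```python
import re

# (?:(?!(?P=q1)).)+ consumes characters one at a time and refuses to step
# over the quote character that opened the attribute value.
EMBED_RE = (
    r'<iframe[^>]+src=(?P<q1>[\'"])'
    r'(?P<src>(?:https?:)?//streamable\.com/(?:(?!(?P=q1)).)+)'
    r'(?P=q1)')


def extract_streamable_url(webpage):
    """Return the first embedded streamable.com URL, or None."""
    mobj = re.search(EMBED_RE, webpage)
    return mobj.group('src') if mobj else None


# Hypothetical markup, for illustration only.
with_extra_attrs = '<iframe src="https://streamable.com/e/dnd1" width="560"></iframe>'
protocol_relative = "<iframe class='x' src='//streamable.com/s/abc'></iframe>"
other_site = '<iframe src="https://example.com/embed/1"></iframe>'
```

A protocol-relative result such as `//streamable.com/s/abc` still needs a scheme before it can be requested.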

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import hashlib

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re

Some files were not shown because too many files have changed in this diff.