Compare commits

...

94 Commits

Author SHA1 Message Date
Philipp Hagemeister
7642c08763 release 2014.12.17.2 2014-12-17 11:39:25 +01:00
Philipp Hagemeister
fdc8000810 [downloader] Handle a file ./- (Fixes #4498) 2014-12-17 11:39:06 +01:00
Philipp Hagemeister
a91c9b15e3 release 2014.12.17.1 2014-12-17 11:29:52 +01:00
Philipp Hagemeister
27d67ea2ba [comedycentral] Match URLs with a second ID (fixes #4499) 2014-12-17 11:29:35 +01:00
Philipp Hagemeister
d6a8160902 release 2014.12.17 2014-12-17 10:53:17 +01:00
Philipp Hagemeister
6e1b9395c6 [screencastomatic] Add new extractor (Fixes #4497) 2014-12-17 10:53:12 +01:00
Philipp Hagemeister
b1ccbed3d4 [nhl] Allow upper-case video IDs (Fixes #4494) 2014-12-17 00:26:04 +01:00
Philipp Hagemeister
37381350f8 [aljazeera] Add unicode_literals marker 2014-12-17 00:08:04 +01:00
Philipp Hagemeister
7af808a5ef Improve code style 2014-12-17 00:06:41 +01:00
Philipp Hagemeister
876bef5937 [mit] Modernize 2014-12-17 00:04:24 +01:00
Jaime Marquínez Ferrándiz
a16af51873 flake8: Add more ignored files
* setup.py: the '__version__' variable is not defined in the script, it is loadded from youtube_dl/version.py
* devscripts/buildserver.py: Produces a lot of messages
2014-12-16 20:38:59 +01:00
Jaime Marquínez Ferrándiz
dc9a441bfa Move flake8 configuration to setup.cfg
It will be used when calling flake8 from any directory in the project
2014-12-16 20:34:07 +01:00
Jaime Marquínez Ferrándiz
ee6dfe8308 Use flake8 instead of pyflakes and pep8
It combines both tools
2014-12-16 20:34:07 +01:00
Jaime Marquínez Ferrándiz
2cb5b03e53 [test/test_unicode_literals] Remove duplicated imports 2014-12-16 20:33:23 +01:00
Philipp Hagemeister
964b190350 release 2014.12.16.2 2014-12-16 16:45:35 +01:00
Philipp Hagemeister
13d27a42cc [orf:tvthek] Add support for topic URLs (Fixes #4474) 2014-12-16 16:45:28 +01:00
Philipp Hagemeister
ec05fee43a [brightcove] Add shorter URL scheme for other extractors 2014-12-16 16:38:26 +01:00
Philipp Hagemeister
b50e3bc67f [README] Add table of contents (Closes #4458) 2014-12-16 16:33:23 +01:00
Philipp Hagemeister
ac78b5e97b release 2014.12.16.1 2014-12-16 16:03:57 +01:00
Philipp Hagemeister
17e0d63957 Merge branch 'master' of github.com:rg3/youtube-dl 2014-12-16 16:03:46 +01:00
Sergey M․
9209fe3878 [allocine] Add test for new URL format 2014-12-16 21:03:10 +06:00
Philipp Hagemeister
84d84211ac [youtube:feeds] (Fixes #4486) 2014-12-16 15:59:31 +01:00
Sergey M.
b4116dcdd5 Merge pull request #4490 from Tailszefox/master
[Allocine] Support for more URLs
2014-12-16 20:59:07 +06:00
Jaime Marquínez Ferrándiz
bb18d787b5 [aljazeera] Add extractor (closes #4487) 2014-12-16 15:48:01 +01:00
Tailszefox
0647084f39 [Allocine] Support for more URLs 2014-12-16 15:46:04 +01:00
Philipp Hagemeister
734ea11e3c Drop hash character in downloader output (#4484) 2014-12-16 00:37:42 +01:00
Philipp Hagemeister
3940450878 release 2014.12.16 2014-12-16 00:24:30 +01:00
Philipp Hagemeister
ccbfaa83b0 [devscripts/make_contributing] Switch to optparse (Fixes #4483) 2014-12-16 00:24:11 +01:00
Philipp Hagemeister
d86007873e [YoutubeDL] Document where details for format can be found 2014-12-16 00:22:23 +01:00
Jaime Marquínez Ferrándiz
4b7df0d30c [youtube:playlist] Work around buggy playlists (fixes #4449)
They show a "Load more" button, but they don't have more videos.
The continuation url in the json file was a link to itself, so we ended up in an infinite loop.
2014-12-15 19:19:15 +01:00
Philipp Hagemeister
caff59499c [README] Fix code rendering 2014-12-15 11:14:06 +01:00
Philipp Hagemeister
99a0f9824a [README] Highlight code examples 2014-12-15 11:11:52 +01:00
Jaime Marquínez Ferrándiz
3013bbb27d Remove unused imports 2014-12-15 08:24:50 +01:00
Naglis Jonaitis
6f9b54933f [streamcloud] Modernize 2014-12-15 03:32:17 +02:00
Naglis Jonaitis
1bbe317508 [mooshare] Modernize 2014-12-15 03:31:54 +02:00
Philipp Hagemeister
e97a534f13 release 2014.12.15 2014-12-15 01:36:46 +01:00
Philipp Hagemeister
8acb83d993 [README] Make example audio sound not that horrible ;) 2014-12-15 01:34:39 +01:00
Philipp Hagemeister
71b640cc5b [YoutubeDL] Add declarative version of progress hooks 2014-12-15 01:26:20 +01:00
Philipp Hagemeister
4f026fafbc [YoutubeDL] Make postprocessors declarative
Instead of having to configure PPs in code, this allows us and embedding programs not to worry about imports or finer details, similarly to how we handle IEs.
2014-12-15 01:06:25 +01:00
Philipp Hagemeister
39f594d660 [Makefile] Ensure that offline test really is offline 2014-12-15 00:59:23 +01:00
Philipp Hagemeister
cae97f6521 Improve and test ffmpeg version detection 2014-12-14 21:59:59 +01:00
Philipp Hagemeister
6cbf345f28 Remove test/write_info_json
This is now covered by every single test_download testcase anyways :)
2014-12-14 21:56:12 +01:00
Philipp Hagemeister
a0ab29f8a1 Add offlinetest make target 2014-12-14 21:55:57 +01:00
Naglis Jonaitis
4a4fbfc967 [yesjapan] Look for datetime inside submit_info
Oops..
2014-12-14 18:03:05 +02:00
Naglis Jonaitis
408b5839b1 [yesjapan] Add new extractor (Closes #4466) 2014-12-14 17:59:25 +02:00
Philipp Hagemeister
60620368d7 [youtube] Fix player ID detection 2014-12-14 00:43:34 +01:00
Philipp Hagemeister
4927de4f86 release 2014.12.14 2014-12-14 00:13:17 +01:00
Philipp Hagemeister
bad5c1a303 [rtp] Also match e-id-less URLs (#4382) 2014-12-14 00:13:07 +01:00
Philipp Hagemeister
6f18cc9abc release 2014.12.13.1 2014-12-13 23:51:57 +01:00
Philipp Hagemeister
4d144be8b0 [bandcamp:album] Do not match plain Bandcamp URLs (#4461)
The _VALID_URL 1fa174692a is to broad, since it matches everything beginning with bandcamp.com.
2014-12-13 23:50:06 +01:00
Philipp Hagemeister
2128b696b8 [utils] Do not make an exception for SSLv3
SSLv3 is terminally vulnerable to POODLE; web browsers are currently deprecating/removing it.
Closes #4459, fixes #4294
2014-12-13 23:45:34 +01:00
Philipp Hagemeister
a23669220a [utils] Make ssl work on Python 2.7.8 2014-12-13 23:27:21 +01:00
Philipp Hagemeister
051c46256b release 2014.12.13 2014-12-13 23:13:48 +01:00
Philipp Hagemeister
d5524947b5 Merge remote-tracking branch 'fstirlitz/master' 2014-12-13 23:05:41 +01:00
Philipp Hagemeister
74f91c4af7 Merge branch 'master' of github.com:rg3/youtube-dl 2014-12-13 23:05:28 +01:00
Philipp Hagemeister
da4d4191a9 Merge branch 'master' of github.com:rg3/youtube-dl 2014-12-13 23:05:22 +01:00
Sergey M․
2564300e55 Credit @Mortal for restudy (#4463) 2014-12-14 03:42:42 +06:00
Sergey M․
cb0713d2c9 Merge branch 'Mortal-restudy' 2014-12-14 03:41:17 +06:00
Sergey M․
ac265bef1e [restudy] Simplify and extract all formats 2014-12-14 03:41:00 +06:00
Mathias Rav
4a0132c570 [Restudy] Add new extractor for restudy.dk 2014-12-13 22:25:32 +01:00
Sergey M․
1fa174692a [bandcamp:album] Make path optional (Closes #4461) 2014-12-14 02:00:54 +06:00
Sergey M․
04c9544187 [bbccouk] Fix vpid warning 2014-12-13 18:47:34 +06:00
Sergey M․
8085fc15cc [adultswim] Improve segment duration extraction 2014-12-13 18:42:29 +06:00
Philipp Hagemeister
2f15832f56 Merge pull request #3927 from qrtt1/master
apply ratelimit to f4m
2014-12-13 12:59:12 +01:00
Jaime Marquínez Ferrándiz
1557ed153c [test_unicode_literals] Import from test.helper 2014-12-13 12:45:09 +01:00
Philipp Hagemeister
a6620ac28d [orf] Modernize 2014-12-13 12:41:38 +01:00
Philipp Hagemeister
89e36657cc [keek] remove unused import 2014-12-13 12:36:46 +01:00
Philipp Hagemeister
7129bed51b [keek] Modernize and extract uploader 2014-12-13 12:35:45 +01:00
Philipp Hagemeister
1cc79574fc Fix imports and general cleanup
· Import from compat what comes from compat. Yes, some names are available in utils too, but that's an implementation detail.
· Use _match_id consistently whenever possible
· Fix some outdated tests
· Use consistent valid URL (always match the whole protocol, no ^ at start required)
· Use modern test definitions
2014-12-13 12:35:45 +01:00
Philipp Hagemeister
20e35880bf [streamcz] Update extractor 2014-12-13 12:35:45 +01:00
Philipp Hagemeister
5e1912cfc1 [5min] Remove helper method and modernize
Previously, other extractor would go call a private(!) helper method. Instead, just hardcode the 5min:video_id format - it's not if that would ever change.
2014-12-13 12:35:45 +01:00
Jaime Marquínez Ferrándiz
293f0f39ce [utils] make_HTTPS_handler: Remove try/except block that would always raise an exception
This code is only run for Python < 3.4, where context.load_default_certs doesn't exist
2014-12-12 23:43:25 +01:00
Jaime Marquínez Ferrándiz
0db261ba56 [utils] make_HTTPS_handler: Use ssl.create_default_context in Python 2.7.9
The new features in the ssl module have been backported from 3.4, see https://docs.python.org/dev/whatsnew/2.7.html#pep-466-network-security-enhancements-for-python-2-7
2014-12-12 23:35:17 +01:00
felix
7668a2c5cb [comcarcoff] add webpage_url datum 2014-12-12 23:20:34 +01:00
Jaime Marquínez Ferrándiz
26c06f0c51 [youtube:playlist] Remove unused property 2014-12-12 22:26:50 +01:00
Jaime Marquínez Ferrándiz
23d3608c6b [youtube:channel] Fix extraction (fixes #4435)
It uses now the same pagination system as playlists
2014-12-12 22:23:54 +01:00
Philipp Hagemeister
baa7081d68 [urort] Update to new multi-format protocol 2014-12-12 20:55:18 +01:00
Philipp Hagemeister
19bf2b4e88 [comcarcoff] Add unicode_literals declaration 2014-12-12 20:37:58 +01:00
Philipp Hagemeister
6a1b20de2a [urort] Modernize 2014-12-12 20:37:28 +01:00
Philipp Hagemeister
3c864e930d [comcarcoff] Adapt c62159ea91a04ef82560472b254aef1cc9f70a11 2014-12-12 20:35:17 +01:00
Philipp Hagemeister
dc5596ff54 [comcarcoff] (#4454) 2014-12-12 20:32:02 +01:00
Philipp Hagemeister
46d9760f5e Merge remote-tracking branch 'fstirlitz/master' 2014-12-12 20:17:26 +01:00
Philipp Hagemeister
90d71d3f08 [ooyala] Remove test md5sums 2014-12-12 20:12:51 +01:00
Philipp Hagemeister
e9404524cc [ninegag] Test for additional properties 2014-12-12 20:10:15 +01:00
felix
dc65a213fd comediansincarsgettingcoffee.com support 2014-12-12 19:58:44 +01:00
Philipp Hagemeister
4237ba10dc [pornotube] Adapt to new interface 2014-12-12 19:44:25 +01:00
Naglis Jonaitis
c3f3b29b92 [rtp] Add new extractor (Closes #4382) 2014-12-12 20:22:24 +02:00
Philipp Hagemeister
1c985da0ca release 2014.12.12.7 2014-12-12 18:25:58 +01:00
Philipp Hagemeister
7a60322abf release 2014.12.12.6 2014-12-12 17:52:50 +01:00
Sergey M․
07bc9a3530 [nowvideo] Add .li domain (Closes #4453) 2014-12-12 22:44:16 +06:00
Philipp Hagemeister
a099965bad release 2014.12.12.5 2014-12-12 17:40:27 +01:00
Philipp Hagemeister
146323a7f8 [groupon] Add extractor (Fixes #4386) 2014-12-12 17:39:33 +01:00
Philipp Hagemeister
57e086dcea [ebaumsworld] Modernize 2014-12-12 17:24:05 +01:00
Ching Yi, Chan
b1c3a49fff apply ratelimit to f4m 2014-10-12 08:32:26 +08:00
158 changed files with 1321 additions and 766 deletions

View File

@@ -92,3 +92,4 @@ Tithen-Firion
Zack Fernandes
cryptonaut
Adrian Kretz
Mathias Rav

View File

@@ -35,13 +35,22 @@ install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtu
install -d $(DESTDIR)$(SYSCONFDIR)/fish/completions
install -m 644 youtube-dl.fish $(DESTDIR)$(SYSCONFDIR)/fish/completions/youtube-dl.fish
codetest:
flake8 .
test:
#nosetests --with-coverage --cover-package=youtube_dl --cover-html --verbose --processes 4 test
nosetests --verbose test
$(MAKE) codetest
ot: offlinetest
offlinetest: codetest
nosetests --verbose test --exclude test_download --exclude test_age_restriction --exclude test_subtitles --exclude test_write_annotations
tar: youtube-dl.tar.gz
.PHONY: all clean install test tar bash-completion pypi-files zsh-completion fish-completion
.PHONY: all clean install test tar bash-completion pypi-files zsh-completion fish-completion ot offlinetest codetest
pypi-files: youtube-dl.bash-completion README.txt youtube-dl.1 youtube-dl.fish

View File

@@ -1,7 +1,15 @@
youtube-dl - download videos from youtube.com or other video platforms
# SYNOPSIS
**youtube-dl** [OPTIONS] URL [URL...]
- [INSTALLATION](#installation)
- [DESCRIPTION](#description)
- [OPTIONS](#options)
- [CONFIGURATION](#configuration)
- [OUTPUT TEMPLATE](#output-template)
- [VIDEO SELECTION](#video-selection)
- [FAQ](#faq)
- [DEVELOPER INSTRUCTIONS](#developer-instructions)
- [BUGS](#bugs)
- [COPYRIGHT](#copyright)
# INSTALLATION
@@ -34,6 +42,8 @@ YouTube.com and a few more sites. It requires the Python interpreter, version
your Unix box, on Windows or on Mac OS X. It is released to the public domain,
which means you can modify it, redistribute it or use it however you like.
youtube-dl [OPTIONS] URL [URL...]
# OPTIONS
-h, --help print this help text and exit
--version print program version and exit
@@ -529,14 +539,52 @@ youtube-dl makes the best effort to be a good command-line program, and thus sho
From a Python program, you can embed youtube-dl in a more powerful fashion, like this:
import youtube_dl
```python
import youtube_dl
ydl_opts = {}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
ydl_opts = {}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
```
Most likely, you'll want to use various options. For a list of what can be done, have a look at [youtube_dl/YoutubeDL.py](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L69). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
```python
import youtube_dl
class MyLogger(object):
def debug(self, msg):
pass
def warning(self, msg):
pass
def error(self, msg):
print(msg)
def my_hook(d):
if d['status'] == 'finished':
print('Done downloading, now converting ...')
ydl_opts = {
'format': 'bestaudio/best',
'postprocessors': [{
'key': 'FFmpegExtractAudio',
'preferredcodec': 'mp3',
'preferredquality': '192',
}],
'logger': MyLogger(),
'progress_hooks': [my_hook],
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
```
# BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues> . Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the irc channel #youtube-dl on freenode.

View File

@@ -1,20 +1,20 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import argparse
import io
import optparse
import re
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
'INFILE', help='README.md file name to read from')
parser.add_argument(
'OUTFILE', help='CONTRIBUTING.md file name to write to')
args = parser.parse_args()
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
with io.open(args.INFILE, encoding='utf-8') as inf:
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
readme = inf.read()
bug_text = re.search(
@@ -25,7 +25,7 @@ def main():
out = bug_text + dev_text
with io.open(args.OUTFILE, 'w', encoding='utf-8') as outf:
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':

View File

@@ -11,8 +11,19 @@ README_FILE = os.path.join(ROOT_DIR, 'README.md')
with io.open(README_FILE, encoding='utf-8') as f:
readme = f.read()
PREFIX = '%YOUTUBE-DL(1)\n\n# NAME\n'
readme = re.sub(r'(?s)# INSTALLATION.*?(?=# DESCRIPTION)', '', readme)
PREFIX = '''%YOUTUBE-DL(1)
# NAME
youtube\-dl \- download videos from youtube.com or other video platforms
# SYNOPSIS
**youtube-dl** \[OPTIONS\] URL [URL...]
'''
readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
readme = PREFIX + readme
if sys.version_info < (3, 0):

View File

@@ -1,2 +1,6 @@
[wheel]
universal = True
[flake8]
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,setup.py
ignore = E501

View File

@@ -7,9 +7,7 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import io
import os
import re
import unittest
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -20,7 +18,7 @@ IGNORED_FILES = [
]
from helper import assertRegexpMatches
from test.helper import assertRegexpMatches
class TestUnicodeLiterals(unittest.TestCase):

View File

@@ -16,39 +16,40 @@ import json
import xml.etree.ElementTree
from youtube_dl.utils import (
args_to_str,
clean_html,
DateRange,
detect_exe_version,
encodeFilename,
escape_rfc3986,
escape_url,
find_xpath_attr,
fix_xml_ampersands,
orderedSet,
OnDemandPagedList,
InAdvancePagedList,
intlist_to_bytes,
js_to_json,
limit_length,
OnDemandPagedList,
orderedSet,
parse_duration,
parse_filesize,
parse_iso8601,
read_batch_urls,
sanitize_filename,
shell_quote,
smuggle_url,
str_to_int,
strip_jsonp,
struct_unpack,
timeconvert,
unescapeHTML,
unified_strdate,
unsmuggle_url,
uppercase_escape,
url_basename,
urlencode_postdata,
xpath_with_ns,
parse_iso8601,
strip_jsonp,
uppercase_escape,
limit_length,
escape_rfc3986,
escape_url,
js_to_json,
intlist_to_bytes,
args_to_str,
parse_filesize,
version_tuple,
xpath_with_ns,
)
@@ -390,5 +391,16 @@ class TestUtil(unittest.TestCase):
self.assertEqual(version_tuple('10.23.344'), (10, 23, 344))
self.assertEqual(version_tuple('10.1-6'), (10, 1, 6)) # avconv style
def test_detect_exe_version(self):
self.assertEqual(detect_exe_version('''ffmpeg version 1.2.1
built on May 27 2013 08:37:26 with gcc 4.7 (Debian 4.7.3-4)
configuration: --prefix=/usr --extra-'''), '1.2.1')
self.assertEqual(detect_exe_version('''ffmpeg version N-63176-g1fb4685
built on May 15 2014 22:09:06 with gcc 4.8.2 (GCC)'''), 'N-63176-g1fb4685')
self.assertEqual(detect_exe_version('''X server found. dri2 connection failed!
Trying to open render node...
Success at /dev/dri/renderD128.
ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
if __name__ == '__main__':
unittest.main()

View File

@@ -1,76 +0,0 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params
import io
import json
import youtube_dl.YoutubeDL
import youtube_dl.extractor
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
params = get_params({
'writeinfojson': True,
'skip_download': True,
'writedescription': True,
})
TEST_ID = 'BaW_jenozKc'
INFO_JSON_FILE = TEST_ID + '.info.json'
DESCRIPTION_FILE = TEST_ID + '.mp4.description'
EXPECTED_DESCRIPTION = '''test chars: "'/\ä↭𝕐
test URL: https://github.com/rg3/youtube-dl/issues/1892
This is a test video for youtube-dl.
For more information, contact phihag@phihag.de .'''
class TestInfoJSON(unittest.TestCase):
def setUp(self):
# Clear old files
self.tearDown()
def test_info_json(self):
ie = youtube_dl.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(INFO_JSON_FILE))
with io.open(INFO_JSON_FILE, 'r', encoding='utf-8') as jsonf:
jd = json.load(jsonf)
self.assertEqual(jd['upload_date'], '20121002')
self.assertEqual(jd['description'], EXPECTED_DESCRIPTION)
self.assertEqual(jd['id'], TEST_ID)
self.assertEqual(jd['extractor'], 'youtube')
self.assertEqual(jd['title'], '''youtube-dl test video "'/\ä↭𝕐''')
self.assertEqual(jd['uploader'], 'Philipp Hagemeister')
self.assertTrue(os.path.exists(DESCRIPTION_FILE))
with io.open(DESCRIPTION_FILE, 'r', encoding='utf-8') as descf:
descr = descf.read()
self.assertEqual(descr, EXPECTED_DESCRIPTION)
def tearDown(self):
if os.path.exists(INFO_JSON_FILE):
os.remove(INFO_JSON_FILE)
if os.path.exists(DESCRIPTION_FILE):
os.remove(DESCRIPTION_FILE)
if __name__ == '__main__':
unittest.main()

View File

@@ -27,6 +27,7 @@ from .compat import (
compat_cookiejar,
compat_expanduser,
compat_http_client,
compat_kwargs,
compat_str,
compat_urllib_error,
compat_urllib_request,
@@ -67,7 +68,11 @@ from .cache import Cache
from .extractor import get_info_extractor, gen_extractors
from .downloader import get_suitable_downloader
from .downloader.rtmp import rtmpdump_version
from .postprocessor import FFmpegMergerPP, FFmpegPostProcessor
from .postprocessor import (
FFmpegMergerPP,
FFmpegPostProcessor,
get_postprocessor,
)
from .version import __version__
@@ -116,7 +121,7 @@ class YoutubeDL(object):
dump_single_json: Force printing the info_dict of the whole playlist
(or video) as a single JSON line.
simulate: Do not download the video files.
format: Video format code.
format: Video format code. See options.py for more information.
format_limit: Highest quality format to try.
outtmpl: Template for output names.
restrictfilenames: Do not allow "&" and spaces in file names
@@ -176,6 +181,28 @@ class YoutubeDL(object):
extract_flat: Do not resolve URLs, return the immediate result.
Pass in 'in_playlist' to only show this behavior for
playlist items.
postprocessors: A list of dictionaries, each with an entry
* key: The name of the postprocessor. See
youtube_dl/postprocessor/__init__.py for a list.
as well as any further keyword arguments for the
postprocessor.
progress_hooks: A list of functions that get called on download
progress, with a dictionary with the entries
* filename: The final filename
* status: One of "downloading" and "finished"
The dict may also have some of the following entries:
* downloaded_bytes: Bytes on disk
* total_bytes: Size of the whole file, None if unknown
* tmpfilename: The filename we're currently writing to
* eta: The estimated time in seconds, None if unknown
* speed: The download speed in bytes/second, None if
unknown
Progress hooks are guaranteed to be called at least once
(with status "finished") if the download is successful.
The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader:
@@ -256,6 +283,16 @@ class YoutubeDL(object):
self.print_debug_header()
self.add_default_info_extractors()
for pp_def_raw in self.params.get('postprocessors', []):
pp_class = get_postprocessor(pp_def_raw['key'])
pp_def = dict(pp_def_raw)
del pp_def['key']
pp = pp_class(self, **compat_kwargs(pp_def))
self.add_post_processor(pp)
for ph in self.params.get('progress_hooks', []):
self.add_progress_hook(ph)
def warn_if_short_id(self, argv):
# short YouTube ID starting with dash?
idxs = [
@@ -675,7 +712,7 @@ class YoutubeDL(object):
entries = entries[::-1]
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video #%s of %s' % (i, n_entries))
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
extra = {
'n_entries': n_entries,
'playlist': playlist,

View File

@@ -40,16 +40,6 @@ from .downloader import (
)
from .extractor import gen_extractors
from .YoutubeDL import YoutubeDL
from .postprocessor import (
AtomicParsleyPP,
FFmpegAudioFixPP,
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
FFmpegEmbedSubtitlePP,
XAttrMetadataPP,
ExecAfterDownloadPP,
)
def _real_main(argv=None):
@@ -212,6 +202,43 @@ def _real_main(argv=None):
any_printing = opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json
download_archive_fn = compat_expanduser(opts.download_archive) if opts.download_archive is not None else opts.download_archive
# PostProcessors
postprocessors = []
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:
postprocessors.append({'key': 'FFmpegMetadata'})
if opts.extractaudio:
postprocessors.append({
'key': 'FFmpegExtractAudio',
'preferredcodec': opts.audioformat,
'preferredquality': opts.audioquality,
'nopostoverwrites': opts.nopostoverwrites,
})
if opts.recodevideo:
postprocessors.append({
'key': 'FFmpegVideoConvertor',
'preferedformat': opts.recodevideo,
})
if opts.embedsubtitles:
postprocessors.append({
'key': 'FFmpegEmbedSubtitle',
'subtitlesformat': opts.subtitlesformat,
})
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
if opts.embedthumbnail:
if not opts.addmetadata:
postprocessors.append({'key': 'FFmpegAudioFix'})
postprocessors.append({'key': 'AtomicParsley'})
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd:
postprocessors.append({
'key': 'ExecAfterDownload',
'verboseOutput': opts.verbose,
'exec_cmd': opts.exec_cmd,
})
ydl_opts = {
'usenetrc': opts.usenetrc,
'username': opts.username,
@@ -297,32 +324,10 @@ def _real_main(argv=None):
'encoding': opts.encoding,
'exec_cmd': opts.exec_cmd,
'extract_flat': opts.extract_flat,
'postprocessors': postprocessors,
}
with YoutubeDL(ydl_opts) as ydl:
# PostProcessors
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:
ydl.add_post_processor(FFmpegMetadataPP())
if opts.extractaudio:
ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo:
ydl.add_post_processor(FFmpegVideoConvertor(preferedformat=opts.recodevideo))
if opts.embedsubtitles:
ydl.add_post_processor(FFmpegEmbedSubtitlePP(subtitlesformat=opts.subtitlesformat))
if opts.xattrs:
ydl.add_post_processor(XAttrMetadataPP())
if opts.embedthumbnail:
if not opts.addmetadata:
ydl.add_post_processor(FFmpegAudioFixPP())
ydl.add_post_processor(AtomicParsleyPP())
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd:
ydl.add_post_processor(ExecAfterDownloadPP(
verboseOutput=opts.verbose, exec_cmd=opts.exec_cmd))
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose)

View File

@@ -5,8 +5,8 @@ import re
import sys
import time
from ..compat import compat_str
from ..utils import (
compat_str,
encodeFilename,
format_bytes,
timeconvert,
@@ -285,7 +285,7 @@ class FileDownloader(object):
Return True on success and False otherwise
"""
# Check file already present
if self.params.get('continuedl', False) and os.path.isfile(encodeFilename(filename)) and not self.params.get('nopart', False):
if filename != '-' and self.params.get('continuedl', False) and os.path.isfile(encodeFilename(filename)) and not self.params.get('nopart', False):
self.report_file_already_downloaded(filename)
self._hook_progress({
'filename': filename,
@@ -305,19 +305,6 @@ class FileDownloader(object):
ph(status)
def add_progress_hook(self, ph):
""" ph gets called on download progress, with a dictionary with the entries
* filename: The final filename
* status: One of "downloading" and "finished"
It can also have some of the following entries:
* downloaded_bytes: Bytes on disks
* total_bytes: Total bytes, None if unknown
* tmpfilename: The filename we're currently writing to
* eta: The estimated time in seconds, None if unknown
* speed: The download speed in bytes/second, None if unknown
Hooks are guaranteed to be called at least once (with status "finished")
if the download is successful.
"""
# See YoutubeDl.py (search for progress_hooks) for a description of
# this interface
self._progress_hooks.append(ph)

View File

@@ -9,10 +9,12 @@ import xml.etree.ElementTree as etree
from .common import FileDownloader
from .http import HttpFD
from ..compat import (
compat_urlparse,
)
from ..utils import (
struct_pack,
struct_unpack,
compat_urlparse,
format_bytes,
encodeFilename,
sanitize_open,
@@ -201,7 +203,7 @@ def write_flv_header(stream, metadata):
stream.write(b'\x00\x00\x00\x00\x00\x00\x00')
stream.write(metadata)
# Magic numbers extracted from the output files produced by AdobeHDS.php
#(https://github.com/K-S-V/Scripts)
# (https://github.com/K-S-V/Scripts)
stream.write(b'\x00\x00\x01\x73')
@@ -231,6 +233,7 @@ class F4mFD(FileDownloader):
'continuedl': True,
'quiet': True,
'noprogress': True,
'ratelimit': self.params.get('ratelimit', None),
'test': self.params.get('test', False),
}
)

View File

@@ -6,9 +6,11 @@ import subprocess
from ..postprocessor.ffmpeg import FFmpegPostProcessor
from .common import FileDownloader
from ..utils import (
from ..compat import (
compat_urlparse,
compat_urllib_request,
)
from ..utils import (
check_executable,
encodeFilename,
)

View File

@@ -4,11 +4,12 @@ import os
import time
from .common import FileDownloader
from ..utils import (
from ..compat import (
compat_urllib_request,
compat_urllib_error,
)
from ..utils import (
ContentTooShortError,
encodeFilename,
sanitize_open,
format_bytes,

View File

@@ -7,9 +7,9 @@ import sys
import time
from .common import FileDownloader
from ..compat import compat_str
from ..utils import (
check_executable,
compat_str,
encodeFilename,
format_bytes,
get_exe_version,
@@ -185,7 +185,7 @@ class RtmpFD(FileDownloader):
cursize = os.path.getsize(encodeFilename(tmpfilename))
if prevsize == cursize and retval == RD_FAILED:
break
# Some rtmp streams seem abort after ~ 99.8%. Don't complain for those
# Some rtmp streams seem abort after ~ 99.8%. Don't complain for those
if prevsize == cursize and retval == RD_INCOMPLETE and cursize > 1024:
self.to_screen('[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
retval = RD_SUCCESS

View File

@@ -5,6 +5,7 @@ from .academicearth import AcademicEarthCourseIE
from .addanime import AddAnimeIE
from .adultswim import AdultSwimIE
from .aftonbladet import AftonbladetIE
from .aljazeera import AlJazeeraIE
from .anitube import AnitubeIE
from .anysex import AnySexIE
from .aol import AolIE
@@ -65,6 +66,7 @@ from .cnn import (
)
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .criterion import CriterionIE
@@ -159,6 +161,7 @@ from .googlesearch import GoogleSearchIE
from .gorillavid import GorillaVidIE
from .goshgay import GoshgayIE
from .grooveshark import GroovesharkIE
from .groupon import GrouponIE
from .hark import HarkIE
from .heise import HeiseIE
from .helsinki import HelsinkiIE
@@ -314,6 +317,7 @@ from .radiofrance import RadioFranceIE
from .rai import RaiIE
from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .restudy import RestudyIE
from .reverbnation import ReverbNationIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
@@ -322,6 +326,7 @@ from .roxwel import RoxwelIE
from .rtbf import RTBFIE
from .rtlnl import RtlXlIE
from .rtlnow import RTLnowIE
from .rtp import RTPIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE
from .ruhd import RUHDIE
@@ -337,6 +342,7 @@ from .savefrom import SaveFromIE
from .sbs import SBSIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .screenwavemedia import CinemassacreIE, ScreenwaveMediaIE, TeamFourIE
from .servingsys import ServingSysIE
from .sexu import SexuIE
@@ -506,6 +512,7 @@ from .yahoo import (
YahooIE,
YahooSearchIE,
)
from .yesjapan import YesJapanIE
from .ynet import YnetIE
from .youjizz import YouJizzIE
from .youku import YoukuIE

View File

@@ -7,6 +7,8 @@ import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
xpath_text,
float_or_none,
)
@@ -128,7 +130,8 @@ class AdultSwimIE(InfoExtractor):
segment_url, segment_title,
'Downloading segment information', 'Unable to download segment information')
segment_duration = idoc.find('.//trt').text.strip()
segment_duration = float_or_none(
xpath_text(idoc, './/trt', 'segment duration').strip())
formats = []
file_els = idoc.findall('.//files/file')

View File

@@ -0,0 +1,35 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'http://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_TEST = {
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',
'info_dict': {
'id': '3792260579001',
'ext': 'mp4',
'title': 'The Slum - Episode 1: Deliverance',
'description': 'As a birth attendant advocating for family planning, Remy is on the frontline of Tondo\'s battle with overcrowding.',
'uploader': 'Al Jazeera English',
},
'add_ie': ['Brightcove'],
}
def _real_extract(self, url):
program_name = self._match_id(url)
webpage = self._download_webpage(url, program_name)
brightcove_id = self._search_regex(
r'RenderPagesVideo\(\'(.+?)\'', webpage, 'brightcove id')
return {
'_type': 'url',
'url': (
'brightcove:'
'playerKey=AQ~~%2CAAAAmtVJIFk~%2CTVGOQ5ZTwJbeMWnq5d_H4MOM57xfzApc'
'&%40videoPlayer={0}'.format(brightcove_id)
),
'ie_key': 'Brightcove',
}

View File

@@ -5,15 +5,14 @@ import re
import json
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
compat_str,
qualities,
determine_ext,
)
class AllocineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=)(?P<id>[0-9]+)(?:\.html)?'
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
_TESTS = [{
'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
@@ -45,6 +44,9 @@ class AllocineIE(InfoExtractor):
'description': 'md5:71742e3a74b0d692c7fce0dd2017a4ac',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/video/video-19550147/',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -75,9 +77,7 @@ class AllocineIE(InfoExtractor):
'format_id': format_id,
'quality': quality(format_id),
'url': v,
'ext': determine_ext(v),
})
self._sort_formats(formats)
return {

View File

@@ -3,7 +3,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .fivemin import FiveMinIE
class AolIE(InfoExtractor):
@@ -42,31 +41,30 @@ class AolIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
playlist_id = mobj.group('playlist_id')
if playlist_id and not self._downloader.params.get('noplaylist'):
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
if not playlist_id or self._downloader.params.get('noplaylist'):
return self.url_result('5min:%s' % video_id)
webpage = self._download_webpage(url, playlist_id)
title = self._html_search_regex(
r'<h1 class="video-title[^"]*">(.+?)</h1>', webpage, 'title')
playlist_html = self._search_regex(
r"(?s)<ul\s+class='video-related[^']*'>(.*?)</ul>", webpage,
'playlist HTML')
entries = [{
'_type': 'url',
'url': 'aol-video:%s' % m.group('id'),
'ie_key': 'Aol',
} for m in re.finditer(
r"<a\s+href='.*videoid=(?P<id>[0-9]+)'\s+class='video-thumb'>",
playlist_html)]
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
return {
'_type': 'playlist',
'id': playlist_id,
'display_id': mobj.group('playlist_display_id'),
'title': title,
'entries': entries,
}
webpage = self._download_webpage(url, playlist_id)
title = self._html_search_regex(
r'<h1 class="video-title[^"]*">(.+?)</h1>', webpage, 'title')
playlist_html = self._search_regex(
r"(?s)<ul\s+class='video-related[^']*'>(.*?)</ul>", webpage,
'playlist HTML')
entries = [{
'_type': 'url',
'url': 'aol-video:%s' % m.group('id'),
'ie_key': 'Aol',
} for m in re.finditer(
r"<a\s+href='.*videoid=(?P<id>[0-9]+)'\s+class='video-thumb'>",
playlist_html)]
return FiveMinIE._build_result(video_id)
return {
'_type': 'playlist',
'id': playlist_id,
'display_id': mobj.group('playlist_display_id'),
'title': title,
'entries': entries,
}

View File

@@ -4,8 +4,8 @@ import re
import json
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
compat_urlparse,
int_or_none,
)

View File

@@ -3,8 +3,8 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import (
compat_urllib_parse,
determine_ext,
ExtractorError,
)

View File

@@ -5,7 +5,7 @@ import json
import itertools
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_request,
)

View File

@@ -4,9 +4,11 @@ import json
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
)
@@ -104,7 +106,7 @@ class BandcampIE(InfoExtractor):
class BandcampAlbumIE(InfoExtractor):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<title>[^?#]+))'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<title>[^?#]+)|/?(?:$|[?#]))'
_TESTS = [{
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -139,6 +141,12 @@ class BandcampAlbumIE(InfoExtractor):
'title': 'Hierophany of the Open Grave',
},
'playlist_mincount': 9,
}, {
'url': 'http://dotscale.bandcamp.com',
'info_dict': {
'title': 'Loom',
},
'playlist_mincount': 7,
}]
def _real_extract(self, url):

View File

@@ -209,7 +209,7 @@ class BBCCoUkIE(SubtitlesInfoExtractor):
webpage = self._download_webpage(url, group_id, 'Downloading video page')
programme_id = self._search_regex(
r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False)
r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
if programme_id:
player = self._download_json(
'http://www.bbc.co.uk/iplayer/episode/%s.json' % group_id,

View File

@@ -1,8 +1,8 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import (
compat_urllib_parse,
xpath_text,
xpath_with_ns,
int_or_none,

View File

@@ -4,8 +4,8 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_parse_qs
from ..utils import (
compat_parse_qs,
ExtractorError,
int_or_none,
unified_strdate,
@@ -29,10 +29,9 @@ class BiliBiliIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_code = self._search_regex(
r'(?s)<div itemprop="video".*?>(.*?)</div>', webpage, 'video code')

View File

@@ -6,25 +6,26 @@ import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
find_xpath_attr,
fix_xml_ampersands,
compat_urlparse,
compat_str,
compat_urllib_request,
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urlparse,
)
from ..utils import (
determine_ext,
ExtractorError,
unsmuggle_url,
find_xpath_attr,
fix_xml_ampersands,
unescapeHTML,
unsmuggle_url,
)
class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*?\?(?P<query>.*)'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_TESTS = [

View File

@@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_request,
compat_urllib_parse,
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
)

View File

@@ -0,0 +1,57 @@
# encoding: utf-8
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import parse_iso8601
class ComCarCoffIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
_TESTS = [{
'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
'info_dict': {
'id': 'miranda-sings-happy-thanksgiving-miranda',
'ext': 'mp4',
'upload_date': '20141127',
'timestamp': 1417107600,
'title': 'Happy Thanksgiving Miranda',
'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
'thumbnail': 'http://ccc.crackle.com/images/s5e4_thumb.jpg',
},
'params': {
'skip_download': 'requires ffmpeg',
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
if not display_id:
display_id = 'comediansincarsgettingcoffee.com'
webpage = self._download_webpage(url, display_id)
full_data = json.loads(self._search_regex(
r'<script type="application/json" id="videoData">(?P<json>.+?)</script>',
webpage, 'full data json'))
video_id = full_data['activeVideo']['video']
video_data = full_data['videos'][video_id]
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
'url': video_data['images']['poster'],
}]
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, ext='mp4')
return {
'id': video_id,
'display_id': display_id,
'title': video_data['title'],
'description': video_data.get('description'),
'timestamp': parse_iso8601(video_data.get('pubDate')),
'thumbnails': thumbnails,
'formats': formats,
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
}

View File

@@ -3,9 +3,11 @@ from __future__ import unicode_literals
import re
from .mtv import MTVServicesInfoExtractor
from ..utils import (
from ..compat import (
compat_str,
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
float_or_none,
unified_strdate,
@@ -48,7 +50,7 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
)|
(?P<interview>
extended-interviews/(?P<interID>[0-9a-z]+)/(?:playlist_tds_extended_)?(?P<interview_title>.*?)(/.*?)?)))
(?:[?#].*|$)'''
'''
_TESTS = [{
'url': 'http://thedailyshow.cc.com/watch/thu-december-13-2012/kristen-stewart',
'md5': '4e2f5cb088a83cd8cdb7756132f9739d',
@@ -81,6 +83,9 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
}, {
'url': 'http://thedailyshow.cc.com/video-playlists/npde3s/the-daily-show-19088-highlights',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/video-playlists/t6d9sg/the-daily-show-20038-highlights/be3cwo',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/special-editions/2l8fdb/special-edition---a-look-back-at-food',
'only_matching': True,

View File

@@ -5,12 +5,14 @@ import re
import json
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
orderedSet,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
orderedSet,
)
class CondeNastIE(InfoExtractor):

View File

@@ -10,10 +10,12 @@ import xml.etree.ElementTree
from hashlib import sha1
from math import pow, sqrt, floor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
ExtractorError,
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
bytes_to_intlist,
intlist_to_bytes,
unified_strdate,
@@ -30,7 +32,6 @@ class CrunchyrollIE(SubtitlesInfoExtractor):
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?:[^/]*/[^/?&]*?|media/\?id=)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_TEST = {
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
#'md5': 'b1639fd6ddfaa43788c85f6d1dddd412',
'info_dict': {
'id': '645513',
'ext': 'flv',

View File

@@ -27,7 +27,6 @@ class CSpanIE(InfoExtractor):
'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
# For whatever reason, the served video alternates between
# two different ones
#'md5': 'dbb0f047376d457f2ab8b3929cbb2d0c',
'info_dict': {
'id': '340723',
'ext': 'mp4',

View File

@@ -8,13 +8,15 @@ import itertools
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_urllib_request,
from ..compat import (
compat_str,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
int_or_none,
orderedSet,
str_to_int,
int_or_none,
ExtractorError,
unescapeHTML,
)

View File

@@ -5,7 +5,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
)

View File

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -20,8 +18,7 @@ class EbaumsWorldIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
config = self._download_xml(
'http://www.ebaumsworld.com/video/player/%s' % video_id, video_id)
video_url = config.find('file').text

View File

@@ -1,8 +1,6 @@
from __future__ import unicode_literals
import re
from ..utils import (
from ..compat import (
compat_urllib_parse,
)
from .common import InfoExtractor
@@ -24,11 +22,10 @@ class EHowIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'(?:file|source)=(http[^\'"&]*)',
webpage, 'video URL')
video_url = self._search_regex(
r'(?:file|source)=(http[^\'"&]*)', webpage, 'video URL')
final_url = compat_urllib_parse.unquote(video_url)
uploader = self._html_search_meta('uploader', webpage)
title = self._og_search_title(webpage).replace(' | eHow', '')

View File

@@ -6,7 +6,7 @@ import random
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_str,
)

View File

@@ -3,7 +3,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .fivemin import FiveMinIE
from ..utils import (
url_basename,
)
@@ -27,11 +26,10 @@ class EngadgetIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
if video_id is not None:
return FiveMinIE._build_result(video_id)
return self.url_result('5min:%s' % video_id)
else:
title = url_basename(url)
webpage = self._download_webpage(url, title)
@@ -39,5 +37,5 @@ class EngadgetIE(InfoExtractor):
return {
'_type': 'playlist',
'title': title,
'entries': [FiveMinIE._build_result(id) for id in ids]
'entries': [self.url_result('5min:%s' % vid) for vid in ids]
}

View File

@@ -3,9 +3,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
)

View File

@@ -3,8 +3,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_request,
)
from ..utils import (
ExtractorError,
)

View File

@@ -3,16 +3,18 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
from ..utils import (
str_to_int,
)
class ExtremeTubeIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>extremetube\.com/.*?video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_VALID_URL = r'https?://(?:www\.)?(?P<url>extremetube\.com/.*?video/.+?(?P<id>[0-9]+))(?:[/?&]|$)'
_TESTS = [{
'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'md5': '1fb9228f5e3332ec8c057d6ac36f33e0',
@@ -31,7 +33,7 @@ class ExtremeTubeIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
video_id = mobj.group('id')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)

View File

@@ -1,19 +1,20 @@
#! -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
import hashlib
from .common import InfoExtractor
from ..utils import (
ExtractorError,
from ..compat import (
compat_urllib_request,
compat_urlparse,
)
from ..utils import (
ExtractorError,
)
class FC2IE(InfoExtractor):
_VALID_URL = r'^http://video\.fc2\.com/((?P<lang>[^/]+)/)?content/(?P<id>[^/]+)'
_VALID_URL = r'^http://video\.fc2\.com/(?:[^/]+/)?content/(?P<id>[^/]+)'
IE_NAME = 'fc2'
_TEST = {
'url': 'http://video.fc2.com/en/content/20121103kUan1KHs',
@@ -26,9 +27,7 @@ class FC2IE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
self._downloader.cookiejar.clear_session_cookies() # must clear

View File

@@ -4,11 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
)
class FiredriveIE(InfoExtractor):
@@ -28,11 +30,8 @@ class FiredriveIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
url = 'http://firedrive.com/file/%s' % video_id
webpage = self._download_webpage(url, video_id)
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:

View File

@@ -1,11 +1,11 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_str,
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
)
@@ -13,7 +13,7 @@ from ..utils import (
class FiveMinIE(InfoExtractor):
IE_NAME = '5min'
_VALID_URL = r'''(?x)
(?:https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(.*?&)?playList=|
(?:https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?:.*?&)?playList=|
5min:)
(?P<id>\d+)
'''
@@ -41,13 +41,8 @@ class FiveMinIE(InfoExtractor):
},
]
@classmethod
def _build_result(cls, video_id):
return cls.url_result('5min:%s' % video_id, cls.ie_key())
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id
embed_page = self._download_webpage(embed_url, video_id,
'Downloading embed page')

View File

@@ -3,12 +3,14 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_request,
unified_strdate,
str_to_int,
parse_duration,
)
from ..utils import (
clean_html,
parse_duration,
str_to_int,
unified_strdate,
)
@@ -31,9 +33,7 @@ class FourTubeIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage_url = 'http://www.4tube.com/videos/' + video_id
webpage = self._download_webpage(webpage_url, video_id)

View File

@@ -5,7 +5,7 @@ import json
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_parse_qs,
compat_urlparse,
)

View File

@@ -6,13 +6,15 @@ import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
ExtractorError,
clean_html,
parse_duration,
from ..compat import (
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
parse_duration,
)

View File

@@ -4,9 +4,11 @@ import re
import json
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
unescapeHTML,
)

View File

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)

View File

@@ -2,8 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urlparse,
)
from ..utils import (
determine_ext,
)

View File

@@ -4,7 +4,7 @@ import itertools
import re
from .common import SearchInfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
)

View File

@@ -4,11 +4,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
determine_ext,
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
int_or_none,
)
@@ -106,7 +107,6 @@ class GorillaVidIE(InfoExtractor):
formats = [{
'format_id': 'sd',
'url': video_url,
'ext': determine_ext(video_url),
'quality': 1,
}]

View File

@@ -0,0 +1,50 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class GrouponIE(InfoExtractor):
_VALID_URL = r'https?://www\.groupon\.com/deals/(?P<id>[^?#]+)'
_TEST = {
'url': 'https://www.groupon.com/deals/bikram-yoga-huntington-beach-2#ooid=tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'info_dict': {
'id': 'bikram-yoga-huntington-beach-2',
'title': '$49 for 10 Yoga Classes or One Month of Unlimited Classes at Bikram Yoga Huntington Beach ($180 Value)',
'description': 'Studio kept at 105 degrees and 40% humidity with anti-microbial and anti-slip Flotex flooring; certified instructors',
},
'playlist': [{
'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'mp4',
'title': 'Bikram Yoga Huntington Beach | Orange County',
},
}],
'params': {
'skip_download': 'HLS',
}
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
payload = self._parse_json(self._search_regex(
r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
videos = payload['carousel'].get('dealVideos', [])
entries = []
for v in videos:
if v.get('provider') != 'OOYALA':
self.report_warning(
'%s: Unsupported video provider %s, skipping video' %
(playlist_id, v.get('provider')))
continue
entries.append(self.url_result('ooyala:%s' % v['media']))
return {
'_type': 'playlist',
'id': playlist_id,
'entries': entries,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}

View File

@@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
)
from ..utils import (
ExtractorError,
compat_urllib_request,
int_or_none,
urlencode_postdata,
)
@@ -30,9 +32,7 @@ class HostingBulkIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
url = 'http://hostingbulk.com/{0:}.html'.format(video_id)
# Custom request with cookie to set language to English, so our file

View File

@@ -1,20 +1,20 @@
from __future__ import unicode_literals
import json
import re
import time
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
)
class HypemIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?hypem\.com/track/([^/]+)/([^/]+)'
_VALID_URL = r'http://(?:www\.)?hypem\.com/track/(?P<id>[^/]+)/'
_TEST = {
'url': 'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
'md5': 'b9cc91b5af8995e9f0c1cee04c575828',
@@ -27,8 +27,7 @@ class HypemIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
track_id = mobj.group(1)
track_id = self._match_id(url)
data = {'ax': 1, 'ts': time.time()}
data_encoded = compat_urllib_parse.urlencode(data)

View File

@@ -4,7 +4,7 @@ import re
import json
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urlparse,
)

View File

@@ -1,10 +1,9 @@
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
)
@@ -24,9 +23,7 @@ class InfoQIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex(r'<title>(.*?)</title>', webpage, 'title')

View File

@@ -3,9 +3,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urlparse,
compat_urllib_parse,
)
from ..utils import (
xpath_with_ns,
)

View File

@@ -6,8 +6,10 @@ from random import random
from math import floor
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_request,
)
from ..utils import (
ExtractorError,
)

View File

@@ -5,8 +5,10 @@ import re
import json
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_request,
)
from ..utils import (
ExtractorError,
)

View File

@@ -1,34 +1,39 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class KeekIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?keek\.com/(?:!|\w+/keeks/)(?P<videoID>\w+)'
_VALID_URL = r'https?://(?:www\.)?keek\.com/(?:!|\w+/keeks/)(?P<id>\w+)'
IE_NAME = 'keek'
_TEST = {
'url': 'https://www.keek.com/ytdl/keeks/NODfbab',
'file': 'NODfbab.mp4',
'md5': '9b0636f8c0f7614afa4ea5e4c6e57e83',
'md5': '09c5c109067536c1cec8bac8c21fea05',
'info_dict': {
'uploader': 'ytdl',
'id': 'NODfbab',
'ext': 'mp4',
'uploader': 'youtube-dl project',
'uploader_id': 'ytdl',
'title': 'test chars: "\'/\\\u00e4<>This is a test video for youtube-dl.For more information, contact phihag@phihag.de .',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('videoID')
video_id = self._match_id(url)
video_url = 'http://cdn.keek.com/keek/video/%s' % video_id
thumbnail = 'http://cdn.keek.com/keek/thumbnail/%s/w100/h75' % video_id
webpage = self._download_webpage(url, video_id)
uploader = self._html_search_regex(
r'<div class="user-name-and-bio">[\S\s]+?<h2>(?P<uploader>.+?)</h2>',
webpage, 'uploader', fatal=False)
raw_desc = self._html_search_meta('description', webpage)
if raw_desc:
uploader = self._html_search_regex(
r'Watch (.*?)\s+\(', raw_desc, 'uploader', fatal=False)
uploader_id = self._html_search_regex(
r'Watch .*?\(@(.+?)\)', raw_desc, 'uploader_id', fatal=False)
else:
uploader = None
uploader_id = None
return {
'id': video_id,
@@ -36,5 +41,6 @@ class KeekIE(InfoExtractor):
'ext': 'mp4',
'title': self._og_search_title(webpage),
'thumbnail': thumbnail,
'uploader': uploader
'uploader': uploader,
'uploader_id': uploader_id,
}

View File

@@ -4,7 +4,7 @@ import os
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
@@ -15,7 +15,7 @@ from ..aes import (
class KeezMoviesIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?keezmovies\.com/video/.+?(?P<videoid>[0-9]+)(?:[/?&]|$)'
_VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/.+?(?P<id>[0-9]+)(?:[/?&]|$)'
_TEST = {
'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
'file': '1214711.mp4',
@@ -27,8 +27,7 @@ class KeezMoviesIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
video_id = self._match_id(url)
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')

View File

@@ -4,10 +4,12 @@ import re
import json
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_str,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
find_xpath_attr,
int_or_none,

View File

@@ -5,12 +5,14 @@ import json
from .subtitles import SubtitlesInfoExtractor
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_str,
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
int_or_none,
compat_str,
)

View File

@@ -1,43 +1,33 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
)
class MalemotionIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?malemotion\.com/video/(.+?)\.(?P<id>.+?)(#|$)'
_VALID_URL = r'https?://malemotion\.com/video/(.+?)\.(?P<id>.+?)(#|$)'
_TEST = {
'url': 'http://malemotion.com/video/bien-dur.10ew',
'file': '10ew.mp4',
'md5': 'b3cc49f953b107e4a363cdff07d100ce',
'url': 'http://malemotion.com/video/bete-de-concours.ltc',
'md5': '3013e53a0afbde2878bc39998c33e8a5',
'info_dict': {
"title": "Bien dur",
"age_limit": 18,
'id': 'ltc',
'ext': 'mp4',
'title': 'Bête de Concours',
'age_limit': 18,
},
'skip': 'This video has been deleted.'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group("id")
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
# Extract video URL
video_url = compat_urllib_parse.unquote(
self._search_regex(r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
# Extract title
video_url = compat_urllib_parse.unquote(self._search_regex(
r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
video_title = self._html_search_regex(
r'<title>(.*?)</title', webpage, 'title')
# Extract video thumbnail
video_thumbnail = self._search_regex(
r'<video .+?poster="(.+?)"', webpage, 'thumbnail', fatal=False)
@@ -47,14 +37,12 @@ class MalemotionIE(InfoExtractor):
'format_id': 'mp4',
'preference': 1,
}]
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'uploader': None,
'upload_date': None,
'title': video_title,
'thumbnail': video_thumbnail,
'description': None,
'age_limit': 18,
}

View File

@@ -3,10 +3,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_parse_qs,
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,

View File

@@ -5,8 +5,10 @@ import json
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
from ..compat import (
compat_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
get_element_by_id,
@@ -15,7 +17,7 @@ from ..utils import (
class TechTVMITIE(InfoExtractor):
IE_NAME = 'techtv.mit.edu'
_VALID_URL = r'https?://techtv\.mit\.edu/(videos|embeds)/(?P<id>\d+)'
_VALID_URL = r'https?://techtv\.mit\.edu/(?:videos|embeds)/(?P<id>\d+)'
_TEST = {
'url': 'http://techtv.mit.edu/videos/25418-mit-dna-learning-center-set',
@@ -29,8 +31,7 @@ class TechTVMITIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
raw_page = self._download_webpage(
'http://techtv.mit.edu/videos/%s' % video_id, video_id)
clean_page = re.compile(r'<!--.*?-->', re.S).sub('', raw_page)
@@ -104,7 +105,7 @@ class OCWMITIE(InfoExtractor):
'ext': 'mp4',
'title': 'Lecture 7: Multiple Discrete Random Variables: Expectations, Conditioning, Independence',
'description': 'In this lecture, the professor discussed multiple random variables, expectations, and binomial distribution.',
#'subtitles': 'http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-7-multiple-variables-expectations-independence/MIT6_041F11_lec07_300k.mp4.srt'
# 'subtitles': 'http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-7-multiple-variables-expectations-independence/MIT6_041F11_lec07_300k.mp4.srt'
}
},
{
@@ -114,7 +115,7 @@ class OCWMITIE(InfoExtractor):
'ext': 'mp4',
'title': 'Session 1: Introduction to Derivatives',
'description': 'This section contains lecture video excerpts, lecture notes, an interactive mathlet with supporting documents, and problem solving videos.',
#'subtitles': 'http://ocw.mit.edu//courses/mathematics/18-01sc-single-variable-calculus-fall-2010/ocw-18.01-f07-lec01_300k.SRT'
# 'subtitles': 'http://ocw.mit.edu//courses/mathematics/18-01sc-single-variable-calculus-fall-2010/ocw-18.01-f07-lec01_300k.SRT'
}
}
]

View File

@@ -1,12 +1,13 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
get_element_by_attribute,
parse_duration,
strip_jsonp,
@@ -15,7 +16,7 @@ from ..utils import (
class MiTeleIE(InfoExtractor):
IE_NAME = 'mitele.es'
_VALID_URL = r'http://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<episode>[^/]+)/'
_VALID_URL = r'http://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
_TEST = {
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
@@ -31,12 +32,10 @@ class MiTeleIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = mobj.group('episode')
episode = self._match_id(url)
webpage = self._download_webpage(url, episode)
embed_data_json = self._search_regex(
r'MSV\.embedData\[.*?\]\s*=\s*({.*?});', webpage, 'embed data',
flags=re.DOTALL
r'(?s)MSV\.embedData\[.*?\]\s*=\s*({.*?});', webpage, 'embed data',
).replace('\'', '"')
embed_data = json.loads(embed_data_json)

View File

@@ -3,8 +3,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
HEADRequest,
int_or_none,

View File

@@ -5,10 +5,12 @@ import json
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
int_or_none,
)

View File

@@ -4,7 +4,7 @@ import os
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
@@ -12,7 +12,7 @@ from ..utils import (
class MofosexIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?(?P<url>mofosex\.com/videos/(?P<videoid>[0-9]+)/.*?\.html)'
_VALID_URL = r'https?://(?:www\.)?(?P<url>mofosex\.com/videos/(?P<id>[0-9]+)/.*?\.html)'
_TEST = {
'url': 'http://www.mofosex.com/videos/5018/japanese-teen-music-video.html',
'md5': '1b2eb47ac33cc75d4a80e3026b613c5a',
@@ -26,7 +26,7 @@ class MofosexIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
video_id = mobj.group('id')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)

View File

@@ -5,7 +5,7 @@ import os.path
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
@@ -37,10 +37,9 @@ class MonikerIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
orig_webpage = self._download_webpage(url, video_id)
fields = re.findall(r'type="hidden" name="(.+?)"\s* value="?(.+?)">', orig_webpage)
data = dict(fields)

View File

@@ -1,14 +1,15 @@
from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
from ..compat import (
compat_urllib_request,
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
)
class MooshareIE(InfoExtractor):
@@ -43,9 +44,7 @@ class MooshareIE(InfoExtractor):
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
page = self._download_webpage(url, video_id, 'Downloading page')
if re.search(r'>Video Not Found or Deleted<', page) is not None:
@@ -64,8 +63,7 @@ class MooshareIE(InfoExtractor):
'http://mooshare.biz/%s' % video_id, compat_urllib_parse.urlencode(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self.to_screen('%s: Waiting for timeout' % video_id)
time.sleep(5)
self._sleep(5, video_id)
video_page = self._download_webpage(request, video_id, 'Downloading video page')

View File

@@ -3,13 +3,14 @@ from __future__ import unicode_literals
import hashlib
import json
import re
import time
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_parse_qs,
compat_str,
)
from ..utils import (
int_or_none,
)
@@ -32,10 +33,9 @@ class MotorsportIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
flashvars_code = self._html_search_regex(
r'<embed id="player".*?flashvars="([^"]+)"', webpage, 'flashvars')
flashvars = compat_parse_qs(flashvars_code)

View File

@@ -3,9 +3,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
)
from ..utils import (
ExtractorError,
compat_str,
clean_html,
)

View File

@@ -3,9 +3,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,

View File

@@ -2,9 +2,10 @@ from __future__ import unicode_literals
import os.path
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
)

View File

@@ -4,8 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
clean_html,
)
@@ -26,9 +28,9 @@ class NaverIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
m_id = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',
webpage)
if m_id is None:

View File

@@ -4,8 +4,10 @@ import re
import json
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_str,
)
from ..utils import (
ExtractorError,
find_xpath_attr,
)

View File

@@ -1,9 +1,7 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_request,
compat_urllib_parse,
)
@@ -12,7 +10,7 @@ from ..utils import (
class NFBIE(InfoExtractor):
IE_NAME = 'nfb'
IE_DESC = 'National Film Board of Canada'
_VALID_URL = r'https?://(?:www\.)?(nfb|onf)\.ca/film/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?(?:nfb|onf)\.ca/film/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'https://www.nfb.ca/film/qallunaat_why_white_people_are_funny',
@@ -32,10 +30,10 @@ class NFBIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage('https://www.nfb.ca/film/%s' % video_id, video_id, 'Downloading film page')
video_id = self._match_id(url)
page = self._download_webpage(
'https://www.nfb.ca/film/%s' % video_id, video_id,
'Downloading film page')
uploader_id = self._html_search_regex(r'<a class="director-link" href="/explore-all-directors/([^/]+)/"',
page, 'director id', fatal=False)

View File

@@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
compat_urllib_parse_urlparse,
int_or_none,
remove_end,
)

View File

@@ -54,7 +54,7 @@ class NHLBaseInfoExtractor(InfoExtractor):
class NHLIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/console(?:\?(?:.*?[?&])?)id=(?P<id>[0-9a-z-]+)'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/console(?:\?(?:.*?[?&])?)id=(?P<id>[-0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://video.canucks.nhl.com/videocenter/console?catid=6?id=453614',

View File

@@ -5,14 +5,16 @@ import re
import json
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
unified_strdate,
parse_duration,
int_or_none,
)
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
unified_strdate,
)

View File

@@ -23,6 +23,9 @@ class NineGagIE(InfoExtractor):
"ext": "mp4",
"description": "This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
"title": "\"People Are Awesome 2013\" Is Absolutely Awesome",
'uploader_id': 'UCdEH6EjDKwtTe-sO2f0_1XA',
'uploader': 'CompilationChannel',
'upload_date': '20131110',
"view_count": int,
"thumbnail": "re:^https?://",
},
@@ -35,6 +38,9 @@ class NineGagIE(InfoExtractor):
'display_id': 'alternate-banned-opening-scene-of-gravity',
"description": "While Gravity was a pretty awesome movie already, YouTuber Krishna Shenoi came up with a way to improve upon it, introducing a much better solution to Sandra Bullock's seemingly endless tumble in space. The ending is priceless.",
'title': "Banned Opening Scene Of \"Gravity\" That Changes The Whole Movie",
'uploader': 'Krishna Shenoi',
'upload_date': '20140401',
'uploader_id': 'krishnashenoi93',
},
}]

View File

@@ -6,13 +6,15 @@ import time
import hashlib
from .common import InfoExtractor
from ..utils import (
compat_urllib_request,
compat_urllib_parse,
ExtractorError,
clean_html,
unified_strdate,
from ..compat import (
compat_str,
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
clean_html,
ExtractorError,
unified_strdate,
)

View File

@@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
)
from ..utils import (
ExtractorError,
compat_urllib_request,
urlencode_postdata,
xpath_text,
xpath_with_ns,
@@ -32,8 +34,7 @@ class NosVideoIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
fields = {
'id': video_id,

View File

@@ -3,9 +3,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
)
from ..utils import (
ExtractorError,
compat_urlparse
)

View File

@@ -7,7 +7,7 @@ class NowVideoIE(NovaMovIE):
IE_NAME = 'nowvideo'
IE_DESC = 'NowVideo'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:ch|sx|eu|at|ag|co)'}
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:ch|sx|eu|at|ag|co|li)'}
_HOST = 'www.nowvideo.ch'

View File

@@ -3,15 +3,17 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
)
from ..utils import (
parse_duration,
unified_strdate,
compat_urllib_request,
)
class NuvidIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www|m)\.nuvid\.com/video/(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www|m)\.nuvid\.com/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://m.nuvid.com/video/1310741/',
'md5': 'eab207b7ac4fccfb4e23c86201f11277',
@@ -26,8 +28,7 @@ class NuvidIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
formats = []

View File

@@ -16,7 +16,6 @@ class OoyalaIE(InfoExtractor):
{
# From http://it.slashdot.org/story/13/04/25/178216/recovering-data-from-broken-hard-drives-and-ssds-video
'url': 'http://player.ooyala.com/player.js?embedCode=pxczE2YjpfHfn1f3M-ykG_AmJRRn0PD8',
'md5': '3f5cceb3a7bf461d6c29dc466cf8033c',
'info_dict': {
'id': 'pxczE2YjpfHfn1f3M-ykG_AmJRRn0PD8',
'ext': 'mp4',
@@ -26,7 +25,6 @@ class OoyalaIE(InfoExtractor):
}, {
# Only available for ipad
'url': 'http://player.ooyala.com/player.js?embedCode=x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0',
'md5': '4b9754921fddb68106e48c142e2a01e6',
'info_dict': {
'id': 'x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0',
'ext': 'mp4',

View File

@@ -17,24 +17,39 @@ from ..utils import (
class ORFTVthekIE(InfoExtractor):
IE_NAME = 'orf:tvthek'
IE_DESC = 'ORF TVthek'
_VALID_URL = r'https?://tvthek\.orf\.at/(?:programs/.+?/episodes|topics/.+?|program/[^/]+)/(?P<id>\d+)'
_VALID_URL = r'https?://tvthek\.orf\.at/(?:programs/.+?/episodes|topics?/.+?|program/[^/]+)/(?P<id>\d+)'
_TEST = {
'url': 'http://tvthek.orf.at/program/matinee-Was-Sie-schon-immer-ueber-Klassik-wissen-wollten/7317210/Was-Sie-schon-immer-ueber-Klassik-wissen-wollten/7319746/Was-Sie-schon-immer-ueber-Klassik-wissen-wollten/7319747',
'file': '7319747.mp4',
'md5': 'bd803c5d8c32d3c64a0ea4b4eeddf375',
'info_dict': {
'title': 'Was Sie schon immer über Klassik wissen wollten',
'description': 'md5:0ddf0d5f0060bd53f744edaa5c2e04a4',
'duration': 3508,
'upload_date': '20140105',
},
'skip': 'Blocked outside of Austria',
}
_TESTS = [{
'url': 'http://tvthek.orf.at/program/Aufgetischt/2745173/Aufgetischt-Mit-der-Steirischen-Tafelrunde/8891389',
'playlist': [{
'md5': '2942210346ed779588f428a92db88712',
'info_dict': {
'id': '8896777',
'ext': 'mp4',
'title': 'Aufgetischt: Mit der Steirischen Tafelrunde',
'description': 'md5:c1272f0245537812d4e36419c207b67d',
'duration': 2668,
'upload_date': '20141208',
},
}],
'skip': 'Blocked outside of Austria / Germany',
}, {
'url': 'http://tvthek.orf.at/topic/Im-Wandel-der-Zeit/8002126/Best-of-Ingrid-Thurnher/7982256',
'playlist': [{
'md5': '68f543909aea49d621dfc7703a11cfaf',
'info_dict': {
'id': '7982259',
'ext': 'mp4',
'title': 'Best of Ingrid Thurnher',
'upload_date': '20140527',
'description': 'Viele Jahre war Ingrid Thurnher das "Gesicht" der ZIB 2. Vor ihrem Wechsel zur ZIB 2 im jahr 1995 moderierte sie unter anderem "Land und Leute", "Österreich-Bild" und "Niederösterreich heute".',
}
}],
'_skip': 'Blocked outside of Austria / Germany',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
data_json = self._search_regex(
@@ -43,7 +58,9 @@ class ORFTVthekIE(InfoExtractor):
def get_segments(all_data):
for data in all_data:
if data['name'] == 'Tracker::EPISODE_DETAIL_PAGE_OVER_PROGRAM':
if data['name'] in (
'Tracker::EPISODE_DETAIL_PAGE_OVER_PROGRAM',
'Tracker::EPISODE_DETAIL_PAGE_OVER_TOPIC'):
return data['values']['segments']
sdata = get_segments(all_data)
@@ -120,9 +137,7 @@ class ORFOE1IE(InfoExtractor):
_VALID_URL = r'http://oe1\.orf\.at/programm/(?P<id>[0-9]+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
show_id = mobj.group('id')
show_id = self._match_id(url)
data = self._download_json(
'http://oe1.orf.at/programm/%s/konsole' % show_id,
show_id

View File

@@ -4,16 +4,17 @@ import json
import re
from .common import InfoExtractor
from ..utils import compat_urllib_parse
from ..compat import compat_urllib_parse
class PhotobucketIE(InfoExtractor):
_VALID_URL = r'http://(?:[a-z0-9]+\.)?photobucket\.com/.*(([\?\&]current=)|_)(?P<id>.*)\.(?P<ext>(flv)|(mp4))'
_TEST = {
'url': 'http://media.photobucket.com/user/rachaneronas/media/TiredofLinkBuildingTryBacklinkMyDomaincom_zpsc0c3b9fa.mp4.html?filters[term]=search&filters[primary]=videos&filters[secondary]=images&sort=1&o=0',
'file': 'zpsc0c3b9fa.mp4',
'md5': '7dabfb92b0a31f6c16cebc0f8e60ff99',
'info_dict': {
'id': 'zpsc0c3b9fa',
'ext': 'mp4',
'timestamp': 1367669341,
'upload_date': '20130504',
'uploader': 'rachaneronas',

View File

@@ -5,11 +5,13 @@ import re
import os.path
from .common import InfoExtractor
from ..utils import (
ExtractorError,
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
)
class PlayedIE(InfoExtractor):
@@ -28,7 +30,6 @@ class PlayedIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
orig_webpage = self._download_webpage(url, video_id)
m_error = re.search(

View File

@@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,

View File

@@ -3,31 +3,31 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
from ..compat import (
compat_urllib_parse,
)
from ..utils import (
clean_html,
ExtractorError,
)
class PlayvidIE(InfoExtractor):
_VALID_URL = r'^https?://www\.playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
_VALID_URL = r'https?://www\.playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
_TEST = {
'url': 'http://www.playvid.com/watch/agbDDi7WZTV',
'md5': '44930f8afa616efdf9482daf4fe53e1e',
'url': 'http://www.playvid.com/watch/RnmBNgtrrJu',
'md5': 'ffa2f6b2119af359f544388d8c01eb6c',
'info_dict': {
'id': 'agbDDi7WZTV',
'id': 'RnmBNgtrrJu',
'ext': 'mp4',
'title': 'Michelle Lewin in Miami Beach',
'duration': 240,
'title': 'md5:9256d01c6317e3f703848b5906880dc8',
'duration': 82,
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
m_error = re.search(

View File

@@ -4,10 +4,12 @@ import os
import re
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
from ..utils import (
str_to_int,
)
from ..aes import (
@@ -16,7 +18,7 @@ from ..aes import (
class PornHubIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?pornhub\.com/view_video\.php\?viewkey=(?P<id>[0-9a-f]+)'
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/view_video\.php\?viewkey=(?P<id>[0-9a-f]+)'
_TEST = {
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
'md5': '882f488fa1f0026f023f33576004a2ed',

View File

@@ -1,56 +1,94 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
)
from ..utils import (
compat_urllib_parse,
unified_strdate,
int_or_none,
)
class PornotubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?pornotube\.com(/c/(?P<channel>[0-9]+))?(/m/(?P<videoid>[0-9]+))(/(?P<title>.+))$'
_VALID_URL = r'https?://(?:\w+\.)?pornotube\.com/(?:[^?#]*?)/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing',
'md5': '374dd6dcedd24234453b295209aa69b6',
'url': 'http://www.pornotube.com/orientation/straight/video/4964/title/weird-hot-and-wet-science',
'md5': '60fc5a4f0d93a97968fc7999d98260c9',
'info_dict': {
'id': '1689755',
'ext': 'flv',
'upload_date': '20090708',
'title': 'Marilyn-Monroe-Bathing',
'age_limit': 18
'id': '4964',
'ext': 'mp4',
'upload_date': '20141203',
'title': 'Weird Hot and Wet Science',
'description': 'md5:a8304bef7ef06cb4ab476ca6029b01b0',
'categories': ['Adult Humor', 'Blondes'],
'uploader': 'Alpha Blue Archives',
'thumbnail': 're:^https?://.*\\.jpg$',
'timestamp': 1417582800,
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = self._match_id(url)
video_id = mobj.group('videoid')
video_title = mobj.group('title')
# Fetch origin token
js_config = self._download_webpage(
'http://www.pornotube.com/assets/src/app/config.js', video_id,
note='Download JS config')
originAuthenticationSpaceKey = self._search_regex(
r"constant\('originAuthenticationSpaceKey',\s*'([^']+)'",
js_config, 'originAuthenticationSpaceKey')
# Get webpage content
webpage = self._download_webpage(url, video_id)
# Fetch actual token
token_req_data = {
'authenticationSpaceKey': originAuthenticationSpaceKey,
'credentials': 'Clip Application',
}
token_req = compat_urllib_request.Request(
'https://api.aebn.net/auth/v1/token/primal',
data=json.dumps(token_req_data).encode('utf-8'))
token_req.add_header('Content-Type', 'application/json')
token_req.add_header('Origin', 'http://www.pornotube.com')
token_answer = self._download_json(
token_req, video_id, note='Requesting primal token')
token = token_answer['tokenKey']
# Get the video URL
VIDEO_URL_RE = r'url: "(?P<url>http://video[0-9].pornotube.com/.+\.flv)",'
video_url = self._search_regex(VIDEO_URL_RE, webpage, 'video url')
video_url = compat_urllib_parse.unquote(video_url)
# Get video URL
delivery_req = compat_urllib_request.Request(
'https://api.aebn.net/delivery/v1/clips/%s/MP4' % video_id)
delivery_req.add_header('Authorization', token)
delivery_info = self._download_json(
delivery_req, video_id, note='Downloading delivery information')
video_url = delivery_info['mediaUrl']
# Get the uploaded date
VIDEO_UPLOADED_RE = r'<div class="video_added_by">Added (?P<date>[0-9\/]+) by'
upload_date = self._html_search_regex(VIDEO_UPLOADED_RE, webpage, 'upload date', fatal=False)
if upload_date:
upload_date = unified_strdate(upload_date)
age_limit = self._rta_search(webpage)
# Get additional info (title etc.)
info_req = compat_urllib_request.Request(
'https://api.aebn.net/content/v1/clips/%s?expand='
'title,description,primaryImageNumber,startSecond,endSecond,'
'movie.title,movie.MovieId,movie.boxCoverFront,movie.stars,'
'movie.studios,stars.name,studios.name,categories.name,'
'clipActive,movieActive,publishDate,orientations' % video_id)
info_req.add_header('Authorization', token)
info = self._download_json(
info_req, video_id, note='Downloading metadata')
timestamp = int_or_none(info.get('publishDate'), scale=1000)
uploader = info.get('studios', [{}])[0].get('name')
movie_id = info['movie']['movieId']
thumbnail = 'http://pic.aebn.net/dis/t/%s/%s_%08d.jpg' % (
movie_id, movie_id, info['primaryImageNumber'])
categories = [c['name'] for c in info.get('categories')]
return {
'id': video_id,
'url': video_url,
'upload_date': upload_date,
'title': video_title,
'ext': 'flv',
'format': 'flv',
'age_limit': age_limit,
'title': info['title'],
'description': info.get('description'),
'timestamp': timestamp,
'uploader': uploader,
'thumbnail': thumbnail,
'categories': categories,
'age_limit': 18,
}

Some files were not shown because too many files have changed in this diff Show More