Compare commits

..

792 Commits

Author SHA1 Message Date
Philipp Hagemeister
a7680bf330 release 2014.08.27.1 2014-08-27 02:37:23 +02:00
Philipp Hagemeister
6d3d3fc083 [ard] Add suppor for plain ARD downloads (Fixes #3546) 2014-08-27 02:36:57 +02:00
Philipp Hagemeister
aff216edf4 [generic] Prevent <video> search from skipping over empty sources (#3546) 2014-08-27 02:09:59 +02:00
Philipp Hagemeister
1cb6dcdbbe [generic] Do not download images as videos by accident 2014-08-27 02:07:11 +02:00
Philipp Hagemeister
3f514a353e release 2014.08.27 2014-08-27 01:44:54 +02:00
Philipp Hagemeister
da9ec3b932 [muscivault] Add extractor (Fixes #3593) 2014-08-27 01:44:47 +02:00
Philipp Hagemeister
191b7cbba9 [mfs] Modernize 2014-08-27 01:04:32 +02:00
Philipp Hagemeister
e8c59b9642 release 2014.08.26 2014-08-26 21:30:52 +02:00
Philipp Hagemeister
6abb066128 [sockshare] Fix title extraction (Fixes #3592) 2014-08-26 21:30:30 +02:00
Philipp Hagemeister
8f1ea7cbb6 [empflix] Revert to XML parser
Don't rely on the XML being broken (if they fix it, our code wouldn't work anymore).
Instead, use the transform function we already have :)

This partially reverts commit c7bee2a725.
2014-08-26 15:51:42 +02:00
Jaime Marquínez Ferrándiz
a204c85408 [ign] Fix extraction of video in articles 2014-08-26 15:38:29 +02:00
Sergey M․
15a1f4b8fe [empflix] Extract thumbnail 2014-08-26 20:10:36 +07:00
Sergey M․
c7bee2a725 [empflix] Adapt to malformed config XML 2014-08-26 20:07:28 +07:00
Jaime Marquínez Ferrándiz
dbc1366b50 [mixcloud] Use a HEAD request when checking if the url is valid 2014-08-26 14:55:15 +02:00
Philipp Hagemeister
704df56da7 [sportdeutschland] add new extractor 2014-08-26 12:51:13 +02:00
Philipp Hagemeister
33ac271ba7 [utils] Let request headers override standard headers
What was I thinking when writing this?
2014-08-26 11:51:48 +02:00
Philipp Hagemeister
0963f92f23 [eighttracks] modernize 2014-08-26 11:31:23 +02:00
Philipp Hagemeister
9a66c1079c release 2014.08.25.3 2014-08-25 18:38:10 +02:00
Philipp Hagemeister
f971dcbba0 Merge branch 'master' of github.com:rg3/youtube-dl 2014-08-25 18:36:42 +02:00
Philipp Hagemeister
0990305d2a [generic] Fix rss under Python 2.x and move test to extractor 2014-08-25 18:03:01 +02:00
Jaime Marquínez Ferrándiz
bcc069a937 [generic] Remove debug statement 2014-08-25 17:21:58 +02:00
Jaime Marquínez Ferrándiz
34708e1bb6 [bliptv] Remove superfluous characters in _VALID_URL regex 2014-08-25 17:16:11 +02:00
Philipp Hagemeister
829476b80a [googlesearch] Move test to extractor 2014-08-25 17:02:52 +02:00
Philipp Hagemeister
1dd70fe330 release 2014.08.25.2 2014-08-25 16:52:28 +02:00
Philipp Hagemeister
067e922295 release 2014.08.25.1 2014-08-25 16:41:05 +02:00
Sergey M․
c28df2478f [wat] Use server time and pass country argument (Closes #3579) 2014-08-25 20:21:33 +07:00
Philipp Hagemeister
241f7a8ade Merge remote-tracking branch 'JGjorgji/fix-leading-zeroes' 2014-08-25 13:59:19 +02:00
Philipp Hagemeister
b252735910 [extractor/common] Generate better f4m format IDs 2014-08-25 13:03:08 +02:00
Philipp Hagemeister
7adcbe7594 [rtlnl] Extract duration 2014-08-25 12:59:53 +02:00
Philipp Hagemeister
8d31fa3cce [execafterdownload] Simplify (#3569) 2014-08-25 10:18:01 +02:00
Philipp Hagemeister
1f06864e9a [wat] Remove unused import 2014-08-25 10:15:32 +02:00
Philipp Hagemeister
348ae0a79e Merge remote-tracking branch 'mcd1992/exec_after_download' 2014-08-25 09:44:11 +02:00
Philipp Hagemeister
528d455632 release 2014.08.25 2014-08-25 09:35:46 +02:00
Philipp Hagemeister
ba5d51b340 [vimeo] Always pass in referer (Fixes #3582) 2014-08-25 09:35:37 +02:00
mcd1992
7833d941bb Rebased with upstream/master 2014-08-24 15:04:50 -05:00
mcd1992
a2360a4c80 Moved from os.system to subprocess.call 2014-08-24 14:38:43 -05:00
mcd1992
a7cacbca2b Implemented --exec option. 2014-08-24 14:38:43 -05:00
Gjorgji Jankovski
c6b4132a0a renamed for consistency 2014-08-24 18:49:04 +02:00
Gjorgji Jankovski
ad260c90ab Filenames are padded according to the playlist length 2014-08-24 18:23:32 +02:00
Philipp Hagemeister
b8313f07bc release 2014.08.24.6 2014-08-24 15:19:33 +02:00
Philipp Hagemeister
92a17d28ac [wat] Make geolock a warning (Fixes #3579) 2014-08-24 15:19:21 +02:00
Philipp Hagemeister
5f90042bd6 [generic] remove unused imports 2014-08-24 14:28:58 +02:00
Philipp Hagemeister
9480d1a566 Merge remote-tracking branch 'riking/twofactor' 2014-08-24 07:14:23 +02:00
Philipp Hagemeister
36b0079f23 Credit @olebowle for GameOne:playlist (#3247) 2014-08-24 07:06:54 +02:00
Philipp Hagemeister
28028629b9 [gameone:playlist] Move test to extractor 2014-08-24 07:05:49 +02:00
Philipp Hagemeister
11f75cac3d Merge remote-tracking branch 'olebowle/gameone' 2014-08-24 07:02:29 +02:00
Philipp Hagemeister
e673db0194 release 2014.08.24.5 2014-08-24 06:58:47 +02:00
Philipp Hagemeister
ebab4520ff [generic] Use default opener for HEAD request (Fixes #3528) 2014-08-24 06:58:11 +02:00
Philipp Hagemeister
a71d1414eb release 2014.08.24.4 2014-08-24 06:42:05 +02:00
Philipp Hagemeister
423817c468 [expotv] Add new extractor (Fixes #3552) 2014-08-24 06:41:55 +02:00
Philipp Hagemeister
51ed9fce09 [pornotube] Modernize 2014-08-24 06:16:24 +02:00
Philipp Hagemeister
d43aeb1d00 release 2014.08.24.3 2014-08-24 05:32:31 +02:00
Philipp Hagemeister
4d805e063c [generic] Automatic detection of flow player and age_limit (Fixes #3576) 2014-08-24 05:31:32 +02:00
Philipp Hagemeister
24e5e24166 release 2014.08.24.2 2014-08-24 04:47:38 +02:00
Philipp Hagemeister
4d54ef20a2 [ministrygrid] Add extractor (Fixes #2900) 2014-08-24 04:47:28 +02:00
Philipp Hagemeister
54036b3991 [wayofthemaster] Remove unused import 2014-08-24 04:18:09 +02:00
Philipp Hagemeister
e5402ac120 [wayofthemaster] Add extractor (Fixes #3575) 2014-08-24 04:14:02 +02:00
Philipp Hagemeister
f56f8399c7 [ebaumsworld] Remove spurious determine_ext 2014-08-24 03:37:19 +02:00
Philipp Hagemeister
cf0c5fa3a1 [ebaumsworld] Modernize 2014-08-24 03:31:38 +02:00
Philipp Hagemeister
8c2ccefae6 release 2014.08.24.1 2014-08-24 03:20:40 +02:00
Philipp Hagemeister
1f8b6af773 [bip.tv] Allow underscore in lookup ids (Fixes #3573) 2014-08-24 03:20:31 +02:00
Philipp Hagemeister
8f9b683eeb [blip.tv] Add legacy test case
This was broken in the mean time, so add a test case to make sure it doesn't break silently again.
2014-08-24 03:13:58 +02:00
Philipp Hagemeister
b5f4775b38 [arte.tv:creative] Fix test case 2014-08-24 03:11:00 +02:00
Philipp Hagemeister
01d906ffe9 [arte:creative] Support more URLs (fixes #3572) 2014-08-24 02:57:32 +02:00
Philipp Hagemeister
614582bcc4 release 2014.08.24 2014-08-24 02:44:36 +02:00
Philipp Hagemeister
e1ab5000b2 [brightcove] Add support for videoId= in og:video meta (Fixes #3571) 2014-08-24 02:41:21 +02:00
Philipp Hagemeister
a5ed3e571e [brightcove] Detect geoblocking 2014-08-24 02:40:26 +02:00
Philipp Hagemeister
10eaeb20c5 [generic] Require og:video URLs to contain a dot 2014-08-24 02:29:56 +02:00
Philipp Hagemeister
fa8deaf38b [generic] Prevent from downloading a .swf as a video
We're seeing quite a number of people who do not put a video file in the og:video field, but the player URL. Try to detect some of these and filter them out.
2014-08-24 02:24:49 +02:00
Philipp Hagemeister
6857590059 [brightcove] Add a truncated URL warning message (#3571) 2014-08-24 02:11:26 +02:00
Philipp Hagemeister
a3db22ebdf [grooveshark] Use proper imports 2014-08-24 02:06:59 +02:00
Philipp Hagemeister
c8e9a235d9 [generic] Add support for camtasia videos (Fixes #3574) 2014-08-24 02:02:17 +02:00
Philipp Hagemeister
30b871b0ca Merge remote-tracking branch 'origin/master' 2014-08-24 01:34:28 +02:00
Philipp Hagemeister
eb9da9b732 [grooveshark] Fix test md5sum 2014-08-24 01:33:55 +02:00
Philipp Hagemeister
d769be6c96 [grooveshark,http] Make HTTP POST downloads work 2014-08-24 01:31:35 +02:00
Sergey M․
a54bda3ae2 [wat] Add support for SD and HD videos (Closes #3558) 2014-08-24 02:22:10 +07:00
Philipp Hagemeister
00558d9414 Merge remote-tracking branch 'sehrgut/Grooveshark'
Conflicts:
	youtube_dl/__init__.py
	youtube_dl/extractor/__init__.py
2014-08-23 16:41:14 +02:00
Philipp Hagemeister
49f3c16543 release 2014.08.23 2014-08-23 15:24:31 +02:00
Philipp Hagemeister
2ef6fcb5d8 [sbs] Add new extractor (Fixes #3566) 2014-08-23 15:20:56 +02:00
Philipp Hagemeister
38fc045253 [rtlnl] Remove unused code 2014-08-23 15:05:21 +02:00
Philipp Hagemeister
af1fd929c6 [patreon] Remove unused import 2014-08-23 15:04:11 +02:00
Philipp Hagemeister
b7b04c9234 [vodlocker] Allow title to end with a <br> 2014-08-23 14:39:47 +02:00
Sergey M․
bc0bb6fd30 [movieclips] Add extractor (Closes #3554) 2014-08-23 17:44:56 +07:00
Philipp Hagemeister
430826c9d4 Merge pull request #3568 from MikeCol/xhamster_load
changed _VALID_URL to allow for country specific subdomains
2014-08-22 22:46:42 +02:00
MikeCol
68909f0c4e changed _VALID_URL to allow for country specific prefixes 2014-08-22 22:17:07 +02:00
Philipp Hagemeister
9d048a17d8 [rtve.es:live] Start supporting the 24h channel 2014-08-22 18:47:49 +02:00
Philipp Hagemeister
492641d10a release 2014.08.22.3 2014-08-22 18:41:43 +02:00
Philipp Hagemeister
2b9faf5542 [rtve] Add support for live stream
At the moment, only RTVE-1 seems to work flawlessly.
-2 seems geoblocked right now.
-TDP doesn't seem to be available outside of Spain.
2014-08-22 18:40:28 +02:00
Philipp Hagemeister
ed2d6a1960 [generic] Simplify playlist support (#2948) 2014-08-22 18:19:56 +02:00
Philipp Hagemeister
be843678b1 [YouTubeDL] Correct handling of age_limit = None in result 2014-08-22 17:46:57 +02:00
Philipp Hagemeister
c71dfccc98 Merge remote-tracking branch 'anovicecodemonkey/generic-data-video-url'
Conflicts:
	youtube_dl/extractor/generic.py
2014-08-22 17:40:36 +02:00
Philipp Hagemeister
1a9ccac7c1 Merge remote-tracking branch 'origin/master' 2014-08-22 17:38:11 +02:00
Philipp Hagemeister
e330d59abb [playfm] Add extractor (Fixes #3538) 2014-08-22 17:38:06 +02:00
Sergey M․
394df6d7d0 [nuvid] Adapt to latest layout changes 2014-08-22 21:41:51 +07:00
Philipp Hagemeister
218f754940 [README] Add thumbnail to _TEST example
While it's not mandatory, extractors are highly encouraged to provide a thumbnail field.
2014-08-22 11:30:49 +02:00
Philipp Hagemeister
a053c3493a [test_YoutubeDL] Reorder formats (#3542) 2014-08-22 03:44:30 +02:00
Philipp Hagemeister
50b294aab8 release 2014.08.22.2 2014-08-22 03:16:16 +02:00
Philipp Hagemeister
756b046f3e [pbs] recognize class=partnerPlayer as well (Fixes #3564) 2014-08-22 03:16:08 +02:00
Philipp Hagemeister
388ac0b18a release 2014.08.22.1 2014-08-22 03:02:49 +02:00
Philipp Hagemeister
ad06434bd3 release 2014.08.22 2014-08-22 02:57:08 +02:00
Philipp Hagemeister
bd9820c937 Merge remote-tracking branch 'liudongmiao/patch-subtitle' 2014-08-22 02:45:21 +02:00
Philipp Hagemeister
deda8ac376 Credit @terminalmage for patreon (#3390) 2014-08-22 02:34:22 +02:00
Philipp Hagemeister
e05f693942 [patreon] Simplify (#3390) 2014-08-22 02:33:29 +02:00
Philipp Hagemeister
b27295d2ab Merge remote-tracking branch 'terminalmage/add-patreon' 2014-08-22 01:52:56 +02:00
Philipp Hagemeister
ace52c5713 [README] format 2014-08-22 01:51:26 +02:00
Philipp Hagemeister
e62e150f64 [README] brevity is the soul of wit
These instructions are overly long as it is. Leave out the _TESTS example; most developers will not need it in their first IE.
2014-08-22 01:47:44 +02:00
Philipp Hagemeister
c44c0a775d Merge remote-tracking branch 'terminalmage/readme' 2014-08-22 01:46:46 +02:00
Philipp Hagemeister
5fcf2dbed0 [aparat] modernize 2014-08-22 01:44:52 +02:00
Philipp Hagemeister
91dff03217 [dump] Modernize (#3565) 2014-08-22 01:43:19 +02:00
Philipp Hagemeister
a200f4cee2 Merge remote-tracking branch 'yasoob/master' 2014-08-22 01:38:59 +02:00
Philipp Hagemeister
ea6e8d5454 [metacafe] Add support for movieclips videos (Fixes #3555) 2014-08-22 01:36:07 +02:00
M.Yasoob Ullah Khalid ☺
83d35817f5 Added test for dump.com 2014-08-22 01:31:12 +05:00
M.Yasoob Ullah Khalid ☺
76beff70a8 Added an IE for Dump.com 2014-08-22 01:30:49 +05:00
Philipp Hagemeister
61882bf7c6 release 2014.08.21.3 2014-08-21 18:02:02 +02:00
Philipp Hagemeister
cab317a680 Merge remote-tracking branch 'origin/master' 2014-08-21 18:01:33 +02:00
Sergey M․
73159f99cc [utils] Add missing mode and encoding arguments 2014-08-21 22:03:00 +07:00
Philipp Hagemeister
c15235cd07 [metacafe] Avoid excessive nesting 2014-08-21 13:37:19 +02:00
Philipp Hagemeister
12c3ec3382 [metacafe] Simplify 2014-08-21 13:25:17 +02:00
Philipp Hagemeister
55db73efdf [youtube] tag 171 is 128KBits (Fixes #3542) 2014-08-21 13:13:26 +02:00
Philipp Hagemeister
af40ac054a release 2014.08.21.2 2014-08-21 13:07:49 +02:00
Philipp Hagemeister
a36819731b [escapist] Add support for og:video:url (Fixes #3557) 2014-08-21 13:05:24 +02:00
Philipp Hagemeister
181c8655c7 [utils] Make JSON file writes atomic (Fixes #3549) 2014-08-21 13:01:13 +02:00
Philipp Hagemeister
3b95347bb6 [README] Document homebrew and pip installation (#3190) 2014-08-21 12:32:02 +02:00
Philipp Hagemeister
3b88ee9a7d release 2014.08.21.1 2014-08-21 12:16:21 +02:00
Philipp Hagemeister
55c49908d2 [youtube] Handle incorrectly written cache files (#3549) 2014-08-21 12:15:51 +02:00
Philipp Hagemeister
db9b0b67b7 release 2014.08.21 2014-08-21 11:58:08 +02:00
Philipp Hagemeister
35f76e0061 Merge remote-tracking branch 'origin/master' 2014-08-21 11:57:52 +02:00
Philipp Hagemeister
3f338cd6de Credit @akirk for ellentv 2014-08-21 11:57:44 +02:00
Philipp Hagemeister
1d01f26ab1 [ellentv] Simplify and correct tests 2014-08-21 11:57:03 +02:00
Philipp Hagemeister
266c71f971 Deprecate test_playlists 2014-08-21 11:56:49 +02:00
Philipp Hagemeister
e8ee972c6e Allow playlist test definitions in test_download.
This moves playlist tests where they belong, i.e. to the extractors themselves.
Additionally, all our network interaction configuration for tests in test_download now applies to playlist tests as well.
2014-08-21 11:52:07 +02:00
Sergey M․
f83dda12ad [teamcoco] Update video id regex 2014-08-20 20:30:29 +07:00
Sergey M․
696d49815e Merge branch 'naglis-jove' 2014-08-19 20:02:31 +07:00
Sergey M․
fe556f1b0c [jove] Simplify, extract full description and add test for video that requires subscription 2014-08-19 20:02:08 +07:00
Sergey M․
d5638d974f Merge branch 'jove' of https://github.com/naglis/youtube-dl into naglis-jove 2014-08-19 19:22:25 +07:00
Jaime Marquínez Ferrándiz
938dd254e5 [mitele] Add extractor for mitele.es 2014-08-18 22:43:35 +02:00
Jaime Marquínez Ferrándiz
6493f5d704 [rtlnl] Add extractor for rtlxl.nl (closes #3523) 2014-08-18 15:40:48 +02:00
Sergey M․
cd6b48365e [pbs] Add frontline video test 2014-08-18 19:24:18 +07:00
Sergey M․
4d9bd478f9 [pbs] Extract coveplayerid (Closes #3522) 2014-08-18 19:20:53 +07:00
riking
165250ff5e Remove debug prints 2014-08-16 14:49:30 -07:00
riking
83317f6938 [youtube] Add two-factor account signin (TOTP only)
Additional work is required to prompt the user for the SMS or phone call codes, as there is no framework currently to prompt the user during an extraction operation.

Fixes #3533
2014-08-16 14:48:17 -07:00
Sergey M․
c1d293cfa6 [dfb] Fix f4m manifest URL 2014-08-17 02:07:04 +07:00
Sergey M․
49807b4ac6 [yahoo] Add support for embedded videos (Closes #3525) 2014-08-16 13:56:22 +07:00
Sergey M․
c990bb3633 [howstuffworks] Add extractor (#3500)
Content-length is invalid for final download links.
2014-08-15 21:38:41 +07:00
Philipp Hagemeister
af8322d2f9 Merge remote-tracking branch 'akirk/ellentv' 2014-08-15 10:55:54 +02:00
rubicks
df866e7f2a envvar overrideable PREFIX, BINDIR, MANDIR, PYTHON 2014-08-14 11:01:23 -04:00
Sergey M․
664718ff63 [livestream] Improve extraction (Closes #3513) 2014-08-14 20:17:31 +07:00
Sergey M․
3258263371 [shared] Update test 2014-08-13 18:24:46 +07:00
Alexander Kirk
3cfafc4a9b [ellentv] Add new extractor 2014-08-13 12:14:44 +02:00
Sergey M․
6f600ff5d6 [ooyala] Try mobile player JS URLs for all available devices (Closes #3498)
Looks like some videos are only available for particular devices
(e.g. http://player.ooyala.com/player.js?embedCode=x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0
is only available for ipad)
Working around with fetching URLs for all the devices found starting with 'unknown'
until we succeed or eventually fail for each device.
2014-08-12 20:54:08 +07:00
Philipp Hagemeister
90e075da3a release 2014.08.10 2014-08-10 19:47:15 +02:00
Philipp Hagemeister
9572013de9 [appletrailers] Support height-less videos 2014-08-10 13:04:45 +02:00
Sergey M․
3a5beb0ca1 [ard] Show error message for videos that are no longer available (#3422) 2014-08-10 17:53:17 +07:00
Jaime Marquínez Ferrándiz
a6da7b6b96 [facebook] Allow '?' before '#!' (fixes #3477) 2014-08-10 11:57:15 +02:00
Jaime Marquínez Ferrándiz
173a7026d5 [test/test_utils] Fix typo in method name 2014-08-10 11:08:56 +02:00
Jaime Marquínez Ferrándiz
40a90862f4 [reverbnation] The 'uploader_id' field must be a string 2014-08-10 11:00:14 +02:00
Jaime Marquínez Ferrándiz
511c4325dc [reverbnation] Simplify json download
We can directly get a json file instead of the jsonp.
2014-08-10 10:58:22 +02:00
Jaime Marquínez Ferrándiz
85a699246a [reverbnation] Modernize test 2014-08-10 10:56:37 +02:00
Jaime Marquínez Ferrándiz
4dc5286e13 [reverbnation] Make sure that the thumbnail url contain the protocol
They are protocol relative.
2014-08-10 10:45:27 +02:00
Sergey M․
c767dc74b8 [downloader/common] Fix typo 2014-08-10 01:41:01 +07:00
Sergey M․
56ca04f662 Credit @sehaas for ORF FM4 extractor (#3431) 2014-08-10 01:26:23 +07:00
Sergey M․
eb3680123a [orf] Move all ORF extractors in one place 2014-08-10 01:21:16 +07:00
Sergey M․
f5273890ee [fm4] Remove unused imports and minor changes 2014-08-10 01:04:10 +07:00
Sergey M.
c7a088a816 Merge pull request #3431 from sehaas/fm4
[fm4] Add new extractor
2014-08-10 00:55:56 +07:00
Sergey M․
fb17b60811 [arte] Do not filter formats when there are no videos of requested lang code (Closes #3433) 2014-08-09 05:45:15 +07:00
Sergey M․
1e58804260 Merge branch 'pyed-xboxclips' 2014-08-08 19:22:31 +07:00
Sergey M․
31bf213032 [xboxclips] PEP8 and extract more metadata 2014-08-08 19:21:24 +07:00
Sergey M․
1cccc41ddc Merge branch 'xboxclips' of https://github.com/pyed/youtube-dl into pyed-xboxclips 2014-08-08 18:48:10 +07:00
Sergey M․
a91cf27767 [nowness] Add support for cn URLs (Closes #3465) 2014-08-08 18:43:28 +07:00
pyed
64d02399d8 [xboxclips] Add new extractor 2014-08-08 09:48:02 +03:00
Sergey M․
5961017202 [vube] Extract audio and categories 2014-08-07 20:04:29 +07:00
Sergey M.
d9760fd43c Merge pull request #3461 from tinybug/patch-2
Update vube.py
2014-08-07 19:14:48 +07:00
tinybug
d42b2d2985 Update vube.py
fix extractor is broken #3459
2014-08-07 11:24:51 +08:00
Philipp Hagemeister
cccfab6412 Restore youtube-dl compat binary
Be on the lookout, it might be modified in pull requests.
When I come back from my vacation (in three days from now), I'll start looking whether we really need the compat binary.
2014-08-06 19:30:16 +02:00
Sergey M․
4665664c92 Credit @DavidFabijan for mojvideo (#3423) 2014-08-06 20:40:55 +07:00
Sergey M․
0adc996bc3 Merge branch 'DavidFabijan-mojvideo' 2014-08-06 20:38:27 +07:00
Sergey M․
b42a2a720b [mojvideo] Switch to API, handle errors, remove faked width and height 2014-08-06 20:37:59 +07:00
Sergey M․
37edd7dd4a Merge branch 'mojvideo' of https://github.com/DavidFabijan/youtube-dl into DavidFabijan-mojvideo 2014-08-06 20:06:48 +07:00
Sergey M.
f87b3500c5 Merge pull request #3453 from naglis/firedrive_fix
[firedrive] fix broken extractor
2014-08-06 19:48:45 +07:00
David Fabijan
66420a2db4 Fixed the encoding 2014-08-06 14:44:29 +02:00
Naglis Jonaitis
6b8492a782 [firedrive] fix broken extractor 2014-08-06 02:26:42 +03:00
Philipp Hagemeister
6de0595eb8 release 2014.08.05 2014-08-05 17:02:47 +02:00
Sergey M․
e48a2c646d Credit @matrixik for #3441 2014-08-05 19:09:11 +07:00
Sergey M.
0f831a1a92 Merge pull request #3441 from matrixik/patch-1
[vimeo] Ignore video 'base' thumbnail (Closes #3438)
2014-08-05 19:07:05 +07:00
Erik Johnson
1ce464aba9 Add more information about running tests, add syntax highlighting
There was no information in the README about how to handle multiple
tests for a given extractor. This commit adds an explanation of how this
is handled.

It also adds some syntax highlighting.
2014-08-05 01:54:58 -05:00
Erik Johnson
6994e70651 Fix CSS parsing for Patreon
Some of the CSS classes end in " double", so this commit refines the
HTML parsing to account for both kinds of classes, and also adds an
additional test case.
2014-08-05 00:26:23 -05:00
Dobrosław Żybort
3e510af38d [vimeo] Ignore video 'base' thumbnail (Closes #3438) 2014-08-04 21:37:36 +02:00
Sebastian Haas
5ecd7b0a92 [fm4] Add new extractor 2014-08-03 20:50:46 +02:00
Naglis Jonaitis
a229909fa6 [jove] Add new extractor. Closes #3177 2014-08-03 21:24:44 +03:00
Sergey M․
548f31d99c [vimeo] Use original URL when for standard vimeo.com links (Closes #3428)
Some videos that are freely accessible without password via the original URL (e.g. http://vimeo.com/channels/keypeele/75629013)
ask for password when accessed via http://vimeo.com/<video_id>.
2014-08-04 00:04:47 +07:00
David Fabijan
78b296b0ff [Mojvideo] Add new extractor (minor changes) 2014-08-03 11:56:32 +02:00
David Fabijan
be79b07907 [Mojvideo] Add new extractor (minor changes) 2014-08-03 11:55:51 +02:00
David Fabijan
5537dce84d [Mojvideo] Add new extractor 2014-08-03 10:50:25 +02:00
Sergey M․
493987fefe [ubu] Add missing whitespace 2014-08-03 01:20:51 +07:00
Philipp Hagemeister
c97797a737 release 2014.08.02.1 2014-08-02 18:16:52 +02:00
Sergey M․
8d7d9d3452 [pbs] Add support for frontline videos (Closes #3414 #3405) 2014-08-02 19:09:36 +07:00
Sergey M․
7a5e7b303c [ubu] Add extractor (Close #3418) 2014-08-02 17:56:01 +07:00
Philipp Hagemeister
61aabb9d70 release 2014.08.02 2014-08-02 12:25:40 +02:00
Philipp Hagemeister
62af3a0eb5 [youtube] Use new signature cache ID for in-memory cache as well 2014-08-02 12:23:18 +02:00
Philipp Hagemeister
60064c53f1 [youtube] Make cache ID a tuple of lengths instead of just the whole length 2014-08-02 12:21:53 +02:00
Philipp Hagemeister
98eb1c3fa2 [youtube] Clean up -v signature output 2014-08-02 11:55:20 +02:00
Philipp Hagemeister
201e9eaa0e [youtube] Show format ID in signature deobfuscation -v output 2014-08-02 06:35:18 +02:00
Sergey M․
9afa6ede21 Merge branch 'naglis-izlesene' 2014-08-01 19:08:27 +07:00
Sergey M․
f4776371ae [izlesene] Minor changes 2014-08-01 19:08:09 +07:00
Sergey M․
328a20bf9c Merge branch 'izlesene' of https://github.com/naglis/youtube-dl into naglis-izlesene 2014-08-01 18:16:47 +07:00
Sergey M․
5622f29ae4 [ard] Quote path part instead of whole URL encode 2014-07-31 21:23:15 +07:00
Sergey M․
b4f23afbd1 [ard] Encode url (Closes #3412) 2014-07-31 20:35:29 +07:00
Sergey M․
0138968a6a [vidme] Add extractor (Closes #3404) 2014-07-31 20:26:52 +07:00
Erik Johnson
c3f0b12b0f fix exception 2014-07-30 15:30:07 -05:00
Philipp Hagemeister
4f31d0f2b7 release 2014.07.30 2014-07-30 09:50:22 +02:00
Philipp Hagemeister
bff74bdd1a [vevo] Sort formats (Fixes #3399) 2014-07-30 09:49:55 +02:00
Philipp Hagemeister
10b04ff7f4 Move --bidi-workaround to workarounds option group
Duh.
2014-07-29 17:19:19 +02:00
Philipp Hagemeister
1f7ccb9014 [generic] Add --default-search fixup_error
This restores the ability to enter URLs without a scheme (and default to http), but still fail if the input is a search term.
2014-07-29 17:17:46 +02:00
Sergey M․
c7b3209668 [swrmediathek] Improve _VALID_URL 2014-07-29 20:43:31 +07:00
Philipp Hagemeister
895ba7d1dd [gamestar] Use helper methods to not break if something changes (#3393) 2014-07-29 05:59:47 +02:00
SyxbEaEQ2
a2a1b0baa2 [gamestar] Add new extractor (init) 2014-07-29 00:37:18 +02:00
SyxbEaEQ2
8646eb790e [gamestar] Add new extractor 2014-07-29 00:31:33 +02:00
Erik Johnson
27ace98f51 Add import for Patreon extractor 2014-07-28 13:41:28 -05:00
Erik Johnson
a00d73c8c8 Add Patreon extractor 2014-07-28 13:40:58 -05:00
Jaime Marquínez Ferrándiz
f036a6328e [extractor/common] _extract_f4m_formats: Use more specific messages when downloading the manifest 2014-07-28 15:42:19 +02:00
Jaime Marquínez Ferrándiz
31bb8d3f51 [bloomberg] Extract the available formats (closes #2776)
It uses a helper method in the InfoExtractor class.
The downloader will pick the requested formats using the bitrate in the info dict.
2014-07-28 15:32:38 +02:00
Jaime Marquínez Ferrándiz
4958ae2058 [francetv] Fix wrong variable name 2014-07-28 15:21:05 +02:00
Jaime Marquínez Ferrándiz
7e8d73c183 [francetv] Extract all the available formats (#3278)
For some videos the resolution is not included in the url, we will need to look in the m3u8 manifest.
2014-07-28 14:37:13 +02:00
Sergey M․
65bc504db8 [br] Extract duration 2014-07-28 00:51:38 +07:00
Sergey M․
0fc74a0d91 [br] Fix test 2014-07-28 00:45:46 +07:00
Sergey M․
8d2cc6fbb1 [blinkx] Fix duration 2014-07-28 00:40:17 +07:00
Sergey M․
a954584f63 [bandcamp] Replace 404 playlist test 2014-07-28 00:27:27 +07:00
Sergey M․
cb3ff6fb01 [godtube] Add extractor (Closes #3367) 2014-07-27 02:38:05 +07:00
Sergey M․
71aa656d13 [streamcloud] Remove duration and modernize (Closes #3374) 2014-07-27 02:05:06 +07:00
Naglis Jonaitis
366b1f3cfe [izlesene] Add new extractor. Closes #3184 2014-07-26 14:35:23 +03:00
Jaime Marquínez Ferrándiz
64ce58db38 [abc] Add extractor (closes #3361) 2014-07-26 00:05:37 +02:00
Philipp Hagemeister
11b85ce62e [YouTubeDL] Best practices (Closes #3370) 2014-07-25 23:37:32 +02:00
Sergey M․
1220352ff7 [tvplay] Add extractor (Closes #3245) 2014-07-25 21:33:29 +07:00
Philipp Hagemeister
8f3034d871 [livestream] Do not fail if SMIL download fails 2014-07-25 11:53:52 +02:00
Philipp Hagemeister
7fa547ab02 [livestream] Make clipBegin optional in SMIL 2014-07-25 11:50:10 +02:00
Philipp Hagemeister
3182f3e2dc [justin.tv] Fix page reporting (#3352)
youtube-dl -j http://www.twitch.tv/fang_i3anger still fails though.
2014-07-25 11:46:53 +02:00
Philipp Hagemeister
cbf915f3f6 [livestream] Parse SMIL (#2713) 2014-07-25 11:39:17 +02:00
Philipp Hagemeister
b490b8849a release 2014.07.25.1 2014-07-25 10:47:35 +02:00
Philipp Hagemeister
5d2519e5bf [gdcvault] Add support for direct URL video type
Fixes #3356
2014-07-25 10:45:07 +02:00
Philipp Hagemeister
c3415d1bac [extractor/common] PEP8 2014-07-25 10:43:03 +02:00
Philipp Hagemeister
36f3542883 release 2014.07.25 2014-07-25 07:05:17 +02:00
Philipp Hagemeister
4cb71e9b6a [jsinterp] Fix slice 2014-07-25 07:04:39 +02:00
Philipp Hagemeister
4bc7009e8a [jsinterp] Add new testcase 2014-07-25 07:00:54 +02:00
Philipp Hagemeister
16f8e9df8a [jsinterp] Allow uppercase object names 2014-07-25 06:54:52 +02:00
Philipp Hagemeister
b081cebefa [youtube] Fix player ID display 2014-07-25 06:49:26 +02:00
Sergey M․
916c145217 [shared] Add extractor (Closes #3312) 2014-07-24 21:12:45 +07:00
Philipp Hagemeister
4192b51c7c Replace failure handling with up-front check.
The only time that write_string should fail is if the Python is completely braindead.
Check for that condition and output a more accurate warning.

See #3326 for details.
2014-07-24 13:29:44 +02:00
Philipp Hagemeister
052421ff09 Add --rm-cache-dir 2014-07-24 12:16:16 +02:00
Philipp Hagemeister
4e99f48817 deprecate --title
This is the default already. If you want a specific format, pick it with -o or --id.
2014-07-24 11:52:18 +02:00
Philipp Hagemeister
a11165ecc6 Reorder filesytem options
* Push down the deprecated ones
* Roughly order file-name, no-*, write-*, further options
2014-07-24 11:50:50 +02:00
Philipp Hagemeister
fbb2fc5580 Group cache-related options under filesystem 2014-07-24 11:49:26 +02:00
Philipp Hagemeister
2fe3d240cc Regroup and hide workaround options
These options are rarely necessary. Hide them to make the important options in the general group more obvious.
2014-07-24 11:46:21 +02:00
Philipp Hagemeister
42f4dcfe41 [test_youtube_signatures] Modernize 2014-07-24 11:39:54 +02:00
Philipp Hagemeister
892e3192fb [jsinterp] Do not expect dot in simple function call 2014-07-24 11:33:42 +02:00
Philipp Hagemeister
7272eab9d0 release 2014.07.24 2014-07-24 11:24:43 +02:00
Jaime Marquínez Ferrándiz
ebe832dc37 [jsinterp] 'reverse' modifies the array in place (fixes #3334) 2014-07-24 11:08:31 +02:00
Philipp Hagemeister
825abb8175 [jsinterp] Implement splice and general improvement
I still get 403s on YouTube though.
2014-07-24 10:41:14 +02:00
Sergey M․
8944ec0109 [krasview] Add extractor (Closes #3313) 2014-07-23 19:29:15 +07:00
Jaime Marquínez Ferrándiz
c084c93402 [youtube] Extract the 'sts' parameter from the webpage (fixes #3327) 2014-07-23 12:16:26 +02:00
Ole Ernst
8c778adc39 [gameone] simplify playlist extractor 2014-07-23 10:00:50 +02:00
Ole Ernst
71b6065009 [gameone] add playlist test 2014-07-23 09:32:01 +02:00
Liu DongMiao
7e660ac113 if there is more than one subtitle for the language, use the first one 2014-07-23 10:56:09 +08:00
Philipp Hagemeister
d799b47b82 [ffmpeg] PEP8 and a more obvious variable name 2014-07-23 02:55:06 +02:00
rupertbaxter2
b7f8116406 Deletes temp files after postprocess merge unless -k option is specified 2014-07-23 02:53:44 +02:00
Philipp Hagemeister
6db274e057 Remove legacy FileDownloader (Closes #2964) 2014-07-23 02:47:52 +02:00
Philipp Hagemeister
0c92b57398 Remove unused imports 2014-07-23 02:46:21 +02:00
Philipp Hagemeister
becafcbf0f [wdr] fix up imports 2014-07-23 02:44:30 +02:00
Philipp Hagemeister
92a86f4c1a Do not import from legacy FileDownloader class 2014-07-23 02:43:59 +02:00
Philipp Hagemeister
dfe029a62c release 2014.07.23.2 2014-07-23 02:25:27 +02:00
Philipp Hagemeister
b0472057a3 [YoutubeDL] Make sure we really, really get out the encoding string
Fixes #3326
Apparently, on some platforms, even outputting this fails already.
2014-07-23 02:24:52 +02:00
Philipp Hagemeister
c081b35c27 [youtube] Support new player URLs (Fixes #3326) 2014-07-23 02:19:33 +02:00
Philipp Hagemeister
9f43890bcd [jsinterp] Allow digits in function names 2014-07-23 02:13:48 +02:00
Philipp Hagemeister
94a20aa5f8 [rtlnow] Simplify outdated test 2014-07-23 01:49:25 +02:00
Philipp Hagemeister
94e8df3a7e [wdr] Fix umlaut parsing on Python 2.x 2014-07-23 01:47:36 +02:00
Philipp Hagemeister
37e64addc8 [nbc] Add missing import 2014-07-23 01:47:18 +02:00
Philipp Hagemeister
d82ba23ba5 [soundcloud:playlist] Fix test description 2014-07-23 01:44:08 +02:00
Philipp Hagemeister
0fd7fd71b4 [test/helper] Do not use deprecated method 2014-07-23 01:43:46 +02:00
Philipp Hagemeister
eae12e3fe3 [soundcloud] Adapt test 2014-07-23 01:41:45 +02:00
Philipp Hagemeister
798a2cad4f [sockshare] Fix ext 2014-07-23 01:40:01 +02:00
Philipp Hagemeister
41c0849429 [savefrom] Make test description more flexible 2014-07-23 01:38:07 +02:00
Philipp Hagemeister
a4e5af1184 release 2014.07.23.1 2014-07-23 01:27:33 +02:00
Philipp Hagemeister
b090af5922 [vube] Fix comment count 2014-07-23 01:27:25 +02:00
Philipp Hagemeister
388841f819 release 2014.07.23 2014-07-23 01:18:42 +02:00
Philipp Hagemeister
1a2ecbfbc4 [vube] Add support for new data format (Fixes #3325) 2014-07-23 01:18:27 +02:00
Philipp Hagemeister
38e292b112 [mlb] Fix regex 2014-07-22 23:55:41 +02:00
Charles Chen
c4f731262d Merge remote-tracking branch 'upstream/master' into MLB
Conflicts:
	youtube_dl/extractor/mlb.py
2014-07-22 14:44:38 -07:00
Charles Chen
07cc63f386 [MLB] Enhanced _VALID_URL to cover more MLB videos 2014-07-22 14:10:27 -07:00
Philipp Hagemeister
e42a692f00 [cbs] Modernize
Also add threatening skip blocks in there - access is only possible from the US. We may want to find a better geolocation restriction method for tests.
2014-07-22 17:34:35 +02:00
Philipp Hagemeister
6ec7538bb4 Merge remote-tracking branch 'jterk/cbs-artists' 2014-07-22 17:29:09 +02:00
Jason Terk
2871d489a9 Support Alternative cbs.com URL Format
Adds support for cbs.com URLs containing "/artist" instead of
"/video". E.g.:
http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/
2014-07-22 08:00:08 -07:00
Philipp Hagemeister
1771ddd85d release 2014.07.22 2014-07-22 16:59:40 +02:00
Philipp Hagemeister
5198bf68fc Merge remote-tracking branch 'origin/master' 2014-07-22 16:59:31 +02:00
Philipp Hagemeister
e00fc35dbe [kickstarter] Support embedded videos (Fixes #3322) 2014-07-22 16:57:43 +02:00
Sergey M․
8904e979df [vodlocker] Fix _VALID_URL 2014-07-22 20:37:33 +07:00
Philipp Hagemeister
53eb217661 Add another great example for the --extractor-descriptions output 2014-07-22 04:53:14 +02:00
Jaime Marquínez Ferrándiz
9dcb8f3fc7 [br] Allow '_' in the url (fixes #3311) 2014-07-21 20:43:56 +02:00
Philipp Hagemeister
1e8ac8364b release 2014.07.21 2014-07-21 18:06:51 +02:00
Philipp Hagemeister
754d8a035e [nbcnews] Look in all playlists for video 2014-07-21 18:06:21 +02:00
Philipp Hagemeister
f1f725c6a0 [dropbox] Fix title encoding on Python 2 2014-07-21 13:55:47 +02:00
Philipp Hagemeister
06c155420f [sockshare] Simplify (#3268) 2014-07-21 13:25:59 +02:00
Philipp Hagemeister
7dabd2ac45 Merge remote-tracking branch 'naglis/sockshare'
Conflicts:
	youtube_dl/extractor/__init__.py
2014-07-21 13:24:15 +02:00
Philipp Hagemeister
df8ba0d2cf [tagesschau] Remove test case
See http://de.wikipedia.org/wiki/Depublizieren for the sad rationale.
2014-07-21 13:22:15 +02:00
Philipp Hagemeister
ff1956e07b [wdr] Replace test case 2014-07-21 13:19:41 +02:00
Philipp Hagemeister
caf5a8817b [chilloutzone] Fix test description 2014-07-21 13:16:48 +02:00
Philipp Hagemeister
a850fde1d8 [funnyordie] Fix test description 2014-07-21 13:14:41 +02:00
Philipp Hagemeister
0e6ebc13d1 [vimeo] Update test description 2014-07-21 13:11:24 +02:00
Philipp Hagemeister
6f5342a201 [cnet] Fix title extraction
URLs are still missing
2014-07-21 13:03:19 +02:00
Philipp Hagemeister
264a7044f5 [dropbox] Fix test and add support for spaces in filenames 2014-07-21 12:57:40 +02:00
Philipp Hagemeister
1a30deca50 [teachertube] Fix title and playlist recognition 2014-07-21 12:47:01 +02:00
Philipp Hagemeister
d8624e6a80 [test_playlist] Add and use assertGreaterEqual 2014-07-21 12:25:49 +02:00
Philipp Hagemeister
4f95d455ed [steam] Update test description 2014-07-21 12:17:44 +02:00
Philipp Hagemeister
468d19a9c1 [savefrom] Fix test description 2014-07-21 12:15:23 +02:00
Philipp Hagemeister
9aeaf730ad [rtve] Fix md5sum
Looks like these guys reencoded the video.
2014-07-21 12:14:07 +02:00
Philipp Hagemeister
db964a33a1 Remove unused imports 2014-07-21 12:12:50 +02:00
Philipp Hagemeister
da8fb85859 [snotr] Add description 2014-07-21 12:08:44 +02:00
Philipp Hagemeister
54330a1c3c [swfinterp] Fix imports 2014-07-21 12:07:26 +02:00
Philipp Hagemeister
9732d77ed2 [snotr] PEP8 and minor fixes (#3296) 2014-07-21 12:02:44 +02:00
Philipp Hagemeister
199ece7eb8 Merge remote-tracking branch 'hassaanaliw/snotr' 2014-07-21 11:43:46 +02:00
Philipp Hagemeister
1997eb0078 Merge pull request #3310 from bentley/master
Fix typo: “ytseach” → “ytsearch”
2014-07-21 09:22:58 +02:00
Anthony J. Bentley
eef4a7a304 Fix typo: “ytseach” → “ytsearch” 2014-07-20 18:37:44 -06:00
Philipp Hagemeister
246168bd72 Remove unused imports 2014-07-20 23:38:44 +02:00
Philipp Hagemeister
7fbf54dc62 [swfinterp] Remove (at the moment) dead code 2014-07-20 23:37:10 +02:00
Philipp Hagemeister
351f373865 [swfinterp] Fix _u32 name 2014-07-20 23:36:21 +02:00
Philipp Hagemeister
72e785f36a [livestream] PEP8 2014-07-20 23:34:20 +02:00
Philipp Hagemeister
727d2930f2 release 2014.07.20.2 2014-07-20 23:23:01 +02:00
Philipp Hagemeister
c13bf7c836 [swfinterp] Use helper function struct_unpack for old Python 2.x releases (#3270) 2014-07-20 23:20:15 +02:00
Philipp Hagemeister
f3308e138d release 2014.07.20.1 2014-07-20 21:38:29 +02:00
Philipp Hagemeister
29546b345b [ard] Add support for NDR-style videos (fixes #3281) 2014-07-20 21:38:02 +02:00
Jaime Marquínez Ferrándiz
2c57c7fa5a [youtube] Fix extraction of age gate videos (closes #3270)
Setting the correct value of the 'sts' paramater in the 'get_video_info' url gives the correct urls.
Removed parameters that are not needed.
2014-07-20 21:05:02 +02:00
Philipp Hagemeister
b6ea11b967 [youtube] Add swf signature test case (#3270) 2014-07-20 20:45:36 +02:00
Philipp Hagemeister
b8c74d606a [youtube] fix display of swf player id 2014-07-20 20:20:42 +02:00
Sergey M․
a5d524ef46 [allocine] Update tests 2014-07-21 00:28:55 +07:00
Philipp Hagemeister
cceb5ec237 release 2014.07.20 2014-07-20 18:47:03 +02:00
Philipp Hagemeister
71a6eaff83 Merge remote-tracking branch 'origin/master' 2014-07-20 18:32:59 +02:00
Philipp Hagemeister
7fd48d0413 [youtube] Correct signature testcase 2014-07-20 18:30:27 +02:00
Philipp Hagemeister
1b38b5be86 [swfinterp] Remove debugging code 2014-07-20 18:29:09 +02:00
Philipp Hagemeister
decf2ae400 [swfinterp] Correct array access 2014-07-20 18:28:49 +02:00
Philipp Hagemeister
0d989011ff [swfinterp] Add support for calling methods on objects 2014-07-20 14:49:10 +02:00
Philipp Hagemeister
01b4b74574 [swfinterp] Add support for calls to instance methods 2014-07-20 12:47:15 +02:00
Philipp Hagemeister
70f767dc65 [swfinterp] Add support for multiple classes 2014-07-20 00:25:58 +02:00
Philipp Hagemeister
e75c24e889 [swfinterp] Extend tests and fix parsing 2014-07-20 00:03:54 +02:00
Philipp Hagemeister
0cb2056304 [swfinterp] Start working on basic tests 2014-07-19 23:05:07 +02:00
hassaanaliw
8adec2b9e0 [snotr] Add new extractor 2014-07-19 22:49:25 +05:00
Sergey M․
604f292ab7 [sapo] Add extractor (Closes #2816) 2014-07-20 00:00:20 +07:00
Sergey M․
23d3c422ab [francetv] Add support for mobile URLs (Closes #3275) 2014-07-19 17:47:50 +07:00
Sergey M․
0c1ffe980d [mlb] Fix _VALID_URL 2014-07-18 21:43:01 +07:00
Sergey M․
5e95cb27d6 Credit @hassaanaliw for cracked (#3274) 2014-07-18 21:41:34 +07:00
Sergey M․
3b86f936c5 Merge branch 'hassaanaliw-cracked' 2014-07-18 21:39:38 +07:00
Sergey M․
e0942e37aa [crackled] Improve, fix invalid regexes and extract more metadata 2014-07-18 21:39:21 +07:00
Sergey M․
c45a6caa95 [utils] Add None check in str_to_int 2014-07-18 21:37:40 +07:00
Sergey M․
61bbddbaa6 Merge branch 'cracked' of https://github.com/hassaanaliw/youtube-dl 2014-07-18 20:29:35 +07:00
Philipp Hagemeister
5425626790 [youtube] Move swfinterp into its own file 2014-07-18 10:24:28 +02:00
Philipp Hagemeister
5dc3552d85 [youtube] Add support for classes in swf parser 2014-07-18 00:54:17 +02:00
Philipp Hagemeister
3fbd27f73e [youtube] SWF parser: Add opcode 86
Yes, I know we need 96, but an implementation of 86 could help avoid a similar issue.
2014-07-17 23:22:49 +02:00
Philipp Hagemeister
0382ecb78d Merge pull request #3289 from Reventl0v/patch-1
Fix the url in the INSTALLATION section
2014-07-17 22:54:24 +02:00
Philipp Hagemeister
72edb6fc8c Merge remote-tracking branch 'origin/master' 2014-07-17 22:32:54 +02:00
Jaime Marquínez Ferrándiz
66149e3f2b [npo] Fix the json extraction (fixes #3282)
The comment in the javascript file is not always the same.
2014-07-17 22:29:03 +02:00
Reventl0v
6e74521d98 Fix the url in the INSTALLATION section 2014-07-17 21:08:43 +02:00
Philipp Hagemeister
cf01013161 [youtube] Find more swf players (Closes #3270, refer #3271) 2014-07-17 16:28:36 +02:00
Jaime Marquínez Ferrándiz
1e179c7528 Merge pull request #3283 from MikeCol/redtube_thumb_new
Redtube changed player config, new place to get thumb URL
2014-07-17 12:44:21 +02:00
MikeCol
530ed178b7 Redtube changed player config, new place to get thumb URL 2014-07-17 11:17:27 +02:00
Jaime Marquínez Ferrándiz
74aa18f68f [dfb] Add extractor (closes #3280) 2014-07-17 10:07:51 +02:00
Jaime Marquínez Ferrándiz
d9222264a8 [adultswim] The bitrate must be an integer or None (reported in #2952) 2014-07-17 09:31:48 +02:00
Jaime Marquínez Ferrándiz
ca14211e93 [adultswim] Simplify (closes #2952) 2014-07-17 09:27:06 +02:00
Jaime Marquínez Ferrándiz
b1d65c3369 Merge remote-tracking branch 'adammw/adultswim' 2014-07-17 09:21:43 +02:00
Jaime Marquínez Ferrándiz
b4c538b02b [comedycentral] Only recognize the cc.com domain
The old comedycentral.com urls redirect to the new urls.
2014-07-16 23:05:56 +02:00
Jaime Marquínez Ferrándiz
13059bceb2 [comedycentral] Recognize 'full-episodes' urls (fixes #3277) 2014-07-16 23:05:56 +02:00
Sergey M․
d8894e24a4 [rtbf] Fix data video regex 2014-07-17 01:57:38 +07:00
Sergey M․
3b09757bac Credit @chaochichen for mlb (#3252) 2014-07-16 21:03:30 +07:00
Sergey M․
2f97f76877 Merge branch 'cracked' of https://github.com/hassaanaliw/youtube-dl into hassaanaliw-cracked 2014-07-16 20:55:38 +07:00
hassaanaliw
43f0537c06 [cracked] Add new extractor 2014-07-16 18:45:42 +05:00
Sergey M․
a816da0dc3 Merge branch 'chaochichen-MLB' 2014-07-16 20:42:01 +07:00
Sergey M․
7bb49d1057 [mlb] Extract more metadata and all formats, provide more tests 2014-07-16 20:40:28 +07:00
Sergey M․
1aa42fedee Merge branch 'MLB' of https://github.com/chaochichen/youtube-dl into chaochichen-MLB 2014-07-16 19:13:35 +07:00
Naglis Jonaitis
66aa382eae [sockshare] Add new extractor 2014-07-16 02:07:20 +03:00
Philipp Hagemeister
ee90ddab94 release 2014.07.15 2014-07-15 22:59:12 +02:00
Charles Chen
172240c0a4 Switched to use media detail XML to extract video URL 2014-07-15 13:55:23 -07:00
Jaime Marquínez Ferrándiz
ad25aee245 [youtube & jsinterp] Fix signature extraction (fixes #3255)
Some functions are defined now inside an object, the jsinterp will search its definition if the variable is not defined in the local namespace.
2014-07-15 22:46:39 +02:00
Sergey M․
bd1f325b42 [tutv] Replace 404 test and modernize 2014-07-15 19:32:42 +07:00
Sergey M․
00a82ea805 [soundcloud] Replace 404 test 2014-07-15 19:18:06 +07:00
Charles Chen
b1b01841af [MLB] Add new extractor 2014-07-14 11:00:55 -07:00
Filippo Valsorda
816930c485 Fix utils.strip_jsonp 2014-07-14 00:41:23 +02:00
Sergey M․
76233cda34 [pyvideo] Fix title extraction 2014-07-14 00:38:10 +07:00
Jaime Marquínez Ferrándiz
9dcea39985 [tlc.de] If the url contains a fragment, use if in the iframe url (reported in #2748)
The fragment is used in the webpage for selecting different videos.
2014-07-13 14:38:26 +02:00
Jaime Marquínez Ferrándiz
10d00a756a rename southparkstudios.py to southpark.py
And make the extractor only recognize southpark.cc.com urls, the old urls are redirected.
2014-07-13 14:08:23 +02:00
Jaime Marquínez Ferrándiz
eb50741129 Merge remote-tracking branch 'adammw/southpark' 2014-07-13 14:01:09 +02:00
Adam Malcontenti-Wilson
3804b01276 Update test 2014-07-13 21:29:04 +10:00
Adam Malcontenti-Wilson
b1298d8e06 Test for colon in mgid 2014-07-13 21:15:18 +10:00
Ole Ernst
c065fd35ae [gameone] add playlist capability 2014-07-13 12:16:25 +02:00
Adam Malcontenti-Wilson
6a46dc8db7 Add southpark.cc.com to southpark IE 2014-07-13 12:48:30 +10:00
Filippo Valsorda
36cb99f958 [ReverbNation] Add new IE - closes #2250 2014-07-13 00:47:20 +02:00
Sergey M․
81650f95e2 [ruhd] Add extractor 2014-07-13 04:03:22 +07:00
Sergey M․
34dbcb8505 [ndr] Replace 404 test 2014-07-12 22:08:33 +07:00
Philipp Hagemeister
c993c829e2 [firedrive] Simplify 2014-07-12 14:27:14 +02:00
Philipp Hagemeister
0d90e0f067 Credit @naglis for firedrive (#3242) 2014-07-12 14:23:54 +02:00
Naglis Jonaitis
678f58de4b [firedrive] Add new extractor. Addresses #3095 2014-07-12 00:42:42 +03:00
Sergey M․
c961a0e63e [screencast] Add one more format and improve title extraction 2014-07-11 22:52:48 +07:00
Sergey M․
aaefb347c0 [gorillavid] Fix embedded videos extraction 2014-07-11 22:23:00 +07:00
Philipp Hagemeister
09018e19a5 release 2014.07.11.3 2014-07-11 17:21:16 +02:00
Sergey M․
345e37831c [youtube] Update nosubtitles test 2014-07-11 22:08:04 +07:00
Sergey M․
00ac799b68 [vine:user] Update test 2014-07-11 22:04:24 +07:00
Jaime Marquínez Ferrándiz
133af9385b Update supported formats for the --recode-video option (#3228) 2014-07-11 16:16:30 +02:00
Philipp Hagemeister
40c696e5c6 [screencast] Add suppot for more video types (#3236) 2014-07-11 15:39:24 +02:00
Philipp Hagemeister
d6d5028922 release 2014.07.11.2 2014-07-11 13:34:48 +02:00
Philipp Hagemeister
38ad119f97 [screencast] Add new extractor (Fixes #3236) 2014-07-11 13:34:19 +02:00
Philipp Hagemeister
4e415288d7 [criterion] Simplify and modernize 2014-07-11 13:21:32 +02:00
Philipp Hagemeister
fada438acf release 2014.07.11.1 2014-07-11 11:53:28 +02:00
Philipp Hagemeister
1df0ae2170 Credit @tobidope for gameone (#2941) 2014-07-11 11:29:17 +02:00
Philipp Hagemeister
d96b9d40f0 [gameone] Sort formats 2014-07-11 11:27:44 +02:00
Philipp Hagemeister
fa19dfccf9 Merge remote-tracking branch 'tobidope/gameone' 2014-07-11 11:17:57 +02:00
Philipp Hagemeister
cdc22cb886 Credit @adammw for tenplay (#2954) 2014-07-11 11:16:04 +02:00
Philipp Hagemeister
04c77a54b0 [tenplay] PEP8 2014-07-11 11:15:35 +02:00
Philipp Hagemeister
64a8c39a1f Merge remote-tracking branch 'adammw/tenplay' 2014-07-11 11:12:41 +02:00
Philipp Hagemeister
3d55f2806e Credit @irtusb for vimple (#3073) 2014-07-11 11:11:52 +02:00
Philipp Hagemeister
1eb867f33f [vimple] Simplify and PEP8 2014-07-11 11:11:09 +02:00
Philipp Hagemeister
e93f4f7578 [vodlocker] Remove unused imports 2014-07-11 11:09:01 +02:00
Philipp Hagemeister
45ead916d1 [vimple] Do not fail if duration is missing 2014-07-11 11:08:36 +02:00
Philipp Hagemeister
3a0879c8c8 Merge remote-tracking branch 'irtusb/vimple' 2014-07-11 11:07:44 +02:00
Philipp Hagemeister
ebf361ce18 Merge remote-tracking branch 'azeem/soundcloud_likes' 2014-07-11 11:06:33 +02:00
Philipp Hagemeister
953b358668 [gorillavid] Add support for daclips.in (Closes #3213) 2014-07-11 11:05:16 +02:00
Philipp Hagemeister
3dfd25b3aa [goshgay] PEP8 and test for age_limit (#3220) 2014-07-11 11:01:59 +02:00
Philipp Hagemeister
6f66eedc5d Merge remote-tracking branch 'MikeCol/goshgay' 2014-07-11 11:00:37 +02:00
Philipp Hagemeister
4094b6e36d [vodlocker] PEP8, generalization, and simplification (#3223) 2014-07-11 10:57:40 +02:00
Philipp Hagemeister
c09cbf0ed9 Merge remote-tracking branch 'pachacamac/vodlocker' 2014-07-11 10:54:53 +02:00
Philipp Hagemeister
391d53e1dd release 2014.07.11 2014-07-11 10:49:41 +02:00
Philipp Hagemeister
f64ebfe3e5 [youtube] Correct signature test 2014-07-11 10:46:11 +02:00
Philipp Hagemeister
fc040bfd05 [jsinterp] Prevent mis-recognitions of local functions 2014-07-11 10:44:56 +02:00
Philipp Hagemeister
c8bf86d50d [youtube] Correct signature extraction error detection 2014-07-11 10:44:39 +02:00
Philipp Hagemeister
61989fb5e9 [jsinterp] Remove superfluous u 2014-07-11 10:40:02 +02:00
Philipp Hagemeister
6f9d4d542f [youtube] Add test for new signature scheme (#3232) 2014-07-11 10:34:01 +02:00
Philipp Hagemeister
b3a8878080 [youtube] Remove static signatures
The always fail by now. Instead, use only automatic signature extraction
2014-07-11 10:23:19 +02:00
Philipp Hagemeister
f4d66a99cf release 2014.07.10 2014-07-10 14:49:16 +02:00
pachacamac
537ba6f381 [Vodlocker] Add new extractor 2014-07-09 18:21:46 +02:00
Sergey M․
411f691b21 [mpora] Fix player regex 2014-07-09 19:12:42 +07:00
MikeCol
d6aa1967ad GoshGay Extractor 2014-07-09 12:14:53 +02:00
Sergey M․
6e1e0e4b5b [veoh] Skip deleted test video 2014-07-08 20:22:27 +07:00
azeem
3941669d69 [soundcloud] Adding likes support to SoundcloudUserIE 2014-07-07 23:59:57 +05:30
Sergey M․
1aac03797e [ninegag] Fix extraction 2014-07-07 20:12:59 +07:00
Jaime Marquínez Ferrándiz
459af43494 [arte] Manually set the rtmp play_path (fix #3198)
rtmpdump doesn't parse it right
2014-07-07 14:10:57 +02:00
Philipp Hagemeister
f4f7e3cf41 Merge branch 'master' of github.com:rg3/youtube-dl 2014-07-06 20:57:05 +02:00
Sergey M․
1fd015516e [newstube] Replace test 2014-07-06 19:32:13 +07:00
Sergey M․
76bafa8ffe [newstube] Capture error message 2014-07-06 18:53:31 +07:00
Philipp Hagemeister
8d5797b00f [YoutubeDL] Show download URL when -v is set
This will allow us to debug issues like #3204
2014-07-06 11:28:51 +02:00
Philipp Hagemeister
7571c02c8a [generic] Set default-search to error
This prevents users from submitting bug reports where they mistyped a URL, and prevents me from getting a weird video when holding shift and thus searching for :Tds
2014-07-06 11:22:44 +02:00
Petr Půlpán
49cbe7c8e3 [allocine] add extractor for allocine.fr (fixes #3189) 2014-07-05 14:42:26 +02:00
Sergey M․
ba4133c9eb Credit @hakatashi for #3181 #3182 2014-07-04 22:30:43 +07:00
Sergey M․
b67f1840a1 [niconico] Remove unused import 2014-07-04 22:26:56 +07:00
Sergey M.
165c46690f Merge pull request #3180 from hakatashi/niconico-without-authentication
[niconico] Download without authentication
2014-07-04 22:25:05 +07:00
Sergey M․
16bc9ab601 Merge branch 'hakatashi-niconico-channel-video' 2014-07-04 22:06:31 +07:00
Sergey M․
15ce1338b4 [niconico] Extract more metadata and simplify (Closes #3181) 2014-07-04 22:05:46 +07:00
Sergey M․
0ff30c5333 Merge branch 'niconico-channel-video' of https://github.com/hakatashi/youtube-dl into hakatashi-niconico-channel-video 2014-07-04 21:39:54 +07:00
Sergey M․
6feb2d5e80 [youtube:search_url] Update regexes 2014-07-04 19:21:19 +07:00
Sergey M․
1e07fea200 [teachertube] Add support for new video URL format 2014-07-03 21:11:56 +07:00
Sergey M․
7aeb67b39b [teachertube:user:collection] Update media regex 2014-07-03 21:08:44 +07:00
Sergey M․
93881db22a [anitube] Modernize 2014-07-02 19:24:01 +07:00
hakatashi
64ed7a38f9 [niconico] Add support for channel video 2014-07-02 03:13:12 +09:00
hakatashi
2fd466fcfc [niconico] Download without authentication 2014-07-02 02:32:54 +09:00
Philipp Hagemeister
dc2fc73691 [youtube:truncated_url] Move test to extractor 2014-07-01 15:49:34 +02:00
Philipp Hagemeister
c4808c6009 [youtube_truncated_url] Add support for truncated watch URLs with annotations (#3178) 2014-07-01 15:49:16 +02:00
Sergey M․
c67f584eb3 [rai] Skip test 2014-07-01 19:24:18 +07:00
Petr Půlpán
29f6ed78e8 [tagesschau] replace 404 test 2014-07-01 10:35:49 +02:00
pulpe
7807ee664d [wdr] fix test 2014-07-01 09:59:57 +02:00
Sergey M․
d518d06efd [vk] Skip georestricted ivi embed test 2014-06-30 03:16:31 +07:00
Petr Půlpán
25a0cc44b9 [teachertube:user] fix regex 2014-06-29 20:33:46 +02:00
Petr Půlpán
825cdcec3c Merge branch 'master' of github.com:rg3/youtube-dl 2014-06-29 16:44:37 +02:00
Petr Půlpán
41b610acab [GooglePlus] fix video title extraction 2014-06-29 16:43:31 +02:00
Sergey M․
0364fa8b65 [generic] Add support for ivi.ru embedded player 2014-06-29 20:18:23 +07:00
Sergey M․
849086a1ae [vk] Better support for embeds 2014-06-29 20:07:59 +07:00
Sergey M․
36fbc6887f [ivi] Add support for embedded URLs 2014-06-29 20:06:47 +07:00
Sergey M․
a8a98e43f2 [vk] Add support for mobile URLs 2014-06-29 19:51:00 +07:00
Sergey M․
57bdc730e2 [vk] Add support for more URL formats (#3172) 2014-06-29 19:33:39 +07:00
Petr Půlpán
31a196d7f5 [TeacherTube] add user + collection, removed classrooms 2014-06-29 13:45:10 +02:00
Petr Půlpán
9b27e6c3b4 [Tumblr] fix encoding (PEP0263) 2014-06-29 09:32:53 +02:00
Petr Půlpán
62f1f9507f [Tumblr] fix test + add description 2014-06-29 09:08:46 +02:00
Petr Půlpán
ee8dda41ae [Toypics] support https urls 2014-06-29 08:21:23 +02:00
Sergey M․
01ba178097 [vk] Update test 2014-06-29 04:51:47 +07:00
Petr Půlpán
78ff59d052 [Motherless] simplify 2014-06-28 20:02:02 +02:00
Petr Půlpán
f3f1cd6b3b Merge pull request #3167 from Schnouki/motherless
* mother/motherless:
  [Motherless] Add new extractor
2014-06-28 19:12:31 +02:00
Sergey M․
803540e811 [drtv] Add missing extractor import 2014-06-28 17:36:13 +07:00
Petr Půlpán
458ade6361 [ArteTVFuture] fix empty formats list 2014-06-28 10:22:53 +02:00
Thomas Jost
a69969ee05 [Motherless] Add new extractor 2014-06-27 18:12:11 +02:00
Sergey M․
f2b8db57eb [drtv] Add extractor for DR TV (Closes #3126) 2014-06-27 20:53:59 +07:00
Jaime Marquínez Ferrándiz
331ae266ff [npo] Add extractor (closes #3145) 2014-06-26 20:30:44 +02:00
Philipp Hagemeister
4242001863 release 2014.06.26 2014-06-26 16:44:01 +02:00
Jaime Marquínez Ferrándiz
78338f71ca [livestream:original] Add support for folder urls (closes #2631)
The webpage only contains shortened links for the videos, since the server
doesn't support HEAD requests, we use an specific extractor for them.
2014-06-26 16:34:36 +02:00
Sergey M․
f5172a3084 [teachertube] Add support for new URL formats 2014-06-26 20:01:59 +07:00
Sergey M․
c7df67edbd [teachertube] Improve extraction 2014-06-26 20:00:47 +07:00
Petr Půlpán
d410fee91d [VideoTt] fix ValueError (#3161) 2014-06-26 07:35:47 +02:00
Philipp Hagemeister
ba7aa464de [soundgasm] PEP8 and add a display_id (#3155) 2014-06-25 23:47:38 +02:00
Philipp Hagemeister
8333034dce Merge remote-tracking branch 'pachacamac/soundgasm' 2014-06-25 23:45:03 +02:00
Philipp Hagemeister
637b6af80f release 2014.06.25 2014-06-25 21:24:01 +02:00
pachacamac
1044f8afd2 [Soundgasm] Add new extractor 2014-06-25 18:07:23 +02:00
Petr Půlpán
2f775107f9 Merge branch 'master' of github.com:rg3/youtube-dl 2014-06-25 17:45:24 +02:00
Petr Půlpán
85342674b2 [Dailymotion] fix uploader name (fixes #3153) 2014-06-25 17:44:19 +02:00
Sergey M․
fd69098a45 [rutube] Update playlist tests 2014-06-25 19:06:11 +07:00
Petr Půlpán
8867f908fc Merge pull request #3148 from crazedpsyc/master
[BlipTV] Allow plus sign in video ID
2014-06-25 07:14:04 +02:00
Michael Smith
b7c33124c8 [BlipTV] Allow plus sign in video ID 2014-06-24 17:55:08 -06:00
Jaime Marquínez Ferrándiz
89a8c423c7 Merge pull request #3146 from pvdl/patch-1
[discovery] Change default url
2014-06-24 18:11:26 +02:00
Peter
cea2582df2 [discovery] Change default url
URL does a redirect from dsc.discovery.com to www.discovery.com
This commit fixes the correct URL.
2014-06-24 17:41:53 +02:00
Sergey M․
e423e0baaa [wistia] Add duration and modernize 2014-06-24 19:34:39 +07:00
Philipp Hagemeister
60b2dd1285 [comedycentral] Correct handling when latest tds episode is a special-episode instead of a regular one 2014-06-24 10:50:41 +02:00
Philipp Hagemeister
36ddd8b3f7 release 2014.06.24.1 2014-06-24 09:03:52 +02:00
Philipp Hagemeister
7575d52a73 release 2014.06.24 2014-06-24 08:59:40 +02:00
Sergey M․
9a2dc4f7ac [teachertube] Fix extraction 2014-06-23 03:07:10 +07:00
Jaime Marquínez Ferrándiz
c5cd249e41 [generic] Extract mtvservices embedded videos 2014-06-22 21:39:36 +02:00
Jaime Marquínez Ferrándiz
8940c1c058 [mtv] Add an extractor for the mtvservices embedded player (closes #2995) 2014-06-22 21:39:27 +02:00
Petr Půlpán
27ec04b232 [BR] replace test 2014-06-22 17:33:27 +02:00
Sergey M․
d2824416aa [firstpost] Fix title extraction and add description 2014-06-22 01:20:40 +07:00
Petr Půlpán
18061bbab0 [Youtube] add DASH format 272 (fixes #3128) 2014-06-21 12:03:27 +02:00
Sergey M․
4ecbbcbcea Merge branch 'eliasp-spiegel' 2014-06-21 16:32:01 +07:00
Sergey M․
55c97a03e1 [spiegel] Add description and modernize 2014-06-21 16:31:18 +07:00
Elias Probst
98aeac6ea9 Use the 'base_url' for building the resulting 'url' as well. 2014-06-21 01:10:10 +02:00
Elias Probst
8bfb6723cb Extract the base_url for the XML download from the JS snippet's 'server' variable. 2014-06-21 01:00:48 +02:00
Elias Probst
a20575e8ae Make debug message useful and also report, which URL failed to download. 2014-06-21 00:35:12 +02:00
Sergey M․
7724572519 [noco] Switch to HTTPS (Closes #3116) 2014-06-20 18:40:47 +07:00
Philipp Hagemeister
d763637f6a release 2014.06.19 2014-06-19 17:13:50 +02:00
Jaime Marquínez Ferrándiz
c26e9ac4b2 [youtube] Recognize signature functions that contain '$' (fixes #3104) 2014-06-19 16:42:49 +02:00
Petr Půlpán
896bf55352 [LifeNews] update thumbnail in test 2014-06-19 16:34:48 +02:00
Petr Půlpán
a23ba9b53c [Steam] update description in test 2014-06-19 16:32:11 +02:00
Sergey M․
38a9339baf [prosiebensat1] Update some regexes 2014-06-19 19:51:49 +07:00
Sergey M․
def8b4039f [bilibili] Fix extraction 2014-06-18 18:53:25 +07:00
Petr Půlpán
a14e1538fe [ustream:channel] replace test for an updated channel 2014-06-17 16:03:03 +02:00
Petr Půlpán
5f28a1acad [GorillaVid] improve extractor 2014-06-17 15:18:46 +02:00
pulpe
25e9953c6f Merge pull request #3059 from marcwebbie/gorillavid
* marcwebbie/gorillavid:
  Changed video url to a public video
  [GorillaVid] Added GorillaVid extractor
2014-06-17 15:14:18 +02:00
Petr Půlpán
f9df094ca5 Merge pull request #3089 from pulpe/ard_fix
[ARDIE] fix formats extraction (fixes #3087)
2014-06-17 14:53:51 +02:00
Sergey M.
b60a469023 Merge pull request #3090 from Kagee/patch-1
tv.nrk.no urls mostly contain capital characters
2014-06-17 02:21:10 +07:00
Anders Einar Hilden
7012631257 Fix test
Didn't use .lower() as planned, so update test with new ID.
2014-06-16 19:37:59 +02:00
Anders Einar Hilden
e6c9f80c48 tv.nrk.no urls mostly contain capital characters
Updated regexp and one of the test cases to reflect this.
tv.nrksuper.no mostly uses lowercase, so that is still there.
2014-06-16 19:29:23 +02:00
pulpe
895ce482b1 [ARDIE] adjustments suggested by @jaimeMF 2014-06-16 18:15:41 +02:00
pulpe
e5da4021eb [ARDIE] fix formats extraction (fixes #3087) 2014-06-16 16:17:49 +02:00
Sergey M․
2371053565 [rai] Skip test 2014-06-16 18:50:15 +07:00
Philipp Hagemeister
33bf9033e0 release 2014.06.16 2014-06-16 10:15:24 +02:00
Jaime Marquínez Ferrándiz
35eacd0dae [brightcove] Set the filesize of the formats and use _sort_formats 2014-06-15 11:37:39 +02:00
Jaime Marquínez Ferrándiz
96bef88f5f [brightcove] Modernize some tests 2014-06-15 11:24:05 +02:00
Jaime Marquínez Ferrándiz
5524b242a7 [brightcove] Add support for renditions with 'remote' set to True (fixes #3081)
The url needs to be modified to get the flv video.
2014-06-15 11:20:40 +02:00
Jaime Marquínez Ferrándiz
a013eba65f [brightcove] Improve the 'experienceJSON' regex (#3081)
One of the strings may contain ';', we would get an invalid json string.
2014-06-15 11:08:24 +02:00
Sergey M.
36755d40b4 Merge pull request #3078 from pulpe/youtube_fix
[Youtube] Recognize playlists with LL
2014-06-15 02:49:16 +07:00
pulpe
7d568f5ab8 [Youtube] Recognize playlists with LL 2014-06-14 13:23:28 +02:00
Sergey M․
a7207cd580 [wrzuta] Add age limit 2014-06-14 17:00:59 +07:00
Sergey M.
e8ef659cd9 Merge pull request #3075 from pulpe/wrzuta
[WrzutaIE] Add extractor for wrzuta.pl (fixes #3072)
2014-06-14 16:51:27 +07:00
Sergey M․
b0adbe98fb [rai] Add support for Rai websites (Closes #2930) 2014-06-13 23:44:44 +07:00
pulpe
0c361c41b8 [WrzutaIE] Add extractor for wrzuta.pl (fixes #3072) 2014-06-13 08:51:35 +02:00
Ariset Llerena
e66ab17a36 Verified with pep8 and pyflakes 2014-06-12 23:08:06 -04:00
Ariset Llerena
cb437dc2ad removed extra char in regexp 2014-06-12 22:33:50 -04:00
Ariset Llerena
0d933b2ad5 Added vimple.ru support 2014-06-12 22:31:08 -04:00
Sergey M․
c5469e046a [livestream] Modernize 2014-06-12 20:42:46 +07:00
Sergey M․
4d2f143ce5 [ted] Update test md5 2014-06-12 20:33:53 +07:00
Sergey M․
8f93030c85 [blinkx] Modernize 2014-06-11 18:38:13 +07:00
Sergey M․
fdb9aebead [tube8] Update test and modernize 2014-06-11 18:20:14 +07:00
Sergey M․
3141feb73b [ndtv] Fix title extraction and modernize 2014-06-10 19:37:38 +07:00
Philipp Hagemeister
9706f3f802 release 2014.06.09 2014-06-09 23:16:37 +02:00
Philipp Hagemeister
d5e944359e Remove unused import 2014-06-09 23:14:04 +02:00
Philipp Hagemeister
826ec77fb2 [Vulture] Add support for vulture.com 2014-06-09 23:06:39 +02:00
Philipp Hagemeister
2656f4eb6a [hypem] Modernize 2014-06-09 22:34:41 +02:00
Philipp Hagemeister
2b88feedf7 [generic] Add support for <embed YouTube 2014-06-09 22:06:45 +02:00
Jaime Marquínez Ferrándiz
23566e0d78 rtmp and hls downloaders: Clarify error message when the external tools are not installed
Ask to install them, as we do in the postprocessor.
We get some reports with it, like #3061 or #3048.
2014-06-09 20:23:20 +02:00
Sergey M․
828553b614 [nuvid] Remove superfluous slash 2014-06-09 20:41:33 +07:00
Sergey M․
3048e82a94 [nuvid] Improve extraction 2014-06-09 20:37:04 +07:00
Sergey M․
09ffa08ba1 [veoh] Capture error message 2014-06-08 23:05:20 +07:00
Sergey M․
e0b4cc489f [dreisat] Modernize 2014-06-08 22:45:12 +07:00
Sergey M․
15e423407f [dreisat] Fix thumbnails' width and height 2014-06-08 22:41:24 +07:00
Sergey M․
702e522044 [teachertube] Fix extraction for Python 3 2014-06-08 22:16:48 +07:00
marcwebbie
77abae55df Changed video url to a public video 2014-06-08 03:13:45 -03:00
marcwebbie
617c0b2239 [GorillaVid] Added GorillaVid extractor 2014-06-07 23:09:45 -03:00
Philipp Hagemeister
814d4257df Remove unused imports 2014-06-07 16:52:34 +02:00
Philipp Hagemeister
23ae281b31 [fc2] Fall back to webpage title if needed 2014-06-07 16:52:11 +02:00
Philipp Hagemeister
94128d6b0d [nrk] Fix test checksum 2014-06-07 16:50:19 +02:00
Philipp Hagemeister
059009c592 release 2014.06.07 2014-06-07 16:42:53 +02:00
Philipp Hagemeister
9cc977f104 Credit @ralfharing for vh1 2014-06-07 16:41:44 +02:00
Philipp Hagemeister
1c0ade7afa [vh1] Skip tests (Do not work from Germany) 2014-06-07 16:40:16 +02:00
Philipp Hagemeister
f2741c8d3a [vh1] Simplify 2014-06-07 16:39:08 +02:00
Philipp Hagemeister
6ab8f3584a Merge remote-tracking branch 'ralfharing/vh1' 2014-06-07 15:53:30 +02:00
Philipp Hagemeister
8ae5ce1726 [cmt] Simplify (mentioned in #2072) 2014-06-07 15:52:49 +02:00
Philipp Hagemeister
eb92077720 [soundcloud] Add duration information (Closes #3035, Fixes #3034) 2014-06-07 15:51:01 +02:00
Philipp Hagemeister
90e0fd4bad [ku6] Improve (#3015) 2014-06-07 15:46:33 +02:00
codelol
05741e05d9 [ku6] Add new extractor 2014-06-07 15:42:33 +02:00
Philipp Hagemeister
9aa6637644 Merge branch 'master' of github.com:rg3/youtube-dl 2014-06-07 15:41:12 +02:00
Philipp Hagemeister
d30d28156d Credit @georgjaehnig for spiegeltv 2014-06-07 15:40:27 +02:00
Philipp Hagemeister
be6d722904 [cnn] Improve thumbnail extraction 2014-06-07 15:39:21 +02:00
Philipp Hagemeister
d551980823 [spiegeltv] Simplify and PEP8 2014-06-07 15:35:13 +02:00
Sergey M․
f0a6c3d2bc [teachertube] Add support for audios 2014-06-07 20:32:23 +07:00
Philipp Hagemeister
4e0fb1280a Merge remote-tracking branch 'georgjaehnig/spiegeltv' 2014-06-07 15:21:33 +02:00
Philipp Hagemeister
24f5251cce Merge remote-tracking branch 'pulpe/teachertube'
Conflicts:
	youtube_dl/extractor/__init__.py
2014-06-07 15:20:12 +02:00
Philipp Hagemeister
ac1390eee8 Merge branch 'master' of github.com:rg3/youtube-dl
Conflicts:
	youtube_dl/extractor/__init__.py
2014-06-07 15:15:39 +02:00
Philipp Hagemeister
4a5b4d34dc [tagesschau] Add support for width/height 2014-06-07 15:14:20 +02:00
Jaime Marquínez Ferrándiz
63adb0cc61 Merge pull request #3057 from pulpe/yt_fmt
[Youtube] Add format code 271 (1440p webm)
2014-06-07 14:39:28 +02:00
pulpe
3c80377b69 [Youtube] Add format code 271 (1440p webm) 2014-06-07 14:31:10 +02:00
Jaime Marquínez Ferrándiz
24577db241 [test/test_youtube_lists] Replace mix list
The old video doesn't have a mix anymore.
2014-06-07 13:43:27 +02:00
Jaime Marquínez Ferrándiz
566bd96da8 [teachingchannel] Add extractor (closes #3048) 2014-06-07 13:11:04 +02:00
Philipp Hagemeister
ebdb64d605 Merge remote-tracking branch 'pulpe/tagesschau' 2014-06-07 12:43:31 +02:00
Sergey M․
a6ffb92f0b [xvideos] Replace test 2014-06-06 21:23:36 +07:00
Sergey M․
3217377b3c [xvideos] Capture and output inline error if any 2014-06-06 21:15:06 +07:00
Jaime Marquínez Ferrándiz
24da5893fc [naver] Modernize 2014-06-06 14:57:37 +02:00
Jaime Marquínez Ferrándiz
087ca2cb07 [naver] Add rtmp formats (fixes #3054) 2014-06-06 14:55:19 +02:00
pulpe
b4e7447458 [TeacherTubeIE] Add extractor for teachertube.com videos + classrooms (fixes #3046) 2014-06-06 11:21:59 +02:00
pulpe
a45e6aadd7 [TagesschauIE] Fix possible error if quality is not defined 2014-06-06 09:00:28 +02:00
Jaime Marquínez Ferrándiz
70e322695d [youtube:playlist] Fix mixes extraction (fixes #3051)
The username seems to be empty now.
2014-06-05 21:23:27 +02:00
pulpe
6a15923b77 [TagesschauIE] Add note to 2nd _download_webpage 2014-06-05 19:34:30 +02:00
pulpe
7ffad0af5a [TagesschauIE] Remove unused import 2014-06-05 18:49:34 +02:00
pulpe
0e3ae92441 [TagesschauIE] Add extractor for tagesschau.de (fixes #3049) 2014-06-05 18:48:03 +02:00
Sergey M.
b3ae826f7a Merge pull request #3047 from pulpe/yahoo_thumb
[yahoo] improve thumbnail extraction
2014-06-05 19:31:28 +07:00
pulpe
dede691aca [yahoo] improve thumbnail extraction 2014-06-04 17:38:41 +02:00
Sergey M․
fb6a5b965b [yahoo] Improve content id extraction 2014-06-04 20:13:36 +07:00
Sergey M․
6340716b3a [yahoo] Make thumbnail optional (Closes #3043) 2014-06-04 20:11:23 +07:00
Philipp Hagemeister
b675b32e6b release 2014.06.04 2014-06-04 06:47:57 +02:00
Jaime Marquínez Ferrándiz
6a3fa81ffb [ard] Fix format extraction (fixes #3006 and #3032) 2014-06-03 21:56:49 +02:00
Georg Jaehnig
df53a98f2b [Spiegeltv] remove the md5 field to pass Travis test build 2014-06-03 17:52:39 +02:00
Georg Jaehnig
db23d8d2a2 [Spiegeltv] skip rtmp download to pass Travis test build 2014-06-03 16:50:54 +02:00
Jaime Marquínez Ferrándiz
0d69795014 Merge pull request #2962 from simonwjackson/patch-1
Update test_age_restriction.py
2014-06-03 16:47:59 +02:00
Sergey M.
3374f3fdc2 Merge pull request #3022 from MikeCol/Extremetube_title
title extraction condition less restrictive
2014-06-03 19:59:08 +07:00
Sergey M.
4bf0727b1f Merge pull request #3033 from Forever-Young/patch-2
Recognize a third format of the upload_date in the 'watch-uploader-info'...
2014-06-02 20:20:21 +07:00
Anton Novosyolov
263bd4ec50 Recognize a third format of the upload_date in the 'watch-uploader-info' element 2014-06-02 13:30:23 +04:00
Philipp Hagemeister
b7e8b6e37a release 2014.06.02 2014-06-02 10:47:24 +02:00
Sergey M․
ceb7a17f34 [mailru] Add support for new mail.ru URL format (Closes #3024) 2014-06-01 14:38:36 +07:00
Philipp Hagemeister
1a2f2e1e66 release 2014.05.31.4 2014-05-31 20:45:24 +02:00
Philipp Hagemeister
6803016858 release 2014.05.31.3 2014-05-31 20:40:48 +02:00
Philipp Hagemeister
9b7c4fd981 release 2014.05.31.2 2014-05-31 20:35:12 +02:00
Philipp Hagemeister
dc31942f42 release 2014.05.31.1 2014-05-31 20:29:53 +02:00
Philipp Hagemeister
1f6b8f3115 release 2014.05.31 2014-05-31 20:28:03 +02:00
MikeCol
9c7b79acd9 title extraction condition less restrictive 2014-05-31 18:31:39 +02:00
Jaime Marquínez Ferrándiz
9168308579 [vevo] The title in the url is optional (fixes #3020) 2014-05-31 17:55:03 +02:00
anovicecodemonkey
37e3cbe22e Move duplicate check to generic.py 2014-06-01 01:16:35 +09:30
Jaime Marquínez Ferrándiz
7e8fdb1aae [fc2] Recognize urls without language part (reported in #1154) 2014-05-31 14:45:46 +02:00
Jaime Marquínez Ferrándiz
386ba39cac [fc2] Encode the string used for the md5 checksum
In python 3 it must be a bytes object.
2014-05-31 14:40:05 +02:00
Sergey M․
236d0cd07c [nrktv] Recognize tv.nrksuper.no URL 2014-05-31 17:45:00 +07:00
Jaime Marquínez Ferrándiz
ed86f38a11 [theplatform] Use unicode_literals and _download_json 2014-05-30 21:10:48 +02:00
Jaime Marquínez Ferrándiz
6db80ad2db [comedycentralshows] Transform the rtmp urls so that rtmpdump can download them (fixes #3010)
From 'rtmpe://viacomccstrmfs.fplive.net/viacomccstrm/gsp.comedystor/*' to 'rtmpe://viacommtvstrmfs.fplive.net:1935/viacommtvstrm/gsp.comedystor/*'
2014-05-30 20:59:15 +02:00
Georg Jaehnig
14470ac87b tabs as spaces 2014-05-30 17:56:13 +02:00
Georg Jaehnig
0cdf576d86 use provided function to get JSON 2014-05-30 17:51:36 +02:00
Georg Jaehnig
4ffeca4ea2 cleanup 2014-05-30 16:39:24 +02:00
Georg Jaehnig
211fd6c674 added spiegel.tv 2014-05-30 16:35:17 +02:00
Sergey M․
6ebb46c106 [ivi] Replace tests 2014-05-30 19:12:55 +07:00
Philipp Hagemeister
0f97c9a06f [ard] Fix title (#3006) 2014-05-30 04:59:18 +02:00
Philipp Hagemeister
77fb72646f release 2014.05.30.1 2014-05-30 03:26:03 +02:00
Philipp Hagemeister
aae74e3832 [Makefile] Remove CHANGELOG entry 2014-05-30 03:26:00 +02:00
Philipp Hagemeister
894e730911 release 2014.05.30 2014-05-30 03:19:51 +02:00
Philipp Hagemeister
63961d87a6 [devscripts/release] Do not commit CHANGELOG 2014-05-30 03:19:37 +02:00
Jaime Marquínez Ferrándiz
87fe568c28 [nbcnews] Add support for /feature/* pages (closes #3007) 2014-05-30 00:38:57 +02:00
Sergey M․
46531b374d Merge branch 'anovicecodemonkey-ustream-embed-recorded2' 2014-05-29 20:23:36 +07:00
Sergey M․
9e8753911c [ustream] Modernize 2014-05-29 20:22:36 +07:00
Sergey M․
5c6b1e578c [ustream] Remove unnecessary webpage download 2014-05-29 20:20:11 +07:00
Sergey M․
8f0c8fb452 Merge branch 'ustream-embed-recorded2' of https://github.com/anovicecodemonkey/youtube-dl into anovicecodemonkey-ustream-embed-recorded2 2014-05-29 19:57:42 +07:00
anovicecodemonkey
b702ecebf0 [UstreamIE] added support for "/embed/recorded/" style URLs (Fixes #2990) 2014-05-28 22:17:13 +09:30
Sergey M․
950dc95e97 Merge branch 'rzhxeo-cinemassacre' 2014-05-28 19:38:55 +07:00
Sergey M․
d9dd3584e1 [cinemassacre] Improve formats extraction and modernize 2014-05-28 19:38:44 +07:00
Sergey M․
15a9f36849 Merge branch 'cinemassacre' of https://github.com/rzhxeo/youtube-dl into rzhxeo-cinemassacre 2014-05-28 19:31:23 +07:00
Sergey M․
d0087d4ff2 [nuvid] Fix video URL extraction 2014-05-27 18:46:30 +07:00
Sergey M․
cc5ada6f4c [ivi] Update playlist tests 2014-05-26 00:16:10 +07:00
Sergey M․
dfb2e1a325 [nrktv] Add support for tv.nrk.no (Closes #2980) 2014-05-25 07:14:18 +07:00
Sergey M.
65bab327b4 Merge pull request #2953 from codesparkle/ndr-regexes-escape-correctly
[ndr] fix regexes containing illegal characters
2014-05-25 05:42:06 +07:00
Sergey M.
9eeb7abc6b Merge pull request #2960 from codesparkle/fix-test-format-note-regex
[test] fixed typo in test_format_note (test_YoutubeDL)
2014-05-25 05:36:03 +07:00
Sergey M․
c70df21099 [streamcz] Workaround CertificateError 2014-05-25 05:32:19 +07:00
Sergey M․
418424e5f5 [streamcz] Use compat_str 2014-05-25 05:30:15 +07:00
Sergey M.
8477466125 Merge pull request #2979 from pulpe/streamcz_fix
[StreamCZ] correct video id + add test
2014-05-25 05:28:49 +07:00
pulpe
865dbd4a26 [StreamCZ] correct video id + add test 2014-05-24 16:01:37 +02:00
Sergey M․
b1e6f55912 [empflix] Fix extraction 2014-05-24 01:06:03 +07:00
Sergey M․
4d78f3b770 [pornhub] Fix uploader extraction 2014-05-24 00:44:34 +07:00
Sergey M․
7f739999e9 [swrmediathek] Extract direct links from JSON and add support for audio files 2014-05-23 21:04:21 +07:00
Sergey M․
0f8a01d4f3 [swrmediathek] Simplify 2014-05-22 19:35:46 +07:00
Sergey M.
e2bf499b14 Merge pull request #2944 from pulpe/SWRMediathek
[SWRMediathek] add support for swrmediathek.de (fixes #2929)
2014-05-22 19:30:09 +07:00
rzhxeo
7cf4547ab6 [CinemassacreIE] Extract all available video/audio formats 2014-05-22 10:33:30 +02:00
Simon W. Jackson
8ae980807a Update test_age_restriction.py
typo
2014-05-21 16:35:49 +02:00
Sergey M․
eec4d8ef96 [gamekings] Update test description 2014-05-21 19:53:58 +07:00
anovicecodemonkey
610134730a Add a _TEST_ 2014-05-21 19:25:37 +09:30
anovicecodemonkey
212a5e28ba Add a duplicate check to /extractor/common.py playlist_result function 2014-05-21 19:04:55 +09:30
codesparkle
1c783bca88 fixed (what I assume was a typo) that caused test_format_note to always fail.
This test was introduced in c57f775710.
2014-05-21 18:03:17 +10:00
Philipp Hagemeister
ac73651f66 Merge pull request #2940 from codesparkle/remove-unused-files
Remove old, unused CHANGELOG and LATEST_VERSION files
2014-05-21 08:33:13 +02:00
Keith Beckman
ee1a7032d5 Fixed errors found by travisci:
py26: re.split can't take flags. use inline flags or re.compile
py27: info_dict must be serializable. remove request object
py335, py34: no urlparse module. use utils.compat_urlparse
2014-05-20 22:28:32 -04:00
codesparkle
e5ceb3bfda Bringing back LATEST_VERSION 2014-05-21 00:55:54 +10:00
Sergey M․
c2ef29234c Credit @codesparkle for #2928, #2934, #2938, #2939 2014-05-20 20:12:57 +07:00
Sergey M.
1a1826c1af Merge pull request #2939 from codesparkle/upload-date-fix
No longer erroneously calculate upload_date within some extractors
2014-05-20 19:53:28 +07:00
Sergey M․
c7c6d43fe1 Merge branch 'codesparkle-bandcamp-albums-regex-duplicate-fix' 2014-05-20 19:45:28 +07:00
Sergey M․
2902d44f99 [bandcamp] Replace maxsplit keyword argument with regular one
Named arguments are not supported by methods implemented in native C (see http://bugs.python.org/issue1176)
2014-05-20 19:44:42 +07:00
Sergey M․
d6e4ba287b Merge branch 'bandcamp-albums-regex-duplicate-fix' of https://github.com/codesparkle/youtube-dl into codesparkle-bandcamp-albums-regex-duplicate-fix 2014-05-20 19:38:28 +07:00
Keith Beckman
7ed806d241 Fixed pyflakes and pep8 warnings 2014-05-20 02:55:21 -04:00
Keith Beckman
dd06c95e43 Added new IE for Grooveshark 2014-05-20 02:47:34 -04:00
Tobias Bell
e5c3a4b549 [gameone] Fix indentation and removed unused constants 2014-05-19 22:33:51 +02:00
Philipp Hagemeister
f50ee8d1c3 Merge branch 'master' of github.com:rg3/youtube-dl 2014-05-19 17:10:19 +02:00
Philipp Hagemeister
0e67ab0d8e [generic] Abort if user passes in URL "url" (#2942) 2014-05-19 17:10:11 +02:00
Adam Malcontenti-Wilson
1d0668ed5a [tenplay] Add new extractor 2014-05-19 23:28:21 +10:00
Adam Malcontenti-Wilson
d415299a80 [adultswim] Fix tests 2014-05-19 22:32:45 +10:00
codesparkle
77541837e5 The opening curly brace, '{', is a regex reserved control character, so it needs to be escaped (see http://stackoverflow.com/a/400316/1106367)
Minor improvements:
no need to sort the whole list if all we need is the maximum element, also instead of reinventing the wheel we can use utils to get indices from qualities.
2014-05-19 22:17:54 +10:00
Adam Malcontenti-Wilson
48fbb1003d [adultswim] Add new extractor 2014-05-19 22:05:46 +10:00
Sergey M․
e3a6576f35 [nowness] Update test file md5 and modernize 2014-05-19 19:05:18 +07:00
Philipp Hagemeister
89bb8e97ee release 2014.05.19 2014-05-19 11:42:37 +02:00
anovicecodemonkey
3442b30ab2 [generic] Support data-video-url for YouTube embeds (Fixes #2862) 2014-05-18 23:15:09 +09:30
pulpe
375696b1b1 [SWRMediathek] add support for swrmediathek.de 2014-05-18 14:56:35 +02:00
Sergey M․
4ea5c7b70d [ndr] Improve thumbnail extraction 2014-05-18 14:23:02 +07:00
Tobias Bell
305d068362 [gameone] Added timestamp extraction 2014-05-17 19:04:02 +02:00
Tobias Bell
a231ce87b5 [gameone] Added extraction of age_limit 2014-05-17 18:35:11 +02:00
Tobias Bell
a84d20fc14 [gameone] Simplified extraction of description 2014-05-17 18:20:29 +02:00
Tobias Bell
9e30092361 [gameone] Added extraction of description and fixed failing tests 2014-05-17 17:07:40 +02:00
Tobias Bell
10d5c7aa5f [gameone] Added explanation for usage of http://cdn.riptide-mtvn.com/ 2014-05-17 15:10:19 +02:00
Tobias Bell
412f356e04 [gameone] Add new extractor gameone
Currently only usable for downloading tv episodes residing under
http://www.gameone.de/tv/
2014-05-17 14:47:23 +02:00
Sergey M․
8dfa187b8a [generic] Support pagespeed_iframe for NovaMov embeds 2014-05-17 18:12:12 +07:00
Sergey M․
c1ed1f7055 [ndr] Fix title, description and duration extraction 2014-05-17 18:11:40 +07:00
Sergey M․
1514f74967 [ndr] Fix thumbnail extraction 2014-05-17 17:58:37 +07:00
codesparkle
2e8323e3f7 CHANGELOG and LATEST_VERSION seem to serve no purpose at all. They haven't been changed in years. Unless these are actually used somewhere, let's get rid of them. 2014-05-17 17:07:50 +10:00
codesparkle
69f8364042 removed duplicate and somemtimes incorrect logic for parsing upload date as this job is already taken care of automatically by YoutubeDL.py 2014-05-17 15:21:46 +10:00
codesparkle
79981f039b Fixed test failure in test_all_urls: test_no_duplicates: BandcampAlbumIE inappropriately matched non-album bandcamp links as well.
BandcampIE changed to report full-accuracy duration instead of unnecessarily rounding it to the nearest integer.
Simplified conditionals and parsing a bit. Fixed typos.
2014-05-17 14:22:24 +10:00
Ralf Haring
34d863f3fc [vh1] use standard sort (#2072) 2014-05-16 23:49:41 -04:00
Philipp Hagemeister
91994c2c81 release 2014.05.17 2014-05-17 00:17:40 +02:00
Ralf Haring
3ee4b60d56 [vh1] Add new extractor (#2072) 2014-05-16 18:15:02 -04:00
Jaime Marquínez Ferrándiz
76e92371ac [youtube] Recognize a second format of the upload_date in the 'watch-uploader-info' element (#2911) 2014-05-16 22:12:52 +02:00
Jaime Marquínez Ferrándiz
08af0205f9 Merge remote-tracking branch 'codesparkle/fix-photobucket-url' (closes #2934)
Fix photobucket url extraction
2014-05-16 20:44:52 +02:00
codesparkle
a725fb1f43 test_download works for photobucket after this change 2014-05-17 03:25:41 +10:00
Jaime Marquínez Ferrándiz
05ee2b6dad [youtube] Fix extraction of the feed 'paging' values (fixes #2925) 2014-05-16 16:01:13 +02:00
Philipp Hagemeister
b74feacac5 release 2014.05.16.1 2014-05-16 15:53:17 +02:00
Philipp Hagemeister
426b52fc5d Merge remote-tracking branch 'origin/master' 2014-05-16 15:52:01 +02:00
Philipp Hagemeister
5c30b26846 [francetv] Add support for non-numeric video IDs (Fixes #2927) 2014-05-16 15:51:01 +02:00
Philipp Hagemeister
f07b74fc18 [ffmpeg] Correct argument encoding on Windows with Python 2.x
Fixes #2924
2014-05-16 15:47:56 +02:00
Sergey M․
a5a45015ba [generic] Fix redirect 2014-05-16 20:32:53 +07:00
Philipp Hagemeister
beee53de06 [youtube] Look for published-on date if uploaded-on is not found
Fixes #2911
2014-05-16 13:21:44 +02:00
Philipp Hagemeister
8712f2bea7 release 2014.05.16 2014-05-16 12:04:52 +02:00
Philipp Hagemeister
ea102818c9 Merge remote-tracking branch 'origin/master' 2014-05-16 12:04:24 +02:00
Philipp Hagemeister
0a871f6880 Provide compatibility check_output for 2.6 (Fixes #2926) 2014-05-16 12:03:59 +02:00
Sergey M․
481efc84a8 [bliptv] Switch extraction to RSS (Closes #2920) 2014-05-15 22:20:40 +07:00
Jaime Marquínez Ferrándiz
01ed5c9be3 [youtube] Fix typo 2014-05-15 13:43:29 +02:00
Philipp Hagemeister
ad3bc6acd5 Document and test categories (#2923) 2014-05-15 12:41:42 +02:00
Philipp Hagemeister
5afa7f8bee [extractor/common] --write-pages: Correct file name if video_id is None 2014-05-15 12:39:33 +02:00
Dario Guarascio
ec8deefc27 [youtube] Video categories added to metadata 2014-05-15 13:59:27 +07:00
Sergey M․
a2d5a4ee64 [gamespot] Update test URL and modernize 2014-05-14 20:13:34 +07:00
Jaime Marquínez Ferrándiz
dffcc2ea0c Makefile: write the manpage to the right file and use the processed markdown document 2014-05-13 14:37:05 +02:00
Philipp Hagemeister
1800eeefed add prepare_manpage 2014-05-13 14:21:21 +02:00
Sergey M․
d7e7dedbde [noco] Skip test 2014-05-13 19:12:17 +07:00
Philipp Hagemeister
d19bb9c0aa Split man and README (Fixes #2892) 2014-05-13 11:16:11 +02:00
Philipp Hagemeister
3ef79a974a [README] Stress example URL
This seems to be the part most often overlooked in our README.
2014-05-13 10:28:58 +02:00
Philipp Hagemeister
bc6800fbed release 2014.05.13 2014-05-13 10:20:10 +02:00
Philipp Hagemeister
65314dccf8 [empflix] Simplify (#2903) 2014-05-13 10:14:05 +02:00
Philipp Hagemeister
feb7221209 Merge remote-tracking branch 'hojel/empflix' 2014-05-13 10:11:14 +02:00
Philipp Hagemeister
56a94d8cbb [hentaistigma] Simplified (#2902) 2014-05-13 10:10:59 +02:00
Philipp Hagemeister
24e6ec8ac8 Merge remote-tracking branch 'hojel/hentaistigma' 2014-05-13 10:09:04 +02:00
Philipp Hagemeister
87724af7a8 [nuvid] Simplify (#2901) 2014-05-13 10:08:32 +02:00
Philipp Hagemeister
b65c3e77e8 Merge remote-tracking branch 'hojel/nuvid' 2014-05-13 10:05:20 +02:00
Philipp Hagemeister
5301304bf2 [slutload] Simplify (#2898) 2014-05-13 10:04:29 +02:00
Philipp Hagemeister
948bcc60df Merge remote-tracking branch 'hojel/slutload' 2014-05-13 10:00:49 +02:00
Philipp Hagemeister
25dfe0eb10 Credit @hojel for fc2 and other extractors (#2877) 2014-05-13 10:00:27 +02:00
Philipp Hagemeister
8e71456a81 [fc2] Add new extractor (Fixes #2877)
This commit has been recreated, since there seems to have been a problem with GitHub; the PR doesn't have a branch.
2014-05-13 09:58:36 +02:00
Philipp Hagemeister
ccdd34ed78 Credit @jnormore for vine:user (#2888) 2014-05-13 09:53:58 +02:00
Philipp Hagemeister
26d886354f Merge remote-tracking branch 'frewsxcv/patch-1' 2014-05-13 09:52:28 +02:00
Philipp Hagemeister
a172b258ac [vine:user] Simplify 2014-05-13 09:50:03 +02:00
Philipp Hagemeister
7b93c2c204 Merge remote-tracking branch 'jnormore/vine_user' 2014-05-13 09:45:27 +02:00
Philipp Hagemeister
57c7411f46 [mixcloud] Shed API dependency (#2904) 2014-05-13 09:42:38 +02:00
Philipp Hagemeister
d0a122348e [test/helper] Clarify which field failed an assertion 2014-05-13 09:41:36 +02:00
Philipp Hagemeister
e4cbb5f382 [wdr] Add support for mobile URLs 2014-05-12 22:17:19 +02:00
Philipp Hagemeister
c1bce22f23 [extractor/common] Protect against long video IDs and URLs 2014-05-12 21:58:23 +02:00
Philipp Hagemeister
e3abbbe301 release 2014.05.12 2014-05-12 16:40:03 +02:00
Sergey M․
55b36e3710 [videott] Add support for video.tt (Closes #2889) 2014-05-12 20:23:08 +07:00
hojel
877bea9ce1 [empflix] Add new extractor 2014-05-12 04:10:29 -07:00
hojel
33c7ff861e [hentaistigma] Add new extractor 2014-05-12 03:58:07 -07:00
hojel
749fe60c1e [nuvid] Add new extractor 2014-05-12 03:48:40 -07:00
hojel
63b31b059c [slutload] Add new extractor 2014-05-12 01:29:19 -07:00
hojel
1476b497eb [slutload] Add new extractor 2014-05-12 01:28:56 -07:00
Jaime Marquínez Ferrándiz
e399853d0c [youtube:playlist] Improve detection of private lists (#2840) 2014-05-12 07:59:33 +02:00
Corey Farwell
fdb205b19e Enable testing on Python 3.4 2014-05-11 20:13:22 -07:00
Sergey M․
fbe8053120 [vk] Update test 2014-05-11 16:43:59 +07:00
Jason Normore
ea783d01e1 Added VineUserIE extractor for vine user timeline
Added vine user timeline extractor using unofficial
vine api user profile and timeline api endpoints.
2014-05-10 23:18:20 -04:00
Jaime Marquínez Ferrándiz
b7d73595dc Allow recoding the video to mkv 2014-05-10 15:09:56 +02:00
Sergey M․
e97e53eeed [vevo] Add friendly error output (#2874) 2014-05-10 04:34:53 +07:00
Sergey M․
342f630dbf [rutv] Add support for more live stream URLs (Closes #2875) 2014-05-10 02:23:24 +07:00
Sergey M․
69c8fb9e5d [vimeo] Add video duration extraction(Closes #2876) 2014-05-10 01:46:40 +07:00
Sergey M․
5f0f8013ac [vube] Consider optional fields and modernize 2014-05-09 01:45:34 +07:00
Sergey M․
b5368acee8 [vube] Improve URL detection and extract timestamp 2014-05-09 01:31:25 +07:00
Sergey M․
f71959fcf5 [nfb] Add support for videos with captions (#2866) 2014-05-08 22:07:14 +07:00
Philipp Hagemeister
5c9f3b8b16 [arte] Fix versionCode interpretation (#2588) 2014-05-08 02:00:47 +02:00
Sergey M․
bebd6f9308 [funnyordie] Extract more formats 2014-05-07 21:02:57 +07:00
Sergey M.
84a2806c16 Merge pull request #2859 from pulpe/FunnyOrDie_thumb
[FunnyOrDie] fix thumbnails + add test (fixes #2856)
2014-05-06 19:46:40 +07:00
pulpe
d0111a7409 [FunnyOrDie] simplify 2014-05-06 10:19:13 +02:00
pulpe
aab8874c55 [FunnyOrDie] fix thumbnails + add test (fixes #2856) 2014-05-06 08:57:28 +02:00
Sergey M․
fcf5b01746 [prosiebensat1] Simplify 2014-05-05 19:02:49 +07:00
Philipp Hagemeister
4de9e9a6db [canalplus] Fix id determination (Fixes #2851) 2014-05-05 03:30:05 +02:00
Philipp Hagemeister
0067d6c4be release 2014.05.05 2014-05-05 03:15:40 +02:00
Philipp Hagemeister
2099125333 [soundcloud/generic] Add support for playlists 2014-05-05 03:15:17 +02:00
Philipp Hagemeister
b48f147d5a [bandcamp] Add support for subdomains (Fixes #2850) 2014-05-05 02:44:44 +02:00
Jaime Marquínez Ferrándiz
4f3e943080 [vimeo] Some modernization and style fixes 2014-05-04 22:27:56 +02:00
Jaime Marquínez Ferrándiz
7558830fa3 [vimeo] Fix description extraction 2014-05-04 21:48:08 +02:00
Sergey M․
867274e997 [statigram] Update to fit new website name and rename extractor 2014-05-04 16:52:10 +07:00
Sergey M․
6515778305 [nytimes] Improve file size extraction 2014-05-03 03:11:38 +07:00
Sergey M․
3b1dfc0f2f [newstube] Do not shadow standard str 2014-05-03 02:30:50 +07:00
Sergey M․
d664de44b7 [nytimes] Add support for nytimes.com (Closes #2846) 2014-05-03 02:28:38 +07:00
Sergey M․
bbe99d26ec Credit @nicoe for rtbf.be (#2822) 2014-05-02 02:36:11 +07:00
Sergey M․
50fc59968e [ntv] Simplify 2014-05-02 02:26:07 +07:00
Sergey M․
b8b01bb92a [newstube] Add support for newstube.ru (Closes #2814) 2014-05-01 21:15:25 +07:00
Sergey M․
eb45133451 [rtmp] Add support for multiple AFM data entries 2014-05-01 21:14:21 +07:00
Jaime Marquínez Ferrándiz
10c0e2d818 [youtube:playlist] Raise an error if the list doesn't exist or is private (closes #2840) 2014-05-01 15:40:35 +02:00
Sergey M․
669f0e7cda [generic] Fix wrong entries index 2014-05-01 16:28:37 +07:00
Sergey M․
32fd27ec98 [http] Fix string/None comparison with int while in test 2014-04-30 20:02:17 +07:00
Philipp Hagemeister
0c13f378de Merge remote-tracking branch 'origin/master' 2014-04-30 14:12:41 +02:00
Philipp Hagemeister
0049594efb [vine] Remove debugging code 2014-04-30 14:12:30 +02:00
Sergey M․
113c7d3eb0 [canalplus] Update test file checksum 2014-04-30 18:54:12 +07:00
Sergey M․
549371fc99 [nrk] Update test file checksums 2014-04-30 18:51:50 +07:00
Sergey M․
957f27e5bb [scivee] Revert test file download 2014-04-30 18:49:29 +07:00
221 changed files with 10243 additions and 2216 deletions

1
.gitignore vendored
View File

@@ -26,5 +26,6 @@ updates_key.pem
*.m4a
*.m4v
*.part
*.swp
test/testdata
.tox

View File

@@ -3,6 +3,7 @@ python:
- "2.6"
- "2.7"
- "3.3"
- "3.4"
script: nosetests test --verbose
notifications:
email:

View File

@@ -1,14 +0,0 @@
2013.01.02 Codename: GIULIA
* Add support for ComedyCentral clips <nto>
* Corrected Vimeo description fetching <Nick Daniels>
* Added the --no-post-overwrites argument <Barbu Paul - Gheorghe>
* --verbose offers more environment info
* New info_dict field: uploader_id
* New updates system, with signature checking
* New IEs: NBA, JustinTV, FunnyOrDie, TweetReel, Steam, Ustream
* Fixed IEs: BlipTv
* Fixed for Python 3 IEs: Xvideo, Youku, XNXX, Dailymotion, Vimeo, InfoQ
* Simplified IEs and test code
* Various (Python 3 and other) fixes
* Revamped and expanded tests

View File

@@ -1,15 +1,15 @@
all: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion
clean:
rm -rf youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz
cleanall: clean
rm -f youtube-dl youtube-dl.exe
PREFIX=/usr/local
BINDIR=$(PREFIX)/bin
MANDIR=$(PREFIX)/man
PYTHON=/usr/bin/env python
PREFIX ?= /usr/local
BINDIR ?= $(PREFIX)/bin
MANDIR ?= $(PREFIX)/man
PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
ifeq ($(PREFIX),/usr)
@@ -55,7 +55,9 @@ README.txt: README.md
pandoc -f markdown -t plain README.md -o README.txt
youtube-dl.1: README.md
pandoc -s -f markdown -t man README.md -o youtube-dl.1
python devscripts/prepare_manpage.py >youtube-dl.1.temp.md
pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md
youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
python devscripts/bash-completion.py
@@ -75,6 +77,6 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude 'docs/_build' \
-- \
bin devscripts test youtube_dl docs \
CHANGELOG LICENSE README.md README.txt \
LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion setup.py \
youtube-dl

202
README.md
View File

@@ -1,11 +1,32 @@
% YOUTUBE-DL(1)
# NAME
youtube-dl - download videos from youtube.com or other video platforms
# SYNOPSIS
**youtube-dl** [OPTIONS] URL [URL...]
# INSTALLATION
To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget:
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
Windows users can [download a .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in their home directory or any other location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29).
OS X users can install **youtube-dl** with [Homebrew](http://brew.sh/).
brew install youtube-dl
You can also use pip:
sudo pip install youtube-dl
Alternatively, refer to the developer instructions below for how to check out and work with the git repository. For further options, including PGP signatures, see https://rg3.github.io/youtube-dl/download.html .
# DESCRIPTION
**youtube-dl** is a small command-line program to download videos from
YouTube.com and a few more sites. It requires the Python interpreter, version
@@ -25,12 +46,6 @@ which means you can modify it, redistribute it or use it however you like.
playlist or the command line) if an error
occurs
--dump-user-agent display the current browser identification
--user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video
access is restricted to one domain
--add-header FIELD:VALUE specify a custom HTTP header and its value,
separated by a colon ':'. You can use this
option multiple times
--list-extractors List all supported extractors and the URLs
they would handle
--extractor-descriptions Output descriptions of all supported
@@ -38,34 +53,22 @@ which means you can modify it, redistribute it or use it however you like.
--proxy URL Use the specified HTTP/HTTPS proxy. Pass in
an empty string (--proxy "") for direct
connection
--no-check-certificate Suppress HTTPS certificate validation.
--prefer-insecure Use an unencrypted connection to retrieve
information about the video. (Currently
supported only for YouTube)
--cache-dir DIR Location in the filesystem where youtube-dl
can store some downloaded information
permanently. By default $XDG_CACHE_HOME
/youtube-dl or ~/.cache/youtube-dl . At the
moment, only YouTube player files (for
videos with obfuscated signatures) are
cached, but that may change.
--no-cache-dir Disable filesystem caching
--socket-timeout None Time to wait before giving up, in seconds
--bidi-workaround Work around terminals that lack
bidirectional text support. Requires bidiv
or fribidi executable in PATH
--default-search PREFIX Use this prefix for unqualified URLs. For
example "gvsearch2:" downloads two videos
from google videos for youtube-dl "large
apple". By default (with value "auto")
youtube-dl guesses.
apple". Use the value "auto" to let
youtube-dl guess ("auto_warning" to emit a
warning when guessing). "error" just throws
an error. The default value "fixup_error"
repairs broken URLs, but emits an error if
this is not possible instead of searching.
--ignore-config Do not read configuration files. When given
in the global configuration file /etc
/youtube-dl.conf: do not read the user
configuration in ~/.config/youtube-dl.conf
(%APPDATA%/youtube-dl/config.txt on
Windows)
--encoding ENCODING Force the specified encoding (experimental)
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
@@ -111,9 +114,9 @@ which means you can modify it, redistribute it or use it however you like.
of SIZE.
## Filesystem Options:
-t, --title use title in file name (default)
-a, --batch-file FILE file containing URLs to download ('-' for
stdin)
--id use only video ID in file name
-l, --literal [deprecated] alias of --title
-A, --auto-number number downloaded files starting from 00000
-o, --output TEMPLATE output filename template. Use %(title)s to
get the title, %(uploader)s for the
@@ -146,18 +149,15 @@ which means you can modify it, redistribute it or use it however you like.
--restrict-filenames Restrict filenames to only ASCII
characters, and avoid "&" and spaces in
filenames
-a, --batch-file FILE file containing URLs to download ('-' for
stdin)
--load-info FILE json file containing the video information
(created with the "--write-json" option)
-t, --title [deprecated] use title in file name
(default)
-l, --literal [deprecated] alias of --title
-w, --no-overwrites do not overwrite files
-c, --continue force resume of partially downloaded files.
By default, youtube-dl will resume
downloads if possible.
--no-continue do not resume partially downloaded files
(restart from beginning)
--cookies FILE file to read cookies from and dump cookie
jar in
--no-part do not use .part files
--no-mtime do not use the Last-modified header to set
the file modification time
@@ -167,6 +167,19 @@ which means you can modify it, redistribute it or use it however you like.
--write-annotations write video annotations to a .annotation
file
--write-thumbnail write thumbnail image to disk
--load-info FILE json file containing the video information
(created with the "--write-json" option)
--cookies FILE file to read cookies from and dump cookie
jar in
--cache-dir DIR Location in the filesystem where youtube-dl
can store some downloaded information
permanently. By default $XDG_CACHE_HOME
/youtube-dl or ~/.cache/youtube-dl . At the
moment, only YouTube player files (for
videos with obfuscated signatures) are
cached, but that may change.
--no-cache-dir Disable filesystem caching
--rm-cache-dir Delete all filesystem cache files
## Verbosity / Simulation Options:
-q, --quiet activates quiet mode
@@ -196,6 +209,22 @@ which means you can modify it, redistribute it or use it however you like.
problems
--print-traffic Display sent and read HTTP traffic
## Workarounds:
--encoding ENCODING Force the specified encoding (experimental)
--no-check-certificate Suppress HTTPS certificate validation.
--prefer-insecure Use an unencrypted connection to retrieve
information about the video. (Currently
supported only for YouTube)
--user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video
access is restricted to one domain
--add-header FIELD:VALUE specify a custom HTTP header and its value,
separated by a colon ':'. You can use this
option multiple times
--bidi-workaround Work around terminals that lack
bidirectional text support. Requires bidiv
or fribidi executable in PATH
## Video Format Options:
-f, --format FORMAT video format code, specify the order of
preference using slashes: "-f 22/17/18".
@@ -226,6 +255,7 @@ which means you can modify it, redistribute it or use it however you like.
## Authentication Options:
-u, --username USERNAME account username
-p, --password PASSWORD account password
-2, --twofactor TWOFACTOR two-factor auth code
-n, --netrc use .netrc authentication data
--video-password PASSWORD video password (vimeo, smotri)
@@ -241,7 +271,7 @@ which means you can modify it, redistribute it or use it however you like.
128K (default 5)
--recode-video FORMAT Encode the video to another format if
necessary (currently supported:
mp4|flv|ogg|webm)
mp4|flv|ogg|webm|mkv)
-k, --keep-video keeps the video file on disk after the
post-processing; the video is erased by
default
@@ -258,6 +288,10 @@ which means you can modify it, redistribute it or use it however you like.
postprocessors (default)
--prefer-ffmpeg Prefer ffmpeg over avconv for running the
postprocessors
--exec CMD Execute a command on the file after
downloading, similar to find's -exec
syntax. Example: --exec 'adb push {}
/sdcard/Music/ && rm {}'
# CONFIGURATION
@@ -282,10 +316,12 @@ The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc
youtube-dl test video ''_ä↭𝕐.mp4 # All kinds of weird characters
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc --restrict-filenames
youtube-dl_test_video_.mp4 # A simple file name
```bash
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc
youtube-dl test video ''_ä↭𝕐.mp4 # All kinds of weird characters
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc --restrict-filenames
youtube-dl_test_video_.mp4 # A simple file name
```
# VIDEO SELECTION
@@ -296,14 +332,16 @@ Videos can be filtered by their upload date using the options `--date`, `--dateb
Examples:
# Download only the videos uploaded in the last 6 months
$ youtube-dl --dateafter now-6months
```bash
# Download only the videos uploaded in the last 6 months
$ youtube-dl --dateafter now-6months
# Download only the videos uploaded on January 1, 1970
$ youtube-dl --date 19700101
# Download only the videos uploaded on January 1, 1970
$ youtube-dl --date 19700101
$ # will only download the videos uploaded in the 200x decade
$ youtube-dl --dateafter 20000101 --datebefore 20091231
$ # will only download the videos uploaded in the 200x decade
$ youtube-dl --dateafter 20000101 --datebefore 20091231
```
# FAQ
@@ -378,49 +416,49 @@ If you want to add support for a new site, you can follow this quick list (assum
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python
# coding: utf-8
from __future__ import unicode_literals
# coding: utf-8
from __future__ import unicode_literals
import re
import re
from .common import InfoExtractor
from .common import InfoExtractor
class YourExtractorIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://yourextractor.com/watch/42',
'md5': 'TODO: md5 sum of the first 10KiB of the video file',
'info_dict': {
'id': '42',
'ext': 'mp4',
'title': 'Video title goes here',
# TODO more properties, either as:
# * A value
# * MD5 checksum; start the string with md5:
# * A regular expression; start the string with re:
# * Any Python type (for example int or float)
}
class YourExtractorIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://yourextractor.com/watch/42',
'md5': 'TODO: md5 sum of the first 10KiB of the video file',
'info_dict': {
'id': '42',
'ext': 'mp4',
'title': 'Video title goes here',
'thumbnail': 're:^https?://.*\.jpg$',
# TODO more properties, either as:
# * A value
# * MD5 checksum; start the string with md5:
# * A regular expression; start the string with re:
# * Any Python type (for example int or float)
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
# TODO more code goes here, for example ...
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
return {
'id': video_id,
'title': title,
# TODO more properties (see youtube_dl/extractor/common.py)
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
# TODO more code goes here, for example ...
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
return {
'id': video_id,
'title': title,
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done.
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will be then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Add tests and code for as many as you want.
8. If you can, check the code with [pyflakes](https://pypi.python.org/pypi/pyflakes) (a good idea) and [pep8](https://pypi.python.org/pypi/pep8) (optional, ignore E501).
9. When the tests pass, [add](https://www.kernel.org/pub/software/scm/git/docs/git-add.html) the new files and [commit](https://www.kernel.org/pub/software/scm/git/docs/git-commit.html) them and [push](https://www.kernel.org/pub/software/scm/git/docs/git-push.html) the result, like this:
@@ -458,7 +496,7 @@ If your report is shorter than two lines, it is almost certainly missing some of
For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the -v flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
Site support requests must contain an example URL. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
Site support requests **must contain an example URL**. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
### Are you using the latest version?

View File

@@ -15,7 +15,7 @@ header = oldreadme[:oldreadme.index('# OPTIONS')]
footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
options = helptext[helptext.index(' General Options:') + 19:]
options = re.sub(r'^ (\w.+)$', r'## \1', options, flags=re.M)
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
options = '# OPTIONS\n' + options + '\n'
with io.open(README_FILE, 'w', encoding='utf-8') as f:

View File

@@ -0,0 +1,20 @@
import io
import os.path
import sys
import re
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
README_FILE = os.path.join(ROOT_DIR, 'README.md')
with io.open(README_FILE, encoding='utf-8') as f:
readme = f.read()
PREFIX = '%YOUTUBE-DL(1)\n\n# NAME\n'
readme = re.sub(r'(?s)# INSTALLATION.*?(?=# DESCRIPTION)', '', readme)
readme = PREFIX + readme
if sys.version_info < (3, 0):
print(readme.encode('utf-8'))
else:
print(readme)

View File

@@ -45,9 +45,9 @@ fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing CHANGELOG README.md and youtube_dl/version.py..."
/bin/echo -e "\n### Committing README.md and youtube_dl/version.py..."
make README.md
git add CHANGELOG README.md youtube_dl/version.py
git add README.md youtube_dl/version.py
git commit -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@@ -102,12 +102,15 @@ def expect_info_dict(self, expected_dict, got_dict):
match_rex = re.compile(match_str)
self.assertTrue(
isinstance(got, compat_str) and match_rex.match(got),
isinstance(got, compat_str),
'Expected a %r object, but got %r' % (compat_str, type(got)))
self.assertTrue(
match_rex.match(got),
u'field %s (value: %r) should match %r' % (info_field, got, match_str))
elif isinstance(expected, type):
got = got_dict.get(info_field)
self.assertTrue(isinstance(got, expected),
u'Expected type %r, but got value %r of type %r' % (expected, got, type(got)))
u'Expected type %r for field %s, but got value %r of type %r' % (expected, info_field, got, type(got)))
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
got = 'md5:' + md5(got_dict.get(info_field))
@@ -117,8 +120,9 @@ def expect_info_dict(self, expected_dict, got_dict):
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# Check for the presence of mandatory fields
for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(got_dict.get(key), 'Missing mandatory field %s' % key)
if got_dict.get('_type') != 'playlist':
for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(got_dict.get(key), 'Missing mandatory field %s' % key)
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(got_dict.get(key), u'Missing field: %s' % key)
@@ -137,8 +141,8 @@ def expect_info_dict(self, expected_dict, got_dict):
def assertRegexpMatches(self, text, regexp, msg=None):
if hasattr(self, 'assertRegexpMatches'):
return self.assertRegexpMatches(text, regexp, msg)
if hasattr(self, 'assertRegexp'):
return self.assertRegexp(text, regexp, msg)
else:
m = re.match(regexp, text)
if not m:
@@ -148,3 +152,10 @@ def assertRegexpMatches(self, text, regexp, msg=None):
else:
msg = note + ', ' + msg
self.assertTrue(m, msg)
def assertGreaterEqual(self, got, expected, msg=None):
if not (got >= expected):
if msg is None:
msg = '%r not greater than or equal to %r' % (got, expected)
self.assertTrue(got >= expected, msg)

1
test/swftests/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
*.swf

View File

@@ -0,0 +1,19 @@
// input: [["a", "b", "c", "d"]]
// output: ["c", "b", "a", "d"]
package {
public class ArrayAccess {
public static function main(ar:Array):Array {
var aa:ArrayAccess = new ArrayAccess();
return aa.f(ar, 2);
}
private function f(ar:Array, num:Number):Array{
var x:String = ar[0];
var y:String = ar[num % ar.length];
ar[0] = y;
ar[num] = x;
return ar;
}
}
}

View File

@@ -0,0 +1,17 @@
// input: []
// output: 121
package {
public class ClassCall {
public static function main():int{
var f:OtherClass = new OtherClass();
return f.func(100,20);
}
}
}
class OtherClass {
public function func(x: int, y: int):int {
return x+y+1;
}
}

View File

@@ -0,0 +1,15 @@
// input: []
// output: 0
package {
public class ClassConstruction {
public static function main():int{
var f:Foo = new Foo();
return 0;
}
}
}
class Foo {
}

View File

@@ -0,0 +1,13 @@
// input: [1, 2]
// output: 3
package {
public class LocalVars {
public static function main(a:int, b:int):int{
var c:int = a + b + b;
var d:int = c - b;
var e:int = d;
return e;
}
}
}

View File

@@ -0,0 +1,21 @@
// input: []
// output: 9
package {
public class PrivateCall {
public static function main():int{
var f:OtherClass = new OtherClass();
return f.func();
}
}
}
class OtherClass {
private function pf():int {
return 9;
}
public function func():int {
return this.pf();
}
}

View File

@@ -0,0 +1,13 @@
// input: [1]
// output: 1
package {
public class StaticAssignment {
public static var v:int;
public static function main(a:int):int{
v = a;
return v;
}
}
}

View File

@@ -0,0 +1,16 @@
// input: []
// output: 1
package {
public class StaticRetrieval {
public static var v:int;
public static function main():int{
if (v) {
return 0;
} else {
return 1;
}
}
}
}

View File

@@ -67,7 +67,7 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['ext'], 'mp4')
# No prefer_free_formats => prefer mp4 and flv for greater compatibilty
# No prefer_free_formats => prefer mp4 and flv for greater compatibility
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
@@ -221,7 +221,7 @@ class TestFormatSelection(unittest.TestCase):
'138', '137', '248', '136', '247', '135', '246',
'245', '244', '134', '243', '133', '242', '160',
# Dash audio
'141', '172', '140', '139', '171',
'141', '172', '140', '171', '139',
]
for f1id, f2id in zip(order, order[1:]):
@@ -279,7 +279,7 @@ class TestFormatSelection(unittest.TestCase):
self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({
'vbr': 10,
}), '^x\s*10k$')
}), '^\s*10k$')
if __name__ == '__main__':
unittest.main()

View File

@@ -13,7 +13,7 @@ from youtube_dl import YoutubeDL
def _download_restricted(url, filename, age):
""" Returns true iff the file has been downloaded """
""" Returns true if the file has been downloaded """
params = {
'age_limit': age,

View File

@@ -15,7 +15,6 @@ from youtube_dl.extractor import (
FacebookIE,
gen_extractors,
JustinTVIE,
PBSIE,
YoutubeIE,
)
@@ -69,9 +68,6 @@ class TestAllURLsMatching(unittest.TestCase):
def test_youtube_show_matching(self):
self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
def test_youtube_truncated(self):
self.assertMatch('http://www.youtube.com/watch?', ['youtube:truncated_url'])
def test_youtube_search_matching(self):
self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
@@ -103,6 +99,7 @@ class TestAllURLsMatching(unittest.TestCase):
def test_facebook_matching(self):
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/cindyweather?fref=ts#!/photo.php?v=10152183998945793'))
def test_no_duplicates(self):
ies = gen_extractors()

View File

@@ -7,10 +7,10 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
assertGreaterEqual,
get_params,
gettestcases,
expect_info_dict,
md5,
try_rm,
report_warning,
)
@@ -24,7 +24,6 @@ import socket
import youtube_dl.YoutubeDL
from youtube_dl.utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_HTTPError,
DownloadError,
@@ -65,15 +64,21 @@ def generator(test_case):
def test_template(self):
ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
other_ies = [get_info_extractor(ie_key) for ie_key in test_case.get('add_ie', [])]
is_playlist = any(k.startswith('playlist') for k in test_case)
test_cases = test_case.get(
'playlist', [] if is_playlist else [test_case])
def print_skipping(reason):
print('Skipping %s: %s' % (test_case['name'], reason))
if not ie.working():
print_skipping('IE marked as not _WORKING')
return
if 'playlist' not in test_case:
info_dict = test_case.get('info_dict', {})
if not test_case.get('file') and not (info_dict.get('id') and info_dict.get('ext')):
for tc in test_cases:
info_dict = tc.get('info_dict', {})
if not tc.get('file') and not (info_dict.get('id') and info_dict.get('ext')):
raise Exception('Test definition incorrect. The output file cannot be known. Are both \'id\' and \'ext\' keys present?')
if 'skip' in test_case:
print_skipping(test_case['skip'])
return
@@ -83,6 +88,9 @@ def generator(test_case):
return
params = get_params(test_case.get('params', {}))
if is_playlist and 'playlist' not in test_case:
params.setdefault('extract_flat', True)
params.setdefault('skip_download', True)
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
@@ -95,7 +103,6 @@ def generator(test_case):
def get_tc_filename(tc):
return tc.get('file') or ydl.prepare_filename(tc.get('info_dict', {}))
test_cases = test_case.get('playlist', [test_case])
def try_rm_tcs_files():
for tc in test_cases:
tc_filename = get_tc_filename(tc)
@@ -107,7 +114,10 @@ def generator(test_case):
try_num = 1
while True:
try:
ydl.download([test_case['url']])
# We're not using .download here sine that is just a shim
# for outside error handling, and returns the exit code
# instead of the result dict.
res_dict = ydl.extract_info(test_case['url'])
except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError, compat_http_client.BadStatusLine) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
@@ -123,6 +133,23 @@ def generator(test_case):
else:
break
if is_playlist:
self.assertEqual(res_dict['_type'], 'playlist')
expect_info_dict(self, test_case.get('info_dict', {}), res_dict)
if 'playlist_mincount' in test_case:
assertGreaterEqual(
self,
len(res_dict['entries']),
test_case['playlist_mincount'],
'Expected at least %d in playlist %s, but got only %d' % (
test_case['playlist_mincount'], test_case['url'],
len(res_dict['entries'])))
if 'playlist_count' in test_case:
self.assertEqual(
len(res_dict['entries']),
test_case['playlist_count'],
'Expected at %d in playlist %s, but got %d.')
for tc in test_cases:
tc_filename = get_tc_filename(tc)
if not test_case.get('params', {}).get('skip_download', False):

View File

@@ -1,6 +1,17 @@
#!/usr/bin/env python
# encoding: utf-8
## DEPRECATED FILE!
# Add new tests to the extractors themselves, like this:
# _TEST = {
# 'url': 'http://example.com/playlist/42',
# 'playlist_mincount': 99,
# 'info_dict': {
# 'id': '42',
# 'title': 'Playlist number forty-two',
# }
# }
from __future__ import unicode_literals
# Allow direct execution
@@ -10,6 +21,8 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
assertRegexpMatches,
assertGreaterEqual,
expect_info_dict,
FakeYDL,
)
@@ -22,10 +35,14 @@ from youtube_dl.extractor import (
VimeoUserIE,
VimeoAlbumIE,
VimeoGroupsIE,
VineUserIE,
UstreamChannelIE,
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudPlaylistIE,
TeacherTubeUserIE,
LivestreamIE,
LivestreamOriginalIE,
NHLVideocenterIE,
BambuserChannelIE,
BandcampAlbumIE,
@@ -36,6 +53,7 @@ from youtube_dl.extractor import (
KhanAcademyIE,
EveryonesMixtapeIE,
RutubeChannelIE,
RutubePersonIE,
GoogleSearchIE,
GenericIE,
TEDIE,
@@ -44,6 +62,7 @@ from youtube_dl.extractor import (
InstagramUserIE,
CSpanIE,
AolIE,
GameOnePlaylistIE,
)
@@ -65,8 +84,8 @@ class TestPlaylists(unittest.TestCase):
ie = DailymotionUserIE(dl)
result = ie.extract('https://www.dailymotion.com/user/nqtv')
self.assertIsPlaylist(result)
assertGreaterEqual(self, len(result['entries']), 100)
self.assertEqual(result['title'], 'Rémi Gaillard')
self.assertTrue(len(result['entries']) >= 100)
def test_vimeo_channel(self):
dl = FakeYDL()
@@ -100,13 +119,20 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], 'Rolex Awards for Enterprise')
self.assertTrue(len(result['entries']) > 72)
def test_vine_user(self):
dl = FakeYDL()
ie = VineUserIE(dl)
result = ie.extract('https://vine.co/Visa')
self.assertIsPlaylist(result)
assertGreaterEqual(self, len(result['entries']), 47)
def test_ustream_channel(self):
dl = FakeYDL()
ie = UstreamChannelIE(dl)
result = ie.extract('http://www.ustream.tv/channel/young-americans-for-liberty')
result = ie.extract('http://www.ustream.tv/channel/channeljapan')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '5124905')
self.assertTrue(len(result['entries']) >= 6)
self.assertEqual(result['id'], '10874166')
assertGreaterEqual(self, len(result['entries']), 54)
def test_soundcloud_set(self):
dl = FakeYDL()
@@ -114,7 +140,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'The Royal Concept EP')
self.assertTrue(len(result['entries']) >= 6)
assertGreaterEqual(self, len(result['entries']), 6)
def test_soundcloud_user(self):
dl = FakeYDL()
@@ -122,7 +148,26 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('https://soundcloud.com/the-concept-band')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '9615865')
self.assertTrue(len(result['entries']) >= 12)
assertGreaterEqual(self, len(result['entries']), 12)
def test_soundcloud_likes(self):
dl = FakeYDL()
ie = SoundcloudUserIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band/likes')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '9615865')
assertGreaterEqual(self, len(result['entries']), 1)
def test_soundcloud_playlist(self):
dl = FakeYDL()
ie = SoundcloudPlaylistIE(dl)
result = ie.extract('http://api.soundcloud.com/playlists/4110309')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '4110309')
self.assertEqual(result['title'], 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]')
assertRegexpMatches(
self, result['description'], r'.*?TILT Brass - Bowery Poetry Club')
self.assertEqual(len(result['entries']), 6)
def test_livestream_event(self):
dl = FakeYDL()
@@ -130,7 +175,15 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://new.livestream.com/tedx/cityenglish')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4)
assertGreaterEqual(self, len(result['entries']), 4)
def test_livestreamoriginal_folder(self):
dl = FakeYDL()
ie = LivestreamOriginalIE(dl)
result = ie.extract('https://www.livestream.com/newplay/folder?dirId=a07bf706-d0e4-4e75-a747-b021d84f2fd3')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'a07bf706-d0e4-4e75-a747-b021d84f2fd3')
assertGreaterEqual(self, len(result['entries']), 28)
def test_nhl_videocenter(self):
dl = FakeYDL()
@@ -147,15 +200,15 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://bambuser.com/channel/pixelversity')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'pixelversity')
self.assertTrue(len(result['entries']) >= 60)
assertGreaterEqual(self, len(result['entries']), 60)
def test_bandcamp_album(self):
dl = FakeYDL()
ie = BandcampAlbumIE(dl)
result = ie.extract('http://mpallante.bandcamp.com/album/nightmare-night-ep')
result = ie.extract('http://nightbringer.bandcamp.com/album/hierophany-of-the-open-grave')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'Nightmare Night EP')
self.assertTrue(len(result['entries']) >= 4)
self.assertEqual(result['title'], 'Hierophany of the Open Grave')
assertGreaterEqual(self, len(result['entries']), 9)
def test_smotri_community(self):
dl = FakeYDL()
@@ -164,7 +217,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'kommuna')
self.assertEqual(result['title'], 'КПРФ')
self.assertTrue(len(result['entries']) >= 4)
assertGreaterEqual(self, len(result['entries']), 4)
def test_smotri_user(self):
dl = FakeYDL()
@@ -173,7 +226,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'inspector')
self.assertEqual(result['title'], 'Inspector')
self.assertTrue(len(result['entries']) >= 9)
assertGreaterEqual(self, len(result['entries']), 9)
def test_AcademicEarthCourse(self):
dl = FakeYDL()
@@ -188,20 +241,20 @@ class TestPlaylists(unittest.TestCase):
def test_ivi_compilation(self):
dl = FakeYDL()
ie = IviCompilationIE(dl)
result = ie.extract('http://www.ivi.ru/watch/dezhurnyi_angel')
result = ie.extract('http://www.ivi.ru/watch/dvoe_iz_lartsa')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'dezhurnyi_angel')
self.assertEqual(result['title'], 'Дежурный ангел (2010 - 2012)')
self.assertTrue(len(result['entries']) >= 23)
self.assertEqual(result['id'], 'dvoe_iz_lartsa')
self.assertEqual(result['title'], 'Двое из ларца (2006 - 2008)')
assertGreaterEqual(self, len(result['entries']), 24)
def test_ivi_compilation_season(self):
dl = FakeYDL()
ie = IviCompilationIE(dl)
result = ie.extract('http://www.ivi.ru/watch/dezhurnyi_angel/season2')
result = ie.extract('http://www.ivi.ru/watch/dvoe_iz_lartsa/season1')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'dezhurnyi_angel/season2')
self.assertEqual(result['title'], 'Дежурный ангел (2010 - 2012) 2 сезон')
self.assertTrue(len(result['entries']) >= 7)
self.assertEqual(result['id'], 'dvoe_iz_lartsa/season1')
self.assertEqual(result['title'], 'Двое из ларца (2006 - 2008) 1 сезон')
assertGreaterEqual(self, len(result['entries']), 12)
def test_imdb_list(self):
dl = FakeYDL()
@@ -220,7 +273,7 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], 'cryptography')
self.assertEqual(result['title'], 'Journey into cryptography')
self.assertEqual(result['description'], 'How have humans protected their secret messages through history? What has changed today?')
self.assertTrue(len(result['entries']) >= 3)
assertGreaterEqual(self, len(result['entries']), 3)
def test_EveryonesMixtape(self):
dl = FakeYDL()
@@ -234,10 +287,18 @@ class TestPlaylists(unittest.TestCase):
def test_rutube_channel(self):
dl = FakeYDL()
ie = RutubeChannelIE(dl)
result = ie.extract('http://rutube.ru/tags/video/1409')
result = ie.extract('http://rutube.ru/tags/video/1800/')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '1409')
self.assertTrue(len(result['entries']) >= 34)
self.assertEqual(result['id'], '1800')
assertGreaterEqual(self, len(result['entries']), 68)
def test_rutube_person(self):
dl = FakeYDL()
ie = RutubePersonIE(dl)
result = ie.extract('http://rutube.ru/video/person/313878/')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '313878')
assertGreaterEqual(self, len(result['entries']), 37)
def test_multiple_brightcove_videos(self):
# https://github.com/rg3/youtube-dl/issues/2283
@@ -249,24 +310,6 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], 'Always/Never: A Little-Seen Movie About Nuclear Command and Control : The New Yorker')
self.assertEqual(len(result['entries']), 3)
def test_GoogleSearch(self):
dl = FakeYDL()
ie = GoogleSearchIE(dl)
result = ie.extract('gvsearch15:python language')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'python language')
self.assertEqual(result['title'], 'python language')
self.assertEqual(len(result['entries']), 15)
def test_generic_rss_feed(self):
dl = FakeYDL()
ie = GenericIE(dl)
result = ie.extract('http://phihag.de/2014/youtube-dl/rss.xml')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'http://phihag.de/2014/youtube-dl/rss.xml')
self.assertEqual(result['title'], 'Zero Punctuation')
self.assertTrue(len(result['entries']) > 10)
def test_ted_playlist(self):
dl = FakeYDL()
ie = TEDIE(dl)
@@ -274,7 +317,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '10')
self.assertEqual(result['title'], 'Who are the hackers?')
self.assertTrue(len(result['entries']) >= 6)
assertGreaterEqual(self, len(result['entries']), 6)
def test_toypics_user(self):
dl = FakeYDL()
@@ -282,7 +325,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://videos.toypics.net/Mikey')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'Mikey')
self.assertTrue(len(result['entries']) >= 17)
assertGreaterEqual(self, len(result['entries']), 17)
def test_xtube_user(self):
dl = FakeYDL()
@@ -290,7 +333,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://www.xtube.com/community/profile.php?user=greenshowers')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'greenshowers')
self.assertTrue(len(result['entries']) >= 155)
assertGreaterEqual(self, len(result['entries']), 155)
def test_InstagramUser(self):
dl = FakeYDL()
@@ -298,7 +341,7 @@ class TestPlaylists(unittest.TestCase):
result = ie.extract('http://instagram.com/porsche')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'porsche')
self.assertTrue(len(result['entries']) >= 2)
assertGreaterEqual(self, len(result['entries']), 2)
test_video = next(
e for e in result['entries']
if e['id'] == '614605558512799803_462752227')
@@ -337,7 +380,16 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], '152147')
self.assertEqual(
result['title'], 'Brace Yourself - Today\'s Weirdest News')
self.assertTrue(len(result['entries']) >= 10)
assertGreaterEqual(self, len(result['entries']), 10)
def test_TeacherTubeUser(self):
dl = FakeYDL()
ie = TeacherTubeUserIE(dl)
result = ie.extract('http://www.teachertube.com/user/profile/rbhagwati2')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'rbhagwati2')
assertGreaterEqual(self, len(result['entries']), 179)
if __name__ == '__main__':
unittest.main()

View File

@@ -87,7 +87,7 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
def test_youtube_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'sAjKT8FhjI8'
self.url = 'n5BB19UTcdA'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()

77
test/test_swfinterp.py Normal file
View File

@@ -0,0 +1,77 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import errno
import io
import json
import re
import subprocess
from youtube_dl.swfinterp import SWFInterpreter
TEST_DIR = os.path.join(
os.path.dirname(os.path.abspath(__file__)), 'swftests')
class TestSWFInterpreter(unittest.TestCase):
pass
def _make_testfunc(testfile):
m = re.match(r'^(.*)\.(as)$', testfile)
if not m:
return
test_id = m.group(1)
def test_func(self):
as_file = os.path.join(TEST_DIR, testfile)
swf_file = os.path.join(TEST_DIR, test_id + '.swf')
if ((not os.path.exists(swf_file))
or os.path.getmtime(swf_file) < os.path.getmtime(as_file)):
# Recompile
try:
subprocess.check_call(['mxmlc', '-output', swf_file, as_file])
except OSError as ose:
if ose.errno == errno.ENOENT:
print('mxmlc not found! Skipping test.')
return
raise
with open(swf_file, 'rb') as swf_f:
swf_content = swf_f.read()
swfi = SWFInterpreter(swf_content)
with io.open(as_file, 'r', encoding='utf-8') as as_f:
as_content = as_f.read()
def _find_spec(key):
m = re.search(
r'(?m)^//\s*%s:\s*(.*?)\n' % re.escape(key), as_content)
if not m:
raise ValueError('Cannot find %s in %s' % (key, testfile))
return json.loads(m.group(1))
input_args = _find_spec('input')
output = _find_spec('output')
swf_class = swfi.extract_class(test_id)
func = swfi.extract_function(swf_class, 'main')
res = func(input_args)
self.assertEqual(res, output)
test_func.__name__ = str('test_swf_' + test_id)
setattr(TestSWFInterpreter, test_func.__name__, test_func)
for testfile in os.listdir(TEST_DIR):
_make_testfunc(testfile)
if __name__ == '__main__':
unittest.main()

View File

@@ -219,6 +219,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('0h0m0s'), 0)
self.assertEqual(parse_duration('0m0s'), 0)
self.assertEqual(parse_duration('0s'), 0)
self.assertEqual(parse_duration('01:02:03.05'), 3723.05)
def test_fix_xml_ampersands(self):
self.assertEqual(
@@ -280,7 +281,7 @@ class TestUtil(unittest.TestCase):
d = json.loads(stripped)
self.assertEqual(d, [{"id": "532cb", "x": 3}])
def test_uppercase_escpae(self):
def test_uppercase_escape(self):
self.assertEqual(uppercase_escape(u''), u'')
self.assertEqual(uppercase_escape(u'\\U0001d550'), u'𝕐')

View File

@@ -112,11 +112,11 @@ class TestYoutubeLists(unittest.TestCase):
def test_youtube_mix(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('http://www.youtube.com/watch?v=lLJf9qJHR3E&list=RDrjFaenf1T-Y')
result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
entries = result['entries']
self.assertTrue(len(entries) >= 20)
original_video = entries[0]
self.assertEqual(original_video['id'], 'rjFaenf1T-Y')
self.assertEqual(original_video['id'], 'OQpdSVF_k_w')
def test_youtube_toptracks(self):
print('Skipping: The playlist page gives error 500')

View File

@@ -1,5 +1,7 @@
#!/usr/bin/env python
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
@@ -16,23 +18,65 @@ from youtube_dl.utils import compat_str, compat_urlretrieve
_TESTS = [
(
u'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
u'js',
'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
'js',
86,
u'>=<;:/.-[+*)(\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBA\\yxwvutsrqponmlkjihgfedcba987654321',
'>=<;:/.-[+*)(\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBA\\yxwvutsrqponmlkjihgfedcba987654321',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-vfldJ8xgI.js',
u'js',
'https://s.ytimg.com/yts/jsbin/html5player-vfldJ8xgI.js',
'js',
85,
u'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-vfle-mVwz.js',
u'js',
'https://s.ytimg.com/yts/jsbin/html5player-vfle-mVwz.js',
'js',
90,
u']\\[@?>=<;:/.-,+*)(\'&%$#"hZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjiagfedcb39876',
']\\[@?>=<;:/.-,+*)(\'&%$#"hZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjiagfedcb39876',
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vfl0Cbn9e.js',
'js',
84,
'O1I3456789abcde0ghijklmnopqrstuvwxyzABCDEFGHfJKLMN2PQRSTUVW@YZ!"#$%&\'()*+,-./:;<=',
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js',
'js',
'2ACFC7A61CA478CD21425E5A57EBD73DDC78E22A.2094302436B2D377D14A3BBA23022D023B8BC25AA',
'A52CB8B320D22032ABB3A41D773D2B6342034902.A22E87CDD37DBE75A5E52412DC874AC16A7CFCA2',
),
(
'http://s.ytimg.com/yts/swfbin/player-vfl5vIhK2/watch_as3.swf',
'swf',
86,
'O1I3456789abcde0ghijklmnopqrstuvwxyzABCDEFGHfJKLMN2PQRSTUVWXY\\!"#$%&\'()*+,-./:;<=>?'
),
(
'http://s.ytimg.com/yts/swfbin/player-vflmDyk47/watch_as3.swf',
'swf',
'F375F75BF2AFDAAF2666E43868D46816F83F13E81C46.3725A8218E446A0DECD33F79DC282994D6AA92C92C9',
'9C29AA6D499282CD97F33DCED0A644E8128A5273.64C18E31F38361864D86834E6662FAADFA2FB57F'
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflBb0OQx.js',
'js',
84,
'123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQ0STUVWXYZ!"#$%&\'()*+,@./:;<=>'
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vfl9FYC6l.js',
'js',
83,
'123456789abcdefghijklmnopqr0tuvwxyzABCDETGHIJKLMNOPQRS>UVWXYZ!"#$%&\'()*+,-./:;<=F'
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflCGk6yw/html5player.js',
'js',
'4646B5181C6C3020DF1D9C7FCFEA.AD80ABF70C39BD369CCCAE780AFBB98FA6B6CB42766249D9488C288',
'82C8849D94266724DC6B6AF89BBFA087EACCD963.B93C07FBA084ACAEFCF7C9D1FD0203C6C1815B6B'
)
]
@@ -44,13 +88,13 @@ class TestSignature(unittest.TestCase):
os.mkdir(self.TESTDATA_DIR)
def make_tfunc(url, stype, sig_length, expected_sig):
basename = url.rpartition('/')[2]
m = re.match(r'.*-([a-zA-Z0-9_-]+)\.[a-z]+$', basename)
assert m, '%r should follow URL format' % basename
def make_tfunc(url, stype, sig_input, expected_sig):
m = re.match(r'.*-([a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?\.[a-z]+$', url)
assert m, '%r should follow URL format' % url
test_id = m.group(1)
def test_func(self):
basename = 'player-%s.%s' % (test_id, stype)
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
@@ -66,7 +110,9 @@ def make_tfunc(url, stype, sig_length, expected_sig):
with open(fn, 'rb') as testf:
swfcode = testf.read()
func = ie._parse_sig_swf(swfcode)
src_sig = compat_str(string.printable[:sig_length])
src_sig = (
compat_str(string.printable[:sig_input])
if isinstance(sig_input, int) else sig_input)
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)

View File

@@ -1,12 +0,0 @@
# Legacy file for backwards compatibility, use youtube_dl.downloader instead!
from .downloader import FileDownloader as RealFileDownloader
from .downloader import get_suitable_downloader
# This class reproduces the old behaviour of FileDownloader
class FileDownloader(RealFileDownloader):
def _do_download(self, filename, info_dict):
real_fd = get_suitable_downloader(info_dict)(self.ydl, self.params)
for ph in self._progress_hooks:
real_fd.add_progress_hook(ph)
return real_fd.download(filename, info_dict)

View File

@@ -162,6 +162,7 @@ class YoutubeDL(object):
default_search: Prepend this string if an input url is not valid.
'auto' for elaborate guessing
encoding: Use this encoding instead of the system-specified.
extract_flat: Do not resolve URLs, return the immediate result.
The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader:
@@ -171,6 +172,7 @@ class YoutubeDL(object):
The following options are used by the post processors:
prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available,
otherwise prefer avconv.
exec_cmd: Arbitrary command to run after downloading
"""
params = None
@@ -275,7 +277,7 @@ class YoutubeDL(object):
return message
assert hasattr(self, '_output_process')
assert type(message) == type('')
assert isinstance(message, compat_str)
line_count = message.count('\n') + 1
self._output_process.stdin.write((message + '\n').encode('utf-8'))
self._output_process.stdin.flush()
@@ -303,7 +305,7 @@ class YoutubeDL(object):
def to_stderr(self, message):
"""Print message to stderr."""
assert type(message) == type('')
assert isinstance(message, compat_str)
if self.params.get('logger'):
self.params['logger'].error(message)
else:
@@ -423,7 +425,7 @@ class YoutubeDL(object):
autonumber_templ = '%0' + str(autonumber_size) + 'd'
template_dict['autonumber'] = autonumber_templ % self._num_downloads
if template_dict.get('playlist_index') is not None:
template_dict['playlist_index'] = '%05d' % template_dict['playlist_index']
template_dict['playlist_index'] = '%0*d' % (len(str(template_dict['n_entries'])), template_dict['playlist_index'])
if template_dict.get('resolution') is None:
if template_dict.get('width') and template_dict.get('height'):
template_dict['resolution'] = '%dx%d' % (template_dict['width'], template_dict['height'])
@@ -479,7 +481,10 @@ class YoutubeDL(object):
return 'Skipping %s, because it has exceeded the maximum view count (%d/%d)' % (video_title, view_count, max_views)
age_limit = self.params.get('age_limit')
if age_limit is not None:
if age_limit < info_dict.get('age_limit', 0):
actual_age_limit = info_dict.get('age_limit')
if actual_age_limit is None:
actual_age_limit = 0
if age_limit < actual_age_limit:
return 'Skipping "' + title + '" because it is age restricted'
if self.in_download_archive(info_dict):
return '%s has already been recorded in archive' % video_title
@@ -558,7 +563,12 @@ class YoutubeDL(object):
Returns the resolved ie_result.
"""
result_type = ie_result.get('_type', 'video') # If not given we suppose it's a video, support the default old system
result_type = ie_result.get('_type', 'video')
if self.params.get('extract_flat', False):
if result_type in ('url', 'url_transparent'):
return ie_result
if result_type == 'video':
self.add_extra_info(ie_result, extra_info)
return self.process_video_result(ie_result, download=download)
@@ -627,6 +637,7 @@ class YoutubeDL(object):
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video #%s of %s' % (i, n_entries))
extra = {
'n_entries': n_entries,
'playlist': playlist,
'playlist_index': i + playliststart,
'extractor': ie_result['extractor'],
@@ -717,6 +728,17 @@ class YoutubeDL(object):
info_dict['playlist'] = None
info_dict['playlist_index'] = None
thumbnails = info_dict.get('thumbnails')
if thumbnails:
thumbnails.sort(key=lambda t: (
t.get('width'), t.get('height'), t.get('url')))
for t in thumbnails:
if 'width' in t and 'height' in t:
t['resolution'] = '%dx%d' % (t['width'], t['height'])
if thumbnails and 'thumbnail' not in info_dict:
info_dict['thumbnail'] = thumbnails[-1]['url']
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
@@ -838,7 +860,7 @@ class YoutubeDL(object):
# Keep for backwards compatibility
info_dict['stitle'] = info_dict['title']
if not 'format' in info_dict:
if 'format' not in info_dict:
info_dict['format'] = info_dict['ext']
reason = self._match_entry(info_dict)
@@ -982,11 +1004,13 @@ class YoutubeDL(object):
fd = get_suitable_downloader(info)(self, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if self.params.get('verbose'):
self.to_stdout('[debug] Invoking downloader on %r' % info.get('url'))
return fd.download(name, info)
if info_dict.get('requested_formats') is not None:
downloaded = []
success = True
merger = FFmpegMergerPP(self)
merger = FFmpegMergerPP(self, not self.params.get('keepvideo'))
if not merger._get_executable():
postprocessors = []
self.report_warning('You have requested multiple '
@@ -1184,6 +1208,10 @@ class YoutubeDL(object):
if res:
res += ', '
res += format_bytes(fdict['filesize'])
elif fdict.get('filesize_approx') is not None:
if res:
res += ', '
res += '~' + format_bytes(fdict['filesize_approx'])
return res
def list_formats(self, info_dict):
@@ -1217,14 +1245,18 @@ class YoutubeDL(object):
if not self.params.get('verbose'):
return
write_string(
if type('') is not compat_str:
# Python 2.6 on SLES11 SP1 (https://github.com/rg3/youtube-dl/issues/3326)
self.report_warning(
'Your Python is broken! Update to a newer and supported version')
encoding_str = (
'[debug] Encodings: locale %s, fs %s, out %s, pref %s\n' % (
locale.getpreferredencoding(),
sys.getfilesystemencoding(),
sys.stdout.encoding,
self.get_encoding()),
encoding=None
)
self.get_encoding()))
write_string(encoding_str, encoding=None)
self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
try:

View File

@@ -53,18 +53,38 @@ __authors__ = (
'Mattias Harrysson',
'phaer',
'Sainyam Kapoor',
'Nicolas Évrard',
'Jason Normore',
'Hoje Lee',
'Adam Thalhammer',
'Georg Jähnig',
'Ralf Haring',
'Koki Takahashi',
'Ariset Llerena',
'Adam Malcontenti-Wilson',
'Tobias Bell',
'Naglis Jonaitis',
'Charles Chen',
'Hassaan Ali',
'Dobrosław Żybort',
'David Fabijan',
'Sebastian Haas',
'Alexander Kirk',
'Erik Johnson',
'Keith Beckman',
'Ole Ernst',
'Aaron McDaniel (mcd1992)',
)
__license__ = 'Public Domain'
import codecs
import io
import locale
import optparse
import os
import random
import re
import shlex
import shutil
import sys
@@ -86,7 +106,7 @@ from .utils import (
write_string,
)
from .update import update_self
from .FileDownloader import (
from .downloader import (
FileDownloader,
)
from .extractor import gen_extractors
@@ -100,6 +120,7 @@ from .postprocessor import (
FFmpegExtractAudioPP,
FFmpegEmbedSubtitlePP,
XAttrMetadataPP,
ExecAfterDownloadPP,
)
@@ -211,6 +232,7 @@ def parseOpts(overrideArguments=None):
downloader = optparse.OptionGroup(parser, 'Download Options')
postproc = optparse.OptionGroup(parser, 'Post-processing Options')
filesystem = optparse.OptionGroup(parser, 'Filesystem Options')
workarounds = optparse.OptionGroup(parser, 'Workarounds')
verbosity = optparse.OptionGroup(parser, 'Verbosity / Simulation Options')
general.add_option('-h', '--help',
@@ -227,14 +249,6 @@ def parseOpts(overrideArguments=None):
general.add_option('--dump-user-agent',
action='store_true', dest='dump_user_agent',
help='display the current browser identification', default=False)
general.add_option('--user-agent',
dest='user_agent', help='specify a custom user agent', metavar='UA')
general.add_option('--referer',
dest='referer', help='specify a custom referer, use if the video access is restricted to one domain',
metavar='REF', default=None)
general.add_option('--add-header',
dest='headers', help='specify a custom HTTP header and its value, separated by a colon \':\'. You can use this option multiple times', action="append",
metavar='FIELD:VALUE')
general.add_option('--list-extractors',
action='store_true', dest='list_extractors',
help='List all supported extractors and the URLs they would handle', default=False)
@@ -244,33 +258,17 @@ def parseOpts(overrideArguments=None):
general.add_option(
'--proxy', dest='proxy', default=None, metavar='URL',
help='Use the specified HTTP/HTTPS proxy. Pass in an empty string (--proxy "") for direct connection')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option(
'--prefer-insecure', '--prefer-unsecure', action='store_true', dest='prefer_insecure',
help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
help='Location in the filesystem where youtube-dl can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl . At the moment, only YouTube player files (for videos with obfuscated signatures) are cached, but that may change.')
general.add_option(
'--no-cache-dir', action='store_const', const=None, dest='cachedir',
help='Disable filesystem caching')
general.add_option(
'--socket-timeout', dest='socket_timeout',
type=float, default=None, help=u'Time to wait before giving up, in seconds')
general.add_option(
'--bidi-workaround', dest='bidi_workaround', action='store_true',
help=u'Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH')
general.add_option(
'--default-search',
dest='default_search', metavar='PREFIX',
help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". By default (with value "auto") youtube-dl guesses.')
help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". Use the value "auto" to let youtube-dl guess ("auto_warning" to emit a warning when guessing). "error" just throws an error. The default value "fixup_error" repairs broken URLs, but emits an error if this is not possible instead of searching.')
general.add_option(
'--ignore-config',
action='store_true',
help='Do not read configuration files. When given in the global configuration file /etc/youtube-dl.conf: do not read the user configuration in ~/.config/youtube-dl.conf (%APPDATA%/youtube-dl/config.txt on Windows)')
general.add_option(
'--encoding', dest='encoding', metavar='ENCODING',
help='Force the specified encoding (experimental)')
selection.add_option(
'--playlist-start',
@@ -322,6 +320,8 @@ def parseOpts(overrideArguments=None):
dest='username', metavar='USERNAME', help='account username')
authentication.add_option('-p', '--password',
dest='password', metavar='PASSWORD', help='account password')
authentication.add_option('-2', '--twofactor',
dest='twofactor', metavar='TWOFACTOR', help='two-factor auth code')
authentication.add_option('-n', '--netrc',
action='store_true', dest='usenetrc', help='use .netrc authentication data', default=False)
authentication.add_option('--video-password',
@@ -371,6 +371,33 @@ def parseOpts(overrideArguments=None):
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
downloader.add_option('--test', action='store_true', dest='test', default=False, help=optparse.SUPPRESS_HELP)
workarounds.add_option(
'--encoding', dest='encoding', metavar='ENCODING',
help='Force the specified encoding (experimental)')
workarounds.add_option(
'--no-check-certificate', action='store_true',
dest='no_check_certificate', default=False,
help='Suppress HTTPS certificate validation.')
workarounds.add_option(
'--prefer-insecure', '--prefer-unsecure', action='store_true', dest='prefer_insecure',
help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
workarounds.add_option(
'--user-agent', metavar='UA',
dest='user_agent', help='specify a custom user agent')
workarounds.add_option(
'--referer', metavar='REF',
dest='referer', default=None,
help='specify a custom referer, use if the video access is restricted to one domain',
)
workarounds.add_option(
'--add-header', metavar='FIELD:VALUE',
dest='headers', action='append',
help='specify a custom HTTP header and its value, separated by a colon \':\'. You can use this option multiple times',
)
workarounds.add_option(
'--bidi-workaround', dest='bidi_workaround', action='store_true',
help=u'Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH')
verbosity.add_option('-q', '--quiet',
action='store_true', dest='quiet', help='activates quiet mode', default=False)
verbosity.add_option(
@@ -428,12 +455,10 @@ def parseOpts(overrideArguments=None):
help='Display sent and read HTTP traffic')
filesystem.add_option('-t', '--title',
action='store_true', dest='usetitle', help='use title in file name (default)', default=False)
filesystem.add_option('-a', '--batch-file',
dest='batchfile', metavar='FILE', help='file containing URLs to download (\'-\' for stdin)')
filesystem.add_option('--id',
action='store_true', dest='useid', help='use only video ID in file name', default=False)
filesystem.add_option('-l', '--literal',
action='store_true', dest='usetitle', help='[deprecated] alias of --title', default=False)
filesystem.add_option('-A', '--auto-number',
action='store_true', dest='autonumber',
help='number downloaded files starting from 00000', default=False)
@@ -459,11 +484,10 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('--restrict-filenames',
action='store_true', dest='restrictfilenames',
help='Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames', default=False)
filesystem.add_option('-a', '--batch-file',
dest='batchfile', metavar='FILE', help='file containing URLs to download (\'-\' for stdin)')
filesystem.add_option('--load-info',
dest='load_info_filename', metavar='FILE',
help='json file containing the video information (created with the "--write-json" option)')
filesystem.add_option('-t', '--title',
action='store_true', dest='usetitle', help='[deprecated] use title in file name (default)', default=False)
filesystem.add_option('-l', '--literal',
action='store_true', dest='usetitle', help='[deprecated] alias of --title', default=False)
filesystem.add_option('-w', '--no-overwrites',
action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
filesystem.add_option('-c', '--continue',
@@ -471,8 +495,6 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('--no-continue',
action='store_false', dest='continue_dl',
help='do not resume partially downloaded files (restart from beginning)')
filesystem.add_option('--cookies',
dest='cookiefile', metavar='FILE', help='file to read cookies from and dump cookie jar in')
filesystem.add_option('--no-part',
action='store_true', dest='nopart', help='do not use .part files', default=False)
filesystem.add_option('--no-mtime',
@@ -490,6 +512,20 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('--write-thumbnail',
action='store_true', dest='writethumbnail',
help='write thumbnail image to disk', default=False)
filesystem.add_option('--load-info',
dest='load_info_filename', metavar='FILE',
help='json file containing the video information (created with the "--write-json" option)')
filesystem.add_option('--cookies',
dest='cookiefile', metavar='FILE', help='file to read cookies from and dump cookie jar in')
filesystem.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
help='Location in the filesystem where youtube-dl can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl . At the moment, only YouTube player files (for videos with obfuscated signatures) are cached, but that may change.')
filesystem.add_option(
'--no-cache-dir', action='store_const', const=None, dest='cachedir',
help='Disable filesystem caching')
filesystem.add_option(
'--rm-cache-dir', action='store_true', dest='rm_cachedir',
help='Delete all filesystem cache files')
postproc.add_option('-x', '--extract-audio', action='store_true', dest='extractaudio', default=False,
@@ -499,7 +535,7 @@ def parseOpts(overrideArguments=None):
postproc.add_option('--audio-quality', metavar='QUALITY', dest='audioquality', default='5',
help='ffmpeg/avconv audio quality specification, insert a value between 0 (better) and 9 (worse) for VBR or a specific bitrate like 128K (default 5)')
postproc.add_option('--recode-video', metavar='FORMAT', dest='recodevideo', default=None,
help='Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm)')
help='Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv)')
postproc.add_option('-k', '--keep-video', action='store_true', dest='keepvideo', default=False,
help='keeps the video file on disk after the post-processing; the video is erased by default')
postproc.add_option('--no-post-overwrites', action='store_true', dest='nopostoverwrites', default=False,
@@ -516,13 +552,16 @@ def parseOpts(overrideArguments=None):
help='Prefer avconv over ffmpeg for running the postprocessors (default)')
postproc.add_option('--prefer-ffmpeg', action='store_true', dest='prefer_ffmpeg',
help='Prefer ffmpeg over avconv for running the postprocessors')
postproc.add_option(
'--exec', metavar='CMD', dest='exec_cmd',
help='Execute a command on the file after downloading, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'' )
parser.add_option_group(general)
parser.add_option_group(selection)
parser.add_option_group(downloader)
parser.add_option_group(filesystem)
parser.add_option_group(verbosity)
parser.add_option_group(workarounds)
parser.add_option_group(video_format)
parser.add_option_group(subtitles)
parser.add_option_group(authentication)
@@ -622,7 +661,7 @@ def _real_main(argv=None):
if desc is False:
continue
if hasattr(ie, 'SEARCH_KEY'):
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise')
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise', u'sleeping bunny')
_COUNTS = (u'', u'5', u'10', u'all')
desc += u' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
compat_print(desc)
@@ -677,13 +716,13 @@ def _real_main(argv=None):
if not opts.audioquality.isdigit():
parser.error(u'invalid audio quality specified')
if opts.recodevideo is not None:
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg']:
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv']:
parser.error(u'invalid video recode format specified')
if opts.date is not None:
date = DateRange.day(opts.date)
else:
date = DateRange(opts.dateafter, opts.datebefore)
if opts.default_search not in ('auto', 'auto_warning', None) and ':' not in opts.default_search:
if opts.default_search not in ('auto', 'auto_warning', 'error', 'fixup_error', None) and ':' not in opts.default_search:
parser.error(u'--default-search invalid; did you forget a colon (:) at the end?')
# Do not download videos when there are audio-only formats
@@ -719,6 +758,7 @@ def _real_main(argv=None):
'usenetrc': opts.usenetrc,
'username': opts.username,
'password': opts.password,
'twofactor': opts.twofactor,
'videopassword': opts.videopassword,
'quiet': (opts.quiet or any_printing),
'no_warnings': opts.no_warnings,
@@ -795,6 +835,7 @@ def _real_main(argv=None):
'default_search': opts.default_search,
'youtube_include_dash_manifest': opts.youtube_include_dash_manifest,
'encoding': opts.encoding,
'exec_cmd': opts.exec_cmd,
}
with YoutubeDL(ydl_opts) as ydl:
@@ -818,13 +859,37 @@ def _real_main(argv=None):
ydl.add_post_processor(FFmpegAudioFixPP())
ydl.add_post_processor(AtomicParsleyPP())
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd:
ydl.add_post_processor(ExecAfterDownloadPP(
verboseOutput=opts.verbose, exec_cmd=opts.exec_cmd))
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose)
# Remove cache dir
if opts.rm_cachedir:
if opts.cachedir is None:
ydl.to_screen(u'No cache dir specified (Did you combine --no-cache-dir and --rm-cache-dir?)')
else:
if ('.cache' not in opts.cachedir) or ('youtube-dl' not in opts.cachedir):
ydl.to_screen(u'Not removing directory %s - this does not look like a cache dir')
retcode = 141
else:
ydl.to_screen(
u'Removing cache dir %s .' % opts.cachedir,
skip_eol=True)
if os.path.exists(opts.cachedir):
ydl.to_screen(u'.', skip_eol=True)
shutil.rmtree(opts.cachedir)
ydl.to_screen(u'.')
# Maybe do nothing
if (len(all_urls) < 1) and (opts.load_info_filename is None):
if not opts.update_self:
if not (opts.update_self or opts.rm_cachedir):
parser.error(u'you must provide at least one URL')
else:
sys.exit()

View File

@@ -292,7 +292,7 @@ class FileDownloader(object):
def real_download(self, filename, info_dict):
"""Real download process. Redefine in subclasses."""
raise NotImplementedError(u'This method must be implemented by sublcasses')
raise NotImplementedError(u'This method must be implemented by subclasses')
def _hook_progress(self, status):
for ph in self._progress_hooks:

View File

@@ -220,6 +220,7 @@ class F4mFD(FileDownloader):
def real_download(self, filename, info_dict):
man_url = info_dict['url']
requested_bitrate = info_dict.get('tbr')
self.to_screen('[download] Downloading f4m manifest')
manifest = self.ydl.urlopen(man_url).read()
self.report_destination(filename)
@@ -233,8 +234,14 @@ class F4mFD(FileDownloader):
doc = etree.fromstring(manifest)
formats = [(int(f.attrib.get('bitrate', -1)), f) for f in doc.findall(_add_ns('media'))]
formats = sorted(formats, key=lambda f: f[0])
rate, media = formats[-1]
if requested_bitrate is None:
# get the best format
formats = sorted(formats, key=lambda f: f[0])
rate, media = formats[-1]
else:
rate, media = list(filter(
lambda f: int(f[0]) == requested_bitrate, formats))[0]
base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
bootstrap = base64.b64decode(doc.find(_add_ns('bootstrapInfo')).text)
metadata = base64.b64decode(media.find(_add_ns('metadata')).text)

View File

@@ -25,7 +25,7 @@ class HlsFD(FileDownloader):
except (OSError, IOError):
pass
else:
self.report_error(u'm3u8 download detected but ffmpeg or avconv could not be found')
self.report_error(u'm3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
cmd = [program] + args
retval = subprocess.call(cmd)

View File

@@ -27,8 +27,16 @@ class HttpFD(FileDownloader):
headers['Youtubedl-user-agent'] = info_dict['user_agent']
if 'http_referer' in info_dict:
headers['Referer'] = info_dict['http_referer']
basic_request = compat_urllib_request.Request(url, None, headers)
request = compat_urllib_request.Request(url, None, headers)
add_headers = info_dict.get('http_headers')
if add_headers:
headers.update(add_headers)
data = info_dict.get('http_post_data')
http_method = info_dict.get('http_method')
basic_request = compat_urllib_request.Request(url, data, headers)
request = compat_urllib_request.Request(url, data, headers)
if http_method is not None:
basic_request.get_method = lambda: http_method
request.get_method = lambda: http_method
is_test = self.params.get('test', False)
@@ -110,7 +118,7 @@ class HttpFD(FileDownloader):
# However, for a test we still would like to download just a piece of a file.
# To achieve this we limit data_len to _TEST_FILE_SIZE and manually control
# block size when downloading a file.
if is_test and data_len > self._TEST_FILE_SIZE:
if is_test and (data_len is None or int(data_len) > self._TEST_FILE_SIZE):
data_len = self._TEST_FILE_SIZE
if data_len is not None:

View File

@@ -10,6 +10,7 @@ from .common import FileDownloader
from ..utils import (
encodeFilename,
format_bytes,
compat_str,
)
@@ -95,6 +96,7 @@ class RtmpFD(FileDownloader):
flash_version = info_dict.get('flash_version', None)
live = info_dict.get('rtmp_live', False)
conn = info_dict.get('rtmp_conn', None)
protocol = info_dict.get('rtmp_protocol', None)
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
@@ -104,7 +106,7 @@ class RtmpFD(FileDownloader):
try:
subprocess.call(['rtmpdump', '-h'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
except (OSError, IOError):
self.report_error('RTMP download detected but "rtmpdump" could not be run')
self.report_error('RTMP download detected but "rtmpdump" could not be run. Please install it.')
return False
# Download using rtmpdump. rtmpdump returns exit code 2 when
@@ -127,8 +129,13 @@ class RtmpFD(FileDownloader):
basic_args += ['--flashVer', flash_version]
if live:
basic_args += ['--live']
if conn:
if isinstance(conn, list):
for entry in conn:
basic_args += ['--conn', entry]
elif isinstance(conn, compat_str):
basic_args += ['--conn', conn]
if protocol is not None:
basic_args += ['--protocol', protocol]
args = basic_args + [[], ['--resume', '--skip', '1']][not live and self.params.get('continuedl', False)]
if sys.platform == 'win32' and sys.version_info < (3, 0):

View File

@@ -1,12 +1,15 @@
from .abc import ABCIE
from .academicearth import AcademicEarthCourseIE
from .addanime import AddAnimeIE
from .adultswim import AdultSwimIE
from .aftonbladet import AftonbladetIE
from .anitube import AnitubeIE
from .aol import AolIE
from .allocine import AllocineIE
from .aparat import AparatIE
from .appletrailers import AppleTrailersIE
from .archiveorg import ArchiveOrgIE
from .ard import ARDIE
from .ard import ARDIE, ARDMediathekIE
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
@@ -51,6 +54,7 @@ from .cnn import (
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .criterion import CriterionIE
from .crunchyroll import CrunchyrollIE
from .cspan import CSpanIE
@@ -61,8 +65,11 @@ from .dailymotion import (
DailymotionUserIE,
)
from .daum import DaumIE
from .dfb import DFBIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .drtv import DRTVIE
from .dump import DumpIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .divxstage import DivxStageIE
@@ -71,14 +78,22 @@ from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .ellentv import (
EllenTVIE,
EllenTVClipsIE,
)
from .elpais import ElPaisIE
from .empflix import EmpflixIE
from .engadget import EngadgetIE
from .escapist import EscapistIE
from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .fc2 import FC2IE
from .firedrive import FiredriveIE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
@@ -101,18 +116,30 @@ from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .funnyordie import FunnyOrDieIE
from .gamekings import GamekingsIE
from .gameone import (
GameOneIE,
GameOnePlaylistIE,
)
from .gamespot import GameSpotIE
from .gamestar import GameStarIE
from .gametrailers import GametrailersIE
from .gdcvault import GDCVaultIE
from .generic import GenericIE
from .godtube import GodTubeIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .gorillavid import GorillaVidIE
from .goshgay import GoshgayIE
from .grooveshark import GroovesharkIE
from .hark import HarkIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .hotnewhiphop import HotNewHipHopIE
from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .iconosquare import IconosquareIE
from .ign import IGNIE, OneUPIE
from .imdb import (
ImdbIE,
@@ -127,8 +154,10 @@ from .ivi import (
IviIE,
IviCompilationIE
)
from .izlesene import IzleseneIE
from .jadorecettepub import JadoreCettePubIE
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .jukebox import JukeboxIE
from .justintv import JustinTVIE
from .jpopsukitv import JpopsukiIE
@@ -138,10 +167,16 @@ from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
from .ku6 import Ku6IE
from .la7 import LA7IE
from .lifenews import LifeNewsIE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE, LivestreamOriginalIE
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
LivestreamShortenerIE,
)
from .lynda import (
LyndaIE,
LyndaCourseIE
@@ -153,20 +188,28 @@ from .malemotion import MalemotionIE
from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .ministrygrid import MinistryGridIE
from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import MixcloudIE
from .mlb import MLBIE
from .mpora import MporaIE
from .mofosex import MofosexIE
from .mojvideo import MojvideoIE
from .mooshare import MooshareIE
from .morningstar import MorningstarIE
from .motherless import MotherlessIE
from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE
from .movshare import MovShareIE
from .mtv import (
MTVIE,
MTVServicesEmbeddedIE,
MTVIggyIE,
)
from .musicplayon import MusicPlayOnIE
from .musicvault import MusicVaultIE
from .muzu import MuzuTVIE
from .myspace import MySpaceIE
from .myspass import MySpassIE
@@ -180,6 +223,7 @@ from .nbc import (
from .ndr import NDRIE
from .ndtv import NDTVIE
from .newgrounds import NewgroundsIE
from .newstube import NewstubeIE
from .nfb import NFBIE
from .nhl import NHLIE, NHLVideocenterIE
from .niconico import NiconicoIE
@@ -189,14 +233,25 @@ from .normalboots import NormalbootsIE
from .novamov import NovaMovIE
from .nowness import NownessIE
from .nowvideo import NowVideoIE
from .nrk import NRKIE
from .npo import NPOIE
from .nrk import (
NRKIE,
NRKTVIE,
)
from .ntv import NTVIE
from .oe1 import OE1IE
from .nytimes import NYTimesIE
from .nuvid import NuvidIE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .orf import (
ORFTVthekIE,
ORFOE1IE,
ORFFM4IE,
)
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .photobucket import PhotobucketIE
from .playfm import PlayFMIE
from .playvid import PlayvidIE
from .podomatic import PodomaticIE
from .pornhd import PornHdIE
@@ -205,16 +260,20 @@ from .pornotube import PornotubeIE
from .prosiebensat1 import ProSiebenSat1IE
from .pyvideo import PyvideoIE
from .radiofrance import RadioFranceIE
from .rai import RaiIE
from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .reverbnation import ReverbNationIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
from .rtlnl import RtlXlIE
from .rtlnow import RTLnowIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE
from .rtve import RTVEALaCartaIE, RTVELiveIE
from .ruhd import RUHDIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
@@ -222,37 +281,59 @@ from .rutube import (
RutubePersonIE,
)
from .rutv import RUTVIE
from .sapo import SapoIE
from .savefrom import SaveFromIE
from .sbs import SBSIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .servingsys import ServingSysIE
from .shared import SharedIE
from .sina import SinaIE
from .slideshare import SlideshareIE
from .slutload import SlutloadIE
from .smotri import (
SmotriIE,
SmotriCommunityIE,
SmotriUserIE,
SmotriBroadcastIE,
)
from .snotr import SnotrIE
from .sockshare import SockshareIE
from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .southparkstudios import (
SouthParkStudiosIE,
from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudPlaylistIE
)
from .soundgasm import SoundgasmIE
from .southpark import (
SouthParkIE,
SouthparkDeIE,
)
from .space import SpaceIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE
from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE
from .sportdeutschland import SportDeutschlandIE
from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE
from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE
from .teachertube import (
TeacherTubeIE,
TeacherTubeUserIE,
)
from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tenplay import TenPlayIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theplatform import ThePlatformIE
@@ -270,6 +351,8 @@ from .tumblr import TumblrIE
from .tutv import TutvIE
from .tvigle import TvigleIE
from .tvp import TvpIE
from .tvplay import TVPlayIE
from .ubu import UbuIE
from .udemy import (
UdemyIE,
UdemyCourseIE
@@ -282,13 +365,16 @@ from .veehd import VeeHDIE
from .veoh import VeohIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vh1 import VH1IE
from .viddler import ViddlerIE
from .videobam import VideoBamIE
from .videodetective import VideoDetectiveIE
from .videolecturesnet import VideoLecturesNetIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
from .videoweed import VideoWeedIE
from .vidme import VidmeIE
from .vimeo import (
VimeoIE,
VimeoChannelIE,
@@ -298,22 +384,32 @@ from .vimeo import (
VimeoReviewIE,
VimeoWatchLaterIE,
)
from .vine import VineIE
from .vimple import VimpleIE
from .vine import (
VineIE,
VineUserIE,
)
from .viki import VikiIE
from .vk import VKIE
from .vodlocker import VodlockerIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vulture import VultureIE
from .washingtonpost import WashingtonPostIE
from .wat import WatIE
from .wayofthemaster import WayOfTheMasterIE
from .wdr import (
WDRIE,
WDRMobileIE,
WDRMausIE,
)
from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wrzuta import WrzutaIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
from .xhamster import XHamsterIE
from .xnxx import XNXXIE
from .xvideos import XVideosIE
@@ -343,6 +439,7 @@ from .youtube import (
YoutubeUserIE,
YoutubeWatchLaterIE,
)
from .zdf import ZDFIE

View File

@@ -0,0 +1,48 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au'
_VALID_URL = r'http://www\.abc\.net\.au/news/[^/]+/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.abc.net.au/news/2014-07-25/bringing-asylum-seekers-to-australia-would-give/5624716',
'md5': 'dad6f8ad011a70d9ddf887ce6d5d0742',
'info_dict': {
'id': '5624716',
'ext': 'mp4',
'title': 'Bringing asylum seekers to Australia would give them right to asylum claims: professor',
'description': 'md5:ba36fa5e27e5c9251fd929d339aea4af',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
urls_info_json = self._search_regex(
r'inlineVideoData\.push\((.*?)\);', webpage, 'video urls',
flags=re.DOTALL)
urls_info = json.loads(urls_info_json.replace('\'', '"'))
formats = [{
'url': url_info['url'],
'width': int(url_info['width']),
'height': int(url_info['height']),
'tbr': int(url_info['bitrate']),
'filesize': int(url_info['filesize']),
} for url_info in urls_info]
self._sort_formats(formats)
return {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@@ -0,0 +1,139 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class AdultSwimIE(InfoExtractor):
_VALID_URL = r'https?://video\.adultswim\.com/(?P<path>.+?)(?:\.html)?(?:\?.*)?(?:#.*)?$'
_TEST = {
'url': 'http://video.adultswim.com/rick-and-morty/close-rick-counters-of-the-rick-kind.html?x=y#title',
'playlist': [
{
'md5': '4da359ec73b58df4575cd01a610ba5dc',
'info_dict': {
'id': '8a250ba1450996e901453d7f02ca02f5',
'ext': 'flv',
'title': 'Rick and Morty Close Rick-Counters of the Rick Kind part 1',
'description': 'Rick has a run in with some old associates, resulting in a fallout with Morty. You got any chips, broh?',
'uploader': 'Rick and Morty',
'thumbnail': 'http://i.cdn.turner.com/asfix/repository/8a250ba13f865824013fc9db8b6b0400/thumbnail_267549017116827057.jpg'
}
},
{
'md5': 'ffbdf55af9331c509d95350bd0cc1819',
'info_dict': {
'id': '8a250ba1450996e901453d7f4bd102f6',
'ext': 'flv',
'title': 'Rick and Morty Close Rick-Counters of the Rick Kind part 2',
'description': 'Rick has a run in with some old associates, resulting in a fallout with Morty. You got any chips, broh?',
'uploader': 'Rick and Morty',
'thumbnail': 'http://i.cdn.turner.com/asfix/repository/8a250ba13f865824013fc9db8b6b0400/thumbnail_267549017116827057.jpg'
}
},
{
'md5': 'b92409635540304280b4b6c36bd14a0a',
'info_dict': {
'id': '8a250ba1450996e901453d7fa73c02f7',
'ext': 'flv',
'title': 'Rick and Morty Close Rick-Counters of the Rick Kind part 3',
'description': 'Rick has a run in with some old associates, resulting in a fallout with Morty. You got any chips, broh?',
'uploader': 'Rick and Morty',
'thumbnail': 'http://i.cdn.turner.com/asfix/repository/8a250ba13f865824013fc9db8b6b0400/thumbnail_267549017116827057.jpg'
}
},
{
'md5': 'e8818891d60e47b29cd89d7b0278156d',
'info_dict': {
'id': '8a250ba1450996e901453d7fc8ba02f8',
'ext': 'flv',
'title': 'Rick and Morty Close Rick-Counters of the Rick Kind part 4',
'description': 'Rick has a run in with some old associates, resulting in a fallout with Morty. You got any chips, broh?',
'uploader': 'Rick and Morty',
'thumbnail': 'http://i.cdn.turner.com/asfix/repository/8a250ba13f865824013fc9db8b6b0400/thumbnail_267549017116827057.jpg'
}
}
]
}
_video_extensions = {
'3500': 'flv',
'640': 'mp4',
'150': 'mp4',
'ipad': 'm3u8',
'iphone': 'm3u8'
}
_video_dimensions = {
'3500': (1280, 720),
'640': (480, 270),
'150': (320, 180)
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_path = mobj.group('path')
webpage = self._download_webpage(url, video_path)
episode_id = self._html_search_regex(r'<link rel="video_src" href="http://i\.adultswim\.com/adultswim/adultswimtv/tools/swf/viralplayer.swf\?id=([0-9a-f]+?)"\s*/?\s*>', webpage, 'episode_id')
title = self._og_search_title(webpage)
index_url = 'http://asfix.adultswim.com/asfix-svc/episodeSearch/getEpisodesByIDs?networkName=AS&ids=%s' % episode_id
idoc = self._download_xml(index_url, title, 'Downloading episode index', 'Unable to download episode index')
episode_el = idoc.find('.//episode')
show_title = episode_el.attrib.get('collectionTitle')
episode_title = episode_el.attrib.get('title')
thumbnail = episode_el.attrib.get('thumbnailUrl')
description = episode_el.find('./description').text.strip()
entries = []
segment_els = episode_el.findall('./segments/segment')
for part_num, segment_el in enumerate(segment_els):
segment_id = segment_el.attrib.get('id')
segment_title = '%s %s part %d' % (show_title, episode_title, part_num + 1)
thumbnail = segment_el.attrib.get('thumbnailUrl')
duration = segment_el.attrib.get('duration')
segment_url = 'http://asfix.adultswim.com/asfix-svc/episodeservices/getCvpPlaylist?networkName=AS&id=%s' % segment_id
idoc = self._download_xml(segment_url, segment_title, 'Downloading segment information', 'Unable to download segment information')
formats = []
file_els = idoc.findall('.//files/file')
for file_el in file_els:
bitrate = file_el.attrib.get('bitrate')
type = file_el.attrib.get('type')
width, height = self._video_dimensions.get(bitrate, (None, None))
formats.append({
'format_id': '%s-%s' % (bitrate, type),
'url': file_el.text,
'ext': self._video_extensions.get(bitrate, 'mp4'),
# The bitrate may not be a number (for example: 'iphone')
'tbr': int(bitrate) if bitrate.isdigit() else None,
'height': height,
'width': width
})
self._sort_formats(formats)
entries.append({
'id': segment_id,
'title': segment_title,
'formats': formats,
'uploader': show_title,
'thumbnail': thumbnail,
'duration': duration,
'description': description
})
return {
'_type': 'playlist',
'id': episode_id,
'display_id': video_path,
'entries': entries,
'title': '%s %s' % (show_title, episode_title),
'description': description,
'thumbnail': thumbnail
}

View File

@@ -1,7 +1,6 @@
# encoding: utf-8
from __future__ import unicode_literals
import datetime
import re
from .common import InfoExtractor
@@ -16,6 +15,7 @@ class AftonbladetIE(InfoExtractor):
'ext': 'mp4',
'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
'description': 'Jupiters måne mest aktiv av alla himlakroppar',
'timestamp': 1394142732,
'upload_date': '20140306',
},
}
@@ -27,17 +27,17 @@ class AftonbladetIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
# find internal video meta data
META_URL = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
meta_url = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
internal_meta_id = self._html_search_regex(
r'data-aptomaId="([\w\d]+)"', webpage, 'internal_meta_id')
internal_meta_url = META_URL % internal_meta_id
internal_meta_url = meta_url % internal_meta_id
internal_meta_json = self._download_json(
internal_meta_url, video_id, 'Downloading video meta data')
# find internal video formats
FORMATS_URL = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
internal_video_id = internal_meta_json['videoId']
internal_formats_url = FORMATS_URL % internal_video_id
internal_formats_url = format_url % internal_video_id
internal_formats_json = self._download_json(
internal_formats_url, video_id, 'Downloading video formats')
@@ -54,16 +54,13 @@ class AftonbladetIE(InfoExtractor):
})
self._sort_formats(formats)
timestamp = datetime.datetime.fromtimestamp(internal_meta_json['timePublished'])
upload_date = timestamp.strftime('%Y%m%d')
return {
'id': video_id,
'title': internal_meta_json['title'],
'formats': formats,
'thumbnail': internal_meta_json['imageUrl'],
'description': internal_meta_json['shortPreamble'],
'upload_date': upload_date,
'timestamp': internal_meta_json['timePublished'],
'duration': internal_meta_json['duration'],
'view_count': internal_meta_json['views'],
}

View File

@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_str,
qualities,
determine_ext,
)
class AllocineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=)(?P<id>[0-9]+)(?:\.html)?'
_TESTS = [{
'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
'md5': '0c9fcf59a841f65635fa300ac43d8269',
'info_dict': {
'id': '19546517',
'ext': 'mp4',
'title': 'Astérix - Le Domaine des Dieux Teaser VF',
'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/video/player_gen_cmedia=19540403&cfilm=222257.html',
'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0',
'info_dict': {
'id': '19540403',
'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF',
'description': 'md5:eeaffe7c2d634525e21159b93acf3b1e',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/film/fichefilm_gen_cfilm=181290.html',
'md5': '101250fb127ef9ca3d73186ff22a47ce',
'info_dict': {
'id': '19544709',
'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:71742e3a74b0d692c7fce0dd2017a4ac',
'thumbnail': 're:http://.*\.jpg',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
typ = mobj.group('typ')
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
if typ == 'film':
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player')
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xml.find('.//AcVisionVideo').attrib
quality = qualities(['ld', 'md', 'hd'])
formats = []
for k, v in video.items():
if re.match(r'.+_path', k):
format_id = k.split('_')[0]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': v,
'ext': determine_ext(v),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video['videoTitle'],
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'description': self._og_search_description(webpage),
}

View File

@@ -1,22 +1,24 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class AnitubeIE(InfoExtractor):
IE_NAME = u'anitube.se'
IE_NAME = 'anitube.se'
_VALID_URL = r'https?://(?:www\.)?anitube\.se/video/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.anitube.se/video/36621',
u'md5': u'59d0eeae28ea0bc8c05e7af429998d43',
u'file': u'36621.mp4',
u'info_dict': {
u'id': u'36621',
u'ext': u'mp4',
u'title': u'Recorder to Randoseru 01',
'url': 'http://www.anitube.se/video/36621',
'md5': '59d0eeae28ea0bc8c05e7af429998d43',
'info_dict': {
'id': '36621',
'ext': 'mp4',
'title': 'Recorder to Randoseru 01',
'duration': 180.19,
},
u'skip': u'Blocked in the US',
'skip': 'Blocked in the US',
}
def _real_extract(self, url):
@@ -24,13 +26,15 @@ class AnitubeIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
key = self._html_search_regex(r'http://www\.anitube\.se/embed/([A-Za-z0-9_-]*)',
webpage, u'key')
key = self._html_search_regex(
r'http://www\.anitube\.se/embed/([A-Za-z0-9_-]*)', webpage, 'key')
config_xml = self._download_xml('http://www.anitube.se/nuevo/econfig.php?key=%s' % key,
key)
config_xml = self._download_xml(
'http://www.anitube.se/nuevo/econfig.php?key=%s' % key, key)
video_title = config_xml.find('title').text
thumbnail = config_xml.find('image').text
duration = float(config_xml.find('duration').text)
formats = []
video_url = config_xml.find('file')
@@ -49,5 +53,7 @@ class AnitubeIE(InfoExtractor):
return {
'id': video_id,
'title': video_title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats
}

View File

@@ -1,5 +1,7 @@
#coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -13,13 +15,14 @@ class AparatIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
_TEST = {
u'url': u'http://www.aparat.com/v/wP8On',
u'file': u'wP8On.mp4',
u'md5': u'6714e0af7e0d875c5a39c4dc4ab46ad1',
u'info_dict': {
u"title": u"تیم گلکسی 11 - زومیت",
'url': 'http://www.aparat.com/v/wP8On',
'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1',
'info_dict': {
'id': 'wP8On',
'ext': 'mp4',
'title': 'تیم گلکسی 11 - زومیت',
},
#u'skip': u'Extremely unreliable',
# 'skip': 'Extremely unreliable',
}
def _real_extract(self, url):
@@ -29,8 +32,8 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
embed_url = (u'http://www.aparat.com/video/video/embed/videohash/' +
video_id + u'/vt/frame')
embed_url = ('http://www.aparat.com/video/video/embed/videohash/' +
video_id + '/vt/frame')
webpage = self._download_webpage(embed_url, video_id)
video_urls = re.findall(r'fileList\[[0-9]+\]\s*=\s*"([^"]+)"', webpage)

View File

@@ -6,6 +6,7 @@ import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
int_or_none,
)
@@ -110,8 +111,8 @@ class AppleTrailersIE(InfoExtractor):
formats.append({
'url': format_url,
'format': format['type'],
'width': format['width'],
'height': int(format['height']),
'width': int_or_none(format['width']),
'height': int_or_none(format['height']),
})
self._sort_formats(formats)

View File

@@ -7,23 +7,38 @@ from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
qualities,
compat_urllib_parse_urlparse,
compat_urllib_parse,
int_or_none,
parse_duration,
unified_strdate,
)
class ARDIE(InfoExtractor):
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[^/\?]+)(?:\?.*)?'
class ARDMediathekIE(InfoExtractor):
IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TEST = {
'url': 'http://www.ardmediathek.de/das-erste/guenther-jauch/edward-snowden-im-interview-held-oder-verraeter?documentId=19288786',
'file': '19288786.mp4',
'md5': '515bf47ce209fb3f5a61b7aad364634c',
_TESTS = [{
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'file': '22429276.mp4',
'md5': '469751912f1de0816a9fc9df8336476c',
'info_dict': {
'title': 'Edward Snowden im Interview - Held oder Verräter?',
'description': 'Edward Snowden hat alles aufs Spiel gesetzt, um die weltweite \xdcberwachung durch die Geheimdienste zu enttarnen. Nun stellt sich der ehemalige NSA-Mitarbeiter erstmals weltweit in einem TV-Interview den Fragen eines NDR-Journalisten. Die Sendung vom Sonntagabend.',
'thumbnail': 'http://www.ardmediathek.de/ard/servlet/contentblob/19/28/87/90/19288790/bild/2250037',
'title': 'Vertrauen ist gut, Spionieren ist besser - Geht so deutsch-amerikanische Freundschaft?',
'description': 'Das Erste Mediathek [ARD]: Vertrauen ist gut, Spionieren ist besser - Geht so deutsch-amerikanische Freundschaft?, Anne Will, Über die Spionage-Affäre diskutieren Clemens Binninger, Katrin Göring-Eckardt, Georg Mascolo, Andrew B. Denison und Constanze Kurz.. Das Video zur Sendung Anne Will am Mittwoch, 16.07.2014',
},
'skip': 'Blocked outside of Germany',
}
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Das-Wunder-von-Wolbeck-Video-tgl-ab-20/Das-Erste/Video?documentId=22490580&bcastId=602916',
'info_dict': {
'id': '22490580',
'ext': 'mp4',
'title': 'Das Wunder von Wolbeck (Video tgl. ab 20 Uhr)',
'description': 'Auf einem restaurierten Hof bei Wolbeck wird der Heilpraktiker Raffael Lembeck eines morgens von seiner Frau Stella tot aufgefunden. Das Opfer war offensichtlich in seiner Praxis zu Fall gekommen und ist dann verblutet, erklärt Prof. Boerne am Tatort.',
},
'skip': 'Blocked outside of Germany',
}]
def _real_extract(self, url):
# determine video id from url
@@ -35,42 +50,78 @@ class ARDIE(InfoExtractor):
else:
video_id = m.group('video_id')
urlp = compat_urllib_parse_urlparse(url)
url = urlp._replace(path=compat_urllib_parse.quote(urlp.path.encode('utf-8'))).geturl()
webpage = self._download_webpage(url, video_id)
if '>Der gewünschte Beitrag ist nicht mehr verfügbar.<' in webpage:
raise ExtractorError('Video %s is no longer available' % video_id, expected=True)
title = self._html_search_regex(
r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>', webpage, 'title')
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms.title" content="(.*?)"/>',
r'<h4 class="headline">(.*?)</h4>'],
webpage, 'title')
description = self._html_search_meta(
'dcterms.abstract', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
'dcterms.abstract', webpage, 'description', default=None)
if description is None:
description = self._html_search_meta(
'description', webpage, 'meta description')
streams = [
mo.groupdict()
for mo in re.finditer(
r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), "(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)', webpage)]
if not streams:
if '"fsk"' in webpage:
raise ExtractorError('This video is only available after 20:00')
# Thumbnail is sometimes not present.
# It is in the mobile version, but that seems to use a different URL
# structure altogether.
thumbnail = self._og_search_thumbnail(webpage, default=None)
formats = []
for s in streams:
format = {
'quality': int(s['quality']),
}
if s.get('rtmp_url'):
format['protocol'] = 'rtmp'
format['url'] = s['rtmp_url']
format['playpath'] = s['video_url']
else:
format['url'] = s['video_url']
media_streams = re.findall(r'''(?x)
mediaCollection\.addMediaStream\([0-9]+,\s*[0-9]+,\s*"[^"]*",\s*
"([^"]+)"''', webpage)
quality_name = self._search_regex(
r'[,.]([a-zA-Z0-9_-]+),?\.mp4', format['url'],
'quality name', default='NA')
format['format_id'] = '%s-%s-%s-%s' % (
determine_ext(format['url']), quality_name, s['media_type'],
s['quality'])
if media_streams:
QUALITIES = qualities(['lo', 'hi', 'hq'])
formats = []
for furl in set(media_streams):
if furl.endswith('.f4m'):
fid = 'f4m'
else:
fid_m = re.match(r'.*\.([^.]+)\.[^.]+$', furl)
fid = fid_m.group(1) if fid_m else None
formats.append({
'quality': QUALITIES(fid),
'format_id': fid,
'url': furl,
})
else: # request JSON file
media_info = self._download_json(
'http://www.ardmediathek.de/play/media/%s' % video_id, video_id)
# The second element of the _mediaArray contains the standard http urls
streams = media_info['_mediaArray'][1]['_mediaStreamArray']
if not streams:
if '"fsk"' in webpage:
raise ExtractorError('This video is only available after 20:00')
formats.append(format)
formats = []
for s in streams:
if type(s['_stream']) == list:
for index, url in enumerate(s['_stream'][::-1]):
quality = s['_quality'] + index
formats.append({
'quality': quality,
'url': url,
'format_id': '%s-%s' % (determine_ext(url), quality)
})
continue
format = {
'quality': s['_quality'],
'url': s['_stream'],
}
format['format_id'] = '%s-%s' % (
determine_ext(format['url']), format['quality'])
formats.append(format)
self._sort_formats(formats)
@@ -81,3 +132,60 @@ class ARDIE(InfoExtractor):
'formats': formats,
'thumbnail': thumbnail,
}
class ARDIE(InfoExtractor):
_VALID_URL = '(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TEST = {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'md5': 'd216c3a86493f9322545e045ddc3eb35',
'info_dict': {
'display_id': 'die-story-im-ersten-mission-unter-falscher-flagge',
'id': '100',
'ext': 'mp4',
'duration': 2600,
'title': 'Die Story im Ersten: Mission unter falscher Flagge',
'upload_date': '20140804',
'thumbnail': 're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
player_url = mobj.group('mainurl') + '~playerXml.xml'
doc = self._download_xml(player_url, display_id)
video_node = doc.find('./video')
upload_date = unified_strdate(video_node.find('./broadcastDate').text)
thumbnail = video_node.find('.//teaserImage//variant/url').text
formats = []
for a in video_node.findall('.//asset'):
f = {
'format_id': a.attrib['type'],
'width': int_or_none(a.find('./frameWidth').text),
'height': int_or_none(a.find('./frameHeight').text),
'vbr': int_or_none(a.find('./bitrateVideo').text),
'abr': int_or_none(a.find('./bitrateAudio').text),
'vcodec': a.find('./codecVideo').text,
'tbr': int_or_none(a.find('./totalBitrate').text),
}
if a.find('./serverPrefix').text:
f['url'] = a.find('./serverPrefix').text
f['playpath'] = a.find('./fileName').text
else:
f['url'] = a.find('./fileName').text
formats.append(f)
self._sort_formats(formats)
return {
'id': mobj.group('id'),
'formats': formats,
'display_id': display_id,
'title': video_node.find('./title').text,
'duration': parse_duration(video_node.find('./duration').text),
'upload_date': upload_date,
'thumbnail': thumbnail,
}

View File

@@ -39,7 +39,10 @@ class ArteTvIE(InfoExtractor):
formats = [{
'forma_id': q.attrib['quality'],
'url': q.text,
# The playpath starts at 'mp4:', if we don't manually
# split the url, rtmpdump will incorrectly parse them
'url': q.text.split('mp4:', 1)[0],
'play_path': 'mp4:' + q.text.split('mp4:', 1)[1],
'ext': 'flv',
'quality': 2 if q.attrib['quality'] == 'hd' else 1,
} for q in config.findall('./urls/url')]
@@ -106,29 +109,36 @@ class ArteTVPlus7IE(InfoExtractor):
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
# TODO: Might want not to drop videos that does not match requested language
# but to process those formats with lower precedence
formats = filter(_match_lang, all_formats)
formats = list(formats) # in python3 filter returns an iterator
formats = list(formats) # in python3 filter returns an iterator
if not formats:
# Some videos are only available in the 'Originalversion'
# they aren't tagged as being in French or German
if all(f['versionCode'] == 'VO' for f in all_formats):
formats = all_formats
else:
raise ExtractorError(u'The formats list is empty')
# Sometimes there are neither videos of requested lang code
# nor original version videos available
# For such cases we just take all_formats as is
formats = all_formats
if not formats:
raise ExtractorError('The formats list is empty')
if re.match(r'[A-Z]Q', formats[0]['quality']) is not None:
def sort_key(f):
return ['HQ', 'MQ', 'EQ', 'SQ'].index(f['quality'])
else:
def sort_key(f):
versionCode = f.get('versionCode')
if versionCode is None:
versionCode = ''
return (
# Sort first by quality
int(f.get('height',-1)),
int(f.get('bitrate',-1)),
int(f.get('height', -1)),
int(f.get('bitrate', -1)),
# The original version with subtitles has lower relevance
re.match(r'VO-ST(F|A)', f.get('versionCode', '')) is None,
re.match(r'VO-ST(F|A)', versionCode) is None,
# The version with sourds/mal subtitles has also lower relevance
re.match(r'VO?(F|A)-STM\1', f.get('versionCode', '')) is None,
re.match(r'VO?(F|A)-STM\1', versionCode) is None,
# Prefer http downloads over m3u8
0 if f['url'].endswith('m3u8') else 1,
)
@@ -167,16 +177,26 @@ class ArteTVPlus7IE(InfoExtractor):
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/magazine?/(?P<id>.+)'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/(?:magazine?/)?(?P<id>[^?#]+)'
_TEST = {
_TESTS = [{
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
'info_dict': {
'id': '050489-002',
'id': '72176',
'ext': 'mp4',
'title': 'Agentur Amateur / Agence Amateur #2 : Corporate Design',
'title': 'Folge 2 - Corporate Design',
'upload_date': '20131004',
},
}
}, {
'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
'info_dict': {
'id': '160676',
'ext': 'mp4',
'title': 'Monty Python live (mostly)',
'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
'upload_date': '20140805',
}
}]
class ArteTVFutureIE(ArteTVPlus7IE):
@@ -186,9 +206,10 @@ class ArteTVFutureIE(ArteTVPlus7IE):
_TEST = {
'url': 'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
'info_dict': {
'id': '050940-003',
'id': '5201',
'ext': 'mp4',
'title': 'Les champignons au secours de la planète',
'upload_date': '20131101',
},
}

View File

@@ -12,14 +12,14 @@ from ..utils import (
class BandcampIE(InfoExtractor):
_VALID_URL = r'http://.*?\.bandcamp\.com/track/(?P<title>.*)'
_VALID_URL = r'https?://.*?\.bandcamp\.com/track/(?P<title>.*)'
_TESTS = [{
'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
'file': '1812978515.mp3',
'md5': 'c557841d5e50261777a6585648adf439',
'info_dict': {
"title": "youtube-dl \"'/\\\u00e4\u21ad - youtube-dl test song \"'/\\\u00e4\u21ad",
"duration": 10,
"duration": 9.8485,
},
'_skip': 'There is a limit of 200 free downloads / month for the test song'
}]
@@ -28,36 +28,32 @@ class BandcampIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
# We get the link to the free download page
m_download = re.search(r'freeDownloadPage: "(.*?)"', webpage)
if m_download is None:
if not m_download:
m_trackinfo = re.search(r'trackinfo: (.+),\s*?\n', webpage)
if m_trackinfo:
json_code = m_trackinfo.group(1)
data = json.loads(json_code)
d = data[0]
data = json.loads(json_code)[0]
duration = int(round(d['duration']))
formats = []
for format_id, format_url in d['file'].items():
ext, _, abr_str = format_id.partition('-')
for format_id, format_url in data['file'].items():
ext, abr_str = format_id.split('-', 1)
formats.append({
'format_id': format_id,
'url': format_url,
'ext': format_id.partition('-')[0],
'ext': ext,
'vcodec': 'none',
'acodec': format_id.partition('-')[0],
'abr': int(format_id.partition('-')[2]),
'acodec': ext,
'abr': int(abr_str),
})
self._sort_formats(formats)
return {
'id': compat_str(d['id']),
'title': d['title'],
'id': compat_str(data['id']),
'title': data['title'],
'formats': formats,
'duration': duration,
'duration': float(data['duration']),
}
else:
raise ExtractorError('No free songs found')
@@ -67,11 +63,9 @@ class BandcampIE(InfoExtractor):
r'var TralbumData = {(.*?)id: (?P<id>\d*?)$',
webpage, re.MULTILINE | re.DOTALL).group('id')
download_webpage = self._download_webpage(download_link, video_id,
'Downloading free downloads page')
# We get the dictionary of the track from some javascrip code
info = re.search(r'items: (.*?),$',
download_webpage, re.MULTILINE).group(1)
download_webpage = self._download_webpage(download_link, video_id, 'Downloading free downloads page')
# We get the dictionary of the track from some javascript code
info = re.search(r'items: (.*?),$', download_webpage, re.MULTILINE).group(1)
info = json.loads(info)[0]
# We pick mp3-320 for now, until format selection can be easily implemented.
mp3_info = info['downloads']['mp3-320']
@@ -100,7 +94,7 @@ class BandcampIE(InfoExtractor):
class BandcampAlbumIE(InfoExtractor):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'http://.*?\.bandcamp\.com/album/(?P<title>.*)'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<title>[^?#]+))'
_TEST = {
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -123,13 +117,15 @@ class BandcampAlbumIE(InfoExtractor):
'params': {
'playlistend': 2
},
'skip': 'Bancamp imposes download limits. See test_playlists:test_bandcamp_album for the playlist test'
'skip': 'Bandcamp imposes download limits. See test_playlists:test_bandcamp_album for the playlist test'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('subdomain')
title = mobj.group('title')
webpage = self._download_webpage(url, title)
display_id = title or playlist_id
webpage = self._download_webpage(url, display_id)
tracks_paths = re.findall(r'<a href="(.*?)" itemprop="url">', webpage)
if not tracks_paths:
raise ExtractorError('The page doesn\'t contain any tracks')
@@ -139,6 +135,8 @@ class BandcampAlbumIE(InfoExtractor):
title = self._search_regex(r'album_title : "(.*?)"', webpage, 'title')
return {
'_type': 'playlist',
'id': playlist_id,
'display_id': display_id,
'title': title,
'entries': entries,
}

View File

@@ -13,7 +13,7 @@ from ..utils import (
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'http://www\.bilibili\.tv/video/av(?P<id>[0-9]+)/'
_VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>[0-9]+)/'
_TEST = {
'url': 'http://www.bilibili.tv/video/av1074402/',
@@ -56,7 +56,7 @@ class BiliBiliIE(InfoExtractor):
'thumbnailUrl', video_code, 'thumbnail', fatal=False)
player_params = compat_parse_qs(self._html_search_regex(
r'<iframe .*?class="player" src="https://secure.bilibili.tv/secure,([^"]+)"',
r'<iframe .*?class="player" src="https://secure\.bilibili\.(?:tv|com)/secure,([^"]+)"',
webpage, 'player params'))
if 'cid' in player_params:

View File

@@ -1,13 +1,10 @@
from __future__ import unicode_literals
import datetime
import json
import re
from .common import InfoExtractor
from ..utils import (
remove_start,
)
from ..utils import remove_start
class BlinkxIE(InfoExtractor):
@@ -16,18 +13,21 @@ class BlinkxIE(InfoExtractor):
_TEST = {
'url': 'http://www.blinkx.com/ce/8aQUy7GVFYgFzpKhT0oqsilwOGFRVXk3R1ZGWWdGenBLaFQwb3FzaWx3OGFRVXk3R1ZGWWdGenB',
'file': '8aQUy7GV.mp4',
'md5': '2e9a07364af40163a908edbf10bb2492',
'info_dict': {
"title": "Police Car Rolls Away",
"uploader": "stupidvideos.com",
"upload_date": "20131215",
"description": "A police car gently rolls away from a fight. Maybe it felt weird being around a confrontation and just had to get out of there!",
"duration": 14.886,
"thumbnails": [{
"width": 100,
"height": 76,
"url": "http://cdn.blinkx.com/stream/b/41/StupidVideos/20131215/1873969261/1873969261_tn_0.jpg",
'id': '8aQUy7GV',
'ext': 'mp4',
'title': 'Police Car Rolls Away',
'uploader': 'stupidvideos.com',
'upload_date': '20131215',
'timestamp': 1387068000,
'description': 'A police car gently rolls away from a fight. Maybe it felt weird being around a confrontation and just had to get out of there!',
'duration': 14.886,
'thumbnails': [{
'width': 100,
'height': 76,
'resolution': '100x76',
'url': 'http://cdn.blinkx.com/stream/b/41/StupidVideos/20131215/1873969261/1873969261_tn_0.jpg',
}],
},
}
@@ -37,13 +37,10 @@ class BlinkxIE(InfoExtractor):
video_id = m.group('id')
display_id = video_id[:8]
api_url = (u'https://apib4.blinkx.com/api.php?action=play_video&' +
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&' +
'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
dt = datetime.datetime.fromtimestamp(data['pubdate_epoch'])
pload_date = dt.strftime('%Y%m%d')
duration = None
thumbnails = []
formats = []
@@ -55,19 +52,16 @@ class BlinkxIE(InfoExtractor):
'height': int(m['h']),
})
elif m['type'] == 'original':
duration = m['d']
duration = float(m['d'])
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen(u'Youtube video detected: %s' % yt_id)
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
tbr = (int(m['vbr']) + int(m['abr'])) // 1000
format_id = (u'%s-%sk-%s' %
(vcodec,
tbr,
m['w']))
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
@@ -88,7 +82,7 @@ class BlinkxIE(InfoExtractor):
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'upload_date': pload_date,
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,

View File

@@ -1,102 +1,139 @@
from __future__ import unicode_literals
import datetime
import re
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_str,
compat_urllib_request,
unescapeHTML,
parse_iso8601,
compat_urlparse,
clean_html,
compat_str,
)
class BlipTVIE(SubtitlesInfoExtractor):
"""Information extractor for blip.tv"""
_VALID_URL = r'https?://(?:\w+\.)?blip\.tv/(?:(?:.+-|rss/flash/)(?P<id>\d+)|((?:play/|api\.swf#)(?P<lookup_id>[\da-zA-Z+_]+)))'
_VALID_URL = r'https?://(?:\w+\.)?blip\.tv/((.+/)|(play/)|(api\.swf#))(?P<presumptive_id>.+)$'
_TESTS = [{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': 'c6934ad0b6acf2bd920720ec888eb812',
'info_dict': {
'id': '5779306',
'ext': 'mov',
'upload_date': '20111205',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'uploader': 'Comic Book Resources - CBR TV',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
_TESTS = [
{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': 'c6934ad0b6acf2bd920720ec888eb812',
'info_dict': {
'id': '5779306',
'ext': 'mov',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'timestamp': 1323138843,
'upload_date': '20111206',
'uploader': 'cbr',
'uploader_id': '679425',
'duration': 81,
}
},
{
# https://github.com/rg3/youtube-dl/pull/2274
'note': 'Video with subtitles',
'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
'md5': '309f9d25b820b086ca163ffac8031806',
'info_dict': {
'id': '6586561',
'ext': 'mp4',
'title': 'Red vs. Blue Season 11 Episode 1',
'description': 'One-Zero-One',
'timestamp': 1371261608,
'upload_date': '20130615',
'uploader': 'redvsblue',
'uploader_id': '792887',
'duration': 279,
}
},
{
# https://bugzilla.redhat.com/show_bug.cgi?id=967465
'url': 'http://a.blip.tv/api.swf#h6Uag5KbVwI',
'md5': '314e87b1ebe7a48fcbfdd51b791ce5a6',
'info_dict': {
'id': '6573122',
'ext': 'mov',
'upload_date': '20130520',
'description': 'Two hapless space marines argue over what to do when they realize they have an astronomically huge problem on their hands.',
'title': 'Red vs. Blue Season 11 Trailer',
'timestamp': 1369029609,
'uploader': 'redvsblue',
'uploader_id': '792887',
}
}
}, {
# https://github.com/rg3/youtube-dl/pull/2274
'note': 'Video with subtitles',
'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
'md5': '309f9d25b820b086ca163ffac8031806',
'info_dict': {
'id': '6586561',
'ext': 'mp4',
'uploader': 'Red vs. Blue',
'description': 'One-Zero-One',
'upload_date': '20130614',
'title': 'Red vs. Blue Season 11 Episode 1',
}
}]
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
presumptive_id = mobj.group('presumptive_id')
lookup_id = mobj.group('lookup_id')
# See https://github.com/rg3/youtube-dl/issues/857
embed_mobj = re.match(r'https?://(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)([a-zA-Z0-9]+)', url)
if embed_mobj:
info_url = 'http://blip.tv/play/%s.x?p=1' % embed_mobj.group(1)
info_page = self._download_webpage(info_url, embed_mobj.group(1))
video_id = self._search_regex(
r'data-episode-id="([0-9]+)', info_page, 'video_id')
return self.url_result('http://blip.tv/a/a-' + video_id, 'BlipTV')
cchar = '&' if '?' in url else '?'
json_url = url + cchar + 'skin=json&version=2&no_wrap=1'
request = compat_urllib_request.Request(json_url)
request.add_header('User-Agent', 'iTunes/10.6.1')
json_data = self._download_json(request, video_id=presumptive_id)
if 'Post' in json_data:
data = json_data['Post']
if lookup_id:
info_page = self._download_webpage(
'http://blip.tv/play/%s.x?p=1' % lookup_id, lookup_id, 'Resolving lookup id')
video_id = self._search_regex(r'data-episode-id="([0-9]+)', info_page, 'video_id')
else:
data = json_data
video_id = mobj.group('id')
rss = self._download_xml('http://blip.tv/rss/flash/%s' % video_id, video_id, 'Downloading video RSS')
def blip(s):
return '{http://blip.tv/dtd/blip/1.0}%s' % s
def media(s):
return '{http://search.yahoo.com/mrss/}%s' % s
def itunes(s):
return '{http://www.itunes.com/dtds/podcast-1.0.dtd}%s' % s
item = rss.find('channel/item')
video_id = item.find(blip('item_id')).text
title = item.find('./title').text
description = clean_html(compat_str(item.find(blip('puredescription')).text))
timestamp = parse_iso8601(item.find(blip('datestamp')).text)
uploader = item.find(blip('user')).text
uploader_id = item.find(blip('userid')).text
duration = int(item.find(blip('runtime')).text)
media_thumbnail = item.find(media('thumbnail'))
thumbnail = media_thumbnail.get('url') if media_thumbnail is not None else item.find(itunes('image')).text
categories = [category.text for category in item.findall('category')]
video_id = compat_str(data['item_id'])
upload_date = datetime.datetime.strptime(data['datestamp'], '%m-%d-%y %H:%M%p').strftime('%Y%m%d')
subtitles = {}
formats = []
if 'additionalMedia' in data:
for f in data['additionalMedia']:
if f.get('file_type_srt') == 1:
LANGS = {
'english': 'en',
}
lang = f['role'].rpartition('-')[-1].strip().lower()
langcode = LANGS.get(lang, lang)
subtitles[langcode] = f['url']
continue
if not int(f['media_width']): # filter m3u8
continue
subtitles = {}
media_group = item.find(media('group'))
for media_content in media_group.findall(media('content')):
url = media_content.get('url')
role = media_content.get(blip('role'))
msg = self._download_webpage(
url + '?showplayer=20140425131715&referrer=http://blip.tv&mask=7&skin=flashvars&view=url',
video_id, 'Resolving URL for %s' % role)
real_url = compat_urlparse.parse_qs(msg)['message'][0]
media_type = media_content.get('type')
if media_type == 'text/srt' or url.endswith('.srt'):
LANGS = {
'english': 'en',
}
lang = role.rpartition('-')[-1].strip().lower()
langcode = LANGS.get(lang, lang)
subtitles[langcode] = url
elif media_type.startswith('video/'):
formats.append({
'url': f['url'],
'format_id': f['role'],
'width': int(f['media_width']),
'height': int(f['media_height']),
'url': real_url,
'format_id': role,
'format_note': media_type,
'vcodec': media_content.get(blip('vcodec')),
'acodec': media_content.get(blip('acodec')),
'filesize': media_content.get('filesize'),
'width': int(media_content.get('width')),
'height': int(media_content.get('height')),
})
else:
formats.append({
'url': data['media']['url'],
'width': int(data['media']['width']),
'height': int(data['media']['height']),
})
self._sort_formats(formats)
# subtitles
@@ -107,12 +144,14 @@ class BlipTVIE(SubtitlesInfoExtractor):
return {
'id': video_id,
'uploader': data['display_name'],
'upload_date': upload_date,
'title': data['title'],
'thumbnail': data['thumbnailUrl'],
'description': data['description'],
'user_agent': 'iTunes/10.6.1',
'title': title,
'description': description,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'thumbnail': thumbnail,
'categories': categories,
'formats': formats,
'subtitles': video_subtitles,
}
@@ -126,7 +165,7 @@ class BlipTVIE(SubtitlesInfoExtractor):
class BlipTVUserIE(InfoExtractor):
_VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?blip\.tv/)|bliptvuser:)([^/]+)/*$'
_VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?blip\.tv/)|bliptvuser:)(?!api\.swf)([^/]+)/*$'
_PAGE_SIZE = 12
IE_NAME = 'blip.tv:user'

View File

@@ -10,7 +10,7 @@ class BloombergIE(InfoExtractor):
_TEST = {
'url': 'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
'md5': '7bf08858ff7c203c870e8a6190e221e5',
# The md5 checksum changes
'info_dict': {
'id': 'qurhIVlJSB6hzkVi229d8g',
'ext': 'flv',
@@ -31,8 +31,7 @@ class BloombergIE(InfoExtractor):
return {
'id': name.split('-')[-1],
'title': title,
'url': f4m_url,
'ext': 'flv',
'formats': self._extract_f4m_formats(f4m_url, name),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@@ -7,25 +7,25 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
)
class BRIE(InfoExtractor):
IE_DESC = 'Bayerischer Rundfunk Mediathek'
_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-]+/)+(?P<id>[a-z0-9\-]+)\.html'
_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html'
_BASE_URL = 'http://www.br.de'
_TESTS = [
{
'url': 'http://www.br.de/mediathek/video/anselm-gruen-114.html',
'md5': 'c4f83cf0f023ba5875aba0bf46860df2',
'url': 'http://www.br.de/mediathek/video/sendungen/heimatsound/heimatsound-festival-2014-trailer-100.html',
'md5': '93556dd2bcb2948d9259f8670c516d59',
'info_dict': {
'id': '2c8d81c5-6fb7-4a74-88d4-e768e5856532',
'id': '25e279aa-1ffd-40fd-9955-5325bd48a53a',
'ext': 'mp4',
'title': 'Feiern und Verzichten',
'description': 'Anselm Grün: Feiern und Verzichten',
'uploader': 'BR/Birgit Baier',
'upload_date': '20140301',
'title': 'Wenn das Traditions-Theater wackelt',
'description': 'Heimatsound-Festival 2014: Wenn das Traditions-Theater wackelt',
'duration': 34,
}
},
{
@@ -36,6 +36,7 @@ class BRIE(InfoExtractor):
'ext': 'mp4',
'title': 'Über den Pass',
'description': 'Die Eroberung der Alpen: Über den Pass',
'duration': 2588,
}
},
{
@@ -46,6 +47,7 @@ class BRIE(InfoExtractor):
'ext': 'aac',
'title': '"Keine neuen Schulden im nächsten Jahr"',
'description': 'Haushaltsentwurf: "Keine neuen Schulden im nächsten Jahr"',
'duration': 64,
}
},
{
@@ -56,6 +58,7 @@ class BRIE(InfoExtractor):
'ext': 'mp4',
'title': 'Umweltbewusster Häuslebauer',
'description': 'Uwe Erdelt: Umweltbewusster Häuslebauer',
'duration': 116,
}
},
{
@@ -66,6 +69,7 @@ class BRIE(InfoExtractor):
'ext': 'mp4',
'title': 'Folge 1 - Metaphysik',
'description': 'Kant für Anfänger: Folge 1 - Metaphysik',
'duration': 893,
'uploader': 'Eva Maria Steimle',
'upload_date': '20140117',
}
@@ -86,6 +90,7 @@ class BRIE(InfoExtractor):
media = {
'id': xml_media.get('externalId'),
'title': xml_media.find('title').text,
'duration': parse_duration(xml_media.find('duration').text),
'formats': self._extract_formats(xml_media.find('assets')),
'thumbnails': self._extract_thumbnails(xml_media.find('teaserImage/variants')),
'description': ' '.join(xml_media.find('shareTitle').text.splitlines()),

View File

@@ -15,6 +15,7 @@ from ..utils import (
compat_urllib_request,
compat_parse_qs,
determine_ext,
ExtractorError,
unsmuggle_url,
unescapeHTML,
@@ -29,10 +30,11 @@ class BrightcoveIE(InfoExtractor):
{
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
'file': '2371591881001.mp4',
'md5': '5423e113865d26e40624dce2e4b45d95',
'note': 'Test Brightcove downloads and detection in GenericIE',
'info_dict': {
'id': '2371591881001',
'ext': 'mp4',
'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
'uploader': '8TV',
'description': 'md5:a950cc4285c43e44d763d036710cd9cd',
@@ -41,8 +43,9 @@ class BrightcoveIE(InfoExtractor):
{
# From http://medianetwork.oracle.com/video/player/1785452137001
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1217746023001&flashID=myPlayer&%40videoPlayer=1785452137001',
'file': '1785452137001.flv',
'info_dict': {
'id': '1785452137001',
'ext': 'flv',
'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.',
'uploader': 'Oracle',
@@ -70,7 +73,20 @@ class BrightcoveIE(InfoExtractor):
'description': 'md5:363109c02998fee92ec02211bd8000df',
'uploader': 'National Ballet of Canada',
},
}
},
{
# test flv videos served by akamaihd.net
# From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3ABC2996102916001&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
# The md5 checksum changes on each download
'info_dict': {
'id': '2996102916001',
'ext': 'flv',
'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'uploader': 'Red Bull TV',
'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
},
},
]
@classmethod
@@ -138,12 +154,14 @@ class BrightcoveIE(InfoExtractor):
def _extract_brightcove_urls(cls, webpage):
"""Return a list of all Brightcove URLs from the webpage """
url_m = re.search(r'<meta\s+property="og:video"\s+content="(http://c.brightcove.com/[^"]+)"', webpage)
url_m = re.search(
r'<meta\s+property="og:video"\s+content="(https?://(?:secure|c)\.brightcove.com/[^"]+)"',
webpage)
if url_m:
url = unescapeHTML(url_m.group(1))
# Some sites don't add it, we can't download with this url, for example:
# http://www.ktvu.com/videos/news/raw-video-caltrain-releases-video-of-man-almost/vCTZdY/
if 'playerKey' in url:
if 'playerKey' in url or 'videoId' in url:
return [url]
matches = re.findall(
@@ -172,9 +190,13 @@ class BrightcoveIE(InfoExtractor):
referer = smuggled_data.get('Referer', url)
return self._get_video_info(
videoPlayer[0], query_str, query, referer=referer)
else:
elif 'playerKey' in query:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
else:
raise ExtractorError(
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
def _get_video_info(self, video_id, query_str, query, referer=None):
request_url = self._FEDERATED_URL_TEMPLATE % query_str
@@ -186,8 +208,15 @@ class BrightcoveIE(InfoExtractor):
req.add_header('Referer', referer)
webpage = self._download_webpage(req, video_id)
error_msg = self._html_search_regex(
r"<h1>We're sorry.</h1>\s*<p>(.*?)</p>", webpage,
'error message', default=None)
if error_msg is not None:
raise ExtractorError(
'brightcove said: %s' % error_msg, expected=True)
self.report_extraction(video_id)
info = self._search_regex(r'var experienceJSON = ({.*?});', webpage, 'json')
info = self._search_regex(r'var experienceJSON = ({.*});', webpage, 'json')
info = json.loads(info)['data']
video_info = info['programmedContent']['videoPlayer']['mediaDTO']
video_info['_youtubedl_adServerURL'] = info.get('adServerURL')
@@ -219,12 +248,26 @@ class BrightcoveIE(InfoExtractor):
renditions = video_info.get('renditions')
if renditions:
renditions = sorted(renditions, key=lambda r: r['size'])
info['formats'] = [{
'url': rend['defaultURL'],
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
} for rend in renditions]
formats = []
for rend in renditions:
url = rend['defaultURL']
if rend['remote']:
# This type of renditions are served through akamaihd.net,
# but they don't use f4m manifests
url = url.replace('control/', '') + '?&v=3.3.0&fp=13&r=FEEFJ&g=RTSJIMBMPFPB'
ext = 'flv'
else:
ext = determine_ext(url)
size = rend.get('size')
formats.append({
'url': url,
'ext': ext,
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
'filesize': size if size != 0 else None,
})
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],

View File

@@ -4,17 +4,20 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import unified_strdate
from ..utils import (
unified_strdate,
url_basename,
)
class CanalplusIE(InfoExtractor):
_VALID_URL = r'https?://(www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>\d+))'
_VALID_URL = r'https?://(?:www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/cplus/%s'
IE_NAME = 'canalplus.fr'
_TEST = {
'url': 'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
'md5': '60c29434a416a83c15dae2587d47027d',
'md5': '3db39fb48b9685438ecf33a1078023e4',
'info_dict': {
'id': '922470',
'ext': 'flv',
@@ -26,10 +29,13 @@ class CanalplusIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = mobj.groupdict().get('id')
# Beware, some subclasses do not define an id group
display_id = url_basename(mobj.group('path'))
if video_id is None:
webpage = self._download_webpage(url, mobj.group('path'))
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(r'<canal:player videoId="(\d+)"', webpage, 'video id')
info_url = self._VIDEO_INFO_TEMPLATE % video_id
@@ -53,6 +59,7 @@ class CanalplusIE(InfoExtractor):
return {
'id': video_id,
'display_id': display_id,
'title': '%s - %s' % (infos.find('TITRAGE/TITRE').text,
infos.find('TITRAGE/SOUS_TITRE').text),
'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),

View File

@@ -1,24 +1,42 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class CBSIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cbs\.com/shows/[^/]+/video/(?P<id>[^/]+)/.*'
_VALID_URL = r'https?://(?:www\.)?cbs\.com/shows/[^/]+/(?:video|artist)/(?P<id>[^/]+)/.*'
_TEST = {
u'url': u'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
u'file': u'4JUVEwq3wUT7.flv',
u'info_dict': {
u'title': u'Connect Chat feat. Garth Brooks',
u'description': u'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
u'duration': 1495,
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': {
'id': '4JUVEwq3wUT7',
'ext': 'flv',
'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
'duration': 1495,
},
u'params': {
'params': {
# rtmp download
u'skip_download': True,
'skip_download': True,
},
}
'_skip': 'Blocked outside the US',
}, {
'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
'info_dict': {
'id': 'P9gjWjelt6iP',
'ext': 'flv',
'title': 'Live on Letterman - St. Vincent',
'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
'duration': 3221,
},
'params': {
# rtmp download
'skip_download': True,
},
'_skip': 'Blocked outside the US',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -26,5 +44,5 @@ class CBSIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
real_id = self._search_regex(
r"video\.settings\.pid\s*=\s*'([^']+)';",
webpage, u'real video ID')
webpage, 'real video ID')
return self.url_result(u'theplatform:%s' % real_id)

View File

@@ -42,7 +42,7 @@ class ChilloutzoneIE(InfoExtractor):
'id': '85523671',
'ext': 'mp4',
'title': 'The Sunday Times - Icons',
'description': 'md5:3e1c0dc6047498d6728dcdaad0891762',
'description': 'md5:a5f7ff82e2f7a9ed77473fe666954e84',
'uploader': 'Us',
'uploader_id': 'usfilms',
'upload_date': '20140131'

View File

@@ -1,10 +1,12 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
@@ -13,9 +15,10 @@ class CinemassacreIE(InfoExtractor):
_TESTS = [
{
'url': 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
'file': '19911.mp4',
'md5': '782f8504ca95a0eba8fc9177c373eec7',
'md5': 'fde81fbafaee331785f58cd6c0d46190',
'info_dict': {
'id': '19911',
'ext': 'mp4',
'upload_date': '20121110',
'title': '“Angry Video Game Nerd: The Movie” Trailer',
'description': 'md5:fb87405fcb42a331742a0dce2708560b',
@@ -23,9 +26,10 @@ class CinemassacreIE(InfoExtractor):
},
{
'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
'file': '521be8ef82b16.mp4',
'md5': 'dec39ee5118f8d9cc067f45f9cbe3a35',
'md5': 'd72f10cd39eac4215048f62ab477a511',
'info_dict': {
'id': '521be8ef82b16',
'ext': 'mp4',
'upload_date': '20131002',
'title': 'The Mummys Hand (1940)',
},
@@ -50,29 +54,40 @@ class CinemassacreIE(InfoExtractor):
r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, 'description', flags=re.DOTALL, fatal=False)
playerdata = self._download_webpage(playerdata_url, video_id)
playerdata = self._download_webpage(playerdata_url, video_id, 'Downloading player webpage')
video_thumbnail = self._search_regex(
r'image: \'(?P<thumbnail>[^\']+)\'', playerdata, 'thumbnail', fatal=False)
sd_url = self._search_regex(r'file: \'([^\']+)\', label: \'SD\'', playerdata, 'sd_file')
videolist_url = self._search_regex(r'file: \'([^\']+\.smil)\'}', playerdata, 'videolist_url')
sd_url = self._html_search_regex(r'file: \'([^\']+)\', label: \'SD\'', playerdata, 'sd_file')
hd_url = self._html_search_regex(
r'file: \'([^\']+)\', label: \'HD\'', playerdata, 'hd_file',
default=None)
video_thumbnail = self._html_search_regex(r'image: \'(?P<thumbnail>[^\']+)\'', playerdata, 'thumbnail', fatal=False)
videolist = self._download_xml(videolist_url, video_id, 'Downloading videolist XML')
formats = [{
'url': sd_url,
'ext': 'mp4',
'format': 'sd',
'format_id': 'sd',
'quality': 1,
}]
if hd_url:
formats.append({
'url': hd_url,
'ext': 'mp4',
'format': 'hd',
'format_id': 'hd',
'quality': 2,
})
formats = []
baseurl = sd_url[:sd_url.rfind('/')+1]
for video in videolist.findall('.//video'):
src = video.get('src')
if not src:
continue
file_ = src.partition(':')[-1]
width = int_or_none(video.get('width'))
height = int_or_none(video.get('height'))
bitrate = int_or_none(video.get('system-bitrate'))
format = {
'url': baseurl + file_,
'format_id': src.rpartition('.')[0].rpartition('_')[-1],
}
if width or height:
format.update({
'tbr': bitrate // 1000 if bitrate else None,
'width': width,
'height': height,
})
else:
format.update({
'abr': bitrate // 1000 if bitrate else None,
'vcodec': 'none',
})
formats.append(format)
self._sort_formats(formats)
return {

View File

@@ -1,19 +1,19 @@
from __future__ import unicode_literals
from .mtv import MTVIE
class CMTIE(MTVIE):
IE_NAME = u'cmt.com'
IE_NAME = 'cmt.com'
_VALID_URL = r'https?://www\.cmt\.com/videos/.+?/(?P<videoid>[^/]+)\.jhtml'
_FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
_TESTS = [
{
u'url': u'http://www.cmt.com/videos/garth-brooks/989124/the-call-featuring-trisha-yearwood.jhtml#artist=30061',
u'md5': u'e6b7ef3c4c45bbfae88061799bbba6c2',
u'info_dict': {
u'id': u'989124',
u'ext': u'mp4',
u'title': u'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
u'description': u'Blame It All On My Roots',
},
_TESTS = [{
'url': 'http://www.cmt.com/videos/garth-brooks/989124/the-call-featuring-trisha-yearwood.jhtml#artist=30061',
'md5': 'e6b7ef3c4c45bbfae88061799bbba6c2',
'info_dict': {
'id': '989124',
'ext': 'mp4',
'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
'description': 'Blame It All On My Roots',
},
]
}]

View File

@@ -43,7 +43,11 @@ class CNETIE(InfoExtractor):
raise ExtractorError('Cannot find video data')
video_id = vdata['id']
title = vdata['headline']
title = vdata.get('headline')
if title is None:
title = vdata.get('title')
if title is None:
raise ExtractorError('Cannot find title!')
description = vdata.get('dek')
thumbnail = vdata.get('image', {}).get('path')
author = vdata.get('author')

View File

@@ -79,8 +79,11 @@ class CNNIE(InfoExtractor):
self._sort_formats(formats)
thumbnails = sorted([((int(t.attrib['height']),int(t.attrib['width'])), t.text) for t in info.findall('images/image')])
thumbs_dict = [{'resolution': res, 'url': t_url} for (res, t_url) in thumbnails]
thumbnails = [{
'height': int(t.attrib['height']),
'width': int(t.attrib['width']),
'url': t.text,
} for t in info.findall('images/image')]
metas_el = info.find('metas')
upload_date = (
@@ -93,8 +96,7 @@ class CNNIE(InfoExtractor):
'id': info.attrib['id'],
'title': info.find('headline').text,
'formats': formats,
'thumbnail': thumbnails[-1][1],
'thumbnails': thumbs_dict,
'thumbnails': thumbnails,
'description': info.find('description').text,
'duration': duration,
'upload_date': upload_date,

View File

@@ -14,13 +14,13 @@ from ..utils import (
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?(comedycentral|cc)\.com/
(video-clips|episodes|cc-studios|video-collections)
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(video-clips|episodes|cc-studios|video-collections|full-episodes)
/(?P<title>.*)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TEST = {
'url': 'http://www.comedycentral.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'url': 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
@@ -130,7 +130,7 @@ class ComedyCentralShowsIE(InfoExtractor):
raise ExtractorError('Invalid redirected URL: ' + url)
if mobj.group('episode') == '':
raise ExtractorError('Redirected URL is still not specific: ' + url)
epTitle = mobj.group('episode').rpartition('/')[-1]
epTitle = (mobj.group('episode') or mobj.group('videotitle')).rpartition('/')[-1]
mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*(?:episode|video).*?:.*?))"', webpage)
if len(mMovieParams) == 0:
@@ -188,7 +188,7 @@ class ComedyCentralShowsIE(InfoExtractor):
})
formats.append({
'format_id': 'rtmp-%s' % format,
'url': rtmp_video_url,
'url': rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm'),
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,

View File

@@ -1,11 +1,12 @@
import base64
import hashlib
import json
import netrc
import os
import re
import socket
import sys
import netrc
import time
import xml.etree.ElementTree
from ..utils import (
@@ -17,6 +18,7 @@ from ..utils import (
clean_html,
compiled_regex_type,
ExtractorError,
int_or_none,
RegexNotFoundError,
sanitize_filename,
unescapeHTML,
@@ -68,6 +70,7 @@ class InfoExtractor(object):
* vcodec Name of the video codec in use
* container Name of the container format
* filesize The number of bytes, if known in advance
* filesize_approx An estimate for the number of bytes
* player_url SWF Player URL (used for rtmpdump).
* protocol The protocol that will be used for the actual
download, lower-case.
@@ -81,6 +84,12 @@ class InfoExtractor(object):
format, irrespective of the file format.
-1 for default (order by other properties),
-2 or smaller for less than default.
* http_referer HTTP Referer header value to set.
* http_method HTTP method to use for the download.
* http_headers A dictionary of additional HTTP headers
to add to the request.
* http_post_data Additional data to send with a POST
request.
url: Final video URL.
ext: Video filename extension.
format: The video format, defaults to ext (used for --get-format)
@@ -92,8 +101,12 @@ class InfoExtractor(object):
unique, but available before title. Typically, id is
something like "4234987", title "Dancing naked mole rats",
and display_id "dancing-naked-mole-rats"
thumbnails: A list of dictionaries (with the entries "resolution" and
"url") for the varying thumbnails
thumbnails: A list of dictionaries, with the following entries:
* "url"
* "width" (optional, int)
* "height" (optional, int)
* "resolution" (optional, string "{width}x{height"},
deprecated)
thumbnail: Full URL to a video thumbnail image.
description: One-line video description.
uploader: Full name of the video uploader.
@@ -101,7 +114,7 @@ class InfoExtractor(object):
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
location: Physical location of the video.
location: Physical location where the video was filmed.
subtitles: The subtitle file contents as a dictionary in the format
{language: subtitles}.
duration: Length of the video in seconds, as an integer.
@@ -113,6 +126,8 @@ class InfoExtractor(object):
webpage_url: The url to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set
by YoutubeDL if it's missing)
categories: A list of categories that the video falls in, for example
["Sports", "Berlin"]
Unless mentioned otherwise, the fields should be Unicode strings.
@@ -242,10 +257,11 @@ class InfoExtractor(object):
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
if len(url) > 200:
h = u'___' + hashlib.md5(url.encode('utf-8')).hexdigest()
url = url[:200 - len(h)] + h
raw_filename = ('%s_%s.dump' % (video_id, url))
basen = '%s_%s' % (video_id, url)
if len(basen) > 240:
h = u'___' + hashlib.md5(basen.encode('utf-8')).hexdigest()
basen = basen[:240 - len(h)] + h
raw_filename = basen + '.dump'
filename = sanitize_filename(raw_filename, restricted=True)
self.to_screen(u'Saving request to ' + filename)
with open(filename, 'wb') as outf:
@@ -292,8 +308,12 @@ class InfoExtractor(object):
def _download_json(self, url_or_request, video_id,
note=u'Downloading JSON metadata',
errnote=u'Unable to download JSON metadata',
transform_source=None):
json_string = self._download_webpage(url_or_request, video_id, note, errnote)
transform_source=None,
fatal=True):
json_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal)
if (not fatal) and json_string is False:
return None
if transform_source:
json_string = transform_source(json_string)
try:
@@ -360,7 +380,8 @@ class InfoExtractor(object):
else:
for p in pattern:
mobj = re.search(p, string, flags)
if mobj: break
if mobj:
break
if os.name != 'nt' and sys.stderr.isatty():
_name = u'\033[0;34m%s\033[0m' % name
@@ -419,6 +440,22 @@ class InfoExtractor(object):
return (username, password)
def _get_tfa_info(self):
"""
Get the two-factor authentication info
TODO - asking the user will be required for sms/phone verify
currently just uses the command line option
If there's no info available, return None
"""
if self._downloader is None:
return None
downloader_params = self._downloader.params
if downloader_params.get('twofactor', None) is not None:
return downloader_params['twofactor']
return None
# Helper functions for extracting OpenGraph info
@staticmethod
def _og_regexes(prop):
@@ -448,18 +485,22 @@ class InfoExtractor(object):
return self._og_search_property('title', html, **kargs)
def _og_search_video_url(self, html, name='video url', secure=True, **kargs):
regexes = self._og_regexes('video')
if secure: regexes = self._og_regexes('video:secure_url') + regexes
regexes = self._og_regexes('video') + self._og_regexes('video:url')
if secure:
regexes = self._og_regexes('video:secure_url') + regexes
return self._html_search_regex(regexes, html, name, **kargs)
def _html_search_meta(self, name, html, display_name=None, fatal=False):
def _og_search_url(self, html, **kargs):
return self._og_search_property('url', html, **kargs)
def _html_search_meta(self, name, html, display_name=None, fatal=False, **kwargs):
if display_name is None:
display_name = name
return self._html_search_regex(
r'''(?ix)<meta
(?=[^>]+(?:itemprop|name|property)=["\']%s["\'])
(?=[^>]+(?:itemprop|name|property)=["\']?%s["\']?)
[^>]+content=["\']([^"\']+)["\']''' % re.escape(name),
html, display_name, fatal=fatal)
html, display_name, fatal=fatal, **kwargs)
def _dc_search_uploader(self, html):
return self._html_search_meta('dc.creator', html, 'uploader')
@@ -544,10 +585,106 @@ class InfoExtractor(object):
f.get('abr') if f.get('abr') is not None else -1,
audio_ext_preference,
f.get('filesize') if f.get('filesize') is not None else -1,
f.get('filesize_approx') if f.get('filesize_approx') is not None else -1,
f.get('format_id'),
)
formats.sort(key=_formats_key)
def http_scheme(self):
""" Either "https:" or "https:", depending on the user's preferences """
return (
'http:'
if self._downloader.params.get('prefer_insecure', False)
else 'https:')
def _proto_relative_url(self, url, scheme=None):
if url is None:
return url
if url.startswith('//'):
if scheme is None:
scheme = self.http_scheme()
return scheme + url
else:
return url
def _sleep(self, timeout, video_id, msg_template=None):
if msg_template is None:
msg_template = u'%(video_id)s: Waiting for %(timeout)s seconds'
msg = msg_template % {'video_id': video_id, 'timeout': timeout}
self.to_screen(msg)
time.sleep(timeout)
def _extract_f4m_formats(self, manifest_url, video_id):
manifest = self._download_xml(
manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest')
formats = []
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media')
for i, media_el in enumerate(media_nodes):
tbr = int_or_none(media_el.attrib.get('bitrate'))
format_id = 'f4m-%d' % (i if tbr is None else tbr)
formats.append({
'format_id': format_id,
'url': manifest_url,
'ext': 'flv',
'tbr': tbr,
'width': int_or_none(media_el.attrib.get('width')),
'height': int_or_none(media_el.attrib.get('height')),
})
self._sort_formats(formats)
return formats
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None):
formats = [{
'format_id': 'm3u8-meta',
'url': m3u8_url,
'ext': ext,
'protocol': 'm3u8',
'preference': -1,
'resolution': 'multiple',
'format_note': 'Quality selection URL',
}]
m3u8_doc = self._download_webpage(m3u8_url, video_id)
last_info = None
kv_rex = re.compile(
r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')
for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'):
last_info = {}
for m in kv_rex.finditer(line):
v = m.group('val')
if v.startswith('"'):
v = v[1:-1]
last_info[m.group('key')] = v
elif line.startswith('#') or not line.strip():
continue
else:
tbr = int_or_none(last_info.get('BANDWIDTH'), scale=1000)
f = {
'format_id': 'm3u8-%d' % (tbr if tbr else len(formats)),
'url': line.strip(),
'tbr': tbr,
'ext': ext,
}
codecs = last_info.get('CODECS')
if codecs:
video, audio = codecs.split(',')
f['vcodec'] = video.partition('.')[0]
f['acodec'] = audio.partition('.')[0]
resolution = last_info.get('RESOLUTION')
if resolution:
width_str, height_str = resolution.split('x')
f['width'] = int(width_str)
f['height'] = int(height_str)
formats.append(f)
last_info = {}
self._sort_formats(formats)
return formats
class SearchInfoExtractor(InfoExtractor):
"""

View File

@@ -0,0 +1,65 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_iso8601,
str_to_int,
)
class CrackedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cracked\.com/video_(?P<id>\d+)_[\da-z-]+\.html'
_TEST = {
'url': 'http://www.cracked.com/video_19006_4-plot-holes-you-didnt-notice-in-your-favorite-movies.html',
'md5': '4b29a5eeec292cd5eca6388c7558db9e',
'info_dict': {
'id': '19006',
'ext': 'mp4',
'title': '4 Plot Holes You Didn\'t Notice in Your Favorite Movies',
'description': 'md5:3b909e752661db86007d10e5ec2df769',
'timestamp': 1405659600,
'upload_date': '20140718',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
[r'var\s+CK_vidSrc\s*=\s*"([^"]+)"', r'<video\s+src="([^"]+)"'], webpage, 'video URL')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
timestamp = self._html_search_regex(r'<time datetime="([^"]+)"', webpage, 'upload date', fatal=False)
if timestamp:
timestamp = parse_iso8601(timestamp[:-6])
view_count = str_to_int(self._html_search_regex(
r'<span class="views" id="viewCounts">([\d,\.]+) Views</span>', webpage, 'view count', fatal=False))
comment_count = str_to_int(self._html_search_regex(
r'<span id="commentCounts">([\d,\.]+)</span>', webpage, 'comment count', fatal=False))
m = re.search(r'_(?P<width>\d+)X(?P<height>\d+)\.mp4$', video_url)
if m:
width = int(m.group('width'))
height = int(m.group('height'))
else:
width = height = None
return {
'id': video_id,
'url':video_url,
'title': title,
'description': description,
'timestamp': timestamp,
'view_count': view_count,
'comment_count': comment_count,
'height': height,
'width': width,
}

View File

@@ -1,40 +1,43 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import determine_ext
class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://www\.criterion\.com/films/(\d*)-.+'
_VALID_URL = r'https?://www\.criterion\.com/films/(?P<id>[0-9]+)-.+'
_TEST = {
u'url': u'http://www.criterion.com/films/184-le-samourai',
u'file': u'184.mp4',
u'md5': u'bc51beba55685509883a9a7830919ec3',
u'info_dict': {
u"title": u"Le Samouraï",
u"description" : u'md5:a2b4b116326558149bef81f76dcbb93f',
'url': 'http://www.criterion.com/films/184-le-samourai',
'md5': 'bc51beba55685509883a9a7830919ec3',
'info_dict': {
'id': '184',
'ext': 'mp4',
'title': 'Le Samouraï',
'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(r'so.addVariable\("videoURL", "(.+?)"\)\;',
webpage, 'video url')
title = self._html_search_regex(r'<meta content="(.+?)" property="og:title" />',
webpage, 'video title')
description = self._html_search_regex(r'<meta name="description" content="(.+?)" />',
webpage, 'video description')
thumbnail = self._search_regex(r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
final_url = self._search_regex(
r'so.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
title = self._og_search_title(webpage)
description = self._html_search_regex(
r'<meta name="description" content="(.+?)" />',
webpage, 'video description')
thumbnail = self._search_regex(
r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {'id': video_id,
'url' : final_url,
'title': title,
'ext': determine_ext(final_url),
'description': description,
'thumbnail': thumbnail,
}
return {
'id': video_id,
'url': final_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -150,7 +150,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
return {
'id': video_id,
'formats': formats,
'uploader': info['owner_screenname'],
'uploader': info['owner.screenname'],
'upload_date': video_upload_date,
'title': self._og_search_title(webpage),
'subtitles': video_subtitles,

View File

@@ -0,0 +1,44 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class DFBIE(InfoExtractor):
IE_NAME = 'tv.dfb.de'
_VALID_URL = r'https?://tv\.dfb\.de/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://tv.dfb.de/video/highlights-des-empfangs-in-berlin/9070/',
# The md5 is different each time
'info_dict': {
'id': '9070',
'ext': 'flv',
'title': 'Highlights des Empfangs in Berlin',
'upload_date': '20140716',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
player_info = self._download_xml(
'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id,
video_id)
video_info = player_info.find('video')
f4m_info = self._download_xml(self._proto_relative_url(video_info.find('url').text.strip()), video_id)
token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
return {
'id': video_id,
'title': video_info.find('title').text,
'url': manifest_url,
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
'upload_date': ''.join(video_info.find('time_date').text.split('.')[::-1]),
}

View File

@@ -7,9 +7,9 @@ from .common import InfoExtractor
class DiscoveryIE(InfoExtractor):
_VALID_URL = r'http://dsc\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_VALID_URL = r'http://www\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_TEST = {
'url': 'http://dsc.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'url': 'http://www.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'md5': 'e12614f9ee303a6ccef415cb0793eba2',
'info_dict': {
'id': '614784',

View File

@@ -1,39 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
unified_strdate,
)
from ..utils import unified_strdate
class DreiSatIE(InfoExtractor):
IE_NAME = '3sat'
_VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TEST = {
u"url": u"http://www.3sat.de/mediathek/index.php?obj=36983",
u'file': u'36983.mp4',
u'md5': u'9dcfe344732808dbfcc901537973c922',
u'info_dict': {
u"title": u"Kaffeeland Schweiz",
u"description": u"Über 80 Kaffeeröstereien liefern in der Schweiz das Getränk, in das das Land so vernarrt ist: Mehr als 1000 Tassen trinkt ein Schweizer pro Jahr. SCHWEIZWEIT nimmt die Kaffeekultur unter die...",
u"uploader": u"3sat",
u"upload_date": u"20130622"
'url': 'http://www.3sat.de/mediathek/index.php?obj=36983',
'md5': '9dcfe344732808dbfcc901537973c922',
'info_dict': {
'id': '36983',
'ext': 'mp4',
'title': 'Kaffeeland Schweiz',
'description': 'md5:cc4424b18b75ae9948b13929a0814033',
'uploader': '3sat',
'upload_date': '20130622'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
details_doc = self._download_xml(details_url, video_id, note=u'Downloading video details')
details_doc = self._download_xml(details_url, video_id, 'Downloading video details')
thumbnail_els = details_doc.findall('.//teaserimage')
thumbnails = [{
'width': te.attrib['key'].partition('x')[0],
'height': te.attrib['key'].partition('x')[2],
'width': int(te.attrib['key'].partition('x')[0]),
'height': int(te.attrib['key'].partition('x')[2]),
'url': te.text,
} for te in thumbnail_els]

View File

@@ -5,24 +5,26 @@ import os.path
import re
from .common import InfoExtractor
from ..utils import compat_urllib_parse_unquote
class DropboxIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dropbox[.]com/s/(?P<id>[a-zA-Z0-9]{15})/(?P<title>[^?#]*)'
_TEST = {
'url': 'https://www.dropbox.com/s/0qr9sai2veej4f8/THE_DOCTOR_GAMES.mp4',
'md5': '8ae17c51172fb7f93bdd6a214cc8c896',
'url': 'https://www.dropbox.com/s/nelirfsxnmcfbfh/youtube-dl%20test%20video%20%27%C3%A4%22BaW_jenozKc.mp4',
'md5': '8a3d905427a6951ccb9eb292f154530b',
'info_dict': {
'id': '0qr9sai2veej4f8',
'id': 'nelirfsxnmcfbfh',
'ext': 'mp4',
'title': 'THE_DOCTOR_GAMES'
'title': 'youtube-dl test video \'ä"BaW_jenozKc'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
title = os.path.splitext(mobj.group('title'))[0]
fn = compat_urllib_parse_unquote(mobj.group('title'))
title = os.path.splitext(fn)[0]
video_url = url + '?dl=1'
return {

View File

@@ -0,0 +1,91 @@
from __future__ import unicode_literals
import re
from .subtitles import SubtitlesInfoExtractor
from .common import ExtractorError
from ..utils import parse_iso8601
class DRTVIE(SubtitlesInfoExtractor):
_VALID_URL = r'http://(?:www\.)?dr\.dk/tv/se/[^/]+/(?P<id>[\da-z-]+)'
_TEST = {
'url': 'http://www.dr.dk/tv/se/partiets-mand/partiets-mand-7-8',
'md5': '4a7e1dd65cdb2643500a3f753c942f25',
'info_dict': {
'id': 'partiets-mand-7-8',
'ext': 'mp4',
'title': 'Partiets mand (7:8)',
'description': 'md5:a684b90a8f9336cd4aab94b7647d7862',
'timestamp': 1403047940,
'upload_date': '20140617',
'duration': 1299.040,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
programcard = self._download_json(
'http://www.dr.dk/mu/programcard/expanded/%s' % video_id, video_id, 'Downloading video JSON')
data = programcard['Data'][0]
title = data['Title']
description = data['Description']
timestamp = parse_iso8601(data['CreatedTime'][:-5])
thumbnail = None
duration = None
restricted_to_denmark = False
formats = []
subtitles = {}
for asset in data['Assets']:
if asset['Kind'] == 'Image':
thumbnail = asset['Uri']
elif asset['Kind'] == 'VideoResource':
duration = asset['DurationInMilliseconds'] / 1000.0
restricted_to_denmark = asset['RestrictedToDenmark']
for link in asset['Links']:
target = link['Target']
uri = link['Uri']
formats.append({
'url': uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43' if target == 'HDS' else uri,
'format_id': target,
'ext': link['FileFormat'],
'preference': -1 if target == 'HDS' else -2,
})
subtitles_list = asset.get('SubtitlesList')
if isinstance(subtitles_list, list):
LANGS = {
'Danish': 'dk',
}
for subs in subtitles_list:
lang = subs['Language']
subtitles[LANGS.get(lang, lang)] = subs['Uri']
if not formats and restricted_to_denmark:
raise ExtractorError(
'Unfortunately, DR is not allowed to show this program outside Denmark.', expected=True)
self._sort_formats(formats)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, subtitles)
return
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'subtitles': self.extract_subtitles(video_id, subtitles),
}

View File

@@ -0,0 +1,39 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class DumpIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'
_TEST = {
'url': 'http://www.dump.com/oneus/',
'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
'info_dict': {
'id': 'oneus',
'ext': 'flv',
'title': "He's one of us.",
'thumbnail': 're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')
thumb = self._og_search_thumbnail(webpage)
title = self._search_regex(r'<b>([^"]+)</b>', webpage, 'title')
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumb,
}

View File

@@ -1,19 +1,21 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import determine_ext
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.ebaumsworld.com/video/watch/83367677/',
u'file': u'83367677.mp4',
u'info_dict': {
u'title': u'A Giant Python Opens The Door',
u'description': u'This is how nightmares start...',
u'uploader': u'jihadpizza',
'url': 'http://www.ebaumsworld.com/video/watch/83367677/',
'info_dict': {
'id': '83367677',
'ext': 'mp4',
'title': 'A Giant Python Opens The Door',
'description': 'This is how nightmares start...',
'uploader': 'jihadpizza',
},
}
@@ -28,7 +30,6 @@ class EbaumsWorldIE(InfoExtractor):
'id': video_id,
'title': config.find('title').text,
'url': video_url,
'ext': determine_ext(video_url),
'description': config.find('description').text,
'thumbnail': config.find('image').text,
'uploader': config.find('username').text,

View File

@@ -1,10 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import random
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
compat_str,
)
@@ -12,86 +15,98 @@ class EightTracksIE(InfoExtractor):
IE_NAME = '8tracks'
_VALID_URL = r'https?://8tracks\.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
_TEST = {
u"name": u"EightTracks",
u"url": u"http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
u"playlist": [
"name": "EightTracks",
"url": "http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
"info_dict": {
'id': '1336550',
'display_id': 'youtube-dl-test-tracks-a',
"description": "test chars: \"'/\\ä↭",
"title": "youtube-dl test tracks \"'/\\ä↭<>",
},
"playlist": [
{
u"file": u"11885610.m4a",
u"md5": u"96ce57f24389fc8734ce47f4c1abcc55",
u"info_dict": {
u"title": u"youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
"md5": "96ce57f24389fc8734ce47f4c1abcc55",
"info_dict": {
"id": "11885610",
"ext": "m4a",
"title": "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
u"file": u"11885608.m4a",
u"md5": u"4ab26f05c1f7291ea460a3920be8021f",
u"info_dict": {
u"title": u"youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
"md5": "4ab26f05c1f7291ea460a3920be8021f",
"info_dict": {
"id": "11885608",
"ext": "m4a",
"title": "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
u"file": u"11885679.m4a",
u"md5": u"d30b5b5f74217410f4689605c35d1fd7",
u"info_dict": {
u"title": u"youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
"md5": "d30b5b5f74217410f4689605c35d1fd7",
"info_dict": {
"id": "11885679",
"ext": "m4a",
"title": "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
u"file": u"11885680.m4a",
u"md5": u"4eb0a669317cd725f6bbd336a29f923a",
u"info_dict": {
u"title": u"youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
"md5": "4eb0a669317cd725f6bbd336a29f923a",
"info_dict": {
"id": "11885680",
"ext": "m4a",
"title": "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
u"file": u"11885682.m4a",
u"md5": u"1893e872e263a2705558d1d319ad19e8",
u"info_dict": {
u"title": u"PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
"md5": "1893e872e263a2705558d1d319ad19e8",
"info_dict": {
"id": "11885682",
"ext": "m4a",
"title": "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
u"file": u"11885683.m4a",
u"md5": u"b673c46f47a216ab1741ae8836af5899",
u"info_dict": {
u"title": u"PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
"md5": "b673c46f47a216ab1741ae8836af5899",
"info_dict": {
"id": "11885683",
"ext": "m4a",
"title": "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
u"file": u"11885684.m4a",
u"md5": u"1d74534e95df54986da7f5abf7d842b7",
u"info_dict": {
u"title": u"phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
"md5": "1d74534e95df54986da7f5abf7d842b7",
"info_dict": {
"id": "11885684",
"ext": "m4a",
"title": "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
u"file": u"11885685.m4a",
u"md5": u"f081f47af8f6ae782ed131d38b9cd1c0",
u"info_dict": {
u"title": u"phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
"md5": "f081f47af8f6ae782ed131d38b9cd1c0",
"info_dict": {
"id": "11885685",
"ext": "m4a",
"title": "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
}
]
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
json_like = self._search_regex(r"PAGE.mix = (.*?);\n", webpage, u'trax information', flags=re.DOTALL)
json_like = self._search_regex(
r"(?s)PAGE.mix = (.*?);\n", webpage, 'trax information')
data = json.loads(json_like)
session = str(random.randint(0, 1000000000))
@@ -99,21 +114,30 @@ class EightTracksIE(InfoExtractor):
track_count = data['tracks_count']
first_url = 'http://8tracks.com/sets/%s/play?player=sm&mix_id=%s&format=jsonh' % (session, mix_id)
next_url = first_url
res = []
entries = []
for i in range(track_count):
api_json = self._download_webpage(next_url, playlist_id,
note=u'Downloading song information %s/%s' % (str(i+1), track_count),
errnote=u'Failed to download song information')
api_json = self._download_webpage(
next_url, playlist_id,
note='Downloading song information %d/%d' % (i + 1, track_count),
errnote='Failed to download song information')
api_data = json.loads(api_json)
track_data = api_data[u'set']['track']
track_data = api_data['set']['track']
info = {
'id': track_data['id'],
'id': compat_str(track_data['id']),
'url': track_data['track_file_stream_url'],
'title': track_data['performer'] + u' - ' + track_data['name'],
'raw_title': track_data['name'],
'uploader_id': data['user']['login'],
'ext': 'm4a',
}
res.append(info)
next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (session, mix_id, track_data['id'])
return res
entries.append(info)
next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (
session, mix_id, track_data['id'])
return {
'_type': 'playlist',
'entries': entries,
'id': compat_str(mix_id),
'display_id': playlist_id,
'title': data.get('name'),
'description': data.get('description'),
}

View File

@@ -0,0 +1,79 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
parse_iso8601,
)
class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ellentv\.com/videos/(?P<id>[a-z0-9_-]+)'
_TEST = {
'url': 'http://www.ellentv.com/videos/0-7jqrsr18/',
'md5': 'e4af06f3bf0d5f471921a18db5764642',
'info_dict': {
'id': '0-7jqrsr18',
'ext': 'mp4',
'title': 'What\'s Wrong with These Photos? A Whole Lot',
'timestamp': 1406876400,
'upload_date': '20140801',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
timestamp = parse_iso8601(self._search_regex(
r'<span class="publish-date"><time datetime="([^"]+)">',
webpage, 'timestamp'))
return {
'id': video_id,
'title': self._og_search_title(webpage),
'url': self._html_search_meta('VideoURL', webpage, 'url'),
'timestamp': timestamp,
}
class EllenTVClipsIE(InfoExtractor):
IE_NAME = 'EllenTV:clips'
_VALID_URL = r'https?://(?:www\.)?ellentv\.com/episodes/(?P<id>[a-z0-9_-]+)'
_TEST = {
'url': 'http://www.ellentv.com/episodes/meryl-streep-vanessa-hudgens/',
'info_dict': {
'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens',
},
'playlist_mincount': 9,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage)
return {
'_type': 'playlist',
'id': playlist_id,
'title': self._og_search_title(webpage),
'entries': self._extract_entries(playlist)
}
def _extract_playlist(self, webpage):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
try:
return json.loads("[{" + json_string + "}]")
except ValueError as ve:
raise ExtractorError('Failed to download JSON', cause=ve)
def _extract_entries(self, playlist):
return [self.url_result(item['url'], 'EllenTV') for item in playlist]

View File

@@ -0,0 +1,58 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import fix_xml_ampersands
class EmpflixIE(InfoExtractor):
_VALID_URL = r'^https?://www\.empflix\.com/videos/.*?-(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.empflix.com/videos/Amateur-Finger-Fuck-33051.html',
'md5': 'b1bc15b6412d33902d6e5952035fcabc',
'info_dict': {
'id': '33051',
'ext': 'mp4',
'title': 'Amateur Finger Fuck',
'description': 'Amateur solo finger fucking.',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
age_limit = self._rta_search(webpage)
video_title = self._html_search_regex(
r'name="title" value="(?P<title>[^"]*)"', webpage, 'title')
video_description = self._html_search_regex(
r'name="description" value="([^"]*)"', webpage, 'description', fatal=False)
cfg_url = self._html_search_regex(
r'flashvars\.config = escape\("([^"]+)"',
webpage, 'flashvars.config')
cfg_xml = self._download_xml(
cfg_url, video_id, note='Downloading metadata',
transform_source=fix_xml_ampersands)
formats = [
{
'url': item.find('videoLink').text,
'format_id': item.find('res').text,
} for item in cfg_xml.findall('./quality/item')
]
thumbnail = cfg_xml.find('./startThumb').text
return {
'id': video_id,
'title': video_title,
'description': video_description,
'thumbnail': thumbnail,
'formats': formats,
'age_limit': age_limit,
}

View File

@@ -36,7 +36,7 @@ class EscapistIE(InfoExtractor):
r'<meta name="description" content="([^"]*)"',
webpage, 'description', fatal=False)
playerUrl = self._og_search_video_url(webpage, name=u'player URL')
playerUrl = self._og_search_video_url(webpage, name='player URL')
title = self._html_search_regex(
r'<meta name="title" content="([^"]*)"',

View File

@@ -0,0 +1,73 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
)
class ExpoTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
_TEST = {
'url': 'http://www.expotv.com/videos/reviews/1/24/LinneCardscom/17561',
'md5': '2985e6d7a392b2f7a05e0ca350fe41d0',
'info_dict': {
'id': '17561',
'ext': 'mp4',
'upload_date': '20060212',
'title': 'My Favorite Online Scrapbook Store',
'view_count': int,
'description': 'You\'ll find most everything you need at this virtual store front.',
'uploader': 'Anna T.',
'thumbnail': 're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
player_key = self._search_regex(
r'<param name="playerKey" value="([^"]+)"', webpage, 'player key')
config_url = 'http://client.expotv.com/video/config/%s/%s' % (
video_id, player_key)
config = self._download_json(
config_url, video_id,
note='Downloading video configuration')
formats = [{
'url': fcfg['file'],
'height': int_or_none(fcfg.get('height')),
'format_note': fcfg.get('label'),
'ext': self._search_regex(
r'filename=.*\.([a-z0-9_A-Z]+)&', fcfg['file'],
'file extension', default=None),
} for fcfg in config['sources']]
self._sort_formats(formats)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = config.get('image')
view_count = int_or_none(self._search_regex(
r'<h5>Plays: ([0-9]+)</h5>', webpage, 'view counts'))
uploader = self._search_regex(
r'<div class="reviewer">\s*<img alt="([^"]+)"', webpage, 'uploader',
fatal=False)
upload_date = unified_strdate(self._search_regex(
r'<h5>Reviewed on ([0-9/.]+)</h5>', webpage, 'upload date',
fatal=False))
return {
'id': video_id,
'formats': formats,
'title': title,
'description': description,
'view_count': view_count,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
}

View File

@@ -37,7 +37,7 @@ class ExtremeTubeIE(InfoExtractor):
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(
r'<h1 [^>]*?title="([^"]+)"[^>]*>\1<', webpage, 'title')
r'<h1 [^>]*?title="([^"]+)"[^>]*>', webpage, 'title')
uploader = self._html_search_regex(
r'>Posted by:(?=<)(?:\s|<[^>]*>)*(.+?)\|', webpage, 'uploader',
fatal=False)

View File

@@ -20,7 +20,7 @@ from ..utils import (
class FacebookIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?:\w+\.)?facebook\.com/
(?:[^#?]*\#!/)?
(?:[^#]*?\#!/)?
(?:video/video\.php|photo\.php|video/embed)\?(?:.*?)
(?:v|video_id)=(?P<id>[0-9]+)
(?:.*)'''

View File

@@ -0,0 +1,63 @@
#! -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
import hashlib
from .common import InfoExtractor
from ..utils import (
ExtractorError,
compat_urllib_request,
compat_urlparse,
)
class FC2IE(InfoExtractor):
_VALID_URL = r'^http://video\.fc2\.com/((?P<lang>[^/]+)/)?content/(?P<id>[^/]+)'
IE_NAME = 'fc2'
_TEST = {
'url': 'http://video.fc2.com/en/content/20121103kUan1KHs',
'md5': 'a6ebe8ebe0396518689d963774a54eb7',
'info_dict': {
'id': '20121103kUan1KHs',
'ext': 'flv',
'title': 'Boxing again with Puff',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
self._downloader.cookiejar.clear_session_cookies() # must clear
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
refer = url.replace('/content/', '/a/content/')
mimi = hashlib.md5((video_id + '_gGddgPfeaf_gzyr').encode('utf-8')).hexdigest()
info_url = (
"http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&".
format(video_id, mimi, compat_urllib_request.quote(refer, safe='').replace('.','%2E')))
info_webpage = self._download_webpage(
info_url, video_id, note='Downloading info page')
info = compat_urlparse.parse_qs(info_webpage)
if 'err_code' in info:
raise ExtractorError('Error code: %s' % info['err_code'][0])
video_url = info['filepath'][0] + '?mid=' + info['mid'][0]
title_info = info.get('title')
if title_info:
title = title_info[0]
return {
'id': video_id,
'title': title,
'url': video_url,
'ext': 'flv',
'thumbnail': thumbnail,
}

View File

@@ -0,0 +1,81 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
compat_urllib_parse,
compat_urllib_request,
)
class FiredriveIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?firedrive\.com/' + \
'(?:file|embed)/(?P<id>[0-9a-zA-Z]+)'
_FILE_DELETED_REGEX = r'<div class="removed_file_image">'
_TESTS = [{
'url': 'https://www.firedrive.com/file/FEB892FA160EBD01',
'md5': 'd5d4252f80ebeab4dc2d5ceaed1b7970',
'info_dict': {
'id': 'FEB892FA160EBD01',
'ext': 'flv',
'title': 'bbb_theora_486kbit.flv',
'thumbnail': 're:^http://.*\.jpg$',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = 'http://firedrive.com/file/%s' % video_id
webpage = self._download_webpage(url, video_id)
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
value="([^"]*)"
''', webpage))
post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
# Apparently, this header is required for confirmation to work.
req.add_header('Host', 'www.firedrive.com')
webpage = self._download_webpage(req, video_id,
'Downloading video page')
title = self._search_regex(r'class="external_title_left">(.+)</div>',
webpage, 'title')
thumbnail = self._search_regex(r'image:\s?"(//[^\"]+)', webpage,
'thumbnail', fatal=False)
if thumbnail is not None:
thumbnail = 'http:' + thumbnail
ext = self._search_regex(r'type:\s?\'([^\']+)\',',
webpage, 'extension', fatal=False)
video_url = self._search_regex(
r'file:\s?loadURL\(\'(http[^\']+)\'\),', webpage, 'file url')
formats = [{
'format_id': 'sd',
'url': video_url,
'ext': ext,
}]
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -15,6 +15,7 @@ class FirstpostIE(InfoExtractor):
'id': '1025403',
'ext': 'mp4',
'title': 'India to launch indigenous aircraft carrier INS Vikrant today',
'description': 'md5:feef3041cb09724e0bdc02843348f5f4',
}
}
@@ -22,13 +23,16 @@ class FirstpostIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id)
title = self._html_search_meta('twitter:title', page, 'title')
description = self._html_search_meta('twitter:description', page, 'title')
data = self._download_xml(
'http://www.firstpost.com/getvideoxml-%s.xml' % video_id, video_id,
'Downloading video XML')
item = data.find('./playlist/item')
thumbnail = item.find('./image').text
title = item.find('./title').text
formats = [
{
@@ -42,6 +46,7 @@ class FirstpostIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -19,17 +19,35 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
+ video_id, video_id, 'Downloading XML config')
manifest_url = info.find('videos/video/url').text
video_url = manifest_url.replace('manifest.f4m', 'index_2_av.m3u8')
video_url = video_url.replace('/z/', '/i/')
manifest_url = manifest_url.replace('/z/', '/i/')
if manifest_url.startswith('rtmp'):
formats = [{'url': manifest_url, 'ext': 'flv'}]
else:
formats = []
available_formats = self._search_regex(r'/[^,]*,(.*?),k\.mp4', manifest_url, 'available formats')
for index, format_descr in enumerate(available_formats.split(',')):
format_info = {
'url': manifest_url.replace('manifest.f4m', 'index_%d_av.m3u8' % index),
'ext': 'mp4',
}
m_resolution = re.search(r'(?P<width>\d+)x(?P<height>\d+)', format_descr)
if m_resolution is not None:
format_info.update({
'width': int(m_resolution.group('width')),
'height': int(m_resolution.group('height')),
})
formats.append(format_info)
thumbnail_path = info.find('image').text
return {'id': video_id,
'ext': 'flv' if video_url.startswith('rtmp') else 'mp4',
'url': video_url,
'title': info.find('titre').text,
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
'description': info.find('synopsis').text,
}
return {
'id': video_id,
'title': info.find('titre').text,
'formats': formats,
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
'description': info.find('synopsis').text,
}
class PluzzIE(FranceTVBaseInfoExtractor):
@@ -48,24 +66,36 @@ class PluzzIE(FranceTVBaseInfoExtractor):
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+)\.html'
_VALID_URL = r'https?://(?:www|mobile)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
_TEST = {
_TESTS = [{
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
'file': '84981923.mp4',
'info_dict': {
'id': '84981923',
'ext': 'mp4',
'title': 'Soir 3',
},
'params': {
'skip_download': True,
},
}
}, {
'url': 'http://www.francetvinfo.fr/elections/europeennes/direct-europeennes-regardez-le-debat-entre-les-candidats-a-la-presidence-de-la-commission_600639.html',
'info_dict': {
'id': 'EV_20019',
'ext': 'mp4',
'title': 'Débat des candidats à la Commission européenne',
'description': 'Débat des candidats à la Commission européenne',
},
'params': {
'skip_download': 'HLS (reqires ffmpeg)'
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)[@"]', webpage, 'video id')
video_id = self._search_regex(r'id-video=((?:[^0-9]*?_)?[0-9]+)[@"]', webpage, 'video id')
return self._extract_video(video_id)
@@ -199,7 +229,7 @@ class GenerationQuoiIE(InfoExtractor):
class CultureboxIE(FranceTVBaseInfoExtractor):
IE_NAME = 'culturebox.francetvinfo.fr'
_VALID_URL = r'https?://culturebox\.francetvinfo\.fr/(?P<name>.*?)(\?|$)'
_VALID_URL = r'https?://(?:m\.)?culturebox\.francetvinfo\.fr/(?P<name>.*?)(\?|$)'
_TEST = {
'url': 'http://culturebox.francetvinfo.fr/einstein-on-the-beach-au-theatre-du-chatelet-146813',

View File

@@ -4,22 +4,32 @@ import json
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
_TEST = {
_TESTS = [{
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'file': '0732f586d7.mp4',
'md5': 'f647e9e90064b53b6e046e75d0241fbd',
'md5': 'bcd81e0c4f26189ee09be362ad6e6ba9',
'info_dict': {
'description': ('Lyrics changed to match the video. Spoken cameo '
'by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a '
'concept by Dustin McLean (DustFilms.com). Performed, edited, '
'and written by David A. Scott.'),
'id': '0732f586d7',
'ext': 'mp4',
'title': 'Heart-Shaped Box: Literal Video Version',
'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
'thumbnail': 're:^http:.*\.jpg$',
},
}
}, {
'url': 'http://www.funnyordie.com/embed/e402820827',
'md5': 'ff4d83318f89776ed0250634cfaa8d36',
'info_dict': {
'id': 'e402820827',
'ext': 'mp4',
'title': 'Please Use This Song (Jon Lajoie)',
'description': 'Please use this to sell something. www.jonlajoie.com',
'thumbnail': 're:^http:.*\.jpg$',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -27,27 +37,34 @@ class FunnyOrDieIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, 'video URL', flags=re.DOTALL)
links = re.findall(r'<source src="([^"]+/v)\d+\.([^"]+)" type=\'video', webpage)
if not links:
raise ExtractorError('No media links available for %s' % video_id)
if mobj.group('type') == 'embed':
post_json = self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
post = json.loads(post_json)
title = post['name']
description = post.get('description')
thumbnail = post.get('picture')
else:
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = None
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
bitrates = self._html_search_regex(r'<source src="[^"]+/v,((?:\d+,)+)\.mp4\.csmil', webpage, 'video bitrates')
bitrates = [int(b) for b in bitrates.rstrip(',').split(',')]
bitrates.sort()
formats = []
for bitrate in bitrates:
for link in links:
formats.append({
'url': '%s%d.%s' % (link[0], bitrate, link[1]),
'format_id': '%s-%d' % (link[1], bitrate),
'vbr': bitrate,
})
post_json = self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
post = json.loads(post_json)
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'description': description,
'thumbnail': thumbnail,
'title': post['name'],
'description': post.get('description'),
'thumbnail': post.get('picture'),
'formats': formats,
}

View File

@@ -15,7 +15,7 @@ class GamekingsIE(InfoExtractor):
'id': '20130811',
'ext': 'mp4',
'title': 'Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review',
'description': 'md5:632e61a9f97d700e83f43d77ddafb6a4',
'description': 'md5:36fd701e57e8c15ac8682a2374c99731',
}
}

View File

@@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
xpath_with_ns,
parse_iso8601
)
NAMESPACE_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
# URL prefix to download the mp4 files directly instead of streaming via rtmp
# Credits go to XBox-Maniac
# http://board.jdownloader.org/showpost.php?p=185835&postcount=31
RAW_MP4_URL = 'http://cdn.riptide-mtvn.com/'
class GameOneIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameone\.de/tv/(?P<id>\d+)'
_TEST = {
'url': 'http://www.gameone.de/tv/288',
'md5': '136656b7fb4c9cb4a8e2d500651c499b',
'info_dict': {
'id': '288',
'ext': 'mp4',
'title': 'Game One - Folge 288',
'duration': 1238,
'thumbnail': 'http://s3.gameone.de/gameone/assets/video_metas/teaser_images/000/643/636/big/640x360.jpg',
'description': 'FIFA-Pressepokal 2014, Star Citizen, Kingdom Come: Deliverance, Project Cars, Schöner Trants Nerdquiz Folge 2 Runde 1',
'age_limit': 16,
'upload_date': '20140513',
'timestamp': 1399980122,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage, secure=False)
description = self._html_search_meta('description', webpage)
age_limit = int(
self._search_regex(
r'age=(\d+)',
self._html_search_meta(
'age-de-meta-label',
webpage),
'age_limit',
'0'))
mrss_url = self._search_regex(r'mrss=([^&]+)', og_video, 'mrss')
mrss = self._download_xml(mrss_url, video_id, 'Downloading mrss')
title = mrss.find('.//item/title').text
thumbnail = mrss.find('.//item/image').get('url')
timestamp = parse_iso8601(mrss.find('.//pubDate').text, delimiter=' ')
content = mrss.find(xpath_with_ns('.//media:content', NAMESPACE_MAP))
content_url = content.get('url')
content = self._download_xml(
content_url,
video_id,
'Downloading media:content')
rendition_items = content.findall('.//rendition')
duration = int(rendition_items[0].get('duration'))
formats = [
{
'url': re.sub(r'.*/(r2)', RAW_MP4_URL + r'\1', r.find('./src').text),
'width': int(r.get('width')),
'height': int(r.get('height')),
'tbr': int(r.get('bitrate')),
}
for r in rendition_items
]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'description': description,
'age_limit': age_limit,
'timestamp': timestamp,
}
class GameOnePlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameone\.de(?:/tv)?/?$'
IE_NAME = 'gameone:playlist'
_TEST = {
'url': 'http://www.gameone.de/tv',
'info_dict': {
'title': 'GameOne',
},
'playlist_mincount': 294,
}
def _real_extract(self, url):
webpage = self._download_webpage('http://www.gameone.de/tv', 'TV')
max_id = max(map(int, re.findall(r'<a href="/tv/(\d+)"', webpage)))
entries = [
self.url_result('http://www.gameone.de/tv/%d' % video_id, 'GameOne')
for video_id in range(max_id, 0, -1)]
return {
'_type': 'playlist',
'title': 'GameOne',
'entries': entries,
}

View File

@@ -15,11 +15,12 @@ from ..utils import (
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = {
"url": "http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
"file": "gs-2300-6410818.mp4",
"md5": "b2a30deaa8654fcccd43713a6b6a4825",
"info_dict": {
"title": "Arma 3 - Community Guide: SITREP I",
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
'md5': 'b2a30deaa8654fcccd43713a6b6a4825',
'info_dict': {
'id': 'gs-2300-6410818',
'ext': 'mp4',
'title': 'Arma 3 - Community Guide: SITREP I',
'description': 'Check out this video where some of the basics of Arma 3 is explained.',
}
}

View File

@@ -0,0 +1,74 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
str_to_int,
unified_strdate,
)
class GameStarIE(InfoExtractor):
_VALID_URL = r'http://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
'info_dict': {
'id': '76110',
'ext': 'mp4',
'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den vollständigen Trailer an.',
'thumbnail': 'http://images.gamestar.de/images/idgwpgsgp/bdb/2494525/600x.jpg',
'upload_date': '20140728',
'duration': 17
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
og_title = self._og_search_title(webpage)
title = og_title.replace(' - Video bei GameStar.de', '').strip()
url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
description = self._og_search_description(webpage).strip()
thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:')
upload_date = unified_strdate(self._html_search_regex(
r'<span style="float:left;font-size:11px;">Datum: ([0-9]+\.[0-9]+\.[0-9]+)&nbsp;&nbsp;',
webpage, 'upload_date', fatal=False))
duration = parse_duration(self._html_search_regex(
r'&nbsp;&nbsp;Länge: ([0-9]+:[0-9]+)</span>', webpage, 'duration',
fatal=False))
view_count = str_to_int(self._html_search_regex(
r'&nbsp;&nbsp;Zuschauer: ([0-9\.]+)&nbsp;&nbsp;', webpage,
'view_count', fatal=False))
comment_count = int_or_none(self._html_search_regex(
r'>Kommentieren \(([0-9]+)\)</a>', webpage, 'comment_count',
fatal=False))
return {
'id': video_id,
'title': title,
'url': url,
'ext': 'mp4',
'thumbnail': thumbnail,
'description': description,
'upload_date': upload_date,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count
}

View File

@@ -8,6 +8,7 @@ from ..utils import (
compat_urllib_request,
)
class GDCVaultIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gdcvault\.com/play/(?P<id>\d+)/(?P<name>(\w|-)+)'
_TESTS = [
@@ -31,6 +32,15 @@ class GDCVaultIE(InfoExtractor):
'skip_download': True, # Requires rtmpdump
}
},
{
'url': 'http://www.gdcvault.com/play/1015301/Thexder-Meets-Windows-95-or',
'md5': 'a5eb77996ef82118afbbe8e48731b98e',
'info_dict': {
'id': '1015301',
'ext': 'flv',
'title': 'Thexder Meets Windows 95, or Writing Great Games in the Windows 95 Environment',
}
}
]
def _parse_mp4(self, xml_description):
@@ -103,18 +113,40 @@ class GDCVaultIE(InfoExtractor):
webpage_url = 'http://www.gdcvault.com/play/' + video_id
start_page = self._download_webpage(webpage_url, video_id)
xml_root = self._html_search_regex(r'<iframe src="(?P<xml_root>.*?)player.html.*?".*?</iframe>', start_page, 'xml root', None, False)
direct_url = self._search_regex(
r's1\.addVariable\("file",\s*encodeURIComponent\("(/[^"]+)"\)\);',
start_page, 'url', default=None)
if direct_url:
video_url = 'http://www.gdcvault.com/' + direct_url
title = self._html_search_regex(
r'<td><strong>Session Name</strong></td>\s*<td>(.*?)</td>',
start_page, 'title')
return {
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': title,
}
xml_root = self._html_search_regex(
r'<iframe src="(?P<xml_root>.*?)player.html.*?".*?</iframe>',
start_page, 'xml root', default=None)
if xml_root is None:
# Probably need to authenticate
start_page = self._login(webpage_url, video_id)
if start_page is None:
login_res = self._login(webpage_url, video_id)
if login_res is None:
self.report_warning('Could not login.')
else:
start_page = login_res
# Grab the url from the authenticated page
xml_root = self._html_search_regex(r'<iframe src="(?P<xml_root>.*?)player.html.*?".*?</iframe>', start_page, 'xml root')
xml_root = self._html_search_regex(
r'<iframe src="(.*?)player.html.*?".*?</iframe>',
start_page, 'xml root')
xml_name = self._html_search_regex(r'<iframe src=".*?\?xml=(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename', None, False)
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml=(.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename', default=None)
if xml_name is None:
# Fallback to the older format
xml_name = self._html_search_regex(r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename')

View File

@@ -8,18 +8,19 @@ import re
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
ExtractorError,
float_or_none,
HEADRequest,
orderedSet,
parse_xml,
smuggle_url,
unescapeHTML,
unified_strdate,
unsmuggle_url,
url_basename,
)
from .brightcove import BrightcoveIE
@@ -260,6 +261,96 @@ class GenericIE(InfoExtractor):
'uploader': 'Spi0n',
},
'add_ie': ['Dailymotion'],
},
# YouTube embed
{
'url': 'http://www.badzine.de/ansicht/datum/2014/06/09/so-funktioniert-die-neue-englische-badminton-liga.html',
'info_dict': {
'id': 'FXRb4ykk4S0',
'ext': 'mp4',
'title': 'The NBL Auction 2014',
'uploader': 'BADMINTON England',
'uploader_id': 'BADMINTONEvents',
'upload_date': '20140603',
'description': 'md5:9ef128a69f1e262a700ed83edb163a73',
},
'add_ie': ['Youtube'],
'params': {
'skip_download': True,
}
},
# MTVSercices embed
{
'url': 'http://www.gametrailers.com/news-post/76093/north-america-europe-is-getting-that-mario-kart-8-mercedes-dlc-too',
'md5': '35727f82f58c76d996fc188f9755b0d5',
'info_dict': {
'id': '0306a69b-8adf-4fb5-aace-75f8e8cbfca9',
'ext': 'mp4',
'title': 'Review',
'description': 'Mario\'s life in the fast lane has never looked so good.',
},
},
# YouTube embed via <data-embed-url="">
{
'url': 'https://play.google.com/store/apps/details?id=com.gameloft.android.ANMP.GloftA8HM',
'info_dict': {
'id': 'jpSGZsgga_I',
'ext': 'mp4',
'title': 'Asphalt 8: Airborne - Launch Trailer',
'uploader': 'Gameloft',
'uploader_id': 'gameloft',
'upload_date': '20130821',
'description': 'md5:87bd95f13d8be3e7da87a5f2c443106a',
},
'params': {
'skip_download': True,
}
},
# Camtasia studio
{
'url': 'http://www.ll.mit.edu/workshops/education/videocourses/antennas/lecture1/video/',
'playlist': [{
'md5': '0c5e352edabf715d762b0ad4e6d9ee67',
'info_dict': {
'id': 'Fenn-AA_PA_Radar_Course_Lecture_1c_Final',
'title': 'Fenn-AA_PA_Radar_Course_Lecture_1c_Final - video1',
'ext': 'flv',
'duration': 2235.90,
}
}, {
'md5': '10e4bb3aaca9fd630e273ff92d9f3c63',
'info_dict': {
'id': 'Fenn-AA_PA_Radar_Course_Lecture_1c_Final_PIP',
'title': 'Fenn-AA_PA_Radar_Course_Lecture_1c_Final - pip',
'ext': 'flv',
'duration': 2235.93,
}
}],
'info_dict': {
'title': 'Fenn-AA_PA_Radar_Course_Lecture_1c_Final',
}
},
# Flowplayer
{
'url': 'http://www.handjobhub.com/video/busty-blonde-siri-tit-fuck-while-wank-6313.html',
'md5': '9d65602bf31c6e20014319c7d07fba27',
'info_dict': {
'id': '5123ea6d5e5a7',
'ext': 'mp4',
'age_limit': 18,
'uploader': 'www.handjobhub.com',
'title': 'Busty Blonde Siri Tit Fuck While Wank at Handjob Hub',
}
},
# RSS feed
{
'url': 'http://phihag.de/2014/youtube-dl/rss2.xml',
'info_dict': {
'id': 'http://phihag.de/2014/youtube-dl/rss2.xml',
'title': 'Zero Punctuation',
'description': 're:'
},
'playlist_mincount': 11,
}
]
@@ -273,58 +364,6 @@ class GenericIE(InfoExtractor):
"""Report information extraction."""
self._downloader.to_screen('[redirect] Following redirect to %s' % new_url)
def _send_head(self, url):
"""Check if it is a redirect, like url shorteners, in case return the new url."""
class HEADRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
"""
Subclass the HTTPRedirectHandler to make it use our
HEADRequest also on the redirected URL
"""
def redirect_request(self, req, fp, code, msg, headers, newurl):
if code in (301, 302, 303, 307):
newurl = newurl.replace(' ', '%20')
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type"))
try:
# This function was deprecated in python 3.3 and removed in 3.4
origin_req_host = req.get_origin_req_host()
except AttributeError:
origin_req_host = req.origin_req_host
return HEADRequest(newurl,
headers=newheaders,
origin_req_host=origin_req_host,
unverifiable=True)
else:
raise compat_urllib_error.HTTPError(req.get_full_url(), code, msg, headers, fp)
class HTTPMethodFallback(compat_urllib_request.BaseHandler):
"""
Fallback to GET if HEAD is not allowed (405 HTTP error)
"""
def http_error_405(self, req, fp, code, msg, headers):
fp.read()
fp.close()
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type"))
return self.parent.open(compat_urllib_request.Request(req.get_full_url(),
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
unverifiable=True))
# Build our opener
opener = compat_urllib_request.OpenerDirector()
for handler in [compat_urllib_request.HTTPHandler, compat_urllib_request.HTTPDefaultErrorHandler,
HTTPMethodFallback, HEADRedirectHandler,
compat_urllib_request.HTTPErrorProcessor, compat_urllib_request.HTTPSHandler]:
opener.add_handler(handler())
response = opener.open(HEADRequest(url))
if response is None:
raise ExtractorError('Invalid URL protocol')
return response
def _extract_rss(self, url, video_id, doc):
playlist_title = doc.find('./channel/title').text
playlist_desc_el = doc.find('./channel/description')
@@ -344,45 +383,104 @@ class GenericIE(InfoExtractor):
'entries': entries,
}
def _extract_camtasia(self, url, video_id, webpage):
""" Returns None if no camtasia video can be found. """
camtasia_cfg = self._search_regex(
r'fo\.addVariable\(\s*"csConfigFile",\s*"([^"]+)"\s*\);',
webpage, 'camtasia configuration file', default=None)
if camtasia_cfg is None:
return None
title = self._html_search_meta('DC.title', webpage, fatal=True)
camtasia_url = compat_urlparse.urljoin(url, camtasia_cfg)
camtasia_cfg = self._download_xml(
camtasia_url, video_id,
note='Downloading camtasia configuration',
errnote='Failed to download camtasia configuration')
fileset_node = camtasia_cfg.find('./playlist/array/fileset')
entries = []
for n in fileset_node.getchildren():
url_n = n.find('./uri')
if url_n is None:
continue
entries.append({
'id': os.path.splitext(url_n.text.rpartition('/')[2])[0],
'title': '%s - %s' % (title, n.tag),
'url': compat_urlparse.urljoin(url, url_n.text),
'duration': float_or_none(n.find('./duration').text),
})
return {
'_type': 'playlist',
'entries': entries,
'title': title,
}
def _real_extract(self, url):
if url.startswith('//'):
return {
'_type': 'url',
'url': (
'http:'
if self._downloader.params.get('prefer_insecure', False)
else 'https:') + url,
'url': self.http_scheme() + url,
}
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:
default_search = self._downloader.params.get('default_search')
if default_search is None:
default_search = 'auto_warning'
default_search = 'fixup_error'
if default_search in ('auto', 'auto_warning'):
if default_search in ('auto', 'auto_warning', 'fixup_error'):
if '/' in url:
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
else:
elif default_search != 'fixup_error':
if default_search == 'auto_warning':
self._downloader.report_warning(
'Falling back to youtube search for %s . Set --default-search to "auto" to suppress this warning.' % url)
if re.match(r'^(?:url|URL)$', url):
raise ExtractorError(
'Invalid URL: %r . Call youtube-dl like this: youtube-dl -v "https://www.youtube.com/watch?v=BaW_jenozKc" ' % url,
expected=True)
else:
self._downloader.report_warning(
'Falling back to youtube search for %s . Set --default-search "auto" to suppress this warning.' % url)
return self.url_result('ytsearch:' + url)
if default_search in ('error', 'fixup_error'):
raise ExtractorError(
('%r is not a valid URL. '
'Set --default-search "ytsearch" (or run youtube-dl "ytsearch:%s" ) to search YouTube'
) % (url, url), expected=True)
else:
assert ':' in default_search
return self.url_result(default_search + url)
video_id = os.path.splitext(url.rstrip('/').split('/')[-1])[0]
url, smuggled_data = unsmuggle_url(url)
force_videoid = None
if smuggled_data and 'force_videoid' in smuggled_data:
force_videoid = smuggled_data['force_videoid']
video_id = force_videoid
else:
video_id = os.path.splitext(url.rstrip('/').split('/')[-1])[0]
self.to_screen('%s: Requesting header' % video_id)
try:
response = self._send_head(url)
head_req = HEADRequest(url)
response = self._request_webpage(
head_req, video_id,
note=False, errnote='Could not send HEAD request to %s' % url,
fatal=False)
if response is not False:
# Check for redirect
new_url = response.geturl()
if url != new_url:
self.report_following_redirect(new_url)
if force_videoid:
new_url = smuggle_url(
new_url, {'force_videoid': force_videoid})
return self.url_result(new_url)
# Check for direct link to a video
@@ -403,10 +501,6 @@ class GenericIE(InfoExtractor):
'upload_date': upload_date,
}
except compat_urllib_error.HTTPError:
# This may be a stupid server that doesn't like HEAD, our UA, or so
pass
try:
webpage = self._download_webpage(url, video_id)
except ValueError:
@@ -424,6 +518,11 @@ class GenericIE(InfoExtractor):
except compat_xml_parse_error:
pass
# Is it a Camtasia project?
camtasia_res = self._extract_camtasia(url, video_id, webpage)
if camtasia_res is not None:
return camtasia_res
# Sometimes embedded video player is hidden behind percent encoding
# (e.g. https://github.com/rg3/youtube-dl/issues/2448)
# Unescaping the whole page allows to handle those cases in a generic way
@@ -439,10 +538,26 @@ class GenericIE(InfoExtractor):
r'(?s)<title>(.*?)</title>', webpage, 'video title',
default='video')
# Try to detect age limit automatically
age_limit = self._rta_search(webpage)
# And then there are the jokers who advertise that they use RTA,
# but actually don't.
AGE_LIMIT_MARKERS = [
r'Proudly Labeled <a href="http://www.rtalabel.org/" title="Restricted to Adults">RTA</a>',
]
if any(re.search(marker, webpage) for marker in AGE_LIMIT_MARKERS):
age_limit = 18
# video uploader is domain name
video_uploader = self._search_regex(
r'^(?:https?://)?([^/]*)/.*', url, 'video uploader')
# Helper method
def _playlist_from_matches(matches, getter, ie=None):
urlrs = orderedSet(self.url_result(getter(m), ie) for m in matches)
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
# Look for BrightCove:
bc_urls = BrightcoveIE._extract_brightcove_urls(webpage)
if bc_urls:
@@ -476,24 +591,26 @@ class GenericIE(InfoExtractor):
# Look for embedded YouTube player
matches = re.findall(r'''(?x)
(?:<iframe[^>]+?src=|embedSWF\(\s*)
(["\'])(?P<url>(?:https?:)?//(?:www\.)?youtube\.com/
(?:
<iframe[^>]+?src=|
data-video-url=|
<embed[^>]+?src=|
embedSWF\(?:\s*
)
(["\'])
(?P<url>(?:https?:)?//(?:www\.)?youtube\.com/
(?:embed|v)/.+?)
\1''', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Youtube')
for tuppl in matches]
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
return _playlist_from_matches(
matches, lambda m: unescapeHTML(m[1]), ie='Youtube')
# Look for embedded Dailymotion player
matches = re.findall(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/embed/video/.+?)\1', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(tuppl[1]))
for tuppl in matches]
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
return _playlist_from_matches(
matches, lambda m: unescapeHTML(m[1]))
# Look for embedded Wistia player
match = re.search(
@@ -512,7 +629,7 @@ class GenericIE(InfoExtractor):
mobj = re.search(r'<meta\s[^>]*https?://api\.blip\.tv/\w+/redirect/\w+/(\d+)', webpage)
if mobj:
return self.url_result('http://blip.tv/a/a-'+mobj.group(1), 'BlipTV')
mobj = re.search(r'<(?:iframe|embed|object)\s[^>]*(https?://(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)[a-zA-Z0-9]+)', webpage)
mobj = re.search(r'<(?:iframe|embed|object)\s[^>]*(https?://(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)[a-zA-Z0-9_]+)', webpage)
if mobj:
return self.url_result(mobj.group(1), 'BlipTV')
@@ -563,7 +680,7 @@ class GenericIE(InfoExtractor):
# Look for embedded NovaMov-based player
mobj = re.search(
r'''(?x)<iframe[^>]+?src=(["\'])
r'''(?x)<(?:pagespeed_)?iframe[^>]+?src=(["\'])
(?P<url>http://(?:(?:embed|www)\.)?
(?:novamov\.com|
nowvideo\.(?:ch|sx|eu|at|ag|co)|
@@ -585,6 +702,11 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group('url'), 'VK')
# Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Ivi')
# Look for embedded Huffington Post player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://embed\.live\.huffingtonpost\.com/.+?)\1', webpage)
@@ -602,10 +724,8 @@ class GenericIE(InfoExtractor):
# Look for funnyordie embed
matches = re.findall(r'<iframe[^>]+?src="(https?://(?:www\.)?funnyordie\.com/embed/[^"]+)"', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(eurl), 'FunnyOrDie')
for eurl in matches]
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
return _playlist_from_matches(
matches, getter=unescapeHTML, ie='FunnyOrDie')
# Look for embedded RUTV player
rutv_url = RUTVIE._extract_url(webpage)
@@ -636,6 +756,44 @@ class GenericIE(InfoExtractor):
if smotri_url:
return self.url_result(smotri_url, 'Smotri')
# Look for embeded soundcloud player
mobj = re.search(
r'<iframe src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url)
# Look for embedded vulture.com player
mobj = re.search(
r'<iframe src="(?P<url>https?://video\.vulture\.com/[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url, ie='Vulture')
# Look for embedded mtvservices player
mobj = re.search(
r'<iframe src="(?P<url>https?://media\.mtvnservices\.com/embed/[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url, ie='MTVServicesEmbedded')
# Look for embedded yahoo player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:screen|movies)\.yahoo\.com/.+?\.html\?format=embed)\1',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Yahoo')
# Look for embedded sbs.com.au player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:www\.)sbs\.com\.au/ondemand/video/single/.+?)\1',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'SBS')
# Start with something easy: JW Player in SWFObject
found = re.findall(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if not found:
@@ -653,6 +811,14 @@ class GenericIE(InfoExtractor):
if not found:
# Broaden the findall a little bit: JWPlayer JS loader
found = re.findall(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage)
if not found:
# Flow player
found = re.findall(r'''(?xs)
flowplayer\("[^"]+",\s*
\{[^}]+?\}\s*,
\s*{[^}]+? ["']?clip["']?\s*:\s*\{\s*
["']?url["']?\s*:\s*["']([^"']+)["']
''', webpage)
if not found:
# Try to find twitter cards info
found = re.findall(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)
@@ -662,12 +828,18 @@ class GenericIE(InfoExtractor):
m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:
found = re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
def check_video(vurl):
vpath = compat_urlparse.urlparse(vurl).path
vext = determine_ext(vpath)
return '.' in vpath and vext not in ('swf', 'png', 'jpg')
found = list(filter(
check_video,
re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)))
if not found:
# HTML5 video
found = re.findall(r'(?s)<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage)
found = re.findall(r'(?s)<video[^<]*(?:>.*?<source[^>]+)? src="([^"]+)"', webpage)
if not found:
found = re.findall(
found = re.search(
r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'
r'(?:[a-z-]+="[^"]+"\s+)*?content="[0-9]{,2};url=\'([^\']+)\'"',
webpage)
@@ -699,10 +871,11 @@ class GenericIE(InfoExtractor):
'url': video_url,
'uploader': video_uploader,
'title': video_title,
'age_limit': age_limit,
})
if len(entries) == 1:
return entries[1]
return entries[0]
else:
for num, e in enumerate(entries, start=1):
e['title'] = '%s (%d)' % (e['title'], num)

View File

@@ -0,0 +1,58 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
parse_iso8601,
)
class GodTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?godtube\.com/watch/\?v=(?P<id>[\da-zA-Z]+)'
_TESTS = [
{
'url': 'https://www.godtube.com/watch/?v=0C0CNNNU',
'md5': '77108c1e4ab58f48031101a1a2119789',
'info_dict': {
'id': '0C0CNNNU',
'ext': 'mp4',
'title': 'Woman at the well.',
'duration': 159,
'timestamp': 1205712000,
'uploader': 'beverlybmusic',
'upload_date': '20080317',
'thumbnail': 're:^https?://.*\.jpg$',
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
config = self._download_xml(
'http://www.godtube.com/resource/mediaplayer/%s.xml' % video_id.lower(),
video_id, 'Downloading player config XML')
video_url = config.find('.//file').text
uploader = config.find('.//author').text
timestamp = parse_iso8601(config.find('.//date').text)
duration = parse_duration(config.find('.//duration').text)
thumbnail = config.find('.//image').text
media = self._download_xml(
'http://www.godtube.com/media/xml/?v=%s' % video_id, video_id, 'Downloading media XML')
title = media.find('.//title').text
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader': uploader,
'duration': duration,
}

View File

@@ -52,8 +52,7 @@ class GooglePlusIE(InfoExtractor):
# Extract title
# Get the first line for title
video_title = self._html_search_regex(r'<meta name\=\"Description\" content\=\"(.*?)[\n<"]',
webpage, 'title', default='NA')
video_title = self._og_search_description(webpage).splitlines()[0]
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com/'

View File

@@ -14,6 +14,14 @@ class GoogleSearchIE(SearchInfoExtractor):
_MAX_RESULTS = 1000
IE_NAME = 'video.google:search'
_SEARCH_KEY = 'gvsearch'
_TEST = {
'url': 'gvsearch15:python language',
'info_dict': {
'id': 'python language',
'title': 'python language',
},
'playlist_count': 15,
}
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""

View File

@@ -0,0 +1,88 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
compat_urllib_parse,
compat_urllib_request,
)
class GorillaVidIE(InfoExtractor):
IE_DESC = 'GorillaVid.in and daclips.in'
_VALID_URL = r'''(?x)
https?://(?P<host>(?:www\.)?
(?:daclips\.in|gorillavid\.in))/
(?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:-[0-9]+x[0-9]+\.html)?
'''
_TESTS = [{
'url': 'http://gorillavid.in/06y9juieqpmi',
'md5': '5ae4a3580620380619678ee4875893ba',
'info_dict': {
'id': '06y9juieqpmi',
'ext': 'flv',
'title': 'Rebecca Black My Moment Official Music Video Reaction',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
'md5': 'c9e293ca74d46cad638e199c3f3fe604',
'info_dict': {
'id': 'z08zf8le23c6',
'ext': 'mp4',
'title': 'Say something nice',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://daclips.in/3rso4kdn6f9m',
'md5': '1ad8fd39bb976eeb66004d3a4895f106',
'info_dict': {
'id': '3rso4kdn6f9m',
'ext': 'mp4',
'title': 'Micro Pig piglets ready on 16th July 2009',
'thumbnail': 're:http://.*\.jpg',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage('http://%s/%s' % (mobj.group('host'), video_id), video_id)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
(?:id="[^"]+"\s+)?
value="([^"]*)"
''', webpage))
if fields['op'] == 'download1':
post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(req, video_id, 'Downloading video page')
title = self._search_regex(r'style="z-index: [0-9]+;">([0-9a-zA-Z ]+)(?:-.+)?</span>', webpage, 'title')
thumbnail = self._search_regex(r'image:\'(http[^\']+)\',', webpage, 'thumbnail')
url = self._search_regex(r'file: \'(http[^\']+)\',', webpage, 'file url')
formats = [{
'format_id': 'sd',
'url': url,
'ext': determine_ext(url),
'quality': 1,
}]
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
str_to_int,
ExtractorError,
)
import json
class GoshgayIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)www.goshgay.com/video(?P<id>\d+?)($|/)'
_TEST = {
'url': 'http://www.goshgay.com/video4116282',
'md5': '268b9f3c3229105c57859e166dd72b03',
'info_dict': {
'id': '4116282',
'ext': 'flv',
'title': 'md5:089833a4790b5e103285a07337f245bf',
'thumbnail': 're:http://.*\.jpg',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._search_regex(r'class="video-title"><h1>(.+?)<', webpage, 'title')
player_config = self._search_regex(
r'(?s)jwplayer\("player"\)\.setup\(({.+?})\)', webpage, 'config settings')
player_vars = json.loads(player_config.replace("'", '"'))
width = str_to_int(player_vars.get('width'))
height = str_to_int(player_vars.get('height'))
config_uri = player_vars.get('config')
if config_uri is None:
raise ExtractorError('Missing config URI')
node = self._download_xml(config_uri, video_id, 'Downloading player config XML',
errnote='Unable to download XML')
if node is None:
raise ExtractorError('Missing config XML')
if node.tag != 'config':
raise ExtractorError('Missing config attribute')
fns = node.findall('file')
imgs = node.findall('image')
if len(fns) != 1:
raise ExtractorError('Missing media URI')
video_url = fns[0].text
if len(imgs) < 1:
thumbnail = None
else:
thumbnail = imgs[0].text
url_comp = compat_urlparse.urlparse(url)
ref = "%s://%s%s" % (url_comp[0], url_comp[1], url_comp[2])
return {
'id': video_id,
'url': video_url,
'title': title,
'width': width,
'height': height,
'thumbnail': thumbnail,
'http_referer': ref,
'age_limit': 18,
}

View File

@@ -0,0 +1,190 @@
# coding: utf-8
from __future__ import unicode_literals
import time
import math
import os.path
import re
from .common import InfoExtractor
from ..utils import ExtractorError, compat_urllib_request, compat_html_parser
from ..utils import (
compat_urllib_parse,
compat_urlparse,
)
class GroovesharkHtmlParser(compat_html_parser.HTMLParser):
def __init__(self):
self._current_object = None
self.objects = []
compat_html_parser.HTMLParser.__init__(self)
def handle_starttag(self, tag, attrs):
attrs = dict((k, v) for k, v in attrs)
if tag == 'object':
self._current_object = {'attrs': attrs, 'params': []}
elif tag == 'param':
self._current_object['params'].append(attrs)
def handle_endtag(self, tag):
if tag == 'object':
self.objects.append(self._current_object)
self._current_object = None
@classmethod
def extract_object_tags(cls, html):
p = cls()
p.feed(html)
p.close()
return p.objects
class GroovesharkIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?grooveshark\.com/#!/s/([^/]+)/([^/]+)'
_TEST = {
'url': 'http://grooveshark.com/#!/s/Jolene+Tenth+Key+Remix+Ft+Will+Sessions/6SS1DW?src=5',
'md5': '7ecf8aefa59d6b2098517e1baa530023',
'info_dict': {
'id': '6SS1DW',
'title': 'Jolene (Tenth Key Remix ft. Will Sessions)',
'ext': 'mp3',
'duration': 227,
}
}
do_playerpage_request = True
do_bootstrap_request = True
def _parse_target(self, target):
uri = compat_urlparse.urlparse(target)
hash = uri.fragment[1:].split('?')[0]
token = os.path.basename(hash.rstrip('/'))
return (uri, hash, token)
def _build_bootstrap_url(self, target):
(uri, hash, token) = self._parse_target(target)
query = 'getCommunicationToken=1&hash=%s&%d' % (compat_urllib_parse.quote(hash, safe=''), self.ts)
return (compat_urlparse.urlunparse((uri.scheme, uri.netloc, '/preload.php', None, query, None)), token)
def _build_meta_url(self, target):
(uri, hash, token) = self._parse_target(target)
query = 'hash=%s&%d' % (compat_urllib_parse.quote(hash, safe=''), self.ts)
return (compat_urlparse.urlunparse((uri.scheme, uri.netloc, '/preload.php', None, query, None)), token)
def _build_stream_url(self, meta):
return compat_urlparse.urlunparse(('http', meta['streamKey']['ip'], '/stream.php', None, None, None))
def _build_swf_referer(self, target, obj):
(uri, _, _) = self._parse_target(target)
return compat_urlparse.urlunparse((uri.scheme, uri.netloc, obj['attrs']['data'], None, None, None))
def _transform_bootstrap(self, js):
return re.split('(?m)^\s*try\s*{', js)[0] \
.split(' = ', 1)[1].strip().rstrip(';')
def _transform_meta(self, js):
return js.split('\n')[0].split('=')[1].rstrip(';')
def _get_meta(self, target):
(meta_url, token) = self._build_meta_url(target)
self.to_screen('Metadata URL: %s' % meta_url)
headers = {'Referer': compat_urlparse.urldefrag(target)[0]}
req = compat_urllib_request.Request(meta_url, headers=headers)
res = self._download_json(req, token,
transform_source=self._transform_meta)
if 'getStreamKeyWithSong' not in res:
raise ExtractorError(
'Metadata not found. URL may be malformed, or Grooveshark API may have changed.')
if res['getStreamKeyWithSong'] is None:
raise ExtractorError(
'Metadata download failed, probably due to Grooveshark anti-abuse throttling. Wait at least an hour before retrying from this IP.',
expected=True)
return res['getStreamKeyWithSong']
def _get_bootstrap(self, target):
(bootstrap_url, token) = self._build_bootstrap_url(target)
headers = {'Referer': compat_urlparse.urldefrag(target)[0]}
req = compat_urllib_request.Request(bootstrap_url, headers=headers)
res = self._download_json(req, token, fatal=False,
note='Downloading player bootstrap data',
errnote='Unable to download player bootstrap data',
transform_source=self._transform_bootstrap)
return res
def _get_playerpage(self, target):
(_, _, token) = self._parse_target(target)
webpage = self._download_webpage(
target, token,
note='Downloading player page',
errnote='Unable to download player page',
fatal=False)
if webpage is not None:
# Search (for example German) error message
error_msg = self._html_search_regex(
r'<div id="content">\s*<h2>(.*?)</h2>', webpage,
'error message', default=None)
if error_msg is not None:
error_msg = error_msg.replace('\n', ' ')
raise ExtractorError('Grooveshark said: %s' % error_msg)
if webpage is not None:
o = GroovesharkHtmlParser.extract_object_tags(webpage)
return (webpage, [x for x in o if x['attrs']['id'] == 'jsPlayerEmbed'])
return (webpage, None)
def _real_initialize(self):
self.ts = int(time.time() * 1000) # timestamp in millis
def _real_extract(self, url):
(target_uri, _, token) = self._parse_target(url)
# 1. Fill cookiejar by making a request to the player page
swf_referer = None
if self.do_playerpage_request:
(_, player_objs) = self._get_playerpage(url)
if player_objs is not None:
swf_referer = self._build_swf_referer(url, player_objs[0])
self.to_screen('SWF Referer: %s' % swf_referer)
# 2. Ask preload.php for swf bootstrap data to better mimic webapp
if self.do_bootstrap_request:
bootstrap = self._get_bootstrap(url)
self.to_screen('CommunicationToken: %s' % bootstrap['getCommunicationToken'])
# 3. Ask preload.php for track metadata.
meta = self._get_meta(url)
# 4. Construct stream request for track.
stream_url = self._build_stream_url(meta)
duration = int(math.ceil(float(meta['streamKey']['uSecs']) / 1000000))
post_dict = {'streamKey': meta['streamKey']['streamKey']}
post_data = compat_urllib_parse.urlencode(post_dict).encode('utf-8')
headers = {
'Content-Length': len(post_data),
'Content-Type': 'application/x-www-form-urlencoded'
}
if swf_referer is not None:
headers['Referer'] = swf_referer
return {
'id': token,
'title': meta['song']['Name'],
'http_method': 'POST',
'url': stream_url,
'ext': 'mp3',
'format': 'mp3 audio',
'duration': duration,
'http_post_data': post_data,
'http_headers': headers,
}

View File

@@ -0,0 +1,42 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class HentaiStigmaIE(InfoExtractor):
_VALID_URL = r'^https?://hentai\.animestigma\.com/(?P<id>[^/]+)'
_TEST = {
'url': 'http://hentai.animestigma.com/inyouchuu-etsu-bonus/',
'md5': '4e3d07422a68a4cc363d8f57c8bf0d23',
'info_dict': {
'id': 'inyouchuu-etsu-bonus',
'ext': 'mp4',
"title": "Inyouchuu Etsu Bonus",
"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h2 class="posttitle"><a[^>]*>([^<]+)</a>',
webpage, 'title')
wrap_url = self._html_search_regex(
r'<iframe src="([^"]+mp4)"', webpage, 'wrapper url')
wrap_webpage = self._download_webpage(wrap_url, video_id)
video_url = self._html_search_regex(
r'clip:\s*{\s*url: "([^"]*)"', wrap_webpage, 'video url')
return {
'id': video_id,
'url': video_url,
'title': title,
'age_limit': 18,
}

View File

@@ -0,0 +1,134 @@
from __future__ import unicode_literals
import re
import json
import random
import string
from .common import InfoExtractor
from ..utils import find_xpath_attr
class HowStuffWorksIE(InfoExtractor):
_VALID_URL = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*\d+-(?P<id>.+?)-video\.htm'
_TESTS = [
{
'url': 'http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm',
'info_dict': {
'id': '450221',
'display_id': 'cool-jobs-iditarod-musher',
'ext': 'flv',
'title': 'Cool Jobs - Iditarod Musher',
'description': 'md5:82bb58438a88027b8186a1fccb365f90',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# md5 is not consistent
'skip_download': True
}
},
{
'url': 'http://adventure.howstuffworks.com/39516-deadliest-catch-jakes-farewell-pots-video.htm',
'info_dict': {
'id': '553470',
'display_id': 'deadliest-catch-jakes-farewell-pots',
'ext': 'mp4',
'title': 'Deadliest Catch: Jake\'s Farewell Pots',
'description': 'md5:9632c346d5e43ee238028c9cefd8dbbc',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# md5 is not consistent
'skip_download': True
}
},
{
'url': 'http://entertainment.howstuffworks.com/arts/2706-sword-swallowing-1-by-dan-meyer-video.htm',
'info_dict': {
'id': '440011',
'display_id': 'sword-swallowing-1-by-dan-meyer',
'ext': 'flv',
'title': 'Sword Swallowing #1 by Dan Meyer',
'description': 'md5:b2409e88172913e2e7d3d1159b0ef735',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# md5 is not consistent
'skip_download': True
}
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
content_id = self._search_regex(r'var siteSectionId="(\d+)";', webpage, 'content id')
mp4 = self._search_regex(
r'''(?xs)var\s+clip\s*=\s*{\s*
.+?\s*
content_id\s*:\s*%s\s*,\s*
.+?\s*
mp4\s*:\s*\[(.*?),?\]\s*
};\s*
videoData\.push\(clip\);''' % content_id,
webpage, 'mp4', fatal=False, default=None)
smil = self._download_xml(
'http://services.media.howstuffworks.com/videos/%s/smil-service.smil' % content_id,
content_id, 'Downloading video SMIL')
http_base = find_xpath_attr(
smil,
'./{0}head/{0}meta'.format('{http://www.w3.org/2001/SMIL20/Language}'),
'name',
'httpBase').get('content')
def random_string(str_len=0):
return ''.join([random.choice(string.ascii_uppercase) for _ in range(str_len)])
URL_SUFFIX = '?v=2.11.3&fp=LNX 11,2,202,356&r=%s&g=%s' % (random_string(5), random_string(12))
formats = []
if mp4:
for video in json.loads('[%s]' % mp4):
bitrate = video['bitrate']
fmt = {
'url': video['src'].replace('http://pmd.video.howstuffworks.com', http_base) + URL_SUFFIX,
'format_id': bitrate,
}
m = re.search(r'(?P<vbr>\d+)[Kk]', bitrate)
if m:
fmt['vbr'] = int(m.group('vbr'))
formats.append(fmt)
else:
for video in smil.findall(
'.//{0}body/{0}switch/{0}video'.format('{http://www.w3.org/2001/SMIL20/Language}')):
vbr = int(video.attrib['system-bitrate']) / 1000
formats.append({
'url': '%s/%s%s' % (http_base, video.attrib['src'], URL_SUFFIX),
'format_id': '%dk' % vbr,
'vbr': vbr,
})
self._sort_formats(formats)
title = self._og_search_title(webpage)
TITLE_SUFFIX = ' : HowStuffWorks'
if title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': content_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -1,10 +1,11 @@
from __future__ import unicode_literals
import json
import re
import time
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urllib_parse,
compat_urllib_request,
@@ -13,59 +14,55 @@ from ..utils import (
class HypemIE(InfoExtractor):
"""Information Extractor for hypem"""
_VALID_URL = r'(?:http://)?(?:www\.)?hypem\.com/track/([^/]+)/([^/]+)'
_VALID_URL = r'http://(?:www\.)?hypem\.com/track/([^/]+)/([^/]+)'
_TEST = {
u'url': u'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
u'file': u'1v6ga.mp3',
u'md5': u'b9cc91b5af8995e9f0c1cee04c575828',
u'info_dict': {
u"title": u"Tame"
'url': 'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
'md5': 'b9cc91b5af8995e9f0c1cee04c575828',
'info_dict': {
'id': '1v6ga',
'ext': 'mp3',
'title': 'Tame',
'uploader': 'BODYWORK',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
track_id = mobj.group(1)
data = {'ax': 1, 'ts': time.time()}
data_encoded = compat_urllib_parse.urlencode(data)
complete_url = url + "?" + data_encoded
request = compat_urllib_request.Request(complete_url)
response, urlh = self._download_webpage_handle(request, track_id, u'Downloading webpage with the url')
response, urlh = self._download_webpage_handle(
request, track_id, 'Downloading webpage with the url')
cookie = urlh.headers.get('Set-Cookie', '')
self.report_extraction(track_id)
html_tracks = self._html_search_regex(r'<script type="application/json" id="displayList-data">(.*?)</script>',
response, u'tracks', flags=re.MULTILINE|re.DOTALL).strip()
html_tracks = self._html_search_regex(
r'(?ms)<script type="application/json" id="displayList-data">\s*(.*?)\s*</script>',
response, 'tracks')
try:
track_list = json.loads(html_tracks)
track = track_list[u'tracks'][0]
track = track_list['tracks'][0]
except ValueError:
raise ExtractorError(u'Hypemachine contained invalid JSON.')
raise ExtractorError('Hypemachine contained invalid JSON.')
key = track[u"key"]
track_id = track[u"id"]
artist = track[u"artist"]
title = track[u"song"]
key = track['key']
track_id = track['id']
artist = track['artist']
title = track['song']
serve_url = "http://hypem.com/serve/source/%s/%s" % (compat_str(track_id), compat_str(key))
request = compat_urllib_request.Request(serve_url, "" , {'Content-Type': 'application/json'})
serve_url = "http://hypem.com/serve/source/%s/%s" % (track_id, key)
request = compat_urllib_request.Request(
serve_url, '', {'Content-Type': 'application/json'})
request.add_header('cookie', cookie)
song_data_json = self._download_webpage(request, track_id, u'Downloading metadata')
try:
song_data = json.loads(song_data_json)
except ValueError:
raise ExtractorError(u'Hypemachine contained invalid JSON.')
final_url = song_data[u"url"]
song_data = self._download_json(request, track_id, 'Downloading metadata')
final_url = song_data["url"]
return [{
'id': track_id,
'url': final_url,
'ext': "mp3",
'title': title,
'artist': artist,
}]
return {
'id': track_id,
'url': final_url,
'ext': 'mp3',
'title': title,
'uploader': artist,
}

View File

@@ -5,8 +5,8 @@ import re
from .common import InfoExtractor
class StatigramIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?statigr\.am/p/(?P<id>[^/]+)'
class IconosquareIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?(?:iconosquare\.com|statigr\.am)/p/(?P<id>[^/]+)'
_TEST = {
'url': 'http://statigr.am/p/522207370455279102_24101272',
'md5': '6eb93b882a3ded7c378ee1d6884b1814',
@@ -15,6 +15,7 @@ class StatigramIE(InfoExtractor):
'ext': 'mp4',
'uploader_id': 'aguynamedpatrick',
'title': 'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
'description': 'md5:644406a9ec27457ed7aa7a9ebcd4ce3d',
},
}
@@ -25,7 +26,7 @@ class StatigramIE(InfoExtractor):
html_title = self._html_search_regex(
r'<title>(.+?)</title>',
webpage, 'title')
title = re.sub(r'(?: *\(Videos?\))? \| Statigram$', '', html_title)
title = re.sub(r'(?: *\(Videos?\))? \| (?:Iconosquare|Statigram)$', '', html_title)
uploader_id = self._html_search_regex(
r'@([^ ]+)', title, 'uploader name', fatal=False)
@@ -33,6 +34,7 @@ class StatigramIE(InfoExtractor):
'id': video_id,
'url': self._og_search_video_url(webpage),
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id': uploader_id
}

View File

@@ -18,6 +18,7 @@ class IGNIE(InfoExtractor):
_DESCRIPTION_RE = [
r'<span class="page-object-description">(.+?)</span>',
r'id="my_show_video">.*?<p>(.*?)</p>',
r'<meta name="description" content="(.*?)"',
]
_TESTS = [
@@ -55,6 +56,17 @@ class IGNIE(InfoExtractor):
'skip_download': True,
},
},
{
'url': 'http://www.ign.com/articles/2014/08/15/rewind-theater-wild-trailer-gamescom-2014?watch',
'md5': '4e9a0bda1e5eebd31ddcf86ec0b9b3c7',
'info_dict': {
'id': '078fdd005f6d3c02f63d795faa1b984f',
'ext': 'mp4',
'title': 'Rewind Theater - Wild Trailer Gamescom 2014',
'description': 'Giant skeletons, bloody hunts, and captivating'
' natural beauty take our breath away.',
},
},
]
def _find_video_id(self, webpage):
@@ -62,6 +74,7 @@ class IGNIE(InfoExtractor):
r'data-video-id="(.+?)"',
r'<object id="vid_(.+?)"',
r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
r'class="hero-poster[^"]*?"[^>]*id="(.+?)"',
]
return self._search_regex(res_id, webpage, 'video id')
@@ -70,10 +83,7 @@ class IGNIE(InfoExtractor):
name_or_id = mobj.group('name_or_id')
page_type = mobj.group('type')
webpage = self._download_webpage(url, name_or_id)
if page_type == 'articles':
video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, 'video url')
return self.url_result(video_url, ie='IGN')
elif page_type != 'video':
if page_type != 'video':
multiple_urls = re.findall(
'<param name="flashvars" value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
webpage)

Some files were not shown because too many files have changed in this diff Show More