Compare commits

...

289 Commits

Author SHA1 Message Date
Philipp Hagemeister
1cf64ee468 release 2013.10.23.2 2013-10-23 18:38:09 +02:00
Jaime Marquínez Ferrándiz
cdec0190c4 [dailymotion] Extract all the available formats (closes #1028) 2013-10-23 17:33:38 +02:00
Jaime Marquínez Ferrándiz
2450bcb28b [nowvideo] Fix key extraction
Extract it from the embed page
2013-10-23 17:00:33 +02:00
Jaime Marquínez Ferrándiz
3126050c0f Hide the video password on verbose mode 2013-10-23 16:32:17 +02:00
Jaime Marquínez Ferrándiz
93b22c7828 [vimeo] fix the extraction for videos protected with password
Added a test video.
2013-10-23 16:31:53 +02:00
Philipp Hagemeister
0a89b2852e release 2013.10.23.1 2013-10-23 15:12:33 +02:00
Jaime Marquínez Ferrándiz
55b3e45bba [vimeo] Fix pro videos and player.vimeo.com urls
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
2013-10-23 14:38:03 +02:00
Philipp Hagemeister
365bcf6d97 Merge remote-tracking branch 'origin/master' 2013-10-23 11:40:46 +02:00
Philipp Hagemeister
71907db3ba [vimeo] Fix normal videos (Fixes #1642)
Vimeo Pro Videos are still broken
2013-10-23 11:38:53 +02:00
Philipp Hagemeister
6803655ced Merge pull request #1622 from rbrito/fix-extension
extractor: youtube: Set extension of AAC audio formats to m4a.
2013-10-22 15:16:26 -07:00
Philipp Hagemeister
df1c39ec5c release 2013.10.23 2013-10-23 00:07:27 +02:00
Philipp Hagemeister
80f55a9511 release 2013.10.22 2013-10-22 22:35:13 +02:00
Philipp Hagemeister
7853cc5ae1 Merge remote-tracking branch 'origin/master'
Conflicts:
	youtube_dl/YoutubeDL.py
2013-10-22 22:30:06 +02:00
Philipp Hagemeister
586a91b67f Expand tilde in template (Fixes #1639) 2013-10-22 22:28:26 +02:00
Jaime Marquínez Ferrándiz
b028e96144 [arte.tv:creative] Update the title of the test 2013-10-22 21:06:06 +02:00
Jaime Marquínez Ferrándiz
ce68b5907c [nhl:videocenter] Fix playlist title extraction 2013-10-22 21:01:16 +02:00
Jaime Marquínez Ferrándiz
fe7e0c9825 Style fixes in YoutubeDL.py
Fixed some of the problems reported by pep8
2013-10-22 14:49:34 +02:00
Jaime Marquínez Ferrándiz
12893efe01 Respect the download parameter in YoutubeDL.process_video_result if the extractor handle the format selection 2013-10-22 00:01:59 +02:00
Joshua Elsasser
a6387bfd3c [vimeo] Implement the new format selection system (closes PR #996)
Rebased and deleted some parts to use the new system instead of copying the one from YoutubeIE
2013-10-21 23:16:11 +02:00
Jaime Marquínez Ferrándiz
f6a54188c2 [youtube] Use 'node is None' when checking if the video has automatic captions
It had stopped working and it reports a FutureWarning
2013-10-21 16:28:55 +02:00
Jaime Marquínez Ferrándiz
cbbd9a9c69 Fix the duration field for the VideoDetective and InternetVideoArchive tests
Also remove the use of the old format system and the comment
2013-10-21 15:07:33 +02:00
Jaime Marquínez Ferrándiz
685a9cd2f1 [googleplus] Fix upload_date extraction 2013-10-21 15:00:21 +02:00
Jaime Marquínez Ferrándiz
182a107877 [arte] Set the format_note and the format_id fields (closes #1628) 2013-10-21 14:42:30 +02:00
Jaime Marquínez Ferrándiz
8c51aa6506 The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
2013-10-21 14:42:06 +02:00
Jaime Marquínez Ferrándiz
3fd39e37f2 YoutubeDL: remove method that came from FileDownloader 2013-10-21 13:52:24 +02:00
Jaime Marquínez Ferrándiz
49e86983e7 Allow to use the extension for the format selection
The best format with the extension is downloaded.
2013-10-21 13:31:55 +02:00
Jaime Marquínez Ferrándiz
a9c58ad945 Accept requested formats to be in the format 35/best (closes #1552)
The format selection code is now an independent function.
2013-10-21 13:19:58 +02:00
Philipp Hagemeister
f8b45beacc Merge remote-tracking branch 'rbrito/set-age'
Conflicts:
	youtube_dl/extractor/xhamster.py
2013-10-19 21:16:14 +02:00
Philipp Hagemeister
9d92015d43 [xhamster] Add support for age_limit (Instead of #1627) 2013-10-19 21:09:48 +02:00
Rogério Brito
50a6150ed9 extractor: Set age limit on some adult-related extractors.
More age limit of videos for adult-related sites.

Note that, for redtube, I explicitly left the variable containing the age
limit, since the comment justifying the age limit is a good thing to have.

That being said, I included the age limit field on the test, to better
reflect what the information extractor does (even if it may not break the
automated tests).

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-19 14:19:25 -03:00
Philipp Hagemeister
284acd57d6 Add an author email 2013-10-19 11:14:20 +02:00
Rogério Brito
8ed6b34477 extractor: Set age limit on some adult-related extractors.
This is similar in spirit to what was done in commit 8e590a117f.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 19:32:37 -03:00
Rogério Brito
f6f1fc9286 extractor: youtube: Fix extension of dash formats.
While we are at it, separate the audio formats from the video formats.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 18:53:00 -03:00
Philipp Hagemeister
8e590a117f [xnxx] Add age_limit 2013-10-18 23:35:17 +02:00
Philipp Hagemeister
d5594202aa Simplify release process 2013-10-18 23:34:55 +02:00
Philipp Hagemeister
b186d949cf release 2013.10.18.2 2013-10-18 23:22:54 +02:00
Philipp Hagemeister
3d2986063c [bash-completion] Do not use dash in function name (Fixes #1623) 2013-10-18 23:13:46 +02:00
Philipp Hagemeister
41fd7c7e60 Add new option --abort-on-error 2013-10-18 23:09:32 +02:00
Philipp Hagemeister
fdefe96bf2 Document %(format)s (#1612) 2013-10-18 23:09:08 +02:00
Rogério Brito
16f36a6fc9 extractor: youtube: Set extension of AAC audio formats to m4a.
This, in particular, eases downloading both audio and videos in DASH formats
before muxing them, which alleviates the problem that I exposed on issue

Furthermore, one may argue that this is, indeed, the case for correctness's
sake.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 17:50:55 -03:00
Philipp Hagemeister
cce722b79c Add metavar to --cache-dir 2013-10-18 11:50:48 +02:00
Philipp Hagemeister
82697fb2ab release 2013.10.18.1 2013-10-18 11:45:30 +02:00
Philipp Hagemeister
53c1d3ef49 Check for embedded YouTube player (Fixes #1616) 2013-10-18 11:44:57 +02:00
Philipp Hagemeister
8e55e9abfc release 2013.10.18 2013-10-18 11:17:21 +02:00
Philipp Hagemeister
7c58ef3275 [tudou] Fix title regex (Fixes #1614) 2013-10-18 11:16:20 +02:00
Philipp Hagemeister
416a5efce7 fix typos 2013-10-18 00:49:45 +02:00
Philipp Hagemeister
f4d96df0f1 Extend #980 with --max-quality support 2013-10-18 00:46:35 +02:00
Philipp Hagemeister
5d254f776a Fix test 2013-10-18 00:27:51 +02:00
Philipp Hagemeister
1c1218fefc Merge remote-tracking branch 'jaimeMF/format_selection' 2013-10-18 00:17:03 +02:00
Jaime Marquínez Ferrándiz
d21ab29200 Add an extractor for techtalks.tv (closes #1606) 2013-10-17 08:20:58 +02:00
Philipp Hagemeister
54ed626cf8 release 2013.10.17 2013-10-17 02:20:26 +02:00
Philipp Hagemeister
a733eb6c53 [youtube] Do not crash if caption info is missing altogether (Fixes #1610) 2013-10-17 02:19:19 +02:00
Philipp Hagemeister
591454798d [brightcove] Raise error if playlist is empty (#1608) 2013-10-17 01:02:17 +02:00
Philipp Hagemeister
38604f1a4f Merge remote-tracking branch 'origin/master' 2013-10-17 00:55:06 +02:00
Philipp Hagemeister
2d0efe70a6 [brightcove] Fix more broken XML (#1608) 2013-10-17 00:46:11 +02:00
Jaime Marquínez Ferrándiz
bfd14b1b2f Add an extractor for rutube.ru (closes #1136)
It downloads with a m3u8 manifest, requires ffmpeg.
2013-10-16 16:57:40 +02:00
Jaime Marquínez Ferrándiz
76965512da Fix the indentation of the Makefile
It uses tabs, no spaces.
2013-10-15 23:15:15 +02:00
Jaime Marquínez Ferrándiz
996d1c3242 Don't include the test/testdata directory in the youtube-dl.tar.gz
The last releases included big files that increased the size of the compressed file.
2013-10-15 23:08:52 +02:00
Philipp Hagemeister
8abbf43f21 release 2013.10.15 2013-10-15 12:06:45 +02:00
Philipp Hagemeister
10eaae48ff Merge branch 'master' of github.com:rg3/youtube-dl 2013-10-15 12:05:24 +02:00
Philipp Hagemeister
9d4660cab1 [generic] Support embedded vimeo videos (#1602) 2013-10-15 12:05:13 +02:00
Jaime Marquínez Ferrándiz
9d74e308f7 [sztvhu] Fix the title extraction 2013-10-15 08:22:59 +02:00
Jaime Marquínez Ferrándiz
e772692ffd Fix an import in the tests and the Youtube Shows test 2013-10-15 08:22:20 +02:00
Jaime Marquínez Ferrándiz
8381a92120 [websurg] Skipt the test
It needs login information.
2013-10-15 08:12:30 +02:00
Philipp Hagemeister
cd054fc491 Use upper-case for prefixes in help to signify bytes (#1043) 2013-10-15 04:53:02 +02:00
Philipp Hagemeister
f219743e33 Merge remote-tracking branch 'alphapapa/master' 2013-10-15 04:52:07 +02:00
Philipp Hagemeister
4f41664de8 Merge remote-tracking branch 'Rudloff/websurg' 2013-10-15 02:11:33 +02:00
Philipp Hagemeister
a4fd04158e Do not import * 2013-10-15 02:07:26 +02:00
Philipp Hagemeister
44a5f1718a Simplify tests
* Make them directly executable again
* Move common stuff (md5, parameters) to helper
* Never import *
* General clean up
2013-10-15 02:00:55 +02:00
Philipp Hagemeister
a623df4c7b Credit @Elbandi for sztvhu 2013-10-15 01:34:47 +02:00
Philipp Hagemeister
7cf67fbe29 [sztvhu] Simplify 2013-10-15 01:33:20 +02:00
Philipp Hagemeister
3ddf1a6d01 Merge remote-tracking branch 'Elbandi/master' 2013-10-15 01:26:34 +02:00
Philipp Hagemeister
850555c484 Merge remote-tracking branch 'origin/master' 2013-10-15 01:25:47 +02:00
Philipp Hagemeister
9ed3bdc64d [tudou] Add support for youku links (Closes #1571) 2013-10-15 01:20:04 +02:00
Jaime Marquínez Ferrándiz
c45aa56080 [gamespot] Fix video extraction (fixes #1587) 2013-10-14 16:46:07 +02:00
Philipp Hagemeister
7394b8db3b Merge remote-tracking branch 'origin/master' 2013-10-14 16:07:53 +02:00
Andras Elso
f9b3d7af47 Add an extractor for Szombathelyi TV 2013-10-14 13:07:47 +02:00
Filippo Valsorda
ea62a2da46 add VideoPremium.tv RTMP support 2013-10-14 01:32:47 -04:00
Filippo Valsorda
7468b6b71d Merge pull request #1569 from Jaiz909/1321-download-annotations
Added downloading annotations download support - closes #1321
2013-10-13 22:03:22 -07:00
Jai Grimshaw
1fb07d10a3 [youtube] Adds #1312 Download annotations
Adds #1321 Download annotations from youtube
Annotations are downloaded and written to a .annotations.xml file using the https://www.youtube.com/annotations_invideo?features=1&legacy=1&video_id=$VIDEOID API.
Added unit test for annotations.
2013-10-14 16:22:27 +11:00
Philipp Hagemeister
9378ae6e1d [youku] Allow shortcut youku:ID and make non-matching groups non-matching (#1571) 2013-10-13 15:55:05 +02:00
Philipp Hagemeister
06723d47c4 Merge remote-tracking branch 'jaimeMF/opus-fix' 2013-10-13 15:26:10 +02:00
Jaime Marquínez Ferrándiz
69a0c470b5 [arte] Add an extractor for future.arte.tv (closes #1593) 2013-10-13 14:21:13 +02:00
Jaime Marquínez Ferrándiz
c40f5cf45c [arte] add an extractor for creative.arte.tv (#1593)
The +7 videos now use an independent extractor that is also used for the creative videos
2013-10-13 13:54:31 +02:00
Jaime Marquínez Ferrándiz
4b7b839f24 Add an extractor for rottentomatoes.com and improve InternetVideoArchiveIE to get the best quality 2013-10-12 22:22:31 +02:00
Jaime Marquínez Ferrándiz
3d60d33773 Add an extractor for videodetective.com (closes #262)
It uses the internetvideoarchive.com platform.
2013-10-12 21:36:17 +02:00
Jaime Marquínez Ferrándiz
d7e66d39a0 Add an extractor for internetvideoarchive.com videos
It's used by videodetective.com
2013-10-12 21:34:04 +02:00
Filippo Valsorda
d3f46b9aa5 Add support for single-test tox runs
Use a sintax like
    tox test.test_download:TestDownload.test_NowVideo
to run the specific test on all the tox environments (Python versions)
2013-10-12 13:17:11 -04:00
Filippo Valsorda
f5e54a1fda add support for NowVideo.ch 2013-10-12 13:11:03 -04:00
Jaime Marquínez Ferrándiz
4eb7f1d12e FFmpegPostProcessor: print the command line used if the --verbose option is given 2013-10-12 13:49:27 +02:00
Jaime Marquínez Ferrándiz
0f6d12e43c Don't set the '-aq' option with the opus format (fixes #1263) 2013-10-12 13:30:30 +02:00
Jaime Marquínez Ferrándiz
b4cdc245cf Merge pull request #1590 from joeyadams/master
Fix Brightcove detection when another Flash object is on the page
2013-10-12 02:09:39 -07:00
Joey Adams
3283533149 Fix Brightcove detection when another Flash object is on the page
The regex used non-greedy match, but alas it failed on input like this:

    <object class="...> ... class="BrightcoveExperience"

It captured two objects and the intervening HTML.  This commit fixes this by
not allowing a ">" to appear before BrightcoveExperience.

Video in question: http://www.harpercollinschildrens.com/feature/petethecat/
2013-10-11 21:52:33 -04:00
Jaime Marquínez Ferrándiz
8032e31f2d Merge pull request #1558 from rzhxeo/cinemassacre
Add support for http://cinemassacre.com
2013-10-11 20:38:26 +02:00
Jaime Marquínez Ferrándiz
d2f9cdb205 Merge branch 'cinemassacre' of github.com:rzhxeo/youtube-dl into rzhxeo-cinemassacre 2013-10-11 19:53:27 +02:00
Jaime Marquínez Ferrándiz
8016c92297 Fix the default values of format_id and format 2013-10-11 16:34:49 +02:00
Jaime Marquínez Ferrándiz
e028d0d1e3 Implement the prefer_free_formats in YoutubeDL 2013-10-11 16:34:49 +02:00
Jaime Marquínez Ferrándiz
79819f58f2 Default 'format' field to {width}x{height}
If width is None, use {height}p and if height is None, '???'
2013-10-11 16:34:49 +02:00
Jaime Marquínez Ferrándiz
6ff000b888 Do not handle format selection for IEs that already handle it 2013-10-11 16:34:48 +02:00
Jaime Marquínez Ferrándiz
99e206d508 Implement the max quality option in YoutubeDL 2013-10-11 16:34:48 +02:00
Jaime Marquínez Ferrándiz
dd82ffea0c Implement format selection in YoutubeDL
Now the IEs can set a formats field in the info_dict, with the formats ordered from worst to best quality. It's a list of dicts with the following fields:
* Mandatory: url and ext
* Optional: format and format_id

The format_id is used for choosing which formats have to be downloaded.

Now a video result is processed by the method process_video_result.
2013-10-11 16:34:48 +02:00
Jaime Marquínez Ferrándiz
3823342d9d [arte] Prepare for generic format support (#980) 2013-10-11 16:33:31 +02:00
Jaime Marquínez Ferrándiz
91dbaef406 [nhl] Add an extractor for videocenter's categories (#1586)
It downloads the last 12 videos.
2013-10-11 14:33:26 +02:00
Jaime Marquínez Ferrándiz
9026dd3858 Make sure it only runs rtmpdump one time in test mode and return True if the download can be resumed 2013-10-11 12:42:15 +02:00
Jaime Marquínez Ferrándiz
81d7f1928c Merge pull request #1565 from rzhxeo/rtmpdump_test
Only download 1 sec. with rtmpdump in test mode
2013-10-11 12:40:18 +02:00
Jaime Marquínez Ferrándiz
bc4f29170f Add a PostProcessor for adding metadata to the file (closes #1570)
It currently sets the title, the date and the author values.
2013-10-11 11:19:09 +02:00
Jaime Marquínez Ferrándiz
cb354c8f62 [yahoo] Download the info from another page
The 'meta' field is not always in the video webpage
2013-10-10 21:01:45 +02:00
Jaime Marquínez Ferrándiz
1cbb27b151 [gamespot] Mark as broken (#1587) 2013-10-10 19:55:52 +02:00
Jaime Marquínez Ferrándiz
0ab4ff6378 [mtv] Strip the description
There were some tabs and newlines added around the string.
2013-10-10 19:53:44 +02:00
Jaime Marquínez Ferrándiz
63da13e829 Add an extractor for faz.net (closes #1582) 2013-10-10 19:37:17 +02:00
Jaime Marquínez Ferrándiz
4193a453c2 Don't add extractors with IE_DESC set to False to the page of supported sites. 2013-10-10 16:18:02 +02:00
Jaime Marquínez Ferrándiz
2e1fa03bf5 Add an extractor for video.nhl.com (closes #1586) 2013-10-10 16:16:49 +02:00
Philipp Hagemeister
8f1ae18a18 release 2013.10.09 2013-10-09 23:50:47 +02:00
Philipp Hagemeister
57da92b7df [youtube] Do not recognize attribution link as user (Fixes #1573) 2013-10-09 23:50:38 +02:00
Jaime Marquínez Ferrándiz
df4f632dbc Merge pull request #1584 from wingsuit/master
Tiny tpo
2013-10-09 07:44:06 -07:00
Jaime Marquínez Ferrándiz
a34c2faae4 [youtube] set the 'name' parameter in the subtitles url (fixes #1577) 2013-10-09 16:41:36 +02:00
Tom
1d368c7589 Tiny tpo 2013-10-09 21:56:09 +08:00
Jaime Marquínez Ferrándiz
88bd97e34c [vevo] Some improvements (fixes #1580)
Extract the info from http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc={id}
Some videos don't have an smil manifest, extract the video urls directly from the json and use the last version of the video.
Extract all the available formats and set the 'formats' field of the result
2013-10-08 21:25:38 +02:00
Jaime Marquínez Ferrándiz
2ae3edb1cf Fix the printing of the proxy map in debug mode
The proxies have to be extracted from the opener.handlers
2013-10-07 21:10:31 +02:00
Philipp Hagemeister
b2ad967e45 Simplify test setup 2013-10-07 19:06:36 +02:00
Philipp Hagemeister
a27b9e8bd5 Move opener setup into a separate helper function 2013-10-07 19:01:47 +02:00
Philipp Hagemeister
4481a754e4 release 2013.10.07 2013-10-07 14:34:19 +02:00
Philipp Hagemeister
faa6ef6bc8 [jeuxvideo] Improve code quality (fixes #1567) 2013-10-07 14:33:23 +02:00
Philipp Hagemeister
15870e90b0 Restore warning when user forgets to quote URL (#1396) 2013-10-07 12:21:24 +02:00
rzhxeo
8e4f824365 Remove test parameter from _download_with_rtmpdump 2013-10-06 22:04:32 +02:00
Jaime Marquínez Ferrándiz
387ae5f30b [vimeo] Recognize urls ending in a slash (fixes #1242) 2013-10-06 21:56:23 +02:00
rzhxeo
ad7a071ab6 Only download 1 sec. with rtmpdump in test mode 2013-10-06 20:55:24 +02:00
Philipp Hagemeister
1310bf2474 [redtube] add age_limit 2013-10-06 16:39:35 +02:00
Philipp Hagemeister
b24f347190 Merge branch 'download-archive'
Conflicts:
	youtube_dl/YoutubeDL.py
	youtube_dl/__init__.py
2013-10-06 16:30:26 +02:00
Philipp Hagemeister
ee6c9f95e1 Remove superfluous parenthesis 2013-10-06 16:28:36 +02:00
Philipp Hagemeister
2a69c6b879 Merge branch 'age_limit' 2013-10-06 16:23:18 +02:00
Philipp Hagemeister
cfadd183c4 Call extracted property age_limit everywhere 2013-10-06 16:23:06 +02:00
Philipp Hagemeister
e484c81f0c [generic] Clarify error messages 2013-10-06 16:03:18 +02:00
Philipp Hagemeister
7e5e8306fd release 2013.10.06 2013-10-06 07:13:14 +02:00
Philipp Hagemeister
41e8bca4d0 [viddler] Add basic support (Fixes #1520) 2013-10-06 07:12:47 +02:00
Philipp Hagemeister
8dbe9899a9 Allow users to specify an age limit (fixes #1545)
With these changes, users can now restrict what videos are downloaded by the intented audience, by specifying their age with --age-limit YEARS .
Add rudimentary support in youtube, pornotube, and youporn.
2013-10-06 06:08:56 +02:00
Philipp Hagemeister
f4aac741d5 Move try_rm to test helpers 2013-10-06 05:47:17 +02:00
Philipp Hagemeister
c1c9a79c49 Add basic --download-archive option
Often, users want to be able to download only videos they haven't seen before, despite the video files having been deleted or moved in the mean time.
When --download-archive FILE is given, the extractor and ID of every download is recorded in the specified file. If it is already present, the video in question is skipped.
2013-10-06 04:27:10 +02:00
Philipp Hagemeister
226113c880 Merge remote-tracking branch 'origin/tox' 2013-10-05 22:47:44 +02:00
Filippo Valsorda
8932a66e49 [fixup] remove unnecessary commented function 2013-10-05 16:38:37 -04:00
Filippo Valsorda
79cfb46d42 add tox configuration file for easy testing 2013-10-05 16:08:48 -04:00
Filippo Valsorda
00fcc17aee add capability to suppress expected warnings in tests 2013-10-05 15:55:58 -04:00
Philipp Hagemeister
e94b783c74 [googleplus] Fix upload_date detection 2013-10-05 16:38:33 +02:00
Philipp Hagemeister
97dae9ae07 [bliptv] Make sure video ID is a string 2013-10-05 16:12:29 +02:00
rzhxeo
ca215e0a4f [CinemassacreIE] Use MD5 to check in TEST description 2013-10-05 13:42:17 +02:00
rzhxeo
91a26ca559 [CinemassacreIE] Remove docstring from class 2013-10-05 13:40:05 +02:00
rzhxeo
1ece880d7c [CinemassacreIE] Add support for other embed methods 2013-10-05 13:36:13 +02:00
rzhxeo
400afddaf4 Add CinemassacreIE 2013-10-05 09:37:11 +02:00
Jaime Marquínez Ferrándiz
c3fef636b5 [dailymotion] Fix playlist extraction
The html code has changed, make the video ids extraction more solid.
2013-10-04 14:07:29 +02:00
Philipp Hagemeister
46e28a84ca [brightcove] Fix up some broken HTML (#1553) 2013-10-04 11:53:49 +02:00
Philipp Hagemeister
17ad2b3fb1 [yahoo] Switch ext of test 2013-10-04 11:44:56 +02:00
Philipp Hagemeister
5e2a60db4a [yahoo] Fix test title 2013-10-04 11:44:02 +02:00
Philipp Hagemeister
cd214418f6 [redtube] pep8 2013-10-04 11:41:57 +02:00
Philipp Hagemeister
ba2d9f213e [jeuxvideo] fix video file md5sum 2013-10-04 11:38:56 +02:00
Philipp Hagemeister
7f8ae73a5d Include length in player cache ID
Some videos use the same player with IDs of multiple lengths.
See https://travis-ci.org/rg3/youtube-dl/jobs/12126506#L319 for an example.
2013-10-04 11:36:06 +02:00
Philipp Hagemeister
466880f531 [yahoo] Do not try to run rtmpdump on travis 2013-10-04 11:34:12 +02:00
Philipp Hagemeister
9f1f6d2437 [rtlnow] Skip test on travis 2013-10-04 11:33:14 +02:00
Philipp Hagemeister
9e0f897f6b [francetv] Use common format for ID of generation-quoi subextractor 2013-10-04 11:30:47 +02:00
Philipp Hagemeister
c0f6aa876f Merge remote-tracking branch 'origin/master' 2013-10-04 11:14:20 +02:00
Philipp Hagemeister
d93bdee9a6 [comedycentral] Prepare for generic video extraction (#980) 2013-10-04 11:14:10 +02:00
Philipp Hagemeister
f13d09332d [mtv] Prepare for #980 2013-10-04 11:10:04 +02:00
Philipp Hagemeister
2f5865cc6d Clarify that url and ext are optional when formats is given (#980) 2013-10-04 11:09:43 +02:00
Philipp Hagemeister
deefc05b88 Document formats (for #980) 2013-10-04 10:40:42 +02:00
Philipp Hagemeister
0d8cb1cc14 [ted] Prepare #980 merge 2013-10-04 10:32:34 +02:00
Jaime Marquínez Ferrándiz
a90b9fd209 Merge pull request #1551 from rzhxeo/flickr
[FlickrIE] Fix HTTPS url
2013-10-03 23:14:12 -07:00
rzhxeo
829493439a [FlickrIE] Fix HTTPS url 2013-10-04 07:47:40 +02:00
Pierre Rudloff
73b4fafd82 Use self._download_webpage everywhere 2013-10-04 01:12:42 +02:00
Pierre Rudloff
b039775057 Unused variable 2013-10-04 01:07:24 +02:00
Pierre Rudloff
5c1d63b737 Changes suggested by @phihag 2013-10-04 01:04:38 +02:00
Philipp Hagemeister
3cd022f6e6 Merge remote-tracking branch 'rzhxeo/rtl_ntv' 2013-10-04 00:59:11 +02:00
Philipp Hagemeister
abefd1f7c4 Merge remote-tracking branch 'rzhxeo/rtl_upload_date' 2013-10-04 00:58:35 +02:00
Philipp Hagemeister
c21315f273 [youtube] new static 82 signature 2013-10-04 00:43:01 +02:00
Philipp Hagemeister
9ab1018b1a release 2013.10.04 2013-10-04 00:38:19 +02:00
Philipp Hagemeister
da0a5d2d6e [france2] Add support for URLs without video IDs (Fixes #1547) 2013-10-04 00:34:36 +02:00
Jaime Marquínez Ferrándiz
ee6adb166c [ign] Support more urls and detect multiple videos in articles (fixes #1543) 2013-10-02 20:59:34 +02:00
Philipp Hagemeister
be8fe32c92 Fix help of --cachedir 2013-10-02 14:37:19 +02:00
Philipp Hagemeister
c38b1e776d [youtube] Simplify cache_dir code (#1529) 2013-10-02 08:41:14 +02:00
Philipp Hagemeister
4f8bf17f23 Merge remote-tracking branch 'holomorph/master' 2013-10-02 08:23:53 +02:00
Philipp Hagemeister
ca40186c75 [youtube] Fix static 82 signature (Closes #1539) 2013-10-02 08:20:00 +02:00
Philipp Hagemeister
a8c6b24155 [youtube] Support videos without a title (Fixes #1391, Closes #1542) 2013-10-02 07:25:35 +02:00
Filippo Valsorda
bd8e5c7ca2 Merge pull request #1531 from rg3/no-playlist
[youtube] implement --no-playlist to only download current video
2013-10-01 10:08:20 -07:00
Filippo Valsorda
7c61bd36bb [youtube] correct --no-playlist for python3 2013-10-01 11:58:13 -04:00
Jaime Marquínez Ferrándiz
c54283824c [dailymotion] Detect vevo videos (fixes #1532)
All videos from the Vevo user, just embed videos from vevo.com
2013-10-01 15:05:41 +02:00
Philipp Hagemeister
52f15da2ca release 2013.10.01.1 2013-10-01 14:44:26 +02:00
Philipp Hagemeister
44d466559e Properly handle stream meap not being present 2013-10-01 14:44:09 +02:00
Philipp Hagemeister
05751eb047 release 2013.10.01 2013-10-01 11:43:54 +02:00
Philipp Hagemeister
f10503db67 Handle videos without url_encoded_fmt_stream_map (Fixes #1535) 2013-10-01 11:39:11 +02:00
rzhxeo
adfeafe9e1 [RTLnowIE] Allow video description without upload date
Some videos (feature films) have no upload date.
2013-10-01 07:22:49 +02:00
rzhxeo
4c62a16f4f [RTLnowIE] Add support for http://n-tvnow.de 2013-10-01 06:55:30 +02:00
rzhxeo
c0de39e6d4 Merge pull request #2 from rg3/master
Update
2013-09-30 21:39:58 -07:00
Mark Oteiza
fa55675593 Support XDG base directory specification 2013-09-30 18:22:38 -04:00
Filippo Valsorda
d4d9920a26 add test for --no-playlist 2013-09-30 18:01:17 -04:00
Filippo Valsorda
47192f92d8 implement --no-playlist to only download current video - closes #755 2013-09-30 16:26:25 -04:00
Jaime Marquínez Ferrándiz
722076a123 [rtlnow] Replace one of the tests
The video is no longer available.
2013-09-29 23:07:26 +02:00
Jaime Marquínez Ferrándiz
bb4aa62cf7 [appletrailers] The request for the settings must have the trailer name in lower case (fixes #1329) 2013-09-29 20:59:19 +02:00
Jaime Marquínez Ferrándiz
843530568f [appletrailers] Rework extraction (fixes #1387)
The exraction was broken:
* The includes page contains img elements that need to be fixed.
* Use the 'itunes.inc' page, it contains a json dictionary for each trailer with information.
* Get the formats from 'includes/settings{trailer_name}.json'
* Use urljoin to allow urls with a fragment identifier to work

Removed the thumbnail urls from the tests, they are different now.
2013-09-29 20:49:58 +02:00
Philipp Hagemeister
138a5454b5 release 2013.09.29 2013-09-29 14:38:37 +02:00
Philipp Hagemeister
d279037036 [update] Prevent cmd window popup on Windows (Fixes #1478) 2013-09-29 14:37:06 +02:00
Philipp Hagemeister
46353f6783 [update] Look for .exe extension on Windows (Fixes #745) 2013-09-29 14:37:00 +02:00
Jaime Marquínez Ferrándiz
70922df8b5 [dailymotion] Disable the family filter in the playlists (fixes #1524) 2013-09-29 12:44:02 +02:00
Jaime Marquínez Ferrándiz
9c15e9de84 [yahoo] Fix video extraction (fixes #1521)
There's no need to use two different methods.
Now we can also download videos over http if possible.
Also run the test for rtmp videos, but skip the download.
2013-09-28 21:19:52 +02:00
Philipp Hagemeister
123c10608d Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-28 15:43:38 +02:00
Philipp Hagemeister
0b7c2485b6 [zdf] Add support for hash URLs and simplify (#1518) 2013-09-28 15:43:34 +02:00
Jaime Marquínez Ferrándiz
9abb32045a [youtube] Add hlsvp to the error message if it can't be found and remove the live stream test
It's no longer available, other olympics streams have the same problem.
2013-09-27 15:06:27 +02:00
Jaime Marquínez Ferrándiz
f490e77e77 [youtube] Set the thumbnail to None if it can't be extracted 2013-09-27 14:22:36 +02:00
Jaime Marquínez Ferrándiz
2dc592991a [youtube] update description of test 2013-09-27 14:20:52 +02:00
Jaime Marquínez Ferrándiz
0a60edcfa9 Don't fail if the video thumbnail couldn't be downloaded (fixes #1516)
Just report a warning
2013-09-27 14:19:19 +02:00
Philipp Hagemeister
c53f9d30c8 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-27 13:09:58 +02:00
Philipp Hagemeister
509f398292 Remove youtube_genalgo (#1515)
With the automatic signature extraction, this script has become superfluous now
2013-09-27 13:09:24 +02:00
Jaime Marquínez Ferrándiz
74bab3f0a4 Don't embed subtitles if the list is empty or the field is not set (fixes #1510) 2013-09-27 08:08:43 +02:00
Philipp Hagemeister
8574862991 Merge remote-tracking branch 'rzhxeo/RTL_T' 2013-09-27 06:25:04 +02:00
Philipp Hagemeister
2de957c7e1 Merge remote-tracking branch 'rzhxeo/RTL' 2013-09-27 06:23:10 +02:00
Philipp Hagemeister
920de7a27d [youtube] Fix 83 signature (Closes #1511) 2013-09-27 06:15:21 +02:00
rzhxeo
63efc427cd [RTLnowIE] Clean video title
The title of some videos has the following format:
Series - Episode | Series online schauen bei ... NOW
2013-09-27 06:00:37 +02:00
rzhxeo
ce65fb6c76 [RTLnowIE] Add support for http://rtlnitronow.de 2013-09-27 05:50:16 +02:00
Jaime Marquínez Ferrándiz
4de1994b6e [brightcove] Use direct url for the tests
The test_all_urls.py test failed because BrightcoveIE doesn't match them.
2013-09-26 18:59:56 +02:00
Jaime Marquínez Ferrándiz
592882aa9f [brightcove] Support videos that only provide flv versions (fixes #1504)
Moved the test from generic.py to brightcove.py
2013-09-26 13:54:31 +02:00
Philipp Hagemeister
b98d6a1e19 release 2013.09.24.2 2013-09-24 21:55:34 +02:00
Philipp Hagemeister
29c7a63df8 Remove debugging code 2013-09-24 21:55:25 +02:00
Philipp Hagemeister
8b25323ae2 release 2013.09.24.1 2013-09-24 21:40:47 +02:00
Philipp Hagemeister
f426de8460 Merge remote-tracking branch 'origin/master' 2013-09-24 21:40:30 +02:00
Philipp Hagemeister
695dc094ab Merge branch 'automatic-signatures' 2013-09-24 21:40:08 +02:00
Jaime Marquínez Ferrándiz
e80d861064 Revert "[southparkstudios] Fix mgid extraction"
This reverts commit 0fd49457f5.

It seems that the redesign was temporary.
2013-09-24 21:39:38 +02:00
Philipp Hagemeister
2cdeb20135 release 2013.09.24 2013-09-24 21:28:06 +02:00
Philipp Hagemeister
7f74773254 Add option --no-cache-dir 2013-09-24 21:26:10 +02:00
Philipp Hagemeister
f2c327fd39 Fix 86 signature (#1494) 2013-09-24 21:20:42 +02:00
Philipp Hagemeister
e35e4ddc9a Fix output of --youtube-print-sig-code when counting down to 0 2013-09-24 21:18:03 +02:00
Philipp Hagemeister
c3c88a2664 Allow opts.cachedir == None to disable cache 2013-09-24 21:04:43 +02:00
Jaime Marquínez Ferrándiz
bb0eee71e7 [youtube] Update one of the test's description 2013-09-24 21:04:13 +02:00
Jaime Marquínez Ferrándiz
6f56389b88 [youtube] update algos for length 86 and 84 (fixes #1494) 2013-09-24 21:02:00 +02:00
Jaime Marquínez Ferrándiz
5b333c1ce6 [francetv] Add an extractor for Generation Quoi (closes #1475) 2013-09-23 21:41:54 +02:00
Jaime Marquínez Ferrándiz
a825f33030 [francetv] Add an extractor for France2 2013-09-23 21:28:33 +02:00
Philipp Hagemeister
92f618f2e2 Merge remote-tracking branch 'origin/master' 2013-09-23 11:24:49 +02:00
Philipp Hagemeister
81ec7c7901 [facebook] Allow untitled videos (Fixes #1484) 2013-09-23 11:24:33 +02:00
Jaime Marquínez Ferrándiz
dd5d2eb03c If the file is already downloaded include the size in the progress hook 2013-09-22 23:39:30 +02:00
Jaime Marquínez Ferrándiz
4ae720042c Include the eta and the speed in the progress hooks
Useful when listening to the progress hook, for example in a GUI.
2013-09-22 23:31:39 +02:00
Philipp Hagemeister
c705320f48 Correct test strings 2013-09-22 12:18:16 +02:00
Philipp Hagemeister
d2d8f89531 Do not warn if fallback is without alternatives (because we did not get the flash player URL) 2013-09-22 12:18:10 +02:00
Philipp Hagemeister
bdde940e90 [youtube] Improve flash player URL handling 2013-09-22 12:17:42 +02:00
Philipp Hagemeister
45f4a76dbc Work around nosetests nosiness 2013-09-22 11:45:29 +02:00
Philipp Hagemeister
13dc64ce74 [youtube] Remove _decrypt_signature_age_gate 2013-09-22 11:17:21 +02:00
Philipp Hagemeister
c35f9e72ce Move cachedir doc 2013-09-22 11:09:25 +02:00
Philipp Hagemeister
f8061589e6 [youtube] Actually pass in cachedir option 2013-09-22 10:51:33 +02:00
Philipp Hagemeister
0ca96d48c7 [youtube] Improve source code quality 2013-09-22 10:37:23 +02:00
Philipp Hagemeister
4ba146f35d Update static signatures 2013-09-22 10:31:25 +02:00
Philipp Hagemeister
edf3e38ebd [youtube] Improve cache and add an option to print the extracted signatures 2013-09-22 10:30:02 +02:00
Philipp Hagemeister
c4417ddb61 [youtube] Add filesystem signature cache 2013-09-22 00:35:03 +02:00
tewe
4a2080e407 [youku] better error handling
blocked videos used to cause death by TypeError, now we report what the
server says
2013-09-21 20:50:31 +02:00
Philipp Hagemeister
2f2ffea9ca Clarify a couple of calls 2013-09-21 15:34:29 +02:00
Philipp Hagemeister
ba552f542f Use reader instead of indexing 2013-09-21 15:32:37 +02:00
Philipp Hagemeister
8379969834 Prepare signature function caching 2013-09-21 15:19:48 +02:00
Philipp Hagemeister
95dbd2f990 Change test target (Verified with node.js) 2013-09-21 15:10:38 +02:00
Philipp Hagemeister
a7177865b1 Implement more opcodes 2013-09-21 14:48:12 +02:00
Philipp Hagemeister
e0df6211cc Restore accidentally deleted commits
That's what happens if you let Windows machines write :(
2013-09-21 14:40:35 +02:00
Jaime Marquínez Ferrándiz
b00ca882a4 [livestream] Fix events extraction (fixes #1467) 2013-09-21 13:50:52 +02:00
Jaime Marquínez Ferrándiz
39baacc49f [dailymotion] Add an extractor for users (closes #1476) 2013-09-21 12:45:53 +02:00
Jaime Marquínez Ferrándiz
3a1d48d6de [dailymotion] Raise ExtractorError if the dailymotion response reports an error 2013-09-21 12:15:54 +02:00
Philipp Hagemeister
34308b30d6 Warn if no locale is set (#1474) 2013-09-21 11:48:07 +02:00
Philipp Hagemeister
bc1506f8c0 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-21 11:10:30 +02:00
Philipp Hagemeister
b61067fa4f Abort if extractaudio is given without a variable extension (#1470) 2013-09-21 11:10:22 +02:00
Jaime Marquínez Ferrándiz
69b227a9bc [southparkstudios] add support for http://www.southparkstudios.com/full-episodes/* urls (closes #1469) 2013-09-21 10:58:43 +02:00
Jaime Marquínez Ferrándiz
0fd49457f5 [southparkstudios] Fix mgid extraction 2013-09-21 10:51:25 +02:00
Philipp Hagemeister
58f289d013 release 2013.09.20.1 2013-09-20 22:59:14 +02:00
Jaime Marquínez Ferrándiz
3d60bb96e1 Add an extractor for ebaumsworld.com (closes #1462) 2013-09-20 16:55:50 +02:00
Jaime Marquínez Ferrándiz
38d025b3f0 [youtube] add algo for length 91 2013-09-20 14:43:16 +02:00
Jaime Marquínez Ferrándiz
c40c6aaaaa Catch socket.error before IOError
Since python 2.6 it's a child class.
2013-09-20 13:26:03 +02:00
Jaime Marquínez Ferrándiz
1a810f0d4e [funnyordie] Fix video url extraction 2013-09-20 13:05:34 +02:00
Philipp Hagemeister
63037593c0 release 2013.09.20 2013-09-20 10:24:48 +02:00
Jaime Marquínez Ferrándiz
7a878d47fa Merge pull request #1464 from patrickslin/patch-7
Unable to decrypt signature length 93 (fixes #1461)
2013-09-20 08:25:10 +02:00
patrickslin
bc4b900898 Unable to decrypt signature length 93 (fixes #1461) 2013-09-19 21:49:06 -07:00
Jaime Marquínez Ferrándiz
c5e743f66f [fktv] support videos splitted in any number of parts and some style changes 2013-09-18 23:32:37 +02:00
Jaime Marquínez Ferrándiz
6c36d8d6fb Merge pull request #1438 from rzhxeo/fktv
Add support for http://fernsehkritik.tv
2013-09-18 23:05:56 +02:00
Jaime Marquínez Ferrándiz
71c82637e7 [youtube] apply the fix for lists with number of videos multiple of _MAX_RESULTS to user extraction
Copied from the playlist extractor.
2013-09-18 23:00:32 +02:00
Philipp Hagemeister
2dad310e2c Credit @Ruirize for newgrounds 2013-09-18 22:30:22 +02:00
Philipp Hagemeister
d0ae9e3a8d [newgrounds] simplify 2013-09-18 22:14:43 +02:00
Ruirize
a19413c311 Changed file hash. 2013-09-18 17:17:12 +01:00
Ruirize
1ef80b55dd Fixes test fail
Was unaware of --id being passed to test.
2013-09-18 16:23:38 +01:00
Ruirize
eb03f4dad3 Added Newgrounds support 2013-09-18 15:54:45 +01:00
Philipp Hagemeister
830dd1944a Clarify -i help (#1453) 2013-09-18 13:23:04 +02:00
Pierre Rudloff
cc6943e86a Improvements 2013-09-18 00:07:04 +02:00
rzhxeo
1237c9a3a5 XHamsterIE: Fix support for new HD video url format and add test (closes PR #1443) 2013-09-17 23:08:01 +02:00
Pierre Rudloff
8f77093262 Merge remote-tracking branch 'upstream/master' into websurg 2013-09-17 23:07:44 +02:00
Jaime Marquínez Ferrándiz
5d13df79a5 [francetv] Remove Pluzz test
Videos expire in 7 days
2013-09-17 22:49:43 +02:00
Pierre Rudloff
d79a0e233a Extractor for websurg.com 2013-09-17 22:13:40 +02:00
Jaime Marquínez Ferrándiz
6523223a4c [hotnewhiphop] Fix test case title 2013-09-17 21:10:57 +02:00
Jaime Marquínez Ferrándiz
4a67aafb7e [youtube] Don't search the flash player version for videos with age gate activated 2013-09-17 20:59:55 +02:00
rzhxeo
0761d02b0b Add FKTV extractor 2013-09-16 14:46:19 +02:00
rzhxeo
71c107fc57 Add FKTV extractor
Support for Fernsehkritik-TV (incl. Postecke)
2013-09-16 14:45:14 +02:00
alphapapa
0025da15cf Clarify that download rate is in bytes per second
I found f918ec7ea2 but it is still not clear to anyone who hasn't read Issue #723 whether the limit is in bits or bytes.  This is doubly confusing because 1) ISPs usually advertise speeds in bits per second, and 2) lowercase "k" and "m" are often used in correlation with bits rather than bytes.
2013-07-13 16:42:16 -05:00
82 changed files with 4025 additions and 984 deletions

2
.gitignore vendored
View File

@@ -24,3 +24,5 @@ updates_key.pem
*.flv
*.mp4
*.part
test/testdata
.tox

View File

@@ -13,13 +13,13 @@ PYTHON=/usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
ifeq ($(PREFIX),/usr)
SYSCONFDIR=/etc
SYSCONFDIR=/etc
else
ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc
else
SYSCONFDIR=$(PREFIX)/etc
endif
ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc
else
SYSCONFDIR=$(PREFIX)/etc
endif
endif
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion
@@ -71,6 +71,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '*~' \
--exclude '__pycache' \
--exclude '.git' \
--exclude 'testdata' \
-- \
bin devscripts test youtube_dl \
CHANGELOG LICENSE README.md README.txt \

View File

@@ -19,7 +19,10 @@ which means you can modify it, redistribute it or use it however you like.
-U, --update update this program to latest version. Make sure
that you have sufficient permissions (run with
sudo if needed)
-i, --ignore-errors continue on download errors
-i, --ignore-errors continue on download errors, for example to to
skip unavailable videos in a playlist
--abort-on-error Abort downloading of further videos (in the
playlist or the command line) if an error occurs
--dump-user-agent display the current browser identification
--user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video access
@@ -29,6 +32,11 @@ which means you can modify it, redistribute it or use it however you like.
--extractor-descriptions Output descriptions of all supported extractors
--proxy URL Use the specified HTTP/HTTPS proxy
--no-check-certificate Suppress HTTPS certificate validation.
--cache-dir DIR Location in the filesystem where youtube-dl can
store downloaded information permanently. By
default $XDG_CACHE_HOME/youtube-dl or ~/.cache
/youtube-dl .
--no-cache-dir Disable filesystem caching
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
@@ -45,11 +53,16 @@ which means you can modify it, redistribute it or use it however you like.
--date DATE download only videos uploaded in this date
--datebefore DATE download only videos uploaded before this date
--dateafter DATE download only videos uploaded after this date
--no-playlist download only the currently playing video
--age-limit YEARS download only videos suitable for the given age
--download-archive FILE Download only videos not present in the archive
file. Record all downloaded videos in it.
## Download Options:
-r, --rate-limit LIMIT maximum download rate (e.g. 50k or 44.6m)
-r, --rate-limit LIMIT maximum download rate in bytes per second (e.g.
50K or 4.2M)
-R, --retries RETRIES number of retries (default is 10)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16k)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16K)
(default is 1024)
--no-resize-buffer do not automatically adjust the buffer size. By
default, the buffer size is automatically resized
@@ -65,15 +78,17 @@ which means you can modify it, redistribute it or use it however you like.
%(uploader_id)s for the uploader nickname if
different, %(autonumber)s to get an automatically
incremented number, %(ext)s for the filename
extension, %(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the video id
, %(playlist)s for the playlist the video is in,
%(playlist_index)s for the position in the
playlist and %% for a literal percent. Use - to
output to stdout. Can also be used to download to
a different directory, for example with -o '/my/d
ownloads/%(uploader)s/%(title)s-%(id)s.%(ext)s' .
extension, %(format)s for the format description
(like "22 - 1280x720" or "HD")%(upload_date)s for
the upload date (YYYYMMDD), %(extractor)s for the
provider (youtube, metacafe, etc), %(id)s for the
video id , %(playlist)s for the playlist the
video is in, %(playlist_index)s for the position
in the playlist and %% for a literal percent. Use
- to output to stdout. Can also be used to
download to a different directory, for example
with -o '/my/downloads/%(uploader)s/%(title)s-%(i
d)s.%(ext)s' .
--autonumber-size NUMBER Specifies the number of digits in %(autonumber)s
when it is present in output filename template or
--autonumber option is given
@@ -90,6 +105,7 @@ which means you can modify it, redistribute it or use it however you like.
file modification time
--write-description write video description to a .description file
--write-info-json write video metadata to a .info.json file
--write-annotations write video annotations to a .annotation file
--write-thumbnail write thumbnail image to disk
## Verbosity / Simulation Options:
@@ -156,6 +172,7 @@ which means you can modify it, redistribute it or use it however you like.
processed files are overwritten by default
--embed-subs embed subtitles in the video (only for mp4
videos)
--add-metadata add metadata to the files
# CONFIGURATION

View File

@@ -1,4 +1,4 @@
__youtube-dl()
__youtube_dl()
{
local cur prev opts
COMPREPLY=()
@@ -15,4 +15,4 @@ __youtube-dl()
fi
}
complete -F __youtube-dl youtube-dl
complete -F __youtube_dl youtube-dl

View File

@@ -16,10 +16,11 @@ def main():
ie_htmls = []
for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()):
ie_html = '<b>{}</b>'.format(ie.IE_NAME)
try:
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
elif ie_desc is not None:
ie_html += ': {}'.format(ie.IE_DESC)
except AttributeError:
pass
if ie.working() == False:
ie_html += ' (Currently broken)'
ie_htmls.append('<li>{}</li>'.format(ie_html))

View File

@@ -88,10 +88,6 @@ ROOT=$(pwd)
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit -m "release $version"
git show HEAD
read -p "Is it good, can I push? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
echo
git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages
)

View File

@@ -1,109 +0,0 @@
#!/usr/bin/env python
# Generate youtube signature algorithm from test cases
import sys
tests = [
# 92 - vflQw-fB4 2013/07/17
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`~\"",
"mrtyuioplkjhgfdsazxcvbnq1234567890QWERTY}IOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]\"|:;"),
# 90
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`",
"mrtyuioplkjhgfdsazxcvbne1234567890QWER[YUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={`]}|"),
# 89
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'",
"/?;:|}<[{=+-_)(*&^%$#@!MqBVCXZASDFGHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuyt"),
# 88 - vflapUV9V 2013/08/28
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<",
"ioplkjhgfdsazxcvbnm12<4567890QWERTYUIOZLKJHGFDSAeXCVBNM!@#$%^&*()_-+={[]}|:;?/>.3"),
# 87
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$^&*()_-+={[]}|:;?/>.<",
"uioplkjhgfdsazxcvbnm1t34567890QWE2TYUIOPLKJHGFDSAZXCVeNM!@#$^&*()_-+={[]}|:;?/>.<"),
# 86 - vfluy6kdb 2013/09/06
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
"yuioplkjhgfdsazxcvbnm12345678q0QWrRTYUIOELKJHGFD-AZXCVBNM!@#$%^&*()_<+={[|};?/>.S"),
# 85 - vflkuzxcs 2013/09/11
('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[',
'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@'),
# 84 - vflg0g8PQ 2013/08/29 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?>.<",
">?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWq0987654321mnbvcxzasdfghjklpoiuytr"),
# 83
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!#$%^&*()_+={[};?/>.<",
".>/?;}[{=+_)(*&^%<#!MNBVCXZASPFGHJKLwOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytreq"),
# 82 - vflGNjMhJ 2013/09/12
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.<",
".>/?;}[<=+-(*&^%$#@!MNBVCXeASDFGHKLPOqUYTREWQ0987654321mnbvcxzasdfghjklpoiuytrIwZ"),
# 81 - vflLC8JvQ 2013/07/25
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.",
"C>/?;}[{=+-(*&^%$#@!MNBVYXZASDFGHKLPOIU.TREWQ0q87659321mnbvcxzasdfghjkl4oiuytrewp"),
# 80 - vflZK4ZYR 2013/08/23 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>",
"wertyuioplkjhgfdsaqxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&z(-+={[};?/>"),
# 79 - vflLC8JvQ 2013/07/25 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/",
"Z?;}[{=+-(*&^%$#@!MNBVCXRASDFGHKLPOIUYT/EWQ0q87659321mnbvcxzasdfghjkl4oiuytrewp"),
]
tests_age_gate = [
# 86 - vflqinMWD
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
"ertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!/#$%^&*()_-+={[|};?@"),
]
def find_matching(wrong, right):
idxs = [wrong.index(c) for c in right]
return compress(idxs)
return ('s[%d]' % i for i in idxs)
def compress(idxs):
def _genslice(start, end, step):
starts = '' if start == 0 else str(start)
ends = ':%d' % (end+step)
steps = '' if step == 1 else (':%d' % step)
return 's[%s%s%s]' % (starts, ends, steps)
step = None
for i, prev in zip(idxs[1:], idxs[:-1]):
if step is not None:
if i - prev == step:
continue
yield _genslice(start, prev, step)
step = None
continue
if i - prev in [-1, 1]:
step = i - prev
start = prev
continue
else:
yield 's[%d]' % prev
if step is None:
yield 's[%d]' % i
else:
yield _genslice(start, i, step)
def _assert_compress(inp, exp):
res = list(compress(inp))
if res != exp:
print('Got %r, expected %r' % (res, exp))
assert res == exp
_assert_compress([0,2,4,6], ['s[0]', 's[2]', 's[4]', 's[6]'])
_assert_compress([0,1,2,4,6,7], ['s[:3]', 's[4]', 's[6:8]'])
_assert_compress([8,0,1,2,4,7,6,9], ['s[8]', 's[:3]', 's[4]', 's[7:5:-1]', 's[9]'])
def gen(wrong, right, indent):
code = ' + '.join(find_matching(wrong, right))
return 'if len(s) == %d:\n%s return %s\n' % (len(wrong), indent, code)
def genall(tests):
indent = ' ' * 8
return indent + (indent + 'el').join(gen(wrong, right, indent) for wrong,right in tests)
def main():
print(genall(tests))
print(u' Age gate:')
print(genall(tests_age_gate))
if __name__ == '__main__':
main()

View File

@@ -63,6 +63,7 @@ setup(
' YouTube.com and other video sites.',
url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia',
author_email='ytdl@yt-dl.org',
maintainer='Philipp Hagemeister',
maintainer_email='phihag@phihag.de',
packages=['youtube_dl', 'youtube_dl.extractor'],

0
test/__init__.py Normal file
View File

View File

@@ -1,38 +1,63 @@
import errno
import io
import hashlib
import json
import os.path
import re
import types
import youtube_dl.extractor
from youtube_dl import YoutubeDL, YoutubeDLHandler
from youtube_dl.utils import (
compat_cookiejar,
compat_urllib_request,
)
from youtube_dl import YoutubeDL
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
def global_setup():
youtube_dl._setup_opener(timeout=10)
def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if override:
parameters.update(override)
return parameters
def try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
class FakeYDL(YoutubeDL):
def __init__(self):
self.result = []
def __init__(self, override=None):
# Different instances of the downloader can't share the same dictionary
# some test set the "sublang" parameter, which would break the md5 checks.
self.params = dict(parameters)
def to_screen(self, s):
params = get_params(override=override)
super(FakeYDL, self).__init__(params)
self.result = []
def to_screen(self, s, skip_eol=None):
print(s)
def trouble(self, s, tb=None):
raise Exception(s)
def download(self, x):
self.result.append(x)
def expect_warning(self, regex):
# Silence an expected warning matching a regex
old_report_warning = self.report_warning
def report_warning(self, message):
if re.match(regex, message): return
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def get_testcases():
for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None)
@@ -42,3 +67,6 @@ def get_testcases():
for t in getattr(ie, '_TESTS', []):
t['name'] = type(ie).__name__[:-len('IE')]
yield t
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()

133
test/test_YoutubeDL.py Normal file
View File

@@ -0,0 +1,133 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
class YDL(FakeYDL):
def __init__(self, *args, **kwargs):
super(YDL, self).__init__(*args, **kwargs)
self.downloaded_info_dicts = []
self.msgs = []
def process_info(self, info_dict):
self.downloaded_info_dicts.append(info_dict)
def to_screen(self, msg):
self.msgs.append(msg)
class TestFormatSelection(unittest.TestCase):
def test_prefer_free_formats(self):
# Same resolution => download webm
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 460},
{u'ext': u'mp4', u'height': 460},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'webm')
# Different resolution => download best quality (mp4)
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'mp4', u'height': 1080},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'mp4')
# No prefer_free_formats => keep original formats order
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'flv', u'height': 720},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'flv')
def test_format_limit(self):
formats = [
{u'format_id': u'meh'},
{u'format_id': u'good'},
{u'format_id': u'great'},
{u'format_id': u'excellent'},
]
info_dict = {
u'formats': formats, u'extractor': u'test', 'id': 'testvid'}
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
ydl = YDL({'format_limit': 'good'})
assert ydl.params['format_limit'] == 'good'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'good')
ydl = YDL({'format_limit': 'great', 'format': 'all'})
ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts[0][u'format_id'], u'meh')
self.assertEqual(ydl.downloaded_info_dicts[1][u'format_id'], u'good')
self.assertEqual(ydl.downloaded_info_dicts[2][u'format_id'], u'great')
self.assertTrue('3' in ydl.msgs[0])
ydl = YDL()
ydl.params['format_limit'] = 'excellent'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
def test_format_selection(self):
formats = [
{u'format_id': u'35', u'ext': u'mp4'},
{u'format_id': u'45', u'ext': u'webm'},
{u'format_id': u'47', u'ext': u'webm'},
{u'format_id': u'2', u'ext': u'flv'},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl = YDL({'format': u'20/47'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'20/71/worst'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'2')
ydl = YDL({'format': u'webm/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'3gp/40/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
if __name__ == '__main__':
unittest.main()

View File

@@ -0,0 +1,55 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import global_setup, try_rm
global_setup()
from youtube_dl import YoutubeDL
def _download_restricted(url, filename, age):
""" Returns true iff the file has been downloaded """
params = {
'age_limit': age,
'skip_download': True,
'writeinfojson': True,
"outtmpl": "%(id)s.%(ext)s",
}
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
json_filename = filename + '.info.json'
try_rm(json_filename)
ydl.download([url])
res = os.path.exists(json_filename)
try_rm(json_filename)
return res
class TestAgeRestriction(unittest.TestCase):
def _assert_restricted(self, url, filename, age, old_age=None):
self.assertTrue(_download_restricted(url, filename, old_age))
self.assertFalse(_download_restricted(url, filename, age))
def test_youtube(self):
self._assert_restricted('07FYdnEawAQ', '07FYdnEawAQ.mp4', 10)
def test_youporn(self):
self._assert_restricted(
'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'505835.mp4', 2, old_age=25)
def test_pornotube(self):
self._assert_restricted(
'http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing',
'1689755.flv', 13)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,14 +1,20 @@
#!/usr/bin/env python
import sys
import unittest
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.extractor import (
gen_extractors,
JustinTVIE,
YoutubeIE,
)
from youtube_dl.extractor import YoutubeIE, YoutubePlaylistIE, YoutubeChannelIE, JustinTVIE, gen_extractors
from helper import get_testcases
class TestAllURLsMatching(unittest.TestCase):
def setUp(self):

View File

@@ -1,20 +1,16 @@
#!/usr/bin/env python
import sys
import unittest
import json
import io
import hashlib
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup, md5
global_setup()
from youtube_dl.extractor import DailymotionIE
from youtube_dl.utils import *
from helper import FakeYDL
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class TestDailymotionSubtitles(unittest.TestCase):
def setUp(self):
@@ -45,15 +41,18 @@ class TestDailymotionSubtitles(unittest.TestCase):
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 5)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'http://www.dailymotion.com/video/x12u166_le-zapping-tele-star-du-08-aout-2013_tv'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True

View File

@@ -1,43 +1,31 @@
#!/usr/bin/env python
import errno
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, get_testcases, global_setup, try_rm, md5
global_setup()
import hashlib
import io
import os
import json
import unittest
import sys
import socket
import binascii
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import youtube_dl.YoutubeDL
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
from youtube_dl.utils import (
compat_str,
compat_urllib_error,
DownloadError,
ExtractorError,
UnavailableVideoError,
)
RETRIES = 3
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(10)
def _try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
self.to_stderr = self.to_screen
@@ -54,17 +42,12 @@ def _file_md5(fn):
with open(fn, 'rb') as f:
return hashlib.md5(f.read()).hexdigest()
from helper import get_testcases
defs = get_testcases()
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
class TestDownload(unittest.TestCase):
maxDiff = None
def setUp(self):
self.parameters = parameters
self.defs = defs
### Dynamically generate tests
@@ -84,8 +67,7 @@ def generator(test_case):
print_skipping(test_case['skip'])
return
params = self.parameters.copy()
params.update(test_case.get('params', {}))
params = get_params(test_case.get('params', {}))
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
@@ -97,9 +79,9 @@ def generator(test_case):
test_cases = test_case.get('playlist', [test_case])
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
try_rm(tc['file'])
try_rm(tc['file'] + '.part')
try_rm(tc['file'] + '.info.json')
try:
for retry in range(1, RETRIES + 1):
try:
@@ -145,9 +127,9 @@ def generator(test_case):
self.assertTrue(key in info_dict.keys() and info_dict[key])
finally:
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
try_rm(tc['file'])
try_rm(tc['file'] + '.part')
try_rm(tc['file'] + '.info.json')
return test_template

View File

@@ -1,17 +1,27 @@
#!/usr/bin/env python
# encoding: utf-8
import sys
import unittest
import json
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import DailymotionPlaylistIE, VimeoChannelIE, UstreamChannelIE, SoundcloudUserIE
from youtube_dl.utils import *
from test.helper import FakeYDL, global_setup
global_setup()
from youtube_dl.extractor import (
DailymotionPlaylistIE,
DailymotionUserIE,
VimeoChannelIE,
UstreamChannelIE,
SoundcloudUserIE,
LivestreamIE,
NHLVideocenterIE,
)
from helper import FakeYDL
class TestPlaylists(unittest.TestCase):
def assertIsPlaylist(self, info):
@@ -26,6 +36,14 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], u'SPORT')
self.assertTrue(len(result['entries']) > 20)
def test_dailymotion_user(self):
dl = FakeYDL()
ie = DailymotionUserIE(dl)
result = ie.extract('http://www.dailymotion.com/user/generation-quoi/')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Génération Quoi')
self.assertTrue(len(result['entries']) >= 26)
def test_vimeo_channel(self):
dl = FakeYDL()
ie = VimeoChannelIE(dl)
@@ -50,5 +68,22 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], u'9615865')
self.assertTrue(len(result['entries']) >= 12)
def test_livestream_event(self):
dl = FakeYDL()
ie = LivestreamIE(dl)
result = ie.extract('http://new.livestream.com/tedx/cityenglish')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4)
def test_nhl_videocenter(self):
dl = FakeYDL()
ie = NHLVideocenterIE(dl)
result = ie.extract('http://video.canucks.nhl.com/videocenter/console?catid=999')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'999')
self.assertEqual(result['title'], u'Highlights')
self.assertEqual(len(result['entries']), 12)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,14 +1,15 @@
#!/usr/bin/env python
# Various small unit tests
import sys
import unittest
import xml.etree.ElementTree
# coding: utf-8
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import xml.etree.ElementTree
#from youtube_dl.utils import htmlentity_transform
from youtube_dl.utils import (
@@ -20,6 +21,9 @@ from youtube_dl.utils import (
unified_strdate,
find_xpath_attr,
get_meta_content,
xpath_with_ns,
smuggle_url,
unsmuggle_url,
)
if sys.version_info < (3, 0):
@@ -141,5 +145,31 @@ class TestUtil(unittest.TestCase):
self.assertEqual(get_meta('description'), u'foo & bar')
self.assertEqual(get_meta('author'), 'Plato')
def test_xpath_with_ns(self):
testxml = u'''<root xmlns:media="http://example.com/">
<media:song>
<media:author>The Author</media:author>
<url>http://server.com/download.mp3</url>
</media:song>
</root>'''
doc = xml.etree.ElementTree.fromstring(testxml)
find = lambda p: doc.find(xpath_with_ns(p, {'media': 'http://example.com/'}))
self.assertTrue(find('media:song') is not None)
self.assertEqual(find('media:song/media:author').text, u'The Author')
self.assertEqual(find('media:song/url').text, u'http://server.com/download.mp3')
def test_smuggle_url(self):
data = {u"ö": u"ö", u"abc": [3]}
url = 'https://foo.bar/baz?x=y#a'
smug_url = smuggle_url(url, data)
unsmug_url, unsmug_data = unsmuggle_url(smug_url)
self.assertEqual(url, unsmug_url)
self.assertEqual(data, unsmug_data)
res_url, res_data = unsmuggle_url(url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, None)
if __name__ == '__main__':
unittest.main()

View File

@@ -0,0 +1,80 @@
#!/usr/bin/env python
# coding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, global_setup, try_rm
global_setup()
import io
import xml.etree.ElementTree
import youtube_dl.YoutubeDL
import youtube_dl.extractor
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
params = get_params({
'writeannotations': True,
'skip_download': True,
'writeinfojson': False,
'format': 'flv',
})
TEST_ID = 'gr51aVj-mLg'
ANNOTATIONS_FILE = TEST_ID + '.flv.annotations.xml'
EXPECTED_ANNOTATIONS = ['Speech bubble', 'Note', 'Title', 'Spotlight', 'Label']
class TestAnnotations(unittest.TestCase):
def setUp(self):
# Clear old files
self.tearDown()
def test_info_json(self):
expected = list(EXPECTED_ANNOTATIONS) #Two annotations could have the same text.
ie = youtube_dl.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(ANNOTATIONS_FILE))
annoxml = None
with io.open(ANNOTATIONS_FILE, 'r', encoding='utf-8') as annof:
annoxml = xml.etree.ElementTree.parse(annof)
self.assertTrue(annoxml is not None, 'Failed to parse annotations XML')
root = annoxml.getroot()
self.assertEqual(root.tag, 'document')
annotationsTag = root.find('annotations')
self.assertEqual(annotationsTag.tag, 'annotations')
annotations = annotationsTag.findall('annotation')
#Not all the annotations have TEXT children and the annotations are returned unsorted.
for a in annotations:
self.assertEqual(a.tag, 'annotation')
if a.get('type') == 'text':
textTag = a.find('TEXT')
text = textTag.text
self.assertTrue(text in expected) #assertIn only added in python 2.7
#remove the first occurance, there could be more than one annotation with the same text
expected.remove(text)
#We should have seen (and removed) all the expected annotation texts.
self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')
def tearDown(self):
try_rm(ANNOTATIONS_FILE)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,37 +1,34 @@
#!/usr/bin/env python
# coding: utf-8
import json
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, global_setup
global_setup()
import io
import json
import youtube_dl.YoutubeDL
import youtube_dl.extractor
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
params = json.load(pf)
params['writeinfojson'] = True
params['skip_download'] = True
params['writedescription'] = True
params = get_params({
'writeinfojson': True,
'skip_download': True,
'writedescription': True,
})
TEST_ID = 'BaW_jenozKc'
INFO_JSON_FILE = TEST_ID + '.mp4.info.json'
@@ -42,6 +39,7 @@ This is a test video for youtube-dl.
For more information, contact phihag@phihag.de .'''
class TestInfoJSON(unittest.TestCase):
def setUp(self):
# Clear old files

View File

@@ -1,20 +1,26 @@
#!/usr/bin/env python
import sys
import unittest
import json
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeUserIE, YoutubePlaylistIE, YoutubeIE, YoutubeChannelIE, YoutubeShowIE
from youtube_dl.utils import *
from test.helper import FakeYDL, global_setup
global_setup()
from youtube_dl.extractor import (
YoutubeUserIE,
YoutubePlaylistIE,
YoutubeIE,
YoutubeChannelIE,
YoutubeShowIE,
)
from helper import FakeYDL
class TestYoutubeLists(unittest.TestCase):
def assertIsPlaylist(self,info):
def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist')
@@ -27,6 +33,14 @@ class TestYoutubeLists(unittest.TestCase):
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
self.assertEqual(ytie_results, [ 'bV9L5Ht9LgY', 'FXxLjLQi3Fg', 'tU3Bgo5qJZE'])
def test_youtube_playlist_noplaylist(self):
dl = FakeYDL()
dl.params['noplaylist'] = True
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
self.assertEqual(YoutubeIE()._extract_id(result['url']), 'FXxLjLQi3Fg')
def test_issue_673(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
@@ -92,7 +106,7 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL()
ie = YoutubeShowIE(dl)
result = ie.extract('http://www.youtube.com/show/airdisasters')
self.assertTrue(len(result) >= 4)
self.assertTrue(len(result) >= 3)
if __name__ == '__main__':
unittest.main()

View File

@@ -0,0 +1,84 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import global_setup
global_setup()
import io
import re
import string
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import compat_str, compat_urlretrieve
_TESTS = [
(
u'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
u'js',
86,
u'>=<;:/.-[+*)(\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBA\\yxwvutsrqponmlkjihgfedcba987654321',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-vfldJ8xgI.js',
u'js',
85,
u'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
),
(
u'https://s.ytimg.com/yts/swfbin/watch_as3-vflg5GhxU.swf',
u'swf',
82,
u':/.-,+*)=\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBAzyxw>utsrqponmlkjihgfedcba987654321'
),
]
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata')
if not os.path.exists(self.TESTDATA_DIR):
os.mkdir(self.TESTDATA_DIR)
def make_tfunc(url, stype, sig_length, expected_sig):
basename = url.rpartition('/')[2]
m = re.match(r'.*-([a-zA-Z0-9_-]+)\.[a-z]+$', basename)
assert m, '%r should follow URL format' % basename
test_id = m.group(1)
def test_func(self):
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
ie = YoutubeIE()
if stype == 'js':
with io.open(fn, encoding='utf-8') as testf:
jscode = testf.read()
func = ie._parse_sig_js(jscode)
else:
assert stype == 'swf'
with open(fn, 'rb') as testf:
swfcode = testf.read()
func = ie._parse_sig_swf(swfcode)
src_sig = compat_str(string.printable[:sig_length])
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)
test_func.__name__ = str('test_signature_' + stype + '_' + test_id)
setattr(TestSignature, test_func.__name__, test_func)
for test_spec in _TESTS:
make_tfunc(*test_spec)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,76 +1,87 @@
#!/usr/bin/env python
import sys
import unittest
import json
import io
import hashlib
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup, md5
global_setup()
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import *
from helper import FakeYDL
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class TestYoutubeSubtitles(unittest.TestCase):
def setUp(self):
self.DL = FakeYDL()
self.url = 'QRS8MkLhQmM'
def getInfoDict(self):
IE = YoutubeIE(self.DL)
info_dict = IE.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
return info_dict[0]['subtitles']
def test_youtube_no_writesubtitles(self):
self.DL.params['writesubtitles'] = False
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_youtube_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
def test_youtube_subtitles_sbv_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '356cdc577fde0c6783b9b822e7206ff7')
def test_youtube_list_subtitles(self):
self.DL.expect_warning(u'Video doesn\'t have automatic captions')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
def test_youtube_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'sAjKT8FhjI8'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_youtube_multiple_langs(self):
self.url = 'QRS8MkLhQmM'
self.DL.params['writesubtitles'] = True

8
tox.ini Normal file
View File

@@ -0,0 +1,8 @@
[tox]
envlist = py26,py27,py33
[testenv]
deps =
nose
coverage
commands = nosetests --verbose {posargs:test} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

View File

@@ -77,26 +77,43 @@ class FileDownloader(object):
@staticmethod
def calc_percent(byte_counter, data_len):
if data_len is None:
return None
return float(byte_counter) / float(data_len) * 100.0
@staticmethod
def format_percent(percent):
if percent is None:
return '---.-%'
return '%6s' % ('%3.1f%%' % (float(byte_counter) / float(data_len) * 100.0))
return '%6s' % ('%3.1f%%' % percent)
@staticmethod
def calc_eta(start, now, total, current):
if total is None:
return '--:--'
return None
dif = now - start
if current == 0 or dif < 0.001: # One millisecond
return '--:--'
return None
rate = float(current) / dif
eta = int((float(total) - float(current)) / rate)
return int((float(total) - float(current)) / rate)
@staticmethod
def format_eta(eta):
if eta is None:
return '--:--'
return FileDownloader.format_seconds(eta)
@staticmethod
def calc_speed(start, now, bytes):
dif = now - start
if bytes == 0 or dif < 0.001: # One millisecond
return None
return float(bytes) / dif
@staticmethod
def format_speed(speed):
if speed is None:
return '%10s' % '---b/s'
return '%10s' % ('%s/s' % FileDownloader.format_bytes(float(bytes) / dif))
return '%10s' % ('%s/s' % FileDownloader.format_bytes(speed))
@staticmethod
def best_block_size(elapsed_time, bytes):
@@ -205,11 +222,14 @@ class FileDownloader(object):
"""Report destination filename."""
self.to_screen(u'[download] Destination: ' + filename)
def report_progress(self, percent_str, data_len_str, speed_str, eta_str):
def report_progress(self, percent, data_len_str, speed, eta):
"""Report download progress."""
if self.params.get('noprogress', False):
return
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
eta_str = self.format_eta(eta)
percent_str = self.format_percent(percent)
speed_str = self.format_speed(speed)
if self.params.get('progress_with_newline', False):
self.to_screen(u'[download] %s of %s at %s ETA %s' %
(percent_str, data_len_str, speed_str, eta_str))
@@ -250,6 +270,7 @@ class FileDownloader(object):
def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
test = self.params.get('test', False)
# Check for rtmpdump first
try:
@@ -271,6 +292,8 @@ class FileDownloader(object):
basic_args += ['--playpath', play_path]
if tc_url is not None:
basic_args += ['--tcUrl', url]
if test:
basic_args += ['--stop', '1']
args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)]
if self.params.get('verbose', False):
try:
@@ -280,7 +303,7 @@ class FileDownloader(object):
shell_quote = repr
self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(args))
retval = subprocess.call(args)
while retval == 2 or retval == 1:
while (retval == 2 or retval == 1) and not test:
prevsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % prevsize, skip_eol=True)
time.sleep(5.0) # This seems to be needed
@@ -293,7 +316,7 @@ class FileDownloader(object):
self.to_screen(u'\r[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
retval = 0
break
if retval == 0:
if retval == 0 or (test and retval == 2):
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % fsize)
self.try_rename(tmpfilename, filename)
@@ -378,6 +401,7 @@ class FileDownloader(object):
self._hook_progress({
'filename': filename,
'status': 'finished',
'total_bytes': os.path.getsize(encodeFilename(filename)),
})
return True
@@ -524,13 +548,14 @@ class FileDownloader(object):
block_size = self.best_block_size(after - before, len(data_block))
# Progress message
speed_str = self.calc_speed(start, time.time(), byte_counter - resume_len)
speed = self.calc_speed(start, time.time(), byte_counter - resume_len)
if data_len is None:
self.report_progress('Unknown %', data_len_str, speed_str, 'Unknown ETA')
eta = None
else:
percent_str = self.calc_percent(byte_counter, data_len)
eta_str = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent_str, data_len_str, speed_str, eta_str)
percent = self.calc_percent(byte_counter, data_len)
eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent, data_len_str, speed, eta)
self._hook_progress({
'downloaded_bytes': byte_counter,
@@ -538,6 +563,8 @@ class FileDownloader(object):
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'eta': eta,
'speed': speed,
})
# Apply rate limit
@@ -580,6 +607,8 @@ class FileDownloader(object):
* downloaded_bytes: Bytes on disks
* total_bytes: Total bytes, None if unknown
* tmpfilename: The filename we're currently writing to
* eta: The estimated time in seconds, None if unknown
* speed: The download speed in bytes/second, None if unknown
Hooks are guaranteed to be called at least once (with status "finished")
if the download is successful.

View File

@@ -3,7 +3,14 @@ import subprocess
import sys
import time
from .utils import *
from .utils import (
compat_subprocess_get_DEVNULL,
encodeFilename,
PostProcessingError,
shell_quote,
subtitles_filename,
)
class PostProcessor(object):
@@ -82,6 +89,8 @@ class FFmpegPostProcessor(PostProcessor):
+ opts +
[encodeFilename(self._ffmpeg_filename_argument(out_path))])
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(u'[debug] ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout,stderr = p.communicate()
if p.returncode != 0:
@@ -177,7 +186,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
extension = self._preferredcodec
more_opts = []
if self._preferredquality is not None:
if int(self._preferredquality) < 10:
# The opus codec doesn't support the -aq option
if int(self._preferredquality) < 10 and extension != 'opus':
more_opts += [self._exes['avconv'] and '-q:a' or '-aq', self._preferredquality]
else:
more_opts += [self._exes['avconv'] and '-b:a' or '-ab', self._preferredquality + 'k']
@@ -444,8 +454,11 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
if information['ext'] != u'mp4':
self._downloader.to_screen(u'[ffmpeg] Subtitles can only be embedded in mp4 files')
return True, information
sub_langs = [key for key in information['subtitles']]
if not information.get('subtitles'):
self._downloader.to_screen(u'[ffmpeg] There aren\'t any subtitles to embed')
return True, information
sub_langs = [key for key in information['subtitles']]
filename = information['filepath']
input_files = [filename] + [subtitles_filename(filename, lang, self._subformat) for lang in sub_langs]
@@ -464,3 +477,35 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, information
class FFmpegMetadataPP(FFmpegPostProcessor):
def run(self, info):
metadata = {}
if info.get('title') is not None:
metadata['title'] = info['title']
if info.get('upload_date') is not None:
metadata['date'] = info['upload_date']
if info.get('uploader') is not None:
metadata['artist'] = info['uploader']
elif info.get('uploader_id') is not None:
metadata['artist'] = info['uploader_id']
if not metadata:
self._downloader.to_screen(u'[ffmpeg] There isn\'t any metadata to add')
return True, info
filename = info['filepath']
ext = os.path.splitext(filename)[1][1:]
temp_filename = filename + u'.temp'
options = ['-c', 'copy']
for (name, value) in metadata.items():
options.extend(['-metadata', '%s="%s"' % (name, value)])
options.extend(['-f', ext])
self._downloader.to_screen(u'[ffmpeg] Adding metadata to \'%s\'' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, info

View File

@@ -3,6 +3,7 @@
from __future__ import absolute_import
import errno
import io
import os
import re
@@ -70,6 +71,7 @@ class YoutubeDL(object):
logtostderr: Log messages to stderr instead of stdout.
writedescription: Write the video description to a .description file
writeinfojson: Write the video description to a .info.json file
writeannotations: Write the video annotations to a .annotations.xml file
writethumbnail: Write the thumbnail image to a file
writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatic subtitles to a file
@@ -81,7 +83,15 @@ class YoutubeDL(object):
keepvideo: Keep the video file after post-processing
daterange: A DateRange object, download only if the upload_date is in the range.
skip_download: Skip the actual download of the video file
cachedir: Location of the cache files in the filesystem.
None to disable filesystem cache.
noplaylist: Download single video instead of a playlist if in doubt.
age_limit: An integer representing the user's age in years.
Unsuitable videos for the given age are skipped.
downloadarchive: File name of a file where all downloads are recorded.
Videos already present in the file are not downloaded
again.
The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader:
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
@@ -104,6 +114,17 @@ class YoutubeDL(object):
self._download_retcode = 0
self._num_downloads = 0
self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
if (sys.version_info >= (3,) and sys.platform != 'win32' and
sys.getfilesystemencoding() in ['ascii', 'ANSI_X3.4-1968']
and not params['restrictfilenames']):
# On Python 3, the Unicode filesystem API will throw errors (#1474)
self.report_warning(
u'Assuming --restrict-filenames since file system encoding '
u'cannot encode all charactes. '
u'Set the LC_ALL environment variable to fix this.')
params['restrictfilenames'] = True
self.params = params
self.fd = FileDownloader(self, self.params)
@@ -195,10 +216,10 @@ class YoutubeDL(object):
If stderr is a tty file the 'WARNING:' will be colored
'''
if sys.stderr.isatty() and os.name != 'nt':
_msg_header=u'\033[0;33mWARNING:\033[0m'
_msg_header = u'\033[0;33mWARNING:\033[0m'
else:
_msg_header=u'WARNING:'
warning_message=u'%s %s' % (_msg_header,message)
_msg_header = u'WARNING:'
warning_message = u'%s %s' % (_msg_header, message)
self.to_stderr(warning_message)
def report_error(self, message, tb=None):
@@ -213,19 +234,6 @@ class YoutubeDL(object):
error_message = u'%s %s' % (_msg_header, message)
self.trouble(error_message, tb)
def slow_down(self, start_time, byte_counter):
"""Sleep if the download speed is over the rate limit."""
rate_limit = self.params.get('ratelimit', None)
if rate_limit is None or byte_counter == 0:
return
now = time.time()
elapsed = now - start_time
if elapsed <= 0.0:
return
speed = float(byte_counter) / elapsed
if speed > rate_limit:
time.sleep((byte_counter - rate_limit * (now - start_time)) / rate_limit)
def report_writedescription(self, descfn):
""" Report that the description file is being written """
self.to_screen(u'[info] Writing video description to: ' + descfn)
@@ -238,6 +246,10 @@ class YoutubeDL(object):
""" Report that the metadata file has been written """
self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn)
def report_writeannotations(self, annofn):
""" Report that the annotations file has been written. """
self.to_screen(u'[info] Writing video annotations to: ' + annofn)
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
try:
@@ -263,13 +275,15 @@ class YoutubeDL(object):
if template_dict['playlist_index'] is not None:
template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index']
sanitize = lambda k,v: sanitize_filename(
sanitize = lambda k, v: sanitize_filename(
u'NA' if v is None else compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k==u'id'))
template_dict = dict((k, sanitize(k, v)) for k,v in template_dict.items())
is_id=(k == u'id'))
template_dict = dict((k, sanitize(k, v))
for k, v in template_dict.items())
filename = self.params['outtmpl'] % template_dict
tmpl = os.path.expanduser(self.params['outtmpl'])
filename = tmpl % template_dict
return filename
except KeyError as err:
self.report_error(u'Erroneous output template')
@@ -295,15 +309,22 @@ class YoutubeDL(object):
dateRange = self.params.get('daterange', DateRange())
if date not in dateRange:
return u'[download] %s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
age_limit = self.params.get('age_limit')
if age_limit is not None:
if age_limit < info_dict.get('age_limit', 0):
return u'Skipping "' + title + '" because it is age restricted'
if self.in_download_archive(info_dict):
return (u'%(title)s has already been recorded in archive'
% info_dict)
return None
def extract_info(self, url, download=True, ie_key=None, extra_info={}):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result
'''
if ie_key:
ies = [self.get_info_extractor(ie_key)]
else:
@@ -345,7 +366,7 @@ class YoutubeDL(object):
raise
else:
self.report_error(u'no suitable InfoExtractor: %s' % url)
def process_ie_result(self, ie_result, download=True, extra_info={}):
"""
Take the result of the ie(may be modified) and resolve all unresolved
@@ -358,13 +379,7 @@ class YoutubeDL(object):
result_type = ie_result.get('_type', 'video') # If not given we suppose it's a video, support the default old system
if result_type == 'video':
ie_result.update(extra_info)
if 'playlist' not in ie_result:
# It isn't part of a playlist
ie_result['playlist'] = None
ie_result['playlist_index'] = None
if download:
self.process_info(ie_result)
return ie_result
return self.process_video_result(ie_result)
elif result_type == 'url':
# We have to add extra_info to the results because it may be
# contained in a playlist
@@ -375,7 +390,7 @@ class YoutubeDL(object):
elif result_type == 'playlist':
# We process each entry in the playlist
playlist = ie_result.get('title', None) or ie_result.get('id', None)
self.to_screen(u'[download] Downloading playlist: %s' % playlist)
self.to_screen(u'[download] Downloading playlist: %s' % playlist)
playlist_results = []
@@ -393,12 +408,12 @@ class YoutubeDL(object):
self.to_screen(u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
for i,entry in enumerate(entries,1):
self.to_screen(u'[download] Downloading video #%s of %s' %(i, n_entries))
for i, entry in enumerate(entries, 1):
self.to_screen(u'[download] Downloading video #%s of %s' % (i, n_entries))
extra = {
'playlist': playlist,
'playlist_index': i + playliststart,
}
'playlist': playlist,
'playlist_index': i + playliststart,
}
if not 'extractor' in entry:
# We set the extractor, if it's an url it will be set then to
# the new extractor, but if it's already a video we must make
@@ -422,6 +437,103 @@ class YoutubeDL(object):
else:
raise Exception('Invalid result type: %s' % result_type)
def select_format(self, format_spec, available_formats):
if format_spec == 'best' or format_spec is None:
return available_formats[-1]
elif format_spec == 'worst':
return available_formats[0]
else:
extensions = [u'mp4', u'flv', u'webm', u'3gp']
if format_spec in extensions:
filter_f = lambda f: f['ext'] == format_spec
else:
filter_f = lambda f: f['format_id'] == format_spec
matches = list(filter(filter_f, available_formats))
if matches:
return matches[-1]
return None
def process_video_result(self, info_dict, download=True):
assert info_dict.get('_type', 'video') == 'video'
if 'playlist' not in info_dict:
# It isn't part of a playlist
info_dict['playlist'] = None
info_dict['playlist_index'] = None
# This extractors handle format selection themselves
if info_dict['extractor'] in [u'youtube', u'Youku', u'YouPorn', u'mixcloud']:
if download:
self.process_info(info_dict)
return info_dict
# We now pick which formats have to be downloaded
if info_dict.get('formats') is None:
# There's only one format available
formats = [info_dict]
else:
formats = info_dict['formats']
# We check that all the formats have the format and format_id fields
for (i, format) in enumerate(formats):
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
if format.get('format') is None:
format['format'] = u'{id} - {res}{note}'.format(
id=format['format_id'],
res=self.format_resolution(format),
note=u' ({})'.format(format['format_note']) if format.get('format_note') is not None else '',
)
if self.params.get('listformats', None):
self.list_formats(info_dict)
return
format_limit = self.params.get('format_limit', None)
if format_limit:
formats = list(takewhile_inclusive(
lambda f: f['format_id'] != format_limit, formats
))
if self.params.get('prefer_free_formats'):
def _free_formats_key(f):
try:
ext_ord = [u'flv', u'mp4', u'webm'].index(f['ext'])
except ValueError:
ext_ord = -1
# We only compare the extension if they have the same height and width
return (f.get('height'), f.get('width'), ext_ord)
formats = sorted(formats, key=_free_formats_key)
req_format = self.params.get('format', 'best')
if req_format is None:
req_format = 'best'
formats_to_download = []
# The -1 is for supporting YoutubeIE
if req_format in ('-1', 'all'):
formats_to_download = formats
else:
# We can accept formats requestd in the format: 34/5/best, we pick
# the first that is available, starting from left
req_formats = req_format.split('/')
for rf in req_formats:
selected_format = self.select_format(rf, formats)
if selected_format is not None:
formats_to_download = [selected_format]
break
if not formats_to_download:
raise ExtractorError(u'requested format not available')
if download:
if len(formats_to_download) > 1:
self.to_screen(u'[info] %s: downloading video in %s formats' % (info_dict['id'], len(formats_to_download)))
for format in formats_to_download:
new_info = dict(info_dict)
new_info.update(format)
self.process_info(new_info)
# We update the info dict with the best quality format (backwards compatibility)
info_dict.update(formats_to_download[-1])
return info_dict
def process_info(self, info_dict):
"""Process a single resolved IE result."""
@@ -495,10 +607,22 @@ class YoutubeDL(object):
self.report_error(u'Cannot write description file ' + descfn)
return
if self.params.get('writeannotations', False):
try:
annofn = filename + u'.annotations.xml'
self.report_writeannotations(annofn)
with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
annofile.write(info_dict['annotations'])
except (KeyError, TypeError):
self.report_warning(u'There are no annotations to write.')
except (OSError, IOError):
self.report_error(u'Cannot write annotations file: ' + annofn)
return
subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub')])
if subtitles_are_requested and 'subtitles' in info_dict and info_dict['subtitles']:
if subtitles_are_requested and 'subtitles' in info_dict and info_dict['subtitles']:
# subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE
subtitles = info_dict['subtitles']
@@ -520,7 +644,7 @@ class YoutubeDL(object):
infofn = filename + u'.info.json'
self.report_writeinfojson(infofn)
try:
json_info_dict = dict((k, v) for k,v in info_dict.items() if not k in ['urlhandle'])
json_info_dict = dict((k, v) for k, v in info_dict.items() if not k in ['urlhandle'])
write_json_file(json_info_dict, encodeFilename(infofn))
except (OSError, IOError):
self.report_error(u'Cannot write metadata to JSON file ' + infofn)
@@ -532,11 +656,15 @@ class YoutubeDL(object):
thumb_filename = filename.rpartition('.')[0] + u'.' + thumb_format
self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
(info_dict['extractor'], info_dict['id']))
uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
with open(thumb_filename, 'wb') as thumbf:
shutil.copyfileobj(uf, thumbf)
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
try:
uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
with open(thumb_filename, 'wb') as thumbf:
shutil.copyfileobj(uf, thumbf)
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_warning(u'Unable to download thumbnail "%s": %s' %
(info_dict['thumbnail'], compat_str(err)))
if not self.params.get('skip_download', False):
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
@@ -544,11 +672,11 @@ class YoutubeDL(object):
else:
try:
success = self.fd._do_download(filename, info_dict)
except (OSError, IOError) as err:
raise UnavailableVideoError(err)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_error(u'unable to download video data: %s' % str(err))
return
except (OSError, IOError) as err:
raise UnavailableVideoError(err)
except (ContentTooShortError, ) as err:
self.report_error(u'content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
return
@@ -560,6 +688,8 @@ class YoutubeDL(object):
self.report_error(u'postprocessing: %s' % str(err))
return
self.record_download_archive(info_dict)
def download(self, url_list):
"""Download a given list of URLs."""
if len(url_list) > 1 and self.fixed_template():
@@ -584,7 +714,7 @@ class YoutubeDL(object):
keep_video = None
for pp in self._pps:
try:
keep_video_wish,new_info = pp.run(info)
keep_video_wish, new_info = pp.run(info)
if keep_video_wish is not None:
if keep_video_wish:
keep_video = keep_video_wish
@@ -599,3 +729,55 @@ class YoutubeDL(object):
os.remove(encodeFilename(filename))
except (IOError, OSError):
self.report_warning(u'Unable to remove downloaded video file')
def in_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return False
vid_id = info_dict['extractor'] + u' ' + info_dict['id']
try:
with locked_file(fn, 'r', encoding='utf-8') as archive_file:
for line in archive_file:
if line.strip() == vid_id:
return True
except IOError as ioe:
if ioe.errno != errno.ENOENT:
raise
return False
def record_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return
vid_id = info_dict['extractor'] + u' ' + info_dict['id']
with locked_file(fn, 'a', encoding='utf-8') as archive_file:
archive_file.write(vid_id + u'\n')
@staticmethod
def format_resolution(format):
if format.get('height') is not None:
if format.get('width') is not None:
res = u'%sx%s' % (format['width'], format['height'])
else:
res = u'%sp' % format['height']
else:
res = '???'
return res
def list_formats(self, info_dict):
formats_s = []
for format in info_dict.get('formats', [info_dict]):
formats_s.append(u'%-15s: %-5s %-15s[%s]' % (
format['format_id'],
format['ext'],
format.get('format_note') or '-',
self.format_resolution(format),
)
)
if len(formats_s) != 1:
formats_s[0] += ' (worst)'
formats_s[-1] += ' (best)'
formats_s = "\n".join(formats_s)
self.to_screen(u'[info] Available formats for %s:\n'
u'format code extension note resolution\n%s' % (
info_dict['id'], formats_s))

View File

@@ -30,11 +30,14 @@ __authors__ = (
'Pierre Rudloff',
'Huarong Huo',
'Ismael Mejía',
'Steffan \'Ruirize\' James',
'Andras Elso',
)
__license__ = 'Public Domain'
import codecs
import collections
import getpass
import optparse
import os
@@ -44,17 +47,43 @@ import shlex
import socket
import subprocess
import sys
import warnings
import traceback
import platform
from .utils import *
from .utils import (
compat_cookiejar,
compat_print,
compat_str,
compat_urllib_request,
DateRange,
decodeOption,
determine_ext,
DownloadError,
get_cachedir,
make_HTTPS_handler,
MaxDownloadsReached,
platform_name,
preferredencoding,
SameFileError,
std_headers,
write_string,
YoutubeDLHandler,
)
from .update import update_self
from .version import __version__
from .FileDownloader import *
from .FileDownloader import (
FileDownloader,
)
from .extractor import gen_extractors
from .YoutubeDL import YoutubeDL
from .PostProcessor import *
from .PostProcessor import (
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
FFmpegEmbedSubtitlePP,
)
def parseOpts(overrideArguments=None):
def _readOptions(filename_bytes):
@@ -104,7 +133,7 @@ def parseOpts(overrideArguments=None):
def _hide_login_info(opts):
opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username']:
for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
try:
i = opts.index(private_opt)
opts[i+1] = '<PRIVATE>'
@@ -149,7 +178,10 @@ def parseOpts(overrideArguments=None):
general.add_option('-U', '--update',
action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)')
general.add_option('-i', '--ignore-errors',
action='store_true', dest='ignoreerrors', help='continue on download errors', default=False)
action='store_true', dest='ignoreerrors', help='continue on download errors, for example to to skip unavailable videos in a playlist', default=False)
general.add_option('--abort-on-error',
action='store_false', dest='ignoreerrors',
help='Abort downloading of further videos (in the playlist or the command line) if an error occurs')
general.add_option('--dump-user-agent',
action='store_true', dest='dump_user_agent',
help='display the current browser identification', default=False)
@@ -166,6 +198,12 @@ def parseOpts(overrideArguments=None):
help='Output descriptions of all supported extractors', default=False)
general.add_option('--proxy', dest='proxy', default=None, help='Use the specified HTTP/HTTPS proxy', metavar='URL')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
help='Location in the filesystem where youtube-dl can store downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl .')
general.add_option(
'--no-cache-dir', action='store_const', const=None, dest='cachedir',
help='Disable filesystem caching')
selection.add_option('--playlist-start',
@@ -180,6 +218,13 @@ def parseOpts(overrideArguments=None):
selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None)
selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None)
selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None)
selection.add_option('--no-playlist', action='store_true', dest='noplaylist', help='download only the currently playing video', default=False)
selection.add_option('--age-limit', metavar='YEARS', dest='age_limit',
help='download only videos suitable for the given age',
default=None, type=int)
selection.add_option('--download-archive', metavar='FILE',
dest='download_archive',
help='Download only videos not present in the archive file. Record all downloaded videos in it.')
authentication.add_option('-u', '--username',
@@ -193,7 +238,7 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT',
action='store', dest='format', metavar='FORMAT', default='best',
help='video format code, specifiy the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
@@ -225,11 +270,11 @@ def parseOpts(overrideArguments=None):
help='languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
downloader.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='LIMIT', help='maximum download rate (e.g. 50k or 44.6m)')
dest='ratelimit', metavar='LIMIT', help='maximum download rate in bytes per second (e.g. 50K or 4.2M)')
downloader.add_option('-R', '--retries',
dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10)
downloader.add_option('--buffer-size',
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16k) (default is %default)', default="1024")
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16K) (default is %default)', default="1024")
downloader.add_option('--no-resize-buffer',
action='store_true', dest='noresizebuffer',
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
@@ -271,6 +316,10 @@ def parseOpts(overrideArguments=None):
verbosity.add_option('--dump-intermediate-pages',
action='store_true', dest='dump_intermediate_pages', default=False,
help='print downloaded pages to debug problems(very verbose)')
verbosity.add_option('--youtube-print-sig-code',
action='store_true', dest='youtube_print_sig_code', default=False,
help=optparse.SUPPRESS_HELP)
filesystem.add_option('-t', '--title',
action='store_true', dest='usetitle', help='use title in file name (default)', default=False)
@@ -286,7 +335,9 @@ def parseOpts(overrideArguments=None):
help=('output filename template. Use %(title)s to get the title, '
'%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
'%(autonumber)s to get an automatically incremented number, '
'%(ext)s for the filename extension, %(upload_date)s for the upload date (YYYYMMDD), '
'%(ext)s for the filename extension, '
'%(format)s for the format description (like "22 - 1280x720" or "HD")'
'%(upload_date)s for the upload date (YYYYMMDD), '
'%(extractor)s for the provider (youtube, metacafe, etc), '
'%(id)s for the video id , %(playlist)s for the playlist the video is in, '
'%(playlist_index)s for the position in the playlist and %% for a literal percent. '
@@ -320,6 +371,9 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('--write-info-json',
action='store_true', dest='writeinfojson',
help='write video metadata to a .info.json file', default=False)
filesystem.add_option('--write-annotations',
action='store_true', dest='writeannotations',
help='write video annotations to a .annotation file', default=False)
filesystem.add_option('--write-thumbnail',
action='store_true', dest='writethumbnail',
help='write thumbnail image to disk', default=False)
@@ -339,6 +393,8 @@ def parseOpts(overrideArguments=None):
help='do not overwrite post-processed files; the post-processed files are overwritten by default')
postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False,
help='embed subtitles in the video (only for mp4 videos)')
postproc.add_option('--add-metadata', action='store_true', dest='addmetadata', default=False,
help='add metadata to the files')
parser.add_option_group(general)
@@ -358,9 +414,13 @@ def parseOpts(overrideArguments=None):
else:
xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home:
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
else:
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
systemConf = _readOptions('/etc/youtube-dl.conf')
userConf = _readOptions(userConfFile)
commandLineConf = sys.argv[1:]
@@ -425,27 +485,7 @@ def _real_main(argv=None):
all_urls = batchurls + args
all_urls = [url.strip() for url in all_urls]
# General configuration
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders =[]
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(300) # 5 minutes should be enough (famous last words)
opener = _setup_opener(jar=jar, opts=opts)
extractors = gen_extractors()
@@ -462,6 +502,8 @@ def _real_main(argv=None):
if not ie._WORKING:
continue
desc = getattr(ie, 'IE_DESC', ie.IE_NAME)
if desc is False:
continue
if hasattr(ie, 'SEARCH_KEY'):
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise')
_COUNTS = (u'', u'5', u'10', u'all')
@@ -550,6 +592,10 @@ def _real_main(argv=None):
or (opts.useid and u'%(id)s.%(ext)s')
or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
or u'%(title)s-%(id)s.%(ext)s')
if '%(ext)s' not in outtmpl and opts.extractaudio:
parser.error(u'Cannot download a video and extract audio into the same'
u' file! Use "%%(ext)s" instead of %r' %
determine_ext(outtmpl, u''))
# YoutubeDL
ydl = YoutubeDL({
@@ -584,11 +630,13 @@ def _real_main(argv=None):
'progress_with_newline': opts.progress_with_newline,
'playliststart': opts.playliststart,
'playlistend': opts.playlistend,
'noplaylist': opts.noplaylist,
'logtostderr': opts.outtmpl == '-',
'consoletitle': opts.consoletitle,
'nopart': opts.nopart,
'updatetime': opts.updatetime,
'writedescription': opts.writedescription,
'writeannotations': opts.writeannotations,
'writeinfojson': opts.writeinfojson,
'writethumbnail': opts.writethumbnail,
'writesubtitles': opts.writesubtitles,
@@ -608,6 +656,10 @@ def _real_main(argv=None):
'min_filesize': opts.min_filesize,
'max_filesize': opts.max_filesize,
'daterange': date,
'cachedir': opts.cachedir,
'youtube_print_sig_code': opts.youtube_print_sig_code,
'age_limit': opts.age_limit,
'download_archive': opts.download_archive,
})
if opts.verbose:
@@ -627,11 +679,19 @@ def _real_main(argv=None):
except:
pass
write_string(u'[debug] Python version %s - %s' %(platform.python_version(), platform_name()) + u'\n')
write_string(u'[debug] Proxy map: ' + str(proxy_handler.proxies) + u'\n')
proxy_map = {}
for handler in opener.handlers:
if hasattr(handler, 'proxies'):
proxy_map.update(handler.proxies)
write_string(u'[debug] Proxy map: ' + compat_str(proxy_map) + u'\n')
ydl.add_default_info_extractors()
# PostProcessors
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:
ydl.add_post_processor(FFmpegMetadataPP())
if opts.extractaudio:
ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo:
@@ -641,7 +701,7 @@ def _real_main(argv=None):
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose, sys.argv[0])
update_self(ydl.to_screen, opts.verbose)
# Maybe do nothing
if len(all_urls) < 1:
@@ -660,11 +720,42 @@ def _real_main(argv=None):
if opts.cookiefile is not None:
try:
jar.save()
except (IOError, OSError) as err:
except (IOError, OSError):
sys.exit(u'ERROR: unable to save cookie jar')
sys.exit(retcode)
def _setup_opener(jar=None, opts=None, timeout=300):
if opts is None:
FakeOptions = collections.namedtuple(
'FakeOptions', ['proxy', 'no_check_certificate'])
opts = FakeOptions(proxy=None, no_check_certificate=False)
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(
https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders = []
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(timeout)
return opener
def main(argv=None):
try:
_real_main(argv)

View File

@@ -2,7 +2,12 @@ from .appletrailers import AppleTrailersIE
from .addanime import AddAnimeIE
from .archiveorg import ArchiveOrgIE
from .ard import ARDIE
from .arte import ArteTvIE
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVFutureIE,
)
from .auengine import AUEngineIE
from .bandcamp import BandcampIE
from .bliptv import BlipTVIE, BlipTVUserIE
@@ -12,27 +17,40 @@ from .brightcove import BrightcoveIE
from .c56 import C56IE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .cinemassacre import CinemassacreIE
from .cnn import CNNIE
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE
from .condenast import CondeNastIE
from .criterion import CriterionIE
from .cspan import CSpanIE
from .dailymotion import DailymotionIE, DailymotionPlaylistIE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
)
from .daum import DaumIE
from .depositfiles import DepositFilesIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .escapist import EscapistIE
from .exfm import ExfmIE
from .facebook import FacebookIE
from .faz import FazIE
from .fktv import (
FKTVIE,
FKTVPosteckeIE,
)
from .flickr import FlickrIE
from .francetv import (
PluzzIE,
FranceTvInfoIE,
France2IE,
GenerationQuoiIE
)
from .freesound import FreesoundIE
from .funnyordie import FunnyOrDieIE
@@ -49,6 +67,7 @@ from .ign import IGNIE, OneUPIE
from .ina import InaIE
from .infoq import InfoQIE
from .instagram import InstagramIE
from .internetvideoarchive import InternetVideoArchiveIE
from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE
from .justintv import JustinTVIE
@@ -68,6 +87,9 @@ from .myvideo import MyVideoIE
from .naver import NaverIE
from .nba import NBAIE
from .nbc import NBCNewsIE
from .newgrounds import NewgroundsIE
from .nhl import NHLIE, NHLVideocenterIE
from .nowvideo import NowVideoIE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .pbs import PBSIE
@@ -77,8 +99,10 @@ from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE
from .rutube import RutubeIE
from .sina import SinaIE
from .slashdot import SlashdotIE
from .slideshare import SlideshareIE
@@ -89,7 +113,9 @@ from .spiegel import SpiegelIE
from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE
from .steam import SteamIE
from .sztvhu import SztvHuIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tf1 import TF1IE
from .thisav import ThisAVIE
@@ -105,10 +131,14 @@ from .veehd import VeeHDIE
from .veoh import VeohIE
from .vevo import VevoIE
from .vice import ViceIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .vimeo import VimeoIE, VimeoChannelIE
from .vine import VineIE
from .wat import WatIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .worldstarhiphop import WorldStarHipHopIE
@@ -128,6 +158,7 @@ from .youtube import (
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeRecommendedIE,
YoutubeTruncatedURLIE,
YoutubeWatchLaterIE,
YoutubeFavouritesIE,
)

View File

@@ -1,8 +1,10 @@
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
determine_ext,
)
@@ -14,10 +16,9 @@ class AppleTrailersIE(InfoExtractor):
u"playlist": [
{
u"file": u"manofsteel-trailer4.mov",
u"md5": u"11874af099d480cc09e103b189805d5f",
u"md5": u"d97a8e575432dbcb81b7c3acb741f8a8",
u"info_dict": {
u"duration": 111,
u"thumbnail": u"http://trailers.apple.com/trailers/wb/manofsteel/images/thumbnail_11624.jpg",
u"title": u"Trailer 4",
u"upload_date": u"20130523",
u"uploader_id": u"wb",
@@ -25,10 +26,9 @@ class AppleTrailersIE(InfoExtractor):
},
{
u"file": u"manofsteel-trailer3.mov",
u"md5": u"07a0a262aae5afe68120eed61137ab34",
u"md5": u"b8017b7131b721fb4e8d6f49e1df908c",
u"info_dict": {
u"duration": 182,
u"thumbnail": u"http://trailers.apple.com/trailers/wb/manofsteel/images/thumbnail_10793.jpg",
u"title": u"Trailer 3",
u"upload_date": u"20130417",
u"uploader_id": u"wb",
@@ -36,10 +36,9 @@ class AppleTrailersIE(InfoExtractor):
},
{
u"file": u"manofsteel-trailer.mov",
u"md5": u"e401fde0813008e3307e54b6f384cff1",
u"md5": u"d0f1e1150989b9924679b441f3404d48",
u"info_dict": {
u"duration": 148,
u"thumbnail": u"http://trailers.apple.com/trailers/wb/manofsteel/images/thumbnail_8703.jpg",
u"title": u"Trailer",
u"upload_date": u"20121212",
u"uploader_id": u"wb",
@@ -47,10 +46,9 @@ class AppleTrailersIE(InfoExtractor):
},
{
u"file": u"manofsteel-teaser.mov",
u"md5": u"76b392f2ae9e7c98b22913c10a639c97",
u"md5": u"5fe08795b943eb2e757fa95cb6def1cb",
u"info_dict": {
u"duration": 93,
u"thumbnail": u"http://trailers.apple.com/trailers/wb/manofsteel/images/thumbnail_6899.jpg",
u"title": u"Teaser",
u"upload_date": u"20120721",
u"uploader_id": u"wb",
@@ -59,87 +57,61 @@ class AppleTrailersIE(InfoExtractor):
]
}
_JSON_RE = r'iTunes.playURL\((.*?)\);'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
movie = mobj.group('movie')
uploader_id = mobj.group('company')
playlist_url = url.partition(u'?')[0] + u'/includes/playlists/web.inc'
playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
playlist_snippet = self._download_webpage(playlist_url, movie)
playlist_cleaned = re.sub(r'(?s)<script>.*?</script>', u'', playlist_snippet)
playlist_cleaned = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', playlist_snippet)
playlist_cleaned = re.sub(r'<img ([^<]*?)>', r'<img \1/>', playlist_cleaned)
# The ' in the onClick attributes are not escaped, it couldn't be parsed
# with xml.etree.ElementTree.fromstring
# like: http://trailers.apple.com/trailers/wb/gravity/
def _clean_json(m):
return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
playlist_cleaned = re.sub(self._JSON_RE, _clean_json, playlist_cleaned)
playlist_html = u'<html>' + playlist_cleaned + u'</html>'
size_cache = {}
doc = xml.etree.ElementTree.fromstring(playlist_html)
playlist = []
for li in doc.findall('./div/ul/li'):
title = li.find('.//h3').text
on_click = li.find('.//a').attrib['onClick']
trailer_info_json = self._search_regex(self._JSON_RE,
on_click, u'trailer info')
trailer_info = json.loads(trailer_info_json)
title = trailer_info['title']
video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
thumbnail = li.find('.//img').attrib['src']
upload_date = trailer_info['posted'].replace('-', '')
date_el = li.find('.//p')
upload_date = None
m = re.search(r':\s?(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<year>[0-9]{2})', date_el.text)
if m:
upload_date = u'20' + m.group('year') + m.group('month') + m.group('day')
runtime_el = date_el.find('./br')
m = re.search(r':\s?(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime_el.tail)
runtime = trailer_info['runtime']
m = re.search(r'(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime)
duration = None
if m:
duration = 60 * int(m.group('minutes')) + int(m.group('seconds'))
first_url = trailer_info['url']
trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
settings_json = self._download_webpage(settings_json_url, trailer_id, u'Downloading settings json')
settings = json.loads(settings_json)
formats = []
for formats_el in li.findall('.//a'):
if formats_el.attrib['class'] != 'OverlayPanel':
continue
target = formats_el.attrib['target']
format_code = formats_el.text
if 'Automatic' in format_code:
continue
size_q = formats_el.attrib['href']
size_id = size_q.rpartition('#videos-')[2]
if size_id not in size_cache:
size_url = url + size_q
sizepage_html = self._download_webpage(
size_url, movie,
note=u'Downloading size info %s' % size_id,
errnote=u'Error while downloading size info %s' % size_id,
)
_doc = xml.etree.ElementTree.fromstring(sizepage_html)
size_cache[size_id] = _doc
sizepage_doc = size_cache[size_id]
links = sizepage_doc.findall('.//{http://www.w3.org/1999/xhtml}ul/{http://www.w3.org/1999/xhtml}li/{http://www.w3.org/1999/xhtml}a')
for vid_a in links:
href = vid_a.get('href')
if not href.endswith(target):
continue
detail_q = href.partition('#')[0]
detail_url = url + '/' + detail_q
m = re.match(r'includes/(?P<detail_id>[^/]+)/', detail_q)
detail_id = m.group('detail_id')
detail_html = self._download_webpage(
detail_url, movie,
note=u'Downloading detail %s %s' % (detail_id, size_id),
errnote=u'Error while downloading detail %s %s' % (detail_id, size_id)
)
detail_doc = xml.etree.ElementTree.fromstring(detail_html)
movie_link_el = detail_doc.find('.//{http://www.w3.org/1999/xhtml}a')
assert movie_link_el.get('class') == 'movieLink'
movie_link = movie_link_el.get('href').partition('?')[0].replace('_', '_h')
ext = determine_ext(movie_link)
assert ext == 'mov'
formats.append({
'format': format_code,
'ext': ext,
'url': movie_link,
})
for format in settings['metadata']['sizes']:
# The src is a file pointing to the real video file
format_url = re.sub(r'_(\d*p.mov)', r'_h\1', format['src'])
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format': format['type'],
'width': format['width'],
'height': int(format['height']),
})
formats = sorted(formats, key=lambda f: (f['height'], f['width']))
info = {
'_type': 'video',

View File

@@ -1,3 +1,4 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
@@ -7,15 +8,15 @@ from ..utils import (
ExtractorError,
find_xpath_attr,
unified_strdate,
determine_ext,
get_element_by_id,
)
# There are different sources of video in arte.tv, the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTvIE(InfoExtractor):
"""
There are two sources of video in arte.tv: videos.arte.tv and
www.arte.tv/guide, the extraction process is different for each one.
The videos expire in 7 days, so we can't add tests.
"""
_EMISSION_URL = r'(?:http://)?www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html'
_LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
_LIVE_URL = r'index-[0-9]+\.html$'
@@ -24,7 +25,7 @@ class ArteTvIE(InfoExtractor):
@classmethod
def suitable(cls, url):
return any(re.match(regex, url) for regex in (cls._EMISSION_URL, cls._VIDEOS_URL, cls._LIVEWEB_URL))
return any(re.match(regex, url) for regex in (cls._VIDEOS_URL, cls._LIVEWEB_URL))
# TODO implement Live Stream
# from ..utils import compat_urllib_parse
@@ -55,14 +56,6 @@ class ArteTvIE(InfoExtractor):
# video_url = u'%s/%s' % (info.get('url'), info.get('path'))
def _real_extract(self, url):
mobj = re.match(self._EMISSION_URL, url)
if mobj is not None:
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return self._extract_emission(url, video_id, lang)
mobj = re.match(self._VIDEOS_URL, url)
if mobj is not None:
id = mobj.group('id')
@@ -80,49 +73,6 @@ class ArteTvIE(InfoExtractor):
# self.extractLiveStream(url)
# return
def _extract_emission(self, url, video_id, lang):
"""Extract from www.arte.tv/guide"""
webpage = self._download_webpage(url, video_id)
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info['VDA'].split(' ')[0]),
'thumbnail': player_info['programImage'],
'ext': 'flv',
}
formats = player_info['VSR'].values()
def _match_lang(f):
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, formats)
# We order the formats by quality
formats = sorted(formats, key=lambda f: int(f['height']))
# Prefer videos without subtitles in the same language
formats = sorted(formats, key=lambda f: re.match(r'VO(F|A)-STM\1', f['versionCode']) is None)
# Pick the best quality
format_info = formats[-1]
if format_info['mediaType'] == u'rtmp':
info_dict['url'] = format_info['streamer']
info_dict['play_path'] = 'mp4:' + format_info['url']
else:
info_dict['url'] = format_info['url']
return info_dict
def _extract_video(self, url, video_id, lang):
"""Extract from videos.arte.tv"""
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
@@ -172,3 +122,123 @@ class ArteTvIE(InfoExtractor):
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
}
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = u'arte.tv:+7'
_VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {
'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info.get('VDA', '').split(' ')[0]),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
formats = player_info['VSR'].values()
def _match_lang(f):
if f.get('versionCode') is None:
return True
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, formats)
# Some formats use the m3u8 protocol
formats = filter(lambda f: f.get('videoFormat') != 'M3U8', formats)
# We order the formats by quality
formats = list(formats) # in python3 filter returns an iterator
if re.match(r'[A-Z]Q', formats[0]['quality']) is not None:
sort_key = lambda f: ['HQ', 'MQ', 'EQ', 'SQ'].index(f['quality'])
else:
sort_key = lambda f: int(f.get('height',-1))
formats = sorted(formats, key=sort_key)
# Prefer videos without subtitles in the same language
formats = sorted(formats, key=lambda f: re.match(r'VO(F|A)-STM\1', f.get('versionCode', '')) is None)
# Pick the best quality
def _format(format_info):
quality = format_info['quality']
m_quality = re.match(r'\w*? - (\d*)p', quality)
if m_quality is not None:
quality = m_quality.group(1)
if format_info.get('versionCode') is not None:
format_id = u'%s-%s' % (quality, format_info['versionCode'])
else:
format_id = quality
info = {
'format_id': format_id,
'format_note': format_info.get('versionLibelle'),
'width': format_info.get('width'),
'height': format_info.get('height'),
}
if format_info['mediaType'] == u'rtmp':
info['url'] = format_info['streamer']
info['play_path'] = 'mp4:' + format_info['url']
info['ext'] = 'flv'
else:
info['url'] = format_info['url']
info['ext'] = determine_ext(info['url'])
return info
info_dict['formats'] = [_format(f) for f in formats]
return info_dict
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/magazine?/(?P<id>.+)'
_TEST = {
u'url': u'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
u'file': u'050489-002.mp4',
u'info_dict': {
u'title': u'Agentur Amateur / Agence Amateur #2 : Corporate Design',
},
}
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
_TEST = {
u'url': u'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
u'file': u'050940-003.mp4',
u'info_dict': {
u'title': u'Les champignons au secours de la planète',
},
}
def _real_extract(self, url):
anchor_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, anchor_id)
row = get_element_by_id(anchor_id, webpage)
return self._extract_from_webpage(row, anchor_id, lang)

View File

@@ -115,7 +115,7 @@ class BlipTVIE(InfoExtractor):
ext = umobj.group(1)
info = {
'id': data['item_id'],
'id': compat_str(data['item_id']),
'url': video_url,
'uploader': data['display_name'],
'upload_date': upload_date,

View File

@@ -1,3 +1,5 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
@@ -7,15 +9,39 @@ from ..utils import (
compat_urllib_parse,
find_xpath_attr,
compat_urlparse,
ExtractorError,
)
class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s'
# There is a test for Brigtcove in GenericIE, that way we test both the download
# and the detection of videos, and we don't have to find an URL that is always valid
_TESTS = [
{
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
u'file': u'2371591881001.mp4',
u'md5': u'9e80619e0a94663f0bdc849b4566af19',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
}
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1217746023001&flashID=myPlayer&%40videoPlayer=1785452137001',
u'file': u'1785452137001.flv',
u'info_dict': {
u'title': u'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
u'description': u'John Rose speaks at the JVM Language Summit, August 1, 2012.',
u'uploader': u'Oracle',
},
},
]
@classmethod
def _build_brighcove_url(cls, object_str):
@@ -23,6 +49,13 @@ class BrightcoveIE(InfoExtractor):
Build a Brightcove url from a xml string containing
<object class="BrightcoveExperience">{params}</object>
"""
# Fix up some stupid HTML, see https://github.com/rg3/youtube-dl/issues/1553
object_str = re.sub(r'(<param name="[^"]+" value="[^"]+")>',
lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/rg3/youtube-dl/issues/1608
object_str = object_str.replace(u'<--', u'<!--')
object_doc = xml.etree.ElementTree.fromstring(object_str)
assert u'BrightcoveExperience' in object_doc.attrib['class']
params = {'flashID': object_doc.attrib['id'],
@@ -65,22 +98,37 @@ class BrightcoveIE(InfoExtractor):
playlist_info = self._download_webpage(self._PLAYLIST_URL_TEMPLATE % player_key,
player_key, u'Downloading playlist information')
playlist_info = json.loads(playlist_info)['videoList']
json_data = json.loads(playlist_info)
if 'videoList' not in json_data:
raise ExtractorError(u'Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
return self.playlist_result(videos, playlist_id=playlist_info['id'],
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
renditions = video_info['renditions']
renditions = sorted(renditions, key=lambda r: r['size'])
best_format = renditions[-1]
info = {
'id': video_info['id'],
'title': video_info['displayName'],
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
return {'id': video_info['id'],
'title': video_info['displayName'],
'url': best_format['defaultURL'],
renditions = video_info.get('renditions')
if renditions:
renditions = sorted(renditions, key=lambda r: r['size'])
best_format = renditions[-1]
info.update({
'url': best_format['defaultURL'],
'ext': 'mp4',
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
})
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'ext': 'flv',
})
else:
raise ExtractorError(u'Unable to extract video url for %s' % info['id'])
return info

View File

@@ -0,0 +1,91 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class CinemassacreIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?(?P<url>cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?)(?:[/?].*)?'
_TESTS = [{
u'url': u'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
u'file': u'19911.flv',
u'info_dict': {
u'upload_date': u'20121110',
u'title': u'“Angry Video Game Nerd: The Movie” Trailer',
u'description': u'md5:fb87405fcb42a331742a0dce2708560b',
},
u'params': {
# rtmp download
u'skip_download': True,
},
},
{
u'url': u'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
u'file': u'521be8ef82b16.flv',
u'info_dict': {
u'upload_date': u'20131002',
u'title': u'The Mummys Hand (1940)',
},
u'params': {
# rtmp download
u'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
webpage = self._download_webpage(webpage_url, None) # Don't know video id yet
video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')
mobj = re.search(r'src="(?P<embed_url>http://player\.screenwavemedia\.com/play/(?:embed|player)\.php\?id=(?:Cinemassacre-)?(?P<video_id>.+?))"', webpage)
if not mobj:
raise ExtractorError(u'Can\'t extract embed url and video id')
playerdata_url = mobj.group(u'embed_url')
video_id = mobj.group(u'video_id')
video_title = self._html_search_regex(r'<title>(?P<title>.+?)\|',
webpage, u'title')
video_description = self._html_search_regex(r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, u'description', flags=re.DOTALL, fatal=False)
if len(video_description) == 0:
video_description = None
playerdata = self._download_webpage(playerdata_url, video_id)
base_url = self._html_search_regex(r'\'streamer\': \'(?P<base_url>rtmp://.*?)/(?:vod|Cinemassacre)\'',
playerdata, u'base_url')
base_url += '/Cinemassacre/'
# Important: The file names in playerdata are not used by the player and even wrong for some videos
sd_file = 'Cinemassacre-%s_high.mp4' % video_id
hd_file = 'Cinemassacre-%s.mp4' % video_id
video_thumbnail = 'http://image.screenwavemedia.com/Cinemassacre/Cinemassacre-%s_thumb_640x360.jpg' % video_id
formats = [
{
'url': base_url + sd_file,
'ext': 'flv',
'format': 'sd',
'format_id': 'sd',
},
{
'url': base_url + hd_file,
'ext': 'flv',
'format': 'hd',
'format_id': 'hd',
},
]
info = {
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -51,12 +51,12 @@ class ComedyCentralIE(InfoExtractor):
'400': 'mp4',
}
_video_dimensions = {
'3500': '1280x720',
'2200': '960x540',
'1700': '768x432',
'1200': '640x360',
'750': '512x288',
'400': '384x216',
'3500': (1280, 720),
'2200': (960, 540),
'1700': (768, 432),
'1200': (640, 360),
'750': (512, 288),
'400': (384, 216),
}
@classmethod
@@ -64,11 +64,13 @@ class ComedyCentralIE(InfoExtractor):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
def _print_formats(self, formats):
print('Available formats:')
for x in formats:
print('%s\t:\t%s\t[%s]' %(x, self._video_extensions.get(x, 'mp4'), self._video_dimensions.get(x, '???')))
@staticmethod
def _transform_rtmp_url(rtmp_video_url):
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
return base + m.group('finalid')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
@@ -155,40 +157,31 @@ class ComedyCentralIE(InfoExtractor):
self._downloader.report_error(u'unable to download ' + mediaId + ': No videos found')
continue
if self._downloader.params.get('listformats', None):
self._print_formats([i[0] for i in turls])
return
# For now, just pick the highest bitrate
format,rtmp_video_url = turls[-1]
# Get the format arg from the arg stream
req_format = self._downloader.params.get('format', None)
# Select format if we can find one
for f,v in turls:
if f == req_format:
format, rtmp_video_url = f, v
break
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
video_url = base + m.group('finalid')
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'format_id': format,
'height': h,
'width': w,
})
effTitle = showId + u'-' + epTitle + u' part ' + compat_str(partNum+1)
info = {
'id': shortMediaId,
'url': video_url,
'formats': formats,
'uploader': showId,
'upload_date': officialDate,
'title': effTitle,
'ext': 'mp4',
'format': format,
'thumbnail': None,
'description': compat_str(officialTitle),
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
results.append(info)
return results

View File

@@ -14,6 +14,7 @@ from ..utils import (
clean_html,
compiled_regex_type,
ExtractorError,
RegexNotFoundError,
unescapeHTML,
)
@@ -35,6 +36,8 @@ class InfoExtractor(object):
title: Video title, unescaped.
ext: Video filename extension.
Instead of url and ext, formats can also specified.
The following fields are optional:
format: The video format, defaults to ext (used for --get-format)
@@ -52,8 +55,23 @@ class InfoExtractor(object):
view_count: How many users have watched the video on the platform.
urlhandle: [internal] The urlHandle to be used to download the file,
like returned by urllib.request.urlopen
age_limit: Age restriction for the video, as an integer (years)
formats: A list of dictionaries for each format available, it must
be ordered from worst to best quality. Potential fields:
* url Mandatory. The URL of the video file
* ext Will be calculated from url if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
Calculated from the format_id, width, height
and format_note fields if missing.
* format_id A short description of the format
("mp4_h264_opus" or "19")
* format_note Additional info about the format
("3D" or "DASH video")
* width Width of the video, if known
* height Height of the video, if known
The fields should all be Unicode strings.
Unless mentioned otherwise, the fields should be Unicode strings.
Subclasses of this one should re-define the _real_initialize() and
_real_extract() methods and define a _VALID_URL regexp.
@@ -214,7 +232,7 @@ class InfoExtractor(object):
Perform a regex search on the given string, using a single or a list of
patterns returning the first matching group.
In case of failure return a default value or raise a WARNING or a
ExtractorError, depending on fatal, specifying the field name.
RegexNotFoundError, depending on fatal, specifying the field name.
"""
if isinstance(pattern, (str, compat_str, compiled_regex_type)):
mobj = re.search(pattern, string, flags)
@@ -234,7 +252,7 @@ class InfoExtractor(object):
elif default is not None:
return default
elif fatal:
raise ExtractorError(u'Unable to extract %s' % _name)
raise RegexNotFoundError(u'Unable to extract %s' % _name)
else:
self._downloader.report_warning(u'unable to extract %s; '
u'please report this issue on http://yt-dl.org/bug' % _name)
@@ -305,6 +323,15 @@ class InfoExtractor(object):
self._og_regex('video')],
html, name, **kargs)
def _rta_search(self, html):
# See http://www.rtalabel.org/index.php?content=howtofaq#single
if re.search(r'(?ix)<meta\s+name="rating"\s+'
r' content="RTA-5042-1996-1400-1577-RTA"',
html):
return 18
return 0
class SearchInfoExtractor(InfoExtractor):
"""
Base class for paged search queries extractors.
@@ -342,7 +369,7 @@ class SearchInfoExtractor(InfoExtractor):
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
raise NotImplementedError("This method must be implemented by sublclasses")
raise NotImplementedError("This method must be implemented by subclasses")
@property
def SEARCH_KEY(self):

View File

@@ -10,25 +10,58 @@ from ..utils import (
compat_str,
get_element_by_attribute,
get_element_by_id,
orderedSet,
ExtractorError,
)
class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
return request
class DailymotionIE(SubtitlesInfoExtractor):
class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
"""Information Extractor for Dailymotion"""
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
IE_NAME = u'dailymotion'
_TEST = {
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Amphora Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
}
_FORMATS = [
(u'stream_h264_ld_url', u'ld'),
(u'stream_h264_url', u'standard'),
(u'stream_h264_hq_url', u'hq'),
(u'stream_h264_hd_url', u'hd'),
(u'stream_h264_hd1080_url', u'hd180'),
]
_TESTS = [
{
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Amphora Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
},
# Vevo video
{
u'url': u'http://www.dailymotion.com/video/x149uew_katy-perry-roar-official_musi',
u'file': u'USUV71301934.mp4',
u'info_dict': {
u'title': u'Roar (Official)',
u'uploader': u'Katy Perry',
u'upload_date': u'20130905',
},
u'params': {
u'skip_download': True,
},
u'skip': u'VEVO is only available in some countries',
},
]
def _real_extract(self, url):
# Extract id and simplified title from URL
@@ -36,17 +69,24 @@ class DailymotionIE(SubtitlesInfoExtractor):
video_id = mobj.group(1).split('_')[0].split('?')[0]
video_extension = 'mp4'
url = 'http://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
request = self._build_request(url)
webpage = self._download_webpage(request, video_id)
# Extract URL, uploader and title from webpage
self.report_extraction(video_id)
# It may just embed a vevo video:
m_vevo = re.search(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?videoId=(?P<id>[\w]*)',
webpage)
if m_vevo is not None:
vevo_id = m_vevo.group('id')
self.to_screen(u'Vevo video detected: %s' % vevo_id)
return self.url_result(u'vevo:%s' % vevo_id, ie='Vevo')
video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>',
# Looking for official user
r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'],
@@ -63,19 +103,28 @@ class DailymotionIE(SubtitlesInfoExtractor):
info = self._search_regex(r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE)
info = json.loads(info)
if info.get('error') is not None:
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True)
# TODO: support choosing qualities
for key in ['stream_h264_hd1080_url','stream_h264_hd_url',
'stream_h264_hq_url','stream_h264_url',
'stream_h264_ld_url']:
if info.get(key):#key in info and info[key]:
max_quality = key
self.to_screen(u'Using %s' % key)
break
else:
formats = []
for (key, format_id) in self._FORMATS:
video_url = info.get(key)
if video_url is not None:
m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if m_size is not None:
width, height = m_size.group(1), m_size.group(2)
else:
width, height = None, None
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
if not formats:
raise ExtractorError(u'Unable to extract video URL')
video_url = info[max_quality]
# subtitles
video_subtitles = self.extract_subtitles(video_id)
@@ -85,11 +134,10 @@ class DailymotionIE(SubtitlesInfoExtractor):
return [{
'id': video_id,
'url': video_url,
'formats': formats,
'uploader': video_uploader,
'upload_date': video_upload_date,
'title': self._og_search_title(webpage),
'ext': video_extension,
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url']
}]
@@ -110,29 +158,56 @@ class DailymotionIE(SubtitlesInfoExtractor):
return {}
class DailymotionPlaylistIE(InfoExtractor):
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = u'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
def _extract_entries(self, id):
video_ids = []
for pagenum in itertools.count(1):
webpage = self._download_webpage('https://www.dailymotion.com/playlist/%s/%s' % (playlist_id, pagenum),
playlist_id, u'Downloading page %s' % pagenum)
request = self._build_request(self._PAGE_TEMPLATE % (id, pagenum))
webpage = self._download_webpage(request,
id, u'Downloading page %s' % pagenum)
playlist_el = get_element_by_attribute(u'class', u'video_list', webpage)
video_ids.extend(re.findall(r'data-id="(.+?)" data-ext-id', playlist_el))
video_ids.extend(re.findall(r'data-id="(.+?)"', playlist_el))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in orderedSet(video_ids)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
entries = [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in video_ids]
return {'_type': 'playlist',
'id': playlist_id,
'title': get_element_by_id(u'playlist_name', webpage),
'entries': entries,
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = u'dailymotion:user'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/user/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(url, user)
full_user = self._html_search_regex(
r'<a class="label" href="/%s".*?>(.*?)</' % re.escape(user),
webpage, u'user', flags=re.DOTALL)
return {
'_type': 'playlist',
'id': user,
'title': full_user,
'entries': self._extract_entries(user),
}

View File

@@ -0,0 +1,37 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.ebaumsworld.com/video/watch/83367677/',
u'file': u'83367677.mp4',
u'info_dict': {
u'title': u'A Giant Python Opens The Door',
u'description': u'This is how nightmares start...',
u'uploader': u'jihadpizza',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
config_xml = self._download_webpage(
'http://www.ebaumsworld.com/video/player/%s' % video_id, video_id)
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
video_url = config.find('file').text
return {
'id': video_id,
'title': config.find('title').text,
'url': video_url,
'ext': determine_ext(video_url),
'description': config.find('description').text,
'thumbnail': config.find('image').text,
'uploader': config.find('username').text,
}

View File

@@ -106,8 +106,8 @@ class FacebookIE(InfoExtractor):
video_duration = int(video_data['video_duration'])
thumbnail = video_data['thumbnail_src']
video_title = self._html_search_regex('<h2 class="uiHeaderTitle">([^<]+)</h2>',
webpage, u'title')
video_title = self._html_search_regex(
r'<h2 class="uiHeaderTitle">([^<]*)</h2>', webpage, u'title')
info = {
'id': video_id,

View File

@@ -0,0 +1,60 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
determine_ext,
clean_html,
get_element_by_attribute,
)
class FazIE(InfoExtractor):
IE_NAME = u'faz.net'
_VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+).html'
_TEST = {
u'url': u'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
u'file': u'12610585.mp4',
u'info_dict': {
u'title': u'Stockholm: Chemie-Nobelpreis für drei amerikanische Forscher',
u'description': u'md5:1453fbf9a0d041d985a47306192ea253',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
self.to_screen(video_id)
webpage = self._download_webpage(url, video_id)
config_xml_url = self._search_regex(r'writeFLV\(\'(.+?)\',', webpage,
u'config xml url')
config_xml = self._download_webpage(config_xml_url, video_id,
u'Downloading config xml')
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
encodings = config.find('ENCODINGS')
formats = []
for code in ['LOW', 'HIGH', 'HQ']:
encoding = encodings.find(code)
if encoding is None:
continue
encoding_url = encoding.find('FILENAME').text
formats.append({
'url': encoding_url,
'ext': determine_ext(encoding_url),
'format_id': code.lower(),
})
descr_html = get_element_by_attribute('class', 'Content Copy', webpage)
info = {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': clean_html(descr_html),
'thumbnail': config.find('STILL/STILL_BIG').text,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -0,0 +1,79 @@
import re
import random
import json
from .common import InfoExtractor
from ..utils import (
determine_ext,
get_element_by_id,
clean_html,
)
class FKTVIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
_TEST = {
u'url': u'http://fernsehkritik.tv/folge-1',
u'file': u'00011.flv',
u'info_dict': {
u'title': u'Folge 1 vom 10. April 2007',
u'description': u'md5:fb4818139c7cfe6907d4b83412a6864f',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_thumbnail = 'http://fernsehkritik.tv/images/magazin/folge%d.jpg' % episode
start_webpage = self._download_webpage('http://fernsehkritik.tv/folge-%d/Start' % episode,
episode)
playlist = self._search_regex(r'playlist = (\[.*?\]);', start_webpage,
u'playlist', flags=re.DOTALL)
files = json.loads(re.sub('{[^{}]*?}', '{}', playlist))
# TODO: return a single multipart video
videos = []
for i, _ in enumerate(files, 1):
video_id = '%04d%d' % (episode, i)
video_url = 'http://dl%d.fernsehkritik.tv/fernsehkritik%d%s.flv' % (server, episode, '' if i == 1 else '-%d' % i)
video_title = 'Fernsehkritik %d.%d' % (episode, i)
videos.append({
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': clean_html(get_element_by_id('eptitle', start_webpage)),
'description': clean_html(get_element_by_id('contentlist', start_webpage)),
'thumbnail': video_thumbnail
})
return videos
class FKTVPosteckeIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv:postecke'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/inline-video/postecke.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
_TEST = {
u'url': u'http://fernsehkritik.tv/inline-video/postecke.php?iframe=true&width=625&height=440&ep=120',
u'file': u'0120.flv',
u'md5': u'262f0adbac80317412f7e57b4808e5c4',
u'info_dict': {
u"title": u"Postecke 120"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_id = '%04d' % episode
video_url = 'http://dl%d.fernsehkritik.tv/postecke/postecke%d.flv' % (server, episode)
video_title = 'Postecke %d' % episode
return {
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': video_title,
}

View File

@@ -9,7 +9,7 @@ from ..utils import (
class FlickrIE(InfoExtractor):
"""Information Extractor for Flickr videos"""
_VALID_URL = r'(?:https?://)?(?:www\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_VALID_URL = r'(?:https?://)?(?:www\.|secure\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_TEST = {
u'url': u'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/',
u'file': u'5645318632.mp4',

View File

@@ -1,6 +1,7 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
@@ -34,17 +35,7 @@ class PluzzIE(FranceTVBaseInfoExtractor):
IE_NAME = u'pluzz.francetv.fr'
_VALID_URL = r'https?://pluzz\.francetv\.fr/videos/(.*?)\.html'
_TEST = {
u'url': u'http://pluzz.francetv.fr/videos/allo_rufo_saison5_,88439064.html',
u'file': u'88439064.mp4',
u'info_dict': {
u'title': u'Allô Rufo',
u'description': u'md5:d909f1ebdf963814b65772aea250400e',
},
u'params': {
u'skip_download': True,
},
}
# Can't use tests, videos expire in 7 days
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
@@ -75,3 +66,64 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)"', webpage, u'video id')
return self._extract_video(video_id)
class France2IE(FranceTVBaseInfoExtractor):
IE_NAME = u'france2.fr'
_VALID_URL = r'''(?x)https?://www\.france2\.fr/
(?:
emissions/.*?/videos/(?P<id>\d+)
| emission/(?P<key>[^/?]+)
)'''
_TEST = {
u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
u'file': u'75540104.mp4',
u'info_dict': {
u'title': u'13h15, le samedi...',
u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj.group('key'):
webpage = self._download_webpage(url, mobj.group('key'))
video_id = self._html_search_regex(
r'''(?x)<div\s+class="video-player">\s*
<a\s+href="http://videos.francetv.fr/video/([0-9]+)"\s+
class="francetv-video-player">''',
webpage, u'video ID')
else:
video_id = mobj.group('id')
return self._extract_video(video_id)
class GenerationQuoiIE(InfoExtractor):
IE_NAME = u'france2.fr:generation-quoi'
_VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<name>.*)(\?|$)'
_TEST = {
u'url': u'http://generation-quoi.france2.fr/portrait/garde-a-vous',
u'file': u'k7FJX8VBcvvLmX4wA5Q.mp4',
u'info_dict': {
u'title': u'Génération Quoi - Garde à Vous',
u'uploader': u'Génération Quoi',
},
u'params': {
# It uses Dailymotion
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
info_url = compat_urlparse.urljoin(url, '/medias/video/%s.json' % name)
info_json = self._download_webpage(info_url, name)
info = json.loads(info_json)
return self.url_result('http://www.dailymotion.com/video/%s' % info['id'],
ie='Dailymotion')

View File

@@ -21,7 +21,8 @@ class FunnyOrDieIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'type="video/mp4" src="(.*?)"',
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, u'video URL', flags=re.DOTALL)
info = {

View File

@@ -1,55 +1,59 @@
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
unified_strdate,
compat_urllib_parse,
compat_urlparse,
unescapeHTML,
get_meta_content,
)
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = {
u"url": u"http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
u"file": u"6410818.mp4",
u"file": u"gs-2300-6410818.mp4",
u"md5": u"b2a30deaa8654fcccd43713a6b6a4825",
u"info_dict": {
u"title": u"Arma 3 - Community Guide: SITREP I",
u"upload_date": u"20130627",
u'description': u'Check out this video where some of the basics of Arma 3 is explained.',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('page_id')
page_id = video_id = mobj.group('page_id')
webpage = self._download_webpage(url, page_id)
video_id = self._html_search_regex([r'"og:video" content=".*?\?id=(\d+)"',
r'http://www\.gamespot\.com/videoembed/(\d+)'],
webpage, 'video id')
data = compat_urllib_parse.urlencode({'id': video_id, 'newplayer': '1'})
info_url = 'http://www.gamespot.com/pages/video_player/xml.php?' + data
info_xml = self._download_webpage(info_url, video_id)
doc = xml.etree.ElementTree.fromstring(info_xml)
clip_el = doc.find('./playList/clip')
data_video_json = self._search_regex(r'data-video=\'(.*?)\'', webpage, u'data video')
data_video = json.loads(unescapeHTML(data_video_json))
http_urls = [{'url': node.find('filePath').text,
'rate': int(node.find('rate').text)}
for node in clip_el.find('./httpURI')]
best_quality = sorted(http_urls, key=lambda f: f['rate'])[-1]
video_url = best_quality['url']
title = clip_el.find('./title').text
ext = video_url.rpartition('.')[2]
thumbnail_url = clip_el.find('./screenGrabURI').text
view_count = int(clip_el.find('./views').text)
upload_date = unified_strdate(clip_el.find('./postDate').text)
# Transform the manifest url to a link to the mp4 files
# they are used in mobile devices.
f4m_url = data_video['videoStreams']['f4m_stream']
f4m_path = compat_urlparse.urlparse(f4m_url).path
QUALITIES_RE = r'((,\d+)+,?)'
qualities = self._search_regex(QUALITIES_RE, f4m_path, u'qualities').strip(',').split(',')
http_path = f4m_path[1:].split('/', 1)[1]
http_template = re.sub(QUALITIES_RE, r'%s', http_path)
http_template = http_template.replace('.csmil/manifest.f4m', '')
http_template = compat_urlparse.urljoin('http://video.gamespotcdn.com/', http_template)
formats = []
for q in qualities:
formats.append({
'url': http_template % q,
'ext': 'mp4',
'format_id': q,
})
return [{
'id' : video_id,
'url' : video_url,
'ext' : ext,
'title' : title,
'thumbnail' : thumbnail_url,
'upload_date' : upload_date,
'view_count' : view_count,
}]
info = {
'id': data_video['guid'],
'title': compat_urllib_parse.unquote(data_video['title']),
'formats': formats,
'description': get_meta_content('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -11,6 +11,8 @@ from ..utils import (
compat_urlparse,
ExtractorError,
smuggle_url,
unescapeHTML,
)
from .brightcove import BrightcoveIE
@@ -29,17 +31,17 @@ class GenericIE(InfoExtractor):
u"title": u"R\u00e9gis plante sa Jeep"
}
},
# embedded vimeo video
{
u'url': u'http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/',
u'file': u'2371591881001.mp4',
u'md5': u'9e80619e0a94663f0bdc849b4566af19',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'url': u'http://skillsmatter.com/podcast/home/move-semanticsperfect-forwarding-and-rvalue-references',
u'file': u'22444065.mp4',
u'md5': u'2903896e23df39722c33f015af0666e2',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
u'title': u'ACCU 2011: Move Semantics,Perfect Forwarding, and Rvalue references- Scott Meyers- 13/04/2011',
u"uploader_id": u"skillsmatter",
u"uploader": u"Skills Matter",
}
},
}
]
def report_download_webpage(self, video_id):
@@ -128,16 +130,31 @@ class GenericIE(InfoExtractor):
except ValueError:
# since this is the last-resort InfoExtractor, if
# this error is thrown, it'll be thrown here
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Failed to download URL: %s' % url)
self.report_extraction(video_id)
# Look for BrightCove:
m_brightcove = re.search(r'<object.+?class=([\'"]).*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
m_brightcove = re.search(r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
if m_brightcove is not None:
self.to_screen(u'Brightcove video detected.')
bc_url = BrightcoveIE._build_brighcove_url(m_brightcove.group())
return self.url_result(bc_url, 'Brightcove')
# Look for embedded Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="(https?://player.vimeo.com/video/.+?)"', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
# Look for embedded YouTube player
mobj = re.search(
r'<iframe[^>]+?src="(https?://(?:www\.)?youtube.com/embed/.+?)"', webpage)
if mobj:
surl = unescapeHTML(mobj.group(1))
return self.url_result(surl, 'Youtube')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
@@ -160,12 +177,12 @@ class GenericIE(InfoExtractor):
# HTML5 video
mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Unsupported URL: %s' % url)
# It's possible that one of the regexes
# matched, but returned an empty group:
if mobj.group(1) is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Did not find a valid video URL at %s' % url)
video_url = mobj.group(1)
video_url = compat_urlparse.urljoin(url, video_url)

View File

@@ -41,8 +41,9 @@ class GooglePlusIE(InfoExtractor):
# Extract update date
upload_date = self._html_search_regex(
['title="Timestamp">(.*?)</a>', r'<a.+?class="g-M.+?>(.+?)</a>'],
webpage, u'upload date', fatal=False)
r'''(?x)<a.+?class="o-U-s\s[^"]+"\s+style="display:\s*none"\s*>
([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, u'upload date', fatal=False, flags=re.VERBOSE)
if upload_date:
# Convert timestring to a format suitable for filename
upload_date = datetime.datetime.strptime(upload_date, "%Y-%m-%d")

View File

@@ -7,11 +7,11 @@ from .common import InfoExtractor
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'http://www\.hotnewhiphop.com/.*\.(?P<id>.*)\.html'
_TEST = {
u'url': u"http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html'",
u'url': u"http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html",
u'file': u'1435540.mp3',
u'md5': u'2c2cd2f76ef11a9b3b581e8b232f3d96',
u'info_dict': {
u"title": u"Freddie Gibbs Songs - Lay It Down"
u"title": u"Freddie Gibbs - Lay It Down"
}
}

View File

@@ -13,7 +13,7 @@ class IGNIE(InfoExtractor):
Some videos of it.ign.com are also supported
"""
_VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles)(/.+)?/(?P<name_or_id>.+)'
_VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles|(?:[^/]*/feature))(/.+)?/(?P<name_or_id>.+)'
IE_NAME = u'ign.com'
_CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'
@@ -21,15 +21,39 @@ class IGNIE(InfoExtractor):
r'id="my_show_video">.*?<p>(.*?)</p>',
]
_TEST = {
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'info_dict': {
u'title': u'The Last of Us Review',
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
}
}
_TESTS = [
{
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'info_dict': {
u'title': u'The Last of Us Review',
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
}
},
{
u'url': u'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
u'playlist': [
{
u'file': u'5ebbd138523268b93c9141af17bec937.mp4',
u'info_dict': {
u'title': u'GTA 5 Video Review',
u'description': u'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
},
},
{
u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
u'info_dict': {
u'title': u'GTA 5\'s Twisted Beauty in Super Slow Motion',
u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
},
},
],
u'params': {
u'skip_download': True,
},
},
]
def _find_video_id(self, webpage):
res_id = [r'data-video-id="(.+?)"',
@@ -46,6 +70,13 @@ class IGNIE(InfoExtractor):
if page_type == 'articles':
video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, u'video url')
return self.url_result(video_url, ie='IGN')
elif page_type != 'video':
multiple_urls = re.findall(
'<param name="flashvars" value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
webpage)
if multiple_urls:
return [self.url_result(u, ie='IGN') for u in multiple_urls]
video_id = self._find_video_id(webpage)
result = self._get_video_info(video_id)
description = self._html_search_regex(self._DESCRIPTION_RE,
@@ -87,6 +118,9 @@ class OneUPIE(IGNIE):
}
}
# Override IGN tests
_TESTS = []
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
id = mobj.group('name_or_id')

View File

@@ -0,0 +1,84 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
xpath_with_ns,
determine_ext,
)
class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
_TEST = {
u'url': u'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
u'file': u'452693.mp4',
u'info_dict': {
u'title': u'SKYFALL',
u'description': u'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
u'duration': 153,
},
}
@staticmethod
def _build_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
@staticmethod
def _clean_query(query):
NEEDED_ARGS = ['publishedid', 'customerid']
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k,v[0]) for (k,v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse.urlencode(cleaned_dic)
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query)
video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration_xml = self._download_webpage(url, video_id,
u'Downloading flash configuration')
flashconfiguration = xml.etree.ElementTree.fromstring(flashconfiguration_xml.encode('utf-8'))
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
# Replace some of the parameters in the query to get the best quality
# and http links (no m3u8 manifests)
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info_xml = self._download_webpage(file_url, video_id,
u'Downloading video info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
item = info.find('channel/item')
def _bp(p):
return xpath_with_ns(p,
{'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'})
formats = []
for content in item.findall(_bp('media:group/media:content')):
attr = content.attrib
f_url = attr['url']
formats.append({
'url': f_url,
'ext': determine_ext(f_url),
'width': int(attr['width']),
'bitrate': int(attr['bitrate']),
})
formats = sorted(formats, key=lambda f: f['bitrate'])
return {
'id': video_id,
'title': item.find('title').text,
'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
'description': item.find('description').text,
'duration': int(attr['duration']),
}

View File

@@ -6,13 +6,14 @@ import xml.etree.ElementTree
from .common import InfoExtractor
class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm'
_TEST = {
u'url': u'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
u'file': u'5182.mp4',
u'md5': u'e0fdb0cd3ce98713ef9c1e1e025779d0',
u'md5': u'046e491afb32a8aaac1f44dd4ddd54ee',
u'info_dict': {
u'title': u'GC 2013 : Tearaway nous présente ses papiers d\'identité',
u'description': u'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.\n',
@@ -23,25 +24,29 @@ class JeuxVideoIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
m_download = re.search(r'<param name="flashvars" value="config=(.*?)" />', webpage)
xml_link = m_download.group(1)
xml_link = self._html_search_regex(
r'<param name="flashvars" value="config=(.*?)" />',
webpage, u'config URL')
id = re.search(r'http://www.jeuxvideo.com/config/\w+/0011/(.*?)/\d+_player\.xml', xml_link).group(1)
video_id = self._search_regex(
r'http://www\.jeuxvideo\.com/config/\w+/\d+/(.*?)/\d+_player\.xml',
xml_link, u'video ID')
xml_config = self._download_webpage(xml_link, title,
'Downloading XML config')
xml_config = self._download_webpage(
xml_link, title, u'Downloading XML config')
config = xml.etree.ElementTree.fromstring(xml_config.encode('utf-8'))
info = re.search(r'<format\.json>(.*?)</format\.json>',
xml_config, re.MULTILINE|re.DOTALL).group(1)
info = json.loads(info)['versions'][0]
info_json = self._search_regex(
r'(?sm)<format\.json>(.*?)</format\.json>',
xml_config, u'JSON information')
info = json.loads(info_json)['versions'][0]
video_url = 'http://video720.jeuxvideo.com/' + info['file']
return {'id': id,
'title' : config.find('titre_video').text,
'ext' : 'mp4',
'url' : video_url,
'description': self._og_search_description(webpage),
'thumbnail': config.find('image').text,
}
return {
'id': video_id,
'title': config.find('titre_video').text,
'ext': 'mp4',
'url': video_url,
'description': self._og_search_description(webpage),
'thumbnail': config.find('image').text,
}

View File

@@ -2,7 +2,12 @@ import re
import json
from .common import InfoExtractor
from ..utils import compat_urllib_parse_urlparse, compat_urlparse
from ..utils import (
compat_urllib_parse_urlparse,
compat_urlparse,
get_meta_content,
ExtractorError,
)
class LivestreamIE(InfoExtractor):
@@ -35,8 +40,11 @@ class LivestreamIE(InfoExtractor):
if video_id is None:
# This is an event page:
api_url = self._search_regex(r'event_design_eventId: \'(.+?)\'',
webpage, 'api url')
player = get_meta_content('twitter:player', webpage)
if player is None:
raise ExtractorError('Couldn\'t extract event api url')
api_url = player.replace('/player', '')
api_url = re.sub(r'^(https?://)(new\.)', r'\1api.\2', api_url)
info = json.loads(self._download_webpage(api_url, event_name,
u'Downloading event info'))
videos = [self._extract_video_info(video_data['data'])

View File

@@ -54,23 +54,26 @@ class MTVIE(InfoExtractor):
def _get_thumbnail_url(self, uri, itemdoc):
return 'http://mtv.mtvnimages.com/uri/' + uri
def _extract_video_url(self, metadataXml):
def _extract_video_formats(self, metadataXml):
if '/error_country_block.swf' in metadataXml:
raise ExtractorError(u'This video is not available from your country.', expected=True)
mdoc = xml.etree.ElementTree.fromstring(metadataXml.encode('utf-8'))
renditions = mdoc.findall('.//rendition')
# For now, always pick the highest quality.
rendition = renditions[-1]
try:
_,_,ext = rendition.attrib['type'].partition('/')
format = ext + '-' + rendition.attrib['width'] + 'x' + rendition.attrib['height'] + '_' + rendition.attrib['bitrate']
rtmp_video_url = rendition.find('./src').text
except KeyError:
raise ExtractorError('Invalid rendition field.')
video_url = self._transform_rtmp_url(rtmp_video_url)
return {'ext': ext, 'url': video_url, 'format': format}
formats = []
for rendition in mdoc.findall('.//rendition'):
try:
_, _, ext = rendition.attrib['type'].partition('/')
rtmp_video_url = rendition.find('./src').text
formats.append({'ext': ext,
'url': self._transform_rtmp_url(rtmp_video_url),
'format_id': rendition.get('bitrate'),
'width': int(rendition.get('width')),
'height': int(rendition.get('height')),
})
except (KeyError, TypeError):
raise ExtractorError('Invalid rendition field.')
return formats
def _get_video_info(self, itemdoc):
uri = itemdoc.find('guid').text
@@ -81,19 +84,25 @@ class MTVIE(InfoExtractor):
mediagen_url += '&acceptMethods=fms'
mediagen_page = self._download_webpage(mediagen_url, video_id,
u'Downloading video urls')
video_info = self._extract_video_url(mediagen_page)
description_node = itemdoc.find('description')
if description_node is not None:
description = description_node.text
description = description_node.text.strip()
else:
description = None
video_info.update({'title': itemdoc.find('title').text,
'id': video_id,
'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
})
return video_info
info = {
'title': itemdoc.find('title').text,
'formats': self._extract_video_formats(mediagen_page),
'id': video_id,
'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
return info
def _get_videos_info(self, uri):
video_id = self._id_from_uri(uri)

View File

@@ -0,0 +1,38 @@
import json
import re
from .common import InfoExtractor
from ..utils import determine_ext
class NewgroundsIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?newgrounds\.com/audio/listen/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.newgrounds.com/audio/listen/549479',
u'file': u'549479.mp3',
u'md5': u'fe6033d297591288fa1c1f780386f07a',
u'info_dict': {
u"title": u"B7 - BusMode",
u"uploader": u"Burn7",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
music_id = mobj.group('id')
webpage = self._download_webpage(url, music_id)
title = self._html_search_regex(r',"name":"([^"]+)",', webpage, u'music title')
uploader = self._html_search_regex(r',"artist":"([^"]+)",', webpage, u'music uploader')
music_url_json_string = self._html_search_regex(r'({"url":"[^"]+"),', webpage, u'music url') + '}'
music_url_json = json.loads(music_url_json_string)
music_url = music_url_json['url']
return {
'id': music_id,
'title': title,
'url': music_url,
'uploader': uploader,
'ext': determine_ext(music_url),
}

120
youtube_dl/extractor/nhl.py Normal file
View File

@@ -0,0 +1,120 @@
import re
import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
determine_ext,
unified_strdate,
)
class NHLBaseInfoExtractor(InfoExtractor):
@staticmethod
def _fix_json(json_string):
return json_string.replace('\\\'', '\'')
def _extract_video(self, info):
video_id = info['id']
self.report_extraction(video_id)
initial_video_url = info['publishPoint']
data = compat_urllib_parse.urlencode({
'type': 'fvod',
'path': initial_video_url.replace('.mp4', '_sd.mp4'),
})
path_url = 'http://video.nhl.com/videocenter/servlets/encryptvideopath?' + data
path_response = self._download_webpage(path_url, video_id,
u'Downloading final video url')
path_doc = xml.etree.ElementTree.fromstring(path_response)
video_url = path_doc.find('path').text
join = compat_urlparse.urljoin
return {
'id': video_id,
'title': info['name'],
'url': video_url,
'ext': determine_ext(video_url),
'description': info['description'],
'duration': int(info['duration']),
'thumbnail': join(join(video_url, '/u/'), info['bigImage']),
'upload_date': unified_strdate(info['releaseDate'].split('.')[0]),
}
class NHLIE(NHLBaseInfoExtractor):
IE_NAME = u'nhl.com'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/console\?.*?(?<=[?&])id=(?P<id>\d+)'
_TEST = {
u'url': u'http://video.canucks.nhl.com/videocenter/console?catid=6?id=453614',
u'file': u'453614.mp4',
u'info_dict': {
u'title': u'Quick clip: Weise 4-3 goal vs Flames',
u'description': u'Dale Weise scores his first of the season to put the Canucks up 4-3.',
u'duration': 18,
u'upload_date': u'20131006',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
json_url = 'http://video.nhl.com/videocenter/servlets/playlist?ids=%s&format=json' % video_id
info_json = self._download_webpage(json_url, video_id,
u'Downloading info json')
info_json = self._fix_json(info_json)
info = json.loads(info_json)[0]
return self._extract_video(info)
class NHLVideocenterIE(NHLBaseInfoExtractor):
IE_NAME = u'nhl.com:videocenter'
IE_DESC = u'Download the first 12 videos from a videocenter category'
_VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?.*?catid=(?P<catid>[^&]+))?'
@classmethod
def suitable(cls, url):
if NHLIE.suitable(url):
return False
return super(NHLVideocenterIE, cls).suitable(url)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
team = mobj.group('team')
webpage = self._download_webpage(url, team)
cat_id = self._search_regex(
[r'var defaultCatId = "(.+?)";',
r'{statusIndex:0,index:0,.*?id:(.*?),'],
webpage, u'category id')
playlist_title = self._html_search_regex(
r'tab0"[^>]*?>(.*?)</td>',
webpage, u'playlist title', flags=re.DOTALL).lower().capitalize()
data = compat_urllib_parse.urlencode({
'cid': cat_id,
# This is the default value
'count': 12,
'ptrs': 3,
'format': 'json',
})
path = '/videocenter/servlets/browse?' + data
request_url = compat_urlparse.urljoin(url, path)
response = self._download_webpage(request_url, playlist_title)
response = self._fix_json(response)
if not response.strip():
self._downloader.report_warning(u'Got an empty reponse, trying '
u'adding the "newvideos" parameter')
response = self._download_webpage(request_url + '&newvideos=true',
playlist_title)
response = self._fix_json(response)
videos = json.loads(response)
return {
'_type': 'playlist',
'title': playlist_title,
'id': cat_id,
'entries': [self._extract_video(i) for i in videos],
}

View File

@@ -0,0 +1,46 @@
import re
from .common import InfoExtractor
from ..utils import compat_urlparse
class NowVideoIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?nowvideo\.ch/video/(?P<id>\w+)'
_TEST = {
u'url': u'http://www.nowvideo.ch/video/0mw0yow7b6dxa',
u'file': u'0mw0yow7b6dxa.flv',
u'md5': u'f8fbbc8add72bd95b7850c6a02fc8817',
u'info_dict': {
u"title": u"youtubedl test video _BaW_jenozKc.mp4"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.nowvideo.ch/video/' + video_id
embed_url = 'http://embed.nowvideo.ch/embed.php?v=' + video_id
webpage = self._download_webpage(webpage_url, video_id)
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
self.report_extraction(video_id)
video_title = self._html_search_regex(r'<h4>(.*)</h4>',
webpage, u'video title')
video_key = self._search_regex(r'var fkzd="(.*)";',
embed_page, u'video key')
api_call = "http://www.nowvideo.ch/api/player.api.php?file={0}&numOfErrors=0&cid=1&key={1}".format(video_id, video_key)
api_response = self._download_webpage(api_call, video_id,
u'Downloading API page')
video_url = compat_urlparse.parse_qs(api_response)[u'url'][0]
return [{
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
}]

View File

@@ -38,6 +38,7 @@ class PornotubeIE(InfoExtractor):
VIDEO_UPLOADED_RE = r'<div class="video_added_by">Added (?P<date>[0-9\/]+) by'
upload_date = self._html_search_regex(VIDEO_UPLOADED_RE, webpage, u'upload date', fatal=False)
if upload_date: upload_date = unified_strdate(upload_date)
age_limit = self._rta_search(webpage)
info = {'id': video_id,
'url': video_url,
@@ -45,6 +46,7 @@ class PornotubeIE(InfoExtractor):
'upload_date': upload_date,
'title': video_title,
'ext': 'flv',
'format': 'flv'}
'format': 'flv',
'age_limit': age_limit}
return [info]

View File

@@ -10,28 +10,35 @@ class RedTubeIE(InfoExtractor):
u'file': u'66418.mp4',
u'md5': u'7b8c22b5e7098a3e1c09709df1126d2d',
u'info_dict': {
u"title": u"Sucked on a toilet"
u"title": u"Sucked on a toilet",
u"age_limit": 18,
}
}
def _real_extract(self,url):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_extension = 'mp4'
video_extension = 'mp4'
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
video_url = self._html_search_regex(r'<source src="(.+?)" type="video/mp4">',
webpage, u'video URL')
video_url = self._html_search_regex(
r'<source src="(.+?)" type="video/mp4">', webpage, u'video URL')
video_title = self._html_search_regex('<h1 class="videoTitle slidePanelMovable">(.+?)</h1>',
video_title = self._html_search_regex(
r'<h1 class="videoTitle slidePanelMovable">(.+?)</h1>',
webpage, u'title')
return [{
'id': video_id,
'url': video_url,
'ext': video_extension,
'title': video_title,
}]
# No self-labeling, but they describe themselves as
# "Home of Videos Porno"
age_limit = 18
return {
'id': video_id,
'url': video_url,
'ext': video_extension,
'title': video_title,
'age_limit': age_limit,
}

View File

@@ -0,0 +1,16 @@
from .videodetective import VideoDetectiveIE
# It just uses the same method as videodetective.com,
# the internetvideoarchive.com is extracted from the og:video property
class RottenTomatoesIE(VideoDetectiveIE):
_VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.rottentomatoes.com/m/toy_story_3/trailers/11028566/',
u'file': '613340.mp4',
u'info_dict': {
u'title': u'TOY STORY 3',
u'description': u'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
},
}

View File

@@ -8,8 +8,8 @@ from ..utils import (
)
class RTLnowIE(InfoExtractor):
"""Information Extractor for RTL NOW, RTL2 NOW, SUPER RTL NOW and VOX NOW"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl-now\.rtl\.de/|rtl2now\.rtl2\.de/|(?:www\.)?voxnow\.de/|(?:www\.)?superrtlnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
"""Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl-now\.rtl\.de/|rtl2now\.rtl2\.de/|(?:www\.)?voxnow\.de/|(?:www\.)?rtlnitronow\.de/|(?:www\.)?superrtlnow\.de/|(?:www\.)?n-tvnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
_TESTS = [{
u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
u'file': u'90419.flv',
@@ -61,8 +61,35 @@ class RTLnowIE(InfoExtractor):
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://www.rtlnitronow.de/recht-ordnung/lebensmittelkontrolle-erlangenordnungsamt-berlin.php?film_id=127367&player=1&season=1',
u'file': u'127367.flv',
u'info_dict': {
u'upload_date': u'20130926',
u'title': u'Recht & Ordnung - Lebensmittelkontrolle Erlangen/Ordnungsamt...',
u'description': u'Lebensmittelkontrolle Erlangen/Ordnungsamt Berlin',
u'thumbnail': u'http://autoimg.static-fra.de/nitronow/344787/1500x1500/image2.jpg',
},
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10',
u'file': u'124903.flv',
u'info_dict': {
u'upload_date': u'20130101',
u'title': u'Top Gear vom 01.01.2013',
u'description': u'Episode 1',
},
u'params': {
u'skip_download': True,
},
u'skip': u'Only works from Germany',
}]
def _real_extract(self,url):
mobj = re.match(self._VALID_URL, url)
@@ -79,20 +106,23 @@ class RTLnowIE(InfoExtractor):
msg = clean_html(note_m.group(1))
raise ExtractorError(msg)
video_title = self._html_search_regex(r'<title>(?P<title>[^<]+)</title>',
video_title = self._html_search_regex(r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, u'title')
playerdata_url = self._html_search_regex(r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'',
webpage, u'playerdata_url')
playerdata = self._download_webpage(playerdata_url, video_id)
mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr\]\]></title>', playerdata)
mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)(?:\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr)?\]\]></title>', playerdata)
if mobj:
video_description = mobj.group(u'description')
if mobj.group('upload_date_Y'):
video_upload_date = mobj.group('upload_date_Y')
else:
elif mobj.group('upload_date_y'):
video_upload_date = u'20' + mobj.group('upload_date_y')
video_upload_date += mobj.group('upload_date_m')+mobj.group('upload_date_d')
else:
video_upload_date = None
if video_upload_date:
video_upload_date += mobj.group('upload_date_m')+mobj.group('upload_date_d')
else:
video_description = None
video_upload_date = None

View File

@@ -0,0 +1,58 @@
# encoding: utf-8
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_str,
ExtractorError,
)
class RutubeIE(InfoExtractor):
_VALID_URL = r'https?://rutube.ru/video/(?P<long_id>\w+)'
_TEST = {
u'url': u'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
u'file': u'3eac3b4561676c17df9132a9a1e62e3e.mp4',
u'info_dict': {
u'title': u'Раненный кенгуру забежал в аптеку',
u'uploader': u'NTDRussian',
u'uploader_id': u'29790',
},
u'params': {
# It requires ffmpeg (m3u8 download)
u'skip_download': True,
},
}
def _get_api_response(self, short_id, subpath):
api_url = 'http://rutube.ru/api/play/%s/%s/?format=json' % (subpath, short_id)
response_json = self._download_webpage(api_url, short_id,
u'Downloading %s json' % subpath)
return json.loads(response_json)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
long_id = mobj.group('long_id')
webpage = self._download_webpage(url, long_id)
og_video = self._og_search_video_url(webpage)
short_id = compat_urlparse.urlparse(og_video).path[1:]
options = self._get_api_response(short_id, 'options')
trackinfo = self._get_api_response(short_id, 'trackinfo')
# Some videos don't have the author field
author = trackinfo.get('author') or {}
m3u8_url = trackinfo['video_balancer'].get('m3u8')
if m3u8_url is None:
raise ExtractorError(u'Couldn\'t find m3u8 manifest url')
return {
'id': trackinfo['id'],
'title': trackinfo['title'],
'url': m3u8_url,
'ext': 'mp4',
'thumbnail': options['thumbnail_url'],
'uploader': author.get('name'),
'uploader_id': compat_str(author['id']) if author else None,
}

View File

@@ -5,7 +5,7 @@ from .mtv import MTVIE, _media_xml_tag
class SouthParkStudiosIE(MTVIE):
IE_NAME = u'southparkstudios.com'
_VALID_URL = r'https?://www\.southparkstudios\.com/clips/(?P<id>\d+)'
_VALID_URL = r'https?://www\.southparkstudios\.com/(clips|full-episodes)/(?P<id>.+?)(\?|#|$)'
_FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
@@ -23,7 +23,11 @@ class SouthParkStudiosIE(MTVIE):
def _get_thumbnail_url(self, uri, itemdoc):
search_path = '%s/%s' % (_media_xml_tag('group'), _media_xml_tag('thumbnail'))
return itemdoc.find(search_path).attrib['url']
thumb_node = itemdoc.find(search_path)
if thumb_node is None:
return None
else:
return thumb_node.attrib['url']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
import re
from .common import InfoExtractor
from ..utils import determine_ext
class SztvHuIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:(?:www\.)?sztv\.hu|www\.tvszombathely\.hu)/(?:[^/]+)/.+-(?P<id>[0-9]+)'
_TEST = {
u'url': u'http://sztv.hu/hirek/cserkeszek-nepszerusitettek-a-kornyezettudatos-eletmodot-a-savaria-teren-20130909',
u'file': u'20130909.mp4',
u'md5': u'a6df607b11fb07d0e9f2ad94613375cb',
u'info_dict': {
u"title": u"Cserkészek népszerűsítették a környezettudatos életmódot a Savaria téren",
u"description": u'A zöld nap játékos ismeretterjesztő programjait a Magyar Cserkész Szövetség szervezte, akik az ország nyolc városában adják át tudásukat az érdeklődőknek. A PET...',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_file = self._search_regex(
r'file: "...:(.*?)",', webpage, 'video file')
title = self._html_search_regex(
r'<meta name="title" content="([^"]*?) - [^-]*? - [^-]*?"',
webpage, 'video title')
description = self._html_search_regex(
r'<meta name="description" content="([^"]*)"/>',
webpage, 'video description', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
video_url = 'http://media.sztv.hu/vod/' + video_file
return {
'id': video_id,
'url': video_url,
'title': title,
'ext': determine_ext(video_url),
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -0,0 +1,65 @@
import re
from .common import InfoExtractor
from ..utils import (
get_element_by_attribute,
clean_html,
)
class TechTalksIE(InfoExtractor):
_VALID_URL = r'https?://techtalks\.tv/talks/[^/]*/(?P<id>\d+)/'
_TEST = {
u'url': u'http://techtalks.tv/talks/learning-topic-models-going-beyond-svd/57758/',
u'playlist': [
{
u'file': u'57758.flv',
u'info_dict': {
u'title': u'Learning Topic Models --- Going beyond SVD',
},
},
{
u'file': u'57758-slides.flv',
u'info_dict': {
u'title': u'Learning Topic Models --- Going beyond SVD',
},
},
],
u'params': {
# rtmp download
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
talk_id = mobj.group('id')
webpage = self._download_webpage(url, talk_id)
rtmp_url = self._search_regex(r'netConnectionUrl: \'(.*?)\'', webpage,
u'rtmp url')
play_path = self._search_regex(r'href=\'(.*?)\' [^>]*id="flowplayer_presenter"',
webpage, u'presenter play path')
title = clean_html(get_element_by_attribute('class', 'title', webpage))
video_info = {
'id': talk_id,
'title': title,
'url': rtmp_url,
'play_path': play_path,
'ext': 'flv',
}
m_slides = re.search(r'<a class="slides" href=\'(.*?)\'', webpage)
if m_slides is None:
return video_info
else:
return [
video_info,
# The slides video
{
'id': talk_id + '-slides',
'title': title,
'url': rtmp_url,
'play_path': m_slides.group(1),
'ext': 'flv',
},
]

View File

@@ -77,12 +77,20 @@ class TEDIE(InfoExtractor):
thumbnail = self._search_regex(r'</span>[\s.]*</div>[\s.]*<img src="(.*?)"',
webpage, 'thumbnail')
formats = [{
'ext': 'mp4',
'url': stream['file'],
'format': stream['id']
} for stream in info['htmlStreams']]
info = {
'id': info['id'],
'url': info['htmlStreams'][-1]['file'],
'ext': 'mp4',
'title': title,
'thumbnail': thumbnail,
'description': desc,
}
'id': info['id'],
'title': title,
'thumbnail': thumbnail,
'description': desc,
'formats': formats,
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
return info

View File

@@ -7,15 +7,25 @@ from .common import InfoExtractor
class TudouIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?tudou\.com/(?:listplay|programs)/(?:view|(.+?))/(?:([^/]+)|([^/]+))(?:\.html)?'
_TEST = {
_VALID_URL = r'(?:http://)?(?:www\.)?tudou\.com/(?:listplay|programs|albumplay)/(?:view|(.+?))/(?:([^/]+)|([^/]+))(?:\.html)?'
_TESTS = [{
u'url': u'http://www.tudou.com/listplay/zzdE77v6Mmo/2xN2duXMxmw.html',
u'file': u'159448201.f4v',
u'md5': u'140a49ed444bd22f93330985d8475fcb',
u'info_dict': {
u"title": u"卡马乔国足开大脚长传冲吊集锦"
}
}
},
{
u'url': u'http://www.tudou.com/albumplay/TenTw_JgiPM/PzsAs5usU9A.html',
u'file': u'todo.mp4',
u'md5': u'todo.mp4',
u'info_dict': {
u'title': u'todo.mp4',
},
u'add_ie': [u'Youku'],
u'skip': u'Only works from China'
}]
def _url_for_id(self, id, quality = None):
info_url = "http://v2.tudou.com/f?id="+str(id)
@@ -29,14 +39,19 @@ class TudouIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(2)
webpage = self._download_webpage(url, video_id)
title = re.search(",kw:\"(.+)\"",webpage)
if title is None:
title = re.search(",kw: \'(.+)\'",webpage)
title = title.group(1)
thumbnail_url = re.search(",pic: \'(.+?)\'",webpage)
if thumbnail_url is None:
thumbnail_url = re.search(",pic:\"(.+?)\"",webpage)
thumbnail_url = thumbnail_url.group(1)
m = re.search(r'vcode:\s*[\'"](.+?)[\'"]', webpage)
if m and m.group(1):
return {
'_type': 'url',
'url': u'youku:' + m.group(1),
'ie_key': 'Youku'
}
title = self._search_regex(
r",kw:\s*['\"](.+?)[\"']", webpage, u'title')
thumbnail_url = self._search_regex(
r",pic:\s*[\"'](.+?)[\"']", webpage, u'thumbnail URL', fatal=False)
segs_json = self._search_regex(r'segs: \'(.*)\'', webpage, 'segments')
segments = json.loads(segs_json)

View File

@@ -1,11 +1,15 @@
import re
import json
import xml.etree.ElementTree
import datetime
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
)
class VevoIE(InfoExtractor):
"""
Accepts urls from vevo.com or in the format 'vevo:{id}'
@@ -15,11 +19,11 @@ class VevoIE(InfoExtractor):
_TEST = {
u'url': u'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
u'file': u'GB1101300280.mp4',
u'md5': u'06bea460acb744eab74a9d7dcb4bfd61',
u'info_dict': {
u"upload_date": u"20130624",
u"uploader": u"Hurts",
u"title": u"Somebody to Die For"
u"title": u"Somebody to Die For",
u'duration': 230,
}
}
@@ -27,27 +31,47 @@ class VevoIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
json_url = 'http://www.vevo.com/data/video/%s' % video_id
base_url = 'http://smil.lvl3.vevo.com'
videos_url = '%s/Video/V2/VFILE/%s/%sr.smil' % (base_url, video_id, video_id.lower())
json_url = 'http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc=%s' % video_id
info_json = self._download_webpage(json_url, video_id, u'Downloading json info')
links_webpage = self._download_webpage(videos_url, video_id, u'Downloading videos urls')
self.report_extraction(video_id)
video_info = json.loads(info_json)
m_urls = list(re.finditer(r'<video src="(?P<ext>.*?):/?(?P<url>.*?)"', links_webpage))
if m_urls is None or len(m_urls) == 0:
raise ExtractorError(u'Unable to extract video url')
# They are sorted from worst to best quality
m_url = m_urls[-1]
video_url = base_url + '/' + m_url.group('url')
ext = m_url.group('ext')
video_info = json.loads(info_json)['video']
last_version = {'version': -1}
for version in video_info['videoVersions']:
# These are the HTTP downloads, other types are for different manifests
if version['sourceType'] == 2:
if version['version'] > last_version['version']:
last_version = version
if last_version['version'] == -1:
raise ExtractorError(u'Unable to extract last version of the video')
return {'url': video_url,
'ext': ext,
'id': video_id,
'title': video_info['title'],
'thumbnail': video_info['img'],
'upload_date': video_info['launchDate'].replace('/',''),
'uploader': video_info['Artists'][0]['title'],
}
renditions = xml.etree.ElementTree.fromstring(last_version['data'])
formats = []
# Already sorted from worst to best quality
for rend in renditions.findall('rendition'):
attr = rend.attrib
f_url = attr['url']
formats.append({
'url': f_url,
'ext': determine_ext(f_url),
'height': int(attr['frameheight']),
'width': int(attr['frameWidth']),
})
date_epoch = int(self._search_regex(
r'/Date\((\d+)\)/', video_info['launchDate'], u'launch date'))/1000
upload_date = datetime.datetime.fromtimestamp(date_epoch)
info = {
'id': video_id,
'title': video_info['title'],
'formats': formats,
'thumbnail': video_info['imageUrl'],
'upload_date': upload_date.strftime('%Y%m%d'),
'uploader': video_info['mainArtists'][0]['artistName'],
'duration': video_info['duration'],
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -0,0 +1,64 @@
import json
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class ViddlerIE(InfoExtractor):
_VALID_URL = r'(?P<domain>https?://(?:www\.)?viddler.com)/(?:v|embed|player)/(?P<id>[0-9]+)'
_TEST = {
u"url": u"http://www.viddler.com/v/43903784",
u'file': u'43903784.mp4',
u'md5': u'fbbaedf7813e514eb7ca30410f439ac9',
u'info_dict': {
u"title": u"Video Made Easy",
u"uploader": u"viddler",
u"duration": 100.89,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
embed_url = mobj.group('domain') + u'/embed/' + video_id
webpage = self._download_webpage(embed_url, video_id)
video_sources_code = self._search_regex(
r"(?ms)sources\s*:\s*(\{.*?\})", webpage, u'video URLs')
video_sources = json.loads(video_sources_code.replace("'", '"'))
formats = [{
'url': video_url,
'format': format_id,
} for video_url, format_id in video_sources.items()]
title = self._html_search_regex(
r"title\s*:\s*'([^']*)'", webpage, u'title')
uploader = self._html_search_regex(
r"authorName\s*:\s*'([^']*)'", webpage, u'uploader', fatal=False)
duration_s = self._html_search_regex(
r"duration\s*:\s*([0-9.]*)", webpage, u'duration', fatal=False)
duration = float(duration_s) if duration_s else None
thumbnail = self._html_search_regex(
r"thumbnail\s*:\s*'([^']*)'",
webpage, u'thumbnail', fatal=False)
info = {
'_type': 'video',
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'duration': duration,
'formats': formats,
}
# TODO: Remove when #980 has been merged
info['formats'][-1]['ext'] = determine_ext(info['formats'][-1]['url'])
info.update(info['formats'][-1])
return info

View File

@@ -0,0 +1,30 @@
import re
from .common import InfoExtractor
from .internetvideoarchive import InternetVideoArchiveIE
from ..utils import (
compat_urlparse,
)
class VideoDetectiveIE(InfoExtractor):
_VALID_URL = r'https?://www\.videodetective\.com/[^/]+/[^/]+/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.videodetective.com/movies/kick-ass-2/194487',
u'file': u'194487.mp4',
u'info_dict': {
u'title': u'KICK-ASS 2',
u'description': u'md5:65ba37ad619165afac7d432eaded6013',
u'duration': 135,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage)
query = compat_urlparse.urlparse(og_video).query
return self.url_result(InternetVideoArchiveIE._build_url(query),
ie=InternetVideoArchiveIE.ie_key())

View File

@@ -0,0 +1,40 @@
import re
import random
from .common import InfoExtractor
class VideoPremiumIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?videopremium\.tv/(?P<id>\w+)(?:/.*)?'
_TEST = {
u'url': u'http://videopremium.tv/4w7oadjsf156',
u'file': u'4w7oadjsf156.f4v',
u'info_dict': {
u"title": u"youtube-dl_test_video____a_________-BaW_jenozKc.mp4.mp4"
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://videopremium.tv/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
self.report_extraction(video_id)
video_title = self._html_search_regex(r'<h2(?:.*?)>\s*(.+?)\s*<',
webpage, u'video title')
return [{
'id': video_id,
'url': "rtmp://e%d.md.iplay.md/play" % random.randint(1, 16),
'play_path': "mp4:%s.f4v" % video_id,
'page_url': "http://videopremium.tv/" + video_id,
'player_url': "http://videopremium.tv/uplayer/uppod.swf",
'ext': 'f4v',
'title': video_title,
}]

View File

@@ -1,3 +1,4 @@
# encoding: utf-8
import json
import re
import itertools
@@ -10,21 +11,23 @@ from ..utils import (
clean_html,
get_element_by_attribute,
ExtractorError,
RegexNotFoundError,
std_headers,
unsmuggle_url,
)
class VimeoIE(InfoExtractor):
"""Information extractor for vimeo.com."""
# _VALID_URL matches Vimeo URLs
_VALID_URL = r'(?P<proto>https?://)?(?:(?:www|player)\.)?vimeo(?P<pro>pro)?\.com/(?:(?:(?:groups|album)/[^/]+)|(?:.*?)/)?(?P<direct_link>play_redirect_hls\?clip_id=)?(?:videos?/)?(?P<id>[0-9]+)(?:[?].*)?$'
_VALID_URL = r'(?P<proto>https?://)?(?:(?:www|player)\.)?vimeo(?P<pro>pro)?\.com/(?:(?:(?:groups|album)/[^/]+)|(?:.*?)/)?(?P<direct_link>play_redirect_hls\?clip_id=)?(?:videos?/)?(?P<id>[0-9]+)/?(?:[?].*)?$'
_NETRC_MACHINE = 'vimeo'
IE_NAME = u'vimeo'
_TESTS = [
{
u'url': u'http://vimeo.com/56015672',
u'file': u'56015672.mp4',
u'md5': u'8879b6cc097e987f02484baf890129e5',
u'md5': u'ae7a1d8b183758a0506b0622f37dfa14',
u'info_dict': {
u"upload_date": u"20121220",
u"description": u"This is a test case for youtube-dl.\nFor more information, see github.com/rg3/youtube-dl\nTest chars: \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
@@ -54,6 +57,21 @@ class VimeoIE(InfoExtractor):
u'uploader': u'The BLN & Business of Software',
},
},
{
u'url': u'http://vimeo.com/68375962',
u'file': u'68375962.mp4',
u'md5': u'aaf896bdb7ddd6476df50007a0ac0ae7',
u'note': u'Video protected with password',
u'info_dict': {
u'title': u'youtube-dl password protected test video',
u'upload_date': u'20130614',
u'uploader_id': u'user18948128',
u'uploader': u'Jaime Marquínez Ferrándiz',
},
u'params': {
u'videopassword': u'youtube-dl',
},
},
]
def _login(self):
@@ -98,6 +116,12 @@ class VimeoIE(InfoExtractor):
self._login()
def _real_extract(self, url, new_video=True):
url, data = unsmuggle_url(url)
headers = std_headers
if data is not None:
headers = headers.copy()
headers.update(data)
# Extract ID from URL
mobj = re.match(self._VALID_URL, url)
if mobj is None:
@@ -112,7 +136,7 @@ class VimeoIE(InfoExtractor):
url = 'https://vimeo.com/' + video_id
# Retrieve video webpage to extract further information
request = compat_urllib_request.Request(url, None, std_headers)
request = compat_urllib_request.Request(url, None, headers)
webpage = self._download_webpage(request, video_id)
# Now we begin extracting as much information as we can from what we
@@ -122,18 +146,26 @@ class VimeoIE(InfoExtractor):
# Extract the config JSON
try:
config = self._search_regex([r' = {config:({.+?}),assets:', r'c=({.+?);'],
webpage, u'info section', flags=re.DOTALL)
config = json.loads(config)
except:
try:
config_url = self._html_search_regex(
r' data-config-url="(.+?)"', webpage, u'config URL')
config_json = self._download_webpage(config_url, video_id)
config = json.loads(config_json)
except RegexNotFoundError:
# For pro videos or player.vimeo.com urls
config = self._search_regex([r' = {config:({.+?}),assets:', r'c=({.+?);'],
webpage, u'info section', flags=re.DOTALL)
config = json.loads(config)
except Exception as e:
if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
raise ExtractorError(u'The author has restricted the access to this video, try with the "--referer" option')
if re.search('If so please provide the correct password.', webpage):
if re.search('<form[^>]+?id="pw_form"', webpage) is not None:
self._verify_video_password(url, video_id, webpage)
return self._real_extract(url)
else:
raise ExtractorError(u'Unable to extract info section')
raise ExtractorError(u'Unable to extract info section',
cause=e)
# Extract title
video_title = config["video"]["title"]
@@ -172,46 +204,45 @@ class VimeoIE(InfoExtractor):
# Vimeo specific: extract video codec and quality information
# First consider quality, then codecs, then take everything
# TODO bind to format param
codecs = [('h264', 'mp4'), ('vp8', 'flv'), ('vp6', 'flv')]
codecs = [('vp6', 'flv'), ('vp8', 'flv'), ('h264', 'mp4')]
files = { 'hd': [], 'sd': [], 'other': []}
config_files = config["video"].get("files") or config["request"].get("files")
for codec_name, codec_extension in codecs:
if codec_name in config_files:
if 'hd' in config_files[codec_name]:
files['hd'].append((codec_name, codec_extension, 'hd'))
elif 'sd' in config_files[codec_name]:
files['sd'].append((codec_name, codec_extension, 'sd'))
for quality in config_files.get(codec_name, []):
format_id = '-'.join((codec_name, quality)).lower()
key = quality if quality in files else 'other'
video_url = None
if isinstance(config_files[codec_name], dict):
file_info = config_files[codec_name][quality]
video_url = file_info.get('url')
else:
files['other'].append((codec_name, codec_extension, config_files[codec_name][0]))
file_info = {}
if video_url is None:
video_url = "http://player.vimeo.com/play_redirect?clip_id=%s&sig=%s&time=%s&quality=%s&codecs=%s&type=moogaloop_local&embed_location=" \
%(video_id, sig, timestamp, quality, codec_name.upper())
for quality in ('hd', 'sd', 'other'):
if len(files[quality]) > 0:
video_quality = files[quality][0][2]
video_codec = files[quality][0][0]
video_extension = files[quality][0][1]
self.to_screen(u'%s: Downloading %s file at %s quality' % (video_id, video_codec.upper(), video_quality))
break
else:
files[key].append({
'ext': codec_extension,
'url': video_url,
'format_id': format_id,
'width': file_info.get('width'),
'height': file_info.get('height'),
})
formats = []
for key in ('other', 'sd', 'hd'):
formats += files[key]
if len(formats) == 0:
raise ExtractorError(u'No known codec found')
video_url = None
if isinstance(config_files[video_codec], dict):
video_url = config_files[video_codec][video_quality].get("url")
if video_url is None:
video_url = "http://player.vimeo.com/play_redirect?clip_id=%s&sig=%s&time=%s&quality=%s&codecs=%s&type=moogaloop_local&embed_location=" \
%(video_id, sig, timestamp, video_quality, video_codec.upper())
return [{
'id': video_id,
'url': video_url,
'uploader': video_uploader,
'uploader_id': video_uploader_id,
'upload_date': video_upload_date,
'title': video_title,
'ext': video_extension,
'thumbnail': video_thumbnail,
'description': video_description,
'formats': formats,
}]

View File

@@ -0,0 +1,59 @@
# coding: utf-8
import re
from ..utils import (
compat_urllib_request,
compat_urllib_parse
)
from .common import InfoExtractor
class WeBSurgIE(InfoExtractor):
IE_NAME = u'websurg.com'
_VALID_URL = r'http://.*?\.websurg\.com/MEDIA/\?noheader=1&doi=(.*)'
_TEST = {
u'url': u'http://www.websurg.com/MEDIA/?noheader=1&doi=vd01en4012',
u'file': u'vd01en4012.mp4',
u'params': {
u'skip_download': True,
},
u'skip': u'Requires login information',
}
_LOGIN_URL = 'http://www.websurg.com/inc/login/login_div.ajax.php?login=1'
def _real_initialize(self):
login_form = {
'username': self._downloader.params['username'],
'password': self._downloader.params['password'],
'Submit': 1
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header(
'Content-Type', 'application/x-www-form-urlencoded;charset=utf-8')
compat_urllib_request.urlopen(request).info()
webpage = self._download_webpage(self._LOGIN_URL, '', 'Logging in')
if webpage != 'OK':
self._downloader.report_error(
u'Unable to log in: bad username/password')
def _real_extract(self, url):
video_id = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, video_id)
url_info = re.search(r'streamer="(.*?)" src="(.*?)"', webpage)
return {'id': video_id,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'ext' : 'mp4',
'url' : url_info.group(1) + '/' + url_info.group(2),
'thumbnail': self._og_search_thumbnail(webpage)
}

View File

@@ -11,23 +11,36 @@ from ..utils import (
class XHamsterIE(InfoExtractor):
"""Information Extractor for xHamster"""
_VALID_URL = r'(?:http://)?(?:www.)?xhamster\.com/movies/(?P<id>[0-9]+)/.*\.html'
_TEST = {
_VALID_URL = r'(?:http://)?(?:www\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.+?)\.html(?:\?.*)?'
_TESTS = [{
u'url': u'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
u'file': u'1509445.flv',
u'md5': u'9f48e0e8d58e3076bb236ff412ab62fa',
u'info_dict': {
u"upload_date": u"20121014",
u"uploader_id": u"Ruseful2011",
u"title": u"FemaleAgent Shy beauty takes the bait"
u"title": u"FemaleAgent Shy beauty takes the bait",
u"age_limit": 18,
}
}
},
{
u'url': u'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
u'file': u'2221348.flv',
u'md5': u'e767b9475de189320f691f49c679c4c7',
u'info_dict': {
u"upload_date": u"20130914",
u"uploader_id": u"jojo747400",
u"title": u"Britney Spears Sexy Booty",
u"age_limit": 18,
}
}]
def _real_extract(self,url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
mrss_url = 'http://xhamster.com/movies/%s/.html?hd' % video_id
seo = mobj.group('seo')
mrss_url = 'http://xhamster.com/movies/%s/%s.html?hd' % (video_id, seo)
webpage = self._download_webpage(mrss_url, video_id)
mobj = re.search(r'\'srv\': \'(?P<server>[^\']*)\',\s*\'file\': \'(?P<file>[^\']+)\',', webpage)
@@ -61,6 +74,8 @@ class XHamsterIE(InfoExtractor):
video_thumbnail = self._search_regex(r'\'image\':\'(?P<thumbnail>[^\']+)\'',
webpage, u'thumbnail', fatal=False)
age_limit = self._rta_search(webpage)
return [{
'id': video_id,
'url': video_url,
@@ -69,5 +84,6 @@ class XHamsterIE(InfoExtractor):
'description': video_description,
'upload_date': video_upload_date,
'uploader_id': video_uploader_id,
'thumbnail': video_thumbnail
'thumbnail': video_thumbnail,
'age_limit': age_limit,
}]

View File

@@ -18,7 +18,8 @@ class XNXXIE(InfoExtractor):
u'file': u'1135332.flv',
u'md5': u'0831677e2b4761795f68d417e0b7b445',
u'info_dict': {
u"title": u"lida \u00bb Naked Funny Actress (5)"
u"title": u"lida \u00bb Naked Funny Actress (5)",
u"age_limit": 18,
}
}
@@ -50,4 +51,5 @@ class XNXXIE(InfoExtractor):
'ext': 'flv',
'thumbnail': video_thumbnail,
'description': None,
'age_limit': 18,
}]

View File

@@ -13,7 +13,8 @@ class XVideosIE(InfoExtractor):
u'file': u'939581.flv',
u'md5': u'1d0c835822f0a71a7bf011855db929d0',
u'info_dict': {
u"title": u"Funny Porns By >>>>S<<<<<< -1"
u"title": u"Funny Porns By >>>>S<<<<<< -1",
u"age_limit": 18,
}
}
@@ -46,6 +47,7 @@ class XVideosIE(InfoExtractor):
'ext': 'flv',
'thumbnail': video_thumbnail,
'description': None,
'age_limit': 18,
}
return [info]

View File

@@ -1,4 +1,3 @@
import datetime
import itertools
import json
import re
@@ -6,86 +5,104 @@ import re
from .common import InfoExtractor, SearchInfoExtractor
from ..utils import (
compat_urllib_parse,
ExtractorError,
compat_urlparse,
determine_ext,
clean_html,
)
class YahooIE(InfoExtractor):
IE_DESC = u'Yahoo screen'
_VALID_URL = r'http://screen\.yahoo\.com/.*?-(?P<id>\d*?)\.html'
_TEST = {
u'url': u'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
u'file': u'214727115.flv',
u'md5': u'2e717f169c1be93d84d3794a00d4a325',
u'info_dict': {
u"title": u"Julian Smith & Travis Legg Watch Julian Smith"
_TESTS = [
{
u'url': u'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
u'file': u'214727115.flv',
u'info_dict': {
u'title': u'Julian Smith & Travis Legg Watch Julian Smith',
u'description': u'Julian and Travis watch Julian Smith',
},
u'params': {
# Requires rtmpdump
u'skip_download': True,
},
},
u'skip': u'Requires rtmpdump'
}
{
u'url': u'http://screen.yahoo.com/wired/codefellas-s1-ep12-cougar-lies-103000935.html',
u'file': u'103000935.flv',
u'info_dict': {
u'title': u'Codefellas - The Cougar Lies with Spanish Moss',
u'description': u'Agent Topple\'s mustache does its dirty work, and Nicole brokers a deal for peace. But why is the NSA collecting millions of Instagram brunch photos? And if your waffles have nothing to hide, what are they so worried about?',
},
u'params': {
# Requires rtmpdump
u'skip_download': True,
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
m_id = re.search(r'YUI\.namespace\("Media"\)\.CONTENT_ID = "(?P<new_id>.+?)";', webpage)
if m_id is None:
# TODO: Check which url parameters are required
info_url = 'http://cosmos.bcst.yahoo.com/rest/v2/pops;lmsoverride=1;outputformat=mrss;cb=974419660;id=%s;rd=news.yahoo.com;datacontext=mdb;lg=KCa2IihxG3qE60vQ7HtyUy' % video_id
webpage = self._download_webpage(info_url, video_id, u'Downloading info webpage')
info_re = r'''<title><!\[CDATA\[(?P<title>.*?)\]\]></title>.*
<description><!\[CDATA\[(?P<description>.*?)\]\]></description>.*
<media:pubStart><!\[CDATA\[(?P<date>.*?)\ .*\]\]></media:pubStart>.*
<media:content\ medium="image"\ url="(?P<thumb>.*?)"\ name="LARGETHUMB"
'''
self.report_extraction(video_id)
m_info = re.search(info_re, webpage, re.VERBOSE|re.DOTALL)
if m_info is None:
raise ExtractorError(u'Unable to extract video info')
video_title = m_info.group('title')
video_description = m_info.group('description')
video_thumb = m_info.group('thumb')
video_date = m_info.group('date')
video_date = datetime.datetime.strptime(video_date, '%m/%d/%Y').strftime('%Y%m%d')
# TODO: Find a way to get mp4 videos
rest_url = 'http://cosmos.bcst.yahoo.com/rest/v2/pops;element=stream;outputformat=mrss;id=%s;lmsoverride=1;bw=375;dynamicstream=1;cb=83521105;tech=flv,mp4;rd=news.yahoo.com;datacontext=mdb;lg=KCa2IihxG3qE60vQ7HtyUy' % video_id
webpage = self._download_webpage(rest_url, video_id, u'Downloading video url webpage')
m_rest = re.search(r'<media:content url="(?P<url>.*?)" path="(?P<path>.*?)"', webpage)
video_url = m_rest.group('url')
video_path = m_rest.group('path')
if m_rest is None:
raise ExtractorError(u'Unable to extract video url')
items_json = self._search_regex(r'YVIDEO_INIT_ITEMS = ({.*?});$',
webpage, u'items', flags=re.MULTILINE)
items = json.loads(items_json)
info = items['mediaItems']['query']['results']['mediaObj'][0]
# The 'meta' field is not always in the video webpage, we request it
# from another page
long_id = info['id']
query = ('SELECT * FROM yahoo.media.video.streams WHERE id="%s"'
' AND plrs="86Gj0vCaSzV_Iuf6hNylf2"' % long_id)
data = compat_urllib_parse.urlencode({
'q': query,
'env': 'prod',
'format': 'json',
})
query_result_json = self._download_webpage(
'http://video.query.yahoo.com/v1/public/yql?' + data,
video_id, u'Downloading video info')
query_result = json.loads(query_result_json)
info = query_result['query']['results']['mediaObj'][0]
meta = info['meta']
else: # We have to use a different method if another id is defined
long_id = m_id.group('new_id')
info_url = 'http://video.query.yahoo.com/v1/public/yql?q=SELECT%20*%20FROM%20yahoo.media.video.streams%20WHERE%20id%3D%22' + long_id + '%22%20AND%20format%3D%22mp4%2Cflv%22%20AND%20protocol%3D%22rtmp%2Chttp%22%20AND%20plrs%3D%2286Gj0vCaSzV_Iuf6hNylf2%22%20AND%20acctid%3D%22389%22%20AND%20plidl%3D%22%22%20AND%20pspid%3D%22792700001%22%20AND%20offnetwork%3D%22false%22%20AND%20site%3D%22ivy%22%20AND%20lang%3D%22en-US%22%20AND%20region%3D%22US%22%20AND%20override%3D%22none%22%3B&env=prod&format=json&callback=YUI.Env.JSONP.yui_3_8_1_1_1368368376830_335'
webpage = self._download_webpage(info_url, video_id, u'Downloading info json')
json_str = re.search(r'YUI.Env.JSONP.yui.*?\((.*?)\);', webpage).group(1)
info = json.loads(json_str)
res = info[u'query'][u'results'][u'mediaObj'][0]
stream = res[u'streams'][0]
video_path = stream[u'path']
video_url = stream[u'host']
meta = res[u'meta']
video_title = meta[u'title']
video_description = meta[u'description']
video_thumb = meta[u'thumbnail']
video_date = None # I can't find it
formats = []
for s in info['streams']:
format_info = {
'width': s.get('width'),
'height': s.get('height'),
'bitrate': s.get('bitrate'),
}
host = s['host']
path = s['path']
if host.startswith('rtmp'):
format_info.update({
'url': host,
'play_path': path,
'ext': 'flv',
})
else:
format_url = compat_urlparse.urljoin(host, path)
format_info['url'] = format_url
format_info['ext'] = determine_ext(format_url)
formats.append(format_info)
formats = sorted(formats, key=lambda f:(f['height'], f['width']))
info = {
'id': video_id,
'title': meta['title'],
'formats': formats,
'description': clean_html(meta['description']),
'thumbnail': meta['thumbnail'],
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info
info_dict = {
'id': video_id,
'url': video_url,
'play_path': video_path,
'title':video_title,
'description': video_description,
'thumbnail': video_thumb,
'upload_date': video_date,
'ext': 'flv',
}
return info_dict
class YahooSearchIE(SearchInfoExtractor):
IE_DESC = u'Yahoo screen search'

View File

@@ -13,7 +13,7 @@ from ..utils import (
class YoukuIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(v|player)\.youku\.com/(v_show/id_|player\.php/sid/)(?P<ID>[A-Za-z0-9]+)(\.html|/v.swf)'
_VALID_URL = r'(?:(?:http://)?(?:v|player)\.youku\.com/(?:v_show/id_|player\.php/sid/)|youku:)(?P<ID>[A-Za-z0-9]+)(?:\.html|/v\.swf|)'
_TEST = {
u"url": u"http://v.youku.com/v_show/id_XNDgyMDQ2NTQw.html",
u"file": u"XNDgyMDQ2NTQw_part00.flv",
@@ -66,6 +66,12 @@ class YoukuIE(InfoExtractor):
self.report_extraction(video_id)
try:
config = json.loads(jsondata)
error_code = config['data'][0].get('error_code')
if error_code:
# -8 means blocked outside China.
error = config['data'][0].get('error') # Chinese and English, separated by newline.
raise ExtractorError(error or u'Server reported error %i' % error_code,
expected=True)
video_title = config['data'][0]['title']
seed = config['data'][0]['seed']
@@ -89,6 +95,7 @@ class YoukuIE(InfoExtractor):
fileid = config['data'][0]['streamfileids'][format]
keys = [s['k'] for s in config['data'][0]['segs'][format]]
# segs is usually a dictionary, but an empty *list* if an error occured.
except (UnicodeDecodeError, ValueError, KeyError):
raise ExtractorError(u'Unable to extract info section')

View File

@@ -26,7 +26,8 @@ class YouPornIE(InfoExtractor):
u"upload_date": u"20101221",
u"description": u"Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?",
u"uploader": u"Ask Dan And Jennifer",
u"title": u"Sex Ed: Is It Safe To Masturbate Daily?"
u"title": u"Sex Ed: Is It Safe To Masturbate Daily?",
u"age_limit": 18,
}
}
@@ -51,6 +52,7 @@ class YouPornIE(InfoExtractor):
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
age_limit = self._rta_search(webpage)
# Get JSON parameters
json_params = self._search_regex(r'var currentVideo = new Video\((.*)\);', webpage, u'JSON parameters')
@@ -115,7 +117,8 @@ class YouPornIE(InfoExtractor):
'ext': extension,
'format': format,
'thumbnail': thumbnail,
'description': video_description
'description': video_description,
'age_limit': age_limit,
})
if self._downloader.params.get('listformats', None):

View File

@@ -1,28 +1,39 @@
# coding: utf-8
import collections
import errno
import io
import itertools
import json
import netrc
import os.path
import re
import socket
import itertools
import string
import struct
import traceback
import xml.etree.ElementTree
import zlib
from .common import InfoExtractor, SearchInfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_chr,
compat_http_client,
compat_parse_qs,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
compat_str,
clean_html,
get_cachedir,
get_element_by_id,
ExtractorError,
unescapeHTML,
unified_strdate,
orderedSet,
write_json_file,
)
class YoutubeBaseInfoExtractor(InfoExtractor):
@@ -225,11 +236,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'136': 'mp4',
'137': 'mp4',
'138': 'mp4',
'139': 'mp4',
'140': 'mp4',
'141': 'mp4',
'160': 'mp4',
# Dash mp4 audio
'139': 'm4a',
'140': 'm4a',
'141': 'm4a',
# Dash webm
'171': 'webm',
'172': 'webm',
@@ -352,7 +365,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
u"info_dict": {
u"upload_date": u"20120506",
u"title": u"Icona Pop - I Love It (feat. Charli XCX) [OFFICIAL VIDEO]",
u"description": u"md5:3e2666e0a55044490499ea45fe9037b7",
u"description": u"md5:5b292926389560516e384ac437c0ec07",
u"uploader": u"Icona Pop",
u"uploader_id": u"IconaPop"
}
@@ -369,21 +382,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
u"uploader_id": u"justintimberlakeVEVO"
}
},
{
u'url': u'https://www.youtube.com/watch?v=TGi3HqYrWHE',
u'file': u'TGi3HqYrWHE.mp4',
u'note': u'm3u8 video',
u'info_dict': {
u'title': u'Triathlon - Men - London 2012 Olympic Games',
u'description': u'- Men - TR02 - Triathlon - 07 August 2012 - London 2012 Olympic Games',
u'uploader': u'olympic',
u'upload_date': u'20120807',
u'uploader_id': u'olympic',
},
u'params': {
u'skip_download': True,
},
},
]
@@ -393,6 +391,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
if YoutubePlaylistIE.suitable(url): return False
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
def __init__(self, *args, **kwargs):
super(YoutubeIE, self).__init__(*args, **kwargs)
self._player_cache = {}
def report_video_webpage_download(self, video_id):
"""Report attempt to download video webpage."""
self.to_screen(u'%s: Downloading video webpage' % video_id)
@@ -413,11 +415,664 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
"""Indicate the download will use the RTMP protocol."""
self.to_screen(u'RTMP download detected')
def _decrypt_signature(self, s):
def _extract_signature_function(self, video_id, player_url, slen):
id_m = re.match(r'.*-(?P<id>[a-zA-Z0-9_-]+)\.(?P<ext>[a-z]+)$',
player_url)
player_type = id_m.group('ext')
player_id = id_m.group('id')
# Read from filesystem cache
func_id = '%s_%s_%d' % (player_type, player_id, slen)
assert os.path.basename(func_id) == func_id
cache_dir = get_cachedir(self._downloader.params)
cache_enabled = cache_dir is not None
if cache_enabled:
cache_fn = os.path.join(os.path.expanduser(cache_dir),
u'youtube-sigfuncs',
func_id + '.json')
try:
with io.open(cache_fn, 'r', encoding='utf-8') as cachef:
cache_spec = json.load(cachef)
return lambda s: u''.join(s[i] for i in cache_spec)
except IOError:
pass # No cache available
if player_type == 'js':
code = self._download_webpage(
player_url, video_id,
note=u'Downloading %s player %s' % (player_type, player_id),
errnote=u'Download of %s failed' % player_url)
res = self._parse_sig_js(code)
elif player_type == 'swf':
urlh = self._request_webpage(
player_url, video_id,
note=u'Downloading %s player %s' % (player_type, player_id),
errnote=u'Download of %s failed' % player_url)
code = urlh.read()
res = self._parse_sig_swf(code)
else:
assert False, 'Invalid player type %r' % player_type
if cache_enabled:
try:
test_string = u''.join(map(compat_chr, range(slen)))
cache_res = res(test_string)
cache_spec = [ord(c) for c in cache_res]
try:
os.makedirs(os.path.dirname(cache_fn))
except OSError as ose:
if ose.errno != errno.EEXIST:
raise
write_json_file(cache_spec, cache_fn)
except Exception:
tb = traceback.format_exc()
self._downloader.report_warning(
u'Writing cache to %r failed: %s' % (cache_fn, tb))
return res
def _print_sig_code(self, func, slen):
def gen_sig_code(idxs):
def _genslice(start, end, step):
starts = u'' if start == 0 else str(start)
ends = (u':%d' % (end+step)) if end + step >= 0 else u':'
steps = u'' if step == 1 else (u':%d' % step)
return u's[%s%s%s]' % (starts, ends, steps)
step = None
start = '(Never used)' # Quelch pyflakes warnings - start will be
# set as soon as step is set
for i, prev in zip(idxs[1:], idxs[:-1]):
if step is not None:
if i - prev == step:
continue
yield _genslice(start, prev, step)
step = None
continue
if i - prev in [-1, 1]:
step = i - prev
start = prev
continue
else:
yield u's[%d]' % prev
if step is None:
yield u's[%d]' % i
else:
yield _genslice(start, i, step)
test_string = u''.join(map(compat_chr, range(slen)))
cache_res = func(test_string)
cache_spec = [ord(c) for c in cache_res]
expr_code = u' + '.join(gen_sig_code(cache_spec))
code = u'if len(s) == %d:\n return %s\n' % (slen, expr_code)
self.to_screen(u'Extracted signature function:\n' + code)
def _parse_sig_js(self, jscode):
funcname = self._search_regex(
r'signature=([a-zA-Z]+)', jscode,
u'Initial JS player signature function name')
functions = {}
def argidx(varname):
return string.lowercase.index(varname)
def interpret_statement(stmt, local_vars, allow_recursion=20):
if allow_recursion < 0:
raise ExtractorError(u'Recursion limit reached')
if stmt.startswith(u'var '):
stmt = stmt[len(u'var '):]
ass_m = re.match(r'^(?P<out>[a-z]+)(?:\[(?P<index>[^\]]+)\])?' +
r'=(?P<expr>.*)$', stmt)
if ass_m:
if ass_m.groupdict().get('index'):
def assign(val):
lvar = local_vars[ass_m.group('out')]
idx = interpret_expression(ass_m.group('index'),
local_vars, allow_recursion)
assert isinstance(idx, int)
lvar[idx] = val
return val
expr = ass_m.group('expr')
else:
def assign(val):
local_vars[ass_m.group('out')] = val
return val
expr = ass_m.group('expr')
elif stmt.startswith(u'return '):
assign = lambda v: v
expr = stmt[len(u'return '):]
else:
raise ExtractorError(
u'Cannot determine left side of statement in %r' % stmt)
v = interpret_expression(expr, local_vars, allow_recursion)
return assign(v)
def interpret_expression(expr, local_vars, allow_recursion):
if expr.isdigit():
return int(expr)
if expr.isalpha():
return local_vars[expr]
m = re.match(r'^(?P<in>[a-z]+)\.(?P<member>.*)$', expr)
if m:
member = m.group('member')
val = local_vars[m.group('in')]
if member == 'split("")':
return list(val)
if member == 'join("")':
return u''.join(val)
if member == 'length':
return len(val)
if member == 'reverse()':
return val[::-1]
slice_m = re.match(r'slice\((?P<idx>.*)\)', member)
if slice_m:
idx = interpret_expression(
slice_m.group('idx'), local_vars, allow_recursion-1)
return val[idx:]
m = re.match(
r'^(?P<in>[a-z]+)\[(?P<idx>.+)\]$', expr)
if m:
val = local_vars[m.group('in')]
idx = interpret_expression(m.group('idx'), local_vars,
allow_recursion-1)
return val[idx]
m = re.match(r'^(?P<a>.+?)(?P<op>[%])(?P<b>.+?)$', expr)
if m:
a = interpret_expression(m.group('a'),
local_vars, allow_recursion)
b = interpret_expression(m.group('b'),
local_vars, allow_recursion)
return a % b
m = re.match(
r'^(?P<func>[a-zA-Z]+)\((?P<args>[a-z0-9,]+)\)$', expr)
if m:
fname = m.group('func')
if fname not in functions:
functions[fname] = extract_function(fname)
argvals = [int(v) if v.isdigit() else local_vars[v]
for v in m.group('args').split(',')]
return functions[fname](argvals)
raise ExtractorError(u'Unsupported JS expression %r' % expr)
def extract_function(funcname):
func_m = re.search(
r'function ' + re.escape(funcname) +
r'\((?P<args>[a-z,]+)\){(?P<code>[^}]+)}',
jscode)
argnames = func_m.group('args').split(',')
def resf(args):
local_vars = dict(zip(argnames, args))
for stmt in func_m.group('code').split(';'):
res = interpret_statement(stmt, local_vars)
return res
return resf
initial_function = extract_function(funcname)
return lambda s: initial_function([s])
def _parse_sig_swf(self, file_contents):
if file_contents[1:3] != b'WS':
raise ExtractorError(
u'Not an SWF file; header is %r' % file_contents[:3])
if file_contents[:1] == b'C':
content = zlib.decompress(file_contents[8:])
else:
raise NotImplementedError(u'Unsupported compression format %r' %
file_contents[:1])
def extract_tags(content):
pos = 0
while pos < len(content):
header16 = struct.unpack('<H', content[pos:pos+2])[0]
pos += 2
tag_code = header16 >> 6
tag_len = header16 & 0x3f
if tag_len == 0x3f:
tag_len = struct.unpack('<I', content[pos:pos+4])[0]
pos += 4
assert pos+tag_len <= len(content)
yield (tag_code, content[pos:pos+tag_len])
pos += tag_len
code_tag = next(tag
for tag_code, tag in extract_tags(content)
if tag_code == 82)
p = code_tag.index(b'\0', 4) + 1
code_reader = io.BytesIO(code_tag[p:])
# Parse ABC (AVM2 ByteCode)
def read_int(reader=None):
if reader is None:
reader = code_reader
res = 0
shift = 0
for _ in range(5):
buf = reader.read(1)
assert len(buf) == 1
b = struct.unpack('<B', buf)[0]
res = res | ((b & 0x7f) << shift)
if b & 0x80 == 0:
break
shift += 7
return res
def u30(reader=None):
res = read_int(reader)
assert res & 0xf0000000 == 0
return res
u32 = read_int
def s32(reader=None):
v = read_int(reader)
if v & 0x80000000 != 0:
v = - ((v ^ 0xffffffff) + 1)
return v
def read_string(reader=None):
if reader is None:
reader = code_reader
slen = u30(reader)
resb = reader.read(slen)
assert len(resb) == slen
return resb.decode('utf-8')
def read_bytes(count, reader=None):
if reader is None:
reader = code_reader
resb = reader.read(count)
assert len(resb) == count
return resb
def read_byte(reader=None):
resb = read_bytes(1, reader=reader)
res = struct.unpack('<B', resb)[0]
return res
# minor_version + major_version
read_bytes(2 + 2)
# Constant pool
int_count = u30()
for _c in range(1, int_count):
s32()
uint_count = u30()
for _c in range(1, uint_count):
u32()
double_count = u30()
read_bytes((double_count-1) * 8)
string_count = u30()
constant_strings = [u'']
for _c in range(1, string_count):
s = read_string()
constant_strings.append(s)
namespace_count = u30()
for _c in range(1, namespace_count):
read_bytes(1) # kind
u30() # name
ns_set_count = u30()
for _c in range(1, ns_set_count):
count = u30()
for _c2 in range(count):
u30()
multiname_count = u30()
MULTINAME_SIZES = {
0x07: 2, # QName
0x0d: 2, # QNameA
0x0f: 1, # RTQName
0x10: 1, # RTQNameA
0x11: 0, # RTQNameL
0x12: 0, # RTQNameLA
0x09: 2, # Multiname
0x0e: 2, # MultinameA
0x1b: 1, # MultinameL
0x1c: 1, # MultinameLA
}
multinames = [u'']
for _c in range(1, multiname_count):
kind = u30()
assert kind in MULTINAME_SIZES, u'Invalid multiname kind %r' % kind
if kind == 0x07:
u30() # namespace_idx
name_idx = u30()
multinames.append(constant_strings[name_idx])
else:
multinames.append('[MULTINAME kind: %d]' % kind)
for _c2 in range(MULTINAME_SIZES[kind]):
u30()
# Methods
method_count = u30()
MethodInfo = collections.namedtuple(
'MethodInfo',
['NEED_ARGUMENTS', 'NEED_REST'])
method_infos = []
for method_id in range(method_count):
param_count = u30()
u30() # return type
for _ in range(param_count):
u30() # param type
u30() # name index (always 0 for youtube)
flags = read_byte()
if flags & 0x08 != 0:
# Options present
option_count = u30()
for c in range(option_count):
u30() # val
read_bytes(1) # kind
if flags & 0x80 != 0:
# Param names present
for _ in range(param_count):
u30() # param name
mi = MethodInfo(flags & 0x01 != 0, flags & 0x04 != 0)
method_infos.append(mi)
# Metadata
metadata_count = u30()
for _c in range(metadata_count):
u30() # name
item_count = u30()
for _c2 in range(item_count):
u30() # key
u30() # value
def parse_traits_info():
trait_name_idx = u30()
kind_full = read_byte()
kind = kind_full & 0x0f
attrs = kind_full >> 4
methods = {}
if kind in [0x00, 0x06]: # Slot or Const
u30() # Slot id
u30() # type_name_idx
vindex = u30()
if vindex != 0:
read_byte() # vkind
elif kind in [0x01, 0x02, 0x03]: # Method / Getter / Setter
u30() # disp_id
method_idx = u30()
methods[multinames[trait_name_idx]] = method_idx
elif kind == 0x04: # Class
u30() # slot_id
u30() # classi
elif kind == 0x05: # Function
u30() # slot_id
function_idx = u30()
methods[function_idx] = multinames[trait_name_idx]
else:
raise ExtractorError(u'Unsupported trait kind %d' % kind)
if attrs & 0x4 != 0: # Metadata present
metadata_count = u30()
for _c3 in range(metadata_count):
u30() # metadata index
return methods
# Classes
TARGET_CLASSNAME = u'SignatureDecipher'
searched_idx = multinames.index(TARGET_CLASSNAME)
searched_class_id = None
class_count = u30()
for class_id in range(class_count):
name_idx = u30()
if name_idx == searched_idx:
# We found the class we're looking for!
searched_class_id = class_id
u30() # super_name idx
flags = read_byte()
if flags & 0x08 != 0: # Protected namespace is present
u30() # protected_ns_idx
intrf_count = u30()
for _c2 in range(intrf_count):
u30()
u30() # iinit
trait_count = u30()
for _c2 in range(trait_count):
parse_traits_info()
if searched_class_id is None:
raise ExtractorError(u'Target class %r not found' %
TARGET_CLASSNAME)
method_names = {}
method_idxs = {}
for class_id in range(class_count):
u30() # cinit
trait_count = u30()
for _c2 in range(trait_count):
trait_methods = parse_traits_info()
if class_id == searched_class_id:
method_names.update(trait_methods.items())
method_idxs.update(dict(
(idx, name)
for name, idx in trait_methods.items()))
# Scripts
script_count = u30()
for _c in range(script_count):
u30() # init
trait_count = u30()
for _c2 in range(trait_count):
parse_traits_info()
# Method bodies
method_body_count = u30()
Method = collections.namedtuple('Method', ['code', 'local_count'])
methods = {}
for _c in range(method_body_count):
method_idx = u30()
u30() # max_stack
local_count = u30()
u30() # init_scope_depth
u30() # max_scope_depth
code_length = u30()
code = read_bytes(code_length)
if method_idx in method_idxs:
m = Method(code, local_count)
methods[method_idxs[method_idx]] = m
exception_count = u30()
for _c2 in range(exception_count):
u30() # from
u30() # to
u30() # target
u30() # exc_type
u30() # var_name
trait_count = u30()
for _c2 in range(trait_count):
parse_traits_info()
assert p + code_reader.tell() == len(code_tag)
assert len(methods) == len(method_idxs)
method_pyfunctions = {}
def extract_function(func_name):
if func_name in method_pyfunctions:
return method_pyfunctions[func_name]
if func_name not in methods:
raise ExtractorError(u'Cannot find function %r' % func_name)
m = methods[func_name]
def resfunc(args):
registers = ['(this)'] + list(args) + [None] * m.local_count
stack = []
coder = io.BytesIO(m.code)
while True:
opcode = struct.unpack('!B', coder.read(1))[0]
if opcode == 36: # pushbyte
v = struct.unpack('!B', coder.read(1))[0]
stack.append(v)
elif opcode == 44: # pushstring
idx = u30(coder)
stack.append(constant_strings[idx])
elif opcode == 48: # pushscope
# We don't implement the scope register, so we'll just
# ignore the popped value
stack.pop()
elif opcode == 70: # callproperty
index = u30(coder)
mname = multinames[index]
arg_count = u30(coder)
args = list(reversed(
[stack.pop() for _ in range(arg_count)]))
obj = stack.pop()
if mname == u'split':
assert len(args) == 1
assert isinstance(args[0], compat_str)
assert isinstance(obj, compat_str)
if args[0] == u'':
res = list(obj)
else:
res = obj.split(args[0])
stack.append(res)
elif mname == u'slice':
assert len(args) == 1
assert isinstance(args[0], int)
assert isinstance(obj, list)
res = obj[args[0]:]
stack.append(res)
elif mname == u'join':
assert len(args) == 1
assert isinstance(args[0], compat_str)
assert isinstance(obj, list)
res = args[0].join(obj)
stack.append(res)
elif mname in method_pyfunctions:
stack.append(method_pyfunctions[mname](args))
else:
raise NotImplementedError(
u'Unsupported property %r on %r'
% (mname, obj))
elif opcode == 72: # returnvalue
res = stack.pop()
return res
elif opcode == 79: # callpropvoid
index = u30(coder)
mname = multinames[index]
arg_count = u30(coder)
args = list(reversed(
[stack.pop() for _ in range(arg_count)]))
obj = stack.pop()
if mname == u'reverse':
assert isinstance(obj, list)
obj.reverse()
else:
raise NotImplementedError(
u'Unsupported (void) property %r on %r'
% (mname, obj))
elif opcode == 93: # findpropstrict
index = u30(coder)
mname = multinames[index]
res = extract_function(mname)
stack.append(res)
elif opcode == 97: # setproperty
index = u30(coder)
value = stack.pop()
idx = stack.pop()
obj = stack.pop()
assert isinstance(obj, list)
assert isinstance(idx, int)
obj[idx] = value
elif opcode == 98: # getlocal
index = u30(coder)
stack.append(registers[index])
elif opcode == 99: # setlocal
index = u30(coder)
value = stack.pop()
registers[index] = value
elif opcode == 102: # getproperty
index = u30(coder)
pname = multinames[index]
if pname == u'length':
obj = stack.pop()
assert isinstance(obj, list)
stack.append(len(obj))
else: # Assume attribute access
idx = stack.pop()
assert isinstance(idx, int)
obj = stack.pop()
assert isinstance(obj, list)
stack.append(obj[idx])
elif opcode == 128: # coerce
u30(coder)
elif opcode == 133: # coerce_s
assert isinstance(stack[-1], (type(None), compat_str))
elif opcode == 164: # modulo
value2 = stack.pop()
value1 = stack.pop()
res = value1 % value2
stack.append(res)
elif opcode == 208: # getlocal_0
stack.append(registers[0])
elif opcode == 209: # getlocal_1
stack.append(registers[1])
elif opcode == 210: # getlocal_2
stack.append(registers[2])
elif opcode == 211: # getlocal_3
stack.append(registers[3])
elif opcode == 214: # setlocal_2
registers[2] = stack.pop()
elif opcode == 215: # setlocal_3
registers[3] = stack.pop()
else:
raise NotImplementedError(
u'Unsupported opcode %d' % opcode)
method_pyfunctions[func_name] = resfunc
return resfunc
initial_function = extract_function(u'decipher')
return lambda s: initial_function([s])
def _decrypt_signature(self, s, video_id, player_url, age_gate=False):
"""Turn the encrypted s field into a working signature"""
if len(s) == 92:
if player_url is not None:
try:
player_id = (player_url, len(s))
if player_id not in self._player_cache:
func = self._extract_signature_function(
video_id, player_url, len(s)
)
self._player_cache[player_id] = func
func = self._player_cache[player_id]
if self._downloader.params.get('youtube_print_sig_code'):
self._print_sig_code(func, len(s))
return func(s)
except Exception:
tb = traceback.format_exc()
self._downloader.report_warning(
u'Automatic signature extraction failed: ' + tb)
self._downloader.report_warning(
u'Warning: Falling back to static signature algorithm')
return self._static_decrypt_signature(
s, video_id, player_url, age_gate)
def _static_decrypt_signature(self, s, video_id, player_url, age_gate):
if age_gate:
# The videos with age protection use another player, so the
# algorithms can be different.
if len(s) == 86:
return s[2:63] + s[82] + s[64:82] + s[63]
if len(s) == 93:
return s[86:29:-1] + s[88] + s[28:5:-1]
elif len(s) == 92:
return s[25] + s[3:25] + s[0] + s[26:42] + s[79] + s[43:79] + s[91] + s[80:83]
elif len(s) == 91:
return s[84:27:-1] + s[86] + s[26:5:-1]
elif len(s) == 90:
return s[25] + s[3:25] + s[2] + s[26:40] + s[77] + s[41:77] + s[89] + s[78:81]
elif len(s) == 89:
@@ -427,15 +1082,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
elif len(s) == 87:
return s[6:27] + s[4] + s[28:39] + s[27] + s[40:59] + s[2] + s[60:]
elif len(s) == 86:
return s[5:34] + s[0] + s[35:38] + s[3] + s[39:45] + s[38] + s[46:53] + s[73] + s[54:73] + s[85] + s[74:85] + s[53]
return s[80:72:-1] + s[16] + s[71:39:-1] + s[72] + s[38:16:-1] + s[82] + s[15::-1]
elif len(s) == 85:
return s[3:11] + s[0] + s[12:55] + s[84] + s[56:84]
elif len(s) == 84:
return s[81:36:-1] + s[0] + s[35:2:-1]
return s[78:70:-1] + s[14] + s[69:37:-1] + s[70] + s[36:14:-1] + s[80] + s[:14][::-1]
elif len(s) == 83:
return s[81:64:-1] + s[82] + s[63:52:-1] + s[45] + s[51:45:-1] + s[1] + s[44:1:-1] + s[0]
return s[80:63:-1] + s[0] + s[62:0:-1] + s[63]
elif len(s) == 82:
return s[80:73:-1] + s[81] + s[72:54:-1] + s[2] + s[53:43:-1] + s[0] + s[42:2:-1] + s[43] + s[1] + s[54]
return s[80:37:-1] + s[7] + s[36:7:-1] + s[0] + s[6:0:-1] + s[37]
elif len(s) == 81:
return s[56] + s[79:56:-1] + s[41] + s[55:41:-1] + s[80] + s[40:34:-1] + s[0] + s[33:29:-1] + s[34] + s[28:9:-1] + s[29] + s[8:0:-1] + s[9]
elif len(s) == 80:
@@ -446,15 +1101,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
else:
raise ExtractorError(u'Unable to decrypt signature, key length %d not supported; retrying might work' % (len(s)))
def _decrypt_signature_age_gate(self, s):
# The videos with age protection use another player, so the algorithms
# can be different.
if len(s) == 86:
return s[2:63] + s[82] + s[64:82] + s[63]
else:
# Fallback to the other algortihms
return self._decrypt_signature(s)
def _get_available_subtitles(self, video_id):
try:
sub_list = self._download_webpage(
@@ -472,6 +1118,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'lang': lang,
'v': video_id,
'fmt': self._downloader.params.get('subtitlesformat'),
'name': l[0],
})
url = u'http://www.youtube.com/api/timedtext?' + params
sub_lang_list[lang] = url
@@ -505,7 +1152,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
list_page = self._download_webpage(list_url, video_id)
caption_list = xml.etree.ElementTree.fromstring(list_page.encode('utf-8'))
original_lang_node = caption_list.find('track')
if original_lang_node.attrib.get('kind') != 'asr' :
if original_lang_node is None or original_lang_node.attrib.get('kind') != 'asr' :
self._downloader.report_warning(u'Video doesn\'t have automatic captions')
return {}
original_lang = original_lang_node.attrib['lang_code']
@@ -605,10 +1252,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
url_map[itag] = format_url
return url_map
def _real_extract(self, url):
if re.match(r'(?:https?://)?[^/]+/watch\?feature=[a-z_]+$', url):
self._downloader.report_warning(u'Did you forget to quote the URL? Remember that & is a meta-character in most shells, so you want to put the URL in quotes, like youtube-dl \'http://www.youtube.com/watch?feature=foo&v=BaW_jenozKc\' (or simply youtube-dl BaW_jenozKc ).')
def _extract_annotations(self, video_id):
url = 'https://www.youtube.com/annotations_invideo?features=1&legacy=1&video_id=%s' % video_id
return self._download_webpage(url, video_id, note=u'Searching for annotations.', errnote=u'Unable to download video annotations.')
def _real_extract(self, url):
# Extract original video URL from URL with redirection, like age verification, using next_url parameter
mobj = re.search(self._NEXT_URL_RE, url)
if mobj:
@@ -627,7 +1275,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
video_webpage = video_webpage_bytes.decode('utf-8', 'ignore')
# Attempt to extract SWF player URL
mobj = re.search(r'swfConfig.*?"(http:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
if mobj is not None:
player_url = re.sub(r'\\(.)', r'\1', mobj.group(1))
else:
@@ -691,9 +1339,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
self._downloader.report_warning(u'unable to extract uploader nickname')
# title
if 'title' not in video_info:
raise ExtractorError(u'Unable to extract video title')
video_title = compat_urllib_parse.unquote_plus(video_info['title'][0])
if 'title' in video_info:
video_title = compat_urllib_parse.unquote_plus(video_info['title'][0])
else:
self._downloader.report_warning(u'Unable to extract video title')
video_title = u'_'
# thumbnail image
# We try first to get a high quality image:
@@ -703,7 +1353,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
video_thumbnail = m_thumb.group(1)
elif 'thumbnail_url' not in video_info:
self._downloader.report_warning(u'unable to extract video thumbnail')
video_thumbnail = ''
video_thumbnail = None
else: # don't panic if we can't find it
video_thumbnail = compat_urllib_parse.unquote_plus(video_info['thumbnail_url'][0])
@@ -738,6 +1388,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
else:
video_duration = compat_urllib_parse.unquote_plus(video_info['length_seconds'][0])
# annotations
video_annotations = None
if self._downloader.params.get('writeannotations', False):
video_annotations = self._extract_annotations(video_id)
# Decide which formats to download
try:
@@ -748,6 +1403,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
args = info['args']
# Easy way to know if the 's' value is in url_encoded_fmt_stream_map
# this signatures are encrypted
if 'url_encoded_fmt_stream_map' not in args:
raise ValueError(u'No stream_map present') # caught below
m_s = re.search(r'[&,]s=', args['url_encoded_fmt_stream_map'])
if m_s is not None:
self.to_screen(u'%s: Encrypted signatures detected.' % video_id)
@@ -780,24 +1437,34 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
if 'sig' in url_data:
url += '&signature=' + url_data['sig'][0]
elif 's' in url_data:
if self._downloader.params.get('verbose'):
s = url_data['s'][0]
if age_gate:
player_version = self._search_regex(r'ad3-(.+?)\.swf',
video_info['ad3_module'][0] if 'ad3_module' in video_info else 'NOT FOUND',
'flash player', fatal=False)
player = 'flash player %s' % player_version
else:
player = u'html5 player %s' % self._search_regex(r'html5player-(.+?)\.js', video_webpage,
'html5 player', fatal=False)
parts_sizes = u'.'.join(compat_str(len(part)) for part in s.split('.'))
self.to_screen(u'encrypted signature length %d (%s), itag %s, %s' %
(len(s), parts_sizes, url_data['itag'][0], player))
encrypted_sig = url_data['s'][0]
if age_gate:
signature = self._decrypt_signature_age_gate(encrypted_sig)
else:
signature = self._decrypt_signature(encrypted_sig)
if self._downloader.params.get('verbose'):
if age_gate:
if player_url is None:
player_version = 'unknown'
else:
player_version = self._search_regex(
r'-(.+)\.swf$', player_url,
u'flash player', fatal=False)
player_desc = 'flash player %s' % player_version
else:
player_version = self._search_regex(
r'html5player-(.+?)\.js', video_webpage,
'html5 player', fatal=False)
player_desc = u'html5 player %s' % player_version
parts_sizes = u'.'.join(compat_str(len(part)) for part in encrypted_sig.split('.'))
self.to_screen(u'encrypted signature length %d (%s), itag %s, %s' %
(len(encrypted_sig), parts_sizes, url_data['itag'][0], player_desc))
if not age_gate:
jsplayer_url_json = self._search_regex(
r'"assets":.+?"js":\s*("[^"]+")',
video_webpage, u'JS player URL')
player_url = json.loads(jsplayer_url_json)
signature = self._decrypt_signature(
encrypted_sig, video_id, player_url, age_gate)
url += '&signature=' + signature
if 'ratebypass' not in url:
url += '&ratebypass=yes'
@@ -813,7 +1480,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
return
else:
raise ExtractorError(u'no conn or url_encoded_fmt_stream_map information found in video info')
raise ExtractorError(u'no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
results = []
for format_param, video_real_url in video_url_list:
@@ -837,7 +1504,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'description': video_description,
'player_url': player_url,
'subtitles': video_subtitles,
'duration': video_duration
'duration': video_duration,
'age_limit': 18 if age_gate else 0,
'annotations': video_annotations
})
return results
@@ -871,9 +1540,19 @@ class YoutubePlaylistIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
playlist_id = mobj.group(1) or mobj.group(2)
# Check if it's a video-specific URL
query_dict = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
if 'v' in query_dict:
video_id = query_dict['v'][0]
if self._downloader.params.get('noplaylist'):
self.to_screen(u'Downloading just video %s because of --no-playlist' % video_id)
return self.url_result('https://www.youtube.com/watch?v=' + video_id, 'Youtube')
else:
self.to_screen(u'Downloading playlist PL%s - add --no-playlist to just download video %s' % (playlist_id, video_id))
# Download playlist videos from API
playlist_id = mobj.group(1) or mobj.group(2)
videos = []
for page_num in itertools.count(1):
@@ -968,7 +1647,7 @@ class YoutubeChannelIE(InfoExtractor):
class YoutubeUserIE(InfoExtractor):
IE_DESC = u'YouTube.com user videos (URL or "ytuser" keyword)'
_VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?youtube\.com/(?:user/)?)|ytuser:)(?!feed/)([A-Za-z0-9_-]+)'
_VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?youtube\.com/(?:user/)?(?!(?:attribution_link|watch)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)([A-Za-z0-9_-]+)'
_TEMPLATE_URL = 'http://gdata.youtube.com/feeds/api/users/%s'
_GDATA_PAGE_SIZE = 50
_GDATA_URL = 'http://gdata.youtube.com/feeds/api/users/%s/uploads?max-results=%d&start-index=%d&alt=json'
@@ -1008,6 +1687,9 @@ class YoutubeUserIE(InfoExtractor):
response = json.loads(page)
except ValueError as err:
raise ExtractorError(u'Invalid JSON in API response: ' + compat_str(err))
if 'entry' not in response['feed']:
# Number of videos is a multiple of self._MAX_RESULTS
break
# Extract video identifiers
ids_in_page = []
@@ -1158,3 +1840,18 @@ class YoutubeFavouritesIE(YoutubeBaseInfoExtractor):
webpage = self._download_webpage('https://www.youtube.com/my_favorites', 'Youtube Favourites videos')
playlist_id = self._search_regex(r'list=(.+?)["&]', webpage, u'favourites playlist id')
return self.url_result(playlist_id, 'YoutubePlaylist')
class YoutubeTruncatedURLIE(InfoExtractor):
IE_NAME = 'youtube:truncated_url'
IE_DESC = False # Do not list
_VALID_URL = r'(?:https?://)?[^/]+/watch\?feature=[a-z_]+$'
def _real_extract(self, url):
raise ExtractorError(
u'Did you forget to quote the URL? Remember that & is a meta '
u'character in most shells, so you want to put the URL in quotes, '
u'like youtube-dl '
u'\'http://www.youtube.com/watch?feature=foo&v=BaW_jenozKc\''
u' (or simply youtube-dl BaW_jenozKc ).',
expected=True)

View File

@@ -2,16 +2,14 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
unescapeHTML,
)
class ZDFIE(InfoExtractor):
_VALID_URL = r'^http://www\.zdf\.de\/ZDFmediathek\/(.*beitrag\/video\/)(?P<video_id>[^/\?]+)(?:\?.*)?'
_TITLE = r'<h1(?: class="beitragHeadline")?>(?P<title>.*)</h1>'
_VALID_URL = r'^http://www\.zdf\.de\/ZDFmediathek(?P<hash>#)?\/(.*beitrag\/video\/)(?P<video_id>[^/\?]+)(?:\?.*)?'
_MEDIA_STREAM = r'<a href="(?P<video_url>.+(?P<media_type>.streaming).+/zdf/(?P<quality>[^\/]+)/[^"]*)".+class="play".+>'
_MMS_STREAM = r'href="(?P<video_url>mms://[^"]*)"'
_RTSP_STREAM = r'(?P<video_url>rtsp://[^"]*.mp4)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -19,6 +17,9 @@ class ZDFIE(InfoExtractor):
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('video_id')
if mobj.group('hash'):
url = url.replace(u'#', u'', 1)
html = self._download_webpage(url, video_id)
streams = [m.groupdict() for m in re.finditer(self._MEDIA_STREAM, html)]
if streams is None:
@@ -27,39 +28,48 @@ class ZDFIE(InfoExtractor):
# s['media_type'] == 'wstreaming' -> use 'Windows Media Player' and mms url
# s['media_type'] == 'hstreaming' -> use 'Quicktime' and rtsp url
# choose first/default media type and highest quality for now
for s in streams: #find 300 - dsl1000mbit
if s['quality'] == '300' and s['media_type'] == 'wstreaming':
stream_=s
break
for s in streams: #find veryhigh - dsl2000mbit
if s['quality'] == 'veryhigh' and s['media_type'] == 'wstreaming': # 'hstreaming' - rtsp is not working
stream_=s
break
if stream_ is None:
def stream_pref(s):
TYPE_ORDER = ['ostreaming', 'hstreaming', 'wstreaming']
try:
type_pref = TYPE_ORDER.index(s['media_type'])
except ValueError:
type_pref = 999
QUALITY_ORDER = ['veryhigh', '300']
try:
quality_pref = QUALITY_ORDER.index(s['quality'])
except ValueError:
quality_pref = 999
return (type_pref, quality_pref)
sorted_streams = sorted(streams, key=stream_pref)
if not sorted_streams:
raise ExtractorError(u'No stream found.')
stream = sorted_streams[0]
media_link = self._download_webpage(stream_['video_url'], video_id,'Get stream URL')
media_link = self._download_webpage(
stream['video_url'],
video_id,
u'Get stream URL')
self.report_extraction(video_id)
mobj = re.search(self._TITLE, html)
MMS_STREAM = r'href="(?P<video_url>mms://[^"]*)"'
RTSP_STREAM = r'(?P<video_url>rtsp://[^"]*.mp4)'
mobj = re.search(self._MEDIA_STREAM, media_link)
if mobj is None:
raise ExtractorError(u'Cannot extract title')
title = unescapeHTML(mobj.group('title'))
mobj = re.search(self._MMS_STREAM, media_link)
if mobj is None:
mobj = re.search(self._RTSP_STREAM, media_link)
mobj = re.search(RTSP_STREAM, media_link)
if mobj is None:
raise ExtractorError(u'Cannot extract mms:// or rtsp:// URL')
mms_url = mobj.group('video_url')
video_url = mobj.group('video_url')
mobj = re.search('(.*)[.](?P<ext>[^.]+)', mms_url)
if mobj is None:
raise ExtractorError(u'Cannot extract extention')
ext = mobj.group('ext')
title = self._html_search_regex(
r'<h1(?: class="beitragHeadline")?>(.*?)</h1>',
html, u'title')
return [{'id': video_id,
'url': mms_url,
'title': title,
'ext': ext
}]
return {
'id': video_id,
'url': video_url,
'title': title,
'ext': determine_ext(video_url)
}

View File

@@ -1,6 +1,9 @@
import io
import json
import traceback
import hashlib
import subprocess
import sys
from zipimport import zipimporter
from .utils import *
@@ -34,7 +37,7 @@ def rsa_verify(message, signature, key):
if signature != sha256(message).digest(): return False
return True
def update_self(to_screen, verbose, filename):
def update_self(to_screen, verbose):
"""Update the program file with the latest version from the repository"""
UPDATE_URL = "http://rg3.github.io/youtube-dl/update/"
@@ -42,7 +45,6 @@ def update_self(to_screen, verbose, filename):
JSON_URL = UPDATE_URL + 'versions.json'
UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)
if not isinstance(globals().get('__loader__'), zipimporter) and not hasattr(sys, "frozen"):
to_screen(u'It looks like you installed youtube-dl with a package manager, pip, setup.py or a tarball. Please use that to update.')
return
@@ -75,11 +77,18 @@ def update_self(to_screen, verbose, filename):
to_screen(u'ERROR: the versions file signature is invalid. Aborting.')
return
to_screen(u'Updating to version ' + versions_info['latest'] + '...')
version = versions_info['versions'][versions_info['latest']]
version_id = versions_info['latest']
to_screen(u'Updating to version ' + version_id + '...')
version = versions_info['versions'][version_id]
print_notes(to_screen, versions_info['versions'])
filename = sys.argv[0]
# Py2EXE: Filename could be different
if hasattr(sys, "frozen") and not os.path.isfile(filename):
if os.path.isfile(filename + u'.exe'):
filename += u'.exe'
if not os.access(filename, os.W_OK):
to_screen(u'ERROR: no write permissions on %s' % filename)
return
@@ -116,16 +125,18 @@ def update_self(to_screen, verbose, filename):
try:
bat = os.path.join(directory, 'youtube-dl-updater.bat')
b = open(bat, 'w')
b.write("""
echo Updating youtube-dl...
with io.open(bat, 'w') as batfile:
batfile.write(u"""
@echo off
echo Waiting for file handle to be closed ...
ping 127.0.0.1 -n 5 -w 1000 > NUL
move /Y "%s.new" "%s"
del "%s"
\n""" %(exe, exe, bat))
b.close()
move /Y "%s.new" "%s" > NUL
echo Updated youtube-dl to version %s.
start /b "" cmd /c del "%%~f0"&exit /b"
\n""" % (exe, exe, version_id))
os.startfile(bat)
subprocess.Popen([bat]) # Continues to run in the background
return # Do not show premature success messages
except (IOError, OSError) as err:
if verbose: to_screen(compat_str(traceback.format_exc()))
to_screen(u'ERROR: unable to overwrite current version')

View File

@@ -9,6 +9,7 @@ import io
import json
import locale
import os
import pipes
import platform
import re
import socket
@@ -66,6 +67,12 @@ try:
except ImportError: # Python 2
from urllib2 import HTTPError as compat_HTTPError
try:
from urllib.request import urlretrieve as compat_urlretrieve
except ImportError: # Python 2
from urllib import urlretrieve as compat_urlretrieve
try:
from subprocess import DEVNULL
compat_subprocess_get_DEVNULL = lambda: DEVNULL
@@ -169,7 +176,7 @@ def compat_ord(c):
compiled_regex_type = type(re.compile(''))
std_headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0 (Chrome)',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate',
@@ -223,6 +230,19 @@ else:
return f
return None
# On python2.6 the xml.etree.ElementTree.Element methods don't support
# the namespace parameter
def xpath_with_ns(path, ns_map):
components = [c.split(':') for c in path.split('/')]
replaced = []
for c in components:
if len(c) == 1:
replaced.append(c[0])
else:
ns, tag = c
replaced.append('{%s}%s' % (ns_map[ns], tag))
return '/'.join(replaced)
def htmlentity_transform(matchobj):
"""Transforms an HTML entity to a character.
@@ -552,6 +572,11 @@ class ExtractorError(Exception):
return u''.join(traceback.format_tb(self.traceback))
class RegexNotFoundError(ExtractorError):
"""Error when a regex didn't match"""
pass
class DownloadError(Exception):
"""Download Error exception.
@@ -709,6 +734,7 @@ def unified_strdate(date_str):
'%Y/%m/%d %H:%M:%S',
'%d.%m.%Y %H:%M',
'%Y-%m-%dT%H:%M:%SZ',
'%Y-%m-%dT%H:%M:%S',
]
for expression in format_expressions:
try:
@@ -818,3 +844,135 @@ def intlist_to_bytes(xs):
return ''.join([chr(x) for x in xs])
else:
return bytes(xs)
def get_cachedir(params={}):
cache_root = os.environ.get('XDG_CACHE_HOME',
os.path.expanduser('~/.cache'))
return params.get('cachedir', os.path.join(cache_root, 'youtube-dl'))
# Cross-platform file locking
if sys.platform == 'win32':
import ctypes.wintypes
import msvcrt
class OVERLAPPED(ctypes.Structure):
_fields_ = [
('Internal', ctypes.wintypes.LPVOID),
('InternalHigh', ctypes.wintypes.LPVOID),
('Offset', ctypes.wintypes.DWORD),
('OffsetHigh', ctypes.wintypes.DWORD),
('hEvent', ctypes.wintypes.HANDLE),
]
kernel32 = ctypes.windll.kernel32
LockFileEx = kernel32.LockFileEx
LockFileEx.argtypes = [
ctypes.wintypes.HANDLE, # hFile
ctypes.wintypes.DWORD, # dwFlags
ctypes.wintypes.DWORD, # dwReserved
ctypes.wintypes.DWORD, # nNumberOfBytesToLockLow
ctypes.wintypes.DWORD, # nNumberOfBytesToLockHigh
ctypes.POINTER(OVERLAPPED) # Overlapped
]
LockFileEx.restype = ctypes.wintypes.BOOL
UnlockFileEx = kernel32.UnlockFileEx
UnlockFileEx.argtypes = [
ctypes.wintypes.HANDLE, # hFile
ctypes.wintypes.DWORD, # dwReserved
ctypes.wintypes.DWORD, # nNumberOfBytesToLockLow
ctypes.wintypes.DWORD, # nNumberOfBytesToLockHigh
ctypes.POINTER(OVERLAPPED) # Overlapped
]
UnlockFileEx.restype = ctypes.wintypes.BOOL
whole_low = 0xffffffff
whole_high = 0x7fffffff
def _lock_file(f, exclusive):
overlapped = OVERLAPPED()
overlapped.Offset = 0
overlapped.OffsetHigh = 0
overlapped.hEvent = 0
f._lock_file_overlapped_p = ctypes.pointer(overlapped)
handle = msvcrt.get_osfhandle(f.fileno())
if not LockFileEx(handle, 0x2 if exclusive else 0x0, 0,
whole_low, whole_high, f._lock_file_overlapped_p):
raise OSError('Locking file failed: %r' % ctypes.FormatError())
def _unlock_file(f):
assert f._lock_file_overlapped_p
handle = msvcrt.get_osfhandle(f.fileno())
if not UnlockFileEx(handle, 0,
whole_low, whole_high, f._lock_file_overlapped_p):
raise OSError('Unlocking file failed: %r' % ctypes.FormatError())
else:
import fcntl
def _lock_file(f, exclusive):
fcntl.lockf(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
def _unlock_file(f):
fcntl.lockf(f, fcntl.LOCK_UN)
class locked_file(object):
def __init__(self, filename, mode, encoding=None):
assert mode in ['r', 'a', 'w']
self.f = io.open(filename, mode, encoding=encoding)
self.mode = mode
def __enter__(self):
exclusive = self.mode != 'r'
try:
_lock_file(self.f, exclusive)
except IOError:
self.f.close()
raise
return self
def __exit__(self, etype, value, traceback):
try:
_unlock_file(self.f)
finally:
self.f.close()
def __iter__(self):
return iter(self.f)
def write(self, *args):
return self.f.write(*args)
def read(self, *args):
return self.f.read(*args)
def shell_quote(args):
return ' '.join(map(pipes.quote, args))
def takewhile_inclusive(pred, seq):
""" Like itertools.takewhile, but include the latest evaluated element
(the first element so that Not pred(e)) """
for e in seq:
yield e
if not pred(e):
return
def smuggle_url(url, data):
""" Pass additional data in a URL for internal use. """
sdata = compat_urllib_parse.urlencode(
{u'__youtubedl_smuggle': json.dumps(data)})
return url + u'#' + sdata
def unsmuggle_url(smug_url):
if not '#__youtubedl_smuggle' in smug_url:
return smug_url, None
url, _, sdata = smug_url.rpartition(u'#')
jsond = compat_parse_qs(sdata)[u'__youtubedl_smuggle'][0]
data = json.loads(jsond)
return url, data

View File

@@ -1,2 +1,2 @@
__version__ = '2013.09.17'
__version__ = '2013.10.23.2'